Miscellaneous
This section discusses some technical details of TBPLaS, including choosing the backend for Hamiltonian
diagonalization, switching between CPU and GPU, and loading existing data files. We will use the terms
compile-time and run-time frequently. Here, compile-time refers to the compilation stage
described in Install, while run-time refers to the stage where we build the models and run the
calculations. To switch between math libraries and between CPU/GPU, the corresponding features must be
enabled at compile-time, and some parameters need to be set at run-time. Here we briefly describe
the technical design, while detailed examples of the parameters can be found in
tbplas-cpp/samples/speedtest.
Backend for diagonalization
The term backend is an abstraction and unified wrapper for the diagonalization algorithms provided by different math library vendors. TBPLaS supports three kinds of diagonalization backends:
- a backend using the built-in diagonalization solvers of the eigen library, e.g., SelfAdjointEigenSolver and GeneralizedSelfAdjointEigenSolver, which can call LAPACKE under the hood
- a backend calling LAPACKE directly, without the eigen library
- a solver based on the FEAST library, which supports selective diagonalization of a specific energy range
The eigen backend is always enabled at compile-time, while the LAPACKE and FEAST backends are disabled
by default. To enable the LAPACKE backend, set the DIAG_BACKEND option to some LAPACKE vendor. For
example, -DDIAG=openblas enables the LAPACKE backend based on OpenBLAS, while -DDIAG=mkl enables
the usage of MKL, etc. The FEAST backend must be enabled by the WITH_FEAST option, e.g.,
-DWITH_FEAST=on. Since FEAST requires LAPACKE itself, the LAPACKE backend should also be enabled.
Maintaining the dependencies and conflicts of math libraries is a tedious task, but we have managed to
resolve the issues. See tbplas-cpp/cmake/TBPLASMathLib.cmake for more details if you are curious.
If the backends have been enabled at compile-time, they can be selected through the diag_algo
argument at run-time. Taking tbplas-cpp/samples/speedtest/diag.py as an example:
DIAG_ALGO = "eigen"

def test_diag_bands():
    model = tb.make_graphene_diamond()
    if WITH_OVERLAP:
        overlap = make_overlap(model)
        solver = tb.DiagSolver(model, overlap)
    else:
        solver = tb.DiagSolver(model)
    k_points = np.array([
        [0.0, 0.0, 0.0],
        [2./3, 1./3, 0.0],
        [0.5, 0.0, 0.0],
        [0.0, 0.0, 0.0]
    ])
    k_path, k_idx = tb.gen_kpath(k_points, (1000, 1000, 1000))
    solver.config.prefix = "graphene"
    solver.config.k_points = k_path
    solver.config.diag_algo = DIAG_ALGO
    solver.config.convention = CONVENTION
    timer = tb.Timer()
    timer.tic("bands")
    k_len, bands = solver.calc_bands()
    timer.toc("bands")
    if solver.is_master:
        timer.report_total_time()
        vis = tb.Visualizer()
        vis.plot_bands(k_len, bands, k_idx, ["G", "K", "M", "G"])
Setting the global variable DIAG_ALGO to eigen selects the default eigen backend, while lapacke
selects the LAPACKE backend. The backend in use is printed to stdout during the calculation:
Parallelization details:
MPI processes : 1
OMP_NUM_THREADS : n/a
MKL_NUM_THREADS : n/a
Output details:
Directory : ./
Prefix : graphene
Using Eigen backend for diagonalization.
bands : 0.16382
Parallelization details:
MPI processes : 1
OMP_NUM_THREADS : n/a
MKL_NUM_THREADS : n/a
Output details:
Directory : ./
Prefix : graphene
Using LAPACKE backend for diagonalization.
bands : 0.01992
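For quick benchmarking of the two backends, one can also loop over the diag_algo values instead of editing the global variable between runs. The following is a minimal sketch built on the DiagSolver API shown above; the compare_backends helper and the timer tags are our own naming, and the lapacke entry only works if the LAPACKE backend has been enabled at compile-time.

import numpy as np
import tbplas as tb


def compare_backends():
    # Build the same model and k-path as in diag.py.
    model = tb.make_graphene_diamond()
    k_points = np.array([
        [0.0, 0.0, 0.0],
        [2./3, 1./3, 0.0],
        [0.5, 0.0, 0.0],
        [0.0, 0.0, 0.0]
    ])
    k_path, k_idx = tb.gen_kpath(k_points, (1000, 1000, 1000))

    timer = tb.Timer()
    for algo in ("eigen", "lapacke"):
        solver = tb.DiagSolver(model)
        solver.config.prefix = "graphene"
        solver.config.k_points = k_path
        solver.config.diag_algo = algo
        # Time the band calculation with the selected backend.
        timer.tic(f"bands_{algo}")
        k_len, bands = solver.calc_bands()
        timer.toc(f"bands_{algo}")
    if solver.is_master:
        timer.report_total_time()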
To utilize the FEAST backend at run-time, set diag_algo to feast_dense or feast_sparse, depending
on the model size. Some other parameters of the FEAST library also need to be set. Taking
tbplas-cpp/samples/speedtest/diag.py as an example:
def test_diag_bands_feast():
    model = tb.make_graphene_diamond()
    model = tb.extend_prim_cell(model, dim=(3, 3, 1))
    if WITH_OVERLAP:
        overlap = make_overlap(model)
        solver = tb.DiagSolver(model, overlap)
    else:
        solver = tb.DiagSolver(model)
    k_points = np.array([
        [0.0, 0.0, 0.0],
        [2./3, 1./3, 0.0],
        [0.5, 0.0, 0.0],
        [0.0, 0.0, 0.0]
    ])
    k_path, k_idx = tb.gen_kpath(k_points, (50, 50, 50))
    solver.config.prefix = "graphene"
    solver.config.k_points = k_path
    # Use FEAST solver to evaluate all the states within [-10, 10] eV.
    solver.config.diag_algo = FEAST_DIAG_ALGO
    solver.config.feast_e_min = -10
    solver.config.feast_e_max = 10
    solver.config.feast_num_states = model.num_orb
    # Not working on n02 since it calls the MKL version of FEAST, which does
    # not support negative fpm[0]. Don't know why.
    # solver.config.feast_fpm[0] = -1
    # feast_sparse conflicts with reuse_results at some k-points.
    solver.config.feast_reuse_results = False
    timer = tb.Timer()
    timer.tic("bands")
    k_len, bands = solver.calc_bands()
    timer.toc("bands")
    if solver.is_master:
        timer.report_total_time()
        vis = tb.Visualizer()
        vis.plot_bands(k_len, bands, k_idx, ["G", "K", "M", "G"])
The feast_e_min and feast_e_max arguments set the energy range in which to search for eigenstates.
feast_num_states defines the number of eigenstates to search for; in the example we search for all
of them. The feast_reuse_results argument reuses the eigenstates from the last run, which may
speed up the calculation in some cases. The output should look like:
Parallelization details:
MPI processes : 1
OMP_NUM_THREADS : n/a
MKL_NUM_THREADS : n/a
Output details:
Directory : ./
Prefix : graphene
Using FEAST dense backend for diagonalization.
bands : 3.18614
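The main benefit of FEAST is selective diagonalization: only the states inside [feast_e_min, feast_e_max] are computed. The sketch below, assuming the same config attributes as in test_diag_bands_feast above, searches a narrow window around zero energy; the window of [-1, 1] eV and the state count are illustrative values, with feast_num_states taken here as an upper bound on the number of states expected inside the window.

import numpy as np
import tbplas as tb


def test_diag_bands_feast_window():
    model = tb.make_graphene_diamond()
    model = tb.extend_prim_cell(model, dim=(3, 3, 1))
    solver = tb.DiagSolver(model)
    k_points = np.array([
        [0.0, 0.0, 0.0],
        [2./3, 1./3, 0.0],
        [0.5, 0.0, 0.0],
        [0.0, 0.0, 0.0]
    ])
    k_path, k_idx = tb.gen_kpath(k_points, (50, 50, 50))
    solver.config.prefix = "graphene"
    solver.config.k_points = k_path
    # Search only the states within [-1, 1] eV (illustrative window).
    solver.config.diag_algo = "feast_dense"
    solver.config.feast_e_min = -1.0
    solver.config.feast_e_max = 1.0
    # Assumed upper bound on the number of states inside the window.
    solver.config.feast_num_states = model.num_orb // 4
    k_len, bands = solver.calc_bands()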
GPU computation
To use the GPU as the computing device, the WITH_CUDA option must be set to on at compile-time.
Then the algo attribute of the config of TBPMSolver must be set to gpu at run-time.
Taking tbplas-cpp/samples/speedtest/tbpm.py as an example:
# Algorithm of TBPM, should be one of "cpu", "cpu_fast" and "gpu"
ALGO = "cpu"

def test_dos():
    # Model
    t = -2.7
    a = 0.142
    if QUICK_TEST:
        dim = (1024, 1024, 1)
    else:
        dim = (4096, 4096, 1)
    if BUILD_SAMPLE:
        model = make_graphene_sample(t, a, with_onsite=WITH_ONSITE, dim=dim)
    else:
        model = make_graphene(t, a, with_onsite=WITH_ONSITE, dim=dim)

    # Parameters
    solver = tb.TBPMSolver(model)
    solver.config.num_random_samples = 1
    solver.config.rescale = 9.0
    solver.config.ldos = False
    solver.config.ldos_orbital_indices = {1}
    solver.config.num_time_steps = 1024
    solver.config.dimension = 2
    solver.config.algo = ALGO

    # Calculation
    timer = tb.Timer()
    timer.tic("corr_dos")
    corr_dos = solver.calc_corr_dos()
    timer.toc("corr_dos")

    # Output
    if solver.is_master:
        timer.report_total_time()
        analyzer = tb.Analyzer(f"{solver.config.prefix}_info.dat")
        eng, dos = analyzer.calc_dos(corr_dos)
        save_xy("dos_py.dat", eng, dos)
Setting the global variable ALGO to gpu enables the use of the GPU. The output should look like:
Parallelization details:
MPI processes : 1
OMP_NUM_THREADS : n/a
MKL_NUM_THREADS : n/a
Output details:
Directory : ./
Prefix : sample
Using TBPMGPU backend
Device 0: "NVIDIA GeForce GTX 1050 Ti"
CUDA Driver Version / Runtime Version 12.8 / 12.8
CUDA Capability Major/Minor version number: 6.1
(006) Multiprocessors, (128) CUDA Cores/MP: 768 CUDA Cores
GPU Max Clock rate: 1418 MHz (1.42 GHz)
Memory Clock rate: 3504 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 1048576 bytes
... ...
Calculating DOS correlation function.
Sample 1 of 1
Finished timestep 64 of 1024 12.436 sec.
Finished timestep 128 of 1024 24.934 sec.
... ...
which is similar to the CPU case, with additional information about the GPU device.
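In practice it is convenient to pick the algorithm at run-time, e.g., from an environment variable, so the same script can be submitted to both CPU and GPU nodes without editing. The sketch below reuses the TBPMSolver configuration from tbpm.py; the TBPLAS_ALGO variable name and the run_dos helper are our own conventions, not part of TBPLaS.

import os
import tbplas as tb


def run_dos(model):
    # Read the algorithm from the environment, defaulting to CPU.
    algo = os.environ.get("TBPLAS_ALGO", "cpu")  # "cpu", "cpu_fast" or "gpu"
    solver = tb.TBPMSolver(model)
    solver.config.num_random_samples = 1
    solver.config.rescale = 9.0
    solver.config.num_time_steps = 1024
    solver.config.dimension = 2
    solver.config.algo = algo
    corr_dos = solver.calc_corr_dos()
    return corr_dos

The script can then be launched with, for example, TBPLAS_ALGO=gpu on a GPU node and without the variable on a CPU node.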
Reading existing data
There are many occasions where we need to load data from previous calculations, e.g., for post-processing
with different purposes or for plotting data in different styles. TBPLaS offers a complete set of functions
for loading the data from disk, defined in tbplas/tbplas/diag/io.py and tbplas/tbplas/tbpm/io.py
for loading diagonalization and TBPM data files, respectively. The usage of these functions can be
found in tbplas-cpp/samples/speedtest/plot_diag.py and tbplas-cpp/samples/speedtest/plot_tbpm.py.
For example, the plot_dos function in plot_diag.py calls the load_dos() function to load the DOS:
def plot_dos(prefix: str):
    energy, dos = tb.load_dos(prefix)
    plt.plot(energy, dos)
    plt.grid()
    plt.show()
while the plot_dos function in plot_tbpm.py calls load_corr_dos() to load the correlation functions:
def plot_dos(prefix: str, analyzer: tb.Analyzer, vis: tb.Visualizer) -> None:
    corr = tb.load_corr_dos(prefix)
    energies, dos = analyzer.calc_dos(corr)
    vis.plot_dos(energies, dos)
    save_xy("dos_cpp.dat", energies, dos)
The data files are identified by the prefix argument, which must be consistent with the prefix used
in the preceding diagonalization or TBPM calculation.
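As a minimal sketch of post-processing with these functions, the snippet below loads the DOS from a diagonalization run and the DOS correlation function from a TBPM run, then plots them together. The compare_dos helper is our own; the f"{prefix}_info.dat" file name follows the Analyzer construction shown in tbpm.py above and may need adjusting if your output layout differs.

import matplotlib.pyplot as plt
import tbplas as tb


def compare_dos(diag_prefix: str, tbpm_prefix: str) -> None:
    # DOS from exact diagonalization
    eng_diag, dos_diag = tb.load_dos(diag_prefix)

    # DOS reconstructed from the TBPM correlation function
    corr = tb.load_corr_dos(tbpm_prefix)
    analyzer = tb.Analyzer(f"{tbpm_prefix}_info.dat")
    eng_tbpm, dos_tbpm = analyzer.calc_dos(corr)

    plt.plot(eng_diag, dos_diag, label="diagonalization")
    plt.plot(eng_tbpm, dos_tbpm, label="TBPM")
    plt.legend()
    plt.grid()
    plt.show()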