Miscellaneous
This section discusses some technical details of TBPLaS, including choosing the backend for Hamiltonian
diagonalization, switching between CPU and GPU, and loading existing data files. We will use the terms
compile-time and run-time intensively. Here compile-time refers to the compilation stage
described in Install, while run-time refers to the stage where we build the models and run the
calculations. To switch between math libraries and between CPU and GPU, the corresponding features must be
enabled at compile-time, and some parameters need to be set at run-time. Herein we briefly describe
the technical design; detailed examples of the parameters can be found in
tbplas-cpp/samples/speedtest.
Backend for diagonalization
The term backend is an abstraction and unified wrapper for the diagonalization algorithms provided by
different math library vendors. TBPLaS supports three kinds of diagonalization backends:
- a backend using the built-in diagonalization solvers of the eigen library, e.g.,
  SelfAdjointEigenSolver and GeneralizedSelfAdjointEigenSolver, which can call LAPACKE under the hood
- a backend calling LAPACKE directly without the eigen library
- a solver based on the FEAST library, which supports selective diagonalization of a specific energy range
The eigen backend is always enabled at compile-time, while the LAPACKE and FEAST backends are disabled
by default. To enable the LAPACKE backend, set the DIAG_BACKEND option to a LAPACKE vendor. For
example, -DDIAG=openblas enables the LAPACKE backend based on OpenBLAS, while -DDIAG=mkl enables
the usage of MKL, etc. The FEAST backend must be enabled with the WITH_FEAST option, e.g.,
-DWITH_FEAST=on. Since FEAST itself requires LAPACKE, the LAPACKE backend should also be enabled.
Maintaining the dependencies and resolving the conflicts of math libraries is a tedious task, yet we
managed to resolve the issues. See tbplas-cpp/cmake/TBPLASMathLib.cmake for more details if you are curious.
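For instance, a configure command enabling both the MKL-based LAPACKE backend and the FEAST backend
might look like the following (the build-directory layout follows the Install instructions; adapt the
option values to your math library vendor):
cmake .. -DDIAG=mkl -DWITH_FEAST=on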
If the backends have been enabled at compile-time, they can be selected via the diag_algo
argument at run-time. Taking tbplas-cpp/samples/speedtest/diag.py as an example:
import numpy as np
import tbplas as tb
# WITH_OVERLAP, CONVENTION and make_overlap are defined elsewhere in diag.py.
DIAG_ALGO = "eigen"
def test_diag_bands():
    model = tb.make_graphene_diamond()
    if WITH_OVERLAP:
        overlap = make_overlap(model)
        solver = tb.DiagSolver(model, overlap)
    else:
        solver = tb.DiagSolver(model)
    k_points = np.array([
        [0.0, 0.0, 0.0],
        [2./3, 1./3, 0.0],
        [0.5, 0.0, 0.0],
        [0.0, 0.0, 0.0]
    ])
    k_path, k_idx = tb.gen_kpath(k_points, (1000, 1000, 1000))
    solver.config.prefix = "graphene"
    solver.config.k_points = k_path
    solver.config.diag_algo = DIAG_ALGO
    solver.config.convention = CONVENTION
    timer = tb.Timer()
    timer.tic("bands")
    k_len, bands = solver.calc_bands()
    timer.toc("bands")
    if solver.is_master:
        timer.report_total_time()
        vis = tb.Visualizer()
        vis.plot_bands(k_len, bands, k_idx, ["G", "K", "M", "G"])
Setting the global variable DIAG_ALGO to eigen selects the default eigen backend, while setting it to
lapacke selects the LAPACKE backend. The backend in use is printed to stdout
during the calculation:
Parallelization details:
MPI processes    : 1
OMP_NUM_THREADS  : n/a
MKL_NUM_THREADS  : n/a
Output details:
Directory  : ./
Prefix     : graphene
Using Eigen backend for diagonalization.
        bands :    0.16382
Parallelization details:
MPI processes    : 1
OMP_NUM_THREADS  : n/a
MKL_NUM_THREADS  : n/a
Output details:
Directory  : ./
Prefix     : graphene
Using LAPACKE backend for diagonalization.
        bands :    0.01992
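If several backends have been enabled, a quick way to compare them is to run the same calculation with
different values of diag_algo. The following sketch is not part of the sample; it assumes the solver and
k-path from test_diag_bands above have already been set up and that both backends were compiled in:
# Minimal sketch: time the same band-structure calculation with two backends.
timer = tb.Timer()
for algo in ("eigen", "lapacke"):
    solver.config.diag_algo = algo
    timer.tic(algo)
    solver.calc_bands()
    timer.toc(algo)
if solver.is_master:
    timer.report_total_time()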
To utilize the FEAST backend at run-time, set diag_algo to feast_dense or feast_sparse
depending on the model size. Some other parameters of the FEAST library also need to be set. Taking
tbplas-cpp/samples/speedtest/diag.py as an example:
def test_diag_bands_feast():
    model = tb.make_graphene_diamond()
    model = tb.extend_prim_cell(model, dim=(3, 3, 1))
    if WITH_OVERLAP:
        overlap = make_overlap(model)
        solver = tb.DiagSolver(model, overlap)
    else:
        solver = tb.DiagSolver(model)
    k_points = np.array([
        [0.0, 0.0, 0.0],
        [2./3, 1./3, 0.0],
        [0.5, 0.0, 0.0],
        [0.0, 0.0, 0.0]
    ])
    k_path, k_idx = tb.gen_kpath(k_points, (50, 50, 50))
    solver.config.prefix = "graphene"
    solver.config.k_points = k_path
    # Use FEAST solver to evaluate all the states within [-10, 10] eV.
    solver.config.diag_algo = FEAST_DIAG_ALGO
    solver.config.feast_e_min = -10
    solver.config.feast_e_max = 10
    solver.config.feast_num_states = model.num_orb
    # Not working on node n02, which uses the MKL version of FEAST; MKL's FEAST
    # does not support a negative fpm[0], for reasons that are unclear.
    # solver.config.feast_fpm[0] = -1
    # feast sparse conflicts with reuse_results at some k-point.
    solver.config.feast_reuse_results = False
    timer = tb.Timer()
    timer.tic("bands")
    k_len, bands = solver.calc_bands()
    timer.toc("bands")
    if solver.is_master:
        timer.report_total_time()
        vis = tb.Visualizer()
        vis.plot_bands(k_len, bands, k_idx, ["G", "K", "M", "G"])
The feast_e_min and feast_e_max arguments set the energy range in which to search for eigenstates.
feast_num_states defines the number of eigenstates to search for; in this example we allow all of them.
The feast_reuse_results argument reuses the eigenstates from the last run, which may
speed up the calculation in some cases. The output should look like:
Parallelization details:
MPI processes    : 1
OMP_NUM_THREADS  : n/a
MKL_NUM_THREADS  : n/a
Output details:
Directory  : ./
Prefix     : graphene
Using FEAST dense backend for diagonalization.
        bands :    3.18614
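The main benefit of FEAST is that the search can be restricted to a narrow energy window. As a minimal
sketch (the window of [-1, 1] eV is an illustrative choice, not taken from the sample), the solver from
test_diag_bands_feast could be reconfigured as follows:
# Minimal sketch: selective diagonalization of a narrow window around zero energy.
# The window is illustrative; feast_num_states is an upper bound on the states inside it.
solver.config.diag_algo = "feast_sparse"
solver.config.feast_e_min = -1.0
solver.config.feast_e_max = 1.0
solver.config.feast_num_states = model.num_orb
k_len, bands = solver.calc_bands()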
GPU computation
To use the GPU as the computing device, the WITH_CUDA option must be set to on at compile-time.
Then the algo attribute of the config of TBPMSolver must be set to gpu at run-time.
Taking tbplas-cpp/samples/speedtest/tbpm.py as an example:
import tbplas as tb
# Algorithm of TBPM; should be one of cpu, cpu_fast or gpu.
# QUICK_TEST, BUILD_SAMPLE, WITH_ONSITE, make_graphene, make_graphene_sample
# and save_xy are defined elsewhere in tbpm.py.
ALGO = "cpu"
def test_dos():
    # Model
    t = -2.7
    a = 0.142
    if QUICK_TEST:
        dim = (1024, 1024, 1)
    else:
        dim = (4096, 4096, 1)
    if BUILD_SAMPLE:
        model = make_graphene_sample(t, a, with_onsite=WITH_ONSITE, dim=dim)
    else:
        model = make_graphene(t, a, with_onsite=WITH_ONSITE, dim=dim)
    # Parameters
    solver = tb.TBPMSolver(model)
    solver.config.num_random_samples = 1
    solver.config.rescale = 9.0
    solver.config.ldos = False
    solver.config.ldos_orbital_indices = { 1 }
    solver.config.num_time_steps = 1024
    solver.config.dimension = 2
    solver.config.algo = ALGO
    # Calculation
    timer = tb.Timer()
    timer.tic("corr_dos")
    corr_dos = solver.calc_corr_dos()
    timer.toc("corr_dos")
    # Output
    if solver.is_master:
        timer.report_total_time()
        analyzer = tb.Analyzer(f"{solver.config.prefix}_info.dat")
        eng, dos = analyzer.calc_dos(corr_dos)
        save_xy("dos_py.dat", eng, dos)
Setting the global variable ALGO to gpu enables computation on the GPU. The output should look like:
Parallelization details:
MPI processes    : 1
OMP_NUM_THREADS  : n/a
MKL_NUM_THREADS  : n/a
Output details:
Directory  : ./
Prefix     : sample
Using TBPMGPU backend
Device 0: "NVIDIA GeForce GTX 1050 Ti"
CUDA Driver Version / Runtime Version          12.8 / 12.8
CUDA Capability Major/Minor version number:    6.1
(006) Multiprocessors, (128) CUDA Cores/MP:    768 CUDA Cores
GPU Max Clock rate:                            1418 MHz (1.42 GHz)
Memory Clock rate:                             3504 Mhz
Memory Bus Width:                              128-bit
L2 Cache Size:                                 1048576 bytes
... ...
Calculating DOS correlation function.
Sample 1 of 1
Finished timestep      64 of    1024          12.436 sec.
Finished timestep     128 of    1024          24.934 sec.
... ...
which is similar to the CPU case, with additional information about the GPU device.
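In other words, an existing TBPM calculation only needs its algorithm switched at run-time, provided the
CUDA feature was compiled in. A minimal sketch, reusing the solver configured in test_dos above:
# Minimal sketch: rerun the DOS correlation function on the GPU.
solver.config.algo = "gpu"
corr_dos = solver.calc_corr_dos()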
Reading existing data
There are many occasions where we need to load data from previous calculations, e.g., for post-processing
for different purposes or for plotting data in different styles. TBPLaS offers a complete set of functions
for loading data from disk, defined in tbplas/tbplas/diag/io.py and
tbplas/tbplas/tbpm/io.py for diagonalization and TBPM data files, respectively. The usage
of these functions is demonstrated in tbplas-cpp/samples/speedtest/plot_diag.py and
tbplas-cpp/samples/speedtest/plot_tbpm.py. For example, the plot_dos function in plot_diag.py
calls the load_dos() function to load the DOS:
import matplotlib.pyplot as plt
import tbplas as tb
def plot_dos(prefix: str):
    energy, dos = tb.load_dos(prefix)
    plt.plot(energy, dos)
    plt.grid()
    plt.show()
while the plot_dos function in plot_tbpm.py calls load_corr_dos() to load the correlation
function:
def plot_dos(prefix: str, analyzer: tb.Analyzer, vis: tb.Visualizer) -> None:
    corr = tb.load_corr_dos(prefix)
    energies, dos = analyzer.calc_dos(corr)
    vis.plot_dos(energies, dos)
    save_xy("dos_cpp.dat", energies, dos)
The data files are identified by the prefix argument, which must be consistent with the prefix used in
the diagonalization or TBPM calculations.
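As a further illustration (the styling and output file name are arbitrary choices, not part of the
samples), the DOS of a previous run can be reloaded by its prefix and replotted directly with matplotlib:
import matplotlib.pyplot as plt
import tbplas as tb
# Minimal sketch: reload and replot the DOS of a previous calculation.
# The prefix must match that of the run which produced the DOS data.
energy, dos = tb.load_dos("graphene")
plt.plot(energy, dos, color="r", linewidth=1.0)
plt.xlabel("Energy (eV)")
plt.ylabel("DOS (1/eV)")
plt.grid()
plt.savefig("dos_replot.png", dpi=300)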