Miscellaneous

This section discusses some technical details of TBPLaS, including choosing the backend for Hamiltonian diagonalization, switching between CPU and GPU, and loading existing data files. We will use the terms compile-time and run-time intensively: compile-time refers to the compilation stage described in Install, while run-time refers to the stage where we build the models and run the calculations. To switch between math libraries and between CPU and GPU, the corresponding features must be enabled at compile-time, and some parameters need to be set at run-time. Herein, we briefly describe the technical design; detailed examples of the parameters can be found in tbplas-cpp/samples/speedtest.

Backend for diagonalization

The term backend denotes an abstraction and unified wrapper for the diagonalization algorithms provided by different math library vendors. TBPLaS supports three kinds of diagonalization backends:

  • a backend using the built-in diagonalization solvers of the Eigen library, e.g., SelfAdjointEigenSolver and GeneralizedSelfAdjointEigenSolver, which can call LAPACKE under the hood

  • a backend calling LAPACKE directly, without the Eigen library

  • a backend based on the FEAST library, which supports selective diagonalization within a specified energy range

The Eigen backend is always enabled at compile-time, while the LAPACKE and FEAST backends are disabled by default. To enable the LAPACKE backend, set the DIAG_BACKEND option to a LAPACKE vendor: for example, -DDIAG=openblas enables the LAPACKE backend based on OpenBLAS, while -DDIAG=mkl enables the usage of MKL, etc. The FEAST backend must be enabled by the WITH_FEAST option, e.g., -DWITH_FEAST=on. Since FEAST itself requires LAPACKE, the LAPACKE backend should also be enabled. Maintaining the dependencies and conflicts of the math libraries is a tedious task, but we have managed to resolve these issues; see tbplas-cpp/cmake/TBPLASMathLib.cmake for the details if you are curious.

If the backends have been enabled at compile-time, they can be selected via the diag_algo argument at run-time. Take tbplas-cpp/samples/speedtest/diag.py as an example:

import numpy as np

import tbplas as tb

# Select the diagonalization backend: "eigen" or "lapacke". WITH_OVERLAP,
# CONVENTION and make_overlap are defined elsewhere in the script.
DIAG_ALGO = "eigen"

def test_diag_bands():
    # Build the model and the solver, optionally with an overlap matrix.
    model = tb.make_graphene_diamond()
    if WITH_OVERLAP:
        overlap = make_overlap(model)
        solver = tb.DiagSolver(model, overlap)
    else:
        solver = tb.DiagSolver(model)
    k_points = np.array([
        [0.0, 0.0, 0.0],
        [2./3, 1./3, 0.0],
        [0.5, 0.0, 0.0],
        [0.0, 0.0, 0.0]
    ])
    k_path, k_idx = tb.gen_kpath(k_points, (1000, 1000, 1000))
    solver.config.prefix = "graphene"
    solver.config.k_points = k_path
    solver.config.diag_algo = DIAG_ALGO
    solver.config.convention = CONVENTION

    timer = tb.Timer()
    timer.tic("bands")
    k_len, bands = solver.calc_bands()
    timer.toc("bands")

    # Report the timing and plot the band structure on the master process only.
    if solver.is_master:
        timer.report_total_time()
        vis = tb.Visualizer()
        vis.plot_bands(k_len, bands, k_idx, ["G", "K", "M", "G"])

Setting the global variable DIAG_ALGO to eigen selects the default Eigen backend, while lapacke selects the LAPACKE backend. The backend in use is printed to stdout during the calculation; below are the outputs of an Eigen run followed by a LAPACKE run:

Parallelization details:
MPI processes    : 1
OMP_NUM_THREADS  : n/a
MKL_NUM_THREADS  : n/a

Output details:
Directory  : ./
Prefix     : graphene

Using Eigen backend for diagonalization.
        bands :    0.16382

Parallelization details:
MPI processes    : 1
OMP_NUM_THREADS  : n/a
MKL_NUM_THREADS  : n/a

Output details:
Directory  : ./
Prefix     : graphene

Using LAPACKE backend for diagonalization.
        bands :    0.01992

To use the FEAST backend at run-time, set diag_algo to feast_dense or feast_sparse, depending on the model size. Some other parameters of the FEAST library also need to be set. Again, take tbplas-cpp/samples/speedtest/diag.py as an example:

def test_diag_bands_feast():
    model = tb.make_graphene_diamond()
    model = tb.extend_prim_cell(model, dim=(3, 3, 1))
    if WITH_OVERLAP:
        overlap = make_overlap(model)
        solver = tb.DiagSolver(model, overlap)
    else:
        solver = tb.DiagSolver(model)
    k_points = np.array([
        [0.0, 0.0, 0.0],
        [2./3, 1./3, 0.0],
        [0.5, 0.0, 0.0],
        [0.0, 0.0, 0.0]
    ])
    k_path, k_idx = tb.gen_kpath(k_points, (50, 50, 50))
    solver.config.prefix = "graphene"
    solver.config.k_points = k_path

    # Use the FEAST solver to evaluate all the states within [-10, 10] eV.
    # FEAST_DIAG_ALGO is a global set to "feast_dense" or "feast_sparse".
    solver.config.diag_algo = FEAST_DIAG_ALGO
    solver.config.feast_e_min = -10
    solver.config.feast_e_max = 10
    solver.config.feast_num_states = model.num_orb
    # Setting feast_fpm[0] = -1 is not supported by the MKL implementation of
    # FEAST, so it is left commented out.
    # solver.config.feast_fpm[0] = -1
    # feast_sparse conflicts with reuse_results at some k-points.
    solver.config.feast_reuse_results = False

    timer = tb.Timer()
    timer.tic("bands")
    k_len, bands = solver.calc_bands()
    timer.toc("bands")

    if solver.is_master:
        timer.report_total_time()
        vis = tb.Visualizer()
        vis.plot_bands(k_len, bands, k_idx, ["G", "K", "M", "G"])

The feast_e_min and feast_e_max arguments set the energy range in which to search for eigenstates, and feast_num_states defines the number of eigenstates to search for; in this example we search for all of them. The feast_reuse_results argument reuses the eigenstates from the previous run, which may speed up the calculation in some cases. The output should look like:

Parallelization details:
MPI processes    : 1
OMP_NUM_THREADS  : n/a
MKL_NUM_THREADS  : n/a

Output details:
Directory  : ./
Prefix     : graphene

Using FEAST dense backend for diagonalization.
        bands :    3.18614
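The real benefit of FEAST is selective diagonalization. The following is a minimal sketch based on the example above, where the energy window of [-1, 1] eV and the estimate of 16 states are illustrative assumptions rather than values taken from the sample script:

# Sketch: selective diagonalization with FEAST, reusing the solver built in
# test_diag_bands_feast(). The window and the number of states below are
# illustrative assumptions.
solver.config.diag_algo = "feast_dense"
solver.config.feast_e_min = -1.0      # lower bound of the search window (eV)
solver.config.feast_e_max = 1.0       # upper bound of the search window (eV)
solver.config.feast_num_states = 16   # estimated number of states in the window
solver.config.feast_reuse_results = False
k_len, bands = solver.calc_bands()

Here feast_num_states is an estimate of the number of states inside the narrower window instead of model.num_orb, so that only the relevant part of the spectrum is computed.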

GPU computation

To use the GPU as the computing device, the WITH_CUDA option must be set to on at compile-time. Then the algo attribute of the config of TBPMSolver must be set to gpu at run-time. Take tbplas-cpp/samples/speedtest/tbpm.py as an example:

# Algorithm of TBPM, should be one of "cpu", "cpu_fast" or "gpu"
ALGO = "cpu"

def test_dos():
    # Model (QUICK_TEST, BUILD_SAMPLE, WITH_ONSITE, make_graphene_sample and
    # make_graphene are defined elsewhere in the script)
    t = -2.7
    a = 0.142
    if QUICK_TEST:
        dim = (1024, 1024, 1)
    else:
        dim = (4096, 4096, 1)
    if BUILD_SAMPLE:
        model = make_graphene_sample(t, a, with_onsite=WITH_ONSITE, dim=dim)
    else:
        model = make_graphene(t, a, with_onsite=WITH_ONSITE, dim=dim)

    # Parameters
    solver = tb.TBPMSolver(model)
    solver.config.num_random_samples = 1
    solver.config.rescale = 9.0
    solver.config.ldos = False
    solver.config.ldos_orbital_indices = { 1 }
    solver.config.num_time_steps = 1024
    solver.config.dimension = 2
    solver.config.algo = ALGO

    # Calculation
    timer = tb.Timer()
    timer.tic("corr_dos")
    corr_dos = solver.calc_corr_dos()
    timer.toc("corr_dos")

    # Output
    if solver.is_master:
        timer.report_total_time()
        analyzer = tb.Analyzer(f"{solver.config.prefix}_info.dat")
        eng, dos = analyzer.calc_dos(corr_dos)
        save_xy("dos_py.dat", eng, dos)

Setting the global variable ALGO to gpu enables GPU computation. The output should look like:

Parallelization details:
MPI processes    : 1
OMP_NUM_THREADS  : n/a
MKL_NUM_THREADS  : n/a

Output details:
Directory  : ./
Prefix     : sample

Using TBPMGPU backend

Device 0: "NVIDIA GeForce GTX 1050 Ti"
CUDA Driver Version / Runtime Version          12.8 / 12.8
CUDA Capability Major/Minor version number:    6.1
(006) Multiprocessors, (128) CUDA Cores/MP:    768 CUDA Cores
GPU Max Clock rate:                            1418 MHz (1.42 GHz)
Memory Clock rate:                             3504 Mhz
Memory Bus Width:                              128-bit
L2 Cache Size:                                 1048576 bytes
... ...
Calculating DOS correlation function.
Sample 1 of 1
Finished timestep      64 of    1024          12.436 sec.
Finished timestep     128 of    1024          24.934 sec.
... ...

which is similar to the CPU case, but with additional information about the GPU device.
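Since the algorithm is selected by a single configuration attribute, the CPU and GPU code paths can be compared on the same model simply by rerunning the script with different values of ALGO. The loop below is a minimal sketch assuming the test_dos() function and the module-level ALGO shown above; the gpu entry only works if the code was built with WITH_CUDA enabled.

# Sketch: time the same DOS calculation with each TBPM algorithm in turn.
# Assumes test_dos() and the module-level ALGO from tbpm.py shown above.
for algo in ("cpu", "cpu_fast", "gpu"):
    ALGO = algo  # "gpu" requires WITH_CUDA=on at compile-time
    test_dos()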

Reading existing data

There are many occasions where we need to load data from previous calculations, e.g., post-processing for different purposes or plotting the data in different styles. TBPLaS offers a complete set of functions for loading data from disk, defined in tbplas/tbplas/diag/io.py and tbplas/tbplas/tbpm/io.py for diagonalization and TBPM data files, respectively. The usage of these functions is demonstrated in tbplas-cpp/samples/speedtest/plot_diag.py and tbplas-cpp/samples/speedtest/plot_tbpm.py. For example, the plot_dos function in plot_diag.py calls the load_dos() function to load the DOS:

import matplotlib.pyplot as plt

import tbplas as tb

def plot_dos(prefix: str):
    energy, dos = tb.load_dos(prefix)
    plt.plot(energy, dos)
    plt.grid()
    plt.show()

while the plot_dos function in plot_tbpm.py calls load_corr_dos() to load the correlation functions:

def plot_dos(prefix: str, analyzer: tb.Analyzer, vis: tb.Visualizer) -> None:
    corr = tb.load_corr_dos(prefix)
    energies, dos = analyzer.calc_dos(corr)
    vis.plot_dos(energies, dos)
    save_xy("dos_cpp.dat", energies, dos)

The data files are identified by the prefix argument, which must be consistent with the prefix used in the diagonalization or TBPM calculation.
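For instance, assuming the TBPM run above has finished and written its data files with the prefix sample (as reported in the output), the DOS can be re-evaluated later in a separate post-processing script. The sketch below assumes the usual import of TBPLaS as tb:

import tbplas as tb

# Prefix used by the TBPM run that produced the data files.
prefix = "sample"

# Rebuild the analyzer from the information file written during the run,
# then reload the DOS correlation function and evaluate the DOS.
analyzer = tb.Analyzer(f"{prefix}_info.dat")
corr = tb.load_corr_dos(prefix)
energies, dos = analyzer.calc_dos(corr)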