.. _misc:

Miscellaneous
=============

This section discusses some technical details of TBPLaS, including choosing the backend for
Hamiltonian diagonalization, switching between CPU and GPU, and loading existing data files.
We will use the terms *compile-time* and *run-time* intensively. Here *compile-time* refers to
the compilation stage described in :ref:`install`, while *run-time* means the stage where we
build the models and run the calculations. To switch between math libraries and between CPU and
GPU, the corresponding features must be enabled at compile-time, and some parameters need to be
set at run-time. Herein, we briefly describe the technical design, while detailed examples of
the parameters can be found in ``tbplas-cpp/samples/speedtest``.

Backend for diagonalization
---------------------------

The term *backend* is an abstraction and unified wrapper for the diagonalization algorithms
provided by different math library vendors. TBPLaS supports three kinds of diagonalization
backends:

* a backend using the built-in diagonalization solvers of the ``eigen`` library, e.g.,
  ``SelfAdjointEigenSolver`` and ``GeneralizedSelfAdjointEigenSolver``, which can call LAPACKE
  under the hood
* a backend calling LAPACKE directly without the ``eigen`` library
* a solver based on the FEAST library, which supports selective diagonalization of a specific
  energy range

The ``eigen`` backend is always enabled at compile-time, while the LAPACKE and FEAST backends
are disabled by default. To enable the LAPACKE backend, set the ``DIAG_BACKEND`` option to a
LAPACKE vendor. For example, ``-DDIAG_BACKEND=openblas`` enables the LAPACKE backend based on
OpenBLAS, while ``-DDIAG_BACKEND=mkl`` enables the usage of MKL, etc. The FEAST backend must be
enabled by the ``WITH_FEAST`` option, e.g., ``-DWITH_FEAST=on``. Since FEAST requires LAPACKE
itself, the LAPACKE backend should also be enabled. Maintaining the dependencies and conflicts
of math libraries is a tedious task, yet we managed to resolve the issues. See
``tbplas-cpp/cmake/TBPLASMathLib.cmake`` for more details if you are curious.

If the backends have been enabled at compile-time, they can be selected via the ``diag_algo``
argument at run-time. Taking ``tbplas-cpp/samples/speedtest/diag.py`` as an example:

.. code-block:: python

    DIAG_ALGO = "eigen"


    def test_diag_bands():
        model = tb.make_graphene_diamond()
        if WITH_OVERLAP:
            overlap = make_overlap(model)
            solver = tb.DiagSolver(model, overlap)
        else:
            solver = tb.DiagSolver(model)
        k_points = np.array([
            [0.0, 0.0, 0.0],
            [2./3, 1./3, 0.0],
            [0.5, 0.0, 0.0],
            [0.0, 0.0, 0.0]
        ])
        k_path, k_idx = tb.gen_kpath(k_points, (1000, 1000, 1000))
        solver.config.prefix = "graphene"
        solver.config.k_points = k_path
        solver.config.diag_algo = DIAG_ALGO
        solver.config.convention = CONVENTION
        timer = tb.Timer()
        timer.tic("bands")
        k_len, bands = solver.calc_bands()
        timer.toc("bands")
        if solver.is_master:
            timer.report_total_time()
            vis = tb.Visualizer()
            vis.plot_bands(k_len, bands, k_idx, ["G", "K", "M", "G"])

Setting the global variable ``DIAG_ALGO`` to ``eigen`` selects the default ``eigen`` backend,
while setting it to ``lapacke`` selects the LAPACKE backend. The backend in use is printed to
stdout during the calculation:

.. code-block:: text

    Parallelization details:
        MPI processes   : 1
        OMP_NUM_THREADS : n/a
        MKL_NUM_THREADS : n/a
    Output details:
        Directory : ./
        Prefix    : graphene
    Using Eigen backend for diagonalization.
        bands : 0.16382
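Switching to the LAPACKE backend at run-time only requires changing this argument. Below is a
minimal sketch based on the same sample, assuming the usual ``numpy``/``tbplas`` import aliases
and that the LAPACKE backend was enabled at compile-time:

.. code-block:: python

    import numpy as np
    import tbplas as tb

    # Minimal sketch: the same band-structure calculation as above, but with
    # the LAPACKE backend selected at run-time. This assumes TBPLaS was
    # compiled with the LAPACKE backend enabled.
    model = tb.make_graphene_diamond()
    solver = tb.DiagSolver(model)
    k_points = np.array([
        [0.0, 0.0, 0.0],
        [2./3, 1./3, 0.0],
        [0.5, 0.0, 0.0],
        [0.0, 0.0, 0.0]
    ])
    k_path, k_idx = tb.gen_kpath(k_points, (1000, 1000, 1000))
    solver.config.prefix = "graphene"
    solver.config.k_points = k_path
    solver.config.diag_algo = "lapacke"  # instead of "eigen"
    k_len, bands = solver.calc_bands()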
With the LAPACKE backend selected, the output reads:

.. code-block:: text

    Parallelization details:
        MPI processes   : 1
        OMP_NUM_THREADS : n/a
        MKL_NUM_THREADS : n/a
    Output details:
        Directory : ./
        Prefix    : graphene
    Using LAPACKE backend for diagonalization.
        bands : 0.01992

To utilize the FEAST backend at run-time, set ``diag_algo`` to ``feast_dense`` or
``feast_sparse``, depending on the model size. Some other parameters of the FEAST library also
need to be set. Taking ``tbplas-cpp/samples/speedtest/diag.py`` as an example:

.. code-block:: python

    def test_diag_bands_feast():
        model = tb.make_graphene_diamond()
        model = tb.extend_prim_cell(model, dim=(3, 3, 1))
        if WITH_OVERLAP:
            overlap = make_overlap(model)
            solver = tb.DiagSolver(model, overlap)
        else:
            solver = tb.DiagSolver(model)
        k_points = np.array([
            [0.0, 0.0, 0.0],
            [2./3, 1./3, 0.0],
            [0.5, 0.0, 0.0],
            [0.0, 0.0, 0.0]
        ])
        k_path, k_idx = tb.gen_kpath(k_points, (50, 50, 50))
        solver.config.prefix = "graphene"
        solver.config.k_points = k_path

        # Use the FEAST solver to evaluate all the states within [-10, 10] eV.
        solver.config.diag_algo = FEAST_DIAG_ALGO
        solver.config.feast_e_min = -10
        solver.config.feast_e_max = 10
        solver.config.feast_num_states = model.num_orb

        # Not working on n02 since it calls the MKL version of FEAST, which
        # does not support negative fpm[0]. Don't know why.
        # solver.config.feast_fpm[0] = -1

        # feast_sparse conflicts with reuse_results at some k-points.
        solver.config.feast_reuse_results = False

        timer = tb.Timer()
        timer.tic("bands")
        k_len, bands = solver.calc_bands()
        timer.toc("bands")
        if solver.is_master:
            timer.report_total_time()
            vis = tb.Visualizer()
            vis.plot_bands(k_len, bands, k_idx, ["G", "K", "M", "G"])

The ``feast_e_min`` and ``feast_e_max`` arguments set the energy range in which to search for
eigenstates, and ``feast_num_states`` defines the number of eigenstates to search for; in the
example we request all the eigenstates. The ``feast_reuse_results`` argument reuses the
eigenstates from the last run, which may speed up the calculation in some cases. The output
should look like:

.. code-block:: text

    Parallelization details:
        MPI processes   : 1
        OMP_NUM_THREADS : n/a
        MKL_NUM_THREADS : n/a
    Output details:
        Directory : ./
        Prefix    : graphene
    Using FEAST dense backend for diagonalization.
        bands : 3.18614

GPU computation
---------------

To use the GPU as the computing device, the ``WITH_CUDA`` option must be set to ``on`` at
compile-time. Then the ``algo`` attribute of the ``config`` of :class:`.TBPMSolver` must be set
to ``gpu`` at run-time. Taking ``tbplas-cpp/samples/speedtest/tbpm.py`` as an example:

.. code-block:: python

    # Algorithm of TBPM, should be one of "cpu", "cpu_fast" and "gpu"
    ALGO = "cpu"


    def test_dos():
        # Model
        t = -2.7
        a = 0.142
        if QUICK_TEST:
            dim = (1024, 1024, 1)
        else:
            dim = (4096, 4096, 1)
        if BUILD_SAMPLE:
            model = make_graphene_sample(t, a, with_onsite=WITH_ONSITE, dim=dim)
        else:
            model = make_graphene(t, a, with_onsite=WITH_ONSITE, dim=dim)

        # Parameters
        solver = tb.TBPMSolver(model)
        solver.config.num_random_samples = 1
        solver.config.rescale = 9.0
        solver.config.ldos = False
        solver.config.ldos_orbital_indices = {1}
        solver.config.num_time_steps = 1024
        solver.config.dimension = 2
        solver.config.algo = ALGO

        # Calculation
        timer = tb.Timer()
        timer.tic("corr_dos")
        corr_dos = solver.calc_corr_dos()
        timer.toc("corr_dos")

        # Output
        if solver.is_master:
            timer.report_total_time()
            analyzer = tb.Analyzer(f"{solver.config.prefix}_info.dat")
            eng, dos = analyzer.calc_dos(corr_dos)
            save_xy("dos_py.dat", eng, dos)

Setting the global variable ``ALGO`` to ``gpu`` enables the usage of the GPU.
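At run-time, switching from CPU to GPU amounts to changing this single attribute. A minimal
sketch of the relevant lines, assuming the model is built exactly as in ``test_dos`` above and
that TBPLaS was compiled with ``-DWITH_CUDA=on``:

.. code-block:: python

    # Minimal sketch of the run-time switch to GPU. The model is assumed to be
    # built as in test_dos above, with TBPLaS compiled with -DWITH_CUDA=on.
    solver = tb.TBPMSolver(model)
    solver.config.num_random_samples = 1
    solver.config.rescale = 9.0
    solver.config.num_time_steps = 1024
    solver.config.dimension = 2
    solver.config.algo = "gpu"   # "cpu" or "cpu_fast" would run on the CPU
    corr_dos = solver.calc_corr_dos()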
The output should look like:

.. code-block:: text

    Parallelization details:
        MPI processes   : 1
        OMP_NUM_THREADS : n/a
        MKL_NUM_THREADS : n/a
    Output details:
        Directory : ./
        Prefix    : sample
    Using TBPMGPU backend
    Device 0: "NVIDIA GeForce GTX 1050 Ti"
      CUDA Driver Version / Runtime Version          12.8 / 12.8
      CUDA Capability Major/Minor version number:    6.1
      (006) Multiprocessors, (128) CUDA Cores/MP:    768 CUDA Cores
      GPU Max Clock rate:                            1418 MHz (1.42 GHz)
      Memory Clock rate:                             3504 Mhz
      Memory Bus Width:                              128-bit
      L2 Cache Size:                                 1048576 bytes
    ... ...
    Calculating DOS correlation function.
    Sample 1 of 1
      Finished timestep 64 of 1024 12.436 sec.
      Finished timestep 128 of 1024 24.934 sec.
    ... ...

which is similar to the CPU case, with additional information about the GPU device.

Reading existing data
---------------------

There are many occasions where we need to load data from previous calculations, e.g.,
post-processing for different purposes or plotting data in different styles. TBPLaS offers a
complete set of functions for loading data from disk, defined in ``tbplas/tbplas/diag/io.py``
and ``tbplas/tbplas/tbpm/io.py`` for diagonalization and TBPM data files, respectively. The
usage of these functions can be found in ``tbplas-cpp/samples/speedtest/plot_diag.py`` and
``tbplas-cpp/samples/speedtest/plot_tbpm.py``. For example, the ``plot_dos`` function in
``plot_diag.py`` calls the :func:`.load_dos` function to load the DOS:

.. code-block:: python

    def plot_dos(prefix: str):
        energy, dos = tb.load_dos(prefix)
        plt.plot(energy, dos)
        plt.grid()
        plt.show()

while the ``plot_dos`` function in ``plot_tbpm.py`` calls :func:`.load_corr_dos` to load the
correlation functions:

.. code-block:: python

    def plot_dos(prefix: str, analyzer: tb.Analyzer, vis: tb.Visualizer) -> None:
        corr = tb.load_corr_dos(prefix)
        energies, dos = analyzer.calc_dos(corr)
        vis.plot_dos(energies, dos)
        save_xy("dos_cpp.dat", energies, dos)

The data files are identified by the ``prefix`` argument, which must be consistent with the
prefix used in the diagonalization or TBPM calculations.
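As a quick illustration of replotting data in a different style, the sketch below reloads the
DOS written by a previous diagonalization run and plots it with custom matplotlib settings. The
prefix ``graphene`` is only an example and must match the one used in the original calculation:

.. code-block:: python

    import matplotlib.pyplot as plt
    import tbplas as tb

    # Reload the DOS produced by an earlier diagonalization run. The prefix
    # "graphene" is only an example and must match the original calculation.
    energy, dos = tb.load_dos("graphene")

    # Replot with custom styling and save to file instead of showing on screen.
    plt.plot(energy, dos, color="r", linewidth=1.2)
    plt.xlabel("Energy (eV)")
    plt.ylabel("DOS")
    plt.grid()
    plt.savefig("dos_replot.png", dpi=300)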