.. _misc:

Miscellaneous
=============

This section discusses some technical details of TBPLaS, including choosing the backend for
Hamiltonian diagonalization, switching between CPU and GPU, and loading existing data files.
We will use the terms *compile-time* and *run-time* intensively. Here *compile-time* refers to
the compilation stage described in :ref:`install`, while *run-time* means the stage where we
build the models and run the calculations. To switch between math libraries and between CPU and
GPU, the corresponding features must be enabled at compile-time, and some parameters need to be
set at run-time. Herein, we briefly describe the technical design, while detailed examples of
the parameters can be found in ``tbplas-cpp/samples/speedtest``.

Backend for diagonalization
---------------------------

The term *backend* is an abstraction and unified wrapper for the diagonalization algorithms
provided by different math library vendors. TBPLaS supports three kinds of diagonalization
backends:

* a backend using the built-in diagonalization solvers of the ``eigen`` library, e.g.,
  ``SelfAdjointEigenSolver`` and ``GeneralizedSelfAdjointEigenSolver``, which can call LAPACKE
  under the hood
* a backend calling LAPACKE directly without the ``eigen`` library
* a solver based on the FEAST library, which supports selective diagonalization of a specific
  energy range

The ``eigen`` backend is always enabled at compile-time, while the LAPACKE and FEAST backends
are disabled by default. To enable the LAPACKE backend, set the ``DIAG_BACKEND`` option to a
LAPACKE vendor. For example, ``-DDIAG_BACKEND=openblas`` enables the LAPACKE backend based on
OpenBLAS, while ``-DDIAG_BACKEND=mkl`` enables the usage of MKL, etc. The FEAST backend must be
enabled by the ``WITH_FEAST`` option, e.g., ``-DWITH_FEAST=on``. Since FEAST requires LAPACKE
itself, the LAPACKE backend should also be enabled. Maintaining the dependencies and conflicts
of math libraries is a tedious task, yet we managed to resolve the issues. See
``tbplas-cpp/cmake/TBPLASMathLib.cmake`` for more details if you are curious.

If the backends have been enabled at compile-time, they can be selected via the ``diag_algo``
argument at run-time. Taking ``tbplas-cpp/samples/speedtest/diag.py`` as an example:

.. code-block:: python

    DIAG_ALGO = "eigen"


    def test_diag_bands():
        model = tb.make_graphene_diamond()
        if WITH_OVERLAP:
            overlap = make_overlap(model)
            solver = tb.DiagSolver(model, overlap)
        else:
            solver = tb.DiagSolver(model)
        k_points = np.array([
            [0.0, 0.0, 0.0],
            [2./3, 1./3, 0.0],
            [0.5, 0.0, 0.0],
            [0.0, 0.0, 0.0]
        ])
        k_path, k_idx = tb.gen_kpath(k_points, (1000, 1000, 1000))
        solver.config.prefix = "graphene"
        solver.config.k_points = k_path
        solver.config.diag_algo = DIAG_ALGO
        solver.config.convention = CONVENTION
        timer = tb.Timer()
        timer.tic("bands")
        k_len, bands = solver.calc_bands()
        timer.toc("bands")
        if solver.is_master:
            timer.report_total_time()
            vis = tb.Visualizer()
            vis.plot_bands(k_len, bands, k_idx, ["G", "K", "M", "G"])

Setting the global variable ``DIAG_ALGO`` to ``eigen`` selects the default ``eigen`` backend,
while setting it to ``lapacke`` selects the LAPACKE backend. The backend in use is printed to
stdout during the calculation:

.. code-block:: text

    Parallelization details:
        MPI processes   : 1
        OMP_NUM_THREADS : n/a
        MKL_NUM_THREADS : n/a
    Output details:
        Directory : ./
        Prefix    : graphene
    Using Eigen backend for diagonalization.
        bands : 0.16382
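Switching to the LAPACKE backend at run-time only requires changing this argument. Below is a
minimal sketch based on the same sample, assuming the usual ``numpy``/``tbplas`` import aliases
and that the LAPACKE backend was enabled at compile-time:

.. code-block:: python

    import numpy as np
    import tbplas as tb

    # Minimal sketch: the same band-structure calculation as above, but with
    # the LAPACKE backend selected at run-time. This assumes TBPLaS was
    # compiled with the LAPACKE backend enabled.
    model = tb.make_graphene_diamond()
    solver = tb.DiagSolver(model)
    k_points = np.array([
        [0.0, 0.0, 0.0],
        [2./3, 1./3, 0.0],
        [0.5, 0.0, 0.0],
        [0.0, 0.0, 0.0]
    ])
    k_path, k_idx = tb.gen_kpath(k_points, (1000, 1000, 1000))
    solver.config.prefix = "graphene"
    solver.config.k_points = k_path
    solver.config.diag_algo = "lapacke"  # instead of "eigen"
    k_len, bands = solver.calc_bands()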
With the LAPACKE backend selected, the output reads:

.. code-block:: text

    Parallelization details:
        MPI processes   : 1
        OMP_NUM_THREADS : n/a
        MKL_NUM_THREADS : n/a
    Output details:
        Directory : ./
        Prefix    : graphene
    Using LAPACKE backend for diagonalization.
        bands : 0.01992

To utilize the FEAST backend at run-time, set ``diag_algo`` to ``feast_dense`` or
``feast_sparse``, depending on the model size. Some other parameters of the FEAST library also
need to be set. Taking ``tbplas-cpp/samples/speedtest/diag.py`` as an example:

.. code-block:: python

    def test_diag_bands_feast():
        model = tb.make_graphene_diamond()
        model = tb.extend_prim_cell(model, dim=(3, 3, 1))
        if WITH_OVERLAP:
            overlap = make_overlap(model)
            solver = tb.DiagSolver(model, overlap)
        else:
            solver = tb.DiagSolver(model)
        k_points = np.array([
            [0.0, 0.0, 0.0],
            [2./3, 1./3, 0.0],
            [0.5, 0.0, 0.0],
            [0.0, 0.0, 0.0]
        ])
        k_path, k_idx = tb.gen_kpath(k_points, (50, 50, 50))
        solver.config.prefix = "graphene"
        solver.config.k_points = k_path

        # Use the FEAST solver to evaluate all the states within [-10, 10] eV.
        solver.config.diag_algo = FEAST_DIAG_ALGO
        solver.config.feast_e_min = -10
        solver.config.feast_e_max = 10
        solver.config.feast_num_states = model.num_orb

        # Not working on n02 since it calls the MKL version of FEAST, which
        # does not support negative fpm[0]. Don't know why.
        # solver.config.feast_fpm[0] = -1

        # feast_sparse conflicts with reuse_results at some k-points.
        solver.config.feast_reuse_results = False

        timer = tb.Timer()
        timer.tic("bands")
        k_len, bands = solver.calc_bands()
        timer.toc("bands")
        if solver.is_master:
            timer.report_total_time()
            vis = tb.Visualizer()
            vis.plot_bands(k_len, bands, k_idx, ["G", "K", "M", "G"])

The ``feast_e_min`` and ``feast_e_max`` arguments set the energy range in which to search for
eigenstates, and ``feast_num_states`` defines the number of eigenstates to search for; in the
example we request all the eigenstates. The ``feast_reuse_results`` argument reuses the
eigenstates from the last run, which may speed up the calculation in some cases. The output
should look like:

.. code-block:: text

    Parallelization details:
        MPI processes   : 1
        OMP_NUM_THREADS : n/a
        MKL_NUM_THREADS : n/a
    Output details:
        Directory : ./
        Prefix    : graphene
    Using FEAST dense backend for diagonalization.
        bands : 3.18614

GPU computation
---------------

To use the GPU as the computing device, the ``WITH_CUDA`` option must be set to ``on`` at
compile-time. Then the ``algo`` attribute of the ``config`` of :class:`.TBPMSolver` must be set
to ``gpu`` at run-time. Taking ``tbplas-cpp/samples/speedtest/tbpm.py`` as an example:

.. code-block:: python

    # Algorithm of TBPM, should be one of "cpu", "cpu_fast" and "gpu"
    ALGO = "cpu"


    def test_dos():
        # Model
        t = -2.7
        a = 0.142
        if QUICK_TEST:
            dim = (1024, 1024, 1)
        else:
            dim = (4096, 4096, 1)
        if BUILD_SAMPLE:
            model = make_graphene_sample(t, a, with_onsite=WITH_ONSITE, dim=dim)
        else:
            model = make_graphene(t, a, with_onsite=WITH_ONSITE, dim=dim)

        # Parameters
        solver = tb.TBPMSolver(model)
        solver.config.num_random_samples = 1
        solver.config.rescale = 9.0
        solver.config.ldos = False
        solver.config.ldos_orbital_indices = {1}
        solver.config.num_time_steps = 1024
        solver.config.dimension = 2
        solver.config.algo = ALGO

        # Calculation
        timer = tb.Timer()
        timer.tic("corr_dos")
        corr_dos = solver.calc_corr_dos()
        timer.toc("corr_dos")

        # Output
        if solver.is_master:
            timer.report_total_time()
            analyzer = tb.Analyzer(f"{solver.config.prefix}_info.dat")
            eng, dos = analyzer.calc_dos(corr_dos)
            save_xy("dos_py.dat", eng, dos)

Setting the global variable ``ALGO`` to ``gpu`` enables the usage of the GPU.
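At run-time, switching from CPU to GPU amounts to changing this single attribute. A minimal
sketch of the relevant lines, assuming the model is built exactly as in ``test_dos`` above and
that TBPLaS was compiled with ``-DWITH_CUDA=on``:

.. code-block:: python

    # Minimal sketch of the run-time switch to GPU. The model is assumed to be
    # built as in test_dos above, with TBPLaS compiled with -DWITH_CUDA=on.
    solver = tb.TBPMSolver(model)
    solver.config.num_random_samples = 1
    solver.config.rescale = 9.0
    solver.config.num_time_steps = 1024
    solver.config.dimension = 2
    solver.config.algo = "gpu"   # "cpu" or "cpu_fast" would run on the CPU
    corr_dos = solver.calc_corr_dos()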
The output should look like:

.. code-block:: text

    Parallelization details:
        MPI processes   : 1
        OMP_NUM_THREADS : n/a
        MKL_NUM_THREADS : n/a
    Output details:
        Directory : ./
        Prefix    : sample
    Using TBPMGPU backend
    Device 0: "NVIDIA GeForce GTX 1050 Ti"
      CUDA Driver Version / Runtime Version          12.8 / 12.8
      CUDA Capability Major/Minor version number:    6.1
      (006) Multiprocessors, (128) CUDA Cores/MP:    768 CUDA Cores
      GPU Max Clock rate:                            1418 MHz (1.42 GHz)
      Memory Clock rate:                             3504 Mhz
      Memory Bus Width:                              128-bit
      L2 Cache Size:                                 1048576 bytes
    ... ...
    Calculating DOS correlation function.
    Sample 1 of 1
      Finished timestep 64 of 1024 12.436 sec.
      Finished timestep 128 of 1024 24.934 sec.
    ... ...

which is similar to the CPU case, with additional information about the GPU device.

Reading existing data
---------------------

There are many occasions where we need to load data from previous calculations, e.g.,
post-processing for different purposes or plotting data in different styles. TBPLaS offers a
complete set of functions for loading data from disk, defined in ``tbplas/tbplas/diag/io.py``
and ``tbplas/tbplas/tbpm/io.py`` for diagonalization and TBPM data files, respectively. The
usage of these functions can be found in ``tbplas-cpp/samples/speedtest/plot_diag.py`` and
``tbplas-cpp/samples/speedtest/plot_tbpm.py``. For example, the ``plot_dos`` function in
``plot_diag.py`` calls the :func:`.load_dos` function to load the DOS:

.. code-block:: python

    def plot_dos(prefix: str):
        energy, dos = tb.load_dos(prefix)
        plt.plot(energy, dos)
        plt.grid()
        plt.show()

while the ``plot_dos`` function in ``plot_tbpm.py`` calls :func:`.load_corr_dos` to load the
correlation functions:

.. code-block:: python

    def plot_dos(prefix: str, analyzer: tb.Analyzer, vis: tb.Visualizer) -> None:
        corr = tb.load_corr_dos(prefix)
        energies, dos = analyzer.calc_dos(corr)
        vis.plot_dos(energies, dos)
        save_xy("dos_cpp.dat", energies, dos)

The data files are identified by the ``prefix`` argument, which must be consistent with the
prefix used in the diagonalization or TBPM calculations.
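As a quick illustration of replotting data in a different style, the sketch below reloads the
DOS written by a previous diagonalization run and plots it with custom matplotlib settings. The
prefix ``graphene`` is only an example and must match the one used in the original calculation:

.. code-block:: python

    import matplotlib.pyplot as plt
    import tbplas as tb

    # Reload the DOS produced by an earlier diagonalization run. The prefix
    # "graphene" is only an example and must match the original calculation.
    energy, dos = tb.load_dos("graphene")

    # Replot with custom styling and save to file instead of showing on screen.
    plt.plot(energy, dos, color="r", linewidth=1.2)
    plt.xlabel("Energy (eV)")
    plt.ylabel("DOS")
    plt.grid()
    plt.savefig("dos_replot.png", dpi=300)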