Commit 36da300

Merge pull request #330 from naik-aakash/update_featurizers
Add new functions/methods to featurize module
2 parents 02926ed + a8b66ae commit 36da300

67 files changed, +1782 -348 lines changed

Some content is hidden

.github/workflows/python-package.yml

Lines changed: 2 additions & 2 deletions
@@ -37,7 +37,7 @@ jobs:
 fail-fast: false
 matrix:
 python-version: ["3.10", "3.11", "3.12"]
-split: [1, 2, 3, 4, 5, 6]
+split: [1, 2, 3, 4, 5]


 steps:
@@ -67,7 +67,7 @@ jobs:
 MPLBACKEND: Agg # non-interactive backend for matplotlib
 run: |
 micromamba activate lobpy
-pytest --cov=lobsterpy --cov-report term-missing --cov-append --splits 6 --group ${{ matrix.split }} -vv --durations-path ./tests/test_data/.pytest-split-durations
+pytest --cov=lobsterpy --cov-report term-missing --cov-append --splits 5 --group ${{ matrix.split }} -vv --durations-path ./tests/test_data/.pytest-split-durations

 - name: Upload coverage
 if: matrix.python-version == '3.10'
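The workflow now runs the test suite as five pytest-split groups per Python version (`--splits 5 --group ${{ matrix.split }}`), using the timings stored in `./tests/test_data/.pytest-split-durations`. To show what duration-based splitting aims for, here is a minimal sketch of a greedy longest-first partition. It is a simplified stand-in and not pytest-split's own algorithm. The function `split_by_duration` and the sample timings are hypothetical.

```python
import heapq


def split_by_duration(durations, splits):
    """Assign tests to `splits` groups so that total runtimes stay balanced.

    Greedy longest-first heuristic (a simplified stand-in, not pytest-split's
    own algorithm): each test, slowest first, goes to the group with the
    smallest accumulated runtime so far.
    """
    # Heap entries: (accumulated seconds, group index, test ids in that group)
    heap = [(0.0, group, []) for group in range(splits)]
    heapq.heapify(heap)
    for test, seconds in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        total, group, tests = heapq.heappop(heap)
        tests.append(test)
        heapq.heappush(heap, (total + seconds, group, tests))
    # Groups ordered by index; in this sketch `--group N` corresponds to groups[N - 1]
    return [tests for _, _, tests in sorted(heap, key=lambda entry: entry[1])]


# Hypothetical timings (seconds) for six tests spread over five CI jobs
timings = {"a": 10.0, "b": 8.0, "c": 5.0, "d": 4.0, "e": 3.0, "f": 1.0}
groups = split_by_duration(timings, splits=5)
for index, group in enumerate(groups, start=1):
    print(f"--group {index}: {group}")
```

With one fewer group, each job receives a slightly larger share of the total runtime. In exchange, the workflow starts one fewer job per Python version.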

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-exclude: ^(docs|examples|tests|.github)
+exclude: ^(docs|examples|.github|tests/test_data)

 ci:
 autoupdate_schedule: monthly
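The new top-level `exclude` pattern no longer skips the whole `tests/` tree. The hooks now also run on the test modules, while the fixture files under `tests/test_data` stay excluded. pre-commit matches `exclude` as a Python regular expression against repository-relative paths, so the difference can be checked with `re`. The helper `is_excluded` and the sample paths below are hypothetical.

```python
import re

# Top-level `exclude` patterns before and after this commit
OLD_EXCLUDE = re.compile(r"^(docs|examples|tests|.github)")
NEW_EXCLUDE = re.compile(r"^(docs|examples|.github|tests/test_data)")


def is_excluded(pattern, path):
    """Return True if a repository-relative path is skipped by the given pattern."""
    # The leading ^ anchors the search at the start of the path
    return pattern.search(path) is not None


# Hypothetical paths, used only to illustrate the difference
for path in ["tests/test_analyze.py", "tests/test_data/NaCl_comp_range/CONTCAR.gz", "lobsterpy/cli.py"]:
    print(f"{path}: old={is_excluded(OLD_EXCLUDE, path)}, new={is_excluded(NEW_EXCLUDE, path)}")
```

The `.` in `.github` is unescaped, so it matches any single character followed by `github`. This is harmless for this repository layout.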

README.md

Lines changed: 5 additions & 1 deletion
@@ -87,9 +87,13 @@ Please cite our papers:
 * A. A. Naik, K. Ueltzen, C. Ertural, A. J. Jackson, J. George, *Journal of Open Source Software* **2024**, *9*, 6286. [https://joss.theoj.org/papers/10.21105/joss.06286](https://joss.theoj.org/papers/10.21105/joss.06286).
 * J. George, G. Petretto, A. Naik, M. Esters, A. J. Jackson, R. Nelson, R. Dronskowski, G.-M. Rignanese, G. Hautier, *ChemPlusChem* **2022**, *87*, e202200123. [https://doi.org/10.1002/cplu.202200123](https://doi.org/10.1002/cplu.202200123) (Information on the methodology of the automatic analysis)

+If you use any of the following Featurizers, also cite the respective papers:
+
+* `FeaturizeCharges`: R. Nelson, C. Ertural, P. C. Müller, R. Dronskowski, in Comprehensive Inorganic Chemistry III, *Elsevier*, **2023**, pp. 141–201. [https://doi.org/10.1016/B978-0-12-823144-9.00120-5](https://doi.org/10.1016/B978-0-12-823144-9.00120-5)
+* `FeaturizeIcoxxlist`: V. L. Deringer, W. Zhang, M. Lumeij, S. Maintz, M. Wuttig, R. Mazzarello, R. Dronskowski, *Angewandte Chemie International Edition* **2014**, *53*, 10817–10820. [https://doi.org/10.1002/anie.201404223](https://doi.org/10.1002/anie.201404223)
+
 Please cite [pymatgen](https://github.com/materialsproject/pymatgen), [Lobster](https://schmeling.ac.rwth-aachen.de/cohp/index.php?menuID=1), and [ChemEnv](https://doi.org/10.1107/S2052520620007994) correctly as well.

-.

 ## LobsterPy is now a part of an atomate2 workflow
 ![LobsterWorkflow](https://github.com/JaGeo/LobsterPy/assets/22094846/337615ac-542e-446c-bc63-fb5946b16544)

docs/fundamentals/index.ipynb

Lines changed: 16 additions & 0 deletions
@@ -279,6 +279,22 @@
 "\n",
 "It is possible to compute ionicity based on above mentioned formula using either the Mulliken or Löwdin charges obtained from LOBSTER run."
 ]
+},
+{
+"metadata": {},
+"cell_type": "markdown",
+"source": [
+"### ICOXX based features (ICOXX: ICOHP / ICOBI / ICOOP)\n",
+"\n",
+"The bond-weighted distribution function (BWDF) is an extension of the radial distribution function (RDF) that encodes information about the bonding character. It was introduced in [V. L. Deringer, W. Zhang, M. Lumeij, S. Maintz, M. Wuttig, R. Mazzarello, R. Dronskowski, Angewandte Chemie International Edition 2014, 53, 10817–10820](https://doi.org/10.1002/anie.201404223) and is defined as follows:\n",
+"\n",
+"$\\mathrm{BWDF}=\\sum_{\\mathrm{B}>\\mathrm{A}}\\left[\\delta\\left(r-\\left|\\mathbf{r}_{\\mathrm{AB}}\\right|\\right) \\times B_{\\mathrm{AB}}\\right]$\n",
+"\n",
+"In the formula above, $\\delta$ is the Dirac delta function, $r$ is the distance variable, $\\left|\\mathbf{r}_{\\mathrm{AB}}\\right|$ is the distance between atoms A and B, and $B_{\\mathrm{AB}}$ is the bond strength between atoms A and B. The bond strength can be given by ICOHP, ICOBI, or ICOOP values. The BWDF is thus a histogram of bond strengths as a function of bond length. It can be computed for the entire structure, for each unique atom pair in the structure, per site, or per bond label.\n",
+"\n",
+"From the BWDF, mainly statistical features such as the mean, standard deviation, skewness, kurtosis, weighted mean, and weighted standard deviation are computed.\n"
+],
+"id": "a2f623edafa5efc9"
 }
 ],
 "metadata": {

docs/reference/cli_subcommands/descriptionquality.rst

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 description-quality
 ===================

-Deliver a text description of the LOBSTER calc quality analysis. Mandatory required files: POSCAR, POTCAR or POTCAR symbols, lobsterout, lobsterin. Optional files (BVA comparison): CHARGE.lobster, (DOS comparison): DOSCAR.lobster/ DOSCAR.LSO.lobster, Vasprun.xml.
+Deliver a text description of the LOBSTER calc quality analysis. Mandatory required files: structure file (preferably CONTCAR), POTCAR or POTCAR symbols, lobsterout, lobsterin. Optional files (BVA comparison): CHARGE.lobster, (DOS comparison): DOSCAR.lobster/ DOSCAR.LSO.lobster, Vasprun.xml.

 .. argparse::
 :module: lobsterpy.cli

docs/tutorial/commandlineinterface.rst

Lines changed: 2 additions & 2 deletions
@@ -15,7 +15,7 @@ Creating input files


 With LobsterPy, these intricate details are handled with a single command. We need the standard VASP input files, i.e.
-``INCAR, KPOINTS, POTCAR and POSCAR`` in the calculation directory. Once you have these files, one needs to run the following command:
+``INCAR, KPOINTS, POTCAR and CONTCAR`` in the calculation directory. Once you have these files, run the following command:

 ``lobsterpy create-inputs``

@@ -182,7 +182,7 @@ Following is the json file produced.
 - ``lobsterpy description-quality --potcar-symbols "Na_pv Cl" --bvacomp --doscomp`` command will automatically analyze your lobster calculation quality.

 .. note::
-The LOBSTER calculation directory need to have POTCAR, POSCAR, LOBSTER calculation input and output files to run the **lobsterpy calc-description** command successfully.
+The LOBSTER calculation directory needs to have POTCAR, a structure file (preferably CONTCAR), and the LOBSTER calculation input and output files to run the **lobsterpy description-quality** command successfully.
 If POTCAR is not available then you need to supply **--potcar-symbols** along with the command. Other optional files are vasprun.xml if **--doscomp** is switched on.

 .. code:: bash

docs/tutorial/tutorial.ipynb

Lines changed: 80 additions & 13 deletions
@@ -51,6 +51,9 @@
 "outputs": [],
 "source": [
 "from pathlib import Path\n",
+"\n",
+"from pexpect.replwrap import python\n",
+"\n",
 "from lobsterpy.cohp.analyze import Analysis\n",
 "from lobsterpy.cohp.describe import Description\n",
 "import warnings\n",
@@ -92,7 +95,7 @@
 "source": [
 "# Initialize Analysis object\n",
 "analyse = Analysis(\n",
-" path_to_poscar=directory / \"POSCAR.gz\",\n",
+" path_to_poscar=directory / \"CONTCAR.gz\",\n",
 " path_to_icohplist=directory / \"ICOHPLIST.lobster.gz\",\n",
 " path_to_cohpcar=directory / \"COHPCAR.lobster.gz\",\n",
 " path_to_charge=directory / \"CHARGE.lobster.gz\",\n",
@@ -195,7 +198,7 @@
 "\n",
 "```python\n",
 "analyse = Analysis(\n",
-" path_to_poscar=directory / \"POSCAR.gz\",\n",
+" path_to_poscar=directory / \"CONTCAR.gz\",\n",
 " path_to_icohplist=directory / \"ICOBILIST.lobster.gz\",\n",
 " path_to_cohpcar=directory / \"COBICAR.lobster.gz\",\n",
 " path_to_charge=directory / \"CHARGE.lobster.gz\",\n",
@@ -232,7 +235,7 @@
 "outputs": [],
 "source": [
 "analyse = Analysis(\n",
-" path_to_poscar=directory / \"POSCAR.gz\",\n",
+" path_to_poscar=directory / \"CONTCAR.gz\",\n",
 " path_to_icohplist=directory / \"ICOHPLIST.lobster.gz\",\n",
 " path_to_cohpcar=directory / \"COHPCAR.lobster.gz\",\n",
 " path_to_charge=directory / \"CHARGE.lobster.gz\",\n",
@@ -358,7 +361,7 @@
 "source": [
 "# Get calculation quality summary dict\n",
 "calc_quality_K3Sb = Analysis.get_lobster_calc_quality_summary(\n",
-" path_to_poscar=directory / \"POSCAR.gz\",\n",
+" path_to_poscar=directory / \"CONTCAR.gz\",\n",
 " path_to_charge=directory / \"CHARGE.lobster.gz\",\n",
 " path_to_lobsterin=directory / \"lobsterin.gz\",\n",
 " path_to_lobsterout=directory / \"lobsterout.gz\",\n",
@@ -557,7 +560,7 @@
 "# Load Lobster DOS\n",
 "directory = Path(\".\") / \"..\" / \"..\" / \"tests\" / \"test_data\" / \"NaCl_comp_range\"\n",
 "dos = Doscar(doscar=directory / 'DOSCAR.lobster.gz',\n",
-" structure_file=directory / 'POSCAR.gz')"
+" structure_file=directory / 'CONTCAR.gz')"
 ]
 },
 {
@@ -569,7 +572,7 @@
 "# Load Lobster DOS (Change this cell block type to Code when executing locally)\n",
 "directory = Path(\"LobsterPy\") / \"tests\" / \"test_data\" / \"NaCl_comp_range\"\n",
 "dos = Doscar(doscar=directory / 'DOSCAR.lobster.gz',\n",
-" structure_file=directory / 'POSCAR.gz')\n",
+" structure_file=directory / 'CONTCAR.gz')\n",
 "```"
 ]
 },
@@ -585,9 +588,7 @@
 "cell_type": "code",
 "execution_count": null,
 "id": "a271f2f0",
-"metadata": {
-"scrolled": false
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "style.use('default') # Complete reset the matplotlib figure style\n",
@@ -658,7 +659,7 @@
 "outputs": [],
 "source": [
 "graph_NaCl_all = LobsterGraph(\n",
-" path_to_poscar=directory / \"POSCAR.gz\",\n",
+" path_to_poscar=directory / \"CONTCAR.gz\",\n",
 " path_to_charge=directory / \"CHARGE.lobster.gz\",\n",
 " path_to_cohpcar=directory / \"COHPCAR.lobster.gz\",\n",
 " path_to_icohplist=directory / \"ICOHPLIST.lobster.gz\",\n",
@@ -679,7 +680,7 @@
 "```python\n",
 "#### (Change this cell block type to Code or copy it from here when executing locally)\n",
 "graph_NaCl_all = LobsterGraph(\n",
-" path_to_poscar=directory / \"POSCAR.gz\",\n",
+" path_to_poscar=directory / \"CONTCAR.gz\",\n",
 " path_to_charge=directory / \"CHARGE.lobster.gz\",\n",
 " path_to_cohpcar=directory / \"COHPCAR.lobster.gz\",\n",
 " path_to_icohplist=directory / \"ICOHPLIST.lobster.gz\",\n",
@@ -746,7 +747,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from lobsterpy.featurize.batch import (BatchCoxxFingerprint, BatchDosFeaturizer,\n",
+"from lobsterpy.featurize.batch import (BatchCoxxFingerprint, BatchIcoxxlistFeaturizer, BatchDosFeaturizer,\n",
 " BatchSummaryFeaturizer, BatchStructureGraphs)"
 ]
 },
@@ -834,6 +835,72 @@
 "fp_cohp_bonding.get_similarity_matrix_df()"
 ]
 },
+{
+"metadata": {},
+"cell_type": "markdown",
+"source": "### BatchIcoxxlistFeaturizer",
+"id": "a5220531c2082185"
+},
+{
+"metadata": {},
+"cell_type": "markdown",
+"source": [
+"`BatchIcoxxlistFeaturizer` provides a convenient way to extract BWDF-based features from LOBSTER calculation directories. The extracted features consist of the following:\n",
+"\n",
+"1. BWDF mean, standard deviation, skewness, kurtosis, weighted mean, and weighted standard deviation\n",
+"2. Complete BWDF as columns in the dataframe\n",
+"3. BWDF values sorted by bond distances as columns in the dataframe\n",
+"4. Bond distances sorted by BWDF values as columns in the dataframe"
+],
+"id": "9de31478633c9afc"
+},
+{
+"metadata": {},
+"cell_type": "code",
+"outputs": [],
+"execution_count": null,
+"source": [
+"# Initialize the batch ICOXXLIST featurizer\n",
+"batch_icohp = BatchIcoxxlistFeaturizer(path_to_lobster_calcs=directory / \"..\" / \"Featurizer_test_data\" / \"Lobster_calcs\", # path to parent lobster calcs\n",
+" normalization=\"formula_units\", # will normalize the BWDF values by formula units\n",
+" max_length=6, # maximum bond length for BWDF computation\n",
+" bin_width=0.1, # bin width; together with max_length sets the number of bins\n",
+" bwdf_df_type=\"stats\", # Type of BWDF dataframe to generate (stats, binned, sorted_bwdf, sorted_dists)\n",
+" read_icobis=False, # set to True to read ICOBI data\n",
+" read_icoops=False, # set to True to read ICOOP data\n",
+" n_jobs=3,)"
+],
+"id": "2ed6793640da4fac"
+},
+{
+"metadata": {},
+"cell_type": "markdown",
+"source": [
+"```python\n",
+"## Initialize batch ICOXXLIST featurizer (Change this cell block type to Code and remove formatting when executing locally)\n",
+"batch_icohp = BatchIcoxxlistFeaturizer(path_to_lobster_calcs=directory / \"..\" / \"Featurizer_test_data\" / \"Lobster_calcs\", # path to parent lobster calcs\n",
+" normalization=\"formula_units\", # will normalize the BWDF values by formula units\n",
+" max_length=6, # maximum bond length for BWDF computation\n",
+" bin_width=0.1, # bin width; together with max_length sets the number of bins\n",
+" bwdf_df_type=\"stats\", # Type of BWDF dataframe to generate (stats, binned, sorted_bwdf, sorted_dists)\n",
+" read_icobis=False, # set to True to read ICOBI data\n",
+" read_icoops=False, # set to True to read ICOOP data\n",
+" n_jobs=3,)\n",
+"```"
+],
+"id": "b3111a3881b29058"
+},
+{
+"metadata": {},
+"cell_type": "code",
+"outputs": [],
+"execution_count": null,
+"source": [
+"# get the BWDF stats df\n",
+"batch_icohp.get_df()"
+],
+"id": "7c731fe3e32065be"
+},
 {
 "cell_type": "markdown",
 "id": "4bf34582",
@@ -1067,7 +1134,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.12.4"
+"version": "3.10.14"
 },
 "widgets": {
 "application/vnd.jupyter.widget-state+json": {

examples/example_script_NaCl.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@

 # Setup analysis dict
 analyse = Analysis(
-path_to_poscar=os.path.join(directory, "POSCAR"),
+path_to_poscar=os.path.join(directory, "CONTCAR"),
 path_to_icohplist=os.path.join(directory, "ICOHPLIST.lobster"),
 path_to_cohpcar=os.path.join(directory, "COHPCAR.lobster"),
 path_to_charge=os.path.join(directory, "CHARGE.lobster"),
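This commit adds two pieces of BWDF documentation. The definition goes into `docs/fundamentals/index.ipynb`, and the list of feature types produced by `BatchIcoxxlistFeaturizer` goes into the tutorial. Both can be illustrated with a small NumPy sketch. The sketch replaces the Dirac delta with histogram bins, using `max_length` and `bin_width` as in the tutorial cell. It then computes a distance mean and standard deviation weighted by |BWDF| and builds the two sorted feature vectors. This is not LobsterPy's implementation. The function names, the sample pairs, the |BWDF| weighting, the dropping of empty bins, and the ascending sort order are all assumptions. Check the featurizer's actual `bwdf_df_type` output for its exact conventions.

```python
import numpy as np


def bwdf_histogram(distances, strengths, max_length=6.0, bin_width=0.1):
    """Discretized BWDF: sum the bond strengths B_AB that fall into each distance bin."""
    distances = np.asarray(distances, dtype=float)
    strengths = np.asarray(strengths, dtype=float)
    n_bins = int(round(max_length / bin_width))  # e.g. 6.0 / 0.1 -> 60 bins
    edges = np.linspace(0.0, max_length, n_bins + 1)
    # The Dirac delta becomes a bin: each pair adds B_AB to the bin holding |r_AB|
    bwdf, edges = np.histogram(distances, bins=edges, weights=strengths)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, bwdf


def weighted_distance_stats(centers, bwdf):
    """Mean and std of bond distance, weighted by |BWDF| (an assumed weighting)."""
    weights = np.abs(bwdf)
    mean = np.average(centers, weights=weights)
    std = np.sqrt(np.average((centers - mean) ** 2, weights=weights))
    return mean, std


def sorted_bwdf_features(centers, bwdf):
    """(BWDF values ordered by distance, distances ordered by BWDF value), non-empty bins only."""
    occupied = bwdf != 0.0  # drop bins that contain no bond (assumption)
    dists, values = centers[occupied], bwdf[occupied]
    sorted_bwdf = values[np.argsort(dists)]  # shortest bond first
    sorted_dists = dists[np.argsort(values)]  # most negative (most bonding ICOHP) first
    return sorted_bwdf, sorted_dists


# Hypothetical unique atom pairs (B > A): distances in Å, ICOHP values in eV
distances = [2.82, 2.82, 3.99, 4.01]
icohps = [-0.57, -0.57, -0.02, -0.30]

centers, bwdf = bwdf_histogram(distances, icohps)
mean, std = weighted_distance_stats(centers, bwdf)
sorted_bwdf, sorted_dists = sorted_bwdf_features(centers, bwdf)
print(f"{len(bwdf)} bins, weighted mean distance {mean:.3f} Å, std {std:.3f} Å")
print("sorted_bwdf:", sorted_bwdf)
print("sorted_dists:", sorted_dists)
```

Bonding ICOHPs are negative, so sorting in ascending order puts the most bonding distance ranges first. For ICOBIs, which are positive for bonding, this sketch would need the order reversed.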
