ChemMap is a Python library that tries to bridge the gap from metabolomics to proteomics using existing databases.
| Table of Contents |
|---|
| ChemMap in a Nutshell |
| How to Download |
| How to Use |
A sketch of ChemMap's main method is shown in the following diagram.
![]() |
|---|
| Schema showing the workflow of ChemMap |
ChemMap's main function, map_smiles_to_proteins, accepts a single SMILES string or a list of them.
In the first phase, it tries to retrieve the PubChem and ChEBI identifiers of each molecule through the
PUG REST API. If search_method is set to "expand_all" or "expand_pubchem", ChemMap also looks for molecules
that are structurally similar through the PUG REST fastsimilarity_2d endpoint, which uses Tanimoto similarity scores.
Note that ChEBI identifiers are only retrieved at this stage if they are reported on PubChem, which might not
be the case for newly reported ChEBI substances.
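As a rough illustration of this first phase, the sketch below builds the two PUG REST request URLs and computes a Tanimoto score on toy fingerprints. It is not ChemMap's internal code: the helper names are hypothetical, the threshold value is an arbitrary choice, and the SMILES is passed as a URL query argument so that characters such as `=` and `(` are percent-encoded.

```python
from urllib.parse import urlencode

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(smiles: str) -> str:
    """URL that asks PubChem for the CIDs matching a SMILES string."""
    query = urlencode({"smiles": smiles})
    return f"{PUG_REST}/compound/smiles/cids/JSON?{query}"


def similarity_url(smiles: str, threshold: int = 90) -> str:
    """URL for a 2D similarity search; threshold is the minimum Tanimoto score in percent."""
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be between 0 and 100")
    query = urlencode({"smiles": smiles, "Threshold": threshold})
    return f"{PUG_REST}/compound/fastsimilarity_2d/smiles/cids/JSON?{query}"


def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / len(union)


if __name__ == "__main__":
    aspirin = "CC(=O)OC1=CC=CC=C1C(=O)O"
    print(cid_lookup_url(aspirin))
    print(similarity_url(aspirin, threshold=95))
    # Two shared bits out of four distinct bits in total.
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
```

The URLs can be fetched with any HTTP client. PubChem computes the similarity on its own fingerprints, so the `tanimoto` helper only shows what the score means.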
In the second phase, if search_method was set to "expand_all" or "expand_chebi", the workflow uses
libChEBIpy to find substances that are related to the ones already found through one of the following
relationships: is_conjugate_base_of, is_conjugate_acid_of, is_a, is_tautomer_of or is_enantiomer_of.
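The relationship filter of this second phase can be sketched as follows. To stay runnable offline, the example does not call libChEBIpy. It works on plain (source, relation, target) triples, and the identifiers are toy placeholders rather than real ChEBI entries. The function name and the choice to follow only outgoing relations for a single hop are assumptions, not a description of ChemMap's implementation.

```python
ALLOWED_RELATIONS = frozenset({
    "is_conjugate_base_of",
    "is_conjugate_acid_of",
    "is_a",
    "is_tautomer_of",
    "is_enantiomer_of",
})


def expand_by_relations(seed_ids, relations, allowed=ALLOWED_RELATIONS):
    """Return the entities reachable from the seeds in one hop through an allowed relation.

    relations is an iterable of (source_id, relation_type, target_id) triples.
    """
    seeds = set(seed_ids)
    related = set()
    for source, relation_type, target in relations:
        # Keep only outgoing relations of a seed whose type is in the allowed set.
        if source in seeds and relation_type in allowed:
            related.add(target)
    return related - seeds


if __name__ == "__main__":
    # Toy placeholder triples, not real ChEBI data.
    toy_relations = [
        ("toy:acid", "is_conjugate_acid_of", "toy:base"),
        ("toy:acid", "is_a", "toy:parent_class"),
        ("toy:acid", "has_role", "toy:some_role"),   # relation type not in the allowed set
        ("toy:other", "is_a", "toy:parent_class"),   # source is not a seed
    ]
    print(sorted(expand_by_relations(["toy:acid"], toy_relations)))
```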
In the last step, the ChEBI identifiers are used to check whether each compound appears as a substrate in a Rhea reaction. If it does, the EC number and UniProt protein identifier are retrieved, if available. Behind the scenes, this relies on the UniProt SPARQL endpoint and on the fact that Rhea and UniProt are synchronized on every UniProt release (more here).
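To give an idea of what such a lookup can look like, the sketch below builds a federated SPARQL query that links a ChEBI identifier to Rhea reactions, EC numbers and UniProt entries. This is an assumption about the general shape of the query, not the one ChemMap actually sends. The function name is hypothetical, the predicates follow the public Rhea and UniProt RDF vocabularies as far as known, and the pattern matches any reaction participant rather than substrates only.

```python
import re

# Public UniProt SPARQL endpoint the query would be sent to.
UNIPROT_SPARQL_ENDPOINT = "https://sparql.uniprot.org/sparql"


def build_rhea_uniprot_query(chebi_id: str) -> str:
    """Build a SPARQL query for reactions, EC numbers and proteins linked to a ChEBI ID."""
    match = re.fullmatch(r"(?:CHEBI:)?(\d+)", chebi_id.strip())
    if match is None:
        raise ValueError(f"not a ChEBI identifier: {chebi_id!r}")
    number = match.group(1)
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rh: <http://rdf.rhea-db.org/>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX CHEBI: <http://purl.obolibrary.org/obo/CHEBI_>
SELECT DISTINCT ?rhea ?ec ?protein
WHERE {{
  SERVICE <https://sparql.rhea-db.org/sparql> {{
    ?rhea rdfs:subClassOf rh:Reaction .
    ?rhea rh:side/rh:contains/rh:compound/rh:chebi CHEBI:{number} .
    OPTIONAL {{ ?rhea rh:ec ?ec . }}
  }}
  OPTIONAL {{
    ?protein up:annotation/up:catalyticActivity/up:catalyzedReaction ?rhea .
  }}
}}"""


if __name__ == "__main__":
    print(build_rhea_uniprot_query("CHEBI:15365"))
```

The resulting string can be submitted to the endpoint with any SPARQL client. The EC number and the protein are wrapped in OPTIONAL so that reactions without them are still returned, which mirrors the "if available" wording above.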
The output of this process is three dataframes containing, respectively, compound data (from the first and
second phases), reaction data (from the last step) and reaction data of similar structures. If the to_tsv parameter
is passed to the method, the data is then saved in a folder named after the current date and time, down to the
second.
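The timestamped export can be pictured with the minimal sketch below. The folder name format, the file names and the helper names are assumptions made for illustration, and the standard library csv module stands in for the dataframes so that the example has no dependencies.

```python
import csv
import io
from datetime import datetime
from pathlib import Path


def run_folder_name(now: datetime) -> str:
    """Folder name with a resolution of one second (format chosen for this sketch)."""
    # Dashes instead of colons keep the name valid on Windows as well.
    return now.strftime("%Y-%m-%d_%H-%M-%S")


def rows_to_tsv(rows, columns) -> str:
    """Serialize a list of dicts as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def save_tables(base_dir, tables, now=None) -> Path:
    """Write each table to <base_dir>/<timestamp>/<name>.tsv and return the folder.

    tables maps a file stem to a (rows, columns) pair.
    """
    folder = Path(base_dir) / run_folder_name(now or datetime.now())
    folder.mkdir(parents=True, exist_ok=True)
    for name, (rows, columns) in tables.items():
        (folder / f"{name}.tsv").write_text(rows_to_tsv(rows, columns), encoding="utf-8")
    return folder
```

With pandas dataframes, the serialization step would typically be a call to `DataFrame.to_csv` with `sep="\t"`.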
This library can be installed with pip
pip install chemmap
or by cloning the repository with git
git clone [email protected]:anguera5/ChemMap.git
and installing the dependencies with poetry.
cd ChemMap
poetry env activate
A minimal use case would look as follows. We are interested in knowing all the chemical identifiers of Aspirin and
their reactions. A quick Google search will show us that the SMILES for Aspirin is CC(=O)OC1=CC=CC=C1C(=O)O.
from ChemMap.chem_map import ChemMap
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"
search_method = "expand_all"
cm = ChemMap()
cm.map_smiles_to_proteins(smiles, search_method=search_method)