ChemMap

ChemMap is a Python library that tries to bridge the gap from metabolomics to proteomics using existing databases.

Table of Contents
ChemMap in a Nutshell
How to Download
How to Use

ChemMap in a Nutshell

The following diagram sketches the main method of ChemMap.

app_schema.png
Schema showing the workflow of ChemMap

The main entry point of ChemMap, the function map_smiles_to_proteins, accepts a SMILES string or a list of them. In the first phase, it tries to retrieve the PubChem and ChEBI identifiers of each molecule through the PUG REST API. If search_method is set to "expand_all" or "expand_pubchem", ChemMap also looks for structurally similar molecules through the PUG REST fastsimilarity_2d endpoint, which uses Tanimoto similarity scores. Note that ChEBI identifiers are obtained at this stage only if they are reported in PubChem, which might not be the case for newly reported ChEBI substances.
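The two PUG REST lookups described above can be sketched as plain URL construction. This is a minimal illustration and not ChemMap's internal code: the helper names cid_lookup_url, similarity_url and tanimoto are hypothetical, the default threshold of 90 is an assumption, and no request is sent. The tanimoto helper shows how the score is computed on toy fingerprints represented as sets of on-bits.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(smiles: str) -> str:
    """URL asking PubChem for the CIDs that match a SMILES string (hypothetical helper)."""
    # Percent-encode the SMILES so characters such as '=' or '(' are URL-safe.
    return f"{PUG_REST}/compound/smiles/{quote(smiles, safe='')}/cids/JSON"


def similarity_url(smiles: str, threshold: int = 90) -> str:
    """URL for a 2D similarity search; threshold is a Tanimoto percentage (assumed default)."""
    encoded = quote(smiles, safe="")
    return (
        f"{PUG_REST}/compound/fastsimilarity_2d/smiles/{encoded}"
        f"/cids/JSON?Threshold={threshold}"
    )


def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto score: shared on-bits divided by the on-bits present in either fingerprint."""
    union = bits_a | bits_b
    return len(bits_a & bits_b) / len(union) if union else 0.0


aspirin = "CC(=O)OC1=CC=CC=C1C(=O)O"
print(cid_lookup_url(aspirin))
print(similarity_url(aspirin))
print(tanimoto({1, 2, 3}, {2, 3, 4}))
```

Two fingerprints that share two of four distinct on-bits score 0.5, so a threshold of 90 only keeps close structural analogues.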

In the second phase, if search_method is set to "expand_all" or "expand_chebi", the workflow uses libChEBIpy to find substances related to those already found through one of the following relationships: is_conjugate_base_of, is_conjugate_acid_of, is_a, is_tautomer_of or is_enantiomer_of.
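The relationship filter in this phase can be illustrated on in-memory data. The sketch below does not call libChEBIpy: it assumes the relations have already been loaded as (source, type, target) tuples, follows only one outgoing hop, and uses made-up placeholder identifiers. The function name expand_related is hypothetical.

```python
# Relationship types named in the description of the second phase.
ALLOWED_RELATIONS = frozenset({
    "is_conjugate_base_of",
    "is_conjugate_acid_of",
    "is_a",
    "is_tautomer_of",
    "is_enantiomer_of",
})


def expand_related(seed_ids, relations, allowed=ALLOWED_RELATIONS):
    """Return the IDs reachable from any seed in one hop through an allowed relation.

    `relations` is an iterable of (source_id, relation_type, target_id) tuples.
    """
    seeds = set(seed_ids)
    related = set()
    for source, rel_type, target in relations:
        # Keep only outgoing edges of seed compounds with an accepted type.
        if source in seeds and rel_type in allowed:
            related.add(target)
    return sorted(related - seeds)


# Placeholder identifiers, not real ChEBI entries.
toy_relations = [
    ("CHEBI:seed", "is_conjugate_acid_of", "CHEBI:base"),
    ("CHEBI:seed", "has_role", "CHEBI:role"),      # type not in the allowed set
    ("CHEBI:other", "is_a", "CHEBI:parent"),       # source is not a seed
]
print(expand_related(["CHEBI:seed"], toy_relations))
```

Only the conjugate-base entry is returned here, because the other two edges either use a relationship type outside the list or do not start from a seed compound.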

In the last step, the ChEBI identifiers are used to check whether each compound appears as a substrate in a Rhea reaction. If it does, the EC number and UniProt protein identifier are retrieved, if available. Behind the scenes, this relies on the UniProt SPARQL endpoint and on the fact that Rhea and UniProt are synchronized on every UniProt release (more here).
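A query of this kind might look like the sketch below, which only builds the SPARQL text and sends nothing. It is not necessarily the query ChemMap issues: the function name build_rhea_query is hypothetical, the use of a federated SERVICE clause against the Rhea endpoint is an assumption, and the pattern matches any reaction participant rather than substrates only.

```python
UNIPROT_SPARQL = "https://sparql.uniprot.org/sparql"  # endpoint the query would be sent to

QUERY_TEMPLATE = """\
PREFIX rh: <http://rdf.rhea-db.org/>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX CHEBI: <http://purl.obolibrary.org/obo/CHEBI_>

SELECT DISTINCT ?rhea ?ec ?protein
WHERE {{
  SERVICE <https://sparql.rhea-db.org/sparql> {{
    ?rhea rdfs:subClassOf rh:Reaction ;
          rh:side/rh:contains/rh:compound/rh:chebi CHEBI:{chebi_number} .
    OPTIONAL {{ ?rhea rh:ec ?ec . }}
  }}
  OPTIONAL {{
    ?protein up:annotation/up:catalyticActivity/up:catalyzedReaction ?rhea .
  }}
}}
"""


def build_rhea_query(chebi_id: str) -> str:
    """Return SPARQL text for one ChEBI ID, given as 'CHEBI:<number>' or '<number>'."""
    number = chebi_id.strip().removeprefix("CHEBI:")
    # Accept digits only, so arbitrary text cannot be injected into the query.
    if not number.isdigit():
        raise ValueError(f"not a ChEBI identifier: {chebi_id!r}")
    return QUERY_TEMPLATE.format(chebi_number=number)
```

The OPTIONAL blocks mirror the "if available" wording above: a reaction is still returned when it has no EC number or no annotated UniProt protein.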

The output of this process is three dataframes containing, respectively, compound data (from the first and second phases), reaction data (from the last step) and reaction data for similar structures. If the to_tsv parameter is passed to the method, the data is also saved in a folder named after the current date and time, down to the second.
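The timestamped export can be pictured with the standard library alone. In this sketch, lists of dictionaries stand in for the dataframes, and the function name save_tables, the folder-name format and the file names are all assumptions rather than ChemMap's actual behavior.

```python
import csv
from datetime import datetime
from pathlib import Path


def save_tables(tables, base_dir=".", now=None):
    """Write each table to <base_dir>/<timestamp>/<name>.tsv and return the folder path."""
    # Second-level resolution; the exact format string is an assumption.
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H-%M-%S")
    out_dir = Path(base_dir) / stamp
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, rows in tables.items():
        if not rows:
            continue  # nothing to write for an empty table
        with open(out_dir / f"{name}.tsv", "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(rows[0]), delimiter="\t")
            writer.writeheader()
            writer.writerows(rows)
    return out_dir
```

Because the folder name is derived from the clock, two runs within the same second would write to the same folder.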

How to Download

This library can be downloaded through pip

pip install chemmap

or by direct clone using git

git clone git@github.com:anguera5/ChemMap.git

and installing the dependencies with poetry.

cd ChemMap
poetry install
poetry env activate

How to Use

A minimal use case looks as follows. Suppose we want to know all the chemical identifiers, and their reactions, for Aspirin. A quick Google search shows that the SMILES for Aspirin is CC(=O)OC1=CC=CC=C1C(=O)O.

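from ChemMap.chem_map import ChemMap

# SMILES string for Aspirin
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"
# "expand_all" enables both the PubChem similarity search and the ChEBI relationship expansion
search_method = "expand_all"
cm = ChemMap()
cm.map_smiles_to_proteins(smiles, search_method=search_method)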
