MDverse · pierrepo · Jan 27, 2026 · Jan 26, 2026 · Jan 26, 2026 · Jan 26, 2026
diff --git a/.gitignore b/.gitignore
@@ -27,6 +27,3 @@ __pycache__/
 
 # MAC tmp files
 .DS_Store
-
-Test/*
-!Test/Github_version
diff --git a/docs/atlas.md b/docs/atlas.md
@@ -1,63 +1,62 @@
-# ATLAS.
+# ATLAS
 
 ATLAS (Atlas of proTein moLecular dynAmicS) is an open-access data repository that gathers standardized molecular dynamics simulations of protein structures, accompanied by their analysis in the form of interactive diagrams and trajectory visualisation. All raw trajectories as well as the results of analysis are available for download.
 
-- web site: https://www.dsimb.inserm.fr/ATLAS/
-- documentation: https://www.dsimb.inserm.fr/ATLAS/api/redoc
-- API: https://www.dsimb.inserm.fr/ATLAS/api/
+- web site: <https://www.dsimb.inserm.fr/ATLAS/>
+- publication: [ATLAS: protein flexibility description from atomistic molecular dynamics simulations](https://academic.oup.com/nar/article/52/D1/D384/7438909), Nucleic Acids Research, 2024.
 
-No account / token is needed to access ATLAS API.
+## API
 
----
+- Base URL: <https://www.dsimb.inserm.fr/ATLAS/api/>
+- [documentation](https://www.dsimb.inserm.fr/ATLAS/api/redoc)
 
-## Finding molecular dynamics datasets and files
+No account or token is needed to access the ATLAS API.
 
 ### Datasets
 
 In ATLAS, each dataset corresponds to a molecular dynamics simulation of a **protein chain** and is uniquely identified by a **PDB ID and chain identifier** (`pdb_chain`).
 
-The list of all available datasets can be obtained from the ATLAS HTML index:
-
-https://www.dsimb.inserm.fr/ATLAS/
-
-This page is used as the **discovery layer** to extract all available PDB chain identifiers.
+The list of all available datasets can be obtained from the ATLAS index page: <https://www.dsimb.inserm.fr/ATLAS/>
 
----
+All datasets (PDB chains) are extracted from this page with a regular expression.
 
-### API entrypoint to search for entries
+### Metadata for a given dataset
 
 API endpoint to retrieve metadata for a given dataset:
 
-- Path: `/ATLAS/metadata/{pdb_chain}`
-- documentation: https://www.dsimb.inserm.fr/ATLAS/api/redoc
+- Endpoint: `/ATLAS/metadata/{pdb_chain}`
+- HTTP method: GET
+- documentation: <https://www.dsimb.inserm.fr/ATLAS/api/redoc>
 
-This endpoint returns structured JSON metadata describing the protein and its molecular dynamics simulation.
+This endpoint returns structured JSON metadata describing the simulated protein.
 
----
+Example with dataset id `1k5n_A`:
 
-### Files
+- [web page](https://www.dsimb.inserm.fr/ATLAS/database/ATLAS/1k5n_A/1k5n_A.html)
+- [API view](https://www.dsimb.inserm.fr/ATLAS/api/ATLAS/metadata/1k5n_A)
 
-Files associated with a given dataset are hosted in a public directory.
-
-- Base path: `/database/ATLAS/{pdb_chain}/`
+Remarks:
 
-These directories contain structure files (PDB, CIF), molecular dynamics trajectories, and precomputed analysis results.
+- The title of the dataset is the protein name.
+- No comment or description is provided. We use the organism as the description.
 
----
+### Metadata for files
 
-## Examples
-
-### 1k5n_A
+Files associated with a given dataset are hosted in a public directory.
 
-- entry id: `1k5n_A`
-- entry on ATLAS GUI: https://www.dsimb.inserm.fr/ATLAS/database/ATLAS/1k5n_A/1k5n_A.html
-- entry on ATLAS API: https://www.dsimb.inserm.fr/ATLAS/api/ATLAS/metadata/1k5n_A
+For each dataset, three zip files are provided. They are accessible through the web page of each individual dataset: <https://www.dsimb.inserm.fr/ATLAS/database/ATLAS/{pdb_chain}/{pdb_chain}.html>
 
-### Description (called "Comment") :
+Zip file URLs follow these patterns:
 
-HLA class I histocompatibility antigen, B alpha chain
+- Analysis & MDs (1,000 frames, only protein): <https://www.dsimb.inserm.fr/ATLAS/database/ATLAS/{pdb_chain}/{pdb_chain}_analysis.zip>
+- MDs (10,000 frames, only protein): <https://www.dsimb.inserm.fr/ATLAS/database/ATLAS/{pdb_chain}/{pdb_chain}_protein.zip>
+- MDs (10,000 frames, total system): <https://www.dsimb.inserm.fr/ATLAS/database/ATLAS/{pdb_chain}/{pdb_chain}_total.zip>
 
-### Files
+Example with dataset id `1k5n_A`:
 
-- files on ATLAS GUI: https://www.dsimb.inserm.fr/ATLAS/database/ATLAS/1k5n_A/1k5n_A.html
+- [web page](https://www.dsimb.inserm.fr/ATLAS/database/ATLAS/1k5n_A/1k5n_A.html)
+- [1k5n_A_analysis.zip](https://www.dsimb.inserm.fr/ATLAS/database/ATLAS/1k5n_A/1k5n_A_analysis.zip)
+- [1k5n_A_protein.zip](https://www.dsimb.inserm.fr/ATLAS/database/ATLAS/1k5n_A/1k5n_A_protein.zip)
+- [1k5n_A_total.zip](https://www.dsimb.inserm.fr/ATLAS/database/ATLAS/1k5n_A/1k5n_A_total.zip)
 
+We parse the HTML content of the dataset page and use regular expressions to extract URLs, file names, and file sizes.
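Note on the ATLAS URL patterns documented above: the following is a minimal Python sketch of how dataset ids could be pulled from the index page and expanded into the metadata and zip file URLs. It is not the code from `mdverse_scrapers.scrapers.atlas`. The function names, the `ATLAS_BASE_URL` constant, and above all the regular expression are assumptions made for illustration; the pattern assumes that a dataset id is a 4-character PDB ID, an underscore, and a chain identifier, as in `1k5n_A`.

```python
import re

# Assumption: common prefix of all ATLAS URLs listed in docs/atlas.md.
ATLAS_BASE_URL = "https://www.dsimb.inserm.fr/ATLAS"

# Assumption: a dataset id looks like "1k5n_A" (PDB ID, underscore, chain id).
# The real scraper may use a different, stricter pattern.
PDB_CHAIN_PATTERN = re.compile(r"\b([0-9][A-Za-z0-9]{3}_[A-Za-z0-9]+)\b")


def extract_pdb_chains(html: str) -> list[str]:
    """Return the sorted, de-duplicated dataset ids found in an HTML page."""
    return sorted(set(PDB_CHAIN_PATTERN.findall(html)))


def build_dataset_urls(pdb_chain: str) -> dict[str, str]:
    """Expand a dataset id into the URLs documented in docs/atlas.md."""
    files_root = f"{ATLAS_BASE_URL}/database/ATLAS/{pdb_chain}"
    return {
        "web_page": f"{files_root}/{pdb_chain}.html",
        "metadata": f"{ATLAS_BASE_URL}/api/ATLAS/metadata/{pdb_chain}",
        "analysis": f"{files_root}/{pdb_chain}_analysis.zip",
        "protein": f"{files_root}/{pdb_chain}_protein.zip",
        "total": f"{files_root}/{pdb_chain}_total.zip",
    }


if __name__ == "__main__":
    # Hand-written HTML fragment standing in for the index page.
    sample = '<a href="database/ATLAS/1k5n_A/1k5n_A.html">1k5n_A</a>'
    for pdb_chain in extract_pdb_chains(sample):
        for kind, url in build_dataset_urls(pdb_chain).items():
            print(kind, url)
```

Keeping the URL construction separate from the HTML parsing means the patterns can be checked without any network access.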
diff --git a/docs/zenodo.md b/docs/zenodo.md
@@ -10,8 +10,7 @@ So we don't expect much files to have an individual size above 50 GB.
 
 ## API
 
-### Documentation
-
+- Base URL: <https://zenodo.org/>
 - [REST API](https://developers.zenodo.org/)
 - List of [HTTP status codes](https://developers.zenodo.org/#http-status-codes)
 
@@ -21,10 +20,6 @@ Zenodo requires a token to access its API with higher rate limits. See "[Authent
 
 Example of direct API link for a given dataset: <https://zenodo.org/api/records/8183728>
 
-### Base ULR
-
-<https://zenodo.org/>
-
 ### Query
 
 [Search guide](https://help.zenodo.org/guides/search/)

diff --git a/pyproject.toml b/pyproject.toml
@@ -3,6 +3,27 @@ name = "mdverse-scrapers"
 version = "0.1.0"
 description = "MDverse scrapers"
 readme = "README.md"
+license = "BSD-3-Clause"
+authors = [
+    { name = "Pierre Poulain", email = "pierre.poulain@cupnet.net" },
+    { name = "Essmay Touami", email = "essmay.touami@etu.u-paris.fr" },
+    { name = "Salahudin Sheikh", email = "sheikh@ibpc.fr" }
+]
+maintainers = [
+    { name = "Pierre Poulain", email = "pierre.poulain@cupnet.net" }
+]
+classifiers = [
+    "Development Status :: 4 - Beta",
+    "License :: OSI Approved :: BSD License",
+    "Operating System :: OS Independent",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.13",
+    "Programming Language :: Python :: 3.14",
+    "Intended Audience :: Science/Research",
+    "Topic :: Database",
+    "Topic :: Scientific/Engineering :: Bio-Informatics",
+    "Topic :: Scientific/Engineering :: Chemistry",
+]
 requires-python = ">=3.12"
 dependencies = [
     "beautifulsoup4>=4.13.3",
@@ -50,3 +71,4 @@ build-backend = "uv_build"
 scrape-zenodo = "mdverse_scrapers.scrapers.zenodo:main"
 scrape-figshare = "mdverse_scrapers.scrapers.figshare:main"
 scrape-nomad = "mdverse_scrapers.scrapers.nomad:main"
+scrape-atlas = "mdverse_scrapers.scrapers.atlas:main"
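
Note on the new `scrape-atlas` script entry: a value such as `mdverse_scrapers.scrapers.atlas:main` follows the `module:attribute` form used for console scripts. The sketch below shows roughly how such a string resolves to a callable. The helper name `resolve_entry_point` is hypothetical, and the launcher that an installer actually generates may differ in detail.

```python
from importlib import import_module


def resolve_entry_point(spec: str):
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = import_module(module_name)
    # Walk dotted attributes after the colon, if there are any.
    if attr_path:
        for name in attr_path.split("."):
            obj = getattr(obj, name)
    return obj


if __name__ == "__main__":
    # Standard-library target, so the sketch runs without this project installed.
    dumps = resolve_entry_point("json:dumps")
    print(dumps({"scraper": "atlas"}))
```

With the project installed, `resolve_entry_point("mdverse_scrapers.scrapers.atlas:main")` would be expected to return the function that the `scrape-atlas` command runs.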