The following metadata fields can be extracted from a README.md file.
Unlike other metadata file formats (POM, Cargo, Cabal, ...), README documents do not follow a formal specification. They are free-form text files, usually written in Markdown or reStructuredText, and their structure varies widely across projects. SOMEF therefore applies heuristics to identify common sections (e.g., Title, Description, Installation, Usage, License), badges and links, and extracts metadata from them. In the table below, "headers with X" means that SOMEF looks for sections whose header contains one of the listed keywords.

| Software metadata category | SOMEF metadata JSON path | README.md source (heuristic) |
|--------------------------------|----------------------------------------|----------------------------------------|
| acknowledgement | acknowledgement[i].result.value | Headers with acknowledgement |
| citation | citation[i].result.value | Headers with citation, reference, cite; BibTeX entries are extracted **(1)** |
| contact | contact[i].result.value | Headers with contact |
| contributing_guidelines | contributing_guidelines[i].result.value | Headers with contributing |
| contributors | contributors[i].result.value | Headers with contributor |
| description | description[i].result.value | Headers with description, introduction, basics, initiation, overview |
| documentation | documentation[i].result.value | GitHub or GitLab URL of the documentation folder **(2)**; headers with documentation; Read the Docs links for a project with the same name as the repository; Read the Docs links in badges; wiki links in badges and text |
| download | download[i].result.value | Headers with download |
| executable_example | executable_example[i].result.value | Binder links in badges **(3)** |
| faq | faq[i].result.value | Headers with faq, errors, problems |
| full_title | full_title[i].result.value | Full project title, taken from the title header **(4)** |
| homepage | homepage[i].result.value | Homepage links in badges **(5)** |
| identifier | identifier[i].result.value | DOIs from badges, either taken directly or resolved to the latest version DOI via Zenodo **(6)**; Software Heritage identifiers **(7)** |
| images | images[i].result.value | Images in the README other than the logo |
| installation | installation[i].result.value | Headers with installation, install, setup, prepare, preparation, manual, guide |
| license | license[i].result.value | Headers with license |
| logo | logo[i].result.value | Logo images found in badges and text **(8)** |
| package_distribution | package_distribution[i].result.value | PyPI package links or latest PyPI version in badges **(9)** |
| related_documentation | related_documentation[i].result.value | Read the Docs links for a project whose name differs from the repository name |
| run | run[i].result.value | Headers with run, execute |
| related_papers | related_papers[i].result.value | arXiv references anywhere in the text **(10)** |
| readme_url | readme_url[i].result.value | Raw URL of the README on raw.githubusercontent.com **(11)** |
| repository_status | repository_status[i].result.value | Project status badges **(12)** |
| requirements | requirements[i].result.value | Headers with requirement, prerequisite, dependency, dependent |
| support | support[i].result.value | Headers with support, help, report |
| support_channels | support_channels[i].result.value | Gitter, Reddit and Discord links in badges and text **(13)** |
| usage | usage[i].result.value | Headers with usage, example, implement, implementation, demo, tutorial, start, started |
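
Most rows above rely on keyword matching against section headers. The sketch below illustrates that idea under simplifying assumptions: only ATX-style Markdown headers (`#`, `##`, ...) are recognized, and `SECTION_KEYWORDS` holds a small hand-picked subset of the keywords from the table. `SECTION_KEYWORDS` and `classify_sections` are hypothetical names, not part of SOMEF's API, and SOMEF's own heuristics are more elaborate.

```python
from __future__ import annotations

import re

# Hypothetical subset of the header keywords listed in the table above.
SECTION_KEYWORDS = {
    "installation": ["installation", "install", "setup"],
    "usage": ["usage", "example", "tutorial"],
    "license": ["license"],
    "citation": ["citation", "reference", "cite"],
}

# An ATX header line: 1-6 '#' characters, whitespace, then the header text.
HEADER_RE = re.compile(r"^#{1,6}\s+(.*)$")


def classify_sections(markdown: str) -> dict[str, list[str]]:
    """Group section bodies by the category whose keywords appear in their header."""
    results: dict[str, list[str]] = {}
    categories: list[str] = []  # categories matched by the current header
    body: list[str] = []

    def flush() -> None:
        # Store the body collected so far under every matched category.
        text = "\n".join(body).strip()
        for category in categories:
            results.setdefault(category, []).append(text)

    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            header = match.group(1).lower()
            categories = [cat for cat, words in SECTION_KEYWORDS.items()
                          if any(word in header for word in words)]
            body = []
        else:
            body.append(line)
    flush()
    return results
```

A plain substring test is deliberately loose ("install" also matches "Installing"), which mirrors the broad keyword lists in the table. A fuller implementation would also have to skip `#` lines inside fenced code blocks, which this sketch treats as headers.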

------

**(1)**
- Example:
```bib
@inproceedings{garijo2017widoco,
  title={WIDOCO: a wizard for documenting ontologies},
  author={Garijo, Daniel},
  booktitle={International Semantic Web Conference},
  pages={94--102},
  year={2017},
  organization={Springer, Cham},
  doi = {10.1007/978-3-319-68204-4_9},
  funding = {USNSF ICER-1541029, NIH 1R01GM117097-01},
  url={http://dgarijo.com/papers/widoco-iswc2017.pdf}
}
```
- Result:
```json
{
  "result": {
    "value": "@inproceedings{garijo2017widoco,\n url = {http://dgarijo.com/papers/widoco-iswc2017.pdf},\n funding = {USNSF ICER-1541029, NIH 1R01GM117097-01},\n doi = {10.1007/978-3-319-68204-4_9},\n organization = {Springer, Cham},\n year = {2017},\n pages = {94--102},\n booktitle = {International Semantic Web Conference},\n author = {Garijo, Daniel},\n title = {WIDOCO: a wizard for documenting ontologies},\n}",
    "type": "Text_excerpt",
    "format": "bibtex",
    "doi": "10.1007/978-3-319-68204-4_9",
    "title": "WIDOCO: a wizard for documenting ontologies",
    "author": "Garijo, Daniel",
    "url": "http://dgarijo.com/papers/widoco-iswc2017.pdf"
  }
}
```
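
The Result above exposes the raw entry plus a few parsed fields. A minimal sketch of how such entries could be located in free text, assuming brace-balanced entries and field values without nested braces. `extract_bibtex` is a hypothetical helper; unlike SOMEF's output, which re-serializes the entry, it keeps the raw text as `value`.

```python
from __future__ import annotations

import re

# Start of a BibTeX entry such as "@inproceedings{" or "@article {".
ENTRY_START_RE = re.compile(r"@\w+\s*\{")
# "field = {value}"; this sketch ignores values containing nested braces.
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


def extract_bibtex(text: str) -> list[dict[str, str]]:
    """Return one dict per brace-balanced BibTeX entry found in free text."""
    entries = []
    for start in ENTRY_START_RE.finditer(text):
        depth = 0
        for i in range(start.start(), len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    raw = text[start.start():i + 1]
                    fields = {k.lower(): v.strip() for k, v in FIELD_RE.findall(raw)}
                    entry = {"type": "Text_excerpt", "format": "bibtex", "value": raw}
                    # Copy the fields that the Result above exposes separately.
                    for key in ("doi", "title", "author", "url"):
                        if key in fields:
                            entry[key] = fields[key]
                    entries.append(entry)
                    break
    return entries
```

Counting braces rather than using a single regex lets the sketch find where an entry ends even though every field value is itself wrapped in braces.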

**(2)**
- URL template (GitHub):
```python
f"https://github.com/{owner}/{repo_name}/tree/{urllib.parse.quote(repo_default_branch)}/{docs_path}"
```
- URL template (GitLab):
```python
f"https://{domain_gitlab}/{owner}/{repo_name}/-/tree/{urllib.parse.quote(repo_default_branch)}/{docs_path}"
```
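
The two templates can be wrapped in a small, runnable function. This is a sketch: `documentation_url` and its `host` switch are hypothetical, and the example values in the usage are made up.

```python
from urllib.parse import quote


def documentation_url(host: str, owner: str, repo_name: str,
                      repo_default_branch: str, docs_path: str,
                      domain_gitlab: str = "gitlab.com") -> str:
    """Build a browsable URL for a documentation folder (hypothetical helper)."""
    branch = quote(repo_default_branch)  # percent-encode e.g. spaces in the branch name
    if host == "github":
        return f"https://github.com/{owner}/{repo_name}/tree/{branch}/{docs_path}"
    if host == "gitlab":
        # GitLab instances can be self-hosted, hence the configurable domain.
        return f"https://{domain_gitlab}/{owner}/{repo_name}/-/tree/{branch}/{docs_path}"
    raise ValueError(f"unsupported host: {host}")


print(documentation_url("github", "dgarijo", "Widoco", "master", "docs"))
```

`urllib.parse.quote` percent-encodes characters such as spaces in the branch name but leaves `/` untouched by default, so a branch like `release/1.0` keeps its path structure.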

**(3)**
- Example: `[](https://mybinder.org/v2/gh/user/repo/HEAD)`
- Result: `"value": "https://mybinder.org/v2/gh/user/repo/HEAD"`
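
A sketch of how Binder launch links could be picked out of Markdown. It assumes they appear as link targets `](...)` and that Binder launch URLs start with `https://mybinder.org/v2/`. A badge usually wraps an image that is also served from mybinder.org, so the path check filters that image out. `binder_links` is a hypothetical helper.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

# The "(target)" part of a Markdown link or image: "](https://...)".
LINK_TARGET_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def binder_links(markdown: str) -> list[str]:
    """Return Binder launch links (mybinder.org/v2/...) found in Markdown links."""
    links = []
    for url in LINK_TARGET_RE.findall(markdown):
        parsed = urlparse(url)
        # Skip badge images (e.g. /badge_logo.svg); keep only launch URLs.
        if parsed.netloc == "mybinder.org" and parsed.path.startswith("/v2/"):
            links.append(url)
    return links
```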

**(4)**
- Example: `# WIzard for DOCumenting Ontologies (WIDOCO)`
- Result:
```
"full_title": [
  {
    "result": {
      "type": "String",
      "value": "WIzard for DOCumenting Ontologies (WIDOCO)"
    },
    "confidence": 1,
    "technique": "regular_expression",
    "source": "https://raw.githubusercontent.com/dgarijo/Widoco/master/README.md"
  }
]
```
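
The `technique` field above reports `regular_expression`. A regex along these lines could capture the first level-1 ATX header. It is an assumption about the pattern, not SOMEF's actual expression, and Setext-style titles (text underlined with `===`) are not handled. `full_title` is a hypothetical name.

```python
from __future__ import annotations

import re

# First "# Title" line; "##" and deeper levels do not match because a space
# must follow the single '#'. An optional closing "##" sequence is dropped.
TITLE_RE = re.compile(r"^#[ \t]+(.+?)(?:[ \t]+#+)?[ \t]*$", re.MULTILINE)


def full_title(markdown: str) -> str | None:
    """Return the text of the first level-1 header, or None if there is none."""
    match = TITLE_RE.search(markdown)
    return match.group(1) if match else None
```

Requiring whitespace before a closing `#` sequence keeps titles such as `C#` intact.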

**(5)**
- Example: `[](https://myproject.org)`
- Result: `"value": "https://myproject.org"`

**(6)**
- Example: `[](https://doi.org/10.5281/zenodo.11093793)`
- Result: `"value": "https://doi.org/10.5281/zenodo.11093793"`
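
A sketch for collecting DOI resolver links like the one above. It assumes links use `doi.org` or `dx.doi.org` and modern `10.<registrant>/<suffix>` DOIs, and it normalizes them to `https://doi.org/`. It does not query Zenodo, so the "latest version DOI" step from the table is out of scope. `doi_links` is a hypothetical helper.

```python
from __future__ import annotations

import re

# A DOI resolver link; the captured DOI is "10.<registrant>/<suffix>".
DOI_LINK_RE = re.compile(r"https?://(?:dx\.)?doi\.org/(10\.\d{4,9}/[^\s)\]]+)")


def doi_links(markdown: str) -> list[str]:
    """Return DOI links normalized to https://doi.org/, without duplicates."""
    dois = dict.fromkeys(DOI_LINK_RE.findall(markdown))  # dedupe, keep order
    return [f"https://doi.org/{doi}" for doi in dois]
```

A Zenodo badge image (hosted on zenodo.org) does not match, because the pattern requires the resolver host itself.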

**(7)**
- Example: `[](https://archive.softwareheritage.org/swh:1:dir:40d462bbecefc3a9c3e810567d1f0d7606e0fae7;origin=...)`
- Result: `"value": "https://archive.softwareheritage.org/swh:1:dir:40d462bbecefc3a9c3e810567d1f0d7606e0fae7"`
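
The Result keeps the core identifier and drops the `;origin=...` qualifiers. A sketch of that normalization, assuming version 1 SWHIDs (`swh:1:<type>:<40 hex digits>`) with the object types `cnt`, `dir`, `rev`, `rel` and `snp`. `swh_identifiers` is a hypothetical helper. Duplicates are removed because a badge image URL can repeat the same SWHID.

```python
from __future__ import annotations

import re

# Core SWHID only; qualifiers such as ";origin=..." follow it and are not captured.
SWHID_RE = re.compile(r"swh:1:(?:cnt|dir|rev|rel|snp):[0-9a-f]{40}")
ARCHIVE = "https://archive.softwareheritage.org/"


def swh_identifiers(markdown: str) -> list[str]:
    """Return archive URLs for each distinct core SWHID in the text."""
    swhids = dict.fromkeys(SWHID_RE.findall(markdown))  # dedupe, keep order
    return [ARCHIVE + swhid for swhid in swhids]
```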

**(8)**
- Example: ``
- Result: `"value": "https://raw.githubusercontent.com/dgarijo/Widoco/master/src/main/resources/logo/logo2.png"`

**(9)**
- Example: `[](https://badge.fury.io/py/somef)`
- Result: `"value": "https://pypi.org/project/somef"`
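
The Result rewrites the badge link into a PyPI project page. A sketch of that mapping, assuming badges served by badge.fury.io (`/py/<name>`) or shields.io (`/pypi/v/<name>`). The set of recognized badge hosts and the `pypi_projects` helper are assumptions for illustration.

```python
from __future__ import annotations

import re

# Hypothetical list of PyPI badge URL shapes; the package name is captured.
BADGE_PATTERNS = [
    re.compile(r"https?://badge\.fury\.io/py/([A-Za-z0-9._-]+)"),
    re.compile(r"https?://img\.shields\.io/pypi/v/([A-Za-z0-9._-]+)"),
]


def pypi_projects(markdown: str) -> list[str]:
    """Map PyPI badge URLs to https://pypi.org/project/<name> links."""
    names = []
    for pattern in BADGE_PATTERNS:
        for name in pattern.findall(markdown):
            if name.endswith(".svg"):  # badge image URL rather than link target
                name = name[: -len(".svg")]
            names.append(name)
    return [f"https://pypi.org/project/{n}" for n in dict.fromkeys(names)]
```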

**(10)**
- Example:
```
[Yulun Zhang](http://yulunzhang.com/), [Yapeng Tian](http://yapengtian.org/), [Yu Kong](http://www1.ece.neu.edu/~yukong/), [Bineng Zhong](https://scholar.google.de/citations?user=hvRBydsAAAAJ&hl=en), and [Yun Fu](http://www1.ece.neu.edu/~yunfu/), "Residual Dense Network for Image Super-Resolution", CVPR 2018 (spotlight), [[arXiv]](https://arxiv.org/abs/1802.08797)
```
- Result: `"value": "https://arxiv.org/abs/1802.08797"`
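
A sketch for finding arXiv references anywhere in the text. It assumes new-style identifiers (`YYMM.NNNN` or `YYMM.NNNNN`, optionally followed by a version such as `v2`) behind `/abs/` or `/pdf/`. Normalizing every match to the versionless `/abs/` form is a design choice of this sketch, and old-style identifiers such as `hep-th/9901001` are not matched. `arxiv_links` is a hypothetical helper.

```python
from __future__ import annotations

import re

# New-style arXiv identifier behind /abs/ or /pdf/, with an optional version.
ARXIV_RE = re.compile(r"https?://(?:www\.)?arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?")


def arxiv_links(text: str) -> list[str]:
    """Return distinct arXiv abstract URLs (version suffix dropped)."""
    ids = dict.fromkeys(ARXIV_RE.findall(text))  # dedupe, keep order
    return [f"https://arxiv.org/abs/{arxiv_id}" for arxiv_id in ids]
```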

**(11)**
- URL template:
```python
f"https://raw.githubusercontent.com/{owner}/{repo_name}/{repo_ref}/{urllib.parse.quote(partial)}"
```
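
The template above can be turned into a runnable function. In this sketch `readme_raw_url` is hypothetical, `repo_ref` is assumed to be a branch, tag or commit, and `partial` is assumed to be the README path inside the repository.

```python
from urllib.parse import quote


def readme_raw_url(owner: str, repo_name: str, repo_ref: str, partial: str) -> str:
    """Build the raw.githubusercontent.com URL of a file (hypothetical helper)."""
    # quote() keeps "/" by default, so nested paths such as docs/README.md survive.
    return f"https://raw.githubusercontent.com/{owner}/{repo_name}/{repo_ref}/{quote(partial)}"


print(readme_raw_url("dgarijo", "Widoco", "master", "README.md"))
```

With these arguments the function yields the same `source` URL that appears in the full_title Result of **(4)**.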

**(12)**
- Example:
```
[](https://www.repostatus.org/#active)
```
- Result:
```
"value": "https://www.repostatus.org/#active",
"description": "Active \u2013 The project has reached a stable, usable state and is being actively developed."
```
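
A sketch of how the status URL and its description could be produced, assuming the badge links to `https://www.repostatus.org/#<status>`. Only the `active` description shown above is filled in. The `REPOSTATUS_DESCRIPTIONS` table and the `repository_status` helper are hypothetical, and other statuses are returned without a description.

```python
from __future__ import annotations

import re

# Hypothetical, partial lookup: only the "active" text appears in this document.
REPOSTATUS_DESCRIPTIONS = {
    "active": "Active \u2013 The project has reached a stable, usable state "
              "and is being actively developed.",
}

# Link target such as https://www.repostatus.org/#active
REPOSTATUS_RE = re.compile(r"https?://www\.repostatus\.org/#([a-z]+)")


def repository_status(markdown: str) -> list[dict[str, str]]:
    """Return one {"value", optional "description"} dict per status link."""
    results = []
    for status in REPOSTATUS_RE.findall(markdown):
        entry = {"value": f"https://www.repostatus.org/#{status}"}
        if status in REPOSTATUS_DESCRIPTIONS:
            entry["description"] = REPOSTATUS_DESCRIPTIONS[status]
        results.append(entry)
    return results
```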

**(13)**
- Example:
```
[](https://gitter.im/myproject/community)
[Reddit](https://www.reddit.com/r/myproject)
[Discord](https://discord.com/invite/xyz789)
```
- Result:
```
"value": "https://gitter.im/myproject/community"
...
"value": "https://www.reddit.com/r/myproject"
...
"value": "https://discord.com/invite/xyz789"
```
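
A sketch that classifies links by host. It assumes the three services are reached through `gitter.im`, `reddit.com`, and `discord.com` or `discord.gg` (Discord's invite short domain). Badge images hosted on other subdomains (for example a `badges.` subdomain) do not match, because only exact hosts, optionally prefixed by `www.`, are accepted. `support_channels` and `CHANNEL_HOSTS` are hypothetical names.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

# Bare URLs, stopping at whitespace or Markdown/HTML delimiters.
URL_RE = re.compile(r"https?://[^\s)\]>]+")

# Hypothetical host-to-channel mapping for the three services named above.
CHANNEL_HOSTS = {
    "gitter.im": "gitter",
    "reddit.com": "reddit",
    "discord.com": "discord",
    "discord.gg": "discord",
}


def support_channels(text: str) -> list[tuple[str, str]]:
    """Return (channel, url) pairs for distinct support-channel links."""
    found = []
    for url in dict.fromkeys(URL_RE.findall(text)):  # dedupe, keep order
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[len("www."):]
        channel = CHANNEL_HOSTS.get(host)
        if channel:
            found.append((channel, url))
    return found
```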