Rewrite of generate-environment-identifiers-dict.sh #1028

@molangning

Describe the feature request:
The current script uses Majestic's domain list, which may be missing many domains that appear in the other lists under "Domain list" below. It also opens a new, unverified SSL session for every domain in the list, which is both insecure and inefficient.

Recommendations for additional lists would be appreciated, as the ones below may be outdated.

Additional context:
You can use this command to query the crt.sh SQL server directly:
psql -h crt.sh -p 5432 -U guest certwatch
https://groups.google.com/g/crtsh/c/sUmV0mBz8bQ/m/K-6Vymd_AAAJ
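
As a rough sketch of how the script could use this interface instead of opening one SSL session per domain, the functions below build one query per domain and send them all through a single psql connection. The view `certificate_and_identities`, the `identities()` function and the column names are assumptions based on the SQL that crt.sh displays for its own searches; they are not a documented, stable API and may change. The function names `build_crtsh_query` and `query_crtsh` are hypothetical.

```shell
#!/usr/bin/env bash

# Print a SQL query listing certificate names under one domain.
# ASSUMPTION: certificate_and_identities / identities() / NAME_VALUE follow the
# queries crt.sh shows for its own searches; verify against the live schema.
build_crtsh_query() {
    local domain=$1
    # Accept hostname characters only, so the value is safe to inline in SQL.
    if [[ ! $domain =~ ^[A-Za-z0-9.-]+$ ]]; then
        echo "invalid domain: $domain" >&2
        return 1
    fi
    cat <<SQL
SELECT DISTINCT lower(cai.NAME_VALUE)
FROM certificate_and_identities cai
WHERE plainto_tsquery('certwatch', '$domain') @@ identities(cai.CERTIFICATE)
  AND cai.NAME_VALUE ILIKE '%.$domain'
LIMIT 10000;
SQL
}

# Run the queries for all given domains over a single psql session.
query_crtsh() {
    local d sql=""
    for d in "$@"; do
        sql+=$(build_crtsh_query "$d")$'\n' || return 1
    done
    # -X: skip psqlrc, -A: unaligned output, -t: rows only (no headers/footers)
    psql -X -A -t -h crt.sh -p 5432 -U guest -d certwatch <<<"$sql"
}
```

For example, `query_crtsh example.com example.org` would print one certificate name per line for both domains. The public server is shared and may rate-limit or time out long queries, so batching in small groups is probably wise.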

Domain list
https://hackertarget.com/top-million-site-list-download/
https://radar.cloudflare.com/domains
https://www.domcop.com/top-10-million-websites
https://s3-us-west-1.amazonaws.com/umbrella-static/index.html
https://majestic.com/reports/majestic-million
https://builtwith.com/top-sites
https://tranco-list.eu/
https://statvoo.com/dl/top-1million-sites.csv.zip

Next steps:

  • Implement a script that pulls domains from DomCop, Alexa, Cloudflare, Majestic, and others
  • Dedupe the merged list and find better ways to extract environment IDs
  • Switch to the crt.sh SQL interface
  • I intend to open a pull request later
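
The merge and dedupe step could look roughly like the sketch below. It assumes each downloaded list is a CSV file with the domain in a known column; the function names, file names and column positions are hypothetical, so check each list's header before relying on them.

```shell
#!/usr/bin/env bash

# Print the domain column of a CSV file, normalised for de-duplication.
# $1 = file, $2 = 1-based column that holds the domain (default 1).
extract_domains() {
    local file=$1 column=${2:-1}
    awk -F',' -v col="$column" '
        {
            d = tolower($col)
            gsub(/[\r" ]/, "", d)   # strip carriage returns, quotes, spaces
            sub(/\.$/, "", d)       # drop a trailing root dot
            # Keep only hostname-shaped values; this also skips header rows.
            if (d ~ /^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)+$/)
                print d
        }' "$file"
}

# Merge one-domain-per-line input (files or stdin) into a sorted unique list.
merge_domain_lists() {
    LC_ALL=C sort -u "$@"
}
```

A possible invocation, with placeholder file names and column numbers:

```shell
{ extract_domains majestic_million.csv 3; extract_domains tranco.csv 2; } | merge_domain_lists > domains.txt
```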
