company_dns was created to enable a more automated approach to gathering information about companies. This release enables the synthesis of data from the SEC EDGAR repository and Wikipedia. A Medium article, "A case for API based open company firmographics," discusses the process and motivation behind the creation of this service.
The embedded web interface is a modern single-page application for exploring SEC EDGAR filings and industry classifications. For detailed documentation on features, architecture, and usage, see html/README.md.
The V3.2.0 release brings comprehensive security hardening and modernizes the web framework:
- Security Middleware: Blocks 70+ known attack patterns including WordPress probes, PHP exploits, SQL injection, and XSS attempts
- Rate Limiting: IP-based rate limiting (100-1000 req/min by endpoint type) using SlowAPI
- Structured Logging: Severity-based logging with automatic request tracking and performance metrics
- Input Validation: Pydantic v2.10.0 provides strict request/response validation
- FastAPI Migration: The service now uses the FastAPI framework instead of plain Starlette (FastAPI itself is built on Starlette)
- Automatic API Docs: Interactive Swagger UI at /docs and ReDoc at /redoc
- Enhanced Type Safety: All endpoints use Pydantic models with proper type hints
- Python 3.13 Support: Tested with Python 3.13 via Homebrew; uses the .venv convention for the virtual environment
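In the service itself, rate limiting is handled by SlowAPI. To illustrate the idea behind per-IP limits like the 100-1000 req/min listed above, here is a minimal, stdlib-only sketch of a sliding-window limiter. The class name SlidingWindowLimiter, the default of 100 requests per 60 seconds, and the injectable clock are illustrative assumptions, not the company_dns or SlowAPI implementation.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Hypothetical per-IP limiter allowing at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int = 100, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        # Timestamps of accepted requests, one deque per client IP
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        """Record and accept a request from `ip` if it is under its limit; otherwise reject."""
        now = self.clock()
        hits = self._hits[ip]
        # Discard timestamps that have slid out of the current window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # an HTTP layer would typically answer 429 Too Many Requests
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(limit=3, window=60.0)
    print([limiter.allow("203.0.113.7") for _ in range(4)])  # [True, True, True, False]
    print(limiter.allow("198.51.100.2"))  # True: each IP has its own budget
```

Keeping one timestamp log per IP makes the window slide smoothly rather than resetting at fixed boundaries. A production limiter would also evict idle IPs and share state across worker processes, which this sketch does not attempt.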
Key dependencies:
- fastapi>=0.115.0: Modern ASGI framework with automatic OpenAPI docs
- pydantic>=2.10.0: Data validation and serialization
- slowapi>=0.1.9: Rate limiting for API protection
- python-multipart>=0.0.9: Multipart form data support
✅ Malicious paths (/wp-login.php, xmlrpc.php, etc.) return 403 Forbidden
✅ SQL injection patterns blocked before reaching business logic
✅ XSS attempts filtered at middleware level
✅ All endpoints enforce rate limits per IP address
For release notes prior to V3.2.0 (including V3.1.0 and V3.0.0), see the consolidated changelog: CHANGELOG.md.
Installation and setup differ for users and developers; instructions for both are provided below.
Since V3.0.0, automated Docker builds provide a fresh image every month, for three reasons:
- Each image carries the latest information from EDGAR, so queries to the service return the most recent quarterly and annual filings.
- As code is merged into main, users automatically get the latest improvements and fixes.
- Creates images for both x86 and ARM architectures.
The image can be pulled using docker pull ghcr.io/miha42-github/company_dns/company_dns:latest. Once pulled, run it in the foreground with docker run -m 1G -p 8000:8000 ghcr.io/miha42-github/company_dns/company_dns:latest, or in the background by adding -d: docker run -d -m 1G -p 8000:8000 ghcr.io/miha42-github/company_dns/company_dns:latest. GitHub's container registry is used to store the images, and more information on this package can be found at company_dns/company_dns.
Assuming you have set up access to GitHub and are using a Linux or macOS system, start by getting the repository:
- Create a directory that will contain the code: mkdir ~/dev
- Enter the directory: cd ~/dev/
- Clone the repository: git clone git@github.com:miha42-github/company_dns.git
Because the Docker build handles data cache creation, Python requirements installation, and other setup, getting company_dns running is relatively straightforward. To simplify the process further, the svc_ctl.sh script is provided.
svc_ctl.sh automates build/run/log tasks for company_dns. Common workflows:
- From ~/dev/company_dns: ./svc_ctl.sh build, then ./svc_ctl.sh start (background) or ./svc_ctl.sh foreground (interactive).
- Watch logs: ./svc_ctl.sh tail.
- Stop or kill: ./svc_ctl.sh stop (graceful) or ./svc_ctl.sh kill (forceful).
- Rebuild and restart: ./svc_ctl.sh rebuild.
- Check status or dependencies: ./svc_ctl.sh status or ./svc_ctl.sh check-deps.
- Clean up stopped containers/images: ./svc_ctl.sh cleanup.
NAME:
./svc_ctl.sh <sub-command>
COMMANDS:
help - Display help
check-deps - Validate docker and required files
start - Start the service in the background
stop - Stop the running container gracefully
kill - Forcefully stop the running container
build - Build the Docker image
rebuild - Rebuild image and restart the service
foreground - Run the service in the foreground
tail - View logs of the running container
status - Check container status and port
cleanup - Remove stopped containers and dangling images
Depending on why you are getting the code, you may run it in a Python virtual environment or against the system-wide Python installation. Either way, the steps below will get the service up and running.
Before you get started, install all prerequisites and then create the cache database.
- Enter the directory with the service bits (assuming you're using ~/dev): cd ~/dev/company_dns/company_dns
- Install all prerequisites: pip3 install -r ./requirements.txt
- Create the database cache: python3 ./makedb.py
If everything above completed successfully, run company_dns with python3 ./company_dns.py; this runs the service in the foreground.
Regardless of how you run company_dns, it is important to check that it is operating. A quick way to check service availability when running on localhost is to open http://localhost:8000/. If this succeeds, the embedded web interface is displayed, describing core capabilities and functions, examples with curl, and some helpful links to the company_dns GitHub repository.
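To script the same availability check, for example after ./svc_ctl.sh start or in a CI job, here is a minimal sketch using only the Python standard library. It assumes the service listens on http://localhost:8000/ as described above and treats an HTTP 200 response as healthy; the function name is_service_up and the 5-second timeout are illustrative choices.

```python
import http.client
import urllib.request


def is_service_up(url: str = "http://localhost:8000/", timeout: float = 5.0) -> bool:
    """Return True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError, refused connections and timeouts subclass OSError;
        # HTTPException covers malformed responses from a half-started server
        return False
```

A deployment script could poll is_service_up() every few seconds after starting the container and proceed once it returns True.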
A live system is available for Mediumroast efforts and for anyone to try out; relevant links are below.
- Embedded background - https://company-dns.mediumroast.io/
- Company search for IBM - https://company-dns.mediumroast.io/V3.0/global/company/merged/firmographics/IBM
- Standard industry code search for Oil - https://www.mediumroast.io/company_dns/V3.0/na/sic/description/oil
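The endpoints above follow a path pattern of API version, region, and resource, so they can also be queried programmatically. The sketch below builds URLs of that shape and fetches them with the standard library. The base URL is taken from the links above; it assumes both paths are served from the same base (as they would be on a local instance at http://localhost:8000), and it assumes responses are JSON. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

# Base URL taken from the live links above; pass base="http://localhost:8000" for a local instance
BASE_URL = "https://company-dns.mediumroast.io"


def firmographics_url(company: str, base: str = BASE_URL) -> str:
    """Build a merged-firmographics URL (path shape from the IBM example)."""
    # Percent-encode the name so spaces or '&' (e.g. "AT&T") yield a valid path segment
    return f"{base}/V3.0/global/company/merged/firmographics/{urllib.parse.quote(company, safe='')}"


def sic_description_url(term: str, base: str = BASE_URL) -> str:
    """Build an SIC description search URL (path shape from the Oil example)."""
    return f"{base}/V3.0/na/sic/description/{urllib.parse.quote(term, safe='')}"


def fetch_json(url: str, timeout: float = 30.0) -> Any:
    """GET `url` and decode the body as JSON (assumes the service returns JSON)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, fetch_json(firmographics_url("IBM")) requests the same data as the IBM link above.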
If you encounter a problem with company_dns, please first review the existing open issues, and if you find a match, add a comment with any details you deem relevant. If you can't find an issue that matches the behavior you're seeing, please open a new issue.
We try to keep high-level Todos and Improvements in the list below, and as we begin to work on an item we will create a corresponding issue, link to it, track its progress, and close it. However, a design change, a major improvement, or similar may cause an item to fall off the list. If something isn't on the list, please create a new issue and we will evaluate it. We'll let you know if we pick up your request and begin working on it.
Here are the items likely to be worked on, without any strict deadline:
- Determine whether it is feasible to use the Companies House API to gather data from the UK
  - Initial feasibility has been checked, but the value of the data is still being evaluated
- Evaluate whether financial data can be added from EDGAR, Wikipedia and Companies House
- Provide instructions/details for running on a Raspberry Pi or ARM-based system
  - Since one of the target Docker images is for ARM, the next logical step is to provide instructions for running on a Pi.
This code is released under the permissive Apache 2.0 license and is provided as is, without warranty or guarantee of support. Feel free to fork the code, but please provide attribution to the authors.
- PyEdgar - used to interface with the SEC's EDGAR repository
- SQLite - stores the cached company data so that the utilities and the RESTful service can quickly and expressively look up appropriate company data
- FastAPI - used to create the RESTful service
- Uvicorn - used to run the RESTful service
- GeoPy with ArcGIS - Enables proper address formatting and reporting of lat-long pairs for companies
- wptools - provides access to MediaWiki data for company search