| Version | Supported |
|---|---|
| 1.x | ✅ |
If you discover a security vulnerability in TDO Standalone Extractor, please report it responsibly:
- Do NOT create a public GitHub issue for security vulnerabilities
- Email the maintainers directly with details of the vulnerability
- Include:
  - Description of the vulnerability
  - Steps to reproduce
  - Potential impact
  - Any suggested fixes (optional)
We will acknowledge receipt within 48 hours and provide an estimated timeline for a fix.
- Never commit API keys to version control
- Store `GEMINI_API_KEY` in a `.env` file (automatically excluded by `.gitignore`)
- Use environment variables in CI/CD pipelines
- Rotate API keys periodically
- Use Google Cloud IAM to restrict API key permissions
- Only process files from trusted sources
- Be aware that extracted data may contain sensitive information from source documents
- Review output files before sharing (they may contain infrastructure IOCs, file paths, etc.)
- Consider sanitizing output before sharing publicly
- Extracted JSON and Markdown files may contain:
  - IP addresses and domain names (infrastructure IOCs)
  - File paths from source documents
  - Organization names and identities
  - Detailed technical information about threats
- Review outputs before sharing outside your organization
- Run in isolated environments when processing untrusted documents
- Use virtual environments to isolate dependencies
- Keep dependencies updated: `pip install -U -r requirements.txt`
- Consider containerization for production deployments
- ✅ Environment-based configuration (no hardcoded secrets)
- ✅ `.env` file excluded from version control
- ✅ Input file format validation
- ✅ Graceful error handling without exposing internals
- ✅ No network access except for configured LLM API
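As an illustration of the environment-based configuration and input file format validation items above, here is a minimal sketch. It is not the tool's actual implementation: `load_api_key`, `validate_input_path`, `ConfigError`, and the `ALLOWED_SUFFIXES` set are hypothetical names, and the accepted extensions are an assumption based on the PDF and DOCX parsers this project depends on.

```python
import os
from pathlib import Path

# Assumed set of accepted formats; the real tool's list may differ.
ALLOWED_SUFFIXES = {".pdf", ".docx"}


class ConfigError(RuntimeError):
    """Raised when required configuration is missing."""


def load_api_key(env=os.environ, name="GEMINI_API_KEY"):
    """Return the API key from the environment, never from source code."""
    key = env.get(name, "").strip()
    if not key:
        # The message names the variable but never echoes a key value.
        raise ConfigError(f"{name} is not set; add it to .env or the environment")
    return key


def validate_input_path(path):
    """Accept only existing files whose extension is in the allow-list."""
    p = Path(path)
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {p.suffix or '(none)'}")
    if not p.is_file():
        raise ValueError("input file does not exist")
    return p
```

In a setup like this, calling `load_dotenv()` from `python-dotenv` at startup would populate the environment from `.env` before `load_api_key()` runs. Checking only the extension is a coarse filter; it does not prove the file contents are well-formed.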
For production or sensitive deployments, consider:
- Running behind a firewall with API egress restricted to Google AI endpoints
- Implementing rate limiting for batch processing
- Adding authentication for any web interfaces
- Logging access and extraction activities
- Encrypting stored extraction results
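For the rate-limiting item above, one simple approach is to enforce a minimum interval between consecutive API calls during a batch run. The sketch below assumes a hypothetical `extract` callable standing in for whatever function processes a single document; `MinIntervalLimiter` and `process_batch` are illustrative names, not part of this project.

```python
import time


class MinIntervalLimiter:
    """Allow at most one call every `min_interval` seconds."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the limiter can be tested
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay


def process_batch(paths, extract, limiter):
    """Call `extract` on each path, pausing between calls as needed."""
    results = []
    for path in paths:
        limiter.wait()
        results.append(extract(path))
    return results
```

A call such as `process_batch(files, extract, MinIntervalLimiter(2.0))` would then space requests at least two seconds apart. The right interval depends on your API quota, which this sketch does not attempt to discover.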
This project uses the following external dependencies:
| Package | Purpose | Security Notes |
|---|---|---|
| `google-generativeai` | LLM API client | Official Google package |
| `PyMuPDF` (`fitz`) | PDF parsing | Well-maintained, handles malformed PDFs safely |
| `python-docx` | DOCX parsing | Standard library for Office documents |
| `pydantic` | Data validation | Provides input validation |
| `python-dotenv` | Environment loading | Standard env file handling |
Monitor dependencies for vulnerabilities:

    # Install safety checker
    pip install safety
    # Check for known vulnerabilities
    safety check -r requirements.txt

- This tool sends document text to Google's Gemini API for processing
- Review Google's AI data usage policies before processing sensitive documents
- Consider data residency requirements for your organization
- No telemetry or usage data is collected by this tool itself
Security-related changes will be documented here:
- v1.0.0 - Initial security review and documentation