Security: Blevene/standalone_tdo

Security Policy

Supported Versions

| Version | Supported |
|---------|-----------|
| 1.x     | ✅        |

Reporting a Vulnerability

If you discover a security vulnerability in TDO Standalone Extractor, please report it responsibly:

  1. Do NOT create a public GitHub issue for security vulnerabilities
  2. Email the maintainers directly with details of the vulnerability
  3. Include:
    • Description of the vulnerability
    • Steps to reproduce
    • Potential impact
    • Any suggested fixes (optional)

We will acknowledge receipt within 48 hours and provide an estimated timeline for a fix.

Security Best Practices for Users

API Key Protection

  • Never commit API keys to version control
  • Store GEMINI_API_KEY in a .env file (automatically excluded by .gitignore)
  • Use environment variables in CI/CD pipelines
  • Rotate API keys periodically
  • Use Google Cloud IAM to restrict API key permissions
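The key-handling advice above can be sketched in a few lines. This is a minimal illustration, not the project's actual startup code: `load_api_key`, `mask_key`, and `MissingAPIKeyError` are hypothetical names, and the sketch assumes the `.env` file has already been loaded into the process environment (for example by python-dotenv).

```python
import os


class MissingAPIKeyError(RuntimeError):
    """Raised when GEMINI_API_KEY is not configured (hypothetical helper)."""


def load_api_key(env=None):
    """Return GEMINI_API_KEY from the environment, or fail fast if it is absent."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise MissingAPIKeyError(
            "GEMINI_API_KEY is not set; add it to .env or your CI secret store"
        )
    return key


def mask_key(key, visible=4):
    """Return a log-safe form of the key that shows only the last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Failing fast on a missing key avoids half-finished extraction runs, and logging only `mask_key(key)` keeps the full secret out of log files.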

File Handling

  • Only process files from trusted sources
  • Be aware that extracted data may contain sensitive information from source documents
  • Review output files before sharing (they may contain infrastructure IOCs, file paths, etc.)
  • Consider sanitizing output before sharing publicly

Output Data Considerations

  • Extracted JSON and Markdown files may contain:
    • IP addresses and domain names (infrastructure IOCs)
    • File paths from source documents
    • Organization names and identities
    • Detailed technical information about threats
  • Review outputs before sharing outside your organization
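One way to sanitize output before sharing is to defang IP addresses and redact user names embedded in file paths. The sketch below is an assumption-laden illustration rather than a feature of this tool: the function names and regular expressions are hypothetical, it works on already-decoded strings (not on JSON text with escaped backslashes), and the IPv4 pattern will also match dotted version numbers such as `1.2.3.4`.

```python
import re

# Matches dotted-quad IPv4 addresses (and, as a side effect, dotted version strings).
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Hypothetical pattern for the user-name segment of Windows, Linux, and macOS home paths.
USER_PATH_RE = re.compile(r"([A-Z]:\\Users\\|/home/|/Users/)[^\\/\s]+", re.IGNORECASE)


def defang_ipv4(text):
    """Rewrite 10.0.0.1 as 10[.]0[.]0[.]1 so it cannot be clicked or resolved by accident."""
    return IPV4_RE.sub(lambda m: m.group(0).replace(".", "[.]"), text)


def redact_usernames(text):
    """Replace the user-name segment of home-directory paths with a placeholder."""
    return USER_PATH_RE.sub(lambda m: m.group(1) + "<redacted>", text)


def sanitize(text):
    """Apply both transformations to a decoded string value."""
    return redact_usernames(defang_ipv4(text))
```

Domain names and organization names need context-specific rules, so a manual review remains necessary even after automated sanitizing.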

Deployment Security

  • Run in isolated environments when processing untrusted documents
  • Use virtual environments to isolate dependencies
  • Keep dependencies updated: pip install -U -r requirements.txt
  • Consider containerization for production deployments

Security Features

Implemented

  • ✅ Environment-based configuration (no hardcoded secrets)
  • ✅ .env file excluded from version control
  • ✅ Input file format validation
  • ✅ Graceful error handling without exposing internals
  • ✅ No network access except to the configured LLM API
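To illustrate what input file format validation can look like, the sketch below checks the file extension, size, and leading "magic" bytes before a file is parsed. It is not the project's actual validator: `validate_input_file`, the 50 MiB size cap, and the choice of PDF and DOCX as the accepted formats (inferred from the parsing dependencies listed in this document) are all assumptions.

```python
from pathlib import Path

# Leading bytes expected for each accepted extension (assumed set of formats).
MAGIC_BYTES = {
    ".pdf": b"%PDF-",
    ".docx": b"PK\x03\x04",  # DOCX files are ZIP containers
}


def validate_input_file(path, max_bytes=50 * 1024 * 1024):
    """Return the path as a Path object if it passes basic checks, else raise ValueError."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in MAGIC_BYTES:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    if not path.is_file():
        raise ValueError(f"not a regular file: {path}")
    if path.stat().st_size > max_bytes:
        raise ValueError(f"file exceeds {max_bytes} bytes: {path}")
    magic = MAGIC_BYTES[suffix]
    with path.open("rb") as fh:
        header = fh.read(len(magic))
    if header != magic:
        raise ValueError(f"content does not match {suffix} format: {path}")
    return path
```

A magic-byte check only rejects obviously mislabeled files; it does not make parsing a hostile document safe, which is why isolated environments are still recommended for untrusted input.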

Recommended Additional Measures

For production or sensitive deployments, consider:

  • Running behind a firewall with API egress restricted to Google AI endpoints
  • Implementing rate limiting for batch processing
  • Adding authentication for any web interfaces
  • Logging access and extraction activities
  • Encrypting stored extraction results
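Rate limiting for batch processing can be as simple as enforcing a minimum interval between API calls. The following sketch assumes a hypothetical `extract` callable that processes one document; `MinIntervalLimiter` and `process_batch` are illustrative names and not part of this tool. The clock and sleep functions are injectable so the behavior can be tested without waiting.

```python
import time


class MinIntervalLimiter:
    """Block so that successive calls to wait() are at least min_interval_s apart."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous permitted call

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def process_batch(paths, extract, limiter):
    """Run extract(path) for each path, pausing between calls as the limiter requires."""
    results = {}
    for path in paths:
        limiter.wait()
        results[path] = extract(path)
    return results
```

For example, `process_batch(files, extract, MinIntervalLimiter(2.0))` would issue at most one extraction every two seconds; a suitable interval depends on the quota of your API plan.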

Dependency Security

This project uses the following external dependencies:

| Package             | Purpose             | Security Notes                                 |
|---------------------|---------------------|------------------------------------------------|
| google-generativeai | LLM API client      | Official Google package                        |
| PyMuPDF (fitz)      | PDF parsing         | Well-maintained, handles malformed PDFs safely |
| python-docx         | DOCX parsing        | Widely used library for Office documents       |
| pydantic            | Data validation     | Provides input validation                      |
| python-dotenv       | Environment loading | Standard env file handling                     |

Monitor dependencies for vulnerabilities:

```shell
# Install safety checker
pip install safety

# Check for known vulnerabilities
safety check -r requirements.txt
```
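When reviewing a vulnerability report, it helps to know which versions are actually installed in the active environment. The sketch below uses the standard-library `importlib.metadata` module; the `installed_versions` helper is hypothetical, and the distribution names are assumed to match the package names in the dependency table above.

```python
from importlib import metadata

# Distribution names assumed from the dependency table in this document.
DEPENDENCIES = [
    "google-generativeai",
    "PyMuPDF",
    "python-docx",
    "pydantic",
    "python-dotenv",
]


def installed_versions(names=DEPENDENCIES):
    """Map each distribution name to its installed version, or None if it is not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```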

Data Privacy

  • This tool sends document text to Google's Gemini API for processing
  • Review Google's AI data usage policies before processing sensitive documents
  • Consider data residency requirements for your organization
  • No telemetry or usage data is collected by this tool itself

Changelog

Security-related changes will be documented here:

  • v1.0.0 - Initial security review and documentation
