Documentation:
- Introduction
- Best Practices
- Installation
- Configuration
- Quickstart
- Usage
- Custom Patterns
- Built-in Patterns
- Example Configuration
- Troubleshooting
- Release Notes
- Licence
pgEdge Anonymizer is a command-line tool for anonymizing personally identifiable information (PII) in PostgreSQL databases. The tool replaces sensitive data with realistic fake values that you can use for development and testing, while maintaining data consistency and referential integrity.
- 100+ built-in patterns for common PII types across 19 countries
- Consistent replacement - same input produces same output within a run
- Foreign key awareness - automatically handles CASCADE relationships
- Large database support - efficient batch processing with server-side cursors
- Format preservation - maintains original data formatting where possible
- Single transaction - all changes committed atomically or rolled back
- Extensible - define custom patterns using date, number, or mask formats
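The consistent replacement feature listed above can be pictured roughly as follows. This is a minimal sketch in Go, not the tool's actual implementation: the `consistentMapper` type, its `replace` method, and the `user%d@example.com` fake value format are all hypothetical names chosen for illustration. It only shows the general idea of remembering each original value so that repeated inputs get the same replacement within one run.

```go
package main

import "fmt"

// consistentMapper is a hypothetical helper (not part of pgedge-anonymizer).
// It remembers every original value it has seen during one run.
type consistentMapper struct {
	seen map[string]string // original value -> fake value
	next int               // counter used to build new fake values
}

func newConsistentMapper() *consistentMapper {
	return &consistentMapper{seen: make(map[string]string)}
}

// replace returns the same fake value every time it is given the same input.
func (m *consistentMapper) replace(original string) string {
	if fake, ok := m.seen[original]; ok {
		return fake
	}
	m.next++
	// Assumed fake format for illustration only.
	fake := fmt.Sprintf("user%d@example.com", m.next)
	m.seen[original] = fake
	return fake
}

func main() {
	m := newConsistentMapper()
	fmt.Println(m.replace("alice@corp.test"))
	fmt.Println(m.replace("bob@corp.test"))
	fmt.Println(m.replace("alice@corp.test")) // same output as the first call
}
```

Because the mapping lives in memory for the lifetime of the mapper, the guarantee in this sketch holds only within a single run, which matches how the feature is described.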
Anonymizer lets you create an experimental data set that preserves the shape and integrity of a Postgres database in just three steps:
- Create a configuration file that specifies the replacement patterns for your columns.
- Build and run pgedge-anonymizer to convert your columns.
- Review the results.
Before running pgedge-anonymizer, you need to create a configuration file named pgedge-anonymizer.yaml; the file should contain:
- a database section, with connection details for your database.
- a columns section, listing the fully-qualified columns that you wish to anonymize (in schema_name.table_name.column_name format).
- a pattern property for each column, specifying the form that the replacement content will take.
For example:
database:
  host: localhost
  port: 5432
  database: myapp
  user: anonymizer
columns:
  - column: public.users.email
    pattern: EMAIL
  - column: public.users.phone
    pattern: US_PHONE
  - column: public.users.ssn
    pattern: US_SSN

After creating a configuration file, run the anonymizer:
pgedge-anonymizer run

As pgedge-anonymizer runs, it reports its progress and then displays statistics that you can review:
Processing public.users.email (est. 50000 rows)...
10000 rows processed
20000 rows processed
30000 rows processed
40000 rows processed
50000 rows processed
Completed: 50000 rows, 48234 values anonymized
=== Anonymization Statistics ===
Total columns processed: 1
Total rows processed: 50000
Total values anonymized: 48234
Total duration: 2.34s
Throughput: 21367 rows/sec
Prerequisites
- Go 1.24 or later
- PostgreSQL (for integration tests)
- Python 3.12+ (for documentation)
Use the following command to build pgedge-anonymizer:
make build # Build binary

Use the following command to run the Anonymizer test suite:
make test

Use the following command to run the Go linter:
make lint

Use the following command to format the code:
make fmt

- GitHub Issues
- Full documentation is available at the pgEdge website.
This project is licensed under the PostgreSQL License.