pgEdge Anonymizer


pgEdge Anonymizer is a command-line tool for anonymizing personally identifiable information (PII) in PostgreSQL databases. The tool replaces sensitive data with realistic fake values that you can use for development and testing, while maintaining data consistency and referential integrity.

Features

100+ built-in patterns for common PII types across 19 countries
Consistent replacement - same input produces same output within a run
Foreign key awareness - automatically handles CASCADE relationships
Large database support - efficient batch processing with server-side cursors
Format preservation - maintains original data formatting where possible
Single transaction - all changes are committed atomically or rolled back together
Extensible - define custom patterns using date, number, or mask formats
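To make "consistent replacement" and "format preservation" concrete, here is a minimal Go sketch of both ideas together. It is not pgedge-anonymizer's implementation: the type `consistentMasker`, its methods, and the digit-only replacement rule are hypothetical and exist only for this illustration. The sketch caches each generated value so that a repeated input gets the same output for the lifetime of the masker (one "run"), and it replaces only ASCII digits so that separators such as dashes keep their positions.

```go
package main

import (
	"fmt"
	"math/rand"
)

// consistentMasker is a hypothetical illustration, not the tool's code.
// It replaces digits with random digits, leaves every other byte in place,
// and remembers each result so repeated inputs map to the same output.
type consistentMasker struct {
	rng   *rand.Rand
	cache map[string]string
}

func newConsistentMasker(seed int64) *consistentMasker {
	return &consistentMasker{
		rng:   rand.New(rand.NewSource(seed)),
		cache: make(map[string]string),
	}
}

// Mask returns the replacement for value, reusing any earlier result.
func (m *consistentMasker) Mask(value string) string {
	// Consistency: reuse the replacement generated earlier in this run.
	if out, ok := m.cache[value]; ok {
		return out
	}
	// Format preservation: only ASCII digits change; separators stay put.
	out := []byte(value)
	for i, c := range out {
		if c >= '0' && c <= '9' {
			out[i] = byte('0' + m.rng.Intn(10))
		}
	}
	m.cache[value] = string(out)
	return string(out)
}

func main() {
	m := newConsistentMasker(42)
	a := m.Mask("555-867-5309")
	b := m.Mask("555-867-5309")
	fmt.Println(a == b)                   // same input, same output
	fmt.Println(a[3] == '-' && a[7] == '-') // dashes kept in place
}
```

Note that this sketch does not stop two different inputs from receiving the same replacement, which would matter for columns with unique constraints.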

Quick Start

pgEdge Anonymizer lets you create a test data set that preserves the shape and referential integrity of a PostgreSQL database in three steps:

Create a configuration file that specifies the replacement patterns for your columns.
Build and run pgedge-anonymizer to anonymize the listed columns.
Review the results.

Before running pgedge-anonymizer, create a configuration file named pgedge-anonymizer.yaml that contains:

a database section, with connection details for your database.
a columns section, listing the fully qualified columns that you wish to anonymize (in schema_name.table_name.column_name format).
a pattern property for each column entry, specifying the form that the replacement content will take.

For example:

database:
  host: localhost
  port: 5432
  database: myapp
  user: anonymizer

columns:
  - column: public.users.email
    pattern: EMAIL

  - column: public.users.phone
    pattern: US_PHONE

  - column: public.users.ssn
    pattern: US_SSN
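Each column entry above uses the schema_name.table_name.column_name form. The following Go sketch shows one way such a reference could be split and checked; `columnRef` and `parseColumnRef` are hypothetical names, not part of pgedge-anonymizer, and the sketch assumes unquoted identifiers that contain no dots.

```go
package main

import (
	"fmt"
	"strings"
)

// columnRef is a hypothetical type used only for this illustration.
type columnRef struct {
	Schema, Table, Column string
}

// parseColumnRef splits "schema.table.column" into its three parts.
// Assumption: identifiers are unquoted and contain no dots.
func parseColumnRef(s string) (columnRef, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return columnRef{}, fmt.Errorf("column %q: want schema.table.column, got %d part(s)", s, len(parts))
	}
	for _, p := range parts {
		if p == "" {
			return columnRef{}, fmt.Errorf("column %q: empty identifier", s)
		}
	}
	return columnRef{Schema: parts[0], Table: parts[1], Column: parts[2]}, nil
}

func main() {
	for _, s := range []string{"public.users.email", "users.email"} {
		ref, err := parseColumnRef(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("schema=%s table=%s column=%s\n", ref.Schema, ref.Table, ref.Column)
	}
}
```

The second input, users.email, omits the schema and is rejected, which mirrors the requirement that every column be fully qualified.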

After creating a configuration file, run the anonymizer:

pgedge-anonymizer run

As pgedge-anonymizer runs, it reports its progress for each column and then displays summary statistics:

Processing public.users.email (est. 50000 rows)...
  10000 rows processed
  20000 rows processed
  30000 rows processed
  40000 rows processed
  50000 rows processed
  Completed: 50000 rows, 48234 values anonymized

=== Anonymization Statistics ===
Total columns processed: 1
Total rows processed:    50000
Total values anonymized: 48234
Total duration:          2.34s
Throughput:              21367 rows/sec

Developer Notes

Prerequisites

Go 1.24 or later
PostgreSQL (for integration tests)
Python 3.12+ (for documentation)

Use the following command to build pgedge-anonymizer:

make build        # Build binary

Use the following command to run the Anonymizer test suite:

make test

Use the following command to run the Go linter:

make lint

Use the following command to format the code:

make fmt

Support

To report a bug or ask for help, open an issue on GitHub Issues.
Full documentation is available at the pgEdge website.

License

This project is licensed under the PostgreSQL License.
