Add crafted bad-data MMDB files for libmaxminddb#219
Conversation
Summary of Changes

Hello @oschwald, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces functionality to generate a suite of intentionally malformed MaxMind DB (MMDB) files. The primary goal is to provide robust test cases for other MMDB reader implementations, ensuring they can gracefully handle corrupt or extreme data structures without crashing or encountering unhandled errors. This enhancement allows for more comprehensive testing of error-handling mechanisms across various MaxMind DB client libraries.
Code Review
The pull request introduces functionality to generate intentionally malformed MMDB files for testing error handling in reader implementations. This includes adding a --bad-data flag to the write-test-data command, porting mmdb_test_writer.h to Go, and generating four new bad-data databases. The changes are well-structured and address the stated goal of providing robust test cases for MaxMind DB readers. The new pkg/writer/baddata.go and pkg/writer/rawmmdb.go files encapsulate the logic for creating these specialized databases, which is a good separation of concerns. The README.md update provides important context for one of the new bad-data files. Overall, the changes are a valuable addition for improving the robustness of MaxMind DB reader implementations.
Note: `libmaxminddb/libmaxminddb-uint64-max-epoch.mmdb` contains a valid
database structure with `build_epoch` set to `UINT64_MAX`. It may not produce
a reader error but can cause overflow in time type conversions.
The note about libmaxminddb-uint64-max-epoch.mmdb is very helpful for understanding the purpose of this specific bad-data file. It clarifies that it's not about structural invalidity but rather an extreme value that can cause overflow in time conversions, which is an important distinction for testers.
cmd/write-test-data/main.go (outdated)
func main() {
	source := flag.String("source", "", "Source data directory")
	target := flag.String("target", "", "Destination directory for the generated mmdb files")
	badData := flag.String("bad-data", "", "Destination directory for generated bad mmdb files")

	if *badData != "" {
		if err := w.WriteBadDataDBs(*badData); err != nil {
			fmt.Printf("writing bad data test databases: %+v\n", err)
			os.Exit(1)
		}
	}
func (w *Writer) WriteBadDataDBs(target string) error {
	//nolint:gosec // not security sensitive.
	if err := os.MkdirAll(target, os.ModePerm); err != nil {
		return fmt.Errorf("creating bad-data directory: %w", err)
	}

	// Raw binary databases — can't use mmdbwriter because the data is
	// intentionally invalid or uses values mmdbwriter can't represent.
	for _, db := range []struct {
		name string
		data []byte
	}{
		{"libmaxminddb-oversized-array.mmdb", buildOversizedArrayDB()},
		{"libmaxminddb-oversized-map.mmdb", buildOversizedMapDB()},
		{"libmaxminddb-uint64-max-epoch.mmdb", buildUint64MaxEpochDB()},
	} {
		if err := writeRawDB(target, db.name, db.data); err != nil {
			return fmt.Errorf("writing %s: %w", db.name, err)
		}
	}

	// Deep nesting uses mmdbwriter — structurally valid, just 600 levels deep.
	if err := writeDeepNestingDB(target); err != nil {
		return fmt.Errorf("writing deep nesting database: %w", err)
	}

	return nil
The WriteBadDataDBs function effectively orchestrates the creation of different types of malformed databases. Separating the raw binary databases from the mmdbwriter-generated ones (like writeDeepNestingDB) is a clear and logical division, reflecting the different approaches needed to craft these files.
for range depth {
	value = mmdbtype.Map{"a": value}
}
func writeLargeArray(buf []byte, size uint32) int {
	adjusted := size - 65821
	buf[0] = (0 << 5) | 31 // extended type, size = case 31
	buf[1] = 4             // extended type: 7 + 4 = 11 (array)
	buf[2] = byte(adjusted >> 16)
	buf[3] = byte(adjusted >> 8)
	buf[4] = byte(adjusted)
	return 5
func writeLargeMap(buf []byte, size uint32) int {
	adjusted := size - 65821
	buf[0] = (7 << 5) | 31 // type 7 (map), size = case 31
	buf[1] = byte(adjusted >> 16)
	buf[2] = byte(adjusted >> 8)
	buf[3] = byte(adjusted)
	return 4
// buildOversizedArrayDB creates a complete MMDB with an array claiming
// 1,000,000 elements but containing only 2 actual entries.
func buildOversizedArrayDB() []byte {
	const nodeCount = 1
	const recordValue = nodeCount + 16

	buf := make([]byte, 1024)
	pos := 0

	pos += writeSearchTree(buf[pos:], recordValue)

	// 16-byte null separator
	pos += dataSeparatorSize

	// Data: array claiming 1M elements, only 2 strings present
	pos += writeLargeArray(buf[pos:], 1_000_000)
	pos += writeString(buf[pos:], "x")
	pos += writeString(buf[pos:], "y")

	pos += writeMetadataBlock(buf[pos:], nodeCount, 1_000_000_000)

	return buf[:pos]
// buildOversizedMapDB creates a complete MMDB with a map claiming
// 1,000,000 entries but containing only 1 key-value pair.
func buildOversizedMapDB() []byte {
	const nodeCount = 1
	const recordValue = nodeCount + 16

	buf := make([]byte, 1024)
	pos := 0

	pos += writeSearchTree(buf[pos:], recordValue)

	// 16-byte null separator
	pos += dataSeparatorSize

	// Data: map claiming 1M entries, only 1 k/v pair present
	pos += writeLargeMap(buf[pos:], 1_000_000)
	pos += writeString(buf[pos:], "k")
	pos += writeString(buf[pos:], "v")

	pos += writeMetadataBlock(buf[pos:], nodeCount, 1_000_000_000)

	return buf[:pos]
// buildUint64MaxEpochDB creates a complete MMDB with build_epoch set to
// UINT64_MAX (18446744073709551615). The database is structurally valid
// but the extreme epoch value can cause overflow in time conversions.
func buildUint64MaxEpochDB() []byte {
	const nodeCount = 1
	const recordValue = nodeCount + 16

	buf := make([]byte, 1024)
	pos := 0

	pos += writeSearchTree(buf[pos:], recordValue)

	// 16-byte null separator
	pos += dataSeparatorSize

	// Data: a simple map with one string entry
	pos += writeMap(buf[pos:], 1)
	pos += writeString(buf[pos:], "ip")
	pos += writeString(buf[pos:], "test")

	pos += writeMetadataBlock(buf[pos:], nodeCount, ^uint64(0))

	return buf[:pos]
Force-pushed from 3db65ec to f6e7af9.
Add --bad-data flag to write-test-data that generates intentionally malformed MMDB files for testing error handling in reader implementations.

New generators:
- Oversized array: claims 1M elements, has 2 (raw binary)
- Oversized map: claims 1M entries, has 1 (raw binary)
- UINT64_MAX build_epoch: extreme metadata value (raw binary)
- Deep nesting: 600-level nested maps via mmdbwriter

The raw binary approach is necessary for 3 of 4 databases because mmdbwriter validates data structures and can't represent UINT64_MAX as a build epoch (its BuildEpoch field is int64).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Generated via: write-test-data --bad-data bad-data/libmaxminddb

Files added:
- libmaxminddb-oversized-array.mmdb: array claiming 1M elements, 2 present
- libmaxminddb-oversized-map.mmdb: map claiming 1M entries, 1 present
- libmaxminddb-deep-nesting.mmdb: 600-level nested maps (exceeds 512 depth limit)
- libmaxminddb-uint64-max-epoch.mmdb: valid DB with UINT64_MAX build_epoch

The first three should produce MMDB_INVALID_DATA_ERROR in libmaxminddb. The epoch database is structurally valid but exercises overflow in time conversions across reader implementations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add two new bad-data generators for libmaxminddb testing:
- corrupt-search-tree: metadata claims 100 nodes but file has only 1
- deep-array-nesting: 600-level nested arrays exceeding depth limit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
findRepoRoot() walks up from cwd looking for the go.mod belonging to this module, then uses the result to default -source, -target, and -bad-data flags. This allows zero-flag invocation from anywhere inside the repo tree while still allowing explicit overrides. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document zero-flag usage and flag overrides for write-test-data. Update copyright year to 2026. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from f6e7af9 to c0f5ef9.
pkg/writer/baddata.go (outdated)
if err != nil {
	return fmt.Errorf("creating file: %w", err)
}
defer outputFile.Close()
Would we want to error check this one, given it's writing?
# Generating Test Data

The `write-test-data` command generates the MMDB test files under `test-data/`
and `bad-data/`.
It looks like this'll be bad-data/libmaxminddb currently.
Yeah. There is a pre-existing pattern of putting these bad databases in a subdir based on the implementation where it exposed a bug. Most of the existing databases are static files that were found via fuzzing. Presumably a future database made by the program could be under maxminddb-rust or something.
Summary
- Add --bad-data flag to the write-test-data command for generating intentionally malformed MMDB files
- Port mmdb_test_writer.h to Go as rawmmdb.go for crafting raw binary MMDB files
- Generate the new databases under bad-data/libmaxminddb/

Context
PR maxmind/libmaxminddb#416 reviewer suggested moving crafted test databases here so other reader implementations (Go, Python, etc.) can test against them.
Databases
- libmaxminddb-oversized-array.mmdb: MMDB_INVALID_DATA_ERROR from get_entry_data_list
- libmaxminddb-oversized-map.mmdb: MMDB_INVALID_DATA_ERROR from get_entry_data_list
- libmaxminddb-deep-nesting.mmdb: MMDB_INVALID_DATA_ERROR from depth limit (512)
- libmaxminddb-uint64-max-epoch.mmdb: structurally valid (BuildEpoch is int64 in mmdbwriter)

Test plan
- go build ./cmd/write-test-data compiles
- go vet ./... passes
- make check passes with these files (all 27 tests green)

🤖 Generated with Claude Code