Add crafted bad-data MMDB files for libmaxminddb (#219)

Merged
horgh merged 7 commits into main from greg/eng-4239 on Feb 23, 2026
Conversation

@oschwald
Member

Summary

  • Add --bad-data flag to write-test-data command for generating intentionally malformed MMDB files
  • Port libmaxminddb's mmdb_test_writer.h to Go as rawmmdb.go for crafting raw binary MMDB files
  • Generate 4 new bad-data databases in bad-data/libmaxminddb/

Context

A reviewer on PR maxmind/libmaxminddb#416 suggested moving the crafted test databases into this repository so that other reader implementations (Go, Python, etc.) can test against them.

Databases

| File | Error | Approach |
| --- | --- | --- |
| `libmaxminddb-oversized-array.mmdb` | `MMDB_INVALID_DATA_ERROR` from `get_entry_data_list` | Raw binary (mmdbwriter validates sizes) |
| `libmaxminddb-oversized-map.mmdb` | `MMDB_INVALID_DATA_ERROR` from `get_entry_data_list` | Raw binary |
| `libmaxminddb-deep-nesting.mmdb` | `MMDB_INVALID_DATA_ERROR` from depth limit (512) | mmdbwriter (structurally valid, 600 levels) |
| `libmaxminddb-uint64-max-epoch.mmdb` | No reader error; exercises overflow in time conversions | Raw binary (`BuildEpoch` is `int64` in mmdbwriter) |
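As background for the "Raw binary" column: an MMDB control byte stores a size field in its low 5 bits, with escape cases for larger sizes, and the oversized generators rely on the largest case (31) to declare a size far bigger than the payload that follows. The following is a hedged sketch of decoding those size cases for a non-extended type such as a map; the function name is illustrative, not part of this PR:

```go
package main

import "fmt"

// decodeSize sketches the MMDB spec's size cases for a control byte whose
// type is stored inline (e.g. type 7, map). Case 31 is the one the
// oversized-array/map generators exploit: 65821 plus a 3-byte value.
func decodeSize(buf []byte) uint32 {
	sizeField := uint32(buf[0] & 0x1f)
	switch {
	case sizeField < 29:
		return sizeField // size stored directly in the control byte
	case sizeField == 29:
		return 29 + uint32(buf[1]) // one extra size byte
	case sizeField == 30:
		return 285 + uint32(buf[1])<<8 + uint32(buf[2]) // two extra bytes
	default: // 31: three extra bytes
		return 65821 + uint32(buf[1])<<16 + uint32(buf[2])<<8 + uint32(buf[3])
	}
}

func main() {
	// The header writeLargeMap emits for a claimed size of 1,000,000:
	// control byte (7 << 5) | 31, then the adjusted size in 3 bytes.
	adjusted := uint32(1_000_000 - 65821)
	hdr := []byte{(7 << 5) | 31, byte(adjusted >> 16), byte(adjusted >> 8), byte(adjusted)}
	fmt.Println(decodeSize(hdr)) // prints 1000000
}
```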

Test plan

  • `go build ./cmd/write-test-data` compiles
  • `go vet ./...` passes
  • Generator produces identical output on repeated runs (deterministic)
  • libmaxminddb `make check` passes with these files (all 27 tests green)

🤖 Generated with Claude Code

@gemini-code-assist

Summary of Changes

Hello @oschwald, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces functionality to generate a suite of intentionally malformed MaxMind DB (MMDB) files. The primary goal is to provide robust test cases for other MMDB reader implementations, ensuring they can gracefully handle corrupt or extreme data structures without crashing or encountering unhandled errors. This enhancement allows for more comprehensive testing of error handling mechanisms across various MaxMind DB client libraries.

Highlights

  • New --bad-data flag: A new command-line flag --bad-data has been added to the write-test-data command, allowing the generation of intentionally malformed MMDB files into a specified directory.
  • Raw MMDB binary encoding helpers: A new Go file pkg/writer/rawmmdb.go was added, porting libmaxminddb's mmdb_test_writer.h functionality. This provides low-level binary encoding functions to craft specific malformed MMDB structures that cannot be created with the standard mmdbwriter.
  • Generation of bad-data databases: Four new bad-data databases are now generated: libmaxminddb-oversized-array.mmdb, libmaxminddb-oversized-map.mmdb, libmaxminddb-deep-nesting.mmdb, and libmaxminddb-uint64-max-epoch.mmdb. These files are designed to test error handling and edge cases in MMDB reader implementations.
  • Deep nesting database: A database with 600 levels of nested maps is created, specifically to exceed libmaxminddb's MAXIMUM_DATA_STRUCTURE_DEPTH and trigger MMDB_INVALID_DATA_ERROR.
  • UINT64_MAX epoch database: A database with build_epoch set to UINT64_MAX is generated to test for potential overflow issues in time conversions within reader implementations.
Changelog
  • bad-data/README.md
    • Added a note explaining the purpose and potential impact of the libmaxminddb-uint64-max-epoch.mmdb file.
  • cmd/write-test-data/main.go
    • Introduced a new command-line flag --bad-data to specify the output directory for malformed MMDB files.
    • Integrated a call to w.WriteBadDataDBs to generate these files if the --bad-data flag is provided.
  • pkg/writer/baddata.go
    • Created WriteBadDataDBs to orchestrate the generation of various malformed MMDB files.
    • Implemented writeRawDB for writing raw byte slices as MMDB files.
    • Implemented writeDeepNestingDB to create an MMDB with 600 levels of nested maps, exceeding typical depth limits.
  • pkg/writer/rawmmdb.go
    • Introduced low-level functions like writeMap, writeString, writeUint16, writeUint32, writeUint64, writeMetaKey, writeLargeArray, writeLargeMap, writeEmptyArray, writeSearchTree, and writeMetadataBlock for precise binary MMDB construction.
    • Implemented buildOversizedArrayDB to create an MMDB with an array claiming 1,000,000 elements but containing only 2.
    • Implemented buildOversizedMapDB to create an MMDB with a map claiming 1,000,000 entries but containing only 1 key-value pair.
    • Implemented buildUint64MaxEpochDB to create an MMDB with a build_epoch set to UINT64_MAX, testing time conversion overflows.
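To give a feel for these low-level helpers, here is a minimal sketch of string encoding per the MMDB spec: type 2 (UTF-8 string) occupies the top 3 bits of the control byte, and for strings shorter than 29 bytes the size sits in the low 5 bits. The actual `writeString` in this PR may differ in name and signature:

```go
package main

import "fmt"

// writeShortString encodes a UTF-8 string shorter than 29 bytes per the
// MMDB spec: one control byte (type 2 in the top 3 bits, length in the
// low 5 bits) followed by the raw bytes. Returns the bytes written.
func writeShortString(buf []byte, s string) int {
	buf[0] = (2 << 5) | byte(len(s)) // type 2 = string, inline size
	copy(buf[1:], s)
	return 1 + len(s)
}

func main() {
	buf := make([]byte, 8)
	n := writeShortString(buf, "ip")
	fmt.Printf("% x\n", buf[:n]) // prints "42 69 70"
}
```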
Activity
  • The pull request description includes a detailed summary of changes, context, a table of new databases with their expected errors and approaches, and a test plan.
  • The pull request was generated using Claude Code, indicating an AI-assisted creation process.

@gemini-code-assist (bot) left a comment


Code Review

The pull request introduces functionality to generate intentionally malformed MMDB files for testing error handling in reader implementations. This includes adding a --bad-data flag to the write-test-data command, porting mmdb_test_writer.h to Go, and generating four new bad-data databases. The changes are well-structured and address the stated goal of providing robust test cases for MaxMind DB readers. The new pkg/writer/baddata.go and pkg/writer/rawmmdb.go files encapsulate the logic for creating these specialized databases, which is a good separation of concerns. The README.md update provides important context for one of the new bad-data files. Overall, the changes are a valuable addition for improving the robustness of MaxMind DB reader implementations.

Comment on lines +6 to +8
Note: `libmaxminddb/libmaxminddb-uint64-max-epoch.mmdb` contains a valid
database structure with `build_epoch` set to `UINT64_MAX`. It may not produce
a reader error but can cause overflow in time type conversions.

Severity: medium

The note about libmaxminddb-uint64-max-epoch.mmdb is very helpful for understanding the purpose of this specific bad-data file. It clarifies that it's not about structural invalidity but rather an extreme value that can cause overflow in time conversions, which is an important distinction for testers.

```go
func main() {
	source := flag.String("source", "", "Source data directory")
	target := flag.String("target", "", "Destination directory for the generated mmdb files")
	badData := flag.String("bad-data", "", "Destination directory for generated bad mmdb files")
```

Severity: medium

Adding a dedicated badData flag for the destination directory of malformed MMDB files is a good approach. It keeps the generation of these specific test files separate and clearly identifiable, which is important for managing test data.

Comment on lines +70 to +74
```go
	if *badData != "" {
		if err := w.WriteBadDataDBs(*badData); err != nil {
			fmt.Printf("writing bad data test databases: %+v\n", err)
			os.Exit(1)
		}
	}
```

Severity: medium

The conditional execution of WriteBadDataDBs based on the badData flag ensures that these potentially large or specialized files are only generated when explicitly requested. This prevents unnecessary overhead during regular test data generation.

Comment on lines 16 to 42
```go
func (w *Writer) WriteBadDataDBs(target string) error {
	//nolint:gosec // not security sensitive.
	if err := os.MkdirAll(target, os.ModePerm); err != nil {
		return fmt.Errorf("creating bad-data directory: %w", err)
	}

	// Raw binary databases — can't use mmdbwriter because the data is
	// intentionally invalid or uses values mmdbwriter can't represent.
	for _, db := range []struct {
		name string
		data []byte
	}{
		{"libmaxminddb-oversized-array.mmdb", buildOversizedArrayDB()},
		{"libmaxminddb-oversized-map.mmdb", buildOversizedMapDB()},
		{"libmaxminddb-uint64-max-epoch.mmdb", buildUint64MaxEpochDB()},
	} {
		if err := writeRawDB(target, db.name, db.data); err != nil {
			return fmt.Errorf("writing %s: %w", db.name, err)
		}
	}

	// Deep nesting uses mmdbwriter — structurally valid, just 600 levels deep.
	if err := writeDeepNestingDB(target); err != nil {
		return fmt.Errorf("writing deep nesting database: %w", err)
	}

	return nil
}
```

Severity: medium

The WriteBadDataDBs function effectively orchestrates the creation of different types of malformed databases. Separating the raw binary databases from the mmdbwriter-generated ones (like writeDeepNestingDB) is a clear and logical division, reflecting the different approaches needed to craft these files.

Comment on lines +72 to +74
```go
	for range depth {
		value = mmdbtype.Map{"a": value}
	}
```

Severity: medium

The for range depth loop is a concise and idiomatic way to create the deeply nested map structure. This effectively generates the desired depth for testing the MAXIMUM_DATA_STRUCTURE_DEPTH limit.

Comment on lines 58 to 65
```go
func writeLargeArray(buf []byte, size uint32) int {
	adjusted := size - 65821
	buf[0] = (0 << 5) | 31 // extended type, size = case 31
	buf[1] = 4             // extended type: 7 + 4 = 11 (array)
	buf[2] = byte(adjusted >> 16)
	buf[3] = byte(adjusted >> 8)
	buf[4] = byte(adjusted)
	return 5
}
```

Severity: medium

The writeLargeArray function correctly implements the case-31 size encoding for arrays, which is crucial for crafting intentionally oversized arrays that mmdbwriter cannot produce directly. This demonstrates a deep understanding of the MMDB format specification.
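To see the round-trip, the five bytes emitted for an array of claimed size 1,000,000 can be decoded back into a type and a size. This sketch assumes the case-31 form and uses an illustrative function name; it is not a general MMDB decoder:

```go
package main

import "fmt"

// decodeCase31Header reverses writeLargeArray's encoding: control-byte
// type 0 means "extended" (real type = 7 + the next byte), and size
// case 31 means 65821 plus the following 3-byte big-endian value.
func decodeCase31Header(b []byte) (typ byte, size uint32) {
	typ = b[0] >> 5
	i := 1
	if typ == 0 { // extended type: real type follows the control byte
		typ = 7 + b[i]
		i++
	}
	size = 65821 + uint32(b[i])<<16 + uint32(b[i+1])<<8 + uint32(b[i+2])
	return typ, size
}

func main() {
	// The bytes writeLargeArray would produce for size 1,000,000.
	adjusted := uint32(1_000_000 - 65821)
	b := []byte{(0 << 5) | 31, 4, byte(adjusted >> 16), byte(adjusted >> 8), byte(adjusted)}
	typ, size := decodeCase31Header(b)
	fmt.Println(typ, size) // prints "11 1000000" (type 11 = array)
}
```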

Comment on lines 70 to 76
```go
func writeLargeMap(buf []byte, size uint32) int {
	adjusted := size - 65821
	buf[0] = (7 << 5) | 31 // type 7 (map), size = case 31
	buf[1] = byte(adjusted >> 16)
	buf[2] = byte(adjusted >> 8)
	buf[3] = byte(adjusted)
	return 4
}
```

Severity: medium

Similarly, writeLargeMap correctly handles the case-31 size encoding for maps. This is essential for creating malformed maps that claim a large number of entries but contain fewer, which can expose vulnerabilities in reader implementations.

Comment on lines +138 to +159
```go
// buildOversizedArrayDB creates a complete MMDB with an array claiming
// 1,000,000 elements but containing only 2 actual entries.
func buildOversizedArrayDB() []byte {
	const nodeCount = 1
	const recordValue = nodeCount + 16

	buf := make([]byte, 1024)
	pos := 0

	pos += writeSearchTree(buf[pos:], recordValue)

	// 16-byte null separator
	pos += dataSeparatorSize

	// Data: array claiming 1M elements, only 2 strings present
	pos += writeLargeArray(buf[pos:], 1_000_000)
	pos += writeString(buf[pos:], "x")
	pos += writeString(buf[pos:], "y")

	pos += writeMetadataBlock(buf[pos:], nodeCount, 1_000_000_000)

	return buf[:pos]
}
```

Severity: medium

The buildOversizedArrayDB function effectively creates a database with an array claiming 1,000,000 elements but only containing two. This is a perfect example of a crafted bad-data file designed to test error handling for array size mismatches.

Comment on lines +162 to +183
```go
// buildOversizedMapDB creates a complete MMDB with a map claiming
// 1,000,000 entries but containing only 1 key-value pair.
func buildOversizedMapDB() []byte {
	const nodeCount = 1
	const recordValue = nodeCount + 16

	buf := make([]byte, 1024)
	pos := 0

	pos += writeSearchTree(buf[pos:], recordValue)

	// 16-byte null separator
	pos += dataSeparatorSize

	// Data: map claiming 1M entries, only 1 k/v pair present
	pos += writeLargeMap(buf[pos:], 1_000_000)
	pos += writeString(buf[pos:], "k")
	pos += writeString(buf[pos:], "v")

	pos += writeMetadataBlock(buf[pos:], nodeCount, 1_000_000_000)

	return buf[:pos]
}
```

Severity: medium

The buildOversizedMapDB function similarly creates a database with a map claiming 1,000,000 entries but only containing one key-value pair. This is another excellent test case for map size validation in reader implementations.

Comment on lines +186 to +208
```go
// buildUint64MaxEpochDB creates a complete MMDB with build_epoch set to
// UINT64_MAX (18446744073709551615). The database is structurally valid
// but the extreme epoch value can cause overflow in time conversions.
func buildUint64MaxEpochDB() []byte {
	const nodeCount = 1
	const recordValue = nodeCount + 16

	buf := make([]byte, 1024)
	pos := 0

	pos += writeSearchTree(buf[pos:], recordValue)

	// 16-byte null separator
	pos += dataSeparatorSize

	// Data: a simple map with one string entry
	pos += writeMap(buf[pos:], 1)
	pos += writeString(buf[pos:], "ip")
	pos += writeString(buf[pos:], "test")

	pos += writeMetadataBlock(buf[pos:], nodeCount, ^uint64(0))

	return buf[:pos]
}
```

Severity: medium

The buildUint64MaxEpochDB function correctly sets the build_epoch to UINT64_MAX. This is a valuable test case for checking how different reader implementations handle extreme epoch values, particularly regarding potential overflows in time conversions, as noted in the README.md.

@oschwald force-pushed the greg/eng-4239 branch 2 times, most recently from 3db65ec to f6e7af9 on February 23, 2026 at 20:14
oschwald and others added 5 commits on February 23, 2026 at 12:28
Add --bad-data flag to write-test-data that generates intentionally
malformed MMDB files for testing error handling in reader implementations.

New generators:
- Oversized array: claims 1M elements, has 2 (raw binary)
- Oversized map: claims 1M entries, has 1 (raw binary)
- UINT64_MAX build_epoch: extreme metadata value (raw binary)
- Deep nesting: 600-level nested maps via mmdbwriter

The raw binary approach is necessary for 3 of 4 databases because
mmdbwriter validates data structures and can't represent UINT64_MAX
as a build epoch (its BuildEpoch field is int64).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Generated via: write-test-data --bad-data bad-data/libmaxminddb

Files added:
- libmaxminddb-oversized-array.mmdb: array claiming 1M elements, 2 present
- libmaxminddb-oversized-map.mmdb: map claiming 1M entries, 1 present
- libmaxminddb-deep-nesting.mmdb: 600-level nested maps (exceeds 512 depth limit)
- libmaxminddb-uint64-max-epoch.mmdb: valid DB with UINT64_MAX build_epoch

The first three should produce MMDB_INVALID_DATA_ERROR in libmaxminddb.
The epoch database is structurally valid but exercises overflow in time
conversions across reader implementations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add two new bad-data generators for libmaxminddb testing:
- corrupt-search-tree: metadata claims 100 nodes but file has only 1
- deep-array-nesting: 600-level nested arrays exceeding depth limit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
findRepoRoot() walks up from cwd looking for the go.mod belonging to
this module, then uses the result to default -source, -target, and
-bad-data flags. This allows zero-flag invocation from anywhere inside
the repo tree while still allowing explicit overrides.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document zero-flag usage and flag overrides for write-test-data.
Update copyright year to 2026.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```go
	if err != nil {
		return fmt.Errorf("creating file: %w", err)
	}
	defer outputFile.Close()
```
Contributor


Would we want to error check this one, given it's writing?

```go
	if err != nil {
		return fmt.Errorf("creating file: %w", err)
	}
	defer outputFile.Close()
```
Contributor


Likewise

# Generating Test Data

The `write-test-data` command generates the MMDB test files under `test-data/`
and `bad-data/`.
Contributor


It looks like this'll be bad-data/libmaxminddb currently.

Member Author


Yeah. There is a pre-existing pattern of putting these bad databases in a subdir based on the implementation where it exposed a bug. Most of the existing databases are static files that were found via fuzzing. Presumably a future database made by the program could be under maxminddb-rust or something.

oschwald and others added 2 commits on February 23, 2026 at 14:17
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@horgh merged commit ddee32e into main on Feb 23, 2026
12 checks passed
@horgh deleted the greg/eng-4239 branch on February 23, 2026 at 22:37