Add crafted bad-data MMDB files for libmaxminddb#219
Conversation
Summary of Changes

Hello @oschwald, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces functionality to generate a suite of intentionally malformed MaxMind DB (MMDB) files. The primary goal is to provide robust test cases for other MMDB reader implementations, ensuring they can gracefully handle corrupt or extreme data structures without crashing or encountering unhandled errors. This enhancement allows for more comprehensive testing of error-handling mechanisms across various MaxMind DB client libraries.
Code Review
The pull request introduces functionality to generate intentionally malformed MMDB files for testing error handling in reader implementations. This includes adding a --bad-data flag to the write-test-data command, porting mmdb_test_writer.h to Go, and generating four new bad-data databases. The changes are well-structured and address the stated goal of providing robust test cases for MaxMind DB readers. The new pkg/writer/baddata.go and pkg/writer/rawmmdb.go files encapsulate the logic for creating these specialized databases, which is a good separation of concerns. The README.md update provides important context for one of the new bad-data files. Overall, the changes are a valuable addition for improving the robustness of MaxMind DB reader implementations.
Note: `libmaxminddb/libmaxminddb-uint64-max-epoch.mmdb` contains a valid
database structure with `build_epoch` set to `UINT64_MAX`. It may not produce
a reader error but can cause overflow in time type conversions.
The note about libmaxminddb-uint64-max-epoch.mmdb is very helpful for understanding the purpose of this specific bad-data file. It clarifies that it's not about structural invalidity but rather an extreme value that can cause overflow in time conversions, which is an important distinction for testers.
cmd/write-test-data/main.go (outdated)
func main() {
	source := flag.String("source", "", "Source data directory")
	target := flag.String("target", "", "Destination directory for the generated mmdb files")
	badData := flag.String("bad-data", "", "Destination directory for generated bad mmdb files")

	if *badData != "" {
		if err := w.WriteBadDataDBs(*badData); err != nil {
			fmt.Printf("writing bad data test databases: %+v\n", err)
			os.Exit(1)
		}
	}
func (w *Writer) WriteBadDataDBs(target string) error {
	//nolint:gosec // not security sensitive.
	if err := os.MkdirAll(target, os.ModePerm); err != nil {
		return fmt.Errorf("creating bad-data directory: %w", err)
	}

	// Raw binary databases — can't use mmdbwriter because the data is
	// intentionally invalid or uses values mmdbwriter can't represent.
	for _, db := range []struct {
		name string
		data []byte
	}{
		{"libmaxminddb-oversized-array.mmdb", buildOversizedArrayDB()},
		{"libmaxminddb-oversized-map.mmdb", buildOversizedMapDB()},
		{"libmaxminddb-uint64-max-epoch.mmdb", buildUint64MaxEpochDB()},
	} {
		if err := writeRawDB(target, db.name, db.data); err != nil {
			return fmt.Errorf("writing %s: %w", db.name, err)
		}
	}

	// Deep nesting uses mmdbwriter — structurally valid, just 600 levels deep.
	if err := writeDeepNestingDB(target); err != nil {
		return fmt.Errorf("writing deep nesting database: %w", err)
	}

	return nil
The WriteBadDataDBs function effectively orchestrates the creation of different types of malformed databases. Separating the raw binary databases from the mmdbwriter-generated ones (like writeDeepNestingDB) is a clear and logical division, reflecting the different approaches needed to craft these files.
for range depth {
	value = mmdbtype.Map{"a": value}
}
func writeLargeArray(buf []byte, size uint32) int {
	adjusted := size - 65821
	buf[0] = (0 << 5) | 31 // extended type, size = case 31
	buf[1] = 4             // extended type: 7 + 4 = 11 (array)
	buf[2] = byte(adjusted >> 16)
	buf[3] = byte(adjusted >> 8)
	buf[4] = byte(adjusted)
	return 5
func writeLargeMap(buf []byte, size uint32) int {
	adjusted := size - 65821
	buf[0] = (7 << 5) | 31 // type 7 (map), size = case 31
	buf[1] = byte(adjusted >> 16)
	buf[2] = byte(adjusted >> 8)
	buf[3] = byte(adjusted)
	return 4
// buildOversizedArrayDB creates a complete MMDB with an array claiming
// 1,000,000 elements but containing only 2 actual entries.
func buildOversizedArrayDB() []byte {
	const nodeCount = 1
	const recordValue = nodeCount + 16

	buf := make([]byte, 1024)
	pos := 0

	pos += writeSearchTree(buf[pos:], recordValue)

	// 16-byte null separator
	pos += dataSeparatorSize

	// Data: array claiming 1M elements, only 2 strings present
	pos += writeLargeArray(buf[pos:], 1_000_000)
	pos += writeString(buf[pos:], "x")
	pos += writeString(buf[pos:], "y")

	pos += writeMetadataBlock(buf[pos:], nodeCount, 1_000_000_000)

	return buf[:pos]
// buildOversizedMapDB creates a complete MMDB with a map claiming
// 1,000,000 entries but containing only 1 key-value pair.
func buildOversizedMapDB() []byte {
	const nodeCount = 1
	const recordValue = nodeCount + 16

	buf := make([]byte, 1024)
	pos := 0

	pos += writeSearchTree(buf[pos:], recordValue)

	// 16-byte null separator
	pos += dataSeparatorSize

	// Data: map claiming 1M entries, only 1 k/v pair present
	pos += writeLargeMap(buf[pos:], 1_000_000)
	pos += writeString(buf[pos:], "k")
	pos += writeString(buf[pos:], "v")

	pos += writeMetadataBlock(buf[pos:], nodeCount, 1_000_000_000)

	return buf[:pos]
// buildUint64MaxEpochDB creates a complete MMDB with build_epoch set to
// UINT64_MAX (18446744073709551615). The database is structurally valid
// but the extreme epoch value can cause overflow in time conversions.
func buildUint64MaxEpochDB() []byte {
	const nodeCount = 1
	const recordValue = nodeCount + 16

	buf := make([]byte, 1024)
	pos := 0

	pos += writeSearchTree(buf[pos:], recordValue)

	// 16-byte null separator
	pos += dataSeparatorSize

	// Data: a simple map with one string entry
	pos += writeMap(buf[pos:], 1)
	pos += writeString(buf[pos:], "ip")
	pos += writeString(buf[pos:], "test")

	pos += writeMetadataBlock(buf[pos:], nodeCount, ^uint64(0))

	return buf[:pos]
Force-pushed from 3db65ec to f6e7af9.
Add --bad-data flag to write-test-data that generates intentionally malformed MMDB files for testing error handling in reader implementations.

New generators:
- Oversized array: claims 1M elements, has 2 (raw binary)
- Oversized map: claims 1M entries, has 1 (raw binary)
- UINT64_MAX build_epoch: extreme metadata value (raw binary)
- Deep nesting: 600-level nested maps via mmdbwriter

The raw binary approach is necessary for 3 of 4 databases because mmdbwriter validates data structures and can't represent UINT64_MAX as a build epoch (its BuildEpoch field is int64).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Generated via: write-test-data --bad-data bad-data/libmaxminddb

Files added:
- libmaxminddb-oversized-array.mmdb: array claiming 1M elements, 2 present
- libmaxminddb-oversized-map.mmdb: map claiming 1M entries, 1 present
- libmaxminddb-deep-nesting.mmdb: 600-level nested maps (exceeds 512 depth limit)
- libmaxminddb-uint64-max-epoch.mmdb: valid DB with UINT64_MAX build_epoch

The first three should produce MMDB_INVALID_DATA_ERROR in libmaxminddb. The epoch database is structurally valid but exercises overflow in time conversions across reader implementations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add two new bad-data generators for libmaxminddb testing:
- corrupt-search-tree: metadata claims 100 nodes but file has only 1
- deep-array-nesting: 600-level nested arrays exceeding depth limit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
findRepoRoot() walks up from cwd looking for the go.mod belonging to this module, then uses the result to default -source, -target, and -bad-data flags. This allows zero-flag invocation from anywhere inside the repo tree while still allowing explicit overrides. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document zero-flag usage and flag overrides for write-test-data. Update copyright year to 2026. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from f6e7af9 to c0f5ef9.
pkg/writer/baddata.go (outdated)
if err != nil {
	return fmt.Errorf("creating file: %w", err)
}
defer outputFile.Close()
Would we want to error check this one, given it's writing?
# Generating Test Data

The `write-test-data` command generates the MMDB test files under `test-data/`
and `bad-data/`.
It looks like this'll be bad-data/libmaxminddb currently.
Yeah. There is a pre-existing pattern of putting these bad databases in a subdir based on the implementation where it exposed a bug. Most of the existing databases are static files that were found via fuzzing. Presumably a future database made by the program could be under maxminddb-rust or something.
Summary
- Add --bad-data flag to the write-test-data command for generating intentionally malformed MMDB files
- Port mmdb_test_writer.h to Go as rawmmdb.go for crafting raw binary MMDB files
- Generate the new databases under bad-data/libmaxminddb/

Context
PR maxmind/libmaxminddb#416 reviewer suggested moving crafted test databases here so other reader implementations (Go, Python, etc.) can test against them.
Databases
- libmaxminddb-oversized-array.mmdb: MMDB_INVALID_DATA_ERROR from get_entry_data_list
- libmaxminddb-oversized-map.mmdb: MMDB_INVALID_DATA_ERROR from get_entry_data_list
- libmaxminddb-deep-nesting.mmdb: MMDB_INVALID_DATA_ERROR from depth limit (512)
- libmaxminddb-uint64-max-epoch.mmdb: structurally valid (BuildEpoch is int64 in mmdbwriter)

Test plan
- go build ./cmd/write-test-data compiles
- go vet ./... passes
- make check passes with these files (all 27 tests green)

🤖 Generated with Claude Code