Harden libmaxminddb against crafted/malformed databases#416

Merged
horgh merged 27 commits into main from greg/eng-4239
Feb 23, 2026
Conversation

@oschwald (Member)

Summary

Addresses ENG-4239. Nine targeted fixes for integer overflows, NULL dereferences, memory amplification, and undefined behavior found during a security audit of libmaxminddb.

Library fixes (src/maxminddb.c):

  • Validate array/map size in get_entry_data_list against remaining data (memory amplification DoS)
  • Replace GetFileSize with GetFileSizeEx on Windows to handle >4GB files correctly
  • Fix integer overflow in MMDB_read_node/find_ipv4_start_node pointer arithmetic (cast to uint64_t)
  • Fix integer overflow in find_address_in_search_tree bounds check

mmdblookup fixes (bin/mmdblookup.c):

  • Fix printf format specifiers (%i → PRIu32, %llu → PRIu64) in metadata dump
  • Fix gmtime NULL dereference on out-of-range build_epoch
  • Fix print_indentation stack overflow on negative indent

Defensive hardening:

  • NULL file_content after munmap to prevent double-unmap
  • Initialize *mmdb_error in MMDB_lookup_string on getaddrinfo failure

Each fix has a dedicated test in t/.

Test plan

  • make check — all 27 tests pass
  • Review each commit individually (one logical fix per commit)
  • Verify Windows build compiles (the GetFileSizeEx and error code changes are #ifdef-guarded)

🤖 Generated with Claude Code

MMDB_read_node() used > instead of >= when checking node_number against
node_count, allowing node_number == node_count through the bounds check.
Since nodes are 0-indexed, the last valid node is node_count - 1, so
node_count itself is one past the end of the search tree. Reading at
that offset hits the 16-byte null separator between the search tree and
data section, producing records that decode as 0. When converted to a
data offset via data_section_offset_for_record(), this underflows
(0 - node_count - 16), resulting in an invalid entry offset.

The fix changes > to >= so that node_number == node_count is correctly
rejected with MMDB_INVALID_NODE_NUMBER_ERROR. A test has been added to
verify this boundary condition.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
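The corrected check can be sketched in isolation. This is a minimal stand-in, not the library code: the error constants are local placeholders, and the real MMDB_read_node takes an MMDB_s and a record pointer as well.

```c
#include <stdint.h>

#define SKETCH_SUCCESS 0
#define SKETCH_INVALID_NODE_NUMBER_ERROR 1

/* Nodes are 0-indexed, so the last valid node is node_count - 1.
 * The old `>` comparison let node_number == node_count through,
 * reading the 16-byte separator past the end of the search tree. */
static int check_node_number(uint32_t node_number, uint32_t node_count) {
    if (node_number >= node_count) {
        return SKETCH_INVALID_NODE_NUMBER_ERROR;
    }
    return SKETCH_SUCCESS;
}
```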
@gemini-code-assist
Copy link

Summary of Changes

Hello @oschwald, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the security and stability of the libmaxminddb library and its associated mmdblookup utility. It addresses several critical issues, including potential denial-of-service vectors, memory corruption, and undefined behavior, by implementing robust validation checks and correcting arithmetic operations. The changes ensure that the library can gracefully handle malformed database files and edge cases, making it more reliable for production environments.

Highlights

  • Security Hardening: Implemented nine targeted fixes to address potential security vulnerabilities such as integer overflows, NULL dereferences, memory amplification, and undefined behavior, identified during a security audit.
  • Robustness Improvements: Enhanced the library's resilience against malformed or crafted MaxMind DB files by validating array/map sizes, handling large file sizes on Windows, and preventing stack overflows from deeply nested data structures.
  • API Stability: Improved the mmdblookup utility by fixing printf format specifier mismatches, preventing gmtime NULL dereferences, and ensuring mmdb_error is always initialized on getaddrinfo failures.
  • Comprehensive Testing: Added dedicated test cases for each fix, covering scenarios like bad data sizes, out-of-range build epochs, negative indentation, double-close operations, getaddrinfo failures, maximum recursion depth, and overflow bounds.
Changelog
  • Changes.md
    • Updated with a comprehensive list of security and stability fixes for the 1.13.0 release.
  • bin/mmdblookup.c
    • Included <inttypes.h> for portable integer types.
    • Adjusted alloca size for the program name buffer on Windows to prevent an off-by-one error.
    • Updated printf format specifiers in metadata dump to use PRIu32, PRIu16, and PRIu64 for correct type matching.
    • Added a check for gmtime() return value to prevent NULL dereference when displaying metadata for an out-of-range build_epoch.
  • src/maxminddb.c
    • Modified skip_map_or_array function signature to include a depth parameter for recursion tracking.
    • Changed the type of size from DWORD to ssize_t in map_file for broader compatibility.
    • Replaced GetFileSize with GetFileSizeEx and added size validation in map_file to correctly handle files larger than 4GB on Windows.
    • Ensured *mmdb_error is initialized to MMDB_SUCCESS when getaddrinfo fails in MMDB_lookup_string.
    • Cast node_count to uint64_t in find_address_in_search_tree bounds check to prevent integer overflow.
    • Cast node_value to uint64_t in find_ipv4_start_node pointer arithmetic to prevent integer overflow.
    • Corrected the bounds check in MMDB_read_node from node_number > node_count to node_number >= node_count and cast node_number to uint64_t.
    • Added a recursion depth limit to skip_map_or_array to prevent stack overflows from deeply nested data structures.
    • Implemented size validation for array and map entries in get_entry_data_list to prevent memory amplification attacks.
    • Set mmdb->file_content to NULL after munmap in free_mmdb_struct to prevent double-unmap issues.
    • Initialized mmdb->metadata.languages.count and mmdb->metadata.description.count to 0 in their respective free functions.
    • Clamped negative indent values to 0 in print_indentation to prevent stack buffer overflows.
  • t/Makefile.am
    • Added new test programs: bad_data_size_t, bad_epoch_t, bad_indent_t, double_close_t, gai_error_t, max_depth_t, and overflow_bounds_t.
  • t/bad_data_size_t.c
    • Added a new test file to verify that crafted MMDB files with arrays or maps claiming large sizes but containing insufficient data are rejected.
  • t/bad_epoch_t.c
    • Added a new test file to ensure the library handles MMDB files with out-of-range build_epoch values without crashing.
  • t/bad_indent_t.c
    • Added a new test file to confirm that print_indentation correctly handles negative indent values by clamping them to zero.
  • t/double_close_t.c
    • Added a new test file to verify that calling MMDB_close multiple times on the same MMDB_s struct does not lead to crashes or undefined behavior.
  • t/gai_error_t.c
    • Added a new test file to ensure mmdb_error is properly set to MMDB_SUCCESS when getaddrinfo fails in MMDB_lookup_string.
  • t/max_depth_t.c
    • Added a new test file to validate the recursion depth limit for deeply nested maps and arrays, preventing stack overflows.
  • t/overflow_bounds_t.c
    • Added a new test file to confirm that 64-bit arithmetic is correctly used in bounds checks and that normal lookups continue to function as expected.
  • t/read_node_t.c
    • Added a new test function run_read_node_invalid_node_number_tests to verify that MMDB_read_node correctly rejects invalid node numbers.

@gemini-code-assist bot left a comment

Code Review

This pull request significantly hardens libmaxminddb against various forms of crafted or malformed databases, addressing potential security vulnerabilities such as integer overflows, NULL dereferences, memory amplification, and undefined behavior. The changes include robust validation checks, correct handling of large file sizes on Windows, proper type casting for arithmetic operations, and improved error handling. The addition of new test cases for these specific issues is also a great step towards ensuring the library's stability and security. Overall, these are crucial improvements for the library's resilience.

@oschwald force-pushed the greg/eng-4239 branch 2 times, most recently from 1a49a6f to 5f12a54, on February 20, 2026 at 21:34
oschwald and others added 12 commits February 20, 2026 13:52
skip_map_or_array(), used internally by MMDB_aget_value and
MMDB_get_value to skip past map/array values during path lookup, had
no recursion depth limit. A crafted MMDB file with deeply nested maps
or arrays (e.g. 200,000 levels) could cause unbounded recursion,
exhausting the stack and crashing the process.

The fix adds a depth parameter matching the existing
MAXIMUM_DATA_STRUCTURE_DEPTH (512) limit already enforced by
get_entry_data_list(). Callers pass an initial depth of 0, and each
recursive call increments it. When the limit is reached,
MMDB_INVALID_DATA_ERROR is returned.

A new test (max_depth_t) verifies that:
- A 600-level nested MMDB returns MMDB_INVALID_DATA_ERROR
- A 10-level nested MMDB works normally

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
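The depth-limited recursion pattern described above can be sketched with a toy walker. This is an illustration, not the library's skip_map_or_array: only the depth parameter and the bail-out mirror the actual fix, and the return constants are local stand-ins for MMDB_INVALID_DATA_ERROR and MMDB_SUCCESS.

```c
#define MAXIMUM_DATA_STRUCTURE_DEPTH 512

#define SKETCH_OK 0
#define SKETCH_INVALID_DATA (-1)

/* Each nested map/array level increments depth; once the shared
 * MAXIMUM_DATA_STRUCTURE_DEPTH limit is hit, the walk bails out
 * instead of recursing until the stack is exhausted. */
static int skip_nested(int remaining_levels, int depth) {
    if (depth >= MAXIMUM_DATA_STRUCTURE_DEPTH) {
        return SKETCH_INVALID_DATA;
    }
    if (remaining_levels == 0) {
        return SKETCH_OK;
    }
    return skip_nested(remaining_levels - 1, depth + 1);
}
```

Callers start at depth 0, matching the commit's description; a 600-level structure trips the limit while a 10-level one does not.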
On Windows, mmdblookup allocated strlen(argv[0]) bytes for the program
name buffer, but _splitpath writes fname + ext + '\0', which needs
strlen(argv[0]) + 1 bytes. This caused a one-byte stack buffer overflow
when the null terminator was written past the end of the alloca'd buffer.

The fix adds + 1 to the alloca size. This code path is Windows-only
(#ifdef _WIN32) and only affects the CLI tool, not the library.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
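The bug class is easy to state in isolation. This sketch is not the mmdblookup code (and _splitpath itself is Windows-only): it just shows the sizing rule the fix restores.

```c
#include <string.h>

/* A buffer that receives a copy of `s` -- or, as with _splitpath,
 * its fname + ext components plus a terminator -- needs
 * strlen(s) + 1 bytes. Allocating only strlen(s) puts the '\0'
 * one byte past the end of the buffer. */
static size_t program_name_buffer_size(const char *argv0) {
    return strlen(argv0) + 1; /* the + 1 is the fix */
}
```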
When getaddrinfo fails, *mmdb_error was never written, leaving callers
with an indeterminate value if they checked it without first checking
*gai_error. Now explicitly set to MMDB_SUCCESS in the GAI failure path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When MMDB_dump_entry_data_list() was called with a negative indent,
the value was cast to size_t producing a massive memset length. Clamp
negative values to 0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
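A simplified version of the clamp, with the size computation pulled into a helper so it can be checked directly (the helper name and the 1024-byte cap are taken from the shape of print_indentation; treat the exact structure as illustrative):

```c
/* A negative indent cast to size_t becomes a huge value; clamping
 * to 0 first keeps the memset length sane. The 1023 cap reflects a
 * fixed 1024-byte indentation buffer. */
static int clamp_indent(int i) {
    if (i < 0) {
        i = 0;
    }
    return i >= 1024 ? 1023 : i;
}
```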
free_mmdb_struct did not NULL file_content after unmapping, so a second
call to MMDB_close would double-munmap the same address (UB). Also zero
description.count so free_descriptions_metadata is safe on re-entry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gmtime() returns NULL for out-of-range time_t values. A crafted database
with build_epoch=UINT64_MAX would cause strftime to dereference NULL.
Now check the return value and display "unknown" for invalid epochs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
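The guarded conversion looks roughly like this. The buffer handling and format string here are illustrative, not the mmdblookup source:

```c
#include <stdio.h>
#include <time.h>

/* gmtime() returns NULL when the epoch is outside the range
 * representable in struct tm; check before handing the result
 * to strftime, which would otherwise dereference NULL. */
static const char *format_epoch(time_t epoch, char *buf, size_t buflen) {
    struct tm *tm = gmtime(&epoch);
    if (tm == NULL) {
        snprintf(buf, buflen, "out of range");
        return buf;
    }
    strftime(buf, buflen, "%F %T UTC", tm);
    return buf;
}
```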
The addition of node_count and data_section_size was performed in
uint32_t arithmetic. On very large databases where the sum exceeds
UINT32_MAX, this wraps and causes valid records to be rejected as
corrupt. Cast to uint64_t before addition.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use PRIu32/PRIu16/PRIu64 from <inttypes.h> instead of %i and %llu
for metadata fields. The previous specifiers were technically undefined
behavior and could corrupt subsequent variadic arguments on platforms
where uint64_t and unsigned long long have different representations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
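The portable pattern, shown on a hypothetical metadata line (field names and format text are made up for the example):

```c
#include <inttypes.h>
#include <stdio.h>

/* The PRIu* macros from <inttypes.h> expand to the correct
 * conversion specifier for each fixed-width type on the current
 * platform, avoiding the undefined behavior of a mismatched
 * %i or %llu in the variadic call. */
static int format_version(char *buf, size_t buflen,
                          uint16_t major, uint16_t minor, uint64_t epoch) {
    return snprintf(buf, buflen,
                    "format %" PRIu16 ".%" PRIu16 " built at %" PRIu64,
                    major, minor, epoch);
}
```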
…rithmetic

Cast node_number/node_value to uint64_t before multiplying by
record_length to prevent uint32 overflow on very large databases.
This matches the pattern already used in find_address_in_search_tree.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
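The widening rule is worth spelling out: a product of two uint32_t values is computed in 32-bit arithmetic and wraps silently unless one operand is widened first. A minimal sketch (the parameter names are illustrative, not the library's):

```c
#include <stdint.h>

/* Cast before the multiply, so the arithmetic happens in 64 bits.
 * Without the cast, (uint32_t)node_number * record_size wraps
 * modulo 2^32 on large databases. */
static uint64_t record_offset(uint32_t node_number, uint32_t record_size) {
    return (uint64_t)node_number * record_size;
}
```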
GetFileSize(fd, NULL) discards the upper 32 bits of the file size.
Replace with GetFileSizeEx which returns the full 64-bit size, and
add an SSIZE_MAX bounds check matching the POSIX code path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
A crafted database could claim millions of array/map elements while
only having a few bytes of data, causing disproportionate memory
allocation (~40x amplification). Now validate that the claimed element
count does not exceed the remaining data section bytes before entering
the allocation loop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
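The validation rests on a simple invariant: every array or map entry occupies at least one byte in the data section, so a claimed element count larger than the bytes remaining cannot be legitimate. A simplified sketch of that check (names and signature are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* Reject before allocating: if the claimed entry count exceeds the
 * bytes left in the data section, the size field is lying and the
 * allocation loop would amplify a few bytes into a huge allocation. */
static int claimed_size_is_plausible(uint32_t claimed_entries,
                                     size_t data_section_size,
                                     size_t current_offset) {
    if (current_offset > data_section_size) {
        return 0;
    }
    size_t remaining = data_section_size - current_offset;
    return (uint64_t)claimed_entries <= remaining;
}
```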
bad_epoch_t.c, overflow_bounds_t.c, and bad_data_size_t.c all had
identical copies of write_map, write_string, write_uint16, write_uint32,
write_uint64, write_meta_key, and the METADATA_MARKER defines. Move
these into a shared static-inline header.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```diff
 result->netmask = current_bit;

-if (value >= node_count + mmdb->data_section_size) {
+if (value >= (uint64_t)node_count + mmdb->data_section_size) {
```
Contributor

  - Consistency with record_type() (line 1080): The equivalent check there uses subtraction (record - node_count < mmdb->data_section_size) instead of addition, which naturally avoids overflow since record is uint64_t. The two functions use different idioms for the same logical check — not a bug, but worth being aware of for future maintainability.

Not sure if we want to change anything, but it's a minor point worth noting.

oschwald and others added 6 commits February 23, 2026 08:30
All 7 new tests were in Makefile.am but missing from CMakeLists.txt.

Cross-platform tests: double_close_t, gai_error_t, overflow_bounds_t
Unix-only tests: bad_data_size_t, bad_epoch_t, bad_indent_t, max_depth_t

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both are const uint8_t * pointers into the mmap'd region. After
munmap they become dangling. NULL them alongside the existing
file_content and file_size cleanup in free_mmdb_struct().

Add assertions in double_close_t to verify the new NULL assignments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MMDB_read_node is a public API. The node_number >= node_count check
validates the logical number, but a truncated database file can still
produce an out-of-bounds pointer. Both internal functions
(find_address_in_search_tree, find_ipv4_start_node) already have this
check — this brings the public API to parity.

Returns MMDB_CORRUPT_SEARCH_TREE_ERROR matching the internal functions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Check fwrite return values in test DB creation functions. On failure,
clean up the file handle and allocated memory before bailing out.

Affected files: bad_epoch_t.c, bad_data_size_t.c (2 calls), max_depth_t.c

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The data-pool-t continuation line used spaces where all other
continuation lines use tabs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document that *mmdb_error is set to MMDB_SUCCESS when *gai_error is
non-zero, since no database error occurred in that case. Recommend
checking *gai_error first.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
oschwald and others added 4 commits February 23, 2026 11:33
Previously, test_read() only tested MMDB_open and MMDB_lookup_string,
asserting that all bad-data files must fail at one of these stages.

New bad-data files (oversized array/map, deep nesting) pass both open
and lookup but fail during MMDB_get_entry_data_list. The uint64-max-epoch
database is fully valid but exercises extreme metadata values.

Extend test_read() to try progressively deeper operations:
1. MMDB_open - if error, pass
2. MMDB_lookup_string - if error, pass
3. MMDB_get_entry_data_list - if error, pass
4. If no error at any level, still pass (extreme but valid data)

This is backward-compatible: existing bad-data files that fail at
open or lookup still behave identically.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Point submodule to branch with 4 new test databases in
bad-data/libmaxminddb/:

- libmaxminddb-oversized-array.mmdb
- libmaxminddb-oversized-map.mmdb
- libmaxminddb-deep-nesting.mmdb
- libmaxminddb-uint64-max-epoch.mmdb

This commit should be updated to point to main after the
MaxMind-DB PR merges.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The C-side MMDB builder functions are now superseded by the Go
generators in MaxMind-DB. Remove the header and rewrite
bad_data_size_t, bad_epoch_t, and max_depth_t to load from
bad-data/libmaxminddb/ via a new bad_database_path() helper. Also
remove the unused include from overflow_bounds_t.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After unmapping, the data_section and metadata_section pointers were
NULLed but their _size fields remained stale. Zero them to make the
struct fully inert after close.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
oschwald and others added 4 commits February 23, 2026 11:48
When gmtime() returns NULL for an extreme build_epoch value, display
"out of range" instead of "unknown" since the failure is specifically
due to the epoch being outside the representable range.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests that MMDB_read_node rejects node_number == node_count (the
off-by-one fix from >= instead of >) and accepts node_count - 1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a clarifying comment before the array bounds check, matching the
existing comment on the map bounds check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds test_deep_array_nesting_rejected() which loads a 600-level nested
array MMDB and asserts MMDB_INVALID_DATA_ERROR from
MMDB_get_entry_data_list, complementing the existing map nesting test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@horgh horgh merged commit 3eda67f into main Feb 23, 2026
31 checks passed
@horgh horgh deleted the greg/eng-4239 branch February 23, 2026 22:38