Harden libmaxminddb against crafted/malformed databases#416

Merged
horgh merged 27 commits into main from greg/eng-4239
Feb 23, 2026
Conversation

@oschwald (Member)

Summary

Addresses ENG-4239. Nine targeted fixes for integer overflows, NULL dereferences, memory amplification, and undefined behavior found during a security audit of libmaxminddb.

Library fixes (src/maxminddb.c):

  • Validate array/map size in get_entry_data_list against remaining data (memory amplification DoS)
  • Replace GetFileSize with GetFileSizeEx on Windows to handle >4GB files correctly
  • Fix integer overflow in MMDB_read_node/find_ipv4_start_node pointer arithmetic (cast to uint64_t)
  • Fix integer overflow in find_address_in_search_tree bounds check

mmdblookup fixes (bin/mmdblookup.c):

  • Fix printf format specifiers (%i → PRIu32, %llu → PRIu64) in metadata dump
  • Fix gmtime NULL dereference on out-of-range build_epoch
  • Fix print_indentation stack overflow on negative indent

Defensive hardening:

  • NULL file_content after munmap to prevent double-unmap
  • Initialize *mmdb_error in MMDB_lookup_string on getaddrinfo failure

Each fix has a dedicated test in t/.

Test plan

  • make check — all 27 tests pass
  • Review each commit individually (one logical fix per commit)
  • Verify Windows build compiles (the GetFileSizeEx and error code changes are #ifdef-guarded)

🤖 Generated with Claude Code

MMDB_read_node() used > instead of >= when checking node_number against
node_count, allowing node_number == node_count through the bounds check.
Since nodes are 0-indexed, the last valid node is node_count - 1, so
node_count itself is one past the end of the search tree. Reading at
that offset hits the 16-byte null separator between the search tree and
data section, producing records that decode as 0. When converted to a
data offset via data_section_offset_for_record(), this underflows
(0 - node_count - 16), resulting in an invalid entry offset.

The fix changes > to >= so that node_number == node_count is correctly
rejected with MMDB_INVALID_NODE_NUMBER_ERROR. A test has been added to
verify this boundary condition.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
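The corrected check can be sketched in isolation. This is a minimal stand-in, not the library code: the error constants are local placeholders, and the real MMDB_read_node takes an MMDB_s and a record pointer as well.

```c
#include <stdint.h>

#define SKETCH_SUCCESS 0
#define SKETCH_INVALID_NODE_NUMBER_ERROR 1

/* Nodes are 0-indexed, so the last valid node is node_count - 1.
 * The old `>` comparison let node_number == node_count through,
 * reading the 16-byte separator past the end of the search tree. */
static int check_node_number(uint32_t node_number, uint32_t node_count) {
    if (node_number >= node_count) {
        return SKETCH_INVALID_NODE_NUMBER_ERROR;
    }
    return SKETCH_SUCCESS;
}
```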
@gemini-code-assist
Copy link

Summary of Changes

Hello @oschwald, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the security and stability of the libmaxminddb library and its associated mmdblookup utility. It addresses several critical issues, including potential denial-of-service vectors, memory corruption, and undefined behavior, by implementing robust validation checks and correcting arithmetic operations. The changes ensure that the library can gracefully handle malformed database files and edge cases, making it more reliable for production environments.

Highlights

  • Security Hardening: Implemented nine targeted fixes to address potential security vulnerabilities such as integer overflows, NULL dereferences, memory amplification, and undefined behavior, identified during a security audit.
  • Robustness Improvements: Enhanced the library's resilience against malformed or crafted MaxMind DB files by validating array/map sizes, handling large file sizes on Windows, and preventing stack overflows from deeply nested data structures.
  • API Stability: Improved the mmdblookup utility by fixing printf format specifier mismatches, preventing gmtime NULL dereferences, and ensuring mmdb_error is always initialized on getaddrinfo failures.
  • Comprehensive Testing: Added dedicated test cases for each fix, covering scenarios like bad data sizes, out-of-range build epochs, negative indentation, double-close operations, getaddrinfo failures, maximum recursion depth, and overflow bounds.
Changelog
  • Changes.md
    • Updated with a comprehensive list of security and stability fixes for the 1.13.0 release.
  • bin/mmdblookup.c
    • Included <inttypes.h> for portable integer types.
    • Adjusted alloca size for the program name buffer on Windows to prevent an off-by-one error.
    • Updated printf format specifiers in metadata dump to use PRIu32, PRIu16, and PRIu64 for correct type matching.
    • Added a check for gmtime() return value to prevent NULL dereference when displaying metadata for an out-of-range build_epoch.
  • src/maxminddb.c
    • Modified skip_map_or_array function signature to include a depth parameter for recursion tracking.
    • Changed the type of size from DWORD to ssize_t in map_file for broader compatibility.
    • Replaced GetFileSize with GetFileSizeEx and added size validation in map_file to correctly handle files larger than 4GB on Windows.
    • Ensured *mmdb_error is initialized to MMDB_SUCCESS when getaddrinfo fails in MMDB_lookup_string.
    • Cast node_count to uint64_t in find_address_in_search_tree bounds check to prevent integer overflow.
    • Cast node_value to uint64_t in find_ipv4_start_node pointer arithmetic to prevent integer overflow.
    • Corrected the bounds check in MMDB_read_node from node_number > node_count to node_number >= node_count and cast node_number to uint64_t.
    • Added a recursion depth limit to skip_map_or_array to prevent stack overflows from deeply nested data structures.
    • Implemented size validation for array and map entries in get_entry_data_list to prevent memory amplification attacks.
    • Set mmdb->file_content to NULL after munmap in free_mmdb_struct to prevent double-unmap issues.
    • Initialized mmdb->metadata.languages.count and mmdb->metadata.description.count to 0 in their respective free functions.
    • Clamped negative indent values to 0 in print_indentation to prevent stack buffer overflows.
  • t/Makefile.am
    • Added new test programs: bad_data_size_t, bad_epoch_t, bad_indent_t, double_close_t, gai_error_t, max_depth_t, and overflow_bounds_t.
  • t/bad_data_size_t.c
    • Added a new test file to verify that crafted MMDB files with arrays or maps claiming large sizes but containing insufficient data are rejected.
  • t/bad_epoch_t.c
    • Added a new test file to ensure the library handles MMDB files with out-of-range build_epoch values without crashing.
  • t/bad_indent_t.c
    • Added a new test file to confirm that print_indentation correctly handles negative indent values by clamping them to zero.
  • t/double_close_t.c
    • Added a new test file to verify that calling MMDB_close multiple times on the same MMDB_s struct does not lead to crashes or undefined behavior.
  • t/gai_error_t.c
    • Added a new test file to ensure mmdb_error is properly set to MMDB_SUCCESS when getaddrinfo fails in MMDB_lookup_string.
  • t/max_depth_t.c
    • Added a new test file to validate the recursion depth limit for deeply nested maps and arrays, preventing stack overflows.
  • t/overflow_bounds_t.c
    • Added a new test file to confirm that 64-bit arithmetic is correctly used in bounds checks and that normal lookups continue to function as expected.
  • t/read_node_t.c
    • Added a new test function run_read_node_invalid_node_number_tests to verify that MMDB_read_node correctly rejects invalid node numbers.

@gemini-code-assist bot left a comment

Code Review

This pull request significantly hardens libmaxminddb against various forms of crafted or malformed databases, addressing potential security vulnerabilities such as integer overflows, NULL dereferences, memory amplification, and undefined behavior. The changes include robust validation checks, correct handling of large file sizes on Windows, proper type casting for arithmetic operations, and improved error handling. The addition of new test cases for these specific issues is also a great step towards ensuring the library's stability and security. Overall, these are crucial improvements for the library's resilience.

@oschwald force-pushed the greg/eng-4239 branch 2 times, most recently from 1a49a6f to 5f12a54, on February 20, 2026 at 21:34
oschwald and others added 12 commits February 20, 2026 13:52
skip_map_or_array(), used internally by MMDB_aget_value and
MMDB_get_value to skip past map/array values during path lookup, had
no recursion depth limit. A crafted MMDB file with deeply nested maps
or arrays (e.g. 200,000 levels) could cause unbounded recursion,
exhausting the stack and crashing the process.

The fix adds a depth parameter matching the existing
MAXIMUM_DATA_STRUCTURE_DEPTH (512) limit already enforced by
get_entry_data_list(). Callers pass an initial depth of 0, and each
recursive call increments it. When the limit is reached,
MMDB_INVALID_DATA_ERROR is returned.

A new test (max_depth_t) verifies that:
- A 600-level nested MMDB returns MMDB_INVALID_DATA_ERROR
- A 10-level nested MMDB works normally

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
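The depth-limited recursion pattern described above can be sketched with a toy walker. This is an illustration, not the library's skip_map_or_array: only the depth parameter and the bail-out mirror the actual fix, and the return constants are local stand-ins for MMDB_INVALID_DATA_ERROR and MMDB_SUCCESS.

```c
#define MAXIMUM_DATA_STRUCTURE_DEPTH 512

#define SKETCH_OK 0
#define SKETCH_INVALID_DATA (-1)

/* Each nested map/array level increments depth; once the shared
 * MAXIMUM_DATA_STRUCTURE_DEPTH limit is hit, the walk bails out
 * instead of recursing until the stack is exhausted. */
static int skip_nested(int remaining_levels, int depth) {
    if (depth >= MAXIMUM_DATA_STRUCTURE_DEPTH) {
        return SKETCH_INVALID_DATA;
    }
    if (remaining_levels == 0) {
        return SKETCH_OK;
    }
    return skip_nested(remaining_levels - 1, depth + 1);
}
```

Callers start at depth 0, matching the commit's description; a 600-level structure trips the limit while a 10-level one does not.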
On Windows, mmdblookup allocated strlen(argv[0]) bytes for the program
name buffer, but _splitpath writes fname + ext + '\0', which needs
strlen(argv[0]) + 1 bytes. This caused a one-byte stack buffer overflow
when the null terminator was written past the end of the alloca'd buffer.

The fix adds + 1 to the alloca size. This code path is Windows-only
(#ifdef _WIN32) and only affects the CLI tool, not the library.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
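The bug class is easy to state in isolation. This sketch is not the mmdblookup code (and _splitpath itself is Windows-only): it just shows the sizing rule the fix restores.

```c
#include <string.h>

/* A buffer that receives a copy of `s` -- or, as with _splitpath,
 * its fname + ext components plus a terminator -- needs
 * strlen(s) + 1 bytes. Allocating only strlen(s) puts the '\0'
 * one byte past the end of the buffer. */
static size_t program_name_buffer_size(const char *argv0) {
    return strlen(argv0) + 1; /* the + 1 is the fix */
}
```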
When getaddrinfo fails, *mmdb_error was never written, leaving callers
with an indeterminate value if they checked it without first checking
*gai_error. Now explicitly set to MMDB_SUCCESS in the GAI failure path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When MMDB_dump_entry_data_list() was called with a negative indent,
the value was cast to size_t producing a massive memset length. Clamp
negative values to 0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
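A simplified version of the clamp, with the size computation pulled into a helper so it can be checked directly (the helper name and the 1024-byte cap are taken from the shape of print_indentation; treat the exact structure as illustrative):

```c
/* A negative indent cast to size_t becomes a huge value; clamping
 * to 0 first keeps the memset length sane. The 1023 cap reflects a
 * fixed 1024-byte indentation buffer. */
static int clamp_indent(int i) {
    if (i < 0) {
        i = 0;
    }
    return i >= 1024 ? 1023 : i;
}
```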
free_mmdb_struct did not NULL file_content after unmapping, so a second
call to MMDB_close would double-munmap the same address (UB). Also zero
description.count so free_descriptions_metadata is safe on re-entry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gmtime() returns NULL for out-of-range time_t values. A crafted database
with build_epoch=UINT64_MAX would cause strftime to dereference NULL.
Now check the return value and display "unknown" for invalid epochs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
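The guarded conversion looks roughly like this. The buffer handling and format string here are illustrative, not the mmdblookup source:

```c
#include <stdio.h>
#include <time.h>

/* gmtime() returns NULL when the epoch is outside the range
 * representable in struct tm; check before handing the result
 * to strftime, which would otherwise dereference NULL. */
static const char *format_epoch(time_t epoch, char *buf, size_t buflen) {
    struct tm *tm = gmtime(&epoch);
    if (tm == NULL) {
        snprintf(buf, buflen, "out of range");
        return buf;
    }
    strftime(buf, buflen, "%F %T UTC", tm);
    return buf;
}
```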
The addition of node_count and data_section_size was performed in
uint32_t arithmetic. On very large databases where the sum exceeds
UINT32_MAX, this wraps and causes valid records to be rejected as
corrupt. Cast to uint64_t before addition.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use PRIu32/PRIu16/PRIu64 from <inttypes.h> instead of %i and %llu
for metadata fields. The previous specifiers were technically undefined
behavior and could corrupt subsequent variadic arguments on platforms
where uint64_t and unsigned long long have different representations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
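The portable pattern, shown on a hypothetical metadata line (field names and format text are made up for the example):

```c
#include <inttypes.h>
#include <stdio.h>

/* The PRIu* macros from <inttypes.h> expand to the correct
 * conversion specifier for each fixed-width type on the current
 * platform, avoiding the undefined behavior of a mismatched
 * %i or %llu in the variadic call. */
static int format_version(char *buf, size_t buflen,
                          uint16_t major, uint16_t minor, uint64_t epoch) {
    return snprintf(buf, buflen,
                    "format %" PRIu16 ".%" PRIu16 " built at %" PRIu64,
                    major, minor, epoch);
}
```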
…rithmetic

Cast node_number/node_value to uint64_t before multiplying by
record_length to prevent uint32 overflow on very large databases.
This matches the pattern already used in find_address_in_search_tree.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
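The widening rule is worth spelling out: a product of two uint32_t values is computed in 32-bit arithmetic and wraps silently unless one operand is widened first. A minimal sketch (the parameter names are illustrative, not the library's):

```c
#include <stdint.h>

/* Cast before the multiply, so the arithmetic happens in 64 bits.
 * Without the cast, (uint32_t)node_number * record_size wraps
 * modulo 2^32 on large databases. */
static uint64_t record_offset(uint32_t node_number, uint32_t record_size) {
    return (uint64_t)node_number * record_size;
}
```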
GetFileSize(fd, NULL) discards the upper 32 bits of the file size.
Replace with GetFileSizeEx which returns the full 64-bit size, and
add an SSIZE_MAX bounds check matching the POSIX code path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
A crafted database could claim millions of array/map elements while
only having a few bytes of data, causing disproportionate memory
allocation (~40x amplification). Now validate that the claimed element
count does not exceed the remaining data section bytes before entering
the allocation loop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
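The validation rests on a simple invariant: every array or map entry occupies at least one byte in the data section, so a claimed element count larger than the bytes remaining cannot be legitimate. A simplified sketch of that check (names and signature are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* Reject before allocating: if the claimed entry count exceeds the
 * bytes left in the data section, the size field is lying and the
 * allocation loop would amplify a few bytes into a huge allocation. */
static int claimed_size_is_plausible(uint32_t claimed_entries,
                                     size_t data_section_size,
                                     size_t current_offset) {
    if (current_offset > data_section_size) {
        return 0;
    }
    size_t remaining = data_section_size - current_offset;
    return (uint64_t)claimed_entries <= remaining;
}
```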
bad_epoch_t.c, overflow_bounds_t.c, and bad_data_size_t.c all had
identical copies of write_map, write_string, write_uint16, write_uint32,
write_uint64, write_meta_key, and the METADATA_MARKER defines. Move
these into a shared static-inline header.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```diff
 result->netmask = current_bit;

-if (value >= node_count + mmdb->data_section_size) {
+if (value >= (uint64_t)node_count + mmdb->data_section_size) {
```
Contributor

  - Consistency with record_type() (line 1080): The equivalent check there uses subtraction (record - node_count < mmdb->data_section_size) instead of addition, which naturally avoids overflow since record is uint64_t. The two functions use different idioms for the same logical check — not a bug, but worth being aware of for future maintainability.

Not sure if we want to change anything, but it's a minor point worth noting.

oschwald and others added 6 commits February 23, 2026 08:30
All 7 new tests were in Makefile.am but missing from CMakeLists.txt.

Cross-platform tests: double_close_t, gai_error_t, overflow_bounds_t
Unix-only tests: bad_data_size_t, bad_epoch_t, bad_indent_t, max_depth_t

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both are const uint8_t * pointers into the mmap'd region. After
munmap they become dangling. NULL them alongside the existing
file_content and file_size cleanup in free_mmdb_struct().

Add assertions in double_close_t to verify the new NULL assignments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MMDB_read_node is a public API. The node_number >= node_count check
validates the logical number, but a truncated database file can still
produce an out-of-bounds pointer. Both internal functions
(find_address_in_search_tree, find_ipv4_start_node) already have this
check — this brings the public API to parity.

Returns MMDB_CORRUPT_SEARCH_TREE_ERROR matching the internal functions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Check fwrite return values in test DB creation functions. On failure,
clean up the file handle and allocated memory before bailing out.

Affected files: bad_epoch_t.c, bad_data_size_t.c (2 calls), max_depth_t.c

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The data-pool-t continuation line used spaces where all other
continuation lines use tabs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document that *mmdb_error is set to MMDB_SUCCESS when *gai_error is
non-zero, since no database error occurred in that case. Recommend
checking *gai_error first.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
oschwald and others added 4 commits February 23, 2026 11:33
Previously, test_read() only tested MMDB_open and MMDB_lookup_string,
asserting that all bad-data files must fail at one of these stages.

New bad-data files (oversized array/map, deep nesting) pass both open
and lookup but fail during MMDB_get_entry_data_list. The uint64-max-epoch
database is fully valid but exercises extreme metadata values.

Extend test_read() to try progressively deeper operations:
1. MMDB_open - if error, pass
2. MMDB_lookup_string - if error, pass
3. MMDB_get_entry_data_list - if error, pass
4. If no error at any level, still pass (extreme but valid data)

This is backward-compatible: existing bad-data files that fail at
open or lookup still behave identically.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Point submodule to branch with 4 new test databases in
bad-data/libmaxminddb/:

- libmaxminddb-oversized-array.mmdb
- libmaxminddb-oversized-map.mmdb
- libmaxminddb-deep-nesting.mmdb
- libmaxminddb-uint64-max-epoch.mmdb

This commit should be updated to point to main after the
MaxMind-DB PR merges.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The C-side MMDB builder functions are now superseded by the Go
generators in MaxMind-DB. Remove the header and rewrite
bad_data_size_t, bad_epoch_t, and max_depth_t to load from
bad-data/libmaxminddb/ via a new bad_database_path() helper. Also
remove the unused include from overflow_bounds_t.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After unmapping, the data_section and metadata_section pointers were
NULLed but their _size fields remained stale. Zero them to make the
struct fully inert after close.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
oschwald and others added 4 commits February 23, 2026 11:48
When gmtime() returns NULL for an extreme build_epoch value, display
"out of range" instead of "unknown" since the failure is specifically
due to the epoch being outside the representable range.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests that MMDB_read_node rejects node_number == node_count (the
off-by-one fix from >= instead of >) and accepts node_count - 1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a clarifying comment before the array bounds check, matching the
existing comment on the map bounds check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds test_deep_array_nesting_rejected() which loads a 600-level nested
array MMDB and asserts MMDB_INVALID_DATA_ERROR from
MMDB_get_entry_data_list, complementing the existing map nesting test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@horgh horgh merged commit 3eda67f into main Feb 23, 2026
31 checks passed
@horgh horgh deleted the greg/eng-4239 branch February 23, 2026 22:38