Harden libmaxminddb against crafted/malformed databases #416
Conversation
MMDB_read_node() used > instead of >= when checking node_number against node_count, allowing node_number == node_count through the bounds check. Since nodes are 0-indexed, the last valid node is node_count - 1, so node_count itself is one past the end of the search tree. Reading at that offset hits the 16-byte null separator between the search tree and data section, producing records that decode as 0. When converted to a data offset via data_section_offset_for_record(), this underflows (0 - node_count - 16), resulting in an invalid entry offset. The fix changes > to >= so that node_number == node_count is correctly rejected with MMDB_INVALID_NODE_NUMBER_ERROR. A test has been added to verify this boundary condition. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code Review
This pull request significantly hardens libmaxminddb against various forms of crafted or malformed databases, addressing potential security vulnerabilities such as integer overflows, NULL dereferences, memory amplification, and undefined behavior. The changes include robust validation checks, correct handling of large file sizes on Windows, proper type casting for arithmetic operations, and improved error handling. The addition of new test cases for these specific issues is also a great step towards ensuring the library's stability and security. Overall, these are crucial improvements for the library's resilience.
skip_map_or_array(), used internally by MMDB_aget_value and MMDB_get_value to skip past map/array values during path lookup, had no recursion depth limit. A crafted MMDB file with deeply nested maps or arrays (e.g. 200,000 levels) could cause unbounded recursion, exhausting the stack and crashing the process. The fix adds a depth parameter matching the existing MAXIMUM_DATA_STRUCTURE_DEPTH (512) limit already enforced by get_entry_data_list(). Callers pass an initial depth of 0, and each recursive call increments it. When the limit is reached, MMDB_INVALID_DATA_ERROR is returned. A new test (max_depth_t) verifies that:
- A 600-level nested MMDB returns MMDB_INVALID_DATA_ERROR
- A 10-level nested MMDB works normally
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
On Windows, mmdblookup allocated strlen(argv[0]) bytes for the program name buffer, but _splitpath writes fname + ext + '\0', which needs strlen(argv[0]) + 1 bytes. This caused a one-byte stack buffer overflow when the null terminator was written past the end of the alloca'd buffer. The fix adds + 1 to the alloca size. This code path is Windows-only (#ifdef _WIN32) and only affects the CLI tool, not the library. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When getaddrinfo fails, *mmdb_error was never written, leaving callers with an indeterminate value if they checked it without first checking *gai_error. Now explicitly set to MMDB_SUCCESS in the GAI failure path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When MMDB_dump_entry_data_list() was called with a negative indent, the value was cast to size_t producing a massive memset length. Clamp negative values to 0. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
free_mmdb_struct did not NULL file_content after unmapping, so a second call to MMDB_close would double-munmap the same address (UB). Also zero description.count so free_descriptions_metadata is safe on re-entry. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gmtime() returns NULL for out-of-range time_t values. A crafted database with build_epoch=UINT64_MAX would cause strftime to dereference NULL. Now check the return value and display "unknown" for invalid epochs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The addition of node_count and data_section_size was performed in uint32_t arithmetic. On very large databases where the sum exceeds UINT32_MAX, this wraps and causes valid records to be rejected as corrupt. Cast to uint64_t before addition. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use PRIu32/PRIu16/PRIu64 from <inttypes.h> instead of %i and %llu for metadata fields. The previous specifiers were technically undefined behavior and could corrupt subsequent variadic arguments on platforms where uint64_t and unsigned long long have different representations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rithmetic Cast node_number/node_value to uint64_t before multiplying by record_length to prevent uint32 overflow on very large databases. This matches the pattern already used in find_address_in_search_tree. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GetFileSize(fd, NULL) discards the upper 32 bits of the file size. Replace with GetFileSizeEx which returns the full 64-bit size, and add an SSIZE_MAX bounds check matching the POSIX code path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
A crafted database could claim millions of array/map elements while only having a few bytes of data, causing disproportionate memory allocation (~40x amplification). Now validate that the claimed element count does not exceed the remaining data section bytes before entering the allocation loop. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bad_epoch_t.c, overflow_bounds_t.c, and bad_data_size_t.c all had identical copies of write_map, write_string, write_uint16, write_uint32, write_uint64, write_meta_key, and the METADATA_MARKER defines. Move these into a shared static-inline header. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    result->netmask = current_bit;
-   if (value >= node_count + mmdb->data_section_size) {
+   if (value >= (uint64_t)node_count + mmdb->data_section_size) {
- Consistency with record_type() (line 1080): the equivalent check there uses subtraction (record - node_count < mmdb->data_section_size) instead of addition, which naturally avoids overflow since record is uint64_t. The two functions use different idioms for the same logical check; not a bug, but worth keeping in mind for future maintainability.
Not sure we want to change anything here, but it's a minor point.
All 7 new tests were in Makefile.am but missing from CMakeLists.txt. Cross-platform tests: double_close_t, gai_error_t, overflow_bounds_t Unix-only tests: bad_data_size_t, bad_epoch_t, bad_indent_t, max_depth_t Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both are const uint8_t * pointers into the mmap'd region. After munmap they become dangling. NULL them alongside the existing file_content and file_size cleanup in free_mmdb_struct(). Add assertions in double_close_t to verify the new NULL assignments. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MMDB_read_node is a public API. The node_number >= node_count check validates the logical number, but a truncated database file can still produce an out-of-bounds pointer. Both internal functions (find_address_in_search_tree, find_ipv4_start_node) already have this check — this brings the public API to parity. Returns MMDB_CORRUPT_SEARCH_TREE_ERROR matching the internal functions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Check fwrite return values in test DB creation functions. On failure, clean up the file handle and allocated memory before bailing out. Affected files: bad_epoch_t.c, bad_data_size_t.c (2 calls), max_depth_t.c Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The data-pool-t continuation line used spaces where all other continuation lines use tabs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document that *mmdb_error is set to MMDB_SUCCESS when *gai_error is non-zero, since no database error occurred in that case. Recommend checking *gai_error first. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously, test_read() only tested MMDB_open and MMDB_lookup_string, asserting that all bad-data files must fail at one of these stages. New bad-data files (oversized array/map, deep nesting) pass both open and lookup but fail during MMDB_get_entry_data_list. The uint64-max-epoch database is fully valid but exercises extreme metadata values. Extend test_read() to try progressively deeper operations:
1. MMDB_open - if error, pass
2. MMDB_lookup_string - if error, pass
3. MMDB_get_entry_data_list - if error, pass
4. If no error at any level, still pass (extreme but valid data)
This is backward-compatible: existing bad-data files that fail at open or lookup still behave identically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Point submodule to branch with 4 new test databases in bad-data/libmaxminddb/: - libmaxminddb-oversized-array.mmdb - libmaxminddb-oversized-map.mmdb - libmaxminddb-deep-nesting.mmdb - libmaxminddb-uint64-max-epoch.mmdb This commit should be updated to point to main after the MaxMind-DB PR merges. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The C-side MMDB builder functions are now superseded by the Go generators in MaxMind-DB. Remove the header and rewrite bad_data_size_t, bad_epoch_t, and max_depth_t to load from bad-data/libmaxminddb/ via a new bad_database_path() helper. Also remove the unused include from overflow_bounds_t. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After unmapping, the data_section and metadata_section pointers were NULLed but their _size fields remained stale. Zero them to make the struct fully inert after close. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When gmtime() returns NULL for an extreme build_epoch value, display "out of range" instead of "unknown" since the failure is specifically due to the epoch being outside the representable range. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests that MMDB_read_node rejects node_number == node_count (the off-by-one fix changing > to >=) and accepts node_count - 1. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a clarifying comment before the array bounds check, matching the existing comment on the map bounds check. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds test_deep_array_nesting_rejected() which loads a 600-level nested array MMDB and asserts MMDB_INVALID_DATA_ERROR from MMDB_get_entry_data_list, complementing the existing map nesting test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Addresses ENG-4239. Nine targeted fixes for integer overflows, NULL dereferences, memory amplification, and undefined behavior found during a security audit of libmaxminddb.
Library fixes (src/maxminddb.c):
- Validate claimed element counts in get_entry_data_list against remaining data (memory amplification DoS)
- Replace GetFileSize with GetFileSizeEx on Windows to handle >4GB files correctly
- Prevent overflow in MMDB_read_node / find_ipv4_start_node pointer arithmetic (cast to uint64_t)
- Prevent overflow in the find_address_in_search_tree bounds check
mmdblookup fixes (bin/mmdblookup.c):
- Fix printf format specifiers (%i → PRIu32, %llu → PRIu64) in metadata dump
- Fix gmtime NULL dereference on out-of-range build_epoch
- Fix print_indentation stack overflow on negative indent
Defensive hardening:
- NULL file_content after munmap to prevent double-unmap
- Set *mmdb_error in MMDB_lookup_string on getaddrinfo failure
Each fix has a dedicated test in t/.
Test plan
- make check — all 27 tests pass
- GetFileSizeEx and error code changes are #ifdef-guarded
🤖 Generated with Claude Code