fixtures: add function to resolve sample shortened name by MD5 #2802

kami922 · 2026-01-01T21:16:24Z

"## Summary

This PR adds a reverse lookup function for test fixtures, addressing issue #1743.

Problem

The codebase has get_sample_md5_by_name() to look up MD5 hashes by sample name, but there was no way to do the reverse - find the sample name given an MD5 hash.

Solution

Added get_sample_short_name_by_md5() function that takes an MD5 hash and returns the corresponding sample's shortened name. This is useful for test infrastructure that needs to validate sample identity.

Changes

tests/fixtures.py: Added get_sample_short_name_by_md5() function (inverse of existing get_sample_md5_by_name())
tests/test_fixtures.py: Added unit tests verifying both direct lookup and roundtrip behavior
CHANGELOG.md: Added entry under Development section

Testing

.venv/bin/python3 -m pytest tests/test_fixtures.py -v

This adds get_sample_short_name_by_md5() to provide reverse lookup functionality for test fixtures. The new function is the inverse of the existing get_sample_md5_by_name() function. This is useful when you have an MD5 hash and need to determine which sample fixture it corresponds to, particularly for test infrastructure that validates sample identity. Changes: - Add get_sample_short_name_by_md5() function in tests/fixtures.py - Add tests/test_fixtures.py with unit tests for the new function - Tests verify both direct lookup and roundtrip behavior - Update CHANGELOG.md closes mandiant#1743

gemini-code-assist · 2026-01-01T21:16:37Z

Summary of Changes

Hello @kami922, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a gap in the test fixture infrastructure by implementing a reverse lookup mechanism. Previously, it was possible to find an MD5 hash by sample name, but not the other way around. The new function get_sample_short_name_by_md5() resolves this, enabling test systems to more robustly validate sample identities by converting MD5 hashes back to their corresponding short names.

Highlights

New Functionality: Introduced get_sample_short_name_by_md5() in tests/fixtures.py to provide a reverse lookup for test samples, allowing retrieval of a sample's shortened name given its MD5 hash.
Test Coverage: Added a new test file tests/test_fixtures.py with unit tests for get_sample_short_name_by_md5(), including checks for known samples, error handling for unknown MD5s, and roundtrip verification with get_sample_md5_by_name().
Changelog Update: An entry has been added to CHANGELOG.md under the Development section to document the new fixture function.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a useful reverse lookup function for test fixtures. The implementation is functional, but I've provided suggestions to improve maintainability and performance, particularly by refactoring a long if/elif chain into a dictionary lookup. The new tests are comprehensive, and I've suggested using pytest.mark.parametrize to enhance their readability and extensibility. Lastly, a minor correction is needed in the CHANGELOG.md to align with project conventions and protect privacy.

gemini-code-assist · 2026-01-01T21:18:46Z

tests/fixtures.py

+def get_sample_short_name_by_md5(md5):
+    """reverse lookup: given an MD5 hash, return the sample's shortened name"""
+    if md5 == "5f66b82558ca92e54e77f216ef4c066c":
+        return "mimikatz"
+    elif md5 == "e80758cf485db142fca1ee03a34ead05":
+        return "kernel32"
+    elif md5 == "a8565440629ac87f6fef7d588fe3ff0f":
+        return "kernel32-64"
+    elif md5 == "56bed8249e7c2982a90e54e1e55391a2":
+        return "pma12-04"
+    elif md5 == "7faafc7e4a5c736ebfee6abbbc812d80":
+        return "pma16-01"
+    elif md5 == "290934c61de9176ad682ffdd65f0a669":
+        return "pma01-01"
+    elif md5 == "c8403fb05244e23a7931c766409b5e22":
+        return "pma21-01"
+    elif md5 == "db648cd247281954344f1d810c6fd590":
+        return "al-khaser x86"
+    elif md5 == "3cb21ae76ff3da4b7e02d77ff76e82be":
+        return "al-khaser x64"
+    elif md5 == "b7841b9d5dc1f511a93cc7576672ec0c":
+        return "39c05b15e9834ac93f206bc114d0a00c357c888db567ba8f5345da0529cbed41"
+    elif md5 == "499c2a85f6e8142c3f48d4251c9c7cd6":
+        return "499c2a85f6e8142c3f48d4251c9c7cd6"
+    elif md5 == "9324d1a8ae37a36ae560c37448c9705a":
+        return "9324d1a8ae37a36ae560c37448c9705a"
+    elif md5 == "a198216798ca38f280dc413f8c57f2c2":
+        return "a198216798ca38f280dc413f8c57f2c2"
+    elif md5 == "a933a1a402775cfa94b6bee0963f4b46":
+        return "a933a1a402775cfa94b6bee0963f4b46"
+    elif md5 == "bfb9b5391a13d0afd787e87ab90f14f5":
+        return "bfb9b5391a13d0afd787e87ab90f14f5"
+    elif md5 == "c91887d861d9bd4a5872249b641bc9f9":
+        return "c91887d861d9bd4a5872249b641bc9f9"
+    elif md5 == "64d9f7d96b99467f36e22fada623c3bb":
+        return "64d9f7d96b99467f36e22fada623c3bb"
+    elif md5 == "82bf6347acf15e5d883715dc289d8a2b":
+        return "82bf6347acf15e5d883715dc289d8a2b"
+    elif md5 == "773290480d5445f11d3dc1b800728966":
+        return "773290480d5445f11d3dc1b800728966"
+    elif md5 == "56a6ffe6a02941028cc8235204eef31d":
+        return "3b13b6ba338bc08b0ddcd24515a99faf211da0e7296798162c5c4f9de8af9e2c"
+    elif md5 == "7351f8a40c5450557b24622417fc478d":
+        return "7351f8a40c5450557b24622417fc478d"
+    elif md5 == "79abd17391adc6251ecdc58d13d76baf":
+        return "79abd17391adc6251ecdc58d13d76baf"
+    elif md5 == "946a99f36a46d335dec080d9a4371940":
+        return "946a99f36a46d335dec080d9a4371940"
+    elif md5 == "b9f5bd514485fb06da39beff051b9fdc":
+        return "b9f5bd514485fb06da39beff051b9fdc"
+    elif md5 == "3db3e55b16a7b1b1afb970d5e77c5d98":
+        return "294b8db1f2702b60fb2e42fdc50c2cee6a5046112da9a5703a548a4fa50477bc"
+    elif md5 == "2bf18d0403677378adad9001b1243211":
+        return "2bf18d0403677378adad9001b1243211"
+    elif md5 == "76fa734236daa023444dec26863401dc":
+        return "ea28761b7b6aa3b4111b8e0f17f83e40"
+    else:
+        raise ValueError(f"unexpected sample MD5: {md5}")
+


This long if/elif chain is inefficient and hard to maintain. Using a dictionary for lookups would be much better, providing O(1) average time complexity instead of O(n). It also makes the code more readable and less prone to errors when adding new samples.

For even better maintainability, you might consider creating a single source of truth for the sample mappings to avoid data duplication between get_sample_md5_by_name and this new function. This would prevent potential inconsistencies when adding or updating samples in the future.

_MD5_TO_SHORT_NAME = { "5f66b82558ca92e54e77f216ef4c066c": "mimikatz", "e80758cf485db142fca1ee03a34ead05": "kernel32", "a8565440629ac87f6fef7d588fe3ff0f": "kernel32-64", "56bed8249e7c2982a90e54e1e55391a2": "pma12-04", "7faafc7e4a5c736ebfee6abbbc812d80": "pma16-01", "290934c61de9176ad682ffdd65f0a669": "pma01-01", "c8403fb05244e23a7931c766409b5e22": "pma21-01", "db648cd247281954344f1d810c6fd590": "al-khaser x86", "3cb21ae76ff3da4b7e02d77ff76e82be": "al-khaser x64", "b7841b9d5dc1f511a93cc7576672ec0c": "39c05b15e9834ac93f206bc114d0a00c357c888db567ba8f5345da0529cbed41", "499c2a85f6e8142c3f48d4251c9c7cd6": "499c2a85f6e8142c3f48d4251c9c7cd6", "9324d1a8ae37a36ae560c37448c9705a": "9324d1a8ae37a36ae560c37448c9705a", "a198216798ca38f280dc413f8c57f2c2": "a198216798ca38f280dc413f8c57f2c2", "a933a1a402775cfa94b6bee0963f4b46": "a933a1a402775cfa94b6bee0963f4b46", "bfb9b5391a13d0afd787e87ab90f14f5": "bfb9b5391a13d0afd787e87ab90f14f5", "c91887d861d9bd4a5872249b641bc9f9": "c91887d861d9bd4a5872249b641bc9f9", "64d9f7d96b99467f36e22fada623c3bb": "64d9f7d96b99467f36e22fada623c3bb", "82bf6347acf15e5d883715dc289d8a2b": "82bf6347acf15e5d883715dc289d8a2b", "773290480d5445f11d3dc1b800728966": "773290480d5445f11d3dc1b800728966", "56a6ffe6a02941028cc8235204eef31d": "3b13b6ba338bc08b0ddcd24515a99faf211da0e7296798162c5c4f9de8af9e2c", "7351f8a40c5450557b24622417fc478d": "7351f8a40c5450557b24622417fc478d", "79abd17391adc6251ecdc58d13d76baf": "79abd17391adc6251ecdc58d13d76baf", "946a99f36a46d335dec080d9a4371940": "946a99f36a46d335dec080d9a4371940", "b9f5bd514485fb06da39beff051b9fdc": "b9f5bd514485fb06da39beff051b9fdc", "3db3e55b16a7b1b1afb970d5e77c5d98": "294b8db1f2702b60fb2e42fdc50c2cee6a5046112da9a5703a548a4fa50477bc", "2bf18d0403677378adad9001b1243211": "2bf18d0403677378adad9001b1243211", "76fa734236daa023444dec26863401dc": "ea28761b7b6aa3b4111b8e0f17f83e40", } def get_sample_short_name_by_md5(md5): """reverse lookup: given an MD5 hash, return the sample's shortened name""" try: return _MD5_TO_SHORT_NAME[md5] except KeyError: raise ValueError(f"unexpected sample MD5: {md5}")

gemini-code-assist · 2026-01-01T21:18:46Z

CHANGELOG.md


 - ci: remove redundant "test_run" action from build workflow @mike-hunhoff #2692
 - dev: add bumpmyversion to bump and sync versions across the project @mr-tz
+- fixtures: add function to resolve sample shortened name by MD5 [email protected] #TBD


Please replace your email address with your GitHub handle to maintain privacy and consistency with other entries in this file. Also, the #TBD should be replaced with the pull request number once it's known.

Suggested change

- fixtures: add function to resolve sample shortened name by MD5 [email protected] #TBD

- fixtures: add function to resolve sample shortened name by MD5 @kamranulhaq2002 #TBD

gemini-code-assist · 2026-01-01T21:18:46Z

tests/test_fixtures.py

+def test_get_sample_short_name_by_md5():
+    """Test that get_sample_short_name_by_md5 correctly returns sample names"""
+    # Test known samples
+    assert fixtures.get_sample_short_name_by_md5("5f66b82558ca92e54e77f216ef4c066c") == "mimikatz"
+    assert fixtures.get_sample_short_name_by_md5("e80758cf485db142fca1ee03a34ead05") == "kernel32"
+    assert fixtures.get_sample_short_name_by_md5("a8565440629ac87f6fef7d588fe3ff0f") == "kernel32-64"
+    assert fixtures.get_sample_short_name_by_md5("db648cd247281954344f1d810c6fd590") == "al-khaser x86"
+    assert fixtures.get_sample_short_name_by_md5("3cb21ae76ff3da4b7e02d77ff76e82be") == "al-khaser x64"
+
+    # Test that unknown MD5 raises ValueError
+    with pytest.raises(ValueError, match="unexpected sample MD5"):
+        fixtures.get_sample_short_name_by_md5("0000000000000000000000000000000")


This test function handles two separate concerns: testing known values and testing error handling for unknown values. It's a good practice to separate these into distinct test functions for clarity. Additionally, using pytest.mark.parametrize for the known values will make the test more readable and easier to extend with new test cases.

Suggested change

def test_get_sample_short_name_by_md5():

"""Test that get_sample_short_name_by_md5 correctly returns sample names"""

# Test known samples

assert fixtures.get_sample_short_name_by_md5("5f66b82558ca92e54e77f216ef4c066c") == "mimikatz"

assert fixtures.get_sample_short_name_by_md5("e80758cf485db142fca1ee03a34ead05") == "kernel32"

assert fixtures.get_sample_short_name_by_md5("a8565440629ac87f6fef7d588fe3ff0f") == "kernel32-64"

assert fixtures.get_sample_short_name_by_md5("db648cd247281954344f1d810c6fd590") == "al-khaser x86"

assert fixtures.get_sample_short_name_by_md5("3cb21ae76ff3da4b7e02d77ff76e82be") == "al-khaser x64"

# Test that unknown MD5 raises ValueError

with pytest.raises(ValueError, match="unexpected sample MD5"):

fixtures.get_sample_short_name_by_md5("0000000000000000000000000000000")

@pytest.mark.parametrize(

"md5, name",

[

("5f66b82558ca92e54e77f216ef4c066c", "mimikatz"),

("e80758cf485db142fca1ee03a34ead05", "kernel32"),

("a8565440629ac87f6fef7d588fe3ff0f", "kernel32-64"),

("db648cd247281954344f1d810c6fd590", "al-khaser x86"),

("3cb21ae76ff3da4b7e02d77ff76e82be", "al-khaser x64"),

],

)

def test_get_sample_short_name_by_md5(md5, name):

"""Test that get_sample_short_name_by_md5 correctly returns sample names"""

assert fixtures.get_sample_short_name_by_md5(md5) == name

def test_get_sample_short_name_by_md5_unknown():

"""Test that get_sample_short_name_by_md5 raises ValueError for an unknown MD5"""

with pytest.raises(ValueError, match="unexpected sample MD5"):

fixtures.get_sample_short_name_by_md5("0000000000000000000000000000000")

gemini-code-assist · 2026-01-01T21:18:46Z

tests/test_fixtures.py

+def test_md5_name_lookup_roundtrip():
+    """Test that get_sample_md5_by_name and get_sample_short_name_by_md5 are inverses"""
+    # Test samples with simple names (not hash-based names)
+    simple_names = ["mimikatz", "kernel32", "kernel32-64", "pma12-04", "pma16-01", "pma01-01", "pma21-01"]
+
+    for name in simple_names:
+        md5 = fixtures.get_sample_md5_by_name(name)
+        assert fixtures.get_sample_short_name_by_md5(md5) == name
+
+    # Test samples with hash-based names that use startswith() logic
+    hash_based_samples = [
+        ("499c2a85f6e8142c3f48d4251c9c7cd6", "499c2a85f6e8142c3f48d4251c9c7cd6"),
+        ("9324d1a8ae37a36ae560c37448c9705a", "9324d1a8ae37a36ae560c37448c9705a"),
+        ("64d9f7d96b99467f36e22fada623c3bb", "64d9f7d96b99467f36e22fada623c3bb"),
+    ]
+
+    for md5_hash, expected_name in hash_based_samples:
+        assert fixtures.get_sample_short_name_by_md5(fixtures.get_sample_md5_by_name(expected_name)) == expected_name


This test can be made more readable and maintainable by using pytest.mark.parametrize. This separates the test data from the test logic, making it easier to add new test cases and understand the intent of the test. I've also split it into two separate tests for simple and hash-based names for clarity.

@pytest.mark.parametrize( "name", ["mimikatz", "kernel32", "kernel32-64", "pma12-04", "pma16-01", "pma01-01", "pma21-01"], ) def test_md5_name_lookup_roundtrip_simple_names(name): """Test that get_sample_md5_by_name and get_sample_short_name_by_md5 are inverses for simple names.""" md5 = fixtures.get_sample_md5_by_name(name) assert fixtures.get_sample_short_name_by_md5(md5) == name @pytest.mark.parametrize( "name", [ "499c2a85f6e8142c3f48d4251c9c7cd6", "9324d1a8ae37a36ae560c37448c9705a", "64d9f7d96b99467f36e22fada623c3bb", ], ) def test_md5_name_lookup_roundtrip_hash_names(name): """Test that get_sample_md5_by_name and get_sample_short_name_by_md5 are inverses for hash-based names.""" assert fixtures.get_sample_short_name_by_md5(fixtures.get_sample_md5_by_name(name)) == name

…ests - Refactor get_sample_short_name_by_md5() to use dictionary lookup for O(1) performance - Add _MD5_TO_SHORT_NAME module-level constant for efficient lookups - Refactor tests to use pytest.mark.parametrize for better readability - Split tests into separate functions for different concerns - Fix CHANGELOG: use @kamranulhaq2002 instead of email - Update CHANGELOG: use PR mandiant#2802 instead of #TBD - Fix import order per isort requirements - Apply black and ruff formatting Addresses feedback from Gemini Code Assist bot review.

…ests - Refactor get_sample_short_name_by_md5() to use dictionary lookup for O(1) performance - Add _MD5_TO_SHORT_NAME module-level constant for efficient lookups - Refactor tests to use pytest.mark.parametrize for better readability - Split tests into separate functions for different concerns - Fix CHANGELOG: use @kamranulhaq2002 instead of email - Update CHANGELOG: use PR mandiant#2802 instead of #TBD - Apply code formatting (isort, black, ruff) Addresses feedback from Gemini Code Assist bot review.

williballenthin

please review the associated issue for strategies that we'd like to use when refactoring this code. We don't want to maintain long lists of file names and hashes manually.

gemini-code-assist bot reviewed Jan 1, 2026

View reviewed changes

kami922 mentioned this pull request Jan 1, 2026

fixtures: add function to resolve sample shortened name by MD5 #1743

Open

kami922 force-pushed the fix-1743-add-md5-to-name-lookup branch from 18c192f to 703ff5d Compare January 2, 2026 16:05

Merge branch 'master' into fix-1743-add-md5-to-name-lookup

df3da2d

williballenthin requested changes Jan 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fixtures: add function to resolve sample shortened name by MD5 #2802

fixtures: add function to resolve sample shortened name by MD5 #2802

Uh oh!

kami922 commented Jan 1, 2026

Uh oh!

gemini-code-assist bot commented Jan 1, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Jan 1, 2026

Uh oh!

gemini-code-assist bot Jan 1, 2026

Uh oh!

gemini-code-assist bot Jan 1, 2026

Uh oh!

gemini-code-assist bot Jan 1, 2026

Uh oh!

williballenthin left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	- fixtures: add function to resolve sample shortened name by MD5 [email protected] #TBD
	- fixtures: add function to resolve sample shortened name by MD5 @kamranulhaq2002 #TBD

fixtures: add function to resolve sample shortened name by MD5 #2802

Are you sure you want to change the base?

fixtures: add function to resolve sample shortened name by MD5 #2802

Uh oh!

Conversation

kami922 commented Jan 1, 2026

Problem

Solution

Changes

Testing

Uh oh!

gemini-code-assist bot commented Jan 1, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

williballenthin left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants