Skip to content

Conversation

@calilkhalil
Copy link

Summary

This PR implements the ability to include resident data directly in JSON/CSV output files, eliminating the need to correlate separate binary files manually. Three new command-line parameters provide granular control over which resident data to include.

Motivation

Currently, the --dr flag extracts resident data as separate binary files in the Resident/ subdirectory. This approach creates challenges for automated analysis:

  • Fragmented data requiring manual correlation using Entry-Sequence-Attribute IDs
  • Complex pipeline scripts needed to join metadata with content
  • No filtering capability for specific file types

Resident data is highly relevant in DFIR contexts, particularly for:

  • Malicious scripts (PowerShell, batch, VBS, JavaScript)
  • Configuration files (registry exports, config files)
  • Small payloads and droppers
  • IOC text files

Changes

New Parameters

  • --ir: Enable resident data inclusion in JSON/CSV output (boolean, default: false)
  • --re <extensions>: Comma-separated list of file extensions to include (e.g., ".txt,.ps1,.bat")
  • --rm <bytes>: Maximum size in bytes for resident data inclusion (integer, default: 4096, max: 1024000)

Implementation Details

Modified Files:

  • MFTECmd/MFTRecordOut.cs: Added three new properties

    • ResidentDataBase64: Base64-encoded binary data
    • ResidentDataHex: Hex-formatted byte representation
    • ResidentDataASCII: UTF-8/ASCII text if valid, null otherwise
  • MFTECmd/Program.cs: Core processing logic

    • Added parameter handling in command-line parser
    • Implemented PopulateResidentData() method with filtering logic
    • Updated GetCsvData() to conditionally populate resident data
    • Propagated parameters through the processing pipeline

Processing Flow:

  1. Size validation against --rm parameter
  2. Extension filtering using HashSet for O(1) lookup
  3. DATA attribute extraction (resident only)
  4. Multi-format encoding: Base64, Hex, and validated UTF-8/ASCII
  5. Only processes first resident DATA attribute per file

Text Validation:
ASCII field is populated only if data contains:

  • Printable ASCII characters (32-126)
  • Control characters (CR, LF, TAB)
  • Otherwise remains null for binary data

Evidence

The screenshots below demonstrate the feature in action using the same MFT file:

Without --ir flag (baseline behavior):

image

File "The Chains Not Seen.txt" appears in output with standard metadata fields only. No resident data fields are present.

With --ir flag enabled:

image

The same file now includes three additional fields with the resident data:

  • ResidentDataBase64: Full Base64-encoded content
  • ResidentDataHex: Hex representation of the data
  • ResidentDataASCII: Human-readable text content extracted from resident data

This allows immediate access to file contents without requiring separate binary file extraction and correlation.

Usage Examples

# Include all resident data
MFTECmd.exe -f $MFT --json output --ir

# Filter by extension
MFTECmd.exe -f $MFT --json output --ir --re ".txt,.ps1,.bat"

# Limit size
MFTECmd.exe -f $MFT --csv output --ir --rm 2048

# Combined filtering
MFTECmd.exe -f $MFT --json output --ir --re ".ps1,.vbs" --rm 8192

Testing

I have included a sample $MFT.zip file for testing. The file contains resident data entries that can be used to validate the implementation:

  1. Build the project
  2. Run baseline test without resident data:
   MFTECmd.exe -f $MFT_sample --json output --jsonf baseline.json
  1. Run with resident data enabled:
   MFTECmd.exe -f $MFT_sample --json output --jsonf with_resident.json --ir
  1. Compare outputs to verify new fields appear only when --ir is specified
  2. Test filtering with specific extensions:
   MFTECmd.exe -f $MFT_sample --json output --ir --re ".txt,.ps1"
  1. Validate size limit enforcement:
   MFTECmd.exe -f $MFT_sample --json output --ir --rm 2048

Backward Compatibility

This change is fully backward compatible:

  • New parameters are optional with default values
  • Output format unchanged when flags are not used
  • Existing workflows remain unaffected
  • No breaking changes to existing functionality

Performance Considerations

  • Minimal CPU overhead: 5-10% increase for Base64/Hex encoding
  • Early exit optimizations: size and extension filtering before processing
  • Memory proportional to filtered resident files only
  • HashSet used for O(1) extension lookup
  • Only first resident DATA attribute processed per file

Use Cases

This feature streamlines several DFIR workflows:

  1. Malware Analysis: Direct access to small script payloads without file extraction
  2. IOC Hunting: Search resident data fields for indicators using jq/grep
  3. Timeline Analysis: Include file contents in SIEM ingestion pipelines
  4. Automated Triage: Parse resident data programmatically without filesystem operations

Future Enhancements

Potential improvements for subsequent PRs:

  • Regex pattern support for file matching
  • Automatic encoding detection (UTF-16, CP1252)
  • Optional compression (gzip + Base64)
  • Hash computation (MD5/SHA256) for resident data

Checklist

  • Code follows project conventions
  • Parameter naming uses abbreviated format (--ir, --re, --rm)
  • Backward compatible implementation
  • Performance optimizations applied
  • Sample data provided for testing
  • Feature validated with real MFT file

…ptions

- Add --ir flag to enable resident data inclusion

- Add --re parameter for extension filtering

- Add --rm parameter for size limit (default 4096 bytes)

- Add ResidentDataBase64, ResidentDataHex, ResidentDataASCII fields to MFTRecordOut

- Implement PopulateResidentData() with validation and encoding logic

- Support multiple output formats: Base64, Hex, UTF-8/ASCII

- Add early exit optimizations for performance

- Maintain backward compatibility with existing workflows
@AndrewRathbun
Copy link
Contributor

This looks really cool but quick question, what's the largest resident file you've seen dumped from an MFT? For me, it's maybe 744 bytes? Typically around 735 bytes. The default size being set at 4096 with a maximum possible integer being MUCH larger seems irrelevant, no?

@calilkhalil
Copy link
Author

This looks really cool but quick question, what's the largest resident file you've seen dumped from an MFT? For me, it's maybe 744 bytes? Typically around 735 bytes. The default size being set at 4096 with a maximum possible integer being MUCH larger seems irrelevant, no?

You're absolutely correct. The typical MFT FILE record size is 1024 bytes (can be 2048 or 4096 in some configurations), and after accounting for standard attributes ($STANDARD_INFORMATION, $FILE_NAME, headers, etc.), resident data is typically limited to around 700-750 bytes, with a practical maximum around 900 bytes in most scenarios.

The 4096 default was chosen conservatively to avoid filtering out edge cases, but you're right that it's unnecessarily high given NTFS constraints.

@calilkhalil
Copy link
Author

calilkhalil commented Nov 3, 2025

@AndrewRathbun, I can adjust the defaults to:

  • --rm default: 1024 bytes (covers all typical scenarios)

  • Maximum allowed: Keep at 1024000 for flexibility with non-standard MFT configurations, though this is largely theoretical

The current implementation already validates and skips non-resident data regardless of the size parameter, so this change would only affect the default filtering behavior.

Would you prefer I update the default to 1024 in this PR, or handle it separately?

@calilkhalil
Copy link
Author

Hey,

Just checking in on this PR. Are there any blockers or additional changes needed?

@calilkhalil
Copy link
Author

Hi @EricZimmerman and @AndrewRathbun, just wanted to follow up on this PR. Is there anything else you'd like me to change or any feedback?

As noted by @AndrewRathbun, typical MFT FILE record size is 1024 bytes
(can be 2048 or 4096 in some configurations), and after accounting for
standard attributes, resident data is typically limited to ~700-750 bytes
with a practical maximum around 900 bytes.

Changed default value to 1024 to better match NTFS constraints while
still covering all typical scenarios. Maximum allowed remains at 1024000
for flexibility with non-standard MFT configurations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants