@ZeroSumQuant
Owner

Summary

Implements Phase-4 Task A: a manifest builder that classifies and tracks the files in a project.

Changes

File Classification

Added comprehensive file classification in classify_file():

  • binary: Images, archives, compiled files (.png, .zip, .pyc, etc.)
  • template: HTML/XML templates (.html, .jinja2, .mustache, etc.)
  • notebook: Jupyter notebooks (.ipynb)
  • test: Python files with pytest/unittest imports
  • script: Python files with shebang or __name__ == '__main__'
  • module: Other Python files
  • data: CSV, JSON, Parquet, HDF5 files
  • documentation: Markdown, RST, LICENSE, README files
  • configuration: Project config files (pyproject.toml, .gitignore, etc.)
  • other: Everything else
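
For reviewers who want a concrete picture of the buckets above, here is a minimal sketch of how such a classifier could be written. It is not the code from this PR: the suffix sets contain only the examples named in the list (plus `.h5`/`.hdf5`, `.md`, and `.rst` as assumed spellings of the listed formats), the check order is an assumption, and the helper `_classify_python` is a hypothetical name.

```python
from pathlib import Path

# Assumed suffix/name sets: only the examples from the PR description,
# not the full lists the real classify_file() uses.
BINARY_SUFFIXES = {".png", ".zip", ".pyc"}
TEMPLATE_SUFFIXES = {".html", ".jinja2", ".mustache"}
DATA_SUFFIXES = {".csv", ".json", ".parquet", ".h5", ".hdf5"}
DOC_SUFFIXES = {".md", ".rst"}
DOC_STEMS = {"LICENSE", "README"}
CONFIG_NAMES = {"pyproject.toml", ".gitignore"}
TEST_MARKERS = ("import pytest", "from pytest", "import unittest", "from unittest")
MAIN_GUARDS = ("__name__ == '__main__'", '__name__ == "__main__"')


def _classify_python(path: Path) -> str:
    """Hypothetical helper: split .py files into test, script, or module."""
    try:
        text = path.read_text(encoding="utf-8", errors="replace")
    except OSError:
        return "module"  # unreadable file: fall back to the generic bucket
    if any(marker in text for marker in TEST_MARKERS):
        return "test"
    if text.startswith("#!") or any(guard in text for guard in MAIN_GUARDS):
        return "script"
    return "module"


def classify_file(path: Path) -> str:
    """Return one category name for a file, checked in an assumed order."""
    suffix = path.suffix.lower()
    if path.name in CONFIG_NAMES:  # exact names first; .gitignore has no suffix
        return "configuration"
    if suffix in BINARY_SUFFIXES:
        return "binary"
    if suffix in TEMPLATE_SUFFIXES:
        return "template"
    if suffix == ".ipynb":
        return "notebook"
    if suffix == ".py":
        return _classify_python(path)
    if suffix in DATA_SUFFIXES:
        return "data"
    if suffix in DOC_SUFFIXES or path.stem.upper() in DOC_STEMS:
        return "documentation"
    return "other"
```

Plain substring matching keeps the sketch short, but it would also flag a module that only mentions `import pytest` in a docstring; parsing the file with `ast` would be stricter.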

Manifest Features

  • Records file size, suffix, and executable permissions
  • Flags data files >20MB as "oversize"
  • Excludes common build/cache directories
  • Saves manifest to .cake/manifest.json
  • Provides summary counts by category
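
The features above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the function names `build_manifest` and `save_manifest`, the JSON field names, the excluded directory set, and the reading of "20MB" as 20 MiB are all hypothetical. The classifier is passed in as a callable so the sketch stands alone.

```python
import json
import os
from collections import Counter
from pathlib import Path
from typing import Callable

# Assumed values: the PR only says "common build/cache directories" and ">20MB".
EXCLUDED_DIRS = {".git", "__pycache__", ".venv", "build", "dist", ".cake"}
OVERSIZE_BYTES = 20 * 1024 * 1024


def build_manifest(
    root: Path,
    classify: Callable[[Path], str],
    oversize_bytes: int = OVERSIZE_BYTES,
) -> dict:
    """Walk root and record one entry per file plus per-category counts."""
    files = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in EXCLUDED_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            size = path.stat().st_size
            category = classify(path)
            files.append(
                {
                    "path": path.relative_to(root).as_posix(),
                    "category": category,
                    "size": size,
                    "suffix": path.suffix,
                    "executable": os.access(path, os.X_OK),
                    # Only data files are flagged, as described in the PR.
                    "oversize": category == "data" and size > oversize_bytes,
                }
            )
    summary = dict(Counter(entry["category"] for entry in files))
    return {"files": files, "summary": summary}


def save_manifest(manifest: dict, root: Path) -> Path:
    """Write the manifest to .cake/manifest.json under root."""
    out = root / ".cake" / "manifest.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return out
```

Excluding `.cake` itself is an assumption made here so that a second run does not list the previous manifest as a project file.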

Tests

Added comprehensive test suite in test_phase4_manifest.py:

  • Tests for each file classification type
  • Tests for oversize file detection
  • Tests for directory exclusion logic
  • Tests for manifest structure and saving

Next Steps

After merge, implement Phase-4 Task B - the organizer that uses this manifest to reorganize files into proper directories.

Testing

All tests pass:

pytest tests/test_phase4_manifest.py -v
# 17 passed

@ZeroSumQuant ZeroSumQuant merged commit f306f34 into main Jun 5, 2025
5 checks passed
ZeroSumQuant added a commit that referenced this pull request Jun 5, 2025
* feat(manifest): add binary & template buckets; shebang script detection (#46)

* feat(organise): use git mv when repository present