Open
Description
Priority: High
Implement automatic category deduplication to prevent redundant categories:
Current Issues:
- Multiple categories with identical or very similar content
- No mechanism to detect and merge duplicate categories
- Manual cleanup required after each categorization
Expected Outcome:
- Automatic detection of duplicate/similar categories
- Intelligent merging of redundant categories
- Clean, distinct category set after each recategorization
Technical Approach:
- Implement category similarity calculation using TF-IDF vectors
- Add deduplication logic in _generate_category_names()
- Merge categories above similarity threshold (e.g., 0.8)
- Preserve the most descriptive category name when merging
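A rough sketch of how the approach above might look, to make the proposal concrete. It is not the project's implementation: the function names (tfidf_vectors, cosine, deduplicate_categories), the input shape (a dict mapping each category name to its list of items), and the "most descriptive name" heuristic (most words, then longest) are all assumptions made for illustration. TF-IDF is hand-rolled with the standard library so the example has no dependencies; a library vectorizer could be swapped in. How this would be wired into _generate_category_names() is left open.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [re.findall(r"[a-z0-9]+", d.lower()) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so terms present in every document keep a non-zero weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def deduplicate_categories(categories, threshold=0.8):
    """Merge categories whose TF-IDF similarity is above `threshold`.

    `categories` is assumed to be a dict: category name -> list of item strings.
    """
    names = list(categories)
    # Each category is represented by its name plus all of its items.
    docs = [name + " " + " ".join(categories[name]) for name in names]
    vectors = tfidf_vectors(docs)

    # Union-find, so chains of similar categories end up in one group.
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if cosine(vectors[i], vectors[j]) > threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)

    merged = {}
    for group in groups.values():
        # Assumed heuristic for "most descriptive": most words, then longest.
        keep = max(group, key=lambda n: (len(n.split()), len(n)))
        items = []
        for name in group:
            for item in categories[name]:
                if item not in items:  # drop duplicate items, keep order
                    items.append(item)
        merged[keep] = items
    return merged
```

Called on a category set where "Billing" and "Billing and Payments" hold the same items, this would merge them under "Billing and Payments" and leave an unrelated "Shipping" category alone. The pairwise comparison is quadratic in the number of categories, which should be fine for small category sets but is worth revisiting if the set grows large.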