Implement Category Deduplication System #8

@Davz33

Priority: High

Implement automatic category deduplication to prevent redundant categories.

Current Issues:

  • Multiple categories with identical or very similar content
  • No mechanism to detect and merge duplicate categories
  • Manual cleanup required after each categorization

Expected Outcome:

  • Automatic detection of duplicate/similar categories
  • Intelligent merging of redundant categories
  • Clean, distinct category set after each recategorization

Technical Approach:

  • Implement category similarity calculation using TF-IDF vectors
  • Add deduplication logic in _generate_category_names()
  • Merge categories whose similarity exceeds a threshold (e.g., 0.8)
  • Preserve the most descriptive category name when merging
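
A minimal sketch of the approach above, using only the Python standard library. Several things here are assumptions and not part of the existing codebase: the function names (`tfidf_vectors`, `cosine`, `deduplicate_categories`), the input shape (a dict mapping category name to its text), cosine similarity as the similarity measure, and "longest name" as a stand-in for "most descriptive name". How this would be wired into `_generate_category_names()` is left open.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [re.findall(r"[a-z0-9]+", d.lower()) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF so terms present in every document keep a nonzero weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def deduplicate_categories(categories, threshold=0.8):
    """Group categories whose text similarity exceeds `threshold`.

    `categories` maps category name -> text describing its content.
    Returns a dict mapping each kept name -> list of names merged into it.
    """
    names = list(categories)
    vecs = tfidf_vectors([categories[name] for name in names])
    parent = list(range(len(names)))  # union-find forest over category indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if cosine(vecs[i], vecs[j]) > threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)

    # Assumption: the longest name is treated as the most descriptive one.
    return {max(group, key=len): group for group in groups.values()}


if __name__ == "__main__":
    example = {
        "ML": "machine learning models and neural networks",
        "Machine Learning": "machine learning models and neural networks",
        "Cooking": "recipes for pasta and bread",
    }
    print(deduplicate_categories(example))
```

Union-find is used so that merges are transitive: if A matches B and B matches C, all three end up in one group even when A and C fall just under the threshold. The pairwise loop is quadratic in the number of categories, which should be acceptable for small category sets but may need revisiting for large ones.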
