Replace mmseg semantic-segmentation with SAM; integrate with object detection #1152

@jeffbl

We currently use mmseg for semantic segmentation. SAM (Segment Anything Model) is likely a more reliable choice, especially if it is run on objects found by LLM object detection.

If we do this, note that the current photo-audio-handler is likely to need tweaks, or even significant redesign:

  • The types of objects and segments found may change and require filtering, although the prompt should be tweaked first (e.g., to ask for only the most salient parts of the graphic).
  • If there is close to a 1:1 mapping between objects and segments, the audio experience should probably change so that, for each object/region, the centroid is rendered as it currently is and the segment outline follows immediately after.
  • Semseg currently gives us background aspects like "sky", "beach", or "wall". An object detector won't pick those up, so we may want to segment them separately, since they can be important aspects of a photograph.
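
To make the second and third points concrete, here is a rough sketch of how detections and segments could be paired up and ordered for rendering. It is not the existing photo-audio-handler code and does not call SAM or the detector: it assumes detections arrive as labelled boxes and segments as binary masks, and every name in it (`match_objects`, `render_plan`, the `min_iou` threshold, the event tuples) is hypothetical. Segments left unmatched are candidates for the background regions ("sky", "wall") that would need separate handling.

```python
from typing import Dict, List, Optional, Tuple

Mask = List[List[int]]           # binary mask: rows of 0/1 values (assumed format)
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), with x1/y1 exclusive


def foreground(mask: Mask) -> List[Tuple[int, int]]:
    """All (x, y) foreground pixels, in row-major order."""
    return [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]


def mask_bbox(mask: Mask) -> Optional[Box]:
    """Tight bounding box of the foreground, or None for an empty mask."""
    pts = foreground(mask)
    if not pts:
        return None
    xs, ys = zip(*pts)
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def centroid(mask: Mask) -> Optional[Tuple[float, float]]:
    """Mean (x, y) of the foreground pixels, or None for an empty mask."""
    pts = foreground(mask)
    if not pts:
        return None
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))


def outline(mask: Mask) -> List[Tuple[int, int]]:
    """Foreground pixels touching background or the image edge (4-connectivity)."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    edge = []
    for x, y in foreground(mask):
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if not (0 <= nx < w and 0 <= ny < h) or not mask[ny][nx]:
                edge.append((x, y))
                break
    return edge


def match_objects(objects: Dict[str, Box], masks: List[Mask], min_iou: float = 0.5):
    """Greedy one-to-one matching of detected boxes to segment masks.

    Returns (matches, unmatched), where matches maps label -> mask index and
    unmatched lists mask indices that no detection claimed.
    """
    pairs = []
    for label, box in objects.items():
        for i, mask in enumerate(masks):
            bbox = mask_bbox(mask)
            if bbox is not None:
                pairs.append((box_iou(box, bbox), label, i))
    pairs.sort(key=lambda p: p[0], reverse=True)  # best overlaps first

    matches: Dict[str, int] = {}
    used = set()
    for score, label, i in pairs:
        if score < min_iou:
            break
        if label not in matches and i not in used:
            matches[label] = i
            used.add(i)
    unmatched = [i for i in range(len(masks)) if i not in used]
    return matches, unmatched


def render_plan(objects: Dict[str, Box], masks: List[Mask], min_iou: float = 0.5):
    """Per matched object: a centroid event, then its outline event."""
    matches, unmatched = match_objects(objects, masks, min_iou)
    plan = []
    for label, i in matches.items():
        plan.append(("centroid", label, centroid(masks[i])))
        plan.append(("outline", label, outline(masks[i])))
    return plan, unmatched
```

Matching on bounding-box IoU is only a stand-in. If SAM is prompted directly with each detected box, the object-to-segment pairing is already known and only the ordering in `render_plan` and the handling of unmatched background segments would still apply.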
