Replace mmseg semantic-segmentation with SAM; integrate with object detection #1152

We currently use mmseg for semantic segmentation. SAM (Segment Anything Model) is likely a more reliable choice, especially if it is run on [objects found with LLM object detection](https://github.com/Shared-Reality-Lab/IMAGE-server/issues/1147).

If we do this, note that the current photo-audio-handler will likely need tweaks, or even a significant redesign:
- The types of objects and segments found may change and require filtering, although the prompt should be tweaked first (e.g., to ask for only the most salient parts of the graphic).
- If there is a close to 1:1 mapping between objects and segments, the audio experience should probably change so that, for each object/region, the centroid is rendered as it currently is and the segment outline follows immediately after.
- Semseg currently gives us background aspects like "sky", "beach", or "wall". An object detector won't pick those up, so we may want to segment them separately, since they can be important aspects of a photograph.
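
To make the points above concrete, here is a rough sketch of how the handler might (a) check whether objects and segments map close to 1:1, (b) order the audio as centroid then outline per region, and (c) set aside background regions. Everything in it is an assumption for discussion: the `Region` type, the function names, the IoU threshold, the 80% cutoff, and the background label list are hypothetical and do not reflect the current photo-audio-handler code.

```python
from dataclasses import dataclass

# Hypothetical background labels, taken from the examples in this issue.
BACKGROUND_LABELS = {"sky", "beach", "wall"}


@dataclass
class Region:
    label: str
    box: tuple        # (x0, y0, x1, y1)
    centroid: tuple   # (x, y)
    outline: list     # list of (x, y) points along the segment contour


def box_iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_near_one_to_one(object_boxes, segment_boxes, iou_threshold=0.5, min_fraction=0.8):
    """True if most objects overlap exactly one segment (thresholds are guesses)."""
    if not object_boxes:
        return False
    matched = 0
    for ob in object_boxes:
        hits = sum(1 for sb in segment_boxes if box_iou(ob, sb) >= iou_threshold)
        if hits == 1:
            matched += 1
    return matched / len(object_boxes) >= min_fraction


def split_background(regions):
    """Separate background regions (e.g. "sky") from foreground objects."""
    fg = [r for r in regions if r.label not in BACKGROUND_LABELS]
    bg = [r for r in regions if r.label in BACKGROUND_LABELS]
    return fg, bg


def build_render_queue(regions):
    """For each region, emit the centroid cue, then its outline right after."""
    queue = []
    for r in regions:
        queue.append(("centroid", r.label, r.centroid))
        queue.append(("outline", r.label, r.outline))
    return queue
```

If the mapping check fails, the handler could presumably fall back to the current rendering; where the thresholds should sit is an open question that would need testing on real images.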