Live API does not perceive visual information and hallucinates #1108

@franzscherr


Description of the bug:

The current example script for the Gemini Live API (quickstarts/Get_started_LiveAPI.py) does not appear to perceive visual input. When asked about the shared screen or camera feed, the model hallucinates content unrelated to what is actually shown.

Actual vs expected behavior:

Setting:

  • Run the script as python Get_started_LiveAPI.py --mode screen or python Get_started_LiveAPI.py --mode camera, then
  • ask the model "What do you see on the screen that I am sharing"

Actual response: descriptions unrelated to the shared screen, such as "There is a man shown with dark hair..." or "A chess game...".

Expected response: "The screen shows a terminal with various commands..."

Does this mean that vision input currently does not work with the Live API?
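One way to narrow this down is to check whether the frames leaving the client are valid images at all, before blaming the model. The following is a minimal sketch, not taken from the script: it assumes each frame is sent as a dict with a "mime_type" of "image/jpeg" and base64-encoded "data", and the helper names build_frame_payload and check_frame_payload are hypothetical. It only verifies that the payload decodes to bytes carrying the JPEG start and end markers; it does not talk to the Live API.

```python
import base64

JPEG_SOI = b"\xff\xd8"  # JPEG start-of-image marker
JPEG_EOI = b"\xff\xd9"  # JPEG end-of-image marker


def build_frame_payload(jpeg_bytes: bytes) -> dict:
    """Hypothetical helper: wrap JPEG bytes in the payload shape assumed above."""
    return {
        "mime_type": "image/jpeg",
        "data": base64.b64encode(jpeg_bytes).decode("ascii"),
    }


def check_frame_payload(payload: dict) -> list:
    """Return a list of problems found in a frame payload (empty if none found)."""
    problems = []
    if payload.get("mime_type") != "image/jpeg":
        problems.append(f"unexpected mime_type: {payload.get('mime_type')!r}")
    try:
        # validate=True rejects characters outside the base64 alphabet
        raw = base64.b64decode(payload.get("data", ""), validate=True)
    except (ValueError, TypeError):
        return problems + ["data is not valid base64"]
    if not raw:
        problems.append("decoded frame is empty")
    elif not raw.startswith(JPEG_SOI):
        problems.append("decoded frame does not start with a JPEG SOI marker")
    elif not raw.endswith(JPEG_EOI):
        problems.append("decoded frame does not end with a JPEG EOI marker")
    return problems
```

If the payload assumption holds, calling check_frame_payload on each frame just before it is sent, and logging any non-empty result, would show whether the client is producing empty or malformed frames. A clean result would point the investigation at the session configuration or the model side instead.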

Any other information you'd like to share?

No response

Labels: type:bug (Something isn't working)
