A Home Assistant conversation agent that uses MCP (Model Context Protocol) for efficient entity discovery, achieving 95% token reduction compared to traditional methods. Works with LM Studio, llama.cpp, Ollama, OpenAI, Google Gemini, Anthropic Claude, and OpenRouter.
- ✅ 95% Token Reduction: Uses MCP tools for dynamic entity discovery instead of sending all entities
- ✅ No Entity Dumps: Never sends 12,000+ token entity lists to the LLM
- ✅ Smart Entity Index: Pre-generated system structure index (~400-800 tokens) for context-aware queries
- ✅ Multi-Platform Support: Works with LM Studio, llama.cpp, Ollama, OpenAI, Google Gemini, Anthropic Claude, and OpenRouter
- ✅ Multilingual Support: 21 languages with localized UI, system prompts, and speech detection
- ✅ Multi-turn Conversations: Maintains conversation context and history
- ✅ Dynamic Discovery: Finds entities by area, type, device_class, state, or name on-demand
- ✅ Web Search Tools: Optional DuckDuckGo or Brave Search integration for current information
- ✅ Works with 1000+ Entities: Efficient even with large Home Assistant installations
- ✅ Multi-Profile Support: Run multiple conversation agents with different models
Traditional voice assistants send your entire entity list (lights, switches, sensors, etc.) to the LLM with every request. For a typical home with 200+ devices, this means:
- 12,000+ tokens sent every time
- Expensive API costs (cloud LLMs)
- Slow response times
- Context window limitations
- Poor performance with large homes
Instead of dumping all entities, MCP Assist:
- Starts an MCP Server on Home Assistant that exposes entity discovery tools
- Your LLM connects to the MCP server and gets access to these tools:
  - `get_index` - Get system structure index (areas, domains, device_classes, people, etc.)
  - `discover_entities` - Find entities by type, area, domain, device_class, or state
  - `get_entity_details` - Get current state and attributes
  - `perform_action` - Control devices
  - `run_script` - Execute scripts and return response data
  - `run_automation` - Trigger automations manually
  - `list_areas` - List all areas in your home
  - `list_domains` - List all entity types
  - `set_conversation_state` - Smart follow-up handling
- LLM uses the index for smart queries - Understands what exists without full context dump
- LLM discovers on-demand - Only fetches the entities it needs for each request
- Token usage drops from 12,000+ to ~400 tokens per request
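To make the flow concrete, the sketch below shows roughly what one discovery round trip looks like. This is an illustration only: the tool names come from the list above, but the argument names and the shape of the result are assumptions, not the integration's actual schema.

```python
# Illustrative sketch only: tool names are from the list above; argument names
# and result shape are assumptions, not the integration's real schema.

# What the LLM emits when the user asks "Is the bathroom light on?"
tool_call = {
    "name": "discover_entities",
    "arguments": {"area": "bathroom", "domain": "light"},
}

# The kind of compact result the MCP server returns: a handful of matching
# entities (tens of tokens) instead of the full entity registry.
tool_result = {
    "entities": [
        {"entity_id": "light.bathroom_ceiling", "name": "Bathroom Ceiling", "state": "on"},
    ],
}

print(tool_call)
print(tool_result)
```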
| Method | Token Usage | Description |
|---|---|---|
| Traditional | 12,000+ tokens | Sends all entity states |
| MCP Assist | ~400 tokens | Uses MCP tools for discovery |
| Reduction | 95% | Massive efficiency gain |
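The 95% figure is simple arithmetic on the two token counts above:

```python
traditional_tokens = 12_000   # lower bound of a full entity dump
mcp_tokens = 400              # typical MCP Assist request

reduction = 1 - mcp_tokens / traditional_tokens
print(f"{reduction:.0%}")     # ~97% for these numbers, i.e. at least the advertised 95%
```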
The Smart Entity Index provides a lightweight (~400-800 tokens) snapshot of your Home Assistant system structure, enabling context-aware queries without full entity dumps. The index includes areas, domains, device classes, people, calendars, zones, automations, and scripts. For entities without standardized device_class attributes (like custom integrations), LLM-powered gap-filling automatically infers semantic categories from naming patterns. This results in faster, more accurate queries that use ~95% fewer tokens compared to traditional entity dumps.
MCP Assist supports 21 languages with localized configuration interfaces, language-aware system prompts, and region-specific speech detection patterns. The integration automatically detects your Home Assistant system language and provides appropriate defaults for system prompts, follow-up phrases, and end conversation words. Supported languages include: Arabic, Chinese (Simplified), Czech, Danish, Dutch, Finnish, Filipino, French, German, Greek, Hindi, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, and Turkish.
- Home Assistant 2024.1+
- One of:
- Local LLMs: LM Studio v0.3.17+, llama.cpp, or Ollama
- Cloud LLMs: OpenAI, Google Gemini, Anthropic Claude, or OpenRouter (API key required)
- Python 3.11+
- Click the badge above to add this repository to HACS, or manually add it as a custom repository
- Install "MCP Assist" from HACS
- Restart Home Assistant
- Copy the `custom_components/mcp_assist` folder to your Home Assistant `custom_components` directory
- Restart Home Assistant
- Go to Settings → Devices & Services → Add Integration
- Search for "MCP Assist" and select it
Step 1 - Profile & Server Type:
- Profile Name: Give your assistant a name (e.g., "Living Room Assistant")
- Server Type: Choose your LLM provider
- LM Studio - Local, free, runs on your machine
- llama.cpp - Local, free, official llama.cpp server
- Ollama - Local, free, command-line based
- OpenAI - Cloud, paid, GPT-5.2 series
- Google Gemini - Cloud, paid/free tier, Gemini 3.0 series
- Anthropic Claude - Cloud, paid, Claude models (e.g., Opus 4.5)
- OpenRouter - Cloud, multi-model gateway with access to 200+ models
Step 2 - Server Configuration:
For Local Servers (LM Studio / llama.cpp / Ollama):
- Server URL: Where your LLM server is running
  - LM Studio: `http://localhost:1234` (default)
  - llama.cpp: `http://localhost:8080` (default)
  - Ollama: `http://localhost:11434` (default)
- MCP Server Port: Port for the MCP server (default: 8090)
For Cloud Providers (OpenAI / Gemini / Anthropic / OpenRouter):
- API Key: Your provider API key (see below for setup)
- MCP Server Port: Port for the MCP server (default: 8090)
Step 3 - Model & Prompts:
- Model Name: Select from auto-loaded models or enter manually
- System Prompt: Customize the assistant's personality
- Technical Instructions: Advanced prompt for tool usage (pre-configured)
Step 4 - Advanced Settings:
- Temperature: Response randomness (0.0-1.0)
- Max Response Tokens: Maximum length of responses
- Response Mode: None / Smart / Always (conversation continuation behavior)
- Follow-up Phrases: Configurable phrases for pattern detection (default: "anything else, would you, can i", etc.)
- End Conversation Words: Words/phrases that end conversations (default: "bye, thanks, stop", etc.)
- Control Home Assistant: Enable/disable device control
- Max Tool Iterations: How many tool calls allowed per request
- Web Search Provider: Choose none, DuckDuckGo, or Brave Search
- Brave Search API Key: Your API key (if using Brave Search)
- Debug Mode: Extra logging for troubleshooting
- Ollama Keep Alive (Ollama only): Control how long models stay loaded in memory
  - `-1` = Keep loaded indefinitely
  - `0` = Unload immediately after response
  - `"5m"` = Keep for 5 minutes (default)
  - Duration strings like `"24h"` and `"168h"` are also supported
- Ollama Context Window (Ollama only): Custom context window size (0 = use model default)
- In Home Assistant, go to Settings → Voice Assistants
- Set your preferred assistant to your MCP Assist profile name
- Test with commands!
- "Turn on the kitchen lights"
- "Turn off all the lights in the bedroom"
- "What's the temperature in the living room?"
- User: "What lights are on?"
- Assistant: "The kitchen and living room lights are on."
- User: "Turn off the kitchen one"
- Assistant: "I've turned off the kitchen light."
User: "Do we have a leak?"
Behind the scenes:
1. LLM calls get_index → Sees moisture sensors and water flow monitors exist in system
2. LLM calls discover_entities(device_class="moisture")
→ Returns: binary_sensor.bathroom_leak, binary_sensor.kitchen_sink_leak, binary_sensor.laundry_leak
3. LLM calls discover_entities(name_contains="water flow")
→ Returns: sensor.water_flow_rate
4. LLM calls get_entity_details for each sensor
→ bathroom leak: "off", kitchen leak: "off", laundry leak: "on", water flow: "2.5 gpm"
5. LLM synthesizes response
Assistant: "Yes, the laundry room leak sensor is detecting water and water is flowing at 2.5 gallons per minute. The bathroom and kitchen sensors are dry."
Follow-up User: "Turn off the water main"
Behind the scenes:
1. LLM calls discover_entities(name_contains="water main")
→ Returns: switch.water_main_shutoff
2. LLM calls perform_action(entity_id="switch.water_main_shutoff", action="turn_off")
→ Success
Assistant: "I've shut off the main water valve."
- "What's the weather forecast for tomorrow?"
- "Search for the latest Home Assistant updates"
- "What time does the store close?"
- Profile Name: Unique name for this assistant
- Server Type: LM Studio, llama.cpp, Ollama, OpenAI, Google Gemini, Anthropic Claude, or OpenRouter
- Server URL: Where your LLM is running
- Model Name: Which model to use
- System Prompt: Sets the assistant's personality and behavior
- Technical Instructions: Low-level instructions for tool usage (usually leave as default)
- Max Response Tokens: Limit response length (default: 500)
- Max History Messages: How many conversation turns to remember (default: 10)
- Max Tool Iterations: Prevent infinite loops (default: 10)
- Response Mode:
- None: Never ask follow-ups, end immediately
- Smart (default): Contextual follow-ups when relevant, user can end with "bye"/"thanks"
- Always: Natural conversational follow-ups, user can end with "bye"/"thanks"
- Follow-up Phrases: Comma-separated phrases for detecting when assistant wants to continue (configurable per profile)
- End Conversation Words: Comma-separated words/phrases for user-initiated ending (configurable per profile)
- Enable Smart Entity Index: Context-aware entity discovery with automatic gap-filling for uncommon devices (default: enabled)
Temperature controls response randomness (0.0 = deterministic, 1.0 = creative). Different providers have different optimal values:
| Provider | Default | Reason |
|---|---|---|
| Gemini | 1.0 | Google requires 1.0 for Gemini 3 to avoid "looping or degraded performance" |
| OpenAI (GPT-4) | 0.5 | Balanced for reliable tool calling |
| OpenAI (GPT-5/o1) | N/A | Reasoning models don't use temperature |
| Anthropic Claude | 0.5 | Works well across the 0.5-1.0 range |
| LM Studio / llama.cpp | 0.5 | Lower temperatures improve tool calling accuracy |
| Ollama | 0.5 | Model-dependent; lower is safer for tools |
| OpenRouter | 0.5 | Depends on the underlying model |
Note: You can always override these defaults in Advanced Settings. For Home Assistant voice control, lower temperatures (0.5-0.7) generally provide more consistent tool calling and accurate entity control.
- MCP Server Port: Default 8090 (change if port conflict)
- Additional Allowed IPs/Ranges: Whitelist Docker containers (e.g., `172.30.0.0/16`) or specific IPs for external MCP clients such as the Claude Code add-on
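If you are unsure whether a CIDR range actually covers a container or client address, Python's standard `ipaddress` module gives a quick sanity check (the addresses below are only examples):

```python
from ipaddress import ip_address, ip_network

# Example values only: the Docker bridge range from above and a made-up client IP.
allowed = ip_network("172.30.0.0/16")
client = ip_address("172.30.32.5")

print(client in allowed)  # True -> a request from this address would be allowed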
- Web Search Provider: Choose between:
- None: Search disabled
- DuckDuckGo: Free web search (no API key required)
- Brave Search: Requires API key from https://brave.com/search/api/
- Brave Search API Key: Required only if using Brave Search
MCP Assist has two types of settings:
Per-Profile Settings (independent per conversation agent):
- Model name, system prompt, technical instructions
- Temperature, max tokens, response mode
- Debug mode, max iterations
- Server URL (for local LLMs)
Shared Settings (affect ALL profiles):
- MCP server port
- Web search provider (none/duckduckgo/brave)
- Brave API key
- Allowed IPs/CIDR ranges
- Smart entity index (gap-filling)
When you change shared settings in one profile's options, they apply to all profiles. This is intentional since all profiles share the same MCP server.
Not all LLM models support tool calling (function calling) equally well. This integration works best with frontier models (GPT-5.2, Claude Opus 4.5, Gemini 3 Flash) or higher-spec local models. Smaller models may struggle with complex multi-entity queries that require synthesizing large tool result sets.
Tool calling (function calling) requires the model to:
- Understand the user's request
- Decide which tool to call
- Format the tool arguments correctly as JSON (see the sketch after this list)
- Interpret the tool results
- Generate a natural response
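As a concrete example of the JSON-formatting step, here is roughly what a model emits for a simple command. The `entity_id` and `action` argument names follow the `perform_action` worked example earlier in this README; the exact wire format depends on your provider, so treat this as an illustration.

```python
import json

# Roughly what an OpenAI-style function/tool call looks like for
# "turn off the laundry light". Argument names follow the perform_action
# example earlier in this README; the exact format varies by provider.
tool_call = {
    "name": "perform_action",
    "arguments": json.dumps({"entity_id": "light.laundry", "action": "turn_off"}),
}

print(tool_call["name"], tool_call["arguments"])
```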
Factors affecting tool calling success:
- Model size: Larger models (8B+) generally handle tool calling better
- Model architecture: Vision-Language (VL) models behave differently than standard models
- Inference engine: LM Studio and Ollama optimize models differently
- Quantization level: Q4 vs Q8 can affect instruction following
Instruct Models (e.g., qwen3-8b-instruct):
- Fast response times
- Best for simple, single-action requests ("turn on the kitchen lights")
- May struggle with complex queries requiring multiple tool calls
- Good for basic voice commands
Thinking/Reasoning Models (e.g., qwen3-8b-thinking):
- Slower response times (more deliberate reasoning)
- Much better at complex requests requiring multiple tool calls and context
- Handles multi-step queries reliably ("check all rooms for open windows, then turn off lights in those rooms")
- Recommended for Home Assistant where queries often involve discovery + action combinations
Choose the model type that best fits your use case. Thinking/reasoning models offer better reliability with complex multi-tool queries, while instruct models provide faster responses for simple commands.
Consistently Reliable:
- ✅ Qwen3 VL 32B Instruct - Excellent tool calling
- ✅ Qwen3 30B A3B Instruct - Very good tool calling
- ✅ Qwen3 8B Instruct - Good balance, works reliably
- ✅ Anthropic Opus 4.5 - The very best at tool calling (cloud)
- ✅ OpenAI GPT-5.2 - Excellent tool calling, very fast (cloud)
- ✅ Google Gemini 3 Flash - Excellent tool calling, fast, cost-effective (cloud)
When tool calling doesn't work, you'll see:
- Model claims "I turned on the lights" but nothing happens
- No `perform_action` tool calls in the logs
- Actions don't execute, only narration
When tool calling works correctly, you'll see in logs:
- `discover_entities` called to find devices
- `perform_action` called to control them
- "✅ Successfully executed" messages
- Devices actually change state
- Tool calls visible in Settings → Voice Assistants → (profile) → Debug
Start with larger models (30B) if your hardware supports it - they work consistently across platforms.
If using smaller models (4B-8B), test thoroughly:
- Try a simple command like "turn on the kitchen lights"
- Check logs to verify tools are being called
- Confirm the device actually changes state
- If it doesn't work, try the same model on a different platform (LM Studio vs Ollama)
Vision-Language (VL) models are optimized for multimodal tasks and may have different tool calling behavior than standard models.
One of MCP Assist's features is dynamic model switching - you can change models in the configuration UI and it takes effect immediately without restarting Home Assistant. This makes it easy to:
- Test different models
- Switch between fast (Q4) and quality (Q8) quantizations
- Try new models as they're released
- Check that the MCP port isn't already in use
- Verify Home Assistant has permission to bind to the port
- Check the Home Assistant logs for specific error messages
- Ensure the MCP server is running (check integration status)
- Verify the MCP configuration in LM Studio is correct
- Check that the URL in LM Studio matches your MCP port (default: 8090)
- Restart LM Studio after changing MCP configuration
- Verify Ollama is running: `ollama list`
- Check the URL matches where Ollama is running (default: `http://localhost:11434`)
- Ensure the model is loaded in Ollama
- Verify your API key is valid with your provider
- Check you have sufficient credits/quota remaining
- Check for rate limit errors in Home Assistant logs
- Try regenerating your API key if authentication fails
- Verify your internet connection is working
- Check Home Assistant logs for specific error codes
- Verify your LLM has a model loaded
- Check that the model name in the integration matches exactly
- Ensure your entities are exposed to the conversation assistant
- Check Home Assistant logs for API errors
- Enable Debug Mode for more detailed logging
- Try a different/larger model
- Adjust the temperature setting (lower = more focused)
- Ensure the model supports tool calling/function calling
- Check that Technical Instructions are not modified
- Verify "Control Home Assistant" is enabled
- Check that entities are exposed (Settings → Voice Assistants → Expose)
- Look for MCP server errors in logs
- Ensure Max Tool Iterations isn't set too low
The integration only discovers entities that are exposed to the "conversation" assistant. To expose entities:
- Go to Settings → Voice Assistants → Expose
- Select entities you want the assistant to control
- The integration will automatically discover these when needed
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Home Assistant Community: Community Forum