Production-ready examples for building multimodal AI agents with the Strands Agents framework. Process images, documents, and videos with persistent memory, all in 10-30 lines of code.
Build intelligent agents that process multiple content types with built-in tools—no custom code required:
| Content Type | Formats | Built-in Tool |
|---|---|---|
| Images | PNG, JPEG, GIF, WebP | `image_reader`, `generate_image` |
| Documents | PDF, CSV, DOCX, XLS, XLSX | `file_read` |
| Videos | MP4, MOV, AVI, MKV, WebM | `video_reader`, `nova_reels` |
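For example, a single agent can combine these tools across modalities. A minimal sketch, assuming sample files from `data-sample/` (the file names here are placeholders):

```python
# Minimal sketch: one agent handles both images and documents.
from strands import Agent
from strands_tools import file_read, image_reader

agent = Agent(tools=[image_reader, file_read])

# Placeholder file names; any formats from the table above work.
# The agent routes each file to image_reader or file_read automatically.
response = agent("Describe photo.png, then summarize report.pdf")
```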
## Why Strands Agents?
```python
# Traditional approach: 100+ lines of custom code
# Strands approach: 10 lines with built-in tools
from strands import Agent
from strands.models import BedrockModel
from strands_tools import generate_image, nova_reels

agent = Agent(
    model=BedrockModel(model_id="us.anthropic.claude-3-5-sonnet-20241022-v2:0"),
    tools=[generate_image, nova_reels]  # Built-in tools ready to use!
)

response = agent("Generate a travel image and video of Paris")
```

✅ 10-30 lines of code for complete agents
✅ Built-in tools - No custom code required
✅ AWS-native - Seamless Bedrock integration
✅ Production-ready - Memory, observability included
## Prerequisites

- AWS account with Amazon Bedrock access
- Python 3.9 or later
- AWS CLI configured
```bash
# Clone repository
git clone https://github.com/elizabethfuentes12/strands-agent-samples
cd strands-agent-samples/notebook

# Set up environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt

# Configure AWS
aws configure

# Open notebooks in your preferred IDE
# VS Code, JupyterLab, or any notebook editor
```

| # | Notebook | Built-in Tools | Code | What You Build |
|---|---|---|---|---|
| **Beginner** | | | | |
| 1 | Hello World | Basic tools | ~10 lines | Basic agent setup |
| 2 | Custom Tools | Custom tools | ~15 lines | Tool integration |
| 3 | MCP Integration | MCP | ~20 lines | External services |
| **Multimodal AI** | 📁 `multimodal-understanding/` | | | |
| 01 | Image & Document Analysis | `image_reader`, `file_read` | ~15 lines | Multimodal content processing |
| 02 | Video Analysis & MCP | `video_reader`, MCP | ~20 lines | Video processing + external tools |
| 03 | Local Memory with FAISS | `mem0_memory` | ~25 lines | Vector storage & semantic search |
| 04 | Production Memory with S3 | `s3_vector_memory` | ~25 lines | AWS-native vector storage |
| 05 | AI Content Generation | `generate_image`, `nova_reels` | ~30 lines | Generate images & videos |
| 06 | Intelligent Travel Assistant | All tools combined | ~35 lines | Complete AI assistant |
| **Advanced** | | | | |
| 4 | MCP Tools | MCP servers | ~25 lines | Custom MCP servers |
| 5 | Agent-to-Agent | A2A protocol | ~30 lines | Multi-agent systems |
| 6 | Observability | LangFuse, RAGAS | ~35 lines | Production monitoring |
```
strands-agent-samples/
├── notebook/                                  # Learning materials
│   ├── 01-hello-world-strands-agents.ipynb
│   ├── 02-custom-tools.ipynb
│   ├── 03-mcp-integration.ipynb
│   ├── 04-Strands_MCP_AND_Tools.ipynb
│   ├── 05-Strands_A2A_Tools.ipynb
│   ├── 06-Strands_Observability_with_LangFuse_and_Evaluation_with_RAGAS.ipynb
│   ├── multimodal-understanding/              # 6-chapter journey
│   │   ├── 01-multimodal-basic.ipynb
│   │   ├── 02-multimodal-with-mcp.ipynb
│   │   ├── 03-multimodal-with-faiss.ipynb
│   │   ├── 04-multimodal-with-s3-vectors.ipynb
│   │   ├── 05-travel-content-generator.ipynb
│   │   ├── 06-travel-assistant-demo.ipynb
│   │   ├── video_reader.py
│   │   ├── video_reader_local.py
│   │   ├── s3_memory.py
│   │   └── travel_content_generator.py
│   ├── mcp_calulator.py
│   ├── mcp_custom_tools_server.py
│   ├── run_a2a_system.py
│   ├── data-sample/                           # Test files
│   └── requirements.txt
└── my_agent_cdk/                              # AWS CDK deployment
    ├── lambdas/code/lambda-s-agent            # Weather agent
    └── lambdas/code/lambda-s-multimodal       # Multimodal agent
```
Process diverse content types through unified interfaces:
- Image Analysis - Visual understanding with Claude Sonnet
- Document Processing - Text extraction from PDFs, Office files
- Video Analysis - Frame extraction and temporal understanding
- Content Generation - Create images and videos with Amazon Nova
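As a sketch of the video path, assuming the repo's local `video_reader.py` tool from `multimodal-understanding/` is importable (the exported name `video_reader` is an assumption based on the file name; notebook 02 shows the real usage):

```python
# Sketch: video understanding with the repo's custom tool.
# Assumes you run this from notebook/multimodal-understanding/.
from strands import Agent
from video_reader import video_reader  # local module in this repo (assumed export name)

agent = Agent(tools=[video_reader])
response = agent("What happens in the first minute of demo.mp4?")
```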
Build agents that remember and learn:
- FAISS - Local vector storage for development
- S3 Vectors - AWS-native production memory
- User Isolation - Multi-tenant memory management
- Semantic Search - Context-aware information retrieval
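A minimal sketch of store-and-retrieve, assuming the `mem0_memory` tool's `action`/`user_id` parameters as used in notebooks 03-04 (treat the exact signature as an assumption; the tool defaults to local FAISS storage unless configured for a remote backend):

```python
# Sketch: per-user persistent memory with the mem0_memory built-in tool.
from strands import Agent
from strands_tools import mem0_memory

agent = Agent(tools=[mem0_memory])

# Store a fact for one user; user_id keeps tenants isolated.
agent.tool.mem0_memory(action="store",
                       content="Prefers window seats on long flights",
                       user_id="traveler-42")

# Retrieve it later with a semantic query.
results = agent.tool.mem0_memory(action="retrieve",
                                 query="seating preference",
                                 user_id="traveler-42")
```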
Enterprise-ready implementations:
- Observability - LangFuse tracing and monitoring
- Evaluation - RAGAS metrics for quality assessment
- Multi-Agent - A2A protocol for agent collaboration
- Serverless - AWS Lambda deployment with CDK
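For observability, here is a sketch of wiring LangFuse in over OpenTelemetry, assuming Strands' `StrandsTelemetry` helper and LangFuse's public OTLP endpoint (notebook 06 walks through the full setup):

```python
# Sketch: export agent traces to LangFuse via OTLP.
import base64
import os

# LangFuse authenticates OTLP requests with Basic auth over your API keys.
auth = base64.b64encode(
    f"{os.environ['LANGFUSE_PUBLIC_KEY']}:{os.environ['LANGFUSE_SECRET_KEY']}".encode()
).decode()
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://cloud.langfuse.com/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {auth}"

from strands.telemetry import StrandsTelemetry

StrandsTelemetry().setup_otlp_exporter()  # subsequent agent calls are traced
```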
| Model | Capability | Use Case |
|---|---|---|
| Claude 3.5 Sonnet | Text, images, documents | Multimodal understanding |
| Amazon Nova Pro | Video analysis | Video content processing |
| Amazon Nova Canvas | Image generation | Create visual content |
| Amazon Nova Reel | Video generation | Generate video content |
| Titan Embeddings | Vector generation | Semantic search |
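Switching models is a one-line change. A sketch, noting that model IDs vary by region and may require an inference-profile prefix such as `us.` (the Nova Pro ID below is an assumption):

```python
# Sketch: pick the model that matches the task.
from strands import Agent
from strands.models import BedrockModel

vision_agent = Agent(model=BedrockModel(
    model_id="us.anthropic.claude-3-5-sonnet-20241022-v2:0"))  # text, images, documents
video_agent = Agent(model=BedrockModel(
    model_id="us.amazon.nova-pro-v1:0"))  # video understanding (ID is an assumption)
```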
| Service | Purpose |
|---|---|
| Amazon Bedrock | Model inference |
| Amazon S3 Vectors | Vector storage |
| Amazon S3 | Media storage |
| AWS Lambda | Serverless compute |
| Framework | Purpose |
|---|---|
| Strands Agents SDK | Agent framework |
| FAISS | Vector search |
| LangFuse | Observability |
| RAGAS | Evaluation |
- Automated content moderation for images and videos
- Document analysis for compliance and insights
- Multi-format data extraction and processing
- Customer support with conversation memory
- Research assistants with cross-modal correlation
- Educational tutors with adaptive learning
- Business intelligence with automated insights
- Knowledge management with semantic search
- Automated content generation pipelines
Deploy serverless agents to AWS Lambda:
```bash
cd my_agent_cdk

# Install dependencies
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Package Lambda layers
pip install -r layers/lambda_requirements.txt \
    --python-version 3.12 \
    --platform manylinux2014_aarch64 \
    --target layers/strands/_dependencies \
    --only-binary=:all:
python layers/package_for_lambda.py

# Deploy
cdk bootstrap  # First time only
cdk deploy
```

**Available Functions:**
- Weather Agent - Forecasting with Strands Agent
- Multimodal Agent - Process images, documents, videos
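The handlers follow the usual Lambda shape. A hypothetical sketch (the event format and names here are illustrative, not the deployed code under `lambdas/code/`):

```python
# Hypothetical Lambda handler wrapping a Strands agent.
from strands import Agent
from strands.models import BedrockModel

# Created at import time so warm invocations reuse the agent.
agent = Agent(model=BedrockModel(
    model_id="us.anthropic.claude-3-5-sonnet-20241022-v2:0"))

def lambda_handler(event, context):
    # Expects {"prompt": "..."} from the caller (e.g., API Gateway).
    result = agent(event["prompt"])
    return {"statusCode": 200, "body": str(result)}
```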
- Start with beginner notebooks
- Progress through multimodal journey
- Test locally with FAISS
- Deploy to production with S3 Vectors
- Use FAISS for development (free, local)
- Monitor Bedrock API usage
- Optimize prompt lengths
- Use appropriate model sizes
- Implement observability with LangFuse
- Set up evaluation with RAGAS
- Use S3 Vectors for scalable memory
- Follow AWS security best practices
| Issue | Solution |
|---|---|
| AWS credentials not found | Run `aws configure` with valid credentials |
| Bedrock access denied | Enable models in the Bedrock console |
| Import errors | Verify `pip install -r requirements.txt` completed |
| Video generation fails | Check the S3 bucket exists and has permissions |
Need help? Check Strands documentation or open an issue.
Contributions welcome! See CONTRIBUTING.md for guidelines.
Report security issues per CONTRIBUTING.md.
This library is licensed under the MIT-0 License. See LICENSE.
Ready to build intelligent multimodal AI agents?
Start with the notebooks and explore the possibilities.