Strands Agent Samples

Python 3.9+ · Amazon Bedrock · MIT-0 License

Production-ready examples for building multimodal AI agents with the Strands Agents framework. Process images, documents, and videos, and give agents persistent memory, in 10-30 lines of code.


🎯 What You'll Build

Build intelligent agents that process multiple content types with built-in tools, no custom code required; a minimal example follows the table:

Content Type | Formats                    | Built-in Tools
Images       | PNG, JPEG, GIF, WebP       | image_reader, generate_image
Documents    | PDF, CSV, DOCX, XLS, XLSX  | file_read
Videos       | MP4, MOV, AVI, MKV, WebM   | video_reader, nova_reels
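
A minimal sketch of the image and document tools in action, assuming the strands-agents and strands-agents-tools packages from the Quick Start below are installed; the data-sample file names are placeholders:

from strands import Agent
from strands.models import BedrockModel
from strands_tools import image_reader, file_read

# The model routes each prompt to the matching built-in tool
agent = Agent(
    model=BedrockModel(model_id="us.anthropic.claude-3-5-sonnet-20241022-v2:0"),
    tools=[image_reader, file_read],
)

agent("Describe the chart in ./data-sample/photo.png")  # placeholder path
agent("Summarize ./data-sample/report.pdf")             # placeholder path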

Why Strands Agents?

# Traditional approach: 100+ lines of custom code
# Strands approach: 10 lines with built-in tools

from strands import Agent
from strands.models import BedrockModel
from strands_tools import generate_image, nova_reels

agent = Agent(
    model=BedrockModel(model_id="us.anthropic.claude-3-5-sonnet-20241022-v2:0"),
    tools=[generate_image, nova_reels]  # Built-in tools ready to use!
)

response = agent("Generate a travel image and video of Paris")

  • 10-30 lines of code for complete agents
  • Built-in tools - no custom code required
  • AWS-native - seamless Bedrock integration
  • Production-ready - memory and observability included


🚀 Quick Start

Prerequisites

  • AWS account with Amazon Bedrock access
  • Python 3.9 or later
  • AWS CLI configured

Installation

# Clone repository
git clone https://github.com/elizabethfuentes12/strands-agent-samples
cd strands-agent-samples/notebook

# Set up environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt

# Configure AWS
aws configure

# Open notebooks in your preferred IDE
# VS Code, JupyterLab, or any notebook editor

📚 Complete Learning Path

#  | Notebook                      | Built-in Tools              | Code      | What You Build

Beginner
1  | Hello World                   | Basic tools                 | ~10 lines | Basic agent setup
2  | Custom Tools                  | Custom tools                | ~15 lines | Tool integration
3  | MCP Integration               | MCP                         | ~20 lines | External services

Multimodal AI (📁 multimodal-understanding/)
01 | Image & Document Analysis     | image_reader, file_read     | ~15 lines | Multimodal content processing
02 | Video Analysis & MCP          | video_reader, MCP           | ~20 lines | Video processing + external tools
03 | Local Memory with FAISS       | mem0_memory                 | ~25 lines | Vector storage & semantic search
04 | Production Memory with S3     | s3_vector_memory            | ~25 lines | AWS-native vector storage
05 | AI Content Generation         | generate_image, nova_reels  | ~30 lines | Generate images & videos
06 | Intelligent Travel Assistant  | All tools combined          | ~35 lines | Complete AI assistant

Advanced
4  | MCP Tools                     | MCP servers                 | ~25 lines | Custom MCP servers
5  | Agent-to-Agent                | A2A protocol                | ~30 lines | Multi-agent systems
6  | Observability                 | LangFuse, RAGAS             | ~35 lines | Production monitoring

📖 Detailed guides | 📝 Article
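
The Custom Tools notebook above shows how any Python function becomes an agent tool. As a hedged preview of the pattern, using the Strands SDK's @tool decorator (the word_count tool is a made-up example, not the notebook's code):

from strands import Agent, tool

@tool
def word_count(text: str) -> int:
    """Count the words in a piece of text."""  # the docstring becomes the tool description
    return len(text.split())

# The agent can now call word_count whenever a prompt asks for it
agent = Agent(tools=[word_count])
agent("How many words are in: 'Strands agents are small but capable'?")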


🏗️ Repository Structure

strands-agent-samples/
├── notebook/                              # Learning materials
│   ├── 01-hello-world-strands-agents.ipynb
│   ├── 02-custom-tools.ipynb
│   ├── 03-mcp-integration.ipynb
│   ├── 04-Strands_MCP_AND_Tools.ipynb
│   ├── 05-Strands_A2A_Tools.ipynb
│   ├── 06-Strands_Observability_with_LangFuse_and_Evaluation_with_RAGAS.ipynb
│   ├── multimodal-understanding/         # 6-chapter journey
│   │   ├── 01-multimodal-basic.ipynb
│   │   ├── 02-multimodal-with-mcp.ipynb
│   │   ├── 03-multimodal-with-faiss.ipynb
│   │   ├── 04-multimodal-with-s3-vectors.ipynb
│   │   ├── 05-travel-content-generator.ipynb
│   │   ├── 06-travel-assistant-demo.ipynb
│   │   ├── video_reader.py
│   │   ├── video_reader_local.py
│   │   ├── s3_memory.py
│   │   └── travel_content_generator.py
│   ├── mcp_calulator.py
│   ├── mcp_custom_tools_server.py
│   ├── run_a2a_system.py
│   ├── data-sample/                      # Test files
│   └── requirements.txt
└── my_agent_cdk/                         # AWS CDK deployment
    ├── lambdas/code/lambda-s-agent       # Weather agent
    └── lambdas/code/lambda-s-multimodal  # Multimodal agent
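
The mcp_*.py files implement MCP servers used by the notebooks. A sketch of the client side, assuming the Strands MCPClient API and using this repo's server script as the stdio command (the wiring details are assumptions, not the notebooks' exact code):

from mcp import StdioServerParameters, stdio_client
from strands import Agent
from strands.tools.mcp import MCPClient

# Launch a local MCP server over stdio and expose its tools to an agent
mcp_client = MCPClient(lambda: stdio_client(
    StdioServerParameters(command="python", args=["mcp_custom_tools_server.py"])
))

with mcp_client:
    agent = Agent(tools=mcp_client.list_tools_sync())
    agent("Use an MCP tool to answer this request")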

🎯 Key Features

Multimodal Processing

Process diverse content types through unified interfaces:

  • Image Analysis - Visual understanding with Claude Sonnet
  • Document Processing - Text extraction from PDFs, Office files
  • Video Analysis - Frame extraction and temporal understanding
  • Content Generation - Create images and videos with Amazon Nova

Memory Systems

Build agents that remember and learn (a local-memory sketch follows this list):

  • FAISS - Local vector storage for development
  • S3 Vectors - AWS-native production memory
  • User Isolation - Multi-tenant memory management
  • Semantic Search - Context-aware information retrieval
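
The notebooks wire this up through the mem0_memory and s3_vector_memory tools; the sketch below shows the underlying pattern directly with FAISS and Titan Embeddings. The model ID, its 1024-dimension output, and the stored strings are assumptions and placeholders:

import json

import boto3
import faiss
import numpy as np

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> np.ndarray:
    # Titan Embeddings v2 returns a 1024-dimension vector (assumed model ID)
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return np.array(json.loads(response["body"].read())["embedding"], dtype="float32")

# Store a few memories in a local L2 index
memories = ["User prefers window seats", "User is vegetarian"]  # placeholders
index = faiss.IndexFlatL2(1024)
index.add(np.stack([embed(m) for m in memories]))

# Semantic search: retrieve the stored memory closest to a new question
_, ids = index.search(embed("What does the user eat?").reshape(1, -1), k=1)
print(memories[ids[0][0]])  # -> "User is vegetarian"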

Production Patterns

Enterprise-ready implementations (a tracing sketch follows this list):

  • Observability - LangFuse tracing and monitoring
  • Evaluation - RAGAS metrics for quality assessment
  • Multi-Agent - A2A protocol for agent collaboration
  • Serverless - AWS Lambda deployment with CDK
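
Strands emits OpenTelemetry traces, and LangFuse accepts OTLP. A hedged configuration sketch: the keys are placeholders, and the StrandsTelemetry helper reflects the Strands observability documentation rather than this repo's notebooks:

import base64
import os

# Point the OTLP exporter at LangFuse Cloud (keys are placeholders)
public_key, secret_key = "pk-lf-...", "sk-lf-..."
auth = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://cloud.langfuse.com/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {auth}"

from strands.telemetry import StrandsTelemetry  # assumption: per Strands observability docs

StrandsTelemetry().setup_otlp_exporter()  # agent traces now flow to LangFuse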

🛠️ Technologies

AI Models

Model              | Purpose                 | Use Case
Claude 3.5 Sonnet  | Text, images, documents | Multimodal understanding
Amazon Nova Pro    | Video analysis          | Video content processing
Amazon Nova Canvas | Image generation        | Create visual content
Amazon Nova Reel   | Video generation        | Generate video content
Titan Embeddings   | Vector generation       | Semantic search

AWS Services

Service           | Purpose
Amazon Bedrock    | Model inference
Amazon S3 Vectors | Vector storage
Amazon S3         | Media storage
AWS Lambda        | Serverless compute

Frameworks

Framework          | Purpose
Strands Agents SDK | Agent framework
FAISS              | Vector search
LangFuse           | Observability
RAGAS              | Evaluation

💡 Use Cases

Content Intelligence

  • Automated content moderation for images and videos
  • Document analysis for compliance and insights
  • Multi-format data extraction and processing

Intelligent Assistants

  • Customer support with conversation memory
  • Research assistants with cross-modal correlation
  • Educational tutors with adaptive learning

Enterprise Solutions

  • Business intelligence with automated insights
  • Knowledge management with semantic search
  • Automated content generation pipelines

☁️ AWS CDK Deployment

Deploy serverless agents to AWS Lambda:

cd my_agent_cdk

# Install dependencies
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Package Lambda layers
pip install -r layers/lambda_requirements.txt \
  --python-version 3.12 \
  --platform manylinux2014_aarch64 \
  --target layers/strands/_dependencies \
  --only-binary=:all:

python layers/package_for_lambda.py

# Deploy
cdk bootstrap  # First time only
cdk deploy

Available Functions (a minimal handler sketch follows):

  • Weather Agent - Forecasting with Strands Agent
  • Multimodal Agent - Process images, documents, videos
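
For orientation, a hypothetical minimal handler in the shape of these functions; the real code under lambdas/code/ differs, and the event shape here is assumed:

from strands import Agent
from strands.models import BedrockModel

# Created once per container so warm invocations reuse the agent
agent = Agent(model=BedrockModel(model_id="us.anthropic.claude-3-5-sonnet-20241022-v2:0"))

def handler(event, context):
    # Assumed event shape: {"prompt": "..."}
    result = agent(event["prompt"])
    return {"statusCode": 200, "body": str(result)}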

📖 View CDK documentation →


📖 Documentation

Strands Agents

AWS Services

Frameworks

Articles


🎓 Best Practices

Development Workflow

  1. Start with beginner notebooks
  2. Progress through multimodal journey
  3. Test locally with FAISS
  4. Deploy to production with S3 Vectors

Cost Optimization

  • Use FAISS for development (free, local)
  • Monitor Bedrock API usage
  • Optimize prompt lengths
  • Use appropriate model sizes

Production Deployment

  • Implement observability with LangFuse
  • Set up evaluation with RAGAS
  • Use S3 Vectors for scalable memory
  • Follow AWS security best practices

🐛 Troubleshooting

Issue                     | Solution
AWS credentials not found | Run aws configure with valid credentials
Bedrock access denied     | Enable model access in the Amazon Bedrock console
Import errors             | Verify that pip install -r requirements.txt completed successfully
Video generation fails    | Confirm the target S3 bucket exists and your role can write to it

Need help? Check Strands documentation or open an issue.


🤝 Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

🔒 Security

Report security issues per CONTRIBUTING.md.

📄 License

This library is licensed under the MIT-0 License. See LICENSE.


Ready to build intelligent multimodal AI agents?
Start with the notebooks and explore the possibilities.


🇻🇪🇨🇱 Created by Eli | Dev.to | GitHub | Twitter | YouTube
