
[BUG] Bedrock Guardrail False Positive on Tool Results #1671

@peter-nagy1

Description

Checks

  • I have updated to the latest minor and patch version of Strands
  • I have checked the documentation and this is not expected behavior
  • I have searched ./issues and there are no duplicates of my issue

Strands Version

1.23.0

Python Version

3.12.12

Operating System

macOS 26.2

Installation Method

other

Steps to Reproduce

Summary

Bedrock Guardrails incorrectly flags tool results as prompt injection attacks on subsequent messages. Tool results are stored in the conversation history with role: "user", and when the guardrail evaluates the conversation on any follow-up message, it misidentifies previous tool results as malicious user input.

Key Finding: The issue occurs because tool results are stored with role: "user" in the conversation history. When the guardrail evaluates subsequent requests, it scans all role: "user" messages—including tool results—and flags certain patterns (e.g., "You are Test Admin User.") as prompt injection attacks.
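
For reference, this is roughly how that tool result sits in agent.messages, in the Bedrock Converse message shape (a minimal sketch; the toolUseId value is illustrative):

# The toolResult from identity_tool is stored under role "user",
# even though no human typed this content.
tool_result_message = {
    "role": "user",
    "content": [
        {
            "toolResult": {
                "toolUseId": "tooluse_xyz",  # illustrative ID
                "status": "success",
                "content": [{"text": "You are Test Admin User."}],
            }
        }
    ],
}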

Environment

  • SDK: AWS Strands Agents
  • Model: us.anthropic.claude-haiku-4-5-20251001-v1:0
  • Guardrail: Bedrock Guardrail with prompt attack detection enabled

Code snippet

"""Simple reproduction of guardrail false positive."""

import os

import boto3
import strands
from dotenv import load_dotenv
from strands import Agent
from strands.models.bedrock import BedrockModel

load_dotenv()

@strands.tool
def identity_tool() -> str:
    """Tool that returns identity-like text."""
    return "You are Test Admin User."

boto_session = boto3.Session()
bedrock_model = BedrockModel(
    boto_session=boto_session,
    model_id=os.getenv("BEDROCK_MODEL_ID_HAIKU_4_5"),
    streaming=False,
    guardrail_id=os.getenv("BEDROCK_GUARDRAIL"),
    guardrail_version="DRAFT",
    guardrail_trace="enabled",
    guardrail_redact_input=False
)

agent = Agent(
    model=bedrock_model,
    tools=[identity_tool],
)

print("--- Step 1 ---")

# Step 1: This triggers identity_tool, returns "You are Test Admin User."
agent("Call the identity tool")

print("\n\n--- Step 2 ---")

# Step 2: This fails with prompt attack detection
# Guardrail sees "You are Test Admin User." in history as role="user"
agent("Hi there")  # <-- FAILS HERE

print("\n\nCONVERSATION:")
for i, msg in enumerate(agent.messages):
    print(f"[{i}] role: {msg.get('role')}")
    print(f"    content: {msg.get('content')}")
    print()

What happens

  1. A tool returns innocuous text like "You are Test Admin User."
  2. This gets stored as role: "user" in the conversation history (see the sketch after this list)
  3. User sends any follow-up message (e.g., "Hi there")
  4. The guardrail evaluates the full conversation history
  5. The guardrail sees the previous tool result with role: "user" and flags it as a prompt injection attack
  6. The request is blocked
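
The role assignment in step 2 can be confirmed by printing each message's role next to the kind of each content block (a small sketch against the reproduction script above; the expected output is illustrative):

# Each content block is a single-key dict ("text", "toolUse", "toolResult", ...),
# so the first key tells us what kind of block it is.
for msg in agent.messages:
    block_kinds = [next(iter(block)) for block in msg["content"]]
    print(msg["role"], block_kinds)

# Illustrative output for the run above -- the tool result carries role "user":
#   user      ['text']
#   assistant ['toolUse']
#   user      ['toolResult']   <-- flagged on the next request
#   assistant ['text']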

Expected Behavior

The guardrail should distinguish between actual user input and tool results when evaluating for prompt injection attacks.

Actual Behavior

The guardrail treats all role: "user" messages identically, regardless of whether they are:

  • Actual user input, or
  • Tool results returned by the system

Additional Context

Investigation & Attempted Fixes

  • Enabling guardrail_latest_message does not help: in the Strands SDK's _format_bedrock_messages method, the last message only gets wrapped in guard content when it is text or an image. A toolResult is neither, so nothing gets wrapped and the guardrail still evaluates the whole conversation history.
  • Disabling the guardrail when receiving tool results (via hooks) may pose security risks.
  • Since _format_bedrock_messages preserves existing guardContent, a hook could wrap the toolResult in guardContent when the agent emits event.agent.messages, so the guardrail evaluates only the toolResult. This could work, but it is not an ideal solution (see the first sketch after this list).
  • Disabling the built-in guardrail on the Bedrock model and instead calling ApplyGuardrail manually via boto3 in hooks would give full control over when and what the guardrail scans, but it adds extra code to maintain (see the second sketch after this list).
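
For the guardContent idea, this is roughly the Converse content block the hook would have to produce (a minimal sketch; as I understand the qualifier semantics, "guard_content" marks the text as supporting material rather than a user query, so the prompt attack filter should not treat it as user input):

# Sketch of a guardContent-wrapped content block for the tool result text.
wrapped_tool_result_text = {
    "guardContent": {
        "text": {
            "text": "You are Test Admin User.",
            "qualifiers": ["guard_content"],
        }
    }
}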
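
For the manual approach, the ApplyGuardrail call itself is straightforward (a sketch using the bedrock-runtime API; the hook wiring is omitted):

# Scan only the genuine user input, not the tool results sitting in history.
import os

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier=os.getenv("BEDROCK_GUARDRAIL"),
    guardrailVersion="DRAFT",
    source="INPUT",  # evaluate the text as user input
    content=[{"text": {"text": "Hi there"}}],
)

# "GUARDRAIL_INTERVENED" means the guardrail flagged the content.
if response["action"] == "GUARDRAIL_INTERVENED":
    raise RuntimeError("Input blocked by guardrail")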

Possible Solution

No response

Related Issues

No response

Metadata

Labels: bug (Something isn't working)