Lab 2: Personalize our agent by adding memory
Author: Aarushi Nema
Date: 2025-09-17
Tags: Customer Support Agent, Amazon Bedrock AgentCore, AWS, Machine Learning, Agentic AI, Memory, Personalization, Short-Term Memory, Long-Term Memory, Production, Cloud Computing, Strands Agents, Anthropic Claude, Bedrock
Categories: Machine Learning, Agentic AI, AWS, Workshop
Abstract: This lab focuses on enhancing the customer support agent with memory capabilities using Amazon Bedrock AgentCore Memory. Learn to implement both short-term and long-term memory for personalized customer experiences.
Overview
In Lab 1, you built a Customer Support Agent that worked well for a single user in a local session. However, real-world customer support needs to scale beyond a single user running in a local environment.
When we run an agent in production, we’ll need:
- Multi-User Support: Handle thousands of customers simultaneously
- Persistent Storage: Save conversations beyond the session lifecycle
- Long-Term Learning: Extract customer preferences and behavioral patterns
- Cross-Session Continuity: Remember customers across different interactions
Workshop Progress:
- Lab 1 (Done): Create Agent Prototype - Build a functional customer support agent
- Lab 2 (Current): Enhance with Memory - Add conversation context and personalization
- Lab 3: Scale with Gateway & Identity - Share tools across agents securely
- Lab 4: Deploy to Production - Use AgentCore Runtime with observability
- Lab 5: Build User Interface - Create a customer-facing application
In this lab, you’ll add the missing persistence and learning layer that transforms your goldfish agent (it forgets the conversation within seconds) into a smart, personalized assistant.
Memory is a critical component of intelligence. While Large Language Models (LLMs) have impressive capabilities, they lack persistent memory across conversations. Amazon Bedrock AgentCore Memory addresses this limitation by providing a managed service that enables AI agents to maintain context over time, remember important facts, and deliver consistent, personalized experiences.
AgentCore Memory operates on two levels (see the sketch after this list):
- Short-Term Memory: Immediate conversation context and session-based information that provides continuity within a single interaction or closely related sessions.
- Long-Term Memory: Persistent information extracted and stored across multiple conversations, including facts, preferences, and summaries that enable personalized experiences over time.
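As a quick preview of how these two levels surface in code, here is a minimal sketch of the two MemoryClient calls this lab is built around. The memory ID, actor ID, session ID, and message contents below are placeholders for illustration; the real resource is created in Step 2.

```python
# Minimal sketch of the two core calls used throughout this lab.
# "<memory-id>" is a placeholder; the real ID is created in Step 2.
from bedrock_agentcore.memory import MemoryClient

client = MemoryClient(region_name="us-west-2")

# Short-Term Memory: record a conversation turn as an event (stored instantly)
client.create_event(
    memory_id="<memory-id>",
    actor_id="customer_001",
    session_id="session_001",
    messages=[
        ("I need a quiet mechanical keyboard.", "USER"),
        ("Noted - I'll look for low-noise switch options.", "ASSISTANT"),
    ],
)

# Long-Term Memory: query what the strategies extracted (processed asynchronously)
memories = client.retrieve_memories(
    memory_id="<memory-id>",
    namespace="support/customer/customer_001/preferences",
    query="keyboard preferences",
)
```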
Learning Objectives
- Understand the importance of memory in AI agents
- Implement AgentCore Memory resources with multiple strategies
- Configure short-term and long-term memory for customer support
- Learn about USER_PREFERENCE and SEMANTIC memory strategies
- Transform a stateless agent into a personalized assistant
Architecture for Lab 2
<img src="images/architecture_lab2_memory.png" width="75%"/>
Multi-user agent with persistent short-term and long-term memory capabilities.
Prerequisites
- AWS Account with appropriate permissions
- Python 3.10+ installed locally
- AWS CLI configured with credentials
- Anthropic Claude 3.7 enabled on Amazon Bedrock
- Strands Agents and other libraries installed in the next cells
- These resources are created for you within an AWS workshop account:
  - AWS Lambda function
  - AWS Lambda Execution IAM Role
  - AgentCore Gateway IAM Role
  - DynamoDB tables used by the AWS Lambda function
  - Cognito User Pool and User Pool Client

Not using an AWS workshop account?
Note: If you are running this as a self-paced lab, you must first create the CloudFormation resources as shown in the workshop's self-paced steps. If you have not, uncomment and run the code segment below.

    #!bash scripts/prereq.sh
Step 1: Import Libraries
Let’s import the libraries for AgentCore Memory. We will use the Amazon Bedrock AgentCore Python SDK, a lightweight wrapper that helps you work with AgentCore capabilities.
```python
import logging

import boto3
from boto3.session import Session

# Import AgentCore Memory
from bedrock_agentcore.memory import MemoryClient
from bedrock_agentcore.memory.constants import StrategyType
from strands.hooks import AfterInvocationEvent, HookProvider, HookRegistry, MessageAddedEvent

from lab_helpers.utils import get_ssm_parameter, put_ssm_parameter

boto_session = Session()
REGION = boto_session.region_name

logger = logging.getLogger(__name__)
```
Step 2: Create Bedrock AgentCore Memory resources
Amazon Bedrock AgentCore Memory is a fully managed service that provides persistent memory capabilities for AI agents.
AgentCore Memory Concepts:
- Short-Term Memory (STM): Immediately stores conversation context within the session
- Long-Term Memory (LTM): Asynchronously processes STM to extract meaningful patterns, preferences and facts
- Memory Strategies: Different approaches for extracting and organizing information:
  - USER_PREFERENCE: Learns customer preferences, behaviors, and patterns
  - SEMANTIC: Stores factual information using vector embeddings for similarity search
- Namespaces: Logical grouping of memories by customer and context type. We’ll create these two namespaces (see the short example below):
  - support/customer/{actorId}/preferences: Customer preferences and behavioral patterns
  - support/customer/{actorId}/semantic: Factual information and conversation history
This structure enables multi-tenant memory where each customer’s information is isolated and easily retrievable.
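To make the isolation concrete, here is a small illustration (using the sample customer ID introduced later in this lab) of how a namespace template resolves per customer. This is the same string formatting the memory hooks perform in Step 4.

```python
# Namespace templates contain an {actorId} placeholder that is resolved
# per customer, keeping each customer's memories isolated.
PREFERENCES_TEMPLATE = "support/customer/{actorId}/preferences"
SEMANTIC_TEMPLATE = "support/customer/{actorId}/semantic"

namespace = PREFERENCES_TEMPLATE.format(actorId="customer_001")
print(namespace)  # support/customer/customer_001/preferences
```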
Memory Creation Process:
Creating memory resources involves provisioning the underlying infrastructure (vector databases, processing pipelines, etc.). This typically takes 2-3 minutes as AWS sets up the managed services behind the scenes.
```python
memory_client = MemoryClient(region_name=REGION)
memory_name = "CustomerSupportMemory"


def create_or_get_memory_resource():
    """Reuse the memory resource recorded in SSM, or create a new one."""
    try:
        memory_id = get_ssm_parameter("/app/customersupport/agentcore/memory_id")
        memory_client.gmcp_client.get_memory(memoryId=memory_id)
        return memory_id
    except Exception:
        try:
            strategies = [
                {
                    StrategyType.USER_PREFERENCE.value: {
                        "name": "CustomerPreferences",
                        "description": "Captures customer preferences and behavior",
                        "namespaces": ["support/customer/{actorId}/preferences"],
                    }
                },
                {
                    StrategyType.SEMANTIC.value: {
                        "name": "CustomerSupportSemantic",
                        "description": "Stores facts from conversations",
                        "namespaces": ["support/customer/{actorId}/semantic"],
                    }
                },
            ]
            print("Creating AgentCore Memory resources. This will take 2-3 minutes...")
            print("While we wait, let's understand what's happening behind the scenes:")
            print("• Setting up managed vector databases for semantic search")
            print("• Configuring memory extraction pipelines")
            print("• Provisioning secure, multi-tenant storage")
            print("• Establishing namespace isolation for customer data")

            # *** AGENTCORE MEMORY USAGE *** - Create memory resource with both strategies
            response = memory_client.create_memory_and_wait(
                name=memory_name,
                description="Customer support agent memory",
                strategies=strategies,
                event_expiry_days=90,  # Memories expire after 90 days
            )
            memory_id = response["id"]
            put_ssm_parameter("/app/customersupport/agentcore/memory_id", memory_id)
            return memory_id
        except Exception as e:
            print(f"Failed to create memory resource: {e}")
            return None


memory_id = create_or_get_memory_resource()
if memory_id:
    print("✅ AgentCore Memory created successfully!")
    print(f"Memory ID: {memory_id}")
else:
    print("Memory resource not created. Try again!")
```
```
Creating AgentCore Memory resources. This will take 2-3 minutes...
While we wait, let's understand what's happening behind the scenes:
• Setting up managed vector databases for semantic search
• Configuring memory extraction pipelines
• Provisioning secure, multi-tenant storage
• Establishing namespace isolation for customer data
✅ AgentCore Memory created successfully!
Memory ID: CustomerSupportMemory-cGl9C845Vd
```
Step 3: Seed previous customer interactions
Why are we seeding memory?
In production, agents accumulate memory naturally through customer interactions. However, for this lab, we’re seeding historical conversations to demonstrate how Long-Term Memory (LTM) works without waiting for real conversations.
How memory processing works:
1. create_event stores interactions in Short-Term Memory (STM) instantly
2. STM is asynchronously processed by Long-Term Memory strategies
3. LTM extracts patterns, preferences, and facts for future retrieval
Let’s seed some customer history to see this in action:
```python
# List existing memory resources
for memory in memory_client.list_memories():
    print(f"Memory Arn: {memory.get('arn')}")
    print(f"Memory ID: {memory.get('id')}")
    print("--------------------------------------------------------------------")

# Seed with previous customer interactions
CUSTOMER_ID = "customer_001"

previous_interactions = [
    ("I'm having issues with my MacBook Pro overheating during video editing.", "USER"),
    ("I can help with that thermal issue. For video editing workloads, let's check your Activity Monitor and adjust performance settings. Your MacBook Pro order #MB-78432 is still under warranty.", "ASSISTANT"),
    ("What's the return policy on gaming headphones? I need low latency for competitive FPS games", "USER"),
    ("For gaming headphones, you have 30 days to return. Since you're into competitive FPS, I'd recommend checking the audio latency specs - most gaming models have <40ms latency.", "ASSISTANT"),
    ("I need a laptop under $1200 for programming. Prefer 16GB RAM minimum and good Linux compatibility. I like ThinkPad models.", "USER"),
    ("Perfect! For development work, I'd suggest looking at our ThinkPad E series or Dell XPS models. Both have excellent Linux support and 16GB RAM options within your budget.", "ASSISTANT"),
]

# Save previous interactions
if memory_id:
    try:
        memory_client.create_event(
            memory_id=memory_id,
            actor_id=CUSTOMER_ID,
            session_id="previous_session",
            messages=previous_interactions,
        )
        print("✅ Seeded customer history successfully")
        print("📝 Interactions saved to Short-Term Memory")
        print("⏳ Long-Term Memory processing will begin automatically...")
    except Exception as e:
        print(f"⚠️ Error seeding history: {e}")
```
```
Memory Arn: arn:aws:bedrock-agentcore:us-west-2:900569417635:memory/CustomerSupportMemory-cGl9C845Vd
Memory ID: CustomerSupportMemory-cGl9C845Vd
--------------------------------------------------------------------
✅ Seeded customer history successfully
📝 Interactions saved to Short-Term Memory
⏳ Long-Term Memory processing will begin automatically...
```
Understanding Memory Processing
After creating events with create_event, AgentCore Memory processes the data in two stages:
- Immediate: Messages stored in Short-Term Memory (STM)
- Asynchronous: STM processed into Long-Term Memory (LTM) strategies
LTM processing typically takes 20-30 seconds as the system:
- Analyzes conversation patterns
- Extracts customer preferences and behaviors
- Creates semantic embeddings for factual information
- Organizes memories by namespace for efficient retrieval
Let’s check if our Long-Term Memory processing is complete by retrieving customer preferences:
```python
import time

# Wait for Long-Term Memory processing to complete
print("🔍 Checking for processed Long-Term Memories...")
retries = 0
max_retries = 6  # 1 minute wait

while retries < max_retries:
    memories = memory_client.retrieve_memories(
        memory_id=memory_id,
        namespace=f"support/customer/{CUSTOMER_ID}/preferences",
        query="can you summarize the support issue",
    )
    if memories:
        print(f"✅ Found {len(memories)} preference memories after {retries * 10} seconds!")
        break
    retries += 1
    if retries < max_retries:
        print(f"⏳ Still processing... waiting 10 more seconds (attempt {retries}/{max_retries})")
        time.sleep(10)
    else:
        print("⚠️ Memory processing is taking longer than expected. This can happen under heavy load.")
        break

print("🎯 AgentCore Memory automatically extracted these customer preferences from our seeded conversations:")
print("=" * 80)
for i, memory in enumerate(memories, 1):
    if isinstance(memory, dict):
        content = memory.get("content", {})
        if isinstance(content, dict):
            text = content.get("text", "")
            print(f"  {i}. {text}")
```
```
🔍 Checking for processed Long-Term Memories...
⏳ Still processing... waiting 10 more seconds (attempt 1/6)
⏳ Still processing... waiting 10 more seconds (attempt 2/6)
⏳ Still processing... waiting 10 more seconds (attempt 3/6)
⏳ Still processing... waiting 10 more seconds (attempt 4/6)
✅ Found 3 preference memories after 40 seconds!
🎯 AgentCore Memory automatically extracted these customer preferences from our seeded conversations:
================================================================================
  1. {"context":"User reported technical issue with MacBook Pro during video editing","preference":"Uses MacBook Pro for video editing, experiencing performance/thermal challenges","categories":["technology","computing","video editing","hardware"]}
  2. {"context":"User inquired about gaming headphones with specific performance requirement","preference":"Needs low latency gaming headphones for competitive FPS games","categories":["gaming","audio equipment","technology"]}
  3. {"context":"User explicitly mentioned requirements for laptop purchase for programming","preference":"Wants laptop under $1200, with 16GB RAM minimum, good Linux compatibility, preferring ThinkPad models","categories":["technology","computing","laptops","programming"]}
```
Exploring Semantic Memory
Semantic memory stores factual information from conversations using vector embeddings. This enables similarity-based retrieval of relevant facts and context.
```python
import time

# Retrieve semantic memories (factual information), polling until
# LTM extraction has produced results for this namespace
while True:
    semantic_memories = memory_client.retrieve_memories(
        memory_id=memory_id,
        namespace=f"support/customer/{CUSTOMER_ID}/semantic",
        query="information on the technical support issue",
    )
    if semantic_memories:
        break
    time.sleep(10)

print("🧠 AgentCore Memory identified these factual details from conversations:")
print("=" * 80)
for i, memory in enumerate(semantic_memories, 1):
    if isinstance(memory, dict):
        content = memory.get("content", {})
        if isinstance(content, dict):
            text = content.get("text", "")
            print(f"  {i}. {text}")
```
```
🧠 AgentCore Memory identified these factual details from conversations:
================================================================================
  1. The user is interested in gaming headphones with low latency for competitive FPS games.
  2. The user is looking for a laptop under $1200 for programming, with a preference for 16GB RAM and good Linux compatibility.
  3. The user is experiencing overheating issues with their MacBook Pro during video editing.
```
Step 4: Implement Strands Hooks to save and retrieve agent interactions
Now we’ll integrate AgentCore Memory with our agent using Strands’ hook system. This creates an automatic memory layer that works seamlessly with any agent conversation.
- MessageAddedEvent: Triggered when messages are added to the conversation, allowing us to retrieve and inject customer context
- AfterInvocationEvent: Fired after agent responses, enabling automatic storage of interactions to memory
The hook system ensures memory operations happen automatically without manual intervention, creating a seamless experience where customer context is preserved across conversations.
To create the hooks, we will extend the HookProvider class:
```python
class CustomerSupportMemoryHooks(HookProvider):
    """Memory hooks for customer support agent"""

    def __init__(
        self, memory_id: str, client: MemoryClient, actor_id: str, session_id: str
    ):
        self.memory_id = memory_id
        self.client = client
        self.actor_id = actor_id
        self.session_id = session_id
        self.namespaces = {
            i["type"]: i["namespaces"][0]
            for i in self.client.get_memory_strategies(self.memory_id)
        }

    def retrieve_customer_context(self, event: MessageAddedEvent):
        """Retrieve customer context before processing support query"""
        messages = event.agent.messages
        if (
            messages[-1]["role"] == "user"
            and "toolResult" not in messages[-1]["content"][0]
        ):
            user_query = messages[-1]["content"][0]["text"]

            try:
                all_context = []

                for context_type, namespace in self.namespaces.items():
                    # *** AGENTCORE MEMORY USAGE *** - Retrieve customer context from each namespace
                    memories = self.client.retrieve_memories(
                        memory_id=self.memory_id,
                        namespace=namespace.format(actorId=self.actor_id),
                        query=user_query,
                        top_k=3,
                    )
                    # Post-processing: Format memories into context strings
                    for memory in memories:
                        if isinstance(memory, dict):
                            content = memory.get("content", {})
                            if isinstance(content, dict):
                                text = content.get("text", "").strip()
                                if text:
                                    all_context.append(
                                        f"[{context_type.upper()}] {text}"
                                    )

                # Inject customer context into the query
                if all_context:
                    context_text = "\n".join(all_context)
                    original_text = messages[-1]["content"][0]["text"]
                    messages[-1]["content"][0][
                        "text"
                    ] = f"Customer Context:\n{context_text}\n\n{original_text}"
                    logger.info(f"Retrieved {len(all_context)} customer context items")

            except Exception as e:
                logger.error(f"Failed to retrieve customer context: {e}")

    def save_support_interaction(self, event: AfterInvocationEvent):
        """Save customer support interaction after agent response"""
        try:
            messages = event.agent.messages
            if len(messages) >= 2 and messages[-1]["role"] == "assistant":
                # Get last customer query and agent response
                customer_query = None
                agent_response = None

                for msg in reversed(messages):
                    if msg["role"] == "assistant" and not agent_response:
                        agent_response = msg["content"][0]["text"]
                    elif (
                        msg["role"] == "user"
                        and not customer_query
                        and "toolResult" not in msg["content"][0]
                    ):
                        customer_query = msg["content"][0]["text"]
                        break

                if customer_query and agent_response:
                    # *** AGENTCORE MEMORY USAGE *** - Save the support interaction
                    self.client.create_event(
                        memory_id=self.memory_id,
                        actor_id=self.actor_id,
                        session_id=self.session_id,
                        messages=[
                            (customer_query, "USER"),
                            (agent_response, "ASSISTANT"),
                        ],
                    )
                    logger.info("Saved support interaction to memory")

        except Exception as e:
            logger.error(f"Failed to save support interaction: {e}")

    def register_hooks(self, registry: HookRegistry) -> None:
        """Register customer support memory hooks"""
        registry.add_callback(MessageAddedEvent, self.retrieve_customer_context)
        registry.add_callback(AfterInvocationEvent, self.save_support_interaction)
        logger.info("Customer support memory hooks registered")
```
Step 5: Create a Customer Support Agent with memory
Next, we will implement the Customer Support Agent just as we did in Lab 1, but this time we instantiate the CustomerSupportMemoryHooks class and pass the memory hooks to the agent constructor.
```python
import uuid

from strands import Agent
from strands.models import BedrockModel
from lab_helpers.lab1_strands_agent import (
    SYSTEM_PROMPT,
    MODEL_ID,
    get_product_info,
    get_return_policy,
    get_technical_support,
    web_search,
)

SESSION_ID = str(uuid.uuid4())
memory_hooks = CustomerSupportMemoryHooks(
    memory_id, memory_client, CUSTOMER_ID, SESSION_ID
)

# Initialize the Bedrock model (Anthropic Claude 3.7 Sonnet)
model = BedrockModel(
    model_id=MODEL_ID,
    region_name=REGION,
)

# Create the customer support agent with its four tools
agent = Agent(
    model=model,
    hooks=[memory_hooks],  # Pass the memory hooks
    tools=[
        get_product_info,       # Tool 1: Product information lookup
        get_return_policy,      # Tool 2: Return policy lookup
        web_search,             # Tool 3: Web search
        get_technical_support,  # Tool 4: Technical support guidance
    ],
    system_prompt=SYSTEM_PROMPT,
)
```
Step 6: Test the Personalized Agent
Let’s test our memory-enhanced agent! Watch how it uses the customer’s historical preferences to provide personalized recommendations.
The agent will automatically:
1. Retrieve relevant customer context from memory
2. Use that context to personalize the response
3. Save this new interaction for future use
```python
from IPython.display import display, Markdown

print("🎧 Testing headphone recommendation with customer memory...\n\n")
response1 = agent("Which headphones would you recommend?")
```
🎧 Testing headphone recommendation with customer memory...
I'd be happy to help you find some gaming headphones with low latency, perfect for competitive FPS games where timing is critical. Let me get some specific information about gaming headphones for you.
Tool #1: get_product_info
Based on your need for low-latency gaming headphones for competitive FPS games, let me search for some specific recommendations that would work well for your requirements:
Tool #2: web_search
Based on the information I've gathered, here are my recommendations for gaming headphones that would be ideal for competitive FPS games where low latency is crucial:
### Top Recommendations for Low Latency Gaming Headphones:
1. **SteelSeries Arctis Nova Pro** - Considered a top choice for competitive gaming with excellent multi-device connectivity. The wireless version offers very low latency that's ideal for FPS games.
2. **Turtle Beach Stealth 700 Gen 2** - Specifically mentioned as a next-generation competitive gaming headset with improved audio performance for better game performance.
3. **Headphones with aptX Low Latency Support** - Look for headphones supporting this codec, as it can reduce latency to under 40ms, which experts consider ideal for competitive gaming.
### What to Look For:
- **Latency under 40ms** - For competitive FPS games, this is considered the benchmark for lag-free audio
- **Good positional audio** - Critical for accurately locating enemies in FPS games
- **Comfort** - Important for long gaming sessions
- **Quality microphone** - For clear communication with teammates
### Connection Type Consideration:
- **Wired options** generally offer the lowest latency and are most reliable for competitive play
- **Wireless options** with specialized gaming-focused transmission technology can also perform well
Would you like more specific information about any of these models? Or would you prefer I search for options in a particular price range? I can also provide information about return policies if you'd like to try a pair before fully committing to them.
print("\n💻 Testing laptop preference recall...\n\n")
= agent("What is my preferred laptop brand and requirements?") response2
💻 Testing laptop preference recall...
Based on your previous interactions, I can see your preferred laptop specifications quite clearly.
Your preferred laptop requirements are:
- Brand preference: ThinkPad models
- Budget: Under $1200
- RAM: Minimum of 16GB
- Operating system compatibility: Good Linux compatibility
- Purpose: Programming
ThinkPad is definitely your preferred laptop brand, and you're looking for a model that meets these specific requirements for programming work. ThinkPads are known for their excellent Linux compatibility, which aligns perfectly with your preferences.
Is there anything specific about ThinkPad models you'd like to know more about, or would you like me to provide some recommendations for ThinkPad models that meet your requirements for programming? I'd be happy to search for current models that fit your budget and specifications.
Notice how the agent remembers:
- Your gaming preferences (low-latency headphones)
- Your laptop preferences (ThinkPad, 16GB RAM, Linux compatibility)
- Your budget constraints ($1200 for laptops)
- Previous technical issues (MacBook overheating)
This is the power of AgentCore Memory - persistent, personalized customer experiences!
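As an optional sanity check, here is a short sketch reusing the same retrieve_memories call from earlier cells: query the preferences namespace once more and confirm that the customer's memories remain retrievable, independent of any single agent session. Note that LTM extraction of the interactions you just had may take another 20-30 seconds to appear; the query string below is an illustrative choice.

```python
# Optional verification: the same customer's preferences are still
# retrievable from Long-Term Memory, outside the agent session.
followup = memory_client.retrieve_memories(
    memory_id=memory_id,
    namespace=f"support/customer/{CUSTOMER_ID}/preferences",
    query="headphone and laptop preferences",
    top_k=3,
)
for m in followup:
    if isinstance(m, dict):
        print(m.get("content", {}).get("text", ""))
```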
Congratulations! 🎉
You have successfully completed Lab 2: Add memory to the Customer Support Agent!
What You Accomplished:
- Created a serverless managed memory with Amazon Bedrock AgentCore Memory
- Implemented long-term memory to store user preferences and semantic (factual) information
- Integrated AgentCore Memory with the customer support Agent using the hook mechanism provided by Strands Agents