Peng Qian

Posted on • Originally published at dataleadsfuture.com

Build Long-Term and Short-Term Memory for Agents Using RedisVL

Introduction

For this weekend note, I want to share some experiments I ran using RedisVL to add short-term and long-term memory to my agent system.

TL;DR: RedisVL works well for short-term memory and feels a bit simpler than the traditional Redis API. For long-term memory with semantic search, however, the experience is poor, and I do not recommend it.

Why RedisVL?

Big companies like to use mature infrastructure to build new features.

We know mem0 and Graphiti are good open-source options for long-term agent memory. But companies prefer to play it safe: new infrastructure costs money, can be unstable, and needs people who know how to operate it.

So when Redis launched RedisVL with vector search, we naturally wanted to try it first. You can connect it to an existing Redis cluster and start using it right away. That sounds great on paper, but does it hold up? We need to try it for real.

Today I will cover how to use MessageHistory and SemanticMessageHistory from RedisVL to add short-term and long-term memory to agents built on the Microsoft Agent Framework.

You can find the source code at the end of this article.


📫 Don’t forget to follow my blog to stay updated on my latest progress in AI application practices.


Preparation

Install Redis

If you want to try it locally, you can install a Redis instance with Docker.

docker run -d --name redis -p 6379:6379 -p 8001:8001 redis/redis-stack:latest

Can't use Docker Desktop? See my other article:

A Quick Guide to Containerizing Agent Applications with Podman

The Redis instance will listen on ports 6379 and 8001. Your RedisVL client should connect to redis://localhost:6379. You can visit http://localhost:8001 in the browser to open the Redis console.
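Before moving on, it's worth a quick sanity check that the client can actually reach Redis. A minimal sketch using redis-py, which redisvl installs as a dependency:

import redis

# Assumes the Docker container above is running locally.
client = redis.Redis.from_url("redis://localhost:6379")
print(client.ping())  # prints True if the connection works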

Install RedisVL

Install RedisVL with pip.

pip install redisvl

After installation, you can use the RedisVL CLI to manage your indexes and keep your testing neat.

rvl index listall
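During testing, you may also want to inspect or drop an index. Recent redisvl releases ship these subcommands as well (run rvl index --help to confirm for your version):

rvl index info -i <index_name>
rvl index destroy -i <index_name>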

Implement Short-Term Memory Using MessageHistory

There are lots of “How to” RedisVL articles online, so let’s start straight from Microsoft Agent Framework and see how to use MessageHistory for short-term memory.

As in the official tutorial, you should implement a RedisVLMessageStore based on ChatMessageStoreProtocol.

class RedisVLMessageStore(ChatMessageStoreProtocol):
    def __init__(
        self,
        thread_id: str = "common_thread",
        top_k: int = 6,
        session_tag: str | None = None,
        redis_url: str | None = "redis://localhost:6379",
    ):
        self._thread_id = thread_id
        self._top_k = top_k
        self._session_tag = session_tag or f"session_{uuid4()}"
        self._redis_url = redis_url
        self._init_message_history()

Two parameters in __init__ deserve attention.

  • thread_id is used for the name parameter when creating MessageHistory. I like to bind it to the agent. Each agent gets a unique thread_id.
  • session_tag lets you set a tag for each user so different sessions do not mix.
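__init__ also calls _init_message_history, which just wires up RedisVL. Here is a minimal sketch, assuming the MessageHistory constructor takes name and redis_url as in recent redisvl releases (check the signature for yours):

from redisvl.extensions.message_history import MessageHistory

class RedisVLMessageStore(ChatMessageStoreProtocol):
    ...
    def _init_message_history(self) -> None:
        # One MessageHistory per agent, keyed by thread_id;
        # session_tag is passed per call in list_messages/add_messages.
        self._message_history = MessageHistory(
            name=self._thread_id,
            redis_url=self._redis_url,
        )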

The protocol asks us to implement two methods: list_messages and add_messages.

  • list_messages runs before the agent calls the LLM. It gets all available chat messages from the message store. It takes no parameters, so it cannot support long-term memory. More on that later.
  • add_messages runs after the agent gets the LLM’s reply. It stores new messages into the message store.

Here is how the message store works.

The calling order of message store in the agent. Image by Author

So in list_messages and add_messages, we just use RedisVL’s MessageHistory to do the job.

list_messages below uses get_recent to fetch the top_k most recent messages and converts them into ChatMessage objects.

class RedisVLMessageStore(ChatMessageStoreProtocol):
    ...

    async def list_messages(self) -> list[ChatMessage]:
        messages: list[dict[str, str]] = self._message_history.get_recent(
            top_k=self._top_k,
            session_tag=self._session_tag,
        )
        return [self._back_to_chat_message(message)
                for message in messages]

Our add_messages converts each ChatMessage into a Redis message and calls MessageHistory's add_messages to store them.

class RedisVLMessageStore(ChatMessageStoreProtocol):
    ...

    async def add_messages(self, messages: Sequence[ChatMessage]):
        messages = [self._to_redis_message(message)
                    for message in messages]
        self._message_history.add_messages(
            messages,
            session_tag=self._session_tag
        )
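The two conversion helpers used above are simple. A hedged sketch, assuming RedisVL stores messages as plain role/content dicts and that ChatMessage accepts role and text in its constructor:

class RedisVLMessageStore(ChatMessageStoreProtocol):
    ...
    def _to_redis_message(self, message: ChatMessage) -> dict[str, str]:
        # MessageHistory expects {"role": ..., "content": ...} dicts.
        return {
            "role": str(message.role),
            "content": message.text or "",
        }

    def _back_to_chat_message(self, message: dict[str, str]) -> ChatMessage:
        return ChatMessage(
            role=message["role"],
            text=message["content"],
        )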

That's short-term memory done with RedisVL. You may also implement deserialize, serialize, and update_from_state to save and load the memory state, but they are not important for now. See the full code at the end.

Test RedisVLMessageStore

Let’s build an agent and test the message store.

agent = OpenAILikeChatClient(
    model_id=Qwen3.NEXT
).create_agent(
    name="assistant",
    instructions="You're a little helper who answers my questions in one sentence.",
    chat_message_store_factory=lambda: RedisVLMessageStore(
        session_tag="user_abc"
    )
)

Now a console loop for multi-turn dialog. Remember, Microsoft Agent Framework does not support short-term memory unless you use an AgentThread and pass it to run.

async def main():
    thread = agent.get_new_thread()
    while True:
        user_input = input("User: ")
        if user_input.startswith("exit"):
            break
        response = await agent.run(user_input, thread=thread)
        print(f"\nAssistant: {response.text}")
    thread.message_store.clear()

When the AgentThread is created, it calls the factory method to build the RedisVLMessageStore.

To check whether the store works, we can use mlflow.openai.autolog() to verify that the messages sent to the LLM include the history.

import mlflow
mlflow.set_tracking_uri(os.environ.get("MLFLOW_TRACKING_URI"))
mlflow.set_experiment("Default")
mlflow.openai.autolog()

You can see that the conversation comes with a complete history of messages. Image by Author

See my other article on using MLflow to track LLM calls:

Monitoring Qwen 3 Agents with MLflow 3.x: End-to-End Tracing Tutorial

Let’s open the Redis console to see the cache.

How the cache is stored in Redis. Image by Author

As you can see, with MessageHistory backing the Microsoft Agent Framework (MAF) message store, we get multi-turn conversations with full message history.

With the thread_id and session_tag parameters, we can also let users switch between multiple conversation sessions, like in popular LLM chat applications; a sketch follows.
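Switching sessions is just a matter of giving each conversation its own tag (the user and conversation IDs below are hypothetical):

def store_for(user_id: str, conversation_id: str) -> RedisVLMessageStore:
    # One tag per user+conversation keeps histories isolated,
    # while thread_id still groups everything under one agent.
    return RedisVLMessageStore(
        thread_id="assistant",
        session_tag=f"{user_id}:{conversation_id}",
    )

store = store_for("user_abc", "chat_001")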

Feels simpler than the official RedisMessageStore solution, right?


Implement Long-Term Memory Using SemanticMessageHistory

SemanticMessageHistory is a subclass of MessageHistory. It adds a get_relevant method for vector search.

Example:

prompt = "what have I learned about the size of England?"
semantic_history.set_distance_threshold(0.35)
context = semantic_history.get_relevant(prompt)
for message in context:
    print(message)
Batches: 100%|██████████| 1/1 [00:00<00:00, 56.30it/s]
{'role': 'user', 'content': 'what is the size of England compared to Portugal?'}

Compared to MessageHistory, the key difference is that we can retrieve the most relevant historical messages based on the user's request.

You might think that if MessageStore makes for nice short-term memory, then SemanticMessageHistory with semantic search must be even better.

From my test results, that is not the case. Let's build a long-term memory adapter for Microsoft Agent Framework using SemanticMessageHistory and see why.

Use SemanticMessageHistory in Microsoft Agent Framework

Earlier I noted that list_messages in ChatMessageStoreProtocol takes no parameters, so it cannot search the history. That rules out the message store for long-term memory.

Microsoft Agent Framework has a ContextProvider class which, as the name suggests, is meant for context engineering.

So we should build long-term memory on this class.

class RedisVLSemanticMemory(ContextProvider):
    def __init__(
        self,
        thread_id: str | None = None,
        session_tag: str | None = None,
        distance_threshold: float = 0.3,
        redis_url: str = "redis://localhost:6379",
        embedding_model: str = "BAAI/bge-m3",
        embedding_api_key: str | None = None,
        embedding_endpoint: str | None = None,
    ):
        self._thread_id = thread_id or "semantic_thread"
        self._session_tag = session_tag or f"session_{uuid4()}"
        self._distance_threshold = distance_threshold
        self._redis_url = redis_url
        self._embedding_model = embedding_model
        self._embedding_api_key = embedding_api_key or os.getenv("EMBEDDING_API_KEY")
        self._embedding_endpoint = embedding_endpoint or os.getenv("EMBEDDING_ENDPOINT")
        self._init_semantic_store()

ContextProvider has two methods: invoked and invoking.

  • invoked runs after the LLM call. It stores the latest messages in RedisVL. It receives both request_messages and response_messages parameters but stores them separately.
  • invoking runs before the LLM call. It uses the user's current input to search for relevant history in RedisVL and returns a Context object.

The Context object has three fields.

  • instructions, a string. The agent adds this to the system prompt.
  • messages, a list. History messages found in long-term memory go here.
  • tools, a list of functions. The agent adds these tools to its ChatOptions.

The purpose of the three types of messages retrieved. Image by Author

Since we want vector search to fetch relevant history, we put those messages in messages. The order between MessageStore messages and ContextProvider messages matters; here is their calling order.

The calling order of long-term and short-term memory in the agent. Image by Author

Setting up a TextVectorizer

Semantic vector search needs embeddings. We must set up a vectorizer.

In __init__, besides thread_id and session_tag, we also set the embedding model configuration.

class RedisVLSemanticMemory(ContextProvider):
    ...
    def _init_semantic_store(self) -> None:
        if not self._embedding_api_key:
            vectorizer = HFTextVectorizer(
                model=self._embedding_model,
            )
        else:
            vectorizer = OpenAITextVectorizer(
                model=self._embedding_model,
                api_config={
                    "api_key": self._embedding_api_key,
                    "base_url": self._embedding_endpoint
                }
            )

        self._semantic_store = SemanticMessageHistory(
            name=self._thread_id,
            session_tag=self._session_tag,
            distance_threshold=self._distance_threshold,
            redis_url=self._redis_url,
            vectorizer=vectorizer,
        )

Depending on whether embedding_api_key is set, this picks either a server-hosted embedding model behind an OpenAI-compatible API or a local HuggingFace model.

Implement invoked and invoking methods

invoked is easy. As mentioned, SemanticMessageHistory stores requests and responses separately; I merge them into one list and then call add_messages.

class RedisVLSemanticMemory(ContextProvider):
    ...
    async def invoked(
        self,
        request_messages: ChatMessage | Sequence[ChatMessage],
        response_messages: ChatMessage | Sequence[ChatMessage] | None = None,
        invoke_exception: Exception | None = None,
        **kwargs: Any,
    ) -> None:
        if isinstance(request_messages, ChatMessage):
            request_messages = [request_messages]
        if isinstance(response_messages, ChatMessage):
            response_messages = [response_messages]
        # Guard against response_messages being None and against
        # non-list Sequences before concatenating.
        chat_messages = list(request_messages) + list(response_messages or [])
        messages = [self._to_redis_message(message)
                    for message in chat_messages]
        self._semantic_store.add_messages(
            messages=messages,
            session_tag=self._session_tag,
        )

invoking is shown below:

class RedisVLSemanticMemory(ContextProvider):
    ...
    async def invoking(
        self,
        messages: ChatMessage | MutableSequence[ChatMessage],
        **kwargs: Any
    ) -> Context:
        if isinstance(messages, ChatMessage): # 1
            messages = [messages]
        prompt = "\n".join([message.text
                            for message in messages])
        context = self._semantic_store.get_relevant(
            prompt=prompt,
            raw=True,
            session_tag=self._session_tag,
        )
        context = sorted(context, key=lambda m: m['timestamp']) # 2
        relevant_messages = [self._back_to_chat_message(message)
                             for message in context]
        print([m.text for m in relevant_messages])
        return Context(messages=relevant_messages) # 3

Points to note.

  • The messages parameter may be a list (e.g., for multi-modal input), so merge all the text into one prompt.
  • Since requests and responses are stored separately, I sort by timestamp to keep chronological order.
  • The retrieved messages go into Context.messages so they are appended to the end of the current chat messages.

Test semantic memory

Unlike the message store, the ContextProvider can be set directly on the agent.

memory_provider = RedisVLSemanticMemory(
    session_tag="user_abc",
    distance_threshold=0.3,
)
agent = OpenAILikeChatClient(
    model_id=Qwen3.NEXT
).create_agent(
    name="assistant",
    instructions="You're a little helper who answers my questions in one sentence.",
    context_providers=memory_provider,
)

Now, a main function with a thread instance to keep short-term memory while testing the multi-turn dialog.

async def main():
    thread = agent.get_new_thread()
    while True:
        user_input = input("User: ")
        if user_input.startswith("exit"):
            break
        response = await agent.run(user_input, thread=thread)
        print(response.text)
    memory_provider.clear()

Test result:

The distance_threshold is too high, causing irrelevant messages to be retrieved. Image by Author

It seems the default distance_threshold of 0.3 is too high. Let's lower it:

memory_provider = RedisVLSemanticMemory(
    session_tag="user_abc",
    distance_threshold=0.2,
)

Test again:

Only request messages were retrieved, not response messages. Image by Author

The lower threshold filters out unrelated messages. But since requests and responses are stored separately, only the requests are retrieved. The ContextProvider appends the retrieved messages to the end of the message list, so the LLM may think the user asked two questions. MLflow shows exactly that.

Two similar questions were both added to the message list, but without attaching the already provided answers. Image by Author

This is bad. We care far more about the LLM's answers than the requests, yet vector search tends to match the questions, not the answers. This just piles on redundant questions and does nothing to help the LLM answer.

It is hard to say whether the fault lies with Microsoft Agent Framework or with RedisVL.

When the ContextProvider's long-term memory finds related chat messages, they are placed after the ones from the message store. If the long-term and short-term messages overlap, they can confuse the LLM.

Also, RedisVL's decision not to store requests and responses together is one I dislike. LLM responses are the expensive part: in production, a response may involve web search, RAG retrieval, or code execution. Yet vector search surfaces only the request, not the answer. That is a waste. One possible workaround is sketched below.
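A sketch of that workaround: collapse each request/response pair into a single entry before storing it in invoked, so that whatever vector search matches, the answer comes back with it. merge_exchange is a hypothetical helper, not part of RedisVL:

def merge_exchange(request: ChatMessage,
                   response: ChatMessage) -> dict[str, str]:
    # Store Q and A as one unit: the embedding then covers both,
    # and retrieval always returns the pair together.
    return {
        "role": "user",
        "content": (f"User asked: {request.text}\n"
                    f"Assistant answered: {response.text}"),
    }

# Inside invoked, instead of adding the two lists separately:
# self._semantic_store.add_messages(
#     messages=[merge_exchange(req, resp)],
#     session_tag=self._session_tag,
# )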


Conclusion

Today, we tried using RedisVL for short-term and long-term memory in Microsoft Agent Framework and checked the results.

RedisVL is very handy for short-term agent memory. It is simpler than using the Redis API.

But SemanticMessageHistory for semantic search over user history did not perform well, for the reasons explained above.

Thanks to the solid Redis infrastructure, building a semantic cache with RedisVL is simpler than with other vector solutions.

Next time, I will show you how to build a semantic cache with RedisVL to save your company serious money.

Share your thoughts in the comments.

👉 Subscribe to my blog to follow my latest agent app work.

And share this article with friends. Maybe it will help more people.😁


Enjoyed this read? Subscribe now to get more cutting-edge data science tips straight to your inbox! Your feedback and questions are welcome — let’s discuss in the comments below!

This article was originally published on Data Leads Future.
