The frustration that started this
If you use Claude Code for real work, you've hit this wall: you're deep in a session, you've made architectural decisions, debugged tricky issues, established patterns — and then context compaction happens. Claude summarizes your conversation to free up tokens, and suddenly it's forgotten that you switched from MongoDB to PostgreSQL three hours ago.
You explain it again. It forgets again. Repeat.
I got tired of re-explaining my own codebase to my AI assistant. So I built Claude Cortex — a memory system that works like a brain, not a notepad.
What it actually does
Claude Cortex is an MCP server that gives Claude Code three types of memory:
- Short-term memory (STM) — session-level, high detail, decays within hours
- Long-term memory (LTM) — cross-session, consolidated from STM, persists for weeks
- Episodic memory — specific events: "when I tried X, Y happened"
The key insight: not everything is worth remembering. The system scores every piece of information for salience — how important it actually is:
"Remember that we're using PostgreSQL" → architecture decision → 0.9 salience
"Fixed the auth bug by clearing the token cache" → error resolution → 0.8 salience
"The current file has 200 lines" → temporary context → 0.2 salience (won't persist)
Memories also decay over time, just like human memory:
score = base_salience × (0.995 ^ hours_since_access)
But every time a memory is accessed, it gets reinforced by 1.2×. Frequently useful memories survive. One-off details fade away. This isn't a key-value store — it's a system that learns what matters.
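In code, the decay-and-reinforcement loop might look like this (the `Memory` shape and the cap at 1.0 are my assumptions; the constants come from the formula above):

```typescript
// Sketch of the decay/reinforcement math described above.
// The Memory interface and the 1.0 cap are assumptions, not the real API.
interface Memory {
  salience: number;     // base salience, 0..1
  lastAccessed: number; // epoch milliseconds
}

const DECAY_PER_HOUR = 0.995;
const REINFORCEMENT = 1.2;

// score = base_salience * (0.995 ^ hours_since_access)
function effectiveScore(m: Memory, now = Date.now()): number {
  const hours = (now - m.lastAccessed) / 3_600_000;
  return m.salience * Math.pow(DECAY_PER_HOUR, hours);
}

// Every access multiplies salience by 1.2, capped so it stays a valid score.
function reinforce(m: Memory, now = Date.now()): void {
  m.salience = Math.min(1.0, m.salience * REINFORCEMENT);
  m.lastAccessed = now;
}
```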
The compaction problem, solved
Here's the specific workflow that used to drive me nuts:
Before Cortex:
Session starts → Work for 2 hours → Compaction happens →
Claude: "What database are you using?" → You: *screams internally*
After Cortex:
Session starts → Work for 2 hours → Compaction happens →
PreCompact hook auto-extracts 3-5 important memories →
Claude: "Let me check my memory..." →
Recalls: PostgreSQL, JWT auth, React frontend, modular architecture →
Continues working seamlessly
The PreCompact hook is the secret weapon. It runs automatically before every compaction event, scanning the conversation for decisions, error fixes, learnings, and architecture notes. No manual intervention needed.
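Conceptually, the extraction step is pattern-matching over the transcript. Here's a toy version to illustrate the idea; the regexes and categories are invented for this sketch, not the shipped heuristics:

```typescript
// Toy pre-compact extractor: scan a transcript for lines worth remembering.
// Patterns and categories are illustrative only.
const SIGNALS: Array<{ pattern: RegExp; category: string }> = [
  { pattern: /\b(decided|switched to|we're using)\b/i, category: "architecture" },
  { pattern: /\b(fixed|resolved|root cause)\b/i, category: "error" },
  { pattern: /\b(learned|turns out)\b/i, category: "learning" },
];

function extractCandidates(transcript: string) {
  return transcript
    .split("\n")
    .flatMap((line) =>
      SIGNALS.filter(({ pattern }) => pattern.test(line)).map(
        ({ category }) => ({ line, category })
      )
    );
}
```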
v1.6.0: The intelligence overhaul
The first version was essentially CRUD-with-decay. It worked, but the subsystems were isolated — search didn't improve linking, linking didn't improve search, salience was set once and never evolved.
v1.6.0 was a seven-task overhaul to make everything feed back into everything else:
1. Semantic linking via embeddings
Previously, memories only linked if they shared tags. Now, two memories about PostgreSQL with completely different tags will still link — the system computes embedding similarity and creates connections at ≥0.6 cosine similarity.
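The linking criterion itself is standard cosine similarity over the embedding vectors; a generic sketch (function names are mine):

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Link two memories when their embeddings clear the 0.6 threshold.
const LINK_THRESHOLD = 0.6;
const shouldLink = (a: number[], b: number[]) =>
  cosineSimilarity(a, b) >= LINK_THRESHOLD;
```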
2. Search feedback loops
Every search now does three things:
- Returns results (obviously)
- Reinforces salience of returned memories (with diminishing returns)
- Creates links between co-returned results
Your search patterns literally shape the knowledge graph.
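A sketch of that feedback loop, modeling diminishing returns as a boost that shrinks as salience approaches 1.0 (the shapes and the 0.1 rate are assumptions):

```typescript
// Hypothetical search feedback: reinforce returned memories and link
// co-returned ones. All names and constants are illustrative.
interface MemoryNode {
  id: number;
  salience: number;
  links: Set<number>;
}

function onSearchReturned(results: MemoryNode[]): void {
  for (const m of results) {
    // Diminishing returns: the closer salience is to 1.0, the smaller the boost.
    m.salience += (1 - m.salience) * 0.1;
  }
  // Co-returned results get linked pairwise.
  for (const a of results) {
    for (const b of results) {
      if (a.id !== b.id) a.links.add(b.id);
    }
  }
}
```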
3. Dynamic salience evolution
Salience isn't static anymore. During consolidation:
- Hub memories (lots of links) get a logarithmic bonus
- Contradicted memories get a small penalty
- The system learns which memories are structurally important
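Sketched out, with the bonus and penalty constants being my guesses rather than the shipped values:

```typescript
// Consolidation-time salience adjustment. Only the logarithmic hub bonus
// and the contradiction penalty come from the description; constants are assumed.
interface GraphMemory {
  salience: number;
  linkCount: number;          // links to/from other memories in the graph
  contradictionCount: number; // how many memories contradict this one
}

function evolveSalience(m: GraphMemory): void {
  const hubBonus = 0.05 * Math.log1p(m.linkCount); // grows logarithmically
  const penalty = 0.02 * m.contradictionCount;     // small per-contradiction hit
  m.salience = Math.min(1, Math.max(0, m.salience + hubBonus - penalty));
}
```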
4. Contradiction surfacing
If you told Claude "use PostgreSQL" in January and "use MongoDB" in March, the system detects this and flags it:
⚠️ WARNING: Contradicts "Use PostgreSQL" (Memory #42)
No more silently holding conflicting information.
5. Memory enrichment
Memories accumulate context over time. If you search for "JWT auth" and the query contains information the memory doesn't have, it gets appended. Memories grow richer through use.
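A naive version of that append-if-novel check (the real novelty test is presumably smarter than substring matching):

```typescript
// Toy enrichment: append query context the memory doesn't already contain.
interface EnrichableMemory {
  content: string;
}

function enrich(m: EnrichableMemory, queryContext: string): void {
  // Substring check is a stand-in for whatever novelty detection ships.
  if (!m.content.toLowerCase().includes(queryContext.toLowerCase())) {
    m.content += `\n[from search] ${queryContext}`;
  }
}
```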
6. Real consolidation
The old system just deduplicated exact matches. Now it clusters related STM memories and merges them into coherent LTM entries:
STM: "Set up JWT tokens with RS256 signing"
STM: "JWT tokens expire after 24 hours"
STM: "Added JWT verification middleware"
→ Consolidated LTM: "JWT authentication system using RS256 signing.
Tokens expire after 24 hours with 7-day refresh tokens.
Verification middleware on all protected routes."
Three noisy short-term memories become one structured long-term memory.
7. Activation weight tuning
Recently activated memories get a meaningful boost in search results. If you just looked at something, it's more likely to be relevant again.
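One plausible way to model that boost is an exponential falloff on time since last activation; the 0.5 weight and hourly decay here are assumptions, not the tuned values:

```typescript
// Hypothetical recency boost for ranking: strong right after access,
// fading within a few hours.
function activationBoost(hoursSinceActivation: number): number {
  return 1 + 0.5 * Math.exp(-hoursSinceActivation);
}

// e.g. rank = effectiveScore(memory) * activationBoost(hoursSinceLastAccess)
```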
Getting started
Install
npm install -g claude-cortex
Configure Claude Code
Create .mcp.json in your project (or ~/.claude/.mcp.json for global):
{
  "mcpServers": {
    "memory": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "claude-cortex"]
    }
  }
}
Set up the PreCompact hook
Add to ~/.claude/settings.json:
{
  "hooks": {
    "PreCompact": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "npx -y claude-cortex-hook pre-compact"
          }
        ]
      }
    ]
  }
}
Restart Claude Code, approve the MCP server, and you're done. Claude will start remembering things automatically.
Use it naturally
You don't need to learn new commands. Just talk to Claude:
"Remember that we're using PostgreSQL for the database"
"What do you know about our auth setup?"
"Get the context for this project"
The system handles categorization, salience scoring, and storage behind the scenes.
The dashboard
There's also an optional 3D brain visualization dashboard — because honestly, watching memories form as glowing nodes in a neural network is just cool.
npx claude-cortex service install # auto-start on login
It shows your memory graph in real time via WebSocket, with search, filters, stats, and even a SQL console for poking at the database directly. Memories are color-coded: blue for architecture, purple for patterns, green for preferences, red for errors, yellow for learnings.
How it compares
Most MCP memory tools are flat key-value stores. You manually save and manually retrieve. Claude Cortex is different in a few ways:
- Salience detection — it decides what's worth remembering, not you
- Temporal decay — old irrelevant stuff fades naturally
- STM → LTM consolidation — short-term memories get merged into long-term ones
- Semantic linking — memories form a knowledge graph, not a list
- PreCompact hook — survives Claude Code's context compaction automatically
It's not perfect. Embeddings add some latency. The consolidation heuristics are tuned for my workflows and might need adjustment for yours. The dashboard is a nice-to-have, not a must-have. But for the core problem — Claude forgetting things it shouldn't forget — it works really well.
The stack
- TypeScript, compiled to ESM
- SQLite with FTS5 for full-text search
- @huggingface/transformers for local embeddings (v1.6.1 fixed ARM64 support)
- MCP protocol for Claude Code integration
- React + Three.js for the dashboard
- 56 passing tests, MIT licensed
Try it out
npm install -g claude-cortex
If you're using Claude Code for anything beyond quick one-offs, give it a shot. The difference between an AI that remembers your project and one that doesn't is night and day.
Stars and feedback welcome — this is a solo project and I'm iterating fast.
Top comments (10)
Really cool project — this kind of persistent memory layer is exactly what makes tools like Claude actually usable for real coding workflows. The way you score salience and let memories decay or reinforce feels a lot closer to how humans remember what matters, instead of dumping context every time the session compresses.
I’ve run into the same pain where Claude asks “What database are we using?” after hours of work — it quickly turns productive sessions into repetition loops. Solutions that give the model STM + LTM + episodic memory genuinely feel like the next step for AI coding assistants.
Would love to hear how well the system handles cross-session architectural decisions long term.
Thanks! Really appreciate that.
To answer your question on cross-session architectural decisions — this is exactly where LTM shines. Once a decision like "we use PostgreSQL" gets consolidated from STM into LTM with high salience (0.9 for architecture decisions), it persists across sessions and gets reinforced every time it's accessed.
In practice, architectural decisions survive weeks of sessions without manual intervention. The decay rate (0.995^hours) is slow enough that high-salience memories stick around, and the reinforcement loop (1.2x boost on access) means anything you actually reference stays strong.
The real magic is consolidation — multiple related STM entries about the same decision get merged into one rich LTM entry, so you get a complete picture rather than fragments.
Still iterating on the decay curves — would love to hear how it works for your workflow if you give it a shot!
This is an excellent piece of work — and more importantly, a very clear diagnosis of the real problem 👍
What you’ve built isn’t “memory” in the CRUD sense, it’s state with pressure, decay, and reinforcement, which is exactly what breaks once context compaction enters the picture. The PreCompact hook is a particularly elegant move — treating compaction as a first-class lifecycle event rather than an accident to recover from.
I also like that you’re explicit about salience as a learned property, not a user-managed one. That’s the key difference between a notepad and a brain. Most tools stop at persistence; this actually models importance over time.
This feels less like a Claude add-on and more like missing infrastructure for long-running, real-world AI-assisted development. Really strong systems thinking here 🚀
Really appreciate the thoughtful breakdown — you nailed what I was going for.
The PreCompact hook was born out of frustration. I kept trying to fix compaction after the fact, and then it clicked: compaction is a lifecycle event, not an error. Once I started treating it that way, everything fell into place.
Salience as a learned property was a deliberate choice. The moment you ask users to manually tag importance, nobody does it consistently. The system has to figure it out from context, access patterns, and structural position in the knowledge graph.
Really glad the systems thinking comes through — making the subsystems feed back into each other rather than operating in isolation is the part I'm most proud of. Thanks for the kind words!
I think the “compaction is a lifecycle event, not an error” framing is the key insight here.
A lot of systems treat compaction like garbage collection: something you reluctantly do when things get messy. But what you’re describing feels closer to metabolism — it’s not a cleanup step, it’s part of staying healthy. Designing PreCompact as a first-class hook makes the whole process proactive instead of reactive, and that’s a huge conceptual shift.
Also fully agree on salience being learned rather than user-declared. Any system that relies on manual tagging eventually collapses under human inconsistency (or just… people being busy). Inferring salience from access patterns + context + graph position is exactly the kind of “it should just work” design that scales beyond power users.
And the systems-thinking aspect really does come through: compaction, salience, graph structure, retrieval — all reinforcing each other instead of being isolated mechanisms. That feedback-loop approach is what separates a cohesive architecture from a collection of clever features.
Really excited to see where you take this — it reads like the kind of design that will stay robust even as the dataset and usage patterns evolve.
Amazing work! I was thinking about starting such a project myself, but it looks like there's no need anymore :)
But come to think of it, why wouldn't AI companies build something like this? Either it's a bit deeper than we think, or they're holding back to keep sales high by making people burn extra tokens. Either way, your project has become a threat to them :D
Vibe coded, right?
Ha! Not gonna lie, there was definitely some vibe coding involved in the early prototypes. But hey, if the vibes are right, ship it 😄
This solves the real problem: context loss, not model capability.
Exactly. That's the one-liner version of why this exists. Most people reach for bigger models or more tokens when the real issue is that context evaporates. Fix the memory, and the model you already have becomes dramatically more useful. Appreciate the concise framing!