I built a brain for Claude Code because it keeps forgetting everything

The frustration that started this

If you use Claude Code for real work, you've hit this wall: you're deep in a session, you've made architectural decisions, debugged tricky issues, established patterns — and then context compaction happens. Claude summarizes your conversation to free up tokens, and suddenly it's forgotten that you switched from MongoDB to PostgreSQL three hours ago.

You explain it again. It forgets again. Repeat.

I got tired of re-explaining my own codebase to my AI assistant. So I built Claude Cortex — a memory system that works like a brain, not a notepad.

What it actually does

Claude Cortex is an MCP server that gives Claude Code three types of memory:

  • Short-term memory (STM) — session-level, high detail, decays within hours
  • Long-term memory (LTM) — cross-session, consolidated from STM, persists for weeks
  • Episodic memory — specific events: "when I tried X, Y happened"

The key insight: not everything is worth remembering. The system scores every piece of information for salience — how important it actually is:

"Remember that we're using PostgreSQL" → architecture decision → 0.9 salience
"Fixed the auth bug by clearing the token cache" → error resolution → 0.8 salience  
"The current file has 200 lines" → temporary context → 0.2 salience (won't persist)
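As a rough sketch, the categorization step could boil down to a lookup like this (the type and function names are mine for illustration; only the categories and scores come from the examples above):

// Illustrative sketch: category-based salience scoring.
// Names are hypothetical; scores mirror the examples above.
type MemoryCategory = "architecture" | "error-resolution" | "temporary";

const BASE_SALIENCE: Record<MemoryCategory, number> = {
  "architecture": 0.9,      // "we're using PostgreSQL"
  "error-resolution": 0.8,  // "fixed the auth bug by clearing the token cache"
  "temporary": 0.2,         // "the current file has 200 lines" — won't persist
};

function scoreSalience(category: MemoryCategory): number {
  return BASE_SALIENCE[category];
}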

Memories also decay over time, just like human memory:

score = base_salience × (0.995 ^ hours_since_access)

But every time a memory is accessed, it gets reinforced by 1.2×. Frequently useful memories survive. One-off details fade away. This isn't a key-value store — it's a system that learns what matters.
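In code, the decay-and-reinforcement loop might look like the following (a minimal sketch; the field names are mine, and I'm assuming the 1.2x boost applies to the base salience with a cap at 1.0):

// Sketch of decay + reinforcement. Field names are illustrative.
interface Memory {
  baseSalience: number;   // set at creation, e.g. 0.9 for architecture decisions
  lastAccessed: number;   // epoch milliseconds
}

// Effective score decays exponentially with hours since last access,
// matching score = base_salience × (0.995 ^ hours_since_access).
function effectiveScore(m: Memory, now: number = Date.now()): number {
  const hours = (now - m.lastAccessed) / 3_600_000;
  return m.baseSalience * Math.pow(0.995, hours);
}

// Each access reinforces the memory by 1.2x (capped so scores stay bounded).
function reinforce(m: Memory, now: number = Date.now()): void {
  m.baseSalience = Math.min(1.0, m.baseSalience * 1.2);
  m.lastAccessed = now;
}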

The compaction problem, solved

Here's the specific workflow that used to drive me nuts:

Before Cortex:

Session starts → Work for 2 hours → Compaction happens → 
Claude: "What database are you using?" → You: *screams internally*

After Cortex:

Session starts → Work for 2 hours → Compaction happens →
PreCompact hook auto-extracts 3-5 important memories →
Claude: "Let me check my memory..." → 
Recalls: PostgreSQL, JWT auth, React frontend, modular architecture →
Continues working seamlessly

The PreCompact hook is the secret weapon. It runs automatically before every compaction event, scanning the conversation for decisions, error fixes, learnings, and architecture notes. No manual intervention needed.

v1.6.0: The intelligence overhaul

The first version was essentially CRUD-with-decay. It worked, but the subsystems were isolated — search didn't improve linking, linking didn't improve search, salience was set once and never evolved.

v1.6.0 was a seven-task overhaul to make everything feed back into everything else:

1. Semantic linking via embeddings

Previously, memories only linked if they shared tags. Now, two memories about PostgreSQL with completely different tags will still link — the system computes embedding similarity and creates connections at ≥0.6 cosine similarity.
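Conceptually the linking step is just a cosine-similarity check over embedding vectors, something like this sketch (the embeddings themselves come from @huggingface/transformers; the helper names are mine):

// Sketch: link two memories when their embeddings are similar enough.
const LINK_THRESHOLD = 0.6;

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function shouldLink(embA: number[], embB: number[]): boolean {
  return cosineSimilarity(embA, embB) >= LINK_THRESHOLD;
}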

2. Search feedback loops

Every search now does three things:

  • Returns results (obviously)
  • Reinforces salience of returned memories (with diminishing returns)
  • Creates links between co-returned results

Your search patterns literally shape the knowledge graph.
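A sketch of what that post-search pass could look like (names are illustrative; I'm assuming the diminishing returns come from capping salience at 1.0):

// Sketch of the search feedback loop; types and helpers are illustrative.
interface MemoryRecord {
  id: number;
  baseSalience: number;
  lastAccessed: number;
}

const links = new Set<string>(); // memory-id pairs, e.g. "42:57"

function linkMemories(a: MemoryRecord, b: MemoryRecord): void {
  links.add([a.id, b.id].sort((x, y) => x - y).join(":"));
}

function onSearchResults(results: MemoryRecord[]): void {
  for (const m of results) {
    // Reinforce with diminishing returns (assumed: salience capped at 1.0).
    m.baseSalience = Math.min(1.0, m.baseSalience * 1.2);
    m.lastAccessed = Date.now();
  }
  // Link every pair of co-returned memories so related results cluster.
  for (let i = 0; i < results.length; i++) {
    for (let j = i + 1; j < results.length; j++) {
      linkMemories(results[i], results[j]);
    }
  }
}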

3. Dynamic salience evolution

Salience isn't static anymore. During consolidation:

  • Hub memories (lots of links) get a logarithmic bonus
  • Contradicted memories get a small penalty
  • The system learns which memories are structurally important
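A sketch of how that adjustment could look (the constants are my guesses; only the shapes, a logarithmic hub bonus and a small contradiction penalty, come from the list above):

// Sketch: structural salience adjustment during consolidation.
// The 0.05 and 0.02 coefficients are assumptions, not Cortex's values.
function evolveSalience(
  salience: number,
  linkCount: number,
  contradictionCount: number
): number {
  const hubBonus = 0.05 * Math.log1p(linkCount);  // logarithmic in link degree
  const penalty = 0.02 * contradictionCount;      // small per contradiction
  return Math.min(1.0, Math.max(0, salience + hubBonus - penalty));
}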

4. Contradiction surfacing

If you told Claude "use PostgreSQL" in January and "use MongoDB" in March, the system detects this and flags it:

⚠️ WARNING: Contradicts "Use PostgreSQL" (Memory #42)

No more silently holding conflicting information.

5. Memory enrichment

Memories accumulate context over time. If you search for "JWT auth" and the query contains information the memory doesn't have, that information gets appended to the memory. Memories grow richer through use.
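A rough sketch of that enrichment step, assuming a naive "append whatever is new" strategy (the real heuristics are presumably smarter about what counts as novel):

// Sketch: append novel query context to a matched memory.
// The novelty check here is naive substring matching — an assumption.
interface EnrichableMemory { content: string; }

function enrichFromQuery(memory: EnrichableMemory, query: string): void {
  const novel = query
    .split(/[.;\n]/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0 && !memory.content.includes(s));
  if (novel.length > 0) {
    memory.content += "\n[enriched] " + novel.join("; ");
  }
}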

6. Real consolidation

The old system just deduplicated exact matches. Now it clusters related STM memories and merges them into coherent LTM entries:

STM: "Set up JWT tokens with RS256 signing"
STM: "JWT tokens expire after 24 hours"
STM: "Added JWT verification middleware"

→ Consolidated LTM: "JWT authentication system using RS256 signing.
   Tokens expire after 24 hours with 7-day refresh tokens.
   Verification middleware on all protected routes."

Three noisy short-term memories become one structured long-term memory.
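Under the hood this presumably amounts to "cluster by embedding similarity, then merge." A sketch, reusing the cosineSimilarity helper from the linking section (the merge here is a naive join; the real system produces coherent summaries like the one above):

// Sketch: greedy clustering of STM entries by embedding similarity,
// then merging each cluster into one LTM entry.
interface StmEntry { content: string; embedding: number[]; }

function consolidate(stm: StmEntry[], threshold = 0.6): string[] {
  const clusters: StmEntry[][] = [];
  for (const entry of stm) {
    const home = clusters.find(
      (c) => cosineSimilarity(c[0].embedding, entry.embedding) >= threshold
    );
    if (home) home.push(entry);
    else clusters.push([entry]);
  }
  // Merge each cluster into a single LTM entry (naive join here).
  return clusters.map((c) => c.map((e) => e.content).join(" "));
}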

7. Activation weight tuning

Recently activated memories get a meaningful boost in search results. If you just looked at something, it's more likely to be relevant again.
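In ranking terms that is just a recency multiplier on the relevance score, something like this sketch (the 1.3x boost and the one-hour window are my assumptions, not Cortex's actual values):

// Sketch: blend relevance with a recent-activation boost.
function rankScore(
  relevance: number,           // e.g. FTS5 or embedding match score
  hoursSinceActivation: number
): number {
  const recencyBoost = hoursSinceActivation < 1 ? 1.3 : 1.0;
  return relevance * recencyBoost;
}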

Getting started

Install

npm install -g claude-cortex

Configure Claude Code

Create .mcp.json in your project (or ~/.claude/.mcp.json for global):

{
  "mcpServers": {
    "memory": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "claude-cortex"]
    }
  }
}

Set up the PreCompact hook

Add to ~/.claude/settings.json:

{
  "hooks": {
    "PreCompact": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "npx -y claude-cortex-hook pre-compact"
          }
        ]
      }
    ]
  }
}

Restart Claude Code, approve the MCP server, and you're done. Claude will start remembering things automatically.

Use it naturally

You don't need to learn new commands. Just talk to Claude:

"Remember that we're using PostgreSQL for the database"
"What do you know about our auth setup?"
"Get the context for this project"

The system handles categorization, salience scoring, and storage behind the scenes.

The dashboard

There's also an optional 3D brain visualization dashboard — because honestly, watching memories form as glowing nodes in a neural network is just cool.

npx claude-cortex service install  # auto-start on login

It shows your memory graph in real-time via WebSocket, with search, filters, stats, and even a SQL console for poking at the database directly. Memories are color-coded: blue for architecture, purple for patterns, green for preferences, red for errors, yellow for learnings.

How it compares

Most MCP memory tools are flat key-value stores. You manually save and manually retrieve. Claude Cortex is different in a few ways:

  • Salience detection — it decides what's worth remembering, not you
  • Temporal decay — old irrelevant stuff fades naturally
  • STM → LTM consolidation — short-term memories get merged into long-term ones
  • Semantic linking — memories form a knowledge graph, not a list
  • PreCompact hook — survives Claude Code's context compaction automatically

It's not perfect. Embeddings add some latency. The consolidation heuristics are tuned for my workflows and might need adjustment for yours. The dashboard is a nice-to-have, not a must-have. But for the core problem — Claude forgetting things it shouldn't forget — it works really well.

The stack

  • TypeScript, compiled to ESM
  • SQLite with FTS5 for full-text search
  • @huggingface/transformers for local embeddings (v1.6.1 fixed ARM64 support)
  • MCP protocol for Claude Code integration
  • React + Three.js for the dashboard
  • 56 passing tests, MIT licensed

Try it out

npm install -g claude-cortex

If you're using Claude Code for anything beyond quick one-offs, give it a shot. The difference between an AI that remembers your project and one that doesn't is night and day.

Stars and feedback welcome — this is a solo project and I'm iterating fast.

Top comments (10)

Cyber Safety Zone

Really cool project — this kind of persistent memory layer is exactly what makes tools like Claude actually usable for real coding workflows. The way you score salience and let memories decay or reinforce feels a lot closer to how humans remember what matters, instead of dumping context every time the session compresses.

I’ve run into the same pain where Claude asks “What database are we using?” after hours of work — it quickly turns productive sessions into repetition loops. Solutions that give the model STM + LTM + episodic memory genuinely feel like the next step for AI coding assistants.

Would love to hear how well the system handles cross-session architectural decisions long term.

CyborgNinja1

Thanks! Really appreciate that.

To answer your question on cross-session architectural decisions — this is exactly where LTM shines. Once a decision like "we use PostgreSQL" gets consolidated from STM into LTM with high salience (0.9 for architecture decisions), it persists across sessions and gets reinforced every time it's accessed.

In practice, architectural decisions survive weeks of sessions without manual intervention. The decay rate (0.995^hours) is slow enough that high-salience memories stick around, and the reinforcement loop (1.2x boost on access) means anything you actually reference stays strong.

The real magic is consolidation — multiple related STM entries about the same decision get merged into one rich LTM entry, so you get a complete picture rather than fragments.

Still iterating on the decay curves — would love to hear how it works for your workflow if you give it a shot!

Dominik Michelitsch

This is an excellent piece of work — and more importantly, a very clear diagnosis of the real problem 👍

What you’ve built isn’t “memory” in the CRUD sense, it’s state with pressure, decay, and reinforcement, which is exactly what breaks once context compaction enters the picture. The PreCompact hook is a particularly elegant move — treating compaction as a first-class lifecycle event rather than an accident to recover from.

I also like that you’re explicit about salience as a learned property, not a user-managed one. That’s the key difference between a notepad and a brain. Most tools stop at persistence; this actually models importance over time.

This feels less like a Claude add-on and more like missing infrastructure for long-running, real-world AI-assisted development. Really strong systems thinking here 🚀

CyborgNinja1

Really appreciate the thoughtful breakdown — you nailed what I was going for.

The PreCompact hook was born out of frustration. I kept trying to fix compaction after the fact, and then it clicked: compaction is a lifecycle event, not an error. Once I started treating it that way, everything fell into place.

Salience as a learned property was a deliberate choice. The moment you ask users to manually tag importance, nobody does it consistently. The system has to figure it out from context, access patterns, and structural position in the knowledge graph.

Really glad the systems thinking comes through — making the subsystems feed back into each other rather than operating in isolation is the part I'm most proud of. Thanks for the kind words!

Dominik Michelitsch

I think the “compaction is a lifecycle event, not an error” framing is the key insight here.

A lot of systems treat compaction like garbage collection: something you reluctantly do when things get messy. But what you’re describing feels closer to metabolism — it’s not a cleanup step, it’s part of staying healthy. Designing PreCompact as a first-class hook makes the whole process proactive instead of reactive, and that’s a huge conceptual shift.

Also fully agree on salience being learned rather than user-declared. Any system that relies on manual tagging eventually collapses under human inconsistency (or just… people being busy). Inferring salience from access patterns + context + graph position is exactly the kind of “it should just work” design that scales beyond power users.

And the systems-thinking aspect really does come through: compaction, salience, graph structure, retrieval — all reinforcing each other instead of being isolated mechanisms. That feedback-loop approach is what separates a cohesive architecture from a collection of clever features.

Really excited to see where you take this — it reads like the kind of design that will stay robust even as the dataset and usage patterns evolve.

القسم التقني

Amazing work! I was thinking about starting a project like this myself; looks like there's no need to anymore :)

But come to think of it, why wouldn't AI companies build something like this? I think it's either a bit deeper than we realize, or they're holding it back to keep their sales high by forcing people to consume extra tokens. Either way, your project has become a threat to them :D

phåńtøm šłîçk

Vibe coded, right?

CyborgNinja1

Ha! Not gonna lie, there was definitely some vibe coding involved in the early prototypes. But hey, if the vibes are right, ship it 😄

Balasaranya Varadhalingam

This solves the real problem: context loss, not model capability.

CyborgNinja1

Exactly. That's the one-liner version of why this exists. Most people reach for bigger models or more tokens when the real issue is that context evaporates. Fix the memory, and the model you already have becomes dramatically more useful. Appreciate the concise framing!