Building in Public: How We Fixed EdgeAI's Memory — It Was Forgetting Every Message

Meaningful Blog

Last week we shipped a 3-layer memory system for EdgeAI.

It stores facts. It encrypts personal data. It promotes high-confidence knowledge to permanent storage.

It was architecturally sound.

And yet — EdgeAI was still forgetting everything mid-conversation.

---

The bug

Here's what was actually happening:

You'd ask: "Can you filter my connections by type?"

EdgeAI would answer: "Yes! I can show Family, Professional, and Friend. Want me to do that?"

You'd reply: "yes"

EdgeAI would respond: "I'm EdgeAI, here to help. What would you like assistance with?"

A full reset. Every time.

---

Root cause: a race condition in the history pipeline

The architecture had a subtle flaw.

The frontend sent only the current message to the server — no history.

The server loaded conversation history from the database. But the database was written by a separate endpoint, triggered by a React `useEffect` that fired after the message was added to state.

So the server was always reading history that was one message behind. On the first reply of a session: empty.

The AI had no idea what was said before. Every message looked like the start of a new conversation.
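The ordering bug can be reduced to a few lines. This is a minimal sketch with hypothetical names, not the real EdgeAI code: the "database" is a plain array, and the persist step runs after the send, the way the React `useEffect` did.

```typescript
type Message = { role: "user" | "assistant"; content: string };

const db: Message[] = [];          // the history table the server reads
const clientState: Message[] = []; // React state on the frontend

// Server handler: loads history from the database, ignoring the request.
// Returns how many prior messages the model actually saw.
function serverReply(): number {
  const history = [...db];
  return history.length;
}

// Frontend send: add to state, call the server, THEN persist.
// The persist step models the `useEffect` that fired after the state update.
function send(text: string): number {
  clientState.push({ role: "user", content: text });
  const contextSeen = serverReply();        // server reads db before the effect runs
  db.push({ role: "user", content: text }); // persist effect fires afterwards
  return contextSeen;
}
```

Run it and the lag is obvious: the first reply sees zero messages of context, and every later reply sees exactly one fewer than the client has.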

---

The second problem: "yes" was classified as a greeting

Even with history in place, short replies like "yes", "ok", and "sure" fell through to the LLM intent classifier.

The classifier saw "yes" in isolation and guessed `chitchat`.

`chitchat` routes to a hardcoded greeting template.

Hence: "I'm EdgeAI, here to help."

---

5 fixes, shipped together

Fix 1 — Eliminate the race condition

The frontend already has the full message history in state. We now pass the last 8 messages directly in the request body on every send.

The database lookup becomes a fallback for session restoration only. No more one-message lag.
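A sketch of the shape of the fix, with hypothetical names: the frontend slices its own state and ships the recent window with every request, and the server only falls back to the database when the request carries no history.

```typescript
type Message = { role: "user" | "assistant"; content: string };

const HISTORY_WINDOW = 8; // the last 8 messages travel with every request

// Frontend: build the request body from state the client already has.
function buildRequestBody(messages: Message[], current: string) {
  return {
    message: current,
    history: messages.slice(-HISTORY_WINDOW), // no extra read, no lag
  };
}

// Server: prefer the request's history; the DB is for session restoration
// only (e.g. a page reload, where client state starts empty).
function resolveHistory(requestHistory: Message[], dbHistory: Message[]): Message[] {
  return requestHistory.length > 0 ? requestHistory : dbHistory;
}
```

The design point: the source of truth for an in-flight conversation is the client, because the client is the only party guaranteed to have the latest message.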

Fix 2 — Tier 0: follow-up regex

Before any other classification, we now check for short follow-up replies:

`yes / no / ok / sure / go ahead / show me / tell me more`

If one of these matches and there's prior history, we route to a new `followup` intent — not chitchat.

This check is effectively free: a single regex match, no LLM call.
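The Tier-0 check fits in a few lines. The pattern list comes from the post; the function name and signature are ours.

```typescript
// Short affirmations and continuations that should never hit the LLM classifier.
const FOLLOWUP_RE = /^(yes|no|ok|sure|go ahead|show me|tell me more)[.!]?$/i;

// Returns the `followup` intent when a short reply arrives with prior history,
// or null to fall through to the normal classification pipeline.
function classifyTier0(message: string, historyLength: number): "followup" | null {
  if (historyLength > 0 && FOLLOWUP_RE.test(message.trim())) {
    return "followup"; // skip the LLM call entirely
  }
  return null;
}
```

Note the guard on `historyLength`: a bare "yes" with no prior context really is ambiguous, so it still goes through the full pipeline.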

Fix 3 — History in every handler

The `handleGeneral` and `handleAppQuery` handlers now always inject the last 6 messages into the prompt — regardless of whether a user ID is present.

Previously this injection was conditional on a user ID being present: a subtle bug that made history invisible to the model in edge cases.
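Roughly what the unconditional injection looks like (hypothetical names; the real handlers do more). The last 6 messages are always formatted into the prompt, user ID or not.

```typescript
type Message = { role: "user" | "assistant"; content: string };

const PROMPT_HISTORY_WINDOW = 6; // last 6 messages always reach the model

function buildPrompt(history: Message[], current: string): string {
  const recent = history
    .slice(-PROMPT_HISTORY_WINDOW)
    .map((m) => `${m.role === "user" ? "User" : "Assistant"}: ${m.content}`)
    .join("\n");
  // No `if (userId)` guard any more: history is injected unconditionally.
  return `${recent}\nUser: ${current}\nAssistant:`;
}
```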

Fix 4 — Lower temperature

Llama 3.2 3B at `temperature: 0.3` produces inconsistent outputs on short prompts.

We dropped it to `0.1` for all factual and query handlers. The model is now more deterministic and less likely to drift.
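In practice this is a per-intent lookup rather than one global setting. A sketch, with hypothetical intent names; the `0.7` default for open-ended chat is our assumption, not a number from the post.

```typescript
// Sampling temperature per intent. Factual and query handlers run
// near-deterministic; open-ended chat keeps more variety.
function temperatureFor(intent: "appQuery" | "general" | "chitchat"): number {
  switch (intent) {
    case "appQuery":
    case "general":
      return 0.1; // dropped from 0.3: small models drift on short prompts
    default:
      return 0.7; // assumption: chitchat keeps a higher default
  }
}
```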

Fix 5 — Continuation handler

When a follow-up is detected, instead of re-running the full classification pipeline, we inject the prior exchange and ask the model to continue:

`Prior exchange: [last user message] / [last AI message] → User follow-up: "yes"`

The model continues naturally. No context loss.
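A sketch of how that continuation prompt gets assembled. The shape follows the template above; the exact wording and function name are ours.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Build a continuation prompt from the last user/assistant turns plus the
// short follow-up, instead of re-running the classification pipeline.
function buildContinuationPrompt(history: Message[], followup: string): string {
  const lastUser = [...history].reverse().find((m) => m.role === "user");
  const lastAI = [...history].reverse().find((m) => m.role === "assistant");
  return [
    "Prior exchange:",
    `User: ${lastUser?.content ?? ""}`,
    `Assistant: ${lastAI?.content ?? ""}`,
    `User follow-up: "${followup}"`,
    "Continue the conversation, acting on the follow-up.",
  ].join("\n");
}
```

The model never sees a bare "yes"; it always sees what the "yes" was agreeing to.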

---

What the same conversation looks like now

"Can you filter my connections by type?"

"Yes! I can show Family, Professional, and Friend. Want me to do that?"

"yes"

"Here are your connections grouped by type: Family (4): ..., Professional (7): ..., Friend (3): ..."

That's what it should have been doing from day one.

---

The lesson

A memory architecture is only as good as the pipeline that delivers context to the model.

We had the storage layer right. The delivery layer had a race condition, a misclassification, and a missing history injection.

Three separate bugs. One broken experience.

What's the most frustrating AI behaviour you've encountered — where the system clearly should have remembered something but didn't?