Building in Public: How We Fixed EdgeAI's Memory — It Was Forgetting Every Message
Meaningful Blog
Last week we shipped a 3-layer memory system for EdgeAI.
It stores facts. It encrypts personal data. It promotes high-confidence knowledge to permanent storage.
It was architecturally sound.
And yet — EdgeAI was still forgetting everything mid-conversation.
---
The bug
Here's what was actually happening:
You'd ask: "Can you filter my connections by type?"
EdgeAI would answer: "Yes! I can show Family, Professional, and Friend. Want me to do that?"
You'd reply: "yes"
EdgeAI would respond: "I'm EdgeAI, here to help. What would you like assistance with?"
A full reset. Every time.
---
Root cause: a race condition in the history pipeline
The architecture had a subtle flaw.
The frontend sent only the current message to the server — no history.
The server loaded conversation history from the database. But the database was written by a separate endpoint, triggered by a React `useEffect` that fired after the message was added to state.
So the server was always reading history that was one message behind. On the first reply of a session: empty.
The AI had no idea what was said before. Every message looked like the start of a new conversation.
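The race is easiest to see in a stripped-down simulation. This is an illustrative sketch, not EdgeAI's actual code: the names and the in-memory "DB" are invented, but the ordering is the same, with the chat request firing before the `useEffect`-style persist runs.

```typescript
// Minimal simulation of the race (names are illustrative, not EdgeAI's code).
// The history write happens after the chat request, so the server always
// reads history that is one message behind the conversation.

type Msg = { role: "user" | "assistant"; text: string };

const db: Msg[] = [];       // stand-in for the history table
let serverSaw: Msg[] = [];  // what the server loaded for the last request

function serverHandleChat(_current: string) {
  serverSaw = [...db]; // server loads conversation history from the DB
}

function frontendSend(text: string, state: Msg[]) {
  state.push({ role: "user", text }); // message added to client state
  serverHandleChat(text);             // request fires first...
  db.push(...state.slice(db.length)); // ...the persist effect runs after
}

const state: Msg[] = [];
frontendSend("Can you filter my connections by type?", state);
console.log(serverSaw.length); // → 0  (first reply of the session: empty)
frontendSend("yes", state);
console.log(serverSaw.length); // → 1  (always one message behind)
```

Every send repeats the same off-by-one: the server's view trails client state by exactly one message, which is why the very first reply saw no history at all.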
---
The second problem: "yes" was classified as a greeting
Even with history, short replies like "yes", "ok", "sure" were falling through to the LLM intent classifier.
The classifier saw "yes" in isolation and guessed `chitchat`.
`chitchat` routes to a hardcoded greeting template.
Hence: "I'm EdgeAI, here to help."
---
5 fixes, shipped together
Fix 1 — Eliminate the race condition
The frontend already has the full message history in state. We now pass the last 8 messages directly in the request body on every send.
The database lookup becomes a fallback for session restoration only. No more one-message lag.
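A sketch of the new request shape, assuming a simple JSON body (field names here are illustrative, not EdgeAI's actual API):

```typescript
// Fix 1 sketch: ship recent history from client state with every request,
// so the server never depends on the lagging DB write.

type Msg = { role: "user" | "assistant"; text: string };

const HISTORY_WINDOW = 8;

function buildChatRequest(current: string, messages: Msg[]) {
  return {
    message: current,
    // Last 8 messages straight from client state: no DB round trip,
    // no one-message lag.
    history: messages.slice(-HISTORY_WINDOW),
  };
}
```

The server uses `history` from the body when present and only falls back to the database when it is empty, e.g. when restoring a session in a fresh tab.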
Fix 2 — Tier 0: follow-up regex
Before any other classification, we now check for short follow-up replies:
`yes / no / ok / sure / go ahead / show me / tell me more`
If one of these matches and there's prior history, we route to a new `followup` intent — not chitchat.
This check is effectively free: a single regex match, no LLM round trip.
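The Tier 0 check can be sketched like this (the exact regex and function names are illustrative):

```typescript
// Tier 0 sketch: short acknowledgements with prior history route to
// `followup` before any LLM classification runs.

const FOLLOWUP_RE = /^(yes|no|ok(ay)?|sure|go ahead|show me|tell me more)[.!]?$/i;

function classifyTier0(message: string, historyLength: number): "followup" | null {
  if (historyLength > 0 && FOLLOWUP_RE.test(message.trim())) {
    return "followup"; // skip the LLM entirely
  }
  return null; // fall through to the normal intent classifier
}
```

Note the `historyLength > 0` guard: a bare "yes" with no prior context still goes to the full classifier, since there is nothing to follow up on.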
Fix 3 — History in every handler
The `handleGeneral` and `handleAppQuery` handlers now always inject the last 6 messages into the prompt — regardless of whether a user ID is present.
Previously this was conditional. A subtle bug that made history invisible to the model in edge cases.
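The unconditional version looks roughly like this (prompt layout and names are assumptions, not EdgeAI's actual handlers):

```typescript
// Fix 3 sketch: the last 6 messages always go into the prompt,
// whether or not a user ID is present.

type Msg = { role: "user" | "assistant"; text: string };

function buildPrompt(system: string, history: Msg[], current: string): string {
  const recent = history
    .slice(-6) // always the last 6 messages; no user-ID condition
    .map(m => `${m.role}: ${m.text}`)
    .join("\n");
  return `${system}\n\nConversation so far:\n${recent}\n\nuser: ${current}`;
}
```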
Fix 4 — Lower temperature
Llama 3.2 3B at `temperature: 0.3` produces inconsistent outputs on short prompts.
We dropped it to `0.1` for all factual and query handlers. The model is now more deterministic and less likely to drift.
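In config terms the change is a one-liner. The key names below follow common LLM client conventions and are assumptions, not EdgeAI's actual client:

```typescript
// Illustrative sampling config (field names are assumptions).
const samplingConfig = {
  model: "llama-3.2-3b",
  temperature: 0.1, // was 0.3; tighter sampling, more deterministic on short prompts
};
```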
Fix 5 — Continuation handler
When a follow-up is detected, instead of re-running the full classification pipeline, we inject the prior exchange and ask the model to continue:
`Prior exchange: [last user message] / [last AI message] → User follow-up: "yes"`
The model continues naturally. No context loss.
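Assembling that continuation prompt is straightforward. A minimal sketch, with the exact wording invented for illustration:

```typescript
// Fix 5 sketch: instead of re-classifying, hand the model the prior
// exchange plus the short follow-up and ask it to continue.

type Exchange = { user: string; assistant: string };

function buildContinuationPrompt(prior: Exchange, followup: string): string {
  return [
    "Prior exchange:",
    `User: ${prior.user}`,
    `Assistant: ${prior.assistant}`,
    `User follow-up: "${followup}"`,
    "Continue the conversation. Treat the follow-up as a reply to your last message, not a new topic.",
  ].join("\n");
}
```

Because the prior assistant turn ("Want me to do that?") is right there in the prompt, a bare "yes" is unambiguous to the model.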
---
What the same conversation looks like now
"Can you filter my connections by type?"
→ "Yes! I can show Family, Professional, and Friend. Want me to do that?"
"yes"
→ "Here are your connections grouped by type: Family (4): ..., Professional (7): ..., Friend (3): ..."
That's what it should have been doing from day one.
---
The lesson
A memory architecture is only as good as the pipeline that delivers context to the model.
We had the storage layer right. The delivery layer had a race condition, a misclassification, and a missing history injection.
Three separate bugs. One broken experience.
What's the most frustrating AI behaviour you've encountered — where the system clearly should have remembered something but didn't?