AI “knowing” a software project is a fantasy. But if we get smart with vector stores we might get a little closer.

blog@dws.team
February 28, 2026

Working your way through a problem creates pathways in your brain that AI can’t replicate. But we can try.

We’re trying to find ways to get our AI coding agents to understand our projects better.

The problem we’re trying to solve is the inherent forgetfulness of the model.

That’s particularly apparent when handling multiple projects. Let me explain.

We’re a small software development company based in Amsterdam. In the course of a day’s work we might handle different projects at different stages of development.

Typically, when a project is almost finished, you end up with many loose ends, each of which is registered as a ticket. But at the same time, tickets from other projects come up. 

We’ll have multiple coding agents working on those tickets. 

Of course we have global rules set up for the agents: how to read a ticket, how to read the PM’s comments, how to work on the ticket, how to format a fix or feature branch, and how to add comments to the ticket after finishing.

We should use RAG and vector stores to augment the prompts our ticketing system sends to our coding agents.

The issue, and this has been said a lot, is that our AI coding agents always start from scratch. They never retain any knowledge from previous encounters with the project.

We’ve created mappings on different levels and in different formats, and we have intricate project descriptions, but nothing seems to help.

What are we doing wrong?

Here’s what my AI says might help.

Project-Specific Knowledge Bases

For each project, in the repo, maintain a lightweight, machine-readable document that captures key decisions and is added to as the project matures.
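As a sketch, such a document could be as simple as an append-only YAML file. The filename and fields here are an assumption, not a standard:

```yaml
# docs/decisions.yaml — hypothetical example of a decision log
- id: 14
  date: 2026-01-12
  decision: Use Stripe webhooks instead of polling for payment status
  reason: Polling hit rate limits during peak traffic
  affects: [payments-service, billing-cron]
```

The point is that it stays small, diffable, and trivially parseable by both agents and indexing scripts.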

We actually already do this, but maybe we need to find a better way to handle it. 

Personally I find it a waste of tokens to point the agent to the docs folder for every ticket. 

But maybe we should take the context window into account. 

The model I use most is Claude Code, which has a context window of 200,000 tokens. Our docs folder contains about 30 documents of roughly 1,250 tokens each, so pointing the agent at all of them costs close to 40,000 tokens. I know that with prompt caching you don’t pay the full amount for subsequent inferences, but those tokens still occupy the context window.

Which brings me to the second measure we could take.

Agent “Memory” via Embeddings

To combat excessive token use, we can index the docs folder. 

We’d create a vector index of the project’s knowledge base, maybe using FAISS as we do in our chatbot product. The knowledge base would be re-indexed as it changes.

Then, in the agent’s global rules, we add the instruction to search the index with the text of the ticket.
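The rule itself could be a single line in the agents’ global instructions; the wording below is just a sketch:

```
Before you start on a ticket, search the project's vector index
(vector_store.index) with the ticket text and add the top matches
to your working context.
```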

But a ticket can be quite lengthy, 1,000 tokens or more. How do we distill it into a query for the vector index without losing detail?

Here’s how:

Split the ticket into smaller, semantically meaningful chunks (e.g., 200–300 tokens each). 

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Note: chunk_size counts characters by default; pass a token-based
# length_function if you want chunks measured in tokens.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=50,
    separators=["\n\n", "\n", ".", " ", ""],
)
chunks = splitter.split_text(long_ticket_text)

Embed each chunk separately to capture local context. 

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
chunk_embeddings = model.encode(chunks)

Summarise the ticket. This would be optional, but powerful for both the coding agent and the PM.

from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
# Keep the summary text itself, so it can be reused directly below
summary = summarizer(long_ticket_text, max_length=130, min_length=30, do_sample=False)[0]["summary_text"]

Now combine the summary with the key terms, the latter being proper nouns and roles that are often critical to the ticket’s context. Alternatively, you could use terms that appear frequently in the text but are rare in everyday language.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(long_ticket_text)
entities = [ent.text for ent in doc.ents if ent.label_ in ["ORG", "PERSON", "GPE", "PRODUCT"]]

Then, augment the query with the entities you just produced:

augmented_query = f"{summary} Key terms: {', '.join(entities)}"

We can now query the vector index, either with the summary for broad context, or chunk by chunk for detailed context:

import faiss

index = faiss.read_index("vector_store.index")
query_embedding = model.encode([augmented_query])
distances, indices = index.search(query_embedding, k=5)
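For the chunk-by-chunk variant, each ticket chunk gets its own search and the hits are pooled. Here is a sketch, with a helper name of my own, that ranks each knowledge-base document by the best distance it achieved across all chunks:

```python
import numpy as np

def pool_chunk_hits(distances: np.ndarray, indices: np.ndarray) -> list[int]:
    """Merge per-chunk FAISS results: rank each hit document by the
    best (smallest) L2 distance it achieved across all ticket chunks."""
    best: dict[int, float] = {}
    for dist_row, idx_row in zip(distances, indices):
        for dist, idx in zip(dist_row, idx_row):
            if int(idx) not in best or dist < best[int(idx)]:
                best[int(idx)] = float(dist)
    # Document ids, most relevant first
    return sorted(best, key=best.get)

# Usage with the index and ticket chunks from above:
# distances, indices = index.search(chunk_embeddings.astype("float32"), k=3)
# ranked_doc_ids = pool_chunk_hits(distances, indices)
```

Taking the minimum distance per document means one strongly matching chunk is enough to surface a document, which suits tickets where only one paragraph touches the relevant subsystem.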

Then re-rank the results by semantic alignment with the original ticket:

from sentence_transformers import CrossEncoder

# kb_texts: the knowledge-base texts the FAISS index was built from
retrieved_docs = [kb_texts[i] for i in indices[0]]
reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
scores = reranker.predict([(long_ticket_text, doc) for doc in retrieved_docs])
retrieved_docs = [doc for _, doc in sorted(zip(scores, retrieved_docs), reverse=True)]

You’d get a prompt that looks like this:

# retrieved_docs: the re-ranked knowledge-base passages from the previous step
context = "\n".join(f"- {doc}" for doc in retrieved_docs)
prompt = f"""
Ticket Summary: {summary}
Relevant Context:
{context}
Question: {specific_question}
"""

In our particular situation, working with coding agents through our ticketing system as we do, we might add the summary to the ticket itself, not least because it will help PMs when testing the ticket or discussing it with the client.

The relevant context could be pulled in from an internal web application that is fed by continuously updated vector stores pertaining to the project.

And the question would be again sourced from the ticket itself, where correctly structured tickets are vital.

Vector stores are the closest AI gets to “pathways in your brain”. At least for now.

It’s been just a few months since coding agents have gotten good enough for real production work and already we wouldn’t know what to do without them.

But the challenges are real. We’re past the agent being caught in a loop at a syntax error a junior could resolve in a flash. We’re past agents going off on a spree, creating solutions that are not on the project board. All good things.

We’re in the process of wiring up our software company to become as highly automated as possible without losing the human touch. Coding agents adhering to the development cycle plays a crucial part in this plan.

Just doing tickets, like the rest of us. But for that, they’re going to have to get smarter. We need to make them smarter.

Vector stores are most definitely not the final answer, but some comparisons are possible.

When you learn, your brain physically rewires neurons, strengthening connections between related concepts. Smells, emotions, or even a stakeholder’s tone can trigger recall of unrelated but contextually linked memories. 

Humans recognise patterns even with incomplete or noisy data. They should; it’s crucial for survival.

In vector stores, text is converted into high-dimensional vectors (embeddings) that capture semantic meaning. For example:

  • "CTO insists on zero downtime for payments" → [0.2, -0.5, 0.8, ...]

  • "Stripe API failure caused outage" → [0.3, -0.4, 0.7, ...]

The vector index finds vectors "close" to each other in space. Both sentences above might cluster near a "critical systems" vector. 

The system doesn’t know what "downtime" means. It just calculates that the vectors are mathematically similar to those of past issues flagged as high-priority.
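To make "mathematically similar" concrete: similarity is usually measured as the cosine of the angle between two embedding vectors. The three-dimensional vectors below are toy stand-ins for the sentences above; real embeddings have hundreds of dimensions:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for the example sentences
downtime = np.array([0.2, -0.5, 0.8])   # "CTO insists on zero downtime..."
stripe = np.array([0.3, -0.4, 0.7])     # "Stripe API failure caused outage"
unrelated = np.array([-0.9, 0.1, -0.2])

print(cosine_similarity(downtime, stripe))     # close to 1: they cluster
print(cosine_similarity(downtime, unrelated))  # negative: semantically far apart
```

That single number is all the "understanding" the system has: no concept of payments or outages, just an angle in a high-dimensional space.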

Not much like real “knowing”, but it looks like it could help.

The reason we need humans in the loop is that they actually understand what we’re trying to do.

At least, OUR humans do.

The approach described above, even if it delivers some measure of success, is quite convoluted compared to the ease with which humans map a project in their minds.

But that doesn’t at all mean we’re discarding the overwhelming advantages of working with coding agents. 

We have ourselves, the humans in the loop, at different stages in the development cycle, steering the process and explaining what we’ve done and why.

It’s finicky, it’s trial and error, it’s hard work. But we’re sure it’s the future and the rewards will come.


Header image: M.C. Escher, Relativity, lithograph, 1953