Retrieval-Augmented Generation (RAG) is a technique for generating more customized and accurate AI suggestions by using the entire coding environment as a source of relevant context for code completions and chat queries.
In this blog post, I go through the details of how it works and how we implemented RAG in Refact.ai, an open-source AI coding assistant for IDEs.
**
Introduction: Why RAG Matters in AI Coding
**
Imagine you have 2 files:
If only my_file.py is supplied to the model, it has no way of knowing that it should complete "say_hello", or that this function needs a parameter.
This problem of limited model scope gets much worse as your project grows. So how do you supply the model with the necessary information from your codebase, in real time and accurately?
It takes a specialized RAG pipeline working inside your IDE, and that's the point of our new release.
**
Refact.ai Technology Stack
**
Between the IDE plugin and the AI model, we use an intermediate layer called refact-lsp. And yes, it works as an LSP server, too.
Its purpose is to run on your computer, keep track of all the files in your project, index them into an in-memory database, and forward requests for code completion or chat to the AI model, along with the relevant context.
refact-lsp is written in Rust, which combines the speed of a compiled language with safety guarantees. Rust is great: it has a library for almost any topic you can imagine, including vector databases and bindings for tree-sitter, a library for parsing source files in many programming languages.
The amazing thing about it: refact-lsp compiles into a single executable file that doesn't require any other software to be installed on your computer. It's self-sufficient! This means it won't interfere with whatever else you are doing on your computer, and it won't break when you update your environment. In fact, you don't even see it: it gets installed together with the Refact.ai plugin in your favorite IDE.
**
AST and VecDB
**
There are two possible kinds of indexing: one based on the Abstract Syntax Tree (AST) and one based on a vector database (VecDB).
What is AST? We use tree-sitter to parse the source files and then extract the positions of function definitions, classes, etc. This makes it possible to build an in-memory index, a mapping from the name of a symbol to its position, which makes functions like "go to definition" and "references" very fast.
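To make the idea concrete, here is a minimal sketch of such a name-to-position index. It is not refact-lsp's code, which is Rust and uses tree-sitter. Instead, it assumes Python source files and uses Python's built-in ast module as the parser. The file names and contents are hypothetical.

```python
import ast
from collections import defaultdict


def build_definition_index(files: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Map every function/class name to the (file, line) positions where it is defined."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for path, source in files.items():
        # ast.walk visits every node, so methods nested inside classes are found too
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                index[node.name].append((path, node.lineno))
    return dict(index)


# Hypothetical two-file project
files = {
    "frog.py": "class Frog:\n    def jump(self, height):\n        return height\n",
    "game.py": "from frog import Frog\n\ndef play():\n    Frog().jump(3)\n",
}
index = build_definition_index(files)
print(index["jump"])  # → [('frog.py', 2)]
```

Once the index exists, "go to definition" is a single dictionary lookup. That is why AST-based context is cheap enough for real-time completion.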
What is VecDB? There are AI models that convert a piece of text (typically up to 1024 tokens) into a vector (typically 768 floating-point numbers). All the documents are split into pieces and vectorized, and the vectors are stored in a VecDB. These models are trained so that when you vectorize a search query, its closest matches in the database (in the sense of the L2 distance between vectors) are semantically similar or relevant to the query.
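A toy version of the VecDB flow might look like the following: vectorize and store chunks, then vectorize the query and rank chunks by L2 distance. The toy_embed function is a hypothetical stand-in for a real embedding model. It only counts words from a fixed vocabulary, so it captures shared words rather than meaning.

```python
import math
import re


def toy_embed(text: str, vocab: list[str]) -> list[float]:
    """Hypothetical stand-in for an embedding model: L2-normalized word counts over a fixed vocabulary."""
    words = re.findall(r"[a-z_]+", text.lower())
    vec = [float(words.count(term)) for term in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero for out-of-vocabulary text
    return [x / norm for x in vec]


def l2_distance(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ToyVecDB:
    def __init__(self, vocab: list[str]):
        self.vocab = vocab
        self.rows: list[tuple[list[float], str]] = []  # (vector, original chunk)

    def add(self, chunk: str) -> None:
        self.rows.append((toy_embed(chunk, self.vocab), chunk))

    def search(self, query: str, k: int = 1) -> list[str]:
        # The query has to be vectorized too: an extra step at search time
        q = toy_embed(query, self.vocab)
        ranked = sorted(self.rows, key=lambda row: l2_distance(q, row[0]))
        return [chunk for _, chunk in ranked[:k]]


db = ToyVecDB(vocab=["frog", "jump", "config", "path", "read"])
db.add("def jump(self): frog.y += 1")
db.add("def read_config(path): return open(path).read()")
print(db.search("frog jump"))  # → ['def jump(self): frog.y += 1']
```

Production vector databases typically use approximate nearest-neighbor indexes rather than sorting every row, but the ranking criterion is the same idea.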
The problem with VecDB is that you need to vectorize the query as well, and that takes time. This is bad for code completion, where results must arrive in real time.
It's not an issue for chat, though: there you can use both indexes through @-commands, described a few sections below.
**
VecDB: Splitting the Source Code Right
**
To vectorize a piece of text, we first need to make sure it's a complete construct in the programming language, such as a single function or a single class. This way, VecDB's semantic matching works best.
The easiest way to implement this is to use empty lines as a hint for the boundaries to split:
You can see in this example that the functions are separated by an empty line. In fact, we use this method for text files that have no tree-sitter module available.
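The empty-line heuristic is simple enough to sketch in a few lines of Python. This is a hypothetical helper, not the refact-lsp implementation:

```python
def split_on_blank_lines(source: str) -> list[str]:
    """Split text into chunks at empty (or whitespace-only) lines."""
    chunks: list[str] = []
    current: list[str] = []
    for line in source.splitlines():
        if line.strip():
            current.append(line)
        elif current:  # an empty line closes the current chunk
            chunks.append("\n".join(current))
            current = []
    if current:  # flush the last chunk
        chunks.append("\n".join(current))
    return chunks


code = "def a():\n    return 1\n\n\ndef b():\n    return 2\n"
print(split_on_blank_lines(code))  # → ['def a():\n    return 1', 'def b():\n    return 2']
```

The heuristic's weakness is also clear: an empty line inside a function body would split that function into two chunks. This is one reason a tree-sitter-based splitter does better when a grammar is available.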
But can we split even better? Of course we can! We can simplify a class by shortening its function bodies:
If this skeletonized version of the class gets vectorized, it becomes much easier to match against a query like "classes that have jump in them". By contrast, a splitter that vectorized the "jump" function alone, without its class, would match less well.
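Skeletonization can be sketched with Python's ast module standing in for tree-sitter: keep class and function signatures, and replace every function body with an ellipsis. This sketch assumes Python 3.9+ for ast.unparse. It also drops docstrings and comments along with the bodies, which a real implementation might choose to keep.

```python
import ast


class _BodyStripper(ast.NodeTransformer):
    """Replace every function body with `...`, keeping the signature line."""

    def visit_FunctionDef(self, node):
        # Not calling generic_visit: anything nested in the body is dropped with it
        node.body = [ast.Expr(value=ast.Constant(value=Ellipsis))]
        return node

    visit_AsyncFunctionDef = visit_FunctionDef


def skeletonize(source: str) -> str:
    tree = _BodyStripper().visit(ast.parse(source))
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


src = '''
class Frog:
    def __init__(self, x):
        self.x = x

    def jump(self, height):
        self.x += height
        return self.x
'''
print(skeletonize(src))
```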
**
AST: Simple Tricks to Make It Better
**
A library like tree-sitter can break the source code down into individual elements: function definitions, function calls, classes, etc. The most useful case is matching the types and function calls near the cursor with their definitions.
See how it works:
If you click the "FIM" (fill-in-the-middle) button, you can see these in the sidebar, marked with an icon.
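Matching symbols near the cursor against a definition index could look roughly like this. The regex-based identifier extraction and the radius parameter are simplifying assumptions. A parser-based implementation would know which tokens are real identifiers and which are, say, words inside strings.

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def symbols_near_cursor(lines: list[str], cursor_line: int, radius: int = 2) -> set[str]:
    """Identifiers on lines within `radius` of the cursor (0-based line index)."""
    lo, hi = max(0, cursor_line - radius), min(len(lines), cursor_line + radius + 1)
    return {name for line in lines[lo:hi] for name in IDENT.findall(line)}


def definitions_near_cursor(lines, cursor_line, index):
    """Keep only the nearby identifiers that the definition index knows about."""
    return {name: index[name] for name in sorted(symbols_near_cursor(lines, cursor_line)) if name in index}


# Hypothetical index: name -> [(file, line), ...]
index = {"Frog": [("frog.py", 1)], "jump": [("frog.py", 2)], "unused": [("misc.py", 7)]}
lines = ["from frog import Frog", "", "def play():", "    pet = Frog()", "    pet.jump(3)"]
print(definitions_near_cursor(lines, cursor_line=4, index=index))
# → {'Frog': [('frog.py', 1)], 'jump': [('frog.py', 2)]}
```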
Beyond this simple matching, there are a few tricks we can use. For the symbols near the cursor, we can first look up their types and then go to the definitions of those types. For classes, we can go to their parent classes. These are simple rules that work for all programming languages!
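The type and parent-class rules can be sketched for Python code as follows, with the ast module standing in for tree-sitter. Only simple annotations like pet: Frog are handled. This also hints at why untyped code gives the type rule less to work with.

```python
import ast


def class_parents(source: str) -> dict[str, list[str]]:
    """Map each class name to the names of its direct base classes."""
    return {
        node.name: [base.id for base in node.bases if isinstance(base, ast.Name)]
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.ClassDef)
    }


def annotated_types(source: str) -> dict[str, str]:
    """Map variable/parameter names to their annotated type (simple names only)."""
    types = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.arg) and isinstance(node.annotation, ast.Name):
            types[node.arg] = node.annotation.id
        elif (isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name)
              and isinstance(node.annotation, ast.Name)):
            types[node.target.id] = node.annotation.id
    return types


def with_ancestors(class_names: list[str], parents: dict[str, list[str]]) -> list[str]:
    """Follow parent classes transitively; each one's definition can go into the context too."""
    result, stack = [], list(class_names)
    while stack:
        name = stack.pop()
        if name not in result:  # also guards against cycles
            result.append(name)
            stack.extend(parents.get(name, []))
    return result


src = '''
class Animal:
    def breathe(self): ...

class Frog(Animal):
    def jump(self, height: int): ...

def play(pet: Frog):
    pet.jump(3)
'''
# Symbol near the cursor: `pet` -> its type `Frog` -> parent class `Animal`
pet_type = annotated_types(src)["pet"]
print(with_ancestors([pet_type], class_parents(src)))  # → ['Frog', 'Animal']
```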
Finally, by treating all identifiers as plain strings, we can find similar pieces of code, since similar code should contain similar identifiers. Similar code can help the model generate a good answer as well.
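The post doesn't specify how similarity between identifier sets is scored. As one plausible choice, this sketch uses Jaccard similarity: the number of shared identifiers divided by the number of distinct identifiers in both pieces.

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def identifiers(code: str) -> set[str]:
    """Treat code as a bag of identifier strings (keywords and string contents included)."""
    return set(IDENT.findall(code))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(snippet: str, candidates: list[str], k: int = 1) -> list[str]:
    query = identifiers(snippet)
    return sorted(candidates, key=lambda c: jaccard(query, identifiers(c)), reverse=True)[:k]


candidates = [
    "tax = price * rate",
    "subtotal = price * quantity + shipping",
    "print('hello')",
]
print(most_similar("total = price * quantity", candidates))  # → ['subtotal = price * quantity + shipping']
```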
**
Post Processing
**
Let's say you've found 50 interesting points in the AST and VecDB indexes that might help the model do its job. Now you have additional issues to solve:
There might simply be too many results to fit into the AI model's context. There's a budget, measured in tokens, set by memory requirements, latency requirements (code completion is real-time), or the model's own limits.
The results themselves might not make much sense without at least a little of the surrounding structure. For example, for a function do_the_thing(), it's important to show that it's inside a class, and which class.
There can be overlapping or duplicate results.
Those problems can be solved with good post-processing.
This is how our post-processing works:
It loads all the files mentioned in the search results into memory and keeps track of the "usefulness" of each line.
Each result from AST or VecDB now simply increases the usefulness of the lines it refers to. For example, if "my_function" is found, all the lines that define my_function gain usefulness. The lines containing the function's signature (name, parameters, and return type) gain more than its body.
All that's left is to sort the lines by usefulness, take the most useful ones until the token budget is reached, and output the result:
As you can see, post-processing can fit into any token budget: it keeps the most useful lines and replaces the less useful ones with an ellipsis.
One interesting side effect is skeletonization of the code. As the budget decreases and fewer lines make it into the context, our post-processing prefers to keep some of the code structure (which class the function belongs to) over the body of that function.
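The post-processing steps above can be sketched as follows. Several details are hypothetical simplifications: the scoring weights, the (path, first_line, last_line) hit format, treating a hit's first line as its signature, and the whitespace token counter. A real implementation would use the model's tokenizer and the AST.

```python
def count_tokens(line: str) -> int:
    """Crude stand-in for the model's tokenizer: whitespace-separated words."""
    return len(line.split())


def postprocess(files: dict[str, list[str]], hits: list[tuple[str, int, int]], token_budget: int) -> str:
    # 1. Every hit (path, first_line, last_line; 0-based, inclusive) adds usefulness to its lines
    usefulness: dict[tuple[str, int], float] = {}
    for path, start, end in hits:
        for i in range(start, end + 1):
            bonus = 2.0 if i == start else 1.0  # hypothetical weights: the signature line counts double
            usefulness[(path, i)] = usefulness.get((path, i), 0.0) + bonus

    # 2. Take the most useful lines until the token budget is reached
    kept, used = set(), 0
    for key in sorted(usefulness, key=lambda k: (-usefulness[k], k)):
        cost = count_tokens(files[key[0]][key[1]])
        if used + cost > token_budget:
            break
        kept.add(key)
        used += cost

    # 3. Output kept lines in their original order; each run of dropped lines becomes "..."
    out = []
    for path, lines in files.items():
        if not any((path, i) in kept for i in range(len(lines))):
            continue
        out.append(f"# {path}")
        eliding = False
        for i, line in enumerate(lines):
            if (path, i) in kept:
                out.append(line)
                eliding = False
            elif not eliding:
                out.append("...")
                eliding = True
    return "\n".join(out)


files = {"frog.py": [
    "class Frog:",
    "    def jump(self, height):",
    "        self.y += height",
    "        return self.y",
    "    def croak(self):",
    "        print('ribbit')",
]}
hits = [("frog.py", 0, 5), ("frog.py", 1, 3)]  # e.g. a hit on the whole class plus a hit on jump()
print(postprocess(files, hits, token_budget=5))
# prints:
# # frog.py
# class Frog:
#     def jump(self, height):
# ...
```

With a small budget, the class line and the jump signature survive while the body collapses to an ellipsis, which is the skeletonization effect. Overlapping and duplicate hits need no special handling here: their scores simply add up on the same lines.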
**
Oh Look, It's Similar to grep-ast!
**
Yes, you're right, it is! In fact, we took inspiration from grep-ast, a small utility that uses tree-sitter to search for a string in a directory. It also keeps the structure of the code, so you can see where your results sit logically in the code.
It doesn't have a notion of a token budget, though. It's also written in Python, so it's not very fast, and it has no indexes to speed up the search.
**
RAG in Refact.ai Chat with @-commands
**
In Refact.ai, we've added RAG support for chat LLMs, too. You can use the following commands to add important context:
@workspace – Uses VecDB to search for any query. You can give it a query on the same line, like this: "@workspace Frog definition". Otherwise, it takes all the lines below it as the query, so you can search for multi-line things like code snippets.
@definition – Looks up the definition of a symbol. For example, you can ask: "@definition MyClass".
@references – Same, but it returns references. Example: "@references MyClass".
@file – Attaches a file. For large files, you can use the file_name:LINE1-LINE2 notation to be more specific, for example "@file large_file.ext:42" or "@file large_file.ext:42-56".
@symbols-at – Looks up any symbols near a given line in a file and adds the results to the chat context. It uses the same procedure as code completion. For it to work, you need to specify the file and line number: "@symbols-at some_file.ext:42".
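The file_name:LINE1-LINE2 notation is easy to parse. Here is a hypothetical sketch, not Refact.ai's actual parser, that separates an @-command from its argument and extracts an optional line range:

```python
import re
from typing import Optional, Tuple

# path, then an optional ":LINE1" and an optional "-LINE2"
FILE_ARG = re.compile(r"^(?P<path>.+?)(?::(?P<start>\d+)(?:-(?P<end>\d+))?)?$")


def parse_command(text: str) -> Tuple[str, str]:
    """Split '@file large_file.ext:42-56' into ('file', 'large_file.ext:42-56')."""
    command, _, arg = text.strip().partition(" ")
    if not command.startswith("@"):
        raise ValueError(f"not an @-command: {text!r}")
    return command[1:], arg.strip()


def parse_file_arg(arg: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Return (path, first_line, last_line); a single line N gives (path, N, N)."""
    m = FILE_ARG.match(arg)
    if m is None:
        raise ValueError(f"cannot parse file argument: {arg!r}")
    start = int(m["start"]) if m["start"] else None
    end = int(m["end"]) if m["end"] else start
    return m["path"], start, end


name, arg = parse_command("@file large_file.ext:42-56")
print(name, parse_file_arg(arg))  # → file ('large_file.ext', 42, 56)
```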
When you start a new chat, the following options are available:
"Search workspace" is equivalent to typing @workspace in the input field: it uses your question as the search query.
"Attach current_file.ext" is equivalent to the "@file current_file.ext:CURSOR_LINE" command, which attaches the file and uses the current cursor position (CURSOR_LINE) to handle potentially large files.
"Lookup symbols" extracts any symbols around the cursor and searches for them in the AST index. It's equivalent to "@symbols-at current_file.ext:CURSOR_LINE".
"Selected N lines" adds the current selection as a snippet for the model to analyze or modify. It's equivalent to writing a piece of code in backquotes, like `my code`.
**
Interesting Things You Can Try with RAG
**
Summarize a File
Take a large file, open the chat (Option+C) or the toolbox (F1), and type "summarize in 1 paragraph". The post-processing described above makes the file fit the chat context you have available. Check how the file looks in the tooltip for the attached file. The bigger the original file, the more skeletonized the version you'll see.
Summarize Interaction
Unfamiliar code is a big problem for humans: it can take hours to understand how several classes interact. Here's another way to do it: use @definition or @file to put the classes of interest into the context, and ask the chat how they interact.
Code Near Cursor with Context
You can add context to the chat using the same procedure as code completion: use "Lookup symbols near cursor" or the @symbols-at command.
**
So How Good Is It?
**
We tested code completion models with and without RAG, and here are the results.
We've made a small test that is easy to understand and interpret. It works like this: take 100 random GitHub repositories for each programming language, delete a random line in a random function, and run single-line code completion to see whether the model restores that line exactly.
It isn't perfect: sometimes it's hard to reproduce comments exactly (if there are any on the deleted line), and there are many easy cases (like a closing bracket) that won't benefit from RAG at all. Still, it's a good test because it's simple! Here is the dataset.
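For readers who want to reproduce a similar benchmark, the core of the test can be sketched like this. The complete(prefix, suffix) callable is a hypothetical stand-in for a model call, with or without RAG context. Picking the random repository and function is left out: each item in functions is simply the list of lines of one function.

```python
import random
from typing import Callable, List, Tuple


def make_task(lines: List[str], rng: random.Random) -> Tuple[str, str, str]:
    """Delete one random non-blank line; return (prefix, suffix, deleted_line)."""
    i = rng.choice([n for n, line in enumerate(lines) if line.strip()])
    prefix = "".join(line + "\n" for line in lines[:i])
    suffix = "\n".join(lines[i + 1:])
    return prefix, suffix, lines[i]


def exact_match_rate(functions: List[List[str]], complete: Callable[[str, str], str], seed: int = 0) -> float:
    """Fraction of tasks where the single-line completion restores the deleted line exactly."""
    rng = random.Random(seed)  # fixed seed, so every model sees the same deletions
    hits = 0
    for lines in functions:
        prefix, suffix, target = make_task(lines, rng)
        hits += int(complete(prefix, suffix).strip() == target.strip())
    return hits / len(functions)


functions = [
    ["def f(x):", "    y = x + 1", "    return y"],
    ["def g():", "    return 42"],
]
print(exact_match_rate(functions, lambda prefix, suffix: ""))  # → 0.0
```

Comparing with and without RAG then amounts to calling exact_match_rate twice with the same seed and two different complete functions.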
The results of the test (running StarCoder2/3b):
Takeaways:
RAG always helps!
It helps Java more than Python, because many Python projects don't use type hints, so there are fewer types for the AST lookup to follow.
Here it is! We believe we’ve developed the best in-IDE RAG for code completion and chat, at least among the open-source solutions we’ve explored.
If you have any questions, feel free to ask them in the comments. I’ll be glad to answer or discuss. You’re also welcome to join our Discord!
P.S. You can try RAG in our Pro plan: use promo code RAGROCKS for a 2-month free trial. Have fun!