Components of a RAG Application

Exploring the Core Elements


RAG (Retrieval-Augmented Generation) includes three main components:

  1. Embedding Model: This model takes text (queries, documents, etc.) and transforms it into numerical representations called "embeddings": vectors that capture the semantic meaning of the text in a high-dimensional space. Imagine them as addresses in a vast, multi-dimensional library, where pieces of information with similar meaning sit close together.

    • Why Embeddings?

      • Efficiency: Comparing a query against the raw text of every document can be computationally expensive. Comparing vectors is cheap, so embeddings allow for efficient similarity searches in the vector database.

      • Semantic Understanding: Embeddings go beyond simple keyword matching. They capture the underlying meaning of the text, enabling RAG to identify relevant documents even if they don't use the exact same words as the query.

  2. Vector Database: The vector database stores the embeddings generated by the embedding model, typically alongside the original text or a reference to it. It acts like the actual library where all the information (documents, articles, etc.) resides.

    • Function of the Vector Database:

      • Retrieval: When a user enters a query, the same embedding model converts it into an embedding. The vector database then searches for the documents whose embeddings are most similar to the query embedding, commonly measured with cosine similarity or a distance metric, and returns the top matches. This effectively retrieves the most relevant information based on semantic meaning.

  3. Large Language Model (LLM): This model receives the user query together with the retrieved information and generates the response.

    • How Does the LLM Use the Retrieved Information?

      • The LLM receives the user query along with the retrieved documents (identified by the vector database), typically combined into a single prompt. This gives the LLM the context it needs to answer the query.

      • With the query and relevant information, the LLM can generate a more comprehensive and informative response. It can leverage the retrieved information to provide factual grounding, answer complex questions, or complete specific tasks.
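
To make the first two components concrete, here is a minimal sketch of embedding and retrieval in plain Python. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in (a real embedding model is a trained neural network that returns dense vectors), and `InMemoryVectorStore` is a hypothetical class, not the API of any real vector database. Only the overall flow mirrors the description above: embed the documents, embed the query the same way, rank by cosine similarity.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model (assumption, not a real model).

    Returns a sparse word-count vector. It only captures word overlap,
    whereas a real embedding model captures meaning.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical miniature 'vector database': a list scanned linearly."""

    def __init__(self):
        self._items = []  # (embedding, original text) pairs

    def add(self, text):
        # Store the embedding together with the text it came from.
        self._items.append((embed(text), text))

    def search(self, query, k=2):
        # Embed the query with the same function used for the documents,
        # then return the k most similar documents as (score, text) pairs.
        query_embedding = embed(query)
        scored = [
            (cosine_similarity(query_embedding, emb), text)
            for emb, text in self._items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add("The vector database stores embeddings for every document.")
store.add("An LLM generates the final response from the query and context.")
store.add("Bananas are rich in potassium.")

results = store.search("Where are document embeddings stored?", k=2)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

The sentence about the vector database ranks first because it shares the words "document" and "embeddings" with the query. Note that the toy `embed` does not connect "stored" with "stores"; closing that kind of gap is exactly what a real embedding model is for. Production vector databases also avoid the linear scan used here by building an index for approximate nearest-neighbor search.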


Here's an analogy:

Imagine a librarian (vector database) with a vast library organized using vector embeddings. When you ask a question (user query), the librarian quickly retrieves the most relevant books (retrieved documents) based on their content (embeddings). With these books in hand (retrieved information), a researcher (LLM) can then analyze the information and provide you with a well-supported and informative answer.
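
In code, the hand-off from librarian to researcher is mostly prompt assembly. The sketch below shows one possible way to combine the query and the retrieved documents before calling the LLM. The prompt wording, the function names, and the `fake_retrieve` and `fake_generate` stand-ins are all assumptions for illustration; in a real application they would be replaced by a vector database lookup and a call to an actual model.

```python
def build_prompt(query, retrieved_docs):
    """Combine the user query and retrieved documents into a single prompt."""
    context = "\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(retrieved_docs, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


def answer(query, retrieve, generate, k=2):
    """Run the RAG flow: retrieve documents, build the prompt, call the LLM."""
    docs = retrieve(query, k)
    return generate(build_prompt(query, docs))


def fake_retrieve(query, k):
    # Stand-in for the vector database search (assumption): ignores the
    # query and returns the first k entries of a fixed list.
    corpus = [
        "The vector database stores embeddings for every document.",
        "The embedding model turns text into vectors.",
        "The LLM writes the final response.",
    ]
    return corpus[:k]


def fake_generate(prompt):
    # Stand-in for the LLM call (assumption): no real model is invoked.
    return f"stub answer based on {len(prompt)} prompt characters"


question = "What does the vector database store?"
prompt = build_prompt(question, fake_retrieve(question, 2))
print(prompt)
print(answer(question, fake_retrieve, fake_generate))
```

Passing `retrieve` and `generate` in as functions keeps the flow independent of any particular vector database or model provider, which matches the component split described in this post.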