Large language models (LLMs) are powerful, but because they operate as “black boxes”, their outputs are difficult to explain and they can hallucinate, producing convincing yet inaccurate answers that erode trust. Without explanations, aligning them toward accuracy is hard. Retrieval-augmented generation (RAG) offers an answer: it blends traditional or vector search with generative AI, delivering more accurate responses while making the source of truth clear.
RAG is valuable for businesses, with projections suggesting that 25% of large enterprises will adopt it by 2030. However, RAG has limitations, which has led to growing interest in agentic RAG. Agentic RAG systems are more capable than plain RAG because they treat retrieval as one tool among several and reason about when other tools are required.
In this article, we’ll explore the challenges of vanilla RAG, how agentic RAG can address them, and provide a practical implementation using ApertureDB and SmolAgents.
Overview of Agentic RAG
Agentic RAG is an enhanced version of the traditional RAG pipeline, where AI agents are integrated to improve the retrieval and generation process. While vanilla RAG typically involves retrieval with optional reranking and diversification before generation, agentic RAG builds on this by introducing agents that can:
- Reformulate Queries: Agents can reframe and refine the query based on the context, ensuring more focused retrieval.
- Assess Retrieval Results: Once results are fetched, agents can evaluate their relevance and quality. If needed, they can re-query or adjust the search to improve outcomes.
- Perform Iterative Retrieval: Agents can perform multiple rounds of retrieval, continuously refining the context for more accurate responses. They might rephrase queries, change tools, or use additional sources, ensuring the most relevant information is provided.
- Decision-Making: Agents decide which tools to use (e.g., vector search, web search, APIs) and when to re-query, enhancing the overall effectiveness of the RAG process.
There are many frameworks for creating workflows and working with agents. We will look at SmolAgents, Hugging Face’s lightweight and flexible framework designed for simplicity and first-class support for Code Agents.
Code Agents are agents that act by writing and executing code directly, rather than returning structured outputs such as JSON or text. They work with tools defined as Python functions (annotated with type hints and descriptions) and can be paired with any language model, including those from Hugging Face, OpenAI, and Anthropic. SmolAgents also integrates with the Hugging Face Hub, enabling users to share and load tools easily.
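As a quick illustration of the Code Agent idea, here is a minimal, hedged sketch: the word_count tool and the model choice are illustrative and not part of the tutorial's pipeline.

from smolagents import CodeAgent, LiteLLMModel, tool

@tool
def word_count(text: str) -> int:
    """Counts the number of words in a piece of text.

    Args:
        text: The text whose words should be counted.
    """
    return len(text.split())

# The agent writes and runs Python code that calls the tool, rather than emitting JSON.
model = LiteLLMModel(model_id="gpt-3.5-turbo")
agent = CodeAgent(tools=[word_count], model=model)
print(agent.run("How many words are in the sentence 'Agentic RAG blends retrieval with reasoning'?"))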
Why Agentic RAG is Superior
While effective in many scenarios, vanilla RAG has significant limitations that can undermine its reliability. Here are some of the limitations that can be fixed with agentic RAG:
Limitations of Vanilla RAG
- Vanilla RAG relies on a single retrieval step to gather documents. If the retrieved documents are irrelevant or incomplete, the generation step produces flawed responses, as there’s no mechanism to critique or refine the results.
- Semantic similarity in vanilla RAG uses the user query as the sole reference point. This can lead to misaligned results, especially if the query phrasing doesn’t match the structure of the target documents. For instance, questions like "What are the benefits of exercise?" might fail to retrieve relevant documents phrased in statements, downgrading their similarity scores.
- If no relevant documents are retrieved, vanilla RAG might generate highly hallucinated responses instead of exploring alternate approaches like searching the web.
Here’s how agentic RAG addresses these issues:
- Agentic RAG equips an agent to adjust and refine queries dynamically. If the first retrieval attempt fails, the agent critiques the results and formulates a new query to re-retrieve documents. What counts as failure depends on the use case, for example too few documents retrieved or similarity scores below a threshold. Once failure is detected, the agent can ask the LLM to reformulate the query or ask the user for more context. This iterative process gathers more accurate and complete information.
- By generating reference sentences closer to the language of the documents (e.g., declarative statements instead of interrogative ones), agentic RAG improves semantic matching. Techniques like HyDE let the agent create a hypothetical answer to guide retrieval (see the sketch after this list), while re-retrieval capabilities enable better handling of complex or ambiguous queries.
- When no relevant documents are initially found, Agentic RAG prevents hallucinations by exploring other approaches defined in the tools. For example, it can search the web for relevant information or talk to another database. The possibilities are endless.
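To make the HyDE idea concrete, here is a minimal sketch. It assumes a LangChain-style chat model (llm) and a vector store (vector_db); both names are placeholders rather than objects from this tutorial.

def hyde_retrieve(question: str, llm, vector_db, k: int = 5):
    """HyDE: retrieve with a hypothetical answer instead of the raw question."""
    # Ask the LLM to draft a short passage that could plausibly answer the question.
    hypothetical_answer = llm.invoke(
        f"Write a short, factual paragraph that answers the question: {question}"
    ).content
    # Declarative text tends to match document phrasing better than the question itself.
    return vector_db.similarity_search(hypothetical_answer, k=k)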
Let’s look at a use case for agentic RAG. A customer asks a chatbot, "How do I reset my device?" If the initial search misses the relevant instructions, vanilla RAG might retrieve irrelevant documents or generate an inaccurate response.
In Agentic RAG:
- The agent refines the query to "Steps to reset a device" and extracts the device number.
- It selects the appropriate tool from its list to connect to a database of manuals and retrieve the manual for that device number.
- The system fetches the necessary data from the database.
- If the initial results are insufficient, the agent reassesses the query and re-queries using alternative phrasing or newly generated reference snippets to improve accuracy (a minimal sketch of this loop follows).
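Below is a minimal sketch of such a retrieve-assess-retry loop. The retrieve callable (assumed to return documents with similarity scores), the llm chat model, and the thresholds are all placeholders chosen for illustration.

def agentic_retrieve(question, retrieve, llm, min_docs=3, min_score=0.7, max_rounds=3):
    """Retry retrieval with reformulated queries until the results look good enough."""
    query = question
    for _ in range(max_rounds):
        scored_docs = retrieve(query)  # assumed to return (document, similarity_score) pairs
        good = [doc for doc, score in scored_docs if score >= min_score]
        if len(good) >= min_docs:
            return good  # retrieval judged successful
        # Failure: too few sufficiently similar documents, so reformulate and try again.
        query = llm.invoke(
            f"Rewrite this search query so it better matches product documentation: {query}"
        ).content
    return []  # after all rounds fail, the agent can fall back to another tool (e.g., web search)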
Agentic RAG for Research Paper Search with ApertureDB & SmolAgents
Researchers and professionals often feel overwhelmed by the sheer volume of academic papers published every day. Traditional keyword-based search engines often return irrelevant results or overlook key studies because they rely on exact matches. The main challenge: the vast volume of literature leads to inefficient reviews and missed opportunities for discovery.
Solution
This guide walks you through building a solution that uses Retrieval-Augmented Generation (RAG) with ApertureDB as the vector store and Hugging Face SmolAgents for query refinement and retrieval of research papers. Since research papers are typically in PDF format, we will use the Unstructured library to extract their content; it also supports many other unstructured data formats and offers a range of useful processing features.
Imports and Setup
Before diving into the implementation, let’s prepare the environment.
Setup
Run the following commands to install the necessary dependencies. These include libraries for handling datasets, processing PDFs, generating embeddings, and managing the vector database.
!pip install opendatasets
!pip install 'smolagents[litellm]'
!pip install openai
!pip install smolagents
!pip install unstructured[pdf]
!pip install gradio
!pip install langchain-openai
!pip install --quiet --upgrade aperturedb
!pip install pandas
!pip install langchain-community
!pip install arxiv
We also need a few system packages: update the package index with sudo apt-get update, install poppler-utils for PDF manipulation, add the Tesseract bindings via pip, and finally install the Tesseract OCR engine itself (apt install tesseract-ocr) to enable text extraction from images and PDFs.
sudo apt-get update
apt-get install poppler-utils
pip install tesseract
apt install tesseract-ocr
Imports
We’ll use various libraries to handle data fetching, embedding generation, and interaction with ApertureDB. Import these modules in your script:
import os
import json
import arxiv
import requests
import pandas as pd
import opendatasets as od
from langchain_core.documents import Document
from unstructured.partition.auto import partition
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from openai import OpenAI  # client passed to the retriever tool later on
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain.chains import (
    StuffDocumentsChain, LLMChain
)
from langchain.schema import HumanMessage, AIMessage
from langchain.prompts import PromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate, ChatPromptTemplate, MessagesPlaceholder
from langchain.callbacks.manager import trace_as_chain_group
import gradio as gr
from smolagents import Tool
from langchain_community.vectorstores import ApertureDB
from smolagents import ToolCallingAgent, LiteLLMModel
Setting up the .env File (Optional)
Create a .env file in your project directory to securely store sensitive credentials. This file will store your OpenAI API key (and any other secrets, such as a Kaggle token, that you prefer not to hard-code).
OPENAI_API_KEY=your_openai_api_key
In your code, load these environment variables with python-dotenv and read them via the os module:
from dotenv import load_dotenv
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
KAGGLE_TOKEN = os.getenv("KAGGLE_TOKEN")
Preparing the Data
We’ll use the ArXiv Papers dataset to simulate a research environment. The dataset contains metadata for millions of research papers; we will fetch the full text of individual papers from arXiv as needed.
Download the Dataset
To fetch the dataset from Kaggle, run the following:
dataset = 'https://www.kaggle.com/datasets/Cornell-University/arxiv'
od.download(dataset)
You’ll be prompted for your Kaggle username and API key, which you can generate from your Kaggle account settings by creating a new API token. Once you provide them, the dataset will download.
Extracting and Chunking Data
We’ll process the dataset by extracting paper details from PDFs and splitting the text into smaller chunks.
def fetch_paper_details(arxiv_id):
    """Download and parse a paper from ArXiv."""
    paper = next(arxiv.Client().results(arxiv.Search(id_list=[arxiv_id])))
    paper.download_pdf(filename=f"{arxiv_id}.pdf")
    return partition(f"{arxiv_id}.pdf")
The fetch_paper_details function downloads the PDF and utilizes the partition feature of the unstructured library to identify the file type and extract structured content from the unstructured document.
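As a quick check, the elements returned by partition expose their text through a .text attribute; the arXiv id below is purely illustrative.

elements = fetch_paper_details("0704.0001")  # illustrative arXiv id
full_text = "\n".join(el.text for el in elements if el.text)
print(full_text[:500])  # preview the first 500 characters of the extracted paper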
papers = []
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=5000,  # Maximum size of each chunk
    chunk_overlap=200,  # Overlap between chunks
    length_function=len,
    is_separator_regex=False,
)
sample = 100 # Limit the number of papers for this demonstration
# Process a subset of the dataset
with open("arxiv/arxiv-metadata-oai-snapshot.json", "r") as file:
for _ in range(sample):
line = file.readline()
data = json.loads(line)
arxiv_id = data.get("id", "")
paper_details = "".join(
text if isinstance((text := element.text), str)
else "".join(str(part) for part in text) if isinstance(text, (list, tuple))
else str(text)
for element in fetch_paper_details(arxiv_id)
)
chunks = text_splitter.create_documents([paper_details])
for idx, chunk in enumerate(chunks):
document_id = f"{arxiv_id}_{idx + 1}"
document = Document(
page_content=chunk.page_content,
id=document_id,
metadata={
'title': data.get("title", ""),
'authors': data.get("authors", ""),
'submitter': data.get("submitter", ""),
'abstract': data.get("abstract", ""),
'paper_content': chunk.page_content
}
)
papers.append(document)
This processes a subset of the arXiv metadata JSON dataset: for each entry it calls fetch_paper_details, splits the extracted text into overlapping chunks with the RecursiveCharacterTextSplitter, and wraps each chunk in a Document object carrying the paper's metadata.
Setting Up ApertureDB
ApertureDB is a multimodal database that stores and manages diverse data types, including images, videos, documents, feature vectors (embeddings), and their associated metadata, such as annotations. To get started, sign up here and create an instance.
After creating an instance, wait until its status shows Running, then click Connect as shown in the image:

Since we are working in Colab, copy the corresponding connection string.
Configuring ApertureDB
We can configure ApertureDB as follows by providing the connection string:
! adb config create --active --from-json
Inserting Embeddings
Once your data is ready, generate embeddings and store them in ApertureDB:
embeddings = OpenAIEmbeddings(api_key=OPENAI_API_KEY)
vector_db = ApertureDB.from_documents(papers, embeddings)
It's straightforward: first, load the desired embedding model, and then use LangChain to store the vectors and documents in the ApertureDB vector store.
Building the Agentic Workflow
The workflow involves creating an intelligent agent capable of retrieving semantically relevant documents and dynamically refining queries.
Creating the Retriever Tool
A tool is a class in SmolAgents designed to help the LLM understand its purpose, inputs, and outputs. This makes it accessible within the agent system. Here is what is required to create a tool in SmolAgents:
- Key Attributes:
- name: A short identifier for the tool describing what it does (e.g., "document_retriever").
- description: A clear explanation of the tool’s purpose, helping the LLM decide when to use it.
- inputs: A dictionary defining what the tool accepts. For example, a "query" input might have a type (string) and a description explaining its role.
- output_type: The kind of data the tool returns (e.g., "string"), which tells the LLM what to expect.
- Core Logic (forward): The tool's primary function is implemented in the forward method. This is where the operation happens, such as retrieving documents based on a query.
- Helper Methods (Optional): Supporting methods can be added for reusable functionality, keeping the code modular and clear.
- Tool Initialization: When creating the tool instance, dependencies (like an OpenAI client) are passed into the constructor to ensure it works as intended.
The DocumentRetrieverTool class interacts with ApertureDB to retrieve relevant documents based on semantic similarity. Here is the class, with its name, description, inputs, and retrieval logic:
class DocumentRetrieverTool(Tool):
    name = "document_retriever"
    description = "Performs semantic search to fetch documents relevant to a given query."
    inputs = {
        "query": {
            "type": "string",
            "description": "The input query, which should be semantically aligned with the content of the desired documents.",
        }
    }
    output_type = "string"

    def __init__(self, openai_client, **kwargs):
        super().__init__(**kwargs)
        self.embedder = openai_client

    def retriever(self, query: str, n=5):
        retriever = vector_db.as_retriever(search_type="mmr", search_kwargs={"k": n})
        results = retriever.invoke(query)
        return "\nRetrieved documents:\n" + "".join(
            [
                f"\n\n===== Document {str(i)} =====\n" + doc.page_content
                for i, doc in enumerate(results)
            ]
        )

    def forward(self, query: str) -> str:
        docs = self.retriever(query)
        return docs
document_retriever_tool = DocumentRetrieverTool(openai_client=OpenAI(api_key=OPENAI_API_KEY))
The tool above retrieves documents from ApertureDB using the (possibly refined) query the agent passes to it.
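Before wiring the tool into an agent, you can exercise it directly to confirm retrieval works; the query below is illustrative.

print(document_retriever_tool.forward("methods for calculating Higgs boson decay widths"))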
Integrating the Tool with an Agent
Finally, the retriever tool is handed to an orchestrating agent, which can hold multiple tools:
model = LiteLLMModel(model_id="gpt-3.5-turbo")
agent = ToolCallingAgent(tools=[document_retriever_tool], model=model)
question = "Why is calculating Higgs Boson decay important?"
agent_output = agent.run(question)
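Because ToolCallingAgent accepts a list of tools, you could also give the agent a web-search fallback for questions the vector store cannot answer. The sketch below uses SmolAgents' built-in DuckDuckGoSearchTool and assumes the duckduckgo-search package is installed.

from smolagents import DuckDuckGoSearchTool

web_search_tool = DuckDuckGoSearchTool()
agent = ToolCallingAgent(tools=[document_retriever_tool, web_search_tool], model=model)
print(agent.run("Summarize recent approaches to estimating Higgs boson decay widths."))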
By following this guide, you’ve built an agentic RAG system that can search and answer questions about academic papers effectively.
Why the Vector Store Is an Important Piece of the RAG Puzzle: Features of ApertureDB
The quality of generation hinges on the retriever’s ability to pull the most relevant insights from the vector database. That’s why vector stores play an important role in performant RAG applications, and why choosing the right vector database matters. Here are the key considerations:
- Scalability: How much domain-specific data will your database need to manage? As your data grows, ensure the database can scale efficiently, supporting potentially billions of embeddings.
- Dimensionality of Embeddings: Different models produce embeddings with varying dimensions. Higher dimensions can capture more context but may lead to slower queries and diminishing returns. Choose a database that can handle the range of dimensionalities your models produce.
- Search Accuracy and Indexing: Approximate nearest neighbor (ANN) indexing improves speed and memory usage but trades off some accuracy, which may or may not suit your application. Your vector database must support the indexing options and accuracy trade-offs you require.
ApertureDB provides the following in addition to the above:
- Multimodal Database: Supports various data types, including images, videos, and text, enabling seamless integration of multiple data modalities in RAG systems. In the second part of this series, we will extend the current agent to analyze the images within arXiv papers to gather additional context.
- Graph Database for Metadata: Facilitates the creation of knowledge graphs by linking multimodal metadata, enriching the context for data retrieval and relationships in RAG systems.
- On-the-Fly Image and Video Modification: Allows dynamic adjustments to images and videos, minimizing data duplication and optimizing resource use in visual RAG workflows.
- Integrations: ApertureDB integrates with popular AI/ML frameworks such as Hugging Face and LangChain, making development easier and more modular.
Recap
While traditional RAG bridges the gap by grounding LLMs in private knowledge through vector databases, the rising demand for faster, more adaptive AI calls for next-generation solutions like agentic RAG, which enables multi-step retrieval by refining and adjusting the query based on intermediate results. In this article, we built an agentic RAG application to search and ask questions about research papers using ApertureDB and SmolAgents.
Feel free to explore and adapt the code to create your version of this use case. You can start experimenting with ApertureDB today, sign up for a free instance, and see the power of agentic RAG firsthand.