Use Realm and ApertureDB To Build Secure RAG Applications

Introduction

Retrieval-augmented generation (RAG) is currently the standard architecture for building AI chatbots. It offers fundamental advantages over fine-tuning-based approaches that make it particularly suitable for enterprise applications, such as:

Easy updates to content without requiring fine-tuning
Grounding and citations which help reduce hallucinations
Personalization

But RAG applications have one limitation that can lead to disastrous consequences in the enterprise: the inability to provide role-based access control and information security.

Enterprise data is extremely sensitive and must not fall into the wrong hands, as it may contain personal information, company financials, or intellectual property. Leakage of this information creates business and regulatory risk, including fines.

To build a knowledge base for a chatbot, the application typically scrapes all the data, including sensitive files, from the underlying source and embeds it into a vectorDB. When the user makes a query, its embedding is compared to the existing embedding corpus to find the most relevant data, which is passed as context to the LLM.

To make sure that sensitive or restricted information is not accidentally retrieved, it is very important to keep information out of a query’s context based on the user’s permissions and the sensitivity of the information. But doing so is not easy.

Challenges In Building A Secure Chatbot

There are three main challenges in building a secure and permissioned RAG:

1: Crawling and syncing permissions: To get a complete view of user access, one needs to build infrastructure to scrape file metadata such as access control lists (ACLs), as well as user metadata from sources like Active Directory. Additionally, permissions often change faster than the content itself and require frequent synchronization.

2: Efficient indexing: In modern authorization models, access is rarely granted to a user directly. Users are grouped by role and department, and file- or folder-level access is given to these groups. If a user’s membership in a group changes, their access to certain documents changes too. Storing these relationships so that a user’s access to a chunk can be determined efficiently is not an easy task.

3: Permissions don’t tell the full story: In most large organizations, permissions are often granted too broadly. One example is document sharing via links, where anyone with the link can access the document (Fig 2). These links are meant to be secret and shared only on a need-to-know basis. However, which users actually hold the link cannot be determined by looking at the file. As a result, chatbots may serve this content to anyone, which is a major cause of data leakage in chatbots.

Fig 2: Shareable link setting in Google Drive

Building the infrastructure to connect with various data sources, scrape ACLs and group memberships, and keep them up to date through regular syncing is a significant effort for chatbot developers, and any mistake can be costly.

Realm Labs has solved this problem for chatbot developers by building these enterprise connectors for permissions and data. Realm connects to a number of data sources, such as SharePoint, Google Drive, Slack, and GitHub, and indexes data and permissions into any vectorDB of choice. Chatbot developers can leverage Realm to bring enterprise-grade security to their chatbots and get to their customers faster.

Why Choosing The Right Vector Database Matters

Permissions to documents are rarely granted to the user directly but rather by grouping users into roles.

Permissions can be granted at the file, folder, or drive level, and can be assigned to a user, a group, or a domain. A user might gain access to a file through membership in a group that is itself part of a larger group, and so on.

Fig 3: Complex permission graph in enterprise

In such cases, a user’s permission to a file can change when any part of the access path is modified. In the example above, if the user’s membership in group1 is removed, their access to the file is lost, even though the file’s ACL, which is defined at the group level, is not modified.

These examples highlight that the best way to represent access paths is a graph. Each node of the graph can represent a user, a group, a domain, a folder, or a file. An edge can exist between two nodes based on their relationship: membership, access, or containment (e.g., Fig 3).

Once all the access paths are represented in this structure, we can know if a user has access to a file by simply finding a path from the user to the file. Any modifications to group membership or file ACLs can be easily incorporated by updating the corresponding edge in the graph without having to re-calculate all user-file accesses.
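The path-finding idea above can be sketched in a few lines of plain Python. This is a minimal in-memory illustration, not how ApertureDB stores or traverses the graph; the `AccessGraph` class and the node names (`user:alice`, `group:group1`, and so on) are hypothetical.

```python
from collections import deque


class AccessGraph:
    """Directed graph of access paths.

    An edge src -> dst means "src is a member of dst", "src can access dst",
    or "src contains dst", depending on the node types involved.
    """

    def __init__(self):
        self.edges = {}  # node -> set of directly reachable nodes

    def add_edge(self, src, dst):
        self.edges.setdefault(src, set()).add(dst)

    def remove_edge(self, src, dst):
        self.edges.get(src, set()).discard(dst)

    def has_access(self, user, resource):
        """Breadth-first search for any path from user to resource."""
        seen = {user}
        queue = deque([user])
        while queue:
            node = queue.popleft()
            if node == resource:
                return True
            for neighbor in self.edges.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return False


def build_example_graph():
    graph = AccessGraph()
    graph.add_edge("user:alice", "group:group1")        # membership
    graph.add_edge("group:group1", "folder:finance")    # access granted to the group
    graph.add_edge("folder:finance", "file:q3-report")  # containment
    return graph


graph = build_example_graph()
print(graph.has_access("user:alice", "file:q3-report"))  # a path exists
graph.remove_edge("user:alice", "group:group1")          # only one edge changes
print(graph.has_access("user:alice", "file:q3-report"))  # the path is gone
```

Removing the single membership edge revokes access without touching the folder's or the file's own entries, which is the property the graph representation is meant to provide.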

What we need now is a vectorDB that can store this graph information alongside the embeddings, so that we can quickly check whether a selected chunk can be accessed by the user.

How ApertureDB’s Graph-Vector Combination Simplifies Secure RAG Pipelines

ApertureDB is a database purpose-built for multimodal AI. It is not just a graph database, a vector database, or raw data storage; it blends these technologies, along with some data preparation support, into a data layer that lets data teams centralize, unify, search, and access different modalities of data through a unified interface. All data types, such as documents, images, files, folders, users, and groups, are represented in ApertureDB as Objects, and the relationships among them are represented by Connections. The ApertureDB query language allows users to Add, Find, Update, and Delete objects and connections.

ApertureDB represents certain data types, such as images, videos, blobs, and bounding boxes, as first-class objects and provides a general Entity type so that applications can define whatever else they need. A “File”, for example, is an entity in the ApertureDB metadata graph. Similarly, descriptors (embeddings) can be attached to any of these objects so that users can run a vector search. The unique aspect of this setup is that every time you add information, such as documents and who has access to them, you are essentially building your organization’s document knowledge graph. A RAG query then runs a vector search and navigates this graph to filter on permissions, returning only the documents that match both the vector search and the querying user’s permissions.

Working example

When handling millions of enterprise users and resources, managing access control efficiently is critical. Traditional RBAC systems struggle with scale and flexibility, but combining Realm’s enterprise connectors with ApertureDB’s graph-vector storage enables a dynamic, query-efficient ACL model that enforces access without performance bottlenecks.

Bootstrapping the Enterprise ACL Graph

At the core of access control is the ACL graph—a structured representation of who has access to what. This graph is built by crawling enterprise systems using Realm's connectors and persisting the relationships in ApertureDB.

Crawling Enterprise Systems with Realm

Enterprise data resides in SharePoint, OneDrive, AWS S3, and internal databases. Realm’s connectors crawl these sources, discovering:

Users (employees, contractors, service accounts)
Groups (departments, project teams, security groups)
Resources (files, datasets, internal tools)
Access Policies (permissions, ACLs, roles)

Example: Crawling Microsoft SharePoint & OneDrive
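Realm's connector API is not shown in this post, so the sketch below only illustrates what happens after a crawl: turning crawled records into graph edges. The record shape (`resource`, `parent`, `grants`, `principal`) and the `to_edges` helper are hypothetical, and setting link-shared resources aside for review is just one possible policy.

```python
def to_edges(records, memberships):
    """Convert crawled records into (source, relation, target) edges.

    records:     list of dicts describing a resource and who was granted access
    memberships: dict mapping a user to the groups they belong to
    Returns the sorted edge list plus the resources shared by link, which
    cannot be attributed to any specific user.
    """
    edges = set()
    link_shared = []
    for user, groups in memberships.items():
        for group in groups:
            edges.add((user, "member_of", group))
    for record in records:
        resource = record["resource"]
        parent = record.get("parent")
        if parent:
            edges.add((parent, "contains", resource))
        for grant in record.get("grants", []):
            if grant["principal"] == "anyone_with_link":
                link_shared.append(resource)  # no identifiable principal
            else:
                edges.add((grant["principal"], "can_access", resource))
    return sorted(edges), link_shared


# Hypothetical crawl output for a small SharePoint-like site.
memberships = {"user:alice": ["group:hr"]}
records = [
    {"resource": "folder:hr-docs",
     "grants": [{"principal": "group:hr", "role": "read"}]},
    {"resource": "file:handbook", "parent": "folder:hr-docs",
     "grants": [{"principal": "anyone_with_link", "role": "read"}]},
]

edges, link_shared = to_edges(records, memberships)
for edge in edges:
    print(edge)
print("shared by link, needs review:", link_shared)
```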

Why ApertureDB? Unlike traditional databases, ApertureDB is optimized for graph queries—making access control enforcement faster, scalable, and integrated directly with the vectorDB.

Storing the ACL Graph in ApertureDB

Once the ACL graph is constructed, we persist it in ApertureDB as a structured entity-relationship model:

Nodes = Users, Groups, Resources (sites, pages, documents, and folders in the case of SharePoint)
Edges = Access relationships

Why Store as a Graph? Access control is inherently a graph problem—we need to traverse relationships dynamically. ApertureDB handles complex ACL queries in O(1) or O(log N) time, while relational databases struggle with recursive joins.

Ingesting Users & Files into ApertureDB

With the ACL structure in place, we now normalize user and resource metadata to enforce access control efficiently.

Loading Users into ApertureDB

Users are the entry point for access control. We ensure:

Users exist before enforcing access.
Duplicate entries don’t clutter the database.
Updates reflect permission changes.

🔹 Why Check Before Insert? This avoids duplicate user records, ensuring cleaner queries and better performance.
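A minimal sketch of the check-before-insert step, written as a function that builds an ApertureDB JSON query. The `User` class and the `email`, `display_name`, and `department` properties are assumptions made for illustration, and the command parameters (`if_not_found`, `with_class`, `constraints`) reflect my reading of the ApertureDB query language, so verify them against the current documentation before relying on them.

```python
import json


def upsert_user_query(email, display_name, department):
    """Build a two-command transaction: create the user if missing, then update it."""
    same_user = {"email": ["==", email]}
    return [
        {
            # Adds the entity only when no User with this email exists yet.
            "AddEntity": {
                "class": "User",
                "properties": {"email": email},
                "if_not_found": same_user,
            }
        },
        {
            # Refreshes mutable attributes whether the user is new or existing.
            "UpdateEntity": {
                "with_class": "User",
                "constraints": same_user,
                "properties": {
                    "display_name": display_name,
                    "department": department,
                },
            }
        },
    ]


query = upsert_user_query("alice@example.com", "Alice", "Finance")
print(json.dumps(query, indent=2))

# With a connected ApertureDB Python client, the query would be sent roughly as:
#   response, _ = client.query(query)
```

Keying the check on a stable identifier such as the email keeps repeated crawls idempotent.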

Loading Files & Establishing Access Relationships

Files must be linked to users based on access permissions.

🔹 Why Use Connections? ApertureDB efficiently traverses these connections, allowing instant access verification per user.
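Linking a file to a user might look like the sketch below, again as a query builder instead of a live call. The `File` class, the `can_access` connection class, and the property names are hypothetical; the `_ref`, `src`, and `dst` parameters follow the ApertureDB query language as I understand it and should be checked against the documentation.

```python
def grant_file_access_query(user_email, file_id, file_name, role="read"):
    """Build a transaction that links an existing user to a file."""
    return [
        {
            # Look up the user and keep a reference for later commands.
            "FindEntity": {
                "with_class": "User",
                "_ref": 1,
                "constraints": {"email": ["==", user_email]},
            }
        },
        {
            # Create the file entity unless it is already present.
            "AddEntity": {
                "class": "File",
                "_ref": 2,
                "properties": {"file_id": file_id, "name": file_name},
                "if_not_found": {"file_id": ["==", file_id]},
            }
        },
        {
            # The edge user -> file carries the granted role.
            "AddConnection": {
                "class": "can_access",
                "src": 1,
                "dst": 2,
                "properties": {"role": role},
            }
        },
    ]


query = grant_file_access_query("alice@example.com", "f-001", "q3-report.pdf")
for command in query:
    print(list(command.keys())[0])
```

Running the same grant twice would add a second connection unless the edge is also guarded, so a real ingestion job needs its own deduplication or a sync step that replaces stale edges.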

Enforcing Access Control During Retrieval

With all relationships in place, we now enforce real-time access control when retrieving resources.

Approach 1: Precompute ACL & Filter Queries

Instead of retrieving all data, we filter out unauthorized files up front, which reduces the vector search scope.

🔹 Why Precompute? This reduces vector search costs by filtering out unauthorized resources before retrieval.
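One way to express the pre-filter is a single transaction that starts at the user, follows `can_access` edges to files, and then searches only the descriptors connected to those files. The descriptor set name `doc_chunks`, the `chunk_of` connection class, and the returned properties are hypothetical. The sketch covers direct user-to-file grants only; group-mediated access would need additional `FindEntity` hops. How the engine combines the traversal with the nearest-neighbor search is something to confirm in the ApertureDB documentation.

```python
import struct


def prefiltered_search(user_email, query_embedding, k=5):
    """Build a query and blob list for a permission-aware vector search."""
    query = [
        {
            "FindEntity": {
                "with_class": "User",
                "_ref": 1,
                "constraints": {"email": ["==", user_email]},
            }
        },
        {
            # Files reachable from the user over a can_access edge.
            "FindEntity": {
                "with_class": "File",
                "_ref": 2,
                "is_connected_to": {"ref": 1, "connection_class": "can_access"},
            }
        },
        {
            # Nearest chunks, restricted to those attached to permitted files.
            "FindDescriptor": {
                "set": "doc_chunks",
                "is_connected_to": {"ref": 2, "connection_class": "chunk_of"},
                "k_neighbors": k,
                "distances": True,
                "results": {"list": ["text", "file_id"]},
            }
        },
    ]
    # The query vector travels as raw little-endian float32 bytes.
    blob = struct.pack(f"<{len(query_embedding)}f", *query_embedding)
    return query, [blob]


query, blobs = prefiltered_search("alice@example.com", [0.1, 0.2, 0.3, 0.4], k=3)
print(len(query), "commands,", len(blobs[0]), "bytes of embedding")
```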

Approach 2: Post-filter After Retrieval

In cases where dynamic access control rules apply, we retrieve first, then validate permissions dynamically.
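The post-filter approach can be sketched independently of any database: over-fetch candidates from the vector search, then keep the best ones that pass a permission check. The candidate format and the `can_access` callback are hypothetical; in practice the callback would run a graph lookup against the ACL data.

```python
def post_filter(candidates, user, can_access, k):
    """Return the first k candidates the user is allowed to see.

    candidates: chunks sorted best match first, each with a "file_id"
    can_access: callable (user, file_id) -> bool, evaluated at query time
    """
    permitted = []
    for chunk in candidates:
        if can_access(user, chunk["file_id"]):
            permitted.append(chunk)
            if len(permitted) == k:
                break
    return permitted


# Hypothetical ACL lookup standing in for a graph query.
acl = {"user:alice": {"file:handbook"}}


def can_access(user, file_id):
    return file_id in acl.get(user, set())


# Hypothetical vector search results, over-fetched beyond k.
candidates = [
    {"chunk_id": "c1", "file_id": "file:payroll", "distance": 0.10},
    {"chunk_id": "c2", "file_id": "file:handbook", "distance": 0.15},
    {"chunk_id": "c3", "file_id": "file:handbook", "distance": 0.30},
]

context = post_filter(candidates, "user:alice", can_access, k=2)
print([chunk["chunk_id"] for chunk in context])
```

The trade-off is that the search must fetch more than k results, because the best matches may all be filtered out; if too few survive, the application has to fetch again or answer with less context.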

Conclusion

By integrating Realm’s secure connectors with ApertureDB’s graph engine, we deliver a scalable, real-time access control system ready for enterprise workloads. Combining the powers of Realm and ApertureDB, chatbot builders can build an effective chatbot within hours with:

1: Enterprise-grade security and privacy

2: High retrieval quality

3: Low latency and management overhead

About Realm Labs

Realm Labs is an AI Security company focused on securing the AI stack. We are a team of AI Security experts with more than 15 years of experience in data and AI Security as well as ML Ops, explainability etc. At its core, Realm is a complete suite of AI Security solutions that can allow developers to build AI applications with enterprise-grade security controls.

About ApertureData

ApertureData is focused on accelerating AI development with "all types of data". Our product, ApertureDB, is the world’s first graph-vector hybrid database, purpose-built for multimodal AI applications. ApertureDB lowers setup time by 6-9 months (fewer integrations, easy setup, etc.) and speeds up development by 10X, all while maintaining low TCO (savings upwards of $2M per team of 10) and high performance (2-35X better against contemporary databases).
