
Is Your Chatbot Secure?

February 10, 2025
Saurabh Shintre
Vishakha Gupta

Use Realm and ApertureDB To Build Secure RAG Applications

Introduction

Retrieval-augmented generation (RAG) is currently the standard architecture for building AI chatbots. It offers fundamental advantages over fine-tuning-based approaches that make it particularly suitable for enterprise applications, such as:

  • Easy updates to content without requiring fine-tuning
  • Grounding and citations which help reduce hallucinations
  • Personalization

But RAG applications have one limitation that can lead to potentially disastrous consequences in the enterprise: out of the box, they provide no role-based access control or information security.

Enterprise data is extremely sensitive and cannot fall into the wrong hands as it may contain personal information, company financials or intellectual property. Leakage of this information can attract business and regulatory risks, including fines.
‍

Fig 1: RAG Architecture
‍

To build a knowledge base for a chatbot, the application typically scrapes all the data, including sensitive files, from the underlying source and embeds it into a vectorDB. When the user makes a query, the query's embedding is compared to the existing embedding corpus to find the most relevant data, which is then passed as context to the LLM.
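To make the retrieval step concrete, here is a minimal sketch in plain Python. It assumes tiny hand-written vectors in place of real embeddings and a dictionary in place of a vectorDB; the chunk names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, corpus, k=2):
    """Return the ids of the k chunks most similar to the query."""
    ranked = sorted(corpus, key=lambda cid: cosine(query_vec, corpus[cid]), reverse=True)
    return ranked[:k]

# Hypothetical corpus: chunk id -> embedding
corpus = {
    "handbook": [0.9, 0.1, 0.0],
    "salaries": [0.8, 0.2, 0.1],
    "lunch_menu": [0.0, 0.1, 0.9],
}

context_ids = top_k([1.0, 0.0, 0.0], corpus, k=2)
print(context_ids)
```

Note that the "salaries" chunk is selected purely on similarity. Nothing in this step asks who is making the query.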

To make sure that sensitive or restricted information is not accidentally retrieved, it is very important to restrict what goes into a query’s context based on the user’s permissions and the sensitivity of the information. But doing so is not easy.

Challenges In Building A Secure Chatbot

There are three main challenges in building a secure and permissioned RAG:

1: Crawling and syncing permissions: To get a complete view of user access, one needs to build infrastructure to scrape file metadata, such as access control lists, as well as user metadata from sources like Active Directory. Additionally, permissions often change faster than the content itself and require frequent synchronization.

2: Efficient indexing: In modern authorization models, access is not granted to the user in a straightforward manner. Users are often grouped together based on their roles and departments, and file/folder level access is given to these roles. If a user’s membership in a group changes, their access to certain documents is also affected. Storing these relationships in a way that lets a user’s access to a chunk be determined efficiently is not an easy task.

3: Permissions don’t tell the full story: In most large organizations, permissions are often excessively granted. One example of this is document sharing using links, where anyone with the link can access the document (Fig 2). These links are meant to be secret and only given to users on a need-to-know basis. However, which users actually hold the link cannot be determined by looking at the file. As a result, a chatbot may serve this content to anyone, which is a major cause of data leakage in chatbots.

‍

Fig 2: Shareable link setting in Google Drive
‍

Building the infrastructure to connect with various data sources, scrape ACLs and group memberships, and keep them updated through regular syncing is a significant effort for chatbot developers, and any mistakes can be costly.

Realm Labs has solved this problem for chatbot developers by building these enterprise connectors for permissions and data. Realm connects to a number of data sources, such as SharePoint, Google Drive, Slack, and GitHub, and indexes data and permissions into any vectorDB of choice. Chatbot developers can leverage Realm to bring enterprise-grade security to their chatbots and get to their customers faster.

Why Choosing The Right Vector Database Matters
‍

Permissions to documents are rarely granted to the user directly but rather by grouping users into roles.

Permissions can be granted at a file, folder, or drive level and similarly assigned to a user, a group, or a domain. A user might get permission to access a file through their membership in a larger group, and so on.
‍

Fig 3: Complex permission graph in enterprise
‍

In such cases, a user’s permissions to a file may also change due to modifications to any part of the access path. In the example above, if the user’s membership in group1 is removed, their access to the file is lost, even though the file’s ACLs, which are defined at the group level, are not modified.

These examples highlight that the best way to represent access paths is as a graph. Each node of the graph can represent a user, a group, a domain, a folder, or a file. An edge can exist between these nodes based on their relationship: membership, access, or containment (e.g., Fig 3).

Once all the access paths are represented in this structure, we can know if a user has access to a file by simply finding a path from the user to the file. Any modifications to group membership or file ACLs can be easily incorporated by updating the corresponding edge in the graph without having to re-calculate all user-file accesses.
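The path-finding idea can be sketched with a breadth-first search over an adjacency map. This is a minimal illustration in plain Python, not how any particular database implements it; the node names are hypothetical.

```python
from collections import deque

# Directed edges: member_of, has_access, contains (all node names are hypothetical)
edges = {
    "user:alice": {"group:group1"},
    "group:group1": {"folder:finance"},
    "folder:finance": {"file:q4_report"},
    "user:bob": set(),
}

def has_access(graph, user, resource):
    """Return True if any path leads from the user node to the resource node."""
    seen, queue = {user}, deque([user])
    while queue:
        node = queue.popleft()
        if node == resource:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

before = has_access(edges, "user:alice", "file:q4_report")

# Removing one membership edge revokes access; the file's own ACL is untouched.
edges["user:alice"].discard("group:group1")
after = has_access(edges, "user:alice", "file:q4_report")
```

Only a single edge was updated between the two checks, which is the property that makes the graph representation cheap to keep in sync.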

Now what we need is a vectorDB that can store this graphical information alongside embeddings so that we can quickly check if a selected chunk can be accessed by the user or not.
‍

How ApertureDB’s Graph-Vector Combination Simplifies Secure RAG Pipelines

ApertureDB is a database purpose-built for multimodal AI. It is not just a graph database, a vector database, or raw data storage; it blends these technologies, along with some data preparation support, into a data layer that lets data teams centralize, unify, search, and access different modalities of data via a unified interface. All data types, such as documents, images, files, folders, users, and groups, are represented in ApertureDB as Objects, and the relationships among them are represented by Connections. The ApertureDB query language allows users to Add, Find, Update, and Delete various objects and connections.

ApertureDB represents certain data types, such as images, videos, blobs, and bounding boxes, as first-class objects and provides a general Entity type that lets applications define whatever else they need. So a “File” will be an entity in the ApertureDB metadata graph. Similarly, descriptors (embeddings) can be introduced for any of these objects, and users can run a vector search over them. The unique aspect of this setup is that every time you introduce information such as documents and who has access to them, you are essentially building your organization’s document knowledge graph. A RAG query can then do a vector search and navigate this graph to filter on permissions, returning only the documents that match both the vector search and the querying user’s permissions.

Working example

When handling millions of enterprise users and resources, managing access control efficiently is critical. Traditional RBAC systems struggle with scale and flexibility, but combining Realm’s enterprise connectors with ApertureDB’s graph-vector storage enables a dynamic, query-efficient ACL model that enforces access without performance bottlenecks.

Bootstrapping the Enterprise ACL Graph

At the core of access control is the ACL graph—a structured representation of who has access to what. This graph is built by crawling enterprise systems using Realm's connectors and persisting the relationships in ApertureDB.

Crawling Enterprise Systems with Realm

Enterprise data resides in SharePoint, OneDrive, AWS S3, and internal databases. Realm’s connectors crawl these sources, discovering:

  • Users (employees, contractors, service accounts)
  • Groups (departments, project teams, security groups)
  • Resources (files, datasets, internal tools)
  • Access Policies (permissions, ACLs, roles)


Example: Crawling Microsoft SharePoint & OneDrive
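A minimal sketch of what crawl output might look like once it is normalized into graph nodes and edges. Realm's connector API is not shown in this post, so the record shape, field names, and the `to_graph` helper below are hypothetical and only illustrate the idea.

```python
# Hypothetical crawler output; field names are illustrative, not Realm's schema.
crawled = [
    {"kind": "membership", "user": "alice@example.com", "group": "finance-team"},
    {"kind": "permission", "principal": "finance-team",
     "resource": "sites/finance/Q4.docx", "role": "read"},
    {"kind": "containment", "parent": "sites/finance",
     "child": "sites/finance/Q4.docx"},
]

def to_graph(records):
    """Turn crawl records into a node set and a list of (src, label, dst) edges."""
    nodes, edges = set(), []
    for rec in records:
        if rec["kind"] == "membership":
            src, dst, label = rec["user"], rec["group"], "member_of"
        elif rec["kind"] == "permission":
            src, dst, label = rec["principal"], rec["resource"], "can_" + rec["role"]
        elif rec["kind"] == "containment":
            src, dst, label = rec["parent"], rec["child"], "contains"
        else:
            continue  # record kinds this sketch does not model are skipped
        nodes.update((src, dst))
        edges.append((src, label, dst))
    return nodes, edges

nodes, edges = to_graph(crawled)
```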
‍


Why ApertureDB? Unlike traditional databases, ApertureDB is optimized for graph queries, which makes access control enforcement faster, more scalable, and integrated directly with the vectorDB.

Storing the ACL Graph in ApertureDB

Once the ACL graph is constructed, we persist it in ApertureDB as a structured entity-relationship model:

  • Nodes = Users, Groups, Resources (e.g., sites, pages, documents, and folders in the case of SharePoint)
  • Edges = Access relationships
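As a sketch of how such nodes and edges could be expressed, the helper below translates edge triples into a list of JSON-style commands. The command and parameter names (`AddEntity`, `AddConnection`, `_ref`, `class`, `src`, `dst`) follow ApertureDB's JSON query language as we understand it and should be checked against the current documentation; the class names and the `edges_to_commands` helper are hypothetical. The code only builds the payload and does not contact a database.

```python
def edges_to_commands(edges):
    """Translate ((class, name), label, (class, name)) triples into query commands."""
    commands, refs = [], {}
    for src, label, dst in edges:
        for node in (src, dst):
            if node not in refs:
                refs[node] = len(refs) + 1  # references are local to this one query
                cls, name = node
                commands.append({"AddEntity": {"_ref": refs[node], "class": cls,
                                               "properties": {"name": name}}})
        commands.append({"AddConnection": {"class": label,
                                           "src": refs[src], "dst": refs[dst]}})
    return commands

acl_edges = [
    (("User", "alice@example.com"), "member_of", ("Group", "finance-team")),
    (("Group", "finance-team"), "can_read", ("File", "Q4.docx")),
]
commands = edges_to_commands(acl_edges)
```

Because this sketch always adds entities, running it twice would create duplicates; a real loader would guard each insert.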
    ‍


Why Store as a Graph? Access control is inherently a graph problem: we need to traverse relationships dynamically. ApertureDB handles complex ACL queries in O(1) or O(log N) time, while relational databases struggle with recursive joins.

Ingesting Users & Files into ApertureDB

With the ACL structure in place, we now normalize user and resource metadata to enforce access control efficiently.

Loading Users into ApertureDB

Users are the entry point for access control. We ensure:

  • Users exist before enforcing access.
  • Duplicate entries don’t clutter the database.
  • Updates reflect permission changes.
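The three guarantees above can be sketched as a two-command payload: a conditional insert followed by an update. The `if_not_found` clause on `AddEntity` and the `UpdateEntity` command are written here as we understand ApertureDB's JSON query language and should be verified against its documentation; the `User` class, its properties, and the helper name are hypothetical.

```python
def upsert_user_commands(email, department):
    """Build commands that add a user only if absent, then refresh mutable fields."""
    return [
        # Insert only when no User with this email exists (avoids duplicates).
        {"AddEntity": {"class": "User",
                       "properties": {"email": email, "department": department},
                       "if_not_found": {"email": ["==", email]}}},
        # Bring an already-existing record up to date.
        {"UpdateEntity": {"with_class": "User",
                          "constraints": {"email": ["==", email]},
                          "properties": {"department": department}}},
    ]

user_commands = upsert_user_commands("alice@example.com", "finance")
```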

    ‍

‍

‍

🔹 Why Check Before Insert? This avoids duplicate user records, ensuring cleaner queries and better performance.

Loading Files & Establishing Access Relationships

Files must be linked to users based on access permissions.
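A minimal sketch of linking a file to an existing principal (a user or a group). It assumes the principal was loaded earlier with a `name` property; the `File` class, the `can_<role>` connection naming, and the helper itself are hypothetical, and the command names should be checked against ApertureDB's query language documentation. Only the payload is built here.

```python
def link_file_commands(file_path, principal_class, principal_name, role="read"):
    """Build commands that connect an existing principal to a file entity."""
    return [
        # Look up the principal that is being granted access.
        {"FindEntity": {"_ref": 1, "with_class": principal_class,
                        "constraints": {"name": ["==", principal_name]}}},
        # Add the file unless an entity with the same path already exists.
        {"AddEntity": {"_ref": 2, "class": "File",
                       "properties": {"path": file_path},
                       "if_not_found": {"path": ["==", file_path]}}},
        # The access relationship becomes an edge from principal to file.
        {"AddConnection": {"class": "can_" + role, "src": 1, "dst": 2}},
    ]

file_commands = link_file_commands("sites/finance/Q4.docx", "Group", "finance-team")
```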

‍

‍

🔹 Why Use Connections? ApertureDB efficiently traverses these connections, allowing instant access verification per user.

Enforcing Access Control During Retrieval

With all relationships in place, we now enforce real-time access control when retrieving resources.

Approach 1: Precompute ACL & Filter Queries

Instead of retrieving all data, we filter unauthorized files upfront—reducing the vector search scope.
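One way this pre-filtering could be phrased is as a single chained query: find the user, find the files connected to that user, then run the nearest-neighbor search only over descriptors connected to those files. The sketch below assumes that descriptors were connected to their `File` entities at ingestion time and that access is a direct `can_access` edge; both are assumptions, as are the descriptor set name and the `text` property. Parameter names such as `is_connected_to` and `k_neighbors` are given as we understand ApertureDB's query language and should be verified before use.

```python
import struct

def prefiltered_search(user_email, query_vec, k=5, descriptor_set="doc_chunks"):
    """Build a query that restricts vector search to files the user can reach."""
    query = [
        {"FindEntity": {"_ref": 1, "with_class": "User",
                        "constraints": {"email": ["==", user_email]}}},
        {"FindEntity": {"_ref": 2, "with_class": "File",
                        "is_connected_to": {"ref": 1,
                                            "connection_class": "can_access"}}},
        {"FindDescriptor": {"set": descriptor_set, "k_neighbors": k,
                            "is_connected_to": {"ref": 2},
                            "distances": True,
                            "results": {"list": ["text"]}}},
    ]
    # The query embedding travels as a blob of little-endian float32 values.
    blob = struct.pack(f"<{len(query_vec)}f", *query_vec)
    return query, [blob]

query, blobs = prefiltered_search("alice@example.com", [0.1, 0.2, 0.3], k=2)
# With the ApertureDB Python client this would be sent roughly as:
#   response, _ = client.query(query, blobs)
```

Group memberships would need an extra hop (user to group to file) that this sketch leaves out.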

‍

‍

🔹 Why Precompute? This reduces vector search costs by filtering out unauthorized resources before retrieval.

Approach 2: Post-filter After Retrieval

In cases where dynamic access control rules apply, we retrieve first, then validate permissions dynamically.
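A minimal sketch of post-filtering in plain Python: take the ranked candidates from the vector search, check each one against a permission function, and keep the first k that pass. The `can_access` callback stands in for whatever dynamic rule applies; all names and data here are hypothetical.

```python
def post_filter(candidates, can_access, user, k=3):
    """Keep the top-k candidates the user may see, preserving rank order."""
    allowed = []
    for chunk_id, file_id in candidates:
        if can_access(user, file_id):
            allowed.append(chunk_id)
        if len(allowed) == k:
            break
    return allowed

# Hypothetical ACL and ranked search results: (chunk id, source file)
acl = {("alice", "handbook.pdf"), ("alice", "q4.xlsx"), ("bob", "handbook.pdf")}
ranked = [("c1", "q4.xlsx"), ("c2", "handbook.pdf"),
          ("c3", "q4.xlsx"), ("c4", "handbook.pdf")]

def check(user, file_id):
    return (user, file_id) in acl

bob_context = post_filter(ranked, check, "bob", k=2)
alice_context = post_filter(ranked, check, "alice", k=2)
```

Since some candidates are discarded, the vector search should fetch more than k results up front; otherwise a user with narrow permissions may end up with too little context.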
‍

‍

Conclusion

By integrating Realm’s secure connectors with ApertureDB’s graph engine, we deliver a scalable, real-time access control system ready for enterprise workloads. Combining the powers of Realm and ApertureDB, chatbot builders can build an effective chatbot within hours with:

1: Enterprise-grade security and privacy

2: High retrieval quality

3: Low latency and management overhead

‍

About Realm Labs

Realm Labs is an AI Security company focused on securing the AI stack. We are a team of AI Security experts with more than 15 years of experience in data and AI Security as well as ML Ops, explainability etc. At its core, Realm is a complete suite of AI Security solutions that can allow developers to build AI applications with enterprise-grade security controls.

About ApertureData

ApertureData is focused on accelerating AI development with "all types of data". Our product, ApertureDB, is the world’s first graph-vector hybrid database, purpose-built for multimodal AI applications. ApertureDB lowers setup time by 6-9 months (fewer integrations, easy setup etc), speeds up development by 10X, all while maintaining low TCO (savings upwards of $2M per team of 10) and high performance (2-35X better against contemporary databases).

‍

‍
