In the first blog of this series, we covered the basics of multimodal AI, generative AI, and how vector databases enable semantic search over text and multimodal data. In this second blog of the series, we’ll look at real-life examples of where vector databases are used when building text and multimodal AI applications: semantic search, often for question-answering solutions, recognizing faces for security, and using robots to check store shelves. These examples will help us understand the fundamental requirements of various text and multimodal AI applications, and give us an idea of why we need advanced data solutions, beyond vector databases, that can enable more complex and contextual data searches.
Search and Retrieval Requirements of Sample Multimodal AI Applications
Vector indexing, search, and classification are hard problems, tackled by a growing collection of vector databases in the market as well as incumbents that have introduced vector support. However, these databases can’t do everything that is expected from the data layer in a classic machine learning pipeline, regardless of whether it’s for traditional use cases or for GenAI applications.
Let’s walk through a few examples we have encountered when working with our users to really understand what’s required from the data layer to support multimodal AI in the real world.
Chat Support or Question-Answering Using Semantic Search
A common example we come across now is chat or question-answering support (let’s call it a chatbot for now), thanks to LLM-based bots like ChatGPT that often use semantic search. Let’s start with a manufacturer that makes millions of different products, each of which reports error codes when something goes wrong and ships with a product manual. A customer experiences an issue and sees an error code but doesn’t understand what it means. They visit the manufacturer’s chatbot and enter the error code, as a first step of similarity search. Next, they want to narrow the search to products sold in 2019, adding metadata filtering. The customer also wants a copy of the product manual to learn more: not only an ‘answer’ explaining the error code, but a link to the complete original manual, perhaps with some author or date restrictions. The query has evolved from similarity search to complex metadata filtering followed by PDF access, making it a problem that vector databases, and even incumbent databases with “vector add-ons”, cannot solve on their own.
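To make the shape of this query concrete, here is a minimal, self-contained Python sketch of the three stages. Everything in it is hypothetical (the `embed` stand-in, the toy records, the manual paths); in a real deployment these stages would span an embedding model, a vector index, a metadata store, and a document store.

```python
import zlib
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real text-embedding model (deterministic toy vectors)."""
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)

# Toy knowledge base: one record per documented error code.
records = [
    {"text": "E42: pump pressure out of range", "year_sold": 2019,
     "manual": "manuals/pump-x100.pdf"},
    {"text": "E17: filter clogged, replace cartridge", "year_sold": 2021,
     "manual": "manuals/filter-z3.pdf"},
    {"text": "E42: drum unbalanced during spin cycle", "year_sold": 2019,
     "manual": "manuals/washer-w7.pdf"},
]
vectors = np.stack([embed(r["text"]) for r in records])

# Step 1: pure similarity search on the error code the customer typed.
query = embed("what does error E42 mean")
scores = vectors @ query

# Step 2: metadata filtering -- restrict to products sold in 2019.
mask = np.array([r["year_sold"] == 2019 for r in records])
scores = np.where(mask, scores, -np.inf)

# Step 3: return the answer plus a pointer to the original manual. The PDF
# itself lives outside the vector index, so a third system must serve it.
best = records[int(np.argmax(scores))]
print(best["text"], "->", best["manual"])
```

Each stage is trivial on its own; the pain in production is that the three stages typically live in three different systems that must be kept consistent.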
Facial Recognition
Another very simple but common use case is finding a specific face in a collection of faces, say for surveillance and safety, or just to remember someone’s name. We start with a similarity search over face images, but then we want to restrict the search by nationality, say to people from the United States only. This introduces a metadata attribute. Next, we remember that the person we are searching for acted in a specific movie, so we add that constraint. We would also like to see video clips of the movies in which this person appeared. As you can see, the search gradually evolved from simple similarity search to more and more complex metadata filtering followed by video access, capabilities that vector databases were simply not built to support on their own.
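Here is a hypothetical sketch of why this hurts: the nationality filter is flat metadata, but the “acted in this movie” constraint is relational, so the application ends up joining vector-search results against a separate relational store. All names and data below are made up, with SQLite standing in for that second system.

```python
import sqlite3
import numpy as np

rng = np.random.default_rng(0)
face_vectors = rng.normal(size=(4, 64))  # toy face embeddings in the "vector index"
face_ids = ["p1", "p2", "p3", "p4"]      # IDs the vector index hands back

# SQLite stands in for the separate relational store holding the metadata.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE person (id TEXT, name TEXT, nationality TEXT);
    CREATE TABLE acted_in (person_id TEXT, movie TEXT, clip_uri TEXT);
""")
db.executemany("INSERT INTO person VALUES (?, ?, ?)", [
    ("p1", "Ann", "United States"), ("p2", "Bo", "France"),
    ("p3", "Cy", "United States"), ("p4", "Di", "United States")])
db.executemany("INSERT INTO acted_in VALUES (?, ?, ?)", [
    ("p1", "Heist Movie", "clips/heist-ann.mp4"),
    ("p3", "Space Drama", "clips/space-cy.mp4")])

# Step 1: similarity search against the probe face.
probe = rng.normal(size=64)
order = np.argsort(-(face_vectors @ probe))  # best matches first

# Steps 2 and 3: the nationality filter and the acted-in constraint run in
# the relational store, re-resolving each ID the vector index returned.
for idx in order:
    row = db.execute(
        "SELECT p.name, a.clip_uri FROM person p "
        "JOIN acted_in a ON a.person_id = p.id "
        "WHERE p.id = ? AND p.nationality = 'United States' "
        "AND a.movie = 'Heist Movie'",
        (face_ids[int(idx)],)).fetchone()
    if row:
        print("match:", row[0], "-> clip to fetch:", row[1])
        break
```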
Visual Recommendations
Let’s say an e-commerce company with millions of products wants to support visual recommendations based on the colors and shapes of products, but wants to limit items to its fall 2023 catalog or to a specific supplier’s catalog. The goal is then similarity search constrained by complex metadata filtering, followed by image access.
Before running these queries, many steps must be completed to support visual recommendations for the e-commerce company. Datasets must be easy to create and manage so they can be revised with every catalog update. Models that extract embeddings for visual search must be trained iteratively over the millions of images that are constantly being updated. Finally, the embeddings need to be indexed effectively to support similarity search quickly and easily at scale. A vector database can solve that last part, but a) managing the datasets of product and image data, b) training with millions of relevant images, and c) keeping track of all the metadata, embeddings, and images to respond to users all require seamless integration with the entire machine learning pipeline, not just vector or metadata search support.
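A rough sketch of that recurring pipeline work, with `embed_image` and the catalog rows as hypothetical stand-ins, might look like this. Note how much of the code is about keeping the catalog, the embeddings, and the index in sync rather than about search itself.

```python
import zlib
import numpy as np

def embed_image(path: str) -> np.ndarray:
    """Stand-in for a trained visual model (color/shape embedding)."""
    rng = np.random.default_rng(zlib.crc32(path.encode()))
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

# One row per product; this must stay in sync with every catalog update.
catalog = [
    {"sku": "A1", "image": "img/a1.jpg", "catalog": "fall2023", "supplier": "s9"},
    {"sku": "B2", "image": "img/b2.jpg", "catalog": "spring2023", "supplier": "s9"},
    {"sku": "C3", "image": "img/c3.jpg", "catalog": "fall2023", "supplier": "s4"},
]

def rebuild_index(rows):
    """Re-embed every image; at millions of products this is a batch pipeline."""
    return np.stack([embed_image(r["image"]) for r in rows])

index = rebuild_index(catalog)  # repeated after each catalog or model revision

# The actual query: visually similar items, limited to the fall 2023 catalog.
scores = index @ embed_image("img/query.jpg")
allowed = np.array([r["catalog"] == "fall2023" for r in catalog])
best = int(np.argmax(np.where(allowed, scores, -np.inf)))
print("recommend:", catalog[best]["sku"], "-> fetch image:", catalog[best]["image"])
```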
Smart Retail
Ever seen a robot checking for empty shelves at a grocery or retail store? The robots go up and down the aisles looking for empty shelves, then determine which product needs restocking. This query involves matching thousands of vectors to their corresponding product names, which requires classification followed by product lookup, at scale.
Again, before the queries can be executed, many tasks need to be completed and managed effectively. All the labeled data must be created and managed so it can be easily updated with every new image of an empty shelf. Models must be constantly retrained as new products are added or old products removed, which requires large numbers of images. Finally, all the embeddings need to be indexed so that queries perform quickly and reliably enough to support the business.
Source: https://www.badger-technologies.com/resource/badger-resources.html
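As a toy illustration of the query itself, here is a nearest-centroid classifier that assigns each detected shelf region to a product name. The embeddings, centroids, and product names are all hypothetical stand-ins for a trained model and a real product database.

```python
import numpy as np

rng = np.random.default_rng(7)

# Class centroids learned from labeled product images, retrained whenever
# products are added or removed; the dict maps class index -> product name.
centroids = rng.normal(size=(3, 32))
product_names = {0: "cereal-brand-x", 1: "pasta-500g", 2: "canned-soup"}

def classify(region_embedding: np.ndarray) -> str:
    """Nearest-centroid classification followed by product-name lookup."""
    dists = np.linalg.norm(centroids - region_embedding, axis=1)
    return product_names[int(np.argmin(dists))]

# A real aisle scan yields thousands of regions; a tiny batch for show.
shelf_regions = rng.normal(size=(5, 32))
for i, region in enumerate(shelf_regions):
    print(f"empty slot {i}: restock with", classify(region))
```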
Building a Data Stack for Current and Future AI Applications
All of these examples share common requirements: index high-dimensional embeddings from any multimodal data type, allow metadata-based filtering, and retrieve the corresponding data after the search. This imposes certain requirements on the data layer, such as:
- High throughput vector search and classification
- Filter on rich and evolving metadata
- Ability to connect with original data
- Seamless integrations with various steps of an AI pipeline beyond just a query or analytics framework
- Reliability and stability at scale to support production
- Production-ready, cloud-agnostic, often virtual private cloud (VPC) deployments
But can vector databases handle all this by themselves? Today’s popular databases can:
- Store feature vectors and return an ID, but the ID needs to be managed separately and linked to different data types (see the sketch after this list)
- Sometimes allow users to attach a row of columns with some metadata, but a handful of columns makes it difficult to represent the complex metadata typical of real applications
- Rarely support complex filtering, such as overlaying intricate schemas that mimic graphs, even in incumbent relational databases that allow vector search
- Often lack the ability to manage multimodal data and provide provenance information
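The first limitation, ID management, is worth a concrete look. In the hypothetical sketch below, the vector index only knows integer IDs, so the application itself must maintain the mappings to metadata and to the original objects; every name and store here is a made-up stand-in.

```python
import numpy as np

vectors = np.random.default_rng(1).normal(size=(3, 16))  # the "vector index"

# Mappings the application must maintain and keep consistent on its own:
vector_id_to_key = {0: "uuid-a", 1: "uuid-b", 2: "uuid-c"}
metadata_store = {"uuid-a": {"type": "image", "label": "shoe"},
                  "uuid-b": {"type": "pdf", "label": "manual"},
                  "uuid-c": {"type": "video", "label": "clip"}}
object_store = {"uuid-a": "s3://bucket/a.jpg",
                "uuid-b": "s3://bucket/b.pdf",
                "uuid-c": "s3://bucket/c.mp4"}

query = np.random.default_rng(2).normal(size=16)
hit = int(np.argmax(vectors @ query))  # the index returns a bare integer ID

key = vector_id_to_key[hit]   # hop 1: translate the integer ID
meta = metadata_store[key]    # hop 2: look up metadata elsewhere
uri = object_store[key]       # hop 3: fetch the original data elsewhere again
print(meta["label"], "->", uri)
# If any one of the three stores drifts out of sync, results silently break.
```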
While vector databases are able to solve the vector indexing and search challenges, you need to add other databases and storage options to the data pipeline and architecture to handle the rest, resulting in a complex, glued-together system that is brittle and painful to maintain and reuse. We cover the other database alternatives and their pros and cons in detail in our blog on multimodal data requirements.
Given that traditional data and database tools result in a spaghetti data architecture, a form of technical debt that should be avoided, one solution is to architect a purpose-built database that can manage complex and varied data types while exposing a unified interface to index, search, and retrieve those data types. With the right implementation, it can save companies a lot of headaches and wasted engineering time. ApertureDB is one such database. Databases like ApertureDB focus on multimodal data and AI, giving businesses a unified, scalable, and easy-to-use solution for data management and analysis in today’s fast-changing world.
What Next?
Want to learn more? To continue digging deeper into the world of multimodal data and AI, check out these blogs: a) Why Do We Need A Purpose-Built Database For Multimodal Data?, b) Your Multimodal Data Is Constantly Evolving - How Bad Can It Get?, and c) How a Purpose-Built Database for Multimodal AI Can Save You Time and Money.
Last but not least, you can subscribe here to learn more about text and multimodal AI and data as we document lessons from our journey in our blog.
I want to acknowledge the insights and valuable edits from Laura Horvath and Drew Ogle.