
Vector Databases and Beyond for Multimodal AI: A Beginner's Guide Part 2

October 15, 2024
6 min read
Vishakha Gupta

In the first blog of this series, we covered the basics of multimodal AI, generative AI, and how vector databases enable semantic search over text and multimodal data. In this second blog, we’ll look at real-life examples of where vector databases are used to build text and multimodal AI applications. These include semantic search for question-answering solutions, facial recognition for security, and robots that check store shelves. The examples will help us understand the fundamental requirements of such applications. They also show why we need data solutions that go beyond vector databases to enable more complex and contextual searches.

Search And Retrieval Requirements of Sample Multimodal AI Applications

Vector indexing, search, and classification are hard problems. They are tackled by a growing collection of vector databases, as well as by incumbent databases that have added vector support. However, these databases can’t do everything expected from the data layer of a classic machine learning pipeline, whether for traditional use cases or for GenAI applications.

Let's walk through a few examples we have encountered while working with our users to understand what’s required from the data layer to support multimodal AI in the real world.

Chat Support or Question-Answering Using Semantic Search

A common example we now come across is chat or question-answering support (let’s call it a chatbot for now). It has been popularized by LLM-based bots like ChatGPT, which often rely on semantic search. Consider a manufacturer that makes millions of different products. Each product reports error codes when something goes wrong and ships with a product manual. A customer experiences an issue and sees an error code but doesn’t understand what it means. They visit the manufacturer’s chatbot and enter the error code, which triggers a similarity search as the first step. Next, they want to restrict the search to products sold in 2019, which adds metadata filtering. The customer also wants a copy of the product manual to learn more. At this point, they want more than an ‘answer’ explaining the error code. They also want a link to the complete original manual, with some author or date restrictions. The query has evolved from a similarity search into one with complex metadata filtering followed by PDF access. Neither vector databases nor incumbent databases with “vector add-ons” can fully handle that combination on their own.
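Here is a minimal sketch of the chatbot query above. It assumes a small in-memory list of manual passages with precomputed embeddings. `ManualChunk`, `answer_error_code`, and the `manual_uri` field are hypothetical names for illustration, not any product's API. A real system would use an approximate nearest-neighbor index rather than this brute-force scan.

```python
import math
from dataclasses import dataclass


@dataclass
class ManualChunk:
    text: str               # passage explaining an error code
    embedding: list[float]  # vector from some embedding model (assumed precomputed)
    year_sold: int          # metadata used for filtering
    manual_uri: str         # hypothetical link to the original PDF manual


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def answer_error_code(chunks: list[ManualChunk], query_vec: list[float],
                      year_sold: int, k: int = 3) -> list[tuple[str, str]]:
    # Step 1: metadata filter (only products sold in the requested year).
    candidates = [c for c in chunks if c.year_sold == year_sold]
    # Step 2: exact similarity ranking over the remaining chunks.
    ranked = sorted(candidates, key=lambda c: cosine(c.embedding, query_vec),
                    reverse=True)
    # Step 3: return the answer text together with a link back to the source PDF.
    return [(c.text, c.manual_uri) for c in ranked[:k]]
```

Filtering before ranking keeps every result inside the requested year. The returned `manual_uri` is the piece a bare vector index usually cannot provide without a separate lookup.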

Facial Recognition

Another simple but common use case is finding a specific face in a collection of faces. This might be for surveillance and safety, or just to remember someone’s name. We start with a similarity search over face images. Then we restrict the search by nationality so that only people from the United States are returned. This introduces a metadata attribute. Next, we remember that the person we are searching for acted in a specific movie, so we add that constraint. We would also like to see video clips of the movies in which this person appeared. The search has gradually evolved from simple similarity search to increasingly complex metadata filtering followed by video access. Vector databases were simply not built to support these capabilities on their own.
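The face search combines three kinds of constraints: vector similarity, a metadata attribute (nationality), and a relationship (person acted in movie). The search then follows those relationships to the original video clips. The toy version below keeps everything in memory. `Person`, `find_face`, the `clips_by_movie` mapping, and the 0.8 similarity threshold are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    nationality: str
    face_embedding: list[float]
    movies: list[str] = field(default_factory=list)  # edges: person -> movie


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_face(people, query_face, nationality, movie, clips_by_movie, threshold=0.8):
    """Similarity search + attribute filter + relationship filter, then fetch clips."""
    matches = []
    for p in people:
        if p.nationality != nationality:   # metadata attribute
            continue
        if movie not in p.movies:          # relationship constraint (graph edge)
            continue
        score = cosine(p.face_embedding, query_face)
        if score >= threshold:             # similarity cut-off (assumed value)
            matches.append((score, p))
    matches.sort(key=lambda item: item[0], reverse=True)
    # Follow person -> movie -> clip edges to return the original video data.
    return [(p.name, [uri for m in p.movies for uri in clips_by_movie.get(m, [])])
            for _, p in matches]
```

In a production system, the movie and clip hops would be graph queries over stored relationships rather than Python lists. The shape of the query stays the same.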

Visual Recommendations

Let’s say an e-commerce company with millions of products wants to support visual recommendations based on product colors and shapes. However, it wants to limit results to its fall 2023 catalog or to a specific supplier’s catalog. The goal is then similarity search constrained by complex metadata filtering, followed by image access.
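You can constrain similarity search by catalog in two ways: filter before the search (pre-filtering) or after it (post-filtering). The toy comparison below shows why the order matters. Post-filtering a global top-k can leave fewer than k results, or none at all. The SKUs and two-dimensional vectors are made up and stand in for real image embeddings. The function names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(items: list[dict], query: list[float], k: int) -> list[dict]:
    """Exact top-k by cosine similarity (stands in for a vector index)."""
    return sorted(items, key=lambda it: cosine(it["vec"], query), reverse=True)[:k]


def post_filter(items, query, k, catalog):
    # Search the whole collection first, then drop out-of-catalog hits.
    # Can return fewer than k results.
    return [it for it in top_k(items, query, k) if it["catalog"] == catalog]


def pre_filter(items, query, k, catalog):
    # Restrict to the allowed catalog first, then search only that subset.
    return top_k([it for it in items if it["catalog"] == catalog], query, k)
```

Given items where the two most similar products belong to a spring catalog, a `k=2` query for `"fall-2023"` comes back empty with `post_filter`. With `pre_filter`, it returns the two best fall items.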

Many steps must be completed before these queries can run. First, datasets must be easy to create and manage so they can be revised with every catalog update. Second, the models that extract embeddings for visual search must be trained iteratively on millions of constantly updated images. Finally, the embeddings must be indexed so that similarity search stays fast at scale. A vector database can solve that last part. The remaining pieces all require seamless integration with the entire machine learning pipeline, not just vector or metadata search support:

a) managing the product and image datasets,
b) training on millions of relevant images, and
c) keeping track of all the metadata, embeddings, and images needed to respond to users.

Source: https://www.visenze.com/discovery-suite/modules/smart-recommendations
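One way to picture the bookkeeping described above is an incremental sync. It re-embeds a product only when its image changed or its embedding came from an older model version. It also drops products that left the catalog. This is a sketch under stated assumptions: `sync_index`, `IndexEntry`, and the `embed` callable are hypothetical, and a plain dict stands in for the vector index.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class IndexEntry:
    image_hash: str         # content hash of the image that was embedded
    model_version: str      # which (hypothetical) embedding model produced the vector
    embedding: list[float]


def content_hash(image_bytes: bytes) -> str:
    return hashlib.sha256(image_bytes).hexdigest()


def sync_index(index: dict[str, IndexEntry], catalog: dict[str, bytes],
               model_version: str,
               embed: Callable[[bytes], list[float]]) -> tuple[list[str], list[str]]:
    """Bring `index` in line with `catalog`; return (re-embedded SKUs, removed SKUs)."""
    reembedded = []
    for sku, image_bytes in catalog.items():
        h = content_hash(image_bytes)
        entry = index.get(sku)
        # Re-embed new products, changed images, and vectors from an older model.
        if entry is None or entry.image_hash != h or entry.model_version != model_version:
            index[sku] = IndexEntry(h, model_version, embed(image_bytes))
            reembedded.append(sku)
    # Products no longer in the catalog must leave the index too.
    removed = [sku for sku in index if sku not in catalog]
    for sku in removed:
        del index[sku]
    return reembedded, removed
```

Recording the model version next to each vector is what makes a retrained model visible to the index. Without it, stale embeddings would mix silently with new ones.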

Smart Retail

Ever seen a robot checking for empty shelves at a grocery or retail store? These robots go up and down the aisles looking for empty shelves to determine which products need restocking. The query involves matching thousands of vectors to their corresponding product names. That requires classification followed by product lookup, at scale.
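Matching shelf-image vectors to product names is a classification problem. A simple stand-in is k-nearest-neighbor voting over labeled embeddings, followed by a lookup into a product table. The labels, the product table, and `k=3` below are illustrative assumptions. This is not how any particular retail robot works.

```python
import math
from collections import Counter


def classify(query: list[float], labeled: list[tuple[list[float], str]], k: int = 3) -> str:
    """k-NN: label a query embedding by majority vote of its k nearest labeled examples."""
    nearest = sorted(labeled, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def label_shelf(crops: list[list[float]], labeled, products: dict[str, str], k: int = 3):
    # Classification followed by product lookup (products: label -> product record).
    return [products[classify(c, labeled, k)] for c in crops]
```

Brute-force sorting is fine for a handful of examples. With thousands of crops per aisle and a large labeled set, the nearest-neighbor step is exactly what a vector index is for.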

Again, many tasks must be completed and managed effectively before these queries can be executed. All the labeled data must be created and managed so it can be easily updated with every new image of an empty shelf. Models must be retrained constantly as products are added or removed, which requires large numbers of images. Finally, all the embeddings need to be indexed so that queries run fast enough to meet business needs.

Source: https://www.badger-technologies.com/resource/badger-resources.html

Building a Data Stack for Current and Future AI Applications

All of these examples share common requirements: indexing high-dimensional embeddings from any multimodal data type, filtering on metadata, and retrieving the corresponding data after the search. This places several demands on the data layer, such as:

  • High throughput vector search and classification
  • Filter on rich and evolving metadata
  • Ability to connect with original data
  • Seamless integrations with various steps of an AI pipeline beyond just a query or analytics framework
  • Reliability and stability at scale to support production
  • Production-ready, cloud-agnostic, often virtual private cloud (VPC) deployments

But can vector databases handle all this by themselves? Today’s popular databases can:

  • Store feature vectors and return an ID, but that ID must be managed separately and linked to the different data types.
  • Sometimes allow users to attach a row of columns holding some metadata. However, a few columns can hardly represent the complex metadata typical of real applications.
  • Rarely support complex filtering, such as overlaying intricate, graph-like schemas. This holds even for incumbent relational databases that allow vector search.
  • Often lack the ability to manage multimodal data and provide provenance information.
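The first limitation above is easy to underestimate. The sketch below assumes three plain dicts standing in for a vector database, a metadata store, and a blob store. It shows the glue code needed to turn search hits (IDs) back into metadata and original files. It also shows how IDs dangle once the stores drift out of sync. `resolve_hits` and the sample IDs and paths are hypothetical.

```python
def resolve_hits(hits: list[str], metadata_db: dict[str, dict],
                 blob_store: dict[str, str]) -> tuple[list[dict], list[str]]:
    """Join vector-search hit IDs against separate metadata and blob stores.

    Returns (resolved records, IDs that could not be joined in both stores).
    """
    resolved, dangling = [], []
    for vid in hits:
        meta = metadata_db.get(vid)   # lookup in the (hypothetical) metadata store
        uri = blob_store.get(vid)     # lookup in the (hypothetical) blob store
        if meta is None or uri is None:
            dangling.append(vid)      # the stores have drifted out of sync
            continue
        resolved.append({"id": vid, **meta, "uri": uri})
    return resolved, dangling
```

Every query path needs some version of this join. Each store can be updated independently, so the join has to tolerate missing IDs on either side.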

Vector databases solve the vector indexing and search challenges. To handle the rest, however, you need to add other databases and storage options to the data pipeline and architecture. The result is a complex, glued-together system that is brittle and painful to maintain and reuse. We cover the other database alternatives and their pros and cons in detail in our blog on multimodal data requirements.

Traditional data and database tools result in a spaghetti data architecture, a form of technical debt that should be avoided. One solution is to architect a purpose-built database that can manage complex and varied data types. Such a database exposes a unified interface to index, search, and retrieve all of them. With the right implementation, it can save companies many headaches and much wasted engineering time. ApertureDB is one such database. It focuses on multimodal data and AI, giving businesses a unified, scalable, and easy-to-use solution for data management and analysis in today's fast-changing world.

What Next?

Want to learn more? To dig deeper into the world of multimodal data and AI, check out these blogs:

a) Why Do We Need A Purpose-Built Database For Multimodal Data?
b) Your Multimodal Data Is Constantly Evolving - How Bad Can It Get?
c) How a Purpose-Built Database for Multimodal AI Can Save You Time and Money

Last but not least, you can subscribe here to learn more about text and multimodal AI and data as we document lessons from our journey in our blog.

I want to acknowledge the insights and valuable edits from Laura Horvath and Drew Ogle.
