Your Multimodal Data Is Constantly Evolving - How Bad Can It Get?

July 22, 2024
8 min read
Vishakha Gupta

The data landscape has changed dramatically in the last two decades. Twenty years ago, a data scientist might have interacted only with standard structured databases such as PostgreSQL. But today, as companies race to leverage the growing capabilities of AI models, data scientists and engineers juggle multiple data types at once: text, images, video, and more. Like jumping from two dimensions to three, the shift to multimodal data is simultaneously exciting and challenging. It’s key to understand not only the changing landscape, but also how to get maximum value from multimodal data and choose the right tools.

How does your data evolve?

Multimodal data is inherently complex. But as AI use cases explode, multimodal data practices are rapidly evolving in step. A few elements typically drive this evolution.

Annotations

Annotating multimodal data is key to accurately training models during supervised learning. Annotations already enhance data richness and complexity, but annotation processes can also change over time. In medicine, for instance, breakthrough discoveries could require teams to refine their annotation process, and new medical imaging techniques might demand more granular annotations to keep models up to date.

Embeddings

Embeddings encode different types of data in a shared vector space, making it easy to identify and represent relationships between those vectors. For example, embeddings constitute a core component of user-facing recommendation engines because they enable similarity searches.
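To make the similarity-search idea concrete, here is a minimal sketch in plain Python. The item names and three-dimensional vectors are made up for illustration (real embeddings come from a model and typically have hundreds of dimensions), and cosine similarity is only one of several common distance measures.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query, catalog, k=2):
    """Return the names of the k catalog items closest to the query vector."""
    ranked = sorted(
        catalog,
        key=lambda name: cosine_similarity(query, catalog[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical embeddings, invented for this example.
catalog = {
    "red sofa": [0.9, 0.1, 0.0],
    "crimson armchair": [0.8, 0.2, 0.1],
    "blue lamp": [0.0, 0.1, 0.9],
}

print(most_similar([1.0, 0.0, 0.0], catalog))
```

A production system would use an approximate nearest-neighbor index rather than sorting the whole catalog, but the ranking principle is the same.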

Embeddings can change over time for a couple of reasons. Changes in the real world can lead to changes in the underlying embedding spaces. For example, social media sentiment data may shift during significant world events.

Changes can also come from within. Variations in a company’s data collection cadence and methodology, updates to an AI model, or the introduction of new modalities can all cause an evolution in the underlying embeddings.

New classifications

As a company’s AI models mature, they can be used to extract new insights from incoming data. Imagine a company that has trained a model to detect faces. With that newly trained model, the company can create an entirely new classification set of facial emotions, and those classifications become data points in their own right.

Derived data

Companies can combine information from multiple modalities into richer, more useful representations, known as derived data. For example, a company may want to perform sentiment analysis on product reviews that include text and user-uploaded images. By generating embeddings for the text and for the images and concatenating the two sets, the company obtains a derived dataset that combines information from both modalities, which is useful for training models that need to understand the relationship between the two.
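As a rough illustration of concatenating embeddings, the sketch below joins a text vector and an image vector for one review. The vectors are invented, and normalizing each one first is an assumption on our part (a common way to keep one modality from dominating), not something the example above prescribes.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; return a copy unchanged if it is all zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def derive_review_embedding(text_emb, image_emb):
    """Concatenate normalized text and image embeddings into one derived vector."""
    return l2_normalize(text_emb) + l2_normalize(image_emb)

# Hypothetical embeddings for one product review (text plus an uploaded photo).
text_emb = [3.0, 4.0]
image_emb = [0.0, 2.0, 0.0]

derived = derive_review_embedding(text_emb, image_emb)
print(len(derived), derived)
```

The derived vector has as many dimensions as both inputs combined, so adding a new modality such as video changes the shape of every derived record.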

Derived data can evolve for a number of reasons. In the example above, the company could start allowing users to upload videos in addition to text and images. Sometimes, to keep large models fully fed during training on fast machines, companies must create copies of their data in the input formats those models require, which generates additional provenance and data governance information to track.

Provenance information

As AI models play an increasingly active role in real-life applications, capturing provenance information—the metadata that describes data’s origin and history—is becoming critical.

Say a healthcare provider relies on AI models with MRI image inputs for diagnoses. It’s crucial that the provider can track the source of the data (machine, parameters used), immediate processing steps (noise reduction, motion correction), manual annotations, and transformation history (fusion with other modalities, like CT or PET scans) so that it can quickly address any mistakes from the model.
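One way to picture such a record is a small data structure that keeps each category of provenance separate. The class and field names below are hypothetical, chosen only to mirror the categories in the MRI example.

```python
from dataclasses import dataclass, field

@dataclass
class ProvenanceRecord:
    """Hypothetical provenance metadata for a single MRI image."""
    source_machine: str
    acquisition_params: dict
    processing_steps: list = field(default_factory=list)
    annotations: list = field(default_factory=list)
    transformations: list = field(default_factory=list)

    def history(self):
        """List every recorded event, grouped as processing, annotation, transformation."""
        return (
            [("processing", step) for step in self.processing_steps]
            + [("annotation", note) for note in self.annotations]
            + [("transformation", change) for change in self.transformations]
        )

record = ProvenanceRecord(
    source_machine="scanner-01",  # illustrative identifier
    acquisition_params={"field_strength_tesla": 3.0},
)
record.processing_steps += ["noise reduction", "motion correction"]
record.annotations.append("lesion outlined by radiologist")
record.transformations.append("fused with CT scan")

for kind, detail in record.history():
    print(f"{kind}: {detail}")
```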

New use cases

One of the most satisfying aspects of working with AI models is getting them to succeed. Say an e-commerce company builds a recommendation engine to show its customers products with similar colors. Customers start clicking on and buying those related products. Success! Now the growth team wants to evolve the recommendation model to include products made of similar materials, so the engineering team must add further filters on the product metadata.

Because AI models are limitless in their applications, it’s almost guaranteed that a given company’s set of use cases will evolve over time.

Why is it challenging to manage evolving multimodal data?

Schema challenges with relational databases

Traditional relational databases, such as PostgreSQL or MySQL, have served engineers and data scientists well for decades. But when it comes to multimodal data, these databases fall short for one glaring reason: rigid relational schemas do not play well with the complex relationships that exist in multimodal data.

For example, imagine an AI model that helps doctors recommend treatments based on a mixture of doctor, patient, treatment, and CT scan data. While one could structure four PostgreSQL tables with interlinking foreign keys, every many-to-many mapping requires an additional reference table. The fluid, evolving nature of multimodal data is at odds with rigid schema enforcement, so engineers do themselves a favor by choosing systems that allow schemas to change flexibly over time.
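The reference-table overhead is easy to see in a small sketch. It uses Python's built-in sqlite3 module in place of PostgreSQL so that it runs anywhere, and the table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE treatment (id INTEGER PRIMARY KEY, name TEXT);
    -- A many-to-many link needs its own reference (junction) table.
    CREATE TABLE patient_treatment (
        patient_id   INTEGER REFERENCES patient(id),
        treatment_id INTEGER REFERENCES treatment(id),
        PRIMARY KEY (patient_id, treatment_id)
    );
""")
conn.execute("INSERT INTO patient VALUES (1, 'Patient A')")
conn.executemany(
    "INSERT INTO treatment VALUES (?, ?)",
    [(1, "Physiotherapy"), (2, "Surgery")],
)
conn.executemany("INSERT INTO patient_treatment VALUES (?, ?)", [(1, 1), (1, 2)])

# Reading the relationship back requires a join through the junction table.
rows = conn.execute("""
    SELECT t.name FROM treatment t
    JOIN patient_treatment pt ON pt.treatment_id = t.id
    WHERE pt.patient_id = 1
    ORDER BY t.name
""").fetchall()
treatments = [name for (name,) in rows]
print(treatments)

# Each new relationship (for example, scans linked to treatments)
# means another schema change and another junction table.
conn.execute("CREATE TABLE scan_treatment (scan_id INTEGER, treatment_id INTEGER)")
```

With four entity types, the number of possible junction tables grows quickly, and each one is a migration to plan, run, and maintain.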

Datasets

A key component of training AI models is the specific data used, but managing versions of the same dataset can be tedious and costly. For example, say a recommendation model is trained on a dataset of sofa images. The company begins working with new vendors and gets a fresh dataset of sofa images, so the engineering team trains v2 of the model with the new data but notices that the model’s recommendation quality deteriorates.

In cases like this, it’s important to be able to track the changes made to the model and the actual datasets used. Unfortunately, many data teams revert to manually storing copies of the data. This quickly becomes not only an organizational nightmare (which dataset trained which model?) but also a pricey one, since multiple versions of similar datasets increase storage costs.
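One common alternative to copying whole datasets is to store each item once, keyed by a hash of its content, and describe every dataset version as a list of those keys. The sketch below is a simplified illustration of that idea with made-up names and data; it is not how any particular product implements versioning.

```python
import hashlib

def content_id(data: bytes) -> str:
    """Identify an item by the SHA-256 hash of its bytes."""
    return hashlib.sha256(data).hexdigest()

class DatasetStore:
    def __init__(self):
        self.blobs = {}     # content id -> bytes, stored once however often it is reused
        self.versions = {}  # version name -> list of content ids

    def commit(self, version, items):
        """Record a dataset version, adding only items not already stored."""
        ids = []
        for data in items:
            cid = content_id(data)
            self.blobs.setdefault(cid, data)
            ids.append(cid)
        self.versions[version] = ids

store = DatasetStore()
store.commit("sofas-v1", [b"sofa-1", b"sofa-2"])
store.commit("sofas-v2", [b"sofa-1", b"sofa-2", b"sofa-3"])

# Recording which dataset version trained which model answers the
# "which dataset trained which model?" question directly.
trained_on = {"recommender-v1": "sofas-v1", "recommender-v2": "sofas-v2"}

added = set(store.versions["sofas-v2"]) - set(store.versions["sofas-v1"])
print(len(store.blobs), len(added))
```

Here two versions totaling five items occupy only three stored blobs, and the difference between versions falls out of a set comparison.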

Scalability

As a company’s metadata and data grow, engineers and data scientists are forced to reckon with scalability. Not only must they consider raw storage capacity and cost (vertical scaling) but also how to distribute workload across multiple nodes (horizontal scaling). Without proper planning, companies can quickly find themselves paying too much for multiple databases, sacrificing the performance of their AI workflows, or both. Balancing these concerns with ease-of-setup is a challenge for every team.

Easily connecting processing pipelines with data updates

As a business grows, so does its data ingestion. Think of how an e-commerce company’s product catalog constantly evolves, or how a travel service aggregator collects new reviews from its customers. As existing data pipelines grow and change, it’s crucial to seamlessly connect the newly ingested data with existing data infrastructure, enrich it with metadata, and fold it into AI model workflows to glean useful insights.

This is easier said than done: today, data and engineering teams find it challenging to update existing data schemas, process visual data, and label data quickly enough for their workflows to keep pace with new ingestion.

Challenges with consistent views and transactions across data pieces

It’s easy to underestimate the importance of standardizing the engineering and data teams’ view of multimodal data. If a company uses multiple disconnected databases, not only will these teams (and by extension, the whole company) struggle to build a single view of the data, but it will also be tricky to build consistent read/write transaction processes across these multiple databases.

Unfortunately, this is the reality for too many companies today: because there are few products tailor-made for multimodal data management, teams often opt for several disjointed databases, setting themselves up for an endless Sisyphean struggle to maintain a consistent view of their multimodal data.

How do we simplify these challenges with ApertureDB?

Data storage and preprocessing

Out of the gate, ApertureDB supports storage of multimodal data types like documents, images, and videos. The query interface has built-in preprocessing support for image and video data, simplifying downstream processes that rely on these data and helping searches and analyses run faster. This also removes the need for users to create copies of the data to meet various downstream format requirements, and it often lowers network traffic, since most such operations downsample the data.

Vector database

ApertureDB comes with a vector database, optimized for storing, indexing, and querying high-dimensional vector data. This enables several use cases, like:

  1. Powering recommendation engines with similarity searches
  2. Building accurate chatbots with RAG
  3. Enabling powerful search applications with semantic and multimodal searches

In-memory graph database: the connective tissue

Importantly, ApertureDB comes with an in-memory graph database that stores application metadata as a knowledge graph. By leveraging the flexibility of a graph database, users can seamlessly connect metadata between any user-defined entities as well as their vector representations and original data.

For example, users can connect AI models to the data used to train them, task the model with classifying new data, and attach accuracy values to the new classifications. This enables searches such as “Find images classified by Model X where accuracy is > 0.9.” This also allows users to combine their vector searches with advanced graph filtering before accessing the required data in a suitable format for downstream ML processing.
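To show the shape of such a search, the sketch below models entities and their connections in plain Python and filters on a property stored on the connection. It is a conceptual illustration only: the node names, labels, and accuracy values are invented, and this is not ApertureDB's query language or API.

```python
# Entities (nodes) carry free-form properties; no schema is declared up front.
nodes = {
    "model_x": {"class": "Model", "name": "Model X"},
    "model_y": {"class": "Model", "name": "Model Y"},
    "img_1": {"class": "Image", "path": "img_1.jpg"},
    "img_2": {"class": "Image", "path": "img_2.jpg"},
    "img_3": {"class": "Image", "path": "img_3.jpg"},
}

# Connections (edges): (source, relation, target, properties on the connection).
edges = [
    ("model_x", "classified", "img_1", {"label": "happy", "accuracy": 0.95}),
    ("model_x", "classified", "img_2", {"label": "sad", "accuracy": 0.72}),
    ("model_y", "classified", "img_3", {"label": "happy", "accuracy": 0.97}),
]

def images_classified_by(model_id, min_accuracy):
    """Find images a given model classified with accuracy above a threshold."""
    return [
        target
        for source, relation, target, props in edges
        if source == model_id
        and relation == "classified"
        and props["accuracy"] > min_accuracy
    ]

# "Find images classified by Model X where accuracy is > 0.9."
matches = images_classified_by("model_x", 0.9)
print([nodes[image_id]["path"] for image_id in matches])
```

Because the accuracy lives on the connection rather than in a fixed column, attaching a new kind of property later does not require a table migration.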

The graph database also makes it easy to adjust schemas on the fly as AI needs change, and ApertureDB does not require users to declare schemas up front.

Query engine: unifying interface for applications

ApertureDB features a unified API across all the aforementioned data types, based on a native JSON query language and coordinated by an orchestrator. Not only does this API help standardize a team’s view of its multimodal data, but it also spares ApertureDB users from composing queries that span multiple systems.

Transaction support across various modalities

ApertureDB implements ACID transactions for queries spanning the different data types, thus offering the relevant database guarantees at the level of these complex objects.

Integrations across ML pipelines

ApertureDB’s Python SDK offers convenient ETL and ML processing wrappers over the JSON query language, and simplifies integrations across the AI toolchain. This makes it easy for engineers to write standardized queries to serve multimodal data to their applications in the required format.

Schema dashboard

ApertureDB offers a dashboard UI that allows users to easily check what objects exist in a dataset, the object properties, and how different objects relate to each other.

This dashboard makes it surprisingly simple for data science, engineering, and analytics teams to manage the complex relationships between multimodal data types.

Conclusion

AI workflows are exploding. Every industry, from e-commerce to logistics to medicine, is racing to uncover new uses for ever-evolving multimodal data. Engineers must keep up with the pace. This is why we built ApertureDB: to give engineers and data scientists a purpose-built tool for multimodal data management, search, and visualization that can replace the hodge-podge of DIY solutions in the market.

If you’re interested in learning more about how ApertureDB works, reach out to us at team@aperturedata.io. We have built an industry-leading database for multimodal AI to future-proof data pipelines as multimodal AI methods evolve. Stay informed about our journey by subscribing to our blog.

I want to acknowledge the insights and valuable edits from Ian Yanusko as well as feedback from Ayla Khan (Recursion Pharmaceuticals).
