ApertureData

Managing Visual Data for Machine Learning and Data Science. Painlessly.

Image and video data, or visual data, has seen unprecedented growth in the last few years. Applications across domains are shifting to Machine Learning (ML) and Data Science to create new products with better user experiences and to derive insights from this vast and rich collection of visual data. These insights help businesses gain a better understanding of their customers and provide inference points for making complex decisions.

c. 2016

In 2016, Luis, the rest of our team at Intel Labs, and I started looking at visual cloud infrastructure for large-scale ML deployments. Since then, we have spoken with hundreds of data engineers, ML (infrastructure) engineers, data scientists, and systems researchers working in application domains such as medical imaging, smart retail, sports, entertainment, and smart cities. These conversations have confirmed the tremendous progress made in improving the performance and accuracy of ML models, as well as the shift in focus towards developing infrastructure for large-scale deployment and improving data quality. Practitioners routinely tell us that big visual data management is either an active problem for them or one that they see on their very near horizon. These insights and our desire to address the challenges of visual data management led us to form ApertureData. To better understand our solution, let us first look more specifically at the issues that users face.

Visual Data Infrastructure Challenges Today

Visual data is a collection of images and videos that typically grows over time. For example, visual data could be X-rays or MRI scans of patients in the radiology department of a health center, pictures of clothes from different retailers, or traffic camera videos to detect pedestrian patterns. This visual data is usually accompanied by some metadata, such as patient age, source of data capture, date, location, and other attributes that exist at the time of creation. Over time, this metadata continues to be enhanced with regions of interest annotations, feature vectors, and more application context. The visual data itself may be needed in different resolutions or formats, depending on the end goal, for example, display vs. training.
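To make that description concrete, here is a minimal Python sketch of one visual item and the metadata that accumulates around it. The class names, field names, path, and values are all hypothetical and chosen only for illustration; they do not describe how any particular system stores this data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RegionOfInterest:
    """A labeled rectangle inside an image or frame, in pixel coordinates."""
    label: str
    x: int
    y: int
    width: int
    height: int


@dataclass
class VisualRecord:
    """One image or video plus the metadata that grows around it over time."""
    uri: str  # hypothetical location of the pixels
    metadata: Dict[str, object] = field(default_factory=dict)  # known at creation
    regions: List[RegionOfInterest] = field(default_factory=list)  # added later
    feature_vector: Optional[List[float]] = None  # added later, if at all


# Attributes that exist when the scan is captured.
scan = VisualRecord(
    uri="scans/patient_0001.png",
    metadata={"patient_age": 54, "source": "MRI", "location": "radiology"},
)

# Enrichment that arrives later: an annotation and a feature vector.
scan.regions.append(RegionOfInterest("lesion", x=120, y=80, width=40, height=32))
scan.feature_vector = [0.12, 0.87, 0.33]
```

Even in this toy form, a single item already mixes a blob reference, key-value attributes, geometric annotations, and a numeric vector, which is the multimodal mix the problems below revolve around.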

Depending on how far along an organization is in its ML deployment journey, it faces three basic problems when working with this information-rich but complex-to-manage visual data:

The semi-duplicate dataset problem - Often, a large team of data scientists trains on smaller subsets of a larger dataset so that they can develop models that focus on different classes of entities. For instance, one model may be trained to recognize different animals and another to recognize dogs specifically. Some of the popular ML models in use today often require constant retraining due to updates to input data, misclassifications, or improvements in the datasets to fix biases. Parameters describing the dataset, such as sources of data capture, annotations, or the amount of space a certain entity class occupies in an image or frame, are commonly stored in comma-separated value (.csv) or spreadsheet (.xlsx) files. As a result, for each new training cycle, the data scientists lose precious time and resources creating copies of visual data in their storage buckets and parsing the csv files to understand this data before they can prepare it for consumption by ML frameworks like PyTorch and finally launch the training tasks. Given that their teammates might be training for potentially overlapping classes (e.g. all dogs are animals), this can also result in duplication of the dataset across the team, wasting not just time but also the storage, networking, and compute resources involved in replicating data.
The technical debt / glue code problem - The primary challenge with visual data is its multimodal nature. When creating infrastructure to store and search efficiently, besides handling the size and volume of visual data, the solution needs to tackle images, potentially videos or individual frames, regions of interest within these images or frames along with corresponding labels, and all the other application metadata. Given the lack of visual-first data management options that understand these special characteristics, this visual data and metadata are often scattered across multiple disparate systems such as cloud buckets and databases, with wrapper scripts to bind queries to multiple systems and interchange formats. This is essentially glue code. As visual data is often pre-processed as part of an ML pipeline (e.g. cropped, zoomed, rotated, normalized), additional glue code is continually added to these scripts to layer data transformations and ML functionalities. This glue code leads to an increasing amount of technical debt, with multiple data access points and a maintenance nightmare that worsens as an ML deployment scales to tackle larger datasets. It requires constant upkeep as the versions or interfaces of various components in the pipeline change, causing increased usage of resources (extra engineers, more infrastructure), go-to-market (GTM) delays, increased risk of infrastructure failure, and loss of revenue.
The ML-in-practice problem - ML practitioners need tools to manipulate datasets. For instance, they need the ability to explore a given visual dataset to ensure they are creating a balanced training set (e.g. an animal dataset should contain not just cats or dogs but horses, lions, tigers and other animals). Once such a dataset is identified, and while experimenting to find the model that achieves the best accuracy for a desired task or comparing various models, the dataset needs to be stable, like a snapshot. The lack of ability to search through visual datasets and create snapshots of the desired dataset across the glue code layers discussed earlier leads to extremely slow alternatives: manual inspection and copies as checkpoints. Beyond these, certain teams might want to consider using feature vectors to speed up their ML or to perform similarity searches. Given that there are limited options for feature indexing and searching, especially ones that persist across reboots, most teams resort to internal solutions. Solutions to all these ML-in-practice problems tend to be team or organization specific, and are often not well integrated with the wrapper or glue scripts described earlier, adding further to the mountain of technical debt.
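As a rough illustration of the semi-duplicate dataset problem, the sketch below parses a small metadata file and shows how two independently selected training subsets overlap. The CSV contents, column names, and the `select_paths` helper are hypothetical; a real pipeline would read files from a storage bucket rather than an inline string.

```python
import csv
import io

# Hypothetical metadata file contents, inlined so the sketch is self-contained.
METADATA_CSV = """image_path,label,source
img/001.jpg,dog,camera_a
img/002.jpg,cat,camera_a
img/003.jpg,dog,camera_b
img/004.jpg,horse,camera_b
img/005.jpg,lion,camera_a
"""


def select_paths(csv_text, wanted_labels):
    """Return the set of image paths whose label is in wanted_labels."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["image_path"] for row in reader if row["label"] in wanted_labels}


# Two teammates build their training subsets independently.
animals_subset = select_paths(METADATA_CSV, {"dog", "cat", "horse", "lion"})
dogs_subset = select_paths(METADATA_CSV, {"dog"})

# Any path in the intersection is stored twice if each subset is copied
# into its own storage bucket.
duplicated = animals_subset & dogs_subset
print(sorted(duplicated))
print(f"{len(duplicated)} of {len(dogs_subset)} dog images would be duplicated")
```

In this toy case every image in the dog subset is also in the animal subset, so copying both subsets stores those images twice, and the parsing step is repeated by each teammate on every training cycle.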

Visual data management in the context of ML and data science is one of the early pain points that teams across industries need to address before they can get the desired results from ML. Beyond its impact on user productivity, there is a sizeable business impact. Resources are misused or overused for lack of a unified solution. There is a hiring cost, from needing more data scientists or from a mismatch in engineering skill sets. Finally, and most importantly, there is a market cost associated with the delays that result from setting up infrastructure. We believe these problems can be solved by creating a new way to manage visual datasets, which lays the path for an increasingly ML-driven future.

ML-Ready Visual Database Infrastructure

To solve the visual data management problems and create a solution that brings step-change innovation, we asked ourselves:

Could we design a high-performance, scalable system that recognized the unique nature of visual data and offered interfaces designed to handle it?
What would ML users’ lives look like if they could spend most of the time focusing on ML and data science rather than worrying about their data infrastructure?
Could we combine feature search with metadata search to more closely match expected results from a user query?
Could we offer a unified interface and backend infrastructure that can cater to all the stages of ML and any use case of visual data?
Could we do more for visual ML?
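To give a feel for the question about combining feature search with metadata search, here is a minimal Python sketch that first filters items by a metadata predicate and then ranks the survivors by cosine similarity to a query vector. It is an assumption-laden toy, not ApertureDB's API or implementation: the catalog, field names, and the `search` function are hypothetical, and a real system would use an index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical catalog: each item carries metadata and a feature vector.
CATALOG = [
    {"id": "img1", "label": "dog", "year": 2019, "vector": [1.0, 0.0]},
    {"id": "img2", "label": "dog", "year": 2015, "vector": [0.9, 0.1]},
    {"id": "img3", "label": "cat", "year": 2020, "vector": [1.0, 0.0]},
    {"id": "img4", "label": "dog", "year": 2021, "vector": [0.0, 1.0]},
]


def search(catalog, query_vector, predicate, k):
    """Keep items that pass the metadata predicate, then return the ids of
    the k items most similar to query_vector."""
    candidates = [item for item in catalog if predicate(item)]
    ranked = sorted(
        candidates,
        key=lambda item: cosine_similarity(query_vector, item["vector"]),
        reverse=True,
    )
    return [item["id"] for item in ranked[:k]]


# "Dogs captured in 2018 or later that look most like the query image."
results = search(
    CATALOG,
    query_vector=[1.0, 0.0],
    predicate=lambda item: item["label"] == "dog" and item["year"] >= 2018,
    k=2,
)
print(results)
```

A pure similarity search would also return the cat image and the 2015 dog image, since both are close to the query vector. Applying the metadata constraint first is what brings the results closer to what the user actually asked for.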

These questions led us to create the open source Visual Data Management System. Using this new system, we enabled a new class of applications to scale to much larger data sizes at radically improved performance. This open source system forms the core of our product, ApertureDB: a unique, purpose-built database for visual analytics.

Introducing ApertureDB

ApertureDB stores and manages images, videos, feature vectors, and associated metadata like annotations. It natively supports complex searching and preprocessing operations over media objects. ApertureDB’s visual data-first approach saves hundreds of hours of data platform engineering effort spent by data science and ML engineering teams, setting them up for success when scaling their visual analytics pipelines. It removes the time-consuming tasks of manually linking visual data with metadata, the related access challenges, and the overhead of maintaining multiple disparate data systems.

Using ApertureDB, (potentially smaller) ML and data science teams can focus on application development and on providing value to their customers. By offloading data infrastructure scaling to ApertureDB, they get an average 15x increase in data access speed. For large ML deployments, ApertureDB reduces network overhead by up to 63% through the optimizations it offers via the unified interface.

Partner with us - use ApertureDB

If your organization uses or intends to use ML on visual data (small or large team), or you are simply curious about our technology, our approach to infrastructure development, and where we are headed, please contact us at team@aperturedata.io or sign up for a free trial.

We will be documenting our journey in these blogs. Subscribe to follow along.

I want to thank Luis Remis, ApertureData co-founder, for helping focus the content. I also want to acknowledge the insights and valuable edits from Namrata Banerjee, Jim Blakley, Jonathan Gray, Priyanka Tembey, and Romain Cledat.

Tags:

Visual Data

Dataset preparation and management

Machine Learning

Usability and Debugging

