
Managing Visual Data for Machine Learning and Data Science. Painlessly.

October 15, 2024
7 min read
Vishakha Gupta

Image and video data, or visual data, has seen unprecedented growth in the last few years. Applications across domains are shifting to Machine Learning (ML) and Data Science to create new products with better user experiences and to derive insights from this vast and rich collection of visual data. These insights help businesses gain a better understanding of their customers and provide inference points for making complex decisions.

c. 2016

In 2016, Luis, the rest of our team at Intel Labs, and I started looking at visual cloud infrastructure for large-scale ML deployments. Since then, we have spoken with hundreds of data engineers, ML (infrastructure) engineers, data scientists, and systems researchers working in multiple application domains, such as medical imaging, smart retail, sports, entertainment, and smart cities. These conversations have confirmed both the tremendous progress made in improving the performance and accuracy of ML models and the shift in focus toward building infrastructure for large-scale deployment and improving data quality. Practitioners routinely tell us that big visual data management is either an active problem for them or one they see on the very near horizon. These insights, and our desire to address the challenges of visual data management, led us to form ApertureData. To understand our solution, let us first look more closely at the issues users face.

Visual Data Infrastructure Challenges Today

Visual data is a collection of images and videos that typically grows over time. For example, visual data could be X-rays or MRI scans of patients in the radiology department of a health center, pictures of clothes from different retailers, or traffic camera videos used to detect pedestrian patterns. This visual data is usually accompanied by metadata, such as patient age, source of data capture, date, location, and other attributes that exist at the time of creation. Over time, this metadata is enriched with region-of-interest annotations, feature vectors, and more application context. The visual data itself may be needed in different resolutions or formats depending on the end goal, for example, display vs. training.
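To make the shape of this data concrete, here is a minimal sketch, using plain Python dataclasses, of how a single image and the metadata that accrues around it over time might be modeled. The class and field names (`ImageRecord`, `RegionOfInterest`, `embedding`, `renditions`) and the storage URIs are hypothetical illustrations, not ApertureDB's schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class RegionOfInterest:
    """A labeled bounding box inside an image or video frame (hypothetical model)."""
    x: int
    y: int
    width: int
    height: int
    label: str


@dataclass
class ImageRecord:
    """One visual asset plus the metadata that accumulates around it over time."""
    image_id: str
    uri: str  # where the original bytes live (hypothetical storage location)
    properties: Dict[str, object] = field(default_factory=dict)  # capture-time metadata
    regions: List[RegionOfInterest] = field(default_factory=list)  # added later by annotators
    embedding: Optional[List[float]] = None  # added later by a feature extractor
    renditions: Dict[str, str] = field(default_factory=dict)  # e.g. "thumbnail" -> URI of a resized copy

    def labels(self) -> Set[str]:
        """Return the distinct labels across all annotated regions."""
        return {r.label for r in self.regions}


# At creation time, only the capture metadata exists.
scan = ImageRecord(
    "xr-0001",
    "s3://example-bucket/xr-0001.png",
    properties={"modality": "x-ray", "patient_age": 54, "site": "radiology"},
)

# Over time, the same record is enriched with annotations, features, and derived versions.
scan.regions.append(RegionOfInterest(120, 80, 64, 64, "nodule"))
scan.embedding = [0.12, -0.40, 0.33]
scan.renditions["thumbnail"] = "s3://example-bucket/xr-0001_256px.png"

print(scan.labels())
```

Keeping the pixels' location, capture metadata, annotations, features, and derived versions linked in one record is what lets a question such as "X-rays with a nodule annotation from patients over 50" be answered without stitching together separate systems.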

Depending on how far along it is in its ML deployment journey, an organization faces three basic problems when working with this information-rich but complex-to-manage visual data:

  1. The semi-duplicate dataset problem - Data scientists on a large team often train on smaller subsets of a larger dataset so that they can develop models that focus on different classes of entities, for instance, a model that recognizes many different animals versus one that recognizes only dogs. Many popular ML models require frequent retraining due to updates to input data, misclassifications, or dataset improvements made to fix biases. Parameters describing the dataset, such as sources of data capture, annotations, or the amount of space a certain entity class occupies in an image or frame, are often stored in spreadsheet files such as comma-separated values (.csv) or Excel (.xlsx) files. As a result, for each new training cycle, data scientists lose precious time and resources creating copies of visual data in their storage buckets, parsing these files to understand the data, preparing it for consumption by ML frameworks like PyTorch, and finally launching the training tasks. Because teammates might be training on overlapping classes (e.g., all dogs are animals), the dataset can also end up duplicated across the team, wasting not just time but also the storage, networking, and compute resources involved in replicating data.
  2. The technical debt / glue code problem - The primary challenge with visual data is its multimodal nature. When creating infrastructure to store and search it efficiently, besides handling the size and volume of visual data, the solution needs to handle images, potentially videos or individual frames, regions of interest within these images or frames along with their labels, and all the other application metadata. Lacking visual-first data management options that understand these characteristics, teams often scatter visual data and metadata across multiple disparate systems, such as cloud buckets and databases, with wrapper scripts that bind queries across those systems and translate between interchange formats. This is essentially glue code. Because visual data is often preprocessed as part of an ML pipeline (e.g., cropped, zoomed, rotated, normalized), more glue code is continually added to these scripts to layer on data transformations and ML functionality. This glue code accumulates technical debt: multiple data access points and a maintenance nightmare that worsens as an ML deployment scales to larger datasets. It requires constant upkeep as the versions or interfaces of pipeline components change, leading to increased resource usage (extra engineers, more infrastructure), go-to-market (GTM) delays, higher risk of infrastructure failure, and lost revenue.
  3. The ML-in-practice problem - ML practitioners need tools to manipulate datasets. For instance, they need to explore a visual dataset to ensure they are creating a balanced training set (e.g., an animal dataset should contain not just cats or dogs but also horses, lions, tigers, and other animals). Once such a dataset is identified, it needs to stay stable, like a snapshot, while they experiment to find the model with the best accuracy for a task or compare various models. Without the ability to search through visual datasets and snapshot the desired dataset across the glue code layers discussed earlier, teams fall back on extremely slow alternatives: manual inspection, and copies as checkpoints. Beyond these, some teams may want to use feature vectors to speed up their ML or to perform similarity searches. Given the limited options for feature indexing and search, especially ones that persist across reboots, most teams resort to internal solutions. Solutions to all these ML-in-practice problems tend to be team- or organization-specific and are often poorly integrated with the wrapper or glue scripts described earlier, adding further to the mountain of technical debt.
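The first and third problems share a root cause: subsets are materialized as copies rather than expressed as queries. A minimal sketch, assuming a single in-memory catalog of metadata records, illustrates the alternative: each teammate's training set is a predicate over shared metadata, balance is checked by counting labels, and a snapshot is a frozen list of IDs plus a hash that can be recorded with an experiment. The `CATALOG` schema and the helpers `select`, `class_balance`, and `snapshot` are hypothetical, not an ApertureDB API.

```python
import hashlib
import json
from collections import Counter

# A single shared catalog of metadata records (hypothetical schema). The image
# bytes stay in one place; only these lightweight records are queried.
CATALOG = [
    {"id": "img-001", "label": "dog", "source": "camera-a"},
    {"id": "img-002", "label": "cat", "source": "camera-b"},
    {"id": "img-003", "label": "dog", "source": "camera-b"},
    {"id": "img-004", "label": "horse", "source": "camera-a"},
    {"id": "img-005", "label": "lion", "source": "camera-c"},
]

ANIMALS = {"dog", "cat", "horse", "lion", "tiger"}


def select(predicate):
    """Return IDs of records matching a metadata predicate, without copying any images."""
    return [r["id"] for r in CATALOG if predicate(r)]


def class_balance(ids):
    """Count how many selected items fall under each label."""
    by_id = {r["id"]: r for r in CATALOG}
    return Counter(by_id[i]["label"] for i in ids)


def snapshot(ids):
    """Freeze a selection as a sorted ID tuple plus a short content hash."""
    frozen = tuple(sorted(ids))
    digest = hashlib.sha256(json.dumps(frozen).encode("utf-8")).hexdigest()[:12]
    return frozen, digest


# Two teammates define overlapping training sets as queries over the same catalog.
dogs = select(lambda r: r["label"] == "dog")
animals = select(lambda r: r["label"] in ANIMALS)

balance = class_balance(animals)
missing = ANIMALS - set(balance)  # classes with no examples at all
print(balance, "missing:", missing)

ids, tag = snapshot(dogs)
print(ids, tag)
```

Because a snapshot stores IDs rather than pixels, overlapping sets such as "dogs" and "animals" reference the same underlying images instead of duplicating them, and the hash changes only if the membership of the set changes.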

Visual data management in the context of ML and data science is one of the early pain points that teams across industries need to address to get the results they want from ML. Beyond its impact on user productivity, it has a sizeable business impact. Resources are misused or overused for lack of a unified solution; there is a hiring cost from needing more data scientists or from mismatched engineering skill sets; and finally, but most importantly, there is a market cost associated with the delays in setting up infrastructure. We believe these problems can be solved by creating a new way to manage visual datasets, one that lays the path for an increasingly ML-driven future.

ML-Ready Visual Database Infrastructure

To solve these visual data management problems and create a solution that brings step-change innovation, we asked ourselves:

  1. Could we design a high-performance, scalable system that recognized the unique nature of visual data and offered interfaces designed to handle it?
  2. What would ML users’ lives look like if they could spend most of their time focusing on ML and data science rather than worrying about their data infrastructure?
  3. Could we combine feature search with metadata search to more closely match expected results from a user query?
  4. Could we offer a unified interface and backend infrastructure that can cater to all the stages of ML and any use case of visual data?
  5. Could we do more for visual ML?
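Question 3 can be illustrated with a brute-force sketch: restrict candidates by a metadata constraint, then rank the survivors by cosine similarity of their feature vectors. The records, field names, and the `filtered_knn` helper are hypothetical; a production system would typically use an approximate nearest-neighbor index rather than a linear scan, and this is not a description of how ApertureDB implements it.

```python
import math

# Hypothetical records: each image has metadata plus a feature vector (embedding).
ITEMS = [
    {"id": "a", "store": "retailer-1", "vec": [1.0, 0.0, 0.0]},
    {"id": "b", "store": "retailer-2", "vec": [0.9, 0.1, 0.0]},
    {"id": "c", "store": "retailer-1", "vec": [0.0, 1.0, 0.0]},
    {"id": "d", "store": "retailer-1", "vec": [0.8, 0.2, 0.1]},
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filtered_knn(query_vec, k, where):
    """Return IDs of the k most similar items among those whose metadata satisfies `where`.

    The metadata filter is applied first (pre-filtering), so every returned item
    is guaranteed to match the constraint. Brute force: O(n) per query.
    """
    candidates = [it for it in ITEMS if where(it)]
    ranked = sorted(candidates, key=lambda it: cosine(query_vec, it["vec"]), reverse=True)
    return [it["id"] for it in ranked[:k]]


print(filtered_knn([1.0, 0.0, 0.0], k=2, where=lambda it: it["store"] == "retailer-1"))
# → ['a', 'd']
```

Item `b` is nearly identical to the query but belongs to the wrong retailer; a pure vector search would rank it second, which is exactly the gap between feature-only results and what the user actually asked for.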

These questions led us to create the open-source Visual Data Management System. Using this system, we enabled a new class of applications to scale to much larger data sizes with radically improved performance. This open-source system forms the core of our product, ApertureDB: a unique, purpose-built database for visual analytics.

Introducing ApertureDB

ApertureDB stores and manages images, videos, feature vectors, and associated metadata like annotations. It natively supports complex search and preprocessing operations over media objects. ApertureDB’s visual-data-first approach saves data science and ML engineering teams hundreds of hours of data platform engineering effort, setting them up for success when scaling their visual analytics pipelines. It removes the time-consuming tasks of manually linking visual data with metadata, the related access challenges, and the overhead of maintaining multiple disparate data systems.

Using ApertureDB, ML and data science teams (potentially smaller ones) can focus on application development and on providing value to their customers. By offloading data infrastructure scaling to ApertureDB, they get an average 15x increase in data access speed. For large ML deployments, ApertureDB reduces network overhead by up to 63%, thanks to optimizations enabled by its unified interface.

Partner with us - use ApertureDB

If your organization, whether a small or large team, uses or intends to use ML on visual data, or if you are simply curious about our technology, our approach to infrastructure development, and where we are headed, please contact us at team@aperturedata.io or sign up for a free trial.

We will be documenting our journey in these blogs; subscribe to follow along.

I want to thank Luis Remis, ApertureData co-founder, for helping focus the content. I also want to acknowledge the insights and valuable edits from Namrata Banerjee, Jim Blakley, Jonathan Gray, Priyanka Tembey, and Romain Cledat.
