
What’s in Your Visual Dataset?

October 15, 2024
Vishakha Gupta

The great thing about being a founder is the conversations I get to have with data science and AI users solving so many different problems with their use of images, videos or both. The sheer number of new solutions built using visual data and the problems they solve are inspiring. However, I have also learned about many challenges and patchwork solutions, often around infrastructure for this type of data, that make me wonder: why is a particular problem sometimes solved in surprisingly convoluted ways or not at all? One such problem is about looking into the data used and understanding it.

Why Peek Inside Your Complex, Computer Vision Data?

During discovery calls, when I ask users about their most pressing problems, one that comes up often is how hard it is to understand their data and to efficiently visualize large amounts of it.

Know What You Have

Collecting good datasets is known to be challenging, particularly when dealing with computer vision data. Companies often have to purchase datasets to train their models. Regardless of whether the dataset pre-existed, was collected in-house, or was purchased, data science and analytics teams need to view or navigate through it in order to understand what the data looks like. Better understanding can lead to better models, faster.

Data-centric Model Debugging

A common use case for visual AI teams is to train and fine-tune models in order to improve them and to accommodate new data. Naturally, the ability to search and navigate their visual datasets is invaluable when they are trying to debug how these datasets affect their models. You can’t debug what you can’t see! It’s as simple as that.

Application Insights

Assuming you have information about who was in an image or a video, where they were, and what they were doing, the ability to gather insights from existing data can be very valuable. Really, isn’t that the end goal for any analytics effort? For example, “How many people were in the area of interest yesterday?” could tell a store manager how well their product arrangement is working, or querying for the queue length at a security checkpoint can help plan for more resources. This means not just giving a file name and rendering an image or video in a browser but being able to search using intelligence data and see what matches.
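To make the store manager example concrete, here is a minimal sketch of such a question asked over detection metadata. The record layout (track_id, zone, day) and the values are hypothetical, chosen only for illustration; a real deployment would pull these from wherever the application metadata is stored.

```python
from datetime import date

# Hypothetical detection metadata: one record per person detection in a frame.
# Field names and values are illustrative, not from any real system.
detections = [
    {"track_id": 1, "zone": "aisle_3", "day": date(2024, 10, 14)},
    {"track_id": 1, "zone": "aisle_3", "day": date(2024, 10, 14)},  # same person, later frame
    {"track_id": 2, "zone": "aisle_3", "day": date(2024, 10, 14)},
    {"track_id": 3, "zone": "aisle_5", "day": date(2024, 10, 14)},
    {"track_id": 4, "zone": "aisle_3", "day": date(2024, 10, 15)},
]


def count_people(records, zone, day):
    """Count distinct tracked people seen in a zone on a given day."""
    return len({r["track_id"] for r in records
                if r["zone"] == zone and r["day"] == day})


print(count_people(detections, "aisle_3", date(2024, 10, 14)))
```

Counting distinct track IDs rather than raw detections avoids counting the same person once per frame.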

How Do Data Teams Query Visual Datasets Today?

Just like platform engineers look at logs to debug, people working with data need to find, analyze, pre-process as needed, and visualize their images and videos along with any additional information like labels, regions of interest, and application metadata.

However, I have spoken to machine learning (ML) engineers who have had to train models without really knowing how good the dataset was or what the images in it looked like. At best, they scanned through a few images, because it was too complicated to query a subset and understand it.

Even to scan a few, we have heard quite a few painful stories where the most obvious choice is to find relevant images, download them to local folders, find the right viewers, struggle with encodings in some cases, and then suffer even further to see the effect of any augmentations to the data. Some write scripts that generate HTML files to display the images in the desired format whenever they want to visualize a large amount of data or results. Some create simple web pages that filter by one identifying metadata property so the images pop up, which just barely meets the definition of a UI.
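For a sense of what these stopgap scripts look like, here is a minimal sketch of the "generate an HTML file" approach using only the Python standard library. The function name, file suffix list, and output file name are assumptions for illustration, not taken from any team's actual tooling.

```python
import html
from pathlib import Path

# File suffixes treated as images; an illustrative, non-exhaustive list.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif"}


def build_gallery(image_dir, out_file="gallery.html"):
    """Write a bare-bones HTML page listing every image in image_dir.

    The page is written into the same folder so relative paths resolve.
    Returns the number of images included.
    """
    folder = Path(image_dir)
    images = sorted(p for p in folder.iterdir()
                    if p.suffix.lower() in IMAGE_SUFFIXES)
    figures = [
        '<figure><img src="{0}" width="256"><figcaption>{0}</figcaption></figure>'
        .format(html.escape(p.name))
        for p in images
    ]
    page = "<!DOCTYPE html>\n<html><body>\n" + "\n".join(figures) + "\n</body></html>\n"
    (folder / out_file).write_text(page, encoding="utf-8")
    return len(images)
```

Note what is missing: no filtering by metadata, no annotations, no way to preview augmentations, and no support for video. Each of those becomes another script.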

This problem is even worse with videos. We have learnt about quite a few efforts for visualizing videos where ML teams had to wrangle various video encodings, and had trouble when trying to process them or when they were too large.

Because the need to visualize data is so prominent, teams that don’t spend resources building the bare-minimum tools described above tend to find (often poor) substitutes to work around the problem. They may try to repurpose model testing, data curation, or labeling tools in an attempt to visualize and understand parts of their workflows and how their data fits into them.

But whichever slice of ML tooling you go with for the sake of visualizing your data, what do you do for other use cases that also rely on the same data? For example, how do you associate labels or access data for training or inference? Will your chosen tool let you (1) examine the metadata information to start filtering, (2) see what the pre-processed or augmented version of the data would look like, or (3) create custom queries which could then be used within ML pipelines?

These are features that would naturally be supported if visual data were managed by a database that understood it.

ApertureDB for Easily Viewing Your Image and Video Datasets

ApertureDB is a unique database that natively recognizes images, videos, feature vectors that represent their content, and annotations that indicate where objects of interest are. Because the applications that need such a database are ML applications, ApertureDB natively supports pre-processing or augmentation of this data, queries based on annotations or application metadata, and near neighbor searches, all through its query API. Something that our users find very useful is its graphical interface, the ApertureDB UI.
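As a rough illustration of what "natively supported through the query API" can mean, the sketch below builds a JSON query that filters images by a metadata property and asks for them resized on the way out. The command layout follows my understanding of the ApertureDB JSON query language and should be checked against the official documentation; the property name "category" and the helper function are hypothetical.

```python
import json


def find_resized_images(category, width, height):
    """Build a query that filters images by a property and resizes the results.

    Assumption: the FindImage command accepts "constraints", "operations",
    "results", and "blobs" as shown; "category" is a made-up property name.
    """
    return [{
        "FindImage": {
            "constraints": {"category": ["==", category]},
            "operations": [{"type": "resize", "width": width, "height": height}],
            "results": {"list": ["category"]},
            "blobs": True,  # ask for image data, not only metadata
        }
    }]


query = find_resized_images("shoes", 224, 224)
print(json.dumps(query, indent=2))
```

The point of the design is that filtering and pre-processing travel together in one request, so the client never has to download originals just to resize them locally.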

Salient Features of ApertureDB UI

ApertureDB UI gives our users an easy way to get started with ApertureDB. Like any database UI, ApertureDB UI allows them to query and explore the supported data types.

Check out our demo video to see the UI in action.

Know Your Metadata

Metadata, particularly from the application context, is key to making sense of the data. For example, our e-commerce users often want to find images of the type “silo” in order to create a clean training dataset, while our smart retail users are often interested in videos of specific events that were shot in the last 24 hours. Similar queries come up for media, medical imaging, smart city, and other computer vision based applications. All of these are application metadata elements that are part of ApertureDB metadata and stored in a property graph format.
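The smart retail example, "videos of a specific event shot in the last 24 hours", could be expressed roughly as follows. This is a sketch under assumptions: the property names "event" and "recorded_at" are hypothetical, and the FindVideo layout and the {"_date": ...} wrapper for date values reflect my understanding of the ApertureDB query language, so verify them against the documentation before relying on them.

```python
import json
from datetime import datetime, timedelta, timezone


def recent_event_videos(event, now=None, hours=24):
    """Build a query for videos of a given event recorded in the last `hours` hours.

    "event" and "recorded_at" are hypothetical property names chosen by
    whoever loads the data; they are not built into the database.
    """
    now = now or datetime.now(timezone.utc)
    since = now - timedelta(hours=hours)
    return [{
        "FindVideo": {
            "constraints": {
                "event": ["==", event],
                "recorded_at": [">=", {"_date": since.isoformat()}],
            },
            "results": {"list": ["event", "recorded_at"]},
        }
    }]


fixed_now = datetime(2024, 10, 15, 12, 0, tzinfo=timezone.utc)
print(json.dumps(recent_event_videos("queue_buildup", now=fixed_now), indent=2))
```

Passing `now` explicitly keeps the query reproducible, which helps when the same filter is reused inside an ML pipeline.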

Usually, the data engineering teams are responsible for populating the database while the data scientists or analysts query it. The “Status” tab, shown in the figure below, gives an overview of the entire graph schema and comes in very handy for anyone who wants to know what’s already loaded in the database before writing queries.

Graphically Filter And Visualize Images Or Videos

From the “Image Search” or “Video Search” tabs shown in the figure below, users can visually explore their data, filter by metadata properties, display annotations, and apply any other supported operations whose results they might want to visualize. There is a handy toggle to show the actual API query sent to the database and the JSON response received by the UI.

Peek Into The Annotations, Maybe Even Fix Up A Few

If you click on any searched image, you can not only see all the associated metadata properties that you asked for but also any annotations that were linked to the image, overlaid on it with their labels. Our UI already supports quick fixes to these annotations which are propagated back to the database. This can be very useful when you notice some deviations or errors after your explicit labeling step.

There are also some access control features within the UI that are out of scope for this blog.

What’s Next?

There are numerous enhancements planned and in progress which will continue to improve this UI, driven by customer feedback and use cases. For example, a) support for searching data by labels, b) support for near neighbor searches or feature classifications, and c) enhanced support for videos where users can see metadata per frame or on key frames and essentially make use of all the capabilities offered by our API.

We are a customer-driven company and welcome your feedback on what could help us further enhance our product. Please share your thoughts on the most important or useful capabilities at team@aperturedata.io and subscribe here for the latest on how we are helping mainstream AI on vision data. You can try the UI through our online trial. If you’re excited to join an early-stage startup and make a big difference, we’re hiring.

I want to acknowledge the insights and valuable edits from Priyanka Somrah, Steve Huber, and Josh Stoddard.
