The great thing about being a founder is the conversations I get to have with data science and AI users solving so many different problems with images, videos, or both. The sheer number of new solutions built on visual data, and the problems they solve, is inspiring. However, I have also learned about many challenges and patchwork solutions, often around infrastructure for this type of data, that make me wonder: why is a particular problem sometimes solved in surprisingly convoluted ways, or not at all? One such problem is simply being able to look at the data being used and understand it.
Why Peek Inside Your Complex Computer Vision Data?
During discovery calls, when I ask users about their most pressing problems, one that comes up often is the difficulty of understanding their data and efficiently visualizing large amounts of it.
Know What You Have
Collecting good datasets is known to be challenging, particularly when dealing with computer vision data. Companies often have to purchase datasets to train their models. Regardless of whether the dataset pre-existed, was collected in-house, or was purchased, data science and analytics teams need to view and navigate it in order to understand what the data looks like. Better understanding can lead to better models, faster.
Data-centric Model Debugging
A common use case for visual AI teams is training and fine-tuning models, both to improve them and to accommodate new data. Naturally, the ability to search and navigate their visual datasets is invaluable when they are trying to debug how those datasets affect their models. You can’t debug what you can’t see! It’s as simple as that.
Application Insights
Assuming you have information about who was in an image or a video, where they were, and what they were doing, the ability to gather insights from existing data can be very valuable. Really, isn’t that the end goal of any analytics effort? For example, “How many people were in the area of interest yesterday?” could tell a store manager how well their product arrangement is working, or a query for the queue length at a security checkpoint can help plan for more resources. This means not just supplying a file name and rendering an image or video in a browser, but being able to search using the intelligence extracted from the data and see what matches.
How Do Data Teams Query Visual Datasets Today?
Just like platform engineers look at logs to debug, people working with data need to find, analyze, pre-process (as needed), and visualize their images and videos, along with any additional information like labels, regions of interest, and application metadata.
However, I have spoken to machine learning (ML) engineers who have had to train models without really knowing how good the dataset is or what the images in it look like, because it was too complicated to query even a subset and scan through a few.
Even to get to the point of scanning a few, we have heard quite a few painful stories: the most obvious path is to find the relevant images, download them into local folders, find the right viewers, struggle with encodings in some cases, and then suffer even further to see the effect of any augmentations to the data. Some teams write scripts to generate HTML files that display the images in the desired format whenever they want to visualize a large amount of data or results. Some create simple web pages that filter by a single identifying metadata property so the images pop up, i.e. just barely meeting the definition of a UI.
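To make that workaround concrete, here is a minimal sketch of the kind of throwaway script teams end up writing just to eyeball a downloaded subset of images. The folder path, file pattern, and output name are all hypothetical:

```python
# Minimal sketch of a throwaway "image gallery" script; paths are hypothetical.
from pathlib import Path

IMAGE_DIR = Path("downloaded_subset")  # images already pulled down locally
OUT_FILE = Path("gallery.html")        # open this file in a browser afterwards

def build_gallery(image_dir: Path, out_file: Path) -> None:
    """Write a bare-bones HTML page that displays every JPEG in a folder."""
    figures = [
        f'<figure><img src="{p.as_posix()}" width="256">'
        f"<figcaption>{p.name}</figcaption></figure>"
        for p in sorted(image_dir.glob("*.jpg"))
    ]
    out_file.write_text(
        "<html><body style='display:flex;flex-wrap:wrap'>"
        + "\n".join(figures)
        + "</body></html>"
    )

if __name__ == "__main__":
    build_gallery(IMAGE_DIR, OUT_FILE)
```

Scripts like this work until the next question arrives: a different filter, an augmented view, or a dataset too large to download locally.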
This problem is even worse with videos. We have learned about quite a few efforts to visualize videos where ML teams had to wrangle various video encodings and ran into trouble when the files were hard to process or simply too large.
Given how prominent the need to visualize data is, teams that don’t spend resources building the bare-minimum tools described above tend to find (often poor) substitutes to work around the problem. They may try to repurpose model testing, data curation, or labeling tools in an attempt to visualize and understand parts of their workflows and how their data fits into them.
But whichever slice of ML tooling you choose for the sake of visualizing your data, what do you do for other use cases that rely on the same data? For example, how do you associate labels or access data for training or inference? Will your chosen tool let you (1) examine the metadata to start filtering, (2) see what a pre-processed or augmented version of the data would look like, or (3) create custom queries that could then be used within ML pipelines?
These are features that would naturally be supported if visual data were managed by a database that understood it.
ApertureDB for Easily Viewing Your Image and Video Datasets
ApertureDB is a unique database that natively recognizes images, videos, feature vectors that represent their content, and annotations that indicate where objects of interest are. Because the applications that need such a database are typically ML applications, ApertureDB natively supports pre-processing and augmentation of that data, queries based on annotations or application metadata, and near neighbor searches, all through its query API. Something our users find very useful is its graphical interface, the ApertureDB UI.
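To give a flavor of that query API, here is a minimal sketch using the aperturedb Python client. The connection parameters and the property names (dataset_name, image_id) are placeholders for illustration, and the exact client interface can vary slightly across versions:

```python
# A minimal sketch, assuming the aperturedb Python client; connection details
# and property names below are placeholders, not a real deployment.
from aperturedb import Connector

db = Connector.Connector(host="localhost", user="admin", password="admin")

# Find images matching a metadata constraint, resize them on the server,
# and return both the selected properties and the (pre-processed) pixels.
query = [{
    "FindImage": {
        "constraints": {"dataset_name": ["==", "product_catalog"]},   # hypothetical property
        "operations": [{"type": "resize", "width": 224, "height": 224}],
        "results": {"list": ["image_id"], "limit": 5},                 # hypothetical property
        "blobs": True,
    }
}]

response, blobs = db.query(query)
print(response)    # JSON status plus the requested metadata
print(len(blobs))  # resized image bytes, ready to decode or display
```

The UI’s query toggle (described below) shows this same JSON form, so anything explored interactively can later be scripted into an ML pipeline.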
Salient Features of ApertureDB UI
ApertureDB UI gives our users an easy way to get started with ApertureDB. Like any database UI, ApertureDB UI allows them to query and explore the supported data types.
Check out our demo video to see the UI in action.
Know Your Metadata
Metadata, particularly from the application context, is key to making sense of the data. For example, our e-commerce users often want to find images of the type “silo” in order to create a clean training dataset, while our smart retail users are often interested in videos of specific events shot in the last 24 hours. Similar queries come up in media, medical imaging, smart city, and other computer vision applications. All of these attributes are application metadata, which ApertureDB stores in a property graph format.
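As a rough sketch, the two examples above might translate into queries like the following, executed the same way as the earlier client sketch. The property names (image_type, captured_at) are hypothetical, and the date-literal syntax may differ depending on your ApertureDB version:

```python
from datetime import datetime, timedelta, timezone

# "Silo" product shots for a clean training set (image_type is a hypothetical property).
silo_images = [{
    "FindImage": {
        "constraints": {"image_type": ["==", "silo"]},
        "results": {"list": ["image_type"], "limit": 20},
        "blobs": True,
    }
}]

# Videos captured in the last 24 hours (captured_at is a hypothetical date property;
# the {"_date": ...} wrapper is the typical way to pass dates, but check your version).
since = (datetime.now(timezone.utc) - timedelta(hours=24)).isoformat()
recent_videos = [{
    "FindVideo": {
        "constraints": {"captured_at": [">=", {"_date": since}]},
        "results": {"list": ["captured_at"], "limit": 10},
    }
}]
```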
Usually, the data engineering teams are responsible for populating the database while the data scientists or analysts query it. The “Status” tab, shown in the figure below, gives an overview of the entire graph schema and comes in very handy for anyone who wants to know what’s already loaded in the database before writing queries.
Graphically Filter And Visualize Images Or Videos
From the “Image Search” or “Video Search” tabs shown in the figure below, users can visually explore their data, filter by metadata properties, display annotations, and apply any other supported operations whose results they might want to visualize. There is a handy toggle to show the actual API query sent to the database and the JSON response received by the UI.
Peek Into The Annotations, Maybe Even Fix Up A Few
If you click on any image in the search results, you can see not only all the associated metadata properties you asked for but also any annotations linked to the image, overlaid on it with their labels. Our UI already supports quick fixes to these annotations, which are propagated back to the database. This can be very useful when you notice deviations or errors after your explicit labeling step.
There are also some access control features within the UI that are out of scope for this blog.
What’s Next?
There are numerous enhancements planned and in progress that will continue to improve this UI, driven by customer feedback and use cases: for example, a) support for searching data by labels, b) support for near neighbor searches and feature classifications, and c) enhanced support for videos, where users can see metadata per frame or on key frames and essentially make use of all the capabilities offered by our API.
We are a customer driven company and welcome your feedback on what could help us further enhance our product. Please share your thoughts on the most important or useful capabilities at team@aperturedata.io and subscribe here for the latest on how we are helping mainstream AI on vision data. You can try the UI through our online trial. If you’re excited to join an early stage startup and make a big difference, we’re hiring.
I want to acknowledge the insights and valuable edits from Priyanka Somrah, Steve Huber, and Josh Stoddard.