Blogs

Minute-Made Data Preparation with ApertureDB

October 15, 2024
8 min read
Luis Remis

Working with visual data (images, videos) and its metadata is no picnic, especially if you are running analytics on it. You not only need to deal with the visual data itself, but also with multimodal metadata (e.g., bounding box coordinates and their labels, or a text description of the image's content), and even feature vectors, just to name a few. When you complete a project and attempt to apply a "cookie-cutter" approach to a new, yet similar, business problem, you discover that you have to deal with the various types of metadata all over again. There is not much re-use. Your prior experience does help, but only linearly, not exponentially. If you were wondering what is missing, you are not alone.

If you have trained an ML model to run object detection, segmentation, or tracking, you are familiar with the long list of steps needed before you even start tuning your neural network model's parameters (let alone run the training and wait for the results).

Let’s walk through some of the steps. For this, let’s use a popular dataset as an example: Common Objects in Context (or COCO for short). COCO is a large-scale object detection, segmentation, and captioning dataset that includes more than 200 thousand images, together with annotations. These annotations include bounding boxes, object segmentation, and recognition in context for about 1.5 million different objects. While COCO could be stored on a single machine (less than 100GB, including images, labels, bounding boxes, and segmentation information), its rich annotations and complexity make it a great example dataset to start with.

COCO can be used for training models on general objects (for which labels are provided), and later, using transfer learning, for training on more specific objects based on application-specific needs. You could also select a subset of the images for training, throwing away images that are not relevant to your application (i.e., images containing only objects and labels that are not relevant).
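Selecting such a subset can be sketched directly against COCO's annotation format. The snippet below is a self-contained sketch: the tiny in-memory dictionary stands in for the real `instances_train2017.json` file, which follows the same shape, and the label names are illustrative.

```python
# Minimal stand-in for COCO-style annotations (the real annotation
# JSON has the same "images"/"categories"/"annotations" structure).
coco = {
    "images": [
        {"id": 1, "file_name": "000000000001.jpg"},
        {"id": 2, "file_name": "000000000002.jpg"},
        {"id": 3, "file_name": "000000000003.jpg"},
    ],
    "categories": [
        {"id": 17, "name": "cat"},
        {"id": 18, "name": "dog"},
    ],
    "annotations": [
        {"image_id": 1, "category_id": 17},
        {"image_id": 2, "category_id": 18},
        {"image_id": 3, "category_id": 17},
    ],
}

def image_ids_with_labels(coco, wanted_labels):
    """Return ids of images containing at least one object with a wanted label."""
    wanted_ids = {c["id"] for c in coco["categories"] if c["name"] in wanted_labels}
    return {a["image_id"] for a in coco["annotations"] if a["category_id"] in wanted_ids}

relevant = image_ids_with_labels(coco, {"cat"})
print(sorted(relevant))  # only images 1 and 3 contain a "cat"
```

The same label-to-image join works unchanged on the full annotation file loaded with `json.load`.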

Now, consider an ML developer constructing a pipeline for object detection. First, you have to have all images in an accessible place, and the right data structures that associate each image and its bounding boxes with the right labels. This requires assigning consistent identifiers to the images and adding their metadata to some form of relational or key-value database. Finally, if the ML pipeline needs images of a size different from the originals in the dataset, additional compute is diverted towards pre-processing after the images are fetched. All these steps require integrating different software solutions that provide various functionalities, stitched together with a script for this specific use case.
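In practice, these steps often end up as ad-hoc glue code along these lines. This is a deliberately simplified sketch, not anyone's production pipeline; the table schema and file paths are made up for illustration:

```python
import sqlite3

# Metadata lives in a relational store...
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, path TEXT, width INT, height INT)")
conn.execute("CREATE TABLE bboxes (image_id INT, label TEXT, x REAL, y REAL, w REAL, h REAL)")
conn.execute("INSERT INTO images VALUES (1, '/data/coco/000000000001.jpg', 640, 480)")
conn.execute("INSERT INTO bboxes VALUES (1, 'dog', 10.0, 20.0, 100.0, 80.0)")

# ...while the pixels live on a filesystem, and the application must
# keep the two consistent by naming convention alone.
path = conn.execute("SELECT path FROM images WHERE id = 1").fetchone()[0]
boxes = conn.execute("SELECT label, x, y, w, h FROM bboxes WHERE image_id = 1").fetchall()
print(path, boxes)
```

Every new project re-writes some variant of this glue, which is exactly the repeated effort described above.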

More generally, data scientists and machine learning developers usually end up building an ad-hoc solution that combines databases and file systems to store metadata and visual data (images, videos), respectively. There are N technologies for dealing with visual data and M technologies for metadata. Each is mostly unaware of the others, so a connector, or "glue", has to be written in order to have a solution for a given problem, as we describe in greater detail here. Companies (or teams within the same company) each select one of these technologies according to immediate needs, which results in multiple implementations of each of the N*M connectors. These ad-hoc solutions make replicating experiments difficult, and more importantly, they do not scale well when deployed in the real world. The reason behind such complexity is the lack of a single system that can store and serve all the data the application needs.

Now, to add more real-world complexity, let's assume we work in a team of ML engineers and data scientists. There are many steps in the ML pipeline, and many people involved in the task. It is important for the team to share data, intermediate results, and good practices, and, given the usually growing scale, having multiple replicas of the dataset is no longer an option. An ad-hoc solution with loosely defined interfaces is painful to maintain and error-prone. What we need is a database system designed for visual data and ML workloads.

The good news is that at ApertureData, we have been focusing all our efforts on solving these problems. Our solution, ApertureDB, provides an API that enables smooth interaction with visual data. Images, videos, bounding boxes, feature vectors, and labels are first-class entities.

Accelerating Data Preparation

Let’s see how easy it is to interact with the COCO dataset with ApertureDB in Python. As a starting point, we have pre-populated ApertureDB with the dataset. Connecting to ApertureDB is very simple. All we need to do is instantiate a connector to the database, using the binary-compatible VDMS connector (VDMS is the original, open-source version of ApertureDB):

 import vdms

 db = vdms.vdms()  # Instantiate a DB connector

 db.connect("coco.datasets.aperturedata.io")

Now, with a connection open to the database, we can start sending queries. We can do this in two ways: using the Native API, or using the ApertureDB Object Model API. The latter allows us to interact with the data in the database as if we were dealing with Python objects, without the need to write queries in the native API. We will use the Object Model API for this demonstration.

Here is an example of how easy it is to filter and retrieve images. Let’s retrieve only the images that are larger than 600x600 pixels and carry a specific type of Creative Commons license:

 from aperturedb import Image

 imgs = Image.Images(db)

 const = Image.Constraints()

 const.greater("width", 600)

 const.greater("height", 600)

 const.equal("license", 4)

 imgs.search(constraints=const)

And that’s it. The imgs object now contains a handler for all the matching images. We can pass those images to our favorite framework for processing (PyTorch, TensorFlow, etc.). We also provide tools for quick visualization. Showing the images is as simple as it can be:

 imgs.display(limit=6)  # We limit the number of images for display purposes.

Combining the snippets above finds the images that are larger than 600x600 in resolution and covered by license 4, with a few (6) examples displayed below:
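Once retrieved, handing images like these to a framework such as PyTorch or TensorFlow usually comes down to decoding them and stacking them into a batch. Here is a minimal NumPy sketch of that step; the decode itself is omitted (it depends on your image library), so the arrays below are stand-ins for decoded images:

```python
import numpy as np

# Suppose each retrieved image has already been decoded into an
# H x W x C uint8 array (stand-ins used here for illustration).
decoded = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(6)]

# Typical framework-ready preprocessing: scale pixel values to [0, 1],
# stack into a single batch, and reorder to N x C x H x W.
batch = np.stack([img.astype(np.float32) / 255.0 for img in decoded])
batch = batch.transpose(0, 3, 1, 2)
print(batch.shape)  # (6, 3, 224, 224)
```

From here, a single `torch.from_numpy(batch)` (or `tf.convert_to_tensor`) gets you into your framework of choice.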

Even though the images are now fully accessible to the application and ready for use, they will typically still need basic pre-processing, like a resize (to match the size expected by the input layer of our neural network) or a normalization step. It would be a shame to transfer the full-size image from ApertureDB only to later discard most of that information with a resize (which will most likely downsample the image to, say, 224x224). Because ApertureDB was designed with ML use cases in mind, these basic pre-processing operations are supported as part of the API. Instead of retrieving the full-size images, we can simply indicate that we want the images resized before retrieving them, saving bandwidth and accelerating the fetch significantly.
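The bandwidth savings from server-side resizing are easy to ballpark. Ignoring compression (which varies per image), comparing raw pixel counts of a typical full-size COCO image against a common CNN input size:

```python
# Raw pixel payload, assuming 3 bytes per pixel (RGB).
full_size = 640 * 640 * 3  # a typical COCO image
resized   = 224 * 224 * 3  # common CNN input size

ratio = full_size / resized
print(f"{ratio:.1f}x fewer bytes on the wire")  # roughly 8x
```

Multiplied across thousands of training images per epoch, that difference dominates fetch time.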

We can retrieve resized images directly (and even rotate them as a data augmentation technique) by simply doing:

   import vdms

   from aperturedb import Image

   db = vdms.vdms()

   db.connect("coco.datasets.aperturedata.io")

   imgs = Image.Images(db)

   const = Image.Constraints()

   const.greater("width",  600)

   const.greater("height", 600)

   ops = Image.Operations()

   ops.resize(224, 224)

   ops.rotate(45.0, True)

   imgs.search(constraints=const, operations=ops)

   imgs.display(limit=6)

The result is:

As part of our pipeline, we may also need to retrieve metadata associated with the images. We can easily obtain a list of all the metadata properties associated with the images by doing:

 prop_names = imgs.get_props_names()

 print(prop_names)

 # Response:

 ['aspect_ratio', 'coco_url', 'date_captured', 'date_insertion', 'file_name', 'flickr_url', 'height', 'id', 'license', 'seg_id', 'type', 'width', 'yfcc_id']

For instance, let’s retrieve the metadata associated with the images from our last query:

 import json

 props = imgs.get_properties(["coco_url", "height", "width"])

 print("Response: ", json.dumps(props, indent=4))

 Response:

 {
     "341010": {                              # This is the image id
         "coco_url": "http://images.cocodataset.org/train2017/000000341010.jpg",
         "height": 640,
         "width": 640
     },
     "208867": {                              # This is the image id
         "coco_url": "http://images.cocodataset.org/train2017/000000208867.jpg",
         "height": 640,
         "width": 628
     },
     …
 }

What about bounding boxes and segmentation masks?

Because ApertureDB recognizes images, videos, and bounding boxes as first-class citizens of the database, we can easily retrieve and show these objects with little effort. For instance, retrieving the bounding boxes associated with the images is as simple as doing:

 imgs.display(limit=6, show_bboxes=True)

Similarly, let’s retrieve the segmentation annotations associated with the images. As with bounding boxes, we can do that with a simple function parameter:

 imgs.display(limit=6, show_segmentation=True)

Note that the segmentation masks will be retrieved from ApertureDB only when explicitly requested (in this case, for displaying them). You can also use other interfaces to retrieve bounding box coordinates, labels, and segmentation polygons directly as JSON objects or Python dictionaries.

Support for Similarity Search

Another important piece of the puzzle when dealing with visual data is feature vectors. In the context of the COCO dataset, for instance, feature vectors encode information about the nature or content of the images. They represent this information in a high-dimensional space, which can be used to perform a “similarity search” (i.e., finding objects that are similar). This similarity is not tied to metadata (like labels or image properties), but rather to “how similar” those feature vectors are. Because ApertureDB recognizes feature vectors as part of its interface, associating feature vectors with images, and later searching for similar images, is as simple as it can be.
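The idea behind such a search can be sketched with plain NumPy. This is a conceptual illustration of nearest-neighbor search by cosine similarity, not ApertureDB's actual index; the random vectors stand in for embeddings extracted from images by some model:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "index" of 100 feature vectors of dimension 64.
index = rng.standard_normal((100, 64))
# A query vector that is a slightly perturbed copy of vector 42.
query = index[42] + 0.01 * rng.standard_normal(64)

def k_most_similar(index, query, k):
    """Return indices of the k index vectors most similar to query (cosine)."""
    index_n = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    sims = index_n @ query_n          # cosine similarity to every vector
    return np.argsort(sims)[::-1][:k]  # top-k, most similar first

print(k_most_similar(index, query, 6))  # vector 42 should rank first
```

A production system replaces the brute-force scan with an approximate index, but the contract — "give me the k closest vectors" — is the same.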

Let's take as an example the following image:

Now, let’s find the 6 most similar images (and their bounding boxes, just because it is so easy):

As we have seen, interacting with visual data has never been easier, and we are only scratching the surface of ApertureDB’s powerful API.

Last, but not least, ApertureDB offers a web user interface that allows users to explore the content of the database and run image searches. Here is an example of what our web UI looks like for the COCO dataset:

We have made an ApertureDB free trial available through our website, with all the sample code shown here. Simply select the COCO use case from the dropdown menu, and you will receive an email with instructions on how to access an instance of ApertureDB pre-populated with the COCO dataset.

Coming up next…

While COCO is a good case study because of its rich annotations, its size is rather small. We have designed ApertureDB to support a significantly larger number of objects while providing the same simple interfaces, without sacrificing performance. We have been using much larger datasets such as the Yahoo Flickr Creative Commons 100 Million dataset (or YFCC100m for short), both to support real-world customer use cases and for internal evaluation. We will be releasing this evaluation and more details soon.

Special thanks to Vishakha and Andrei from ApertureData, who helped me shape this article, and to my friends and family who reviewed and annotated countless drafts.
