Vector databases have recently gained recognition for the role they can play in LLM applications and semantic search. This article does a great job of summarizing quite a few well-known vector databases along with their pros and cons.
We work with, and talk to, many users focused on analytics over images and videos (visual data). We have been working on vector search and classification for a few years now, in the context of finding similar items for visual recommendations and of vector classification with our smart retail and e-commerce customers. Given that background, we thought it would be interesting to understand what vector databases offer for computer vision use cases, and to evaluate them from the perspective of what data scientists and engineers need to simplify visual data pipelines.
Use Cases Based on Visual Embeddings
This article does a great job summarizing what vector embeddings are and how they can be constructed for various types of data such as text and images. For this blog, let’s focus on visual embeddings, i.e. embeddings derived from images or videos.
Figure 1: An example of a visual embedding would be the 256- or 128-dimensional feature vector extracted from an example model
A visual embedding or feature vector is a lower-dimensional representation of an image or video, typically a series of numbers taken from one of the last layers of a model during inference. In essence, it is a vector of numbers that can be used for many tasks such as similarity search or classification, recognizing items by the content of images or videos rather than by external keywords or labels, as shown in Figure 1. While not needing keywords or labels sounds appealing, in practice queries almost always combine vector search with metadata or label-based filtering, as the representative use cases outlined below illustrate.
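As a concrete illustration, here is a minimal sketch of how such an embedding might be extracted, using a pretrained torchvision ResNet-18 whose classifier head is removed so the pooled 512-dimensional features serve as the embedding. The model choice and file name are illustrative; any backbone works similarly:

```python
# Minimal sketch: extract a visual embedding from a pretrained CNN.
# ResNet-18 and "product.jpg" are illustrative choices, not a recommendation.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = torch.nn.Identity()  # drop the classifier head; keep the pooled features
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

with torch.no_grad():
    image = Image.open("product.jpg").convert("RGB")
    embedding = model(preprocess(image).unsqueeze(0)).squeeze(0)  # 512-d feature vector
```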
Sharing or Reusing Embeddings
We frequently encounter cases where our users have ways of extracting embeddings, but lack efficient and scalable ways to index, reuse, and share them. One of the teams we work with described how their data scientists would query and download the image dataset, generate embeddings, and build an in-memory vector index each time they had to debug their visual recommendations, eventually discarding the index after solving the problem. If they could index those embeddings persistently, the entire workflow would reduce to running a query against the existing index rather than rebuilding one each time. All such examples usually involve querying the dataset for a specific group of items (e.g. “shoes”), which can over time grow into complex metadata queries. The ability to index and find embeddings can also simplify collaboration across teams, since people can build on each other’s work more efficiently.
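The throwaway workflow described above might look roughly like the following sketch, assuming FAISS and freshly generated embeddings; every debugging session pays the full cost of building the index again:

```python
# Sketch of the throwaway workflow: rebuild an in-memory index from scratch,
# search it, then lose it when the process exits. Data here is a random stand-in.
import faiss
import numpy as np

embeddings = np.random.rand(10_000, 512).astype("float32")  # freshly (re)generated vectors

index = faiss.IndexFlatL2(512)  # exact L2 index, lives only in this process
index.add(embeddings)

query = embeddings[:1]
distances, neighbor_ids = index.search(query, 5)  # 5 nearest neighbors
# ...debug the recommendations, then the index is discarded with the process.
```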
Visual Recommendations
Embedding models are designed to produce similar vectors for similar images or video content, where similar vectors are those that are nearby in Euclidean space. This can be very helpful when recommending products to users in e-commerce, finding images or videos with similar defects in visual inspection, searching for similar tumor impressions in medical imaging, and so on.
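Concretely, “nearby in Euclidean space” just means a small L2 distance between embedding vectors. A toy ranking over stand-in data:

```python
# "Similar" items are those whose embeddings have small L2 (Euclidean) distance.
import numpy as np

catalog = np.random.rand(1_000, 128).astype("float32")  # stand-in catalog embeddings
query = catalog[42]                                      # embedding of the item a user viewed

distances = np.linalg.norm(catalog - query, axis=1)
top_10 = np.argsort(distances)[:10]  # closest items first (position 0 is the query itself)
```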
Figure 2: Recommendations based on visual similarity combine similarity search with other metadata to get desired results
Our customers often create multiple feature vectors for the same group of images using different models, in order to compare which ones give the best recommendations. They also sometimes designate the best images per product (common metadata in e-commerce applications) and generate recommendations only from those images for each product group. Naturally, building applications like these requires quite a lot of metadata search as well as actual image access and visualization.
Classification
Another application of vector representations is classification. A feature vector classification algorithm can help determine the right group or label for a given query image, as shown in Figure 3.
Figure 3: Vector classification can help use embeddings from a query image to find the corresponding label
For example, one of our smart retail customers relies on visually identifying items that have been misplaced (read the case study here). They start with a map of items they expect to find at specific locations and a library of images of those items. At regular intervals, they capture images of the locations on the map and extract feature vectors from these images. To determine whether the images contain the expected items, they combine vector classification with a k-nearest neighbor (k-NN) search: if the classification does not match and the expected item does not appear among any of the nearest-neighbor images, they flag that location as a possible misplacement. These queries can involve a lot of connected metadata and data.
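A hedged sketch of that two-part check, assuming plain NumPy arrays of reference embeddings and labels (all names here are illustrative, not the customer’s actual code):

```python
# Classify a query embedding by majority vote over its k nearest reference
# neighbors, then flag the location if the expected item never appears among them.
import numpy as np
from collections import Counter

def knn_labels(query, reference_vectors, reference_labels, k=5):
    distances = np.linalg.norm(reference_vectors - query, axis=1)
    nearest = np.argsort(distances)[:k]
    return [reference_labels[i] for i in nearest]

def is_misplaced(query_embedding, reference_vectors, reference_labels,
                 expected_item, k=5):
    neighbors = knn_labels(query_embedding, reference_vectors, reference_labels, k)
    predicted = Counter(neighbors).most_common(1)[0][0]  # majority-vote classification
    # Flag only when the classification disagrees AND the expected item is
    # absent from all k nearest neighbors, mirroring the two-part check above.
    return predicted != expected_item and expected_item not in neighbors
```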
Managing Embeddings Today
As summarized in this blog, there are quite a few vector databases for users to choose from: Pinecone, Milvus, Qdrant, and Vald, to name a few. In addition, existing databases now offer vector search support, e.g. Postgres via pgvector and Redis via its similarity search capabilities. The right choice for any application depends not only on the specific use case, but also on cost, scale, algorithms, distance metrics, SaaS vs. on-premise deployment, automated extraction of embeddings, and the capabilities expected beyond similarity search and classification. When it comes to visual data, the requirements go even further beyond the challenges these solutions address.
Through our user interactions, we have learned how some of these vector databases are deployed for use cases similar to the ones discussed above. While these vector databases help solve the vector indexing and search challenges users face, what caught our attention when we dug deeper was what the overall data pipeline and architecture looked like for their visual use cases.
Figure 4: A do-it-yourself (DIY) spaghetti solution to put all data infrastructure pieces together. Vector search and classification (shown in red) is often just one piece of the puzzle.
Invariably, they had to create a relational database mapping between the feature vector identifiers returned by the chosen vector database and other metadata (e.g. about the product whose images were used to produce the embeddings), which in turn required more complex queries across different metadata parameters. Sometimes the embeddings represent only a region of interest in an image, which means they also need to be linked to both the image itself and the region of interest, with the ability to trace back to the original images and annotations when displaying results.
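That glue layer often ends up looking something like the sketch below: a hand-maintained table mapping vector IDs back to products, source images, and regions of interest. The schema and names are illustrative assumptions, not a recommended design:

```python
# DIY glue layer: a relational table that maps vector-database IDs back to
# products, source images, and regions of interest. Illustrative schema only.
import sqlite3

db = sqlite3.connect("metadata.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS embedding_map (
        vector_id   TEXT PRIMARY KEY,  -- ID returned by the vector database
        product_id  TEXT,              -- product whose image produced the embedding
        image_uri   TEXT,              -- where the original image lives
        roi         TEXT               -- optional region of interest, e.g. "x,y,w,h"
    )
""")

# After a similarity search returns vector IDs, a second query is needed
# just to recover the product and the pixels to display:
ids = ["v123", "v456"]  # hypothetical IDs from the vector DB
placeholders = ",".join("?" * len(ids))
rows = db.execute(
    f"SELECT product_id, image_uri, roi FROM embedding_map WHERE vector_id IN ({placeholders})",
    ids,
).fetchall()
```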
For these very reasons, even though a lot of vector database products started out as purely vector indexing and search tools, their makers have realized that those capabilities alone are often insufficient: users have to keep investing in DIY systems like the one shown in Figure 4. As a result, vector databases have been introducing metadata filtering at various granularities, ranging from simple keyword filters to some understanding of graph-like structure. But because these filters are overlaid on a primarily vector-oriented solution, it remains hard for users to link their existing, rich metadata from various databases, annotations, and data storage with these vector databases. The onus of consistent naming and identification falls on the data engineers setting up the data pipelines.
Another challenge with some vector database solutions is their hosted nature. While hosted offerings work in a lot of cases, our conversations with people dealing with images and video reveal strong hesitation about uploading data to another vendor’s cloud account as a long-term solution, due to privacy and/or cost concerns.
Why Use ApertureDB?
Our biggest value proposition is that we integrate the functionality of a vector database, an intelligence graph, and visual data access to enable seamless queries across data domains. Data scientists and engineers working on visual machine learning find ApertureDB appealing because it offers a simple API to access whatever visual assets they need from a shared repository containing images, videos, annotations, embeddings, and more.
Our metadata, stored behind the scenes in an in-memory property graph, is a native part of the database and supports far more than simple one-word filters. Embeddings, extracted by any method from any type of data, live alongside their source images or videos, regions of interest, and other related metadata. You can use our API to run vector queries, metadata filtering, and image or video manipulation within a single transaction, and rely on ApertureDB to keep scaling as your application requirements and data grow.
Figure 5: A purpose-built database for visual analytics like ApertureDB shifts focus back to analytics
Vector Search and Classification with ApertureDB
ApertureDB’s vector search and classification functionality is offered as part of our unified API. We offer a flexible API for exact or approximate nearest neighbor (ANN) search, powered by FAISS and modified for CRUD support and memory efficiency. We support multiple ANN algorithms and/or configurations simultaneously, as well as all major distance metrics. The entire database can be self-hosted or managed.
For the smart retail use case referred to above, the customer has a library of reference images, feature vector embeddings for each image, metadata describing which items are depicted, and the locations where those items are expected. They then need to query this data at scale across all of these modalities, and all of it can be expressed as a single ApertureDB query, along the lines of the sketch below.
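To give a flavor, here is an illustrative sketch of what such a combined query could look like using ApertureDB’s JSON-based query language and Python client. The descriptor set, property names, and credentials are placeholders, and exact command parameters should be checked against the current ApertureDB documentation:

```python
# Illustrative sketch only: one ApertureDB query combining k-NN search over a
# descriptor set with metadata constraints and retrieval of the matching images.
from aperturedb import Connector

db = Connector.Connector("aperturedb.example.com", user="user", password="pass")

query = [
    {"FindDescriptor": {
        "set": "shelf_item_embeddings",   # hypothetical descriptor set name
        "k_neighbors": 5,
        "distances": True,
        "_ref": 1,
    }},
    {"FindImage": {                        # images connected to the matched descriptors
        "is_connected_to": {"ref": 1},
        "constraints": {"expected_location": ["==", "aisle_7_shelf_2"]},  # hypothetical property
        "results": {"list": ["item_name", "expected_location"]},
        "blobs": True,                     # also return the matching image bytes
    }},
]

query_embedding = open("query_vector.bin", "rb").read()  # raw float32 bytes
responses, blobs = db.query(query, [query_embedding])
```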
Even with a single-node deployment, this customer found ApertureDB to be 2.5X as fast as their previous solution (a popular vector database) with the same accuracy, prompting them to replace their inventory management solution with ApertureDB in production as they scale to tens of millions of embeddings per index (read the case study here). Our distributed version will allow them to scale further in both throughput and recall (a topic for a future blog).
Conclusion
If you are planning to implement, or currently operate, ML data infrastructure for pixel data like images and videos, is vector indexing and search needed? Quite often yes, though not always. Are current vector databases enough to achieve application end goals? Our customer discussions indicate that often they are not. While they solve one piece of the problem, practical implementations typically require stitching in several more pieces: a metadata store, the actual data buckets, processing and augmentation libraries, and label management, to name a few.
So what if you are working with embeddings and metadata, but not visual data? We get this question often. We are agnostic to how the embeddings are derived, the metadata can be adapted to any application, and any source data can be stored as generic binary blobs in ApertureDB. Stay tuned for more on this, but in the meantime, please come talk to us.
If your organization uses or intends to use ML on visual data (with a small or large team), or you are simply curious about our technology, our approach to infrastructure development, or where we are headed, please contact us at team@aperturedata.io or try out ApertureDB on our pre-loaded datasets. If you’re excited to join an early-stage startup and make a big difference, we’re hiring. Last but not least, we will be documenting our journey and explaining all the components listed above on our blog; subscribe here.
We gratefully acknowledge the valuable insights and edits from Josh Stoddard, Steve Huber, Romain Cledat, Luis Remis, and Drew Ogle.