Where We Are Going Next with Our $3M Seed Funding
Databases have always been key to solving important data search and management problems, especially as the volume of data grows. If you are dealing with numbers, emails, IDs, or even documents, it is now easy to find a database that will let you store and search across large collections of them quickly and easily. However, if you are dealing with large quantities of images, videos, and related information such as objects within the images or their source of capture, there isn’t a database on the market that understands how to support these complex data types…until now!
The Problem
We have been viewing and streaming images and videos for quite a while now. Why is it suddenly important for organizations to search and access them any differently? Images and videos capture a significant amount of information in pixels. Data science techniques, in particular machine learning (ML) and computer vision (CV), have now unlocked the inherent value of visual (image/video) data for real-world applications without the need for manual inspection. Naturally, companies across various industries are increasingly using these techniques to power digital experiences such as better patient care through medical imaging, better product recommendations through visual similarity matching, sustainable farming through better farm views, detecting flaws in methodology through visual inspection, and much more.
A few years back, when my co-founder, Luis Remis, and I were at Intel Labs, we observed how such applications rely on and create high volumes of visual content to derive these insights, and this volume is predicted to grow exponentially in the coming years. In addition, each individual image or video can itself be quite large. In such cases, metadata such as application context, labels, feature vectors, and the relationships among them becomes key to using this data meaningfully. With such a variety and volume of data types and access patterns to deal with, data science teams are left with little choice but to manually stitch together their own do-it-yourself (DIY) systems for managing visual data within their CV and ML workflows.
Don’t Do It Yourself
A typical DIY solution involves steps such as:
- Uniquely naming and storing images / videos in cloud buckets or file storage
- Storing metadata and annotations in files, databases, or both
- Writing scripts to search the metadata, find the links, and then fetch this data from wherever it resides
- When creating training data or displaying the data to other users, preprocessing such as generating thumbnails may be needed
- For use cases such as personalized recommendations, similarity search using visual embeddings can be quite powerful. Supporting this additional feature requires new tools or libraries, introducing yet another software component, either off-the-shelf or developed in-house
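The first three steps can be sketched in a few lines of Python. This is a minimal, hypothetical example, assuming a local temporary directory stands in for a cloud bucket and an in-memory SQLite database holds the metadata; the table layout and the helpers `store_image`, `register`, and `fetch_by_label` are illustrative names, not part of any product.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical layout: a local directory stands in for a cloud bucket,
# and an in-memory SQLite database holds metadata and annotations.

def store_image(bucket: Path, image_id: str, data: bytes) -> Path:
    """Step 1: store the image bytes under a unique name in the 'bucket'."""
    path = bucket / f"{image_id}.jpg"
    path.write_bytes(data)
    return path

def init_metadata(conn: sqlite3.Connection) -> None:
    """Step 2: keep image metadata and annotations in separate tables."""
    conn.execute("CREATE TABLE images (id TEXT PRIMARY KEY, path TEXT NOT NULL, source TEXT)")
    conn.execute("CREATE TABLE annotations (image_id TEXT, label TEXT)")

def register(conn: sqlite3.Connection, image_id: str, path: Path,
             source: str, labels: List[str]) -> None:
    """Record where the image lives and which labels it carries."""
    conn.execute("INSERT INTO images VALUES (?, ?, ?)", (image_id, str(path), source))
    conn.executemany(
        "INSERT INTO annotations VALUES (?, ?)",
        [(image_id, label) for label in labels],
    )

def fetch_by_label(conn: sqlite3.Connection, label: str) -> Dict[str, bytes]:
    """Step 3: search the metadata, follow the stored paths, then fetch the bytes."""
    rows = conn.execute(
        "SELECT DISTINCT i.id, i.path FROM images AS i "
        "JOIN annotations AS a ON a.image_id = i.id "
        "WHERE a.label = ? ORDER BY i.id",
        (label,),
    ).fetchall()
    return {image_id: Path(path).read_bytes() for image_id, path in rows}

with tempfile.TemporaryDirectory() as tmp:
    bucket = Path(tmp)
    conn = sqlite3.connect(":memory:")
    init_metadata(conn)
    samples = [
        ("img-001", "camera-1", ["shelf", "bottle"]),
        ("img-002", "camera-2", ["shelf"]),
        ("img-003", "camera-1", ["cart"]),
    ]
    for image_id, source, labels in samples:
        path = store_image(bucket, image_id, f"pixels of {image_id}".encode())
        register(conn, image_id, path, source, labels)
    shelf_images = fetch_by_label(conn, "shelf")
    print(sorted(shelf_images))  # ['img-001', 'img-002']
    conn.close()
```

Even this toy version keeps state in two places (the files and the tables) joined only by a stored path string, and thumbnails or embeddings would each add another component to keep in sync.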
Aside from learning how to use and maintain each component, the API differences among these disparate systems not only require plumbing but also expose the data scientist to inconsistencies and bugs that can prove elusive. Data science, particularly ML, keeps improving as researchers refine their data and methods, which means the information extracted from the data continues to evolve. With fixed-schema databases, keeping up with these changes can prove challenging. DIY systems also lack robustness and often come at the expense of performance, particularly at scale. With DIY systems, 45% or more of data scientists’ time is wasted because of ill-designed data infrastructure that doesn’t meet their needs.
This isn’t just an engineering problem for end users; it also has significant ramifications for businesses that view ML-driven solutions as key to staying ahead of their competitors and providing increasingly better experiences to their customers.
ApertureDB: A Purpose-Built Solution for Visual Data and Analytics
As part of our research at Intel, Luis and I experienced firsthand the complexity of setting up visual data management for such applications, since we couldn’t find a single system that addressed both visual data and data science requirements. The more we searched, the more we noticed that infrastructure for visual data was a big challenge for teams of data scientists and CV / ML engineers relying on the DIY solutions described above. With the magnitude of the problem growing, the systems and computer vision experience we brought as a team gave us the confidence that we were the right people to solve this problem and redefine visual data management for data science and ML. We therefore spun out ApertureData.
Our product, ApertureDB, is a specialized database for visual data such as images, videos, feature vectors, and associated metadata like annotations. ApertureDB stands apart from other databases and infrastructure tools because it natively supports images, videos, and annotations. Naturally, we also provide necessary preprocessing operations like zooming, cropping, and sampling videos. We manage the metadata as a knowledge graph to enable complex visual searches that use the relationships between various entities. Since feature vectors can describe the content of images or video frames, we also offer similarity search using feature vectors. For our users, all of these capabilities are supported by one database behind a unified API.
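The idea behind similarity search over feature vectors can be illustrated with a brute-force k-nearest-neighbor scan using cosine similarity, optionally restricted to IDs that first passed a metadata filter. This is a minimal plain-Python sketch of the general technique, not ApertureDB's API or indexing method; the vectors, image IDs, and the `cosine_similarity` and `k_nearest` helpers are hypothetical. Production systems typically replace the full scan with a nearest-neighbor index once collections grow large.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)

def k_nearest(
    query: List[float],
    index: Dict[str, List[float]],
    k: int,
    candidates: Optional[Set[str]] = None,
) -> List[Tuple[str, float]]:
    """Brute-force k-NN: score every stored vector and keep the k most similar.

    `candidates` optionally restricts the scan to IDs returned by a
    metadata query first (hypothetical, e.g. "images from camera-1").
    """
    scored = [
        (image_id, cosine_similarity(query, vector))
        for image_id, vector in index.items()
        if candidates is None or image_id in candidates
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional embeddings; real image embeddings are much longer.
index = {
    "red-shoe": [0.9, 0.1, 0.0],
    "red-boot": [0.8, 0.2, 0.1],
    "blue-hat": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

top = k_nearest(query, index, k=2)
print([image_id for image_id, _ in top])  # ['red-shoe', 'red-boot']

# Same search, limited to IDs that a (hypothetical) metadata filter returned.
filtered = k_nearest(query, index, k=2, candidates={"red-boot", "blue-hat"})
print([image_id for image_id, _ in filtered])  # ['red-boot', 'blue-hat']
```

Filtering on metadata before scoring vectors is one simple way to combine the two kinds of search; keeping both in one system avoids passing candidate ID lists between separate services.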
With its unified approach to visual data, ApertureDB removes the need for teams to concoct and manage complex Frankenstein systems. We have tested ApertureDB with more than 1.3 billion metadata entities and connections and more than 300 million images. On metadata-based search queries performed over 100 million images, ApertureDB is up to 35x faster than popular DIY systems.
Data scientists in the visual intelligence and camera intelligence teams at Fortune 100 companies use ApertureDB to save months of data engineering when accessing data for CV / ML pipelines. They use ApertureDB as a unified repository of product images, labels, embeddings, and product metadata for data science teams. ApertureDB’s easy-to-use API and seamless integration with ML frameworks like PyTorch save them days when training models, overlaying segmentation masks, and searching by labels. They use our similarity search features to build their visual recommendation engine. We have given them a way not only to manage labels along with images captured by retail cameras, but also to easily manage user access, which simplifies working with third-party labelers, and to visualize existing annotations through our REST-based graphical frontend. We have also been working with companies in the healthcare, smart agriculture, and visual inspection spaces, where the importance and use of visual data are growing rapidly.
Where Our New Funding Will Take Us
We have raised $3M in funding led by Root Ventures with participation from 2048 VC, Work-Bench, Alumni Ventures Group, Graph Ventures, Magic Fund, Hustle Fund, and a number of high-caliber angels from Datadog, Github, Docker, and more (read about it in TechCrunch). This funding will position us to grow our team and customer base. It will accelerate the development of ApertureDB’s ML-ready visual data management support, e.g., support for more complex annotations and integrations with more labeling / MLOps frameworks. It will also help us improve our enterprise features and offer ApertureDB as a managed service to our users.
Partner with Us: Use ApertureDB
ApertureDB will alleviate the need for expertise in complex-data infrastructure – a scarce skill set for companies of all sizes. Given the growing shortage of qualified data scientists, it is beneficial for companies to invest in solutions that can improve a data science team’s productivity. In short, ApertureDB will provide companies with a single unified system that integrates well with data science pipelines, enables rapid data engineering, and reduces the frustrations, costs, and implementation challenges of integrating multiple platforms.
If your organization, whether a small or large team, uses or intends to use ML on visual data, or you are simply curious about our technology, our approach to infrastructure development, and where we are headed, please contact us at team@aperturedata.io or sign up for a free trial. If you’re excited to join an early-stage startup and make a big difference, we’re hiring. Last but not least, we will be documenting our journey on our blog; subscribe here.
I want to acknowledge the insights and valuable edits from the Work-Bench team, Steve Huber, Jaime Fawcett, and Romain Cledat.