Retailers and ecommerce leaders are obsessed with improving customer experience to ultimately drive bottom-line results. In the world of smart retail, innovations like automated inventory management, out-of-stock warnings, and frictionless stores are transforming the shopper experience and lowering labor costs. Ecommerce vendors have invested billions of dollars in visual assets that improve their consumers' online experience, optimize resources, and ultimately increase sales. But how do they get from strategy to reality? With multimodal AI.
Multimodal AI, stated simply, is intelligence derived from a combination of data types like images, videos, text, and audio, and it is key to meeting these customer experience and business goals. Because AI lets businesses extract more value from their data, capitalizing on it to understand constantly changing customer needs is increasingly important for staying ahead of the competition.
Multimodal AI Use Cases In Retail And Ecommerce
While we are just scratching the surface of how multimodal data and AI can boost retail sales, lower labor costs, and improve customer experience, let's look at a few use cases that are already proving their worth and how they are accomplished.
Actionable Data for Retail Operations
Tracking gaps on shelves, misplaced items, price mismatches, planogram compliance, hazard mitigation, and fraud avoidance are just a few examples of how companies ensure a safe and smooth shopping experience for their consumers. Camera-based solutions capture pictures in real time and use vector classification and matching against product catalogs; because they are more accurate and can cover more ground each day than manual scans, they are an attractive replacement. These solutions rely on AI models retrained on labeled store data at regular intervals to improve their accuracy.
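As a minimal sketch of the matching step, the snippet below compares an embedding extracted from a shelf-image crop against precomputed catalog embeddings using cosine similarity. The tiny 4-dimensional embeddings, product labels, and threshold here are illustrative stand-ins for what a trained vision model and a real product catalog would provide.

```python
import numpy as np

def cosine_match(crop_emb, catalog_embs, labels, threshold=0.8):
    """Match a shelf-crop embedding against catalog embeddings.

    Returns (label, score) for the best match, or (None, score) when no
    catalog item is similar enough -- a possible gap or misplaced item.
    """
    # Normalize so dot products become cosine similarities.
    crop = crop_emb / np.linalg.norm(crop_emb)
    cat = catalog_embs / np.linalg.norm(catalog_embs, axis=1, keepdims=True)
    sims = cat @ crop
    best = int(np.argmax(sims))
    if sims[best] >= threshold:
        return labels[best], float(sims[best])
    return None, float(sims[best])

# Toy catalog of 3 products with 4-dim embeddings (real ones are 512+ dims).
catalog = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]])
labels = ["cereal", "soup", "juice"]

label, score = cosine_match(np.array([0.9, 0.1, 0.0, 0.0]), catalog, labels)
```

A crop whose embedding resembles no catalog item falls below the threshold and comes back as `None`, which is the signal an operations team would route to a "check this shelf" alert.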
Figure 1: Smart shelf scanning by automated robot in supermarket setting
AI-Driven Insights for Shopper Behavior and Frictionless Checkout
A common goal for retailers is frictionless shopping and checkout. It leverages machine learning, computer vision, cameras, and sensors to detect shoppers' movements within the store, the time they spend interacting with various products, the store layout, and the products customers put in their baskets and purchase, all with minimal need to wait in line for a traditional cashier. This is made possible at scale by AI models trained on camera and sensor data from these stores to detect people, products, and their interactions. Combining labels, product and model metadata, and their relationships with images and videos enables these shopper insights and analytics. Leading retailers use the insights for effective category management and to drive their overall retail strategy.
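The dwell-time heatmaps these systems produce are, at their core, an aggregation of per-frame shopper positions. The sketch below shows one simple way to bin detected (x, y) floor positions into a grid of counts; the floor dimensions, cell size, and detection points are hypothetical, and a real deployment would get positions from a person-detection model run on camera feeds.

```python
import numpy as np

def dwell_heatmap(positions, floor_w, floor_h, cell=1.0):
    """Aggregate per-frame shopper positions (x, y) in metres into a
    grid of dwell counts -- the raw data behind in-store heatmaps."""
    nx, ny = int(floor_w / cell), int(floor_h / cell)
    grid = np.zeros((ny, nx), dtype=int)
    for x, y in positions:
        # Clamp to the grid edge so boundary detections are not lost.
        gx = min(int(x / cell), nx - 1)
        gy = min(int(y / cell), ny - 1)
        grid[gy, gx] += 1
    return grid

# Detections over several frames: shoppers lingering near one 1 m x 1 m
# cell, plus a single pass-through elsewhere on a 10 m x 5 m floor.
pts = [(2.2, 3.1), (2.4, 3.3), (2.3, 3.0), (7.9, 1.2)]
heat = dwell_heatmap(pts, floor_w=10, floor_h=5)
```

Cells with high counts correspond to the hot spots a category manager would see rendered over the store map, as in Figure 2.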
Figure 2: Heatmaps in a supermarket setting showing where people spend more time and other details
Personalized Recommendations
As consumers, we are likelier to buy something that visually appeals to us. Personalized recommendations based on product signatures, or embeddings, use deep learning models to form clusters of visually similar products, based on features like colors and patterns, correlated with what the user has bought. Vector search and classification, filtered with user metadata, is a key element in recommending the right products. The recommendations can then be shown online, or even in store on personalized displays, together with other relevant product information fetched from this enriched catalog.
Figure 3: Outfit recommendations that delight shoppers and boost cart sizes
Current Challenges Facing Data Scientists And AI Teams
The benefits of harnessing multimodal data are clear for smart retailers and ecommerce leaders, yet doing so is resource-intensive and depends on quality data. Their top goals, valuable customer insights, optimized resources, and increased sales, are challenging to reach. While AI algorithms and models are improving rapidly, the common challenges data scientists and AI teams face in proving value and deploying to production are outlined below:
Data Not Accessible: Critical business information is often dispersed in hard-to-reach silos, creating a challenge for teams to access the relevant knowledge collaboratively. Unfortunately, this can lead to a lack of shared understanding or, even worse, inconsistent replication of data across different teams.
Data Inconsistency and Loss: When subpar tools are in use, data loss and consistency problems become significant concerns. This can cast doubt on the reliability of insights, whether it's due to outdated data or insufficient high-quality data, thereby questioning the true business value.
Rising Costs: Cloud costs are on the rise, raising questions about the cost vs. benefit of utilizing multimodal data. Data science expenses often surge without a commensurate return on investment (ROI) due to ineffective resource utilization caused by suboptimal tooling.
Not Production Ready: A production-ready system providing adequate scaling, performance, and security guarantees is even harder to build for complex data and such evolving use cases. This can easily delay valuable ML research by six months to a year.
Cannot Scale with Growing Needs: As data volumes grow to millions of images and videos, scaling storage and maintaining query performance both become very challenging.
Even with advancements in data science and machine learning, the success of AI heavily relies on dependable and accurate data. All of the use cases detailed above require:
- Easily storing and cataloging the data that’s being continuously generated
- Iteratively and regularly retraining ML models on these in-store or online images and videos, to keep improving accuracy on the latest data
- Seamlessly integrating with labeling and curation frameworks, in-house or through 3rd-party vendors, since this data often requires annotations
- Finally, generating useful insights or creating relevant datasets using product and vector search capabilities, which in turn require all the data to be indexed and continuously enriched in a consistent manner
Next Steps On Your Multimodal AI Journey
Use cases like these and the challenges explained above are exactly why retailers and ecommerce leaders need a database purpose-built for multimodal AI. This can help them build a central repository of their product images, store videos, and corresponding attribute metadata as well as keep track of their annotations, embeddings, datasets, and relevant model behaviors. Such a database is also necessary to enable collaboration among data science and engineering teams so that they can build on each other’s work, and keep evolving the richness of information they manage. When successful, retailers and ecommerce leaders gain invaluable customer insights leading to better customer experiences with more efficient and profitable operations.
The ability to search, efficiently access, process, and visualize data is paramount for the success of AI deployments. Many retailers begin with cloud-based storage solutions but realize, sometimes quite late, that for multimodal AI data, particularly images, videos, or even documents, knowing filenames often isn't enough. Searching by different modalities, metadata, labels, embeddings, typically requires multiple databases catering to each type, and preprocessing the data into the right format requires complex libraries like FFmpeg or OpenCV. The various components then need to be stitched together, often in an ad hoc manner, and these traditional data management solutions don't deliver what retailers and ecommerce leaders need.
Consider ApertureDB - A Purpose-Built Database for Launching Multimodal AI
ApertureDB takes a unified approach to multimodal data, replacing the manual integration of multiple systems to achieve multimodal search and access. It unifies the management of images, videos, embeddings, and associated metadata, including annotations, and integrates the functionality of a vector database, an intelligence graph, and multimodal data access to query seamlessly across data domains. It integrates with existing and new analytics pipelines in a cloud-agnostic manner, bringing speed, agility, and productivity to data science and ML teams. ApertureDB colocates all relevant data for efficient retrieval and handles complex queries transactionally.
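To make "querying across data domains" concrete, here is a hedged sketch of what a metadata query and a nearest-neighbor query might look like side by side. The command and field names (FindImage, FindDescriptor, constraints, k_neighbors) reflect our reading of ApertureDB's JSON query interface, and the descriptor-set name is an assumption; check the current ApertureDB documentation for the exact schema rather than treating this as the definitive API.

```python
# Build (but do not execute) ApertureDB-style JSON queries: one command
# filters images by product metadata, the other asks for the k nearest
# descriptors to a query embedding. Names are illustrative assumptions.
metadata_query = {
    "FindImage": {
        "constraints": {"category": ["==", "dress"]},  # metadata filter
        "results": {"list": ["sku", "category"]},      # properties to return
    }
}
vector_query = {
    "FindDescriptor": {
        "set": "product_embeddings",  # named descriptor set (assumed)
        "k_neighbors": 5,             # top-5 nearest embeddings
        "results": {"list": ["sku"]},
    }
}
query = [metadata_query, vector_query]
```

The point of the sketch is the shape, not the syntax: both the structured filter and the vector lookup live in one query language against one system, instead of being stitched together across separate databases.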
Figure 4: A purpose-built database can really simplify users' data pipelines and shift focus back to the primary machine learning tasks and data understanding
If your organization uses or intends to use multimodal data (whether your team is small or large), or you are simply curious about our technology, our approach to infrastructure development, or where we are headed, please contact us at team@aperturedata.io or try out ApertureDB on pre-loaded datasets. If you're excited to join an early-stage startup and make a big difference, we're hiring. Last but not least, we will be documenting our journey and explaining all the components listed above on our blog; subscribe here.
I want to thank Laura Horvath for helping write this blog, and Drew Ogle and the ApertureData team for their insights.