Multimodal Search for Enhanced Ecommerce Product Discovery

December 24, 2024

Multimodal search is revolutionizing online shopping. Here's what you need to know:

  • Combines text, images, and voice for more accurate product searches
  • Boosts sales and improves product discovery
  • Major players: Focal, eBay, and Faire

Quick comparison:

| Company | Key Feature | Strength | Weakness |
|---|---|---|---|
| Focal | Google Product Taxonomy | Long-tail queries | Fashion-focused |
| eBay | Multimodal Item Embedding | Big data processing | Relies on seller data |
| Faire | Image-Text Model (ITEm) | Balances image/text | Marketplace-specific |

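Results:

  • Focal: 8% boost in leaf precision and double the coverage
  • eBay: +15.9% click-through rate and +31.5% purchase-through rate in A/B tests
  • Faire: brands that fully use the platform get 40% more views and orders in their first month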

Challenges:

  • Privacy concerns
  • High computing power needs
  • Complex setup for smaller shops

Bottom line: Multimodal search is shaping the future of online shopping, making it easier for customers to find what they want and boosting sales for businesses.

1. Focal


Focal's multimodal search system is changing the game in e-commerce product categorization. How? By combining text and visual features to help shoppers find what they're looking for, even when words fail them.

The system digs into a treasure trove of product data:

  • Title
  • Description
  • Vendor
  • Product type
  • Collection
  • Tags
  • Images
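The article lists the fields Focal's system reads but not how they reach the model. As a rough illustration only, here is a Python sketch that assumes a hypothetical `Product` record and flattens its text fields into one tagged string, keeping image references separate for a vision encoder. The field names and the `name: value` pairs joined by ` | ` are assumptions, not Focal's actual input format.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    """Hypothetical product record mirroring the fields listed above."""
    title: str
    description: str = ""
    vendor: str = ""
    product_type: str = ""
    collection: str = ""
    tags: list[str] = field(default_factory=list)
    image_urls: list[str] = field(default_factory=list)


def build_model_input(p: Product) -> dict:
    """Flatten text fields into one tagged string; keep image references separate."""
    text_fields = [
        ("title", p.title),
        ("description", p.description),
        ("vendor", p.vendor),
        ("type", p.product_type),
        ("collection", p.collection),
        ("tags", ", ".join(p.tags)),
    ]
    # Empty fields are skipped so missing data doesn't add placeholder tokens.
    text = " | ".join(f"{name}: {value}" for name, value in text_fields if value)
    return {"text": text, "images": list(p.image_urls)}


example = Product(
    title="Linen shirt",
    vendor="Acme",
    tags=["summer", "linen"],
    image_urls=["https://example.com/shirt.jpg"],
)
print(build_model_input(example)["text"])
# title: Linen shirt | vendor: Acme | tags: summer, linen
```

Skipping empty fields is one way to reflect the "data quality" point: a product with missing info simply produces a shorter input rather than a string of blank placeholders.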

But here's where it gets interesting: Focal's model is built on the Google Product Taxonomy (GPT). We're talking over 5,500 categories in a tree structure. That's some serious detail.

So, what's the big deal? Well, Focal's system delivers:

  • 8% boost in leaf precision
  • Double the coverage
  • Support for multiple languages

Now, you might be wondering how they pulled this off. The secret sauce? Their model treats each taxonomy level as its own classification problem. It's like having a team of experts, each focused on a specific part of the puzzle.
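One way to read "each taxonomy level as its own classification problem" is that each level has its own classifier, and the final answer must still be a valid path through the tree. The Python sketch below assumes that setup: each level's classifier outputs probabilities over that level's categories, and decoding picks the root-to-leaf path with the highest summed log-probability. The tiny taxonomy slice and the probabilities are illustrative, not Focal's model or data.

```python
import math

# Illustrative three-level slice of a product taxonomy: category -> parent (None = root).
PARENT = {
    "Apparel & Accessories": None,
    "Home & Garden": None,
    "Clothing": "Apparel & Accessories",
    "Shoes": "Apparel & Accessories",
    "Kitchen & Dining": "Home & Garden",
    "Shirts & Tops": "Clothing",
    "Dresses": "Clothing",
    "Sneakers": "Shoes",
    "Cookware": "Kitchen & Dining",
}


def path_to_root(category):
    """Return the root-to-category path, e.g. [level 0, level 1, level 2]."""
    path = []
    while category is not None:
        path.append(category)
        category = PARENT[category]
    return path[::-1]


def best_path(level_probs):
    """Pick the consistent root-to-leaf path with the highest summed log-probability.

    level_probs[i] maps category -> probability from the level-i classifier.
    """
    depth = len(level_probs)
    best, best_score = None, -math.inf
    for category in PARENT:
        path = path_to_root(category)
        if len(path) != depth:
            continue  # only consider paths that reach the deepest level
        # Clamp to avoid log(0) for categories a classifier rules out entirely.
        score = sum(math.log(max(level_probs[i].get(c, 0.0), 1e-12)) for i, c in enumerate(path))
        if score > best_score:
            best, best_score = path, score
    return best


level_probs = [
    {"Apparel & Accessories": 0.9, "Home & Garden": 0.1},
    {"Clothing": 0.4, "Shoes": 0.6, "Kitchen & Dining": 0.0},
    {"Shirts & Tops": 0.7, "Dresses": 0.1, "Sneakers": 0.2, "Cookware": 0.0},
]
print(best_path(level_probs))
# ['Apparel & Accessories', 'Clothing', 'Shirts & Tops']
```

Taking the argmax at each level independently here would give Shoes at level 1 but Shirts & Tops at level 2, which is not a valid path; scoring whole paths keeps the levels consistent.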

But let's be real - this isn't a small task. Focal's team had to use distributed TensorFlow on Google Cloud Platform to handle the complexity. The result? A model with over 250 million parameters. That's not just big - it's MASSIVE.

For e-commerce businesses looking to follow in Focal's footsteps, here's the takeaway:

  1. Data quality is KING. If you're missing info, go out and get it.
  2. Don't forget about merchant-level features. They can give your predictions a serious boost.
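The article doesn't say which merchant-level features Focal uses. A minimal sketch of one plausible kind, assuming a merchant's already-categorized products are available: build a smoothed category prior per merchant and blend it with the product-level prediction. The function names, the 0.3 blend weight and the example data are all hypothetical.

```python
from collections import Counter


def merchant_prior(merchant_categories, all_categories, smoothing=1.0):
    """Smoothed category distribution over a merchant's already-categorized products."""
    counts = Counter(merchant_categories)
    cats = set(all_categories) | set(counts)
    total = sum(counts.values()) + smoothing * len(cats)
    return {c: (counts[c] + smoothing) / total for c in cats}


def blend(product_probs, prior, weight=0.3):
    """Interpolate product-level and merchant-level probabilities, then renormalize."""
    cats = set(product_probs) | set(prior)
    mixed = {
        c: (1 - weight) * product_probs.get(c, 0.0) + weight * prior.get(c, 0.0)
        for c in cats
    }
    total = sum(mixed.values())
    return {c: p / total for c, p in mixed.items()}


# A merchant that mostly sells dresses, and a product the image/text model finds ambiguous.
prior = merchant_prior(
    ["Dresses"] * 8 + ["Shirts & Tops"] * 2,
    all_categories=["Dresses", "Shirts & Tops", "Sneakers"],
)
product_probs = {"Dresses": 0.45, "Shirts & Tops": 0.50, "Sneakers": 0.05}
blended = blend(product_probs, prior)
print(max(blended, key=blended.get))  # Dresses
```

The merchant's history tips an otherwise close call; with `weight=0` the product-level model alone would have picked Shirts & Tops.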

2. eBay's Product Data System


eBay's new Multimodal Item Embedding solution is shaking things up. It mixes text and image data to make better recommendations.

What's new?

  • Spots mismatches between images and text
  • Uses triplet loss with TransH for recall
  • Combines BERT (text) and ResNet-50 (image) embeddings
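The list above names the ingredients but not how eBay wires them together. The Python sketch below assumes a simple version: text and image vectors are L2-normalized and concatenated into one item embedding, and training uses a margin-based triplet loss whose distance is the TransH score (TransH is a knowledge-graph embedding method that projects vectors onto a relation-specific hyperplane before translating them). Toy 2-d vectors stand in for 768-d BERT-base and 2048-d ResNet-50 outputs, and the hyperplane normal `w` and translation `d` are fixed here rather than learned.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def fuse(text_emb, image_emb):
    """Concatenate separately normalized text (e.g. BERT) and image (e.g. ResNet-50) vectors."""
    return l2_normalize(text_emb) + l2_normalize(image_emb)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(e, w):
    """TransH: project e onto the hyperplane whose unit normal is w."""
    s = dot(w, e)
    return [x - s * y for x, y in zip(e, w)]


def transh_distance(h, t, w, d):
    """|| h_perp + d - t_perp ||_2 for relation-specific normal w and translation d."""
    w = l2_normalize(w)
    hp, tp = project(h, w), project(t, w)
    return math.sqrt(sum((a + r - b) ** 2 for a, r, b in zip(hp, d, tp)))


def triplet_loss(anchor, positive, negative, w, d, margin=1.0):
    """Hinge loss: the positive must be closer than the negative by at least `margin`."""
    d_pos = transh_distance(anchor, positive, w, d)
    d_neg = transh_distance(anchor, negative, w, d)
    return max(0.0, margin + d_pos - d_neg)


# Toy 2-d "text" and 2-d "image" vectors stand in for real encoder outputs.
anchor = fuse([1.0, 0.0], [0.0, 1.0])
positive = fuse([1.0, 0.1], [0.0, 1.0])
negative = fuse([0.0, 1.0], [1.0, 0.0])
w, d = [0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]
print(triplet_loss(anchor, positive, negative, w, d))  # 0.0
```

A loss of zero means this triplet already satisfies the margin; swapping the positive and negative produces a large loss, which is the gradient signal that pulls related items together during training.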

The results? Pretty impressive:

| Metric | Boost |
|---|---|
| Buyer Engagement | +15% |
| Click-Through Rate | +15.9% |
| Purchase-Through Rate | +31.5% |

These aren't just numbers. They're from real A/B tests.
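eBay's raw A/B counts aren't published here, so the numbers below are made up purely to show how a "+15.9%" relative lift is computed and how a standard pooled two-proportion z-test checks whether such a lift is likely to be noise. This is a generic sketch, not eBay's analysis pipeline.

```python
import math


def relative_lift(control_rate, treatment_rate):
    """Percentage change of the treatment rate relative to the control rate."""
    return (treatment_rate - control_rate) / control_rate * 100


def two_proportion_z(x_control, n_control, x_treat, n_treat):
    """Pooled two-proportion z statistic for a difference in conversion rates."""
    p_c, p_t = x_control / n_control, x_treat / n_treat
    p = (x_control + x_treat) / (n_control + n_treat)
    se = math.sqrt(p * (1 - p) * (1 / n_control + 1 / n_treat))
    return (p_t - p_c) / se


# Made-up counts: 100,000 impressions per arm, 2,000 vs 2,318 clicks.
print(round(relative_lift(2000 / 100_000, 2318 / 100_000), 1))  # 15.9
print(round(two_proportion_z(2000, 100_000, 2318, 100_000), 2))
```

With these hypothetical counts the z statistic lands well above 1.96, the usual two-sided 5% cut-off; with far fewer impressions the same 15.9% lift could easily fail that test.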

Eddie Garcia, eBay's Chief Product Officer, says:

"AI and deep learning has been infused throughout eBay. I think the difference in the last six months has been the dawn of the large language models that make this very conversational."

But eBay's not done. They're:

  • Using AI to help sellers write descriptions
  • Planning AI-powered vehicle part upgrade suggestions
  • Tapping into 28 years of shopping data

With 20 billion images and data from 190 markets, eBay's got a LOT to work with. They're bringing order to the chaos of 1.2 billion listings.

eBay's not just about recommendations. They're using machine learning for:

  • Computer vision
  • Machine translation
  • Natural language processing
  • Search and personalization

Fun fact: Their machine translation boosted exports by 17.5% (English to Spanish).

eBay's multimodal approach shows how mixing data types can lead to better product discovery and more sales. It's a glimpse into the future of e-commerce.


3. Faire's Image and Text Model

Faire's marketplace uses a smart image-text model (ITEm) to connect brands and retailers. This model learns from product images and titles to improve product discovery.

ITEm's key features:

  • Learns without manual labels (self-supervised)
  • Balances image and text data
  • Picks up on fine-grained product details without being told where they appear

The model uses five tasks to learn:

  1. Image-text matching
  2. Masked image modeling
  3. Masked language modeling
  4. Global masked language modeling
  5. Global masked image modeling

These tasks help ITEm understand images and text better.
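The article lists the tasks without implementation details. Here is a minimal sketch of how training inputs for three of them are commonly prepared: masked language modeling (hide title tokens), masked image modeling (hide image patches) and image-text matching (true pairs plus mismatched pairs). The `[MASK]` token, the mask rates and the one-negative-per-positive sampling are assumptions; the two global variants are not shown because the article doesn't describe them.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, rate, rng):
    """Masked language modeling input: hide a random subset of title tokens."""
    masked, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            masked[i] = MASK
            targets[i] = tok  # the model is trained to predict these
    return masked, targets


def mask_patches(num_patches, rate, rng):
    """Masked image modeling input: indices of image patches to hide."""
    k = max(1, round(num_patches * rate))
    return sorted(rng.sample(range(num_patches), k))


def itm_pairs(titles, images, rng):
    """Image-text matching data: each true pair (label 1) plus one mismatched pair (label 0).

    Requires at least two products so a mismatched image always exists.
    """
    pairs = []
    for i, title in enumerate(titles):
        pairs.append((title, images[i], 1))
        j = rng.randrange(len(images) - 1)
        j = j if j < i else j + 1  # any index except i
        pairs.append((title, images[j], 0))
    return pairs


rng = random.Random(0)
print(mask_tokens(["red", "linen", "summer", "dress"], 0.25, rng))
print(mask_patches(16, 0.25, rng))
print(itm_pairs(["t1", "t2", "t3"], ["i1", "i2", "i3"], rng))
```

Each task turns unlabeled product listings into a prediction problem with a known answer, which is what lets a model like ITEm learn without manual labels.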

ITEm beats other systems in two areas:

| Task | ITEm's Result |
|---|---|
| Same-Product Recommendation | Better than single-modality models |
| Leaf Category Prediction | Best results, showing it learns well from both images and text |
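For same-product recommendation, a joint image-text embedding is typically used for nearest-neighbor lookup. The Python sketch below assumes each catalog item already has one such embedding (the vectors are toy values) and ranks other items by cosine similarity. It is a brute-force illustration; a production catalog would use an approximate nearest-neighbor index instead of scoring every item.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def same_product_candidates(query_id, catalog, k=3):
    """Return the k catalog items whose embeddings are most similar to query_id's.

    catalog maps item id -> joint image-text embedding (a list of floats).
    """
    query = catalog[query_id]
    scored = [
        (cosine(query, emb), item_id)
        for item_id, emb in catalog.items()
        if item_id != query_id
    ]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:k]]


catalog = {
    "mug-a": [0.9, 0.1, 0.0],
    "mug-b": [0.85, 0.15, 0.05],
    "candle": [0.0, 0.2, 0.95],
}
print(same_product_candidates("mug-a", catalog, k=1))  # ['mug-b']
```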

ITEm works with a big dataset:

| Dataset | Images | Product Categories |
|---|---|---|
| ITOP | 1.1 million | 1,275 |

This dataset has matching products and non-matches to test accuracy.
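The article doesn't state which metric is computed on these matching and non-matching pairs. As one simple possibility, the sketch below assumes the model outputs a similarity score per pair, calls a pair "same product" when the score clears a threshold, and measures accuracy. The similarity values and labels are hypothetical.

```python
def pair_accuracy(similarities, labels, threshold):
    """Share of pairs classified correctly when 'same product' means similarity >= threshold."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(similarities, labels))
    return correct / len(labels)


def best_threshold(similarities, labels):
    """Try every observed similarity as the cut-off; return (threshold, accuracy) of the best one."""
    candidates = sorted(set(similarities))
    return max(
        ((t, pair_accuracy(similarities, labels, t)) for t in candidates),
        key=lambda pair: pair[1],
    )


sims = [0.92, 0.81, 0.35, 0.20]   # hypothetical model similarities for four product pairs
labels = [1, 1, 0, 0]             # 1 = same product, 0 = different products
print(best_threshold(sims, labels))  # (0.81, 1.0)
```

Picking the threshold on the same pairs you report accuracy on is optimistic; in practice the threshold would be tuned on a held-out split.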

Faire's tech helps brands:

  • Tags products into 3,000 types using name, description, and images
  • Suggests the best product type tag
  • Finds what makes products sell well
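How Faire decides which product type tag to suggest isn't described. A minimal sketch, assuming a multimodal classifier produces one score per product type: convert the scores to probabilities with a softmax, auto-apply the top tag when it is confident, and otherwise return a short list for the brand to review. The `min_confidence` cut-off of 0.6 and the example tags are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw per-type scores into probabilities (shifted by the max for stability)."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}


def suggest_tag(scores, min_confidence=0.6, k=3):
    """Return ('auto', tag) if the top tag is confident enough, else ('review', top-k tags)."""
    probs = softmax(scores)
    ranked = sorted(probs, key=probs.get, reverse=True)
    if probs[ranked[0]] >= min_confidence:
        return "auto", ranked[0]
    return "review", ranked[:k]


print(suggest_tag({"Candles": 5.0, "Soap": 1.0, "Mugs": 0.0}))
# ('auto', 'Candles')
print(suggest_tag({"Candles": 1.0, "Soap": 0.9, "Mugs": 0.8}))
# ('review', ['Candles', 'Soap', 'Mugs'])
```

With thousands of product types, the review path matters: close scores usually mean the listing itself is ambiguous, and a human pick is cheap compared with a wrong tag.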

Brands that use Faire fully get 40% more views and orders in their first month.

Faire's search works for two types of shopping:

  1. When you know what you want
  2. When you're just browsing

This helps different types of customers find what they need.

Good and Bad Points

Let's compare these multimodal search tools:

| Feature | Focal | eBay's System | Faire's ITEm |
|---|---|---|---|
| Data Types | Text, image | Text, image, structured | Text, image |
| Learning | Self-supervised | Supervised | Self-supervised |
| Scalability | High | High | High |
| Discovery | Better | Better | Better |
| Strength | Long-tail queries | Big data processing | Balances image/text |
| Weakness | Fashion-only | Needs seller data | Marketplace-specific |

Pros

Focal

  • Nails long-tail queries
  • Boosts fashion search accuracy
  • Cuts manual tagging

eBay's System

  • Handles tons of diverse data
  • Makes products easier to find
  • Improves buyer search results

Faire's ITEm

  • Balances image and text well
  • Beats single-type models
  • Top-notch category prediction

Cons

Focal

  • Stuck in fashion
  • Might miss super-niche items

eBay's System

  • Leans on seller data quality
  • Struggles with messy info

Faire's ITEm

  • Locked to Faire's marketplace
  • Needs tweaking for different products

E-commerce Impact

These tools are game-changers:

  • Finding Stuff: All three make shopping easier.
  • More Sales: Brands that fully use Faire get 40% more views and orders in their first month.
  • Better Categories: Helps sellers and buyers navigate.
  • Personal Touch: Smarter recommendations based on context.

But there are hurdles:

  • Privacy: More data types = more risks.
  • Power Hungry: Need serious computing muscle.
  • Tricky Setup: Tough for smaller shops to implement.

As online shopping grows, these tools will shape how we buy stuff in the future.

Wrap-up

Multimodal search is shaking up online shopping. Here's what you need to know:

1. Better product discovery

Focal, eBay's system, and Faire's ITEm all make it easier for shoppers to find products.

2. Sales boost

Brands that fully use Faire get 40% more views and orders in their first month.

3. Higher engagement

eBay's solution led to a 15.9% increase in click-through rate and a 31.5% jump in purchase-through rate in A/B tests.

4. Versatile search

Users can now search with text, images, or both, making shopping more intuitive.
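None of the three write-ups spells out how a query containing text, an image, or both is scored. A minimal sketch, assuming each item stores a text embedding and an image embedding, and that query text and query images are encoded into those same two spaces: score with whichever parts are present and blend them with a weight `alpha` when both are. The item data and the 0.5 weight are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_item(item, text_query=None, image_query=None, alpha=0.5):
    """Score one item for a query made of text, an image, or both.

    item holds 'text_emb' and 'image_emb'; alpha weights text against image
    when both query parts are present.
    """
    if text_query is None and image_query is None:
        raise ValueError("need a text query, an image query, or both")
    if image_query is None:
        return cosine(text_query, item["text_emb"])
    if text_query is None:
        return cosine(image_query, item["image_emb"])
    return (alpha * cosine(text_query, item["text_emb"])
            + (1 - alpha) * cosine(image_query, item["image_emb"]))


def search(items, k=5, **query):
    """Rank items by score_item and return the top-k ids."""
    ranked = sorted(items, key=lambda it: score_item(it, **query), reverse=True)
    return [it["id"] for it in ranked[:k]]


items = [
    {"id": "red-dress", "text_emb": [1.0, 0.0], "image_emb": [1.0, 0.0]},
    {"id": "blue-dress", "text_emb": [1.0, 0.0], "image_emb": [0.0, 1.0]},
]
# The text matches both dresses equally; the uploaded photo breaks the tie.
print(search(items, k=1, text_query=[1.0, 0.0], image_query=[0.0, 1.0]))  # ['blue-dress']
```

Late fusion like this is easy to tune per query type; a joint embedding model such as ITEm would instead encode text and image together before any scoring.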

Which tool is best? It depends:

  • Focal: Great for fashion and long-tail queries
  • eBay's system: Handles diverse data types at scale
  • Faire's ITEm: Balances image and text well

Keep in mind: These tools can be tough for smaller shops to implement. They need serious computing power and careful data handling.

As e-commerce grows, multimodal search will play a big role in its future.
