Retrieving Visually-Similar Products for Shopping Recommendations using Spark and Tensorflow

As a leading e-commerce company for fashion and lifestyle in the Netherlands, Wehkamp is dedicated to providing a better shopping experience for its customers. Using Spark, the data science team develops various machine-learning projects that improve the shopping experience.

One of the applications is a service for retrieving visually similar products, which can then be used to show substitute products, to build visual recommenders and to improve the overall recommendation system. In this project, Spark is used throughout the entire pipeline: retrieving and processing the image data, training the model in a distributed fashion with Tensorflow, extracting image features, and computing similarity. In this talk, we demonstrate how Spark and Databricks enable a small team to unify data and AI workflows, develop a pipeline for visual similarity and train dedicated neural network models.

2.Retrieving visually similar products for Shopping Recommendations using Spark and Tensorflow Zhichao Zhong, Wehkamp #UnifiedDataAnalytics #SparkAISummit

3.Agenda ● Introduction ● Implementation ○ Image embedding extraction ○ Similarity search ○ Pipeline overview ● Summary

4.Zhichao Zhong ● Data scientist @ wehkamp ● Ph.D. in applied mathematics @ CWI

5.About wehkamp wehkamp: the online department store for families in the Netherlands. ● > 400.000 products ● > 500.000 daily visitors ● € 661 million sales 18/19 ● 11 million packages 18/19 ● 67 years’ history

6. About wehkamp 1952 - first advertisement 1955 - first catalog 1995 - first steps online 2010 - completely online 2018 - mobile first 2019 - a great shop experience

7.Data science at wehkamp Use data science to improve the online shopping experience for customers. ● Search ranking ● Recommendation system ● Personalization ● Visual similarity ● And many others ...

8.Visual similarity Visuals are important for shopping, especially for fashion (our largest category). People look at look-alike items when shopping. Visual similarity: retrieving similar items based on their images.

9.Use cases Use case: to show substitutes for out-of-stock items in the look. [Figure: an out-of-stock item and its substitute]

10.Use cases Use case: to show similar items together on the products overview page.

11.Use cases Use case: to recommend similar items for newly onboarded items (the cold-start problem).

12.Steps for visual-similarity How to retrieve visually similar items? Step 1: Extract image embeddings. Step 2: Search for similar embeddings. [Figure: a CNN produces an embedding, and a similarity search retrieves the closest embeddings]

13.Image embedding Image embedding: a low-dimensional vector representation of the image that contains abstract information. [Figure: a 512⨉512⨉3 image mapped to an embedding vector labelled 256]

14.Image embedding Use a convolutional neural network (CNN) to extract the embeddings. [Figure: a CNN with convolutional/pooling/activation layers, a fully-connected layer that yields the embedding, and a prediction layer]

15.Transfer learning Use a pre-trained model? Train a model from scratch? • Adopt the VGG16 model pre-trained on the ImageNet dataset (natural images). • Replace the fully-connected layers. • Train the fully-connected layers on our own dataset. [Figure: Image (224⨉224⨉3) → layers from VGG16 → FC layer (4096) → FC layer (512) → Embedding (512⨉1)]
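
The transfer-learning setup on this slide could be sketched in Keras roughly as follows. This is a minimal sketch, not the speaker's code: the helper name `build_embedding_model`, the ReLU activation, and freezing the whole convolutional base are assumptions, while the layer sizes (4096 and 512) and the 224⨉224⨉3 input follow the slide's diagram.

```python
def build_embedding_model(embedding_dim=512, weights="imagenet"):
    """Hypothetical helper: VGG16 convolutional base plus two new FC layers."""
    # Imported here so that loading this sketch does not require TensorFlow.
    import tensorflow as tf

    # VGG16 pre-trained on ImageNet, without its original fully-connected layers.
    base = tf.keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=(224, 224, 3)
    )
    # Assumption: the convolutional layers stay frozen, only the new FC layers train.
    base.trainable = False

    x = tf.keras.layers.Flatten()(base.output)
    x = tf.keras.layers.Dense(4096, activation="relu")(x)
    embedding = tf.keras.layers.Dense(embedding_dim, name="embedding")(x)
    return tf.keras.Model(inputs=base.input, outputs=embedding)
```

Calling `build_embedding_model()` downloads the ImageNet weights on first use; passing `weights=None` builds the same architecture with random weights.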

16.Triplet loss Data triplet: anchor image, positive image and negative image. Triplet loss (in the form used by FaceNet): $L = \max\left(\lVert e_a - e_p \rVert_2^2 - \lVert e_a - e_n \rVert_2^2 + \alpha,\ 0\right)$, where $e_a$, $e_p$ and $e_n$ are the embeddings for anchor, positive and negative images respectively, and $\alpha$ is the margin. Similarity is defined by the Euclidean distance. FaceNet: A Unified Embedding for Face Recognition and Clustering, F. Schroff et al. (2015)

17.Triplet loss Minimize the triplet loss. [Figure: learning pulls the positive towards the anchor and pushes the negative away from it, with margin ɑ]
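
To make the triplet loss concrete, here is a small plain-Python sketch for a single triplet. It assumes the FaceNet formulation (squared Euclidean distances and a margin); the function names and the margin value of 0.2 are illustrative choices, not values taken from the talk.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + margin, 0.0)


anchor, positive, negative = [0.0, 0.0], [0.0, 1.0], [3.0, 0.0]

# The positive is already much closer than the negative, so the loss is zero.
easy = triplet_loss(anchor, positive, negative)

# Swapping the roles puts the "positive" far away, so the loss is large.
hard = triplet_loss(anchor, negative, positive)
```

Only triplets that violate the margin contribute a non-zero loss, which is why minimizing it moves positives closer to the anchor and negatives further away.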

18.Siamese network Siamese network: identical CNNs take two or more inputs. [Figure: three identical CNNs feeding a triplet loss]

19.Data preparation Similar product images are put in the same group. Sample triplets: • sample 2 images from the same group as the anchor and positive images. • sample 1 image from other groups as the negative image. 3500 images => 56000 triplets. FaceNet: A Unified Embedding for Face Recognition and Clustering, F. Schroff et al. (2015)
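
The sampling procedure described on this slide might look like the sketch below. The function name, the toy group data and the `triplets_per_image` parameter are hypothetical; the slide only gives the totals (56000 triplets from 3500 images, i.e. 16 triplets per image on average), not how the samples were drawn.

```python
import random


def sample_triplets(groups, triplets_per_image=16, seed=0):
    """Sample (anchor, positive, negative) triplets from grouped image ids."""
    rng = random.Random(seed)
    triplets = []
    for gid, images in groups.items():
        if len(images) < 2:
            continue  # a group needs two images to form an anchor/positive pair
        other_gids = [g for g in groups if g != gid and groups[g]]
        for anchor in images:
            for _ in range(triplets_per_image):
                # Positive: another image from the same group.
                positive = rng.choice([img for img in images if img != anchor])
                # Negative: any image from a different group.
                negative = rng.choice(groups[rng.choice(other_gids)])
                triplets.append((anchor, positive, negative))
    return triplets


# Toy data: group name -> image ids (placeholders, not real products).
groups = {"dress": ["d1", "d2", "d3"], "shoe": ["s1", "s2"], "bag": ["b1"]}
group_of = {img: g for g, imgs in groups.items() for img in imgs}
triplets = sample_triplets(groups, triplets_per_image=2)
```

Uniform random sampling is the simplest choice; the FaceNet paper cited on the slide discusses more selective triplet mining strategies.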

20.Training result Precision@k on the test data. k: number of embeddings returned by the similarity search.
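
As an illustration of the metric, precision@k can be computed as below. This sketch assumes that a returned item counts as relevant when it belongs to the same product group as the query, which the slide does not state explicitly; the function name and the sample labels are made up.

```python
def precision_at_k(query_group, retrieved_groups, k):
    """Fraction of the top-k retrieved items that share the query's group."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved_groups[:k]
    hits = sum(1 for group in top_k if group == query_group)
    return hits / k


# Group labels of the items returned for one "dress" query, best match first.
retrieved = ["dress", "dress", "shoe", "dress", "bag"]
p_at_2 = precision_at_k("dress", retrieved, k=2)
p_at_5 = precision_at_k("dress", retrieved, k=5)
```

Averaging this value over all test queries gives a single precision@k figure per k.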

21.Model training • 50 epochs, 29 hours on an NVIDIA K80 GPU • How can we scale up the model training to – fit more data, – fine-tune the hyperparameters quickly? Use distributed training to speed up the training!

22.Distributed training • Distributed training framework: Horovod by Uber. – Good scaling efficiency. – Minimal code modification. • Training API: HorovodRunner on Databricks, integrated with Spark’s barrier mode.

23.Distributed training Code example: single-machine.
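
The code shown on this slide is not preserved in the transcript. As a rough idea of what Horovod training with HorovodRunner involves, here is a hedged sketch: the tiny placeholder model, the random data, the function names and the hyperparameters are all assumptions, and `sparkdl.HorovodRunner` is only available on Databricks ML runtimes.

```python
def train_hvd(learning_rate=0.001, epochs=1):
    """Training function that HorovodRunner executes on every worker."""
    # Imports live inside the function because it is shipped to the workers.
    import numpy as np
    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    hvd.init()

    # Pin each worker process to one GPU, if any are present.
    gpus = tf.config.experimental.list_physical_devices("GPU")
    if gpus:
        tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], "GPU")

    # Placeholder model and random data standing in for the real network and images.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(8),
    ])
    x = np.random.rand(256, 32).astype("float32")
    y = np.random.rand(256, 8).astype("float32")

    # Scale the learning rate by the number of workers and wrap the optimizer
    # so that gradients are averaged across workers.
    optimizer = tf.keras.optimizers.Adam(learning_rate * hvd.size())
    optimizer = hvd.DistributedOptimizer(optimizer)
    model.compile(optimizer=optimizer, loss="mse")

    # Start every worker from the same initial weights (those of rank 0).
    callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
    model.fit(x, y, batch_size=32, epochs=epochs, callbacks=callbacks,
              verbose=1 if hvd.rank() == 0 else 0)


def run_distributed(num_workers=2):
    """Launch train_hvd on num_workers Spark tasks (Databricks only)."""
    from sparkdl import HorovodRunner

    runner = HorovodRunner(np=num_workers)
    runner.run(train_hvd, learning_rate=0.001, epochs=1)
```

Compared with a single-machine Keras script, the Horovod-specific additions are `hvd.init()`, the GPU pinning, the `DistributedOptimizer` wrapper and the broadcast callback. In a real job each worker would also read its own shard of the data.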

24.Distributed training The throughput scales up with more GPUs,

25.Distributed training but not as much as expected.

26.Steps for visual-similarity How to retrieve visually similar items? Step 1: extract image embeddings. • Train a model on our own dataset. • From single-machine to distributed training. Step 2: search for similar embeddings. [Figure: a CNN produces an embedding, and a similarity search retrieves the closest embeddings]

27.Similar items retrieval • Brute-force search can be expensive and slow for large, high-dimensional datasets. • We use the approximate similarity search implemented in Spark: – Hash step: hash similar embeddings into the same buckets using locality sensitive hashing (LSH). – Search step: only search for embeddings in the same buckets with Euclidean distance. Hashing for Similarity Search: A Survey, J. Wang et al. (2014)

28.Locality sensitive hashing LSH hashes vectors with a small distance into the same buckets with a high probability. The hash function for Euclidean distance is: $h(x) = \left\lfloor \frac{x \cdot v}{r} \right\rfloor$, where v is a random unit vector, r is the bucket length. Example: v = [0.44, 0.90], r = 2; x1 = [2.0, 2.0], h(x1) = 1; x2 = [2.0, 3.0], h(x2) = 1; x3 = [0.0, 5.0], h(x3) = 2
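
The worked example on this slide can be checked with a few lines of plain Python. The sketch assumes the hash has the form h(x) = floor((x · v) / r), which is the random-projection hash Spark uses for Euclidean distance and which reproduces the values on the slide; the function name `lsh_hash` is illustrative.

```python
import math


def lsh_hash(x, v, r):
    """Project x onto the direction v and return the index of its bucket."""
    projection = sum(xi * vi for xi, vi in zip(x, v))
    return math.floor(projection / r)


v, r = [0.44, 0.90], 2.0
points = [[2.0, 2.0], [2.0, 3.0], [0.0, 5.0]]
hashes = [lsh_hash(x, v, r) for x in points]  # -> [1, 1, 2]
```

The two nearby points x1 and x2 land in bucket 1, while the more distant x3 lands in bucket 2.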

29.Parameters in LSH Two parameters: • bucketLength: the length of each hash bucket. • numHashTables: the number of hash tables (hash functions h1, h2, ..., hn). [Figure: effect of bucketLength and numHashTables on accuracy and query performance]
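
In Spark ML, these two parameters belong to `BucketedRandomProjectionLSH`, the LSH implementation for Euclidean distance. The sketch below shows how a nearest-neighbour query could be set up with it; it assumes this is the class behind the talk's similarity search, and the function name, column names and default parameter values are illustrative.

```python
def find_similar(spark, rows, query, k=5, bucket_length=2.0, num_hash_tables=3):
    """Return the approximate k nearest embeddings to `query` as a DataFrame.

    `spark` is an existing SparkSession, `rows` is a list of
    (item_id, embedding) pairs and `query` is a list of floats.
    """
    # Imported here so that loading this sketch does not require PySpark.
    from pyspark.ml.feature import BucketedRandomProjectionLSH
    from pyspark.ml.linalg import Vectors

    df = spark.createDataFrame(
        [(item_id, Vectors.dense(vec)) for item_id, vec in rows],
        ["id", "embedding"],
    )

    lsh = BucketedRandomProjectionLSH(
        inputCol="embedding",
        outputCol="hashes",
        bucketLength=bucket_length,
        numHashTables=num_hash_tables,
    )
    model = lsh.fit(df)  # hash step: assign every embedding to its buckets

    # Search step: compare the query only with embeddings in matching buckets.
    return model.approxNearestNeighbors(
        df, Vectors.dense(query), k, distCol="distance"
    )
```

According to the Spark documentation, a larger bucket length or more hash tables lowers the chance of missing true neighbours, at the cost of more computation per query.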