E-commerce Product Recommendations - Recommendation System


Context

This dataset is taken from Kaggle.

Overview

A simple recommendation system based on content-based filtering, using Cosine Similarity and Jaccard Similarity. A summary of the results can be seen below; for details, see this notebook.

Summary

Exploratory Data Analysis

First, I need to clean the data. As the random sample below shows, the descriptions contain HTML tags like <br>, <b>, <ul>, etc.

[Image: random sample of product descriptions containing HTML tags]
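As a rough illustration of this cleaning step, here is a minimal sketch that strips tags with a regular expression. The function name `strip_html` and the sample string are made up for this example, and the notebook may clean the data differently.

```python
import html
import re

# Matches anything that looks like an HTML tag, e.g. <br>, <b>, </ul>.
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_html(text: str) -> str:
    """Remove HTML tags, decode entities, and collapse whitespace."""
    no_tags = TAG_PATTERN.sub(" ", text)       # replace each tag with a space
    unescaped = html.unescape(no_tags)         # turn &amp; into &, etc.
    return re.sub(r"\s+", " ", unescaped).strip()


# Hypothetical description, not taken from the dataset.
sample = "<b>Warm fleece</b><br><ul><li>Recycled polyester</li><li>Machine wash</li></ul>"
print(strip_html(sample))
```

A regular expression is enough for simple, well-formed markup like this; messier HTML would probably call for a real parser.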

After that, I was curious about the distribution of description lengths, so I visualized it:

[Image: distribution of description lengths]

The distribution shows that the seller's strategy is good: a description shouldn't be too long. Next, I wanted to know which words appear most often in this dataset. I used a word cloud to answer that question, which we can see below:

[Image: word cloud of the product descriptions]

That makes sense, because the dataset description says “500 actual SKUs from an outdoor apparel brand’s product catalog.” That's why there are many words for clothing materials, and why many of the products are eco-friendly clothes.
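A word cloud is essentially a picture of word frequencies, so the same question can be answered with a simple count. The sketch below is only an illustration: the stopword list, the `top_words` helper, and the sample descriptions are all made up, not taken from the notebook or the dataset.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "and", "of", "for", "with", "in", "to", "is"}


def top_words(descriptions, n=5):
    """Return the n most frequent non-stopword tokens across all descriptions."""
    counts = Counter()
    for text in descriptions:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


# Hypothetical descriptions in the spirit of an outdoor apparel catalog.
descriptions = [
    "Organic cotton tee with recycled polyester trim",
    "Recycled polyester jacket for the trail",
    "Organic cotton hoodie",
]
print(top_words(descriptions, n=4))
```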

Cosine Similarity & Jaccard Similarity

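Cosine Similarity has a simple equation:

$$\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

while the Jaccard Similarity equation is:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$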

The equations show how Cosine and Jaccard Similarity work. With Cosine Similarity, the smaller the angle between two vectors, the greater the match between them. We can think of this as how close two objects are. Jaccard Similarity compares the members of two sets to see how many are shared and how many are distinct. The larger the intersection relative to the union, the higher the Jaccard score. To see both in action, let's say our customer chooses product ID 218:

[Image: product ID 218]

These are the top 3 recommended products using the Cosine Similarity approach:

[Image: top 3 recommendations using Cosine Similarity]

And these are the top 3 recommended products using the Jaccard Similarity approach:

[Image: top 3 recommendations using Jaccard Similarity]
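To make the two measures concrete, here is a minimal sketch in plain Python. It assumes simple word counts for Cosine Similarity and word sets for Jaccard Similarity; the notebook may vectorize the text differently (for example with TF-IDF). The tiny `catalog`, its product IDs, and the helper names are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine of the angle between two bag-of-words count vectors."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(a, b):
    """Size of the shared word set divided by the size of the combined word set."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def top_n(product_id, catalog, similarity, n=3):
    """Rank every other product by similarity to the chosen one."""
    query = catalog[product_id]
    scores = [
        (other_id, similarity(query, text))
        for other_id, text in catalog.items()
        if other_id != product_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical mini catalog, not the Kaggle data.
catalog = {
    1: "organic cotton t-shirt",
    2: "organic cotton hoodie",
    3: "recycled polyester rain jacket",
    4: "waterproof rain jacket",
}
print(top_n(1, catalog, cosine_similarity, n=2))
print(top_n(1, catalog, jaccard_similarity, n=2))
```

Both functions rank product 2 first for product 1, since those two descriptions share the most words. The scores themselves differ because Cosine Similarity works on count vectors while Jaccard Similarity only looks at set membership.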

Conclusion

In recommendation systems, no approach is simply better or worse than another. Looking at the results, though, I can say that both approaches give good results: the recommended products are similar to the chosen product. For details, see this notebook.