4220 2 (Bigdata)
Instructor: Li Yang
What’s big data?
• Term for data sets so large and complex that traditional data processing
and storage techniques fail
• Large variety: online data (web, photo, video, social) and offline data (sensor
data, …)
• Traditional data processing techniques cannot scale to big data without massive code
development
Evolution of big data techniques
• Hadoop
• HDFS
• MapReduce
• Apache Spark
• User-friendly
• Efficient: up to 100 times faster in memory and 10 times faster on disk than MapReduce
Big data analytics: comparison of Hadoop MapReduce and Apache Spark, 2016
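To make the MapReduce idea concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce phases using the classic word-count task. It is plain Python, not Hadoop code; the function names and the sample documents are assumptions made for illustration only.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the list of counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

def word_count(documents):
    """Run map over every document, then shuffle and reduce the pairs."""
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))

# Hypothetical input: each string stands in for one file block.
docs = ["big data is big", "data is everywhere"]
counts = word_count(docs)
print(counts)
```

In a real cluster the map calls run in parallel on the nodes that hold the data blocks, and the shuffle step moves the pairs across the network; this sketch only shows the data flow.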
Big Data Examples
• Walmart (more details later)
• Uber
• Machine learning: predict the demand everywhere and set the local price
• eBay
• Requirement: rapid analysis of streaming data and quick action on it
• Procter & Gamble
• Hadoop
Example of Big Data: Walmart
How did big data analysis help increase Walmart's sales turnover?
• Design the most popular products to catch trends (e.g. Christmas products)
• Demand
• Pricing
• Logistics
• Customized Recommendations
• Designed coupons
• Designed advertising
Big Data Analytic Solutions
• Social Media Big Data Solutions
• A big part of Walmart's data-driven decisions is based on social media data (Facebook comments, Pinterest pins,
Twitter tweets, LinkedIn shares, …)
1. Social Genome: developed by WalmartLabs; uses social network data to better analyze the context of users
2. Shopycat, a gift recommendation engine at Walmart: app developed by Walmart; helps consumers buy the ideal
gift for their friends during the holiday rush; also gives detailed reference information for the recommendations
3. Inventory management at Walmart: helps managers optimize product storage and decide how many cashiers
and self-checkout lanes should be open
• Walmart's mobile application: a shopping list that tells customers where the items they want are located and helps them by
providing discounts; the geofencing feature of the app senses whenever a user enters a Walmart store in the US.
• https://www.forbes.com/sites/bernardmarr/2017/08/29/how-walmart-is-using-machine-learning-ai-iot-and-big-data-to-boost-retail-performance/?sh=68bd71496cb1
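The geofencing feature mentioned for the mobile app can be pictured as a distance check between the phone and the store. The sketch below is only an illustration of that idea, not Walmart's implementation: the coordinates, the 150 m radius, and the function names are all assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def inside_geofence(user, store, radius_m=150):
    """True when the user is within radius_m metres of the store."""
    return haversine_m(user[0], user[1], store[0], store[1]) <= radius_m

# Hypothetical coordinates, chosen only for the example.
store = (36.3729, -94.2088)
user_near = (36.3730, -94.2089)   # roughly 14 m from the store
user_far = (36.4000, -94.2088)    # roughly 3 km from the store

print(inside_geofence(user_near, store))
print(inside_geofence(user_far, store))
```

A production app would normally rely on the geofencing service of the phone's operating system rather than polling coordinates itself; the sketch only shows the underlying test.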
Example of Big Data: Netflix
Netflix Recommender System: A Big Data Case Study, Kasula, 2020
• Netflix is an American over-the-top content platform and production company headquartered in Los
Gatos, CA. The company's primary business is a subscription-based streaming service offering
online streaming from a library of films and television series, including those produced in-house.
(by Wikipedia)
• Netflix's main source of income is users' subscription fees. Subscribers can stream
a wide range of movies and TV shows at any time on a variety of internet-connected
devices
• The primary asset of Netflix is its technology, especially its recommendation system.
Recommender systems are studied as a branch of information filtering systems (Recommender
system, 2020).
• Most recommender systems learn about users from their history. There are
two primary approaches: collaborative filtering and content-based filtering.
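As a rough illustration of collaborative filtering, the sketch below predicts a missing rating from the ratings of similar users (user-based filtering with cosine similarity). The users, titles, and ratings are made up, and this is a textbook technique, not Netflix's actual algorithm. A content-based approach would instead compare the metadata of titles (genre, director, actors) with what the user already liked.

```python
import math

# Hypothetical ratings: user -> {title: rating on a 1-5 scale}.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "D": 5},
    "cat": {"A": 1, "C": 5, "D": 2},
}

def cosine(u, v):
    """Cosine similarity computed over the titles both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[t] * v[t] for t in common)
    norm_u = math.sqrt(sum(u[t] ** 2 for t in common))
    norm_v = math.sqrt(sum(v[t] ** 2 for t in common))
    return dot / (norm_u * norm_v)

def predict(user, title):
    """Similarity-weighted average of other users' ratings for the title."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or title not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[title]
        den += sim
    return num / den if den else None

# ann has not rated "D"; bob (similar taste) loved it, cat (different taste) did not.
pred = predict("ann", "D")
print(round(pred, 2))
```

Because ann's ratings resemble bob's far more than cat's, the prediction lands much closer to bob's rating of 5 than to cat's rating of 2.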
Big Data Source: Netflix
• Stream-related data such as the duration, time of playing, type of the device, day of the week and other
context-related information.
• The pattern and the titles that their subscribers add to their queues
• All the metadata related to a title in their catalog such as director, actor, genre, rating and reviews from different
platforms.
• Volume: approximately 105 TB of data for videos alone; 10,000 GB of rating data alone
• Velocity: Netflix continuously collects data about the time of day, the types of devices you watch content on, and the duration of your watch
• Veracity: bias, noise, and abnormalities in the data; not all movies were rated equally by an individual
• Variety: most of the data is in a structured format, such as time of day, duration of watch, popularity, social data, search-related
information, stream-related data, etc. However, Netflix could also be using unstructured data, for example the thumbnail pictures that
it uses for personalization.
Data Ecosystem: Netflix
Big Data Example: Netflix
• Hadoop, Cassandra, S3
• Machine learning
• Other techniques
• Matrix factorization
• Ensemble method
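Matrix factorization approximates the sparse user-by-title rating matrix as the product of two small factor matrices, learned only from the ratings that were actually observed. The following is a minimal sketch trained with stochastic gradient descent; the ratings, the number of factors, the learning rate, and the regularization value are all assumptions chosen for the example, not values from the Netflix case study.

```python
import random

# Hypothetical observed ratings as (user index, item index, rating).
observed = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
            (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
n_users, n_items, k = 3, 3, 2   # k = number of latent factors

rng = random.Random(0)          # fixed seed so the run is repeatable
P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]  # user factors
Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]  # item factors

def predict(u, i):
    """Predicted rating = dot product of user and item factor vectors."""
    return sum(P[u][f] * Q[i][f] for f in range(k))

def mse():
    """Mean squared error over the observed ratings only."""
    return sum((r - predict(u, i)) ** 2 for u, i, r in observed) / len(observed)

mse_before = mse()
lr, reg = 0.02, 0.01            # learning rate and L2 regularization strength
for _ in range(2000):
    for u, i, r in observed:
        err = r - predict(u, i)
        for f in range(k):
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += lr * (err * qi - reg * pu)
            Q[i][f] += lr * (err * pu - reg * qi)
mse_after = mse()

print(mse_before, mse_after)
print(predict(0, 2))            # estimate for a rating that was never observed
```

Once the factors are learned, `predict` can be called for any user and title pair, including pairs with no rating, which is what makes the method useful for recommendation.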
Big Data Example: Netflix
• What results did Netflix obtain from big data?
• Overall user engagement with Netflix increased with the help of the
recommender system. This led to lower cancellation rates and more streaming hours.
• Member satisfaction increased with the development of, and changes to, the recommendation system.
• Personalization and recommendations save Netflix more than $1 billion per year.
• Examples:
• The winning Netflix Prize algorithm improved the accuracy of rating predictions over Netflix's own
'Cinematch' system by 10.06% (Netflix Prize, 2020).
• According to (Netflix Technology Blog, 2017b), Singular Value Decomposition alone reached
an RMSE of 0.8914, whereas Restricted Boltzmann Machines alone reached an
RMSE of 0.8990 (RMSE is measured in rating points, not in percent; lower is better)
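RMSE (root mean squared error) is the metric behind these figures: the square root of the average squared difference between predicted and actual ratings. The sketch below computes it for two made-up models and for a simple average of the two, which is the basic idea of an ensemble. All numbers here are hypothetical and are not Netflix results.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical true ratings and the predictions of two different models.
actual = [4.0, 3.0, 5.0, 2.0]
model_a = [3.5, 3.5, 4.0, 2.5]
model_b = [4.5, 2.0, 5.5, 1.0]

# Simple ensemble: average the two models' predictions with equal weight.
blend = [0.5 * a + 0.5 * b for a, b in zip(model_a, model_b)]

print(round(rmse(model_a, actual), 4))
print(round(rmse(model_b, actual), 4))
print(round(rmse(blend, actual), 4))
```

In this toy data the two models err in opposite directions, so their errors partly cancel and the blend scores a lower RMSE than either model alone. Blending does not always help this much; the gain depends on how different the models' errors are.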