4220 2 (Bigdata)


Introduction to Big Data

Instructor: Li Yang
What’s big data?

• Term for data sets so large and complex that traditional data processing
and storage techniques fail

• Large Volume: (MB, GB, TB, PB)

• Large Variety: (online data: web, photo, video, social; offline data: sensor
data, …)

• Varied Velocity: (periodic, real-time)

• Veracity: (quality of data: usually poor for big data)


Why big data?
• Advances in technology create the need to store and process huge amounts of data

• Web and super (cloud) computing

• Traditional data processing techniques (RDBMSs) fail:

• They fit numeric, well-structured, clean (no missing values) data

• Scaling is costly (expensive hardware)

• Fault tolerance (the ability to recover from hardware failure) is also expensive

• Traditional data processing techniques can’t scale to big data without massive code
development
Evolution of big data techniques
• Hadoop

• HDFS

• MapReduce

• Spark: designed to run on top of Hadoop (upgrade)

• User-friendly

• Efficient: up to 100 times faster in memory and up to 10 times faster on disk than MapReduce

• Combines SQL, Streaming, and other complicated analytics

• Runs ‘everywhere’ (not only on Hadoop but also Mesos, …)

Big data analytics: comparison of Hadoop MapReduce and Apache Spark, 2016
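
The MapReduce model named above can be illustrated with a toy word count. This is a minimal single-machine sketch in plain Python, not Hadoop's actual API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) and the sample documents are assumptions made for illustration. On a real cluster the framework runs mappers and reducers in parallel on different nodes and performs the shuffle itself.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(documents):
    pairs = []
    for doc in documents:  # on a cluster, each mapper would handle its own input split
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))

# Hypothetical input: two tiny "documents"
docs = ["big data is big", "data is everywhere"]
counts = word_count(docs)
```

Because the map and reduce steps only see one document or one key at a time, they can be spread over many machines without changing the logic.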
Big Data Examples
• Walmart (more details later)

• Data mining: discover consumers’ purchase patterns

• Hadoop and NoSQL techniques

• Uber

• Machine learning: predict demand in each area and set local prices

• Netflix (more details later)

• Machine learning: cater to each consumer’s preferences (recommendation engine)

• Hadoop, SQL, Cassandra: online on-demand video streaming data

• eBay

• Requirement: rapid analysis of streaming data and quick action on it

• Apache Spark, Storm, Kafka

• Procter & Gamble

• marketing, product development, supply chain

• Hadoop
Example of Big Data: Walmart
How did big data analysis help increase Walmart’s sales turnover?

• Walmart is an American multinational retail corporation that operates a chain of hypermarkets,
discount department stores, and grocery stores from the United States, headquartered in
Bentonville, Arkansas. (by Wikipedia)

• Walmart ranks #1 in the Fortune 500 in 2021.

Walmart had a banner 2020, with U.S. e-commerce sales up 79% as pandemic-weary customers
consolidated shopping trips to fewer retailers and took advantage of the big-box giant’s
strong curbside pickup offering. Its Sam’s Club and international businesses also boomed for
similar reasons.
Walmart Data Source 1: consumers

Walmart tracks and targets every consumer individually

• Walmart gathers information on what customers buy, where they live, and which
products they like through in-store Wi-Fi

• Walmart collects every clickable action on Walmart.com: what consumers buy in-store
and online

• Walmart also pays attention to local news, trends on social networks, and even local
weather.


Walmart Data Source 2: employees and itself

Walmart tracks every employee

• Walmart collects the online retailers’ information

• Walmart gathers the employees’ information to optimize its own organization and improve
efficiency

Example of Big Data: Walmart


• Summary: American multinational retail giant Walmart collects 2.5
petabytes of unstructured data from 1 million customers every hour
Usage of Big Data by Walmart
• Launching new products

• Design popular products that catch a trend (e.g. Christmas products)

• Better Predictive Analysis

• Demand

• Pricing

• Logistics

• Customized Recommendations

• Customized coupons

• Customized advertising
Big Data Analytic Solutions
• Social Media Big Data Solutions

• Social media data is unstructured, informal, and generally ungrammatical

• A big part of Walmart’s data-driven decisions is based on social media data (Facebook comments, Pinterest pins,
Twitter tweets, LinkedIn shares, …)

1. Social Genome: developed by WalmartLabs; uses social network data to better analyze the context of its users

2. Shopycat, a gift recommendation engine at Walmart: an app developed by Walmart; helps consumers buy the ideal
gift for their friends during the holiday rush; also gives detailed reference information for the recommendations

3. Inventory management at Walmart: helps managers optimize product storage and decide how many cashiers
and self-checkout lanes should be open

• Mobile Big Data Solutions

• More than half of Walmart’s customers use smartphones

• Walmart’s mobile application: a shopping list that tells customers where to find the items they want and helps them by
providing discounts; the geofencing feature of Walmart’s mobile app senses whenever a user enters a Walmart store in the US.

• https://www.forbes.com/sites/bernardmarr/2017/08/29/how-walmart-is-using-machine-learning-ai-iot-and-big-data-to-boost-retail-performance/?sh=68bd71496cb1
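
The geofencing idea mentioned above can be sketched as a distance check between a phone's reported position and a store's position. This is only an illustration under stated assumptions: the slides do not say how Walmart's app implements geofencing, and the store coordinates, the 200 m radius, and the function names below are hypothetical. The distance uses the standard haversine formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def inside_geofence(user, store, radius_km):
    """True when the user's (lat, lon) lies within radius_km of the store."""
    return haversine_km(user[0], user[1], store[0], store[1]) <= radius_km

# Hypothetical coordinates, chosen only for illustration
STORE = (36.37, -94.21)
near_user = (36.3705, -94.2102)  # a few dozen metres away
far_user = (36.47, -94.21)       # roughly 11 km north

entered = inside_geofence(near_user, STORE, radius_km=0.2)
still_outside = inside_geofence(far_user, STORE, radius_km=0.2)
```

An app would typically run such a check on each location update and trigger store-specific content only on the transition from outside to inside.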
Example of Big Data: Netflix
Netflix Recommender System — A Big Data Case Study, Kasula, 2020

• Netflix is an American over-the-top content platform and production company headquartered in Los
Gatos, CA. The company's primary business is a subscription-based streaming service offering
online streaming from a library of films and television series, including those produced in-house.
(by Wikipedia)

• Its main source of income is users’ subscription fees. Users can stream
any of its wide range of movies and TV shows at any time on a variety of internet-connected
devices

• The primary asset of Netflix is its technology, especially its recommendation system.
Recommender systems are a subclass of information filtering systems (Recommender
system, 2020).

• Most recommender systems learn about users from their history. Recommender systems
take two primary approaches: collaborative filtering and content-based filtering.
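
Collaborative filtering, the first of the two approaches above, can be sketched as follows: predict a user's rating for an unseen title as a similarity-weighted average of other users' ratings. This is a minimal user-based sketch under stated assumptions, not Netflix's system; the users, titles, and ratings are hypothetical, and cosine similarity is one of several possible similarity measures.

```python
import math

# Hypothetical ratings: user -> {title: rating on a 1-5 scale}
ratings = {
    "ana":  {"Drama A": 5, "Comedy B": 1, "Thriller C": 4},
    "ben":  {"Drama A": 4, "Comedy B": 2, "Thriller C": 5, "Sci-Fi D": 5},
    "cara": {"Drama A": 1, "Comedy B": 5, "Sci-Fi D": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the titles both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = math.sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)

def predict(user, title):
    """Similarity-weighted average of other users' ratings for `title`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or title not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[title]
        den += sim
    return num / den if den else None

# ana has not rated "Sci-Fi D"; her history resembles ben's more than cara's,
# so the prediction is pulled toward ben's rating of 5.
prediction = predict("ana", "Sci-Fi D")
```

Content-based filtering would instead compare title metadata (genre, director, actors) with the titles a user already liked, without looking at other users.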
Big Data Source: Netflix

• Internal source of data:

• Billions of ratings from its members

• Stream-related data such as the duration, time of playing, type of device, day of the week, and other
context-related information.

• The pattern and the titles that its subscribers add to their queues

• All the metadata related to a title in its catalog, such as director, actor, genre, rating, and reviews from different
platforms.

• Search-related text entered by Netflix subscribers

• External source of data:

• box office information, performance and critic reviews

• demographics, culture, language, and other temporal data


Big Data Example: Netflix

• What does Netflix want from Big Data?

• Recommend the ‘next content’ to its users

• What is the ‘next content’ for each consumer?

• What are the big-data challenges for Netflix?

• Volume: approximately 105 TB of data for videos alone; 10,000 GB of rating data alone

• Velocity: Netflix collects data about the time of day, the types of devices you watch content on, and the duration of your watch

• Veracity: bias, noise, and abnormalities in data; not all movies were rated equally by an individual

• Variety: most of the data is in a structured format, such as time of day, duration of watch, popularity, social data, search-related
information, stream-related data, etc. However, Netflix could also be using unstructured data, for example the thumbnail pictures that
it uses for personalization.
Data Ecosystem: Netflix
Big Data Example: Netflix

• What advanced techniques are used for big data?

• Data Storage and preprocessing

• Hadoop, Cassandra, S3

• Machine learning

• Supervised learning: classification, regression

• Unsupervised learning: clustering, compression, dimension reduction

• Other techniques

• Matrix factorization

• Singular value decomposition

• Probabilistic graphical models

• Ensemble methods
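
Matrix factorization, listed above, approximates the sparse user-by-item rating matrix as the product of two small factor matrices, so that unrated pairs can be estimated from the learned factors. The sketch below is a minimal illustration trained with stochastic gradient descent in plain Python. It is not a true singular value decomposition and not Netflix's implementation; the ratings, the rank `k`, the learning rate, and the regularization weight are all hypothetical choices.

```python
import math
import random

# Hypothetical observed ratings as (user, item, rating) triples; missing pairs are unrated.
OBSERVED = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 0, 1), (3, 3, 4),
    (4, 1, 1), (4, 2, 5), (4, 3, 4),
]

def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's latent factors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def factorize(observed, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn user factors P and item factors Q by SGD on the observed ratings only."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def training_rmse(P, Q, observed):
    """Root mean squared error over the observed ratings."""
    sq = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in observed)
    return math.sqrt(sq / len(observed))

P0, Q0 = factorize(OBSERVED, 5, 4, epochs=0)  # untrained factors, same seed
P, Q = factorize(OBSERVED, 5, 4)
rmse_before = training_rmse(P0, Q0, OBSERVED)
rmse_after = training_rmse(P, Q, OBSERVED)
estimate = predict(P, Q, 0, 2)  # a pair that user 0 never rated
```

The error here is measured on the training ratings only; a real evaluation would hold out some ratings and measure the error on those.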
Big Data Example: Netflix

• What results has Netflix obtained from big data?

• Overall user engagement with Netflix has increased with the help of the
recommender system. This led to lower cancellation rates and more streaming hours

• Member satisfaction increased with the development and changes to the recommendation system.

• Personalization and recommendations save Netflix more than $1 billion per year.

• Examples:

• The winning algorithm improved rating-prediction accuracy over ‘Cinematch’
by 10.06% (Netflix Prize, 2020).

• According to (Netflix Technology Blog, 2017b), Singular Value Decomposition
reduced the RMSE to 0.8914, whereas Restricted Boltzmann Machines reduced the
RMSE to 0.8990
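
RMSE (root mean squared error), the metric quoted above, measures how far predicted ratings are from actual ratings; lower is better. The sketch below shows how RMSE and a relative improvement between two models can be computed. The ratings and both sets of predictions are hypothetical toy values, not Netflix data.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which the new model lowers the baseline's RMSE."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# Hypothetical toy data
actual = [5, 3, 4, 1]
baseline_pred = [4, 4, 3, 2]          # every prediction is off by 1
improved_pred = [4.5, 3.5, 3.5, 1.5]  # every prediction is off by 0.5

baseline_rmse = rmse(baseline_pred, actual)
improved_rmse = rmse(improved_pred, actual)
gain = relative_improvement(baseline_rmse, improved_rmse)
```

A percentage such as the 10.06% quoted for the Netflix Prize is a relative improvement of this kind, measured against the baseline system's RMSE.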
