
4220 2 (Bigdata)

American streaming giant Netflix collects billions of data points from user interactions to power its recommendation system. This includes ratings, watch histories, and metadata. Netflix uses big data techniques like Hadoop, machine learning algorithms, and matrix factorization to analyze this unstructured and high-volume data and provide personalized recommendations. This has increased user engagement and satisfaction while reducing cancellation rates.

Uploaded by

darren boesono

Introduction to Big Data

Instructor: Li Yang
What’s big data?

• Term for data sets so large and complex that traditional data processing
and storage techniques fail

• Large Volume: (MB, GB, TB, PB)

• Large Variety: (online data: web, photo, video, social; offline data: sensor
data, …)

• Varied Velocity: (periodic, real-time)

• Veracity: (quality of data: usually poor for big data)


Why big data?
• Advances in technology create the need to store and process huge amounts of data

• Web and Super (cloud) computing

• Traditional data processing techniques (RDBMSs) fail:

• Fit for numeric, well-structured, clean (no missing) data

• Scaling is costly (requires expensive hardware)

• Fault tolerance (the ability to recover from hardware failure) is also expensive

• Traditional data processing techniques can’t scale to big data without massive code
development
Evolution of big data techniques
• Hadoop

• HDFS

• MapReduce

• Spark: designed to run on top of Hadoop (upgrade)

• User-friendly

• Efficient: up to 100 times faster in memory and 10 times faster on disk than MapReduce

• Combines SQL, streaming, and other complex analytics

• Runs ‘everywhere’ (not only on Hadoop but also Mesos, …)

Big data analytics: comparison of Hadoop MapReduce and Apache Spark, 2016
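
The MapReduce model listed above can be illustrated with a minimal single-machine sketch in plain Python. This is not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only to mirror the three stages that a real cluster would run in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the list of values for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

In a real deployment each map and reduce call would run on a different machine and the shuffle would move data over the network; the sketch only shows the data flow.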
Big Data Examples
• Walmart (more details later)

• Data mining: discover consumer’s purchase pattern

• Hadoop and NoSQL technique

• Uber

• Machine learning: predict the demand everywhere and set the local price

• Netflix (more details later)

• Machine learning: cater to each consumer’s preference (recommendation engine)

• Hadoop, SQL, Cassandra: online on-demand video streaming data

• eBay

• Requirement: rapid analysis of streaming data and quick action on it

• Apache Spark, Storm, Kafka

• Procter & Gamble

• marketing, product development, supply chain

• Hadoop
Example of Big Data: Walmart
How did Big Data analysis help increase Walmart’s sales turnover?

• Walmart is an American multinational retail corporation that operates a chain of hypermarkets,
discount department stores, and grocery stores from the United States, headquartered in
Bentonville, Arkansas. (by Wikipedia)

• Walmart ranks #1 in the Fortune 500 in 2021.

Walmart had a banner 2020, with U.S. e-commerce sales up 79% as pandemic-weary customers
consolidated shopping trips to fewer retailers and took advantage of the big-box giant’s
strong curbside pickup offering. Its Sam’s Club and international businesses also boomed for
similar reasons.
Walmart Data Source 1: consumers

Walmart tracks and targets every consumer individually

• Walmart gathers information on what customers buy, where they live, and which
products they like through in-store Wi-Fi

• Walmart collects every clickable action on Walmart.com: what consumers buy in-store
and online

• Walmart also pays attention to local news, trends on social networks, and even local
weather.

Walmart Data Source 2: employees and itself

Walmart tracks every employee

• Walmart collects the online retailers’ information

• Walmart gathers the employees’ information to optimize its own organization and improve
efficiency

Example of Big Data: Walmart


• Summary: American multinational retail giant Walmart collects 2.5
petabytes of unstructured data from 1 million customers every hour
Usage of Big Data by Walmart
• Launching new products

• Design popular products that catch the trend (e.g., Christmas products)

• Better Predictive Analysis

• Demand

• Pricing

• Logistics

• Customized Recommendations

• Tailored coupons

• Tailored advertising
Big Data Analytic Solutions
• Social Media Big Data Solutions

• Social media data is unstructured, informal, and generally ungrammatical

• A big part of Walmart’s data-driven decisions is based on social media data (Facebook comments, Pinterest pins,
Twitter tweets, LinkedIn shares, …)

1. Social Genome: developed by WalmartLabs; uses social network data to better analyze the context of its users

2. Shopycat, a gift recommendation engine at Walmart: an app developed by Walmart that helps consumers buy the ideal
gift for their friends during the holiday rush and gives detailed reference information for its recommendations

3. Inventory management at Walmart: helps managers optimize product storage and decide how many cashiers
and self-checkouts should be open

• Mobile Big Data Solutions

• More than half of Walmart’s customers use smartphones

• Walmart’s mobile application: a shopping list that tells customers where the items they want are located and helps them by
providing discounts; the geofencing feature of Walmart’s mobile app senses whenever a user enters a Walmart store in the US.

• https://www.forbes.com/sites/bernardmarr/2017/08/29/how-walmart-is-using-machine-learning-ai-iot-and-big-data-to-boost-retail-performance/?sh=68bd71496cb1
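
A geofencing check like the one mentioned for the mobile app can be sketched as a distance test between the phone's location and a store's location. The slides do not describe how Walmart implements it; the haversine formula, the 150 m radius, and the store coordinates below are all assumptions used for illustration only.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_geofence(user, store, radius_m=150.0):
    """Return True if the user's (lat, lon) is within radius_m of the store."""
    return haversine_m(user[0], user[1], store[0], store[1]) <= radius_m


# Hypothetical store location, for illustration only.
store = (36.3729, -94.2088)
print(inside_geofence((36.3730, -94.2087), store))  # a few metres away
print(inside_geofence((36.3829, -94.2088), store))  # roughly a kilometre north
```

A production app would usually rely on the phone platform's own geofencing service rather than polling coordinates, but the underlying test is the same distance comparison.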
Example of Big Data: Netflix
Netflix Recommender System — A Big Data Case Study, Kasula, 2020

• Netflix is an American over-the-top content platform and production company headquartered in Los
Gatos, CA. The company's primary business is a subscription-based streaming service offering
online streaming from a library of films and television series, including those produced in-house.
(by Wikipedia)

• Its main source of income is users’ subscription fees. Users can stream
any of its movies and TV shows at any time on a variety of internet-connected
devices

• The primary asset of Netflix is its technology, especially its recommendation system. The
study of recommendation systems is a branch of information filtering systems (Recommender
system, 2020).

• Most recommender systems study users through their history. Recommender systems
have two primary approaches: collaborative filtering and content-based filtering.
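
To make the collaborative filtering idea concrete, here is a minimal user-based sketch: a missing rating is predicted as the similarity-weighted average of other users' ratings. The toy `ratings` table, the user and title names, and the choice of cosine similarity are assumptions for illustration; this is not Netflix's actual method.

```python
import math

# Toy rating table (user -> {title: stars}); values are made up.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "D": 5},
    "cat": {"A": 1, "C": 5, "D": 2},
}


def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += sim
    return num / den if den else None


print(predict("ann", "D"))
```

Here "ann" rates titles much like "bob" and unlike "cat", so the prediction for title "D" lands closer to bob's 5 than to cat's 2. Content-based filtering would instead compare title metadata (genre, director, actors) with the titles a user already liked.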
Big Data Source: Netflix

• Internal source of data:

• Billions of ratings from its members

• Stream-related data such as the duration, time of playing, type of device, day of the week, and other
context-related information.

• The patterns and titles that subscribers add to their queues

• All the metadata related to a title in its catalog, such as director, actor, genre, rating, and reviews from different
platforms.

• The search-related text entered by Netflix subscribers

• External source of data:

• Box office information, performance, and critic reviews

• Demographics, culture, language, and other temporal data


Big Data Example: Netflix

• What does Netflix want from Big Data?

• Recommend the ‘next content’ to its users

• What is the ‘next content’ for each consumer?

• What are the big-data challenges for Netflix?

• Volume: approximately 105 TB of data for videos alone; 10,000 GB of rating data alone

• Velocity: collects data about the time of day, the types of devices you watch content on, and the duration of your watch

• Veracity: bias, noise, and abnormalities in data; not all movies were rated equally by an individual

• Variety: most of the data is in a structured format, such as time of day, duration of watch, popularity, social data, search-related
information, stream-related data, etc. However, Netflix could also be using unstructured data, for example the thumbnail pictures that
it uses for personalization.
Data Ecosystem: Netflix
Big Data Example: Netflix

• What advanced techniques are used for Big Data?

• Data Storage and preprocessing

• Hadoop, Cassandra, S3

• Machine learning

• Supervised learning: classification, regression

• Unsupervised learning: clustering, compression, dimension reduction

• Other techniques

• Matrix factorization

• Singular value decomposition

• Probabilistic graphical models

• Ensemble methods
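
Matrix factorization, listed above, can be sketched in a few lines: each user and each title gets a short vector of latent factors, and a rating is approximated by their dot product. The version below trains the factors with stochastic gradient descent on a made-up rating list. The function names, hyperparameters (`k`, `steps`, `lr`, `reg`), and data are assumptions for illustration, not Netflix's production setup.

```python
import math
import random


def factorize(ratings, n_users, n_items, k=2, steps=3000, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ rating."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict_rating(P, Q, u, i):
    """Predicted rating of item i by user u."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


# (user index, item index, stars); toy data only.
data = [(0, 0, 5), (0, 1, 4), (0, 2, 1),
        (1, 0, 4), (1, 1, 5), (1, 3, 5),
        (2, 0, 1), (2, 2, 5), (2, 3, 2)]

P, Q = factorize(data, n_users=3, n_items=4)
train_rmse = math.sqrt(sum((r - predict_rating(P, Q, u, i)) ** 2 for u, i, r in data) / len(data))
print("training RMSE:", train_rmse)
print("predicted rating of item 3 by user 0:", predict_rating(P, Q, 0, 3))
```

The last line fills in a rating that was never observed, which is exactly what a recommender needs. Singular value decomposition reaches a similar low-rank representation by a different numerical route.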
Big Data Example: Netflix

• What results has Netflix obtained from Big Data?

• Overall user engagement with Netflix has increased with the help of the
recommender system. This led to lower cancellation rates and more streaming hours

• Member satisfaction increased with the development and changes to the recommendation system.

• Personalization and recommendations save Netflix more than $1 billion per year.

• Examples:

• The winning Netflix Prize algorithm improved the accuracy of rating predictions over ‘Cinematch’
by 10.06% (Netflix Prize, 2020).

• According to (Netflix Technology Blog, 2017b), Singular Value Decomposition alone achieved
an RMSE of 0.8914, whereas Restricted Boltzmann Machines alone achieved an
RMSE of 0.8990
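
RMSE (root-mean-square error) is the error measure behind the figures above: it is the square root of the average squared difference between predicted and actual ratings, so lower is better. The sketch below computes it and the relative improvement between two RMSE values. The baseline of 0.9525 for Cinematch and 0.8567 for the winning entry are the commonly cited Netflix Prize test-set figures; they are not stated in these slides and should be treated as an assumption.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def relative_improvement(baseline_rmse, new_rmse):
    """Percentage by which new_rmse improves on baseline_rmse."""
    return (baseline_rmse - new_rmse) / baseline_rmse * 100


print(rmse([4, 3, 5], [5, 3, 4]))
# Assumed figures (not from the slides): Cinematch 0.9525, winner 0.8567.
print(round(relative_improvement(0.9525, 0.8567), 2))
```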
