Data Analytics Assignment
Introduction:
In today’s digital era, data analytics has become the backbone of modern businesses,
driving informed decision-making and fostering innovation. Companies across industries
generate massive amounts of data daily, encompassing customer behaviour, operational
processes, market trends, and more. Harnessing this data effectively is crucial for
maintaining a competitive edge. This is where data modelling comes into play.
Data modelling involves transforming raw data into actionable insights through
structured methodologies and machine learning algorithms. Businesses use these models
to predict customer preferences, optimize operations, improve user experiences, and
design smarter products and services. From dividing datasets for training and testing to
scaling and imputing missing values, each step in the data pipeline ensures accuracy and
reliability of outcomes.
This document delves into how four industry leaders—Amazon, Netflix, Tesla, and
Google—utilize data analytics to model their business strategies. These companies are
pioneers in their respective domains, leveraging state-of-the-art data processing
techniques to achieve remarkable results. The analysis focuses on the following aspects:
Data Division: How datasets are split into training, validation, and test sets for
optimal performance.
Data Scaling: Techniques used to normalize or standardize data for better model
compatibility.
Model Selection: The types of machine learning models employed, such as
regression, classification, and neural networks.
Decision Trees and Graphs: The role of visual and hierarchical decision-making
tools in improving predictions.
Data Imputation: Strategies to address missing or incomplete data, ensuring robust
model performance.
1. Amazon:
1. Data Division
To ensure effective model training and testing, Amazon divides its data into training, validation, and test sets; in the recommendation example later in this section, 70% of the data trains the model, 15% tunes parameters, and 15% evaluates performance.
2. Data Scaling
Scaling ensures that all features contribute equally to the model's performance. Amazon
applies:
Min-Max Normalization:
o Normalizes features like user age, order value, and product ratings to a
range of [0, 1].
o Example:
Original data: Order Value = $10, $100, $500
Scaled data: Order Value ≈ 0, 0.18, 1
o This ensures no single feature (like order value) disproportionately affects
the model.
Standardization:
o For datasets with Gaussian distributions (e.g., delivery times), Amazon uses
z-score normalization.
o Example:
Delivery times are standardized using z = (x − μ) / σ, where x is a delivery time, μ is the mean, and σ is the standard deviation.
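Both scaling steps can be sketched in a few lines of Python. This is a minimal illustration using scikit-learn on invented order values and delivery times, not Amazon's actual data or pipeline:
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical feature values (illustrative only)
order_values = np.array([[10.0], [100.0], [500.0]])      # dollars
delivery_times = np.array([[1.5], [2.0], [2.5], [4.0]])  # days

# Min-max normalization rescales each feature to the [0, 1] range
print(MinMaxScaler().fit_transform(order_values).ravel())   # ≈ [0, 0.18, 1]

# Z-score standardization: z = (x - mean) / std
print(StandardScaler().fit_transform(delivery_times).ravel())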
3. Models Used
Amazon relies on advanced machine learning models tailored for specific use cases:
1. Collaborative Filtering:
o Used for personalized recommendations.
o Example:
A user buys a book on data science. The system suggests related
books based on purchases by other users with similar interests.
2. Logistic Regression:
o Used for binary classification, such as detecting fraudulent transactions.
o Example:
Amazon identifies patterns of unusual purchases or location
mismatches to flag suspicious activities.
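A toy sketch of the collaborative-filtering idea described above, using item-to-item cosine similarity on a small, invented user-item rating matrix (a simplification, not Amazon's production system):
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

items = ["data_science_book", "python_book", "coffee_maker"]
# Hypothetical ratings: rows are users, columns are items (0 = no rating)
ratings = np.array([
    [5, 4, 0],
    [4, 5, 1],
    [0, 1, 5],
])

# Similarity between items, judged by how users rated them together
item_similarity = cosine_similarity(ratings.T)

# Suggest the item most similar to the one a user just bought
bought = items.index("data_science_book")
scores = item_similarity[bought].copy()
scores[bought] = -1                      # exclude the purchased item itself
print("Suggested:", items[int(scores.argmax())])   # python_book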
4. Decision Trees and Graphs
Decision trees and graph-based methods are extensively used in Amazon's operations:
Decision Trees:
o Applied in warehouse management to optimize inventory levels.
o Example:
A decision tree predicts which products need to be stocked more
heavily during holiday seasons based on historical sales data.
Graphs:
o Used in Amazon’s supply chain optimization.
o Example:
Graph-based algorithms determine the shortest delivery routes,
reducing shipping times and costs.
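The graph idea can be sketched with networkx: warehouses and delivery hubs become nodes, travel times become edge weights, and the shortest route is queried. The network below is entirely hypothetical:
import networkx as nx

# Toy road network; edge weights are travel times in minutes
G = nx.Graph()
G.add_weighted_edges_from([
    ("warehouse", "hub_a", 15),
    ("warehouse", "hub_b", 25),
    ("hub_a", "customer", 30),
    ("hub_b", "customer", 10),
])

# Shortest delivery route by total travel time (Dijkstra under the hood)
route = nx.shortest_path(G, "warehouse", "customer", weight="weight")
minutes = nx.shortest_path_length(G, "warehouse", "customer", weight="weight")
print(route, minutes)   # ['warehouse', 'hub_b', 'customer'] 35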
5. Data Imputation
Amazon addresses missing or incomplete data with several strategies:
Mean Imputation:
o For missing values in numerical features like delivery time or product
ratings.
o Example:
If a product's average rating is missing, Amazon imputes it with the
average rating from similar products in the same category.
Collaborative Imputation:
o For sparse user-item matrices in recommendation systems.
o Example:
If a user hasn’t rated a product, the system estimates a rating based
on ratings given by similar users.
Advanced Techniques:
o Amazon uses predictive imputation for real-time scenarios.
o Example:
Predicting missing delivery time for orders based on traffic, weather,
and location data.
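A minimal pandas sketch of mean imputation in the spirit of the first strategy: a missing rating is filled with the mean rating of products in the same category. The table is invented for illustration:
import pandas as pd

# Hypothetical product table with one missing rating
df = pd.DataFrame({
    "category": ["books", "books", "books", "kitchen"],
    "rating":   [4.5, None, 3.5, 4.0],
})

# Fill the missing rating with the mean of similar products (same category)
df["rating"] = df.groupby("category")["rating"].transform(
    lambda s: s.fillna(s.mean())
)
print(df)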
End-to-End Example: Amazon’s Recommendation System
1. Data Collection:
o Collects data on a user's interactions, including search terms, product views,
and purchases.
2. Data Division:
o 70% of this data trains the recommendation engine, 15% tunes parameters,
and 15% evaluates performance.
3. Data Scaling:
o Normalizes features like product prices and user demographics.
4. Model Training:
o Collaborative Filtering:
A model predicts that if a user bought a laptop, they might also buy
accessories like a mouse or keyboard.
5. Imputation:
o For missing ratings, estimates are made using the average ratings of similar
products.
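The 70/15/15 split in step 2 of this example can be sketched with two calls to scikit-learn's train_test_split; the feature matrix here is random placeholder data:
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder interaction data: 1,000 samples, 5 features, binary labels
X = np.random.rand(1000, 5)
y = np.random.randint(0, 2, size=1000)

# First carve out 70% for training, then split the rest 50/50 into
# validation (15%) and test (15%) sets
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.50, random_state=42)
print(len(X_train), len(X_val), len(X_test))   # 700 150 150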
2. Netflix:
Netflix is one of the most prominent users of data analytics, leveraging massive amounts
of user data to drive its recommendation engine, optimize content delivery, and enhance
user experiences.
1. Data Division
Netflix divides its dataset strategically to ensure robust training, validation, and testing
for its models.
2. Data Scaling
Numeric Data: Ratings (1–5 stars), viewing duration (in minutes), and search
frequency.
o Method: Z-Score Standardization. This transforms the data to have a mean
of 0 and a standard deviation of 1, ensuring features like ratings and
durations are on the same scale.
o Example:
Original Ratings: [3, 4, 2, 5, 1]
Standardized Ratings (mean 3, std ≈ 1.41): [0, 0.71, -0.71, 1.41, -1.41]
Categorical Data: Genres (Action, Comedy, Drama), device types (Mobile, TV).
o Method: One-Hot Encoding.
Example: Action = [1, 0, 0], Comedy = [0, 1, 0].
This ensures that the model treats each category equally without introducing bias.
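Both preprocessing steps can be sketched with pandas and scikit-learn; the ratings and genres below are toy values rather than Netflix data:
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "rating": [3, 4, 2, 5, 1],                                   # 1-5 stars
    "genre": ["Action", "Comedy", "Drama", "Action", "Comedy"],
})

# Z-score standardization of the numeric column (mean 0, std 1)
df["rating_std"] = StandardScaler().fit_transform(df[["rating"]]).ravel()

# One-hot encoding of the categorical column
df = pd.get_dummies(df, columns=["genre"])
print(df)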
3. Models Used
Netflix employs various machine learning models depending on the specific use case:
Content-Based Filtering:
This focuses on the content features (e.g., genre, actors, ratings) to recommend
similar movies.
o Example: If User A likes movies starring Keanu Reeves, the system might
recommend John Wick based on metadata.
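A small sketch of content-based filtering: each title is represented by its metadata text, vectorized with TF-IDF, and compared with cosine similarity. The titles and metadata strings are simplified stand-ins:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

titles = ["The Matrix", "John Wick", "The Crown"]
# Hypothetical metadata: genre, lead actors, themes as plain text
metadata = [
    "action sci-fi keanu reeves",
    "action thriller keanu reeves",
    "drama history royal family",
]

# Vectorize the metadata and compare titles pairwise
tfidf = TfidfVectorizer().fit_transform(metadata)
sim = cosine_similarity(tfidf)

# Recommend the title most similar to the one the user liked
liked = titles.index("The Matrix")
sim[liked, liked] = -1                      # ignore the title itself
print("Recommended:", titles[sim[liked].argmax()])   # John Wick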
4. Decision Trees and Graphs
Netflix uses decision trees and graph-based models for feature selection and clustering.
5. Data Imputation
Missing data is a common challenge in large datasets. Netflix uses advanced imputation techniques to address this:
Time-Series Imputation:
For missing viewing patterns (e.g., due to network issues), Netflix uses forward-
fill or interpolation methods to estimate data.
o Example:
If a user's session logs are incomplete, Netflix estimates their viewing
duration based on prior patterns.
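The forward-fill and interpolation ideas can be sketched with pandas on a toy per-minute viewing log containing gaps; timestamps and values are invented:
import pandas as pd

# Hypothetical per-minute viewing log with a gap from a dropped connection
log = pd.Series(
    [12.0, 15.0, None, None, 24.0],
    index=pd.date_range("2024-01-01 20:00", periods=5, freq="min"),
)

print(log.ffill())                # forward-fill repeats the last known value
print(log.interpolate("time"))    # interpolation estimates the values in between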
End-to-End Example: Netflix's Recommendation System
1. Data Division:
User A's watch history is included in the training set, while their unseen
preferences (e.g., interest in Dark) are in the test set.
2. Data Scaling:
Viewing duration and ratings are scaled to ensure they’re treated uniformly by the
recommendation engine.
3. Model Application:
o Matrix factorization predicts User A’s interest in Dark based on correlations
between The Witcher and other fantasy titles.
o Content-based filtering highlights Dark due to shared themes with The
Witcher (fantasy, mystery).
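The matrix-factorization step can be illustrated with a toy rank-2 SVD of a small user-title rating matrix: the reconstruction produces a score for the title User A has not yet watched (Dark). The ratings are invented:
import numpy as np

titles = ["The Witcher", "Stranger Things", "Dark"]
# Hypothetical ratings: rows are users, 0 means not yet rated
R = np.array([
    [5.0, 4.0, 0.0],   # User A has not rated Dark
    [4.0, 4.0, 5.0],
    [5.0, 3.0, 4.0],
])

# Rank-2 truncated SVD as a stand-in for matrix factorization
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_hat = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]

# The reconstructed entry is the predicted affinity for the unseen title
print("Predicted score for Dark (User A):", round(R_hat[0, 2], 2))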
3. Tesla:
Tesla is at the forefront of innovation, using advanced data analytics to power its
autonomous driving systems, optimize manufacturing, and enhance user experiences.
Below is an in-depth look at how Tesla uses data analytics in its operations:
1. Data Division
Tesla collects an enormous amount of data from its fleet of vehicles worldwide. The data
is used to train and validate its machine learning models for autonomous driving.
Example:
For their Full Self-Driving (FSD) Beta program, Tesla collects millions of miles of
driving data and uses this to continually improve the accuracy of their autonomous
driving systems.
2. Data Scaling
Data scaling is essential to ensure consistent performance across different sensors and
environments.
Robust Scaling:
Tesla uses robust scaling techniques to handle outliers in sensor data. For
example:
o Scaling sensor data (e.g., distances measured by LiDAR) to a consistent
range.
o Normalizing camera pixel values to enhance the image quality fed into
convolutional neural networks (CNNs).
Feature Scaling:
Features like vehicle speed, road curvature, and object distances are standardized
so that no single feature disproportionately influences the model.
Example:
In training models to detect pedestrians, Tesla scales pixel intensity values from dashcam
images to ensure the neural network processes images efficiently, regardless of lighting
conditions.
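Both scaling ideas can be sketched in a few lines: robust scaling of distance readings containing an outlier, and simple normalization of pixel intensities to [0, 1]. The numbers are synthetic, not Tesla sensor data:
import numpy as np
from sklearn.preprocessing import RobustScaler

# Hypothetical object-distance readings in metres, with one outlier
distances = np.array([[4.8], [5.1], [5.0], [4.9], [60.0]])

# Robust scaling uses the median and interquartile range, so the outlier
# barely affects how the typical readings are scaled
print(RobustScaler().fit_transform(distances).ravel())

# Pixel intensities (0-255) normalized to [0, 1] before feeding a CNN
pixels = np.array([0, 64, 128, 255], dtype=np.float32)
print(pixels / 255.0)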
3. Models Used
Tesla employs various machine learning models tailored to specific tasks within its
autonomous driving system:
Reinforcement Learning:
Helps Tesla's vehicles learn optimal driving strategies based on simulated and
real-world data.
o Example: Deciding when to change lanes to optimize speed and safety.
Logistic Regression:
Used for binary classification tasks, such as determining whether an object in the
road is an obstacle that needs avoidance.
Time-Series Models:
Predicts future driving conditions based on current sensor readings and telemetry
data.
Example:
Tesla’s system can predict the movement of pedestrians by analyzing video frames and
learning their walking patterns.
4. Decision Trees and Graphs
Tesla utilizes decision trees and graph-based methods for optimization and decision-making:
Example:
In heavy traffic, Tesla’s system uses a graph to predict whether adjacent vehicles will
change lanes and adjusts its strategy to maintain safety.
5. Data Imputation
Time-Series Imputation:
When GPS signals are temporarily lost, Tesla uses forward-fill imputation to
maintain accurate location tracking.
Interpolation:
Sensor data gaps are filled using interpolation techniques to estimate missing
values.
Synthetic Data Generation:
Tesla generates synthetic data for rare driving scenarios, such as animals crossing
the road, to improve model robustness.
Example:
If a radar sensor temporarily fails to detect an object due to interference, Tesla’s system
uses data from adjacent cameras to estimate the object’s position and maintain safe
driving behaviour.
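A minimal pandas sketch of filling short sensor gaps: forward-fill holds the last GPS fix, while linear interpolation estimates the missing radar distances. The telemetry values are invented:
import pandas as pd

# Hypothetical one-second telemetry with a brief sensor dropout
telemetry = pd.DataFrame(
    {
        "gps_lat": [37.7749, None, None, 37.7755],
        "radar_m": [42.0, None, None, 36.0],
    },
    index=pd.date_range("2024-01-01 08:00:00", periods=4, freq="s"),
)

telemetry["gps_lat"] = telemetry["gps_lat"].ffill()        # hold the last fix
telemetry["radar_m"] = telemetry["radar_m"].interpolate()  # linear fill
print(telemetry)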
6. Real-World Application Example
Autonomous Driving:
Tesla’s Autopilot system uses an end-to-end machine learning pipeline:
1. Data Collection: Fleet-wide data is continuously collected and sent to
Tesla’s servers.
2. Model Training: Neural networks are trained on diverse datasets, ensuring
they can handle various conditions.
3. Real-Time Processing: In-car systems use pre-trained models to make
split-second decisions.
Over-the-Air Updates:
Tesla updates its models regularly using insights gained from real-world data. For
example, new updates improve how the car handles stop signs or reacts to
merging lanes.
4. Google:
1. Data Division
Google collects massive amounts of data daily from search queries, web crawlers, and
user interactions. To develop its algorithms and models, the company follows a systematic approach to dividing this data into training, validation, and test sets.
2. Data Scaling
Log Transformations:
o Applied to web traffic data and ad performance metrics.
o Example: Log transformation helps in normalizing page views, which can
vary drastically from a few hundred to billions.
Standardization:
o Standardizes features like click-through rates, dwell time, and keyword
relevance.
o Ensures model inputs are mean-centered with unit variance for algorithms
like Logistic Regression.
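Both transforms can be sketched briefly: a log transform compresses a heavily skewed page-view count, and standardization centres a click-through-rate feature. The figures are illustrative only:
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical page views spanning several orders of magnitude
page_views = np.array([300, 4_200, 95_000, 1_800_000_000], dtype=float)
print(np.log1p(page_views).round(2))   # log transform tames the huge range

# Standardize click-through rates to zero mean and unit variance
ctr = np.array([[0.01], [0.03], [0.07], [0.12]])
print(StandardScaler().fit_transform(ctr).ravel().round(2))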
3. Models Used
Logistic Regression:
o Applied in Gmail for spam classification.
o Example: Analyzes email content, sender reputation, and attachment types
to classify emails as spam or important.
4. Decision Trees and Graphs
Decision Trees:
o Used in Google Ads to optimize ad placement based on user preferences and
demographics.
o Example: A decision tree predicts whether a user is likely to click on an ad
for “running shoes” based on their search and purchase history.
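A toy sketch of the decision-tree idea: a small DecisionTreeClassifier trained on made-up features (whether the user recently searched for shoes, and a coded age group) to predict a click on a running-shoes ad. None of this reflects Google's actual features or models:
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features: [searched_for_shoes (0/1), age_group (coded 0-3)]
X = [
    [1, 1], [1, 2], [0, 1], [0, 3],
    [1, 0], [0, 2], [1, 3], [0, 0],
]
y = [1, 1, 0, 0, 1, 0, 1, 0]   # 1 = clicked the running-shoes ad

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Predict for a new user who recently searched for shoes
print(tree.predict([[1, 2]]))   # [1]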
5. Data Imputation
Missing or incomplete data can arise from various sources like ad-click errors,
incomplete search terms, or network failures. Google addresses these issues with
advanced imputation techniques:
Multivariate Imputation:
o Estimates missing values based on correlated features.
o Example: If CTR data for a specific ad is missing, it is estimated using similar
ads' CTR data within the same campaign.
Predictive Imputation:
o Uses machine learning models to predict and fill gaps.
o Example: Predicts missing demographic details of users based on their
browsing behavior and historical data.
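The multivariate idea can be sketched with scikit-learn's IterativeImputer, which estimates a missing CTR from correlated features such as impressions and ad position. The table is synthetic:
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical ad metrics: [impressions, avg_position, ctr]
ads = np.array([
    [1000, 1.0, 0.12],
    [ 800, 2.0, 0.08],
    [ 600, 3.0, 0.05],
    [ 900, 1.5, np.nan],   # missing CTR to be estimated
])

imputer = IterativeImputer(random_state=0)
print(imputer.fit_transform(ads).round(3))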
End-to-End Example: Google Search Ranking
1. Problem Statement:
Improve the accuracy of Google’s search engine by delivering highly relevant
results in milliseconds.
2. Data Sources:
o Search queries from billions of users.
o Web crawling data from indexed pages.
o User behavior metrics (e.g., click-through rates, dwell time).
3. Pipeline:
o Data Collection: Collects real-time search queries and historical data.
o Preprocessing: Removes noise (e.g., typos in queries) and tokenizes text for
processing.
o Feature Engineering: Extracts features like query intent, keyword
relevance, and geolocation.
o Model Training: Trains transformer models like BERT to improve semantic
understanding of search queries.
o Evaluation: Validates the ranking algorithm using the validation set,
ensuring results are relevant and timely.
o Deployment: Deploys the trained model in production, continuously
updated with real-time user feedback.
Conclusion:
Amazon, with its vast e-commerce ecosystem, combines collaborative filtering and logistic regression with careful data division, scaling, and imputation to power personalized recommendations, fraud detection, and supply-chain optimization.
Netflix, with its focus on user engagement, leverages matrix factorization and deep
learning models like RNNs to refine content recommendations and predict user behavior.
Its meticulous data scaling and imputation techniques ensure the reliability and accuracy
of its models, ultimately driving subscriber retention and satisfaction.
Tesla, at the forefront of autonomous driving technology, utilizes sophisticated models
such as CNNs and logistic regression to process complex sensor data. Its use of decision
trees for route optimization and robust scaling techniques ensures that its vehicles can
operate safely and efficiently in diverse conditions.
Google capitalizes on its vast datasets to revolutionize search, advertising, and AI-driven
applications. Its use of transformer models like BERT and graph neural networks
exemplifies cutting-edge advancements in natural language processing and web page
ranking. Google’s meticulous data scaling and imputation techniques ensure the
consistency and effectiveness of its algorithms.
Across these companies, several commonalities emerge. The division of data into
training, validation, and test sets is a universal practice that ensures model robustness.
Data scaling techniques, such as normalization and standardization, are employed to
handle diverse feature ranges and outliers. Advanced machine learning models, including
neural networks and decision trees, form the backbone of their analytical strategies.
Additionally, imputation methods address missing data, maintaining dataset integrity and
preventing biases in model predictions.