BigData&Analytics Module4

This document discusses a module on data science and linear regression. It covers key topics like data science concepts and process, linear regression, mean square error, root mean square error, and coefficient of determination. Examples of using linear regression for car price prediction are provided, including sample data, plotting attributes against price to test relationships, and metrics like MSE, RMSE, and R2 for the regression model.

Uploaded by

Mohamed Ehab
Copyright
© All Rights Reserved


Big Data & Business Analytics


Module (04) – Data Science & Linear Regression
Module 4 Learning Objectives

Module Objectives:
Data science concepts & process
What is linear regression
What are Mean Squared Error (MSE) & Root Mean Squared Error (RMSE)
What is the Coefficient of Determination (R squared)

What to Study for Exam:


Module 4 Lecture Notes (emphasis on above topics)

© 2020 Eslsca. All Rights Reserved 2


Module 4 1st Datamining/Data science
• One of the first articles to use the phrase “data mining” was published by the economist Michael C. Lovell in 1983, in which he pointed out that statistics can lead to incorrect conclusions when not informed by knowledge.

• By the 1990s, the idea of extracting value from data and identifying patterns had become popular. Database and data warehouse vendors began using the buzzword “business intelligence”.

• In 1996, a group of companies that included Teradata and NCR led a project to standardize and formalize the data mining process.

• With the proliferation of artificial intelligence (AI) and neural networks, data mining is now regarded as a subset of machine learning and AI.

Data science is a field of study that aims to use a scientific approach to extract meaning and insights from data.

Data science tackles data cleansing, preparation, analysis, visualization and evaluation.

Machine learning, on the other hand, refers to a group of techniques used by data scientists that allow computers to learn from data.

Deep learning is part of a broader family of machine learning methods based on artificial neural networks (ANNs).
https://www.javatpoint.com/data-science-vs-machine-learning
Module 4 2nd Datamining/Data science Process
1. Selection: Selecting from the database the data that are relevant to the data mining problem.

2. Preprocessing and cleaning: Raw data are rarely clean and may contain errors, missing values, noise or inconsistencies, so removing such anomalies is very important.

3. Feature selection and extraction: Feature selection and extraction let you refine the data to a smaller number of attributes than the original set.

4. Data mining/Data science: Applying data mining techniques to the data to discover interesting patterns, using regression, clustering, classification and other analytics techniques.

5. Interpretation and evaluation: This is where we generate visualizations, forecasts and predictions.
https://steemit.com/steemstem/@noble-noah/data-mining-and-application-big-data-rules-the-world
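The five steps can be sketched end to end on a tiny example. This is only an illustration under stated assumptions: the rows, column names and the query value of 180 are hypothetical, and a one-variable least-squares line stands in for the data mining step.

```python
from statistics import mean

# Hypothetical raw records; names and values are made up for illustration.
raw_rows = [
    {"id": 1, "enginesize": 100.0, "color": "red", "price": 8000.0},
    {"id": 2, "enginesize": 150.0, "color": "blue", "price": 12000.0},
    {"id": 3, "enginesize": None, "color": "red", "price": 9000.0},  # missing value
    {"id": 4, "enginesize": 200.0, "color": "grey", "price": 16000.0},
]

# 1. Selection: keep only the columns relevant to the problem.
selected = [{"enginesize": r["enginesize"], "price": r["price"]} for r in raw_rows]

# 2. Preprocessing and cleaning: drop rows that contain missing values.
clean = [r for r in selected if None not in r.values()]

# 3. Feature selection and extraction: one feature x and one target y.
x = [r["enginesize"] for r in clean]
y = [r["price"] for r in clean]

# 4. Data mining: fit the line y = b0 + b1 * x by least squares.
x_bar, y_bar = mean(x), mean(y)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

# 5. Interpretation and evaluation: use the fitted line to make a prediction.
predicted_price = b0 + b1 * 180.0
print(f"price = {b0:.2f} + {b1:.2f} * enginesize; prediction at 180: {predicted_price:.2f}")
```

In practice each step is far more involved; the point here is only the order of the steps and what each one hands to the next.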

Module 4 3rd Regression
Cases in which the outcome can be modelled by a mathematical equation are referred to as regression.
Examples:

o Predicting the failure of mechanical parts in automobile engines

o Predicting social media share scores

o Predicting performance scores, e.g. restaurant rating, revenues

o Estimating life expectancy

o Estimating population growth

o Temperature forecast

Module 4 3rd Linear Regression

We want to find the best line (linear function y = f(X)) to explain the data.
The predicted value of y is given by:

$$\hat{y} = \beta_0 + \sum_{j=1}^{p} X_j \beta_j$$

To determine the model parameters $\beta$ from some data, we need to minimize the Residual Sum of Squares:

$$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - \beta x_i)^2$$
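For a single predictor (p = 1), the parameters that minimize the residual sum of squares have a well-known closed form. The sketch below assumes that case and writes the intercept out explicitly as `b0`; the function names `rss` and `fit_line` and the toy data are hypothetical, not part of the module.

```python
from statistics import mean


def rss(b0, b1, xs, ys):
    """Residual sum of squares for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def fit_line(xs, ys):
    """Closed-form least-squares estimates for one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical toy data, not taken from the module.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
best = rss(b0, b1, xs, ys)

# Moving either parameter away from the fitted value cannot lower the RSS.
assert best <= rss(b0 + 0.1, b1, xs, ys)
assert best <= rss(b0, b1 + 0.1, xs, ys)
print(f"b0={b0:.3f}, b1={b1:.3f}, RSS={best:.4f}")
```

With several predictors the same minimization is normally solved with matrix methods or a library routine rather than by hand.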
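Regression Mean Squared Error (MSE)

MSE is the average of the squared differences between the observed values $y_i$ and the predicted values $\hat{y}_i$ over the $N$ observations:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

RMSE is the square root of MSE, so it is expressed in the same units as y.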
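A minimal sketch of both metrics, assuming the standard definition of MSE as the mean of the squared prediction errors. The function names and the sample values are hypothetical.

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the MSE, in the same units as the target variable."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical values for illustration.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]

# Errors are 1, 0, -1, -2, so the squared errors sum to 6 over 4 points.
print(mse(actual, predicted))   # 1.5
print(rmse(actual, predicted))
```

Because the errors are squared before averaging, a single large error raises both metrics more than several small ones.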


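Coefficient of Determination (R squared)

R squared is the share of the variance in y that the model explains, where $\bar{y}$ is the mean of the observed values:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$

A high value of R squared (close to 1) is an indicator of a close fit.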
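A short sketch of the calculation, assuming the usual definition of R squared as one minus the ratio of the residual sum of squares to the total sum of squares. The function name and the sample values are hypothetical.

```python
def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical values for illustration.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]

print(r_squared(actual, predicted))   # residual SS = 6, total SS = 20
print(r_squared(actual, actual))      # perfect predictions
print(r_squared(actual, [6.0] * 4))   # always predicting the mean
```

Perfect predictions give 1, while always predicting the mean of the observed values gives 0, which is why values near 1 are read as a close fit.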

Module 4 4th Linear Regression Examples
Car Price Prediction:
Geely Auto, a Chinese automobile company, aspires to enter the US market by setting up a manufacturing unit there and producing cars locally to compete with its US and European counterparts.
The company wants to know:
- Which variables are significant in predicting the price of a car
- How well those variables describe the price of a car
Based on various market surveys, a large dataset of different types of cars across the American market was obtained.
The attributes include:
fueltype, aspiration, doornumber, carbody, drivewheel, enginelocation, wheelbase, carlength, carwidth, carheight, curbweight, enginetype, cylindernumber, enginesize, fuelsystem, boreratio, stroke, compressionratio, and others

https://www.kaggle.com/goyalshalini93/car-price-prediction-linear-regression-rfe/notebook

Car Price Sample Data (table not recovered; see the source notebook):

https://www.kaggle.com/goyalshalini93/car-price-prediction-linear-regression-rfe/notebook

The price was plotted against each attribute to test the relationship and its significance (plots not recovered; see the source notebook):

https://www.kaggle.com/goyalshalini93/car-price-prediction-linear-regression-rfe/notebook
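Alongside a plot, the strength of a linear relationship between one numeric attribute and the price can be summarized with the Pearson correlation coefficient. The sketch below is not the notebook's code: the `pearson` helper is hypothetical, and the `enginesize` and `price` values are invented for illustration and are not rows from the Kaggle dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented values for illustration only; NOT taken from the dataset.
enginesize = [90.0, 110.0, 130.0, 150.0, 180.0]
price = [6500.0, 8200.0, 10100.0, 13000.0, 16500.0]

r = pearson(enginesize, price)
print(f"correlation between enginesize and price: {r:.3f}")
```

A coefficient near +1 or -1 suggests the attribute is worth including as a predictor, while a value near 0 suggests little linear relationship with price.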


The regression model for this example yields the following:
Mean squared error (MSE): 8871405.16
Root mean squared error (RMSE): 2978.49042
Coefficient of determination (R squared): 0.87

This high R squared means the model explains about 87% of the variance in price, which indicates a strong linear relationship.

The code and dataset can be found at the following link:
https://www.kaggle.com/goyalshalini93/car-price-prediction-linear-regression-rfe/notebook
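As a quick consistency check, the reported RMSE should equal the square root of the reported MSE. The snippet below uses only the three figures quoted above; the variable names are illustrative.

```python
import math

# Figures as reported for the car price model.
reported_mse = 8871405.16
reported_rmse = 2978.49042
reported_r2 = 0.87

# RMSE is defined as the square root of MSE, so the two should agree.
computed_rmse = math.sqrt(reported_mse)
assert abs(computed_rmse - reported_rmse) < 1e-4

# Share of the price variance that the model leaves unexplained.
unexplained_share = 1.0 - reported_r2
print(f"computed RMSE: {computed_rmse:.5f}")
print(f"unexplained share of variance: {unexplained_share:.2f}")
```

Since RMSE is in the same units as the target, the model's typical prediction error is roughly 2,978 in the price units of the dataset.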

Module 4 5th Big Data Linear Regression Case Study

SAP Analytics:
Revitalizing the Shopping Center Experience with SAP Analytics. To stay competitive in the era of e-commerce, AG Real Estate, the largest real estate player in Belgium, is reinventing the mall experience. This requires insight into how mall visitors shop, which helps shopping center managers create experiences that maximize revenue by keeping shoppers coming back.
https://www.youtube.com/watch?v=lv6ZVr5114k

Module 4 Questions



Module Completed

Module 04
