
__________________________________

Predictive Modelling Project


Report
__________________________________
PGP-DSBA (PGPDSBA.O.AUG24.A)

Prepared by: Amit Khilare


Project Predictive Modelling: Used Device Data

Context

Buying and selling used phones and tablets used to happen on only a handful of online
marketplace sites. But the used and refurbished device market has grown considerably
over the past decade, and an IDC (International Data Corporation) forecast predicts that
the used phone market will be worth $52.7 billion by 2023, with a compound annual growth
rate (CAGR) of 13.6% from 2018 to 2023. This growth can be attributed to an uptick in
demand for used phones and tablets that offer considerable savings compared with new models.

Refurbished and used devices continue to provide cost-effective alternatives for both
consumers and businesses looking to save money on a purchase. There are plenty of
other benefits associated with the used device market. Used and refurbished devices can be sold
with warranties and can also be insured with proof of purchase. Third-party vendors/platforms, such
as Verizon, Amazon, etc., provide attractive offers to customers for refurbished devices. Maximizing
the longevity of devices through second-hand trade also reduces their environmental impact and
helps in recycling and reducing waste. The impact of the COVID-19 outbreak may further boost this
segment as consumers cut back on discretionary spending and buy phones and tablets only for
immediate needs.

Objective

The rising potential of this comparatively under-the-radar market fuels the need for an
ML-based solution to develop a dynamic pricing strategy for used and refurbished devices.
ReCell, a startup aiming to tap the potential in this market, has hired you as a data
scientist. They want you to analyze the data provided, build a linear regression model to
predict the price of a used phone/tablet, and identify the factors that significantly
influence it.

Data Description

The data contains the different attributes of used/refurbished phones and tablets. The data
was collected in the year 2021. The detailed data dictionary is given below.
Dataset: used_device_data.csv

Data Dictionary

 brand_name: Name of the manufacturing brand
 os: Operating system the device runs on
 screen_size: Size of the screen in cm
 4g: Whether 4G is available or not
 5g: Whether 5G is available or not
 main_camera_mp: Resolution of the rear camera in megapixels
 selfie_camera_mp: Resolution of the front camera in megapixels
 int_memory: Amount of internal memory (ROM) in GB
 ram: Amount of RAM in GB
 battery: Energy capacity of the device battery in mAh
 weight: Weight of the device in grams
 release_year: Year when the device model was released
 days_used: Number of days the used/refurbished device has been used
 normalized_new_price: Normalized price of a new device of the same model in euros
 normalized_used_price: Normalized price of the used/refurbished device in euros
Theory:
Exploratory Data Visualization
Exploratory Data Analysis (EDA), an approach developed by John Tukey in the 1970s, is the first
step in the data analysis process. In statistics, exploratory data analysis is an approach to analysing data
sets to summarize their main characteristics, often with visual methods.
As the name suggests, it is the step in which we explore the data set. When building a machine learning
model, you need to be sure that your data makes sense. The main aim of exploratory data analysis is to
obtain enough confidence in your data that you are ready to apply a machine learning algorithm.

1) Descriptive analysis - Central tendency


A measure of central tendency is a summary statistic that represents the centre point or typical value
of a dataset. These measures indicate where most values in a distribution fall and are also referred to as the
central location of a distribution. We can think of it as the tendency of data to cluster around a middle value.
In statistics, the three most common measures of central tendency are the mean, median, and mode. Each of
these measures calculates the location of the central point using a different method. Choosing the best
measure of central tendency depends on the type of data we have.

A. Mean (Average): Represents the sum of all values in a dataset divided by the total number of the
values.
B. Median: The middle value in a dataset that is arranged in ascending order (from the smallest value to
the largest value). If a dataset contains an even number of values, the median of the dataset is the
mean of the two middle values.
C. Mode: Defines the most frequently occurring value in a dataset. In some cases, a dataset may contain
multiple modes, while some datasets may not have any mode at all.

The selection of a central tendency measure depends on the properties of a dataset. For instance, the
mode is the only central tendency measure for categorical data, while the median works best with ordinal data.
Although the mean is regarded as the best measure of central tendency for quantitative data, that is not always
the case. For example, the mean may not work well with quantitative datasets that contain extremely large or
extremely small values, as the extreme values may distort it; in such cases, consider other measures.
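A quick sketch of the three measures with pandas; the numbers below are made up for illustration and show how a single extreme value pulls the mean away from the median:

```python
import pandas as pd

# Hypothetical prices; the 95 is an extreme value (outlier)
prices = pd.Series([10, 12, 12, 14, 95])

mean_val = prices.mean()          # (10+12+12+14+95)/5 = 28.6, pulled up by the outlier
median_val = prices.median()      # 12, robust to the outlier
mode_val = prices.mode().iloc[0]  # 12, the most frequent value
```

Here the median and mode describe the typical value far better than the mean, which is distorted by the extreme observation.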

2) Descriptive analysis - Dispersion and Distribution


A distribution tells you how likely certain events are; e.g., for the normal distribution (continuous) you
can talk about the probability of getting a number between 3 and 7, while for a discrete distribution like the
Poisson you can ask how likely you are to get exactly 3.
A measure of dispersion tells you, if you see many events happen, how spread out they are going to be.

Dispersion:
Most measures of dispersion have the same units as the quantity being measured. In other words, if the
measurements are in meters or seconds, so is the measure of dispersion. There are two main types of
dispersion methods in statistics which are:
A. Absolute Measure of Dispersion:
An absolute measure of dispersion has the same unit as the original data set and expresses variation in terms
of the average deviation of observations, such as the standard or mean deviation. Common absolute measures
of dispersion are the range, variance, standard deviation, and quartile deviation.

B. Relative Measure of Dispersion:


The relative measures of dispersion are used to compare the distributions of two or more data sets. These
measures are unitless, which makes comparison across different units possible. Common relative dispersion
measures include the coefficient of variation, the coefficient of range, and the coefficient of quartile deviation.
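A small sketch of an absolute measure (range, standard deviation) versus a relative one (coefficient of variation), using made-up numbers:

```python
import numpy as np

sample = np.array([3.0, 4.0, 4.0, 5.0, 6.0, 8.0])  # hypothetical measurements, e.g. in grams

# Absolute measures: same unit as the data
value_range = sample.max() - sample.min()  # 5.0 (grams)
std_dev = sample.std(ddof=1)               # sample standard deviation (grams)

# Relative measure: unitless, so it can be compared across datasets
coeff_of_variation = std_dev / sample.mean()
```

Because the coefficient of variation has no unit, it lets us compare the spread of, say, weights in grams against prices in euros.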
3) Correlation
Correlation explains how one or more variables are related to each other. These variables can be input
features that are used to forecast our target variable.
Correlation is a statistical technique that determines how one variable moves or changes in relation to
another. It gives us an idea of the degree of the relationship between the two variables. It is a bivariate
analysis measure that describes the association between different variables. In most businesses, it is useful
to express one subject in terms of its relationship with others.

Types of Correlation:
Based on the degree of correlation:

A. Positive Correlation:
Two features (variables) can be positively correlated with each other. It means that when the value of one
variable increases then the value of the other variable(s) also increases.

B. Negative Correlation:
Two features (variables) can be negatively correlated with each other. It means that when the value of one
variable increases then the value of the other variable(s) decreases.

C. No Correlation:
Two features (variables) are not correlated with each other. It means that when the value of one variable
increases or decreases then the value of the other variable(s) doesn’t increase or decrease.
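The three cases can be illustrated with pandas' `DataFrame.corr()` on a tiny made-up frame (the column names below are illustrative, not taken from the project dataset):

```python
import pandas as pd

df_demo = pd.DataFrame({
    "ram": [2, 4, 6, 8],                # increases steadily
    "price": [3.0, 3.5, 4.2, 4.8],      # rises with ram      -> positive correlation
    "days_used": [900, 700, 400, 200],  # falls as price rises -> negative correlation
})

corr = df_demo.corr()  # pairwise Pearson correlation coefficients
print(corr)
```

Values near +1 indicate a strong positive relationship, values near -1 a strong negative one, and values near 0 indicate no linear relationship.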

4) Data Visualization
Data visualization is defined as a graphical representation of information and data.
By using visual elements like charts, graphs, and maps, data visualization techniques provide an accessible way
to see and understand trends, outliers, and patterns in data.
In modern days we have a lot of data in our hands; in the world of Big Data, data visualization tools and
technologies are crucial to analyze massive amounts of information and make data-driven decisions. It is used
in many areas, such as:
● To model complex events.
● To visualize phenomena that cannot be observed directly, such as weather patterns, medical conditions, or
mathematical relationships.

Univariate Analysis Techniques for Data Visualization

1. Distribution Plot
It is one of the best univariate plots for understanding the distribution of data. When we want to analyze
the impact on the target variable (output) with respect to an independent variable (input), we use distribution
plots a lot. This plot combines a probability density function (PDF) and a histogram in a single plot.

Bivariate Analysis Techniques for Data Visualization

1. Line Plot
This is a plot you will see in almost any analysis involving two variables. In a line plot, the values of a
series of data points are connected with straight lines. The plot may seem very simple, but it has many
applications, not only in machine learning but in many other areas.

2. Bar Plot
This is one of the most widely used plots, seen not just in data analysis but wherever there is trend
analysis in many fields. Though it may seem simple, it is powerful for analyzing data such as weekly sales
figures, revenue from a product, or the number of visitors to a site on each day of the week.

Implementation:

1. Import NumPy and Pandas

2. Read the CSV file

3. Statistical summary of the dataset

4. Check for duplicate and missing values

5. Explore the categorical features
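A minimal sketch of these five steps; in the project the data would come from used_device_data.csv, but a tiny stand-in frame with the same kind of columns is used here so the sketch runs on its own:

```python
import numpy as np
import pandas as pd

# 1. Imports done above.
# 2. In the project: df = pd.read_csv("used_device_data.csv")
#    A small stand-in frame is used here instead.
df = pd.DataFrame({
    "brand_name": ["Apple", "Samsung", "Samsung", "Nokia"],
    "os": ["iOS", "Android", "Android", "Android"],
    "ram": [4.0, 6.0, 6.0, np.nan],
    "normalized_used_price": [5.2, 4.8, 4.8, 4.1],
})

# 3. Statistical summary of the numeric columns
print(df.describe().T)

# 4. Duplicate rows and missing values
n_duplicates = df.duplicated().sum()
missing_counts = df.isnull().sum()
print("Duplicate rows:", n_duplicates)
print(missing_counts)

# 5. Explore the categorical features
for col in df.select_dtypes(include="object").columns:
    print(col, "->", sorted(df[col].unique()))
```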


Exploratory Data Analysis
Univariate Analysis
histogram_boxplot for normalized_new_price

histogram_boxplot for screen_size

histogram_boxplot for main_camera_mp

histogram_boxplot for selfie_camera_mp

histogram_boxplot for int_memory

histogram_boxplot for ram

histogram_boxplot for weight

histogram_boxplot for battery

histogram_boxplot for days_used

Labelled bar plot for brand_name against the count
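The histogram_boxplot helper used above is not reproduced in the report; a matplotlib-only sketch of what such a helper typically looks like (a boxplot stacked over a histogram that shares the same x-axis) is:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

def histogram_boxplot(data, feature, bins=15):
    """Boxplot on top, histogram below, sharing the same x-axis."""
    values = data[feature].dropna()
    fig, (ax_box, ax_hist) = plt.subplots(
        nrows=2, sharex=True, gridspec_kw={"height_ratios": (0.25, 0.75)}
    )
    ax_box.boxplot(values, vert=False)
    ax_hist.hist(values, bins=bins)
    ax_hist.axvline(values.mean(), color="green", linestyle="--", label="mean")
    ax_hist.axvline(values.median(), color="black", linestyle="-", label="median")
    ax_hist.legend()
    fig.suptitle(f"histogram_boxplot for {feature}")
    return fig
```

It would be called once per numeric column, e.g. `histogram_boxplot(df, "normalized_new_price")`.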

Bivariate Analysis
Let's see how the amount of RAM varies across brands.

Create a dataframe of only those devices which offer a large battery, and analyze it.

Let's see how the price of used devices varies across release years by plotting the average
used price against the release year.

Let's check how prices vary for used phones and tablets offering 4G and 5G networks.
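These bivariate checks can be sketched as follows; the frame below is a tiny made-up stand-in for the real dataset:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "brand_name": ["Apple", "Apple", "Nokia", "Nokia"],
    "ram": [4.0, 6.0, 2.0, 4.0],
    "release_year": [2018, 2020, 2018, 2020],
    "4g": ["yes", "yes", "no", "yes"],
    "normalized_used_price": [5.5, 6.0, 4.0, 4.5],
})

# Average RAM across brands
ram_by_brand = df.groupby("brand_name")["ram"].mean()

# Average used price over release years, shown as a line plot
price_by_year = df.groupby("release_year")["normalized_used_price"].mean()
price_by_year.plot(marker="o")

# Average price for devices with and without 4G
price_by_4g = df.groupby("4g")["normalized_used_price"].mean()
```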
Outlier Check
Data Preparation for modelling
Model Building - Linear Regression
Checking Linear Regression Assumptions
VIF after dropping main_camera_mp
feature VIF
0 const 227.000703
1 screen_size 8.261309
2 selfie_camera_mp 2.851442
3 int_memory 1.339852
4 ram 2.272497
5 battery 4.018931
6 weight 6.207163
7 days_used 2.544869
8 normalized_new_price 2.713690
9 years_since_release 4.836333
10 brand_name_Alcatel 3.458240
11 brand_name_Apple 11.192808
12 brand_name_Asus 3.646213
13 brand_name_BlackBerry 1.622996
14 brand_name_Celkon 1.868741
15 brand_name_Coolpad 1.573721
16 brand_name_Gionee 2.076641
17 brand_name_Google 1.387899
18 brand_name_HTC 3.459903
19 brand_name_Honor 3.544615
20 brand_name_Huawei 6.395258
21 brand_name_Infinix 1.191416
22 brand_name_Karbonn 1.626941
23 brand_name_LG 5.348720
24 brand_name_Lava 1.824997
25 brand_name_Lenovo 4.702696
26 brand_name_Meizu 2.404332
27 brand_name_Micromax 3.776810
28 brand_name_Microsoft 2.091350
29 brand_name_Motorola 3.465108
30 brand_name_Nokia 3.747557
31 brand_name_OnePlus 1.585266
32 brand_name_Oppo 4.284583
33 brand_name_Others 10.827443
34 brand_name_Panasonic 1.886714
35 brand_name_Realme 1.965466
36 brand_name_Samsung 8.011342
37 brand_name_Sony 2.834760
38 brand_name_Spice 1.637533
39 brand_name_Vivo 3.727480
40 brand_name_XOLO 2.160588
41 brand_name_Xiaomi 4.063138
42 brand_name_ZTE 4.327130
43 os_Others 1.878917
44 os_Windows 1.739946
45 os_iOS 10.026109
46 4g_yes 2.420727
47 5g_yes 1.787727

Dropping high p-value variables

We will drop the predictor variables having a p-value greater than 0.05, as they do
not significantly impact the target variable.
• But p-values can change after a variable is dropped, so we will not drop all
variables at once.
• Instead, we will do the following:
• Build a model, check the p-values of the variables, and drop the
column with the highest p-value.
• Create a new model without the dropped feature, check the
p-values of the variables, and drop the column with the highest p-value.
• Repeat the above two steps until there are no columns with
p-value > 0.05.
This process can also be done manually by picking one high p-value variable at a
time, dropping it, and rebuilding the model, but that would be tedious; using a loop
is more efficient.
x_train3 = x_train_with_const[selected_features]

x_test3 = x_test[selected_features]

# Fit OLS on the updated dataset (no multicollinearity, no insignificant predictors)
olsmodel2 = sm.OLS(y_train, sm.add_constant(x_train3)).fit()

print(olsmodel2.summary())
OLS Regression Results
=====================================================================
Dep. Variable: normalized_used_price R-squared: 0.847
Model: OLS Adj. R-squared: 0.846
Method: Least Squares F-statistic: 886.8
Date: Thu, 21 Nov 2024 Prob (F-statistic): 0.00
Time: 21:13:52 Log-Likelihood: 110.96
No. Observations: 2417 AIC: -189.9
Df Residuals: 2401 BIC: -97.27
Df Model: 15
Covariance Type: nonrobust
==================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------
const 1.3715 0.052 26.565 0.000 1.270 1.473
screen_size 0.0291 0.003 8.473 0.000 0.022 0.036
main_camera_mp 0.0234 0.001 16.151 0.000 0.021 0.026
selfie_camera_mp 0.0119 0.001 10.644 0.000 0.010 0.014
int_memory 0.0002 6.66e-05 2.836 0.005 5.83e-05 0.000
ram 0.0293 0.005 5.686 0.000 0.019 0.039
battery -1.46e-05 7.19e-06 -2.030 0.043 -2.87e-05 -4.94e-07
weight 0.0008 0.000 6.200 0.000 0.001 0.001
normalized_new_price 0.4092 0.011 36.760 0.000 0.387 0.431
years_since_release -0.0218 0.004 -5.847 0.000 -0.029 -0.014
brand_name_Celkon -0.1905 0.053 -3.571 0.000 -0.295 -0.086
brand_name_Nokia 0.0823 0.030 2.771 0.006 0.024 0.140
brand_name_Xiaomi 0.0829 0.025 3.296 0.001 0.034 0.132
os_Others -0.0598 0.030 -1.993 0.046 -0.119 -0.001
4g_yes 0.0461 0.015 3.023 0.003 0.016 0.076
5g_yes -0.0900 0.031 -2.862 0.004 -0.152 -0.028
====================================================================
Omnibus: 240.704 Durbin-Watson: 1.993
Prob(Omnibus): 0.000 Jarque-Bera (JB): 674.212
Skew: -0.537 Prob(JB): 3.95e-147
Kurtosis: 5.354 Cond. No. 4.08e+04
TEST FOR LINEARITY AND INDEPENDENCE
We will test for linearity and independence by making a plot of fitted values vs residuals and checking for
patterns.
If there is no pattern, then we say the model is linear and residuals are independent.
Otherwise, the model is showing signs of non-linearity and residuals are not independent.
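The fitted-values-versus-residuals plot described above can be sketched like this (df_pred is assumed from the project code, and the column name 'fitted' is hypothetical):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

def plot_fitted_vs_residuals(df_pred):
    """Scatter of residuals against fitted values; a patternless cloud around
    zero supports linearity and independence of the residuals."""
    fig, ax = plt.subplots()
    ax.scatter(df_pred["fitted"], df_pred["residuals"], alpha=0.5)
    ax.axhline(0, color="red", linestyle="--")
    ax.set_xlabel("Fitted values")
    ax.set_ylabel("Residuals")
    return fig
```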
import matplotlib.pyplot as plt

import scipy.stats as stats

# Q-Q plot of the residuals against the normal distribution (normality check)
stats.probplot(df_pred['residuals'], dist="norm", plot=plt)

plt.show()
Final Model Summary
OLS Regression Results
=================================================================================
Dep. Variable: normalized_used_price R-squared: 0.849
Model: OLS Adj. R-squared: 0.846
Method: Least Squares F-statistic: 277.1
Date: Thu, 21 Nov 2024 Prob (F-statistic): 0.00
Time: 21:24:26 Log-Likelihood: 125.15
No. Observations: 2417 AIC: -152.3
Df Residuals: 2368 BIC: 131.4
Df Model: 48
Covariance Type: nonrobust
=========================================================================================
coef std err t P>|t| [0.025 0.975]
-----------------------------------------------------------------------------------------
const 1.4172 0.072 19.677 0.000 1.276 1.558
screen_size 0.0295 0.004 8.352 0.000 0.023 0.036
main_camera_mp 0.0232 0.002 14.892 0.000 0.020 0.026
selfie_camera_mp 0.0116 0.001 9.924 0.000 0.009 0.014
int_memory 0.0002 6.76e-05 2.779 0.005 5.53e-05 0.000
ram 0.0305 0.005 5.797 0.000 0.020 0.041
battery -1.665e-05 7.35e-06 -2.266 0.024 -3.11e-05 -2.24e-06
weight 0.0008 0.000 5.946 0.000 0.001 0.001
days_used 3.376e-05 3.07e-05 1.101 0.271 -2.64e-05 9.39e-05
normalized_new_price 0.4104 0.012 33.494 0.000 0.386 0.434
years_since_release -0.0255 0.005 -5.589 0.000 -0.034 -0.017
brand_name_Alcatel -0.0804 0.050 -1.618 0.106 -0.178 0.017
brand_name_Apple -0.0438 0.148 -0.297 0.767 -0.333 0.246
brand_name_Asus 0.0068 0.049 0.138 0.890 -0.090 0.103
brand_name_BlackBerry 0.0312 0.072 0.434 0.664 -0.110 0.172
brand_name_Celkon -0.2372 0.068 -3.485 0.001 -0.371 -0.104
brand_name_Coolpad -0.0308 0.071 -0.434 0.664 -0.170 0.108
brand_name_Gionee -0.0130 0.059 -0.221 0.825 -0.128 0.102
brand_name_Google -0.1192 0.083 -1.442 0.149 -0.281 0.043
brand_name_HTC -0.0403 0.050 -0.805 0.421 -0.138 0.058
brand_name_Honor -0.0478 0.051 -0.941 0.347 -0.147 0.052
brand_name_Huawei -0.0599 0.046 -1.299 0.194 -0.150 0.030
brand_name_Infinix 0.1338 0.113 1.179 0.238 -0.089 0.356
brand_name_Karbonn -0.0592 0.068 -0.867 0.386 -0.193 0.075
brand_name_LG -0.0608 0.047 -1.299 0.194 -0.152 0.031
brand_name_Lava -0.0230 0.063 -0.364 0.716 -0.147 0.101
brand_name_Lenovo -0.0364 0.047 -0.771 0.441 -0.129 0.056
brand_name_Meizu -0.0860 0.056 -1.530 0.126 -0.196 0.024
brand_name_Micromax -0.0645 0.049 -1.315 0.189 -0.161 0.032
brand_name_Microsoft 0.0749 0.082 0.916 0.360 -0.085 0.235
brand_name_Motorola -0.0686 0.051 -1.349 0.177 -0.168 0.031
brand_name_Nokia 0.0380 0.052 0.724 0.469 -0.065 0.141
brand_name_OnePlus -0.0384 0.073 -0.523 0.601 -0.182 0.105
brand_name_Oppo -0.0294 0.049 -0.599 0.549 -0.126 0.067
brand_name_Others -0.0679 0.044 -1.556 0.120 -0.154 0.018
brand_name_Panasonic -0.0427 0.062 -0.690 0.490 -0.164 0.078
brand_name_Realme -0.0359 0.063 -0.568 0.570 -0.160 0.088
brand_name_Samsung -0.0617 0.045 -1.376 0.169 -0.150 0.026
brand_name_Sony -0.0756 0.053 -1.428 0.153 -0.179 0.028
brand_name_Spice -0.0356 0.068 -0.520 0.603 -0.170 0.099
brand_name_Vivo -0.0644 0.050 -1.277 0.202 -0.163 0.034
brand_name_XOLO -0.0781 0.057 -1.362 0.173 -0.191 0.034
brand_name_Xiaomi 0.0325 0.050 0.655 0.513 -0.065 0.130
brand_name_ZTE -0.0460 0.048 -0.952 0.341 -0.141 0.049
os_Others -0.0604 0.033 -1.856 0.064 -0.124 0.003
os_Windows -0.0374 0.043 -0.861 0.389 -0.122 0.048
os_iOS -0.0141 0.148 -0.095 0.924 -0.304 0.276
4g_yes 0.0406 0.016 2.514 0.012 0.009 0.072
5g_yes -0.0916 0.032 -2.846 0.004 -0.155 -0.029
==============================================================================
Omnibus: 234.465 Durbin-Watson: 1.994
Prob(Omnibus): 0.000 Jarque-Bera (JB): 630.705
Skew: -0.536 Prob(JB): 1.11e-137
Kurtosis: 5.262 Cond. No. 1.85e+05
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.85e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
Test Performance

{'MSE': 0.05715049913984885,

'RMSE': 0.23906170571601143,

'R-squared': 0.8321757391826543}
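The figures above can be computed with a small helper like the following (the variable names in the usage comment are assumed; in the project the predictions would come from olsmodel2 on the test set):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, and R-squared for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {"MSE": mse, "RMSE": np.sqrt(mse), "R-squared": 1 - ss_res / ss_tot}

# In the project, something like:
# regression_metrics(y_test, olsmodel2.predict(sm.add_constant(x_test3)))
```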

Conclusion:
Hence, we have successfully performed exploratory data analysis on our dataset, covering
descriptive analysis (central tendency and dispersion) and the correlation between attributes. We also
applied different data visualization techniques such as bar plots, line plots, and scatter plots, and built a
linear regression model to predict the normalized used price of a device.
