Balaji 1
In this blog we are going to implement a scalable model for predicting mobile phone
prices, using regression techniques on some of the features of a dataset called
Mobile Price Prediction. Building the model involves several processing steps.
In this project I used web scraping to collect the mobile data from an e-commerce
website. We will look at that in the upcoming parts …
Motivation
The motivation behind it is simple: I just wanted to know about the prices of various
kinds of mobiles during the lockdown period, because nowadays most of the e-commerce
websites are … Nowadays many students, including me, also attend online classes to
continue our education. So I got the idea of doing something useful during the
lockdown period, and that is why I decided to do this project. In addition, one of my
brothers asked me, “Bro, why shouldn’t we do this mobile price prediction from end to
end? Like, we are not going to get the data from Kaggle for this project.” So I
decided to do it this way.
Introduction
Price is the most effective attribute in marketing and business. The very first question a
customer asks is about the price of an item. Every customer first worries and thinks, “Will I
be able to purchase something with the given specifications or not?” So the basic purpose of
this work is to estimate the price at home. This paper is only the first step toward that
goal. Artificial intelligence, which makes machines capable of answering questions
intelligently, is nowadays a very vast engineering field. Machine learning provides
techniques for artificial intelligence such as classification, regression, supervised learning,
unsupervised learning and many more. Different tools are available for machine learning
tasks, such as MATLAB, Python, Cygwin and WEKA. We can use any of several classifiers, such
as Decision Tree, Naïve Bayes and many more. Different types of feature selection algorithms
are available to select only the best features and shrink the dataset, which reduces the
computational complexity of the problem. As this is an optimization problem, many
optimization techniques are also used to reduce the dimensionality of the dataset. The mobile
phone is nowadays one of the most bought and sold devices. Every day new mobiles with new
versions and more features are launched, and hundreds and thousands of mobiles are sold and
purchased daily. So mobile price_class prediction is a case study for this type of problem,
i.e. finding the optimal product. The same work can be done to estimate the real price of
all kinds of products, such as cars, bikes and generators.
Mobile prices are an important reflection of the people who buy them, and some price
ranges are of great interest to both buyers and sellers. Ask a mobile buyer to describe
their dream mobile or branded mobile phone. So in this blog we are going to see how
prices are segregated based on some of the features, as well as the prediction of the
target feature. I did not download this dataset from Kaggle or any other data-collection
website. I created the dataset myself using a web scraping tool, which I will describe
in an upcoming part. First, a brief overview to help us understand the data and its
features.
Data Overview
• 5. Mobile_Size: the size of the particular mobile phone. All the values
are given in inches.
In this project I did not get the dataset from Kaggle; rather, I got an idea about …
We should understand one thing first: what is web scraping? Web scraping, web
harvesting, or web data extraction is data scraping used for extracting data from
websites. Web scraping software may access the World Wide Web directly using the
Hypertext Transfer Protocol, or through a web browser. Want to know more about web
scraping?
The tool for getting data from websites
Here I used this tool to get data from one of the e-commerce websites. It is a tool
for scraping any data from any website. You can also write Python code for web
scraping. I am just a beginner at web scraping, which is why I used a tool to get the
data. In the coming days I will develop my web scraping coding skills.
If you want this tool for getting your own data, just click it to download.
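For readers who prefer the Python route mentioned above, here is a minimal sketch of the idea, using only the standard library. It is not the tool I used. The HTML snippet, the tag names and the `product`, `name` and `price` classes are all made up for illustration; a real e-commerce page will have different markup, and you should check the site's terms before scraping it.

```python
from html.parser import HTMLParser

# Hypothetical product-listing markup; a real site will use different tags and classes.
SAMPLE_HTML = """
<div class="product"><span class="name">Phone A</span><span class="price">9,999</span></div>
<div class="product"><span class="name">Phone B</span><span class="price">14,499</span></div>
"""


class ProductParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="price"> elements."""

    def __init__(self):
        super().__init__()
        self.current_field = None  # field the parser is currently inside, if any
        self.rows = []             # one dict per product block

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "product":
            self.rows.append({})   # start a new product record
        elif tag == "span" and css_class in ("name", "price"):
            self.current_field = css_class

    def handle_data(self, data):
        if self.current_field and self.rows:
            self.rows[-1][self.current_field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self.current_field = None


parser = ProductParser()
parser.feed(SAMPLE_HTML)

# Convert the price text (with thousands separators) into integers.
products = [
    {"name": row["name"], "price": int(row["price"].replace(",", ""))}
    for row in parser.rows
]
print(products)
```

In practice the HTML would come from an HTTP request rather than a hard-coded string, and the resulting list of dictionaries could be passed straight to `pandas.DataFrame`.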
The major aim of this project is to predict mobile prices based on the features
Machine Learning Packages Used in this Project
Data Collection
If you search for this dataset on Kaggle you won’t find the same one, although you
will find other datasets like it. As I already mentioned, the data was collected by
web scraping the mobile section of an e-commerce website. Here I’d like to mention
the link where you can get the data. If you want the dataset, just click here.
Dataset before dropping the first column
Note: Afterwards you should drop the first column, which is the Unnamed: 0
column.
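Dropping that column can be sketched as follows. The small DataFrame is a made-up stand-in for the scraped data (only the `Unnamed: 0` name comes from the note above), so treat it as an illustration rather than the real dataset.

```python
import pandas as pd

# Toy stand-in for the scraped data; the values are invented for illustration.
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "RAM": [4, 6, 8],
    "Price": [9999, 14499, 19999],
})

# Drop the leftover index column; errors="ignore" makes the line safe to re-run.
df = df.drop(columns=["Unnamed: 0"], errors="ignore")
print(df.columns.tolist())
```

An `Unnamed: 0` column usually appears when a CSV was saved together with its index, so passing `index_col=0` to `pd.read_csv` when loading the file is another way to avoid it.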
Data Preprocessing
Data preprocessing is an important step in the data mining process. The phrase
“garbage in, garbage out” is particularly applicable to data mining and machine
In this project you might expect to perform a lot of preprocessing steps, because this
dataset was not downloaded from Kaggle or any other data source website; the data was
retrieved from an e-commerce website. But after I got the data, I prepared a dataset
for model prediction. So you need not do any data preprocessing steps except
Checking Null or Missing Values
Note: We do not need the Brand me feature for prediction because it is just a
After handling all the null or missing values, the dataset will look like this.
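A minimal sketch of this check, on made-up data. Filling the gaps with the column median is my assumption for illustration; the post does not say which strategy was used, and dropping the incomplete rows with `dropna()` is an equally reasonable choice.

```python
import numpy as np
import pandas as pd

# Toy data with two missing values; the numbers are invented for illustration.
df = pd.DataFrame({
    "RAM": [4.0, np.nan, 8.0, 6.0],
    "Battery_Power": [4000.0, 5000.0, np.nan, 4500.0],
    "Price": [9999, 14499, 19999, 12999],
})

# Count the missing values in each column.
missing_before = df.isnull().sum()
print(missing_before)

# Assumption: fill each numeric gap with the median of its column.
df_filled = df.fillna(df.median(numeric_only=True))
missing_after = int(df_filled.isnull().sum().sum())
print(missing_after)
```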
Data Types Changing
# Data types (dtypes is an attribute, not a method)
df.dtypes
Note: Here some of the data types are floating point values. We need to change the
Note: After changing the data types, the dataset and its data types will look like this.
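One way to change the types is sketched below. The note above is cut off, so converting whole-number float columns to integers is my assumption, and the column list is hypothetical.

```python
import pandas as pd

# Toy data where whole-number specs were read in as floats.
df = pd.DataFrame({
    "RAM": [4.0, 6.0, 8.0],
    "ROM": [64.0, 128.0, 128.0],
    "Ratings": [4.3, 4.5, 4.1],
})

# Assumption: RAM and ROM hold whole numbers, so integers suit them better.
# Ratings keeps its decimals and stays a float.
df = df.astype({"RAM": "int64", "ROM": "int64"})
print(df.dtypes)
```

Note that `astype` to an integer type raises an error if the column still contains missing values, so this step belongs after the null handling.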
can be used or not, but primarily EDA is for seeing what the data can tell us beyond
Feature Observation
# Find the correlation between the features
corr = df.corr()
corr.shape
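The correlation matrix becomes more useful once it is sorted against the target. The sketch below uses made-up numbers and assumes the target column is called `Price`, as in the plots in this post.

```python
import pandas as pd

# Toy data; Price is built as an exact linear function of RAM for illustration.
df = pd.DataFrame({
    "RAM": [2, 4, 6, 8],
    "Battery_Power": [5000, 4000, 4500, 4200],
    "Price": [7000, 11000, 15000, 19000],
})

corr = df.corr()
print(corr.shape)

# Rank the features by the strength of their correlation with Price.
price_corr = corr["Price"].drop("Price").abs().sort_values(ascending=False)
print(price_corr)
```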
I think there are no null or missing values.
plt.figure(figsize=(15,10))
sns.set_style('whitegrid')
sns.countplot(x='Ratings',data=df)
Rating Frequency
plt.figure(figsize=(15,10))
sns.set_style('whitegrid')
sns.countplot(x='RAM',data=df)
RAM Frequency
plt.figure(figsize=(15,10))
sns.set_style('whitegrid')
sns.countplot(x='ROM',data=df)
ROM Frequency
plt.figure(figsize=(15,10))
sns.set_style('whitegrid')
sns.countplot(x='Primary_Cam',data=df)
Primary Camera Frequency
plt.figure(figsize=(15,10))
sns.set_style('whitegrid')
sns.countplot(x='Selfi_Cam',data=df)
# Histogram of RAM values (histplot replaces the deprecated distplot)
sns.histplot(df['RAM'].dropna(),color='darkred',bins=10)
RAM Limitations
# Histogram of battery power (histplot replaces the deprecated distplot)
sns.histplot(df['Battery_Power'].dropna(),color='green',bins=10)
Battery Power Limitations
# Histogram of prices (histplot replaces the deprecated distplot)
sns.histplot(df['Price'].dropna(),color='darkblue',bins=15)
Price Limitations
# Histogram of battery power with finer bins (histplot replaces the deprecated distplot)
sns.histplot(df['Battery_Power'].dropna(),color='darkblue',bins=15)
Range of Battery Power
plt.figure(figsize=(10,10))
sns.pairplot(data=df)
Pair Plot for all the features
Feature Selection
irrelevant features in your data can decrease the accuracy of the models and make
Importing Libraries
X = df.iloc[:,1:7]   # Independent columns
y = df.iloc[:,[-1]]  # Target column, i.e. price range
Values Assigning
from sklearn.feature_selection import SelectKBest, chi2
# Apply the SelectKBest class to extract the top 4 features
bestfeatures = SelectKBest(score_func=chi2, k=4)
fit = bestfeatures.fit(X,y)
Fitting Method
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X.columns)
# Concatenate the two dataframes for better visualization
featureScores = pd.concat([dfcolumns,dfscores],axis=1)
featureScores.columns = ['Specs','Score']  # name the dataframe columns
featureScores
Best Features
print(featureScores.nlargest(4,'Score'))  # print the 4 best features
Top 4 Features
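A note on the scoring function: `chi2` expects non-negative features and a categorical target such as a price range. If the target is the raw price, which is continuous, a regression score such as `f_regression` is the usual swap. The sketch below shows that variant on synthetic data; the column names, coefficients and `k=2` are my assumptions, not results from the scraped dataset.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data: the price depends on RAM and ROM only (an assumption for the demo).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "RAM": rng.integers(2, 13, n),
    "ROM": rng.integers(16, 257, n),
    "Battery_Power": rng.integers(3000, 6001, n),
    "Noise": rng.normal(size=n),
})
y = 1500 * X["RAM"] + 40 * X["ROM"] + rng.normal(scale=500, size=n)

# Score every feature against the continuous target and keep the best two.
selector = SelectKBest(score_func=f_regression, k=2).fit(X, y)

feature_scores = pd.DataFrame({"Specs": X.columns, "Score": selector.scores_})
top_features = feature_scores.nlargest(2, "Score")["Specs"].tolist()
print(top_features)
```

The two features that actually drive the synthetic price come out on top, which is the behaviour we want from the selector on real data as well.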
Feature Importance
from sklearn.ensemble import ExtraTreesClassifier
import matplotlib.pyplot as plt
model = ExtraTreesClassifier()
model.fit(X,y)
Features Frequencies
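After fitting, the importances can be read from the model's `feature_importances_` attribute. The sketch below uses `ExtraTreesRegressor` instead of the classifier, on the assumption that the target is a continuous price; the synthetic data and column names are again made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor

# Synthetic data: RAM drives most of the price (an assumption for the demo).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "RAM": rng.integers(2, 13, n),
    "ROM": rng.integers(16, 257, n),
    "Battery_Power": rng.integers(3000, 6001, n),
})
y = 1500 * X["RAM"] + 40 * X["ROM"] + rng.normal(scale=500, size=n)

# Fit the tree ensemble and collect one importance value per feature.
model = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X, y)
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Calling `importances.plot(kind='barh')` would draw these values as a horizontal bar chart.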
Model Building
Model Performance
Support Vector Regressor
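As a rough sketch of how a support vector regressor can be trained and scored with scikit-learn: everything below is an assumption for illustration (synthetic data, a linear kernel, `C=1000`, an 80/20 split) and not the configuration or the results of this project.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data standing in for the scraped features and prices.
rng = np.random.default_rng(42)
n = 300
ram = rng.integers(2, 13, n)
rom = rng.integers(16, 257, n)
X = np.column_stack([ram, rom])
y = 1500 * ram + 40 * rom + rng.normal(scale=500, size=n)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# SVR is sensitive to feature scale, so standardize the inputs first.
model = make_pipeline(StandardScaler(), SVR(kernel="linear", C=1000.0))
model.fit(X_train, y_train)

# Evaluate on the held-out rows.
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(r2, mae)
```

Because prices are in the thousands, a small `C` would underfit badly here; scaling the target as well as the features is another option worth trying.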
Methodology
[Figure: methodology flowchart with the steps Data Collection, Pre-Processing, Apply Algorithm, Accuracy of Result]
Data Flow Diagrams
Prediction and Final score
This work can be concluded with the comparable results of both feature selection algorithms
and classifiers, except for the combination of WrapperattributEval and the Decision Tree J48
classifier. This combination achieved maximum accuracy and selected the minimum but most
… redundant features to the data set decreases the efficiency of both classifiers. In
backward selection, by contrast, removing any important feature from the data set decreases
its efficiency. The main reason for the low accuracy rate is the low number of instances in
the data set.
One more thing that should also be considered while working is that converting a regression
Details:
Mobile Number:9100968754