Car Popularity Prediction
Approach
Sunakshi Mamgain
Department of Computer Science
M.Tech, I.I.I.T.-Bhubaneswar
Bhubaneswar, India
[email protected]

Srikant Kumar
Department of Computer Science
M.Tech, I.I.I.T.-Bhubaneswar
Bhubaneswar, India
[email protected]

Kabita Manjari Nayak
Department of Computer Science
M.Tech, I.I.I.T.-Bhubaneswar
Bhubaneswar, India
[email protected]

Swati Vipsita
Department of Computer Science
I.I.I.T.-Bhubaneswar
Bhubaneswar, India
[email protected]

Abstract— Today's world is driven by technology, with a foreseeable future in which machines react and think as humans do. In this emerging process, Artificial Intelligence, Machine Learning, Knowledge Engineering, and Deep Learning play an essential role. In this paper, the problem is identified as a regression or classification problem, and we solve a real-world problem of popularity prediction for a car company using machine learning approaches.

Keywords—Machine Learning, Regression, Classification, Supervised Machine Learning, Logistic Regression, KNN, Random Forest.

I. INTRODUCTION

In the era in which we live, technology has a big impact on our lives. Artificial intelligence [6], knowledge engineering, machine learning, deep learning [4][5], and natural language processing [7][8] are emerging technologies that play an important role in the leading projects of today's world. Artificial intelligence is a field that aims to create machines that work intelligently and whose reactions are similar to those of humans.

Machine learning is an essential, core part of artificial intelligence, providing the ability to learn and improve on its own. The focus of this technique is on creating programs that can take in data and learn from it by themselves. Earlier, statisticians and developers worked together to predict the success, failure, future, etc. of a product. This process delayed product development and launch. Maintaining such a product as technology and data change is also a major challenge.

Machine learning has made this process easier and faster. Machine learning algorithms are broadly categorized into four paradigms:
• Supervised learning [7] [9] [10]: This learning algorithm provides a function for predicting output values, where the process starts from the analysis of a known training dataset. What has been learned from past labeled data can then be applied to new data to predict future events.
• Unsupervised learning: This algorithm is used when the training data is neither classified nor labeled. It studies how a function can be inferred to describe a hidden structure in unlabeled data. Clustering is one approach to unsupervised learning.
• Semi-supervised learning [6] [11]: It combines characteristics of both unsupervised and supervised learning. These algorithms use a small amount of labeled data and a large amount of unlabeled data.
• Reinforcement learning [12]: In this approach, the algorithm interacts with its environment by taking actions and discovering errors. It allows machines and software agents to determine the ideal behavior in a specific context so that performance is maximized.

Regression and classification are types of problems in supervised learning. In classification, a conclusion is drawn from observed values. A discrete output variable, say y, is approximated using a mapping function, say f, on input variables, say x. The output of classification is generally discrete, but it can also be continuous, in the form of a probability for each class label. A regression problem has a real or continuous value as its output variable. A continuous output variable, say y, is approximated using a mapping function, say f, on input variables, say x. The output of regression is generally continuous, but it can also be discrete, taking the form of an integer for a class label. A problem with many output variables is referred to as a multivariate regression problem.

In this paper we focus on a problem taken from HackerRank, in which a company is trying to launch a new car modified on the basis of the popular features of its existing cars. The popularity will be predicted using a machine learning approach. The problem can be classified as a regression problem, specifically a multivariate regression problem, and it falls under supervised learning. Thus, various supervised learning algorithms will be used for this prediction.

II. RELATED WORKS

In the paper “Predicting stock movement direction with machine learning: An extensive study on S&P 500 stocks” [1], the author reviews some classification algorithms, such as random forest, gradient boosted trees, artificial neural network, and logistic regression, to predict