Sales Analysis and Prediction Using Pyth
ABSTRACT
These days, shopping centers and Big Marts maintain records of their sales details in order to forecast
customers' potential demand and to manage inventory. In a data center, these data warehouses
essentially comprise a vast amount of consumer details and individual item attributes. Anomalies and
frequent patterns are identified by mining the data from the data warehouse. The results can be used
to forecast potential revenue figures for retailers like Big Mart using various machine learning techniques. In
this paper, we build predictive models using machine learning algorithms for predicting the sales of a company
and compare them to find out which model performs better.
Keywords: Data Analytics, Machine Learning, Linear Regression, Random Forest, Python
----------------------------------------------------------------------------------------------------------------------------- ----------
Date of Submission: 13-05-2020 Date of Acceptance: 26-05-2020
----------------------------------------------------------------------------------------------------------------------------- ----------
The dataset consists of 11 input fields, namely: Item_Identifier, Item_Weight, Item_Fat_Content,
Item_Visibility, Item_Type, Item_MRP, Outlet_Identifier, Outlet_Establishment_Year, Outlet_Size,
Outlet_Location_Type, and Outlet_Type, together with the outcome variable Item_Outlet_Sales.
The fields are described as follows:
Item_Identifier: This field consists of the unique product ID of the item. It is an ID variable.
Item_Weight: This field consists of the weight of the product. It is not considered in the hypothesis.
Item_Fat_Content: This field tells whether the product is low fat or not. Low-fat items are generally
preferred over other items. This field is linked to the 'Utility' hypothesis.
Item_Visibility: This field gives the display area assigned to a particular product as a percentage
of the total display area of all products. It is used for the 'display area' hypothesis.
Item_Type: This field gives the category of the product. It can be used to derive more knowledge
about the utility.
Item_MRP: This field gives the MRP (maximum retail price) of the product. This field is not treated as
important for the analysis and hence is not considered in the hypothesis.
Outlet_Identifier: This field consists of the unique store ID. It is an ID variable.
Outlet_Establishment_Year: This field gives the year in which the store was established. It is not
considered in the hypothesis.
Outlet_Size: This field gives the ground area that the store covers. It is linked to the
'store capacity' hypothesis.
Outlet_Location_Type: This field gives the type of city in which the store is located. It is linked
to the 'city type' hypothesis.
Outlet_Type: This field tells whether the store is a supermarket or a small store. It is also
connected to the 'store capacity' hypothesis.
Item_Outlet_Sales: This field gives the sales of the product in a store. It is the outcome variable
that is being predicted.[5]

III. PYTHON FOR DATA ANALYTICS
Python is a high-level, interpreted programming language with very easy syntax and semantics. It
takes less effort to create applications using this language, which makes the job easier to perform.
Many other programming languages are harder to learn than Python. Python has emerged as one of the
favorite languages of programmers and is widely used for developing various applications as well as
for performing data analytics.

3.1 Features of Python
Python can achieve better productivity with less code. However, it is not as fast as some of the
other programming languages. The features of this language are:
High-level: It has components of the natural language that people use for communication. It is easy
to understand what task the code is performing.
Interpreted: The code is executed line by line rather than compiled ahead of time, which makes
debugging errors easy and efficient. This also makes Python slower than compiled languages.
Easy syntax: Indentation is used instead of braces to determine which code block belongs to a
certain class or function. This makes the code easy to read.
Dynamic semantics: There is no need to declare the type of a variable before using it. Types are
determined automatically at run time.
Portable: There is no need to make changes in the code to run it on different systems. This makes
it easy to work on a task.
Open source: It is free and can be used and modified by anyone as per their preference.
Object-oriented: It helps simulate real-world scenarios and supports encapsulation, which helps in
building a well-made application.
Simplicity: Once the indentation rules are understood, applications can be written in fewer lines
of code.
Embedding properties: It is powerful and versatile and can be integrated with code written in other
languages like C.
Library support: It supports various libraries that make obtaining solutions easy and fast.

3.2 Usage of Python
Python is used for:
Developing web applications with frameworks like Django and Flask.
Creating workflows for software.
Modifying files and data in databases.
Complex calculations and scientific and analytic calculations.

3.3 History of Python
The Python programming language was developed approximately 30 years ago by Guido van Rossum and
was first released in 1991. The main aspects of this programming language are its code readability
and its use of significant whitespace. It is a multi-paradigm language that supports the functional,
imperative, object-oriented, structured, and reflective paradigms.
There are about 8 different implementations of the Python programming language, namely: CPython,
PyPy, Stackless Python, MicroPython, CircuitPython, IronPython, Jython, and RustPython. Python is
influenced by a number of other languages, namely: ABC, Ada, ALGOL 68, APL, C, C++, CLU, Dylan,
Haskell, Icon, Java, Lisp, Modula-3, Perl, and Standard ML. There are also languages whose
development is influenced by Python. These languages are: Apache Groovy, Boo, Cobra, CoffeeScript,
D, F#, Genie, Go, JavaScript, Julia, Nim, Ring, Ruby, and Swift.

... obtain a model. This model helps us to predict the final outcome.
ETL stands for Extract, Transform and Load. An ETL tool combines all three of these functions. It is
fed the data from a particular database and transforms the input data into a suitable format. The
raw data is transformed into an understandable format by using a data mining technique, namely data
preprocessing. Data preprocessing is a very important step, as the data collected from real sources
may be incomplete or inconsistent.
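The extract, transform and load steps described in the text can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not the tool used in this paper. The sample rows, the label mapping FAT_LABELS, the table name sales, and the choice of filling a missing Item_Weight with the mean weight are all assumptions made for the example. Only the field names come from the dataset description.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract: made-up rows using a subset of the dataset's field names.
RAW_CSV = """Item_Identifier,Item_Weight,Item_Fat_Content,Outlet_Identifier,Item_Outlet_Sales
A001,9.5,Low Fat,OUT01,1500.0
A002,,LF,OUT01,300.0
A003,12.0,reg,OUT02,900.0
"""

# Assumed mapping that unifies inconsistent spellings of the same category.
FAT_LABELS = {"low fat": "Low Fat", "lf": "Low Fat",
              "regular": "Regular", "reg": "Regular"}


def extract(text):
    """Extract: parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: fill missing weights with the mean and normalize labels."""
    known = [float(r["Item_Weight"]) for r in rows if r["Item_Weight"]]
    mean_weight = sum(known) / len(known) if known else 0.0
    cleaned = []
    for r in rows:
        weight = float(r["Item_Weight"]) if r["Item_Weight"] else mean_weight
        raw_label = r["Item_Fat_Content"].strip()
        fat = FAT_LABELS.get(raw_label.lower(), raw_label)
        cleaned.append((r["Item_Identifier"], weight, fat,
                        r["Outlet_Identifier"], float(r["Item_Outlet_Sales"])))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute("CREATE TABLE sales (item_id TEXT, weight REAL, "
                 "fat_content TEXT, outlet_id TEXT, outlet_sales REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", records)
    conn.commit()


def run_etl(text=RAW_CSV):
    """Run all three steps against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn
```

Calling run_etl() returns a database connection whose sales table holds the cleaned rows, which could then be passed on to a model-building step. Keeping the three steps as separate functions makes it possible to change the preprocessing rule without touching the extraction or loading code.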
V. CONCLUSION
We propose a software tool for predicting future sales based on historical
data. With this tool, it can be found out how precise the predictions of the
linear regression and random forest machine learning algorithms are.
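As a rough illustration of how the precision of two models can be compared, the sketch below fits a one-feature linear regression by ordinary least squares and scores it with the root mean squared error (RMSE) on held-out data. It is a sketch under stated assumptions, not the tool described in this paper: the numbers are made up, RMSE is an assumed choice of metric, and a simple mean predictor stands in for the second model because a random forest cannot be reproduced in a few lines. All function and variable names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up training and test data (one numeric feature, one sales value per row).
train_x = [50.0, 100.0, 150.0, 200.0]
train_sales = [500.0, 1100.0, 1400.0, 2100.0]
test_x = [80.0, 180.0]
test_sales = [800.0, 1800.0]

# Model 1: linear regression fitted on the training data.
slope, intercept = fit_line(train_x, train_sales)
linear_pred = [slope * x + intercept for x in test_x]

# Model 2 (stand-in, NOT a random forest): always predict the mean training sales.
mean_sales = sum(train_sales) / len(train_sales)
baseline_pred = [mean_sales for _ in test_x]

# Lower RMSE means more precise predictions on the held-out data.
scores = {
    "linear_regression": rmse(test_sales, linear_pred),
    "mean_baseline": rmse(test_sales, baseline_pred),
}
best_model = min(scores, key=scores.get)
```

The same scores dictionary could hold the error of any other model, such as a random forest trained with a machine learning library, so that all candidates are ranked by one metric on the same test data.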
ACKNOWLEDGEMENT
The successful realization of the project is
the outcome of a consolidated effort of people
from different fronts. We are thankful to Dr.
Krishan Kumar for his valuable advice and support,
without which we would not have
been able to complete the project successfully.
We are thankful to Ms. Pronika Chawla
for her guidance and support.
Words cannot express our gratitude to all
those people who helped us directly or indirectly in
our endeavour. We take this opportunity to express
our sincere thanks to everyone for their valuable
suggestions and also to our family and friends for
their support.
REFERENCES
[1]. Palak Mittal, Mansi Sharma, Dr. Prateek
Jain, A Detailed Study of Security and
Privacy Concerns in Big Data, International
Journal of Applied Engineering Research,
ISSN 0973-4562, Volume 13, Number 10
(2018), pp. 7406-7411