ECOMMERCE
BACHELOR OF
TECHNOLOGY IN
INFORMATION TECHNOLOGY
Submitted by
Assistant Professor
CERTIFICATE
This is to certify that the project report entitled “E-Commerce Pricing Model” submitted
by B.UMA RANI (318126511120), M. AISHWARYA (318126511138), B. VINNI
(318126511162), S H V S KRISHNA (319126511L01) in partial fulfilment of the
requirements for the award of the degree of Bachelor of Technology in Information
Technology of Anil Neerukonda Institute of Technology and Sciences, Visakhapatnam is a
record of bonafide work carried out under my guidance and supervision.
We would like to express our deep gratitude to our project guide Asst. Prof. Reddi
Prasadu (Ph.D.), Department of Information Technology, ANITS, for his guidance and for
providing the required facilities for the completion of the project work. We are very much
thankful for the encouragement and cooperation extended to us to carry out this work. We
express our thanks to all who contributed to the accomplishment of our project. We would
like to thank all non-teaching staff of the department for their support during the
project. We would like to thank our parents, friends, and classmates for their
encouragement throughout our project period. Last but not least, we thank everyone who
supported us directly or indirectly during this work.
DECLARATION
We hereby declare that the project work entitled “E-Commerce Pricing Model” has been
carried out by us at Anil Neerukonda Institute of Technology and Sciences, and this project
work is submitted in partial fulfillment of the requirements for the award of the degree of
Bachelor of Technology in Information Technology. This entire project has been done to the
best of our knowledge and has not been submitted for the award of any other degree at any
other university.
INDEX
I. ABSTRACT
Our study and project aim to develop an E-Commerce Pricing Model that helps customers buy
their products at low cost. Nowadays everything is moving online, and many people have
started buying goods on websites like Amazon, Flipkart, etc. Many of them have no idea
about a product's cost and get confused by its fall and rise. Our project helps those
people buy their products without any second thoughts. Our proposed system provides users
with price history charts of products, predicts price drops, and sends alert messages to
the customers, so that customers can save money on every online purchase. In this paper, we
present an overview of our E-Commerce Pricing Model project: the user interface,
architectural design, introduction, motivation, the methodologies and technologies we use,
our research outcomes, societal use, screenshots of the working application, a piece of
code, etc. Based on the success of other systems developed under different project goals,
we hope our project helps people who shop online.
II. LIST OF FIGURES
III. LIST OF TABLES
CHAPTER-1
INTRODUCTION
1. INTRODUCTION
This system allows users to buy their products at low cost without any second thoughts. This
software project concentrates on predicting a price drop from the value the user sets. First, we
analyze the data and give the user a price history chart for the product he chooses, and we show a
price prediction for that product using a machine learning algorithm, so that the user can buy the
product at low cost.
Nowadays everything is moving online, and many people have started buying goods on websites like
Amazon, Flipkart, etc. Many of them have no idea about a product's cost and get confused by its
fall and rise. There are scenarios where the price of a product decreases within days or weeks
after the user buys it, which leads to customer disappointment and confusion about when to buy the
product amid all this price fall and rise.
We have also faced these situations many times. These experiences motivated us to come up with
this project idea. The other motivation that drives us to do this project is to save customers
money on every online purchase. Successful completion of the project with higher precision on the
dataset can help users buy their products at an affordable price.
The objective is to design a system that provides price history charts and sends price prediction
alerts for a product. As stated in 1.1, the developed system will assist all online buyers in
getting to know the real cost of a product and will save them money on every online purchase. The
proposed solution will let people buy their products without any second thoughts. Compared to
buying products without knowledge of the price history and actual cost, where the customer may be
cheated by the false costs that online shopping apps show, this system will cut the time users
waste on online research into product cost. Users can log in through their account to the model.
The main intention of our project is to let customers buy their products at reasonable prices.
Many systems that were developed earlier do not provide an option for price prediction. We are
developing this system so that users can find a reasonable cost for every product they want to buy
online.
The target groups of this system are customers who buy their goods online using online apps.
The application is made as a Windows application, so anyone with a Windows PC can access it.
Impact, Significance, and Contribution:
Nowadays shopping online can be a lot more convenient than heading out. Instead, we
can buy through online shopping sites. So, our project helps people to analyze the prices
of products.
Prices for most products on shopping sites change every day. Our model tracks them and stores
them on our servers. We make that data available for you at your fingertips for free, giving
buyers more buying power than ever before. Our pricing model tells you the best time and best
price to buy a product. Once you set the price and click on "notify me", we will send you a
message when the price meets your desired price.
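The sketch below shows one way such an alert e-mail could be sent using Python's standard smtplib
module; the SMTP server, addresses, and credentials are placeholders and not our actual
configuration.

import smtplib
from email.message import EmailMessage

def send_price_alert(to_addr, product, current_price, desired_price):
    # Compose a simple notification mail for a product whose price has dropped.
    msg = EmailMessage()
    msg['Subject'] = f'Price alert: {product}'
    msg['From'] = 'alerts@example.com'          # placeholder sender address
    msg['To'] = to_addr
    msg.set_content(
        f'{product} is now Rs.{current_price}, at or below your desired price '
        f'of Rs.{desired_price}.'
    )
    with smtplib.SMTP('smtp.example.com', 587) as server:   # placeholder server
        server.starttls()
        server.login('alerts@example.com', 'app-password')  # placeholder credentials
        server.send_message(msg)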
Recent studies in the subjects of Economics and Operations Management have shown an increasing
tendency of consumers to be strategic in their purchase decisions. With consumers being strategic,
the pricing plans of sellers must be optimized to maximize revenue. This needs specific knowledge
about the factors which influence the purchase decisions of consumers and their probable course of
action in several pricing scenarios. The shopping experience is one of the most important factors
that determine the potential purchase decisions of a consumer. By offering quality customer
services in a web environment, in the form of specifically crafted product recommendations,
discounts, tailor-made prices, etc., sellers can make their customers happier. With the advent of
big data, the shopping experience has become very subjective and more personalized, because
sellers are able to provide tailor-made offers, recommendations, etc. to every consumer based on
his/her previous purchases. This is indeed very advantageous to consumers, as they get relevant
product recommendations and offers which suit their needs.
Many studies illustrate that customers' loyalty towards sellers plays a very important role in
their prospective purchase decisions. The aim of each consumer in the market is to maximize their
satisfaction level. Purchasing products at the lowest prices (which signals the highest value)
certainly improves their satisfaction.
1.5.1 Project Field:
Consumers' perception of the price of a product has a significant influence on their shopping
experience. The price perception of consumers in a way moderates their overall shopping
experience. With modern pricing techniques like dynamic pricing, producers squeeze a large portion
of the consumer surplus, which adversely affects the price perceptions of consumers and in turn
their shopping experience. Therefore, it is also necessary to validate the moderating effect of
consumers' fair price perception on their overall shopping experience. Previous research studying
consumer price perceptions has identified that the acceptance of a particular price by consumers
depends on their perception of its fairness. Price fairness includes two significant aspects,
economic and social. From an economic perspective, a price that maximizes the utility of consumers
and covers the cost of the benefits they receive is considered acceptable. A socially acceptable
price is a fair price to consumers, as the price is presupposed to be a tool that operates in
accordance with the rules and regulations of society. With the introduction of dynamic pricing,
the fair-price perceptions of consumers have been seriously hurt. E-commerce giants like Amazon
and Walmart change prices minute by minute, boosting their profits by 25%.
Earlier models do not have email notifications; they just display graphs and bar charts. In
earlier projects, the datasets that were used contained old data, so the prediction was not that
accurate. The dataset they used was 2-3 years old, and the sales being predicted now were based on
that data. Nowadays data is being generated at such a large rate that there are many changes in
new data compared with the stored data. There is a certain correlation between the parameters
which affect the sales.
CHAPTER-2
LITERATURE SURVEY
I. LITERATURE SURVEY:
Sales were based on an old dataset and not on user-generated data:
In earlier projects, where a stored dataset was used, the prediction was not that accurate. The
dataset used in the project was 2-3 years old, and the sales being predicted now were based on
that data. Nowadays data is being generated at such a large rate that there are many changes in
new data compared with the stored data.
The experiment is performed using WEKA (Waikato Environment for Knowledge Analysis). The main
steps of machine learning are as follows: data collection, dimensionality reduction, and
classification. This work can be concluded with comparable results for both feature selection
algorithms and classifiers, except the combination of WrapperAttributeEval and the Decision Tree
J48 classifier. This combination achieved maximum accuracy and selected the minimum but most
appropriate features. It is important to note that in forward selection, adding irrelevant or
redundant features to the dataset decreases the efficiency of both classifiers, while in backward
selection, removing any important feature from the dataset decreases its efficiency. The main
reason for the low accuracy rate is the low number of instances in the dataset. One more thing
that should be considered while working is that converting a regression problem into a
classification problem introduces more error.
The application consists of two primary components: a web-based module written in PHP, and
an offline module written in C. The web component is responsible for communicating with both
eBay and end users of the program, as well as for testing instances against a model in real time.
The local component is responsible for learning the pricing model. A database stored on the web
server is the single point of contact between the two components of the software.
This application periodically gathers data for past laptop auctions and stores that data in a
database. It periodically runs a training algorithm on that data and builds a price prediction
model and stores this model in the database as well. For the casual buyer, a public web form is
maintained that allows an individual get-item-by-id request for every laptop in which the buyer is
interested. Additionally, eBay limits the number of daily requests to 5000 per developer, unless
the application is manually approved by eBay staff.
However, if one requests an ended auction by item id, eBay will return the information,
including the final selling price. Finally, it is interesting to consider fully automating the process,
allowing the system to make purchases based solely on its evaluation without per-item sanction
by a human.
Logistic regression:
This thesis applies logistic regression to evaluate the firm's price adjustment behavior.
The logistic regression is used for classification in this thesis, and it can classify whether a
price change happens (Y = 1) or no price change happens (Y = 0).
The logistic regression model is expressed as follows:
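The report does not reproduce the equation, but the standard binary logistic regression model it
refers to has the form

P(Y = 1 | X) = 1 / (1 + e^-(b0 + b1*X1 + b2*X2 + ... + bk*Xk))

where X1, ..., Xk are the explanatory variables describing the pricing situation and b0, ..., bk
are coefficients estimated from the data; a price change is predicted when this probability
exceeds a chosen threshold such as 0.5.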
CHAPTER-3
SYSTEM ANALYSIS
3. SYSTEM ANALYSIS
Pandas: Pandas is a very useful Python package which is used to work with relational data easily.
It aims to be the fundamental high-level building block for doing practical, real-world data
analysis in Python. Additionally, it has the broader goal of becoming the most powerful and
versatile open-source data analysis/manipulation tool available in any language.
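As an illustrative sketch (not the project's exact code), the training data described in Chapter 6
could be loaded and inspected with pandas as follows; the file name train.xlsx is an assumption.

import pandas as pd

# Columns described in Chapter 6: ProductID, Brand, category, subcategory,
# variant, rating, date, price.
train = pd.read_excel('train.xlsx')      # assumed file name
print(train.shape)                       # expected to be (2453, 8)
print(train.dtypes)                      # check column types
print(train['price'].describe())         # quick look at the target column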
Tkinter: We have developed the front end using Tkinter. It is a Python module widely used for
creating GUIs and is supported on all operating systems like Windows, Linux, etc. Python has many
other modules for GUIs, but among them Tkinter is widely used and it is easy to develop
applications with it.
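The snippet below is a minimal Tkinter sketch of the kind of window used for this front end; the
exact widgets, labels, and layout of our application differ, so this is only an illustration.

import tkinter as tk

root = tk.Tk()
root.title("E-Commerce Pricing Model")

tk.Label(root, text="Product name:").grid(row=0, column=0)
product_entry = tk.Entry(root, width=30)
product_entry.grid(row=0, column=1)

tk.Label(root, text="Notify me below price:").grid(row=1, column=0)
price_entry = tk.Entry(root, width=30)
price_entry.grid(row=1, column=1)

def on_notify():
    # In the real application this would register a price alert for the user.
    print(f"Alert set for {product_entry.get()} at {price_entry.get()}")

tk.Button(root, text="Notify Me", command=on_notify).grid(row=2, column=1)
root.mainloop()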
Matplotlib: Matplotlib is an excellent visualization library in Python for 2D plots. It is a
multi-platform data visualization library built on NumPy arrays and designed to work with the
broader SciPy stack. We chose this library because we had an idea to add an accuracy rate in query
writing for the users, for which we use Python dictionaries. Matplotlib supports many plot types
such as lines, bars, scatter plots, histograms, etc. Our system mainly uses bar charts for
plotting.
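As a small sketch of the kind of price-history bar chart the system produces (the months and
prices below are made-up sample values, not data from our dataset):

import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
prices = [1499, 1399, 1450, 1299, 1350]   # sample prices in rupees

plt.bar(months, prices)
plt.xlabel('Month')
plt.ylabel('Price (Rs.)')
plt.title('Price history of the selected product')
plt.show()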
CHAPTER-4
SYSTEM DEVELOPMENT
4. SYSTEM DEVELOPMENT
To describe this application in terms of development, we used four different algorithms to find
the accurate selling price of the product. They are the LightGBM Regressor, XGBoost Regressor,
CatBoost Regressor, and Random Forest Regressor.
LightGBM Regressor:
LightGBM is a gradient boosting framework based on decision trees that increases the efficiency of
the model and reduces memory usage. It uses two novel techniques, Gradient-based One-Side Sampling
(GOSS) and Exclusive Feature Bundling (EFB), which address the limitations of the histogram-based
algorithm that is primarily employed in all GBDT (Gradient Boosting Decision Tree) frameworks. The
two techniques of GOSS and EFB described below form the characteristics of the LightGBM algorithm.
Together they make the model work efficiently and give it an advantage over other GBDT frameworks.
Gradient-based One-Side Sampling Technique for LightGBM:
Different data instances have varied roles in the computation of information gain. The instances
with larger gradients (i.e., under-trained instances) contribute more to the information gain.
GOSS keeps those instances with large gradients (e.g., larger than a predefined threshold, or
among the top percentiles) and only randomly drops those instances with small gradients, to retain
the accuracy of the information gain estimation. This treatment can lead to a more accurate gain
estimation than uniform sampling, with the same target sampling rate, especially when the value of
information gain has a large range.
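The following sketch illustrates the GOSS sampling idea in plain NumPy; it is a simplified
illustration with arbitrary example rates, not LightGBM's internal implementation.

import numpy as np

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    # Keep the largest-gradient instances, randomly sample the rest, and
    # re-weight the sampled ones so the information-gain estimate stays unbiased.
    rng = np.random.default_rng(seed)
    gradients = np.asarray(gradients, dtype=float)
    n = len(gradients)
    order = np.argsort(np.abs(gradients))[::-1]    # indices by |gradient|, descending
    top_k = int(top_rate * n)
    top_idx = order[:top_k]                        # always keep large-gradient rows
    rest = order[top_k:]
    sampled = rng.choice(rest, size=int(other_rate * n), replace=False)
    weights = np.ones(n)
    weights[sampled] *= (1.0 - top_rate) / other_rate
    keep = np.concatenate([top_idx, sampled])
    return keep, weights[keep]

keep_idx, keep_w = goss_sample([0.9, -0.1, 0.05, 1.2, -0.8, 0.02, 0.3, -0.01, 0.6, 0.07])
print(keep_idx, keep_w)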
XGboost Regressor:
The objective function contains a loss function and a regularization term. It tells us about the
difference between actual values and predicted values, i.e., how far the model results are from
the real values. The most common loss function in XGBoost for regression problems is reg:linear,
and for binary classification it is reg:logistic. Ensemble learning involves training and
combining individual models (known as base learners) to get one prediction, and XGBoost is one of
the ensemble learning methods. XGBoost expects the base learners to be uniformly weak at the
remainder, so that when all the predictions are combined, bad predictions cancel out and better
ones sum up to form the final good predictions.
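For reference, the standard form of this objective function (loss plus regularization, summed over
samples i and trees k) is

Obj = sum_i l(y_i, y_hat_i) + sum_k Omega(f_k)

where l measures how far the prediction y_hat_i is from the actual value y_i, and Omega(f_k)
penalizes the complexity of each tree f_k.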
Catboost Regressor:
The main idea of boosting is to sequentially combine many weak models (a model performing
slightly better than random chance) and thus through greedy search create a powerful competitive
predictive model.
Random Forest Regressor:
Random Forest is a popular machine learning algorithm that belongs to the supervised learning
technique. It can be used for both classification and regression problems in ML. It is based on
the concept of ensemble learning, which is a process of combining multiple classifiers to solve a
complex problem and improve the performance of the model. As the name suggests, "Random Forest is
a classifier that contains a number of decision trees on various subsets of the given dataset and
takes the average to improve the predictive accuracy of that dataset." Rather than relying on one
decision tree, the random forest takes the prediction from each tree and, based on the majority
votes of predictions, predicts the final output. A greater number of trees in the forest results
in higher accuracy and prevents the problem of overfitting.
When we compare the prediction values of the four algorithms with one another, we notice that
Random Forest gives a more accurate value than the other algorithms. After obtaining individual
prediction values for the four algorithms, we find the mean of those prediction values, so the
expected value will be very accurate.
CHAPTER-5
METHODOLOGY
5. METHODOLOGY
5.1 Methodology
The software methodology we opted to use is the iterative software development model. We selected
this model because the planning, design, etc. of our project were not completely settled at the
start, so the iterative SDLC suits it.
The iterative technique starts with a primitive implementation of a piece of the software
requirements and then iteratively improves the evolving versions until the entire system is
complete. With each iteration, design changes are made, and new functional capabilities are
added. This strategy's main notion is to create a system in small portions over time (iteratively
and incrementally).
Iterative and incremental development is a development method that combines iterative design or
the iterative process with a build strategy that is incremental. "During software development,
multiple iterations of the software development cycle may be in progress at the same time." This
procedure could be described as "evolutionary acquisition" or "incremental build."
In this incremental strategy, the complete need is divided into numerous builds. With each
iteration, the development module goes through the requirements, design, implementation, and
testing stages. Each subsequent release of the module adds to the capabilities of the previous
iteration. The process is continued until the entire system complies with the specifications.
Iterative and incremental development, like other SDLC models, has particular specialized
software industry uses. In the following cases, this model is most used:
The development team is working on a project that requires them to learn a new technology.
Because resources with the required skill sets are in short supply, they will be hired on a contract
basis for certain iterations. Some features and aims are high-risk, and they may alter in the future.
1. Requirement Gathering and Analysis: In this phase, requirements are acquired from customers,
and an analyst determines whether they can be met within budget. After that, the software team
moves on to the next stage.
2. Design: During the design phase, the team uses several diagrams such as a Data Flow Diagram,
an Activity Diagram, a Class Diagram, a State Transition Diagram, and so on to create the
software.
3. Implementation: During the implementation phase, requirements are defined in a coding language
and converted into software programs.
4. Testing: After the development step is completed, software testing begins utilizing various
test methods. There are other test techniques, but the white box, black box, and grey box test
methods are the most frequent.
5. Deployment: After all the processes have been completed, the software is deployed to its
working environment.
6. Review: Following the deployment of the product, a review phase is conducted to assess the
behavior and validity of the generated product. If any errors are discovered, the process is
restarted from the requirement gathering stage.
7. Maintenance: Following the deployment of software in the working environment, there may be
some bugs, errors, or new upgrades that require maintenance. Debugging and new addition
options are included in maintenance.
CHAPTER-6
HANDS ON DEVELOPMENT
6. HANDS ON DEVELOPMENT
To describe this application in terms of development, we used a dataset to train our model.
Train Dataset:
Our train dataset has a length of 2453 rows and consists of 8 columns: ProductID, Brand, category,
subcategory, variant, rating, date, and price. We use this train dataset to train our model for
predicting the selling price of a product.
Test Dataset:
Our test dataset has a length of 1052 rows and consists of 7 columns: ProductID, Brand, category,
subcategory, variant, rating, and Date. We use this test dataset to test the model we created
using the train dataset.
To find the accurate selling price of the product, we used four different machine learning
algorithms. They are LightGBM regression, XGBoost regression, CatBoost regression, and Random
Forest regression.
First, we take each algorithm, predict the selling price of the products, and store the
predictions in an Excel sheet. We get four Excel sheets after going through the four algorithms;
using those four Excel sheets we find the ensembled predicted values of the product.
from sklearn.metrics import mean_squared_error

def rmse(y_true, y_pred):
    # Root mean squared error; applied to log(price) this gives an RMSLE-style metric.
    return mean_squared_error(y_true, y_pred) ** 0.5
1. LightGBM Regression:
LightGBM is a gradient boosting framework based on decision trees that increases the efficiency of
the model and reduces memory usage. It uses two novel techniques, Gradient-based One-Side Sampling
(GOSS) and Exclusive Feature Bundling (EFB), which address the limitations of the histogram-based
algorithm that is primarily employed in all GBDT (Gradient Boosting Decision Tree) frameworks. The
two techniques of GOSS and EFB form the characteristics of the LightGBM algorithm. Together they
make the model work efficiently and give it an advantage over other GBDT frameworks.
Different data instances have varied roles in the computation of information gain. The instances
with larger gradients (i.e., under-trained instances) contribute more to the information gain.
GOSS keeps those instances with large gradients (e.g., larger than a predefined threshold, or
among the top percentiles) and only randomly drops those instances with small gradients, to retain
the accuracy of the information gain estimation. This treatment can lead to a more accurate gain
estimation than uniform sampling, with the same target sampling rate, especially when the value of
information gain has a large range.
lgbm = LGBMRegressor(n_estimators=1000, num_leaves=127, max_depth=-1,
                     min_child_samples=4, learning_rate=0.02, colsample_bytree=0.4,
                     reg_alpha=0.5, reg_lambda=2)
_ = lgbm.fit(X_trn, np.log(y_trn), eval_set=[(X_val, np.log(y_val))], verbose=100,
             early_stopping_rounds=100, eval_metric='rmse')
The above code is the important part of the algorithm; we pass the number of leaves, the depth of
the tree, etc. as parameters to the predefined LGBMRegressor in Python. Here, the OOF val score is
the out-of-fold validation score (an RMSLE-style error) of the LightGBM algorithm, and it is
0.6466.
2. XGboost Regression:
The objective function contains a loss function and a regularization term. It tells us about the
difference between actual values and predicted values, i.e., how far the model results are from
the real values. The most common loss function in XGBoost for regression problems is reg:linear,
and for binary classification it is reg:logistic. Ensemble learning involves training and
combining individual models (known as base learners) to get one prediction, and XGBoost is one of
the ensemble learning methods. XGBoost expects the base learners to be uniformly weak at the
remainder, so that when all the predictions are combined, bad predictions cancel out and better
ones sum up to form the final good predictions.
Here, the OOF val score is the out-of-fold validation score of the XGBoost algorithm, and it is
0.6468. After getting the predicted price for each product in the dataset, we save those values in
an Excel sheet called mean_encoding_xgboost2.xlsx.
sub.to_excel('mean_encoding_xgboost2.xlsx', index=False)
3.Catboost Regression:
The main idea of boosting is to sequentially combine many weak models (a model performing
slightly better than random chance) and thus through greedy search create a powerful competitive
predictive model.
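For reference, the CatBoost call referred to below is the one listed in Chapter 7:

cat = CatBoostRegressor(n_estimators=2000, learning_rate=0.05, max_depth=9, rsm=0.5)
_ = cat.fit(X_trn, np.log(y_trn), eval_set=[(X_val, np.log(y_val))], verbose=100,
            early_stopping_rounds=100)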
The above code is the important part of the algorithm; we pass the learning rate, the depth of the
tree, etc. as parameters to the predefined CatBoostRegressor in Python. Here, the OOF val score is
the out-of-fold validation score of the CatBoost algorithm, and it is 0.6515.
OOF val score: 0.6515039864490813
Mean rmsle: 0.6505 and std Dev. is 0.04
After getting the predicted price for each product in the dataset, we save those values in an
Excel sheet called mean_encoding_catboost2.xlsx.
sub.to_excel('mean_encoding_catboost2.xlsx', index=False)
4. Random Forest Regressor:
Random Forest is a popular machine learning algorithm that belongs to the supervised learning
technique. It can be used for both classification and regression problems in ML. It is based on
the concept of ensemble learning, which is a process of combining multiple classifiers to solve a
complex problem and improve the performance of the model. As the name suggests, "Random Forest is
a classifier that contains a number of decision trees on various subsets of the given dataset and
takes the average to improve the predictive accuracy of that dataset." Rather than relying on one
decision tree, the random forest takes the prediction from each tree and, based on the majority
votes of predictions, predicts the final output. A greater number of trees in the forest results
in higher accuracy and prevents the problem of overfitting.
rfg = RandomForestRegressor(n_estimators=1000, random_state=1234, max_depth=8)
_ = rfg.fit(X_trn, np.log(y_trn))
The above code is the important part of the algorithm; we pass the random state, the depth of the
tree, etc. as parameters to the predefined RandomForestRegressor in Python.
Here, the OOF val score is the out-of-fold validation score of the Random Forest algorithm, and it
is 0.6600. After getting the predicted price for each product in the dataset, we save those values
in an Excel sheet called mean_encoding_rfg2.xlsx.
sub.to_excel('mean_encoding_rfg2.xlsx', index=False)
When we compare the prediction values of the four algorithms with one another, we notice that
Random Forest gives a more accurate value than the other algorithms.
5. Ensembling:
After obtaining individual prediction values for the four algorithms, we find the mean of those
prediction values, so the expected value will be very accurate.
sub['Selling_Price'].describe()
The above code is the important part of the algorithm. Here, the OOF val score is the out-of-fold
validation score of the ensembled predictions, and it is 0.6452.
After getting the ensembled predicted prices for each product in the dataset, we save those values
in an Excel sheet called mean_encoding_ensemble_lgbcatrfg2.xlsx.
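A minimal sketch of how such an ensemble sheet could be produced from the four per-model sheets is
given below; the LightGBM file name and the presence of a Selling_Price column in each sheet are
assumptions, and the actual Chapter 7 code uses a weighted combination rather than a plain mean.

import pandas as pd

files = ['mean_encoding_lgbm2.xlsx',       # assumed name for the LightGBM sheet
         'mean_encoding_xgboost2.xlsx',
         'mean_encoding_catboost2.xlsx',
         'mean_encoding_rfg2.xlsx']
preds = [pd.read_excel(f)['Selling_Price'] for f in files]

# Simple mean of the four per-model predictions for every product.
ensemble = pd.DataFrame({'Selling_Price': sum(preds) / len(preds)})
ensemble.to_excel('mean_encoding_ensemble_lgbcatrfg2.xlsx', index=False)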
CHAPTER-7
CODING
7. CODING
# LightGBM cross-validated training. X and y (train features/target), Xt (test
# features), features (the feature column list) and the rmse() helper are assumed
# to be defined by the earlier data-preparation steps.
import time
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from lightgbm import LGBMRegressor

training_start_time = time.time()
rmsle = list()
max_iter = 10
folds = StratifiedKFold(n_splits=max_iter)
oofs = np.zeros(len(X))
test_preds = np.zeros(len(Xt))
# The continuous target is binned into deciles so StratifiedKFold can stratify on it.
for fold_, (trn_idx, val_idx) in enumerate(folds.split(X, pd.qcut(y, 10, labels=False, duplicates='drop'))):
    print(f'\n---- Fold {fold_} -----\n')
    X_trn, y_trn = X.iloc[trn_idx][features], y.iloc[trn_idx]
    X_val, y_val = X.iloc[val_idx][features], y.iloc[val_idx]
    X_test = Xt[features]
    lgbm = LGBMRegressor(n_estimators=1000, num_leaves=127, max_depth=-1, min_child_samples=4,
                         learning_rate=0.02, colsample_bytree=0.4, reg_alpha=0.5, reg_lambda=2)
    # Train on log(price); early stopping monitors RMSE on the log scale (i.e., RMSLE).
    _ = lgbm.fit(X_trn, np.log(y_trn), eval_set=[(X_val, np.log(y_val))], verbose=100,
                 early_stopping_rounds=100, eval_metric='rmse')
    oofs[val_idx] = np.exp(lgbm.predict(X_val))              # out-of-fold predictions
    current_test_pred = np.exp(lgbm.predict(X_test))
    test_preds += np.exp(lgbm.predict(X_test)) / max_iter    # average test predictions over folds
    print(f'\n Fold {rmse(np.log(y_val), np.log(oofs[val_idx]))}')
    rmsle = np.append(rmsle, rmse(np.log(y_val), np.log(oofs[val_idx])))

print(f'\nOOF val score: {rmse(np.log(y), np.log(oofs))}')
print(f'Mean rmsle: {np.mean(rmsle):.4f} and std Dev. is {np.std(rmsle):.2f} \n')
# XGBoost cross-validated training, following the same fold structure as above.
from xgboost import XGBRegressor

training_start_time = time.time()
rmsle = list()
max_iter = 10
folds = StratifiedKFold(n_splits=max_iter)
oofs = np.zeros(len(X))
test_preds = np.zeros(len(Xt))
for fold_, (trn_idx, val_idx) in enumerate(folds.split(X, pd.qcut(y, 10, labels=False, duplicates='drop'))):
    print(f'\n---- Fold {fold_} -----\n')
    X_trn, y_trn = X.iloc[trn_idx][features], y.iloc[trn_idx]
    X_val, y_val = X.iloc[val_idx][features], y.iloc[val_idx]
    X_test = Xt[features]
    xgb = XGBRegressor(n_estimators=1000, max_depth=12, learning_rate=0.05, colsample_bytree=0.45)
    _ = xgb.fit(X_trn, np.log(y_trn), eval_set=[(X_val, np.log(y_val))], verbose=100,
                early_stopping_rounds=100, eval_metric='rmse')
    oofs[val_idx] = np.exp(xgb.predict(X_val))
    current_test_pred = np.exp(xgb.predict(X_test))
    test_preds += np.exp(xgb.predict(X_test)) / max_iter
    print(f'\n Fold {rmse(np.log(y_val), np.log(oofs[val_idx]))}')
    rmsle = np.append(rmsle, rmse(np.log(y_val), np.log(oofs[val_idx])))

print(f'\nOOF val score: {rmse(np.log(y), np.log(oofs))}')
print(f'Mean rmsle: {np.mean(rmsle):.4f} and std Dev. is {np.std(rmsle):.2f} \n')
# CatBoost cross-validated training, again with the same fold structure.
from catboost import CatBoostRegressor

rmsle = list()
max_iter = 10
folds = StratifiedKFold(n_splits=max_iter)
oofs = np.zeros(len(X))
test_preds = np.zeros(len(Xt))
for fold_, (trn_idx, val_idx) in enumerate(folds.split(X, pd.qcut(y, 10, labels=False, duplicates='drop'))):
    print(f'\n---- Fold {fold_} -----\n')
    X_trn, y_trn = X.iloc[trn_idx][features], y.iloc[trn_idx]
    X_val, y_val = X.iloc[val_idx][features], y.iloc[val_idx]
    X_test = Xt[features]
    cat = CatBoostRegressor(n_estimators=2000, learning_rate=0.05, max_depth=9, rsm=0.5)
    _ = cat.fit(X_trn, np.log(y_trn), eval_set=[(X_val, np.log(y_val))], verbose=100,
                early_stopping_rounds=100)
    oofs[val_idx] = np.exp(cat.predict(X_val))
    current_test_pred = np.exp(cat.predict(X_test))
    test_preds += np.exp(cat.predict(X_test)) / max_iter
    print(f'\n Fold {rmse(np.log(y_val), np.log(oofs[val_idx]))}')
    rmsle = np.append(rmsle, rmse(np.log(y_val), np.log(oofs[val_idx])))

print(f'\nOOF val score: {rmse(np.log(y), np.log(oofs))}')
print(f'Mean rmsle: {np.mean(rmsle):.4f} and std Dev. is {np.std(rmsle):.2f} \n')
# Random Forest cross-validated training begins in the same way; the fold loop follows
# the same pattern as above, using the RandomForestRegressor call shown in Chapter 6
# (n_estimators=1000, random_state=1234, max_depth=8).
training_start_time = time.time()
rmsle = list()
max_iter = 10
7.5 Ensembling:
# Feature importances from the fitted Random Forest model.
feat_imp = pd.DataFrame({'columns': X.columns, 'feature_importance': rfg.feature_importances_})
feat_imp

# vp0, vp2, vp4 and tp0, tp2, tp4 are assumed to hold the per-model validation and
# test predictions; their definitions are not shown in this listing.
print(f'\nOOF val score: {rmse(np.log(y), np.log(vp0*0.2 + vp4*0.4 + vp2*0.4))}')
test_preds_ensemble = tp0*0.2 + tp4*0.4 + tp2*0.4

sub = pd.DataFrame({'Selling_Price': test_preds_ensemble})
# Clip the ensembled predictions to the price range seen in the training data.
sub['Selling_Price'] = np.clip(sub['Selling_Price'], y.min(), y.max())
sub['Selling_Price'].describe()
sub.head(10)
sub.to_excel('mean_encoding_ensemble_lgbcatrfg.xlsx', index=False)
CHAPTER-8
TESTING
8. Testing
8.1 Testing:
Testing in this project is the process of checking the software in terms of requirement
fulfillment, code functionality, integration of code, etc. The types of testing we have done are
Integration Testing, Functionality Testing, Manual Testing, and Unit Testing. After every
development step, we tested the code functionality by giving test cases, and when the cases were
fulfilled, the code was sent to integration. Integration testing was done by checking whether the
code works fine after integrating it with the user interface; if the code is successfully
integrated, unit testing is done again on that part. After successfully passing the testing phase,
the project moves into a new phase of development.
The process of manually testing software for flaws is known as manual testing. It necessitates a
tester acting as an end user and utilizing the majority of the application's features to ensure
proper behavior. Various aspects of the application have been tested manually by us.
Unit testing here has been done by checking the code snippets by giving them external data; the
code snippets, such as graph generation and feedback generation, consist of functions that are
each given data for which they have to perform their respective task.
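As an illustrative example (not the project's actual test suite), a unit test for the rmse()
helper from Chapter 6 could look like this:

import unittest
from sklearn.metrics import mean_squared_error

def rmse(y_true, y_pred):
    return mean_squared_error(y_true, y_pred) ** 0.5

class TestRmse(unittest.TestCase):
    def test_perfect_prediction_gives_zero(self):
        self.assertEqual(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]), 0.0)

    def test_known_error(self):
        # errors of 1 on every sample give an RMSE of exactly 1
        self.assertAlmostEqual(rmse([1.0, 2.0], [2.0, 3.0]), 1.0)

if __name__ == '__main__':
    unittest.main()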
CHAPTER-9
RESULTS
9. Results
Fig 9.1 Home Page Interface
10. Conclusion
CHAPTER-11
REFERENCES
11. REFERENCES
[1] Zhao, K. and Wang, C. (2017) ‘Sales Forecast in E-commerce using Convolutional Neural
Network’, (August 2017). Available at: http://arxiv.org/abs/1708.07946.
[2] Bandara, K. et al. (2019) ‘Sales Demand Forecast in E-commerce using a Long Short-Term
Memory Neural Network Methodology’.
[3] Li, M., Ji, S. and Liu, G. (2018) ‘Forecasting of Chinese E-Commerce Sales: An Empirical
Comparison of ARIMA, Nonlinear Autoregressive Neural Network, and a Combined ARIMA-
NARNN Model’, Mathematical Problems in Engineering, 2018, pp. 1–12. doi:
10.1155/2018/6924960.
[4] Kuo-Kun Tseng & Regina Fang-Ying Lin & Hong Fu Zhou & Kevin Jati Kurniajaya & Qian
Yu Li, 2018. "Price prediction of e-commerce products through Internet sentiment analysis,"
[5] https://www.researchgate.net/publication/330011161_PRICE_TRACKING_BEHAVIOUR_IN_ELECTRONIC_COMMERCE_AND_THE_MODERATING_ROLE_OF_FAIR_PRICE_PERCEPTION
[6] https://www.ijert.org/predicting-online-product-sales-using-machine-learning
CHAPTER-12
PUBLISHED PAPER