ÉCOLE POLYTECHNIQUE - INF554
AXA Data Challenge - Assignment
2 Dataset Description
In this section, we present the structure of the training dataset (train.csv [1]) that will be used for training your model. As mentioned previously, the training dataset includes telephony data derived from AXA call centers and corresponds to the calendar years 2011-2013. Figure 1 shows how the training dataset has been derived. Each row of the dataset corresponds to the number of incoming calls for a different combination of values of the following attributes: DATE (time stamps in half-hour slots), SPLIT_COD, ACD_COD, ASS_ASSIGNMENT. Please note that some combinations may not be present in the dataset. For a detailed description of the attributes, refer to the field_description.xlsx [2] file.
The objective of your work here is to build a model (or a set of models) able to predict the number of incoming calls (CSPL_RECEIVED_CALLS) for seven days after the current/given date, for each combination of values of the attributes DATE (time stamps in half-hour slots) and ASS_ASSIGNMENT.
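As a starting point, the sketch below shows one way to load the training data with pandas and aggregate the received calls per (DATE, ASS_ASSIGNMENT) pair; the separator and the underscored column names are assumptions that should be checked against the actual file.

```python
import pandas as pd

# Load the training data; the separator is an assumption, adjust it to the file.
train = pd.read_csv("train.csv", sep=";", parse_dates=["DATE"])

# Target variable: total number of received calls per half-hour slot and
# assignment (rows sharing the same DATE and ASS_ASSIGNMENT are summed).
calls = (train
         .groupby(["DATE", "ASS_ASSIGNMENT"], as_index=False)["CSPL_RECEIVED_CALLS"]
         .sum())

print(calls.head())
```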
As you can easily observe, the dataset has missing values (NULL) for some attributes. In the preprocessing task, you should take care of such cases. For features that take numerical values, one approach is to replace the missing values with the mean value of the feature. Some other features may not be useful for the prediction task; it is worth exploring the dataset and dealing with such cases. Additionally, some features take string values (e.g., the TPER_TEAM feature takes the values Jours and Nuit). In such cases, we can create two new features (i.e., add two new columns to the data matrix) that correspond to the two possible strings. Thus, if the TPER_TEAM feature takes the value Jours, the feature that corresponds to Jours is set to 1, while the feature that corresponds to Nuit is set to 0.
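A minimal sketch of these two steps (mean imputation and one-hot encoding) with pandas could look as follows, again assuming underscored column names and the separator used above:

```python
import pandas as pd

df = pd.read_csv("train.csv", sep=";")  # separator assumed, as above

# Replace missing values of numerical features with the mean of each feature.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

# Encode the string-valued TPER_TEAM feature (Jours / Nuit) as two binary
# columns, TPER_TEAM_Jours and TPER_TEAM_Nuit.
df = pd.get_dummies(df, columns=["TPER_TEAM"])
```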
As part of the preprocessing step, you can also apply feature selection techniques to keep a subset of the most informative features, or dimensionality reduction methods (e.g., Linear Discriminant Analysis) to create a representation of the data in a new space that preserves some of the underlying properties of the data. It is also possible to create new features that do not exist in the dataset but can be useful in the forecasting task; in that case, you create a new feature (i.e., add a new column to the data matrix) to represent this information (this is known as feature engineering or generation).
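For illustration, the sketch below engineers a few calendar features from the DATE attribute and then reduces the dimensionality of the numerical columns. PCA is used here instead of Linear Discriminant Analysis only because LDA requires discrete class labels, which a regression target does not provide; the file name, separator, and column names are assumptions as before.

```python
import pandas as pd
from sklearn.decomposition import PCA

df = pd.read_csv("train.csv", sep=";", parse_dates=["DATE"])  # assumed separator

# New calendar features derived from the time stamp.
df["weekday"] = df["DATE"].dt.dayofweek                    # 0 = Monday, ..., 6 = Sunday
df["half_hour_slot"] = df["DATE"].dt.hour * 2 + df["DATE"].dt.minute // 30
df["month"] = df["DATE"].dt.month

# Optional dimensionality reduction on the numerical columns.
numeric = df.select_dtypes(include="number").fillna(0)
X_reduced = PCA(n_components=min(10, numeric.shape[1])).fit_transform(numeric)
```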
3 Pipeline

• Data pre-processing: After loading the data, a preprocessing step should be applied to transform the data into an appropriate format. Some of these points were discussed in the previous section.
[1] Training dataset:
https://moodle.polytechnique.fr/pluginfile.php/59386/mod_assign/introattachment/0/train_2011_2012_2013.7z.001?forcedownload=1
https://moodle.polytechnique.fr/pluginfile.php/59386/mod_assign/introattachment/0/train_2011_2012_2013.7z.002?forcedownload=1
[2] Dataset description:
https://moodle.polytechnique.fr/pluginfile.php/59386/mod_assign/introattachment/0/field_description.zip?forcedownload=1
• Feature engineering - Dimensionality reduction: The next step involves the feature engineering task, i.e., how to select a subset of the features that will be used in the learning task (feature selection) or how to create new features from the already existing ones (see also the previous section). Moreover, it is possible to apply dimensionality reduction techniques in order to improve the performance of the algorithms.
• Learning algorithm: The next step of the pipeline involves the selection of an appropriate learning (i.e., regression) algorithm for the problem. At this point, you can test the performance of a number of different algorithms and choose the best one. Additionally, you can follow an ensemble learning approach, combining several regression algorithms (a short sketch is given after this list).
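The sketch below compares a few scikit-learn regressors on a held-out split and averages their predictions as a very simple ensemble; the chosen models, hyper-parameters, and the synthetic stand-in data are illustrative assumptions, and in practice X and y would be the feature matrix and call counts built in the previous steps.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data so the snippet runs on its own; replace with the
# features and call counts produced by the preprocessing steps.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = rng.poisson(lam=5, size=500)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gbrt": GradientBoostingRegressor(random_state=0),
}

predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_val)

# A very simple ensemble: average the predictions of the individual models.
ensemble_prediction = np.mean(list(predictions.values()), axis=0)
```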
4 Evaluation
You will build your model based on the training data contained in the train.csv file. To do this, you can apply cross-validation techniques [3]. The goal of cross-validation is to define a dataset on which to test the model during the training phase, in order to limit problems like overfitting and to gain insight into how the model will generalize to an independent dataset (i.e., an unknown dataset, like the test dataset that will be used to assess your model).
In k-fold cross-validation (assuming your model allows this type of validation), the original sample is randomly partitioned into k equal-sized subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation (i.e., test) data. The k results from the folds can then be averaged (or otherwise combined) to produce a single estimate (the average accuracy of the model).
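A minimal sketch of 5-fold cross-validation with scikit-learn is shown below; the regressor, the mean absolute error metric, and the synthetic data are arbitrary placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in data; replace with your feature matrix and call counts.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = rng.poisson(lam=5, size=1000)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_errors = []
for train_idx, val_idx in kf.split(X):
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    fold_errors.append(mean_absolute_error(y[val_idx], model.predict(X[val_idx])))

print("average validation error over the folds:", np.mean(fold_errors))
```

Since the observations are time-stamped, a time-aware split (e.g., scikit-learn's TimeSeriesSplit) may be more appropriate than a fully random partition.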
Submission file
For the final evaluation of your model, you have to predict the number of calls that will be received for a number of different combinations of values of the following attributes: DATE (corresponding to half-hour slots) and ASS_ASSIGNMENT. More specifically, you must provide the predicted number of calls for each instance (row) contained in the submission.txt file [4]. Each row of the submission.txt file corresponds to a different combination of DATE (corresponding to half-hour slots) and ASS_ASSIGNMENT (Table 1 presents a snapshot of the submission.txt file; a short sketch of filling it in is given after the table). In the submission.txt example file, all the prediction values are set equal to zero. You must replace those values with your predicted ones. Do not change the format of the file (fields separated by tabs). The final evaluation of your model will be made based on the LinEx loss function (see Section 4.2 for a detailed description).
The data corresponding to the required dates (the dates listed in the submission.txt file) are omitted from the dataset. Moreover, the data in a 6-day window prior to the dates listed in the submission.txt file are also omitted, to ensure that you will not use them for the predictions in your submission.
[3] Wikipedia's article on cross-validation: http://en.wikipedia.org/wiki/Cross-validation_(statistics)
[4] Testing dataset: https://moodle.polytechnique.fr/pluginfile.php/59386/mod_assign/introattachment/0/submission.txt?forcedownload=1
DATE	ASS_ASSIGNMENT	Prediction
2012-01-03 00:00:00.000	CAT	0
2012-01-03 00:00:00.000	Téléphonie	0
2012-01-03 00:00:00.000	Tech. Inter	0
2012-01-03 00:00:00.000	Tech. Axa	0
2012-01-03 00:00:00.000	Services	0

Table 1: A snapshot of the submission.txt file.
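The sketch below fills in such a template while keeping the tab-separated layout; a constant value is written only so that the snippet runs on its own, and in practice it would be replaced by the model's predictions for each (DATE, ASS_ASSIGNMENT) row.

```python
import pandas as pd

# Read the template; fields are separated by tabs, as required.
sub = pd.read_csv("submission.txt", sep="\t")

# Replace the zero placeholders in the prediction column (the last column of
# the template). The constant below is a placeholder for model predictions.
sub[sub.columns[-1]] = 1.0

# Write the file back in the same tab-separated format, without an index.
sub.to_csv("submission.txt", sep="\t", index=False)
```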
Leaderboard Platform
The final evaluation of your model will be done using a private leaderboard platform, which is available at the following link: http://moodle.lix.polytechnique.fr/data_chalenge/. The platform will evaluate your predictions, and the evaluation score, as well as your position with respect to the rest of the users, will appear in the leaderboard. In order to make a new submission, you just need to log in to the platform (using the identifier of your team along with your password) and upload the submission.txt file. Your final score will be the best one that you have achieved. Finally, note that you can submit up to 10 entries per day. Please be careful with the submission process, as the submission counter resets 24 hours after your last submission.
LinEx loss function
The final evaluation is based on the LinEx loss

E(y, ŷ) = exp(α(ŷ − y)) − α(ŷ − y) − 1,

where y is the true number of calls, ŷ is the number predicted by your model, and α = −0.1, which gives a relatively higher penalty to underestimating the number of calls. The final loss is averaged over all examples. This penalty is illustrated in Figure 2. Having such a loss function should encourage the design of algorithms towards building models that do not underestimate the number of calls.
Figure 2: The LinEx loss E(y, ŷ) (for α = −0.1, −0.15, −0.05) compared with the MSE, plotted as a function of y − ŷ, where y is the true number of calls and ŷ is the corresponding predicted number of calls.
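For reference, a small helper computing an average LinEx loss of this form is sketched below; the exact sign convention is an assumption consistent with α = −0.1 penalising underestimation, so it should be checked against the official scorer.

```python
import numpy as np

def linex_loss(y_true, y_pred, alpha=-0.1):
    """Average LinEx loss; with alpha = -0.1, under-prediction is penalised
    exponentially while over-prediction is penalised only about linearly."""
    d = alpha * (np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    return float(np.mean(np.exp(d) - d - 1))

# Under-predicting by 10 calls costs noticeably more than over-predicting by 10.
print(linex_loss([100], [90]), linex_loss([100], [110]))
```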
5 Useful Python Libraries
In this section, we briefly discuss some useful tools that can be useful in the project and you are encour-
aged to use.
• For the preprocessing task, which also involves some initial data exploration, you may use the pandas Python library for data analysis [5].
• A very powerful machine learning library in Python is scikit-learn [6]. It can be used in the preprocessing step (e.g., for feature selection) and in the call forecasting task (a plethora of regression algorithms are implemented in scikit-learn). Recall that we have already used scikit-learn in the labs.
• Finally, you are always encouraged to propose and develop your own learning algorithms or use
the ones developed in the labs.
6 Deliverables

Your submission must include the following:

1. Your final submission file (submission.txt), which contains the estimated number of calls.
2. A 2-5 page report, in which you should describe the approach and the methods that you used in the project. Since this is a real data science task, we are interested in knowing how you dealt with each part of the pipeline: e.g., whether you created new features and why, which algorithms you used for the call forecasting task and why, their performance (accuracy and training time), approaches that did not work in the end but are still interesting to present, and, in general, whatever you think is interesting to report. Also, in the report, please provide the names and emails of the team members, and the identifier of your team (e.g., INF554).
3. A directory with the code of your implementation.
4. Create a .zip file named team_identifier.zip (the identifier of your team), containing the code and the report, and submit it to the Moodle platform (one submission per team).
5. Deadline: Friday, December 9, 23:59.
7 Oral presentation
Each team will give an oral presentation on Thursday, December 15. More details will be announced
later.
8 Project evaluation
Your final evaluation for the project will be based on:

1. the leaderboard score (according to the LinEx loss function, Section 4.2) of the proposed model,
[5] http://pandas.pydata.org/
[6] http://scikit-learn.org/
Appendix
Even though the specifics of the problem defined in this document are unique, the general task has been dealt with before. Therefore, we provide here a list of approaches (features and models) that were among the top ones in previous versions of this assignment.
• Autoregressive models (a short sketch of lag-based features is given below).
You are encouraged to explore these options but more importantly to explore solutions beyond them!
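To illustrate the autoregressive idea, the sketch below adds lag features: the call volume observed 7 and 14 days earlier for the same assignment and half-hour slot (shorter lags cannot be used for the withheld week, since the 6-day window before each required date is omitted from the data). The toy frame stands in for the aggregated training data built earlier.

```python
import pandas as pd

# Toy stand-in for the aggregated training frame: one row per half-hour slot
# and assignment, with the observed number of received calls.
calls = pd.DataFrame({
    "DATE": pd.date_range("2012-01-01", periods=15 * 48, freq="30min"),
    "ASS_ASSIGNMENT": "Téléphonie",
    "CSPL_RECEIVED_CALLS": range(15 * 48),
})

# Lag features: the call volume 7 and 14 days earlier for the same
# assignment and slot, merged back onto each row.
for days in (7, 14):
    shifted = calls[["DATE", "ASS_ASSIGNMENT", "CSPL_RECEIVED_CALLS"]].copy()
    shifted["DATE"] += pd.Timedelta(days=days)
    shifted = shifted.rename(columns={"CSPL_RECEIVED_CALLS": f"calls_lag_{days}d"})
    calls = calls.merge(shifted, on=["DATE", "ASS_ASSIGNMENT"], how="left")

print(calls.tail())
```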