
1 - Practical Guide For Kaggle Competitions

The document provides tips for participating in machine learning competitions. It recommends defining goals for participation, organizing ideas systematically, and sorting parameters by importance and understandability. It also suggests starting with simple solutions, debugging the full pipeline, and progressing from simple to complex models. Additional tips include using good code practices like commenting and version control, reusing code between training and testing, and reading papers for new ideas and domain knowledge. The document stresses keeping code clean and reproducible.


Practical Guide

Alexander Guschin
Practical guide: intro
Before you enter a competition

Define your goals. What can you get out of your participation?
1. To learn more about an interesting problem
2. To get acquainted with new software tools
3. To hunt for a medal
After you enter a competition:
Working with ideas

1. Organize ideas in some structure


2. Select the most important and promising ideas
3. Try to understand the reasons why something
does/doesn’t work
After you enter a competition:
Everything is a hyperparameter

Sort all parameters by these principles:


1. Importance
2. Feasibility
3. Understanding
Note: changing one parameter can affect the whole pipeline
Dmitry Altukhov
Data loading

• Do basic preprocessing and convert csv/txt files into


hdf5/npy for much faster loading
• Do not forget that by default data is stored in 64-bit arrays;
most of the time you can safely downcast it to 32 bits
• Large datasets can be processed in chunks
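The three loading tips above can be sketched as follows; the in-memory CSV and column names are made up for illustration:

```python
import io
import numpy as np
import pandas as pd

# Small in-memory CSV standing in for a real train.csv (illustration only).
csv_text = "a,b\n" + "\n".join(f"{i}.0,{i}" for i in range(10))

# 1) Downcast 64-bit columns to 32 bits: halves memory use, usually safe.
df = pd.read_csv(io.StringIO(csv_text))
for col in df.select_dtypes("float64"):
    df[col] = df[col].astype(np.float32)
for col in df.select_dtypes("int64"):
    df[col] = df[col].astype(np.int32)

# 2) Save to a binary format (npy/hdf5) for much faster reloading than CSV.
buf = io.BytesIO()
np.save(buf, df.to_numpy())

# 3) Process a large file in chunks instead of loading it all at once.
n_rows = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    n_rows += len(chunk)
```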
Performance evaluation

• Extensive validation is not always needed


• Start with the fastest models, such as LightGBM
Fast and dirty is always better

• Don’t pay too much attention to code quality


• Keep things simple: save only important things
• If you feel uncomfortable with the given computational
resources, rent a larger server
Mikhail Trofimov
Initial pipeline

• Start with a simple (or even primitive) solution


• Debug full pipeline
− From reading data to writing submission file
• “From simple to complex”
− I prefer to start with Random Forest rather than
Gradient Boosted Decision Trees
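A primitive solution that still exercises the full pipeline, from reading data to writing a submission file, can look like this; the file contents and column names are made up for illustration:

```python
import csv
import io

# Stand-ins for real competition files (contents are illustrative only).
train_csv = "id,feature,target\n1,0.5,10\n2,1.5,20\n3,2.5,30\n"
test_csv = "id,feature\n4,0.7\n5,1.9\n"

# The simplest possible model: predict the global mean of the target.
rows = list(csv.DictReader(io.StringIO(train_csv)))
mean_target = sum(float(r["target"]) for r in rows) / len(rows)

# Predict the constant for every test row and write the submission file.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["id", "target"])
for r in csv.DictReader(io.StringIO(test_csv)):
    writer.writerow([r["id"], mean_target])

submission = out.getvalue()
```

Once this end-to-end skeleton works, every later idea only swaps out the model part.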
Best Practices from Software Development

• Use good variable names


− If your code is hard to read, you will definitely have
problems sooner or later
• Keep your research reproducible
− Fix random seed
− Write down exactly how any features were generated
− Use Version Control Systems (VCS, for example, git)
• Reuse code
− It is especially important to use the same code for the train
and test stages
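The seed-fixing advice can be sketched as below; the split helper and seed value are illustrative, and frameworks like NumPy or PyTorch each need their own seed call (`np.random.seed`, `torch.manual_seed`, ...):

```python
import random

SEED = 42  # one fixed seed, written down once at the top

def make_split(n, seed=SEED):
    """Deterministic shuffle of row indices for a train/val split."""
    rng = random.Random(seed)  # local RNG: no hidden global state
    idx = list(range(n))
    rng.shuffle(idx)
    return idx

# Same seed -> identical split on every run, on every machine.
assert make_split(10) == make_split(10)
```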
Read papers

• This can get you ideas about ML-related things


− For example, how to optimize AUC
• A way to get familiar with the problem domain
− Especially useful for feature generation
Dmitry Ulyanov
My pipeline

• Read forums and examine kernels first


– There are always discussions happening!
• Start with EDA and a baseline
– To make sure the data is loaded correctly
– To check if validation is stable
• I add features in bulk
– At the start I create all the features I can think of
– I evaluate many features at once (not “add one and
evaluate”)
• Hyperparameter optimization
– First find the parameters that overfit the train dataset
– And then trim the model back
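The overfit-then-trim idea can be sketched with a decision tree; the data and parameter values here are made up for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for a real train set.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + 0.1 * rng.normal(size=200)

# Step 1: find settings with enough capacity to drive training error
# to ~zero -- this confirms the model and pipeline can fit the data.
overfit = DecisionTreeRegressor(max_depth=None, random_state=0).fit(X, y)

# Step 2: trim capacity (depth, leaves, regularization) until
# validation error -- not training error -- stops improving.
trimmed = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# overfit.score(X, y) is ~1.0 on train; the trimmed model scores lower
# on train but typically generalizes better.
```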
Code organization: keeping it clean

• Very important to have reproducible results!


– Keep important code clean
• Long execution history leads to mistakes

• Your notebooks can become a total mess


Code organization: keeping it clean

• One notebook per submission (and use git)

• Before creating a submission, restart the kernel


– Use “Restart and run all” button
Code organization: test/val

• Split train.csv into train and val parts that mirror the structure
of the original train.csv and test.csv
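A sketch of such a split is below; the column names and split point are made up for illustration (with time-based data you would split by time instead of by row position):

```python
import io
import pandas as pd

# In-memory stand-in for train.csv (columns are illustrative).
csv_text = "id,feature,target\n" + "\n".join(
    f"{i},{i * 0.1},{i % 2}" for i in range(10)
)
df = pd.read_csv(io.StringIO(csv_text))

# "train" plays the role of train.csv; "val" mimics test.csv
# (no target column), with the answers kept aside for local scoring.
train = df.iloc[:8].copy()
val = df.iloc[8:].drop(columns="target")
val_answers = df.iloc[8:]["target"]
```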
Code organization: test/val

• When validating, set a single switch constant at the top of the notebook

• To retrain models on the whole dataset and get predictions for the
test set, just change that constant
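A minimal sketch of that switch; the variable names and sizes are illustrative:

```python
# One constant at the top of the notebook controls whether you
# validate locally or retrain on everything for a real submission.
VALIDATION = True  # flip to False before making a submission

n_total = 100  # stand-in for the full train set size
if VALIDATION:
    train_size = 80  # hold out the rest for local validation
else:
    train_size = n_total  # retrain on the whole dataset

val_size = n_total - train_size
```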
Code organization: macros

I use macros for frequently used code


Code organization: custom library

• I use a library with frequently used operations implemented


– Out-of-fold predictions
– Averaging
– I can specify a classifier by its name
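A tiny sketch of such a helper library; the registry, the trivial mean-predictor "model", and all names are made up for illustration (real code would plug in sklearn estimators or similar):

```python
import numpy as np

# Name -> model factory, so a classifier can be picked by its name.
MODELS = {"mean": lambda y_tr: float(np.mean(y_tr))}

def oof_predictions(y, n_folds=5, model_name="mean"):
    """Out-of-fold predictions: each row is predicted by a model
    that never saw it during training."""
    y = np.asarray(y, dtype=float)
    oof = np.empty_like(y)
    folds = np.array_split(np.arange(len(y)), n_folds)
    for fold_idx in folds:
        mask = np.ones(len(y), dtype=bool)
        mask[fold_idx] = False
        model = MODELS[model_name](y[mask])  # "train" on other folds
        oof[fold_idx] = model                # predict the held-out fold
    return oof

def average(*prediction_arrays):
    """Simple blend: element-wise mean of several prediction vectors."""
    return np.mean(prediction_arrays, axis=0)
```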
