Framework To Approach A Kaggle Problem: 1. Importing The Training / Test Population

This document provides a framework and tips for approaching problems on the Kaggle machine learning competition platform. It notes that competing with experienced data scientists can be challenging, as some have automated tools for data exploration. The tips include: working hard; teaming up initially; focusing on feature engineering; researching the domain and problem; making simple initial submissions; being open to starting from scratch; and experimenting with algorithms and ensembles. It then outlines a framework involving importing training/test data, sampling the population, choosing attributes, and comparing models. The goal is to help readers get started competing on Kaggle to enter the new era of analytics and machine learning.

Uploaded by

Govind Naik


Competing with the best data scientists can be challenging, especially when some of them
have been doing so for years. I know a few people who have well-automated scripts to perform
most of the data exploration! These people are already deciding on the best algorithms while
the rest of the world is still figuring out the nuances of the data.

Here are a few things you need to keep in mind before starting a problem on Kaggle:

1. Like all good things in life, winning a Kaggle competition is all about hard work. Get
ready to devote long hours to pondering the same problem for days, weeks or months.
2. Team up with a good teammate for your initial competitions. A good teammate is
someone with a similar bent of mind and thought process, but with complementary
skills in tools, domain or work experience.
3. Be ready to do a lot of feature engineering; that is what differentiates the best from
the rest.
4. Do some preliminary research on the domain and the problem. There might be good
research papers with unconventional but effective solutions available on the internet.
5. Make simple initial solutions and submit them to get a sense of how much of a gap you
need to cover.
6. Always be open to starting from scratch.
7. Experiment with different algorithms and be prepared to build ensembles.

The list is not exhaustive, but covers a significant portion. Now let’s look at a simple framework
to approach a Kaggle problem. Participants are challenged at each step of this framework by
Kaggle.

Framework to approach a Kaggle Problem

Next, we will take you through a step-by-step process for taking a simple shot at a Kaggle
problem statement. The process generally involves the following pieces:

1. Importing the training / test population: Kaggle challenges you to import the training /
test dataset. In general, this is not very straightforward. For example, in the following problems,
the training data needs to be massaged well before we start working on the model.

Here are two problem statements where you need to extract data from multiple Excel files:

a. Driver Telematics Analysis

b. BCI Challenge @ NER 2015
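
As a rough illustration of pulling many files into one table, here is a minimal sketch using only the Python standard library. It assumes the per-subject files have been saved as CSV and share the same header; the function name `load_csv_folder` and the `source_file` column are hypothetical and not part of either competition's data. For genuine Excel workbooks a reader such as `pandas.read_excel` would be needed instead.

```python
import csv
from pathlib import Path


def load_csv_folder(folder, pattern="*.csv"):
    """Read every CSV file matching `pattern` in `folder` into one list of rows.

    Each row is a dict keyed by column name, plus a hypothetical
    "source_file" key recording which file the row came from.
    """
    rows = []
    # Sort the paths so the combined order is reproducible across runs.
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = path.name
                rows.append(row)
    return rows
```

Keeping the file name on every row makes it possible to trace a record back to its subject or session later, which is often needed when the file name itself carries information.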

2. Sampling the population: In general the population size is huge, and it might not be the
best idea to train on the entire population. For example, in “Sentiment Analysis on Movie
Reviews”, using the enormous number of phrases to build an initial dictionary might be a bad idea.
The sample can be chosen randomly or in a stratified way.
3. Choosing the right attributes: This is the most critical step, and it distinguishes different
submissions on Kaggle. In general we use principal component analysis, factor analysis,
Information Value and Weight of Evidence for this part, but there is no set procedure.
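
To make Weight of Evidence (WoE) and Information Value (IV) concrete, here is a small sketch for one categorical attribute and a 0/1 target. It uses one common convention, WoE = ln(share of non-events / share of events) per category, with IV as the sum of (share of non-events - share of events) × WoE. The function name `woe_iv` and the `smoothing` parameter are illustrative assumptions, not something the framework prescribes.

```python
import math
from collections import Counter


def woe_iv(categories, targets, smoothing=0.5):
    """Return ({category: WoE}, IV) for a categorical feature and a 0/1 target.

    `smoothing` is added to every count so that an empty cell does not
    cause a division by zero or log(0); it is an illustrative choice.
    """
    events = Counter()
    non_events = Counter()
    for category, target in zip(categories, targets):
        if target == 1:
            events[category] += 1
        else:
            non_events[category] += 1

    levels = sorted(set(events) | set(non_events))
    total_events = sum(events.values()) + smoothing * len(levels)
    total_non_events = sum(non_events.values()) + smoothing * len(levels)

    woe = {}
    iv = 0.0
    for level in levels:
        share_events = (events[level] + smoothing) / total_events
        share_non_events = (non_events[level] + smoothing) / total_non_events
        woe[level] = math.log(share_non_events / share_events)
        iv += (share_non_events - share_events) * woe[level]
    return woe, iv
```

An attribute whose categories all have the same event rate gets an IV of zero, while a larger IV suggests the attribute separates the two target classes more strongly. That makes IV usable as a first-pass ranking of candidate attributes.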

4. Comparing different ensemble / simple models: Once we have the input and the target
variables, we start building different models. The choice of model depends on the evaluation
metric, the type of input / target variables, the distribution of the population across target values, etc.
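
The stratified sampling mentioned in step 2 can be sketched as follows. This is a minimal standard-library version under stated assumptions: the function name `stratified_sample`, the `label_of` callback and the rule of keeping at least one row per group are all illustrative choices rather than a fixed recipe.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_of, fraction, seed=0):
    """Sample roughly `fraction` of `rows` from each label group.

    `label_of` maps a row to its stratum (for example the target value).
    At least one row is kept from every non-empty stratum.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)

    sample = []
    for label in sorted(groups):
        members = groups[label]
        keep = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, keep))
    return sample
```

Compared with a purely random draw, this keeps the share of each target value in the sample close to its share in the full population, which matters when one class is rare.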

In this article we will start with the first step leveraging the BCI challenge. We will start with
the problem statement and then define the scope of this article. After reading this article, I
believe you can start competing on Kaggle and start your journey to discover the new era of
Analytics & Machine Learning.
