
Self-Learning Data Science

This document provides a 31-day curriculum for learning data science. It is split into 4 parts that cover data preparation, exploratory data analysis, creating problem statements, and building machine learning models. Each part provides explanatory articles and hands-on tutorials focusing on key concepts and practical applications. The goal is to enable self-learning of data science through curated online resources and implementing projects using Python.


SELF LEARNING DATA SCIENCE IN 31 DAYS

SPECIAL EDITION

THE RESEARCH NEST
Empowering Humanity With Exclusive Insights

COMPILED BY
Aditya Vivek Thota

Designed in Canva

PREFACE
With vast amounts of data being generated in recent times, there is an ever-increasing need for professionals who can make valuable sense of it: the data scientists. Today, with humongous amounts of resources available online, self-learning is no longer out of reach.

This unique booklet is intended to enable individuals to do just that by providing the best curated resources to learn and implement practical projects.

THE PROJECT

The booklet is split into 4 major parts, each emphasizing certain fundamental aspects of data science. It further focuses on practicing data science using Python.

NOTE FROM THE EDITOR

Please mention the credits when you share this booklet elsewhere. For any feedback or errata, you can mail us at [email protected].

WHAT TO EXPECT?

01 EXPLANATORY ARTICLES

02 HANDS-ON TUTORIALS

03 PRACTICAL INSIGHTS

DATA PREPARATION

PART ONE
"Data really powers everything that we do"

- JEFF WEINER, CEO, LINKEDIN


FINDING YOUR DATA

The first step is to identify the domain you want to work in and find a relevant dataset. Data science starts with data collection, after all. Choose a dataset in your domain of interest, download it, and get ready for some action!

Below are some links where you can find public datasets in different sectors:

REFERENCE LINKS TO OBTAIN DATASETS

01 KAGGLE

02 UC IRVINE MACHINE LEARNING REPOSITORY

03 A COMPILATION OF ALL PUBLIC DATASETS ON GITHUB

WHAT CAN YOU DO WITH YOUR DATASET?

Once you have your dataset ready, there are broadly three kinds of applications you can build with it (though you are not limited to these): prediction, classification, or recommendation.

Apart from that, you can try to find hidden patterns in the data. Have a good look at your dataset and the variables in it. Identify what kind of analysis it can be used for and finalize the problem to tackle.

Is it classification, regression, or clustering based? If your dataset does not clearly fit any of these categories, we recommend that, as a beginner, you switch to a more suitable dataset.
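
As a rough illustration of the question above, the sketch below guesses the problem type from the column you want to predict. The function name `suggest_problem_type` and the `max_classes` threshold are assumptions made for this example, not part of any library, and the rule is only a heuristic: a numeric target with many distinct values suggests regression, a target with a few distinct labels suggests classification, and no target at all suggests clustering.

```python
def suggest_problem_type(target_values, max_classes=10):
    """Heuristic guess at the problem type implied by a target column.

    target_values: list of target values, or None if the dataset has no
    labelled target. max_classes is an arbitrary, assumed threshold.
    """
    if target_values is None:
        # No labelled target: look for hidden patterns instead.
        return "clustering"
    distinct = set(target_values)
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct
    )
    if all_numeric and len(distinct) > max_classes:
        # Many distinct numeric values: likely a continuous quantity.
        return "regression"
    # Few distinct values, or non-numeric labels: likely categories.
    return "classification"


# Made-up example columns.
prices = [100.0 + 7.5 * i for i in range(20)]
species = ["setosa", "versicolor", "setosa", "virginica"]

print(suggest_problem_type(prices))
print(suggest_problem_type(species))
print(suggest_problem_type(None))
```

Treat the result as a starting point only; the final choice depends on what you actually want to learn from the data.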

SUBJECTS AND PRE-REQUISITES

Here is a comprehensive compilation of learning resources you may need on your journey en route to becoming a data scientist. You may not need to know all of them in detail to get started, but having a general idea of these topics can prove to be extremely useful.

LINKS TO QUICKLY LEARN SOME KEY CONCEPTS

01 FIVE BASIC STATISTICS CONCEPTS DATA SCIENTISTS NEED TO KNOW

02 BASICS OF PROBABILITY FOR DATA SCIENCE

03 A COMPREHENSIVE GUIDE TO LINEAR ALGEBRA FOR DATA SCIENTISTS

04 CALCULUS IN DATA SCIENCE

DATA PRE-PROCESSING

Before you can start analyzing the dataset, you need to make some modifications so that it is easier to work with in code. Here are some standard approaches. Try implementing these techniques where they are relevant to your chosen dataset.

TUTORIALS OF VARIOUS PRE-PROCESSING APPROACHES

01 HANDLING MISSING VALUES

02 DEALING WITH CATEGORICAL DATA

03 NORMALIZATION OF DATA

04 DATA PRE-PROCESSING SUMMARY
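
The first three approaches listed above can be sketched in a few lines. This is a minimal illustration using only the Python standard library and a made-up four-row dataset; the helper names (`impute_mean`, `one_hot`, `min_max_normalize`) are hypothetical, and mean imputation, one-hot encoding, and min-max scaling are just one common choice for each step. The linked tutorials cover alternatives.

```python
from statistics import mean

# Made-up rows; None marks a missing value.
rows = [
    {"age": 25, "city": "Pune", "income": 30000.0},
    {"age": None, "city": "Delhi", "income": 52000.0},
    {"age": 41, "city": "Pune", "income": None},
    {"age": 33, "city": "Mumbai", "income": 45000.0},
]


def impute_mean(rows, column):
    """Replace missing (None) entries in a numeric column with the column mean."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new_row[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(new_row)
    return encoded


def min_max_normalize(rows, column):
    """Rescale a numeric column to the range [0, 1]."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    # If every value is identical, map the whole column to 0.0.
    return [{**r, column: (r[column] - lo) / span if span else 0.0} for r in rows]


clean = impute_mean(rows, "age")
clean = impute_mean(clean, "income")
clean = one_hot(clean, "city")
clean = min_max_normalize(clean, "age")
clean = min_max_normalize(clean, "income")

for row in clean:
    print(row)
```

Each helper returns new rows instead of modifying the input, so the steps can be reordered or skipped depending on what your dataset needs.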


EXPLORATORY DATA ANALYSIS

PART TWO
“In God we trust. All others must bring data.”

- W. EDWARDS DEMING, STATISTICIAN

PERFORMING EDA

Once we have a detailed and clean dataset in hand, we can perform various statistical analyses and create visualizations to better understand our data.

Wikipedia has an entire page dedicated to EDA. You can refer to it for an overview of what EDA is all about.

LINKS TO SOME USEFUL RESOURCES

01 COMPREHENSIVE GUIDE TO DATA EXPLORATION

02 VARIOUS EDA TECHNIQUES

03 HANDS-ON KAGGLE TUTORIAL FOR EDA USING PYTHON

04 WIKIPEDIA PAGE ON EDA


There are several libraries available in Python for performing EDA. You can easily find one that suits your requirements and proceed further.

Once the data is thoroughly analyzed, we can proceed to the next step: building predictive models using different techniques and ultimately formulating a tangible application with practical significance.
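
To give a feel for the statistical side of EDA, here is a minimal sketch of two typical first steps: summary statistics for one variable and the Pearson correlation between two variables. It uses only the standard library and made-up numbers; the helper names `summarize` and `pearson` are assumptions for this example. Dedicated EDA libraries offer the same ideas with far more convenience, plus plotting.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Basic descriptive statistics for one numeric variable."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Made-up data: hours studied and exam score for five students.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 64.0, 70.0]

print(summarize(score))
print(round(pearson(hours, score), 3))
```

A correlation close to 1 or -1 hints at a strong linear relationship worth modelling; a value near 0 suggests looking for other variables or non-linear patterns.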

TO LEARN MORE ABOUT THE STATISTICS BEHIND HYPOTHESIS TESTING, VISIT THESE LINKS:

01 LECTURE SLIDES ON HYPOTHESIS TESTING

02 YOUR GUIDE TO MASTER HYPOTHESIS TESTING IN STATISTICS
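
As one concrete example of a hypothesis test, the sketch below runs a two-sided permutation test for a difference in means between two groups. This particular test is chosen here for illustration because it needs nothing beyond the standard library; the linked resources may focus on other tests such as t-tests. The function name `permutation_test`, its parameters, and the data are assumptions for this example.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Approximate two-sided p-value for a difference in group means.

    Null hypothesis: both groups come from the same distribution, so
    shuffling the group labels should not change the difference much.
    """
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (extreme + 1) / (n_permutations + 1)


# Made-up measurements for two groups.
group_a = [12.1, 11.8, 12.5, 12.9, 12.3, 12.7]
group_b = [10.2, 10.9, 10.5, 11.1, 10.4, 10.8]

p_value = permutation_test(group_a, group_b)
print(f"approximate p-value: {p_value:.4f}")
```

A small p-value means a difference this large would rarely arise from random relabelling alone, which is evidence against the null hypothesis.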

CREATING PROBLEM STATEMENTS

PART THREE
"Not everything that can be counted counts, and not everything that counts can be counted."

- ALBERT EINSTEIN, PHYSICIST


You now have a clean dataset, and your exploratory data analysis should give a clear picture of what you can do with it. Choosing the right model for the situation can be challenging for a beginner.

Based on your understanding, you can settle on 2-3 methods and get ready to build your model.

Here are two useful articles exploring basic machine learning algorithms for data science and the scenarios in which they are preferred.

01 TOP 10 MACHINE LEARNING ALGORITHMS

02 CHOOSING THE RIGHT ALGORITHM FOR YOUR DATASET

BUILDING YOUR MODELS

PART FOUR
"The goal is to turn data into information, and information into insight."

- CARLY FIORINA, FORMER CEO, HP

ESSENTIAL MACHINE LEARNING

With the dataset prepared and the problem statements formulated, the stage is set to build and train your models using various ML methods.

Here are some must-read resources for any aspiring data scientist, summarizing almost everything you need to know.

USEFUL REFERENCE LINKS

01 HOW TO APPROACH (ALMOST) ANY MACHINE LEARNING PROBLEM?

02 IMPLEMENTATION OF DIFFERENT MACHINE LEARNING ALGORITHMS

03 THE ULTIMATE KAGGLE TUTORIAL FOR DATA SCIENCE

04 THE ULTIMATE KAGGLE TUTORIAL FOR MACHINE LEARNING
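
Whatever method you pick, building a model usually follows the same loop: hold out part of the data, fit on the rest, and measure the error on the held-out part. The sketch below shows that loop with a one-variable linear regression written from scratch using only the standard library. The data is made up, and the helper names (`train_test_split`, `fit_line`, `mean_absolute_error`) are defined here for illustration; they are not calls into any ML library, which you would normally use in practice.

```python
import random


def train_test_split(xs, ys, test_fraction=0.25, seed=0):
    """Shuffle the indices and hold out a fraction of the data for testing."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(xs) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [xs[i] for i in train_idx]
    x_test = [xs[i] for i in test_idx]
    y_train = [ys[i] for i in train_idx]
    y_test = [ys[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up data: house area vs. price, roughly price = 3 * area + 50 plus noise.
areas = [50, 60, 70, 80, 90, 100, 110, 120]
prices = [202, 228, 262, 291, 318, 352, 379, 411]

x_train, x_test, y_train, y_test = train_test_split(areas, prices)
slope, intercept = fit_line(x_train, y_train)
predictions = [slope * x + intercept for x in x_test]
mae = mean_absolute_error(y_test, predictions)

print(f"slope={slope:.2f} intercept={intercept:.2f} test MAE={mae:.2f}")
```

Evaluating on data the model has not seen is what tells you whether it generalizes; the same split-fit-evaluate pattern applies when you swap in a more capable algorithm.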

ADDITIONAL HANDS-ON TUTORIALS

The following tutorials are for those interested in further exploring the practical applications of machine learning.

01 PREDICTING THE PRICE OF A HOUSE

02 SIGN LANGUAGE RECOGNITION USING HAND GESTURES

03 TEXT EMOTION DETECTION USING NATURAL LANGUAGE PROCESSING

END NOTES

This compilation is an effort by The Research Nest and is associated with the e-learning social media campaign, The December Data Festival, 2018.

We would love to hear your feedback and suggestions for improvement. Do drop us a mail at [email protected].

We hope you found this useful. To support us and stay updated on more such initiatives, please follow The Research Nest on its social media handles.

