7 More Steps To Mastering Machine Learning With Python
KDnuggets Home » News » 2017 » Mar » Tutorials, Overviews » 7 More Steps to Mastering
Machine Learning With Python ( 17:n09 )
Tags: 7 Steps, Classification, Clustering, Deep Learning, Ensemble Methods, Gradient Boosting,
Machine Learning, Python, scikit-learn, Sebastian Raschka
This post is a follow-up to last year's introductory Python machine learning post, which includes a
series of tutorials for extending your knowledge beyond the original.
So, you have been thinking about picking up machine learning, but given the confusing state of the
web, you don't know where to begin? Or maybe you have finished the first 7 steps and are looking for
some follow-up material beyond the introductory?
Machine learning algorithms.
This post is the second installment of the 7 Steps to Mastering Machine Learning in Python series
(since there are 2 parts, I guess it now qualifies as a series). If you have started with the original post,
you should already be satisfactorily up to speed, skill-wise. If not, you may want to review that post
first, which may take some time, depending on your current level of understanding; however, I assure
you that doing so will be worth your effort.
After a quick review -- and a few options for a fresh perspective -- this post will focus more
categorically on several sets of related machine learning tasks. Since we can safely skip the
foundational modules this time around -- Python basics, machine learning basics, etc. -- we will jump
right into the various machine learning algorithms. We can also categorize our tutorials better along
functional lines this time.
I will, once again, state that the material contained herein is all freely available on the web, and all
rights and recognition for the works belong to their original authors. If something has not been
properly attributed, please feel free to let me know.
Just to review, these are the steps covered in the original post:
If, however, you are really green, I would start with the following, covering the absolute basics:
If you are looking for some alternative or complementary approaches to learning the basics of
machine learning, I have recently been enjoying Shai Ben-David's video lectures and freely available
textbook written with Shai Shalev-Shwartz. Find them both here:
Remember, the introductory material does not all need to be digested before moving forward with the
rest of the steps (in either this post or the original). Video lectures, texts, and other resources can be
consulted when implementing models using the relevant machine learning algorithms, or when the
applicable concepts come up in practice in subsequent steps. Use your judgment.
We begin with the new material by first strengthening our classification know-how and introducing a
few additional algorithms. While part 1 of our post covered decision trees, support vector machines,
and logistic regression -- as well as the ensemble classifier Random Forests -- we will add k-nearest
neighbors, the Naive Bayes classifier, and a multilayer perceptron into the mix.
Scikit-learn classifiers.
k-nearest neighbors (kNN) is a simple classifier and an example of a lazy learner, in which all
computation occurs at classification time (as opposed to occurring during a training step ahead of
time). kNN is non-parametric, and functions by comparing a data instance with the k closest instances
when making decisions about how it should be classified.
K-Nearest Neighbor Classification Using Python
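As an illustration (my own minimal sketch, not taken from the linked tutorial), here is kNN with scikit-learn's `KNeighborsClassifier` on the iris dataset; the dataset, split, and `k=5` are arbitrary choices for demonstration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small benchmark dataset and hold out a test split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# A "lazy" learner: fit() essentially just stores the training data;
# the real work (finding the k closest instances) happens at predict time
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```

Try varying `n_neighbors` to see the bias/variance trade-off: a very small k overfits to local noise, while a very large k blurs class boundaries.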
Naive Bayes is a classifier based on Bayes' Theorem. It assumes that there is independence among
features, and that the presence of any particular feature in one class is not related to any other feature's
presence in the same class.
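To make the independence assumption concrete, here is a hedged sketch using scikit-learn's `GaussianNB` (which models each feature within a class as an independent one-dimensional Gaussian); the iris dataset and split are again just illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# The "naive" part: each feature is treated as conditionally independent
# of every other feature, given the class label
nb = GaussianNB()
nb.fit(X_train, y_train)
print("test accuracy:", nb.score(X_test, y_test))
```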
The multilayer perceptron (MLP) is a simple feedforward neural network, consisting of multiple
layers of nodes, where each layer is fully connected to the layer that comes after it. The MLP
classifier was introduced in Scikit-learn version 0.18.
First read an overview of the MLP classifier from the Scikit-learn documentation, and then practice
implementation with a tutorial.
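A minimal sketch of the `MLPClassifier` is below (my own example, not the documentation's); note that feature scaling matters for neural networks, and the single 32-unit hidden layer is an arbitrary choice:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier  # requires scikit-learn >= 0.18
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Standardize features first -- gradient-based training is sensitive to scale
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=1))
mlp.fit(X_train, y_train)
print("test accuracy:", mlp.score(X_test, y_test))
```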
We now move on to clustering, a form of unsupervised learning. In the first post we covered the k-
means algorithm; we will introduce DBSCAN and Expectation-maximization (EM) herein.
First off, read these introductory posts; the first is a quick comparison of k-means and EM clustering
techniques, a nice segue into new forms of clustering, and the second is an overview of clustering
techniques available in Scikit-learn:
First read a tutorial on the EM algorithm. Next, have a look at the relevant Scikit-learn
documentation. Finally, follow a tutorial and implement EM clustering yourself with Python.
If "Gaussian mixture models" is confusing at first glance, this relevant section from the Scikit-learn
documentation should alleviate any unnecessary worries:
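Since Scikit-learn's `GaussianMixture` fits its parameters via the EM algorithm, a short sketch can tie the two ideas together; the synthetic blobs and three components below are my own illustrative choices:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Three well-separated Gaussian blobs as toy data
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=7)

# GaussianMixture estimates component means, covariances, and weights
# by iterating EM's expectation and maximization steps until convergence
gmm = GaussianMixture(n_components=3, random_state=7)
labels = gmm.fit_predict(X)

print("converged:", gmm.converged_)
print("cluster sizes:", np.bincount(labels))
```

Unlike k-means, the fitted model gives soft assignments: `gmm.predict_proba(X)` returns per-component membership probabilities rather than a single hard label.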
First read and follow an example implementation of DBSCAN from Scikit-learn's documentation,
and then follow a concise tutorial:
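For a quick hands-on feel (a sketch of my own, separate from the linked tutorial), the classic two-moons dataset shows where density-based clustering shines; the `eps` and `min_samples` values are illustrative and would need tuning on real data:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-circles -- a shape k-means handles poorly
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps: neighborhood radius; min_samples: points needed to form a dense core
db = DBSCAN(eps=0.2, min_samples=5)
labels = db.fit_predict(X)

# DBSCAN labels noise points as -1, so exclude that "cluster" when counting
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters)
```

Note that DBSCAN discovers the number of clusters itself, unlike k-means or a Gaussian mixture, where the count must be specified up front.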