How I'd Learn Machine Learning (If I Could Start Over) - by Egor Howell - Jan, 2024 - Towards Data Science



How I’d Learn Machine Learning (If I Could Start Over)
A full breakdown of how you can learn machine learning effectively this year

Egor Howell


Published in Towards Data Science · 9 min read · Jan 2024


Image by author.

I have been working as a Data Scientist for over two years, and during that time machine learning (ML) is what I have mainly studied. To me, it’s probably the most fascinating part of the job.
ML is a BIG space, and there is so much to learn and understand. However, taking it one step at a time makes the whole process less daunting and much easier to handle.

In this article, I want to go over the steps I would take if I had to learn ML
from scratch again. Let’s get into it!

How I’d Learn Machine Learning (If I Could Start Over)

Supplemental video.

Maths
Machine learning revolves around algorithms, which are essentially a series
of mathematical operations. These algorithms can be implemented through
various methods and in numerous programming languages, yet their
underlying mathematical principles are the same.

A frequent argument is that you don’t need to know maths for machine learning because most modern-day libraries and packages abstract away the theory behind the algorithms.

However, I would argue that if you want to become a top-level Machine Learning Engineer or Data Scientist, you need to know at least the basics of linear algebra, calculus, and statistics.

There is, of course, more maths to learn, but it’s best to start with the basics; you can always enrich your knowledge later on.

You don’t need to understand all these concepts to a master’s degree level, but you should be able to answer questions like: what is a derivative, how do you multiply two matrices together, and what is maximum likelihood estimation?

That list I just wrote is the bedrock of nearly every machine learning
algorithm, so having this solid foundation will set you up for success in the
long run.
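To make those three questions concrete, here is a minimal sketch in Python with NumPy (the helper name `numerical_derivative` and the example numbers are illustrative assumptions): a central-difference estimate of a derivative, a small matrix product, and maximum likelihood estimates for a normal distribution, which work out to the sample mean and the variance computed with a 1/n denominator.

```python
import numpy as np

def numerical_derivative(f, x, h=1e-5):
    """Approximate f'(x) with a central difference (hypothetical helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# The derivative of x**2 at x = 3 is 2 * 3 = 6.
slope = numerical_derivative(lambda x: x ** 2, 3.0)

# Matrix multiplication: entry (i, j) is the dot product of row i of A with
# column j of B, so a (2 x 3) matrix times a (3 x 2) matrix gives a (2 x 2).
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])
C = A @ B

# Maximum likelihood estimation for a normal distribution: the parameters that
# maximize the likelihood of the data are the sample mean and the variance
# computed with 1/n (not 1/(n - 1)).
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=5.0, scale=2.0, size=10_000)
mu_mle = data.mean()
var_mle = np.mean((data - mu_mle) ** 2)

print(round(slope, 4))  # 6.0
print(C)
print(mu_mle, var_mle)  # close to the true values 5 and 4
```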

Some of the key things I recommend you learn are:

Multivariable calculus

Matrices and their operations

Eigenvectors and eigenvalues

Probability distributions

Statistical uncertainty (confidence intervals, prediction intervals, etc.)
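Two items from the list above can be checked numerically in a few lines. This is a minimal NumPy sketch with made-up inputs: it verifies the defining property of an eigenvector, A v = λ v, for a small symmetric matrix, and computes an approximate 95% confidence interval for a mean. The 1.96 multiplier assumes a large sample and the normal approximation; for small samples, a t-distribution quantile would be more appropriate.

```python
import numpy as np

# Eigen-decomposition of a symmetric 2 x 2 matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eigh(A)  # eigh is meant for symmetric matrices

# Each column v of `eigenvectors` satisfies A @ v == lambda * v.
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)

# Approximate 95% confidence interval for the mean of a sample.
rng = np.random.default_rng(seed=1)
sample = rng.normal(loc=10.0, scale=3.0, size=500)
mean = sample.mean()
std_error = sample.std(ddof=1) / np.sqrt(len(sample))
ci_low, ci_high = mean - 1.96 * std_error, mean + 1.96 * std_error

print(eigenvalues)  # approximately [1, 3]
print(f"95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```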

Now, there are numerous courses out there that you can take to learn all the required maths. For a thorough introduction, I recommend the freeCodeCamp videos on Linear Algebra, Calculus, and Statistics.

You can also use websites such as Khan Academy and Brilliant which have
great resources on these topics. They also have a wide range of other
domains, so feel free to explore!

Khan Academy | Free Online Courses, Lessons & Practice
Learn for free about math, art, computer programming, economics, physics, chemistry, biology, medicine, finance…
www.khanacademy.org

Brilliant | Learn interactively
Brilliant - Build quantitative skills in math, science, and computer science with hands-on, interactive lessons.
brilliant.org

My main advice is to find one course, complete it, and move on. You can always come back later if there are gaps in your knowledge, or even just use Google!

Python
Python is the gold standard and the go-to programming language for
machine learning.

Beginners often get caught up in the so-called “best way” to learn Python. In reality, any introductory course will suffice, as they all teach the same things.

The ones I recommend are tutorialspoint, w3schools, or freeCodeCamp. I have used all of these at some point, and they are really useful, particularly for someone completely new to the language.

The main things you want to learn are:

Native data structures (dictionaries, lists, sets, and tuples)

For and while loops

If-else conditional statements

Functions and classes

Some basic maths functions
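As a rough self-check, the sketch below touches each item in the list above using only the standard library. The variable names, values, and the `Point` class are illustrative, not taken from any of the courses mentioned.

```python
import math

# Native data structures
scores = [88, 92, 79]                    # list: ordered, mutable
person = {"name": "Ada", "role": "DS"}   # dictionary: key-value pairs
unique_tags = {"ml", "python", "ml"}     # set: the duplicate "ml" is dropped
point = (3, 4)                           # tuple: ordered, immutable

# For loop with an if-else conditional
labels = []
for score in scores:
    if score >= 80:
        labels.append("pass")
    else:
        labels.append("fail")

# While loop: halve a number until it drops below 1, counting the steps
value, steps = 100.0, 0
while value >= 1:
    value /= 2
    steps += 1

# A function and a class, using a basic maths function
def euclidean_norm(vector):
    """Length of a vector, e.g. (3, 4) -> 5.0."""
    return math.sqrt(sum(x ** 2 for x in vector))

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def distance_from_origin(self):
        return euclidean_norm((self.x, self.y))

print(labels)                                # ['pass', 'pass', 'fail']
print(len(unique_tags))                      # 2
print(steps)                                 # 7
print(Point(*point).distance_from_origin())  # 5.0
```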


Python Tutorial
W3Schools offers free online tutorials, references and exercises in
all the major languages of the web. Covering…
www.w3schools.com

My most important advice when taking any introductory Python course is to code alongside the course. You need hands-on practice to let the key concepts sink in, so make sure you do the exercises.

Machine Learning Libraries


Once you have the basic Python skills, it’s time to learn some of the more specific data science and machine learning packages. The ones I recommend are:

NumPy — This library is designed for scientific computing, offering many mathematical functions and matrix support. Its core is written in C, so its computations are optimized, which is particularly beneficial when handling large models and big data. As always, I recommend the freeCodeCamp course.
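A minimal sketch of why the compiled core matters, assuming NumPy is installed: the same sum of squares computed with a pure-Python loop and with a vectorized NumPy call, plus a small linear system solved with `np.linalg.solve` to show the matrix support. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
x = rng.random(100_000)

def sum_of_squares_loop(values):
    """Pure Python: the interpreter processes one element at a time."""
    total = 0.0
    for v in values:
        total += v * v
    return total

def sum_of_squares_numpy(values):
    """Vectorized: the loop runs inside NumPy's compiled code."""
    return float(np.dot(values, values))

loop_result = sum_of_squares_loop(x)
numpy_result = sum_of_squares_numpy(x)

# Matrix support: solve the linear system M @ w = b.
M = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
w = np.linalg.solve(M, b)  # [2., 3.] because 3*2 + 3 = 9 and 2 + 2*3 = 8
```

If you time the two functions (for example with the standard `timeit` module), the vectorized version is usually far faster; the exact gap depends on your hardware and array size.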

Pandas — This is the go-to library for loading, manipulating, and working with data in Python. It is great for almost any data analysis task and is easy to use. For this one, I recommend the freeCodeCamp pandas crash course.
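A minimal pandas sketch of the load, manipulate, and analyze loop. The small inline table stands in for a real file (in practice you would typically load one with `pd.read_csv`), and the column names are hypothetical.

```python
import pandas as pd

# A tiny dataset; in practice this would usually come from a file via pd.read_csv.
df = pd.DataFrame({
    "city": ["London", "Paris", "London", "Berlin", "Paris"],
    "temperature_c": [14.0, 17.5, 12.0, 15.5, None],
})

# Manipulate: fill the missing value with the column mean, then add a column.
df["temperature_c"] = df["temperature_c"].fillna(df["temperature_c"].mean())
df["temperature_f"] = df["temperature_c"] * 9 / 5 + 32

# Analyze: average temperature per city, highest first.
summary = (
    df.groupby("city")["temperature_c"]
      .mean()
      .sort_values(ascending=False)
)
print(summary)
```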

Matplotlib — As a Data Scientist, you will need to visualize your data or results. Matplotlib is the main visualization package in Python due to its wide range of abilities. Again, I recommend the freeCodeCamp course.
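A minimal Matplotlib sketch of a typical first figure. It uses the non-interactive `Agg` backend so it also runs on a machine without a display, and saves to a file whose name is an illustrative assumption; in a Jupyter notebook you would usually just display the figure inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: works without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)", linestyle="--")
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.set_title("A first Matplotlib figure")
ax.legend()
fig.tight_layout()
fig.savefig("first_plot.png", dpi=100)  # file name is illustrative
plt.close(fig)
```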

I’d also recommend learning and installing Anaconda, a distribution of Python and R for scientific computing. It’s basically a one-stop shop for data science and machine learning, and it comes with Python, all the necessary packages, Jupyter Notebooks, and an environment manager. Again, I recommend this freeCodeCamp video, which walks you through how to install and use Anaconda.
Python for Data Science - Course for Beginners (Learn Python, Pandas, Nu…

Great overall crash course. However, you only need the first half hour of the video for the Anaconda installation.

As with the previous sections, don’t spend too much time on this or get stuck in tutorial hell. Learn the basics and move on to the next step, which is probably the most exciting!

Machine Learning Algorithms & Theory


This is where the fun begins!

The previous three steps were all about getting your foundation ready to
tackle machine learning. These foundational tasks shouldn’t take too long,
maybe a month.

However, the machine learning theory part can take some time due to the length of the courses. It’s important not to rush, as each subsequent step and model usually builds on the previous one.

The course I took at the beginning of my journey and the one I recommend
you start with is Andrew Ng’s Machine Learning Specialization on Coursera.
I took it back in 2020 when it was still in Octave! However, it has since been
revamped. There are cutting-edge topics in there such as recommendation
systems and reinforcement learning, not to mention the coding tutorials are
now in Python!

Machine Learning
Offered by Stanford University and DeepLearning.AI. #BreakIntoAI
with Machine Learning Specialization. Master…
www.coursera.org

This course will teach you the A-Z of machine learning and give you hands-on experience implementing the algorithms in Python using specialized ML packages like scikit-learn, XGBoost, and TensorFlow.
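The course covers scikit-learn, XGBoost, and TensorFlow; the sketch below only illustrates the scikit-learn fit/predict pattern that much of that hands-on work follows. The Iris dataset, the logistic regression model, and the split settings are my own illustrative choices, not material from the course.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in dataset and hold out a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the features, then fit a classifier: every estimator exposes fit/predict.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```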

Even though this course is beginner-level, it will cover any question you are
likely to get in an ML interview, particularly if you are applying for entry-
level roles.

The next course I recommend is the Deep Learning Specialization by Andrew Ng. This is the follow-on course from the Machine Learning Specialization and will teach all you need to know about deep learning. It even touches upon Large Language Models (LLMs)!

Deep Learning
Learn Deep Learning from deeplearning.ai. If you want to break into
Artificial intelligence (AI), this Specialization…
www.coursera.org

Although these two courses cover pretty much all the theory you need for ML, feel free to research and supplement your learning. There are so many niches and specialisms that it would be impractical to list them all here along with their courses.
For example, one course I took recently was Andrej Karpathy’s Neural Networks: Zero to Hero. It starts at quite a low level by building a neural network from scratch. However, in the last video, we built our own Generative Pre-trained Transformer (GPT), the type of model that powers ChatGPT and much of the recent AI boom!

Let's build GPT: from scratch, in code, spelled out.
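The series starts by building a tiny automatic differentiation engine and using it to train a neural network. The sketch below is a heavily simplified, independently written illustration of that idea (reverse-mode autodiff on scalars), not Karpathy’s code; the `Value` class and its methods are hypothetical names chosen here, and only addition, multiplication, and `tanh` are supported.

```python
import math

class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # pushes self.grad back to the parents

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            self.grad += out.grad          # d(a + b)/da = 1
            other.grad += out.grad         # d(a + b)/db = 1
        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            self.grad += other.data * out.grad   # d(a * b)/da = b
            other.grad += self.data * out.grad   # d(a * b)/db = a
        out._backward_fn = backward_fn
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def backward_fn():
            self.grad += (1 - t ** 2) * out.grad  # d tanh(z)/dz = 1 - tanh(z)^2
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse order.
        order, visited = [], set()
        def build(node):
            if id(node) not in visited:
                visited.add(id(node))
                for parent in node._parents:
                    build(parent)
                order.append(node)
        build(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()

# A single neuron: y = tanh(w1*x1 + w2*x2 + b)
x1, x2 = Value(2.0), Value(-1.0)
w1, w2, b = Value(0.5), Value(1.5), Value(0.1)
y = (x1 * w1 + x2 * w2 + b).tanh()
y.backward()
```

Because the neuron is tiny, the gradients can be checked by hand: for example, dy/dw1 = (1 - tanh²(z)) · x1, where z = w1·x1 + w2·x2 + b = -0.4.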

Practise
The best way to learn anything is to practice and get hands-on experience.
This is by far the most important step in learning ML as it is what really
solidifies your understanding.

Kaggle
I would begin by entering a few competitions on Kaggle. The goal is not to win and earn money but to learn how to apply a machine learning algorithm to a real-world problem. In essence, this is how machine learning is used in industry: to solve business problems.

Try to enter a variety of competitions to get experience in several domains. Some of the most common are time series forecasting, computer vision, and language modeling. This will improve the breadth of your knowledge and also help you understand what you want to specialize in at a later date!
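A typical competition workflow looks roughly like this minimal sketch, assuming scikit-learn and pandas: fit a baseline model, estimate its performance with cross-validation instead of relying only on the public leaderboard, then write predictions to a submission file. The synthetic data, the `id`/`target` column names, and the `submission.csv` file name are illustrative assumptions; every competition defines its own data and submission format.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in for a competition's train and test files.
X, y = make_classification(n_samples=1_000, n_features=10, random_state=0)
X_train, y_train = X[:800], y[:800]
X_test = X[800:]
test_ids = np.arange(800, 1_000)

model = RandomForestClassifier(n_estimators=200, random_state=0)

# 5-fold cross-validation gives a more reliable estimate than a single split.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Refit on all the training data and write the submission file.
model.fit(X_train, y_train)
submission = pd.DataFrame({"id": test_ids, "target": model.predict(X_test)})
submission.to_csv("submission.csv", index=False)
```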

ML From Scratch
Another method I used was to implement ML algorithms from scratch using
basic Python and packages like NumPy. Being able to write an algorithm
from first principles is one of the best ways to learn it.

You can start simply with linear regression and gradient descent. Then move on to harder algorithms, eventually working your way up to a shallow neural network!
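A minimal from-scratch sketch of that first step, using only NumPy: linear regression fitted by batch gradient descent on mean squared error. The synthetic data, learning rate, and iteration count are illustrative assumptions, not taken from the author’s repository.

```python
import numpy as np

def fit_linear_regression_gd(X, y, learning_rate=0.1, n_iters=2_000):
    """Fit y ≈ X @ w + b by minimizing mean squared error with gradient descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        error = X @ w + b - y                     # residuals, shape (n_samples,)
        grad_w = (2 / n_samples) * (X.T @ error)  # dMSE/dw
        grad_b = (2 / n_samples) * error.sum()    # dMSE/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Synthetic data from y = 3*x1 - 2*x2 + 1 plus a little noise.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(500, 2))
y = X @ np.array([3.0, -2.0]) + 1.0 + rng.normal(scale=0.1, size=500)

w, b = fit_linear_regression_gd(X, y)
print(w, b)  # close to [3, -2] and 1
```

Comparing the result with a closed-form least-squares solution (for example `np.linalg.lstsq`) is a good way to check that a from-scratch implementation is correct.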

You can check out my git repo where I have written some of these algorithms
from scratch.

GitHub - egorhowell/ML-Algorithms-From-Scratch: Deriving Machine Learning algorithms from first principles.
github.com
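Working up toward the shallow neural network mentioned above, here is a minimal NumPy sketch (written for illustration, not code from that repository) of a network with one hidden layer, trained on XOR with backpropagation derived by hand. The hidden-layer size, learning rate, iteration count, and random seed are illustrative choices, so convergence is likely but not guaranteed for every seed.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# XOR is not linearly separable, so a hidden layer is needed.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(seed=0)
n_hidden = 8
W1 = rng.normal(scale=1.0, size=(2, n_hidden))
b1 = np.zeros((1, n_hidden))
W2 = rng.normal(scale=1.0, size=(n_hidden, 1))
b2 = np.zeros((1, 1))

learning_rate = 0.5
losses = []
for step in range(5_000):
    # Forward pass: tanh hidden layer, sigmoid output.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    p_safe = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
    loss = -np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe))  # binary cross-entropy
    losses.append(loss)

    # Backward pass: chain rule by hand.
    dz2 = (p - y) / len(X)          # dLoss/d(output pre-activation)
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    dh = dz2 @ W2.T
    dz1 = dh * (1 - h ** 2)         # derivative of tanh
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)

    # Gradient descent update.
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

predictions = (p > 0.5).astype(int).ravel()
print(predictions)  # ideally [0 1 1 0]
```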

How To “Actually” Stand Out


If you want to take things to the next level, then you need to show your work. The payoff is asymmetric: putting in that extra 20% of effort will put you ahead of 80% of people.

Blog
The easiest way to get started is by having a blog. Writing about ML concepts
and algorithms will improve your understanding and display your work to
potential employers. Very few people will be doing this, so you will be in the
top echelon of practitioners.
You can start by writing about anything, for example, how a neural network works or what Markov chains are. I found it useful to write a series of blogs about one topic. For example, this is my Convolutional Neural Network series.

Egor Howell · Convolutional Neural Networks (list, 4 stories)

Over time, you can write about more complex topics and start developing a specialism that can help you target your job search if you want to. That said, this is probably unlikely early in your career.

Research Papers
To go even further, you can re-implement a research paper. Depending on the paper you choose, this can be very hard. I have tried this before and found it very difficult to match the results given in the paper. Nevertheless, this is the pinnacle of learning ML, and you will gain invaluable knowledge in the process.

To find papers, I recommend subscribing to and following ML Papers of the Week. They have a newsletter and a Twitter account that share the biggest AI papers published each week, along with their key links.
DAIR.AI
@dair_ai

The Top ML Papers of the Week (Jan 15 - Jan 21):

- AlphaCodium
- AlphaGeometry
- RAG vs. Finetuning
- Self-Rewarding Models
- Overview of LLMs for Evaluation
- Tuning Language Models by Proxy
...
7:17 PM · Jan 21, 2024


Example tweet.

To understand and implement the paper, I recommend the following steps:

Read & Digest — Take your time over this to ensure you understand the authors’ goal, model, and results.

Data — If possible, try to get the same data used in the paper. Read and analyze the data at your own pace.

Study Model Architecture — Review the model and its structure, and try to learn why the authors chose this specific architecture for their problem.

Implement — Start building the model and generating results. Take this
one step at a time, slowly iterating on simple steps.

It’s important to document this work as well. You can do this anywhere, such as on Twitter/X, LinkedIn, your GitHub profile, or even a blog post. Re-implementing a paper is one of the best ways to stand out, particularly if you want to work in ML research.

Summary
These are the steps I would take if I had to learn machine learning completely from scratch again. It is important to note that there is no one-size-fits-all approach, so tailor your learning to your background and experience. Some of the courses and tutorials I listed here may not be your cup of tea, and that’s fine. The main takeaway is to learn just enough of the basics to start getting stuck into real machine learning problems and projects.

Happy learning!

Another Thing!
I have a free newsletter, Dishing the Data, where I share weekly tips for
becoming a better Data Scientist, and the latest AI news to keep you in the
loop. There is no “fluff” or “clickbait”, just pure actionable insights from a
practicing Data Scientist.

Dishing The Data | Egor Howell | Substack
Thoughts & Insights From A Data Scientist. Click to read Dishing The Data, by Egor Howell, a Substack publication with…
dishingthedata.substack.com

Connect With Me!


YouTube 🎬
Newsletter 📄

LinkedIn 👔

Twitter 🖊

GitHub 🖥

Kaggle 🏅

(All emojis designed by OpenMoji — the open-source emoji and icon project.
License: CC BY-SA 4.0)