Session 01 - Programming
Machine Learning
2023
Learning outcomes
After completing this session you should be able to:
Coding in machine learning
• The coding skills that you need to get into
machine learning will depend on the area of
data science that you end up working in.
• If you end up managing databases, it’s
important that you know SQL.
• If you plan to use your data to perform
visualisation, analytics, and modelling, you
will want to make sure you are strong in
Java, Python, and R.
Figure: UK, Canada, US, Australia (2019)
• If you plan on building a career in machine
learning, then ideally, you should learn
Python, R and SQL.
R
• R is an excellent choice for statistical computing and data visualisation.
• Pros:
• R has an excellent range of domain-specific packages. You can find a package
for almost anything.
• It has packages that support the most quantitative and statistical applications.
• The base installation of R comes with in-built statistic functions and methods. Even the
matrix algebra is handled well.
• A core strength of R is data visualisation, notably with the ggplot2 library.
• Cons:
• Slower than Python.
• Excellent for statistics and data science but inferior for general-purpose
programming.
• Conclusion: Brilliant for what it is designed to perform.
Dr Ivan Olier - Machine Learning 4
Python
• Most data scientists use Python
• In a recent worldwide survey, 83% of 24,000 data
professionals reported using Python.
• Python is a dynamic and general-purpose language.
• Pros:
• One of the easiest programming languages and a brilliant choice for beginners.
• It has many built-in and third-party libraries for most tasks.
• Many online services provide a Python API.
• Popular packages such as scikit-learn, pandas, and TensorFlow are used for advanced
machine learning applications.
• Cons:
• R’s statistical and data analysis packages are more extensive than Python’s.
• Python is a dynamically typed language, so type mistakes are only caught when the code
runs. You can expect a TypeError from time to time.
• Conclusion: A great all-rounder programming language.
SQL
• Pros:
• SQL is efficient at querying, updating, and manipulating data in a database management
system.
• Because SQL uses a declarative syntax, it is easy to read.
• It is used in a wide range of applications to handle data efficiently.
• Using the SQLAlchemy module, one can integrate SQL with Python.
• Experienced programmers find it effortless to learn SQL.
• Cons:
• SQL’s analytical capability is limited. Your options narrow quickly beyond counting,
aggregating, and averaging data.
• There are various implementations of SQL, such as MariaDB, SQLite, and PostgreSQL. Their
dialects differ, which makes interoperability difficult.
• Conclusion: It is an efficient and timeless language.
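The counting and averaging mentioned above can be sketched from Python. This is a minimal illustration that uses the standard-library sqlite3 module rather than SQLAlchemy; the sales table and its values are made up for the example.

```python
import sqlite3

# An in-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 30.0), ("south", 5.0)],
)

# Declarative query: we state what we want, not how to compute it.
rows = conn.execute(
    "SELECT region, COUNT(*), AVG(amount) "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(rows)
```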
To keep an eye on
• Programming languages evolve quite fast, particularly in data science.
• Their popularity also changes…
• It is important that instead of learning how to use instructions and packages of a particular
language, we learn how to code almost independently of the choice of language.
• Learning programming languages is a skill similar to learning human languages.
• New languages are growing fast:
• Julia
• Swift
Python
• Python can be used for a variety of different applications such as web development,
software development, scripting, machine learning and performing complex mathematics.
• Python can be used to construct production-ready, end-to-end pipelines.
• Python enjoys cross-platform support (Windows, Mac, Linux, Raspberry Pi, etc.).
• The most recent major version of Python is Python 3 (which is what we will be using during
this module).
• A good resource for Python is python.org
• https://www.python.org/about/gettingstarted/
Python
• Python will be used for the rest of the module.
• Currently, Python is widely regarded as the go-to language for many data science and AI
tasks.
• Python is a general-purpose programming language which has a significant ecosystem of
tools and packages while providing extensive support through forums.
• Python is easy to learn and is in the top 5 most used programming languages of 2019.
Coding with Python
1. IPython (interactive Python)
• Python can be run interactively
• Used extensively in research
2. Python scripts
• What if we want to run more than a few lines of code?
• Then we write the code in text files with the .py extension
Python is dynamically typed
You can find the type of a variable using type(). For example, enter type(x).
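A minimal sketch of what dynamic typing means in practice: the same name can refer to values of different types over time, and type() reports the current one. The variable names are illustrative only.

```python
x = 5                   # x currently refers to an int
type_before = type(x)

x = "five"              # the same name is rebound to a str, no declaration needed
type_after = type(x)

print(type_before, type_after)
```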
Arithmetic operations
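As a quick illustration, the built-in arithmetic operators can be tried on two small integers. The values are arbitrary examples.

```python
a, b = 7, 2

total = a + b            # addition
difference = a - b       # subtraction
product = a * b          # multiplication
quotient = a / b         # true division, always returns a float
floor_quotient = a // b  # floor division, rounds down to a whole number
remainder = a % b        # modulo, the remainder after division
power = a ** b           # exponentiation

print(total, difference, product, quotient, floor_quotient, remainder, power)
```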
Comparison operators
Comparison examples
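A short sketch of the six comparison operators, each of which evaluates to True or False. The values of x and y are arbitrary.

```python
x, y = 3, 5

results = {
    "x == y": x == y,   # equal to
    "x != y": x != y,   # not equal to
    "x < y": x < y,     # less than
    "x > y": x > y,     # greater than
    "x <= y": x <= y,   # less than or equal to
    "x >= y": x >= y,   # greater than or equal to
}

for expression, value in results.items():
    print(expression, "->", value)
```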
Logical operators
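Python's logical operators are and, or, and not. A minimal sketch, with made-up variable names, combining them with comparisons:

```python
age = 25
has_ticket = False

can_enter = age >= 18 and has_ticket   # and: True only if both sides are True
gets_offer = age < 18 or has_ticket    # or: True if at least one side is True
needs_ticket = not has_ticket          # not: flips True and False

print(can_enter, gets_offer, needs_ticket)
```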
Strings
• Powerful and flexible in Python
• Can be added
• Can be multiplied
• Can be multiple lines
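The three string behaviours listed above can be sketched as follows. The example strings are arbitrary.

```python
greeting = "Hello" + ", " + "world"   # adding strings concatenates them
echo = "ab" * 3                       # multiplying repeats the string

# Triple quotes allow a string to span multiple lines.
poem = """line one
line two"""
line_count = len(poem.splitlines())

print(greeting, echo, line_count)
```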
Commenting
• Useful when your code needs further explanation, either for your future self or for
anybody else.
• Useful when you want to remove the code from execution but not permanently
• Comments in Python are done with #
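A small sketch of both uses of #: explaining code, and temporarily disabling a line without deleting it. The temperature conversion is just an example.

```python
# Convert a temperature from Celsius to Fahrenheit.
celsius = 20
fahrenheit = celsius * 9 / 5 + 32  # an inline comment can follow code on the same line

# The next line is "commented out": it stays in the file but is not executed.
# print("debug:", celsius)

print(fahrenheit)
```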
Lists
• One of the most useful concepts
• Group multiple variables together (a kind of container!)
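A minimal sketch of creating a list and adding to it. The values are made up, and a list may hold items of different types.

```python
scores = [72, 85, 90]        # a list literal uses square brackets
scores.append(64)            # lists can grow after creation

mixed = ["Ada", 36, True]    # items do not have to share a type

n = len(scores)              # number of items in the list
print(scores, n, mixed)
```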
Indexing a list
• Indexing – accessing items within a data structure
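As a short illustration, list indexing in Python starts at zero, and negative indices count from the end. The list contents are arbitrary.

```python
letters = ["a", "b", "c", "d"]

first = letters[0]     # indexing starts at 0
last = letters[-1]     # negative indices count back from the end

letters[1] = "B"       # an index can also be used to replace an item
print(first, last, letters)
```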
Slicing lists
• Slicing – obtain a particular set of sub-elements from a data structure.
• Very useful and flexible.
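A sketch of the slice notation list[start:stop:step], where start is included and stop is excluded. The list of numbers is an arbitrary example.

```python
numbers = [0, 1, 2, 3, 4, 5]

first_three = numbers[:3]      # from the beginning up to, not including, index 3
middle = numbers[2:4]          # indices 2 and 3
every_other = numbers[::2]     # a step of 2 skips alternate items
reversed_copy = numbers[::-1]  # a step of -1 walks the list backwards

print(first_three, middle, every_other, reversed_copy)
```

Slicing returns a new list, so the original numbers list is left unchanged.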
Dictionaries
• Similar to actual dictionaries
• They are effectively 2 lists combined – keys
and values
• We use the keys to access the values instead
of indexing them like a list
• Each value is mapped to a unique key
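A minimal sketch of looking values up by key rather than by position. The country and capital pairs are example data only.

```python
capitals = {"France": "Paris", "Japan": "Tokyo"}   # key: value pairs in curly braces

capitals["Kenya"] = "Nairobi"       # assigning to a new key adds an entry
paris = capitals["France"]          # look up a value by its key

# .get() returns a fallback instead of raising KeyError for a missing key.
missing = capitals.get("Peru", "unknown")

print(paris, missing, list(capitals.keys()))
```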
If Else
Conditional statement
• Fundamental building block of software
• The if block is executed if the answer is True
• The else block is executed if the answer is False
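A small sketch of a conditional statement: the condition is evaluated once, and exactly one of the two blocks runs. The threshold and messages are invented for the example.

```python
temperature = 30

if temperature > 25:
    advice = "wear shorts"    # runs when the condition is True
else:
    advice = "wear a coat"    # runs when the condition is False

print(advice)
```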
Indentation matters!
• Code is grouped by its indentation
• Indentation is the number of space or tab characters before the code.
• If you put code in the wrong block, then you will get unexpected behaviour.
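To illustrate how indentation changes behaviour, the two running totals below differ only in how far their update lines are indented. The names and values are made up.

```python
total_inside = 0
total_outside = 0

for n in [1, -2, 3]:
    if n > 0:
        total_inside += n    # indented under the if: runs only for positive n
    total_outside += n       # indented under the for only: runs for every n

print(total_inside, total_outside)
```

Moving a line one level in or out is enough to place it in a different block, which is why a misplaced indent produces a wrong result rather than an error.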
For loop
• Allows us to iterate over the items in a data structure. On each pass we
can manipulate the current item however we want
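A minimal sketch of a for loop that visits each item of a list in turn and builds a new list from the results. The data is arbitrary.

```python
numbers = [1, 2, 3]
doubled = []

for n in numbers:            # n takes each value in the list, one per pass
    doubled.append(n * 2)    # do something with the current item

print(doubled)
```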
Functions
• Allow us to package functionality in a nice and readable way
• Let us reuse it without writing it again
• Make code modular and readable
• Rule of thumb - if you are planning on using very similar code more than once, it may be
worthwhile writing it as a reusable function.
Function declaration
• A declaration starts with a keyword and can take any number of arguments
Function example
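A short sketch of declaring a function and reusing it. The def keyword starts the declaration, the arguments go in the parentheses, and return hands a result back to the caller. The function mean is an illustrative example, not taken from the slides.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# The same function is reused with different inputs.
average_a = mean([1, 2, 3])
average_b = mean([10, 20])

print(average_a, average_b)
```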
Importing modules and packages
• We can import specific modules from this package
using the dot notation.
• For example, to import the dataset module from the
my_model package, we can use one of the following
code snippets:
• Or:
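The two import forms can be sketched with the standard library, since my_model is a placeholder package that is not available to run. Here os plays the role of the package and path the role of the module; by analogy, the equivalents for the slide's example would presumably be import my_model.dataset and from my_model import dataset.

```python
# Form 1: import the module through its package, then use the full dotted name.
import os.path

joined_a = os.path.join("data", "train.csv")

# Form 2: import the module's name directly, then use it without the package prefix.
from os import path

joined_b = path.join("data", "train.csv")

# Both names refer to the same module object.
same_module = os.path is path
print(joined_a == joined_b, same_module)
```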
Scikit-learn
• For machine learning tasks, this
module will introduce you to one of
the most popular Python
machine learning libraries, called
Scikit-learn.
Jupyter Notebooks
• One of the most common ways to write and execute
Python code is through Jupyter Notebooks,
which are run directly in the browser.
• They are also useful for making notes using
markdown. They act as executable documents and should
be considered a recipe for your data science task.
• One advantage of using Jupyter Notebooks is that code
can be executed directly in the browser allowing you to
instantly see the results.
• Notebooks can be exported as executable .py files, which
can be used in your development pipeline.
• Notebooks support reproducibility, because the
notebook and associated data are easy to share.
Jupyter Notebooks
Anaconda
• Anaconda is a scalable data science platform which allows you to combine, but also
segregate, different packages and frameworks to provide canned environments.
• As these environments are isolated from each other, they can be safely modified and
changed without affecting any other environments in use.
Final remarks
• Coding is an important skill for Machine Learning.
• The expectation is that a data scientist knows more
than one programming language.
• The most common type of coding is scripting. Scripts are
pieces of code that don’t have a GUI and that are
usually run interactively.
• Python has become the standard for data science.
Again, knowing other languages gives you options.
• Python is massively rich in packages, which are
typically well-maintained by the community.
• However, it is wise to keep an eye on new
technologies. The data science industry is constantly
and rapidly changing. It is very easy to become
obsolete!