
Session 01 - Programming

This document provides an overview of machine learning programming. It discusses several programming languages used for machine learning like Python, R, and SQL. Python is highlighted as the most popular language currently due to its large ecosystem of tools and being easier to learn than other languages. The document also covers basic Python programming concepts like variables, data types, arithmetic and comparison operators, and strings.

Session 01

Machine Learning Basis – Programming
Dr Ivan Olier

Machine Learning
2023

Learning outcomes
After completing this session you should be able to:

• Understand the importance of programming for machine learning.
• Identify the programming languages currently used in machine learning.
• Write and run small pieces of code in Python.

Dr Ivan Olier - Machine Learning 2

Coding in machine learning
• The coding skills that you need to get into machine learning will depend on the area of data science that you end up working in.
• If you end up managing databases, it’s important that you know SQL.
• If you plan to use your data for visualisation, analytics, and modelling, you will want to make sure you are strong in Java, Python, and R.
• If you plan on building a career in machine learning, then ideally you should learn Python, R and SQL.
[Figure label: UK, Canada, US, Australia (2019)]

R
• R is an excellent choice for statistical computing and data visualisation.

• Pros:
• R has an excellent range of domain-specific packages. You can find a package for almost any task.
• Its packages support most quantitative and statistical applications.
• The base installation of R comes with built-in statistical functions and methods. Matrix algebra is also handled well.
• A core strength of R is data visualisation, notably with the ggplot2 library.
• Cons:
• R is generally slower than Python.
• It is excellent for statistics and data science but weaker for general-purpose programming.
• Conclusion: Brilliant at what it is designed to do.

Python
• Most data scientists use Python.
• A recent worldwide survey found that 83% of 24,000 data professionals used Python.
• Pros:
• One of the easiest programming languages to learn and a brilliant choice for beginners.
• Python is a dynamic and general-purpose language.
• It has many built-in and third-party libraries for most tasks.
• Many online services provide a Python API.
• Popular packages such as scikit-learn, pandas, and TensorFlow are used for advanced machine learning applications.
• Cons:
• R’s statistical and data analysis packages are more extensive than Python’s.
• Python is a dynamically typed language, so type mismatches are only detected when the code runs. You must take care with variable types, and you can expect a TypeError from time to time.
• Conclusion: A great all-round programming language.

SQL – Structured Query Language


• SQL is domain-specific: relational database management systems use it to manage data.

• Pros:
• SQL is efficient at querying, updating, and manipulating data in a database management system.
• SQL follows a declarative syntax, so it is easy to read.
• It is used in a wide range of applications to handle data efficiently.
• Using the SQLAlchemy library, one can work with SQL databases from Python.
• Experienced programmers find SQL easy to learn.
• Cons:
• SQL’s analytical capability is limited. Your options become limited beyond counting, aggregating, and averaging data.
• There are various implementations of SQL, such as MariaDB, SQLite, and PostgreSQL. Their dialects differ, which makes interoperability difficult.
• Conclusion: It is an efficient and timeless language.
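
To make the points about declarative syntax and counting/averaging concrete, here is a minimal sketch that runs SQL from Python. It uses the standard-library sqlite3 module rather than SQLAlchemy, and the patients table and its rows are made up purely for illustration.

```python
import sqlite3

# In-memory SQLite database; the table and its rows are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO patients (name, age) VALUES (?, ?)",
    [("Ann", 34), ("Bob", 51), ("Cy", 29)],
)

# Declarative query: we state what we want (a count and an average),
# not how the database should compute it.
count, mean_age = conn.execute(
    "SELECT COUNT(*), AVG(age) FROM patients"
).fetchone()
print(count, mean_age)
conn.close()
```

Anything beyond this kind of aggregation (for example, fitting a model) would normally be done in Python or R after the data has been queried.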

To keep an eye on
• Programming languages evolve quite fast, particularly in data science.
• Their popularity also changes over time.
• Instead of only learning the instructions and packages of a particular language, it is important to learn how to code almost independently of the choice of language.
• Learning programming languages is a skill similar to learning human languages.
• New languages are growing fast:
• Julia
• Swift

Python
• Python can be used for a variety of applications such as web development, software development, scripting, machine learning and performing complex mathematics.
• Python can be used to construct production-ready, end-to-end pipelines.
• Python enjoys cross-platform support (Windows, Mac, Linux, Raspberry Pi, etc.).
• The most recent major version of Python is Python 3 (which is what we will be using during this module).
• A good resource for Python is python.org:
• https://www.python.org/about/gettingstarted/

Python
• Python will be used for the rest of the module.
• Currently, Python is widely regarded as the go-to language for many data science and AI tasks.
• Python is a general-purpose programming language with a significant ecosystem of tools and packages and extensive community support through forums.
• Python is easy to learn and was among the top 5 most used programming languages of 2019.

Why Python for Machine Learning?


• Python is the most widely used programming language among machine learning scientists.
• Python is easier to learn than other programming languages such as R.
• Eight of the top ten most used data tools are derived from or utilise Python.

Coding with Python
1. IPython (interactive Python)
• Python can be run interactively
• Used extensively in research

2. Python scripts
• What if we want to run more than a few lines of code?
• Then we write the code in a text file with the .py extension and run it as a script
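
As a rough illustration of the second option, a script is just a text file of Python code. The file name hello.py and the message are hypothetical; the same lines could also be typed one by one at an interactive prompt.

```python
# hello.py - a hypothetical script file; run it from a terminal with: python hello.py

def main():
    message = "Hello, machine learning!"
    print(message)
    return message

# This guard runs main() only when the file is executed as a script,
# not when it is imported from another file.
if __name__ == "__main__":
    main()
```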

Variables and variable types


• Variables store values

• Each variable has a type, which defines the way its value is stored.


• The basic types are:
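
The slide's table of types is not reproduced in this text. As a stand-in, and assuming the table covered the four built-in types usually introduced first (int, float, str and bool), a short sketch with made-up variable names:

```python
count = 3          # int: a whole number
height = 1.75      # float: a number with a decimal point
name = "Ada"       # str: text
is_ready = True    # bool: True or False

# type() reports the type of each value
for value in (count, height, name, is_ready):
    print(value, type(value))
```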

Python is dynamically typed

Important lesson to remember!


We can't do arithmetic operations on variables of incompatible types (for example, adding a string to a number). Therefore, make sure that you are always aware of your variable types!

You can find the type of a variable using type(). For example, type(x) returns the type of x.
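
A minimal sketch of what dynamic typing means in practice: the same name can refer to values of different types over time, and a mismatch only shows up when the offending line runs. The variable names are illustrative.

```python
x = 5            # x refers to an int
print(type(x))
x = "5"          # the same name can later refer to a str
print(type(x))

try:
    result = x + 1               # str + int is not defined
except TypeError as error:
    result = None
    print("TypeError:", error)

total = int(x) + 1               # convert explicitly, then add
print(total)
```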

Arithmetic operations

• Similar to ordinary mathematics.
• Order of precedence is the same as in mathematics.
• We can also use brackets () to control the order of evaluation.
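
The points above can be sketched as follows; the numbers are arbitrary.

```python
a = 7
b = 2

print(a + b)    # addition
print(a - b)    # subtraction
print(a * b)    # multiplication
print(a / b)    # true division, always gives a float
print(a // b)   # floor division
print(a % b)    # remainder
print(a ** b)   # exponentiation

without_brackets = 2 + 3 * 4      # multiplication is evaluated before addition
with_brackets = (2 + 3) * 4       # brackets force the addition first
print(without_brackets, with_brackets)
```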

Comparison operators

• Compare two values (for example with ==, !=, < or >)
• Return Boolean values (i.e. True or False)
• Used extensively in conditional statements

Comparison examples
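
The examples shown on the original slide are not reproduced in this text. A comparable sketch, with made-up values, might look like this:

```python
x = 3
y = 5

print(x == y)   # equal to
print(x != y)   # not equal to
print(x < y)    # less than
print(x >= y)   # greater than or equal to

# The result of a comparison is an ordinary Boolean value that can be stored
is_equal = x == y
is_smaller = x < y
```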

Logical operators

• Allow us to extend the conditional logic by combining conditions
• Will become essential later on
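
Python's logical operators are and, or and not. A small sketch with illustrative variable names:

```python
age = 25
has_ticket = True

print(age >= 18 and has_ticket)   # and: True only if both sides are True
print(age < 18 or has_ticket)     # or: True if at least one side is True
print(not has_ticket)             # not: turns True into False and vice versa

can_enter = age >= 18 and has_ticket
```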

Strings
• Powerful and flexible in Python
• Can be added
• Can be multiplied
• Can span multiple lines

You can use print() to print an outcome on the terminal (if you are running a script).
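
The string operations listed above can be sketched as follows; the example strings are arbitrary.

```python
first = "machine"
second = "learning"

joined = first + " " + second   # adding strings concatenates them
repeated = "ab" * 3             # multiplying a string by an int repeats it
multi_line = """line one
line two"""                     # triple quotes let a string span several lines

print(joined)
print(repeated)
print(multi_line)
```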

Commenting
• Useful when your code needs further explanation, either for your future self or for anybody else.
• Useful when you want to remove code from execution, but not permanently.
• Comments in Python start with #
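
Both uses of comments can be seen in this small sketch (the calculation itself is only an illustration):

```python
radius = 2.0

# Area of a circle; 3.14159 is a rough approximation of pi.
area = 3.14159 * radius ** 2

# print("debug:", radius)   # temporarily disabled rather than deleted
print(area)
```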

Lists
• One of the most useful concepts
• Group multiple variables together (a kind of container!)
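
A minimal sketch of a list as a container; the values are made up.

```python
scores = [0.81, 0.77, 0.92]      # a list groups several values under one name
scores.append(0.85)              # lists can grow after they are created
print(scores)
print(len(scores))               # number of items in the list
```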

Indexing a list
• Indexing: accessing items within a data structure
• Indexing a list is not very intuitive at first, because counting starts at zero
• The first element of a list has index 0
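
Zero-based indexing can be sketched like this, using an illustrative list:

```python
languages = ["Python", "R", "SQL"]

first = languages[0]     # index 0 is the first element
last = languages[-1]     # negative indices count from the end
languages[1] = "Julia"   # indexing also lets us replace an item
print(first, last, languages)
```

Asking for an index that does not exist, such as languages[3] here, raises an IndexError.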

Lists with integers


range() - a function that generates a sequence of integers; in Python 3, wrap it in list() to obtain a list
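
A short sketch of range() in Python 3, where the result has to be passed to list() to get an actual list:

```python
numbers = list(range(5))           # 0 up to, but not including, 5
evens = list(range(0, 10, 2))      # start, stop, step
print(numbers)
print(evens)
```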

Slicing lists
• Slicing – obtain a particular set of sub-elements from a data structure.
• Very useful and flexible.
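
Slicing uses the form list[start:stop:step]. A minimal sketch with an illustrative list:

```python
letters = ["a", "b", "c", "d", "e"]

middle = letters[1:4]        # items at index 1, 2 and 3 (the stop index is excluded)
first_two = letters[:2]      # omitting the start begins at index 0
every_other = letters[::2]   # a third number sets the step
print(middle, first_two, every_other)
```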

Lists can hold items of different types


• Not very useful, but possible
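
For completeness, a sketch of a list whose items have different types (the contents are arbitrary):

```python
mixed = [42, 3.14, "text", True, [1, 2]]   # int, float, str, bool and a nested list
print(type(mixed[0]), type(mixed[2]), type(mixed[4]))
```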

Dictionaries
• Similar to actual dictionaries
• They are effectively two lists combined: keys and values
• We use the keys to access the values instead of indexing them like a list
• Each value is mapped to a unique key
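
A minimal sketch of a dictionary, with made-up keys and values:

```python
ages = {"Ann": 34, "Bob": 51}    # each key maps to a value

bob_age = ages["Bob"]            # look up a value by its key
ages["Cy"] = 29                  # add a new key-value pair
ages["Ann"] = 35                 # assigning to an existing key overwrites its value
print(bob_age, ages)
print(list(ages.keys()), list(ages.values()))
```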

If Else
• Fundamental building block of software
[Figure: a conditional statement with one branch labelled "Executed if answer is True" and the other "Executed if answer is False"]

• Try running the example below.
• What do you get?
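
The slide's original example is not reproduced in this text. The following stand-in uses a made-up variable and threshold; try changing the value and running it again.

```python
temperature = 31   # made-up value

if temperature > 30:
    message = "hot"        # executed if the condition is True
else:
    message = "not hot"    # executed if the condition is False

print(message)
```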

Indentation matters!
• Code is grouped by its indentation
• Indentation is the number of space or tab characters before the code.
• If you put code in the wrong block, then you will get unexpected behaviour.
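
A small sketch of how indentation decides which block a line belongs to; the names are illustrative. Moving the last append four spaces to the right would change the result.

```python
n = -5
log = []

if n > 0:
    log.append("positive")
    log.append("inside if")    # indented: part of the if block, skipped when n <= 0
log.append("after if")         # not indented: runs whatever the value of n

print(log)
```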

For loop
• Allows us to iterate over the items in a data structure. In each iteration, we can manipulate the current item however we want
• Again, indentation is important here!
• Example:
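
The slide's own example is not reproduced in this text; a stand-in sketch with made-up data:

```python
numbers = [1, 2, 3]
squares = []

for n in numbers:
    squares.append(n ** 2)     # indented: runs once for each item in the list

print(squares)                 # not indented: runs once, after the loop has finished
```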

Functions
• Allow us to package functionality in a nice and readable way
• Let us reuse code without writing it again
• Make code modular and readable
• Rule of thumb: if you are planning on using very similar code more than once, it may be worthwhile writing it as a reusable function.

Function declaration
[Figure: annotated function declaration with the labels "keyword", "Any number of arguments", and "[Optional] Exits the function and returns some value"]

• Functions accept arguments and execute a piece of code
• Often they also return values (the result of their code)

Function example
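
The example from the original slide is not reproduced in this text. As a stand-in, here is a small function (the name mean and its behaviour are chosen only for illustration) showing the def keyword, an argument and a return value:

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = sum(values)
    return total / len(values)    # return exits the function and hands back the result

print(mean([2, 4, 6]))
print(mean([1.5, 2.5]))
```

Once defined, the function can be reused with any list of numbers instead of repeating the calculation.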

Modules, libraries and packages


• A module is a collection of related code saved in a file with the extension .py.
• A Python package is a directory containing a collection of modules.
• A library is an umbrella term for a reusable chunk of code. Usually, a Python library contains a collection of related modules and packages.
• It is often assumed that while a package is a collection of modules, a library is a collection of packages.

Importing modules and packages
• We can import specific modules from a package using the dot notation.
• For example, to import the dataset module from the my_model package, we can write import my_model.dataset
• Or: from my_model import dataset
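
Because my_model is the slide's own example package and is not available here, the runnable sketch below swaps in the standard-library json package and its decoder module to show the same two import styles.

```python
# Style 1: import the module by its full dotted name
import json.decoder
decoder_a = json.decoder.JSONDecoder()

# Style 2: import the module directly from the package
from json import decoder
decoder_b = decoder.JSONDecoder()

print(decoder_a.decode("[1, 2, 3]"))
print(decoder_b.decode('{"ok": true}'))
```

Both styles load the same module; they differ only in the name you then use to refer to it.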

Useful Python libraries for Data Science


• NumPy – operations on multi-dimensional arrays and matrices
• SciPy – used for scientific and technical computing
• Matplotlib – standard but powerful plotting library
• Pandas – data manipulation and analysis
• Seaborn – high-level interface to Matplotlib
• Plotly – for interactive, publication-quality graphs
• Statsmodels – statistical package

Scikit-learn
• For machine learning tasks, this module will introduce you to one of the most popular Python machine learning libraries, called Scikit-learn.

Jupyter Notebooks
• One of the most common ways to write and execute Python code is through Jupyter Notebooks, which are run directly in the browser.
• They are also useful for making notes in Markdown. They act as executable documents and should be considered a recipe for your data science task.
• One advantage of using Jupyter Notebooks is that code can be executed directly in the browser, allowing you to see the results instantly.
• Notebooks can be exported as executable .py files, which can be used in your development pipeline.
• Notebooks support reproducibility, because the notebook and its associated data can be shared easily.

Anaconda
• Anaconda is a scalable data science platform which allows you to combine, but also segregate, different packages and frameworks to provide canned environments.
• As these environments are isolated from each other, they can be safely modified and changed without affecting any other environments in use.

Final remarks
• Coding is an important skill for machine learning.
• The expectation is that a data scientist knows more than one programming language.
• The most common type of code is the script: a piece of code that has no GUI and is usually run interactively.
• Python has become the standard for data science. Again, knowing other languages gives you options.
• Python is massively rich in packages, which are typically well maintained by the community.
• However, it is wise to keep an eye on new technologies. The data science industry is constantly and rapidly changing. It is very easy to become obsolete!
