Python For Data Science 2025 Slides
Pasty Asamoah
+233 (0) 546 116 102
[email protected]
Big Data
DEALING WITH BIG DATA
We develop capabilities to leverage big data to drive
performance!!
BIG DATA - FACEBOOK
https://www.youtube.com/watch?v=_r97qdyQtIk
THE DATA SCIENCE CONCEPT
DATA SCIENCE
Source: https://indico.ictp.it/event/7658/session/10/contribution/58/material/slides/0.pdf
DATA SCIENCE APPLICATIONS
Source: https://indico.ictp.it/event/7658/session/10/contribution/58/material/slides/0.pdf
DATA SCIENCE ECOSYSTEM
• These are called sources
DATA PIPELINES EXPLAINED
Source: https://data.cs.sfu.ca/QjZo/slides.pdf
DATA QUALITY DIMENSIONS
Source: https://data.cs.sfu.ca/QjZo/slides.pdf
COMMON TYPES OF DATA PIPELINES
https://www.youtube.com/watch?v=VDR8qGmyEQg
DATA PROTECTION & ETHICS
DIMENSIONS OF DATA PROTECTION
PRIVACY
• Proper handling, processing, storage
SECURITY
• Protecting information from
Source: https://data.cs.sfu.ca/QjZo/slides.pdf
DATA SCIENCE LIFECYCLE
Source: https://data.cs.sfu.ca/QjZo/slides.pdf
NEXT STEPS
TO THIS END…
We know that:
• the destinations for data pipelines are typically data lakes or data
warehouses (there are, of course, others such as data marts), which are
simply databases.
Well, create a table for each data set.
DATABASES
source: w3schools.com
DATA TYPES – (NUMERIC)
source: w3schools.com
DATA TYPES – (DATE & TIME)
source: w3schools.com
CONSTRAINTS
Constraints are the rules enforced on the data columns of a table. These are used to limit the type of data that can go into a
table. This ensures the accuracy and reliability of the data in the database.
Source: https://data.cs.sfu.ca/QjZo/slides.pdf
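A minimal sketch of these rules in action, using Python's built-in sqlite3 module so it runs without a MySQL server (SQLite syntax is close to, but not identical to, MySQL's). The CUSTOMERS table and its columns are hypothetical; each "bad" row breaks exactly one constraint and is rejected with sqlite3.IntegrityError.

```python
import sqlite3

# Throwaway in-memory database; table and column names are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CUSTOMERS (
        ID    INTEGER PRIMARY KEY,        -- uniquely identifies each row
        NAME  TEXT    NOT NULL,           -- a name must always be supplied
        EMAIL TEXT    UNIQUE,             -- no two customers share an email
        AGE   INTEGER CHECK (AGE >= 0)    -- reject negative ages
    )
""")
conn.execute("INSERT INTO CUSTOMERS VALUES (1, 'Ama', 'ama@example.com', 25)")

# Each row below violates one constraint, so the database refuses it
bad_rows = [
    (2, None, "kofi@example.com", 30),   # NOT NULL
    (3, "Kofi", "ama@example.com", 30),  # UNIQUE
    (4, "Esi", "esi@example.com", -5),   # CHECK
    (1, "Yaw", "yaw@example.com", 40),   # PRIMARY KEY (duplicate ID)
]
rejected = 0
for row in bad_rows:
    try:
        conn.execute("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)", row)
    except sqlite3.IntegrityError as err:
        rejected += 1
        print("Rejected", row, "->", err)

print(conn.execute("SELECT COUNT(*) FROM CUSTOMERS").fetchone()[0])  # 1
```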
ASPECTS OF DATA INTEGRITY
User-Defined Integrity
1. Do you see that student, course, and instructor information are put
together in a single table?
2. What happens to the records if instructor "Peter" changes his name to "John
Doe"?
ORIGINAL TABLE
Source: https://byjus.com/gate/first-normal-form-in-dbms/
2NF
SECOND NORMAL FORM (2NF)
A table is in 2NF when it is in 1NF and has no partial
dependency (a non-key attribute that depends on only a part
of the primary key rather than the whole key)
ORIGINAL TABLE
Source: https://byjus.com/gate/second-normal-form-in-dbms/
3NF
THIRD NORMAL FORM (3NF)
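A table is in 3NF when it is in 2NF and has no transitive
dependency (a non-key attribute that depends on another non-key attribute
rather than directly on the primary key)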
ORIGINAL TABLE
Source: https://byjus.com/gate/third-normal-form-in-dbms/
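To make 2NF and 3NF concrete, the sketch below splits a hypothetical students/courses/instructors table with pandas. The column names and rows are invented for illustration. After normalization, renaming instructor "Peter" means editing one row instead of three, and a merge rebuilds the original rows.

```python
import pandas as pd

# Hypothetical unnormalized table; primary key is (student_id, course_id)
flat = pd.DataFrame({
    "student_id":      [1, 1, 2, 3],
    "student_name":    ["Ama", "Ama", "Kofi", "Esi"],
    "course_id":       ["C1", "C2", "C1", "C1"],
    "course_name":     ["SQL", "Python", "SQL", "SQL"],
    "instructor_id":   ["I1", "I2", "I1", "I1"],
    "instructor_name": ["Peter", "Mary", "Peter", "Peter"],
})

# 2NF: remove partial dependencies on the composite key
students = flat[["student_id", "student_name"]].drop_duplicates()
courses_2nf = flat[["course_id", "course_name", "instructor_id", "instructor_name"]].drop_duplicates()
enrolments = flat[["student_id", "course_id"]]

# 3NF: course_id -> instructor_id -> instructor_name is transitive, so split it
courses = courses_2nf[["course_id", "course_name", "instructor_id"]]
instructors = courses_2nf[["instructor_id", "instructor_name"]].drop_duplicates()

print(len(flat[flat["instructor_name"] == "Peter"]))                # 3 rows to edit before
print(len(instructors[instructors["instructor_name"] == "Peter"]))  # 1 row to edit after

# Joining the tables back together recovers every original enrolment
rebuilt = enrolments.merge(students).merge(courses).merge(instructors)
```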
DENORMALIZATION
• Characteristics of entities
ERD - RELATIONSHIP
The SQL SHOW DATABASES statement (MySQL) is used to list all databases on the server
Example:
The SQL INSERT statement is used to add new records (rows) of data to a table in the database
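A short sketch of the two common INSERT forms (with and without a column list) plus a multi-row insert. It uses Python's sqlite3 module rather than MySQL so it runs anywhere; the CUSTOMERS rows are hypothetical, and the `?` placeholders are sqlite3's way of passing values safely instead of building SQL strings by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER)")

# Form 1: name the columns explicitly
conn.execute("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (1, 'Kwame', 32)")
# Form 2: supply a value for every column, in table order
conn.execute("INSERT INTO CUSTOMERS VALUES (2, 'Abena', 17)")
# Several rows at once, with placeholders for the values
conn.executemany(
    "INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (?, ?, ?)",
    [(3, "Yaw", 25), (4, "Efua", 45)],
)
conn.commit()

rows = conn.execute("SELECT * FROM CUSTOMERS ORDER BY ID").fetchall()
print(rows)  # [(1, 'Kwame', 32), (2, 'Abena', 17), (3, 'Yaw', 25), (4, 'Efua', 45)]
```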
Example 1:
UPDATE CUSTOMERS SET NAME='Emmanuel Ackah' WHERE (AGE > 20) AND (AGE < 40);
Example 2:
UPDATE CUSTOMERS SET NAME='Esther Nana Ama Amoh' WHERE ID=1;
SQL – DELETE QUERY
The SQL DELETE statement is used to delete existing records from a table
Example 1: DELETE FROM CUSTOMERS WHERE ID=1;
Example 2: DELETE FROM CUSTOMERS WHERE AGE < 18;
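The sketch below runs UPDATE and DELETE statements of the kind shown on these slides against a few hypothetical CUSTOMERS rows, using sqlite3 so no server is needed. Note that an UPDATE or DELETE without a WHERE clause affects every row in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?)",
                 [(1, "Kwame", 32), (2, "Abena", 17), (3, "Yaw", 25), (4, "Efua", 45)])

# Compound condition: only rows with 20 < AGE < 40 change (IDs 1 and 3)
conn.execute("UPDATE CUSTOMERS SET NAME = 'Emmanuel Ackah' WHERE (AGE > 20) AND (AGE < 40)")
# Single row, selected by its primary key
conn.execute("UPDATE CUSTOMERS SET NAME = 'Esther Nana Ama Amoh' WHERE ID = 1")
# Remove every row matching the condition (ID 2, aged 17)
conn.execute("DELETE FROM CUSTOMERS WHERE AGE < 18")
conn.commit()

rows = conn.execute("SELECT * FROM CUSTOMERS ORDER BY ID").fetchall()
print(rows)  # [(1, 'Esther Nana Ama Amoh', 32), (3, 'Emmanuel Ackah', 25), (4, 'Efua', 45)]
```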
STRUCTURED QUERY LANGUAGE - III
ADVANCED CONCEPTS IN SQL
In this lecture, we're focusing on the basics. To master more advanced
concepts such as stored procedures, JOINs, conditions, and aggregate
functions, the following platforms offer impressive free tutorials.
PERSONAL ASSIGNMENT
o https://www.programiz.com/SQL
o https://www.w3schools.com/sql/
o https://www.codecademy.com/learn/learn-sql
o https://www.tutorialspoint.com/sql/index.htm
o https://www.sqltutorial.org/
DATABASE MANAGEMENT SYSTEMS
Introduction to
Python
Programming
ANY QUESTIONS??
https://dev.mysql.com/get/Downloads/MySQLGUITools/mysql-workbench-community-8.0.34-winx64.msi
MYSQL WORKBENCH INSTALLATION
CREATE CONNECTION
START MYSQL SERVER
INTERFACE
LAB ACTIVITIES
CONTINUITY
DATA ENGINEERING LAB WORK
DOWNLOAD JAVA SE
JAVA SE INSTALLATION
https://www.youtube.com/watch?v=SQykK40fFds
DOWNLOAD PENTAHO
https://privatefilesbucket-community-edition.s3.us-west-2.amazonaws.com/9.4.0.0-343/ce/client-tools/pdi-ce-9.4.0.0-343.zip
LAB ACTIVITIES
CONTINUITY
NEXT STEPS
Introduction to
Python
Programming
ANY QUESTIONS??
Python: https://www.python.org/ftp/python/3.12.1/python-3.12.1-amd64.exe
Anaconda: https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Windows-x86_64.exe
PYTHON SYNTAX
The set of rules that defines how a Python program will be written and
interpreted.
Indentation
Defines a block of
code
Syntax Error
Credit: w3schools
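A small illustration of how indentation defines a block. The variable `score` is hypothetical. The second part passes badly indented code to the built-in compile() so the resulting error can be observed without crashing the script.

```python
score = 75

# The indented line belongs to the if-block; the unindented line always runs
if score > 50:
    print("Passed")   # inside the block
print("Done")         # outside the block

# A missing indent is rejected before the code ever runs
bad_code = "if score > 50:\nprint('Passed')\n"
try:
    compile(bad_code, "<example>", "exec")
    error_name = None
except IndentationError as err:   # IndentationError is a kind of SyntaxError
    error_name = type(err).__name__
print(error_name)  # IndentationError
```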
HELLO WORLD! – Programming Tradition
Notes:
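"Emmanuella" is wrapped in quotation marks, so it is a string.
In print(username), username is not in quotation marks, so Python prints
the value stored in the variable.
In print('Female'), 'Female' is wrapped in quotation marks, so it is
printed as-is.
Python is case-sensitive: the function is print(), not Print().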
RULES IN NAMING VARIABLES
• A variable name:
Credit: w3schools
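The usual Python naming rules are: a variable name must start with a letter or the underscore character, cannot start with a number, may contain only letters, digits and underscores, is case-sensitive, and cannot be a reserved keyword. A hedged sketch that checks these rules with the standard library; the helper `is_valid_name` is hypothetical.

```python
import keyword

def is_valid_name(name: str) -> bool:
    """Hypothetical helper: True if `name` can be used as a Python variable name."""
    # isidentifier() checks the character rules; iskeyword() rejects reserved words
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["username", "_total", "my_var2", "2myvar", "my-var", "my var", "class"]:
    print(f"{candidate!r:12} valid={is_valid_name(candidate)}")

# Names are case-sensitive: these are two different variables
age, Age = 20, 30
print(age, Age)

print(len(keyword.kwlist), "reserved keywords in this Python version")
```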
PYTHON KEYWORDS
Keywords are predefined, reserved words used in Python programming that
have special meanings to the compiler.
!!! These keywords are reserved by the Python programming language, so they cannot be used as variable names
Credit:programiz
COMMENTS
• In computer programming, comments are hints that we use to make our
code more understandable. They are completely ignored by the
interpreter. In Python, we use the # symbol for commenting.
Notes:
Credit: programiz
DATA TYPES
• In computer programming, data type refers to the type of value a variable
holds. The data type of a variable determines which mathematical, relational or
logical operations can be applied to it without causing errors. Python
supports the following:
Credit: w3schools
DATA TYPES
Credit: w3schools
NUMERIC DATA TYPES
Notes:
Concatenation. We can also use the + symbol
But how do I know the data type of a variable?
Credit: programiz
CHECK DATA TYPE
• In Python, to know the data type of a variable, we use the
type() function. You may not know what functions are yet; we'll talk
about them shortly. For now, understand that we use the type()
function to get the data type of a variable.
Notice 'complex', 'float', 'int'
But how do I convert between data types? Well…
Credit: programiz
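A minimal example of the three numeric types and of checking them with type(). The variable names are illustrative.

```python
x = 1          # int
y = 2.8        # float
z = 1j         # complex
name = "Ama"   # str, for comparison

for value in (x, y, z, name):
    print(value, type(value))
# 1 <class 'int'>
# 2.8 <class 'float'>
# 1j <class 'complex'>
# Ama <class 'str'>
```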
DATA TYPE CONVERSION
• We can easily switch between data types. Pay close attention to the results
when we checked the data types: they returned something like
<class 'int'>. Now, to convert any numeric value to an integer, we use the int()
function; to convert it to a decimal number, we wrap it in the float()
function.
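A short sketch of the conversion functions. Note that int() on a float truncates toward zero rather than rounding, and that adding a number to a string fails until one side is converted.

```python
a = int(2.8)      # float -> int, truncates toward zero: 2
b = float(1)      # int -> float: 1.0
c = complex(1)    # int -> complex: (1+0j)
d = int("12")     # numeric string -> int: 12
e = str(23)       # int -> str: "23"
print(a, b, c, d, e)   # 2 1.0 (1+0j) 12 23

# Mixing types without converting raises an error
try:
    23 + "12"
except TypeError as err:
    print("TypeError:", err)
```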
Given the variables X = 23 and Y = "12":
Tasks:
1. Print the data type of X
2. Print the data type of Y
3. Compute Z = X + Y such that the result of Z is 35
STRING DATA TYPE
• In Python, strings are enclosed in quotation marks. For instance,
22 is an integer but "22" is a string. We can use either single or double
quotes to create string variables in Python
Not in quote
In quote
Credit: programiz
STRING MANIPULATION: LEN()
• There are several operations on strings. For instance, we can get the
length of the string, slice parts of the string, check values in a string, etc.
18 characters
Credit: programiz
CHECK STRING EXISTENCE: IN
• To check if a set of characters is present in a string, we use the in
keyword. The result is a Boolean: True / False
Boolean result
Credit: programiz
CHECK STRING NOT EXIST: NOT IN
• To check if a set of characters is not present in a string, we use
not in. The not keyword negates the result, which is
a Boolean: True / False
Boolean result
Credit: programiz
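The string basics from the last few slides in one runnable sketch: quote styles, len(), in and not in. The example text is illustrative, not the string used on the slides.

```python
greeting = "Hello world"       # double quotes
same = 'Hello world'           # single quotes create the same value
print(greeting == same)        # True

print(type(22), type("22"))    # <class 'int'> <class 'str'>
print(len(greeting))           # 11 (the space counts as a character)

print("world" in greeting)     # True
print("World" in greeting)     # False: membership checks are case-sensitive
print("bye" not in greeting)   # True
```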
STRING SLICING
• Sometimes you may want to slice a portion of a string. Imagine you have
the string “Hello world”, but your interest is the text “world”. To slice
those characters from the string, we leverage the slicing technique
  -i   -11  -10   -9   -8   -7   -6   -5   -4   -3   -2   -1
  +i     0    1    2    3    4    5    6    7    8    9   10
char     H    e    l    l    o  ' '    w    o    r    l    d
• Variable[start index : end index]. If omitted, start defaults to 0 and end defaults
to the length of the string; the character at the end index is excluded
• When both indices are within the string, the length of your result is end index - start index
Credit: programiz
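Slicing "Hello world" with the indices from the table above. The last line shows that an end index past the string is clipped, which is why the length rule only holds for in-range indices.

```python
text = "Hello world"

print(text[6:11])   # 'world'  (indices 6..10; end index 11 is excluded)
print(text[-5:])    # 'world'  (count from the end; omitted end = to the end)
print(text[:5])     # 'Hello'  (omitted start = from index 0)
print(text[:])      # 'Hello world' (a full copy)
print(len(text[2:7]) == 7 - 2)   # True
print(text[6:100])  # 'world'  (end beyond the string is clipped)
```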
STRING SLICING
• https://www.w3schools.com/python/python_strings_methods.asp
Credit: w3schools
STRING FORMATTING
• There are different approaches to formatting strings.
Use + to
concatenate strings
Credit: w3schools
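Three common approaches side by side: + concatenation, str.format(), and f-strings (available from Python 3.6). The names and values are illustrative. With +, numbers must first be converted with str().

```python
first, last, age = "Ama", "Mensah", 21

full = first + " " + last                       # + joins strings only
line1 = full + " is " + str(age)                # convert numbers with str()
line2 = "{} is {} years old".format(full, age)  # {} placeholders filled in order
line3 = f"{full} is {age} years old"            # f-string: expressions inside {}
print(line1)
print(line2)
print(line3)
```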
LIST DATA TYPE
• In Python, lists are used to store multiple items in a single
variable. Have you noticed that so far we have stored a single item per
variable? There are instances where you may need to store more than one
item in a variable.
Variable name
Index: 0 1 2 3 4
Item:  apple banana cherry apple cherry
Credit: w3schools
REPLACE & ADD TO LIST
• Replace
• Add
Credit: w3schools
REMOVE FROM LIST
• Remove
Credit: w3schools
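A sketch of the list operations from the last three slides (create, index, replace, add, remove), starting from the list shown in the figure. The comments trace the list's contents step by step.

```python
fruits = ["apple", "banana", "cherry", "apple", "cherry"]
print(fruits[0], fruits[-1], len(fruits))   # apple cherry 5 (duplicates allowed)

# Replace
fruits[1] = "blackcurrant"
# Add
fruits.append("orange")            # at the end
fruits.insert(1, "mango")          # at a given index
fruits.extend(["kiwi", "lemon"])   # several items at once
# Remove
fruits.remove("apple")             # first matching value only
last = fruits.pop()                # removes and returns the last item: "lemon"
del fruits[0]                      # by index (removes "mango")
print(fruits)  # ['blackcurrant', 'cherry', 'apple', 'cherry', 'orange', 'kiwi']
```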
LIST METHODS
• Reference: https://www.w3schools.com/python/python_lists_methods.asp
Credit: w3schools
TUPLE DATA TYPE
• Tuples are used to store multiple items in a single variable. A tuple is a
collection which is ordered and unchangeable. Tuples are written with
round brackets.
Variable name
items
Variable name
items
It removed all
duplicated values
Credit: w3schools
MANIPULATING SETS
• ADD
• UPDATE
Set 1
Set 2
Credit: w3schools
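Tuples versus sets in a short sketch: a tuple keeps order but cannot be changed, while a set drops duplicates and can grow with add() and update(). Sets have no order, so the result is sorted only for a stable display.

```python
colours = ("red", "green", "blue")   # tuple: ordered, unchangeable
print(colours[1])                    # green
try:
    colours[1] = "yellow"
except TypeError as err:
    print("Tuples cannot be changed:", err)

fruits = {"apple", "banana", "cherry", "apple"}   # set: duplicate removed
print(len(fruits))                                  # 3

fruits.add("orange")                 # ADD a single item
fruits.update({"mango", "banana"})   # UPDATE with items from another set
print(sorted(fruits))  # ['apple', 'banana', 'cherry', 'mango', 'orange']
```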
DICTIONARY DATA TYPE
• Dictionaries are used to store data values in key:value pairs. Dictionary
items are ordered, changeable, and do not allow duplicate keys.
Values
Use colon
Dictionary keys
thisdict[“brand”]
Credit: w3schools
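A minimal dictionary sketch using the thisdict example: look up a value by key, change it, add a new pair, and see that assigning to an existing key overwrites it instead of creating a duplicate.

```python
thisdict = {
    "brand": "Ford",
    "model": "Mustang",
    "year": 1964,
}
print(thisdict["brand"])        # Ford

thisdict["year"] = 2020         # change a value
thisdict["colour"] = "red"      # add a new key:value pair
thisdict["brand"] = "Toyota"    # existing key: value is overwritten
print(len(thisdict), list(thisdict.keys()))  # 4 ['brand', 'model', 'year', 'colour']
```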
OPERATORS
OPERATORS
• In programming, operators are used to perform operations on variables
and values. There are several of them in python
o Arithmetic operators
o Assignment operators
o Comparison operators
o Logical operators
o Etc.
Credit: w3schools
ASSIGNMENT OPERATORS
• Assignment operators are used to assign values to variables:
The statement
is not true
Credit: w3schools
COMPARISON OPERATORS
• Comparison operators are used to compare two values
The statement
is true
Credit: w3schools
LOGICAL OPERATORS
• Logical operators are used to combine conditional statements.
The statement
is not true
Credit: w3schools
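The four operator families in one sketch. Comparison and logical operators always produce a Boolean; the compound assignment operators update a variable in place.

```python
x, y = 10, 3

# Arithmetic: + - * / // (floor division) % (remainder) ** (power)
print(x + y, x - y, x * y, x / y, x // y, x % y, x ** y)

# Assignment
total = 5
total += 2    # same as total = total + 2  -> 7
total *= 3    # same as total = total * 3  -> 21

# Comparison
print(x == y, x != y, x > y, x <= y)   # False True True False

# Logical
print(x > 5 and y > 5)   # False: both sides must be True
print(x > 5 or y > 5)    # True: at least one side is True
print(not (x > 5))       # False
```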
CONDITIONAL CONTROLS
IF-STATEMENT
• In computer programming, the if statement is a conditional statement. It
is used to execute a block of code only when a specific condition is met.
For example, suppose we need to assign different grades to students
based on their scores:
colon
Credit: programiz
IF-STATEMENT
Pay attention to the little space (indentation). Note that the print function is not directly
under the "if". This indicates that the print statement belongs to the body of the if-statement.
But what if the condition evaluates to False? We can handle that with an else block.
Credit: w3schools
IF-ELSE-STATEMENT
Credit: w3schools
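A sketch of the grading example using if, elif and else. The grade boundaries are hypothetical. Conditions are tested from top to bottom and only the first True branch runs; else catches everything left over.

```python
def grade(score: float) -> str:
    # Hypothetical boundaries; the first condition that is True wins
    if score >= 70:
        return "A"
    elif score >= 60:
        return "B"
    elif score >= 50:
        return "C"
    else:
        return "F"   # runs only when every condition above is False

for s in (85, 64, 50, 32):
    print(s, grade(s))
```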
LOOPS
LOOPS
• In programming, a loop is a control flow statement that is used to
repeatedly execute a group of statements as long as the condition is
satisfied. Such a type of statement is also known as an iterative statement
• For loop
• While loop
FOR-LOOP
• A for loop is used for iterating over a sequence (that is, a list, a
tuple, a dictionary, a set, or a string). The Python for loop is less like the for
loops in several other programming languages and works more like an iterator: it visits each item in turn
Credit: programiz
FOR-LOOP
Items to loop
through
Variable names. It
can be anything
Credit: w3schools
FOR-LOOP: BREAK AND CONTINUE
• The break keyword stops the loop early, while the continue keyword skips
the rest of the current iteration and moves on to the next item.
Credit: w3schools
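A sketch of for loops over a list and a string, then continue and break. Results are collected in lists so the effect of each keyword is easy to check.

```python
fruits = ["apple", "banana", "cherry"]

for fruit in fruits:       # 'fruit' takes each item in turn; any valid name works
    print(fruit)

for letter in "kiwi":      # a string is a sequence of characters
    print(letter)

kept = []
for fruit in fruits:
    if fruit == "banana":
        continue           # skip the rest of this pass, go to the next item
    kept.append(fruit)
print(kept)                # ['apple', 'cherry']

before_break = []
for fruit in fruits:
    if fruit == "banana":
        break              # leave the loop entirely
    before_break.append(fruit)
print(before_break)        # ['apple']
```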
NESTED -FOR-LOOP
Credit: w3schools
WHILE-LOOP
• With the while loop we can execute a set of statements as long as a
condition is true. This means we must keep track of a value that is updated
inside the loop; otherwise the condition never becomes false and the loop runs forever.
Credit: programiz
WHILE-LOOP
Execute as long as this
condition is true
Credit: w3schools
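A minimal while loop that counts from 1 to 5. The counter `i` is the "updating element": removing `i += 1` would make the condition stay True forever.

```python
i = 1
seen = []
while i < 6:          # condition is checked before every pass
    seen.append(i)
    i += 1            # update the tracked value so the loop can end
print(seen)           # [1, 2, 3, 4, 5]
print("stopped at", i)  # stopped at 6
```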
FUNCTIONS
FUNCTIONS
• A function is a block of code which only runs when it is called. You can
pass data, known as parameters, into a function. A function can return
data as a result.
Credit: programiz
FUNCTIONS
• Defining and calling a function
Function definition
NB: Until you call a function, its code is never executed
Credit: w3schools
FUNCTIONS WITH ARGUMENT
• Defining functions with argument
Parameter
argument
Placeholder/ parameter
Credit: w3schools
FUNCTIONS RETURN VALUES
• So far, our functions do not return values that can be reused. We can use
the return keyword to achieve that. Note that functions that return values
do not automatically print to the screen.
Assigning result to a variable. This is possible because we are returning the
result after the computation x + y
Credit: w3schools
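A sketch contrasting a function that only prints with one that returns a value. `greet` and `add` are illustrative names. A function without a return statement gives back None, so its result cannot be reused.

```python
def greet(name):               # 'name' is the parameter (placeholder)
    print("Hello, " + name)    # prints, but returns nothing (None)

def add(x, y):
    return x + y               # hands the result back to the caller

greet("Emmanuella")            # "Emmanuella" is the argument
result = greet("Ama")
print(result)                  # None

total = add(2, 3)              # a returned value can be stored and reused
print(total * 10)              # 50
```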
OBJECT-ORIENTED PROGRAMMING (OOP)
CLASS AND OBJECT
• Python is an object-oriented programming language. Almost everything
in Python is an object, with its properties and methods. A class is like an
object constructor, or a "blueprint" for creating objects.
Credit: programiz
CLASS DEFINITION
• It is very simple to create a class in python
Class name
Class properties
Class methods
Credit: programiz
CLASS DEFINITION
Credit: programiz
CLASS INSTANTIATION
Credit: programiz
CLASS CONSTRUCTORS
constructor
Credit: programiz
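A hedged sketch tying together the class slides: a class definition with a class attribute, a constructor (`__init__`), instance properties, a method, and instantiation. The `Student` class and its fields are hypothetical.

```python
class Student:
    school = "Example University"        # class attribute, shared by all objects

    def __init__(self, name, score):     # constructor: runs on instantiation
        self.name = name                  # instance properties
        self.score = score

    def passed(self, pass_mark=50):      # method: a function inside the class
        return self.score >= pass_mark

ama = Student("Ama", 72)                 # instantiation creates an object
kofi = Student("Kofi", 41)
print(ama.name, ama.passed())            # Ama True
print(kofi.name, kofi.passed())          # Kofi False
print(ama.school == kofi.school)         # True
```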
CASE
Credit: programiz
MODULES
MODULE
• Consider a module to be the same as a code library. It is a file containing
a set of functions you want to include in your application. To create a
module just save the code you want in a file with the file extension .py
function
MODULE
• Create another file in the same directory as the mathematics.py and name
it: use.py
• Note that we imported all the code in the mathematics file but used
only the add function.
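A self-contained sketch of the mathematics.py / use.py setup. Because a single block cannot span two files, it writes a hypothetical mathematics.py into a temporary folder and adds that folder to sys.path; in a real project both files would simply sit in the same directory. The `subtract` function is an assumption.

```python
import sys
import tempfile
from pathlib import Path

# Stand-in for mathematics.py in your project folder
folder = Path(tempfile.mkdtemp())
(folder / "mathematics.py").write_text(
    "def add(x, y):\n"
    "    return x + y\n"
    "\n"
    "def subtract(x, y):\n"
    "    return x - y\n"
)
sys.path.insert(0, str(folder))   # let Python find modules in that folder

# What use.py would contain:
import mathematics                # loads the whole module
print(mathematics.add(4, 5))      # 9 (subtract is loaded but unused)

from mathematics import add       # alternatively, import a single name
print(add(1, 2))                  # 3
```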
• https://www.programiz.com/python-programming
• https://www.w3schools.com/python/
• https://www.javatpoint.com/python-tutorial
• https://www.youtube.com/watch?v=QXeEoD0pB3E&list=PLsyeobzWxl7poL9JTVyndKe62ieoN-MZ3
NEXT STEPS
Data Pre-processing
ANY QUESTIONS??
INTRODUCTION TO PANDAS
PANDAS
Pandas is a Python library used for working with data sets.
Data Manipulation
e.g., transformation
READING DATA
We need to import the pandas package to use it.
Credit: w3schools
SERIES
In Pandas, a Series is just like a single column in a table.
Credit: w3schools
LOCATING ROW VALUE
Locating row value is similar to indexing
Credit: w3schools
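A sketch covering the first pandas slides: importing pandas, reading CSV data, taking a column as a Series, and locating rows with loc. The CSV text is hypothetical and read from memory via io.StringIO; with a real file you would pass its path to pd.read_csv().

```python
import io
import pandas as pd   # "pd" is the conventional alias

# Hypothetical CSV content
csv_text = """day,temperature,event
Mon,31,Rain
Tue,28,Sunny
Wed,33,Sunny
"""
df = pd.read_csv(io.StringIO(csv_text))

temps = df["temperature"]        # one column of a DataFrame is a Series
print(type(temps).__name__)      # Series

s = pd.Series([1, 7, 2], index=["x", "y", "z"])
print(s["y"])                    # 7: look up by label, much like indexing

print(df.loc[0])                 # the row labelled 0, as a Series
print(df.loc[[0, 2]])            # several rows, as a DataFrame
```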
DATA TYPE CONVERSION
Reading file
Credit: w3schools
SNAPSHOT OF THE DATA
HEAD AND TAIL
We can use the head() and tail() methods to get a snapshot of the first n
and last n rows of our data
Top 3 records
Last 3 records
Columns
Rows
DATAFRAME COLUMNS
Dataframe columns are the headings in the table.
Columns
For instance, these two queries are valid and both return the temperature
column. However, the first approach (attribute access) can be used only when the column
name is a valid Python identifier, e.g., it does not contain a space:
Unique values in
temperature column
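A sketch of head(), tail(), shape, the column list, the two ways of selecting a column, and unique(). The weather table is hypothetical; "wind speed" deliberately contains a space, so it can only be reached with the bracket form.

```python
import pandas as pd

df = pd.DataFrame({                       # hypothetical weather data
    "day": ["Mon", "Tue", "Wed", "Thu", "Fri"],
    "temperature": [31, 28, 33, 28, 30],
    "wind speed": [6, 9, 4, 7, 5],
})

print(df.head(3))          # first 3 rows
print(df.tail(3))          # last 3 rows
print(df.shape)            # (rows, columns) -> (5, 3)
print(list(df.columns))    # ['day', 'temperature', 'wind speed']

print(df.temperature.equals(df["temperature"]))  # True: same column either way
print(df["wind speed"].max())                     # 9 (bracket form needed here)
print(df["temperature"].unique())                 # [31 28 33 30]
```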
CONDITIONAL SELECTION
Sometimes, you may want to select columns or rows based on some conditions
Covariance matrix
Correlation matrix
Correlation matrix
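A hedged sketch of conditional selection and of the covariance and correlation matrices, on hypothetical numeric data. When combining conditions, pandas uses & (and) and | (or), and each condition needs its own parentheses.

```python
import pandas as pd

df = pd.DataFrame({                 # hypothetical data
    "temperature": [31, 28, 33, 28, 30],
    "humidity":    [70, 82, 65, 80, 74],
    "sales":       [120, 90, 140, 95, 110],
})

hot = df[df["temperature"] > 29]                                    # rows meeting a condition
hot_and_dry = df[(df["temperature"] > 29) & (df["humidity"] < 72)]  # two conditions combined
subset = df.loc[df["sales"] >= 100, ["temperature", "sales"]]       # rows and columns together

print(df.cov())    # covariance matrix
print(df.corr())   # Pearson correlation matrix, values between -1 and 1
```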
EVERYTHING AT A GLANCE
HANDLING MISSING VALUES
ANY MISSING VALUES?
Missing values can be handled in several ways. For instance, we may want to
drop, impute, or interpolate them, or fill them with some value (e.g., the average)
Have missing
values
WHEN ENTIRE ROW IS MISSING
We may want to drop such rows, or fill the gaps forward or backward from
neighbouring rows.
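A sketch of the missing-value strategies mentioned above on a small hypothetical table (row 1 is entirely missing). Each method returns a new DataFrame and leaves `df` unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temperature": [31.0, np.nan, 33.0, np.nan, 30.0],
    "humidity":    [70.0, np.nan, 65.0, 80.0, np.nan],
})
print(df.isnull().sum())              # number of missing values per column

dropped_any = df.dropna()             # drop rows with any missing value
dropped_all = df.dropna(how="all")    # drop rows only when every value is missing
mean_filled = df.fillna(df.mean())    # impute each column's average
forward = df.ffill()                  # copy the previous valid value forward
backward = df.bfill()                 # copy the next valid value backward
interpolated = df.interpolate()       # estimate linearly from neighbours
```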
DUPLICATES
Duplicated rows distort results. We handle them by deleting them.
No row is duplicated
Drop duplicates
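Detecting and dropping duplicate rows with pandas, on hypothetical data. duplicated() flags a row that exactly repeats an earlier one, and drop_duplicates() keeps the first occurrence by default.

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["Mon", "Tue", "Tue", "Wed"],
    "temperature": [31, 28, 28, 33],
})
print(df.duplicated())          # True marks a repeat of an earlier row
print(df.duplicated().sum())    # 1
clean = df.drop_duplicates()
print(len(clean))               # 3
```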
NEXT STEPS
TO THIS END…
We know;
Introduction to
Machine Learning
ANY QUESTIONS??
INTRODUCTION TO PYTHON FOR DATA SCIENCE
Pasty Asamoah
+233 (0) 546 116 102
Kwame Nkrumah University of Science and Technology
School of Business
Supply Chain and Information Systems Dept.
DATA VISUALISATION & STORY TELLING
INTRODUCTION
It is essential to choose the right visualization to communicate and tell the story
behind the data.
Key insights and understandings are lost when data is ineffectively presented,
which affects the story behind the data.
CHALLENGES OF DATA VISUALISATION
Nominal Comparisons
Time-series
Correlations
Ranking
Distribution
Part-to-whole relationship
Ideal for visualizing chronological data
Ideal for visualizing chronological data with long category names
TECHNIQUES AND VISUALISATIONS
Line Charts
Used to show changes over time (time-series) using data points, represented by dots, that are
connected by straight lines. Put differently, a line chart shows time-series relationships with continuous data.
Line charts help show trend, acceleration, deceleration, and volatility.
Scatter Plots
Used to show the relationship between two variables. They are best used to show correlation in large
data sets and to identify outliers.
Funnel
Used to visualize a linear process that has connected sequential stages. The value of each stage in the
process is indicated by the funnel's width as it gets narrower.
TECHNIQUES AND VISUALISATIONS
Cards
Mostly used to display KPIs (e.g., turnover).
TECHNIQUES AND VISUALISATIONS
Gauge
A gauge consists of a circular arc which shows a singular value that measures progress towards a KPI or
goal. The line on the arc represents the target or goal and the shading represents the progress made
towards it. The value inside of the arc shows the progress value.
TECHNIQUES AND VISUALISATIONS
Map
Used for visualizing data across different locations and distances, e.g., to answer questions about cities or
countries and related data such as number of employees, sales, etc.
TECHNIQUES AND VISUALISATIONS
Treemap
Used to display large quantities of hierarchically structured data, using nested rectangles. The chart
shows different perspectives of the data by displaying the rectangles as different sizes and colors based
on the frequency of occurrence. It is not ideal for visualising large categories
Output/graph
Markers
Credit: w3schools
MARKERS
Credit: w3schools
LINES
Lines
Credit: w3schools
LINES
Credit: w3schools
LINES
• Lines have other properties that allow for modifying color,
line width, etc.
color = 'red'
linewidth = 20.5
Credit: w3schools
MULTIPLE LINES
Credit: w3schools
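A sketch of markers, line properties and multiple lines with pyplot, on hypothetical y-values. The Agg backend renders off-screen so the script also runs without a display; in a notebook you can drop that line and call plt.show() instead of saving.

```python
import matplotlib
matplotlib.use("Agg")            # off-screen rendering
import matplotlib.pyplot as plt
import numpy as np

y1 = np.array([3, 8, 1, 10])     # hypothetical data
y2 = np.array([6, 2, 7, 11])

# marker highlights each point; linestyle, color and linewidth style the line
plt.plot(y1, marker="o", linestyle="dotted", color="red", linewidth=2.5)
# a second plot() call on the same axes adds another line
plt.plot(y2, marker="*", linestyle="dashed")
plt.savefig("lines.png")
```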
LABELS
• Have you noticed that none of our visuals communicates any
specific insight?
• Pyplot allows users to set labels that define the information
communicated, e.g., title, x-axis label, y-axis label
Credit: w3schools
ADDING LABELS
xlabel
Credit: w3schools
Title
ADDING TITLE
Y-values
Title
Credit: w3schools
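Adding a title and axis labels with pyplot. The data and label text are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

hours = [1, 2, 3, 4, 5]                  # hypothetical data
calories = [240, 250, 260, 270, 280]

plt.plot(hours, calories)
plt.title("Calories burned vs. hours of exercise")
plt.xlabel("Hours of exercise")
plt.ylabel("Calories burned")
plt.savefig("labelled_plot.png")
```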
MULTIPLE COLUMNS PLOT 1 row, 2 columns subplots
First subplot
Second subplot
Credit: w3schools
MULTIPLE ROWS PLOT 2 rows, 1 column subplots
First subplot
2 rows, 1 column
Second subplot
2 rows, 1 column
Credit: w3schools
MULTIPLE ROWS AND COLUMNS PLOT
Credit: w3schools
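A sketch of subplots with plt.subplot(rows, columns, index), here 1 row and 2 columns; the index counts from 1. For 2 rows and 1 column use plt.subplot(2, 1, n), and for a 2 x 2 grid use plt.subplot(2, 2, n) with n from 1 to 4. The data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]

plt.subplot(1, 2, 1)          # 1 row, 2 columns, first subplot
plt.plot(x, [3, 8, 1, 10])
plt.title("First")

plt.subplot(1, 2, 2)          # 1 row, 2 columns, second subplot
plt.plot(x, [10, 20, 30, 40])
plt.title("Second")

plt.suptitle("Two plots side by side")   # title for the whole figure
plt.savefig("subplots.png")
```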
SCATTER PLOTS
Credit: w3schools
SCATTER PLOTS
• Note that the scatter method allows for setting marker colors as
well
Credit: w3schools
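A scatter-plot sketch on hypothetical car data, showing one colour for all markers (color=) and one colour per point (c=).

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

age = [5, 7, 8, 7, 2, 17, 2, 9]            # hypothetical car ages
speed = [99, 86, 87, 88, 111, 86, 103, 87]  # hypothetical speeds

plt.scatter(age, speed, color="hotpink")            # same colour for every marker
plt.scatter([3, 12], [95, 80], c=["red", "blue"])   # one colour per point
plt.xlabel("Car age")
plt.ylabel("Speed")
plt.savefig("scatter.png")
```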
BAR PLOTS
• We can use the bar method for bar charts
• The bar method has properties that allow for changing colors, bar
width, height, etc.
• color = (string)
• width = (float), bar thickness for bar()
• height = (float), bar thickness for barh()
Credit: w3schools
BAR PLOTS
Credit: w3schools
BAR PLOTS
Credit: w3schools
HORIZONTAL BARS (barh)
Credit: w3schools
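Vertical and horizontal bars side by side, on hypothetical data. For bar() the bar thickness is set with width; for barh() the equivalent parameter is height.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]   # hypothetical data
values = [3, 8, 1, 10]

plt.subplot(1, 2, 1)
plt.bar(categories, values, color="green", width=0.5)     # vertical bars
plt.subplot(1, 2, 2)
plt.barh(categories, values, color="orange", height=0.5)  # horizontal bars
plt.savefig("bars.png")
```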
PIE CHARTS
• We can use the pie method to visualize data using a pie chart
Credit: w3schools
PIE CHARTS
• We can override the default colors
Credit: w3schools
CHART LEGEND
• We can easily define a legend for the chart
Legend
Credit: w3schools
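A pie chart with custom colours and a legend. The shares, labels and colours are hypothetical; plt.legend() builds its entries from the wedge labels passed to pie().

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

shares = [35, 25, 25, 15]                         # hypothetical data
labels = ["Apples", "Bananas", "Cherries", "Dates"]
colours = ["black", "hotpink", "b", "#4CAF50"]    # override the default colours

plt.pie(shares, labels=labels, colors=colours)
plt.legend(title="Four fruits")
plt.savefig("pie.png")
```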
NEXT STEPS
TO THIS END…
We know;
Introduction to
Machine Learning
ANY QUESTIONS??
learns to make
decisions by
interacting with an
environment
MACHINE LEARNING MODELS
Machine learning models can range from simple linear regression to
complex deep neural networks.
Clean data
Select model Check accuracy
Split data
OUR FIRST MACHINE LEARNING MODEL
Import
packages
Load data
DATA CLEANING
Handle duplicates
There are no
missing values
DATA CLEANING
Column data
types
We will be
working with
the integer data
types at this
stage.
FEATURE SELECTION
Predictors
What we want
to predict
MODEL SELECTION
Define: What type of model will it be? A decision tree? Some other
type of model? Some other parameters of the model type are
specified too.
The predictions
DECISION TREE
SIMPLE DECISION TREE MODEL
Clean data
Select model Check accuracy
Split data
DECISION TREE MODEL
Decision Tree
DECISION TREE
Import decision tree model from sklearn
Train model
Make predictions
Predicted vs. actual values are the
same. That is 100% accuracy.
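A minimal scikit-learn sketch of define, fit, predict with a decision tree. The housing data are generated at random and the column names are assumptions, not the dataset used in the lecture. Predicting the same rows the tree was trained on reproduces the 100% match described above.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical, randomly generated housing data
rng = np.random.default_rng(0)
houses = pd.DataFrame({"rooms": rng.integers(1, 6, 300), "area": rng.uniform(50, 300, 300)})
houses["price"] = 20_000 * houses["rooms"] + 900 * houses["area"] + rng.normal(0, 15_000, 300)

y = houses["price"]              # what we want to predict
X = houses[["rooms", "area"]]    # predictors (features)

model = DecisionTreeRegressor(random_state=1)   # define
model.fit(X, y)                                  # train
predictions = model.predict(X)                   # predict the SAME rows it trained on

print(predictions[:3])
print(y.head(3).to_numpy())      # identical: the tree has memorised these rows
```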
BUT WHY??
LET'S MODIFY OUR MODEL BY
INTRODUCING TRAINING AND TEST
DATASETS
We found that our model performed well with an accuracy of 100%.
This is unlikely in real-world scenarios.
The reason for the 100% accuracy is that we were trying to predict Y
values from X values that the model had already seen during the
training stage.
What about testing our model on data that the model has not seen
before?
Dataset for
training
Dataset for
testing
MODEL SELECTION
Train dataset
Test dataset
MODEL PERFORMANCE
Checks error
margin
Error margin
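A sketch of holding back a test set with train_test_split and measuring the error margin with mean absolute error (MAE), on the same kind of hypothetical data. The 25% test size is an assumption. Error on the training rows is essentially zero, while error on unseen rows is the honest estimate.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({"rooms": rng.integers(1, 6, 300), "area": rng.uniform(50, 300, 300)})
y = 20_000 * X["rooms"] + 900 * X["area"] + rng.normal(0, 15_000, 300)  # hypothetical prices

# Keep 25% of rows aside; the model never sees them during training
train_X, val_X, train_y, val_y = train_test_split(X, y, test_size=0.25, random_state=0)

model = DecisionTreeRegressor(random_state=1)
model.fit(train_X, train_y)

train_mae = mean_absolute_error(train_y, model.predict(train_X))
val_mae = mean_absolute_error(val_y, model.predict(val_X))
print(f"MAE on training rows: {train_mae:,.0f}")   # about 0: memorised
print(f"MAE on unseen rows:   {val_mae:,.0f}")     # the real error margin
```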
LET'S MODIFY THE MODEL A BIT BY
SPECIFYING LEAVES
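A sketch of limiting tree size with the max_leaf_nodes parameter and comparing validation MAE for a few candidate values. The helper `get_mae` and the candidate list are hypothetical; which value wins depends on the data.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({"rooms": rng.integers(1, 6, 300), "area": rng.uniform(50, 300, 300)})
y = 20_000 * X["rooms"] + 900 * X["area"] + rng.normal(0, 15_000, 300)  # hypothetical prices
train_X, val_X, train_y, val_y = train_test_split(X, y, test_size=0.25, random_state=0)

def get_mae(max_leaf_nodes):
    """Hypothetical helper: validation MAE of a tree capped at max_leaf_nodes leaves."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=1)
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))

scores = {leaves: get_mae(leaves) for leaves in (5, 25, 50, 100, 200)}
for leaves, mae in scores.items():
    print(f"max_leaf_nodes={leaves:>3}  validation MAE={mae:,.0f}")

best = min(scores, key=scores.get)   # fewest errors on unseen data
print("Best:", best)
```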
Importing LabelEncoder
LABEL ENCODERS
Columns of interest. We
believe that these columns
predict house prices. We
need to convert them to
numerical form
TRANSFORMING CATEGORICAL COLUMNS
Instantiate LabelEncoder
Transform values
Categorical column to convert
ADD TRANSFORMED COLUMNS TO
DATAFRAME
Select columns based on data types: exclude columns with data type object.
Drop the price column. By default, it will be included because we are selecting all columns other than objects.
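A sketch of the LabelEncoder workflow on a hypothetical housing table: instantiate the encoder, transform a categorical column into integer codes, then keep only non-text columns and drop the target. The slides exclude object columns; this sketch selects numeric columns instead, which has the same effect here. Note that LabelEncoder assigns codes in alphabetical order, which implies an ordering the categories may not really have.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

houses = pd.DataFrame({                     # hypothetical rows
    "type":  ["house", "unit", "townhouse", "unit", "house"],
    "rooms": [3, 2, 3, 1, 4],
    "price": [450_000, 300_000, 390_000, 250_000, 600_000],
})

encoder = LabelEncoder()                                        # instantiate
houses["type_encoded"] = encoder.fit_transform(houses["type"])  # learn categories, transform
print(list(encoder.classes_))   # ['house', 'townhouse', 'unit'] -> 0, 1, 2

# Keep the numeric columns as predictors, then drop the target
X = houses.select_dtypes(include="number").drop(columns="price")
y = houses["price"]
print(list(X.columns))           # ['rooms', 'type_encoded']
```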
DUMMIES columns
Pandas method to handle
categorical columns
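A sketch of pd.get_dummies(), which replaces a categorical column with one 0/1 column per category, so no artificial order is implied. The data and column names are hypothetical; dtype=int asks for 0/1 integers instead of booleans.

```python
import pandas as pd

houses = pd.DataFrame({                     # hypothetical rows
    "type":  ["house", "unit", "townhouse", "unit"],
    "rooms": [3, 2, 3, 1],
    "price": [450_000, 300_000, 390_000, 250_000],
})

encoded = pd.get_dummies(houses, columns=["type"], dtype=int)
print(list(encoded.columns))
# ['rooms', 'price', 'type_house', 'type_townhouse', 'type_unit']

X = encoded.drop(columns="price")   # predictors
y = encoded["price"]                # target
```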
Task 1: Build a model with either linear
regression or a decision tree and report on the
best model. Remember to apply all the skills and
knowledge you have acquired, especially
splitting the data set into training and testing sets and
encoding categorical columns
ENSEMBLE MODELS
RANDOM FOREST MODEL
Ensemble models combine multiple individual models to improve predictive
performance. A popular ensemble method is RandomForest, but there are
others like Gradient Boosting and AdaBoost.
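A sketch comparing a single decision tree with a RandomForestRegressor on hypothetical data. The forest trains many trees on bootstrap samples and averages their predictions. scikit-learn's GradientBoostingRegressor and AdaBoostRegressor can be swapped in with the same fit/predict calls; which model wins depends on the data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({"rooms": rng.integers(1, 6, 300), "area": rng.uniform(50, 300, 300)})
y = 20_000 * X["rooms"] + 900 * X["area"] + rng.normal(0, 15_000, 300)  # hypothetical prices
train_X, val_X, train_y, val_y = train_test_split(X, y, test_size=0.25, random_state=0)

tree = DecisionTreeRegressor(random_state=1).fit(train_X, train_y)
forest = RandomForestRegressor(n_estimators=100, random_state=1)   # 100 trees
forest.fit(train_X, train_y)

tree_mae = mean_absolute_error(val_y, tree.predict(val_X))
forest_mae = mean_absolute_error(val_y, forest.predict(val_X))     # averaged predictions
print(f"Single tree MAE:   {tree_mae:,.0f}")
print(f"Random forest MAE: {forest_mae:,.0f}")
```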
ANY QUESTIONS??