
06 - The Basics of Python in DS

Introduction to data science

Python for data science


Introduction to Python
Basic types and control flow
Intro to Python: part 1

 Intro to IDLE, Python


 Keyword: print
 Types: str, int, float
 Variables
 User input
 Saving your work
 Comments
 Conditionals
 Type: bool
 Keywords: and, or, not, if, elif, else
IDLE: Integrated Development and Learning Environment

 Shell
 Evaluates what you enter and displays output
 Interactive
 Type at “>>>” prompt

 Editor
 Create and save .py files
 Can run files and display output in the shell
A simple example: Hello world!

Syntax highlighting:
 IDLE colors code differently depending on functionality: keywords, strings, results

Some common keywords:
if else and or class while for
break elif in def not from import
(In Python 3, print is a built-in function rather than a keyword.)
Fundamental concepts in Python

 Variable: a named location in memory.
 Names of variables may contain letters, digits and underscores (_); a name:
 Cannot begin with a digit
 Cannot be a keyword
 Can use Unicode letters and digits (from Python 3)
 All variables in Python refer to objects. Each object has a type and a location in memory (id)

Fundamental concepts in Python

 Variables in Python:
 Case sensitive
 No need to declare in program
 No need to specify the data type
 Can be changed to another data type
 A variable must be assigned a value before it is used
 Basic data types:
 String: str
 Number: integer, float, fraction and complex
 list, tuple, dictionary
Fundamental concepts in Python

 Variables
 To re-use a value in multiple computations, store it in a variable.
 Python is “dynamically-typed”, so you can change the type of value stored (differs from Java, C#, C++, …).
>>> someVar = 2
>>> print(someVar)   # it's an int
2
>>> someVar = "Why hello there"
>>> print(someVar)   # now a str
Why hello there
Fundamental concepts in Python

 Input data for variables
 <variable name> = input("prompt")
 input() always returns a string; convert it to another data type when reading:
<variable name> = <data type>(input('prompt'))
a = int(input('a='))
 Display data
 print(value1, value2, …)
 print("string", value)
 Display strings on one line: add end='' to the print call.
 Examples:
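A minimal sketch of the input and display steps above. To keep it runnable without a keyboard, the text a user would type is stored in the hypothetical variable `typed`; in a real program that value would come from `input('a=')`.

```python
# 'typed' stands in for what input('a=') would return: always a str.
typed = "42"

a = int(typed)      # convert the text to an int before doing arithmetic
b = float(typed)    # or to a float

print("a =", a, "b =", b)     # several values, separated by spaces
print("no newline", end="")   # end='' keeps the output on the same line
print(" ... continued")
```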

Data types in Python

 String
 We’ve already seen one type in Python, used for words and phrases.
 In general, this type is called “string”.
 In Python, it’s referred to as str.
Data types in Python

Python also has types for numbers.
int – integers
float – floating point (decimal) numbers
>>> print(4)        # int
4
>>> print(6.)       # float
6.0
>>> print(2.3914)   # float
2.3914
Data types in Python

 int
 In Python 3.X, int has unlimited range.
 Can be used for the computations on very large numbers.
 Common operators

Operation                          Syntax    Example
Addition                           x + y     20 + 3 = 23
Subtraction                        x - y     20 - 3 = 17
Multiplication                     x * y     20 * 3 = 60
Division                           x / y     20 / 3 = 6.666...
Division with the rounded result   x // y    20 // 3 = 6
Remainder                          x % y     20 % 3 = 2
Exponent                           x ** y    20 ** 3 = 8000
Unary operators                    +x, -x

Data types in Python

 int
 Examples
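As an illustration, the integer operators can be tried on the same values used in the operator table (20 and 3). The variable names are chosen only for this sketch.

```python
x, y = 20, 3

print(x + y)    # 23
print(x - y)    # 17
print(x * y)    # 60
print(x / y)    # 6.666666666666667 (true division always gives a float)
print(x // y)   # 6  (floor division)
print(x % y)    # 2  (remainder)
print(x ** y)   # 8000
print(-x)       # -20 (unary minus)

# int has no fixed upper limit
big = 2 ** 100
print(big)      # 1267650600228229401496703205376
```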

Data types in Python

 Float
 Values: distinguished from integers by the decimal point. The integer part and the fractional part are separated by ‘.’
 Operators: +, –, *, /, ** and unary operators
 Use the decimal module for higher precision: from decimal import *
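A short sketch of why the decimal module can matter: binary floats cannot store 0.1 exactly, while Decimal values built from strings keep their decimal digits. The variable names are illustrative.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 exactly, so small errors appear.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)                  # False

# Decimal values built from strings keep the exact decimal digits.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))     # True
```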

Data types in Python - bool

Boolean values are true or false.
Python has the values True and False (note the capital letters!).
You can compare values with ==, !=, <, <=, >, >=, and the result of these expressions is a bool.
>>> a = 2
>>> b = 5
>>> a > b
False
>>> a <= b
True
>>> a == b   # does a equal b?
False
>>> a != b   # does a not-equal b?
True
Data types in Python - bool

When combining Boolean expressions, parentheses are your friends.
>>> a = 2
>>> b = 5
>>> False == (a > b)
True
Keywords: and, or, not

and is True if both parts evaluate to True, otherwise False
or is True if at least one part evaluates to True, otherwise False
>>> a = 2
>>> b = 5
>>> a < b and False
False
>>> a < b or a == b
True
>>> a < b and a == b
False
>>> True and False
False
>>> True and True
True
>>> True or False
True
Keywords: and, or, not

and is True if both parts evaluate to True, otherwise False
or is True if at least one part evaluates to True, otherwise False
not is the opposite of its argument
>>> not True
False
>>> not False
True
>>> True and (False or not True)
False
>>> True and (False or not False)
True
Mathematics functions in Python

The list of numeric and mathematical modules

 math: Mathematical functions (sin() etc.).
 cmath: Mathematical functions for complex numbers.
 decimal: Decimal fixed-point and floating-point arithmetic.
 random: Generates pseudo-random numbers with various common probability distributions.
 itertools: Functions that create iterators for efficient looping.
 functools: Higher-order functions and operations on callable objects.
 operator: Standard Python operators as functions.
Mathematics functions in Python

 math
exp(x)
log(x[, base])
log10(x)
pow(x, y)
sqrt(x)
acos(x)
asin(x)
atan(x) atan2(y, x)
cos(x) hypot(x, y)
sin(x) tan(x)
degrees(x) radians(x)
cosh(x) sinh(x) tanh(x)
Constant number: pi, e
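A few of the listed math functions in use. This is only a small sample; the input values are picked for illustration.

```python
import math

print(math.sqrt(16))            # 4.0
print(math.hypot(3, 4))         # 5.0
print(math.pow(2, 10))          # 1024.0
print(math.degrees(math.pi))    # 180.0

# Floating-point results are best compared with a tolerance.
half = math.sin(math.radians(30))
print(math.isclose(half, 0.5))             # True
print(math.isclose(math.log(8, 2), 3.0))   # True
```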
Built-in functions in Python
Conditionals: if, elif, else
Conditionals: if, elif, else

 The keywords if and else provide a way to control the flow of your program.
 Python checks each condition in order, and executes the block (whatever’s indented) of the first one to be True.
Conditionals: if, elif, else

The keywords if, elif, and else provide a way to control the flow of your program.
Python checks each condition in order, and executes the block (whatever’s indented) of the first one to be True.
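A minimal sketch of that rule: conditions are tested top to bottom and only the first True block runs. The variable `score` and its thresholds are hypothetical.

```python
score = 72   # hypothetical value, chosen only for illustration

if score >= 90:
    grade = "A"
elif score >= 70:
    grade = "B"      # first True condition: this block runs
elif score >= 50:
    grade = "C"      # also True, but skipped because an earlier block ran
else:
    grade = "F"

print(grade)   # B
```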
Conditionals: if, elif, else

Indentation is important in Python!

Make sure each if, elif, and else has a colon after it, and its block is indented one level (4 spaces by convention).
Conditionals: if, elif, else

Be careful what you compare to the result of input (raw_input in Python 2). It is a string, not a number.

# The right way: str to str or int to int
>>> gradYear = input("When do you plan to graduate? ")
When do you plan to graduate? 2019
>>> gradYear == 2019        # gradYear is not an integer :(
False
>>> gradYear == "2019"      # gradYear is a string
True
>>> int(gradYear) == 2019   # cast gradYear to an int :)
True
Conditionals: if, elif, else

Be careful how you compare the result of input (raw_input in Python 2). It is a string, not a number.
Converting a string that is not a number leads to a ValueError:

>>> gradYear = input("When do you plan to graduate? ")
When do you plan to graduate? Sometime
>>> int(gradYear) == 2019
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    int(gradYear) == 2019
ValueError: invalid literal for int() with base 10: 'Sometime'
Nested if

 Syntax:
if condition1:
    tasks_1
    if condition2:
        tasks_2
    elif condition3:
        tasks_3
    else:
        tasks
elif condition4:
    tasks_4
else:
    tasks_5

 Example:
var = 100
if var < 200:
    print("The value of variable is less than 200")
    if var == 150:
        print("The value is 150")
    elif var == 100:
        print("The value is 100")
    elif var == 50:
        print("The value is 50")
    elif var < 50:
        print("The value of variable is less than 50")
else:
    print("There is no true condition")
Nested if

 Example:
var = int(input('Enter a value: '))
if var < 200:
    print("The value of variable is less than 200")
    if var == 150:
        print("The value is 150")
    elif var == 100:
        print("The value is 100")
    elif var == 50:
        print("The value is 50")
    elif var < 50:
        print("The value of variable is less than 50")
else:
    print("There is no true condition")
Exercise

 Enter the coordinates of 3 points A, B and C on the 2-dimensional plane. Check whether triangle ABC is an equilateral triangle.
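One possible approach, sketched under assumptions: the points are fixed tuples instead of keyboard input, and the helper names `side` and `is_equilateral` are made up for this example. The three side lengths are compared with a tolerance, because exact equality rarely holds for floats.

```python
import math

def side(p, q):
    """Distance between two points given as (x, y) tuples."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def is_equilateral(a, b, c):
    ab, bc, ca = side(a, b), side(b, c), side(c, a)
    # All three sides must be non-zero and (nearly) equal.
    return ab > 0 and math.isclose(ab, bc) and math.isclose(bc, ca)

A, B, C = (0.0, 0.0), (2.0, 0.0), (1.0, math.sqrt(3))
print(is_equilateral(A, B, C))                  # True
print(is_equilateral((0, 0), (2, 0), (0, 2)))   # False
```

In an interactive program each coordinate could be read with `float(input(...))` before calling the function.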
WHILE, FOR loops - Syntax
WHILE loop - Examples

 Example 1: while without else


count = 0
while count < 5:
    print('Your sequence number is:', count)
    count = count + 1
WHILE loop - Examples

• Example 2: while with else


count = 0
while count < 5:
    print(count, " is less than 5")
    count = count + 1
else:
    print(count, " is not less than 5")
FOR loop - Examples

 Example 1: FOR without else


for i in range(0, 10):
    print('The sequence number is:', i)
FOR loop - Examples

• Example 2: FOR with else


for i in range(0, 10):
    print('The sequence number is:', i)
else:
    print('The last number!')
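The else block of a loop runs only when the loop ends without hitting break. A small sketch of both cases, with illustrative names `target` and `visited`:

```python
target = 7
for i in range(0, 10):
    if i == target:
        print("Found", i)
        break                  # leaving with break skips the else block
else:
    print("Not found")

visited = []
for i in range(3):
    visited.append(i)
else:
    visited.append("done")     # no break happened, so else runs

print(visited)   # [0, 1, 2, 'done']
```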
Nested loops - Exercise

 Example: Find all prime numbers that are less than 100
i = 2
while i < 100:
    j = 2
    while j <= i / j:
        if not (i % j): break
        j = j + 1
    if j > i / j: print(i, " is a prime number!")
    i = i + 1
Lists

 A sequence of items
 Has the ability to grow (unlike a fixed-size array)
 Use indexes to access elements (array notation)
 Examples
aList = []
another = [1,2,3]
 You can print an entire list or an element
print(another)
print(another[0])
 Index -1 accesses the last element of a list
List operation

 The append method adds elements (they don't have to be the same type)
aList.append(42)
aList.append(another)
 del removes elements
del aList[0] # removes 42
 Concatenate lists with +
 Repeat a list with *
zerolist = [0] * 5
 Multiple assignments
point = [1,2]
x, y = point
 More operations can be found at
http://docs.python.org/lib/types-set.html
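The list operations above, collected into one runnable sketch. It reuses the slide's names (`aList`, `another`, `zerolist`, `point`); `joined` is added only for this example.

```python
aList = []
another = [1, 2, 3]

aList.append(42)
aList.append(another)      # a list can hold another list
print(aList)               # [42, [1, 2, 3]]

del aList[0]               # removes 42
print(aList)               # [[1, 2, 3]]

joined = another + [4, 5]  # concatenation builds a new list
zerolist = [0] * 5         # repetition
print(joined, zerolist)    # [1, 2, 3, 4, 5] [0, 0, 0, 0, 0]
print(another[-1])         # 3, index -1 is the last element

point = [1, 2]
x, y = point               # unpacking: x is 1, y is 2
```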
Exercises

1. Enter a string and check whether it is a valid email address (a valid email address can be considered as one containing the @ character).

2. Given a random sequence A of 100 integer elements (values in the range 1 to 300), separate all the odd elements into another sequence (B).
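One way these two exercises might be approached, as a sketch rather than a reference solution. The function name `looks_like_email` and the sample strings are hypothetical, and the email rule is only the simplified one stated in the exercise.

```python
import random

def looks_like_email(text):
    # Exercise 1 rule: valid if the string contains the '@' character.
    return "@" in text

print(looks_like_email("student@example.com"))   # True
print(looks_like_email("not an email"))          # False

# Exercise 2: 100 random integers between 1 and 300 (both ends included).
A = [random.randint(1, 300) for _ in range(100)]
B = [value for value in A if value % 2 == 1]     # keep the odd elements
print(len(A), len(B))
```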
Interacting with user

 Obtaining data from a user
 Use the function input, which returns a string; convert the result with int() or float() for numbers (Python 2 used raw_input for strings)
 Example
name = input("What's your name?")
 Command Line arguments
 Example:
import sys
for item in sys.argv:
    print(item)
 Remember sys.argv[0] is the program name
Libraries in Python
Libraries in Python

 Data processing in Python

 Numpy

 Matplotlib

 Pandas

 Scikit-learn
Data processing in Python

 Basic data types: string, number, boolean


 Other data types: set, dictionary, tuple, list, file

 The errors in Python
 Syntax error: the code violates the syntax rules, so the program cannot be compiled and run.
 Exception: an abnormal event at run time, where execution does not proceed as designed
Data processing in Python

Dealing with exceptions: using up to 4 blocks

 “try” block: code that is likely to cause an error. When an error occurs, this block stops at the line that caused the error
 “except” block: error handling code, only executed if an error occurs, otherwise it is skipped
 “else” block: can appear right after the last except block; its code is executed if no except block ran (the try block raised no error)
 “finally” block: also known as the clean-up block, always executed whether an error occurs or not
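The four blocks can be sketched with a string-to-integer conversion, the same operation that raised ValueError in the earlier conditionals example. The helper name `to_int` is an assumption made for this illustration.

```python
def to_int(text):
    """Convert text to int, returning None when it is not a number."""
    try:
        value = int(text)                # may raise ValueError
    except ValueError:
        print("not a number:", text)     # runs only when int() failed
        value = None
    else:
        print("converted:", value)       # runs only when no error occurred
    finally:
        print("done with", repr(text))   # always runs
    return value

print(to_int("2019"))       # 2019
print(to_int("Sometime"))   # None
```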
Numpy

 The main object of numpy is the homogeneous multidimensional array:
 The data types of elements in the array must be the same
 Data can be one-dimensional or multi-dimensional arrays
 The dimensions (axes) are numbered from 0 onwards
 The number of dimensions is called rank.
 There are up to 24 different number types
 The ndarray type is the main class that handles multidimensional array data
 Lots of functions and methods for handling matrices
Numpy

 Syntax: import numpy [as <new name>]


 Create array:
<variable name>=<library name>.array(<value>)
 Access: <variable name>[<index>]
 Examples:
import numpy as np
x = np.arange(3.0)
a = np.zeros((2, 2))
b = np.ones((1, 2))
c = np.full((3, 2, 2), 9)
d = np.eye(2)
e = np.random.random([3, 2])
Numpy

 Examples:
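To see what the array-creation calls produce, a short sketch can print the number of dimensions, shape and element type of each result. It assumes numpy is installed; the loop is added only for inspection.

```python
import numpy as np

x = np.arange(3.0)            # [0. 1. 2.]
a = np.zeros((2, 2))          # 2x2 array of zeros
b = np.ones((1, 2))           # 1x2 array of ones
c = np.full((3, 2, 2), 9)     # 3x2x2 array filled with 9
d = np.eye(2)                 # 2x2 identity matrix
e = np.random.random([3, 2])  # 3x2 array of random values in [0, 1)

for name, arr in [("x", x), ("a", a), ("b", b), ("c", c), ("d", d), ("e", e)]:
    # ndim is the number of axes, shape is the size along each axis
    print(name, arr.ndim, arr.shape, arr.dtype)
```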
Numpy
 Access by index (slicing)
import numpy as np
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
row_r1 = a[1, :]      # 1-dimensional array of length 4
row_r2 = a[1:2, :]    # 2-dimensional array 1x4
print(row_r1, row_r1.shape)   # Display "[5 6 7 8] (4,)"
print(row_r2, row_r2.shape)   # Display "[[5 6 7 8]] (1, 4)"

col_r1 = a[:, 1]      # 1-dimensional array of length 3
col_r2 = a[:, 1:2]    # 2-dimensional array 3x1
print(col_r1, col_r1.shape)   # Display "[ 2  6 10] (3,)"
print(col_r2, col_r2.shape)   # Display "[[ 2]
                              #          [ 6]
                              #          [10]] (3, 1)"
Numpy

import numpy as np
x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

print(x + y)        # same as print(np.add(x, y))
print(x - y)        # same as print(np.subtract(x, y))
print(x * y)        # same as print(np.multiply(x, y)), element-wise
print(x / y)        # same as print(np.divide(x, y))
print(np.sqrt(x))   # applied to all elements of x
print(2**x)         # applied to all elements of x
matplotlib

 “matplotlib” is a library specializing in plotting, built on top of numpy
 “matplotlib” has the goal of simplifying charting work as much as possible, down to "just a few lines of code"
 “matplotlib” supports a wide variety of chart types, especially those used in research or economics, such as line charts, histograms, spectra, correlations, error charts, scatter plots, etc.
 The structure of matplotlib consists of many parts, serving different purposes
matplotlib

 Necessary condition: available data
 There can be 4 basic steps:
Step 1: Choose the right chart type
 Depends a lot on the type of data
 Depends on the user's intended use
Step 2: Set parameters for the chart
 Axis parameters: labels, scale (tick spacing), ...
 Highlights on the chart
 Perspective, fill pattern, color and other details
 Additional information
Step 3: Draw the chart
Step 4: Save to file
matplotlib

 Some charts drawn by using matplotlib


matplotlib

 The graph shows the correlation between X and Y
 Syntax:
plot([x], y, [fmt], data=None, **kwargs)
plot([x], y, [fmt], [x2], y2, [fmt2], ..., **kwargs)
 “fmt” is the line drawing specification (format string)
 “data” is an object with labelled data (for example a dict or DataFrame); when it is given, x and y can be passed as label names
 **kwargs: line drawing parameters
 plot can be called multiple times on one chart
 The returned result is a list of Line2D objects
matplotlib

 fmt = '[color][marker][line]'
 [colors] :
 ‘b’ – blue
 ‘g’ – green
 ‘r’ – red
 ‘c’ – cyan
 ‘m’ – magenta
 ‘y’ – yellow
 ‘k’ – black
 ‘w’ – white
 #rrggbb – specifies the color code in the RGB system
matplotlib

Line plot
 [marker] – the notation for data points:
 ‘o’ – circle
 ‘v’ – triangle (also ‘^’, ‘<’, ‘>’)
 ‘*’ – star
 ‘.’ – dot
 ‘p’ – pentagon
 …
 [line] – line type:
 ‘-’ solid line
 ‘--’ dashed line
 ‘-.’ dash-dot line
 ‘:’ dotted line
matplotlib

Example – Line plot


import numpy as np
import matplotlib.pyplot as plt
# divide the interval 0-5 with the step of 0.2
t = np.arange(0., 5., 0.2)
# Draw three lines:
# - red dash line: y = x
# - blue, square marker : y = x^2
# - green, triangle marker: y = x^3
plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')
plt.show()


matplotlib

Example – Bar plot

import matplotlib.pyplot as plt
D = {'MIS': 60,
     'AC': 310,
     'AAI': 360,
     'BDA': 580,
     'FDB': 340, 'MKT': 290}
plt.bar(range(len(D)), list(D.values()), align='center')
plt.xticks(range(len(D)), list(D.keys()))
plt.title('The majors in IS')
plt.show()


matplotlib

Example – Pie plot
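A possible pie chart, sketched with the same dictionary as the bar plot example. The "Agg" backend is an assumption made here so that the code draws off-screen without opening a window; in an interactive session that line can be dropped and `plt.show()` called at the end.

```python
import matplotlib
matplotlib.use("Agg")            # assumption: draw off-screen, no window needed
import matplotlib.pyplot as plt

D = {'MIS': 60, 'AC': 310, 'AAI': 360, 'BDA': 580, 'FDB': 340, 'MKT': 290}

fig, ax = plt.subplots()
ax.pie(
    list(D.values()),
    labels=list(D.keys()),
    autopct='%1.1f%%',           # print each share as a percentage
)
ax.set_title('The majors in IS')
# Use plt.show() interactively, or fig.savefig(<file name>) to keep the chart.
```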


matplotlib

Example – subplot
import numpy as np
import matplotlib.pyplot as plt
x1 = np.linspace(0.0, 5.0)
x2 = np.linspace(0.0, 2.0)
y1 = np.cos(2 * np.pi * x1) * np.exp(-x1)
y2 = np.cos(2 * np.pi * x2)
plt.subplot(2, 1, 1)      # 2 rows, 1 column, first plot
plt.plot(x1, y1, 'o-')
plt.subplot(2, 1, 2)      # 2 rows, 1 column, second plot
plt.plot(x2, y2, '.-')
plt.show()
Pandas

 “pandas” is a library built on top of numpy, specializing in processing tabular data
 The name “pandas” is derived from “panel data”
Pandas

 Read data from multiple formats
 Data alignment and integrated handling of missing data
 Reshape and pivot data easily
 Slice, index, and subset large data sets based on labels
 Data can be grouped for aggregation and transformation purposes
 Filter data and perform queries on the data
 Time series data processing and sampling
Pandas

 Pandas data has 3 main structures:
 Series: 1-dimensional structure, an array of data of a single type
 DataFrame (frame): 2-dimensional structure, data within each column has a single type (somewhat like a table in SQL, but with named rows)
 Panel: 3-dimensional structure, can be viewed as a set of dataframes with additional information
 Series data is similar to the array type in numpy, but there are two important differences:
 Accepts missing data (NaN – unknown)
 Rich indexing system (like a dictionary)
Pandas

 General syntax:
pd.DataFrame(data, index, columns, dtype, copy)
 Where:
 ‘data’ accepts values of many different types such as list, dictionary, ndarray, series,... and even other DataFrames
 ‘index’ is the row index labels of the dataframe
 ‘columns’ is the column labels of the dataframe
 ‘dtype’ is the data type to force on the columns
 ‘copy’ takes the value True/False to indicate whether data is copied to a new memory area; default is False (newer pandas versions default to None and decide from the input type)
Pandas – Series

import pandas as pd
import numpy as np

chi_so = ["KT", "KT", "CNTT", "Co khi"]   # index labels, with a duplicate
gia_tri = [310, 360, 580, 340]            # values
S = pd.Series(gia_tri, index=chi_so)
print(S)
print(S.index)
print(S.values)

Output:
KT        310
KT        360
CNTT      580
Co khi    340
dtype: int64
Index(['KT', 'KT', 'CNTT', 'Co khi'], dtype='object')
[310 360 580 340]
Pandas – Series

Attributes and methods of Series
 S.axes: returns a list of the indexes of S
 S.dtype: returns the data type of S's elements
 S.empty: returns True if S is empty
 S.ndim: returns the number of dimensions of S (1)
 S.size: returns the number of elements of S
 S.values: returns the elements of S as an array
 S.head(n): returns the first n elements of S
 S.tail(n): returns the last n elements of S
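These members can be tried on the Series from the previous example. This sketch assumes pandas is installed and rebuilds the same data inline.

```python
import pandas as pd

S = pd.Series([310, 360, 580, 340], index=["KT", "KT", "CNTT", "Co khi"])

print(S.dtype)              # element type of the values
print(S.empty)              # False
print(S.ndim, S.size)       # 1 4
print(S.head(2).tolist())   # [310, 360]
print(S.tail(1).tolist())   # [340]
print(S["KT"].tolist())     # [310, 360], a duplicated label returns every match
```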
Pandas – Series

Operations on Series
import pandas as pd
import numpy as np

chi_so = ["Ke toan", "KT", "CNTT", "Co khi"]
gia_tri = [310, 360, 580, 340]
# If the index is the same, combine it, otherwise NaN
S = pd.Series(gia_tri, index=chi_so)
P = pd.Series([100, 100], ['CNTT', 'PM'])
Y = S + P
print(Y)

Output:
CNTT       680.0
Co khi       NaN
KT           NaN
Ke toan      NaN
PM           NaN
dtype: float64
Pandas - Frame

 Create dataframe from list

names_rank = [['MIT', 1], ["Stanford", 2], ["DHTL", 200]]
df = pd.DataFrame(names_rank)
print(df)

Output:
          0    1
0       MIT    1
1  Stanford    2
2      DHTL  200
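The same list can be given explicit labels through the `index` (row labels) and `columns` (column labels) parameters of `pd.DataFrame`. The label values below are hypothetical, chosen only for this sketch.

```python
import pandas as pd

names_rank = [['MIT', 1], ['Stanford', 2], ['DHTL', 200]]
df = pd.DataFrame(
    names_rank,
    index=['a', 'b', 'c'],         # row labels (hypothetical)
    columns=['school', 'rank'],    # column labels (hypothetical)
)

print(df)
print(df.loc['b', 'school'])       # Stanford
print(df['rank'].tolist())         # [1, 2, 200]
```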
Pandas - Panel

• Panels are widely used in econometrics
 The data has 3 axes:
 Items (axis 0): each item is an internal dataframe
 Major axis (axis 1, the main axis): rows
 Minor axis (axis 2, the minor axis): columns
• No longer developed (replaced by MultiIndex)
Pandas - Panel

Syntax:
pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)
Where:
 ‘data’ can accept the following data types: ndarray,
series, map, lists, dict, constants and other dataframes
 ‘items’ is axis = 0
 ‘major_axis’ is axis = 1
 ‘minor_axis’ is axis = 2
 ‘dtype’ is the data type of each column
 ‘copy’ takes the value True/False to determine whether
the data shares memory or not
scikit-learn (sklearn)
Basic machine learning problem classes

 Linear regression
 Data clustering
 Data classification
Linear regression

import matplotlib.pyplot as plt


import pandas as pd
import numpy as np
from sklearn import linear_model, metrics

# read the data from a CSV file
df = pd.read_csv("nguoi.csv", index_col = 0)
print(df)

# draw the figure (column Cao is the height, column Nang is the weight)
plt.plot(df.Cao, df.Nang, 'ro')
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.show()
Linear regression

 Using the previous data, with an added gender column (Nam/Nu, i.e. male/female)
 Using the same method as before, to see how gender affects weight
Linear regression

import matplotlib.pyplot as plt


import pandas as pd
import numpy as np
from sklearn import linear_model, metrics

df = pd.read_csv("nguoi2.csv", index_col = 0)
print(df)

df['GT'] = df.Gioitinh.apply(lambda x: 1 if x=='Nam' else 0)


print(df)
Linear regression

# Train the model
X = df.loc[:, ['Cao', 'GT']].values
y = df.Nang.values
model = linear_model.LinearRegression()
model.fit(X, y)

# Show the information of the model
mse = metrics.mean_squared_error(y, model.predict(X))
print("Mean squared error: ", mse)
print("Regression coefficients: ", model.coef_)
print("Intercept: ", model.intercept_)
print(f"[weight] = {model.coef_} x [height, sex] + {model.intercept_}")
Linear regression

# Apply the model to some cases
while True:
    x = float(input("Enter the height (0 to stop): "))
    if x <= 0: break
    print("Male with the height ", x, " cm, will have the weight ", model.predict([[x, 1]]))
    print("Female with the height ", x, " cm, will have the weight ", model.predict([[x, 0]]))
scikit-learn (sklearn)

 Data clustering
from sklearn.cluster import KMeans
 Data classification
from sklearn.naive_bayes import GaussianNB
from sklearn import tree
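A minimal sketch of how the clustering and classification imports might be used. The six points and their labels are synthetic, made up only for this illustration, and the parameter choices (`n_clusters=2`, `n_init=10`, `random_state=0`) are assumptions, not taken from the slides.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import GaussianNB

# Synthetic, illustrative data: two well-separated groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
y = np.array([0, 0, 0, 1, 1, 1])   # class labels, used only by the classifier

# Clustering: no labels are given, the groups are discovered from X alone.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)

# Classification: learn from labelled examples, then predict new points.
clf = GaussianNB().fit(X, y)
predicted = clf.predict([[1.0, 0.9], [8.1, 8.0]])
print(predicted)                   # [0 1]
```

The cluster numbers assigned by KMeans are arbitrary, so only the grouping (which points share a label) is meaningful.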
Classification
Clustering
Exercises

Choose three of the following models:


- Linear regression
- Classification based on Naïve Bayes
- Classification based on SVM
- K-means clustering
- FCM clustering
Apply the selected models on 2 datasets taken
from the standard data set of the computer
