Python 20
Submitted To-
Dr.Md.Mawarul Islam
Associate Professor
Submitted By-
Susmita Rani Saha (B180305047)
Tanvir Ahammed Hridoy (b180305020)
Jagannath University
String
Lists
Python has a built-in data type called a list. Lists are similar to arrays in other
languages: a list is an ordered collection that lets us store many values in a single variable.
Zeros array:
We can create a list that is filled with zeros.
Example:
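A minimal sketch of a zero-filled list, assuming a length of 5 (the length is only an illustration):

```python
# Build a list of five zeros by repeating a one-element list.
zeros = [0] * 5
print(zeros)  # [0, 0, 0, 0, 0]
```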
Iterate through lists:
We use a loop to iterate over a list. A loop repeats a block of code until a
specified condition is met, so we can use it to access each element of a list in turn.
Example:
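A small sketch of iterating over a list with a for loop. The list contents are hypothetical sample data:

```python
fruits = ["apple", "banana", "cherry"]  # hypothetical sample data
lengths = []
for fruit in fruits:
    # The loop body runs once for every element of the list.
    print(fruit)
    lengths.append(len(fruit))
```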
Basic methods:
There are some basic methods that directly modify a list.
append() method:
Appends (adds) an item to the end of the list.
Syntax: list1.append(item)
Example:
insert() method:
Inserts an item at the specified index.
Syntax: list1.insert(index, item)
Example:
remove() method:
Removes the first occurrence of item from the list.
Syntax: list1.remove(item)
Example 1:
extend() method:
Using this method we can append all the items of another list to the list.
Syntax: list1.extend(list2)
Example:
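The four methods described so far (append, insert, remove, extend) can be sketched together on one list. The values are hypothetical, and the comments show the list after each call:

```python
list1 = [10, 20, 30]
list1.append(40)        # [10, 20, 30, 40]
list1.insert(1, 15)     # [10, 15, 20, 30, 40]
list1.remove(30)        # [10, 15, 20, 40]
list1.extend([50, 60])  # [10, 15, 20, 40, 50, 60]
print(list1)
```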
count() method:
This method returns the number of times an element occurs in the list.
Syntax: list1.count(element)
Example:
sort() method:
To sort the elements of a list we use the sort method. It sorts the items in place, in
ascending order by default.
Syntax: list.sort()
Example:
reverse() method:
The reverse method reverses the list in place. That is, it reverses the order of
the items in the list.
Syntax: list.reverse()
Example:
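A short sketch of count(), sort() and reverse() on a hypothetical list of numbers. Note that sort() and reverse() change the list itself and return None:

```python
nums = [3, 1, 2, 3]
threes = nums.count(3)  # 2, because 3 occurs twice
nums.sort()             # nums is now [1, 2, 3, 3]
ascending = list(nums)  # keep a snapshot of the sorted order
nums.reverse()          # nums is now [3, 3, 2, 1]
```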
copy() method:
It copies the elements of a list and returns the copied list (a shallow copy).
Syntax: list1=list.copy()
Example:
pop() method:
This method removes and returns the element at the given index. If no index is given, it removes and returns the last element.
Syntax: n=list.pop(index)
Example:
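A minimal sketch of copy() and pop() with hypothetical values. It also shows that changing the copy leaves the original list untouched:

```python
original = [1, 2, 3, 4]
copied = original.copy()  # a new list with the same elements
n = copied.pop(1)         # removes and returns the element at index 1
print(n, copied, original)
```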
clear() method:
This method removes all items from the list.
Syntax: list.clear()
Example:
len() function:
To find the length of a list we can use the built-in len() function, which
returns the number of items in the list.
Syntax: n=len(list)
Example:
Slicing:
Using slicing, we can get a sublist of a list by accessing the elements
in a range of indexes. Omitting both bounds (list[:]) returns all the elements.
Syntax: list[start-inclusive : end-exclusive]
Example:
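A short sketch of len(), slicing and clear() on a hypothetical list of letters:

```python
letters = ["a", "b", "c", "d", "e"]
n = len(letters)         # 5
sub = letters[1:4]       # indexes 1, 2, 3 -> ['b', 'c', 'd']
everything = letters[:]  # a copy of all elements
letters.clear()          # letters is now an empty list
```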
split() method:
This is a string method that splits a string and returns a list. The pieces of the
string become the elements of the list.
Syntax: list1=string.split(separator)
Example:
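A minimal sketch of split(), first with the default whitespace separator and then with a comma. The strings are hypothetical:

```python
sentence = "Python is fun"
words = sentence.split()    # split on whitespace
parts = "a,b,c".split(",")  # split on a comma
print(words, parts)
```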
Mathematical functions:
There are some built-in functions for mathematical operations on lists.
sum() function:
Using this function we can add up the numbers in the list.
Syntax: n=sum(list)
Example:
max() function:
To find the maximum value in a list we can use the max function.
Syntax: n=max(list)
Example:
min() function:
To find the minimum value in a list we can use the min function.
Syntax: n=min(list)
Example:
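The three functions above can be sketched on one hypothetical list of marks:

```python
marks = [72, 85, 90]
total = sum(marks)    # 247
highest = max(marks)  # 90
lowest = min(marks)   # 72
```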
Dictionary
A dictionary associates a key (most often a string) with
a value. Values can be of any Python data type.
Syntax: dic = {key1: value1, key2: value2, ...}
Create a dictionary:
Create a dictionary named grades which contains names as keys and grades as
values.
Example:
Add a new entry:
We can add a new entry, that is, a new key and value pair, to the dictionary.
Example:
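A minimal sketch of creating the grades dictionary and adding an entry to it. The names and grades are hypothetical:

```python
# Names are the keys, grades are the values (hypothetical data).
grades = {"Asha": "A", "Rafi": "B"}
# Assigning to a key that does not exist yet adds a new key-value pair.
grades["Mita"] = "A+"
print(grades)
```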
Some methods:
values() method:
This method returns all the values in the dictionary.
Syntax: dic.values()
Example:
keys() method:
This method returns all the keys in the dictionary.
Syntax: dic.keys()
Example:
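A short sketch of keys() and values() on a hypothetical grades dictionary. Both return view objects, so they are converted to lists here for printing:

```python
grades = {"Asha": "A", "Rafi": "B"}  # hypothetical data
names = list(grades.keys())
letters = list(grades.values())
print(names, letters)
```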
Some Python libraries:
i. NumPy
ii. Pandas
iii. SciPy
iv. Scikit-learn
We will discuss only NumPy and Pandas.
NumPy
NumPy is a popular Python library. It is the fundamental package needed for
scientific computation with Python.
Its features:
i. Multidimensional arrays,
ii. Fast numerical computation,
iii. High-level math functions.
Arrays:
Structured lists of numbers.
Two types:
i. Vectors (single dimensional array)
ii. Matrices (Multidimensional array)
Basic properties (dimension, shape, data type):
To find the number of dimensions (1D, 2D) of an array we can use the ndim attribute.
Syntax: array.ndim
To find the shape (rows, columns) of an array we can use the shape attribute. It
returns the shape of the array as a tuple.
Syntax: array.shape
To find the data type of the elements of an array we can use the dtype attribute.
Syntax: array.dtype
Example:
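A minimal sketch of the three attributes, using a hypothetical 2 x 3 array of floats:

```python
import numpy as np

matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
print(matrix.ndim)   # 2 (two dimensions)
print(matrix.shape)  # (2, 3): 2 rows, 3 columns
print(matrix.dtype)  # float64
```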
Array addition:
We can add two arrays with the + operator to create a new array that holds
the element-wise sum of the two arrays.
Example:
Array multiplication:
We can multiply two arrays as matrices using the dot method and store the result in an array.
Example:
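A short sketch of element-wise addition and of the matrix product with dot, using two hypothetical 2 x 2 arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
total = a + b       # element-wise sum: [[6, 8], [10, 12]]
product = a.dot(b)  # matrix product:   [[19, 22], [43, 50]]
```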
Some functions:
zeros() function:
Using this function we can create an array in which every element is zero.
Syntax: np.zeros((row,column), dtype=data type)
Example:
ones() function:
Using this function we can create an array (1D, 2D) in which every element is one.
Syntax: np.ones((row,column), dtype=data type)
Example:
arange() function:
This function takes a start value, an end value and a step size and creates an array from
them. The start is inclusive, the end is exclusive and the step size is 1 by default.
Syntax: np.arange(start-inclusive, end-exclusive, step, dtype)
Example:
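A minimal sketch of zeros(), ones() and arange(). The shapes and ranges are chosen only for illustration:

```python
import numpy as np

z = np.zeros((2, 3), dtype=int)  # 2 rows, 3 columns, all 0
o = np.ones((2, 2))              # 2 x 2 array of 1.0 (float by default)
r = np.arange(0, 10, 2)          # 0, 2, 4, 6, 8 (10 is excluded)
```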
concatenate() function:
This function joins two arrays into one.
Syntax: np.concatenate([array1,array2])
Example:
astype() method:
This method is used for type casting. It returns a copy of the array with the given data type.
Syntax: array.astype(data type)
Example:
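A short sketch of concatenate() and astype() with hypothetical values. Casting floats to int drops the fractional part:

```python
import numpy as np

joined = np.concatenate([np.array([1, 2]), np.array([3, 4])])
floats = np.array([1.7, 2.2, 3.9])
ints = floats.astype(int)  # truncates toward zero: 1, 2, 3
```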
random.rand() function:
This function generates random values from 0 up to, but not including, 1. That is, the range is
[0,1).
Syntax: np.random.rand(value) ,for a single dimension
np.random.rand(row,column) ,for multiple dimensions
Example 1:
Example 2:
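A minimal sketch of both forms of random.rand(). The sizes are hypothetical, and the printed values differ on every run because they are random:

```python
import numpy as np

one_d = np.random.rand(3)     # 3 random values in [0, 1)
two_d = np.random.rand(2, 3)  # 2 rows, 3 columns of random values
print(one_d)
print(two_d)
```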
linspace() function:
This function returns a given number of evenly spaced samples between two points, instead of using a step size as in the arange function.
It takes:
start = starting point, inclusive
stop = stop point, inclusive when endpoint is True
num = how many samples to generate
endpoint = whether the stop point is included. It is True by default.
retstep = if True, the step between samples is also returned. It is False by default.
dtype = data type
Syntax: np.linspace(start,stop,num=n,endpoint=True,retstep=False,dtype=type)
Example:
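A short sketch of linspace() with hypothetical bounds 0 and 1, showing the effect of retstep and endpoint:

```python
import numpy as np

samples = np.linspace(0, 1, num=5)  # 0, 0.25, 0.5, 0.75, 1
# With retstep=True the spacing between samples is returned as well.
samples2, step = np.linspace(0, 1, num=5, retstep=True)
# With endpoint=False the stop value is left out: 0, 0.2, 0.4, 0.6, 0.8
open_end = np.linspace(0, 1, num=5, endpoint=False)
```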
Pandas (Part 1)
Pandas is a Python library for data manipulation and analysis. It provides data
structures and functions for working with structured data, such as tabular or time
series data.
Pandas provides a wide range of functions for manipulating and analyzing data,
such as filtering, sorting, grouping, merging, pivoting, and aggregating. It also
has built-in support for handling missing data, time series data, and categorical
data.
Pandas is widely used in data analysis and scientific computing, and is often used
in conjunction with other Python libraries such as NumPy, Matplotlib, and Scikit-
Learn.
Series:
A Series is a one-dimensional array, like arrays in other languages. It can store any data
type and it has an index, which is numeric by default.
Create Series:
First we have to import the pandas library, then create a series as in the
example.
We can store any type of value in a series, and we can also assign user-defined
labels to the index and use them to access the elements of the series.
The index can also be of any data type. In this example strings are used as
the index.
We can set the index values, but the index size must
match the size of the NumPy array. If no index is declared, a
numeric index is assigned automatically.
If the index size does not match the array size, an error is raised, as in
the example.
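A minimal sketch of creating a Series from a NumPy array with a string index, and of the error raised when the index size does not match. The values and labels are hypothetical:

```python
import numpy as np
import pandas as pd

values = np.array([10, 20, 30])
series1 = pd.Series(values, index=["a", "b", "c"])
print(series1["b"])  # access by label -> 20

try:
    # Two labels for three values: pandas raises a ValueError.
    pd.Series(values, index=["a", "b"])
except ValueError as err:
    error_message = str(err)
    print("Error:", error_message)
```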
Creation of Series from Dictionary:
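A short sketch of building a Series from a dictionary: the keys become the index and the values become the data. The name seriesCapCntry follows the text; the dictionary contents are hypothetical:

```python
import pandas as pd

capitals = {"India": "New Delhi", "UK": "London", "Japan": "Tokyo"}
seriesCapCntry = pd.Series(capitals)
print(seriesCapCntry)
```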
Accessing Elements of a Series:
We can also access elements of a series by positional index. Here the values at
positions 3 and 2 are shown.
We can also access a value by its position alone, without using the index label.
The index of a series can be changed by assigning a new index to the
existing series.
Slicing:
There is a difference between slicing and indexing. With indexing we can only
access the value at the given index, but with slicing we can access a range.
For example, seriesCapCntry[0:3] gives the values at positional indexes 0 to 2,
because the end position 3 is exclusive.
If labelled indexes are used for slicing, then the value at the end index label
is also included in the output, as in the example.
We can get the series in reverse order by using a negative step, as in the example.
seriesName[starting_index : ending_index : step]
We can use slicing to modify the series. In the example we use
seriesAlpha[1:6]=99, which updates the values at positions 1 to 5
to 99. Updating the values in a series using positional slicing excludes the value at
the end index position.
We can also use labelled index slicing to update values. In this case the
end index position is inclusive.
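The slicing rules above can be sketched on a small Series. The name seriesAlpha follows the text, while the values and labels are hypothetical. Each slice is copied to a list straight away so that the later updates do not affect it:

```python
import pandas as pd

seriesAlpha = pd.Series([1, 2, 3, 4, 5, 6], index=list("abcdef"))

first_three = seriesAlpha[0:3].tolist()     # positions 0..2, end exclusive
by_label = seriesAlpha["a":"c"].tolist()    # labels a..c, end inclusive
backwards = seriesAlpha[::-1].tolist()      # negative step reverses the order

seriesAlpha[1:3] = 99     # updates positions 1 and 2 only
seriesAlpha["d":"e"] = 0  # updates labels d and e (end label included)
```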
Attributes of Series:
We can access certain properties, called attributes, of
a series by writing the attribute after the series name.
We can assign a name to the series, as in the example, and assign a
name to the index of the series.
We can create an empty series and check whether the series is empty or
not. seriesCapCntry.empty is True if the series is empty, and False
otherwise.
Methods of Series:
There are some methods available for Pandas Series which give
flexibility to the user.
head(n) -> Returns the first n members of the series. If the value for n is
not passed, then by default n takes 5 and the first five members are
displayed.
count() -> Returns the number of non-NaN values in the series. NaN values
are not counted.
tail(n) -> Returns the last n members of the series. If the value for n is
not passed, then by default n takes 5 and the last five members are
displayed.
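A minimal sketch of head(), tail() and count() on a hypothetical Series that contains one NaN:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4, 5, 6, 7])
first_two = s.head(2).tolist()  # [1.0, 2.0]
last_two = s.tail(2).tolist()   # [6.0, 7.0]
default_head = len(s.head())    # 5, the default value of n
non_nan = s.count()             # 6, the NaN is not counted
```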
Mathematical Operations on Series:
Addition of two Series:
We can add two series as seriesA + seriesB. The values are added based
on the index labels; if an index label is not present in one of the series,
the result shows NaN for that label.
If we do not want NaN in the result, we can use
seriesA.add(seriesB, fill_value=0), which uses 0 in place of
the missing value.
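A short sketch of both forms of addition. The names seriesA and seriesB follow the text; the values and labels are hypothetical:

```python
import pandas as pd

seriesA = pd.Series([1, 2, 3], index=["x", "y", "z"])
seriesB = pd.Series([10, 20], index=["x", "y"])

plain = seriesA + seriesB                   # z is NaN: seriesB has no label z
filled = seriesA.add(seriesB, fill_value=0) # z is 3.0: missing value treated as 0
print(plain)
print(filled)
```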
Division of two Series:
Division works in the same way as addition: the values of the two series are divided
based on the index, and the same rules for missing index labels apply.
DataFrame:
We learned about Pandas Series before, but sometimes we need to work
on multiple columns at a time, i.e., we need to process tabular data. Pandas
stores such tabular data in a DataFrame.
A DataFrame is a two-dimensional labelled data structure, like a table
in MySQL. It contains rows and columns, and therefore has both a row and
a column index. Each column can hold values of a different data type.
Creation of DataFrame:
We can convert a NumPy array into a DataFrame by simply passing the array
to DataFrame: dFrame4 = pd.DataFrame(array1).
We can create a DataFrame from more than one n-dimensional array, as in
the example.
Creation of DataFrame from List of Dictionaries:
We can create DataFrame from a list of Dictionaries just like the
example.
Creation of DataFrame from Series:
If one of the individual Series or dictionaries has no value for a label, NaN is
placed at that position.
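A minimal sketch of a DataFrame built from a dictionary of Series in which one Series lacks a label. The student names follow the text; the subjects and marks are hypothetical:

```python
import pandas as pd

arnab = pd.Series([90, 91], index=["Maths", "Science"])
riya = pd.Series([81], index=["Maths"])  # no value for Science
ResultDF = pd.DataFrame({"Arnab": arnab, "Riya": riya})
print(ResultDF)  # the Riya/Science cell is NaN
```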
Operations on rows and columns in DataFrames
If we assign values to an existing column name, the values of that column are
modified; a new column is not created at the end.
Adding a New Row to a DataFrame:
We can add a new row to a DataFrame using the DataFrame.loc[ ]
indexer. In the example, we add a new row labelled English.
We can set all the values of the DataFrame to one value with
ResultDF[:] = value. In the example we set all values to 0.
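A short sketch of adding a row with loc and of setting every value at once. The marks are hypothetical, and a copy is used for the second step so that the first result stays visible:

```python
import pandas as pd

ResultDF = pd.DataFrame({"Arnab": [90, 91], "Riya": [81, 71]},
                        index=["Maths", "Science"])
# Assigning to a row label that does not exist yet adds a new row.
ResultDF.loc["English"] = [85, 88]

zeroed = ResultDF.copy()
zeroed[:] = 0  # every value in the copy becomes 0
```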
Deleting Rows or Columns from a DataFrame:
We can use the DataFrame.drop() method to delete rows and columns
from a DataFrame. With axis=0 it deletes the specified
row; with axis=1 it deletes the specified column.
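A minimal sketch of drop() for a row and for a column, with hypothetical marks. drop() returns a new DataFrame and leaves the original unchanged:

```python
import pandas as pd

ResultDF = pd.DataFrame({"Arnab": [90, 91, 97], "Riya": [81, 71, 96]},
                        index=["Maths", "Science", "Hindi"])
without_row = ResultDF.drop("Hindi", axis=0)  # delete the row Hindi
without_col = ResultDF.drop("Riya", axis=1)   # delete the column Riya
```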
Renaming Row Labels of a DataFrame:
We can change the labels of rows and columns in a DataFrame using
the DataFrame.rename() method. In the following example the row labels Hindi,
Maths, English, Bangla are renamed to sub1, sub2, sub3, sub4. In the axis parameter we
have to pass the value 'index' to rename rows.
We can choose which row labels to change. Row labels that are
not listed in the mapping are left as they are.
Renaming Column Labels of a DataFrame:
We can alter the column names in a DataFrame using the
DataFrame.rename() method. In the axis parameter we have to pass the value
'columns' to rename columns.
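A short sketch of rename() for row labels and for column labels. The marks and the new column name Student1 are hypothetical:

```python
import pandas as pd

ResultDF = pd.DataFrame({"Arnab": [90, 91, 97], "Riya": [81, 71, 96]},
                        index=["Maths", "Science", "Hindi"])
# Only the labels listed in the mapping change; Hindi stays as it is.
renamed_rows = ResultDF.rename({"Maths": "sub1", "Science": "sub2"}, axis="index")
renamed_cols = ResultDF.rename({"Arnab": "Student1"}, axis="columns")
```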
When a single column label is passed, it returns the column as a Series.
In the example it shows Riya's results as a Series.
Boolean Indexing:
Boolean means a binary value that can be either True or False. In the
following example, if the student's result is greater than 90 it shows
True, otherwise False.
To check in which subjects ‘Arnab’ has scored more than 90, we can
write:
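A minimal sketch of such a check, assuming hypothetical marks for Arnab. The comparison returns a Boolean Series, which can then be used to keep only the matching rows:

```python
import pandas as pd

ResultDF = pd.DataFrame({"Arnab": [95, 88, 92], "Riya": [81, 93, 96]},
                        index=["Maths", "Science", "Hindi"])
mask = ResultDF.loc[:, "Arnab"] > 90       # True, False, True
subjects = ResultDF[mask].index.tolist()   # rows where the mask is True
print(mask)
print(subjects)
```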
We may use a slice of labels with a slice of column names to access
values of those rows and columns:
Joining, Merging and Concatenation of DataFrames
Joining:
In pandas versions before 2.0 we can use the pandas.DataFrame.append() method to join two
DataFrames. It appends the rows of the second DataFrame at the end of the
first DataFrame. If a column of the second DataFrame is not present
in the first DataFrame, it is added as a new column. DataFrame.append() was removed in
pandas 2.0; pandas.concat() gives the same result there.
In the previous example the column labels are not in sorted order. If we want
the column labels of the joined DataFrame to be sorted, we can set the parameter
sort=True.
If we do not want the column labels to be sorted, we can set the
parameter sort=False.
When we do not want to keep the row index labels we can set ignore_index
=True. By default in the append function ignore_index = False.
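Because DataFrame.append() is not available in pandas 2.0 and later, the sketch below uses pd.concat(), which accepts the same sort and ignore_index parameters. The frame names, labels and values are hypothetical:

```python
import pandas as pd

dFrame1 = pd.DataFrame([[1, 2], [3, 4]], columns=["C2", "C1"], index=["R1", "R2"])
dFrame2 = pd.DataFrame([[5, 6]], columns=["C3", "C1"], index=["R3"])

# Rows of dFrame2 go after dFrame1; the new column C3 is added, gaps become NaN.
joined = pd.concat([dFrame1, dFrame2], sort=False)
sorted_cols = pd.concat([dFrame1, dFrame2], sort=True)        # columns in sorted order
renumbered = pd.concat([dFrame1, dFrame2], ignore_index=True) # row labels 0, 1, 2
```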
Attributes of DataFrames:
If we want to transpose the DataFrame we can use DataFrame.T.
That is, the row indices and column labels of the DataFrame swap
positions.
DataFrame.empty returns a Boolean value: True if the DataFrame is empty,
otherwise False.
DataFrame.values displays all the values in the DataFrame without the axes
labels.
DataFrame.dtypes displays the data type of each column in the DataFrame.
Importing a CSV file to a DataFrame:
We can load the data from the ResultData.csv file into a DataFrame
using the Pandas read_csv() function.
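A minimal sketch of read_csv(). The contents of ResultData.csv are not given in the text, so hypothetical CSV text is read from memory here; with a real file, the file path would be passed instead, as in pd.read_csv("ResultData.csv"):

```python
import io
import pandas as pd

# Hypothetical file contents standing in for ResultData.csv.
csv_text = "Name,Maths,Science\nArnab,90,91\nRiya,81,71\n"
marks = pd.read_csv(io.StringIO(csv_text))
print(marks)
```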
Pandas (Part 2)
Create dataframe:
To store the result data in a dataframe, we first create a dataframe from a
dictionary of lists using pandas.
Example:
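A short sketch of a result DataFrame built from a dictionary of lists. The column names ut and Maths follow the syntax lines in this section; the other names and all values are hypothetical:

```python
import pandas as pd

marksUT = {
    "Name": ["Raman", "Raman", "Zuhaire", "Zuhaire"],
    "ut": [1, 2, 1, 2],          # unit test number
    "Maths": [22, 21, 14, 20],
    "Science": [21, 20, 15, 18],
}
df = pd.DataFrame(marksUT)  # each key becomes a column
print(df)
```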
Descriptive Statistics:
Descriptive statistics are used to summarize the given data. We will apply
statistical methods to a DataFrame. These are:
i. Max
ii. Min
iii. Count
iv. Sum
v. Mean
vi. Median
vii. Mode
viii. Quartiles
ix. Variance
x. Standard deviation
numeric_only:
If we want to find the maximum value only for the columns that hold numeric
values, then we have to set numeric_only=True in these methods.
Syntax: df.max(numeric_only=True)
Example:
Relational operators:
If we want to calculate the max value for rows that meet a specific condition, then we can
use a relational operator to filter the rows and then apply the method.
Syntax: df2=df[df['ut']==2].max(numeric_only=True)
print(df2)
or,
df2=df[df.ut==2]
df2.max(numeric_only=True)
or, for a single column,
df['Maths'].min()
Axis:
To calculate the maximum value row-wise use axis=1; to calculate it column-wise
use axis=0 (the default).
Syntax: df.max(axis=1)
Example:
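The numeric_only parameter, the relational filter and the axis parameter can be sketched together on a small hypothetical result table:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Raman", "Raman", "Zuhaire", "Zuhaire"],
    "ut": [1, 2, 1, 2],
    "Maths": [22, 21, 14, 20],
    "Science": [21, 20, 15, 18],
})
col_max = df.max(numeric_only=True)               # skips the text column Name
ut2_max = df[df["ut"] == 2].max(numeric_only=True) # only rows where ut is 2
maths_min = df["Maths"].min()                      # a single column
row_max = df[["Maths", "Science"]].max(axis=1)     # best subject mark per row
```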
Calculate Maximum values:
If we want to calculate the maximum value of each column then we can simply use
the max function.
Syntax: dataframe.max()
Example:
Calculate Minimum values:
If we want to calculate the minimum value of each column then we can simply use
the min function.
Syntax: dataframe.min()
Example:
Calculate sum of values:
We can calculate the sum of each column.
Syntax: df.sum()
We can also use parameters such as numeric_only and axis, or a relational operator.
Example: Calculate the sum for a specific entity, for each subject only.
Calculate Number of values:
To calculate the total number of non-missing values in each column or row, use the count
method. We can use the parameters described above.
Syntax: df.count()
Example:
Calculate mean:
If we want to calculate the mean (average) of each column or row then use the
mean method. We can use the parameters described above.
Syntax: df.mean()
Example:
Calculate median:
If we want to calculate the middle value of each column or row then use the median
method. We can use the parameters described above.
Syntax: df.median()
Example:
Calculate mode:
If we want to find the value that appears the most times in the data
of each column or row then use the mode method. We can use the parameters described above.
Syntax: df.mode()
Example:
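A minimal sketch of sum, count, mean, median and mode on a hypothetical table with two numeric columns. mode() returns a DataFrame because a column can have several modes, so its first row is taken here:

```python
import pandas as pd

marks = pd.DataFrame({"Maths": [20, 24, 20, 18],
                      "Science": [22, 22, 25, 19]})
totals = marks.sum()           # Maths 82,   Science 88
counts = marks.count()         # 4 values in each column
means = marks.mean()           # Maths 20.5, Science 22.0
medians = marks.median()       # Maths 20.0, Science 22.0
modes = marks.mode().iloc[0]   # Maths 20,   Science 22
```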
Calculate quartile:
If we want to calculate the quartile values of each column or row then use the
quantile method. We can use the parameters described above. The special parameter for this
method is q. q=.25 denotes the first quartile and
q=.75 denotes the third quartile.
By default q=.5, which denotes the second quartile, that is, the median value.
Syntax: df.quantile()
Example 1: For a single column
Example 2: For multiple columns
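A short sketch of quantile() for a single column and for several columns at once, using hypothetical marks. pandas interpolates linearly between neighbouring values by default, which is why the quartiles are not always values from the data:

```python
import pandas as pd

marks = pd.DataFrame({"Maths": [20, 24, 20, 18],
                      "Science": [22, 22, 25, 19]})
# Single column
q1 = marks["Maths"].quantile(0.25)  # 19.5
q2 = marks["Maths"].quantile()      # 20.0, default q=0.5 (median)
q3 = marks["Maths"].quantile(0.75)  # 21.0
# Multiple columns: one row per requested quantile
both = marks.quantile([0.25, 0.75])
```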
Calculate variance:
Variance is the average of the squared differences from the mean. If we want to calculate
the variance of each column or row then use the var method, which by default computes the
sample variance (dividing by n-1). We can use the parameters described above.
Syntax: df.var()
Example :
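A minimal sketch of var() on hypothetical marks. By default pandas divides by n-1 (sample variance); passing ddof=0 divides by n, which matches the "average of squared differences" definition:

```python
import pandas as pd

marks = pd.DataFrame({"Maths": [20, 24, 20, 18]})
# Mean is 20.5; squared differences sum to 19.
sample_var = marks["Maths"].var()            # 19 / 3, about 6.33
population_var = marks["Maths"].var(ddof=0)  # 19 / 4 = 4.75
print(sample_var, population_var)
```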
describe() method:
This method displays the descriptive statistical values with a single command.
Syntax: df.describe()
Example:
Data Aggregations:
Aggregation means transforming the dataset to produce a single numeric value.
It can be applied to one or more columns together. We can use one or more
statistical methods (max, min, sum, count, std, var, mean, mode, median) together.
Syntax: df.aggregate('function name')
Example 1: Single function using aggregation
Example 2: Multiple aggregation function in a single statement
Example 3: Multiple aggregation function in a single statement with axis
parameter.
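The three cases above can be sketched on a hypothetical table. agg is the short alias of aggregate:

```python
import pandas as pd

marks = pd.DataFrame({"Maths": [20, 24, 20, 18],
                      "Science": [22, 22, 25, 19]})
single = marks.agg("max")                     # one function, one value per column
multiple = marks.agg(["max", "min"])          # one row per function
per_row = marks.agg(["max", "min"], axis=1)   # one row per original row
```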
Sorting a dataframe:
Sorting refers to the arrangement of data elements in a specified order, which can
be either ascending or descending. For sorting a dataframe we can use the sort_values
method.
Syntax: df.sort_values(by=['label'],axis=0,ascending=True) (the axis and ascending values shown are the defaults)
Example 1: sort by single attribute/column
Example 2: sort by multiple attributes/columns
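A short sketch of sorting by one column and by two columns, with hypothetical data. In the second call, ut is sorted in ascending order and Maths in descending order within each ut:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Raman", "Raman", "Zuhaire", "Zuhaire"],
    "ut": [1, 2, 1, 2],
    "Maths": [22, 21, 14, 20],
})
by_maths = df.sort_values(by=["Maths"])
by_ut_then_maths = df.sort_values(by=["ut", "Maths"], ascending=[True, False])
```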
Group by function:
Groupby function is used to split the data into groups based on some criteria. This
function works based on a split-apply-combine strategy which is shown below
using a 3-step process:
Step 1: Split the data into groups by creating a groupby object from the original
DataFrame.
Step 2: Apply the required function(size,sum,mean,get_group…).
Step 3: Combine the results to form a new DataFrame.
Syntax: g1=df.groupby('column name')
df1=g1.size()
Example 1: display the first entry from each group
Example 2: display the size of each group
Example 4: display all groups data
Example 7: calculate average of each group with single attribute
Example 8: calculate statistical data of each group with single attribute and multiple
aggregate functions
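A minimal sketch of the split-apply-combine steps, grouping a hypothetical result table by Name:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Raman", "Raman", "Zuhaire", "Zuhaire"],
    "ut": [1, 2, 1, 2],
    "Maths": [22, 21, 14, 20],
})
g1 = df.groupby("Name")                       # step 1: split into groups
first_rows = g1.first()                       # first entry of each group
sizes = g1.size()                             # number of rows in each group
raman = g1.get_group("Raman")                 # all rows of one group
avg_maths = g1["Maths"].mean()                # one column, one function
stats = g1["Maths"].agg(["mean", "max"])      # one column, several functions
```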
Altering the index:
Depending on our requirements, we can select some other column to be the
index or we can add another index column (especially after slicing).
Syntax: df.reset_index(inplace=True)
Example 1: In slicing, altering the index
Example 2: In slicing, drop the original index after creating new index
Example 3: Select another column as index and then reset the index
Set -
Reset-
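The three cases can be sketched as follows, with hypothetical data. The sketch uses the form of reset_index() that returns a new DataFrame rather than inplace=True:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Raman", "Raman", "Zuhaire", "Zuhaire"],
    "ut": [1, 2, 1, 2],
    "Maths": [22, 21, 14, 20],
})
ut1 = df[df["ut"] == 1]               # slicing keeps the old labels 0 and 2
with_old = ut1.reset_index()          # new index 0, 1; old one kept as column "index"
clean = ut1.reset_index(drop=True)    # new index 0, 1; old one dropped

by_name = df.set_index("Name")        # Set: Name becomes the index
restored = by_name.reset_index()      # Reset: Name is a normal column again
```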
Reshaping data:
The way a dataset is arranged into rows and columns is referred to as the shape of
data. Reshaping data refers to the process of changing the shape of the dataset to
make it suitable for some analysis problems.
For reshaping data, two basic functions are available in Pandas,
i. pivot and
ii. pivot_table.
Pivot:
The pivot function is used to reshape and create a new DataFrame from the original
one. In the previous sections, we had to slice the data corresponding to a particular
attribute and then apply a statistical method to find the descriptive statistics.
Reshaping transforms the structure of the data, which makes it more
readable and easier to analyze.
Example :
Pivot table:
Data with duplicate entries can't be reshaped using the pivot function. That is why we may have to
use the pivot_table function instead. It works like the pivot function, but aggregates the
values from rows with duplicate entries for the specified columns.
The default aggregate function is mean.
Syntax:
pd.pivot_table(data,values=None,index=None,columns=None,aggfunc='mean')
The parameter aggfunc can take values such as sum, max, min, len, np.mean and
np.median, which are applied wherever we have duplicate entries.
To use np.mean or np.median we have to import numpy as np.
Example:
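A short sketch that contrasts pivot with pivot_table. The sales table, its column names and its values are hypothetical. The second table contains a duplicate Pen/Red entry, which pivot_table aggregates:

```python
import pandas as pd

sales = pd.DataFrame({
    "Item": ["Pen", "Pen", "Pencil", "Pencil"],
    "Color": ["Red", "Blue", "Red", "Blue"],
    "Units": [10, 5, 7, 3],
})
# pivot: one row per Item, one column per Color (no duplicates allowed)
wide = sales.pivot(index="Item", columns="Color", values="Units")

# Add a second Pen/Red row so that the entry is duplicated.
extra = pd.DataFrame({"Item": ["Pen"], "Color": ["Red"], "Units": [20]})
dup = pd.concat([sales, extra], ignore_index=True)
totals = pd.pivot_table(dup, values="Units", index="Item",
                        columns="Color", aggfunc="sum")          # 10 + 20 = 30
means = pd.pivot_table(dup, values="Units", index="Item",
                       columns="Color")                          # default mean: 15.0
```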
Handling missing value:
As we know that a DataFrame can consist of many rows (objects) where each row
can have values for various columns (attributes). If a value corresponding to a
column is not present, it is considered to be a missing value. A missing value is
denoted by NaN. Missing values create a lot of problems during data analysis and
have to be handled properly. The two most common strategies for handling missing
values explained in this section are:
i. drop the object having missing values,
ii. fill or estimate the missing value
isnull() method:
Pandas provides the function isnull() to check whether any value is missing in
the DataFrame. It checks every value of every attribute and returns True where
the value is missing, otherwise False.
We can also check each individual attribute.
Syntax: df.isnull()
Example:
isnull().any() method:
To check whether a column (attribute) has a missing value anywhere in the dataset, the any()
function is used. It returns True for a column that has a missing value, otherwise False.
We can also check each individual attribute.
Syntax: df.isnull().any()
Example:
isnull().sum() method:
To find the number of NaN values corresponding to each attribute, one can use the
sum() function along with isnull() function.
Syntax: df.isnull().sum()
Example:
isnull().sum().sum() method:
To find the total number of NaN values in the whole dataset, one can use this method.
Syntax: df.isnull().sum().sum()
Example:
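The four isnull() variants above can be sketched together on a hypothetical table with two missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Maths": [20, np.nan, 18],
                   "Science": [22, 21, np.nan],
                   "Hindi": [19, 20, 21]})
cell_flags = df.isnull()              # True/False for every cell
col_flags = df.isnull().any()         # does the column contain any NaN?
col_counts = df.isnull().sum()        # number of NaN per column
total_nan = df.isnull().sum().sum()   # number of NaN in the whole table
```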
Dropping missing values:
Missing values can be handled by either dropping the entire row having missing
value or replacing it with appropriate value. Dropping will remove the entire row
(object) having the missing value(s). The dropna() function can be used to drop an
entire row from the DataFrame.
Syntax: df.dropna()
Example:
fillna(num) method:
The fillna(num) function can be used to replace missing values with the value specified
in num.
i. fillna(0) replaces missing values with 0.
ii. fillna(1) replaces missing values with 1.
Syntax: df.fillna(num)
Example:
fillna(method='pad') method:
This method replaces each missing value with the value before it. In pandas 2.1 and later
the method parameter of fillna is deprecated, and df.ffill() does the same thing.
Syntax: df.fillna(method='pad')
Example:
fillna(method='bfill') method:
This method replaces each missing value with the value after it. In pandas 2.1 and later
the method parameter of fillna is deprecated, and df.bfill() does the same thing.
Syntax: df.fillna(method='bfill')
Example:
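A minimal sketch of the strategies for handling missing values, on a hypothetical table. It uses ffill() and bfill(), which fill from the previous and the next value respectively, in place of fillna(method='pad') and fillna(method='bfill'):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Maths": [20, np.nan, 18],
                   "Science": [22, 21, np.nan]})
dropped = df.dropna()        # keeps only rows without any NaN (row 0)
zero_filled = df.fillna(0)   # every NaN becomes 0
forward = df.ffill()         # NaN takes the value before it
backward = df.bfill()        # NaN takes the value after it, if there is one
```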
END