
ICT2103 Full Book-Part-3


Getting familiar with Azure Notebook

Now, if you go into the Example1 folder, you will see this screen:

Click + to create a new Python program. Enter Example1 in the Item Name.

Provide a name for your Python code. Make sure to select Python 3.6 from
the drop-down menu.

Once the file Example1.ipynb is created, you can click on it and it will be
opened in the notebook. This notebook is called Jupyter Notebook.


Jupyter Notebook is one of the industry standards for performing data analytics
in Python. The interface is divided into cells. You can execute one cell at a time or
run all cells at once. Dividing code into cells lets you organize a program into
components. For example, you can have a cell that accesses data from an external
file. You can run this cell once and then use the other cells to perform the required
analysis. Therefore, there is no need to re-run the data-loading cell every time you
execute your analysis code.

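The screenshot below displays the main components of Jupyter Notebook.

[Screenshot: Jupyter Notebook interface, with callouts for the code cells, the
button that adds a new cell, the button that deletes a cell, and the Cell menu.]

The Cell menu allows you to run code in a particular cell or all cells.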

You can also run code by pressing Control + Enter (Windows) or cmd + Enter (Mac).

Example 1: Accessing Excel files and getting familiar with the data set

In this example we will use an open source data set from the Dubai Knowledge
and Human Development Authority (KHDA). The data can be downloaded
from the KHDA website. We will use the following data set from KHDA:

This data set is in Excel format and contains many sheets.

Our focus will be on the sheet “Census 2015-2016”. This sheet contains
student enrollment and graduation from private higher education
institutions in Dubai for both undergraduate and postgraduate programs.

Tasks:
In this example, we will perform the following tasks:


You need to download the data from the KHDA website and
upload it to the Data folder.

Import line: This line imports pandas and makes it available under the alias pd.
File name: The filename with the full extension.
File path: As our file is included in the Data folder, which is located parallel
           to the Example folder, we need to move up one level using .. and
           then go into the Data folder using /Data.
Sheet name: The name of the targeted sheet.

The above line creates a DataFrame and stores it in the object data1.
To display the first 5 rows of the DataFrame you can use the command
head().
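The reading step described above can be sketched as follows. This is only an illustration: the workbook name khda_census.xlsx is hypothetical (the original file name is not shown here), and the existence check is added so the cell does not fail before the file has been uploaded.

```python
from pathlib import Path

import pandas as pd

# Hypothetical file name: replace it with the workbook downloaded from KHDA.
# ".." moves up one level from the Example folder, then "Data" enters the Data folder.
excel_path = Path("..") / "Data" / "khda_census.xlsx"
sheet_name = "Census 2015-2016"  # the targeted sheet named in the text

if excel_path.exists():
    # Read one sheet of the workbook into a DataFrame.
    data1 = pd.read_excel(excel_path, sheet_name=sheet_name)
    # head() returns the first 5 rows by default.
    print(data1.head())
else:
    print(f"{excel_path} not found. Upload the workbook to the Data folder first.")
```

Reading .xlsx files with pandas normally also requires the openpyxl package to be installed in the notebook environment.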

The output we will use looks like this.

pandas DataFrame

Let's examine the pandas DataFrame in more detail.

[Screenshot: the DataFrame output, with callouts marking the columns, the index,
and the rows.]

Index: The index is the leftmost, unlabeled column in the DataFrame.
The default index is the integers from 0 to the length of the
DataFrame minus 1. The index is used to speed up
searching large data. Later we will learn how to
change the index of a DataFrame.

Columns: These are the names of the columns as in the
original Excel data. By looking carefully at these
columns, we will note that they are not the
appropriate columns for the data set. The correct
column names are in the row with index 3. To get to the
correct columns we need to skip the first 4 rows of the sheet.
We will learn later how to rename columns and
create new columns.

Rows: Rows are the actual data excluding columns and
index. You may notice that a number of cells contain
NaN. These are the missing or unavailable
data from our data source. Later we will learn how
to deal with such missing data.
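The three parts of a DataFrame can be inspected directly. The sketch below uses a tiny made-up table (the institution names and numbers are hypothetical, not KHDA data) to show the default index, the column names, and how missing values appear as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one value is deliberately missing (NaN).
df = pd.DataFrame({
    "Institution": ["Alpha College", "Beta University", "Gamma Institute"],
    "Enrolled": [120, np.nan, 95],
})

print(df.index)          # default index: RangeIndex(start=0, stop=3, step=1)
print(list(df.columns))  # column names
print(df.isna().sum())   # number of missing values in each column
```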

Skipping unnecessary rows and columns

We have seen from the previous page that our columns start from row 4.
Pandas allows you to skip unwanted rows when reading the data. Here is how
to do it.

We basically added a comma then skiprows=4 to the original line that reads
data from the Excel file. We also named the new DataFrame as data2.
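To see what skiprows=4 does without needing the workbook, the sketch below feeds a small in-memory text table to pd.read_csv, which accepts the same skiprows argument as pd.read_excel. The table contents are made up for illustration; with the real file, the equivalent call would add skiprows=4 to the pd.read_excel line.

```python
import io

import pandas as pd

# Hypothetical stand-in for the sheet: four unwanted rows, then the real header.
raw = """Report title,,
Generated for illustration,,
,,
,,
Institution,Enrolled,Graduated
Alpha College,120,30
Beta University,95,22
"""

# skiprows=4 ignores the first four rows, so the fifth row becomes the header.
data2 = pd.read_csv(io.StringIO(raw), skiprows=4)
print(data2.head())
```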

Now, let us see the first five rows:

Now our DataFrame looks better. However, we still have a few issues to deal
with. The first column after the index seems unnecessary, and so do the last
two columns. In this data set we need to use data columns from “Higher
Education Institution” to “Graduated_Post Graduate”.

Display data in one or more columns

If you want to focus your analysis on one or a few columns, you can do so by
selecting only those desired columns from the DataFrame.

Here is how to select one column from the DataFrame:

Data['ColumnName']

Data: This is your DataFrame. It can be any name that you chose to store
      your data.
'ColumnName': The column name as it appears in the DataFrame.

Example:
Display data stored in column “Question1” only.

data["Question1"]
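A minimal sketch of both cases, one column and several columns, is shown here. The student marks are hypothetical. Note that selecting several columns requires a list of names inside the square brackets, which is why the brackets are doubled.

```python
import pandas as pd

# Hypothetical student marks used only for illustration.
data = pd.DataFrame({
    "StudentID": [101, 102, 103, 104],
    "Question1": [25, 12, 20, 18],
    "Question2": [20, 15, 24, 10],
    "Total": [88, 55, 91, 49],
})

# One column: returns a pandas Series.
q1 = data["Question1"]
print(q1)

# Several columns: pass a list of column names; returns a DataFrame.
id_and_total = data[["StudentID", "Total"]]
print(id_and_total)
```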

Displaying unique values in a column

Finding the unique values in a column is often needed when analyzing data.
For example, a record of sales usually includes many sales of the same
product. Finding only the unique products is important for your data analysis.
The function .unique() helps you perform this task.

Example:

The file Superstore.xlsx contains a large set of sales records. You are
required to read this file and show the unique values stored in the
field “Ship Mode”.

Reading the file and displaying the head:

Here is the line of code to display the unique values of “Ship Mode”.
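As a sketch of how .unique() behaves, the example below uses a few made-up sales rows in place of Superstore.xlsx (the products and ship modes are hypothetical). The values are returned in the order in which they first appear.

```python
import pandas as pd

# Hypothetical sales rows standing in for the Superstore file.
sales = pd.DataFrame({
    "Product": ["Chair", "Desk", "Chair", "Lamp", "Desk"],
    "Ship Mode": ["Second Class", "Standard Class", "Second Class",
                  "First Class", "Standard Class"],
})

# unique() drops repeated values and keeps the order of first appearance.
ship_modes = sales["Ship Mode"].unique()
print(ship_modes)
```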

Describing the data of a column

To display the summary of a column, including the number of records,
minimum, maximum, mean, and standard deviation, you need to use the
function .describe() as follows:

Data['ColumnName'].describe()

Example:

Display the summary of column “Question1”.

Data['Question1'].describe()

The above summary shows that column “Question1” contains 26 records, the
mean is 19.57, the standard deviation is 4.11, the minimum value in the
column is 12.0, and the maximum value is 25. This is a quick snapshot of the
data in this column.
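The result of .describe() is itself a pandas Series, so individual statistics can be read from it by name. The sketch below uses four hypothetical marks rather than the 26 records discussed above.

```python
import pandas as pd

# Hypothetical marks for illustration only.
data = pd.DataFrame({"Question1": [12, 18, 20, 25]})

summary = data["Question1"].describe()
print(summary)

# Individual statistics can be looked up by their label.
print("records:", summary["count"])
print("mean:", summary["mean"])
print("min:", summary["min"], "max:", summary["max"])
```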

Slice data using index

Slicing is a technique for creating smaller subsets of a large data set.
In this section we will use the index (the first column, which starts
with the value of zero) to slice our data.

Example:
Create a sub dataset from your DataFrame that starts from index row 5 and
ends at index row 9. Our DataFrame in this example is stored in the
variable data.

The structure of the command line is:

DataFrame[Start:End+1]

In this case, the code is:

data[5:10]
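The sketch below shows why the end value is written as End+1: the end position of a plain slice is excluded. The DataFrame is a hypothetical one with 20 rows, and the last line illustrates the optional step value.

```python
import pandas as pd

# Hypothetical DataFrame with 20 rows and the default index 0..19.
data = pd.DataFrame({
    "StudentID": range(100, 120),
    "Total": range(50, 70),
})

# Rows with index 5, 6, 7, 8, 9: the end position (10) is not included.
rows_5_to_9 = data[5:10]
print(rows_5_to_9)

# A third value sets the step: every second row from index 0 up to index 8.
every_second = data[0:10:2]
print(every_second)
```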

Exercises:

1. Display data in rows 2 to 15.
2. Display data in rows 2 to 15, but skip one row every time.
   Use DataFrame[start:end+1:step]

Slice data using conditions

You can also slice data using conditions based on the value of one or
more columns.

Example:
Find the students who scored less
than 60 in the total.

The structure of the command line is:

DataFrame[Condition]

The condition in this case is:

data["Total"]<60

The full command is:

data[data["Total"]<60]
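It can help to look at the condition on its own before using it. In the sketch below (with hypothetical marks), the condition produces one True or False value per row, and the DataFrame keeps only the rows marked True.

```python
import pandas as pd

# Hypothetical student marks used only for illustration.
data = pd.DataFrame({
    "StudentID": [101, 102, 103, 104, 105],
    "Total": [88, 55, 91, 49, 84],
})

# The condition alone is a Series of True/False values, one per row.
condition = data["Total"] < 60
print(condition)

# Using the condition inside the square brackets keeps only the True rows.
low_scores = data[condition]
print(low_scores)
```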

Exercises:

1. Display the IDs of students who scored 25 in Question1.
2. Display the IDs of students who scored 20 in Question2.
3. Display the IDs of students who scored more than or equal to
   15 in Question4.

Slice data using more than one condition

To slice data using more than one condition, you need to put each
condition inside parentheses () and combine the conditions with a
logical operator. The symbol & is used for and. The symbol | is used for
or.

Example:
Find students with a total >= 80 and <= 90.

The structure of the command line is:

DataFrame[(Condition) & (Condition)]

The condition in this case is:

(data["Total"]>=80) & (data["Total"]<=90)

The full command is:

data[(data["Total"]>=80) & (data["Total"]<=90)]
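A short sketch of both operators, using hypothetical marks. The parentheses matter because & and | are evaluated before comparisons such as >= in Python, so leaving them out typically raises an error or gives the wrong result.

```python
import pandas as pd

# Hypothetical student marks used only for illustration.
data = pd.DataFrame({
    "StudentID": [101, 102, 103, 104, 105],
    "Total": [88, 55, 91, 49, 84],
})

# and (&): both conditions must be true for a row to be kept.
between_80_and_90 = data[(data["Total"] >= 80) & (data["Total"] <= 90)]
print(between_80_and_90)

# or (|): a row is kept if at least one condition is true.
below_50_or_above_90 = data[(data["Total"] < 50) | (data["Total"] > 90)]
print(below_50_or_above_90)
```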

Exercises:
1. Display the IDs of students who scored more than 22 in
Question1 and more than 23 in Question2.
2. Display the IDs of students who scored more than 20 in all
questions.
3. Display the IDs of students who scored more than or equal to
15 in Question4 and less than 85 in total.
4. Display the IDs of students who scored more than 22 in any
of the four questions.

Working with loc in pandas

The pandas loc indexer allows you to search and slice data based on both
index and columns. It is a powerful tool that lets us focus on the important
rows and columns for our data analytics.

The structure of the command line is:

DataFrame.loc[FirstRow:LastRow, FirstColumn:LastColumn]

DataFrame: The name of your DataFrame object. In our example this is data2.
Square brackets: Please note the use of square brackets. Curved brackets
    will not work.
FirstRow: Represents the first row in your targeted data. If you want data
    starting from row zero, then leave this empty.
Colon (rows): The colon separates the start and end of the rows. It is a
    'must-have'.
LastRow: This represents the last row in your targeted data. If you want all
    data to the end of the set, then leave this empty.
Comma: This comma separates rows and columns. It is a 'must-have'.
FirstColumn: Here you specify the first column name. Please note that you
    must use column names and not numbers.
Colon (columns): The colon separates the start and end of the columns. It is
    a 'must-have'.
LastColumn: Here you specify the last column name. Please note that you
    must use column names and not numbers.

Example:

Display rows 5 to 10 and only columns “Question1” and “Question2”.

data.loc[5:10,"Question1":"Question2"]

Note that you need to use the index of the rows and the name of the column.
In this example the index is 5:10 and the columns are
"Question1":"Question2". Unlike data[5:10], loc includes the end row, so
5:10 returns rows 5 through 10.
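The sketch below, built on a hypothetical 12-row DataFrame, compares loc with a plain slice to show that loc includes both the last row and the last column of the range. It also shows a range whose start is left empty.

```python
import pandas as pd

# Hypothetical DataFrame with 12 rows and the default index 0..11.
data = pd.DataFrame({
    "StudentID": range(100, 112),
    "Question1": range(10, 22),
    "Question2": range(5, 17),
    "Question3": range(8, 20),
})

# loc is inclusive at both ends: rows 5..10 and columns Question1..Question2.
block = data.loc[5:10, "Question1":"Question2"]
print(block)

# A plain slice excludes the end position, so this gives rows 5..9 only.
plain = data[5:10]
print(len(block), len(plain))

# Leaving the start empty begins from the first row.
first_rows = data.loc[:2, "StudentID":"Question1"]
print(first_rows)
```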


You can display columns that are not in sequence. For example, you can
display Question1 and Question4.

To display selected columns or rows, you need to list them inside
square brackets [ ].

Example:

Display rows 3, 8, and 20 and columns “Question1” and “Question4”.

data.loc[[3,8,20],["Question1","Question4"]]
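A runnable sketch of the same command on a hypothetical 25-row DataFrame. The last line shows that a single colon in the row position selects all rows while a list still picks the columns.

```python
import pandas as pd

# Hypothetical DataFrame with 25 rows and the default index 0..24.
data = pd.DataFrame({
    "StudentID": range(100, 125),
    "Question1": range(0, 25),
    "Question2": range(25, 50),
    "Question4": range(50, 75),
})

# Lists pick exactly the rows and columns named, in the order given.
picked = data.loc[[3, 8, 20], ["Question1", "Question4"]]
print(picked)

# A lone colon for the rows means "all rows".
two_columns = data.loc[:, ["StudentID", "Question4"]]
print(two_columns.head())
```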

Exercises:

1. Display rows 14 and 18 of StudentID and Total.
2. Display rows 7, 9, and 16 of Question4 and Total.
3. Display all rows of StudentID and Total.
