ICT2103 Full Book-Part-3

Getting familiar with Azure Notebook

Now, if you go into the Example1 folder, you will see this screen:

Click + to create a new Python program. Enter Example1 in the Item Name field.

Provide a name for your Python code.

Make sure to select Python 3.6 from this drop-down menu.

Once the file Example1.ipynb is created, you can click on it and it
will be opened in the notebook. This notebook is called Jupyter
Notebook.


Jupyter Notebook is one of the industry standards for performing data analytics
in Python. The interface is divided into cells. You can execute one cell at a time or
run all cells at once. Dividing code into cells lets you organize a program into
components. For example, you can have a cell that accesses data from an external
file. You can run this cell once and then use the other cells to perform the required
analysis. Therefore, there is no need to rerun every cell each time you execute
your code.

The screenshot below displays the main components of Jupyter Notebook.

[Screenshot: the Jupyter Notebook interface, with callouts for "Add a new cell", "Delete cell", and "Code cells".]

The Cell menu allows you to run code in a particular cell or all cells.

You can also run code by pressing Control + Enter (Windows) or cmd + Enter (Mac).

Example 1: Accessing Excel files and getting familiar with the data set

In this example we will use an open source data set from the Dubai Knowledge
and Human Development Authority (KHDA). The data can be downloaded
from the KHDA website. We will use the following data set from KHDA:

This data set is in Excel format and contains many sheets.

Our focus will be on the sheet “Census 2015-2016”. This sheet contains
student enrollment and graduation from private higher education
institutions in Dubai for both undergraduate and postgraduate programs.

Tasks:
In this example, we will perform the following tasks:


You need to download the data from the KHDA website and
upload it to the Data folder.

[Screenshot: the code that imports pandas and reads the Excel sheet, with the following callouts.]

This line imports pandas and makes it available under the name pd.

The filename with the full extension.

As our file is in the Data folder, which is located parallel to the
Example folder, we need to move up one level using .. and then
go into the Data folder using /Data.

The name of the targeted sheet.

The above line creates a DataFrame and stores it in the object data1.
To display the first 5 rows of the DataFrame you can use the command
head().
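
The reading step described above can be sketched roughly as follows. This is only an illustration and not the code from the screenshot: the helper name read_sheet and the stand-in DataFrame demo are assumptions, and the filename is whatever you downloaded from KHDA.

```python
from pathlib import Path

import pandas as pd


def read_sheet(filename, sheet_name):
    """Read one sheet of an Excel file stored in ../Data (hypothetical helper)."""
    # ".." moves up one level, then we go into the Data folder.
    path = Path("..") / "Data" / filename
    return pd.read_excel(path, sheet_name=sheet_name)


# Stand-in DataFrame (made-up values) so head() can be tried without the real file.
demo = pd.DataFrame({"value": range(10)})
first_rows = demo.head()  # head() returns the first 5 rows by default
print(first_rows)
```

With the real file in place, a call such as data1 = read_sheet(filename, "Census 2015-2016") followed by data1.head() would mirror the steps described here.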

The output we will use looks like this.

pandas DataFrame

Let's examine the pandas DataFrame in more detail.

[Screenshot: the DataFrame output, with callouts marking the columns, the index, and the rows.]

Index: The index is the first (leftmost) column shown in the DataFrame. The
default index is a sequence of integers that starts at 0 and ends at the
length of the DataFrame minus 1. The index is used to speed up
searching large data. Later we will learn how to
change the index of a DataFrame.

Columns: These are the names of the columns as in the
original Excel data. By looking carefully at these
columns, we will note that they are not the
appropriate columns for the data set. The correct
column names are in the third row. To get to the
correct columns we need to skip the first 4 rows.
We will learn later how to rename columns and
create new columns.

Rows: Rows are the actual data excluding columns and
index. You may notice that a number of cells have
data NaN. These are the missing or unavailable
data from our data source. Later we will learn how
to deal with such missing data.
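
The three parts of a DataFrame can also be inspected in code. The sketch below uses a tiny made-up DataFrame (the names and values are assumptions, not the KHDA data) to show the default integer index, the column names, and a count of NaN cells.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value, for illustration only.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Score": [10.0, np.nan, 30.0],
})

print(df.index)          # default index: integers starting at 0
print(list(df.columns))  # the column names
missing_per_column = df.isna().sum()  # number of NaN cells in each column
print(missing_per_column)
```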

Skipping unnecessary rows and columns

We have seen from the previous page that our columns start from row 4.
Pandas allows you to skip unwanted rows when reading the data. Here is how
to do it.

We basically added a comma then skiprows=4 to the original
line that reads data from the Excel file. We also named the new
DataFrame as data2.
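
To see the effect of skiprows without the Excel file, here is a small sketch that swaps in pd.read_csv on an in-memory text, since read_csv accepts the same skiprows parameter as read_excel. The four junk lines and the data rows are made up for illustration.

```python
import io

import pandas as pd

# Four unwanted lines, then the real header row, then the data (all made up).
raw = "\n".join([
    "Made-up report title,,",
    "Made-up subtitle,,",
    "Made-up note,,",
    "Made-up note,,",
    "Institution,Enrolled,Graduated",
    "A,100,20",
    "B,200,50",
])

# skiprows=4 drops the first four lines, so the fifth line becomes the header.
data2 = pd.read_csv(io.StringIO(raw), skiprows=4)
print(data2.head())
```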

Now, let us see the first five rows:

Now our DataFrame looks better. However, we still have a few
issues to deal with. The first column after the index seems
unnecessary, and so do the last two columns.
In this data set we need to use data columns from “Higher
Education Institution” to “Graduated_Post Graduate”.

Display data in one or more columns

If you want to focus your analysis on one or a few columns, you can do so by
selecting only those desired columns from the DataFrame.

Here is how to select one column from the DataFrame:

Data['ColumnName']

Data: This is your DataFrame. It can be any name that you chose to store
your data.

'ColumnName': Column name as it appears in the DataFrame.

Example:
Display data stored in column “Question1” only.

data["Question1"]
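
As a rough sketch, the selection of one column, and of a few columns at once, might look like this. The DataFrame values are made up; only the column names follow the examples in this book.

```python
import pandas as pd

# Made-up marks for three students.
data = pd.DataFrame({
    "StudentID": [1, 2, 3],
    "Question1": [20, 25, 12],
    "Question2": [18, 22, 15],
})

one_column = data["Question1"]                  # a single column
two_columns = data[["Question1", "Question2"]]  # a list of names selects several columns
print(one_column)
print(two_columns)
```

Note the double square brackets when selecting several columns: the inner pair is a Python list of column names.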

Displaying unique values in a column

Finding unique values in a column is often needed when analyzing your data.

For example, suppose you have a record of sales that includes many sales of the
same product. Finding only the unique products is important for your data
analysis. The function .unique() helps you perform this task.

Example:

The file Superstore.xlsx contains one large sales data set. You are
required to read this file and show the unique values stored in the
field “Ship Mode”.

Reading the file and displaying the head:

Here is the line of code to display the unique values of “Ship Mode”.
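
A minimal sketch of the idea, using a small made-up DataFrame in place of Superstore.xlsx (the product names and ship modes below are assumptions for illustration):

```python
import pandas as pd

# Stand-in for the sales data; values are made up.
sales = pd.DataFrame({
    "Product": ["Pen", "Desk", "Pen", "Lamp", "Desk"],
    "Ship Mode": ["Standard Class", "First Class", "Standard Class",
                  "Second Class", "First Class"],
})

# unique() returns each distinct value once, in order of first appearance.
ship_modes = sales["Ship Mode"].unique()
print(ship_modes)
```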

Describing the data of a column

To display the summary of a column, including the number of records,
minimum, maximum, mean, and standard deviation, you need to use the
function .describe() as follows:

Data['ColumnName'].describe()

Example:

Display the summary of column “Question1”.

data['Question1'].describe()

The above summary shows that column “Question1” contains 26
records, mean is 19.57, standard deviation is 4.11, the minimum value in
the column is 12.0, and the maximum value is 25. This is a quick
snapshot of the data in this column.
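
The sketch below runs .describe() on five made-up marks, so its numbers differ from the summary discussed above. It also shows that individual statistics can be read from the result by name.

```python
import pandas as pd

# Five made-up marks, for illustration only.
data = pd.DataFrame({"Question1": [10, 20, 30, 40, 50]})

summary = data["Question1"].describe()
print(summary)

# Individual statistics can be looked up by their label.
print(summary["count"], summary["mean"], summary["min"], summary["max"])
```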

Slice data using index

Slicing data is a technique used to create smaller subsets of your large
data. In this section we will use the index (the first column, which starts
with the value of zero) to slice our data.

Example:
Create a subset of your DataFrame that runs from index row 5 to index row 9.
Our DataFrame in this example is stored in the variable data.

The structure of the command line is:

DataFrame[Start:End+1]

In this case, the code is:

data[5:10]
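
Here is a small sketch of this slice on a made-up DataFrame of 20 rows. It shows why the end value is written as 10: the end of the slice is not included, so the last row returned is index 9.

```python
import pandas as pd

# 20 made-up totals with the default index 0..19.
data = pd.DataFrame({"Total": range(50, 70)})

subset = data[5:10]  # rows with index 5, 6, 7, 8, 9 (10 is excluded)
print(subset)
```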

Exercises:

1. Display data in rows 2 to 15.

2. Display data in rows 2 to 15, but skip one row every time.
Use DataFrame[start:end+1:step]

Slice data using conditions

You can also slice data using conditions based on the value of one or
more columns.

Example:
Find the students who scored less
than 60 in the total.

The structure of the command line is:

DataFrame[Condition]

The condition in this case is:

data["Total"]<60

The full command is:

data[data["Total"]<60]
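
As an illustration on made-up marks, the condition on its own produces one True or False per row, and the DataFrame keeps only the rows marked True:

```python
import pandas as pd

# Made-up student totals.
data = pd.DataFrame({
    "StudentID": [101, 102, 103, 104, 105],
    "Total": [45, 72, 59, 88, 60],
})

condition = data["Total"] < 60  # one True/False value per row
low_scores = data[condition]    # keeps only the rows where the condition is True
print(low_scores)
```

Note that the student with exactly 60 is not included, because the condition is strictly less than 60.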

Exercises:

1. Display the IDs of students who scored 25 in Question1.

2. Display the IDs of students who scored 20 in Question2.
3. Display the IDs of students who scored more than or equal to
15 in Question4.

Slice data using more than one condition

To slice data using more than one condition, you need to put each
condition inside parentheses: (). You also need to use a
logical operator. The symbol & is used for and. The symbol | is used for
or.

Example:
Find students with a total >= 80 and <= 90.

The structure of the command line is:

DataFrame[(Condition) & (Condition)]

The condition in this case is:

(data["Total"]>=80) & (data["Total"]<=90)

The full command is:

data[(data["Total"]>=80) & (data["Total"]<=90)]
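
The following sketch applies both operators to made-up totals: & keeps the rows where both conditions hold, and | keeps the rows where at least one holds.

```python
import pandas as pd

# Made-up student totals.
data = pd.DataFrame({
    "StudentID": [101, 102, 103, 104, 105],
    "Total": [45, 80, 85, 90, 95],
})

# and: total between 80 and 90, both ends included
in_range = data[(data["Total"] >= 80) & (data["Total"] <= 90)]

# or: total below 80 or above 90
out_of_range = data[(data["Total"] < 80) | (data["Total"] > 90)]

print(in_range)
print(out_of_range)
```

The parentheses matter: without them Python evaluates & and | before the comparisons, which usually raises an error.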

Exercises:
1. Display the IDs of students who scored more than 22 in
Question1 and more than 23 in Question2.
2. Display the IDs of students who scored more than 20 in all
questions.
3. Display the IDs of students who scored more than or equal to
15 in Question4 and less than 85 in total.
4. Display the IDs of students who scored more than 22 in any
of the four questions.

Working with loc in pandas

The pandas loc indexer allows one to search and slice data based on both
index and columns. It is a powerful tool that lets us focus on the important
rows and columns for our data analytics.

The structure of the command line is:

DataFrame.loc[FirstRow:LastRow, FirstColumn:LastColumn]

DataFrame: The name of your DataFrame object. In our example this is data2.

Square brackets: Please note the use of square brackets. Curved brackets
will not work.

FirstRow: Represents the first row in your targeted data. If you want data
starting from row zero, then leave this empty.

Colon between the rows: The colon separates the start and end of the rows.
It is a 'must-have'.

LastRow: This represents the last row in your targeted data. If you want all
data to the end of the set, then leave this empty.

Comma: This comma separates rows and columns. It is a 'must-have'.

FirstColumn: Here you specify the first column name. Please note that you
must use column names and not numbers.

Colon between the columns: The colon separates the start and end of the
columns. It is a 'must-have'.

LastColumn: Here you specify the last column name. Please note that you
should use column names and not numbers.

Example:

Display rows 5 to 10 and only columns “Question1” and “Question2”.

data.loc[5:10,"Question1":"Question2"]

Note that you need to use the index of the rows and the name of the column.

In this example the index is 5:10 and the columns are "Question1":"Question2".
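
A short sketch of this command on a made-up DataFrame of 12 rows follows. One point worth seeing in code: unlike the plain slice data[5:10], a loc slice includes its end, so rows 5 to 10 give six rows.

```python
import pandas as pd

# 12 made-up rows with the default index 0..11.
data = pd.DataFrame({
    "StudentID": range(100, 112),
    "Question1": range(0, 12),
    "Question2": range(12, 24),
    "Question3": range(24, 36),
})

subset = data.loc[5:10, "Question1":"Question2"]
print(subset)
print(subset.shape)  # six rows (5 to 10 inclusive) and two columns
```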


You can display columns that are not in sequence. For example, you can
display Question1 and Question4.

To display selected columns or rows, you need to add them inside
square brackets [ ].

Example:

Display rows 3, 8, and 20 and columns “Question1” and “Question4”.

data.loc[[3,8,20],["Question1","Question4"]]
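
The same command can be tried on a made-up DataFrame of 25 rows, as in this sketch. Only the listed rows and columns are returned, in the order given.

```python
import pandas as pd

# 25 made-up rows with the default index 0..24.
data = pd.DataFrame({
    "Question1": range(0, 25),
    "Question2": range(25, 50),
    "Question3": range(50, 75),
    "Question4": range(75, 100),
})

# A list of row labels and a list of column names, each in its own square brackets.
subset = data.loc[[3, 8, 20], ["Question1", "Question4"]]
print(subset)
```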

Exercises:

1. Display rows 14 and 18 of StudentID and Total.

2. Display rows 7, 9, and 16 of Question4 and Total.
3. Display all rows of StudentID and Total.
