Introduction To Data Science Using Python Part2

Uploaded by

salahmohamed38
Introduction to Data Science Using Python, Part 2


Pandas
Reading in Data From Excel
I have the following data saved in the file “Grades_Short.csv”:

Let’s see how we read this data into pandas:


Before you use pandas you must import it. Anytime you use pandas, put the import line at the top of your code.

The data is read into a variable called df_grades, using the built-in read_csv method and the path to the file.
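A minimal sketch of the import and the read step. The real contents of Grades_Short.csv are not reproduced here, so the example writes a small stand-in file first; the column names Mini Exam 1 and Final and all names and scores are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd  # put this line at the top of your code

# Hypothetical stand-in for Grades_Short.csv; the names and scores are made up.
csv_text = "Name,Mini Exam 1,Final\nAnn,18.5,30\nBob,15.0,27\nCara,20.0,36\n"

# Write the sample file to a temporary folder so the example runs anywhere.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "Grades_Short.csv")
with open(path, "w") as f:
    f.write(csv_text)

# read_csv takes the path to the file and returns the data as a dataframe.
df_grades = pd.read_csv(path)
print(df_grades)
```

In your own notebook the path would simply point at your copy of the file.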


Reading in Data From Excel
So, what is df_grades and how does it store the data?

Typing the name of any variable at the end of a code cell will display the contents of
the variable.

• df_grades is a pandas dataframe.

• The data is stored in a tabular format very similar to Excel.
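As a small illustration of the two points above, the sketch below checks the type of df_grades and displays it. The dataframe is built by hand with made-up values standing in for the file.

```python
import pandas as pd

# Hypothetical stand-in for the data in Grades_Short.csv (values are made up).
df_grades = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara"],
    "Mini Exam 1": [18.5, 15.0, 20.0],
    "Final": [30, 27, 36],
})

print(type(df_grades))  # the pandas DataFrame class
print(df_grades.shape)  # (number of rows, number of columns)

# In a Jupyter notebook, a bare variable name on the last line of a cell
# displays the table. In a plain script, use print(df_grades) instead.
df_grades
```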


Reading in Data From Excel

The path depends on where the data file sits relative to the Jupyter notebook.

• Data file and Jupyter notebook in the same folder: the file name alone is the path.

• Now Grades_Short.csv is in the Data folder, next to the Jupyter notebook: “/” separates directories.

• Now Grades_Short.csv is in the Data folder and the Jupyter notebook is in the Notebooks folder: “..” = go back one directory.
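The path rules above can be tried out with a throwaway folder layout. This is only a sketch: the folder names Data and Notebooks follow the slides, while the temporary project folder and the file contents are assumptions.

```python
import os
import tempfile

import pandas as pd

# Build a throwaway project with a Data folder and a Notebooks folder.
project = tempfile.mkdtemp()
os.makedirs(os.path.join(project, "Data"))
os.makedirs(os.path.join(project, "Notebooks"))
with open(os.path.join(project, "Data", "Grades_Short.csv"), "w") as f:
    f.write("Name,Final\nAnn,30\nBob,27\n")  # made-up rows

start_dir = os.getcwd()

# Case 1: the notebook sits next to the Data folder.
os.chdir(project)
df_from_project = pd.read_csv("Data/Grades_Short.csv")

# Case 2: the notebook sits in Notebooks, so ".." first goes back one directory.
os.chdir(os.path.join(project, "Notebooks"))
df_from_notebooks = pd.read_csv("../Data/Grades_Short.csv")

os.chdir(start_dir)  # restore the original working directory
print(df_from_project.equals(df_from_notebooks))
```

Both reads reach the same file, so the two dataframes hold the same data.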
The head() Method
Using the head() method

• If the data is really large you don’t want to print out the entire dataframe to your
output.

• The head(n) method outputs the first n rows of the data frame. If n is not supplied,
the default is the first 5 rows.

• I like to run the head() method after I read in the dataframe to check that everything
got read in correctly.

• There is also a tail(n) method that returns the last n rows of the dataframe.
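A short sketch of head() and tail() on a hand-built dataframe. The eight rows are made up so that the default of 5 rows is visible.

```python
import pandas as pd

# Hypothetical data: 8 students with made-up Final scores.
df_grades = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara", "Dev", "Eli", "Fay", "Gus", "Hana"],
    "Final": [30, 27, 36, 21, 33, 27, 24, 35],
})

first_five = df_grades.head()   # no n given, so the first 5 rows
first_two = df_grades.head(2)   # the first 2 rows
last_three = df_grades.tail(3)  # the last 3 rows

print(first_two)
print(last_three)
```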
Basic Features

Think of this as a list.

object = string, float64 = decimal, int64 = integer
Basic Features
column names

row names = index
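The basic features named on these slides (data types, column names, index) can be inspected as in the sketch below. The data is made up, and newer pandas versions may report text columns with a dedicated string type instead of object.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (values are made up).
df_grades = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara"],
    "Mini Exam 1": [18.5, 15.0, 20.0],
    "Final": [30, 27, 36],
})

print(df_grades.dtypes)         # one data type per column
print(list(df_grades.columns))  # column names, viewed as a plain list
print(list(df_grades.index))    # row names; the default index is the row number
```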


• By default, pandas uses the row number as the index and automatically recognizes that the first row of the file holds the column names.

• Next we discuss how to pick out various pieces of the dataframe.


Selecting a Single Column

• Between square brackets, the column must be given as a string.

• Outputs the column as a series.
• A series is a one-dimensional dataframe. More on this in the slicing section.
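A minimal sketch of selecting one column with square brackets, using made-up data in place of the grades file.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (values are made up).
df_grades = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara"],
    "Final": [30, 27, 36],
})

names = df_grades["Name"]  # the column name is given as a string
print(type(names))         # a pandas Series, not a DataFrame
print(names)
```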
Selecting a Single Column

• An exactly equivalent way to get the Name column


• +: don’t have to type brackets or quotes
• -: won’t generalize to selecting multiple columns, won’t work if column names have spaces, can’t create new columns this way
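The sketch below compares the two ways of getting the Name column and shows one of the listed drawbacks. The column name Mini Exam 1 is an assumption about the file, and the values are made up.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (values are made up).
df_grades = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara"],
    "Mini Exam 1": [18.5, 15.0, 20.0],
})

by_brackets = df_grades["Name"]
by_dot = df_grades.Name               # same column, no brackets or quotes
print(by_brackets.equals(by_dot))

# A column name with spaces cannot be written after a dot,
# so it has to go between square brackets.
mini_exam = df_grades["Mini Exam 1"]
print(mini_exam)
```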
Selecting Multiple Columns

• Give a list of strings, which correspond to column names.
• You can select as many columns as you want.
• Columns don’t have to be contiguous.
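A short sketch of selecting several columns at once. The data and the column names other than Name are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (values are made up).
df_grades = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara"],
    "Mini Exam 1": [18.5, 15.0, 20.0],
    "Final": [30, 27, 36],
})

# A list of column names inside the square brackets; Name and Final
# are not next to each other in the table, which is fine.
name_and_final = df_grades[["Name", "Final"]]
print(name_and_final)
```

Note the double brackets: the outer pair selects, the inner pair is the list.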
Storing Result

Why store a slice?

• We might want/have to do our analysis in steps.
• Less error prone
• More readable

The variable name stores a series.
Slicing a Series

Slice/index through the index, which is usually numbers.

• Picking out a single element
• Contiguous slice: the end point is non-inclusive
• Arbitrary slice
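The three kinds of series slicing can be sketched as follows. The example assumes the default index of row numbers; the variable names and the data are made up.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (values are made up).
df_grades = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara", "Dev", "Eli", "Fay"],
    "Final": [30, 27, 36, 21, 33, 27],
})

names = df_grades["Name"]  # store the column (a series) in a variable

first = names[0]           # single element, picked by its index 0
first_three = names[0:3]   # contiguous slice; index 3 is not included
picked = names[[0, 2, 4]]  # arbitrary slice, given as a list of indices

print(first)
print(first_three)
print(picked)
```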
Slicing a Data Frame

• There are a few ways to slice a data frame; we will use the .loc method.

• Access elements through the index labels and column names.

• We will see how to change both of these labels later on.


Slicing a Data Frame

• Pick a single value out: give the index label (number) first, then the column name (string).
Slicing a Data Frame

• Pick out an entire row: “pick out all columns”.

first_row is a series.
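A minimal sketch of picking a single value and an entire row with .loc. The data is made up, and writing ":" for "all columns" is one common way to spell the row selection, not necessarily the one used on the slide.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (values are made up).
df_grades = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara"],
    "Mini Exam 1": [18.5, 15.0, 20.0],
    "Final": [30, 27, 36],
})

# Single value: index label first, column name second.
one_value = df_grades.loc[0, "Name"]

# Entire row: ":" in the column position means "pick out all columns".
first_row = df_grades.loc[0, :]

print(one_value)
print(type(first_row))  # a Series whose index is the column names
```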
Slicing a Data Frame

• Pick out contiguous chunk: Endpoints are inclusive!


Slicing a Data Frame

• Pick out arbitrary chunk:
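Contiguous and arbitrary chunks with .loc can be sketched like this. The data and the column names other than Name are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (values are made up).
df_grades = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara", "Dev", "Eli", "Fay"],
    "Mini Exam 1": [18.5, 15.0, 20.0, 12.5, 17.0, 19.5],
    "Final": [30, 27, 36, 21, 33, 27],
})

# Contiguous chunk: rows 1 through 3 and columns Name through Final.
# With .loc both endpoints are included.
chunk = df_grades.loc[1:3, "Name":"Final"]

# Arbitrary chunk: lists of index labels and column names.
picked = df_grades.loc[[0, 2, 5], ["Name", "Final"]]

print(chunk)
print(picked)
```

This differs from slicing a series with plain square brackets, where the end point is left out.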


Built in Functions

How do I compute the average score on the final?


Use the built-in mean() method on the column.


Built in Functions

How do I compute the highest Mini Exam 1 score?
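Both questions can be answered with built-in column methods. The sketch below uses mean() as named on the slide and assumes max() for the highest score; the column names Final and Mini Exam 1 and all values are made up.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (values are made up).
df_grades = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara", "Dev", "Eli", "Fay"],
    "Mini Exam 1": [18.5, 15.0, 20.0, 12.5, 17.0, 19.5],
    "Final": [30, 27, 36, 21, 33, 27],
})

final_mean = df_grades["Final"].mean()      # average score on the final
best_mini = df_grades["Mini Exam 1"].max()  # highest Mini Exam 1 score

print(final_mean)
print(best_mini)
```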


Built in Functions

I can actually get all key stats for numeric columns at once with the describe() method.

summary_df is a dataframe! Notice here the index is not row numbers…
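A short sketch of describe() on made-up data. It shows that the result is itself a dataframe whose index holds the names of the statistics.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (values are made up).
df_grades = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara", "Dev"],
    "Mini Exam 1": [18.5, 15.0, 20.0, 12.5],
    "Final": [30, 27, 36, 21],
})

summary_df = df_grades.describe()  # numeric columns only by default
print(summary_df)
print(list(summary_df.index))      # statistic names instead of row numbers

# Because summary_df is a dataframe, .loc works on it too.
print(summary_df.loc["max", "Final"])
```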
Built in Functions

Other useful built in methods:

value_counts(): Gives a count of the number of times each unique value appears in the column. Returns a series where the indices are the unique column values.
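A minimal sketch of counting values in a column. Note that the pandas method is spelled value_counts(), with an s; the data is made up.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (values are made up).
df_grades = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara", "Dev", "Eli", "Fay"],
    "Final": [30, 27, 36, 21, 33, 27],
})

counts = df_grades["Final"].value_counts()
print(counts)

# The index of the result is the unique scores, so a score looks up its count.
print(counts.loc[27])
```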
unique(): Returns an array of all of the unique values.
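A short sketch of unique() on made-up scores. The values come back in the order in which they first appear in the column.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (values are made up).
df_grades = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara", "Dev", "Eli", "Fay"],
    "Final": [30, 27, 36, 21, 33, 27],
})

unique_scores = df_grades["Final"].unique()
print(unique_scores)
print(len(unique_scores))  # number of distinct scores
```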


Attributes vs. Methods

When do I put a ()?

• dataframe attributes: features of the dataframe, written without ()
• dataframe methods: require computation for output, written with ()
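The difference can be seen side by side in the sketch below. Which attributes and methods the slide listed is not known, so shape, columns, head() and mean() are picked here as examples; the data is made up.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (values are made up).
df_grades = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara"],
    "Final": [30, 27, 36],
})

# Attributes: features of the dataframe, no parentheses.
print(df_grades.shape)
print(df_grades.columns)

# Methods: they compute something, so they need parentheses.
print(df_grades.head(2))
print(df_grades["Final"].mean())
```

Leaving the () off a method does not run it; Python just shows the method object.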
Creating New Columns

Let’s create a useless new column of all 1s:


We can also create a column as a function of other columns. The Final was worth 36 points, so let’s create a column for each student’s percentage.
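Both new columns can be sketched as follows. The new column names Ones and Final Percentage are hypothetical, as are the scores; only the 36 points come from the slide.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (values are made up).
df_grades = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara"],
    "Final": [30, 27, 36],
})

# A new column of all 1s: assigning a single value fills every row.
df_grades["Ones"] = 1

# A new column computed from another column: the Final was worth 36 points.
df_grades["Final Percentage"] = df_grades["Final"] / 36 * 100

print(df_grades)
```

New columns have to be created with square brackets, not with the dot notation.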
Deleting Columns
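Only the title of this slide is preserved, so the following is an assumption about what it may have covered: two common ways to delete a column in pandas, drop() and del. The column name Ones and the data are made up.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (values are made up).
df_grades = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara"],
    "Final": [30, 27, 36],
})
df_grades["Ones"] = 1  # a throwaway column to delete again

# drop() returns a new dataframe and leaves df_grades unchanged.
without_ones = df_grades.drop(columns=["Ones"])
print(list(without_ones.columns))

# del removes the column from df_grades itself.
del df_grades["Ones"]
print(list(df_grades.columns))
```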
