Importing Files Through Pandas

The document discusses Pandas, a Python library used for working with data sets. It allows analyzing, cleaning, exploring, and manipulating data. Pandas can import data from files like CSV files into DataFrames. DataFrames are like tables with rows and columns that allow accessing and filtering specific rows or columns of data.

Uploaded by

fatimamaryam882

Importing files through Pandas
Course Instructor: Anam Shahid

Source: https://www.w3schools.com/python/pandas/default.asp

Pandas Introduction
What is Pandas?
Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The
library was created by Wes McKinney in 2008.

Why Use Pandas?

Pandas allows us to analyze big data and draw conclusions based on statistical
theories.

Pandas can clean messy data sets and make them readable and relevant.

Relevant data is very important in data science.

Data Science is a branch of computer science in which we study how to store, use,
and analyze data in order to derive information from it.

What Can Pandas Do?

Pandas gives you answers about the data, such as:

• Is there a correlation between two or more columns?
• What is the average value?
• What is the maximum value?
• What is the minimum value?

Pandas can also delete rows that are not relevant or that contain wrong
values, such as empty or NULL values. This is called cleaning the data.
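
The questions above map onto a handful of Pandas calls. The following is a minimal sketch rather than part of the original handout: the column names and numbers are made up for illustration, and None stands in for an empty (NULL) value.

```python
import pandas as pd

# Hypothetical sample data (not from the handout); None marks a missing value.
df = pd.DataFrame({
    "duration": [60, 45, 30, 60],
    "calories": [400.0, 300.0, None, 420.0],
})

# Correlation between two columns (pairs with a missing value are ignored).
correlation = df["duration"].corr(df["calories"])

# Average, maximum and minimum of one column (missing values are skipped).
average = df["calories"].mean()
highest = df["calories"].max()
lowest = df["calories"].min()

# Cleaning the data: drop every row that contains an empty value.
cleaned = df.dropna()

print(correlation, average, highest, lowest)
print(cleaned)
```

Note that dropna() returns a new DataFrame and leaves df unchanged, so the cleaned result has to be kept in its own variable.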

Pandas Getting Started

Installation of Pandas
If you have Python and PIP already installed on a system, then installation of
Pandas is very easy.

Install it using this command:

C:\Users\Your Name>pip install pandas

If this command fails, use a Python distribution that already has Pandas
installed, such as Anaconda or Spyder.

Import Pandas
Once Pandas is installed, import it in your applications by adding
the import keyword:
import pandas

Now Pandas is imported and ready to use.

Example
import pandas

mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}
myvar = pandas.DataFrame(mydataset)

print(myvar)

Pandas as pd
Pandas is usually imported under the pd alias.

alias: In Python, an alias is an alternate name for referring to the same thing.

Create an alias with the as keyword while importing:

import pandas as pd

Now the Pandas package can be referred to as pd instead of pandas.

Example
import pandas as pd

mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}

myvar = pd.DataFrame(mydataset)

print(myvar)
Output:

    cars  passings
0    BMW         3
1  Volvo         7
2   Ford         2

Pandas Series
What is a Series?
A Pandas Series is like a column in a table.

It is a one-dimensional array holding data of any type.

Example
Create a simple Pandas Series from a list:

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a)

print(myvar)

Output:

0    1
1    7
2    2
dtype: int64

Labels
If nothing else is specified, the values are labeled with their index number. First
value has index 0, second value has index 1 etc.

This label can be used to access a specified value.

Example
Return the first value of the Series:

print(myvar[0])

Output: 1

Create Labels
With the index argument, you can name your own labels.

Example
Create your own labels:

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar)

Output:

x    1
y    7
z    2
dtype: int64

When you have created labels, you can access an item by referring to the label.

Example
Return the value of "y":

print(myvar["y"])

Output: 7

Key/Value Objects as Series
You can also use a key/value object, like a dictionary, when creating a Series.

Example
Create a simple Pandas Series from a dictionary:

import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories)

print(myvar)

Note: The keys of the dictionary become the labels.

Output:

day1    420
day2    380
day3    390
dtype: int64
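
When a Series is built from a dictionary, the index argument can also be used to pick only some of the keys. This is a short sketch reusing the calories dictionary from the example; the choice of "day1" and "day2" is only for illustration.

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# Only the keys listed in index are kept, in the order given.
myvar = pd.Series(calories, index=["day1", "day2"])

print(myvar)
```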

DataFrames
Data sets in Pandas are usually multi-dimensional tables, called DataFrames.

A Series is like a column; a DataFrame is the whole table.

Example
Create a DataFrame from two columns of data (each column becomes a Series):

import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

myvar = pd.DataFrame(data)

print(myvar)

Output:

   calories  duration
0       420        50
1       380        40
2       390        45
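
The same table can also be built from two actual Series objects instead of plain lists. This is a small sketch under that assumption; the variable names calories and duration are chosen here for illustration.

```python
import pandas as pd

# Two one-dimensional Series, one per column.
calories = pd.Series([420, 380, 390])
duration = pd.Series([50, 40, 45])

# The dictionary keys become the column names of the DataFrame.
df = pd.DataFrame({"calories": calories, "duration": duration})

print(df)

# Selecting a single column gives back a Series.
print(type(df["calories"]).__name__)
```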

Pandas DataFrames
What is a DataFrame?
A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional
array or a table with rows and columns.

Example
Create a simple Pandas DataFrame:

import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

#load data into a DataFrame object:

df = pd.DataFrame(data)

print(df)

Result
   calories  duration
0       420        50
1       380        40
2       390        45

Locate Row
As you can see from the result above, the DataFrame is like a table with rows
and columns.

Pandas uses the loc attribute to return one or more specified rows.

Example
Return row 0:

#refer to the row index:

print(df.loc[0])

Result
calories    420
duration     50
Name: 0, dtype: int64

Note: This example returns a Pandas Series.

Example
Return row 0 and 1:

#use a list of indexes:

print(df.loc[[0, 1]])

Output:
   calories  duration
0       420        50
1       380        40

Note: When a list of labels is passed inside the [] of loc, the result is a Pandas DataFrame.
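
The difference between the two return types can be checked directly. The sketch below rebuilds the same DataFrame and inspects what loc gives back for a single label and for a list of labels; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

single = df.loc[0]            # one label: a Series
several = df.loc[[0, 1]]      # list of labels: a DataFrame
one_row_frame = df.loc[[0]]   # a one-element list still gives a DataFrame

print(type(single).__name__)
print(type(several).__name__)
print(type(one_row_frame).__name__)
```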

Named Indexes
With the index argument, you can name your own indexes.

Example
Add a list of names to give each row a name:

import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

print(df)

Result
      calories  duration
day1       420        50
day2       380        40
day3       390        45

Locate Named Indexes

Use the named index in the loc attribute to return the specified row(s).

Example
Return "day2":

#refer to the named index:

print(df.loc["day2"])

Result
calories    380
duration     40
Name: day2, dtype: int64
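
With named indexes, loc looks rows up by label. Pandas also offers iloc, which is not covered in this handout and looks rows up by position. The sketch below shows that both reach the same row and that the returned Series carries the row label as its name.

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

by_label = df.loc["day2"]     # look up by the named index
by_position = df.iloc[1]      # look up by position (second row)

print(by_label.equals(by_position))
print(by_label.name)
```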

Load Files Into a DataFrame

If your data sets are stored in a file, Pandas can load them into a DataFrame.

Example
Load a comma separated file (CSV file) into a DataFrame:

import pandas as pd

df = pd.read_csv('data.csv')

print(df)

     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175     282.4
4          45    117       148     406.0
..        ...    ...       ...       ...
164        60    105       140     290.8
165        60    110       145     300.4
166        60    115       145     310.2
167        75    120       150     320.4
168        75    125       150     330.4

[169 rows x 4 columns]
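
To try read_csv without having data.csv on disk, the text can be fed in from memory. This sketch assumes the file is comma separated with a header row named as in the output above; only the first three rows are reproduced, and csv_text is a stand-in for the real file.

```python
import io

import pandas as pd

# Stand-in for data.csv (assumed layout): header row plus three data rows.
csv_text = """Duration,Pulse,Maxpulse,Calories
60,110,130,409.1
60,117,145,479.0
60,103,135,340.0
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.shape)
```

With a real file, pd.read_csv('data.csv') does the same job, provided the file sits in the working directory of the script.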

Pandas Read CSV
Read CSV Files
A simple way to store big data sets is to use CSV files (comma separated files).

CSV files contain plain text and are a well-known format that can be read by
almost any tool, including Pandas.

In our examples we will be using a CSV file called 'data.csv'.

Example
Load the CSV into a DataFrame:

import pandas as pd

df = pd.read_csv('data.csv')

print(df.to_string())

Output:

     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175     282.4
4          45    117       148     406.0
5          60    102       127     300.5
6          60    110       136     374.0
7          45    104       134     253.3
8          30    109       133     195.1
9          60     98       124     269.0
10         60    103       147     329.3
11         60    100       120     250.7
12         60    106       128     345.3
13         60    104       132     379.3
14         60     98       123     275.0
15         60     98       120     215.2
16         60    100       120     300.0

Tip: use to_string() to print the entire DataFrame.

If you have a large DataFrame with many rows, Pandas will only display the first
5 rows and the last 5 rows:

Example
Print the DataFrame without the to_string() method:

import pandas as pd

df = pd.read_csv('data.csv')

print(df)

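Output:

     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175     282.4
4          45    117       148     406.0
..        ...    ...       ...       ...
164        60    105       140     290.8
165        60    110       145     300.4
166        60    115       145     310.2
167        75    120       150     320.4
168        75    125       150     330.4

[169 rows x 4 columns]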

max_rows
The number of rows returned is defined in Pandas option settings.

You can check your system's maximum rows with the pd.options.display.max_rows
statement.

Example
Check the number of maximum returned rows:

import pandas as pd
print(pd.options.display.max_rows)

Output: 60

On my system the number is 60, which means that if the DataFrame contains
more than 60 rows, the print(df) statement will display only the headers and
the first and last 5 rows.

You can change the maximum rows number with the same statement.

Example
Increase the maximum number of rows to display the entire DataFrame:

import pandas as pd

pd.options.display.max_rows = 9999

df = pd.read_csv('data.csv')

print(df)

Output:
     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175     282.4
4          45    117       148     406.0
5          60    102       127     300.5
6          60    110       136     374.0
7          45    104       134     253.3
8          30    109       133     195.1
9          60     98       124     269.0
10         60    103       147     329.3
11         60    100       120     250.7
12         60    106       128     345.3
13         60    104       132     379.3
14         60     98       123     275.0
15         60     98       120     215.2
16         60    100       120     300.0
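
Assigning to pd.options.display.max_rows changes the setting for the rest of the program. If the change is only needed for one print, pd.option_context can apply it temporarily. The sketch below is an illustration with a made-up 100-row DataFrame, not part of the original handout.

```python
import pandas as pd

# Hypothetical DataFrame with 100 rows, just to have something long to print.
df = pd.DataFrame({"value": range(100)})

before = pd.get_option("display.max_rows")

# Inside the with-block at most 10 rows are shown, so the output is shortened.
with pd.option_context("display.max_rows", 10):
    short_text = repr(df)

# None means no limit, so every row is shown.
with pd.option_context("display.max_rows", None):
    full_text = repr(df)

# After the with-blocks the original setting is back in place.
after = pd.get_option("display.max_rows")

print(short_text)
print(len(full_text.splitlines()))
```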

Pandas - Analyzing DataFrames

Viewing the Data
One of the most used methods for getting a quick overview of the DataFrame is
the head() method.

The head() method returns the headers and a specified number of rows,
starting from the top.

Example
Get a quick overview by printing the first 10 rows of the DataFrame:

import pandas as pd

df = pd.read_csv('data.csv')

print(df.head(10))

Output:

   Duration  Pulse  Maxpulse  Calories
0        60    110       130     409.1
1        60    117       145     479.0
2        60    103       135     340.0
3        45    109       175     282.4
4        45    117       148     406.0
5        60    102       127     300.5
6        60    110       136     374.0
7        45    104       134     253.3
8        30    109       133     195.1
9        60     98       124     269.0

Note: If the number of rows is not specified, the head() method will return the
top 5 rows.

Example
Print the first 5 rows of the DataFrame:

import pandas as pd

df = pd.read_csv('data.csv')

print(df.head())

Output:

   Duration  Pulse  Maxpulse  Calories
0        60    110       130     409.1
1        60    117       145     479.0
2        60    103       135     340.0
3        45    109       175     282.4
4        45    117       148     406.0

There is also a tail() method for viewing the last rows of the DataFrame.

The tail() method returns the headers and a specified number of rows, starting
from the bottom.

Example
Print the last 5 rows of the DataFrame:

print(df.tail())

Output:

     Duration  Pulse  Maxpulse  Calories
164        60    105       140     290.8
165        60    110       145     300.4
166        60    115       145     310.2
167        75    120       150     320.4
168        75    125       150     330.4
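
Like head(), tail() accepts the number of rows to return. The sketch below uses a made-up single-column DataFrame with 169 rows (the same row count as data.csv, but hypothetical values) so that it runs without the file.

```python
import pandas as pd

# Hypothetical stand-in with 169 rows, indexed 0 to 168.
df = pd.DataFrame({"value": range(169)})

first_three = df.head(3)   # rows 0, 1, 2
last_two = df.tail(2)      # rows 167, 168
default_head = df.head()   # 5 rows when no number is given

print(first_three)
print(last_two)
```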

Info About the Data

The DataFrame object has a method called info() that gives you more
information about the data set.

Example
Print information about the data:

print(df.info())

Result
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 169 entries, 0 to 168
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Duration  169 non-null    int64
 1   Pulse     169 non-null    int64
 2   Maxpulse  169 non-null    int64
 3   Calories  164 non-null    float64
dtypes: float64(1), int64(3)
memory usage: 5.4 KB
None

Result Explained
The result tells us there are 169 rows and 4 columns:

RangeIndex: 169 entries, 0 to 168

Data columns (total 4 columns):

And the name of each column, with the data type:

 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Duration  169 non-null    int64
 1   Pulse     169 non-null    int64
 2   Maxpulse  169 non-null    int64
 3   Calories  164 non-null    float64
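
The Non-Null Count column shows 164 values for Calories out of 169 rows, so 5 rows have no value in that column. The same counts can be computed directly. This is a minimal sketch with a small made-up DataFrame in place of data.csv, where None marks the missing value.

```python
import pandas as pd

# Hypothetical three-row sample; the real data.csv has 169 rows.
df = pd.DataFrame({
    "Duration": [60, 60, 45],
    "Calories": [409.1, None, 282.4],
})

# Number of empty values per column.
missing_per_column = df.isna().sum()

# Number of non-empty values per column (what info() reports as non-null).
non_null_per_column = df.count()

print(missing_per_column)
print(non_null_per_column)
```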
