introduction to pandas

Pandas is a Python library designed for data manipulation and analysis, allowing users to clean, explore, and analyze datasets effectively. It provides data structures like Series and DataFrames for handling one-dimensional and multi-dimensional data, respectively, and supports operations such as loading data from files, handling missing values, and performing statistical analyses. The library is essential for data science, enabling users to derive insights from large datasets through various functions and methods.

Uploaded by

korircaren4


PANDAS

The name Pandas is derived from "panel data".

It is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

Pandas allows us to analyze large data sets and draw conclusions based on
statistical theory.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

Data science is a branch of computer science that studies how to store,
use, and analyze data in order to derive information from it.

Pandas gives you answers about the data, such as:

• Is there a correlation between two or more columns?

• What is the average value?

• What is the maximum value?

• What is the minimum value?
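As a minimal sketch of how these questions map to Pandas calls, the example below uses a small made-up table. The column names and values are invented for illustration and do not come from any data set in this document.

```python
import pandas as pd

# Hypothetical workout data, invented for illustration.
df = pd.DataFrame({
    "duration": [30, 45, 60, 90],
    "calories": [150, 225, 300, 450],
})

# Correlation between every pair of numeric columns.
corr = df.corr()

# Average, maximum and minimum of one column.
avg_calories = df["calories"].mean()
max_calories = df["calories"].max()
min_calories = df["calories"].min()

print(corr)
print(avg_calories, max_calories, min_calories)
```

Here corr() returns a table of pairwise correlations, while mean(), max() and min() each return a single number for the selected column.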

Pandas can also delete rows that are not relevant or that contain wrong values,
such as empty or NULL values. This is called cleaning the data.
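One possible way to drop rows with empty values is the dropna() method. The sketch below assumes a tiny made-up table with one missing value; it is an illustration, not a complete cleaning recipe.

```python
import pandas as pd

# Hypothetical data with one missing value (None is stored as NaN).
df = pd.DataFrame({
    "duration": [60, 45, 30],
    "calories": [409.0, None, 195.0],
})

# dropna() returns a new DataFrame without the rows that contain empty values.
# The original DataFrame is left unchanged.
cleaned = df.dropna()

print(cleaned)
```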

Once Pandas is installed, import it into your application with the import keyword:

import pandas

Example
import pandas
mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}

myvar = pandas.DataFrame(mydataset)

print(myvar)
What is a Series?

A Pandas Series is like a column in a table.

It is a one-dimensional array holding data of any type.

Example

Create a simple Pandas Series from a list:

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a)

print(myvar)
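When no labels are given, Pandas labels the values with their position, starting at 0. The short sketch below reuses the list from the example above to show those default labels.

```python
import pandas as pd

a = [1, 7, 2]
myvar = pd.Series(a)

# Without an index argument the labels are 0, 1, 2, ...
labels = list(myvar.index)

# The first value has the label 0.
first_value = myvar[0]

print(labels)
print(first_value)
```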

Create Labels

With the index argument, you can name your own labels.

Example

Create your own labels:

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar)

When you have created labels, you can access an item by referring to the label.

Example

Return the value of "y":

print(myvar["y"])

Key/Value Objects as Series

You can also use a key/value object, like a dictionary, when creating a Series.

Example

Create a simple Pandas Series from a dictionary:

import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories)

print(myvar)

Note: The keys of the dictionary become the labels.

To select only some of the items in the dictionary, use the index argument and
specify only the items you want to include in the Series.

Example

Create a Series using only data from "day1" and "day2":

import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories, index = ["day1", "day2"])

print(myvar)
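If the index argument names a label that is not a key in the dictionary, Pandas keeps the label and fills the value with NaN instead of raising an error. In the sketch below, "day4" is a hypothetical label chosen to show this.

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# "day4" is a hypothetical label that is not a key in the dictionary.
myvar = pd.Series(calories, index=["day1", "day4"])

print(myvar)
```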
DataFrames

Data sets in Pandas are usually multi-dimensional tables, called DataFrames.

A Series is like a column; a DataFrame is the whole table.

Example

Create a DataFrame from a dictionary of two lists (each list becomes a column):

import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

myvar = pd.DataFrame(data)

print(myvar)
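The example above passes plain lists. As a sketch, the same table can also be built from two explicit Series objects, which makes the "a Series is a column" idea visible in the code.

```python
import pandas as pd

# Two Series, each of which becomes one column of the table.
calories = pd.Series([420, 380, 390])
duration = pd.Series([50, 40, 45])

df = pd.DataFrame({"calories": calories, "duration": duration})

print(df)
```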

A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional
array or a table with rows and columns.

Example

Create a simple Pandas DataFrame:

import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

#load data into a DataFrame object:

df = pd.DataFrame(data)

print(df)

As you can see from the result above, the DataFrame is like a table with rows and
columns.

Pandas uses the loc attribute to return one or more specified rows.

Example

Return row 0:

#refer to the row index:

print(df.loc[0])

Note: This example returns a Pandas Series.

Example

Return row 0 and 1:

#use a list of indexes:

print(df.loc[[0, 1]])

Note: When a list of labels is passed inside loc[], the result is a Pandas DataFrame.
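The difference between the two return types can be checked directly. This is a small self-contained sketch that rebuilds the same table and inspects the type of each result.

```python
import pandas as pd

df = pd.DataFrame({
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
})

one_row = df.loc[0]        # a single label gives a Series
two_rows = df.loc[[0, 1]]  # a list of labels gives a DataFrame

print(type(one_row).__name__)
print(type(two_rows).__name__)
```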

Named Indexes

With the index argument, you can name your own indexes.

Example

Add a list of names to give each row a name:

import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

print(df)

Locate Named Indexes

Use the named index in the loc attribute to return the specified row(s).

Example
Return "day2":

#refer to the named index:

print(df.loc["day2"])
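Named indexes work with loc in the same ways as the default numeric ones. As a sketch, the block below selects several named rows with a list and reads a single cell by giving both a row label and a column name.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# A list of row labels returns a DataFrame with just those rows.
subset = df.loc[["day1", "day3"]]

# A row label plus a column name returns a single value.
value = df.loc["day2", "calories"]

print(subset)
print(value)
```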

Load Files Into a DataFrame

If your data sets are stored in a file, Pandas can load them into a DataFrame.

Example

Load a comma separated file (CSV file) into a DataFrame:

import pandas as pd

df = pd.read_csv('data.csv')

print(df)

Read CSV Files

A simple way to store big data sets is to use CSV (comma-separated values) files.

CSV files contain plain text in a well-known format that can be read by almost
any program, including Pandas.

In our examples we will be using a CSV file called 'data.csv'.

Load the CSV into a DataFrame:

import pandas as pd

df = pd.read_csv('data.csv')

print(df.to_string())

Tip: use to_string() to print the entire DataFrame.
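To try read_csv() without having data.csv on disk, the CSV text can be wrapped in an in-memory buffer. The sketch below assumes a few made-up rows; the column names and values are placeholders, not the contents of the real file.

```python
import io

import pandas as pd

# Stand-in for a CSV file; the values are invented for illustration.
csv_text = """Duration,Pulse,Calories
60,110,400.5
45,109,280.0
"""

# read_csv() accepts a file path or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df.to_string())
```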

If you have a large DataFrame with many rows, printing it without to_string()
shows only the first 5 rows and the last 5 rows:

Example

Print the DataFrame without the to_string() method:

import pandas as pd

df = pd.read_csv('data.csv')

print(df)

The number of rows returned is defined in Pandas option settings.

You can check your system's maximum rows with the pd.options.display.max_rows
statement.

Example

Check the number of maximum returned rows:

import pandas as pd

print(pd.options.display.max_rows)

On my system the number is 60, which means that if the DataFrame contains more
than 60 rows, the print(df) statement will print only the headers and the first and
last 5 rows.

You can change the maximum number of rows with the same statement.
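As a minimal sketch, assigning a new number to the same option changes the limit. The value 9999 is an arbitrary choice for illustration, and the original setting is restored at the end so other code is not affected.

```python
import pandas as pd

# Remember the current setting so it can be restored afterwards.
original = pd.options.display.max_rows

# Raise the limit; 9999 is an arbitrary example value.
pd.options.display.max_rows = 9999
changed = pd.options.display.max_rows
print(changed)

# Put the original setting back.
pd.options.display.max_rows = original
```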

Read JSON

Big data sets are often stored or extracted as JSON.

JSON is plain text, but has the format of an object, and is well known in the world
of programming, including Pandas.

In our examples we will be using a JSON file called 'data.json'.

Example

Load the JSON file into a DataFrame:

import pandas as pd
df = pd.read_json('data.json')

print(df.to_string())

Tip: use to_string() to print the entire DataFrame
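The same call can be tried without a data.json file by reading JSON text from an in-memory buffer. The JSON below is a made-up two-row sample that follows the column-oriented layout used in this document.

```python
import io

import pandas as pd

# Stand-in for a JSON file; the values are invented for illustration.
json_text = '{"Duration": {"0": 60, "1": 45}, "Pulse": {"0": 110, "1": 109}}'

# read_json() accepts a file path or a file-like object such as StringIO.
df = pd.read_json(io.StringIO(json_text))

print(df.to_string())
```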

JSON = Python Dictionary

JSON objects have a format very similar to Python dictionaries.

If your JSON data is not in a file but already in a Python dictionary, you can
load it into a DataFrame directly:

Example

Load a Python Dictionary into a DataFrame:

import pandas as pd

data = {
"Duration":{
"0":60,
"1":60,
"2":60,
"3":45,
"4":45,
"5":60
},
"Pulse":{
"0":110,
"1":117,
"2":103,
"3":109,
"4":117,
"5":102
},
"Maxpulse":{
"0":130,
"1":145,
"2":135,
"3":175,
"4":148,
"5":127
},
"Calories":{
"0":409,
"1":479,
"2":340,
"3":282,
"4":406,
"5":300
}
}

df = pd.DataFrame(data)

print(df)
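When the JSON arrives as a string rather than as a dictionary, one option is to parse it with the standard json module first. This sketch uses a shortened, made-up version of the data. Note that the row labels stay strings ("0", "1") because they are dictionary keys.

```python
import json

import pandas as pd

# Made-up JSON text, shortened for illustration.
json_text = '{"Duration": {"0": 60, "1": 45}, "Pulse": {"0": 110, "1": 109}}'

# json.loads() turns the JSON text into a regular Python dictionary.
data = json.loads(json_text)

df = pd.DataFrame(data)

print(df)
```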

Pandas - Analyzing DataFrames

Viewing the Data

One of the most used methods for getting a quick overview of the DataFrame is the
head() method.

The head() method returns the headers and a specified number of rows, starting
from the top.

Example

Get a quick overview by printing the first 10 rows of the DataFrame:

import pandas as pd

df = pd.read_csv('data.csv')

print(df.head(10))

Note: if the number of rows is not specified, the head() method will return the top 5
rows.

Example

Print the first 5 rows of the DataFrame:

import pandas as pd

df = pd.read_csv('data.csv')

print(df.head())

There is also a tail() method for viewing the last rows of the DataFrame.
The tail() method returns the headers and a specified number of rows, starting from
the bottom.

Example

Print the last 5 rows of the DataFrame:

print(df.tail())
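Both methods can be tried without data.csv. The sketch below builds a hypothetical 20-row DataFrame (the column name "value" is made up) and takes rows from each end.

```python
import pandas as pd

# Hypothetical DataFrame with 20 rows holding the numbers 0 to 19.
df = pd.DataFrame({"value": range(20)})

first_three = df.head(3)   # rows 0, 1, 2
last_three = df.tail(3)    # rows 17, 18, 19
default_rows = df.head()   # 5 rows when no number is given

print(first_three)
print(last_three)
```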

Info About the Data

The DataFrame object has a method called info() that gives you more information
about the data set.

Example

Print information about the data:

#info() prints its report itself and returns None, so no print() is needed:
df.info()
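If the report is needed as text, for example to write it to a log, info() can write into a buffer instead of the screen. This is a sketch on a small made-up table, with column names that only mirror the ones used in this document.

```python
import io

import pandas as pd

# Hypothetical data with one missing value in "Calories".
df = pd.DataFrame({
    "Duration": [60, 45, 30],
    "Calories": [409.1, None, 195.0],
})

# info() writes to the given buffer instead of printing.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()

print(summary)
```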

Null Values

The info() method also tells us how many non-null values are present in each
column. In our data set, the "Calories" column has 164 non-null values out of 169.

This means that 5 rows have no value at all in the "Calories" column, for
whatever reason.

Empty values, or null values, can distort an analysis, and you should
consider removing rows with empty values. This is a step towards what is called
cleaning data, and you will learn more about that in the next chapters.
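The same counts can be obtained as numbers rather than read off the info() report. The sketch below assumes a small made-up table; count() gives the non-null values per column and isna().sum() gives the empty ones.

```python
import pandas as pd

# Hypothetical data with two missing values in "Calories".
df = pd.DataFrame({
    "Duration": [60, 45, 30, 60],
    "Calories": [409.1, None, 195.0, None],
})

non_null_per_column = df.count()      # non-null values in each column
missing_per_column = df.isna().sum()  # empty values in each column

print(non_null_per_column)
print(missing_per_column)
```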
