
ML Lab1 Python Panda

Uploaded by

Aly Akbar Sadakaly


Parul Institute of Computer Application

Faculty Of IT and Computer Science

PARUL UNIVERSITY

Python Lab

Pandas
What is Pandas?
Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.

Why Use Pandas?

Pandas allows us to analyze large data sets and draw conclusions based on
statistical theories.

Pandas can clean messy data sets, and make them readable and
relevant.

Relevant data is very important in data science.

Data Science is a branch of computer science where we study how to store, use and analyze data in order to derive information from it.

What Can Pandas Do?

Pandas gives you answers about the data, such as:

• Is there a correlation between two or more columns?

• What is the average value?
Python AI-IMCA SEM-2 Prof Nirmit Shah 1
• What is the maximum value?
• What is the minimum value?

Pandas can also delete rows that are not relevant or that contain
wrong values, such as empty or NULL values. This is
called cleaning the data.
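As a rough illustration of the questions listed above, the following sketch builds a small DataFrame in memory and asks Pandas for a correlation, an average, a maximum, a minimum, and a cleaned copy without empty rows. The column names "hours" and "score" and all values are made up for this example.

```python
import pandas as pd

# Hypothetical data, invented for illustration; None marks an empty value.
df = pd.DataFrame({
    "hours": [1.0, 2.0, 3.0, 4.0, None],
    "score": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Cleaning: drop every row that contains an empty (NaN) value.
clean = df.dropna()

# Correlation between the two columns (Pearson by default).
correlation = clean["hours"].corr(clean["score"])

average = clean["score"].mean()
maximum = clean["score"].max()
minimum = clean["score"].min()

print(len(df), "rows before cleaning,", len(clean), "rows after")
print("correlation:", correlation)
print("average:", average, "max:", maximum, "min:", minimum)
```

dropna() returns a new DataFrame and leaves the original untouched, so the uncleaned data stays available for comparison.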

Where is the Pandas Codebase?

The source code for Pandas is located in this GitHub repository: https://github.com/pandas-dev/pandas

To install Pandas, run:

pip install pandas

Read CSV Files

A simple way to store big data sets is to use CSV files (comma-separated values files).

CSV files contain plain text and are a well-known format that can be read by almost any tool,
including Pandas.

In our examples we will be using a CSV file called 'data.csv'.

import pandas as pd

df = pd.read_csv('data.csv')

print(df.to_string())

Tip: use to_string() to print the entire DataFrame.
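For readers who do not have 'data.csv' at hand, the sketch below reads CSV text from an in-memory buffer instead of a file. The three rows are invented for illustration; only the column names Duration and Calories are taken from the examples in this lab.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'.
csv_text = """Duration,Calories
60,409.1
45,282.4
30,195.0
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.to_string())   # prints every row and column
print(df.shape)         # (number of rows, number of columns)
```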

If you print a large DataFrame with many rows without to_string(), Pandas will only show the first 5 rows
and the last 5 rows.


max_rows

The number of rows returned is defined in Pandas option settings.

You can check your system's maximum rows with the pd.options.display.max_rows statement.

Example

Check the number of maximum returned rows:

import pandas as pd

print(pd.options.display.max_rows)

Example

Increase the maximum number of rows to display the entire DataFrame:

import pandas as pd

pd.options.display.max_rows = 9999

df = pd.read_csv('data.csv')

print(df)
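Setting pd.options.display.max_rows changes the option for the rest of the program. As one possible alternative, the sketch below uses pd.option_context to change it only inside a with block. The 100-row DataFrame is generated purely for illustration.

```python
import pandas as pd

# A DataFrame with 100 rows, generated only for this illustration.
df = pd.DataFrame({"n": range(100)})

# With max_rows set below the row count, the printed text is shortened
# and a row of dots marks the omitted rows.
with pd.option_context("display.max_rows", 10):
    short_text = repr(df)

# With max_rows above the row count, every row is printed.
with pd.option_context("display.max_rows", 9999):
    full_text = repr(df)

print(len(short_text.splitlines()), "lines when truncated")
print(len(full_text.splitlines()), "lines when printed in full")
```

When the with block ends, the option returns to its previous value, which avoids surprises in later parts of a script.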

Viewing the Data

One of the most used methods for getting a quick overview of a DataFrame is
the head() method.

The head() method returns the headers and a specified number of rows, starting from
the top.

Example

Get a quick overview by printing the first 10 rows of the DataFrame:


import pandas as pd

df = pd.read_csv('data.csv')

print(df.head(10))

Example

Print the first 5 rows of the DataFrame:

import pandas as pd

df = pd.read_csv('data.csv')

print(df.head())

There is also a tail() method for viewing the last rows of the DataFrame.

The tail() method returns the headers and a specified number of rows, starting from the
bottom.

Example

Print the last 5 rows of the DataFrame:

print(df.tail())
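The following self-contained sketch shows head() and tail() side by side without needing 'data.csv'. The 26-row DataFrame and its column names are invented for this example.

```python
import pandas as pd

# 26 rows generated only for this illustration.
df = pd.DataFrame({
    "letter": list("abcdefghijklmnopqrstuvwxyz"),
    "position": range(1, 27),
})

first_three = df.head(3)   # rows with index 0, 1, 2
last_three = df.tail(3)    # rows with index 23, 24, 25
default_head = df.head()   # no argument: 5 rows

print(first_three)
print(last_three)
print(len(default_head))
```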

Info About the Data

The DataFrame object has a method called info() that gives you more information
about the data set.

Example

Print information about the data:

df.info()  # info() prints its report itself and returns None, so print() is not needed
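info() writes its report to standard output and returns None. If the report is needed as text, one option is the buf parameter, sketched below with a small made-up DataFrame that has one missing salary.

```python
import io
import pandas as pd

# Hypothetical data with one missing value in the 'salary' column.
df = pd.DataFrame({
    "person": ["A", "B", "C"],
    "salary": [40000.0, None, 45000.0],
})

# info() writes to standard output by default; the buf argument redirects
# the report into a text buffer so it can be stored or searched.
buffer = io.StringIO()
result = df.info(buf=buffer)
report = buffer.getvalue()

print(report)
print(result)   # None, because info() does not return the report
```

The report lists each column with its count of non-null values, which is a quick way to spot columns that need cleaning.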


To discover duplicates, we can use the duplicated() method.

The duplicated() method returns a Boolean value for each row:

Example

Returns True for every row that is a duplicate, otherwise False:

print(df.duplicated())
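A small sketch of duplicated() on made-up rows, followed by drop_duplicates() as one way to remove the repeats. The data is hypothetical: the third row repeats the first one exactly.

```python
import pandas as pd

# Hypothetical rows; the third row repeats the first one exactly.
df = pd.DataFrame({
    "person": ["A", "B", "A", "C"],
    "salary": [40000, 32000, 40000, 45000],
})

# False for the first occurrence of a row, True for each later repeat.
flags = df.duplicated()

# Keeps the first occurrence of each row and drops the repeats.
unique_rows = df.drop_duplicates()

print(flags.tolist())
print(unique_rows)
```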

Let's Learn Pandas with a Small Example

Create a CSV file using the following data:

   person  salary  country
0  A       40000   USA
1  B       32000   Brazil
2  C       45000   Italy
3  D       54000   USA
4  E       72000   USA
5  F       62000   Brazil
6  G       92000   Italy
7  H       55000   USA
8  I       35000   Italy
9  J       48000   Brazil
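One way to create this file is with Pandas itself, as sketched below. It assumes the leading 0 to 9 column is the row index rather than data, and it writes to a file named 'stats.csv' in the current working directory. Both are assumptions made for this example.

```python
import pandas as pd

# The ten rows from the table above.
df = pd.DataFrame({
    "person": list("ABCDEFGHIJ"),
    "salary": [40000, 32000, 45000, 54000, 72000,
               62000, 92000, 55000, 35000, 48000],
    "country": ["USA", "Brazil", "Italy", "USA", "USA",
                "Brazil", "Italy", "USA", "Italy", "Brazil"],
})

# index=False leaves the 0-9 row labels out of the file.
df.to_csv("stats.csv", index=False)

# Read the file back to confirm the round trip.
check = pd.read_csv("stats.csv")
print(check)
```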

Practical 1 : Use Pandas to Calculate Stats from an Imported CSV File

The goal is to calculate the following statistics using the Pandas
package:

• Mean salary
• Total sum of salaries
• Maximum salary
• Minimum salary
• Count of salaries
• Median salary
• Standard deviation of salaries
• Variance of salaries

Sol:

import pandas as pd

df = pd.read_csv(r'C:\Users\Ron\Desktop\stats.csv')

# block 1 - simple stats

mean1 = df['salary'].mean()

sum1 = df['salary'].sum()

max1 = df['salary'].max()

min1 = df['salary'].min()

count1 = df['salary'].count()

median1 = df['salary'].median()

std1 = df['salary'].std()

var1 = df['salary'].var()

# block 2 - group by

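# Select the salary column first so the text column 'person' is not aggregated.
groupby_sum1 = df.groupby(['country'])['salary'].sum()

groupby_count1 = df.groupby(['country'])['salary'].count()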

# print block 1

print('mean salary: ' + str(mean1))

print('sum of salaries: ' + str(sum1))

print('max salary: ' + str(max1))

print('min salary: ' + str(min1))

print('count of salaries: ' + str(count1))

print('median salary: ' + str(median1))

print('std of salaries: ' + str(std1))

print('var of salaries: ' + str(var1))

# print block 2

print('sum of values, grouped by the country: ' + str(groupby_sum1))

print('count of values, grouped by the country: ' + str(groupby_count1))
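As an alternative sketch (not the solution given in this lab), the same statistics can be collected in one call with agg(). The DataFrame is built in memory from the lab's table so the code runs without the CSV file.

```python
import pandas as pd

# Same rows as the lab's table, built in memory for a self-contained run.
df = pd.DataFrame({
    "person": list("ABCDEFGHIJ"),
    "salary": [40000, 32000, 45000, 54000, 72000,
               62000, 92000, 55000, 35000, 48000],
    "country": ["USA", "Brazil", "Italy", "USA", "USA",
                "Brazil", "Italy", "USA", "Italy", "Brazil"],
})

# agg() accepts a list of statistic names and returns them as one Series.
stats = df["salary"].agg(
    ["mean", "sum", "max", "min", "count", "median", "std", "var"]
)

# Selecting the salary column before aggregating keeps the text column
# 'person' out of the grouped result.
by_country = df.groupby("country")["salary"].agg(["sum", "count"])

print(stats)
print(by_country)
```

Note that std() and var() in Pandas use the sample formulas (dividing by n - 1) by default.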

Pandas - Plotting

Pandas uses the plot() method to create diagrams.

We can use Pyplot, a submodule of the Matplotlib library, to visualize the diagram on the screen.

pandas.DataFrame.plot

DataFrame.plot(*args, **kwargs)

Make plots of Series or DataFrame.

Uses the backend specified by the option plotting.backend. By default, matplotlib is used.

Parameters:

data : Series or DataFrame
The object for which the method is called.

x : label or position, default None
Only used if data is a DataFrame.

y : label, position or list of label, positions, default None
Allows plotting of one column versus another. Only used if data is a DataFrame.

kind : str
The kind of plot to produce:

• 'line' : line plot (default)
• 'bar' : vertical bar plot
• 'barh' : horizontal bar plot
• 'hist' : histogram
• 'box' : boxplot
• 'kde' : Kernel Density Estimation plot
• 'density' : same as 'kde'
• 'area' : area plot
• 'pie' : pie plot
• 'scatter' : scatter plot (DataFrame only)
• 'hexbin' : hexbin plot (DataFrame only)

ax : matplotlib axes object, default None
An axes of the current figure.


Example

Import pyplot from Matplotlib and visualize our DataFrame:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')

df.plot()

plt.show()

Example

Draw a scatter plot with Duration on the x-axis and Calories on the y-axis:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')

df.plot(kind = 'scatter', x = 'Duration', y = 'Calories')

plt.show()
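The sketch below is a self-contained variant of the scatter plot that does not need 'data.csv'. The workout values are invented; only the column names Duration and Calories come from the example. It also assumes a non-interactive Matplotlib backend ("Agg") and saves the figure to a file with an arbitrary name instead of calling plt.show(), so it can run on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical workout data; only the column names come from the lab example.
df = pd.DataFrame({
    "Duration": [30, 45, 60, 60, 90],
    "Calories": [195.0, 282.4, 409.1, 380.3, 520.0],
})

# plot() returns the Matplotlib Axes it drew on, so the plot can be
# adjusted afterwards, for example by adding a title.
ax = df.plot(kind="scatter", x="Duration", y="Calories")
ax.set_title("Calories versus Duration")

# Save to a file instead of showing a window; the file name is arbitrary.
plt.savefig("duration_vs_calories.png")
```

On a desktop with a display, removing the matplotlib.use("Agg") line and ending with plt.show() opens the plot in a window, as in the examples above.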
