EDA - Session-1 - Basic Dataframe Operations-1

The document discusses reading and manipulating CSV files using Pandas in Python. It shows how to: 1) Import necessary libraries and read a CSV file into a Pandas DataFrame. 2) Create DataFrames from lists, dictionaries, and zipped data with or without specifying column names and indexes. 3) Add, update, and drop columns and rows from DataFrames. 4) Save DataFrames to CSV and Excel files.

Uploaded by

jeeshu048


Reading the CSV file

In [1]: import pandas as pd # Dataframe operations

import numpy as np # Math operations
import matplotlib.pyplot as plt # Diagrams / plots
import seaborn as sns # Diagrams / plots

In [ ]: # dataset name: visadataset

# read CSV file: comma-separated values
# extension: .csv
# you can read this using the pandas package

# read Excel file
# extension: .xlsx

In [2]: # path
#file location+filename+extension

path=r"C:\Users\omkar\OneDrive\Documents\Data science\Naresh IT\Datafiles\Vi

In [3]: pd.read_csv(path)

Out[3]:
case_id continent education_of_employee has_job_experience requires_job_trainin

0 EZYV01 Asia High School N

1 EZYV02 Asia Master's Y

2 EZYV03 Asia Bachelor's N

3 EZYV04 Asia Bachelor's N

4 EZYV05 Africa Master's Y

... ... ... ... ...

25475 EZYV25476 Asia Bachelor's Y

25476 EZYV25477 Asia High School Y

25477 EZYV25478 Asia Master's Y

25478 EZYV25479 Asia Master's Y

25479 EZYV25480 Asia Bachelor's Y

25480 rows × 12 columns

 
In [8]: # Exercise: read the bank dataset in the same way
# dataset name: bank
path=r"C:\Users\omkar\OneDrive\Documents\Data science\Naresh IT\Datafiles\ba
pd.read_csv(path,
sep=';') # this file uses ';' as the field separator

Out[8]:
age job marital education default balance housing loan contact day m

0 30 unemployed married primary no 1787 no no cellular 19

1 33 services married secondary no 4789 yes yes cellular 11

2 35 management single tertiary no 1350 yes no cellular 16

3 30 management married tertiary no 1476 yes yes unknown 3

4 59 blue-collar married secondary no 0 yes no unknown 5

... ... ... ... ... ... ... ... ... ... ...

4516 33 services married secondary no -333 yes no cellular 30

4517 57 self-employed married tertiary yes -3313 yes yes unknown 9

4518 57 technician married secondary no 295 no no cellular 19

4519 28 blue-collar married secondary no 1137 no no cellular 6

4520 44 entrepreneur single tertiary no 1136 yes yes cellular 3

4521 rows × 17 columns
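The `sep=';'` argument matters because `read_csv` splits fields on commas by default. A minimal sketch of the difference, using a small in-memory sample that is made up for illustration (it is not the real bank file, and `raw`, `wrong`, and `right` are hypothetical names):

```python
import io
import pandas as pd

# Hypothetical semicolon-separated sample (assumption: not the real bank data).
raw = "age;job;balance\n30;unemployed;1787\n33;services;4789\n"

# With the default separator (a comma), each line is read as one wide column.
wrong = pd.read_csv(io.StringIO(raw))

# Passing sep=';' splits the fields into separate columns.
right = pd.read_csv(io.StringIO(raw), sep=';')

print(wrong.shape)   # one column
print(right.shape)   # three columns
```

If a freshly loaded DataFrame shows a single column whose name contains the delimiter, the separator is the first thing to check.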

 

Create DataFrames using lists

In [10]: name=['Ramesh','Suresh','Sathish']
age=[30,35,40]
name,age

Out[10]: (['Ramesh', 'Suresh', 'Sathish'], [30, 35, 40])

Step-1

Create the DataFrame

In [11]: pd.DataFrame() # make the dataframe

Out[11]:

Step-2

Provide data

In [12]: pd.DataFrame(zip(name,age))

Out[12]:
0 1

0 Ramesh 30

1 Suresh 35

2 Sathish 40

Step-3

Provide columns

In [15]: #Provide columns
data=zip(name,age)
cols=['Name','Age']
pd.DataFrame(data,columns=cols)
#pd.DataFrame(zip(name,age),columns=['Name','Age'])

Out[15]:
Name Age

0 Ramesh 30

1 Suresh 35

2 Sathish 40

Step-4

Provide index

In [16]: data=zip(name,age)
cols=['Name','Age']
ind=['A','B','C']
pd.DataFrame(data,
columns=cols,
index=ind)

Out[16]:
Name Age

A Ramesh 30

B Suresh 35

C Sathish 40
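The cells above rebuild `data=zip(name,age)` every time because a `zip` object is an iterator and can be consumed only once. A minimal sketch of that behaviour, and of one way around it by materialising the pairs in a list first (the names `pairs`, `leftover`, and `rows` are illustrative assumptions):

```python
import pandas as pd

name = ['Ramesh', 'Suresh', 'Sathish']
age = [30, 35, 40]

# A zip object is exhausted after pandas reads it once.
pairs = zip(name, age)
first = pd.DataFrame(pairs, columns=['Name', 'Age'])
leftover = list(pairs)   # nothing is left to iterate over

# Materialising the pairs in a list makes them reusable.
rows = list(zip(name, age))
df = pd.DataFrame(rows, columns=['Name', 'Age'], index=['A', 'B', 'C'])
df_again = pd.DataFrame(rows, columns=['Name', 'Age'])

print(leftover)
print(df)
```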

Step-5

Add new column

In [17]: name=['Ramesh','Suresh','Sathish']
age=[30,35,40]

data=zip(name,age)
cols=['Name','Age']
ind=['A','B','C']
df=pd.DataFrame(data,columns=cols,index=ind)
df

Out[17]:
Name Age

A Ramesh 30

B Suresh 35

C Sathish 40

To add a new column, assign a list to df['new column'].
The list must contain exactly as many elements as the DataFrame has rows:

city_names=['Hyd','Blr','Chennai']
df['city']=city_names

In [19]: city_names=['Hyd','Blr','Chennai']
df['city']=city_names
df

Out[19]:
Name Age city

A Ramesh 30 Hyd

B Suresh 35 Blr

C Sathish 40 Chennai
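The length requirement can be checked directly. The sketch below, written under the assumption of the same three-row DataFrame, shows that a list of the wrong length is rejected with a `ValueError`, while a single scalar value is repeated for every row (the `country` column is a made-up example):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ramesh', 'Suresh', 'Sathish'],
                   'Age': [30, 35, 40]},
                  index=['A', 'B', 'C'])

# A list with the wrong number of elements raises ValueError.
mismatch_error = None
try:
    df['city'] = ['Hyd', 'Blr']          # two values for three rows
except ValueError as exc:
    mismatch_error = exc

# A list of matching length is accepted.
df['city'] = ['Hyd', 'Blr', 'Chennai']

# A scalar is broadcast to every row (hypothetical extra column).
df['country'] = 'India'

print(mismatch_error)
print(df)
```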

Step-6

Update the existing column

Creating a new column and updating an existing one use the same assignment syntax.

In [22]: df['Name']=['Swamy','Asif','Sathwik']
df

Out[22]:
Name Age city

A Swamy 30 Hyd

B Asif 35 Blr

C Sathwik 40 Chennai
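The same assignment syntax also works when the new values are computed from the old ones rather than typed out as a list. A minimal sketch, assuming the original three-row data (the particular transformations are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ramesh', 'Suresh', 'Sathish'],
                   'Age': [30, 35, 40]},
                  index=['A', 'B', 'C'])

# Overwrite a column with values derived from itself.
df['Age'] = df['Age'] + 1

# String columns can be updated through the .str accessor.
df['Name'] = df['Name'].str.upper()

print(df)
```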

Step-7

Drop the column

To drop a column, use the drop method. The call in the next cell passes three arguments:
1) the label to drop (here, the column name)
2) axis: axis=1 refers to columns, axis=0 refers to rows
3) inplace: inplace=True overwrites the existing DataFrame; without it, drop returns a new DataFrame and leaves the original unchanged

In [23]: df.drop('city', # column name
axis=1, # column axis
inplace=True) # overwrite the same DataFrame

In [24]: df

Out[24]:
Name Age

A Swamy 30

B Asif 35

C Sathwik 40

In [25]: name=['Ramesh','Suresh','Sathish']
age=[30,35,40]

df=pd.DataFrame(zip(name,age),
columns=['Name','Age'],
index=['A','B','C'])

city_names=['Hyd','Blr','Chennai']
df['city']=city_names

df.drop('city', # column name
axis=1, # Column
inplace=True) # overwrite the same
df

Out[25]:
Name Age

A Ramesh 30

B Suresh 35

C Sathish 40
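For comparison, here is a minimal sketch of the non-inplace form. It assumes the same sample data and uses the `columns=` keyword of `drop`, which is equivalent to passing the label with `axis=1`; the name `trimmed` is illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ramesh', 'Suresh', 'Sathish'],
                   'Age': [30, 35, 40],
                   'city': ['Hyd', 'Blr', 'Chennai']},
                  index=['A', 'B', 'C'])

# Without inplace=True, drop returns a new DataFrame.
trimmed = df.drop(columns='city')

print(list(df.columns))        # original still has 'city'
print(list(trimmed.columns))   # copy does not
```

Keeping the original untouched makes it easier to rerun a cell without first rebuilding the DataFrame, which is what cell 25 above has to do.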

Step-8

Drop rows

In [26]: df.drop('C', # row label
axis=0, # row axis
inplace=True) # overwrite the same DataFrame
df

Out[26]:
Name Age

A Ramesh 30

B Suresh 35
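Several rows can be dropped in one call by passing a list of index labels. The sketch below assumes the three-row DataFrame from before the drop and uses the `index=` keyword, which is equivalent to `axis=0`; `remaining` is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ramesh', 'Suresh', 'Sathish'],
                   'Age': [30, 35, 40]},
                  index=['A', 'B', 'C'])

# Drop two rows by their index labels; df itself is left unchanged.
remaining = df.drop(index=['B', 'C'])

print(remaining)
print(len(df))
```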

Step-9

Save the DataFrame

In [27]: df.to_csv("output.csv")
# by default the index is saved as an extra column
# writing .xlsx needs an Excel engine such as openpyxl to be installed
df.to_excel("output.xlsx")

In [28]: # read output csv

pd.read_csv("output.csv")

Out[28]:
Unnamed: 0 Name Age

0 A Ramesh 30

1 B Suresh 35

Step-10

Remove the index

In [29]: # To avoid the extra "Unnamed: 0" column,
# pass index=False when saving
df.to_csv("output.csv",index=False)

In [30]: pd.read_csv("output.csv")

Out[30]:
Name Age

0 Ramesh 30

1 Suresh 35
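When the index labels are worth keeping, an alternative to `index=False` is to save the index and tell `read_csv` which column holds it. A minimal sketch, using an in-memory string instead of a file so that it does not touch the disk (`csv_text`, `naive`, and `restored` are illustrative names):

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Ramesh', 'Suresh'], 'Age': [30, 35]},
                  index=['A', 'B'])

# to_csv returns the CSV text when no path is given.
csv_text = df.to_csv()

# Read back naively: the saved index shows up as an unnamed column.
naive = pd.read_csv(io.StringIO(csv_text))

# Read back with index_col=0: the first column becomes the index again.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(naive.columns))
print(restored)
```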

Create DataFrames using a dictionary

In [32]: d1={"NAME":['Ramesh','Suresh','Sathish'],
"AGE":[30,35,40]}

pd.DataFrame(d1)

# No need for zip
# No need to pass column names: the dictionary keys become the columns

Out[32]:
NAME AGE

0 Ramesh 30

1 Suresh 35

2 Sathish 40
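A custom index can be supplied with the dictionary form as well, in the same way as in Step-4. A minimal sketch, assuming the same `d1` dictionary and the index labels used earlier:

```python
import pandas as pd

d1 = {"NAME": ['Ramesh', 'Suresh', 'Sathish'],
      "AGE": [30, 35, 40]}

# Keys become column names; index= supplies the row labels.
df = pd.DataFrame(d1, index=['A', 'B', 'C'])

print(df)
```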

In [ ]:
