L CsvReadWrite
1.21 CSV FILE
CSV (Comma Separated Values) is a simple file format used to store tabular data, such as a
spreadsheet or database. A CSV file stores tabular data (numbers and text) in plain text. Each
line of the file is a data record. Each record consists of one or more fields, separated by
commas. The use of the comma as a field separator is the source of the name for this file format.
For working with CSV files in Python, there is an in-built module called csv. Files of this format
are generally used to exchange data, usually when there is a large amount, between different
applications.
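The record-and-field structure described above can be tried out with the in-built csv module. The following is a minimal sketch, not a prescribed method: the sample records are made up for illustration, and io.StringIO stands in for a real file on disk.

```python
import csv
import io

# Hypothetical sample records: one header row followed by two data records.
rows = [
    ["Roll No", "Name", "Marks"],
    [101, "Ramesh", 77.5],
    [102, "Harish", 45.6],
]

# Write the records as CSV text; io.StringIO is used in place of a real file.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read the same text back: each line is one record, split at the commas.
records = list(csv.reader(io.StringIO(csv_text)))
print(records[1])
```

Note that the csv module returns every field as a string; converting Marks back to a number is left to the program.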
1.22 DATA TRANSFER BETWEEN DATAFRAMES AND .CSV FILE
CSV format is a kind of tabular data separated by commas and stored in the form of plain text.
Tabular Data                                             CSV File
Roll No  Name    Marks                                   Roll No, Name, Marks
101      Ramesh  77.5    After conversion to CSV Format  101, Ramesh, 77.5
102      Harish  45.6                                    102, Harish, 45.6
Fig. 1.4: Tabular Data vs CSV Data
In CSV format:
• Each row of the table is stored in one row in a CSV file.
• The field values of a row are stored together with a comma after every field value.
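The two points above can be observed by converting a small dataframe to CSV text. This is a minimal sketch; the dataframe is built by hand for illustration, and to_csv() is called without a file path so that it returns the CSV text as a string instead of writing a file.

```python
import pandas as pd

# A small table built by hand (illustrative data only).
df = pd.DataFrame({
    "Roll No": [101, 102],
    "Name": ["Ramesh", "Harish"],
    "Marks": [77.5, 45.6],
})

# With no path given, to_csv() returns the CSV text as a string.
# index=False leaves out the dataframe's row labels.
csv_text = df.to_csv(index=False)

# Each table row becomes one line; field values are separated by commas.
csv_lines = csv_text.splitlines()
for line in csv_lines:
    print(line)
```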
Advantages of CSV format:
• A simple, compact and universal format for data storage.
• A common format for data interchange.
• It can be opened in popular spreadsheet packages like MS Excel, OpenOffice Calc, etc.
• Nearly all spreadsheets and databases support import/export to CSV format.
CTM: CSV is a simple file format used to store
tabular data, such as a spreadsheet or database.
1.22.1 Creating and Reading CSV File
A CSV is a text file, so it can be created and edited using any text editor. More frequently,
however, a CSV file is created by exporting a spreadsheet or database in the program that
created it. All CSV files follow a standard format, i.e., each column is separated by a
delimiter (such as a comma, semicolon, space or a tab) and each new line indicates a new row.
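When a file uses a delimiter other than the comma, read_csv() can be told which one through its sep parameter. The following is a minimal sketch with made-up data held in strings (io.StringIO stands in for real files):

```python
import io
import pandas as pd

# The same illustrative records with two different delimiters.
semicolon_text = "Empid;Name;City\n100;Ritesh;Mumbai\n101;Aakash;Goa\n"
tab_text = semicolon_text.replace(";", "\t")

# sep tells read_csv() which character separates the columns.
df_semicolon = pd.read_csv(io.StringIO(semicolon_text), sep=";")
df_tab = pd.read_csv(io.StringIO(tab_text), sep="\t")

print(df_semicolon)
# Both files describe the same table, so the dataframes are identical.
print(df_tab.equals(df_semicolon))
```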
Let us create a CSV file using
Microsoft Excel on the basis of "Employee" table.
Table 1.1: Employee
Empid  Name     Age  City       Salary
100    Ritesh   25   Mumbai     15000
101    Aakash   26   Goa        16000
102    Mahima   27   Hyderabad  20000
103    Lakshay  23   Delhi      18000
104    Manu     25   Mumbai     25000
105    Nidhi    26   Delhi
106    Geetu    30   Bengaluru  28000
1.58 Informatics Practices with Python-XII
1. Launch Microsoft Excel.
2. Type the data given in Table 1.1 in the Excel sheet (Fig. 1.5). You will also notice that some
cell values are left empty to represent missing values (NaN) in the Pandas dataframe.
[Fig. 1.5: Employee data typed into a Microsoft Excel sheet. On saving the sheet as Employee.csv,
Excel displays two messages: (1) the selected file type does not support workbooks that contain
multiple sheets (click OK to save only the active sheet); (2) Employee.csv may contain features
that are not compatible with CSV (Comma delimited) (click Yes to keep this format).]
import pandas as pd
df = pd.read_csv("E:\\Data\\Employee.csv")   #Select the proper path of your file
print(df)
RESTART: C:/Users/preeti/AppData/Local/Programs/Python/Python37-32/prog_csv_df1.py
   Empid     Name   Age       City   Salary
0  100.0   Ritesh  25.0     Mumbai  15000.0
1  101.0   Aakash  26.0        Goa  16000.0
2    NaN      NaN   NaN        NaN      NaN
3  102.0   Mahima  27.0  Hyderabad  20000.0
4  103.0  Lakshay  23.0      Delhi  18000.0
5  104.0     Manu  25.0     Mumbai  25000.0
6  105.0    Nidhi  26.0      Delhi      NaN
7  106.0    Geetu  30.0  Bengaluru  28000.0
>>>
One thing to be remembered is that the missing values from the CSV file shall be treated as NaN
(Not a Number) in the Pandas dataframe.
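The treatment of missing values can be checked by counting the NaN entries in each column. This is a minimal sketch; the CSV text below is a shortened, illustrative version of the Employee data, read from a string instead of a file.

```python
import io
import pandas as pd

# Shortened Employee data: one completely empty record and one missing Salary.
csv_text = """Empid,Name,Age,City,Salary
100,Ritesh,25,Mumbai,15000
101,Aakash,26,Goa,16000
,,,,
105,Nidhi,26,Delhi,
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)

# isna() marks every missing cell as True; sum() counts them column-wise.
missing_per_column = df.isna().sum()
print(missing_per_column)
```

Because NaN is a floating-point value, numeric columns that contain it (such as Empid here) are displayed as 100.0, 101.0 and so on.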
Practical Implementation-50
To display the shape (number of rows and columns) of the CSV file.
We can see the total number of rows (records) and columns (fields) present in the table with
the help of the shape attribute.
   Empid     Name   Age       City   Salary
0  100.0   Ritesh  25.0     Mumbai  15000.0
1  101.0   Aakash  26.0        Goa  16000.0
2    NaN      NaN   NaN        NaN      NaN
3  102.0   Mahima  27.0  Hyderabad  20000.0
4  103.0  Lakshay  23.0      Delhi  18000.0
5  104.0     Manu  25.0     Mumbai  25000.0
6  105.0    Nidhi  26.0      Delhi      NaN
7  106.0    Geetu  30.0  Bengaluru  28000.0
>>> df.shape
(8, 5)
In the above case, we have directly displayed the row count and column count at the Python shell
prompt by giving the command as df.shape. We can also display it using variables.
The read_csv() method automatically takes the first row of the CSV file and assigns it as the
dataframe header. After the creation of a dataframe from a CSV file, you can perform all the
dataframe operations on it.
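Displaying the shape using variables, as mentioned above, might look like the following minimal sketch. The variable names rows and columns are our own choice, and the Employee data is read from a string so that the example runs without the file on disk.

```python
import io
import pandas as pd

# Employee data as CSV text (stand-in for E:\Data\Employee.csv).
csv_text = """Empid,Name,Age,City,Salary
100,Ritesh,25,Mumbai,15000
101,Aakash,26,Goa,16000
,,,,
102,Mahima,27,Hyderabad,20000
103,Lakshay,23,Delhi,18000
104,Manu,25,Mumbai,25000
105,Nidhi,26,Delhi,
106,Geetu,30,Bengaluru,28000
"""

df = pd.read_csv(io.StringIO(csv_text))

# shape is a tuple (number of rows, number of columns); unpack it.
rows, columns = df.shape
print("Number of rows:", rows)
print("Number of columns:", columns)
```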
Reading CSV file with specific/selected columns
While working with large tables in CSV format, there can be several columns contained in
it. But you may require selective columns to be read into a dataframe. This can be done by
using "usecols" attribute or option along with read_csv() method. For example, in the case of
"Employee" table, you have to access Name, Age and Salary of employees. This can be done by
giving the command as:
>>> df = pd.read_csv("E:\\Data\\Employee.csv",
...                  usecols = ['Name', 'Age', 'Salary'])
>>> df
Practical Implementation-51
To display Name, Age and Salary from Employee.csv.
>>> df = pd.read_csv("E:\\Data\\Employee.csv",
...                  usecols = ['Name', 'Age', 'Salary'])
>>> df
      Name   Age   Salary
0   Ritesh  25.0  15000.0
1   Aakash  26.0  16000.0
2      NaN   NaN      NaN
3   Mahima  27.0  20000.0
4  Lakshay  23.0  18000.0
5     Manu  25.0  25000.0
6    Nidhi  26.0      NaN
7    Geetu  30.0  28000.0
>>>
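A self-contained version of the column selection above is sketched below, with the Employee data read from a string in place of the file. As an addition to what the text shows, usecols also accepts column positions (counting from 0), so the same three columns can be picked as positions 1, 2 and 4.

```python
import io
import pandas as pd

# Employee data as CSV text (stand-in for E:\Data\Employee.csv).
csv_text = """Empid,Name,Age,City,Salary
100,Ritesh,25,Mumbai,15000
101,Aakash,26,Goa,16000
,,,,
102,Mahima,27,Hyderabad,20000
103,Lakshay,23,Delhi,18000
104,Manu,25,Mumbai,25000
105,Nidhi,26,Delhi,
106,Geetu,30,Bengaluru,28000
"""

# Select columns by name.
by_name = pd.read_csv(io.StringIO(csv_text),
                      usecols=["Name", "Age", "Salary"])

# Select the same columns by position: Name=1, Age=2, Salary=4.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[1, 2, 4])

print(by_name)
print(by_name.equals(by_position))
```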
Data Handling using Pandas 1.61
Reading CSV file with specific/selected rows
Like columns, there can be thousands of records in a CSV file. You can display selective records,
i.e., selective rows or lines, using "nrows" option or attribute used with read_csv() method. This
can be done by giving the command as:
>>> df = pd.read_csv("E:\\Data\\Employee.csv", nrows=5)
>>> print(df)
Practical Implementation-52
To display only 5 records from Employee.csv.
#To open Employee.csv for selective rows only
import pandas as pd
df = pd.read_csv("E:\\Data\\Employee.csv", nrows = 5)
print(df)
In the above code, we have given 5 as the value to 'nrows' attribute used with read_csv() function.
nrows means number of rows. In the above example, 5 represents the first five records, even empty
records containing NaN values, excluding headers. Hence, the following output shall be obtained.
>>>
RESTART: C:\Users\preeti\AppData\Local\Programs\Python\Python37-32\prog_csv_df1.py
   Empid     Name   Age       City   Salary
0  100.0   Ritesh  25.0     Mumbai  15000.0
1  101.0   Aakash  26.0        Goa  16000.0
2    NaN      NaN   NaN        NaN      NaN
3  102.0   Mahima  27.0  Hyderabad  20000.0
4  103.0  Lakshay  23.0      Delhi  18000.0
>>>
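The counting rule for nrows (the header is not counted, an empty record is) can be confirmed with the minimal sketch below, which reads the Employee data from a string in place of the file.

```python
import io
import pandas as pd

# Employee data as CSV text (stand-in for E:\Data\Employee.csv).
csv_text = """Empid,Name,Age,City,Salary
100,Ritesh,25,Mumbai,15000
101,Aakash,26,Goa,16000
,,,,
102,Mahima,27,Hyderabad,20000
103,Lakshay,23,Delhi,18000
104,Manu,25,Mumbai,25000
105,Nidhi,26,Delhi,
106,Geetu,30,Bengaluru,28000
"""

# Read only the first five records after the header row.
df = pd.read_csv(io.StringIO(csv_text), nrows=5)
print(df)

# The empty record is counted among the five rows.
empty_rows = int(df.isna().all(axis=1).sum())
print("Rows read:", len(df), "of which empty:", empty_rows)
```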
In the above code, we have given the option skiprows = 1 which will omit the default column
names from the CSV file. names argument holds the new column names to be displayed while
loading the CSV file into the dataframe. Hence, the following output shall be displayed:
>>>
RESTART: C:\Users\preeti\AppData\Local\Programs\Python\Python37-32\prog_csv_df1.py
    E_id    Ename  E_age      Ecity  Esalary
0  100.0   Ritesh   25.0     Mumbai  15000.0
1  101.0   Aakash   26.0        Goa  16000.0
2    NaN      NaN    NaN        NaN      NaN
3  102.0   Mahima   27.0  Hyderabad  20000.0
4  103.0  Lakshay   23.0      Delhi  18000.0
5  104.0     Manu   25.0     Mumbai  25000.0
6  105.0    Nidhi   26.0      Delhi      NaN
7  106.0    Geetu   30.0  Bengaluru  28000.0
>>>
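A call that combines skiprows and names in the way described might look like the sketch below. It is an illustration under assumptions: the new column names are taken from the output shown above (the underscores in E_id and E_age are assumed), and the Employee data is read from a string in place of the file.

```python
import io
import pandas as pd

# Employee data as CSV text (stand-in for E:\Data\Employee.csv).
csv_text = """Empid,Name,Age,City,Salary
100,Ritesh,25,Mumbai,15000
101,Aakash,26,Goa,16000
,,,,
102,Mahima,27,Hyderabad,20000
103,Lakshay,23,Delhi,18000
104,Manu,25,Mumbai,25000
105,Nidhi,26,Delhi,
106,Geetu,30,Bengaluru,28000
"""

# New column names (assumed spellings, based on the displayed output).
new_names = ["E_id", "Ename", "E_age", "Ecity", "Esalary"]

# skiprows=1 skips the original header line; names supplies the new header.
df = pd.read_csv(io.StringIO(csv_text), skiprows=1, names=new_names)
print(df)
```

If skiprows = 1 were left out, the original header line would be read as an ordinary data record under the new names.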
1.23 UPDATING/MODIFYING CONTENTS IN A CSV FILE
In the previous section, we learnt how to change column names. Similarly, we can modify or
update row data as well.
We will now discuss ways in which the value(s) of a column can be updated. The best and the
optimal way to update any column value of a CSV file is to use the Pandas library and the dataframe
functions. The following steps are to be followed for updating column contents in a CSV file using
a dataframe.
1. Import module
2. Open CSV file and read its data
3. Find column to be updated
4. Update value in the CSV file using to_csv() function
Consider the Employee.csv used in the previous implementations. Suppose we wish to change
the employee name (Ename) from Mahima to Harsh.
Practical Implementation-56
To modify the employee name from Mahima to Harsh.
>>>
RESTART: C:/Users/preeti/AppData/Local/Programs.
updtecol.py
Dataframe Contents before updation
   Empid     Name   Age       City   Salary
0  100.0   Ritesh  25.0     Mumbai  15000.0
1  101.0   Aakash  26.0        Goa  16000.0
2    NaN      NaN   NaN        NaN      NaN
3  102.0   Mahima  27.0  Hyderabad  20000.0
4  103.0  Lakshay  23.0      Delhi  18000.0
5  104.0     Manu  25.0     Mumbai  25000.0
6  105.0    Nidhi  26.0        NaN      NaN
7  106.0    Geetu  30.0        NaN  28000.0
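The four steps listed earlier for changing Mahima to Harsh can be sketched as follows. This is a minimal illustration under assumptions: the column is taken to be called Name, as in the output above, and a temporary folder is used in place of E:\Data so that the example runs anywhere.

```python
import os
import tempfile
import pandas as pd

# Create a small Employee.csv in a temporary folder (stand-in for E:\Data).
folder = tempfile.mkdtemp()
csv_path = os.path.join(folder, "Employee.csv")
with open(csv_path, "w") as f:
    f.write("Empid,Name,Age,City,Salary\n"
            "100,Ritesh,25,Mumbai,15000\n"
            "102,Mahima,27,Hyderabad,20000\n"
            "103,Lakshay,23,Delhi,18000\n")

# Step 2: open the CSV file and read its data.
df = pd.read_csv(csv_path)
print("Dataframe Contents before updation")
print(df)

# Step 3: find the record whose Name is Mahima and replace the value.
df.loc[df["Name"] == "Mahima", "Name"] = "Harsh"

# Step 4: write the dataframe back; index=False avoids an extra index column.
df.to_csv(csv_path, index=False)

# Read the file again to confirm that the change was saved.
updated = pd.read_csv(csv_path)
print("Dataframe Contents after updation")
print(updated)
```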
import pandas as pd
df = pd.read_csv("E:\\Data\\Employee.csv")
df.to_csv("E:\\Data\\Empnew.csv")
print(df)
[Figure: The Data folder on New Volume (E:) containing Employee.csv and Empnew.csv, with
Empnew.csv opened in Microsoft Excel.]
[Figure: The Data folder containing Employee.csv, Empnew.csv and Student.csv, with Student.csv
opened in Microsoft Excel; its header row reads RollNo, StudName, Marks, Class.]
In the given output window, the Unnamed: 0 column gets displayed automatically along with the index
values. To avoid this column, use the attribute index_col = 0 with read_csv() method as shown:
[Figure: Python 3.7.0 Shell alongside the Data folder on New Volume (E:) containing Emp.csv,
Employee.csv, Empnew.csv and Student.csv, with Emp.csv opened in Microsoft Excel showing the
Empid and Name columns for employees 100 to 106.]
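Where the Unnamed: 0 column comes from, and how index_col = 0 removes it, can be seen in the minimal sketch below. The two-record dataframe is illustrative, and the CSV text is kept in a string in place of a file.

```python
import io
import pandas as pd

# A small dataframe (illustrative data only).
df = pd.DataFrame({"Empid": [100, 101], "Name": ["Ritesh", "Aakash"]})

# By default, to_csv() also writes the index as a first column without a name.
csv_text = df.to_csv()

# Reading it back plainly turns that nameless column into 'Unnamed: 0'.
plain = pd.read_csv(io.StringIO(csv_text))
print(list(plain.columns))

# index_col=0 uses the first column as the index instead of as data.
fixed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(fixed.columns))
```

Another option is to write the file with to_csv(..., index=False), so that the index column is never stored in the first place.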
Learning Tip: Ensure that field names are the same in both the files (case-sensitive).