
Python Pandas - Read CSV
Reading data from a CSV (Comma-Separated Values) file is one of the most common tasks in data analysis and data science. Python's Pandas library provides a flexible read_csv() method for reading data from CSV files into a DataFrame.
A DataFrame is a two-dimensional, table-like structure with labeled rows and columns, so the contents of a CSV file map onto it naturally.
In this tutorial, we will learn various aspects of reading CSV files with Pandas, including advanced features like controlling the number of rows to read, parsing dates, handling missing data, and more.
Introduction to read_csv() Method
The pandas.read_csv() method reads a CSV file and converts its data into a Pandas DataFrame object. It can read files stored locally or remotely using supported URL schemes such as HTTP, FTP, or cloud storage paths (e.g., S3).
This method supports a wide range of parameters for customizing headers, delimiters, and data types.
Key Features of read_csv() Method −
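It can process large files in smaller blocks (using the chunksize parameter) instead of loading everything into memory at once.
It supports different delimiters and is not limited to reading comma-separated values.
It supports reading data from different sources, such as URLs, local files, and file-like objects.
It converts empty fields and common missing-value markers to NaN, and the na_values parameter lets you define additional markers.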
Syntax
Below is the syntax of the read_csv() method −
pandas.read_csv(filepath_or_buffer, sep=',', header='infer', names=None, index_col=None, usecols=None, dtype=None, ...)
Here are some commonly used parameters of this method −
filepath_or_buffer: The file path, URL, or file-like object to read the CSV data from.
sep or delimiter: Specifies the character or regex pattern used to separate values in the file. Default is , (comma).
header: Specifies the row number to use as column names. With the default 'infer', the first row is used unless names is given.
names: A list of column names to use, typically when the file has no header row.
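As a rough illustration of how these parameters combine, the sketch below reads a small semicolon-separated text with no header row. The inline data and the column names id, name, and salary are made up for this example; sep, header, names, index_col, usecols, and dtype are the parameters shown in the syntax above.

```python
from io import StringIO

import pandas as pd

# Hypothetical semicolon-separated data with no header row
raw = "1;Ravi;50000\n2;Priya;45000\n3;Kiran;65000\n"

# header=None tells read_csv that the first line is data, not column names;
# names supplies the column labels instead
df = pd.read_csv(StringIO(raw), sep=";", header=None,
                 names=["id", "name", "salary"])
print(df)

# Keep only two columns, use "id" as the index, and force salary to float
df2 = pd.read_csv(StringIO(raw), sep=";", header=None,
                  names=["id", "name", "salary"],
                  usecols=["id", "salary"], index_col="id",
                  dtype={"salary": "float64"})
print(df2)
```

Passing dtype up front avoids a separate conversion step after loading, which can matter for large files.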
Example
Here is a basic example of using the pandas.read_csv() method to read CSV data from a local file.
import pandas as pd

# Read a CSV file into a DataFrame
data = pd.read_csv('data.csv')

# Display the first few rows
print(data.head())
Following is the output of the above code −
| | Name | Salary |
|---|---|---|
| 0 | Ravi | 50000 |
| 1 | Priya | 45000 |
| 2 | Kiran | 65000 |
| 3 | NaN | 55000 |
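The NaN in the output above marks a missing value. The following is a minimal sketch of how read_csv() treats missing data, assuming a small made-up dataset in which one name is empty and one salary is recorded as the text "missing". The data and the "missing" marker are assumptions for illustration; na_values is the read_csv() parameter that registers extra missing-value markers.

```python
from io import StringIO

import pandas as pd

# Hypothetical data: row 3 has the text "missing" as salary, row 4 has no name
raw = "Name,Salary\nRavi,50000\nPriya,45000\nKiran,missing\n,55000\n"

# Empty fields become NaN automatically; na_values adds "missing" as a marker
df = pd.read_csv(StringIO(raw), na_values=["missing"])

# Count the missing values in each column
print(df.isna().sum())

# Replace missing names with a placeholder; salaries are left untouched
cleaned = df.fillna({"Name": "Unknown"})
print(cleaned)
```

Because the Salary column now contains NaN, pandas stores it as floating-point numbers rather than integers.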
Reading Large CSV Files in Chunks
To read a large CSV file in small blocks, use the chunksize parameter of the read_csv() method. When chunksize is set, read_csv() returns an iterator instead of a single DataFrame, and each iteration yields a DataFrame with at most that many rows.
Example
This example loads CSV data into Pandas DataFrames in chunks of 50 rows using the chunksize parameter.
import pandas as pd

url = "https://raw.githubusercontent.com/Opensourcefordatascience/Data-sets/master/blood_pressure.csv"

# Read the CSV data in chunks of 50 rows
chunks = pd.read_csv(url, chunksize=50)

# Each chunk is a DataFrame; show the first few rows of each
for chunk in chunks:
    print("Output DataFrame:")
    print(chunk.head())
When we run the above program, it produces the following result −
Output DataFrame:
| | patient | sex | agegrp | bp_before | bp_after |
|---|---|---|---|---|---|
| 0 | 1 | Male | 30-45 | 143 | 153 |
| 1 | 2 | Male | 30-45 | 163 | 170 |
| 2 | 3 | Male | 30-45 | 153 | 168 |
| 3 | 4 | Male | 30-45 | 153 | 142 |
| 4 | 5 | Male | 30-45 | 146 | 141 |
Output DataFrame:
| | patient | sex | agegrp | bp_before | bp_after |
|---|---|---|---|---|---|
| 50 | 51 | Male | 60+ | 175 | 146 |
| 51 | 52 | Male | 60+ | 175 | 160 |
| 52 | 53 | Male | 60+ | 172 | 175 |
| 53 | 54 | Male | 60+ | 173 | 163 |
| 54 | 55 | Male | 60+ | 170 | 185 |
Output DataFrame:
| | patient | sex | agegrp | bp_before | bp_after |
|---|---|---|---|---|---|
| 100 | 101 | Female | 60+ | 168 | 178 |
| 101 | 102 | Female | 60+ | 142 | 141 |
| 102 | 103 | Female | 60+ | 147 | 149 |
| 103 | 104 | Female | 60+ | 148 | 148 |
| 104 | 105 | Female | 60+ | 162 | 138 |
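Chunked reading is most useful when each block is reduced to a small result before the next one is loaded. The sketch below computes a mean over a file without holding all rows in memory at once. The inline data, the column names id and value, and the chunk size of 4 are assumptions chosen to keep the example self-contained.

```python
from io import StringIO

import pandas as pd

# Hypothetical data: 10 rows with values 10, 20, ..., 100
raw = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(1, 11)) + "\n"

total = 0          # running sum of the "value" column
rows_seen = 0      # running row count
chunk_sizes = []   # number of rows in each chunk, for inspection

# With chunksize set, read_csv yields one DataFrame per block of rows
for chunk in pd.read_csv(StringIO(raw), chunksize=4):
    total += chunk["value"].sum()
    rows_seen += len(chunk)
    chunk_sizes.append(len(chunk))

print(chunk_sizes)         # the last chunk holds the leftover rows
print(total / rows_seen)   # mean computed from the running totals
```

Only the running totals survive between iterations, so memory use is bounded by the chunk size rather than by the file size.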
Reading CSV Files with Custom Delimiters
Sometimes, data in a CSV file may use a delimiter other than a comma. In that case you can use the sep or delimiter parameter to specify the custom character used as a separator.
Example
The following example passes a tab character to the sep parameter of the pandas.read_csv() method to read tab-delimited data.
import pandas as pd

# Import StringIO to load a file-like object for reading CSV data
from io import StringIO

# Create tab-delimited data
data = """
Sr.no\tName\tGender\tAge
1\tChinmayi\tfemale\t22
2\tMadhuri\tfemale\t38
3\tKarthik\tmale\t26
4\tGeetha\tfemale\t35
"""

# Use StringIO to convert the string data into a file-like object
obj = StringIO(data)

# Read the tab-separated data using read_csv()
data = pd.read_csv(obj, sep='\t')

print('Output DataFrame:')
print(data.head())
The output of the above code is as follows −
Output DataFrame:
| | Sr.no | Name | Gender | Age |
|---|---|---|---|---|
| 0 | 1 | Chinmayi | female | 22 |
| 1 | 2 | Madhuri | female | 38 |
| 2 | 3 | Karthik | male | 26 |
| 3 | 4 | Geetha | female | 35 |
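The sep parameter also accepts a regular expression, which helps when the separator is surrounded by inconsistent whitespace. The sketch below assumes made-up pipe-delimited data with uneven spacing. A multi-character regex separator is handled by the Python parsing engine, so engine="python" is passed explicitly here.

```python
from io import StringIO

import pandas as pd

# Hypothetical data: fields separated by "|" with uneven spaces around it
raw = (
    "Name | Gender|Age\n"
    "Chinmayi |female | 22\n"
    "Karthik|  male  |26\n"
)

# The regex matches a pipe together with any whitespace on either side,
# so the surrounding spaces do not end up inside the values
df = pd.read_csv(StringIO(raw), sep=r"\s*\|\s*", engine="python")

print(df)
```

Regex separators are generally slower than a single-character sep and do not honor quoted fields, so a plain character is preferable when the data allows it.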
Controlling the Number of Rows Read from CSV
You can limit the number of rows read from a CSV file using the nrows parameter of the pandas.read_csv() method.
Example
This example demonstrates how to limit the number of rows while reading the CSV data using the pandas.read_csv() method with the nrows parameter.
import pandas as pd
from io import StringIO

# Create comma-separated data
data = """
Sr.no,Name,Gender,Age
1,Chinmayi,female,22
2,Madhuri,female,38
3,Karthik,male,26
4,Geetha,female,35
"""

# Use StringIO to convert the string data into a file-like object
obj = StringIO(data)

# Read only the first two data rows using read_csv()
data = pd.read_csv(obj, nrows=2)

print('Output DataFrame:')
print(data)
Following is the output of the above code −
Output DataFrame:
| | Sr.no | Name | Gender | Age |
|---|---|---|---|---|
| 0 | 1 | Chinmayi | female | 22 |
| 1 | 2 | Madhuri | female | 38 |
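The nrows parameter always counts from the first data row. To read a window from the middle of a file, it can be combined with skiprows, another read_csv() parameter that is not covered above. The sketch below is one possible way to do this, reusing the same sample data.

```python
from io import StringIO

import pandas as pd

raw = (
    "Sr.no,Name,Gender,Age\n"
    "1,Chinmayi,female,22\n"
    "2,Madhuri,female,38\n"
    "3,Karthik,male,26\n"
    "4,Geetha,female,35\n"
)

# skiprows takes 0-indexed line numbers of the file; line 0 is the header,
# so [1, 2] skips the first two data rows and keeps the header.
# nrows=1 then limits the result to a single row after the skipped ones.
df = pd.read_csv(StringIO(raw), skiprows=[1, 2], nrows=1)

print(df)
```

Passing a list to skiprows, rather than a single integer, is what keeps the header line in place here.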
Parsing Dates while Reading CSV Data
The parse_dates parameter tells read_csv() which columns contain dates, so that they are converted to datetime values while the CSV data is read instead of being loaded as plain strings.
Example
Here is an example that shows how Pandas parses a date column into datetime64 values when the column is listed in parse_dates.
import pandas as pd

# Create a DataFrame with a column of date strings
dataFrame = pd.DataFrame({
    'Car': ['BMW', 'Lexus', 'Audi', 'Mercedes', 'Jaguar', 'Bentley'],
    'Date_of_purchase': ['2024-10-10', '2024-10-12', '2024-10-17',
                         '2024-10-16', '2024-10-19', '2024-10-22']
})

# Write the DataFrame to a CSV file (the index is written as an unnamed first column)
dataFrame.to_csv("Sample_CSV_File.csv")

# Parse the date column while reading the file back
data = pd.read_csv("Sample_CSV_File.csv", parse_dates=['Date_of_purchase'])

# info() prints its summary directly, so it is not wrapped in print()
data.info()
The output of the above code is as follows −
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Unnamed: 0        6 non-null      int64
 1   Car               6 non-null      object
 2   Date_of_purchase  6 non-null      datetime64[ns]
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 276.0+ bytes
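The Unnamed: 0 column in the output above is the index that to_csv() wrote to the file. As a minimal sketch, the example below reads similar data back with index_col=0 so that the first column becomes the index again, and then uses the .dt accessor on the parsed dates. The inline data mirrors the first three rows of the file above and is used only to keep the example self-contained.

```python
from io import StringIO

import pandas as pd

# Same layout as a file written by to_csv(): the first column holds the index
raw = (
    ",Car,Date_of_purchase\n"
    "0,BMW,2024-10-10\n"
    "1,Lexus,2024-10-12\n"
    "2,Audi,2024-10-17\n"
)

# index_col=0 restores the first column as the index instead of "Unnamed: 0"
df = pd.read_csv(StringIO(raw), index_col=0, parse_dates=["Date_of_purchase"])

print(df.dtypes)

# Once parsed, date parts are available through the .dt accessor
print(df["Date_of_purchase"].dt.day_name())
```

Alternatively, writing the file with to_csv(..., index=False) avoids storing the index in the first place.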