Pandas Basics

pandas is a Python package providing flexible data structures, Series and DataFrame, for working with labeled and relational data. A Series is a single column of data, and a DataFrame is a two-dimensional table made up of Series. DataFrames can be created from various data sources, such as CSV files, with a few lines of code, and they provide many methods for fundamental data analysis and transformation.

Uploaded by

Dhruv Bhardwaj


Pandas

PYTHON FOR DATA ANALYSIS

Package overview

pandas is a Python package providing fast, flexible, and expressive data structures
designed to make working with “relational” or “labelled” data both easy and
intuitive. It aims to be the fundamental high-level building block for doing practical,
real-world data analysis in Python. Additionally, it has the broader goal of
becoming the most powerful and flexible open source data analysis/manipulation
tool available in any language.
pandas is well suited for many different kinds of data:
 Tabular data with heterogeneously-typed columns, as in an SQL table or Excel
spreadsheet
 Ordered and unordered (not necessarily fixed-frequency) time series data.
 Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
column labels
 Any other form of observational / statistical data sets. The data need not be
labelled at all to be placed into a pandas data structure
Data structures of pandas

 The two primary data structures of pandas, Series (1-dimensional)
and DataFrame (2-dimensional), handle the vast majority of typical
use cases in finance, statistics, social science, and many areas of
engineering. For R users, DataFrame provides everything that R’s
data.frame provides and much more. pandas is built on top of
NumPy and is intended to integrate well within a scientific
computing environment with many other 3rd party libraries.

 The best way to think about the pandas data structures is as flexible
containers for lower dimensional data. For example, DataFrame is a
container for Series, and Series is a container for scalars. We would
like to be able to insert and remove objects from these containers in
a dictionary-like fashion.
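
As a minimal sketch of this dictionary-like behaviour, the example below adds and removes columns of a DataFrame the way one would add and remove keys of a dict. The column names and values are hypothetical sample data, and pandas is assumed to be installed already.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]})

# Insert a new column (a Series) the same way you would add a key to a dict.
df["total"] = df["apples"] + df["oranges"]

# Remove columns the same way you would remove keys from a dict.
oranges = df.pop("oranges")  # pop() returns the removed column as a Series
del df["total"]

print(type(oranges).__name__)  # Series
print(list(df.columns))        # ['apples']
```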
Install and import

 Pandas is an easy package to install. Open up your terminal
program (for Mac users) or command line (for PC users) and install it
using either of the following commands:
conda install pandas
or
pip install pandas

 Alternatively, if you're currently viewing this article in a Jupyter
notebook you can run this cell:

!pip install pandas

import

 To import pandas we usually import it with a shorter name since it's
used so much:

import pandas as pd
Core components of pandas: Series and DataFrames

 The two primary components of pandas are the Series and the DataFrame.

 A Series is essentially a column, and a DataFrame is a two-dimensional
table made up of a collection of Series.
Creating DataFrames

data = {
'apples': [3, 2, 0, 1],
'oranges': [0, 3, 7, 2]
}

purchases = pd.DataFrame(data)

purchases
Index in DataFrame

 The Index of this DataFrame was given to us on creation as the
numbers 0-3, but we could also create our own when we initialize
the DataFrame.
 purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily',
'David'])

 purchases
 purchases.loc['June']
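
The label-based lookup above can be sketched as a self-contained example. It rebuilds the same purchases DataFrame and, as an addition not in the original, contrasts .loc (selection by label) with .iloc (selection by integer position).

```python
import pandas as pd

data = {"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]}
purchases = pd.DataFrame(data, index=["June", "Robert", "Lily", "David"])

# .loc selects by row label; a single row comes back as a Series.
june = purchases.loc["June"]
print(june["apples"], june["oranges"])  # 3 0

# .iloc selects by integer position instead of by label.
first_row = purchases.iloc[0]
print(first_row.equals(june))  # True
```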
Example:

import pandas as pd

df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris",
             "Allen, Mr. William Henry",
             "Bonnell, Miss. Elizabeth"],
    "Age": [22, 35, 58],
    "Sex": ["male", "male", "female"],
})

df  # in a notebook, this displays the DataFrame

# Select only the Age column; the result is a Series
df["Age"]
Create a Series:

ages = pd.Series([22, 35, 58], name="Age")

ages

NOTE: A pandas Series has no column labels, as it is just a single column
of a DataFrame. A Series does have row labels.
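
To make the note concrete, here is a small sketch that inspects the row labels and the name of the ages Series. The to_frame() step is an addition for illustration; it shows the Series name becoming a column label once the Series is placed in a DataFrame.

```python
import pandas as pd

ages = pd.Series([22, 35, 58], name="Age")

print(list(ages.index))  # [0, 1, 2]  (default row labels)
print(ages.name)         # Age

# to_frame() wraps the Series in a one-column DataFrame;
# the Series name is used as the column label.
ages_df = ages.to_frame()
print(list(ages_df.columns))  # ['Age']
```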
How to read in data

 It’s quite simple to load data from various file formats into a
DataFrame. In the following examples we'll keep using our apples
and oranges data, but this time it's coming from various files.
 pandas supports many different file formats or data sources out of the
box (csv, excel, sql, json, parquet, ...), each of them with the
prefix read_*.

Read data from a CSV file or a text file:

df = pd.read_csv(file_path, sep=',', header=0, index_col=False, names=None)
Explanation:

The read_csv function has many parameters; only a few of the most
commonly used are shown here. A few key points:
 a) header=0 means the column names are in the first row of the file.
If the file has no header row, specify header=None.
 b) index_col=False means that the first column of the data is not used
as the index of the DataFrame. If the first column really is an index,
set index_col=0 (or pass that column's name).
 c) names=None means that you are not supplying the column names and
want them inferred from the file, so the row given by header must
contain the column names. Otherwise, pass the names here in the same
order as the columns appear in the file.
 If you are reading a text file separated by space or tab, you could
simply change the sep to be:
 sep = " " or sep='\t'
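
The parameters described above can be tried out without any file on disk. This is a minimal sketch, assuming two hypothetical in-memory "files" built with io.StringIO: one comma-separated with a header row, one tab-separated without.

```python
import io
import pandas as pd

# Hypothetical in-memory "files" so the example runs without anything on disk.
csv_with_header = io.StringIO("apples,oranges\n3,0\n2,3\n")
tsv_without_header = io.StringIO("3\t0\n2\t3\n")

# header=0: the first line holds the column names.
df1 = pd.read_csv(csv_with_header, sep=",", header=0)

# header=None plus names=[...]: the file has no header row, so we supply the names.
df2 = pd.read_csv(tsv_without_header, sep="\t", header=None,
                  names=["apples", "oranges"])

print(df1.equals(df2))  # True
```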
Reading data from CSVs

 With CSV files all you need is a single line to load in the data:
 df = pd.read_csv('purchases.csv')

 df

 CSVs don't have indexes like our DataFrames, so all we need to do is
just designate the index_col when reading:
 df = pd.read_csv('purchases.csv', index_col=0)

 df
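
The effect of index_col can be seen by reading the same text twice. The CSV content below is a hypothetical stand-in for purchases.csv, assumed to store the row labels in an unnamed first column.

```python
import io
import pandas as pd

# Hypothetical CSV content; the first column holds the row labels.
csv_text = ",apples,oranges\nJune,3,0\nRobert,2,3\nLily,0,7\nDavid,1,2\n"

without_index = pd.read_csv(io.StringIO(csv_text))
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Without index_col, the label column is read as ordinary data.
print(list(without_index.columns))  # ['Unnamed: 0', 'apples', 'oranges']

# With index_col=0, the first column becomes the index.
print(list(with_index.index))       # ['June', 'Robert', 'Lily', 'David']
```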
Most important DataFrame operations

 DataFrames possess hundreds of methods and other operations that
are crucial to any analysis. As a beginner, you should know the
operations that perform simple transformations of your data and
those that provide fundamental statistical analysis.
 Let's load in the IMDB movies dataset to begin:

 movies_df = pd.read_csv("IMDB-Movie-Data.csv", index_col="Title")

Viewing your data

 The first thing to do when opening a new dataset is print out a few
rows to keep as a visual reference. We accomplish this with .head():
 movies_df.head()

 movies_df.tail(2)
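
Since the IMDB file may not be at hand, the behaviour of .head() and .tail() can be sketched on a small hypothetical DataFrame. With no argument, both return five rows.

```python
import pandas as pd

# Hypothetical stand-in for a larger dataset: ten rows, one column.
df = pd.DataFrame({"value": range(10)})

print(len(df.head()))                # 5  (the first five rows by default)
print(df.head(3)["value"].tolist())  # [0, 1, 2]
print(df.tail(2)["value"].tolist())  # [8, 9]
```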
Getting info about your data

 .info() should be one of the very first commands you run after loading
your data:
 movies_df.info()

 movies_df.shape
 movies_df.describe()
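
As a rough illustration of what these three calls report, the sketch below uses a small hypothetical DataFrame with one missing value rather than the IMDB data.

```python
import pandas as pd

# Hypothetical data with one missing rating.
df = pd.DataFrame({
    "rating": [7.0, 8.0, None, 9.0],
    "genre": ["a", "b", "a", "c"],
})

print(df.shape)  # (4, 2)  -> (rows, columns)
df.info()        # prints each column's dtype and non-null count

# describe() summarises the numeric columns by default.
summary = df.describe()
print(summary.loc["count", "rating"])  # 3.0  (the missing value is not counted)
print(summary.loc["mean", "rating"])   # 8.0
```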
Handling duplicates

 temp_df = pd.concat([movies_df, movies_df])

 temp_df.shape

 temp_df = temp_df.drop_duplicates()

 temp_df.shape
 temp_df.drop_duplicates(inplace=True)
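
The same steps can be sketched on a tiny hypothetical DataFrame, which makes the row counts easy to check. The keep=False call is an addition for illustration; it drops every row that has a duplicate instead of keeping the first copy.

```python
import pandas as pd

# Hypothetical two-row dataset.
df = pd.DataFrame({"title": ["A", "B"], "year": [2014, 2016]})

# Stack the DataFrame on top of itself so every row appears twice.
doubled = pd.concat([df, df])
print(doubled.shape)  # (4, 2)

# By default the first copy of each duplicated row is kept.
print(doubled.drop_duplicates().shape)            # (2, 2)

# keep=False removes all copies of any duplicated row.
print(doubled.drop_duplicates(keep=False).shape)  # (0, 2)
```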
Column cleanup

 Many times datasets will have verbose column names with symbols,
upper and lowercase words, spaces, and typos. To make selecting
data by column name easier we can spend a little time cleaning up
their names.
 Here's how to print the column names of our dataset:

 movies_df.columns
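
One possible way to clean up names, sketched under the assumption of two hypothetical verbose column names: rename individual columns with a mapping, then lowercase all names in one pass.

```python
import pandas as pd

# Hypothetical column names in the verbose style described above.
df = pd.DataFrame({"Rank": [1], "Runtime (Minutes)": [121]})
print(list(df.columns))  # ['Rank', 'Runtime (Minutes)']

# Rename selected columns with an old-name -> new-name mapping.
df = df.rename(columns={"Runtime (Minutes)": "Runtime"})

# Normalise every column name at once by assigning a new list of names.
df.columns = [col.lower() for col in df.columns]
print(list(df.columns))  # ['rank', 'runtime']
```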
