Only Pandas


import pandas as pd

df = pd.read_csv('../data/gapminder.tsv', sep='\t')
print(type(df))
print(df.shape)
print(df.columns)
print(df.dtypes)
print(df.info())
Pandas type    Python type    Description
object         string         Most common data type
int64          int            Whole numbers
float64        float          Numbers with decimals
datetime64     datetime       datetime is found in the Python standard library

country_df = df['country']
print(country_df.head())
print(country_df.tail())
subset = df[['country', 'continent', 'year']]
print(subset.head())
Subset method    Description
loc              Subset based on index label (row name)
iloc             Subset based on row index (row number)
ix               Subset based on index label or row index (deprecated in Pandas v0.20, removed in later versions)
print(df.loc[0])   # get the first row
print(df.loc[99])  # get the 100th row
# select the first, 100th, and 1000th rows
print(df.loc[[0, 99, 999]])
print(df.head(n=10))
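A quick sketch of the difference between the two subset methods on this data's default integer index:

# loc matches index *labels*; iloc matches integer *positions*
print(df.iloc[-1])            # last row by position; df.loc[-1] would raise a KeyError here
print(df.iloc[[0, 99, 999]])  # same rows as the loc example above, since the default index runs 0..n-1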
print(df.groupby('year')['lifeExp'].mean())
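The flattened result on the next line assumes a grouped, multi-column aggregate; a minimal sketch of one such setup (the exact grouping keys are an assumption):

# group by two keys and average two measures
multi_group_var = df.groupby(['year', 'continent'])[['lifeExp', 'gdpPercap']].mean()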
flat = multi_group_var.reset_index()  # flatten the hierarchical row index back into columns
# nunique() calculates the number of unique values in a Series
print(df.groupby('continent')['country'].nunique())
s = pd.Series(['banana', 42])
s = pd.Series(['Wes McKinney', 'Creator of Pandas'],
              index=['Person', 'Who'])
print(s)
Person         Wes McKinney
Who        Creator of Pandas
dtype: object
# a DataFrame can be assembled from a dict of columns, with an explicit row index and column order:
# a = pd.DataFrame(data=dict1, index=list1, columns=list2)
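The label-based selection below assumes a scientists DataFrame indexed by name; a minimal sketch following the book's scientists example:

scientists = pd.DataFrame(
    data={'Occupation': ['Chemist', 'Statistician'],
          'Born': ['1920-07-25', '1876-06-13'],
          'Died': ['1958-04-16', '1937-10-16'],
          'Age': [37, 61]},
    index=['Rosaline Franklin', 'William Gosset'],
    columns=['Occupation', 'Born', 'Died', 'Age'])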
# select by row index label
first_row = scientists.loc['William Gosset']
print(type(first_row))
<class 'pandas.core.series.Series'>

print(first_row.index)
print(first_row.values)

print(first_row.keys())
scientists = pd.read_csv('../data/scientists.csv')
ages = scientists['Age']
print(ages.describe())  # get basic stats
print(ages.mean())
print(ages.min())
print(ages.std())
print(ages.max())
# boolean subsetting: keep only the ages above the mean
print(ages[ages > ages.mean()])
# the boolean mask itself
print(ages > ages.mean())

print(scientists['Born'].dtype)
object
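The Born and Died columns are read in as strings; the age_days_dt column used below assumes they were converted to datetimes first, as in the book's flow. A sketch:

scientists['born_dt'] = pd.to_datetime(scientists['Born'], format='%Y-%m-%d')
scientists['died_dt'] = pd.to_datetime(scientists['Died'], format='%Y-%m-%d')
scientists['age_days_dt'] = scientists['died_dt'] - scientists['born_dt']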

import random
# set a seed so the randomness is always the same
random.seed(42)
# shuffle the Age column in place; in newer pandas this chained assignment may
# not propagate (copy-on-write), and scientists['Age'].sample(frac=1) is the
# idiomatic alternative
random.shuffle(scientists['Age'])
# we can convert the timedelta value to just the year
scientists['age_years_dt'] = scientists['age_days_dt'].astype('timedelta64[Y]')
# (pandas 2.x removed this cast; scientists['age_days_dt'].dt.days / 365.25 is one alternative)
print(scientists.columns)
# drop the shuffled age column
# you provide the axis=1 argument to drop column-wise
scientists_dropped = scientists.drop(['Age'], axis=1)

#One of the (conceptually) easier ways to combine data is with concatenation
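To make the next snippets concrete, assume three small frames with matching columns (df1, df2, df3 here are toy assumptions, not from the datasets):

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'B': [11, 12]})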


row_concat = pd.concat([df1, df2, df3])

print(df1.append(df2))  # note: DataFrame.append was removed in pandas 2.0; use pd.concat([df1, df2]) instead
col_concat = pd.concat([df1, df2, df3], axis=1)  # column-wise
pd.concat takes an iterable (such as a list) of DataFrames, so the DataFrames
cannot be passed directly as separate arguments. The dimensions of the
DataFrames should also match along the axis being concatenated.

pd.merge takes DataFrames as its arguments and combines two DataFrames on
shared columns or indices. pd.concat cannot do this: concatenating
column-wise simply repeats the shared column in the result.

DataFrame.join, by contrast, joins two DataFrames on their (possibly
different) indices.
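A toy illustration of the difference (left and right here are assumptions):

left = pd.DataFrame({'key': ['a', 'b'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['a', 'c'], 'rval': [3, 4]})
print(left.merge(right, on='key'))       # one 'key' column; inner join keeps only 'a'
print(pd.concat([left, right], axis=1))  # 'key' appears twice, rows paired by position
print(left.set_index('key').join(right.set_index('key')))  # join matches on the indices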

# the default value for 'how' is 'inner', so it doesn't need to be specified
o2o_merge = site.merge(visited_subset,
                       left_on='name', right_on='site')
from numpy import NaN, NAN, nan
print(NaN == True)
False
print(NaN == False)
False
print(NaN == 0)
False
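NaN is not even equal to itself, which is why pd.isnull (or pd.notnull) is the right test for missing values:

print(NaN == NaN)       # False
print(pd.isnull(NaN))   # True
print(pd.notnull(NaN))  # False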
# keep_default_na=False reads empty fields as empty strings instead of NaN
print(pd.read_csv(visited_file, keep_default_na=False))
scientists['missing'] = nan

# count the number of non-missing values in each column
print(ebola.count())
num_rows = ebola.shape[0]
num_missing = num_rows - ebola.count()
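An equivalent count with numpy, applying np.count_nonzero to the boolean missing-value mask:

import numpy as np
print(np.count_nonzero(ebola.isnull()))                  # total missing values in the whole frame
print(np.count_nonzero(ebola['Cases_Guinea'].isnull()))  # missing values in a single column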
# get the first 5 value counts from the Cases_Guinea column
print(ebola.Cases_Guinea.value_counts(dropna=False).head())
print(ebola.fillna(0).iloc[0:10, 0:5])
# forward fill: missing values are replaced with the last known/recorded value
print(ebola.fillna(method='ffill').iloc[0:10, 0:5])  # newer pandas prefers ebola.ffill()
# interpolation in Pandas fills in missing values linearly: it treats the
# missing values as if they should be equally spaced apart
print(ebola.interpolate().iloc[0:10, 0:5])

ebola_dropna = ebola.dropna()
# skipping missing values is True by default
print(ebola.Cases_Guinea.sum(skipna=True))

print(billboard_long[billboard_long.track == 'Loser'].head())
billboard_songs = billboard_songs.drop_duplicates()
# Merge the song dataframe to the original data set
billboard_ratings = billboard_long.merge(
    billboard_songs, on=['year', 'artist', 'track', 'time'])
# concatenate the dataframes together
taxi = pd.concat([taxi1, taxi2, taxi3, taxi4, taxi5])
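The loop-based alternative below first collects the frames into a list; a sketch (the file-name pattern here is an assumption, substitute the real taxi files):

list_taxi_df = []
for n in range(1, 6):
    # hypothetical file names for the five monthly taxi files
    df = pd.read_csv('../data/taxi{}.csv'.format(n))
    list_taxi_df.append(df)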
#Now that we have a list of dataframes, we can concatenate them.
taxi_loop_concat = pd.concat(list_taxi_df)
#Converting to String Objects
tips['sex_str'] = tips['sex'].astype(str)
# to_numeric converts a column to a numeric dtype
tips_sub_miss['total_bill'] = pd.to_numeric(
    tips_sub_miss['total_bill'], errors='ignore')  # 'ignore' returns the input unchanged on failure
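With errors='coerce', unparseable values become NaN instead of the column silently staying as strings:

tips_sub_miss['total_bill'] = pd.to_numeric(
    tips_sub_miss['total_bill'], errors='coerce')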
"black Knight".capitalize()
'Black knight'
"It's just a flesh wound!".count('u')
2
"Halt! Who goes there?".startswith('Halt')
True
"coconut".endswith('nut')
True
"It's just a flesh wound!".find('u')
7
"It's just a flesh wound!".index('scratch')
ValueError
"old woman".isalpha()
False (there is a whitespace)
"37".isdecimal()
True
"I'm 37".isalnum()
False (apostrophe and space)
"Black Knight".lower()
'black knight'
"Black Knight".upper()
'BLACK KNIGHT'
"flesh wound!".replace('flesh wound', 'scratch')
'scratch!'
" I'm not dead. ".strip()
"I'm not dead."
"NI! NI! NI! NI!".split(sep=' ')
['NI!', 'NI!', 'NI!', 'NI!']
"3,4.partition(',')
('3', ',', '4')
"nine".center(width=10)
' nine '
"9".zfill(with=5)
'00009'
coords = ' '.join([d1, m1, s1, u1, d2, m2, s2, u2])
multi_str_split = multi_str.splitlines()

var = 'flesh wound'


s = "It's just a {}!"
print(s.format(var))
It's just a flesh wound!

# find all matches of a pattern
m = re.findall(pattern=p, string=s)
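A self-contained sketch (the pattern p and string s here are assumptions, in the spirit of the book's phone-number examples):

import re

p = re.compile(r'\d{10}')                  # a 10-digit "phone number" pattern
s = 'call 1234567890 or 9876543210'
m = re.findall(pattern=p, string=s)
print(m)  # ['1234567890', '9876543210']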

def print_me(x):
    print(x)

df.apply(print_me, axis=0)  # apply the function to each column
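The row-wise apply calls below assume the book's small helper functions; a minimal sketch:

import numpy as np

def count_missing(vec):
    """Count the number of missing values in a row or column."""
    return np.sum(vec.isnull())

def prop_missing(vec):
    """Proportion of values in a row or column that are missing."""
    return count_missing(vec) / vec.size

def prop_complete(vec):
    """Proportion of values in a row or column that are not missing."""
    return 1 - prop_missing(vec)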
cmis_row = titanic.apply(count_missing, axis=1)
pmis_row = titanic.apply(prop_missing, axis=1)
pcom_row = titanic.apply(prop_complete, axis=1)
print(cmis_row.value_counts())
titanic['num_missing'] = titanic.apply(count_missing, axis=1)
docs['name_lamb'] = docs[0].apply(lambda x: p.match(x).group())  # p is a compiled regex; extract the leading match from each string

# two equivalent spellings of the same grouped mean
avg_life_exp_by_year = df.groupby('year').lifeExp.mean()
avg_life_exp_by_year = df.groupby('year')['lifeExp'].mean()
# get a list of unique years in the data
years = df.year.unique()
# subset the data for the year 1952
y1952 = df.loc[df.year == 1952, :]
y1952_mean = y1952.lifeExp.mean()
cont_le_agg2 = df.groupby('continent').lifeExp.aggregate(np.mean)
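agg/aggregate also accepts user-defined functions; a sketch following the book's flow (my_mean is the book's example name):

import numpy as np

def my_mean(values):
    """A hand-rolled mean, to show that agg accepts custom functions."""
    n = len(values)
    return values.sum() / n

cont_le_agg = df.groupby('continent').lifeExp.agg(my_mean)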

import seaborn as sns


import numpy as np
np.random.seed(42)
# sample 10 rows from tips
tips_10 = sns.load_dataset('tips').sample(10)
tips = sns.load_dataset('tips')
print(tips['size'].value_counts())
# filter the data so only groups with at least 30 observations are kept
tips_filtered = tips.groupby('size').filter(lambda x: x['size'].count() >= 30)

print(tips_filtered['size'].value_counts())
# list all the columns
print(tips_10.columns)

# get the 'Female' group (this assumes the data was grouped by sex first)
grouped = tips_10.groupby('sex')
female = grouped.get_group('Female')

from datetime import datetime


now = datetime.now()
t2 = datetime(1970, 1, 1)
diff = now - t2
import pandas as pd
ebola = pd.read_csv('../data/country_timeseries.csv')
print(ebola.info())
ebola['date_dt'] = pd.to_datetime(ebola['Date'])
# the same conversion with an explicit format, which parses faster
ebola['date_dt'] = pd.to_datetime(ebola['Date'], format='%m/%d/%Y')
print(ebola.info())
d = pd.to_datetime('2016-02-29')
print(d.year)
print(d.month)
print(d.day)
ebola['date_dt'] = pd.to_datetime(ebola['Date'])
ebola['year'] = ebola['date_dt'].dt.year
print(ebola['date_dt'].min())
banks['closing_quarter'], banks['closing_year'] = \
    (banks['Closing Date'].dt.quarter,
     banks['Closing Date'].dt.year)
closing_year = banks.groupby(['closing_year']).size()
print(tesla.loc[(tesla.Date.dt.year == 2010) &
                (tesla.Date.dt.month == 6)])
tesla.index = tesla['Date']
head_range = pd.date_range(start='2014-12-31', end='2015-01-05')
