Pandas

WHAT IS PANDAS?

• Derived from “panel data” or “panel data analysis”


• Multidimensional structured data sets
• Python Data Analysis Library
HOW TO WORK WITH PANEL DATA?

• All variables can be acquired as vectors.


• But vectors can be of different data types.
• And it would be nice to work with them as with one entity.
• Put ‘em in a frame… a DataFrame.
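A minimal sketch of the idea above: several vectors of different data types combined into one entity. The column names and values here are purely illustrative:

```python
import pandas as pd

# Three "vectors" of different types describing the same observations
names = ['Anna', 'Boris', 'Carl']        # strings
ages = [34, 28, 45]                      # integers
incomes = [52000.0, 47500.0, 61300.0]    # floats

# Put 'em in a frame: one entity, heterogeneous columns
panel = pd.DataFrame({'name': names, 'age': ages, 'income': incomes})
print(panel.dtypes)  # each column keeps its own dtype
```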
WHAT’S INSIDE?

• DataFrame object for data manipulation with integrated indexing.


• Tools for reading and writing data between in-memory data structures and different file formats.
• Data alignment and integrated handling of missing data.
• Reshaping and pivoting of data sets.
• Label-based slicing, fancy indexing, and subsetting of large data sets.
• Data structure column insertion and deletion.
• Group by engine allowing split-apply-combine operations on data sets.
• Data set merging and joining.
• Hierarchical axis indexing to work with high-dimensional data in a lower-dimensional data structure.
• Time series functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions,
date shifting and lagging.
• Data filtration.
SERIES

• The same as a list or array, only with labels.


• pandas.Series(data, index, dtype, copy)
• For data, it is possible to use lists, dictionaries, or NumPy arrays
EXAMPLE

>>> import math
>>> import pandas as pd
>>> ser = pd.Series([1, 2, 3, math.nan, 4], index=['a', 'b', 'c', 'd', 'e'])


>>> ser
a 1.0
b 2.0
c 3.0
d NaN
e 4.0
dtype: float64
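The constructor also accepts a dictionary for data, in which case the keys become the index labels. A small illustrative sketch:

```python
import pandas as pd

# Dictionary keys become the index labels
ser = pd.Series({'a': 1, 'b': 2, 'c': 3})
print(ser['b'])  # 2
```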
WHAT CAN WE DO WITH SERIES?

• Conversion
• Indexing, iteration
• Binary operator functions
• Function application, GroupBy & window
• Computations / descriptive statistics
• Reindexing / selection / label manipulation
• Missing data handling
• Reshaping, sorting
• Combining / comparing / joining / merging
• Time Series-related
>>> ser.sum()
10.0
>>> ser.cumsum()
a 1.0
b 3.0
c 6.0
d NaN
e 10.0
dtype: float64
>>> ser.isna()
a False
b False
c False
d True
e False
dtype: bool
DATAFRAME

• A way to store data in a rectangular grid or table.


• Each row corresponds to the measurements or values of one instance.
• Each column is a vector (or Series) containing data for a specific variable.
• Each vector can be of a different type (numeric, character, etc.).
• A DataFrame can be defined as a two-dimensional labeled data structure with
columns of potentially different types.
• pandas.DataFrame( data, index, columns, dtype, copy)
SELECTING DATA

• By index
• By series
• By selectors:
• loc – label based
• iloc – integer based
• ix – both (deprecated; removed in pandas 1.0)
>>> d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
'three' : pd.Series([10,20,30], index=['a','b','c'])}
>>> df = pd.DataFrame(d)
>>> df
one two three
a 1.0 1 10.0
b 2.0 2 20.0
c 3.0 3 30.0
d NaN 4 NaN
>>> df[2:4]
one two three
c 3.0 3 30.0
d NaN 4 NaN
>>> df['one']
a 1.0
b 2.0
c 3.0
d NaN
Name: one, dtype: float64
>>> df['one']['a'] # chained indexing; df.loc['a', 'one'] is preferred
1.0
>>> df.loc['a']
one 1.0
two 1.0
three 10.0
Name: a, dtype: float64
>>> df.iloc[:2,1:]
two three
a 1 10.0
b 2 20.0
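Selecting "by series" presumably refers to boolean masks: a Series of booleans picks out the matching rows. A sketch on the same df as above:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)

# Boolean mask: rows where column 'two' is greater than 2
mask = df['two'] > 2
print(df[mask])  # rows 'c' and 'd'
```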
PANEL

• 3D container of data
• Deprecated in version 0.20 and removed in version 0.25
• Use MultiIndex instead
• Used terms:
• items: axis 0; each item corresponds to a DataFrame contained inside
• major_axis: axis 1; the index (rows) of each of the DataFrames
• minor_axis: axis 2; the columns of each of the DataFrames
INDEX OBJECTS

• Index
• Numeric Index
• CategoricalIndex
• IntervalIndex
• MultiIndex
• DatetimeIndex
• TimedeltaIndex
• PeriodIndex
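A few of these index types in action (a quick illustrative sketch):

```python
import pandas as pd

# DatetimeIndex: generated by date_range
dti = pd.date_range('2020-01-01', periods=3, freq='D')

# CategoricalIndex: a limited set of repeating labels
ci = pd.CategoricalIndex(['low', 'high', 'low'])

# MultiIndex: hierarchical labels built from tuples
mi = pd.MultiIndex.from_tuples([(2020, 'US'), (2020, 'UK')])

print(type(dti).__name__, type(ci).__name__, mi.nlevels)
```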
>>> import numpy as np
>>> import pandas as pd
>>> data = np.random.randint(1, 10, (5, 3, 2))
>>> data
array([[[1, 7],
[7, 3],
[6, 7]],

[[7, 6],
[8, 7],
[4, 2]],

[[9, 7],
[6, 4],
[5, 5]],

[[4, 1],
[6, 3],
[6, 9]],

[[7, 1],
[3, 7],
[5, 9]]])
>>> data = data.reshape(5, 6).T
>>> data
array([[1, 7, 9, 4, 7],
[7, 6, 7, 1, 1],
[7, 8, 6, 6, 3],
[3, 7, 4, 3, 7],
[6, 4, 5, 6, 5],
[7, 2, 5, 9, 9]])
>>> df = pd.DataFrame(
data=data,
index=pd.MultiIndex.from_product([[2015, 2016, 2017], ['US', 'UK']]),
columns=['item {}'.format(i) for i in range(1, 6)]
)
>>> df
item 1 item 2 item 3 item 4 item 5
2015 US 1 7 9 4 7
UK 7 6 7 1 1
2016 US 7 8 6 6 3
UK 3 7 4 3 7
2017 US 6 4 5 6 5
UK 7 2 5 9 9
BASIC FUNCTIONALITY
Attribute or method Description
T Transposes rows and columns.
axes Returns a list with the row axis labels and column axis labels as the only members.
dtypes Returns the dtypes in this object.
empty True if the NDFrame is entirely empty, i.e. any of the axes is of length 0.
ndim Number of axes / array dimensions.
shape Returns a tuple representing the dimensionality of the DataFrame.
size Number of elements in the NDFrame.
values Numpy representation of NDFrame.
head() Returns the first n rows.
tail() Returns last n rows.
>>> df.tail(2)
item 1 item 2 item 3 item 4 item 5
2017 US 6 4 5 6 5
UK 7 2 5 9 9
>>> df.head(2)
item 1 item 2 item 3 item 4 item 5
2015 US 1 7 9 4 7
UK 7 6 7 1 1
>>> df.axes
[MultiIndex([(2015, 'US'),
(2015, 'UK'),
(2016, 'US'),
(2016, 'UK'),
(2017, 'US'),
(2017, 'UK')],
), Index(['item 1', 'item 2', 'item 3', 'item 4', 'item 5'], dtype='object')]
>>> df.T
2015 2016 2017
US UK US UK US UK
item 1 1 7 7 3 6 7
item 2 7 6 8 7 4 2
item 3 9 7 6 4 5 5
item 4 4 1 6 3 6 9
item 5 7 1 3 7 5 9
DESCRIPTIVE STATISTICS
Function Description
count() Number of non-null observations
sum() Sum of values
mean() Mean of Values
median() Median of Values
mode() Mode of values
std() Standard Deviation of the Values
min() Minimum Value
max() Maximum Value
abs() Absolute Value
prod() Product of Values
cumsum() Cumulative Sum
cumprod() Cumulative Product
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
'Lee','David','Gasper','Betina','Andres']),
'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}
>>> df = pd.DataFrame(d)
>>> df.describe()
Age Rating
count 12.000000 12.000000
mean 31.833333 3.743333
std 9.232682 0.661628
min 23.000000 2.560000
25% 25.000000 3.230000
50% 29.500000 3.790000
75% 35.500000 4.132500
max 51.000000 4.800000
>>> df.sum()
Name TomJamesRickyVinSteveSmithJackLeeDavidGasperBe...
Age 382
Rating 44.92
dtype: object
FUNCTION APPLICATION

• Table-wise function application: pipe()


• Row- or column-wise function application: apply()
• Element-wise function application: applymap() (deprecated since pandas 2.1 in favour of DataFrame.map())
PIPE EXAMPLE
>>> def adder(ele1, ele2):
    return ele1 + ele2
>>> df = pd.DataFrame(np.random.randn(5,3), columns=['col1','col2','col3'])
>>> df
col1 col2 col3
0 -0.565990 0.042208 0.632644
1 -0.090935 -0.520348 -0.094119
2 0.217500 -0.373585 1.279152
3 1.619577 1.533433 1.059039
4 -0.739193 -2.124682 1.802189
>>> df.pipe(adder,2)
col1 col2 col3
0 1.434010 2.042208 2.632644
1 1.909065 1.479652 1.905881
2 2.217500 1.626415 3.279152
3 3.619577 3.533433 3.059039
4 1.260807 -0.124682 3.802189
APPLY EXAMPLE

>>> df = pd.DataFrame(np.random.randn(5,3),columns=['col1','col2','col3'])
>>> df.apply(np.mean, axis=1)
0 0.356482
1 0.221546
2 -0.376605
3 -0.463950
4 0.132884
dtype: float64
>>> df.apply(np.mean)
col1 0.479030
col2 -0.809850
col3 0.253035
dtype: float64
APPLYMAP EXAMPLE
>>> df.applymap(lambda x:x*100)
col1 col2 col3
0 48.147452 -39.275584 98.072821
1 216.419288 -152.239495 2.284114
2 -107.420440 24.806028 -30.366953
3 16.792875 -136.146811 -19.831032
4 65.575667 -102.068915 76.358390
REINDEXING
>>> df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
>>> df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])
>>> df2
col1 col2 col3
0 -1.011192 -0.614197 0.351578
1 0.633386 -2.339780 -1.041833
>>> df2.reindex_like(df1,method='ffill',limit=1)
col1 col2 col3
0 -1.011192 -0.614197 0.351578
1 0.633386 -2.339780 -1.041833
2 0.633386 -2.339780 -1.041833
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN
>>> df1.rename(columns={'col1' : 'c1', 'col2' : 'c2'},
index = {0 : 'apple', 1 : 'banana', 2 : 'durian'})
c1 c2 col3
apple 0.048334 0.531656 1.419512
banana 0.877269 0.242278 -0.214751
durian -0.651795 -0.520254 0.184121
3 -0.404872 1.386826 -1.151902
4 -0.695822 0.657571 0.764508
5 0.164209 0.947984 0.724488
ITERATION
>>> for column_name in df:
    print(column_name)

>>> for column_name, column_as_series in df.items():  # iteritems() was removed in pandas 2.0
    print(column_name)

>>> for row_index, row_as_series in df.iterrows():
    print(row_index)

>>> for row_as_named_tuple in df.itertuples():
    print(row_as_named_tuple)


SORTING
>>> unsorted_df = pd.DataFrame(np.random.randn(10,2),index=[1,4,6,2,3,5,9,8,0,7],
columns = ['col2','col1'])
>>> unsorted_df.sort_index() #axis, ascending , etc.
col2 col1
0 1.337929 0.025269
1 0.437463 0.124547
2 -0.227216 0.985426
3 -1.111511 -0.624232
4 1.095845 0.823082
5 -0.488366 0.160159
6 -1.304389 0.647291
7 -0.521915 -0.148872
8 -0.073079 1.104150
9 -0.317896 1.547743
>>> unsorted_df.sort_values(by=['col1'])
col2 col1
3 -1.111511 -0.624232
7 -0.521915 -0.148872
0 1.337929 0.025269
1 0.437463 0.124547
5 -0.488366 0.160159
6 -1.304389 0.647291
4 1.095845 0.823082
2 -0.227216 0.985426
8 -0.073079 1.104150
9 -0.317896 1.547743
WORKING WITH TEXT
>>> s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t', np.nan, '1234', 'SteveSmith'])
>>> s
0 Tom
1 William Rick
2 John
3 Alber@t
4 NaN
5 1234
6 SteveSmith
dtype: object
>>> s.str.lower()
0 tom
1 william rick
2 john
3 alber@t
4 NaN
5 1234
6 stevesmith
dtype: object
STATISTICAL FUNCTIONS

• pct_change() – compares every element with its prior element and computes the percentage change.
• cov() – covariance, applied on series data.
• corr() – correlation, applied on series data.
• rank() – produces a ranking for each element in the array of elements.
>>> s = pd.Series(np.random.randn(5),
index=list('abcde'))
>>> s
a 0.021429
b -0.501898
c -2.342914
d 0.808404
e 0.926918
dtype: float64
>>> s.rank()
a 3.0
b 2.0
c 1.0
d 4.0
e 5.0
dtype: float64
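pct_change() and corr() from the list above, shown on small deterministic data (an illustrative sketch):

```python
import pandas as pd

# Percentage change relative to the previous element
s = pd.Series([100.0, 110.0, 99.0])
print(s.pct_change())   # NaN, 0.10, -0.10

# Correlation between two perfectly related series
a = pd.Series([1, 2, 3, 4])
b = pd.Series([2, 4, 6, 8])
print(a.corr(b))        # 1.0
```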
WINDOW FUNCTIONS

• Statistics used for finding trends by smoothing the curve:


• rolling() – rolling window calculations
• expanding() – expanding window transformations
• ewm() – exponentially weighted moving window calculations
>>> df = pd.DataFrame(np.random.randn(10, 4),
index = pd.date_range('1/1/2000', periods=10),
columns = ['A', 'B', 'C', 'D'])
>>> df
A B C D
2000-01-01 -0.303304 -2.077986 0.694436 0.491008
2000-01-02 -0.185362 2.821360 0.393097 0.565165
2000-01-03 0.156716 1.260999 0.531221 1.610406
2000-01-04 -0.750791 0.578499 0.520296 -0.491102
2000-01-05 0.076014 2.101193 -0.530172 0.433667
2000-01-06 0.272712 0.771861 0.756701 -0.949046
2000-01-07 0.045145 -0.867820 0.248871 0.242362
2000-01-08 0.736053 -0.135037 0.290101 0.179423
2000-01-09 -0.523680 -1.571565 0.157508 0.588099
2000-01-10 0.235188 0.571272 0.227741 -0.657029
>>> df.rolling(window=3).mean()
A B C D
2000-01-01 NaN NaN NaN NaN
2000-01-02 NaN NaN NaN NaN
2000-01-03 -0.110650 0.668124 0.539585 0.888860
2000-01-04 -0.259812 1.553620 0.481538 0.561490
2000-01-05 -0.172687 1.313564 0.173782 0.517657
2000-01-06 -0.134022 1.150518 0.248942 -0.335494
2000-01-07 0.131290 0.668411 0.158467 -0.091006
2000-01-08 0.351303 -0.076999 0.431891 -0.175754
2000-01-09 0.085839 -0.858141 0.232160 0.336628
2000-01-10 0.149187 -0.378444 0.225117 0.036831
AGGREGATION
>>> r = df.rolling(window=3,min_periods=1)
>>> r.aggregate({'A' : np.sum,'B' : np.mean})
A B
2000-01-01 -0.303304 -2.077986
2000-01-02 -0.488666 0.371687
2000-01-03 -0.331949 0.668124
2000-01-04 -0.779437 1.553620
2000-01-05 -0.518061 1.313564
2000-01-06 -0.402066 1.150518
2000-01-07 0.393871 0.668411
2000-01-08 1.053910 -0.076999
2000-01-09 0.257518 -0.858141
2000-01-10 0.447562 -0.378444

• aggregate() can also be applied to a DataFrame directly.
• The alias agg() can be used instead of aggregate().
MISSING DATA

• isnull()
• notnull()
• fillna(value_to_replace)
• dropna()
• replace()
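The methods listed above in one short sketch on a Series with a single missing value:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

print(s.isnull())            # False, True, False
print(s.notnull())           # True, False, True
print(s.fillna(0))           # NaN replaced by 0
print(s.dropna())            # row with NaN removed
print(s.replace(3.0, 30.0))  # value substitution
```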
GROUPBY

• Any groupby operation involves one of the following operations on the original object:
• Splitting the Object
• Applying a function
• Combining the results
• Functionality to apply on each data subset:
• Aggregation − computing a summary statistic
• Transformation − perform some group-specific operation
• Filtration − discarding the data with some condition
>>> ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
'Kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
'Rank': [1, 2, 2, 3, 3,4 ,1 ,1,2 , 4,1,2],
'Year': [2014,2015,2014,2015,2014,2015,2016,2017,2016,2014,2015,2017],
'Points':[876,789,863,673,741,812,756,788,694,701,804,690]}
>>> df = pd.DataFrame(ipl_data)
>>> df
Team Rank Year Points
0 Riders 1 2014 876
1 Riders 2 2015 789
2 Devils 2 2014 863
3 Devils 3 2015 673
4 Kings 3 2014 741
5 Kings 4 2015 812
6 Kings 1 2016 756
7 Kings 1 2017 788
8 Riders 2 2016 694
9 Royals 4 2014 701
10 Royals 1 2015 804
11 Riders 2 2017 690
>>> grouped = df.groupby('Team')
>>> grouped.groups
{'Devils': [2, 3], 'Kings': [4, 5, 6, 7], 'Riders': [0, 1, 8, 11], 'Royals':
[9, 10]}
>>> df.groupby(['Team','Year']).groups
>>> grouped['Points'].agg([np.sum, np.mean, np.std])
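Transformation and filtration, the other two operations listed above, sketched on a small subset of the ipl_data frame (the 1600-point threshold is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['Riders', 'Riders', 'Devils', 'Devils'],
    'Points': [876, 789, 863, 673]})

grouped = df.groupby('Team')

# Transformation: broadcast a group statistic back to each row
df['TeamMean'] = grouped['Points'].transform('mean')

# Filtration: keep only teams whose total exceeds 1600 points
strong = grouped.filter(lambda g: g['Points'].sum() > 1600)
print(strong['Team'].unique())  # ['Riders']
```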
MERGE

• pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False)


• left − A DataFrame object.
• right − Another DataFrame object.
• on − Columns (names) to join on. Must be found in both the left and right DataFrame objects.
• left_on − Columns from the left DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the
DataFrame.
• right_on − Columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the
DataFrame.
• left_index − If True, use the index (row labels) from the left DataFrame as its join key(s).
• right_index − Same usage as left_index for the right DataFrame.
• how − One of 'left', 'right', 'outer', 'inner'. Defaults to inner
• sort − Sort the result DataFrame by the join keys in lexicographical order. Defaults to False in current pandas; sorting can degrade performance.
>>> left = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5']})
>>> right = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5']})
>>> pd.merge(left, right, on=['id','subject_id'],
how='left')
id Name_x subject_id Name_y
0 1 Alex sub1 NaN
1 2 Amy sub2 NaN
2 3 Allen sub4 NaN
3 4 Alice sub6 Bryce
4 5 Ayoung sub5 Betty
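With how='outer', every key pair from both sides survives. A reduced sketch; indicator=True adds a _merge column showing where each row came from:

```python
import pandas as pd

left = pd.DataFrame({'id': [1, 2], 'subject_id': ['sub1', 'sub2']})
right = pd.DataFrame({'id': [2, 3], 'subject_id': ['sub2', 'sub3']})

out = pd.merge(left, right, on=['id', 'subject_id'],
               how='outer', indicator=True)
print(out)
# (1, sub1) -> left_only, (2, sub2) -> both, (3, sub3) -> right_only
```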
CONCATENATION

• pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False)


• objs − This is a sequence or mapping of Series or DataFrame objects.
• axis − {0, 1, ...}, default 0. This is the axis to concatenate along.
• join − {‘inner’, ‘outer’}, default ‘outer’. How to handle indexes on other axis(es). Outer
for union and inner for intersection.
• ignore_index − boolean, default False. If True, do not use the index values on the
concatenation axis. The resulting axis will be labeled 0, ..., n - 1.
• join_axes − This is the list of Index objects. Specific indexes to use for the other (n-1) axes
instead of performing inner/outer set logic. (Removed in pandas 1.0; reindex the result instead.)
>>> one = pd.DataFrame({
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5'],
'Marks_scored':[98,90,87,69,78]},
index=[1,2,3,4,5])
>>> two = pd.DataFrame({
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5'],
'Marks_scored':[89,80,79,97,88]},
index=[1,2,3,4,5])
>>> pd.concat([one,two],keys=['x','y'])
Name subject_id Marks_scored
x 1 Alex sub1 98
2 Amy sub2 90
3 Allen sub4 87
4 Alice sub6 69
5 Ayoung sub5 78
y 1 Billy sub2 89
2 Brian sub4 80
3 Bran sub3 79
4 Bryce sub6 97
5 Betty sub5 88

• ‘df1.append(df2)’ can also be used (removed in pandas 2.0 in favour of concat()).
CATEGORICAL DATA

• Categorical variables can take on only a limited, and usually fixed, number of
possible values.
• Categorical data might have an order, but numerical operations cannot be
performed on it.
>>> s = pd.Series(["a","b","c","a"], dtype="category")
>>> s
0 a
1 b
2 c
3 a
dtype: category
Categories (3, object): ['a', 'b', 'c']
>>> cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'])
>>> cat
['a', 'b', 'c', 'a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']
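Categories can also carry an order, which enables comparisons and sorting while still forbidding arithmetic (an illustrative sketch):

```python
import pandas as pd

# ordered=True gives the categories a defined ranking
size = pd.Categorical(['small', 'large', 'medium'],
                      categories=['small', 'medium', 'large'],
                      ordered=True)
s = pd.Series(size)

print(s.min(), s.max())          # small large
print(s.sort_values().tolist())  # ['small', 'medium', 'large']
```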
VISUALIZATION

• Based on the matplotlib library.


• A few wrappers:
• plot()
• plot.bar()
• plot.barh()
• plot.hist()
• plot.box()
• plot.area()
• plot.scatter()
• plot.pie()
INPUT/OUTPUT

• A set of ‘read_xxx’ functions (most have a matching ‘to_xxx’ writer):


• pickle – pickled pandas object from file
• table, csv, fwf – flat files
• clipboard – from os clipboard
• excel
• json
• html – html style table into a list or DataFrame
• hdf – pandas object in HDF5 file (PyTables)
• feather – feather format object from file
• parquet – Apache parquet format object from file
• orc – ORC object from file
• sas
• spss
• sql
• gbq – data from Google BigQuery
• stata
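A round trip through one of these formats, CSV, using an in-memory buffer instead of a file (a small sketch):

```python
import io
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})

# Write to an in-memory "file", then read it back
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df2 = pd.read_csv(buf)
print(df2.equals(df))  # True
```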
