Phan1_Pandas_Numpy_Matplotlib

Lesson: PANDAS
Python 3.x Programming
Tutor: Mrs. Mỹ Linh

Time: 90 mins
Content
• Python Pandas
• Series
• DataFrame
• Panel
• Basic Functionality
• Descriptive Statistics
• Function Application
• Reindexing
• Iteration
• Sorting
• Working with Text Data
• Options & Customization
• Indexing & Selecting Data
• Statistical Functions
• Window Functions
• Aggregations
• Missing Data
• GroupBy
• Merging/Joining
• Concatenation
• Date Functionality
• Timedelta
• Categorical Data
• Visualization
• IO Tools
• Sparse Data
• Caveats & Gotchas
Introduction to Pandas

Python Pandas
• Pandas is an open-source Python library providing high-performance data manipulation and analysis tools built on powerful data structures. The name Pandas is derived from "Panel Data", an econometrics term for multidimensional data.
• To use Pandas, import pandas as pd
• Pandas deals with the following three data structures:
• Series: dimension = 1
• DataFrame: dimension = 2
• Panel: dimension = 3
Key features:
• Fast and efficient DataFrame object with default and customized indexing.
• Tools for loading data into in-memory data objects from different file formats.
• Data alignment and integrated handling of missing data.
• Reshaping and pivoting of data sets.
• Label-based slicing, indexing and subsetting of large data sets.
• Columns can be deleted from or inserted into a data structure.
• Group by data for aggregation and transformations.
• High-performance merging and joining of data.
• Time series functionality.
• Retrieving data using labels.
• Retrieve Data Using Label
Python Pandas - Series
• Create: pandas.Series(data, index, dtype, copy)
• Data: data takes various forms such as ndarray, list, constants
• Index: index values must be unique and hashable, with the same length as data. Defaults to np.arange(n) if no index is passed.
• Dtype: the data type. If None, the data type will be inferred.
• Copy: copy data. Default False.
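A minimal sketch of creating a Series in each of these forms:

```python
import numpy as np
import pandas as pd

# From a list with an explicit index
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# From an ndarray (index defaults to np.arange(n))
s2 = pd.Series(np.array([1.0, 2.0, 3.0]))

# From a scalar constant, broadcast over the index
s3 = pd.Series(5, index=['x', 'y', 'z'])

# Retrieve data using a label
print(s['b'])
```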

Python Pandas - DataFrame
• Create: pandas.DataFrame(data, index, columns, dtype, copy)
• Columns: column labels. Defaults to np.arange(n) if no column labels are passed.
• A DataFrame can be created in many ways, and supports:
• Adding a column
• Deleting a column
• Row selection, addition, and deletion
Example – Create Dataframe
Column Addition
Column Deletion
Row Selection, Addition, and Deletion
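A sketch covering these operations on a small hypothetical DataFrame:

```python
import pandas as pd

# Create a DataFrame from a dict of lists
df = pd.DataFrame({'Name': ['Tom', 'Jane'], 'Age': [28, 34]})

# Column addition
df['Salary'] = [1000, 1500]

# Column deletion (alternatives: df.pop('Age'), df.drop(columns=['Age']))
del df['Age']

# Row selection by label
row = df.loc[0]
# Row addition
df.loc[2] = ['Ann', 1200]
# Row deletion
df = df.drop(1)
print(df)
```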
Python Pandas - Panel
• Create: pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)
• Data: data takes various forms such as ndarray, Series, map, lists, dict, constants and also another DataFrame
• Items: axis=0
• Major_axis: axis=1
• Minor_axis: axis=2
• Dtype: data type of each column
• Copy: copy data. Default False
• Note: Panel was deprecated in pandas 0.20 and removed in 0.25; a MultiIndex DataFrame is the recommended replacement.
Example - From 3D ndarray
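Since Panel is gone from modern pandas, a sketch of holding the same 3-D ndarray in a MultiIndex DataFrame (the recommended replacement):

```python
import numpy as np
import pandas as pd

# items x major_axis x minor_axis, as a Panel would have held it
data = np.random.rand(2, 3, 4)

# One DataFrame per item, stacked under a two-level (item, row) index
df = pd.concat({i: pd.DataFrame(data[i]) for i in range(data.shape[0])})
print(df.index.nlevels)
```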
Series Basic Functionality

DataFrame Basic Functionality
Function Application
• To apply your own or another library's functions to Pandas objects, you should be aware of three important methods. The appropriate method depends on whether your function expects to operate on an entire DataFrame, row- or column-wise, or element-wise.
• Table-wise function application: pipe()
• Row- or column-wise function application: apply()
• Element-wise function application: applymap()

Function Application
• Suppose df is a DataFrame and adder is a function:

• Table-wise: df = df.pipe(adder, 2)

• Column-wise: df = df.apply(np.mean)
• Row-wise: df = df.apply(np.mean, axis=1)
• df = df.apply(lambda x: x.max() - x.min())

• Element-wise: df = df.applymap(lambda x: x*10)
• On Series data: df['Salary'].map(lambda x: x*10)
12/28/2022 18
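A runnable sketch of the three methods, with a hypothetical adder helper (note applymap is deprecated in newer pandas in favor of DataFrame.map, but still works):

```python
import numpy as np
import pandas as pd

def adder(frame, n):
    """Hypothetical helper: add n to every element."""
    return frame + n

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

piped = df.pipe(adder, 2)                  # table-wise
col_means = df.apply(np.mean)              # column-wise (axis=0)
row_means = df.apply(np.mean, axis=1)      # row-wise
spread = df.apply(lambda x: x.max() - x.min())
scaled = df.applymap(lambda x: x * 10)     # element-wise
```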
Mapping
• map = {
   'label1' : 'value1',
   'label2' : 'value2',
   ...
}

• The functions in this section perform different operations, but they all accept a dict object:
• replace() — replaces values
• map() — creates a new column
• rename() — replaces the index values
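A sketch of all three dict-accepting methods on a hypothetical DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'color': ['red', 'green', 'red'], 'price': [1, 2, 3]})

# replace(): substitute values wherever they occur
df2 = df.replace({'red': 'rosso'})

# map() on a column: build a new column from a lookup dict
df['code'] = df['color'].map({'red': 'R', 'green': 'G'})

# rename(): relabel index and/or columns
df3 = df.rename(index={0: 'first'}, columns={'price': 'cost'})
```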

Adding Values via Mapping

Rename the Indexes of the Axes

Re-indexing
• Reindexing changes the row labels and column labels of a DataFrame. To reindex means to conform the data to match a given set of labels along a particular axis.
• Multiple operations can be accomplished through reindexing:

• Reorder the existing data to match a new set of labels.

• Insert missing value (NA) markers in label locations where no data existed for the label.
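A minimal sketch of reindexing rows and columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2),
                  index=['a', 'b', 'c'], columns=['x', 'y'])

# Reorder rows/columns; label 'd' has no data, so NA markers appear
df2 = df.reindex(index=['c', 'a', 'd'], columns=['y', 'x'])
print(df2)
```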
Example

Re-index to Align with Other Objects

Filling while ReIndexing
• reindex() takes an optional method parameter specifying a filling method:

• pad/ffill − fill values forward

• bfill/backfill − fill values backward

• nearest − fill from the nearest index values
Example

Limits on Filling while Re-indexing
• The limit argument provides additional control over filling while reindexing. Limit specifies the maximum
count of consecutive matches.
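A sketch combining method and limit:

```python
import pandas as pd

s = pd.Series([1.0, 2.0], index=[0, 3])

# Forward-fill while reindexing, but at most 1 consecutive fill
filled = s.reindex(range(6), method='ffill', limit=1)
print(filled)
```

Index 1 is filled from index 0, but index 2 stays NaN because it would be the second consecutive fill.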

Renaming
• The rename() method allows you to relabel an axis based on some mapping (a dict or Series) or an arbitrary
function.

ITERATION
• The behavior of basic iteration over Pandas objects depends on the type. When iterating over a Series, it is
regarded as array-like, and basic iteration produces the values. Other data structures, like DataFrame and
Panel, follow the dict-like convention of iterating over the keys of the objects.
• In short, basic iteration (for i in object) produces −
• Series − values
• DataFrame − column labels
• Panel − item labels

ITERATOR COLUMN
• Iterating a DataFrame gives column names

ITERATOR ROWS
• To iterate over the rows of a DataFrame, we can use the following functions −
• iteritems() − iterate over the (key, value) pairs (renamed items() in pandas 2.0)
• iterrows() − iterate over the rows as (index, Series) pairs
• itertuples() − iterate over the rows as namedtuples

iteritems()
• Iterates over each column as key, value pair with
label as key and column value as a Series object.

iterrows()
• iterrows() returns the iterator yielding each index value along with a series containing the data in each row.

itertuples()
• itertuples() method will return an iterator yielding a named tuple for each row in the DataFrame. The first
element of the tuple will be the row’s corresponding index value, while the remaining values are the row
values.
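A sketch of each iteration style (items() is the modern name for iteritems()):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

cols = [c for c in df]                 # basic iteration -> column labels

# items() yields (label, Series) per column
col_sums = {label: col.sum() for label, col in df.items()}

# iterrows() yields (index, Series) per row
row_sums = [row.sum() for _, row in df.iterrows()]

# itertuples() yields a namedtuple per row; first field is the index
first = next(df.itertuples())
```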

Example

Sorting
• There are two kinds of sorting available in Pandas:
• By label — sort_index()
• By actual value — sort_values()
• Consider randomly generated data
Sorting Example

Working with Text Data
• Pandas provides a set of string functions which make it easy to operate on string data. Most importantly, these functions ignore (or exclude) missing/NaN values.
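A sketch of the str accessor, showing that NaN values are passed through rather than raising:

```python
import numpy as np
import pandas as pd

s = pd.Series(['Tom ', 'William Rick', np.nan, 'Alber@t'])

lower = s.str.lower()          # NaN stays NaN
lengths = s.str.len()
has_rick = s.str.contains('Rick')
trimmed = s.str.strip()
```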

Options and Customization
• get_option(param): get_option takes a single
parameter and returns the value as given in the table
• set_option(param, value): set_option takes two arguments and sets the value of the parameter as shown in the table
• reset_option(param): takes an argument and sets the
value back to the default value.
• describe_option(param): describe_option prints the
description of the argument.
• option_context(): option_context context manager
is used to set the option in with statement
temporarily. Option values are restored
automatically when you exit the with block
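A sketch of all of these option functions:

```python
import pandas as pd

default_rows = pd.get_option('display.max_rows')
pd.set_option('display.max_rows', 10)

# option_context restores the value when the with-block exits
with pd.option_context('display.max_rows', 5):
    inside = pd.get_option('display.max_rows')

after = pd.get_option('display.max_rows')
pd.reset_option('display.max_rows')       # back to the default
```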

Indexing and Selecting Data in Pandas

Indexing and Selecting Data
• The Python and NumPy indexing operators "[ ]" and attribute operator "." provide quick and easy access to
Pandas data structures across a wide range of use cases. However, since the type of the data to be accessed
isn’t known in advance, directly using standard operators has some optimization limits. For production code,
we recommend that you take advantage of the optimized pandas data access methods explained.
• Pandas supports three types of multi-axes indexing: .loc (label-based), .iloc (integer-based), and the now-removed hybrid .ix.

.loc()
• Pandas provides purely label-based indexing via .loc. When slicing, both the start and the stop bounds are included. Integers are valid labels, but they refer to the label, not the position.
• .loc has multiple access methods:
• A single scalar label
• A list of labels
• A slice object
• A Boolean array
• loc takes two single/list/slice operators separated by ','. The first indicates the rows and the second indicates the columns.
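A sketch of each access method:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  index=['a', 'b', 'c', 'd'], columns=['x', 'y', 'z'])

scalar = df.loc['b', 'y']           # single labels
subset = df.loc[['a', 'c'], ['x']]  # lists of labels
sliced = df.loc['a':'c', 'x':'y']   # label slice: both ends included
masked = df.loc[df['x'] > 3]        # Boolean array
```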

.loc() Example

.iloc()
• Pandas provides purely integer-based indexing via .iloc. Like Python and NumPy, it is 0-based.
• The various access methods are as follows:
• An integer
• A list of integers
• A slice of values
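A sketch of each access method:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=['x', 'y', 'z'])

one = df.iloc[1, 2]          # single integers: row 1, column 2
rows = df.iloc[[0, 3]]       # list of integer positions
block = df.iloc[1:3, 0:2]    # integer slice: stop bound excluded
```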

.iloc() Example

.ix()
• Besides pure label-based and integer-based indexing, Pandas offered a hybrid method for selection and subsetting via the .ix indexer. Note: .ix was deprecated in pandas 0.20 and removed in 1.0; use .loc or .iloc instead.

.ix() Example

Use of Notations
• Getting values from a Pandas object with multi-axes indexing uses the following notation.
• Note: .iloc() & .ix() apply the same indexing options and return values.

(Example 1) Use the basic indexing operator '[ ]'

Sort, Filter, Aggregation, Grouping, Pivot,
Concatenation, Merge/Join in Pandas

Sort
• Sort by a single column, ascending by default: df.sort_values(by='TOTAL')

• Sort in descending order: df.sort_values(by='TOTAL', ascending=False)

• Sort by multiple columns: df.sort_values(by=['QUANTITY','TOTAL'])
• Sort multiple columns in different orders: df.sort_values(by=['QUANTITY','TOTAL'], ascending=[True, False])
Filter (filtering data)
• Select columns of a dataframe: df.filter(items=['USER_ID', 'TAX'])
• Select columns by regular expression: df.filter(regex='T$', axis=1)
Filter
• Filter rows whose labels contain a substring: df.filter(like='bbi', axis=0)
• Filter rows with a comparison expression
• For example, select all orders with TOTAL greater than 100: df[df['TOTAL'] > 100]

• Filter with a user-defined function:

def custom(tax, total):
    return (total - tax > 100)

df[custom(df['TAX'], df['TOTAL'])]
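A runnable sketch of these filters on a hypothetical orders DataFrame (the column set and the '^T' regex are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({'USER_ID': [1, 2, 3],
                   'TAX':     [5, 20, 1],
                   'TOTAL':   [50, 300, 120]})

cols = df.filter(items=['USER_ID', 'TAX'])   # select columns by name
regex_cols = df.filter(regex='^T', axis=1)   # columns starting with 'T'
big = df[df['TOTAL'] > 100]                  # comparison expression

def custom(tax, total):
    return (total - tax > 100)

filtered = df[custom(df['TAX'], df['TOTAL'])]
```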
Aggregation

Example
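A sketch of aggregation, assuming a small numeric DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(4, 2), columns=['A', 'B'])

total = df.aggregate('sum')                   # one function, all columns
several = df.aggregate(['sum', 'mean'])       # multiple functions at once
per_col = df.aggregate({'A': 'sum', 'B': 'mean'})  # per-column functions
```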

Group

Grouping with a user-defined function
• For example, group by Team and take the sum of Age over the first 10 records of each group:

def custom_aggregate(series):
    return series.head(10).sum()

df.groupby(['Team'])['Age'].agg(custom_aggregate)
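A runnable version on a hypothetical Team/Age DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B'],
                   'Age':  [20, 30, 25, 35]})

def custom_aggregate(series):
    # sum of (at most) the first 10 values in each group
    return series.head(10).sum()

result = df.groupby(['Team'])['Age'].agg(custom_aggregate)
```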
Pivot
• One of the most common tasks in data science is to reshape the data frame we have into a specific format.
• Consider data about life expectancy (the number of years a person is expected to live based on the statistical average; life expectancy varies by geographical area and by era).
• The Pandas function pivot_table helps with the summarization and conversion of a dataframe in long form to a dataframe in wide form, in a variety of complex scenarios.
Pandas Simple Pivot
• A simple example of a Pandas pivot using a dataframe with just two columns. Let us subset our dataframe to contain just two columns, continent and lifeExp.

pd.pivot_table(df[['continent','lifeExp']], values='lifeExp', columns='continent')

Pandas pivot_table on a data frame with three columns
• Pandas pivot_table gets more useful when we try to summarize and convert a tall data frame with more than
two variables into a wide data frame. Use three columns; continent, year, and lifeExp

pd.pivot_table(df[['continent', 'year','lifeExp']], values='lifeExp', index=['year'], columns='continent')

Pandas pivot_table with Different Aggregating Function
• pivot_table uses the mean function to aggregate or summarize data by default. We can change the aggregating function if needed.
• For example, we can use aggfunc='max' to compute the "maximum" lifeExp instead of the "mean" lifeExp for each year and continent.

pd.pivot_table(df[['continent', 'year','lifeExp']], values='lifeExp', index=['year'], columns='continent',aggfunc='max')

Pandas pivot_table with Different Aggregating Function
• pd.pivot_table(df[['continent', 'year','lifeExp']], values='lifeExp', index=['year'], columns='continent',aggfunc=[min,max])
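A runnable sketch on a toy life-expectancy DataFrame (the values are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({'continent': ['Asia', 'Asia', 'Europe', 'Europe'],
                   'year':      [2000, 2010, 2000, 2010],
                   'lifeExp':   [65.0, 70.0, 75.0, 80.0]})

# Long form -> wide form: one row per year, one column per continent
wide = pd.pivot_table(df, values='lifeExp', index=['year'],
                      columns='continent', aggfunc='max')
```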

Melt
• The Pandas melt() function changes the DataFrame format from wide to long. It is used to create a specific format of the DataFrame object where one or more columns work as identifiers. All the remaining columns are treated as values and are unpivoted to the row axis, leaving only two non-identifier columns, variable and value.
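A minimal sketch of wide-to-long reshaping (column names are hypothetical):

```python
import pandas as pd

wide = pd.DataFrame({'name': ['Tom', 'Jane'],
                     'math': [90, 85], 'physics': [80, 95]})

# 'name' identifies each row; the other columns unpivot into rows
long = pd.melt(wide, id_vars=['name'],
               var_name='subject', value_name='score')
```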

Concatenation

Advanced Concatenation


https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
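A sketch of concatenation, merging, and joining (see the merging user guide linked above):

```python
import pandas as pd

left = pd.DataFrame({'key': ['a', 'b'], 'x': [1, 2]})
right = pd.DataFrame({'key': ['b', 'c'], 'y': [3, 4]})

stacked = pd.concat([left, right], ignore_index=True)  # row-wise append
inner = pd.merge(left, right, on='key')                # SQL-style inner join
outer = pd.merge(left, right, on='key', how='outer')
joined = left.set_index('key').join(right.set_index('key'))  # join on index
```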

Merging

Joining

Data Manipulation in Pandas

Regex
• A Regular Expression (RegEx) is a sequence of characters that defines a search pattern.
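A sketch of regex patterns with the pandas str accessor (the data is hypothetical):

```python
import pandas as pd

s = pd.Series(['order-123', 'item-45', 'order-678'])

is_order = s.str.contains(r'^order-\d+$')     # match a search pattern
numbers = s.str.extract(r'(\d+)')[0].astype(int)  # pull out the digits
```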
Date Functionality

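A minimal sketch of pandas date functionality:

```python
import pandas as pd

dates = pd.date_range('2022-12-01', periods=5, freq='D')
s = pd.Series(range(5), index=dates)

december = s.loc['2022-12']      # partial-string indexing by month
weekday = dates[0].day_name()
```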
Time Delta
• Time deltas are differences in times, expressed in different units, for example days, hours, minutes, seconds.
• They can be both positive and negative.

Example
• By passing a string literal, we can create a timedelta object: pd.Timedelta('2 days 2 hours 15 minutes 30 seconds') → 2 days 02:15:30
• By passing an integer value with a unit argument: pd.Timedelta(6, unit='h') → 0 days 06:00:00
• Data offsets such as weeks, days, hours, minutes, seconds, milliseconds, microseconds, nanoseconds: pd.Timedelta(days=2) → 2 days 00:00:00
• pd.to_timedelta() converts a scalar, array, list, or Series from a recognized timedelta format/value into a Timedelta type. It constructs a Series if the input is a Series, a scalar if the input is scalar-like, and otherwise a TimedeltaIndex.
Example
• Operate on Series/DataFrames and construct timedelta64[ns] Series through subtraction operations on datetime64[ns] Series, or Timestamps:

s = pd.Series(pd.date_range('2012-1-1', periods=3, freq='D'))
td = pd.Series([pd.Timedelta(days=i) for i in range(3)])
df = pd.DataFrame(dict(A=s, B=td))

• Addition operations: df['C'] = df['A'] + df['B']
• Subtraction operations: df['D'] = df['C'] - df['B']
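The same construction, runnable end to end:

```python
import pandas as pd

s = pd.Series(pd.date_range('2012-1-1', periods=3, freq='D'))
td = pd.Series([pd.Timedelta(days=i) for i in range(3)])
df = pd.DataFrame(dict(A=s, B=td))

df['C'] = df['A'] + df['B']   # datetime + timedelta -> datetime
df['D'] = df['C'] - df['A']   # datetime - datetime -> timedelta
```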
Normalization
• Normalization refers to rescaling real-valued numeric
attributes into a 0 to 1 range.
• Data normalization is used in machine learning to make
model training less sensitive to the scale of features.
This allows our model to converge to better weights
and, in turn, leads to a more accurate model.

Standardization

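A minimal sketch of both rescalings with plain pandas (the column name is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'score': [10.0, 20.0, 30.0, 40.0]})

# Min-max normalization: rescale into the 0-1 range
df['norm'] = (df['score'] - df['score'].min()) / (df['score'].max() - df['score'].min())

# Standardization: zero mean, unit standard deviation (z-score)
df['std'] = (df['score'] - df['score'].mean()) / df['score'].std()
```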
Missing Data Handling
• Missing data can occur when no information is provided for one or more items or for a whole unit. Missing data is a very big problem in real-life scenarios. Missing data is also referred to as NA (Not Available) values in pandas. Many datasets simply arrive with missing data, either because it exists and was not collected or because it never existed.
• In Pandas, missing data is represented by two values:
• None: a Python singleton object that is often used for missing data in Python code.
• NaN (an acronym for Not a Number): a special floating-point value recognized by all systems that use the standard IEEE floating-point representation.
• Pandas treats None and NaN as essentially interchangeable for indicating missing or null values. To facilitate this convention, there are several useful functions for detecting, removing, and replacing null values in a Pandas DataFrame:
• isnull()
• notnull()
• dropna()
• fillna()
• replace()
• interpolate()
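A sketch of these functions on one small Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, None])   # None becomes NaN in a float Series

mask = s.isnull()                  # True where missing
n_missing = int(mask.sum())
filled = s.fillna(0)
dropped = s.dropna()
interp = s.interpolate()           # linear interpolation between neighbors
```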
isnull()

notnull()

Filling Missing Data

Interpolate

dropna()

Window Functions
• .rolling() Function
• .expanding() Function
• .ewm() Function

.rolling() Function

.expanding() Function

EWM
• ewm is applied on a series of data. Specify any of the com, span, or halflife arguments and apply the appropriate statistical function on top of it. It assigns the weights exponentially.
• Used to smooth data and handle noise.

df.ewm(com=0.5).mean()
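A sketch of all three window functions on one Series:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

roll = s.rolling(window=3).mean()   # fixed-size moving window
expand = s.expanding().sum()        # all observations up to each point
smooth = s.ewm(com=0.5).mean()      # exponentially weighted
```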

Data Analysis in Pandas

Descriptive Statistics
• Most of these are aggregations like sum(), mean(), but some of them, like cumsum(), produce an object of the same size.
• These methods take an axis argument, just like ndarray.{sum, std, ...}, but the axis can be specified by name or integer.
• DataFrame − "index" (axis=0, default), "columns" (axis=1)
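A sketch of aggregations, a same-size method, and describe():

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.0, 6.0]})

col_sum = df.sum()            # axis=0 ("index") by default
row_sum = df.sum(axis=1)      # axis=1 ("columns")
running = df.cumsum()         # same shape as df
summary = df.describe()       # count, mean, std, min, quartiles, max
```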

Example
Summarizing Data
• The describe() function computes a summary of statistics pertaining to the DataFrame columns.
Summarizing Data with include
• This function gives the mean, std and IQR values, and excludes character columns, summarizing only numeric columns by default. 'include' is the argument used to specify which columns should be considered for summarizing. It takes a list of values; by default, 'number'.
• object − summarizes string columns
• number − summarizes numeric columns
• all − summarizes all columns together (should not be passed as a list value)
describe(include=['object'])
• # Create a DataFrame
• df = pd.DataFrame(d)
• print(df.describe(include=['object']))

describe(include='all')
Statistical Functions
• Statistical methods help in the understanding and analyzing the behavior of data.
• Some useful functions:
• Percent change
• Covariance
• Correlation
• Data Ranking

Percent_change
• Series and DataFrames (and formerly Panel) all have the function pct_change().
• This function compares every element with its prior element and computes the change percentage.
• Formula: value_n = (x_n − x_{n−1}) / x_{n−1}
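A minimal sketch of the formula in action:

```python
import pandas as pd

s = pd.Series([100.0, 110.0, 99.0])
change = s.pct_change()   # (x_n - x_{n-1}) / x_{n-1}; first value is NaN
```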

Co-variance
• Covariance is applied on series data. The Series object has a method cov to compute covariance between Series objects. NA values are excluded automatically.
• Covariance measures how two variables deviate together from their average values. For two random variables X and Y it is calculated as cov(X, Y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / n for a population, or with n − 1 in the denominator for a sample.
Correlation Value
• The correlation coefficient is a value that indicates the strength of the relationship. The coefficient can take
any values from -1 to 1. The interpretations of the values are:
• -1: Perfect negative correlation. The variables tend to move in opposite directions (i.e., when one variable increases, the
other variable decreases).
• 0: No correlation. The variables do not have a relationship with each other.
• 1: Perfect positive correlation. The variables tend to move in the same direction (i.e., when one variable increases, the other
variable also increases).
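A sketch of both measures (pandas uses the sample formula, i.e. an n − 1 denominator):

```python
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0])
y = pd.Series([2.0, 4.0, 6.0, 8.0])   # y = 2x: perfectly linear

c = x.cov(y)       # sample covariance
r = x.corr(y)      # Pearson correlation, between -1 and 1
```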

Data Ranking
• Data ranking produces a rank for each element in an array of elements. In case of ties, it assigns the mean rank by default.

Data Ranking – More Example

Categorical Data
• Data includes the text columns, which are repetitive. Features like gender, country, and codes are always
repetitive. These are the examples for categorical data.
• Categorical variables can take on only a limited, and usually fixed, number of possible values. Besides the fixed length, categorical data might have an order, but numerical operations cannot be performed on it. Categorical is a Pandas data type.
• The categorical data type is useful in the following cases −
• A string variable consisting of only a few different values. Converting such a string variable to a
categorical variable will save some memory.
• The lexical order of a variable is not the same as the logical order (“one”, “two”, “three”). By
converting to a categorical and specifying an order on the categories, sorting and min/max will use
the logical order instead of the lexical order.
• As a signal to other python libraries that this column should be treated as a categorical variable
(e.g. to use suitable statistical methods or plot types)
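A sketch of a logical (rather than lexical) category order:

```python
import pandas as pd

s = pd.Series(['low', 'high', 'medium', 'low'], dtype='category')

# Impose a logical order on the categories
s = s.cat.set_categories(['low', 'medium', 'high'], ordered=True)

smallest = s.min()          # uses the logical order, not alphabetical
counts = s.value_counts()
```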

Example

Comparison of Categorical Data

Visualization
• Plotting methods allow a handful of plot styles other than the default line plot. These methods can be selected via the kind keyword argument to plot(). These include −
• 'bar' or 'barh' for bar plots
• 'hist' for histograms
• 'box' for box plots
• 'area' for area plots
• 'scatter' for scatter plots
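A sketch using a non-interactive backend (assumes matplotlib is installed; the data is hypothetical):

```python
import matplotlib
matplotlib.use('Agg')            # headless backend: render without a display
import pandas as pd

df = pd.DataFrame({'sales': [3, 7, 5], 'costs': [2, 4, 3]})

ax = df.plot(kind='bar')         # also: 'barh', 'hist', 'box', 'area'
ax.figure.savefig('bars.png')
```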

Plotting

Bar Plotting

Stacked Bar Plotting

Horizontal Bar Plotting

Histograms in the same plot

Plot different histograms for each column

Box Plots

Area Plot

Scatter Plots

Pie Chart

IO Tools
• The two workhorse functions for reading text files (or flat files) are read_csv() and read_table(). They both use the same parsing code to intelligently convert tabular data into a DataFrame object.
• Example: The temp.csv file data looks like

Example
• df = pd.read_csv("temp.csv")
• df = pd.read_csv("temp.csv", index_col=['S.No'])
• df = pd.read_csv("temp.csv", dtype={'Salary': np.float64})
• df = pd.read_csv("temp.csv", names=['a', 'b', 'c', 'd', 'e'])
• df = pd.read_csv("temp.csv", names=['a', 'b', 'c', 'd', 'e'], header=0)
(passing names together with header=0 replaces the original header row with the new names)
• df = pd.read_csv("temp.csv", skiprows=2)
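A self-contained sketch using an in-memory file in place of temp.csv (the column set is assumed):

```python
import io
import pandas as pd

csv_text = ("S.No,Name,Age,City,Salary\n"
            "1,Tom,28,Toronto,20000\n"
            "2,Lee,32,HongKong,3000\n")

df = pd.read_csv(io.StringIO(csv_text))
df_idx = pd.read_csv(io.StringIO(csv_text), index_col=['S.No'])
# names + header=0 replaces the original header row
df_named = pd.read_csv(io.StringIO(csv_text),
                       names=['a', 'b', 'c', 'd', 'e'], header=0)
```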
Sparse Data
• Sparse objects are "compressed" when any data matching a specific value (NaN / missing value, though any value can be chosen) is omitted. A special SparseIndex object tracks where data has been "sparsified".
• Used to compress data and reduce memory when data is sparse.
• Works with both Series and DataFrame data.
• Sparse data should have the same dtype as its dense representation. Currently, float64, int64 and bool dtypes are supported. Depending on the original dtype, the fill_value default changes:
• float64 − np.nan
• int64 − 0
• bool − False

Example
• df.to_sparse() − compressing (note: removed in pandas 1.0; use df.astype(pd.SparseDtype(...)) instead)
• sdf.to_dense() − decompressing
• sdf.density → density = 0.4
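A sketch using the modern sparse API:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0.0, np.nan, np.nan, 1.0, np.nan]})

# Modern replacement for the removed to_sparse(): a sparse dtype
sdf = df.astype(pd.SparseDtype('float64', np.nan))

density = sdf['A'].sparse.density      # fraction of non-fill values stored
dense_again = sdf.sparse.to_dense()
```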
Caveats & Gotchas
• A caveat is a warning, and a gotcha is an unseen problem.
• Pandas follows the NumPy convention of raising an error when you try to convert something to a bool. This happens in an if statement or when using the Boolean operations and, or, or not. It is not clear what the result should be: should it be True because it is not zero-length, or False because there are False values? It is unclear, so instead Pandas raises a ValueError.
• For Series data, use instead:
• .empty
• .bool()
• .item()
• .any()
• .all()
• Bitwise Boolean operators (&, |, ~)
• isin()

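A sketch of the gotcha and the unambiguous alternatives:

```python
import pandas as pd

s = pd.Series([True, False, True])

try:
    if s:                     # ambiguous: pandas raises instead of guessing
        pass
except ValueError:
    ambiguous = True

any_true = s.any()
all_true = s.all()
is_empty = s.empty
```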
Example

Comparison with SQL

Query: SELECT
T-SQL: SELECT total_bill, tip, smoker, time FROM tips LIMIT 5;
Pandas: tips[['total_bill', 'tip', 'smoker', 'time']].head(5)

Query: WHERE
T-SQL: SELECT * FROM tips WHERE time = 'Dinner' LIMIT 5;
Pandas: tips[tips['time'] == 'Dinner'].head(5)

Query: GROUP BY
T-SQL: SELECT sex, count(*) FROM tips GROUP BY sex;
Pandas: tips.groupby('sex').size()

Query: TOP N ROWS
T-SQL: SELECT * FROM tips LIMIT 5;
Pandas: tips.head(5)
Mastering Pandas - To master data manipulation in Python using Pandas, here's what you need to learn:

• read_csv
• set_index
• reset_index
• loc
• iloc
• drop
• dropna
• fillna
• assign
• filter
• query
• rename
• sort_values
• agg
• groupby
• concat
• merge
• pivot
• melt