Handout Pandas
DataFrame
pandas.DataFrame
class pandas.DataFrame(data=None, index=None, columns=None,
dtype=None, copy=None) [source]
See also
DataFrame.from_records
Constructor from tuples, also record arrays.
DataFrame.from_dict
From dicts of Series, arrays, or dicts.
read_csv
Read a comma-separated values (csv) file into DataFrame.
read_table
Read general delimited file into DataFrame.
read_clipboard
Read text from clipboard into DataFrame.
Notes
Please reference the User Guide for more information.
Examples
Constructing DataFrame from a dictionary.
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
   col1  col2
0     1     3
1     2     4
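The constructor signature above also accepts index, columns and dtype. As a minimal sketch (the labels "r1", "r2" and "a", "b", "c" are made up for illustration), a NumPy array can be labelled and cast in one call:

```python
import numpy as np
import pandas as pd

# A 2x3 integer array; row and column labels are supplied separately.
data = np.array([[1, 2, 3], [4, 5, 6]])

# index= sets the row labels, columns= the column labels,
# and dtype= forces a single dtype for every column.
df = pd.DataFrame(data, index=["r1", "r2"], columns=["a", "b", "c"],
                  dtype="float64")

print(df)
print(df.dtypes)
```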
Attributes
T The transpose of the DataFrame.
at Access a single value for a row/column label pair.
attrs Dictionary of global attributes of this dataset.
axes Return a list representing the axes of the DataFrame.
columns The column labels of the DataFrame.
dtypes Return the dtypes in the DataFrame.
empty Indicator whether Series/DataFrame is empty.
flags Get the properties associated with this pandas object.
iat Access a single value for a row/column pair by integer position.
iloc Purely integer-location based indexing for selection by position.
index The index (row labels) of the DataFrame.
loc Access a group of rows and columns by label(s) or a boolean array.
ndim Return an int representing the number of axes / array dimensions.
shape Return a tuple representing the dimensionality of the DataFrame.
size Return an int representing the number of elements in this object.
pandas.DataFrame.loc
property DataFrame.loc [source]
A boolean array of the same length as the axis being sliced, e.g. [True, False,
True].
An alignable boolean Series. The index of the key will be aligned before masking.
An alignable Index. The Index of the returned selection will be the input.
A callable function with one argument (the calling Series or DataFrame) that
returns valid output for indexing (one of the above).
See more at Selection by Label.
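The input kinds listed above can be tried side by side. This is an illustrative sketch that reuses the cobra/viper/sidewinder frame from the Examples section; the variable names are hypothetical:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=["cobra", "viper", "sidewinder"],
                  columns=["max_speed", "shield"])

# Boolean list with one entry per row.
by_list = df.loc[[True, False, True]]

# Boolean Series: it is aligned on the index before masking,
# so the order of its labels does not have to match the frame.
mask = pd.Series([True, False, False],
                 index=["sidewinder", "cobra", "viper"])
by_series = df.loc[mask]

# Callable: receives the frame and returns a valid indexer.
by_callable = df.loc[lambda frame: frame["max_speed"] > 3]

print(by_list, by_series, by_callable, sep="\n\n")
```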
Raises:
KeyError
If any items are not found.
IndexingError
If an indexed key is passed and its index is unalignable to the frame index.
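A short sketch of the KeyError case, using a label ("python") that is deliberately absent from the frame. The helper raises_key_error is hypothetical, and reindex is shown only as one possible alternative when missing labels should produce NaN rows instead of an error:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=["cobra", "viper", "sidewinder"],
                  columns=["max_speed", "shield"])


def raises_key_error(indexer):
    """Return True if df.loc[indexer] raises KeyError."""
    try:
        df.loc[indexer]
    except KeyError:
        return True
    return False


# A single missing label raises KeyError.
single_missing = raises_key_error("python")

# A list raises KeyError as soon as any one label is missing.
list_missing = raises_key_error(["cobra", "python"])

# reindex does not raise; the missing label becomes a row of NaN.
padded = df.reindex(["cobra", "python"])
print(single_missing, list_missing)
print(padded)
```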
See also
DataFrame.at
Access a single value for a row/column label pair.
DataFrame.iloc
Access group of rows and columns by integer position(s).
DataFrame.xs
Returns a cross-section (row(s) or column(s)) from the Series/DataFrame.
Series.loc
Access group of values using labels.
Examples
Getting values
>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
... index=['cobra', 'viper', 'sidewinder'],
... columns=['max_speed', 'shield'])
>>> df
            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8
Please ensure that each condition is wrapped in parentheses (). See the user guide for
more details and explanations of Boolean indexing.
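To illustrate why the parentheses matter, here is a small sketch with two conditions on the frame defined above. The thresholds are arbitrary and chosen only so that each condition selects different rows:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=["cobra", "viper", "sidewinder"],
                  columns=["max_speed", "shield"])

# & and | bind more tightly than > and <, so each comparison
# needs its own parentheses before the results are combined.
both = df.loc[(df["max_speed"] > 1) & (df["shield"] < 8)]

# The same pattern with "or", also restricting the columns returned.
either = df.loc[(df["max_speed"] > 6) | (df["shield"] < 3), ["shield"]]

print(both)
print(either)
```

Without the parentheses, Python would evaluate `1 & df["shield"]` first, and the resulting chained comparison fails because a Series has no single truth value.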
Note
If you find yourself using 3 or more conditionals in .loc[], consider using
advanced indexing.
See below for using .loc[] on MultiIndex DataFrames.
Callable that returns a boolean Series
>>> df.loc[lambda df: df['shield'] == 8]
            max_speed  shield
sidewinder          7       8
Setting values
Set value for all items matching the list of labels
>>> df.loc[['viper', 'sidewinder'], ['shield']] = 50
>>> df
            max_speed  shield
cobra               1       2
viper               4      50
sidewinder          7      50
Setting using a Series or a DataFrame sets the values matching the index labels, not
the index positions.
>>> shuffled_df = df.loc[["viper", "cobra", "sidewinder"]]
>>> df.loc[:] += shuffled_df
>>> df
            max_speed  shield
cobra              60      20
viper               0      10
sidewinder          0       0
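The handout skips the intermediate assignments that precede this result. The sketch below is self-contained: its starting values are an assumption, chosen so that the label-aligned update reproduces the output printed above, and it adds a positional assignment through a NumPy array for contrast.

```python
import pandas as pd

# Assumed starting state (not shown in the handout).
df = pd.DataFrame({"max_speed": [30, 0, 0], "shield": [10, 5, 0]},
                  index=["cobra", "viper", "sidewinder"])

# Same rows, different order.
shuffled_df = df.loc[["viper", "cobra", "sidewinder"]]

# Contrast: a plain NumPy array carries no labels, so it is
# written row by row, by position.
by_position = df.copy()
by_position.loc[:] = shuffled_df.to_numpy()

# A DataFrame on the right-hand side is matched by label, so every
# row is added to itself regardless of the row order of shuffled_df.
df.loc[:] += shuffled_df

print(df)
print(by_position)
```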
Slice with integer labels for rows. As mentioned above, note that both the start and stop of
the slice are included.
>>> df.loc[7:9]
   max_speed  shield
7          1       2
8          4       5
9          7       8
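The frame with integer labels is not constructed in this handout. A minimal sketch, assuming the same values as the earlier example with the index replaced by 7, 8, 9, shows how the inclusive label slice differs from a positional slice:

```python
import pandas as pd

# Assumed frame: integer row labels 7, 8, 9.
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=[7, 8, 9],
                  columns=["max_speed", "shield"])

# .loc slices by label and includes both endpoints.
by_label = df.loc[7:8]

# .iloc slices by position and excludes the stop, like a Python list.
by_position = df.iloc[0:1]

print(by_label)
print(by_position)
```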
Single label for row and column. Similar to passing in a tuple, this returns a Series.
>>> df.loc['cobra', 'mark i']
max_speed    12
shield        2
Name: (cobra, mark i), dtype: int64
Single tuple for the index with a single label for the column
>>> df.loc[('cobra', 'mark i'), 'shield']
2
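The MultiIndex frame behind these two lookups is not shown in this handout. The sketch below builds one under stated assumptions: the labels and values are chosen to agree with the outputs printed above, and the remaining rows are illustrative.

```python
import pandas as pd

# Assumed two-level row index: (snake, variant).
tuples = [
    ("cobra", "mark i"), ("cobra", "mark ii"),
    ("sidewinder", "mark i"), ("sidewinder", "mark ii"),
    ("viper", "mark ii"), ("viper", "mark iii"),
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20], [1, 4], [7, 1], [16, 36]]
df = pd.DataFrame(values, columns=["max_speed", "shield"], index=index)

# Both levels given as one tuple: a single row, returned as a Series.
row = df.loc[("cobra", "mark i")]

# Tuple for the row plus a column label: a single scalar.
cell = df.loc[("cobra", "mark i"), "shield"]

# Only the first level given: every row under that label.
cobras = df.loc["cobra"]

print(row)
print(cell)
print(cobras)
```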
viper mark ii 7 1
Please see the user guide for more details and explanations of advanced indexing.
pandas.DataFrame.iloc
property DataFrame.iloc [source]
Examples
>>> mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
... {'a': 100, 'b': 200, 'c': 300, 'd': 400},
... {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000}]
>>> df = pd.DataFrame(mydict)
>>> df
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000
With a callable, useful in method chains. The x passed to the lambda is the DataFrame
being sliced. This selects the rows whose index label is even.
>>> df.iloc[lambda x: x.index % 2 == 0]
      a     b     c     d
0     1     2     3     4
2  1000  2000  3000  4000
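Besides callables, .iloc accepts integers, lists of integers and slices. The following sketch runs through these forms on the same frame; the variable names are illustrative:

```python
import pandas as pd

mydict = [{"a": 1, "b": 2, "c": 3, "d": 4},
          {"a": 100, "b": 200, "c": 300, "d": 400},
          {"a": 1000, "b": 2000, "c": 3000, "d": 4000}]
df = pd.DataFrame(mydict)

# A single integer returns that row as a Series.
first_row = df.iloc[0]

# A list of integers returns a DataFrame with those rows.
outer_rows = df.iloc[[0, 2]]

# Slices follow Python conventions: the stop position is excluded.
block = df.iloc[1:3, 0:2]

# Two integers return a single scalar (row 1, column 2).
cell = df.iloc[1, 2]

print(first_row, outer_rows, block, cell, sep="\n\n")
```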
pandas.DataFrame.to_pickle
DataFrame.to_pickle(path, *, compression='infer', protocol=5,
storage_options=None) [source]
Examples
>>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
>>> original_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9
>>> original_df.to_pickle("./dummy.pkl")
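A round trip can confirm that the pickled frame loads back unchanged. This sketch writes to a temporary directory (an assumption made here so that no file is left behind) and reads the file back with pandas.read_pickle:

```python
import os
import tempfile

import pandas as pd

original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dummy.pkl")

    # Serialize the frame to disk.
    original_df.to_pickle(path)

    # Load it back; only unpickle files from sources you trust.
    unpickled_df = pd.read_pickle(path)

print(unpickled_df.equals(original_df))
```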
© 2024, pandas via NumFOCUS, Inc. Hosted by
OVHcloud. Built with the PyData Sphinx Theme
0.14.4.
Created using Sphinx 8.0.2.
pandas.DataFrame.sort_values
DataFrame.sort_values(by, *, axis=0, ascending=True, inplace=False,
kind='quicksort', na_position='last', ignore_index=False, key=None) [source]
Parameters:
by : str or list of str
Name or list of names to sort by.
If axis is 0 or ‘index’, then by may contain index levels and/or column labels.
If axis is 1 or ‘columns’, then by may contain column levels and/or index labels.
axis : “{0 or ‘index’, 1 or ‘columns’}”, default 0
Axis to be sorted.
ascending : bool or list of bool, default True
Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list
of bools, must match the length of the by.
inplace : bool, default False
If True, perform operation in-place.
kind : {‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’}, default ‘quicksort’
Choice of sorting algorithm. See also numpy.sort() for more information.
mergesort and stable are the only stable algorithms. For DataFrames, this option
is only applied when sorting on a single column or label.
na_position : {‘first’, ‘last’}, default ‘last’
Puts NaNs at the beginning if first; last puts NaNs at the end.
ignore_index : bool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
key : callable, optional
Apply the key function to the values before sorting. This is similar to the key
argument in the builtin sorted() function, with the notable difference that this
key function should be vectorized. It should expect a Series and return a
Series with the same shape as the input. It will be applied to each column in by
independently.
Returns:
DataFrame or None
DataFrame with sorted values or None if inplace=True.
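As an illustration of the key and ignore_index parameters described above, the sketch below sorts a mixed-case string column with and without a vectorized key. The small frame is made up for this example:

```python
import pandas as pd

df = pd.DataFrame({"col4": ["a", "B", "c", "D", "e", "F"],
                   "col2": [2, 1, 9, 8, 7, 4]})

# Default string ordering puts uppercase letters before lowercase.
plain = df.sort_values(by="col4")

# key receives the whole column as a Series and must return a Series
# of the same shape; here it lowercases the values before comparing.
# ignore_index=True relabels the result 0, 1, ..., n - 1.
folded = df.sort_values(by="col4", key=lambda col: col.str.lower(),
                        ignore_index=True)

print(plain)
print(folded)
```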
See also
DataFrame.sort_index
Sort a DataFrame by the index.
Series.sort_values
Similar method for a Series.
Examples
>>> df = pd.DataFrame({
... 'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],
... 'col2': [2, 1, 9, 8, 7, 4],
... 'col3': [0, 1, 9, 4, 2, 3],
... 'col4': ['a', 'B', 'c', 'D', 'e', 'F']
... })
>>> df
  col1  col2  col3 col4
0    A     2     0    a
1    A     1     1    B
2    B     9     9    c
3  NaN     8     4    D
4    D     7     2    e
5    C     4     3    F
Sort by col1
>>> df.sort_values(by=['col1'])
  col1  col2  col3 col4
0    A     2     0    a
1    A     1     1    B
2    B     9     9    c
5    C     4     3    F
4    D     7     2    e
3  NaN     8     4    D
Sort Descending
>>> df.sort_values(by='col1', ascending=False)
  col1  col2  col3 col4
4    D     7     2    e
5    C     4     3    F
2    B     9     9    c
0    A     2     0    a
1    A     1     1    B
3  NaN     8     4    D
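The ascending and na_position parameters combine with multi-column sorts. A brief sketch on the same frame follows; the variable names are illustrative, and kind="stable" is passed in the last call only so that tied rows keep a predictable order:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["A", "A", "B", np.nan, "D", "C"],
    "col2": [2, 1, 9, 8, 7, 4],
    "col3": [0, 1, 9, 4, 2, 3],
    "col4": ["a", "B", "c", "D", "e", "F"],
})

# Sort by col1, breaking ties with col2.
by_two = df.sort_values(by=["col1", "col2"])

# One bool per column in by: col1 ascending, col2 descending.
mixed = df.sort_values(by=["col1", "col2"], ascending=[True, False])

# Descending sort with missing values placed first.
nan_first = df.sort_values(by="col1", ascending=False,
                           na_position="first", kind="stable")

print(by_two, mixed, nan_first, sep="\n\n")
```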
pandas.DataFrame.groupby
DataFrame.groupby(by=None, axis=<no_default>, level=None,
as_index=True, sort=True, group_keys=True, observed=<no_default>,
dropna=True) [source]
Notes
See the user guide for more detailed usage and examples, including splitting an object into
groups, iterating through groups, selecting a group, aggregation, and more.
Examples
>>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
... 'Parrot', 'Parrot'],
... 'Max Speed': [380., 370., 24., 26.]})
>>> df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0
>>> df.groupby(['Animal']).mean()
        Max Speed
Animal
Falcon      375.0
Parrot       25.0
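Two common variations on this example, sketched under the assumption that the same frame is used: as_index=False (listed in the signature above) keeps the group key as an ordinary column, and agg computes several statistics at once.

```python
import pandas as pd

df = pd.DataFrame({"Animal": ["Falcon", "Falcon", "Parrot", "Parrot"],
                   "Max Speed": [380., 370., 24., 26.]})

# as_index=False returns the group key as a regular column
# instead of using it as the row index.
as_column = df.groupby("Animal", as_index=False).mean()

# Selecting one column and aggregating with several functions
# gives one result column per function.
summary = df.groupby("Animal")["Max Speed"].agg(["min", "max"])

print(as_column)
print(summary)
```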
Hierarchical Indexes
We can group by different levels of a hierarchical index using the level parameter:
>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
... ['Captive', 'Wild', 'Captive', 'Wild']]
>>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
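The handout stops before the frame is built and grouped. A minimal completion might look like the sketch below; the speed values are assumptions made for illustration.

```python
import pandas as pd

arrays = [["Falcon", "Falcon", "Parrot", "Parrot"],
          ["Captive", "Wild", "Captive", "Wild"]]
index = pd.MultiIndex.from_arrays(arrays, names=("Animal", "Type"))

# Assumed values, one per (Animal, Type) pair.
df = pd.DataFrame({"Max Speed": [390., 350., 30., 20.]}, index=index)

# level accepts a position ...
by_animal = df.groupby(level=0).mean()

# ... or a level name.
by_type = df.groupby(level="Type").mean()

print(by_animal)
print(by_type)
```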
We can also choose whether to include NA in the group keys by setting the dropna
parameter; the default setting is True.
>>> l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
>>> df = pd.DataFrame(l, columns=["a", "b", "c"])
>>> df.groupby(by=["b"]).sum()
     a  c
b
1.0  2  3
2.0  2  5
>>> l = [["a", 12, 12], [None, 12.3, 33.], ["b", 12.3, 123], ["a", 1, 1]]
>>> df = pd.DataFrame(l, columns=["a", "b", "c"])
>>> df.groupby(by="a").sum()
      b      c
a
a  13.0   13.0
b  12.3  123.0
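Both outputs above use the default dropna=True, so rows whose key is missing are left out. For comparison, this sketch reruns the first example with dropna=False, which keeps the missing key as its own group:

```python
import pandas as pd

l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
df = pd.DataFrame(l, columns=["a", "b", "c"])

# Default: the row with b == NaN is excluded from every group.
dropped = df.groupby(by=["b"]).sum()

# dropna=False: NaN becomes a group key of its own.
kept = df.groupby(by=["b"], dropna=False).sum()

print(dropped)
print(kept)
```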
When using .apply(), use group_keys to include or exclude the group keys. The
group_keys argument defaults to True (include).
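The effect of group_keys is easiest to see on the index of the result. In this sketch the applied function (taking the first row of each group) is an arbitrary choice made for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Animal": ["Falcon", "Falcon", "Parrot", "Parrot"],
                   "Max Speed": [380., 370., 24., 26.]})

# group_keys=True: the group label is added as an outer index level.
with_keys = (df.groupby("Animal", group_keys=True)[["Max Speed"]]
               .apply(lambda g: g.head(1)))

# group_keys=False: only the original row labels remain.
without_keys = (df.groupby("Animal", group_keys=False)[["Max Speed"]]
                  .apply(lambda g: g.head(1)))

print(with_keys)
print(without_keys)
```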
pandas.DataFrame.drop_duplicates
DataFrame.drop_duplicates(subset=None, *, keep='first',
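The signature is cut off in this handout, so the sketch below uses only the two parameters that are visible, subset and keep. The frame is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

# Default: rows are duplicates only if every column matches;
# the first occurrence is kept.
full = df.drop_duplicates()

# subset limits the comparison to the named columns.
by_brand = df.drop_duplicates(subset=["brand"])

# keep="last" retains the last occurrence instead of the first.
last = df.drop_duplicates(subset=["brand", "style"], keep="last")

print(full, by_brand, last, sep="\n\n")
```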
pandas.DataFrame.drop
DataFrame.drop(labels=None, *, axis=0, index=None, columns=None,
Examples
>>> df = pd.DataFrame(np.arange(12).reshape(3, 4),
... columns=['A', 'B', 'C', 'D'])
>>> df
   A  B   C   D
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11
Drop columns
>>> df.drop(['B', 'C'], axis=1)
   A   D
0  0   3
1  4   7
2  8  11
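The index and columns keywords in the signature above offer an alternative to labels plus axis. A short sketch on the same frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  columns=["A", "B", "C", "D"])

# columns= is equivalent to passing the labels with axis=1.
by_keyword = df.drop(columns=["B", "C"])

# With the default axis=0, labels refer to rows.
by_row = df.drop([0, 1])

# drop returns a new frame; df itself is left unchanged.
print(by_keyword)
print(by_row)
print(df.shape)
```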
Drop a specific index combination from the MultiIndex DataFrame, i.e., drop the
combination 'falcon' and 'weight', which deletes only the corresponding row.
>>> df.drop(index=('falcon', 'weight'))
                big     small
llama   speed   45.0    30.0
        weight  200.0   100.0
        length  1.5     1.0
cow     speed   30.0    20.0
        weight  250.0   150.0
        length  1.5     0.8
falcon  speed   320.0   250.0
        length  0.3     0.2
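The MultiIndex frame used here is not constructed in this handout. The sketch below rebuilds a frame consistent with the output above; the values for the dropped ('falcon', 'weight') row are an assumption, since that row never appears in the handout.

```python
import pandas as pd

# Every animal paired with every measurement.
midx = pd.MultiIndex.from_product(
    [["llama", "cow", "falcon"], ["speed", "weight", "length"]]
)

# Values follow the printed output; the falcon/weight entries
# (1 and 0.8) are assumed.
df = pd.DataFrame(
    {"big": [45, 200, 1.5, 30, 250, 1.5, 320, 1, 0.3],
     "small": [30, 100, 1, 20, 150, 0.8, 250, 0.8, 0.2]},
    index=midx,
)

# A tuple naming both levels removes exactly one row.
trimmed = df.drop(index=("falcon", "weight"))

print(trimmed)
```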