Pandas Cheat Sheet

Import convention
>>> import pandas as pd

Creating Data

Creating Series
>>> s = pd.Series([3, -5, 7, 4],
        index=['a', 'b', 'c', 'd'])

Creating Dataframe
>>> df = pd.DataFrame(
        {"name"   : ["Ram", "Rahul", "Ravi"],
         "age"    : [51, 28, 19],
         "weight" : [69.3, 44.6, 36.9]})
>>> p_df = pd.DataFrame(
        {"name"  : ["Jack Smith", "Jane Lodge"],
         "place" : ["HYD", "DEL"]})

Loading Data
>>> df = pd.read_csv('data.csv')
- Loads the data from a CSV file into a DataFrame.

Properties of Dataframe
>>> df.head(n)       First n rows
>>> df.tail(n)       Last n rows
>>> df.shape         Shape of df
>>> df.columns       Column labels
>>> df.dtypes        Datatypes of columns
>>> df.describe()    Summary statistics
Changing the Index
>>> df.set_index('name')
- Sets the index to the 'name' column.
>>> df.reset_index()
- Resets the index of df back to the default one.
>>> df.sort_index(axis=0)
- Sorts the object by its labels (along an axis).
>>> pd.read_csv('data.csv', index_col='column_name')
- Sets the index while reading the CSV file.
"age" : [51, 28, 19], - Sort by the values of the given column.
"place" : ["HYD", "DEL"]})
"weight" : [69.3, 44.6, 36.9]}) - Concatenates pandas objects along axis.
>>> df['age'].nlargest(2)
>>> p_df.applymap(str.lower) >>> p_df.drop(labels='last', axis='columns') - Orders first 2 rows based on given column in descending order.
Accessing Data
>>> df.loc[0]                           Row by label
>>> df.loc[[0, 2], ['age', 'weight']]   Group of rows and columns by label(s)
>>> df.iloc[[0, 1]]                     Group of rows and columns by indices
>>> df.at[1, 'weight']                  Single value for a row/column label pair
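A short sketch of the loc/iloc distinction, assuming a small frame with a non-default index (the labels 10/20/30 are hypothetical):
>>> import pandas as pd
>>> df = pd.DataFrame({"age": [51, 28, 19]}, index=[10, 20, 30])
>>> df.loc[10]          # selects by index label 10 -> age 51
>>> df.iloc[0]          # selects by integer position 0 -> the same row
>>> df.at[20, 'age']    # fast scalar access by labels -> 28
>>> df.iat[1, 0]        # fast scalar access by positions -> 28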

Filtering Based on Criteria
>>> df[df['age'] > 50]
- Extracts rows that meet the logical criteria.
>>> df.query('age < weight & age >= 11')
- DataFrame resulting from the provided query expression.
>>> filter = df['name'].str.contains('Rah')
>>> df[filter]
- Builds a boolean Series from the string match, then uses it to filter rows.
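Conditions can also be combined with & and | (each condition wrapped in parentheses) or tested with isin; a sketch reusing the df from Creating Data:
>>> import pandas as pd
>>> df = pd.DataFrame({"name": ["Ram", "Rahul", "Ravi"], "age": [51, 28, 19], "weight": [69.3, 44.6, 36.9]})
>>> df[(df['age'] > 20) & (df['weight'] < 50)]   # both conditions must hold -> only Rahul
>>> df[df['name'].isin(['Ram', 'Ravi'])]         # membership test on a column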
Updating Rows/Columns
>>> df.rename(columns={'age':'Age'})
- Renames the columns.
>>> df.replace(to_replace=[51, 69.3], value=58)
- Replaces the values 51 and 69.3 with 58 in the whole dataframe.
>>> df.loc[2, ['age', 'weight']] = [35, 89.1]
- Updates row values at the given columns.
>>> df['age'].replace({51:58})
- Replaces the value 51 with 58 in the 'age' column.
>>> df["age"].apply(lambda x: x + 5)
- Updates the column values as per the lambda function.
>>> df.apply(max)
- Applies the given function to the dataframe.
>>> p_df.applymap(str.lower)
- Applies the function to every element.
>>> df['name'].map({'Rahul':'Raghu'})
- Maps values of the Series according to the input correspondence.
>>> p_df[['first', 'last']] = p_df['name'].str.split(' ', expand=True)
- Splits the 'name' column into new 'first' and 'last' columns.
>>> p_df.append({'name':'Jim lake', 'first':'Jim', 'last':'lake'}, ignore_index=True)
- Appends a row to p_df and returns a new object.
>>> p_df.drop(labels='last', axis='columns')
- Removes rows or columns by specifying label names and the corresponding axis.
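The split/append/drop entries above fit together as in the sketch below; note that DataFrame.append was removed in pandas 2.0, so pd.concat is shown here as the row-adding step instead:
>>> import pandas as pd
>>> p_df = pd.DataFrame({"name": ["Jack Smith", "Jane Lodge"], "place": ["HYD", "DEL"]})
>>> p_df[['first', 'last']] = p_df['name'].str.split(' ', expand=True)   # two new columns
>>> new_row = pd.DataFrame([{'name': 'Jim lake', 'first': 'Jim', 'last': 'lake'}])
>>> p_df = pd.concat([p_df, new_row], ignore_index=True)   # append-style row add; the missing 'place' becomes NaN
>>> p_df.drop(labels='last', axis='columns')               # drop the 'last' column again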

Merging and Concatenating
>>> df2 = pd.DataFrame(
        {"place" : ["HYD", "DEL"],
         "state" : ["TEL", "UP"]})
>>> p_df.merge(df2, on="place")
- Merges DataFrame or Series objects with a database-style join, similar to SQL.
>>> age_df = pd.DataFrame({"age": [35, 17]})
>>> pd.concat([p_df, age_df], axis=1)
- Concatenates pandas objects along an axis.
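merge defaults to an inner join; the how parameter selects other join types. A sketch, with df2's values altered slightly (one non-matching place) so the difference is visible:
>>> import pandas as pd
>>> p_df = pd.DataFrame({"name": ["Jack Smith", "Jane Lodge"], "place": ["HYD", "DEL"]})
>>> df2 = pd.DataFrame({"place": ["HYD", "MUM"], "state": ["TEL", "MH"]})
>>> p_df.merge(df2, on="place")                # inner join: only the 'HYD' row survives
>>> p_df.merge(df2, on="place", how="left")    # keeps every row of p_df, NaN where no match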
Sorting
>>> df.sort_values(by='age')
- Sorts by the values of the given column.
>>> df['age'].nlargest(2)
- Returns the 2 largest values of the column, in descending order.
>>> df['age'].nsmallest(2)
- Returns the 2 smallest values of the column, in ascending order.
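sort_values also accepts several columns and per-column directions; a brief sketch over the df from Creating Data:
>>> import pandas as pd
>>> df = pd.DataFrame({"name": ["Ram", "Rahul", "Ravi"], "age": [51, 28, 19]})
>>> df.sort_values(by='age', ascending=False)                       # largest age first
>>> df.sort_values(by=['name', 'age'], ascending=[True, False])     # by name, then age descending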
Grouping and Aggregation
>>> df.groupby(by=['age', 'name'])
- Returns a GroupBy object grouped by the values in the given columns.
>>> df['name'].value_counts()
- Counts the number of times each value is repeated.
>>> df.groupby('name')['age'].mean()
- Splits into groups based on 'name' and aggregates 'age' with the mean within each group.
>>> df.count()
- Counts non-NA cells for each column or row.
>>> df['age'].min()
- Returns the minimum of the values.
>>> df.aggregate(['sum', 'min', 'mean'])
- Aggregates the data using the functions 'sum', 'min' and 'mean'.
>>> df['age'].cumsum()
- Returns the cumulative sum of a Series or DataFrame.
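GroupBy objects also take .agg with several functions at once, including named aggregation; a sketch (names are repeated here so the groups are non-trivial):
>>> import pandas as pd
>>> df = pd.DataFrame({"name": ["Ram", "Rahul", "Ram"], "age": [51, 28, 19], "weight": [69.3, 44.6, 36.9]})
>>> df.groupby('name')['age'].agg(['mean', 'min', 'max'])                        # several aggregations per group
>>> df.groupby('name').agg(avg_age=('age', 'mean'), max_wt=('weight', 'max'))    # named aggregation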
Cleaning Data

Handling Missing Values
>>> import numpy as np
>>> nan_df = pd.DataFrame(
        {"A": [1.0, -3.0, 1.0],
         "B": [1.0, np.nan, 1.0],
         "C": [3.0, -2.0, 3.0],
         "D": [1.0, -3.0, 1.0]})
>>> nan_df.isna()
- Returns a boolean, same-sized object indicating whether the values are NA.
>>> nan_df.fillna(2)
- Fills NA/NaN values with the given value.
>>> nan_df.dropna()
- Returns a DataFrame with the NaN entries dropped.
>>> nan_df.replace('NA', np.nan, inplace=True)
- Replaces other missing-value markers (here the string 'NA') with NaN.
>>> df.first_valid_index()
- Index of the first non-NA/null value.
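Two common follow-ups, sketched over a trimmed-down nan_df: counting NAs per column, and filling or dropping on selected columns only:
>>> import pandas as pd, numpy as np
>>> nan_df = pd.DataFrame({"A": [1.0, -3.0, 1.0], "B": [1.0, np.nan, 1.0]})
>>> nan_df.isna().sum()                        # NA count per column -> A: 0, B: 1
>>> nan_df.fillna({'B': nan_df['B'].mean()})   # fill only column B, here with its mean
>>> nan_df.dropna(subset=['B'])                # drop rows only where B is NA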
Handling Duplicates
>>> nan_df.duplicated()
- Returns a boolean Series marking the duplicated rows.
>>> nan_df.drop_duplicates()
- Returns a dataframe with the duplicated rows removed.
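Both calls accept subset and keep, so deduplication can be restricted to chosen columns; a sketch with a small made-up frame:
>>> import pandas as pd
>>> dup_df = pd.DataFrame({"name": ["Ram", "Ram", "Ravi"], "age": [51, 51, 19]})
>>> dup_df.duplicated()                                    # second 'Ram' row is flagged True
>>> dup_df.drop_duplicates(subset=['name'], keep='last')   # dedupe on 'name' only, keep the last copy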
Changing Datatypes
>>> df['weight'].astype('int64')
- Converts the 'weight' column to integers.
>>> df.astype('string')
- Converts every element in the df to a string.
>>> data = pd.DataFrame(
        {'year': [2015, 2016],
         'month': [2, 3],
         'day': [4, 5]})
>>> datetime_df = pd.to_datetime(data)
- Converts into the datetime datatype.
>>> datetime_df.dt.month
- Returns the months of the timestamps.
>>> datetime_df.dt.year
- Returns the years of the timestamps.
>>> datetime_df.dt.day_name()
- Returns the weekday names of the timestamps.
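pd.to_datetime also parses string columns; errors='coerce' turns unparseable entries into NaT instead of raising. A sketch with made-up date strings:
>>> import pandas as pd
>>> s = pd.to_datetime(pd.Series(['2015-02-04', '2016-03-05', 'not a date']), errors='coerce')
>>> s.dt.year          # 2015, 2016, NaN for the unparseable entry
>>> s.dt.day_name()    # 'Wednesday', 'Saturday', NaN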
