Manipulating Dataframes - Beginner

The document lists important Python libraries for data analysis and manipulation, such as NumPy, pandas and Matplotlib. It then gives examples of f-strings, building lists with for loops, sorting and filtering dataframes, data analysis and aggregation, and data manipulation with methods such as replace(), to_numeric(), resample() and query().

Uploaded by

Allan John Lima
Copyright
© All Rights Reserved

Important libraries:

• NumPy – numerical arrays

• Matplotlib – plotting and data visualization

• pandas – dataframe manipulation

• string – string manipulation (Python standard library)

• wget – downloading files from URLs

• qgrid – interactive dataframe visualization

• zipfile – zip file manipulation (Python standard library)

• investpy – retrieving financial data from Investing.com

• ipywidgets – interactive widgets for graphs (interact)

F-string – A quicker, less verbose way of formatting strings: variables and expressions written inside {} are inserted into the string. Ex: k = "Allan" / f"{k} is a genius"

If the sentence itself contains quote marks, use a different quote mark to delimit the f-string. Ex: f"{k} told 'Fuck you' to the teacher".
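A minimal sketch of the f-string rules above (the variable names and values are only illustrative). It shows simple substitution, mixing quote types, and a format spec inside the braces:

```python
k = "Allan"

# Variables inside {} are evaluated and inserted into the string
msg = f"{k} is a genius"
print(msg)  # Allan is a genius

# The f-string uses double quotes, so single quotes can appear inside the text
quote = f"{k} said 'hello' to the teacher"
print(quote)  # Allan said 'hello' to the teacher

# A format spec after ":" controls how the value is shown
total = 1234567.891
formatted = f"Total: {total:,.2f}"
print(formatted)  # Total: 1,234,567.89
```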

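Creating lists with a for loop – Build the list with a for loop and the list method append() (to combine dataframes instead, collect them in a list and use pd.concat()). The list can then be turned into a dataframe:

import pandas as pd

arq = []
for i in range(2011, 2021):
    arq.append(f'qualquer_nome_{i}')

df = pd.DataFrame(arq, columns=["arquivo"]) – Creates a dataframe and stores qualquer_nome_{i} for each year i (2011 to 2020) in the column "arquivo"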
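The notes mention pd.concat() as an alternative to append(). A minimal sketch, assuming each loop iteration produces a small dataframe (the "ano" column is added here only for illustration): collect the pieces in a list and concatenate once after the loop, which avoids rebuilding a growing dataframe on every iteration.

```python
import pandas as pd

# One small dataframe per year, collected in a plain Python list
frames = []
for i in range(2011, 2021):
    frames.append(pd.DataFrame({"arquivo": [f"qualquer_nome_{i}"], "ano": [i]}))

# Concatenate once; ignore_index gives a clean 0..n-1 index
df = pd.concat(frames, ignore_index=True)
print(df.shape)  # (10, 2)
print(df.head(3))
```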

qgrid.show_grid(dataframe) – displays the dataframe in an interactive grid with sorting and filtering controls (in a Jupyter notebook).

Sorting items

df.set_index(['Column_A', 'Column_B']) – sets the listed columns as the index of the dataframe (a MultiIndex when more than one column is given)

df.sort_values(by=['Column_A', 'Column_B']) – sorts the rows by Column_A, then by Column_B to break ties
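A small sketch of sort_values() and set_index() together, on hypothetical sample data. Both return a new dataframe rather than modifying df in place:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Column_A": ["b", "a", "b", "a"],
    "Column_B": [2, 3, 1, 1],
    "value": [10, 20, 30, 40],
})

# Sort by Column_A first, then by Column_B within equal Column_A values
sorted_df = df.sort_values(by=["Column_A", "Column_B"])
print(sorted_df)

# Use both columns as a MultiIndex
indexed = sorted_df.set_index(["Column_A", "Column_B"])
print(indexed.loc[("a", 1), "value"])  # 40
```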

df['Column_A'] = df['Column_A'].map("{:,}".format) – formats every value in Column_A with a comma as the thousands separator. The column becomes text afterwards, so use this for display only.
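A sketch of the thousands-separator formatting, assuming you want to keep the numeric column intact, so the formatted text goes into a separate (hypothetical) column "Column_A_fmt":

```python
import pandas as pd

df = pd.DataFrame({"Column_A": [1234, 1234567, 89]})

# "{:,}".format inserts a comma every three digits; the results are strings
df["Column_A_fmt"] = df["Column_A"].map("{:,}".format)
print(df["Column_A_fmt"].tolist())  # ['1,234', '1,234,567', '89']
```

Writing to a new column keeps Column_A usable for sums, sorting and comparisons, which would otherwise operate on text.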

df.T – returns the transpose of df (rows become columns and columns become rows)
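A quick illustration of df.T on a small hypothetical dataframe. The row labels become column labels and vice versa:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["r1", "r2", "r3"])

# .T is shorthand for df.transpose()
t = df.T
print(t)
print(df.shape, t.shape)  # (3, 2) (2, 3)
```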

Filtering

Simple filtering

df = df[df['Column_A'] == 'filter'] – Returns only the rows where 'Column_A' is exactly the string 'filter'.
df1 = df["Column_A"].str.contains("Filter") – Returns a Boolean Series indicating whether each row of 'Column_A' contains the substring 'Filter' (case-sensitive by default).

Calling df[df1] applies the Boolean mask df1 to df, keeping only the rows where df1 is True.
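A sketch contrasting exact matching with substring matching, on hypothetical data. Note that "filter" does not match "Filter" unless case=False is passed:

```python
import pandas as pd

df = pd.DataFrame({
    "Column_A": ["filter", "Filter me", "other", "filter"],
    "n": [1, 2, 3, 4],
})

# Exact match: keep rows where Column_A equals 'filter'
exact = df[df["Column_A"] == "filter"]
print(exact["n"].tolist())  # [1, 4]

# Substring match returns a Boolean Series (case-sensitive by default)
mask = df["Column_A"].str.contains("Filter")
print(mask.tolist())  # [False, True, False, False]

# Applying the mask keeps only the rows where it is True
filtered = df[mask]
print(filtered["n"].tolist())  # [2]

# case=False makes the substring match case-insensitive
loose = df[df["Column_A"].str.contains("filter", case=False)]
print(loose["n"].tolist())  # [1, 2, 4]
```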

Data Analysis

df.agg({"column_A": ["min", "max", "mean", "median", "skew"]}) – returns the listed aggregate statistics for column A

df.describe() – returns summary statistics (count, mean, standard deviation, min, quartiles and max) for the numeric columns
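A short sketch of agg() with a dict and of describe(), on hypothetical numbers. The dict maps a column name to the list of aggregations to compute for it, and the result has one row per aggregation:

```python
import pandas as pd

df = pd.DataFrame({"column_A": [1, 2, 3, 4, 10], "column_B": [5, 5, 6, 6, 7]})

# One row per aggregation, one column per key of the dict
stats = df.agg({"column_A": ["min", "max", "mean", "median", "skew"]})
print(stats)

# describe() covers every numeric column at once
summary = df.describe()
print(summary)
```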

Data Manipulation

df.shape – returns the dimensions of the dataframe as a (rows, columns) tuple.

replace() – the Series/DataFrame method, not the string method str.replace(). It replaces whole values in the dataframe, several at once when given a dict. Ex:
titanic["Sex_short"] = titanic["Sex"].replace({"male": "M", "female": "F"}) – Creates a column named "Sex_short" with the values from "Sex", with male and female replaced by M and F.
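A runnable version of the replace() example, assuming a tiny hypothetical stand-in for the Titanic data (the real dataset stores "male"/"female" in lowercase). Values missing from the dict are left unchanged, and the original column is not modified:

```python
import pandas as pd

# Hypothetical mini version of the Titanic "Sex" column
titanic = pd.DataFrame({"Sex": ["male", "female", "female", "male"]})

# The dict maps each old value to its new value
titanic["Sex_short"] = titanic["Sex"].replace({"male": "M", "female": "F"})
print(titanic)
```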

pd.to_numeric(df["column_A"]) – converts the data in column A to a numeric type. It returns a new Series, so assign it back to keep the result.
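A sketch of pd.to_numeric() on text values, on hypothetical data. By default an unparseable value such as "abc" raises an error; errors="coerce" turns it into NaN instead:

```python
import pandas as pd

df = pd.DataFrame({"column_A": ["1", "2.5", "abc", "4"]})

# Assign the result back; to_numeric does not modify the column in place
df["column_A"] = pd.to_numeric(df["column_A"], errors="coerce")
print(df["column_A"].tolist())  # [1.0, 2.5, nan, 4.0]
print(df["column_A"].dtype)     # float64
```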

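resample() – changes the frequency of time-series data (e.g. daily to monthly). It needs a datetime-like index (or a datetime column passed with on=) and is followed by an aggregation such as .mean() or .sum().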
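A minimal sketch of downsampling with resample(), assuming a hypothetical daily series. "MS" groups by calendar month and labels each group with the month start date:

```python
import pandas as pd

# 60 daily values starting 2021-01-01 (covers January, February and 1 March)
idx = pd.date_range("2021-01-01", periods=60, freq="D")
daily = pd.Series(range(60), index=idx)

# Downsample to one value per month by averaging the daily values
monthly = daily.resample("MS").mean()
print(monthly)
```

Swapping .mean() for .sum(), .max() and so on changes how each month's values are combined.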

query() – Filters the dataframe rows based on a condition. Ex: df.query('a > b')

The condition is passed as a string. Column names containing spaces must be wrapped in backticks ` `. Ex: df.query('`Col Ex` == "Improving"'). String values inside the condition must use a different quote mark from the one that delimits the whole expression (here, double quotes inside single quotes).
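A sketch of the query() rules above on hypothetical data: comparing two columns, using backticks for a column name with a space, and referring to a Python variable with the @ prefix:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [5, 1, 7],
    "b": [3, 2, 9],
    "Col Ex": ["Improving", "Stable", "Improving"],
})

# Compare two columns by name
a_gt_b = df.query("a > b")
print(a_gt_b)

# Backticks around the column name; the string literal uses double quotes
# because the whole expression is delimited by single quotes
improving = df.query('`Col Ex` == "Improving"')
print(improving)

# @ refers to a variable from the surrounding Python scope
threshold = 4
above_threshold = df.query("a > @threshold")
print(above_threshold)
```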
