Python Notes For Midterms and Finals

Uploaded by

alissaosq

PMT Recap – Learning points

1. groupby and .loc to filter df


##PMT2.24, Ex0 (3pts)

##PMT2.24, Ex5 (3pts)

##PMT2.22, Ex2 (1pt)


##PMT2.24, Ex7
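The exercise solutions themselves are not reproduced in these notes. As a minimal sketch of the pattern, assuming a made-up DataFrame with hypothetical `team` and `score` columns, `.loc` takes a boolean mask (and optionally a column list), and `groupby(...).transform` gives a per-group value aligned to the original rows so it can be used as a mask:

```python
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "c"],
    "score": [10, 30, 5, 7, 50],
})

# Row filter with a boolean mask, plus a column selection.
high = df.loc[df["score"] > 6, ["team", "score"]]

# Per-group total broadcast back to every row of the group.
totals = df.groupby("team")["score"].transform("sum")

# Keep only rows that belong to groups whose total exceeds 20.
big_teams = df.loc[totals > 20]
```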

2. groupby and count


##PMT2.23, Ex4 (2pts)

##PMT2.21, Ex5 (2pts)
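A small sketch of the two common counting variants, using invented data (the `city` and `sales` columns are assumptions): `size()` counts rows per group, while `count()` counts only non-null values.

```python
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({
    "city": ["NY", "NY", "LA", "LA", "LA"],
    "sales": [1.0, None, 3.0, 4.0, 5.0],
})

# size() counts all rows per group, including rows with missing values.
rows_per_city = df.groupby("city").size().reset_index(name="n_rows")

# count() counts only the non-null entries of the selected column.
non_null = df.groupby("city")["sales"].count()
```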

3. groupby and apply


##PMT2.22, Ex4 (3pts)
##PMT2.20, Ex5 (3pts)
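A minimal sketch of `groupby` + `apply` with a custom function. The data and the helper `value_range` are hypothetical, not taken from the exercises:

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "group": ["x", "x", "y", "y"],
    "value": [1, 3, 10, 20],
})

def value_range(values):
    """Return max minus min for one group's values (a Series)."""
    return values.max() - values.min()

# Selecting the column first means each call receives a Series for one group.
ranges = df.groupby("group")["value"].apply(value_range)
```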
4. merge (left join/ outer join / inner join)

##PMT2.24, Ex1 (2pts)

##PMT2.24, Ex4 (3pts)
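The three join types can be compared side by side on two tiny, made-up tables (the names `left`, `right`, and `key` are assumptions for illustration):

```python
import pandas as pd

# Hypothetical tables that share a "key" column.
left = pd.DataFrame({"key": [1, 2, 3], "l": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "r": ["x", "y", "z"]})

# Inner join: only keys present in both tables.
inner = left.merge(right, on="key", how="inner")

# Left join: every row of `left`; unmatched rows get NaN in `r`.
left_join = left.merge(right, on="key", how="left")

# Outer join: keys from either table; indicator=True adds a `_merge`
# column saying where each row came from.
outer = left.merge(right, on="key", how="outer", indicator=True)
```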

5. Sort and filter by top n


##PMT2.23, Ex2 (2pts)
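A sketch of two equivalent ways to get the top n rows, on invented data:

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"name": list("abcde"), "score": [3, 9, 1, 7, 5]})

# Sort descending, then take the first 3 rows.
top3 = df.sort_values("score", ascending=False).head(3)

# nlargest does the same in one call.
top3_alt = df.nlargest(3, "score")
```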
6. Create df based on list (of column names)
##PMT2.24, Ex 3 (2pts)
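The heading can be read two ways, so this sketch (with made-up data) shows both: building a DataFrame whose columns come from a list of names, and selecting a subset of an existing DataFrame with such a list.

```python
import pandas as pd

cols = ["id", "name"]  # hypothetical list of column names

# Build a new DataFrame from row tuples, naming the columns from the list.
rows = [(1, "ann"), (2, "bob")]
df = pd.DataFrame(rows, columns=cols)

# Select only the listed columns from a wider DataFrame.
wide = pd.DataFrame({"id": [1], "name": ["ann"], "age": [30]})
subset = wide[cols]
```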

7. Change/Check datatype for df[col]


##PMT2.24, Ex2 (1pt)

## https://stackoverflow.com/questions/15891038/change-column-type-in-pandas
## change df col to numpy array – df['col'].to_numpy()
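A short sketch of checking and changing dtypes, assuming a made-up DataFrame where one column holds numbers stored as strings:

```python
import pandas as pd

# Hypothetical data: "n" holds strings, one of which is not a number.
df = pd.DataFrame({"n": ["1", "2", "x"], "f": [1.5, 2.5, 3.5]})

# Check the dtype of every column.
dtypes = df.dtypes

# astype(int) on floats truncates toward zero.
df["f_int"] = df["f"].astype(int)

# to_numeric with errors="coerce" turns unparseable values into NaN.
df["n_num"] = pd.to_numeric(df["n"], errors="coerce")

# Convert one column to a NumPy array.
arr = df["f"].to_numpy()
```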
8. Rename df columns
##PMT2.21, Ex2 (1pt)
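Two common ways to rename columns, shown on a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1], "B": [2]})  # hypothetical data

# rename returns a new DataFrame; only the mapped columns change.
renamed = df.rename(columns={"A": "alpha"})

# Assigning to .columns replaces all names at once (in place).
df.columns = [c.lower() for c in df.columns]
```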

9. Remove cols in pandas


## https://stackoverflow.com/questions/51167612/what-is-the-best-way-to-remove-columns-in-pandas
• df = pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10],'c':[11,12,13,14,15]})
  o del df['a']  ## limited way
  o df = df.drop('a', axis=1)  ## object-oriented way (pass axis by keyword; the positional form df.drop('a', 1) was removed in pandas 2.0)
    ▪ df.drop('a', axis=1)  # Returns a NEW DataFrame object and leaves the original `df` untouched.
    ▪ df.drop('column_name', axis=1, inplace=True)
    ▪ df.drop('row_index', axis=0, inplace=True)
    ▪ df.drop('a', axis=1, inplace=True)  # Modifies `df` in place **and returns `None`**.
    ▪ It can handle more complicated cases through its arguments. E.g. with `level`, we can handle MultiIndex deletion, and with `errors='ignore'`, a missing label does not raise a KeyError.
  o df = df[['b','c']]  ## Not deletion in essence. It selects data by indexing with the [] syntax, then unbinds the name df from the original DataFrame and binds it to the new one.
10. Timestamps
##PMT2.24, Ex6 – change datetime to str type with .strftime

##PMT2.24, Ex7 – use pd.Timedelta to add/subtract timestamps

##PMT2.21, Ex3 (1pt) – use pd.Timedelta to convert timestamp to minutes/hr/days

## pd.to_datetime
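The four timestamp operations listed above can be sketched together. The data and column names here are invented for illustration:

```python
import pandas as pd

# Hypothetical data: timestamps stored as strings.
df = pd.DataFrame({"start": ["2021-01-01 08:00", "2021-01-02 09:30"]})

# Parse strings into datetime64 values.
df["start"] = pd.to_datetime(df["start"])

# Add a fixed offset with pd.Timedelta.
df["end"] = df["start"] + pd.Timedelta(minutes=90)

# Dividing a duration by a one-minute Timedelta gives minutes as a float.
df["minutes"] = (df["end"] - df["start"]) / pd.Timedelta(minutes=1)

# Format datetimes back into strings.
df["label"] = df["start"].dt.strftime("%Y-%m-%d")
```

Dividing by `pd.Timedelta(hours=1)` or `pd.Timedelta(days=1)` instead converts the same duration to hours or days.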
11. Convert df to dict – df.to_dict()
##PMT2.23, Ex1

##PMT2.23, Ex3
##PMT2.21, Ex0 & 1(1+1pt)

##PMT2.20, Ex3 (2pts) – use dict(df.values)

## Using dict comprehension
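A sketch of the conversions mentioned above, on a made-up two-column DataFrame (`k` and `v` are assumed names):

```python
import pandas as pd

df = pd.DataFrame({"k": ["a", "b"], "v": [1, 2]})  # hypothetical data

# Default orientation: {column -> {index -> value}}.
by_col = df.to_dict()

# One dict per row.
records = df.to_dict("records")

# For a two-column frame, each row of df.values is a (key, value) pair.
pairs = dict(df.values)

# The same mapping with a dict comprehension.
comp = {k: v for k, v in zip(df["k"], df["v"])}
```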

12. Create dict


##PMT2.23, Ex5 (4pts) – Create network dict

##PMT2.22, Ex3 (1pt) – Map unique zip codes and unique candidates to integers
# Map unique values in x, and by index 0-n
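The exact exercise inputs are not shown in these notes. As a rough sketch under assumed inputs (an edge list for the network, a list `x` of zip-code strings), both dictionaries can be built like this:

```python
from collections import defaultdict

# Hypothetical undirected edges.
edges = [("a", "b"), ("a", "c"), ("b", "c")]

# Network dict: node -> set of neighbours.
network = defaultdict(set)
for u, v in edges:
    network[u].add(v)
    network[v].add(u)

# Hypothetical values to encode.
x = ["94105", "10001", "94105", "60601"]

# Map each unique value to an integer 0..n-1 (sorted so ids are reproducible).
value_to_id = {val: i for i, val in enumerate(sorted(set(x)))}
```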

13. Convert dict to df
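No example survives for this item, so here is a minimal sketch with made-up dictionaries. Which constructor fits depends on the shape of the dict:

```python
import pandas as pd

# Flat dict -> two-column DataFrame of keys and values.
data = {"a": 1, "b": 2}
df_rows = pd.DataFrame(list(data.items()), columns=["key", "value"])

# Dict of lists -> one column per key.
df_cols = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# Dict of dicts -> one row per outer key.
df_index = pd.DataFrame.from_dict({"r1": {"x": 1, "y": 2}}, orient="index")
```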

14. *Create cartesian product of a dictionary of lists


https://stackoverflow.com/questions/5228158/cartesian-product-of-a-dictionary-of-lists
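A short sketch using `itertools.product`; the `options` dict is invented for illustration:

```python
import itertools

# Hypothetical dictionary of lists.
options = {"n": [1, 2], "c": ["x", "y"]}

keys = list(options)

# One dict per combination of values, in the same key order.
combos = [
    dict(zip(keys, values))
    for values in itertools.product(*options.values())
]
```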

15. .map()
##PMT2.21, Ex6 (2pts) – use .map (to remap keys to value description)
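A minimal sketch of remapping codes to descriptions with `.map`, using an assumed `labels` dictionary. Values with no matching key become NaN:

```python
import pandas as pd

df = pd.DataFrame({"code": [1, 2, 3]})  # hypothetical data
labels = {1: "low", 2: "high"}          # hypothetical key -> description

# Each value is looked up in the dict; 3 has no entry, so it maps to NaN.
df["label"] = df["code"].map(labels)
```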
16. Helper functions
##drop_rename_sort

## load_data

17. SQL
##PMT2.22, Ex0 (1pt)

##PMT2.20, Example

##PMT2.22, Ex1 (2pts)

##PMT2.20, Ex0 (2pts)


##PMT2.20, Ex7 (3pts)

18. *More SQL


## Union All
## SUM(...) OVER (PARTITION BY ... ORDER BY ... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

## Left Join
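The three constructs above can be tried from Python with the standard-library `sqlite3` module. This is only a sketch: the `sales` and `managers` tables are made up, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (day INTEGER, region TEXT, amount INTEGER);
    INSERT INTO sales VALUES (1, 'east', 10), (2, 'east', 20), (1, 'west', 5);
    CREATE TABLE managers (region TEXT, manager TEXT);
    INSERT INTO managers VALUES ('east', 'ann');
""")

# Running total per region, ordered by day.
running = conn.execute("""
    SELECT day, region, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY day
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               AS running_total
    FROM sales
    ORDER BY region, day
""").fetchall()

# Left join: every sales row is kept; missing managers come back as NULL.
joined = conn.execute("""
    SELECT s.region, m.manager
    FROM sales AS s
    LEFT JOIN managers AS m ON s.region = m.region
    ORDER BY s.region, s.day
""").fetchall()

# UNION ALL stacks the two result sets and keeps duplicates.
stacked = conn.execute("""
    SELECT region FROM sales
    UNION ALL
    SELECT region FROM managers
""").fetchall()

conn.close()
```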

19. COO matrix


##PMT2.22, Ex5 (2pts)
##PMT2.20, Ex4 (3pts)
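A sketch of building a COO matrix with `scipy.sparse.coo_matrix` from three parallel arrays (row indices, column indices, values). The numbers are invented:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical coordinates and values; position (2, 1) appears twice.
rows = np.array([0, 1, 2, 2])
cols = np.array([0, 2, 1, 1])
vals = np.array([1.0, 2.0, 3.0, 4.0])

A = coo_matrix((vals, (rows, cols)), shape=(3, 3))

# Converting to a dense array sums duplicate coordinates.
dense = A.toarray()
```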

20. CSR matrix


##PMT2.21, Ex7 (2pts)
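A small sketch of a CSR matrix and a sparse matrix-vector product with `scipy.sparse.csr_matrix`, on made-up values. `indptr` marks where each row's entries start and end:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical coordinates and values.
rows = [0, 0, 1, 2]
cols = [0, 2, 1, 2]
vals = [1.0, 2.0, 3.0, 4.0]

A = csr_matrix((vals, (rows, cols)), shape=(3, 3))

# Row i's entries live in A.data[A.indptr[i]:A.indptr[i + 1]].
row_pointers = A.indptr

# Sparse matrix times dense vector.
x = np.ones(3)
y = A.dot(x)
```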
21. *Cleaning Data
Handling missing values

Merge, filter df, drop columns, rename columns, drop na


Drop cols, replace blanks with np.nan, change datatypes

Remove missing values only at the end


Use df.filter to search based on column names (regex)
DataFrame.filter(items=None, like=None, regex=None, axis=None)
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html

Filtering and group
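The cleaning steps listed above can be sketched end to end. The DataFrame and its column names are hypothetical, and the order follows the notes: drop columns, replace blanks, fix dtypes, and drop missing values only at the end.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with blank strings in place of missing values.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "score_math": ["90", " ", "70"],
    "score_art": ["80", "85", ""],
    "notes": ["x", "y", "z"],
})

# Drop a column that is not needed.
df = df.drop(columns=["notes"])

# Replace empty or whitespace-only strings with NaN.
df = df.replace(r"^\s*$", np.nan, regex=True)

# df.filter selects columns by name; here, every column starting with "score_".
score_cols = df.filter(regex=r"^score_").columns

# Change datatypes once the blanks are NaN.
df[score_cols] = df[score_cols].astype(float)

# Remove rows with missing values as the last step.
clean = df.dropna()
```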

22. *Use isin() for multiple parameters filtering


https://www.geeksforgeeks.org/python-pandas-dataframe-isin/
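A minimal sketch of filtering on several allowed values at once with `isin()`, using invented data. Masks combine with `&`, and `~` negates a mask:

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "state": ["CA", "NY", "TX", "CA"],
    "year": [2019, 2020, 2020, 2021],
})

# Rows whose state AND year are both in the allowed lists.
mask = df["state"].isin(["CA", "TX"]) & df["year"].isin([2020, 2021])
subset = df.loc[mask]

# Negate with ~ to exclude values instead.
excluded = df.loc[~df["state"].isin(["CA"])]
```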
23. Linear Regression Examples
Calculate R² (coefficient of determination) and adjusted R²

Log-transformation
Dummification/one-hot encoding
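The original examples are not included here. As a sketch with made-up numbers, R² and adjusted R² can be computed from a least-squares fit using R² = 1 - SS_res/SS_tot and adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where p is the number of predictors. The log-transformation and dummification steps are shown on a separate toy DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one predictor x, one response y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

# R^2 from residual and total sums of squares.
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# Adjusted R^2 penalises the number of predictors p (intercept excluded).
n, p = len(y), 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Log-transformation and one-hot encoding on a toy DataFrame.
df = pd.DataFrame({"price": [100.0, 1000.0], "color": ["red", "blue"]})
df["log_price"] = np.log(df["price"])
dummies = pd.get_dummies(df, columns=["color"], drop_first=True)
```

`drop_first=True` drops one dummy level per categorical column, which avoids perfect collinearity with the intercept.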
24. PCA/SVD Examples
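No example survives for this item, so here is a minimal sketch of PCA via SVD on random data (the matrix `X` and the choice `k = 2` are assumptions): centre the columns, take the SVD, and project onto the first k right singular vectors.

```python
import numpy as np

# Hypothetical data: 50 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

# Centre each column; PCA is defined on mean-centred data.
Xc = X - X.mean(axis=0)

# Thin SVD: Xc = U @ diag(s) @ Vt. Rows of Vt are the principal directions.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Project onto the first k principal components.
k = 2
scores = Xc @ Vt[:k].T

# Fraction of total variance explained by each component.
explained = s ** 2 / np.sum(s ** 2)
```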
