
Python CheatSheet

The document outlines a structured curriculum for learning programming concepts, data manipulation, and SQL, divided into multiple semesters. It covers topics such as variables, loops, functions, data types, and data visualization using libraries like Pandas and Matplotlib. Additionally, it includes SQL commands for data definition and querying, along with practical examples for data analysis and preparation.

Uploaded by

marcuslimlj

Sem 1-2: Basics | Sem 3-5: Basics | Sem 6: File | Pandas Additional

Variables
Loops: definite (for loop), indefinite (while loop)
Boolean (True/False)
Functions

File: Read, Write

Merge
*For unlike columns, i.e. different column names but similar items in rows
pd.merge(df1, df2, left_on='Column1', right_on='Column2')

Concat
pd.concat([df1, df2], axis=0, join='outer', ignore_index=False)
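
A minimal sketch of the merge and concat calls above. The frames, column names and values are hypothetical and only illustrate the "unlike columns" case; ignore_index=True is used here (instead of the False shown above) to renumber the stacked rows.

```python
import pandas as pd

# Hypothetical tables: the key column has a different name in each frame.
customers = pd.DataFrame({"cust_id": [1, 2, 3], "name": ["Ann", "Ben", "Cal"]})
orders = pd.DataFrame({"customer": [1, 1, 3], "amount": [10, 20, 30]})

# Merge on unlike column names; the default how='inner' keeps matching rows only.
merged = pd.merge(customers, orders, left_on="cust_id", right_on="customer")

# Stack two frames row-wise; ignore_index=True renumbers the rows from 0.
more_customers = pd.DataFrame({"cust_id": [4], "name": ["Dee"]})
stacked = pd.concat([customers, more_customers], axis=0, join="outer", ignore_index=True)

print(merged)
print(stacked)
```

Customer 2 has no order, so it does not appear in merged; passing how='left' would keep it with NaN in the order columns.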

*When a function returns multiple values, they are returned as a tuple
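
A short illustration of the tuple note above. The function name min_max is made up for this example.

```python
def min_max(values):
    """Return the smallest and largest item of a non-empty list."""
    return min(values), max(values)  # the comma builds a tuple


result = min_max([4, 9, 1])
lowest, highest = result  # tuple unpacking into two variables

print(type(result), result)
print(lowest, highest)
```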

Slicing
Type Conversion
Python Shortcuts
Random Syntax

Unique
df[col_name].unique()
• Returns only the unique values of the column
• In order of appearance, removes repeats

Formatting
1) ^ < > : centre/left/right align
2) s = string, d = decimal integer, f = float
3) .2f: 2 d.p.
4) ',': separates thousands with commas
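
The formatting codes above can be tried with f-strings. This is a small sketch with made-up values; the width of 6 is an arbitrary choice for the alignment examples.

```python
centred = f"{'hi':^6}"      # centre within 6 characters
left = f"{'hi':<6}"         # left align
right = f"{'hi':>6}"        # right align
whole = f"{42:d}"           # decimal integer
two_dp = f"{3.14159:.2f}"   # float rounded to 2 d.p.
thousands = f"{1234567:,}"  # comma as thousands separator

print(repr(centred), repr(left), repr(right))
print(whole, two_dp, thousands)
```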

CSV
Input
Math Operations

Lists
Functions: len(), min() or max(), sum(), insert(), reverse()
.split(): separates a string into a list by a given delimiter (" ", "-")
.strip(): removes whitespace at the beginning and end of a string
.pop(-1): removes and returns the item at index -1 (the last item)
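
A quick run through the list and string helpers listed above, using made-up values.

```python
words = "red-green-blue".split("-")  # string -> list, split on "-"
cleaned = "  hello  ".strip()        # whitespace removed at both ends

nums = [3, 1, 2]
nums.insert(0, 9)     # insert 9 at index 0
nums.reverse()        # reverse the list in place
last = nums.pop(-1)   # remove and return the last item

summary = (len(nums), min(nums), max(nums), sum(nums))
print(words, cleaned, nums, last, summary)
```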

List operations: Append, Sort, Reverse

Sem 10: Plotting
Graphs
Types: bar, scatter, box, stackplot, hist, pie

Tuple

Errors
Syntax Error: syntax/variables/indent/brackets
Runtime Error: fails while running; debug
Logical Error: able to run, but output wrong

Dictionary
Functions: len(dict), dict.clear() = empties dict
dict.items() = returns key-value pairs: dict_items([('keys1', 'values1'), etc])
dict.keys() = returns all keys: dict_keys(['keys1', 'keys2'])
dict.values() = returns all values: dict_values(['values1', 'values2'])
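
The dictionary functions above in one short sketch. The dictionary stock and its contents are hypothetical.

```python
stock = {"apple": 3, "pear": 5}

size = len(stock)              # number of key-value pairs
pairs = list(stock.items())    # key-value pairs as tuples
keys = list(stock.keys())      # all keys
values = list(stock.values())  # all values

stock.clear()                  # empties the dictionary in place
print(size, pairs, keys, values, stock)
```

Wrapping items(), keys() and values() in list() turns the view objects into plain lists, which is handy for indexing or printing.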

Simple

Iteration



SQL

Data Definition Language (DDL)
CREATE SCHEMA AUTHORIZATION/TABLE/INDEX/VIEW
ALTER TABLE (AS); NOT NULL; UNIQUE; PRIMARY KEY
FOREIGN KEY; DEFAULT; CHECK; DROP TABLE/INDEX/VIEW

Changing Title & Sort
SELECT Movie_title AS Movies
FROM Movie
ORDER BY Movie_title;

Changing Title & Listing Runtime > 120
SELECT Movie_title AS Movies
FROM Movie
WHERE Movie_Runtime > 120
ORDER BY Movie_title;

Data Preparation & Descriptive Analysis

import numpy as np (data computation & data manipulation)
import pandas as pd (DataFrame & data structure)
import matplotlib.pyplot as plt (visualisation)
from scipy import <sub_module_name> (cluster, stats, linalg)
pd.DataFrame(data=<df/dict>, index=['<char>', …], columns=['<char>', …], dtype=<char>, copy=<True/False>) → construct DataFrame
<var1> = pd.read_csv("<file>") → read CSV file
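
A minimal sketch of building a DataFrame from a dict and reading CSV data. The column names and values are invented, and an in-memory io.StringIO buffer stands in for a real CSV file path.

```python
import io

import pandas as pd

# Construct a DataFrame from a dict, with custom row labels.
data = {"beds": [1, 2, 3], "price": [100.0, 150.0, 210.0]}
df = pd.DataFrame(data=data, index=["a", "b", "c"], columns=["beds", "price"])

# read_csv accepts a path or a file-like object; a text buffer is used here.
csv_text = "beds,price\n1,100\n2,150\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

print(df)
print(from_csv)
```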
df.head() → first 5 rows | df.tail() → last 5 rows [pass a number to specify no. of rows]
df.index → gives range or specific index
df.columns or .keys() → show the column headers
df.shape → size of data (rows, columns)
df.idxmax() → returns the index label of the max value in each column
df.isnull() → returns True/False for each cell
df.dtypes → data type of each column in a DataFrame
df["column"].unique() → shows the different items in the column
df.drop(number) → drop the row with that index label
df.append(another_df) → insert another DataFrame into df (removed in pandas 2.0; use pd.concat([df, another_df]))
df.pop(column_name) → remove a column and return it

Count How Many Movies in Each Genre
SELECT Movie_Genre, COUNT(*)
FROM Movie
GROUP BY Movie_Genre;

Select Unique Movie ID from Carey Dopico
SELECT DISTINCT Movie_ID
FROM Rental
INNER JOIN Customer ON Customer.Cus_No = Rental.Cus_No
WHERE Customer.Cus_Lname = 'Carey' AND
Customer.Cus_Fname = 'Dopico';
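
The genre-count query above can be tried from Python with the standard-library sqlite3 module. This is only a sketch: the Movie table layout and the three rows are assumptions made for the example, and ORDER BY is added so the result order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE Movie (Movie_title TEXT, Movie_Genre TEXT, Movie_Runtime INTEGER)"
)
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?)",
    [("Alpha", "Action", 130), ("Beta", "Action", 95), ("Gamma", "Drama", 110)],
)

# One output row per genre, with the number of movies in that genre.
counts = conn.execute(
    "SELECT Movie_Genre, COUNT(*) FROM Movie "
    "GROUP BY Movie_Genre ORDER BY Movie_Genre"
).fetchall()
conn.close()

print(counts)
```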
SQL Constraints
NOT NULL – Column does not accept nulls
UNIQUE – All values in column are unique
DEFAULT – Assign value to an attribute when new row is added to table
CHECK – Validates data when an attribute value is entered

Listing First and Last Name Who Ordered Movie 3
SELECT DISTINCT Cus_Fname, Cus_Lname
FROM Customer
INNER JOIN Rental ON
Customer.Cus_No = Rental.Cus_No
WHERE Movie_ID = 3;

Listing First and Last Name Who Never Ordered Movie 3
SELECT DISTINCT Cus_Fname, Cus_Lname
FROM Customer
WHERE NOT (Cus_Fname = 'Baitmore' AND Cus_Lname = 'Aliza');

df.duplicated() → returns True/False for each row | df.drop_duplicates()
df.drop_duplicates(['col_name']) → drop duplicates based on col_name
df.fillna('content to fill')
df.dropna() → drop all rows with NaN | df.replace("111", "Others", inplace=True)
df.dropna(axis=0, how='any', thresh=None, subset=None, inplace=True)
axis=0 is index (drop rows), =1 is column
how='any' vs 'all': OR or AND logic
subset=['col_name'] → which column to look for NAs
thresh=2 means a row needs at least 2 non-NA values to be kept | inplace=True → change original instead of returning a new one
Use either how or thresh, not both (recent pandas versions raise an error when both are passed)
df.mean(), .median(), .mode(), .std(), .sum(), .min(), .max(), .var()
df.describe() → auto tabulate all numeric variables and give a summary
df.corr(method='pearson') → correlation
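
A small sketch of how the dropna options above behave. The frame and its column names are invented for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [1.0, 2.0, np.nan, 4.0],
    "c": [1.0, np.nan, np.nan, np.nan],
})

any_na = df.dropna(how="any")        # drop rows with at least one NA
all_na = df.dropna(how="all")        # drop rows where every value is NA
at_least_two = df.dropna(thresh=2)   # keep rows with 2 or more non-NA values
subset_b = df.dropna(subset=["b"])   # only look at column "b" for NAs

print(list(any_na.index), list(all_na.index))
print(list(at_least_two.index), list(subset_b.index))
```

None of these calls use inplace=True, so df itself is unchanged and each call returns a new DataFrame.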
df.info() → show info of each column
df.loc["f", "age"] = 1.5 → changing cell values

Data Selection
df.<column> → print the column data only (any column header)
df.<column>.mean() → mean of values in column
df[df.<column> == 3] | df[df.<column> == 3].price.mean()
df.loc[] → locating a particular row index or column name
df.loc[0, ['beds', 'type']] → see row 0 in 2 columns | df.loc[[0, 1]] → 1st & 2nd rows
loc includes the last row, e.g. 2:11 includes 11; iloc doesn't
df.iloc[row#, column#] → index on row and column position
df.iloc[1:3] → slicing data, see 2nd to 3rd row (position 3 is excluded)
.groupby(col_name) → groups by value
df[col_name][row_index]
df[df.col_name > 5000]
df.col_name[row_index]
df.groupby('host_name').number_reviews.sum().sort_values(ascending=False) → number of reviews for host_name
df[(df.CustomerID == 15311) & (df.Quantity >= 24.0)].loc[:, ['CustomerID', 'Quantity']]
df[['area', 'neighbourhood']] = df.neighbour_hood_info.str.split(",", expand=True) [split column]

Average Movie Runtime for Movies with Runtime < 120
SELECT AVG(Movie_Runtime) AS AVERAGE
FROM Movie
WHERE Movie_Runtime < 120;

Join Clause
SELECT * FROM MANAGER, EXECUTIVE WHERE MANAGER.MANAGER_CODE =
EXECUTIVE.MANAGER_CODE;
Inner Join → combines results & looks for values common to both tables
Outer Join → keeps the non-matching results (Left, Right, Full)

Pandas Additional
print(df) → for a large DataFrame, shows the first 5 and last 5 rows
df.to_string() → return entire DataFrame as a string
df.head() → returns the headers & the specified number of rows, from top
df.dropna() → by default returns a new DataFrame and does not change the original
df.plot() → plot (draw) diagrams
pd.Series(mylist) → create a Pandas Series
myseries[0] → return first value of Pandas Series
pd.Series(mylist, index=["x", "y", "z"]) → add labels to Pandas Series

Corresponding month with highest muffin sales
df[df.sales_muffin == df.sales_muffin.max()]["months"]

Mean sale of muffin of first 3 mths of sales (2dp)
round(df.loc[:2, "sales_muffin"].mean(), 2)
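
The muffin examples and the loc/iloc note above can be checked on a small frame. The months and sales figures are invented; only the column names months and sales_muffin come from the notes.

```python
import pandas as pd

df = pd.DataFrame({
    "months": ["Jan", "Feb", "Mar", "Apr"],
    "sales_muffin": [10, 20, 33, 50],
})

by_label = df.loc[:2, "sales_muffin"]  # loc: labels 0, 1 and 2 (end included)
by_position = df.iloc[:2]              # iloc: positions 0 and 1 (end excluded)
middle = df.iloc[1:3]                  # 2nd and 3rd rows only

# Mean muffin sales for the first 3 months, rounded to 2 d.p.
mean_first_three = round(df.loc[:2, "sales_muffin"].mean(), 2)

# Month(s) with the highest muffin sales.
best_month = df[df.sales_muffin == df.sales_muffin.max()]["months"]

print(len(by_label), len(by_position), list(middle.index))
print(mean_first_three, list(best_month))
```

With the default 0-based index, labels and positions look the same, so the only visible difference between loc and iloc here is whether the end of the slice is included.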
