CheatSheet PA2

Uploaded by Quang Huy

Control

+ The break statement terminates the current loop and resumes execution at the next statement located outside the loop (if any).
+ The continue statement skips all the remaining statements in the current iteration of the loop and goes back to the beginning of the loop.
+ "\n" prints the value on the next line and "\t" indents the value.

Tuple – Round brackets ()

+ Tuple is immutable; its values remain the same.
+ tuple.count("a") returns the number of items whose content matches "a", tuple.index("a") returns the index of the first "a"; enumerate(), len(), max(), min(), sorted(), sum() also work on tuples; tuple() converts a data type to tuple form.

SQL

+ NUMBER(L,D): a number stored with D decimal places and up to L digits long, INTEGER: stored as whole counting numbers, SMALLINT: like INTEGER but limited to only 6 digits, DECIMAL(L,D): like NUMBER but L is instead the minimum length, CHAR(L): fixed-length character data of up to 255 characters, VARCHAR(L): variable-length character data that stores up to L characters, DATE
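The tuple methods above can be sketched as follows (the `grades` tuple is an invented example):

```python
# Illustrative use of the tuple operations listed above.
grades = ("a", "b", "a", "c")

count_a = grades.count("a")      # number of items equal to "a"
first_a = grades.index("a")      # index of the first "a"
as_sorted = sorted(grades)       # sorted() returns a new list; the tuple itself is immutable

# enumerate() pairs a counter (starting at 0) with each item
pairs = list(enumerate(grades))
```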

Functions

+ In an if statement, an empty value (empty list or string) used as the condition evaluates as False.
+ When a function has no return statement, it returns "None".
+ return ends the function (like break ends the loop).

Dictionary – Curly brackets { }

+ len(dict), dict.clear() to empty the dictionary, dict_name.items() returns key-value pairs, dict_name.keys() returns all keys in the dictionary, dict_name.values() returns all values in the dictionary, pop() to delete an item

Files input & output

Reading text and csv files

file_pointer = open("filename.txt", "r") #Reading is the default mode

Create a new SQLite Database

CREATE TABLE table_name (
    column1 type NOT NULL UNIQUE,
    column2 type NOT NULL UNIQUE,
    PRIMARY KEY (column1));

SQL Constraints (Conditions that the data must follow)

+ NOT NULL #Ensures that a column does not accept nulls, UNIQUE #Ensures that all values in a column are unique, DEFAULT #Assigns a value to an attribute when a new row is added, CHECK #Validates data when an attribute value is entered
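A minimal sketch of the dictionary methods above (the `prices` dictionary is made up for illustration):

```python
# Dictionary operations from the cheatsheet, on an invented example.
prices = {"apple": 1.2, "pear": 0.8}

keys = list(prices.keys())      # all keys
values = list(prices.values())  # all values
items = list(prices.items())    # (key, value) pairs
removed = prices.pop("pear")    # delete a key and get its value back
size = len(prices)              # number of remaining pairs
```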
+ file_pointer.read() #Read as a multi-line string, file_pointer.readline() #Read line by line (resets when the file closes), file_pointer.readlines() #Read as a single list

with open("file_name.csv", "r") as variable_name:
    csv_reader = csv.reader(variable_name)

Writing text and csv files

variable_name = open("filename.txt", "w")

+ A text file will be created in the same directory
+ The write() method writes a single string or line-by-line (file_pointer.write(string + "\n")); the writelines() method writes the items of a list of strings (file_pointer.writelines(list))

Built-in functions:

+ Return the absolute value (abs()), create a dictionary (dict()), convert a string or a number to floating point (float()), prompt the user and take input from the user as a string (input()), return an integer from a given object (int()), return the length of the object (len()), take any iterable (e.g. tuple, string, dictionary) as a parameter and return a list (list()), return the largest item in an iterable (max()), return the smallest item in an iterable (min()), compute the power of a number (pow()), generate a sequence of numbers (range()), round off to the given number of digits and return the floating-point number (round()), return the string version of the object (str()), sum up the numbers in a list or tuple (sum()), create a tuple (tuple()), return the data type of the object (type()).
+ When looping through a dictionary with enumerate(), the first output is the counter (starts at 0) followed by the key of each key-value pair.

Creating table structures

+ After the database structure is created, table structures need to be defined by providing CREATE TABLE + NAME + (COL_NAME DATATYPE(LEN) CONSTRAINT ...)
+ Creating a table structure with a foreign key

CREATE TABLE table_name (
    column1 type NOT NULL UNIQUE,
    column2 type NOT NULL,
    FOREIGN KEY (column1) REFERENCES table_name(column1));

SQL Data Manipulation Language (DML)

+ UPDATE #modify an attribute's values in one or more of a table's rows, DELETE #delete one or more rows from a table, COMMIT #permanently saves data changes, ROLLBACK #restores data to its original values
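The read/write patterns above can be exercised end to end; this sketch uses a temporary directory so no real files are touched (file names are arbitrary):

```python
# Write then read back a text file and a CSV file, as described above.
import csv, os, tempfile

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "filename.txt")

with open(path, "w") as fp:
    fp.write("first line\n")            # write() takes a single string
    fp.writelines(["second line\n"])    # writelines() takes a list of strings

with open(path, "r") as fp:             # "r" (reading) is the default mode
    lines = fp.readlines()              # the whole file as a list of lines

csv_path = os.path.join(tmp, "file_name.csv")
with open(csv_path, "w", newline="") as f:   # newline="" avoids blank rows
    csv.writer(f).writerows([["a", 1], ["b", 2]])

with open(csv_path, "r") as f:
    rows = list(csv.reader(f))
```

Note that csv.reader returns every field as a string.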

Data types

string_var_name[index] or string_var_name[-index] #String indexing (starts at 0)

String slicing (The default start index is 0 and the stop value is not included)

string_var_name[start: stop: step] #step = -1 will reverse the string

String formatting

"Text {:formatting} text".format(matching_variable)

+ String (s), Integer (d), Float (f), Left align (<), Right align (>), Centre align (^), Comma in dollars (,), Space available for text (number)

with open("file_name.csv", "w", newline = "") as variable_name:
    csv_writer = csv.writer(variable_name)

+ newline = "" is used to eliminate the extra "\n"
+ The writerow() method writes a single row, the writerows() method writes multiple rows

Datetime module (Date)

from datetime import date
to_day = date.today()
to_day.day
to_day.month
to_day.year

DML Operators and Functions

+ =, <, >, <=, >=, <> (not equal) // NOT, AND, OR // +, -, *, /, ^ (to the power of)
+ COUNT #return the number of rows with non-null values for a column, MIN #return the minimum value in a column, MAX #return the maximum value in a column, SUM #return the sum of all values, AVG #return the average of all values

SELECT COUNT(*) FROM MANAGER;

SELECT MIN(PORTFOLIO_VALUE) FROM MANAGER;
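The slicing rules and format specifiers above, checked on an invented string:

```python
# String indexing, slicing, and .format() specifiers from the cheatsheet.
s = "cheatsheet"

first = s[0]          # indexing starts at 0
last = s[-1]          # negative indexing counts from the end
piece = s[0:5]        # the stop index (5) is not included
reversed_s = s[::-1]  # step = -1 reverses the string

# {:>8} right-aligns within 8 spaces; {:,} adds thousands separators;
# {:.2f} formats a float to 2 decimal places
line = "{:>8}|{:,}|{:.2f}".format("hi", 1234567, 3.14159)
```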
String method

+ Change all string characters to lowercase (lower()), change all string characters to uppercase (upper()), break up a string at the specified separator and return a list (split()), return the index of the specified element in the list (index()), add an item to the end of the list (append()), add all the elements of an iterable to the end of the list (extend()), check if all characters are digits (isdigit()), check if all characters are alphabetic (isalpha()), capitalize only the first letter (capitalize())

List – Square brackets [ ]

+ List is mutable (changeable) and can be sliced using index (starts with 0)
+ Lists can be concatenated using + but it will not result in a nested list
+ len(list), min(list), sum(list), list.append(), list.insert(), list.sort(), list.reverse()

Datetime module (timedelta)

import datetime
to_day = datetime.date.today()
one_year_days = datetime.timedelta(days=365)
five_weeks = datetime.timedelta(weeks=5)
print(to_day + five_weeks)

Datetime module (datetime)

from datetime import datetime
in_day = datetime(year=1965, month=8, day=9)
moment = datetime.now()
print(moment.strftime("formatting_conditions"))

+ %a – week day (short form), %A – week day, %B – month in English, %Y – year, %m – month, %d – day, %H – hour, %M – minute, %S – second

Insert data to table

INSERT INTO TABLE_NAME VALUES (X, Y, Z);

INSERT INTO TABLE_NAME(COL_NAME, COL2_NAME) VALUES (X, Y);

Querying data

SELECT COLUMN_LIST FROM TABLE_NAME;

SELECT * FROM MANAGER; #using the * asterisk to list out all fields

SELECT DISTINCT NAME FROM MANAGER; #Select unique values

SELECT PORTFOLIO_VALUE, PORTFOLIO_VALUE * 1.3 AS USD FROM MANAGER;

ORDER BY

SELECT NAME, PORTFOLIO_VALUE FROM MANAGER
ORDER BY PORTFOLIO_VALUE DESC; #Delete DESC for ascending
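The INSERT / SELECT / ORDER BY statements above can be run with Python's built-in sqlite3 module against an in-memory database; the MANAGER table and its rows are invented for illustration:

```python
# Runnable sketch of the SQL statements above using sqlite3 (invented data).
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE MANAGER (NAME TEXT, PORTFOLIO_VALUE REAL)")
cur.executemany("INSERT INTO MANAGER VALUES (?, ?)",
                [("Anna", 60000), ("Ben", 45000), ("Cara", 52000)])

cur.execute("SELECT NAME, PORTFOLIO_VALUE FROM MANAGER "
            "ORDER BY PORTFOLIO_VALUE DESC")
rows = cur.fetchall()  # highest portfolio first; drop DESC for ascending
```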
WHERE clause

SELECT NAME, PORTFOLIO_VALUE FROM MANAGER
WHERE PORTFOLIO_VALUE >= 50000
ORDER BY PORTFOLIO_VALUE;

Special operators

+ BETWEEN #check whether a value is within the range, IS NULL #check whether a value is null, LIKE #check whether an attribute value matches a given string pattern, IN #check whether a value matches any value within a list, EXISTS #check whether a subquery returns any rows, DISTINCT #limit to unique values

SELECT NAME FROM MANAGER WHERE VALUE BETWEEN 3000 AND 50000;

SELECT NAME FROM MANAGER WHERE NAME LIKE '%je%'; #names with "je" anywhere inside

SELECT NAME FROM MANAGER WHERE MANAGER_CODE IN (SELECT MANAGER_CODE FROM EXECUTIVE);

Data cleaning (Duplicated data)

+ df.duplicated(), df.duplicated().sum(), df[df[["col1", "col2"]].duplicated()] #show only rows that are duplicated on column1 and column2
+ Use df.drop_duplicates("column_name", inplace=True) to remove duplicates

Data cleaning (Missing values)

+ fillna(), data.isnull().sum(), data.isna().sum()

data[data.isna().any(axis=1)] #Show which entries are NaN (axis=0 refers to the rows and axis=1 refers to the columns)

df.dropna(axis=0, how='any', inplace=True) #drop rows with at least 1 empty cell

df.dropna(axis=0, how='all', inplace=True) #drop rows where all cells are empty

Exporting using pandas DataFrame to CSV, Excel or "tab separated values" file

+ Use index = False to ensure that the index number is not exported

data.to_csv("updated_df.csv", index = False)

data.to_excel("updated_df.xlsx", index = False)

data.to_csv("updated_df.tsv", index = False, sep = "\t")

Filtering data

data[(data["col1"] == criteria1) & (data["col2"] == criteria2)] #and
data[(data["col1"] == criteria1) | (data["col2"] == criteria2)] #or is |
data[data["col1"].str.___("value")] #str.contains()/endswith()/startswith()
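The duplicate, missing-value, and filtering patterns above, checked on a tiny invented DataFrame (requires pandas):

```python
# Data-cleaning and filtering patterns from the cheatsheet (invented data).
import pandas as pd

data = pd.DataFrame({"col1": ["x", "x", "y", None],
                     "col2": [1, 1, 2, 3]})

dup_count = data.duplicated().sum()       # count of fully duplicated rows
missing = data.isna().sum()["col1"]       # NaN count in one column
nan_rows = data[data.isna().any(axis=1)]  # rows containing any NaN
filtered = data[(data["col1"] == "x") & (data["col2"] == 1)]  # AND filter
```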
HAVING + GROUP BY

+ The GROUP BY clause is used to create a frequency distribution of the data, grouping data into a few groups to summarise it; HAVING filters results after grouping

SELECT COUNT(*), DEPARTMENT FROM MANAGER
GROUP BY DEPARTMENT
HAVING COUNT(*) >= 2;

JOIN Clause

SELECT * FROM MANAGER, EXECUTIVE WHERE MANAGER.MANAGER_CODE = EXECUTIVE.MANAGER_CODE; #Join tables using the WHERE clause

+ INNER JOIN combines the results similar to AND, looking for values common to both tables; OUTER JOIN keeps the non-matching results when the join is done

SELECT * FROM MANAGER INNER JOIN EXECUTIVE ON MANAGER.MANAGER_CODE = EXECUTIVE.MANAGER_CODE;

Dataframe row and columns operations

+ Rows and columns are accessed through index; index 0 is the start

df.drop(index) #drop a certain row

df.append(another_df) #insert another dataframe into df (use pd.concat([df, another_df]) in pandas 2.0+)

+ We could first use df.columns to get all columns' names

df.pop('acquiree') #remove the column with the header 'acquiree'

Pandas basics

+ The head() method returns the first 5 rows (default) and index starts from 0, the tail() method returns the last 5 rows, the shape attribute returns the number of rows followed by the number of columns, the info() method returns basic info about the file, value_counts() returns counts of unique values, data.corr(method = "pearson") or data.corr() calculates correlation, describe() gives count, mean, standard deviation, min, 25%, 50%, 75%, max

Groupby & Aggregation functions

+ Groupby essentially splits the data into different groups depending on the unique values of a column of your choice.

data.groupby(["column"]).mean().round(2)

data.groupby(["column1"]).mean()["column2"]

Standard indexing

+ Selecting rows and columns using standard indexing

df[0:2][["column1","column2"]]

data["column_name"] or data.column_name or data[["column1", "column2"]]

Selecting using iloc[ ] – Stop index is EXCLUDED

+ The left of the comma slices the rows using index numbers. The right of the comma slices the columns using index numbers (not column names)
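A minimal pandas groupby sketch matching the description above (the department/salary data is invented):

```python
# Groupby splits rows by unique values of a column, then aggregates each group.
import pandas as pd

data = pd.DataFrame({"department": ["IT", "IT", "HR"],
                     "salary": [1000, 3000, 2000]})

means = data.groupby(["department"]).mean()["salary"]  # mean salary per group
counts = data["department"].value_counts()             # frequency distribution
```

This is the pandas counterpart of SQL's GROUP BY with AVG and COUNT.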
OUTER JOIN

+ LEFT JOIN: left table records will be shown even for non-matching results; RIGHT JOIN: right table records will be shown even for non-matching results [Not SQLite]; FULL JOIN: join all records of the left and right tables [Not SQLite]

SUBQUERIES

+ For nested selection, a subquery can be used to generate information. Run the inner query first, then apply the outer query to the inner query's results.

SELECT NAME FROM MANAGER WHERE PORTFOLIO_VALUE >= (SELECT AVG(PORTFOLIO_VALUE) FROM MANAGER);

Sorting data in pandas

data.sort_values("column_name") #Sort in ascending order based on 1 column

data.sort_values(['price', 'quantity_ordered'], ascending = False) #reverse order

+ Use inplace=True if you want the sorted order to be saved
+ Sorting a text-based column (default is A to Z)

df.iloc[1] or df.iloc[1,:] #read a row

df.iloc[0:2] #return rows 0 and 1

df.iloc[[0,2]] #return rows 0 and 2

df.iloc[0,2] #return the value at row 0 and column 2

df.iloc[0:2,0:2] #return rows 0&1 + columns 0&1

df.iloc[::2,0:2] #return rows 0&2&4... + columns 0&1
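A quick check of the iloc[] rules above (stop index excluded); the DataFrame is invented:

```python
# iloc[] selects by integer position; the stop index is EXCLUDED.
import pandas as pd

df = pd.DataFrame({"a": [10, 11, 12, 13], "b": [20, 21, 22, 23]})

one_value = df.iloc[0, 1]    # row 0, column 1
two_rows = df.iloc[0:2]      # rows 0 and 1 (2 excluded)
picked = df.iloc[[0, 2]]     # rows 0 and 2, by list of positions
block = df.iloc[0:2, 0:2]    # rows 0&1, columns 0&1
```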
Data and Pandas

Constructs of DataFrame (not Dataframe or dataframe)

df = pd.DataFrame(list_of_lists, columns = ["column_name"]) #Given a list of lists

df_dictlist = pd.DataFrame(dictlist) #When you are given a dictionary whose values are lists, the keys will be the column names and the values will be the rows

tuple_list = list(zip(list1, list2)) #Two lists can be merged using the list(zip()) function

df = pd.DataFrame(tuple_list, columns = ["column_name"])

my_file = pd.read_csv("abc.csv") #Used when you already have a CSV file

Making changes to data

data["new_header"] = list #Create a new column from current columns

data.column.replace(old_value, new_value, inplace = True) #Replace

data.loc[data["column"] == "old_value", "column"] = "new_value" #Replace

data.loc[0,"column"] = value #Set values of a specific cell using loc[]

data.iloc[0,5] = value #Set values using iloc[]

data.iloc[0, data.columns.get_loc("column")] = value #Set values using iloc[]

df2.append(df3, ignore_index=True) #Append another dataframe and renumber the axis (identifier) to 0, 1, 2, 3, ... (use pd.concat([df2, df3], ignore_index=True) in pandas 2.0+)

Selecting using loc[] – Stop index is INCLUDED

+ The left of the comma slices the rows using index numbers. The right of the comma slices the columns using the column names (not index numbers)

df.loc[:,"column_name"] #Read all rows using the column name

df.loc[0] #Selecting a single row

df.loc[[0,2]] #rows 0,2 or df.loc[0:4:2] #rows 0,2,4

df.loc[[0], "column_name"] #Selecting one row and one column

data.loc[data["header"] == criteria, ["column_name"]] #Read specific rows according to criteria
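The loc[] rules above contrast with iloc[]: loc[] slices by label and INCLUDES the stop. A sketch on an invented DataFrame:

```python
# loc[] selects by label; with a default RangeIndex the stop label is INCLUDED.
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c", "d", "e"],
                   "score": [1, 2, 3, 4, 5]})

col = df.loc[:, "score"]    # all rows of one column, by name
row0 = df.loc[0, "name"]    # one cell by row label and column name
sliced = df.loc[0:2]        # labels 0, 1 AND 2 - stop is included
stepped = df.loc[0:4:2]     # labels 0, 2, 4
df.loc[0, "score"] = 99     # set a single cell with loc[]
```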