SQL Basics
July 28, 2023
The most common shape for data is a spreadsheet or table. The things we are measuring (variables) are in the columns,
and the individual instances (observations) are in the rows. We can read each column “down” the table (viewing
multiple observations), and each row “across” the table (viewing multiple variables).
What is SQL: SQL (Structured Query Language) is a programming language designed to manipulate and manage
data stored in relational databases. It is the standard language for accessing and processing relational data.
What is a statement: A statement is a string of characters that the database recognizes as a valid command.
Statements usually end in a semicolon [;]. A statement is made up of clauses and parameters (the content in
parentheses). The clause is the specific instruction and is, by convention, written in uppercase.
What is querying: Retrieving information stored in a database, one of the core purposes of the SQL language.
MANIPULATIVE FUNCTIONS
Constraints: rules applied to the values of individual columns. They come after the data type in a column definition.
CREATE TABLE celebs (
id INTEGER PRIMARY KEY,
name TEXT UNIQUE,
date_of_birth TEXT NOT NULL,
date_of_death TEXT DEFAULT 'Not Applicable' );
Constraints are written in uppercase by convention.
[PRIMARY KEY] columns uniquely identify each row. Attempts to insert a row with an identical value to a row
already in the table will result in a constraint violation (i.e., it works like a serial number that cannot
repeat). A table can have only one PRIMARY KEY.
[UNIQUE] columns have a different value for every row. This is similar to PRIMARY KEY, except that a
table can have many different UNIQUE columns.
[NOT NULL] columns must have a value.
[DEFAULT] columns take an additional argument that will be the assumed value for an inserted
row if the new row does not specify a value for that column (i.e., it sets a preset value for the column).
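The four constraints can be tried out with a small sketch. It assumes Python's built-in sqlite3 module and an in-memory database, and reuses the celebs table from above; the helper try_insert and the sample rows are made up for illustration, and other database engines may word their errors differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE celebs (
        id INTEGER PRIMARY KEY,
        name TEXT UNIQUE,
        date_of_birth TEXT NOT NULL,
        date_of_death TEXT DEFAULT 'Not Applicable'
    )
""")
# Made-up sample row; date_of_death is omitted so DEFAULT applies.
conn.execute(
    "INSERT INTO celebs (id, name, date_of_birth) VALUES (1, 'Celeb A', '1990-01-01')"
)

def try_insert(sql):
    """Hypothetical helper: return None on success, else the error text."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as err:
        return str(err)

errors = [
    # Same id as an existing row: violates PRIMARY KEY.
    try_insert("INSERT INTO celebs (id, name, date_of_birth) VALUES (1, 'Celeb B', '1991-01-01')"),
    # Same name as an existing row: violates UNIQUE.
    try_insert("INSERT INTO celebs (id, name, date_of_birth) VALUES (2, 'Celeb A', '1991-01-01')"),
    # No date_of_birth supplied: violates NOT NULL.
    try_insert("INSERT INTO celebs (id, name) VALUES (3, 'Celeb C')"),
]
default_value = conn.execute(
    "SELECT date_of_death FROM celebs WHERE id = 1"
).fetchone()[0]
print(errors)
print(default_value)
```

All three rejected inserts raise a constraint error, and the one accepted row picks up the DEFAULT value.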
ALTER TABLE… ADD COLUMN…: adds a new column to a table. ALTER TABLE lets you make changes to an existing table.
ALTER TABLE table_name
ADD COLUMN column_name data_type;
Only changing columns counts as changing the table, because the content of each row is just a record / observation.
ALTER modifies the structure of the table (by adding, removing or renaming columns).
UPDATE modifies the information contained in the table.
[NULL] is a special value in SQL that represents missing or unknown data. Rows that
existed before a column was added have NULL (∅) values for the new column.
By default, a new column is always added at the end of the table.
[IS NULL] is a condition in SQL that returns true when the value is NULL and false otherwise.
To clear a value without deleting the row or shifting the other rows, use UPDATE and set that
value to NULL.
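The difference between ALTER and UPDATE, and the NULL values that a new column starts with, can be seen in a short sketch. It assumes Python's built-in sqlite3 module; the column twitter_handle and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE celebs (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO celebs (id, name) VALUES (1, 'Celeb A'), (2, 'Celeb B')")

# ALTER changes the structure: the new column is appended at the end,
# and rows that already existed get NULL in it.
conn.execute("ALTER TABLE celebs ADD COLUMN twitter_handle TEXT")
null_ids = conn.execute(
    "SELECT id FROM celebs WHERE twitter_handle IS NULL ORDER BY id"
).fetchall()

# UPDATE changes the stored data, not the structure.
conn.execute("UPDATE celebs SET twitter_handle = '@celeb_a' WHERE id = 1")
rows = conn.execute(
    "SELECT id, name, twitter_handle FROM celebs ORDER BY id"
).fetchall()
print(null_ids)
print(rows)
```

Python shows SQL NULL as None, so the second row still reports None for the new column after the update.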
QUERY FUNCTIONS
SELECT statements always return a new table called the result set
SELECT… FROM… WHERE…: retrieves the data that meets a specific condition. WHERE filters the result set to
only include rows where the following condition is true.
SELECT */column_name FROM table_name
WHERE condition;
[LIKE] is a special operator used within the WHERE clause to search for a specific pattern in a column.
It is usually followed by a pattern containing wildcard characters:
[_] matches exactly one character, so any single character can be substituted at that position without
breaking the pattern.
[%] matches zero or more characters.
SELECT */column_name FROM table_name
WHERE column_name LIKE 'Se_en'; -- the names 'Seven' and 'Se7en' both match this pattern
SELECT */column_name FROM table_name
WHERE column_name LIKE '%man%'; -- matches all names containing 'man'
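Both wildcards can be checked against a few rows. This is a minimal sketch assuming Python's built-in sqlite3 module and a hypothetical movies table with made-up titles; note that SQLite's LIKE ignores case for ASCII letters by default, which other databases may not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (name TEXT)")
conn.executemany(
    "INSERT INTO movies (name) VALUES (?)",
    [("Seven",), ("Se7en",), ("Batman",), ("Superman Returns",), ("Jaws",)],
)

# '_' stands for exactly one character.
like_underscore = conn.execute(
    "SELECT name FROM movies WHERE name LIKE 'Se_en' ORDER BY name"
).fetchall()

# '%' stands for zero or more characters on either side of 'man'.
like_percent = conn.execute(
    "SELECT name FROM movies WHERE name LIKE '%man%' ORDER BY name"
).fetchall()
print(like_underscore)
print(like_percent)
```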
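[IS NULL] / [IS NOT NULL] are special operators used within the WHERE clause to identify missing
values. NULL cannot be tested with = or !=, so these operators are needed instead.
SELECT */column_name FROM table_name
WHERE column_name IS NULL;
SELECT */column_name FROM table_name
WHERE column_name IS NOT NULL;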
[BETWEEN] is a special operator used within the WHERE clause to filter the result set to a certain
range. It accepts two values that are numbers, text or dates. The range includes both values for every
data type, but with text the comparison is alphabetical, so a longer string that merely begins with the
second value sorts after it and is excluded.
SELECT */column_name FROM table_name
WHERE column_name BETWEEN 1990 AND 2000; -- rows whose value is between 1990 and 2000, both ends included
SELECT */column_name FROM table_name
WHERE column_name BETWEEN 'A' AND 'J'; -- rows that begin with 'A' up to, but not including, rows that begin with 'J'
A value of exactly 'J' does match, because BETWEEN goes up to the second value, 'J'. So a movie named 'J'
would be included in the result set, but 'Jaws' would not.
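The numeric and text behaviour of BETWEEN can be compared side by side. The sketch below assumes Python's built-in sqlite3 module and a hypothetical movies table; the titles and years are made-up sample data chosen to sit on the range boundaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (name TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO movies (name, year) VALUES (?, ?)",
    [("Alpha", 1989), ("Iris", 1990), ("J", 2000), ("Jaws", 2001)],
)

# Numbers: both 1990 and 2000 are inside the range.
by_year = conn.execute(
    "SELECT name FROM movies WHERE year BETWEEN 1990 AND 2000 ORDER BY year"
).fetchall()

# Text: 'J' itself is inside the range, but 'Jaws' sorts after 'J'.
by_name = conn.execute(
    "SELECT name FROM movies WHERE name BETWEEN 'A' AND 'J' ORDER BY name"
).fetchall()
print(by_year)
print(by_name)
```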
[AND] is a special operator used within the WHERE clause to combine multiple conditions. With AND,
both conditions must be true for the row to be included in the result.
SELECT */column_name FROM table_name
WHERE condition_1 AND condition_2;
[OR] is similar to AND, but includes a row if any condition is true.
SELECT */column_name FROM table_name
WHERE condition_1 OR condition_2;
SELECT… FROM… ORDER BY…: retrieves data and sorts it, listing the result set in a particular order.
SELECT * FROM table_name
ORDER BY column_name; -- sort the result set by a particular column
The ORDER BY clause can also take these forms:
ORDER BY column_name DESC; -- sort in descending order (ASC, the default, sorts in ascending order)
ORDER BY column_1, column_2; -- sort the result set first by column_1, then by column_2
If a WHERE clause is used, ORDER BY always goes after the WHERE clause.
SELECT… FROM… LIMIT…: retrieves a set number of rows. LIMIT is a clause that lets you specify the maximum
number of rows the result set will have.
SELECT * FROM table_name
LIMIT number;
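The clause order described above (WHERE, then ORDER BY, then LIMIT) can be sketched in one query. This assumes Python's built-in sqlite3 module; the movies table, its rating column and the rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (name TEXT, year INTEGER, rating REAL)")
conn.executemany(
    "INSERT INTO movies (name, year, rating) VALUES (?, ?, ?)",
    [("A", 1999, 9.0), ("B", 2005, 7.5), ("C", 2010, 8.5),
     ("D", 2012, 8.5), ("E", 2015, 6.0)],
)

# WHERE filters rows first, ORDER BY sorts what is left
# (rating descending, ties broken by name), LIMIT caps the row count.
top_two = conn.execute("""
    SELECT name, rating
    FROM movies
    WHERE year > 2000
    ORDER BY rating DESC, name ASC
    LIMIT 2
""").fetchall()
print(top_two)
```

Movie A has the highest rating but is removed by the WHERE condition before sorting happens.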
[CASE] expressions are usually placed in the SELECT clause and return a different value depending on which
WHEN condition is true. Every CASE must end with END.
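Since the notes give no CASE example, here is a minimal sketch of one inside a SELECT. It assumes Python's built-in sqlite3 module; the movies table, the rating thresholds and the bucket labels are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (name TEXT, rating REAL)")
conn.executemany(
    "INSERT INTO movies (name, rating) VALUES (?, ?)",
    [("A", 9.0), ("B", 6.5), ("C", 3.0)],
)

# WHEN branches are checked top to bottom; ELSE covers everything else.
# The whole expression is closed by END and named with AS.
buckets = conn.execute("""
    SELECT name,
           CASE
               WHEN rating >= 8 THEN 'high'
               WHEN rating >= 6 THEN 'medium'
               ELSE 'low'
           END AS bucket
    FROM movies
    ORDER BY name
""").fetchall()
print(buckets)
```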
AGGREGATE FUNCTIONS
COUNT: a function that counts the non-empty rows of a column. It takes the name of a column as an argument
and counts the number of non-empty (non-NULL) values in that column.
SELECT COUNT(column_name)
FROM table_name;
SUM: a function that adds up the values of a column. It takes the name of a column as an argument and
returns the sum of all the values in that column.
SELECT SUM(column_name)
FROM table_name;
MAX / MIN: functions that find the largest and smallest values of a column. Each takes the name of a column
as an argument and returns the largest (MAX) or smallest (MIN) value in that column.
SELECT MAX(column_name), MIN(column_name)
FROM table_name;
AVG: a function that calculates the average of a column. It takes a column name as an argument and returns
the average value for that column.
SELECT AVG(column_name)
FROM table_name;
ROUND: a function that rounds the values of a column. It takes two arguments inside the parentheses:
1. a column name 2. an integer. It rounds the values in the column to the number of decimal places specified
by the integer.
SELECT ROUND(column_name, integer)
FROM table_name;
SELECT column_1, ROUND(column_2, 2) -- return column_1, and column_2 rounded to two decimal places
FROM table_name;
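The aggregate functions above can be run together on one small table. This sketch assumes Python's built-in sqlite3 module; the apps table, its columns and its rows are hypothetical, with one NULL price included to show that COUNT(column) and AVG skip missing values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE apps (name TEXT, price REAL, downloads INTEGER)")
conn.executemany(
    "INSERT INTO apps (name, price, downloads) VALUES (?, ?, ?)",
    [("a", 0.99, 100), ("b", 2.99, 250), ("c", None, 50)],
)

summary = conn.execute("""
    SELECT COUNT(price),          -- non-NULL prices only
           COUNT(*),              -- all rows
           SUM(downloads),
           MAX(downloads),
           MIN(downloads),
           ROUND(AVG(price), 2)   -- average of non-NULL prices, 2 decimals
    FROM apps
""").fetchone()
print(summary)
```

The first two values differ because COUNT(price) ignores the row whose price is NULL, while COUNT(*) counts every row.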
GROUP BY: groups the values returned by a SELECT statement according to the values of a specified column.
It is a clause used with aggregate functions, in combination with the SELECT statement, to arrange identical
data into groups.
SELECT column_1, aggregate_function(column_2)
FROM table_name
GROUP BY column_1;
The GROUP BY clause comes after any WHERE clause, but before ORDER BY or LIMIT.
SQL lets us use column references in GROUP BY / ORDER BY, which refer to columns by their position in
the SELECT list:
1 is the first column selected
2 is the second column selected
3 is the third column selected
SELECT ROUND(column_1),
COUNT(column_2)
FROM table_name
GROUP BY 1
ORDER BY 1;
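Grouping by a column reference can be sketched as follows, assuming Python's built-in sqlite3 module and a hypothetical apps table with made-up prices. The number 1 in GROUP BY and ORDER BY stands for the first selected expression, ROUND(price).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE apps (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO apps (name, price) VALUES (?, ?)",
    [("a", 0.99), ("b", 1.2), ("c", 2.99), ("d", 3.1), ("e", 2.6)],
)

# Rows are grouped by the rounded price, then counted per group.
price_groups = conn.execute("""
    SELECT ROUND(price),
           COUNT(name)
    FROM apps
    GROUP BY 1
    ORDER BY 1
""").fetchall()
print(price_groups)
```

Writing GROUP BY 1 saves repeating ROUND(price), at the cost of breaking silently if the SELECT list is reordered.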
[HAVING] is very similar to WHERE. In fact, all the types of WHERE conditions covered so far
can be used with HAVING. The HAVING clause always comes after GROUP BY, but before ORDER BY
and LIMIT.
When we want to limit the results of a query based on values of the individual rows, use WHERE.
When we want to limit the results of a query based on an aggregate property, use HAVING.
SELECT ROUND(column_1),
COUNT(column_2)
FROM table_name
GROUP BY 1
HAVING COUNT(column_2) > 5; -- keep only the groups whose COUNT(column_2) is greater than 5
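The WHERE versus HAVING distinction is easiest to see by running both on the same data. This sketch assumes Python's built-in sqlite3 module; the apps table and its prices are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE apps (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO apps (name, price) VALUES (?, ?)",
    [("a", 0.99), ("b", 1.2), ("c", 2.99), ("d", 3.1), ("e", 2.6)],
)

# HAVING filters whole groups, after the aggregate has been computed.
with_having = conn.execute("""
    SELECT ROUND(price), COUNT(name)
    FROM apps
    GROUP BY 1
    HAVING COUNT(name) > 2
""").fetchall()

# WHERE filters individual rows, before any grouping happens.
with_where = conn.execute("""
    SELECT ROUND(price), COUNT(name)
    FROM apps
    WHERE price > 1
    GROUP BY 1
    ORDER BY 1
""").fetchall()
print(with_having)
print(with_where)
```

With WHERE, the 0.99 row is dropped first, so the group for 1.0 survives with a smaller count; with HAVING, that group is removed entirely.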
MULTIPLE TABLES
JOIN: combines tables by matching rows on a specified shared column.
SELECT *
FROM table_1
JOIN table_2
ON table_1.column_1 = table_2.column_1; -- match table_1's column_1 with table_2's column_1
The table_name.column_name syntax can be used not only in the ON clause, but also in SELECT or
any other clause where we refer to column names.
SELECT table_1.column_1, table_2.column_2
FROM table_1
JOIN table_2
ON table_1.column_1 = table_2.column_1;
[INNER JOIN] (keeps only the matching rows) is the default JOIN. It returns only the rows that match the
condition specified by ON.
[LEFT JOIN] (keeps the rows matched between table 1 and table 2, plus the remaining rows of table 1) keeps
all rows from the first table, regardless of whether there is a matching row in the second table. If the
join condition is not met, LEFT JOIN fills the columns from the right table with NULL.
SELECT *
FROM table1
LEFT JOIN table2
ON table1.c2 = table2.c2;
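The difference between the default (inner) JOIN and LEFT JOIN shows up as soon as one table has a row with no match. This is a sketch assuming Python's built-in sqlite3 module; the customers and orders tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, item TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Ann"), (2, "Ben"), (3, "Cy")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, "book"), (11, 1, "pen"), (12, 2, "lamp")])

# Inner join: only customers that have at least one order appear.
inner_rows = conn.execute("""
    SELECT customers.name, orders.item
    FROM customers
    JOIN orders
      ON customers.customer_id = orders.customer_id
    ORDER BY orders.order_id
""").fetchall()

# Left join: every customer appears; unmatched ones get NULL for item.
left_rows = conn.execute("""
    SELECT customers.name, orders.item
    FROM customers
    LEFT JOIN orders
      ON customers.customer_id = orders.customer_id
    ORDER BY customers.customer_id, orders.order_id
""").fetchall()
print(inner_rows)
print(left_rows)
```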
[CROSS JOIN] (pairs every row of table 1 with every row of table 2) combines all rows of one table with all
rows of another table, giving every possible combination. A common usage is when we need to
compare each row of a table to a list of values. CROSS JOIN does not require an ON clause, because it does
not join on any columns.
SELECT table1.column1, table2.column2
FROM table1
CROSS JOIN table2;
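A CROSS JOIN of two small tables makes the "all combinations" behaviour concrete. The sketch assumes Python's built-in sqlite3 module; the shirts and pants tables are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shirts (color TEXT)")
conn.execute("CREATE TABLE pants (color TEXT)")
conn.executemany("INSERT INTO shirts VALUES (?)", [("red",), ("white",)])
conn.executemany("INSERT INTO pants VALUES (?)", [("black",), ("blue",)])

# No ON clause: each of the 2 shirts is paired with each of the 2 pants.
outfits = conn.execute("""
    SELECT shirts.color, pants.color
    FROM shirts
    CROSS JOIN pants
    ORDER BY 1, 2
""").fetchall()
print(outfits)
```

The result has 2 x 2 = 4 rows, so the row count grows quickly when both tables are large.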
UNION: merges all rows of two tables that have the same columns. It stacks table1 on top of table2. Both
tables must have the same number of columns, with the same data types in the same order.
SELECT * FROM table1
UNION
SELECT * FROM table2;
we can also use UNION to quickly make a “mini” dataset:
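One way such a "mini" dataset might look is sketched below, next to a plain UNION of two tables. It assumes Python's built-in sqlite3 module; the tables, the label and value columns and all rows are hypothetical. Standard UNION also drops duplicate rows (UNION ALL would keep them), which the first query shows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (x INTEGER)")
conn.execute("CREATE TABLE table2 (x INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO table2 VALUES (?)", [(2,), (3,)])

# Stack the two tables; the shared value 2 appears only once.
stacked = conn.execute("""
    SELECT x FROM table1
    UNION
    SELECT x FROM table2
    ORDER BY x
""").fetchall()

# A "mini" dataset built from literal values, with no table behind it.
mini = conn.execute("""
    SELECT 'a' AS label, 1 AS value
    UNION
    SELECT 'b', 2
    ORDER BY label
""").fetchall()
print(stacked)
print(mini)
```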
WITH: nests another query. Essentially, we put a whole first query inside the parentheses () and give it
a name. After that, we can use this name as if it were a table and write a new query using the first query.
WITH previous_results AS (
SELECT ...
...
...
… -- note: the query inside the parentheses must not end with a semicolon
)
SELECT *
FROM previous_results
JOIN table2
ON _____ = _____;
[WITH] allows us to perform a separate query first and then build on its result.
[previous_results] is the alias that we use to reference any columns from the query inside the
WITH clause.
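A filled-in version of the WITH template might look like the sketch below. It assumes Python's built-in sqlite3 module; the customers and orders tables, the alias order_counts and the column n are hypothetical names chosen for the example, not part of the template above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Ann"), (2, "Ben")])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(10, 1), (11, 1), (12, 2)])

# The inner query (no semicolon inside the parentheses) counts orders
# per customer; the outer query then joins that result like a table.
orders_per_customer = conn.execute("""
    WITH order_counts AS (
        SELECT customer_id, COUNT(*) AS n
        FROM orders
        GROUP BY customer_id
    )
    SELECT customers.name, order_counts.n
    FROM order_counts
    JOIN customers
      ON order_counts.customer_id = customers.customer_id
    ORDER BY customers.name
""").fetchall()
print(orders_per_customer)
```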