SQL For Data Analyst Part - 3
🔑 Primary Key is used to uniquely identify each row in a table. It ensures that no two rows
have the same value and that the value is never NULL. It helps in maintaining data integrity
and forms the basis for relationships with other tables.
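🔑 Unique Key also ensures that each row has a unique value, but unlike the primary key, it
allows NULL values. Most databases (PostgreSQL, MySQL, Oracle, SQLite) accept multiple NULLs
in a unique column, whereas SQL Server accepts only one. Tables can have multiple unique keys,
but only one primary key.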
🔑 Foreign Key establishes a relationship between two tables by linking a column in one table
to the primary key (or a unique key) of another table. It helps maintain referential integrity
and ensures data consistency.
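The three key types can be sketched with a small runnable example. This is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database, and the department/employee schema, column names and sample values are hypothetical. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, which is an SQLite-specific detail.

```python
import sqlite3

# In-memory SQLite database; the department/employee schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,                       -- primary key
    email   TEXT UNIQUE,                               -- unique key
    dept_id INTEGER REFERENCES department (dept_id)    -- foreign key
);
INSERT INTO department VALUES (10, 'Sales');
INSERT INTO employee VALUES (1, 'a@example.com', 10);
""")

def is_rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Same emp_id as an existing row: primary key violation.
dup_primary_key = is_rejected("INSERT INTO employee VALUES (1, 'b@example.com', 10)")
# Same email as an existing row: unique key violation.
dup_unique_key = is_rejected("INSERT INTO employee VALUES (2, 'a@example.com', 10)")
# dept_id 99 does not exist in department: foreign key violation.
bad_foreign_key = is_rejected("INSERT INTO employee VALUES (3, 'c@example.com', 99)")
# Two rows with a NULL email: SQLite accepts both.
first_null_rejected = is_rejected("INSERT INTO employee VALUES (4, NULL, 10)")
second_null_rejected = is_rejected("INSERT INTO employee VALUES (5, NULL, 10)")
nulls_rejected = first_null_rejected or second_null_rejected

print(dup_primary_key, dup_unique_key, bad_foreign_key, nulls_rejected)
```

The first three inserts are rejected by the primary key, unique key and foreign key respectively, while both rows with a NULL email are accepted in SQLite.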
What is the difference between DISTINCT and GROUP BY?
The DISTINCT clause returns unique values. Depending on the list of columns you provide, it fetches the unique combinations of values across
all those columns. If you provide just a single column in DISTINCT, it fetches only the unique values in that specific column.
For example:
SELECT DISTINCT name FROM employee; returns the unique employee names from the employee table,
whereas SELECT DISTINCT * FROM employee; returns the unique combinations of values across all the columns of the employee table.
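A minimal sketch of the two DISTINCT forms described above, run against an in-memory SQLite database through Python's sqlite3 module. The employee table layout (id, name, dept) and its rows are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical employee table; the last two rows are exact duplicates.
conn.executescript("""
CREATE TABLE employee (id INTEGER, name TEXT, dept TEXT);
INSERT INTO employee VALUES
    (1, 'Asha', 'Sales'),
    (2, 'Asha', 'Sales'),
    (3, 'Ravi', 'IT'),
    (3, 'Ravi', 'IT');
""")

# Unique values of a single column.
unique_names = conn.execute(
    "SELECT DISTINCT name FROM employee ORDER BY name").fetchall()

# Unique combinations across all columns: only the fully identical row collapses.
unique_rows = conn.execute(
    "SELECT DISTINCT * FROM employee ORDER BY id").fetchall()

print(unique_names)  # [('Asha',), ('Ravi',)]
print(unique_rows)   # [(1, 'Asha', 'Sales'), (2, 'Asha', 'Sales'), (3, 'Ravi', 'IT')]
```

The two 'Asha' rows both survive SELECT DISTINCT * because their id values differ, while the repeated 'Ravi' row appears once.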
The GROUP BY clause groups rows together based on the columns specified in GROUP BY. GROUP BY can also be used to fetch unique records from
a table, but that is not what the clause is meant for. Its main purpose is to perform some aggregation (using aggregate
functions like MIN, MAX, COUNT, SUM, AVG) for each group of column values. For example:
SELECT name, COUNT(*) FROM employee GROUP BY name; groups the data from the employee table by the name column and then, for each name value,
counts how many records have that name.
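The counting query described above can be sketched as follows, again with Python's sqlite3 module and an in-memory database. The employee rows are made-up sample data, and the second query is included only to show that GROUP BY without an aggregate returns the same unique names as DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical employee table with one repeated name.
conn.executescript("""
CREATE TABLE employee (id INTEGER, name TEXT);
INSERT INTO employee VALUES
    (1, 'Asha'),
    (2, 'Asha'),
    (3, 'Ravi'),
    (4, 'Meena');
""")

# One output row per name, with the number of records sharing that name.
name_counts = conn.execute(
    "SELECT name, COUNT(*) FROM employee GROUP BY name ORDER BY name").fetchall()

# GROUP BY without an aggregate: same unique names that DISTINCT would return.
grouped_names = conn.execute(
    "SELECT name FROM employee GROUP BY name ORDER BY name").fetchall()
distinct_names = conn.execute(
    "SELECT DISTINCT name FROM employee ORDER BY name").fetchall()

print(name_counts)  # [('Asha', 2), ('Meena', 1), ('Ravi', 1)]
print(grouped_names == distinct_names)  # True
```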
RANK() function will assign a rank to each row within each partition of the result set, based
on the ORDER BY in its OVER clause. If multiple rows have the same value then each of these
rows will share the same rank. However, the ranks that follow get skipped: if n rows tie, the
next n - 1 rank values are skipped (two rows tied at rank 1 are followed by rank 3).
DENSE_RANK() function will assign a rank to each row within each partitioned result set. If
multiple rows have the same value then each of these rows will share the same rank.
However the dense_rank of the following (next) rows will NOT get skipped.
This is the only difference between rank and dense_rank. RANK() function skips a rank if
there are duplicate rows whereas DENSE_RANK() function will never skip a rank.
ROW_NUMBER() function will assign a unique row number to every row within each
partitioned result set. It does not matter if the rows are duplicate or not: tied rows still
get different numbers, in an arbitrary order unless the ORDER BY breaks the tie.
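The three functions can be compared side by side in one query. The sketch below assumes Python's sqlite3 module backed by SQLite 3.25 or newer (the first version with window functions). The employee table, the names and the salaries are hypothetical, and the name column is added to the ROW_NUMBER() ordering only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: two salary ties, one in each department.
conn.executescript("""
CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER);
INSERT INTO employee VALUES
    ('Asha',  'IT', 300),
    ('Bala',  'IT', 300),
    ('Chen',  'IT', 200),
    ('Dev',   'IT', 100),
    ('Esha',  'HR', 150),
    ('Farid', 'HR', 150);
""")

rows = conn.execute("""
SELECT dept, name, salary,
       RANK()       OVER (PARTITION BY dept ORDER BY salary DESC)       AS rnk,
       DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC)       AS dense_rnk,
       ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC, name) AS row_num
FROM employee
ORDER BY dept, salary DESC, name
""").fetchall()

for row in rows:
    print(row)
# ('HR', 'Esha', 150, 1, 1, 1)
# ('HR', 'Farid', 150, 1, 1, 2)
# ('IT', 'Asha', 300, 1, 1, 1)
# ('IT', 'Bala', 300, 1, 1, 2)
# ('IT', 'Chen', 200, 3, 2, 3)
# ('IT', 'Dev', 100, 4, 3, 4)
```

In the IT partition, Chen follows a two-way tie at rank 1: RANK() jumps to 3, DENSE_RANK() continues with 2, and ROW_NUMBER() simply numbers the rows 1 to 4.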
Can we use an aggregate function as a window function? If yes, then
how do we do it?
Yes, we can use an aggregate function as a window function by adding the OVER
clause. On their own, aggregate functions reduce the number of rows or records, since
they perform a calculation over a set of row values to return a single value.
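As an illustration, the sketch below runs SUM() once with GROUP BY and once with an OVER clause. It assumes Python's sqlite3 module with SQLite 3.25 or newer, and the employee table and salary figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical employee table with two departments.
conn.executescript("""
CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER);
INSERT INTO employee VALUES
    ('Asha', 'IT', 300),
    ('Bala', 'IT', 200),
    ('Chen', 'HR', 150);
""")

# Plain aggregate: one row per department.
grouped = conn.execute(
    "SELECT dept, SUM(salary) FROM employee GROUP BY dept ORDER BY dept").fetchall()

# Same aggregate with OVER: every employee row is kept,
# and each row carries the total of its own department.
windowed = conn.execute("""
SELECT name, dept, salary,
       SUM(salary) OVER (PARTITION BY dept) AS dept_total
FROM employee
ORDER BY dept, name
""").fetchall()

print(grouped)   # [('HR', 150), ('IT', 500)]
print(windowed)  # [('Chen', 'HR', 150, 150), ('Asha', 'IT', 300, 500), ('Bala', 'IT', 200, 500)]
```

The GROUP BY query returns two rows for three employees, while the windowed query returns all three rows with the department total repeated alongside each one.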