Open In App

SQL for Data Analysis Cheat Sheet

Last Updated : 24 Feb, 2025
Comments
Improve
Suggest changes
Like Article
Like
Report

SQL (Structured Query Language) is essential for data analysis as it enables efficient data retrieval, manipulation, and transformation. It allows analysts to filter, sort, group, and aggregate large datasets, making data-driven decision-making easier. SQL integrates seamlessly with business intelligence tools like Tableau, Power BI, and Python libraries, enhancing data visualization and reporting.

Why SQL is Important for Data Analysis

SQL is widely used in data analysis because it provides a powerful and efficient way to interact with structured data. Here are some reasons why SQL is essential for data analysts:

  • Efficient Data Retrieval – SQL allows users to fetch specific data from large datasets quickly using simple queries.
  • Data Manipulation – SQL supports various operations such as filtering, sorting, grouping, and aggregating data to prepare datasets for analysis.
  • Scalability – SQL can handle vast amounts of data efficiently, making it suitable for enterprise-level analytics.
  • Integration with BI Tools – SQL works seamlessly with business intelligence tools like Tableau, Power BI, and Python libraries (Pandas, SQLAlchemy) for advanced analysis and visualization.
  • Data Cleaning and Transformation – SQL functions help clean and normalize data, removing duplicates and handling missing values.
  • Decision-Making Support – Businesses use SQL queries to extract insights and drive data-driven decisions.
  • Automation SQL scripts can be scheduled and automated for regular reporting and data processing.

Data Retrieval

1. SELECT Statement: Used to fetch data from a database.

SELECT column1, column2
FROM table_name;

2. DISTINCT Keyword: Eliminates duplicate records.

SELECT DISTINCT column1
FROM table_name;

3. WHERE Clause: Filters records based on specified conditions.

SELECT column1
FROM table_name
WHERE condition;

2. Sorting and Limiting Results

ORDER BY Clause: Sorts the result set.

SELECT column1
FROM table_name
ORDER BY column1 [ASC|DESC];

LIMIT Clause: Limits the number of returned records.

SELECT column1
FROM table_name
LIMIT number;

3. Aggregate Functions

COUNT(): Returns the number of rows.

SELECT COUNT(*)
FROM table_name;

SUM(): Calculates the total sum of a numeric column.

SELECT SUM(column1)
FROM table_name;

AVG(): Calculates the average value.

SELECT AVG(column1)
FROM table_name;

MIN() and MAX(): Retrieve the minimum and maximum values.

SELECT MIN(column1), MAX(column1)
FROM table_name;

4. Grouping Data

GROUP BY: Group rows that have the same values in specified columns into summary rows.

SELECT column1, COUNT(*)
FROM table_name
GROUP BY column1;

HAVING: Filter groups based on aggregate functions.

SELECT column1, COUNT(*)
FROM table_name
GROUP BY column1
HAVING COUNT(*) > 1;

5. Joining Tables

INNER JOIN: Select records with matching values in both tables.

SELECT a.column1, b.column2
FROM table1 a INNER JOIN table2 b
ON a.common_field = b.common_field;

LEFT JOIN: Include all records from the left table and matched records from the right table; fill with NULLs if no match.

SELECT a.column1, b.column2
FROM table1 aLEFT JOIN table2 b ON a.common_field = b.common_field;

RIGHT JOIN: Include all records from the right table and matched records from the left table; fill with NULLs if no match.

SELECT a.column1, b.column2
FROM table1 aRIGHT JOIN table2 b
ON a.common_field = b.common_field;

FULL OUTER JOIN: Return all records when there is a match in either left or right table.

SELECT a.column1, b.column2FROM table1 aFULL OUTER JOIN table2 b ON a.common_field = b.common_field;

5. Subqueries

Subquery in WHERE Clause: Use a subquery to filter results.

SELECT column1
FROM table_name
WHERE column2 IN (SELECT column2 FROM another_table WHERE condition);

Subquery in FROM Clause: Use a subquery as a temporary table.

SELECT a.column1, b.column2
FROM (SELECT column1 FROM table_name WHERE condition)
a JOIN another_table b ON a.common_field = b.common_field;

6. Set Operations

UNION: Combine the result sets of two queries and remove duplicates.

SELECT column1 
FROM table1
UNION
SELECT column1
FROM table2;

UNION ALL: Combine the result sets of two queries, including duplicates.

SELECT column1 
FROM table1
UNION ALL
SELECT column1
FROM table2;

INTERSECT: Return the common records from two queries.

SELECT column1 FROM table1
INTERSECTSELECT column1
FROM table2;

EXCEPT: Return records from the first query that are not in the second query.

SELECT column1 FROM table1
EXCEPT
SELECT column1 FROM table2;

7. Window Functions

ROW_NUMBER(): Assign a unique sequential integer to rows within a partition.

SELECT column1,       
ROW_NUMBER() OVER (PARTITION BY column2 ORDER BY column3) as row_num
FROM table_name;

RANK(): Assign a rank to rows within a partition, with gaps for ties.

SELECT column1, 
RANK() OVER (PARTITION BY column2 ORDER BY column3) AS rank_num
FROM table_name;

8. Date and Time Functions

CURRENT_DATE and CURRENT_TIME: Retrieve the current date and time.

SELECT CURRENT_DATE, CURRENT_TIME;

NOW(): Returns the current timestamp (date and time)

SELECT NOW();

EXTRACT(): Extracts a specific part of a date (e.g., year, month, day).

SELECT EXTRACT(YEAR FROM date_column) AS year_value 
FROM table_name;

DATE_ADD() and DATE_SUB(): Adds or subtracts an interval from a date.

SELECT DATE_ADD(date_column, INTERVAL 7 DAY) AS next_week 
FROM table_name;

SELECT DATE_SUB(date_column, INTERVAL 1 MONTH) AS previous_month
FROM table_name;

DATEDIFF(): Finds the difference between two dates.

SELECT DATEDIFF(end_date, start_date) AS days_difference 
FROM table_name;

DATE_FORMAT(): Formats a date in a specific pattern.

SELECT DATE_FORMAT(date_column, '%Y-%m-%d') AS formatted_date 
FROM table_name;

Conclusion

SQL plays a vital role in data analysis by providing powerful tools for querying, manipulating, and transforming structured data. Its ability to retrieve specific data efficiently, perform aggregations, and join multiple tables makes it indispensable for analysts. SQL integrates seamlessly with BI tools and programming languages, enhancing visualization and reporting. Additionally, it supports automation and scalable data processing, making it suitable for large datasets. Mastering SQL empowers professionals to extract valuable insights, streamline decision-making, and improve data management, solidifying its importance in the world of data analysis.


Next Article
Article Tags :

Similar Reads