0% found this document useful (0 votes)
4 views33 pages

Learn

The document outlines a six-step data analysis process, which includes defining the problem, collecting data, cleaning data, analyzing it, visualizing results, and presenting findings. It also provides insights into SQL, covering database creation, CRUD operations, and various SQL commands and functions. Additionally, it highlights the importance of data constraints, joins, and the role of SQL in data manipulation and management.

Uploaded by

kanimozhi MANI
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views33 pages

Learn

The document outlines a six-step data analysis process, which includes defining the problem, collecting data, cleaning data, analyzing it, visualizing results, and presenting findings. It also provides insights into SQL, covering database creation, CRUD operations, and various SQL commands and functions. Additionally, it highlights the importance of data constraints, joins, and the role of SQL in data manipulation and management.

Uploaded by

kanimozhi MANI
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 33

Six Steps of Data Analysis Process

1. Define the Problem or Research Question


2. Collect Data
3. Data Cleaning
4. Analyzing the Data
5. Data Visualization
6. Presenting Data

1. Define the Problem or Research Question (Ask Phase)


In the first step, I focus on understanding the business problem or objective. This involves
communicating clearly with stakeholders to identify their pain points, expectations, and
desired outcomes. I ensure that I ask the right questions to uncover the root cause of the
issue and align my analysis with the stakeholders' goals. It’s also important to remain
focused and avoid distractions during this phase to fully grasp the scope of the problem.
Key Questions I Ask:
 What problem are we trying to solve?
 What are the stakeholders expecting as a solution?
2. Collect the Data (Prepare Phase)
Once the problem is clearly defined, I move to data collection. I gather relevant data from
multiple sources — both internal (like company databases) and external (such as public
datasets or third-party providers).
Types of Data:
 First-party: Data collected directly by the organization.
 Second-party: Shared data from another organization.
 Third-party: Data purchased or collected externally.
Data is commonly collected through surveys, interviews, feedback forms, and stored in
formats like Excel spreadsheets or SQL databases.
3. Clean the Data (Process Phase)
Cleaning is one of the most critical steps. I remove duplicates, correct inconsistencies, and
ensure the data is in the correct format. Clean data is essential for reliable analysis. I also
check for any biases in the data to ensure fair and accurate insights.
Tools I Use:
 Excel functions (e.g., Remove Duplicates, TRIM)
 SQL queries (e.g., filtering NULLs, removing duplicates)
4. Analyze the Data (Analyze Phase)
With clean data, I perform analysis to identify patterns, correlations, and trends. I use tools
like Excel (pivot tables, formulas) and SQL (joins, groupings, aggregations). I may also use
programming languages like Python or R for advanced statistical analysis or automation.
Objective: Find meaningful insights that address the business problem.
5. Visualize the Data (Share Phase)
Visualization helps convey complex data in a simple and understandable form. I use tools
like Tableau or Power BI to create interactive dashboards and charts. These visuals are
especially helpful for non-technical stakeholders.
Goal: Make data easy to understand and insights more impactful.
6. Present the Data (Act Phase)
Finally, I present the data story. This is more than just showing visuals — I interpret the
findings, connect them back to the original problem, and recommend actions. I tailor the
presentation to suit both technical and non-technical audiences, using storytelling to guide
them through the insights.
Output Formats:
Reports
Slide decks
Interactive dashboards
The aim is to support informed decision-making and drive positive business outcomes.
Summary
Each step builds on the previous one — from understanding the problem to delivering
actionable insights. I follow this structured approach to ensure my analysis is accurate,
insightful, and aligned with business goals.
Perform Data Analysis in Excel
Preparing Data for Analysis
How to Clean Data in Excel
1. Remove Duplicates: Use Data > Remove Duplicates to eliminate redundancy.
Also, there are many ways to identify duplicates for checking
 =UNIQUE([array], [by column], [TRUE-[returns items that appear only once] FALSE-[return
every distinct item]])
 DATA -> REMOVE DUPLICATES -> (remove duplicates by column-wise)
 By using CONDITIONAL FORMATTING-> HIGHLIGHT DUPLICATES-> (can highlight
duplicate values by color and remove after checking)
2. Use TRIM and CLEAN Functions:
 TRIM removes unnecessary spaces.
 CLEAN removes non-printable characters (hidden characters, weird characters, also
removes multiple lines text)
3. Sort and Structure Data:
 Use Text to columns wizard option to segregate or split columns based on preferences.

SQL
What is a Database?
A database is a digital system designed for the storage and arrangement of data.
Think of it as an online filing system that allows you to store and quickly access a
vast amount of information. Databases facilitate the efficient management of data,
enabling the simple addition, modification, removal, and access of information. They
serve numerous uses such as websites, applications, and enterprises to manage
extensive data in an organized and secure manner.
What is MySQL?
MySQL is an open-source relational database management system (RDBMS) that
uses Structured Query Language (SQL) to manage data. Developed by MySQL
AB and now owned by Oracle Corporation, it’s widely used due to
its reliability, speed, and ease of use. MySQL is a key component in many web
applications, forming the backbone of popular websites and services.
It allows users to create, modify, and maintain databases, supporting operations like
data insertion, querying, updating, and deletion. Ideal for both small and large-scale
applications, MySQL powers various types of systems, from personal projects to
complex enterprise environments.
SQL is a Structured query language used to access and manipulate data in
databases. SQL is a query language that communicates with databases.

SQL Uses:
SQL's integration with various technologies makes it essential for managing and
querying data in databases. Whether it's in traditional relational databases (RDBMS)
or modern technologies such as machine learning, AI, and blockchain, SQL plays a
key role. It works seamlessly with DBMS (Database Management Systems) to help
users interact with data, whether stored in structured RDBMS or other types of
databases.
Data Science & Analytics: Used for querying large datasets, data cleaning, and
analysis. Analysts use SQL to generate reports and insights that inform business
decisions.

How SQL work?


When you interact with a database, you typically use SQL commands to perform
these operations. These commands are translated into actions by the SQL Engine,
the core component responsible for processing queries. The SQL Engine parses and
compiles SQL queries, optimizing and executing them to interact with the stored data.
Different DBMS tools (like MySQL, SQL Server, etc.) provide an interface and APIs
that users can use to interact with the database. These tools provide a user-friendly
way to write and execute SQL queries, but internally, they rely on their respective
SQL Engines to process these commands.
For example, MySQL uses its own SQL Engine to parse, optimize, and execute
queries, while SQL Server has a different SQL Engine for the same task.

Important Terminologies
These are some important terminologies that are used in terms of relation.
Table STUDENT

ROLL_NO NAME ADDRESS PHONE AGE

1 RAM DELHI 9455123451 18


ROLL_NO NAME ADDRESS PHONE AGE

2 RAMESH GURGAON 9652431543 18

3 SUJIT ROHTAK 9156253131 20

4 SURESH DELHI 9156768971 18

Attribute:Attributes are the properties that define a relation.


e.g.; ROLL_NO, NAME etc.
Tuple:Each row in the relation is known as tuple. The above relation contains 4
tuples, one of which is shown as:
Degree:The number of attributes in the relation is known as degree of the relation.
The STUDENT relation defined above has degree 5.
Cardinality:The number of tuples in a relation is known as cardinality.
The STUDENTrelation defined above has cardinality 4.
Column:Column represents the set of values for a particular attribute. The
column ROLL_NO is extracted from relation STUDENT.

Create Database
The first step to storing the information electronically using SQL includes creating
database. And in this section we will learn how to Create, Select, Drop, and Rename
databases with examples.
 CREATE Database
 DROP Database
 RENAME Database
 SELECT Database
Tables

The cornerstone of any SQL database is the table. Basically, these structure
functions is very similar to spreadsheets, which store data in very organized grid
format. In this section, you will learn how to Create, Drop, Delete, and more related
to Table.
 CREATE TABLE
 DROP TABLE
 RENAME TABLE
 TRUNCATE TABLE
 COPY TABLE
 TEMP TABLE
 ALTER TABLE
CRUD Operations

In this section, you will learn about the SQL Queries like SELECT statement, SELECT
LAST, and more. Explore this section and learn how to use these queries.
 SELECT Statement
 INSERT INTO
 INSERT Multiple Rows
 UPDATE Statement
 DELETE Statement
 DELETE Duplicate Rows
Clauses

Unlock the power of SQL Clauses with this SQL tutorial. Here in this section, you will
learn how to use SELECT, WHERE, JOIN, GROUP BY, and more to query databases
effectively.
 WHERE Clause
 WITH Clause
 HAVING Clause
 ORDER By Clause
 Group By Clause
 LIMIT Clause
 Distinct Clause
 FETCH
 Aliases
Operators
SQL Operators" refers to the fundamental symbols and keywords within the SQL
that enable users to perform various operations and SQL AND, OR, LIKE, NOT, and
more operators on databases. Here, we have discussed all the SQL operators in a
detailed manner with examples.
 AND Operator
 OR Operator
 Logical Operators
 LIKE Operator
 IN Operator
 NOT Operator
 NOT EQUAL Operator
 IS NULL Operator
 UNION Operator
 UNION ALL Operator
 EXCEPT Operator
 BETWEEN Operator
 ALL and ANY
 INTERSECT Operator
 EXISTS Operator
 CASE Operator
Aggregate Functions
Whether you are calculating the total sales revenue for a particular product, finding
the average age of customers, or determining the highest value in a dataset, SQL
Aggregate Functions make these tasks straightforward and manageable.
Aggregate Function
 Count() Function
 SUM() Function
 MIN() Function
 MAX() Function
 AVG() Function
Data Constraints

Constraints act as rules or conditions imposed on the data, dictating what values are
permissible and what actions can be taken. They play a crucial role in maintaining
the quality and coherence of the database by preventing errors. So, explore this
section to get a hand on SQL Data Constraints.
 NOT NULL Constraints
 Primary Key Constraints
 Foreign Key Constraints
 Composite Key
 Unique Constraints
 Alternate Key
 CHECK Constraints
 DEFAULT Constraints
Joining Data

SQL joins serve as the weaver's tool, allowing you to seamlessly merge data from
multiple tables based on common threads. So explore this section to learn how to
use JOIN command.
 JOIN
 Outer Join
 Left Join
 Right Join
 Full Join
 Cross Join
 Self Join
 UPDATE with JOIN
 DELETE JOIN
 Recursive Join
Functions

SQL functions offer an efficient and versatile approach to data analysis. By


leveraging these functions within your queries, you can enhance the depth and
accuracy of your insights, transforming raw data into actionable knowledge.
Date Functions
 String Functions
 Numeric Functions
 Statistical Functions
 JSON Functions
 Conversion Functions
 Datatype Functions
 LTRIM Function
 UPPER Function
 RTRIM Function
Views
Views makes easier for anyone to access the information they need, without getting
bogged down in complicated queries. Views also act like a helpful security guard,
keeping the most sensitive information in the back room, while still allowing access
to what's needed.
 CREATE VIEW
 UPDATE VIEW
 RENAME VIEW
 DELETE VIEW
Indexes
Indexes work by organizing specific columns in a particular order, allowing the
database to quickly pinpoint the information you need. And in this section, we have
listed all the points that one has to learn while learning SQL.
Indexes
 Create Index
 Drop Index
 Show Indexes
 Unique Index
 Clustered Index vs Non-Clustered Index
Subquery

Subqueries allow you to perform nested queries within a larger query, enabling more
complex data retrieval. They help in filtering data or performing operations on data
that would otherwise require multiple queries.
 Subquery
 Correlated Subqueries
 Nested Queries
Miscellaneous Topics

In this miscellaneous section, you will encounter concepts like stored procedures for
automating repetitive tasks, triggers for automated actions based on data changes,
and window functions for complex calculations within a single query.
 Wildcards Operators
 Comments
 Pivot and Unpivot
 Trigger
 Hosting
 Performance Tuning
 Stored Procedures
 Transactions
 Sub Queries
 Using Sequences
 Auto Increment
 Window functions
 Cursors
 Common Table Expressions
 Database Tuning
 Dynamic
 Regular Expressions

SQL commands are categorized based on their specific functionalities:


1. Data Definition Language (DDL),
2. Data Manipulation Language (DML),
3. Data Control Language (DCL),
4. Data Query Language (DQL), and
5. Transaction Control Language (TCL).

1. DDL - Data Definition Language


can be used for defining, altering, and deleting database structures such
as tables, indexes, and schemas. It simply deals with descriptions of the database
schema and is used to create and modify the structure of database objects in the
database.

Command Description Syntax

Create database or its objects CREATE TABLE table_name


CREATE (table, index, function, views, (column1 data_type, column2
store procedure, and triggers) data_type, ...);

Delete objects from the


DROP DROP TABLE table_name;
database

ALTER Alter the structure of the ALTER TABLE table_name ADD


Command Description Syntax

database COLUMN column_name data_type;

Remove all records from a


table, including all spaces
TRUNCATE TRUNCATE TABLE table_name;
allocated for the records are
removed

Add comments to the data COMMENT 'comment_text' ON


COMMENT
dictionary TABLE table_name;

Rename an object existing in RENAME TABLE old_table_name TO


RENAME
the database new_table_name;

2. DQL - Data Query Language


DQL statements are used for performing queries on the data within schema objects.
The purpose of the DQL Command is to get some schema relation based on the
query passed to it. This command allows getting the data out of the database to
perform operations with it. When a SELECT is fired against a table or tables the
result is compiled into a further temporary table, which is displayed.

Command Description Syntax

It is used to retrieve data from SELECT column1, column2, ...FROM


SELECT
the database table_name WHERE condition;

3. DML - Data Manipulation Language


The SQL commands that deal with the manipulation of data present in the
database. It is the component of the SQL statement that controls access to data and
to the database.

Command Description Syntax


Command Description Syntax

INSERT INTO table_name (column1, column2, ...)


INSERT Insert data into a table
VALUES (value1, value2, ...);

Update existing data UPDATE table_name SET column1 = value1, column2


UPDATE
within a table = value2 WHERE condition;

Delete records from a


DELETE DELETE FROM table_name WHERE condition;
database table

Table control
LOCK LOCK TABLE table_name IN lock_mode;
concurrency

Call a PL/SQL or JAVA


CALL CALL procedure_name(arguments);
subprogram

EXPLAIN Describe the access


EXPLAIN PLAN FOR SELECT * FROM table_name;
PLAN path to data

4. DCL - Data Control Language


DCL (Data Control Language) includes commands such
as GRANT and REVOKE which mainly deal with the rights, permissions, and other
controls of the database system. These commands are used to control access to
data in the database by granting or revoking permissions.

Command Description Syntax

Assigns new privileges to a


GRANT privilege_type [(column_list)]
user account, allowing access
GRANT ON [object_type] object_name TO user
to specific database objects, [WITH GRANT OPTION];
actions, or functions.
Command Description Syntax

Removes previously granted


REVOKE [GRANT OPTION FOR]
privileges from a user account,
privilege_type [(column_list)] ON
REVOKE taking away their access to
[object_type] object_name FROM user
certain database objects or [CASCADE];
actions.

5. TCL - Transaction Control Language


Transactions group a set of tasks into a single execution unit. Each transaction
begins with a specific task and ends when all the tasks in the group are successfully
completed. If any of the tasks fail, the transaction fails. Therefore, a transaction has
only two results: success or failure.

Command Description Syntax

BEGIN BEGIN TRANSACTION


Starts a new transaction
TRANSACTION [transaction_name];

Saves all changes made during the


COMMIT COMMIT;
transaction

Undoes all changes made during


ROLLBACK ROLLBACK;
the transaction

Creates a savepoint within the


SAVEPOINT SAVEPOINT savepoint_name;
current transaction

SQL CONSTRAINTS:

SQL constraints are essential elements in relational database design that ensure
the integrity, accuracy, and reliability of the data stored in a database. By enforcing
specific rules on table columns, SQL constraints help maintain data consistency,
preventing invalid data entries and optimizing query performance.
In this article, we will explain the most common SQL constraints in detail, providing
clear examples and explaining how to implement them effectively.
What Are SQL Constraints?
SQL constraints are rules applied to columns or tables in a relational database to
limit the type of data that can be inserted, updated, or deleted. These rules ensure
the data is valid, consistent, and adheres to the business logic or database
requirements. Constraints can be enforced during table creation or later using
the ALTER TABLE statement. They play a vital role in maintaining the quality and
integrity of your database.
Types of SQL Constraints
SQL provides several types of constraints to manage different aspects of data
integrity. These constraints are essential for ensuring that data meets the
requirements of accuracy, consistency, and validity. Let’s go through each of them
with detailed explanations and examples.
1. NOT NULL Constraint
The NOT NULL constraint ensures that a column cannot contain NULL values. This is
particularly important for columns where a value is essential for identifying records
or performing calculations. If a column is defined as NOT NULL, every row must
include a value for that column.
Example:
CREATE TABLE Student
(
ID int(6) NOT NULL,
NAME varchar(10) NOT NULL,
ADDRESS varchar(20)
);
Explanation: In the above example, both the ID and NAME columns are defined with
the NOT NULL constraint, meaning every student must have an ID and NAME value.
2. UNIQUE Constraint
The UNIQUE constraint ensures that all values in a column are distinct across all
rows in a table. Unlike the PRIMARY KEY, which requires uniqueness and does not
allow NULLs, the UNIQUE constraint allows NULL values but still enforces
uniqueness for non-NULL entries.
Example:
CREATE TABLE Student
(
ID int(6) NOT NULL UNIQUE,
NAME varchar(10),
ADDRESS varchar(20)
);

Explanation: Here, the ID column must have unique values, ensuring that no two
students can share the same ID. We can have more than one UNIQUE constraint in a
table.
3. PRIMARY KEY Constraint
A PRIMARY KEY constraint is a combination of the NOT
NULL and UNIQUE constraints. It uniquely identifies each row in a table. A table can
only have one PRIMARY KEY, and it cannot accept NULL values. This is typically
used for the column that will serve as the identifier of records.
Example:
CREATE TABLE Student
(
ID int(6) NOT NULL UNIQUE,
NAME varchar(10),
ADDRESS varchar(20),
PRIMARY KEY(ID)
);

Explanation: In this case, the ID column is set as the primary key, ensuring that each
student’s ID is unique and cannot be NULL.
4. FOREIGN KEY Constraint
A FOREIGN KEY constraint links a column in one table to the primary key in another
table. This relationship helps maintain referential integrity by ensuring that the
value in the foreign key column matches a valid record in the referenced table.
Orders Table:

O_ID ORDER_NO C_ID

1 2253 3

2 3325 3

3 4521 2

4 8532 1
Customers Table:

C_ID NAME ADDRESS

1 RAMESH DELHI

2 SURESH NOIDA

3 DHARMESH GURGAON

As we can see clearly that the field C_ID in Orders table is the primary key in
Customers table, i.e. it uniquely identifies each row in the Customers table.
Therefore, it is a Foreign Key in Orders table.
Example:
CREATE TABLE Orders
(
O_ID int NOT NULL,
ORDER_NO int NOT NULL,
C_ID int,
PRIMARY KEY (O_ID),
FOREIGN KEY (C_ID) REFERENCES Customers(C_ID)
)
Explanation: In this example, the C_ID column in the Orders table is a foreign key that
references the C_ID column in the Customers table. This ensures that only valid
customer IDs can be inserted into the Orders table.
5. CHECK Constraint
The CHECK constraint allows us to specify a condition that data must satisfy before
it is inserted into the table. This can be used to enforce rules, such as ensuring that a
column’s value meets certain criteria (e.g., age must be greater than 18)
Example:
CREATE TABLE Student
(
ID int(6) NOT NULL,
NAME varchar(10) NOT NULL,
AGE int NOT NULL CHECK (AGE >= 18)
);
Explanation: In the above table, the CHECK constraint ensures that only students
aged 18 or older can be inserted into the table.
6. DEFAULT Constraint
The DEFAULT constraint provides a default value for a column when no value is
specified during insertion. This is useful for ensuring that certain columns always
have a meaningful value, even if the user does not provide one. This helps maintain
consistent data and simplifies data entry.
Example:
CREATE TABLE Student
(
ID int(6) NOT NULL,
NAME varchar(10) NOT NULL,
AGE int DEFAULT 18
);
Explanation: Here, if no value is provided for AGE during an insert, the default value
of 18 will be assigned automatically.
How to Specify Constraints in SQL
Constraints can be specified during the table creation process using the CREATE
TABLE statement. Additionally, constraints can be modified or added to existing
tables using the ALTER TABLE statement.
Syntax for Creating Constraints:
CREATE TABLE table_name
(
column1 data_type [constraint_name],
column2 data_type [constraint_name],
column3 data_type [constraint_name],
...
);
We can also add or remove constraints after a table is created:
Example to Add a Constraint:
ALTER TABLE Student
ADD CONSTRAINT unique_student_id UNIQUE (ID);
Conclusion
SQL constraints are essential for maintaining data integrity and ensuring
consistency in relational databases. Understanding and implementing these
constraints effectively will help in designing robust, error-free databases. By
leveraging NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK, DEFAULT,
and INDEX, you can ensure your database is optimized
for accuracy and performance.

Questions:
1. What is the difference between CHAR and VARCHAR2 data types?

CHAR: Fixed-length storage. If the defined length is not fully used, it is padded with spaces.
VARCHAR2: Variable-length storage. Only the actual data is stored, saving space when
the full length is not needed.

2. What is a primary key?


A primary key is a unique identifier for each record in a table. It ensures that no
two rows have the same value in the primary key column(s), and it does not allow
NULL values.

3. What is a foreign key?


A foreign key is a column (or set of columns) in one table that refers to the
primary key in another table. It establishes and enforces a relationship between
the two tables, ensuring data integrity.

4. What is normalization in databases?


Normalization is the process of organizing data in a database to reduce
redundancy and improve data integrity. This involves dividing large tables into
smaller, related tables and defining relationships between them to ensure
consistency and avoid anomalies.

5. What is a query in SQL?


A query is a SQL statement used to retrieve, update, or manipulate data in
a database.

6. What is a view in SQL?


A view is a virtual table created by a SELECT query. It does not store data itself,
but presents data from one or more tables in a structured way. Views simplify
complex queries, improve readability, and enhance security by restricting access
to specific rows or columns.
7. What is a subquery?
A subquery is a query nested within another query. It is often used in the WHERE
clause to filter data based on the results of another query, making it easier to
handle complex conditions.

8. What is the difference between the WHERE and HAVING clauses?


WHERE: Filters rows before any grouping takes place.
HAVING: Filters grouped data after the GROUP BY clause has been applied.
In short, WHERE applies to individual rows, while HAVING applies to groups.

9. What are indexes, and why are they used?


Indexes are database objects that improve query performance by allowing faster
retrieval of rows. They function like a book’s index, making it quicker to find
specific data without scanning the entire table. However, indexes
require additional storage and can slightly slow down data
modification operations.

10. What is the difference between DELETE and TRUNCATE commands?


DELETE: Removes rows one at a time and records each deletion in the transaction
log, allowing rollback. It can have a WHERE clause.
TRUNCATE: Removes all rows at once without logging individual row deletions.
It cannot have a WHERE clause and is faster than DELETE for large data sets.

11. What are the differences between SQL and NoSQL databases?
SQL Databases:
Use structured tables with rows and columns.
Rely on a fixed schema.
Offer ACID properties.
NoSQL Databases:
Use flexible, schema-less structures (e.g., key-value pairs, document stores).
Are designed for horizontal scaling.
Often focus on performance and scalability over strict consistency.
12. What is a trigger in SQL?
A trigger is a set of SQL statements that automatically execute in response to
certain events on a table, such as INSERT, UPDATE, or DELETE. Triggers help
maintain data consistency, enforce business rules, and implement complex
integrity constraints.

13. What is the purpose of the ALTER command in SQL?


The ALTER command is used to modify the structure of an existing database
object. This command is essential for adapting our database schema as
requirements evolve.
Add or drop a column in a table.
Change a column’s data type.
Add or remove constraints.
Rename columns or tables.
Adjust indexing or storage settings.

14. What is a composite primary key?


A composite primary key is a primary key made up of two or more columns.
Together, these columns must form a unique combination for each row in the
table. It’s used when a single column isn’t sufficient to uniquely identify a record.
Example:
Consider an Orders table where OrderID and ProductID together uniquely identify
each record because multiple orders might include the same product, but not
within the same order.
CREATE TABLE OrderDetails (
OrderID INT,
ProductID INT,
Quantity INT,
PRIMARY KEY (OrderID, ProductID)
);

15. How is data integrity maintained in SQL databases?


Data integrity refers to the accuracy, consistency, and reliability of the data
stored in the database. SQL databases maintain data integrity through several
mechanisms:
Constraints: Ensuring that certain conditions are always met. For example, NOT
NULL ensures a column cannot have missing values, FOREIGN KEY ensures a valid
relationship between tables, and UNIQUE ensures no duplicate values.
Transactions: Ensuring that a series of operations either all succeed or all fail,
preserving data consistency.
Triggers: Automatically enforcing rules or validations before or after changes to
data.
Normalization: Organizing data into multiple related tables to minimize
redundancy and prevent anomalies.
These measures collectively ensure that the data remains reliable and meaningful
over time.
16. What are the advantages of using stored procedures?
Improved Performance: Stored procedures are precompiled and cached in the
database, making their execution faster than sending multiple individual queries.
Reduced Network Traffic: By executing complex logic on the server, fewer round
trips between the application and database are needed.
Enhanced Security: Stored procedures can restrict direct access to underlying
tables, allowing users to execute only authorized operations.
Reusability and Maintenance: Once a procedure is written, it can be reused
across multiple applications. If business logic changes, you only need to update
the stored procedure, not every application that uses it.

17. What is the difference between UNION and UNION ALL?


UNION: Combines result sets from two queries and removes duplicate rows,
ensuring only unique records are returned.
UNION ALL: Combines the result sets without removing duplicates, meaning all
records from both queries are included.
Performance-wise, UNION ALL is faster than UNION because it doesn’t perform the
additional operation of eliminating duplicates.
18. What are scalar functions in SQL?
Scalar functions operate on individual values and return a single value as a result.
They are often used for formatting or converting data. Common examples include:
LEN(): Returns the length of a string.
ROUND(): Rounds a numeric value.
CONVERT(): Converts a value from one data type to another.
Example:
SELECT LEN('Example') AS StringLength;

19. What is the difference between the NVL and NVL2 functions?
NVL(): Replaces a NULL value with a specified replacement
value. Example: NVL(Salary, 0) will replace NULL with 0.
NVL2(): Evaluates two arguments:
If the first argument is NOT NULL, returns the second argument.
If the first argument is NULL, returns the third argument.
Example:
SELECT NVL(Salary, 0) AS AdjustedSalary FROM Employees; -- Replaces NULL with 0

SELECT NVL2(Salary, Salary, 0) AS AdjustedSalary FROM Employees; -- If Salary is NULL,


returns 0; otherwise, returns Salary.

20. How does the RANK() function differ from DENSE_RANK()?


RANK(): Assigns a rank to each row, with gaps if there are ties.
DENSE_RANK(): Assigns consecutive ranks without any gaps.
Example:
SELECT Name, Salary, RANK() OVER (ORDER BY Salary DESC) AS Rank
FROM Employees;

If two employees have the same salary, they get the same rank, but RANK() will
skip a number for the next rank, while DENSE_RANK() will not.

21. What is the difference between ROW_NUMBER() and RANK()?


ROW_NUMBER(): Assigns a unique number to each row regardless of ties.
RANK(): Assigns the same number to tied rows and leaves gaps for subsequent
ranks.
Example:
SELECT Name, ROW_NUMBER() OVER (ORDER BY Salary DESC) AS RowNum
FROM Employees;

22. What are common table expressions (CTEs) in SQL?


A CTE is a temporary result set defined within a query. It improves query
readability and can be referenced multiple times.
Example:
WITH TopSalaries AS (
SELECT Name, Salary
FROM Employees
WHERE Salary > 50000
)
SELECT * FROM TopSalaries WHERE Name LIKE 'A%';

23. What are window functions, and how are they used?
Window functions allow you to perform calculations across a set of table rows
that are related to the current row within a result set, without collapsing the
result set into a single row. These functions can be used to compute running
totals, moving averages, rank rows, etc.
Example: Calculating a running total
SELECT Name, Salary,
SUM(Salary) OVER (ORDER BY Salary) AS RunningTotal
FROM Employees;

24. How does indexing improve query performance?


Indexing allows the database to locate and access the rows corresponding to
a query condition much faster than scanning the entire table. Instead of reading
each row sequentially, the database uses the index to jump directly to the
relevant data pages. This reduces the number of disk I/O operations and speeds
up query execution, especially for large tables.
Example:
CREATE INDEX idx_lastname ON Employees(LastName);
SELECT * FROM Employees WHERE LastName = 'Smith';

The index on LastName lets the database quickly find all rows matching ‘Smith’
without scanning every record.

25. What are the trade-offs of using indexes in SQL databases?


Advantages
Faster query performance, especially for SELECT queries with WHERE clauses,
JOIN conditions, or ORDER BY clauses.
Improved sorting and filtering efficiency.
Disadvantages:
Increased storage space for the index structures.
Additional overhead for write operations (INSERT, UPDATE, DELETE), as indexes
must be updated whenever the underlying data changes.
Potentially slower bulk data loads or batch inserts due to the need to maintain
index integrity.
In short, indexes make read operations faster but can slow down write operations
and increase storage requirements.

26. What is the difference between clustered and non-clustered indexes?


1. Clustered Index:
Organizes the physical data in the table itself in the order of the indexed
column(s).
A table can have only one clustered index.
Improves range queries and queries that sort data.
Example: If EmployeeID is the clustered index, the rows in the table are stored
physically sorted by EmployeeID.
2. Non-Clustered Index:
Maintains a separate structure that contains a reference (or pointer) to the
physical data in the table.
A table can have multiple non-clustered indexes.
Useful for specific query conditions that aren’t related to the primary ordering of
the data.
Example: A non-clustered index on LastName allows fast lookups by last name
even if the table is sorted by another column.

27. What is a materialized view, and how does it differ from a standard
view?
Standard View:
A virtual table defined by a query.
Does not store data; the underlying query is executed each time the view is
referenced.
A standard view shows real-time data.
Materialized View:
A physical table that stores the result of the query.
Data is precomputed and stored, making reads faster.
Requires periodic refreshes to keep data up to date.
materialized view is used to store aggregated sales data, updated nightly, for fast
reporting.

28. What is a sequence in SQL?


A sequence is a database object that generates a series of unique numeric values.
It’s often used to produce unique identifiers for primary keys or other columns
requiring sequential values.
Example:
CREATE SEQUENCE seq_emp_id START WITH 1 INCREMENT BY 1;
SELECT NEXT VALUE FOR seq_emp_id; -- Returns 1
SELECT NEXT VALUE FOR seq_emp_id; -- Returns 2

29. What is the purpose of the SQL MERGE statement?


The MERGE statement combines multiple operations INSERT, UPDATE, and
DELETE into one. It is used to synchronize two tables by:
Inserting rows that don’t exist in the target table.
Updating rows that already exist.
Deleting rows from the target table based on conditions
Example:
MERGE INTO TargetTable T
USING SourceTable S
ON T.ID = S.ID
WHEN MATCHED THEN
UPDATE SET T.Value = S.Value
WHEN NOT MATCHED THEN
INSERT (ID, Value) VALUES (S.ID, S.Value);

30. How can you handle duplicates in a query without using DISTINCT?
1. GROUP BY: Aggregate rows to eliminate duplicates
SELECT Column1, MAX(Column2)
FROM TableName
GROUP BY Column1;
2. ROW_NUMBER(): Assign a unique number to each row and filter by that
WITH CTE AS (
SELECT Column1, Column2, ROW_NUMBER() OVER (PARTITION BY Column1 ORDER
BY Column2) AS RowNum
FROM TableName
)
SELECT * FROM CTE WHERE RowNum = 1;

31. What are the best practices for writing optimized SQL queries?
1. Write Simple, Clear Queries:
Avoid overly complex joins and subqueries.
Use straightforward, well-structured SQL that is easy to read and maintain.
2. Filter Data Early:
Apply WHERE clauses as early as possible to reduce the amount of data
processed.
Consider using indexed columns in WHERE clauses for faster lookups.
3. **Avoid SELECT *:
Retrieve only the columns needed. This reduces I/O and improves performance.
4. Use Indexes Wisely:
Create indexes on columns that are frequently used in WHERE clauses, JOIN
conditions, and ORDER BY clauses.
Regularly review index usage and remove unused indexes.
5. Leverage Query Execution Plans:
Use execution plans to identify bottlenecks, missing indexes, or inefficient query
patterns.
6. Use Appropriate Join Types:
Choose INNER JOIN, LEFT JOIN, or OUTER JOIN based on the data relationships
and performance requirements.
7. Break Down Complex Queries:
Instead of a single monolithic query, use temporary tables or CTEs to process data
in stages.
8. Optimize Aggregations:
Use GROUP BY and aggregate functions efficiently.
Consider pre-aggregating data if queries frequently require the same
computations.
9. Monitor Performance Regularly:
Continuously analyze query performance and fine-tune as data volumes grow or
usage patterns change.
32. What is the purpose of the SQL PIVOT operator?
The PIVOT operator transforms rows into columns, making it easier to summarize or
rearrange data for reporting.
Example:
Converting a dataset that lists monthly sales into a format that displays each month as a
separate column.
SELECT ProductID, [2021], [2022]
FROM (
SELECT ProductID, YEAR(SaleDate) AS SaleYear, Amount
FROM Sales
) AS Source
PIVOT (
SUM(Amount)
FOR SaleYear IN ([2021], [2022])
) AS PivotTable;

Query Based SQL Interview Questions

33. Write a query to find the second-highest salary of an employee in a table.


SELECT MAX(Salary) AS SecondHighestSalary
FROM Employee
WHERE Salary < (SELECT MAX(Salary) FROM Employee);
Explanation:
This query identifies the second-highest salary by selecting the maximum salary
that is less than the overall highest salary. The subquery determines the top
salary, while the outer query finds the next highest value.
Write a query to retrieve employees who earn more than the average salary.
SELECT *
FROM Employee
WHERE Salary > (SELECT AVG(Salary) FROM Employee);

Explanation:
This query fetches details of employees whose salary exceeds the average salary.
The subquery calculates the average salary, and the main query filters rows based
on that result.
34. Write a query to fetch the duplicate values from a column in a table.
SELECT ColumnName, COUNT(*)
FROM TableName
GROUP BY ColumnName
HAVING COUNT(*) > 1;

Explanation:
The query uses GROUP BY to group identical values and HAVING COUNT(*) > 1 to
identify values that appear more than once in the specified column.
84. Write a query to find the employees who joined in the last 30 days.
SELECT *
FROM Employee
WHERE JoiningDate > DATE_SUB(CURDATE(), INTERVAL 30 DAY);

Explanation:
By comparing the JoiningDate to the current date minus 30 days, this query retrieves
all employees who joined within the last month.
35. Write a query to fetch top 3 earning employees.
SELECT *
FROM Employee
ORDER BY Salary DESC
LIMIT 3;

Explanation:
The query sorts employees by salary in descending order and uses LIMIT 3 to
return only the top three earners.
36. Write a query to delete duplicate rows in a table without using the ROWID
keyword.
DELETE FROM Employee
WHERE EmployeeID NOT IN (
SELECT MIN(EmployeeID)
FROM Employee
GROUP BY Column1, Column2
);

Explanation:
This query retains only one row for each set of duplicates by keeping the row with
the smallest EmployeeID. It identifies duplicates using GROUP BY and removes rows
not matching the minimum ID.
87. Write a query to fetch common records from two tables.
SELECT *
FROM TableA
INNER JOIN TableB ON TableA.ID = TableB.ID;
Explanation:
An INNER JOIN is used to find rows present in both tables by matching a common
column (in this case, ID).
37. Write a query to fetch employees whose names start and end with ‘A’.
SELECT *
FROM Employee
WHERE Name LIKE 'A%' AND Name LIKE '%A';
Explanation:
The query uses LIKE with wildcard characters to filter rows where the Name column
starts and ends with the letter 'A'.
89. Write a query to display all departments along with the number of employees in each.
SELECT DepartmentID, COUNT(*) AS EmployeeCount
FROM Employee
GROUP BY DepartmentID;
Explanation:
By grouping employees by their DepartmentID and counting rows in each group, the
query produces a list of departments along with the employee count.
90. Write a query to find employees who do not have managers.
SELECT *
FROM Employee
WHERE ManagerID IS NULL;
Explanation:
This query selects employees whose ManagerID column is NULL, indicating they
don’t report to a manager.
38. Write a query to fetch the 3rd and 4th highest salaries.
WITH SalaryRank AS (
SELECT Salary, RANK() OVER (ORDER BY Salary DESC) AS Rank
FROM Employee
)
SELECT Salary
FROM SalaryRank
WHERE Rank IN (3, 4);

Explanation:
This query uses the RANK() window function to rank the salaries in descending
order. The outer query then selects the 3rd and 4th highest salaries by filtering for
those ranks.
39. Write a query to transpose rows into columns.
SELECT
MAX(CASE WHEN ColumnName = 'Condition1' THEN Value END) AS Column1,
MAX(CASE WHEN ColumnName = 'Condition2' THEN Value END) AS Column2
FROM TableName;

Explanation:
This query converts specific row values into columns using conditional
aggregation with CASE. Each column’s value is determined based on a condition
applied to rows.
40. Write a query to fetch records updated within the last hour.
SELECT *
FROM TableName
WHERE UpdatedAt >= NOW() - INTERVAL 1 HOUR;

Explanation:
By comparing the UpdatedAt timestamp to the current time minus one hour, the
query retrieves rows updated in the last 60 minutes.
41. Write a query to list employees in departments that have fewer than 5
employees.
SELECT *
FROM Employee
WHERE DepartmentID IN (
SELECT DepartmentID
FROM Employee
GROUP BY DepartmentID
HAVING COUNT(*) < 5
);

Explanation:
The subquery counts employees in each department, and the main query uses
those results to find employees working in departments with fewer than 5
members.
42. Write a query to check if a table contains any records.
SELECT CASE
WHEN EXISTS (SELECT * FROM TableName) THEN 'Has Records'
ELSE 'No Records'
END AS Status;
Explanation:
The query uses EXISTS to determine if any rows exist in the table, returning a
status of 'Has Records' or 'No Records' based on the result.
43. Write a query to find employees whose salaries are higher than their
managers.
SELECT e.EmployeeID, e.Salary
FROM Employee e
JOIN Employee m ON e.ManagerID = m.EmployeeID
WHERE e.Salary > m.Salary;
Explanation:
This query joins the Employee table with itself to compare employee salaries to
their respective managers’ salaries, selecting those who earn more.
44. Write a query to fetch alternating rows from a table.
WITH RowNumbered AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNum
FROM Employee
)
SELECT *
FROM RowNumbered
WHERE RowNum % 2 = 0;
Explanation:
This query assigns a sequential number to each row using ROW_NUMBER(), then
selects rows where the row number is even, effectively fetching alternating rows.
The ORDER BY (SELECT NULL) is used to avoid any specific ordering and just apply a
sequential numbering.
45. Write a query to find departments with the highest average salary.
SELECT DepartmentID
FROM Employee
GROUP BY DepartmentID
ORDER BY AVG(Salary) DESC
LIMIT 1;
Explanation:
Grouping by DepartmentID and ordering by the average salary in descending order,
the query returns the department with the highest average.
46. Write a query to fetch the nth record from a table.
WITH OrderedEmployees AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNum
FROM Employee
)
SELECT *
FROM OrderedEmployees
WHERE RowNum = n;
Explanation:
This query uses ROW_NUMBER() to generate a sequential number for each row.
The outer query then retrieves the row where the number matches the
desired nth position. The approach is portable across most databases.
47. Write a query to find employees hired in the same month of any year.
SELECT *
FROM Employee
WHERE MONTH(JoiningDate) = MONTH(CURDATE());
Explanation:
By comparing the month of JoiningDate to the current month, the query selects all
employees who were hired in that month regardless of the year.

You might also like