Data - Analytics - Interview - Q and A
MOHD MUJTABA
DATA ANALYTICS INTERVIEW
QUESTIONS AND ANSWERS
Q1 What is Data Analysis?
Data analysis is the process of inspecting, modeling, and interpreting data to draw
insights or conclusions. With the insights gained, informed decisions can be made. It is
used in nearly every industry, which is why data analysts are in high demand. A Data Analyst's
main responsibility is to explore large amounts of data and search for hidden
insights. By interpreting a wide range of data, data analysts help organizations
understand the current state of the business.
• Collect Data: The data is collected from a variety of sources and is then stored to be
cleaned and prepared. This step involves removing all missing values and outliers.
• Analyse Data: As soon as the data is prepared, the next step is to analyze it. Improvements
are made by running a model repeatedly. Following that, the model is validated to ensure
that it is meeting the requirements.
• Create Reports: In the end, the model is implemented, and reports are generated as well as
distributed to stakeholders.
Q5 What are the different challenges one faces during data analysis?
While analyzing data, a Data Analyst can encounter the following issues:
• Duplicate entries and spelling errors. Data quality can be hampered and reduced by these
errors.
• The representation of data obtained from multiple sources may differ. It may cause a delay
in the analysis process if the collected data are combined after being cleaned and
organized.
• Another major challenge in data analysis is incomplete data. This would invariably lead to
errors or faulty results.
• You would have to spend a lot of time cleaning the data if you are extracting data from a
poor source.
• Business stakeholders' unrealistic timelines and expectations
• Data blending/ integration from multiple sources is a challenge, particularly if there are no
consistent parameters and conventions
• Insufficient data architecture and tools to achieve the analytics goals on time.
1. R and Python
2. Microsoft Excel
3. Tableau
4. RapidMiner
5. KNIME
6. Power BI
7. Apache Spark
8. QlikView
9. Talend
10. Splunk
Data Profiling Process: It generally involves analyzing the data's individual attributes. In
this case, the emphasis is on providing useful information on data attributes such as data
type, frequency, etc. Additionally, it also facilitates the discovery and evaluation of
enterprise metadata.
Data Mining | Data Profiling
It involves analyzing a pre-built database to identify patterns. | It involves analyses of raw data from existing datasets.
It also analyzes existing databases and large datasets to convert raw data into useful information. | In this, statistical or informative summaries of the data are collected.
It usually involves finding hidden patterns and seeking out new, useful, and non-trivial data to generate useful information. | It usually involves the evaluation of data sets to ensure consistency, uniqueness, and logic.
Data mining is incapable of identifying inaccurate or incorrect data values. | In data profiling, erroneous data is identified during the initial stage of analysis.
Classification, regression, clustering, summarization, estimation, and description are some primary data mining tasks that need to be performed. | This process involves using discoveries and analytical methods to gather statistics or summaries about the data.
Q11 What are the ways to detect outliers? Explain different ways to deal with it.
Outliers are commonly detected using two methods:
• Box Plot Method: According to this method, a value is considered an outlier if it lies more
than 1.5*IQR (interquartile range) above the top quartile (Q3) or more than 1.5*IQR below
the bottom quartile (Q1), that is, if it is above Q3 + 1.5*IQR or below Q1 - 1.5*IQR.
• Standard Deviation Method: According to this method, an outlier is defined as a value
that is greater than the mean + (3*standard deviation) or lower than the mean - (3*standard deviation).
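The two rules above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample values are made up for the example, the quartiles come from the standard library's statistics.quantiles (other tools may compute quartiles slightly differently), and the population standard deviation is assumed for the second rule.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Box plot rule: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

def sigma_outliers(values, k=3.0):
    """Standard deviation rule: flag values more than k std devs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumption)
    return [v for v in values if abs(v - mean) > k * sd]

# Hypothetical sample with one obviously extreme value.
sample = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 95]
print(iqr_outliers(sample))
print(sigma_outliers(sample))
```

Note that the extreme value itself inflates the mean and the standard deviation, so on small samples the 3-sigma rule can miss outliers that the box plot rule catches.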
The above image illustrates how data usually tend to be distributed around a central value
with no bias on either side. In addition, the random variables are distributed according to
symmetrical bell-shaped curves.
Here, X represents an individual data point, U represents the average of multiple data
points, and N represents the total number of data points.
Here, X represents the independent variable, Y represents the dependent variable, x-bar
represents the mean of the X, y-bar represents the mean of the Y, and N represents the
total number of data points in the sample.
Q27 Name the statistical methods that are highly beneficial for data analysts?
Accurate predictions and reliable results depend on choosing the right statistical
methods for the analysis. The methods most commonly used by analysts
across a variety of tasks include:
• Bayesian method
• Markov process
• Simplex algorithm
• Imputation
• Spatial and cluster processes
• Rank statistics, percentile, outliers detection
• Mathematical optimization
In addition to these, data analysts use several types of data analysis:
1. Descriptive
2. Inferential
3. Differences
4. Associative
5. Predictive
Q28 What's the difference between a data lake and a data warehouse?
Data storage is a major concern. Companies that use big data have been in the news a lot
lately, as they try to maximize its potential. For everyday purposes, data storage is usually
handled by traditional databases. For storing, managing, and analyzing big data, companies
use data warehouses and data lakes.
Q30 Name the different data validation methods used by data analysts.
There are many ways to validate datasets. Some of the most commonly used data validation
methods by Data Analysts include:
• Field Level Validation – In this method, data validation is done in each field as and when a
user enters the data. It helps to correct the errors as you go.
• Form Level Validation – In this method, the data is validated after the user completes the
form and submits it. It checks the entire data entry form at once, validates all the fields in it,
and highlights the errors (if any) so that the user can correct it.
• Data Saving Validation – This data validation technique is used during the process of
saving an actual file or database record. Usually, it is done when multiple data entry forms
must be validated.
• Search Criteria Validation – This validation technique is used to offer the user accurate
and related matches for their searched keywords or phrases. The main purpose of this
validation method is to ensure that the user’s search queries can return the most relevant
results.
Q33 What are the problems that a Data Analyst can encounter while performing data
analysis?
A critical data analyst interview question you need to be aware of. A Data Analyst can
confront the following issues while performing data analysis:
• Presence of duplicate entries and spelling mistakes. These errors can hamper data quality.
• Poor quality data acquired from unreliable sources. In such a case, a Data Analyst will have
to spend a significant amount of time in cleansing the data.
• Data extracted from multiple sources may vary in representation. Once the collected data is
combined after being cleansed and organized, the variations in data representation may
cause a delay in the analysis process.
• Incomplete data is another major challenge in the data analysis process. It would inevitably
lead to erroneous or faulty results.
Uses data aggregation and data mining techniques | Uses statistical models and forecasting techniques | Uses simulation algorithms and optimization techniques to advise possible outcomes
Example: An ice cream company can analyze how much ice cream was sold, which flavors were sold, and whether more or less ice cream was sold than the day before | Example: An ice cream company can analyze how much ice cream was sold, which flavors were sold, and whether more or less ice cream was sold than the day before | Example: Lower prices to increase the sale of ice creams, produce more/fewer quantities of a specific flavor of ice cream
Bivariate analysis involves the analysis of two variables to find causes, relationships,
and correlations between the variables.
Example – Analyzing the sale of ice creams based on the temperature outside.
Bivariate analysis can be carried out using Correlation coefficients, Linear regression,
Logistic regression, Scatter plots, and Box plots.
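As a small illustration of bivariate analysis, the sketch below computes the Pearson correlation coefficient between two variables. The temperature and sales figures are invented for the example and the function name pearson_r is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical daily observations: outside temperature and ice creams sold.
temperature_c = [18, 21, 24, 27, 30, 33]
ice_creams_sold = [120, 135, 160, 190, 230, 260]

r = pearson_r(temperature_c, ice_creams_sold)
print(round(r, 3))
```

A value of r close to +1 indicates a strong positive linear relationship; it does not by itself show that temperature causes the sales.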
Multivariate analysis involves the analysis of three or more variables to understand the
relationship of each variable with the other variables.
Example – Analysing Revenue based on expenditure.
Multivariate analysis can be performed using Multiple regression, Factor analysis,
Classification & regression trees, Cluster analysis, Principal component analysis, Dual-axis
charts, etc.
EXCEL INTERVIEW QUESTIONS
Q4 What is the function to find the day of the week for a particular date value?
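To get the day of the week, you can use the WEEKDAY() function. With the default return
type, it numbers the days from 1 (Sunday) to 7 (Saturday); with a return type of 2, it
numbers them from 1 (Monday) to 7 (Sunday).
The above function will return 6 as the result. With a return type of 2, this means that
17th December is a Saturday (with the default return type, a Saturday would return 7).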
need to find things in a table or a range by row.
Q5 What function would you use to get the current date and time in Excel?
In Excel, you can use the TODAY() function to get the current date and the NOW() function to get the current date and time.
Q6 Using the below sales table, calculate the total quantity sold by sales
representatives whose name starts with A, and the cost of each item they have sold is
greater than 10.
You can use the SUMIFS() function to find the total quantity.
For the Sales Rep column, you need to give the criteria as “A*” - meaning the name should
start with the letter “A”. For the Cost each column, the criteria should be “>10” - meaning the
cost of each item is greater than 10.
Q7 Using the data given below, create a pivot table to find the total sales made by
each sales representative for each item. Display the sales as % of the grand total.
• Select the entire table range, click on the Insert tab and choose PivotTable
• Select the table range and the worksheet where you want to place the pivot table
• Drag Sale total on to Values, and Sales Rep and Item on to Row Labels. It will give the
sum of sales made by each representative for every item they have sold.
• Right-click on “Sum of Sale Total” and expand Show Values As to select % of Grand
Total.
• Column Area: The headings above the values area make up the column area.
• Filter Area: Using this filter you may drill down in the data set.
• If you wanted to find the department to which Stuart belongs, you could use the
VLOOKUP function as shown below:
• Here, cell A11 has the lookup value, A2:E7 is the table array, 3 is the column index
number with information about departments, and 0 is the range lookup (0 or FALSE requests an exact match).
• If you hit enter, it will return “Marketing”, indicating that Stuart is from the marketing
department
Q11 What Is VLOOKUP?
VLOOKUP is a built-in function in Excel that searches the first column of a table for a
value and returns data from another column in the same row.
For instance, say you have a table of employee information that includes (from column A
onward) employee ID, employee name, start date, hours per week, and salary. With
VLOOKUP you can specify a value to find in the first column (e.g., an employee ID) and look
up corresponding data from other columns, like the salary of the employee with that
employee ID.
Q13 What Is the Default Value of the Last Parameter of VLOOKUP?
If the last parameter is not specified via TRUE or FALSE, the return value will default to
TRUE (approximate), and show an approximate match for your request.
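To make the difference between the two modes concrete, here is a rough Python sketch of the lookup logic. It is not Excel's implementation: the vlookup function and the employee rows are hypothetical, and the approximate mode assumes, as Excel does, that the first column is sorted in ascending order.

```python
from bisect import bisect_right

def vlookup(lookup_value, table, col_index, range_lookup=True):
    """Mimic VLOOKUP on a list of rows; col_index is 1-based like in Excel."""
    keys = [row[0] for row in table]
    if range_lookup:
        # Approximate match: largest key <= lookup_value (first column sorted).
        pos = bisect_right(keys, lookup_value) - 1
        if pos < 0:
            return None  # Excel would show #N/A here
        return table[pos][col_index - 1]
    # Exact match: first row whose key equals the lookup value.
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None  # Excel would show #N/A here

# Hypothetical employee table: (employee ID, name, salary), sorted by ID.
employees = [
    (101, "Asha", 52000),
    (105, "Stuart", 61000),
    (110, "Lee", 58000),
]

print(vlookup(105, employees, 3, range_lookup=False))  # exact match on ID 105
print(vlookup(107, employees, 2))                       # approximate: falls back to ID 105
print(vlookup(107, employees, 2, range_lookup=False))  # exact: no such ID
```

Because the default is an approximate match, a lookup for a missing ID can silently return a neighbouring row, which is why FALSE (or 0) is usually passed when an exact match is required.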
SQL INTERVIEW QUESTIONS
Q1 What is Data?
Data is a collection of distinct small units of information. It can take a variety of forms,
like text, numbers, media, bytes, etc., and it can be stored on paper or in electronic memory.
The word 'data' originates from the Latin word 'datum', which means 'a single piece of information'. 'Data' is
the plural of 'datum'.
In computing, data is information that has been translated into a form that is efficient for movement and
processing.
Q2 What is Database?
A database is an organized collection of data, stored and retrieved digitally from a remote or
local computer system. Databases can be vast and complex, and such databases are
developed using fixed design and modeling approaches.
A database is a systematic collection of data. They support electronic storage and
manipulation of data. Databases make data management easy.
Let us discuss a database example: An online telephone directory uses a database to store
data of people, phone numbers, and other contact details. Your electricity service provider
uses a database to manage billing, client-related issues, handle fault data, etc.
Let us also consider Facebook. It needs to store, manipulate, and present data related to
members, their friends, member activities, messages, advertisements, and a lot more. We
can provide a countless number of examples for the usage of databases.
Q3 What is a Datawarehouse?
A data warehouse is a central repository of data where the data is assembled from
multiple sources of information. The data is consolidated, transformed, and made available
for mining as well as online processing. Subsets of the warehouse data are called
data marts.
Q4 What is DBMS?
DBMS stands for Database Management System. A DBMS is system software responsible
for the creation, retrieval, updating, and management of the database. It ensures that our
data is consistent, organized, and easily accessible by serving as an interface between
the database and its end-users or application software.
DBMS | RDBMS
DBMS does not apply any security with regard to data manipulation. | RDBMS defines integrity constraints for the purpose of the ACID (Atomicity, Consistency, Isolation, and Durability) properties.
DBMS uses a file system to store data, so there will be no relation between the tables. | In RDBMS, data values are stored in the form of tables, so a relationship between these data values will be stored in the form of a table as well.
Q6 What is SQL?
SQL stands for Structured Query Language. It is the standard language for relational
database management systems. It is especially useful in handling organized data comprised
of entities (variables) and relations between different entities of the data.
SQL | MySQL
SQL follows a simple standard format without many or regular updates. | MySQL has numerous variants and gets frequent updates.
SQL does not allow other processors or even its own binaries to manipulate data during execution. | MySQL is less secure than SQL, as it allows third-party processors to manipulate data files during execution.
Q8 What are the subsets of SQL?
Q9 What is the purpose of DML Language?
Data manipulation language (DML) enables the user to retrieve and manipulate data in a
relational database. DML commands perform both read and write operations on data. We
can perform the following operations using DML:
o Insert data into the database through the INSERT command.
o Retrieve data from the database through the SELECT command.
o Update data in the database through the UPDATE command.
o Delete data from the database through the DELETE command.
Example
INSERT INTO Student VALUES (111, 'George', 'Computer Science')
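The four DML commands can be tried end to end with Python's built-in sqlite3 module. This is a minimal sketch: the Student table layout (id, name, department) and the second student are assumptions made for the example, since the original only shows the INSERT statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE Student (id INTEGER PRIMARY KEY, name TEXT, department TEXT)"
)

# INSERT: add rows.
conn.execute("INSERT INTO Student VALUES (111, 'George', 'Computer Science')")
conn.execute("INSERT INTO Student VALUES (112, 'Maria', 'Physics')")

# SELECT: read rows back.
after_insert = conn.execute(
    "SELECT id, name, department FROM Student ORDER BY id"
).fetchall()

# UPDATE: change an existing row.
conn.execute("UPDATE Student SET department = 'Mathematics' WHERE id = 112")

# DELETE: remove a row.
conn.execute("DELETE FROM Student WHERE id = 111")

final_rows = conn.execute(
    "SELECT id, name, department FROM Student ORDER BY id"
).fetchall()
print(after_insert)
print(final_rows)
```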
Q15 What is a Primary Key?
The PRIMARY KEY constraint uniquely identifies each row in a table. It must contain
UNIQUE values and has an implicit NOT NULL constraint.
A table in SQL is strictly restricted to have one and only one primary key, which is comprised
of single or multiple fields (columns).
CREATE TABLE tableName (
col1 int NOT NULL,
col2 varchar(50) NOT NULL,
col3 int,
…………….
PRIMARY KEY (col1)
);
Q18 What is the difference between Primary key and Foreign key?
Primary Key | Foreign Key
It uniquely identifies a record in the relational database table. | It refers to the field in a table which is the primary key of another table.
Only one primary key is allowed in a table. | More than one foreign key is allowed in a table.
It is a combination of UNIQUE and NOT NULL constraints. | It can contain duplicate values in a table of a relational database.
Its constraint can be implicitly defined on the temporary tables. | Its constraint cannot be defined on the local or global temporary tables.
There are four different types of JOINs in SQL:
• (INNER) JOIN: Retrieves records that have matching values in both tables involved in the
join. This is the most widely used join for queries.
SELECT *
FROM Table_A A
JOIN Table_B B
ON A.col = B.col;
SELECT *
FROM Table_A A
INNER JOIN Table_B B
ON A.col = B.col;
• LEFT (OUTER) JOIN: Retrieves all the records/rows from the left and the matched
records/rows from the right table.
SELECT *
FROM Table_A A
LEFT JOIN Table_B B
ON A.col = B.col;
• RIGHT (OUTER) JOIN: Retrieves all the records/rows from the right and the matched
records/rows from the left table.
SELECT *
FROM Table_A A
RIGHT JOIN Table_B B
ON A.col = B.col;
• FULL (OUTER) JOIN: Retrieves all the records where there is a match in either the left or
right table.
SELECT *
FROM Table_A A
FULL JOIN Table_B B
ON A.col = B.col;
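The behaviour of the joins is easiest to see on a tiny dataset. The sketch below uses Python's sqlite3 module with two made-up tables whose column names (col, a_val, b_val) are assumptions for the example. Only INNER and LEFT joins are run, because older SQLite versions do not support RIGHT and FULL joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table_A (col INTEGER, a_val TEXT);
CREATE TABLE Table_B (col INTEGER, b_val TEXT);
INSERT INTO Table_A VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
INSERT INTO Table_B VALUES (2, 'b2'), (3, 'b3'), (4, 'b4');
""")

# INNER JOIN: only keys present in both tables (2 and 3).
inner_rows = conn.execute("""
    SELECT A.col, A.a_val, B.b_val
    FROM Table_A A
    INNER JOIN Table_B B ON A.col = B.col
    ORDER BY A.col
""").fetchall()

# LEFT JOIN: every row of Table_A; unmatched rows get NULL (None in Python).
left_rows = conn.execute("""
    SELECT A.col, A.a_val, B.b_val
    FROM Table_A A
    LEFT JOIN Table_B B ON A.col = B.col
    ORDER BY A.col
""").fetchall()

print(inner_rows)
print(left_rows)
```

A RIGHT JOIN would mirror the LEFT JOIN result from Table_B's side, and a FULL JOIN would return the rows for keys 1 through 4 with NULLs wherever one side has no match.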
Q22 What is an Index? Explain its different types.
A database index is a data structure that provides a quick lookup of data in a column or
columns of a table. It enhances the speed of operations accessing data from a database
table at the cost of additional writes and memory to maintain the index data structure.
CREATE INDEX index_name /* Create Index */
ON table_name (column_1, column_2);
DROP INDEX index_name; /* Drop Index */
There are different types of indexes that can be created for different purposes:
• Unique and Non-Unique Index:
Unique indexes are indexes that help maintain data integrity by ensuring that no two rows of
data in a table have identical key values. Once a unique index has been defined for a table,
uniqueness is enforced whenever keys are added or changed within the index.
CREATE UNIQUE INDEX myIndex
ON students (enroll_no);
Non-unique indexes, on the other hand, are not used to enforce constraints on the tables
with which they are associated. Instead, non-unique indexes are used solely to improve
query performance by maintaining a sorted order of data values that are used frequently.
• Clustered and Non-Clustered Index:
Clustered indexes are indexes whose order of the rows in the database corresponds to the
order of the rows in the index. This is why only one clustered index can exist in a given table,
whereas, multiple non-clustered indexes can exist in the table.
The only difference between clustered and non-clustered indexes is that the database
manager attempts to keep the data in the database in the same order as the corresponding
keys appear in the clustered index.
Clustering indexes can improve the performance of most query operations because they
provide a linear-access path to data stored in the database.
o Logical operators: These operators evaluate the expressions and return their results
in True or False. This operator includes ALL, AND, ANY, ISNULL, EXISTS,
BETWEEN, IN, LIKE, NOT, OR, UNIQUE.
o Comparison operators: These operators are used to perform comparisons of two
values and check whether they are the same or not. It includes equal to (=), not equal
to (!= or <>), less than (<), greater than (>), less than or equal to (<=), greater than or
equal to (>=), not less than (!<), not greater than (!>), etc.
o Bitwise operators: These operators perform bit manipulations between two expressions of
integer type. They first convert the integers into binary bits and then apply
operators such as AND (&), OR (|), XOR (^), NOT (~), etc.
o Compound operators: These operators perform operations on a variable before
setting the variable's result to the operation's result. It includes Add equals (+=),
subtract equals (-=), multiply equals (*=), divide equals (/=), modulo equals (%=), etc.
o String operators: These operators are primarily used to perform concatenation and
pattern matching of strings. It includes + (String concatenation), += (String
concatenation assignment), % (Wildcard), [] (Character(s) matches), [^] (Character(s)
not to match), _ (Wildcard match one character), etc.
Q24 What is a view in SQL?
A view is a database object that has no values. It is a virtual table that contains a subset of
data within a table. It looks like an actual table containing rows and columns, but it takes less
space because it is not present physically. It is operated similarly to the base table but does
not contain any data of its own. Its name is always unique. A view can have data from one or
more tables. If any changes occur in the underlying table, the same changes are reflected in the
view as well.
The primary use of a view is to implement the security mechanism. It is the searchable object
where we can use a query to search the view as we use for the table. It only shows the data
returned by the query that was declared when the view was created.
We can create a view by using the following syntax:
CREATE VIEW view_name AS
SELECT column_lists FROM table_name
WHERE condition;
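A quick way to see that a view stores no data of its own is to change the base table and query the view again. The following sketch uses Python's sqlite3 module; the employees table, its rows, and the view name marketing_staff are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
    ('Asha', 'Marketing', 52000),
    ('Stuart', 'Marketing', 61000),
    ('Lee', 'Finance', 58000);

-- The view exposes only names and departments of marketing staff,
-- hiding the salary column from whoever queries it.
CREATE VIEW marketing_staff AS
SELECT name, department FROM employees
WHERE department = 'Marketing';
""")

before = conn.execute("SELECT name FROM marketing_staff ORDER BY name").fetchall()

# A change to the base table is visible through the view immediately.
conn.execute("INSERT INTO employees VALUES ('Noor', 'Marketing', 49000)")
after = conn.execute("SELECT name FROM marketing_staff ORDER BY name").fetchall()

print(before)
print(after)
```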
Q25 What are the differences between SQL, MySQL, and SQL Server?
The following comparison chart explains their main differences:
SQL | MySQL | SQL Server
SQL first appeared in 1974. | MySQL first appeared on May 23, 1995. | SQL Server first appeared on April 24, 1989.
SQL was developed by IBM Corporation. | MySQL was originally developed by MySQL AB and is now developed by Oracle Corporation. | SQL Server was developed by Microsoft.
SQL is a query language for managing databases. | MySQL is database software that uses the SQL language to interact with the database. | SQL Server is also software that uses the SQL language to interact with the database.
SQL has no variables. | MySQL can use variables, constraints, and data types. | SQL Server can use variables, constraints, and data types.
SQL | PL/SQL
SQL has no variables. | PL/SQL can use variables, constraints, and data types.
SQL can execute only a single query at a time. | PL/SQL can execute a whole block of code at once.
SQL query can be embedded in PL/SQL. | PL/SQL cannot be embedded in SQL as SQL does not support any programming language and keywords.
SQL can directly interact with the database server. | PL/SQL cannot directly interact with the database server.
SQL is like the source of data that we need to display. | PL/SQL provides a platform where SQL data will be shown.
BETWEEN | IN
This operator is used to select the range of data between two values. The values can be numbers, text, and dates as well. | It is a logical operator to determine whether or not a specific value exists within a set of values. This operator reduces the use of multiple OR conditions within the query.
It returns records whose column value lies in between the defined range. | It compares the specified column's value and returns the records when the match exists in the set of values.
The following syntax illustrates this operator: SELECT * FROM table_name WHERE column_name BETWEEN 'value1' AND 'value2'; | The following syntax illustrates this operator: SELECT * FROM table_name WHERE column_name IN ('value1','value2');
Q29 What is the difference between DELETE and TRUNCATE statements in SQL?
The main difference between them is that the delete statement deletes data without resetting
a table's identity, whereas the truncate command resets a particular table's identity. The
following comparison chart explains it more clearly:
DELETE | TRUNCATE
The delete statement removes single or multiple rows from an existing table depending on the specified condition. | The truncate command deletes the whole contents of an existing table without the table itself. It preserves the table structure or schema.
We can use the WHERE clause in the DELETE command. | We cannot use the WHERE clause with TRUNCATE.
You can roll back data after using the DELETE statement. | It is not possible to roll back after using the TRUNCATE statement.
DELETE query takes more space. | TRUNCATE query occupies less space.
DROP | TRUNCATE
The DROP command is used to remove table definition and its contents. | The TRUNCATE command is used to delete all the rows from the table.
In the DROP command, table space is freed from memory. | The TRUNCATE command does not free the table space from memory.
In the DROP command, a view of the table does not exist. | In this command, a view of the table exists.
In the DROP command, undo space is not used. | In this command, undo space is used but less than DELETE.
Q31 Why do we use Commit and Rollback command?
COMMIT | ROLLBACK
COMMIT permanently saves the changes made by the current transaction. | ROLLBACK undoes the changes made by the current transaction.
The transaction cannot undo changes after COMMIT execution. | The transaction returns to its previous state after ROLLBACK.
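The effect of the two commands can be observed with a short sqlite3 session in Python. This is a sketch under a few assumptions: the accounts table and its values are invented, and the connection is opened with isolation_level=None so that BEGIN, COMMIT, and ROLLBACK are issued explicitly as SQL statements rather than handled by the driver.

```python
import sqlite3

# isolation_level=None: the driver starts no transactions on its own,
# so BEGIN / COMMIT / ROLLBACK below are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")

# First transaction: insert a row and make it permanent.
conn.execute("BEGIN")
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.execute("COMMIT")

# Second transaction: a mistaken update that is undone.
conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 500 WHERE name = 'alice'")
balance_inside = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'"
).fetchone()[0]
conn.execute("ROLLBACK")

balance_after = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'"
).fetchone()[0]
print(balance_inside, balance_after)
```

Inside the open transaction the update is visible, but after ROLLBACK the table is back to the last committed state.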
Q34 What is the default ordering of data using the ORDER BY clause? How could it be
changed?
The ORDER BY clause in MySQL can be used without the ASC or DESC modifiers. When neither
modifier is given, the sort order defaults to ASC, or ascending order. To sort in descending
order instead, add the DESC modifier after the column name in the ORDER BY clause.
SQL functions are used for the following purposes:
o To perform calculations on data
o To modify individual data items
o To manipulate the output
o To format dates and numbers
o To convert data types
Q37 What is meant by case manipulation functions? Explain its different types in SQL.
Case manipulation functions are part of the character functions. It converts the data from the
state in which it is already stored in the table to upper, lower, or mixed case. The conversion
performed by this function can be used to format the output. We can use it in almost every
part of the SQL statement. Case manipulation functions are mostly used when you need to
search for data, and you don't have any idea that the data you are looking for is in lower case
or upper case.
There are three case manipulation functions in SQL:
LOWER: This function is used to convert the given characters into lowercase. The following
example will return 'STEPHEN' as 'stephen':
SELECT LOWER ('STEPHEN') AS Case_Result FROM dual;
UPPER: This function is used to convert the given characters into uppercase. The following
example will return 'stephen' as 'STEPHEN':
SELECT UPPER ('stephen') AS Case_Result FROM dual;
INITCAP: This function is used to convert the initial letter of each word to uppercase.
It means the first letter of every word is converted into uppercase, and the rest
is in lower case. The following example will return 'hello stephen' as 'Hello Stephen':
SELECT INITCAP ('hello stephen') AS Case_Result FROM dual;
A) CONCAT: This function is used to join two or more values together. It always appends the
second string into the end of the first string. For example:
Input:
SELECT CONCAT ('Information-', 'technology') FROM DUAL;
Output: Information-technology
B) SUBSTR: It is used to return a portion of the string, starting at a specified position and
continuing for a specified number of characters. For example:
Input:
SELECT SUBSTR ('Database Management System', 10, 10) FROM DUAL;
Output: Management
C) LENGTH: This function returns the string's length in numerical value, including the blank
spaces. For example:
Input:
SELECT LENGTH ('Hello Javatpoint') FROM DUAL;
Output: 16
D) INSTR: This function finds the exact numeric position of a specified character or word in a
given string. For example:
Input:
SELECT INSTR ('Hello Javatpoint', 'Javatpoint') FROM DUAL;
Output: 7
E) LPAD: It pads the left side of a value with the given character until the specified length
is reached, so the value is right-justified. For example:
Input:
SELECT LPAD ('200', 6,'*') FROM DUAL;
Output: ***200
F) RPAD: It pads the right side of a value with the given character until the specified length
is reached, so the value is left-justified. For example:
Input:
SELECT RPAD ('200', 6,'*') FROM DUAL;
Output: 200***
G) TRIM: This function is used to remove the specified character from the beginning, end,
or both ends of a string. By default, it trims extra spaces. For example:
Input:
SELECT TRIM ('A' FROM 'ABCDCBA') FROM DUAL;
Output: BCDCB
H) REPLACE: This function is used to replace all occurrences of a word or portion of the string
(substring) with the other specified string value. For example:
Input:
SELECT REPLACE ('It is the best coffee at the famous coffee shop.', 'coffee', 'tea') FROM DUAL;
Output: It is the best tea at the famous tea shop.
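Several of these functions can be checked without an Oracle installation by running them through Python's sqlite3 module. Treat this as an approximation: SQLite spells some things differently (|| instead of CONCAT, TRIM(string, characters) instead of TRIM(... FROM ...)) and has no LPAD or RPAD, so the padding is shown with Python's own rjust and ljust instead. SUBSTR is called with start position 10 and length 10, which selects the word 'Management'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One SELECT exercising the SQLite equivalents of the functions above.
row = conn.execute("""
    SELECT LOWER('STEPHEN'),
           UPPER('stephen'),
           'Information-' || 'technology',                -- CONCAT
           SUBSTR('Database Management System', 10, 10),  -- start 10, length 10
           LENGTH('Hello Javatpoint'),
           INSTR('Hello Javatpoint', 'Javatpoint'),
           TRIM('ABCDCBA', 'A'),                          -- strip 'A' from both ends
           REPLACE('It is the best coffee at the famous coffee shop.',
                   'coffee', 'tea')
""").fetchone()

# Python string methods standing in for LPAD and RPAD.
left_padded = "200".rjust(6, "*")
right_padded = "200".ljust(6, "*")

print(row)
print(left_padded, right_padded)
```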
Q39 What is the difference between the WHERE and HAVING clauses?
The main difference is that the WHERE clause is used to filter records before any groupings
are established, whereas the HAVING clause is used to filter values from a group. The below
comparison chart explains the most common differences:
WHERE | HAVING
It cannot be used with aggregate functions. | It can work with aggregate functions.
This clause can be used with the SELECT, UPDATE, and DELETE statements. | This clause can only be used with the SELECT statement.
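Both clauses can appear in the same query, which makes the order of filtering visible. The sketch below runs such a query through Python's sqlite3 module; the sales table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (rep TEXT, item TEXT, amount INTEGER);
INSERT INTO sales VALUES
    ('Ann', 'pen', 5),
    ('Ann', 'desk', 120),
    ('Bob', 'pen', 8),
    ('Bob', 'chair', 60),
    ('Cy', 'desk', 150);
""")

rows = conn.execute("""
    SELECT rep, SUM(amount) AS total
    FROM sales
    WHERE amount > 10          -- filters individual rows before grouping
    GROUP BY rep
    HAVING SUM(amount) > 100   -- filters whole groups after aggregation
    ORDER BY rep
""").fetchall()

print(rows)
```

The WHERE clause first discards the two cheap pen sales, then the remaining rows are grouped per representative, and HAVING keeps only the groups whose total exceeds 100.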
Q39 What is the difference between the RANK() and DENSE_RANK() functions?
The RANK function determines the rank of each row within its ordered partition in the
result set. Rows with equal values are assigned the same rank, and the next rank is the
tied rank plus the number of tied rows, so gaps appear in the sequence. For example, if we
have three records at rank 4, the next rank listed would be 7.
The DENSE_RANK function also assigns a rank to each row within a partition as per the
specified column value, but without any gaps. It always specifies ranking in consecutive order. If
two rows have equal values, this function assigns them the same rank, and
the next rank is the next sequential number. For example, if we have 3 records at rank 4,
the next rank listed would be 5.
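The gap behaviour is easy to verify side by side. The following sketch uses Python's sqlite3 module and assumes the bundled SQLite is version 3.25 or newer, since older versions lack window functions; the scores table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (player TEXT, points INTEGER);
INSERT INTO scores VALUES
    ('p1', 90), ('p2', 80), ('p3', 80), ('p4', 80), ('p5', 70);
""")

rows = conn.execute("""
    SELECT player, points,
           RANK()       OVER (ORDER BY points DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY points DESC) AS dense_rnk
    FROM scores
    ORDER BY points DESC, player
""").fetchall()

for r in rows:
    print(r)
```

Three players tie on 80 points and share rank 2 under both functions; the player with 70 points then gets rank 5 from RANK() but rank 3 from DENSE_RANK().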
Q43 Write the SQL query to get the third maximum salary of an employee from a table
named employees.
Employee table
employee_name salary
A 24000
C 34000
D 55000
E 75000
F 21000
G 40000
H 50000
SELECT * FROM(
SELECT employee_name, salary, DENSE_RANK()
OVER(ORDER BY salary DESC)r FROM Employee)
WHERE r=&n;
To find 3rd highest salary set n = 3
Q44 How to find the nth highest salary in SQL?
Finding the Nth highest salary in a table is one of the most typical interview questions. This task can be
accomplished using the DENSE_RANK() function.
Employee table
employee_name salary
A 24000
C 34000
D 55000
E 75000
F 21000
G 40000
H 50000
SELECT * FROM(
SELECT employee_name, salary, DENSE_RANK()
OVER(ORDER BY salary DESC)r FROM Employee)
WHERE r=&n;
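The query can be tried on the table above with Python's sqlite3 module. This is a sketch rather than the exact statement shown: the &n substitution variable is replaced by a bound parameter, the subquery is given an alias, and the helper name nth_highest is hypothetical. It assumes a SQLite version with window function support (3.25 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (employee_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("A", 24000), ("C", 34000), ("D", 55000), ("E", 75000),
     ("F", 21000), ("G", 40000), ("H", 50000)],
)

def nth_highest(conn, n):
    """Return all employees whose salary is the n-th highest distinct salary."""
    return conn.execute("""
        SELECT employee_name, salary
        FROM (
            SELECT employee_name, salary,
                   DENSE_RANK() OVER (ORDER BY salary DESC) AS r
            FROM Employee
        ) AS ranked
        WHERE r = ?
        ORDER BY employee_name
    """, (n,)).fetchall()

print(nth_highest(conn, 3))
```

DENSE_RANK() is used rather than RANK() so that tied salaries do not create gaps, which would otherwise make some values of n return no rows.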
Q45 Write a SQL query to find the names of employees that begin with ‘A’?
To display name of the employees that begin with ‘A’, type in the below command:
SELECT * FROM Table_name WHERE EmpName like 'A%'
Q46 What is the main difference between ‘BETWEEN’ and ‘IN’ condition operators?
BETWEEN operator is used to display rows based on a range of values in a row whereas the
IN condition operator is used to check for values contained in a specific set of values.
Example of BETWEEN:
SELECT * FROM Students where ROLL_NO BETWEEN 10 AND 50;
Example of IN:
SELECT * FROM students where ROLL_NO IN (8,15,25);
You may also supply parameters to a stored procedure so that it can act based on the
value(s) of the parameter(s) given.
TABLEAU INTERVIEW QUESTIONS
Q1 What is data visualization?
Data visualization means the graphical representation of data or information. We can use
visual objects like graphs, charts, bars, and a lot more. Data visualization tools provide an
accessible way to see and understand the data easily.
Q4 What is Tableau?
• Tableau is business intelligence software.
• It allows anyone to connect to their data.
• It visualizes data and creates interactive, shareable dashboards.
• Tableau Packaged Workbook (.twbx) – zip file containing .twb and external files.
Q6 What are the different Tableau Products and what is the latest version of Tableau?
Here is the Tableau Product family.
(i) Tableau Desktop:
It is a self-service business analytics and data visualization tool that anyone can use. It translates
pictures of data into optimized queries. With Tableau Desktop, you can connect directly to data
in your data warehouse for live, up-to-date data analysis. You can also perform queries
without writing a single line of code. You can import all your data into Tableau’s data engine from
multiple sources and integrate it by combining multiple views in an interactive dashboard.
(ii) Tableau Server:
It is the enterprise-level Tableau product. You can publish dashboards with Tableau
Desktop and share them throughout the organization with the web-based Tableau Server. It
leverages fast databases through live connections.
(iii) Tableau Online:
This is a hosted version of Tableau Server which helps make business intelligence faster and
easier than before. You can publish Tableau dashboards with Tableau Desktop and share
them with colleagues.
(iv) Tableau Reader:
It is a free desktop application that enables you to open and view visualizations that are built
in Tableau Desktop. You can filter and drill down into the data, but you cannot edit the
visualizations or create new ones.
(v) Tableau Public:
This is free Tableau software that you can use to make visualizations, but you must
save your workbooks to the Tableau Public server, where they can be viewed by anyone.
Q9 What is the difference between .twb and .twbx extension?
• A .twb is an XML document which contains all the selections and layout you have
made in your Tableau workbook. It does not contain any data.
• A .twbx is a ‘zipped’ archive containing a .twb and any external files such as extracts
and background images.
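Since a .twbx is described above as a zipped archive, its contents can be listed with ordinary zip tooling. The sketch below uses Python's zipfile module on a stand-in archive built in memory; the member names (Sales.twb, the extract and the image) are hypothetical and only illustrate the layout.

```python
import io
import zipfile

# Build a stand-in .twbx in memory; the member names are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Sales.twb", "<workbook></workbook>")
    archive.writestr("Data/Extracts/Sales.hyper", b"placeholder extract bytes")
    archive.writestr("Image/background.png", b"placeholder image bytes")

# Read the archive back and separate the workbook from the external files.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    members = archive.namelist()

workbooks = [name for name in members if name.endswith(".twb")]
externals = [name for name in members if not name.endswith(".twb")]
print(workbooks, externals)
```

The same listing approach should work on a packaged workbook saved to disk by passing its path to zipfile.ZipFile instead of the in-memory buffer.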
Q13 What are the different connections you can make with your dataset?
We can either connect live to our data set or extract data into Tableau.
• Live: Connecting live to a data set leverages the data source's own computational
processing and storage. Each query goes to the database, so new or updated data is
reflected in the view.
• Extract: An extract makes a static snapshot of the data to be used by Tableau’s
data engine. The snapshot can be refreshed on a recurring schedule, either in
full or by incrementally appending data. One way to set up these schedules is via
Tableau Server.
The benefit of a Tableau extract over a live connection is that the extract can be used anywhere
without any connection, so you can build your own visualizations without connecting to the
database.
DataType               Possible Values
Boolean                True/False
Date                   Date value (December 28, 2016)
Date & Time            Date and timestamp values (December 28, 2016 06:00:00 PM)
Geographical Values    Geographical mapping (Beijing, Mumbai)
Text/String            Text/String
Number                 Decimal (8.00)
Number                 Whole number (5)
Q19 What are the different filters in Tableau and how are they different from each
other?
Organizations use filters to reduce the size of the dataset and remove irrelevant
information, either to improve performance or to highlight the required information. In Tableau,
there are different ways to filter the dataset. Each filter is created
for a different purpose, and the order in which the filters are executed can change performance
drastically. There are 6 types of filter in Tableau, listed here in order of execution:
1. Extract Filter
We can apply an extract filter while loading the dataset into Tableau, which reduces the number of
times Tableau queries the data source. We can further reduce the size of the data by
applying filters to the extract as required.
2. Data Source Filter
This filter restricts any important or sensitive information that we want to control while loading the
data into Tableau. It works on both Live and Extract connections. We can add a data source
filter on any column by clicking on the ADD option.
After clicking on the ADD option, the ADD Filter dialog box will appear containing all the
fields, then we can select the field that we want to apply the filter on. We can also edit or
remove the data source filters as required.
3. Context Filters
The filters used in Tableau are normally independent filters, each producing its own result
from the full dataset, but some filters need to process only the records returned by an earlier
filter. A context filter is an independent filter that creates a separate filtered dataset out of the
original dataset; the remaining filters and calculations are then computed on that filtered dataset. It can be used to
improve the performance of large data sources. A context filter is created by dragging the
dimension to the filter section box and clicking on the Add to Context option. The
dimension then changes to a grey color, which indicates that it is a context filter.
It can also be used to view the Top N products in any particular category. The context filter can
also be removed.
4. Dimension Filter
Dimension fields contain discrete categorical data, and we can exclude or include the
values that we want to analyze. The process of adding a dimension filter is simple:
• Drag the dimension from the dimension list to the filter section box.
• It will open a Filter Dialog box where we can select the values that we want to
analyze.
There are four tabs in the Filter dialog box:
1. General: To select the members present in the dimension that we want to include or
exclude.
2. Wildcard: To filter the result on the basis of a particular pattern. For example, if we want to
keep the email addresses of a particular domain, we can use a filter that matches values ending
with “@yahoo.com”.
3. Condition: To filter the result on the basis of a particular condition.
4. Top: To filter the Top N products of a particular category.
5. Measure Filters
Measure fields contain quantitative data, and these filters are applied to measure fields. A
measure filter can be applied as follows:
Drag the measure field from the Measures box to the Filter section; a Filter dialog box will
open containing various operations.
Select the operation that needs to be performed and click Next. The subsequent dialog box
offers 4 types of filter:
1. Range: To select the range of values to include in the result.
2. At least: To select the minimum value of a measure to filter the data.
3. At most: To select the maximum value of a measure to filter the data.
4. Special: To select null or non-null values.
6. Table Calculation Filter
These filters are used when we want to filter the view without changing the underlying
data. Table calculations are functions used when creating calculated fields, such
as LOOKUP, WINDOW_SUM, WINDOW_AVG, etc.
Q21 What is a dual axis?
Dual axis is a feature supported by Tableau that helps users view two scales
of two measures in the same graph. Many websites, such as Indeed.com, make use of
dual axes to show the comparison between two measures and their growth rate over a specific set
of years. Dual axes let you compare multiple measures at once, with two independent axes
layered on top of one another.
A blended axis is an axis on which multiple measures are shown together, with all the
marks displayed in a single pane. We can blend measures by dragging the 1st measure onto
one axis and the 2nd onto the existing axis.
• Drag a dimension in a column
• Drag the first measure in the column
• Drag the second measure in the existing axis
Q25 Give a brief description of the Tableau dashboard.
A Tableau dashboard is a group of various views which allows you to compare different types
of data simultaneously. Data sheets and dashboards are connected: any modification
to the data is directly reflected in the dashboards. It is the most efficient approach to
visualizing and analyzing the data.
Q31 What is a Column chart?
A column chart visualizes the data as a set of rectangular columns whose lengths are
proportional to the values they represent. The horizontal axis shows the category
to which they belong, and the vertical axis shows the values.
Q36 What is an Area Chart?
An area chart is a line chart in which the area between the x-axis and the lines is filled with
color or patterns. These charts are typically used to represent accumulated totals over time and are
the conventional way to display stacked lines.
Q37 What is a Context Filter, and what are the steps to create a Context Filter in
Tableau?
Context Filters are applied to the data rows before the other worksheet filters (dimension,
measure and table calculation filters). They are limited to
views, but they can be applied on selected sheets. They define Aggregation and
Disaggregation of data in Tableau.
Step 1: Drag the Sub-Category dimension to the Rows shelf and the Sales measure to the Columns
shelf. Choose the horizontal bar chart as the chart type, then drag the Sub-Category
dimension to the Filters shelf as well.
Step 2: Right-click on the Sub-Category field in the Filters shelf and go to the fourth tab, Top.
Choose the By field option, and from the next drop-down choose Top 10 by Sales
Sum.
Step 3: Drag the Category dimension to the Filters shelf. Right-click it to edit the filter and, on the
General tab, choose Furniture from the list. The result shows three
subcategories of products: the Furniture subcategories that are among the top 10 subcategories
across all the products.
Step 4: Right-click the Category: Furniture filter and select the option Add to Context. The
Top 10 filter is now evaluated only within the Furniture category, so the final result shows
the top Furniture subcategories by sales.
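The effect of the steps above can be sketched in plain Python. This is only an analogy with hypothetical sales figures; it assumes, as in Tableau's order of operations, that a Top N filter is evaluated before ordinary dimension filters but after context filters. The helpers top_n and in_category are names introduced for this illustration.

```python
# Hypothetical (sub_category, category, sales) rows.
rows = [
    ("Chairs", "Furniture", 330),
    ("Tables", "Furniture", 210),
    ("Bookcases", "Furniture", 115),
    ("Furnishings", "Furniture", 90),
    ("Phones", "Technology", 335),
    ("Copiers", "Technology", 150),
    ("Storage", "Office Supplies", 225),
    ("Binders", "Office Supplies", 205),
]

def top_n(data, n):
    """Keep the n rows with the largest sales."""
    return sorted(data, key=lambda r: r[2], reverse=True)[:n]

def in_category(data, category):
    """Keep only rows belonging to the given category."""
    return [r for r in data if r[1] == category]

# Without a context filter: Top N is computed over all rows,
# then the Category filter trims the result.
no_context = in_category(top_n(rows, 3), "Furniture")

# With Category added to context: the Category filter runs first,
# and Top N is computed only within that context.
with_context = top_n(in_category(rows, "Furniture"), 3)

print([r[0] for r in no_context])
print([r[0] for r in with_context])
```

With these figures, only Chairs survives when the Top 3 is taken across all categories first, whereas the context version returns the three best-selling Furniture subcategories.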
Q42 Differentiate discrete and continuous data roles in Tableau
Discrete data roles consist of values that are separate and distinct. Discrete data roles can
take only individual values within a range. For example: cancer patients in the hospital, number of
threads in a sheet, state. Discrete values are displayed as blue icons in the data window and
blue pills on shelves. Discrete fields can be sorted.
Continuous data roles can take any value within a finite or infinite interval. For example:
age, unit price, order quantity. Continuous values are displayed as green icons in the data window
and green pills on shelves. Continuous fields cannot be sorted.
Discrete                                  Continuous
Discrete data is counted as distinct      Continuous data is measured rather
or separate values.                       than counted.
It can take only individual values        It can take any value within a
within a range.                           finite or infinite range.
Q45 What is the difference between the Tree map and Heat map?
Tree map                                  Heat map
A treemap also compares categories        A heat map can compare categories
with color and size, and it can be        with color and size.
used for illustrating hierarchical        In a heat map, you can compare two
data and part-to-whole relationships.     different measures together.
Q46 What is the difference between Data Joining and Data blending?
Data joining                              Data blending
Data joining is used when you are         Data blending is required when two
combining data from the same source.      completely separate data sources are
                                          used in a report.
Q47 What are the different Tableau files?
o Bookmarks: A bookmark contains a single worksheet and is an easy way to share your work.
o Workbooks: A workbook can hold one or more dashboards and worksheets.
o Packaged workbooks: A packaged workbook contains the workbook along with any supporting
local file data and background images.
o Data extraction files: A data extraction file is a local copy of a data source or of a
subset of it.
o Data connection files: A data connection file is a small XML file that contains various
connection information.
Q48 What is the difference between the published data source and an embedded data
source?
POWER BI INTERVIEW QUESTIONS
Q1 What is Power BI?
Power BI is a business analytics tool developed by Microsoft that helps you turn multiple
unrelated data sources into valuable and interactive insights. These data may be in the form
of an Excel spreadsheet or cloud-based/on-premises hybrid data warehouses. You can
easily connect to all your data sources and share the insights with anyone.
Tableau                                   Power BI
Tableau uses MDX for measures and         Power BI uses DAX for calculating
dimensions                                measures
Q4 What is Power BI Desktop?
Power BI Desktop is a free application designed and developed by Microsoft.
Power BI Desktop allows users to connect to, transform, and visualize their data with
ease. It lets users build visuals and collections of visuals that can be shared
as reports with colleagues or clients in their organization.
Q7 What is Visualization?
Visualization is a process to represent data in pictorial form like tables, graphs, or charts based
on the specific requirement.
Q9 What is a Report?
A report is a Power BI feature that presents visualized data from a single dataset. A
report can have multiple pages of visualizations.
Q12 List out some drawbacks/limitations of using Power BI.
Here are some limitations to using Power BI:
• Power BI does not accept file sizes larger than 1 GB and doesn't mix imported data
with data accessed from real-time connections.
• There are very few data sources that allow real-time connections to Power BI reports and
dashboards.
• It only shares dashboards and reports with users logged in with the same email address.
• Dashboard doesn't accept or pass user, account, or other entity parameters.
Q13 What are some differences in data modeling between Power BI Desktop and
Power Pivot for Excel?
Power Pivot for Excel supports only single directional relationships (one to many), calculated
columns, and one import mode. Power BI Desktop supports bi-directional cross-filtering
connections, security, calculated tables, and multiple import options.
Q15 What are the key differences between a Power BI dataset, a report, and a
dashboard?
Dataset                        Report                        Dashboard
A Power BI dataset can have    Each report can have          A Power BI dashboard is a
many data sources.             multiple sheets.              single page, often called a
                                                             canvas, that uses
                                                             visualizations to tell a story.
Q17 What is self-service BI, anyway?
SSBI is an abbreviation for Self-Service Business Intelligence and is a breakthrough in
business intelligence. SSBI has enabled many business professionals with no technical
or coding background to use Power BI and generate reports and draw predictions
successfully. Even non-technical users can create these dashboards to help their business
make more informed decisions.
Q20 What are the various types of refresh options provided in Power BI?
Four important types of refresh options provided in Microsoft Power BI are as follows:
• Package refresh - This synchronizes your Power BI Desktop or Excel file between the
Power BI service and OneDrive, or SharePoint Online.
• Model or data refresh - This refreshes the dataset within the Power BI service with data
from the original data source.
• Tile refresh - This updates the cache for tile visuals every 15 minutes on the dashboard
once data changes.
• Visual container refresh - This refreshes the visible container and updates the cached
report visuals within a report once the data changes.
Q21 Name the data sources that Power BI can connect to.
Several data sources can be connected to Power BI. They are grouped into three main types:
• Files
It can import data from Excel (.xlsx, .xlsm), Power BI Desktop files (.pbix) and Comma-Separated
Values (.csv).
• Content Packs
These are collections of related documents or files stored as a group. There are two
types of content packs in Power BI:
content packs from service providers like Google Analytics, Marketo, or Salesforce, and
content packs created and shared by other users in your organization.
• Connectors
Connectors help you connect your databases and datasets with apps, services, and data
in the cloud.
Q23 What is row-level security?
Row-level security limits the data a user can view and has access to, and it relies on filters.
Users can define the rules and roles in Power BI Desktop and also publish them to Power BI
Service to configure row-level security.
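The idea of roles that rely on filters can be illustrated with a small Python analogy. In Power BI the rule would be a DAX filter expression attached to a role, not Python code; the data, role names, user addresses and the choice to show no rows to users without a role are all assumptions of this sketch.

```python
# Hypothetical sales rows.
sales = [
    {"region": "East", "amount": 100},
    {"region": "West", "amount": 250},
    {"region": "East", "amount": 75},
]

# Each role maps to a row filter (a predicate evaluated on one row).
roles = {
    "EastManager": lambda row: row["region"] == "East",
    "WestManager": lambda row: row["region"] == "West",
}

# Hypothetical assignment of users to roles.
user_roles = {
    "asha@example.com": "EastManager",
    "ben@example.com": "WestManager",
}

def visible_rows(user):
    """Return only the rows the user's role allows; no role means no rows."""
    role = user_roles.get(user)
    if role is None:
        return []
    return [row for row in sales if roles[role](row)]

east_total = sum(row["amount"] for row in visible_rows("asha@example.com"))
print(east_total)
```

Every visual built on top of visible_rows would see only the permitted rows, which is the effect row-level security aims for.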
Q26 What are the critical components of the Power BI toolkit?
The critical components of Power BI are mentioned below.
• Power Query
• Power Pivot
• Power View
• Power Map
• Power Q&A
• Power Query: It is one of the most important components of Power BI for transforming
data. Power Query helps to extract data from different data sources like Oracle, SQL,
Text/CSV files, Excel, etc. and even delete data from different sources.
• Power Pivot : It is used for data modeling that uses DAX ( Data Analysis
Expression) functions for the calculations. Relationships between different tables can
also be created here and we can get values that can be shown in Pivot Tables.
• Power View: The Power View is used for providing an intuitive display of the data
and retrieving the metadata for data analysis. The views are interactive in nature and
slicers and filters can be used for slicing and dicing the data.
• Power BI Desktop: Power BI Desktop is an integration tool for Power Query, Power
View, and Power Pivot. It helps to create advanced queries, data models, reports and
dashboards, and helps in developing your BI skills for data analysis.
• Power BI Mobile Application: It is available for the Operating systems Android, iOS
and even Windows. The App has an interactive display of the dashboards which can
be shared as well.
• Power Map: It presents geo-spatial visualization of the data in 3 Dimensional Mode.
The data can be highlighted based on the geographical location which can be
continent, state, city or even street address.
• Power Q&A: It is used to provide answers to the questions asked by users. It works
with Power View, and the answers can be presented as visual representations.
• Power BI Services
• Power BI Mobile
• Power BI Gateway
• Power BI Premium
• Power BI Embedded
Q29 Explain responsive slicers in Power BI.
On a Power BI final report page, a developer can resize a responsive slicer to various sizes
and shapes, and the data collected in the container will be rearranged to find a match. If a
visual report becomes too small to be useful, an icon representing the visual takes its place,
saving space on the report page.
Q34 What is the difference between a Filter and a Slicer?
Filters are used to restrict users and not allow them to interact with dashboards or reports,
while the slicers are used to interact with dashboards and reports.
Q35 What is the difference between a new column and a new measure in Power BI?
In Power BI, a new (calculated) column physically stores the result of the applied logic for every row.
A measure, on the other hand, is calculated on the fly based on the
dimensions in use. A measure doesn't store any physical data the way a column does.
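The contrast between a stored column and an on-the-fly measure can be sketched in Python. This is an analogy rather than DAX; the order rows and the names line_total and total_sales are hypothetical.

```python
# Hypothetical order rows.
orders = [
    {"product": "Pen", "qty": 10, "unit_price": 2.0},
    {"product": "Book", "qty": 3, "unit_price": 12.0},
    {"product": "Pen", "qty": 5, "unit_price": 2.0},
]

# "Calculated column": evaluated once per row and stored with the data.
for row in orders:
    row["line_total"] = row["qty"] * row["unit_price"]

# "Measure": nothing is stored; it is recomputed for whatever rows
# the current filter selects.
def total_sales(rows, product=None):
    selected = [r for r in rows if product is None or r["product"] == product]
    return sum(r["qty"] * r["unit_price"] for r in selected)

print(orders[0]["line_total"])     # stored value for the first row
print(total_sales(orders))         # measure over all rows
print(total_sales(orders, "Pen"))  # same measure under a filter
```

The column costs storage for every row but is ready to use as a slicer or axis value, while the measure costs nothing to store and adapts to each filter applied in a visual.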
Q37 What are the various types of users who can use Power BI?
Power BI can be used by anyone for their requirements, but there are particular groups
of users who are more likely to use it:
• Report Consumers: They consume the reports based on the specific information they need.
• Report Analysts: They need detailed data from the reports for their analysis.
• Self-Service Data Analysts: They are more experienced business data users. They have an
in-depth understanding of the data they work with.
• Basic Data Analysts: They can build their own datasets and are experienced in the Power BI
Service.
• Advanced Data Analysts: They know how to write SQL queries and have hands-on
experience with Power BI. They have experience in advanced Power BI with DAX training and
data modelling.
A- Measure name
B- = (equals sign) indicates the beginning of the formula
C- DAX Function
D- Parenthesis for Sum Function
E- Referenced Table
F- Referenced column name
date/time, time intelligence, information, logical, mathematical, statistical, text, parent/child,
and others.
Context
There are two types: row context and filter context. Row context comes into play whenever a
formula has a function that applies filters to identify a single row in a table. When one or
more filters are applied in a calculation that determines a result or value, the filter context
comes into play.
Name some commonly used tasks in the Query Editor.
• Connect to data
• Shape and combine data
• Group rows
• Pivot columns
• Create custom columns
• Query formulas
Q42 What are the purpose and benefits of using the DAX function?
DAX (Data Analysis Expressions) is a functional language used to create calculated
columns and/or measures. These enable smarter calculations and limit the data the dashboard has to
fetch and visualize.
Q45 What is special or unique about the CALCULATE and CALCULATETABLE
functions?
These are the only functions that allow you to modify the filter context of measures or tables. They can:
• Add to existing filter context of queries.
• Override filter context from queries.
• Remove existing filter context from queries.
Limitations:
• Filter parameters can only operate on a single column at a time.
• Filter parameters cannot reference a metric.
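The three ways of modifying a filter context listed above can be pictured with a Python analogy. This is not DAX and does not reproduce the real CALCULATE function; the rows, the calculate helper and the REMOVE marker are hypothetical names used only to show adding, overriding and removing a single-column filter.

```python
# Hypothetical fact rows.
sales = [
    {"year": 2023, "region": "East", "amount": 100},
    {"year": 2023, "region": "West", "amount": 200},
    {"year": 2024, "region": "East", "amount": 150},
    {"year": 2024, "region": "West", "amount": 50},
]

REMOVE = object()  # marker meaning "drop the filter on this column"

def total_amount(context):
    """Sum amount over rows matching every column=value pair in the context."""
    return sum(
        row["amount"]
        for row in sales
        if all(row[col] == val for col, val in context.items())
    )

def calculate(expression, context, **filters):
    """Evaluate expression under a modified copy of the filter context."""
    new_context = dict(context)
    for column, value in filters.items():
        if value is REMOVE:
            new_context.pop(column, None)  # remove an existing filter
        else:
            new_context[column] = value    # add or override a filter
    return expression(new_context)

query_context = {"year": 2024}  # filter coming from the report or query

added = calculate(total_amount, query_context, region="East")
overridden = calculate(total_amount, query_context, year=2023)
removed = calculate(total_amount, query_context, year=REMOVE)
print(added, overridden, removed)
```

Each keyword argument targets exactly one column, mirroring the limitation that a filter parameter operates on a single column at a time.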
Q51 How are a Power BI Dashboard and Report different from each other?
To understand the difference between Power BI Dashboard and Report, let’s run through
some quick points.
Capability      Report                          Dashboard
Data sources    It has a single dataset per     Can have data tiles from one or more
                report.                         datasets or reports.
Set alerts      No option for setting alerts.   Enables setting email alerts.
Q57 Which language is used in Power Query?
Power Query uses a programming language called M (often written as M code). It is easy to use and
similar to other languages. M is a case-sensitive language.
Q58 Why do we need Power Query when Power Pivot can import data from the most commonly used
sources?
Power Query is a self-service ETL (Extract, Transform, Load) tool which runs as an Excel add-in.
It allows users to pull data from various sources, manipulate that data into a form that suits
their needs, and load it into Excel. Power Query is preferable to Power Pivot for importing because
it lets you not only load the data but also manipulate it to suit the user's needs while loading.