COGNIZANT Data Analyst Interview Questions Part 2-11
Interview Questions
0-3 YOE
CTC: 8-10 LPA
Q1: How would you optimize a complex Power BI report with
multiple data sources and large datasets to improve
performance?
Optimizing Power BI reports is crucial to ensure they load fast and offer a smooth user
experience, especially when working with large datasets or multiple sources.
o Import mode loads data into Power BI memory and performs faster.
o Use Power Query to remove nulls and irrelevant columns, and apply filters early.
o This can reduce unnecessary hidden tables and improve memory usage.
6. Use Aggregations:
o Create summary tables for large datasets and let Power BI use them before
querying the detailed table.
o Helps identify slow visuals and DAX queries, allowing you to fine-tune
performance.
Assumptions:
CALCULATE(
    [Total Sales],
    DATESINPERIOD(
        'Date'[Date],
        MAX('Date'[Date]),
        -12,
        MONTH
    )
)
What it does:
If your fiscal year starts in April (not January), create a fiscal month number column in your
Date table, so that April becomes month 1 and March becomes month 12:
FiscalMonthNumber =
MOD(MONTH('Date'[Date]) - 4 + 12, 12) + 1
Then use this logic to build a proper Fiscal Year column and apply rolling 12-month logic
based on the fiscal calendar, not just calendar months.
Or alternatively, create a custom fiscal period table and link your model accordingly.
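The fiscal month arithmetic above can be checked outside Power BI. The following is a minimal Python sketch, assuming an April fiscal start as in the text. The `fiscal_year` helper and its labelling convention (naming the fiscal year after the calendar year in which it ends) are assumptions for illustration, not something the source specifies.

```python
from datetime import date

FISCAL_START_MONTH = 4  # assumption: fiscal year starts in April, as in the text


def fiscal_month_number(d: date) -> int:
    # Same arithmetic as the DAX column: April -> 1, ..., March -> 12
    return (d.month - FISCAL_START_MONTH + 12) % 12 + 1


def fiscal_year(d: date) -> int:
    # Hypothetical convention: label the fiscal year by the calendar year in which it ends
    return d.year + 1 if d.month >= FISCAL_START_MONTH else d.year
```

With this convention, 31 March 2024 falls in fiscal month 12 of fiscal year 2024, and 1 April 2024 starts fiscal month 1 of fiscal year 2025.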
DIVIDE(
12
Steps to Implement:
[Region] = "East"
Power BI does not directly support report-level RLS (meaning RLS defined only inside the
report layer). However, you can achieve this via:
Best Practice: Always enforce RLS at the dataset level, not report level, for better
security and maintenance.
Example:
Implement a role:
ON Target.CustomerID = Source.CustomerID
UPDATE DimCustomer
In Power BI Context:
• For Type 2, join the fact table either to the active row (EndDate IS NULL) or to a
time-aware version of the dimension, where FactDate falls BETWEEN Dim.StartDate
AND Dim.EndDate.
• Use Power Query or SQL views to present the latest version or full history
depending on need.
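The time-aware join described above can be sketched with Python's built-in `sqlite3` module. The table layout, column names, and sample rows below are hypothetical, and the sketch assumes a half-open validity interval (StartDate inclusive, EndDate exclusive, NULL for the active row) rather than an inclusive BETWEEN, so that a sale on the changeover date matches exactly one version.

```python
import sqlite3


def sales_with_historical_address():
    # In-memory database with hypothetical Type 2 dimension and fact tables
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE DimCustomer (CustomerID INTEGER, Address TEXT,
                                  StartDate TEXT, EndDate TEXT);
        CREATE TABLE FactSales (SaleID INTEGER, CustomerID INTEGER, SaleDate TEXT);
        INSERT INTO DimCustomer VALUES
            (1, 'Old Street 1', '2023-01-01', '2024-01-01'),
            (1, 'New Avenue 9', '2024-01-01', NULL);
        INSERT INTO FactSales VALUES
            (100, 1, '2023-06-15'),
            (101, 1, '2024-03-10');
    """)
    # Each sale picks the dimension row that was valid on the sale date
    rows = conn.execute("""
        SELECT f.SaleID, d.Address
        FROM FactSales f
        JOIN DimCustomer d
          ON d.CustomerID = f.CustomerID
         AND f.SaleDate >= d.StartDate
         AND (d.EndDate IS NULL OR f.SaleDate < d.EndDate)
        ORDER BY f.SaleID
    """).fetchall()
    conn.close()
    return rows
```

The 2023 sale resolves to the old address and the 2024 sale to the new one, which is the behavior a sales report with address history needs.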
Scenario: A customer changes their address, and you want to track previous addresses in
your sales reports.
A running total (cumulative sum) adds up a column incrementally over a defined order.
Here the order is sale_date, and the total restarts for each product_category.
SELECT
    sale_id,
    product_category,
    sale_date,
    sale_amount,
    SUM(sale_amount) OVER (
        PARTITION BY product_category
        ORDER BY sale_date
    ) AS running_total
FROM SalesData;
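To see what the window function computes, the same idea can be sketched in plain Python with hypothetical sample rows. One caveat: this sketch accumulates row by row, while the SQL default frame (RANGE ... CURRENT ROW) gives rows that share the same sale_date within a category the same running total.

```python
from collections import defaultdict


def running_totals(rows):
    """rows: iterable of (sale_id, product_category, sale_date, sale_amount)."""
    totals = defaultdict(int)  # one accumulator per category (PARTITION BY)
    result = []
    # Sort by category, then date, to mimic ORDER BY sale_date inside each partition
    for sale_id, category, sale_date, amount in sorted(rows, key=lambda r: (r[1], r[2])):
        totals[category] += amount
        result.append((sale_id, category, sale_date, totals[category]))
    return result
```

The accumulator restarts for every new category, which mirrors how the partition resets the sum.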
Output:
INNER JOIN
SELECT *
FROM Customers c
• Returns all rows from the left table, and matched rows from the right.
Use Case: Show all products, including those that haven't been sold.
SELECT *
FROM Products p
Use when your focus is on the left table, and you want to include unmatched records.
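A minimal sketch of the "all products, including unsold ones" case, using Python's built-in `sqlite3` module. The `Sales` table, the column names, and the sample rows are assumptions for illustration and are not the query from the original answer.

```python
import sqlite3


def products_with_sales():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Products (product_id INTEGER, name TEXT);
        CREATE TABLE Sales (sale_id INTEGER, product_id INTEGER, quantity INTEGER);
        INSERT INTO Products VALUES (1, 'Laptop'), (2, 'Mouse');
        INSERT INTO Sales VALUES (10, 1, 3);
    """)
    # LEFT JOIN keeps every product; unsold products get NULL (None) for quantity
    rows = conn.execute("""
        SELECT p.name, s.quantity
        FROM Products p
        LEFT JOIN Sales s ON s.product_id = p.product_id
        ORDER BY p.product_id
    """).fetchall()
    conn.close()
    return rows
```

The unsold product still appears in the result, with `None` in the column that comes from the right table.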
Use Case: List all employees in a department, even if they have no task assigned.
SELECT *
FROM Tasks t
Use when you want to retain all data from the right table.
Use Case: Combine customer and supplier addresses into one list, even if some
customers haven’t bought or some suppliers haven’t supplied.
SELECT *
FROM Customers c
Best when you want to combine two datasets and retain everything.
Summary Table:
FULL OUTER JOIN All rows from both with matches & gaps on both
Q7: How would you implement incremental refresh in Power
BI to efficiently update large datasets?
What is Incremental Refresh?
Incremental refresh is a technique in Power BI that refreshes only new or changed data
instead of reloading the entire dataset. This significantly improves performance and reduces
load time, which makes it ideal for large datasets.
o RangeStart (Date/Time)
o RangeEnd (Date/Time)
• Filter the table using a date column based on RangeStart and RangeEnd.
• Choose:
Use Case:
For a dataset containing sales transactions since 2010, you don't need to refresh the full
history daily; refreshing just the latest day's data is enough.
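The RangeStart/RangeEnd filter can be illustrated outside Power Query. This is a Python sketch of the filter logic only, not the Power Query M code; the `order_date` key and the row layout are hypothetical. It assumes a half-open window (inclusive start, exclusive end) so that a row sitting exactly on a boundary lands in only one refresh partition.

```python
from datetime import datetime


def rows_in_refresh_window(rows, range_start, range_end, date_key="order_date"):
    # Keep rows with range_start <= date < range_end (half-open interval)
    return [r for r in rows if range_start <= r[date_key] < range_end]
```

For example, with a one-day window for 2 January, a row stamped at midnight on 3 January is excluded and is picked up by the next window instead.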
A CTE (Common Table Expression) is a temporary named result set defined using WITH,
which you can reference within a SELECT, INSERT, UPDATE, or DELETE statement.
When breaking down logic into modular steps, especially when nesting subqueries makes
it unreadable.
WITH FilteredSales AS (
FROM Sales
GROUP BY product_id
SELECT product_id
FROM FilteredSales
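For a complete, runnable illustration of the same pattern (aggregate inside a CTE, then select from it), here is a sketch using Python's built-in `sqlite3` module. The `amount` column, the `total_amount` alias, the threshold, and the sample rows are assumptions for illustration and do not reconstruct the original query.

```python
import sqlite3


def high_revenue_products(min_total=1000):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Sales (product_id INTEGER, amount INTEGER);
        INSERT INTO Sales VALUES (1, 600), (1, 500), (2, 300);
    """)
    # Step 1 (CTE): total per product. Step 2: filter on the aggregated value.
    rows = conn.execute("""
        WITH FilteredSales AS (
            SELECT product_id, SUM(amount) AS total_amount
            FROM Sales
            GROUP BY product_id
        )
        SELECT product_id
        FROM FilteredSales
        WHERE total_amount > ?
        ORDER BY product_id
    """, (min_total,)).fetchall()
    conn.close()
    return [product_id for (product_id,) in rows]
```

Naming the aggregation step keeps the outer query short and avoids nesting the grouped query as a subquery.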
WITH EmployeeCTE AS (
SELECT EmployeeID, ManagerID, 1 AS Level
FROM Employees
UNION ALL
FROM Employees e
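A complete recursive CTE for an employee hierarchy could look like the sketch below, run through Python's built-in `sqlite3` module. The anchor filter (`ManagerID IS NULL` for the top of the hierarchy), the join condition, and the sample rows are assumptions. SQLite spells this `WITH RECURSIVE`, while SQL Server uses plain `WITH`.

```python
import sqlite3


def employee_levels():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Employees (EmployeeID INTEGER, ManagerID INTEGER);
        INSERT INTO Employees VALUES (1, NULL), (2, 1), (3, 2);
    """)
    rows = conn.execute("""
        WITH RECURSIVE EmployeeCTE AS (
            -- Anchor member: employees with no manager start at level 1
            SELECT EmployeeID, ManagerID, 1 AS Level
            FROM Employees
            WHERE ManagerID IS NULL
            UNION ALL
            -- Recursive member: each report sits one level below its manager
            SELECT e.EmployeeID, e.ManagerID, c.Level + 1
            FROM Employees e
            JOIN EmployeeCTE c ON e.ManagerID = c.EmployeeID
        )
        SELECT EmployeeID, Level
        FROM EmployeeCTE
        ORDER BY Level, EmployeeID
    """).fetchall()
    conn.close()
    return rows
```

The recursion stops on its own once no further employees report to the rows produced in the previous step.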
Rather than repeating the same subquery multiple times, define it once using a CTE and
reuse it.
Temp tables persist until they are dropped or the session ends, whereas CTEs exist only for
the duration of a single statement, which makes them ideal for one-time transformations.
Feature          CTE                      Subquery              Temp Table
Scope            Only within that query   Nested inside query   Exists until dropped
Reuse in Query   Yes                      No                    Yes
Creating custom visuals using R or Python helps when native Power BI visuals aren't
enough, for example for heatmaps, advanced forecasting, or machine learning charts.
dataset = dataset.dropna()
plt.show()
library(ggplot2)
geom_boxplot() +
theme_minimal()
Considerations:
You want to create a correlation heatmap between sales metrics across regions, which
isn't supported by native visuals. You can use Seaborn in Python to build it.
Transform row-based data into a column-based format manually, using CASE WHEN inside
aggregate functions such as MAX() or SUM().
Sample Table: Sales
East Q1 1000
East Q2 1200
West Q1 900
West Q2 1100
SELECT
Region,
FROM Sales
GROUP BY Region;
Explanation:
• MAX() is used to ensure a single value is returned per Region for each quarter.
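The manual pivot can be tried end to end on the sample rows above with Python's built-in `sqlite3` module. The column names `Quarter` and `Amount` are assumptions, since the sample table does not show its headers.

```python
import sqlite3


def pivot_sales_by_quarter():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Sales (Region TEXT, Quarter TEXT, Amount INTEGER);
        INSERT INTO Sales VALUES
            ('East', 'Q1', 1000), ('East', 'Q2', 1200),
            ('West', 'Q1', 900),  ('West', 'Q2', 1100);
    """)
    # CASE WHEN routes each row's amount to its quarter column;
    # MAX() collapses the group to one value per Region and quarter
    rows = conn.execute("""
        SELECT
            Region,
            MAX(CASE WHEN Quarter = 'Q1' THEN Amount END) AS Q1,
            MAX(CASE WHEN Quarter = 'Q2' THEN Amount END) AS Q2
        FROM Sales
        GROUP BY Region
        ORDER BY Region
    """).fetchall()
    conn.close()
    return rows
```

Each region ends up on a single row with one column per quarter. If a region could have several rows for the same quarter, SUM() would be the safer aggregate.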