GOOGLE
DATA ANALYST INTERVIEW
EXPERIENCE
YOE: 0-3
CTC: 22-24 LPA
SQL
Question 1: Write a query to calculate the bounce rate for a
website using session and page view data.
Concept: Bounce rate is the number of single-page sessions (sessions in which the
user viewed only one page) divided by the total number of sessions.
Assumptions:
• You have two tables: sessions and page_views.
• sessions table contains session_id and potentially other session-related details.
• page_views table contains session_id and page_url (or a similar identifier for a
page).
Input Tables:
1. sessions table:
session_id start_time user_id
S001 2025-06-19 10:00:00 U101
S002 2025-06-19 10:05:00 U102
S003 2025-06-19 10:10:00 U101
S004 2025-06-19 10:15:00 U103
S005 2025-06-19 10:20:00 U102
2. page_views table:
page_view_id session_id page_url view_time
PV001 S001 /home 2025-06-19 10:00:15
PV002 S001 /product 2025-06-19 10:01:00
PV003 S002 /about 2025-06-19 10:05:30
PV004 S003 /contact 2025-06-19 10:10:20
PV005 S003 /faq 2025-06-19 10:11:00
PV006 S003 /privacy 2025-06-19 10:11:45
PV007 S004 /index 2025-06-19 10:15:40
PV008 S005 /services 2025-06-19 10:20:10
PV009 S005 /pricing 2025-06-19 10:21:00
SQL Query:
SELECT
(COUNT(DISTINCT CASE WHEN pv_count = 1 THEN session_id ELSE NULL END) * 1.0 /
COUNT(DISTINCT session_id)) AS bounce_rate
FROM (
SELECT
session_id,
COUNT(page_url) AS pv_count
FROM
page_views
GROUP BY
session_id
) AS session_page_counts;
Explanation:
1. Inner Query (session_page_counts):
o SELECT session_id, COUNT(page_url) AS pv_count FROM page_views
GROUP BY session_id: This subquery calculates the total number of page
views for each session_id.
2. Outer Query:
o COUNT(DISTINCT session_id): This counts the total number of unique
sessions.
o COUNT(DISTINCT CASE WHEN pv_count = 1 THEN session_id ELSE NULL
END): This counts the number of unique sessions where pv_count (page
views for that session) is equal to 1. These are our "bounced" sessions.
o * 1.0: We multiply by 1.0 to ensure floating-point division, giving us a decimal
bounce rate.
o The result is the ratio of bounced sessions to total sessions.
Output:
bounce_rate
0.4000000000000000
(Interpretation: 2 out of 5 sessions were bounces (S002 and S004 each had only one page
view)).
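Note that the query above only counts sessions that appear in page_views. If sessions with zero recorded page views should also be included in the denominator (a judgment call worth confirming with the interviewer), a sketch that starts from the sessions table could look like this:
SELECT
    SUM(CASE WHEN pv.pv_count = 1 THEN 1 ELSE 0 END) * 1.0
        / COUNT(*) AS bounce_rate
FROM sessions s
LEFT JOIN (
    SELECT session_id, COUNT(*) AS pv_count
    FROM page_views
    GROUP BY session_id
) pv ON pv.session_id = s.session_id;
On the sample data both versions return 0.4, because every session has at least one page view.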
Question 2: From a user_activity table, find the number of users
who were active on at least 15 days in a given month.
Concept: This requires counting distinct days a user was active within a specific month
and then filtering for users who meet the 15-day threshold.
Assumptions:
• You have a user_activity table with user_id and activity_date.
• "Active" means there's at least one entry for that user on that day.
Input Table:
1. user_activity table:
activity_id user_id activity_date activity_type
1 U101 2025-05-01 login
2 U101 2025-05-01 view_product
3 U102 2025-05-01 login
4 U101 2025-05-02 add_to_cart
5 U103 2025-05-03 login
6 U101 2025-05-10 purchase
7 U101 2025-05-15 login
8 U101 2025-05-16 logout
9 U101 2025-05-17 login
10 U101 2025-05-18 view_product
11 U101 2025-05-19 add_to_cart
12 U101 2025-05-20 purchase
13 U101 2025-05-21 login
14 U101 2025-05-22 logout
15 U101 2025-05-23 login
16 U101 2025-05-24 view_product
17 U101 2025-05-25 add_to_cart
18 U101 2025-05-26 purchase
19 U101 2025-05-27 login
20 U101 2025-05-28 logout
21 U102 2025-05-05 login
22 U102 2025-05-10 view_product
23 U102 2025-05-12 purchase
24 U102 2025-05-14 login
25 U103 2025-05-05 login
26 U103 2025-05-10 view_product
27 U103 2025-05-12 purchase
SQL Query:
SELECT
COUNT(user_id) AS num_users_active_15_days
FROM (
SELECT
user_id,
COUNT(DISTINCT activity_date) AS distinct_active_days
FROM
user_activity
WHERE
STRFTIME('%Y-%m', activity_date) = '2025-05' -- For a given month, e.g., May 2025
GROUP BY
user_id
HAVING
distinct_active_days >= 15
) AS active_users_summary;
Explanation:
1. Inner Query (active_users_summary):
o SELECT user_id, COUNT(DISTINCT activity_date) AS distinct_active_days:
This counts the number of unique activity_date entries for each user_id.
o FROM user_activity: Specifies the table.
o WHERE STRFTIME('%Y-%m', activity_date) = '2025-05': This filters the data for
a specific month (May 2025 in this example). STRFTIME (or similar date
formatting functions like TO_CHAR in PostgreSQL/Oracle, FORMAT in SQL
Server, DATE_FORMAT in MySQL) extracts the year and month from the
activity_date.
o GROUP BY user_id: Groups the results by user to count distinct days per
user.
o HAVING distinct_active_days >= 15: Filters these grouped results, keeping
only those users who have been active on 15 or more distinct days.
2. Outer Query:
o COUNT(user_id) AS num_users_active_15_days: This simply counts the
number of user_ids that met the criteria from the inner query.
Output:
num_users_active_15_days
1
(Interpretation: Only user U101 was active on 15 or more distinct days in May 2025.)
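The STRFTIME filter and the alias reference in HAVING work in SQLite (and MySQL), but not in every engine. A sketch of an equivalent version for PostgreSQL, assuming the same table and column names:
SELECT
    COUNT(*) AS num_users_active_15_days
FROM (
    SELECT
        user_id
    FROM
        user_activity
    WHERE
        activity_date >= DATE '2025-05-01'
        AND activity_date <  DATE '2025-06-01'
    GROUP BY
        user_id
    HAVING
        COUNT(DISTINCT activity_date) >= 15
) AS active_users_summary;
Filtering on a plain date range (rather than formatting the date) also lets the database use an index on activity_date.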
Question 3: You have a search_logs table with query, timestamp,
and user_id. Find the top 3 most frequent search queries per
week.
Concept: This involves grouping by week, then by query, counting the occurrences, and
finally ranking queries within each week to get the top 3.
Assumptions:
• You have a search_logs table with query, timestamp, and user_id.
Input Table:
1. search_logs table:
log_id query timestamp user_id
1 "data analyst" 2025-06-03 10:00:00 U101
2 "SQL basics" 2025-06-03 11:00:00 U102
3 "data analyst" 2025-06-04 09:30:00 U103
4 "Python" 2025-06-05 14:00:00 U101
5 "data analyst" 2025-06-05 16:00:00 U102
6 "SQL basics" 2025-06-06 10:00:00 U101
7 "Python" 2025-06-06 11:00:00 U103
8 "machine learning" 2025-06-07 10:00:00 U101
9 "data analyst" 2025-06-10 09:00:00 U102
10 "SQL advanced" 2025-06-10 10:00:00 U101
11 "SQL advanced" 2025-06-11 11:00:00 U103
12 "Python" 2025-06-11 12:00:00 U102
13 "data analyst" 2025-06-12 13:00:00 U101
14 "machine learning" 2025-06-13 14:00:00 U103
15 "Python" 2025-06-13 15:00:00 U101
16 "data visualization" 2025-06-14 16:00:00 U102
(Note: The exact week boundary depends on the SQL dialect's date functions. The query below
uses SQLite's %Y-%W week number to group rows and reports each week's start date as the
preceding Sunday.)
SQL Query:
SELECT
    week_start_date,
    query,
    query_count
FROM (
    SELECT
        STRFTIME('%Y-%W', timestamp) AS week_identifier,  -- or DATE_TRUNC('week', timestamp) in PostgreSQL, etc.
        MIN(DATE(timestamp, '-6 days', 'weekday 0')) AS week_start_date,  -- most recent Sunday on or before the date
        query,
        COUNT(query) AS query_count,
        ROW_NUMBER() OVER (
            PARTITION BY STRFTIME('%Y-%W', timestamp)
            ORDER BY COUNT(query) DESC
        ) AS rn
    FROM
        search_logs
    GROUP BY
        week_identifier,
        query
) AS weekly_query_counts
WHERE
    rn <= 3
ORDER BY
    week_start_date,
    query_count DESC;
Explanation:
1. Inner Query (weekly_query_counts):
o STRFTIME('%Y-%W', timestamp) AS week_identifier: This extracts the year
and week number from the timestamp. %W typically represents the week
number of the year, with the first Monday as the first day of week 01. (For
different SQL dialects, you'd use functions like DATE_TRUNC('week',
timestamp) in PostgreSQL, DATEPART(week, timestamp) in SQL Server, or
WEEK(timestamp) in MySQL).
o MIN(DATE(timestamp, '-6 days', 'weekday 0')) AS week_start_date: This derives a
readable start date for each week. In SQLite, the 'weekday 0' modifier advances a date
forward to the next Sunday (or leaves it unchanged if it is already a Sunday), so stepping
back 6 days first yields the most recent Sunday on or before the timestamp. Use 'weekday 1'
with the same '-6 days' step for a Monday-based week, or the equivalent date functions in
your database.
o query: The search query itself.
o COUNT(query) AS query_count: Counts the occurrences of each query
within each week_identifier.
o GROUP BY week_identifier, query: Groups the data first by week, then by
query, to get counts for each unique query in each week.
o ROW_NUMBER() OVER (PARTITION BY STRFTIME('%Y-%W', timestamp)
ORDER BY COUNT(query) DESC) AS rn: This is a window function:
▪ PARTITION BY STRFTIME('%Y-%W', timestamp): It divides the data into
partitions (groups) for each week.
▪ ORDER BY COUNT(query) DESC: Within each week, it orders the
queries by their query_count in descending order (most frequent first).
▪ ROW_NUMBER(): Assigns a unique rank (1, 2, 3...) to each query
within its week, based on the ordering.
2. Outer Query:
o SELECT week_start_date, query, query_count: Selects the relevant columns.
o FROM ( ... ) AS weekly_query_counts: Uses the result of the inner query as a
subquery.
o WHERE rn <= 3: Filters the results to include only the top 3 ranked queries for
each week.
o ORDER BY week_start_date, query_count DESC: Orders the final output by
week and then by query count for better readability.
Output:
week_start_date query query_count
2025-06-01 data analyst 3
2025-06-01 SQL basics 2
2025-06-01 Python 2
2025-06-08 data analyst 2
2025-06-08 Python 2
2025-06-08 SQL advanced 2
(Note: The week_start_date may vary slightly depending on your database's week-starting
conventions. Here, DATE(timestamp, '-6 days', 'weekday 0') in SQLite anchors each week to
its most recent preceding Sunday, which makes 2025-06-01 the start of the first week.)
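For a non-SQLite engine, the same logic can lean on native week truncation. A sketch for PostgreSQL (same table and column names; "timestamp" is quoted because it is also a type keyword):
SELECT
    week_start_date,
    query,
    query_count
FROM (
    SELECT
        DATE_TRUNC('week', "timestamp")::date AS week_start_date,  -- Monday-based weeks
        query,
        COUNT(*) AS query_count,
        ROW_NUMBER() OVER (
            PARTITION BY DATE_TRUNC('week', "timestamp")
            ORDER BY COUNT(*) DESC
        ) AS rn
    FROM
        search_logs
    GROUP BY
        DATE_TRUNC('week', "timestamp"),
        query
) AS weekly_query_counts
WHERE rn <= 3
ORDER BY week_start_date, query_count DESC;
Swap ROW_NUMBER() for DENSE_RANK() if tied queries should all be kept in the top 3.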
POWER BI
1. How do you optimize a Power BI dashboard with millions of
rows for performance and user experience?
Optimizing a Power BI dashboard with millions of rows is crucial for responsiveness and
user satisfaction. This involves a multi-pronged approach covering data modeling, DAX,
visuals, and infrastructure.
Here's a detailed breakdown:
A. Data Model Optimization (Most Impactful):
1. Import Mode vs. DirectQuery/Live Connection:
o Import Mode: Generally offers the best performance because data is loaded
into Power BI's in-memory engine (VertiPaq). This is where most
optimizations apply.
o DirectQuery/Live Connection: Data remains in the source. Performance
heavily depends on the source system's speed and network latency.
Optimize the source database queries/views first.
o Hybrid (Composite Models): Combine Import and DirectQuery tables. Use
DirectQuery for large fact tables where real-time data is critical and Import
for smaller, static dimension tables. This is a powerful optimization.
2. Reduce Cardinality:
o Remove Unnecessary Columns: Delete columns not used for reporting,
filtering, or relationships. This reduces model size significantly.
o Reduce Row Count: Apply filters at the source or during data loading (e.g.,
only load the last 5 years of data if that's all that's needed).
o Optimize Data Types: Use the smallest appropriate data types (e.g., Whole
Number instead of Decimal where possible). Avoid text data types for
columns that could be numbers or dates.
o Cardinality of Columns: High-cardinality columns (unique values per row,
like timestamps with milliseconds, free-text fields) consume more memory
and slow down performance. Reduce precision for dates/times if not needed
(e.g., date instead of datetime).
3. Optimize Relationships:
o Correct Cardinality: Ensure relationships are set correctly (One-to-Many,
One-to-One).
o Disable Cross-Filter Direction if not needed: By default, Power BI often
sets "Both" directions. Change to "Single" if filtering only flows one way.
"Both" directions can create ambiguity and negatively impact performance.
o Avoid Bidirectional Relationships: Use them sparingly and only when
absolutely necessary, as they can lead to performance issues and
unexpected filter behavior.
4. Schema Design (Star Schema/Snowflake Schema):
o Star Schema is King: Organize your data into fact tables (measures) and
dimension tables (attributes). This is the most efficient design for Power BI's
VertiPaq engine, enabling fast slicing and dicing.
o Denormalization: For dimensions, consider denormalizing (flattening)
tables if they are small and frequently joined, to reduce relationship traversal
overhead.
5. Aggregations:
o Pre-aggregate Data: For very large fact tables, create aggregate tables (e.g.,
daily sums of sales instead of individual transactions).
o Power BI Aggregations: Power BI allows you to define aggregations within
the model, where Power BI automatically redirects queries to a smaller,
aggregated table if possible, improving query speed without changing the
report logic.
B. DAX Optimization:
1. Efficient DAX Formulas:
o Avoid Iterators (X-functions) on Large Tables: Functions like SUMX,
AVERAGEX can be slow if used on entire large tables. Where possible, use
simpler aggregate functions (SUM, AVERAGE).
o Use Variables (VAR): Store intermediate results in variables to avoid
recalculating the same expression multiple times. This improves both readability
and performance (a short DAX sketch follows this list).
o Minimize Context Transitions: Context transitions (e.g., using CALCULATE
without explicit filters) can be expensive. Understand how DAX calculates.
o Use KEEPFILTERS and REMOVEFILTERS strategically: To control filter
context precisely.
o Measure Branching: Break down complex measures into simpler, reusable
base measures.
2. Optimize Calculated Columns:
o Avoid Heavy Calculations in Calculated Columns: Calculated columns
are computed during data refresh and stored in the model, increasing its
size. If a calculation can be a measure, make it a measure.
o Push Calculations Upstream: Perform complex data transformations and
calculations in Power Query (M language) or even better, in the source
database (SQL views, stored procedures).
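As a concrete illustration of the VAR guidance above, a hypothetical year-over-year measure (the FactSales and 'Date' names mirror the examples later in this document and are assumptions, not a prescribed model):
Sales YoY % =
VAR CurrentSales = SUM ( FactSales[SalesAmount] )
VAR PriorSales =
    CALCULATE (
        SUM ( FactSales[SalesAmount] ),
        SAMEPERIODLASTYEAR ( 'Date'[Date] )
    )
RETURN
    DIVIDE ( CurrentSales - PriorSales, PriorSales )
Each intermediate value is computed once, and DIVIDE returns BLANK rather than erroring when there were no prior-year sales.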
C. Visual and Report Design Optimization:
1. Limit Number of Visuals: Too many visuals on a single page can lead to slower
rendering.
2. Optimize Visual Types: Some visuals are more performant than others. Table and
Matrix visuals with many rows/columns can be slow.
3. Use Filters and Slicers Effectively:
o Pre-filtered Pages: Create initial views that are already filtered to a smaller
data set.
o "Apply" Button for Slicers: For many slicers, enable the "Apply" button so
queries only run after all selections are made.
o Hierarchy Slicers: Use hierarchy slicers if appropriate, as they can
sometimes be more efficient than many individual slicers.
4. Conditional Formatting: Complex conditional formatting rules can impact
performance.
5. Measure Headers in Matrix/Table: Avoid placing measures in the "Rows" or
"Columns" of a matrix/table, as this significantly increases cardinality and memory
usage.
D. Power Query (M Language) Optimization:
1. Query Folding: Ensure Power Query steps are "folded back" to the source database
as much as possible. This means the transformation happens at the source,
reducing the data transferred to Power BI. Check the query plan for folding
indicators.
2. Remove Unnecessary Steps: Clean up your Power Query steps; remove redundant
transformations.
3. Disable Load: Disable loading for staging queries or queries that are only used as
intermediate steps.
E. Power BI Service and Infrastructure:
1. Premium Capacity: For very large datasets and many users, consider Power BI
Premium (per user or capacity). This provides dedicated resources, larger memory
limits, and features like XMLA endpoint for advanced management.
2. Scheduled Refresh Optimization: Use incremental refresh (discussed in the next
question).
3. Monitoring: Use Power BI Performance Analyzer to identify slow visuals and DAX
queries. Use external tools like DAX Studio to analyze and optimize DAX expressions
and monitor VertiPaq memory usage.
User Experience Considerations:
• Clear Navigation: Use bookmarks, buttons, and drill-throughs for intuitive
navigation.
• Performance Awareness: Inform users about initial load times for large reports.
• Clean Design: Avoid cluttered dashboards. Focus on key metrics.
• Responsiveness: Ensure the dashboard adapts well to different screen sizes.
2. Explain how incremental data refresh works and why it’s
important.
How Incremental Data Refresh Works:
Incremental refresh allows Power BI to refresh large datasets efficiently by loading only new
or updated data instead of reprocessing the entire dataset on every refresh. It is available
with Power BI Pro (within the 1 GB dataset limit) as well as Premium, where it is most
valuable for very large datasets.
Here's the mechanism:
1. Defining the Policy: You configure an incremental refresh policy in Power BI
Desktop for specific tables (usually large fact tables). This policy defines:
o Date/Time Column: A column in your table that Power BI can use to identify
new or changed rows (e.g., OrderDate, LastModifiedDate). This column must
be of Date/Time data type.
o Range Start (RangeStart) and Range End (RangeEnd) Parameters: These
are two reserved Date/Time parameters that Power BI automatically generates
and passes to your data source query. They define the "window" of data to be
refreshed (see the Power Query sketch after this list).
o Archive Period: How many past years/months/days of data you want to keep
in the Power BI model. This data will be loaded once and then not refreshed.
o Refresh Period: How many recent years/months/days of data should be
refreshed incrementally with each refresh operation. This is the "sliding
window" for new/updated data.
2. Partitioning: When you publish the report to the Power BI Service, Power BI
dynamically creates partitions for the table based on your incremental refresh
policy:
o Historical Partitions: For the "Archive Period," Power BI creates partitions
that contain historical data. This data is loaded once and then not refreshed
in subsequent refreshes.
o Incremental Refresh Partition(s): For the "Refresh Period," Power BI creates
one or more partitions. Only these partitions are refreshed in subsequent
refresh cycles.
o Real-time Partition (Optional): If you configure a DirectQuery partition, this
can fetch the latest data directly from the source for the freshest view.
3. Refresh Process:
o When a scheduled refresh runs, Power BI calculates the RangeStart and
RangeEnd values based on the current refresh time and your policy.
o It then issues a query to your data source using these parameters, fetching
only the data within the defined refresh window.
o This new/updated data is loaded into the incremental partition(s), and older
incremental partitions might be rolled into the archive partitions or removed,
as the window slides.
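The RangeStart/RangeEnd filter referenced in the policy above is expressed as an ordinary Power Query step; a minimal sketch, assuming the fact table has a Date/Time column named OrderDate:
// Filter step in the table's Power Query (M) source query.
// RangeStart and RangeEnd are the reserved incremental-refresh parameters;
// OrderDate is an assumed Date/Time column.
FilteredRows = Table.SelectRows(
    Source,
    each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
)
Keeping one boundary inclusive and the other exclusive ensures a row never lands in two adjacent partitions.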
Why It's Important:
Incremental refresh is vital for several reasons, especially with large datasets:
1. Faster Refreshes: This is the primary benefit. Instead of reloading millions or
billions of rows, Power BI only fetches tens or hundreds of thousands, dramatically
cutting down refresh times from hours to minutes or seconds.
2. Reduced Resource Consumption:
o Less Memory: Fewer resources are consumed on the Power BI service side
during refresh because less data is being processed.
o Less Network Bandwidth: Less data needs to be transferred from the
source system to Power BI.
o Less Load on Source System: The source database experiences less strain
because queries are filtered to a smaller range, reducing query execution
time and resource usage on the database server.
3. Higher Refresh Frequency: Because refreshes are faster and less resource-
intensive, you can schedule them more frequently (e.g., hourly instead of daily),
providing users with more up-to-date data.
4. Increased Reliability: Shorter refresh windows reduce the chances of refresh
failures due to network timeouts, source system issues, or hitting refresh limits.
5. Scalability: Enables Power BI to handle datasets that would otherwise be too large
or too slow to refresh regularly, making it viable for enterprise-level reporting
solutions.
6. Better User Experience: Users get access to fresh data faster, improving their
decision-making capabilities.
3. What’s the difference between calculated columns and
measures in Power BI, and when would you use each?
Calculated columns and measures are both powerful DAX (Data Analysis Expressions)
constructs in Power BI, but they serve fundamentally different purposes and have distinct
characteristics.
Feature          | Calculated Column                                   | Measure
Calculation Time | During data refresh (load time)                     | At query time (when used in a visual)
Storage          | Stored in the data model (adds to model size)       | Not stored (calculated on the fly)
Context          | Row context (can refer to values in the same row)   | Filter context (and row context within iterators)
Output           | A new column added to the table                     | A single scalar value (number, text, date)
Impact on Size   | Increases PBIX file size and memory usage           | Minimal impact on PBIX file size
Aggregation      | Can be aggregated like any other column             | Always aggregated (implicit or explicit)
Visibility       | Appears as a column in the Fields pane              | Appears as a measure in the Fields pane
When to Use Each:
Use Calculated Columns When:
1. You need to create a new categorical attribute:
o Full Name = [FirstName] & " " & [LastName]
o Age Group = IF([Age] < 18, "Child", IF([Age] < 65, "Adult", "Senior"))
2. You need to perform row-level calculations that will be used for slicing, dicing,
or filtering:
o Profit Margin % = ([Sales] - [Cost]) / [Sales] (if you need to filter or group by
this margin on a row-by-row basis).
o Fiscal Quarter = "Q" & ROUNDUP(MONTH([Date])/3,0)
3. You need to define relationships: Calculated columns can be used as the key for
relationships if a direct column from your source isn't suitable. (However, it's often
better to handle this in Power Query if possible).
4. You are creating a static value for each row that doesn't change based on filters
applied in the report.
Use Measures When:
1. You need to perform aggregations or calculations that respond dynamically to
filters and slicers applied in the report:
o Total Sales = SUM(FactSales[SalesAmount])
o Average Order Value = DIVIDE( [Total Sales], COUNTROWS(FactSales) )
o Sales YTD = TOTALYTD([Total Sales], 'Date'[Date])
2. You need to calculate a ratio, percentage, or difference that changes based on
the selected context:
o % of Total Sales = DIVIDE([Total Sales], CALCULATE([Total Sales],
ALL(Product[Category])))
3. You want to perform complex time-intelligence calculations:
o Sales Last Year = CALCULATE([Total Sales],
SAMEPERIODLASTYEAR('Date'[Date]))
4. You want to minimize the model size and optimize performance: Since measures
are calculated on the fly and not stored, they are generally preferred for
performance over calculated columns, especially for large datasets.
5. Your calculation logic changes based on the filter context of the visual.
General Rule of Thumb:
• If you can do it in Power Query (M Language), do it there. This pushes the
calculation closest to the source, often leveraging query folding.
• If it's a row-level calculation that defines a characteristic of that row and you
need to slice/dice by it, use a Calculated Column.
• For all other aggregations and dynamic calculations that react to user
interaction, use a Measure.
Choosing correctly between calculated columns and measures is fundamental for building
efficient, performant, and maintainable Power BI models.
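To make the contrast concrete, a short sketch (the FactSales[SalesAmount] and FactSales[Cost] names are assumptions used for illustration):
-- Calculated column: evaluated per row during refresh and stored in the model
Profit = FactSales[SalesAmount] - FactSales[Cost]

-- Measure: evaluated at query time and responds to filters and slicers
Profit Margin % =
DIVIDE (
    SUM ( FactSales[SalesAmount] ) - SUM ( FactSales[Cost] ),
    SUM ( FactSales[SalesAmount] )
)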
4. How would you implement cross-report drillthrough in Power
BI for navigating between detailed reports?
Cross-report drillthrough in Power BI allows users to jump from a summary visual in one
report to a more detailed report page in a different report, passing the filter context along.
This is incredibly powerful for creating a guided analytical experience across a suite of
related reports.
Here's how you would implement it:
Scenario:
• Source Report (Summary): Sales Overview Dashboard.pbix with a chart showing
"Sales by Region."
• Target Report (Detail): Regional Sales Details.pbix with a table showing individual
sales transactions for a specific region.
Steps to Implement Cross-Report Drillthrough:
1. Prepare the Target Report (Regional Sales Details.pbix):
• Create the Detail Page: Open Regional Sales Details.pbix. Create a new page
dedicated to displaying the detailed information (e.g., "Sales Transactions").
• Add Drillthrough Fields:
o In the "Fields" pane for your detail page, locate the fields that will serve as the
drillthrough filters (e.g., Region Name, Product Category). These are the
fields that will be passed from the source report.
o Drag these fields into the "Drill through" section of the "Visualizations" pane.
o Crucial: Ensure that the data types and column names of these drillthrough
fields are identical in both the source and target reports. If they aren't, the
drillthrough won't work correctly.
• Set "Keep all filters": By default, "Keep all filters" is usually on. This ensures that
any other filters applied to the source visual (e.g., date range, product type) are also
passed to the target report. You can turn it off if you only want to pass the
drillthrough fields explicitly.
• Add Visuals: Add the detailed visuals (e.g., a table showing Date, Product,
Customer, Sales Amount) to this drillthrough page.
• Add a Back Button (Optional but Recommended): Power BI automatically adds a
"back" button for intra-report drillthrough. For cross-report, you usually add a
custom button (Insert > Buttons > Back) and configure its action to "Back" or a
specific bookmark if you have complex navigation. This allows users to easily return
to the summary report.
• Publish the Target Report: Publish Regional Sales Details.pbix to a Power BI
workspace in the Power BI Service. Make sure it's in a workspace that both you and
your users have access to.
2. Prepare the Source Report (Sales Overview Dashboard.pbix):
• Ensure Data Model Consistency: Verify that the drillthrough fields (e.g., Region
Name, Product Category) exist in the source report's data model and have the same
name and data type as in the target report.
• Select the Source Visual: Choose the visual from which you want to initiate the
drillthrough (e.g., your "Sales by Region" bar chart).
• Configure Drillthrough Type:
o Go to the "Format" pane for the selected visual.
o Under the "Drill through" card, ensure "Cross-report" is enabled.
• Choose the Target Report:
o In the "Drill through" card, you'll see a dropdown list of available reports in
your workspace that have drillthrough pages configured.
o Select Regional Sales Details from this list.
• Publish the Source Report: Publish Sales Overview Dashboard.pbix to the same
Power BI workspace as the target report. This is essential for cross-report
drillthrough to work.
3. User Experience in Power BI Service:
• Navigation: When a user views the Sales Overview Dashboard report in the Power
BI Service, they can right-click on a data point in the configured source visual (e.g., a
bar representing "East" region sales).
• Drillthrough Option: A context menu will appear, and they will see an option like
"Drill through" -> "Regional Sales Details."
• Context Passing: Clicking this option will open the Regional Sales Details report,
automatically navigating to the specified drillthrough page. Critically, the Region
Name (e.g., "East") and any other filters from the source visual will be applied to the
Regional Sales Details report, showing only the transactions for the "East" region.
Key Considerations for Cross-Report Drillthrough:
• Workspace: Both reports must be published to the same Power BI workspace. This
is a fundamental requirement.
• Field Matching: Column names and data types of the drillthrough fields must be an
exact match across both reports. Case sensitivity can also be an issue.
• Data Models: While the column names must match, the underlying data models
don't have to be identical. The source report only needs the columns to pass as
filters, and the target report needs those columns to filter its detailed data.
• User Permissions: Users must have at least "Viewer" access to both the source
and target reports in the Power BI Service.
• Security (Row-Level Security - RLS): RLS applied in the target report will respect
the user's RLS role, even if the drillthrough filters pass data that the user wouldn't
normally see. The RLS will act as an additional filter layer.
• Performance: Be mindful of the performance of the target report, especially if it's
loading large volumes of detailed data. Optimize it as per the first question.
• Clarity: Make it clear to users that a drillthrough option exists (e.g., by adding text
instructions or using appropriate visual cues).
Cross-report drillthrough is an advanced feature that significantly enhances the navigability
and analytical depth of your Power BI solutions by allowing you to break down complex
business problems into manageable, linked reports.
PYTHON
1. Write a Python function to detect outliers in a dataset using
the IQR method.
The Interquartile Range (IQR) method is a robust way to identify outliers, as it's less
sensitive to extreme values than methods based on the mean and standard deviation.
Concept:
1. Calculate the First Quartile (Q1) - 25th percentile.
2. Calculate the Third Quartile (Q3) - 75th percentile.
3. Calculate the Interquartile Range (IQR) = Q3 - Q1.
4. Define the Lower Bound: Q1 - 1.5 * IQR
5. Define the Upper Bound: Q3 + 1.5 * IQR
6. Any data point below the Lower Bound or above the Upper Bound is considered an
outlier.
Python Function:
import numpy as np
import pandas as pd


def detect_iqr_outliers(data, column):
    """
    Detects outliers in a specified column of a pandas DataFrame using the IQR method.

    Args:
        data (pd.DataFrame): The input DataFrame.
        column (str): The name of the column to check for outliers.

    Returns:
        pd.DataFrame: A DataFrame containing the detected outliers.
        dict: A dictionary containing the outlier bounds (lower_bound, upper_bound).
    """
    if column not in data.columns:
        raise ValueError(f"Column '{column}' not found in the DataFrame.")

    # Ensure the column is numeric
    if not pd.api.types.is_numeric_dtype(data[column]):
        raise TypeError(f"Column '{column}' is not numeric. The IQR method requires numeric data.")

    Q1 = data[column].quantile(0.25)
    Q3 = data[column].quantile(0.75)
    IQR = Q3 - Q1

    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    outliers = data[(data[column] < lower_bound) | (data[column] > upper_bound)]
    bounds = {'lower_bound': lower_bound, 'upper_bound': upper_bound}

    return outliers, bounds


# --- Example Usage ---
if __name__ == "__main__":
    # Sample Dataset
    np.random.seed(42)
    data = {
        'id': range(1, 21),
        'value': np.random.normal(loc=50, scale=10, size=20)
    }
    df = pd.DataFrame(data)

    # Introduce some outliers
    df.loc[3, 'value'] = 150   # Outlier
    df.loc[15, 'value'] = 5    # Outlier
    df.loc[19, 'value'] = 120  # Outlier

    print("Original DataFrame:")
    print(df)
    print("\n" + "=" * 30 + "\n")

    # Detect outliers in the 'value' column
    outliers_df, bounds = detect_iqr_outliers(df, 'value')

    print("Calculated Bounds for 'value':")
    print(f"  Q1: {df['value'].quantile(0.25):.2f}")
    print(f"  Q3: {df['value'].quantile(0.75):.2f}")
    print(f"  IQR: {(df['value'].quantile(0.75) - df['value'].quantile(0.25)):.2f}")
    print(f"  Lower Bound: {bounds['lower_bound']:.2f}")
    print(f"  Upper Bound: {bounds['upper_bound']:.2f}")
    print("\n" + "=" * 30 + "\n")

    print("Detected Outliers:")
    if not outliers_df.empty:
        print(outliers_df)
    else:
        print("No outliers detected.")
    print("\n" + "=" * 30 + "\n")

    # Example with no outliers (after removing the manual outliers)
    df_clean = df.drop([3, 15, 19]).copy()
    print("DataFrame without extreme manual outliers:")
    print(df_clean)

    outliers_clean, bounds_clean = detect_iqr_outliers(df_clean, 'value')
    print("\nDetected Outliers (cleaned data):")
    if not outliers_clean.empty:
        print(outliers_clean)
    else:
        print("No outliers detected.")
Explanation:
1. Import numpy and pandas: Essential libraries for numerical operations and
DataFrame manipulation.
2. detect_iqr_outliers(data, column) function:
o Input Validation: Checks if the column exists in the DataFrame and if it's a
numeric data type. This makes the function more robust.
o Calculate Quartiles: data[column].quantile(0.25) and
data[column].quantile(0.75) directly compute Q1 and Q3 using pandas' built-
in quantile method.
o Calculate IQR: IQR = Q3 - Q1.
o Calculate Bounds: lower_bound = Q1 - 1.5 * IQR and upper_bound = Q3 +
1.5 * IQR. The 1.5 factor is a commonly used convention.
o Identify Outliers: data[(data[column] < lower_bound) | (data[column] >
upper_bound)] uses boolean indexing to filter the DataFrame and select rows
where the specified column's value falls outside the calculated bounds.
o Return Values: The function returns two things: a DataFrame containing the
outlier rows and a dictionary with the calculated bounds. This allows the
caller to not only see the outliers but also understand the thresholds used.
3. Example Usage (if __name__ == "__main__":)
o A sample DataFrame df is created with normally distributed data.
o Specific rows are then manually modified to introduce clear outliers for
demonstration.
o The detect_iqr_outliers function is called, and its output (outlier DataFrame
and bounds) is printed.
o A second example demonstrates the function on a cleaner dataset where no
"extreme" outliers are introduced to show a case with no detected outliers.
2. You have two DataFrames: clicks and installs. Merge them and
calculate the install-to-click ratio per campaign.
This question tests your knowledge of DataFrame merging, aggregation, and basic
arithmetic operations in pandas.
Concept:
1. Merge the clicks and installs DataFrames. A campaign_id is the natural key for
merging. A "left merge" (or "left join") is appropriate if we want to retain all click data
and bring in matching install data.
2. After merging, group the data by campaign_id.
3. For each campaign, count the total clicks and total installs.
4. Calculate the ratio: total_installs / total_clicks. Handle division by zero.
Python Code:
import pandas as pd


def calculate_install_to_click_ratio(clicks_df, installs_df):
    """
    Merges clicks and installs DataFrames and calculates the install-to-click ratio per campaign.

    Args:
        clicks_df (pd.DataFrame): DataFrame with click data, expected columns:
            ['campaign_id', 'click_id', ...].
        installs_df (pd.DataFrame): DataFrame with install data, expected columns:
            ['campaign_id', 'install_id', ...].

    Returns:
        pd.DataFrame: A DataFrame with 'campaign_id', 'total_clicks', 'total_installs',
        and 'install_to_click_ratio'.
    """
    # 1. Aggregate clicks and installs by campaign_id before merging.
    #    This is often more efficient for large datasets than merging raw rows.
    clicks_agg = clicks_df.groupby('campaign_id').size().reset_index(name='total_clicks')
    installs_agg = installs_df.groupby('campaign_id').size().reset_index(name='total_installs')

    # 2. Merge the aggregated data. A left join keeps every campaign that had clicks;
    #    campaigns with no installs will show NaN in total_installs.
    merged_df = pd.merge(
        clicks_agg,
        installs_agg,
        on='campaign_id',
        how='left'
    )

    # 3. Handle campaigns with no installs (NaN in total_installs after the left join)
    merged_df['total_installs'] = merged_df['total_installs'].fillna(0).astype(int)

    # 4. Calculate the install-to-click ratio, guarding against division by zero.
    merged_df['install_to_click_ratio'] = merged_df.apply(
        lambda row: row['total_installs'] / row['total_clicks'] if row['total_clicks'] > 0 else 0,
        axis=1
    )
    # Or, using np.where for potentially better performance on large datasets:
    # merged_df['install_to_click_ratio'] = np.where(
    #     merged_df['total_clicks'] > 0,
    #     merged_df['total_installs'] / merged_df['total_clicks'],
    #     0  # Set to 0 if no clicks, or np.nan if you prefer
    # )

    return merged_df[['campaign_id', 'total_clicks', 'total_installs', 'install_to_click_ratio']]


# --- Example Usage ---
if __name__ == "__main__":
    # Sample Clicks DataFrame
    clicks_data = {
        'campaign_id': ['C1', 'C1', 'C2', 'C3', 'C1', 'C2', 'C4', 'C3', 'C1'],
        'click_id': range(101, 110),
        'timestamp': pd.to_datetime(['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02',
                                     '2023-01-03', '2023-01-03', '2023-01-04', '2023-01-05',
                                     '2023-01-05'])
    }
    clicks_df = pd.DataFrame(clicks_data)

    # Sample Installs DataFrame
    installs_data = {
        'campaign_id': ['C1', 'C2', 'C1', 'C3', 'C1', 'C2'],
        'install_id': range(201, 207),
        'timestamp': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-03',
                                     '2023-01-05', '2023-01-05'])
    }
    installs_df = pd.DataFrame(installs_data)

    print("Clicks DataFrame:")
    print(clicks_df)
    print("\nInstalls DataFrame:")
    print(installs_df)
    print("\n" + "=" * 30 + "\n")

    # Calculate the ratio
    ratio_df = calculate_install_to_click_ratio(clicks_df, installs_df)
    print("Install-to-Click Ratio per Campaign:")
    print(ratio_df)
    print("\n" + "=" * 30 + "\n")

    # Example with a campaign that has clicks but no installs
    clicks_data_2 = {
        'campaign_id': ['C1', 'C1', 'C5', 'C5'],  # C5 has clicks but no installs
        'click_id': [1, 2, 3, 4]
    }
    installs_data_2 = {
        'campaign_id': ['C1', 'C1'],
        'install_id': [10, 11]
    }
    clicks_df_2 = pd.DataFrame(clicks_data_2)
    installs_df_2 = pd.DataFrame(installs_data_2)

    ratio_df_2 = calculate_install_to_click_ratio(clicks_df_2, installs_df_2)
    print("Install-to-Click Ratio (with a campaign with no installs):")
    print(ratio_df_2)
Explanation:
1. calculate_install_to_click_ratio(clicks_df, installs_df) function:
o Aggregation Before Merge (Optimization): Instead of merging the full
clicks_df and installs_df (which could be very large), we first aggregate them
by campaign_id using
groupby('campaign_id').size().reset_index(name='total_clicks'). This creates
smaller DataFrames (clicks_agg and installs_agg) containing just the
campaign_id and the total count for that campaign. Merging these smaller
DataFrames is much more efficient.
o Merge: pd.merge(clicks_agg, installs_agg, on='campaign_id', how='left')
performs a left join. This ensures that every campaign_id that had clicks is
included in the final result, even if it had zero installs.
o Handle Missing Installs: merged_df['total_installs'].fillna(0).astype(int)
replaces NaN values (which occur for campaigns with clicks but no installs)
with 0 and converts the column to integer type.
o Calculate Ratio:
▪ merged_df.apply(lambda row: row['total_installs'] / row['total_clicks']
if row['total_clicks'] > 0 else 0, axis=1): This calculates the ratio row by
row. It includes a if row['total_clicks'] > 0 else 0 check to prevent
ZeroDivisionError. If total_clicks is 0, the ratio is set to 0. You could
also set it to np.nan if that's more appropriate for your use case.
▪ The commented-out np.where alternative is generally more
performant for very large DataFrames as it's vectorized.
o Return Value: The function returns a DataFrame with the campaign_id, total
clicks, total installs, and the calculated ratio.
2. Example Usage:
o Sample clicks_df and installs_df are created to demonstrate the function.
o The results are printed, showing the counts and the calculated ratios for
each campaign.
o A second example shows how the function handles a campaign that has
clicks but no installs, correctly assigning an install count of 0 and a ratio of 0.
3. How would you use Python to automate a weekly reporting
task that includes querying data, generating a chart, and
emailing it?
Automating this type of task is a classic use case for Python in data analysis. It involves
several key libraries and concepts.
Overall Workflow:
1. Configuration: Store sensitive information (database credentials, email passwords,
recipient lists) securely.
2. Data Querying: Connect to a database (e.g., SQL, PostgreSQL, etc.) and fetch the
necessary data.
3. Data Processing: Use pandas to clean, transform, and aggregate the queried data.
4. Chart Generation: Use a plotting library (e.g., Matplotlib, Seaborn, Plotly) to create
a visual representation of the data.
5. Emailing: Use Python's smtplib and email.mime modules to send the report as an
email with the chart attached.
6. Scheduling: Use operating system tools (Cron on Linux/macOS, Task Scheduler on
Windows) or a Python scheduler (e.g., APScheduler) to run the script automatically
every week.
Python Libraries Used:
• pandas: For data manipulation.
• sqlalchemy / psycopg2 / mysql-connector-python: For connecting to databases
(example below uses a generic SQL connection via sqlalchemy).
• matplotlib.pyplot / seaborn: For plotting.
• smtplib: For sending emails via SMTP.
• email.mime.multipart / email.mime.text / email.mime.image: For constructing
email messages with attachments.
• configparser / python-dotenv: For managing configurations and credentials
(recommended for security).
Example Code Structure:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sqlalchemy # or specific db driver like psycopg2
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.image import MIMEImage
import datetime
import os
import configparser # For storing credentials securely (basic example)
# --- Configuration (highly recommended: use .env or a dedicated config file) ---
# Create a config.ini file:
# [DATABASE]
# DB_TYPE = postgresql
# DB_USER = your_db_user
# DB_PASSWORD = your_db_password
# DB_HOST = localhost
# DB_PORT = 5432
# DB_NAME = your_database
#
# [EMAIL]
# SENDER_EMAIL = [email protected]
# SENDER_PASSWORD = your_email_app_password  # Use app passwords for Gmail/Outlook
# RECEIVER_EMAIL = [email protected]
# SMTP_SERVER = smtp.gmail.com
# SMTP_PORT = 587  # or 465 for SSL

config = configparser.ConfigParser()
config.read('config.ini')  # Make sure config.ini is in the same directory or provide the full path

# Database Credentials
DB_TYPE = config['DATABASE']['DB_TYPE']
DB_USER = config['DATABASE']['DB_USER']
DB_PASSWORD = config['DATABASE']['DB_PASSWORD']
DB_HOST = config['DATABASE']['DB_HOST']
DB_PORT = config['DATABASE']['DB_PORT']
DB_NAME = config['DATABASE']['DB_NAME']

# Email Credentials
SENDER_EMAIL = config['EMAIL']['SENDER_EMAIL']
SENDER_PASSWORD = config['EMAIL']['SENDER_PASSWORD']
RECEIVER_EMAIL = config['EMAIL']['RECEIVER_EMAIL']
SMTP_SERVER = config['EMAIL']['SMTP_SERVER']
SMTP_PORT = int(config['EMAIL']['SMTP_PORT'])
def get_data_from_db():
    """
    Connects to the database and fetches the required data.
    """
    # Example connection string for PostgreSQL.
    # For MySQL:  'mysql+mysqlconnector://user:password@host:port/database'
    # For SQLite: 'sqlite:///your_database.db'
    db_connection_str = f'{DB_TYPE}://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}'
    try:
        engine = sqlalchemy.create_engine(db_connection_str)
        # Query data for the last week.
        # Adjust the SQL to your table structure and date-filtering syntax.
        query = """
            SELECT
                campaign_name,
                SUM(clicks) AS total_clicks,
                SUM(installs) AS total_installs
            FROM
                your_ad_data_table
            WHERE
                date_column >= CURRENT_DATE - INTERVAL '7 days'  -- Example for PostgreSQL
                -- date_column >= DATE('now', '-7 days')          -- Example for SQLite
                -- date_column >= DATEADD(day, -7, GETDATE())     -- Example for SQL Server
            GROUP BY
                campaign_name
            ORDER BY
                total_clicks DESC;
        """
        df = pd.read_sql(query, engine)
        print("Data queried successfully.")
        return df
    except Exception as e:
        print(f"Error querying data: {e}")
        return pd.DataFrame()  # Return an empty DataFrame on error
def generate_chart(data_df, chart_filename="campaign_performance.png"):
    """
    Generates a bar chart from the data and saves it as an image.
    """
    if data_df.empty:
        print("No data to generate chart.")
        return None

    # Calculate the ratio for plotting
    data_df['install_to_click_ratio'] = data_df.apply(
        lambda row: row['total_installs'] / row['total_clicks'] if row['total_clicks'] > 0 else 0,
        axis=1
    )

    plt.figure(figsize=(12, 7))
    sns.barplot(x='campaign_name', y='total_clicks', data=data_df, color='skyblue',
                label='Total Clicks')
    sns.barplot(x='campaign_name', y='total_installs', data=data_df, color='lightcoral',
                label='Total Installs')

    # Optional: add a line for the ratio on a secondary axis, or create a separate chart
    # plt.twinx()
    # sns.lineplot(x='campaign_name', y='install_to_click_ratio', data=data_df,
    #              color='green', marker='o', label='Install-to-Click Ratio')
    # plt.ylabel("Ratio")

    plt.title(f"Weekly Campaign Performance - {datetime.date.today().strftime('%Y-%m-%d')}")
    plt.xlabel("Campaign Name")
    plt.ylabel("Count")
    plt.xticks(rotation=45, ha='right')
    plt.legend()
    plt.tight_layout()

    plt.savefig(chart_filename)
    plt.close()  # Close the plot to free memory
    print(f"Chart saved as {chart_filename}")
    return chart_filename
def send_email_report(chart_path, sender_email, sender_password, receiver_email,
                      smtp_server, smtp_port):
    """
    Sends the generated chart as an email attachment.
    """
    if not chart_path or not os.path.exists(chart_path):
        print("Chart file not found, cannot send email.")
        return

    msg = MIMEMultipart()
    msg['From'] = sender_email
    msg['To'] = receiver_email
    msg['Subject'] = f"Weekly Campaign Performance Report - {datetime.date.today().strftime('%Y-%m-%d')}"

    # Email body
    body = """
    <html>
      <body>
        <p>Hi Team,</p>
        <p>Please find attached the weekly campaign performance report.</p>
        <p>This report covers data for the last 7 days ending today.</p>
        <p>Best regards,<br>Your Reporting Automation Bot</p>
        <img src="cid:my_chart_image">
      </body>
    </html>
    """
    msg.attach(MIMEText(body, 'html'))

    # Attach the chart image (MIMEImage handles the base64 encoding for us)
    with open(chart_path, 'rb') as fp:
        img = MIMEImage(fp.read(), _subtype='png')
    img.add_header('Content-Disposition', 'attachment',
                   filename=os.path.basename(chart_path))
    img.add_header('Content-ID', '<my_chart_image>')  # CID for embedding in the HTML body
    img.add_header('X-Attachment-Id', 'my_chart_image')
    msg.attach(img)

    try:
        with smtplib.SMTP(smtp_server, smtp_port) as server:
            server.starttls()  # Secure the connection
            server.login(sender_email, sender_password)
            server.send_message(msg)
        print("Email sent successfully!")
    except Exception as e:
        print(f"Error sending email: {e}")
# --- Main Automation Function ---
def automate_weekly_report():
    print("Starting weekly report automation...")

    # 1. Query Data
    data_df = get_data_from_db()
    if data_df.empty:
        print("No data retrieved. Exiting automation.")
        return

    # 2. Generate Chart
    chart_filename = generate_chart(data_df)

    # 3. Email Report
    if chart_filename:
        send_email_report(chart_filename, SENDER_EMAIL, SENDER_PASSWORD,
                          RECEIVER_EMAIL, SMTP_SERVER, SMTP_PORT)
        # Clean up the generated chart file
        os.remove(chart_filename)
        print(f"Cleaned up {chart_filename}")
    else:
        print("Chart generation failed, skipping email.")

    print("Weekly report automation finished.")


if __name__ == "__main__":
    automate_weekly_report()

# To schedule this weekly:
# On Linux/macOS:
#   1. Save the script as e.g. `weekly_report.py`
#   2. Create a cron job: `crontab -e`
#   3. Add a line like (e.g., every Monday at 9 AM):
#      `0 9 * * 1 /usr/bin/python3 /path/to/your/weekly_report.py`
# On Windows:
#   Use Task Scheduler to create a task that runs the Python script weekly.
# Alternatively, for more complex Python-native scheduling, use APScheduler.
Explanation:
1. Configuration (configparser):
o config.ini: A separate file (config.ini) is used to store database credentials
and email settings. This is much better than hardcoding them directly in the
script, as it keeps sensitive info separate from your code and allows easy
modification.
o Security Note: For production systems, consider more robust secrets
management solutions like environment variables, Azure Key Vault, AWS
Secrets Manager, or Google Secret Manager. An app-specific password (if
your email provider supports it, like Gmail) is better than your main
password.
2. get_data_from_db():
o Database Connection: Uses sqlalchemy.create_engine to establish a
connection to your database. You'd replace the DB_TYPE and connection
string parts to match your specific database (e.g., mysql, postgresql, sqlite).
You might need to install specific drivers like psycopg2 for PostgreSQL or
mysql-connector-python for MySQL.
o SQL Query: Contains a placeholder SQL query to fetch campaign data for
the last 7 days. You'll need to adapt this query to your actual table names,
column names, and date filtering syntax for your database.
o pd.read_sql(): Reads the results of the SQL query directly into a pandas
DataFrame.
o Error Handling: Includes a try-except block to catch database connection or
query errors.
3. generate_chart():
o Data Preparation: Calculates the install_to_click_ratio within this function,
as it's specific to the report's visual.
o Plotting: Uses matplotlib.pyplot and seaborn to create a visually appealing
bar chart.
o Customization: You can customize colors, titles, labels, rotations, and add a
legend for clarity.
o Saving Chart: plt.savefig(chart_filename) saves the generated chart as a
PNG image. plt.close() is important to release memory after saving.
o Return: Returns the filename of the saved chart.
4. send_email_report():
o MIMEMultipart: Creates a multipart email message, allowing you to
combine text (HTML) and attachments.
o MIMEText: Sets the HTML body of the email.
o MIMEImage: Attaches the generated chart as an image. cid:my_chart_image
and Content-ID are used to embed the image directly within the HTML body,
so it appears inline.
o SMTP Connection:
▪ smtplib.SMTP(smtp_server, smtp_port): Connects to the SMTP server
(e.g., smtp.gmail.com for Gmail).
▪ server.starttls(): Initiates Transport Layer Security (TLS) for a secure
connection.
▪ server.login(sender_email, sender_password): Logs into your email
account.
▪ server.send_message(msg): Sends the constructed email.
o Error Handling: Includes a try-except block for email sending errors.
5. automate_weekly_report() (Main Function):
o Orchestrates the entire process: calls get_data_from_db(), then
generate_chart(), and finally send_email_report().
o Includes basic checks and messages for user feedback.
o Cleanup: os.remove(chart_filename) deletes the temporary chart image file
after the email is sent, keeping your system clean.
6. Scheduling (if __name__ == "__main__":)
o The if __name__ == "__main__": block ensures automate_weekly_report()
runs when the script is executed directly.
o Comments provide guidance on how to schedule this script using cron (for
Linux/macOS) or Windows Task Scheduler. For more advanced Python-native
scheduling within a long-running application, APScheduler is a good choice.
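For the APScheduler route, a minimal sketch (assuming the package is installed with pip install apscheduler):
from apscheduler.schedulers.blocking import BlockingScheduler

scheduler = BlockingScheduler()
# Run automate_weekly_report() every Monday at 9 AM.
scheduler.add_job(automate_weekly_report, 'cron', day_of_week='mon', hour=9)
scheduler.start()  # Blocks the process and keeps it alive between runs
Unlike cron or Task Scheduler, this keeps the schedule inside the Python process itself, which is convenient when the script already runs as a long-lived service.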