Walmart Data Analyst Interview Experience

The document outlines interview questions and answers for a Walmart Data Analyst position, covering topics in Python, Power BI, and SQL. It includes practical coding examples for data manipulation, as well as theoretical explanations of key concepts like data structures, report types, and security measures. Additionally, it discusses the differences between various data handling techniques and tools, providing insights into best practices for data analysis.


Walmart Data Analyst Interview Experience (1-3 Years): CTC 18 LPA
Python

1. Write a Python script to identify unique values in a list and count their occurrences.

Theoretical Explanation

This question tests your understanding of Python data structures like sets and dictionaries:

• Sets: Store only unique elements, so building a set from a list eliminates duplicates.

• Dictionaries: Key-value pairs store the count of each unique element efficiently.

Using these, you can identify the unique elements and count their occurrences.

Code

# Sample list
data = [1, 2, 2, 3, 4, 4, 4, 5]

# Using a set to find unique values
unique_values = set(data)

# Using a dictionary comprehension to count occurrences
occurrences = {value: data.count(value) for value in unique_values}

print("Unique Values:", unique_values)
print("Occurrences:", occurrences)

Output:

Unique Values: {1, 2, 3, 4, 5}

Occurrences: {1: 1, 2: 2, 3: 1, 4: 3, 5: 1}
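Note that `data.count(value)` inside the comprehension re-scans the whole list for every unique value, which is O(n²) overall. A minimal single-pass alternative (not part of the original answer) uses `collections.Counter` from the standard library:

```python
from collections import Counter

data = [1, 2, 2, 3, 4, 4, 4, 5]

# Counter tallies every element in one pass over the list
occurrences = Counter(data)

print("Unique Values:", set(occurrences))
print("Occurrences:", dict(occurrences))
```

`Counter` also gives extras like `most_common()` for free, which is worth mentioning in an interview.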

2. How would you use pandas to merge two datasets and calculate total sales for products with valid promotions?

Theoretical Explanation

• Merging Datasets: Use pandas.merge() to combine two datasets on a common key (e.g., product_id).

• Filtering Promotions: Filter rows where promotions are valid.

• Grouping and Aggregation: Use groupby() to group data by product and compute total sales with an aggregation function such as sum().

Code

import pandas as pd

# Sample datasets
products = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'product_name': ['A', 'B', 'C', 'D']
})
sales = pd.DataFrame({
    'product_id': [101, 102, 102, 103, 104],
    'sales': [200, 150, 100, 300, 50],
    'promotion_valid': [True, True, False, False, True]
})

# Merge datasets on product_id
merged_data = pd.merge(products, sales, on='product_id')

# Filter for valid promotions
valid_promotions = merged_data[merged_data['promotion_valid']]

# Group by product_name and calculate total sales
total_sales = valid_promotions.groupby('product_name')['sales'].sum()

print("Total Sales for Valid Promotions:")
print(total_sales)

Output:

Total Sales for Valid Promotions:
product_name
A    200
B    150
D    50
Name: sales, dtype: int64
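The same pipeline can also be written as a single method chain. The explicit `how='inner'` and `validate='one_to_many'` arguments are additions beyond the original snippet: they document the join assumptions and make pandas fail fast if `product_id` is unexpectedly duplicated on the products side. A sketch, using the same sample frames:

```python
import pandas as pd

products = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'product_name': ['A', 'B', 'C', 'D']
})
sales = pd.DataFrame({
    'product_id': [101, 102, 102, 103, 104],
    'sales': [200, 150, 100, 300, 50],
    'promotion_valid': [True, True, False, False, True]
})

total_sales = (
    products
    # inner join; validate raises if products has duplicate product_ids
    .merge(sales, on='product_id', how='inner', validate='one_to_many')
    # keep only rows with a valid promotion
    .loc[lambda df: df['promotion_valid']]
    .groupby('product_name')['sales'].sum()
)
print(total_sales)
```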

3. Differences Between Lists, Tuples, Sets, and Dictionaries

Theoretical Explanation
• Lists: Ordered, mutable, allow duplicates. Suitable for sequential data and iteration.

• Tuples: Ordered, immutable, allow duplicates. Used for fixed collections of items.

• Sets: Unordered, mutable, no duplicates. Ideal for membership tests and unique element extraction.

• Dictionaries: Insertion-ordered (Python 3.7+), mutable, key-value pairs. Excellent for fast lookups and associating related data.

Feature            List       Tuple       Set          Dictionary
Ordered            Yes        Yes         No           Yes (3.7+)
Mutable            Yes        No          Yes          Yes
Allows Duplicates  Yes        Yes         No           Keys: No, Values: Yes
Use Case           Iteration  Fixed Data  Unique Data  Key-Value Mapping

Code

# List
my_list = [1, 2, 3, 3]
print("List:", my_list)

# Tuple
my_tuple = (1, 2, 3, 3)
print("Tuple:", my_tuple)

# Set
my_set = {1, 2, 3, 3}
print("Set (No Duplicates):", my_set)

# Dictionary
my_dict = {'a': 1, 'b': 2, 'c': 3}
print("Dictionary:", my_dict)

Output:


List: [1, 2, 3, 3]

Tuple: (1, 2, 3, 3)

Set (No Duplicates): {1, 2, 3}

Dictionary: {'a': 1, 'b': 2, 'c': 3}
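The behavioural differences claimed above (mutability, immutability, duplicate handling, key lookup) can be verified directly. A short sketch:

```python
my_list = [1, 2, 3, 3]
my_tuple = (1, 2, 3, 3)
my_set = {1, 2, 3, 3}
my_dict = {'a': 1, 'b': 2, 'c': 3}

my_list.append(4)               # lists are mutable
assert my_list == [1, 2, 3, 3, 4]

try:
    my_tuple[0] = 9             # tuples are immutable
except TypeError:
    print("tuples cannot be modified")

assert my_set == {1, 2, 3}      # duplicates were dropped on creation
assert my_dict['b'] == 2        # average O(1) lookup by key
```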

POWER BI

1. Difference Between Import and DirectQuery Modes

Theoretical Explanation

• Import Mode:

o Data is imported into Power BI's in-memory model, offering faster performance.

o The report becomes static and doesn’t reflect real-time changes in the
source unless refreshed.

o Suitable for small to medium datasets.

• DirectQuery Mode:

o Data stays in the source system, and queries are sent to fetch data as needed.

o Enables real-time data visualization but may be slower due to dependency on the source system's performance.

o Suitable for large datasets or when real-time updates are critical.


When to Choose:
For large datasets, use DirectQuery to avoid importing and storing massive amounts of data in Power BI. However, it may affect report performance, so ensure the data source is optimized for query execution.

2. Slicers vs Visual-Level Filters

Theoretical Explanation

• Slicers:

o Interactive visuals that allow users to filter data directly on the dashboard.

o They are visible to users and improve interactivity.

o Example: A slicer for "Year" allows selecting specific years to filter all linked
visuals.

• Visual-Level Filters:

o Filters applied to specific visuals rather than the entire page or report.

o Not interactive for end-users but provide control over what data is displayed
in a specific visual.

o Example: A filter applied to a bar chart to display only sales > $10,000.

Impact:
Slicers enhance user interactivity, allowing dynamic filtering, while visual-level filters
provide static control for specific visuals.

3. Row-Level Security (RLS)

Theoretical Explanation

• RLS restricts data access based on roles, ensuring that users or groups see only the
data they are authorized to view.

• Implementation Steps:
1. Define roles in Power BI Desktop: Use DAX expressions to filter data based on
user criteria (e.g., Region = "North").

2. Assign roles in Power BI Service: Map users/groups to the defined roles.

3. Validate: Test roles in Power BI Desktop by simulating different users.

Example:
To restrict regional managers to their own region's data, create a role whose DAX filter matches the signed-in user against a user-to-region mapping table (the column name here is illustrative):

[ManagerEmail] = USERPRINCIPALNAME()

Then assign the regional managers to this role in the Power BI Service.

4. What is a Paginated Report and When to Use It?

Theoretical Explanation

• Paginated Reports:

o Pixel-perfect reports designed for printing or exporting.

o Data is displayed across multiple pages, with precise control over layout.

o Suitable for reports like invoices, billing statements, or regulatory reports where exact formatting is crucial.

• When to Use:

o When you need formatted, printable outputs that may span multiple pages.

o When exporting reports to formats like PDF or Word is essential.

o For operational reports with detailed rows of data.

Example: A paginated report would be ideal for generating monthly sales invoices for a
large number of customers.

SQL

1. Find the Second-Highest Salary in a Department

Theoretical Explanation
• ROW_NUMBER(): Assigns a unique sequential number to each row within a
partition of data.

• DENSE_RANK(): Assigns ranks to rows in a partition, but ties receive the same rank.
There are no gaps in ranks.

To find the second-highest salary in each department, partition data by department_id and
order salaries in descending order, then filter for rank = 2.

Query Using DENSE_RANK()

WITH RankedSalaries AS (
    SELECT
        department_id,
        employee_id,
        salary,
        DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS salary_rank
    FROM employees
)
SELECT department_id, employee_id, salary
FROM RankedSalaries
WHERE salary_rank = 2;
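A DENSE_RANK query of this shape can be checked end-to-end with Python's built-in sqlite3 module (window functions need SQLite 3.25+, bundled with modern Python). The employee rows below are made-up sample data:

```python
import sqlite3

# In-memory database with illustrative sample rows
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, department_id INTEGER, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 10, 90000), (2, 10, 80000), (3, 10, 80000),
        (4, 20, 70000), (5, 20, 60000);
""")

rows = conn.execute("""
    WITH RankedSalaries AS (
        SELECT department_id, employee_id, salary,
               DENSE_RANK() OVER (PARTITION BY department_id
                                  ORDER BY salary DESC) AS salary_rank
        FROM employees
    )
    SELECT department_id, employee_id, salary
    FROM RankedSalaries
    WHERE salary_rank = 2
""").fetchall()

# Ties at the second-highest salary are all returned by DENSE_RANK
print(rows)
```

Running this shows why DENSE_RANK is usually preferred over ROW_NUMBER here: both employees tied at 80000 in department 10 are reported.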

2. Calculate Total Transactions Per User for Each Day


Theoretical Explanation

To calculate daily transaction counts for each user:

• Use GROUP BY to group data by user_id and transaction_date.

• Use COUNT() to count the transactions for each group.

Query

SELECT
    user_id,
    transaction_date,
    COUNT(*) AS total_transactions
FROM transactions
GROUP BY user_id, transaction_date
ORDER BY user_id, transaction_date;
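The grouping behaviour can be exercised the same way with sqlite3; the transactions below are made-up sample data:

```python
import sqlite3

# In-memory database with illustrative sample transactions
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (user_id INTEGER, transaction_date TEXT);
    INSERT INTO transactions VALUES
        (1, '2024-01-01'), (1, '2024-01-01'), (1, '2024-01-02'),
        (2, '2024-01-01');
""")

rows = conn.execute("""
    SELECT user_id, transaction_date, COUNT(*) AS total_transactions
    FROM transactions
    GROUP BY user_id, transaction_date
    ORDER BY user_id, transaction_date
""").fetchall()

# One output row per (user, day) pair, with its transaction count
print(rows)
```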

3. Select Projects with the Highest Budget-Per-Employee Ratio


Theoretical Explanation

This involves:

1. Joining the projects table with the employees table to calculate the number of
employees per project.

2. Calculating the budget-per-employee ratio for each project.

3. Finding the project(s) with the highest ratio.

Assume Tables

• projects(project_id, budget)

• employees(employee_id, project_id)

Query

WITH ProjectEmployeeCount AS (
    SELECT
        p.project_id,
        p.budget,
        COUNT(e.employee_id) AS total_employees
    FROM projects p
    LEFT JOIN employees e ON p.project_id = e.project_id
    GROUP BY p.project_id, p.budget
),
BudgetRatio AS (
    SELECT
        project_id,
        budget,
        total_employees,
        CASE
            WHEN total_employees > 0 THEN budget * 1.0 / total_employees
            ELSE 0
        END AS budget_per_employee
    FROM ProjectEmployeeCount
)
SELECT project_id, budget, total_employees, budget_per_employee
FROM BudgetRatio
WHERE budget_per_employee = (SELECT MAX(budget_per_employee) FROM BudgetRatio);
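This two-CTE pattern can also be verified with sqlite3. The projects and assignments below are made-up sample data; note the `* 1.0` multiplier, which forces floating-point division so an integer budget is not truncated:

```python
import sqlite3

# In-memory database with illustrative sample projects and assignments
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (project_id INTEGER, budget REAL);
    CREATE TABLE employees (employee_id INTEGER, project_id INTEGER);
    INSERT INTO projects VALUES (1, 9000), (2, 8000), (3, 5000);
    INSERT INTO employees VALUES (10, 1), (11, 1), (12, 1), (13, 2);
""")

rows = conn.execute("""
    WITH ProjectEmployeeCount AS (
        SELECT p.project_id, p.budget, COUNT(e.employee_id) AS total_employees
        FROM projects p
        LEFT JOIN employees e ON p.project_id = e.project_id
        GROUP BY p.project_id, p.budget
    ),
    BudgetRatio AS (
        SELECT project_id, budget, total_employees,
               CASE WHEN total_employees > 0
                    THEN budget * 1.0 / total_employees
                    ELSE 0 END AS budget_per_employee
        FROM ProjectEmployeeCount
    )
    SELECT project_id, budget, total_employees, budget_per_employee
    FROM BudgetRatio
    WHERE budget_per_employee = (SELECT MAX(budget_per_employee) FROM BudgetRatio)
""").fetchall()

# Project 2: 8000 budget / 1 employee = 8000 per employee, the highest ratio
print(rows)
```

The LEFT JOIN matters: project 3 has no employees, and an INNER JOIN would silently drop it instead of giving it a ratio of 0.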
