DA-Interview Go Through
def my_decorator(func):
    def wrapper():
        print("Something is happening before the function is called.")
        func()
        print("Something is happening after the function is called.")
    return wrapper

@my_decorator
def say_hello():
    print("Hello!")

say_hello()
• How does exception handling work in Python? Can you give an example using
try-except blocks?
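A minimal sketch of try-except handling for the question above. The helper name `safe_divide` is hypothetical and chosen only for illustration; it shows the `try`, `except`, `else`, and `finally` clauses working together.

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None when the division fails."""
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        # Runs only when the try block raises ZeroDivisionError
        print("Cannot divide by zero.")
        return None
    except TypeError as exc:
        # Runs when an operand does not support division
        print(f"Invalid operand: {exc}")
        return None
    else:
        # Runs only when the try block raised nothing
        return result
    finally:
        # Runs in every case, with or without an exception
        print("Division attempt finished.")


print(safe_divide(10, 2))
print(safe_divide(10, 0))
```

Catching specific exception types, rather than a bare `except:`, keeps unrelated errors visible.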
Reading file:
# Open the file in read mode ('r')
with open('example.txt', 'r') as file:
    # Read the entire content of the file
    content = file.read()
    print(content)
Writing file:
# Open the file in write mode ('w')
with open('example.txt', 'w') as file:
    # Write content to the file
    file.write('This is a sample text.\n')
    file.write('Python is awesome!')
• Can you explain the use of the 'self' keyword in Python classes?
• What are generators in Python, and how are they different from iterators?
• How do you implement multithreading or multiprocessing in Python, and in what
scenarios would you use them?
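For the generator question above, the sketch below puts a generator function next to a hand-written iterator class that produces the same values. Both names (`countdown`, `Countdown`) are illustrative. A generator is itself an iterator; the difference is that Python builds `__iter__` and `__next__` for you and keeps the local state between `yield` calls.

```python
def countdown(start):
    """Generator: lazily yields start, start - 1, ..., 1."""
    while start > 0:
        yield start
        start -= 1


class Countdown:
    """Equivalent iterator written by hand with the iterator protocol."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


print(list(countdown(3)))
print(list(Countdown(3)))
```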
List:
Array:
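def reverse_list(arr):
    # Get the length of the list
    n = len(arr)
    for i in range(n // 2):  # swap mirrored pairs in place
        arr[i], arr[n - 1 - i] = arr[n - 1 - i], arr[i]
    return arr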
Pandas?
• How do you handle outliers in your data?
preprocessing?
• How do you deal with missing data?
• Could you elaborate on how to use iloc and loc for data
selection in a DataFrame?
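A short sketch of the `loc` versus `iloc` distinction, using a made-up DataFrame (the names and ages are illustrative data only). `loc` selects by label, `iloc` by integer position.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Asha", "Ravi", "Meena"], "age": [28, 35, 41]},
    index=["a", "b", "c"],
)

by_label = df.loc["b", "age"]      # label-based: row "b", column "age"
by_position = df.iloc[1, 1]        # position-based: second row, second column
label_slice = df.loc["a":"b"]      # label slices include the end label
position_slice = df.iloc[0:2]      # position slices exclude the end position
over_30 = df.loc[df["age"] > 30, "name"]  # boolean mask combined with loc
```

Both slices return the same two rows here, but only because the end label `"b"` is inclusive while the end position `2` is exclusive.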
• Can you clarify the distinctions between the merge and join methods in Pandas?
applications?
• How do you perform sorting in Pandas based on a specific column?
• Can you explain the usage of the sort_values method in this context?
• PIVOT in python pandas
• GROUP BY - AGGREGATION
• JOINS
• MERGE
• CONCAT
• WHERE - FILTERING
• SORT
• In your Python solution, you were required to merge three tables. Can you explain the
process you followed?
• Why did you choose to use merge instead of concat in your Python solution? Can you
explain how concat differs from merge?
• What is the axis parameter in the concat function, and how does it affect the
concatenation process?
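The merge and concat questions above can be illustrated with three small tables. The table and column names below are assumptions made up for this sketch. `merge` joins on key columns, while `concat` stacks frames along an axis.

```python
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Asha", "Ravi", "Meena"]})
orders = pd.DataFrame({"order_id": [10, 11], "customer_id": [1, 3]})
payments = pd.DataFrame({"order_id": [10, 11], "amount": [250.0, 400.0]})

# merge: key-based join; chaining two merges combines three tables
merged = customers.merge(orders, on="customer_id", how="inner").merge(
    payments, on="order_id", how="inner"
)

# concat with axis=0 stacks rows; columns are aligned by name
more_customers = pd.DataFrame({"customer_id": [4], "name": ["Kiran"]})
stacked = pd.concat([customers, more_customers], axis=0, ignore_index=True)

# concat with axis=1 places frames side by side, aligned on the index
side_by_side = pd.concat([orders, payments[["amount"]]], axis=1)
```

`axis=1` aligns rows by index rather than by a key column, so it is only safe when both frames share the same row order and index.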
• As a final request, could you send us an email detailing your past projects, focusing on
their key aspects and what you learned from them?
• Parameter used in the split function
• How do you import an Excel workbook that consists of multiple sheets in Python? How do you
merge these sheets together?
• How can you read 2 different Excel files in Python?
SCRAPING
• What are the BeautifulSoup and requests libraries?
• What is a cursor, and why is a connection required?
• If login is required, would you be able to scrape data without logging in?
• What are security features, and why are they required?
• How would you approach the task of extracting all data related to Hard-disks from the
Flipkart website? Could you outline the steps or provide a sample code?
• Could you describe how you would scrape data from a website for a project and provide
a sample code?
ML
• Can you explain what clustering is?
unsupervised learning
• How do you handle overfitting in a machine learning
model?
• Describe a situation where you would use a random forest
networks?
• How would you detect fraudulent transactions in a large
dataset?
• What is the difference between a decision tree and a
random forest?
• What is the k-nearest neighbors algorithm?
• What is the difference between bias and variance?
• How do you evaluate the performance of a machine learning model?
• What are the different types of dimensionality reduction techniques?
• What is the curse of dimensionality?
• What are the different types of machine learning algorithms?
• Coefficient
• Explain the concept of overfitting in machine learning and ways to prevent it.
• How would you handle skewed data when building a machine learning model?
• Explain the principle of a decision tree algorithm.
Excel
• Excel - Application of MID Function
• https://docs.google.com/spreadsheets/d/1Zw-dMdiH-VjlJyQdodXuSNIxQkKks3cN/edit?usp=sharing&ouid=113468865657991512043&rtpof=true&sd=true
• You have 2 tables - customers and orders - how will you find the list of customers who
did not place any order?
• Give the list of customers who ordered more than once.
• Calculate the distinct order IDs of all those customers who ordered Shilajit.
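One way to express the three customer and order questions above is in SQL, run here through Python's built-in `sqlite3` module. The schema, column names, and sample rows are assumptions for illustration. The last query reads the third question as "count every distinct order placed by any customer who has ordered Shilajit".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, product TEXT);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meena');
    INSERT INTO orders VALUES (101, 1, 'Shilajit'), (102, 1, 'Honey'), (103, 3, 'Shilajit');
""")

# Customers with no orders: LEFT JOIN, then keep rows that found no match
no_orders = conn.execute("""
    SELECT c.name FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.customer_id
    WHERE o.order_id IS NULL
""").fetchall()

# Customers who ordered more than once
repeat_customers = conn.execute("""
    SELECT c.name, COUNT(*) AS order_count FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    HAVING COUNT(*) > 1
""").fetchall()

# Distinct order IDs placed by customers who ordered Shilajit at least once
shilajit_order_count = conn.execute("""
    SELECT COUNT(DISTINCT order_id) FROM orders
    WHERE customer_id IN (SELECT customer_id FROM orders WHERE product = 'Shilajit')
""").fetchone()[0]
```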
• Previous MIS reports were shared, and candidates were asked to apply the formulas SUMIFS,
COUNTIFS, VLOOKUP, AVG, SUM, IFS, logical functions, and cell referencing. Further questions
covered the course, past studies, and other background topics.
SUMIFS(SUM_RANGE, CRITERIA_RANGE1, CRITERIA1, …, CRITERIA_RANGEn, CRITERIAn)
COUNTIFS(CRITERIA_RANGE1, CRITERIA1, …, CRITERIA_RANGEn, CRITERIAn)
VLOOKUP(VALUE, TABLE_RANGE, COLUMN_NO, MATCH_TYPE)
AVERAGE(RANGE)
SUM(RANGE)
IFS(CONDITION1, VALUE_IF_TRUE1, …, CONDITIONn, VALUE_IF_TRUEn)
Logical functions: AND(), OR(), NOT(), IF()
Cell referencing:
• Absolute – to keep a formula pointing at one fixed cell while it is
filled across other rows, lock the reference by adding $ signs
(for example $A$1) or by pressing F4 on the reference.
• Relative – write the formula once and fill it down; the
references shift with each row, so every row is calculated
from its own cells.
SQL:
• Import
• SQL (Relational):
o Data is organized in tables with rows and columns.
o Tables are linked together through predefined
relationships, ensuring data consistency.
o Follows a rigid schema, meaning the structure of the
data is defined upfront.
• NoSQL (Non-Relational):
o Offers more flexible data structures. Data can be
SELECT ROUND(my_numeric_column, 2) AS rounded_value
FROM my_table;
• Convert where column to decimal and then calculate percentage
SELECT
    CAST(amount AS DECIMAL) / total * 100 AS percentage
FROM my_table;
• CONCATENATE the % sign
WITH EmployeeCounts AS (
    SELECT department_id, COUNT(*) AS employee_count
    FROM employees
    GROUP BY department_id
)
SELECT * FROM EmployeeCounts;
• Return a count from a subquery that contained an ORDER BY; the ORDER BY had to be removed
With ORDER BY:
SELECT *
FROM (
    SELECT column1, column2
    FROM your_table
    ORDER BY column1
) AS subquery;
Without ORDER BY:
SELECT COUNT(*)
FROM (
    SELECT column1, column2
    FROM your_table
) AS subquery;
• SELF JOIN query - Analytical skills and SQL knowledge test
indexing.
2. Foreign Key Constraint:
• Foreign keys establish relationships between tables,
domain-specific requirements.
5. Default Constraint:
• Default constraints provide default values for
data consistency.
Difference between Primary Key and Foreign Key:
• Primary Key:
table.
• Prevents duplicates and null values within its
column(s).
• Essential for indexing and establishing relationships
Syntax:
EG:
SELECT
    employee_id,
    salary,
    AVG(salary) OVER (
        PARTITION BY department_id
        ORDER BY hire_date
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) AS avg_salary
FROM
    employees;
• What is the difference between a GROUP BY statement and a HAVING clause?
Group by:
The GROUP BY statement groups rows that have the same
values into summary rows, typically so that aggregate
functions can be computed for each group.
GROUP BY is applied after the WHERE clause has filtered
individual rows and before the HAVING clause filters the
resulting groups.
Having:
The HAVING clause is used to filter groups of rows
based on specified conditions after the GROUP BY
operation has been performed.
Conditions specified in the HAVING clause are
evaluated after the GROUP BY operation and can
include aggregate functions.
It is commonly used to apply conditions to aggregated
data, such as filtering groups with a certain minimum or
maximum value.
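The sketch below shows the order of evaluation described above. It runs through Python's `sqlite3` module against a small `employees` table invented for this example. WHERE filters rows, GROUP BY forms the groups, and HAVING then filters those groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary REAL);
    INSERT INTO employees VALUES
        ('A', 'Sales', 40000), ('B', 'Sales', 60000),
        ('C', 'IT', 90000), ('D', 'IT', 70000), ('E', 'HR', 30000);
""")

rows = conn.execute("""
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    WHERE salary > 35000          -- filters individual rows before grouping
    GROUP BY department           -- one summary row per department
    HAVING AVG(salary) > 55000    -- filters groups after aggregation
    ORDER BY department
""").fetchall()
print(rows)
```

With this data, HR is removed by WHERE, Sales averages 50000 and is removed by HAVING, and only IT remains.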
The LIMIT clause caps the number of rows a query returns;
the value after LIMIT is the maximum row count.
EG:
SELECT * FROM orders LIMIT 5;
This returns at most 5 rows.
Combined with ORDER BY, LIMIT answers top-N questions,
such as the top 10 orders by customer or the top 5
employees by salary.
• How do you calculate the average of a column?
procedure?
To write a stored procedure in SQL, you can use the
CREATE PROCEDURE statement followed by the
procedure name, parameters (if any), and the SQL
statements that define the procedure's functionality.
Here's a basic example:
In this example:
SELECT Employees.Name,
       Departments.DepartmentName
FROM Employees
INNER JOIN Departments
    ON Employees.DepartmentID = Departments.DepartmentID;
This retrieves only the departments that have at least one
employee, listed alongside each employee's name.
• In SQL, when would you use "WHERE" versus "HAVING"?
DROP:
DROP is a command used to remove an entire table,
view, index, or database object from the database
schema.
When you DROP a table, all the data, indexes, and
privileges associated with that table are permanently
removed from the database.
DROP is a DDL (Data Definition Language) command. In
databases that commit DDL implicitly, such as MySQL and
Oracle, it cannot be rolled back; PostgreSQL and SQL Server
can roll back a DROP issued inside a transaction.
TRUNCATE:
TRUNCATE is a command used to remove all rows from
a table quickly and efficiently, but it does not remove the
table structure.
TRUNCATE is faster than DELETE because it does not
generate individual delete operations for each row.
Instead, it deallocates the data pages of the table,
effectively removing all rows at once.
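TRUNCATE is generally treated as a DDL command. MySQL and
Oracle commit it implicitly, so it cannot be rolled back
there, while PostgreSQL and SQL Server can roll it back
inside a transaction.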
DELETE:
DELETE is a command used to remove one or more
rows from a table based on a condition.
Unlike TRUNCATE, DELETE removes specific rows
from the table, allowing you to specify filtering criteria
using a WHERE clause.
DELETE is slower than TRUNCATE because it
generates individual delete operations for each row that
matches the condition.
DELETE is a DML (Data Manipulation Language)
command, and it can be rolled back using a transaction if
it's executed within a transaction block.
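A small sketch of the DELETE and DROP behaviour described above, using Python's `sqlite3` module. The `orders` table and its rows are made up for the example. SQLite has no TRUNCATE statement, so only DELETE and DROP are shown, and rollback rules for DDL differ between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "open"), (2, "cancelled"), (3, "open")],
)
conn.commit()

# DELETE with a WHERE clause removes only the matching rows
conn.execute("DELETE FROM orders WHERE status = 'cancelled'")
after_delete = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# The delete is not committed yet, so the transaction can still be undone
conn.rollback()
after_rollback = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# DROP removes the table itself, not only its rows
conn.execute("DROP TABLE orders")
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```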
• What are database triggers and can you list their types?
Eg:
SELECT region,
       product,
       amount,
       SUM(amount) OVER (PARTITION BY region) AS total_sales_region,
       AVG(amount) OVER (PARTITION BY region, product) AS avg_sales_product_in_region
FROM
    sales
GROUP BY
    region, product, amount;
• Under what circumstances would you use LIMIT, and when would you opt for
OFFSET?
Eg:
-- Retrieve the next 5 student records (for the third page)
SELECT *
FROM students
LIMIT 5 OFFSET 10;
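The page arithmetic behind the query above is OFFSET = (page number - 1) × page size. The sketch below applies it through `sqlite3`. The `fetch_page` helper, the `student_id` column, and the 20 sample rows are assumptions for illustration. An ORDER BY is included because the order of rows is otherwise not guaranteed, which makes pages unstable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [(i, f"student_{i}") for i in range(1, 21)],
)


def fetch_page(page_number, page_size):
    """Return the student_ids on a 1-based page of the students table."""
    offset = (page_number - 1) * page_size
    rows = conn.execute(
        "SELECT student_id FROM students ORDER BY student_id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()
    return [row[0] for row in rows]


third_page = fetch_page(page_number=3, page_size=5)
print(third_page)
```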
• Can you write a SQL query to find the maximum salary in each department and then
rank these maximum salaries sequentially?
WITH MaxSalaries AS (
    SELECT
        department,
        MAX(salary) AS max_salary
    FROM
        employees
    GROUP BY
        department
)
SELECT
    department,
    max_salary,
    ROW_NUMBER() OVER (ORDER BY max_salary DESC) AS salary_rank
FROM
    MaxSalaries;
• Different types of Databases?
The different types of joins are inner join, left join, right join,
full outer join, cross join, and self join.
• What are data pipelines?
Data pipelines are a series of processes that extract,
transform, and load (ETL) data from various sources into
a destination system or database. They are used to
automate the flow of data between different systems,
applications, or databases, ensuring that data is
efficiently collected, processed, and made available for
analysis or use.
Purpose:
Star Schema:
In a star schema, data is organized into a central fact
table surrounded by dimension tables.
The fact table contains numerical measures or metrics,
often related to business transactions or events.
Dimension tables contain descriptive attributes or
dimensions that provide context to the measures in the
fact table.
Each dimension table is connected to the fact table
through foreign key relationships.
Star schemas are denormalized: each dimension is kept as a
single flat table rather than being split into normalized
sub-tables, so some redundant data is intentionally stored
in exchange for simpler, faster queries.
Star schemas are well-suited for query performance and
simplicity, making them popular in data warehousing
environments.
Snowflake Schema:
A snowflake schema is an extension of the star schema,
where dimension tables are normalized into multiple
related tables.
Unlike the star schema, where dimension tables are
denormalized, in a snowflake schema, dimension tables
may have multiple levels of normalization, resembling a
snowflake's shape.
Normalization reduces data redundancy and can improve
data integrity, but it can also lead to more complex
queries and potentially slower performance compared to
star schemas.
Snowflake schemas are useful when there are strict
requirements for data integrity and when storage space
needs to be optimized.
• What is Normalization in SQL Databases? and why is it important?
• Print the 2nd highest salary when the 2nd and 3rd salaries are the
same.
WITH RankedSalaries AS (
    SELECT
        Salary,
        DENSE_RANK() OVER (ORDER BY Salary DESC) AS SalaryRank
    FROM
        YourTableName
)
SELECT DISTINCT Salary
FROM RankedSalaries
WHERE SalaryRank = 2;
• What are different types of relations in Database? Brief with example for each relation
SELECT
    Month,
    Revenue,
    LAG(Revenue) OVER (ORDER BY Month) AS PreviousRevenue,
    Revenue - LAG(Revenue) OVER (ORDER BY Month) AS RevenueDifference
FROM
    Sales;
This query retrieves the revenue for each month,
along with the revenue from the previous month and
the difference in revenue between the current month
and the previous month. The LAG() function is used
to fetch the revenue from the previous row, ordered
by the Month column.
• LEAD
SELECT
    Month,
    Revenue,
    LEAD(Revenue) OVER (ORDER BY Month) AS NextMonthRevenue,
    LEAD(Revenue) OVER (ORDER BY Month) - Revenue AS RevenueDifference
FROM
    Sales;
This query retrieves the revenue for each month, along
with the revenue from the subsequent month and the
difference in revenue between the subsequent month and
the current month. The LEAD() function is used to fetch
the revenue from the next row, ordered by the Month
column.
• CUMULATIVE SUM, AVERAGE
Cumulative sum:
SELECT
    Month,
    Revenue,
    SUM(Revenue) OVER (ORDER BY Month) AS cumulative_revenue
FROM
    Sales;
Cumulative average:
SELECT
    Month,
    Revenue,
    AVG(Revenue) OVER (ORDER BY Month) AS cumulative_avg_revenue
FROM
    Sales;
• Difference Between RDBMS and DBMS: "How do you differentiate between a
Relational Database Management System (RDBMS) and a Database Management
System (DBMS)? Can you provide examples of each?"
View:
A view acts as a virtual table derived from one or more
underlying tables.
It presents a structured subset of data and does not store
any data itself.
Views are primarily used to simplify data access, hide
sensitive information, and present data in a predefined
format.
They offer a way to create a reusable and simplified
representation of complex data relationships.
Stored Procedure:
A stored procedure is a precompiled set of SQL
statements stored in the database.
It can accept input parameters, perform operations on
data, and return results.
Stored procedures are often used to encapsulate business
logic, implement data manipulation operations, and
automate repetitive tasks.
They enable developers to execute complex logic on the
database server, reducing network traffic and improving
performance.
• Uses of NoSQL: "What are the primary uses of NoSQL databases, and in what
situations would you recommend a NoSQL database over a traditional SQL
database?"
• How would you identify the names of employees whose
salaries are greater than the average salary, and can you
demonstrate this using a sub-query in SQL?
SELECT employee_name
FROM employees
WHERE salary > (
    SELECT AVG(salary)
    FROM employees
);
• What is a transaction in SQL? How are ACID properties maintained?
WITH cte AS (
    SELECT * FROM Table1
    UNION ALL
    SELECT * FROM Table2
)
SELECT column1, column2, COUNT(*) AS occurrences
FROM cte
GROUP BY column1, column2
HAVING COUNT(*) > 1;
• https://masai-school.notion.site/SQL-Test3916b80f6d924396bf7dcbb63abf2ba3?pvs=25
POWER BI:
• You are given a dataset with sales data. How would you forecast sales for the next
month?
• A streaming service wants to build a recommendation system. How would you approach
this?
doughnut chart?
• Can you unpivot the columns in this table?
• Create a DAX function using the Logical question given.
• Create a DAX Function to find out subtotal of a column.
• Types of charts and when each should be used.
• Data Cleaning, Data handling.
ETL:
• What is ETL?
• What are the main advantages of using cloud computing in data processing?
• Describe the differences between a data warehouse and a data lake.
• What are the steps involved in the ETL process?
• What are the different types of ETL tools?
• What are the benefits of using an ETL tool?
• What are the challenges of using an ETL tool?
• How do you choose the right ETL tool for your needs?
• How do you design an ETL process?
• How do you implement an ETL process?
• How do you monitor an ETL process?
• How do you troubleshoot an ETL process?
• How do you maintain an ETL process?
• What are the best practices for ETL?
• What are the security considerations for ETL?
• What are the compliance considerations for ETL?
• What are the ethical considerations for ETL?
• How does ETL relate to data warehousing?
• How does ETL relate to data mining?
• How does ETL relate to data visualization?
• What are the future trends in ETL?
• What are the challenges of ETL in the cloud?
PROGRAMMING CODE:
• Write a function to compute the factorial of a number.
• Write a function to create a queue and give a size of the
queue
• Write a function to print all numbers less than a given
number.
• Question on time complexity of student’s code
• In Python, how would you write a program where the sum of the elements on the left
side of an array equals the sum on the right side?
• How would you use a dictionary in Python to count the frequency of each integer in an
array?
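A minimal sketch for the frequency-count question above, using a plain dictionary. The function name `count_frequencies` is illustrative.

```python
def count_frequencies(numbers):
    """Map each integer in numbers to how many times it appears."""
    counts = {}
    for number in numbers:
        # dict.get returns 0 the first time a number is seen
        counts[number] = counts.get(number, 0) + 1
    return counts


print(count_frequencies([4, 1, 4, 2, 1, 4]))
```

`collections.Counter` from the standard library does the same job in one call. The explicit loop is shown because interviewers usually want to see the dictionary logic.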
• Using an online Python compiler, can you demonstrate how to split a string into two,
reverse each part, and then merge them?
• In Python, how would you identify the elements with the minimum number of
repetitions in a list? Can you do this using binary sorting or a stack?
• How would you write a Python program to check if brackets in an input string are
balanced?
• How would you find the first Equilibrium Point in an array, where the point is defined
as a position such that the sum of elements before it equals the sum of elements after
it? Please write a Python function to return the index of this point.
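One possible single-pass approach to the equilibrium question above is to keep a running left sum and derive the right sum from the total. The sketch assumes 0-based indices and returns -1 when no equilibrium point exists; both conventions are assumptions, since the question does not fix them.

```python
def first_equilibrium_index(values):
    """Return the first index whose left and right sums match, else -1."""
    total = sum(values)
    left_sum = 0
    for index, value in enumerate(values):
        # Everything that is neither to the left nor the current element
        right_sum = total - left_sum - value
        if left_sum == right_sum:
            return index
        left_sum += value
    return -1


print(first_equilibrium_index([1, 3, 5, 2, 2]))
```

This runs in O(n) time with O(1) extra space, compared with O(n²) for recomputing both sums at every index.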
• Can you write a Python program that checks if the brackets in a given input string are
balanced?
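A stack-based sketch for the balanced-brackets question above. It assumes the three bracket types `()`, `[]`, and `{}` and ignores all other characters; the question does not say which brackets are in scope.

```python
def is_balanced(text):
    """Return True when every bracket in text is closed in the right order."""
    closing_to_opening = {")": "(", "]": "[", "}": "{"}
    openers = set(closing_to_opening.values())
    stack = []
    for char in text:
        if char in openers:
            stack.append(char)
        elif char in closing_to_opening:
            # A closer must match the most recently opened bracket
            if not stack or stack.pop() != closing_to_opening[char]:
                return False
    # Any bracket still on the stack was never closed
    return not stack


print(is_balanced("{[()]}"))
print(is_balanced("([)]"))
```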
• Define a Python function to determine whether a given number is prime.
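A trial-division sketch for the prime-number question above. It checks odd divisors up to the integer square root, which is enough because any composite number has a factor no larger than its square root. `math.isqrt` needs Python 3.8 or later.

```python
import math


def is_prime(n):
    """Return True when n is a prime number."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False
    # Only odd divisors up to sqrt(n) need checking
    for divisor in range(3, math.isqrt(n) + 1, 2):
        if n % divisor == 0:
            return False
    return True


print([n for n in range(20) if is_prime(n)])
```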
• Using Pandas, how would you write code to find the total number of educated people in
each country?
• Using Pandas again, can you write a query to calculate the average monthly income for
each country, segmented by education level?
• Can you explain how to find the first Equilibrium Point in an array of n positive
numbers?
• How would you convert a string in Roman numeral format to an integer?
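A sketch for the Roman numeral question above. It uses the subtractive rule: a symbol smaller than the one after it is subtracted, otherwise it is added. The sketch assumes the input is a well-formed, upper-case numeral and does no validation.

```python
def roman_to_int(roman):
    """Convert a well-formed Roman numeral string to an integer."""
    values = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
    total = 0
    for index, symbol in enumerate(roman):
        value = values[symbol]
        next_is_larger = (
            index + 1 < len(roman) and value < values[roman[index + 1]]
        )
        # Subtract when a smaller symbol precedes a larger one (IV, IX, XL, ...)
        total += -value if next_is_larger else value
    return total


print(roman_to_int("MCMXCIV"))
```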
• What approach would you use to sort an array containing only 0s, 1s, and 2s in
ascending order?
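One common answer to the 0s, 1s, and 2s question above is the three-pointer "Dutch national flag" partition, sketched below. It sorts in place in a single pass. Counting the three values and rewriting the list would be an equally valid alternative.

```python
def sort_012(values):
    """Sort a list containing only 0, 1, and 2 in place and return it."""
    low, mid, high = 0, 0, len(values) - 1
    while mid <= high:
        if values[mid] == 0:
            # Move the 0 into the low region and advance both pointers
            values[low], values[mid] = values[mid], values[low]
            low += 1
            mid += 1
        elif values[mid] == 1:
            mid += 1
        else:
            # Move the 2 to the end; re-examine the value swapped into mid
            values[mid], values[high] = values[high], values[mid]
            high -= 1
    return values


print(sort_012([2, 0, 1, 2, 1, 0]))
```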
• How would you create a Python database to find the names of customers from India
who are older than 30?
• How can one create a Dashboard in Python?
• (Take two tables with id and name for Students and Teachers) Prime number detection,
Binary search → Python code
• Bracket Combinations - https://coderbyte.com/information/Bracket Combinations
• Bracket Matcher - https://coderbyte.com/information/Bracket Matcher
• Codeland Username Validation - https://coderbyte.com/information/Bracket Matcher
• Find Intersection - https://coderbyte.com/information/Find Intersection
• Question Marks - https://coderbyte.com/information/Find Intersection
• Find Reverse - https://coderbyte.com/information/Find Intersection
• First Factorial - https://coderbyte.com/information/Find Intersection
• Longest Word - https://coderbyte.com/information/Longest Word
• How to bring JSON file to structure and convert to CSV?
import json
import pandas as pd

# Step 1: Read the JSON file
with open('data.json') as f:
    data = json.load(f)

# Step 2: Normalize JSON data (if needed)
# If 'data' is a list of flat dictionaries, pd.DataFrame(data) is enough.
# pandas.json_normalize() also flattens nested structures into columns.
df = pd.json_normalize(data)

# Step 3: Write the DataFrame to CSV ('data.csv' is an assumed output name)
df.to_csv('data.csv', index=False)
PROBLEM SOLVING:
• How would you estimate the number of cars in Delhi?
• Describe a time when you had to analyze data to make a decision. What was the
outcome?
• How do you prioritize when given multiple tasks related to shipment on-time
performance and SLA compliance?
• How would you approach a case where you have to determine the toll cost for a
particular route?
• Describe a situation where you had to solve a problem using quantitative reasoning.
• How would you determine the ratio of diabetes patients living in Chandigarh?
• Guesstimate Challenge: Estimate the opening week revenue generated by the movie
"JAWAN" in Delhi. Consider the data for Friday, Saturday, and Sunday. Please walk
us through your thought process.
• Guesstimate Challenge 2: Assume "JAWAN" opened on a holiday Friday in
Bangalore. Can you guesstimate its opening weekend revenue? Describe your
approach.
• Number of iPhone users in India
• 25 horses - find top 3 horses by organizing races - each race can only accommodate 5
horses - find min number of races required to find the top 3 horses.
• What additional features can you add to the Zomato app to improve customer
experience
• If someone is ordering from Zomato, at which point might they drop out of the app
without ordering anything?
• The Zomato app is opened 10 thousand times and every 4th opening of the app is
converting to an order. Average login hours of the delivery partners is 30 hours per
week - and they are delivering 2 orders every hour. How many delivery partners are
required to complete the 2500 orders that are converting from the 10,000 app
openings.
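The arithmetic for the delivery-partner question above can be sketched as follows. It assumes all 2,500 orders fall within a single week, so that the 30 weekly login hours apply; the question does not state the time window.

```python
import math

app_opens = 10_000
orders = app_opens // 4                    # every 4th open converts: 2,500 orders

hours_per_partner_per_week = 30
orders_per_hour = 2
orders_per_partner = hours_per_partner_per_week * orders_per_hour  # 60 per week

# Round up: a fraction of a partner still means one more person
partners_needed = math.ceil(orders / orders_per_partner)
print(partners_needed)
```

2,500 / 60 is about 41.7, so 42 partners are needed under this assumption. If the orders arrived in a single day instead, the daily login hours per partner would be the figure to use.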
• What is the minimum number of cuts needed to divide a cake into 8 equal slices?
• What approach do you take when dealing with missing or inconsistent data in financial
reports?
• What was a significant error you identified in a financial analysis, and how did you
handle it?
• What methods do you use for forecasting financial trends?
• What criteria do you use to prioritize tasks when working on multiple financial
analyses?
• What is an example of a business problem you solved using data?
• Explain the Nifty Fifty Stock Price Prediction
o Cross Questioning
o How did you perform data cleaning?
o What evaluation metrics did you use to test the accuracy of the models?
o RMSE
o Could you have used an alternative of Linear Regression or Polynomial
Regression?
• Question from mandeline
Scenario 1: Keep a Month's Stock (28 days) in FBA
1. Calculate the total volume for each SKU.
2. Convert the total volume to total pallets.
3. Round up to the nearest whole pallet.
4. Calculate the storage cost for each SKU (rounded-up pallets * Cost Per Pallet).
5. Sum up the storage costs for all SKUs.
6. Since the stock is held for 28 days or more, no understocked fee is applied.
Scenario 2: Keep 20 Days' Stock in FBA
1. Calculate the total volume for each SKU.
2. Convert the total volume to total pallets.
3. Round up to the nearest whole pallet.
4. Calculate the storage cost for each SKU (rounded-up pallets * Cost Per Pallet).
5. If the stock is held for less than 28 days, apply the understocked fee for the
understocked volume (understocked volume * Fee per CBM).
• In a scenario where 100 individuals are surveyed, with 80 liking tea and 70 liking
coffee, what is the potential range of people who enjoy both beverages?
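The tea and coffee question above follows from inclusion-exclusion. The overlap is at least (likes tea + likes coffee - total), when everyone likes at least one drink, and at most the smaller of the two groups. The function name below is illustrative.

```python
def overlap_range(total, likes_a, likes_b):
    """Return the (minimum, maximum) number of people who can like both."""
    # Smallest overlap: spread the two groups across as many people as possible
    minimum = max(0, likes_a + likes_b - total)
    # Largest overlap: the smaller group sits entirely inside the larger one
    maximum = min(likes_a, likes_b)
    return minimum, maximum


print(overlap_range(total=100, likes_a=80, likes_b=70))
```

For 100 people with 80 liking tea and 70 liking coffee, between 50 and 70 people like both.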
• If faced with a situation involving a gun with 6 barrels, two bullets, and a person in
front of it, how would you maximize the person's chances of survival?
• You are in a situation where you are dealing with a person who is non-cooperative,
how do you deal with such a situation and come out with a win-win scenario
• If you were in a team, 2 of your team members are posing 2 different ideas, which idea
would you be going with?
• In a team, you are dealing with a problematic person (you have a personal level
problem with him/her) how would you deal with such a situation to not
impact/hamper your work.
stakeholder?
• What are the benefits of using Python for data science?
• VBA
warehouse
• If this is my average order value then what should I do to
• Data Pipeline
analytics?
• Explain AI algorithms such as KNN and K-means
o Verbal Ability
o Reasoning Ability
• Communication - Same as Cognizant Communication Test
• Questions with Audio based info and answer basis the Audio
• Trait Based Assessment - Similar to IBM - Psychometric Test
• Supply chain
3. https://www.youtube.com/watch?v=yUOC-Y0f5ZQ&ab_channel=Atlassian
4. https://www.youtube.com/watch?v=6y545eCNHG8&ab_channel=PMDiegoGranados
5. https://www.youtube.com/watch?v=m-qyEDwB1tw&ab_channel=Exponent
6. https://www.youtube.com/watch?v=2dczveSrsv8&list=PLIvg2wJAAhT6hpxKQs4YJGbfesyPDkv8E&index=7
7. https://www.youtube.com/watch?v=n530l09t8zY&list=PLIvg2wJAAhT6hpxKQs4YJGbfesyPDkv8E&index=9
General Questions
1. Can you describe a product you successfully brought to market?
2. How do you prioritize features for a new product or an existing one?
3. What products do you admire and why?
4. How would you handle a situation where the development team is missing deadlines?
5. Why switch from Data Analytics to Product Management?
6. Define Product Management / What do you know about Product Management.
7. What do you know about our business?
8. How can you as a Product Manager help grow our business?
Customer Focus
1. How do you gather customer requirements?
2. Can you provide an example of a time when you had to balance customer needs with
business needs?