DA-Interview Go Through

The document covers various Python programming concepts, including data structures, control flow statements, decorators, exception handling, and file operations. It also discusses differences between lists, arrays, and dataframes, as well as data manipulation techniques using Pandas and NumPy. Additionally, it touches on machine learning principles, evaluation metrics, and Excel functions like MID and VLOOKUP.

Uploaded by

saswatnayak2023
Copyright
© © All Rights Reserved

PYTHON:

BASICS and ADVANCED


• Double Equal to and IS – difference

o "==" is used for value equality comparison, while "is" is used for object identity comparison.
o The "==" operator checks whether the values of two operands are equal. It returns True if they are equal and False otherwise.
o The "is" operator returns True only when both operands refer to the same object in memory.
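The difference is easiest to see with two lists that hold the same values. The sketch below is only an illustration; the variable names are made up for this example.

```python
# Two separate list objects with the same contents, plus an alias of the first.
a = [1, 2, 3]
b = [1, 2, 3]
c = a

same_value = (a == b)      # compares contents
same_object = (a is b)     # compares identity
alias_is_same = (a is c)   # c is another name for the object a refers to

print(same_value, same_object, alias_is_same)  # True False True
```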
• What are the different data structures in Python?

List, set, tuple, string, and dictionary are some of the built-in data structures in Python. Typed arrays are available through the standard-library array module.
• Continue, Pass & Break - how output behaves when these are introduced in code

continue skips the rest of the current iteration and moves on to the next iteration of the loop.
pass does nothing. It is a placeholder for a loop, function, or condition whose body has not been written yet, so that the empty block does not raise a syntax error.
break exits the loop immediately, so no further iterations run.
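A small sketch of how the output changes when each keyword is placed in the same loop. The list names are hypothetical and chosen only for this illustration.

```python
skipped_three = []
for n in range(1, 6):
    if n == 3:
        continue          # skip 3, carry on with 4
    skipped_three.append(n)

stopped_at_three = []
for n in range(1, 6):
    if n == 3:
        break             # leave the loop entirely
    stopped_at_three.append(n)

unchanged = []
for n in range(1, 6):
    if n == 3:
        pass              # placeholder: nothing happens
    unchanged.append(n)

print(skipped_three)     # [1, 2, 4, 5]
print(stopped_at_three)  # [1, 2]
print(unchanged)         # [1, 2, 3, 4, 5]
```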
• What are decorators in Python, and how would you use them?

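A decorator is a function that takes another function as input and returns a new function that adds behaviour before or after the original call.

def my_decorator(func):
    def wrapper():
        print("Something is happening before the function is called.")
        func()
        print("Something is happening after the function is called.")
    return wrapper

@my_decorator
def say_hello():
    print("Hello!")

say_hello()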
• How does exception handling work in Python? Can you give an example using try-except blocks?

There are three main blocks in exception handling: try, except, and finally. The code that might fail goes in the try block. If an error is raised there, the program does not crash; control jumps to the matching except block instead. The finally block, if present, runs every time, whether or not an error occurred.
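The flow through the three blocks can be sketched as follows. The helper name safe_divide and the events list are assumptions made for this example only.

```python
def safe_divide(a, b):
    """Return a / b, or None when b is zero (hypothetical helper)."""
    events = []
    try:
        result = a / b
        events.append("try finished")
    except ZeroDivisionError:
        result = None
        events.append("except ran")
    finally:
        # Runs on both the success path and the error path.
        events.append("finally ran")
    return result, events

print(safe_divide(10, 2))  # (5.0, ['try finished', 'finally ran'])
print(safe_divide(10, 0))  # (None, ['except ran', 'finally ran'])
```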
• Explain the concept of list comprehensions and provide an example where it would be useful.

A list comprehension creates a list in a single line of code by combining a loop and an optional condition.
Syntax:
li = [i for i in range(0, 10) if i % 2 == 0]
This creates a list of the even numbers from 0 to 8.
• What is a lambda function in Python, and how does it differ from a regular function?

A lambda function is an anonymous (unnamed) function defined with the lambda keyword. Unlike a regular def function, it is limited to a single expression and is typically used for short, throwaway operations.
Syntax:
square = lambda x: x ** 2
print(square(5)) # Output: 25
• How would you work with files in Python? Can you show an example of reading from
and writing to a file?

Reading file:
# Open the file in read mode ('r')
with open('example.txt', 'r') as file:
    # Read the entire content of the file
    content = file.read()
    print(content)

Writing file:
# Open the file in write mode ('w')
with open('example.txt', 'w') as file:
    # Write content to the file
    file.write('This is a sample text.\n')
    file.write('Python is awesome!')
• Can you explain the use of the 'self' keyword in Python classes?
• What are generators in Python, and how are they different from iterators?
• How do you implement multithreading or multiprocessing in Python, and in what
scenarios would you use them?

• Difference between numpy and pandas


• What is a dataframe?
A DataFrame is a two-dimensional, labeled data structure
similar to a table or spreadsheet, commonly used in data
analysis and manipulation. It consists of rows and
columns, with labeled axes for easy access, supports
heterogeneous data types, and offers a wide range of
functionalities for data manipulation and analysis.

• How to apply any function to any dataframe?

To apply any function to a DataFrame in Python, you can use the apply() method, which calls the function on each column (axis=0, the default) or on each row (axis=1).
• Diff b/w LIST and ARRAY

List:

Lists are a built-in data type in Python used to store a collection of items.
Lists are mutable, meaning their elements can be modified after the list is created.
Lists can contain elements of different data types.
Lists are created using square brackets [] and can be modified using various methods like append(), insert(), remove(), etc.

Array:

Arrays in Python refer to the array module, which provides a way to create arrays with elements of a single data type.
Arrays are mutable, like lists, meaning their elements can be modified after creation.
Unlike lists, arrays can only contain elements of a single data type.
Arrays are created using the array.array() constructor and can be modified using methods like indexing or slicing.
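A short sketch of the single-type restriction, using the standard-library array module. The variable names and the "i" (signed integer) type code are choices made for this illustration.

```python
from array import array

mixed = [1, "two", 3.0]          # a list accepts mixed types
ints = array("i", [1, 2, 3])     # type code "i": signed integers only
ints.append(4)

try:
    ints.append(2.5)             # a float does not fit the "i" type code
    rejected = False
except TypeError:
    rejected = True

print(ints.tolist(), rejected)   # [1, 2, 3, 4] True
```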
• Diff b/w ARRAY and NUMPY ARRAY

An array from the array module is mutable, has only one dimension, and stores values of a single data type. A NumPy array is also mutable (values can be changed after creation through indexing and slicing) and is also homogeneous, with one dtype shared by all elements. The differences are that a NumPy array can be multi-dimensional (2D or higher) and that NumPy offers a wide range of mathematical functions and vectorized operations.
• Define the purpose of the init function in programming and provide an example scenario where its use is essential.
• In Python, define the roles of map, filter, and reduce functions. Provide concise examples illustrating the application of each function in data manipulation.
from functools import reduce

# Using lambda with map(): apply a function to every item
numbers = [1, 2, 3, 4, 5]
squared_numbers = map(lambda x: x ** 2, numbers)
print(list(squared_numbers))  # Output: [1, 4, 9, 16, 25]

# Using lambda with filter(): keep only the items that pass a test
even_numbers = filter(lambda x: x % 2 == 0, numbers)
print(list(even_numbers))  # Output: [2, 4]

# Using lambda with reduce(): combine all items into a single value
sum_of_numbers = reduce(lambda x, y: x + y, numbers)
print(sum_of_numbers)  # Output: 15
• Can you explain the distinction between a list and a tuple?

Lists are mutable sequences of elements enclosed in square brackets, while tuples are immutable sequences of elements enclosed in parentheses. Use lists when you need a mutable collection, and use tuples when you need an immutable collection or want to ensure data integrity.
• How would you reverse a list without using any in-built functions/methods?

def reverse_list(arr):
    # Get the length of the list
    n = len(arr)

    # Loop through the first half of the list
    for i in range(n // 2):
        # Swap the elements at index i and (n - i - 1)
        arr[i], arr[n - i - 1] = arr[n - i - 1], arr[i]

# Example usage:
my_list = [1, 2, 3, 4, 5]
reverse_list(my_list)
print("Reversed list:", my_list)  # Output: Reversed list: [5, 4, 3, 2, 1]
• Tuples and Arrays. How do they differ?

• What are continuous variables and discrete variables?

• Given a list, how do you remove duplicates?


• Extend and append difference

extend() adds each element of an iterable to the list, so extending a list with a string adds every character as a separate item. append() adds its argument as a single item, so the whole string is added as one element.
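The two methods can be compared side by side on the same starting list. The list names here are hypothetical.

```python
extended = ["x"]
extended.extend("abc")   # the string is iterated, one character per item

appended = ["x"]
appended.append("abc")   # the string is added as one item

print(extended)  # ['x', 'a', 'b', 'c']
print(appended)  # ['x', 'abc']
```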
• Could you explain the difference between a list, a tuple, and a dictionary in Python?
• How do you define a class in Python?
• What is the difference between pass by reference and pass by value?
• Have you ever created a model using OOPs concepts in Python?
• Can you explain what decorators are in Python?
• Can you give an example of a use case for decorators?

• Can you explain the concept of object classes and the scope of variables in Python?

PANDAS and NUMPY


• Can you explain how to rename a dataframe column in Pandas?
Pandas Rename Columns
o The rename() function in Pandas is used to rename columns in a DataFrame.
The rename() function takes a dictionary as an argument, where the keys are
the old column names and the values are the new column names.
o For example, the following code would rename the Name column to First
Name and the Age column to Years Old:

df = df.rename(columns={"Name": "First Name", "Age": "Years Old"})


• How do you merge two dataframes in Pandas?
o Pandas Merge
o The merge() function in Pandas is used to combine two DataFrames. It takes a number of arguments, including the two DataFrames to be merged, the on keyword argument, which specifies the column or columns on which the two DataFrames will be merged, and the how keyword argument, which specifies the type of merge to be performed.
o The four most commonly used merge types are:
▪ inner: Only rows that exist in both DataFrames will be included in the merged DataFrame.
▪ outer: All rows from both DataFrames will be included in the merged DataFrame, even if they only exist in one DataFrame.
▪ left: All rows from the left DataFrame will be included in the merged DataFrame, along with any matching rows from the right DataFrame.
▪ right: All rows from the right DataFrame will be included in the merged DataFrame, along with any matching rows from the left DataFrame.

merged_df = pd.merge(df1, df2, on='common_column_name', how='inner')
• Handle nulls, view/handle outliers
• How would you handle missing data in a dataset using Pandas?
• How do you handle outliers in your data?
• What tools or techniques do you use for data cleaning and preprocessing?
• How do you deal with missing data?
• What is a filter in the context of data analysis?
• Pandas data structures - Series & DataFrames
• Given a dataframe of single column having numbers, get square values of these numbers in a new column
• Explain the working of group by function on Pandas DataFrames.
• Can you explain how to transpose columns in a data table?
• Define a Pivot Table and its use cases.
• What methods do you employ to remove duplicate entries in a dataset?
• Are there any methods available for this purpose besides isnull?
• Could you elaborate on how to use iloc and loc for data selection in a DataFrame?
• Can you clarify the distinctions between the merge and join methods in Pandas?
• Why do we employ double square brackets [[]] when selecting multiple columns from a DataFrame in Pandas?
• Could you provide an overview of what Pandas is and its applications?
• How do you perform sorting in Pandas based on a specific column?
• Can you explain the usage of the sort_values method in this context?
• PIVOT in python pandas
• GROUP BY - AGGREGATION
• JOINS
• MERGE
• CONCAT
• WHERE - FILTERING
• SORT

• In your Python solution, you were required to merge three tables. Can you explain the process you followed?
• Why did you choose to use merge instead of concat in your Python solution? Can you
explain how concat differs from merge?
• What is the axis parameter in the concat function, and how does it affect the
concatenation process?
• As a final request, could you send us an email detailing your past projects, focusing on
their key aspects and what you learned from them?
• Parameter used in the split function
• How to import an excel workbook in python which consists of multiple sheets. How to
merge these sheets together?
• How can you read 2 different excel files in Python?

• Does numpy use row order or column order?

SCRAPING
• What are the BeautifulSoup and requests libraries?
• What is a cursor, and why is a connection required?
• If login is required, would you be able to scrape data without logging in?
• What are security features, and why are they required?
• How would you approach the task of extracting all data related to Hard-disks from the
Flipkart website? Could you outline the steps or provide a sample code?
• Could you describe how you would scrape data from a website for a project and provide
a sample code?

ML
• Can you explain what clustering is?
• What all different accuracy metrics were used for evaluating the models etc
• What is the difference between correlation and causation?
• Explain the difference between supervised and unsupervised learning
• How do you handle overfitting in a machine learning model?
• Describe a situation where you would use a random forest over a linear regression model.
• Explain the concept of backpropagation.
• What are convolutional neural networks (CNN) and where are they typically used?
• How does dropout help in preventing overfitting in neural networks?
• How would you detect fraudulent transactions in a large dataset?
• What is the difference between a decision tree and a random forest?
• What is the k-nearest neighbors algorithm?
• What is the difference between bias and variance?
• How do you evaluate the performance of a machine learning model?
• What are the different types of dimensionality reduction techniques?
• What is the curse of dimensionality?
• What are the different types of machine learning algorithms?
• What exactly is AI? And how does it differ from Machine Learning?
• R Value
• Coefficient
• Explain the concept of overfitting in machine learning and ways to prevent it.
• How would you handle skewed data when building a machine learning model?
• Explain the principle of a decision tree algorithm.
• How do you evaluate the performance of a regression model?
• What is the difference between a Logistic Regression model and a Linear Regression model?
• Evaluation metrics for supervised learning: the difference between precision and recall, and what each is used for


Excel
• Excel - Application of MID Function

The MID function in Excel is used to extract a specific number of characters from a text string, starting at a specified position. Its syntax is:

=MID(text, start_num, num_chars)

text: The text string from which you want to extract characters.
start_num: The position of the first character you want to extract.
num_chars: The number of characters you want to extract.
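For readers who think in Python, the behaviour of MID can be sketched with string slicing. This is only an illustration of the semantics (start_num counts from 1, not 0); the function name mid and the sample invoice string are assumptions, not part of Excel.

```python
def mid(text, start_num, num_chars):
    """Mimic Excel's MID: start_num is 1-based, num_chars is a length."""
    if start_num < 1 or num_chars < 0:
        raise ValueError("start_num must be >= 1 and num_chars >= 0")
    begin = start_num - 1          # convert to Python's 0-based index
    return text[begin:begin + num_chars]

print(mid("INV-2023-0042", 5, 4))  # 2023
```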
• Explain VLOOKUP - Disadvantages / Limitations

VLOOKUP is a commonly used function in Excel for vertical lookup. It is used to search for a value in the first column of a table array and return a value in the same row from another column. While VLOOKUP is a powerful tool for data analysis and manipulation, it does have some disadvantages and limitations:

Approximate Match by Default: When the fourth argument (range_lookup) is omitted, VLOOKUP performs an approximate match, which assumes the first column is sorted in ascending order and can return unexpected results on unsorted data. This default is often overlooked. For an exact match, pass FALSE (or 0) as the fourth argument; if no exact match exists, VLOOKUP then returns an error (#N/A).

Single Column Lookup: VLOOKUP only allows you to search for values in the first column of a table array. If the value you want to lookup is not in the first column, you will need to rearrange your data or use other functions like INDEX and MATCH.

Inflexible Column Indexing: With VLOOKUP, the column index number is fixed and must be specified manually. If the table structure changes, such as inserting or deleting columns, the column index number may become outdated and need to be adjusted manually.

Performance Issues with Large Data: VLOOKUP can be slow and inefficient, especially when dealing with large datasets. This is because it performs a linear search through the entire lookup range, which can result in longer calculation times, especially if used repeatedly or with complex formulas.

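Fixed Lookup Range: VLOOKUP recalculates when values inside the lookup range change (as long as calculation is set to automatic), but a fixed range reference such as A2:D100 does not grow when the source data is expanded. Rows added outside the range are ignored until the formula is updated, unless the range is an Excel Table or a whole-column reference.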

Limited to One Value: VLOOKUP only returns the first matching value found in the lookup range. If there are multiple matches, it will only return the first one. This can be a limitation if you need to retrieve multiple values or perform more complex lookup operations.

Case Sensitivity: VLOOKUP is case-insensitive, so it does not distinguish between uppercase and lowercase letters. This can lead to unexpected results if case sensitivity is important in your lookup operation.
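Several of these limitations (first column only, first match only, case-insensitive comparison) can be seen in a rough Python sketch of an exact-match lookup. The function vlookup_exact and the sample product rows are hypothetical; this mimics the behaviour for illustration and is not how Excel is implemented.

```python
def vlookup_exact(lookup_value, table, col_index_num):
    """Hypothetical exact-match VLOOKUP over a list of row tuples."""
    wanted = str(lookup_value).lower()          # case-insensitive comparison
    for row in table:                           # top-to-bottom scan
        if str(row[0]).lower() == wanted:       # only the first column is searched
            return row[col_index_num - 1]       # 1-based column index
    return "#N/A"

products = [
    ("P100", "Keyboard", 25),
    ("P200", "Mouse", 10),
    ("p100", "Duplicate key", 99),
]

print(vlookup_exact("p100", products, 2))  # Keyboard
print(vlookup_exact("P999", products, 2))  # #N/A
```

The first call returns "Keyboard" even though a later row matches the lowercase key exactly, because the scan stops at the first case-insensitive match.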
• How would you differentiate between VLOOKUP, HLOOKUP, and XLOOKUP in Excel?

VLOOKUP, HLOOKUP, and XLOOKUP are all Excel functions used for looking up and retrieving data from a table. Here's how they differ:

VLOOKUP (Vertical Lookup):


Purpose: Searches for a value in the first column of a
table array and returns a value in the same row from a
specified column.
Syntax: VLOOKUP(lookup_value, table_array,
col_index_num, [range_lookup]).
Key Points:
Lookup value must be in the leftmost column of the table
array.
Can only search vertically (from top to bottom).
Suitable for vertical data structures.

HLOOKUP (Horizontal Lookup):


Purpose: Searches for a value in the top row of a table
array and returns a value in the same column from a
specified row.
Syntax: HLOOKUP(lookup_value, table_array,
row_index_num, [range_lookup]).
Key Points:
Lookup value must be in the top row of the table array.
Can only search horizontally (from left to right).
Suitable for horizontal data structures.
XLOOKUP:
Purpose: Available in Excel 365 and in Excel 2021 or later, XLOOKUP is a versatile lookup function that can perform both vertical and horizontal lookups, as well as array lookups.
Syntax: XLOOKUP(lookup_value, lookup_array,
return_array, [if_not_found], [match_mode],
[search_mode])
Key Points:
Offers more flexibility than VLOOKUP and HLOOKUP.
Can perform approximate or exact match lookups.
Allows searching in both vertical and horizontal
directions.
Supports array operations, making it more powerful and
versatile.
In summary, while VLOOKUP and HLOOKUP are
traditional Excel lookup functions designed for specific
lookup scenarios, XLOOKUP is a more advanced
function that offers greater flexibility and functionality,
including the ability to perform both vertical and
horizontal lookups, as well as array operations.
• Between INDEX MATCH and XLOOKUP, which one do you prefer and why?


• How do you perform customized calculations within a Pivot table?

Open your Pivot table: Ensure you have a Pivot table created with your desired fields.

Insert a Calculated Field:

Go to the "PivotTable Analyze" or "Options" tab on the Excel ribbon.
Click on "Fields, Items, & Sets" (this might vary
depending on your Excel version).
Choose "Calculated Field" from the dropdown.
In the "Name" field, enter a name for your calculated
field.
In the "Formula" field, enter your calculation using field
names and operators (e.g., =Sales * 0.1 for calculating
10% of sales).
Click OK to create the calculated field.
Insert a Calculated Item:

If you're working with a hierarchical Pivot table (e.g., with multiple items under a single field), you might need to use calculated items.
Right-click on an item within a field in your Pivot table.
Choose "Insert Calculated Item."
In the dialog box that appears, specify a name for your
calculated item and enter a formula similar to how you
would in a calculated field.
Click OK to create the calculated item.
Modify or Delete Calculations:

If you need to change or remove a calculated field or item, you can usually find options to do so within the Pivot table's settings or options menu. This may involve navigating back to the calculated field/item dialog and making the necessary changes or deletions.
Refresh Your Pivot Table: After making any changes,
you may need to refresh your Pivot table to see the
updated results.
• COUNTIF, SUMIF

COUNTIF Function: The COUNTIF function counts the number of cells within a range that meet a specified condition.
Syntax:
COUNTIF(range, criteria)

range: The range of cells that you want to apply the criteria to.
criteria: The condition that defines which cells to count.

SUMIF Function: The SUMIF function adds the values in a range that meet a specified condition.
Syntax:
SUMIF(range, criteria, [sum_range])

range: The range of cells that you want to apply the criteria to.
criteria: The condition that defines which cells to add.
sum_range: (Optional) The range of cells to sum. If omitted, the range specified in the "range" argument is used.
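The logic of the two functions, including the optional sum_range, can be sketched in Python. The helper names countif and sumif, the use of a predicate function in place of Excel's criteria string, and the sample data are all assumptions made for this illustration.

```python
def countif(values, predicate):
    """Count the items in values that satisfy the predicate."""
    return sum(1 for v in values if predicate(v))

def sumif(values, predicate, sum_values=None):
    """Sum sum_values where the matching item in values satisfies the predicate.

    When sum_values is omitted, values itself is summed, like SUMIF.
    """
    if sum_values is None:
        sum_values = values
    return sum(s for v, s in zip(values, sum_values) if predicate(v))

regions = ["North", "South", "North", "East"]
sales = [100, 250, 300, 50]

print(countif(regions, lambda r: r == "North"))       # 2
print(sumif(regions, lambda r: r == "North", sales))  # 400
print(sumif(sales, lambda s: s > 90))                 # 650
```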
• What is the most complex macro you have developed, and what was its impact?
• How do you combine multiple sheets in Excel?
• https://docs.google.com/spreadsheets/d/1Zw-dMdiH-VjlJyQdodXuSNIxQkKks3cN/edit?usp=sharing&ouid=113468865657991512043&rtpof=true&sd=true
• You have 2 tables - customers and orders - how will you find the list of customers who
did not place any order?
• Give the list of customers who ordered more than once.
• Calculate the distinct order IDs of all those customers who ordered Shilajit.
• Previous MIS reports were shared, and the candidate was asked to apply formulas: SUMIFS, COUNTIFS, VLOOKUP, AVG, SUM, IFs, logical functions, and cell referencing. Further questions covered course, past studies, and other background topics.

SUMIFS(SUM_RANGE, CRITERIA_RANGE1, CRITERIA1, …, CRITERIA_RANGE_N, CRITERIA_N)
COUNTIFS(CRITERIA_RANGE1, CRITERIA1, …, CRITERIA_RANGE_N, CRITERIA_N)
VLOOKUP(VALUE, TABLE_RANGE, COLUMN_NO, MATCH_TYPE)
AVERAGE(RANGE)
SUM(RANGE)
IFS(CONDITION1, V_TRUE1, …, TRUE, V_DEFAULT)
Logical functions - AND(), OR(), NOT(), IF()
Cell Referencing →
• Absolute – used when a formula must always point to one fixed cell, even when the formula is copied to other rows. Add the $ symbol to the reference (for example $A$1), or press F4 on the reference to insert it.
• Relative – the default. Write the formula once and fill it down; the references shift automatically for each row.

• SUMPRODUCT formula in Excel
• Do you have any knowledge of VBA and macros? The candidate said he knows what they do but has not had much exposure to them.

SQL:
• Import

Import is not an SQL command. It refers to bringing data into a database. Many database management systems provide an import wizard that loads data from an external source. If the data is already in the database, an INSERT ... SELECT statement can copy it into a new table:
INSERT INTO table2 (col1, col2) SELECT col1, col2 FROM table1;
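The INSERT ... SELECT pattern can be tried out with Python's built-in sqlite3 module. This is a minimal sketch on an in-memory SQLite database; the column types and sample rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER, col2 TEXT);
    CREATE TABLE table2 (col1 INTEGER, col2 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    -- Copy the existing rows into the new table.
    INSERT INTO table2 (col1, col2) SELECT col1, col2 FROM table1;
""")
copied = conn.execute("SELECT col1, col2 FROM table2 ORDER BY col1").fetchall()
print(copied)  # [(1, 'a'), (2, 'b')]
conn.close()
```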
• What are DDL, DML, DCL, and TCL?
DDL (Data Definition Language):
DDL commands are used to define, modify, and delete
the structure of database objects such as tables, indexes,
views, and schemas.
Examples of DDL commands include CREATE, ALTER,
and DROP.
These commands are used to manage the structure of the
database.

DML (Data Manipulation Language):


DML commands are used to manipulate data stored in
database objects.
Examples of DML commands include SELECT,
INSERT, UPDATE, and DELETE.
These commands are used to retrieve, insert, modify, and
delete data in database tables.

DCL (Data Control Language):


DCL commands are used to control access to data stored
in the database.
Examples of DCL commands include GRANT and
REVOKE.
These commands are used to grant or revoke privileges
and permissions on database objects.

TCL (Transaction Control Language):


TCL commands are used to manage transactions within
the database.
Examples of TCL commands include COMMIT,
ROLLBACK, and SAVEPOINT.
These commands are used to control the outcome and
flow of transactions in the database.
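Three of these categories can be sketched with Python's sqlite3 module on an in-memory database. SQLite has no GRANT or REVOKE, so DCL is not shown; the accounts table and its values are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and alter structure.
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("ALTER TABLE accounts ADD COLUMN owner TEXT")

# DML + TCL: insert a row and make it permanent with COMMIT.
conn.execute("INSERT INTO accounts (id, balance, owner) VALUES (1, 100, 'asha')")
conn.commit()

# DML + TCL: change the row, then undo the change with ROLLBACK.
conn.execute("UPDATE accounts SET balance = 0 WHERE id = 1")
conn.rollback()

balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(balance)  # 100
conn.close()
```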
• What is the difference between SQL and NOSQL?

• SQL (Relational):
o Data is organized in tables with rows and columns.
o Tables are linked together through predefined
relationships, ensuring data consistency.
o Follows a rigid schema, meaning the structure of the
data is defined upfront.

• NoSQL (Non-Relational):
o Offers more flexible data structures. Data can be stored in documents, key-value pairs, graphs, or wide-column stores.
o Less emphasis on predefined schema, allowing for more dynamic data models.

• Join the 2 tables

SQL supports several join types, for example inner join, left join, right join, and cross join.
SELECT * FROM table1 t1 JOIN table2 t2 ON t1.id = t2.id;
• Fetch records of a particular type

SELECT * FROM table1 WHERE type = 'condition';


• Get few names and their datatype from first table (dataname) then get count of these
from second table

SELECT dataname, COUNT(*) AS count
FROM second_table
WHERE dataname IN ( -- List of names from first_table
    SELECT dataname
    FROM first_table
)
GROUP BY dataname;
• Dynamic variable declaration

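SQL itself doesn't have direct support for dynamic variable declaration at runtime in the way some programming languages do. However, there are techniques to achieve a similar effect depending on your specific needs, such as passing values in as stored-procedure parameters. The example uses MySQL-style syntax, and the parameter is prefixed with p_ so that it is not confused with the column of the same name:

CREATE PROCEDURE GetCustomerData (IN p_customer_id INT)
BEGIN
    SELECT * FROM customers WHERE customer_id = p_customer_id;
END;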
• GROUP BY

The GROUP BY clause is used to group together rows that share a common value in one or more columns. This allows you to perform aggregate functions (functions that summarize data) on the grouped data.
• CONVERT

The CONVERT function is used to transform data from one data type to another. The argument order shown here, CONVERT(type, value), is SQL Server syntax; the standard equivalent is CAST(value AS type).

SELECT CONVERT(INT, price) AS price_int
FROM products;
• ROUND

The ROUND function in SQL is used to round a numeric value to a specified number of decimal places or to the nearest integer.

SELECT ROUND(my_numeric_column, 2) AS rounded_value
FROM my_table;
• Convert a column to decimal and then calculate percentage

-- DECIMAL without a precision and scale defaults to zero decimal places in many databases
SELECT
    CAST(amount AS DECIMAL(10, 2)) / total * 100 AS percentage
FROM my_table;
• CONCATENATE the % sign

SELECT CONCAT(CONVERT(DECIMAL(10, 2), amount) / total * 100, '%') AS percentage
FROM my_table;
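Why the cast matters can be checked with Python's sqlite3 module. Note that this sketch uses SQLite's dialect rather than the SQL Server style above: CAST(... AS REAL) instead of DECIMAL, and the || operator instead of CONCAT. The sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (amount INTEGER, total INTEGER);
    INSERT INTO my_table VALUES (1, 4), (1, 3);
""")
rows = conn.execute("""
    SELECT amount / total * 100                                 AS int_math,
           ROUND(CAST(amount AS REAL) / total * 100, 2)         AS percentage,
           ROUND(CAST(amount AS REAL) / total * 100, 2) || '%'  AS labelled
    FROM my_table
    ORDER BY total DESC
""").fetchall()
print(rows[0])  # (0, 25.0, '25.0%')
conn.close()
```

Without the cast, integer division truncates 1 / 4 to 0 before the multiplication, so the first column is 0 for every row.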
• Write a SQL query and convert its output to CTE and print again

WITH EmployeeCounts AS (
    SELECT department_id, COUNT(*) AS employee_count
    FROM employees
    GROUP BY department_id
)
SELECT * FROM EmployeeCounts;
• Give count from sub query which consisted of order by - order by had to be removed
With order by

SELECT *
FROM (
SELECT column1, column2
FROM your_table
ORDER BY column1
) AS subquery;

Without order by

SELECT COUNT(*)
FROM (
SELECT column1, column2
FROM your_table
) AS subquery;
• SELF JOIN query - Analytical skills and SQL knowledge test

SELECT e1.first_name AS employee_name, e2.first_name AS manager_name
FROM employees e1
JOIN employees e2 ON e1.manager_id = e2.employee_id;
• In the employees table, how would you find the name of the employee with the
employee_id of 152?

Select name from employees where employee_id = 152;


• Can you explain some basic and advanced SQL concepts?

Basic concepts include selecting columns, filtering with WHERE, GROUP BY, ORDER BY, CONCAT, and aggregate functions.
Advanced concepts include CTEs, window functions (such as RANK, DENSE_RANK, ROW_NUMBER, LAG, and LEAD), indexing, views, etc.
• Different types of constraints and the difference between a primary key and a foreign
key?

Constraints are essential components of database design, ensuring data integrity and consistency. Here's a breakdown of the types of constraints and their distinctions:
1. Primary Key Constraint:
• A primary key uniquely identifies each record in a table.
• It prevents duplicate or null values within the specified column(s).
• Primary keys are crucial for data integrity and indexing.
2. Foreign Key Constraint:
• Foreign keys establish relationships between tables, ensuring referential integrity.
• They link to the primary key (or another unique key) of another table, maintaining consistency across related data.
• Foreign keys enforce that values in the referencing column(s) exist in the referenced key column(s).
3. Unique Constraint:
• Unique constraints enforce uniqueness for column(s) values, so no two rows can hold the same non-null value.
• They differ from primary keys in permitting null values. How many nulls are allowed depends on the database: SQL Server allows one, while most other systems (for example PostgreSQL, MySQL, and SQLite) allow several.
4. Check Constraint:
• Check constraints impose conditions on column values, ensuring they meet specified criteria or expressions.
• They're useful for enforcing business rules or domain-specific requirements.
5. Default Constraint:
• Default constraints provide default values for columns when no explicit value is provided during insertion.
• They assign predefined values to columns, facilitating data consistency.
Difference between Primary Key and Foreign Key:
• Primary Key:
• Acts as the unique identifier for each record in a table.
• Prevents duplicates and null values within its column(s).
• Essential for indexing and establishing relationships with other tables.
• Foreign Key:
• Establishes relationships between tables, referencing the primary key of another table.
• Ensures referential integrity by verifying that values in the referencing column(s) exist in the referenced primary key column(s).
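The five constraint types can be exercised together with Python's sqlite3 module. This is a sketch on an in-memory SQLite database; the table layout, the helper is_rejected, and the sample values are assumptions for the example. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE departments (
        department_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL UNIQUE
    );
    CREATE TABLE employees (
        employee_id   INTEGER PRIMARY KEY,
        email         TEXT UNIQUE,
        salary        INTEGER CHECK (salary > 0),
        status        TEXT DEFAULT 'active',
        department_id INTEGER REFERENCES departments (department_id)
    );
    INSERT INTO departments VALUES (1, 'Analytics');
    INSERT INTO employees (employee_id, email, salary, department_id)
    VALUES (1, 'a@example.com', 50000, 1);
""")

def is_rejected(sql):
    """Return True when the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        conn.rollback()
        return True

results = {
    "duplicate primary key": is_rejected(
        "INSERT INTO employees (employee_id, salary) VALUES (1, 10)"),
    "unknown foreign key": is_rejected(
        "INSERT INTO employees (employee_id, salary, department_id) VALUES (2, 10, 99)"),
    "failed check": is_rejected(
        "INSERT INTO employees (employee_id, salary) VALUES (3, -5)"),
    "duplicate unique": is_rejected(
        "INSERT INTO employees (employee_id, email, salary) VALUES (4, 'a@example.com', 10)"),
}
default_status = conn.execute(
    "SELECT status FROM employees WHERE employee_id = 1").fetchone()[0]

print(results)         # every value is True
print(default_status)  # active
```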
• What is the difference between a SELECT statement and a WHERE statement?
The SELECT statement determines which columns of data to retrieve from the
database tables, while the WHERE clause specifies the conditions that the retrieved
data must meet.
The SELECT statement is responsible for projecting the desired columns, while the
WHERE clause filters the rows based on specified criteria.
In summary, the SELECT statement defines what to retrieve, while the WHERE
clause specifies which records to retrieve based on specified conditions.
• What is the difference between an inner join and an outer join?

The main difference between an inner join and an outer join is how they handle unmatched rows:
Inner join returns only the rows with matching values in both tables.
Outer join (LEFT, RIGHT, or FULL) also returns the unmatched rows from one or both tables, filling the columns of the other table with NULL.
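The handling of unmatched rows can be compared directly with Python's sqlite3 module. The customers and orders tables and their rows are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meera');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")
inner_rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()
left_rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()
print(inner_rows)  # [('Asha', 10), ('Asha', 11), ('Ravi', 12)]
print(left_rows)   # [('Asha', 10), ('Asha', 11), ('Ravi', 12), ('Meera', None)]
conn.close()
```

Meera has no orders, so she is missing from the inner join and appears with a NULL (None in Python) order_id in the left join. Filtering the left join on o.order_id IS NULL is one way to list customers who never placed an order.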
• What is a subquery?

In SQL, a subquery, also known as an inner query or nested query, is a query nested within another SQL statement, such as SELECT, INSERT, UPDATE, or DELETE. It allows you to retrieve or manipulate data based on the results of another query.
Eg:
SELECT column1
FROM table1
WHERE column2 IN (SELECT column3 FROM table2 WHERE condition);
• What is a CTE?

A Common Table Expression (CTE) is a named temporary result set that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement in SQL. It exists only for the duration of that statement and can be used in subsequent parts of the query, which improves readability and modularity. Whether it also improves performance depends on the database.
CTEs are defined using the WITH keyword followed by a name for the CTE and a query that defines the result set.
EG:
WITH CTE AS (
SELECT column1, column2
FROM table1
WHERE condition
)
SELECT *
FROM CTE
WHERE column2 > 10;
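A CTE can be run end to end with Python's sqlite3 module. The sketch below reuses the EmployeeCounts idea on an in-memory database; the employees table and its rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, department_id INTEGER);
    INSERT INTO employees VALUES (1, 10), (2, 10), (3, 20);
""")
large_departments = conn.execute("""
    WITH EmployeeCounts AS (
        SELECT department_id, COUNT(*) AS employee_count
        FROM employees
        GROUP BY department_id
    )
    SELECT department_id, employee_count
    FROM EmployeeCounts
    WHERE employee_count > 1
""").fetchall()
print(large_departments)  # [(10, 2)]
conn.close()
```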

• What is a window function?

A window function performs a calculation across a set of rows that are related to the current row (the "window") and returns a value for every row, rather than collapsing the rows into one as GROUP BY does.
Window functions can be used for tasks such as ranking, aggregation, statistical analysis, and data transformation within a query result set.
• Ranking: ROW_NUMBER() assigns a unique sequential number to each row, while RANK() and DENSE_RANK() give tied rows the same rank (RANK() leaves gaps after ties, DENSE_RANK() does not).
• Aggregation: Functions like SUM(), AVG(), MIN(), and MAX() can be used as window functions to calculate aggregates over a window of rows.
• Moving averages: Window functions like AVG() with the OVER clause can calculate moving averages over a specified window of rows.
• Partitioning: Window functions can partition the result set into groups based on one or more columns, allowing separate calculations for each partition.
• Data comparison: Functions like LAG() and LEAD() allow you to compare the current row with previous or subsequent rows within a window.

Syntax:

Window functions are typically specified using the


OVER clause, which defines the window or frame over
which the function operates.
The OVER clause includes partitioning, ordering, and
framing clauses to specify the window boundaries and
partitioning criteria.

EG:
SELECT
employee_id,
salary,
AVG(salary) OVER (PARTITION BY department_id
ORDER BY hire_date ROWS BETWEEN 1
PRECEDING AND 1 FOLLOWING) AS avg_salary
FROM
employees;
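A small sketch that runs the same moving-average query with Python's sqlite3 module, assuming the bundled SQLite is version 3.25 or newer (the first release with window functions). The four employee rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: three employees in department 10, one in department 20.
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, department_id INTEGER,
                            hire_date TEXT, salary REAL);
    INSERT INTO employees VALUES
        (1, 10, '2020-01-01', 100),
        (2, 10, '2021-01-01', 200),
        (3, 10, '2022-01-01', 300),
        (4, 20, '2020-06-01', 500);
""")

# For each row, average the salary of the previous, current and next row
# inside the same department, ordered by hire date.
rows = conn.execute("""
    SELECT employee_id, salary,
           AVG(salary) OVER (
               PARTITION BY department_id
               ORDER BY hire_date
               ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
           ) AS avg_salary
    FROM employees
    ORDER BY employee_id
""").fetchall()

for row in rows:
    print(row)
```

Every input row is still present in the output. At the edges of a partition the frame simply contains fewer rows, so the first employee's average covers two salaries rather than three.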
• What is the difference between a GROUP BY statement and a HAVING clause?
The GROUP BY statement and the HAVING clause are
both used in SQL queries that aggregate data, but they
serve different purposes:

Group by:
The GROUP BY statement is used to group rows that
have the same values into summary rows, typically to
perform aggregate functions on each group.
GROUP BY runs after the WHERE clause has filtered
individual rows and before the HAVING clause filters
the resulting groups.

Having:
The HAVING clause is used to filter groups of rows
based on specified conditions after the GROUP BY
operation has been performed.
Conditions specified in the HAVING clause are
evaluated after the GROUP BY operation and can
include aggregate functions.
It is commonly used to apply conditions to aggregated
data, such as filtering groups with a certain minimum or
maximum value.
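The order of evaluation (WHERE, then GROUP BY, then HAVING) can be seen in a short sqlite3 sketch. The employees table, the active flag, and the threshold of 45 are hypothetical choices for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data (illustration only).
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary REAL, active INTEGER);
    INSERT INTO employees VALUES
        ('A', 'HR', 40, 1), ('B', 'HR', 60, 1),
        ('C', 'IT', 90, 1), ('D', 'IT', 110, 1), ('E', 'IT', 500, 0),
        ('F', 'Ops', 30, 1);
""")

rows = conn.execute("""
    SELECT dept, AVG(salary) AS avg_salary
    FROM employees
    WHERE active = 1          -- row filter, applied before grouping
    GROUP BY dept
    HAVING AVG(salary) > 45   -- group filter, applied after aggregation
    ORDER BY dept
""").fetchall()

print(rows)
```

The inactive IT employee is removed by WHERE before the IT average is computed, and the Ops group is removed by HAVING because its average is too low.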

• What is a DISTINCT statement?
DISTINCT is a keyword used in a SELECT statement to
remove duplicate rows from the result set of a query. It
ensures that only unique values are returned for the
selected column or combination of columns. When
several columns are listed, a row counts as a duplicate
only if all of the selected columns match.
Eg:
SELECT DISTINCT department
FROM employees;
• What is a NULL value? How do you handle NULL values in a WHERE clause?
A NULL value in SQL represents a missing or unknown
value in a database. It signifies the absence of any actual
value in a field.
Checking for NULL Values:
SELECT * FROM table_name WHERE column_name
IS NULL;
Handling NULL Values in Conditions: A comparison
with NULL using = or <> evaluates to unknown rather
than true, so it never returns rows; NULL is not equal to
anything, not even itself. Use the IS NULL or IS NOT
NULL operator to check for NULL values explicitly.
Using COALESCE (or a vendor function such as ISNULL
in SQL Server or IFNULL in MySQL) to substitute a
default for NULL:
SELECT * FROM table_name WHERE
COALESCE(column_name, default_value) =
'some_value';
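A quick sqlite3 sketch of the three NULL checks. The table t and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with one missing city (illustration only).
conn.executescript("""
    CREATE TABLE t (id INTEGER, city TEXT);
    INSERT INTO t VALUES (1, 'Pune'), (2, NULL), (3, 'Delhi');
""")

# "= NULL" is never true, so this returns no rows at all.
eq_null = conn.execute("SELECT id FROM t WHERE city = NULL").fetchall()

# IS NULL is the correct test for a missing value.
is_null = conn.execute("SELECT id FROM t WHERE city IS NULL").fetchall()

# COALESCE replaces NULL with a default before the comparison.
coalesced = conn.execute(
    "SELECT id FROM t WHERE COALESCE(city, 'Unknown') = 'Unknown'"
).fetchall()

print(eq_null, is_null, coalesced)
```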
• How do you join two tables on multiple columns?
Tables are joined on columns using the ON clause. To join two tables on
multiple columns, combine the column conditions in the ON clause with AND
(every condition must match) or OR (any one condition may match).
EG:
SELECT * FROM table1 t1 JOIN table2 t2 ON t1.c1 = t2.c1 AND t1.c2 = t2.c2;
• How do you order the results of a SELECT statement?
The ORDER BY clause orders the result of a SELECT
statement.
Eg:
SELECT CustomerID FROM customers ORDER BY CustomerID
DESC;
This sorts the result in descending order. If DESC is
omitted, the default sort order is ascending (ASC).
• How do you limit the number of rows returned by a SELECT statement?
Use the LIMIT clause followed by the maximum number of
rows to return. (LIMIT is supported by MySQL, PostgreSQL,
and SQLite; SQL Server uses TOP instead.)
EG:
SELECT * FROM orders LIMIT 5;
This returns at most 5 rows.
Combined with ORDER BY, LIMIT supports queries such as
the top 10 orders by customer or the top 5 employees by
salary.
• How do you calculate the average of a column?
Use the AVG() aggregate function. Used without GROUP
BY, it calculates the average of the entire column. Used
with GROUP BY, it groups the rows and calculates the
average for each group.
EG:
SELECT dept, AVG(Salary) FROM Employees GROUP BY dept;
• How do you calculate the standard deviation of a column?
To calculate the standard deviation of a column in SQL,
use STDDEV_SAMP for the sample standard deviation or
STDDEV_POP for the population standard deviation. The
plain STDDEV function is vendor-specific: in PostgreSQL
and Oracle it is the sample standard deviation, while in
MySQL it is the population standard deviation. SQL
Server uses STDEV and STDEVP instead.
Sample Standard Deviation (STDDEV_SAMP):
SELECT STDDEV_SAMP(column_name) AS sample_std_dev
FROM table_name;
Population Standard Deviation (STDDEV_POP):
SELECT STDDEV_POP(column_name) AS
population_std_dev
FROM table_name;
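The difference between the two formulas can be checked in plain Python with the standard-library statistics module. This sketch uses Python rather than SQL because a standard-deviation aggregate is not among SQLite's built-in core functions; the salary values are hypothetical.

```python
import statistics

# Hypothetical column values (illustration only).
salaries = [40, 50, 60, 70, 80]

# Sample standard deviation: squared deviations divided by n - 1.
sample_sd = statistics.stdev(salaries)

# Population standard deviation: squared deviations divided by n.
population_sd = statistics.pstdev(salaries)

print(round(sample_sd, 4), round(population_sd, 4))
```

The sample value is always the larger of the two for the same data, because it divides by n - 1 instead of n.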
• What is a view in SQL?
A view in SQL is a virtual table that is based on the result
set of a SELECT query. It doesn't physically store the
data but instead provides a way to present data from one
or more tables in a structured format, similar to a table.
Views can be used to simplify complex queries, hide
sensitive data, or provide a predefined subset of data to
users.
EG:
-- Create a view named 'EmployeeDetails' that shows the
-- ID, name, and salary of employees
CREATE VIEW EmployeeDetails AS
SELECT EmployeeID, FirstName, LastName, Salary
FROM Employees;
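• Can you define what a stored procedure is? How do you write a stored procedure?
A stored procedure is a named set of SQL statements that
is saved in the database and can be executed on demand,
optionally with parameters.
To write a stored procedure, use the CREATE PROCEDURE
statement followed by the procedure name, parameters
(if any), and the SQL statements that define the
procedure's functionality. The exact syntax varies by
database; here's a basic example in SQL Server (T-SQL)
syntax:

CREATE PROCEDURE GetEmployeeByID
@EmployeeID INT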
AS
BEGIN
SELECT *
FROM Employees
WHERE EmployeeID = @EmployeeID;
END;

In this example:
CREATE PROCEDURE is used to define a new stored
procedure named GetEmployeeByID.
@EmployeeID INT specifies a parameter named
@EmployeeID of type INT.
The AS keyword begins the body of the stored
procedure.
Inside the procedure, a SQL SELECT statement retrieves
data from the Employees table based on the provided
EmployeeID.
The END keyword marks the end of the stored
procedure.
Once created, you can execute the stored procedure using
EXECUTE or EXEC:

EXEC GetEmployeeByID @EmployeeID = 123;

This will execute the stored procedure
GetEmployeeByID with the parameter @EmployeeID
set to 123.
• How does SQL help in extracting the data?

SQL (Structured Query Language) helps in extracting


data from databases by providing a standardized
language for querying relational databases. Here's how
SQL facilitates data extraction:

Querying Databases: SQL allows users to write queries


to retrieve data from one or more tables in a database.
Queries can range from simple SELECT statements to
complex JOINs, GROUP BYs, and subqueries.

Filtering Data: With SQL, users can apply WHERE


clauses to filter data based on specific conditions. This
helps in extracting only the relevant data from large
datasets.
Joining Tables: SQL supports various types of JOIN
operations (e.g., INNER JOIN, LEFT JOIN, RIGHT
JOIN) to combine data from multiple tables based on
common columns. This is useful for fetching related data
from different tables.

Aggregating Data: SQL provides aggregate functions


such as COUNT, SUM, AVG, MIN, and MAX to
perform calculations on groups of data. These functions
are helpful in summarizing and extracting insights from
datasets.

Sorting Data: SQL allows users to sort query results


using the ORDER BY clause. This helps in arranging the
extracted data in a desired order based on one or more
columns.

Limiting Results: Many SQL dialects provide the LIMIT
and OFFSET clauses (SQL Server uses TOP or OFFSET ...
FETCH) to limit the number of rows returned by a query.
This is useful when dealing with large datasets and when
only a subset of rows is needed.
• SQL Joins: Could you explain the differences between
LEFT JOIN and INNER JOIN in SQL? Please provide
an example where you would use each.
INNER JOIN returns only the matched rows from both
tables.
LEFT JOIN returns all the rows from the left table, along
with matched rows from the right table; where there is no
match, the right table's columns are NULL.
SELECT Departments.DepartmentName,
COUNT(Employees.EmployeeID) AS EmployeeCount
FROM Departments
LEFT JOIN Employees ON Departments.DepartmentID
= Employees.DepartmentID
GROUP BY Departments.DepartmentName;
This retrieves every department with its employee count,
including departments that have no employees (count 0).

SELECT Employees.Name,
Departments.DepartmentName
FROM Employees
INNER JOIN Departments ON
Employees.DepartmentID = Departments.DepartmentID;
This retrieves only the employees that belong to an
existing department, together with the department name.
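Both queries can be run against a tiny in-memory database with Python's sqlite3 module. The table and column names follow the queries above; the rows themselves are hypothetical, with a 'Legal' department deliberately left without employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows (illustration only); 'Legal' has no employees.
conn.executescript("""
    CREATE TABLE Departments (DepartmentID INTEGER, DepartmentName TEXT);
    CREATE TABLE Employees (EmployeeID INTEGER, Name TEXT, DepartmentID INTEGER);
    INSERT INTO Departments VALUES (1, 'HR'), (2, 'IT'), (3, 'Legal');
    INSERT INTO Employees VALUES (100, 'Asha', 1), (101, 'Ben', 2), (102, 'Chen', 2);
""")

# LEFT JOIN keeps every department, even one without a matching employee.
left_rows = conn.execute("""
    SELECT d.DepartmentName, COUNT(e.EmployeeID) AS EmployeeCount
    FROM Departments d
    LEFT JOIN Employees e ON d.DepartmentID = e.DepartmentID
    GROUP BY d.DepartmentName
    ORDER BY d.DepartmentName
""").fetchall()

# INNER JOIN keeps only rows that match on both sides.
inner_rows = conn.execute("""
    SELECT e.Name, d.DepartmentName
    FROM Employees e
    INNER JOIN Departments d ON e.DepartmentID = d.DepartmentID
    ORDER BY e.Name
""").fetchall()

print(left_rows)
print(inner_rows)
```

COUNT(e.EmployeeID) counts only non-NULL values, which is why the empty department shows 0 rather than 1.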
• In SQL, when would you use "WHERE" versus "HAVING"?
Use WHERE to filter individual rows before they are
grouped. Use HAVING to filter groups after GROUP BY
has been applied, typically with a condition on an
aggregate such as COUNT() or AVG().
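• How would you write a SQL query to fetch the 5th highest salary of each department?
SELECT DISTINCT DepartmentID, DepartmentName, Salary
FROM (
SELECT DepartmentID, DepartmentName, Salary,
DENSE_RANK() OVER (PARTITION BY
DepartmentID ORDER BY Salary DESC) AS
salary_rank
FROM Employees
) AS ranked_salaries
WHERE salary_rank = 5;
DENSE_RANK() is used instead of ROW_NUMBER() so that
tied salaries share a rank; ROW_NUMBER() would return
the 5th row, which is not the 5th highest distinct salary
when there are ties.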
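A sketch of the "N-th highest salary per department" pattern in Python's sqlite3 module, using DENSE_RANK() so that tied salaries share a rank. It assumes SQLite 3.25 or newer. The rows and the helper nth_highest are hypothetical, and N = 2 is used so the sample data can stay small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: department 1 has a tie for the top salary.
conn.executescript("""
    CREATE TABLE Employees (Name TEXT, DepartmentID INTEGER, Salary REAL);
    INSERT INTO Employees VALUES
        ('A', 1, 900), ('B', 1, 900), ('C', 1, 700), ('D', 1, 500),
        ('E', 2, 400), ('F', 2, 300);
""")

def nth_highest(n):
    """Return (DepartmentID, Salary) for the n-th highest distinct salary."""
    return conn.execute("""
        SELECT DISTINCT DepartmentID, Salary
        FROM (
            SELECT DepartmentID, Salary,
                   DENSE_RANK() OVER (PARTITION BY DepartmentID
                                      ORDER BY Salary DESC) AS salary_rank
            FROM Employees
        ) AS ranked
        WHERE salary_rank = ?
        ORDER BY DepartmentID
    """, (n,)).fetchall()

second = nth_highest(2)
third = nth_highest(3)
print(second, third)
```

A department with fewer than N distinct salaries simply produces no row, as department 2 does for N = 3.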
• Can you explain the differences between DROP, TRUNCATE, and DELETE in SQL?

DROP:
DROP is a command used to remove an entire table,
view, index, or database object from the database
schema.
When you DROP a table, all the data, indexes, and
privileges associated with that table are permanently
removed from the database.
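DROP is a DDL (Data Definition Language) command. In
databases such as MySQL and Oracle, DDL statements
commit implicitly and cannot be rolled back; others,
such as PostgreSQL and SQL Server, allow a DROP to be
rolled back inside an explicit transaction.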

TRUNCATE:
TRUNCATE is a command used to remove all rows from
a table quickly and efficiently, but it does not remove the
table structure.
TRUNCATE is faster than DELETE because it does not
generate individual delete operations for each row.
Instead, it deallocates the data pages of the table,
effectively removing all rows at once.
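TRUNCATE is usually classed as a DDL command. In MySQL
and Oracle it cannot be rolled back, while PostgreSQL
and SQL Server allow it to be rolled back inside an
explicit transaction.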

DELETE:
DELETE is a command used to remove one or more
rows from a table based on a condition.
Unlike TRUNCATE, DELETE removes specific rows
from the table, allowing you to specify filtering criteria
using a WHERE clause.
DELETE is slower than TRUNCATE because it
generates individual delete operations for each row that
matches the condition.
DELETE is a DML (Data Manipulation Language)
command, and it can be rolled back using a transaction if
it's executed within a transaction block.
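The rollback behaviour of DELETE can be demonstrated with Python's sqlite3 module, which by default opens a transaction implicitly before a DELETE. The orders table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table (illustration only).
conn.execute("CREATE TABLE orders (order_id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 'open'), (2, 'closed'), (3, 'closed')])
conn.commit()  # make the three rows permanent

# The DELETE runs inside a transaction that has not been committed yet.
conn.execute("DELETE FROM orders WHERE status = 'closed'")
count_inside = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Rolling back undoes the DELETE and restores the removed rows.
conn.rollback()
count_after = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

print(count_inside, count_after)
```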
• What are database triggers and can you list their types?
Database triggers are special stored procedures in a
database that automatically execute in response to certain
events or actions performed on the database. These
events can include INSERT, UPDATE, DELETE
operations on tables, or, in some databases, events such
as database startup or shutdown. Triggers are commonly
used to enforce business rules, maintain data integrity, or
automate certain tasks.

Here are the types of database triggers:

DML Triggers (Data Manipulation Language Triggers):
These triggers fire in response to data manipulation
language (DML) operations such as INSERT, UPDATE,
and DELETE statements executed on tables. Depending
on the database, they can be defined to fire BEFORE or
AFTER the operation.
DML triggers are further categorized into:
INSERT triggers: Fired before or after an INSERT operation.
UPDATE triggers: Fired before or after an UPDATE operation.
DELETE triggers: Fired before or after a DELETE operation.

DDL Triggers (Data Definition Language Triggers):
These triggers fire in response to data definition language
(DDL) events such as CREATE, ALTER, and DROP
statements executed on objects in the database schema.
DDL triggers are used to track changes to the database
structure, enforce security policies, or perform
administrative tasks.

Instead Of Triggers:
Instead Of triggers are a special type of trigger that fires
instead of the triggering action (e.g., INSERT, UPDATE,
DELETE).
They are commonly used with views and allow you to
define custom logic to handle the triggering action
without actually performing the action itself.
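A minimal DML trigger sketch, written in SQLite's trigger syntax and run through Python's sqlite3 module. Other databases use different CREATE TRIGGER syntax. The table names and the trigger name log_salary_change are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and trigger (illustration only, SQLite syntax).
conn.executescript("""
    CREATE TABLE employees (emp_id INTEGER, salary REAL);
    CREATE TABLE salary_audit (emp_id INTEGER, old_salary REAL, new_salary REAL);

    -- Fires once per updated row, after the salary column changes.
    CREATE TRIGGER log_salary_change
    AFTER UPDATE OF salary ON employees
    BEGIN
        INSERT INTO salary_audit VALUES (OLD.emp_id, OLD.salary, NEW.salary);
    END;

    INSERT INTO employees VALUES (1, 1000), (2, 2000);
    UPDATE employees SET salary = 1200 WHERE emp_id = 1;
""")

# The audit row was written by the trigger, not by application code.
audit = conn.execute("SELECT * FROM salary_audit").fetchall()
print(audit)
```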
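• Can you differentiate between a self-join and an inner join?
Inner join: A join type that returns only the rows with
matching values in the joined tables, usually two
different tables related by a common column(s).
Self-join: Joins a table with itself using table aliases,
typically to compare rows within the same table based on
related columns. A self-join is not a separate join type;
it can be written as an inner join or an outer join.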
EG:
SELECT employees.emp_id, employees.emp_name,
departments.dept_name
FROM employees
INNER JOIN departments ON employees.dept_id =
departments.dept_id;

SELECT e.emp_name AS employee_name,
m.emp_name AS manager_name
FROM employees e
INNER JOIN employees m ON e.manager_id =
m.emp_id;
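The employee/manager self-join can be run with Python's sqlite3 module. The four rows are hypothetical; a LEFT JOIN variant is included to show that a self-join is independent of the join type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical hierarchy: Asha has no manager (illustration only).
conn.executescript("""
    CREATE TABLE employees (emp_id INTEGER, emp_name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Asha', NULL), (2, 'Ben', 1), (3, 'Chen', 1), (4, 'Dia', 2);
""")

# Self-join written as an INNER JOIN: employees without a manager drop out.
with_inner = conn.execute("""
    SELECT e.emp_name, m.emp_name
    FROM employees e
    INNER JOIN employees m ON e.manager_id = m.emp_id
    ORDER BY e.emp_id
""").fetchall()

# Self-join written as a LEFT JOIN: every employee is kept.
with_left = conn.execute("""
    SELECT e.emp_name, m.emp_name
    FROM employees e
    LEFT JOIN employees m ON e.manager_id = m.emp_id
    ORDER BY e.emp_id
""").fetchall()

print(with_inner)
print(with_left)
```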
• Is it possible to use both GROUP BY and PARTITION BY in a single SELECT
statement? Could you provide an example?
Yes, it's possible to use both GROUP BY and
PARTITION BY in a single SELECT statement.
However, they serve different purposes:

GROUP BY is used to group rows that have the same
values into summary rows, typically used with aggregate
functions like SUM, AVG, COUNT, etc.
PARTITION BY is used to divide the result set into
partitions to which the window function is applied
separately.
When both are present, GROUP BY is evaluated first and
the window function then operates on the grouped rows.

Eg:
SELECT region,
product,
SUM(amount) AS product_sales,
SUM(SUM(amount)) OVER (PARTITION BY region)
AS total_sales_region
FROM
sales
GROUP BY
region, product;
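One way to combine GROUP BY with a window function in a single SELECT is to aggregate per group and then apply the window over the grouped rows. The sketch below assumes SQLite 3.25 or newer via Python's sqlite3 module; the sales rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sales rows (illustration only).
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('North', 'Pen', 10), ('North', 'Pen', 20), ('North', 'Ink', 70),
        ('South', 'Pen', 50);
""")

# The inner SUM(amount) is the GROUP BY aggregate (one value per
# region/product); the outer SUM(...) OVER adds those values up per region.
rows = conn.execute("""
    SELECT region, product,
           SUM(amount) AS product_sales,
           SUM(SUM(amount)) OVER (PARTITION BY region) AS region_sales
    FROM sales
    GROUP BY region, product
    ORDER BY region, product
""").fetchall()

for row in rows:
    print(row)
```

Each output row carries both its own product total and the total of its region, which makes share-of-region calculations straightforward.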
• Under what circumstances would you use LIMIT, and when would you opt for
OFFSET?
In SQL, both LIMIT and OFFSET control which rows of a
result set are returned, but they serve different
purposes:

LIMIT is used to restrict the number of rows returned by
a query to a specified number. It is typically used to
retrieve a fixed number of rows from the beginning of
the result set.

OFFSET is used to skip a specified number of rows from
the beginning of the result set before returning the
remaining rows. It is often used in conjunction with
LIMIT to implement pagination, where different subsets
of rows are displayed on different pages.

Eg:
-- Retrieve the next 5 student records (the third page
-- when each page holds 5 rows)
SELECT *
FROM students
LIMIT 5 OFFSET 10;
Without an ORDER BY clause the row order is not
guaranteed, so pagination queries should also sort on a
stable column.
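A pagination sketch with Python's sqlite3 module. The helper fetch_page, the student_id column, and the 12 generated rows are hypothetical; an ORDER BY is included so that the pages are deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with 12 students (illustration only).
conn.execute("CREATE TABLE students (student_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [(i, f"student_{i}") for i in range(1, 13)])

def fetch_page(page, page_size=5):
    """Return one page of student ids; pages are numbered from 1."""
    offset = (page - 1) * page_size  # rows to skip before this page
    return conn.execute(
        "SELECT student_id FROM students "
        "ORDER BY student_id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()

page1 = fetch_page(1)
page3 = fetch_page(3)
print(page1, page3)
```

The last page can be shorter than the page size: with 12 rows and 5 per page, page 3 holds only two rows.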
• Can you write a SQL query to find the maximum salary in each department and then
rank these maximum salaries sequentially?

WITH MaxSalaries AS (
SELECT
department,
MAX(salary) AS max_salary
FROM
employees
GROUP BY
department
)
SELECT
department,
max_salary,
ROW_NUMBER() OVER (ORDER BY max_salary
DESC) AS salary_rank
FROM
MaxSalaries;
• Different types of Databases?

There are several types of databases, each designed to


serve specific purposes and cater to different data storage
and retrieval needs. Some of the commonly used types of
databases include:

Relational Databases (SQL databases):

These databases store data in tabular form with rows and


columns.
They use Structured Query Language (SQL) for querying
and managing data.
Examples include MySQL, PostgreSQL, Oracle
Database, Microsoft SQL Server, SQLite.
NoSQL Databases:

NoSQL databases use non-tabular data models to store


and retrieve data.
They offer more flexibility and scalability compared to
relational databases.
Examples include MongoDB, Cassandra, Couchbase,
Redis, Amazon DynamoDB.
Document Databases:

Document databases store data in JSON or BSON


format, making them suitable for storing and retrieving
document-based data.
Each document may contain key-value pairs or nested
structures.
Examples include MongoDB, Couchbase, Firebase.
Key-Value Stores:

Key-value stores are simple databases that store data as a


collection of key-value pairs.
They are optimized for fast retrieval of data based on
keys.
Examples include Redis, DynamoDB, Riak.
Column-Family Stores:

Column-family stores organize data into columns rather


than rows, making them suitable for storing and
retrieving data with a large number of columns.
They are optimized for read and write performance.
Examples include Apache Cassandra, HBase.
Graph Databases:

Graph databases store data in nodes and edges,


representing relationships between entities.
They are designed for querying and analyzing complex
relationships in data.
Examples include Neo4j, Amazon Neptune, JanusGraph.
Time-Series Databases:

Time-series databases are optimized for storing and


analyzing time-series data, such as sensor data, stock
prices, and IoT data.
They support efficient storage and retrieval of data based
on timestamps.
Examples include InfluxDB, Prometheus, TimescaleDB.
In-Memory Databases:

In-memory databases store data in system memory rather


than on disk, enabling faster data access and processing.
They are suitable for applications requiring high
performance and low latency.
Examples include Redis, Memcached, Apache Ignite.
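• What are semi-structured databases?
Semi-structured databases store data that does not follow
a fixed table schema but still carries its own structure
through keys or tags, such as JSON or XML documents.
They are commonly grouped under NoSQL databases.
Examples of semi-structured databases include
MongoDB, Couchbase, Cassandra, Redis, and Amazon
DynamoDB.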
• Different types of JOINS?
The different types of joins are inner join, left (outer)
join, right (outer) join, full outer join, cross join, and
self join.
• What are data pipelines?
Data pipelines are a series of processes that extract,
transform, and load (ETL) data from various sources into
a destination system or database. They are used to
automate the flow of data between different systems,
applications, or databases, ensuring that data is
efficiently collected, processed, and made available for
analysis or use.

Key components of a data pipeline include:

Data Sources: These are the systems, databases,


applications, or files from which data is collected. Data
can come from various sources, including databases,
APIs, streaming platforms, logs, files (such as CSV,
JSON, XML), and IoT devices.

Data Extraction: In this stage, data is extracted from the


source systems or files. This may involve querying
databases, accessing APIs, reading files, or streaming
data from real-time sources.

Data Transformation: Extracted data often needs to be


transformed into a format suitable for analysis or
consumption by downstream systems. Data
transformation tasks may include cleaning, filtering,
aggregating, joining, enriching, and restructuring data.

Data Loading: Transformed data is loaded into a


destination system or database, where it can be stored for
further analysis, reporting, or application use. Loading
may involve inserting data into a relational database,
writing to a data warehouse, or storing in a data lake.

Data Processing: Some data pipelines include additional


processing steps, such as real-time analytics, machine
learning model inference, or business logic execution,
before data is stored or consumed.
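The extract, transform, and load stages can be sketched as three small Python functions. Everything here is hypothetical and simplified: the CSV text stands in for a real source, the cleaning rule is just "drop rows with a missing amount", and an in-memory SQLite table stands in for the destination.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: one row has a missing amount.
RAW_CSV = "order_id,amount\n1,10.5\n2,\n3,4.5\n"

def extract(text):
    """Extract: read raw records from the source (a CSV string here)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: drop rows with a missing amount and cast the types."""
    return [(int(r["order_id"]), float(r["amount"]))
            for r in records if r["amount"]]

def load(rows, conn):
    """Load: write the cleaned rows into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

loaded = conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
print(loaded)
```

Keeping each stage as its own function makes it possible to test, schedule, or replace a stage independently, which is the main idea behind a pipeline.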
• Data Pre-Processing
Data pre-processing is a crucial step in data analysis and
machine learning pipelines. It involves cleaning,
transforming, and preparing raw data into a format that is
suitable for analysis or modeling. Typical steps include
handling missing values, removing duplicates, correcting
data types, encoding categorical variables, and scaling
numeric features.

• Data warehouses vs Databases?

Data warehouses and databases serve different purposes


and have distinct characteristics. Here's a comparison
between the two:

Purpose:

Databases: Databases are designed for transactional


processing, storing, and managing structured data
efficiently. They are optimized for quick data retrieval,
updating, and querying to support real-time operations.
Data Warehouses: Data warehouses are specialized
databases that are optimized for online analytical
processing (OLAP) and data analysis. They are used for
collecting, integrating, and storing large volumes of
historical and current data from various sources to
support business intelligence (BI) and decision-making
processes.
Data Structure:

Databases: Databases typically store structured data in


tables with predefined schemas. They are ideal for
handling operational data such as transactions, user
information, and inventory records.
Data Warehouses: Data warehouses primarily store
structured data, and many modern warehouses can also
hold semi-structured data such as JSON. They often use a
dimensional model with facts and dimensions to organize
data for analysis. They support complex queries and
reporting requirements.
Schema:

Databases: Databases use a normalized schema to


minimize redundancy and maintain data integrity. They
are optimized for OLTP (Online Transaction Processing)
operations, where data modification and transactional
consistency are critical.
Data Warehouses: Data warehouses typically use a
denormalized or star schema to optimize query
performance for analytical queries. The schema is
designed to facilitate data aggregation, filtering, and
analysis across multiple dimensions.
Querying and Analysis:

Databases: Databases are optimized for fast read and
write operations to support transactional applications.
They prioritize concurrency and the ACID properties
(atomicity, consistency, isolation, and durability).
Data Warehouses: Data warehouses are optimized for
complex analytical queries and reporting. They support
OLAP operations, including slicing, dicing, drilling
down, and rolling up data to analyze trends, patterns, and
relationships.
Usage:

Databases: Databases are used for day-to-day operations,


such as managing customer information, processing
orders, and handling inventory. They are transactional
systems that support online transaction processing
(OLTP).
Data Warehouses: Data warehouses are used for strategic
decision-making, business intelligence, and data analysis.
They provide a consolidated view of data from multiple
sources and support online analytical processing (OLAP)
for reporting, querying, and data mining.
• Data warehouse Schema

In a data warehouse, the schema refers to the logical


structure that organizes and represents data for efficient
querying, analysis, and reporting. There are mainly two
types of schemas used in data warehousing: star schema
and snowflake schema.

Star Schema:
In a star schema, data is organized into a central fact
table surrounded by dimension tables.
The fact table contains numerical measures or metrics,
often related to business transactions or events.
Dimension tables contain descriptive attributes or
dimensions that provide context to the measures in the
fact table.
Each dimension table is connected to the fact table
through foreign key relationships.
Star schemas are denormalized, meaning that dimension
tables are not fully normalized: each dimension is kept
in a single wide table, and redundant data is
intentionally allowed for performance optimization.
Star schemas are well-suited for query performance and
simplicity, making them popular in data warehousing
environments.

Snowflake Schema:
A snowflake schema is an extension of the star schema,
where dimension tables are normalized into multiple
related tables.
Unlike the star schema, where dimension tables are
denormalized, in a snowflake schema, dimension tables
may have multiple levels of normalization, resembling a
snowflake's shape.
Normalization reduces data redundancy and can improve
data integrity, but it can also lead to more complex
queries and potentially slower performance compared to
star schemas.
Snowflake schemas are useful when there are strict
requirements for data integrity and when storage space
needs to be optimized.
• What is Normalization in SQL Databases? and why is it important?

Normalization in SQL databases refers to the process of


organizing and structuring data in a relational database to
minimize redundancy and dependency, thus improving
data integrity and efficiency. It involves breaking down
larger tables into smaller, related tables and establishing
relationships between them.

Normalization is important for several reasons:

Minimizing Redundancy: By eliminating duplicate data


and storing it in separate tables, normalization reduces
storage space requirements and ensures that each piece of
information is stored only once. This helps to prevent
data anomalies and inconsistencies.
Avoiding Update Anomalies: When data is duplicated
across multiple records, inconsistencies can arise if one
instance of the data is updated and others are not.
Normalization helps to mitigate update anomalies by
storing data in a centralized location and updating it in
one place.

Improving Data Integrity: By organizing data logically


and reducing redundancy, normalization improves data
integrity by reducing the risk of data inconsistencies or
errors. This ensures that the data accurately reflects the
real-world entities and relationships it represents.

Facilitating Query Optimization: Normalized databases


typically have well-defined relationships between tables,
making it easier to write efficient queries that retrieve
specific data. This can lead to faster query execution
times and improved overall performance.

Normalization is achieved through a series of


normalization rules or normal forms, such as First
Normal Form (1NF), Second Normal Form (2NF), Third
Normal Form (3NF), and so on. Each normal form
represents a level of data organization and dependency
reduction, with higher normal forms indicating more
rigorous normalization.
• Are you familiar with window functions? If so, can you
explain their use cases?
Window functions are a powerful feature in SQL that
allow you to perform calculations across a set of rows
related to the current row, without the need for grouping
the rows into a single output row. They operate on a
"window" of rows defined by a specific partition or
ordering criteria.

Here are some common use cases for window functions:

Calculating Running Totals or Averages: Window


functions can be used to calculate cumulative sums or
averages over a specific range of rows. This is useful for
financial analysis, trend analysis, or calculating moving
averages.

Ranking and Percentiles: Window functions can assign


ranks, dense ranks, or percentiles to rows within a
partition based on specified criteria. This is helpful for
identifying top performers, outliers, or segmenting data
into quantiles.

Comparing Values with Previous or Next Rows: Window


functions can compare the current row's value with
values from preceding or following rows within the same
partition. This is useful for detecting trends, identifying
changes in data, or calculating differences over time.

Aggregating Data without Grouping: Unlike traditional


aggregate functions like SUM() or AVG(), window
functions allow you to aggregate data across multiple
rows without collapsing them into a single row. This
enables you to retain the granularity of the data while
performing calculations.

Top-N Queries: Window functions can be used to


retrieve the top or bottom N records within each partition
based on specified criteria. This is helpful for finding the
highest or lowest values within groups without having to
use subqueries or complex joins.
• In terms of window functions, can you differentiate
between "dense rank" and "rank"?
Both give tied values the same rank. "rank" then leaves
gaps after the tie, while "dense rank" assigns
consecutive ranks without any gaps. For example, for the
values 100, 90, 90, 80 in descending order, RANK()
returns 1, 2, 2, 4 and DENSE_RANK() returns 1, 2, 2, 3.
The choice between the two depends on
the desired behavior for handling ties in the ranking.
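The three ranking functions can be compared side by side with Python's sqlite3 module, assuming SQLite 3.25 or newer. The scores table and its four rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical scores with one tie at 90 (illustration only).
conn.executescript("""
    CREATE TABLE scores (name TEXT, score INTEGER);
    INSERT INTO scores VALUES ('A', 100), ('B', 90), ('C', 90), ('D', 80);
""")

rows = conn.execute("""
    SELECT name, score,
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num,
           RANK()       OVER (ORDER BY score DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk
    FROM scores
    ORDER BY score DESC, name
""").fetchall()

for row in rows:
    print(row)
```

ROW_NUMBER() breaks the tie (here by name), RANK() skips rank 3 after the tie, and DENSE_RANK() continues with 3.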
• Describe the process of data wrangling or data munging.
• How do you ensure data security and privacy when working on a project?

• Print the 2nd highest salary when the 2nd and 3rd salary is
same.
WITH RankedSalaries AS (
SELECT
Salary,
DENSE_RANK() OVER (ORDER BY Salary DESC)
AS SalaryRank
FROM
YourTableName
)
SELECT DISTINCT Salary
FROM RankedSalaries
WHERE SalaryRank = 2
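• What are different types of relations in Database? Brief with example for each relation
The relationships in a database are one-to-one,
one-to-many, many-to-one, and many-to-many.
One-to-one: each person has one passport, and each
passport belongs to one person.
One-to-many: one department has many employees.
Many-to-one: many employees belong to one department
(the same relationship seen from the other side).
Many-to-many: a student takes many courses and a course
has many students, usually modelled with a junction
table.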
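• Row_number
ROW_NUMBER() is a window function that assigns a
sequential number (1, 2, 3, ...) to each row within its
partition, following the order given in the OVER clause.
It works like a running index for the rows.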
• LAG
The LAG() function is a window function that allows
you to access the value of a column from a previous
row within the same result set. It is commonly used
to compare the current row with the preceding row
in a specified order.
LAG(column_name [, offset [, default]]) OVER
([PARTITION BY partition_expression] ORDER BY
order_expression)
• column_name: The name of the column whose value
you want to access from the previous row.
• offset: An optional integer indicating the number of
rows before the current row from which to retrieve
the value. By default, the offset is 1 (i.e., the value
from the previous row).
• default: An optional value to return if the offset row
does not exist. If omitted, NULL is returned.
• PARTITION BY partition_expression: Optional
clause that divides the result set into partitions to
which the LAG() function is applied independently.
• ORDER BY order_expression: Specifies the order
of the rows within each partition.

Consider a table named Sales with columns Month
and Revenue. We want to calculate the difference in
revenue between the current month and the previous
month.

SELECT
Month,
Revenue,
LAG(Revenue) OVER (ORDER BY Month) AS
PreviousRevenue,
Revenue - LAG(Revenue) OVER (ORDER BY
Month) AS RevenueDifference
FROM
Sales;
This query retrieves the revenue for each month,
along with the revenue from the previous month and
the difference in revenue between the current month
and the previous month. The LAG() function is used
to fetch the revenue from the previous row, ordered
by the Month column.

• LEAD
The LEAD() function is a window function that allows
you to access the value of a column from a subsequent
row within the same result set. It is similar to the LAG()
function but retrieves values from rows that come after
the current row.

The syntax of the LEAD() function is as follows:

LEAD(column_name [, offset [, default]]) OVER
([PARTITION BY partition_expression] ORDER BY
order_expression)
column_name: The name of the column whose value you
want to access from the subsequent row.
offset: An optional integer indicating the number of rows
after the current row from which to retrieve the value. By
default, the offset is 1 (i.e., the value from the next row).
default: An optional value to return if the offset row does
not exist. If omitted, NULL is returned.
PARTITION BY partition_expression: Optional clause
that divides the result set into partitions to which the
LEAD() function is applied independently.
ORDER BY order_expression: Specifies the order of the
rows within each partition.
Here's an example of using the LEAD() function:
Consider a table named Sales with columns Month and
Revenue. We want to calculate the difference in revenue
between the current month and the subsequent month.

SELECT
Month,
Revenue,
LEAD(Revenue) OVER (ORDER BY Month) AS
NextMonthRevenue,
LEAD(Revenue) OVER (ORDER BY Month) -
Revenue AS RevenueDifference
FROM
Sales;
This query retrieves the revenue for each month, along
with the revenue from the subsequent month and the
difference in revenue between the subsequent month and
the current month. The LEAD() function is used to fetch
the revenue from the next row, ordered by the Month
column.
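LAG() and LEAD() can be tried together on a tiny Sales table with Python's sqlite3 module, assuming SQLite 3.25 or newer. The three monthly rows are hypothetical, and the third LAG() call shows the optional default argument with a constant of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical monthly revenue (illustration only).
conn.executescript("""
    CREATE TABLE Sales (Month TEXT, Revenue REAL);
    INSERT INTO Sales VALUES
        ('2024-01', 100), ('2024-02', 150), ('2024-03', 120);
""")

rows = conn.execute("""
    SELECT Month, Revenue,
           LAG(Revenue)       OVER (ORDER BY Month) AS PreviousRevenue,
           LEAD(Revenue)      OVER (ORDER BY Month) AS NextRevenue,
           LAG(Revenue, 1, 0) OVER (ORDER BY Month) AS PreviousOrZero
    FROM Sales
    ORDER BY Month
""").fetchall()

for row in rows:
    print(row)
```

The first month has no previous row and the last month has no next row, so those positions come back as None (SQL NULL) unless a default is supplied.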
• CUMULATIVE SUM, AVERAGE

Cumulative sum:
SELECT
Month,
Revenue,
SUM(Revenue) OVER (ORDER BY Month) AS
cumulative_revenue
FROM
Sales;
Cumulative average:
SELECT
Month,
Revenue,
AVG(Revenue) OVER (ORDER BY Month) AS
cumulative_avg_revenue
FROM
Sales;
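Both running calculations can be checked in one query with Python's sqlite3 module, assuming SQLite 3.25 or newer. The three Sales rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical monthly revenue (illustration only).
conn.executescript("""
    CREATE TABLE Sales (Month TEXT, Revenue REAL);
    INSERT INTO Sales VALUES
        ('2024-01', 100), ('2024-02', 200), ('2024-03', 300);
""")

# With ORDER BY and no explicit frame, the window runs from the first
# row up to the current row, which yields running totals and averages.
rows = conn.execute("""
    SELECT Month, Revenue,
           SUM(Revenue) OVER (ORDER BY Month) AS cumulative_revenue,
           AVG(Revenue) OVER (ORDER BY Month) AS cumulative_avg_revenue
    FROM Sales
    ORDER BY Month
""").fetchall()

for row in rows:
    print(row)
```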
• Difference Between RDBMS and DBMS: "How do you differentiate between a
Relational Database Management System (RDBMS) and a Database Management
System (DBMS)? Can you provide examples of each?"
A Database Management System (DBMS) is a software
system that provides an interface for managing
databases. It facilitates the creation, maintenance, and
manipulation of databases. DBMS is the general
category and covers both relational and non-relational
systems. Examples of non-relational DBMSs include
MongoDB (a document database) and Redis (a key-value
store).

A Relational Database Management System (RDBMS) is
a type of DBMS that organizes data into tables with rows
and columns and enforces relationships among the tables
using keys. It supports the relational model of data
storage and retrieval. Examples of RDBMS include
MySQL, PostgreSQL, Oracle Database, SQL Server,
SQLite, and Microsoft Access.
• Difference Between View and Stored Procedure: "How would you differentiate a
view from a stored procedure in a database system? In what scenarios would you
choose one over the other?"

In a database system, views and stored procedures serve


distinct purposes and are utilized in different scenarios:

View:
A view acts as a virtual table derived from one or more
underlying tables.
It presents a structured subset of data and does not store
any data itself.
Views are primarily used to simplify data access, hide
sensitive information, and present data in a predefined
format.
They offer a way to create a reusable and simplified
representation of complex data relationships.

Stored Procedure:
A stored procedure is a precompiled set of SQL
statements stored in the database.
It can accept input parameters, perform operations on
data, and return results.
Stored procedures are often used to encapsulate business
logic, implement data manipulation operations, and
automate repetitive tasks.
They enable developers to execute complex logic on the
database server, reducing network traffic and improving
performance.
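The contrast can be sketched with sqlite3. SQLite supports views but has no stored procedures, so in this sketch a Python function (give_raise, a hypothetical name) stands in for the procedure; the table, columns, and rows are also assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "emp_id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees (name, department, salary) VALUES (?, ?, ?)",
    [("Asha", "Sales", 50000), ("Ravi", "IT", 70000)],
)

# View: a stored query that hides the salary column and holds no data itself.
conn.execute(
    "CREATE VIEW employee_directory AS "
    "SELECT emp_id, name, department FROM employees"
)


def give_raise(connection, department, percent):
    """Stand-in for a stored procedure: parameterized logic that modifies data."""
    with connection:  # commits on success, rolls back on error
        cursor = connection.execute(
            "UPDATE employees SET salary = salary + salary * ? / 100 "
            "WHERE department = ?",
            (percent, department),
        )
    return cursor.rowcount


directory = conn.execute(
    "SELECT * FROM employee_directory ORDER BY emp_id"
).fetchall()
rows_updated = give_raise(conn, "Sales", 10)
sales_salary = conn.execute(
    "SELECT salary FROM employees WHERE name = 'Asha'"
).fetchone()[0]
print(directory, rows_updated, sales_salary)
```

The view is read like a table and exposes only the chosen columns, while the procedure-style function takes parameters and changes data. In an RDBMS such as MySQL or SQL Server, the second part would be written with CREATE PROCEDURE instead.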
• Uses of NoSQL: "What are the primary uses of NoSQL databases, and in what
situations would you recommend a NoSQL database over a traditional SQL
database?"

NoSQL databases are recommended when scalability,
flexibility, performance, real-time analytics, agile
development, or IoT data management are critical
requirements for an application. However, it's essential to
evaluate the specific needs and characteristics of each
project to determine whether a NoSQL or SQL database
is the best fit.
• Indexes in Databases: "What is the role of indexes in a database, and how do they
impact performance? Can you discuss a situation where you had to optimize a query
using indexes? what are its pros and cons?"

Indexes play a crucial role in relational databases by
providing a way to quickly locate and access data within
tables. They improve the performance of queries by
allowing the database management system (DBMS) to
locate rows more efficiently, reducing the number of disk
I/O operations required to fulfill a query. The main
trade-offs are summarized below.
Pros and Cons of Indexes:
• Pros:

1. Improved Query Performance: Indexes can
significantly enhance query performance by
reducing the time required to locate and access data.
2. Data Integrity: Unique indexes enforce data
integrity by ensuring the uniqueness of values in
indexed columns.
3. Support for Constraints: Indexes support various
constraints, such as primary key, foreign key, and
unique constraints, which help maintain data
consistency.
• Cons:

1. Increased Storage Overhead: Indexes consume
additional storage space, as they store copies of
indexed column values along with pointers to data
rows.
2. Overhead on Data Modification: Indexes need to
be updated whenever data in the indexed columns is
modified (e.g., INSERT, UPDATE, DELETE
operations). This can result in increased overhead
during data manipulation operations.
3. Maintenance Overhead: Managing and
maintaining indexes requires additional resources
and may introduce complexity, particularly in
environments with frequent data modifications.
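The effect of an index on a lookup can be observed with SQLite's EXPLAIN QUERY PLAN. This is a minimal sketch with a made-up employees table and a hypothetical index name (idx_emp_department); the exact plan wording differs between SQLite versions, so the code only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "emp_id INTEGER PRIMARY KEY, department TEXT, salary INTEGER)"
)
# Hypothetical data: 5000 employees spread over 50 departments.
conn.executemany(
    "INSERT INTO employees (department, salary) VALUES (?, ?)",
    [(f"dept_{i % 50}", 30000 + i) for i in range(5000)],
)

query = "SELECT * FROM employees WHERE department = 'dept_7'"


def plan(sql):
    """Return the human-readable 'detail' column of the query plan."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(query)  # full table scan: every row is inspected
conn.execute("CREATE INDEX idx_emp_department ON employees (department)")
plan_after = plan(query)   # index search: only matching rows are visited

print("before:", plan_before)
print("after: ", plan_after)
```

The same index also illustrates the cost side: every later INSERT, UPDATE, or DELETE on employees now has to maintain idx_emp_department as well.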

• Given a table with columns for salary and department, how would you write an SQL
query to count the number of employees per department?
SELECT Department, COUNT(empID) AS EmployeeCount
FROM employees
GROUP BY Department;
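A quick way to sanity-check the GROUP BY answer is to run it against a few rows in sqlite3. The sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (empID INTEGER, Department TEXT, Salary INTEGER)")
# Hypothetical rows: 2 in Sales, 1 in HR, 3 in IT.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "Sales", 40000), (2, "Sales", 45000),
        (3, "HR", 38000),
        (4, "IT", 60000), (5, "IT", 65000), (6, "IT", 70000),
    ],
)

counts = conn.execute(
    "SELECT Department, COUNT(empID) AS EmployeeCount "
    "FROM employees GROUP BY Department ORDER BY Department"
).fetchall()
print(counts)
```

COUNT(empID) skips rows where empID is NULL; COUNT(*) would count every row in the group.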
• Write a query to find the second highest salary from each department.

WITH RankedSalaries AS (
SELECT
Department,
Salary,
DENSE_RANK() OVER (PARTITION BY Department
ORDER BY Salary DESC) AS SalaryRank
FROM
YourTableName
)
SELECT DISTINCT Department, Salary
FROM RankedSalaries
WHERE SalaryRank = 2;
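Because the question asks for the second highest salary in each department, the ranking has to restart per department, which is what PARTITION BY Department does. A minimal sqlite3 sketch (SQLite 3.25 or newer assumed, sample rows made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (Department TEXT, Salary INTEGER)")
# Hypothetical rows: Sales has a tie at the top, HR has only one salary.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        ("Sales", 90), ("Sales", 90), ("Sales", 80), ("Sales", 70),
        ("IT", 120), ("IT", 100),
        ("HR", 50),
    ],
)

second_highest = conn.execute(
    """
    WITH RankedSalaries AS (
        SELECT Department,
               Salary,
               DENSE_RANK() OVER (
                   PARTITION BY Department ORDER BY Salary DESC
               ) AS SalaryRank
        FROM employees
    )
    SELECT DISTINCT Department, Salary
    FROM RankedSalaries
    WHERE SalaryRank = 2
    ORDER BY Department
    """
).fetchall()
print(second_highest)
```

DENSE_RANK() gives tied salaries the same rank without leaving gaps, so the two Sales rows at 90 share rank 1 and 80 is rank 2. A department with a single distinct salary (HR here) returns no row.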
• How would you identify the names of employees whose
salaries are greater than the average salary, and can you
demonstrate this using a sub-query in SQL?
SELECT employee_name
FROM employees
WHERE salary > (
SELECT AVG(salary)
FROM employees
);
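The sub-query version can be run as-is in sqlite3. The three sample employees are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_name TEXT, salary INTEGER)")
# Hypothetical rows: the average salary is 60.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Asha", 40), ("Ravi", 60), ("Meera", 80)],
)

above_average = conn.execute(
    """
    SELECT employee_name
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    """
).fetchall()
print(above_average)
```

The inner SELECT is evaluated once and yields a single value, so the outer WHERE compares each salary against that one number. An employee earning exactly the average is excluded because the comparison is strict.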
• What is a transaction in SQL? How are ACID properties maintained?

In SQL, a transaction is a unit of work performed within
a database management system (DBMS). It represents a
sequence of one or more database operations (such as
inserts, updates, or deletes) that should be executed
together as a single logical unit. The concept of a
transaction ensures that either all the operations within
the transaction are completed successfully and
permanently applied to the database, or none of them are
applied (in case of failure), maintaining data consistency
and integrity.

ACID is an acronym that stands for Atomicity,
Consistency, Isolation, and Durability. These properties
ensure the reliability and integrity of transactions in a
database system:

Atomicity: Atomicity guarantees that a transaction is
treated as a single unit of work, meaning that either all
the operations within the transaction are successfully
completed and applied to the database, or none of them
are applied. If any part of the transaction fails, the entire
transaction is rolled back to its original state.

Consistency: Consistency ensures that the database
remains in a consistent state before and after the
transaction. In other words, transactions must preserve
the integrity constraints and rules defined for the
database, maintaining data validity and correctness.

Isolation: Isolation ensures that the operations within one
transaction are isolated from the operations in other
concurrent transactions. Each transaction operates
independently and appears to execute in isolation from
other transactions, even if they are executed
concurrently. Isolation prevents interference or
inconsistency caused by concurrent transactions.

Durability: Durability guarantees that the changes made
by a committed transaction are permanent and will not be
lost, even in the event of a system failure or crash. Once
a transaction is committed, its effects are permanently
stored in the database and will survive system failures.

ACID properties are maintained by the database
management system (DBMS) through various
mechanisms such as transaction logs, locking
mechanisms, and recovery protocols. These mechanisms
ensure that transactions are executed reliably and
consistently, even in the presence of failures or
concurrent access by multiple users.
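Atomicity and consistency are the easiest of the four properties to demonstrate in a few lines. The sketch below uses sqlite3 with a made-up accounts table; the transfer function and the CHECK constraint are assumptions for illustration, not part of any particular system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the consistency rule: balances may never go negative.
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(connection, source, target, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with connection:  # commits on success, rolls back if an exception is raised
            connection.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(connection):
    return dict(connection.execute("SELECT name, balance FROM accounts"))


ok_transfer = transfer(conn, "alice", "bob", 30)       # both updates succeed
after_ok = balances(conn)
failed_transfer = transfer(conn, "alice", "bob", 500)  # debit violates the CHECK
after_failed = balances(conn)
print(ok_transfer, after_ok, failed_transfer, after_failed)
```

In the failed transfer the credit to bob is applied first and then undone by the rollback, so neither half of the transfer is visible afterwards.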
• What is a candidate key?

A candidate key is a column, or a minimal combination of
columns, that uniquely identifies each row in a table. A
table can have several candidate keys; one of them is
chosen as the primary key. Candidate keys ensure that
there are no duplicate or ambiguous records in a table.
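• Two tables are given: how would you print the rows common to both without using
INTERSECT or INNER JOIN? Answer: use a sub-query (CTE) with UNION ALL and GROUP BY.

WITH cte AS (
SELECT DISTINCT column1, column2 FROM Table1
UNION ALL
SELECT DISTINCT column1, column2 FROM Table2
)
SELECT column1, column2 FROM cte
GROUP BY column1, column2
HAVING COUNT(*) > 1;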
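The UNION ALL plus GROUP BY approach can be tested in sqlite3. One caveat: a row that is duplicated inside a single table also reaches COUNT(*) > 1, so this sketch deduplicates each table with SELECT DISTINCT before combining them. The table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (column1 INTEGER, column2 TEXT)")
conn.execute("CREATE TABLE Table2 (column1 INTEGER, column2 TEXT)")
# Hypothetical rows: (1, 'a') appears twice in Table1 but never in Table2.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)", [(1, "a"), (1, "a"), (2, "b"), (3, "c")]
)
conn.executemany(
    "INSERT INTO Table2 VALUES (?, ?)", [(2, "b"), (3, "c"), (4, "d")]
)

template = """
    WITH cte AS (
        SELECT {distinct} column1, column2 FROM Table1
        UNION ALL
        SELECT {distinct} column1, column2 FROM Table2
    )
    SELECT column1, column2 FROM cte
    GROUP BY column1, column2
    HAVING COUNT(*) > 1
    ORDER BY column1
"""

common_rows = conn.execute(template.format(distinct="DISTINCT")).fetchall()
naive_rows = conn.execute(template.format(distinct="")).fetchall()
print("with DISTINCT:   ", common_rows)
print("without DISTINCT:", naive_rows)
```

Without DISTINCT, the duplicated (1, 'a') row is wrongly reported as common even though it exists only in Table1.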
• https://masai-school.notion.site/SQL-Test3916b80f6d924396bf7dcbb63abf2ba3?pvs=25

POWER BI:
• You are given a dataset with sales data. How would you forecast sales for the next
month?
• A streaming service wants to build a recommendation system. How would you approach
this?

• What are the different types of data visualization
techniques?
• How proficient are you in Power BI? What is the
difference between Excel and Power BI?


• Define different types of filters available in Power BI.

• How do you create a pivot table?


• How do you create a bar chart?
• How do you create a line chart?
• How do you create a pie chart?

• How will you extract the data from various sources?
• How would you transform the data into the required form?
• Can you explain the difference between a pie chart and a
doughnut chart?
• Can you unpivot the columns in this table?
• Create a DAX function using the Logical question given.
• Create a DAX Function to find out subtotal of a column.
• Types of charts used, and when each chart should be used.
• Data Cleaning, Data handling.

• How will features/tables change in a many-to-many relationship?
• What steps do you take to ensure accuracy in your
financial analyses?
• DAX Functions Power BI

• Difference between SUM and SUMX in Power BI

• How to deal with discrepancies in the data?

Power BI questions (live):
• Questions:
o 1. Organise the data in a manner that allows us to slice and dice the data to look at
sales for each market segment, solution, partner, location etc.
o 2. Please prepare a partner performance sheet showing how each partner has made
sales for each solution. This should also give an easy view of location-wise and market
segment-wise sales. Please also undertake a RAG (red, yellow, green) analysis
applying the criteria of green >= 1 crore of sales, yellow >= 50 lac of sales and red < 50
lac of sales.
o 3. Please put together graphics to depict the partner performance. The visuals should
show the partner headcount for each location who fall under the red, yellow and green
categories.
o Dataset link:
https://docs.google.com/spreadsheets/d/1W8Jb8kX0YBiFXwZ2sJ_cPhM2hh46f6zn/edit#gid=506364694
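The RAG thresholds from question 2 can be written down as a small classification rule before building it in Power BI. The sketch below is in Python rather than DAX; the partner names and sales figures are made up, and sales are assumed to be totals in rupees (1 crore = 10,000,000 and 50 lac = 5,000,000).

```python
CRORE = 10_000_000  # 1 crore = 100 lac
LAC = 100_000


def rag_status(sales_in_rupees):
    """Classify a partner's total sales using the thresholds from the question."""
    if sales_in_rupees >= 1 * CRORE:
        return "Green"
    if sales_in_rupees >= 50 * LAC:
        return "Yellow"
    return "Red"


# Hypothetical partner totals, chosen to sit on and around the boundaries.
partner_sales = {
    "Partner A": 12_500_000,
    "Partner B": 5_000_000,
    "Partner C": 4_999_999,
}
partner_status = {name: rag_status(total) for name, total in partner_sales.items()}
print(partner_status)
```

The order of the checks matters: testing the highest threshold first means each partner falls into exactly one band.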

ETL:
• What is ETL?
• What are the main advantages of using cloud computing in data processing?
• Describe the differences between a data warehouse and a data lake.
• What are the steps involved in the ETL process?
• What are the different types of ETL tools?
• What are the benefits of using an ETL tool?
• What are the challenges of using an ETL tool?
• How do you choose the right ETL tool for your needs?
• How do you design an ETL process?
• How do you implement an ETL process?
• How do you monitor an ETL process?
• How do you troubleshoot an ETL process?
• How do you maintain an ETL process?
• What are the best practices for ETL?
• What are the security considerations for ETL?
• What are the compliance considerations for ETL?
• What are the ethical considerations for ETL?
• How does ETL relate to data warehousing?
• How does ETL relate to data mining?
• How does ETL relate to data visualization?
• What are the future trends in ETL?
• What are the challenges of ETL in the cloud?

• What is the role of an ETL (Extract, Transform, Load)
process in data analytics?
• Provide concise definitions for both ELT (Extract, Load,
Transform) and ETL (Extract, Transform, Load)
processes. What are the primary differences between
these two data integration approaches?
• Can you explain the process you follow to ensure data is correctly deployed to a data
warehouse?
• Do you have any idea of snowflakes - BASIC
• Do you know about the ETL process - BASIC

PROGRAMMING CODE:
• Write a function to compute the factorial of a number.
• Write a function to create a queue and give a size of the
queue
• Write a function to print all numbers less than a given
number.
• Question on time complexity of student’s code

• Prime number printing from 1 to 10

• Reverse string without using step size

• In Python, how would you write a program where the sum of the elements on the left
side of an array equals the sum on the right side?
• How would you use a dictionary in Python to count the frequency of each integer in an
array?
• Using an online Python compiler, can you demonstrate how to split a string into two,
reverse each part, and then merge them?
• In Python, how would you identify the elements with the minimum number of
repetitions in a list? Can you do this using binary sorting or a stack?
• How would you write a Python program to check if brackets in an input string are
balanced?
• How would you find the first Equilibrium Point in an array, where the point is defined
as a position such that the sum of elements before it equals the sum of elements after
it? Please write a Python function to return the index of this point.
• Can you write a Python program that checks if the brackets in a given input string are
balanced?
• Define a Python function to determine whether a given number is prime.
• Using Pandas, how would you write code to find the total number of educated people in
each country?
• Using Pandas again, can you write a query to calculate the average monthly income for
each country, segmented by education level?
• Can you explain how to find the first Equilibrium Point in an array of n positive
numbers?
• How would you convert a string in Roman numeral format to an integer?
• What approach would you use to sort an array containing only 0s, 1s, and 2s in
ascending order?
• How would you create a Python database to find the names of customers from India
who are older than 30?
• How can one create a Dashboard in Python?
• (Take two tables with id and name for Students and Teachers) Prime number detection,
Binary search → Python code
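For the Equilibrium Point question in the list above, one common approach is a single pass that tracks the running left sum and derives the right sum from the total. This is a sketch rather than a reference answer; it assumes 0-based indexing and returns -1 when no such point exists.

```python
def first_equilibrium_index(values):
    """Return the first index whose left sum equals its right sum, else -1."""
    total = sum(values)
    left_sum = 0
    for index, value in enumerate(values):
        # Everything that is neither on the left nor the current element.
        right_sum = total - left_sum - value
        if left_sum == right_sum:
            return index
        left_sum += value
    return -1


print(first_equilibrium_index([1, 3, 5, 2, 2]))  # left of index 2 is 1+3, right is 2+2
print(first_equilibrium_index([1, 2, 3]))        # no equilibrium point
```

The total is computed once, so the function runs in O(n) time with O(1) extra space, compared with O(n^2) for re-summing both sides at every position.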

• Bracket Combinations - https://coderbyte.com/information/Bracket Combinations
• Bracket Matcher - https://coderbyte.com/information/Bracket Matcher
• Codeland Username Validation - https://coderbyte.com/information/Bracket Matcher
• Find Intersection - https://coderbyte.com/information/Find Intersection
• Question Marks - https://coderbyte.com/information/Find Intersection
• Find Reverse - https://coderbyte.com/information/Find Intersection
• First Factorial - https://coderbyte.com/information/Find Intersection
• Longest Word - https://coderbyte.com/information/Longest Word
• How to bring JSON file to structure and convert to CSV?

import json
import pandas as pd

# Step 1: Read the JSON file
with open('data.json') as f:
    data = json.load(f)

# Step 2: Normalize JSON data (if needed)
# Example: if 'data' is a list of dictionaries, you can
# convert it directly to a DataFrame.
# If 'data' is a nested JSON structure, you may need
# to normalize it using pandas.json_normalize().

# Step 3: Convert to structured format (DataFrame)
df = pd.DataFrame(data)

# Step 4: Export to CSV
df.to_csv('data.csv', index=False)
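When the JSON is nested, the normalization step is where most of the work happens. The sketch below shows the idea using only the standard library; flatten and json_records_to_csv are hypothetical helper names, the sample records are made up, and the dotted column names mimic the default separator of pandas.json_normalize().

```python
import csv
import io
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into one level, e.g. {'user': {'name': ..}} -> 'user.name'."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def json_records_to_csv(json_text):
    """Convert a JSON array of (possibly nested) objects into CSV text."""
    records = [flatten(record) for record in json.loads(json_text)]
    # Union of all keys, kept in first-seen order, becomes the header row.
    fieldnames = list(dict.fromkeys(key for record in records for key in record))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # missing keys are written as empty cells
    return buffer.getvalue()


sample = (
    '[{"id": 1, "user": {"name": "Asha", "city": "Pune"}},'
    ' {"id": 2, "user": {"name": "Ravi"}}]'
)
csv_text = json_records_to_csv(sample)
print(csv_text)
```

Lists inside records are left as-is in this sketch; they would need their own rule, such as one output row per list element.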
• https://masai-school.notion.site/Python-Test-24088e56c9f3442a8eae2422f0123bca?pvs=25
• How would you check if a given input is a palindrome?

PROBLEM SOLVING:
• How would you estimate the number of cars in Delhi?
• Describe a time when you had to analyze data to make a decision. What was the
outcome?
• How do you prioritize when given multiple tasks related to shipment on-time
performance and SLA compliance?

• How would you approach a case where you have to determine the toll cost for a
particular route?
• Describe a situation where you had to solve a problem using quantitative reasoning.
• How would you determine the ratio of diabetes patients living in Chandigarh?
• Guesstimate Challenge: Estimate the opening week revenue generated by the movie
"JAWAN" in Delhi. Consider the data for Friday, Saturday, and Sunday. Please walk
us through your thought process.
• Guesstimate Challenge 2: Assume "JAWAN" opened on a holiday Friday in
Bangalore. Can you guesstimate its opening weekend revenue? Describe your
approach.
• Number of iPhone users in India
• 25 horses - find top 3 horses by organizing races - each race can only accommodate 5
horses - find min number of races required to find the top 3 horses.
• What additional features can you add to the Zomato app to improve customer
experience
• If someone is ordering from Zomato - at which point can he just dropout from the app
and not order anything?
• The Zomato app is opened 10 thousand times and every 4th opening of the app is
converting to an order. Average login hours of the delivery partners is 30 hours per
week - and they are delivering 2 orders every hour. How many delivery partners are
required to complete the 2500 orders that are converting from the 10,000 app
openings.
• What is the minimum number of cuts needed to divide a cake into 8 equal slices?
• What approach do you take when dealing with missing or inconsistent data in financial
reports?
• What was a significant error you identified in a financial analysis, and how did you
handle it?
• What methods do you use for forecasting financial trends?
• What criteria do you use to prioritize tasks when working on multiple financial
analyses?
• What is an example of a business problem you solved using data?
• Explain the Nifty Fifty Stock Price Prediction
o Cross Questioning
o How did you perform data cleaning?
o What evaluation metrics did you use to test the accuracy of the models?
o RMSE
o Could you have used an alternative of Linear Regression or Polynomial
Regression?
• Question from mandeline
Scenario 1: Keep a Month's Stock (28 days) in FBA
1. Calculate the total volume for each SKU.
2. Convert the total volume to total pallets.
3. Round up to the nearest whole pallet.
4. Calculate the storage cost for each SKU (rounded-up pallets * Cost Per Pallet).
5. Sum up the storage costs for all SKUs.
6. Since the stock is held for 28 days or more, no understocked fee is applied.
Scenario 2: Keep 20 Days' Stock in FBA
1. Calculate the total volume for each SKU.
2. Convert the total volume to total pallets.
3. Round up to the nearest whole pallet.
4. Calculate the storage cost for each SKU (rounded-up pallets * Cost Per Pallet).
5. If the stock is held for less than 28 days, apply the understocked fee for the
understocked volume (understocked volume * Fee per CBM).
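The two scenarios share the same storage-cost steps and differ only in the understocked fee. The sketch below encodes those steps; every number in it is made up, the per-SKU volumes are assumed to be already totalled in CBM (step 1), and the understocked volume is taken as an input because the notes do not say how it is derived.

```python
import math


def storage_cost(sku_volumes_cbm, cbm_per_pallet, cost_per_pallet):
    """Steps 2-5: convert each SKU's volume to whole pallets and sum the cost."""
    total = 0.0
    for volume in sku_volumes_cbm.values():
        pallets = math.ceil(volume / cbm_per_pallet)  # round up to a whole pallet
        total += pallets * cost_per_pallet
    return total


def scenario_cost(sku_volumes_cbm, cbm_per_pallet, cost_per_pallet,
                  days_of_stock, understocked_volume_cbm=0.0, fee_per_cbm=0.0):
    """Storage cost plus the understocked fee when fewer than 28 days are held."""
    cost = storage_cost(sku_volumes_cbm, cbm_per_pallet, cost_per_pallet)
    if days_of_stock < 28:
        cost += understocked_volume_cbm * fee_per_cbm
    return cost


# Hypothetical inputs: two SKUs, 1 CBM per pallet, 10 per pallet, fee of 4 per CBM.
volumes = {"SKU-A": 2.4, "SKU-B": 0.9}
scenario_1 = scenario_cost(volumes, 1.0, 10.0, days_of_stock=28)
scenario_2 = scenario_cost(volumes, 1.0, 10.0, days_of_stock=20,
                           understocked_volume_cbm=1.5, fee_per_cbm=4.0)
print(scenario_1, scenario_2)
```

In practice the stocked volume would also be smaller in the 20-day scenario; the same volumes are reused here only to isolate the effect of the fee.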
• In a scenario where 100 individuals are surveyed, with 80 liking tea and 70 liking
coffee, what is the potential range of people who enjoy both beverages?
• If faced with a situation involving a gun with 6 barrels, two bullets, and a person in
front of it, how would you maximize the person's chances of survival?
• You are in a situation where you are dealing with a person who is non-cooperative,
how do you deal with such a situation and come out with a win-win scenario
• If you were in a team, 2 of your team members are posing 2 different ideas, which idea
would you be going with?
• In a team, you are dealing with a problematic person (you have a personal level
problem with him/her) how would you deal with such a situation to not
impact/hamper your work.

• Coca Cola is planning to launch a Grape flavored soft
drink, how do you validate if Coca Cola will be able to
make maximum profit with a data driven approach?
• There are 3 switches outside the room, 3 Bulbs inside the
room, identify which switch is connected to which bulb.
You can enter the room only once. You cannot come out
without answering.
• You and your friend are working on a project. Your friend is not available for
some reason, and you are expected to complete the project by the end of the day. How will
you deal with this situation, where you have to complete both your work and your
friend’s work? - Answer - Take help of teammates
• What are your weaknesses?
• Tell me a situation where you could overcome your weakness?

OTHER QUESTION RELATED TO ANALYST:


• What is the Central Limit Theorem and why is it important
in data science?
• How would you explain a p-value to a non-technical
stakeholder?
• What are the benefits of using Python for data science?

• What are the benefits of using SQL for data science?

• How do you communicate your findings to non-technical audiences?


• How do you work with stakeholders to ensure that your data science projects are
successful?
• How do you manage your time and resources effectively?
• How do you handle setbacks and failures?
• How do you stay up-to-date on the latest trends in data science?
• Have you ever been responsible for developing or revising data collection and analysis
policies? Can you provide an example?
• How do you ensure that the data collection policies you develop are adhered to by all
relevant stakeholders?
• Describe a situation where you introduced a new data procurement or processing
program. What challenges did you face and how did you overcome them?
• How do you usually collaborate with IT departments to ensure effective deployment of
software and hardware upgrades?
• How do you monitor and evaluate the effectiveness of analytics results?
• Share an experience where you found a discrepancy or anomaly in data. How did you
handle it?
• Describe a challenging situation you faced while managing a team. How did you resolve
it?

• Role Understanding: How do you think your skills in
dealing with clients and understanding their journey,
especially focusing on sales teams, will help you in this
role?
• What do you understand by the term "Equity"?

• Can you explain the differences between Bonds and Debentures?


• How would you define a Hedge Fund?
• What is the primary objective of Venture Capital?
• Describe the core features of a Mutual Fund.
• How do you differentiate between Securities and Joint Ventures?
• What is the role of SEBI in financial markets?
• How does a Stock Exchange function?
• How would you differentiate between Shares and Stocks?
• What do you understand by the term "Capital"?

• VBA

• What do you understand about SAP


• Questions on Data handling

• What are your strategies for staying updated with new
financial analysis tools and techniques?
• What is an example of a financial report you’ve prepared and presented?
• What practices do you follow to ensure clear communication in a remote work
environment?

• What techniques do you use to explain complex financial
data to non-experts?
• What steps would you take if you were given incomplete financial data for analysis?
• What is your approach to handling a tight deadline for a critical financial report?
• What was an instance where you had to quickly learn a new financial analysis tool?
• What is an example of a tough decision you made based on financial data, and what was
the outcome?
• What is a significant change you have adapted to in a project?
• What attracts you to working in structured financial transactions?
• What are your strategies for contributing effectively to the Emend Analytics team?
• What approaches would you use for remote collaboration with the analytics and
operations teams?

• Imagine we are launching a new product. Can you suggest
five marketing tactics we could use for this launch?
• Picked up content writing from Resume
o If you want to write the content wrt Snowflake for example, what will be your
approach?

• How do you use your statistical knowledge in Data
Operations?
• How can outliers be identified, and what is IQR?
• What is ANOVA?
• What is a z-score?
• How can you deal with null values?
• If you have a salary column, what approach would you use to fill the null values?
• When do we use mean, median, and mode?
• Why do we use a z-score?
• How can we deal with missing values?
• How do we find missing values (exact code)?
• What do you know about ANOVA?

• Explain 2 queries and tell about business aspects of them


• Connecting the live data that comes in to the data
warehouse
• If this is my average order value then what should I do to
increase the value


• Data Lake

• Data Pipeline

• Why did you choose this algorithm, how do you find
outliers, on what basis do you do feature selection, how
do you find correlation?
• Why did you use this linear regression

• What is GCP, Big Query, DBT Cloud?

• How is GCP different from AWS?


• What alternatives do you know for Big query?

• Explain the primary functions of Apache Kafka and how it
facilitates real-time data streaming. Can you provide a
concise definition of its role in distributed systems?
• Define the key functions of data warehouses. How do they support decision-making
processes, and what distinguishes them from other data storage solutions?
• Differentiate between a data warehouse and a data lake. Define the core characteristics
of each and explain when it is preferable to use one over the other.
• Define the role of cloud platforms in data engineering, specifically focusing on how
AWS, Azure, or Google Cloud Platform support and enhance data processing.
• Define the term "NoSQL databases" give examples of such databases. How are they
different from conventional databases?
• Define the terms - data security, access controls in the context of Data Engineering.

• What is the difference between a data analyst and a
business analyst?
• How can Pampers optimize its market share using data
analytics?
• Explain AI algorithms such as KNN and K-means
clustering, as well as the distinctions between C++ and
Python, and RDBMS concepts like MySQL queries?
• Data Interpretation
• Quantitative Aptitude

o Verbal Ability
o Reasoning Ability
• Communication - Same as Cognizant Communication Test
• Questions with Audio based info and answer basis the Audio
• Trait Based Assessment - Similar to IBM - Psychometric Test
• Supply chain

MEDPLUS COMPANY TEST


Question 1: What is the primary objective of data analysis in the context of a pharmaceutical
company like MedPlus?
a) To increase shareholder value
b) To enhance customer satisfaction
c) To optimize business operations
d) To minimize regulatory compliance
Correct Answer: c) To optimize business operations
Question 2: Which of the following statistical measures is used to measure the central
tendency of a dataset?
a) Standard deviation
b) Variance
c) Mean
d) Range
Correct Answer: c) Mean
Question 3: Which data visualization technique is best suited for comparing the distribution
of a categorical variable across different groups?
a) Pie chart
b) Box plot
c) Scatter plot
d) Histogram
Correct Answer: b) Box plot
Question 4: Which of the following statements about correlation is true?
a) Correlation implies causation
b) Correlation indicates a strong linear relationship between variables
c) Correlation ranges from -1 to 1
d) Correlation is not affected by outliers
Correct Answer: c) Correlation ranges from -1 to 1
Question 5: What is the main advantage of using SQL (Structured Query Language) in data
analysis?
a) It can handle unstructured data efficiently
b) It enables complex statistical analysis
c) It allows for easy manipulation and retrieval of data
d) It requires minimal computational resources
Correct Answer: c) It allows for easy manipulation and retrieval of data
Question 6: In a data analysis project, what is the purpose of data cleaning?
a) To remove outliers from the dataset
b) To transform raw data into a usable format
c) To visualize data using charts and graphs
d) To analyze trends and patterns
Correct Answer: b) To transform raw data into a usable format
Question 7: Which of the following machine learning algorithms is commonly used for
classification tasks?
a) K-means clustering
b) Linear regression
c) Decision trees
d) Principal component analysis
Correct Answer: c) Decision trees
Question 8: What is the significance of A/B testing in data analysis?
a) It helps in identifying outliers in the dataset
b) It allows for the comparison of two or more groups to determine the effectiveness of a
change
c) It is used to visualize the distribution of a continuous variable
d) It measures the association between two categorical variables
Correct Answer: b) It allows for the comparison of two or more groups to determine the
effectiveness of a change
Question 9: What does the term "data-driven decision-making" refer to?
a) Making decisions based on intuition and gut feeling
b) Making decisions based on empirical evidence and data analysis
c) Making decisions based on industry trends and benchmarks
d) Making decisions based on feedback from stakeholders
Correct Answer: b) Making decisions based on empirical evidence and data analysis
Question 10: Which of the following charts in Tableau is used to visualize the relationship
between two continuous variables?
a) Bar chart
b) Line chart
c) Scatter plot
d) Pie chart
Correct Answer: c) Scatter plot
Question 11: In HR analysis, what does the term "turnover rate" refer to?
a) The rate at which employees are promoted within the organization
b) The rate at which employees are hired by the organization
c) The rate at which employees leave the organization
d) The rate at which employees are trained within the organization
Correct Answer: c) The rate at which employees leave the organization
Question 12: Which of the following HR metrics measures the average time it takes to fill a
job opening?
a) Attrition rate
b) Time-to-hire
c) Turnover rate
d) Recruitment cost
Correct Answer: b) Time-to-hire
Question 13: Which of the following HR metrics measures the effectiveness of employee
training programs?
a) Attrition rate
b) Employee satisfaction score
c) Training ROI (Return on Investment)
d) Time-to-fill
Correct Answer: c) Training ROI (Return on Investment)
Question 14: In HR analysis, what does the term "EEO-1 report" refer to?
a) A report on employee engagement and satisfaction
b) A report on workplace safety incidents
c) A report on diversity and affirmative action
d) A report on employee training and development
Correct Answer: c) A report on diversity and affirmative action
Question 15: In Tableau, what is the purpose of a calculated field?
a) To summarize data using aggregate functions
b) To filter data based on specific criteria
c) To perform mathematical operations on data fields
d) To create hierarchies in data visualization
Correct Answer: c) To perform mathematical operations on data fields

Food Delivery Prediction Application | Developed a web-based
application capable of providing restaurant
recommendations based on various user-specific criteria.
Developed an interactive dashboard to effectively visualize
the distribution of data in a dataset. Utilized a machine
learning algorithm, specifically Random Forest Regression, to
predict restaurant outcomes. Integrated a case study approach
to determine the optimal location for establishing a cloud
kitchen.

Answer 1: Lean manufacturing is a systematic approach to production management that
focuses on minimizing waste and maximizing efficiency. For example, in a food processing
plant, implementing lean principles may involve optimizing the layout of the production floor
to reduce unnecessary movement of workers or ingredients, implementing visual
management techniques like color-coded labels or visual cues to streamline inventory control,
and continuously improving processes through employee suggestions and regular Kaizen
events. By implementing these lean practices, the company can reduce production waste,
improve workflow, and enhance overall efficiency.
Notes:
• Customize the example to an industry or manufacturing process you are familiar with.
• Discuss how lean principles can be applied to reduce waste and improve efficiency in
that specific context.
• Highlight any personal experiences or projects where you have implemented lean
practices in a relatable industry.
Question 2: What is the theory of constraints and inventory?
Answer 2: The theory of constraints (TOC) is a management philosophy that aims to identify
and overcome bottlenecks or constraints that limit an organization's performance. For
instance, in a retail supply chain, TOC can be applied to address inventory-related
constraints. Let's consider an online clothing retailer. By analyzing the flow of goods from
suppliers to customers, the company can identify bottlenecks, such as excess inventory at
certain distribution centers or inefficient transportation routes. By implementing strategies
like demand-driven inventory management, strategic sourcing, and real-time inventory
visibility, the retailer can alleviate the inventory-related constraints, optimize inventory
levels, and improve overall supply chain performance.
Notes:
• Customize the example to a specific industry or supply chain scenario you are familiar
with.
• Explain how TOC principles and inventory optimization strategies can be applied to
improve efficiency and performance in that particular context.
• Highlight any personal experiences or projects where you have addressed inventory-
related constraints in a similar industry or supply chain.
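The bottleneck idea behind TOC can be sketched in a few lines of Python. This is only an illustration: the stage names and daily capacities below are hypothetical and do not come from a real retailer. The point it shows is that the throughput of the whole chain is capped by its slowest stage, so that stage is where improvement effort should go first.

```python
# Hypothetical daily capacities (units/day) for each stage of a supply chain.
# These figures are made up for illustration only.
stage_capacity = {
    "supplier": 1200,
    "distribution_center": 800,
    "transport": 950,
    "last_mile_delivery": 1000,
}


def find_bottleneck(capacities):
    """Return (stage, capacity) for the stage with the lowest capacity."""
    stage = min(capacities, key=capacities.get)
    return stage, capacities[stage]


# The chain as a whole cannot move more units than its slowest stage.
bottleneck_stage, system_throughput = find_bottleneck(stage_capacity)
print(f"Bottleneck: {bottleneck_stage} ({system_throughput} units/day)")
```

In an interview, you could extend this by raising the capacity of the bottleneck stage and re-running the check to show that the constraint then moves to a different stage.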
Question 3: What is BOM - Bill of Materials?
Answer 3: Bill of Materials (BOM) is a comprehensive list of all the components, parts, and
raw materials required to manufacture a product. For example, let's consider the production
of a smartphone. The BOM for a smartphone would include items such as the display,
processor, battery, camera module, connectors, and various other components. Each item in
the BOM would have detailed information like part numbers, descriptions, quantities, and
sometimes even cost. This detailed BOM serves as a reference for production planning and
procurement, ensuring that the necessary components are available in the correct quantities
and at the right time for assembly.
Notes:
• Customize the example to a specific product or industry you are familiar with.
• Discuss how accurate and up-to-date BOMs are essential for efficient production
planning, procurement, and inventory control in that specific industry.
• Highlight any personal experiences or projects where you have worked with BOMs in a
similar industry.
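A BOM maps naturally onto a simple data structure. The sketch below is a minimal example, assuming a single-level BOM; the part numbers, quantities, and costs are hypothetical placeholders, and `BomLine`, `material_requirements`, and `unit_material_cost` are illustrative names rather than part of any standard library or ERP system.

```python
from dataclasses import dataclass


@dataclass
class BomLine:
    """One line of a single-level bill of materials (illustrative fields)."""
    part_number: str
    description: str
    quantity_per_unit: int
    unit_cost: float


# Hypothetical BOM for one smartphone; all values are made up.
smartphone_bom = [
    BomLine("DSP-001", "Display", 1, 40.0),
    BomLine("CPU-001", "Processor", 1, 55.0),
    BomLine("BAT-001", "Battery", 1, 12.0),
    BomLine("CAM-001", "Camera module", 2, 15.0),
]


def material_requirements(bom, units_to_build):
    """Total quantity of each part needed to build the given number of units."""
    return {line.part_number: line.quantity_per_unit * units_to_build for line in bom}


def unit_material_cost(bom):
    """Material cost of one finished unit."""
    return sum(line.quantity_per_unit * line.unit_cost for line in bom)


requirements = material_requirements(smartphone_bom, 500)
cost = unit_material_cost(smartphone_bom)
print(requirements)
print(f"Material cost per unit: {cost}")
```

The requirements dictionary is the kind of figure procurement would compare against stock on hand to decide what to order.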
Question 4: What is process flow?
Answer 4: Process flow refers to the sequence of steps or activities involved in a specific
process or operation. For example, let's consider the process flow of order fulfillment in an e-
commerce company. The process flow may include steps such as order receiving, inventory
picking, packing, labeling, shipping, and final delivery to the customer. Visualizing the
process flow through a flowchart or process map can help identify areas of inefficiency or
bottlenecks, such as long wait times or unnecessary handoffs between departments. By
analyzing the process flow and making improvements, the company can streamline
operations, reduce lead times, and enhance customer satisfaction.
Notes:
• Customize the example to a specific industry or process you are familiar with.
• Discuss how process flow analysis and optimization can improve efficiency and
customer experience in that specific context.
• Highlight any personal experiences or projects where you have analyzed and improved
process flows in a similar industry.
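A process flow can also be analyzed numerically once each step has a duration. The following sketch assumes hypothetical processing and waiting times (in minutes) for the order fulfillment steps named above. It adds them up into a total lead time and flags the step with the longest wait before it, which is a simple way to locate a likely bottleneck.

```python
# Each entry: (step, processing_minutes, waiting_minutes_before_step).
# All durations are hypothetical and chosen only for illustration.
order_fulfillment = [
    ("order receiving", 2, 0),
    ("inventory picking", 10, 30),
    ("packing", 5, 15),
    ("labeling", 2, 5),
    ("shipping", 20, 120),
    ("final delivery", 60, 45),
]

# Time spent actually working on the order vs. time the order sits idle.
total_processing = sum(processing for _, processing, _ in order_fulfillment)
total_waiting = sum(waiting for _, _, waiting in order_fulfillment)
lead_time = total_processing + total_waiting

# The step preceded by the longest wait is a candidate for improvement.
longest_wait_step = max(order_fulfillment, key=lambda step: step[2])[0]

print(f"Lead time: {lead_time} min ({total_waiting} min waiting)")
print(f"Longest wait occurs before: {longest_wait_step}")
```

Separating processing time from waiting time matters because, in this made-up example, most of the lead time is idle time rather than work.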
Question 5: How do you calculate cycle time?
Answer 5: Cycle time is the total time required to complete one cycle of a process or
operation. It is commonly calculated as the net working time divided by the number of units
completed in that time. For example, in a call center environment, if it takes an average of 5
minutes to handle a customer call from start to finish, the cycle time for call handling would
be 5 minutes. By calculating cycle time, the company can evaluate the efficiency of its
operations and identify areas for improvement. Reducing cycle time through process
optimization can lead to increased productivity, shorter customer wait times, and improved
customer satisfaction.
Notes:
• Customize the example to a specific industry or process you are familiar with.
• Discuss how reducing cycle time can lead to improved operational efficiency and
customer satisfaction in that particular context.
• Highlight any personal experiences or projects where you have calculated and optimized
cycle times in a relatable industry.
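The calculation itself is short enough to show in Python. This sketch uses a common definition, cycle time = net working time / units completed, which is an assumption here and not a formula stated in the notes above. The call durations and shift figures are hypothetical.

```python
def cycle_time(net_working_time, units_completed):
    """Average time per completed unit (same time unit as the input)."""
    if units_completed <= 0:
        raise ValueError("units_completed must be positive")
    return net_working_time / units_completed


# Hypothetical handle times (minutes) for five customer calls.
call_durations_min = [4.0, 6.0, 5.0, 5.5, 4.5]
average_call_cycle_time = cycle_time(sum(call_durations_min), len(call_durations_min))

# Hypothetical shift: 480 working minutes, 96 calls handled.
shift_cycle_time = cycle_time(480, 96)

print(f"Average cycle time per call: {average_call_cycle_time} min")
print(f"Cycle time over the shift: {shift_cycle_time} min")
```

Both calculations give 5 minutes per call, matching the call center example in the answer.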
EXTRA QUESTIONS
3Five8 Technologies - Product Management Intern
Resources:

1. Product Masterclass [Video]: https://masai-course.s3.ap-south-1.amazonaws.com/material/videos/55144/caPun4l8midU0Z06qkmhi2ulCgt3xl0CUpFQtkOw.mp4
2. Product Masterclass [PDF]: https://masai-course.s3.ap-south-1.amazonaws.com/editor/uploads/2023-04-04/Katana Sri Ajay- Whatsapp trading presentation (1)_383038.pdf
3. https://www.youtube.com/watch?v=yUOC-Y0f5ZQ&ab_channel=Atlassian
4. https://www.youtube.com/watch?v=6y545eCNHG8&ab_channel=PMDiegoGranados
5. https://www.youtube.com/watch?v=m-qyEDwB1tw&ab_channel=Exponent
6. https://www.youtube.com/watch?v=2dczveSrsv8&list=PLIvg2wJAAhT6hpxKQs4YJGbfesyPDkv8E&index=7
7. https://www.youtube.com/watch?v=n530l09t8zY&list=PLIvg2wJAAhT6hpxKQs4YJGbfesyPDkv8E&index=9
General Questions
1. Can you describe a product you successfully brought to market?
2. How do you prioritize features for a new product or an existing one?
3. What products do you admire and why?
4. How would you handle a situation where the development team is missing deadlines?
5. Why switch from Data Analytics to Product Management?
6. Define Product Management / What do you know about Product Management?
7. What do you know about our business?
8. How can you as a Product Manager help grow our business?
Customer Focus
1. How do you gather customer requirements?
2. Can you provide an example of a time when you had to balance customer needs with
business needs?
3. How do you define and measure customer success?
Market and Competitive Analysis
1. How do you conduct a competitive analysis?
2. How do you identify new market opportunities?
3. How would you launch a product in a crowded market?
Technical Understanding
1. How do you work with engineering teams to build a product?
2. Can you explain a complex technical concept to a non-technical audience?
3. How would you handle disagreements between engineering and design teams?
Strategic Thinking
1. How do you align product strategy with company goals?
2. How do you decide when to pivot or kill a product?
3. How do you assess the risks and rewards of a particular product strategy?
Behavioral Questions
1. Can you describe a time when you had to overcome a significant challenge on a
product?
2. How do you handle stress and high-pressure situations?
3. Tell me about a time when you had to manage conflicting priorities or stakeholders.
Case Studies and Hypothetical Scenarios
1. How would you improve [popular product]?
2. How would you design a new product for [specific target audience]?
3. How would you approach entering a new market with an existing product?
Metrics and Analytics
1. What key performance indicators (KPIs) would you track for a particular product?
2. How do you use data to inform product decisions?
3. How do you conduct A/B testing?
