Wipro Data Analyst Interview Questions
1. INNER JOIN
• Returns only rows that have matching values in both tables.
SELECT e.name, d.department_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id;
2. LEFT JOIN
• Returns all rows from the left table and matching rows from the right table.
SELECT e.name, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id;
3. RIGHT JOIN
• Returns all rows from the right table and matching rows from the left table.
SELECT e.name, d.department_name
FROM employees e
RIGHT JOIN departments d ON e.department_id = d.department_id;
4. FULL OUTER JOIN
• Returns all rows from both tables, with NULLs where there are no matches.
SELECT e.name, d.department_name
FROM employees e
FULL OUTER JOIN departments d ON e.department_id = d.department_id;
5. CROSS JOIN
• Returns the Cartesian product: every row of the first table paired with every row of the second.
SELECT e.name, d.department_name
FROM employees e
CROSS JOIN departments d;
(The departments table and its columns are assumed for illustration.)
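The same join types can be reproduced in pandas through the how parameter of merge(). A minimal sketch (the two toy DataFrames are assumptions for illustration):

import pandas as pd
employees = pd.DataFrame({'name': ['Alice', 'Bob'], 'department_id': [1, 3]})
departments = pd.DataFrame({'department_id': [1, 2], 'department_name': ['HR', 'IT']})
# how= maps directly onto the SQL join types: 'inner', 'left', 'right', 'outer', 'cross'
print(employees.merge(departments, on='department_id', how='inner'))
print(employees.merge(departments, on='department_id', how='outer'))
print(employees.merge(departments, how='cross'))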
Finding Duplicate Records
SELECT employee_id, COUNT(*)
FROM employees
GROUP BY employee_id
HAVING COUNT(*) > 1;
SELECT *
FROM employees e1
WHERE e1.name IN (
    SELECT name
    FROM employees
    GROUP BY name
    HAVING COUNT(*) > 1
);
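In pandas the equivalent check is a single call. A small sketch (sample data assumed):

import pandas as pd
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Alice'], 'salary': [50000, 60000, 50000]})
# keep=False marks every member of a duplicate group, mirroring the SQL query above
print(df[df.duplicated(keep=False)])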
1. ROW_NUMBER()
SELECT name, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees;
2. RANK()
SELECT name, salary,
       RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;
3. DENSE_RANK()
SELECT name, salary,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_dense_rank
FROM employees;
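pandas exposes the same three behaviours through Series.rank(). A quick sketch on a toy salary column:

import pandas as pd
df = pd.DataFrame({'salary': [70000, 60000, 60000, 50000]})
df['row_number'] = df['salary'].rank(method='first', ascending=False).astype(int)  # ROW_NUMBER()
df['rank'] = df['salary'].rank(method='min', ascending=False).astype(int)          # RANK()
df['dense_rank'] = df['salary'].rank(method='dense', ascending=False).astype(int)  # DENSE_RANK()
print(df)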
1. Use Indexing – Index columns used in WHERE, JOIN, and ORDER BY.
4. Use Joins Efficiently – Prefer INNER JOIN over OUTER JOIN when possible.
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(100),
    salary INT,
    department_id INT
);
CTE Example
WITH TopEmployees AS (
    SELECT employee_id, name, salary FROM employees WHERE salary > 50000
)
SELECT * FROM TopEmployees;
Subquery Example
SELECT *
FROM (
    SELECT employee_id, name, salary FROM employees WHERE salary > 50000
) AS TopEmployees;
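For comparison, the same filter in pandas (DataFrame contents are illustrative):

import pandas as pd
employees = pd.DataFrame({'employee_id': [1, 2], 'name': ['Alice', 'Bob'], 'salary': [65000, 48000]})
# Equivalent of the CTE/subquery: keep employees earning above 50000
top_employees = employees[employees['salary'] > 50000]
print(top_employees)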
Example Schema
Unnormalized Table:
OrderID | CustomerName | ProductName | Price
2       | Bob          | Mouse       | 20
1. Customers Table
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    Name VARCHAR(100)
);
2. Orders Table
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
3. Products Table
CREATE TABLE Products (
    ProductID INT PRIMARY KEY,
    ProductName VARCHAR(100),
    Price DECIMAL(10,2)
);
Types of Indexes
1. Single-Column Index
CREATE INDEX idx_name ON employees(name);
2. Unique Index
CREATE UNIQUE INDEX idx_employee_email ON employees(email);
3. Composite Index
CREATE INDEX idx_dept_salary ON employees(department_id, salary);
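A runnable way to experiment with index behaviour is Python's built-in sqlite3 module (table, column, and index names here are illustrative):

import sqlite3
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, email TEXT, department_id INTEGER, salary INTEGER)')
conn.execute('CREATE INDEX idx_name ON employees(name)')                          # single-column index
conn.execute('CREATE UNIQUE INDEX idx_employee_email ON employees(email)')        # unique index
conn.execute('CREATE INDEX idx_dept_salary ON employees(department_id, salary)')  # composite index
# EXPLAIN QUERY PLAN shows whether a query actually uses an index
print(conn.execute('EXPLAIN QUERY PLAN SELECT * FROM employees WHERE name = ?', ('Alice',)).fetchall())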
Best Practices:
import pandas as pd
# Drop rows that contain any missing values
df.dropna(inplace=True)
# Drop columns that contain any missing values
df.dropna(axis=1, inplace=True)
# Replace all missing values with 0
df.fillna(0, inplace=True)
# Impute column A with its mean
df['A'].fillna(df['A'].mean(), inplace=True)
# Impute column B with its median
df['B'].fillna(df['B'].median(), inplace=True)
# Impute column B with its mode
df['B'].fillna(df['B'].mode()[0], inplace=True)
# Forward-fill from the previous row (newer pandas prefers df.ffill())
df.fillna(method='ffill', inplace=True)
Feature | List | Tuple
Use Case | When data needs to change | When data should remain constant
Example
# List (mutable)
my_list = [1, 2, 3]
my_list.append(4) # Allowed
print(my_list) # [1, 2, 3, 4]
# Tuple (immutable)
my_tuple = (1, 2, 3)
# my_tuple.append(4) would raise AttributeError: tuples cannot be modified
print(my_tuple)  # (1, 2, 3)
import pandas as pd
# Illustrative data containing one duplicate row
data = {'Name': ['Alice', 'Bob', 'Alice'], 'Age': [25, 30, 25]}
df = pd.DataFrame(data)
df.drop_duplicates(inplace=True)
print(df)
Example of apply()
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]})  # illustrative data
# apply() runs a function over each element (or each row/column of a DataFrame)
df['A_squared'] = df['A'].apply(lambda x: x ** 2)
print(df)
Example of map()
# map() substitutes each Series value using a function, dict, or Series
df['A_label'] = df['A'].map({1: 'one', 2: 'two', 3: 'three'})
print(df)
import pandas as pd
data = {'Department': ['HR', 'IT', 'HR', 'IT'], 'Salary': [50000, 60000, 55000, 65000]}  # illustrative data
df = pd.DataFrame(data)
grouped_df = df.groupby('Department')['Salary'].mean()
print(grouped_df)
• Count: df.groupby('Department')['Salary'].count()
• Sum: df.groupby('Department')['Salary'].sum()
• Max: df.groupby('Department')['Salary'].max()
• Min: df.groupby('Department')['Salary'].min()
Applying Multiple Aggregations
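A minimal sketch with agg(), which applies several aggregations in one pass (reusing the illustrative df above):

import pandas as pd
df = pd.DataFrame({'Department': ['HR', 'IT', 'HR', 'IT'], 'Salary': [50000, 60000, 55000, 65000]})
# One groupby, several summary statistics
print(df.groupby('Department')['Salary'].agg(['mean', 'sum', 'max', 'min']))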
import pandas as pd
def clean_data(df):
    # Drop duplicates
    df.drop_duplicates(inplace=True)
    # Fill missing numeric values with each column's mean
    df.fillna(df.mean(numeric_only=True), inplace=True)
    return df
# Example usage
data = {'Name': ['Alice', 'Bob', 'Alice', None], 'Age': [25, 30, None, 40]}
df = pd.DataFrame(data)
cleaned_df = clean_data(df)
print(cleaned_df)
import pandas as pd
df = pd.read_csv('dirty_data.csv')
cleaned_df = clean_data(df)
cleaned_df.to_csv('cleaned_data.csv', index=False)
Syntax
# Sort a list of tuples in place by the second element
data = [('Alice', 30), ('Bob', 25), ('Carol', 35)]
data.sort(key=lambda x: x[1])
print(data)
# Filter the same list with a lambda
filtered_data = list(filter(lambda x: x[1] > 26, data))
print(filtered_data)
• Click Home > Get Data and import data (Excel, SQL, CSV, etc.).
• Use the Report View to add charts, tables, KPIs, and slicers.
• Common visuals: Bar charts, Line charts, Pie charts, Cards, Maps.
TotalSales = SUM(Sales[Amount])
Feature | Power BI | Tableau
Visualization Options | Good, but limited customization | More flexibility & aesthetics
Power BI is great for Microsoft-based companies & affordability.
Tableau is preferred for advanced data visualization & interactivity.
Aggregate Functions
TotalSales = SUM(Sales[Amount])
AverageSales = AVERAGE(Sales[Amount])
Filter Functions
SalesLastYear = CALCULATE(SUM(Sales[Amount]),
SAMEPERIODLASTYEAR(Sales[Date]))
Logical Functions – IF(), AND(), SWITCH()
Ranking Functions – RANKX()
DAX is essential for advanced calculations, custom measures, and enhanced data
analysis in Power BI.
1. DirectQuery Mode
• Connect to live data sources (SQL Server, Snowflake, Azure).
Best approach depends on whether the data source supports live connections.
• In Options > Query Reduction, enable "Reduce the number of queries sent to
the server".
5. Optimize Visuals
A combination of better data modeling, optimized DAX, and efficient data storage
significantly improves Power BI performance.
[Region] = "East" -- Users with this role will only see East region data
[Email] = USERPRINCIPALNAME()
Dynamic RLS ensures that each user only sees their permitted data.
Final Thoughts
Feature | Descriptive Statistics | Inferential Statistics
Purpose | Provides insights into dataset characteristics. | Makes predictions or inferences about a larger population.
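A small Python sketch of the difference (sample numbers are made up): descriptive statistics summarize the observed sample, while an inferential step such as a confidence interval generalizes to the population.

import numpy as np
from scipy import stats
sample = np.array([10, 12, 13, 15, 17, 19, 21])
# Descriptive: characterize the data at hand
print("mean:", sample.mean(), "std:", sample.std(ddof=1))
# Inferential: 95% confidence interval for the population mean
print(stats.t.interval(0.95, df=len(sample) - 1, loc=sample.mean(), scale=stats.sem(sample)))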
• Python Example:
import pandas as pd
import numpy as np
data = pd.DataFrame({'values': [10, 12, 13, 15, 200, 17, 19, 21]})
Q1 = data['values'].quantile(0.25)
Q3 = data['values'].quantile(0.75)
IQR = Q3 - Q1
# Flag points more than 1.5 * IQR outside the quartiles as outliers
outliers = data[(data['values'] < Q1 - 1.5 * IQR) | (data['values'] > Q3 + 1.5 * IQR)]
print(outliers)
• Python Example:
from scipy import stats
data['zscore'] = np.abs(stats.zscore(data['values']))
# A common rule of thumb treats |z| > 3 as an outlier
outliers = data[data['zscore'] > 3]
print(outliers)
# Log transform compresses extreme values
data['log_values'] = np.log(data['values'])
Types of Correlation
• Positive Correlation (r > 0): As X increases, Y increases (e.g., more study hours →
higher grades).
• Negative Correlation (r < 0): As X increases, Y decreases (e.g., more time on
social media → lower productivity).
• Python Example:
import pandas as pd
df = pd.DataFrame({'X': [10, 20, 30, 40], 'Y': [15, 25, 35, 50]})
correlation = df['X'].corr(df['Y'])
spearman_corr = df.corr(method='spearman')
kendall_corr = df.corr(method='kendall')
4. Calculate p-value
from scipy import stats
corr, p_value = stats.pearsonr(df['X'], df['Y'])
print("p-value:", p_value)
Significance
• If the p-value is below 0.05, the correlation is considered statistically significant.
• Explore data columns (sales amount, region, product category, time, etc.).
import matplotlib.pyplot as plt
df['Month'] = pd.to_datetime(df['Date']).dt.month
df.groupby('Month')['Sales'].sum().plot(kind='line', marker='o')
plt.show()
df.groupby('Product')['Sales'].sum().plot(kind='bar')
df[['Sales', 'Ad_Spend']].corr()
from statsmodels.tsa.holtwinters import ExponentialSmoothing
model = ExponentialSmoothing(df['Sales']).fit()
df['Forecast'] = model.predict(start=0, end=len(df) - 1)
• Handle missing values (fillna() for missing data, drop_duplicates() for duplicate
transactions).
df['Transaction_Date'] = pd.to_datetime(df['Transaction_Date'])
df.set_index('Transaction_Date')['Amount_Spent'].resample('M').sum().plot(kind='line')
import datetime as dt
today = dt.datetime.today()
# Recency: days since each transaction
df['Recency'] = (today - df['Transaction_Date']).dt.days
# transform() keeps the per-customer result aligned with the original rows
df['Frequency'] = df.groupby('Customer_ID')['Transaction_Date'].transform('count')
df['Monetary'] = df.groupby('Customer_ID')['Amount_Spent'].transform('sum')
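One way to turn these columns into segments is quantile scoring with pd.qcut; a sketch continuing the df above (the 1–4 score buckets are an assumption, not part of the original analysis):

# Rank first so qcut always has unique bin edges; lower Recency is better, so its labels are reversed
df['R_score'] = pd.qcut(df['Recency'].rank(method='first'), 4, labels=[4, 3, 2, 1])
df['F_score'] = pd.qcut(df['Frequency'].rank(method='first'), 4, labels=[1, 2, 3, 4])
df['M_score'] = pd.qcut(df['Monetary'].rank(method='first'), 4, labels=[1, 2, 3, 4])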
3. Product Performance
df.groupby('Product_Category')['Amount_Spent'].sum().plot(kind='bar')
• Avoid technical jargon (e.g., say "sales increased" instead of "revenue has an
upward linear regression trend").
• Use bullet points & structured slides.
• Instead of “Customer churn rate is 10%,” say “We lost 1,000 customers last
month, costing $50K in revenue”.
4. Operational Metrics
29. How would you automate a weekly sales report in Power BI?
1. Connect to Data Source
Sales_Growth = DIVIDE([This Week Sales] - [Last Week Sales], [Last Week Sales])
30. What steps would you take to validate data accuracy before
creating a report?
1. Check for Missing & Duplicate Data
df.isnull().sum()      # Check for missing values
df.duplicated().sum()  # Check for duplicate rows
Data validation prevents incorrect insights that could lead to poor business
decisions.