0% found this document useful (0 votes)
32 views3 pages

Text 3

The document outlines key responsibilities of a Data Analyst, including data collection, cleaning, and analysis, as well as the importance of KPIs. It also provides technical questions and answers related to SQL, Python, Pandas, Power BI, and Tableau, along with scenario-based questions for practical application. Overall, it serves as a comprehensive guide for understanding data analysis concepts and techniques.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
32 views3 pages

Text 3

The document outlines key responsibilities of a Data Analyst, including data collection, cleaning, and analysis, as well as the importance of KPIs. It also provides technical questions and answers related to SQL, Python, Pandas, Power BI, and Tableau, along with scenario-based questions for practical application. Overall, it serves as a comprehensive guide for understanding data analysis concepts and techniques.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd
You are on page 1/ 3

Theoretical Questions

1. What are the key responsibilities of a Data Analyst?


• Answer:
• Collecting and interpreting data.
• Cleaning and organizing data.
• Performing analysis to identify patterns and trends.
• Creating visualizations and dashboards to communicate insights.
2. Explain the difference between structured and unstructured data.
• Answer:
• Structured Data: Data that is organized in rows and columns (e.g.,
databases).
• Unstructured Data: Data that is not organized in a predefined format
(e.g., text files, images).
3. What is the importance of data cleaning in analysis?
• Answer: Data cleaning ensures accuracy, consistency, and reliability of
insights by removing errors, duplicates, and irrelevant information.
4. What is the difference between correlation and causation?
• Answer:
• Correlation: A relationship between two variables without implying
causation.
• Causation: One variable directly affects another.
5. Describe the concept of normalization in databases.
• Answer: Normalization is the process of organizing data to reduce
redundancy and improve integrity by dividing a database into smaller tables and
defining relationships between them.
6. What is the significance of KPIs (Key Performance Indicators) in data
analysis?
• Answer: KPIs measure the performance of specific objectives, helping
organizations track progress and make data-driven decisions.
Technical Questions
SQL
1. Write an SQL query to find the second highest salary in an employee
table.
• Answer:SELECT MAX(salary) AS second_highest_salary
• FROM employees
• WHERE salary < (SELECT MAX(salary) FROM employees);

2. Scenario: You need to calculate the retention rate of customers month
over month. How would you write this query?
• Answer:SELECT month, COUNT(DISTINCT customer_id) AS retained_customers
• FROM transactions
• WHERE customer_id IN (
• SELECT customer_id
• FROM transactions
• WHERE month = 'previous_month'
• )
• GROUP BY month;

3. Write a query to identify duplicate records in a table.
• Answer:SELECT column1, column2, COUNT(*)
• FROM table_name
• GROUP BY column1, column2
• HAVING COUNT(*) > 1;

4. Scenario: Identify the top-selling product in each category.**
• Answer:SELECT category, product, MAX(sales) AS max_sales
• FROM sales_data
• GROUP BY category, product
• ORDER BY category, max_sales DESC;

Python
1. How would you handle missing data in a Pandas DataFrame?
• Answer:import pandas as pd
• df['column_name'].fillna(df['column_name'].mean(), inplace=True)
• df.dropna(inplace=True)

2. Scenario: Write a Python script to calculate the correlation matrix for
a given dataset and visualize it as a heatmap.**
• Answer:import pandas as pd
• import seaborn as sns
• import matplotlib.pyplot as plt

• corr_matrix = df.corr()
• sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
• plt.show()

3. Write a Python script to find the most frequent word in a text column
of a DataFrame.
• Answer:from collections import Counter

• most_common_word = Counter('
'.join(df['text_column']).split()).most_common(1)
• print(most_common_word)

4. Scenario: Automate email sending for a monthly sales report.**
• Answer:import smtplib
• from email.mime.text import MIMEText

• def send_email(subject, body, recipient):
• msg = MIMEText(body)
• msg['Subject'] = subject
• msg['From'] = '[email protected]'
• msg['To'] = recipient

• with smtplib.SMTP('smtp.example.com', 587) as server:
• server.login('[email protected]', 'password')
• server.sendmail('[email protected]', recipient,
msg.as_string())

Pandas
1. How would you group data by a column and calculate the mean for each
group?
• Answer:grouped_data = df.groupby('column_name')
['numeric_column'].mean()

2. Scenario: Identify the top 5 products by sales in each region.**
• Answer:top_products = df.groupby('region')['sales'].apply(lambda x:
x.nlargest(5))

3. How do you merge two DataFrames on a common column?
• Answer:merged_df = pd.merge(df1, df2, on='common_column', how='inner')

4. Scenario: Create a rolling average for sales data over a 7-day
window.**
• Answer:df['7_day_avg'] = df['sales'].rolling(window=7).mean()

Power BI
1. How would you create a calculated column to classify sales as "High" or
"Low" based on a threshold?
• Answer:SalesCategory = IF(Sales[Amount] > 10000, "High", "Low")

2. Scenario: Create a dashboard to show sales trends, regional
performance, and key KPIs.**
• Answer:
• Use line charts for trends.
• Create bar/column charts for regional performance.
• Use card visualizations for key KPIs.
Tableau
1. Explain how to create a calculated field in Tableau.
• Answer:
• Go to the Data pane, click on the drop-down menu, and select "Create
Calculated Field."
• Write the formula, e.g., IF SUM(Sales) > 10000 THEN 'High' ELSE 'Low'
END.
2. Scenario: Create a churn analysis dashboard with KPIs and
visualizations.**
• Answer:
• Filter churned customers by last transaction date.
• Visualize with heatmaps for geographic data.
• Use line charts for churn trends.
Scenario-Based Questions
1. Guesstimate: How many Uber rides are taken in New York City on a
typical weekday?**
• Answer Approach:
• Estimate the population of NYC.
• Assume a percentage of people using Uber.
• Calculate average trips per user.
2. Case Study: Analyze sales performance using a given dataset. What steps
would you take?**
• Answer:
• Perform data cleaning.
• Conduct exploratory data analysis.
• Segment sales by various categories.
• Visualize insights using Power BI/Tableau.
3. Scenario: Identify main factors driving customer satisfaction.**
• Answer:
• Use correlation analysis on feedback data.
• Perform regression or decision tree analysis.
• Present findings with actionable recommendations.
4. Scenario: Forecast demand for a new product launch.**
• Answer:
• Analyze historical sales data of similar products.
• Use predictive modeling (e.g., ARIMA or Prophet).
• Validate results with business teams.

You might also like