The document contains a series of interview questions and answers for data analyst positions, focusing on SQL and Python. It includes SQL queries for finding the second highest salary, identifying and deleting duplicates, calculating sales percentages, retrieving top records by category, and joining tables. Additionally, it covers Python functions for calculating mean and median, handling missing values, fetching SQL data with pandas, visualizing correlations, and automating data processing tasks.


HCLTech

Data Analyst Interview Questions & Answers

SQL Questions

Q1. Write a query to find the second highest salary in an employee table?

WITH RankedSalaries AS (
    SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
    FROM employees
)
SELECT DISTINCT salary FROM RankedSalaries
WHERE salary_rank = 2;  -- rank 2 = second highest; DENSE_RANK gives tied salaries the same rank
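A quick way to sanity-check this pattern is to run it against an in-memory SQLite database from Python (window functions need SQLite 3.25 or newer, which ships with recent Python builds); the employee names and salaries below are made-up sample data:

```python
import sqlite3

# Build a throwaway employees table with hypothetical data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("A", 90000), ("B", 120000), ("C", 120000), ("D", 80000)],
)

# DENSE_RANK assigns both 120000 rows rank 1, so rank 2 is the
# second highest *distinct* salary.
rows = conn.execute(
    """
    WITH RankedSalaries AS (
        SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
        FROM employees
    )
    SELECT DISTINCT salary FROM RankedSalaries WHERE salary_rank = 2
    """
).fetchall()
second_highest = rows[0][0]
print(second_highest)  # 90000
conn.close()
```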

Q2. How do you identify duplicate rows in a table and delete them?

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY column1, column2, ... ORDER BY id) AS row_num
    FROM your_table
)
DELETE FROM cte WHERE row_num > 1;
-- Note: deleting through a CTE like this works in SQL Server; MySQL and
-- PostgreSQL need a self-join or DELETE ... USING keyed on a unique column.
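When the data is already in pandas rather than in the database, `drop_duplicates` gives the same keep-the-first-row behaviour as the `row_num > 1` filter; the frame below is a hypothetical example:

```python
import pandas as pd

# Hypothetical table where ids 1 and 2 are duplicates on (column1, column2).
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "column1": ["a", "a", "b", "b"],
    "column2": [10, 10, 20, 30],
})

# keep="first" mirrors ROW_NUMBER() ... ORDER BY id with row_num > 1 deleted:
# the lowest-id row in each duplicate group survives.
deduped = df.drop_duplicates(subset=["column1", "column2"], keep="first")
print(list(deduped["id"]))  # [1, 3, 4]
```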


Q3. Write a query to calculate the percentage of sales for each product?

SELECT
    p.product_name,
    SUM(o.quantity * o.unit_price) AS total_sales,
    (SUM(o.quantity * o.unit_price) * 100.0) /
        (SELECT SUM(quantity * unit_price) FROM orders) AS sales_percentage
FROM orders o
INNER JOIN products p ON o.product_id = p.product_id
GROUP BY p.product_name
ORDER BY total_sales DESC;
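The same per-product percentage can be computed in pandas with a groupby, which is handy for cross-checking the SQL result; the order rows here are invented sample data:

```python
import pandas as pd

# Made-up orders already joined to product names, mirroring the query above.
orders = pd.DataFrame({
    "product_name": ["pen", "pen", "book"],
    "quantity": [2, 3, 1],
    "unit_price": [10.0, 10.0, 50.0],
})
orders["line_total"] = orders["quantity"] * orders["unit_price"]

# Per-product totals, then each as a share of the grand total.
sales = orders.groupby("product_name")["line_total"].sum()
pct = sales * 100.0 / sales.sum()
print(pct.sort_values(ascending=False))  # pen 50.0, book 50.0
```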
Q4. How do you retrieve the top N records for each category in a dataset?

WITH ranked_data AS (
    SELECT
        category,
        value,
        ROW_NUMBER() OVER (PARTITION BY category ORDER BY value DESC) AS row_num
    FROM your_table
)
SELECT category, value
FROM ranked_data
WHERE row_num <= N;
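In pandas, the equivalent of this ROW_NUMBER pattern is sorting by value and then taking `groupby().head(N)`; the category/value data below is made up for illustration:

```python
import pandas as pd

# Hypothetical data: top N=2 values per category.
df = pd.DataFrame({
    "category": ["x", "x", "x", "y", "y"],
    "value": [5, 9, 7, 3, 8],
})
N = 2

# sort + groupby().head(N) matches
# ROW_NUMBER() OVER (PARTITION BY category ORDER BY value DESC) <= N.
top_n = (
    df.sort_values("value", ascending=False)
      .groupby("category")
      .head(N)
)
print(top_n)  # x keeps 9 and 7; y keeps 8 and 3
```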
Q5. Write a query to join two tables and fetch records that exist in one table but not the other?

SELECT t1.*
FROM table1 t1
LEFT JOIN table2 t2 ON t1.id = t2.id
WHERE t2.id IS NULL;
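The pandas counterpart of this anti-join is a left merge with `indicator=True`, keeping only the `left_only` rows; `table1` and `table2` below are toy frames:

```python
import pandas as pd

# Hypothetical tables: find rows in table1 with no match in table2.
table1 = pd.DataFrame({"id": [1, 2, 3], "val": ["a", "b", "c"]})
table2 = pd.DataFrame({"id": [2, 3], "other": ["x", "y"]})

# indicator=True adds a _merge column marking where each row matched.
merged = table1.merge(table2[["id"]], on="id", how="left", indicator=True)
only_in_t1 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_in_t1)  # only the row with id 1
```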

Python Questions

Q1. Write a Python function to calculate the mean and median of a dataset.

import numpy as np

def calculate_mean_median_np(data):
    if len(data) == 0:  # guard against an empty dataset
        raise ValueError("Data is empty")
    mean = np.mean(data)      # mean via NumPy
    median = np.median(data)  # median via NumPy
    return mean, median

data = [1, 2, 3, 4, 5, 6]
mean, median = calculate_mean_median_np(data)
print(f"Mean: {mean}")      # Mean: 3.5
print(f"Median: {median}")  # Median: 3.5
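For a quick cross-check without NumPy, the standard-library statistics module gives the same results on the sample data above:

```python
import statistics

# Same sample dataset as the NumPy version.
data = [1, 2, 3, 4, 5, 6]
mean = statistics.mean(data)      # 3.5
median = statistics.median(data)  # average of the two middle values: 3.5
print(mean, median)
```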
Q2. How would you clean and preprocess a dataset with missing values?

import pandas as pd

# Sample data with missing values
data = {'col1': [1, 2, None, 4, 5],
        'col2': [None, 2, 3, 4, None]}
df = pd.DataFrame(data)

# 1. Check for missing values
print(df.isnull())

# 2. Fill missing values with the column mean
df_filled = df.fillna(df.mean())
print(df_filled)
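Mean-filling is only one option; depending on the data you may prefer the median (more robust to outliers) or dropping incomplete rows entirely. A small sketch using the same shape of data as above:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, None, 4, 5],
                   'col2': [None, 2, 3, 4, None]})

# Median fill: median of col1 is 3.0, of col2 is 3.0 (NaNs are skipped).
filled = df.fillna(df.median())

# Or drop any row that still has a missing value.
dropped = df.dropna()
print(filled['col1'].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(len(dropped))             # 2 fully populated rows remain
```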
Q3. Write a Python script to fetch data from a SQL database using pandas?

import pandas as pd
from sqlalchemy import create_engine

# Database connection details (replace with your actual database info)
db_url = 'mysql+pymysql://username:password@host:port/database_name'

# Create the database engine
engine = create_engine(db_url)

# SQL query to fetch data (replace with your actual query)
query = 'SELECT * FROM your_table_name'

# Fetch data into a DataFrame
df = pd.read_sql(query, engine)

# Display the first rows (in Jupyter, df.head() alone renders nicely)
print(df.head())


Q4. How would you visualize the correlation between two variables in a dataset?

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Example data (replace with your own dataset)
data = {'x': [1, 2, 3, 4, 5], 'y': [2, 4, 5, 4, 5]}
df = pd.DataFrame(data)

# Quantify the relationship before plotting
print(df['x'].corr(df['y']))  # Pearson correlation coefficient

# Create scatter plot with seaborn
sns.scatterplot(data=df, x='x', y='y')

# Add labels and title
plt.xlabel('Variable X')
plt.ylabel('Variable Y')
plt.title('Scatter Plot of Variable X vs. Y')

# Show plot
plt.show()
Q5. Explain how you would use Python to automate repetitive data processing tasks.

To automate repetitive data processing tasks in Python:

1. Identify the task: determine which part of the process is repetitive (e.g., data cleaning, transformation, or aggregation). For example, you might need to remove missing values or convert data formats in multiple files.

2. Write a Python function: create a function that accepts inputs (such as files or datasets) and processes them consistently. For example, a clean_data() function can remove missing values from a dataset.

3. Automate execution: use loops to apply the function to multiple datasets, or schedule the task to run automatically at specific times using tools like cron or Task Scheduler.

import pandas as pd
import os

def clean_data(file_path):
    df = pd.read_csv(file_path)
    df_cleaned = df.dropna()
    df_cleaned.to_csv(file_path.replace('.csv', '_cleaned.csv'), index=False)

folder_path = '/path/to/data'
for file_name in os.listdir(folder_path):
    if file_name.endswith('.csv'):
        clean_data(os.path.join(folder_path, file_name))

This approach automates data cleaning across multiple files efficiently.


Sangeetha A