The document contains a series of interview questions and answers for data analyst positions, focusing on SQL and Python. It includes SQL queries for finding the second highest salary, identifying and deleting duplicates, calculating sales percentages, retrieving top records by category, and joining tables. Additionally, it covers Python functions for calculating mean and median, handling missing values, fetching SQL data with pandas, visualizing correlations, and automating data processing tasks.


HCLTech

Data Analyst Interview Questions & Answers

SQL Questions

Q1. Write a query to find the second highest salary in an employee table?

WITH RankedSalaries AS (
    SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
    FROM employees
)
SELECT DISTINCT salary FROM RankedSalaries
WHERE salary_rank = 2;  -- rank 2 = second highest; DENSE_RANK gives tied salaries the same rank
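A quick way to sanity-check this pattern is to run it against an in-memory SQLite database from Python (window functions need SQLite 3.25 or newer, which ships with recent Python builds); the employee names and salaries below are made-up sample data:

```python
import sqlite3

# Build a throwaway employees table with hypothetical data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("A", 90000), ("B", 120000), ("C", 120000), ("D", 80000)],
)

# DENSE_RANK assigns both 120000 rows rank 1, so rank 2 is the
# second highest *distinct* salary.
rows = conn.execute(
    """
    WITH RankedSalaries AS (
        SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
        FROM employees
    )
    SELECT DISTINCT salary FROM RankedSalaries WHERE salary_rank = 2
    """
).fetchall()
second_highest = rows[0][0]
print(second_highest)  # 90000
conn.close()
```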

Q2. How do you identify duplicate rows in a table and delete them?

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY column1, column2, ... ORDER BY id) AS row_num
    FROM your_table
)
DELETE FROM cte WHERE row_num > 1;
-- Note: deleting through a CTE like this works in SQL Server; MySQL and
-- PostgreSQL need a self-join or DELETE ... USING keyed on a unique column.
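When the data is already in pandas rather than in the database, `drop_duplicates` gives the same keep-the-first-row behaviour as the `row_num > 1` filter; the frame below is a hypothetical example:

```python
import pandas as pd

# Hypothetical table where ids 1 and 2 are duplicates on (column1, column2).
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "column1": ["a", "a", "b", "b"],
    "column2": [10, 10, 20, 30],
})

# keep="first" mirrors ROW_NUMBER() ... ORDER BY id with row_num > 1 deleted:
# the lowest-id row in each duplicate group survives.
deduped = df.drop_duplicates(subset=["column1", "column2"], keep="first")
print(list(deduped["id"]))  # [1, 3, 4]
```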


Q3. Write a query to calculate the percentage of sales for each product?

SELECT
    p.product_name,
    SUM(o.quantity * o.unit_price) AS total_sales,
    (SUM(o.quantity * o.unit_price) * 100.0) /
        (SELECT SUM(quantity * unit_price) FROM orders) AS sales_percentage
FROM orders o
INNER JOIN products p ON o.product_id = p.product_id
GROUP BY p.product_name
ORDER BY total_sales DESC;
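The same per-product percentage can be computed in pandas with a groupby, which is handy for cross-checking the SQL result; the order rows here are invented sample data:

```python
import pandas as pd

# Made-up orders already joined to product names, mirroring the query above.
orders = pd.DataFrame({
    "product_name": ["pen", "pen", "book"],
    "quantity": [2, 3, 1],
    "unit_price": [10.0, 10.0, 50.0],
})
orders["line_total"] = orders["quantity"] * orders["unit_price"]

# Per-product totals, then each as a share of the grand total.
sales = orders.groupby("product_name")["line_total"].sum()
pct = sales * 100.0 / sales.sum()
print(pct.sort_values(ascending=False))  # pen 50.0, book 50.0
```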
Q4. How do you retrieve the top N records for each category in a dataset?

WITH ranked_data AS (
    SELECT
        category,
        value,
        ROW_NUMBER() OVER (PARTITION BY category ORDER BY value DESC) AS row_num
    FROM your_table
)
SELECT category, value
FROM ranked_data
WHERE row_num <= N;
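In pandas, the equivalent of this ROW_NUMBER pattern is sorting by value and then taking `groupby().head(N)`; the category/value data below is made up for illustration:

```python
import pandas as pd

# Hypothetical data: top N=2 values per category.
df = pd.DataFrame({
    "category": ["x", "x", "x", "y", "y"],
    "value": [5, 9, 7, 3, 8],
})
N = 2

# sort + groupby().head(N) matches
# ROW_NUMBER() OVER (PARTITION BY category ORDER BY value DESC) <= N.
top_n = (
    df.sort_values("value", ascending=False)
      .groupby("category")
      .head(N)
)
print(top_n)  # x keeps 9 and 7; y keeps 8 and 3
```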
Q5. Write a query to join two tables and fetch records that exist in one table but not the other?

SELECT t1.*
FROM table1 t1
LEFT JOIN table2 t2 ON t1.id = t2.id
WHERE t2.id IS NULL;
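The pandas counterpart of this anti-join is a left merge with `indicator=True`, keeping only the `left_only` rows; `table1` and `table2` below are toy frames:

```python
import pandas as pd

# Hypothetical tables: find rows in table1 with no match in table2.
table1 = pd.DataFrame({"id": [1, 2, 3], "val": ["a", "b", "c"]})
table2 = pd.DataFrame({"id": [2, 3], "other": ["x", "y"]})

# indicator=True adds a _merge column marking where each row matched.
merged = table1.merge(table2[["id"]], on="id", how="left", indicator=True)
only_in_t1 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_in_t1)  # only the row with id 1
```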

Python Questions

Q1. Write a Python function to calculate the mean and median of a dataset.

import numpy as np

def calculate_mean_median_np(data):
    if len(data) == 0:  # guard against an empty dataset
        raise ValueError("Data is empty")
    mean = np.mean(data)      # mean via NumPy
    median = np.median(data)  # median via NumPy
    return mean, median

data = [1, 2, 3, 4, 5, 6]
mean, median = calculate_mean_median_np(data)
print(f"Mean: {mean}")      # Mean: 3.5
print(f"Median: {median}")  # Median: 3.5
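For a quick cross-check without NumPy, the standard-library statistics module gives the same results on the sample data above:

```python
import statistics

# Same sample dataset as the NumPy version.
data = [1, 2, 3, 4, 5, 6]
mean = statistics.mean(data)      # 3.5
median = statistics.median(data)  # average of the two middle values: 3.5
print(mean, median)
```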
Q2. How would you clean and preprocess a dataset with missing values?

import pandas as pd

# Sample data with missing values
data = {'col1': [1, 2, None, 4, 5],
        'col2': [None, 2, 3, 4, None]}
df = pd.DataFrame(data)

# 1. Check for missing values
print(df.isnull())

# 2. Fill missing values with the column mean
df_filled = df.fillna(df.mean())
print(df_filled)
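Mean-filling is only one option; depending on the data you may prefer the median (more robust to outliers) or dropping incomplete rows entirely. A small sketch using the same shape of data as above:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, None, 4, 5],
                   'col2': [None, 2, 3, 4, None]})

# Median fill: median of col1 is 3.0, of col2 is 3.0 (NaNs are skipped).
filled = df.fillna(df.median())

# Or drop any row that still has a missing value.
dropped = df.dropna()
print(filled['col1'].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(len(dropped))             # 2 fully populated rows remain
```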
Q3. Write a Python script to fetch data from a SQL database using pandas?

import pandas as pd
from sqlalchemy import create_engine

# Database connection details (replace with your actual database info)
db_url = 'mysql+pymysql://username:password@host:port/database_name'

# Create the database engine
engine = create_engine(db_url)

# SQL query to fetch data (replace with your actual query)
query = 'SELECT * FROM your_table_name'

# Fetch data into a DataFrame
df = pd.read_sql(query, engine)

# Display the first rows (in Jupyter, df.head() alone renders nicely)
print(df.head())


Q4. How would you visualize the correlation between two variables in a dataset?

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Example data (replace with your own dataset)
data = {'x': [1, 2, 3, 4, 5], 'y': [2, 4, 5, 4, 5]}
df = pd.DataFrame(data)

# Quantify the relationship before plotting
print(df['x'].corr(df['y']))  # Pearson correlation coefficient

# Create scatter plot with seaborn
sns.scatterplot(data=df, x='x', y='y')

# Add labels and title
plt.xlabel('Variable X')
plt.ylabel('Variable Y')
plt.title('Scatter Plot of Variable X vs. Y')

# Show plot
plt.show()
Q5. Explain how you would use Python to automate repetitive data processing tasks.

To automate repetitive data processing tasks in Python:

1. Identify the task: determine which part of the process is repetitive (e.g., data cleaning, transformation, or aggregation). For example, you might need to remove missing values or convert data formats in multiple files.

2. Write a Python function: create a function that accepts inputs (such as files or datasets) and processes them consistently. For example, a clean_data() function can remove missing values from a dataset.

3. Automate execution: use loops to apply the function to multiple datasets, or schedule the task to run automatically at specific times using tools like cron or Task Scheduler.

import pandas as pd
import os

def clean_data(file_path):
    df = pd.read_csv(file_path)
    df_cleaned = df.dropna()
    df_cleaned.to_csv(file_path.replace('.csv', '_cleaned.csv'), index=False)

folder_path = '/path/to/data'
for file_name in os.listdir(folder_path):
    if file_name.endswith('.csv'):
        clean_data(os.path.join(folder_path, file_name))

This approach automates data cleaning across multiple files efficiently.


Sangeetha A