HCLTech
SQL Questions
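Q1. Write a query to find the second highest salary in an employee table.
WITH RankedSalaries AS (
  SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank -- ranking function assumed
  FROM employees
)
SELECT DISTINCT salary FROM RankedSalaries
WHERE salary_rank = 2; -- use N in place of 2 for the Nth highest salary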
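One way to check the ranking approach end to end is to run it against a small in-memory SQLite database. The sketch below assumes SQLite 3.25 or later (needed for window functions) and uses DENSE_RANK so that tied salaries share a rank. The sample rows and the helper name nth_highest_salary are invented for illustration.

```python
import sqlite3


def nth_highest_salary(conn, n):
    """Return the Nth highest distinct salary, or None if fewer than n exist."""
    row = conn.execute(
        """
        WITH RankedSalaries AS (
            SELECT salary,
                   DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
            FROM employees
        )
        SELECT DISTINCT salary
        FROM RankedSalaries
        WHERE salary_rank = ?
        """,
        (n,),
    ).fetchone()
    return row[0] if row else None


# Hypothetical sample data, including a tie at the top salary.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees (name, salary) VALUES (?, ?)",
    [("Asha", 90000), ("Ben", 90000), ("Chitra", 75000), ("Dev", 60000)],
)

second = nth_highest_salary(conn, 2)
print(second)
```

Because of the tie at 90000, RANK() would assign ranks 1, 1, 3, 4 and find no row with rank 2, which is why DENSE_RANK is the safer choice for this question.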
Q2. How do you identify duplicate rows in a table and delete them?
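WITH cte AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY column1, column2, ... ORDER BY id) AS row_num
  FROM your_table
)
DELETE FROM cte WHERE row_num > 1; -- assumes SQL Server, which allows deleting through a CTE; keeps the lowest id per group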
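Deleting through a CTE is not supported by every database, so here is a hedged, more portable sketch of the same two steps (identify, then delete) using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later, a table with an id primary key, and two hypothetical columns that define what counts as a duplicate. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT)"
)
conn.executemany(
    "INSERT INTO your_table (column1, column2) VALUES (?, ?)",
    [("a", "x"), ("a", "x"), ("b", "y"), ("a", "x"), ("b", "z")],
)

# Step 1: identify groups of rows that share the same column values.
duplicates = conn.execute(
    """
    SELECT column1, column2, COUNT(*) AS copies
    FROM your_table
    GROUP BY column1, column2
    HAVING COUNT(*) > 1
    """
).fetchall()

# Step 2: delete every row except the lowest id in each group.
conn.execute(
    """
    DELETE FROM your_table
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY column1, column2 ORDER BY id
                   ) AS row_num
            FROM your_table
        ) AS numbered
        WHERE row_num > 1
    )
    """
)

remaining = conn.execute(
    "SELECT id, column1, column2 FROM your_table ORDER BY id"
).fetchall()
print(duplicates)
print(remaining)
```

Running the identification query first is a cheap way to review what will be removed before issuing the DELETE.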
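Q3. Write a query to calculate the total sales for each product, sorted from highest to lowest.
SELECT
p.product_name,
SUM(o.amount) AS total_sales -- o.amount is an assumed column name
FROM
orders o
JOIN products p ON p.product_id = o.product_id -- assumed table and key names
GROUP BY
p.product_name
ORDER BY
total_sales DESC;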
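The query above groups by p.product_name and sorts by total_sales in descending order. A minimal runnable sketch of that pattern follows, using sqlite3. The products table, the product_id join key, the amount column, and all sample rows are assumptions made for illustration, since the source does not show the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: the real column names may differ.
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        product_id INTEGER,
        amount REAL
    );
    INSERT INTO products VALUES (1, 'Keyboard'), (2, 'Monitor'), (3, 'Mouse');
    INSERT INTO orders (product_id, amount) VALUES
        (1, 40.0), (2, 150.0), (1, 45.0), (3, 20.0), (2, 175.0);
    """
)

total_sales = conn.execute(
    """
    SELECT p.product_name, SUM(o.amount) AS total_sales
    FROM orders o
    JOIN products p ON p.product_id = o.product_id
    GROUP BY p.product_name
    ORDER BY total_sales DESC
    """
).fetchall()

for product_name, total in total_sales:
    print(product_name, total)
```

An inner join drops products that have no orders. A LEFT JOIN from products with COALESCE(SUM(o.amount), 0) would list them with a total of zero instead.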
Q4. How do you retrieve the top N records for each category in a dataset?
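WITH ranked_data AS (
SELECT
category,
value,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY value DESC) AS row_num -- ordering by value is assumed
FROM
your_table
)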
SELECT
category,
value
FROM
ranked_data
WHERE
row_num <= N;
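To see the row_num filter in action, the following sketch runs the top-N-per-category pattern against an in-memory SQLite database (3.25 or later). It assumes "top" means the largest value within each category. The helper name top_n_per_category and the sample rows are hypothetical.

```python
import sqlite3


def top_n_per_category(conn, n):
    """Return the n largest values in each category as (category, value) rows."""
    return conn.execute(
        """
        WITH ranked_data AS (
            SELECT category,
                   value,
                   ROW_NUMBER() OVER (
                       PARTITION BY category ORDER BY value DESC
                   ) AS row_num
            FROM your_table
        )
        SELECT category, value
        FROM ranked_data
        WHERE row_num <= ?
        ORDER BY category, value DESC
        """,
        (n,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (category TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("fruit", 10), ("fruit", 30), ("fruit", 20), ("veg", 5), ("veg", 15)],
)

top_two = top_n_per_category(conn, 2)
print(top_two)
```

ROW_NUMBER always returns exactly N rows per category when enough rows exist. If tied values should all be kept, RANK or DENSE_RANK is the usual substitute.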
Q5. Write a query to join two tables and fetch records that exist in one table but not the other.
SELECT
t1.*
FROM
table1 t1
LEFT JOIN table2 t2 ON t1.id = t2.id -- table2 and the id join key are assumed
WHERE
t2.id IS NULL;
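The IS NULL filter only works when the second table is attached with a LEFT JOIN, so unmatched rows survive with NULLs on the right side. As a rough illustration, the sketch below runs that anti-join next to an equivalent NOT EXISTS form in SQLite. The table contents and the label column are invented, and joining on id is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, label TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO table2 VALUES (2, 'b'), (4, 'd'), (5, 'e');
    """
)

# Anti-join: keep table1 rows that found no partner in table2.
left_join_rows = conn.execute(
    """
    SELECT t1.*
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.id = t2.id
    WHERE t2.id IS NULL
    ORDER BY t1.id
    """
).fetchall()

# Same result expressed with NOT EXISTS.
not_exists_rows = conn.execute(
    """
    SELECT t1.*
    FROM table1 t1
    WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id)
    ORDER BY t1.id
    """
).fetchall()

print(left_join_rows)
print(not_exists_rows)
```

Both forms return the rows present only in table1. Swapping the table roles gives the rows present only in table2.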
Python Questions
Q1. Write a Python function to calculate the mean and median of a dataset.
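import numpy as np

def calculate_mean_median_np(data):
    mean = np.mean(data)      # arithmetic average
    median = np.median(data)  # middle value of the sorted data
    print(f"Mean: {mean}")
    print(f"Median: {median}")
    return mean, median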
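When NumPy is not available, the same calculation can be sketched with Python's standard statistics module. The function name calculate_mean_median and the sample numbers below are illustrative, and the empty-input check is an added assumption about how the function should behave.

```python
import statistics


def calculate_mean_median(data):
    """Return (mean, median) for a non-empty list of numbers."""
    if not data:
        raise ValueError("data must contain at least one value")
    return statistics.mean(data), statistics.median(data)


# Even number of values: the median is the average of the two middle values.
mean, median = calculate_mean_median([3, 1, 4, 1, 5, 9])
print(f"Mean: {mean}")
print(f"Median: {median}")
```

Returning the values rather than only printing them makes the function easier to reuse and to test.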
Q2. How would you clean and preprocess a dataset with missing values?
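import pandas as pd
import numpy as np
data = {'A': [1, 2, np.nan, 4], 'B': [5, np.nan, 7, 8]}  # illustrative placeholder data
df = pd.DataFrame(data)
print(df.isnull())  # True marks a missing value
df_filled = df.fillna(df.mean(numeric_only=True))  # fill numeric gaps with the column mean
print(df_filled)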
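Q3. Write a Python script to fetch data from a SQL database using pandas.
import pandas as pd
from sqlalchemy import create_engine
db_url = 'sqlite:///your_database.db'  # placeholder; replace with your connection URL
engine = create_engine(db_url)
query = 'SELECT * FROM your_table_name' # Replace with your actual SQL query
df = pd.read_sql(query, engine)
print(df.head())  # preview the first rows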
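Q4. Write a Python script to plot the relationship between two variables.
import pandas as pd
import matplotlib.pyplot as plt
data = {'x': [1, 2, 3, 4, 5], 'y': [2, 4, 5, 4, 6]}  # illustrative placeholder data
df = pd.DataFrame(data)
plt.scatter(df['x'], df['y'])  # assumed plot type: scatter
plt.xlabel('Variable X')
plt.ylabel('Variable Y')
# Show plot
plt.show()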
Q5. Explain how you would use Python to automate repetitive data processing tasks.
3. Automate Execution:
Use loops to apply the function to multiple datasets, or schedule the task to run automatically at specific times using tools like cron or Task Scheduler.
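import pandas as pd
import os

def clean_data(file_path):
    df = pd.read_csv(file_path)
    df_cleaned = df.dropna()  # drop rows with missing values
    return df_cleaned

folder_path = '/path/to/data'
for file_name in os.listdir(folder_path):  # visit every file in the folder
    if file_name.endswith('.csv'):
        clean_data(os.path.join(folder_path, file_name))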
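The loop-over-a-folder idea can also be sketched with the standard library alone, which helps when the script runs on a scheduler host without pandas. The version below swaps in the csv module and writes each cleaned file to a separate output folder. The function names clean_csv and clean_folder, the rule "skip rows with any empty field", and the example paths are all assumptions for illustration.

```python
import csv
from pathlib import Path


def clean_csv(src, dst):
    """Copy src to dst, skipping rows with any empty field. Returns rows kept."""
    kept = 0
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader, None)
        if header is None:  # empty file: nothing to clean
            return 0
        writer.writerow(header)
        for row in reader:
            # Keep only complete rows whose cells are all non-blank.
            if len(row) == len(header) and all(cell.strip() for cell in row):
                writer.writerow(row)
                kept += 1
    return kept


def clean_folder(folder_path, output_path):
    """Clean every .csv file in folder_path and write results to output_path."""
    folder, out = Path(folder_path), Path(output_path)
    out.mkdir(parents=True, exist_ok=True)
    results = {}
    for src in sorted(folder.glob("*.csv")):
        results[src.name] = clean_csv(src, out / src.name)
    return results


# Example call (paths are placeholders):
# summary = clean_folder("/path/to/data", "/path/to/cleaned")
```

Wrapping the loop in a function that returns a per-file summary makes it straightforward to call from a scheduled job and to log what each run did.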
Sangeetha A