Frequently Asked Data Analyst Questions by MAANG Companies
*Disclaimer*
Everyone learns uniquely.
Question - 1
of data fast
decision making.
Question - 2
Explain how you would approach cleaning a
dataset with 10% missing values.
To clean a dataset with 10% missing values, I would:
Assess the missing data : identify which columns contain missing values and how many records are affected
Decide on handling methods :
For numerical columns with many missing values, use imputation (mean, median, or model-based) or drop the affected rows/columns
For categorical variables, use mode imputation or create a new category such as “Unknown”
Validate the result : ensure that cleaning does not erase meaningful patterns or introduce bias (a small imputation sketch follows).
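A minimal SQL sketch of both imputation ideas, assuming a hypothetical orders table with a numeric amount column and a categorical region column (PERCENTILE_CONT syntax as in PostgreSQL):
sql
-- Median imputation for a numeric column, and an 'Unknown' bucket for a
-- categorical one; all table and column names here are illustrative.
WITH stats AS (
    SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median_amount
    FROM orders
    WHERE amount IS NOT NULL
)
SELECT
    COALESCE(o.amount, s.median_amount) AS amount_clean,
    COALESCE(o.region, 'Unknown')       AS region_clean
FROM orders o
CROSS JOIN stats s;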
Question - 3
How do you design an ETL pipeline for real-time
analytics?
To design an ETL (Extract, Transform, Load) pipeline for real-
time analytics:
Extract : pull data from real-time sources such as message queues (e.g., Kafka) or streaming APIs
Transform : apply operations such as filtering, aggregation, or enrichment on the fly, typically with a stream-processing engine (Apache Flink, Spark Streaming, etc.)
Load : write the transformed data to real-time analytics storage, such as a real-time data warehouse like AWS Redshift or Google BigQuery.
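Stream-processing engines usually expose a SQL layer for the Transform step. A hedged Flink SQL sketch, assuming a hypothetical page_views source table (declared elsewhere with a Kafka connector and a watermark on view_time):
sql
-- Count page views per URL in one-minute tumbling windows; all names
-- here are illustrative, not from the original text.
SELECT
    url,
    TUMBLE_START(view_time, INTERVAL '1' MINUTE) AS window_start,
    COUNT(*) AS views
FROM page_views
GROUP BY url, TUMBLE(view_time, INTERVAL '1' MINUTE);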
Question - 4
How do you ensure data quality in a project?
To ensure data quality in a project, I focus on:
Clear Data Collection Standards : define and document guidelines for how data is collected, and make sure the process follows them
Data Validation : validate incoming data, usually with automated checks, and run those checks frequently
Data Cleaning : remove duplicates and irrelevant records
Timely Updates : refresh the data regularly so it stays current
Regular Audits : review the data periodically to confirm it is accurate and complete (a small automated check is sketched below).
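As one illustration of an automated validation check, a SQL sketch assuming a hypothetical customers table (FILTER syntax as in PostgreSQL):
sql
-- Count missing emails and duplicate IDs that a scheduled quality check
-- might report; table and column names are illustrative.
SELECT
    COUNT(*) FILTER (WHERE email IS NULL)  AS missing_emails,
    COUNT(*) - COUNT(DISTINCT customer_id) AS duplicate_ids
FROM customers;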
Question - 5
hypothesis testing?
hypothesis.
Question - 6
Question - 7
Explain how you would optimize a SQL query for
large datasets.
To optimize a SQL query for large datasets:
Use indexes : index the columns used in WHERE filters, JOIN conditions, and ORDER BY clauses
Limit the result set : return only the rows you need with LIMIT (or TOP in SQL Server); a short example follows.
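A minimal sketch of both ideas, assuming a hypothetical orders table:
sql
-- Index the column used for filtering, then cap the result set.
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

SELECT order_id, amount
FROM orders
WHERE customer_id = 42
ORDER BY order_date DESC
LIMIT 100;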
Question - 8
How do you handle skewed data distributions?
To handle skewed data distributions, you can use techniques
like:
Log Transformation : for skewed continuous variables, apply a log or square-root transformation
Winsorization : cap variables at chosen maximum and minimum values to limit the effect of outliers
Resampling : undersample or oversample when the data are imbalanced
Model Selection : prefer algorithms that are less affected by skewed data, such as tree-based models.
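For instance, a log transformation can be applied directly in SQL; a sketch assuming a hypothetical sales table with a right-skewed revenue column:
sql
-- LN requires strictly positive input, hence the +1 shift for zero values.
SELECT revenue, LN(revenue + 1) AS log_revenue
FROM sales;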
Question - 9
What is a Type I error and Type II error? Give
examples
In hypothesis testing:
A Type I error (false positive) occurs when the null hypothesis is rejected even though it is actually true.
Example: An X-ray wrongly suggests that a person free of a certain disease has the disease
A Type II error (false negative) occurs when you fail to reject a null hypothesis that is actually false.
Example: A medical test fails to detect a disease in a person who, in reality, has it.
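In the standard notation (a well-known convention, not from the original text), the two error rates are written as:

α = P(reject H₀ | H₀ is true) (Type I error rate)
β = P(fail to reject H₀ | H₀ is false) (Type II error rate)

with 1 − β called the power of the test.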
Question - 10
What is the difference between LEFT JOIN and
FULL OUTER JOIN in SQL?
In SQL:
LEFT JOIN : returns all records from the left table along with the matching records from the right table; where there is no match, NULL values are returned for the columns of the right table
FULL OUTER JOIN : returns all records from both tables, matched where possible; unmatched rows from either the left or the right table appear with NULL values in the other table’s columns.
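A minimal sketch, assuming hypothetical customers and orders tables (note that some engines, such as MySQL, do not support FULL OUTER JOIN directly):
sql
-- LEFT JOIN: every customer, with NULL order columns for customers
-- that have no orders.
SELECT c.customer_id, o.order_id
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id;

-- FULL OUTER JOIN: every customer and every order, matched where possible.
SELECT c.customer_id, o.order_id
FROM customers c
FULL OUTER JOIN orders o ON o.customer_id = c.customer_id;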
Question - 11
product performance?
key features:
a glance
product/s)
Question - 12
How do you decide between RDBMS and NoSQL
for a project?
RDBMS (e.g., MySQL, PostgreSQL) : a good fit for structured data, complex relationships between tables, and workloads that need reliable (ACID) transactions
NoSQL (e.g., MongoDB, Cassandra) : a good fit for unstructured or semi-structured data, for workloads that must scale out as the business grows, and for schemas that may evolve over time.
Question - 13
Explain the concept of data normalization in
databases.
Data normalization in databases is the process of organizing data to reduce redundancy and improve consistency. It involves breaking data into smaller, more focused tables and establishing relationships between them. This makes the database more flexible and easier to maintain than one large, complicated structure.
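A minimal sketch of the idea with hypothetical tables: instead of repeating customer details on every order row, customers get their own table and orders reference it by key:
sql
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name        TEXT,
    email       TEXT
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT REFERENCES customers (customer_id),  -- one copy per customer
    amount      DECIMAL(10, 2)
);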
Question - 14
How do you detect and handle outliers in a
dataset?
Detect Outliers :
Use graphical methods such as box plots or scatter plots
Use statistical methods such as the IQR rule or Z-scores
Handle Outliers :
Remove : drop outliers that come from data entry errors or are irrelevant to the analysis
Transform : reduce their impact with methods such as a log transformation
Cap/Impute : replace outliers with a capped maximum or the median value (an IQR sketch follows).
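A hedged IQR sketch, assuming a hypothetical sales table (PERCENTILE_CONT syntax as in PostgreSQL):
sql
-- Flag rows outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
WITH bounds AS (
    SELECT
        PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) AS q1,
        PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount) AS q3
    FROM sales
)
SELECT s.*
FROM sales s
CROSS JOIN bounds b
WHERE s.amount < b.q1 - 1.5 * (b.q3 - b.q1)
   OR s.amount > b.q3 + 1.5 * (b.q3 - b.q1);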
Question - 15
How do you approach A/B testing?
I approach A/B testing by starting with a specific objective, for example increasing conversion. I then split the audience into two groups: one sees the current version (control) and the other sees the changed version (variation). I make sure the test runs long enough to collect adequate data, then use statistical testing to determine which version performed better.
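A minimal sketch of the comparison step, assuming a hypothetical experiments table with one row per user, a variant column, and a 0/1 converted flag:
sql
-- Conversion rate per group; statistical significance would still need
-- a proper test on top of these numbers.
SELECT
    variant,
    COUNT(*)       AS users,
    AVG(converted) AS conversion_rate
FROM experiments
GROUP BY variant;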
Question - 16
What is the difference between batch
processing and stream processing?
Batch Processing : processes a large volume of data collected over a period, on a schedule rather than interactively (for instance, preparing daily, weekly, or monthly reports)
Stream Processing : analyzes data in real time, as it is produced, so you can act on it immediately (for instance, monitoring live traffic on a website).
Question - 17
How do you optimize joins in SQL queries?
To optimize joins in SQL queries:
Use Proper Indexing : make sure the columns used in the join conditions are indexed
Filter Early : apply filters in the WHERE or ON clause to reduce the dataset before joining
Choose the Right Join Type : prefer INNER JOIN where the logic allows, as it is typically faster than an OUTER JOIN; a short example follows.
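A minimal sketch, assuming hypothetical customers and orders tables with an index on orders.customer_id:
sql
SELECT c.customer_id, o.order_id, o.amount
FROM customers c
INNER JOIN orders o
    ON o.customer_id = c.customer_id      -- join on an indexed key
WHERE o.order_date >= DATE '2024-01-01';  -- filter early to shrink the join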
Question - 18
How do you design a data warehouse with a
star schema?
To design a data warehouse with a star schema :
Identify the Business Process : decide which process you want to model, for instance sales or inventory
Define the Fact Table : create a central table holding the numeric measures, such as sales amount or units sold
Define Dimension Tables : create separate tables for the descriptive attributes (time, product, customer, location, etc.)
Establish Relationships : link the fact table to each dimension table through primary key/foreign key relationships
Optimize for Queries : keep the schema simple and flat so the queries you need stay fast (a sketch follows).
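A minimal star-schema sketch with hypothetical table and column names:
sql
CREATE TABLE dim_date (
    date_key  INT PRIMARY KEY,
    full_date DATE,
    month     INT,
    year      INT
);

CREATE TABLE dim_product (
    product_key  INT PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);

-- The fact table holds the numeric measures and points at each dimension.
CREATE TABLE fact_sales (
    sale_id      BIGINT PRIMARY KEY,
    date_key     INT REFERENCES dim_date (date_key),
    product_key  INT REFERENCES dim_product (product_key),
    units_sold   INT,
    sales_amount DECIMAL(12, 2)
);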
Question - 19
How would you calculate the 90th percentile of
sales in SQL?
The 90th percentile of sales can be computed with the built-in PERCENTILE_CONT function, which returns a percentile within a given set of values arranged in a specified order.
sql
SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY
sales) AS percentile_90
FROM sales_table;
This will return the 90th percentile of the sales column from
the sales_table.
Question - 20
What is the difference between a Snowflake
schema and a Star schema?
A Star schema is a simple data model with a central fact table connected directly to one or more dimension tables, forming a star-like structure
A Snowflake schema is a bit more complex: it resembles the Star schema, but the dimension tables are normalized into several related tables, creating a ‘snowflake’ structure.
Question - 21
What is the role of indexing in databases?
Database indexing speeds up data retrieval by maintaining a separate lookup structure over one or more columns (such as a hash table or a B-tree), so the database does not have to scan the entire table to find a particular row. Like an index in a book, it helps you find things faster.
Question - 22
How would you calculate churn rate in SQL?
To calculate churn rate in SQL:
Count the total customers at the beginning of the period (total_customers_start)
Count the churned customers, i.e., the number of customers who left during the period (churned_customers).
Use the formula :
sql
SELECT
    (CAST(churned.churned_customers AS FLOAT)
     / start.total_customers_start) * 100 AS churn_rate
FROM (
    SELECT COUNT(*) AS churned_customers
    FROM customers
    WHERE churned_at IS NOT NULL  -- placeholder condition; depends on the schema
) AS churned
CROSS JOIN (
    SELECT COUNT(*) AS total_customers_start
    FROM customers
) AS start;
Question - 23
How do you decide between using Python or
SQL for a data task?
Use Python when the task involves heavy computation, advanced analytics or machine learning, or processing of free-form data. Prefer SQL when the task is to work directly with a relational database: filtering, aggregating, or joining large tables through queries.
Question - 24
What is the difference between supervised and unsupervised learning?
Supervised Learning : trains a model on labeled data, where each input comes with a known target (for example, predicting churn from historical outcomes)
Unsupervised Learning : finds patterns in unlabeled data with no predefined targets (for example, clustering customers into segments).
Question - 25
How do you prioritize tasks in a data analytics
project?
Prioritize tasks in a data analytics project using these steps :
Define Objectives : understand what the project is about and which key questions it must answer
Assess Impact : focus first on the tasks that are most critical or valuable
Sequence Dependencies : complete foundational tasks before the analytical steps that depend on them
Allocate Resources : match tasks to team members’ skills and the resources at their disposal
Set Timelines : break the project into stages with target dates
Iterate : review the results and update the plan as the project develops.
Question - 26
How do you decide which visualization to use
for a given dataset?
To decide on a visualization :
Understand Your Data : consider the type of data (categorical or numerical) and the kind of relationship you need to show, namely comparison, distribution, trends, or composition
Define Your Goal : be clear about what you need to portray, for example temporal changes, relative sizes, or relationships
Choose the Right Chart :
Comparison : Bar chart, line chart
Distribution : Histogram, box plot
Trends : Line chart
Composition : Pie chart, stacked bar chart
Relationships : Scatter plot, bubble chart.
Question - 27
What is the difference between UNION and
UNION ALL in SQL?
The key difference between UNION and UNION ALL in SQL is :
UNION : combines the results of two queries and removes duplicate rows
UNION ALL : combines the results and keeps every row, including duplicates, which makes it faster since no de-duplication step is needed.
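A minimal sketch, assuming two hypothetical tables whose email columns overlap:
sql
SELECT email FROM newsletter_signups
UNION          -- duplicates removed
SELECT email FROM purchasers;

SELECT email FROM newsletter_signups
UNION ALL      -- duplicates kept; no de-duplication cost
SELECT email FROM purchasers;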
Question - 28
How do you ensure the scalability of a data
pipeline?
To ensure scalability in a data pipeline :
Distributed Processing : handle large datasets with distributed platforms such as Apache Spark or Kafka
Horizontal Scaling : add machines (nodes) to absorb increased workload
Modular Design : build the pipeline as independent, composable steps so each can scale on its own
Auto-scaling : use cloud services that scale up automatically with load
Optimized Storage : use scalable, low-cost cloud object storage such as S3 or GCS
Monitoring and Load Balancing : monitor performance continuously and keep the load evenly distributed.
Question - 29
How do you handle correlated variables in
predictive modeling?
To handle correlated variables in predictive modeling :
Identify Correlation : use a correlation matrix to find highly correlated variables, for instance a Pearson correlation coefficient above 0.8
Remove Redundancy : keep one variable and drop the others that carry essentially the same information
Use Regularization : methods such as Lasso or Ridge regression manage correlation by penalizing less important features
Dimensionality Reduction : use methods such as PCA to replace several related variables with orthogonal components
Domain Knowledge : keep the variable that best fits the business problem (a small correlation check is sketched below).
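Correlation can be checked directly in SQL where the engine supports it; a sketch assuming a hypothetical features table (the CORR aggregate is available in e.g. PostgreSQL):
sql
-- Pairwise Pearson correlation between two candidate predictors.
SELECT CORR(income, spend) AS income_spend_corr
FROM features;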
Question - 30
Explain the difference between rank() and
dense_rank() in SQL.
The key difference between RANK() and DENSE_RANK() in
SQL lies in how they handle ranking when there are ties :
RANK() : may leave gaps in the ranking when there are ties. For example, if two rows share rank 1, the next row gets rank 3 (1, 1, 3)
DENSE_RANK() : does not create gaps. If two rows tie for rank 1, the next row gets rank 2 (1, 1, 2)
Both assign each row a rank according to the specified ordering; the sketch below shows the tie behavior.
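A minimal sketch, assuming a hypothetical scores table:
sql
-- On a tie for the top score: RANK gives 1, 1, 3; DENSE_RANK gives 1, 1, 2.
SELECT
    player,
    score,
    RANK()       OVER (ORDER BY score DESC) AS rnk,
    DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
FROM scores;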
Why Bosscoder?
2200+ Alumni placed at Top Product-based companies.