Data Analytics Mid
SUBJECT TOPIC
1 Data Analytics Data Analytics
2 Data Analytics Data Analytics
3 Data Analytics Data Analytics
4 Data Analytics Data Analytics
5 Data Analytics Data Analytics
6 Data Analytics Data Analytics
7 Data Analytics Data Analytics
8 Data Analytics Data Analytics
9 Data Analytics Data Analytics
10 Data Analytics Data Analytics
11 Data Analytics Data Analytics
12 Data Analytics Data Analytics
13 Data Analytics Data Analytics
14 Data Analytics Data Analytics
15 Data Analytics Data Analytics
16 Data Analytics Data Analytics
17 Data Analytics Data Analytics
18 Data Analytics Data Analytics
19 Data Analytics Data Analytics
20 Data Analytics Data Analytics
21 Data Analytics Data Analytics
22 Data Analytics Data Analytics
23 Data Analytics Data Analytics
24 Data Analytics Data Analytics
25 Data Analytics Data Analytics
26 Data Analytics Data Analytics
27 Data Analytics Data Analytics
28 Data Analytics Data Analytics
29 Data Analytics Data Analytics
30 Data Analytics Data Analytics
TAGS QUESTION TYPE
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
Intermediate SINGLECORRECT
QUESTION TEXT
Which data visualization technique is best suited for comparing the distribution of a categorical variable
What statistical measure is used to measure the strength and direction of the linear relationship betwee
Which data transformation technique is used to rescale numerical features to a mean of 0 and a standard deviation of 1?
What type of statistical test is appropriate for comparing the means of two independent groups?
In a dataset with skewed distribution, which measure of central tendency is typically most appropriate t
Which data exploration technique involves identifying and handling missing values in a dataset?
Which statistical measure represents the dispersion of data values around the median?
What type of data visualization is useful for displaying the relationship between two continuous variable
Which statistical measure assesses the variability of data points from the regression line in a linear regr
What data preprocessing technique is used to convert categorical variables into numerical representati
In data analysis, what is the purpose of outlier detection?
What type of statistical test is used to determine if there is a significant association between two catego
Which data preprocessing technique involves removing duplicate records from a dataset?
What is the primary purpose of exploratory data analysis (EDA) in data analytics?
What is the key advantage of using dimensionality reduction techniques in data analysis?
Which statistical measure provides insight into the spread of data values around the mean in a normal d
What is the primary objective of feature scaling in machine learning and data analysis?
In data analysis, what is the role of cross-validation?
What data exploration technique is used to identify relationships and dependencies between variables i
Which statistical measure indicates the proportion of variance explained by the regression model's inde
What is the primary goal of feature engineering in data preprocessing?
What statistical test is used to determine if there are significant differences between the means of multiple grou
In machine learning, what is the primary purpose of hyperparameter tuning?
What data preprocessing technique is used to convert textual data into numerical representations for analysis?
In statistical hypothesis testing, what does the p-value represent?
What type of machine learning algorithm is used for classification tasks with discrete output categories?
What is the primary purpose of data imputation in data preprocessing?
Which statistical measure assesses the strength and direction of the relationship between two categorical variab
What type of data preprocessing technique is used to detect and handle outliers in a dataset?
In machine learning, what is the primary purpose of model evaluation?
OPTION1
Scatter plot
Mean
Min-Max scaling
Paired t-test
Mean
Outlier detection
Range
Scatter plot matrix
Mean squared error
One-hot encoding
To identify missing values
Chi-square test
Feature scaling
To summarize data
Increased computational complexity
Range
To improve model performance
To split data into training and testing sets
Correlation analysis
Mean squared error
To increase data complexity
Independent t-test
To train the model with optimal parameters
One-hot encoding
Type I error rate
Regression
To remove missing values
Chi-square test
Feature scaling
To select the most relevant features
OPTION2
Histogram
Median
Log transformation
One-way ANOVA
Median
Imputation
Interquartile range
Heatmap
R-squared
Principal component analysis
To find patterns
Independent t-test
Imputation
To make predictions
Improved interpretability
Variance
To reduce feature importance
To evaluate model performance
Feature engineering
R-squared
To improve model performance
ANOVA
To select the most relevant features
Text normalization
Type II error rate
Clustering
To handle categorical variables
Independent t-test
Imputation
To train the model with optimal parameters
OPTION3
Box plot
Mode
Z-score normalization
Independent t-test
Mode
Data cleansing
Standard deviation
Box plot
Standard error
Feature scaling
To detect unusual data points
ANOVA
Outlier detection
To identify patterns
Reduced data complexity
Interquartile range
To increase data complexity
To impute missing values
Outlier detection
Standard error
To remove missing values
Chi-square test
To scale input features
Feature scaling
Probability of observing the data
Decision tree
To reduce feature importance
ANOVA
Outlier detection
To scale input features
OPTION4
Bar chart
Correlation coefficient
Mean normalization
Chi-square test
Standard deviation
Feature scaling
Variance
Parallel coordinates
Residuals
Normalization
To calculate central tendency
Pearson correlation
Deduplication
To explore relationships
Enhanced feature importance
Standard deviation
To remove missing values
To identify outliers
Dimensionality reduction
Residuals
To normalize feature scales
Pearson correlation
To prevent overfitting
Principal component analysis
Significance level
Support vector machine
To replace missing values
Pearson correlation
Dimensionality reduction
To assess model performance
OPTION5 OPTION6 OPTION7 OPTION8 OPTION9 OPTION10 RIGHT ANSWER
4
4
3
3
2
2
2
1
4
1
3
1
4
3
2
4
1
2
1
2
2
2
1
2
3
3
4
1
3
4
EXPLANATION
Bar charts are effective for comparing the distribution of a categorical variable across multiple groups because th
The correlation coefficient, such as Pearson's correlation coefficient, is used to measure the strength and directio
Z-score normalization (also known as standardization) scales numerical features to have a mean of 0 and a stand
The independent t-test (also known as the two-sample t-test) is used to compare the means of two independent
In skewed distributions, the median is often the most appropriate measure of central tendency as it is less affect
Imputation is the process of replacing missing values in a dataset with estimated values based on the available d
The interquartile range (IQR) represents the dispersion of data values around the median by measuring the spre
A scatter plot matrix displays scatter plots for each pair of continuous variables in a dataset, allowing for the visu
Residuals represent the differences between observed and predicted values in a linear regression model, provid
One-hot encoding converts categorical variables into binary vectors, where each category is represented by a bin
Outlier detection aims to identify unusual or anomalous data points that deviate significantly from the rest of th
The chi-square test is used to determine if there is a significant association between two categorical variables by
Deduplication involves removing duplicate records from a dataset, ensuring that each observation is unique and
EDA is used to identify patterns, trends, and relationships within a dataset, providing insights that guide further a
Dimensionality reduction techniques reduce the number of features in a dataset while preserving important info
The standard deviation measures the spread of data values around the mean in a normal distribution, with highe
Feature scaling aims to standardize or normalize the scale of input features in a dataset, ensuring that all feature
Cross-validation is a technique used to assess the performance of machine learning models by partitioning the d
Correlation analysis is used to identify relationships and dependencies between variables in a dataset by measur
R-squared, also known as the coefficient of determination, indicates the proportion of variance in the dependen
Feature engineering involves creating new features or transforming existing ones to improve the performance o
ANOVA (Analysis of Variance) is used to determine if there are significant differences between the means of mul
Hyperparameter tuning involves selecting the optimal values for parameters that control the learning process of
Text normalization involves converting textual data into a standardized format by removing punctuation, conver
The p-value represents the probability of observing data at least as extreme as the data actually observed, assuming that the null hypothesis is true. A lower p
Decision trees are supervised learning algorithms used for classification tasks, where the output consists of discr
Data imputation involves replacing missing values in a dataset with estimated values based on the available data
The chi-square test assesses whether two categorical variables are associated (it gives no direction, and strength is usually reported with a follow-up measure such as Cramér's V) by
Outlier detection involves identifying and handling data points that deviate significantly from the rest of the data
Model evaluation involves assessing the performance of machine learning models on unseen data to determine
CORRECT MARKS NEGATIVE MARKS OTHER LANGUAGE QUESTION TEXT
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
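
WORKED EXAMPLE
Several of the explanations above describe measures that are easy to check by hand. The following is a minimal sketch, not part of the original question bank, of Z-score normalization, the interquartile range, one-hot encoding, Pearson's correlation coefficient, and R-squared computed from residuals. It uses only the Python standard library; the function names and the sample values are hypothetical and chosen purely for illustration.

```python
import math
import statistics


def z_score_normalize(values):
    """Rescale values to mean 0 and standard deviation 1 (population SD)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def interquartile_range(values):
    """Return Q3 - Q1, with quartiles from the 'inclusive' method."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def one_hot_encode(labels):
    """Map each label to a binary vector; columns follow sorted category order."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def r_squared(observed, predicted):
    """R-squared = 1 - SS_residual / SS_total."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    mean_obs = statistics.fmean(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
    print("z-scores:", z_score_normalize(sample))
    print("IQR:", interquartile_range(sample))
    print("one-hot:", one_hot_encode(["red", "green", "red"]))
    print("pearson r:", pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
    print("R^2:", r_squared([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5]))

    # A single extreme value pulls the mean far more than the median,
    # which is why the median is preferred for skewed data.
    skewed = [1, 2, 2, 3, 100]
    print("mean:", statistics.fmean(skewed), "median:", statistics.median(skewed))
```

The population standard deviation is used for the Z-score here; libraries differ on whether they divide by n or n - 1, and quartile conventions also vary, so results from other tools may differ slightly.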