Total Marks (15 Qns × 1 Mark = 15 Marks): Business Intelligence and Analytics Assignment Week 1
ASSIGNMENT WEEK 1:
1. What concept does the phrase "turning data tombs into 'golden nuggets' of knowledge"
signify with respect to data mining? (1 mark)
a) The transformation of extensive data reserves into valuable insights and knowledge.
b) The replacement of conventional data repositories with intuitive decision-making
tools.
c) The extraction of specific data sets for expert systems' utilization.
d) The integration of data archives with cutting-edge data mining technologies.
2. Which step involves the extraction of data patterns using intelligent methods? (1 mark)
a) Data cleaning
b) Data integration
c) Data selection
d) Data mining
3. What is the primary purpose of data mining in the context of the data age? (1 mark)
5. What does the architecture of a data warehouse primarily aim to facilitate? (1 mark)
a) Data cleaning and integration
b) Advanced query optimization
c) Management decision making
d) Real-time data processing
6. What is the primary advantage of using data warehouse systems for OLAP? (1 mark)
8. How does data mining benefit from scalable database technologies? (1 mark)
10. Which phase in the knowledge discovery process involves the removal of noise and
inconsistent data? (1 mark)
a) Data integration
b) Data transformation
c) Data cleaning
d) Data selection
11. In the context of data preprocessing, what is the purpose of data transformation?
(1 mark)
13. Which of the following can be called a major driver of Data Mining? (1 mark)
15. What can you infer from the following graph? (1 mark)
a) Less travelled destinations are growing more popular with each passing year
b) Top travel destinations are becoming more popular with each passing year
c) The growth of ICT evidently played an important role in making least popular places
more popular
d) The growth of social media evidently played an important role in making least popular
places more popular
ASSIGNMENT WEEK 2:
1. Which term describes the practice of making decisions purely on data analysis rather
than intuition? (1 mark)
a) Data Science
b) Data Engineering
c) Data-Driven Decision (DDD) Making
d) Fundamental principles of data extraction
3. What does the acronym ACID stand for in the context of databases? (1 mark)
a) OLTP converts data cube into relational data, while OLAP focuses on real-time
data entry.
b) OLAP handles a large number of simple transactions, while OLTP deals with
complex analysis of data.
c) OLAP deals with data retrieval and analysis for revealing business trends, while
OLTP supports a large number of simple transactions.
d) OLTP utilizes cubes as its primary data structure, while OLAP uses traditional
relational databases.
7. Which data warehouse model spans the entire organization and provides corporate-
wide data integration? (1 mark)
a) Data mart
b) Virtual warehouse
c) Enterprise warehouse
d) Operational system
a) Data extraction
b) Data cleaning
c) Data transformation
d) Load
12. Which type of DBMS language is used to create the database schema? (1 mark)
a) Data Manipulation Language
b) Data Query Language
c) Data Definition Language
d) Transaction Control Language
13. Which Data Manipulation command is used to add a new record in a database? (1
mark)
a) SELECT
b) INSERT
c) UPDATE
d) DELETE
14. What does the atomicity property of the ACID database guarantee in a transaction?
(1 mark)
a) That the transaction will be completed
b) That the transaction will be all-or-nothing.
c) That the transaction will be isolated from other transactions.
d) That the transaction will be durable and survive failures.
15. What problem does the ACID property of isolation address? (1 mark)
a) Incomplete or inconsistent updates during transactions.
b) Concurrent access to data causing inconsistencies.
c) Loss of data due to system failures.
d) Unrealistic expectations for transaction performance.
ASSIGNMENT WEEK 3:
4. What does the apex cuboid in a data cube typically represent? (1 Mark)
a) Lowest level of summarization
b) Highest level of summarization
c) Total sales or dollars sold
d) Entities or perspectives for record-keeping
5. How many cuboids are there in a 4-dimensional cube with 4 levels each? (1 Mark)
a) 625 cuboids
b) 725 cuboids
c) 125 cuboids
d) 525 cuboids
6. What is a significant difference between a snowflake schema and a star schema? (1 Mark)
a) Higher redundancy in dimension tables
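Question 5's cuboid count can be checked with the standard data-cube formula: for n dimensions with L_i levels each, the total number of cuboids is the product of (L_i + 1), the extra level being the fully aggregated "all" level. A minimal sketch (the helper name is ours):

```python
# Number of cuboids in a data cube: product of (levels_i + 1) over all
# dimensions; the +1 accounts for the "all" (fully aggregated) level.
def total_cuboids(levels_per_dimension):
    total = 1
    for levels in levels_per_dimension:
        total *= levels + 1
    return total

# 4-dimensional cube, 4 levels per dimension -> (4 + 1) ** 4
print(total_cuboids([4, 4, 4, 4]))  # 625
```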
7. Which schema is commonly used in data warehouses due to its capability to model
multiple, interrelated subjects? (1 Mark)
a) Star schema
b) Snowflake schema
c) Fact constellation
d) Entity-relationship model
8. Which normal form deals with atomicity and ensures that each attribute contains only
indivisible values? (1 Mark)
a) First Normal Form (1NF)
b) Second Normal Form (2NF)
c) Third Normal Form (3NF)
d) Boyce-Codd Normal Form (BCNF)
10. Consider the SQL statement: SELECT COUNT(*) FROM table_name. What does it
retrieve? (1 Mark)
12. Which normalization form ensures that every non-prime attribute is fully functionally
dependent on the primary key, eliminating all transitive dependencies? (1 Mark)
a) Second Normal Form (2NF)
b) Third Normal Form (3NF)
c) Boyce-Codd Normal Form (BCNF)
d) Fourth Normal Form (4NF)
13. What is the purpose of generating a lattice of cuboids in a data cube model? (1 Mark)
a) To display data at various levels of summarization based on different dimensions
b) To limit data visualization to a three-dimensional representation
c) To establish a relationship between the number of dimensions and the quantity of
facts
d) To organize data in a hierarchical manner for easier access
14. What distinguishes a data mart from a data warehouse in terms of schema
preference? (1 Mark)
a) Data marts prioritize the fact constellation schema, whereas data warehouses prefer
snowflake schemas.
b) Data warehouses commonly employ star schema, while data marts usually opt for
snowflake schemas.
c) Data marts typically utilize star or snowflake schemas, while data warehouses favor
the fact constellation schema.
d) Data warehouses exclusively use star schemas, whereas data marts solely rely on
snowflake schemas.
ASSIGNMENT WEEK 4:
1. The concept of "Survival at time 't'" in survival analysis refers to: (1 Mark)
a) The probability of customer loyalty at a specific time
b) The duration a customer remains active
c) The likelihood of customers making repeat purchases
d) The probability of a customer surviving from the previous time period to 't'
2. What does the term "Churn Rate" signify in customer analytics? (1 Mark)
a) The percentage of customers who make repeat purchases
b) The rate at which new customers are acquired
c) The ratio of customers who remain loyal to total number of customers
d) The rate at which customers discontinue or leave
8. Why is it important for businesses to track their customer acquisition cost (CAC)
alongside CLV? (1 Mark)
a) To determine the profitability of customer segments
b) To identify opportunities for cost reduction
c) To measure the effectiveness of marketing campaigns
d) All of the above
10. What makes the survival curve a more reliable measure compared to the retention
curve? (1 Mark)
a) The survival curve is based on newer customer cohorts, providing more accurate
data.
b) Survival calculations use information from all customers, offering more stability.
c) Retention curves are limited to customers starting at a specific time, causing
fluctuations.
d) The retention curve considers the hazard probabilities at all tenures.
11. What are the potential limitations of using survival analysis in customer churn
prediction? (1 Mark)
a) It requires a large amount of historical data for accurate predictions.
b) It assumes that customer behaviour remains consistent over time.
c) It cannot account for external factors that may influence churn rates.
d) All of the above
12. How does survival differ from retention in customer analytics? (1 Mark)
a) Survival focuses on future customer behaviour, while retention analyses past
behaviour.
b) Retention measures the conditional survival at specific tenures.
c) Survival accumulates probabilities of a customer event not occurring over time.
d) Retention is always a smoother curve compared to survival.
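Several of the questions above turn on the relationship between hazard and survival: survival at tenure t accumulates the probabilities of the churn event not occurring, S(t) = product over u ≤ t of (1 − h(u)). A sketch with illustrative (not assignment-provided) hazard values:

```python
# Survival accumulates (1 - hazard) across tenures: S(t) = prod_{u<=t} (1 - h(u)).
# The hazard values below are made up for illustration only.
hazards = [0.10, 0.05, 0.05, 0.02]  # hazard probability at tenures 1..4

survival = []
s = 1.0
for h in hazards:
    s *= 1.0 - h  # probability of surviving this tenure, given survival so far
    survival.append(s)

print([round(x, 4) for x in survival])
```

Note that the survival curve can only decrease (or stay flat), which is what makes it smoother and more stable than a retention curve computed from a single starting cohort.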
13. Which components are crucial for a full customer value calculation? (1 Mark)
a) Length of the customer relationship only
b) Revenues and length of the customer relationship
c) Costs associated with customers only
d) Revenues, costs, and length of the customer relationship
14. How does survival analysis contribute to customer value calculations? (1 Mark)
a) It estimates the probability of a customer surviving indefinitely.
b) It helps determine the exact tenure for each customer in a relationship.
c) It provides insights into the expected remaining tenure for customers.
d) It calculates the value of the customer per unit time.
15. An online gaming platform has 100,000 active users. During a specific month, 10,000
users become inactive. The platform identifies 20,000 users as being at risk of
becoming inactive during that month. What is the hazard probability for the online
gaming platform during that month? (1 Mark)
a) 0.2
b) 0.6
c) 0.5
d) 0.25
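For Question 15, the usual definition (assumed here) is hazard = customers who experienced the event during the period / customers at risk during that period. With the numbers given:

```python
# Hazard probability = events in the period / population at risk in the period.
at_risk = 20_000          # users identified as at risk that month
became_inactive = 10_000  # users who actually became inactive

hazard = became_inactive / at_risk
print(hazard)  # 0.5
```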
ASSIGNMENT WEEK 5:
2. What type of data transformation technique scales data to a specific range, such as 0
to 1? (1 Mark)
a) Database normalization
b) Aggregation
c) Smoothing techniques
d) Standardization/Normalization
4. What does Ordinary Least Squares (OLS) aim to minimize in the context of linear
regression? (1 Mark)
a) The sum of squared errors between the predicted and observed values of the
dependent variable.
b) The sum of squared residuals between the predicted and observed values of the
independent variable.
c) The total variance of the independent variables.
d) The sum of squared errors between the predicted and observed values of the
independent variable.
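Question 4 hinges on what OLS minimizes: the sum of squared errors between the observed and predicted values of the dependent variable. A minimal closed-form sketch for simple linear regression (the toy data and variable names are ours):

```python
# Simple-regression OLS: choose intercept b0 and slope b1 that minimize
# sum((y_i - (b0 + b1 * x_i)) ** 2), i.e. squared errors in the dependent variable y.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 4.1, 5.9, 8.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Closed-form OLS estimates for one predictor.
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
     sum((xi - mean_x) ** 2 for xi in x)
b0 = mean_y - b1 * mean_x

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
print(round(b1, 3), round(b0, 3), round(sse, 4))
```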
5. The coefficient of determination (R-squared) value of 0.98 in a regression model
implies: (1 Mark)
d) The residuals in the model are normally distributed with a z value of 0.98
9. When should one focus on reducing bias in a machine learning model? (1 Mark)
a) When the model performs well on the training data but poorly on test data.
b) When the model shows high variability in predictions.
c) When the model consistently overfits the training data.
d) When the model doesn’t fit the data well and performs poorly in
explanatory/predictive terms
a) It iteratively uses all but one sample as the test set and the remaining sample as
the training set.
b) It divides the dataset into k subsets and uses each subset as the testing set in
turn.
c) It creates a validation set from a small portion of the data.
d) It iteratively uses all but one sample as the training set and the remaining sample
as the testing set.
14. What are the three sources of error in predicted Y in machine learning? (1 Mark)
15. Which of the following statements most accurately distinguishes supervised learning
from unsupervised learning in machine learning? (1 Mark)
a) Supervised learning requires labelled data for training models to predict specific
outcomes, while unsupervised learning uncovers patterns or structures in data
without predefined outcomes.
b) Supervised learning primarily deals with clustering data points based on similarities,
while unsupervised learning focuses on predicting future trends based on historical
data.
c) Supervised learning utilizes human supervision to label data for analysis, while
unsupervised learning relies on algorithms to classify data into distinct categories.
d) Supervised learning involves training models without any prior knowledge of the
dataset, while unsupervised learning requires prior information about the
characteristics of the data.
ASSIGNMENT WEEK 6:
a) Decision trees
b) Bayes' Classifiers
c) Support Vector Machines (SVM)
d) Artificial Neural Networks (ANN)
3. Which are the two measures used in ROC curves to visualize the performance of
classifiers? (1 Mark)
4. Which metric measures the ratio of correctly predicted positive observations to the total
predicted positives? (1 Mark)
a) Accuracy
b) Sensitivity
c) Specificity
d) Precision
5. Imagine you're building a spam filter that classifies emails as spam or not spam. After
testing your model, you get the following results:
a) 0.812
b) 0.525
c) 0.909
d) 0.455
6. Which technique primarily uses a set of if-else decision rules to categorize data?
(1 Mark)
a) Decision trees
b) Artificial Neural Networks (ANN)
c) Support Vector Machines (SVM)
d) Genetic algorithms
7. How does the test data variation contribute to the errors in predicting Y values?
(1 Mark)
9. In classification, what does the term "reducible error" primarily refer to? (1 Mark)
10. In a medical study evaluating a diagnostic test for a certain disease, 150 patients were
tested. Of these, 90 patients were diagnosed with the disease, while 60 patients did
not have the disease. The model predictions are as follows:
a) 0.25
b) 0.2
c) 0.15
d) 0.18
11. Overfitting occurs when a classifier incorporates anomalies of the training data that are
not present in the general dataset. (True/False) (1 Mark)
15. Which of the following is NOT a commonly used classification technique? (1 Mark)
a) Decision trees
b) Logistic regression
c) K-nearest neighbours (KNN)
d) Principal component analysis (PCA)
ASSIGNMENT WEEK 7:
3. Why might a decision tree, resulting from the described process, perform poorly on a
test set? (1 Mark)
a) Due to too few splits leading to underfitting
b) Because it has too few leaves
c) It's likely to have too many splits, causing overfitting
d) It has high bias but low variance
4. What might a smaller tree with fewer splits achieve in terms of variance and bias? (1
Mark)
a) It reduces both variance and bias
b) It reduces variance but possibly increases bias
c) It increases both variance and bias
d) It doesn't affect variance or bias
8. Bagging primarily addresses which issue within statistical learning methods like
decision trees? (1 Mark)
a) Increases the computational complexity of the models.
b) Reduces the need for accurate parameter tuning in models.
c) Deals with high variance and improves prediction accuracy.
d) Reduces high bias and improves prediction accuracy.
11. In Bagging, each individual tree is independent of the others because each considers
a different subset of features and samples. (T/F) (1 Mark)
12. What are some common techniques for handling imbalanced data in classification
tasks? (1 Mark)
a) Oversampling the minority class to create a more balanced dataset.
b) Undersampling the majority class to reduce its dominance.
c) Only a is correct
d) Both a and b are correct
13. In Random Forest you can generate hundreds of trees (say T1, T2, …, Tn) and then
aggregate the results of these trees. Which of the following is true about an individual
tree (Tk) in Random Forest? (1 Mark)
a) 1 and 3
b) 1 and 4
c) 2 and 3
d) 2 and 4
14. Consider a dataset with a binary target variable (0 or 1) and a split based on a
feature resulting in two child nodes after the split.
• Node 1 (left child): Out of 40 samples, 30 belong to class 0 and 10 belong to class 1.
• Node 2 (right child): Out of 60 samples, 20 belong to class 0 and 40 belong to class
1.
Which option has the correct Gini indices of the child nodes? (3 Marks)
a) Gini index for Node 1: 0.375, Gini index for Node 2: 0.444
b) Gini index for Node 1: 0.375, Gini index for Node 2: 0.320
c) Gini index for Node 1: 0.425, Gini index for Node 2: 0.320
d) Gini index for Node 1: 0.444, Gini index for Node 2: 0.375
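Question 14 can be checked directly from the Gini formula, Gini = 1 − sum of squared class proportions (the helper name is ours):

```python
# Gini index of a node: 1 - sum over classes of (class proportion squared).
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

node1 = gini([30, 10])  # left child: 30 of class 0, 10 of class 1
node2 = gini([20, 40])  # right child: 20 of class 0, 40 of class 1
print(round(node1, 3), round(node2, 3))  # 0.375 0.444
```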
15. How does Random Forest aim to reduce correlation among trees? (1 Mark)
a) By constructing trees sequentially based on the residuals.
b) By growing trees independently with a random subset of predictors at each split.
c) By fitting trees to the residuals from the current model.
d) By sequentially building trees using information from previously grown trees.
ASSIGNMENT WEEK 8:
1. Which of the following is a common method for splitting nodes in a decision tree? (1
Mark)
A. Gini impurity
B. Cross-validation
C. Gradient descent
D. Principal component analysis
4. What is the main difference between classification and regression trees in a CART
algorithm? (1 Mark)
A. Classification trees predict categorical variables, while regression trees predict
continuous variables
B. Classification trees use Gini impurity as the splitting criterion, while regression trees
use information gain
C. Classification trees can handle missing data, while regression trees cannot
D. Classification trees are computationally expensive, while regression trees are
computationally inexpensive
7. Which of the following is a common stopping criterion for growing a decision tree? (1
Mark)
9. What's the primary drawback of utilizing a substantial maximum depth for a decision
tree? (1 Mark)
A. It leads to overfitting
B. It cannot capture the noise in the training data
C. It simplifies the computational complexity of the tree
D. It results in the tree underfitting the data
A. Fraud detection
B. Stock price prediction
D. Image classification
12. How can decision trees be made more robust to noise in the data? (1 Mark)
14. If the true positive value is 10 and the false positive value is 15, what is the precision score
for the classification model? (1 Mark)
A. 0.6
B. 0.4
C. 0.5
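Question 14's precision follows directly from precision = TP / (TP + FP):

```python
# Precision: fraction of predicted positives that are truly positive.
tp = 10  # true positives
fp = 15  # false positives

precision = tp / (tp + fp)
print(precision)  # 0.4
```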
ASSIGNMENT WEEK 9:
A. Labeled data
B. Unlabeled data
C. Numerical data
D. Categorical data
6. Which of the following is a method of choosing the optimal number of clusters for k-
means? (1 Mark)
A. The shadow method
B. The silhouette method
C. The elbow method
D. B and C
7. Which of the following statements best describes the goal of SMOTE preprocessing
technique? (1 Mark)
a) Reduce the dimensionality of the data
b) Balance the class distribution in imbalanced datasets
c) Improve the interpretability of a machine learning model
d) Detect outliers in the data
11. In a 3-dimensional space represented by coordinates (x, y, z), two cluster centroids,
A and B, have coordinates A(2, 4, 6) and B(5, 1, 3) respectively. What is the precise
Euclidean distance between these centroids, denoting their dissimilarity in the cluster
space? (1 Mark)
A) 5.20 units
B) 3.00 units
C) 4.36 units
D) 6.48 units
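Question 11 is a direct application of the Euclidean distance formula, sqrt of the sum of squared coordinate differences:

```python
# Euclidean distance between centroids A(2, 4, 6) and B(5, 1, 3).
a = (2, 4, 6)
b = (5, 1, 3)

dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5  # sqrt(9 + 9 + 9)
print(round(dist, 2))  # 5.2
```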
12. In K-means clustering, what is the purpose of the "elbow method"? (1 Mark)
A. To determine the optimal number of clusters
B. To identify the best distance metric
C. To select the best initialization method
D. To determine the convergence criteria
13. Suppose that a customer transaction table contains 9 items and 3 customers. What
is the Jaccard coefficient (similarity measure for asymmetric binary variables) for C1
and C2? (1 Mark)
a) 0.75
b) 0.25
c) 0.35
d) 0.85
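The transaction table for Question 13 is not reproduced above, so no answer is computed here; the sketch below only illustrates the asymmetric-binary Jaccard coefficient J = n11 / (n11 + n10 + n01) on made-up purchase vectors:

```python
# Jaccard coefficient for asymmetric binary variables: matching 1s divided by
# all positions where at least one vector has a 1 (0-0 matches are ignored).
# The two purchase vectors below are illustrative, NOT the assignment's table.
c1 = [1, 0, 1, 1, 0, 0, 1, 0, 0]
c2 = [1, 1, 1, 0, 0, 0, 1, 0, 0]

n11 = sum(1 for x, y in zip(c1, c2) if x == 1 and y == 1)
n10 = sum(1 for x, y in zip(c1, c2) if x == 1 and y == 0)
n01 = sum(1 for x, y in zip(c1, c2) if x == 0 and y == 1)

jaccard = n11 / (n11 + n10 + n01)
print(round(jaccard, 2))
```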
14. In the figure below, if you draw a horizontal line at y = 2, what will be
the number of clusters formed? (1 Mark)
Options:
A. 1
B. 2
C. 3
D. 4
15. Assume you want to cluster 7 observations into 3 clusters using the K-Means
clustering algorithm. After the first iteration, clusters C1, C2, and C3 have the
following observations:
Options:
A. 10
B. 5* sqrt (2)
C. 13*sqrt (2)
D. None of these
BUSINESS INTELLIGENCE AND ANALYTICS
4. Which customer group is likely to need incentives to increase their spending and
engagement? (1 Mark)
A) True Friends
B) Butterflies
C) Barnacles
D) Strangers
5. What does CLV stand for in RFM analyses? (1 Mark)
A) Customer Lifetime Value
B) Customer Loyalty Value
C) Customer Longevity Value
D) Customer Lifetime Volume
6. How are R, F, and M typically combined to create composite scores in some methods?
(1 Mark)
A) Adding R, F, and M directly
B) Multiplying R by 5, F by 2, and M by 1
C) Dividing R, F, and M by a constant
D) Subtracting R from F and then adding M
7. What SQL function is used for RFM analysis to scale RFM into a predefined range?
(1 Mark)
A) GROUP BY
B) AVG()
C) NTILE()
D) MAX()
8. In RFM analysis, what does "Recency" refer to? (1 mark)
10. Which Python package provides functionality for visualizing K-means clustering
results using 2D and 3D plots?
A) seaborn
B) matplotlib
C) pandas
D) scikit-learn
11. How is Recency (R) scaled after grouping Days since last order into 10 deciles? (1
Mark)
a) It is scaled from 1-5 for better representation
b) It is reversed, with the most recent customer receiving the highest R value
c) It is not scaled as it represents the number of days directly
d) It is scaled logarithmically for better clustering
12. Which clustering algorithm assigns data points to the nearest cluster centroid? (1 Mark)
a) K-Means
b) DBSCAN
c) Agglomerative
d) Mean-Shift
13. A retail company wants to segment its customers for targeted marketing campaigns.
They have data on customer demographics (age, gender, income), purchase history
(amount, frequency, categories), and online behaviour (website visits, clicks). Which
features are most suitable for k-means clustering in this scenario? (1 Mark)
a) Demographics only (age, gender, income)
b) Purchase history only (amount, frequency, categories)
c) Online behaviour only (website visits, clicks)
d) A combination of all features
14. True or False: In K-means clustering, each cluster is represented by its center (centroid) which
corresponds to the median of points assigned to the cluster.
15. Out of the reasons elicited below, what would be a major reason for you not to choose
K-means for clustering analysis? (1 Mark)
a) It is sensitive to noise and outlier data points and also sensitive to the initial placement
of its cluster centers.
b) It always leads to complex cluster formation due to the unequal sizes of the clusters formed.
c) Inter-cluster distance is high for K-Means clustering.
d) Accuracy of the model is low compared to other modes of clustering.
BUSINESS INTELLIGENCE AND ANALYTICS
A) Gradient Descent
B) K-Means
C) Random Forest
7. What does the term 'epoch' refer to in neural network training? (1 Mark)
10. If a neural network has 16 input neurons and 4 output neurons, how many neurons
would be recommended for the hidden layer according to the rule of thumb? (1 Mark)
A) 8 neurons
B) 4 neurons
C) 2 neurons
D) 12 neurons
11. If you increase the number of hidden layers in a multi-layer perceptron, the
classification error of test data always decreases. (True/False) (1 Mark)
12. There is a feedback loop in the final stage of a backpropagation algorithm. (T/F) (1
Mark)
13. In time series analysis, which component represents the long-term movement or the
general direction of the data? (1 Mark)
A) Seasonality
B) Cyclical variations
C) Trend
D) Residual or noise
a) A collection of M files
b) A sequence of N words
c) A collection of M documents
A. A punctuation mark
B. A lemmatization process
C. A collection of M documents
A. Uncommon words
B. Words with high term frequency
C. Words with low term frequency
D. Commonly used words like "the," "is," "of," etc.
6. How is the inverse document frequency (idf) calculated for a given term? (1
Mark)
7. True or False: The statistic tf-idf is intended to measure how important a word is to
a document in a collection (or corpus) of documents
9. What does a higher Phi coefficient value indicate regarding word co-occurrence?
(1 Mark)
10. In a text corpus comprising 200 documents, the words "forest" and "wildlife" do not co-occur in
120 documents. Both "forest" and "wildlife" co-occur in 50 documents. Furthermore, "forest" without
"wildlife" appears in 10 documents, and "wildlife" without "forest" appears in 20 documents. What is
the Phi coefficient measuring the correlation between the appearance of the words "forest" and
"wildlife" in this dataset? (1 Mark)
A. 0.19
B. 0.66
C. 0.72
D. 0.85
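Question 10 uses the Phi coefficient on a 2×2 co-occurrence table: phi = (n11·n00 − n10·n01) / sqrt((n11+n10)(n01+n00)(n11+n01)(n10+n00)). With the counts given in the question:

```python
# 2x2 co-occurrence counts for "forest" (rows) and "wildlife" (columns).
n11 = 50    # both words appear
n10 = 10    # "forest" without "wildlife"
n01 = 20    # "wildlife" without "forest"
n00 = 120   # neither word appears

row1, row0 = n11 + n10, n01 + n00  # marginals for "forest" present / absent
col1, col0 = n11 + n01, n10 + n00  # marginals for "wildlife" present / absent

phi = (n11 * n00 - n10 * n01) / (row1 * row0 * col1 * col0) ** 0.5
print(round(phi, 2))  # 0.66
```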
11. Which of the following datasets provides a polarity score ranging from -5 to +5 for
words in sentiment analysis? (1 Mark)
13. In TF-IDF analysis, what does the term frequency (tf) measure for a word in a
document? (1 Mark)
A. The count of the word in a document divided by the total words in that document
B. The count of the word in a document divided by the count of the word in the entire
corpus
C. The number of documents containing the word divided by the total number of
documents
D. The logarithm of the total number of documents divided by the number of
documents containing the term
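Question 13's definition of term frequency (count of the word in a document divided by the total words in that document), together with a common idf form, idf = log(N / df), can be checked on a toy corpus (the documents and words below are made up for illustration):

```python
import math

# Toy corpus: each document is a list of words (illustrative only).
docs = [
    ["the", "forest", "is", "green"],
    ["the", "river", "runs"],
    ["forest", "and", "river"],
]

word = "forest"
doc = docs[0]

tf = doc.count(word) / len(doc)         # 1 occurrence out of 4 words
df = sum(1 for d in docs if word in d)  # documents containing the word
idf = math.log(len(docs) / df)          # log(3 / 2)

print(round(tf, 2), round(idf, 3), round(tf * idf, 4))
```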
15. Which of these techniques is used for normalization in text mining? (1 Mark)
A. Stemming
B. Stop words removal
C. Lemmatization
D. All of the above