Business Intelligence and Analytics: Assignment Week 1 (15 Qns * 1 mark = 15 marks)


BUSINESS INTELLIGENCE AND ANALYTICS

ASSIGNMENT WEEK 1:

Total marks = (15 Qns * 1 mark = 15 marks)

1. What concept does the phrase "turning data tombs into 'golden nuggets' of knowledge"
signify with respect to data mining? (1 mark)

a) The transformation of extensive data reserves into valuable insights and knowledge.
b) The replacement of conventional data repositories with intuitive decision-making
tools.
c) The extraction of specific data sets for expert systems' utilization.
d) The integration of data archives with cutting-edge data mining technologies.

2. Which step involves the extraction of data patterns using intelligent methods? (1 mark)

a) Data cleaning
b) Data integration
c) Data selection
d) Data mining

3. What is the primary purpose of data mining in the context of the data age? (1 mark)

a) Storing and organizing massive amounts of data


b) Creating social networks and communities
c) Uncovering valuable information from vast data and converting it into organized
knowledge
d) Supporting scientific experiments and observations

4. Which technology contributed substantially to the evolution and wide acceptance of
relational technology for efficient storage, retrieval, and management of large amounts
of data? (1 mark)

a) Advanced indexing and accessing methods


b) Online transaction processing (OLTP)
c) Hierarchical database systems
d) Object-oriented database models

5. What does the architecture of a data warehouse primarily aim to facilitate? (1 mark)
a) Data cleaning and integration
b) Advanced query optimization
c) Management decision making
d) Real-time data processing

6. What is the primary advantage of using data warehouse systems for OLAP? (1 mark)

a) Providing detailed transactional views to users


b) Performing daily transaction analysis
c) Presenting data at different levels of abstraction
d) Enabling real-time data updates and modifications
7. What defines outliers in a dataset? (1 mark)

a) Objects that conform to the general behaviour or model of the data.


b) Data objects that are commonly discarded as exceptions.
c) Data that adhere strictly to statistical or distance measures.
d) Objects that deviate from the general behaviour or model of the data.

8. How does data mining benefit from scalable database technologies? (1 mark)

a) It diminishes efficiency and scalability on large datasets


b) It relies solely on traditional database systems
c) It enables high efficiency and scalability on large datasets
d) It limits scalability to small datasets for efficiency

9. What distinguishes Descriptive Analytics from other types of analytics? (1 mark)

a) It focuses on predicting future outcomes.


b) It identifies patterns and trends from past data.
c) It prescribes actions based on analyzed data.
d) It evaluates data in real-time for immediate decisions.

10. Which phase in the knowledge discovery process involves the removal of noise and
inconsistent data? (1 mark)

a) Data integration
b) Data transformation
c) Data cleaning
d) Data selection
11. In the context of data preprocessing, what is the purpose of data transformation?
(1 mark)

a) To retrieve relevant data from the database


b) To remove noise and inconsistent data
c) To prepare data for mining by performing summary or aggregation operations
d) To evaluate interesting patterns based on specific measures

12. What is the fundamental characteristic of a relational database? (1 mark)

a) It consists of a collection of tables with unique names.


b) It uses graphs or networked structures to store data.
c) It primarily stores multimedia data and text data.
d) It represents the database as a set of entities and their relationships.

13. Which of the following can be called a major driver of data mining? (1 mark)

a) Decline of open-source technologies


b) Declining growth in Manufacturing sector globally
c) Penetration of MOOC platforms
d) Rise of transaction processing systems/ERPs
14. What does the "long tail" phenomenon refer to in business? (1 mark)
a) A marketing strategy focused on a narrow customer base
b) The distribution of sales that extends to less common products returning
substantial profits
c) A strategy to increase product diversity without considering market demand
d) A method to minimize inventory costs in retail businesses

15. What can you infer from the following graph? (1 mark)

a) Less travelled destinations are growing more popular with each passing year
b) Top travel destinations are becoming more popular with each passing year
c) The growth of ICT evidently played an important role in making least popular places
more popular
d) The growth of social media evidently played an important role in making least popular
places more popular
BUSINESS INTELLIGENCE AND ANALYTICS

ASSIGNMENT WEEK 2:

Total marks = (15 Qns * 1 mark = 15 marks)

1. Which term describes the practice of making decisions purely on data analysis rather
than intuition? (1 mark)

a) Data Science
b) Data Engineering
c) Data-Driven Decision (DDD) Making
d) Fundamental principles of data extraction

2. What characterizes the primary function of databases supporting Online Transaction
Processing (OLTP)? (1 mark)

a) Complex queries and analysis of data


b) Retrieval of business trends and statistics
c) Handling a large number of simple transactions and updates
d) Aggregation of data into cubes

3. What does the acronym ACID stand for in the context of databases? (1 mark)

a) Atomicity, Consistency, Integration, Durability


b) Atomicity, Continuity, Isolation, Durability
c) Atomicity, Consistency, Isolation, Durability
d) Atomicity, Continuity, Integration, Durability

4. What primarily distinguishes Online Analytical Processing (OLAP) from Online
Transaction Processing (OLTP)? (1 mark)

a) OLTP converts a data cube into relational data, while OLAP focuses on real-time
data entry.
b) OLAP handles a large number of simple transactions, while OLTP deals with
complex analysis of data.
c) OLAP deals with data retrieval and analysis for revealing business trends, while
OLTP supports a large number of simple transactions.
d) OLTP utilizes cubes as its primary data structure, while OLAP uses traditional
relational databases.

5. Which of the following best describes a data warehouse? (1 mark)


a) An operational database used for day-to-day transactions.
b) A repository separate from operational databases, providing integrated and
historic data for decision-making processes.
c) A system focused on real-time data analysis and customer interactions.
d) An integrated data structure primarily handling transaction processing.
6. Why does a data warehouse not require transaction processing, recovery, and
concurrency control mechanisms? (1 mark)
a) Because a data warehouse is volatile
b) Due to its physical separation from operational data
c) Because it focuses on day-to-day operational activities rather than historical
data.
d) Since it integrates multiple heterogeneous data sources for decision support.

7. Which data warehouse model spans the entire organization and provides corporate-
wide data integration? (1 mark)

a) Data mart
b) Virtual warehouse
c) Enterprise warehouse
d) Operational system

8. What distinguishes a data mart from an enterprise warehouse? (1 mark)

a) Data marts contain detailed data, while enterprise warehouses contain
summarized data.
b) Enterprise warehouses are implemented on departmental servers, while data
marts utilize parallel architecture platforms.
c) Data marts require extensive business modelling, while enterprise warehouses
have a shorter implementation cycle.
d) Data marts typically contain a subset of corporate-wide data for specific users,
while enterprise warehouses collect information spanning the entire organization.
9. Which function of back-end tools in data warehouse systems involves
rectifying errors detected in the data? (1 mark)

a) Data extraction
b) Data cleaning
c) Data transformation
d) Load

10. What is the first step in the ETL process? (1 mark)

a) Load data into the target system.


b) Clean and transform the data.
c) Extract data from source systems.
d) Validate the data for accuracy.
11. Which of the following gives a logical structure of the database graphically? (1 mark)
a) Entity-relationship diagram
b) Data mining diagram
c) Database diagram
d) Architectural representation

12. Which type of DBMS language is used to create the database schema? (1 mark)
a) Data Manipulation Language
b) Data Query Language
c) Data Definition Language
d) Transaction Control Language

13. Which Data Manipulation command is used to add a new record in a database? (1
mark)
a) SELECT
b) INSERT
c) UPDATE
d) DELETE
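
As a rough illustration of the language categories in questions 12 and 13, the sketch below uses Python's built-in sqlite3 module. The customer table, its columns, and the sample names are hypothetical and chosen only for this example.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")

# DDL (Data Definition Language): CREATE TABLE defines the schema.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# DML (Data Manipulation Language): INSERT adds records, UPDATE changes them.
conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (1, "Asha"))
conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (2, "Ravi"))
conn.execute("UPDATE customer SET name = ? WHERE id = ?", ("Ravi K", 2))
conn.commit()

# SELECT reads records without modifying them.
rows = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
print(rows)
```

DELETE would remove rows in the same way that UPDATE changes them; it is left out here to keep the sketch short.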

14. What does the atomicity property of the ACID database guarantee in a transaction?
(1 mark)
a) That the transaction will be completed
b) That the transaction will be all-or-nothing.
c) That the transaction will be isolated from other transactions.
d) That the transaction will be durable and survive failures.
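
The all-or-nothing behaviour described in question 14 can be sketched with sqlite3. The account table, the CHECK constraint, and the transfer helper are assumptions made for this example, not part of the assignment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as a single all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()


failed = transfer(conn, 1, 2, 500)   # the debit violates the CHECK constraint
after_failure = balances(conn)       # the earlier credit is rolled back as well
succeeded = transfer(conn, 1, 2, 30)
after_success = balances(conn)
```

In the failed transfer the credit to account 2 had already been applied when the debit was rejected, and the rollback undoes it, so no partial update remains.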
15. What problem does the ACID property of isolation address? (1 mark)
a) Incomplete or inconsistent updates during transactions.
b) Concurrent access to data causing inconsistencies.
c) Loss of data due to system failures.
d) Unrealistic expectations for transaction performance.
BUSINESS INTELLIGENCE AND ANALYTICS

ASSIGNMENT WEEK 3:

Total marks = (15 Qns * 1 mark = 15 marks)

1. Which database schema is typically associated with OLAP systems? (1 Mark)


a) Entity-Relationship (ER) schema
b) Star or Snowflake schema
c) Relational schema
d) Object-oriented schema

2. In a data cube, what are dimensions primarily representing? (1 Mark)


a) Numeric measures
b) Facts related to sales
c) Entities or perspectives for record-keeping
d) Tables associated with facts

3. What metaphor is used to describe multidimensional data storage in data
warehousing? (1 Mark)
a) Lattice
b) Apex cuboid
c) Data cube
d) Base cuboid

4. What does the apex cuboid in a data cube typically represent? (1 Mark)
a) Lowest level of summarization
b) Highest level of summarization
c) Total sales or dollars sold
d) Entities or perspectives for record-keeping

5. How many cuboids are there in a 4-dimensional cube with 4 levels each? (1 Mark)
a) 625 cuboids
b) 725 cuboids
c) 125 cuboids
d) 525 cuboids

6. What is a significant difference between a snowflake
schema and a star schema? (1 Mark)
a) Higher redundancy in dimension tables

b) Increased efficiency in querying


c) Normalization of dimension tables
d) Dimension tables linked directly to the fact table

7. Which schema is commonly used in data warehouses due to its capability to model
multiple, interrelated subjects? (1 Mark)
a) Star schema
b) Snowflake schema
c) Fact constellation
d) Entity-relationship model

8. Which normal form deals with atomicity and ensures that each attribute contains only
indivisible values? (1 Mark)
a) First Normal Form (1NF)
b) Second Normal Form (2NF)
c) Third Normal Form (3NF)
d) Boyce-Codd Normal Form (BCNF)

9. In a relational database, what is the purpose of a foreign key? (1 Mark)


a) It uniquely identifies each record in a table.
b) It maintains referential integrity between tables.
c) It ensures all attributes are atomic.
d) It acts as a substitute for the primary key.

10. Consider the SQL statement: SELECT COUNT(*) FROM table_name. What does it
retrieve? (1 Mark)

a) All entries with * in the table


b) The number of unique values in the table
c) The number of rows in the table
d) The average value across all columns
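
A small sqlite3 sketch can make the behaviour of COUNT(*) in question 10 concrete. The sales table and its three rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a duplicate row and a NULL amount.
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 10), ("North", 10), ("South", None)],
)

# COUNT(*) counts every row, including duplicates and rows containing NULLs.
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

# COUNT(column) skips NULLs; COUNT(DISTINCT column) counts unique values.
non_null_amounts = conn.execute("SELECT COUNT(amount) FROM sales").fetchone()[0]
distinct_regions = conn.execute(
    "SELECT COUNT(DISTINCT region) FROM sales"
).fetchone()[0]

print(row_count, non_null_amounts, distinct_regions)
```

The statement returns a single number, not the rows themselves.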
11. What is the primary objective of normalizing a database? (1 Mark)
a) To eliminate data redundancy and minimize data inconsistency
b) To increase data duplication for faster retrieval
c) To combine tables for simplification
d) To allow for more complex queries

12. Which normalization form ensures that every non-prime attribute is fully functionally
dependent on the primary key, eliminating all transitive dependencies? (1 Mark)
a) Second Normal Form (2NF)
b) Third Normal Form (3NF)
c) Boyce-Codd Normal Form (BCNF)
d) Fourth Normal Form (4NF)

13. What is the purpose of generating a lattice of cuboids in a data cube model? (1 Mark)
a) To display data at various levels of summarization based on different dimensions
b) To limit data visualization to a three-dimensional representation
c) To establish a relationship between the number of dimensions and the quantity of
facts
d) To organize data in a hierarchical manner for easier access
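
The lattice of cuboids from questions 5 and 13 can be enumerated in a few lines. The sketch assumes the usual textbook counting rule, in which a dimension with L hierarchy levels contributes L + 1 choices (its levels plus the virtual top level "all"). The dimension names are hypothetical.

```python
from itertools import combinations
from math import prod


def count_cuboids(levels_per_dimension):
    """Total cuboids = product of (levels + 1) over all dimensions.

    The extra 1 stands for the virtual top level 'all' (assumed convention).
    """
    return prod(levels + 1 for levels in levels_per_dimension)


def lattice(dimensions):
    """List every group-by subset, from the apex cuboid () to the base cuboid."""
    return [
        combo
        for size in range(len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


four_dim_total = count_cuboids([4, 4, 4, 4])     # 4 dimensions, 4 levels each
cuboids = lattice(["time", "item", "location"])  # no hierarchies: 2**3 cuboids
print(four_dim_total, len(cuboids))
```

The first entry of the lattice is the empty group-by (the apex cuboid, the highest level of summarization) and the last entry uses every dimension (the base cuboid).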

14. What distinguishes a data mart from a data warehouse in terms of schema
preference? (1 Mark)
a) Data marts prioritize the fact constellation schema, whereas data warehouses prefer
snowflake schemas.
b) Data warehouses commonly employ star schema, while data marts usually opt for
snowflake schemas.
c) Data marts typically utilize star or snowflake schemas, while data warehouses favor
the fact constellation schema.
d) Data warehouses exclusively use star schemas, whereas data marts solely rely on
snowflake schemas.

15. What characterizes the Roll-up operation in OLAP? (1 Mark)


a) It aggregates data by stepping up a concept hierarchy or by adding dimensions.
b) It drills down into more detailed data by ascending a concept hierarchy.
c) It removes one or more dimensions from the cube, reducing its granularity.
d) It visualizes data by rotating the axes to provide an alternative presentation.

BUSINESS INTELLIGENCE AND ANALYTICS

ASSIGNMENT WEEK 4:

Total marks = (15 Qns * 1 mark = 15 marks)

1. The concept of "Survival at time 't'" in survival analysis refers to: (1 Mark)
a) The probability of customer loyalty at a specific time
b) The duration a customer remains active
c) The likelihood of customers making repeat purchases
d) The probability of a customer surviving from the previous time period to 't'

2. What does the term "Churn Rate" signify in customer analytics? (1 Mark)
a) The percentage of customers who make repeat purchases
b) The rate at which new customers are acquired
c) The ratio of customers who remain loyal to total number of customers
d) The rate at which customers discontinue or leave

3. How is Customer Lifetime Value (CLV) useful for businesses? (1 Mark)


a) It helps in reducing customer acquisition costs
b) It predicts the success of new product launches
c) It assists in identifying high-value customers for loyalty programs
d) It determines employee performance metrics

4. What does the customer half-life measure? (1 Mark)


a) The average time a customer stays with a business.
b) The time taken for exactly half of a customer cohort to leave
c) The average lifespan of a customer.
d) The time when a customer starts their relationship with a business.

5. What does the hazard signify in survival analysis? (1 Mark)


a) The probability of customers remaining loyal
b) The risk of customer attrition within a specific time interval
c) The likelihood of acquiring new customers
d) The proportion of customers renewing their subscriptions
6. How is the hazard probability calculated in customer tenure analysis? (1 Mark)
a) It is derived from a parametric equation.
b) It requires complex regression modelling.
c) It involves the ratio of customers who stop at a particular tenure to the population
at risk.
d) It is calculated using a customer's initial signup date.

7. What is a key application of survival analysis besides measuring customer
churn? (1 Mark)
a) Identifying factors influencing customer purchases.
b) Calculating customer lifetime value.
c) Predicting customer sentiment on social media.
d) Analysing customer demographics and purchase history.

8. Why is it important for businesses to track their customer acquisition cost (CAC)
alongside CLV? (1 Mark)
a) To determine the profitability of customer segments
b) To identify opportunities for cost reduction
c) To measure the effectiveness of marketing campaigns
d) All of the above

9. What does a survival curve in customer retention showcase? (1 Mark)


a) The increase in customer base over time
b) The proportion of customers expected to remain active over specific tenures
c) The decline in customer satisfaction rates
d) The total number of customers engaged with the business

10. What makes the survival curve a more reliable measure compared to the retention
curve? (1 Mark)
a) The survival curve is based on newer customer cohorts, providing more accurate
data.
b) Survival calculations use information from all customers, offering more stability.
c) Retention curves are limited to customers starting at a specific time, causing
fluctuations.
d) The retention curve considers the hazard probabilities at all tenures.

11. What are the potential limitations of using survival analysis in customer churn
prediction? (1 Mark)
a) It requires a large amount of historical data for accurate predictions.
b) It assumes that customer behaviour remains consistent over time.
c) It cannot account for external factors that may influence churn rates.
d) All of the above

12. How does survival differ from retention in customer analytics? (1 Mark)
a) Survival focuses on future customer behaviour, while retention analyses past
behaviour.
b) Retention measures the conditional survival at specific tenures.
c) Survival accumulates probabilities of a customer event not occurring over time.
d) Retention is always a smoother curve compared to survival.
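
One way to see how survival accumulates over tenure is to build the curve from hazard probabilities. The sketch assumes the convention S(0) = 1 and S(t) = S(t-1) * (1 - h(t-1)); the hazard values are invented for illustration.

```python
def survival_curve(hazards):
    """Return [S(0), S(1), ...] with S(0) = 1 and S(t) = S(t-1) * (1 - h(t-1))."""
    curve = [1.0]
    for h in hazards:
        curve.append(curve[-1] * (1.0 - h))
    return curve


def half_life(curve):
    """First tenure at which survival has dropped to 50% or below, else None."""
    for tenure, survival in enumerate(curve):
        if survival <= 0.5:
            return tenure
    return None


hazards = [0.10, 0.20, 0.50]     # hypothetical hazards at tenures 0, 1, 2
curve = survival_curve(hazards)  # roughly [1.0, 0.9, 0.72, 0.36]
print(curve, half_life(curve))
```

Each point on the curve multiplies in the probability of not stopping at the previous tenure, which is why survival never increases as tenure grows.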

13. Which components are crucial for a full customer value calculation? (1 Mark)
a) Length of the customer relationship only
b) Revenues and length of the customer relationship
c) Costs associated with customers only
d) Revenues, costs, and length of the customer relationship
14. How does survival analysis contribute to customer value calculations? (1 Mark)
a) It estimates the probability of a customer surviving indefinitely.
b) It helps determine the exact tenure for each customer in a relationship.
c) It provides insights into the expected remaining tenure for customers.
d) It calculates the value of the customer per unit time.

15. An online gaming platform has 100,000 active users. During a specific month, 10,000
users become inactive. The platform identifies 20,000 users as being at risk of
becoming inactive during that month. What is the hazard probability for the online
gaming platform during that month? (1 Mark)
a) 0.2
b) 0.6
c) 0.5
d) 0.25
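
The calculation behind question 15 can be written out using the definition from question 6 (customers who stop divided by the population at risk). The sketch assumes that the 20,000 at-risk users, not the 100,000 active users, form the denominator.

```python
def hazard_probability(stopped, at_risk):
    """Hazard = customers who stopped / population at risk in the same period."""
    if at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return stopped / at_risk


# Figures from question 15: 10,000 users became inactive, 20,000 were at risk.
hazard = hazard_probability(stopped=10_000, at_risk=20_000)
print(hazard)
```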

BUSINESS INTELLIGENCE AND ANALYTICS

ASSIGNMENT WEEK 5:

Total marks = (15 Qns * 1 mark = 15 marks)

1. In regression analysis, multicollinearity refers to: (1 Mark)


a) The perfect linear relationship between the dependent and independent
variables.
b) The presence of outliers in the dataset that affect the regression coefficients.
c) High intercorrelation among the independent variables, leading to unstable
estimates of the regression coefficients.
d) The variance in the residuals of the regression model.

2. What type of data transformation technique scales data to a specific range, such as 0
to 1? (1 Mark)

a) Database normalization
b) Aggregation
c) Smoothing techniques
d) Standardization/Normalization
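
For question 2, a minimal min-max scaling sketch shows how values are mapped onto a target range such as 0 to 1. The function name and the sample values are assumptions for illustration.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min, the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to the lower bound.
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


scaled = min_max_scale([20, 30, 40, 60])
print(scaled)
```

This is distinct from database normalization (option a), which restructures tables rather than rescaling numeric values.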

3. Which of the following statements about the coefficient of determination (R-squared)
is true? (1 Mark)
a) A higher R-squared value always indicates a lower model performance.
b) A higher R-squared value always indicates better model performance, regardless
of the number of predictor variables.
c) R-squared ranges from 0 to 1 and represents the percentage of variation in the
dependent variable explained by the independent variables.
d) R-squared can only take positive values and is unaffected by the presence of
multicollinearity in the regression model.

4. What does Ordinary Least Squares (OLS) aim to minimize in the context of linear
regression? (1 Mark)
a) The sum of squared errors between the predicted and observed values of the
dependent variable.
b) The sum of squared residuals between the predicted and observed values of the
independent variable.
c) The total variance of the independent variables.
d) The sum of squared errors between the predicted and observed values of the
independent variable.
5. The coefficient of determination (R-squared) value of 0.98 in a regression model
implies: (1 Mark)

A) The model has a high level of multicollinearity.

B) 98% of the variability in the dependent variable is explained by the independent
variable.

C) The regression model is overfitting the data by 98%.

D) The residuals in the model are normally distributed with z value of 0.98
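
Questions 3 to 5 concern ordinary least squares and R-squared. The sketch below fits a one-predictor line in closed form and computes R-squared as 1 - SSE/SST. The data points are hypothetical.

```python
def fit_ols(x, y):
    """Closed-form simple linear regression: minimizes the sum of squared errors in y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


x = [1, 2, 3, 4, 5]   # hypothetical predictor values
y = [2, 4, 5, 4, 5]   # hypothetical observed responses
intercept, slope = fit_ols(x, y)
r2 = r_squared(x, y, intercept, slope)
print(intercept, slope, r2)
```

For this toy data the line explains about 60% of the variation in y; a value of 0.98 would mean 98% is explained.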

6. Prediction error in a model refers to: (1 Mark)


a) The difference between actual and predicted values.
b) The degree of overfitting in the model.
c) The number of features used in the model.
d) The variability of the target variable.

7. Which of the following statements is wrong with regard to overfitting in a machine
learning model? (1 Mark)
a) The model is too simple to capture the underlying patterns in the data.
b) The model performs well on training data but poorly on unseen data.
c) The model fits the noise in the training data.
d) None of the above

8. Underfitting in a machine learning model results in: (1 Mark)


a) Low bias and high variance.
b) High bias and low variance.
c) High bias and high variance.
d) Low bias and low variance.

9. When should one focus on reducing bias in a machine learning model? (1 Mark)
a) When the model performs well on the training data but poorly on test data.
b) When the model shows high variability in predictions.
c) When the model consistently overfits the training data.
d) When the model does not fit the data well and performs poorly in
explanation or prediction

10. What is the bias-variance trade-off in machine learning? (1 Mark)


a) Balancing the computational resources used in training with model accuracy.
b) Aiming to minimize the difference between predicted and actual values in a
model.
c) Finding the equilibrium between model complexity and its ability to generalize to
unseen data.
d) Choosing the best algorithm that minimizes both bias and variance
simultaneously.

11. Training error refers to: (1 Mark)

a) Error calculated on the training dataset.


b) Error due to overfitting.
c) Error calculated on the testing dataset.
d) Error due to underfitting.
12. What does Leave-One-Out Cross-Validation (LOOCV) do? (1 Mark)

a) It iteratively uses all but one sample as the test set and the remaining sample as
the training set.
b) It divides the dataset into k subsets and uses each subset as the testing set in
turn.
c) It creates a validation set from a small portion of the data.
d) It iteratively uses all but one sample as the training set and the remaining sample
as the testing set.
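
The splitting scheme in question 12 can be sketched without any library. The mean-only predictor used to score each fold is an assumption chosen to keep the example short; any model could take its place.

```python
def loocv_splits(n_samples):
    """Yield (train_indices, test_indices): each sample is held out exactly once."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, [held_out]


def loocv_mse(ys):
    """LOOCV mean squared error of a predictor that returns the training mean."""
    errors = []
    for train, test in loocv_splits(len(ys)):
        prediction = sum(ys[i] for i in train) / len(train)
        errors.append((ys[test[0]] - prediction) ** 2)
    return sum(errors) / len(errors)


splits = list(loocv_splits(3))
mse = loocv_mse([1.0, 2.0, 3.0])   # hypothetical responses
print(splits, mse)
```

With n samples the model is fitted n times, each time on n - 1 samples, which is what distinguishes LOOCV from k-fold cross-validation with a small k.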

13. What is the primary purpose of cross-validation in machine learning? (1 Mark)

a) To fit the model to the training data efficiently.


b) To evaluate the model's performance on unseen data.
c) To increase model complexity for better predictions.
d) To reduce the number of features in the dataset.

14. What are the three sources of error in predicted Y in machine learning? (1 Mark)

a) Measurement error, data preprocessing error, and feature selection error.


b) Model complexity error, parameter tuning error, and overfitting error.
c) Reducible error due to inaccurate estimation of f, irreducible error due to
randomness, and test data variation.
d) Training error, validation error, and testing error.

15. Which of the following statements most accurately distinguishes supervised learning
from unsupervised learning in machine learning? (1 Mark)
a) Supervised learning requires labelled data for training models to predict specific
outcomes, while unsupervised learning uncovers patterns or structures in data
without predefined outcomes.
b) Supervised learning primarily deals with clustering data points based on similarities,
while unsupervised learning focuses on predicting future trends based on historical
data.
c) Supervised learning utilizes human supervision to label data for analysis, while
unsupervised learning relies on algorithms to classify data into distinct categories.
d) Supervised learning involves training models without any prior knowledge of the
dataset, while unsupervised learning requires prior information about the
characteristics of the data.
BUSINESS INTELLIGENCE AND ANALYTICS

ASSIGNMENT WEEK 6:

Total marks = 18 marks
12 Qns * 1 mark = 12 marks
3 Qns * 2 marks = 6 marks

1. Which of the following statements accurately describes a characteristic of classifiers in
machine learning? (1 Mark)
a) They are designed to generate continuous predictions based on input features.
b) They aim to identify the most prominent features that influence a target variable.
c) They categorize objects into distinct and mutually exclusive groups based on their
characteristics.
d) They analyse the relationships between variables and estimate their strength and
direction.

2. Which classification technique is primarily statistics and probability-based? (1 Mark)

a) Decision trees
b) Bayes' Classifiers
c) Support Vector Machines (SVM)
d) Artificial Neural Networks (ANN)

3. Which are the two measures used in ROC curves to visualize the performance of
classifiers? (1 Mark)

a) Sensitivity and specificity


b) Precision and recall
c) Accuracy and error rate
d) Sensitivity and precision

4. Which metric measures the ratio of correctly predicted positive observations to the total
predicted positives? (1 Mark)

a) Accuracy
b) Sensitivity
c) Specificity
d) Precision

5. Imagine you're building a spam filter that classifies emails as spam or not spam. After
testing your model, you get the following results:

• True Positives (TP): 100 emails correctly classified as spam
• False Positives (FP): 5 emails incorrectly classified as spam
• False Negatives (FN): 10 emails incorrectly classified as not spam that are actually
spam

What is the recall of your spam filter? (2 Marks)

a) 0.812
b) 0.525
c) 0.909
d) 0.455
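
The metrics in questions 4 and 5 follow directly from the confusion-matrix counts. A minimal sketch using the spam-filter figures from question 5:

```python
def recall(tp, fn):
    """Recall (sensitivity) = TP / (TP + FN): share of actual positives found."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Precision = TP / (TP + FP): share of predicted positives that are correct."""
    return tp / (tp + fp)


# Counts from question 5: TP = 100, FP = 5, FN = 10.
spam_recall = round(recall(tp=100, fn=10), 3)
spam_precision = round(precision(tp=100, fp=5), 3)
print(spam_recall, spam_precision)
```

Recall uses the false negatives in its denominator, while precision uses the false positives, so the two can differ noticeably for the same classifier.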
6. Which technique primarily uses a set of if-else decision rules to categorize data?
(1 Mark)

a) Decision trees
b) Artificial Neural Networks (ANN)
c) Support Vector Machines (SVM)
d) Genetic algorithms

7. How does the test data variation contribute to the errors in predicting Y values?
(1 Mark)

a) It adds to the reducible error due to inaccurate estimation of f.


b) It causes the irreducible error due to randomness.
c) It directly affects the learning techniques.
d) It minimizes the error through cross-validation.

8. What does classifier accuracy represent in classification tasks? (1 Mark)


a) The percentage of test set tuples correctly classified by the classifier.
b) The similarity between training and test sets.
c) The number of rules generated by the classifier.
d) The predictive mapping function's complexity.

9. In classification, what does the term "reducible error" primarily refer to? (1 Mark)

a) Error due to randomness


b) Error caused by the classifier model
c) Error that can be minimized by better learning techniques
d) Error that cannot be reduced

10. In a medical study evaluating a diagnostic test for a certain disease, 150 patients were
tested. Of these, 90 patients were diagnosed with the disease, while 60 patients did
not have the disease. The model predictions are as follows:

Choose the correct option that represents the error rate of the diagnostic test based on
the provided classification outcomes. (2 Marks)

a) 0.25
b) 0.2
c) 0.15
d) 0.18

11. Overfitting occurs when a classifier incorporates anomalies of the training data that are
not present in the general dataset. (True/False) (1 Mark)

12. In unsupervised learning, for every observation i = 1, ..., n, we observe a vector of
measurements x_i but no associated response y_i. (True/False) (1 Mark)
13. What is the lift obtained by a marketing team if, without data mining, they achieve a
15% response rate by randomly selecting 20% of potential customers, while with
predictive analytics, they target 20% of likely customers and achieve a response rate
of 25%? (2 Marks)
a) 2.5
b) 1.67
c) 3.25
d) 6.67
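
Lift in question 13 compares the response rate of the targeted group with the baseline response rate. The sketch assumes that definition (targeted rate divided by baseline rate) and uses the figures from the question.

```python
def lift(targeted_response_rate, baseline_response_rate):
    """Lift = response rate with the model / response rate without it."""
    if baseline_response_rate <= 0:
        raise ValueError("baseline response rate must be positive")
    return targeted_response_rate / baseline_response_rate


# Question 13: 25% response with predictive analytics vs 15% at random.
campaign_lift = round(lift(0.25, 0.15), 2)
print(campaign_lift)
```

Both groups cover the same 20% of potential customers, so the share contacted cancels out and only the two response rates matter.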

14. Choose the correct answer: (1 Mark)

1. K-nearest neighbours or KNN is an unsupervised classification algorithm.
2. K-means Clustering is a supervised classification algorithm.
a) 1 and 2 are correct
b) Only 1 is correct
c) Only 2 is correct
d) Both are wrong

15. Which of the following is NOT a commonly used classification technique? (1 Mark)
a) Decision trees
b) Logistic regression
c) K-nearest neighbours (KNN)
d) Principal component analysis (PCA)

BUSINESS INTELLIGENCE AND ANALYTICS

ASSIGNMENT WEEK 7:

Total marks = 17 marks
14 Qns * 1 mark = 14 marks
1 Qn * 3 marks = 3 marks

1. What is a key advantage of decision trees in knowledge representation? (1 Mark)


a) They possess a complex representation
b) They have a slow learning process
c) They represent acquired knowledge in an intuitive tree form
d) They handle only unidimensional data

2. What does an internal node in a decision tree represent? (1 Mark)


a) Class label
b) Outcome of a test on an attribute
c) Unknown tuple
d) Root node

3. Why might a decision tree, resulting from the described process, perform poorly on a
test set? (1 Mark)
a) Due to too few splits leading to underfitting
b) Because it has too few leaves
c) It's likely to have too many splits, causing overfitting
d) It has high bias but low variance
4. What might a smaller tree with fewer splits achieve in terms of variance and bias? (1
Mark)
a) It reduces both variance and bias
b) It reduces variance but possibly increases bias
c) It increases both variance and bias
d) It doesn't affect variance or bias

5. How does a classification tree differ from a regression tree? (1 Mark)


a) It predicts a quantitative response, not a qualitative one.
b) It predicts a qualitative response, not a quantitative one.
c) It's less accurate than a regression tree.
d) It always predicts the mean response.

6. Which of the following is an advantage of decision trees compared to linear
regression? (1 Mark)
a) Decision trees are more accurate in predicting continuous outcomes.
b) Decision trees are easier to explain and interpret.
c) Decision trees require the creation of dummy variables for qualitative predictors.
d) Decision trees are less affected by outliers.

7. Ensemble methods are used to improve the prediction performance of decision trees.
(T/F) (1 Mark)

8. Bagging primarily addresses which issue within statistical learning methods like
decision trees? (1 Mark)
a) Increases the computational complexity of the models.
b) Reduces the need for accurate parameter tuning in models.
c) Deals with high variance and improves prediction accuracy.
d) Reduces high bias and improves prediction accuracy.

9. Which technique involves averaging predictions from multiple models built on
bootstrapped training sets? (1 Mark)
a) Decision tree pruning
b) Bootstrap aggregation (bagging)
c) Recursive feature elimination
d) Ridge regression

10. How does bagging handle classification problems? (1 Mark)


a) By averaging predictions from different classifiers
b) By constructing decision trees without bootstrapping
c) By taking a majority vote from predictions of multiple trees
d) By using only a single tree for classification
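
The two ingredients of bagging referred to in questions 9 and 10 are bootstrap sampling and aggregation of predictions. The sketch below shows only those two steps; the data, the fixed seed, and the per-tree predictions are hypothetical, and no actual trees are trained.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Classification: return the most common label among per-tree predictions."""
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Regression: return the mean of the per-tree predictions."""
    return sum(predictions) / len(predictions)


rng = random.Random(42)          # fixed seed so the sketch is repeatable
data = [10, 20, 30, 40, 50]      # hypothetical training observations
sample = bootstrap_sample(data, rng)

label = majority_vote(["yes", "no", "yes"])   # hypothetical tree outputs
value = average([2.0, 3.0, 4.0])
print(sample, label, value)
```

A bootstrap sample has the same size as the original data but usually repeats some observations and omits others, which is what makes the individual trees differ.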

11. In bagging, the individual trees are independent of each other because they consider
different subsets of features and samples. (T/F) (1 Mark)

12. What are some common techniques for handling imbalanced data in classification
tasks? (1 Mark)
a) Oversampling the minority class to create a more balanced dataset.
b) Undersampling the majority class to reduce its dominance.
c) Only a is correct
d) Both a and b are correct
13. In a random forest, you can generate hundreds of trees (say T1, T2, ..., Tn) and then
aggregate the results of these trees. Which of the following is true about an individual
tree (Tk) in a random forest? (1 Mark)

1. An individual tree is built on a subset of the features
2. An individual tree is built on all the features
3. An individual tree is built on a subset of observations
4. An individual tree is built on the full set of observations

a) 1 and 3
b) 1 and 4
c) 2 and 3
d) 2 and 4

14. Consider a dataset with a binary target variable (0 or 1) and a split based on a
feature resulting in two child nodes after the split.

• Node 1 (left child): Out of 40 samples, 30 belong to class 0 and 10 belong to class 1.
• Node 2 (right child): Out of 60 samples, 20 belong to class 0 and 40 belong to class
1.

Which option has the correct Gini indices of the child nodes? (3 Marks)

a) Gini index for Node 1: 0.375, Gini index for Node 2: 0.444
b) Gini index for Node 1: 0.375, Gini index for Node 2: 0.320
c) Gini index for Node 1: 0.425, Gini index for Node 2: 0.320
d) Gini index for Node 1: 0.444, Gini index for Node 2: 0.375
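
As a worked check for question 14, the Gini index of a node can be computed as one minus the sum of squared class proportions. The sketch below applies that standard definition to the class counts given for the two child nodes; the function name gini_index is hypothetical and used only for illustration.

```python
def gini_index(class_counts):
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


# Class counts (class 0, class 1) taken from the two child nodes in question 14.
node1_gini = gini_index([30, 10])
node2_gini = gini_index([20, 40])

print(round(node1_gini, 3), round(node2_gini, 3))
```

A pure node (all samples in one class) gives 0, and an evenly split binary node gives 0.5, which is the largest value possible with two classes.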

15. How does Random Forest aim to reduce correlation among trees? (1 Mark)
a) By constructing trees sequentially based on the residuals.
b) By growing trees independently with a random subset of predictors at each split.
c) By fitting trees to the residuals from the current model.
d) By sequentially building trees using information from previously grown trees.

BUSINESS INTELLIGENCE AND ANALYTICS

ASSIGNMENT WEEK 8:

Total marks = 15 Marks

15 Qns * 1 mark = 15 marks

1. Which of the following is a common method for splitting nodes in a decision tree? (1
Mark)

A. Gini impurity
B. Cross-validation
C. Gradient descent
D. Principal component analysis

2. What is the purpose of pruning in decision trees? (1 Mark)

A. To reduce the depth of the tree and prevent overfitting
B. To optimize the tree's parameters
C. To handle missing data
D. To reduce the depth of the tree and prevent underfitting

3. Which of the following is a popular algorithm for constructing decision trees? (1 Mark)
A. ID3
B. k-Nearest Neighbors
C. Support Vector Machines
D. Naive Bayes

4. What is the main difference between classification and regression trees in a CART
algorithm? (1 Mark)
A. Classification trees predict categorical variables, while regression trees predict
continuous variables
B. Classification trees use Gini impurity as the splitting criterion, while regression trees
use information gain
C. Classification trees can handle missing data, while regression trees cannot
D. Classification trees are computationally expensive, while regression trees are
computationally inexpensive

5. Consider the following statements:
a) The term Boosting stands for bootstrap aggregation.
b) Bagging is less susceptible to model overfitting as compared to Boosting.
Which of the above statements are correct? (1 Mark)
A. Only a
B. Only b
C. Both a and b
D. Neither a nor b

6. What is entropy in the context of decision trees? (1 Mark)

A. A measure of disorder or impurity in a node
B. A measure of the complexity of a decision tree
C. The difference between the predicted and actual values in a node
D. The rate at which information is gained in a decision tree
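
To make the idea of node impurity in question 6 concrete, here is a minimal sketch of entropy for a node, assuming the usual Shannon definition with a base-2 logarithm (the assignment itself does not state a formula). The function name entropy and the example counts are illustrative assumptions.

```python
import math


def entropy(class_counts):
    """Shannon entropy (in bits) of the class distribution in a node."""
    total = sum(class_counts)
    # Classes with zero samples contribute nothing, so they are skipped.
    probabilities = [count / total for count in class_counts if count > 0]
    return sum(-p * math.log2(p) for p in probabilities)


pure_node = entropy([10, 0])    # every sample belongs to one class
mixed_node = entropy([5, 5])    # evenly split between two classes
skewed_node = entropy([30, 10])

print(pure_node, mixed_node, round(skewed_node, 3))
```

Under this definition a pure node has zero entropy and an evenly split binary node has the maximum of one bit.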

7. Which of the following is a common stopping criterion for growing a decision tree? (1
Mark)

A. Reaching a maximum depth
B. Achieving a minimum information gain
C. Achieving a minimum Gini impurity
D. Both A and B

8. For decision trees, what purpose does "one-hot encoding" serve? (1 Mark)
A. Handling missing data
B. Transform categorical data into numerical format that algorithms can process
C. Normalizing continuous variables
D. Reducing the dimensionality of the feature space

9. What's the primary drawback of utilizing a substantial maximum depth for a decision
tree? (1 Mark)
A. It leads to overfitting
B. It cannot capture the noise in the training data
C. It simplifies the computational complexity of the tree
D. It results in the tree underfitting the data

10. Which strategy is effective in mitigating overfitting in decision trees? (1 Mark)

A. Pruning
B. Bagging
C. Boosting
D. All of the above

11. Which of the following is NOT commonly associated with the use of decision trees? (1
Mark)

A. Fraud detection
B. Stock price prediction

C. Customer churn prediction

D. Image classification

12. How can decision trees be made more robust to noise in the data? (1 Mark)

A. By increasing the maximum depth of the tree

B. By using a smaller minimum samples per leaf

C. By using ensemble techniques like bagging or boosting

D. By removing features with low importance

13. What role do leaf nodes play in a decision tree? (1 Mark)

A. To hold the conditions for data splitting

B. To represent the importance of a feature

C. To signify the depth of the tree

D. To denote the class label or predicted value

14. If the true positive value is 10 and the false positive value is 15, what is the precision score
for the classification model? (1 Mark)

A. 0.6

B. 0.4

C. 0.5

D. None of the above
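
Precision in question 14 is the share of predicted positives that are truly positive, TP / (TP + FP). A minimal sketch using the values from the question follows; the function name precision is an illustrative choice.

```python
def precision(true_positives, false_positives):
    """Fraction of predicted positives that are actually positive."""
    return true_positives / (true_positives + false_positives)


# Values from question 14: 10 true positives and 15 false positives.
score = precision(10, 15)
print(score)
```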

15. Which of the following definitions describes false negatives? (1 Mark)


A. Predicted negatives that are actually positives
B. Predicted positives that are actually negatives
C. Predicted negatives that are actually negatives
D. Predicted positives that are actually positives

BUSINESS INTELLIGENCE AND ANALYTICS

ASSIGNMENT WEEK 9:

Total marks = 15 Marks

15 Qns * 1 mark = 15 marks

1. What is a key challenge associated with unsupervised learning? (1 Mark)


a) Lack of data availability
b) Subjectivity and absence of a clear analysis goal
c) Overfitting issues
d) Limited model interpretability

2. For clustering, we do not require: (1 Mark)

A. Labeled data
B. Unlabeled data
C. Numerical data
D. Categorical data

3. Which of the following is an example of an unsupervised learning algorithm? (1 Mark)


A. Linear regression
B. Logistic regression
C. K-means clustering
D. Support vector machines

4. What distinguishes K-means clustering from hierarchical clustering? (1 Mark)


A) K-means clustering produces a tree-like representation, while hierarchical
clustering uses pre-specified clusters.

B) K-means clustering requires knowing the number of clusters beforehand, while
hierarchical clustering does not.

C) K-means clustering creates a dendrogram, while hierarchical clustering creates
distinct clusters.

D) There is no difference between K-means and hierarchical clustering techniques.

5. What does a dendrogram represent in hierarchical clustering? (1 Mark)


A) A scatter plot of feature clusters

B) A visual display of K-means clustering

C) A tree-like structure showing clustering at various levels

D) A linear representation of data distributions

6. Which of the following is a method of choosing the optimal number of clusters for
k-means? (1 Mark)

A. The shadow method
B. The silhouette method
C. The elbow method
D. Both B and C

7. Which of the following statements best describes the goal of the SMOTE preprocessing
technique? (1 Mark)
a) Reduce the dimensionality of the data
b) Balance the class distribution in imbalanced datasets
c) Improve the interpretability of a machine learning model
d) Detect outliers in the data

8. What defines a good clustering according to the K-means approach? (1 Mark)


A) Maximizing the distance between clusters
B) Minimizing the total number of observations in each cluster
C) Minimizing the sum of squared distances within each cluster
D) Maximizing the number of overlapping observations between clusters

9. Which of the following is a limitation of K-means clustering? (1 Mark)

A. Sensitivity to the initial placement of cluster centroids
B. Inability to handle missing data
C. Inability to handle categorical data
D. All of the above

10. Which of the following statements about distance between clusters is true? (1 Mark)
A) Single linkage computes distances between cluster centroids.
B) Complete linkage uses the average similarity of all objects within clusters.
C) Single linkage calculates the distance between individual objects in different
clusters.
D) Complete linkage considers the maximum distance between objects in different
clusters.
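
The linkage definitions in question 10 can be compared with a small sketch. It assumes Euclidean distance between points and uses two made-up clusters (cluster_a and cluster_b are hypothetical data, not taken from the assignment): single linkage takes the smallest pairwise distance between objects in different clusters, and complete linkage takes the largest.

```python
import math
from itertools import product


def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distances between every point in one cluster and every point in the other."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    """Distance between the closest pair of objects in different clusters."""
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    """Distance between the farthest pair of objects in different clusters."""
    return max(pairwise_distances(cluster_a, cluster_b))


# Hypothetical clusters on a line, chosen so the distances are easy to verify by hand.
cluster_a = [(0, 0), (1, 0)]
cluster_b = [(4, 0), (6, 0)]

single = single_linkage(cluster_a, cluster_b)      # closest pair: (1, 0) and (4, 0)
complete = complete_linkage(cluster_a, cluster_b)  # farthest pair: (0, 0) and (6, 0)
print(single, complete)
```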

11. In a 3-dimensional space represented by coordinates (x, y, z), two cluster centroids,
A and B, have coordinates A(2, 4, 6) and B(5, 1, 3) respectively. What is the precise
Euclidean distance between these centroids, denoting their dissimilarity in the cluster
space? (1 Mark)
A) 5.20 units

B) 3.00 units

C) 4.36 units

D) 6.48 units
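
The Euclidean distance in question 11 is the square root of the summed squared coordinate differences. A short sketch using the two centroids from the question:

```python
import math

# Centroid coordinates from question 11.
centroid_a = (2, 4, 6)
centroid_b = (5, 1, 3)

# Square root of the sum of squared differences along each axis.
distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(centroid_a, centroid_b)))
print(round(distance, 2))
```

Each coordinate differs by 3, so the distance is the square root of 27.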

12. In K-means clustering, what is the purpose of the "elbow method"? (1 Mark)
A. To determine the optimal number of clusters
B. To identify the best distance metric
C. To select the best initialization method
D. To determine the convergence criteria

13. Suppose that a customer transaction table contains 9 items and 3 customers. What
is the Jaccard coefficient (similarity measure for asymmetric binary variables) for C1
and C2? (1 Mark)

a) 0.75
b) 0.25
c) 0.35
d) 0.85

14. In the figure below, if you draw a horizontal line at y = 2 on the y-axis, how many
clusters will be formed? (1 Mark)

Options:
A. 1
B. 2
C. 3
D. 4

15. Assume you want to cluster 7 observations into 3 clusters using the K-Means
clustering algorithm. After the first iteration, clusters C1, C2, and C3 have the following
observations:

C1: {(2,2), (4,4), (6,6)}
C2: {(0,4), (4,0)}
C3: {(5,5), (9,9)}

What will be the Manhattan distance of observation (9, 9) from the centroid of cluster C1 in the
second iteration? (1 Mark)

Options:
A. 10
B. 5*sqrt(2)
C. 13*sqrt(2)
D. None of these
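
Question 15 involves two steps: recomputing the centroid of C1 as the mean of its assigned points, then measuring the Manhattan (city-block) distance from (9, 9) to that centroid. The sketch below works through both steps; the helper names centroid and manhattan are illustrative.

```python
def centroid(points):
    """Coordinate-wise mean of the points assigned to a cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def manhattan(p, q):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Observations assigned to C1 after the first iteration (from question 15).
c1_points = [(2, 2), (4, 4), (6, 6)]
c1_centroid = centroid(c1_points)

distance = manhattan((9, 9), c1_centroid)
print(c1_centroid, distance)
```
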
BUSINESS INTELLIGENCE AND ANALYTICS

ASSIGNMENT WEEK 10:

Total marks = 15 Marks

15 Qns * 1 mark = 15 marks

1. What does RFM stand for in customer segmentation strategy? (1 Mark)


A) Recency, Frequency, Management
B) Recency, Frequency, Monetary value
C) Revenue, Frequency, Management
D) Recency, Firmness, Money

2. Which of the following is NOT true about RFM analysis? (1 Mark)


A) It requires detailed demographic data.
B) It helps in predictive modelling techniques like churn modelling.
C) It is useful for targeting mails.
D) Customers with high scores in all three categories are likely to be the most loyal and
profitable.

3. How are Butterflies categorized in RFM segmentation? (1 Mark)


A) Highly loyal but not very profitable
B) Profitable but disloyal customers
C) Highly loyal and profitable
D) Customers who generate neither loyalty nor profits

4. Which customer group is likely to need incentives to increase their spending and
engagement? (1 Mark)
A) True Friends
B) Butterflies
C) Barnacles
D) Strangers

5. What does CLV stand for in RFM analyses? (1 Mark)
A) Customer Lifetime Value
B) Customer Loyalty Value
C) Customer Longevity Value
D) Customer Lifetime Volume

6. How are R, F, and M typically combined to create composite scores in some methods?
(1 Mark)
A) Adding R, F, and M directly
B) Multiplying R by 5, F by 2, and M by 1
C) Dividing R, F, and M by a constant
D) Subtracting R from F and then adding M

7. What SQL function is used for RFM analysis to scale RFM into a predefined range?
(1 Mark)
A) GROUP BY

B) AVG()

C) NTILE()

D) MAX()
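
For intuition about how values are scaled into a fixed range of scores (questions 7 and 11), the sketch below mimics in plain Python what the SQL window function NTILE(n) does: it orders the rows and splits them into n buckets of nearly equal size, with earlier buckets taking any extra rows. The data, the function name ntile, and the choice to reverse the recency tiles so that recent customers score highest are all illustrative assumptions, not part of the assignment.

```python
def ntile(values, n):
    """Assign each value a bucket 1..n, like SQL NTILE(n) over ascending order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    base, extra = divmod(len(values), n)
    buckets = [0] * len(values)
    position = 0
    for bucket in range(1, n + 1):
        # The first `extra` buckets receive one additional row each.
        size = base + (1 if bucket <= extra else 0)
        for i in order[position:position + size]:
            buckets[i] = bucket
        position += size
    return buckets


# Hypothetical "days since last order" for ten customers.
days_since_last_order = [3, 40, 12, 90, 7, 365, 25, 60, 1, 180]

tiles = ntile(days_since_last_order, 5)
# Assumed convention: fewer days since the last order means a higher recency score.
recency_score = [6 - tile for tile in tiles]
print(tiles)
print(recency_score)
```
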
8. In RFM analysis what does "Recency" refer to? (1 Mark)

A) The total amount a customer spends

B) The number of purchases made by a customer

C) The time elapsed since a customer's last purchase

D) The predicted net profit from a customer over a future horizon

9. What is the purpose of the fit_predict method in scikit-learn's K-means
implementation? (1 Mark)
A) Training the model
B) Predicting cluster labels
C) Evaluating model performance
D) Visualizing data distribution

10. Which Python package provides functionality for visualizing K-means clustering
results using 2D and 3D plots?
A) seaborn
B) matplotlib
C) pandas
D) scikit-learn

11. How is Recency (R) scaled after grouping Days since last order into 10 deciles? (1
Mark)
a) It is scaled from 1-5 for better representation
b) It is reversed, with the most recent customer receiving the highest R value
c) It is not scaled as it represents the number of days directly
d) It is scaled logarithmically for better clustering

12. Which clustering algorithm assigns data points to the nearest cluster centroid? (1 Mark)
a) K-Means
b) DBSCAN
c) Agglomerative
d) Mean-Shift

13. A retail company wants to segment its customers for targeted marketing campaigns.
They have data on customer demographics (age, gender, income), purchase history
(amount, frequency, categories), and online behaviour (website visits, clicks). Which
features are most suitable for k-means clustering in this scenario? (1 Mark)
a) Demographics only (age, gender, income)
b) Purchase history only (amount, frequency, categories)
c) Online behaviour only (website visits, clicks)
d) A combination of all features

14. True or False: In K-means clustering, each cluster is represented by its center (centroid) which
corresponds to the median of points assigned to the cluster.

15. Of the reasons listed below, which would be a major reason not to choose
K-means for clustering analysis? (1 Mark)
a) It is sensitive to noise and outlier data points and also sensitive to the initial placement
of its cluster centers.
b) It always leads to complex cluster formation due to the unequal sizes of the clusters formed.
c) The inter-cluster distance is high for K-means clustering.
d) The accuracy of the model is low compared to other clustering methods.

BUSINESS INTELLIGENCE AND ANALYTICS

ASSIGNMENT WEEK 11:

Total marks = 15 Marks

15 Qns * 1 mark = 15 marks

1. Which of the following is not a type of layer in a neural network? (1 Mark)


A. Input layer
B. Hidden layer
C. Output layer
D. Support layer

2. What is the process of adjusting control parameters to optimize a neural network's
performance called? (1 Mark)
A. Regularization
B. Hyperparameter tuning
C. Gradient descent
D. Feature scaling

3. What is the purpose of the learning rate in a neural network? (1 Mark)


A. To control the speed of weight updates
B. To determine the number of layers
C. To set the activation function
D. To initialize the weights

4. What is the purpose of the loss function in a neural network? (1 Mark)


A. To measure the accuracy of the model
B. To update the weights
C. To compute the gradient
D. To measure the difference between predicted output and actual output

5. What does the term 'backpropagation' refer to in neural networks? (1 Mark)

A) Forward movement of information in a neural network

B) Fine-tuning the weights by propagating errors backward

C) Activation of output neurons


D) Weight initialization process

6. Which algorithm is commonly used for updating weights in backpropagation? (1 Mark)

A) Gradient Descent

B) K-Means

C) Random Forest

D) Principal Component Analysis

7. What does the term 'epoch' refer to in neural network training? (1 Mark)

A) A type of activation function


B) Number of layers in a network

C) One complete cycle of training data through the network

D) A method for weight initialization

8. What is a perceptron? (1 Mark)

A. A single-layer feed-forward neural network
B. An auto-associative neural network
C. A double-layer auto-associative neural network
D. A neural network that contains feedback

9. Which of the following best defines cross-sectional data? (1 Mark)


A) Data collected over different time periods from the same subjects
B) Data collected from a single point in time from different subjects
C) Data collected from the same subjects over multiple time points
D) Data collected from a specific population at regular intervals

10. If a neural network has 16 input neurons and 4 output neurons, how many neurons
would be recommended for the hidden layer according to the rule of thumb? (1 Mark)
A) 8 neurons
B) 4 neurons
C) 2 neurons
D) 12 neurons

11. If you increase the number of hidden layers in a multi-layer perceptron, the
classification error on test data always decreases. (True/False) (1 Mark)

12. There is a feedback loop in the final stage of a backpropagation algorithm. (T/F) (1
Mark)

13. In time series analysis, which component represents the long-term movement or the
general direction of the data? (1 Mark)

A) Seasonality

B) Cyclical variations

C) Trend

D) Residual or noise

14. What defines panel data in econometric studies? (1 Mark)


A) Data that involve repeated multi-dimensional observations of the same subjects
over different periods of time
B) Same as a cohort study
C) Repeated observations at the same time
D) All of the above

15. What differentiates a feedforward neural network from other types of neural networks
like recurrent neural networks (RNNs) or convolutional neural networks (CNNs)? (1
Mark)
A) It incorporates feedback connections.
B) It's designed specifically for sequential data.
C) Information flows in a single direction, without loops or cycles.
D) It employs pooling layers for feature extraction.

BUSINESS INTELLIGENCE AND ANALYTICS

ASSIGNMENT WEEK 12:

Total marks = 15 Marks

15 Qns * 1 mark = 15 marks

1. What is a token in text mining? (1 Mark)


a) A sequence of N words

b) A string of contiguous alphanumeric characters with space on either side

c) A collection of M documents

d) A semantic unit representing a concept or idea

2. What does stemming or lemmatization aim to achieve in text processing? (1 Mark)

a) Grouping words into documents

b) Identifying tokens in a text

c) Normalization by reducing different word forms into a single stem

d) Sequencing words in a corpus

3. Which of the following defines a document in text mining? (1 Mark)

a) A collection of M files

b) A sequence of N words

c) Identification of tokens in a text

d) Grouping words into stems

4. What does the term "Corpus" refer to in text mining? (1 Mark)

A. A punctuation mark

B. A lemmatization process

C. A collection of M documents

D. A string of contiguous alphanumeric characters


5. Which type of words are typically considered for removal or stop word lists in
text mining? (1 Mark)

A. Uncommon words
B. Words with high term frequency
C. Words with low term frequency
D. Commonly used words like "the," "is," "of," etc.

6. How is the inverse document frequency (idf) calculated for a given term? (1
Mark)

A. idf(term) = log(n_documents / n_documents containing term)
B. idf(term) = log(n_words / n_words containing term)
C. idf(term) = n_documents / n_documents containing term
D. idf(term) = n_words / n_words containing term
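
As an illustration of inverse document frequency, the sketch below uses the common logarithmic form: the log of the total number of documents divided by the number of documents containing the term. The natural logarithm, the tiny corpus, and the function name idf are assumptions made for illustration; the sketch also assumes the term occurs in at least one document.

```python
import math


def idf(term, documents):
    """Log of (number of documents / number of documents containing the term)."""
    n_containing = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / n_containing)


# Hypothetical corpus: each document is represented as a set of its words.
documents = [
    {"forest", "wildlife"},
    {"forest", "river"},
    {"city", "river"},
    {"forest"},
]

common_term = idf("forest", documents)  # appears in 3 of 4 documents
rare_term = idf("city", documents)      # appears in 1 of 4 documents
print(round(common_term, 3), round(rare_term, 3))
```

A term that appears in fewer documents receives a larger idf, which is what lets tf-idf down-weight words that are common across the corpus.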

7. True or False: The statistic tf-idf is intended to measure how important a word is to
a document in a collection (or corpus) of documents

8. How can bigrams be beneficial in text analysis compared to individual words?
(1 Mark)
a) Bigrams always have higher frequency counts than individual words
b) Bigrams offer more context and capture structural relationships between words
c) Bigrams are easier to visualize in network plots
d) Bigrams are better than n-grams

9. What does a higher Phi coefficient value indicate regarding word co-occurrence?
(1 Mark)

a) Given words occur independently of each other


b) Given words always appear together in the same documents
c) Given words rarely appear together in documents
d) Given words are more likely to co-occur compared to appearing separately

10. In a text corpus comprising 200 documents, neither "forest" nor "wildlife" appears in
120 documents. Both "forest" and "wildlife" co-occur in 50 documents. Furthermore, "forest" without
"wildlife" appears in 10 documents, and "wildlife" without "forest" appears in 20 documents. What is
the Phi coefficient measuring the correlation between the appearances of the words "forest" and
"wildlife" in this dataset?
A. 0.19
B. 0.66
C. 0.72
D. 0.85
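
The Phi coefficient in question 10 can be worked out from the 2x2 table of document counts. The sketch below uses the standard formula for a 2x2 contingency table, (n11 * n00 - n10 * n01) divided by the square root of the product of the four marginal totals; the function and argument names are illustrative.

```python
import math


def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient for a 2x2 table of co-occurrence counts.

    n11: both words present, n10: only the first word present,
    n01: only the second word present, n00: neither word present.
    """
    first_present = n11 + n10
    first_absent = n01 + n00
    second_present = n11 + n01
    second_absent = n10 + n00
    numerator = n11 * n00 - n10 * n01
    denominator = math.sqrt(
        first_present * first_absent * second_present * second_absent
    )
    return numerator / denominator


# Counts from question 10 ("forest" is the first word, "wildlife" the second).
phi = phi_coefficient(n11=50, n10=10, n01=20, n00=120)
print(round(phi, 2))
```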

11. Which of the following datasets provides a polarity score ranging from -5 to +5 for
words in sentiment analysis? (1 Mark)

A. Bing sentiment dataset


B. NRC sentiment dataset
C. AFINN sentiment dataset
D. VADER sentiment dataset

12. What does a cosine similarity value of 0 indicate when comparing term frequency
vectors of two documents? (1 Mark)

A. The documents have no common words


B. The documents are perfectly similar
C. The documents are completely dissimilar
D. The documents are orthogonal
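
To see what a cosine similarity of 0 looks like for term frequency vectors (question 12), here is a minimal sketch. The four-word vocabulary and the document vectors are made up for illustration, and the function assumes neither vector is all zeros.

```python
import math


def cosine_similarity(u, v):
    """Dot product of two vectors divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical term counts over the vocabulary: forest, wildlife, city, traffic.
doc_a = [2, 1, 0, 0]
doc_b = [0, 0, 3, 1]   # shares no words with doc_a
doc_c = [4, 2, 0, 0]   # same word proportions as doc_a

no_overlap = cosine_similarity(doc_a, doc_b)
same_direction = cosine_similarity(doc_a, doc_c)
print(no_overlap, round(same_direction, 6))
```

Because term counts are never negative, the dot product can only be zero when the two documents have no word in common.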

13. In TF-IDF analysis, what does the term frequency (tf) measure for a word in a
document? (1 Mark)

A. The count of the word in a document divided by the total words in that document
B. The count of the word in a document divided by the count of the word in the entire
corpus
C. The number of documents containing the word divided by the total number of
documents
D. The logarithm of the total number of documents divided by the number of
documents containing the term

14. A bag-of-words model uses: (1 Mark)


A. A vocabulary of known words
B. A measure of the presence of known words
C. Both A and B
D. None

15. Which of these techniques is used for normalization in text mining? (1 Mark)

A. Stemming
B. Stop words removal
C. Lemmatization
D. All of the above
