
BCS602 Model Question Paper Solved (Search Creators)

This document is a model question paper for the Machine Learning course at Visvesvaraya Technological University, covering various topics such as definitions, types of machine learning, data preprocessing, and algorithms like PCA and decision trees. It includes questions from different modules, requiring students to demonstrate their understanding of concepts and apply algorithms to given datasets. The paper is structured to assess knowledge through a mix of theoretical and practical applications in machine learning.

Uploaded by

nayakbhuvangame

Visvesvaraya Technological University (VTU)

Subject Code: BCS602

Subject: Machine Learning

Created By:

Hanumanthu

Dedicated To.
All CSE Engineering Students

📺 YouTube: https://www.youtube.com/@searchcreators7348
📸 Instagram: https://www.instagram.com/searchcreators/
📱 Telegram: https://t.me/SearchCreators
💬 WhatsApp: +917348878215
BCS602

Model Question Paper-1 with effect from 2022 (CBCS Scheme)


USN

Sixth Semester B.E. Degree Examination


Machine Learning

TIME: 03 Hours Max. Marks: 100

Note: 01. Answer any FIVE full questions, choosing at least ONE question from each MODULE.

Module-1
(Columns: Bloom’s Taxonomy Level, Marks)

Q.01 a Define Machine Learning. Explain its relationship to other fields with a diagram. (L2, 10 marks)
Machine learning is a branch of artificial intelligence that enables algorithms to
uncover hidden patterns within datasets. This allows them to make predictions on
new, similar data without being explicitly programmed for each task.

Relationship of Machine Learning to Other Fields


1. Artificial Intelligence (AI):
Machine Learning (ML) is a subset of AI that enables systems to learn
from data and improve over time without being explicitly programmed.
While AI is a broader field aiming to simulate human intelligence, ML
focuses on using data-driven algorithms to make decisions.

2. Statistics:
ML heavily relies on statistical techniques to analyze and interpret data. It
uses concepts such as probability, regression, and hypothesis testing to
find patterns and relationships in data and to make predictions.

3. Data Mining:
Both ML and data mining involve analyzing large datasets to find useful
patterns. However, data mining is more about discovering unknown
insights, while ML emphasizes building predictive models that generalize
to new data.

4. Data Science:
ML is a key component of data science. Data scientists use ML algorithms
to build models, make predictions, and automate decisions across fields
like business, healthcare, and finance. ML helps in turning raw data into
actionable insights.

5. Deep Learning:
Deep Learning is a subfield of ML that uses artificial neural networks
with multiple layers. It excels in handling complex data like images,
audio, and text, powering applications such as face recognition, language
translation, and autonomous driving.
b Explain different types of machine learning with a diagram. (L2, 10 marks)

Main Types of Machine Learning:


• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
• Semi-Supervised Learning (mix of supervised and unsupervised)

Supervised Learning
• Learns from labeled data (input-output pairs)
• Predicts output from known input
• Example: Classifying animals as cats or dogs
• Classification algorithms: Logistic Regression, SVM, Decision Tree, KNN,
Random Forest, Naive Bayes
• Regression algorithms: Linear Regression, Polynomial Regression, Ridge,
Lasso, Random Forest
• Advantages: High accuracy when labels are reliable, interpretable results
(especially for simpler models), can build on pre-trained models
• Disadvantages: Requires large labeled datasets; may generalize poorly to
inputs that differ from the training data
• Applications: Spam detection, medical diagnosis, image/speech
recognition, fraud detection, weather forecasting, stock prediction, credit
scoring
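To make "learns from labeled input-output pairs" concrete, here is a minimal sketch of a supervised classifier. It uses a nearest-centroid rule (one centroid per class, predict the closest), which is simpler than the algorithms listed above, and the cat/dog measurements are hypothetical.

```python
import math

def train_nearest_centroid(samples):
    """Learn one centroid (mean feature vector) per class label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Predict the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

# Hypothetical labeled data: (weight in kg, height in cm) -> animal
training = [
    ((4.0, 25.0), "cat"), ((3.5, 23.0), "cat"), ((5.0, 28.0), "cat"),
    ((20.0, 55.0), "dog"), ((25.0, 60.0), "dog"), ((18.0, 50.0), "dog"),
]
model = train_nearest_centroid(training)
print(predict(model, (4.2, 24.0)))   # cat
print(predict(model, (22.0, 58.0)))  # dog
```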

Unsupervised Learning
• Learns from unlabeled data
• Discovers hidden patterns or structure
• Example: Grouping customers by behavior
• Clustering algorithms: K-Means, DBSCAN, Mean-Shift
• Dimensionality reduction algorithms: PCA, ICA
• Association algorithms: Apriori, FP-Growth, ECLAT
• Advantages: No labeled data needed, finds unknown patterns, good for
exploration
• Disadvantages: Hard to interpret, may produce unclear groupings
• Applications: Customer segmentation, market basket analysis, anomaly
detection, genomics, image compression, recommendations

Reinforcement Learning
• Learns through interaction with environment using rewards/penalties
• Goal: Maximize long-term rewards
• Example: AI learning to play chess
• Types: Positive reinforcement, Negative reinforcement
• Algorithms: Q-learning, SARSA, Deep Q-learning (DQN)
• Advantages: Handles dynamic environments, good for long-term strategy,
can outperform humans in some tasks (e.g., chess and Go)
• Disadvantages: Requires lots of data, time, and computing power
• Applications: Game AI, self-driving cars, robotics, chatbots, finance,
healthcare treatment planning
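Q-learning, listed above, can be sketched on a toy problem. The corridor environment, the reward of +1 at the goal, and the values of alpha (learning rate), gamma (discount factor) and epsilon (exploration rate) are all assumptions chosen for illustration; the update rule is the standard Q-learning rule.

```python
import random

# Hypothetical environment: a corridor of 5 cells (states 0..4).
# The agent starts at state 0; reaching state 4 gives reward +1 and ends the episode.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # move left, move right

def step(state, action):
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection (ties broken at random)
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            target = reward if done else reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)  # expected: move right (+1) in every non-goal state
```

The reward arrives only at the goal, yet the update propagates it backwards so that earlier states also learn to move right (delayed reward).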
OR
Q.02 a Define data. Explain the 6 V’s of Big Data. (L2, 10 marks)
➢ Data refers to facts and information that can be recorded, stored, and
processed by a computer system.
➢ Data can be in the form of numbers, text, images, audio, or video.
➢ Some data like numbers or text is directly human-readable.
➢ Other types like images, audio, and video are interpreted by computers
using algorithms.

6 V’s of Big Data


1. Volume
Refers to the huge amount of data generated every second from various
sources like social media, sensors, transactions, and IoT devices.
2. Velocity
Refers to the speed at which data is generated, collected, and processed to
meet demand. Example: real-time processing of online transactions or
sensor data.
3. Variety
Refers to the different types and formats of data — structured, semi-
structured, and unstructured (e.g., text, images, audio, video).
4. Veracity
Refers to the reliability and accuracy of data. It deals with data quality,
inconsistencies, noise, and trustworthiness.
5. Value
Refers to the usefulness of the data. Big data should provide value to
organizations through insights, trends, and decision-making.
6. Variability
Refers to the inconsistency of data flow. Data loads may vary greatly with
time, affecting processing and analysis.

b Explain data preprocessing with an example. (L2, 10 marks)

Data Preprocessing
Data preprocessing is the process of cleaning, transforming, and organizing raw
data into a suitable format for analysis or machine learning models. It helps
improve the quality of data and ensures better results from models.

Steps in Data Preprocessing


• Data Cleaning: Handling missing values, removing duplicates, fixing
errors.
• Data Integration: Combining data from multiple sources.
• Data Transformation: Normalizing or scaling data, encoding categorical
variables.
• Data Reduction: Reducing data size by selecting important features or
sampling.
• Data Discretization: Converting continuous data into discrete buckets or
intervals.

Example
Suppose you have a dataset of customer information with these issues:
• Missing age values
• Categorical gender values as “Male” and “Female”
• Income values ranging widely, from $100 to $100,000

Data Preprocessing:
1. Handle Missing Values: Fill missing ages with the average age.
2. Encode Categorical Data: Convert gender to numbers (Male = 0,
Female = 1).
3. Normalize Income: Scale income values to a range between 0 and 1.
After preprocessing, the dataset is clean, consistent, and ready for building
machine learning models.
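The three preprocessing steps in the example can be sketched in plain Python. The customer rows are hypothetical; mean imputation, the Male = 0 / Female = 1 mapping and min-max scaling follow the steps above.

```python
# Hypothetical customer records; None marks a missing age.
customers = [
    {"age": 25, "gender": "Male", "income": 100},
    {"age": None, "gender": "Female", "income": 50_000},
    {"age": 35, "gender": "Female", "income": 100_000},
    {"age": 30, "gender": "Male", "income": 25_075},
]

def preprocess(rows):
    # 1. Handle missing values: replace missing ages with the mean of the known ages
    known_ages = [r["age"] for r in rows if r["age"] is not None]
    mean_age = sum(known_ages) / len(known_ages)
    # 2. Encode categorical data: Male = 0, Female = 1
    gender_code = {"Male": 0, "Female": 1}
    # 3. Min-max normalize income into the range [0, 1]
    incomes = [r["income"] for r in rows]
    lo, hi = min(incomes), max(incomes)
    cleaned = []
    for r in rows:
        cleaned.append({
            "age": r["age"] if r["age"] is not None else mean_age,
            "gender": gender_code[r["gender"]],
            "income": (r["income"] - lo) / (hi - lo),
        })
    return cleaned

for row in preprocess(customers):
    print(row)
```

Min-max scaling assumes the incomes are not all equal (otherwise `hi - lo` is zero); a real pipeline would guard against that case.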
Module-2
Q.03 a Apply and explain the principal component analysis algorithm for the given data points and prove that PCA works. (L2, 12 marks)
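The data points for this question are not reproduced in this copy, so the sketch below applies PCA to hypothetical 2-D points. It follows the usual steps (mean-center, covariance matrix, eigen-decomposition, projection) and then checks the properties that show PCA "works": the variance along each component equals its eigenvalue, the projected components are uncorrelated, and the total variance is preserved.

```python
import math

def pca_2d(points):
    """PCA for 2-D data: center, build the covariance matrix, eigen-decompose, project."""
    n = len(points)
    # Step 1: mean-center the data
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Step 2: sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    # Step 3: eigenvalues of the symmetric 2x2 matrix (closed form), largest first
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # Step 4: unit eigenvector for lam1; the second component is orthogonal to it
    vx, vy = (b, lam1 - a) if b != 0 else ((1.0, 0.0) if a >= c else (0.0, 1.0))
    norm = math.hypot(vx, vy)
    pc1 = (vx / norm, vy / norm)
    pc2 = (-pc1[1], pc1[0])
    # Step 5: project the centered data onto the principal components
    scores = [(x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1]) for x, y in centered]
    return (lam1, lam2), (pc1, pc2), scores, a + c

# Hypothetical data points
points = [(2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8)]
(lam1, lam2), (pc1, pc2), scores, total_var = pca_2d(points)
print("eigenvalues:", lam1, lam2)
print("first principal component:", pc1)
print("share of variance on PC1:", lam1 / total_var)

# Checks: variance along each component equals its eigenvalue,
# the components are uncorrelated, and total variance is preserved.
n = len(points)
var1 = sum(s1 * s1 for s1, _ in scores) / (n - 1)
var2 = sum(s2 * s2 for _, s2 in scores) / (n - 1)
cov12 = sum(s1 * s2 for s1, s2 in scores) / (n - 1)
print(var1, var2, cov12)
```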
b Explain continuous and discrete probability distributions. (L2, L3, 8 marks)

Continuous (uniform) distribution
o All values within a specific interval [a, b] are equally likely
o Continuous (infinite number of possible values)
o Uses a probability density function (PDF): f(x) = 1/(b - a) for a ≤ x ≤ b
o Area under the PDF curve is equal to 1
o Examples: random time of day, random position on a line segment, random
point in a unit square
o Symmetric about the mean
o PDF is constant within the interval
o Mean = Median = (a + b)/2
o Uniform (flat) distribution
Discrete (uniform) distribution
o All distinct, finite outcomes have an equal probability of occurring
o Discrete (limited set of countable outcomes)
o Uses a probability mass function (PMF): P(X = x) = 1/n for each of n outcomes
o Sum of all probabilities is equal to 1
o Examples: rolling a fair die, flipping a fair coin, drawing a card, choosing a
random day
o Equal probability for all outcomes
o Finite and countable outcome set
o Uniformity across all values
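A short sketch contrasting the two cases: for the continuous uniform distribution, probabilities are areas under the flat PDF, while for the discrete case (a fair die) each outcome has its own probability mass. The interval [0, 10] and the die are illustrative choices.

```python
from fractions import Fraction

def uniform_pdf(x, a, b):
    """Continuous uniform density on [a, b]: constant 1/(b - a) inside, 0 outside."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_prob(c, d, a, b):
    """P(c <= X <= d) = area under the flat PDF = overlap length / (b - a)."""
    lo, hi = max(a, c), min(b, d)
    return max(hi - lo, 0) / (b - a)

# Discrete uniform PMF: a fair six-sided die, P(X = k) = 1/6 for k = 1..6
die_pmf = {k: Fraction(1, 6) for k in range(1, 7)}

print(uniform_pdf(3, 0, 10))      # 0.1
print(uniform_prob(2, 5, 0, 10))  # 0.3
print(sum(die_pmf.values()))      # 1
print(sum(k * p for k, p in die_pmf.items()))  # mean = 7/2
```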

OR
Q.04 a Design a learning system for a chess game. (L2, 10 marks)

🔧 Chess Game Learning System Design

🧠 Learning Approaches
1. Supervised Learning
o Input: Thousands of labeled expert chess games (e.g., from
grandmasters).
o Output: Predict the best move for a given position.
o Use: Opening and middle-game strategy learning.
o Model: Deep neural networks or decision trees.
2. Reinforcement Learning
o Agent: Chess AI
o Environment: Chess board
o Actions: Legal moves
o Reward: +1 for win, 0 for draw, -1 for loss
o Algorithm: Deep Q-Learning, Monte Carlo Tree Search (MCTS),
AlphaZero-style Self-Play

⚙️ System Components
1. Game Environment
• Chess engine (e.g., Stockfish, custom implementation)
• Rules and legal move generation
• Board representation (e.g., 8x8 matrix, FEN format)
2. State Representation
• Represent the board as tensors or feature maps
• Encode positions, piece types, player turn, etc.
3. Move Predictor / Policy Network
• Neural network that predicts the best next move
• Trained on supervised expert data or via reinforcement
4. Value Network
• Predicts probability of winning from a given board state
• Helps evaluate positions for decision-making

5. Training Module
• For supervised learning: Train on expert game datasets (PGN format)
• For reinforcement learning: Use self-play and update using rewards
6. Evaluation & Improvement
• Play against known opponents (engines or humans)
• Analyze mistakes and retrain
• Use Elo rating for performance tracking
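The "board as tensors or feature maps" idea under State Representation can be sketched by converting the piece-placement field of a FEN string into 12 binary 8x8 planes (one per piece type and colour). This 12-plane layout is one common, simple choice assumed here; real systems often add further planes (castling rights, move history).

```python
PIECES = "PNBRQKpnbrqk"  # white pieces upper-case, black lower-case (FEN convention)

def fen_to_planes(fen):
    """Encode the piece-placement field of a FEN string as 12 binary 8x8 planes."""
    placement = fen.split()[0]
    planes = [[[0] * 8 for _ in range(8)] for _ in PIECES]
    for row, rank in enumerate(placement.split("/")):  # row 0 = rank 8
        col = 0
        for ch in rank:
            if ch.isdigit():
                col += int(ch)   # a digit encodes that many empty squares
            else:
                planes[PIECES.index(ch)][row][col] = 1
                col += 1
    return planes

START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
planes = fen_to_planes(START)
side_to_move = 1 if START.split()[1] == "w" else 0  # extra feature for player turn
print(sum(cell for plane in planes for r in plane for cell in r))  # 32 pieces
```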

b Explain and apply the candidate elimination algorithm for the given dataset. (L2, 10 marks)
Steps of the Algorithm
1. Initialization:
• Set S to the most specific hypothesis; after the first positive example it
becomes S = {Sunny, Warm, Normal, Strong, Same}
• Set G to the most general hypothesis: G = {?, ?, ?, ?, ?}

2. Process Training Example 1 (Positive Example):


• The hypothesis S remains the same as it matches the positive instance.
• The general hypothesis G remains unchanged as it covers this example.

3. Process Training Example 2 (Positive Example):


• Update S to generalize, as the humidity attribute does not match:
• S = {Sunny, Warm, ?, Strong, Same}
• Remove any inconsistent hypotheses from G. In this case, G remains
unchanged.

4. Process Training Example 3 (Negative Example):


• Refine G to exclude the negative instance by specializing one attribute at a
time, keeping only specializations that are still more general than S:
• G = {⟨Sunny, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?⟩, etc.}
• Ensure S does not cover this negative example.

5. Process Training Example 4 (Positive Example):


• Update S to generalize further, as the forecast attribute differs:
• S = {Sunny, Warm, ?, Strong, ?}
• Remove from G any hypothesis that does not cover this positive example, so
that G stays consistent with all positive examples.
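The dataset table is not reproduced here, so the sketch below runs candidate elimination on four hypothetical examples chosen to be consistent with the trace above (an EnjoySport-style dataset with five attributes). The attribute domains, the '0' symbol for the empty constraint, and the assumption of noise-free data with a single S are all illustrative choices.

```python
# Hypothetical attribute domains (EnjoySport-style, five attributes)
DOMAINS = [
    ("Sunny", "Cloudy", "Rainy"),   # Sky
    ("Warm", "Cold"),               # AirTemp
    ("Normal", "High"),             # Humidity
    ("Strong", "Weak"),             # Wind
    ("Same", "Change"),             # Forecast
]

def covers(h, x):
    """h covers x if every constraint is '?' or equals x's value ('0' covers nothing)."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    """h1 >= h2: each constraint of h1 is '?', matches h2, or h2's is the empty '0'."""
    return all(a == "?" or b == "0" or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples):
    S = tuple("0" for _ in DOMAINS)    # most specific hypothesis
    G = [tuple("?" for _ in DOMAINS)]  # most general hypothesis
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]  # drop members of G that miss x
            # Minimally generalize S so that it covers x
            S = tuple(v if s == "0" else (s if s == v else "?") for s, v in zip(S, x))
        else:
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                # Minimal specializations of g that exclude x and stay above S
                for i, values in enumerate(DOMAINS):
                    if g[i] != "?":
                        continue
                    for v in values:
                        h = g[:i] + (v,) + g[i + 1:]
                        if v != x[i] and more_general_or_equal(h, S) and h not in new_G:
                            new_G.append(h)
            # Keep only the maximally general members
            G = [g for g in new_G
                 if not any(h != g and more_general_or_equal(h, g) for h in new_G)]
    return S, G

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Change"), True),
]
S, G = candidate_elimination(examples)
print("S =", S)  # ('Sunny', 'Warm', '?', 'Strong', '?')
print("G =", G)  # [('Sunny', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?')]
```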

Module-3
Q.05 a Distinguish between: (L2, 10 marks)
i. Locally weighted regression and Linear regression.
ii. Multiple linear regression and Logistic regression.
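The key difference in part (i) can be sketched in code: linear regression fits one global line with every point weighted equally, while locally weighted regression refits a line for each query point, weighting nearby points more. The Gaussian kernel, the bandwidth name `tau`, and the curved data y = x² are assumptions chosen for illustration.

```python
import math

def weighted_line(xs, ys, weights):
    """Weighted least-squares fit of y = intercept + slope * x."""
    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, ys)) / w_sum
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def linear_regression_predict(xs, ys, query):
    # Ordinary linear regression: one global line, every point weighted equally
    b0, b1 = weighted_line(xs, ys, [1.0] * len(xs))
    return b0 + b1 * query

def lwr_predict(xs, ys, query, tau=1.0):
    # Locally weighted regression: refit a line for each query point,
    # giving nearby points Gaussian-kernel weights exp(-(x - q)^2 / (2 tau^2))
    weights = [math.exp(-((x - query) ** 2) / (2 * tau ** 2)) for x in xs]
    b0, b1 = weighted_line(xs, ys, weights)
    return b0 + b1 * query

# Hypothetical curved data: y = x^2, so the true value at x = 3 is 9
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x * x for x in xs]
print(linear_regression_predict(xs, ys, 3.0))  # global line: 13.0
print(lwr_predict(xs, ys, 3.0, tau=0.5))       # local line: close to 9
```

On curved data the global line is biased, while the local fit tracks the curve; the price is that LWR must refit for every query.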
b Apply the weighted KNN algorithm using the given dataset to classify the test set
data (7.6, 60, 8) where k = 3. (L2, 7 marks)

Solution: Given a test instance (6.1, 40, 5) and a set of categories {Pass, Fail}
(also called classes), we use the training set to classify the test instance using
Euclidean distance.
The task of classification is to assign a category or class to an arbitrary instance.
Assign k = 3.
Step 1: Calculate the Euclidean distance between the test instance (6.1, 40, 5) and
each of the training instances, as shown in Table
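The distance table is not reproduced here, so the sketch below uses hypothetical training rows with the same three attributes (CGPA, Assessment, Project submitted). Each of the k nearest neighbours votes with weight 1/distance; this inverse-distance weighting is one common choice (1/d² is another), assumed here.

```python
import math
from collections import defaultdict

# Hypothetical training set: (CGPA, Assessment, Project submitted) -> Result
training = [
    ((9.2, 85, 8), "Pass"), ((8.0, 80, 7), "Pass"), ((8.5, 81, 8), "Pass"),
    ((6.0, 45, 5), "Fail"), ((6.5, 50, 4), "Fail"), ((8.2, 72, 7), "Pass"),
    ((5.8, 38, 5), "Fail"), ((8.9, 91, 9), "Pass"),
]

def weighted_knn(train, query, k=3):
    # Step 1: Euclidean distance from the query to every training instance
    dists = sorted((math.dist(x, query), label) for x, label in train)
    # Step 2: keep the k nearest and weight each vote by 1 / distance
    votes = defaultdict(float)
    for d, label in dists[:k]:
        if d == 0:
            return label  # exact match: return its class directly
        votes[label] += 1 / d
    # Step 3: predict the class with the largest total weight
    return max(votes, key=votes.get)

print(weighted_knn(training, (6.1, 40, 5), k=3))  # Fail
print(weighted_knn(training, (7.6, 60, 8), k=3))  # Fail
```

Note that the unscaled Assessment column dominates the distance here; normalizing the attributes first would usually be advisable.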
OR
Q.06 a Make use of entropy and information gain to discover the root node of the decision tree for the following dataset using the ID3 algorithm. (L2, 10 marks)
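The question's dataset is not reproduced in this copy, so the sketch below uses the well-known PlayTennis dataset as a stand-in. It computes Entropy(S) and Gain(S, A) for every attribute and picks the attribute with the highest gain as the ID3 root.

```python
import math
from collections import Counter

# Stand-in dataset (PlayTennis): Outlook, Temperature, Humidity, Wind -> PlayTennis
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"), ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]

def entropy(labels):
    """Entropy(S) = -sum p_i * log2(p_i) over the class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index):
    """Gain(S, A) = Entropy(S) - sum over values v of |S_v|/|S| * Entropy(S_v)."""
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for value in set(r[attr_index] for r in rows):
        subset = [r[-1] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

gains = {a: information_gain(ROWS, i) for i, a in enumerate(ATTRS)}
for a, g in gains.items():
    print(f"Gain({a}) = {g:.3f}")
print("Root node:", max(gains, key=gains.get))  # Outlook
```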
b Analyze decision tree learning with its structure, advantages, and disadvantages. (L2, 10 marks)

Decision Tree Learning Model


✅ Introduction
1. Decision Tree is a supervised predictive model used for classification tasks.
2. It uses inductive inference to derive general conclusions from observed
data.
3. Summarizes training data in a tree structure to classify new test data.

✅ Model Description
1. Accepts input instances with features (discrete or continuous).
2. Produces a decision tree as output, which classifies test objects.
3. Attributes = independent variables, target class = response variable.

✅ Structure of a Decision Tree


1. Root Node: Topmost node.
2. Decision Nodes: Internal test nodes (represented as diamonds).
3. Branches: Represent outcomes of test conditions.
4. Leaf Nodes: Final outcomes or target class labels (rectangles).
5. Each path from root to leaf = logical rule (conjunction of tests).
6. Whole tree = disjunction of these rules.

✅ Procedures Involved
1. Building the Tree:
o Start at the root and recursively select the best attribute to split on.
o Continue until no further split is possible (a leaf is reached).
o Output: Decision tree (complete hypothesis space).
2. Classification (Inference):
o Traverse the tree from root to leaf using the test data.
o Output: Target class of the test instance.
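The inference procedure and the "each path is a rule" view can be sketched with a tree stored as nested dictionaries. The tree itself is a hypothetical example (the shape ID3 produces on the PlayTennis data), not one derived from this paper's dataset.

```python
# Hypothetical learned tree as nested dicts: {attribute: {value: subtree_or_label}}
TREE = {"Outlook": {
    "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}

def classify(tree, instance):
    """Walk from the root to a leaf, following the branch that matches each test."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[instance[attribute]]
    return tree  # a leaf holds the target class

def rules(tree, path=()):
    """Each root-to-leaf path is a conjunction of tests (one rule)."""
    if not isinstance(tree, dict):
        yield " AND ".join(f"{a} = {v}" for a, v in path) + f" => {tree}"
        return
    attribute, branches = next(iter(tree.items()))
    for value, subtree in branches.items():
        yield from rules(subtree, path + ((attribute, value),))

print(classify(TREE, {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}))  # Yes
for rule in rules(TREE):
    print(rule)
```

The whole tree is the disjunction (OR) of the printed rules.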

✅ Advantages
1. Easy to model and interpret.
2. Works with discrete and continuous variables.
3. Captures nonlinear relationships.
4. Simple, fast, and effective for classification.
✅ Disadvantages
1. Hard to decide tree depth (when to stop splitting).
2. Unstable with missing/erroneous data.
3. Complex with continuous attributes.
4. May overfit training data.
5. Not ideal for multi-output classes.
6. Learning optimal trees is NP-complete.

Module-4
Q.07 a Define prior probability. Explain Bayes theorem, hML and hMAP with an example. (L2, 10 marks)
Prior Probability
It is the general probability of an uncertain event (or hypothesis) before any
observation is seen or evidence is collected, that is, the initial belief held before
new information arrives.
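A worked sketch of Bayes theorem, P(h | D) = P(D | h) P(h) / P(D), and of the difference between hMAP (maximizes P(D | h) P(h)) and hML (maximizes P(D | h) alone). The numbers are illustrative and follow the classic lab-test example: a rare condition and a fairly accurate test.

```python
# Illustrative example: hypotheses h in {cancer, no_cancer}; observed data D = positive test.
prior = {"cancer": 0.008, "no_cancer": 0.992}        # P(h)
likelihood = {"cancer": 0.98, "no_cancer": 0.03}     # P(D | h)

# Bayes theorem: P(h | D) = P(D | h) * P(h) / P(D)
evidence = sum(likelihood[h] * prior[h] for h in prior)   # P(D)
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}

h_map = max(prior, key=lambda h: likelihood[h] * prior[h])  # maximizes P(D | h) P(h)
h_ml = max(prior, key=likelihood.get)                       # maximizes P(D | h) only

print(posterior)        # cancer about 0.21, no_cancer about 0.79
print("hMAP =", h_map)  # no_cancer
print("hML  =", h_ml)   # cancer
```

Because the prior is so small, hMAP and hML disagree: the data alone favour "cancer", but combined with the prior the most probable hypothesis is "no_cancer". With equal priors the two would coincide.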
b Analyze the student performance using the Naive Bayes algorithm for a continuous
attribute. Predict whether a student will get a job offer in the final year. (L2, 10 marks)
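For a continuous attribute, Naive Bayes commonly models P(x | class) with a Gaussian whose mean and standard deviation are estimated per class. The sketch below assumes that Gaussian form and uses hypothetical CGPA values; it handles any number of continuous attributes by multiplying their likelihoods (the "naive" independence assumption).

```python
import math
import statistics

def fit(rows):
    """Per class: prior P(class) and (mean, stdev) of each continuous attribute."""
    by_class = {}
    for features, label in rows:
        by_class.setdefault(label, []).append(features)
    model = {}
    for label, feats in by_class.items():
        columns = list(zip(*feats))
        stats = [(statistics.mean(col), statistics.stdev(col)) for col in columns]
        model[label] = (len(feats) / len(rows), stats)
    return model

def gaussian(x, mean, sd):
    """Normal density used as the class-conditional likelihood P(x | class)."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (math.sqrt(2 * math.pi) * sd)

def nb_predict(model, features):
    # P(class | x) is proportional to P(class) * product of Gaussian P(x_i | class)
    scores = {}
    for label, (prior, stats) in model.items():
        score = prior
        for x, (mean, sd) in zip(features, stats):
            score *= gaussian(x, mean, sd)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical students: (CGPA,) -> job offer
students = [((9.5,), "Yes"), ((8.2,), "Yes"), ((9.3,), "Yes"), ((8.8,), "Yes"),
            ((6.5,), "No"), ((7.0,), "No"), ((6.0,), "No"), ((7.2,), "No")]
model = fit(students)
print(nb_predict(model, (8.5,)))  # Yes
print(nb_predict(model, (6.8,)))  # No
```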
OR
Q.08 a Analyze different types of artificial neural networks with a diagram. (L3, L4, 10 marks)
b Define activation function. Explain different types of activation function. (L2, 10 marks)
Activation Functions
➢ Activation functions are mathematical functions associated with each
neuron in a neural network that map the neuron's input signal to its output signal.
➢ An activation function decides whether a neuron fires (activates) based on the
input signals the neuron receives.
➢ Bounded functions such as sigmoid and tanh squash each neuron's output into the
range 0 to 1 or -1 to +1, respectively. Activation functions can be linear or
non-linear.
Module-5
Q.09 a Analyze the grid-based approach and mention the steps of CLIQUE. (L2, 10 marks)
Grid-Based Approach
1. Grid-based approach is a space-based clustering technique.
2. It works by partitioning the data space into cells and fitting data into
these cells for cluster formation.
3. There are three key concepts in grid-based clustering:
o Subspace Clustering
o Dense Cells
o Monotonicity Property

Subspace Clustering
1. Used for high-dimensional data (data with many attributes/dimensions).
2. Not all dimensions are necessary for every application.
a. Example: In disease profiling, age may be needed but address is
not.
3. Only a subset of features (dimensions) is required for meaningful
clustering.
4. Grouping gene data or organs with similar functions are examples of
subspace clustering applications.

Challenge in Subspace Clustering


1. Finding relevant subspaces is difficult because the number of possible
subspaces grows exponentially with the number of dimensions.
2. For n dimensions, there are 2ⁿ - 1 non-empty subsets of dimensions
(subspaces) to consider.

CLIQUE Algorithm – Stage 1


1. Step 1: Identify the dense cells (units) in each single dimension.
2. Step 2: Merge dense cells cᵢ and cⱼ if they share the same interval in their
common dimensions.
3. Step 3: Apply the Apriori (monotonicity) rule to generate candidate
(k + 1)-dimensional cells from the dense k-dimensional cells.
4. Check whether the number of points in each generated cell crosses the
density threshold.
5. Repeat the above steps until no new dense cells can be generated.
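Stage 1 can be sketched as a simplified, bottom-up search for dense units. The parameter names `xi` (number of equal-width intervals per dimension) and `tau` (density threshold as a fraction of all points) and the sample points are assumptions for illustration; Stage 2 (forming clusters from maximal regions) is not included.

```python
from collections import Counter
from itertools import combinations

def clique_dense_units(points, xi, tau):
    """Simplified CLIQUE Stage 1: find dense units dimension by dimension.

    A unit is a frozenset of (dimension, interval) pairs.
    """
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    widths = [(max(p[d] for p in points) - lows[d]) / xi or 1.0 for d in range(dims)]
    # Interval index of every point in every dimension (last interval is closed)
    cells = [tuple(min(int((p[d] - lows[d]) / widths[d]), xi - 1) for d in range(dims))
             for p in points]
    threshold = tau * len(points)

    def count(unit):
        return sum(all(c[d] == i for d, i in unit) for c in cells)

    # Step 1: dense 1-dimensional units
    counts = Counter((d, c[d]) for c in cells for d in range(dims))
    level = [frozenset([u]) for u, n in counts.items() if n > threshold]
    dense = {1: level}
    k = 1
    while level:
        known = set(level)
        candidates = set()
        # Steps 2-3: join k-dim units into (k+1)-dim candidates over distinct dimensions,
        # keeping only those whose every k-dim sub-unit is dense (monotonicity)
        for a, b in combinations(level, 2):
            u = a | b
            if (len(u) == k + 1 and len({d for d, _ in u}) == k + 1
                    and all(u - {x} in known for x in u)):
                candidates.add(u)
        # Step 4: keep candidates whose point count crosses the threshold
        level = [u for u in candidates if count(u) > threshold]
        k += 1
        if level:
            dense[k] = level
    return dense

# Hypothetical 2-D points: a tight group near (1, 1) plus scattered noise
points = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (1.1, 0.8), (1.0, 1.2),
          (8.0, 9.0), (5.0, 2.0), (2.0, 8.0)]
print(clique_dense_units(points, xi=4, tau=0.3))
```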

CLIQUE Algorithm – Stage 2


1. Step 1: Merge dense cells into clusters in each subspace using maximal
regions.
2. A maximal region is a hyperrectangle of connected dense cells that cannot be
extended further in any dimension.
3. Step 2: Maximal regions are combined so that together they cover all the dense
cells of each cluster.
4. CLIQUE starts merging from dimension 2 and continues up to n dimensions.

Advantages of CLIQUE
1. Insensitive to the input order of objects.
2. Makes no assumptions about underlying data distributions.
3. Identifies subspaces of higher dimensions with high-density clusters.

Disadvantage of CLIQUE
1. Tuning grid parameters such as grid size and finding the optimal threshold
for determining dense cells is challenging.
b Apply the k-means clustering algorithm to the given data, with objects 2 and 5
as the initial seeds. (L2, 10 marks)
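The data table for this question is not reproduced here, so the sketch below runs k-means on six hypothetical 2-D objects, using objects 2 and 5 as the initial seeds as the question specifies. It alternates the assignment step and the centroid update until the centroids stop changing.

```python
import math

def kmeans(points, seed_objects, max_iter=100):
    """k-means with the given objects (1-based numbers) used as initial centroids."""
    centroids = [points[i - 1] for i in seed_objects]
    for _ in range(max_iter):
        # Assignment step: each object joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for obj, p in enumerate(points, start=1):
            distances = [math.dist(p, c) for c in centroids]
            clusters[distances.index(min(distances))].append(obj)
        # Update step: recompute each centroid as the mean of its members
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if not members:  # keep an empty cluster's centroid unchanged
                new_centroids.append(old)
                continue
            new_centroids.append(tuple(
                sum(points[o - 1][d] for o in members) / len(members)
                for d in range(len(old))))
        if new_centroids == centroids:  # no change: converged
            break
        centroids = new_centroids
    return clusters, centroids

# Hypothetical objects 1..6 as (x, y) points
objects = [(2, 4), (4, 6), (3, 3), (9, 8), (8, 9), (7, 8)]
clusters, centroids = kmeans(objects, seed_objects=(2, 5))
print(clusters)   # [[1, 2, 3], [4, 5, 6]]
print(centroids)
```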
OR
Q.10 a Determine the characteristics, applications and challenges of reinforcement
learning. (L3, L4, 10 marks)
Characteristics of Reinforcement Learning
1. Sequential Decision Making
o Path from start to goal is a sequence of steps.
o A single wrong move may result in failure.
2. Delayed Feedback
o Rewards are not always immediate.
o Success or failure often comes after many steps.
3. Interdependent Actions
o Each action affects the outcome of subsequent actions.
o A wrong action can influence the entire decision path.
4. Time-Related Actions
o Every action has a time stamp.
o Actions are inherently ordered over time.

Challenges of Reinforcement Learning


1. Reward Design
o Designing the right rewards and assigning values is difficult.
2. Absence of a Model
o Some environments lack fixed rules or structures.
o Simulation is needed to collect experience.
3. Partial Observability of States
o Not all information about the state is available.
o Example: Weather forecasting involves uncertainty.
4. Time-Consuming Operations
o Large state spaces and many actions increase time complexity.
5. Complexity
o Games like Go have large boards and many possibilities.
o Lack of labeled data makes algorithm design harder.

Applications of Reinforcement Learning


1. Industrial Automation
2. Resource Management
o Efficient allocation of resources.
3. Traffic Light Control
o To reduce traffic congestion.
4. Personalized Recommendation Systems
o Such as news feeds.
5. Advertisement Bidding
6. Customized Applications
7. Driverless Cars
8. Games (with Deep Learning)
o Chess, Go, etc.
9. DeepMind Applications
o Generating programs and images.

b Analyze the components of reinforcement learning with a diagram. (L3, L4, 10 marks)


Components of Reinforcement Learning
1. Main Components:
o Environment: The world where all actions take place. It defines
input, output, and rewards.
o Agent: An autonomous entity (human, robot, or program) that
interacts with the environment by performing actions.
o States: The input to the RL system; represents the current situation
or condition of the environment.
o Actions: The output from the agent; a decision taken based on the
current state.
o Rewards: Numerical feedback received after performing an action.

Types of Reinforcement Learning Problems


1. Learning Problems
o Environment is unknown.
o Agent learns by trial and error to improve policy.

2. Planning Problems
o Environment is known.
o Agent uses model-based computations to improve policy.

Environment and Agent


• Environment:
o Contains maps, rules, obstacles, etc.
o Starts in an initial state.
• Agent:
o Observes the environment and takes actions.
o Can be human, robot, or software (e.g., chatbot).

States and Actions


• State: Input to the RL algorithm (e.g., position in a grid).
• Action: Output decided by the agent (e.g., move up/down).

Policies
• A policy maps states to actions.
• Represents agent’s behavior.
Types of Policies:
1. Deterministic Policy
o Formula: a = μ(s)
o Same action is always returned for a given state.
2. Stochastic Policy
o Formula: π(a|s) = Pr[aₜ = a | sₜ = s]
o Returns probability of choosing an action in a given state.
• Goal: Learn the best policy that maximizes cumulative expected
rewards.
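The two policy types can be sketched as data structures: a deterministic policy a = μ(s) is a lookup table, while a stochastic policy π(a|s) stores a probability distribution over actions and samples from it. The state names, actions and probabilities are hypothetical.

```python
import random

# Hypothetical grid-world actions
ACTIONS = ["up", "down", "left", "right"]

# Deterministic policy a = mu(s): a lookup table, same action every time
mu = {"s0": "right", "s1": "right", "s2": "up"}

# Stochastic policy pi(a|s): a probability distribution over actions per state
pi = {
    "s0": {"up": 0.1, "down": 0.0, "left": 0.1, "right": 0.8},
    "s1": {"up": 0.25, "down": 0.25, "left": 0.25, "right": 0.25},
}

def act_deterministic(state):
    return mu[state]

def act_stochastic(state, rng=random):
    probs = pi[state]
    # Sample one action with probability pi(a|s)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

print(act_deterministic("s0"))  # always "right"
rng = random.Random(42)
print([act_stochastic("s0", rng) for _ in range(5)])  # mostly "right"
```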

Rewards
• Used to measure system performance.
• Given by the environment after an action is performed.
• Denoted by symbol r.
Types of Rewards:
1. Immediate Reward
o Given right after the action.
o Total reward = sum of rewards from start to end of the episode.
o Formula:
▪ Gₜ = rₜ + rₜ₊₁ + rₜ₊₂ + ... + r_T
▪ Gₜ is the return or goal.
2. Long-term Reward
o Collected over a longer period (e.g., total game outcome).
o Considers current reward plus future rewards.
o Formula with Discount Factor (γ):
▪ Gₜ = rₜ + γrₜ₊₁ + γ²rₜ₊₂ + ...
▪ γ is called the discount factor (0 ≤ γ ≤ 1); the larger γ is, the more
weight future rewards receive.
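Both return formulas can be computed directly, using the same indexing as above (Gₜ starts with rₜ). The discounted version is evaluated backwards with Gₜ = rₜ + γGₜ₊₁, which is equivalent to the expanded sum; the episode rewards are hypothetical.

```python
def undiscounted_return(rewards):
    """G_t = r_t + r_{t+1} + ... + r_T (rewards listed from time t to T)."""
    return sum(rewards)

def discounted_return(rewards, gamma):
    """G_t = r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ..., computed backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

episode = [0, 0, 0, 1]  # hypothetical episode: reward only at the final step
print(undiscounted_return(episode))     # 1
print(discounted_return(episode, 0.9))  # about 0.729 (= 0.9 ** 3)
```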
