
Assignment Part A

Uploaded by Mamoona Jabbar
Copyright © All Rights Reserved

Assignment

Final Exams

BY

Muhammad Assad Jabbar


Roll No. 67011
Registration No. 2020-GCUF-09067
MS Computer Science (1st Semester)

DEPARTMENT OF COMPUTER SCIENCE,


GOVERNMENT COLLEGE UNIVERSITY, FAISALABAD
PAKISTAN
ASSIGNMENT PART A

SECTION-1

Q 1: What is linear regression?

Linear regression is a linear model that assumes a linear relationship between the input variables
(independent variables, ‘x’) and the output variable (dependent variable, ‘y’), such that ‘y’ can be
calculated from a linear combination of the input variables.
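The closed-form least-squares fit below is a minimal sketch for the single-input case, in plain Python. The function name and the toy data (generated from y = 1 + 2x) are illustrative assumptions. The slope is the covariance of x and y divided by the variance of x, and the intercept makes the fitted line pass through the point (mean of x, mean of y).

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data generated exactly from y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_simple_linear_regression(xs, ys)
print(b0, b1)  # 1.0 2.0
```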

Q 5: What is multiple regression?

Multiple regression is an extension of simple linear regression. It is used when we want to
predict the value of a variable based on the values of two or more other variables. The variable
we want to predict is called the dependent variable. The variables we use to predict the
value of the dependent variable are called the independent variables.
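With several independent variables, one common way to fit the model is to solve the least-squares normal equations (X^T X) b = X^T y. The sketch below does this in plain Python with Gauss-Jordan elimination. The function name and the toy data are assumptions for illustration; in practice a numerical library would normally replace the hand-written solver.

```python
def fit_multiple_regression(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bp] for y = b0 + b1*x1 + ... + bp*xp."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading 1.0 gives the intercept b0
    p = len(X[0])
    # Augmented matrix of the normal equations (X^T X) b = X^T y
    m = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(X, ys))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot_row = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot_row] = m[pivot_row], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(p):
            if r != col:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][p] for i in range(p)]

# Hypothetical data generated exactly from y = 2 + 3*x1 - 1*x2
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [2 + 3 * x1 - x2 for x1, x2 in rows]
coefficients = fit_multiple_regression(rows, ys)
print([round(c, 6) for c in coefficients])  # [2.0, 3.0, -1.0]
```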

Q 3: What is polynomial regression?

Polynomial regression is a special case of linear regression in which a polynomial equation is
fitted to data with a curvilinear relationship between the dependent and independent
variables. It still counts as linear regression because the model is linear in its coefficients;
only the input features (x, x^2, x^3, and so on) are nonlinear.

Polynomial regression does not require the relationship between the independent and
dependent variables to be linear in the data set.
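The sketch below illustrates why polynomial regression is still linear regression: each x is expanded into the features 1, x, x^2 and so on, and the coefficients are then found by ordinary least squares. The function name, the chosen degree and the toy data (generated from y = 1 - 2x + 0.5x^2) are assumptions for illustration.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree], lowest power first."""
    # Polynomial features 1, x, x^2, ...: nonlinear in x, but the model is linear in c
    X = [[x ** k for k in range(degree + 1)] for x in xs]
    p = degree + 1
    # Augmented normal equations (X^T X) c = X^T y, solved by Gauss-Jordan elimination
    m = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(X, ys))] for i in range(p)]
    for col in range(p):
        pivot_row = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot_row] = m[pivot_row], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(p):
            if r != col:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][p] for i in range(p)]

# Hypothetical curvilinear data generated from y = 1 - 2x + 0.5x^2
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1 - 2 * x + 0.5 * x ** 2 for x in xs]
coefficients = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coefficients])  # [1.0, -2.0, 0.5]
```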

SECTION-2

Q 5: What is Supervised learning?

Supervised learning provides a powerful tool to classify and process data using
machine learning. In supervised learning you use “labeled” data, which is a data set whose
examples have already been classified, to train a learning algorithm. The trained model is then
used to predict the classification of other, unlabeled data. Supervised learning is well suited
to classification and regression problems, such as determining which category a news article
belongs to or predicting the volume of sales for a given future date. In supervised learning,
the aim is to make sense of data within the context of a specific question.
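As an illustration of the labeled-data workflow, the sketch below uses a 1-nearest-neighbor classifier, one of the simplest supervised learners. The text does not name a specific algorithm, so this choice is an assumption, and the feature vectors and the "sports"/"politics" labels are hypothetical stand-ins for, say, news articles.

```python
import math

def predict_nearest_neighbor(labeled_data, query):
    """Return the label of the labeled example closest (Euclidean distance) to `query`."""
    _, label = min(labeled_data, key=lambda example: math.dist(example[0], query))
    return label

# Hypothetical labeled training set: (feature vector, category) pairs
labeled_data = [
    ((0.9, 0.1), "sports"),
    ((0.8, 0.2), "sports"),
    ((0.1, 0.9), "politics"),
    ((0.2, 0.7), "politics"),
]
print(predict_nearest_neighbor(labeled_data, (0.85, 0.15)))  # sports
print(predict_nearest_neighbor(labeled_data, (0.15, 0.8)))   # politics
```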

Q 6: What is Unsupervised Learning?

Unsupervised learning is a machine learning technique in which you do not need to supervise
the model. Instead, the model is allowed to work on its own to discover patterns and information.
It mainly deals with unlabeled data.

Q 1: What is the difference between user-based and item-based collaborative filtering?

1. User Based Collaborative Filtering:

The predicted rating of an item for a user is calculated from the ratings that other, similar
users gave to that item.

• The ratings are predicted using the ratings of neighboring users.

• Neighborhoods are defined by similarities among users.

• Pearson correlation is commonly used as the similarity measure, as it tends to give superior results.
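A minimal sketch of user-based prediction, assuming a tiny hypothetical rating table. Pearson correlation over co-rated items gives each neighbor's similarity, and the prediction is the target user's mean rating plus the similarity-weighted, mean-centered ratings of the neighbors who rated the item. This mean-centered weighting is one common formulation, not the only one.

```python
import math

# Hypothetical user -> {item: rating} data on a 1-5 scale
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 3, "D": 4},
    "carol": {"A": 1, "B": 5, "C": 2, "D": 1},
}

def pearson(u, v):
    """Pearson correlation between users u and v over the items both have rated."""
    common = [i for i in ratings[u] if i in ratings[v]]
    if len(common) < 2:
        return 0.0
    mu = sum(ratings[u][i] for i in common) / len(common)
    mv = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    du = math.sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
    dv = math.sqrt(sum((ratings[v][i] - mv) ** 2 for i in common))
    return num / (du * dv) if du and dv else 0.0

def mean_rating(user):
    """Average of all ratings the user has given."""
    return sum(ratings[user].values()) / len(ratings[user])

def predict_user_based(user, item):
    """User's mean rating plus the similarity-weighted, mean-centered ratings of neighbors."""
    neighbors = [(pearson(user, v), v) for v in ratings if v != user and item in ratings[v]]
    num = sum(s * (ratings[v][item] - mean_rating(v)) for s, v in neighbors)
    den = sum(abs(s) for s, _ in neighbors)
    return mean_rating(user) + (num / den if den else 0.0)

print(round(pearson("alice", "bob"), 3))           # 1.0
print(round(predict_user_based("alice", "D"), 1))  # 5.0
```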

2. Item Based Collaborative Filtering:

An item's rating is predicted based on how that user has rated similar items.

• The ratings are predicted using the user’s own ratings on neighboring (closely related) items.

• Neighborhoods are defined by similarities among items.

• Adjusted cosine similarity is commonly used as the similarity measure, as it tends to give superior results.
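The item-based counterpart below uses a small hypothetical rating table. Adjusted cosine similarity subtracts each user's mean rating before comparing two items, and the prediction is a similarity-weighted average of the target user's own ratings on similar items. Keeping only positively similar items is an assumption made here so the weighted average stays well behaved.

```python
import math

# Hypothetical user -> {item: rating} data on a 1-5 scale
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 3, "D": 4},
    "carol": {"A": 1, "B": 5, "C": 2, "D": 1},
}

def adjusted_cosine(i, j):
    """Cosine similarity of items i and j after subtracting each user's mean rating."""
    means = {u: sum(r.values()) / len(r) for u, r in ratings.items()}
    users = [u for u, r in ratings.items() if i in r and j in r]
    num = sum((ratings[u][i] - means[u]) * (ratings[u][j] - means[u]) for u in users)
    di = math.sqrt(sum((ratings[u][i] - means[u]) ** 2 for u in users))
    dj = math.sqrt(sum((ratings[u][j] - means[u]) ** 2 for u in users))
    return num / (di * dj) if di and dj else 0.0

def predict_item_based(user, item):
    """Similarity-weighted average of the user's own ratings on similar items."""
    sims = [(adjusted_cosine(item, j), j) for j in ratings[user] if j != item]
    sims = [(s, j) for s, j in sims if s > 0]  # keep positively similar items only
    den = sum(s for s, _ in sims)
    return sum(s * ratings[user][j] for s, j in sims) / den if den else None

print(round(adjusted_cosine("D", "A"), 3))         # 1.0
print(round(predict_item_based("alice", "D"), 1))  # 4.8
```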

SECTION-3

Q 1: What is k-fold cross-validation?

Cross-validation is a resampling procedure used to evaluate machine learning models on a
limited data sample.

The procedure has a single parameter called k that refers to the number of groups that a given
data sample is to be split into. As such, the procedure is often called k-fold cross-validation.
When a specific value for k is chosen, it may be used in place of k in the procedure's name,
so that k=10 becomes 10-fold cross-validation.

Cross-validation is primarily used in applied machine learning to estimate the skill of a
machine learning model on unseen data. That is, it uses a limited sample to estimate how
the model is expected to perform in general when used to make predictions on data not
used during the training of the model.

It is a popular method because it is simple to understand and because it generally results in a
less biased or less optimistic estimate of the model skill than other methods, such as a simple
train/test split.

The general procedure is as follows:

• Shuffle the dataset randomly.
• Split the dataset into k groups.
• For each unique group:
  1. Take the group as a hold-out or test data set.
  2. Take the remaining groups as a training data set.
  3. Fit a model on the training set and evaluate it on the test set.
  4. Retain the evaluation score and discard the model.
• Summarize the skill of the model using the sample of model evaluation scores.

Importantly, each observation in the data sample is assigned to an individual group and stays
in that group for the duration of the procedure. This means that each sample is used in the
hold-out set exactly once and used to train the model k-1 times.
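The procedure above can be sketched in plain Python as follows. The model (always predict the training mean) and the score (mean squared error) are deliberately trivial, hypothetical stand-ins; any fit and score functions of the same shape could be plugged in. Shuffled indices are dealt round-robin into k groups, so every observation lands in exactly one hold-out fold.

```python
import random

def k_fold_cross_validation(data, k, fit, score, seed=0):
    """Return one evaluation score per fold, following the steps listed above."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)        # shuffle the data set randomly
    folds = [indices[f::k] for f in range(k)]   # split it into k groups
    scores = []
    for f in range(k):                          # for each unique group:
        test = [data[i] for i in folds[f]]      #   the group is the hold-out set
        train = [data[i] for g in range(k) if g != f for i in folds[g]]  # the rest train
        model = fit(train)                      #   fit on the training set
        scores.append(score(model, test))       #   keep the score, discard the model
    return scores

def fit_mean(train):
    """Hypothetical model: always predict the mean target seen in training."""
    return sum(y for _, y in train) / len(train)

def mean_squared_error(model, test):
    """Average squared difference between the targets and the constant prediction."""
    return sum((y - model) ** 2 for _, y in test) / len(test)

data = [(x, 2.0 * x + 1.0) for x in range(10)]  # hypothetical (x, y) pairs
scores = k_fold_cross_validation(data, k=5, fit=fit_mean, score=mean_squared_error)
print(len(scores), sum(scores) / len(scores))   # summarize: 5 fold scores and their mean
```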

SECTION-4

Q 1: What is Bayesian method?


Bayes' theorem, named after 18th-century British mathematician Thomas Bayes, is a
mathematical formula for determining conditional probability. Conditional probability is the
likelihood of an outcome occurring given that another outcome has already occurred. Bayes' theorem
provides a way to revise existing predictions or theories (update probabilities) given new or
additional evidence. In finance, for example, Bayes' theorem can be used to rate the risk of
lending money to potential borrowers.
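In symbols, Bayes' theorem reads P(A | B) = P(B | A) P(A) / P(B), where P(B) can be expanded as P(B | A) P(A) + P(B | not A) P(not A). The lending sketch below uses invented probabilities purely for illustration: A is "the borrower defaults" and B is "the borrower missed an earlier payment".

```python
def bayes_posterior(prior, likelihood, likelihood_if_not):
    """P(A | B) from P(A), P(B | A) and P(B | not A) using Bayes' theorem."""
    evidence = likelihood * prior + likelihood_if_not * (1.0 - prior)  # P(B), total probability
    return likelihood * prior / evidence

# Hypothetical numbers (illustrative only):
# P(default) = 0.05, P(missed payment | default) = 0.9, P(missed payment | no default) = 0.1
p = bayes_posterior(prior=0.05, likelihood=0.9, likelihood_if_not=0.1)
print(round(p, 4))  # 0.3214
```

The evidence of a missed payment raises the estimated default risk from 5% to about 32%.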

Q 2: What is k-means clustering?

K-means clustering is a type of unsupervised learning, which is used when you have
unlabeled data (data without defined categories or groups). The goal of this algorithm is to
find groups in the data, with the number of groups represented by the variable K. The
algorithm works iteratively to assign each data point to one of K groups based on the features
that are provided, so that data points are clustered by feature similarity. The results of the
K-means clustering algorithm are:

• The centroids of the K clusters, which can be used to label new data
• Labels for the training data (each data point is assigned to a single cluster)

Rather than defining groups before looking at the data, clustering allows you to find and
analyze the groups that have formed organically. The number of groups, K, must however be
chosen before the algorithm is run.

Each centroid of a cluster is a collection of feature values that defines the resulting group.
Examining the centroid's feature values can help to interpret qualitatively what kind of
group each cluster represents.
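A minimal sketch of the standard iterative procedure (Lloyd's algorithm), which alternates an assignment step and a centroid-update step until the centroids stop moving. Initializing with k randomly sampled data points and the two-blob toy data are assumptions; practical implementations often use smarter initialization such as k-means++ and several restarts.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: return (centroids, labels) for k clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, labels

# Hypothetical unlabeled 2-D data forming two well-separated groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

The returned centroids and labels correspond to the two results listed above, and a new point can be labeled by finding its nearest centroid.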

Q 4: What is ensemble learning?

Ensemble learning is the process by which multiple models, such as classifiers or experts, are
strategically generated and combined to solve a particular computational intelligence
problem. It is primarily used to improve the performance of a model (in classification,
prediction, function approximation, etc.) or to reduce the likelihood of unfortunately
selecting a poor one. Other applications of ensemble learning include assigning a confidence
to the decision made by the model, selecting optimal (or near-optimal) features, data fusion,
incremental learning, nonstationary learning and error correction.
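One of the simplest ways to combine models is hard majority voting, sketched below. The three threshold classifiers are hypothetical placeholders for real trained models. With an odd number of voters and two classes, no ties can occur.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model predictions for one sample by hard (plurality) voting."""
    return Counter(predictions).most_common(1)[0][0]

def ensemble_predict(models, x):
    """Apply every model to x and return the majority label."""
    return majority_vote([model(x) for model in models])

# Three hypothetical, deliberately simple threshold classifiers on a single feature
models = [
    lambda x: "positive" if x > 0.4 else "negative",
    lambda x: "positive" if x > 0.5 else "negative",
    lambda x: "positive" if x > 0.6 else "negative",
]
print(ensemble_predict(models, 0.55))  # positive (2 of 3 votes)
print(ensemble_predict(models, 0.45))  # negative (2 of 3 votes)
```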

SECTION-5

Q 2: What are classification metrics?

In binary classification, there are two possible output classes. In multi-class classification,
there are more than two possible classes.

There are many ways of measuring classification performance:

• Accuracy
• Confusion matrix
• Log-loss
• Precision and recall
• F-scores
• Receiver operating characteristic (ROC) curve
• Area under the ROC curve (AUC)
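Several of the metrics in the list can be computed directly from the confusion-matrix counts. The sketch below does this for binary labels; the function name and the example labels are made up for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus accuracy, precision, recall and F1 for one positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical true and predicted labels (1 = positive class)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)  # tp=3, fp=1, fn=1, tn=3; accuracy, precision, recall and F1 are all 0.75
```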

Q 4: What are ranking metrics?

Ranking metrics measure how well a model orders items. Ranking is a fundamental problem in
machine learning: it tries to rank a list of items based on their relevance in a particular task
(e.g. ranking pages on Google based on their relevance to a given query).
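Two widely used ranking metrics, precision at k and reciprocal rank, are sketched below. The text does not list specific metrics, so these are examples chosen here. Averaging the reciprocal rank over many queries gives the mean reciprocal rank (MRR). The ranked list and the set of relevant pages are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(item in relevant for item in ranked[:k]) / k

def reciprocal_rank(ranked, relevant):
    """1 / (position of the first relevant item), or 0.0 if no item is relevant."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0

# Hypothetical search results for one query, best-ranked first
ranked = ["page3", "page1", "page7", "page2", "page5"]
relevant = {"page1", "page2"}
print(precision_at_k(ranked, relevant, 3))  # 1/3: one of the top 3 results is relevant
print(reciprocal_rank(ranked, relevant))    # 0.5: the first relevant result is at rank 2
```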

Q 1: How can the output of different algorithms be measured?

Time efficiency - a measure of the amount of time an algorithm takes to execute.

Space efficiency - a measure of the amount of memory an algorithm needs to execute.

Complexity theory - the study of algorithm performance.

Function dominance - a comparison of cost functions.

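Time and space efficiency can also be measured empirically. The sketch below uses Python's `time.perf_counter` for elapsed time and the `tracemalloc` module for peak memory; the two summing functions are hypothetical examples. Memory is measured in a separate traced run because tracing slows execution, and wall-clock timings will vary between machines and runs.

```python
import time
import tracemalloc

def measure(func, *args):
    """Return (result, elapsed seconds, peak traced bytes) for func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    tracemalloc.start()
    func(*args)  # second, traced run used only for the memory measurement
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak

def sum_with_list(n):
    return sum([i for i in range(n)])  # builds a full list of n integers first

def sum_with_generator(n):
    return sum(i for i in range(n))    # produces one integer at a time

results = {f.__name__: measure(f, 100_000) for f in (sum_with_list, sum_with_generator)}
for name, (value, seconds, peak) in results.items():
    print(f"{name}: result={value}, time={seconds:.4f} s, peak memory={peak} bytes")
```

Both functions compute the same result, but the list version needs far more memory, which is exactly the kind of difference a space-efficiency measurement exposes.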
SECTION-6

Q 1: What is uniform distribution?

A uniform distribution is a type of probability distribution in which all outcomes are equally
likely; each outcome has the same probability of occurring. Drawing a suit from a standard deck
of cards follows a uniform distribution, because a heart, club, diamond, or spade is equally
likely to be pulled (probability 1/4 each).
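The card example can be checked with exact fractions. The sketch below builds a standard 52-card deck (the card representation is an assumption) and compares the probability of drawing each suit with a uniform distribution over the four suits.

```python
from collections import Counter
from fractions import Fraction

suits = ["hearts", "clubs", "diamonds", "spades"]
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
deck = [(rank, suit) for suit in suits for rank in ranks]  # 52 cards

# Probability of drawing each suit = (cards of that suit) / (cards in the deck)
counts = Counter(suit for _, suit in deck)
suit_probabilities = {suit: Fraction(count, len(deck)) for suit, count in counts.items()}

# A discrete uniform distribution over n outcomes gives each outcome probability 1/n
uniform = {suit: Fraction(1, len(suits)) for suit in suits}

print(suit_probabilities)             # every suit has probability 1/4
print(suit_probabilities == uniform)  # True
```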

Q 3: What is percentile?

The most common definition of a percentile is a number below which a certain percentage of
scores fall. You might know that you scored 67 out of 90 on a test, but that figure has no real
meaning unless you know which percentile you fall into. If your score is in the 90th percentile,
you scored better than 90% of the people who took the test.
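The sketch below computes a percentile rank using the "strictly below" definition given above; other conventions (for example, counting ties as half) also exist. The list of scores out of 90 is invented for illustration.

```python
def percentile_rank(scores, score):
    """Percentage of scores that fall strictly below `score`."""
    below = sum(s < score for s in scores)
    return 100.0 * below / len(scores)

# Hypothetical class results, each out of 90
scores = [45, 52, 58, 60, 61, 63, 64, 66, 67, 70]
print(percentile_rank(scores, 67))  # 80.0: 8 of the 10 scores are below 67
```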

Q 4: What are moments?

For a random variable x, its Nth moment is the expected value of the Nth power of x, where
N is a positive integer. The Nth moment of the deviation of x from its mean is called "the Nth
central moment".

The first moment is the mean, and the second central moment is the variance.
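Written out, the Nth moment is E[x^N] and the Nth central moment is E[(x - μ)^N], where μ = E[x]. The sketch below computes both for a list of equally likely outcomes, using a fair six-sided die as a hypothetical example; for such a list each expectation is simply an average.

```python
def raw_moment(xs, n):
    """n-th moment about zero: the average of x**n over equally likely outcomes."""
    return sum(x ** n for x in xs) / len(xs)

def central_moment(xs, n):
    """n-th central moment: the average of (x - mean)**n."""
    mean = raw_moment(xs, 1)
    return sum((x - mean) ** n for x in xs) / len(xs)

die = [1, 2, 3, 4, 5, 6]  # hypothetical example: a fair six-sided die
print(raw_moment(die, 1))      # 3.5 (the mean)
print(central_moment(die, 2))  # about 2.9167 (the variance, 35/12)
print(central_moment(die, 3))  # 0.0 (zero for a symmetric distribution)
```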
