Weka DW&DM Lab Notes

DWDM notes

Uploaded by

tempofnc

DATA WAREHOUSE TOOLS

ParAccel (Actian)

Cloudera

Talend

QuerySurge

Amazon Redshift

Teradata

Oracle

Tableau

Open Source Data Mining Tools

WEKA

Orange

KNIME

R-Programming

RapidMiner

Apache Mahout

Tanagra

XLMiner

Experiment 1: Installation of WEKA Tool
Aim: A. Investigate the application interfaces of the Weka tool.

Introduction
Weka (pronounced to rhyme with Mecca) is a workbench that contains a collection of
visualization tools and algorithms for data analysis and predictive modeling, together with
graphical user interfaces for easy access to these functions. The original non-Java version of
Weka was a Tcl/Tk front-end to (mostly third-party) modeling algorithms implemented in other
programming languages, plus data preprocessing utilities in C, and a Makefile-based system for
running machine learning experiments. This original version was primarily designed as a tool for
analyzing data from agricultural domains, but the more recent fully Java-based version (Weka 3),
for which development started in 1997, is now used in many different application areas, in
particular for educational purposes and research. Advantages of Weka include:

• Free availability under the GNU General Public License
• Portability, since it is fully implemented in the Java programming language and thus runs
on almost any modern computing platform
• A comprehensive collection of data preprocessing and modeling techniques
• Ease of use due to its graphical user interfaces

Description:
Open the program. Once Weka has been installed on the user's machine, it is opened from the
operating system's program menu or launcher; the exact steps depend on the operating system.
Figure 1.1 shows an example of the initial opening screen.
There are four options available on this initial screen:

Information Technology Page 1


Fig: 1.1 Weka GUI

1. Explorer - the graphical interface used to conduct experiments on raw data. After clicking
the Explorer button, the Weka Explorer interface appears.

Fig: 1.2 Pre-processor

Inside the Weka Explorer window there are six tabs:
1. Preprocess - used to choose the data file to be used by the application.
Open File - allows the user to select files residing on the local machine or on recorded media.
Open URL - provides a mechanism to load a file or data source from a different location
specified by the user.
Open Database - allows the user to retrieve files or data from a database source provided by the user.
2. Classify - used to train and test different learning schemes on the preprocessed data file under
experimentation.

Fig: 1.3 Choosing ZeroR from the Classify tab


Again, there are several options to select inside the Classify tab. Test options gives the user
a choice of four test modes for the data set:
1. Use training set
2. Supplied test set
3. Cross-validation
4. Percentage split
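
The difference between the last two test modes can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Weka's own code: the function names, the seed, and the values 66 and 10 are assumptions chosen for the example, and unlike a real tool the sketch does not stratify the folds by class.

```python
import random

def percentage_split(instances, train_percent=66, seed=1):
    """Shuffle the data, then keep the first train_percent% for training."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]

def k_fold_splits(instances, k=10, seed=1):
    """Yield (train, test) pairs; every instance is tested exactly once."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = list(range(20))  # stand-in for 20 instances
train, test = percentage_split(data, train_percent=66)
print("percentage split:", len(train), "train /", len(test), "test")
for train, test in k_fold_splits(data, k=5):
    print("fold:", len(train), "train /", len(test), "test")
```

With percentage split the model is evaluated once on the held-out part; with cross-validation it is trained and evaluated k times and the results are averaged.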

3. Cluster - used to apply different tools that identify clusters within the data file.
The Cluster tab opens the process used to identify commonalities, or clusters of occurrences,
within the data set and to produce information for the user to analyze.


4. Associate - used to apply different rules to the data file that identify associations within the
data. The Associate tab opens a window for selecting the association options for the data set.


5. Select attributes - used to apply different rules to reveal changes based on the inclusion or
exclusion of selected attributes from the experiment.

6. Visualize - used to see what the various manipulations produced on the data set in a 2D format,
as scatter plot and bar graph output.

2. Experimenter - this option allows users to conduct different experimental variations on data
sets and perform statistical manipulation. The Weka Experiment Environment enables the user to
create, run, modify, and analyze experiments in a more convenient manner than is possible when
processing the schemes individually. For example, the user can create an experiment that runs
several schemes against a series of datasets and then analyze the results to determine if one of the
schemes is (statistically) better than the other schemes.

Fig: 1.6 Weka experiment

Results destination: ARFF file, CSV file, JDBC database.
Experiment type: Cross-validation (default), Train/Test Percentage Split (data randomized).
Iteration control: Number of repetitions, Data sets first/Algorithms first.
Algorithms: filters


3. Knowledge Flow - basically the same functionality as Explorer, with a drag-and-drop
interface. The advantage of this option is that it supports incremental learning from previous
results.
4. Simple CLI - gives users without a graphical interface the ability to execute
commands from a terminal window.

B. Explore the default datasets in the Weka tool.

Click the “Open file…” button to open a data set and double-click the “data” directory.
Weka provides a number of small, common machine learning datasets that you can use to practice on.
Select the “iris.arff” file to load the Iris dataset.
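
To get a feel for what an ARFF file contains, the following sketch reads a tiny ARFF-style text with plain Python. The sample text is a made-up excerpt in the style of iris.arff, not the real file, and `parse_arff` is a hypothetical helper that only handles the simple case (no quoted names, no sparse data).

```python
def parse_arff(text):
    """Return (relation name, [(attribute, type)], data rows) from simple ARFF text."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([cell.strip() for cell in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows

# Made-up two-row excerpt in the style of iris.arff (assumption, not the real file)
sample = """\
% comment lines start with a percent sign
@relation iris
@attribute sepallength numeric
@attribute petallength numeric
@attribute class {Iris-setosa,Iris-versicolor}
@data
5.1,1.4,Iris-setosa
7.0,4.7,Iris-versicolor
"""

relation, attributes, rows = parse_arff(sample)
print(relation)
print(attributes)
print(rows)
```

The header declares each attribute and its type (numeric, or a nominal list in braces); everything after @data is one comma-separated instance per line.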

Fig: 1.7 Different Data Sets in weka

Exercise:
1. Normalize the data using min-max normalization
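
As a starting point for the exercise, here is a minimal sketch of min-max normalization in plain Python. In Weka itself this is normally done with the unsupervised attribute filter Normalize in the Preprocess tab; the function name and sample values below are assumptions for illustration only.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant attribute: avoid division by zero
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

sepal_lengths = [4.3, 5.1, 5.8, 6.4, 7.9]   # illustrative values
print(min_max_normalize(sepal_lengths))
```

Each value v is mapped to (v - min) / (max - min), so the result always lies between 0 and 1 unless a different target range is requested.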


Weka
1. Waikato Environment for Knowledge Analysis
2. OPEN SOURCE DATA MINING TOOLS
3. The Weka logo shows a weka, a bird of New Zealand
4. Weka contains a collection of visualization tools and
algorithms for data analysis and predictive modeling
5. Operating systems: Windows, macOS, Linux
6. Latest version: 3.9.6 (Jan 2022)
7. https://www.filehorse.com/download-weka/
8. Open source data mining tools: Weka, Orange, KNIME,
R Programming, XLMiner

1. Which of the following is true about Weka?

a. Weka is a data visualization tool.
b. Weka is a programming language.
c. Weka is a collection of machine learning algorithms.
d. Weka is used only for unsupervised learning.

Answer: c. Weka is a collection of machine learning algorithms.

Explanation: Weka stands for Waikato Environment for Knowledge
Analysis and is a collection of machine learning algorithms and data
preprocessing tools.

2. Which file format is commonly used to import data into Weka?

a. PDF
b. CSV
c. MP4
d. PNG

Answer: b. CSV.

Explanation: Weka's native file format is ARFF, but it can also import data
in several other formats. Of the options listed, CSV (Comma Separated Values)
is the one commonly used for importing data.

3. Which of the following is NOT a type of data preprocessing
available in Weka?

a. Attribute selection
b. Data cleaning
c. Data normalization
d. Data visualization

Answer: d. Data visualization.

Explanation: Data visualization is not a preprocessing technique. Weka does
offer visualization (for example, the Visualize tab in the Explorer), but it is
separate from preprocessing techniques such as attribute selection, data
cleaning, and data normalization.

4. Which algorithm is used for classification in Weka?

a. Naive Bayes
b. K-means
c. Random Forest
d. PCA

Answer: a. Naive Bayes.

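Explanation: Weka provides a variety of classification algorithms,
including Naive Bayes, Decision Trees, Support Vector Machines, and
more. Note that Random Forest (option c) is also a classifier, so Naive
Bayes is only the intended answer here; K-means is a clustering algorithm
and PCA is a dimensionality reduction technique.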

5. Which of the following is NOT a clustering algorithm in Weka?

a. K-means
b. DBSCAN
c. EM
d. Linear Regression

Answer: d. Linear Regression.

Explanation: Linear Regression is not a clustering algorithm, but a
supervised learning algorithm used for regression tasks.

6. Which of the following is NOT a type of evaluation metric in
Weka?

a. Accuracy
b. Precision
c. Recall
d. Distance

Answer: d. Distance.

Explanation: Distance is not an evaluation metric in Weka, but a concept
used in clustering algorithms to measure the similarity or dissimilarity
between data points.

7. Which of the following is a feature selection technique in Weka?

a. Principal Component Analysis (PCA)
b. Recursive Feature Elimination (RFE)
c. K-means clustering
d. K-nearest neighbors (KNN)

Answer: b. Recursive Feature Elimination (RFE).

Explanation: RFE is a feature selection technique that recursively
removes the least important features from the dataset until a desired
number of features is reached. Weka provides various feature selection
techniques, including RFE, Correlation-based Feature Selection (CFS), and
more.

8. Which of the following is a disadvantage of the K-nearest
neighbors (KNN) algorithm in Weka?

a. It is computationally expensive
b. It requires large amounts of training data
c. It is sensitive to irrelevant features
d. It cannot handle categorical data

Answer: a. It is computationally expensive.

Explanation: The KNN algorithm is computationally expensive because it requires
calculating the distance between the query point and all the training data
points, which can be time-consuming for large datasets.
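
The cost described above is easy to see in a brute-force sketch. The code below is a minimal plain-Python illustration, not Weka's implementation; the function name and toy data are assumptions. Every prediction computes one distance per training point before voting.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train is a list of (features, label) pairs; scans every training point."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy training data (assumption for illustration)
train = [((1, 1), "a"), ((1, 2), "a"), ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
print(knn_predict(train, (1.5, 1.5), k=3))
print(knn_predict(train, (5.5, 5.5), k=3))
```

Because nothing is learned ahead of time, the work grows with the size of the training set on every single query.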

9. Which of the following is an ensemble learning algorithm in
Weka?

a. Linear Regression
b. Naive Bayes
c. Random Forest
d. K-means

Answer: c. Random Forest.

Explanation: Random Forest is an ensemble learning algorithm that
combines multiple decision trees to make more accurate predictions.
Weka provides various ensemble learning algorithms, including Bagging,
Boosting, and more.

10. Which of the following is NOT a type of neural network
available in Weka?

a. Multilayer Perceptron (MLP)
b. Radial Basis Function (RBF)
c. Convolutional Neural Network (CNN)
d. Decision Tree

Answer: d. Decision Tree.

Explanation: Decision Tree is not a type of neural network, but
a machine learning algorithm used for classification and regression tasks.

11. Which of the following is a supervised learning algorithm in
Weka?

a. K-means
b. DBSCAN
c. Naive Bayes
d. EM

Answer: c. Naive Bayes.

Explanation: Naive Bayes is a supervised learning algorithm used for
classification tasks, where the target variable is known.

12. Which of the following is NOT a data preprocessing technique
in Weka?

a. Data normalization
b. Data imputation
c. Data visualization
d. Data discretization

Answer: c. Data visualization.

Explanation: Data visualization is not a preprocessing technique. Weka does
offer visualization (for example, the Visualize tab in the Explorer), but it is
separate from preprocessing techniques such as data normalization, data
imputation, and data discretization.

13. Which of the following is a feature extraction technique in
Weka?

a. Principal Component Analysis (PCA)
b. Recursive Feature Elimination (RFE)
c. Correlation-based Feature Selection (CFS)
d. K-nearest neighbors (KNN)

Answer: a. Principal Component Analysis (PCA).

Explanation: PCA is a feature extraction technique that transforms the
original features into a smaller set of uncorrelated features that explain
most of the variance in the data. Weka provides various feature
extraction techniques, including PCA, Linear Discriminant Analysis (LDA),
and more.

14. Which of the following is NOT a type of attribute in Weka?

a. Numeric
b. Nominal
c. Binary
d. Sequential

Answer: d. Sequential.

Explanation: Sequential is not a type of attribute in Weka, but a concept
used in time series analysis to represent data that is ordered in time.

15. Which of the following is NOT a data mining task in Weka?

a. Classification
b. Clustering
c. Association Rule Mining
d. Data Visualization

Answer: d. Data Visualization.

Explanation: Data Visualization is not a data mining task in Weka, but a
technique used to represent data in a visual form for better understanding
and insights.

16. Which of the following is a rule-based learning algorithm in
Weka?

a. Random Forest
b. J48
c. K-means
d. DBSCAN

Answer: b. J48.

Explanation: J48 is a decision tree algorithm based on the C4.5
algorithm, which builds a tree of if-then rules to make
predictions. Weka provides various rule-based learning algorithms,
including ZeroR, OneR, and more.

17. Which of the following is a data imbalance problem in Weka?

a. Overfitting
b. Underfitting
c. Missing values
d. Class imbalance

Answer: d. Class imbalance.

Explanation: Class imbalance is a data imbalance problem that occurs
when one class in the dataset has significantly fewer samples than the
other classes, leading to biased predictions. Weka provides various
techniques to handle class imbalance, including resampling, cost-sensitive
learning, and more.

18. Which of the following is NOT a type of regression algorithm in
Weka?

a. Linear Regression
b. Polynomial Regression
c. Logistic Regression
d. K-means

Answer: d. K-means.

Explanation: K-means is not a regression algorithm, but a clustering
algorithm used to group similar data points together.

19. Which of the following is NOT a type of cross-validation in
Weka?

a. K-fold cross-validation
b. Leave-one-out cross validation
c. Stratified cross-validation
d. Naive Bayes cross-validation

Answer: d. Naive Bayes cross-validation.

Explanation: Naive Bayes is a classifier, not a cross-validation scheme, so
there is no "Naive Bayes cross-validation". Cross-validation is instead a way
to evaluate the performance of a classifier such as Naive Bayes on a dataset.

20. Which of the following is a data discretization technique in
Weka?

a. Equal width discretization
b. Normalization
c. Principal Component Analysis (PCA)
d. Recursive Feature Elimination (RFE)

Answer: a. Equal width discretization.

Explanation: Equal width discretization is a data discretization technique
that divides the range of values into equal-width intervals and assigns a
discrete value to each interval. Weka provides various data discretization
techniques, including equal frequency discretization, unsupervised
discretization, and more.
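
A minimal sketch of equal-width discretization in plain Python follows. It is independent of Weka (where the unsupervised Discretize filter plays this role); the function name, the choice of three bins, and the sample values are assumptions for illustration.

```python
def equal_width_bins(values, n_bins=3):
    """Return the bin index (0 .. n_bins-1) for each value, using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:                     # constant attribute: everything falls in bin 0
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]   # illustrative values
print(equal_width_bins(ages, n_bins=3))
```

Here the range 0 to 9 is cut into three intervals of width 3, and each value is replaced by the index of the interval it falls into.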

21. Which of the following is NOT a type of ensemble learning
algorithm in Weka?

a. Bagging
b. Boosting
c. Random Forest
d. K-means

Answer: d. K-means.

Explanation: K-means is not an ensemble learning algorithm, but a
clustering algorithm used to group similar data points together. Weka
provides various ensemble learning algorithms, including Bagging,
Boosting, and Random Forest.

22. Which of the following is a dimensionality reduction technique
in Weka?

a. Principal Component Analysis (PCA)
b. Recursive Feature Elimination (RFE)
c. Correlation-based Feature Selection (CFS)
d. K-nearest neighbors (KNN)

Answer: a. Principal Component Analysis (PCA).

Explanation: PCA is a dimensionality reduction technique that transforms
the original features into a smaller set of uncorrelated features that
explain most of the variance in the data. Weka provides various
dimensionality reduction techniques, including PCA, Linear Discriminant
Analysis (LDA), and more.

23. Which of the following is a non-parametric classification
algorithm in Weka?

a. Logistic Regression
b. Decision Tree
c. Naive Bayes
d. k-Nearest Neighbors (k-NN)

Answer: d. k-Nearest Neighbors (k-NN).

Explanation: k-NN is a non-parametric classification algorithm that uses
the k-nearest neighbors to classify a new instance based on the majority
class of its neighbors. Weka provides various non-parametric classification
algorithms, including k-NN, Random Forest, and more.

24. Which of the following is a neural network activation function
available in Weka?

a. Sigmoid
b. ReLU
c. Tanh
d. All of the above

Answer: d. All of the above.

Explanation: Weka provides various neural network activation functions,
including Sigmoid, ReLU, Tanh, and more.

25. Which of the following is a clustering evaluation metric
in Weka?

a. Accuracy
b. F-measure
c. Silhouette coefficient
d. Precision

Answer: c. Silhouette coefficient.

Explanation: Silhouette coefficient is a clustering evaluation metric that
measures the quality of clustering by comparing the distance between the
data points within the same cluster and the distance between the data
points of different clusters. Weka provides various clustering evaluation
metrics, including Silhouette coefficient, Sum of Squared Error (SSE), and
more.
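
The comparison described above can be written out directly. The following plain-Python sketch computes the mean silhouette for a labeled set of points; it does not call any Weka class, and the function name and toy points are assumptions for illustration. For each point, a is the mean distance to its own cluster and b is the smallest mean distance to any other cluster.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient over all points; expects at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:                    # singleton cluster: score taken as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

points = [(0, 0), (0, 1), (10, 0), (10, 1)]   # two obvious groups (toy data)
print(silhouette(points, [0, 0, 1, 1]))        # sensible clustering
print(silhouette(points, [0, 1, 0, 1]))        # poor clustering
```

Values near 1 indicate compact, well-separated clusters, while negative values suggest points were assigned to the wrong cluster.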

26. Which of the following is a data imbalance handling technique
in Weka?

a. Bagging
b. SMOTE
c. Random Forest
d. Boosting

Answer: b. SMOTE.

Explanation: SMOTE (Synthetic Minority Over-sampling Technique) is a
data imbalance handling technique that creates synthetic samples of the
minority class by interpolating between the existing minority class
samples. Weka provides various data imbalance handling techniques,
including SMOTE, ADASYN, and more.
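
The interpolation step can be sketched as follows. This is a simplified plain-Python illustration of the SMOTE idea, not the Weka filter: the function name, the seed, the value of k, and the toy minority points are all assumptions.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the line between a minority
    point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()             # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]   # toy minority class
for point in smote(minority, n_new=3):
    print(point)
```

Because every new point lies between two existing minority points, the synthetic samples stay inside the region the minority class already occupies.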

27. Which of the following is a regression algorithm in Weka?

a. Decision Tree
b. k-Nearest Neighbors (k-NN)
c. Linear Regression
d. Support Vector Machine (SVM)

Answer: c. Linear Regression.

Explanation: Linear Regression is a regression algorithm that models the
relationship between the dependent variable and one or more
independent variables by fitting a linear equation to the data. Weka
provides various regression algorithms, including Linear Regression,
Multilayer Perceptron (MLP), and more.

28. Which of the following is a data normalization technique in
Weka?

a. Min-max normalization
b. Recursive Feature Elimination (RFE)
c. Correlation-based Feature Selection (CFS)
d. Principal Component Analysis (PCA)

Answer: a. Min-max normalization.

Explanation: Min-max normalization is a data normalization technique
that scales the data to a fixed range of values between 0 and 1. Weka
provides various data normalization techniques, including z-score
normalization, decimal scaling, and more.

29. Which of the following is NOT a type of attribute selection in
Weka?

a. Wrapper Subset Evaluator
b. Filter Subset Evaluator
c. Correlation-based Feature Selection (CFS)
d. Boosting

Answer: d. Boosting.

Explanation: Boosting is not a type of attribute selection, but an
ensemble learning algorithm used for classification and regression tasks.
Weka provides various attribute selection techniques, including Wrapper
Subset Evaluator, Filter Subset Evaluator, and more.

30. Which of the following is a classification algorithm in Weka?

a. Support Vector Machine (SVM)
b. k-Means
c. Hierarchical Clustering
d. PCA

Answer: a. Support Vector Machine (SVM).

Explanation: SVM is a classification algorithm that separates the data
into different classes by finding the hyperplane that maximally separates
the classes. Weka provides various classification algorithms, including
SVM, Naive Bayes, and more.

31. Which of the following is NOT a type of ensemble learning in
Weka?

a. AdaBoost
b. Bagging
c. Boosting
d. Random Forest

Answer: d. Random Forest.

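Explanation: This question turns on a fine distinction. Random Forest is an
ensemble method, but it is one specific algorithm that uses decision trees as
the base classifiers, whereas bagging and boosting are general ensemble
strategies (AdaBoost being a boosting algorithm). Weka provides various
ensemble learning algorithms, including AdaBoost, Bagging, and Random Forest.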

32. Which of the following is a distance metric used in k-Nearest
Neighbors (k-NN) algorithm in Weka?

a. Euclidean distance
b. Manhattan distance
c. Mahalanobis distance
d. All of the above

Answer: d. All of the above.

Explanation: Weka provides various distance metrics used in k-Nearest
Neighbors (k-NN) algorithm, including Euclidean distance, Manhattan
distance, and Mahalanobis distance.

33. Which of the following is a missing value handling technique
in Weka?

a. Mean imputation
b. Median imputation
c. Mode imputation
d. All of the above

Answer: d. All of the above.

Explanation: Weka provides various missing value handling techniques,
including mean imputation, median imputation, mode imputation, and
more.
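
As a small illustration of mean and mode imputation, the sketch below fills missing entries in one column using plain Python. It is not Weka code; the function name, the use of None as the missing marker, and the sample columns are assumptions for the example.

```python
from statistics import mean, mode

def impute(column, numeric=True, missing=None):
    """Replace missing entries with the mean (numeric) or the mode (nominal)
    of the values that are present."""
    present = [v for v in column if v is not missing]
    fill = mean(present) if numeric else mode(present)
    return [fill if v is missing else v for v in column]

temperature = [1.0, None, 3.0]                 # numeric attribute with a gap
outlook = ["sunny", None, "rainy", "sunny"]    # nominal attribute with a gap
print(impute(temperature))
print(impute(outlook, numeric=False))
```

The mean is the natural fill value for numeric attributes and the mode for nominal ones; median imputation would follow the same pattern with statistics.median.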

34. Which of the following is a kernel function used in Support
Vector Machine (SVM) algorithm in Weka?

a. Linear kernel
b. Polynomial kernel
c. Gaussian kernel
d. All of the above

Answer: d. All of the above.

Explanation: Weka provides various kernel functions used in Support
Vector Machine (SVM) algorithm, including linear kernel, polynomial
kernel, and Gaussian kernel.

35. Which of the following is a rule-based classifier in Weka?

a. Decision Tree
b. Naive Bayes
c. ZeroR
d. JRip

Answer: d. JRip.

Explanation: JRip is a rule-based classifier in Weka that constructs a set
of rules from the data that classify the instances based on their attribute
values. Weka provides various rule-based classifiers, including JRip, PART,
and more.

36. Which of the following is NOT a type of clustering algorithm in
Weka?

a. k-Means
b. Hierarchical Clustering
c. DBSCAN
d. Linear Regression

Answer: d. Linear Regression.

Explanation: Linear Regression is not a type of clustering algorithm, but
a regression algorithm used to model the relationship between the
dependent variable and one or more independent variables. Weka
provides various clustering algorithms, including k-Means, Hierarchical
Clustering, DBSCAN, and more.

37. Which of the following is a feature selection technique that
selects a subset of features based on their correlation with the
class attribute in Weka?

a. Wrapper Subset Evaluator
b. Filter Subset Evaluator
c. Correlation-based Feature Selection (CFS)
d. Principal Component Analysis (PCA)

Answer: c. Correlation-based Feature Selection (CFS).

Explanation: CFS is a feature selection technique in Weka that selects a
subset of features based on their correlation with the class attribute.
Weka provides various feature selection techniques, including Wrapper
Subset Evaluator, Filter Subset Evaluator, and more.

38. Which of the following is a rule induction algorithm in Weka?

a. k-NN
b. Apriori
c. Random Forest
d. JRip

Answer: d. JRip.

Explanation: JRip is a rule induction algorithm in Weka that constructs a
set of rules from the data that classify the instances based on their
attribute values. Weka provides various rule induction algorithms,
including JRip, PART, and more.

39. Which of the following is a type of ensemble learning
technique in Weka?

a. Decision Tree
b. Naive Bayes
c. Bagging
d. k-NN

Answer: c. Bagging.

Explanation: Bagging is a type of ensemble learning technique in Weka
that constructs multiple models from different subsets of the data and
combines them to improve the predictive performance. Weka provides
various ensemble learning techniques, including Bagging, Boosting, and
more.
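
The two ingredients of bagging, bootstrap samples and a combined vote, can be sketched in plain Python. This is an illustration only, not Weka's Bagging class: the function names, the toy 1-nearest-neighbour base learner, the number of models, and the data are assumptions.

```python
import random
from collections import Counter

def one_nn_learner(sample):
    """Toy base learner on a single numeric feature: predict the label of the closest point."""
    def predict(x):
        return min(sample, key=lambda item: abs(item[0] - x))[1]
    return predict

def bagging(train, learner, n_models=11, seed=0):
    """Train each model on a bootstrap sample (drawn with replacement),
    then combine the models by majority vote."""
    rng = random.Random(seed)
    models = [learner([rng.choice(train) for _ in train]) for _ in range(n_models)]
    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]
    return predict

train = [(1.0, "low"), (2.0, "low"), (3.0, "low"),
         (7.0, "high"), (8.0, "high"), (9.0, "high")]   # toy data
model = bagging(train, one_nn_learner)
print(model(1.5), model(8.5))
```

Each bootstrap sample leaves out some instances and repeats others, so the individual models differ slightly; voting averages out their individual mistakes.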

40. Which of the following is a dimensionality reduction technique
that maximizes the margin between classes in Weka?

a. PCA
b. LDA
c. ICA
d. SVM

Answer: b. LDA.

Explanation: LDA is a supervised dimensionality reduction technique that
finds the linear combinations of features that best separate the classes,
maximizing between-class scatter relative to within-class scatter. Strictly
speaking, margin maximization describes SVM, but SVM is a classification
algorithm rather than a dimensionality reduction technique, so LDA is the
intended answer. Weka provides various dimensionality reduction techniques,
including PCA, LDA, and more.

41. Which of the following is a type of classification algorithm in
Weka that assigns a class label based on the most common class
in the training data?

a. Decision Tree
b. Naive Bayes
c. k-NN
d. ZeroR

Answer: d. ZeroR.

Explanation: ZeroR is a type of classification algorithm in Weka that
assigns a class label based on the most common class in the training
data. It is a simple baseline classifier used to evaluate the predictive
performance of more complex classifiers. Weka provides various
classification algorithms, including Decision Tree, Naive Bayes, k-NN, and
more.
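
The ZeroR idea fits in a few lines. The sketch below is a plain-Python illustration rather than Weka's implementation; the function name and the class counts (9 "yes", 5 "no") are assumptions chosen for the example.

```python
from collections import Counter

def zero_r(train_labels):
    """Ignore all attributes and always predict the majority class."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda instance: majority

labels = ["yes"] * 9 + ["no"] * 5          # illustrative class distribution
model = zero_r(labels)
prediction = model({"outlook": "sunny"})   # the attributes are never looked at
baseline_accuracy = labels.count(prediction) / len(labels)
print(prediction, baseline_accuracy)
```

Any classifier worth using should beat this baseline accuracy, which is simply the share of the majority class.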

42. Which of the following is a type of ensemble learning
technique that combines multiple models using weighted voting in
Weka?

a. Bagging
b. Boosting
c. Stacking
d. Random Forest

Answer: c. Stacking.

Explanation: Stacking is a type of ensemble learning technique in Weka
that combines multiple models using weighted voting. The output of the
base models is used as input to a meta-model that learns how to combine
them to make the final prediction. Weka provides various ensemble
learning techniques, including Bagging, Boosting, Random Forest, and
more.

43. Which of the following is a feature selection technique that
evaluates the subsets of features using a learning algorithm in
Weka?

a. Wrapper Subset Evaluator
b. Filter Subset Evaluator
c. Correlation-based Feature Selection (CFS)
d. Principal Component Analysis (PCA)

Answer: a. Wrapper Subset Evaluator.

Explanation: Wrapper Subset Evaluator is a feature selection technique
in Weka that evaluates the subsets of features using a learning algorithm.
It searches through the space of possible feature subsets and selects the
one that achieves the best performance on the validation set. Weka
provides various feature selection techniques, including Wrapper Subset
Evaluator, Filter Subset Evaluator, and more.

44. Which of the following is a clustering algorithm in Weka that
uses a density-based approach?

a. k-Means
b. EM
c. DBSCAN
d. SOM

Answer: c. DBSCAN.

Explanation: DBSCAN is a clustering algorithm in Weka that uses a
density-based approach to group the instances into clusters. It works by
identifying the dense regions of the data and connecting them into
clusters. Weka provides various clustering algorithms, including k-Means,
EM, DBSCAN, SOM, and more.

45. Which of the following is a method for handling missing values
in Weka that uses the available data to estimate the missing
values?

a. Mean Imputation
b. Mode Imputation
c. Median Imputation
d. k-NN Imputation

Answer: d. k-NN Imputation.

Explanation: k-NN Imputation is a method for handling missing values in
Weka that uses the available data to estimate the missing values. It
works by finding the k nearest instances to the instance with missing
values and using their attribute values to estimate the missing values.
Weka provides various methods for handling missing values, including
Mean Imputation, Mode Imputation, Median Imputation, k-NN Imputation,
and more.

46. Which of the following is a type of ensemble learning
technique in Weka that combines multiple models using a
weighted sum of their predictions?

a. Bagging
b. Boosting
c. Stacking
d. Random Forest

Answer: b. Boosting.

Explanation: Boosting is a type of ensemble learning technique in Weka
that combines multiple models using a weighted sum of their predictions.
It works by iteratively reweighting the instances based on their
classification errors and building a new model on the reweighted data.
Weka provides various ensemble learning techniques, including Bagging,
Boosting, Random Forest, Stacking, and more.

47. Which of the following is a type of classification algorithm in
Weka that models the joint probability distribution of the features
and the class?

a. Naive Bayes
b. k-NN
c. Decision Tree
d. SVM

Answer: a. Naive Bayes.

Explanation: Naive Bayes is a type of classification algorithm in Weka
that models the joint probability distribution of the features and the class
using Bayes’ theorem and the assumption of independence between the
features. Weka provides various classification algorithms, including Naive
Bayes, k-NN, Decision Tree, SVM, and more.
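
The independence assumption means the score for a class is just its prior multiplied by one conditional probability per feature. The sketch below shows this for nominal attributes in plain Python; it is an illustration rather than Weka's NaiveBayes class, and the function names, the Laplace (add-one) smoothing, and the toy weather-style data are assumptions.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count how often each attribute value occurs with each class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)    # (attribute index, class) -> value counts
    values_seen = defaultdict(set)         # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen

def predict_nb(model, row):
    """Pick the class maximizing P(class) * product of P(value | class)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best, best_score = None, -1.0
    for label, n in class_counts.items():
        score = n / total                                  # prior P(class)
        for i, value in enumerate(row):
            # Laplace-smoothed estimate of P(value | class)
            score *= (value_counts[(i, label)][value] + 1) / (n + len(values_seen[i]))
        if score > best_score:
            best, best_score = label, score
    return best

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]            # toy data
labels = ["no", "no", "yes", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("sunny", "hot")), predict_nb(model, ("rainy", "cool")))
```

Smoothing keeps a single unseen attribute value from forcing the whole product to zero.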

48. Which of the following is a clustering algorithm in Weka that
uses a probabilistic approach?

a. k-Means
b. EM
c. DBSCAN
d. SOM

Answer: b. EM.

Explanation: EM is a clustering algorithm in Weka that uses a
probabilistic approach to group the instances into clusters. It works by
modeling the data as a mixture of probability distributions and estimating
the parameters of the distributions using the Expectation-Maximization
algorithm. Weka provides various clustering algorithms, including
k-Means, EM, DBSCAN, SOM, and more.

49. Which of the following is a feature selection technique that
evaluates the subsets of features based on their predictive power
and selects the subset that gives the best performance?

a. Filter
b. Wrapper
c. Embedded
d. Correlation-based

Answer: b. Wrapper.

Explanation: Wrapper is a feature selection technique in Weka that
evaluates the subsets of features based on their predictive power and
selects the subset that gives the best performance. It works by using a
learning algorithm to train and evaluate the model on each subset of
features and selecting the subset that gives the best performance. Weka
provides various feature selection techniques, including Filter, Wrapper,
Embedded, Correlation-based, and more.

50. Which of the following is a type of dimensionality reduction
technique in Weka that maps the high-dimensional data to a
lower-dimensional space while preserving the pairwise distances
between the instances?

a. Principal Component Analysis (PCA)
b. Linear Discriminant Analysis (LDA)
c. t-SNE
d. Isomap

Answer: d. Isomap.

Explanation: Isomap is a type of dimensionality reduction technique in
Weka that maps the high-dimensional data to a lower-dimensional space
while preserving the pairwise distances between the instances. It works
by constructing a neighborhood graph of the instances and estimating the
geodesic distances between them using a shortest path algorithm. Weka
provides various dimensionality reduction techniques, including PCA, LDA,
t-SNE, Isomap, and more.

51. Which of the following is a type of rule-based classification
algorithm in Weka that builds a set of rules from the data?

a. OneR
b. ZeroR
c. JRip
d. Random Tree

Answer: c. JRip.

Explanation: JRip is a type of rule-based classification algorithm
in Weka that builds a set of rules from the data. It works by iteratively
adding rules to the rule set based on the accuracy and coverage of the
rules. Weka provides various classification algorithms, including OneR,
ZeroR, JRip, Random Tree, and more.

52. Which of the following is a type of clustering algorithm in
Weka that uses a hierarchical approach to group the instances
into clusters?

a. k-Means
b. EM
c. DBSCAN
d. Hierarchical

Answer: d. Hierarchical.

Explanation: Hierarchical is a type of clustering algorithm in Weka that
uses a hierarchical approach to group the instances into clusters. It works
by recursively merging the most similar clusters based on a distance
metric until all the instances are in a single cluster. Weka provides various
clustering algorithms, including k-Means, EM, DBSCAN, Hierarchical, and
more.
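
The merge loop described above can be sketched in plain Python. This is an illustration, not Weka's HierarchicalClusterer: the function name, the choice of single linkage (distance between the closest members), and the toy points are assumptions, and the sketch stops at a requested number of clusters instead of merging down to one.

```python
import math

def agglomerative(points, n_clusters):
    """Start with one cluster per point and repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters.pop(j)       # j > i, so index i is still valid
        clusters[i].extend(merged)
    return clusters

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]   # toy data
print(agglomerative(points, 3))
print(agglomerative(points, 2))
```

Recording the order of the merges, instead of stopping early, gives the dendrogram that hierarchical clustering is usually visualized with.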

53. Which of the following is a type of feature selection technique
in Weka that selects the features based on their correlation with
the class and removes the redundant features?

a. Filter
b. Wrapper
c. Embedded
d. Correlation-based

Answer: d. Correlation-based.

Explanation: Correlation-based feature selection is a technique in Weka
that selects the features based on their correlation with the class and
removes the redundant features. It works by scoring feature subsets, favoring
features that are highly correlated with the class but weakly correlated with
each other. Weka provides various feature selection techniques, including
Filter, Wrapper, Embedded, Correlation-based, and more.
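
The trade-off between relevance and redundancy is usually expressed by the CFS merit heuristic, merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff), where k is the number of features in the subset, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. The sketch below evaluates that formula in plain Python; the function name and the correlation values are assumptions for illustration, and computing the correlations themselves is left out.

```python
import math

def cfs_merit(class_corrs, mean_feature_corr):
    """Merit of a feature subset: rewards correlation with the class,
    penalizes correlation among the features themselves."""
    k = len(class_corrs)
    r_cf = sum(class_corrs) / k
    return k * r_cf / math.sqrt(k + k * (k - 1) * mean_feature_corr)

print(cfs_merit([0.8], 0.0))          # one relevant feature
print(cfs_merit([0.8, 0.8], 1.0))     # adding a fully redundant copy
print(cfs_merit([0.8, 0.8], 0.0))     # adding an independent, equally relevant feature
```

A redundant feature leaves the merit unchanged, while an independent feature with the same class correlation raises it, which is why the redundant one is dropped.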

54. Which of the following is a type of clustering algorithm in
Weka that uses a grid-based approach to group the instances into
clusters?

a. k-Means
b. EM
c. DBSCAN
d. CLIQUE

Answer: d. CLIQUE.

Explanation: CLIQUE is a type of clustering algorithm in Weka that uses
a grid-based approach to group the instances into clusters. It works by
partitioning the data into overlapping grids and identifying the dense
regions of the data within each grid. Weka provides various clustering
algorithms, including k-Means, EM, DBSCAN, CLIQUE, and more.

55. Which of the following is a type of classification algorithm in
Weka that builds a decision tree from the data?

a. Naive Bayes
b. k-NN
c. J48
d. Random Forest

Answer: c. J48.
