Weka DW&DM Lab Notes
Data Warehousing Tools
ParAccel (Actian)
Cloudera
Talend
QuerySurge
Amazon Redshift
Teradata
Oracle
Tableau
Open Source Data Mining Tools
WEKA
Orange
KNIME
R Programming
RapidMiner
Apache Mahout
Tanagra
XLMiner
Experiment 1: Installation of WEKA Tool
Aim: To investigate the application interfaces of the Weka tool.
Introduction
Weka (pronounced to rhyme with Mecca) is a workbench that contains a collection of
visualization tools and algorithms for data analysis and predictive modeling, together with
graphical user interfaces for easy access to these functions. The original non-Java version of
Weka was a Tcl/Tk front-end to (mostly third-party) modeling algorithms implemented in other
programming languages, plus data preprocessing utilities in C, and a Makefile-based system for
running machine learning experiments. This original version was primarily designed as a tool for
analyzing data from agricultural domains, but the more recent fully Java-based version (Weka 3),
for which development started in 1997, is now used in many different application areas, in
particular for educational purposes and research. Advantages of Weka include:
Free availability under the GNU General Public License
Portability, since it is fully implemented in Java and runs on almost any modern platform
A comprehensive collection of data preprocessing and modeling techniques
Ease of use due to its graphical user interfaces
Description:
Open the program. Once the program has been loaded on the user's machine, it is opened by
navigating to the programs option of the Start menu; the exact path will depend on the user's
operating system. Figure 1.1 is an example of the initial opening screen on a computer.
There are four options available on this initial screen: Explorer, Experimenter, KnowledgeFlow,
and Simple CLI. The first two are described here.
1. Explorer - the graphical interface used to conduct experimentation on raw data. After clicking
the Explorer button, the Weka Explorer interface appears. Two of its tabs are described below:
The Cluster tab is used to apply different tools that identify clusters within the data file. It
opens the process that is used to identify commonalities or clusters of occurrences within the
data set and produce information for the user to analyze.
The Visualize tab is used to see what the various manipulations produced on the data set in a
2D format, in scatter plot and bar graph output.
2. Experimenter - this option allows users to conduct different experimental variations on data
sets and perform statistical manipulation. The Weka Experiment Environment enables the user to
create, run, modify, and analyze experiments in a more convenient manner than is possible when
processing the schemes individually. For example, the user can create an experiment that runs
several schemes against a series of datasets and then analyze the results to determine if one of the
schemes is (statistically) better than the others; a programmatic sketch of this idea follows below.
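The same kind of scheme comparison can also be scripted against Weka's Java API. The following is only a minimal sketch of the idea, not the Experimenter tool itself; the dataset path assumes Weka's bundled "data" directory:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CompareSchemes {
    public static void main(String[] args) throws Exception {
        // Load a dataset; the path assumes Weka's bundled data directory.
        Instances data = DataSource.read("data/iris.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Evaluate each scheme with the same 10-fold cross-validation seed.
        Evaluation j48Eval = new Evaluation(data);
        j48Eval.crossValidateModel(new J48(), data, 10, new Random(1));

        Evaluation nbEval = new Evaluation(data);
        nbEval.crossValidateModel(new NaiveBayes(), data, 10, new Random(1));

        System.out.printf("J48 accuracy:         %.2f%%%n", j48Eval.pctCorrect());
        System.out.printf("Naive Bayes accuracy: %.2f%%%n", nbEval.pctCorrect());
    }
}
```

Unlike the Experimenter, this sketch does not perform a statistical significance test; it simply reports the cross-validated accuracy of each scheme on the same folds.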
Click the “Open file…” button to open a data set, then double-click on the “data” directory.
Weka provides a number of small common machine learning datasets that you can use to practice on.
Select the “iris.arff” file to load the Iris dataset.
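The same dataset can also be loaded programmatically through the Weka API. A minimal sketch (the path again assumes Weka's bundled "data" directory):

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LoadIris {
    public static void main(String[] args) throws Exception {
        // DataSource picks the right loader from the file extension (.arff here).
        Instances data = DataSource.read("data/iris.arff");
        // The class attribute is conventionally the last one in iris.arff.
        data.setClassIndex(data.numAttributes() - 1);
        System.out.println("Relation:   " + data.relationName());
        System.out.println("Instances:  " + data.numInstances());
        System.out.println("Attributes: " + data.numAttributes());
    }
}
```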
Exercise:
1. Normalize the data using min-max normalization
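Min-max normalization rescales each numeric attribute x to x' = (x - min) / (max - min), mapping it into [0, 1]. In Weka this corresponds to the unsupervised Normalize filter; a minimal sketch, with the dataset path as an assumption:

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Normalize;

public class MinMaxNormalize {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/iris.arff"); // path is an assumption
        // Normalize rescales every numeric attribute to [0, 1] by default.
        Normalize filter = new Normalize();
        filter.setInputFormat(data);
        Instances normalized = Filter.useFilter(data, filter);
        System.out.println(normalized.toSummaryString());
    }
}
```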
Multiple Choice Questions:
Which of the following file formats is most commonly used for importing data into Weka?
a. PDF
b. CSV
c. MP4
d. PNG
Answer: b. CSV.
Explanation: Weka can import data in various file formats, but CSV (Comma Separated Values)
is the most commonly used file format for importing data.
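Programmatically, CSV files can be read with Weka's CSVLoader. A minimal sketch (the file name is hypothetical):

```java
import java.io.File;
import weka.core.Instances;
import weka.core.converters.CSVLoader;

public class LoadCsv {
    public static void main(String[] args) throws Exception {
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("mydata.csv")); // hypothetical file name
        Instances data = loader.getDataSet();     // header row becomes attribute names
        System.out.println(data.numInstances() + " instances loaded");
    }
}
```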
Which of the following is not a data preprocessing technique in Weka?
a. Attribute selection
b. Data cleaning
c. Data normalization
d. Data visualization
Answer: d. Data visualization.
Explanation: Data visualization is not a preprocessing technique; Weka's preprocessing
facilities include attribute selection, data cleaning, and data normalization. Visualization is
provided separately, through the Visualize tab.
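As an illustration of one of these preprocessing steps, attribute selection can be run through the API with a correlation-based (CFS) evaluator and a greedy search. A minimal sketch, with the dataset path as an assumption:

```java
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.GreedyStepwise;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SelectAttributes {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/iris.arff"); // path is an assumption
        data.setClassIndex(data.numAttributes() - 1);

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new CfsSubsetEval()); // correlation-based subset evaluator
        selector.setSearch(new GreedyStepwise());   // greedy forward search
        selector.SelectAttributes(data);            // note the legacy capitalised method name

        // Print the names of the selected attributes (class index is included last).
        for (int i : selector.selectedAttributes()) {
            System.out.println(data.attribute(i).name());
        }
    }
}
```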
a. Naive Bayes
b. K-means
c. Random Forest
d. PCA
Which of the following is not a clustering algorithm?
a. K-means
b. DBSCAN
c. EM
d. Linear Regression
Answer: d. Linear Regression.
a. It is computationally expensive
b. It requires large amounts of training data
c. It is sensitive to irrelevant features
d. It cannot handle categorical data
a. Linear Regression
b. Naive Bayes
c. Random Forest
d. K-means
Which of the following is not a clustering algorithm in Weka?
a. K-means
b. DBSCAN
c. Naive Bayes
d. EM
Answer: c. Naive Bayes.
Which of the following is not a data preprocessing technique in Weka?
a. Data normalization
b. Data imputation
c. Data visualization
d. Data discretization
Answer: c. Data visualization.
Explanation: Data visualization is not a preprocessing technique; Weka's preprocessing
facilities include data normalization, data imputation, and data discretization.
Which of the following is not an attribute type supported by Weka?
a. Numeric
b. Nominal
c. Binary
d. Sequential
Answer: d. Sequential.
a. Classification
b. Clustering
c. Association Rule Mining
d. Data Visualization
Which of the following Weka classifiers implements the C4.5 decision tree algorithm?
a. Random Forest
b. J48
c. K-means
d. DBSCAN
Answer: b. J48.
a. Overfitting
b. Underfitting
c. Missing values
d. Class imbalance
Which of the following is not a regression technique?
a. Linear Regression
b. Polynomial Regression
c. Logistic Regression
d. K-means
Answer: d. K-means.
Which of the following is not a cross-validation technique?
a. K-fold cross-validation
b. Leave-one-out cross-validation
c. Stratified cross-validation
d. Naive Bayes cross-validation
Answer: d. Naive Bayes cross-validation.
Which of the following is not an ensemble learning technique?
a. Bagging
b. Boosting
c. Random Forest
d. K-means
Answer: d. K-means.
a. Logistic Regression
b. Decision Tree
c. Naive Bayes
d. k-Nearest Neighbors (k-NN)
Which of the following measures is used to evaluate the quality of a clustering?
a. Accuracy
b. F-measure
c. Silhouette coefficient
d. Precision
Answer: c. Silhouette coefficient.
Which technique in Weka is used to handle class imbalance by oversampling the minority class?
a. Bagging
b. SMOTE
c. Random Forest
d. Boosting
Answer: b. SMOTE.
a. Decision Tree
b. k-Nearest Neighbors (k-NN)
c. Linear Regression
d. Support Vector Machine (SVM)
Which of the following is a dimensionality reduction technique?
a. Min-max normalization
b. Recursive Feature Elimination (RFE)
c. Correlation-based Feature Selection (CFS)
d. Principal Component Analysis (PCA)
Answer: d. Principal Component Analysis (PCA).
a. AdaBoost
b. Bagging
c. Boosting
d. Random Forest
Which of the following distance measures can be used with the k-NN algorithm?
a. Euclidean distance
b. Manhattan distance
c. Mahalanobis distance
d. All of the above
Answer: d. All of the above.
Which of the following methods can be used to impute missing values?
a. Mean imputation
b. Median imputation
c. Mode imputation
d. All of the above
Answer: d. All of the above.
Which of the following kernels can be used with SVM in Weka?
a. Linear kernel
b. Polynomial kernel
c. Gaussian kernel
d. All of the above
Answer: d. All of the above.
Which of the following is not a clustering algorithm?
a. k-Means
b. Hierarchical Clustering
c. DBSCAN
d. Linear Regression
Answer: d. Linear Regression.
Which of the following is a rule-based classifier in Weka?
a. k-NN
b. Apriori
c. Random Forest
d. JRip
Answer: d. JRip.
Which of the following is an ensemble learning technique?
a. Decision Tree
b. Naive Bayes
c. Bagging
d. k-NN
Answer: c. Bagging.
Which of the following dimensionality reduction techniques is supervised?
a. PCA
b. LDA
c. ICA
d. SVM
Answer: b. LDA.
Which ensemble technique combines the predictions of several base models by training a
meta-level model on their outputs?
a. Bagging
b. Boosting
c. Stacking
d. Random Forest
Answer: c. Stacking.
Which of the following is a density-based clustering algorithm?
a. k-Means
b. EM
c. DBSCAN
d. SOM
Answer: c. DBSCAN.
a. Mean Imputation
b. Mode Imputation
c. Median Imputation
d. k-NN Imputation
Which ensemble learning technique in Weka combines multiple models using a weighted sum
of their predictions?
a. Bagging
b. Boosting
c. Stacking
d. Random Forest
Answer: b. Boosting.
Explanation: Boosting is a type of ensemble learning technique in Weka
that combines multiple models using a weighted sum of their predictions.
It works by iteratively reweighting the instances based on their
classification errors and building a new model on the reweighted data.
Weka provides various ensemble learning techniques, including Bagging,
Boosting, Random Forest, Stacking, and more.
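In Weka this behaviour is available, for example, through the AdaBoostM1 meta-classifier. A minimal sketch with a decision stump as the weak base learner (the dataset path is an assumption):

```java
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BoostingExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/iris.arff"); // path is an assumption
        data.setClassIndex(data.numAttributes() - 1);

        AdaBoostM1 booster = new AdaBoostM1();
        booster.setClassifier(new DecisionStump()); // weak learner, retrained on reweighted data
        booster.setNumIterations(10);               // number of boosting rounds
        booster.buildClassifier(data);

        System.out.println(booster);                // prints the boosted ensemble
    }
}
```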
a. Naive Bayes
b. k-NN
c. Decision Tree
d. SVM
Which clustering algorithm in Weka models the data as a mixture of probability distributions,
fitted by expectation-maximization?
a. k-Means
b. EM
c. DBSCAN
d. SOM
Answer: b. EM.
Which feature selection approach evaluates attribute subsets by training and testing a learning
algorithm on them?
a. Filter
b. Wrapper
c. Embedded
d. Correlation-based
Answer: b. Wrapper.
Which rule learner in Weka implements the RIPPER algorithm?
a. OneR
b. ZeroR
c. JRip
d. Random Tree
Answer: c. JRip.
Which clustering algorithm builds a tree of nested clusters (a dendrogram)?
a. k-Means
b. EM
c. DBSCAN
d. Hierarchical
Answer: d. Hierarchical.
Which feature selection approach evaluates attribute subsets based on their correlation with the
class and the redundancy among the attributes?
a. Filter
b. Wrapper
c. Embedded
d. Correlation-based
Answer: d. Correlation-based.
Which of the following is a subspace clustering algorithm for high-dimensional data?
a. k-Means
b. EM
c. DBSCAN
d. CLIQUE
Answer: d. CLIQUE.
Which of the following classifiers in Weka is a decision tree learner based on C4.5?
a. Naive Bayes
b. k-NN
c. J48
d. Random Forest
Answer: c. J48.