Viva Data Mining Lab
Data mining refers to extracting or mining knowledge from large amounts of data. In other words, data mining is the science, art, and technology of exploring large and complex bodies of data in order to discover useful patterns.
• Classification
• Clustering
• Regression
• Deviation Detection
Business understanding: understanding the project's objectives from a business perspective and converting this knowledge into a data mining problem definition.
Data preparation: constructing the final data set from the raw data.
Many treat data mining as a synonym for another popularly used term, Knowledge Discovery from Data, or KDD. Others view data mining as simply an essential step in the process of knowledge discovery, in which intelligent methods are applied in order to extract data patterns.
• Data transformation (where data are transformed or consolidated into forms appropriate for mining, for example by performing summary or aggregation operations).
• Data mining (an important process where intelligent methods are applied in order to extract
data patterns).
• Pattern evaluation (to identify the truly interesting patterns representing knowledge based on some interestingness measures).
• Knowledge presentation (where knowledge representation and visualization techniques are used
to present the mined knowledge to the user).
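The data transformation step above (consolidating raw records through summary or aggregation) can be sketched in plain Python; the sales records below are invented purely for illustration:

```python
from collections import defaultdict

# Hypothetical raw transaction records: (region, sale amount)
raw = [("north", 120.0), ("south", 80.0), ("north", 60.0), ("south", 40.0)]

# Consolidate into a summary form suitable for mining: total sales per region
totals = defaultdict(float)
for region, amount in raw:
    totals[region] += amount

print(dict(totals))  # {'north': 180.0, 'south': 120.0}
```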
5. What is Classification?
Classification is the process of finding a set of models (or functions) that describe and distinguish data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown. Classification can be used for predicting the class label of data items. However, in many applications, one may wish to estimate some missing or unavailable data values rather than class labels.
7. What is Prediction?
Prediction can be viewed as the construction and use of a model to assess the class of an unlabeled object, or to estimate the value or value ranges of an attribute that a given object is likely to have. In this interpretation, classification and regression are the two major types of prediction problems, where classification is used to predict discrete or nominal values, while regression is used to predict continuous or ordered values.
A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node of the tree is the root node.
A Decision tree is a classification scheme that generates a tree and a set of rules, representing the model
of different classes, from a given data set. The set of records available for developing classification
methods is generally divided into two disjoint subsets namely a training set and a test set. The former is
used for deriving the classifier while the latter is used to measure the accuracy of the classifier. The
accuracy of the classifier is determined by the percentage of the test examples that are correctly
classified.
In the decision tree classifier, we categorize the attributes of the records into two different types.
Attributes whose domain is numerical are called the numerical attributes and the attributes whose
domain is not numerical are called categorical attributes. There is one distinguished attribute called a
class label. The goal of classification is to build a concise model that can be used to predict the class of
the records whose class label is unknown. Decision trees can easily be converted to classification rules.
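The train/test procedure described above can be sketched with the simplest possible tree, a one-level "decision stump" on a single numerical attribute; the records below are invented for illustration:

```python
def fit_stump(train):
    """Learn a threshold on one numeric attribute that best separates two classes."""
    best = None
    for t in sorted({x for x, _ in train}):
        # Predict class 1 when x >= t, class 0 otherwise
        correct = sum((1 if x >= t else 0) == y for x, y in train)
        if best is None or correct > best[1]:
            best = (t, correct)
    return best[0]

def accuracy(threshold, records):
    """Fraction of records whose predicted class matches the true label."""
    return sum((1 if x >= threshold else 0) == y for x, y in records) / len(records)

# Disjoint training and test sets of (attribute value, class label) records
train = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]
test = [(1.5, 0), (3.5, 1)]

threshold = fit_stump(train)      # derive the classifier from the training set
print(threshold)                  # 3.0
print(accuracy(threshold, test))  # 1.0
```

Once the threshold is learned, classifying a test record is a single comparison, which illustrates why prediction with a built tree is so fast.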
9. What are the advantages of a decision tree classifier?
• Once a decision tree model has been built, classifying a test record is extremely fast.
• The decision tree representation is rich enough to represent any discrete-value classifier.
• Decision trees can handle datasets that may have errors.
• Decision trees can handle datasets that may have missing values.
• They do not require any prior assumptions. Decision trees are self-explanatory and, when compact, they are also easy to follow. That is to say, if the decision tree has a reasonable number of leaves it can be grasped by non-professional users. Furthermore, since decision trees can be converted to a set of rules, this sort of representation is considered comprehensible.
A neural network is a set of connected input/output units where each connection has a weight associated with it. During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label of the input samples. Neural network learning is also referred to as connectionist learning due to the connections between units. Neural networks involve long training times and are therefore more appropriate for applications where this is feasible. They require a number of parameters that are typically best determined empirically, such as the network topology or “structure”. Neural networks have been criticized for their poor interpretability, since it is difficult for humans to interpret the symbolic meaning behind the learned weights. These features initially made neural networks less desirable for data mining.
The advantages of neural networks, however, include their high tolerance to noisy data as well as their ability to classify patterns on which they have not been trained. In addition, several algorithms have recently been developed for the extraction of rules from trained neural networks. These factors contribute to the usefulness of neural networks for classification in data mining. The most popular neural network algorithm is the backpropagation algorithm, proposed in the 1980s.
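A full multi-layer network is beyond a few lines, but the weight-adjustment idea described above can be sketched with a single sigmoid unit trained by gradient descent; the OR-gate data set here is a toy example, not from the source:

```python
import math

# A single sigmoid unit trained by gradient descent, the simplest case of the
# weight-update rule used in backpropagation.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR gate
w = [0.0, 0.0]  # connection weights
b = 0.0         # bias
lr = 1.0        # learning rate

for _ in range(2000):
    for (x1, x2), y in data:
        out = sigmoid(w[0] * x1 + w[1] * x2 + b)
        err = out - y  # gradient of the logistic loss w.r.t. the net input
        # Adjust each weight against the error gradient
        w[0] -= lr * err * x1
        w[1] -= lr * err * x2
        b -= lr * err

preds = [round(sigmoid(w[0] * x1 + w[1] * x2 + b)) for (x1, x2), _ in data]
print(preds)  # [0, 1, 1, 1]
```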
Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to each other than to the data points in other groups. It is basically a grouping of objects on the basis of the similarity and dissimilarity between them.
Type: classification is used for supervised learning, while clustering is used for unsupervised learning.
Supervised learning, as the name indicates, has the presence of a supervisor acting as a teacher. Basically, supervised learning is when we teach or train the machine using data that is well labeled, which means some data is already tagged with the correct answer. After that, the machine is provided with a new set of examples (data) so that the supervised learning algorithm analyses the training data (set of training examples) and produces a correct outcome from labeled data.
Unsupervised learning is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns, and differences without any prior training on the data.
Unlike supervised learning, no teacher is provided, which means no training will be given to the machine. Therefore, the machine must find the hidden structure in unlabeled data by itself.
• Healthcare
• Intelligence
• Telecommunication
• Energy
• Retail
• E-commerce
• Supermarkets
• Crime Agencies
Businesses in all these sectors benefit from data mining.
It is a technology that aggregates structured data from one or more sources so that it can be compared and analyzed for decision support, rather than for transaction processing.
Data Warehouse: A data warehouse is designed to support the management decision-making process by
providing a platform for data cleaning, data integration, and data consolidation. A data warehouse
contains subject-oriented, integrated, time-variant, and non-volatile data.
A data warehouse consolidates data from many sources while ensuring data quality, consistency, and accuracy. A data warehouse improves system performance by separating analytics processing from transactional databases. Data flows into a data warehouse from the various databases. A data warehouse works by organizing data into a schema that describes the layout and type of data. Query tools then analyze the data tables using this schema.
The term purging can be defined as erasing or removing. In the context of data mining, data purging is the process of permanently removing unnecessary data from the database and cleaning the data to maintain its integrity.
A data cube stores data in a summarized version which helps in faster analysis of data. The data is stored in such a way that it allows easy reporting. For example, using a data cube a user may want to analyze the weekly and monthly performance of an employee. Here, month and week could be considered as the dimensions of the cube.
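The month/week example above can be sketched as pre-aggregated lookup tables in Python; this is a toy stand-in for a real cube, and the sales figures are invented:

```python
from collections import defaultdict

# Hypothetical fact records: (employee, month, week, sales)
facts = [
    ("alice", "Jan", 1, 10), ("alice", "Jan", 2, 15),
    ("alice", "Feb", 1, 20), ("bob", "Jan", 1, 5),
]

# Pre-aggregate along the cube's dimensions so queries become fast lookups
by_month = defaultdict(int)
by_week = defaultdict(int)
for emp, month, week, sales in facts:
    by_month[(emp, month)] += sales        # roll up to the month level
    by_week[(emp, month, week)] += sales   # finer-grained week level

print(by_month[("alice", "Jan")])    # 25
print(by_week[("alice", "Jan", 2)])  # 15
```

Storing the summaries up front is what makes analysis against a cube faster than recomputing totals from the raw records each time.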
• OLAP data is used in planning, problem-solving, and decision-making; OLTP data is used to perform day-to-day fundamental operations.
• OLAP is relatively slow as the amount of data involved is large, and queries may take hours; OLTP is very fast as its queries operate on only about 5% of the data.
• OLAP only needs backup from time to time as compared to OLTP, where the backup and recovery process is maintained religiously.
• OLAP data is generally managed by the CEO, MD, or GM; OLTP data is managed by clerks and managers.
• OLAP involves mostly read and only rarely write operations; OLTP involves both read and write operations.
26. Explain Association Algorithm In Data Mining?
Association analysis is the discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data. Association analysis is widely used for market basket or transaction data analysis. Association rule mining is a significant and exceptionally active area of data mining research. One method of association-based classification, called associative classification, consists of two steps. In the first step, association rules are generated using a modified version of the standard association rule mining algorithm known as Apriori. The second step constructs a classifier based on the association rules discovered.
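The first step (finding frequent itemsets, Apriori-style) can be sketched in plain Python. This is a simplified level-wise search on an invented toy basket data set, not the full Apriori algorithm with candidate pruning:

```python
# Toy market-basket transactions (invented for illustration)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
min_support = 2  # minimum absolute support (number of transactions)

def support(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions)

# Level-wise search: frequent 1-itemsets, then join them into 2-itemsets, ...
items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items if support({i}) >= min_support]
all_frequent = list(frequent)
k = 2
while frequent:
    # Candidate k-itemsets formed by joining frequent (k-1)-itemsets
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
    frequent = [c for c in candidates if support(c) >= min_support]
    all_frequent += frequent
    k += 1

for s in sorted(sorted(i) for i in all_frequent):
    print(s)
```

Association rules such as "bread => milk" are then read off the frequent itemsets by checking each rule's confidence.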
27. Explain how to work with data mining algorithms included in SQL server data mining?
SQL Server data mining offers Data Mining Add-ins for Office 2007 that permit finding the patterns and relationships in the information, which helps in improved analysis. The Add-in called Data Mining Client for Excel is used to prepare the data, then create, manage, and analyze models and their results.
• Over-fitted decision trees require more space and more computational resources.
• Data cleaning
• Relevance analysis
• Data transformation
• Predictive accuracy
• Speed
• Robustness
• Scalability
• Interpretability
This is one of the high-level Data Mining interview questions asked in an interview. Machine learning is widely utilized in data mining since it covers automatic programmed processing systems based on logical or binary operations. Machine learning for the most part follows rules that permit us to manage more general information types, including cases where the type and number of attributes may differ. Machine learning is one of the famous procedures utilized for data mining, and in artificial intelligence as well.
K-means clustering algorithm – It is the simplest unsupervised learning algorithm that solves clustering problems. The K-means algorithm partitions n observations into k clusters, where each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster.
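The assign/update loop behind K-means can be sketched in a few lines of Python; the one-dimensional points below are made up for illustration:

```python
# A bare-bones K-means sketch on 1-D points (toy data, invented for illustration)
def kmeans(points, k, iters=20):
    centroids = points[:k]  # naive initialization: the first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        # Assignment step: each point goes to its nearest centroid
        for p in points:
            i = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[i].append(p)
        # Update step: each centroid moves to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

points = [1.0, 1.25, 0.75, 9.0, 9.25, 8.75]
print(sorted(kmeans(points, 2)))  # [1.0, 9.0]
```

Real implementations add smarter initialization and a convergence check instead of a fixed iteration count.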
40. Why is KNN preferred when determining missing numbers in data?
K-Nearest Neighbour (KNN) is preferred here because of the fact that KNN can easily approximate the
value to be determined based on the values closest to it.
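A minimal sketch of that idea, assuming purely numeric rows and an invented toy data set: the missing entry is approximated by averaging the same column over the k rows nearest on the known columns:

```python
# Estimate a missing value from the k nearest rows (Euclidean distance on the
# known attributes); the data below are invented for illustration
def knn_impute(rows, target_row, missing_col, k=2):
    known_cols = [c for c in range(len(target_row)) if c != missing_col]
    def dist(r):
        # Distance to the target row, using only the known columns
        return sum((r[c] - target_row[c]) ** 2 for c in known_cols) ** 0.5
    neighbors = sorted(rows, key=dist)[:k]
    # Approximate the missing value by the neighbors' average in that column
    return sum(r[missing_col] for r in neighbors) / k

rows = [(1.0, 10.0), (1.2, 12.0), (8.0, 80.0)]
print(knn_impute(rows, (1.1, None), missing_col=1, k=2))  # 11.0
```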
The k-nearest neighbor (K-NN) classifier is considered an example-based classifier, which means that the training documents are used for comparison rather than an exact class representation, like the class profiles used by other classifiers. As such, there is no real training phase. When a new document has to be classified, the k most similar documents (neighbors) are found, and if a large enough proportion of them is allotted to a particular class, the new document is also assigned to that class; otherwise it is not. Additionally, finding the nearest neighbors can be sped up using traditional indexing strategies.