DM - Midsem - Question Bank

03610335-Data Mining

Unit-1 Fundamentals of Data Mining

Q: What is data mining?

A: Data mining is the process of discovering patterns, trends, and insights from large datasets
to extract useful information for decision-making and predictive analysis.

Q: What is the history of data mining?

A: Data mining has its roots in the 1960s and 1970s when statisticians began using computers
to analyze data. The term "data mining" gained popularity in the 1990s as computational
power increased and businesses began to recognize the value of extracting insights from their
data.

Q: What are some strategies and techniques used in data mining?

A: Strategies in data mining include association rule mining, classification, clustering,
regression analysis, and anomaly detection. Techniques such as decision trees, neural
networks, genetic algorithms, and support vector machines are commonly employed for these
purposes.
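
To make one of these techniques concrete, the following is a minimal classification sketch using a decision tree; it assumes scikit-learn is available and uses its bundled Iris dataset, neither of which is part of the question bank itself.

    # Minimal decision-tree classification sketch (assumes scikit-learn is installed).
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Load a small labeled dataset and split it into training and test portions.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )

    # Fit a shallow decision tree and report its accuracy on the held-out data.
    clf = DecisionTreeClassifier(max_depth=3, random_state=42)
    clf.fit(X_train, y_train)
    print("Test accuracy:", clf.score(X_test, y_test))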

Q: What are some applications of data mining?

A: Data mining is applied in various fields including business and marketing (customer
segmentation, market basket analysis), healthcare (disease prediction, patient outcome
analysis), finance (fraud detection, risk management), and science (genome analysis,
environmental monitoring).

Q: What are the challenges of data mining?

A: Challenges in data mining include dealing with large volumes of data (big data), ensuring
data quality and consistency, addressing privacy concerns, handling noisy and incomplete
data, and selecting appropriate algorithms for specific tasks.

Q: What is the future of data mining?

A: The future of data mining is likely to involve advancements in machine learning
algorithms, deep learning, and artificial intelligence. Integration with other technologies such
as IoT and blockchain may further enhance its capabilities for real-time analysis and
decision-making.

Q: What are the issues in the Knowledge Discovery in Databases (KDD) process?

A: Issues in the KDD process include data preprocessing (cleaning, integration,
transformation), selecting suitable data mining techniques, interpreting and evaluating results,
and deploying models into operational systems.

Q: What are the types of data used in data mining?

A: Types of data used in data mining include structured data (relational databases),
semi-structured data (XML, JSON), unstructured data (text documents, images), spatial data
(geographical information), temporal data (time series), and multimedia data (audio, video).

Q: What is database data in the context of data mining?

A: Database data refers to structured data stored in relational databases, typically organized in
tables with predefined schemas. This type of data is commonly used in data mining for
analysis and modeling purposes.

Q: What are data warehouses and how are they relevant to data mining?

A: Data warehouses are centralized repositories that store integrated and structured data from
various sources for reporting and analysis. They are relevant to data mining as they provide a
unified view of data, which facilitates the discovery of patterns and trends across different
data sources.

Q: What is transactional data in the context of data mining?

A: Transactional data refers to records of individual transactions or events, such as purchases,
interactions, or behaviors. Analyzing transactional data can reveal patterns, associations, and
trends that are useful for business intelligence and decision-making.

Q: What are some other kinds of data that can be used in data mining?

A: Other kinds of data used in data mining include textual data (documents, emails), sensor
data (from IoT devices), social media data (tweets, posts), biological data (DNA sequences),
and streaming data (real-time data feeds). These diverse types of data provide valuable
insights when analyzed using appropriate techniques.

Unit-2 Objects, Attributes, & Statistical Description of Data

Q: What is a data attribute?

A: A data attribute, also known as a feature or variable, is a characteristic or property of an
object or phenomenon that can be measured or observed. In data mining and statistics,
attributes are used to describe and analyze data.

Q: What are nominal attributes?

A: Nominal attributes are categorical variables that represent qualitative data without any
inherent order or ranking. Examples include colors, types of animals, or categories of
products.

Q: What are binary attributes?

A: Binary attributes are nominal attributes with only two possible values, typically
represented as 0 and 1 or as "yes" and "no". Examples include gender (male/female),
presence/absence of a characteristic, or true/false responses.

Q: What are ordinal attributes?

A: Ordinal attributes are categorical variables that have a natural order or ranking, but the
intervals between values are not necessarily equal or meaningful. Examples include ratings
(e.g., 1 to 5 stars), education levels (e.g., high school, college, graduate), or socioeconomic
status (e.g., low, medium, high).

Q: What are numeric attributes?

A: Numeric attributes are variables that represent quantitative data and can take on numerical
values. They can be further classified into discrete and continuous attributes.

Q: What is the difference between discrete and continuous attributes?

A: Discrete attributes take on a finite or countable number of distinct values, while
continuous attributes can take on any value within a certain range. For example, the number
of children in a family is discrete, whereas temperature is continuous.

Q: What are mean, median, and mode?

A: Mean is the average value of a set of numbers, calculated by summing all values and
dividing by the total count. Median is the middle value when the data is arranged in
ascending or descending order (or the average of the two middle values when the count is
even). Mode is the value that appears most frequently in a dataset.
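
As a quick illustration, the following is a minimal sketch that computes all three measures with Python's standard statistics module; the sample values are invented for the example.

    # Computing mean, median, and mode with the standard library (illustrative data).
    import statistics

    data = [2, 3, 3, 5, 7, 10]

    print("Mean:", statistics.mean(data))      # (2+3+3+5+7+10) / 6 = 5
    print("Median:", statistics.median(data))  # average of the two middle values: (3+5)/2 = 4
    print("Mode:", statistics.mode(data))      # 3 appears most often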

Q: How do you measure the dispersion of data?

A: The dispersion of data refers to how spread out or clustered the values are around the
central tendency (mean, median, or mode). Common measures of dispersion include range,
quartiles, variance, and standard deviation.

Q: What is range as a measure of dispersion?

A: Range is the difference between the maximum and minimum values in a dataset. It
provides a simple measure of the spread of data but is sensitive to outliers.

Q: What are quartiles?

A: Quartiles divide a dataset into four equal parts, each containing 25% of the data. The first
quartile (Q1) is the value below which 25% of the data falls, the second quartile (Q2) is the
median, and the third quartile (Q3) is the value below which 75% of the data falls.
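
A short sketch covering both range and quartiles, assuming NumPy is available; the data is invented for illustration, and NumPy's default linear interpolation between data points is used for the percentiles.

    # Range and quartiles with NumPy (assumes numpy is installed; illustrative data).
    import numpy as np

    data = np.array([4, 7, 1, 9, 12, 5, 8, 3])

    # Range: difference between the maximum and minimum values.
    print("Range:", data.max() - data.min())  # 12 - 1 = 11

    # Quartiles: NumPy interpolates linearly between data points by default.
    q1, q2, q3 = np.percentile(data, [25, 50, 75])
    print("Q1:", q1, "Q2 (median):", q2, "Q3:", q3)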

Q: What is variance?

A: Variance measures the average squared deviation of each data point from the mean. It
provides a measure of how much the values in a dataset vary from the mean.

Q: What is standard deviation?

A: Standard deviation is the square root of the variance and provides a measure of the
dispersion of data around the mean. It is commonly used because it is expressed in the same
units as the original data, though, like the variance, it is sensitive to outliers.
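
To connect the two measures, here is a small sketch using Python's statistics module; the data is illustrative, and the comments note the population/sample distinction (dividing by n versus n-1).

    # Variance and standard deviation with the standard library (illustrative data).
    import statistics

    data = [2, 4, 4, 4, 5, 5, 7, 9]

    # Population variance: mean of squared deviations from the mean (mean here is 5).
    print("Population variance:", statistics.pvariance(data))  # 32 / 8 = 4
    print("Population std dev:", statistics.pstdev(data))      # sqrt(4) = 2

    # Sample versions divide by n-1 instead of n (Bessel's correction).
    print("Sample variance:", statistics.variance(data))       # 32 / 7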

Unit-3 Data Preprocessing


Q: What is data preprocessing?

A: Data preprocessing is the initial step in the data mining process that involves cleaning,
transforming, and organizing raw data into a format suitable for analysis. It aims to improve the
quality of data and prepare it for modeling.

Q: What are the major tasks in data preprocessing?


A: The major tasks in data preprocessing include data cleaning, data integration, data
transformation, and data reduction. These tasks address issues such as missing values, noise,
inconsistencies, and redundancy in the data.

Q: What is data cleaning and why is it important?


A: Data cleaning is the process of identifying and correcting errors, inconsistencies, and missing
values in a dataset. It is important because clean data ensures the accuracy and reliability of analysis
results.

Q: What are some common issues addressed in data cleaning?


A: Common issues in data cleaning include handling missing values, dealing with noisy data (outliers
and errors), and resolving inconsistencies such as duplicate records or conflicting information.

Q: How is missing data handled in data cleaning?


A: Missing data can be handled by imputation techniques such as mean imputation (replacing
missing values with the mean of the variable), mode imputation (replacing missing values with the
most frequent value), or using predictive models to estimate missing values based on other
variables.
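
A minimal sketch of mean and mode imputation, assuming pandas is available; the column names and values are hypothetical.

    # Mean and mode imputation with pandas (illustrative data).
    import pandas as pd

    df = pd.DataFrame({
        "age": [25, None, 30, 35, None],
        "city": ["Surat", "Surat", None, "Rajkot", "Surat"],
    })

    # Mean imputation for a numeric column.
    df["age"] = df["age"].fillna(df["age"].mean())

    # Mode imputation for a categorical column (mode() returns a Series; take the first value).
    df["city"] = df["city"].fillna(df["city"].mode()[0])

    print(df)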

Q: What is data integration and why is it important?


A: Data integration is the process of combining data from multiple sources into a unified view. It is
important for eliminating redundancy, resolving inconsistencies, and providing a comprehensive
dataset for analysis.

Q: What is the entity identification problem in data integration?


A: The entity identification problem arises when different datasets use different identifiers or
formats to represent the same entities. Resolving this problem involves identifying corresponding
entities across datasets and merging them into a single entity.
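
A hedged sketch of the entity identification problem: two hypothetical sources name the same customers differently, so the identifiers are normalized to a common form before merging (pandas assumed; all column names and records are made up).

    # Resolving differing identifiers before merging two sources.
    import pandas as pd

    sales = pd.DataFrame({"cust_id": ["C-001", "C-002"], "amount": [120, 80]})
    crm = pd.DataFrame({"customer": ["c001", "c002"], "segment": ["gold", "silver"]})

    # Normalize both identifiers to lowercase with no separators.
    sales["key"] = sales["cust_id"].str.lower().str.replace("-", "", regex=False)
    crm["key"] = crm["customer"].str.lower()

    # Merge the sources on the normalized key.
    merged = pd.merge(sales, crm, on="key")
    print(merged[["key", "amount", "segment"]])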

Q: How are redundancy and correlation analysis conducted in data integration?

A: Redundancy and correlation analysis involve identifying redundant attributes or tuples in the
integrated dataset. This can be done by analyzing correlations between variables and removing
redundant information to simplify the dataset.
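
One way to spot redundant attributes is a pairwise correlation matrix; this sketch assumes pandas is available, and the attributes and values are made up (the second column is just the first in different units).

    # Spotting redundant attributes via pairwise correlation (illustrative data).
    import pandas as pd

    df = pd.DataFrame({
        "height_cm": [150, 160, 170, 180, 190],
        "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # same information in inches
        "weight_kg": [55, 62, 70, 81, 95],
    })

    # A correlation near 1.0 between height_cm and height_in flags the redundancy,
    # so one of the two columns can be dropped.
    print(df.corr())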

Q: What is tuple duplication and how is it addressed in data integration?


A: Tuple duplication occurs when the same record appears multiple times in a dataset, either due to
errors or intentional duplication. It is addressed by identifying and removing duplicate tuples to
ensure data integrity and accuracy.
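
Duplicate removal is straightforward in practice; a minimal sketch with pandas (illustrative records):

    # Removing duplicate tuples with pandas.
    import pandas as pd

    df = pd.DataFrame({
        "order_id": [101, 102, 102, 103],
        "item": ["pen", "book", "book", "pen"],
    })

    # Keep the first occurrence of each fully identical row.
    deduped = df.drop_duplicates()
    print(deduped)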

Q: How are data value conflict detection and resolution performed in data integration?

A: Data value conflict detection involves identifying discrepancies or conflicts in data values across
different sources. Resolution methods may include using voting schemes, expert judgment, or
statistical methods to reconcile conflicting information and create a consistent dataset.
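
As one possible voting scheme, this sketch keeps, for each entity, the value reported by the majority of sources; pandas is assumed, and the entity names and values are hypothetical.

    # Majority-vote resolution of conflicting values (illustrative data).
    import pandas as pd

    reports = pd.DataFrame({
        "entity": ["E1", "E1", "E1", "E2", "E2"],
        "source": ["A", "B", "C", "A", "B"],
        "price": [10, 10, 12, 20, 20],
    })

    # groupby + mode picks the most frequently reported value per entity.
    resolved = reports.groupby("entity")["price"].agg(lambda s: s.mode().iloc[0])
    print(resolved)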
