Task 1


Name (LN, FN, MI): GARCIA, KYLA MAE A.    Score: ________

Course/Year/Section: BSIT – 3A    Date: ________

IT 311 – Advanced Database System


TASK 1

1. Define Data mining. List out the steps in data mining.


Data mining, also referred to as Knowledge Discovery in Databases (KDD), is
the extraction of patterns, trends, and insights from large datasets, usually
stored in databases. The process involves analyzing and interpreting large
volumes of data to uncover patterns and relationships that can inform business
decisions, predict future outcomes, and provide a competitive edge.
STEPS IN DATA MINING
• Data Cleaning – removing noisy, inconsistent, and irrelevant data from the
collection.
• Data Integration – combining data gathered from multiple sources into a
single repository.
• Data Selection – determining and retrieving the data from the collection
that is pertinent to the analysis.
• Data Transformation – converting the data into the form required by the
mining process.
• Data Mining – applying techniques that extract potentially useful patterns
from the data.
• Pattern Evaluation – assessing the quality, relevance, and usefulness of the
patterns using predefined criteria or measures.
• Knowledge Representation – using visualization tools to present the
outcomes of data mining.
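The steps above can be sketched as a small pipeline. This is only a toy illustration, not a real mining algorithm: the records, the field names (`customer`, `amount`), and every function name are hypothetical, and the "mining" step simply flags above-average amounts.

```python
from statistics import mean

# Hypothetical sales records from two sources; None marks a missing value.
store_a = [{"customer": "c1", "amount": 120.0}, {"customer": "c2", "amount": None}]
store_b = [{"customer": "c3", "amount": 80.0}, {"customer": "c4", "amount": 400.0}]

def clean(records):
    """Data cleaning: drop records whose amount is missing."""
    return [r for r in records if r["amount"] is not None]

def integrate(*sources):
    """Data integration: merge several sources into one list."""
    return [r for source in sources for r in source]

def select(records):
    """Data selection: keep only the attribute relevant to the analysis."""
    return [r["amount"] for r in records]

def transform(amounts):
    """Data transformation: min-max scale to [0, 1] (assumes max > min)."""
    lo, hi = min(amounts), max(amounts)
    return [(a - lo) / (hi - lo) for a in amounts]

def mine(scaled):
    """Data mining (toy): label values above the mean as 'high'."""
    threshold = mean(scaled)
    return ["high" if s > threshold else "low" for s in scaled]

def evaluate(labels):
    """Pattern evaluation: the pattern is interesting only if both labels occur."""
    return len(set(labels)) > 1

records = integrate(clean(store_a), clean(store_b))
amounts = select(records)
labels = mine(transform(amounts))
if evaluate(labels):
    # Knowledge representation: present the outcome to the user.
    print(list(zip(amounts, labels)))
```

Each function corresponds to one step in the list, so the order of the calls mirrors the order of the steps.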
2. Compare Discrete versus Continuous Attributes.
• A Discrete Attribute has a set of values that is either finite or countably
infinite. These values are often represented as integers or as categories. A
Continuous Attribute, in contrast, can take any real value within a range, so
it has infinitely many possible states and is typically represented as a
floating-point number. It is frequently associated with measurements or
quantities.
3. Give the applications of Data Mining.
• Financial Data Analysis – Banks and financial institutions offer services
such as loans, investments, credit, and debit. The resulting financial data is
generally reliable and of high quality, which makes systematic data analysis
and data mining possible.
• Retail Industry – It gathers a large amount of data on sales, customers,
goods, consumption, and services. Data mining helps identify customer
purchasing patterns and trends, which leads to better customer service and
higher customer satisfaction.
• Telecommunication Industry – It is one of the most rapidly growing
industries, offering a wide range of services. Data mining helps this industry
identify telecommunication patterns, detect fraudulent activity, make better
use of resources, and improve service quality.
• Biological Data Analysis – It covers genomics (the study of genes),
proteomics (the study of proteins), and biomedical research, including the
comparison and identification of human genomes.
• Other Scientific Applications – Data mining is applied in scientific domains
such as geosciences, astronomy, climate and ecosystem modelling, chemical
engineering, and fluid dynamics.
• Intrusion Detection – An intrusion is any set of actions that threatens the
integrity, confidentiality, or availability of network resources. Data mining
helps detect such actions.
4. Analyze the issues in Data Mining Techniques.
5. Generalize in detail about Numeric Attributes.
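• Numeric attributes are a fundamental type of data attribute used in data
analysis and machine learning applications. They represent measurable
quantities and take integer or real values. Numeric attributes come in two
types: interval-scaled and ratio-scaled. Interval-scaled attributes are
measured on a scale of equal-sized units with no true zero point (for example,
temperature in Celsius), so differences are meaningful but ratios are not.
Ratio-scaled attributes have a true zero point (for example, temperature in
Kelvin, or counts), so ratios are meaningful as well.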
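A short sketch of why the two numeric types behave differently, assuming the usual textbook distinction (interval-scaled attributes have no true zero point, ratio-scaled attributes do). Temperature in Celsius stands in for an interval-scaled attribute and temperature in Kelvin for a ratio-scaled one; the helper name `celsius_to_kelvin` is chosen here for illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to Kelvin (Kelvin has a true zero)."""
    return celsius + 273.15

t1_c, t2_c = 10.0, 20.0
t1_k, t2_k = celsius_to_kelvin(t1_c), celsius_to_kelvin(t2_c)

# Differences are meaningful on both scales and agree with each other.
diff_c = t2_c - t1_c
diff_k = t2_k - t1_k

# Ratios only make sense on the ratio scale: 20 °C is not "twice as hot"
# as 10 °C, which the Kelvin ratio (about 1.035) makes clear.
ratio_c = t2_c / t1_c
ratio_k = t2_k / t1_k

print(f"difference: {diff_c:.2f} (Celsius) vs {diff_k:.2f} (Kelvin)")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```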
6. Evaluate the major tasks of data preprocessing.
• Data Cleaning – filling in missing values, smoothing noisy data, and
resolving inconsistencies in the data.
• Data Integration – combining data from several sources and resolving
conflicts within the data.
• Data Transformation – normalizing, aggregating, and generalizing the data.
• Data Reduction – producing a reduced representation of the data that is
much smaller in volume yet yields the same, or almost the same, analytical
results.
• Data Discretization – dividing the range of a continuous attribute into
intervals to reduce the number of distinct attribute values.
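Two of these tasks can be illustrated in a few lines. The sketch below fills missing values with the mean of the known values (one common cleaning strategy among several) and then applies min-max normalization (one common transformation). The function names and the sample `ages` list are hypothetical.

```python
from statistics import mean

def fill_missing(values):
    """Data cleaning: replace None with the mean of the known values."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, None, 40, 60]
cleaned = fill_missing(ages)             # the missing age becomes 40
normalized = min_max_normalize(cleaned)  # [0.0, 0.5, 0.5, 1.0]
print(cleaned, normalized)
```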
7. Define an efficient procedure for cleaning the noisy data.
8. Distinguish between data similarity and dissimilarity.
• Data Similarity
- Numerical measure of how alike two data objects are.
- Value is higher when objects are more alike.
- Often falls in the range [0,1].
• Data Dissimilarity
- Numerical measure of how different two data objects are.
- Values are lower when objects are more alike.
- Minimum dissimilarity is often 0.
- Upper limit varies.
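As a minimal sketch, Euclidean distance can serve as the dissimilarity measure, and 1 / (1 + d) as one possible way to turn a distance d into a similarity in the range (0, 1]. Both choices are assumptions made for illustration; other measures are equally valid.

```python
import math

def euclidean_distance(x, y):
    """Dissimilarity: 0 for identical objects, no fixed upper limit."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def similarity(x, y):
    """Similarity in (0, 1]: 1 for identical objects, near 0 for distant ones."""
    return 1.0 / (1.0 + euclidean_distance(x, y))

p = (1.0, 2.0)
q = (1.0, 2.0)   # identical to p
r = (4.0, 6.0)   # 3 units away on one axis, 4 on the other

print(euclidean_distance(p, q), similarity(p, q))  # 0.0 1.0
print(euclidean_distance(p, r), similarity(p, r))  # distance is 5.0
```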
9. Show the Displays of Basic Statistical Descriptions of Data.
• Measures of Central Tendency – The mean, median, and mode are the primary
measures of central tendency. They represent the typical or central value in a
dataset.

• Measures of Dispersion – These are the range, variance, and standard
deviation. They quantify, in an objective manner, how widely the values in the
data are spread.

• Frequency Distribution – It is a graphical or tabular representation that
shows the number of observations inside a specified interval.
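These descriptions can be computed with Python's standard library. The sketch below uses a made-up sample and the population versions of variance and standard deviation (`pvariance`, `pstdev`); the sample versions (`variance`, `stdev`) would give slightly larger values.

```python
from collections import Counter
from statistics import mean, median, mode, pvariance, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

# Measures of central tendency
central = {"mean": mean(data), "median": median(data), "mode": mode(data)}

# Measures of dispersion (population variance and standard deviation)
dispersion = {
    "range": max(data) - min(data),
    "variance": pvariance(data),
    "std_dev": pstdev(data),
}

# Frequency distribution in tabular form: value -> number of observations
frequency = Counter(data)

print(central)
print(dispersion)
for value in sorted(frequency):
    print(value, "#" * frequency[value])  # simple text histogram
```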

10. Formulate what is data discretization.


• Data discretization is the process of transforming continuous data into
discrete or categorical values by dividing the data into intervals, making it
easier to analyze and categorize information.
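One simple way to do this is equal-width binning, sketched below. It is only one of several discretization methods, and the function name `equal_width_bins` and the sample `ages` are hypothetical.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k equal-width intervals (bins 0..k-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # No spread in the data: everything falls into the first bin.
        return [0 for _ in values]
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

ages = [5, 17, 30, 45, 62, 80]
bins = equal_width_bins(ages, 3)  # intervals of width 25 starting at 5
print(bins)  # [0, 0, 1, 1, 2, 2]
```

Six continuous ages are reduced to three discrete values, which could then be given labels such as "young", "middle", and "senior".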
