Dmda Mid 1
LAQS
Q.1 Describe the 3-tier Architecture of the Data Warehouse with a neat
sketch
Three-Tier Data Warehouse Architecture
The Three-Tier Data Warehouse Architecture is the most commonly used Data
Warehouse design. It brings together the required Data Warehouse schema model,
the required OLAP server type, and the required front-end tools for reporting or
analysis. As the name suggests, it consists of three tiers, the Bottom Tier, the
Middle Tier and the Top Tier, which are linked in sequence from the Bottom Tier
(data sources and repository), through the Middle Tier (OLAP servers), to the
Top Tier (front-end tools).
Data Warehouse Architecture is the design on which a Data Warehouse is built. It
accommodates the desired type of Data Warehouse schema, user interface
application and database management system for data organization and repository
structure. The type of architecture is chosen based on the requirements provided
by the project team. The three-tier Data Warehouse Architecture is the most
common choice because of the level of detail in its structure. The three tiers
are termed:
Top-Tier
Middle-Tier
Bottom-Tier
Each tier can have different components, based on the requirements set by the
project's decision-makers, but the components must fit the role of their
respective tier.
1. Bottom Tier
The Bottom Tier in the three-tier architecture of a data warehouse consists of the
Data Repository. The Data Repository is the storage space for the data extracted
from various data sources, which undergoes a series of activities as a part of the
ETL process. ETL stands for Extract, Transform and Load. As a preliminary
step, before the data is loaded into the repository, all the relevant and
required data are identified across the various source systems. This data is
then cleaned to remove duplicate and junk records. The next step is to
transform the data into a single, consistent storage format. The final step of
ETL is to load the data into the repository.
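A minimal ETL sketch in Python is shown below, using pandas with SQLite standing in for the warehouse repository; the table, column and file names are illustrative assumptions only.

import sqlite3
import pandas as pd

# Extract: in practice this would come from source systems (pd.read_csv,
# a database query, an API); a small inline frame stands in for the raw data.
raw = pd.DataFrame({
    "order_id":   [1, 1, 2, None],   # note the duplicate row and the missing key
    "order_date": ["2023-01-05", "2023-01-05", "2023-01-09", "2023-01-10"],
    "amount":     [120.0, 120.0, 75.5, 30.0],
})

# Transform: drop duplicate/junk records and unify the storage format
clean = raw.drop_duplicates().dropna(subset=["order_id"]).copy()
clean["order_date"] = pd.to_datetime(clean["order_date"])

# Load: write the cleaned data into the warehouse repository
with sqlite3.connect("warehouse.db") as conn:   # hypothetical repository file
    clean.to_sql("fact_sales", conn, if_exists="append", index=False)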
The storage type of the repository can be a relational database management
system or a multidimensional database management system. A relational database
system can hold simple relational data, whereas a multidimensional database
system can hold data with more than one dimension. Whenever the Repository
includes both relational and multidimensional database management systems,
there exists a metadata unit.
2. Middle Tier
The Middle tier here is the tier with the OLAP servers. The Data Warehouse can
have more than one OLAP server, and it can have more than one type of OLAP
server model as well, which depends on the volume of the data to be processed
and the type of data held in the bottom tier. There are three types of OLAP server
models:
ROLAP (Relational Online Analytical Processing)
MOLAP (Multidimensional Online Analytical Processing)
HOLAP (Hybrid Online Analytical Processing)
The Middle Tier acts as an intermediary between the Top Tier and the Bottom Tier
(the data repository). From the user's standpoint, the Middle Tier presents a
conceptual view of the database.
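To illustrate the idea of an OLAP-style multidimensional view, the sketch below builds a pandas pivot table over a small, made-up fact table; a ROLAP server would express the same aggregation as SQL (GROUP BY) against the relational bottom tier.

import pandas as pd

# Hypothetical fact table: sales by product, region and quarter
sales = pd.DataFrame({
    "product": ["TV", "TV", "Phone", "Phone", "TV", "Phone"],
    "region":  ["East", "West", "East", "West", "East", "East"],
    "quarter": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q2"],
    "amount":  [200, 150, 300, 250, 180, 320],
})

# A MOLAP-style view: aggregate the measure over two dimensions at once
cube = sales.pivot_table(index="product", columns="region",
                         values="amount", aggfunc="sum", fill_value=0)
print(cube)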
3. Top Tier
The Top Tier is a front-end layer, that is, the user interface that allows the user to
connect with the database systems. This user interface is usually a tool or an API
call, which is used to fetch the required data for Reporting, Analysis, and Data
Mining purposes. The type of tool depends purely on the form of outcome
expected. It could be a Reporting tool, an Analysis tool, a Query tool or a Data
mining tool.
KDD Process
KDD (Knowledge Discovery in Databases) is a process that involves the
extraction of useful, previously unknown, and potentially valuable information
from large datasets. The KDD process is iterative and may require multiple
passes through its steps to extract accurate knowledge from the data. The
following steps are included in the KDD process:
Data Cleaning
Data cleaning is defined as the removal of noisy and irrelevant data from the
collection.
1. Cleaning in case of Missing values.
2. Cleaning noisy data, where noise is a random or variance error.
3. Cleaning with Data discrepancy detection and Data transformation tools.
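As a rough illustration of cleaning noisy data and detecting discrepancies (points 2 and 3 above), the sketch below smooths a made-up series with a rolling mean and flags out-of-range values; the readings and the plausible range are arbitrary assumptions.

import pandas as pd

# Hypothetical temperature readings with random noise and one discrepant value
readings = pd.DataFrame({"temp": [21.0, 21.5, 21.2, 95.0, 22.0, 22.5]})

# Noisy data: smooth random variance with a rolling (bin) mean
readings["smoothed"] = readings["temp"].rolling(window=3, center=True,
                                                min_periods=1).mean()

# Discrepancy detection: flag values outside a plausible range for review
readings["suspect"] = ~readings["temp"].between(-30, 60)
print(readings)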
Data Integration
Data integration is defined as combining heterogeneous data from multiple
sources into a common store (the Data Warehouse). Data integration is carried
out using Data Migration tools, Data Synchronization tools and the ETL
(Extract-Transform-Load) process.
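A small sketch of data integration, assuming two hypothetical sources (a CRM table and a billing table) that describe the same customers under different key names:

import pandas as pd

crm = pd.DataFrame({"cust_id": [1, 2], "name": ["Asha", "Ravi"]})
billing = pd.DataFrame({"customer": [1, 2], "total_spent": [5400, 980]})

# Integrate into a single, warehouse-style view by resolving the key names
combined = crm.merge(billing, left_on="cust_id", right_on="customer",
                     how="left").drop(columns="customer")
print(combined)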
Data Selection
Data selection is defined as the process where data relevant to the analysis is
decided and retrieved from the data collection. For this, we can use Neural
networks, Decision Trees, Naive Bayes, Clustering, and Regression methods.
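A minimal sketch of data selection, assuming a hypothetical orders table in which only the East-region records and two attributes are relevant to the analysis:

import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "region":   ["East", "West", "East", "West"],
    "amount":   [120, 75, 300, 50],
    "comments": ["ok", "late", "ok", "ok"],   # not relevant to this analysis
})

# Keep only the attributes and records relevant to the analysis task
relevant = orders.loc[orders["region"] == "East", ["order_id", "amount"]]
print(relevant)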
Data Transformation
Data Transformation is defined as the process of transforming data into the
appropriate form required by the mining procedure. Data Transformation is a
two-step process:
1. Data Mapping: Assigning elements from the source schema to the destination
schema to capture the required transformations.
2. Code generation: Creation of the actual transformation program.
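The sketch below illustrates both steps under stated assumptions: a hypothetical mapping dictionary plays the role of the data map, and the rename/astype calls stand in for the generated transformation program.

import pandas as pd

source = pd.DataFrame({"CUST_NM": ["Asha", "Ravi"], "AMT_INR": [5400, 980]})

# Data mapping: source elements assigned to the destination schema
mapping = {"CUST_NM": "customer_name", "AMT_INR": "amount"}

# "Code generation": the mapping drives the actual transformation step
destination = source.rename(columns=mapping)
destination["amount"] = destination["amount"].astype(float)
print(destination)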
Data Mining
Data mining is defined as the application of techniques to extract potentially
useful patterns. It transforms task-relevant data into patterns and decides the
purpose of the model, using classification or characterization.
Pattern Evaluation
Pattern Evaluation is defined as identifying interesting patterns representing
knowledge, based on given interestingness measures. It computes an
interestingness score for each pattern and uses summarization and visualization
to make the data understandable to the user.
Knowledge Representation
This involves presenting the results in a way that is meaningful and can be used
to make decisions.
Note: KDD is an iterative process where evaluation measures can be enhanced,
mining can be refined, and new data can be integrated and transformed to get
different and more appropriate results. Preprocessing of databases consists
of Data cleaning and Data Integration.
Q.3: Explain the various Data Mining tasks with appropriate examples.
1. Anomaly Detection:
Task Explanation: Anomaly detection involves identifying data
points that deviate significantly from the norm or expected
behaviour.
Example: In network security, abnormal spikes in network traffic
might indicate a potential security breach or a denial-of-service
attack. Anomaly detection algorithms can flag these unusual patterns
for further investigation.
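A sketch of the network-traffic example using scikit-learn's Isolation Forest; the traffic figures and contamination level are made-up assumptions.

from sklearn.ensemble import IsolationForest
import numpy as np

# Hypothetical hourly traffic volumes (MB); the last value is an abnormal spike
traffic = np.array([[110], [120], [115], [118], [122], [950]])

# Isolation Forest labels easily isolated points as anomalies:
# -1 marks an anomaly, 1 marks normal traffic
model = IsolationForest(contamination=0.2, random_state=0)
labels = model.fit_predict(traffic)
print(labels)   # the spike should be labelled -1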
2. Association Rule Learning:
Task Explanation: Association rule learning aims to discover
interesting relationships or patterns in large datasets.
Example: In a retail setting, if customers who buy sunscreen also
tend to purchase beach towels, a store can use this association to
optimize product placements or create targeted promotions for those
items.
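A small sketch of the sunscreen/beach-towel rule, computing the usual interestingness measures (support, confidence, lift) by hand over a made-up basket table:

import pandas as pd

# One-hot encoded market-basket data (each row is a transaction)
baskets = pd.DataFrame({
    "sunscreen":   [1, 1, 1, 0, 1],
    "beach_towel": [1, 1, 0, 0, 1],
    "milk":        [0, 1, 0, 1, 0],
}).astype(bool)

# Measures for the rule {sunscreen} -> {beach_towel}
support    = (baskets["sunscreen"] & baskets["beach_towel"]).mean()
confidence = support / baskets["sunscreen"].mean()
lift       = confidence / baskets["beach_towel"].mean()
print(f"support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")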
3. Clustering:
Task Explanation: Clustering involves grouping similar data points
based on certain characteristics, without predefined categories.
Example: In marketing, clustering can be applied to group
customers with similar purchasing behaviour. This can help
businesses tailor marketing strategies to each cluster's preferences,
increasing the effectiveness of targeted campaigns.
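A minimal clustering sketch with scikit-learn's KMeans over made-up customer features (annual spend, number of purchases):

from sklearn.cluster import KMeans
import numpy as np

customers = np.array([[200, 2], [250, 3], [220, 2],
                      [5000, 40], [5200, 45], [4800, 38]])

# Group customers into two segments without predefined labels
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)   # e.g. low-spend vs high-spend segments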
4. Classification:
Task Explanation: Classification involves training a model to
categorize new data points into predefined classes based on existing
labelled data.
Example: In healthcare, a classification model can be trained to
predict whether a patient is likely to develop a specific medical
condition based on features such as age, family history, and lifestyle
choices.
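A sketch of the healthcare example using a decision tree classifier; the features and labels below are tiny illustrative values, not real patient data.

from sklearn.tree import DecisionTreeClassifier

# Hypothetical labelled patients: [age, family_history (0/1), smoker (0/1)]
X = [[25, 0, 0], [60, 1, 1], [45, 1, 0], [35, 0, 1], [70, 1, 1], [30, 0, 0]]
y = [0, 1, 1, 0, 1, 0]   # 1 = developed the condition, 0 = did not

# Train on labelled data, then classify a new, unseen patient
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict([[55, 1, 0]]))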
5. Regression:
Task Explanation: Regression aims to find the relationship between
variables and predict a continuous outcome.
Example: In finance, a regression model can be used to predict the
future value of a stock based on various factors such as historical
prices, market trends, and economic indicators.
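A minimal regression sketch with scikit-learn's LinearRegression over made-up historical prices and a market index; the numbers are purely illustrative.

from sklearn.linear_model import LinearRegression

# Hypothetical features: [previous close, market index level]
X = [[100, 5000], [102, 5050], [101, 5020], [105, 5100], [107, 5150]]
y = [102, 101, 105, 107, 110]    # next-day price (continuous target)

reg = LinearRegression().fit(X, y)
print(reg.predict([[108, 5180]]))   # predicted next-day price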
6. Summarization:
Task Explanation: Summarization involves presenting a condensed
version or visualization of the data to highlight key patterns or
trends.
Example: In social media analytics, summarization might include
generating visualizations that show the overall sentiment of user
comments over time or creating reports that highlight the most
engaging posts based on likes and shares.
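A short summarization sketch with pandas, condensing made-up post-level data into weekly sentiment and engagement figures:

import pandas as pd

posts = pd.DataFrame({
    "week":      [1, 1, 2, 2, 3, 3],
    "sentiment": [0.2, 0.5, -0.1, 0.3, 0.6, 0.4],
    "likes":     [10, 40, 5, 25, 80, 60],
})

# Condense raw post-level data into weekly key figures
summary = posts.groupby("week").agg(avg_sentiment=("sentiment", "mean"),
                                    total_likes=("likes", "sum"))
print(summary)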
These data mining tasks collectively empower organizations to extract
meaningful insights, make informed decisions, and discover hidden patterns
within their data, leading to improved efficiency and strategic decision-
making.
Preprocessing in Data Mining:
Data preprocessing is a data mining technique used to transform raw data into a
useful and efficient format. It plays a crucial role in ensuring the quality of
the data and the accuracy of the analysis results. The specific steps involved
in data preprocessing may vary depending on the nature of the data and the
analysis goals, but by performing the steps below, the data mining process
becomes more efficient and the results more accurate.
1. Data Cleaning:
The data can have many irrelevant and missing parts. To handle this part,
data cleaning is done. It involves handling missing data, noisy data etc.
(a). Missing Data:
This situation arises when some values are missing in the data. It can be
handled in various ways.
Some of them are:
1. Ignore the tuples:
This approach is suitable only when the dataset we have is quite
large and multiple values are missing within a tuple.
2. Fill the Missing values:
There are various ways to do this task. You can choose to fill the
missing values manually, with the attribute mean, or with the most probable
value (both strategies are sketched below).
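Both strategies, ignoring tuples and filling missing values, are sketched below on a made-up table; the mean is used for a numeric attribute and the mode (most probable value) for a categorical one.

import pandas as pd

patients = pd.DataFrame({"age": [25, None, 47, 52, None],
                         "city": ["Pune", "Delhi", None, "Delhi", "Delhi"]})

# Option 1: ignore (drop) tuples with missing values - reasonable only
# when the dataset is large and little information is lost
dropped = patients.dropna()

# Option 2: fill missing values, e.g. numeric attributes with the mean
# and categorical attributes with the most probable (modal) value
filled = patients.copy()
filled["age"] = filled["age"].fillna(filled["age"].mean())
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])
print(filled)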
2. Data Transformation:
This step is taken to transform the data into appropriate forms suitable for
the mining process. This involves the following ways:
1. Normalization:
It is done to scale the data values in a specified range (-1.0 to 1.0 or 0.0 to
1.0)
2. Attribute Selection:
In this strategy, new attributes are constructed from the given set of
attributes to help the mining process.
3. Discretization:
This is done to replace the raw values of numeric attributes with interval
levels or conceptual levels.
4. Concept Hierarchy Generation:
Here attributes are converted from lower level to higher level in the
hierarchy. For example, the attribute “city” can be converted to “country”.
(A combined sketch of normalization, discretization and concept hierarchy
generation follows this list.)
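A combined sketch of normalization (min-max scaling to 0.0-1.0), discretization and concept hierarchy generation, using a made-up table with an income attribute and a city attribute:

import pandas as pd

data = pd.DataFrame({"income": [20000, 35000, 50000, 80000, 120000],
                     "city":   ["Pune", "Delhi", "Mumbai", "Pune", "Delhi"]})

# Normalization: min-max scaling into the range 0.0 to 1.0
rng = data["income"].max() - data["income"].min()
data["income_scaled"] = (data["income"] - data["income"].min()) / rng

# Discretization: replace raw numeric values with interval/conceptual levels
data["income_band"] = pd.cut(data["income"], bins=3,
                             labels=["low", "medium", "high"])

# Concept hierarchy generation: roll a lower-level attribute (city)
# up to a higher level (country) via a lookup
city_to_country = {"Pune": "India", "Delhi": "India", "Mumbai": "India"}
data["country"] = data["city"].map(city_to_country)
print(data)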
3. Data Reduction:
Data reduction is a crucial step in the data mining process that involves
reducing the size of the dataset while preserving the important information. This
is done to improve the efficiency of data analysis and to avoid overfitting of the
model. Some common steps involved in data reduction are: