ADS CHP 6final
Classes of models
• There are several classes of mathematical models for
decision making, which in turn can
be solved by a number of alternative solution techniques.
• Each model class is better suited to represent certain types
of decision-making processes.
• In this section we will cover the main categories of
mathematical models for decision
making, including:
• Predictive models;
• Optimization models;
Predictive Models
Scientific Analysis:
Scientific simulations and experiments generate large volumes of
data every day. This includes data collected from nuclear
laboratories, data about human psychology, etc. Data mining
techniques are capable of analyzing these data. New data can now
be captured and stored faster than the data already accumulated
can be analyzed. Examples of scientific analysis:
• Sequence analysis in bioinformatics
• Classification of astronomical objects
• Medical decision support.
Intrusion Detection:
A network intrusion refers to any unauthorized activity on a digital
network. Network intrusions often involve stealing valuable
network resources. Data mining techniques play a vital role in
detecting intrusions, network attacks, and anomalies. These
techniques help in selecting and refining useful and relevant
information from large data sets, and in classifying the data
that is relevant to an Intrusion Detection System. An Intrusion
Detection System raises alarms when network traffic indicates an
intrusion into the system. For example:
• Detect security violations
• Misuse Detection
• Anomaly Detection
Business Transactions:
Every business transaction is recorded for perpetuity. Such
transactions are usually time-related and can be inter-business
deals or intra-business operations. Using this data effectively
and in a timely manner for competitive decision-making is
arguably the most important problem to solve for businesses that
struggle to survive in a highly competitive world. Data mining
helps to analyze these business transactions, identify marketing
approaches, and support decision-making. Examples:
• Direct mail targeting
• Stock trading
• Customer segmentation
• Churn prediction (Churn prediction is one of the most
popular Big Data use cases in business)
Market Basket Analysis:
Market Basket Analysis is a technique that carefully studies the
purchases made by customers in a supermarket. It identifies
patterns of items that customers frequently purchase together.
Companies can use this analysis to promote deals, offers, and
sales, and data mining techniques help to carry out the
analysis. Example:
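A minimal sketch of the idea, assuming each purchase is available as a set of item names. The baskets and the `frequent_pairs` helper are hypothetical and only illustrate counting items that are bought together; real market basket analysis usually relies on a dedicated algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "butter"},
]

def frequent_pairs(baskets, min_count):
    """Return item pairs that appear together in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, e.g. ("bread", "milk").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

pairs = frequent_pairs(baskets, min_count=2)
print(pairs)
```

Pairs that pass the threshold are candidates for joint promotions or shelf placement.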
2. Code Check
A code check ensures that a field is selected from a valid list of
values or follows certain formatting rules. For example, a postal
code can be verified by checking it against a list of valid codes.
The same concept can be applied to other items such as country
codes and NAICS industry codes.
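A code check could be sketched as follows. The list of country codes is a small hypothetical sample, and the five-digit postal code rule is an assumption chosen for illustration; a real system would load the official lists and rules for its region.

```python
import re

# Hypothetical reference list; a real system would load the full official list.
VALID_COUNTRY_CODES = {"US", "CA", "GB", "IN"}

# Formatting rule assumed for illustration: exactly five digits.
POSTAL_CODE_PATTERN = re.compile(r"\d{5}")

def is_valid_country_code(value):
    """Check the value against the list of accepted codes."""
    return value in VALID_COUNTRY_CODES

def is_valid_postal_format(value):
    """Check that the whole value follows the assumed formatting rule."""
    return POSTAL_CODE_PATTERN.fullmatch(value) is not None

print(is_valid_country_code("US"), is_valid_postal_format("1234a"))
```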
3. Range Check
A range check will verify whether input data falls within a
predefined range. For example, latitude and longitude are
commonly used in geographic data. A latitude value should be
between -90 and 90, while a longitude value must be between
-180 and 180. Any values outside these ranges are invalid.
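The latitude and longitude example can be sketched as a single function. The name `is_valid_coordinate` is hypothetical, and the bounds are treated as inclusive.

```python
def is_valid_coordinate(latitude, longitude):
    """Return True when both values fall inside their allowed ranges."""
    return -90 <= latitude <= 90 and -180 <= longitude <= 180

print(is_valid_coordinate(19.07, 72.87))   # inside both ranges
print(is_valid_coordinate(95.0, 10.0))     # latitude out of range
```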
4. Format Check
Many data types follow a certain predefined format. A common
use case is date columns that are stored in a fixed format like
“YYYY-MM-DD” or “DD-MM-YYYY.” A data validation procedure
that ensures dates are in the proper format helps maintain
consistency across data and through time.
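One possible format check for date strings uses Python's standard `datetime` module. The function name `matches_date_format` is hypothetical. The round-trip comparison is a design choice made here so that non-padded values such as "2024-3-5" are rejected, because `strptime` alone would accept them.

```python
from datetime import datetime

def matches_date_format(text, fmt="%Y-%m-%d"):
    """Return True when text is a real date written exactly in fmt."""
    try:
        parsed = datetime.strptime(text, fmt)
    except ValueError:
        # Wrong layout, or an impossible date such as February 30.
        return False
    # Formatting the parsed date back must reproduce the input exactly.
    return parsed.strftime(fmt) == text

print(matches_date_format("2024-03-15"))
print(matches_date_format("15-03-2024", "%d-%m-%Y"))
```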
5. Consistency Check
A consistency check is a type of logical check that confirms the
data has been entered in a logically consistent way. An example is
checking that the delivery date is after the shipping date for a
parcel.
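The parcel example can be sketched as a comparison of two dates. The function name is hypothetical, and the strict comparison follows the wording above.

```python
from datetime import date

def delivery_after_shipping(shipping_date, delivery_date):
    """Return True when the delivery date is strictly after the shipping date."""
    # Use >= instead if same-day delivery should count as consistent.
    return delivery_date > shipping_date

print(delivery_after_shipping(date(2024, 3, 1), date(2024, 3, 4)))
```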
6. Uniqueness Check
Some data like IDs or e-mail addresses are unique by nature. A
database should likely have unique entries on these fields. A
uniqueness check ensures that an item is not entered multiple
times into a database.
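A uniqueness check can be sketched by counting how often each value appears. The `find_duplicates` helper and the sample addresses are hypothetical; in a real database, a unique constraint on the column would normally enforce this.

```python
from collections import Counter

def find_duplicates(values):
    """Return the values that occur more than once, in sorted order."""
    counts = Counter(values)
    return sorted(value for value, n in counts.items() if n > 1)

# Hypothetical e-mail addresses, invented for illustration only.
emails = ["a@example.com", "b@example.com", "a@example.com"]
print(find_duplicates(emails))
```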
Benefits of Data Validation
• Cost and Time Efficiency: Reduces errors and saves resources
by ensuring clean data upfront.
• Improved Decision-Making: High-quality data leads to better
outcomes.
• Compatibility: Ensures data is ready for use in other
processes or systems.
Limitations of Data Validation
• Complexity in Large Datasets: Validation can be time-consuming
for massive datasets.
• Synchronization Challenges: Outdated or unsynchronized
databases may result in inconsistencies.
• Confidence
Confidence indicates how often the rule has been found to be
true, that is, how often the items X and Y occur together in the
dataset given that X occurs. It is the ratio of the number of
transactions that contain both X and Y to the number of
transactions that contain X.
• Lift
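Lift measures the strength of a rule. In its standard form it is
defined by the following formula:
Lift(X → Y) = Support(X and Y) / (Support(X) × Support(Y))
A lift greater than 1 means that X and Y occur together more often
than would be expected if they were independent.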
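A minimal sketch of these measures, assuming the standard definitions: support is the fraction of transactions that contain an itemset, confidence is the ratio described above, and lift is confidence divided by the support of Y. The transactions and function names are hypothetical.

```python
# Hypothetical transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """Transactions containing X and Y, relative to those containing X."""
    # Assumes X occurs at least once, otherwise this divides by zero.
    return support(set(x) | set(y), transactions) / support(x, transactions)

def lift(x, y, transactions):
    """Confidence of X -> Y relative to how common Y is overall."""
    return confidence(x, y, transactions) / support(y, transactions)

print(confidence({"bread"}, {"milk"}, transactions))
print(lift({"bread"}, {"milk"}, transactions))
```

In this sample data, bread appears in 3 of 4 transactions and bread with milk in 2 of 4, so the confidence of bread → milk is 2/3 and the lift is (2/3) / (1/2) = 4/3.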
3. Prep Data. Clean and preprocess your collected data to ensure its
quality and suitability for analysis. This step involves tasks such as
removing duplicate or irrelevant records, handling missing values,
correcting inconsistencies, and transforming the data into a suitable
format.
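The preparation tasks listed in this step might look like the sketch below for a small list of records. The field names `name` and `age`, the sample records, and the choice to drop records without a name are all assumptions made for illustration.

```python
def prepare_records(records):
    """Clean a list of dict records: normalize, handle gaps, deduplicate."""
    cleaned = []
    seen = set()
    for record in records:
        # Correct inconsistencies: trim whitespace and normalize capitalization.
        name = (record.get("name") or "").strip().title()
        if not name:
            continue  # remove records that are unusable without a name
        # Handle missing values and transform age into an integer.
        age = record.get("age")
        age = int(age) if age not in (None, "") else None
        # Remove duplicates, judged on the normalized fields.
        key = (name, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned

# Hypothetical raw records, invented for illustration only.
raw = [
    {"name": " alice ", "age": "30"},
    {"name": "Alice", "age": 30},
    {"name": "bob", "age": ""},
    {"name": "", "age": "5"},
]
print(prepare_records(raw))
```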