SHORT BA Mid 2

The document discusses various concepts and techniques in data analysis, including practical applications of multiple regression, definitions of least squares regression and K-Nearest Neighbor (KNN), and the principles of unsupervised learning. It also covers data cube aggregation, advantages of supervised learning, simulation processes, and the importance of what-if analysis. Additionally, it outlines the types of association rules in data mining, advantages and disadvantages of simulation techniques, and the differences between validation and verification.

Uploaded by

maninani0332
Copyright
© All Rights Reserved

1. Write some examples of practical applications of multiple regression analysis.

• Market segmentation
• Demand forecasting

2. Define least squares regression.


A least squares regression line represents the relationship between variables in a
scatterplot. The procedure fits the line to the data points in a way that minimizes
the sum of the squared vertical distances between the line and the points. It is also
known as a line of best fit or a trend line.
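
As a minimal sketch of the definition above, the following Python snippet fits a least squares line using the closed-form formulas for slope and intercept. The data points and the function name `fit_least_squares` are illustrative assumptions, not part of the original notes.

```python
def fit_least_squares(xs, ys):
    """Return (slope, intercept) of the line that minimizes the sum of
    squared vertical distances between the line and the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on the line y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_least_squares(xs, ys)
print(slope, intercept)
```

Because the made-up points lie exactly on a line, the fitted slope and intercept recover that line; with noisy data the same formulas give the line of best fit.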

3. Write down the assumptions for building good regression models.


4. What are the categorical independent variables?
5. Define K-Nearest Neighbor (KNN).

• K-Nearest Neighbor (KNN) is an algorithm that classifies data based on
its proximity to other data. KNN rests on the assumption that data points
that are close to each other are more similar to each other than to points
farther away. This non-parametric, supervised technique predicts the class
of a new data point from the labels of its k nearest neighbors.
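
The idea can be sketched in a few lines of Python. This is a simplified illustration that assumes two-dimensional points, Euclidean distance, and a majority vote; the training data and the name `knn_predict` are made up for the example.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled points.

    `train` is a list of ((x, y), label) pairs.
    """
    # Sort the labeled points by Euclidean distance to the query point
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most frequent label among the k neighbors wins
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical labeled data: two well-separated groups
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.5), "B"), ((6.5, 7.0), "B"),
]
prediction = knn_predict(train, (1.8, 1.5), k=3)
print(prediction)
```

The query point sits among the "A" points, so its three nearest neighbors all carry that label.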

6. Write three applications of data exploration.


• Business Intelligence and Analytics
• Healthcare and Medicine
• Financial Sector
• E-commerce and Customer Experience

7. Explain data cube aggregation with an example.

Data cube aggregation summarizes data into a simpler form. For example,
imagine the data you gathered for your analysis covers the years 2012 to
2014 and includes your company's revenue every three months. If you are
interested in the annual sales rather than the quarterly figures, you can
aggregate the data so that it reports total sales per year instead of per
quarter.
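
A small Python sketch of this roll-up, assuming made-up quarterly revenue figures for 2012 to 2014 (the numbers and variable names are hypothetical):

```python
from collections import defaultdict

# Hypothetical quarterly revenue, keyed by (year, quarter)
quarterly_revenue = {
    (2012, 1): 100, (2012, 2): 120, (2012, 3): 110, (2012, 4): 130,
    (2013, 1): 140, (2013, 2): 150, (2013, 3): 135, (2013, 4): 160,
    (2014, 1): 170, (2014, 2): 165, (2014, 3): 180, (2014, 4): 190,
}

# Aggregate away the quarter dimension: sum the four quarters of each year
totals = defaultdict(int)
for (year, _quarter), revenue in quarterly_revenue.items():
    totals[year] += revenue
annual_revenue = dict(totals)

print(annual_revenue)
```

Twelve quarterly values are reduced to three annual totals, which is the smaller, summarized form the analysis needs.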

8. Outline unsupervised learning.

Unsupervised learning is a branch of machine learning that deals with unlabeled
data. Unlike supervised learning, where the data is labeled with a specific
category or outcome, unsupervised learning algorithms are tasked with finding
patterns and relationships within the data without any prior knowledge of the
data’s meaning. This makes unsupervised learning a powerful tool for exploratory
data analysis, where the goal is to understand the underlying structure of the data.
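
To make "finding patterns without labels" concrete, here is a minimal sketch of k-means-style clustering on one-dimensional data. K-means is only one of many unsupervised methods and is chosen here purely for illustration; the values, starting centers, and function name `kmeans_1d` are assumptions.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group unlabeled numbers around the nearest center, then move each
    center to the mean of its group, repeating a fixed number of times."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Index of the center closest to this value
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to its cluster mean (keep it if the cluster is empty)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical unlabeled measurements that form two natural groups
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, clusters = kmeans_1d(values, centers=[0.0, 6.0])
print(centers, clusters)
```

No labels are supplied, yet the procedure separates the small values from the large ones based only on how close they are to each other.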

9. What are the types of association rules in data mining?

There are typically four different types of association rules in data mining:

• Multi-relational association rules
• Generalized association rules
• Interval information association rules
• Quantitative association rules

10. Differentiate between an antecedent (if) and a consequent (then).

An association rule has two parts:

• an antecedent (if) and
• a consequent (then)

An antecedent is an item that is found in the data, and a consequent is
an item that is found in combination with the antecedent.
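
As a hedged illustration, the Python snippet below checks the rule "if bread, then butter" against a few made-up shopping baskets. It counts how often the consequent appears in baskets that contain the antecedent, a ratio commonly called the rule's confidence; that measure and all of the data are assumptions added for the example.

```python
# Hypothetical transactions (shopping baskets)
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]

antecedent = {"bread"}   # the "if" part of the rule
consequent = {"butter"}  # the "then" part of the rule

# Baskets that contain the antecedent
with_antecedent = [t for t in transactions if antecedent <= t]
# Of those, baskets that also contain the consequent
with_both = [t for t in with_antecedent if consequent <= t]

confidence = len(with_both) / len(with_antecedent)
print(len(with_antecedent), len(with_both), confidence)
```

Three baskets contain bread and two of those also contain butter, so the rule holds in two out of three cases in this toy data.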
11. Write three advantages of supervised learning.

• Supervised learning lets you collect data and produce outputs based on
previous experience.
• It helps to optimize performance criteria with the help of experience.
• Supervised machine learning helps to solve various types of real-world
computation problems.
• It performs classification and regression tasks.
• It allows estimating or mapping the result to a new sample.
• We have complete control over choosing the number of classes we want in
the training data.

12. Outline the advantages of data partitioning.

• Improve scalability
• Improve availability
• Improve performance

13. Define simulation with a real-time example.

Simulation is the process of creating a model of a real-world scenario for a variety
of reasons, including education, preparing for an anticipated event, or
troubleshooting a problem. The models used during a simulation might be real or
dramatized.

A fire drill is an example of a simulation. It reenacts the real-world scenario of a
fire in a building or an environment with the purpose of teaching appropriate
actions in the event a real fire is encountered.

14. Explain random number generation with an example.

At the heart of any simulation model is the capability of creating
numbers that mimic those we would expect in real life. In simulation modeling
we assume that specific processes are distributed according to a
specific random variable. For instance, we may assume that an employee in a
donut shop takes a random time to serve customers, distributed according to a
Normal random variable with mean μ and variance σ². To carry
out a simulation, the computer then needs to generate random serving times. This
corresponds to simulating numbers that follow a specific
distribution.

Let’s consider an example. Suppose you managed to generate two sequences of
numbers, say x1 and x2. Your objective is to simulate numbers from a Normal
distribution. The histograms of the two sequences are reported in
Figure 4.1 together with the estimated shape of the density. Clearly the
sequence x1 could be following a Normal distribution, since it is bell-shaped
and reasonably symmetric. On the other hand, the sequence x2 is not symmetric
at all and does not resemble the density of a Normal.
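
The donut shop example can be sketched with Python's standard `random` module. The mean of 3 minutes, the standard deviation of 0.5 minutes, and the seed are assumed values chosen only for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
mu, sigma = 3.0, 0.5     # assumed mean and standard deviation, in minutes

# Generate 10,000 random serving times from a Normal(mu, sigma^2) distribution
serving_times = [rng.gauss(mu, sigma) for _ in range(10_000)]

# The sample statistics should be close to the parameters we asked for
sample_mean = statistics.mean(serving_times)
sample_sd = statistics.stdev(serving_times)
print(round(sample_mean, 2), round(sample_sd, 2))
```

Comparing the sample mean and standard deviation with μ and σ is a quick numerical check; plotting a histogram of `serving_times` would be the visual check that the sequence is bell-shaped and symmetric.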
15. How is ‘what-if analysis’ useful in analytics?

By using What-If Analysis tools in Excel, you can use several different sets of
values in one or more formulas to explore all the various results.
For example, you can do What-If Analysis to build two budgets that each assume
a certain level of revenue. Or you can specify a result that you want a formula to
produce, and then determine what sets of values will produce that result. Excel
provides several different tools to help you perform the type of analysis that fits
your needs.
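
The same two ideas can be sketched outside Excel. The Python snippet below is an illustration under assumed numbers: a hypothetical `profit` formula, two revenue scenarios, and a simple bisection search that plays the role of finding the input that produces a desired result.

```python
def profit(revenue, cost_rate=0.6, fixed_costs=20_000):
    """Hypothetical budget formula: revenue minus variable and fixed costs."""
    return revenue * (1 - cost_rate) - fixed_costs

# Scenario analysis: the same formula evaluated at two assumed revenue levels
scenarios = {"conservative": 80_000, "optimistic": 120_000}
budgets = {name: profit(revenue) for name, revenue in scenarios.items()}

def revenue_for_profit(target, low=0.0, high=1_000_000.0, tol=0.01):
    """Search for the revenue that yields `target` profit by bisection.

    Relies on profit() increasing with revenue over the range [low, high].
    """
    while high - low > tol:
        mid = (low + high) / 2
        if profit(mid) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

needed = revenue_for_profit(30_000)
print(budgets, round(needed))
```

The first part answers "what happens if revenue is X?", while the second answers "what revenue do we need to reach this profit?".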

16. What are the advantages and disadvantages of simulation techniques?

Advantages of Simulation
• Control over Variables
• Risk-Free Environment
• Cost-Effective

Disadvantages of Simulation
• Accuracy and Validity
• Data Requirements
• Simplification of Realities
• Technical Skills Required

17. Write any two disadvantages of decision tree analysis.

• Rather than displaying real outcomes, decision trees only show patterns
connected with decisions. Because decision trees don’t provide information on
aspects like implementation, timeliness, and prices, more research may be needed
to figure out whether a particular plan is viable.
• This type of model does not provide insight into why certain events are likely
while others are not, although it can be used to develop prediction models that
illustrate the chance of an event occurring in certain situations.

18. What are three application areas of the Monte Carlo simulation technique?

19. Differentiate between ‘Validation’ and ‘Verification’.

20. What are the two advantages of simulation?
