
1) Explain in detail drill up & drill down operations.

Drill Up
Drill-up lets you move to higher levels of the data.
You usually meet this operation as one half of the drill-up/drill-down pair. Drill-up is an operation that summarizes data from the cube, either by ascending a concept hierarchy for a dimension or by dimension reduction, in order to obtain measures at a less detailed granularity. To see this broader perspective in line with the concept hierarchy, the values of a hierarchy level are grouped and aggregated. Because fewer specifics are needed, one or more dimensions may be removed from the data cube when this operation is run. In some sources drill-up and roll-up are used as synonyms, so this variant is also possible.
Examples of drill-up (roll-up) hierarchies:
• Data based on a specific date can be grouped at levels such as: Year, Quarter, Month, Day, and Hour.
• Geographic data can be grouped at levels such as: Country, Province/State, City, and Postal Code.

Drill down
Drilling down means being able to go into lower levels of hierarchical data without the need to
change the graph.
Drill-down is the operation opposite to drill-up. It is carried out either by descending a concept hierarchy for a dimension or by adding a new dimension. It lets a user obtain highly detailed data from a less detailed cube. Consequently, when the operation is run, one or more dimensions are appended to the data cube to provide more detailed information elements.
Example of drill-down:
If you have data on a daily basis, it makes sense to default the dataset to show the bottom level, so that the most recent results are visible. Drilling down then becomes a mechanism that allows the user to easily see new results while exploring further trends.
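As a rough illustration, here is a minimal pandas sketch of both operations; the sales data and the country -> city hierarchy are hypothetical. Grouping by both levels plays the role of a drill-down view, and dropping the city level plays the role of a roll-up (drill-up).

```python
# A rough sketch of drill-down vs. roll-up using pandas groupby;
# the sales cube and column names are hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "country": ["US", "US", "US", "India", "India"],
    "city":    ["New York", "New York", "Boston", "Pune", "Pune"],
    "amount":  [100, 150, 200, 80, 120],
})

# Drill-down view: fine granularity, grouped by country AND city.
detailed = sales.groupby(["country", "city"])["amount"].sum()

# Roll-up (drill-up): ascend the hierarchy by dropping the city level.
rolled_up = sales.groupby("country")["amount"].sum()

print(detailed)
print(rolled_up)
```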

2) Suggest the use of Data Grouping & Sorting, Filtering Reports

Filtering, Grouping and Sorting the Data in Reports


In order to get the most out of your Tempo reports, you can apply a range of tools to sort the
data. This includes applying filters, organizing the information into groups, sorting the data by
columns, and adding columns to show Jira fields and work attributes.
Tempo remembers all your view settings and preferences between sessions.
• Select Reports in the Tempo sidebar to create or open a saved report.

Filtering Data
Selecting filters for the reports allows you to see only the information you need. You can, for
example, select to view time logged on a specific project or all planned work for your team.
The Filter by box at the top of the report shows which filters are applied.
To filter data in a report:
1. Click the Filter by box to display a list of filter options.
2. Select the data you want to include in the report.
• Use the search box to search for projects, teams, accounts, etc. To add a filter, select its
check-box. To remove a filter, clear the check-box or click the x beside its name in
the Filter by box.
• If you select to filter by issues, you can also choose to include sub-tasks.
• Click Back to return to the list of filters.

Grouping Data
Grouping data in your reports helps you to structure your information in a meaningful way.
The groups are displayed in the report according to the Jira hierarchy. In order to avoid duplicating data in its reports, Tempo assigns time-record data to the group to which it was added most recently. This logic applies to the following:
• Teams
• Roles
• Components
• Fix versions
This means that, if a time record is associated with an employee who is a member of multiple
teams, Tempo reports will place the time record under the team that the employee joined most
recently. Likewise, if a time record has multiple components, the one added most recently will
be reflected in the reports.
To group data in a report:
1. Click the Group by box to display a list of possible choices. Select the groups you want
to add.
2. To remove a level of grouping, click Group by, and then click x to the right of the group
level.

Sorting Data
Sorting by alphabetical or numerical order allows you to organize and display your report's
data differently. You can sort a report by the data in a particular column by clicking that
column’s heading. This then sorts data according to that column’s ascending or descending
order: the text is sorted from A to Z, numerical data is sorted from highest to lowest, and
time/date data is sorted from earliest to latest.
• Up and down arrows next to a column name indicate that data is being sorted by that
column.
• To reverse the sort order, click the column heading a second time.
• In a report with multiple grouping levels, data is grouped by the top-level group. For example, a report on planned work hours can be grouped by user and sorted so that the user with the most planned hours appears at the top.
3) What are Charts? List different charts used. Discuss pie chart in detail.
Charts are an essential part of working with data, as they are a way to condense large amounts
of data into an easy to understand format. Visualizations of data can bring out insights to
someone looking at the data for the first time, as well as convey findings to others who won’t
see the raw data. There are countless chart types out there, each with different use cases. Often,
the most difficult part of creating a data visualization is figuring out which chart type is best for
the task at hand.

1) Line Charts
2) Bar Charts
3) Area Charts
4) Column Charts
5) Combo Charts
6) Pie Charts
7) Decomposition Tree
8) Funnel Charts
9) Waterfall Charts

Pie Charts
When it comes to statistical types of graphs and charts, the pie chart (or the circle chart) has a
crucial place and meaning. It displays data and statistics in an easy-to-understand ‘pie-slice’
format and illustrates numerical proportion.
Each pie slice is sized relative to a particular category's share of the group as a whole. In other words, the pie chart breaks down a group into smaller pieces and shows part-to-whole relationships.
To make a pie chart, you need a categorical variable plus a numerical value for each category.
Pie Chart Uses:
▪ When you want to create and represent the composition of something.
▪ It is very useful for displaying nominal or ordinal categories of data.
▪ To show percentage or proportional data.
▪ When comparing areas of growth within a business such as profit.
▪ Pie charts work best for displaying data for 3 to 7 categories.
Example:
The pie chart below represents the proportion of types of transportation used by 1000 students to
go to their school.

Pie charts are widely used by data-driven marketers for displaying marketing data.
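As a rough illustration, the following matplotlib sketch draws a pie chart like the one described; the four transport categories and their counts are hypothetical values chosen to sum to the 1000 students in the example.

```python
# A minimal pie-chart sketch for the transportation example above;
# the category counts are hypothetical and total 1000 students.
import matplotlib.pyplot as plt

modes = ["Walk", "Bicycle", "Bus", "Car"]
students = [250, 150, 450, 150]  # hypothetical counts, total = 1000

plt.pie(students, labels=modes, autopct="%1.1f%%", startangle=90)
plt.title("How 1000 students travel to school")
plt.axis("equal")  # draw the pie as a circle
plt.show()
```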
4) What is a File Extension? Explain the structure of .CSV file.
File extensions are used by the operating system to identify what apps are associated with what file
types—in other words, what app opens when you double-click the file. For example, a file named
“awesome_picture.jpg” has the “jpg” file extension. When you open that file in Windows, for
example, the operating system looks for whatever app is associated with JPG files, opens that app,
and loads the file.

Types Of Extensions:

• DOC/DOCX: A Microsoft Word document. DOC was the original extension used for Word
documents, but Microsoft changed the format when Word 2007 debuted. Word documents are
now based on the XML format, hence the addition of the “X” at the end of the extension.
• XLS/XLSX: A Microsoft Excel spreadsheet.
• PNG: Portable Network Graphics, a lossless image file format.
• HTM/HTML: The HyperText Markup Language format for creating web pages online.
• PDF: The Portable Document Format originated by Adobe, and is used to maintain formatting
in distributed documents.
• EXE: An executable format used for programs you can run.

.CSV file
A CSV is a comma-separated values file, which allows data to be saved in a tabular format. CSVs
look like a garden-variety spreadsheet but with a .csv extension.
CSV files can be used with almost any spreadsheet program, such as Microsoft Excel or Google Sheets. They differ from other spreadsheet file types in that you can only have a single sheet in a file, and they cannot save cell, column, or row formatting. You also cannot save formulas in this format.
How do I save CSV files?
Saving CSV files is relatively easy; you just need to know where to change the file type.
In the "Save As" dialog, under the "File name" field, select "Save as type" and change it to "CSV (Comma delimited) (*.csv)". Once that option is selected, you are on your way to quicker and easier data organization. This works the same way on both Apple and Microsoft operating systems.
CSV File Format
Usually the first line in a CSV file contains the table column labels. Each of the subsequent lines
represent a row of the table. Commas separate each cell in the row, which is where the name comes
from.
Here is an example of a CSV file. The example has three columns, labeled 'name', 'id', and 'favorite food'. It has five rows including the header row.
name, id, favorite food
quincy, 1, hot dogs
beau, 2, cereal
abbey, 3, pizza
mrugesh, 4, ice cream
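As a rough illustration, here is a minimal Python sketch that parses the example above with the standard csv module, assuming it has been saved to a hypothetical file named students.csv:

```python
# A rough sketch parsing the CSV example above with Python's
# built-in csv module; the file name students.csv is hypothetical.
import csv

with open("students.csv", newline="") as f:
    # skipinitialspace strips the spaces after each comma.
    reader = csv.DictReader(f, skipinitialspace=True)
    for row in reader:
        print(row["name"], row["id"], row["favorite food"])
```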
5) How Data Grouping & Sorting is useful in reporting. Justify with
suitable example.
Grouping Data:
After designing the basic layout, you may decide that grouping the records by certain fields or
other criteria would make the report easier to read. Grouping allows you to separate groups of
records visually and display introductory and summary data for each group. The group break is
based on a grouping expression. This expression is usually based on one or more recordset fields
but it can be as complex as you like.
You can group the data in your reports using the C1ReportDesigner application or using code.

Sorting Data:
You can sort data in reports the following two ways:
• Sort the data source object itself (for example, using a SQL statement with an ORDER BY
clause).
• Add groups to the report and specify how each group should be sorted using the
group's GroupBy and Sort properties.
Group sorting is done using the DataView.Sort property, which takes a list of column names
only (not expressions on column names). So if your grouping expression is DatePart("yyyy",
dateColumn), the control will actually sort on the dates in the dateColumn field, not on the years
of those dates as most would expect.
To sort on the years instead, add a calculated column to the data table (by changing the SQL
statement), and then group/sort on the calculated column. See the Sort property documentation for a
discussion of this, including a sample.
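The same calculated-column idea can be sketched outside C1Report; here is a minimal, hypothetical pandas equivalent that derives a year column from a date column and then groups and sorts on it:

```python
# A rough pandas sketch of the calculated-column approach above:
# derive a year column first, then group and sort on it.
# The task data and column names are hypothetical.
import pandas as pd

tasks = pd.DataFrame({
    "user":  ["amy", "bob", "amy", "bob"],
    "date":  pd.to_datetime(["2023-03-01", "2022-07-15",
                             "2022-01-20", "2023-11-05"]),
    "hours": [5, 8, 3, 6],
})

tasks["year"] = tasks["date"].dt.year          # calculated column
report = (tasks.groupby(["year", "user"])["hours"]
               .sum()
               .sort_values(ascending=False))  # most hours first
print(report)
```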

6) State different types of reports with their application


List report lists data in rows and columns, where columns typically contain data items but may
also contain a variety of layout objects, such as images or URLs. It is usually used when reporting
transaction or definition details, such as customer lists and invoice lines.
Crosstab report shows data in rows and columns in a pivot table fashion. The values at the
intersection points of rows and columns are calculated and provide summarized information, such
as sum, average, standard deviation, percentile, etc. Crosstabs are useful for displaying large
amounts of precise data in a tabular, summarized format.
Map report presents data geographically. For instance, map charts can be used to depict the level
of revenue generated by divisions around the world.
Chart report visualizes data graphically. There is a large number of chart types available in
Cognos 8. Charts are useful for identifying relationships between items, comparing data and
understanding trends.

Financial report provides a template for creating such financial reports as profit/loss statements,
customer profitability records and balance sheets.
7) What is Data Reduction? Explain Types of Data Reduction in detail

Data reduction
Data reduction is a technique used in data mining to reduce the size of a dataset while still
preserving the most important information. This can be beneficial in situations where the
dataset is too large to be processed efficiently, or where the dataset contains a large amount of
irrelevant or redundant information.
When facing a large dataset it is also appropriate to reduce its size, in order to make learning
algorithms more efficient, without sacrificing the quality of the results obtained.
There are three main criteria to determine whether a data reduction technique should be used:
▪ Efficiency. The application of learning algorithms to a dataset smaller than the original one
usually means a shorter computation time.
▪ Accuracy. Data reduction techniques should not significantly compromise the accuracy of the
model generated.
▪ Simplicity. It is important that the models generated be easily translated into simple rules that
can be understood by experts in the application domain.

There are several different data reduction techniques that can be used in data mining,
including:
1. Data Sampling: This technique involves selecting a subset of the data to work with, rather
than using the entire dataset. This can be useful for reducing the size of a dataset while still
preserving the overall trends and patterns in the data.
2. Dimensionality Reduction: This technique involves reducing the number of features in the
dataset, either by removing features that are not relevant or by combining multiple features
into a single feature.
3. Data Compression: This technique involves using techniques such as lossy or lossless
compression to reduce the size of a dataset.
4. Data Discretization: This technique involves converting continuous data into discrete data
by partitioning the range of possible values into intervals or bins.
5. Feature Selection: This technique involves selecting a subset of features from the dataset
that are most relevant to the task at hand.

Data reduction can be pursued in three distinct directions:
▪ reduction in the number of observations through sampling
▪ reduction in the number of attributes through selection and projection
▪ reduction in the number of values through discretization and aggregation
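As a rough illustration, this minimal pandas sketch applies all three directions to a hypothetical dataset (the column names, sizes, and bin labels are invented):

```python
# A rough sketch of the three data-reduction directions listed above,
# applied to a hypothetical pandas DataFrame.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age":    rng.integers(18, 70, size=1000),
    "income": rng.normal(50_000, 15_000, size=1000),
    "noise":  rng.random(1000),          # an irrelevant attribute
})

# 1) Fewer observations: draw a 10% random sample.
sampled = df.sample(frac=0.10, random_state=0)

# 2) Fewer attributes: drop a column deemed irrelevant.
selected = sampled.drop(columns=["noise"])

# 3) Fewer values: discretize income into 5 bins.
selected["income_band"] = pd.cut(selected["income"], bins=5,
                                 labels=["very low", "low", "medium",
                                         "high", "very high"])
print(selected.head())
```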
8) Difference between Univariate, Bivariate, Multivariate analysis.
Univariate Analysis
Univariate analysis is the simplest form of data analysis where the data being analyzed contains
only one variable. Since it's a single variable it doesn’t deal with causes or relationships. The
main purpose of univariate analysis is to describe the data and find patterns that exist within it.
You can think of the variable as a category that your data falls into. One example of a variable in
univariate analysis might be "age". Another might be "height". Univariate analysis would not
look at these two variables at the same time, nor would it look at the relationship between them.
Some ways you can describe patterns found in univariate data include looking at mean, mode,
median, range, variance, maximum, minimum, quartiles, and standard deviation. Additionally,
some ways you may display univariate data include frequency distribution tables, bar charts,
histograms, frequency polygons, and pie charts.

Bivariate Analysis
Bivariate analysis is used to find out if there is a relationship between two different variables.
Something as simple as creating a scatterplot by plotting one variable against another on a
Cartesian plane (think X and Y axis) can sometimes give you a picture of what the data is trying
to tell you. If the data seems to fit a line or curve then there is a relationship or correlation between
the two variables. For example, one might choose to plot caloric intake versus weight.
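As a rough illustration, this minimal pandas/matplotlib sketch contrasts the two: a univariate summary of one variable, then a bivariate correlation and scatterplot for the caloric-intake example (all numbers are hypothetical):

```python
# A rough sketch contrasting univariate and bivariate analysis;
# the caloric-intake data is hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "calories": [1800, 2200, 2500, 2000, 2900, 2400],
    "weight":   [60, 70, 78, 66, 90, 75],
})

# Univariate: describe one variable at a time.
print(df["weight"].describe())   # mean, quartiles, min/max, std

# Bivariate: look for a relationship between two variables.
print(df["calories"].corr(df["weight"]))  # correlation coefficient
df.plot.scatter(x="calories", y="weight")
plt.show()
```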

Multivariate Analysis
Multivariate analysis is the analysis of three or more variables. There are many ways to perform
multivariate analysis depending on your goals. Some of these methods include:
▪ Additive Tree
▪ Canonical Correlation Analysis
▪ Cluster Analysis
▪ Factor Analysis
▪ Generalized Procrustean Analysis
▪ MANOVA
▪ Multidimensional Scaling
▪ Multiple Regression Analysis
▪ Partial Least Square Regression
▪ Redundancy Analysis
▪ Correspondence Analysis / Multiple Correspondence Analysis
▪ Principal Component Analysis / Regression / PARAFAC

9) Explain the following data reduction techniques: Sampling, Feature selection, Principal component analysis.
Sampling
Reduction in the size of the original dataset can be achieved by extracting a sample of
observations that is significant from a statistical standpoint.
Sampling is used in preliminary investigation as well as final analysis of data. Sampling is
important in data mining as processing the entire data set is expensive and time consuming.
Types of Sampling:
a) Simple random sampling: There is an equal probability of selecting any particular item.
b) Sampling without replacement: As each item is selected it is removed from the population.
c) Sampling with replacement: The object selected for the sample is not removed from the
population. In this technique the same object may be selected multiple times.
d) Stratified sampling: The data is split into partitions and samples are drawn from each partition
randomly.
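As a rough illustration, here is a minimal pandas sketch of these sampling types on a hypothetical ten-row population:

```python
# A rough pandas sketch of the sampling types above on a
# hypothetical ten-row population.
import pandas as pd

population = pd.DataFrame({
    "id":     range(1, 11),
    "region": ["north"] * 5 + ["south"] * 5,
})

# a) + b) Simple random sampling without replacement (the default):
# every row is equally likely, and a selected row is not re-drawn.
without_repl = population.sample(n=4, random_state=0)

# c) Sampling with replacement: the same row may be selected twice.
with_repl = population.sample(n=4, replace=True, random_state=0)

# d) Stratified sampling: partition by region, then sample each stratum.
stratified = population.groupby("region").sample(n=2, random_state=0)

print(stratified)
```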

Feature selection
The purpose of feature selection, also called feature reduction, is to eliminate from the dataset
a subset of variables which are not deemed relevant for the purpose of the data mining activities.
Feature selection methods can be classified into three main categories: filter methods, wrapper
methods and embedded methods.
▪ Filter methods. Filter methods select the relevant attributes before moving on to the
subsequent learning phase, and are therefore independent of the specific algorithm being used.
▪ Wrapper methods. Wrapper methods wrap the attribute selection around the learning algorithm
itself: candidate subsets of attributes are evaluated by training and testing the model, so the
selection is tailored to the specific algorithm used.
▪ Embedded methods. For the embedded methods, the attribute selection process lies inside the
learning algorithm, so that the selection of the optimal set of attributes is directly made during
the phase of model generation.
In particular, three distinct myopic search schemes can be followed: forward, backward and
forward–backward search.
▪ Forward. The forward search scheme, also referred to as bottom-up search, starts from an
empty set of attributes and adds one attribute at a time.
▪ Backward. The backward search scheme, also referred to as top-down search, starts from the
full set of attributes and removes one attribute at a time.
▪ Forward–backward. The forward–backward method represents a trade-off between forward
and backward search, in which attributes may be both added and removed; the search stops
when an appropriate subset is found.
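As a rough illustration of a wrapper-style search, the following scikit-learn sketch runs forward selection with SequentialFeatureSelector on a toy dataset; the estimator and the number of features to keep are arbitrary choices:

```python
# A rough sketch of wrapper-style forward feature selection using
# scikit-learn's SequentialFeatureSelector on a toy dataset.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=2,
    direction="forward",   # use "backward" for top-down search
)
selector.fit(X, y)
print(selector.get_support())  # boolean mask of the selected attributes
```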

Principal component analysis


Principal component analysis (PCA) is the most widely known technique of attribute reduction
by means of projection. Generally speaking, the purpose of this method is to obtain a projective
transformation that replaces a subset of the original numerical attributes with a lower number of
new attributes obtained as their linear combination, without this change causing a loss of
information. PCA is mostly used as a tool in exploratory data analysis and for making predictive
models. It is often used to visualize genetic distance and relatedness between populations.

Applications of PCA:
o Quantitative finance: principal component analysis can be directly applied to the risk
management of interest rate derivative portfolios.
o Neuroscience: principal components analysis is used in neuroscience to identify the specific
properties of a stimulus that increase a neuron's probability of firing.
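As a rough illustration, this minimal scikit-learn sketch projects four numerical attributes onto two principal components on a toy dataset:

```python
# A rough sketch of PCA as attribute reduction by projection:
# 4 numerical attributes are replaced by 2 principal components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)  # PCA is scale-sensitive

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                # (150, 2): fewer attributes
print(pca.explained_variance_ratio_)  # information retained per component
```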
10) What is a Contingency Table? What is Marginal Distribution? Justify with suitable example.
Contingency tables are used to evaluate the interaction of statistics from two different
categorical variables. They are often used to organize data from different random variables in
preparation for a contingency test.
Contingency tables are sometimes called two-way tables because they are organized with the
outputs of one variable across the top, and another down the side. Consider the table below:

                  Male   Female
Chocolate Candy    42      77
Fruit Candy        58      23
This is a contingency table comparing the variable ‘Gender’ with the variable ‘Candy
Preference’. You can see that, across the top of the table are the two gender options for this
particular study: ‘male students’ and ‘female students’. Down the left side are the two candy
preference options: ‘chocolate’ and ‘fruit’. The data in the center of the table indicates the
reported candy preferences of the 100 students polled during the study.

Example Contingency Table

Consider a contingency table of computer sales at a fictional store. Specifically, it describes
sales frequencies by the customer's gender and the type of computer purchased (PC or Mac),
making it a two-way (2 x 2) table.

In this contingency table, columns represent computer types and rows represent genders. Cell
values are frequencies for each combination of gender and computer type. Totals are in the
margins, with the grand total in the bottom-right margin.

Marginal Distribution
These distributions represent the frequency distribution of one categorical variable without
regard for other variables. Unsurprisingly, you can find these distributions in the margins of a
contingency table.
The following marginal distribution examples come from the margins of that table.
For example, the marginal distribution of gender without considering computer type is the
following:
Males: 106
Females: 117
Alternatively, the marginal distribution of computer types is the following:
PC: 96
Mac: 127
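As a rough illustration, the following pandas sketch rebuilds such a table with pd.crosstab; the individual cell counts (45, 51, 61, 66) are hypothetical but chosen to reproduce the marginal totals quoted above:

```python
# A rough sketch building a contingency table with marginal totals.
# The cell counts are hypothetical, chosen so the margins match the
# example (Males 106, Females 117; PC 96, Mac 127; grand total 223).
import pandas as pd

sales = pd.DataFrame({
    "gender":   ["Male"] * 45 + ["Female"] * 51     # PC buyers
              + ["Male"] * 61 + ["Female"] * 66,    # Mac buyers
    "computer": ["PC"] * 96 + ["Mac"] * 127,
})

table = pd.crosstab(sales["gender"], sales["computer"], margins=True)
print(table)
# The "All" row/column holds the marginal distributions;
# the bottom-right cell is the grand total (223).
```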

11)Write a short note on data discretization.


Data discretization is a primary reduction method: it reduces continuous attributes to categorical
attributes characterized by a limited number of distinct values.
For example, the weekly spending of a mobile phone customer is a continuous numerical value, which
might be discretized into, say, five classes:
low, [0, 10) euros; medium low, [10, 20) euros; medium, [20, 30) euros; medium high, [30, 40) euros; and high, over 40 euros.
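As a rough illustration, here is a minimal pandas sketch of exactly this five-class discretization; the spending values are hypothetical:

```python
# A rough sketch of the five spending classes above using pd.cut;
# the weekly spending values (in euros) are hypothetical.
import numpy as np
import pandas as pd

spending = pd.Series([4.5, 12.0, 25.3, 33.8, 47.1, 18.9])

bins   = [0, 10, 20, 30, 40, np.inf]
labels = ["low", "medium low", "medium", "medium high", "high"]

classes = pd.cut(spending, bins=bins, labels=labels, right=False)
print(classes)
# right=False makes the intervals left-closed: [0, 10), [10, 20), ...
```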

Discretization techniques
▪ Subjective subdivision. Subjective subdivision is the most popular and intuitive method.
Classes are defined based on the experience and judgment of experts in the application domain.
▪ Subdivision into classes. Subdivision can be based on classes of equal size or equal width.
▪ Hierarchical discretization. Discretization is based on hierarchical relationships between
concepts and may be applied to categorical attributes, as with the hierarchical relationship
between provinces and regions.
12) Write Short note on Logistic Regression
Logistic regression is used for predicting the categorical dependent variable using a given set of
independent variables.
Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome
must be a categorical or discrete value: Yes or No, 0 or 1, True or False, etc. However, instead of
giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.
Logistic regression is quite similar to linear regression except in how it is used: linear
regression is used for solving regression problems, whereas logistic regression is used for
solving classification problems.
Logistic Regression Equation:
We use the logistic function, or sigmoid function, to calculate probability in logistic regression.
The logistic function is a simple S-shaped curve that converts any real-valued input z (a linear
combination of the independent variables) into a value between 0 and 1:

f(z) = 1 / (1 + e^(-z))
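As a rough illustration, this minimal scikit-learn sketch fits a binomial logistic regression on the built-in breast-cancer dataset, echoing the benign/malignant health-care use case mentioned below:

```python
# A rough sketch of binomial logistic regression with scikit-learn;
# predict_proba returns the probabilities between 0 and 1 described above.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)  # benign vs. malignant
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print(model.predict(X_test[:3]))        # discrete 0/1 class labels
print(model.predict_proba(X_test[:3]))  # probabilities in [0, 1]
```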

Type of Logistic Regression:


o Binomial: In binomial Logistic regression, there can be only two possible types of the
dependent variables, such as 0 or 1, Pass or Fail, etc.
o Multinomial: In multinomial Logistic regression, there can be 3 or more possible unordered
types of the dependent variable, such as "cat", "dogs", or "sheep"
o Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of
dependent variables, such as "low", "Medium", or "High".

Use logistic regression


Logistic regression is used to solve classification problems, and the most common use case
is binary logistic regression, where the outcome is binary (yes or no). In the real world, you can
see logistic regression applied across multiple areas and fields.

• In health care, logistic regression can be used to predict if a tumor is likely to be benign or
malignant.
• In the financial industry, logistic regression can be used to predict if a transaction is fraudulent
or not.
• In marketing, logistic regression can be used to predict if a targeted audience will respond or
not.
13)Discuss the classification evaluation model using Confusion matrix, Recall,
Precision and Accuracy

Confusion matrix
It is a matrix of size 2×2 for binary classification, with actual values on one axis and predicted
values on the other.

The terms in the confusion matrix are true positive, true negative, false negative, and false
positive, illustrated with an example below.

EXAMPLE
A machine learning model is trained to predict tumors in patients. The test dataset consists of 100
people. The confusion matrix for tumor detection is:

                    Predicted: Tumor   Predicted: No Tumor
Actual: Tumor            TP = 10             FN = 8
Actual: No Tumor         FP = 22             TN = 60


True Positive (TP) — model correctly predicts the positive class (prediction and actual both
are positive). In the above example, 10 people who have tumors are predicted positively by the
model.
True Negative (TN) — model correctly predicts the negative class (prediction and actual both
are negative). In the above example, 60 people who don’t have tumors are predicted negatively
by the model.
False Positive (FP) — model gives the wrong prediction of the negative class (predicted-
positive, actual-negative). In the above example, 22 people are predicted as positive of having a
tumor, although they don’t have a tumor. FP is also called a TYPE I error.
False Negative (FN) — model wrongly predicts the positive class (predicted-negative, actual-
positive). In the above example, 8 people who have tumors are predicted as negative. FN is
also called a TYPE II error.
With the help of these four values, we can calculate the True Positive Rate (TPR), False Positive
Rate (FPR), True Negative Rate (TNR), and False Negative Rate (FNR).
Precision, Recall
Both precision and recall are crucial in information retrieval, where the positive class matters
most compared to the negative. Why?
When searching for something on the web, the model does not care about what is irrelevant and
not retrieved (the true negative case). Therefore only TP, FP, and FN are used in precision and
recall.
Precision
Out of all the predicted positives, what percentage is truly positive:

Precision = TP / (TP + FP)

The precision value lies between 0 and 1.


Recall
Out of the total actual positives, what percentage is predicted positive. It is the same as the TPR
(true positive rate):

Recall = TP / (TP + FN)
EXAMPLE 1 — Spam detection

In spam detection, it is acceptable if some spam mail remains undetected (a false negative), but
missing a critical mail because it was classified as spam (a false positive) is a serious problem.
In this situation, false positives should be as low as possible, so precision is more vital than
recall.
Accuracy:

Accuracy represents the number of correctly classified data instances over the total number of
data instances:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
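As a rough illustration, this sketch reconstructs the tumor example's 100 predictions from the four counts and checks the metrics with scikit-learn:

```python
# A rough sketch computing the metrics above from the tumor example
# (TP = 10, TN = 60, FP = 22, FN = 8).
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Rebuild the 100 labeled predictions implied by the four counts.
y_true = [1] * 10 + [0] * 60 + [0] * 22 + [1] * 8
y_pred = [1] * 10 + [0] * 60 + [1] * 22 + [0] * 8

print(confusion_matrix(y_true, y_pred))  # [[60 22] [ 8 10]]
print(precision_score(y_true, y_pred))   # 10 / (10 + 22) = 0.3125
print(recall_score(y_true, y_pred))      # 10 / (10 + 8)  ≈ 0.556
print(accuracy_score(y_true, y_pred))    # (10 + 60) / 100 = 0.70
```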
14) How does Clustering differ from Classification? Discuss the K-Means partitioning method with a suitable example.

Classification vs. Clustering:

• Classification is a supervised learning approach where a specific label is provided to the
machine to classify new observations; the machine needs proper training and testing for label
verification. Clustering is an unsupervised learning approach where grouping is done on the basis
of similarities.
• Classification uses a training dataset; clustering does not use a training dataset.
• Classification uses algorithms to categorize new data as per the observations of the training
set; clustering uses statistical concepts in which the dataset is divided into subsets with the
same features.
• In classification, there are labels for the training data; in clustering, there are no labels
for the training data.
• The objective of classification is to find which class a new object belongs to from the set of
predefined classes; the objective of clustering is to group a set of objects and find whether
there is any relationship between them.
• Classification is more complex than clustering; clustering is less complex than classification.

What is K-Means Algorithm?


K-Means Clustering is an Unsupervised Learning algorithm, which groups the unlabeled dataset
into different clusters. Here K defines the number of pre-defined clusters that need to be created
in the process, as if K=2, there will be two clusters, and for K=3, there will be three clusters, and
so on.
It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a
way that each data point belongs to only one group, whose members share similar properties.
It allows us to cluster the data into different groups and a convenient way to discover the
categories of groups in the unlabeled dataset on its own without the need for any training.

The k-means clustering algorithm mainly performs two tasks:

o Determines the best value for the K center points or centroids by an iterative process.
o Assigns each data point to its closest k-center. The data points near a particular k-center
form a cluster.

Hence each cluster has data points with some commonalities, and each is kept apart from the other
clusters.

How does the K-Means Algorithm Work?

Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as centroids (they need not be points from the input dataset).
Step-3: Assign each data point to its closest centroid, which forms the predefined K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.
Step-5: Repeat the third step, i.e., reassign each data point to the new closest centroid of its
cluster.
Step-6: If any reassignment occurred, go to Step-4; otherwise go to FINISH.
Step-7: The model is ready.
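As a rough illustration, these steps are what scikit-learn's KMeans performs internally; here is a minimal sketch on a toy 2-D dataset with K = 3:

```python
# A rough K-means sketch with scikit-learn on a toy 2-D dataset (K = 3).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(kmeans.labels_[:10])      # cluster assigned to each data point
print(kmeans.cluster_centers_)  # final centroids after convergence
```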

15) What are association rules? How to evaluate them using Support and
Confidence? Explain with Example
Association Rule
Association rule mining finds interesting associations and relationships among large sets of
data items. An association rule shows how frequently an itemset occurs in a transaction.
A typical example is Market Basket Analysis, one of the key techniques used by large retailers to
show associations between items. It allows retailers to identify relationships between the items
that people frequently buy together.
Association rule mining is employed in market basket analysis, web usage mining, continuous
production, etc. We can understand it through the example of a supermarket, where all products
that are purchased together are placed together.

For example, if a customer buys bread, he will most likely also buy butter, eggs, or milk, so these
products are stored on the same shelf or nearby.
Association rule learning can be divided into three types of algorithms:
1. Apriori
2. Eclat
3. F-P Growth Algorithm
How does Association Rule Learning work?
Association rule learning works on the concept of an If-Then statement, such as if A then B.

Here the If element is called the antecedent, and the Then statement is called the consequent.
Relationships in which we can find an association between two items are known as single
cardinality. Association rule learning is all about creating rules, and as the number of items
increases, cardinality increases accordingly. To measure the associations between thousands of
data items, there are several metrics, given below:

Support
Support is the frequency of A, or how frequently an itemset appears in the dataset. It is defined
as the fraction of the transactions T that contain the itemset X. It can be written as:

Support(X) = (number of transactions containing X) / (total number of transactions)

Confidence
Confidence indicates how often the rule has been found to be true, i.e., how often the items X and
Y occur together in the dataset given that X has already occurred. It is the ratio of the
transactions that contain both X and Y to the number of transactions that contain X:

Confidence(X → Y) = Support(X ∪ Y) / Support(X)
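As a rough illustration, this minimal Python sketch computes support and confidence for the rule bread → butter over five hypothetical transactions:

```python
# A rough sketch of support and confidence for the rule bread -> butter;
# the five transactions are hypothetical.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

n = len(transactions)
bread        = sum(1 for t in transactions if "bread" in t)
bread_butter = sum(1 for t in transactions if {"bread", "butter"} <= t)

support    = bread_butter / n      # 3 / 5 = 0.6
confidence = bread_butter / bread  # 3 / 4 = 0.75
print(f"support={support}, confidence={confidence}")
```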

16)Explain the Use of WEKA tool in Business Intelligence

Weka
Weka contains a collection of visualization tools and algorithms for data analysis and predictive
modelling, together with graphical user interfaces for easy access to these functions. The original
non-Java version of Weka was a Tcl/Tk front-end to (mostly third-party) modelling algorithms
implemented in other programming languages, plus data preprocessing utilities in C and a
makefile-based system for running machine learning experiments.

Weka has the following advantages, such as:


o Free availability under the GNU General Public License.
o Portability, since it is fully implemented in the Java programming language and thus runs on
almost any modern computing platform.
o A comprehensive collection of data preprocessing and modelling techniques.
o Ease of use due to its graphical user interfaces.

Weka supports several standard data mining tasks, specifically data preprocessing, clustering,
classification, regression, visualization, and feature selection. Input to Weka is expected to be
formatted according to the Attribute-Relation File Format (ARFF), with the .arff filename extension.
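As a rough illustration, a minimal hypothetical ARFF file could look like this (the relation name, attributes, and values are invented):

```
% A rough, hypothetical ARFF example for a tiny weather dataset.
@relation weather

@attribute temperature numeric
@attribute play {yes, no}

@data
85,no
68,yes
72,yes
```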

Features of Weka
1. Preprocess
The preprocessing of data is a crucial task in data mining. Because most data is raw, it may
contain empty or duplicate values, garbage values, outliers, extra columns, or inconsistent naming
conventions. All these things degrade the results.
To make data cleaner, better and comprehensive, WEKA comes up with a comprehensive set of
options under the filter category. Here, the tool provides both supervised and unsupervised types
of operations.
2. Classify
Classification is one of the essential functions in machine learning, where we assign classes or
categories to items. The classic examples of classification are: declaring a brain tumour as
"malignant" or "benign" or assigning an email to a "spam" or "not_spam" class.
After selecting the desired classifier, we select test options for the training set. Some of the
options are:
o Use training set: the classifier will be tested on the same training set.
o A supplied test set: evaluates the classifier based on a separate test set.
o Cross-validation Folds: assessment of the classifier based on cross-validation using the
number of provided folds.
o Percentage split: the classifier will be judged on a specific percentage of data.

3. Cluster
In clustering, a dataset is arranged in different groups/clusters based on some similarities. In this
case, the items within the same cluster are identical but different from other clusters. Examples
of clustering include identifying customers with similar behaviours and organizing the regions
according to homogenous land use.
4. Associate
Association rules highlight all the associations and correlations between items of a dataset. In
short, it is an if-then statement that depicts the probability of relationships between data items.
A classic example of association refers to a connection between the sale of milk and bread.
The tool provides Apriori, FilteredAssociator, and FPGrowth algorithms for association
rules mining in this category.
5. Select Attributes
Every dataset contains a lot of attributes, but several of them may not be significantly valuable.
Removing the unnecessary attributes and keeping the relevant ones is very important for building a
good model. Weka provides many attribute evaluators and search methods, including BestFirst,
GreedyStepwise, and Ranker.
6. Visualize
In the visualize tab, different plot matrices and graphs are available to show the trends and errors
identified by the model.

17) Write a short note on BI Applications in Logistics and Production

Applications of Business Intelligence in Logistics:


The benefits of using these types of BI and big data tools are already tangible and measurable.
Many companies are moving toward a new, more efficient and competitive way of doing business
because of them.

Accuracy and clarity of information: We can see the status and evolution of the whole business at
a glance, without the need to consult different sources and cross-reference data in spreadsheets.
It also shortens the time it takes to train employees.
Information updates: Business Intelligence tools update automatically whenever new data exists,
minimizing reporting and update times.
More agile and responsive: If something happens in the operation that will affect the business,
alerts are received immediately, which helps accelerate decision making.
Fewer bottlenecks: With appropriate permissions, the information can be easily accessed by users,
so there is no need to ask each area manager for the data required to generate status reports.
Broader context of information: Through the creation of evolutionary reports, it is easy to
understand the data we manage. Moreover, these kinds of tools offer visual information such as
shipment tracking maps.
18) Comment “How might you implement business intelligence findings within
an organization?”

1. Create a business intelligence strategy


A business intelligence strategy is a blueprint that allows any company to measure its
performance, expose shortcomings, improve competitive advantages, and use data mining and
analytics for successful decision making. Any implementation is impossible without a clear
understanding of the key elements:
• What’s your objective?
• What do you have?
• What do you need?
Once you can answer these questions, you can start working on your strategy or roadmap.
Depending on the maturity level of the company, its previous experience in BI adoption or lack
thereof, and the size of the company, the final results may vary.
2. Set the Key Performance Indicators
Once you have gathered enough information, it is important to define the KPIs you are going to
track on the company-wide scale, and KPIs to track within the departments. Your KPIs should
be measurable, matching your objectives, and vital for achieving your business goals.
3. Appoint stakeholders and educate the staff
One of the first business intelligence implementation challenges is a human tendency to resist
change. The most effective way to minimize resistance is to educate your personnel. If your
company has no previous experience with BI, you must explain how each department can benefit
from BI implementation. You should also determine key stakeholders in each department. They
will help you collect and prioritize pain points and key performance indicators (KPIs) across the
company.
4. Build a strong BI team or outsource
BI is a cathedral. And every team member plays a vital role in its strength, majesty and beauty.
Or can contribute to its failure. A BI team that has a clear vision and remains focused on that
vision throughout their tasks will work with synergy, enthusiasm and creativity that can never be
nurtured in a scattered group of individuals, regardless of their talent and experience. ~ Douglas
McDowell, CEO of SolidQ North America
The technical stack behind business intelligence solutions can be versatile and depends on the
chosen software and your business needs. Some of the most common technologies used are MS SQL,
Oracle, MySQL, HBase, BigSQL, data lakes, AWS Redshift, Apache Spark & Hadoop, SSIS, SSAS,
Pentaho, Tableau, QlikView, Power BI, and others.
5. Find the best software for your needs
The selection of tools will vary upon the requirement and budget. However, it is crucial to
understand and evaluate these factors while choosing a software solution:
• Do you have access to data and a convenient view of the relevant information?
• Does this system offer integration within the existing systems or APIs to connect to your
systems?
• How can you interact with data within a visual interface of the software?
• Can you collaborate with others on data analysis and share visualized analytics?
• Will you be able to dive deeper into data and discover new insights on your own?

6. Choose your data storage, environment, and platform


If you don’t have the infrastructure, it’s a good idea to start by choosing your data storage
option. Typically, a data warehouse is considered the more suitable choice for Business
Intelligence implementation, as it provides analysis of relational data coming from both online
transaction processing (OLTP) systems and business apps (e.g., ERP, CRM, and HRM systems).
However, many companies adopt both types of data storage, reaching the maximum potential of their
BI systems.

7. Finetune your data preparation process


Often, big organizations struggle with large amounts of useless data, a.k.a. 'data silos'. This
happens when teams or departments use different tools, have entirely different approaches, and
keep data to themselves. Typically, data preparation takes up to 80% of BI development time.
Any successful business intelligence implementation relies heavily on high-quality data.
According to one study, more than 63% of respondents say that data preparation for business
intelligence implementation is either 'very important' or 'critical'.

8. Consider more advanced solutions


As of 2019, 91.6% of global companies said they are increasing their investments in big data and
AI [2]. If you rely heavily on Business Intelligence, its implementation might be even more
beneficial if you incorporate more advanced technologies like Machine Learning. It can also help
you:
• Build optimized data pipelines;
• Achieve real-time data analysis;
• Make actual forecasts;
• And analyze larger sets of data.

9. Implement the PoC or a pilot project


Once you have all the processes ready, it is high time for a test run. While it might seem like a
great idea to test the system on a company-wide scale, it is better to run a pilot project within
a smaller group.

10.Implement the changes to meet the KPIs


And we are back to steps 1 and 2. Review your results and see whether you've met the initial
expectations. If not, see what can be done to achieve the initial KPIs. Once you have implemented
the changes, run another pilot to understand how much ground you have covered between the two
pilot runs and how this changes the picture. It is a continual process that needs optimization at
every run until all of the involved parties are happy with the result. Once you reach that point,
you can safely scale up.
19) Explain Roles of Analytical tools in BI.

Tools for Business Intelligence


1. QlikView
QlikView is one of the most preferred tools for business analytics because of its unique features,
such as patented technology and in-memory processing, facilitating the delivery of ultra-fast
business analytics reports.
2. Sisense
Sisense - one of the most popular business analyst software tools - incorporates dynamic and
robust text analysis functionalities that enable users to transform unstructured text into valuable
business intelligence.
3. Tableau
Tableau's business analytics platform, which includes robust and reliable statistical tools,
empowers users to perform an in-depth analysis of social media networks, and predict patterns
based on current and historical data.
4. Power BI
Power BI is Microsoft's entry among the many business analytics tools. It offers dynamic
visualizations with self-service business intelligence features, allowing end users to create
dashboards and reports independently, without assistance.
5. Yellowfin BI
Yellowfin BI is a business intelligence tool and ‘end-to-end’ analytics platform that combines
visualisation, machine learning, and collaboration. The nice thing about this BI tool is that you
can easily take dashboards and visualisations to the next level using a no code/low code
development environment.
6. SAP Business Objects
SAP Business Objects is business intelligence software which offers comprehensive reporting,
analysis and interactive data visualisation. The platform focuses heavily on categories such as
Customer Experience (CX) and CRM, digital supply chain, ERP and more. SAP is robust software
intended for all roles (IT, end users and management) and offers tons of functionalities in one
platform.
7. Datapine
Datapine is an all-in-one business intelligence platform that facilitates the complex process of
data analytics even for non-technical users. Thanks to a comprehensive self-service analytics
approach, datapine’s solution enables data analysts and business users alike to easily integrate
different data sources, perform advanced data analysis, build interactive business dashboards
and generate actionable business insights.
8. SAS Business Intelligence
While SAS’ most popular offering is its advanced predictive analytics, it also provides a great
business intelligence platform. This well-seasoned self-service tool, founded back in the 1970s,
allows users to leverage data and metrics to make informed decisions about their business. Using
its set of APIs, users are given lots of customisation options, and SAS ensures high-level data
integration and advanced analytics & reporting.
20) Write short note on BI in Finance, BI in Marketing
Business Intelligence in Finance
Business Intelligence (BI) in finance uses BI techniques and tools to support financial analysis
and decision-making. This includes budgeting, forecasting, financial reporting, and
performance monitoring.

1. Monitor Financial Performance


Monitoring financial performance refers to the process of using Business Intelligence (BI)
techniques and tools to track and analyze financial metrics, such as revenue, expenses, and
profitability. This is an important aspect of using BI in finance. It can help finance teams
identify areas where cost savings can be made and new revenue streams can be generated.
2. Budgeting and Forecasting
Budgeting and forecasting refer to the process of using Business Intelligence (BI) techniques
and tools to predict future financial performance. This is an important aspect of using BI in
finance. It can help finance teams create accurate budgets and forecasts, which are important
for strategic planning and decision-making.

3. Understanding Customers
Understanding customers is an important aspect of using Business Intelligence (BI) in finance.
This refers to the process of using BI techniques and tools to gain insights into customer
behavior and preferences to make informed decisions about financial matters such as pricing,
marketing, and product development.
4. Financial Reporting
Financial reporting refers to the process of using Business Intelligence (BI) techniques and tools
to generate reports that provide insight into financial performance and trends. This is an
important aspect of using BI in finance, as it can provide finance teams and other stakeholders
such as management and investors with greater visibility into financial performance and trends.

5. Compliance
Compliance refers to the process of ensuring that an organization adheres to laws, regulations,
standards, and guidelines that are relevant to its operations. In finance, compliance can refer to
ensuring that an organization’s financial practices and reporting comply with relevant laws,
regulations, and standards. Business Intelligence (BI) can be used to support compliance by
providing organizations with the necessary tools to monitor and report on financial performance.

Business Intelligence in Marketing

Business intelligence is a tech-driven process which leverages software and services to transform
data into actionable and meaningful insight that informs an organization's strategic and tactical
business decisions. Thanks to the variety of tools that enable businesses of all sizes to analyze
metrics and large amounts of data in real time, BI is becoming routine for marketers.
Within organizations, marketing and sales teams can take advantage of this information, organizing
campaigns that more precisely target the right audience, and gaining a better understanding of
which initiatives generate the most revenue.
Identifying Customer
Business Intelligence can integrate a company’s entire business into one centrally managed
solution, along with customer insights too. Customer data from CRM, email marketing, social
media campaigns, and website engagements can be tracked in the company’s data management
platform.
Personalized Sales Strategy
It is always vital for marketing and sales teams to understand the business of the companies they
are trying to work with. A salesperson needs to learn every aspect of a company before pitching to
it. Business Intelligence here makes a company's sales strategies more composed and effective.
Predictive Analytics
Predictive analytics and other business intelligence tools have transformed the way organizations
gather, analyze, and harness data. For marketers, the best BI application is predicting the
behavior of individual customers.
Enhancing Competitive Intelligence
Business Intelligence can help companies improve and enhance their competitive edge through the
effective use of data, turning data into actionable insight. BI solutions make data accessible to
authorized users and enable them to interact with competitive intelligence from one secure,
centrally managed data warehouse.
Making an Effective Business Model
Since a company's marketing team has all the data, including market insights, customer behavior,
and competitor market strategy, it can work on making the company's business model more enduring
and result-oriented.



21) Explain BI Applications in Marketing and CRM with suitable example

Business intelligence in CRM


BI in CRM can play an important part in helping the business to attract and retain customers, as
customer service and customer satisfaction are the backbone of customer relationships. In an
effort to ensure customer satisfaction and retention, businesses spend a lot of time trying to
understand buying behavior and customer expectations for product support, website support, and
service variety (service offerings).

BI in CRM is crucial to business success. Competitors have already embraced metrics and KPIs (key
performance indicators) for customer relationship management to provide objective measures,
understand what tasks and activities support goals, and see where the business needs to refocus.
CRM is a business strategy that identifies the customer as a very important business asset. It
recognizes that customers are unique and need to be treated as uniquely as possible. It recognizes
customer satisfaction as important, above all as a determinant of managing churn and future revenue
potential. Information is critical in such a strategy and can only come about through the analysis
of data. Smart use and analysis of data makes business intelligence. So, business intelligence is
a broad category which includes many uses of data, one of which is in pursuit of a CRM strategy.

Business Intelligence in Marketing

The role of BI in marketing is the same as described under question 20: it supports identifying
customers, personalized sales strategy, predictive analytics, enhanced competitive intelligence,
and an effective business model by turning marketing and sales data into actionable insight.
