DA Marathon Notes
NATURE OF DATA
▪ Numerical Data
▪ Descriptive Data
▪ Graphic Data
Now, Let’s Understand
Data, Information
and KNOWLEDGE
DATA, INFORMATION, KNOWLEDGE
▪ DATA is a source of INFORMATION, and INFORMATION needs to be processed to gather KNOWLEDGE.
▪ The goal of any information system is to transform data into information to generate knowledge that can be used for decision making.
Now Let’s Understand
NATURE OF DATA
NATURE OF DATA
Over the years, the magnitude and availability of data have grown exponentially. However, data sets may be classified into the following groups:
(i) NUMERICAL DATA (Note – 1)
▪ Any data expressed as a number is numerical data.
(ii) DESCRIPTIVE DATA (Note – 2)
▪ Information gets deciphered in the form of qualitative information.
(iii) GRAPHIC DATA (Note – 3)
▪ A picture or graphic may tell a thousand stories.
▪ Data may also be presented in the form of a picture or graphics.
8.2 – TYPES OF DATA IN FINANCE AND COSTING
TOPIC – 1
INTRODUCTION & KINDS OF DATA
▪ Data plays a very important role in the study of finance and cost accounting. From the inception of the study of finance, accounting and cost accounting, data has always played a significant role in helping finance and accounting professionals.
▪ The kinds of data used in finance and costing may be quantitative as well as qualitative
in nature.
QUANTITATIVE FINANCIAL DATA – NUMBERS;
▪ Stock price data, financial statements, etc. are examples of quantitative data, as most of the financial records are maintained in the form of numbers.
QUALITATIVE FINANCIAL DATA – TEXT;
▪ Qualitative data, by contrast, is captured in the form of text.
TYPES OF DATA
NOMINAL SCALE – It is used for categorising/classifying data.
ORDINAL SCALE – It is used for classifying and putting data in order.
INTERVAL SCALE – It is used for categorising and ranking using an equal interval scale.
RATIO SCALE – These are the ultimate nirvana when it comes to data measurement scales because they tell us about the order, they tell us the exact value between units, AND they also have an absolute zero.
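The four scales can be illustrated with a short Python sketch (all values below are made-up examples): counting is the only meaningful operation on nominal data, ordering becomes meaningful at the ordinal level, differences at the interval level, and ratios only at the ratio level.

```python
from collections import Counter

# NOMINAL: categories only - counting is meaningful, ordering is not.
blood_groups = ["A", "B", "A", "O", "AB", "O", "A"]
counts = Counter(blood_groups)

# ORDINAL: ranked categories - sorting by rank is meaningful.
rating_order = {"Poor": 1, "Average": 2, "Good": 3, "Excellent": 4}
ratings = ["Good", "Poor", "Excellent", "Average"]
ranked = sorted(ratings, key=rating_order.get)

# INTERVAL: equal intervals but no absolute zero - differences are
# meaningful, ratios are not (30 degrees C is not "twice" 15 degrees C).
temp_diff = 30 - 15

# RATIO: absolute zero exists - differences AND ratios are meaningful.
weight_ratio = 80 / 40  # 80 kg really is twice 40 kg

print(counts["A"], ranked, temp_diff, weight_ratio)
```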
8.3 – DIGITIZATION OF DATA AND INFORMATION
OBJECTIVES OF DIGITIZATION
1.) To provide a “WIDESPREAD ACCESS” of data and information to a very large group of
users simultaneously.
2.) It helps in “PRESERVATION OF DATA” for a longer period.
LARGEST DIGITIZATION PROJECT
One of the largest digitization projects taken up in India is “UNIQUE IDENTIFICATION
NUMBER” (UID) or “AADHAR”
Let’s Learn - WHY WE DIGITIZE? – 8 Points
SET – 1
Integration – Higher integration with BUSINESS INFORMATION SYSTEMS;
Access Kaun Karega? (Who will access?) – Digitized records may be accessed by MORE THAN ONE PERSON SIMULTANEOUSLY;
Access Kaha se Karega? (Access from where?) – Can be accessed from MULTIPLE LOCATIONS THROUGH NETWORKED SYSTEMS;
Work – Helps in work PROCESSING;
Phase – 1: Justification of the proposed digitization project
Phase – 2: Assessment
Phase – 3: Planning
Phase – 4: Digitization activities
Phase – 5: Processes in the care of records
Phase – 6: Evaluation
PHASE – 3 “PLANNING”
STAGES OF PLANNING (Note - 1)
▪ Selection of digitization approach
▪ Project documentation
▪ Resources management
▪ Technical specifications
▪ Risk management
PHASE 4: DIGITIZATION ACTIVITIES
Upon the completion of the assessment and planning phases, the digitization activities start.
The Wisconsin Historical Society developed a six-phase process viz. planning, capture, primary quality control, editing, secondary quality control, and storage and management.
PLANNING – The planning schedule is prepared at the first stage; calibration of hardware/software, scanning, etc. is done next.
PRIMARY QUALITY CONTROL – A primary quality check is done on the output to check the reliability.
STORAGE AND MANAGEMENT – And finally, user copies are created and uploaded to dedicated storage space, after doing file validation.
PHASE 5: PROCESSES IN THE CARE OF RECORDS
Once the digitization of records is complete, a few additional requirements arise which may be linked to the administration of records. The permission for: -
▪ CLASSIFICATION (if necessary),
▪ INTELLECTUAL CONTROL (over data),
▪ ACCESSION OF DATA, and
▪ UPKEEPING AND MAINTENANCE OF DATA
are a few additional requirements for “DATA MANAGEMENT.”
PHASE 6: EVALUATION
Once the digitization project is implemented, the final phase should be: -
SYSTEMATIC DETERMINATION of the PROJECT’S MERIT, WORTH AND SIGNIFICANCE using objective criteria.
PRIMARY PURPOSE – IDENTIFY CHANGES that would improve future digitization processes.
8.4 - TRANSFORMATION OF DATA TO DECISION RELEVANT
INFORMATION
INTRODUCTION
▪ The emergence of big data has changed the world of business like never before.
▪ The most important shift has happened in information generation: data has to be harnessed into information that informs and supports prudent decision making.
▪ The pertinent question here is: what does an enterprise need to do to transform data into relevant information? As noted earlier, all types of data may not lead to relevant information.
▪ The data reporting stage involves translating the data into a form that communicates the relevant information.
8.5 – COMMUNICATION OF INFORMATION FOR QUALITY
DECISION MAKING
INTRODUCTION
Quality information should lead to quality decisions. With the help of well-curated and reported data, decision makers should be able to add higher value. By transforming the information into a process for quality decision making, the firm should be able to:
2) Diagnose, filter and extract value from financial and operational information for MAKING BETTER BUSINESS DECISIONS;
8.6 – PROFESSIONAL SKEPTICISM REGARDING DATA
INTRODUCTION
While data analytics is an important tool for decision making, managers should never accept its output blindly; the patterns that lie underneath the surface of the data set need to be explored. The emergence of new data analytics tools and techniques in the financial environment allows accounting and finance professionals to gain unique insights into the data.
As the availability of data is bigger now, analysts and auditors are not only getting more data to work with, but also closer attention from regulators and standard setters. At the same time, excessive professional scepticism may result in higher costs. It is therefore important to understand the conditions in which finance and audit professionals should apply scepticism without underutilizing data analytics, to keep the cost under control.
8.7 ETHICAL USE OF DATA AND INFORMATION
Data analytics can help in the decision making process and make an impact. However, this empowerment for business also comes with challenges. The questions are: how can business organizations ethically collect, store and use data, and what rights need to be upheld? The five basic principles of data ethics that a business organization should follow are discussed below.
PRINCIPLE – 1
REGARDING OWNERSHIP
The first principle is that ownership of any personal information belongs to the person.
It is unlawful and unethical to collect someone’s personal data without their
consent.
The consent may be obtained through digital privacy policies or signed agreements or
by asking the users to agree with terms and conditions.
It is always advisable to ask for permission beforehand to avoid future legal and
ethical complications.
PRINCIPLE – 2
REGARDING TRANSPARENCY
Maintaining transparency is important while gathering data.
The objective with which the company is collecting user’s data should be known to
the user.
For example, if the company is using cookies to track the online behaviour of the user,
it should be mentioned to the user through a written policy that cookies would be used
for tracking user’s online behaviour and the collected data will be stored in a secure
database. After reading the policy, the user may decide to accept or not to accept the
policy.
Similarly, while collecting financial data from clients, the purpose for which the data will be used should be clearly mentioned.
HOW TO LEARN?
TRANSPARENCY (Imp): the OBJECTIVE of collecting user data should be known to the user; the PURPOSE of collecting financial data should be known to the clients.
PRINCIPLE – 3
REGARDING PRIVACY
Even if the user allows a company to collect, store and analyze PERSONALLY IDENTIFIABLE INFORMATION (PII), that does not imply it should be made publicly available.
For companies, it is mandatory to publish some financial information to the public, e.g. through annual reports. However, there may be much confidential information which, if it falls into the wrong hands, may create problems and financial loss.
To protect privacy of data, a data security process should be in place. This may
include file encryption and dual authentication password etc.
HOW TO LEARN?
PERSONALLY IDENTIFIABLE INFORMATION (PII) KE REGARD MEI?
What you “CAN” do? What you “CAN’T” do? What you “SHOULD” do?
A data security process
should be in place.
It should not be made publicly This may include file
Collect, Store and Analyze
available. encryption and dual
authentication password etc.
(NOTE - 1)
PRINCIPLE – 4
REGARDING INTENTION
The intention of data analysis should never be: -
1.) Making profits at the expense of others; OR
2.) Hurting others.
PRINCIPLE – 5
REGARDING OUTCOMES
In some cases, even if the intentions are good, the result of data analysis may
inadvertently hurt the clients and data providers. This is called disparate
impact, which is unethical.
CHAPTER 9
DATA PROCESSING, ORGANIZATION, CLEANING AND VALIDATION
Chapter overview
9.1 Development of Data Processing
9.1 – DEVELOPMENT OF DATA PROCESSING
TOPIC – 1: DATA PROCESSING
▪ Introduction
▪ Phases of data processing
TOPIC - 1
DATA PROCESSING
INTRODUCTION
▪ In recent years, the capacity and effectiveness of DP have increased manifold, and so has attention to the quality of the data. Data quality may get affected due to several issues like missing data and duplications.
▪ Data processing that used to require a lot of human labour was progressively superseded by machines.
PHASES OF DATA PROCESSING
MECHANICAL DP – This phase began in 1890 (Bohme et al., 1991), when a system made up of intricate punch card machines was installed by the US Bureau of the Census.
ELECTRONIC DP – And finally, electronic DP replaced the other two, which resulted in a fall in mistakes and rising productivity. Data processing is now done electronically.
HOW ARE DATA PROCESSING AND DATA SCIENCE RELEVANT FOR FINANCE?
The relevance of data processing and data science in the area of finance is increasing every day. Significant areas where data science plays an important role are:
RISK MANAGEMENT
▪ Data science helps financial institutions measure relative risk and take better decisions in the industry.
▪ Due to the arrival of social media and new Internet of Things (IoT) devices, the industry now has access to far more data.
CUSTOMER SEGMENTATION AND PERSONALISED SERVICES
▪ Despite the fact that each consumer is unique, it is only possible to serve customers well when they are suitably divided into segments.
▪ In a world where choice has never been more crucial, it has become essential to offer a customised service.
PREDICTIVE ANALYTICS
▪ Predictive analytics enables organisations in the financial sector to extrapolate from existing data and anticipate what may occur as data volume increases.
▪ For instance, if a large purchase suddenly appears on the card of a customer who has traditionally been very frugal, the card can be immediately flagged.
ALGORITHMIC TRADING
▪ Algorithmic trading happens when an unsupervised computer executes trades based on pre-set rules, without human indecision or thought.
9.2 – FUNCTIONS OF DATA PROCESSING
TOPIC – 1
▪ Validation
▪ Sorting
▪ Aggregation
▪ Analysis
▪ Reporting
▪ Classification
TOPIC – 1
PROCESSES INVOLVED IN DATA PROCESSING
1.) VALIDATION
As per the UNECE glossary on statistical data editing (UNECE), validation is ‘AN ACTIVITY AIMED AT VERIFYING WHETHER THE VALUE OF A DATA ITEM COMES FROM THE GIVEN (FINITE OR INFINITE) SET OF ACCEPTABLE VALUES.’
1. Relevance;
2. Accessibility;
5. Comparability;
6. Comprehensiveness;
7. Correctness.
2.) SORTING
▪ Data sorting is any procedure that ORGANIZES DATA INTO A MEANINGFUL
ORDER TO MAKE IT SIMPLER TO COMPREHEND, ANALYSE, AND VISUALIZE.
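A minimal Python sketch of data sorting, using hypothetical quarterly revenue figures: the same records are ordered chronologically for trend-reading and by value to surface the best quarter first.

```python
# Quarterly sales records (hypothetical figures for illustration).
sales = [
    {"quarter": "Q3", "revenue": 120},
    {"quarter": "Q1", "revenue": 95},
    {"quarter": "Q4", "revenue": 160},
    {"quarter": "Q2", "revenue": 110},
]

# Sort chronologically to make the trend easy to read...
by_quarter = sorted(sales, key=lambda r: r["quarter"])

# ...or by revenue (descending) to surface the best quarter first.
by_revenue = sorted(sales, key=lambda r: r["revenue"], reverse=True)

print([r["quarter"] for r in by_quarter])   # ['Q1', 'Q2', 'Q3', 'Q4']
print(by_revenue[0]["quarter"])             # 'Q4'
```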
3.) AGGREGATION
▪ Data aggregation summarizes many individual records into a combined (or summarized) output.
4.) ANALYSIS
MEANING – It is described as the process of cleansing and converting data to obtain actionable business intelligence.
DATA ANALYSIS TOOLS – Several software tools now support analysis; the same task that an analyst does for commercial purposes can be performed far faster with such tools.
5.) REPORTING
INTRODUCTION
▪ A DATA REPORT IS NOTHING MORE THAN A SET OF DOCUMENTED FACTS AND NUMBERS, and a powerful communication tool;
▪ In healthcare, for instance, reports help provide more effective and efficient patient care, hence saving lives;
BENEFITS OF REPORTING
▪ It indicates where we should devote the most time and money, as well as what needs more organisation or attention;
TO MAKE THE MOST OF REPORTING
1) PRIORITIZE the most pertinent data;
6.) CLASSIFICATION
Regarding risk management and data security, the classification of data should follow industry standards and informed judgement. Among other things, classification helps by:
1) Facilitating access;
IT IS STANDARD PRACTISE TO DIVIDE DATA AND SYSTEMS INTO THREE RISK CATEGORIES.
LOW RISK – If data is accessible to the public and recovery is simple, then this data and the mechanisms around it pose a smaller risk.
HIGH RISK – Data that is sensitive, or hard to recover if lost, is classified as “high risk.”
▪ Public data are the least sensitive and have the lowest security requirements;
▪ Restricted data are the most sensitive and have the highest security rating.
STEPS FOR EFFECTIVE DATA CLASSIFICATION
UNDERSTANDING THE CURRENT SETUP
▪ Taking a comprehensive look at the organization’s current data and any applicable legislation is likely the best beginning point for successfully classifying data;
▪ Before one classifies data, one must know what data one has.
DATA CLASSIFICATION MATRIX
Creating a matrix that rates data and/or systems based on how likely they are to be hacked and how sensitive the data is enables you to rapidly identify how to classify them so that information remains protected.
9.3 – DATA ORGANIZATION AND DISTRIBUTION
DATA DISTRIBUTION
▪ Discrete Distributions
▪ Continuous Distributions
TOPIC – 1
DATA ORGANIZATION
INTRODUCTION
▪ In a world where data are among the most valuable assets possessed by firms, organizing data well helps executives and other professionals make better use of their data assets.
▪ As time passes and the data volume grows, the time required to look for any specific item also grows unless the data is organized.
▪ STRUCTURED DATA consists of tabular information that may be readily imported into a database and then utilised by analytics software or other applications.
▪ UNSTRUCTURED DATA are raw and unformatted data, such as a basic text document with no predefined structure.
TOPIC – 2
DATA DISTRIBUTION
INTRODUCTION
Data distribution is a function that identifies and quantifies all potential values for a
variable, as well as their relative frequency (probability of how often they occur).
TYPES OF DISTRIBUTION
Distributions are basically classified based on the type of data.
DISCRETE DISTRIBUTIONS
A discrete distribution results from countable data and has a finite number of potential values. Example: rolling dice, selecting a specific amount of heads, etc. In a single toss of a fair coin, the probability of a head or a tail is one-half.
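The countable-outcome idea can be seen by simulating dice rolls in Python: the outcome space is the finite set {1, …, 6}, and the resulting frequency table is a discrete distribution in which each face occurs with probability close to 1/6.

```python
import random
from collections import Counter

random.seed(42)  # fixed seed so the illustration is reproducible

# Roll a fair die 10,000 times: a countable outcome space {1..6}.
rolls = [random.randint(1, 6) for _ in range(10_000)]
freq = Counter(rolls)

# Each face should occur with relative frequency close to 1/6 (~0.167).
for face in range(1, 7):
    print(face, freq[face] / 10_000)
```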
HYPERGEOMETRIC DISTRIBUTION
▪ The hypergeometric distribution is a discrete distribution in which the chance of success is not the same for all trials, which distinguishes it from the binomial distribution.
GEOMETRIC DISTRIBUTION
▪ A discrete distribution that models the number of trials needed to obtain the first success;
▪ Example: A marketing representative from an advertising firm making calls until the first successful pitch.
47 | P a g e
CONTINUOUS DISTRIBUTIONS
A distribution with an unlimited number of (variable) data points that may be measured on a continuous scale is a continuous distribution.
NORMAL DISTRIBUTION
▪ It is a bell-shaped curve with a greater frequency around the core point. As values move away from the centre value on each side, their frequency falls.
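The bell shape can be checked empirically in Python. This sketch draws values from a normal distribution with an assumed mean of 100 and standard deviation of 15 (hypothetical exam-score-style figures) and confirms that most observations cluster near the centre while the tails are nearly empty.

```python
import random

random.seed(0)

# Draw 50,000 values from a normal distribution (mean 100, sd 15).
sample = [random.gauss(100, 15) for _ in range(50_000)]

mean = sum(sample) / len(sample)

# Bell shape: far more observations near the centre than in the tails.
near_centre = sum(1 for x in sample if 85 <= x <= 115)     # within 1 sd
far_out = sum(1 for x in sample if x < 55 or x > 145)      # beyond 3 sd

print(round(mean, 1))        # close to 100
print(near_centre / 50_000)  # roughly 0.68
print(far_out / 50_000)      # well under 0.01
```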
▪ The exponential distribution is a continuous probability distribution that typically models the time between successive events.
9.4 - DATA CLEANING AND VALIDATION
TOPIC - 1
DATA CLEANING
INTRODUCTION
▪ Data cleansing is the process of CORRECTING INACCURATE, IMPROPERLY FORMATTED, duplicate or incomplete data within a dataset;
▪ THE PROCESS OF CHANGING DATA FROM ONE FORMAT OR STRUCTURE to another is known as data wrangling or data munging, since analysts map and change “raw” data into another format suitable for use.
STEP 1: REMOVE DUPLICATE OR IRRELEVANT OBSERVATIONS
▪ DE-DUPLICATION IS ONE OF THE MOST IMPORTANT CONSIDERATIONS FOR THIS PROCEDURE. This may make analysis more effective in addition to producing a more manageable and effective dataset.
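Step 1 can be sketched in plain Python with hypothetical customer records: de-duplicate on a key field, then drop observations that are irrelevant to the analysis at hand.

```python
# Hypothetical customer records with an exact duplicate and an
# out-of-scope (irrelevant) row mixed in.
records = [
    {"id": 1, "name": "Asha", "segment": "retail"},
    {"id": 2, "name": "Bilal", "segment": "retail"},
    {"id": 1, "name": "Asha", "segment": "retail"},   # duplicate
    {"id": 3, "name": "Chen", "segment": "wholesale"},
]

# De-duplicate on the 'id' key, keeping the first occurrence.
seen, cleaned = set(), []
for row in records:
    if row["id"] not in seen:
        seen.add(row["id"])
        cleaned.append(row)

# Drop observations irrelevant to a retail-only analysis.
cleaned = [row for row in cleaned if row["segment"] == "retail"]

print(len(cleaned))  # 2
```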
STEP 2: FIX STRUCTURAL ERRORS
▪ When transferring data, you may detect unusual naming standards.
▪ For instance, “N/A” & “NOT APPLICABLE” may both be present, but they should be examined
as a single category.
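A short Python sketch of fixing such structural errors: spelling variants of "not applicable" (the variant list here is an assumption for illustration) are mapped to a single category before analysis.

```python
# Different spellings of "not applicable" should be one category.
raw = ["Yes", "N/A", "no", "Not Applicable", "YES", "n.a."]

ALIASES = {"n/a", "not applicable", "na", "n.a."}  # assumed variants

def normalise(value: str) -> str:
    v = value.strip().lower()
    return "N/A" if v in ALIASES else v.capitalize()

cleaned = [normalise(v) for v in raw]
print(cleaned)  # ['Yes', 'N/A', 'No', 'N/A', 'Yes', 'N/A']
```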
STEP 3: FILTER UNWANTED OUTLIERS
▪ Occasionally, you will encounter observations that do not appear to fit inside the data you are evaluating. If you have a valid cause to eliminate an outlier, removing it will improve the quality of the analysis.
▪ Remember that the existence of an outlier does not imply that it is erroneous.
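One common (though not the only) screening rule is the interquartile-range fence, sketched below on hypothetical monthly expense figures: values beyond 1.5 × IQR from the quartiles are flagged for review rather than silently deleted.

```python
import statistics

# Monthly expense figures with one suspicious entry (hypothetical data).
values = [210, 195, 205, 220, 198, 202, 1250, 215, 208, 199]

q1, q2, q3 = statistics.quantiles(values, n=4)  # quartiles
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [v for v in values if not (low <= v <= high)]
kept = [v for v in values if low <= v <= high]

print(outliers)  # [1250]
```

Whether the flagged value is dropped should still be a judgement call, since an outlier is not automatically an error.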
There are two main approaches to handling missing data. Although neither is ideal, both should be explored.
▪ AS A FIRST ALTERNATIVE, the observations with missing values may be dropped, but
doing so may result in the loss of information. This should be kept in mind before doing
so;
▪ AS A SECOND ALTERNATIVE, the missing values may be filled in based on other observations. Again, there is a chance that the data’s integrity may be compromised, as the filled-in values are assumptions rather than actual observations.
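Both alternatives can be sketched side by side in Python (hypothetical income records, with mean imputation as the assumed fill-in strategy):

```python
# Records with a missing 'income' value (None).
rows = [
    {"id": 1, "income": 52_000},
    {"id": 2, "income": None},      # missing observation
    {"id": 3, "income": 61_000},
    {"id": 4, "income": 49_000},
]

# Alternative 1: drop rows with missing values (loses information).
dropped = [r for r in rows if r["income"] is not None]

# Alternative 2: impute from the other observations (risks integrity,
# since an assumed value replaces a real one). Here: the mean.
known = [r["income"] for r in rows if r["income"] is not None]
mean_income = sum(known) / len(known)
imputed = [
    {**r, "income": r["income"] if r["income"] is not None else mean_income}
    for r in rows
]

print(len(dropped), imputed[1]["income"])  # 3 54000.0
```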
▪ As a final check, ask: does the data verify or contradict your working hypothesis, or does it shed any light on it?
▪ It is also important to build a culture of data quality inside the firm. To do this, one should specify the methods that may be employed to establish this culture and also the definition of data quality.
BENEFITS OF QUALITY DATA
Main characteristics of quality data are:
1) Validity
2) Accuracy
3) Completeness
4) Consistency
originating;
TOPIC - 2
DATA VALIDATION
INTRODUCTION
Although data validation is an essential stage in every data pipeline, it is frequently ignored. It may appear that data validation is an unnecessary step that slows down the work; however, if the initial data is not valid, the outcomes will not be accurate either. It is therefore essential to validate data before it is used. Without data validation, one may run into the danger of basing judgments on faulty data.
DATA TYPE CHECK
▪ For instance, a field may only take NUMERIC VALUES;
▪ If this is the case, the system should reject any data containing other characters.
FORMAT CHECK
▪ Date columns that are kept in a fixed format, such as “YYYY-MM-DD”, should reject values that deviate from that format.
UNIQUENESS CHECK
▪ Some data like PAN OR E-MAIL IDS are unique by nature;
▪ These fields should typically contain unique items in a database.
RANGE CHECK
▪ Values must fall within a SPECIFIED RANGE.
▪ For example, a latitude value must be between -90 and 90 degrees, whereas a longitude value must fall between -180 and 180 degrees.
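The four checks can be combined into one small Python validator. The field names (`amount`, `date`, `lat`, `lon`, `email`) are hypothetical; the ranges and date format follow the examples above.

```python
import re

def validate(record: dict) -> list[str]:
    """Run the checks on one record; return a list of failures."""
    errors = []
    # Data type check: 'amount' must be numeric.
    if not isinstance(record.get("amount"), (int, float)):
        errors.append("amount must be numeric")
    # Format check: date must be YYYY-MM-DD.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", record.get("date", "")):
        errors.append("date must be YYYY-MM-DD")
    # Range check: latitude in [-90, 90], longitude in [-180, 180].
    if not -90 <= record.get("lat", 0) <= 90:
        errors.append("latitude out of range")
    if not -180 <= record.get("lon", 0) <= 180:
        errors.append("longitude out of range")
    return errors

records = [
    {"email": "a@x.com", "amount": 100, "date": "2024-01-31",
     "lat": 19.1, "lon": 72.9},
    {"email": "a@x.com", "amount": "ten", "date": "31-01-2024",
     "lat": 95.0, "lon": 72.9},
]

# Uniqueness check: e-mail IDs should not repeat across the dataset.
emails = [r["email"] for r in records]
duplicates = len(emails) != len(set(emails))

print(validate(records[0]))  # []
print(validate(records[1]))  # three failures
print(duplicates)            # True
```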
QUESTION BANK
ANSWER
QUES NO.: 1 2 3 4 5
ANS NO.: d a a a a
STATE TRUE OR FALSE
1. Data validation could be operationally defined as a process which ensures that data meet certain quality characteristics.
3. Financial data such as revenues, accounts receivable, and net profits are often summarised in a company’s data reporting.
4. Structured data consists of tabular information that may be readily imported into a database and then utilised by analytics software or other applications.
5. Data distribution is a function that identifies and quantifies all potential values for a variable, as well as their relative frequency (probability of how often they occur).
ANSWER
QUES NO. 1 2 3 4 5
ANSWER T T T T T
SHORT ESSAY TYPE QUESTIONS
1 Briefly discuss the role of data analysis in fraud detection.
ANSWER
1 Private
2 Classification
3 Image representation
4 Mean
5 Data cleaning
CHAPTER 10
DATA PRESENTATION: VISUALIZATION AND GRAPHICAL PRESENTATION
Chapter overview
10.1 Data Visualization of Financial and Non – Financial Data
10.1 - DATA VISUALIZATION OF FINANCIAL AND NON-FINANCIAL DATA
TOPIC – 1 INTRODUCTION
TOPIC - 1
INTRODUCTION
▪ There is a saying, ‘A picture speaks a thousand words’. However, obtaining data and presenting it well are two distinct and equally important skills.
TOPIC – 2
WHY DATA VISUALIZATION IS IMPORTANT?
▪ Scott Berinato, senior editor and data visualisation specialist for Harvard
Business Review, writes in a recent post that “data visualization was once a
▪ Several studies indicate that sixty-five percent of individuals are visual learners; visuals help them in identifying more patterns and gaining deeper insights, particularly when many data points are involved.
TOPIC – 3
DOING DATA VISUALIZATION THE RIGHT WAY
Not all data visualisation is created equally engaging; when properly executed, it can be powerful. Finance professionals who are investigating how data visualisation might help their analytics efforts and communication should keep the following in mind:
KNOW THE OBJECTIVE
▪ For instance, if the objective is to display the income figures, the visualization is declarative; if the objective is to find patterns in the data, the objective is exploratory.
INVEST IN THE BEST TECHNOLOGY
▪ There are a multitude of technological tools that make it easy to visualise data in the digital age;
▪ The firm should first implement an ERP that develops a reliable data foundation for reports and graphs.
10.2 OBJECTIVE AND FUNCTION OF DATA PRESENTATION
TOPIC - 1
INTRODUCTION TO DATA VISUALISATION
▪ The absence of data visualisation would make it difficult for organisations to spot patterns quickly; visualisation enables analysts to visualise new concepts and patterns, which matters all the more with the daily increase in data volumes.
▪ Every company may benefit from a better knowledge of its data, hence data is among the most crucial assets for every organisation. Through the use of visuals, one may effectively communicate ideas and make use of the information.
▪ Dashboards, graphs, infographics, maps, charts, videos, and slides may all be used to present data and support decision-making choices.
TOPIC - 2
OBJECTIVE OF DATA VISUALISATION
▪ Data visualisation enables business users to obtain insight into their data;
▪ Data visualisation enhances the effect of communications for the audiences and delivers the most convincing data analysis outcomes across industries and fields.
TOPIC - 3
MOST COMMON ERRORS MADE BY ANALYSTS THAT MAKE A DATA VISUALISATION UNSUCCESSFUL ARE:
SETTING UP A CLEAR FRAMEWORK
▪ The designer must guarantee that all viewers share the same understanding of the visual elements: semantics deals with the meaning of the symbols employed, whereas syntax is concerned with the form of the communication. For instance, when utilising an icon, the element should be recognisable to the viewer.
▪ The designer must also ensure that the data is clean and that the analyst understands its nuances.
TELLING A STORY
▪ There are few kinds of communication as convincing as a well-told story backed by information.
▪ In order to comprehend the data and connect with it, the audience needs a narrative.
10.3 DATA PRESENTATION ARCHITECTURE
TOPIC - 1
INTRODUCTION TO DPA
Data presentation architecture (DPA) is a set of skills that aims to identify, find, modify, format, and present data in a manner that ideally conveys meaning. It is an applied skill set critical for the success and value of Business Intelligence: it combines business intelligence solutions with the data scope, delivery timing, format and visualizations that will most effectively support and drive operational, tactical and strategic decisions.
DPA is neither an IT nor a business skill set but exists as a separate field of expertise. It is a much broader skill set that includes determining what data on what schedule and in what exact format is to be presented, not just the best way to present data that has already been chosen (which is data visualization).
TOPIC – 2
OBJECTIVES OF DPA
The objectives of DPA include:
▪ To use data to impart knowledge as efficiently as possible (reducing noise, complexity, and unneeded data or detail based on the demands and tasks of each audience).
TOPIC - 3
SCOPE OF DPA
▪ Defining significant meaning (relevant information) required by each audience;
▪ Obtaining the proper data (focus area, historic reach, extensiveness, level of detail, etc.);
▪ Ensuring the currency (freshness) of the data;
10.4 - DASHBOARD, GRAPHS, DIAGRAMS, TABLES, REPORT
DESIGN
TOPIC – 1 INTRODUCTION
TOPIC – 2 DASHBOARDS
TOPIC – 3 TABLES
TOPIC – 4 REPORT DESIGN
TOPIC - 1
INTRODUCTION
Data visualisation is the visual depiction of data and information. Through the use of visual elements like dashboards, charts, graphs, and maps etc., finance professionals can manage important metrics across numerous financial channels, visualise the data points, and generate reports for customers that summarise the results.
TOPIC - 2
DASHBOARDS
▪ Creating reports for your audience is one of the most effective means of communication. With a dashboard, the audience would be able to view the performance of their company at a glance.
▪ A dashboard helps to explain what the company is doing and why, and also fosters transparency.
▪ There are numerous levels of dashboards, ranging from those that represent metrics vital to the firm as a whole to those that measure values vital to individual teams.
GRAPH, DIAGRAM AND CHARTS
Henry D. Hubbard, creator of the Periodic Table of Elements, once said, “There is magic in graphs.” A few important and widely used graphs are mentioned below:
▪ Bar Chart;
▪ Line Chart;
▪ Pie Chart;
▪ Map;
▪ Density Map;
▪ Scatter Plots;
▪ Gantt Chart;
▪ Histogram.
BAR CHART
▪ Bar graphs are one of the most used types of data visualisation.
▪ Bar graphs are very useful when the data can be divided into distinct categories, for instance, the revenue earned in different years or the number of items in different categories.
▪ To add a zing, the bars can be made colourful. Using stacked and side-by-side bar charts, one may further dissect the data for a more in-depth examination.
FIGURE
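The idea behind a bar chart, one bar per category with length proportional to the value, can be sketched even in plain text. The revenue figures below are hypothetical.

```python
# Revenue earned in different years (hypothetical figures).
revenue = {"2021": 40, "2022": 55, "2023": 72, "2024": 63}

# A minimal text "bar chart": one bar per category,
# bar length proportional to the value.
def bar_chart(data: dict, width: int = 40) -> list:
    peak = max(data.values())
    return [
        f"{label} | {'#' * round(value / peak * width)} {value}"
        for label, value in data.items()
    ]

for line in bar_chart(revenue):
    print(line)
```

In practice the same data dictionary would be handed to a charting tool (e.g. a spreadsheet or a plotting library) rather than printed as text.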
LINE CHART
▪ The line chart or line graph joins various data points, displaying them as a
continuous progression.
▪ Utilize line charts to observe trends in data, often over time (such as stock price movements).
PIE CHART
▪ A pie chart (or circle chart) is a circular graphical representation of data, divided into slices.
▪ In a pie chart, the arc length of each slice is proportionate to the value it depicts.
▪ The corporate world and the mass media make extensive use of pie charts.
FIGURE
MAP
▪ For displaying any type of location data, including postal codes and state abbreviations, maps are the natural choice.
▪ If the data is related with geographic information, maps are a simple and compelling way to visualise it.
▪ There should be a correlation between location and the patterns in the data.
DENSITY MAP
▪ Density maps indicate patterns or relative concentrations that might otherwise be hidden.
▪ Density maps are particularly useful when dealing with large data sets containing many overlapping data points.
SCATTER PLOTS
▪ Scatter plots are a useful tool for examining the connection between different variables.
GANTT CHART
▪ Gantt charts represent a project’s timeline or activity changes across time.
▪ A Gantt chart depicts tasks that must be accomplished before others may begin.
▪ However, Gantt charts are not restricted to projects; this graphic can depict any sequence of activities over time.
HISTOGRAM
▪ Histograms illustrate the distribution of the data among various groups. Histograms divide data into discrete categories and provide a bar proportionate to the number of observations in each category.
▪ This chart type might be used to show data such as the number of items falling in each category.
FIGURE
TOPIC - 3
TABLES
▪ Tables, often known as “crosstabs” or “matrices,” emphasize individual values above aesthetic formatting. They are one of the most prevalent methods for showing data and, thus, one of the most essential methods for analyzing data.
▪ Although presenting data in a table is largely a formatting exercise, visual features may be added to tables to make them more effective.
▪ Tables appear everywhere: in reports, menus, and within Microsoft Excel. It is crucial to know how to interpret tables and make the most of the information they provide since they are ubiquitous. It is also crucial for analysts and knowledge workers to learn how to make effective tables.
TOPIC - 4
REPORT DESIGN USING DATA VISUALIZATION
▪ After producing a report, the last thing one anticipates is for it to leave no impression on the reader. To avoid this, one must present the report in a style that is both attractive and easy to follow, keeping in mind the practices mentioned below.
VISUALIZATION
▪ Data Visualization is not limited to the creation of charts: choices such as form, size, colour, and labelling may have a significant effect on how well readers understand the visualisations.
10.5 TOOLS AND TECHNIQUES OF VISUALIZATION AND
GRAPHICAL PRESENTATION
▪ Tableau
TOPIC – 1 ▪ Microsoft Power BI
▪ Microsoft Excel
▪ QlikView
DATA VISUALISATION TOOLS
We will now examine some of the most successful data visualisation tools for data scientists and how these tools may boost productivity. Here are four popular data visualisation tools that may assist data scientists in making more compelling presentations.
Tableau
▪ Tableau is a data visualisation application for creating interactive graphs, charts and dashboards.
▪ Tableau Desktop is the first product of its kind; Tableau Public is a free version.
▪ It takes time and effort to understand Tableau, but there are several tools and resources available to learn it.
▪ For a data scientist, Tableau is among the most important tools to learn.
Microsoft Power BI
▪ Microsoft Power BI is a data visualisation tool for business intelligence that helps make sense of data.
▪ In addition, it provides a platform for end users to generate reports and share them.
▪ It serves as a centralized repository for all of the business data, which all of the authorised users can access.
Microsoft Excel
▪ Microsoft Excel is a data visualization tool which provides several options for viewing data, such as scatter plot, bar chart, histogram, pie chart and line chart.
▪ It is widely used to present scientific, medical, and economic data for market research and financial analysis.
QlikView
▪ QlikView is a data discovery platform that enables users to make quicker, better-informed decisions.
▪ It may mix diverse data sources with color-coded tables, bar graphs, line charts, and more.
QUESTION BANK
QUESTION NO. 5
A scatter plot displays several unique data points:
a) On a single graph.
ANSWER
QUES NO.: 1 2 3 4 5
ANS NO.: d d d d a
STATE TRUE OR FALSE
1. Data visualisation enhances the effect of communications for the audiences and delivers the most convincing data analysis outcomes.
3. Data presentation architecture (DPA) is a set of skills that aims to identify, find, modify, format, and present data in a manner that ideally conveys meaning.
4. Scatter plots are a useful tool for examining the connection between different variables.
ANSWER
QUES NO.: 1 2 3 4 5
ANSWER: T T T T T
FILL IN THE BLANKS
1 Data and insights available to decision-makers facilitate _______ analysis.
2 Broader
4 Maps
5 Density maps
CHAPTER 11
DATA ANALYSIS AND MODELLING
Chapter overview
11.1 Process, Benefits and Types of Data Analysis
11.1 – PROCESS, BENEFITS AND TYPES OF DATA ANALYSIS
TOPIC – 1
INTRODUCTION TO DATA ANALYTICS
▪ Data analytics is the science of evaluating unprocessed datasets to draw conclusions about the information they contain.
▪ It helps us to identify patterns in the raw data and extract useful information from them.
▪ Here data are evaluated and used to assist firms in gaining a deeper understanding of their audience, developing content strategies, and creating new products. Data analytics thereby enables better decisions.
TOPIC – 2
PROCESS OF DATA ANALYTICS
Following are the steps for data analytics:
STEP 1: CRITERIA FOR GROUPING DATA – DATA MAY BE SEGMENTED by a variety of PARAMETERS, including age, population, income, and sex. The data values might be either numeric or categorical.
STEP 2: COLLECTING THE DATA – DATA MAY BE GATHERED from SEVERAL SOURCES, including internet sources, computers and community sources.
STEP 3: ORGANIZING THE DATA – After collecting the data, IT MUST BE ARRANGED SO THAT IT CAN BE ANALYSED.
STEP 4: CLEANING THE DATA – Before data is sent to a data analyst for analysis, IT IS CLEANED SO THAT THERE ARE NO DUPLICATES OR ERRORS IN THE DATA.
STEP 5: ADOPT THE RIGHT TYPE OF ANALYTICS –
1) Descriptive analytics
2) Diagnostic analytics
3) Predictive analytics
4) Prescriptive analytics
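The steps above can be sketched in miniature with Python, using hypothetical survey records: records are organized, cleaned of duplicates and errors, and then summarized by the grouping criterion.

```python
from collections import Counter

# Hypothetical survey records: (age_group, income) tuples
# collected from several sources (Step 2).
raw = [
    ("18-25", 21_000), ("26-40", 45_000), ("18-25", 21_000),  # duplicate
    ("41-60", 58_000), ("26-40", None),                        # error
]

# Step 3: organize - arrange records so they can be analysed.
organized = sorted(raw, key=lambda r: r[0])

# Step 4: clean - remove duplicates and records with errors.
cleaned = []
for rec in organized:
    if rec[1] is not None and rec not in cleaned:
        cleaned.append(rec)

# Step 1's grouping criterion (age group) drives a descriptive summary.
by_group = Counter(group for group, _ in cleaned)
print(by_group)
```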
TOPIC - 3
BENEFITS OF DATA ANALYTICS
Following are the benefits of data analytics:
IMPROVES DECISION MAKING PROCESS
▪ Companies can use the information gained from data analytics to base their decisions on evidence rather than guesswork;
▪ Using advanced data analytics technologies, you can continuously collect and analyse new data;
▪ When firms have a better understanding of their audience’s demands, they spend their time and money more effectively;
▪ This enables the company to tailor stakeholders’ experiences to their needs, provide more personalization, and build stronger relationships with them.
11.2 – DATA MINING AND IMPLEMENTATION OF DATA MINING
TOPIC - 1
INTRODUCTION TO DATA MINING
▪ Given the advancement of DATA WAREHOUSING TECHNOLOGIES and the expansion of BIG DATA, the use of data mining techniques has advanced dramatically over the past two decades, helping firms turn raw data into useful INFORMATION.
▪ DATA MINING, also known as KNOWLEDGE DISCOVERY IN DATA (KDD), is the extraction of patterns and other useful information from massive data sets. Through smart data analytics, this information supports better decisions.
▪ Nevertheless, technology must always keep evolving to manage data on an ever-larger scale.
▪ The DATA MINING TECHNIQUES behind these investigations predict results using machine learning algorithms. These strategies are used to organise and filter data, surfacing the most pertinent information.
TOPIC - 2
PROCESS OF DATA MINING
▪ The PROCESS OF DATA MINING comprises a series of procedures, from data collecting through visualisation, in order to EXTRACT USEFUL INFORMATION FROM MASSIVE DATA SETS. Using OUTLIER DETECTION TECHNIQUES, analysts discover outliers for use cases such as SPAM IDENTIFICATION.
▪ The process broadly involves: 1) Setting the business objective; 2) Preparing the data; 3) Model building and pattern mining; 4) Assessing outcomes.
SETTING THE BUSINESS OBJECTIVE
▪ This might be the MOST DIFFICULT ELEMENT IN THE DATA MINING process, yet many organisations spend too little time on it;
▪ Together, data scientists and business stakeholders must identify the business problem and determine which collection of data will assist the company in answering the crucial questions.
PREPARING THE DATA
▪ Once the pertinent DATA HAS BEEN COLLECTED, IT WILL BE CLEANSED BY ELIMINATING ANY noise, such as duplicates, missing values and outliers;
▪ Based on the dataset, AN EXTRA STEP MAY BE DONE TO MINIMISE THE NUMBER OF DIMENSIONS, since too many features can slow down the subsequent computation.
MODEL BUILDING AND PATTERN MINING
▪ Data scientists may study any intriguing relationship between the data, such as sequential patterns, association rules or correlations;
▪ Depending on the available data, DEEP LEARNING ALGORITHMS may also be utilised to classify or cluster the data set;
▪ If the INPUT DATA IS LABELLED (i.e., supervised learning), a CLASSIFICATION MODEL may be used to categorise the data; in unsupervised learning, individual data points in the training set are compared to uncover underlying commonalities, then grouped into clusters.
ASSESSING OUTCOMES
▪ When finalising results, they MUST BE VALID, ORIGINAL, PRACTICAL, AND COMPREHENSIBLE;
▪ When this criterion is satisfied, companies can execute new strategies based on these findings.
TOPIC – 3
TECHNIQUES OF DATA MINING
Using various methods and approaches, data mining transforms vast quantities of data into useful information.
ASSOCIATION RULES
▪ These methodologies are commonly employed for MARKET BASKET ANALYSIS, enabling firms to better understand the relationships between the products in their TRANSACTION DATA.
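A market basket analysis over hypothetical transactions can be sketched by counting how often item pairs co-occur (support) and how often one item implies another (confidence):

```python
from itertools import combinations
from collections import Counter

# Made-up transaction data; each set is one shopping basket.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "milk"},
]

item_counts = Counter(i for t in transactions for i in t)
pair_counts = Counter(
    pair for t in transactions for pair in combinations(sorted(t), 2)
)

n = len(transactions)
# Support: fraction of baskets containing both items.
support = pair_counts[("bread", "butter")] / n
# Confidence: of baskets with bread, how many also contain butter.
confidence = pair_counts[("bread", "butter")] / item_counts["bread"]
print(support, confidence)
```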
NEURAL NETWORKS
▪ Every node has INPUTS, WEIGHTS, A BIAS (OR THRESHOLD), AS WELL AS AN OUTPUT;
▪ If the output value exceeds a predetermined threshold, the node “fires” and passes the data to the next layer of the network;
▪ When the cost function is at or close to zero (0), we may have confidence in the model’s accuracy.
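A single node of this kind can be sketched as a function; the weights, bias and inputs below are illustrative numbers only:

```python
# One neural-network node: a weighted sum of the inputs plus a bias;
# the node "fires" (outputs 1) only if the sum exceeds the threshold.
def node(inputs, weights, bias, threshold=0.0):
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > threshold else 0

print(node([1.0, 0.5], [0.6, 0.4], bias=-0.5))  # 0.6 + 0.2 - 0.5 = 0.3 -> fires
print(node([1.0, 0.5], [0.6, 0.4], bias=-1.0))  # 0.8 - 1.0 = -0.2 -> does not fire
```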
DECISION TREES
▪ This technique uses classification or regression to classify or predict potential outcomes on the basis of a collection of decisions.
K-NEAREST NEIGHBOURS (KNN)
▪ This technique assumes that COMPARABLE DATA POINTS exist in close proximity to one
another;
▪ It calculates the distance between data points, typically using the EUCLIDEAN DISTANCE, and then assigns a category based on the most common category or the average value of the nearest neighbours.
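A minimal KNN sketch on made-up two-dimensional points, using the Euclidean distance and a majority vote among the k closest neighbours:

```python
from collections import Counter
import math

# Hypothetical labelled training points.
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((5.0, 5.0), "high"),
    ((6.0, 5.5), "high"),
]

def knn(point, k=3):
    # Sort training points by Euclidean distance to the query point,
    # then return the most common label among the k nearest.
    nearest = sorted(training, key=lambda t: math.dist(point, t[0]))[:k]
    labels = Counter(label for _, label in nearest)
    return labels.most_common(1)[0][0]

print(knn((1.2, 1.5)))  # neighbours are mostly "low"
```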
TOPIC - 4
IMPLEMENTATION OF DATA MINING IN FINANCE AND MANAGEMENT
The widespread use of data mining techniques by business intelligence and data
analytics teams enables them to harvest insights for their organisations and industries.
Utilizing data mining techniques, hidden patterns and future trends may be predicted.
▪ In today’s society, DATA MINING TECHNIQUES have advanced to the point where they are applied in virtually every industry;
▪ The data mining methodology provides a mechanism for banks to detect or prevent fraud, allowing a vast volume of data to be correctly and reliably evaluated with the aid of automated tools;
▪ With data mining, it is possible to track earnings, margins, etc. and determine which products and customers are the most profitable;
▪ Data mining aids in the management of all critical data and massive databases by uncovering the patterns and relationships hidden within them.
11.3 – “ANALYTICS AND MODEL BUILDING (DESCRIPTIVE,
DIAGNOSTIC, PREDICTIVE, PRESCRIPTIVE)”.
TOPIC – 1
DESCRIPTIVE ANALYTICS
INTRODUCTION
▪ Descriptive analytics is a frequently employed style of data analysis in which historical data is collected, organised and presented in an easily understandable form;
▪ Descriptive analytics serves as a basic starting point to inform or prepare data for subsequent analysis;
▪ Descriptive analytics uses two key techniques: data aggregation and data mining (also known as data discovery). The process of gathering and organising data into digestible data sets is called DATA AGGREGATION.
According to Dan Vesset, the process of descriptive analytics may be broken into five
broad steps:
STEP 1: DECIDING THE BUSINESS METRICS — The key metrics that will evaluate performance against business goals are decided first.
STEP 2: IDENTIFICATION OF DATA REQUIREMENT — The data required to measure these metrics is identified from the available sources.
STEP 3: COLLECTION AND PREPARATION OF DATA — Data preparation, which includes transformation and cleaning, is a crucial step for ensuring correctness; it is also one of the most time-consuming steps.
STEP 4: ANALYSIS OF DATA — Utilizing clustering and regression analysis, we discover data trends and evaluate performance.
STEP 5: PRESENTATION OF DATA — Lastly, charts and graphs are utilised to portray findings in a manner that non-experts in analytics may comprehend.
▪ The number of followers, likes, and posts may be utilised to calculate, for example,
the average number of replies per post, page visits, and response time.
▪ Past events, such as sales and operational data or marketing campaigns, are
summarized;
▪ Social media usage, such as Instagram or Facebook likes, is an example of such
information.
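The averages mentioned above can be computed directly; the numbers below are invented for illustration:

```python
# Descriptive analytics sketch: summarise what happened on a
# hypothetical social-media account by averaging per-post metrics.
posts = [
    {"post": 1, "replies": 14, "likes": 120},
    {"post": 2, "replies": 6,  "likes": 80},
    {"post": 3, "replies": 10, "likes": 100},
]

avg_replies = sum(p["replies"] for p in posts) / len(posts)
avg_likes = sum(p["likes"] for p in posts) / len(posts)
print(f"average replies per post: {avg_replies}")
print(f"average likes per post: {avg_likes}")
```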
ADVANTAGES AND DISADVANTAGES OF DESCRIPTIVE ANALYTICS
▪ Because descriptive analytics depends only on historical data, this technique is easily applicable to day-to-day operations and does not need an in-depth understanding of analytics;
▪ This implies that firms may report on performance very quickly and acquire insights that can be acted upon.
TOPIC - 2
DIAGNOSTIC ANALYTICS
INTRODUCTION
▪ In diagnostic analytics, tools are employed to question the data – “Why did this happen?” – in order to obtain deeper insights;
▪ Descriptive analytics is the first phase in the data analysis process, whereas diagnostic analytics goes a step further by revealing the rationale behind particular outcomes;
▪ Typical strategies for diagnostic analytics include data discovery, data mining, drill-down and correlations, applied to structured as well as unstructured data.
ADVANTAGES
▪ Data plays an increasingly important role in every organisation; diagnostic tools help to make the most of the data by turning it into visuals and insights that can be utilised by everyone;
▪ Diagnostic analytics enables firms to derive value from their data by asking the relevant questions.
TOPIC – 3
PREDICTIVE ANALYTICS
INTRODUCTION
▪ Predictive analytics, as implied by its name, focuses on forecasting and
understanding what might occur in the future, whereas descriptive analytics focuses
on previous data;
▪ By examining historical data patterns, trends and customer insights, it is possible to predict what may occur in the future and, as a result, make better-informed business decisions;
▪ Techniques such as data mining and machine learning algorithms (classification, regression and clustering) are used to build predictive models from past data;
▪ Deep learning is a more recent subfield of machine learning that imitates the building of “human brain networks as layers of nodes that understand a specific process”;
▪ No prediction is entirely precise, but predictive analytics may serve as a crucial tool for forecasting probable future occurrences;
▪ Customer service – it may aid a business in gaining a deeper knowledge of who its clients are and what they want, so that it can personalise its suggestions;
▪ Risk mitigation;
▪ This kind of analysis requires the availability of historical data, typically in enormous
quantities.
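As a sketch of the idea, a simple least-squares trend line fitted to made-up monthly sales can extrapolate the next period; real predictive models are considerably richer:

```python
# Predictive analytics sketch: fit a straight-line trend to historical
# (hypothetical) monthly sales with least squares, then forecast month 5.
sales = [100, 110, 125, 135, 150]          # months 0..4
n = len(sales)
xs = range(n)

mean_x = sum(xs) / n
mean_y = sum(sales) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, sales)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

forecast = intercept + slope * n           # extrapolate one month ahead
print(round(forecast, 1))
```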
EXAMPLE OF PREDICTIVE ANALYTICS
The following are some industries in which predictive analysis might be utilised:
▪ Sales – estimating the possibility that a buyer will buy another item or depart the
shop.
▪ Human resources – identifying employees who are contemplating resigning and urging
them to remain.
TOPIC – 4
PRESCRIPTIVE ANALYTICS
INTRODUCTION
▪ DESCRIPTIVE ANALYTICS DESCRIBES WHAT HAS OCCURRED, DIAGNOSTIC ANALYTICS EXPLORES WHY IT OCCURRED, PREDICTIVE ANALYTICS FORECASTS WHAT MAY OCCUR, and PRESCRIPTIVE ANALYTICS SUGGESTS WHAT SHOULD BE DONE NEXT;
▪ This approach is the fourth, final, and most sophisticated step of the business analysis process, and it is the one that urges firms to action by assisting executives and managers in making the best possible decisions;
▪ A multitude of approaches and tools – such as rules, statistics, and machine learning algorithms – may be applied to accessible data, including internal data (from within the organisation) and external data (such as social media data);
▪ Machine learning makes it possible to process enormous amounts of data without explicit instructions, and to adapt and become smarter over time.
EXAMPLES OF PRESCRIPTIVE ANALYTICS
Prescriptive analysis applications include the following:
▪ Healthcare – lowering patient readmission rates;
11.4 – “STANDARDS FOR DATA TAGGING AND REPORTING (XML,
XBRL)”
▪ Participants
TOPIC - 1
EXTENSIBLE MARKUP LANGUAGE (XML)
INTRODUCTION
▪ XML is a file format and markup language for storing, transferring, and recreating arbitrary data. It specifies a set of standards for encoding documents in a format that is understandable by both humans and machines. XML is defined by the 1998 XML 1.0 Specification of the World Wide Web Consortium and numerous other related specifications, all of which are free open standards. XML is a textual data format with significant support for many human languages via Unicode.
▪ Several schema systems exist to help in the design of XML-based languages;
▪ The sharing of data between dissimilar systems is a principal function of XML. In order for two dissimilar systems to share data, they must agree on a common data format and structure the data systematically;
▪ The data structure is represented by XML tags, which also contain information. The information included within the tags is encoded according to the XML standard. A supplementary XML schema (XSD) defines the required metadata for reading and verifying XML. This is likewise known as the canonical schema. A “well-formed” XML document follows XML’s syntactic rules, while a “valid” document additionally conforms to its schema.
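A small sketch using Python’s standard xml.etree module: the element names below are hypothetical, but parsing fails unless the document is well-formed.

```python
import xml.etree.ElementTree as ET

# A hypothetical, minimal XML fragment tagging financial data.
doc = """
<financials currency="INR">
    <revenue>500000</revenue>
    <expenses>320000</expenses>
</financials>
"""

root = ET.fromstring(doc)   # raises ParseError if the XML is not well-formed
revenue = int(root.find("revenue").text)
expenses = int(root.find("expenses").text)
print(root.tag, root.attrib["currency"], revenue - expenses)
```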
TOPIC - 2
EXTENSIBLE BUSINESS REPORTING LANGUAGE (XBRL)
INTRODUCTION
▪ XBRL is a data description language that facilitates the interchange of STANDARDISED, RELIABLE BUSINESS INFORMATION and the extraction of financial data across all software types and advanced technology, including the Internet.
▪ XBRL allows organisations to arrange data using TAGS. When a piece of data is labelled as “revenue,” for instance, XBRL enabled applications know that it pertains to revenue and can treat it consistently across financial documents;
▪ With XBRL, a business, a person, or another software programme may quickly produce, exchange and analyse financial reports; the elements of the Cost Audit Report and Compliance Report can be mapped into XBRL taxonomies.
COMPARISON AND ANALYTIC CAPABILITY — Access, comparison, and analytic capabilities for information are unparalleled.
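As a loose illustration of tagging (not a real XBRL taxonomy – the element names are invented), structured tags can be generated programmatically:

```python
import xml.etree.ElementTree as ET

# Illustrative only: real XBRL instance documents reference published
# taxonomies; the element names below are hypothetical stand-ins.
report = ET.Element("report")
ET.SubElement(report, "revenue", unit="INR").text = "500000"
ET.SubElement(report, "netProfit", unit="INR").text = "80000"

xml_bytes = ET.tostring(report)
print(xml_bytes.decode())
```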
PARTICIPANTS
11.5 – “CLOUD COMPUTING, BUSINESS INTELLIGENCE,
ARTIFICIAL INTELLIGENCE, ROBOTIC PROCESS AUTOMATION
AND MACHINE LEARNING”.
TOPIC – 1: CLOUD COMPUTING
▪ Introduction
✓ Public Cloud
✓ Hybrid Cloud
TOPIC – 2: BUSINESS INTELLIGENCE
▪ Introduction
▪ BI Methods
TOPIC – 3: ARTIFICIAL INTELLIGENCE
▪ Introduction
TOPIC – 4: MACHINE LEARNING VS. DEEP LEARNING
TOPIC – 5: ROBOTIC PROCESS AUTOMATION
▪ Introduction
▪ Benefits of RPA
TOPIC – 6: MACHINE LEARNING
▪ Introduction
✓ Supervised Learning
✓ Unsupervised Machine Learning
✓ Reinforcement Learning
✓ Dimensionality Reduction
TOPIC – 1
INTRODUCTION TO CLOUD COMPUTING
Before the advent of cloud computing, businesses had to acquire and
LOCAL SERVER
OPERATE THEIR OWN SERVERS TO SUIT THEIR DEMANDS.
“THE CLOUD.”
TRAFFIC VOLUMES.
LESS COSTLY RELIANCE ON COSTLY ONSITE SERVERS, MAINTENANCE STAFF, AND OTHER IT
RESOURCES.
PUBLIC CLOUD
INTRODUCTION
▪ The public cloud stores and MANAGES ACCESS TO DATA AND APPLICATIONS THROUGH THE INTERNET;
▪ Because these resources are offered through the web, the public cloud allows organisations to scale capacity up or down without investing in their own hardware.
TOPIC – 2
INTRODUCTION TO BUSINESS INTELLIGENCE
▪ Business intelligence includes data mining, data visualisation and best practices to help organisations make more DATA-DRIVEN DECISIONS;
▪ When you have a complete picture of your organization’s data and utilise it to drive change, remove inefficiencies, and swiftly adjust to market or supply changes, you have modern business intelligence.
BI METHODS
▪ Business intelligence is a broad term that ENCOMPASSES THE PROCEDURES AND METHODS of collecting, storing and analysing data to support decision making. Key BI methods include:
STATISTICAL ANALYSIS — Taking the results of descriptive analytics and using statistics to further explore the data, such as how and why a pattern occurred.
QUERYING — BI extracts responses from data sets in response to data-specific queries.
PERFORMANCE METRICS AND BENCHMARKING — Comparing current performance data to previous performance data in order to measure performance.
TOPIC - 3
INTRODUCTION TO ARTIFICIAL INTELLIGENCE
▪ As expected with any newly developing technology on the market, AI development is still surrounded by considerable hype;
▪ Artificial intelligence is, in its simplest form, a topic that COMBINES COMPUTER SCIENCE AND ROBUST DATASETS TO ENABLE PROBLEM-SOLVING;
▪ IT INCLUDES THE SUBFIELDS OF MACHINE LEARNING AND DEEP LEARNING, WHICH ARE COMMONLY MENTIONED TOGETHER WITH ARTIFICIAL INTELLIGENCE. These disciplines aim to develop expert systems that make predictions or classifications based on input data;
▪ Everyday examples include CHATBOTS AND VIRTUAL ASSISTANTS;
▪ Stuart Russell and Peter Norvig published “Artificial Intelligence: A Modern Approach”, which has since become one of the MOST INFLUENTIAL AI TEXTBOOKS. In it, they discuss four potential goals or definitions of AI, differentiating computer systems on two dimensions:
Human approach: systems that think like humans; systems that act like humans.
Ideal approach: systems that think rationally; systems that act rationally.
▪ John McCarthy defined artificial intelligence as, “The science and engineering of making intelligent machines, especially intelligent computer programs.”
TYPES OF ARTIFICIAL INTELLIGENCE – WEAK AI VS. STRONG AI
WEAK AI or NARROW AI or ARTIFICIAL NARROW INTELLIGENCE (ANI)
▪ Weak AI, also known as Narrow AI or Artificial Narrow Intelligence (ANI), is AI that has been trained and tuned to do particular tasks;
▪ Most of the AI that surrounds us today is powered by weak AI;
▪ This form of artificial intelligence is anything but feeble;
▪ Examples include: APPLE’S SIRI, AMAZON’S ALEXA, IBM WATSON, AND DRIVERLESS CARS, AMONG OTHERS.
STRONG AI or ARTIFICIAL GENERAL INTELLIGENCE (AGI)
▪ A theoretical KIND OF ARTIFICIAL INTELLIGENCE in which a machine possesses:
▪ Human-level intellect;
▪ Self-aware consciousness; and
▪ The ability to solve problems, learn, and plan for the future.
ARTIFICIAL SUPER INTELLIGENCE (ASI)
▪ Super Intelligence, also known as Artificial Super Intelligence (ASI), would TRANSCEND THE INTELLIGENCE AND CAPABILITIES OF THE HUMAN BRAIN;
▪ Despite the fact that strong AI is YET TOTALLY theoretical, with no practical examples in use today, researchers continue to explore its development.
TOPIC 4 - MACHINE LEARNING VS. DEEP LEARNING
POINT – 1
▪ Deep learning and machine learning are frequently used INTERCHANGEABLY;
ARTIFICIAL INTELLIGENCE
▪ Intelligence, as defined in Chamber’s dictionary, is the ability to “solve problems and adapt to new situations”.
MACHINE LEARNING
▪ Machine Learning is a type of Artificial Intelligence;
▪ In ML, we TEACH machines with data to perform specific tasks.
DEEP LEARNING
▪ Deep learning automates a significant portion of the feature-extraction process, learning a hierarchy of characteristics that differentiate certain categories of data.
MACHINE LEARNING VS. DEEP LEARNING
POINT – 2
MACHINE LEARNING DEEP LEARNING
Deep learning and machine learning differ in how their respective algorithms learn.
MACHINE LEARNING VS. DEEP LEARNING
POINT – 3
MACHINE LEARNING DEEP LEARNING
▪ “Classical” or “non-deep” machine learning requires more human interaction to learn;
▪ Deep learning does not require human interaction to interpret data.
TOPIC - 5
INTRODUCTION TO ROBOTIC PROCESS AUTOMATION
▪ With RPA, software users develop software robots or “BOTS” that are capable of learning, mimicking, and then executing rules-based business processes;
▪ Robotic Process Automation software bots can communicate with any application or system in the same manner that humans can, with the exception that RPA bots can operate around the clock and with far greater speed and dependability;
▪ Robotic Process Automation bots possess a digital skill set that exceeds that of
humans. Consider RPA bots to be a Digital Workforce capable of interacting with any
system or application.
BENEFITS OF RPA
1) Higher productivity;
2) Higher accuracy;
3) Saving of cost;
4) Scalability;
5) Harnessing AI.
TOPIC – 6
INTRODUCTION TO MACHINE LEARNING
▪ Machine learning (ML) is a branch of study devoted to developing systems that “LEARN” FROM DATA AND DRAW CONCLUSIONS WITHOUT BEING EXPLICITLY TAUGHT TO DO SO, i.e., programs that are capable of machine learning can complete tasks without being expressly designed to do so;
▪ Machine learning algorithms CONSTRUCT A MODEL BASED ON TRAINING DATA AND SAMPLE DATA in order to make predictions or decisions; applications include email filtering, speech recognition etc.
APPROACHES TOWARDS MACHINE LEARNING
On the basis of the type of “signal” or “feedback” provided to the learning
system, machine learning systems are generally categorized into five major
categories:
SUPERVISED LEARNING
▪ Supervised learning algorithms CONSTRUCT a MATHEMATICAL MODEL OF a DATASET;
▪ The DATASET INCLUDES inputs and expected outcomes;
▪ It CONSISTS of a collection of training examples and is known as TRAINING DATA;
▪ The training data is REPRESENTED by a MATRIX, each training example being a row (feature vector).
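A minimal sketch of supervised learning, with a made-up feature matrix and labels; the “model” here is just a nearest-mean rule, standing in for a real classifier:

```python
# Supervised learning sketch: training data is a matrix of inputs (X)
# with expected outcomes (y); the fitted model predicts labels for
# new inputs. Numbers and labels are invented for illustration.
X = [[1.0], [1.2], [0.9],     # feature matrix, one example per row
     [3.0], [3.2], [2.9]]
y = ["small", "small", "small", "large", "large", "large"]

# "Training": compute the mean feature value per label.
means = {}
for label in set(y):
    rows = [x[0] for x, lab in zip(X, y) if lab == label]
    means[label] = sum(rows) / len(rows)

def predict(value):
    # Assign the label whose training mean is closest to the input.
    return min(means, key=lambda lab: abs(means[lab] - value))

print(predict(1.1), predict(3.1))
```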
UNSUPERVISED LEARNING
▪ Unsupervised learning approaches utilize a DATASET COMPRISING JUST INPUTS to IDENTIFY DATA STRUCTURE;
▪ EXAMPLES: GROUPING, CLUSTERING.
CLUSTERING
POINT – 1
Clustering is the assignment of observations into subsets (called clusters) so that, based on one or more preset criteria, observations within the same cluster are similar while observations from different clusters are different.
POINT – 2
Different clustering approaches necessitate varying assumptions regarding the structure of the data, which is frequently characterized by a similarity metric and evaluated, for example, by internal compactness and the separation between different clusters.
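The clustering idea can be sketched as a tiny one-dimensional k-means with two clusters; the values and the (naive) initialisation are made up:

```python
# Unsupervised clustering sketch: group similar values by iteratively
# assigning each value to the nearest cluster centre (the similarity
# metric is simple distance), then recomputing the centres.
values = [1.0, 1.2, 0.8, 8.0, 8.4, 7.9]
centres = [values[0], values[3]]          # naive initialisation

for _ in range(10):                        # a few refinement passes
    clusters = [[], []]
    for v in values:
        nearest = min((0, 1), key=lambda i: abs(v - centres[i]))
        clusters[nearest].append(v)
    centres = [sum(c) / len(c) for c in clusters]

print(sorted(clusters[0]), sorted(clusters[1]))
```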
SEMI-SUPERVISED LEARNING
▪ Semi-supervised learning is INTERMEDIATE BETWEEN UNSUPERVISED LEARNING (without labelled training data) AND SUPERVISED LEARNING (with completely labelled training data).
REINFORCEMENT LEARNING
1.) Reinforcement learning is a subfield of machine learning concerned with determining how software agents ought to take actions in an environment so as to maximise some notion of “CUMULATIVE REWARD”;
2.) The environment is typically represented in the form of a Markov decision process (MDP);
3.) Autonomous cars and learning to play a game against a human opponent both employ reinforcement learning;
4.) Due to the field’s generic nature, it is explored in several different fields, including game theory, control theory, operations research, information theory, statistics and genetic algorithms.
DIMENSIONALITY REDUCTION
▪ PRINCIPAL COMPONENT ANALYSIS (PCA) is a well-known technique for dimensionality reduction.
▪ PCA includes transforming data with more dimensions (e.g., 3D) to a smaller space (e.g., 2D). This results in a decreased data dimension (2D as opposed to 3D) while retaining as much of the variation in the original data as possible;
▪ In other words, it is the PROCESS OF LOWERING THE SIZE OF THE FEATURE SET while preserving most of the information the features carry.
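A compact PCA sketch using NumPy (assumed available): centre the data, then project it onto the top two principal axes obtained from the SVD.

```python
import numpy as np

# PCA sketch: project made-up 3-D points onto their top two principal
# components, reducing the dimension from 3-D to 2-D.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))               # 50 samples, 3 features

X_centred = X - X.mean(axis=0)             # centre each feature
# Right singular vectors of the centred data are the principal axes.
_, _, Vt = np.linalg.svd(X_centred, full_matrices=False)
X_2d = X_centred @ Vt[:2].T                # keep the top 2 components

print(X.shape, "->", X_2d.shape)
```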
11.6 – MODEL VS DATA – DRIVEN DECISION MAKING
INTRODUCTION
▪ A crucial feature of data science to keep in mind is that POOR DATA WILL NEVER RESULT IN GOOD, RELIABLE MODELS (“garbage in, garbage out”);
▪ In artificial intelligence, there are two schools of thought: data-driven and model-driven;
▪ The data-driven strategy focuses on enhancing data quality in order to ENHANCE THE PERFORMANCE OF THE MODEL, rather than on sophisticated model manipulations.