DA Marathon Notes

8.1 – MEANING, NATURE, PROPERTIES, SCOPE OF DATA

TOPIC – 1 INTRODUCTION: DATA, INFORMATION, KNOWLEDGE

TOPIC – 2 NATURE OF DATA
▪ Numerical Data
▪ Descriptive Data
▪ Graphic Data

Now, let’s understand Data, Information and Knowledge.
DATA, INFORMATION, KNOWLEDGE

DATA
▪ POINT – 1: Any ‘data’ on its own does not confer any meaning. OR Any ‘data’ on its own can’t convey any information.
▪ POINT – 2: The idea of data in the syllabus is frequently described as ‘raw’ data, which is a collection of meaningless text, numbers, and symbols.
▪ *DATA = RAW DATA*

INFORMATION
▪ POINT – 1: Data needs to be processed for gathering information.
▪ POINT – 2: Most commonly, we take the help of computers and software packages for processing data.
▪ *INFORMATION = DATA + CONTEXT*

KNOWLEDGE
▪ POINT – 1: When this ‘information’ is used for solving a problem, we may say it’s the use of knowledge.
▪ *KNOWLEDGE = INFORMATION + APPLICATION OF IT*
RELATIONSHIP BETWEEN DATA, INFORMATION, KNOWLEDGE

DATA → INFORMATION → KNOWLEDGE

▪ DATA is a source of INFORMATION, and INFORMATION needs to be processed for gathering KNOWLEDGE.
▪ The goal of any information system is to transform data into information and generate knowledge that can be used for decision making.
Now let’s understand the NATURE OF DATA.

NATURE OF DATA
Over time, the magnitude and availability of data has grown exponentially. However, data sets may be classified into different groups as below:

(i) NUMERICAL DATA (Note – 1): Any data expressed as a number is numerical data.

(ii) DESCRIPTIVE DATA (Note – 2): Information gets deciphered in the form of qualitative information.

(iii) GRAPHIC DATA (Note – 3):
▪ A picture or graphic may tell a thousand stories.
▪ Data may also be presented in the form of a picture or graphics.
8.2 – TYPES OF DATA IN FINANCE AND COSTING

TOPIC – 1 INTRODUCTION AND KINDS OF DATA

TOPIC – 2 TYPES OF DATA

TOPIC – 1
INTRODUCTION & KINDS OF DATA
▪ Data plays a very important role in the study of finance and cost accounting. From the inception of the study of finance, accounting and cost accounting, data has always played an important role. Be it in the form of financial statements, cost statements etc., finance and accounting professionals have played a significant role in helping management make prudent decisions.
▪ The kinds of data used in finance and costing may be quantitative as well as qualitative in nature.

QUANTITATIVE FINANCIAL DATA
▪ By the term ‘quantitative data’, we mean DATA EXPRESSED IN NUMBERS.
▪ Stock price data, financial statements etc. are examples of quantitative data, as most financial records are maintained in the form of organised numerical data.

QUALITATIVE FINANCIAL DATA
▪ However, SOME DATA IN FINANCIAL STUDIES MAY APPEAR IN A QUALITATIVE FORMAT, e.g., text, videos, audio etc.
▪ For example, the ‘Management discussion and analysis’ presented as part of the annual report of a company is mostly presented in the form of text.

TYPES OF DATA

NOMINAL SCALE: It is used for categorising/classifying data.

ORDINAL SCALE: It is used for classifying data and putting it in order.

INTERVAL SCALE: It is used for categorising and ranking using an equal interval scale.

RATIO SCALE: These are the ultimate nirvana when it comes to data measurement scales because they tell us about the order, they tell us the exact value between units, AND they also have an absolute zero.
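To make the four scales concrete, here is a small illustrative sketch in Python (not part of the original notes; the column names and values are invented) showing how nominal and ordinal data differ from interval and ratio data:

```python
# Illustrative sketch only: the four measurement scales in pandas.
import pandas as pd

df = pd.DataFrame({
    "city": ["Mumbai", "Delhi", "Mumbai"],   # nominal: categories, no order
    "rating": ["low", "high", "medium"],     # ordinal: ordered categories
    "temp_c": [31.0, 24.5, 29.0],            # interval: equal steps, no true zero
    "revenue": [120.0, 95.5, 240.0],         # ratio: true zero, ratios meaningful
})

# Nominal: unordered categorical
df["city"] = pd.Categorical(df["city"])

# Ordinal: ordered categorical, so comparisons like < and > are meaningful
df["rating"] = pd.Categorical(
    df["rating"], categories=["low", "medium", "high"], ordered=True)

print(df["rating"] > "low")                  # valid only on an ordinal scale
print(df["revenue"] / df["revenue"].sum())   # ratios make sense only on a ratio scale
```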
8.3 – DIGITIZATION OF DATA AND INFORMATION

TOPIC – 1 OBJECTIVES OF DIGITIZATION

TOPIC – 2 LARGEST DIGITIZATION PROJECT

TOPIC – 3 WHY DO WE DIGITIZE?

TOPIC – 4 HOW DO WE DIGITIZE?
OBJECTIVES OF DIGITIZATION
1.) To provide “WIDESPREAD ACCESS” to data and information for a very large group of users simultaneously.
2.) It helps in the “PRESERVATION OF DATA” for a longer period.

LARGEST DIGITIZATION PROJECT
One of the largest digitization projects taken up in India is the “UNIQUE IDENTIFICATION NUMBER” (UID) or “AADHAR”.

Let’s Learn – WHY DO WE DIGITIZE? – 8 Points
SET – 1
Integration: Higher integration with BUSINESS INFORMATION SYSTEMS;
Who will access?: Digitized records may be accessed by MORE THAN ONE PERSON SIMULTANEOUSLY;
From where?: Can be accessed from MULTIPLE LOCATIONS THROUGH NETWORKED SYSTEMS;
Work: Helps in work PROCESSING;
Rise: Increased scope for a rise in ORGANIZATIONAL PRODUCTIVITY;
SET – 2
Class – Index: Improves classification and indexing of documents; this HELPS IN RETRIEVAL OF THE RECORDS;
Back-up: EASIER TO KEEP BACK-UP FILES and retrieve them during any unexpected disaster;
Physical: Requires LESS PHYSICAL STORAGE SPACE.
HOW DO WE DIGITIZE?

Phase – 1: Justification of the proposed digitization project
Phase – 2: Assessment
Phase – 3: Planning
Phase – 4: Digitization activities
Phase – 5: Processes in the care of records
Phase – 6: Evaluation
PHASE – 3 “PLANNING” (with a recap of Phases 1–2)

JUSTIFICATION OF THE PROPOSED DIGITIZATION PROJECT
e.g., BENEFIT > 5 Crores.

ASSESSMENT
Resources required, e.g., MACHINES – 100 (reqd.); MANPOWER – 2,000 (reqd.).

PLANNING
▪ SELECTION OF DIGITISATION APPROACH: In-house; Outsourced agency; In batches (since modular); or On demand.
▪ RESOURCES MANAGEMENT (since resource abundance is not possible).
▪ RISK MANAGEMENT: Data safety; Employee turnover; Possibility of cyber attacks; Possibility of natural disaster.

Note – 1
STAGES OF PLANNING
Selection of digitization approach | Project documentation | Resources management | Technical specifications | Risk management
PHASE 4: DIGITIZATION ACTIVITIES
Upon the completion of the assessment and planning phases, the digitization activities start. The Wisconsin Historical Society developed a six-phase process viz. Planning, Capture, Primary quality control, Editing, Secondary quality control, and Storage and management.

PLANNING AND CAPTURE: The planning schedule is prepared at the first stage; calibration of hardware/software, scanning etc. is done next.

PRIMARY QUALITY CONTROL: A primary quality check is done on the output to check reliability.

EDITING: Cropping, colour correction, assigning metadata etc. is done at the editing stage.

SECONDARY QUALITY CONTROL: A final check of quality is done on randomly selected samples.

STORAGE AND MANAGEMENT: And finally, user copies are created and uploaded to a dedicated storage space, after doing file validation.
PHASE 5: PROCESSES IN THE CARE OF RECORDS
Once the digitization of records is complete, a few additional requirements arise which may be linked to the administration of records:
▪ CLASSIFICATION (if necessary),
▪ INTELLECTUAL CONTROL (over data),
▪ ACCESSION OF DATA, and
▪ UPKEEP AND MAINTENANCE OF DATA are a few additional requirements for “DATA MANAGEMENT.”

PHASE 6: EVALUATION
Once the digitization project is updated and implemented, the final phase should be:
SYSTEMATIC DETERMINATION of the PROJECT’S MERIT, WORTH AND SIGNIFICANCE using objective criteria.
PRIMARY PURPOSE: IDENTIFY CHANGES that would improve future digitization processes.
8.4 – TRANSFORMATION OF DATA TO DECISION-RELEVANT INFORMATION

INTRODUCTION
▪ The emergence of big data has changed the world of business like never before.
▪ The most important shift has happened in information generation and the decision-making process. The data that surrounds the organization is being harnessed into information that informs and supports prudent decision making in a judicious and repeatable manner.
▪ The pertinent question here is: what does an enterprise need to do to transform data into relevant information? As noted earlier, not all types of data lead to relevant information for decision making.

TO MAKE THE DATA TURN INTO USER-FRIENDLY INFORMATION, IT SHOULD GO THROUGH SIX CORE STEPS:

1) COLLECTION OF DATA: Appropriate software and hardware may be used for this purpose. Appointment of trained staff also plays an important role in collecting accurate and relevant data.

2) ORGANISING THE DATA:
▪ The raw data needs to be organized in an appropriate manner to generate relevant information.
▪ The data may be grouped and arranged in a manner that creates useful information for the target user groups.

3) DATA PROCESSING: At this step, data needs to be cleaned to remove unnecessary elements. If any data point is missing or not available, that also needs to be addressed.

4) INTEGRATION OF DATA:
▪ Data integration is the process of combining data from various sources into a single, unified form.
▪ Data integration eventually enables the analytics tools to produce effective, actionable business intelligence.

5) DATA REPORTING:
▪ The data reporting stage involves translating the data into a consumable format to make it accessible to the users. For example, a business firm should be able to provide summarized financial information, e.g., revenue, net profit etc.
▪ The objective is that a user who wants to understand the financial position of the company should get relevant and accurate information.

6) DATA UTILIZATION: At this ultimate step, data is utilized to back corporate activities and enhance operational efficiency and productivity for the growth of the business. This makes corporate decision making truly ‘data driven’.
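The six steps can be sketched end to end with pandas. This is a minimal illustration under assumed column names and made-up figures, not a prescribed implementation:

```python
import pandas as pd
import numpy as np

# 1) Collection: in practice read from files/systems; a stand-in frame here
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "revenue": [100.0, 80.0, np.nan, 120.0, 100.0],
    "cost": [60.0, 50.0, 40.0, 70.0, 60.0],
})
regions = pd.DataFrame({"region": ["East", "West"], "manager": ["Asha", "Ravi"]})

# 2) Organising: arrange the raw rows for the target user group
sales = sales.sort_values("region")

# 3) Processing: clean unnecessary elements, address missing points
sales = sales.drop_duplicates()
sales["revenue"] = sales["revenue"].fillna(sales["revenue"].median())

# 4) Integration: combine the sources into a single, unified form
unified = sales.merge(regions, on="region", how="left")

# 5) Reporting: translate into a consumable, summarized format
report = unified.groupby("region")[["revenue", "cost"]].sum()
report["net_profit"] = report["revenue"] - report["cost"]

# 6) Utilization: the summary now backs a concrete decision
print(report.sort_values("net_profit", ascending=False))
```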

8.5 – COMMUNICATION OF INFORMATION FOR QUALITY DECISION MAKING

INTRODUCTION
Quality information should lead to quality decisions. With the help of well-curated and reported data, decision makers should be able to add higher-value business insights leading to better strategic decision making.

By transforming the information into a process for quality decision making, the firm should achieve the following abilities:

1) Identifying POSSIBLE FRAUDS on the basis of data analytics;
2) Diagnose, filter and extract value from financial and operational information for MAKING BETTER BUSINESS DECISIONS;
3) Making strategies for responding to uncertain events like market volatility;
4) RECOGNIZE VIABLE ADVANTAGES TO SERVE CUSTOMERS in a better manner;
5) Real-time SPOTTING OF EMERGING OPPORTUNITIES and also capability gaps;
6) Predict outcomes more effectively compared to conventional forecasting techniques based on historical financial reports;
7) LOGICAL UNDERSTANDING OF wide-ranging STRUCTURED AND UNSTRUCTURED DATA, and applying that information to corporate planning and decision support.
8.6 – PROFESSIONAL SKEPTICISM REGARDING DATA

INTRODUCTION
While data analytics is an important tool for decision making, managers should never take an analysis at face value. The hidden insights that lie underneath the surface of a data set need to be explored more deeply. The emergence of new data analytics tools and techniques in the financial environment allows accounting and finance professionals to gain unique insights into the data.

As more data is available now, analysts and auditors are not only getting more information but also facing new challenges.

Professional scepticism is an important focus area for practitioners, researchers, regulators and standard setters. At the same time, professional scepticism may result in additional costs. Under such circumstances, it is important to identify and understand the conditions in which finance and audit professionals should apply professional scepticism. A fine balance needs to be kept between costly scepticism and underutilizing data analytics, to keep costs under control.
8.7 – ETHICAL USE OF DATA AND INFORMATION
Data analytics can help in the decision-making process and make an impact. However, this empowerment for business also comes with challenges. The question is: how can business organizations ethically collect, store and use data? And what rights need to be upheld? Below we discuss five guiding principles in this regard.

The five basic principles of data ethics that a business organization should follow are:

PRINCIPLE – 1
REGARDING OWNERSHIP
 The first principle is that ownership of any personal information belongs to the person.
 It is unlawful and unethical to collect someone’s personal data without their consent.
 The consent may be obtained through digital privacy policies or signed agreements, or by asking the users to agree with terms and conditions.
 It is always advisable to ask for permission beforehand to avoid future legal and ethical complications.

PRINCIPLE – 2
REGARDING TRANSPARENCY
 Maintaining transparency is important while gathering data.
 The objective with which the company is collecting a user’s data should be known to the user.
 For example, if the company is using cookies to track the online behaviour of the user, it should be mentioned to the user through a written policy that cookies would be used for tracking the user’s online behaviour and the collected data will be stored in a secure database. After reading the policy, the user may decide to accept or not to accept the policy.
 Similarly, while collecting financial data from clients, it should be clearly mentioned for which purpose the data will be used.
HOW TO LEARN?
TRANSPARENCY (Imp): OBJECTIVE (user data → user); PURPOSE (financial data → clients).

PRINCIPLE – 3
REGARDING PRIVACY
 Even if the user allows the company to collect, store and analyze PERSONALLY IDENTIFIABLE INFORMATION (PII), that does not imply it should be made publicly available.
 For companies, it is mandatory to publish some financial information to the public, e.g., through annual reports. However, there may be much confidential information which, if it falls into the wrong hands, may create problems and financial loss.
 To protect the privacy of data, a data security process should be in place. This may include file encryption and dual authentication passwords etc.
HOW TO LEARN?
REGARDING PERSONALLY IDENTIFIABLE INFORMATION (PII):
▪ What you “CAN” do: Collect, store and analyze.
▪ What you “CAN’T” do: It should not be made publicly available.
▪ What you “SHOULD” do: A data security process should be in place; this may include file encryption and dual authentication passwords etc. (NOTE – 1)

PRINCIPLE – 4
REGARDING INTENTION
The intention of data analysis should never be:
1.) Making profits out of others; OR
2.) Hurting others.

PRINCIPLE – 5
REGARDING OUTCOMES
In some cases, even if the intentions are good, the result of data analysis may inadvertently hurt the clients and data providers. This is called disparate impact, which is unethical.
CHAPTER 9
DATA PROCESSING, ORGANIZATION, CLEANING AND VALIDATION

Chapter overview
9.1 Development of Data Processing
9.2 Functions of Data Processing
9.3 Data Organisation and Distribution
9.4 Data Cleaning and Validation
9.1 – DEVELOPMENT OF DATA PROCESSING

TOPIC – 1 DATA PROCESSING
▪ Introduction
▪ Phases of data processing
▪ How are data processing and data science relevant for finance?
TOPIC – 1
DATA PROCESSING
INTRODUCTION
▪ In recent years, the capacity and effectiveness of DP have increased manifold with the development of technology.
▪ Data processing (DP) is the process of organising, categorizing, and manipulating data in order to extract information.
▪ The information extracted as a result of DP is also heavily reliant on the quality of the data. Data quality may get affected by several issues like missing data and duplications.
▪ Data processing that used to require a lot of human labour has progressively been superseded by modern tools and technology.
PHASES OF DATA PROCESSING

MANUAL DP
▪ Manual DP involves processing data without much assistance from machines.
▪ Prior to the phase of mechanical DP, only small-scale data processing was possible using manual efforts.
▪ However, in some special cases, manual DP is still in use today.

MECHANICAL DP
▪ Mechanical DP processes data using mechanical (not modern computer) tools and technologies.
▪ This phase began in 1890 (Bohme et al., 1991) when a system made up of intricate punch card machines was installed by the US Bureau of the Census in order to assist in compiling the findings of a recent national population census.

ELECTRONIC DP
And finally, electronic DP replaced the other two, resulting in fewer mistakes and rising productivity. Data processing is now done electronically using computers and other cutting-edge electronics.
HOW ARE DATA PROCESSING AND DATA SCIENCE RELEVANT FOR FINANCE?
The relevance of data processing and data science in the area of finance is increasing every day. Significant areas where data science plays an important role are:

RISK ANALYTICS
▪ Business inevitably involves risk, particularly in the financial industry.
▪ It is crucial to determine the risk factor before making any decisions.
▪ Given that a large portion of a company’s risk-related data is “unstructured,” its analysis without data science methods can be challenging and prone to human mistakes.
▪ Machine learning algorithms can look through historical transactions and general information to help banks analyse each customer’s reliability and trustworthiness and determine the relative risk.

CUSTOMER DATA MANAGEMENT
▪ In recent years, many financial institutions may have processed their data solely through the machine learning capabilities of Business Intelligence (BI);
▪ There are currently more transactions occurring every minute than ever before.
▪ Due to the arrival of social media and new Internet of Things (IoT) devices, a significant portion of this data does not conform to the structure of organized data previously employed;
▪ Methods such as data mining and natural language processing are well equipped to deal with massive volumes of unstructured new data.

CUSTOMER SEGMENTATION
▪ Despite the fact that each consumer is unique, it is only possible to comprehend their behaviour after they have been categorized or divided;
▪ Customers are frequently segmented based on socioeconomic factors, such as geography, age, and buying patterns;
▪ By examining these clusters collectively, organisations may assess a customer’s current and long-term worth;
▪ To do this, data scientists can use automated machine learning algorithms to categorise their clients based on specified attributes that have been assigned relative relevance scores. Comparing these groupings to former customers reveals the expected value of the time invested with each client.
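As one concrete (hypothetical) instance of such an algorithm, k-means clustering from scikit-learn can segment customers on a few socioeconomic attributes; the feature names, values, and cluster count below are illustrative assumptions, not from the notes:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical socioeconomic features: age, annual spend, purchase frequency
X = np.array([
    [25, 20_000, 12],
    [32, 55_000, 30],
    [47, 15_000, 4],
    [51, 90_000, 45],
    [29, 30_000, 18],
])

X_scaled = StandardScaler().fit_transform(X)   # equalize feature scales first
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
print(segments)  # one cluster label per customer, e.g. [0 1 0 1 0]
```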

PERSONALIZED SERVICES
▪ Even major organisations strive to provide customized service to their consumers as a method of enhancing their reputation and increasing customer lifetime value;
▪ Natural language processing (NLP) and voice recognition technologies dissect these encounters into a series of important points that can identify chances to increase revenue, enhance the customer service experience, and steer the company’s future.

CONSUMER ANALYTICS
▪ In a world where choice has never been more crucial, it has become evident that each customer is unique; nonetheless, there have never been more consumers;
▪ This cannot be sustained without the intelligence and automation of machine learning, as it is important to ensure that each client receives a customised service.

ANOMALY DETECTION
▪ Anomalies are only proved to be anomalous after the event happens;
▪ Although data science can provide real-time insights, it cannot anticipate singular incidents of credit card fraud or identity theft;
▪ The methods for anomaly identification include Recurrent Neural Networks.

PREDICTIVE ANALYTICS
▪ Predictive analytics enables organisations in the financial sector to extrapolate from existing data and anticipate what may occur in the future, including how patterns may evolve.
▪ When prediction is necessary, machine learning is utilised.
▪ Using machine learning techniques, pre-processed data may be input into the system in order for it to learn how to anticipate future occurrences accurately.
▪ In the case of stock market pricing, machine learning algorithms learn trends from past data in a certain interval (maybe a week, month, or quarter) and then forecast future stock market trends based on this historical information. This allows data scientists to depict expected patterns for end-users in order to assist them in making investment decisions and developing trading strategies.
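A minimal sketch of this idea, with made-up prices and a plain linear model on lagged values standing in for whatever algorithm a real trading desk would use:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Fabricated closing prices over one interval
prices = np.array([101.0, 102.5, 101.8, 103.2, 104.0, 103.5, 105.1, 106.0])

lags = 3  # use the last 3 observations to predict the next one
X = np.array([prices[i:i + lags] for i in range(len(prices) - lags)])
y = prices[lags:]

model = LinearRegression().fit(X, y)            # learn the trend from past data
next_price = model.predict(prices[-lags:].reshape(1, -1))
print(round(float(next_price[0]), 2))           # forecast for the next period
```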

FRAUD DETECTION
▪ As the technologies used to analyse big data become more sophisticated, so does their capacity to detect fraud early on.
▪ With a rise in financial transactions, the risk of fraud also increases.
▪ Tracking incidents of fraud, such as identity theft and credit card scams, and limiting the resulting harm is a primary responsibility for financial institutions.
▪ Artificial intelligence and machine learning algorithms can now detect credit card fraud significantly more precisely.
▪ If a major purchase is made on a credit card belonging to a consumer who has traditionally been very frugal, the card can be immediately terminated, and a notification sent to the card owner.
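One hedged way to sketch that example in code uses scikit-learn's IsolationForest as the anomaly detector; the transaction amounts below are fabricated for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Historical transactions for one normally frugal cardholder
history = np.array([[250], [300], [180], [220], [275], [310], [240], [260]])

detector = IsolationForest(contamination=0.1, random_state=0).fit(history)

new_txn = np.array([[45_000]])            # a major purchase arrives
if detector.predict(new_txn)[0] == -1:    # -1 means "anomaly"
    print("Terminate the card and notify the owner")  # mirrors the notes' example
```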

ALGORITHMIC TRADING
▪ Algorithmic trading happens when an unsupervised computer, utilizing the intelligence supplied by an algorithm’s trade suggestions, trades on the stock market. As a consequence, it eliminates the risk of loss caused by indecision and human error;
▪ One of the primary advantages of algorithmic trading is the increased frequency of deals. Based on facts and taught behaviour, the computer can operate in a fraction of a second, without human indecision or thought.
9.2 – FUNCTIONS OF DATA PROCESSING

TOPIC – 1 PROCESSES INVOLVED IN DATA PROCESSING
▪ Validation
▪ Sorting
▪ Aggregation
▪ Analysis
▪ Reporting
▪ Classification
TOPIC – 1
PROCESSES INVOLVED IN DATA PROCESSING

1.) VALIDATION
MEANING: As per the UNECE glossary on statistical data editing (UNECE 2013), data validation may be defined as ‘AN ACTIVITY AIMED AT VERIFYING WHETHER THE VALUE OF A DATA ITEM COMES FROM THE GIVEN SET OF ACCEPTABLE VALUES.’

OBJECTIVE: The objective of data validation is to assure a particular degree of DATA QUALITY.

DIMENSIONS OF DATA QUALITY: In official statistics, quality has multiple dimensions:
1. Relevance;
2. Accessibility;
3. Timeliness;
4. Clarity;
5. Comparability;
6. Comprehensiveness;
7. Correctness.
2.) SORTING
INTRODUCTION
▪ Data sorting is any procedure that ORGANIZES DATA INTO A MEANINGFUL ORDER TO MAKE IT SIMPLER TO COMPREHEND, ANALYSE, AND VISUALIZE.
▪ It presents research data in a manner that facilitates comprehension of the story being told by the data.
▪ Sorting can be performed on raw data (across all records) or aggregated information (in a table, chart, or some other aggregated or summarized output).
▪ Typically, data is sorted in ascending or descending order based on actual numbers, counts, or percentages.
▪ It is also vitally necessary to organize visualizations (tables, charts, etc.) correctly to facilitate accurate data interpretation.
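A minimal pandas illustration of sorting (the data is made up):

```python
import pandas as pd

df = pd.DataFrame({"product": ["A", "B", "C"], "units_sold": [120, 340, 95]})
# Descending on an actual number, so the story in the data is easy to read
print(df.sort_values("units_sold", ascending=False))
```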


3.) AGGREGATION
MEANING
▪ Data aggregation REFERS TO ANY PROCESS IN WHICH DATA IS SUMMARISED, i.e., a common application of data aggregation is to offer relevant summary data for business analysis;
▪ When data is aggregated, individual data rows, which are often compiled from several sources, are replaced with summaries or totals.

WHY IS IT REQUIRED?
As the amount of data kept by businesses continues to grow, aggregating the most significant and often-requested data can facilitate efficient access to it.

BENEFITS
Data aggregation enables analysts to ACCESS and analyse VAST QUANTITIES OF DATA IN A REASONABLE AMOUNT OF TIME. A data warehouse often contains aggregate data, since it may offer answers to analytical inquiries and drastically cut the time required to query massive data sets.
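A minimal pandas illustration of aggregation, where individual rows are replaced by per-group summaries (the data is made up):

```python
import pandas as pd

txns = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "amount": [100, 150, 80, 120, 60],
})
# Individual rows are replaced with totals, counts and means per region
summary = txns.groupby("region")["amount"].agg(["sum", "count", "mean"])
print(summary)
```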

4.) ANALYSIS
MEANING: It is described as the process of cleansing, converting, and modelling data to obtain actionable business intelligence.

OBJECTIVE: The objective of data analysis is TO EXTRACT relevant information from data AND MAKE DECISIONS based on this knowledge.

WHY IS IT REQUIRED? Analysis is sometimes all that is REQUIRED TO EXPAND YOUR BUSINESS.

EXAMPLE: Every time we make a decision in our day-to-day lives, we consider what occurred previously or what would occur if we chose a specific option. This is a simple example of data analysis: studying the past or the future and basing judgments on that analysis. The same task that an analyst performs for commercial goals is known as Data Analysis.

DATA ANALYSIS TOOLS: (an image listing common tools appeared here in the original notes.)
5.) REPORTING
INTRODUCTION
▪ A DATA REPORT IS NOTHING MORE THAN A SET OF DOCUMENTED FACTS AND NUMBERS;
▪ In any industry, accurate data reporting plays a crucial role. Utilizing business information in healthcare enables physicians to provide more effective and efficient patient care, hence saving lives;
▪ Financial data such as revenues, accounts receivable, and net profits are often summarized in a company’s data reporting. This gives an up-to-date record of the company’s financial health, or of a portion of the finances, such as sales.

BENEFITS OF REPORTING
▪ It provides answers to fundamental inquiries regarding the status of the firm, within an Excel document or a simple data visualization tool;
▪ It indicates where we should devote the most time and money, as well as what needs more organisation or attention;
▪ It facilitates the formation of judicious judgments that might steer a business in new areas and provide additional income streams.

THE MOST EFFECTIVE BUSINESS ANALYSTS POSSESS SPECIFIC COMPETENCIES:
1) PRIORITIZE the most pertinent data;
2) Capacity to COMPREHEND and organize ENORMOUS VOLUMES OF INFORMATION;
3) Ability to organize and PRESENT DATA IN AN EASY-TO-READ fashion;
4) Ability to EXTRACT VITAL INFORMATION FROM DATA, to keep things simple, and to prevent data hoarding.
6.) CLASSIFICATION
MEANING
▪ Regarding risk management and data security, the classification of data is of special relevance.
▪ Data classification is the process of classifying data into important categories so that it may be utilised and safeguarded more effectively.

PRIMARY METHODS OF DATA CLASSIFICATION
Three primary methods of data classification are recognised as industry standards:
1) CONTENT-BASED CLASSIFICATION examines and interprets files for sensitive data.
2) CONTEXT-BASED CLASSIFICATION.
3) USER-BASED CLASSIFICATION relies on the manual selection of each document by the end user. To indicate sensitive documents, user-based classification depends on human expertise and judgement.

PURPOSES FOR CATEGORISING DATA
1) Facilitating access;
2) Ensuring regulatory compliance; and
3) Achieving other commercial or personal goals.

RESIDUARY POINTS
▪ For the purposes of data security, data classification is a useful strategy that simplifies the application of appropriate security measures based on the kind of data being accessed, sent, or duplicated.
▪ It is common practice to classify the sensitivity of data based on varying levels of secrecy, which correspond to the security measures required to safeguard each classification level.
IT IS STANDARD PRACTICE TO DIVIDE DATA AND SYSTEMS INTO THREE RISK CATEGORIES.

LOW RISK: If data is accessible to the public and recovery is simple, then this data and the mechanisms around it pose a smaller risk.

MODERATE RISK: Essentially, this is non-public or internal data. However, it is unlikely to be too mission-critical or sensitive to be considered “high risk.”

HIGH RISK:
▪ Anything even vaguely sensitive or critical to operational security falls under the category of high risk, as does data that is incredibly difficult to retrieve.
▪ All secret, sensitive, and essential data falls under the category of high risk.

EXAMPLE OF DATA CLASSIFICATION
DATA MAY BE CLASSIFIED AS RESTRICTED, PRIVATE, OR PUBLIC BY AN ENTITY.
▪ Public data are the least sensitive and have the lowest security requirements;
▪ Restricted data are the most sensitive and have the highest security rating.
STEPS FOR EFFECTIVE DATA CLASSIFICATION

UNDERSTANDING THE CURRENT SETUP
▪ Taking a comprehensive look at the organization’s current data and any applicable legislation is likely the best starting point for successfully classifying data;
▪ Before one classifies data, one must know what data one has.

CREATION OF A DATA CLASSIFICATION POLICY
▪ Without an adequate policy, maintaining compliance with data protection standards in an organisation is practically impossible;
▪ Priority number one should be the creation of a policy.

PRIORITIZE AND ORGANIZE DATA
Now that a data classification policy is in place, it is time to categorise the data based on its sensitivity and privacy.
DATA CLASSIFICATION MATRIX
Creating a matrix that rates data and/or systems based on how likely they are to be hacked and how sensitive the data is enables you to rapidly identify how to classify and safeguard all sensitive information.

CONFIDENTIAL DATA (RISK: HIGH)
▪ What? Confidential data is highly sensitive and may have personal privacy considerations.
▪ Institution impact: If this data is not available or is improperly disclosed, it creates a negative impact on the institution. Risk is typically very high.
▪ Access? Access to confidential institutional data must be controlled from creation to destruction, and will be granted only to those persons affiliated with the University who require such access in order to perform their job.

SENSITIVE DATA (RISK: MEDIUM)
▪ What? Non-public or internal data is moderately sensitive in nature. Often, sensitive data is used for making decisions, and therefore it’s important this information remains timely and accurate.
▪ Institution impact: If this data is not available or is improperly disclosed, it creates a negative impact on the institution. Risk is typically moderate.
▪ Access? Access to internal data may be authorized to groups of persons by their job classification or responsibilities (“role-based” access).

PUBLIC DATA (RISK: LOW)
▪ What? Public data is not considered sensitive.
▪ Institution impact: If this data is not available or is improperly disclosed, it creates a negative impact on the institution. Risk is typically low.
▪ Access? Access to public institutional data may be granted to any requester, or it is published with no restrictions.
9.3 – DATA ORGANIZATION AND DISTRIBUTION

TOPIC – 1 DATA ORGANIZATION

TOPIC – 2 DATA DISTRIBUTION
▪ Discrete Distributions
▪ Continuous Distributions
TOPIC – 1
DATA ORGANIZATION
INTRODUCTION
▪ In a world where data are among the most valuable assets possessed by firms across several industries, businesses employ data organisation methods in order to make better use of their data assets. Executives and other professionals may prioritize data organisation to expedite business operations, boost business intelligence, and enhance the business model as a whole;
▪ As time passes and the data volume grows, the time required to look for any information in the data source rises;
▪ It is challenging to deal with or analyse raw data;
▪ Data organisation allows us to arrange data in a manner that is easy to understand and manipulate;
▪ Classification, image representations, graphical representations etc. are examples of data organisation techniques;
▪ STRUCTURED DATA consists of tabular information that may be readily imported into a database and then utilised by analytics software or other applications;
▪ UNSTRUCTURED DATA are raw and unformatted data, such as a basic text document with names, dates, and other information spread among random paragraphs.
TOPIC – 2
DATA DISTRIBUTION
INTRODUCTION
Data distribution is a function that identifies and quantifies all potential values for a variable, as well as their relative frequency (the probability of how often they occur).

TYPES OF DISTRIBUTION
Distributions are basically classified based on the type of data.

DISCRETE DISTRIBUTIONS
A discrete distribution results from countable data and has a finite number of potential values. Example: rolling dice, obtaining a specific number of heads, etc. Following are discrete distributions of various types:


BINOMIAL DISTRIBUTION
▪ The binomial distribution quantifies the chance of obtaining a specific number of successes or failures per experiment.
▪ The binomial distribution applies to attributes that are categorised into two mutually exclusive classes, such as number of successes/failures and number of acceptances/rejections;
▪ Example: When tossing a coin, the likelihood of the coin landing on its head is one-half, and the probability of it landing on its tail is one-half.

POISSON DISTRIBUTION
▪ The Poisson distribution is the discrete probability distribution that quantifies the chance of a certain number of events occurring in a given time period.
▪ The Poisson distribution applies to attributes that can potentially take on huge values but in practice take on tiny ones;
▪ Example: Number of flaws, mistakes, accidents, absentees etc.

HYPERGEOMETRIC DISTRIBUTION
▪ The hypergeometric distribution is a discrete distribution that assesses the chance of a certain number of successes in (n) trials, without replacement, from a sufficiently large population (N). Specifically, sampling without replacement;
▪ The hypergeometric distribution is comparable to the binomial distribution; the primary distinction between the two is that the chance of success is the same for all trials in the binomial distribution, but not in the hypergeometric distribution.

GEOMETRIC DISTRIBUTION
▪ The geometric distribution is a discrete distribution that assesses the probability of the occurrence of the first success;
▪ Example: A marketing representative from an advertising firm chooses hockey players from several institutions at random till he discovers an Olympic participant.
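These four discrete distributions can be evaluated directly with scipy.stats; the parameter values below are purely illustrative:

```python
from scipy import stats

# Binomial: P(exactly 5 heads in 10 fair coin tosses)
print(stats.binom.pmf(k=5, n=10, p=0.5))

# Poisson: P(exactly 2 defects) when defects average 3 per batch
print(stats.poisson.pmf(k=2, mu=3))

# Hypergeometric: P(2 defectives in a sample of 5 drawn without
# replacement from 50 items, of which 10 are defective)
print(stats.hypergeom.pmf(k=2, M=50, n=10, N=5))

# Geometric: P(first success occurs on the 3rd trial), p = 0.2
print(stats.geom.pmf(k=3, p=0.2))
```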

CONTINUOUS DISTRIBUTIONS
A continuous distribution has an unlimited number of (variable) data points that may be represented on a continuous measuring scale. A continuous random variable is a random variable with an unlimited and uncountable set of potential values. Following are continuous distributions of various types:

NORMAL DISTRIBUTION
▪ Gaussian distribution is another name for the normal distribution.
▪ It is a bell-shaped curve with a greater frequency around the central point. As values move away from the central value on each side, the frequency drops dramatically.

LOGNORMAL DISTRIBUTION
A continuous random variable x follows a lognormal distribution if the distribution of its natural logarithm, ln(x), is normal.

F – DISTRIBUTION
▪ The F distribution is often employed to examine the equality of variances between two normal populations.
▪ The F distribution is an asymmetric distribution with no maximum value and a minimum value of 0.
▪ The curve approaches 0 but never reaches the horizontal axis.

CHI SQUARE DISTRIBUTION
▪ When independent variables with standard normal distributions are squared and added, the chi-square distribution occurs.
▪ The distribution of chi-square values is asymmetrical and bounded below at zero, and it approaches the form of the normal distribution as the number of degrees of freedom grows.
EXPONENTIAL DISTRIBUTION
▪ The exponential distribution is one of the most often employed continuous probability distributions. It is used frequently to represent products with a consistent failure rate.
▪ The exponential distribution and the Poisson distribution are closely connected. It has a constant failure rate since its shape characteristics remain constant.

T (STUDENT’S T) DISTRIBUTION
▪ The t distribution, or Student’s t distribution, is a probability distribution with a bell shape that is symmetrical about its mean.
▪ It is used frequently for testing hypotheses and building confidence intervals for means, and is substituted for the normal distribution when the standard deviation cannot be determined;
▪ When random variables are averages, the distribution of the average tends to be normal, similar to the normal distribution, independent of the distribution of the individuals.
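As with the discrete case, scipy.stats exposes each of these continuous distributions; the parameters below are illustrative only:

```python
from scipy import stats

print(stats.norm.pdf(0, loc=0, scale=1))   # normal: density at the mean
print(stats.lognorm.pdf(1.0, s=0.5))       # lognormal: ln(x) is normal
print(stats.f.sf(2.5, dfn=5, dfd=10))      # F: P(F > 2.5) with 5 and 10 d.f.
print(stats.chi2.sf(7.8, df=3))            # chi-square: right-tail area
print(stats.expon.sf(2.0, scale=1.0))      # exponential: P(X > 2)
print(stats.t.ppf(0.975, df=20))           # t: two-sided 95% critical value
```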

9.4 – DATA CLEANING AND VALIDATION

TOPIC – 1 DATA CLEANING

TOPIC – 2 DATA VALIDATION
TOPIC – 1
DATA CLEANING
INTRODUCTION
▪ Data cleansing is the process of CORRECTING OR REMOVING INACCURATE, IMPROPERLY FORMATTED, OR DUPLICATE DATA from a dataset;
▪ DATA CLEANING IS DIFFERENT FROM DATA TRANSFORMATION;
▪ Data cleaning is the process of REMOVING IRRELEVANT DATA FROM A DATASET;
▪ THE PROCESS OF CHANGING DATA FROM ONE FORMAT OR STRUCTURE to another is known as data transformation. Transformation procedures are sometimes known as data wrangling or data munging, since they map and change “raw” data into another format for warehousing and analysis.


STEPS FOR DATA CLEANING

STEP 1: REMOVAL OF DUPLICATE AND IRRELEVANT INFORMATION
▪ WHEN YOU MERGE DATA SETS from numerous sites, scrape data, or get data from customers or several departments, THERE IS POTENTIAL TO PRODUCE DUPLICATE DATA;
▪ Most duplicate observations will occur during data collection;
▪ Eliminate unnecessary observations from your dataset, such as duplicate or irrelevant observations;
▪ DE-DUPLICATION IS ONE OF THE MOST IMPORTANT CONSIDERATIONS FOR THIS PROCEDURE. It may make analysis more effective, in addition to producing a more manageable and effective dataset.

STEP 2: FIX STRUCTURAL ERRORS
▪ When transferring data, you may detect unusual naming standards.
▪ For instance, “N/A” and “NOT APPLICABLE” may both be present, but they should be examined as a single category.

STEP 3: FILTER UNWANTED OUTLIERS
▪ Occasionally, you will encounter observations that do not appear to fit inside the data you are evaluating. If you have a valid cause to eliminate an outlier, doing so will improve the performance of the data you are analysing;
▪ Remember that the existence of an outlier does not imply that it is erroneous. Consider deleting an outlier only if it appears to be unrelated to the analysis or an error.

STEP 4: HANDLE MISSING DATA
Many algorithms do not accept missing values, hence missing data cannot be ignored. There are several approaches to handling missing data. Although neither is ideal, both should be explored:
▪ AS A FIRST ALTERNATIVE, the observations with missing values may be dropped, but doing so may result in the loss of information. This should be kept in mind before doing so;
▪ AS A SECOND ALTERNATIVE, the missing values may be imputed based on other observations. Again, there is a chance that the data’s integrity may be compromised, as action may be based on assumptions rather than real observations.
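Steps 1 to 4 map naturally onto a few pandas calls. A minimal sketch on a fabricated dataset (the column names, the outlier threshold, and the choice of median imputation are all assumptions for illustration):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "customer": ["A", "A", "B", "C", "D"],
    "status":   ["N/A", "N/A", "Not Applicable", "active", "active"],
    "amount":   [100.0, 100.0, np.nan, 120.0, 9_000.0],
})

# Step 1: remove duplicate observations
df = df.drop_duplicates()

# Step 2: fix structural errors - treat "N/A" and "Not Applicable" as one category
df["status"] = df["status"].replace({"Not Applicable": "N/A"})

# Step 3: filter an unwanted outlier (only with a valid reason to do so)
df = df[(df["amount"] < 5_000) | df["amount"].isna()]

# Step 4: handle missing data - here imputed from the other observations
df["amount"] = df["amount"].fillna(df["amount"].median())
print(df)
```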


STEP 5: VALIDATION
As part of basic validation, one should be able to answer the following questions at the conclusion of the data cleansing process:
▪ Does the data make sense?
▪ Does it verify or contradict your working hypothesis, or does it shed any light on it?
▪ Can data patterns assist you in formulating your next theory?
▪ Does the data adhere to the regulations applicable to its field?

False assumptions based on “inaccurate” or “dirty” data can lead to ineffective company strategies and decisions. False conclusions might result in an uncomfortable moment at a reporting meeting. Before reaching that point, it is essential to establish a culture of data quality inside the firm. To do this, one should specify the methods that may be employed to establish this culture, and also the definition of data quality.

BENEFITS OF QUALITY DATA
The main characteristics of quality data are:
1) Validity
2) Accuracy
3) Completeness
4) Consistency

BENEFITS OF DATA CLEANING
Ultimately, having clean data boosts overall productivity and provides the highest-quality information for decision-making. Benefits include:
1) Using data cleansing technologies will result in SPEEDIER DECISION-MAKING;
2) MONITORING mistakes AND IMPROVING REPORTING to determine where errors are originating;
3) Capability to PLAN USES OF YOUR DATA;
4) Error correction when numerous data sources are involved.
TOPIC – 2
DATA VALIDATION
INTRODUCTION
Although data validation is an essential stage in every data pipeline, it is frequently ignored. It may appear that data validation is an unnecessary step that slows down the work, but it is vital for producing the finest possible outcomes.

If the initial data is not valid, the outcomes will not be accurate either. It is therefore vital to check and validate data before using it. Without data validation, one runs the danger of basing judgments on faulty data that is not indicative of the current situation.

Data validation is a crucial component of any data management process.

TYPES OF DATA VALIDATION

DATA TYPE CHECK
▪ A data type check verifies that the ENTERED DATA HAS THE APPROPRIATE DATA TYPE;
▪ For instance, a field may only accept NUMERIC VALUES;
▪ If this is the case, the system should reject any data containing other characters.

CODE CHECK
▪ A code check verifies that a field’s value is picked from a legitimate set of options or that it adheres to specific formatting requirements;
▪ For instance, it is easy to VERIFY THE VALIDITY OF A POSTAL CODE BY COMPARING IT TO A LIST OF VALID CODES.

FORMAT CHECK
▪ Numerous data kinds adhere to a set format;
▪ Date columns that are kept in a fixed format, such as “YYYY-MM-DD” or “DD-MM-YYYY,” are a popular use case.

CONSISTENCY CHECK
▪ A consistency check is a FORM OF LOGICAL CHECK THAT VERIFIES THAT THE DATA HAS BEEN INPUT IN A CONSISTENT MANNER;
▪ Checking whether a package’s delivery date is later than its shipment date is one example.

UNIQUENESS CHECK
▪ Some data, like PAN OR E-MAIL IDS, are unique by nature;
▪ These fields should typically contain unique items in a database.

RANGE CHECK
▪ A RANGE CHECK DETERMINES WHETHER OR NOT INPUT DATA FALLS INSIDE A SPECIFIED RANGE.
▪ Latitude and longitude, for instance, are frequently employed in geographic data. A latitude value must fall between -90 and 90 degrees, whereas a longitude value must fall between -180 and 180 degrees. Values outside these ranges are invalid.
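Most of these checks, other than the uniqueness check (typically enforced as a database constraint), can be sketched as plain Python functions; the field names and the list of accepted postal codes below are assumptions for illustration:

```python
import re

VALID_POSTAL_CODES = {"700001", "110001", "400001"}  # code check: assumed valid set

def validate(record: dict) -> list:
    """Return a list of validation errors for one record (empty list = valid)."""
    errors = []
    # Data type check: 'amount' must be numeric
    if not isinstance(record.get("amount"), (int, float)):
        errors.append("amount must be numeric")
    # Code check: postal code must come from the legitimate set of options
    if record.get("postal_code") not in VALID_POSTAL_CODES:
        errors.append("unknown postal code")
    # Format check: dates kept in the fixed "YYYY-MM-DD" format
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", record.get("ship_date", "")):
        errors.append("ship_date must be YYYY-MM-DD")
    # Consistency check: delivery cannot precede shipment (ISO dates compare as strings)
    if record.get("delivery_date", "") < record.get("ship_date", ""):
        errors.append("delivery date earlier than shipment date")
    # Range check: latitude in [-90, 90], longitude in [-180, 180]
    if not (-90 <= record.get("lat", 0) <= 90 and -180 <= record.get("lon", 0) <= 180):
        errors.append("coordinates out of range")
    return errors

print(validate({"amount": 120.0, "postal_code": "700001",
                "ship_date": "2023-01-05", "delivery_date": "2023-01-08",
                "lat": 22.57, "lon": 88.36}))   # -> []
```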

QUESTION BANK

MULTIPLE CHOICE QUESTIONS

QUESTION NO. 1
Data science plays an important role in: -
a) Risk analytics
b) Customer data management
c) Consumer analytics
d) All of the above

QUESTION NO. 2
The primary benefit of data distribution is: -
a) the estimation of the probability of any certain observation within a sample space
b) the estimation of the probability of any certain observation within a non-sample space
c) the estimation of the probability of any certain observation within a population
d) the estimation of the probability of any certain observation without a non-sample space

QUESTION NO. 3
Binomial distribution applies to attributes: -
a) that are categorised into two mutually exclusive and exhaustive classes
b) that are categorised into three mutually exclusive and exhaustive classes
c) that are categorised into less than two mutually exclusive and exhaustive classes
d) that are categorised into four mutually exclusive and exhaustive classes

QUESTION NO. 4
The geometric distribution is a discrete distribution that assesses: -
a) The probability of the occurrence of the first success
b) The probability of the occurrence of the second success
c) The probability of the occurrence of the third success
d) The probability of the occurrence of the less success

QUESTION NO. 5
The probability density function describes: -
a) the characteristics of a random variable
b) the characteristics of a non-random variable
c) the characteristics of a random constant
d) the characteristics of a non-random constant

ANSWERS
QUES NO.  1  2  3  4  5
ANSWER    d  a  a  a  a
STATE TRUE OR FALSE
1. Data validation could be operationally defined as a process which ensures the correspondence of the final (published) data with a number of quality characteristics.
2. Data analysis is described as the process of cleansing, converting, and modelling data to obtain actionable business intelligence.
3. Financial data such as revenues, accounts receivable, and net profits are often summarised in a company’s data reporting.
4. Structured data consists of tabular information that may be readily imported into a database and then utilised by analytics software or other applications.
5. Data distribution is a function that identifies and quantifies all potential values for a variable, as well as their relative frequency (probability of how often they occur).

ANSWERS
QUES NO.  1  2  3  4  5
ANSWER    T  T  T  T  T
SHORT ESSAY TYPE QUESTIONS
1. Briefly discuss the role of data analysis in fraud detection.
2. Discuss the difference between discrete distribution and continuous distribution.
3. Write a short note on binomial distribution.
4. What is the significance of data cleaning?
5. Write a short note on ‘predictive analytics’.

ESSAY TYPE QUESTIONS
1. Elaborately discuss the functions of data analysis.
2. Elaborately discuss the various steps involved in data cleaning.
3. Discuss the benefits of ‘data cleaning’.
4. How are data processing and data science relevant for finance?
5. Discuss the steps for effective data classification.

FILL IN THE BLANKS
1. Data may be classified as Restricted, _______ or Public by an entity.
2. Data organisation is the _______ of unstructured data into distinct groups.
3. Classification, frequency distribution tables, _______, graphical representations, etc. are examples of data organisation techniques.
4. The t distribution or Student’s t distribution is a probability distribution with a bell shape that is symmetrical about its _______.
5. _______ is the process of correcting or deleting inaccurate, corrupted, improperly formatted, duplicate, or insufficient data from a dataset.

ANSWERS
1. Private
2. Classification
3. Image representation
4. Mean
5. Data cleaning
CHAPTER 10
DATA PRESENTATION: VISUALIZATION AND GRAPHICAL PRESENTATION

Chapter overview
10.1 Data Visualization of Financial and Non-Financial Data
10.2 Objective and Function of Data Presentation
10.3 Data Presentation Architecture
10.4 Dashboards, Graphs, Diagrams, Tables, Report Design
10.5 Tools and Techniques of Visualization and Graphical Presentation
10.1 – DATA VISUALIZATION OF FINANCIAL AND NON-FINANCIAL DATA

TOPIC – 1 INTRODUCTION

TOPIC – 2 WHY IS DATA VISUALIZATION IMPORTANT?

TOPIC – 3 DOING DATA VISUALIZATION THE RIGHT WAY
TOPIC – 1
INTRODUCTION
▪ There is a saying: ‘A picture speaks a thousand words’.
▪ Numerous sources of in-depth data are now available to management teams, allowing them to better track and anticipate organisational performance. However, obtaining data and presenting it are two distinct and equally essential activities. Data visualization comes into play at this point.
TOPIC – 2
WHY IS DATA VISUALIZATION IMPORTANT?
▪ Scott Berinato, senior editor and data visualisation specialist for Harvard Business Review, writes in a recent post that “data visualization was once a talent largely reserved for design- and data-minded managers.” Today, he deems it indispensable for managers who wish to comprehend and communicate the significance of the data flood we are all experiencing.
▪ Several studies indicate that sixty-five percent of individuals are visual learners. Giving decision makers an opportunity to see visual representations of facts improves comprehension and can eventually lead to better judgments.
▪ In addition, the technique of developing data visualisations may aid finance in identifying more patterns and gaining deeper insights, particularly when many data sources or interactive elements are utilised.
TOPIC – 3
DOING DATA VISUALIZATION THE RIGHT WAY
Not all data visualisations are created equally engaging. When properly executed, visualisation simplifies difficult topics. However, if data visualisations are executed improperly, they might mislead the audience or misrepresent the data. Finance professionals who are investigating how data visualisation might help their analytics efforts and communication should keep the following in mind:

KNOW THE OBJECTIVE
▪ HBR’s Berinato suggests first establishing whether the information is conceptual or data-driven (i.e., does it rely on qualitative or quantitative data), and then specifying whether the objective is exploratory or declarative.
▪ For instance, if the objective is to display the income from the prior quarter, the goal is declarative. If, on the other hand, one is curious as to whether the income increase correlates with the social media spending, the objective is exploratory.

ALWAYS KEEP THE AUDIENCE IN MIND
▪ Who views the data visualisations will determine the degree of detail required.
▪ For instance, a finance data presentation requires high-level, highly relevant information to aid in strategic decision-making. However, if one is delivering a presentation to ‘line of business’ executives, delving into the deeper details might offer them knowledge that influences their daily operations.

INVEST IN THE BEST TECHNOLOGY
▪ There is a multitude of technological tools that make it simple to produce engaging visualisations in the current digital age;
▪ The firm should first implement an ERP that develops a centralised information repository; then look for tools that allow data to be displayed instantly by dropping in charts and graphs.

IMPROVE THE TEAM’S ABILITY TO VISUALISE DATA
While everyone on the finance team can understand the fundamentals of data visualisation, training and a shift in hiring priorities may advance the team’s data visualisation skills. Additionally, when making new recruits, look for individuals with proficiency in data analytics and extensive data visualisation experience.
10.2 – OBJECTIVE AND FUNCTION OF DATA PRESENTATION

TOPIC – 1 INTRODUCTION TO DATA VISUALISATION

TOPIC – 2 OBJECTIVE OF DATA VISUALISATION

TOPIC – 3 MOST COMMON ERRORS MADE BY ANALYSTS THAT MAKE A DATA VISUALISATION UNSUCCESSFUL
TOPIC – 1
INTRODUCTION TO DATA VISUALISATION
▪ The absence of data visualisation would make it difficult for organisations to immediately recognise data patterns. The graphical depiction of data sets enables analysts to visualise new concepts and patterns. With the daily increase in data volume, it is hard to make sense of the quintillions of bytes of data without data proliferation, which includes data visualisation.
▪ Every company may benefit from a better knowledge of its data, hence data visualisation is expanding into all industries where data exists. Information is the most crucial asset for every organisation. Through the use of visuals, one may effectively communicate ideas and make use of the information.
▪ Dashboards, graphs, infographics, maps, charts, videos, and slides may all be used to visualise and comprehend data.

MAKING A BETTER DATA ANALYSIS
▪ Analysing reports assists company stakeholders in focusing their attention on the areas that require it.
▪ Whether it is a sales report or a marketing plan, a visual representation of data assists businesses in increasing their profits through improved analysis and business choices.

FASTER DECISION MAKING
Visuals are easier for humans to process than tiresome tabular forms or reports. If the data is effectively communicated, decision-makers may move swiftly on the basis of fresh data insights, increasing both decision-making and corporate growth.

ANALYSING COMPLICATED DATA
▪ Data visualisation enables business users to obtain comprehension of their large quantities of data;
▪ It is advantageous for them to identify new data trends and faults;
▪ Understanding these patterns enables users to focus on regions that suggest red flags or progress. In turn, this process propels the firm forward.

TOPIC – 2
OBJECTIVE OF DATA VISUALISATION
▪ Data visualisation enhances the effect of communications for the audiences and delivers the most convincing data analysis outcomes.
▪ It unites the organization’s communications systems across all organisations and fields.
▪ Visualization allows large volumes of data to be interpreted more quickly and effectively at a glance. It facilitates a better understanding of the data for measuring its impact on the business, and graphically communicates the knowledge to internal and external audiences.
TOPIC – 3
MOST COMMON ERRORS MADE BY ANALYSTS THAT MAKE A DATA VISUALISATION UNSUCCESSFUL

UNDERSTANDING THE AUDIENCE
As mentioned earlier, before incorporating the data into a visualisation, the objective should be fixed. A great visualization relies on the designer comprehending the intended audience and executing on three essential points:
1. Who will read and understand the material, and how will they do so? Can it be presumed that the audience understands the words and ideas employed, or is there a need to provide visual cues (e.g., a green arrow indicating that good is ascending)? A specialist audience will have different expectations than the broader public.
2. What are the expectations of the audience, and what information is most beneficial to them?
3. What is the functional role of the visualisation, and how may users take action based on it? A visualisation that is exploratory should leave viewers with questions to investigate.

SETTING UP A CLEAR FRAMEWORK
▪ The designer must guarantee that all viewers have the same understanding of what the visualisation represents. To do this, the designer must establish a framework consisting of the semantics and syntax within which the data information is intended to be understood. The semantics pertain to the meaning of the words and images employed, whereas the syntax is concerned with the form of the communication. For instance, when utilising an icon, the element should resemble the object it symbolises, with size, colour, and placement all conveying significance to the viewer.
▪ There is one more component to the framework: ensure that the data is clean and that the analyst understands its peculiarities before doing anything else.

TELLING A STORY
▪ There are few kinds of communication as convincing as a good story. Storytelling assists the audience in gaining understanding from facts.
▪ Information visualisation is a technique that turns data and knowledge into a form that is perceivable by the human visual system. The objective is to enable the audience to see, comprehend, and interpret the information.
▪ In order to comprehend the data and connect with the visualization’s audience, creators of visualisations must delve deeply into the information. Good designers understand not only how to select the appropriate graph and data range, but also how to create an engaging story through the visualisation.
▪ Data visualisation lends itself nicely to being a narrative medium, particularly when the tale comprises a large amount of data.
10.3 – DATA PRESENTATION ARCHITECTURE

TOPIC – 1 INTRODUCTION TO DATA PRESENTATION ARCHITECTURE

TOPIC – 2 OBJECTIVES OF DATA PRESENTATION ARCHITECTURE

TOPIC – 3 SCOPE OF DPA
TOPIC - 1
INTRODUCTION TO DPA
Data presentation architecture (DPA) is a set of skills that aims to identify,

find, modify, format, and present data in a manner that ideally conveys

meaning and provides insight.

According to Kelly Lautt, “Data Presentation Architecture (DPA) is a rarely

applied skill set critical for the success and value of Business Intelligence.

Data presentation architecture weds the science of numbers, data and

statistics in discovering valuable information from data and making it usable,

relevant and actionable with the arts of data visualization, communications,

organizational psychology and change management in order to provide business

intelligence solutions with the data scope, delivery timing, format and

visualizations that will most effectively support and drive operational, tactical

and strategic behaviour toward understood business (or organizational) goals.

DPA is neither an IT nor a business skill set but exists as a separate field of

expertise.

Often confused with data visualization, data presentation architecture is a

much broader skill set that includes determining what data on what schedule

and in what exact format is to be presented, not just the best way to

present data that has already been chosen (which is data visualization).

Data visualization skills are one element of DPA.”

TOPIC – 2
OBJECTIVES OF DPA
DPA has the following objectives:

▪ To utilise data to impart information in the most efficient manner feasible (provide pertinent, timely and comprehensive data to each audience member in a clear and reasonable manner that conveys important meaning, is actionable and can affect understanding, behaviour and decisions).

▪ To utilise data to deliver information as effectively as feasible (minimise

noise, complexity, and unneeded data or detail based on the demands and tasks

of each audience).

TOPIC - 3
SCOPE OF DPA
▪ Defining significant meaning (relevant information) required by each audience

member in every scenario;

▪ Obtaining the proper data (focus area, historic reach, extensiveness, level of

detail, etc.);

▪ Determining the needed frequency of data refreshes (the currency of the

data);

▪ Determining the optimal presentation moment (the frequency of the user

needs to view the data);

▪ Using suitable analysis, categorization, visualization, and other display styles;

▪ Developing appropriate delivery techniques for each audience member based

on their job, duties, locations, and technological access.

10.4 - DASHBOARD, GRAPHS, DIAGRAMS, TABLES, REPORT
DESIGN

TOPIC – 1 INTRODUCTION

TOPIC – 2 DASHBOARDS

TOPIC – 3 GRAPHS, DIAGRAMS AND CHARTS

TOPIC – 4 TABLES

TOPIC – 5 REPORT DESIGN

TOPIC - 1
INTRODUCTION
Data visualisation is the visual depiction of data and information. Through the

use of visual elements like dashboards, charts, graphs, and maps etc, data

visualisation tools facilitate the identification and comprehension of trends,

outliers, and patterns in data.


TOPIC - 2
DASHBOARDS
▪ A data visualisation dashboard is an interactive dashboard that enables users to manage important metrics across numerous financial channels, visualise the data points, and generate reports for customers that summarise the results.

▪ Creating reports for your audience is one of the most effective means of

establishing a strong working relationship with them. Using an interactive data

dashboard, the audience would be able to view the performance of their company

at a glance.

▪ In addition to having all the data in a single dashboard, a data visualisation dashboard helps to explain what the company is doing and why, fosters client relationships, and gives a data set to guide decision-making.

▪ There are numerous levels of dashboards, ranging from those that represent

metrics vital to the firm as a whole to those that measure values vital to teams

inside an organisation. For a dashboard to be helpful, it must be automatically

or routinely updated to reflect the present condition of affairs.

TOPIC - 3
GRAPHS, DIAGRAMS AND CHARTS
Henry D. Hubbard once said, "There is magic in graphs." A few important and widely used graphs are mentioned below:

▪ Bar Chart;

▪ Line Chart;

▪ Pie Chart;

▪ Map;

▪ Density Map;

▪ Scatter Plots;

▪ Gantt Chart;

▪ Histogram.

BAR CHART
▪ Bar graphs are one of the most used types of data visualisation.

▪ It may be used to easily compare data across categories, highlight

discrepancies and illustrate historical highs and lows.

▪ Bar graphs are very useful when the data can be divided into distinct categories, for instance, the revenue earned in different years, the number of car models produced in a year by an automobile company, or the change in economic value added over the years.

▪ To add a zing, the bars can be made colourful. Using stacked and side-by-

side bar charts, one may further dissect the data for a more in-depth

examination.
FIGURE
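As a hedged illustration, here is a minimal matplotlib sketch of such a bar chart; the years and revenue figures are hypothetical:

```python
# Minimal bar chart sketch; the revenue figures are made-up numbers.
import matplotlib.pyplot as plt

years = ["2019", "2020", "2021", "2022"]
revenue = [120, 95, 150, 180]  # hypothetical revenue in Rs. lakh

plt.bar(years, revenue, color="steelblue")
plt.title("Revenue earned in different years")
plt.xlabel("Year")
plt.ylabel("Revenue (Rs. lakh)")
plt.show()
```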

LINE CHART
▪ The line chart or line graph joins various data points, displaying them as a

continuous progression.

▪ Utilize line charts to observe trends in data, often over time (such as stock

price fluctuations over five years or monthly website page visits).

▪ The outcome is a basic, simple method for representing changes in one

value relative to another.


FIGURE

PIE CHART
▪ A pie chart (or circle chart) is a circular graphical representation of

statistical data that is segmented to demonstrate numerical proportion.

▪ In a pie chart, the arc length of each slice is proportionate to the value it

depicts.

▪ The corporate world and the mass media make extensive use of pie charts.
FIGURE

MAP
▪ For displaying any type of location data, including postal codes, state

abbreviations, country names, and custom geocoding, maps are a no-brainer.

▪ If the data is related with geographic information, maps are a simple and

effective approach to illustrate the relationship.

▪ There should be a correlation between location and the patterns in the data, such as insurance claims by state, product export destinations by country, automobile accidents by postal code, and custom sales areas.


FIGURE

DENSITY MAP
▪ Density maps indicate patterns or relative concentrations that might

otherwise be obscured by overlapping marks on a map, allowing to identify areas

with a larger or lesser number of data points.

▪ Density maps are particularly useful when dealing with large data sets including

several data points in a limited geographic region.


FIGURE

SCATTER PLOTS
▪ Scatter plots are a useful tool for examining the connection between many

variables, revealing whether one variable is a good predictor of another or

whether they tend to vary independently.

▪ A scatter plot displays several unique data points on a single graph.


FIGURE

GANTT CHART
▪ Gantt charts represent a project’s timeline or activity changes across time.

▪ A Gantt chart depicts tasks that must be accomplished before others may

begin, as well as the allocation of resources.

▪ However, Gantt charts are not restricted to projects. This graphic can depict

any data connected to a time series.


FIGURE

HISTOGRAM
▪ Histograms illustrate the distribution of the data among various groups.

Histograms divide data into discrete categories and provide a bar proportionate

to the number of entries inside each category.

▪ This chart type might be used to show, for example, the number of items falling within each category or value range.
FIGURE

TOPIC - 4
TABLES
▪ Tables, often known as “crosstabs” or “matrices,” emphasize individual values

above aesthetic formatting. They are one of the most prevalent methods for

showing data and, thus, one of the most essential methods for analyzing data.

While their focus is not intrinsically visual, as reading numbers is a linguistic

exercise, visual features may be added to tables to make them more effective

and simpler to assimilate.

▪ Tables are most frequently encountered on websites, as part of restaurant

menus, and within Microsoft Excel. It is crucial to know how to interpret tables

and make the most of the information they provide since they are ubiquitous. It

is also crucial for analysts and knowledge workers to learn how to make

information easier for their audience to comprehend.

TOPIC - 5
REPORT DESIGN USING DATA VISUALIZATION
▪ After producing a report, the last thing one wants is for it to go unread.
▪ Whether conveying ideas or seeking help, the information must leave an impression. To do this, one must present the report in a style that is both attractive and simple to comprehend.
▪ This is especially true if the report layout includes numbers.


HOW TO USE DATA VISUALIZATION IN REPORT DESIGN?
There are a few strategic steps to include data visualization in report design, as mentioned below:

FIND A STORY IN THE DATA
▪ Data-driven storytelling is a powerful tool;
▪ Finding a story that connects with the reader can help to create an effective report. It is also not as hard as it looks;
▪ In order to locate the story, one must arrange the data, identify any missing numbers, and then check for outliers.

CREATE A NARRATIVE
▪ When some individuals hear the term "data storytelling," they believe that it consists of a few statistics and that the task is complete. This is a frequent misconception. Strong data storytelling comprises an engaging narrative that takes the audience through the facts and aids in their comprehension. To compose an excellent story, one must:
▪ Engage the viewer with a catchy title and subheadings;
▪ Incorporate context into the data;
▪ Create a consistent and logical flow;
▪ Highlight significant discoveries and insights from the data.

CHOOSE THE MOST SUITABLE DATA VISUALIZATION
▪ Data visualization is not limited to the creation of charts and graphs. It involves presenting the facts in the most comprehensible chart possible.
▪ Applying basic design principles and utilising features such as form, size, colour, and labelling may have a significant impact on how people comprehend the data.
▪ Knowing these tips may greatly improve the data visualisations.

FOLLOW THE VISUAL LANGUAGE
▪ The report design may be for internal or external consumption.
▪ It is essential to adhere to data visualisation principles in order to achieve both uniformity and comprehension.

PUBLICIZE THE REPORT
Some reports are not intended for public consumption. However, since they include so much essential information, they may contain knowledge that is of interest to individuals or media outside of the business.

10.5 TOOLS AND TECHNIQUES OF VISUALIZATION AND
GRAPHICAL PRESENTATION

TOPIC – 1 DATA VISUALISATION TOOLS
▪ Tableau
▪ Microsoft Power BI
▪ Microsoft Excel
▪ QlikView

DATA VISUALISATION TOOLS
We will now examine some of the most successful data visualisation tools for

data scientists and how they may boost their productivity. Here are four popular

data visualisation tools that may assist data scientists in making more compelling

presentations.

Tableau
▪ Tableau is a data visualisation application for creating interactive graphs,

charts, and maps.

▪ Tableau Desktop was the first product of its kind. Tableau Public is a free version of Tableau Desktop with some restrictions;
▪ It takes time and effort to understand Tableau, but there are several resources available to assist in doing so;
▪ For a data scientist, Tableau is among the most important tools to understand and employ on a daily basis.

Microsoft Power BI
▪ Microsoft Power BI is a data visualisation tool for business intelligence

data.

▪ Reporting, self-service analytics, and predictive analytics are supported.

▪ In addition, it provides a platform for end users to generate reports and share

insights with others inside their business.

▪ It serves as a centralized repository for all of the business data, which all of

the business users can access.

Microsoft Excel
▪ Microsoft Excel is a data visualization tool which provides several options for viewing data, such as scatter plots, bar charts, histograms, pie charts, line charts, and tree maps.

▪ Numerous data analysts utilize techniques in MS Excel to examine statistical,

scientific, medical, and economic data for market research and financial

planning, among other applications.

QlikView
▪ QlikView is a data discovery platform that enables users to make quicker,

more informed choices by speeding analytics, uncovering new business

insights, and enhancing the precision of outcomes.

▪ It offers an easy software development kit that has been utilized by enterprises worldwide for many years.

▪ It may mix diverse data sources with color-coded tables, bar graphs, line

graphs, pie charts, and sliders.

QUESTION BANK

MULTIPLE CHOICE QUESTIONS


QUESTION NO. 1
Following is a widely used graph for data visualization:
a) Bar chart
b) Pie chart
c) Histogram
d) All of the above

QUESTION NO. 2
Following are the objectives of data visualisation:
a) Making a better data analysis
b) Faster decision making
c) Analysing complicated data
d) All of the above

QUESTION NO. 3
Following are the scope of DPA:
a) Defining significant meaning (relevant information) required by each audience member in every scenario.
b) Obtaining the proper data (focus area, historic reach, extensiveness, level of detail, etc.)
c) Determining the needed frequency of data refreshes (the currency of the data)
d) All of the above

QUESTION NO. 4
Maps may be used for displaying:
a) Pincode
b) Country name
c) State abbreviation
d) All of the above

QUESTION NO. 5
A scatter plot displays several unique data points:

a) On a single graph.

b) On two different graphs

c) On four different graphs

d) None of the above


ANSWER
QUES NO. 1 2 3 4 5

ANS NO. d d d d a

STATE TRUE OR FALSE
1. Data visualisation enhances the effect of communications for the audiences and delivers the most convincing data analysis outcomes.
2. Visualization allows to interpret large volumes of data more quickly and effectively at a glance.
3. Data presentation architecture (DPA) is a set of skills that aims to identify, find, modify, format, and present data in a manner that ideally conveys meaning and provides insight.
4. Scatter plots are a useful tool for examining the connection between many variables, revealing whether one variable is a good predictor of another or whether they tend to vary independently.
5. Gantt charts represent a project's timeline or activity changes across time.

ANSWER
QUES NO. 1 2 3 4 5

ANSWER T T T T T

SHORT ESSAY TYPE QUESTIONS


1 State the objectives of Data presentation architecture (DPA).
2 What are the scopes of Data presentation architecture (DPA)?
3 Define the concept of data visualization dashboard.
4 Write a short note on bar chart.
5 Write a short note on density map.

ESSAY TYPE QUESTIONS


1 Discuss the ways in which finance professionals may be helped by data visualization in analysing and reporting information.
2 Discuss the objectives of data visualization.
3 How to use data visualization in report design?
4 Discuss the different tools for Visualization and Graphical Presentation.
5 Discuss the objectives and scope of data presentation architecture.

FILL IN THE BLANKS
1. Data and insights available to decision-makers facilitate _______ analysis.
2. Often confused with data visualization, data presentation architecture is a much _______ skill set.
3. A _________ is a circular graphical representation of statistical data that is segmented to demonstrate numerical proportion.
4. If the data is related with geographic information, ________ are a simple and effective approach to illustrate the relationship.
5. ________ indicate patterns or relative concentrations that might otherwise be obscured by overlapping marks on a map, allowing to identify areas with a larger or lesser number of data points.


ANSWER
1 Decision

2 Broader

3 pie chart (or circle chart)

4 Maps

5 Density maps

CHAPTER 11
DATA ANALYSIS AND MODELLING

Chapter overview
11.1 Process, Benefits and Types of Data Analysis
11.2 Data Mining and Implementation of Data Mining
11.3 Analytics and Model Building (Descriptive, Diagnostic, Predictive, Prescriptive)
11.4 Standards for Data Tagging and Reporting (XML, XBRL)
11.5 Cloud Computing, Business Intelligence, Artificial Intelligence, Robotic Process Automation and Machine Learning
11.6 Model vs Data-Driven Decision-Making

11.1 – “PROCESS, BENEFITS AND TYPES OF DATA ANALYSIS”.

TOPIC – 1 INTRODUCTION TO DATA ANALYTICS

TOPIC – 2 PROCESS OF DATA ANALYTICS

TOPIC – 3 BENEFITS OF DATA ANALYTICS

TOPIC – 1
INTRODUCTION TO DATA ANALYTICS
▪ Data analytics is the science of evaluating unprocessed datasets to draw conclusions

about the information they contain.

▪ It helps us to identify patterns in the raw data and extract useful information

from them.

▪ Here data are evaluated and used, to assist firms in gaining a deeper understanding

of their customers, analysing their promotional activities, customising their content,

developing content strategies, and creating new products. Data analytics enables

businesses to boost market efficiency and increase profits.

▪ Data analytics procedures and methodologies may utilise applications containing machine learning algorithms and automated systems.

TOPIC – 2
PROCESS OF DATA ANALYTICS
Following are the steps for data analytics:
STEP 1: CRITERIA FOR GROUPING DATA
DATA MAY BE SEGMENTED by a variety of PARAMETERS, including age, population, income, and sex. The data values might be either numerical or categorical.

STEP 2: COLLECTING THE DATA
DATA MAY BE GATHERED from SEVERAL SOURCES, including internet sources, computers and community sources.

STEP 3: ORGANIZING THE DATA
After collecting the data, IT MUST BE ARRANGED SO THAT IT CAN BE ANALYSED.

STEP 4: CLEANING THE DATA
▪ The data is INITIALLY CLEANSED TO VERIFY THAT THERE ARE NO DUPLICATES OR ERRORS.
▪ Before data is sent to a data analyst for analysis, IT IS BENEFICIAL TO RECTIFY OR ELIMINATE ANY ERRORS BY CLEANING THE DATA (see the sketch after Step 5 below).

STEP 5: ADOPT THE RIGHT TYPE OF DATA ANALYTICS PROCESS
There are four types of data analytics process:
1) Descriptive analytics
2) Diagnostics analytics
3) Predictive analytics
4) Prescriptive analytics
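As an illustration of Steps 3 and 4 above, the following is a minimal pandas sketch; the file name and column names are hypothetical:

```python
# Minimal sketch of organising (Step 3) and cleaning (Step 4) data with pandas.
# "customer_data.csv" and its columns are hypothetical.
import pandas as pd

df = pd.read_csv("customer_data.csv")       # data collected in Step 2
df = df.sort_values("age")                  # Step 3: arrange the data for analysis
df = df.drop_duplicates()                   # Step 4: verify there are no duplicates
df["income"] = df["income"].fillna(df["income"].median())  # rectify missing values
print(df.head())                            # data is now ready for the analyst
```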

TOPIC - 3
BENEFITS OF DATA ANALYTICS
Following are the benefits of data analytics:
IMPROVES DECISION MAKING PROCESS
▪ Companies can use the information gained from data analytics to base their decisions,

resulting in enhanced outcomes.

▪ Using data analytics significantly reduces the amount of guesswork involved in

preparing marketing plans, deciding what materials to produce, and more.

▪ Using advanced data analytics technologies, you can continuously collect and analyse

new data to gain a deeper understanding of changing circumstances.


INCREASE IN EFFICIENCY OF OPERATIONS
▪ Data analytics assists firms in streamlining their processes, conserving resources,

and increasing their profitability.

▪ When firms have a better understanding of their audience's demands, they spend less time creating advertisements that do not fulfil those needs.


IMPROVED SERVICE TO STAKEHOLDERS
▪ Data analytics provides organisations with a more in-depth understanding of their customers, employees and other stakeholders.

▪ This enables the company to tailor stakeholders’ experiences to their needs, provide
more personalization, and build stronger relationships with them.

11.2 – “DATA MINING AND IMPLEMENTATION OF DATA MINING”.

TOPIC – 1 INTRODUCTION TO DATA MINING

TOPIC – 2 PROCESS OF DATA MINING

TOPIC – 3 TECHNIQUES OF DATA MINING

IMPLEMENTATION OF DATA MINING IN FINANCE AND


TOPIC – 4
MANAGEMENT

TOPIC - 1
INTRODUCTION TO DATA MINING
▪ Given the advancement of DATA WAREHOUSING TECHNOLOGIES and the expansion of BIG

DATA, the use of data mining techniques has advanced dramatically over the past two

decades, supporting businesses in TRANSLATING THEIR RAW DATA INTO MEANINGFUL

INFORMATION.

▪ DATA MINING, also known as KNOWLEDGE DISCOVERY IN DATA (KDD), is the extraction of

patterns and other useful information from massive data sets. Through smart data

analytics, data mining has enhanced corporate decision making.

▪ Nevertheless, despite the fact that technology is always evolving to manage massive

amounts of data, leaders continue to struggle with scalability and automation.

▪ The DATA MINING TECHNIQUES behind these investigations predict results using machine learning algorithms. These strategies are used to organise and filter data, bringing to the surface the most relevant information.

TOPIC - 2
PROCESS OF DATA MINING
▪ The PROCESS OF DATA MINING comprises a series of procedures, from data collecting

through visualisation, in order to EXTRACT USEFUL INFORMATION FROM MASSIVE DATA SETS.

▪ In addition to classifying and clustering data through CLASSIFICATION AND REGRESSION TECHNIQUES, these methods discover outliers for use cases such as SPAM IDENTIFICATION.

▪ Data mining techniques are utilised to DEVELOP DESCRIPTIONS AND HYPOTHESES ON A

SPECIFIC DATA SET.

▪ DATA MINING TYPICALLY INVOLVES FOUR STEPS:


1) Establishing objectives;

2) Acquiring and preparing data;

3) Implementing data mining techniques and

4) Assessing outcomes.
SETTING THE BUSINESS OBJECTIVE
▪ This might be the MOST DIFFICULT ELEMENT IN THE DATA MINING process, yet many

organisations spend inadequate effort on it.

▪ Together, data scientists and business stakeholders must identify the business

challenge to adequately comprehend the company environment.


PREPARATION OF DATA
▪ Once the scale of the problem has been established, it is simpler for data scientists

to determine which collection of data will assist the company in answering crucial

questions;

▪ Once the pertinent DATA HAS BEEN COLLECTED, IT WILL BE CLEANSED BY ELIMINATING ANY

NOISE, SUCH AS REPETITIONS AND MISSING NUMBERS;

▪ Based on the dataset, AN EXTRA STEP MAY BE DONE TO MINIMISE THE NUMBER OF DIMENSIONS,

as an excessive amount of features might slow down any further calculation.

MODEL BUILDING AND PATTERN MINING

▪ Data scientists may study any intriguing relationship between the data, such as

frequent patterns, or clustering algorithms depending on the sort of research.

▪ Depending on the available data, DEEP LEARNING ALGORITHMS may also be utilised to

categorise or cluster a data collection.

▪ If the INPUT DATA IS MARKED (i.e., supervised learning), a CLASSIFICATION MODEL may

be used to categorise data, or a REGRESSION may be employed to forecast the

probability of a specific assignment.

▪ If the DATASET IS UNLABELLED (i.e., unsupervised learning), the particular data

points in the training set are compared to uncover underlying commonalities, then

CLUSTERED based on those features.


RESULT EVALUATION AND IMPLEMENTATION OF KNOWLEDGE
▪ After aggregating the data, the FINDINGS MUST BE ANALYSED AND UNDERSTOOD.

▪ When completing results, they MUST BE VALID, ORIGINAL, PRACTICAL, AND COMPREHENSIBLE.

▪ When this criterion is satisfied, companies can execute new strategies based on this

understanding, therefore attaining their intended goals.

TOPIC – 3
TECHNIQUES OF DATA MINING
Using various methods and approaches, data mining transforms vast quantities of data

into valuable information. Here are a few of the most prevalent:


ASSOCIATION RULES
▪ An association rule is a RULE-BASED TECHNIQUE for discovering associations between

variables inside a given dataset.

▪ These methodologies are commonly employed for MARKET BASKET ANALYSIS, enabling

businesses to better comprehend the linkages between various items.

▪ UNDERSTANDING CLIENT CONSUMPTION PATTERNS helps organisations to create more

effective cross-selling tactics and recommendation engines.
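A hedged illustration of the two core measures behind association rules, support and confidence, computed on a made-up set of market-basket transactions:

```python
# Support and confidence on hypothetical transactions (market basket analysis).
transactions = [
    {"bread", "butter"},
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"milk"},
]

n = len(transactions)
support_both = sum({"bread", "butter"} <= t for t in transactions) / n  # 0.50
support_bread = sum("bread" in t for t in transactions) / n             # 0.75
confidence = support_both / support_bread  # P(butter | bread) = 0.67

print(f"support(bread, butter) = {support_both:.2f}")
print(f"confidence(bread -> butter) = {confidence:.2f}")
```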


NEURAL NETWORKS
▪ Primarily utilised for deep learning algorithms, Neural Networks replicate the

interconnection of the human brain through layers of nodes to process TRAINING

DATA.

▪ Every node has INPUTS, WEIGHTS, A BIAS (OR THRESHOLD), AS WELL AS AN OUTPUT.

▪ If the output value exceeds a predetermined threshold, the node “fires” and

passes data to the subsequent network layer.

▪ Neural networks acquire this mapping function by SUPERVISED LEARNING.

▪ When the cost function is at or near zero, we may have confidence in the model's ability to produce the correct answer.


DECISION TREE
▪ This data mining methodology CLASSIFIES OR PREDICTS LIKELY OUTCOMES based on a collection of decisions.

▪ As its name implies, it employs TREE-LIKE REPRESENTATION to depict the potential

results of these actions.


K – NEAREST NEIGHBOUR
▪ K-nearest neighbour, often known as the KNN algorithm, classifies data points

depending on their closeness with other accessible data.

▪ This technique assumes that COMPARABLE DATA POINTS exist in close proximity to one

another.

▪ Consequently, it attempts to measure the distance between data points, often by EUCLIDEAN DISTANCE, and then assigns a category based on the most common category or the average value among the neighbours.
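A minimal sketch of the idea, assuming a tiny, made-up two-dimensional dataset and Euclidean distance:

```python
# KNN sketch: classify a point by the majority label among its k nearest
# neighbours, measured by Euclidean distance. The data points are hypothetical.
from collections import Counter
import math

points = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.1), "B"), ((4.8, 5.3), "B")]

def knn_predict(query, k=3):
    nearest = sorted(points, key=lambda p: math.dist(query, p[0]))  # Euclidean
    labels = [label for _, label in nearest[:k]]
    return Counter(labels).most_common(1)[0][0]  # most common category wins

print(knn_predict((1.1, 0.9)))  # -> "A"
```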

TOPIC - 4
IMPLEMENTATION OF DATA MINING IN FINANCE AND MANAGEMENT
The widespread use of data mining techniques by business intelligence and data

analytics teams enables them to harvest insights for their organisations and industries.

Utilizing data mining techniques, hidden patterns and future trends may be predicted.

Data mining applications are:


DETECTING MONEY LAUNDERING AND OTHER FINANCIAL CRIMES
▪ Money laundering is the ILLEGAL CONVERSION OF BLACK MONEY TO WHITE MONEY.

▪ In today’s society, DATA MINING TECHNIQUES have advanced to the point where they are

deemed suitable for DETECTING MONEY LAUNDERING.

▪ The data mining methodology provides banks with a mechanism to detect and verify suspicious transactions, supporting anti-money laundering efforts.


DESIGN AND CONSTRUCTION OF DATA WAREHOUSES
The business is able to RETRIEVE OR MOVE THE DATA INTO SEVERAL HUGE DATA WAREHOUSES,

allowing a vast volume of data to be correctly and reliably evaluated with the aid of

various data mining methodologies and techniques.


TARGET MARKETING
▪ Data mining techniques work to TARGET A CERTAIN MARKET, AND THEY ALSO ASSIST AND

DETERMINE MARKET DECISIONS.

▪ With data mining, it is possible to track earnings, margins, etc. and determine which product is optimal for various types of customers.


PREDICTION OF LOAN REPAYMENT AND CUSTOMER CREDIT POLICY ANALYSIS
▪ LOAN DISTRIBUTION IS THE CORE BUSINESS FUNCTION OF EVERY BANK.

▪ Data mining aids in the management of all critical data and massive databases by

utilising its models.

11.3 – “ANALYTICS AND MODEL BUILDING (DESCRIPTIVE,
DIAGNOSTIC, PREDICTIVE, PRESCRIPTIVE)”.

TOPIC – 1 DESCRIPTIVE ANALYTICS

TOPIC – 2 DIAGNOSTIC ANALYTICS

TOPIC – 3 PREDICTIVE ANALYTICS

TOPIC – 4 PRESCRIPTIVE ANALYTICS

TOPIC – 1
DESCRIPTIVE ANALYTICS
INTRODUCTION
▪ Descriptive analytics is a frequently employed style of data analysis in which

historical data is collected, organised, and presented in a readily digestible format;

▪ Unlike other types of analysis, descriptive analytics focuses exclusively on what has already occurred in an organisation;

▪ Descriptive analytics serves as a basic starting point to inform or prepare data for

subsequent analysis.

▪ In general, descriptive analytics is the simplest kind of data analytics, since it employs simple mathematical and statistical methods, such as arithmetic, averages, and percentage changes, rather than the complicated computations required for predictive and prescriptive analytics.


HOW DOES DESCRIPTIVE ANALYTICS WORK?
To identify trends in historical data, descriptive analytics employs two fundamental techniques: data aggregation and data mining (also known as data discovery). Data aggregation is the process of gathering and organising data into digestible data sets.

According to Dan Vesset, the process of descriptive analytics may be broken into five

broad steps:

STEP 1: DECIDE THE BUSINESS METRICS
Measurements are developed to evaluate performance against corporate objectives, such as increasing operational efficiency or revenue.
STEP 2: IDENTIFICATION OF DATA REQUIREMENT
The data is gathered from databases.

STEP 3: PREPARATION AND COLLECTION OF DATA
Data preparation, which includes transformation and cleaning, is a crucial step for ensuring correctness; it is also one of the most time-consuming tasks for the analyst.

STEP 4: ANALYSIS OF DATA
Utilizing clustering and regression analysis, data trends are discovered and performance is evaluated.
STEP 5: PRESENTATION OF DATA
Lastly, charts and graphs are utilised to portray findings in a manner that non-experts in analytics may comprehend.
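A hedged pandas sketch of these five steps, taking a hypothetical sales file and month-on-month revenue growth as the business metric:

```python
# "sales.csv" and its columns are hypothetical.
import pandas as pd

# Steps 1-2: metric = month-on-month revenue growth; gather the data
sales = pd.read_csv("sales.csv", parse_dates=["date"])

# Step 3: prepare and clean
sales = sales.dropna(subset=["revenue"]).drop_duplicates()

# Step 4: analyse - aggregate by month and compute the percentage change
monthly = sales.groupby(sales["date"].dt.to_period("M"))["revenue"].sum()
growth = monthly.pct_change() * 100

# Step 5: present findings as a simple chart for non-experts
growth.plot(kind="bar", title="Month-on-month revenue growth (%)")
```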

INFORMATION REVEALED BY DESCRIPTIVE ANALYTICS


▪ Examples of descriptive analytics that give a historical overview of an organization's activities include company reports on inventory, workflow, sales, and revenue.

▪ Social analytics are virtually always a type of descriptive analytics.

▪ The number of followers, likes, and posts may be utilised to calculate, for example,

the average number of replies per post, page visits, and response time.

▪ Facebook and Instagram comments are additional instances of descriptive analytics

that may be utilised to better comprehend user sentiments.


EXAMPLES OF DESCRIPTIVE ANALYTICS
Several applications of descriptive analytics include the following:

▪ Past events, such as sales and operational data or marketing campaigns, are

summarized;

▪ Reporting general trends;

▪ Compiling survey data;

▪ Social media usage, such as Instagram or Facebook likes, is an example of such information.
ADVANTAGES AND DISADVANTAGES OF DESCRIPTIVE ANALYTICS
▪ Because descriptive analytics depends only on historical data, the technique is easily applicable to day-to-day operations and does not need an in-depth understanding of analytics.

▪ This implies that firms may report on performance very quickly and acquire insights

that can be utilised to make changes.

TOPIC - 2
DIAGNOSTIC ANALYTICS
INTRODUCTION
▪ Diagnostic analytics employs tools to ask of the data, "Why did this occur?". It involves a thorough examination of the data to discover important insights.

▪ Descriptive analytics is the first phase in the data analysis process, whereas diagnostic analytics goes a step further by revealing the rationale behind particular outcomes.

▪ Typical strategies for diagnostic analytics include data discovery and data mining;

▪ Data mining is the automated extraction of information from vast quantities of

unstructured data.
ADVANTAGES
▪ Data plays an increasingly important role in every organisation; diagnostic tools help to make the most of the data by turning it into visuals and insights that can be utilised by everyone;

▪ Diagnostic analytics develops solutions that may be used to discover answers to

data-related problems and to communicate insights within the organisation;

▪ Diagnostic analytics enables organisations to derive value from the data by asking the relevant questions and doing in-depth analyses of the responses.

TOPIC – 3
PREDICTIVE ANALYTICS
INTRODUCTION
▪ Predictive analytics, as implied by its name, focuses on forecasting and

understanding what might occur in the future, whereas descriptive analytics focuses

on previous data;

▪ By analysing past data patterns and trends by examining historical data and customer

insights, it is possible to predict what may occur in the future and, as a result, many

aspects of a business can be informed, such as setting realistic goals.


HOW DOES PREDICTIVE ANALYTICS WORK?
▪ The foundation of predictive analytics is probability.

▪ Using techniques such as data mining and machine learning algorithms (classification,

regression, and clustering techniques), predictive analytics attempts to predict

possible future outcomes and the probability of those events.

▪ Deep learning is a more recent subfield of machine learning that imitates the

building of “human brain networks as layers of nodes that understand a specific process

area but are networked together to provide an overall forecast.”

▪ Utilizing predictive analytics, businesses may foresee customer behaviour and

purchase patterns, as well as discover sales trends.
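A hedged sketch of the idea using scikit-learn's LinearRegression; the monthly sales history is invented:

```python
# Fit a regression on past months and forecast the next one.
import numpy as np
from sklearn.linear_model import LinearRegression

months = np.arange(1, 13).reshape(-1, 1)  # months 1..12 (historical data)
sales = np.array([10, 11, 13, 12, 14, 15, 17, 16, 18, 19, 21, 22])

model = LinearRegression().fit(months, sales)  # learn the past trend
print(model.predict([[13]]))                   # probable sales for month 13
```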


ADVANTAGES AND DISADVANTAGES OF PREDICTIVE ANALYTICS
Given that predictive analysis is based on probabilities, it can never be absolutely

precise, but it may serve as a crucial tool for forecasting probable future occurrences

and informing future corporate strategy. Additionally, predictive analytics may

enhance several corporate functions, including:

▪ Effectiveness, including inventory forecasting;

▪ Customer service, by aiding a business in gaining a deeper knowledge of who its clients are and what they want, so that it can personalise its suggestions;

▪ Detection and prevention of fraud;

▪ Risk mitigation;

▪ On the downside, this kind of analysis requires the availability of historical data, typically in enormous quantities.

EXAMPLE OF PREDICTIVE ANALYTICS
The following are some industries in which predictive analysis might be utilised:

▪ E-commerce – Anticipating client preferences and proposing items based on previous

purchases and search histories

▪ Sales – estimating the possibility that a buyer will buy another item or depart the

shop.

▪ Healthcare – anticipating staffing and resource requirements

▪ IT security – detecting potential security vulnerabilities requiring more investigation

▪ Human resources – identifying employees who are contemplating resigning and urging

them to remain.

TOPIC – 4
PRESCRIPTIVE ANALYTICS
INTRODUCTION
▪ DESCRIPTIVE ANALYTICS DESCRIBES WHAT HAS OCCURRED, DIAGNOSTIC ANALYTICS EXPLORE WHY

IT OCCURRED, PREDICTIVE ANALYTICS DESCRIBES WHAT COULD OCCUR, AND PRESCRIPTIVE

ANALYTICS DESCRIBES WHAT SHOULD BE DONE;

▪ This approach is the fourth, final, and most sophisticated step of the business

analysis process, and it is the one that urges firms to action by assisting executives,

managers, and operational personnel.


HOW DOES PRESCRIPTIVE ANALYTICS WORK?
▪ Prescriptive analytics goes one step further than descriptive and predictive analysis by advising the best potential business actions.

▪ This is the most sophisticated step of the business analytics process.

▪ A multitude of approaches and tools – such as rules, statistics, and machine learning algorithms – may be applied to accessible data, including internal data (from within the business) and external data (such as data derived from social media), in order to produce predictions and recommendations.

▪ A widespread misconception is that predictive analytics and machine learning are the same.

▪ While predictive analytics uses historical data and statistical techniques to

make predictions about the future, machine learning, a subset of artificial

intelligence, refers to a computer system’s ability to understand large and often

enormous amounts of data without explicit instructions, and to adapt and become

increasingly intelligent as a result.

EXAMPLES OF PRESCRIPTIVE ANALYTICS
Prescriptive analysis applications include the following:

▪ Oil and manufacturing – monitoring price fluctuations;

▪ Healthcare – enhancing patient care and healthcare administration by analysing

readmission rates;

▪ Insurance – evaluating customer risk in terms of price and premium information;

▪ Pharmaceutical research – determining the patient populations for clinical trials.


ADVANTAGES AND DISADVANTAGES OF PRESCRIPTIVE ANALYTICS
Prescriptive analytics gives important insights for making the most optimal data-driven decisions to optimise corporate performance. Nonetheless, similar to predictive analytics, this technique requires enormous volumes of data to deliver effective findings, and such data is not always available.

11.4 – “STANDARDS FOR DATA TAGGING AND REPORTING (XML,
XBRL)”

TOPIC – 1 EXTENSIBLE MARKUP LANGUAGE (XML)

EXTENSIBLE BUSINESS REPORTING LANGUAGE (XBRL)


▪ Introduction
TOPIC – 2
▪ Benefits

▪ Participants

TOPIC - 1
EXTENSIBLE MARKUP LANGUAGE (XML)
INTRODUCTION
▪ XML is a file format and markup language for storing, transferring, and recreating

arbitrary data. It specifies a set of standards for encoding texts in a format that is

understandable by both humans and machines. XML is defined by the 1998 XML 1.0

Specification of the World Wide Web Consortium and numerous other related

specifications, which are all free open standards;

▪ XML’s design objectives stress Internet usability, universality, and simplicity. It is a

textual data format with significant support for many human languages via Unicode.

Although XML’s architecture is centred on texts, the language is commonly used to

express arbitrary data structures, such as those employed by web services;

▪ Several schema systems exist to help in the design of XML-based languages, and

numerous application programming interfaces (APIs) have been developed by

programmers to facilitate the processing of XML data;

▪ Serialization, or storing, sending, and rebuilding arbitrary data, is the primary

function of XML. In order for two dissimilar systems to share data, they must agree

on a file format. XML normalises this procedure;

▪ XML is comparable to a universal language for describing information;

▪ As a markup language, XML labels, categorises, and arranges information

systematically;

▪ The data structure is represented by XML tags, which also contain information. The

information included within the tags is encoded according to the XML standard. A

supplementary XML schema (XSD) defines the required metadata for reading and

verifying XML. This is likewise known as the canonical schema. A "well-formed" XML document complies with fundamental XML principles, whereas a "valid" document adheres to its schema.
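A minimal sketch of XML's tagging and serialisation role, using Python's standard xml.etree library; the tag names are illustrative only:

```python
# Build a small XML document, serialise it, and read it back.
import xml.etree.ElementTree as ET

invoice = ET.Element("invoice")
ET.SubElement(invoice, "customer").text = "ABC Ltd"
ET.SubElement(invoice, "amount", currency="INR").text = "50000"

xml_text = ET.tostring(invoice, encoding="unicode")  # store / transfer the data
print(xml_text)

parsed = ET.fromstring(xml_text)                     # recreate it on the other side
print(parsed.find("amount").text)                    # -> 50000
```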

TOPIC - 2
EXTENSIBLE BUSINESS REPORTING LANGUAGE (XBRL)
INTRODUCTION
▪ XBRL is a data description language that facilitates the interchange of STANDARD,

comprehensible CORPORATE DATA.

▪ It is based on XML and enables the automated interchange and trustworthy

extraction of financial data across all software types and advanced technology,

including Internet.

▪ XBRL allows organisations to arrange data using TAGS. When a piece of data is labelled

as “revenue,” for instance, XBRL enabled applications know that it pertains to revenue.

It conforms to a fixed definition of income and may appropriately utilize it.

▪ XBRL offers EXPANDED CONTEXTUAL INFORMATION ON THE PRECISE DATA content of

financial documents.

With XBRL, a business, a person, or another software programme may quickly produce

a variety of output formats and reports based on a financial statement. (Various

elements of Cost Audit Report and Compliance Report can be mapped into XBRL

tags of the costing taxonomy using specialized XBRL software tools.)
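For intuition only, here is a deliberately simplified, hypothetical XBRL-style fragment built with the same ElementTree API; a real XBRL instance additionally requires taxonomy schema references, namespaces, and unit declarations:

```python
# Simplified, illustrative XBRL-style tagging - NOT a valid XBRL instance.
import xml.etree.ElementTree as ET

root = ET.Element("xbrl")
ctx = ET.SubElement(root, "context", id="FY2023")
ET.SubElement(ctx, "period").text = "2023-04-01/2024-03-31"
# The "Revenue" fact is tied to its context, so software knows what it means.
ET.SubElement(root, "Revenue", contextRef="FY2023", unitRef="INR").text = "5000000"

print(ET.tostring(root, encoding="unicode"))
```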


BENEFITS
REDUCES THE CHANCE OF ERRONEOUS DATA ENTRY
All reports are automatically created from a single source of information, which reduces the chance of erroneous data entry and hence increases data reliability.

PREPARATION AND PRODUCTION OF REPORTS
Reduces expenses by simplifying and automating the preparation and production of reports for various clients.

PUBLICATION OF REPORTS
Facilitates the publication of analyst and investor reports.

COMPARISON AND ANALYTICS
Access, comparison, and analytic capabilities for information are unparalleled.

DECISION-MAKING
Accelerates the decision-making of financial entities such as banks and rating services.

PARTICIPANTS
[Figure: participants in XBRL reporting – original diagram not reproduced]
11.5 – “CLOUD COMPUTING, BUSINESS INTELLIGENCE,
ARTIFICIAL INTELLIGENCE, ROBOTIC PROCESS AUTOMATION
AND MACHINE LEARNING”.
TOPIC - 1 CLOUD COMPUTING
▪ Introduction
▪ Types of Cloud Computing
✓ Private Cloud
✓ Public Cloud
✓ Hybrid Cloud

TOPIC - 2 BUSINESS INTELLIGENCE
▪ Introduction
▪ BI Methods

TOPIC - 3 ARTIFICIAL INTELLIGENCE
▪ Introduction
▪ Types of Artificial Intelligence

TOPIC – 4 DEEP LEARNING VS MACHINE LEARNING

TOPIC – 5 ROBOTIC PROCESS AUTOMATION
▪ Introduction
▪ Benefits of RPA

TOPIC – 6 MACHINE LEARNING
▪ Introduction
▪ Approaches towards Machine Learning
✓ Supervised Learning
✓ Unsupervised Machine Learning
✓ Semi – supervised Learning
✓ Reinforcement Learning
✓ Dimensionality Reduction
TOPIC – 1
INTRODUCTION TO CLOUD COMPUTING
LOCAL SERVER
Before the advent of cloud computing, businesses had to acquire and OPERATE THEIR OWN SERVERS TO SUIT THEIR DEMANDS.
STRINGENT SLA
This necessitated the purchase of sufficient server capacity to MINIMIZE THE RISK OF DOWNTIME AND DISRUPTIONS AND TO MEET PEAK TRAFFIC VOLUMES.
CLOUD COMPUTING
▪ Cloud computing involves STORING AND ACCESSING DATA VIA DISTANT SERVERS as opposed to local hard drives and private data centres.
▪ It is the DELIVERY OF A VARIETY OF SERVICES THROUGH THE INTERNET, OR "THE CLOUD."
LESS COSTLY
Today's cloud service providers enable businesses to LESSEN THEIR RELIANCE ON COSTLY ONSITE SERVERS, MAINTENANCE STAFF, AND OTHER IT RESOURCES.

TYPES OF CLOUD COMPUTING

There are three deployment options for cloud computing:


1. PRIVATE CLOUD,
2. PUBLIC CLOUD,
3. HYBRID CLOUD.
PRIVATE CLOUD
INTRODUCTION
Private cloud offers a cloud environment that is EXCLUSIVE TO A SINGLE CORPORATE ORGANISATION, with physical components housed on-premises or in a vendor's datacenter;
CENTRAL CONTROL
This solution gives a HIGH LEVEL OF CONTROL due to the fact that the private cloud is available to just one enterprise;
SECURITY
ENHANCED SECURITY PROCEDURES;
SCALABLE
CAPACITY TO EXPAND COMPUTER RESOURCES AS NEEDED;
TYPE
In many instances, a business maintains a private cloud infrastructure on-premises and provides cloud computing services to internal users over the intranet (ON-PREMISE PRIVATE CLOUD). In other cases, the company engages with a third-party cloud service provider to host and operate its servers off-site (OUTSOURCED PRIVATE CLOUD).

PUBLIC CLOUD
INTRODUCTION
▪ The public cloud stores and MANAGES ACCESS TO DATA AND APPLICATIONS THROUGH THE INTERNET.
▪ Because these resources are offered through the web, the public cloud deployment model enables enterprises to grow with more ease;
PAY – PER – USE
▪ Users are required to PAY for cloud services ON AN AS-NEEDED BASIS;
▪ It is fully virtualized, enabling an environment in which SHARED RESOURCES MAY BE UTILISED AS NECESSARY.
SECURITY
Public cloud service providers use RIGOROUS SECURITY MEASURES to prevent unauthorised access to user data by other tenants.
HYBRID CLOUD
▪ Hybrid cloud BLENDS private and public cloud models;
▪ It enables enterprises to exploit the benefits of shared resources (PUBLIC CLOUD) while leveraging their existing IT infrastructure for mission-critical security needs (PRIVATE CLOUD);
▪ The hybrid cloud architecture enables businesses to store sensitive data on-premises (PRIVATE CLOUD) and access it through apps hosted in the PUBLIC CLOUD;
▪ In order to comply with privacy rules, an organisation may, for instance, KEEP SENSITIVE USER DATA IN A PRIVATE CLOUD AND EXECUTE RESOURCE-INTENSIVE COMPUTATIONS IN A PUBLIC CLOUD.

TOPIC – 2
INTRODUCTION TO BUSINESS INTELLIGENCE
▪ Business intelligence includes data mining, data visualisation and best practises to

assist businesses in making choices that are more data-driven.

▪ When you have a complete picture of your organization’s data and utilise it to drive

change, remove inefficiencies, and swiftly adjust to market or supply changes, you have

contemporary business intelligence.

▪ Modern BI systems promote adaptable self- service analysis, controlled data on

dependable platforms, empowered business users, and rapid insight delivery.

▪ The phrase “BUSINESS INTELLIGENCE” was coined in 1989.

BI METHODS
▪ Business intelligence is a broad term that ENCOMPASSES THE PROCEDURES AND METHODS OF GATHERING, STORING, AND EVALUATING DATA from business operations or activities in order TO MAXIMIZE PERFORMANCE.

▪ All of these factors combine to provide a full perspective of a firm, ENABLING

INDIVIDUALS TO MAKE BETTER, PROACTIVE DECISIONS.

▪ In recent years, business intelligence has expanded to incorporate more procedures

and activities designed to enhance performance. These procedures consist of:

DATA PREPARATION
Compiling multiple data sources and preparing the data and dimensions for analysis.
DATA MINING
Large datasets may be mined for patterns using databases and machine learning (ML).
DATA VISUALIZATION
Data consumption is facilitated by transforming data analysis into visual representations such as charts, graphs, and histograms.
REPORTING
The dissemination of data analysis for stakeholders to form conclusions and make decisions.
DESCRIPTIVE ANALYTICS
Utilizing basic data analysis.
STATISTICAL ANALYSIS
Taking the results of descriptive analytics and using statistics to further explore the data, such as how and why a pattern occurred.
QUERYING
BI extracts responses from data sets in response to data-specific queries.
PERFORMANCE METRICS AND BENCHMARKING
Comparing current performance data to previous performance data in order to measure performance.

TOPIC - 3
INTRODUCTION TO ARTIFICIAL INTELLIGENCE
▪ As expected with any new developing technology on the market, AI development is

still surrounded by a great deal of hype;

▪ Artificial intelligence is, in its simplest form, a topic that COMBINES COMPUTER SCIENCE

AND SUBSTANTIAL DATASETS TO ALLOW PROBLEM-SOLVING;

▪ IT INCLUDES THE SUBFIELDS OF MACHINE LEARNING AND DEEP LEARNING, WHICH ARE COMMONLY

ASSOCIATED WITH ARTIFICIAL INTELLIGENCE. These fields consist of AI algorithms that

aim to develop expert systems that make predictions or classifications based on input

data;

▪ Applications of Artificial Intelligence - SELF-DRIVING VEHICLES AND PERSONAL

ASSISTANTS.

▪ STUART RUSSELL AND PETER NORVIG


Stuart Russell and Peter Norvig published ‘ARTIFICIAL INTELLIGENCE: A MODERN APPROACH’,

which has since become one of the MOST INFLUENTIAL AI TEXTBOOKS. In it, they discuss

four alternative aims or definitions of artificial intelligence, which distinguish

computer systems based on reasoning and thinking vs. acting:

Human approach:

Systems that think like humans

Systems that act like humans

Ideal approach:

Systems that think rationally

Systems that act rationally


▪ JOHN MCCARTHY OF STANFORD UNIVERSITY

Defined artificial intelligence as, “The science and engineering of making intelligent

machines, especially intelligent computer programs. It is related to the similar

task of using computers to understand human intelligence”.

▪ ALAN TURING's (father of computer science) landmark 1950 paper "Computing Machinery and Intelligence" marked the genesis of artificial intelligence.

TYPES OF ARTIFICIAL INTELLIGENCE – WEAK AI VS. STRONG AI
WEAK AI / NARROW AI / ARTIFICIAL NARROW INTELLIGENCE
▪ Weak AI, also known as Narrow AI or Artificial Narrow Intelligence (ANI), is AI that has been trained and tuned to do particular tasks.
▪ Most of the AI that surrounds us today is powered by weak AI.
▪ This form of artificial intelligence is anything but feeble;
▪ Examples include APPLE'S SIRI, AMAZON'S ALEXA, IBM WATSON, AND DRIVERLESS CARS, AMONG OTHERS.

STRONG AI
Strong AI comprises Artificial General Intelligence (AGI) and Artificial Super Intelligence (ASI).
ARTIFICIAL GENERAL INTELLIGENCE
Artificial general intelligence (AGI), sometimes known as general AI, is a HYPOTHETICAL KIND OF ARTIFICIAL INTELLIGENCE in which a machine possesses:
▪ Human-level intellect;
▪ Self-aware consciousness; and
▪ The ability to solve problems, learn, and plan for the future.
ARTIFICIAL SUPER INTELLIGENCE
▪ Super Intelligence, also known as Artificial Super Intelligence (ASI), would TRANSCEND THE INTELLIGENCE AND CAPABILITIES OF THE HUMAN BRAIN.
▪ Despite the fact that strong AI is STILL TOTALLY THEORETICAL AND HAS NO PRACTICAL APPLICATIONS, this does not preclude AI researchers from studying its development.
NOTE: -
In the meanwhile, the finest instances of ASI may come from science fiction, such as
HAL from 2001: A Space Odyssey, a superhuman, rogue computer aide.

TOPIC 4 - MACHINE LEARNING VS. DEEP LEARNING
POINT – 1
▪ Deep learning and Machine
learning are frequently used
INTERCHANGEABLY;

▪ It is important to note the


distinctions between the two;
▪ Both Deep learning and Machine
learning are subfields of artificial
intelligence;
▪ Deep learning is a subfield of
machine learning.

ARTIFICIAL INTELLIGENCE
▪ Intelligence, as defined in Chambers dictionary: "The ability to use memory, knowledge, experience, understanding, reasoning, imagination, and judgement to solve problems and adapt to new situations".
▪ The ability described above, when exhibited by machines, is called Artificial Intelligence (AI).
▪ In AI, we MAKE intelligent systems to perform any task like a human.

MACHINE LEARNING
▪ Machine Learning is a type of Artificial Intelligence (AI) that provides computers with the ability to learn automatically without being explicitly programmed.
▪ In ML, we TEACH machines with data to perform a particular task and give an accurate result, i.e., it allows a machine to automatically learn from past data without explicit programming.
▪ In classical machine learning, human specialists develop the hierarchy of characteristics in order to comprehend the distinctions between data inputs.

DEEP LEARNING
▪ Deep learning automates a significant portion of the feature extraction step, reducing the need for manual human involvement and enabling the usage of bigger data sets.
▪ It is capable of ingesting unstructured data in its raw form (e.g., text and photos) and can automatically establish the hierarchy of characteristics that differentiate certain data categories from one another.

MACHINE LEARNING VS. DEEP LEARNING
POINT – 2
Deep learning and machine learning differ in how their respective algorithms learn.
MACHINE LEARNING
A "non-deep" model can be thought of as a shallow neural network, consisting of one input and one output layer, with barely one hidden layer.
DEEP LEARNING
▪ To qualify as deep learning, there have to be at least three layers;
▪ Neural networks truly constitute deep learning;
▪ "Deep" in deep learning REFERS TO A NEURAL NETWORK WITH MORE THAN THREE LAYERS, which includes inputs and outputs, and such a network may be termed a deep learning method;
▪ Deep learning may be thought of as "SCALABLE MACHINE LEARNING," as Lex Fridman stated in an MIT lecture.

MACHINE LEARNING VS. DEEP LEARNING
POINT – 3
MACHINE LEARNING
"Classical" or "non-deep" machine learning requires more human interaction to learn.
DEEP LEARNING
Deep learning does not require human interaction to interpret data.

TOPIC - 5
INTRODUCTION TO ROBOTIC PROCESS AUTOMATION
▪ With RPA, software users develop software robots or “bots” that are capable of

learning, simulating, and executing rules-based business processes;

▪ By studying human digital behaviours, RPA automation enables users to CONSTRUCT

BOTS;

▪ Robotic Process Automation software bots can communicate with any application or

system in the same manner that humans can, with the exception that RPA bots can

function continuously, around-the-clock, and with 100 percent accuracy and

dependability;

▪ Robotic Process Automation bots possess a digital skill set that exceeds that of

humans. Consider RPA bots to be a Digital Workforce capable of interacting with any

system or application.

BENEFITS OF RPA
1) Higher productivity;

2) Higher accuracy;

3) Saving of cost;

4) Better customer experience;

5) Scalability;

6) Harnessing AI.

TOPIC – 6
INTRODUCTION TO MACHINE LEARNING
▪ Machine learning (ML) is a branch of study devoted to developing systems that "LEARN" TO USE DATA FOR IMPROVING PERFORMANCE ON A SET OF TASKS.

▪ It is a critical component of Artificial Intelligence that GENERATES PREDICTIONS OR CONCLUSIONS WITHOUT BEING EXPLICITLY TAUGHT TO DO SO, i.e., programs that are capable of machine learning can complete tasks without being expressly designed to do so.

▪ It includes computers learning from available data in order to do certain jobs.

▪ Machine learning algorithms CONSTRUCT A MODEL BASED ON TRAINING DATA AND SAMPLE

DATA.

▪ Applications of Machine Learning includes medicine, email filtering, speech

recognition etc.

APPROACHES TOWARDS MACHINE LEARNING
On the basis of the type of “signal” or “feedback” provided to the learning
system, machine learning systems are generally categorized into five major
categories:
SUPERVISED LEARNING
▪ Supervised learning algorithms CONSTRUCT A MATHEMATICAL MODEL OF A DATASET.
▪ The DATASET INCLUDES inputs and expected outcomes; it CONSISTS of a collection of training examples and is known as TRAINING DATA.
▪ TRAINING DATA consists of one or more inputs and the expected output; the expected output is REFERRED to as a SUPERVISORY SIGNAL, and the data is typically REPRESENTED by a MATRIX.

WORKING
▪ Supervised learning algorithms discover a function that may be used to PREDICT THE OUTPUT ASSOCIATED WITH FRESH INPUTS.
▪ A function that is optimum will enable the algorithm to find the proper output for inputs that were not included in the training data.
▪ It is claimed that an ALGORITHM HAS "LEARNED" TO DO A TASK if it improves its outputs or predictions over time.

EXAMPLES
Active learning, classification and regression are examples of supervised-learning algorithms.
▪ CLASSIFICATION: Classification algorithms are used when the outputs are limited to a certain set of values; for example, a classification algorithm filters incoming emails for spam (a sketch follows this list).
▪ REGRESSION: Regression techniques are used when the outputs may take on any value within a given range.

USES
It has uses in:
▪ Face verification;
▪ Monitoring visual identities;
▪ Speaker verification;
▪ Ranking;
▪ Recommendation systems.
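A hedged sketch of the spam-filtering classification example, assuming scikit-learn and four made-up training emails:

```python
# Supervised classification sketch: labels are the supervisory signal.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free prize now", "meeting agenda attached",
          "free lottery win", "quarterly cost report attached"]
labels = ["spam", "ham", "spam", "ham"]  # expected outputs (supervisory signal)

vec = CountVectorizer().fit(emails)      # inputs represented as a matrix
model = MultinomialNB().fit(vec.transform(emails), labels)
print(model.predict(vec.transform(["free prize inside"])))  # -> ['spam']
```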

UNSUPERVISED LEARNING
▪ Unsupervised learning approaches utilize a dataset COMPRISING JUST INPUTS to IDENTIFY DATA STRUCTURE.
▪ EXAMPLES: GROUPING, CLUSTERING.
▪ Unsupervised learning algorithms are TAUGHT using unlabelled, unclassified and uncategorized test data; they IDENTIFY similarities in the data and RESPOND based on the presence or absence of such similarities in each new data set.

CLUSTERING
POINT – 1
Cluster analysis is the process of ASSIGNING a set of data to subsets (called clusters) based on one or more preset criteria, so that observations within the SAME CLUSTER are SIMILAR while observations obtained from OTHER CLUSTERS are DIFFERENT.
POINT – 2
Different clustering approaches necessitate varying assumptions regarding the structure of the data, which is frequently characterized by a similarity metric against which the clusters are evaluated.
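A hedged k-means sketch, assuming scikit-learn; the two groups in the made-up data are deliberately obvious:

```python
# Clustering sketch: group unlabelled points purely by similarity.
from sklearn.cluster import KMeans

points = [[1, 1], [1.2, 0.9], [0.8, 1.1], [8, 8], [8.2, 7.9], [7.9, 8.1]]
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(model.labels_)  # e.g. [0 0 0 1 1 1] - two clusters found without labels
```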
SEMI-SUPERVISED LEARNING
▪ Semi-supervised learning is intermediate BETWEEN UNSUPERVISED LEARNING (without labelled training data) AND SUPERVISED LEARNING (with completely labelled training data).
▪ Many machine-learning researchers have discovered that when UNLABELLED DATA IS COMBINED WITH A TINY QUANTITY OF LABELLED DATA, there is a significant gain in learning accuracy.

REINFORCEMENT LEARNING
1.) Reinforcement learning is a subfield of machine learning concerned with

determining: -

“How software AGENTS should operate in a given ENVIRONMENT so as to maximize certain

REWARD”.

2.) In machine learning, the environment is generally represented as a Markov decision process (MDP).

3.) Autonomous cars and learning to play a game against a human opponent both employ

reinforcement learning algorithms.

4.) Due to the field’s generic nature, it is explored in several different fields, including

game theory, control theory, operations research, information theory, simulation-

based optimization, multi-agent systems, swarm intelligence, statistics, and genetic

algorithms.
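A toy, hypothetical sketch of the Q-learning update at the heart of many reinforcement learning algorithms; the states, actions, and rewards are invented:

```python
# Q-learning on a tiny two-state MDP (everything here is made up).
import random

n_states, n_actions = 2, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma = 0.5, 0.9              # learning rate, discount factor

def step(state, action):
    # hypothetical environment: action 1 in state 0 pays a reward and moves on
    if state == 0 and action == 1:
        return 1, 1.0                # (next state, reward)
    return 0, 0.0

state = 0
for _ in range(100):
    action = random.randrange(n_actions)   # explore at random
    nxt, reward = step(state, action)
    Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
    state = nxt

print(Q)  # the agent learns that action 1 in state 0 maximises the reward
```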
DIMENSIONALITY REDUCTION
▪ PRINCIPAL COMPONENT ANALYSIS (PCA) is a well-known technique for dimensionality reduction.

▪ PCA includes transforming data with more dimensions (e.g., 3D) to a smaller space (e.g., 2D). This results in a decreased data dimension (2D as opposed to 3D) while retaining as much of the variation in the original data as possible.

▪ Dimensionality reduction is the process of ACQUIRING A SET OF MAJOR VARIABLES in order

to REDUCE THE NUMBER OF RANDOM VARIABLES under consideration.

▪ In other words, it is the PROCESS OF LOWERING THE SIZE OF THE FEATURE SET, which is also

referred to as the “number of features.”

▪ The majority of dimensionality reduction strategies may be categorized as either

deletion or extraction of features.
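A hedged PCA sketch, assuming scikit-learn and randomly generated, made-up data:

```python
# Dimensionality reduction sketch: project 3-D data down to 2-D with PCA.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 3)                    # 100 observations, 3 features (3D)
X_2d = PCA(n_components=2).fit_transform(X)   # keep the 2 main components
print(X_2d.shape)                             # (100, 2) - fewer features, same rows
```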

11.6 – MODEL VS DATA – DRIVEN DECISION MAKING

INTRODUCTION
▪ A crucial feature of Data science to keep in mind is that POOR DATA WILL NEVER RESULT

IN SUPERIOR PERFORMANCE, regardless of how strong your model is.

▪ In artificial intelligence, there are two schools of thought: data-driven and model-

driven.

▪ The data-driven strategy focuses on enhancing data quality in order to ENHANCE THE

PERFORMANCE OF A PARTICULAR PROBLEM STATEMENT. In contrast, the model-driven method

attempts to INCREASE PERFORMANCE BY DEVELOPING NEW MODELS and algorithmic

manipulations.

▪ In a perfect world, these should go hand in hand.
