Business Data Analytics
(CMA, B.Com)

Index

Module 8: Introduction to Data Science for Business Decision-Making
8.1 Meaning, Nature, Properties, Scope of Data
8.2 Types of Data in Finance and Costing
8.3 Digitization of Data and Information
8.4 Transformation of Data to Decision Relevant Information
8.5 Communication of Information for Quality Decision-making
8.6 Professional Scepticism Regarding Data
8.7 Ethical Use of Data and Information
Module 9: Data Processing, Organisation, Cleaning and Validation
9.1 Development of Data Processing
9.2 Functions of Data Processing
9.3 Data Organisation and Distribution
9.4 Data Cleaning and Validation
Module 10: Data Presentation: Visualisation and Graphical Presentation
10.1 Data Visualisation of Financial and Non-Financial Data
10.2 Objective and Function of Data Presentation
10.3 Data Presentation Architecture
10.4 Dashboard, Graphs, Diagrams, Tables, Report Design
10.5 Tools and Techniques of Visualisation and Graphical Presentation
Module 11: Data Analysis and Modelling
11.1 Process, Benefits and Types of Data Analysis
11.2 Data Mining and Implementation of Data Mining
11.3 Analytics and Model Building
11.4 Standards for Data Tagging and Reporting (XML, XBRL)
11.5 Cloud Computing, BI, AI, RPA and Machine Learning
11.6 Model vs. Data-driven Decision-making
Weightage: Modules 8 to 11 carry a weightage of 5% each.
Introduction to Data Science for Business Decision-Making
8.1 Meaning, Nature, Properties, Scope of Data
Data is a source of information, and information must be processed to generate knowledge. Any 'data' on its own does not convey any meaning. When that information is used to solve a problem, we speak of the use of knowledge.
Nature of Data
• Quantitative financial data: By the term 'quantitative data', we mean data expressed in numbers. A large share of the data available in finance is quantitative, e.g. stock price data and financial statements.
• Qualitative financial data: Some data in financial studies may, however, appear in a qualitative format, e.g. text, videos, audio etc. These types of data can be very useful for financial analysis, e.g. the 'management discussion and analysis' section presented as part of a company's annual report.
Types of data
(i) Nominal scale: The nominal scale is used for categorising data. Under this scale, observations are classified based on certain characteristics. The category labels may contain numbers, but these have no numerical value.
(ii) Ordinal scale: The ordinal scale is used to classify observations and put them in order. The numbers only indicate an order; they do not specify how much better or worse a stock at a given price is compared with one at a lower price.
(iii) Interval scale: The interval scale is used for categorising and ranking using equal intervals. Equal intervals separate neighbouring scale values, but because the scale's zero point is arbitrary, ratios cannot be calculated.
(iv) Ratio scale: The ratio scale possesses all the characteristics of the nominal, ordinal and interval scales. Data on a ratio scale can not only be classified and ranked but also have equal intervals. A ratio scale has a true zero, meaning that zero has a significant value.
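To make the four measurement scales concrete, here is a minimal Python sketch. The use of pandas and the sample values (sectors, credit ratings, temperatures, revenue figures) are illustrative assumptions, not part of the syllabus text.

```python
import pandas as pd

# Nominal: category labels with no order (e.g. industry sector)
sector = pd.Categorical(["Banking", "IT", "Pharma", "IT"], ordered=False)

# Ordinal: ordered categories with no fixed spacing (e.g. credit ratings)
rating = pd.Categorical(["AA", "A", "AAA", "A"],
                        categories=["A", "AA", "AAA"], ordered=True)

# Interval: equal intervals but an arbitrary zero (e.g. temperature in Celsius);
# differences are meaningful, ratios are not
temperature_c = pd.Series([10.0, 20.0, 30.0])

# Ratio: a true zero, so ratios are meaningful (e.g. revenue figures)
revenue = pd.Series([0.0, 50.0, 100.0])
print(revenue[2] / revenue[1])   # 2.0 -- a valid "twice as much" statement
```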
8.3 Digitization of Data and Information
Digitization of data and information serves two broad purposes:
i. To provide widespread access to data and information for a very large group of users simultaneously.
ii. To preserve data for a longer period.
Why do we digitize?
• Improves classification and indexing of documents, which helps in the retrieval of records.
• Digitized records may be accessed by more than one person simultaneously.
• It becomes easier to reuse data that are difficult to reuse in their present format, e.g. very large maps or data recorded on microfilm.
• Helps in work processing.
• Higher integration with business information systems.
• Easier to keep back-up files and retrieve them during any unexpected disaster.
• Can be accessed from multiple locations through networked systems.
• Increased scope for improving organisational productivity.
• Requires less physical storage space.
How do we digitize?
Digitization is typically carried out in six phases, beginning with the identification of the factors relevant to the project.
Phase 2: Assessment
In any institution, not all records are digitized. The data that requires digitization is decided on the basis of content and context. Some data may be digitized in a consolidated format and some in a detailed format. The files, tables, documents, expected future use, etc. are assessed and evaluated at this stage. The hardware and software requirements for digitization are also assessed, and the human resources required for executing the digitization project are planned. A risk assessment, e.g. of the possibility of natural disasters and/or cyber attacks, also needs to be completed at this level.
Phase 3: Planning
Successful execution of a digitization project needs meticulous planning. The institution may decide to complete the digitization in-house or through an outsourced agency. It may also be done on demand or in batches.
Upon completion of the assessment and planning phases, the digitization activities start.
Once the digitization of records is complete, a few additional requirements arise which relate to the administration of records. Permission for accessing the data, intellectual control (over the data), classification (if necessary), and the upkeep and maintenance of the data are further requirements of data management.
Phase 6: Evaluation
Once the digitization project is implemented, the final phase should be a systematic determination of the project's merit, worth and significance using objective criteria. The primary purpose is to enable reflection and help identify changes that would improve future digitization processes.
8.4 Transformation of Data to Decision Relevant Information
The transformation of raw data into decision-relevant information typically involves the following steps:
1. Collection of data: The collection of data may be done with standardized systems in place. Appropriate software and hardware may be used for this purpose. Appointing trained staff also plays an important role in collecting accurate and relevant data.
2. Organising the data: The raw data needs to be organised in an appropriate manner to generate relevant information. The data may be grouped and arranged in a manner that creates useful information for the target user groups.
3. Data processing: At this step, data needs to be cleaned to remove unnecessary elements. If any data point is missing or not available, that also needs to be addressed, and the presentation format for the data needs to be decided.
4. Integration of data: Data integration is the process of combining data from various sources into a single, unified form. This step includes the creation of data network sources, a master server, and user access to the data from the master server. Data integration eventually enables analytics tools to produce effective, actionable business intelligence (a brief illustration follows this list).
5. Data reporting: The data reporting stage involves translating the data into a consumable format to make it accessible to users.
6. Data utilization: At this final step, data is utilised to support corporate activities and enhance operational efficiency and productivity for the growth of the business. This makes corporate decision-making truly 'data driven'.
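The integration and reporting steps above can be illustrated with a small, hypothetical pandas sketch; the table names and columns are invented for the example.

```python
import pandas as pd

# Two hypothetical sources: sales records and a customer master
sales = pd.DataFrame({"cust_id": [1, 2, 1], "amount": [1200, 800, 400]})
customers = pd.DataFrame({"cust_id": [1, 2], "region": ["East", "West"]})

# Integration: combine the sources into a single, unified view
unified = sales.merge(customers, on="cust_id", how="left")

# Reporting: translate the unified data into a consumable summary for users
report = unified.groupby("region")["amount"].sum().reset_index()
print(report)
```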
8.5 Communication of Information for Quality Decision-making
By transforming information into a process for quality decision-making, the firm should acquire the following abilities:
1. Logically understand wide-ranging structured and unstructured data and apply that information to corporate planning, budgeting, forecasting and decision support.
2. Predict outcomes more effectively than conventional forecasting techniques based on historical financial reports.
3. Spot emerging opportunities, and also capability gaps, in real time.
4. Make strategies for responding to uncertain events, such as market volatility and 'black swan' events, through simulation.
5. Diagnose, filter and extract value from financial and operational information for making better business decisions.
6. Recognise viable advantages to serve customers in a better manner.
7. Identify possible frauds on the basis of data analytics.
8. Build impressive and useful dashboards to measure and demonstrate success, leading to effective strategies.
8.6 Professional Scepticism Regarding Data
Professional scepticism is an important focus area for practitioners, researchers, regulators and standard setters. At the same time, professional scepticism may result in additional costs, e.g. strained client relationships and budget overruns. Under such circumstances, it is important to identify and understand the conditions in which finance and audit professionals should apply professional scepticism.
8.7 Ethical Use of Data and Information
1. Regarding ownership: The first principle is that ownership of any personal information belongs to the person. It is unlawful and unethical to collect someone's personal data without their consent. It is always advisable to ask for permission beforehand to avoid future legal and ethical complications.
2. Regarding transparency: The objective with which the company is collecting a user's data should be known to the user. While collecting financial data from clients, the purpose for which the data will be used should be clearly stated.
3. Regarding privacy: Even if the user allows the company to collect, store and analyse personally identifiable information (PII), that does not imply it should be made publicly available. For companies, it is mandatory to publish some financial information to the public, e.g. through annual reports. However, there may be much confidential information which, if it falls into the wrong hands, may create problems and financial loss. To protect the privacy of data, a data security process should be in place. This may include file encryption, dual-authentication passwords, etc.
4. Regarding intention: The intention of data analysis should never be to make profits out of others' weaknesses or to hurt others. Collecting data that is unnecessary for the analysis is unethical and should be avoided.
5. Regarding outcomes: In some cases, even if the intentions are good, the result of data analysis may inadvertently hurt the clients and data providers. This is called disparate impact, and it is unethical.
Data Processing, Organisation, Cleaning and Validation
9.1 Development of Data Processing
Data processing (DP) has developed through three broad stages:
1. Manual DP: Manual DP involves processing data without much assistance from machines. Prior to the phase of mechanical DP, only small-scale data processing was possible using manual effort.
2. Mechanical DP: Mechanical DP processes data using mechanical tools and technologies rather than modern computers.
3. Electronic DP: Data is processed electronically using computers and other cutting-edge electronic devices.
1. Risk analytics: Business inevitably involves risk, particularly in the financial industry. It is crucial to determine the risk factor before making any decisions. Once a risk has been recognised, it may be prioritised and its recurrence closely watched.
2. Real-time analytics: With modern advancements, businesses can provide the optimal user experience and respond quickly to consumer interactions. With real-time analysis, there are no delays in establishing a customer's worth to an organisation, and credit ratings and transactions are far more precise.
3. Customer data management: Data science enables effective management of client data.
Using methods such as text analytics, data mining, and natural language processing, data
science is well equipped to deal with massive volumes of unstructured new data.
Consequently, despite the fact that data availability has been enhanced, data science implies
that a company’s analytical capabilities may also be upgraded, leading to a greater
understanding of market patterns and client behaviour.
4. Consumer Analytics: It is as important to ensure that each client receives a customised service
as it is to process their data swiftly and efficiently, without time-intensive individualised
analysis.
5. Customer segmentation: Customers are frequently segmented based on socioeconomic factors, such as geography, age, and buying patterns.
6. Personalized services: Major organisations strive to provide customised service to their
consumers as a method of enhancing their reputation and increasing customer lifetime value.
This is also true for businesses in the finance sector.
7. Advanced customer service: Data science’s capacity to give superior customer service goes
hand in hand with its ability to provide customised services. As client interactions may be
evaluated in real-time, more effective recommendations can be offered to the customer care
agent managing the customer’s case throughout the conversation.
8. Predictive Analytics: Predictive analytics enables organisations in the financial sector to
extrapolate from existing data and anticipate what may occur in the future, including how
patterns may evolve. When prediction is necessary, machine learning is utilised. Using
machine learning techniques, pre-processed data may be input into the system in order for it
to learn how to anticipate future occurrences accurately.
9. Fraud detection: With a rise in financial transactions, the risk of fraud also increases. Tracking incidents of fraud, such as identity theft and credit card scams, and limiting the resulting harm is a primary responsibility of financial institutions. As the technologies used to analyse big data become more sophisticated, so does their capacity to detect fraud early on.
10. Anomaly detection: Financial services have long placed a premium on detecting abnormalities
in a customer’s bank account activities, partly because anomalies are only proved to be
anomalous after the event happens. Although data science can provide real-time insights, it
cannot anticipate singular incidents of credit card fraud or identity theft.
11. Algorithmic trading: Algorithmic trading is one of the key uses of data science in finance. It happens when an unsupervised computer, using the intelligence supplied by an algorithm, trades on the stock market based on the algorithm's suggestions. As a consequence, it eliminates the risk of loss caused by indecision and human error. (A brief anomaly-detection sketch relevant to items 9 and 10 follows this list.)
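As a rough illustration of the fraud and anomaly detection ideas in items 9 and 10, the following sketch flags an unusually large transaction using scikit-learn's IsolationForest. The transaction amounts and the contamination setting are assumptions chosen for demonstration only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical transaction amounts; the last one is an unusually large payment
amounts = np.array([[120.0], [95.0], [130.0], [110.0], [105.0], [9800.0]])

# IsolationForest labels observations that are easy to isolate as anomalies (-1)
model = IsolationForest(contamination=0.2, random_state=0).fit(amounts)
print(model.predict(amounts))   # 1 = normal, -1 = potential anomaly to investigate
```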
The principal functions of data processing include the following:
1. Validation: Data validation may be defined as 'an activity aimed at verifying whether the value of a data item comes from the given (finite or infinite) set of acceptable values'. Operationally, it is a process which ensures the correspondence of the final (published) data with a number of quality characteristics. Data validation is thus a decision-making process that leads to the acceptance or rejection of data.
2. Sorting: Data sorting is any procedure that organises data into a meaningful order to make it simpler to comprehend, analyse and visualise. Sorting is a typical strategy for presenting research data in a manner that facilitates comprehension of the story being told by the data. Sorting is also frequently used to rank or prioritise records, and sorting functions are simple to understand and apply.
3. Aggregation: Data aggregation refers to any process in which data is collected and summarised. When data is aggregated, individual data rows, which are often compiled from several sources, are replaced with summaries or totals; groups of observed values are replaced with statistical summaries based on those observations. A data warehouse often contains aggregate data, since it can answer analytical inquiries and drastically cut the time required to query massive data sets. (A short pandas sketch after this list illustrates sorting, aggregation and classification.)
4. Analysis: Data analysis is described as the process of cleaning, converting, and modelling data
to obtain actionable business intelligence. The objective of data analysis is to extract relevant
information from data and make decisions based on this knowledge.
5. Reporting: Data reporting is the act of gathering and structuring raw data and turning it into
a consumable format in order to evaluate the organisation’s continuous performance. The
data reports can provide answers to fundamental inquiries regarding the status of the firm.
This gives an up-to-date record of the company’s financial health or a portion of the finances
6. Classification: Data classification is the process of classifying data according to important
categories so that it may be utilised and safeguarded more effectively. The categorization
process makes data easier to identify and access on a fundamental level. Regarding risk management, compliance, and data security, the classification of data is of special relevance. Classifying data entails labelling it to make it searchable and trackable. It also avoids multiple duplications of data, which can minimise storage and backup expenses and accelerate the search procedure. It is standard practice to divide data and systems into three risk categories.
i. Low risk: If data is accessible to the public and recovery is simple, then this data
collection and the mechanisms around it pose a smaller risk than others.
ii. Moderate risk: Essentially, this is non-public or internal (to a business or its partners) data. However, it is unlikely to be mission-critical or sensitive enough to be considered "high risk". The intermediate category may include proprietary operating processes, cost of products, and certain corporate paperwork.
iii. High risk: Anything even vaguely sensitive or critical to operational security falls under
the category of high risk. Additionally, data that is incredibly difficult to retrieve (if lost).
All secret, sensitive, and essential data falls under the category of high risk.
The following steps help in putting data classification into practice:
1. Understanding the current setup: Taking a comprehensive look at where the organisation's data currently resides and at any applicable legislation is likely the best starting point for successfully classifying data. Before one classifies data, one must know what data one has.
2. Creation of a data classification policy: Without an adequate policy, maintaining compliance with data protection standards in an organisation is practically difficult. Creating a policy should therefore be the first priority.
3. Prioritise and organise data: Now that a data classification policy is in place, it is time to categorise the data. The optimal method for tagging the data should be chosen based on its sensitivity and privacy.
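The following pandas sketch, built on invented cost records, illustrates the sorting, aggregation and classification functions described above; it is an illustration under assumed data, not a prescribed procedure.

```python
import pandas as pd

# Hypothetical cost records from two departments
costs = pd.DataFrame({"department": ["Prod", "Sales", "Prod", "Sales"],
                      "amount": [500, 300, 700, 200]})

# Sorting: arrange the records in a meaningful order
ranked = costs.sort_values("amount", ascending=False)

# Aggregation: replace individual rows with group summaries
totals = costs.groupby("department")["amount"].agg(["sum", "mean"])

# Classification: label each record with a simple category
costs["category"] = pd.cut(costs["amount"],
                           bins=[0, 300, 600, float("inf")],
                           labels=["low", "moderate", "high"])
print(ranked, totals, costs, sep="\n\n")
```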
9.3 Data Organisation and Distribution
Data organisation is the classification of unstructured data into distinct groups. This raw data comprises observations of variables. Data organisation allows us to arrange data in a manner that is easy to understand and manipulate; raw data is otherwise challenging to deal with or analyse.
Data distribution
Data distribution is a function that identifies and quantifies all potential values for a variable, as well
as their relative frequency (probability of how often they occur). Any population with dispersed data
is categorised as a distribution. It is necessary to establish the population’s distribution type in order
to analyse it using the appropriate statistical procedures.
Types of distribution
Distributions are basically classified based on the type of data:
1. Discrete distributions: A discrete distribution results from countable data and has a finite number of potential values. Discrete distributions may be displayed in tables, and the values of the random variable can be counted. Example: rolling dice, counting a specific number of heads, etc.
i. Binomial distribution: The binomial distribution quantifies the chance of obtaining a specific number of successes or failures in a fixed number of trials. It applies to attributes that are categorised into two mutually exclusive and exhaustive classes, such as the number of successes/failures or the number of acceptances/rejections.
Example: When a fair coin is tossed ten times, the number of heads follows a binomial distribution; on each toss the likelihood of a head is one-half and the likelihood of a tail is one-half.
ii. Poisson distribution: The Poisson distribution is the discrete probability distribution that quantifies the chance of a certain number of events occurring in a given time period, where the events occur at a known average rate and independently of one another. It applies to attributes that can potentially take on huge values but in practice take on small ones.
Example: the number of customer complaints received by a branch office per day.
2. Continuous distributions: A continuous distribution has an unlimited number of (variable) data points that may be represented on a continuous measurement scale. A continuous random variable is a random variable with an unlimited and uncountable set of potential values. It is more than a simple count and is often described using a probability density function (pdf). The probability density function describes the characteristics of a random variable; the frequencies are typically clustered around a central value, and the pdf can be viewed as the distribution's "shape".
i. Normal distribution: The Gaussian distribution is another name for the normal distribution. It is a bell-shaped curve with a greater frequency (probability density) around the central point. As values move away from the central value on each side, the frequency drops sharply.
ii. Lognormal distribution: A continuous random variable x follows a lognormal distribution if the distribution of its natural logarithm, ln(x), is normal. (By the central limit theorem, as the sample size rises the distribution of the sum of random variables approaches a normal distribution, independent of the distribution of the individual variables.)
iii. F distribution: The F distribution is often employed to examine the equality of variances between two normal populations. It is an asymmetric distribution with no maximum value and a minimum value of 0.
iv. Chi square distributions: When independent variables with standard normal distribution are
squared and added, the chi square distribution occurs.
v. Exponential distribution: The exponential distribution is a probability distribution and one of
the most often employed continuous distributions. Used frequently to represent products
with a consistent failure rate.
vi. Student's t distribution: The t distribution, or Student's t distribution, is a probability distribution with a bell shape that is symmetrical about its mean. It is used frequently for testing hypotheses and building confidence intervals for means, and is substituted for the normal distribution when the population standard deviation is unknown.
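The distributions listed above can be explored numerically with scipy.stats; the parameter values below are arbitrary examples chosen for illustration, not values taken from the text.

```python
from scipy import stats

# Discrete distributions
print(stats.binom.pmf(k=3, n=10, p=0.5))   # P(exactly 3 heads in 10 fair coin tosses)
print(stats.poisson.pmf(k=2, mu=4))        # P(2 events when 4 are expected per period)

# Continuous distributions
print(stats.norm.cdf(1.96))                # ~0.975 for the standard normal
print(stats.lognorm.mean(s=0.5))           # mean of a lognormal with shape parameter 0.5
print(stats.expon.sf(2, scale=1))          # P(X > 2) for an exponential with mean 1
print(stats.t.ppf(0.975, df=20))           # t critical value for a 95% confidence interval
```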
9.4 Data Cleaning and Validation
Data cleaning is the process of correcting or deleting inaccurate, corrupted, improperly formatted, duplicate, or incomplete data from a dataset. When several data sources are combined, there are numerous chances for data duplication and mislabelling. It is essential to build a template for the data cleaning process so that you can be certain you are always performing the steps correctly.
Eliminate unnecessary observations from the dataset, such as duplicate or irrelevant observations. Most duplicate observations arise during data collection, and de-duplication is one of the most important considerations in this procedure. Observations are deemed irrelevant when they do not pertain to the specific topic you are attempting to study. Removing them makes analysis more effective, reduces distraction from the core objective, and produces a more manageable dataset.
When measuring or transferring data, you may detect unusual naming conventions, typos, or incorrect capitalisation. These inconsistencies may lead to mislabelled classes or groups.
Occasionally, you will encounter observations that, at first look, do not appear to fit within the data you are evaluating. If you have a valid reason to eliminate an outlier, such as erroneous data entry, doing so will improve the quality of the data you are analysing. Occasionally, though, the presence of an outlier will support a hypothesis you are working on. Remember that the existence of an outlier does not imply that it is erroneous; this step is required to validate whether the value is genuine. Consider deleting an outlier only if it appears to be unrelated to the analysis or to be an error.
Many algorithms do not accept missing values, hence missing data cannot be ignored. There are two main approaches to handling missing data; neither is ideal, but both should be considered.
As the first alternative, the observations with missing values may be dropped, but doing so may result in the loss of information. This should be kept in mind before doing so.
As a second alternative, the missing numbers may be entered based on other observations. Again,
there is a chance that the data’s integrity may be compromised, as action may be based on
assumptions rather than real observations.
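A minimal pandas sketch of the cleaning steps discussed above is given below; the invoice data, the median imputation and the IQR outlier rule are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({"invoice": [101, 101, 102, 103, 104],
                    "amount": [250.0, 250.0, np.nan, 300.0, 90000.0]})

clean = raw.drop_duplicates()                        # remove duplicate observations

dropped = clean.dropna(subset=["amount"])            # option 1: drop missing values
imputed = clean.fillna({"amount": clean["amount"].median()})   # option 2: impute them

# Flag potential outliers (IQR rule) for review rather than deleting them blindly
q1, q3 = imputed["amount"].quantile([0.25, 0.75])
imputed["possible_outlier"] = imputed["amount"] > q3 + 1.5 * (q3 - q1)
print(imputed)
```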
As part of basic validation, one should be able to answer the following questions at the conclusion of the data cleaning process:
(a) Does the data make sense?
(b) Does the data adhere to the rules applicable to its field?
(c) Does it verify or contradict your working hypothesis, or does it shed any light on it?
(d) Can data patterns assist you in formulating your next theory?
False assumptions based on inaccurate or "dirty" data can lead to ineffective company strategies and decisions. The quality of cleaned data can be judged against the following characteristics:
(i) Validity
(ii) Accuracy
(iii) Completeness
(iv) Consistency
Data validation
Data validation is a crucial component of any data management process. If the initial data is not valid,
the outcomes will not be accurate either. It is therefore vital to check and validate data before using
it.
1. Data type check: A data type check verifies that the entered data has the appropriate data
type.
2. Code check: A code check verifies that a field’s value is picked from a legitimate set of options
or that it adheres to specific formatting requirements.
3. Range check: A range check determines whether or not input data falls within a specified range. For instance, latitude and longitude values in geographic data must fall within fixed ranges.
4. Format check: Numerous data types adhere to a set format, for example date columns stored in a fixed format. A data validation technique that ensures dates are in the correct format contributes to data and temporal consistency.
5. Consistency check: A consistency check is a form of logical check that verifies that the data has been entered in a consistent manner. Checking whether a package's delivery date is later than its shipment date is one example.
6. Uniqueness check: A uniqueness check guarantees that an item is not entered into a database multiple times.
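The checks listed above can be scripted; the sketch below applies a few of them to an invented orders table, so the column names and valid ranges are assumptions for illustration.

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 2],
    "status": ["SHIPPED", "PENDING", "UNKNOWN"],
    "ship_date": ["2023-01-05", "2023-01-07", "2023-01-09"],
    "delivery_date": ["2023-01-08", "2023-01-06", "2023-01-12"],
    "latitude": [22.57, 91.00, 19.07],
})

ship = pd.to_datetime(orders["ship_date"], errors="coerce")          # format check
delivery = pd.to_datetime(orders["delivery_date"], errors="coerce")

checks = pd.DataFrame({
    "code": orders["status"].isin({"SHIPPED", "PENDING", "CANCELLED"}),  # code check
    "range": orders["latitude"].between(-90, 90),                        # range check
    "consistency": delivery >= ship,                                     # consistency check
    "uniqueness": ~orders["order_id"].duplicated(keep=False),            # uniqueness check
})
print(checks)
```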
Data Presentation: Visualisation and Graphical Presentation
Finance professionals who are investigating how data visualisation might help their analytics efforts
and communication should keep the following in mind:
• Know the objective: First establish whether the information is conceptual or data-driven (i.e. whether it relies on qualitative or quantitative data), and specify whether the objective is exploratory or declarative. Answering these questions helps determine the tools and formats required.
• Always keep the audience in mind: Who views the data visualisations will determine the
degree of detail required.
• Invest in the best technology: The firm should first implement an ERP that removes data silos and develops a centralised information repository. Then, look for tools that allow users to instantly display data by dragging and dropping assets, charts, and graphs; that offer search options and guided navigation to assist in answering queries; and that enable any member of the finance team to generate graphics.
• Improve the team's ability to visualise data: Find ways to incorporate user training on data visualisation tools, so that staff are aware of the options the technology affords. Additionally, when recruiting, look for individuals with proficiency in data analytics and extensive data visualisation experience.
According to an article published by the Harvard Business Review (HBR), data visualisations most often fail because the analyst overlooks a few essential points.
Before incorporating the data into a visualisation, the objective should be fixed: to present large volumes of information in a way that decision-makers can readily digest. A great visualisation relies on the designer comprehending the intended audience and executing on three essential points:
1. Understanding the audience
i. Who will read and understand the material, and how will they do so? Can it be presumed that the audience understands the words and ideas employed, or is there a need to provide visual cues (e.g. a green arrow indicating that an increase is good)? A specialist audience will have different expectations than the broader public.
ii. What are the expectations of the audience, and what information is most beneficial to them?
iii. What is the functional role of the visualisation, and how may users take action based on it?
A visualisation that is exploratory should leave viewers with questions to investigate, but
visualisations that are instructional or confirmatory should not.
2. Setting up a clear framework
The designer must guarantee that all viewers have the same understanding of what the visualisation
represents. To do this, the designer must establish a framework consisting of the semantics and syntax
within which the data information is intended to be understood. The semantics pertain to the meaning
of the words and images employed, whereas the syntax is concerned with the form of the
communication. For instance, when utilising an icon, the element should resemble the object it
symbolises, with size, colour, and placement all conveying significance to the viewer.
Ensure that the data is clean and that the analyst understands its peculiarities before doing anything
else.
3. Telling a story
Storytelling assists the audience in gaining understanding from facts. Information visualisation is a
technique that turns data and knowledge into a form that is perceivable by the human visual system.
The objective is to enable the audience to see, comprehend, and interpret the information. Design
strategies that favour specific interpretations in visuals that “tell a narrative” can have a substantial
impact on the interpretation of the end user.
10.3 Data Presentation Architecture
The scope of data presentation architecture (DPA) includes:
(i) Defining significant meaning (relevant information) required by each audience member in
every scenario.
(ii) Obtaining the proper data (focus area, historic reach, extensiveness, level of detail, etc.)
(iii) Determining the needed frequency of data refreshes (the currency of the data)
(iv) Determining the optimal presentation moment (how frequently the user needs to view the data)
(v) Using suitable analysis, categorization, visualisation, and other display styles
(vi) Developing appropriate delivery techniques for each audience member based on their job,
duties, locations, and technological accesses
10.4 Dashboard, Graphs, Diagrams, Tables, Report Design
A data visualisation dashboard is an interactive dashboard that enables users to manage important metrics across numerous financial channels, visualise the data points, and generate reports for customers that summarise the results.
i. Bar Chart: It may be used to easily compare data across categories, highlight discrepancies,
demonstrate trends and outliers, and illustrate historical highs and lows. Bar graphs are very useful
when the data can be divided into distinct categories
ii. Line chart: The line chart or line graph joins various data points, displaying them as a continuous
progression. Utilize line charts to observe trends in data, often over time.
iii. Pie Chart: A pie chart (or circle chart) is a circular graphical representation of statistical data that is
segmented to demonstrate numerical proportion. In a pie chart, the arc length of each slice (and, by
extension, its centre angle and area) is proportionate to the value it depicts
iv. Scatter plots: Scatter plots are a useful tool for examining the connection between many variables,
revealing whether one variable is a good predictor of another or whether they tend to vary
independently. A scatter plot displays several unique data points on a single graph.
v. Bubble chart: Although bubbles are not exactly their own sort of visualisation, utilising them as a
method enhances scatter plots and maps that illustrate the link between three or more variables. By
varying the size and colour of circles, charts display enormous amounts of data in an aesthetically
engaging manner
vi. Histogram: Histograms illustrate the distribution of the data among various groups. Histograms divide
data into discrete categories (sometimes known as “bins”) and provide a bar proportionate to the
number of entries inside each category
vii. Map: For displaying any type of location data, including postal codes, state abbreviations, country
names, and custom geocoding, maps are a no-brainer. If the data is related with geographic
information, maps are a simple and effective approach to illustrate the relationship.
viii. Density map: Density maps indicate patterns or relative concentrations that might otherwise be
obscured by overlapping marks on a map, allowing to identify areas with a larger or lesser number of
data points
ix. Gantt Chart: Gantt charts represent a project’s timeline or activity changes across time. A Gantt chart
depicts tasks that must be accomplished before others may begin, as well as the allocation of
resources.
Tables
Tables, often known as “crosstabs” or “matrices,” emphasise individual values above aesthetic
formatting. They are one of the most prevalent methods for showing data and, thus, one of the most
essential methods for analysing data
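The chart types described above can be produced with any charting tool; the following matplotlib sketch, using invented quarterly figures, shows a bar, line, pie and scatter plot side by side.

```python
import matplotlib.pyplot as plt

quarters = ["Q1", "Q2", "Q3", "Q4"]
revenue = [120, 150, 140, 180]        # hypothetical figures
costs = [90, 100, 95, 110]

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

axes[0, 0].bar(quarters, revenue)                 # bar chart: compare categories
axes[0, 0].set_title("Revenue by quarter")

axes[0, 1].plot(quarters, revenue, marker="o")    # line chart: trend over time
axes[0, 1].set_title("Revenue trend")

axes[1, 0].pie(revenue, labels=quarters, autopct="%1.0f%%")   # pie chart: proportions
axes[1, 0].set_title("Revenue share")

axes[1, 1].scatter(costs, revenue)                # scatter plot: relationship of variables
axes[1, 1].set_title("Costs vs revenue")

plt.tight_layout()
plt.show()
```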
How to use data Visualisation in report design?
➢ Find a story in the data: Data-driven storytelling is a powerful tool. Finding a story that
connects with the reader can help to create an effective report.
➢ Create a narrative: When some individuals hear the term "data storytelling", they believe it consists of presenting a few statistics and the task is complete. This is a frequent misconception. Strong data storytelling comprises an engaging narrative that takes the audience through the facts and aids their comprehension, together with an explanation of why the insights are significant. To compose an excellent story, one must:
(i) Engage the viewer with a catchy title and subheadings.
(ii) Incorporate context into the data.
(iii) Create a consistent and logical flow.
(iv) Highlight significant discoveries and insights from the data.
➢ Choose the most suitable data visualisation: Data visualisation is not limited to the creation of charts and graphs. It involves presenting the facts in the most comprehensible chart possible. Applying basic design principles and utilising features such as form, size, colour, and labelling may have a significant impact on how people comprehend the data.
➢ Follow the visual language: It is essential to adhere to data visualisation principles in order to
achieve both uniformity and comprehension. A strategic methodology assists in
implementation.
➢ Publicize the report: Some reports are not intended for public consumption. However, since
they include so much essential information, they may contain knowledge that is of interest to
individuals or media outside of the business.
Data Analysis and Modelling
11.1 Process, Benefits and Types of Data Analysis
Data may be segmented by a variety of parameters, including age, population, income, and sex. The data values might be either numeric or categorical.
Data may be gathered from several sources, including internet sources, computers, personnel, and
community sources.
After collecting the data, it must be arranged so that it can be analysed. Statistical data can be
organised on a spreadsheet or other programme capable of handling statistical data.
The data is initially cleansed to verify that there are no duplicates or errors. The document is then
examined to ensure that it is comprehensive. Before data is sent to a data analyst for analysis, it is
beneficial to rectify or eliminate any errors by cleaning the data.
There are four types of data analytics: (i) descriptive analytics, (ii) diagnostic analytics, (iii) predictive analytics and (iv) prescriptive analytics.
Companies can use the information gained from data analytics to base their decisions, resulting in
enhanced outcomes. Using data analytics significantly reduces the amount of guesswork involved in
preparing marketing plans, deciding what materials to produce, and more.
Data analytics assists firms in streamlining their processes, conserving resources, and increasing their
profitability. When firms have a better understanding of their audience’s demands, they spend less
time creating advertising that do not fulfil those needs.
Data analytics gives organisations a more in-depth understanding of their customers, employees and other stakeholders.
11.2 Data Mining and Implementation of Data Mining
Data mining, also known as knowledge discovery in data (KDD), is the extraction of patterns and other
useful information from massive data sets.
(i) Setting the business objectives: Data scientists and business stakeholders must identify the business challenge, which informs the data queries and parameters for a specific project. Analysts may also need to conduct further study to adequately comprehend the business environment.
(ii) Preparation of data: Once the scope of the problem has been established, it is simpler for data scientists to determine which collection of data will assist the company in answering crucial questions. Once the pertinent data has been collected, it is cleansed by eliminating any noise, such as repetitions, missing values and outliers.
(iii) Model building and pattern mining: Data scientists may study any intriguing relationships in the data, such as frequent patterns, clusters or correlations, depending on the sort of research. While high-frequency patterns have wider applicability, deviations in the data can often be more interesting, exposing possible areas of fraud. Depending on the available data, deep learning algorithms may also be utilised to categorise or cluster a data collection.
(iv) Evaluation of results and implementation of knowledge: After aggregating the data, the findings must be analysed and interpreted. Final results should be valid, original, practical, and comprehensible. When this criterion is satisfied, companies can execute new strategies based on this understanding and thereby attain their intended goals.
Using various methods and approaches, data mining transforms vast quantities of data into valuable
information. Here are a few of the most prevalent:
An association rule is a rule-based technique for discovering associations between variables inside a
given dataset. These methodologies are commonly employed for market basket analysis, enabling
businesses to better comprehend the linkages between various items. Understanding client
consumption patterns helps organisations to create more effective cross-selling tactics and
recommendation engines.
Primarily utilised for deep learning algorithms, neural networks replicate the interconnection of the
human brain through layers of nodes to process training data. Every node has inputs, weights, a bias
(or threshold), as well as an output. If the output value exceeds a predetermined threshold, the node
"fires" and passes data to the subsequent network layer. Neural networks acquire this mapping function through supervised learning and gradient descent, adjusting the weights based on the loss function. When the cost function is zero or close to it, we may have confidence in the model's ability to produce the correct answer.
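To make the description of a single node concrete, here is a small numpy sketch of one node's forward pass; the input values, weights and bias are arbitrary, and a real network learns its weights rather than fixing them by hand.

```python
import numpy as np

def node_fires(inputs, weights, bias, threshold=0.0):
    """One node: weighted sum of inputs plus bias, compared against a threshold."""
    output = np.dot(inputs, weights) + bias
    return output, output > threshold   # the node "fires" if output exceeds the threshold

inputs = np.array([0.5, 0.8])       # hypothetical feature values
weights = np.array([0.4, -0.2])     # weights a trained network would have learned
print(node_fires(inputs, weights, bias=0.1))   # output is roughly 0.14, so the node fires
```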
Using classification or regression algorithms, the decision tree methodology classifies or predicts likely outcomes based on a collection of decisions. As its name implies, it employs a tree-like representation to depict the potential results of these decisions.
K-nearest neighbour, also known as the KNN algorithm, classifies data points based on their proximity to, and correlation with, other available data. The technique assumes that similar data points lie close to one another; it therefore measures the distance between data points, usually by Euclidean distance, and then assigns each point the most common category (or the average value) of its nearest neighbours.
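A brief scikit-learn sketch of KNN classification follows; the applicant features and labels are invented purely to show the mechanics.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: [income, outstanding debt] -> repayment label
X_train = [[12, 1], [15, 2], [4, 6], [5, 7], [14, 1], [3, 8]]
y_train = ["no_default", "no_default", "default", "default", "no_default", "default"]

knn = KNeighborsClassifier(n_neighbors=3)   # Euclidean distance by default
knn.fit(X_train, y_train)

# New applicants are assigned the most common label among their 3 nearest neighbours
print(knn.predict([[13, 2], [4, 7]]))
```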
Descriptive analytics: Focuses on what has happened. It organises and summarises historical data to provide insights into past performance and to track patterns and trends. The process involves data aggregation, preparation, analysis, and presentation, often through visual tools such as graphs and charts. It is the simplest form of analytics and serves as a foundation for more advanced types of analysis; while it offers insight into what has happened, it does not provide conclusions or predictions.
Diagnostic analytics: Explores why something happened. It goes beyond descriptive analysis and digs deeper into the data to identify the reasons for particular outcomes, using techniques such as data discovery, drill-down, data mining and correlation analysis. It is often used when businesses need to understand the underlying factors driving changes in performance. Diagnostic analytics develops solutions that may be used to discover answers to data-related problems and to communicate insights within the organisation; it enables value to be derived from the data by asking the relevant questions and doing in-depth analyses of the responses.
Predictive analytics: Forecasts what might happen in the future. By examining past trends, predictive analytics applies techniques such as data mining, machine learning and statistical modelling to historical data in order to estimate potential future outcomes, giving businesses the foresight to anticipate trends and customer behaviour. It serves as a crucial tool for forecasting probable future occurrences and informing future corporate strategy. Examples include forecasting inventory needs or predicting customer churn.
Prescriptive analytics: Advises what actions should be taken. This is the most advanced form of analytics, offering recommendations based on predictive models and data analysis. It often uses machine learning and complex algorithms to evaluate various future scenarios and suggest the decisions that optimise business performance. Applications range from route optimisation in GPS systems to strategic decision-making in healthcare, manufacturing, and finance.
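As a minimal illustration of predictive analytics, the sketch below fits a linear regression to invented monthly sales figures and extrapolates two months ahead; real forecasting would require far more data and validation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical sales history for months 1..6
months = np.arange(1, 7).reshape(-1, 1)
units = np.array([100, 110, 125, 130, 145, 155])

model = LinearRegression().fit(months, units)

# Extrapolate from the past trend to anticipate months 7 and 8
print(model.predict(np.array([[7], [8]])))
```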
11.4 Standards for Data Tagging and Reporting (XML, XBRL)
XML is a file format and markup language for storing, transferring, and recreating arbitrary data. It specifies a set of standards for encoding documents in a format that is readable by both humans and machines. It is a textual data format with strong support for many human languages via Unicode. Although XML's architecture is centred on documents, the language is commonly used to represent arbitrary data structures, such as those employed by web services. Serialization, i.e. storing, sending, and rebuilding arbitrary data, is the primary function of XML. In order for two dissimilar systems to share data, they must agree on a file format; XML standardises this procedure and is comparable to a universal language for describing information.
As a markup language, XML labels, categorises, and arranges information systematically. The data
structure is represented by XML tags, which also contain information. The information included within
the tags is encoded according to the XML standard
Application of XML
XML is now widely utilised for the exchange of data via the Internet. There have been hundreds of
document formats created using XML syntax, including RSS, Atom, Office Open XML, OpenDocument,
SVG, and XHTML
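A short Python sketch using the standard library's xml.etree.ElementTree shows how XML tags label and arrange data; the invoice structure and values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Build a small XML document: tags label, categorise and arrange the data
invoice = ET.Element("invoice", attrib={"currency": "INR"})
ET.SubElement(invoice, "customer").text = "ABC Ltd"
ET.SubElement(invoice, "revenue").text = "250000"

xml_text = ET.tostring(invoice, encoding="unicode")
print(xml_text)
# <invoice currency="INR"><customer>ABC Ltd</customer><revenue>250000</revenue></invoice>

# Any system that agrees on this format can parse the same data back
parsed = ET.fromstring(xml_text)
print(parsed.find("revenue").text)   # 250000
```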
XBRL is a data description language that facilitates the interchange of standard, comprehensible corporate data. It is based on XML and enables the automated interchange and reliable extraction of financial data across all types of software and advanced technologies, including the Internet. XBRL allows organisations to arrange data using tags: when a piece of data is labelled as "revenue", for instance, XBRL-enabled applications know that it pertains to revenue.
Benefits of XBRL
1. All reports are automatically created from a single source of information, which reduces the
chance of erroneous data entry and hence increases data reliability.
2. Reduces expenses by simplifying and automating the preparation and production of reports
for various clients.
3. Accelerates the decision-making of financial entities such as banks and rating services.
4. Facilitates the publication of analyst and investor reports
5. Access, comparison, and analytic capabilities for information are unparalleled.
11.5 Cloud Computing, BI, AI, RPA and Machine Learning
Simply described, cloud computing is the delivery of a variety of services through the Internet, or "the cloud". It involves storing and accessing data via remote servers as opposed to local hard drives and private data centres.
1. Private cloud: Private cloud offers a cloud environment that is exclusive to a single corporate
organisation, with physical components housed on-premises or in a vendor’s data center. This
solution gives a high level of control due to the fact that the private cloud is available to just
one enterprise.
2. Public cloud: The public cloud stores and manages access to data and applications through
the internet. It is fully virtualized, enabling an environment in which shared resources may be
utilised as necessary. Because these resources are offered through the web, the public cloud
deployment model enables enterprises to grow with more ease; the option to pay for cloud
services on an as-needed basis is a significant benefit over local servers.
3. Hybrid cloud: The hybrid cloud blends the private and public cloud models. This architecture enables businesses to store sensitive data on-premises and access it through apps hosted in the public cloud. In order to comply with privacy rules, an organisation may, for instance, keep sensitive user data in a private cloud and execute resource-intensive computations in a public cloud.
Business Intelligence:
Business intelligence (BI) includes business analytics, data mining, data visualisation, data tools and infrastructure, and best practices to assist businesses in making choices that are more data-driven. When an organisation has a complete picture of its data and uses it to drive change, remove inefficiencies, and swiftly adjust to market or supply changes, it is practising contemporary business intelligence.
BI Methods:
(i) Data mining: Large datasets may be mined for patterns using databases, analytics, and
machine learning (ML).
(ii) Reporting: The dissemination of data analysis to stakeholders in order for them to form
conclusions and make decisions.
(iii) Performance metrics and benchmarking: Comparing current performance data to
previous performance data in order to measure performance versus objectives, generally
utilising customised dashboards.
(iv) Descriptive analytics: Utilizing basic data analysis to determine what transpired
(v) Querying: BI extracts responses from data sets in response to data-specific queries.
(vi) Statistical analysis: Taking the results of descriptive analytics and using statistics to further explore the data, such as how and why a pattern occurred.
(vii) Data Visualization: Data consumption is facilitated by transforming data analysis into
visual representations such as charts, graphs, and histograms.
(viii) Visual Analysis: Exploring data using visual storytelling to share findings in real-time and
maintain the flow of analysis.
(ix) Data Preparation: Multiple data source compilation, dimension and measurement
identification, and data analysis preparation.
Artificial Intelligence (AI): In the words of John McCarthy, who coined the term, "It is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable."
AI definitions are often framed around two approaches:
Human approach: systems that think like humans; systems that act like humans.
Ideal approach: systems that think rationally; systems that act rationally.
Weak AI, also known as Narrow AI or Artificial Narrow Intelligence (ANI), is AI that has been trained and honed to perform particular tasks. This form of artificial intelligence is anything but feeble; it powers sophisticated applications such as Apple's Siri and Amazon's Alexa.
Strong AI comprises Artificial General Intelligence (AGI) and Artificial Super Intelligence (ASI). Artificial General Intelligence (AGI), sometimes known as general AI, is a hypothetical form of artificial intelligence in which a machine possesses human-level intellect, a self-aware consciousness, and the ability to solve problems, learn, and plan for the future. Superintelligence, or Artificial Super Intelligence (ASI), would transcend the intelligence and capabilities of the human brain.
Deep learning and machine learning differ in how their respective algorithms learn. Deep learning
automates a significant portion of the feature extraction step, reducing the need for manual human
involvement and enabling the usage of bigger data sets
Robotic Process Automation (RPA): With RPA, software users develop software robots, or "bots", that are capable of learning, simulating, and executing rules-based business processes. RPA automation enables users to construct bots by studying human digital behaviour: give the bots instructions, then let them complete the task. RPA bots can communicate with any application or system in the same manner that humans can, except that they can function continuously, around the clock, and with 100 per cent accuracy and dependability.
Benefits of RPA
RPA reduces manual errors, speeds up processing, lowers operating costs, operates around the clock, and frees employees for higher-value work.
Machine learning
Machine learning (ML) is a branch of study devoted to understanding and developing systems that "learn", i.e. methods that use data to improve performance on a set of tasks. It is considered a component of artificial intelligence. In order to generate predictions or conclusions without being explicitly programmed to do so, machine learning algorithms construct a model based on training data. Machine learning techniques are used in applications such as medicine, email filtering, speech recognition, and computer vision, where it is difficult or impractical to create traditional algorithms to perform the required tasks.
1. Supervised Learning:
• In supervised learning, a model is trained on labeled data, meaning that the training
examples include both inputs and their corresponding outputs. The goal is to learn a
function that can make accurate predictions when given new, unseen inputs.
• Applications: Classification (e.g., email spam detection) and regression (e.g., predicting
house prices).
• Key concepts: Feature vectors, optimization of an objective function, and learning from
labeled data.
2. Unsupervised Learning:
• Unsupervised learning deals with data that does not have labeled outputs. The goal is to find
hidden structures, such as groups or patterns, in the data. Clustering and density estimation
are key techniques in this approach.
• Applications: Market segmentation, anomaly detection, and clustering in customer analysis.
• Key techniques: Cluster analysis, dimensionality reduction, and similarity detection.
3. Semi-Supervised Learning:
• Semi-supervised learning uses a small amount of labeled data combined with a large amount
of unlabeled data. This approach is useful when acquiring labeled data is expensive or time-
consuming, but large amounts of unlabeled data are available.
• Applications: Natural language processing, image recognition, and medical diagnoses.
• Key idea: The combination of labeled and unlabeled data improves the model's
performance.
4. Reinforcement Learning:
• In reinforcement learning, an agent learns by interacting with an environment: it receives rewards or penalties for its actions and gradually improves its decision-making policy.
• Applications: game playing, robotics, recommendation systems, and dynamic pricing.
5. Dimensionality Reduction:
• Dimensionality reduction techniques, such as principal component analysis (PCA), reduce the number of input variables while retaining most of the information in the data, simplifying models and speeding up learning.
• Applications: data compression, visualisation of high-dimensional data, and noise reduction.
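The contrast between supervised and unsupervised learning can be shown in a few lines of scikit-learn; the toy feature matrix and labels below are assumptions for demonstration only.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Supervised learning: inputs paired with labelled outputs (hypothetical spam features)
X = [[0, 1], [1, 1], [0, 0], [1, 0], [0, 1], [1, 1]]
y = [1, 1, 0, 0, 1, 1]                       # 1 = spam, 0 = not spam
clf = LogisticRegression().fit(X, y)
print(clf.predict([[1, 1]]))                 # predict the label of a new input

# Unsupervised learning: no labels; find hidden groups in the same data
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(segments)                              # cluster assignment for each observation
```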
11.6 Model vs. Data-driven Decision-making
Data-driven strategies focus on improving data quality, governance, and management to enhance AI performance. This involves ensuring that the data used is reliable, well organised, and clean, which is crucial for accurate and meaningful outputs.
Model-driven strategies focus on developing better algorithms and models to boost performance, regardless of the quality of the data. Historically, these methods have advanced significantly more than data-driven approaches.
ALL THE VERY BEST, FUTURE CMAs!