Module 3 - Analytics Techniques & Tools
MASTERCLASS
21st Century Leadership:
Mastering Analytical Thinking,
Technical Communication and
Adaptive Leadership
IMPORTANT NOTE
Materials in this course, unless otherwise indicated, are owned by Confexhub International Centre
for Professional Development (CICPD) and protected by copyright law.
Materials are presented in an educational context for personal use and study and should not be:
i. Shared, distributed, or sold, whether in print or digitally, outside the course without permission;
or
ii. Posted or linked on the internet or social media without permission. CICPD reserves the right to
delete or disable your post or link if in their judgment it would involve a violation of copyright
law.
MODULE 3:
ANALYTICS TECHNIQUES &
TOOLS
CONTENTS
I. Overview of Analytical Tools
a. Types of Data Analysis
b. Tools and Techniques
c. Importance of Data Visualization
d. Applications of Data Analytics
II. Industry-Relevant Analytical Tools
III. Statistical Methods for Data Analysis
IV. Data Visualisation
V. From Analysis to Solution: Use the 7 Frames Method
Developing Perspectives
Definition of Data Analysis
Data analysis is a process of examining,
cleaning, transforming, and modelling
data to discover useful information, draw
conclusions, and support decision-
making.
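The four activities in this definition (examining, cleaning, transforming, modelling) can be sketched as a small pipeline. This is a minimal illustration in plain Python, not a prescribed method: the records, the field names and the "average units per region" model are all hypothetical.

```python
from statistics import mean

# Hypothetical raw records; values and field names are illustrative only.
raw_sales = [
    {"region": "North", "units": "12"},
    {"region": "South", "units": ""},      # missing value
    {"region": "North", "units": "8"},
    {"region": "South", "units": "15"},
    {"region": "East", "units": "n/a"},    # unparseable value
]


def clean(records):
    """Cleaning: drop records whose 'units' field is not a whole number."""
    return [r for r in records if r["units"].isdigit()]


def transform(records):
    """Transforming: convert 'units' from text to an integer."""
    return [{"region": r["region"], "units": int(r["units"])} for r in records]


def model(records):
    """Modelling (deliberately simple): average units sold per region."""
    by_region = {}
    for r in records:
        by_region.setdefault(r["region"], []).append(r["units"])
    return {region: mean(units) for region, units in by_region.items()}


def analyse(records):
    """Run the full pipeline: clean, then transform, then model."""
    return model(transform(clean(records)))


summary = analyse(raw_sales)
# Supporting a decision: which region sells the most units on average?
best_region = max(summary, key=summary.get)
print(summary)
print(best_region)
```

Keeping each step in its own function makes it easy to examine the intermediate data before drawing a conclusion from the final summary.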
1. Descriptive Analytics
2. Diagnostic Analytics
3. Predictive Analytics
4. Prescriptive Analytics
5. Integration
6. Collaboration
7. Mobile Support
• Tableau can be relatively expensive, especially for larger organizations or when using advanced features.
• While Tableau is designed to be user-friendly, mastering advanced features may still require some learning for new users.
• While Tableau is excellent for visualization, its data manipulation capabilities are not as extensive as some dedicated data preparation tools.
• The effectiveness of Tableau can depend on the structure and cleanliness of the underlying data. Messy or poorly-structured data may require preprocessing outside of Tableau.
Summary
1. Data Connectivity
2. Data Transformation & Modeling
3. Visualization
4. Dashboards
5. Natural Language Query
6. Collaboration and Sharing
7. Mobile Accessibility
8. Integration with Microsoft Products
• Learning Curve
• Limited Customization for Visualizations
• Dependency on Microsoft Ecosystem
• Data Security Concerns
• Desktop Version Limitations
1. In-Memory Processing: Spark performs in-memory data processing, storing intermediate data in memory rather than writing it to disk. This results in faster data processing compared to traditional disk-based processing.
2. Ease of Use: Spark provides high-level APIs for programming in Java, Scala, Python, and R. This makes it accessible to a wide range of users, including data scientists, engineers, and analysts.
3. Versatility: Spark supports a variety of data processing tasks, including batch processing, real-time stream processing, machine learning, and graph processing.
4. Resilient Distributed Datasets (RDD): RDD is Spark's fundamental data structure, offering fault-tolerant, distributed data processing across a cluster.
Features of Apache Spark
5. Spark SQL
6. Machine Learning Library (MLlib)
7. GraphX
8. Spark Streaming
9. Community Support
1. Graphical User Interface (GUI)
2. Node-Based Workflow
3. Rich Set of Tools and Algorithms
4. Extensive Integration
5. Data Visualization: The platform provides visualization tools for exploring and interpreting data. Users can generate charts, graphs, and other visual representations within the KNIME environment.
6. Workflow Sharing and Reusability: KNIME allows users to share workflows and components, promoting collaboration and facilitating the reuse of analysis and processing steps.
7. Scalability: KNIME Server enables the deployment and execution of workflows on a server, providing scalability for handling larger datasets or more intensive computations.
8. Community Extensions: The KNIME Hub and community extensions offer a repository of additional nodes and workflows contributed by the user community, extending the platform's functionality.
Advantages
1. Data Collection and Indexing: Splunk can collect and index machine-generated data from various sources, including logs, metrics, and events.
2. Search and Query Language: Splunk's powerful search and query language allows users to efficiently search and analyze large volumes of data in real time.
3. Real-Time Monitoring: Splunk provides real-time monitoring capabilities, allowing users to detect and respond to events as they occur.
4. Correlation and Alerting: Splunk can correlate data from different sources, helping users identify patterns and anomalies. It also supports alerting based on predefined conditions.
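The idea behind correlation and alerting can be illustrated with a short sketch. It is written in plain Python and does not use Splunk's search language or any Splunk API; the events, host names and thresholds are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical events from two different sources.
web_log = [
    {"host": "web-1", "status": 500},
    {"host": "web-1", "status": 200},
    {"host": "web-2", "status": 500},
    {"host": "web-1", "status": 500},
]
cpu_metrics = [
    {"host": "web-1", "cpu_percent": 97},
    {"host": "web-2", "cpu_percent": 35},
]

# Assumed predefined alert conditions.
ERROR_THRESHOLD = 2   # at least this many HTTP 500 responses
CPU_THRESHOLD = 90    # CPU usage at or above this percentage


def find_alerts(log_events, metrics):
    """Correlate error counts with CPU usage per host and return hosts to alert on."""
    errors = Counter(e["host"] for e in log_events if e["status"] == 500)
    cpu = {m["host"]: m["cpu_percent"] for m in metrics}
    return sorted(
        host
        for host, count in errors.items()
        if count >= ERROR_THRESHOLD and cpu.get(host, 0) >= CPU_THRESHOLD
    )


alerts = find_alerts(web_log, cpu_metrics)
print(alerts)
```

A host is flagged only when both sources agree that something is wrong, which is the kind of pattern that is hard to see when each source is inspected separately.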
Features of Splunk
5. Dashboards and Visualizations: Splunk enables the creation of custom dashboards and visualizations to present data insights in a clear and actionable manner.
6. Machine Learning and AI Integration: Splunk integrates with machine learning and artificial intelligence tools for predictive analytics, anomaly detection, and automated insights.
7. App Ecosystem: Splunk has a rich ecosystem of apps and add-ons developed by both Splunk and third-party vendors, extending its functionality for specific use cases.
8. Security and Compliance: Splunk provides features to help organizations monitor and manage security events, supporting compliance requirements and incident response.
Advantages
• Scalability: Splunk is designed to scale horizontally, allowing organizations to handle increasing amounts of data by adding more instances.
• Search and Query Capabilities: The search and query language in Splunk is powerful and flexible, enabling users to perform complex searches and analysis efficiently.
• Comprehensive App Ecosystem: The broad range of apps and add-ons available in the Splunk ecosystem allows organizations to extend the platform's capabilities for various use cases.
Disadvantages
• Cost: Splunk can be expensive, especially for larger deployments or when advanced features and add-ons are required. Organizations should carefully evaluate their budget and licensing needs.
• Learning Curve: While powerful, Splunk's capabilities may have a learning curve for new users, particularly those unfamiliar with the search query language and data processing concepts.
• Resource Intensive: Splunk can be resource-intensive, and organizations need to allocate sufficient resources to handle the volume of data being processed, especially in large-scale deployments.
• Data Retention Costs: Storing large volumes of data in Splunk for extended periods can lead to increased storage costs. Organizations should plan for data retention policies accordingly.
• Complexity for Small Environments: For small environments or simple use cases, the complexity of Splunk may be perceived as overkill, and simpler solutions may be more suitable.
Summary
1. Graphical Workflow Design
2. Extensive Data Integration
3. Data Preprocessing
4. Machine Learning Algorithms
5. Automated Machine Learning (AutoML)
6. Model Validation and Evaluation
7. Scalability
8. Integration with External Tools
• Comprehensive Set of Tools: It offers a wide range of tools for data integration, preprocessing, modeling, and evaluation, providing a comprehensive environment for data science tasks.
1. Associative Data Model: QlikView uses an associative data model, allowing users to easily explore and analyze data by making dynamic associations between different data points.
2. In-Memory Data Processing: QlikView utilizes in-memory data processing to quickly analyze and visualize data without the need for constant queries to the underlying data source.
3. Interactive Dashboards: Users can create interactive and dynamic dashboards with a wide variety of visualizations, including charts, graphs, tables, and maps.
4. Self-Service BI: QlikView supports self-service business intelligence, allowing users to create their own reports, dashboards, and visualizations without heavy reliance on IT.
Features of QlikView
5. Data Integration and Connectivity
6. Collaboration Features
7. Elastic Data Modeling
8. Security and Access Control
• Associative Data Model: The associative data model allows users to explore data interactively, making unexpected discoveries and uncovering hidden insights.
• Web-based authoring capabilities may be more limited compared to some other BI tools, which may impact the ability to create content on the go.
• While the basics are user-friendly, mastering advanced features may require training, particularly for complex data modeling and scripting.
• To fully leverage collaboration and access control features, organizations may need to invest in QlikView Server infrastructure.
• QlikView may have a higher upfront cost compared to some other BI tools. Organizations should carefully evaluate their budget and requirements.
Summary
1. Data Integration
2. Connectivity
3. Big Data Integration
4. Data Quality
5. Cloud Integration: Talend supports cloud-based data integration, allowing users to work with data stored in cloud platforms such as AWS, Azure, Google Cloud, and others.
6. Master Data Management (MDM): Talend supports master data management, helping organizations maintain a single, consistent version of master data across various systems.
7. Graphical Design Interface: Talend offers a user-friendly, graphical interface for designing data integration jobs using a drag-and-drop approach. This makes it accessible to users with different technical backgrounds.
8. Open Source Foundation: Talend Open Studio is an open-source version that provides a cost-effective entry point for users. The community actively contributes to forums, documentation, and additional components.
Advantages
• Learning Curve for Advanced Features
• Enterprise Edition Costs
• Resource Intensive for Large Workloads
• Limited Features in Open Source Version
1. SAS BA
2. QlikView
3. Board
4. Splunk
Statistical Methods for Data Analysis
01 Descriptive Statistics
02 Inferential Statistics
03 Parametric Tests
04 Non-Parametric Tests
05 Exploratory Data Analysis (EDA)
06 Predictive Analysis
Descriptive & Predictive Statistics
Descriptive Statistics:
Definition: Descriptive statistics involve the use of numerical and graphical techniques to summarize and describe the main features of a dataset.
Purpose: Descriptive statistics provide a concise summary of the essential characteristics of a dataset, helping to simplify large amounts of data into understandable patterns. Common measures in descriptive statistics include measures of central tendency (mean, median, mode) and measures of variability (range, variance, standard deviation).
Example: If you have a dataset of exam scores for a class, descriptive statistics would help you summarize the average score, the spread of scores, and the most common score.
Predictive Statistics:
Definition: Predictive statistics involve the use of statistical models and techniques to make predictions or forecasts about future events based on historical data.
Purpose: The goal of predictive statistics is to identify patterns and relationships within a dataset that can be used to make informed predictions about future observations. This is commonly used in fields such as machine learning, where models are trained on historical data to make predictions about new, unseen data.
Example: Using past sales data to build a predictive model that forecasts future sales, or using historical weather data to predict future weather conditions.
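Both examples can be sketched in a few lines of Python using only the standard library. The exam scores and monthly sales figures are made up for illustration. The straight-line fit is one simple choice of predictive model, assumed here for brevity; it is not the only option.

```python
import statistics

# --- Descriptive statistics: summarise a hypothetical set of exam scores ---
scores = [55, 62, 70, 70, 78, 85, 91]

summary = {
    "mean": statistics.mean(scores),           # central tendency
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),           # most common score
    "range": max(scores) - min(scores),        # variability
    "variance": statistics.variance(scores),   # sample variance
    "std_dev": statistics.stdev(scores),       # sample standard deviation
}
print(summary)


# --- Predictive statistics: forecast sales from hypothetical past sales ---
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


months = [1, 2, 3, 4, 5]             # past periods
sales = [100, 110, 125, 130, 145]    # units sold in each period

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept   # prediction for the next, unseen month
print(forecast_month_6)
```

The first half only describes data that already exists, while the second half uses the pattern in past data to estimate a value that has not been observed yet.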
1. Column Chart
2. Bar Chart
3. Line Chart
4. Pie Chart
5. Scatter Plot
6. Histogram
7. Heat Map
8. Treemap
9. Gantt Chart
10. Network Diagram
7 Frames Method Developing Perspectives