
Introduction to Explain

© Explore Data Science Academy


Overview

This tutorial is laid out as follows:

Explain

The Explore Data Science Process

Visualise

Communicate

Which tools we are using

Conclusion
Explain

Explaining your work is about visualising the data and communicating your insights in an effective way.

[Figure: the two parts of Explain, Visualise and Communicate, alongside the "data dinosaur" from Same Stats, Different Graphs]


The Explore Data Science Process

The Explore Data Science Process is about solving real-world problems using data.

EXPLAIN

• Business Intelligence: Data Exploration, Dashboards
• Business Acumen: Communication, Problem identification, Ethical considerations
• Statistics: Probability Theory, Applied Statistics, Bayesian Stats

Visualise

Visualise is the dual skill of understanding and interpreting your data through visual images.

Data Exploration

• Key to understanding ALL of your data.
• Individual variables summarised by summary statistics and distributions.
• Relationships between variables determined by hypothesis testing.

Dashboards

• Dashboards turn data into insights.
• The key is to find key performance indicators (KPIs) that the end user cares about.
• To help them understand their problem.

Why do we need to visualise data?

What do all of the below datasets have in common with this dinosaur?

...make both calculations and graphs. Both sorts of output should be studied; each will contribute to understanding.
F. J. Anscombe, 1973

Source: Same Stats, Different Graphs
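
The point of the quote can be checked numerically. The sketch below is a minimal illustration using Anscombe's quartet, the four small datasets from the 1973 paper, hardcoded here in place of the dinosaur datasets (which are not reproduced in this tutorial). It uses only the Python standard library; the names `pearson_r`, `summarise` and `SUMMARIES` are illustrative choices, not part of any course material.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973): four x/y datasets of 11 points each.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def summarise(xs, ys):
    """Summary statistics for one dataset, rounded to two decimal places."""
    return {
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "var_x": round(variance(xs), 2),  # sample variance
        "r": round(pearson_r(xs, ys), 2),
    }


SUMMARIES = {name: summarise(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in SUMMARIES.items():
    print(name, stats)
```

All four datasets print the same rounded summary, yet a scatter plot of each looks completely different, which is why the calculations and the graphs need to be read together.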


But visualisation by itself is dangerous

Hypothesis Testing

• Summary statistics help us gain insight into a single variable's distribution.
• Hypothesis testing is the set of statistical techniques that help us assess whether the data support our intuitions and hypotheses.
• Linear regression is an essential tool for understanding the relationship between two numerical variables.
• Chi-squared (χ²) tests allow us to understand how categorical variables interact and are related.
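
The last two techniques can be sketched in a few lines. This is a minimal illustration under stated assumptions: the data are made up, the helper names `fit_line` and `chi2_statistic` are hypothetical, and only the Python standard library is used. The chi-squared statistic is computed without a continuity correction and compared against 3.841, the 5% critical value for one degree of freedom. In practice a statistics library such as SciPy would normally be used instead.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def chi2_statistic(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Made-up numerical data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
slope, intercept = fit_line(hours, scores)
print(f"score = {intercept:.1f} + {slope:.1f} * hours")

# Made-up categorical data: rows are two groups, columns are yes/no answers.
table = [[30, 10], [20, 40]]
chi2 = chi2_statistic(table)
CRITICAL_5PCT_DF1 = 3.841  # chi-squared critical value, 1 degree of freedom
reject_independence = chi2 > CRITICAL_5PCT_DF1
print(f"chi2 = {chi2:.2f}, reject independence at 5%: {reject_independence}")
```

A statistic above the critical value suggests the two categorical variables are related. It does not say why, which is where the warning about spurious correlations applies.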

Source: Spurious Correlations


Visualisation and communication go hand-in-hand

Edward Tufte's Rules for Data Visualisation

1. Show your data!
2. Use graphics
3. Avoid Chartjunk
4. Utilise Data Ink
5. Use Labels
6. Utilise Macro / Micro
7. Use Layers
8. Use Multiples
9. Use Colours
10. Understand Narrative

Source: Tufte's Rules


Communicate

Communicate is the skill of delivering solutions that create change in the real world.

Communication Skills

• Problem identification is a fundamental data science skill.
• Effective communication is about being able to describe your solution to suit your audience.
• It's not about what you have done, it's about what they must do.
• Presentations help us deliver our solutions with structure and evidence.
• Develop your written and spoken communication skills throughout the year.

What tools are used within the Academy?

Some of the following tools may be used within your course:

● Google Docs is a free web-based application in which documents and spreadsheets can be created, edited and stored online.

● Google Slides is a free web-based application in which presentations can be created, edited and stored online.


● Power BI is a business analytics service that provides interactive visualizations with self-service business intelligence capabilities, where end users can create reports and dashboards by themselves.

● Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hard copy formats and interactive environments across platforms.

● Seaborn is a Python visualization library based on matplotlib that provides a high-level interface for drawing attractive statistical graphics.

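As a taste of what plotting with Matplotlib looks like, the sketch below draws a small labelled scatter plot. It is a minimal example, assuming Matplotlib is installed; the data are made up and the helper name `labelled_scatter` is hypothetical. It follows two of Tufte's rules mentioned earlier: it labels the axes and removes the top and right frame lines, which carry no data.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def labelled_scatter(xs, ys, title, xlabel, ylabel):
    """Scatter plot with a title, axis labels and a reduced frame."""
    fig, ax = plt.subplots(figsize=(5, 4))
    ax.scatter(xs, ys, color="tab:blue")
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    # Hide frame lines that carry no data.
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)
    return fig, ax


# Made-up illustrative data.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
fig, ax = labelled_scatter(
    hours, scores,
    title="Test score by hours studied",
    xlabel="Hours studied",
    ylabel="Test score",
)
```

Calling `fig.savefig("scores.png")` afterwards would write the figure to an image file. Seaborn builds on the same figure and axes objects.
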
Conclusion

What we’ve learnt

• Visualising data and communicating your insights in an effective way
• Understanding and interpreting your data through visual images
• Delivering solutions that create change in the real world
