DV-Week 9

Uploaded by

D One

Data Visualization

Data Visualization - Informatics Study Program


Welcome to DV Class

● Lesson Plan - Outcome-based Education (OBE) version - bit.ly/VD-RPS-OBE


● Elective Code: 211861031
● Credits: 3
● Schedule: Tuesday, 10.30 AM-12.00 PM
● Whatsapp Group:
● Instructor CP: [email protected] | Students CP:
● Google Classroom Link: bit.ly/gcdatavizkelasa
● Assessments: Mini Assignments, Quiz (UK), Mid Exam, Final Exam, Team-based Project, Lab Works

bit.ly/ask-vd
Infographics vs Data Visualization (or Information Vis)
Infographics
● manually drawn (and therefore a custom treatment of the information);
● specific to the data at hand (and therefore nontrivial to recreate with different data);
● aesthetically rich (strong visual content meant to draw the eye and hold interest);
● relatively data-poor (because each piece of information must be manually encoded).

Data Visualization (DV)
● algorithmically drawn (may have custom touches but is largely rendered with the help of computerized methods);
● easy to regenerate with different data (the same form may be repurposed to represent different datasets with similar dimensions or characteristics);
● often aesthetically barren (data is not decorated);
● relatively data-rich (large volumes of data are welcome and viable, in contrast to infographics).

Know your data

How do you choose a visualization format that shows your data’s best (and most
interesting or informative) side?

The type of basic questions you will want to ask about your data include:
• Is it a time-series? A hierarchy?
• How many dimensions does it have? Which are the most important ones?
• What sort of relationships do they have (e.g., one-to-one or many-to-many)?
• How variable are they?
• Are the values categorical? Discrete or continuous? Linear or non-linear? How are they bounded?
• How many categories are there?
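Some of the questions above can be answered with a quick pass over the columns. The following is a minimal sketch in plain Python, assuming the data is a list of records. The sample rows, the column names, and the `profile` helper are hypothetical. The numeric-versus-text check is a crude heuristic: it labels a date string such as the month column as categorical, so questions about time-series or hierarchy still need a human look.

```python
# Hypothetical sample records; in practice these come from your own dataset.
rows = [
    {"month": "2022-01", "region": "East", "sales": 120.5},
    {"month": "2022-02", "region": "West", "sales": 98.0},
    {"month": "2022-03", "region": "East", "sales": 143.2},
    {"month": "2022-04", "region": "North", "sales": 110.0},
]


def profile(rows):
    """Summarize each column: rough value kind, distinct count, and bounds."""
    summary = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        # Crude heuristic: all-numeric columns are treated as quantitative.
        numeric = all(isinstance(v, (int, float)) for v in values)
        summary[col] = {
            "kind": "quantitative" if numeric else "categorical",
            "distinct": len(set(values)),
            "min": min(values),
            "max": max(values),
        }
    return summary


summary = profile(rows)
print("dimensions:", len(rows[0]), "columns x", len(rows), "rows")
for col, info in summary.items():
    print(col, info)
```

The distinct count answers "how many categories are there?", and the min and max show how the values are bounded.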

Know your data

Goals of visualization:
to inform (informative vis) or to change reader’s opinion (persuasive vis)

Example goals for visualizations:


● to monitor systems,
● find bargains,
● compare company performances,
● select suitable solutions,
● track populations,
● tell stories,
● find specific data points,
● find outliers,
● show trends,
● support arguments,
● or simply give an overview of the data.

Week 8

Color Scales

Choosing Appropriate Visual Encoding

● Once you know the “shape” of your data, you can encode its various dimensions with appropriate visual properties.

● Natural ordering and number of distinct values will indicate whether a visual property is best suited to one of the main data types: quantitative, ordinal, categorical, or relational data. (Spatial data is another common data type, and is usually best represented with some kind of map.)

Choosing Appropriate Visual Encoding

When choosing a visual property, select one that has a number of useful differentiable values and an ordering similar to that of your data.

Color as Visual Encoding

Color is tricky.
It’s very appealing, and as designers, we’re tempted to use it all the time.
However, getting color right can be much more difficult than it seems.

● A common mistake is using color (hue) for any sort of ranking or ordering of data; avoid it.
● You can vary brightness or saturation quite effectively for uses such as heat maps and relative intensity, but please don’t vary hue as a way to encode rank, order, intensity, or value.
● Color can be an excellent property for labeling categorical data, or non-ordered categories for differentiation purposes. (Examples of non-ordered categories include operating system, gender, region, conference track, and genre.)
● Don’t use too many distinct values if you’re using color as the visual property by which to encode categories.
Color as Visual Encoding

● The standard advice for using color to encode categories is to limit your selection to, ideally, about six colors.

● It is preferred to use colors from the first half of the list before moving on to the second half.
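One way to stay within the six-color guideline is to group the least frequent categories under a catch-all label. This is a minimal sketch of that idea, not a method from the slides. The `limit_categories` function, the "Other" label, and the sample operating-system labels are illustrative assumptions.

```python
from collections import Counter


def limit_categories(labels, max_colors=6):
    """Keep the most frequent categories and fold the rest into 'Other'."""
    counts = Counter(labels)
    if len(counts) <= max_colors:
        return list(labels)
    # Reserve one color slot for the catch-all "Other" group.
    keep = {name for name, _ in counts.most_common(max_colors - 1)}
    return [name if name in keep else "Other" for name in labels]


# Hypothetical category labels: eight distinct operating systems.
labels = (
    ["Windows"] * 5 + ["macOS"] * 4 + ["Linux"] * 3 + ["Android"] * 3
    + ["iOS"] * 2 + ["ChromeOS", "BSD", "Solaris"]
)
reduced = limit_categories(labels)
print(Counter(reduced))
```

Each remaining category can then be assigned one distinct hue, with "Other" typically drawn in a neutral grey.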

Color as Visual Encoding

● Red is associated with warning, danger, and warfare. It can also be associated with passion—either
love or anger—and blood. In the East it is associated with good luck and prosperity.
● Green is associated with nature, the earth, environmentalism, and renewal. It can also be
associated with permission to move ahead, clearance, etc. (as in “green light”)—especially when
paired with red.
● Yellow is associated with happiness, sunshine, and playfulness. However, on its own or in large
fields, it can be irritating. It is also associated with caution.
● Blue is associated with water, coolness, and calm. Depending on the shade, it may be associated
with religion or the military.

Color as Visual Encoding

● Black is associated with mourning and death, but also with luxury and sophistication.
● White is associated with purity, innocence, and weddings, but also with sympathy and the afterlife
(and therefore, with death).
● Pink is associated with affection, imagination, and childishness. Light pink is associated with young girls, and light blue with young boys, especially when paired together.
● Grey is associated with neutrality, conservatism, modesty, and maturity.
● Orange is associated with fire, energy, and—in the East—spirituality. It is named for the fruit, and so
can also be associated with health and vigor.
● Brown is associated with dirt, leather, stone, and “earthiness.” It may also be associated with animal
waste.
● Purple is associated with royalty (nobility) and magic (falsehood or artificiality).

Color as Visual Encoding

● Color does not have a non-negotiable natural ordering built into the brain.
So:
● the use of color doesn’t always help;
● the reader’s brain is looking for patterns.

You can’t depend on everyone to agree that yellow follows purple in the way that you can depend on them to agree that four follows three.

For example, elevation can be represented by increasing the darkness of browns, rather than cycling through the rainbow.
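An ordered scale of this kind can be sketched by holding the hue fixed and lowering only the lightness. The example uses Python's standard `colorsys` module. The `sequential_ramp` helper and the specific hue, saturation, and lightness values are illustrative assumptions, chosen only to produce brown-like tones.

```python
import colorsys


def sequential_ramp(hue, saturation, steps, lightest=0.85, darkest=0.25):
    """Return hex colors of one hue whose lightness falls from lightest to darkest."""
    colors = []
    for i in range(steps):
        t = i / (steps - 1) if steps > 1 else 0.0
        lightness = lightest + t * (darkest - lightest)
        # colorsys takes hue, lightness, saturation, each in the range 0..1.
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colors.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return colors


# Hue 30/360 is an orange-brown; low elevation gets the lightest color.
elevation_colors = sequential_ramp(hue=30 / 360, saturation=0.5, steps=5)
print(elevation_colors)
```

Because only lightness changes, readers can order the colors without a legend, which a rainbow of hues does not allow.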
Week 9

Exploratory Data Analysis

Exploration vs Explanation

Data visualization falls into two categories: exploration and explanation.


● Exploratory data visualizations are appropriate when you have a whole bunch of data
and you’re not sure what’s in it. When you need to get a sense of what’s inside your
data set, translating it into a visual medium can help you quickly identify its features,
including interesting curves, lines, trends, or anomalous outliers.

● Explanatory data visualization is appropriate when you already know what the data
has to say, and you are trying to tell that story to somebody else. It could be the head of
your department, a grant committee, or the general public.

● Exploratory data visualization is part of the data analysis phase, and explanatory data visualization is part of the presentation phase.
Exploratory Data Analysis
● a part of the data science process:

○ Data Preparation
○ Data Cleansing
○ Exploratory Data Analysis
○ Feature Engineering
○ Modeling
○ Evaluation
○ Deployment

● EDA makes it easier to:

○ discover patterns
○ spot anomalies
○ test a hypothesis
○ check assumptions

● used by data scientists to analyze and investigate datasets and summarize their main characteristics, often employing data visualization methods. [IBM.com]
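As an illustration of spotting anomalies, the sketch below flags values that lie far outside the interquartile range. The 1.5 x IQR threshold is a common rule of thumb and not something prescribed by these slides. The `iqr_outliers` helper and the sample prices are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical nightly prices with one suspicious entry.
prices = [52, 55, 49, 60, 58, 51, 54, 57, 300]
print(iqr_outliers(prices))
```

A box plot draws the same rule visually, which is one reason it is a popular first chart in exploratory work.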

Exploratory Data Analysis
● Data scientists can use exploratory analysis to ensure the results they produce are valid
and applicable to any desired business outcomes and goals.
● EDA can help answer questions about standard deviations, categorical variables, and
confidence intervals.
● Once EDA is complete and insights are drawn, its features can then be used for more
sophisticated data analysis or modeling, including machine learning. [IBM.com]
● Things to look at first after loading the dataset: the data dimensions and some descriptive statistics (mean, median, mode, quartiles, standard deviation).
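These first checks can be sketched with pandas as follows. The DataFrame contents and column names are hypothetical. In practice the data would come from a file, for example via `pd.read_csv`.

```python
import pandas as pd

# Hypothetical listing data, used only for illustration.
df = pd.DataFrame({
    "price": [80, 120, 120, 200, 95, 150],
    "nights": [1, 2, 2, 3, 1, 7],
})

print(df.shape)                                  # data dimensions: (rows, columns)
print(df["price"].mean())                        # mean
print(df["price"].median())                      # median
print(df["price"].mode().tolist())               # mode (may hold several values)
print(df["price"].quantile([0.25, 0.5, 0.75]))   # quartiles
print(df["price"].std())                         # sample standard deviation
print(df.describe())                             # most of the above in one table
```

`describe()` reports count, mean, standard deviation, minimum, quartiles, and maximum for each numeric column, so it is often the first call after loading.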

Exploratory Data Analysis (EDA) Tools
● Clustering and dimension reduction techniques
● Univariate visualization
● Bivariate visualizations
● Multivariate visualizations
● K-means Clustering is a clustering method in unsupervised learning (commonly used in
market segmentation, pattern recognition, and image compression)
● Predictive models, such as linear regression
● Python: An interpreted, object-oriented programming language with dynamic semantics. Its high-level, built-in
data structures, combined with dynamic typing and dynamic binding, make it very attractive for rapid
application development, as well as for use as a scripting or glue language to connect existing components
together. Python and EDA can be used together to identify missing values in a data set, which is important so
you can decide how to handle missing values for machine learning.
● R: An open-source programming language and free software environment for statistical computing and
graphics supported by the R Foundation for Statistical Computing. The R language is widely used among
statisticians in data science in developing statistical observations and data analysis.
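To make the K-means item in the list above concrete, here is a minimal pure-Python sketch of its two alternating steps: assign each point to the nearest centroid, then move each centroid to the mean of its points. The `kmeans` function, the sample points, and the caller-supplied starting centroids are illustrative assumptions. Real projects would normally use a library implementation such as scikit-learn's `KMeans`.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep a centroid that lost all points
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The starting centroids are passed in explicitly to keep the run deterministic. Library implementations usually pick them with a randomized scheme.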
EDA using Python Library
● Python libraries often used: pandas, numpy, matplotlib, and seaborn
● An easier option for doing EDA visually in Python: Sweetviz
● Sweetviz is an open-source Python library that can automatically run EDA and create rich visuals with just a few lines of code.

EDA components (using Python libraries)

Source: https://towardsdatascience.com/data-exploration-on-airbnb-singapore-01-40698c54cac3

1. Acquire & Loading Dataset
● Load Python libraries
● Load dataset
● Understanding data (data size: number of columns, number of rows)

2. Cleaning Dataset
● Checking columns with missing values
● Removing redundant variables
● Replacing all the missing values

3. Exploring & Visualizing
● find top-N data
● map the data
● see the data distribution
● display data based on descriptive analytics
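The three stages above can be sketched with pandas on a tiny in-memory CSV. The column names, the choice of `id` as the redundant column, and filling missing review rates with 0 are all assumptions made for illustration; they are not taken from the referenced article. Plotting and mapping are left out to keep the sketch short.

```python
import io

import pandas as pd

# Hypothetical CSV standing in for a real file on disk.
raw = io.StringIO(
    "id,name,neighbourhood,price,reviews_per_month\n"
    "1,Cozy room,Central,120,1.5\n"
    "2,Loft,East,200,\n"
    "3,Studio,Central,95,0.8\n"
    "4,Shared room,West,40,\n"
)

# 1. Acquire and load the dataset, then check its size.
df = pd.read_csv(raw)
print(df.shape)  # (number of rows, number of columns)

# 2. Clean: count missing values, drop a redundant column, fill the gaps.
missing = df.isna().sum()
print(missing)
df = df.drop(columns=["id"])
df["reviews_per_month"] = df["reviews_per_month"].fillna(0)

# 3. Explore: top-N rows, category counts, and a distribution summary.
top2 = df.nlargest(2, "price")
print(top2[["name", "price"]])
print(df["neighbourhood"].value_counts())
print(df["price"].describe())
```

How to replace missing values depends on what they mean in the data. Filling with 0 here assumes a missing review rate means no reviews.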

Exercise - Tuesday, 21 June 2022
● Form teams of 2 people. This assignment is done as a group and will be discussed in the next online meeting.
● Open the dataset available at:
https://drive.google.com/file/d/13E0pK-O2GgY_6Ihfc8fZDO05Rbni7EcZ/view?usp=sharing

● Examine the data to find any insights worth knowing and present them as a dashboard. A dashboard has a title and several visualizations for different pieces of information, and fits on a single desktop screen.
● You have 1 week to complete the work. Each team will present its results, which will count toward the Assignment grade component.

Thanks