DV-Week 9
bit.ly/ask-vd
Infographics vs Data Visualization (or Information Vis)
Infographics
● manually drawn (and therefore a custom treatment of the information);
● specific to the data at hand (and therefore nontrivial to recreate with different data);
● aesthetically rich (strong visual content meant to draw the eye and hold interest);
● relatively data-poor (because each piece of information must be manually encoded).
DV
● algorithmically drawn (may have custom touches but is largely rendered with the help of computerized
methods);
● easy to regenerate with different data (the same form may be repurposed to represent different
datasets with similar dimensions or characteristics);
● often aesthetically barren (data is not decorated);
● and relatively data-rich (large volumes of data are welcome and viable, in contrast to infographics).
Know your data
How do you choose a visualization format that shows your data’s best (and most
interesting or informative) side?
The type of basic questions you will want to ask about your data include:
• Is it a time-series? A hierarchy?
• How many dimensions does it have? Which are the most important ones?
• What sort of relationships do they have (e.g., one-to-one or many-to-many)?
• How variable are they?
• Are the values categorical? Discrete or continuous? Linear or non-linear? How are they bounded?
• How many categories are there?
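The questions above can be asked of a dataset directly. The following is a minimal sketch using pandas; the toy table and its column names (`date`, `region`, `units`, `price`) are invented for illustration and are not part of the course material.

```python
import pandas as pd

# Hypothetical toy dataset; the column names are assumptions for illustration.
df = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04"]),
    "region": ["north", "south", "north", "east"],
    "units": [3, 5, 2, 8],
    "price": [9.5, 12.0, 9.5, 20.25],
})


def profile(frame: pd.DataFrame) -> pd.DataFrame:
    """Summarise each column: its dtype, a rough kind, and how many distinct values it has."""
    rows = []
    for name in frame.columns:
        col = frame[name]
        if pd.api.types.is_datetime64_any_dtype(col):
            kind = "time"
        elif pd.api.types.is_numeric_dtype(col):
            kind = "numeric"
        else:
            kind = "categorical"
        rows.append({
            "column": name,
            "dtype": str(col.dtype),
            "kind": kind,
            "n_distinct": col.nunique(),
        })
    return pd.DataFrame(rows).set_index("column")


summary = profile(df)

# How are the numeric dimensions bounded?
bounds = df.select_dtypes(include="number").agg(["min", "max"])

print(summary)
print(bounds)
```

A column with a time kind suggests a time-series layout, while a categorical column with few distinct values is a candidate for grouping or color.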
Know your data
Goals of visualization:
to inform (informative visualization) or to change the reader’s opinion (persuasive visualization)
Week 8
Color Scales
Choosing Appropriate Visual Encoding
● Once you know the “shape” of your data, you can encode its various dimensions
with appropriate visual properties.
● Natural ordering and number of distinct values will indicate whether a visual
property is best suited to one of the main data types: quantitative, ordinal,
categorical, or relational data. (Spatial data is another common data type, and is
usually best represented with some kind of map.)
Color as Visual Encoding
Color is tricky.
It’s very appealing, and as designers, we’re tempted to use it all the time.
However, getting color right can be much more difficult than it seems.
● A common mistake is using color (hue) for any sort of ranking or ordering of data; avoid it.
● You can vary brightness or saturation quite effectively for uses such as heat maps and relative
intensity, but please don’t vary hue as a way to encode rank, order, intensity, or value.
● Hue can be an excellent property for labeling categorical data, or non-ordered categories for
differentiation purposes. (Examples of non-ordered categories include operating system, gender,
region, conference track, and genre.)
● Don’t use too many distinct values if you’re using color as the visual property by which to encode
categories.
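The advice on categorical color can be sketched in code. This is a minimal illustration using only Python's standard `colorsys` module: it gives each non-ordered category its own evenly spaced hue and refuses to go beyond a cap. The cap of 8 and the function names are assumptions for this sketch; the slides only say "not too many".

```python
import colorsys

MAX_CATEGORIES = 8  # assumed cap; the slides do not give a number


def to_hex(r: float, g: float, b: float) -> str:
    """Convert RGB floats in [0, 1] to a #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


def categorical_palette(categories):
    """Assign one evenly spaced hue per distinct category.

    Lightness and saturation are held constant so that no category
    looks "more" or "less" than another.
    """
    distinct = list(dict.fromkeys(categories))  # unique values, original order
    if len(distinct) > MAX_CATEGORIES:
        raise ValueError(
            f"{len(distinct)} categories is too many to tell apart by color alone"
        )
    n = len(distinct)
    return {
        cat: to_hex(*colorsys.hls_to_rgb(i / n, 0.5, 0.65))
        for i, cat in enumerate(distinct)
    }


palette = categorical_palette(["Linux", "macOS", "Windows", "Linux"])
for name, color in palette.items():
    print(name, color)
```

When a dataset has more categories than the cap, grouping the smaller ones into "other" is one common way to stay within it.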
Color as Visual Encoding
● Red is associated with warning, danger, and warfare. It can also be associated with passion—either
love or anger—and blood. In the East it is associated with good luck and prosperity.
● Green is associated with nature, the earth, environmentalism, and renewal. It can also be
associated with permission to move ahead, clearance, etc. (as in “green light”)—especially when
paired with red.
● Yellow is associated with happiness, sunshine, and playfulness. However, on its own or in large
fields, it can be irritating. It is also associated with caution.
● Blue is associated with water, coolness, and calm. Depending on the shade, it may be associated
with religion or the military.
Color as Visual Encoding
● Black is associated with mourning and death, but also with luxury and sophistication.
● White is associated with purity, innocence, and weddings, but also with sympathy and the afterlife
(and therefore, with death).
● Pink is associated with affection, imagination, and childishness. Light pink is associated with young
girls, and light blue with young boys, especially when paired together.
● Grey is associated with neutrality, conservatism, modesty, and maturity.
● Orange is associated with fire, energy, and—in the East—spirituality. It is named for the fruit, and so
can also be associated with health and vigor.
● Brown is associated with dirt, leather, stone, and “earthiness.” It may also be associated with animal
waste.
● Purple is associated with royalty (nobility) and magic (falsehood or artificiality).
Color as Visual Encoding
● Color does not have a non-negotiable natural ordering built into the brain.
so…
● the use of color doesn’t always help
● the reader’s brain is looking for patterns, and may read an ordering into colors where none was intended
For example, elevation can be represented by increasing the darkness of browns, rather than cycling
through the rainbow.
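The elevation example can be sketched as a single-hue ramp. The code below is an illustration only: the hue chosen for "brown", the lightness range, and the sample elevations are all assumptions, and it uses the standard `colorsys` module rather than any particular plotting library.

```python
import colorsys

BROWN_HUE = 30 / 360  # assumed hue for "brown" (orange family)


def brown_for(value: float, lo: float, hi: float,
              light: float = 0.85, dark: float = 0.20) -> str:
    """Map a value in [lo, hi] to a brown that gets darker as the value rises.

    Only lightness changes; hue and saturation stay fixed, so the
    ordering of the data is carried by darkness alone.
    """
    t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))  # clamp values outside the range
    lightness = light + t * (dark - light)
    r, g, b = colorsys.hls_to_rgb(BROWN_HUE, lightness, 0.5)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


elevations_m = [0, 500, 1000, 2000, 4000]  # hypothetical sample elevations
colors = [brown_for(e, 0, 4000) for e in elevations_m]
for e, c in zip(elevations_m, colors):
    print(e, c)
```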
Week 9
Exploration vs Explanation
● Explanatory data visualization is appropriate when you already know what the data
has to say, and you are trying to tell that story to somebody else. It could be the head of
your department, a grant committee, or the general public.
● Exploratory data visualization is part of the data analysis phase, when you are still finding out
what the data has to say; explanatory data visualization is part of the presentation phase.
Exploratory Data Analysis
● A part of the data science process:
○ Data Preparation
○ Data Cleansing
○ Exploratory Data Analysis
○ Feature Engineering
○ Modeling
○ Evaluation
○ Deployment
● Aims of EDA:
○ discover patterns
○ spot anomalies
○ test a hypothesis
○ check assumptions
● used by data scientists to analyze and investigate datasets and summarize their main
characteristics, often employing data visualization methods. [IBM.com]
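As one illustration of "spot anomalies", the sketch below flags values that fall outside Tukey's 1.5 × IQR fences. This rule is swapped in here as a common convention; the slides do not name a specific method, and the sample prices are invented.

```python
import statistics


def iqr_outliers(values, k: float = 1.5):
    """Return the values lying more than k * IQR beyond the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


nightly_prices = [80, 95, 100, 105, 110, 120, 125, 130, 900]  # hypothetical data
print(iqr_outliers(nightly_prices))
```

A flagged value is only a candidate anomaly; whether it is an error or a genuine extreme still needs a look at the underlying record.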
Exploratory Data Analysis
● Data scientists can use exploratory analysis to ensure the results they produce are valid
and applicable to any desired business outcomes and goals.
● EDA can help answer questions about standard deviations, categorical variables, and
confidence intervals.
● Once EDA is complete and insights are drawn, its features can then be used for more
sophisticated data analysis or modeling, including machine learning. [IBM.com]
● Things to check first after loading the dataset: the data dimensions (rows and columns) and some
descriptive statistics: mean, median, mode, quartiles, standard deviation
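These first checks can be sketched with pandas as follows. The small table stands in for a loaded dataset; its columns (`price`, `room_type`) and values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for a freshly loaded dataset.
df = pd.DataFrame({
    "price": [80, 95, 100, 100, 110, 120, 125, 130],
    "room_type": ["private", "shared", "private", "entire",
                  "entire", "private", "entire", "entire"],
})

# Data dimensions: (number of rows, number of columns).
n_rows, n_cols = df.shape

price = df["price"]
stats = {
    "mean": price.mean(),
    "median": price.median(),
    "mode": price.mode().tolist(),   # a column can have more than one mode
    "q1": price.quantile(0.25),
    "q3": price.quantile(0.75),
    "std": price.std(),              # sample standard deviation (ddof=1)
}

print(n_rows, n_cols)
print(stats)
print(df.describe())  # count, mean, std, min, quartiles, max in one call
```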
Exploratory Data Analysis (EDA) Tools
● Clustering and dimension reduction techniques
● Univariate visualization
● Bivariate visualizations
● Multivariate visualizations
● K-means Clustering is a clustering method in unsupervised learning (commonly used in
market segmentation, pattern recognition, and image compression)
● Predictive models, such as linear regression
● Python: An interpreted, object-oriented programming language with dynamic semantics. Its high-level, built-in
data structures, combined with dynamic typing and dynamic binding, make it very attractive for rapid
application development, as well as for use as a scripting or glue language to connect existing components
together. Python and EDA can be used together to identify missing values in a data set, which is important so
you can decide how to handle missing values for machine learning.
● R: An open-source programming language and free software environment for statistical computing and
graphics supported by the R Foundation for Statistical Computing. The R language is widely used among
statisticians in data science in developing statistical observations and data analysis.
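The missing-value check mentioned under Python above might look like the following in pandas. It is a sketch on invented data: the column names and the two handling options shown (dropping rows, or filling with a median or a placeholder) are illustrative choices, not a recommendation from the slides.

```python
import pandas as pd

# Hypothetical data with gaps; None becomes a missing value in pandas.
df = pd.DataFrame({
    "price": [80.0, None, 100.0, 120.0],
    "room_type": ["private", "shared", None, "private"],
    "reviews": [12, 0, 3, 7],
})

# Identify: how many values are missing per column, and what share of rows.
missing_count = df.isna().sum()
missing_share = df.isna().mean()

# Option 1: drop every row that has any missing value.
dropped = df.dropna()

# Option 2: fill numeric gaps with the median and categorical gaps with a label.
filled = df.fillna({"price": df["price"].median(), "room_type": "unknown"})

print(missing_count)
print(missing_share)
```

Which option is appropriate depends on how much data is missing and on the model that will consume it.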
EDA using Python Library
● Python libraries often used: pandas, numpy, matplotlib, and seaborn
● An easier option for doing EDA visually in Python: Sweetviz
● Sweetviz is an open-source Python library that automatically generates an EDA report with
ready-made visuals from just a few lines of code.
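A minimal sketch of the Sweetviz workflow is shown below, assuming the package is installed (`pip install sweetviz`). The helper name `build_report`, the output filename, and the sample data are assumptions for illustration; the report call is left commented out so the snippet runs even without the package.

```python
import pandas as pd


def build_report(frame: pd.DataFrame, path: str = "eda_report.html") -> str:
    """Write a Sweetviz HTML report for `frame` and return the file path."""
    import sweetviz as sv  # third-party package, imported only when needed

    report = sv.analyze(frame)                           # profile every column
    report.show_html(filepath=path, open_browser=False)  # save without opening a browser
    return path


# Hypothetical data standing in for a real dataset.
df = pd.DataFrame({
    "price": [80, 95, 100, 120],
    "room_type": ["private", "shared", "private", "entire"],
})

# Uncomment once sweetviz is installed:
# build_report(df)
```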
EDA components (using Python libraries)
1. Acquiring & loading the dataset
2. Cleaning the dataset
3. Exploring & visualizing
Source: https://towardsdatascience.com/data-exploration-on-airbnb-singapore-01-40698c54cac3
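The three components can be sketched as three small functions. This is an illustration under stated assumptions: the inline CSV, its column names, and the cleaning rules (drop duplicate rows, drop rows without a price) are invented here and are not taken from the referenced article.

```python
import io

import pandas as pd

# Hypothetical raw data; in practice this would be a file path or URL.
RAW_CSV = """name,neighbourhood,price
Cozy room,Central,80
Cozy room,Central,80
Loft,East,
Studio,Central,120
Bunk,East,40
"""


def load(source) -> pd.DataFrame:
    """1. Acquire and load: read the CSV into a DataFrame."""
    return pd.read_csv(source)


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """2. Clean: remove duplicate rows and rows with no price."""
    out = frame.drop_duplicates().dropna(subset=["price"])
    return out.reset_index(drop=True)


def explore(frame: pd.DataFrame) -> pd.DataFrame:
    """3. Explore: listing count and mean price per neighbourhood."""
    return frame.groupby("neighbourhood")["price"].agg(["count", "mean"])


raw = load(io.StringIO(RAW_CSV))
tidy = clean(raw)
summary = explore(tidy)
print(summary)

# With matplotlib installed, summary["mean"].plot.bar() would chart the result.
```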
Exercise: Tuesday, 21 June 2022
● Form teams of 2. This assignment is done as a group and will be discussed at the next
online session.
● Open the dataset available at:
https://drive.google.com/file/d/13E0pK-O2GgY_6Ihfc8fZDO05Rbni7EcZ/view?usp=sharing
● Examine the data to find any insights worth knowing and present them as a dashboard.
A dashboard has a title and several visualizations, each showing different information,
and fits on a single desktop screen.
● You have 1 week. Each team will present its results, which count toward the
Assignment grade component.
References
https://builtin.com/data-science/EDA-python
https://towardsdatascience.com/3-python-libraries-for-effective-eda-that-you-might-have-missed-3320f48ff070
Thanks