1.Data Handling and Visualization Module 1 Slides

Module 1 provides an introduction to data visualization, focusing on data collection strategies, preparation, and visualization techniques. It emphasizes the importance of using multiple data collection methods, cleaning and labeling data, and utilizing tools like NumPy and pandas for data analysis. The module also covers the interaction with databases and the significance of data transformation and visualization libraries such as Matplotlib and ggplot2.

Uploaded by

varshinipd1345

Module 1

Introduction to Data Visualization

Module 1 Introduction to Data Visualization

Data collection

Data Collection Strategies

• There is no single best way; the decision depends on:
– What you need to know: numbers or stories
– Where the data reside: environment, files, people
– Resources and time available
– Complexity of the data to be collected
– Frequency of data collection
– Intended forms of data analysis

Rules for Collecting Data

• Use multiple data collection methods
• Use available data, but first find out:
– how the measures were defined
– how the data were collected and cleaned
– the extent of missing data
– how accuracy of the data was ensured
Data Collection Tools
• Participatory Methods
• Records and Secondary Data
• Observation
• Surveys and Interviews
• Focus Groups
• Diaries, Journals, Self-reported Checklists
• Expert Judgment
• Delphi Technique
• Other Tools

Data Preparation Basic Models

Data Preparation
• Data preparation is the process of preparing raw data so that it is
suitable for further processing and analysis.
• Key steps include collecting, cleaning, and labeling raw data into a
form suitable for machine learning (ML) algorithms and then
exploring and visualizing the data.
• Data preparation can take up to 80% of the time spent on an ML
project.
• Using specialized data preparation tools is important to optimize this
process.
• Data preparation follows a series of steps that starts with collecting
the right data, followed by cleaning, labeling, and then validation and
visualization.
1) Collect data
Collecting data is the process of assembling all the data you need for
ML.
2) Clean data
Cleaning data corrects errors and fills in missing data as a step to
ensure data quality.
3) Label data
Data labeling is the process of identifying raw data (images, text files,
videos, and so on) and adding one or more meaningful and informative
labels to provide context so an ML model can learn from it.
4) Validate and visualize
After data is cleaned and labeled, ML teams often explore the data to
make sure it is correct and ready for ML.
Visualizations like histograms, scatter plots, box and whisker plots, line
plots, and bar charts are all useful tools to confirm data is correct.
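The clean, label, and validate steps can be sketched in pandas. This is only an illustrative sketch: the column names, the values, the median fill, and the `senior` label are all assumptions made up for the example, not part of the module.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one missing age and one impossible (negative) salary.
raw = pd.DataFrame({
    "age": [25, 32, np.nan, 41],
    "salary": [50000, -10, 62000, 58000],
})

# Clean: treat the impossible salary as missing, then fill gaps with the column median.
clean = raw.copy()
clean["salary"] = clean["salary"].mask(clean["salary"] < 0)
clean = clean.fillna(clean.median())

# Label: attach a simple (made-up) category a model could learn from.
clean["senior"] = clean["age"] >= 35

# Validate: there should be no missing values left, and ranges should look plausible.
print(clean.isna().sum().sum())
print(clean.describe())
```

Filling with the median is just one possible choice; the right strategy depends on the data and the task.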

Overview of Data Visualization


Defining visualization (vis)

• Computer-based visualization systems provide visual
representations of datasets designed to help people carry
out tasks more effectively.

• Visualization is suitable when there is a need to augment human capabilities rather than replace people with computational decision-making methods.
• external representation: replace cognition with perception

[Cerebral: Visualizing Multiple Experimental Conditions on a Graph with Biological Context. Barsky, Munzner, Gardy, and Kincaid. IEEE TVCG (Proc. InfoVis) 14(6):1253-1260, 2008.]

Data Abstraction

Data abstraction: Three operations

• translate from domain-specific language to generic visualization language
• identify dataset type(s), attribute types
• identify cardinality
– how many items in the dataset?
– what is cardinality of each attribute?
• number of levels for categorical data
• range for quantitative data
• consider whether to transform data
– guided by understanding of task
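As a rough illustration of the cardinality questions above, the following pandas sketch counts items, levels, and range. The dataset and its column names (`species`, `weight_kg`) are hypothetical.

```python
import pandas as pd

# Hypothetical dataset with one categorical and one quantitative attribute.
df = pd.DataFrame({
    "species": ["cat", "dog", "cat", "bird"],
    "weight_kg": [4.2, 11.0, 3.9, 0.1],
})

n_items = len(df)                    # how many items in the dataset?
n_levels = df["species"].nunique()   # number of levels for the categorical attribute
value_range = (df["weight_kg"].min(), df["weight_kg"].max())  # range for the quantitative attribute

print(n_items, n_levels, value_range)
```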

Task Abstraction
Task abstraction: Actions and targets
• very high-level pattern: {action, target} pairs
– discover distribution
– compare trends
– locate outliers
– browse topology
• actions
– analyze
• high-level choices
– search
• find a known/unknown item
– query
• find out about characteristics of item
• targets
– what is being acted on
Actions: Analyze
• consume
– discover vs present
• classic split
• aka explore vs explain
– enjoy
• newcomer
• aka casual, social

• produce
– annotate, record
– derive
• crucial design choice

Actions: Search
• what does user know?
– target, location
• lookup
– ex: word in dictionary
• alphabetical order

• locate
– ex: keys in your house
– ex: node in network
• browse
– ex: books in bookstore
• explore
– ex: find cool neighborhood in new city

Actions: Query
• how much of the data matters?
– one: identify
– some: compare
– all: summarize


Analysis: Four Levels for Validation

Analysis framework: Four levels, three questions
• domain situation
– who are the target users?
• abstraction
– translate from specifics of domain to vocabulary of vis
• what is shown? data abstraction
• why is the user looking at it? task abstraction
• idiom
– how is it shown?
• visual encoding idiom: how to draw
• interaction idiom: how to manipulate
• algorithm
– efficient computation
[Figure: the four nested levels: domain, abstraction, idiom, algorithm]
[A Multi-Level Typology of Abstract Visualization Tasks. Brehmer and Munzner. IEEE TVCG 19(12):2376-2385, 2013 (Proc. InfoVis 2013).]
[A Nested Model of Visualization Design and Validation. Munzner. IEEE TVCG 15(6):921-928, 2009 (Proc. InfoVis 2009).]
Nested model
• downstream: cascading effects

[A Nested Model of Visualization Design and Validation. Munzner. IEEE TVCG 15(6):921-928, 2009 (Proc. InfoVis 2009).]

Interacting with Databases

Interacting with Databases
• In many applications, data rarely comes from text files, which are a fairly
inefficient way to store large amounts of data.
• SQL-based relational databases (such as SQL Server, PostgreSQL, and MySQL)
are in wide use, and many alternative non-SQL (so-called NoSQL) databases have
become quite popular.
• The choice of database is usually dependent on the performance, data integrity,
and scalability needs of an application.
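A minimal sketch of pulling query results from a relational database into pandas. An in-memory SQLite database stands in for a real server here, and the `sales` table and its columns are invented for the example.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database standing in for a real server; table and columns are made up.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Alice", 120.0), ("Bob", 75.5), ("Alice", 30.0)],
)
con.commit()

# Let the database do the aggregation, then load the result as a DataFrame.
totals = pd.read_sql_query(
    "SELECT customer, SUM(amount) AS total FROM sales "
    "GROUP BY customer ORDER BY customer",
    con,
)
con.close()
print(totals)
```

Other databases such as PostgreSQL or MySQL would need their own driver or connection object, but the query-then-DataFrame pattern is similar.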

Data Cleaning and Preparation
Dirty Data

• The Statistics View:
– There is a process that produces data
– Any dataset is a sample of the output of that process
– Results are probabilistic
– You can correct bias in your sample
• The Database View:
– I got my hands on this data set
– Some of the values are missing, corrupted, wrong, duplicated
– Results are absolute (relational model)
– You get a better answer by improving the quality of the values in your dataset
• The Domain Expert’s View:
– This data doesn’t look right
– This answer doesn’t look right
– What happened?
• The Data Scientist’s View:
– Some combination of all of the above

Data Cleaning Makes Everything Okay?

The appearance of a hole in the earth's ozone
layer over Antarctica, first detected in 1976,
was so unexpected that scientists didn't pay
attention to what their instruments were telling
them; they thought their instruments were
malfunctioning.
National Center for Atmospheric Research

In fact, the data were rejected as unreasonable by data quality control algorithms.
How Clean is “clean-enough”?
• How much cleaning is too much?
• Answers are likely to be:
• domain-specific
• data source-specific
• application-specific
• user-specific
• all of the above?
How to split between shared and application-specific cleaning?
• Data in the real world is dirty: lots of potentially incorrect data, e.g., faulty instruments, human
or computer error, transmission error
– incomplete: lacking attribute values, lacking certain attributes of interest, or containing only
aggregate data
• e.g., Occupation=“ ” (missing data)
– noisy: containing noise, errors, or outliers
• e.g., Salary=“−10” (an error)
– inconsistent: containing discrepancies in codes or names, e.g.,
• Age=“42”, Birthday=“03/07/2010”
• Was rating “1, 2, 3”, now rating “A, B, C”
• discrepancy between duplicate records
– intentional (e.g., disguised missing data)
• Jan. 1 as everyone’s birthday?

Handling Missing Data

Incomplete (Missing) Data
• Data is not always available
• E.g., many tuples have no recorded value for several
attributes, such as customer income in sales data
• Missing data may be due to
– equipment malfunction
– data that was inconsistent with other recorded data and thus deleted
– data not entered due to misunderstanding
– certain data not being considered important at the
time of entry
– history or changes of the data not being recorded
• Missing data may need to be inferred
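A short pandas sketch of detecting and handling missing values. The `sales` table is hypothetical, and filling with the column mean is only one simple way to infer a value.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; customer income is missing for some tuples.
sales = pd.DataFrame({
    "customer": ["Amy", "Bob", "Chris", "David"],
    "income": [52000.0, np.nan, 61000.0, np.nan],
})

missing_mask = sales["income"].isna()        # which values are missing?
n_missing = int(missing_mask.sum())          # how many?

dropped = sales.dropna(subset=["income"])    # option 1: discard incomplete rows
filled = sales.assign(                       # option 2: infer a value (here, the mean)
    income=sales["income"].fillna(sales["income"].mean())
)
print(n_missing, len(dropped))
print(filled)
```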

Data Transformation

• A function that maps the entire set of values of a given attribute to a new set of replacement
values s.t. each old value can be identified with one of the new values
• Methods
• Smoothing: Remove noise from data
• Attribute/feature construction
• New attributes constructed from the given ones
• Aggregation: Summarization, data cube construction
• Normalization: Scaled to fall within a smaller, specified range
• min-max normalization
• z-score normalization
• normalization by decimal scaling
• Discretization: Concept hierarchy climbing
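The three normalization methods listed above can be sketched with NumPy. The attribute values are made up, and the z-score here uses the population standard deviation (NumPy's default), which is an assumption about the intended definition.

```python
import numpy as np

values = np.array([200.0, 300.0, 400.0, 600.0, 986.0])  # made-up attribute values

# Min-max normalization to the range [0, 1].
min_max = (values - values.min()) / (values.max() - values.min())

# Z-score normalization: subtract the mean, divide by the (population) standard deviation.
z_score = (values - values.mean()) / values.std()

# Decimal scaling: divide by 10**j, with j the smallest integer such that max(|v'|) < 1.
j = int(np.floor(np.log10(np.abs(values).max()))) + 1
decimal_scaled = values / 10**j

print(min_max)
print(z_score)
print(j, decimal_scaled)
```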

Python Libraries: NumPy

NumPy

• Stands for Numerical Python

• Is the fundamental package required for high-performance computing and
data analysis
• NumPy is so important for numerical computation in Python because it
is designed for efficiency on large arrays of data.
• It provides
– ndarray for creating multidimensional arrays
• Internally stores data in a contiguous block of memory, independent of other built-in
Python objects, and uses much less memory than built-in Python sequences.
– Standard math functions for fast operations on entire arrays of data without having
to write loops
• NumPy arrays are important because they let you express batch operations
on data without writing any for loops. This is called vectorization.
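To make vectorization concrete, here is a small sketch comparing an explicit loop with the equivalent whole-array expression. The `prices` and `quantities` arrays are invented for the example.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])   # made-up data
quantities = np.array([3, 1, 2])

# Loop version: one Python-level operation per element.
totals_loop = [p * q for p, q in zip(prices, quantities)]

# Vectorized version: one expression over whole arrays, no explicit loop.
totals_vec = prices * quantities
print(totals_vec, totals_vec.sum())
```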
NumPy ndarray vs list
• One of the key features of NumPy is its N-dimensional array object, or
ndarray, which is a fast, flexible container for large datasets in Python.

• Whenever you see “array,” “NumPy array,” or “ndarray” in the text, with few
exceptions they all refer to the same thing: the ndarray object.

• NumPy-based algorithms are generally 10 to 100 times faster (or more) than
their pure Python counterparts and use significantly less memory.
import numpy as np
my_arr = np.arange(1000000)
my_list = list(range(1000000))
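One way to check the speed claim on your own machine is to time the same operation on both containers. This is a sketch using the standard-library `timeit` module; the choice of operation (doubling every element) and the repeat count are assumptions, and the measured ratio will vary by machine.

```python
import timeit
import numpy as np

my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

# Double every element: vectorized array multiply vs a list comprehension.
t_arr = timeit.timeit(lambda: my_arr * 2, number=10)
t_list = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)
print(f"ndarray: {t_arr:.4f} s, list: {t_list:.4f} s")
```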

pandas
Why pandas?
• One of the most popular libraries that data scientists use
• Labeled axes to avoid misalignment of data
• When merging two tables, some rows may differ
• Missing values or special values may need to be removed or replaced

         salary   Credit score
Alice     50000   700
Bob          NA   670
Chris     60000   NA
David    -99999   750
Ella      70000   685
Tom       45000   660

         height   Weight   Weight2   age   Gender
Amy      160      125      126       32    2
Bob      170      167      155       -1    1
Chris    168      143      150       28    1
David    190      182      NA        42    1
Ella     175      133      138       23    2
Frank    172      150      148       45    1
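A brief sketch of how labeled axes help here: rows are matched by name rather than by position, and a sentinel value can be turned into a proper missing value. The data is a small hypothetical subset in the spirit of the salary/credit-score example, and treating -99999 as "missing" is an assumption about what that value means.

```python
import numpy as np
import pandas as pd

# Two small tables indexed by name; the rows only partly overlap.
salary = pd.Series({"Alice": 50000, "David": -99999, "Ella": 70000})
credit = pd.Series({"Ella": 685, "Alice": 700, "Tom": 660})

# Combining aligns rows by label, not by position; labels missing on one side become NaN.
table = pd.DataFrame({"salary": salary, "credit_score": credit})

# Treat the sentinel -99999 as a missing value.
table["salary"] = table["salary"].replace(-99999, np.nan)
print(table)
```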

Overview
• Created by Wes McKinney in 2008, now maintained by many others.
– McKinney is also the author of one of the textbooks: Python for Data Analysis
• Powerful and productive Python data analysis and management library
• Panel Data System
– The name is derived from the term "panel data", an econometrics term for
data sets that include both time-series and cross-sectional data
• It's an open source product.

Overview - 2
• Python library that provides data analysis features similar to R,
MATLAB, and SAS
• Rich data structures and functions to make working with structured
data fast, easy and expressive.
• It is built on top of NumPy
• Key components provided by pandas:
– Series
– DataFrame
• From now on:
from pandas import Series, DataFrame
import pandas as pd

matplotlib
• Matplotlib is one of the most popular Python packages used for data
visualization.
• It is a cross-platform library for making 2D plots from data in arrays.
• Matplotlib is written in Python and makes use of NumPy, the numerical
mathematics extension of Python.
• It can also be used in Python and IPython shells, Jupyter notebooks, and web
application servers.
• Matplotlib has a procedural interface, pyplot (and the older pylab module, whose use is
now discouraged), which is designed to resemble MATLAB, a proprietary programming
language developed by MathWorks.
• Matplotlib along with NumPy can be considered as the open source equivalent of
MATLAB. Matplotlib was originally written by John D. Hunter in 2003.
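A minimal sketch of a 2D line plot built from NumPy arrays. The non-interactive `Agg` backend and the in-memory PNG buffer are choices made here so the example runs without a display; in a notebook you would normally just show the figure.

```python
import io

import matplotlib
matplotlib.use("Agg")            # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple 2D line plot")
ax.legend()

# Render to an in-memory PNG; pass a file name instead to write to disk.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```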

GGplot
ggplot2
• ggplot2: probably the most important visualization library in R.
• Enables most basic plot types.
• Implementation of the Grammar of Graphics (2010) by Hadley
Wickham, the guru of R.
• http://vita.had.co.nz/papers/layered-grammar.pdf
• The Grammar of Graphics is a philosophical outlook on exploratory
visualization expressed in Wilkinson, L., Anand, A., and Grossman, R.
(2005), “Graph-Theoretic Scagnostics”.
• http://papers.rgrossman.com/proc-094.pdf

Plotting figures and graphs with ggplot

• ggplot is the plotting library for tidyverse
• Powerful
• Flexible

• Follows the same conventions as the rest of tidyverse

• Data stored in tibbles
• Data is arranged in 'tidy' format
• Tibble is the first argument to each function

Code structure of a ggplot graph

• Start with a call to ggplot()
• Pass the tibble of data (normally via a pipe)
• Say which columns you want to use via a call to aes()

• Say which graphical representation (geometry) you want to use
– Points, lines, barplots etc.
• Customise labels, colours, annotations etc.


Introduction to pandas Data Structures

Series

• One-dimensional array-like object
• It contains an array of data (of any NumPy data type) with an associated
index. (Index labels can be strings, integers, or other data types.)
• By default, a Series gets an integer index from 0 to N, where N = size - 1

from pandas import Series, DataFrame
import pandas as pd

obj = Series([4, 7, -5, 3])
print(obj)
#Output
0    4
1    7
2   -5
3    3
dtype: int64

print(obj.index)
#Output
RangeIndex(start=0, stop=4, step=1)

print(obj.values)
#Output
[ 4  7 -5  3]
Series – referencing elements

obj2 = Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
print(obj2)
#Output
d    4
b    7
a   -5
c    3
dtype: int64

print(obj2.index)
#Output
Index(['d', 'b', 'a', 'c'], dtype='object')

print(obj2.values)
#Output
[ 4  7 -5  3]

print(obj2['a'])
#Output
-5

obj2['d'] = 10
print(obj2[['d', 'c', 'a']])
#Output
d    10
c     3
a    -5
dtype: int64

print(obj2[:2])
#Output
d    10
b     7
dtype: int64

print(obj2.a)
#Output
-5
Series – array/dict operations

Can be thought of as a dict.
Can be constructed from a dict directly.

obj3 = Series({'d': 4, 'b': 7, 'a': -5, 'c': 3})
print(obj3)
#output
d    4
b    7
a   -5
c    3
dtype: int64

NumPy array operations can also be applied, which will preserve the index-value link.

obj4 = obj3[obj3 > 0]
print(obj4)
#output
d    4
b    7
c    3
dtype: int64

print(obj3**2)
#output
d    16
b    49
a    25
c     9
dtype: int64

print('b' in obj3)
#output
True
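As a further sketch of the index-value link, arithmetic and NumPy functions applied to a Series return a new Series carrying the same labels, and `in` tests the index labels as it would the keys of a dict. The names `doubled`, `absolute`, `has_b`, and `has_z` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

obj3 = pd.Series({"d": 4, "b": 7, "a": -5, "c": 3})

doubled = obj3 * 2          # arithmetic keeps each value attached to its label
absolute = np.abs(obj3)     # NumPy functions also return a Series with the same index
has_b = "b" in obj3         # dict-like membership test on the index labels
has_z = "z" in obj3
print(absolute)
```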
