UNIT 1 DVT
Introduction and Data Foundation: Basics - Relationship between Visualization and Other Fields
Pseudocode Conventions
Data Foundation
Types of Data
Data Pre-processing
Data Sets
Introduction:
What Is Visualization?
It helps you make data more meaningful by bringing it to life visually.
Data Visualization Techniques: There are various data visualization techniques and tools available, and the choice of
technique depends on the nature of the data and the message you want to convey.
Bar charts: Bar charts are used to compare categories of data; vertical and horizontal bar charts are the common types. They display the
data as bars of different heights.
Line charts: Line charts are great for showing trends over time. They connect data points with lines, making it easy to
visualize changes and patterns.
Pie charts: Pie charts display the parts of a whole. They are suitable for showing the composition of a data set.
Scatter plots: These are used to show the relationship between two variables. Each data point is represented by a dot.
Other techniques include heat maps, histograms, area charts, bubble charts, tree maps, and 3D visualizations.
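As a rough illustration, here is a minimal sketch (assuming Python with matplotlib; the values are made up) that draws a bar chart, a line chart, and a scatter plot side by side:

import matplotlib.pyplot as plt

categories = ["A", "B", "C"]              # hypothetical category labels
counts = [12, 7, 15]                      # hypothetical values per category
years = [2019, 2020, 2021, 2022]
revenue = [3.1, 3.8, 2.9, 4.4]            # hypothetical trend over time
x = [1.2, 2.5, 3.1, 4.8, 5.0]             # two related variables for the scatter plot
y = [2.0, 3.9, 3.5, 6.1, 6.4]

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].bar(categories, counts)           # bar chart: compare categories
axes[0].set_title("Bar chart")
axes[1].plot(years, revenue, marker="o")  # line chart: trend over time
axes[1].set_title("Line chart")
axes[2].scatter(x, y)                     # scatter plot: relationship between two variables
axes[2].set_title("Scatter plot")
plt.tight_layout()
plt.show()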
Visualization in Everyday Life:
a train and subway map with times used for determining train arrivals and departures;
a weather chart showing the movement of a storm front that might influence your weekend activities;
a graph of stock market activities that might indicate an upswing (or downturn) in the economy;
a plot comparing the effectiveness of your pain killer to that of the leading brand;
a mechanical and civil engineering rotary bridge design and systems analysis;
the study of actuarial data for confirming and guiding quantitative analysis;
Decision making
Comparative analysis
Importance
Data visualization allows business users to gain insight into their vast amounts of data. It helps them recognize
new patterns and errors in the data, and making sense of these patterns lets them focus on areas that indicate
red flags or progress. This process, in turn, drives the business ahead. Because of the way the human brain processes
information, using charts or graphs to visualize large amounts of complex data is easier than poring over spreadsheets
or reports. Data visualization can also identify areas that need attention or improvement.
Historically, visualization was considered a subfield of computer graphics, primarily because visualization uses graphics to display
information via images. Visualization applies graphical techniques to generate visual displays of data; here, graphics
is used as the communication medium. In all visualizations, one can clearly see the use of the graphics primitives
(points, lines, areas, and volumes). While computer graphics can be used to define and generate the displays that are
used to communicate the information, the sources of data and the way users interact with and perceive the data are all
important components to understand when presenting information. A secondary application of computer graphics is
in art and entertainment, with video games, cartoons, advertisements, and movie special effects as typical examples.
Visualization, on the other hand, does not emphasize visual realism as much as the effective communication of
information. Many types of visualizations do not deal with physical objects, and those that do are often
communicating attributes of the objects that would normally not be visible, such as material stress or fluid flow
patterns. Thus, while computer graphics and visualization share many concepts, tools, and techniques, the underlying
models and goals are fundamentally different. Computer graphics provides the tools that display the
visualizations discussed here, including the graphics-programming language and more.
Although during the 1990s and early 2000s the visualization community differentiated between scientific visualization
and information visualization, we do not. Both provide representations of data; however, the data sets are most often
different. Scientific visualization is frequently considered to focus on the visual display of spatial data associated
with scientific processes, such as the bonding of molecules in computational chemistry. Information visualization
examines the development of visual metaphors for non-inherently spatial data, such as the exploration of text-based
document databases.
Visualization is a powerful tool that helps to represent data and information in a graphical format.
It has connections to many fields, including computer science, data analytics, business intelligence, and more.
Data science and analytics: Visualization can be used to explore data, identify patterns and trends, and communicate the results of an
analysis to stakeholders.
Business intelligence: Visualization is a key component of business intelligence (BI). It can be used to create dashboards
and reports that provide insights into business performance.
Engineering: Visualization is used in engineering to design systems. It is used to create models of physical systems, simulate their
behavior, and identify potential problems.
Medicine: Visualization is used in medicine to diagnose diseases, plan surgeries, and track patient progress. It can be used to
create images of the body, such as X-rays, MRI scans, and ultrasounds.
Visualization is also useful in education and art; in education, for example, a teacher can use an animation to explain a process such as photosynthesis.
The visualization process is the sequence of steps involved in creating a visual representation of data.
Data visualization is the practice of translating information into a visual context, such as a map or graph, to make
data easier for the human brain to understand and pull insights from. The main goal of data visualization is to make
it easier to identify patterns, trends, and outliers in large data sets. The first step toward good data visualization is to
identify the problem you're trying to solve. More data isn't always better: what you need is the right data for the
right question, and choosing the best pieces of the puzzle to highlight relies on a solid understanding of what you
want to measure.
• Collect the relevant data from various sources, such as databases, sensors, and surveys.
Clean and prepare the data:
After you’ve identified your purpose and audience, you know what kind of data you need to
summarize. Armed with that knowledge, you can pick an appropriate chart type.
Choose the appropriate visualization tools and techniques based on the nature of your data and your objectives.
Design the visualization:
• Whenever possible, try to make sure important information is communicated in a way that doesn’t rely entirely
on color. This will allow your data to be understood by the broadest possible audience.
• Color is an important component of data visualization and is used extensively as a way to represent information
within a graphic.
Instead of just presenting your data, connect it to a broader context. Make sure it’s easy to interpret the information
quickly and apply it in the context of a particular challenge.
Finally, and perhaps most importantly, make sure the data you’re sharing is actionable. The best visualizations can
turn insights into action by allowing your audience to use data to inform their strategies and business decisions.
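To tie these steps together, here is a minimal, hedged sketch of the process, assuming Python with pandas and matplotlib; the file name sales.csv and the columns month and revenue are hypothetical:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv")                       # collect data relevant to the question (hypothetical file)
df = df.dropna(subset=["month", "revenue"])         # clean and prepare: drop incomplete records
ax = df.plot(x="month", y="revenue", kind="line", marker="o")   # a line chart suits a trend over time
ax.set_title("Monthly revenue")                     # design: label the chart so it is easy to interpret
ax.set_ylabel("Revenue")
plt.savefig("monthly_revenue.png", dpi=150)         # share an annotated, actionable view with the audience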
The Visualization Process:
The process of visualizing directs your subconscious to be aware of the end goal you have in mind.
The designer of a new visualization most often begins with an analysis of the type of data available for display and of
the type of information the viewer hopes to extract from or convey with the display.
The data can come from a wide variety of sources and may be simple or complex in structure.
The computer graphics pipeline that produces such displays consists of the following stages:
Modeling. A three-dimensional model, consisting of planar polygons defined by vertices and surface properties, is
generated using a world coordinate system.
Viewing. A virtual camera is defined at a location in world coordinates, along with a direction and orientation
(generally given as vectors). All vertices are transformed into a viewing coordinate system based on the camera
parameters.
Clipping. By specifying the bounds of the desired image (usually given by corner positions on a plane of projection
placed in front of the camera), objects out of view can be removed, and those that are partially visible can be clipped.
Objects may be transformed into normalized viewing coordinates to simplify the clipping process. Clipping can
actually be performed at many different stages of the pipeline.
Hidden surface removal. Polygons facing away from the camera, or those obscured by others, are removed or
clipped. This process may be integrated into the projection process.
Projection. Three-dimensional polygons are projected onto the two-dimensional plane of projection, usually using a
perspective transformation. The results may be in a normalized 2D coordinate system or in device/screen coordinates.
Rendering. The actual color of the pixels associated with a visible polygon depends on a number of factors,
including the material properties being synthesized (base color, texture, surface roughness, shininess), the type(s),
location(s), color, and intensity of the light source(s), the degree of occlusion from direct light exposure, and the
amount and color of light being reflected off of other objects onto the polygon.
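As an illustration of the projection stage only, here is a small sketch (assuming Python with NumPy, a camera at the origin looking down the negative z-axis, and focal length f); it is not a full pipeline implementation:

import numpy as np

def project_perspective(vertices, f=1.0):
    """Project Nx3 viewing-space vertices onto the z = -f image plane."""
    v = np.asarray(vertices, dtype=float)
    x, y, z = v[:, 0], v[:, 1], v[:, 2]
    # Perspective divide: points farther from the camera shrink toward the center.
    return np.column_stack((f * x / -z, f * y / -z))

triangle = [(0.0, 1.0, -2.0), (-1.0, -1.0, -3.0), (1.0, -1.0, -4.0)]  # made-up viewing coordinates
print(project_perspective(triangle))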
The data/information visualization pipeline has some similarities to the graphics pipeline, at least on an abstract level.
The stages of this pipeline are as follows:
Data modelling. The data to be visualized, whether from a file or a database, has to be structured to facilitate its
visualization. The name, type, range, and semantics of each attribute or field of a data record must be available in a
format that ensures rapid access and easy modification
Data selection. Similar to clipping, data selection involves identifying the subset of the data that will potentially be
visualized. This can occur totally under user control or via algorithmic methods, such as cycling through time slices
or automatically detecting features of potential interest to the user.
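A minimal sketch of data modelling and data selection, assuming Python with pandas; the field names are hypothetical:

import pandas as pd

records = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-01-01", "2021-06-01", "2022-01-01"]),
    "station": ["A", "B", "A"],
    "temperature": [3.2, 18.5, 1.1],
})
records["station"] = records["station"].astype("category")   # data modelling: name, type, and semantics of each field
subset = records[records["timestamp"].dt.year == 2021]       # data selection: one time slice of the data
print(subset)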
The knowledge discovery (also called data mining) field has its own pipeline.Note that the visualization pipeline can
be overlaid on this knowledge discovery (KD) pipeline.
Data: In the KD pipeline there is more focus on the data itself, as the graphics and visualization processes often assume that
the data is already structured to facilitate its display.
Data integration, cleaning, warehousing, and selection: These involve identifying the various data sets that will potentially
be analyzed. Again, the user may participate in this step. This can involve filtering, sampling, subsetting, aggregating,
and other techniques that help curate and manage the data for the data mining step.
Data mining: The heart of the KD pipeline is algorithmically analyzing the data to produce a model.
Pattern evaluation: The resulting model or models must be evaluated to determine their robustness, stability,
precision, and accuracy.
Rendering or visualization: The specific results must be presented to the user. It does not matter whether we think
of this as part of the graphics or visualization pipelines.
Interactive visualization can be used at every step of the KD pipeline. One can think of this as computational
steering.
Pseudocode Conventions:
In our pseudocode, we aim to convey the essence of the algorithms at hand, while leaving out details required for user
interaction, graphics nuances, and data management.
We assume that the following global variables and functions exist in the environment of the pseudocode:
data—The working data table. This data table is assumed to contain only numeric values. In practice, dimensions of
the original data table that contain non-numeric values must be somehow converted to numeric values. When
visualizing a subset of the entire original data table, the working data table is assumed to be the subset.
m—The number of dimensions (columns) in the working data table. Dimensions are typically iterated over using j as the running dimension index.
n—The number of records (rows) in the working data table. Records are typically iterated over using i as the running
record index.
Normalize(record, dimension, min, max)—A function that maps the value for the given record and dimension in the working data table to a value between min
and max, or between zero and one if min and max are not specified.
The normalization is typically linear and local to a single dimension. However, in practice, code must be structured
such that various kinds of normalization could be used (logarithmic or square root, for example) either locally (using
the bounds of the current dimension), globally (using the bounds of all dimensions), or local to the active dimensions
(using the bounds of the dimensions being displayed). Also, in practice, one must accommodate multiple kinds of
normalization within a single visualization. For example, a scatterplot may require a linear normalization for the x-
axis and a logarithmic normalization for the y-axis.
Color(color)—A function that sets the color state of the graphics environment to the specified color (whose type is
assumed to be an integer containing RGB values).
MapColor(record, dimension)—A function that sets the color state of the graphics environment to be the color
derived from applying the global color map to the normalized value of the given record and dimension in the working
data table.
Circle(x, y, radius)—A function that fills a circle centered at the given (x, y)-location, with the given radius, with the
color of the color state of the graphics environment. The plotting space for all visualizations is the unit square. In
practice, this function must map the unit square to a square in pixel coordinates.
Polyline(xs, ys)—A function that draws a polyline (many connected line segments) from the given arrays of x and y
coordinates.
Polygon(xs, ys)—A function that fills the polygon defined by the given arrays of x- and y-coordinates with the color
of the current color state.
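One possible (not definitive) Python realization of these conventions, assuming matplotlib and NumPy, with a random working data table standing in for real data:

import numpy as np
import matplotlib.pyplot as plt

data = np.random.rand(50, 4)          # working data table: numeric values only (stand-in data)
n = data.shape[0]                     # number of records
ax = plt.gca()
ax.set_xlim(0, 1); ax.set_ylim(0, 1)  # the plotting space is the unit square
_color = "black"                      # color state of the "graphics environment"

def Normalize(record, dimension, lo=0.0, hi=1.0):
    col = data[:, dimension]          # linear normalization, local to one dimension
    t = (data[record, dimension] - col.min()) / (col.max() - col.min())
    return lo + t * (hi - lo)

def Color(color):
    global _color                     # set the color state
    _color = color

def Circle(x, y, radius):
    ax.add_patch(plt.Circle((x, y), radius, color=_color))   # filled circle in unit-square coordinates

def Polyline(xs, ys):
    ax.plot(xs, ys, color=_color)     # connected line segments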
The Scatterplot:
A scatterplot is a type of data display that shows the relationship between two numerical variables. Each member of the
data set gets plotted as a point whose (x, y) coordinates relate to its values for the two variables.
A scatter plot is composed of a horizontal axis containing the measured values of one variable (independent variable)
and a vertical axis representing the measurements of the other variable (dependent variable). The purpose of the scatter
plot is to display what happens to one variable when another variable is changed.
A scatter plot is a type of graph.
A scatter plot can be defined as a graph containing bivariate data in the form of plotted points, which allows viewers
to see a correlation between plotted points.
Bivariate data is simply data that has been collected for two different variables.
Scatter plots are sometimes called scatter diagrams. Other chart types that may be more familiar are line graphs, bar
graphs, box-and-whisker plots, or even picture graphs.
EX:
A simple scatter plot can be used to see the difference in outdoor temperatures compared to ice cream sales. The two
variables would be outside temperature and ice cream sales. This data could be collected and organized into a table.
Once the data is organized into a table, it can be turned into ordered pairs. The x-value will always be
the independent variable while the y-value will always be the dependent variable.
(50, 3), (65, 18), (70, 54), (85, 75), (100, 98)
Now that points have been created, they can be plotted to see what the scatter plot looks like. The independent
variable will go along the x-axis and the dependent variable will go along the y-axis.
Scatter plot showing Outside Temperature versus Ice Cream Cone Sales.
Being able to visualize the relationship between bivariate data gives us a lot of information.
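A small sketch that plots the five ordered pairs from the example, assuming Python with matplotlib:

import matplotlib.pyplot as plt

temperature = [50, 65, 70, 85, 100]    # independent variable, placed on the x-axis
sales = [3, 18, 54, 75, 98]            # dependent variable, placed on the y-axis

plt.scatter(temperature, sales)
plt.xlabel("Outside temperature")
plt.ylabel("Ice cream cone sales")
plt.title("Outside Temperature versus Ice Cream Cone Sales")
plt.show()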
Data Foundations:
A data foundation brings the varied streams of data an organization has into a truly unified, interoperable, and
accessible system.
Any business analytics or decision making will depend upon this data foundation, which has become the
organizational source of truth.
Every visualization starts with the data that is to be displayed, so a first step in addressing the design of a visualization is
to examine the characteristics of that data.
Data comes from many sources; it can be gathered from sensors or surveys, or it can be generated by simulations and
computations.
Data can be raw (untreated), or it can be derived from raw data via some process, such as smoothing, noise removal,
scaling, or interpolation. It also can have a wide range of characteristics and structures.
An independent variable iv_i is one whose value is not controlled or affected by another variable, such as the time
variable in a time-series data set.
A dependent variable dv_j is one whose value is affected by a variation in one or more associated independent
variables. Temperature for a region would be considered a dependent variable, as its value could be affected by
variables such as date, time, or location. Thus we can formally represent a record as
record = (iv_1, iv_2, ..., iv_mi, dv_1, dv_2, ..., dv_md),
where mi is the number of independent variables and md is the number of dependent variables in the record.
Types of Data:
1. Ordinal (numeric)
2. Nominal (nonnumeric)
Nominal values fall into three subtypes:
categorical—a value selected from a finite (often short) list of possibilities (e.g., red, blue, green);
ranked—a categorical variable that has an implied ordering (e.g., small, medium, large);
arbitrary—a variable with a potentially infinite range of values with no implied ordering (e.g., addresses).
Another useful characteristic is whether a distance metric exists, with which distances can be computed between different records. This measure is clearly
present in all ordinal variables, but is generally not found in nominal variables.
A related characteristic is the existence of an absolute zero, or fixed lowest value. This is useful for differentiating types of ordinal variables. A variable such as weight possesses an
absolute zero, while bank balance does not. A variable possesses an absolute zero if it makes sense to apply all four
mathematical operations (+, −, ×, ÷) to it [129].
Data sets have structure, both in terms of the means of representation (syntax ), and the types of interrelationships
within a given record and between records (semantics).
Scalar values, such as the cost of an item or the age of an individual, are often the focus for analysis and visualization.
Multiple variables within a single record can represent a composite data item.
For example, a point in a two-dimensional flow field might be represented by a pair of values, such as a displacement
in x and y. This pair, and any such composition, is referred to as a vector.
While each component of a vector might be examined individually, it is most common to treat the vector as a whole.
Scalars and vectors are simple variants on a more general structure known as a tensor.
A tensor is defined by its rank and by the dimensionality of the space within which it is defined. It is generally
represented as an array or matrix.
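A brief sketch of the scalar / vector / tensor distinction using NumPy arrays (the values are illustrative only):

import numpy as np

age = 42.0                               # scalar: a single value
displacement = np.array([0.8, -0.3])     # vector: (dx, dy) at a point in a 2D flow field
stress = np.array([[2.0, 0.5],           # rank-2 tensor in 2D: a 2x2 matrix
                   [0.5, 1.2]])
magnitude = np.linalg.norm(displacement) # vectors are usually treated as a whole, e.g., via their magnitude
print(age, magnitude, stress.shape)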
Geometric structure can commonly be found in data sets, especially those from scientific and engineering domains.
The simplest method of incorporating geometric structure in a data set is to have explicit coordinates for each data
record.
A grid can be used to organize graphic elements in relation to a page, in relation to other graphic elements on the
page, or in relation to other parts of the same graphic element or shape.
A grid is a set of intersecting horizontal and vertical lines defining columns and rows. Elements can be placed onto
the grid within these column and row lines.
Another important form of structure found within many data sets is that of topology. Examples of data sets and their typical structural characteristics include:
MRI (magnetic resonance imaging). Density (scalar), with three spatial attributes, 3D grid connectivity;
CFD (computational fluid dynamics). Three dimensions for displacement, with one temporal and three spatial
attributes, 3D grid connectivity (uniform or nonuniform);
CAD (computer-aided design). Three spatial attributes with edge and polygon connections, and surface properties;
Remote sensing. Multiple channels, with two or three spatial attributes, one temporal attribute, and grid connectivity;
Census. Multiple fields of all types, spatial attributes (e.g., addresses), temporal attribute, and connectivity implied by
similarities in fields;
Social Network. Nodes consisting of multiple fields of all types, with various connectivity attributes that could be
spatial, temporal, or dependent on other attributes, such as belonging to the same group or having some common
computed values.
Data Preprocessing
Data preprocessing is the process of transforming raw data into an understandable format.
Common preprocessing operations covered below include: metadata and statistics, handling missing values and data
cleansing, normalization, segmentation, sampling and interpolation, dimension reduction, mapping nominal dimensions
to numbers, smoothing and filtering, and raster-to-vector conversion.
Viewing raw data also often identifies problems in the data set, such as missing data, or outliers that may be the
result of errors in computation or input.
Depending on the type of data and the visualization techniques to be applied, however, some forms of preprocessing
might be necessary.
Metadata are data that describe other data. Thus, statistical metadata are data that describe statistical data.
Statistical metadata may also describe processes that collect, process, or produce statistical data;
Information regarding a data set of interest (its metadata) and statistical analysis can provide invaluable guidance in
preprocessing the data.
Metadata may provide information that can help in its interpretation, such as the format of individual fields within the
data records.
It may also contain the base reference point from which some of the data fields are measured, the units used in the
measurements, the symbol or number used to indicate a missing value and the resolution at which measurements were
acquired.
One of the realities of analyzing and visualizing “real” data sets is that they often are missing some data entries or
have erroneous entries.
Missing data may be caused by several reasons, including, for example, a malfunctioning sensor, a blank entry on a
survey, or an omission on the part of the person entering the data.
Erroneous data is most often caused by human error and can be difficult to detect.
In either case, the data analyst must choose a strategy for dealing with these common events.
Some of these strategies, specifically those that are commonly used in data visualization, are outlined below
Discard the bad record: This seemingly drastic measure, namely to throw away any data record containing a
missing or erroneous field, is actually one of the most commonly applied, since the quality of the remaining data
entries in that record may be in question.
Assign a sentinel value. Another popular strategy is to have a designated sentinel value for each variable in the data
set that can be assigned when the real value in a record is in question.
Assign the average value. A simple strategy for dealing with bad or missing data is to replace it with the average
value for that variable or dimension.
Assign value based on nearest neighbor. A better approximation for a substitute value is to find the record that has
the highest similarity with the record in question, based on analyzing the differences in all other variables. The basic
idea here is that if record A is missing an entry for variable i, and record B is closer than any other record to A
without considering variable i, then using the value of variable i from record B as a substitute in A is a reasonable
assumption.
Compute a substitute value. Researchers in multivariate statistics have dedicated a significant amount of energy to
developing methods for generating values to replace missing or erroneous data. The process, known as imputation,
seeks to find values that have high statistical confidence.
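The strategies above might be sketched as follows, assuming Python with pandas and scikit-learn (the column names are hypothetical, and KNNImputer merely stands in for a nearest-neighbor substitution):

import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

df = pd.DataFrame({"age": [23, np.nan, 41, 35], "income": [48, 52, np.nan, 61]})

discarded = df.dropna()                           # discard the bad record
sentinel = df.fillna(-999)                        # assign a sentinel value
averaged = df.fillna(df.mean(numeric_only=True))  # assign the average value
nearest = pd.DataFrame(KNNImputer(n_neighbors=1).fit_transform(df),
                       columns=df.columns)        # assign value based on nearest neighbor
print(nearest)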
4. Normalization:
Normalization is the process of transforming a data set so that the results satisfy a particular statistical property.
A simple example of this is to transform the range of values a particular variable assumes so that all numbers fall
within the range of 0.0 to 1.0.
Other forms of normalization convert the data such that each dimension has a common mean and standard deviation.
Normalization is a useful operation, since it allows us to compare seemingly unrelated variables. To display data graphically,
we also need to convert the data range to be compatible with the graphical attribute range.
For example, if dmin and dmax are the minimum and maximum values for a particular data variable, we can
normalize the values to the range of 0.0 to 1.0 using the formula
dnormalized = (doriginal − dmin)/(dmax − dmin).
If the data has a highly non-linear distribution, a linear normalization will map most values to the same or close-by
values. In this case, it may be more appropriate to perform a non-linear normalization, such as a square root mapping.
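A small sketch of this normalization, assuming Python with NumPy, including a square-root variant for skewed data:

import numpy as np

def normalize(d, nonlinear=False):
    d = np.asarray(d, dtype=float)
    if nonlinear:
        d = np.sqrt(d - d.min())                   # compress a highly skewed distribution first
    return (d - d.min()) / (d.max() - d.min())     # d_normalized = (d - d_min) / (d_max - d_min)

print(normalize([10, 12, 15, 400]))                # linear: most values crowd near 0
print(normalize([10, 12, 15, 400], True))          # square-root mapping spreads them out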
5. Segmentation:
In data preprocessing, segmentation is the process of dividing data into regions or categories that share a common
classification. (The same term is used in marketing for dividing a target market into groups of potential customers with similar needs and behaviors.)
In many situations, the data can be separated into contiguous regions, where each region corresponds to a particular
classification of the data .For example, an MRI data set might originally have 256 possible values for each data point,
and then be segmented into specific categories, such as bone, muscle, fat, and skin . Simple segmentation can be
performed by just mapping disjoint ranges of the data values to specific categories.
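A minimal sketch of such range-based segmentation, assuming Python with NumPy; the threshold values and tissue labels are purely illustrative:

import numpy as np

values = np.array([12, 80, 150, 230])                 # raw intensities in an 8-bit (0..255) MRI-like data set
bins = [50, 120, 200]                                  # hypothetical boundaries between categories
labels = np.array(["skin", "fat", "muscle", "bone"])
categories = labels[np.digitize(values, bins)]         # map disjoint value ranges to categories
print(categories)                                      # ['skin' 'fat' 'muscle' 'bone']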
6. Sampling and Subsetting:
Often it is necessary to transform a data set with one spatial resolution into another data set with a different spatial
resolution. For example, we might have an image we would like to shrink or expand, or we might have only a small
sampling of data points and wish to fill in values for locations between our samples. In each case, we assume that the
data we possess is a discrete sampling of a continuous phenomenon, and therefore we can predict the values at
another location by examining the actual data nearest to it.
The process of interpolation is a commonly used resampling method in many fields, including visualization.
Some common techniques include the following:
1. Linear interpolation
2. Bilinear interpolation
3. Nonlinear interpolation
Sampling means selecting the group that you will actually collect data from in your research.
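As an illustration of resampling, here is a minimal linear-interpolation sketch assuming Python with NumPy (the sample values are made up):

import numpy as np

x_known = np.array([0.0, 1.0, 2.0, 3.0])       # locations of the discrete samples
y_known = np.array([10.0, 14.0, 11.0, 18.0])   # measured values at those locations
x_new = np.linspace(0.0, 3.0, 13)              # a finer spatial resolution
y_new = np.interp(x_new, x_known, y_known)     # linear interpolation between neighboring samples
print(y_new[:5])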
7. Dimension Reduction:
In situations where the dimensionality of the data exceeds the capabilities of the visualization technique, it is necessary
to investigate ways to reduce the data dimensionality, while at the same time preserving, as much as possible, the
information contained within it. This can be done manually by allowing the user to select the dimensions deemed most
important, or via computational techniques, such as principal component analysis (PCA) [385], multidimensional scaling
(MDS) [259], Kohonen self-organizing maps (SOMs) [248], and Local Linear Embedding (LLE) [350].
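A brief sketch of dimension reduction with PCA, assuming Python with scikit-learn and random stand-in data:

import numpy as np
from sklearn.decomposition import PCA

records = np.random.rand(100, 4)                 # 100 records, 4 dimensions (stand-in data)
reduced = PCA(n_components=2).fit_transform(records)
print(reduced.shape)                             # (100, 2): now displayable as a 2D scatterplot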
In many domains, one or more of the data dimensions consist of nominal values.
We may have several alternative strategies for handling these dimensions within our visualizations,
depending on how many nominal dimensions there are, how many distinct values each variable can take on, and
whether an ordering or distance relation is available or can be derived.
The key is to find a mapping of the data to a graphical entity or attribute that doesn’t introduce artificial relationships
that don’t exist in the data.
One way to display nominal variables using numeric displays is to map the nominal values to numbers, i.e.,
assigning order and spacing to the nominal values.
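A small sketch of such a mapping, assuming Python with pandas; note that the assigned order and spacing are artificial:

import pandas as pd

colors = pd.Series(["red", "blue", "green", "blue"], dtype="category")
codes = colors.cat.codes        # blue=0, green=1, red=2 (alphabetical, hence arbitrary ordering)
print(list(codes))              # [2, 0, 1, 0]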
A typical way to perform smoothing and filtering is through a process known as convolution, which for our purposes can be viewed
as a weighted averaging of the neighbors surrounding a data point.
Mean filtering is a simple method of smoothing and diminishing noise in images by eliminating pixel values that are
unrepresentative of their surroundings.
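A minimal mean-filtering sketch, assuming Python with NumPy and SciPy and a random stand-in image:

import numpy as np
from scipy.ndimage import convolve

image = np.random.rand(64, 64)             # noisy stand-in image
kernel = np.full((3, 3), 1.0 / 9.0)        # equal weights: a weighted average over the 3x3 neighborhood
smoothed = convolve(image, kernel, mode="nearest")
print(image.std(), smoothed.std())         # variation (noise) is diminished after filtering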
In spatial data visualization, our objects can be points or regions, or they can be linear structures, such as a road on a
map.
It is sometimes useful to take a raster-based data set, such as an image, and extract linear structures from it.
The image processing and computer vision fields have developed a wide assortment of techniques for converting
raster images into vertex- and edge-based models [153, 371].
1. Thresholding.
2. Region-growing
3. Boundary-detection.
4. Thinning
Thresholding: Identify one or more values with which to break the data into regions, after which the boundaries can
be traced to generate the edges and vertices.
Region-growing: Starting with seed locations, either selected by a human observer or computed via scanning of the
data, merge pixels into clusters if they are sufficiently similar to any neighboring point that has been assigned to a
cluster associated with one of the seed pixels.
Boundary-detection: Compute a new image from the existing image by convolving the image with a particular
pattern matrix.
Thinning: The convolution process mentioned above can also be used to perform a process called thinning, where the
goal is to reduce wide linear features, such as arteries, to a single pixel in width.
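A minimal thresholding sketch, assuming Python with NumPy and a random stand-in image:

import numpy as np

image = np.random.rand(8, 8)          # grayscale stand-in raster
threshold = 0.5                       # a single value chosen to break the data into regions
regions = image > threshold           # Boolean mask: the two regions whose boundaries could then be traced
print(regions.astype(int))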
Data Sets:
A data set is a structured collection of data points related to a particular subject, often presented in tabular form.
While a visualization may be appreciated without understanding the data being displayed, in general its effectiveness
is enhanced when the user has some context for interpreting what is being shown.
Data sets can hold information such as medical records or insurance records, to be used by a program running on the
system.
Data sets are also used to store information needed by applications or the operating system itself, such as source
programs, macro libraries, or system variables or parameters.