The document discusses various data visualization techniques, including pixel-oriented, geometric projection, icon-based, and hierarchical methods, aimed at effectively communicating data through graphical representation. It highlights the challenges of visualizing high-dimensional data and introduces techniques like scatter plots, parallel coordinates, Chernoff faces, and tree-maps. Additionally, it emphasizes the growing interest in visualizing non-numeric data and the role of visualization in data mining processes.
Unit 5-2
Projection Visualization Techniques, Icon-Based Visualization Techniques, Hierarchical Visualization Techniques, Visualizing Complex Data and Relations.
Data visualization aims to communicate data clearly and effectively through graphical representation. It has been used extensively in many applications, for example, at work for reporting, managing business operations, and tracking the progress of tasks. We discuss several representative approaches, including pixel-oriented techniques, geometric projection techniques, icon-based techniques, and hierarchical and graph-based techniques.
A simple way to visualize the value of a dimension is to use a pixel whose color reflects the dimension’s value. For a data set of m dimensions, pixel-oriented techniques create m windows on the screen, one for each dimension. The m dimension values of a record are mapped to m pixels at the corresponding positions in the windows, and the colors of the pixels reflect the corresponding values. Inside a window, the data values are arranged in some global order shared by all windows. The order may be obtained by sorting all data records in a way that is meaningful for the task at hand.
For example, AllElectronics maintains a customer information table, which consists of four dimensions: income, credit limit, transaction volume, and age. We can sort all customers in income-ascending order and use this order to lay out the customer data in the four visualization windows. A drawback of pixel-oriented visualization techniques is that they cannot help us much in understanding the distribution of data in a multidimensional space.
The central challenge that geometric projection techniques try to address is how to visualize a high-dimensional space on a 2-D display. A 3-D scatter plot uses three axes in a Cartesian coordinate system. If it also uses color, it can display up to 4-D data points. The scatter-plot matrix technique is a useful extension to the scatter plot.
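To make the pixel-oriented layout described above more concrete, the following is a minimal sketch, not a reference implementation. It assumes each window can be modeled as a flat list of gray levels (0 to 255) and that values are min-max scaled per dimension; the function name pixel_windows, the scaling choice, and the sample customers are all hypothetical and are not taken from the source.

```python
from typing import Dict, List, Sequence


def pixel_windows(records: Sequence[Dict[str, float]],
                  dimensions: Sequence[str],
                  order_by: str) -> Dict[str, List[int]]:
    """Return one 'window' per dimension: a list of gray levels (0-255).

    All windows share the same record order, so position i in every
    window refers to the same record.
    """
    # Global order shared by all windows (ascending on one dimension).
    ordered = sorted(records, key=lambda r: r[order_by])

    windows: Dict[str, List[int]] = {}
    for dim in dimensions:
        values = [r[dim] for r in ordered]
        lo, hi = min(values), max(values)
        span = hi - lo
        # Min-max scale each value to a gray level (assumed encoding);
        # a constant column maps to 0 everywhere.
        windows[dim] = [
            round(255 * (v - lo) / span) if span else 0 for v in values
        ]
    return windows


# Made-up customers, for illustration only.
customers = [
    {"income": 50_000, "credit_limit": 8_000, "transaction_volume": 120, "age": 41},
    {"income": 20_000, "credit_limit": 2_000, "transaction_volume": 300, "age": 25},
    {"income": 90_000, "credit_limit": 15_000, "transaction_volume": 60, "age": 58},
]
dims = ["income", "credit_limit", "transaction_volume", "age"]
windows = pixel_windows(customers, dims, order_by="income")
```

Because every window uses the income-ascending order, the income window runs from dark to light, and any other window that shows a similar gradient suggests a dimension that tends to rise with income. In this made-up sample the transaction volume window runs the opposite way.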
For an n-dimensional data set, a scatter-plot matrix is an n × n grid of 2-D scatter plots that provides a visualization of each dimension with every other dimension. Figure shows an example, which visualizes the Iris data set. The data set consists of 150 samples, 50 from each of three species of Iris flowers. There are five dimensions in the data set: length and width of sepal and petal, and species. The scatter-plot matrix becomes less effective as the dimensionality increases.
Another popular technique, called parallel coordinates, can handle higher dimensionality. To visualize n-dimensional data points, the parallel coordinates technique draws n equally spaced axes, one for each dimension, parallel to one of the display axes. A data record is represented by a polygonal line that intersects each axis at the point corresponding to the associated dimension value.
Icon-based visualization techniques use small icons to represent multidimensional data values. We look at two popular icon-based techniques: Chernoff faces and stick figures.
Chernoff faces were introduced in 1973 by statistician Herman Chernoff. They display multidimensional data of up to 18 variables (or dimensions) as a cartoon human face. Chernoff faces help reveal trends in the data. Components of the face, such as the eyes, ears, mouth, and nose, represent values of the dimensions by their shape, size, placement, and orientation. For example, dimensions can be mapped to the following facial characteristics: eye size, eye spacing, nose length, nose width, mouth curvature, mouth width, mouth openness, pupil size, and eyebrow slant. Chernoff faces make use of the ability of the human mind to recognize small differences in facial characteristics and to assimilate many facial characteristics at once.
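The parallel coordinates construction described above (n equally spaced axes, one polygonal line per record) can be sketched as follows. This is an illustrative sketch under stated assumptions: the axes are taken to be vertical, each axis is min-max scaled independently, and the function name parallel_coordinates and the sample records are hypothetical. It computes the polyline vertices only and leaves the drawing to whatever plotting tool is at hand.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def parallel_coordinates(records: Sequence[Sequence[float]],
                         width: float = 1.0,
                         height: float = 1.0) -> List[List[Point]]:
    """Map each n-dimensional record to a polyline with n vertices.

    Axis j is a vertical line at x = j * width / (n - 1); a record's value
    on dimension j is min-max scaled to a y position on that axis.
    """
    n = len(records[0])
    # Per-dimension minimum and maximum, used to scale each axis.
    lows = [min(r[j] for r in records) for j in range(n)]
    highs = [max(r[j] for r in records) for j in range(n)]
    # Equally spaced axis positions (a single axis sits at x = 0).
    xs = [j * width / (n - 1) if n > 1 else 0.0 for j in range(n)]

    polylines = []
    for r in records:
        line = []
        for j in range(n):
            span = highs[j] - lows[j]
            # A constant dimension is drawn at mid-height (assumed choice).
            y = height * (r[j] - lows[j]) / span if span else height / 2
            line.append((xs[j], y))
        polylines.append(line)
    return polylines


# Three made-up 4-D records, for illustration only.
data = [(5.0, 3.0, 1.0, 0.0), (7.0, 3.0, 4.0, 1.0), (6.0, 2.0, 6.0, 2.0)]
lines = parallel_coordinates(data, width=3.0, height=1.0)
```

Each entry of lines is one record's polyline. Adding a dimension adds one axis and one vertex per record, whereas a scatter-plot matrix adds a whole row and column of panels, which is one way to see why parallel coordinates cope better with higher dimensionality.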
The stick figure visualization technique maps multidimensional data to five-piece stick figures, where each figure has four limbs and a body. Figure shows census data, where age and income are mapped to the display axes, and the remaining dimensions (gender, education, and so on) are mapped to stick figures.
The visualization techniques discussed so far focus on visualizing multiple dimensions simultaneously. However, for a large data set of high dimensionality, it would be difficult to visualize all dimensions at the same time. Hierarchical visualization techniques partition all dimensions into subsets (i.e., subspaces). The subspaces are visualized in a hierarchical manner. “Worlds-within-Worlds,” also known as n-Vision, is a representative hierarchical visualization method. Suppose we want to visualize a 6-D data set, where the dimensions are F, X1, ..., X5. As another example of hierarchical visualization methods, tree-maps display hierarchical data as a set of nested rectangles.
In the early days, visualization techniques were mainly for numeric data. Recently, more and more non-numeric data, such as text and social networks, have become available. Visualizing and analyzing such data attracts a lot of interest, and there are many new visualization techniques dedicated to these kinds of data. For example, many people on the Web tag various objects such as pictures, blog entries, and product reviews. A tag cloud is a visualization of statistics of user-generated tags. Often, in a tag cloud, tags are listed alphabetically or in a user-preferred order. The importance of a tag is indicated by font size or color.
Figure uses a disease influence graph to visualize the correlations between diseases. The nodes in the graph are diseases, and the size of each node is proportional to the prevalence of the corresponding disease.
In summary, visualization provides effective tools to explore data. We have introduced several popular methods and the essential ideas behind them. There are many existing tools and methods.
Moreover, visualization can be used in data mining in various aspects. In addition to visualizing data, visualization can be used to represent the data mining process, the patterns obtained from a mining method, and user interaction with the data. Visual data mining is an important research and development direction.
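As a small worked illustration of the tag cloud discussed earlier, the sketch below lists tags alphabetically and indicates importance by font size. It assumes that importance is the tag's frequency and that font size grows linearly between two point sizes; the function name tag_cloud, the default sizes, and the sample tags are hypothetical choices, not something the source prescribes.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tag_cloud(tags: Iterable[str],
              min_pt: int = 10,
              max_pt: int = 32) -> List[Tuple[str, int]]:
    """Return (tag, font size in points) pairs in alphabetical order.

    Font size grows linearly with tag frequency, from min_pt for the
    rarest tag to max_pt for the most frequent one (assumed mapping).
    """
    counts = Counter(tags)
    if not counts:
        return []
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    cloud = []
    for tag in sorted(counts):  # alphabetical listing
        if span:
            size = min_pt + round((max_pt - min_pt) * (counts[tag] - lo) / span)
        else:
            size = min_pt  # all tags equally frequent
        cloud.append((tag, size))
    return cloud


# Made-up tags, for illustration only.
sample = ["travel", "food", "travel", "cats", "travel", "food"]
cloud = tag_cloud(sample)
```

Here "travel" appears most often and receives the largest size, while "cats" appears once and receives the smallest. Color could be assigned from the same scaled frequency instead of, or in addition to, font size.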