Prof. Tirup Parmar

UNIT –IV
Data exploration:

Exploration, attribute data query, spatial data query, raster data query, geographic
visualization

Q.1. Explain the term 'Data Exploration' with its various aspects.
Data Exploration

• Statisticians have traditionally used a variety of graphic techniques and descriptive
statistics to examine data prior to more formal and structured data analysis.
• The Windows operating system, with its multiple and dynamically linked windows, has
further assisted exploratory data analysis by allowing the user to directly manipulate
data points in charts and diagrams.

Descriptive statistics

• Descriptive statistics summarize the values of a data set.

• They include the following:

o Range: the difference between the minimum and maximum values.

o Median: the midpoint value, or the 50th percentile.

o First quartile: the 25th percentile.

o Third quartile: the 75th percentile.

o Mean: the average of the data values. The mean can be calculated by
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $x_i$ is the ith value and n is the number of values.

o Variance: the average of the squared deviations of each data value about the mean.
The variance can be calculated by $s^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$.

Video lectures:- https://www.youtube.com/c/TirupParmar & Notes :- https://t.me/bscit Page 3

https://E-next.in

o Standard deviation: the square root of the variance.

o Z score: a standardized score that can be computed by $z = (x - \bar{x})/s$, where s is the
standard deviation.
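The statistics listed above can be computed with Python's standard library. The sketch below is a minimal illustration on a small made-up data set. It follows the definition above and divides the variance by n (the population variance), whereas many packages default to dividing by n - 1. It also uses the "inclusive" method of statistics.quantiles for the quartiles, which is only one of several quartile conventions in use.

```python
import math
import statistics


def describe(values):
    """Return the descriptive statistics listed above for a list of numbers."""
    data = sorted(values)
    n = len(data)
    # Quartile conventions differ between packages; "inclusive" is one common choice.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    mean = sum(data) / n
    # Population variance: divide by n, as in the definition above.
    variance = sum((x - mean) ** 2 for x in data) / n
    return {
        "range": data[-1] - data[0],
        "median": median,
        "q1": q1,
        "q3": q3,
        "mean": mean,
        "variance": variance,
        "std_dev": math.sqrt(variance),
    }


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


stats = describe([2, 4, 4, 4, 5, 5, 7, 9])  # made-up example values
print(stats)
print(z_score(9, stats["mean"], stats["std_dev"]))  # 2.0
```

Here the mean is 5 and the standard deviation is 2, so the value 9 lies two standard deviations above the mean.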

Graphs
Different types of graphs are used for data exploration. A line graph displays data as a
line. The line graph example in figure 1 shows the rate of population change in the United
States along the y-axis and the state along the x-axis. Notice a couple of “peaks” in the line
graph.

Fig. 1: A line graph
Fig. 2: A histogram (bar chart)

A bar chart, also called a histogram, groups data into equal intervals and uses bars to
show the number or frequency of values falling within each class. A bar chart may have vertical
bars or horizontal bars. Figure 2 uses a vertical bar chart to group rates of population change in
the United States into six classes. Notice one bar at the high end of the histogram.
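The grouping step behind a histogram can be sketched as follows. This is a minimal version that assumes equal-width intervals between the minimum and maximum values. The rates are hypothetical and were chosen so that a single value sits in the top class, much like the one bar at the high end of Figure 2.

```python
def histogram_counts(values, num_classes):
    """Count how many values fall into each of num_classes equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; equal intervals are undefined")
    width = (hi - lo) / num_classes
    counts = [0] * num_classes
    for v in values:
        # The maximum value lands exactly on the upper edge, so it is
        # clamped into the last class.
        index = min(int((v - lo) / width), num_classes - 1)
        counts[index] += 1
    return counts


rates = [1, 2, 2, 3, 5, 8, 9, 10, 10, 20]  # hypothetical rates of change, in percent
print(histogram_counts(rates, 4))  # [5, 4, 0, 1]
```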


A cumulative distribution graph is one type of line graph that plots the ordered data values
against the cumulative distribution values. The cumulative distribution value of the ith ordered
value is typically calculated as $(i - 0.5)/n$, where n is the number of values. This computational
formula converts the values of a data set to within the range of 0.0 to 1.0. Figure 3 shows a
cumulative distribution graph.

Fig. 3 : A cumulative distribution graph
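The formula (i - 0.5)/n can be applied directly. The sketch below sorts the data and pairs each value with its cumulative distribution value; those pairs are the points a cumulative distribution graph would plot. The input values are made up.

```python
def cumulative_distribution(values):
    """Pair each ordered value with its cumulative distribution value (i - 0.5) / n."""
    ordered = sorted(values)
    n = len(ordered)
    # enumerate(..., start=1) gives the rank i of each ordered value.
    return [(x, (i - 0.5) / n) for i, x in enumerate(ordered, start=1)]


print(cumulative_distribution([30, 10, 20, 40]))
# [(10, 0.125), (20, 0.375), (30, 0.625), (40, 0.875)]
```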

Fig. 4: A scatterplot plotting percent persons under 18 years old in 2000 against percent
population change, 1990-2000. A weak positive relationship is present.

A scatterplot uses markings to plot the values of two variables along the x- and y-axes. Figure 4
plots percent population change, 1990-2000, against percent persons under 18 years old in 2000 by
state in the United States. The scatterplot suggests a weak positive relationship between the two
variables.

Fig. 5: A bubble plot showing percent population change 1990-2000, percent persons under
18 years old in 2000, and state population.

Bubble plots are a variation of scatterplots. Instead of using constant symbols as in a
scatterplot, a bubble plot has varying-sized bubbles that are made proportional to the value of
a third variable. Figure 5 is a variation of Figure 4; the additional variable shown by the bubble
size is the state population in 2000. As an illustration, Figure 5 shows only states in the
Mountain region, one of the nine regions defined by the U.S. Census Bureau.

Boxplots, also called “box and whisker” plots, summarize the distribution of a data set using five
statistics: the minimum, first quartile, median, third quartile, and maximum. By examining the
positions of these statistics in a boxplot, we can tell whether the distribution of data values is
symmetric or skewed and whether there are unusual data points (i.e., outliers). Figure 6 shows a
boxplot based on the rate of population change in the United States. This data set is clearly
skewed toward the higher end. Figure 7 summarizes three basic types of data sets in terms of the
distribution of data values. Boxplots are therefore useful for comparisons between different
data sets.

Fig. 6: A boxplot based on the percent population change 1990-2000 data set.

Fig. 7: Boxplot (a) suggests that the data values follow a normal distribution. Boxplot (b)
shows a positively skewed distribution with a higher concentration of data values near the
high end. The x’s in (b) may represent outliers, which are more than 1.5 box lengths from the
end of the box. Boxplot (c) shows a …

Fig. 8 : A QQ plot plotting percent population change, 1990-2000 against the standardized
value from a normal distribution.

Some graphs are more specialized. Quantile-quantile plots, also called QQ plots, compare the
cumulative distribution of a data set with that of some theoretical distribution, such as the
normal distribution, a bell-shaped frequency distribution. The points in a QQ plot fall along a
straight line if the data set follows the theoretical distribution. Figure 8 plots the rate of
population change against the standardized value from a normal distribution. It shows that the
data set is not normally distributed. The main departure occurs at the two highest values, which
are also highlighted in previous graphs.
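The points of a normal QQ plot can be computed with statistics.NormalDist. The sketch below makes one assumption: it reuses the (i - 0.5)/n plotting position from the cumulative distribution graph, which is one common choice among several. It standardizes the data with its mean and standard deviation, so normally distributed data would fall near the line y = x.

```python
from statistics import NormalDist, fmean, pstdev


def qq_points(values):
    """Pair standard normal quantiles with the standardized, ordered data values.

    If the data follow a normal distribution, the points fall close to the
    straight line y = x.
    """
    ordered = sorted(values)
    n = len(ordered)
    mean, sd = fmean(ordered), pstdev(ordered)
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    points = []
    for i, x in enumerate(ordered, start=1):
        # Plotting position (i - 0.5) / n, as in the cumulative distribution graph.
        theoretical = standard_normal.inv_cdf((i - 0.5) / n)
        observed = (x - mean) / sd
        points.append((theoretical, observed))
    return points


for theoretical, observed in qq_points([1, 2, 3, 4, 5]):
    print(f"{theoretical:+.3f}  {observed:+.3f}")
```

Plotting observed against theoretical gives the QQ plot. Points that drift far from the line, like the two highest values in Figure 8, mark where the data depart from the theoretical distribution.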

Fig.9 : A 3D plot showing annual precipitation at 105 weather stations in Idaho. A north-
tosouth decreasing trend is apparent in the plot.

Some graphs are designed for spatial data. Figure 9, for example, shows a plot of spatial data
values by raising a bar at each point location so that the height of the bar is proportionate to its
value. This kind of plot allows the user to see the general trends among the data values in both
the x-dimension (east-west) and y-dimension (north-south).


Dynamic Graphics
When graphs are displayed in multiple and dynamically linked windows, they become dynamic
graphs. We can directly manipulate data points in dynamic graphs. For example, we can pose a
query in one window and get the response in other windows, all in the same visual field. By
viewing selected data points highlighted in multiple windows, we can hypothesize about patterns
or relationships that may exist in the data. This is why multiple linked views have been
described as the optimal framework for posing queries about data (Buja et al. 1996).

Fig. 10 : The scatterplot on the left is dynamically linked to the map on the right. The
“brushing” of two data points in the scatterplot highlights the corresponding states
(Washington and New Mexico) on the map.

A common method for manipulating dynamic graphs is brushing, which allows the user to
graphically select a subset of points from a scatterplot and view related data points in other
graphics (Becker and Cleveland 1987). Brushing can be extended to maps (Monmonier 1989).
Figure 10 illustrates a brushing example that links a scatterplot and a map. Many GIS packages,
including ArcGIS, have implemented brushing in the graphical user interface.
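The idea behind brushing and linked views can be sketched as one shared selection that every linked view is notified about. The LinkedViews class below is a hypothetical design for illustration and does not describe how ArcGIS implements brushing. The state names "A", "B", "C" and their values are made-up placeholders, and the "map" view is simply a callback that records which features it would highlight.

```python
class LinkedViews:
    """A shared selection that every linked view is notified about (hypothetical design)."""

    def __init__(self, records):
        self.records = records  # state name -> (x value, y value) shown in the scatterplot
        self.selected = set()
        self.listeners = []

    def link(self, listener):
        """Register a view; it will be called with the current selection."""
        self.listeners.append(listener)

    def brush(self, xmin, xmax, ymin, ymax):
        """Select the scatterplot points inside a rectangle and notify all views."""
        self.selected = {
            state
            for state, (x, y) in self.records.items()
            if xmin <= x <= xmax and ymin <= y <= ymax
        }
        for listener in self.listeners:
            listener(self.selected)


highlighted_on_map = []
views = LinkedViews({"A": (24.0, 5.0), "B": (27.5, 30.0), "C": (26.0, 22.0)})  # made-up values
views.link(lambda selection: highlighted_on_map.append(sorted(selection)))
views.brush(xmin=25.0, xmax=30.0, ymin=20.0, ymax=35.0)
print(highlighted_on_map)  # [['B', 'C']]
```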

Q.2. Explain the term 'Attribute Data Query' with its various aspects.
(A) Attribute Data Query
• Attribute data query retrieves a data subset by working with attribute data.
• The selected data subset can be simultaneously examined in the table, displayed in
charts, and linked to the highlighted features in the map.
• The selected data subset can also be saved for further processing.

SQL (Structured Query Language)

• SQL is a data query language designed for relational databases.

• To use SQL to access a database, we must follow the structure (i.e., the syntax) of the
query language.
• The basic syntax of SQL is:
select <attribute list>
from <relation>
where <condition>
• The select keyword selects fields from the database, the from keyword selects tables
from the database, and the where keyword specifies the condition or criteria for the data
query.
• Consider the following Table A.

• As an example, suppose we want to find the sale date of the parcel coded P101:
Select Parcel.Sale_Date
From Parcel
Where Parcel.Pin = 'P101'

• The prefix Parcel in Parcel.Sale_Date and Parcel.Pin indicates that the fields are from
the Parcel table.
• Suppose we want to find parcels that are larger than 2 acres and are zoned commercial
(Zone_code = 2):
Select Parcel.Pin
From Parcel
Where Parcel.Acres > 2 And Parcel.Zone_code = 2
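The two queries above can be run against an in-memory SQLite database through Python's sqlite3 module. Table A is not reproduced here, so the Parcel rows below are a hypothetical stand-in with made-up values. Only the column names Pin, Sale_Date, Acres and Zone_code come from the queries themselves.

```python
import sqlite3

# Hypothetical stand-in for the Parcel table; the values are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Parcel (Pin TEXT PRIMARY KEY, Sale_Date TEXT, Acres REAL, Zone_code INTEGER)"
)
conn.executemany(
    "INSERT INTO Parcel VALUES (?, ?, ?, ?)",
    [
        ("P101", "2001-05-14", 1.0, 1),
        ("P102", "1985-09-02", 3.0, 2),
        ("P103", "1999-11-23", 2.5, 2),
        ("P104", "2010-02-17", 4.0, 1),
    ],
)

sale_date = conn.execute(
    "SELECT Parcel.Sale_Date FROM Parcel WHERE Parcel.Pin = 'P101'"
).fetchall()

large_commercial = conn.execute(
    "SELECT Parcel.Pin FROM Parcel "
    "WHERE Parcel.Acres > 2 AND Parcel.Zone_code = 2 "
    "ORDER BY Parcel.Pin"
).fetchall()

print(sale_date)         # [('2001-05-14',)]
print(large_commercial)  # [('P102',), ('P103',)]
```

Parcel P104 is larger than 2 acres but has Zone_code 1, so the AND condition excludes it.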


Types of operations:
• Attribute data query begins with a complete data set.
• A basic query operation is to select a subset and divide the data set into two groups: one
containing the selected records and the other containing the unselected records.

• The different types of operations allow greater flexibility in data query. For example,
instead of using the expression Parcel.Acres > 2 And Parcel.Zone_code = 2, we can first
use Parcel.Acres > 2 to select a subset and then use Parcel.Zone_code = 2 to select a
subset from the previously selected subset. Examples of query operations follow.

• Example 1:
Q. Select a data subset and then add more records to it.

[Create a new selection] "cost" >= 5 AND "soiltype" = 'Ns1'

Output: 0 of 10 records selected.

[Add to current selection] "soiltype" = 'N3'

Output: 3 of 10 records selected.

• Example 2:

Q. Select a data subset and then switch the selection.

[Create a new selection] "cost" > 8 OR "area" >= 400

Output: 2 of 10 records selected.

[Switch selection]

Output: 8 of 10 records selected.

• Example 3:

Q. Select a data subset and then select a smaller subset from it.

[Create a new selection] "cost" > 8 OR "area" > 400

4 of 10 records selected
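The selection operations in these examples correspond to set operations on record IDs. "Add to current selection" is a union, "select from current selection" is an intersection, and "switch selection" is the complement. The sketch below uses a hypothetical 10-record table with made-up values, so its record counts differ from those in the examples. The function names are illustrative and are not a GIS package's API.

```python
# Hypothetical 10-record table; the values are made up and do not reproduce
# the record counts in the examples above.
RECORDS = [
    {"id": 1, "cost": 2, "area": 120, "soiltype": "N3"},
    {"id": 2, "cost": 4, "area": 300, "soiltype": "Ns1"},
    {"id": 3, "cost": 6, "area": 450, "soiltype": "N3"},
    {"id": 4, "cost": 9, "area": 200, "soiltype": "Ns1"},
    {"id": 5, "cost": 5, "area": 380, "soiltype": "N2"},
    {"id": 6, "cost": 7, "area": 410, "soiltype": "Ns1"},
    {"id": 7, "cost": 1, "area": 90, "soiltype": "N3"},
    {"id": 8, "cost": 3, "area": 260, "soiltype": "N2"},
    {"id": 9, "cost": 8, "area": 330, "soiltype": "Ns1"},
    {"id": 10, "cost": 10, "area": 500, "soiltype": "N2"},
]


def create_new_selection(condition):
    return {r["id"] for r in RECORDS if condition(r)}


def add_to_current_selection(current, condition):
    return current | create_new_selection(condition)  # union


def select_from_current_selection(current, condition):
    return current & create_new_selection(condition)  # intersection


def switch_selection(current):
    return {r["id"] for r in RECORDS} - current  # complement


step1 = create_new_selection(lambda r: r["cost"] >= 5 and r["soiltype"] == "Ns1")
step2 = add_to_current_selection(step1, lambda r: r["soiltype"] == "N3")
print(f"{len(step2)} of {len(RECORDS)} records selected")  # 6 of 10 records selected
```

Selecting "cost" > 5 and then selecting "soiltype" = 'Ns1' from that subset gives the same records as the single AND expression. This matches the two-step approach described under types of operations.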

Q.3. Explain how an attribute query is executed on a relational GIS database.
Relational database query
• Relational database query works with a relational database, which may consist of many
separate but interrelated tables.
• A query of a table in a relational database not only selects a data subset in the table but
also selects records related to the subset in other tables.
• This feature is desirable in data exploration because it allows the user to examine
related characteristics from multiple tables.
• To use a relational database, we must be familiar with the overall structure of the
database, the designation of keys in related tables, and a data dictionary listing and
describing the fields in each table.
• For data query in two or more tables, we can choose to either join or relate the tables.
• A join operation combines attribute data from two or more tables into a single table.
• A relate operation dynamically links the tables but keeps the tables separate.
• A join operation is appropriate for the one-to-one or many-to-one relationship but
inappropriate for the one-to-many or many-to-many relationship.
• A relate operation, on the other hand, can be used with all four types of relationships.
• When a record in one table is selected, the link will automatically select and highlight
the corresponding record or records in the related tables.
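The difference between a join and a relate can be sketched with two hypothetical tables in SQLite. The join builds one combined result table; a many-to-one relationship suits it because each parcel gets exactly one zoning row. The relate is imitated here by keeping the tables separate and following the shared key after a selection. The table contents and the related_parcels helper are illustrative assumptions, not how a particular GIS implements relates.

```python
import sqlite3

# Hypothetical tables: many parcels share one zoning class (a many-to-one relationship).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Parcel (Pin TEXT PRIMARY KEY, Zone_code INTEGER);
    CREATE TABLE Zoning (Zone_code INTEGER PRIMARY KEY, Zoning TEXT);
    INSERT INTO Parcel VALUES ('P101', 1), ('P102', 2), ('P103', 2);
    INSERT INTO Zoning VALUES (1, 'Residential'), (2, 'Commercial');
    """
)

# Join: attribute data from both tables combined into a single result table.
joined = conn.execute(
    "SELECT Parcel.Pin, Zoning.Zoning FROM Parcel "
    "JOIN Zoning ON Parcel.Zone_code = Zoning.Zone_code ORDER BY Parcel.Pin"
).fetchall()


# Relate (sketch): the tables stay separate; selecting a record in one table
# is followed by a lookup of the matching records in the other through the key.
def related_parcels(zoning_name):
    (zone_code,) = conn.execute(
        "SELECT Zone_code FROM Zoning WHERE Zoning = ?", (zoning_name,)
    ).fetchone()
    rows = conn.execute(
        "SELECT Pin FROM Parcel WHERE Zone_code = ? ORDER BY Pin", (zone_code,)
    ).fetchall()
    return [pin for (pin,) in rows]


print(joined)  # [('P101', 'Residential'), ('P102', 'Commercial'), ('P103', 'Commercial')]
print(related_parcels("Commercial"))  # ['P102', 'P103']
```

Seen from the Zoning table, selecting one record ('Commercial') leads to several parcels, a one-to-many case. A relate handles this case naturally.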


Q.4. Explain the term 'Spatial Data Query' with all its aspects.
Spatial data query
• Spatial data query refers to the process of retrieving a data subset from a layer by working
directly with features.
• We may select features using a cursor, a graphic, or the spatial relationship between
features.
• As the geometric interface to the database, spatial data query complements attribute
data query in data exploration.
• Similar to attribute data query, the results of spatial data query can be simultaneously
inspected in the map, linked to the highlighted records in the table, and displayed in
charts.
• They can also be saved as a new data set for further processing.

Feature Selection by Cursor

• The simplest spatial data query is to select a feature by pointing at it or to select
features by dragging a box around them.

Feature selection by graphic

• This query method uses a graphic such as a circle, a box, a line, or a polygon to select
features that fall inside or are intersected by the graphic object.
• We can draw the graphics for selection by using the mouse pointer.
• Examples of query by graphic include selecting restaurants within a 1-mile radius of a hotel
and selecting land parcels that intersect a proposed highway.
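For point features, selection by a circle or by a dragged box reduces to simple coordinate tests. The sketch below assumes projected coordinates in miles, so straight-line (Euclidean) distance is meaningful. The restaurant and hotel locations are hypothetical. Selecting lines or polygons by a graphic needs full geometry operations, which this sketch does not cover.

```python
import math

# Hypothetical point features: name -> (x, y) in projected map units (miles).
restaurants = {"A": (0.5, 0.5), "B": (1.2, 0.0), "C": (0.0, -0.9), "D": (3.0, 3.0)}


def select_by_circle(features, center, radius):
    """Select point features falling inside a circle (e.g. a 1-mile radius)."""
    cx, cy = center
    return [name for name, (x, y) in features.items()
            if math.hypot(x - cx, y - cy) <= radius]


def select_by_box(features, xmin, ymin, xmax, ymax):
    """Select point features inside a box, as when dragging a box with the cursor."""
    return [name for name, (x, y) in features.items()
            if xmin <= x <= xmax and ymin <= y <= ymax]


hotel = (0.0, 0.0)
print(select_by_circle(restaurants, hotel, radius=1.0))  # ['A', 'C']
print(select_by_box(restaurants, 0.0, 0.0, 2.0, 2.0))    # ['A', 'B']
```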

Selection by spatial relationship

• This query method selects features based on their spatial relationships to other features.
• Features to be selected may be in the same layer as the features for selection or, more
commonly, in a different layer.
• Spatial relationships used for query include the following:
o Containment selects features that fall completely within features for selection.
Examples include finding schools within a selected county, and finding state parks within
a selected state.

o Intersect selects features that intersect features for selection. Examples include
selecting land parcels that intersect a proposed road, and finding urban areas that
intersect an active fault line.
o Proximity selects features that are within a specified distance of features for
selection. Examples include finding state parks within 10 miles of an interstate highway,
and finding pet shops within 1 mile of selected streets. If features to be selected and
features for selection share common boundaries and if the specified distance is 0, then
proximity becomes adjacency. Examples of spatial adjacency include selecting land
parcels that are adjacent to a flood zone, and finding vacant lots that are adjacent to a
new theme park.
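Two of these relationships, containment and proximity, can be sketched for point features in plain Python. Containment uses the ray-casting point-in-polygon test. Proximity uses the shortest distance from a point to a polyline. All layers are hypothetical and use projected units (miles). The intersect relationship, and containment of lines or polygons, would need a full geometry library and is not covered here.

```python
import math


def point_in_polygon(point, polygon):
    """Ray casting: inside if a ray to the right crosses the boundary an odd number of times."""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # this edge spans the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_line(point, line):
    """Shortest distance from a point to a polyline given as a list of vertices."""
    px, py = point
    best = math.inf
    for (ax, ay), (bx, by) in zip(line, line[1:]):
        dx, dy = bx - ax, by - ay
        length_sq = dx * dx + dy * dy
        # Project the point onto the segment, clamped to the segment's end points.
        t = 0.0
        if length_sq > 0:
            t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
        best = min(best, math.hypot(px - (ax + t * dx), py - (ay + t * dy)))
    return best


# Hypothetical layers in projected units (miles)
county = [(0, 0), (10, 0), (10, 10), (0, 10)]
schools = {"S1": (2, 3), "S2": (12, 5), "S3": (9.5, 9.5)}
highway = [(0, 0), (20, 0)]
parks = {"P1": (5, 4), "P2": (15, 12), "P3": (25, 3)}

# Containment: schools within the selected county
print([s for s, pt in schools.items() if point_in_polygon(pt, county)])  # ['S1', 'S3']
# Proximity: parks within 10 miles of the highway
print([p for p, pt in parks.items() if distance_to_line(pt, highway) <= 10])  # ['P1', 'P3']
```

With a specified distance of 0, the same proximity test would pick out only features that touch the line, which corresponds to the adjacency case described above.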

Combining Attribute and Spatial Data Queries :

So far we have approached data exploration through attribute data query or spatial data query.
In many cases data exploration requires both types of queries. For example, both are needed to
find gas stations that are within 1 mile of a freeway exit in southern California and have an
annual revenue exceeding $2 million each. Assuming that the layers of gas stations and freeway
exits are available, there are at least two ways to answer the question.

1. Locate all freeway exits in the study area, and draw a circle around each exit with a 1
mile radius. Select gas stations within the circles through spatial data query. Then use
attribute data query to find gas stations that have annual revenues exceeding $2 million.
2. Locate all gas stations in the study area, and select those stations with annual revenues
exceeding $2 million through attribute data query. Next, use spatial data query to
narrow the selection of gas stations to those within 1 mile of a freeway exit.

The first option queries spatial data and then attribute data. The process is reversed with
the second option. Assuming that there are many more gas stations than freeway exits, the first
option may be a better option, especially if the gas station map must be linked to other
attribute tables for getting the revenue data.
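The two options can be sketched side by side on hypothetical data. Both return the same stations; they differ only in which filter runs first, and therefore in how much work the second step has to do. Station locations, revenues and exit coordinates are made up, and distances are Euclidean in projected miles.

```python
import math

# Hypothetical layers: station -> ((x, y) in miles, annual revenue in $ millions)
stations = {
    "G1": ((0.3, 0.4), 2.5),
    "G2": ((0.2, 0.1), 1.1),
    "G3": ((5.0, 5.0), 3.2),
    "G4": ((9.6, 0.5), 2.1),
}
freeway_exits = [(0.0, 0.0), (10.0, 0.0)]


def near_an_exit(xy, radius=1.0):
    """Spatial test: within `radius` miles of any freeway exit."""
    return any(math.dist(xy, exit_xy) <= radius for exit_xy in freeway_exits)


def spatial_then_attribute():
    # Option 1: circles around the exits first, then filter by revenue.
    nearby = [name for name, (xy, _) in stations.items() if near_an_exit(xy)]
    return [name for name in nearby if stations[name][1] > 2.0]


def attribute_then_spatial():
    # Option 2: filter by revenue first, then test distance to the exits.
    high_revenue = [name for name, (_, revenue) in stations.items() if revenue > 2.0]
    return [name for name in high_revenue if near_an_exit(stations[name][0])]


print(spatial_then_attribute())  # ['G1', 'G4']
print(attribute_then_spatial())  # ['G1', 'G4']
```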

The combination of spatial and attribute data queries opens wide the possibilities of data
exploration. Some GIS users might even consider this kind of data exploration to be data
analysis, because that is what they need to do to solve most of their routine tasks.

Q.5. Explain the term 'Raster Data Query' with all its aspects.
Raster Data Query
Although the concept and even some methods for data query are basically the same for
both raster data and vector data, there are enough practical differences to warrant a separate
section on raster data query.


Query by cell value

• The cell value in a raster typically represents a specific attribute value (for example,
land use type, elevation, etc.) at the cell location.
• Therefore the operand in raster data query is the raster itself rather than a field, as in
the case of vector data query.
• Raster data query uses a Boolean statement to separate cells that satisfy the query
statement from cells that do not.
• The expression [road] = 1 queries a road raster for cells that have the value of 1.
• The operand [road] refers to the raster and the operand 1 refers to a cell value, which
may represent the interstate category.
• The next expression, [elevation] > 1243.06, queries a floating-point elevation raster for
cells that have values greater than 1243.06.
• Because a floating-point elevation raster contains continuous values, querying a specific
value is not likely to find any cell in the raster.
• Raster data query can also use the Boolean connectors AND, OR, and NOT to string
together separate expressions.
• A compound statement with separate expressions usually applies to multiple rasters,
which may be integer, floating-point, or a mix of both types.
• For example, the statement ([slope] = 2) AND ([aspect] = 1) selects cells that have the
value of 2 in the slope raster and 1 in the aspect raster.
• Those cells that satisfy the statement have the cell value of 1 on the output, while other
cells have the cell value of 0.
• See Figure I.
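Query by cell value can be sketched with rasters stored as nested lists. A query applies a Boolean test to every cell and writes 1 or 0 to the output, and AND combines two such outputs cell by cell. The slope, aspect and elevation rasters below are small made-up grids. A real GIS would use its own raster format and operators.

```python
def query(raster, predicate):
    """Boolean query on one raster: 1 where the predicate holds, 0 elsewhere."""
    return [[1 if predicate(value) else 0 for value in row] for row in raster]


def raster_and(a, b):
    """Cell-by-cell AND of two 0/1 rasters with the same dimensions."""
    return [[1 if x and y else 0 for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Small made-up integer rasters
slope = [[2, 2, 1], [3, 2, 2], [1, 1, 2]]
aspect = [[1, 3, 1], [1, 1, 4], [1, 1, 1]]

# ([slope] = 2) AND ([aspect] = 1)
result = raster_and(query(slope, lambda v: v == 2), query(aspect, lambda v: v == 1))
print(result)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# A made-up floating-point elevation raster: a range query finds cells,
# while testing for one exact value usually finds none.
elevation = [[1240.52, 1243.07, 1250.20], [1238.91, 1244.66, 1241.03]]
print(query(elevation, lambda v: v > 1243.06))  # [[0, 1, 1], [0, 1, 0]]
print(query(elevation, lambda v: v == 1243.1))  # [[0, 0, 0], [0, 0, 0]]
```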


Query by select feature:

• We can query a raster by using features such as points, circles, boxes, or polygons.
• The query returns an output raster with values for cells that correspond to the point
locations or fall within the features for selection.
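Query by a circle feature can be sketched by testing each cell against the circle. The sketch keeps a cell's value when its center falls inside the circle and writes None (standing in for NoData) elsewhere. The cell-center rule, the upper-left origin with rows counting downward, and the 3 x 3 land use raster are all hypothetical conventions chosen for illustration; GIS packages may decide cell membership differently.

```python
def query_raster_by_circle(raster, cell_size, center, radius):
    """Keep the values of cells whose centers fall inside a circle; other cells get None (NoData)."""
    cx, cy = center
    output = []
    for row_index, row in enumerate(raster):
        out_row = []
        for col_index, value in enumerate(row):
            # Cell-center coordinates, assuming the origin is at the raster's
            # upper-left corner and rows count downward (hypothetical convention).
            x = (col_index + 0.5) * cell_size
            y = (row_index + 0.5) * cell_size
            inside = (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
            out_row.append(value if inside else None)
        output.append(out_row)
    return output


land_use = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # made-up 3 x 3 raster with 10-unit cells
for row in query_raster_by_circle(land_use, cell_size=10, center=(15, 15), radius=10):
    print(row)
# [None, 2, None]
# [4, 5, 6]
# [None, 8, None]
```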

Q.6. Explain Geographic Visualization and its various techniques.
Map-based data manipulation
• Maps are an important part of GIS operations, including data exploration.

Data Classification
• Data classification can be a tool for data exploration, especially if the classification is
based on descriptive statistics.
• Suppose we want to explore the rate of unemployment by state in the United States.
• To get a preliminary look at the data, we may place the rates of unemployment into classes
above and below the national average (Figure J.a).
• Although generalized, the map divides the country into contiguous regions, which may
suggest some regional factors for explaining unemployment.
• To isolate those states that are well above or below the national average, we can classify
the rates of unemployment by using the mean and standard deviation method (Figure J.b).
• We can now focus our attention on states that are, for example, more than one
standard deviation above the mean.
• Classified maps can be linked with tables, graphs, and statistics for more data exploration
activities. For example, we can link the maps in Figure J with a table showing the % change in
median household income and find out whether states that have lower unemployment
rates tend to have higher rates of income growth, and vice versa.
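The mean and standard deviation method can be sketched by converting each rate to a z score and assigning a class by how many standard deviations it lies from the mean. The four class labels are one hypothetical scheme. The areas "A" to "F" and their rates are made up, and the mean here is simply the mean of the listed values.

```python
from statistics import fmean, pstdev


def classify_by_std_dev(rates):
    """Assign each area to a class defined by the mean and standard deviation."""
    values = list(rates.values())
    mean, sd = fmean(values), pstdev(values)
    classes = {}
    for area, rate in rates.items():
        z = (rate - mean) / sd
        if z > 1:
            classes[area] = "more than 1 SD above mean"
        elif z >= 0:
            classes[area] = "0 to 1 SD above mean"
        elif z >= -1:
            classes[area] = "0 to 1 SD below mean"
        else:
            classes[area] = "more than 1 SD below mean"
    return classes


unemployment = {"A": 3.0, "B": 4.0, "C": 5.0, "D": 6.0, "E": 7.0, "F": 11.0}  # made-up rates (%)
for area, label in classify_by_std_dev(unemployment).items():
    print(area, label)
```

Area F, at about 1.9 standard deviations above the mean, is the kind of case the text suggests focusing on.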


Spatial aggregation
• Spatial aggregation is functionally similar to data classification except that it groups data
spatially.
• Figure K shows % population change in the United States by state and by region.
• Used by the U.S. Census Bureau for data collection, regions are spatial aggregates of
states. As shown in Figure K.b, a map by region gives a more general view of population
growth in the country than a map by state does.
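Aggregating states to regions can be sketched by summing state populations within each region and then computing the percent change from the regional totals. Computing from totals, rather than averaging the state percentages, gives larger states proportionally more weight. In the made-up data below, the simple average for "East" would be 15%, while the aggregate is 12.5%. The state names, region assignments and populations are all hypothetical.

```python
def aggregate_population_change(state_pops, region_of):
    """Percent population change by region, computed from summed state totals."""
    totals = {}
    for state, (pop_1990, pop_2000) in state_pops.items():
        region_totals = totals.setdefault(region_of[state], [0, 0])
        region_totals[0] += pop_1990
        region_totals[1] += pop_2000
    return {
        region: 100.0 * (p2000 - p1990) / p1990
        for region, (p1990, p2000) in totals.items()
    }


# Made-up populations (thousands) and region assignments
state_pops = {"S1": (100, 120), "S2": (300, 330), "S3": (200, 180)}
region_of = {"S1": "East", "S2": "East", "S3": "West"}
print(aggregate_population_change(state_pops, region_of))  # {'East': 12.5, 'West': -10.0}
```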

Map Comparison
• Map comparison can help a GIS user sort out the relationship between different maps.
For example, the display of wildlife locations on a vegetation layer may reveal the
association between the wildlife species and the distribution of vegetation covers.
• If the maps to be compared consist of only point or line features, they can be coded in
different colors and superimposed on one another in a single view. But this process
becomes difficult if they include polygon features or raster data. One option is to use
transparency as a visual variable.
• A semitransparent layer allows another layer to show through. For example, to
compare two raster layers, we can display one layer in a color scheme and the other in
semitransparent shades of gray.
• The gray shades simply darken the color symbols and do not produce confusing color
mixtures. Another example is to use transparency for displaying temporal changes such
as land cover change between 1990 and 2000. Because one layer is semitransparent, we
can follow the areal extent of a land cover type from both years. But it is difficult to
apply transparency to more than two layers.
• There are three other options for comparing polygon or raster layers. The first option is
to place all polygon and raster layers, along with other point and line layers, onto the
screen but to turn polygon and raster layers on and off so that only one of them is
viewed at a time. Used by many websites for interactive mapping, this option is
designed for casual users.
