Computer Basis


Computer Application

STATISTICAL ANALYSIS, SPATIAL DATA ANALYSIS
What is Statistical Analysis?

Statistics is the study of the collection, organization,
examination, summarization, manipulation,
interpretation and presentation of quantitative data. It
deals with all aspects of data, including the planning of
data collection in terms of the design of surveys and
experiments.
Statistical analysis is a component of data
analytics. It involves collecting and scrutinizing
data samples drawn from a set of items in order to find
information or to discover underlying patterns and
trends.
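As a rough illustration of summarizing a data sample, the sketch below computes a few descriptive statistics with Python's standard `statistics` module. The rainfall values and the variable names are hypothetical and serve only to show the idea.

```python
import statistics

# Hypothetical sample: daily rainfall in millimetres at one station.
rainfall_mm = [12.0, 7.5, 0.0, 3.2, 18.4, 7.5, 9.1]

summary = {
    "n": len(rainfall_mm),
    "mean": statistics.mean(rainfall_mm),
    "median": statistics.median(rainfall_mm),
    "stdev": statistics.stdev(rainfall_mm),  # sample standard deviation
    "min": min(rainfall_mm),
    "max": max(rainfall_mm),
}

# Print each statistic rounded to two decimal places.
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The same summary could be produced in R or SPSS; Python is used here only to keep the example self-contained.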
R

R is a widely used environment for statistical analysis.
The striking difference between R and most other
statistical packages is that it is free software and that it
is maintained by scientists for scientists.
The R project has gained many users and contributors who
continuously extend the capabilities of R by releasing
add-ons (packages) that offer new functions and
methods, or improve the existing ones.
R is a command-line-based language: commands
are entered directly into the console.
R Data Type

As in other programming languages, there are
different data types available in R, the most common being
“numeric”, “character” and “logical”.
As the name indicates, “numeric” is used for
numerical values (double precision).
The type “character” is used for character strings, which are
entered using quotation marks.
The data type “logical” is used for Boolean
values (TRUE or T, and FALSE or F).
R Object Type

Depending on the structure of the data, R recognizes
four standard object types: “vectors”, “matrices”, “data
frames” and “lists”.
Vectors are one-dimensional arrays of data;
matrices are two-dimensional data arrays. Data
frames and lists are further generalizations: a data frame
is a table whose columns may hold different data types,
and a list can hold elements of any type.
SPSS

SPSS is the acronym of Statistical Package for the
Social Sciences.
SPSS is a Windows-based program that can be used
to perform data entry and analysis and to create
tables and graphs.
SPSS is capable of handling large amounts of data
and can perform all of the analyses covered in the
text and much more.
SPSS has four windows: the Data Editor, the Output Viewer,
the Syntax Editor and the Script window.
SPSS Programme Functionality

SPSS has scores of statistical and mathematical functions,
scores of statistical procedures, and a very flexible data handling
capability. It can read data in almost any format (e.g.,
numeric, alphanumeric, binary, dollar, date, time formats), and
from version 6 onwards it can read files created using
spreadsheet/database software.
The following is a brief overview of some of the functionalities
of SPSS: Data transformations, Data Examination, Descriptive
Statistics, Contingency tables, Reliability tests, Correlation,
T-tests, ANOVA, MANOVA, General Linear Model, Regression,
Nonlinear Regression, Logistic Regression, Loglinear
Regression, Discriminant Analysis, Factor Analysis, Cluster
analysis, Multidimensional scaling, Probit analysis,
Forecasting/Time Series, Survival analysis, Nonparametric
analysis, Graphics and graphical interface.
Spatial Data Analysis

Spatial data
- information about phenomena organized in a spatial frame
- the geographic frame
Methods applied to spatial data that
- add value
- reveal patterns and anomalies
- support decisions
Spatial data is data or information that
identifies the geographic location of features and
boundaries on Earth, such as natural or constructed
features, oceans, and more.
Spatial data is usually stored as coordinates and topology,
and is data that can be mapped.
Why Spatial Data Analysis?

What types of relationships exist between geographic
features, and how do we express them?
Properties of spatial features and/or relationships
between them: size, distribution, pattern, contiguity,
neighborhood, shape, scale, orientation.
The purpose of geographic inquiry is to examine
relationships between geographic features collectively
and to use the relationships to describe the real-world
phenomena that map features represent.
Spatial analysis comprises techniques that enable the
representation, description, measurement, comparison, and
generation of spatial patterns.
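Two of the properties listed above, distribution and pattern, can be measured with very simple calculations. The sketch below is only an illustration: the point coordinates are hypothetical, and the mean centre and nearest-neighbour distance are chosen here as common introductory measures, not as methods prescribed by these slides.

```python
import math

# Hypothetical point features as (x, y) in arbitrary map units.
points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]

# Distribution: the mean centre is the average x and the average y.
mean_center = (
    sum(x for x, _ in points) / len(points),
    sum(y for _, y in points) / len(points),
)


def nearest_neighbor_distance(p, others):
    """Distance from p to the closest other point."""
    return min(math.dist(p, q) for q in others if q != p)


# Pattern: small nearest-neighbour distances relative to the overall
# extent suggest that the points are clustered.
nn = [nearest_neighbor_distance(p, points) for p in points]
mean_nn = sum(nn) / len(nn)

print(mean_center)
print(nn)
print(mean_nn)
```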
How can we characterize Spatial Analysis (what skills does it require)?

Spatial analysis is an artistic and a scientific
endeavor (what does this mean?)
- It requires knowledge of the problem and/or question to
be answered
- It requires knowledge about the data (how it was collected,
organized, coded, etc.)
- It requires knowledge of GIS capabilities
- It may require knowledge of statistical techniques
- It requires envisioning the results of any operation and
of any combination of operations
- It is not completely objective; in fact, some argue that it is
completely subjective
- Often there is more than one way to derive
information that answers a question
Components of Spatial Analysis

Visualization
- Showing interesting patterns
Exploratory Spatial Data Analysis (ESDA)
- Finding interesting patterns
Spatial Modeling, Regression
- Explaining interesting patterns
Spatial Data Types
- Vector and Raster
Raster Vs. Vector World
Vector Data

Vector data provide a way to represent real-world features
within the GIS environment. A vector feature has its shape
represented using geometry. The geometry is made up of
one or more interconnected vertices. A vertex describes a
position in space using x, y and optionally z coordinates. In the
vector data model, features on the earth are represented as
- Points
- Lines/routes
- Polygons/regions
- Triangulated Irregular Networks (TINs)
Vector Data

This system of recording features is based on the
interaction between arcs and nodes, represented by points,
lines and polygons. A point is a single node, a line is two
nodes with an arc between them, and a polygon is a closed
group of three or more arcs. With these three elements, it
is possible to record almost all necessary information.
[Figure: points, lines and polygons]
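The point, line and polygon elements can be sketched as plain coordinate lists. The example below is a minimal illustration, assuming made-up coordinates in arbitrary map units; the helper names `line_length` and `polygon_area` are hypothetical and not part of any GIS package.

```python
import math

# Hypothetical features, given as (x, y) vertices in arbitrary map units.
point = (2.0, 3.0)                                           # a single node
line = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0)]                  # nodes joined by arcs
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]   # closed ring


def line_length(vertices):
    """Sum the straight-line distances between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to close the ring.
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(line_length(line))      # length of the polyline
print(polygon_area(polygon))  # area of the rectangle
```

Because every vertex is stored explicitly, measurements such as length and area follow directly from the coordinates.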
Vector data

- Advantages: Data can be represented at its original
resolution and form without generalization. Graphic output
is usually more aesthetically pleasing. Since most data, e.g.
hard copy maps, is in vector form, no data conversion is
required. Accurate geographic location of data is
maintained.
- Disadvantages: The location of each vertex needs to be
stored explicitly. For effective analysis, vector data must be
converted into a topological structure. This is often
processing intensive and usually requires extensive data
cleaning. In addition, topology is static, and any updating or
editing of the vector data requires rebuilding of the
topology.
Raster Data

Raster data is cell-based data such as aerial imagery and
digital elevation models. Raster data is characterized by
pixel values. Basically, a raster file is a giant table, where
each pixel is assigned a specific value (from 0 to 255 in an
8-bit raster). The meaning behind these values is specified by
the user; they can represent elevation, temperature,
hydrology, etc.
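The "giant table" idea can be sketched with a nested list. The grid below is hypothetical, and treating its 8-bit values as elevation classes is an assumption made only for this illustration.

```python
# Hypothetical 4 x 5 raster of 8-bit cell values (0-255); the values
# are assumed here to encode elevation classes.
raster = [
    [10, 12, 15, 20, 22],
    [11, 14, 18, 25, 30],
    [13, 17, 40, 60, 45],
    [12, 16, 35, 55, 50],
]

n_rows, n_cols = len(raster), len(raster[0])

# Flatten the grid so simple statistics can be computed over all cells.
cells = [value for row in raster for value in row]

print("shape:", (n_rows, n_cols))
print("value at row 2, col 3:", raster[2][3])
print("min / max:", min(cells), max(cells))
print("mean:", sum(cells) / len(cells))
```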
Raster Data
- Advantages: The geographic location of each cell is implied by its
position in the cell matrix. Accordingly, other than an origin point,
e.g. bottom left corner, no geographic coordinates are stored. Due to
the nature of the data storage technique, data analysis is usually easy
to program and quick to perform. The inherent nature of raster
maps, e.g. one-attribute maps, is ideally suited for mathematical
modeling and quantitative analysis. Grid-cell systems are very
compatible with raster-based output devices, e.g. electrostatic
plotters, graphic terminals.
- Disadvantages: The cell size determines the resolution at which
the data is represented. It is especially difficult to adequately
represent linear features, depending on the cell resolution.
Accordingly, network linkages are difficult to establish. Processing of
associated attribute data may be cumbersome if large amounts of
data exist. Raster maps inherently reflect only one attribute or
characteristic for an area. Most output maps from grid-cell systems
do not conform to high-quality cartographic needs.
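The two points about implied cell location and cell size can be made concrete with a short sketch. It assumes a bottom-left origin, square cells and made-up coordinates; the function names `cell_center` and `cells_needed` are hypothetical.

```python
import math


def cell_center(row, col, origin_x, origin_y, cell_size):
    """Map coordinates of a cell centre; row 0 is the bottom row here."""
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y + (row + 0.5) * cell_size
    return x, y


def cells_needed(width, height, cell_size):
    """Number of square cells required to cover a width x height extent."""
    return math.ceil(width / cell_size) * math.ceil(height / cell_size)


# Hypothetical grid: origin at (500000, 4000000), 30-unit cells.
# Only the origin and the cell size are stored; every other location
# follows from the row and column indices.
print(cell_center(0, 0, 500000.0, 4000000.0, 30.0))
print(cell_center(2, 3, 500000.0, 4000000.0, 30.0))

# Halving the cell size quadruples the number of cells for this extent,
# which is the storage cost of a finer resolution.
print(cells_needed(3000, 3000, 30))
print(cells_needed(3000, 3000, 15))
```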
GIS

A GIS is designed for the collection, storage, and
analysis of objects and phenomena where geographic
location is an important characteristic or critical to
the analysis.
It is a computer tool for managing geographic feature
location data and data related to those features.
GIS is a tool for managing data about where features
are (geographic coordinate data) and what they are
like (attribute data), and for providing the ability to
query, manipulate, and analyze those data.
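The pairing of coordinate data with attribute data, and the two kinds of query it allows, can be sketched as follows. The `Feature` class, the sample records and the helper names are all hypothetical; a real GIS stores and indexes such data far more efficiently.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    x: float          # where the feature is (coordinate data)
    y: float
    kind: str         # what the feature is like (attribute data)
    population: int


# Hypothetical point features in arbitrary map units.
features = [
    Feature("A", 2.0, 3.0, "town", 12000),
    Feature("B", 8.5, 1.0, "village", 900),
    Feature("C", 4.0, 7.5, "town", 45000),
    Feature("D", 9.0, 9.0, "village", 1500),
]


def by_attribute(items, kind, min_population=0):
    """Attribute query: filter on what the features are like."""
    return [f for f in items if f.kind == kind and f.population >= min_population]


def in_bbox(items, xmin, ymin, xmax, ymax):
    """Spatial query: filter on where the features are."""
    return [f for f in items if xmin <= f.x <= xmax and ymin <= f.y <= ymax]


towns = by_attribute(features, "town", min_population=10000)
south_west = in_bbox(features, 0.0, 0.0, 5.0, 5.0)
print([f.name for f in towns])
print([f.name for f in south_west])
```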
GIS: Overview

Just as we use a word processor to write documents and
deal with words on a computer, we can use a GIS
application to deal with spatial information on a
computer. GIS stands for 'Geographical Information
System'. A GIS consists of:
Digital Data - the geographical information that you will
view and analyze using computer hardware and software.
Computer Hardware - computers used for storing
data, displaying graphics and processing data.
Computer Software - computer programs that run on
the computer hardware and allow you to work with digital
data. A software program that forms part of the GIS is
called a GIS Application.
Quantum GIS (QGIS)

First developed in 2002, it has since undergone significant
development.
Open-source, community-driven geographic
information system.
Fully functioning desktop geographic information
system.
Features of QGIS

Importing data from multiple sources
Digitizing
Editing
On-the-fly reprojection
Geoprocessing
Database connectivity
Raster processing
Benefits of QGIS

Very user-friendly
Simplified version of expensive GIS-based software
Program navigation is straightforward
Help tips for new users
Professional layout opportunities
Works on laptops
Challenges of GIS

How to characterize what is missing?
- error, accuracy, uncertainty
How to choose the best representation?
- confusing influences
How to support many data models in a
single software package?
Weaknesses of GIS

There are too many possible data models
- special-purpose GIS
- lack of interoperability
Difficult to add data models retroactively


Thanks

Reference: Introduction to Computer Science and
Basic Programming by S. Jain. BPB Publications,
India.
