Planning an Enterprise Geodatabase Solution
Robert Kircher, Chris Cushenbery
Enterprise Implementation Services and Product Development
Agenda
[Figure: Enterprise Geodatabase components and internals. Stack: ArcObjects, ArcSDE API, TCP/IP sockets, SQL, ArcSDE/DBMS server. GeoDatabase components: system tables, domains, locks, disconnected editing, spatial columns, raster, feature datasets, geometric networks, logs, topology, rules, geocoding, versioning, subtypes, feature classes. Spatial processing: fetching, inserting, modifying, searches.]
Complete Plan
[Figure: Work and job management (start, schedule, monitor, report) applied across analysis, QA/QC, data load, data extract, editors, and product work.]
• Large Solutions
– Consultant team, dedicated project management
– Custom applications
– Large user base
– Elaborate, large databases
– Multi-phased approach
• Workgroup Solutions
– Built in-house, part-time project management
– As much COTS functionality as possible
– Evolve the Geodatabase, gradually move old to
new
• Mixed mode of old and new can work here (old
ArcInfo accessing a GDB)
• Plan by datasets
– Start with a key dataset and evolve the
functionality around it (a great way to quickly gain
traction, momentum, and experience)
• Plan by functionality
– Pick discrete parts of functionality, prioritize them (most
bang, simplest first), and build them
• Plan by resources (budget, timeframe, and
staffing)
– Build what you can get …
• Data Availability
– Just business hours? 24x7? Epic “Five 9’s”?
– Automated fail-over options? Manual recovery?
• Data Recovery
– Recovery options? DBMS backups and utilities?
Various export and import tools?
– Recovery responsiveness?
• Data Access
– Accessing the database? Intranet? Internet?
Mobile scenarios?
– Means of access? Interfaces? ArcObjects?
ArcMap and ArcCatalog? SQL? ArcSDE C or Java
APIs?
– Interface maintenance? Version compatibility?
Major components -- cont’d
• Data monitoring
– Who uses my data? How much?
– Most active data?
– Database usage? Common queries? Operations?
Optimized accordingly?
– Alerts? Notifications? System crashes? Slow
transactions? Security breaches?
• Data infrastructure
– Hardware? Software? DBMS? Compatibility?
– Centralize it? Distribute it?
– Licensing issues? (notice how I gave this an
exclusive bullet)
– Other services (recovery, availability, replication,
etc.)?
– Development resources?
– Configuration control?
Major components -- cont’d
• Data replication
– Replicate? Why? Recovery? Performance?
Mobility? Accessibility?
– Architecture options?
• Data distribution
– Sharing data? Internet (ArcIMS)? Flat files
(shapefiles)? Automated replication? Direct
connection to ArcSDE? Disconnected edits?
– What data? How much?
• Data security
– Users?
– Roles?
Let’s Briefly Look at Each Component …
Data Modeling
• Essential Tasks
– Gather requirements
• Data Products
• Map and Visualization Products
• Analysis and Decision Support Products
• Maintenance and Editing Needs
• Metadata (editing and product)
• Spatial and Business Data Integration
– Analysis and Design
• Create conceptual data model
– Identify data, metadata, specifications, relationships
• Create physical data model, UML
– Identify GDB feature datasets, classes, relationships, domains,
subtypes, geometric networks, etc.
Data Modeling
• Essential Tasks
– Conceptual Modeling
• Document what will be in the spatial database and how the
data will be maintained, how it will interact, and how it will be published
(conceptual modeling deliverables)
– Physical Modeling
• Document the physical data model in UML and code
• Build the physical model into a Geodatabase instance
Data Modeling
• Key Deliverables
– Requirements documentation
– UML-based data model or script/code-generated data
model
• Challenges and Risks
– Application development has a critical dependency on
the modeling deliverables
– Normalization balance (over- versus de-normalized)
– Changing the model downstream, propagating schema
changes
– Thorough review of model among publication,
maintenance, and vendor teams
– Optimized for both publication and maintenance needs
• ESRI Resources
– ESRI library of essential industry data models
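A script/code-generated data model keeps the schema in version-controlled code, so downstream schema changes can be regenerated and reviewed instead of hand-edited. The Python sketch below illustrates the idea with a hypothetical `TableModel` description rendered to generic SQL DDL, with coded-value domains expressed as CHECK constraints. This is an assumption for illustration only: an actual Geodatabase records domains and subtypes in its own system tables and is normally built through ArcGIS tools or ArcObjects rather than hand-written DDL.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Field:
    name: str
    sql_type: str
    domain: tuple[str, ...] = ()  # allowed coded values; empty means unconstrained


@dataclass
class TableModel:
    name: str
    fields: list[Field] = field(default_factory=list)


def to_ddl(table: TableModel) -> str:
    """Render a generic SQL CREATE TABLE; coded-value domains become CHECK constraints."""
    columns = []
    for f in table.fields:
        column = f"  {f.name} {f.sql_type}"
        if f.domain:
            allowed = ", ".join(f"'{value}'" for value in f.domain)
            column += f" CHECK ({f.name} IN ({allowed}))"
        columns.append(column)
    return f"CREATE TABLE {table.name} (\n" + ",\n".join(columns) + "\n);"


# Hypothetical hydrant table used only to exercise the generator.
hydrants = TableModel("HYDRANT", [
    Field("HYDRANT_ID", "INTEGER"),
    Field("STATUS", "VARCHAR(10)", ("ACTIVE", "RETIRED")),
])
print(to_ddl(hydrants))
```

Keeping the model as data means a schema change is one edit that regenerates every artifact, which speaks directly to the risk of propagating schema changes downstream.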
Data Procurement and Loading
• Building a “one-off” loading system for the initial load of the
Geodatabase
• Essential Tasks
– Requirements
• Identify data requirements (spatial, business)
• Identify conversion and translation requirements
• Identify data staging needs
• Identify conversion automation
– Analysis and Design
• Discover data sources (formats, internal, commercial)
• Define conversion toolset
– Simple Data Loader, Object Loader
– ModelBuilder, Geoprocessing, Interoperability extension
– Custom data loader
– ArcSDE data loaders (very simple data)
• Create automation
• Discover anticipated data volumes for storage and DBMS
sizing (number and size of features)
• Define toolset and conversion methods
• Discover any required loading order
• Create strategies for big data
• Define data verification/QC tools and methods
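Storage and DBMS sizing from anticipated data volumes is mostly arithmetic: expected feature counts times average feature size, plus overhead. A minimal Python sketch, assuming average bytes per feature are measured from a sample load and that a flat 30% index allowance is acceptable (both are assumptions for illustration, not ArcSDE figures):

```python
def estimate_storage_gb(feature_counts, avg_bytes_per_feature, index_overhead=0.3):
    """Rough initial storage estimate in GiB.

    feature_counts: feature class name -> expected number of features
    avg_bytes_per_feature: feature class name -> average bytes per feature
        (geometry plus attributes), ideally measured from a sample load
    index_overhead: assumed fractional allowance for spatial and attribute indexes
    """
    raw_bytes = sum(count * avg_bytes_per_feature[name]
                    for name, count in feature_counts.items())
    return raw_bytes * (1 + index_overhead) / 1024**3


# Hypothetical inputs: 2 million parcels at ~1.5 KB and 500,000 hydrants at ~300 bytes.
print(round(estimate_storage_gb({"parcels": 2_000_000, "hydrants": 500_000},
                                {"parcels": 1536, "hydrants": 300}), 2))
```

The estimate is only as good as the measured averages, which is one more reason to run the pilot load before committing to hardware.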
Data Procurement and Loading cont’d
• Implementation
– Build data loading system for development and deployment
• Procure data
• Stage data
• Load data
• View/use/verify data
• Key Deliverables
– Requirements documents
– Create toolset
– Create loading automation
– Pilot project (full cycle)
• Identify valid data subset
• Procure data
• Stage it
• Load it
• Use and verify it
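The pilot's last step, using and verifying the data, is easier to repeat on every load if the verification is automated. A minimal sketch, assuming Python's built-in `sqlite3` as a stand-in for the target DBMS and a hypothetical two-column table: it loads staged rows, then compares a row count plus an order-independent hash of the staged and loaded rows. A real Geodatabase check would likely also need to compare geometry, not just attributes.

```python
import hashlib
import sqlite3


def fingerprint(rows):
    """Order-independent fingerprint: row count plus a SHA-256 of the sorted rows."""
    digest = hashlib.sha256()
    for row in sorted(rows):
        digest.update(repr(tuple(row)).encode("utf-8"))
    return len(rows), digest.hexdigest()


def load_and_verify(conn, table, staged_rows):
    """Load staged (id, name) rows into a new table, then verify the loaded copy."""
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(f"INSERT INTO {table} (id, name) VALUES (?, ?)", staged_rows)
    conn.commit()
    loaded = conn.execute(f"SELECT id, name FROM {table}").fetchall()
    return fingerprint(staged_rows) == fingerprint(loaded)


# Hypothetical staged subset standing in for the pilot's valid data subset.
conn = sqlite3.connect(":memory:")
print(load_and_verify(conn, "pilot_parcels", [(1, "Parcel A"), (2, "Parcel B")]))
```

Sorting before hashing makes the check independent of the order in which the DBMS returns rows.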
Data Maintenance
• Essential Tasks
– Requirements
• Identify maintenance workflow requirements (who, what,
when, collaboration, automation, objects, batch
automation, etc.)
• Identify anticipated data volatility
– Analysis and Design
• Define editing workflows (simple, cyclical, elaborate,
optimization opportunities beyond current workflow)
• Define reconcile, post, and compress regimes
• Define versioning structure
• Define specifics about edit volumes, version durations,
etc. that impact performance
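Version duration is one of the specifics that affects performance, since long-lived, unreconciled versions limit how much a compress can remove. A hedged sketch of one possible policy check, assuming version names and creation times have already been read from the Geodatabase (querying them is not shown) and a hypothetical 14-day maximum age:

```python
from datetime import datetime, timedelta


def stale_versions(versions, now, max_age_days=14):
    """Names of edit versions older than the maximum age allowed by policy.

    versions: iterable of (version_name, created_at) pairs.
    max_age_days: hypothetical policy limit, not an ArcSDE default.
    """
    limit = timedelta(days=max_age_days)
    return sorted(name for name, created_at in versions if now - created_at > limit)


# Hypothetical versions; the long-lived one is a candidate for reconcile and post.
versions = [("EDITOR1.PARCEL_SPLIT", datetime(2024, 1, 2)),
            ("EDITOR2.HYDRANT_FIX", datetime(2024, 1, 29))]
print(stale_versions(versions, now=datetime(2024, 1, 31)))
```

The right limit depends on the editing workflows defined above; a cyclical workflow may tolerate longer versions than a simple one.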
Data Maintenance cont’d
– Implementation
• Configure versions in the workflow or develop them
into the application
• Build administration toolset (DBMS, GDB,
compress, version folding, etc.)
Data Performance
Data Performance and Scalability (QoS)
• Essential Tasks
– Review anticipated data loads
• Volume (data file growth management)
• Extent characteristics (spatial index tuning)
• Volatility (storage partitioning)
– Identify key business transactions
• Maintenance operations
• Publication operations
– Identify QoS requirements for key business transactions
• Response time
• Initial and scheduled user loads
• Throughput
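QoS requirements for key business transactions are easiest to verify when stated as measurable targets. A minimal sketch, assuming response times (in seconds) were collected for one transaction type during a test window; the 95th-percentile target and the nearest-rank percentile method are illustrative choices, not requirements from the source:

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a list of response times."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_qos(samples, window_seconds, p95_target, min_throughput):
    """Check one measured test window against response-time and throughput targets."""
    p95 = percentile(samples, 95)
    throughput = len(samples) / window_seconds  # completed transactions per second
    return p95 <= p95_target and throughput >= min_throughput


# Hypothetical run: 20 map-refresh transactions in 10 s, timed in seconds.
timings = [0.4, 0.5, 0.6, 0.5, 0.7, 0.8, 0.5, 0.6, 0.9, 1.2,
           0.5, 0.6, 0.7, 0.4, 0.5, 0.6, 2.5, 0.6, 0.7, 0.5]
print(meets_qos(timings, window_seconds=10, p95_target=2.0, min_throughput=1.5))
```

A percentile target is used rather than an average because a few slow outliers (the 2.5 s sample here) are exactly what users notice.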
Data Performance (QoS)
• Deliverables
– Document requirements
– Execute iterations of performance testing, analysis, and optimization
– Tuning DBMS, tuning application
– Scaling strategy
• Scale out vs. scale up
• Challenges and Risks
– Sizing the spatial index optimally
– Data too granular
• Group features
– Overloading your application
• Overloading application table of contents
• Building batch-like operations into application
– ... many others (please attend the performance-related GDB
sessions at the conference; this is an important topic)
Data Access
• Essential Tasks
– Review DBMS and ArcSDE interfaces
• DBMS: JDBC, SQL, OLEDB/ADO, ODBC, DBMS-specific
APIs
• ArcSDE: OLEDB/ArcObjects, C and Java API, SQL
– Identify database access and interface needs
• Mobile needs
• Development environment (SQL, ArcObjects,
MapObjects, C or Java API)
• Direct connect, traditional three-tier, or multi-tier ArcSDE configuration
Data Access
• Essential Tasks
– Identify non-GIS application needs
• GIS attribute data
• Business reports based on GIS data or
processing
– Define and configure the application interfaces
based on application needs
• Network configuration (host and ports)
• Client libraries (e.g. SQL*Net, Java libraries, ArcSDE
client libraries, etc.)
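When configuring hosts and ports, a quick reachability test helps separate network problems from client-library problems. A minimal Python sketch that only confirms a TCP connection can be opened; it does not speak the ArcSDE protocol. Port 5151 is the port commonly used for the ArcSDE service (`esri_sde`), but check your own service configuration; direct connect configurations do not go through this service port.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False
```

For example, `can_reach("gisdb.example.org", 5151)` (a hypothetical host) returning False points at firewalls, DNS, or a stopped service before any client library is involved.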
Data Access cont’d
• Deliverables
– Document requirements and design
– Clients correctly configured to access ArcSDE
instance
• Challenges and Risks
– Compatibility issues
– Interface limitations (no interface can do all
business and GIS application operations)
Data Monitoring
• Essential Tasks
– Identify alerting and notification needs
• Functionality-related alerts
– DBMS crash
– Query activity
• Performance-related alerts
– System usage
– System loads
– SQL query loads
• Notification tools and infrastructure
– Email
– Telephony
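Performance-related alerts can be expressed as thresholds evaluated against collected metrics, with the notification channel kept separate. A minimal sketch, assuming metric values are gathered elsewhere (the metric names, thresholds, and addresses are hypothetical); it builds an email with Python's standard `email` package but leaves sending, for example through `smtplib`, to the deployment:

```python
from email.message import EmailMessage


def check_thresholds(metrics, thresholds):
    """Return one alert string per metric that exceeds its threshold."""
    return [
        f"{name}={value} exceeds {thresholds[name]}"
        for name, value in sorted(metrics.items())
        if name in thresholds and value > thresholds[name]
    ]


def build_alert_email(alerts, sender, recipient):
    """Package alerts as an email message; actually sending it is out of scope."""
    msg = EmailMessage()
    msg["Subject"] = f"GDB monitoring: {len(alerts)} alert(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(alerts))
    return msg


alerts = check_thresholds({"active_sessions": 250, "cpu_percent": 40},
                          {"active_sessions": 200, "cpu_percent": 90})
message = build_alert_email(alerts, "gdb-monitor@example.org", "dba-team@example.org")
print(message["Subject"])
```

Separating evaluation from delivery lets the same checks feed email, telephony, or another notification tool.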
Data Security
• Essential Tasks
– Review DBMS authentication schemes
• Integrated with OS and network domain
security
• Standard DBMS security
• Mixed mode
• Users and roles
– Identify anticipated users (GIS and business
applications), and accessible objects (spreadsheet
sitting here)
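Mapping anticipated users to roles, and roles to accessible objects, translates naturally into SQL privileges. A sketch that generates role and grant statements from that mapping; the role, table, and user names are hypothetical, and the `CREATE ROLE` / `GRANT role TO user` syntax shown follows Oracle and PostgreSQL, so other DBMSs (SQL Server, for example) need different statements for role membership:

```python
def grant_statements(role_privileges, user_roles):
    """Emit SQL for role-based access.

    role_privileges: role -> {table: [privileges]}
    user_roles: user -> [roles]
    """
    statements = [f"CREATE ROLE {role};" for role in sorted(role_privileges)]
    for role in sorted(role_privileges):
        for table, privileges in sorted(role_privileges[role].items()):
            statements.append(f"GRANT {', '.join(privileges)} ON {table} TO {role};")
    for user in sorted(user_roles):
        for role in user_roles[user]:
            statements.append(f"GRANT {role} TO {user};")
    return statements


for stmt in grant_statements(
        {"gis_viewer": {"PARCELS": ["SELECT"]},
         "gis_editor": {"PARCELS": ["SELECT", "INSERT", "UPDATE", "DELETE"]}},
        {"jsmith": ["gis_editor"]}):
    print(stmt)
```

Granting to roles rather than individual users keeps the security model reviewable as the user list changes.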
Data Infrastructure
• Essential Tasks
– Requirements
• Identify hardware and software requirements based on functional
and system needs
– Development and test
– Production
– Licensing
– System capacity and growth
– Storage needs
– Host CPU, RAM
– Network bandwidth
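Because capacity is difficult to size early, it helps to project when provisioned storage will run out under an assumed growth rate and to revisit the projection as real numbers arrive. A minimal sketch using compound annual growth; the figures in the example are hypothetical:

```python
def years_until_full(current_gb, capacity_gb, annual_growth):
    """Whole years until compound growth pushes usage past capacity.

    Returns 0 if usage already exceeds capacity, or None if usage never grows.
    """
    if current_gb > capacity_gb:
        return 0
    if annual_growth <= 0 or current_gb <= 0:
        return None
    years, size = 0, current_gb
    while size <= capacity_gb:
        size *= 1 + annual_growth
        years += 1
    return years


# Hypothetical: 400 GB in use, 1 TB provisioned, 25% growth per year.
print(years_until_full(400, 1000, 0.25))  # prints 5
```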
Data Infrastructure
– Implementation
• Create ArcSDE and DBMS instances
– Development, Test, and Deployment
Data Infrastructure cont’d
• Deliverables
– Configured development and test environment
– Configured production environment
– Create and enforce configuration control plan
• Challenges and Risks
– Creating nimble development and test
environments (able to change quickly)
– Controlled and predictable configuration control
– Difficult to size and plan for capacity early on
(mitigation tactics exist, such as scaling out or up)
Data Recovery
• Essential Tasks
– Review and understand recovery options
• DBMS backup resources
• DBMS import/export
• 3rd party DBMS solutions
• ArcToolbox conversion tools (various flat files)
• ArcSDE sdeimport/sdeexport
• GDB copy/paste
• Disconnected editing
• Incremental vs full backups
Data Recovery
• Deliverables
– Document requirements and design
– Document recovery procedures
• Challenges and Risks
– Not verifying that backups actually work by testing them
regularly
– Time constraints may not allow full backups, only
deltas
– Stringing together backup changes
– Disk size limitations
– Automation fraught with complications
– Recovering application specific data, configuration
files
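Testing that a backup actually works means opening and checking the copy, not just confirming the job ran. A minimal sketch of that habit, using the online backup API of Python's `sqlite3` module as a stand-in for DBMS backup tools (an assumption; enterprise DBMSs have their own utilities): it copies the database, runs an integrity check on the copy, and compares row counts for the listed tables.

```python
import sqlite3


def backup_and_verify(source, tables):
    """Copy `source` with the online backup API, then check that the copy is usable.

    Returns the backup connection, or raises RuntimeError if a check fails.
    """
    backup = sqlite3.connect(":memory:")
    source.backup(backup)
    if backup.execute("PRAGMA integrity_check").fetchone()[0] != "ok":
        raise RuntimeError("backup failed its integrity check")
    for table in tables:
        expected = source.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        actual = backup.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        if expected != actual:
            raise RuntimeError(f"{table}: {expected} rows in source, {actual} in backup")
    return backup
```

In practice the copy would go to separate storage and be checked from a different host, so the test also exercises the restore path.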
Data Replication
• Essential Tasks
– Review and understand common replication
configurations
• Snapshot
• Multi-master/merge
• Transactional
• Hybrid
– Review replication options
• Device level, OS level, DBMS level
• 3rd party solutions
• ArcGIS solutions (disconnected editing,
extracts, distributed replication, archiving)
Data Replication
• Essential Tasks
– Requirements
• Identify replication uses and benefits
– Performance/load balancing
– Mobility
– Recovery
– Availability
– Network load reduction (be careful here)
• Identify data to be replicated
• Identify QoS requirements
– how fast should changes replicate?
– how frequent is acceptable?
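The replication QoS questions (how fast, how often) can be checked by comparing when each change was committed on the master with when it was applied on the replica. A minimal sketch, assuming both sets of timestamps can be collected under a shared change identifier (capturing them depends on the replication product and is not shown; the identifiers are hypothetical):

```python
from datetime import datetime, timedelta


def replication_lags(master_commits, replica_applies):
    """Per-change lag between master commit and replica apply.

    Both arguments map a change id to a timestamp; changes not yet
    applied on the replica get a lag of None.
    """
    return {
        change: (replica_applies[change] - committed) if change in replica_applies else None
        for change, committed in master_commits.items()
    }


def within_qos(lags, max_lag):
    """True only if every change has replicated within the allowed lag."""
    return all(lag is not None and lag <= max_lag for lag in lags.values())


t0 = datetime(2024, 1, 1, 12, 0, 0)
lags = replication_lags({"c1": t0, "c2": t0}, {"c1": t0 + timedelta(seconds=30)})
print(within_qos(lags, max_lag=timedelta(minutes=5)))
```

Treating an unapplied change as a failure, rather than ignoring it, keeps a stalled replica from looking healthy.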
Data Replication cont’d
• Deliverables
– Document requirements and design
– Full cycle of prototyping
• Procure and configure replication
software/hardware
• Build master database
• Modify data, and measure the success and
performance of the replica
– Configured and tested replication system
Data Availability (QoS)
• Essential Tasks
– Analysis and Design
• Define availability architecture
– Implementation
• Configure availability architecture
– Test
• Test availability failure scenarios against QoS
requirements
Data Availability (QoS) cont’d
• Deliverables
– Document requirements and design
– Design, procure, and configure availability
architecture
– Testing
• Challenges and Risks
– Identifying and mitigating
• single points of failure in your data system
• developing redundant systems
– clustering may not be enough
– Not consistently testing the system
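Availability targets such as “five 9's” translate directly into allowed downtime: 99.999% leaves about 5.26 minutes per year. Redundancy can then be reasoned about with simple probability. A minimal sketch, assuming component failures are independent (real failures often are not, which is one reason clustering alone may not be enough): components in series multiply their availabilities, while redundant replicas in parallel fail only when all of them fail.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def downtime_minutes_per_year(availability):
    """Allowed downtime per year for an availability fraction (e.g. 0.99999)."""
    return (1 - availability) * MINUTES_PER_YEAR


def serial(*availabilities):
    """Every component must be up: each one is a potential single point of failure."""
    return math.prod(availabilities)


def parallel(*availabilities):
    """Redundant replicas: the group is down only if every replica is down."""
    return 1 - math.prod(1 - a for a in availabilities)


# Hypothetical system: a two-node cluster (0.999 each) sharing one storage array (0.9995).
system = serial(parallel(0.999, 0.999), 0.9995)
print(round(downtime_minutes_per_year(system)))
```

In this hypothetical case the result is roughly 263 minutes a year, nearly all of it from the shared storage array, which shows why a cluster in front of a single point of failure does not buy much.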
Data Distribution
• Essential Tasks
– Review and understand distribution options
• GDB to GDB, GDB to shapefile, GDB to
coverage
• Disconnected editing, distributed replication
• DBMS export file
• ArcIMS, ArcGIS Server
• Replication services (see replication)
– Requirements
• Identify data consumers
– ArcGIS applications
– Custom ArcObjects applications
Data Distribution
• Essential Tasks
– Requirements
• Identify delivery needs
– Response time and throughput
– Data volume
– Distribution medium (network, DVD, etc.)
• Identify distribution infrastructure requirements
– Bandwidth
– WAN/LAN/wireless
– Capacity
– Security issues
Data Distribution cont’d
• Deliverables
– Document data distribution needs
• Consumers
• Delivery
• Destination
– Configured distribution mechanisms
• Challenges and Risks
– Bandwidth limitations for moving data
– Size of data
– Advanced data types (annotation, geometric networks)
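Bandwidth limitations and data size combine into a simple estimate of how long a network distribution will take, which helps decide between network delivery and physical media such as DVD. A minimal sketch; the 70% effective-throughput factor is an assumption covering protocol overhead and contention, not a measured value:

```python
def transfer_hours(size_gb, link_mbps, efficiency=0.7):
    """Hours to move size_gb (decimal gigabytes) over a link of link_mbps megabits/s.

    efficiency: assumed fraction of nominal bandwidth actually achieved.
    """
    bits = size_gb * 8 * 1000**3
    return bits / (link_mbps * 1_000_000 * efficiency) / 3600


# Hypothetical: a 45 GB extract over a 100 Mbps WAN link vs. a 1.5 Mbps T1.
print(round(transfer_hours(45, 100), 1))
print(round(transfer_hours(45, 1.5), 1))
```

Under these assumptions the same extract takes roughly 1.4 hours on the faster link and about 95 hours on the T1, the kind of gap that makes shipped media or incremental replication worth considering.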
Other Planning Observations
Deployment Approaches
• “Big Bang”
– Release the entire system at once (data,
functionality, staff, etc.)
• Geographic Rollout
– Users by geographic regions
– Data by geographic regions
• Functional Rollout
– Release by discrete functionality
• Dataset Rollout
– Release by dataset
Other Planning Issues
• Training
– Introduction to editing and versioning
– Introduction to enterprise Geodatabase
administration
• ArcSDE admin tools
• DBMS admin
– Introduction to enterprise Geodatabase
application development
• ArcObjects
• MapObjects
• ArcSDE C and Java API
• SQL
Robert Kircher
[email protected]
Chris Cushenbery
[email protected]