Database Applications - The UC Berkeley Environmental Digital Library
2002.11.07- SLIDE 1
IS 257 - Fall 2002
Database Applications -- The UC
Berkeley Environmental Digital
Library
University of California, Berkeley
School of Information Management
and Systems
SIMS 257: Database Management
2002.11.07- SLIDE 2
Lecture Outline
Review
Database Administration
Database Applications
Berkeley's Environmental Digital Library
2002.11.07- SLIDE 3
Final Project Requirements
See WWW site:
http://sims.berkeley.edu/courses/is257/f02/index.html
Report on personal/group database including:
Database description and purpose
Data Dictionary
Relationships Diagram
Sample queries and results (Web or Access tools)
Sample forms (Web or Access tools)
Sample reports (Web or Access tools)
Application Screens (Web or Access tools)
2002.11.07- SLIDE 4
Final Presentations and Reports
Specifications for final report are on the
Web Site under assignments
Presentations (1 on Nov. 28, others on
Nov. 30, Dec. 5th and 7th (full))
2002.11.07- SLIDE 5
Lecture Outline
Review
Database Administration
Database Applications
Berkeley's Environmental Digital Library
2002.11.07- SLIDE 6
Terms and Concepts (trad)
Data Administration
Responsibility for the overall management
of data resources within an organization
Database Administration
Responsibility for physical database design
and technical issues in database
management
In some organizations these roles are
combined or overlap
2002.11.07- SLIDE 7
Database System Life Cycle
Database Planning
Database Analysis
Database Design
Database Implementation
Operation & Maintenance
Growth & Change
Note: this is a different version of this
life cycle than discussed previously
2002.11.07- SLIDE 8
Database Planning: DA & DBA functions
Develop corporate database strategy (DA)
Develop enterprise model (DA)
Develop cost/benefit models (DA)
Design database environment (DA)
Develop data administration plan (DA)
2002.11.07- SLIDE 9
Database Analysis: DA & DBA functions
Define and model data requirements (DA)
Define and model business rules (DA)
Define operational requirements (DA)
Maintain corporate Data Dictionary (DA)
2002.11.07- SLIDE 10
Database Design: DA & DBA functions
Perform logical database design (DA)
Design external models (subschemas)
(DBA)
Design internal model (Physical design)
(DBA)
Design integrity controls (DBA)
2002.11.07- SLIDE 11
Database Implementation: DA & DBA functions
Specify database access policies (DA & DBA)
Establish Security controls (DBA)
Supervise Database loading (DBA)
Specify test procedures (DBA)
Develop application programming standards
(DBA)
Establish procedures for backup and recovery
(DBA)
Conduct User training (DA & DBA)
2002.11.07- SLIDE 12
Operation and Maintenance: DA & DBA functions
Monitor database performance (DBA)
Tune and reorganize databases (DBA)
Enforce standards and procedures (DBA)
Support users (DA & DBA)
2002.11.07- SLIDE 13
Growth & Change: DA & DBA functions
Implement change control procedures (DA
& DBA)
Plan for growth and change (DA & DBA)
Evaluate new technology (DA & DBA)
2002.11.07- SLIDE 14
Functions in Database Administration
Planning and Design (we have already
looked at these processes in detail)
Data Integrity
Backup and Recovery
Security Management
2002.11.07- SLIDE 15
Data Integrity
Intrarecord integrity (enforcing constraints
on contents of fields, etc.)
Referential Integrity (enforcing the validity
of references between records in the
database)
Concurrency control (ensuring the validity
of database updates in a shared multiuser
environment)
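The first two kinds of integrity can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the original slides: the `species` and `sighting` tables and their columns are hypothetical, and concurrency control is not shown.

```python
import sqlite3


def integrity_demo():
    """Return the names of the integrity rules that SQLite enforced."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    # Hypothetical tables, used only for illustration.
    conn.execute(
        "CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE sighting ("
        " id INTEGER PRIMARY KEY,"
        " species_id INTEGER NOT NULL REFERENCES species(id),"
        " n_plants INTEGER NOT NULL CHECK (n_plants > 0))"
    )
    conn.execute("INSERT INTO species (id, name) VALUES (1, 'Quercus lobata')")
    conn.execute("INSERT INTO sighting (species_id, n_plants) VALUES (1, 3)")

    enforced = []
    try:
        # Intrarecord integrity: the CHECK constraint rejects n_plants = 0.
        conn.execute("INSERT INTO sighting (species_id, n_plants) VALUES (1, 0)")
    except sqlite3.IntegrityError:
        enforced.append("intrarecord")
    try:
        # Referential integrity: species 99 does not exist.
        conn.execute("INSERT INTO sighting (species_id, n_plants) VALUES (99, 1)")
    except sqlite3.IntegrityError:
        enforced.append("referential")
    conn.close()
    return enforced


if __name__ == "__main__":
    print(integrity_demo())
```

Both invalid inserts are rejected by the DBMS itself, so the rules hold no matter which application writes to the database.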
2002.11.07- SLIDE 16
Database Security
Views or restricted subschemas
Authorization rules to identify users and
the actions they can perform
User-defined procedures (and rule
systems) to define additional constraints or
limitations in using the database
Encryption to encode sensitive data
Authentication schemes to positively
identify a person attempting to gain access
to the database
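A view used as a restricted subschema can be sketched as follows, again with Python's sqlite3 module. The `employee` table, its rows, and the `employee_public` view are hypothetical. SQLite has no user accounts, so the authorization step (letting users read the view but not the base table) appears only as a comment.

```python
import sqlite3


def public_view():
    """Return the column names and rows visible through a restricted view."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical base table that includes a sensitive column (salary).
    conn.execute(
        "CREATE TABLE employee ("
        " id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
        [("Ana", "Maps", 70000), ("Raj", "Docs", 65000)],
    )
    # The view exposes everything except the salary column.
    conn.execute(
        "CREATE VIEW employee_public AS SELECT id, name, dept FROM employee"
    )
    # In a multiuser DBMS, authorization rules would then allow ordinary
    # users to query the view only, not the underlying table.
    cur = conn.execute("SELECT * FROM employee_public ORDER BY id")
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    conn.close()
    return columns, rows


if __name__ == "__main__":
    print(public_view())
```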
2002.11.07- SLIDE 17
Database Backup and Recovery
Backup
Journaling (audit trail)
Checkpoint facility
Recovery manager
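The backup step alone can be sketched with the `Connection.backup` method of Python's sqlite3 module (Python 3.7 or later). The `dam` table and its single row are hypothetical, and journaling, checkpoints, and a recovery manager are not shown.

```python
import sqlite3


def backup_and_recover():
    """Back up a database, lose the live data, then restore from the copy."""
    live = sqlite3.connect(":memory:")
    # Hypothetical table, used only for illustration.
    live.execute("CREATE TABLE dam (id INTEGER PRIMARY KEY, name TEXT)")
    live.execute("INSERT INTO dam (name) VALUES ('Hypothetical Dam A')")
    live.commit()

    # Backup: copy the committed state into a second database.
    backup = sqlite3.connect(":memory:")
    live.backup(backup)

    # Simulated failure: the live data is destroyed.
    live.execute("DELETE FROM dam")
    live.commit()
    lost = live.execute("SELECT COUNT(*) FROM dam").fetchall()[0][0]

    # Recovery: rebuild a fresh database from the backup copy.
    recovered = sqlite3.connect(":memory:")
    backup.backup(recovered)
    restored = recovered.execute("SELECT COUNT(*) FROM dam").fetchall()[0][0]

    for conn in (live, backup, recovered):
        conn.close()
    return lost, restored


if __name__ == "__main__":
    print(backup_and_recover())
```

A backup like this restores only the state at the time of the copy. Recovering later changes is what the journal (audit trail) is for.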
2002.11.07- SLIDE 18
Disaster Recovery Planning
Testing and Training
Procedures Development
Budget & Implement
Plan Maintenance
Recovery Strategies
Risk Analysis
From Toigo, "Disaster Recovery Planning"
2002.11.07- SLIDE 19
Threats to Assets and Functions
Water
Fire
Power Failure
Mechanical breakdown or software failure
Accidental or deliberate destruction of
hardware or software
By hackers, disgruntled employees, industrial
saboteurs, terrorists, or others
2002.11.07- SLIDE 20
Threats
Between 1967 and 1978 fire and water
damage accounted for 62% of all data
processing disasters in the U.S.
The water damage was sometimes
caused by fighting fires
More recently, improvements in fire
suppression (e.g., Halon) for DP centers
have meant that water is the primary
danger to DP centers
2002.11.07- SLIDE 21
Kinds of Records
Class I: VITAL
Essential, irreplaceable or necessary to recovery
Class II: IMPORTANT
Essential or important, but reproducible with difficulty
or at extra expense
Class III: USEFUL
Records whose loss would be inconvenient, but which
are replaceable
Class IV: NONESSENTIAL
Records which upon examination are found to be no
longer necessary
2002.11.07- SLIDE 22
Offsite Storage of Data
Early offsite storage facilities were often
intended to survive atomic explosions
PRISM International directory
Mirror sites (Hot sites)
E.g. Cantor-Fitzgerald
2002.11.07- SLIDE 23
Lecture Outline
Review
Database Administration
Database Applications
Berkeley's Environmental Digital Library
2002.11.07- SLIDE 24
Berkeley DL Project
Object Relational Database Applications
The Berkeley Digital Library Project
Slides from RRL and Robert Wilensky, EECS
Use of DBMS in DL project
2002.11.07- SLIDE 25
Overview
What is a Digital Library?
Overview of Ongoing Research on
Information Access in Digital Libraries
2002.11.07- SLIDE 26
Digital Libraries Are Like Traditional Libraries...
Involve large repositories of information
(storage, preservation, and access)
Provide information organization and
retrieval facilities (categorization, indexing)
Provide access for communities of users
(communities may be as large as the
general public or small as the employees
of a particular organization)
2002.11.07- SLIDE 27
Originators
Libraries
Users
Traditional Library System
2002.11.07- SLIDE 28
But Digital Libraries Are Different From
Libraries...
Not a physical location with local copies;
objects held closer to originators
Decoupling of storage, organization,
access
Enhanced Authoring (origination,
annotation, support for work groups)
Subscription, pay-per-view supported in
addition to free browsing.
Integration into user tasks.
2002.11.07- SLIDE 29
Originators
Repositories
Users
Index
Services
Network
A Digital Library Infrastructure Model
2002.11.07- SLIDE 30
UC Berkeley Digital Library Project
Focus: Work-centered digital information
services
Testbed: Digital Library for the California
Environment
Research: Technical agenda supporting
user-oriented access to large distributed
collections of diverse data types.
Part of the NSF/NASA/DARPA Digital
Library Initiative (Phases 1 and 2)
2002.11.07- SLIDE 31
UCB Digital Library Project: Research
Organizations
UC Berkeley EECS, SIMS, CED, IS&T
UCOP/CDL
Xerox PARC's Document Image Decoding group
and Work Practices group
Hewlett-Packard
NEC
SUN Microsystems
IBM Almaden
Microsoft
Ricoh California Research
Philips Research
2002.11.07- SLIDE 32
Testbed: An Environmental Digital Library
Collection: Diverse material relevant to
California's key habitats.
Users: A consortium of state agencies,
development corporations, private
corporations, regional government
alliances, educational institutions, and
libraries.
Potential: Impact on state-wide
environmental system (CERES)
2002.11.07- SLIDE 33
The Environmental Library -
Users/Contributors
California Resources Agency, California
Environmental Resources Evaluation
System (CERES)
California Department of Water Resources
The California Department of Fish & Game
SANDAG
UC Water Resources Center Archives
New Partners: CDL and SDSC
2002.11.07- SLIDE 34
The Environmental Library - Contents
Environmental technical reports, bulletins, etc.
County general plans
Aerial and ground photography
USGS topographic maps
Land use and other special purpose maps
Sensor data
Derived information
Collection databases for the classification and
distribution of the California biota (e.g.,
SMASCH)
Supporting 3-D, economic, traffic, etc. models
Videos collected by the California Resources
Agency
2002.11.07- SLIDE 35
The Environmental Library - Contents
As of late 2002, the collection represents
over one terabyte of data, including over
183,000 digital images, about 300,000
pages of environmental documents, and
over 2 million records in geographical and
botanical databases.
2002.11.07- SLIDE 36
Botanical Data:
The CalFlora Database contains
taxonomical and distribution information
for more than 8000 native California
plants. The Occurrence Database includes
over 600,000 records of California plant
sightings from many federal, state, and
private sources. The botanical databases
are linked to the CalPhotos collection of
California plants, and are also linked to
external collections of data, maps, and
photos.
2002.11.07- SLIDE 37
Geographical Data:
Much of the geographical data in the collection
has been used to develop our web-based GIS
Viewer. The Street Finder uses 500,000 TIGER
records of S.F. Bay Area streets along with the
70,000 records from the USGS GNIS database.
California Dams is a database of information
about the 1395 dams under state jurisdiction. An
additional 11 GB of geographical data
represents maps and imagery that have been
processed for inclusion as layers in our GIS
Viewer. This includes Digital Ortho Quads and
DRG maps for the S.F. Bay Area.
2002.11.07- SLIDE 38
Documents:
Most of the 300,000 pages of digital documents are
environmental reports and plans that were provided by
California state agencies. This collection includes
documents, maps, articles, and reports on the California
environment including Environmental Impact Reports
(EIRs), educational pamphlets, water usage bulletins,
and county plans. Documents in this collection come
from the California Department of Water Resources
(DWR), California Department of Fish and Game (DFG),
San Diego Association of Governments (SANDAG), and
many other agencies. Among the most frequently
accessed documents are County General Plans for
every California county and a survey of 125 Sacramento
Delta fish species.
2002.11.07- SLIDE 39
Testbed Success Stories
LUPIN: CERES Land Use Planning Information
Network
California County General Plans and other
environmental documents.
Enter at Resources Agency Server, documents stored
at and retrieved from UCB DLIB server.
California flood relief efforts
High demand for some data sets only available on our
server (created by document recognition).
CalFlora: Creation and interoperation of
repositories pertaining to plant biology.
Cloning of services at Cal State Library, FBI
2002.11.07- SLIDE 40
Research Highlights
Documents
Multivalent Document prototype
Page images, structured documents, GIS data,
photographs
Intelligent Access to Content
Document recognition
Vision-based Image Retrieval: stuff, thing,
scene retrieval
Natural Language Processing: categorizing
the web, Cheshire II, TileBar Interfaces
2002.11.07- SLIDE 41
Multivalent Documents
MVD Model
radically distributed, open, extensible
behaviors and layers
behaviors conform to a protocol suite
inter-operation via IDEG
Applied to enlivening legacy documents
various nice behaviors, e.g., lenses
2002.11.07- SLIDE 42
Document Presentation
Problem: Digital libraries must deliver
digital documents -- but in what form?
Different forms have advantages for
particular purposes
Retrieval
Reuse
Content Analysis
Storage and archiving
Combining forms (Multivalent documents)
2002.11.07- SLIDE 43
Spectrum of Digital Document
Representations
Adapted from Fox, E.A., et al. "Users, User Interfaces, and Objects: Envision, a Digital Library," JASIS 44(8), 1993
2002.11.07- SLIDE 44
Document Representation: Multivalent
Documents
Primary user interface/document model for
UCB Digital Library (Wilensky & Phelps)
Goal: An approach to new document
representations and their authoring.
Supports active, distributed, composable
transformations of multimedia documents.
Enables sophisticated annotations,
intelligent result handling, user-modifiable
interface, composite documents.
2002.11.07- SLIDE 45
Multivalent Documents
[Figure: a scanned page image overlaid with a Cheshire layer, an OCR layer, an OCR mapping layer, a GIS layer, and a table layer, connected to network protocols and resources]
Valence: 2: The relative capacity to unite, react, or interact (as with antigens or a biological substrate).
Webster's 7th Collegiate Dictionary
2002.11.07- SLIDE 48
MVD availability
The MVD Browser is now available as
open source on SourceForge
http://sourceforge.net/project/showfiles.php?group_id=44509
See also:
http://http.cs.berkeley.edu/~phelps/Multivalent/
2002.11.07- SLIDE 49
GIS in the MVD Framework
Layers are georeferenced data sets.
Behaviors are
display semi-transparently
pan
zoom
issue query
display context
spatial hyperlinks
annotations
Written in Java
2002.11.07- SLIDE 50
GIS Viewer: Features
Annotation and saving
points, rectangles (w. labels and links),
vectors
saving of annotations as separate layer
Integration with address, street finding,
gazetteer services
Application to image viewing: tilePix
Castanet client
2002.11.07- SLIDE 54
GIS Viewer Example
http://elib.cs.berkeley.edu/annotations/gis/buildings.html
2002.11.07- SLIDE 55
Geographic Information: Plans and Ideas
More annotations, flexible saving
Support for large vector data sets
Interoperability
On-the-fly
conversion of formats
generation of catalogs
Via OGDI/GLTP
Experimenting with various CERES servers
2002.11.07- SLIDE 56
Documents: Information from scanned
documents
Built document recognizers for some
important documents, e.g., Bulletin 17,
TR-9.
Recognized document structure, with
order-of-magnitude better OCR.
Automatically generated a 1395-item
relational database of dams.
Enabled access via forms, map interfaces.
Enabled interoperation with image DB.
2002.11.07- SLIDE 60
Document Recognition: Ongoing Work
Document recognizers: for about a dozen
document types
Development and integration of
mathematical OCR and recognition.
Eventually produce a document recognizer
generator, i.e., make it easier to write
recognizers.
2002.11.07- SLIDE 61
Vision-Based Image Retrieval
Stuff-based queries: blobs
Basic blobs: colors, sizes, variable number
demonstrated utility for interesting queries
Blob world: Above plus texture, applied to
retrieving similar images
successful learning scene classifier
Thing-finding: Successfully deployed
detectors adding body plans (adding
shape, geometry and kinematic
constraints)
2002.11.07- SLIDE 62
Image Retrieval Research
Finding Stuff vs Things
BlobWorld
Other Vision Research
2002.11.07- SLIDE 63
(Old stuff-based image retrieval: Query)
2002.11.07- SLIDE 64
(Old stuff-based image retrieval: Result)
2002.11.07- SLIDE 65
Blobworld: use regions for retrieval
We want to find general objects
Represent images based on coherent
regions
2002.11.07- SLIDE 68
(Thing-based image retrieval using
body plans: Result)
2002.11.07- SLIDE 69
Natural Language Processing
Developed automatic
categorization/disambiguation method to
the point where topic assignment (but not
disambiguation) appears feasible.
Ran controlled experiment:
Took Yahoo as ground truth.
Chose 9 overlapping categories; took 1000
web pages from Yahoo as input.
Result: 84% precision; 48% recall (using top
5 of 1073 categories)
Automatic Topic Assignment
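Precision and recall for topic assignment can be computed as in the sketch below. The pages and categories are hypothetical, not the experiment's data, and the slide does not say how the reported figures were averaged. This version pools counts over all pages (micro-averaging), which is an assumption.

```python
def precision_recall(assigned, truth):
    """Micro-averaged precision and recall of category assignments.

    assigned and truth map a page id to a set of category names.
    """
    # Assigned categories that also appear in the ground truth.
    correct = sum(len(assigned.get(page, set()) & cats)
                  for page, cats in truth.items())
    n_assigned = sum(len(assigned.get(page, set())) for page in truth)
    n_truth = sum(len(cats) for cats in truth.values())
    precision = correct / n_assigned if n_assigned else 0.0
    recall = correct / n_truth if n_truth else 0.0
    return precision, recall


# Hypothetical ground truth (e.g., directory categories) and system output.
truth = {
    "page1": {"botany", "ecology", "forestry"},
    "page2": {"hydrology"},
}
assigned = {
    "page1": {"botany"},
    "page2": {"hydrology", "geology"},
}

if __name__ == "__main__":
    p, r = precision_recall(assigned, truth)
    print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy data, two of the three assigned categories are correct, and two of the four true categories are found.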
2002.11.07- SLIDE 70
Further Information
Berkeley DL web site
http://elib.cs.berkeley.edu