
Database Applications - The UC Berkeley Environmental Digital Library

This document outlines slides from a lecture on database applications and the Berkeley Environmental Digital Library project. The key points are: 1) The lecture discusses database administration and the Berkeley Digital Library Project's use of databases for its Environmental Digital Library. 2) The Berkeley Digital Library Project aims to provide work-centered digital information services through its testbed Environmental Digital Library for California environmental data, users, and organizations. 3) The Environmental Digital Library contains diverse materials like reports, maps, photos and sensor data relevant to California habitats from contributors like state agencies and is part of research on technical challenges in digital libraries.

Uploaded by

AmritaSingh

2002.11.07- SLIDE 1


IS 257 - Fall 2002

Database Applications -- The UC
Berkeley Environmental Digital
Library
University of California, Berkeley
School of Information Management
and Systems
SIMS 257: Database Management

2002.11.07- SLIDE 2



Lecture Outline
Review
Database Administration
Database Applications
Berkeley's Environmental Digital Library



2002.11.07- SLIDE 3



Final Project Requirements
See WWW site:
http://sims.berkeley.edu/courses/is257/f02/index.html
Report on personal/group database including:
Database description and purpose
Data Dictionary
Relationships Diagram
Sample queries and results (Web or Access tools)
Sample forms (Web or Access tools)
Sample reports (Web or Access tools)
Application Screens (Web or Access tools)

2002.11.07- SLIDE 4



Final Presentations and Reports
Specifications for final report are on the
Web Site under assignments
Presentations (1 on Nov. 28, others on
Nov. 30, Dec. 5th and 7th (Full))

2002.11.07- SLIDE 5



Lecture Outline
Review
Database Administration
Database Applications
Berkeley's Environmental Digital Library



2002.11.07- SLIDE 6



Terms and Concepts (trad)
Data Administration
Responsibility for the overall management
of data resources within an organization
Database Administration
Responsibility for physical database design
and technical issues in database
management
These roles are often combined or
overlapping in some organizations

2002.11.07- SLIDE 7



Database System Life Cycle
Database Planning
Database Analysis
Database Design
Database Implementation
Operation & Maintenance
Growth & Change
Note: this is a different version of this
life cycle than discussed previously

2002.11.07- SLIDE 8



Database Planning: DA & DBA functions
Develop corporate database strategy (DA)
Develop enterprise model (DA)
Develop cost/benefit models (DA)
Design database environment (DA)
Develop data administration plan (DA)

2002.11.07- SLIDE 9



Database Analysis: DA & DBA functions
Define and model data requirements (DA)
Define and model business rules (DA)
Define operational requirements (DA)
Maintain corporate Data Dictionary (DA)

2002.11.07- SLIDE 10



Database Design: DA & DBA functions
Perform logical database design (DA)
Design external models (subschemas)
(DBA)
Design internal model (Physical design)
(DBA)
Design integrity controls (DBA)

2002.11.07- SLIDE 11



Database Implementation: DA & DBA functions
Specify database access policies (DA & DBA)
Establish Security controls (DBA)
Supervise Database loading (DBA)
Specify test procedures (DBA)
Develop application programming standards
(DBA)
Establish procedures for backup and recovery
(DBA)
Conduct User training (DA & DBA)

2002.11.07- SLIDE 12



Operation and Maintenance: DA & DBA functions
Monitor database performance (DBA)
Tune and reorganize databases (DBA)
Enforce standards and procedures (DBA)
Support users (DA & DBA)

2002.11.07- SLIDE 13



Growth & Change: DA & DBA functions
Implement change control procedures (DA
& DBA)
Plan for growth and change (DA & DBA)
Evaluate new technology (DA & DBA)

2002.11.07- SLIDE 14



Functions in Database Administration
Planning and Design (we have already
looked at these processes in detail)
Data Integrity
Backup and Recovery
Security Management

2002.11.07- SLIDE 15



Data Integrity
Intrarecord integrity (enforcing constraints
on contents of fields, etc.)
Referential Integrity (enforcing the validity
of references between records in the
database)
Concurrency control (ensuring the validity
of database updates in a shared multiuser
environment)
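
A minimal sketch of the first two kinds of integrity, using Python's built-in sqlite3 module. The `agency` and `report` tables and their columns are hypothetical, chosen only for illustration; they are not the schema used in the lecture. A CHECK constraint stands in for intrarecord integrity and a foreign key for referential integrity, and each insert runs inside a transaction that is rolled back if a constraint fails.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one intrarecord and one referential constraint."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE agency (agency_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE report ("
        " report_id INTEGER PRIMARY KEY,"
        " agency_id INTEGER NOT NULL REFERENCES agency(agency_id),"  # referential integrity
        " pages INTEGER NOT NULL CHECK (pages > 0))"  # intrarecord integrity
    )
    conn.execute("INSERT INTO agency VALUES (1, 'DWR')")
    conn.commit()
    return conn


def try_insert(conn, report_id, agency_id, pages):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "INSERT INTO report VALUES (?, ?, ?)", (report_id, agency_id, pages)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def report_count(conn):
    """Number of rows that survived the constraints."""
    return conn.execute("SELECT COUNT(*) FROM report").fetchone()[0]
```

With this setup, a report with a valid agency and a positive page count is stored, while a report with zero pages or an unknown agency is rejected and leaves the table unchanged.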

2002.11.07- SLIDE 16



Database Security
Views or restricted subschemas
Authorization rules to identify users and
the actions they can perform
User-defined procedures (and rule
systems) to define additional constraints or
limitations in using the database
Encryption to encode sensitive data
Authentication schemes to positively
identify a person attempting to gain access
to the database
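
Two of these mechanisms can be sketched with Python's standard library: a view acting as a restricted subschema, and a salted password hash as one simple authentication scheme. The `contributor` table, its columns, and the helper names are hypothetical. SQLite has no user accounts, so authorization rules (GRANT and REVOKE in a server DBMS) are not shown.

```python
import hashlib
import hmac
import os
import sqlite3


def make_db():
    """Build a table with a sensitive column and a view that hides it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contributor ("
        " id INTEGER PRIMARY KEY, name TEXT, agency TEXT, home_phone TEXT)"
    )
    conn.execute("INSERT INTO contributor VALUES (1, 'A. Example', 'DWR', '555-0100')")
    # The view is the restricted subschema: home_phone is simply not exposed.
    conn.execute(
        "CREATE VIEW contributor_public AS SELECT id, name, agency FROM contributor"
    )
    conn.commit()
    return conn


def public_columns(conn):
    """Column names visible through the restricted view."""
    cursor = conn.execute("SELECT * FROM contributor_public")
    return [column[0] for column in cursor.description]


def hash_password(password, salt=None):
    """Derive a salted hash; only the salt and digest need to be stored."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the hash and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

An application that queries only `contributor_public` never sees the phone column, which is the point of a view as a security device.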

2002.11.07- SLIDE 17



Database Backup and Recovery
Backup
Journaling (audit trail)
Checkpoint facility
Recovery manager
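
The backup half of this list can be illustrated with the `Connection.backup` method that Python's sqlite3 module has provided since Python 3.7. This is only a sketch: the `dam` table and the helper names are hypothetical, and journaling, checkpoints, and a recovery manager are handled internally by a full DBMS rather than by application code like this.

```python
import sqlite3


def make_source():
    """Create a small in-memory database to protect."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dam (dam_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO dam VALUES (?, ?)", [(1, "Dam A"), (2, "Dam B")])
    conn.commit()
    return conn


def take_backup(source):
    """Copy the whole database into a second connection."""
    target = sqlite3.connect(":memory:")
    source.backup(target)
    return target


def restore(backup_copy):
    """Rebuild a working database from the backup copy."""
    restored = sqlite3.connect(":memory:")
    backup_copy.backup(restored)
    return restored


def row_count(conn):
    """Number of rows currently in the dam table."""
    return conn.execute("SELECT COUNT(*) FROM dam").fetchone()[0]
```

In practice the backup target would be a file on separate (ideally offsite) storage rather than another in-memory connection.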


2002.11.07- SLIDE 18



Disaster Recovery Planning
Testing and Training
Procedures Development
Budget & Implement
Plan Maintenance
Recovery Strategies
Risk Analysis
From Toigo, "Disaster Recovery Planning"

2002.11.07- SLIDE 19



Threats to Assets and Functions
Water
Fire
Power Failure
Mechanical breakdown or software failure
Accidental or deliberate destruction of
hardware or software
By hackers, disgruntled employees, industrial
saboteurs, terrorists, or others

2002.11.07- SLIDE 20



Threats
Between 1967 and 1978 fire and water
damage accounted for 62% of all data
processing disasters in the U.S.
The water damage was sometimes
caused by fighting fires
More recently improvements in fire
suppression (e.g., Halon) for DP centers
has meant that water is the primary
danger to DP centers

2002.11.07- SLIDE 21



Kinds of Records
Class I: VITAL
Essential, irreplaceable or necessary to recovery
Class II: IMPORTANT
Essential or important, but reproducible with difficulty
or at extra expense
Class III: USEFUL
Records whose loss would be inconvenient, but which
are replaceable
Class IV: NONESSENTIAL
Records which upon examination are found to be no
longer necessary

2002.11.07- SLIDE 22



Offsite Storage of Data
Early offsite storage facilities were often
intended to survive atomic explosions
PRISM International directory
Mirror sites (Hot sites)
E.g. Cantor-Fitzgerald


2002.11.07- SLIDE 23



Lecture Outline
Review
Database Administration
Database Applications
Berkeley's Environmental Digital Library



2002.11.07- SLIDE 24



Berkeley DL Project
Object Relational Database Applications
The Berkeley Digital Library Project
Slides from RRL and Robert Wilensky, EECS
Use of DBMS in DL project


2002.11.07- SLIDE 25



Overview
What is a Digital Library?
Overview of Ongoing Research on
Information Access in Digital Libraries

2002.11.07- SLIDE 26



Digital Libraries Are Like Traditional Libraries...
Involve large repositories of information
(storage, preservation, and access)
Provide information organization and
retrieval facilities (categorization, indexing)
Provide access for communities of users
(communities may be as large as the
general public or small as the employees
of a particular organization)

2002.11.07- SLIDE 27



Originators
Libraries
Users
Traditional Library System

2002.11.07- SLIDE 28



But Digital Libraries Are Different From
Libraries...
Not a physical location with local copies;
objects held closer to originators
Decoupling of storage, organization,
access
Enhanced Authoring (origination,
annotation, support for work groups)
Subscription, pay-per-view supported in
addition to free browsing.
Integration into user tasks.

2002.11.07- SLIDE 29



Originators
Repositories
Users
Index
Services
Network
A Digital Library Infrastructure Model

2002.11.07- SLIDE 30



UC Berkeley Digital Library Project
Focus: Work-centered digital information
services
Testbed: Digital Library for the California
Environment
Research: Technical agenda supporting
user-oriented access to large distributed
collections of diverse data types.
Part of the NSF/NASA/DARPA Digital
Library Initiative (Phases 1 and 2)

2002.11.07- SLIDE 31



UCB Digital Library Project: Research
Organizations
UC Berkeley EECS, SIMS, CED, IS&T
UCOP/CDL
Xerox PARC's Document Image Decoding group
and Work Practices group
Hewlett-Packard
NEC
SUN Microsystems
IBM Almaden
Microsoft
Ricoh California Research
Philips Research

2002.11.07- SLIDE 32



Testbed: An Environmental Digital Library
Collection: Diverse material relevant to
California's key habitats.
Users: A consortium of state agencies,
development corporations, private
corporations, regional government
alliances, educational institutions, and
libraries.
Potential: Impact on state-wide
environmental system (CERES)

2002.11.07- SLIDE 33



The Environmental Library -
Users/Contributors
California Resources Agency, California
Environment Resources Evaluation
System (CERES)
California Department of Water Resources
The California Department of Fish & Game
SANDAG
UC Water Resources Center Archives
New Partners: CDL and SDSC


2002.11.07- SLIDE 34



The Environmental Library - Contents
Environmental technical reports, bulletins, etc.
County general plans
Aerial and ground photography
USGS topographic maps
Land use and other special purpose maps
Sensor data
Derived information
Collection data bases for the classification and
distribution of the California biota (e.g.,
SMASCH)
Supporting 3-D, economic, traffic, etc. models
Videos collected by the California Resources
Agency

2002.11.07- SLIDE 35



The Environmental Library - Contents
As of late 2002, the collection represents
over one terabyte of data, including over
183,000 digital images, about 300,000
pages of environmental documents, and
over 2 million records in geographical and
botanical databases.

2002.11.07- SLIDE 36



Botanical Data:
The CalFlora Database contains
taxonomical and distribution information
for more than 8000 native California
plants. The Occurrence Database includes
over 600,000 records of California plant
sightings from many federal, state, and
private sources. The botanical databases
are linked to the CalPhotos collection of
California plants, and are also linked to
external collections of data, maps, and
photos.

2002.11.07- SLIDE 37



Geographical Data:
Much of the geographical data in the collection
has been used to develop our web-based GIS
Viewer. The Street Finder uses 500,000 TIGER
records of S.F. Bay Area streets along with the
70,000 records from the USGS GNIS database.
California Dams is a database of information
about the 1395 dams under state jurisdiction. An
additional 11 GB of geographical data
represents maps and imagery that have been
processed for inclusion as layers in our GIS
Viewer. This includes Digital Ortho Quads and
DRG maps for the S.F. Bay Area.

2002.11.07- SLIDE 38



Documents:
Most of the 300,000 pages of digital documents are
environmental reports and plans that were provided by
California state agencies. This collection includes
documents, maps, articles, and reports on the California
environment including Environmental Impact Reports
(EIRs), educational pamphlets, water usage bulletins,
and county plans. Documents in this collection come
from the California Department of Water Resources
(DWR), California Department of Fish and Game (DFG),
San Diego Association of Governments (SANDAG), and
many other agencies. Among the most frequently
accessed documents are County General Plans for
every California county and a survey of 125 Sacramento
Delta fish species.

2002.11.07- SLIDE 39



Testbed Success Stories
LUPIN: CERES Land Use Planning Information
Network
California County General Plans and other
environmental documents.
Enter at Resources Agency Server, documents stored
at and retrieved from UCB DLIB server.
California flood relief efforts
High demand for some data sets only available on our
server (created by document recognition).
CalFlora: Creation and interoperation of
repositories pertaining to plant biology.
Cloning of services at Cal State Library, FBI

2002.11.07- SLIDE 40



Research Highlights
Documents
Multivalent Document prototype
Page images, structured documents, GIS data,
photographs
Intelligent Access to Content
Document recognition
Vision-based Image Retrieval: stuff, thing,
scene retrieval
Natural Language Processing: categorizing
the web, Cheshire II, TileBar Interfaces


2002.11.07- SLIDE 41



Multivalent Documents
MVD Model
radically distributed, open, extensible
behaviors and layers
behaviors conform to a protocol suite
inter-operation via IDEG
Applied to enlivening legacy documents
various nice behaviors, e.g., lenses


2002.11.07- SLIDE 42



Document Presentation
Problem: Digital libraries must deliver
digital documents -- but in what form?
Different forms have advantages for
particular purposes
Retrieval
Reuse
Content Analysis
Storage and archiving
Combining forms (Multivalent documents)

2002.11.07- SLIDE 43



Spectrum of Digital Document
Representations
Adapted from Fox, E.A., et al. Users, User Interfaces and Objects: Evision, an Electronic Library, JASIS 44(8), 1993

2002.11.07- SLIDE 44



Document Representation: Multivalent
Documents
Primary user interface/document model for
UCB Digital Library (Wilensky & Phelps)
Goal: An approach to new document
representations and their authoring.
Supports active, distributed, composable
transformations of multimedia documents.
Enables sophisticated annotations,
intelligent result handling, user-modifiable
interface, composite documents.

2002.11.07- SLIDE 45



Multivalent Documents
[Figure: layers of a multivalent document: Cheshire Layer, OCR Layer, OCR Mapping Layer, GIS Layer, Table Layer, Scanned Page Image; connected to Network Protocols & Resources]
Valence: 2: The relative capacity to unite, react, or interact
(as with antigens or a biological substrate).
Webster's 7th Collegiate Dictionary

2002.11.07- SLIDE 46




2002.11.07- SLIDE 47




2002.11.07- SLIDE 48



MVD availability
The MVD Browser is now available as
open source on SourceForge
http://sourceforge.net/project/showfiles.php?group_id=44509
See also:
http://http.cs.berkeley.edu/~phelps/Multivalent/


2002.11.07- SLIDE 49



GIS in the MVD Framework
Layers are georeferenced data sets.
Behaviors are
display semi-transparently
pan
zoom
issue query
display context
spatial hyperlinks
annotations
Written in Java
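
The pan and zoom behaviors over georeferenced layers can be sketched as simple bounding-box arithmetic. The original viewer was written in Java; the Python below is an independent illustration, and the `BoundingBox`, `Layer`, and `visible_layers` names are hypothetical rather than taken from the MVD code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """A rectangle in map coordinates."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def pan(self, dx, dy):
        """Shift the box by (dx, dy) without changing its size."""
        return BoundingBox(self.min_x + dx, self.min_y + dy,
                           self.max_x + dx, self.max_y + dy)

    def zoom(self, factor):
        """Shrink (factor > 1) or grow (factor < 1) the box around its center."""
        center_x = (self.min_x + self.max_x) / 2
        center_y = (self.min_y + self.max_y) / 2
        half_w = (self.max_x - self.min_x) / (2 * factor)
        half_h = (self.max_y - self.min_y) / (2 * factor)
        return BoundingBox(center_x - half_w, center_y - half_h,
                           center_x + half_w, center_y + half_h)

    def intersects(self, other):
        """True when the two rectangles overlap or touch."""
        return not (self.max_x < other.min_x or other.max_x < self.min_x or
                    self.max_y < other.min_y or other.max_y < self.min_y)


@dataclass(frozen=True)
class Layer:
    """A georeferenced data set with a name and an extent."""
    name: str
    bounds: BoundingBox


def visible_layers(layers, view):
    """Names of the layers whose extent overlaps the current view."""
    return [layer.name for layer in layers if layer.bounds.intersects(view)]
```

Treating the view as an immutable value makes each behavior a pure function from one view to the next, which keeps pan and zoom easy to compose.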

2002.11.07- SLIDE 50



GIS Viewer: Features
Annotation and saving
points, rectangles (w. labels and links),
vectors
saving of annotations as separate layer
Integration with address, street finding,
gazetteer services
Application to image viewing: tilePix
Castanet client

2002.11.07- SLIDE 51





2002.11.07- SLIDE 52





2002.11.07- SLIDE 53





2002.11.07- SLIDE 54



GIS Viewer Example

http://elib.cs.berkeley.edu/annotations/gis/buildings.html

2002.11.07- SLIDE 55



Geographic Information: Plans and Ideas
More annotations, flexible saving
Support for large vector data sets
Interoperability
On-the-fly
conversion of formats
generation of catalogs
Via OGDI/GLTP
Experimenting with various CERES servers


2002.11.07- SLIDE 56



Documents: Information from scanned
documents
Built document recognizers for some
important documents, e.g. Bulletin 17.
TR-9.
Recognized document structure, with an
order of magnitude better OCR.
Automatically generated a 1395-item
relational database of dams.
Enabled access via forms, map interfaces.
Enabled interoperation with image DB.

2002.11.07- SLIDE 60



Document Recognition: Ongoing Work
Document recognizers: for about a dozen
document types
Development and integration of
mathematical OCR and recognition.
Eventually produce document recognizer
generator, i.e., make it easier to write
recognizers.

2002.11.07- SLIDE 61



Vision-Based Image Retrieval
Stuff-based queries: blobs
Basic blobs: colors, sizes, variable number
demonstrated utility for interesting queries
Blob world: Above plus texture, applied to
retrieving similar images
successful learning scene classifier
Thing-finding: Successfully deployed
detectors adding body plans (adding
shape, geometry and kinematic
constraints)

2002.11.07- SLIDE 62



Image Retrieval Research
Finding Stuff vs Things
BlobWorld
Other Vision Research

2002.11.07- SLIDE 63




(Old stuff-based image retrieval: Query)

2002.11.07- SLIDE 64




(Old stuff-based image retrieval: Result)

2002.11.07- SLIDE 65



Blobworld: use regions for retrieval
We want to find general objects
Represent images based on coherent
regions











2002.11.07- SLIDE 68




(Thing-based image retrieval using
body plans: Result)

2002.11.07- SLIDE 69



Natural Language Processing
Developed automatic
categorization/disambiguation method to
point where topic assignment (but not
disambiguation) appears feasible.
Ran controlled experiment:
Took Yahoo as ground truth.
Chose 9 overlapping categories; took 1000
web pages from Yahoo as input.
Result: 84% precision; 48% recall (using top
5 of 1073 categories)
Automatic Topic Assignment
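
For reference, precision and recall as reported above are conventionally computed from the overlap between the categories a system assigns and the categories the ground truth lists. The function below is a generic sketch of that standard calculation, not the project's evaluation code, and the category labels in the usage example are made up.

```python
def precision_recall(assigned, relevant):
    """Return (precision, recall) for one set of assigned categories.

    precision = correct assignments / all assignments made
    recall    = correct assignments / all categories that should have been assigned
    """
    assigned = set(assigned)
    relevant = set(relevant)
    correct = assigned & relevant
    precision = len(correct) / len(assigned) if assigned else 0.0
    recall = len(correct) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four categories assigned, three in the ground truth,
# two of them in common.
precision, recall = precision_recall(
    {"botany", "hydrology", "maps", "traffic"},
    {"botany", "hydrology", "fish"},
)
```

High precision with lower recall, as in the experiment above, means the categories that were assigned were mostly right but many correct categories were missed.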

2002.11.07- SLIDE 70



Further Information
Berkeley DL web site
http://elib.cs.berkeley.edu
