CSE494/598 Principles of Information Engineering
Principles of Information Engineering
Information Science Life Cycle
[Figure: the information life cycle, with stages Acquisition, Coding/Compression, Analysis/mining/processing, Storage, Re-engineering, Retrieval, Presentation, Packaging/Visualization, Transport, and Discard]
1. Information Acquisition
• Acquisition of business-related information in digital form
• Traditionally, record-based data, mostly in table form
• Now multimedia data
– Conversion to digital form for on-line processing
– Overall organization for seamless integration
2. Coding & Compression
• Coding of data to minimize its representation, reducing both the storage requirement and the bandwidth requirement in communication
• Different techniques are needed for each type of media, and even for each type of object
– Facsimile vs. aerial pictures vs. portraits
• Techniques must be fast, one-pass, adaptive, and invertible, and must not impose unreasonable requirements on resources (a small sketch follows)
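A minimal sketch of these requirements using Python's standard zlib library, assuming the input arrives as a stream of byte chunks: the compressor consumes each chunk exactly once (one-pass) and the round trip is exactly invertible.

import zlib

def compress_stream(chunks):
    """One-pass compression: each input chunk is consumed once."""
    comp = zlib.compressobj(6)
    for chunk in chunks:
        yield comp.compress(chunk)
    yield comp.flush()

def decompress_stream(chunks):
    """Inverse operation: reconstructs the original bytes exactly."""
    decomp = zlib.decompressobj()
    for chunk in chunks:
        yield decomp.decompress(chunk)
    yield decomp.flush()

# Round-trip check on toy data (stand-in for a document or image byte stream)
data = [b"table row 1, " * 100, b"table row 2, " * 100]
restored = b"".join(decompress_stream(compress_stream(data)))
assert restored == b"".join(data)  # invertible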
3. Analysis & mining
• Raw numbers, words, images and sounds are not immediately useful: their contents must be analyzed and represented in machine-processable form
– Mining of databases for useful information
– Extraction of contents from images and video
– Conceptualization of text (a toy sketch follows this list)
– Feature analysis of audio segments
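As a toy illustration of turning raw content into a machine-processable representation, here is a hedged Python sketch that reduces free text to its weighted content terms. This is only a crude stand-in for conceptualization; the stopword list and example text are invented for illustration.

from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "these"}

def term_profile(text, top_k=10):
    """Reduce raw text to its most frequent content words,
    a minimal machine-processable representation."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_k)

print(term_profile("Legacy systems make up most of the business data systems; "
                   "maintenance of these systems dominates IT effort."))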
4. Storage
• Business data can be very large and heterogeneous with respect to virtually every parameter
• Appropriate storage techniques ensure proper management, location and distribution, and the flow of objects
• Among the issues to be considered (see the placement sketch below):
– Data placement
– What technology (medium) to use for storage
– Distribution: local, remote, out-sourced
– Speed of delivery
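A hedged sketch of one common data-placement idea: hash-based partitioning of objects across storage locations. The node names and the modulo scheme are illustrative assumptions, not a prescribed design.

import hashlib

# Hypothetical storage locations: local, remote, and out-sourced tiers
NODES = ["local-disk", "remote-site", "outsourced-archive"]

def place(object_key: str) -> str:
    """Deterministically map an object to a storage location
    by hashing its key; a toy stand-in for a placement policy."""
    digest = hashlib.sha1(object_key.encode()).digest()
    return NODES[int.from_bytes(digest[:4], "big") % len(NODES)]

for key in ["invoice-2024-001", "portrait-0042.jpg", "call-log-archive"]:
    print(key, "->", place(key))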
5. Re-engineering
• Legacy systems make up most of the
business data systems
• Maintenance and modernization of these
systems represents a large portion of IT
efforts
• Important decisions:
– Maintenance
– Replace & migrate
– Modernize for co-existence
Is legacy code like Chernobyl?
• Remember Chernobyl? The meltdown of the nuclear reactor. Officials poured concrete over it and hoped that, someday, it would just go away!
• Legacy code and Chernobyl: too messy to clean up, but too dangerous to ignore!
Legacy code...
• Theory: rebuild the legacy system from
ground up with
– a relational (or OO) database
– graphical user interfaces
– client/server architecture
• Practice: expensive and risky, because of
size, complexity and poor documentation.
Case study 1
• 700 clients
• 120,000,000 credit cards (mid-90s figure)
• Over 14 terabytes of data
• 2 billion transactions per month
– 19 billion disk/tape I/O per month
• Around 23 million transactions are
processed from 8:00 pm to 2:00 am
Case study 2
• 22 million telephone customers
• “zero downtime” must be guaranteed
• COBOL code: Hundreds of millions of lines
• Many terabytes of data “owned” by applications
– no sharing -> redundant storage
• Regulatory change: “rate of return” to “price cap”
• Reengineer 80% of the business process
Case study 2 (continued)
• Incremental migration into a client server
computing architecture
• Began in the late 80s, still on-going…
• Around 10,000 workstations, and growing
• Biggest challenge: inability of the mainframe to participate in distributed C/S computing
– CICS unable to cooperate in a nested sub-transaction (integrity?)
About Legacy Systems
• Large, with millions of lines of code
• Over 10 years old
• Mission-critical - 24-hour operation
• Difficulty in supporting current/future
business requirements
• 80-90% of IT budget
• Instinctive solution: Migrate!
Migration Strategies
• Complete rewrite of legacy code
– Many problems
– Risky
– Prone to failure
• Incremental migration
– Migrate the legacy system in place by small
incremental steps
– Control risk by choosing increment size.
One-step Migration Impediments
• Business conditions never stand still
• Specifications rarely exist
• Undocumented dependencies
• Management of large projects
– too hard, tend to bloat
• Migration with live data
• Analysis paralysis sets in
• Fear of change
Incremental Migration
• Incrementally analyze the legacy IS
• Incrementally decompose
• Incrementally design the target interfaces
• Incrementally design the target applications
• Incrementally design the target database
• Incrementally install the target environment
• Create and install the necessary gateways (see the gateway sketch after this list)
• Incrementally migrate the legacy database
• Incrementally migrate the legacy applications
• Incrementally migrate the legacy interfaces
• Incrementally cut over to the target IS
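A hedged sketch of the gateway idea from the step list above: during incremental migration, a gateway routes each request to either the legacy store or the already-migrated target store, so applications can be cut over one increment at a time. All class and store names here are illustrative assumptions.

class MigrationGateway:
    """Toy forward gateway: hides whether a record has been
    migrated from the legacy store to the target store."""

    def __init__(self, legacy_store, target_store):
        self.legacy = legacy_store     # e.g. the old application-owned store
        self.target = target_store     # e.g. the new relational database
        self.migrated_keys = set()     # keys already moved to the target

    def read(self, key):
        store = self.target if key in self.migrated_keys else self.legacy
        return store[key]

    def migrate(self, key):
        """One small incremental step: move a single record."""
        self.target[key] = self.legacy.pop(key)
        self.migrated_keys.add(key)

# Applications keep calling gateway.read() while data moves underneath
gateway = MigrationGateway(legacy_store={"cust-1": "record"}, target_store={})
gateway.migrate("cust-1")
print(gateway.read("cust-1"))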
A Comparison
              One-step            Incremental
Suited for    Non-decomposable    Decomposable
Additional notes…
Information Analysis and Mining
• In multimedia objects:
– Extraction of features
– Their representation
– Indexing on the basis of contents
• For data: mining to find useful patterns and correlations (a small sketch follows this list)
• For text:
– Conceptual representation
– Ontological classification of concepts
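A hedged sketch of the data-mining bullet above: computing pairwise correlations over a small table with pandas to surface candidate patterns. The column names and figures are invented for illustration; real mining would run over warehouse tables.

import pandas as pd

df = pd.DataFrame({
    "monthly_spend":    [120, 340, 90, 410, 150, 500],
    "num_transactions": [10,  30,  8,  35,  12,  41],
    "support_calls":    [5,   1,   6,   0,   4,   1],
})

# Pairwise Pearson correlations: a crude first look for useful patterns
corr = df.corr()
print(corr.round(2))

# Flag strongly (anti-)correlated attribute pairs as candidate findings
for a in corr.columns:
    for b in corr.columns:
        if a < b and abs(corr.loc[a, b]) > 0.8:
            print(f"candidate pattern: {a} vs {b} (r = {corr.loc[a, b]:.2f})")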
Analysis of Images
• Extract features
– Color
– Shape
– Texture
– Spatial relationships
• Create a logical representation for the image
– Semantic nets are effective
• Classify and index so that the search process will be efficient (a feature-extraction sketch follows)
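A hedged sketch of the color-feature bullet: computing a coarse color histogram as an indexable feature vector with Pillow and NumPy. The bin count and file name are illustrative assumptions.

import numpy as np
from PIL import Image

def color_histogram(path, bins=8):
    """Coarse RGB histogram: a simple, indexable color feature."""
    pixels = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=((0, 256),) * 3)
    hist = hist.flatten()
    return hist / hist.sum()   # normalize so images of any size compare

# Similar images should have nearby histograms (e.g. small L1 distance)
# feat = color_histogram("portrait.jpg")   # hypothetical file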
Analysis of Video
• Determine video segments by detecting scene cuts (scene cut detection; a sketch follows this list)
• Select a representative frame for each segment
• Extract spatial features:
– color, texture, shape, and relative object positions
• Extract temporal features:
– object trajectories, camera motion, viewing perspective
– temporal relationships among objects
• Represent each segment with an object that can be
efficiently indexed by its features
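A hedged sketch of the scene-cut-detection step using OpenCV: consecutive frames are compared, and a large mean pixel-intensity difference is flagged as a cut. The threshold and file name are illustrative assumptions.

import cv2

def detect_scene_cuts(video_path, threshold=30.0):
    """Return frame indices where the mean absolute difference
    between consecutive grayscale frames exceeds the threshold."""
    cap = cv2.VideoCapture(video_path)
    cuts, prev, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None and cv2.absdiff(gray, prev).mean() > threshold:
            cuts.append(idx)          # candidate segment boundary
        prev, idx = gray, idx + 1
    cap.release()
    return cuts

# cuts = detect_scene_cuts("news_clip.mp4")   # hypothetical file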
Video Indexing process
[Figure: Video indexing pipeline combining scene change detection, representative frame selection/creation, object segmentation, camera operation extraction, object motion analysis, closed caption analysis, audio analysis, and text analysis, which together yield spatial and motion features for indexing]
Database Management Systems (1970s - early 1980s)
- Hierarchical and network database systems
- Relational database systems
- Data modeling tools: entity-relationship model, etc.
- Indexing and data organization techniques: B+-tree, hashing, etc.
- Query languages: SQL, etc.
- User interfaces, forms and reports
- Query processing and query optimization
- Transaction management: recovery, concurrency control, etc.
- On-line transaction processing (OLTP)
Data Mining
[Figure: Knowledge discovery pipeline: databases and flat files feed cleaning and integration into a data warehouse; selection and transformation prepare the data for data mining, which yields patterns]
Data Warehousing and ETL
• An organized repository of data from
multiple data sources
– A unified schema for all of the participating
databases
• Provides data analysis capabilities, collectively known as On-Line Analytical Processing (OLAP); a small roll-up sketch follows
• A number of pieces are needed: tools,
gateways, and conversion routines
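A hedged sketch of the OLAP idea over a toy warehouse table: rolling a sales measure up by region and quarter with a pandas pivot table. The table, column names, and figures are invented for illustration.

import pandas as pd

# Invented fact table: one row per sale, with two dimension columns
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "East", "West"],
    "quarter": ["Q1",   "Q2",   "Q1",   "Q2",   "Q1",   "Q1"],
    "amount":  [100,    150,    80,     120,    60,     90],
})

# OLAP-style roll-up: aggregate the measure over two dimensions;
# margins=True adds the "all regions / all quarters" totals
cube = sales.pivot_table(values="amount", index="region", columns="quarter",
                         aggfunc="sum", margins=True)
print(cube)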
Typical architecture of a data warehouse
[Figure: Data sources in multiple locations are cleaned, transformed, integrated, and loaded into the data warehouse, which serves query and analysis tools used by clients]