CSE494/598 Principles of Information Engineering

1. The document discusses the information life cycle which includes acquisition, coding/compression, analysis/mining, storage, re-engineering, preservation, retrieval, and presentation. 2. It describes challenges with legacy systems including their large size, complexity, and risks of complete rewrite. Incremental migration is presented as a safer approach. 3. Web search engines are discussed as an example of an information retrieval system. They preprocess documents and queries to determine similarity and rank results. Link analysis is also used to improve search effectiveness.

Uploaded by

ashutoshgindauliya

CSE494/598

Principles of Information
Engineering

Information Life Cycle

Lesson Objectives:

1.     Describe the parts of the Information Life Cycle.
2.     Explain the advantages and disadvantages of coding and compression.
3.     Discuss considerations for information storage.
4.     Describe common factors that must be addressed for proper information presentation.
5.     Relate information analysis to information retrieval.
6.     Classify characteristics of various generations of database management.

Advance Reading Material:

Read “The Top 10 Data Mining Questions”. It can be found at:
http://www.datamining.com/top10
Information Life Cycle
[Figure: Information Science Life Cycle. Stages shown: Acquisition, Coding/Compression, Analysis/mining/processing, Storage, Re-engineering, Preservation, Retrieval, Presentation, Packaging/Visualization, Transport, Discard]
1. Information Acquisition
• Acquiring of business-related information
in digital form
• Traditionally, record based data mostly in
table form
• Now multimedia data
– Conversion to digital form for on-line
processing
– Overall organization for seamless integration
2. Coding & Compression
• Coding of data in order to minimize its
representation for reducing the storage
requirement and reducing the bandwidth
requirement in communication
• Need different techniques for each type of media
and even each type of object
– Facsimile vs. aerial pictures vs. portrait
• Technique must be fast, one-pass, adaptive and
invertible, and must not impose unreasonable
requirements on resources.
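As an illustration of the "one-pass and invertible" requirement, here is a minimal sketch of run-length coding, a technique that suits facsimile-like data with long runs of identical symbols. The scan line is made up for the example, and the sketch is not adaptive; real fax codecs are considerably more elaborate.

```python
from itertools import groupby

def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Single pass over the input: collapse each run into (symbol, run length)."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Invert the encoding exactly, so no information is lost."""
    return "".join(symbol * count for symbol, count in pairs)

# Hypothetical fax-like scan line: mostly white (0) with a short black (1) run.
scan_line = "0" * 20 + "1" * 3 + "0" * 17
encoded = rle_encode(scan_line)
print(encoded)  # [('0', 20), ('1', 3), ('0', 17)]
print(rle_decode(encoded) == scan_line)  # True
```

The same coder would do poorly on a portrait or an aerial picture, where runs are short, which is why each type of object tends to need its own technique.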
3. Analysis & mining
• Raw numbers, words, images and sounds
are not immediately useful – their contents
must be analyzed and represented in
machine processable form:
– Mining of databases for useful information
– Extraction of contents from images and video
– Conceptualization of text
– Feature analysis of audio segments
4. Storage
• Business data can be very large and heterogeneous
with respect to all parameters
• Appropriate storage techniques ensure: proper
management, location and distribution, and the flow
of objects.
• Among issues to be considered:
– Data placement
– What technology (medium) to use for storage
– Distribution: local, remote, out-sourced
– Speed of delivery
5. Re-engineering
• Legacy systems make up most of the
business data systems
• Maintenance and modernization of these
systems represent a large portion of IT
efforts
• Important decisions:
– Maintain
– Replace & migrate
– Modernize for co-existence
Is legacy code like Chernobyl?
• Remember Chernobyl? The meltdown of
the nuclear reactor.
• Officials poured concrete over it and hoped
that, someday, it would just go away!
• Legacy code and Chernobyl: Too messy to
clean up but too dangerous to ignore!
Legacy code...
• Theory: rebuild the legacy system from
the ground up with
– a relational (or OO) database
– graphical user interfaces
– client/server architecture
• Practice: expensive and risky, because of
size, complexity and poor documentation.
Case study 1
• 700 clients
• 120,000,000 credit cards (mid-90s figure)
• Over 14 terabytes of data
• 2 billion transactions per month
– 19 billion disk/tape I/O per month
• Around 23 million transactions are
processed from 8:00 pm to 2:00 am
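To put these figures in perspective, the short calculation below derives two rates from the numbers on the slide. It assumes the I/O count and the transaction count cover the same month, and that the nightly window is exactly six hours.

```python
TRANSACTIONS_PER_MONTH = 2_000_000_000
IO_PER_MONTH = 19_000_000_000
PEAK_TRANSACTIONS = 23_000_000
PEAK_WINDOW_SECONDS = 6 * 3600  # 8:00 pm to 2:00 am

io_per_transaction = IO_PER_MONTH / TRANSACTIONS_PER_MONTH
peak_tps = PEAK_TRANSACTIONS / PEAK_WINDOW_SECONDS

print(f"{io_per_transaction:.1f} disk/tape I/Os per transaction")   # 9.5
print(f"{peak_tps:.0f} transactions per second in the window")      # 1065
```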
Case study 2
• 22 million telephone customers
• “zero downtime” must be guaranteed
• COBOL code: Hundreds of millions of lines
• Many terabytes of data “owned” by
applications
– no sharing -> redundant storage
• Regulatory change: “rate of return” to “price cap”
• Reengineer 80% of the business process
Case study 2 (cont’d..)
• Incremental migration into a client/server
computing architecture
• Began in the late 80s, still ongoing…
• Around 10,000 workstations, and growing
• Biggest challenge: Inability of mainframe
to participate in distributed C/S computing
– CICS unable to cooperate in a nested sub-transaction -> Integrity?
About Legacy Systems
• Large, with millions of lines of code
• Over 10 years old
• Mission-critical - 24-hour operation
• Difficulty in supporting current/future
business requirements
• 80-90% of IT budget
• Instinctive solution: Migrate!
Migration Strategies
• Complete rewrite of legacy code
– Many problems
– Risky
– Prone to failure
• Incremental migration
– Migrate the legacy system in place by small
incremental steps
– Control risk by choosing increment size.
One-step Migration Impediments
• Business conditions never stand still
• Specifications rarely exist
• Undocumented dependencies
• Management of large projects
– too hard, tend to bloat
• Migration with live data
• Analysis paralysis sets in
• Fear of change
Incremental Migration
• Incrementally analyze the legacy IS
• Incrementally decompose
• Incrementally design the target interfaces
• Incrementally design the target applications
• Incrementally design the target database
• Incrementally install the target environment
• Create and install the necessary gateways
• Incrementally migrate the legacy database
• Incrementally migrate the legacy applications
• Incrementally migrate the legacy interfaces
• Incrementally cut over to the target IS
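The role of a gateway during incremental migration can be sketched as a router that sends each business function either to the legacy system or to the target system, depending on whether that function has been cut over yet. The class, function names, and the "billing"/"accounts" functions below are hypothetical and only illustrate the idea.

```python
from typing import Callable

Handler = Callable[[str], str]

def legacy_system(request: str) -> str:   # stand-in for the legacy IS
    return f"legacy handled {request}"

def target_system(request: str) -> str:   # stand-in for the target IS
    return f"target handled {request}"

class Gateway:
    """Routes each function to the legacy or target IS, one increment at a time."""

    def __init__(self, legacy: Handler, target: Handler) -> None:
        self.legacy = legacy
        self.target = target
        self.migrated: set[str] = set()

    def cut_over(self, function_name: str) -> None:
        """Migrate one function; the size of the increment bounds the risk."""
        self.migrated.add(function_name)

    def roll_back(self, function_name: str) -> None:
        """Undo a failed step without touching the other increments."""
        self.migrated.discard(function_name)

    def handle(self, function_name: str, request: str) -> str:
        system = self.target if function_name in self.migrated else self.legacy
        return system(request)

gateway = Gateway(legacy_system, target_system)
gateway.cut_over("billing")
print(gateway.handle("billing", "invoice 17"))   # target handled invoice 17
print(gateway.handle("accounts", "lookup 42"))   # legacy handled lookup 42
```

Because callers only ever talk to the gateway, a failed increment can be rolled back on its own, which is what keeps failure limited to one step at a time.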
A Comparison
              One step                       Incremental
Suited for    Non-decomposable               Decomposable
Risk          Huge                           Controllable
Failure       Entire project                 Step at a time
Benefits      Immediate                      Incremental
Outlook       Unpredictable until deadline   Conservatively optimistic
6. Preservation
• Similar to physical security measures for protecting
buildings, cash and other tangible assets,
information must be protected while recorded,
processed, stored, shared, transmitted, or retrieved.
• Must protect against loss, alteration, and disclosure
• Must prevent unauthorized access and unauthorized
use of
– Computer systems
– Networks
– Information
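One narrow piece of this, detecting alteration, can be sketched with a cryptographic digest from Python's standard library. The record is made up, and this sketch covers only integrity checking; it does nothing against loss or disclosure, and it assumes the stored digest itself is kept somewhere an attacker cannot change.

```python
import hashlib
import hmac

def fingerprint(data: bytes) -> str:
    """SHA-256 digest of the data, as a hex string."""
    return hashlib.sha256(data).hexdigest()

def is_unaltered(data: bytes, expected_digest: str) -> bool:
    """Compare digests in constant time to check the data has not changed."""
    return hmac.compare_digest(fingerprint(data), expected_digest)

record = b"account=1001;balance=250.00"        # hypothetical stored record
stored_digest = fingerprint(record)            # kept separately from the record

print(is_unaltered(record, stored_digest))                            # True
print(is_unaltered(b"account=1001;balance=950.00", stored_digest))    # False
```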
7. Retrieval
• Query languages have come a long way
from old style navigational queries to
today’s content-based query languages
• Important: Any constraint (e.g., a
processable feature) may be used as the
criterion for search
• Require efficient retrieval techniques,
similar to those for data retrieval, for all
types of information
Web Search Engines
• A text retrieval system with a Web interface
• The document collection of a search engine
can be either a pre-compiled special
collection or a set of Web pages collected
from many Web servers by a program called
a Web robot.
• Each document is preprocessed and
represented as a vector of terms with
weights.
Web Search Engines (cont’d..)

• The preprocessing steps are:

• Stopword removal: Remove non-content words
such as “a” and “is” from each document.
• Stemming: Map variations of the same word into
a single term.
• Term weighting: Assign a weight to each term
in a document to indicate the importance of the
term in representing the contents of the document.
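The three steps above can be sketched in a few lines. The stopword list is a tiny illustrative one, the stemmer is a naive suffix stripper standing in for a real algorithm such as Porter's, and the weight is plain relative term frequency; production engines typically use richer weighting schemes.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "is", "of", "the", "to"}   # tiny illustrative list

def stem(word: str) -> str:
    """Naive suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def to_vector(text: str) -> dict[str, float]:
    """Stopword removal, stemming, then relative-frequency term weights."""
    words = re.findall(r"[a-z]+", text.lower())
    terms = [stem(w) for w in words if w not in STOPWORDS]
    counts = Counter(terms)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}

print(to_vector("Searching is a search of indexed pages"))
# {'search': 0.5, 'index': 0.25, 'page': 0.25}
```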
Web Search Engines (cont’d..)
• A query is also transformed into a vector with weights.
• The similarity between a query and a document can be
measured by the dot product of their respective vectors.
• When documents are HTML web pages, other factors can
influence a term's weight in a document:
– whether the term appears in the title or a header
– whether the term is enclosed in special tags or set in special fonts
– Google and AltaVista use tag and location information
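A minimal sketch of the dot-product similarity, with term vectors stored as sparse dictionaries. The boost for title terms is a hypothetical factor chosen for illustration; the slides do not say how any particular engine weighs tag or location information.

```python
def dot(query: dict[str, float], document: dict[str, float]) -> float:
    """Dot product of two sparse term vectors."""
    return sum(weight * document.get(term, 0.0) for term, weight in query.items())

def boost_title_terms(vector: dict[str, float], title_terms: set[str],
                      factor: float = 2.0) -> dict[str, float]:
    """Raise the weight of terms that also occur in the page title (assumed factor)."""
    return {t: w * factor if t in title_terms else w for t, w in vector.items()}

query = {"search": 1.0, "engine": 1.0}
page = {"search": 0.5, "index": 0.25, "page": 0.25}

print(dot(query, page))                                   # 0.5
print(dot(query, boost_title_terms(page, {"search"})))    # 1.0
```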
Web Search Engines (cont’d..)
• Web pages are hyperlinked. There are pointers going
from one page to another.
• Associated with each pointer are words (anchor
terms), which show the users what they are likely to
find if the pointer is followed.
• Anchor terms are utilized to index referenced/pointed
pages.
• Linkage information can also be combined with
similarity information to improve the retrieval
effectiveness.
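The two ideas on this slide can be sketched as follows: anchor terms are indexed under the page they point to, and a simple linkage signal (here, the number of incoming links) is added to the similarity score. The link data and the weighting constant are assumptions for illustration; real link analysis is more sophisticated than counting in-links.

```python
from collections import defaultdict

# (source page, anchor text, target page); hypothetical link data
links = [
    ("a.html", "course syllabus", "c.html"),
    ("b.html", "syllabus", "c.html"),
    ("c.html", "home", "a.html"),
]

anchor_index: dict[str, set[str]] = defaultdict(set)   # anchor term -> pointed-to pages
in_links: dict[str, int] = defaultdict(int)            # page -> number of incoming links

for source, anchor_text, target in links:
    in_links[target] += 1
    for term in anchor_text.lower().split():
        anchor_index[term].add(target)

def combined_score(similarity: float, page: str, link_weight: float = 0.1) -> float:
    """Similarity plus a small bonus per incoming link (assumed combination rule)."""
    return similarity + link_weight * in_links.get(page, 0)

print(sorted(anchor_index["syllabus"]))   # ['c.html']
print(in_links["c.html"])                 # 2
```

Note that c.html is found for "syllabus" even if the word never appears on the page itself, because the term comes from the pages that link to it.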
Meta-Search Engines
• A metasearch engine has a number of modules.
• The user interface module accepts the user’s
query which will be forwarded, with necessary
reformatting, by the query dispatcher module to
the various search engines.
• When the search engines return the sets of the
retrieved documents to the metasearch engine,
these sets are merged by the result merger module
into a single ranked list of documents.
Meta-Search Engines (cont’d..)
• A certain number of the top documents from this list
are displayed to the user.
• When the number of search engines underlying a
metasearch engine is large, forwarding each user
query to each search engine is very inefficient. To
overcome this, a database selection module is
included.
• Its function is to identify for each user query the
search engines that are likely to return useful
documents to the user.
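The modules described above (database selection, query dispatcher, result merger) can be sketched together. The two underlying engines, their topic descriptions, and the normalize-by-top-score merging rule are all hypothetical; they show one plausible arrangement, not how any particular metasearch engine works.

```python
from typing import Callable

Results = list[tuple[str, float]]          # (document id, engine-specific score)

def news_engine(query: str) -> Results:    # hypothetical underlying engine
    return [("news/1", 8.0), ("news/2", 4.0)]

def paper_engine(query: str) -> Results:   # hypothetical underlying engine
    return [("papers/7", 0.9), ("news/1", 0.3)]

# Hypothetical engine descriptions used by the database selection module.
ENGINES: dict[str, tuple[set[str], Callable[[str], Results]]] = {
    "news": ({"election", "weather"}, news_engine),
    "papers": ({"retrieval", "indexing"}, paper_engine),
}

def select_engines(query: str) -> list[str]:
    """Database selection: keep engines whose topics overlap the query terms."""
    terms = set(query.lower().split())
    return [name for name, (topics, _) in ENGINES.items() if terms & topics]

def merge(result_sets: list[Results]) -> Results:
    """Result merger: scale each engine's scores by its top score, then rank."""
    merged: dict[str, float] = {}
    for results in result_sets:
        top = max((score for _, score in results), default=0.0) or 1.0
        for doc, score in results:
            merged[doc] = max(merged.get(doc, 0.0), score / top)
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)

def metasearch(query: str, top_k: int = 10) -> Results:
    """Dispatch the query to the selected engines and return the top documents."""
    result_sets = [ENGINES[name][1](query) for name in select_engines(query)]
    return merge(result_sets)[:top_k]

print(select_engines("text retrieval"))    # ['papers']
print(metasearch("indexing")[0])           # ('papers/7', 1.0)
```

Scaling by each engine's top score is needed because raw scores from different engines are not comparable (8.0 from one engine and 0.9 from another may both mean "best match").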
8. Presentation
• Information must be presented to the user in
a form that is usable
– Cookies take care of part of the issue
• Issues are diverse and range from
formatting and visualization to language and
even cultural barriers
• In the case of multimedia information, both
temporal and spatial issues must be dealt
with
9. Transport
• Moving of data/information from one
location to another
– Most common form: digital communication
• Technology selection for information
transport:
– What communication service?
– What protocols?
– What quality of service?
– What physical resources?
10. Information discard
• Destruction of information once its useful
life is over
– Generally, preserve data – unless discard is
needed
• Methods for discard
• Legal issues must be taken into account
3. Analysis & mining

Additional notes…
Information Analysis and Mining
• In multimedia objects:
– Extraction of features
– Their representation
– Indexing on the basis of contents
• For data: Mining in order to find useful
patterns and correlations
• For text:
– Conceptual representation
– Ontological classification of concepts
Analysis of Images
• Extract features
– Color
– Shape
– Texture
– Spatial relationships
• Create a logical representation for the image
– Semantic nets are effective
• Classify and index so that the search process will
be efficient
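As a concrete example of one such feature, the sketch below computes a coarse color histogram and compares two images by histogram intersection. Images are represented as plain lists of (r, g, b) tuples to keep the example self-contained; the bin count and the similarity measure are illustrative choices, not the only options.

```python
Pixel = tuple[int, int, int]   # (r, g, b), each in 0..255

def color_histogram(pixels: list[Pixel], bins_per_channel: int = 2) -> list[float]:
    """Normalized histogram over quantized (r, g, b) values."""
    step = 256 // bins_per_channel

    def bin_of(value: int) -> int:
        return min(value // step, bins_per_channel - 1)

    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        index = (bin_of(r) * bins_per_channel + bin_of(g)) * bins_per_channel + bin_of(b)
        hist[index] += 1
    total = len(pixels) or 1
    return [count / total for count in hist]

def histogram_intersection(h1: list[float], h2: list[float]) -> float:
    """1.0 for identical color distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

red_image = [(255, 0, 0)] * 4                    # hypothetical 2x2 images
blue_image = [(0, 0, 255)] * 4
mixed_image = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2

red_hist = color_histogram(red_image)
print(histogram_intersection(red_hist, color_histogram(mixed_image)))   # 0.5
print(histogram_intersection(red_hist, color_histogram(blue_image)))    # 0.0
```

The histogram is a fixed-length vector, so it can be stored and indexed like any other record, which is what makes content-based search over images practical.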
Analysis of Video
• Determine video segments by detecting scene cuts
(Scene cut detection process)
• Select a representative frame for each segment
• Extract Spatial features :
– color, texture, shape, and relative object positions
• Extract Temporal features:
– object trajectories, camera motion, viewing perspective
– temporal relationships among objects
• Represent each segment with an object that can be
efficiently indexed by its features
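The first two steps can be sketched with a histogram-difference test: a scene cut is declared wherever the brightness distribution of consecutive frames changes by more than a threshold, and the middle frame of each segment is taken as its representative. Frames are simplified to lists of gray values, and the threshold and the "middle frame" rule are assumptions made for the example.

```python
def gray_histogram(frame: list[int], bins: int = 4) -> list[float]:
    """Normalized histogram of gray values in 0..255."""
    step = 256 // bins
    hist = [0] * bins
    for value in frame:
        hist[min(value // step, bins - 1)] += 1
    return [count / len(frame) for count in hist]

def detect_cuts(frames: list[list[int]], threshold: float = 0.5) -> list[int]:
    """Indices of frames that start a new segment (assumes at least one frame)."""
    cuts = []
    previous = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        current = gray_histogram(frames[i])
        difference = sum(abs(a - b) for a, b in zip(previous, current)) / 2
        if difference > threshold:
            cuts.append(i)
        previous = current
    return cuts

def representative_frames(num_frames: int, cuts: list[int]) -> list[int]:
    """Pick the middle frame index of each segment."""
    starts = [0] + cuts
    ends = cuts + [num_frames]
    return [(start + end - 1) // 2 for start, end in zip(starts, ends)]

dark = [10, 20, 30, 40]             # hypothetical 4-pixel frames
bright = [200, 210, 220, 230]
frames = [dark, dark, bright, bright, dark]

cuts = detect_cuts(frames)
print(cuts)                                       # [2, 4]
print(representative_frames(len(frames), cuts))   # [0, 2, 4]
```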
Video Indexing process
[Figure: Video indexing process. Scene change detection and representative frame selection/creation feed the analysis stages: closed caption analysis, audio analysis, camera operation and object motion analysis, spatial features extraction, object segmentation, and text extraction. Extracted index items include keywords, sound characteristics, camera operation, object trajectory, sketch, shape, color, texture, spatial relationships, objects, and text description.]
Analysis of Audio
• For Speech:
– Textual information from speech (then sound retrieval becomes text
retrieval)
– Speaker Information (identification)
• For Generic Sound:
– Loudness
– Pitch
– Tone
– Cepstrum
– Derivatives
• For Music:
– Rhythm
– Event
– Instrument
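Two of the generic sound features, loudness and pitch, can be approximated very simply. The sketch below uses root-mean-square amplitude for loudness and a zero-crossing count for pitch; both are crude estimates that behave well only on a clean, single tone like the synthetic one generated here.

```python
import math

def rms_loudness(samples: list[float]) -> float:
    """Root-mean-square amplitude as a simple loudness measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def zero_crossing_pitch(samples: list[float], sample_rate: int) -> float:
    """Estimate frequency in Hz: a pure tone crosses zero twice per cycle."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    duration = len(samples) / sample_rate
    return crossings / (2 * duration)

sample_rate = 8000   # samples per second (assumed)
# One second of a synthetic 440 Hz tone with amplitude 0.5.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

print(round(rms_loudness(tone), 3))                      # about 0.354
print(round(zero_crossing_pitch(tone, sample_rate)))     # close to 440
```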
Analysis of data
• The hardest task: Integration of data from multiple
databases
– Despite many years of work, we still have difficulty in
this area
• Data mining tasks: descriptive, predictive
– Descriptive: Characterize general properties of the data
– Predictive: Perform inference on the data to make
predictions
• Most common types: Specialized abstracts and
integrated tables
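The difference between the two kinds of task can be shown on a toy data set: a descriptive task summarizes what is there, while a predictive task fits a model and extrapolates. The sales figures are invented, and the straight-line fit is just the simplest possible predictive model.

```python
from statistics import mean

# Hypothetical (month, sales) observations.
monthly_sales = [(1, 10.0), (2, 12.0), (3, 14.0), (4, 16.0)]

# Descriptive: characterize a general property of the data.
average_sales = mean(sales for _, sales in monthly_sales)

# Predictive: fit a least-squares line and infer a value not yet observed.
def fit_line(points: list[tuple[float, float]]) -> tuple[float, float]:
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(monthly_sales)
forecast_month_5 = slope * 5 + intercept

print(average_sales)      # 13.0
print(forecast_month_5)   # 18.0
```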
Early days of databases

Data Collection and Database Creation

(1960s and earlier)
-Primitive file processing
Database management systems

Database Management Systems

(1970s-early 1980s)
-Hierarchical and network database systems
-Relational database systems
-Data modeling tools: entity-relationship model, etc.
-Indexing and data organization techniques:
B+-tree, hashing, etc.
-Query languages: SQL, etc.
-User interfaces, forms and reports
-Query processing and query optimization
-Transaction management: recovery,
concurrency control, etc.
-On-line transaction processing (OLTP)
Current databases
Advanced Database Systems
(mid-1980s-present)
-Advanced data models:
extended-relational,
object-oriented,
object-relational, deductive
-Application-oriented:
spatial, temporal,
multimedia, active, scientific,
knowledge bases
Data Integration
Data Warehousing and Data Mining
(late 1980s-present)
-Data warehouse and OLAP technology
-Data mining and knowledge discovery

Web-based Database Systems

(1990s-present)
-XML based database systems
-Web mining
Putting it all Together…

Data Collection and Database Creation
(1960s and earlier)
-Primitive file processing

Database Management Systems
(1970s-early 1980s)
-Hierarchical and network database systems
-Relational database systems
-Data modeling tools: entity-relationship model, etc.
-Indexing and data organization techniques:
B+-tree, hashing, etc.
-Query languages: SQL, etc.
-User interfaces, forms and reports
-Query processing and query optimization
-Transaction management: recovery,
concurrency control, etc.
-On-line transaction processing (OLTP)

Advanced Database Systems
(mid-1980s-present)
-Advanced data models:
extended-relational, object-oriented,
object-relational, deductive
-Application-oriented: spatial,
temporal, multimedia, active,
scientific, knowledge bases

Data Warehousing and Data Mining
(late 1980s-present)
-Data warehouse and OLAP technology
-Data mining and knowledge discovery

Web-based Database Systems
(1990s-present)
-XML based database systems
-Web mining

New Generation of Integrated Information Systems
(2000-…)
Information mining process
• Data cleaning
– Reformatting and conversion may be necessary
• Data integration
– Heterogeneity possible in any aspect
• Data selection
• Data transformation
• Data mining and evaluation of patterns
• Presentation of knowledge
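The steps listed above can be walked through end to end on a toy example. Here two hypothetical stores record shopping baskets in different formats, and the "mining" step simply counts item pairs that occur together; the data, the support threshold, and the choice of pair counting as the mining technique are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Two hypothetical sources with different formats (heterogeneity).
store_a = [" Milk,Bread ", "bread,butter", ""]
store_b = [["MILK", "BREAD", "BUTTER"], ["milk", "bread"]]

def clean(record: str) -> list[str]:
    """Data cleaning: split, trim, lowercase, and drop empty items."""
    return [item.strip().lower() for item in record.split(",") if item.strip()]

# Data cleaning and integration: bring both sources into one format.
baskets = [clean(r) for r in store_a] + [[item.lower() for item in b] for b in store_b]

# Data selection: keep only baskets that can contain a pair.
baskets = [b for b in baskets if len(b) >= 2]

# Data transformation: represent each basket as a set of items.
item_sets = [frozenset(b) for b in baskets]

# Data mining: count how often each pair of items occurs together.
pair_counts = Counter(pair for items in item_sets
                      for pair in combinations(sorted(items), 2))

# Evaluation of patterns: keep pairs that meet a minimum support.
MIN_SUPPORT = 3   # assumed threshold
patterns = {pair: n for pair, n in pair_counts.items() if n >= MIN_SUPPORT}

# Presentation of knowledge.
for (first, second), n in patterns.items():
    print(f"{first} and {second} were bought together in {n} baskets")
# bread and milk were bought together in 3 baskets
```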
[Figure: Information mining process. Flat files and databases -> Cleaning and Integration -> Data Warehouse -> Selection and Transformation -> Data Mining -> Patterns -> Evaluation and Presentation -> Knowledge]
Data Warehousing and ETL
• An organized repository of data from
multiple data sources
– A unified schema for all of the participating
databases
• Provides data analysis capabilities,
collectively known as On-Line Analytical
Processing (OLAP)
• A number of pieces are needed: tools,
gateways, and conversion routines
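A minimal sketch of these pieces using Python's built-in sqlite3 module: two hypothetical sources with different schemas and units are converted into one unified schema, loaded into an in-memory warehouse table, and then summarized with a roll-up query of the kind OLAP tools provide. Table and column names are invented for the example.

```python
import sqlite3

# Two hypothetical data sources with different schemas and units.
source_east = [("2024-01", "widget", 1200), ("2024-01", "gadget", 500)]   # cents
source_west = [
    {"period": "2024-01", "item": "Widget", "usd": 8.0},
    {"period": "2024-02", "item": "Gadget", "usd": 2.5},
]

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (region TEXT, month TEXT, product TEXT, usd REAL)"
)

# Conversion routines: map each source onto the unified schema.
rows = [("east", month, product.lower(), cents / 100)
        for month, product, cents in source_east]
rows += [("west", r["period"], r["item"].lower(), r["usd"]) for r in source_west]

# Load.
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# OLAP-style roll-up: total sales per product across all regions and months.
totals = warehouse.execute(
    "SELECT product, SUM(usd) FROM sales GROUP BY product ORDER BY product"
).fetchall()
print(totals)   # [('gadget', 7.5), ('widget', 20.0)]
```

Changing the GROUP BY column (for example to region or month) gives a different view of the same data without touching the sources, which is the main benefit of loading everything into one schema first.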
Typical architecture of a data warehouse
[Figure: Data sources in Locations 1 to 4 -> Clean, Transform, Integrate, Load -> Data Warehouse -> Query and analysis tools -> Clients]