Data Management TURBAN

Information Technology For Management 5th Edition


Turban, Leidner, McLean, Wetherbe
Lecture Slides by A. Lekacos,
Stony Brook University
John Wiley & Sons, Inc.
Learning Objectives

• Recognize the importance of data, their managerial issues, and
  their life cycle.
• Describe the sources of data, their collection, and quality issues.
• Describe document management systems.
• Explain the operation of data warehousing and its role in decision
  support.
• Describe information and knowledge discovery and business
  intelligence.
• Understand the power and benefits of data mining.
• Describe data presentation methods and explain geographical
  information systems, visual simulations, and virtual reality as
  decision support tools.
• Discuss the role of marketing databases and provide examples.
• Recognize the role of the Web in data management.

Data Management

IT applications cannot be done without using some kind of data, which are
at the core of management and marketing operations. However, managing
data is difficult for various reasons.

• The amount of data increases exponentially with time.
• Data are scattered throughout organizations.
• Data are collected by many individuals using several methods.
• External data need to be considered in making organizational
  decisions.
• Data security, quality, and integrity are critical.
• Selecting data management tools can be a major problem.

Data are an asset; when converted to information and knowledge, they
give the firm competitive advantages.
Data Life Cycle Process

Businesses run on data that have been processed into information and
knowledge, which managers apply to business problems and
opportunities. This transformation of data into knowledge and solutions is
accomplished in several steps.

1. New data are collected from various sources.
2. The data are temporarily stored in a database, then preprocessed to
   fit the format of the organization's data warehouse or data marts.
3. Users then access the warehouse or data mart and take a
   copy of the needed data for analysis.
4. Analysis (looking for patterns) is done with:
   • Data analysis tools
   • Data mining tools

The result of all these activities is the generation of decision
support and knowledge.
Data Life Cycle Process (continued)

The result: generating knowledge.
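
The four steps of the data life cycle can be sketched as a small pipeline. This is only an illustration under stated assumptions: the record fields (`region`, `amount`), the `preprocess` function, and the in-memory lists standing in for the staging database and the warehouse are all invented for the example and do not come from the slides.

```python
from collections import defaultdict

# Step 1: new data arrive from various sources (hypothetical sample records).
raw_records = [
    {"region": " east ", "amount": "120.50"},
    {"region": "West", "amount": "80"},
    {"region": "EAST", "amount": "40.25"},
]


def preprocess(record):
    """Step 2: fit a staged record to the (assumed) warehouse format."""
    return {
        "region": record["region"].strip().lower(),
        "amount": float(record["amount"]),
    }


# Step 2: temporary storage, then preprocessing and loading into the warehouse.
staging = list(raw_records)
warehouse = [preprocess(r) for r in staging]

# Step 3: a user takes a copy of the needed data for analysis.
extract = [dict(row) for row in warehouse if row["region"] == "east"]

# Step 4: analysis (here just a total per region) over the warehouse data.
totals = defaultdict(float)
for row in warehouse:
    totals[row["region"]] += row["amount"]

print(dict(totals))
```

The copy in step 3 matters: analysts work on `extract`, so nothing they do changes the warehouse rows.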

Data Sources

The data life cycle begins with the acquisition of data from data sources.
These sources can be classified as internal, personal, and external.

• Internal Data Sources are usually stored in the corporate database and
  are about people, products, services, and processes.
• Personal Data is documentation on the expertise of corporate employees,
  usually maintained by the employee. It can take the form of:
    • Estimates of sales
    • Opinions about competitors
    • Business rules
    • Procedures
    • Etc.
• External Data Sources range from commercial databases to government
  reports.
• Internet and Commercial Database Services are accessible through the
  Internet.

Data Collection

The task of data collection is fairly complex, which can create data-quality
problems requiring validation and cleansing of data.

• Collection can take place:
    • In the field
    • From individuals
    • Via manual methods
        • Time studies
        • Surveys
        • Observations
        • Contributions from experts
    • Using instruments and sensors
    • Through transaction processing systems (TPS)
    • Via electronic transfer
    • From a Web site (clickstream)
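
To make "validation and cleansing" concrete, here is a minimal sketch. The field names (`customer_id`, `amount`), the two validation rules, and the duplicate rule (same customer and amount after normalizing) are assumptions chosen for illustration; real rules depend on the data being collected.

```python
def validate(record):
    """Return a list of problems found in one collected record."""
    problems = []
    if not record.get("customer_id"):
        problems.append("missing customer_id")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        problems.append("invalid amount")
    return problems


def cleanse(records):
    """Reject invalid records, normalize text fields, and drop duplicates."""
    seen = set()
    clean, rejected = [], []
    for record in records:
        if validate(record):
            rejected.append(record)
            continue
        normalized = {
            "customer_id": record["customer_id"].strip().upper(),
            "amount": float(record["amount"]),
        }
        # Naive duplicate rule (an assumption): same customer and same amount.
        key = (normalized["customer_id"], normalized["amount"])
        if key not in seen:
            seen.add(key)
            clean.append(normalized)
    return clean, rejected


collected = [
    {"customer_id": "c001", "amount": 25.0},    # e.g. from a TPS
    {"customer_id": " C001 ", "amount": 25.0},  # duplicate once normalized
    {"customer_id": "", "amount": 10.0},        # missing identifier
    {"customer_id": "c002", "amount": -5},      # negative amount
]
clean, rejected = cleanse(collected)
```

Rejected records are kept rather than silently discarded, so that someone can inspect why they failed.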
Data Flow Manager (DFM)

One way to improve data collection from multiple external sources is to use
a data flow manager (DFM), which takes information from external sources
and puts it where it is needed, when it is needed, in a usable form.

• A DFM consists of:
    • A decision support system
    • A central data request processor
    • A data integrity component
    • Links to external data suppliers
    • The processes used by the external data suppliers

Data Quality and Integrity

Data quality (DQ) is an extremely important issue since quality determines
the data's usefulness as well as the quality of the decisions based on the
data. Data integrity means that data must be accurate, accessible, and
up-to-date.

• Intrinsic DQ: accuracy, objectivity, believability, and reputation.
• Accessibility DQ: accessibility and access security.
• Contextual DQ: relevancy, value added, timeliness,
  completeness, amount of data.
• Representation DQ: interpretability, ease of
  understanding, concise representation, consistent
  representation.

Data quality is the cornerstone of effective business intelligence.
Document Management

Document management is the automated control of electronic
documents, page images, spreadsheets, word processing documents, and
other complex documents through their entire life cycle within an
organization, from initial creation to final archiving.

• Maintaining paper documents requires that:
    • Everyone have the current version
    • An update schedule be determined
    • Security be provided for the document
    • The documents be distributed to the appropriate individuals in a
      timely manner

Transactional vs. Analytical Processing

Transactional processing takes place in operational systems (TPS) that
provide the organization with the capability to perform business
transactions and produce transaction reports. The data are organized
mainly in a hierarchical structure and are centrally processed. This is done
primarily for fast and efficient processing of routine, repetitive data.

A supplementary activity to transaction processing is called analytical
processing, which involves the analysis of accumulated data. Analytical
processing, sometimes referred to as business intelligence, includes data
mining, decision support systems (DSS), querying, and other analysis
activities. These analyses place strategic information in the hands of
decision makers to enhance productivity and make better decisions,
leading to greater competitive advantage.

The Data Warehouse

A data warehouse is a repository of subject-oriented historical data that is
organized to be accessible in a form readily acceptable for analytical
processing activities (such as data mining, decision support, querying,
and other applications).

• Benefits of a data warehouse are:
    • The ability to reach data quickly, since they are located in one place
    • The ability to reach data easily and frequently by end users with Web
      browsers
• Characteristics of data warehousing are:
    • Organization. Data are organized by subject.
    • Consistency. In the warehouse, data will be coded in a consistent
      manner.

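The Data Warehouse (continued)

• Characteristics of data warehousing are:
    • Time variant. The data are kept for many years so they can be
      used for trends, forecasting, and comparisons over time.
    • Nonvolatile. Once entered into the warehouse, data are not
      updated.
    • Relational. Typically the data warehouse uses a relational
      structure.
    • Client/server. The data warehouse uses the client/server
      architecture mainly to provide the end user an easy access to
      its data.
    • Web-based. Data warehouses are designed to provide an
      efficient computing environment for Web-based applications.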


The Data Mart

A data mart is a small, scaled-down version of a data warehouse
designed for a strategic business unit (SBU) or a department. Because data
marts contain less information than the data warehouse, they provide more
rapid response and are more easily navigated than enterprise-wide data
warehouses.

• There are two major types of data marts:
    • Replicated (dependent) data marts are small subsets of the data
      warehouse. In such cases one replicates some subset of the data
      warehouse into smaller data marts, each of which is dedicated to a
      certain functional area.
    • Stand-alone data marts. A company can have one or more
      independent data marts without having a data warehouse. Typical
      data marts are for marketing, finance, and engineering applications.

The Data Cube

Multidimensional databases are specialized data
stores that organize facts by dimensions, such as geographical region,
product line, salesperson, and time. The data in these databases are usually
preprocessed and stored in data cubes.

• One intersection might be the quantities of a product sold by
  specific retail locations during certain time periods.
• Another matrix might be sales volume by department, by
  day, by month, by year for a specific region.
• Cubes provide faster:
    • Queries
    • Slices and dices of the information
    • Rollups
    • Drill downs
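
The cube operations named above can be illustrated with a toy cube held in a Python dictionary. This is a sketch only: the three dimensions (`product`, `store`, `month`), the sample quantities, and the function names are assumptions, and a real multidimensional database stores and indexes cubes very differently.

```python
from collections import defaultdict

# Facts keyed by three dimensions: (product, store, month).
DIMENSIONS = ("product", "store", "month")
facts = {
    ("widget", "north", "2004-01"): 10,
    ("widget", "north", "2004-02"): 15,
    ("widget", "south", "2004-01"): 7,
    ("gadget", "north", "2004-01"): 3,
    ("gadget", "south", "2004-02"): 9,
}


def slice_cube(cube, dimension, value):
    """Slice: fix one dimension to a single value."""
    i = DIMENSIONS.index(dimension)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Dice: keep cells whose coordinates fall in the allowed value sets."""
    def keep(key):
        return all(key[DIMENSIONS.index(d)] in vals for d, vals in allowed.items())
    return {k: v for k, v in cube.items() if keep(k)}


def rollup(cube, keep_dims):
    """Rollup: aggregate away every dimension not listed in keep_dims."""
    idx = [DIMENSIONS.index(d) for d in keep_dims]
    totals = defaultdict(int)
    for key, qty in cube.items():
        totals[tuple(key[i] for i in idx)] += qty
    return dict(totals)


by_product = rollup(facts, ["product"])                  # summary level
widget_detail = slice_cube(facts, "product", "widget")   # drill down into one product
north_january = dice_cube(facts, store={"north"}, month={"2004-01"})
```

Drilling down is the reverse of rolling up: starting from `by_product`, the user asks for the detailed cells behind one summary figure, as `widget_detail` does here.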

Operational Data Stores

An operational data store is a database for transaction processing systems
that uses data warehouse concepts to provide clean data to the TPS. It
brings the concepts and benefits of a data warehouse to the operational
portions of the business.

• It is typically used for short-term decisions that require
  time-sensitive data analysis.
• It logically falls between the operational data in legacy
  systems and the data warehouse.
• It provides detail as opposed to summary data.
• It is optimized for frequent access.
• It provides faster response times.

Business Intelligence

Business intelligence (BI) is a broad category of applications and
techniques for gathering, storing, analyzing, and providing access to data.
It helps enterprise users make better business and strategic decisions.
Major applications include the activities of query and reporting, online
analytical processing (OLAP), DSS, data mining, forecasting, and statistical
analysis.

• Business intelligence includes:
    • Outputs such as financial modeling and budgeting
    • Resource allocation
    • Coupons and sales promotions
    • Seasonality trends
    • Benchmarking (business performance)
    • Competitive intelligence

Starts with Knowledge Discovery
Business Intelligence (continued)

How It Works.
Knowledge Discovery

Before information can be processed into BI, it must be discovered or
extracted from the data stores. The major objective of this knowledge
discovery in databases (KDD) is to identify valid, novel, potentially useful,
and understandable patterns in data.

• KDD is supported by three technologies:
    • Massive data collection
    • Powerful multiprocessor computers
    • Data mining and other algorithms
• KDD primarily employs three tools for information discovery:
    • Traditional query languages (such as SQL)
    • OLAP
    • Data mining

Discovering useful patterns



Queries

Queries allow users to request information from the computer that is not
available in periodic reports. Query systems are often based on menus or,
if the data are stored in a database, on a structured query language (SQL)
or a query-by-example (QBE) method.

• User requests are stated in a query language and
  the results are subsets of the relationship:
    • Sales by department by customer type for a specific period
    • Weather conditions for a specific date
    • Sales by day of week
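
The first request in the list, sales by department by customer type for a specific period, might look like this in SQL. The sketch uses Python's built-in `sqlite3` module with an in-memory database; the `sales` table, its columns, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (department TEXT, customer_type TEXT, "
    "sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("toys", "retail", "2004-01-05", 100.0),
        ("toys", "wholesale", "2004-01-20", 250.0),
        ("toys", "retail", "2004-02-02", 75.0),
        ("garden", "retail", "2004-01-11", 40.0),
    ],
)

# Sales by department by customer type for a specific period (January 2004).
rows = conn.execute(
    """
    SELECT department, customer_type, SUM(amount) AS total
    FROM sales
    WHERE sale_date BETWEEN ? AND ?
    GROUP BY department, customer_type
    ORDER BY department, customer_type
    """,
    ("2004-01-01", "2004-01-31"),
).fetchall()

for department, customer_type, total in rows:
    print(department, customer_type, total)
```

The result is a subset of the stored rows, summarized along the two attributes the user asked about; the February sale falls outside the requested period and is excluded.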

Online Analytical Processing

Online analytical processing (OLAP) is a set of tools that analyze and
aggregate data to reflect business needs of the company. These business
structures (multidimensional views of data) allow users to quickly answer
business questions. OLAP is performed on data warehouses and marts.

• ROLAP (Relational OLAP) is an OLAP database implemented
  on top of an existing relational database. The
  multidimensional view is created each time for the user.
• MOLAP (Multidimensional OLAP) is a specialized
  multidimensional data store such as a data cube. The
  multidimensional view is physically stored in specialized data
  files.

Application view, not a data structure or schema
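
The difference between the two approaches can be sketched as follows. This is a simplification under stated assumptions: the `sales` table and the function name `rolap_view` are hypothetical, and a plain dictionary stands in for the specialized data files a real MOLAP store would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", "widget", 10.0), ("east", "gadget", 5.0), ("west", "widget", 8.0)],
)


def rolap_view(connection):
    """ROLAP style: build the region-by-product view from the
    relational table every time a user asks for it."""
    query = "SELECT region, product, SUM(amount) FROM sales GROUP BY region, product"
    return {(region, product): total for region, product, total in connection.execute(query)}


# MOLAP style: the multidimensional view is computed once and kept as a
# stored structure (a dict here, standing in for specialized data files).
molap_cube = rolap_view(conn)

# A new transaction arrives in the relational database.
conn.execute("INSERT INTO sales VALUES ('west', 'widget', 2.0)")

fresh = rolap_view(conn)                  # recomputed, so it sees the new row
stale = molap_cube[("west", "widget")]    # stored view, unchanged until rebuilt
```

The trade-off shown is the usual one: the stored view answers immediately but must be rebuilt to reflect new data, while the view created on request is always current but is recomputed each time.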

Data Mining

Data mining is a tool for analyzing large amounts of data. It derives its
name from the similarities between searching for valuable business
information in a large database and mining a mountain for a vein of
valuable ore.

• Data mining technology can generate new business
  opportunities by providing:
    • Automated prediction of trends and behaviors
    • Automated discovery of previously unknown or hidden patterns
• Data mining tools can be combined with:
    • Spreadsheets
    • Other end-user software development tools
• Data mining creates a data cube, then extracts data.

Data Mining Techniques

• Case-based reasoning uses historical cases to recognize
  patterns.
• Neural computing is a machine learning approach which
  examines historical data for patterns.
• Intelligent agents retrieve information from the Internet or
  from intranet-based databases.
• Association analysis uses a specialized set of algorithms that
  sort through large data sets and express statistical rules
  among items.
• Decision trees
• Genetic algorithms
• Nearest-neighbor method
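
As an illustration of association analysis, the sketch below computes two common statistics for rules among items, support and confidence, over a handful of transactions. The basket contents and the 0.5 minimum-support threshold are assumptions, and the brute-force pair search is for illustration only; it is not one of the specialized algorithms the slide refers to.

```python
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions containing antecedent, the share that also
    contain consequent."""
    return support(antecedent | consequent) / support(antecedent)


MIN_SUPPORT = 0.5  # hypothetical threshold

items = sorted(set().union(*transactions))
rules = []
for a, b in combinations(items, 2):
    pair_support = support({a, b})
    if pair_support >= MIN_SUPPORT:
        # Each frequent pair yields two candidate rules, one per direction.
        for lhs, rhs in ((a, b), (b, a)):
            rules.append((lhs, rhs, pair_support, confidence({lhs}, {rhs})))

for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

A rule such as "butter -> bread" with high confidence says that baskets containing butter usually contain bread as well, which is the kind of statistical rule among items that association analysis expresses.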
Data Mining Tasks

• Classification. Infers the defining characteristics of a
  certain group.
• Clustering. Identifies groups of items that share a
  particular characteristic. Clustering differs from classification in that
  no predefining characteristic is given.
• Association. Identifies relationships between events that
  occur at one time.
• Sequencing. Identifies relationships that exist over a period
  of time.
• Forecasting. Estimates future values based on patterns
  within large sets of data.
• Regression. Maps a data item to a prediction variable.
• Time series analysis. Examines a value as it varies over
  time.
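
Regression and forecasting can be shown together with a small worked example. The sketch fits a straight line by ordinary least squares, one simple form of regression chosen here for illustration, and uses it to estimate the next value. The monthly sales figures are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


months = [1, 2, 3, 4]              # hypothetical time periods
sales = [10.0, 12.0, 14.0, 16.0]   # hypothetical sales per period

slope, intercept = fit_line(months, sales)

# Forecasting: estimate a future value from the fitted pattern.
forecast_month_5 = slope * 5 + intercept
```

With perfectly linear sample data the fitted line passes through every point; real sales data would scatter around the line, and the forecast would carry corresponding uncertainty.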
Text, Web, Spatial, and Temporal Mining

In addition to data stored in traditional databases, there are other
"structures" that can be mined for patterns.

• Text Mining is the application of data mining to
  nonstructured or less-structured text files.
• Web Mining is the application of data mining techniques to
  data related to the World Wide Web. The data may be
  present in Web pages or related to Web activity.
• Spatial Mining is the application of data mining techniques
  to data that have a location component.
• Temporal Mining is the application of data mining
  techniques to data that are maintained for multiple points
  in time.
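
A very small example of text mining is counting which terms recur across free-text documents. The sketch below is an assumption-laden illustration: the two sample comments, the tiny stop-word list, and the simple word-splitting rule are invented, and practical text mining goes well beyond term counts.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "is", "and", "was", "but", "to"}


def term_counts(text):
    """Lowercase the text, split it into words, drop stop words,
    and count what remains."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


documents = [
    "The delivery was late and the package was damaged.",
    "Late delivery again, but support was helpful.",
]

totals = Counter()
for doc in documents:
    totals.update(term_counts(doc))

top_terms = totals.most_common(2)
```

Here the recurring terms point to a pattern (late deliveries) that no structured database field recorded.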
Data Visualization

Data visualization refers to presentation of data by technologies such as
digital images, geographical information systems, graphical user interfaces,
multidimensional tables and graphs, virtual reality, three-dimensional
presentations, videos, and animation.

• Multidimensionality Visualization: Modern data and
  information may have several dimensions.
• Dimensions:
    • Products
    • Salespeople
    • Market segments
    • Business units
    • Geographical locations
    • Distribution channels
    • Countries
    • Industries

Data Visualization (continued)

Multidimensionality Visualization:

• Measures:
    • Money
    • Sales volume
    • Head count
    • Inventory profit
    • Actual versus forecasted results
• Time:
    • Daily
    • Weekly
    • Monthly
    • Quarterly
    • Yearly


Data Visualization (continued)

• A geographical information system (GIS) is a computer-based
  system for capturing, storing, checking, integrating, manipulating,
  and displaying data using digitized maps. Every record or digital
  object has an identified geographical location. It employs spatially
  oriented databases.
• Visual interactive modeling (VIM) uses computer graphic displays
  to represent the impact of different management or operational
  decisions on objectives such as profit or market share.
• Virtual reality (VR) is interactive, computer-generated,
  three-dimensional graphics delivered to the user. These artificial sensory
  cues cause the user to "believe" that what they are doing is real.

Marketing Databases

Data warehouses and data marts serve end users in all functional areas.
Most current databases are static: they simply gather and store information.
Today's business environment also requires specialized databases.

• Marketing transaction database (MTD)
    • Combines many of the characteristics of the current databases and
      marketing data sources into a new database that allows marketers to
      engage in real-time personalization and target every interaction with
      customers
• Interactive capability
    • An interactive transaction occurs with the customer exchanging
      information and updating the database in real time, as opposed to the
      periodic (weekly, monthly, or quarterly) updates of classical
      warehouses and marts

Web-Based Data Management Systems

Data management and business intelligence activities, from data
acquisition to mining, are often performed with Web tools, or are
interrelated with Web technologies and e-business. This is done through
intranets, and for outsiders via extranets.

• Enterprise BI suites and corporate portals integrate query,
  reporting, OLAP, and other tools.
• Intelligent data warehouse Web-based systems employ a
  search engine for specific applications, which can improve
  the operation of a data warehouse.
• Clickstream data warehouses hold data that occur inside the Web
  environment, when customers visit a Web site.


MANAGERIAL ISSUES

• Cost-benefit issues and justification. Some of the data management
  solutions discussed are very expensive and justifiable only in large
  corporations. Smaller organizations can make the solutions cost effective
  if they leverage existing databases rather than create new ones. A careful
  cost-benefit analysis must be undertaken before any commitment to the
  new technologies is made.
• Where to store data physically. Should data be distributed close to their
  users? This could potentially speed up data entry and updating, but adds
  replication and security risks. Or should data be centralized for easier
  control, security, and disaster recovery? This has communications and
  single point of failure risks.
• Legal issues. Data mining may suggest that a company send catalogs or
  promotions to only one age group or one gender. A man sued Victoria's
  Secret Corp. because his female neighbor received a mail order catalog
  with deeply discounted items and he received only the regular catalog
  (the discount was actually given for volume purchasing). Settling
  discrimination charges can be very expensive.
• Internal or external? Should a firm invest in internally collecting,
  storing, maintaining, and purging its own databases of information? Or
  should it subscribe to external databases, where providers are responsible
  for all data management and data access?

MANAGERIAL ISSUES (continued)

• Disaster recovery. Can an organization's business processes, which have
  become dependent on databases, recover and sustain operations after a
  natural or other type of information system disaster? How can a data
  warehouse be protected? At what cost?
• Data security and ethics. Are the company's competitive data safe from
  external snooping or sabotage? Are confidential data, such as personnel
  details, safe from improper or illegal access and alteration? Who owns
  such personal data?
• Paying for use of data. Compilers of public-domain information, such as
  Lexis-Nexis, face a problem of people lifting large sections of their work
  without first paying royalties. The Collections of Information Antipiracy
  Act (Bill H.R. 2652 in the U.S. Congress) will provide greater protection
  from online piracy. This and other intellectual property issues are being
  debated in Congress and adjudicated in the courts.
• Privacy. Collecting data in a warehouse and conducting data mining may
  result in the invasion of individual privacy. What will companies do to
  protect individuals? What can individuals do to protect their privacy?

MANAGERIAL ISSUES (continued)

• Legacy data. One very real issue, often known as the legacy data
  acquisition problem, is what to do with the mass of information already
  stored in a variety of systems and formats. Data in older, perhaps
  obsolete, databases still need to be available to newer database
  management systems. Many of the legacy application programs used to access
  the older data simply cannot be converted into new computing environments
  without considerable expense. Basically, there are three approaches to
  solving this problem. One is to create a database front end that can act
  as a translator from the old system to the new. The second is to cause
  applications to be integrated with the new system, so that data can be
  seamlessly accessed in the original format. The third is to cause the data
  to migrate into the new system by reformatting it.
• Data delivery. Moving data efficiently around an enterprise is often a
  major problem. The inability to communicate effectively and efficiently
  among different groups in different geographical locations is a serious
  roadblock to implementing distributed applications properly, especially
  given the many remote sites and mobility of today's workers.

Copyright © 2004 John Wiley & Sons, Inc. All rights
reserved. Reproduction or translation of this work beyond
that permitted in Section 117 of the 1976 United States
Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information
should be addressed to the Permissions Department, John
Wiley & Sons, Inc. The purchaser may make back-up copies
for his/her own use only and not for distribution or
resale. The Publisher assumes no responsibility for errors,
omissions, or damages, caused by the use of these
programs or from the use of the information contained
herein.
