Information Management

Uploaded by larioquenerlisa
Copyright © All Rights Reserved
INFORMATION MANAGEMENT

COURSE GUIDE

COURSE      SEMESTER    SCHOOL YEAR
Comp 211    1st Sem     2021-2022
COURSE DESCRIPTION
This course provides an introduction to the core concepts of information
management. It also provides essential skills in identifying and analyzing organizational
information requirements and in building an information management system.
Course Outline
TOPIC
Module 1: Introduction to Information Management
Lesson 1. Information Concept
Lesson 2. Characteristics and Value of Information
Lesson 3. Computer-Based Information System
Lesson 4. Purpose of Information Management
Module 2: Organizing Data and Information
Lesson 1. Data Management
Lesson 2. Database Models
Lesson 3. Database Management Systems
Module 3: Data Warehousing
Lesson 1. Data Warehousing Overview
Lesson 2. Data Warehousing Concepts
Lesson 3. Data Warehousing Delivery Process
Lesson 4. Data Warehousing System Process
Lesson 5. Data Warehousing Architecture
Module 4: Data Mining
Lesson 1. Data Mining Concept
Lesson 2. Primary Goals of Data Mining
Lesson 3. Data Mining Process
Module 5: Data Security
Lesson 1. History of Information Security
Lesson 2. Definition of Security
Lesson 3. Key Information Security Concepts
Lesson 4. Critical Characteristics of Information
Lesson 5. Approaches to Information Security Implementation
Module 6: Social and Ethical Issues in Information Systems
Lesson 1. Information System Issues
Lesson 2. Computer Waste and Mistakes
Lesson 3. Computer Crime
Lesson 4. Privacy
Lesson 5. Work Environment
COURSE REQUIREMENTS
Students are expected to submit the following requirements or outputs:
- Compiled Assessment
- Major Exams
COURSE LEARNING OBJECTIVES
LO1. Identify and explain the components and importance of an information
management system.
LO2. Relate the processes of developing and implementing information systems.
LO3. Apply the understanding of how various information systems work together to
accomplish the information objectives of an organization.
LO4. Outline the role of the ethical, social, and security issues of information
systems.

COURSE POLICIES
1. Students must pass all assessments and major examinations.
2. Students must ensure that all assessments they submit are original.
3. Students must complete all course requirements.
GRADING SYSTEM
Assessment 60%
Major Exams 40%
Total 100%
REFERENCES
1. Fundamentals of Information Systems, 5th Edition, Module 1. An Introduction to Information Systems in Organizations. Retrieved from https://www.radford.edu/mhtay/ITEC110/Fundamental_Info_Sys/Lecture/ch01_5e.pdf
2. Bennett, R. (2019, May 16). 8 Information Management Objectives to Benchmark Your Success. Retrieved from https://miktysh.com.au/8-key-information-management-objectives/
3. Oracle Philippines. (2020). Retrieved from https://www.oracle.com/ph/database/what-is-data-management/
4. Tutorialspoint Simply Easy Learning. (2020). Retrieved from Tutorialspoint.com: https://www.tutorialspoint.com/dbms/dbms_data_models.htm
5. Saad, A. Database Systems: Design, Implementation, & Management, Rob & Coronel. Retrieved from https://www.aast.edu/pheed/staffadminview/pdf_retreive.php?url=65_43655_CC414_20132014_1__2_1_CE414-lec2-Database%20Models%20[Compatibility%20Mode].pdf&stafftype=staffcourses
6. Components of DBMS. Retrieved from https://www.studytonight.com/dbms/components-of-dbms.php
7. Working with ER Diagrams. Retrieved from https://www.studytonight.com/dbms/er-diagram.php
8. Tutorialspoint Simply Easy Learning. (2020). Learn DWH Data Warehousing. Retrieved from https://www.tutorialspoint.com/dwh/dwh_overview.htm
9. Tutorialspoint Simply Easy Learning. (2020). Learn DWH Data Warehousing. Retrieved from https://www.tutorialspoint.com/dwh/dwh_data_warehousing.htm
10. Tutorialspoint Simply Easy Learning. (2020). Learn DWH Data Warehousing. Retrieved from https://www.tutorialspoint.com/dwh/dwh_delivery_process.htm
11. Tutorialspoint Simply Easy Learning. (2020). Learn DWH Data Warehousing. Retrieved from https://www.tutorialspoint.com/dwh/dwh_system_processes.htm
12. Tutorialspoint Simply Easy Learning. (2020). Learn DWH Data Warehousing. Retrieved from https://www.tutorialspoint.com/dwh/dwh_architecture.htm
13. Peltier, T., Peltier, J., & Blackley, J. (2005). Information Security Fundamentals. NY: Auerbach Publications.
14. Whitman, M., & Mattord, H. (2012). Principles of Information Security. Boston, MA: Course Technology, Cengage Learning.
15. Aggarwal, C. (2015). Data Mining. New York: Springer.
16. Kantardzic, M. (2020). Data Mining: Concepts, Models, Methods, and Algorithms. New Jersey: Wiley and Sons Inc.
17. Ethical and Social Issues in Information Systems. (2013, March 15). Retrieved from http://ocmis.blogspot.com/2013/03/ethical-and-social-issues-in.html
18. Ethical and Social Issues in Information Systems by SAMMER QADER. Retrieved from https://www.slideshare.net/SammerQader/module-4-ethical-and-social-issues-in-information-systems-102496877
19. Module 9 The Personal and Social Impact of Computers, Fundamentals of Information Systems, Fifth Edition. Retrieved from https://www.radford.edu/~mhtay/ITEC110/Fundamental_Info_Sys/Lecture/ch09_5e.pdf

MODULE 1: INTRODUCTION TO INFORMATION MANAGEMENT
Introduction

In this module, students will learn the fundamentals and core principles of information
management.

Learning Objectives

At the end of this module, students should be able to:

1. Differentiate data from information.


2. Describe the characteristics and value of information used to evaluate the quality of
data.
3. Define what a computer-based information system (CBIS) is.
4. Understand the purpose of Information Management.

Lesson 1: Information Concept

Data, information, and knowledge

Data: Raw facts

Information is a set of facts that have been organized in such a way that they have
worth beyond the facts themselves.

Process: A sequence of logically related tasks carried out to attain a specific goal.

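Knowledge is the awareness and understanding of a set of information and of the ways that information can be made useful to support a specific task or reach a decision.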

Types of Data

Data                Represented By
Alphanumeric Data   Numbers, letters, and other characters
Image Data          Graphic images and pictures
Audio Data          Sound, noise, or tones
Video Data          Moving images or pictures
Figure: Process of Transforming Data into Information
DATA -> The transformation process (applying knowledge by selecting, organizing, and manipulating data) -> INFORMATION
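As a small illustration of the transformation process, the sketch below takes raw sales facts (data) and selects, organizes, and manipulates them into revenue totals per product (information). The record fields, products, and prices are hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Raw facts (data): individual sales transactions in no useful order.
# These records are hypothetical and exist only for illustration.
raw_sales = [
    {"product": "pen", "qty": 10, "unit_price": 12.0},
    {"product": "notebook", "qty": 3, "unit_price": 45.0},
    {"product": "pen", "qty": 5, "unit_price": 12.0},
    {"product": "eraser", "qty": 0, "unit_price": 8.0},
]

def transform(records):
    """Select, manipulate, and organize raw records into a revenue summary."""
    totals = defaultdict(float)
    for r in records:
        if r["qty"] <= 0:  # select: ignore records with no actual sale
            continue
        totals[r["product"]] += r["qty"] * r["unit_price"]  # manipulate
    # organize: sort products from highest to lowest revenue
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

summary = transform(raw_sales)
for product, revenue in summary:
    print(f"{product}: {revenue:.2f}")
```

The four input rows are data; the sorted two-line summary is information, because it answers a question (which product earns the most) that no single row answers.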

Lesson 2: Characteristics and Value of Information

Why the quality of information matters:
- If information is not accurate or complete, people can make bad decisions that cost tens of thousands, if not millions, of dollars.
- If information is not relevant, is not delivered to decision makers in a timely manner, or is too complicated to comprehend, it can be of little value to a company.

The following are the characteristics of valuable information:

Accessible: Authorized users should have easy access to information in the right format and at the right time to satisfy their needs.
Accurate: Accurate information is error-free. In some circumstances, inaccurate information is generated because incorrect data is input into the transformation process.
Complete: Complete information contains all of the key facts.
Economical: Information should also be affordable to produce. Decision makers must always weigh the value of information against the cost of gathering it.
Flexible: Flexible information can be used for a wide range of purposes.
Relevant: Relevant information is important to the decision maker. Information indicating a decline in lumber prices may not be relevant to a computer chip maker.
Reliable: Users can depend on reliable information. In many circumstances, the reliability of information depends on the reliability of the data-gathering method. In other cases, it depends on the source: a rumor from an unknown source that oil prices will rise may not be reliable.
Secure: Unauthorized users must not be able to access information.
Simple: Information should be easy to understand and not unduly complicated. Sophisticated and detailed information may not be needed. In fact, having too much information can lead to information overload, where a decision maker is unable to determine what is truly important.
Timely: Timely information is delivered when it is needed.
Verifiable: Information should be verifiable. This means you can double-check it, possibly by examining multiple sources for the same information.
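Some of these characteristics can be checked automatically before information reaches a decision maker. The sketch below is a hypothetical example that tests one record for completeness (all key fields present), accuracy (no impossible values), and timeliness (not older than a cutoff). The field names, the "negative amount is an error" rule, and the 7-day cutoff are assumptions for illustration, not rules from the course material.

```python
from datetime import datetime, timedelta

REQUIRED_FIELDS = ("customer_id", "amount", "recorded_at")  # assumed key facts
MAX_AGE = timedelta(days=7)                                  # assumed timeliness cutoff

def quality_issues(record, now):
    """Return a list of quality problems found in one record."""
    issues = []
    # Complete: every key fact must be present and non-empty.
    missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    if missing:
        issues.append("incomplete: missing " + ", ".join(missing))
        return issues  # the remaining checks need these fields
    # Accurate: a negative amount is treated as an input error here.
    if record["amount"] < 0:
        issues.append("inaccurate: negative amount")
    # Timely: data older than MAX_AGE is flagged as stale.
    if now - record["recorded_at"] > MAX_AGE:
        issues.append("not timely: older than 7 days")
    return issues

now = datetime(2021, 9, 15)
good = {"customer_id": "C001", "amount": 250.0, "recorded_at": datetime(2021, 9, 14)}
bad = {"customer_id": "C002", "amount": -5.0, "recorded_at": datetime(2021, 8, 1)}
print(quality_issues(good, now))
print(quality_issues(bad, now))
```

Characteristics such as relevance and simplicity depend on the decision maker and cannot be checked this mechanically.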

Valuable information:
- Can assist individuals and organizations in completing activities more efficiently and effectively.
- Can assist managers in determining whether or not to invest in new information systems and technology.
Lesson 3: Computer-based Information System

A computer-based information system (CBIS):
- is a well-organized combination of hardware and software technologies, as well as human elements, designed to produce timely, integrated, accurate, and useful information for decision making.
- is a type of information system that uses computer technology to carry out some or all of its activities.
The basic components of a computer-based information system are:
- Hardware: devices such as the monitor, processor, printer, and keyboard, all of which work together to accept, process, and display data and information.
- Software: the programs that allow the hardware to process the data.
- Databases: collections of associated files or tables containing related data.
- Networks: connecting systems that allow diverse computers to distribute resources.
- Procedures: the commands for combining the components above to process information and produce the preferred output. They include the strategies, policies, methods, and rules for using the CBIS.
- Telecommunications: the electronic transmission of signals for communications.
- Internet: the world's largest computer network, consisting of thousands of interconnected networks, all freely exchanging information.
- People: the most important element in most computer-based information systems.

The first four components (hardware, software, databases, and networks) make up what is known as the information technology platform. Information technology professionals can then use these components to build information systems that monitor safety, risk, and data management. These activities are referred to as information technology services.

Lesson 4: Purpose of Information Management


Information is organized in a way that makes it easy to access and use.

Information architecture (IA) is the structure of information within an organization; information management relies on it to ensure efficient information security, findability, usability, and interpretation.

Protecting and managing information in the workplace

Effective information management (IM) also relies on data and information security, which overlaps with IT and has significant consequences for data and information privacy, security, cybersecurity, and the decommissioning and archiving of old equipment.

Increasing the worth of company information

Andrew McAfee and Erik Brynjolfsson described the opportunities created by focusing on the value of information in their 2012 feature article on big data. According to their research, "Companies in the top third of their industry, in terms of data-driven decision making, were, on average, 5% more productive and 6% more profitable than their competitors."

Operational risk is managed and mitigated.

An effective information management approach identifies, assesses, evaluates, and mitigates risks. Every day, businesses face a variety of IM-related risks, such as noncompliance with regulatory recordkeeping requirements, unauthorized data destruction, cyberattacks, data breaches, and lost or leaked intellectual property (IP) or other valuable information.

Regulatory and legislative compliance are ensured.

Business information is governed by laws and regulations that control how it is gathered, maintained, used, and disposed of. Understanding the many policies and laws that affect this process, which often cover a lot of ground, is crucial to minimizing risk.

Managing the lifecycle of information assets in an efficient and effective manner

A strategy for information management provides a framework, policies, procedures, and processes for managing information throughout its lifecycle. It considers the people, processes, and technologies needed to assist in the protection, management, and extension of the value and usefulness of information. Lifecycle management seeks to improve understanding of how information is created, managed, and used within an organization, as well as to find ways to reduce inefficiencies and prioritize usefulness over time.

Internal and external collaboration should be promoted and supported.

Collaboration, communication, and information sharing have become critical in
modern businesses, particularly in organizations where employees are dispersed, work
remotely, or are on the go. The continuous evolution of the Internet has resulted in the
proliferation of networks and technological solutions that encourage collaboration.

Data integration allows for the automation of business processes.

A significant benefit of implementing an effective information management program at the organizational level is the increased opportunity for business process automation, enabled by providing consistent, high-quality data across business applications.

Assessment No. 1

Introduction:

This activity will help students differentiate between data and information, identify
different types of data, and understand how data is transformed into information.

Integrative Activity:

A. Briefly answer the following. (5pts each)


1. Discuss the distinction between data and information.
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________.
2. Discuss the process of Transforming Data into Information.
_________________________________________________________________
_________________________________________________________________
_________________________________________________________________
_________________________________________________________________.

B. Complete the table below

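Types of Data          Represented By
Alphanumeric Data      ______________________________
______________________ Graphic images and pictures
Audio Data             ______________________________
______________________ Moving images or pictures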
C. Write the letter of the correct answer before the statement.

_________________1. Collection of facts organized in such a way that they have additional
value beyond the value of the facts themselves.

_________________2. The commands for combining the components above to process
information and produce the preferred output.
_________________3. The most important element in most computer-based information
systems.
_________________4. An organized integration of hardware and software technologies and
human elements designed to produce timely, integrated, accurate and useful information for
decision making purposes.

_________________5. Set of logically related tasks performed to achieve a defined outcome.

_________________6. A type of data represented by numbers, letters and other characters.

_________________7. Programs that allow the hardware to process the data.

_________________8. Awareness and understanding of a set of information.

_________________9. A connecting system that allows diverse computers to distribute resources.
________________10. The structure of information within an organization and is what
information management relies on to ensure efficient information security, findability,
usability and interpretation.

Assessment No. 2

Introduction:

This activity will help students describe the characteristics and value of
information used to assess data quality.

Integrative Activity:

Define the following Valuable Information Characteristics in your own words.

Characteristic    Definition
Accessible

Accurate

Complete

Economical

Relevant

Flexible

Reliable

Secure

Simple

Timely

Verifiable

Assessment No. 3

Introduction:

This activity will help students determine the purpose of a computer-based information system in an organization and identify its fundamental components.

Integrative Activity:

A. Provide at least three reasons why a computer-based information system is important in an organization.
1. ___________________________________________________________________
___________________________________________________________________
____________________________________________________
2. ___________________________________________________________________
___________________________________________________________________
____________________________________________________
3. ___________________________________________________________________
___________________________________________________________________
____________________________________________________

B. Enumerate and describe the fundamental components of a computer-based information system.

1. ___________________________________________________________________
_________________________________________________________
2. ___________________________________________________________________
_________________________________________________________
3. ___________________________________________________________________
_________________________________________________________
4. ___________________________________________________________________
_________________________________________________________

5. ___________________________________________________________________
_________________________________________________________
6. ___________________________________________________________________
_________________________________________________________
7. ___________________________________________________________________
_________________________________________________________
8. ___________________________________________________________________
_________________________________________________________

Assessment No. 4

Introduction:

This activity will help students identify the functions of an information management
system.

Integrative Activity:

Provide at least five Information Management Functions.

1. ___________________________________________________________________
_________________________________________________________
2. ___________________________________________________________________
_________________________________________________________
3. ___________________________________________________________________
_________________________________________________________
4. ___________________________________________________________________
_________________________________________________________
5. ___________________________________________________________________
_________________________________________________________

MODULE 2: ORGANIZING DATA AND INFORMATION: FILES AND DATABASES

Introduction

This module gives students insight into how an organization uses data, manages data, and transforms it into useful information. It also covers the different data models used to organize and manage data and gives an overview of database management systems.

Learning Objectives

At the end of this module, the students should be able to:


1. Define Data Management.
2. Identify the different data models and discuss how to use them.
3. Define Database Management System and identify its purpose.

Lesson 1: Data Management

Data management is the practice of collecting, keeping, and using data securely,
efficiently, and cost-effectively. The goal of data management is to help people,
organizations, and connected things optimize the use of data within the bounds of policy and
regulation so that they can make decisions and take actions that maximize the benefit to the
organization. As organizations increasingly rely on intangible assets to create value, a strong
data management strategy is more important than ever.
In an organization, managing digital data entails a wide range of tasks, policies,
procedures, and practices. The work of data management covers issues such as how to:
- Create, access, and update data across a diverse data tier
- Store data across multiple clouds and on premises
- Provide high availability and disaster recovery
- Use data in a growing variety of apps, analytics, and algorithms
- Ensure data privacy and security
- Archive and destroy data in accordance with retention schedules and compliance requirements

A formal data management strategy addresses the activity of users and administrators, the capabilities of data management technologies, the demands of regulatory requirements, and the needs of the organization to obtain value from its data.
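One of the data management tasks listed above is archiving and destroying data according to retention schedules. The sketch below shows one possible way to apply such a schedule; the record categories and retention periods are hypothetical, since real periods come from an organization's own policies and the regulations that apply to it.

```python
from datetime import date

# Hypothetical retention schedule: years kept active, then years kept in archive.
RETENTION = {
    "invoice": {"active_years": 2, "archive_years": 5},
    "web_log": {"active_years": 1, "archive_years": 0},
}

def retention_action(category, created, today):
    """Decide whether a record should be kept, archived, or destroyed."""
    rule = RETENTION[category]
    age_years = (today - created).days / 365.25  # approximate age in years
    if age_years < rule["active_years"]:
        return "keep"
    if age_years < rule["active_years"] + rule["archive_years"]:
        return "archive"
    return "destroy"

today = date(2021, 8, 1)
print(retention_action("invoice", date(2020, 6, 1), today))  # about 1.2 years old
print(retention_action("invoice", date(2017, 1, 1), today))  # about 4.6 years old
print(retention_action("web_log", date(2019, 1, 1), today))  # about 2.6 years old
```

Keeping the schedule in one table, rather than scattering dates through application code, makes it easier to update when policies or regulations change.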

Lesson 2: Database Models

A database model defines the logical design and structure of a database and determines
how data will be stored, accessed, and updated in a database management system.

The Importance of Data Models

- A data model is:
  - a relatively simple graphical representation of complex real-world data structures; and
  - a tool for facilitating communication between the designer, the application programmer, and the end user.
- An appropriate data model serves as the foundation for good database design.
- End users have different perspectives and needs when it comes to data.
- The data model organizes information for a variety of users.
Development of Data Models
Image source: Saad, A., Database Systems: Design, Implementation, & Management (reference 5)

1. Hierarchical Model

This database model organizes data into a tree-like structure with a single root, to which all the other data is linked. The hierarchy starts from the root data and expands like a tree, adding child nodes to the parent nodes. In this model, a child node has only a single parent node. This model efficiently describes many real-world relationships, such as the index of a book or a recipe.
Image source: https://www.studytonight.com/dbms/database-model.php
In the hierarchical model, data is organized into a tree-like structure with one-to-many relationships between different types of data. For example, one department can have many courses, many professors, and, of course, many students.
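The defining rule of the hierarchical model, that every record except the root has exactly one parent, can be sketched with a small tree class. This is an in-memory illustration only, not how a hierarchical DBMS stores data; the class name and the department, course, professor, and student names are hypothetical.

```python
class Node:
    """A record in a hierarchical model: exactly one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # a single parent (None only for the root)
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Walk from the root down to this node by following parent links upward."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

# Hypothetical data following the department example above.
dept = Node("Computer Science")
c1 = Node("Comp 211", dept)
c2 = Node("Comp 212", dept)
prof = Node("Prof. Cruz", dept)
s1 = Node("Ana", c1)

print([c.name for c in dept.children])
print(s1.path())
```

Because `parent` holds a single reference, a student who takes two courses would have to be stored twice; that limitation is what the network model relaxes.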

2. Network Model

This is an extension of the hierarchical model. In this model, data is organized more like a graph, and a child node is allowed to have more than one parent node. Because more relationships are established in this model, the data is more interrelated, which also makes accessing it easier and faster. This database model was used to map many-to-many data relationships. It was the most widely used database model before the relational model was introduced.
Image source: https://www.studytonight.com/dbms/database-model.php
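The difference from the hierarchical model is that a record may be linked to several parents. The sketch below keeps parent and child links in two dictionaries so the structure forms a graph rather than a tree. It only illustrates the idea of multiple parents; the function name and the course and student names are hypothetical.

```python
from collections import defaultdict

# In a network model a record may have several parents, so links form a graph.
# Hypothetical data: course records (parents) linked to student records (children).
children = defaultdict(set)  # parent -> set of children
parents = defaultdict(set)   # child -> set of parents

def link(parent, child):
    """Record a parent-child link; unlike a tree, a child may gain more parents."""
    children[parent].add(child)
    parents[child].add(parent)

link("Comp 211", "Ana")
link("Comp 212", "Ana")  # Ana now has two parent records
link("Comp 211", "Ben")

print(sorted(parents["Ana"]))        # courses linked to Ana
print(sorted(children["Comp 211"]))  # students linked to Comp 211
```

Navigating in both directions (student to courses, course to students) is what lets the network model represent many-to-many relationships.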

3. Entity-relationship Model

In this database model, relationships are created by dividing the object of interest into entities and its characteristics into attributes. Different entities are related through relationships.

E-R models represent these relationships in pictorial form to make them easier for different stakeholders to understand.

This model is well suited to designing a database, which can then be turned into tables in the relational model.

For example, if we have to design a school database, Student will be an entity with attributes such as name, age, and address. Because an address is generally complex, Address can be another entity with attributes such as street name, pin code, and city, and there will be a relationship between the two entities.
Image source: https://www.studytonight.com/dbms/database-model.php
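The school example can be turned into relational tables, as the text suggests: each entity becomes a table, each attribute becomes a column, and the Student-Address relationship becomes a foreign key. The sketch below uses SQLite through Python's built-in `sqlite3` module; the table and column names and the sample row values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE address (
    address_id  INTEGER PRIMARY KEY,
    street_name TEXT,
    pin_code    TEXT,
    city        TEXT
);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    age        INTEGER,
    -- the relationship between Student and Address becomes a foreign key
    address_id INTEGER REFERENCES address(address_id)
);
""")
conn.execute("INSERT INTO address VALUES (1, 'Rizal St.', '4400', 'Naga')")
conn.execute("INSERT INTO student VALUES (100, 'Ana', 19, 1)")

row = conn.execute("""
    SELECT s.name, a.city
    FROM student s JOIN address a ON s.address_id = a.address_id
""").fetchone()
print(row)
```

Storing the address in its own table means several students at the same address share one row instead of repeating it.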

Working with ER Diagrams

An ER diagram is a graphical representation of data that describes how data items are related to one another. In the ER model, we break data down into entities and attributes and set up relationships between entities, all of which can be visually represented using the ER diagram.

In the example ER diagram, anyone can see and fully understand what the diagram is trying to convey: a Developer creates a website, whereas a Visitor views one.
Image source: https://www.studytonight.com/dbms/er-diagram.php

Components of ER Diagram

Entities, attributes, relationships, and so on are the components of an ER Diagram,
and there are defined symbols and shapes to represent each of them.

Let's see how we can represent these in our ER Diagram.

Entity

An entity is represented by a simple rectangular box.

Relationships between Entities - Weak and Strong

Rhombuses are used to establish relationships between two or more entities.

Attributes for any Entity

An ellipse is used to represent an entity's attributes. It is connected to the entity.

Weak Entity

A weak entity is represented by a double rectangular box. It is usually associated with another entity.

Key Attribute for any Entity

To represent a key attribute, the attribute name inside the ellipse is underlined.

Derived Attribute for any Entity

Derived attributes are those that can be derived from other attributes, such as age, which can be derived from a person's birth date. To represent a derived attribute, another dotted ellipse is drawn inside the main ellipse.

Multivalued Attribute for any Entity

An attribute with multiple values is represented by a double ellipse, one inside the other.

Composite Attribute for any Entity

An attribute that has attributes of its own is referred to as a composite attribute.

Image source: https://www.studytonight.com/dbms/er-diagram.php
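The derived attribute described above, age computed from birth date, can be sketched in a few lines: only the birth date is stored, and age is calculated when needed, so it never goes stale. The function name `age_on` and the sample dates are hypothetical.

```python
from datetime import date

def age_on(birth_date, today):
    """Derive age in whole years from a stored birth date."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)

print(age_on(date(2002, 10, 5), date(2021, 9, 1)))  # birthday not yet reached this year
print(age_on(date(2002, 3, 5), date(2021, 9, 1)))   # birthday already passed this year
```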

ER Diagram: Entity

An entity can be any object, place, person, or class. Rectangles are used to represent entities in the ER diagram. In an organization, for example, employee, manager, department, and product can all be considered entities, among many others.

In the example diagram, the relationship is represented by the yellow rhombus in the middle.
Image source: https://www.studytonight.com/dbms/er-diagram.php

ER Diagram: Weak Entity

A weak entity is an entity that depends on another entity.

A weak entity does not have a key attribute of its own. It is represented by a double rectangle.
Image source: https://www.studytonight.com/dbms/er-diagram.php
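In a relational implementation, a weak entity is commonly given a composite primary key that includes its owner's key. The sketch below uses SQLite through Python's built-in `sqlite3` module and a hypothetical Employee/Dependent pair that is not taken from the course material. Here `dep_name` only distinguishes dependents of the same employee (often called a partial key), and `ON DELETE CASCADE` reflects that a dependent cannot exist without its owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE to take effect
conn.executescript("""
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL
);
-- Dependent is the weak entity: it is identified only together with its owner.
CREATE TABLE dependent (
    emp_id   INTEGER NOT NULL REFERENCES employee(emp_id) ON DELETE CASCADE,
    dep_name TEXT NOT NULL,
    relation TEXT,
    PRIMARY KEY (emp_id, dep_name)
);
""")
conn.execute("INSERT INTO employee VALUES (1, 'Ana')")
conn.execute("INSERT INTO dependent VALUES (1, 'Luis', 'son')")
conn.execute("DELETE FROM employee WHERE emp_id = 1")  # remove the owner entity
remaining = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]
print(remaining)  # the dependent was deleted along with its owner
```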

ER Diagram: Attribute

An attribute describes a property or characteristic of an entity.

For example, Name, Age, and Address can be attributes of a Student. An attribute is represented using an ellipse.
Image source: https://www.studytonight.com/dbms/er-diagram.php

ER Diagram: Key Attribute

The key attribute represents the main characteristic of an entity. It is used to denote the primary key. A key attribute is represented by an ellipse with the text underlined.

Image source: https://www.studytonight.com/dbms/er-diagram.php

ER Diagram: Composite Attribute

An attribute can also have attributes of its own. Such attributes are referred to as composite attributes.

ER Diagram: Relationship
A relationship describes the association between two or more entities. Diamonds or rhombuses are used to represent relationships.
Image source: https://www.studytonight.com/dbms/er-diagram.php

Entities have three types of relationships with one another:

1. Binary Relationship
2. Recursive Relationship
3. Ternary Relationship

ER Diagram: Binary Relationship

A binary relationship is one that exists between two entities. It is further subdivided into the categories below.

One to One Relationship

This type of relationship is rarely seen in the real world.
Image source: https://www.studytonight.com/dbms/er-diagram.php

According to the example, a student can enroll in only one course, and a course can have only one student. This is not typically seen in real-life relationships.
One to Many Relationship

The example illustrates this relationship: one student can choose many courses, but each course can have only one student.

Many to One Relationship

This relationship reflects business rules in which multiple entities can be associated with a single entity. For example, a student may enroll in only one course, but a course may have multiple students.

Many to Many Relationship

The example diagram depicts how a single student can enroll in multiple courses. A course can also have more than one student enrolled in it.
Image source: https://www.studytonight.com/dbms/er-diagram.php
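When a many-to-many relationship is turned into relational tables, it is usually resolved with an intermediate (junction) table that holds one row per student-course pair. The sketch below uses SQLite through Python's built-in `sqlite3` module; the `enrollment` table, the course codes, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course  (course_id  TEXT PRIMARY KEY, title TEXT);
-- Junction table: one row per (student, course) pair resolves many-to-many.
CREATE TABLE enrollment (
    student_id INTEGER REFERENCES student(student_id),
    course_id  TEXT    REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO course  VALUES ('COMP211', 'Information Management'),
                           ('COMP212', 'Networking');
INSERT INTO enrollment VALUES (1, 'COMP211'), (1, 'COMP212'), (2, 'COMP211');
""")

courses_of_ana = conn.execute("""
    SELECT c.course_id FROM enrollment e JOIN course c ON e.course_id = c.course_id
    WHERE e.student_id = 1 ORDER BY c.course_id
""").fetchall()
students_in_211 = conn.execute("""
    SELECT s.name FROM enrollment e JOIN student s ON e.student_id = s.student_id
    WHERE e.course_id = 'COMP211' ORDER BY s.name
""").fetchall()
print(courses_of_ana)   # one student, many courses
print(students_in_211)  # one course, many students
```

The composite primary key on `enrollment` also prevents the same student from being enrolled in the same course twice.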

4. Relational Model

In this model, data is organized in two-dimensional tables, and relationships are maintained by storing a common field. E. F. Codd introduced this model in 1970, and it has since become the most widely used database model.

Tables are the basic data structure in the relational model. All the information related to a particular type of entity is stored in the rows of its table. In the relational model, tables are also known as relations.
Image source: https://www.studytonight.com/dbms/database-model.php
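The idea that relationships are maintained through a common field can be shown without any database software. In the sketch below, two tables are plain lists of rows, and a join combines every pair of rows that share the same value in the common field (an equijoin). The table contents and the `join` helper are hypothetical; a real relational DBMS performs this operation with SQL.

```python
# Two relations (tables) represented as lists of rows; dept_id is the common field.
departments = [
    {"dept_id": 10, "dept_name": "Computer Science"},
    {"dept_id": 20, "dept_name": "Accountancy"},
]
professors = [
    {"prof_id": 1, "name": "Cruz", "dept_id": 10},
    {"prof_id": 2, "name": "Reyes", "dept_id": 20},
    {"prof_id": 3, "name": "Santos", "dept_id": 10},
]

def join(left, right, field):
    """Equijoin: merge every pair of rows whose values in `field` are equal."""
    return [{**l, **r} for l in left for r in right if l[field] == r[field]]

rows = join(professors, departments, "dept_id")
for row in rows:
    print(row["name"], "-", row["dept_name"])
```

Because each professor row stores only `dept_id`, a department's name is kept in one place and can be changed without touching the professor rows.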

5. Object Oriented Model


Hammer and McLeod developed the semantic data model (SDM) in 1981. It modeled both data and their relationships in a single structure known as an object. The SDM later served as the foundation for the object-oriented data model (OODM).
The OODM is in turn the foundation of the object-oriented database management system (OODBMS). Like an entity in the relational model, an object is described by its factual content. Unlike an entity, however, an object also includes information about the relationships between the facts within it, as well as its relationships with other objects. As the OODM developed, an object could also contain the operations that can be performed on it. The object thus became the basic building block of autonomous structures.
Basic Structure of an Object Oriented Data Model
- Object: an abstraction of a real-world entity
- Attributes describe an object's properties
- Classes group objects with similar characteristics
- Classes are organized in a class hierarchy
- Inheritance is the ability of an object within a class hierarchy to inherit the attributes and methods of classes above it
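The structural ideas listed above (objects, attributes, classes, a class hierarchy, and inheritance) can be illustrated with ordinary Python classes. This is only an analogy for the concepts, not an object-oriented DBMS, and the `Person` and `Student` classes are hypothetical.

```python
class Person:
    """Class: groups objects that share the same attributes and methods."""

    def __init__(self, name, birth_year):
        self.name = name              # attributes describe the object's properties
        self.birth_year = birth_year

    def describe(self):               # operations (methods) are stored with the data
        return f"{self.name} (born {self.birth_year})"


class Student(Person):
    """Student sits below Person in the class hierarchy and inherits from it."""

    def __init__(self, name, birth_year, course):
        super().__init__(name, birth_year)  # reuse the inherited attributes
        self.course = course

    def describe(self):
        return super().describe() + f", enrolled in {self.course}"


s = Student("Ana", 2002, "Comp 211")  # an object: an abstraction of a real-world entity
print(s.describe())
print(isinstance(s, Person))          # a Student is also a Person
```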
Advantages
- Adds semantic content
- Visual presentation includes semantic content
- Database integrity
- Both structural and data independence

Disadvantages
- Slow pace of OODM standards development
- Complex navigational data access
- Steep learning curve
- High system overhead slows transactions
- Lack of market penetration

Lesson 3: Database Management Systems

A database management system (DBMS) is the technology used to optimize and manage the storage and retrieval of data from databases. A DBMS offers a systematic approach to managing databases through an interface for users as well as for workloads that access the databases via applications. The management responsibilities of a DBMS cover the information within the databases, the processes applied to the databases (such as access and modification), and the database's logical structure. A DBMS also facilitates additional administrative operations such as change management, disaster recovery, compliance, and performance monitoring, among others.

Functions of DBMS
A DBMS performs several important functions that guarantee the integrity and consistency of the data in the database. The most important functions of a DBMS are:

1. Data Dictionary Management
2. Data Storage Management
3. Data Transformation and Presentation
4. Security Management
5. Multiuser Access Control
6. Backup and Recovery Management
7. Data Integrity Management
8. Database Access Languages and Application Programming Interfaces
9. Database Communication Interfaces
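Several of these functions can be observed even in a small embedded DBMS. The sketch below uses Python's built-in `sqlite3` module, with SQLite standing in for a full DBMS; the `product` table and its rows are hypothetical. It touches four of the functions listed above: the database access language (SQL), data integrity management (a `CHECK` constraint enforced by the DBMS), data dictionary management (SQLite's `sqlite_master` catalog table), and backup management (`Connection.backup`).

```python
import sqlite3

# SQLite (in memory) stands in for a full DBMS in this sketch.
conn = sqlite3.connect(":memory:")

# Database access language: the schema is declared in SQL, including constraints
# that the DBMS itself will enforce.
conn.execute(
    """CREATE TABLE product (
           product_id INTEGER PRIMARY KEY,
           name       TEXT NOT NULL,
           price      REAL CHECK (price >= 0)
       )"""
)
conn.execute("INSERT INTO product (name, price) VALUES (?, ?)", ("Keyboard", 450.0))
conn.commit()

# Data integrity management: a row that violates the CHECK constraint is rejected.
try:
    conn.execute("INSERT INTO product (name, price) VALUES (?, ?)", ("Mouse", -10.0))
    rejected = False
except sqlite3.IntegrityError:
    conn.rollback()  # discard the failed transaction
    rejected = True

# Data dictionary management: the DBMS stores metadata about its own objects.
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# Backup and recovery management: copy the whole database to a second connection.
backup = sqlite3.connect(":memory:")
conn.backup(backup)
restored_rows = backup.execute("SELECT name, price FROM product").fetchall()

print(rejected, tables, restored_rows)  # True ['product'] [('Keyboard', 450.0)]
```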

Assessment No. 1

Introduction:

This activity will help students understand the importance of data management, identify different data models, and discuss how to use them. It also helps students recognize the components of an Entity Relationship (ER) Diagram.

Integrative Activity:

A. Discuss the significance of data management in an organization.


________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________

Explain the purpose of database models and their development.


________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________.

B. Complete the table by writing the functions of the following database models.

Database Models | Functions
1. Hierarchical Model |
2. Network Model |
3. Relational Model |
4. Entity Relationship Model |
5. Object-oriented Model |

C. Enumerate and discuss the components of an ER Diagram.

________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________.

D. List the essential features of a database management system.

________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________

MODULE 3: DATA WAREHOUSING

Introduction

This module explains the necessary concepts related to data warehousing.

Learning Objectives
At the end of this module, the students should be able to:
1. Define Data Warehousing.
2. Identify the types and functions of Data Warehousing.
3. Enumerate the Data Warehousing Delivery Process and Data Warehousing System
Process.
4. Identify the Data Warehousing Architecture.

Lesson 1: Data Warehousing Overview

The term "Data Warehouse" was coined by Bill Inmon in 1990. According to Inmon, a data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data. This data helps analysts make informed decisions in an organization.

Because of the transactions that occur on a daily basis, an operational database undergoes frequent changes. If a business executive wants to analyze historical data about a product, a supplier, or a consumer, there may be no data left to analyze, because the earlier values have been overwritten by later transactions.

A data warehouse provides aggregated and consolidated data in a multidimensional format. In addition to this generalized and consolidated view of data, a data warehouse provides Online Analytical Processing (OLAP) tools. These tools enable interactive and effective data analysis in a multidimensional space, and this analysis results in data generalization and data mining.

To improve interactive knowledge mining at multiple levels of abstraction, data mining functions such as association, clustering, classification, and prediction can be combined with OLAP operations. As a result, data warehouses have become increasingly important as a platform for data analysis and online analytical processing.

Why is a Data Warehouse distinct from operational databases?
For the following reasons, a data warehouse is kept separate from operational databases:
• An operational database is designed for well-known tasks and workloads such as searching for particular records, indexing, and so on. In contrast, data warehouse queries are often complex and involve a wide variety of data.
• Operational databases process many transactions concurrently. Concurrency control and recovery mechanisms are therefore required to ensure the database's robustness and consistency.
• An operational database query can both read and modify data, whereas an OLAP query only reads stored data.
• An operational database keeps current data. A data warehouse, on the other hand, stores historical data.
Data Warehouse Features
• Subject Oriented − A data warehouse is subject oriented because it provides information about a specific subject rather than about the organization's ongoing operations. Products, customers, suppliers, sales, and revenue are examples of such subjects. Instead of ongoing operations, a data warehouse focuses on modeling and analyzing data for decision making.
• Integrated − A data warehouse is built by integrating data from heterogeneous sources such as relational databases, flat files, and so on. This integration improves the efficiency with which data can be analyzed.
• Time Variant − The data in a data warehouse is associated with a specific time period, so it provides historical information.
• Non-volatile − Previous data is not erased when new data is added. Because a data warehouse is kept separate from the operational database, frequent changes in the operational database are not reflected in the data warehouse.

Note − Because a data warehouse is physically stored separately from the operational database, it does not require transaction processing, recovery, or concurrency controls.
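The difference between overwriting current values and keeping dated history can be sketched in a few lines of Python. This is a simplified illustration under assumed data (the product code `P-100`, its prices, and the `load_snapshot` helper are hypothetical): the dictionary plays the role of an operational record that is updated in place, while the list of dated rows behaves like a time-variant, non-volatile warehouse table.

```python
from datetime import date

# Operational database: the current price simply overwrites the old one.
operational_prices = {"P-100": 120.0}
operational_prices["P-100"] = 135.0  # the earlier value, 120.0, is lost

# Data warehouse: each load appends dated rows instead of overwriting, so the data
# is time variant (every row carries a date) and non-volatile (old rows remain).
warehouse_prices: list[tuple[date, str, float]] = []


def load_snapshot(snapshot_date: date, prices: dict[str, float]) -> None:
    """Hypothetical periodic load: append one dated row per product."""
    for product_id, price in prices.items():
        warehouse_prices.append((snapshot_date, product_id, price))


load_snapshot(date(2021, 1, 31), {"P-100": 120.0})
load_snapshot(date(2021, 2, 28), {"P-100": 135.0})

# Historical analysis is still possible because nothing was erased.
history = [(day, price) for day, product_id, price in warehouse_prices if product_id == "P-100"]
print(history)
```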

Data Warehouse Applications
A data warehouse helps business executives organize, analyze, and make decisions based on their data. It forms an integral part of the plan-execute-assess "closed-loop" feedback system used by enterprise management. Data warehouses are commonly used in the following industries:
• Financial services
• Banking services
• Consumer goods
• Retail sectors
• Controlled manufacturing

Types of Data Warehouse

The three types of data warehouse applications discussed below are information processing, analytical processing, and data mining.

Information Processing − A data warehouse enables the processing of the data stored in it. The data can be processed by querying, basic statistical analysis, and reporting with crosstabs, tables, charts, or graphs.

Analytical Processing − A data warehouse supports analytical processing of the information it stores. The data can be analyzed with basic OLAP operations such as slice-and-dice, drill down, drill up (roll-up), and pivoting.

Data Mining − Data mining supports knowledge discovery by finding hidden patterns and associations, building analytical models, and performing classification and prediction. The mining results can be presented using visualization tools.
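The basic OLAP operations named above can be illustrated on a tiny in-memory data set. The Python sketch below is not an OLAP server; the `sales` rows, their dimensions (year, quarter, region, product), and the `total_by` helper are hypothetical. It shows a slice (fixing one dimension), a dice (selecting on several dimensions), a roll-up from quarters to years, a drill-down back to quarters, and a pivot that turns regions into columns.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, quarter, region, product, amount).
sales = [
    (2021, "Q1", "North", "Laptop", 100),
    (2021, "Q1", "South", "Laptop", 80),
    (2021, "Q2", "North", "Phone", 60),
    (2021, "Q2", "South", "Laptop", 90),
    (2022, "Q1", "North", "Phone", 70),
]
YEAR, QUARTER, REGION, PRODUCT, AMOUNT = range(5)  # column positions


def total_by(rows, *dimensions):
    """Sum the amount for every combination of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dimensions)] += row[AMOUNT]
    return dict(totals)


# Slice: fix a single dimension (year = 2021).
slice_2021 = [r for r in sales if r[YEAR] == 2021]
# Dice: select a sub-cube on two dimensions (year = 2021 and region = North).
dice = [r for r in sales if r[YEAR] == 2021 and r[REGION] == "North"]
# Roll-up (drill up): aggregate from the quarter level to the year level.
by_year = total_by(sales, YEAR)
# Drill down: go from years to the finer year-and-quarter level.
by_year_quarter = total_by(sales, YEAR, QUARTER)
# Pivot: rotate the 2021 slice so products are rows and regions are columns.
pivot = defaultdict(dict)
for (product, region), amount in total_by(slice_2021, PRODUCT, REGION).items():
    pivot[product][region] = amount

print(by_year)      # {(2021,): 330, (2022,): 70}
print(dict(pivot))  # {'Laptop': {'North': 100, 'South': 170}, 'Phone': {'North': 60}}
```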

Sr. No. | Data Warehouse (OLAP) | Operational Database (OLTP)
1 | It involves historical processing of information. | It involves day-to-day processing.
2 | OLAP systems are used by knowledge workers such as executives, managers, and analysts. | OLTP systems are used by clerks, DBAs, or database professionals.
3 | It is used to analyze the business. | It is used to run the business.
4 | It focuses on Information out. | It focuses on Data in.
5 | It is based on Star Schema, Snowflake Schema, and Fact Constellation Schema. | It is based on the Entity Relationship Model.
6 | It is subject oriented. | It is application oriented.
7 | It contains historical data. | It contains current data.
8 | It provides summarized and consolidated data. | It provides primitive and highly detailed data.
9 | It provides a summarized and multidimensional view of data. | It provides a detailed and flat relational view of data.
10 | The number of users is in hundreds. | The number of users is in thousands.
11 | The number of records accessed is in millions. | The number of records accessed is in tens.
12 | The database size is from 100 GB to 100 TB. | The database size is from 100 MB to 100 GB.
13 | These are highly flexible. | It provides high performance.

Lesson 2: Data Warehousing Concepts

The process of creating and utilizing a data warehouse is known as data warehousing. A data warehouse is built by combining data from disparate sources to support analytical reporting, structured and/or ad hoc queries, and decision making. Data warehousing entails cleaning, integrating, and consolidating data.
Using Data Warehouse Information
Decision support technologies help users make use of the data in a data warehouse. These technologies help executives use the warehouse quickly and effectively: they can gather the data, analyze it, and make decisions based on the information in the warehouse. The information gathered in a warehouse can be used in any of the following domains:
• Tuning Production Strategies − Product strategies can be fine-tuned by repositioning products and managing product portfolios based on quarterly or yearly sales comparisons.
• Customer Analysis − Customer analysis is performed by evaluating the customer's purchasing preferences, purchasing time, budget cycles, and so on.
• Operations Analysis − Data warehousing also helps in customer relationship management and in making environmental corrections. The information also allows us to analyze business operations.

Integrating Heterogeneous Databases
There are two approaches to integrating heterogeneous databases:
• Query-driven Approach
• Update-driven Approach
Query-Driven Approach

This is the traditional approach to integrating heterogeneous databases. It was used to build wrappers and integrators on top of multiple heterogeneous databases. These integrators are also known as mediators.
Process of Query-Driven Approach

• When a query is issued on the client side, a metadata dictionary translates the query into an appropriate form for each of the heterogeneous sites involved.
• These queries are then mapped and sent to the local query processors.
• The results from the heterogeneous sites are integrated into a global answer set.
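The following Python sketch imitates a mediator under simplified, assumed conditions: two hypothetical sources store customers in different formats, and one wrapper function per source translates the global request "customers in a given city" into that source's local form. Nothing is stored centrally; every call repeats the translation and scans every source.

```python
# Two hypothetical heterogeneous sources with different local formats.
site_a = [{"cust_name": "Ana", "city": "Manila"}, {"cust_name": "Ben", "city": "Cebu"}]
site_b = [("Carla", "MANILA"), ("Dan", "DAVAO")]  # (name, CITY) tuples, upper-case cities


# Wrappers: translate the global query "customers in <city>" for one site each.
def query_site_a(city: str) -> list[str]:
    return [row["cust_name"] for row in site_a if row["city"] == city]


def query_site_b(city: str) -> list[str]:
    return [name for name, site_city in site_b if site_city == city.upper()]


WRAPPERS = [query_site_a, query_site_b]


def mediator(city: str) -> list[str]:
    """Query-driven integration: translate and run the query at every site, then merge."""
    global_answer: list[str] = []
    for local_query in WRAPPERS:  # route the translated query to each local processor
        global_answer.extend(local_query(city))
    return sorted(global_answer)  # combine into one global answer set


print(mediator("Manila"))  # ['Ana', 'Carla']
```

Because all of the work happens at query time, frequent or aggregating queries pay this cost again and again.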
Disadvantages
• A query-driven approach necessitates complex integration and filtering processes.
• This method is extremely inefficient.
• It is prohibitively expensive for frequent queries.
• It is also prohibitively expensive for queries that require aggregations.
Update-Driven Approach

This is an alternative to the traditional approach. Today's data warehouse systems follow an update-driven approach rather than the traditional approach discussed earlier. In an update-driven approach, the information from multiple heterogeneous sources is integrated in advance and stored in a warehouse. This information is available for direct querying and analysis.

Advantages

This approach has the following benefits:
• It provides high performance.
• The data is copied, processed, integrated, annotated, summarized, and restructured in advance in a semantic data store.
• Query processing does not require an interface to process data at the local sources.
Data Warehouse Tools and Utilities Functions
The functions of data warehouse tools and utilities are as follows:
• Data Extraction − Involves gathering data from multiple heterogeneous sources.
• Data Cleaning − Involves finding and correcting the errors in data.
• Data Transformation − Involves converting the data from legacy format to warehouse format.
• Data Loading − Involves sorting, summarizing, consolidating, checking integrity, and building indices and partitions.
• Refreshing − Involves propagating updates from the data sources to the warehouse.

Note − Data cleaning and data transformation are important steps in improving the quality of data and data mining results.
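The five functions can be chained into a small pipeline. The Python sketch below is only illustrative: the source records, field names, date format, and the rule of discarding unparseable amounts are all assumptions, and a production load step would also check integrity and build indexes and partitions.

```python
from datetime import datetime


def extract() -> list[dict]:
    """Data extraction: gather rows from two hypothetical legacy sources."""
    source_1 = [{"id": "1", "amount": "250.00", "date": "03/15/2021"}]
    source_2 = [{"id": "2", "amount": " 99.5 ", "date": "03/16/2021"},
                {"id": "3", "amount": "n/a", "date": "03/16/2021"}]
    return source_1 + source_2


def clean(rows: list[dict]) -> list[dict]:
    """Data cleaning: find erroneous values (here, unparseable amounts) and drop them."""
    cleaned = []
    for row in rows:
        try:
            float(row["amount"])
        except ValueError:
            continue  # assumption: bad rows are discarded rather than repaired
        cleaned.append(row)
    return cleaned


def transform(rows: list[dict]) -> list[tuple[int, float, str]]:
    """Data transformation: legacy text fields and MM/DD/YYYY dates -> typed warehouse format."""
    return [(int(r["id"]),
             float(r["amount"]),
             datetime.strptime(r["date"], "%m/%d/%Y").date().isoformat())
            for r in rows]


def load(warehouse: dict[int, tuple[float, str]], rows: list[tuple[int, float, str]]) -> None:
    """Data loading: sort the rows and store them keyed by id (no indexing in this sketch)."""
    for sale_id, amount, day in sorted(rows):
        warehouse[sale_id] = (amount, day)


def refresh(warehouse: dict[int, tuple[float, str]]) -> None:
    """Refreshing: re-run the pipeline to propagate source updates into the warehouse."""
    load(warehouse, transform(clean(extract())))


warehouse: dict[int, tuple[float, str]] = {}
refresh(warehouse)
print(warehouse)  # {1: (250.0, '2021-03-15'), 2: (99.5, '2021-03-16')}
```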

Lesson 3: Data Warehousing Delivery Process

A data warehouse is never static; it evolves as the business grows. As the business changes, so do its requirements, and a data warehouse must be designed to keep up with these changes. Therefore, a data warehouse system needs to be flexible.

Ideally, there should be a delivery process for delivering a data warehouse. However, data warehouse projects are frequently plagued by issues that make it difficult to complete tasks and deliverables in the strict and orderly manner required by the waterfall method.

Frequently, the requirements are not completely understood. Architectures, designs, and build components can be completed only after all of the requirements have been gathered and studied.
Delivery Method

The delivery method is a variant of the joint application development (JAD) approach, adapted for data warehouse delivery. To reduce risk, the data warehouse is delivered in stages.

The approach discussed here does not shorten the overall delivery time, but it ensures that business benefits are delivered incrementally throughout the development process.

Note − The delivery process is broken into phases to reduce the project and delivery risk.

The following diagram explains the stages in the delivery process –



Image source: www.tutorialspoint.com/dwh/dwh_delivery_process.htm


Strategy

Data warehouses are strategic investments that require a business process to generate benefits. An IT strategy is required to procure and retain funding for the project.

Business Case
The goal of a business case is to estimate the business benefits of implementing a data warehouse. These benefits may not be quantifiable, but the projected benefits must be stated clearly. If a data warehouse lacks a clear business case, the company is likely to face credibility issues at some point during the delivery process. Therefore, in data warehouse projects, we need to understand the business case for investment.
Education and Prototyping
Before settling on a solution, organizations experiment with the concept of data analysis and educate themselves on the value of having a data warehouse. Prototyping addresses this need: it helps in understanding the feasibility and benefits of a data warehouse. Prototyping on a small scale can support the educational process as long as:
• The prototype addresses a defined technical objective.
• The prototype can be thrown away after the feasibility concept has been shown.
• The activity addresses a small subset of the eventual data content of the data warehouse.
• The activity timescale is non-critical.
The following points should be kept in mind to produce an early release and deliver business benefits:
• Identify an architecture that is capable of evolving.
• Focus on the business requirements and technical blueprint phases.
• Limit the scope of the first build phase to the minimum that delivers business benefits.
• Understand the short-term and medium-term requirements of the data warehouse.

Business Requirements

To provide high-quality deliverables, we must make sure that the overall requirements are understood. If we understand the business requirements for both the short term and the medium term, we can design a solution that meets the short-term requirements and can later be expanded into a comprehensive solution.

This stage determines the following:
• The business rules to be applied to the data.
• The logical model for information within the data warehouse.
• The query profiles for the immediate requirement.
• The source systems that provide this data.

Technical Blueprint

This phase must produce an overall architecture that meets the long-term requirements. It also delivers the components that must be implemented in the short term to derive any business benefit.

The blueprint needs to identify the following:
• The overall system architecture.
• The data retention policy.
• The backup and recovery strategy.
• The server and data mart architecture.
• The capacity plan for hardware and infrastructure.
• The components of database design.

Building the Version

In this stage, the first production deliverable is produced. This deliverable is the smallest component of the data warehouse that still adds business benefit.

History Load

This is the phase where the remainder of the required history is loaded into the data
warehouse. In this phase, we do not add new entities, but additional physical tables would
probably be created to store increased data volumes.

Let us take an example. Suppose the build phase has delivered a retail sales analysis data warehouse with 2 months' worth of history. This information allows the user to analyze only recent trends and address short-term issues; the user cannot identify annual and seasonal trends.

To help the user do so, the last 2 years of sales history could be loaded from the archive. In this example, that extends the data from 40 GB to 400 GB.

Note − The backup and recovery procedures may become complex; therefore, it is recommended to perform this activity in a separate phase.

Ad hoc Query

In this phase, we configure an ad hoc query tool that is used to operate the data warehouse. Such tools can generate database queries.

Note − It is recommended not to use these access tools while the database is being substantially modified.

Automation
In this phase, operational management processes are fully automated. These would include −
• Transforming the data into a form suitable for analysis.
• Monitoring query profiles and determining appropriate aggregations to maintain system performance.
• Extracting and loading data from different source systems.
• Generating aggregations from predefined definitions within the data warehouse.
• Backing up, restoring, and archiving the data.

Extending Scope

In this phase, the data warehouse is extended to address a new set of business requirements. The scope can be extended in two ways −
• By loading additional data into the data warehouse.
• By introducing new data marts using the existing information.

Note − This phase should be performed separately, since it involves substantial effort and complexity.

Requirements Evolution
From the standpoint of the delivery process, the requirements are always changeable; they are not static. The delivery process must support this and allow these changes to be reflected in the system.

This problem is solved by designing the data warehouse around the use of data
within business processes rather than the data requirements of existing queries.

The architecture is intended to change and grow to meet the needs of the business;
the process operates as a pseudo-application development process, in which new
requirements are constantly fed into the development activities and partial deliverables are
produced. These partial deliverables are fed back to users and then reworked to ensure that
the overall system is constantly updated to meet business needs.

Lesson 4: Data Warehousing System Process

Process Flow in Data Warehouse

There are four major processes that contribute to a data warehouse −
• Extract and load the data.
• Clean and transform the data.
• Back up and archive the data.
• Manage queries and direct them to the appropriate data sources.

Image source: www.tutorialspoint.com/dwh/dwh_system_processes.htm

1. Extract and Load Process

Data extraction takes data from the source systems. Data load takes the extracted
data and loads it into the data warehouse.

Note − Before loading the data into the data warehouse, the information extracted from the
external sources must be reconstructed.

Controlling the Process

Controlling the process involves determining when to start data extraction and when to run the consistency checks on the data. The controlling process ensures that the tools, the logic modules, and the programs are executed in the correct sequence and at the correct time.

When to Initiate Extract

Data needs to be in a consistent state when it is extracted, i.e., the data warehouse
should represent a single, consistent version of the information to the user.

For example, in a customer-profiling data warehouse in the telecommunications sector, it is illogical to merge the list of customers at 8 pm on Wednesday from a customer database with the customer subscription events up to 8 pm on Tuesday. This would mean finding customers for whom there are no associated subscriptions.
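The Python sketch below reproduces this example with hypothetical data (the customer IDs, timestamps, and helper functions are assumptions). The dates are chosen so that 2 March 2021 is a Tuesday and 3 March 2021 a Wednesday. Extracting both sources up to the same cutoff is the consistency rule; mixing cutoffs makes a customer who subscribed on Wednesday appear to have no subscription.

```python
from datetime import datetime

# Hypothetical extracts; every row carries the time it was recorded.
customers = [("C1", datetime(2021, 3, 2, 10)), ("C2", datetime(2021, 3, 3, 15))]
subscriptions = [("C1", datetime(2021, 3, 2, 11)), ("C2", datetime(2021, 3, 3, 16))]


def extract_as_of(rows: list[tuple[str, datetime]], cutoff: datetime) -> set[str]:
    """Return the customer IDs recorded up to (and including) the cutoff time."""
    return {customer_id for customer_id, recorded in rows if recorded <= cutoff}


def customers_without_subscriptions(customer_cutoff: datetime,
                                    subscription_cutoff: datetime) -> set[str]:
    return (extract_as_of(customers, customer_cutoff)
            - extract_as_of(subscriptions, subscription_cutoff))


tuesday_8pm = datetime(2021, 3, 2, 20)  # 2 March 2021 was a Tuesday
wednesday_8pm = datetime(2021, 3, 3, 20)

# Inconsistent extract: customers up to Wednesday, subscriptions only up to Tuesday.
print(customers_without_subscriptions(wednesday_8pm, tuesday_8pm))    # {'C2'}
# Consistent extract: both sources use the same cutoff.
print(customers_without_subscriptions(wednesday_8pm, wednesday_8pm))  # set()
```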

Loading the Data

After the data is extracted, it is loaded into a temporary data store, where it is cleaned up and made consistent.

Note − Consistency checks are executed only when all the data sources have been loaded
into the temporary data store.

2. Clean and Transform Process

Once the data is extracted and loaded into the temporary data store, it is time to perform Cleaning and Transforming. Here is the list of steps involved in Cleaning and Transforming −
• Clean and transform the loaded data into a structure
• Partition the data
• Aggregation

Clean and Transform the Loaded Data into a Structure

Cleaning and transforming the loaded data helps speed up queries. It can be done by making the data consistent –
• within itself.
• with other data within the same data source.
• with the data in other source systems.
• with the existing data present in the warehouse.

Transforming involves converting the source data into a structure. Structuring the data increases query performance and decreases operational cost. The data contained in a data warehouse must be transformed to support performance requirements and control the ongoing operational costs.

Partition the Data

Partitioning optimizes hardware performance and simplifies the management of the data warehouse. Here, each fact table is partitioned into multiple separate partitions.

Aggregation

Aggregation is required to speed up common queries. It relies on the fact that most common queries analyze a subset or an aggregation of the detailed data.
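A minimal Python sketch of both steps, assuming a hypothetical fact table of `(sale_date, store, amount)` rows: the rows are partitioned by month, and monthly totals per store are precomputed as an aggregation. A query about a single month then needs to read only that month's partition.

```python
from collections import defaultdict

# Hypothetical fact rows: (sale_date as 'YYYY-MM-DD', store, amount).
facts = [
    ("2021-01-05", "S1", 100.0),
    ("2021-01-20", "S2", 50.0),
    ("2021-01-28", "S1", 25.0),
    ("2021-02-03", "S1", 75.0),
]

# Partitioning: split the fact table into one partition per month.
partitions: dict[str, list[tuple[str, str, float]]] = defaultdict(list)
for row in facts:
    partitions[row[0][:7]].append(row)  # partition key such as '2021-01'

# Aggregation: precompute monthly totals per store, a common query pattern.
monthly_store_totals: dict[tuple[str, str], float] = defaultdict(float)
for month, rows in partitions.items():
    for _, store, amount in rows:
        monthly_store_totals[(month, store)] += amount

# A query about January only needs to read January's partition.
january_total = sum(amount for _, _, amount in partitions["2021-01"])
print(january_total)  # 175.0
```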

3. Backup and Archive the Data

Regular backups are necessary to recover the data in the event of data loss, software failure, or hardware failure. Archiving involves removing old data from the system in a format that allows it to be quickly restored whenever required.

For example, in a retail sales analysis data warehouse, it may be required to keep data for 3 years, with the latest 6 months of data kept online. In such a scenario, there is often a requirement to do month-on-month comparisons between this year and last year. In this case, some data must be restored from the archive.
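The retention rule in this example can be expressed as a small Python function. This is a sketch under stated assumptions: ages are counted in whole calendar months, and data older than the 3-year retention period is labeled `"purge"`, which is an assumption because the text does not say what happens to it.

```python
from datetime import date

RETENTION_MONTHS = 36  # keep 3 years of data (from the example)
ONLINE_MONTHS = 6      # only the latest 6 months stay online


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later`."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def storage_tier(month_start: date, today: date) -> str:
    age = months_between(month_start, today)
    if age < ONLINE_MONTHS:
        return "online"
    if age < RETENTION_MONTHS:
        return "archive"  # restorable on demand, e.g. for month-on-month comparisons
    return "purge"        # assumption: beyond the retention period


today = date(2021, 9, 15)
print(storage_tier(date(2021, 6, 1), today))  # online
print(storage_tier(date(2020, 9, 1), today))  # archive
print(storage_tier(date(2018, 6, 1), today))  # purge
```

Comparing September 2021 with September 2020, for instance, needs a month whose tier is `"archive"`, so that month has to be restored first.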

4. Query Management Process

This process performs the following functions −
• manages the queries.
• helps speed up the execution time of queries.
• directs the queries to their most effective data sources.
• ensures that all the system resources are used in the most effective way.
• monitors actual query profiles.

The information generated in this process is used by the warehouse management process to determine which aggregations to generate. This process does not generally operate during the regular load of information into the data warehouse.
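Directing a query to its most effective data source can be sketched as a lookup over the aggregations that already exist. In the Python sketch below, the table names, the catalogue of summary tables, and the rule "use the summary with the fewest grouping dimensions that still covers the query" are assumptions for illustration; the rule is valid only for additive measures such as sums. Each routed query is also recorded, mirroring the monitoring of query profiles.

```python
# Hypothetical catalogue of aggregations already generated by the warehouse manager,
# keyed by the dimensions each summary table is grouped by.
SUMMARY_TABLES = {
    frozenset({"month", "store"}): "sales_by_month_store",
    frozenset({"month"}): "sales_by_month",
}
DETAIL_TABLE = "sales_fact"

query_profile_log: list[frozenset[str]] = []  # actual query profiles, for later analysis


def route(group_by: set[str]) -> str:
    """Send a query to the summary with the fewest grouping dimensions that can answer it."""
    wanted = frozenset(group_by)
    query_profile_log.append(wanted)
    # A summary grouped by a superset of the requested dimensions can be re-aggregated
    # to answer the query (valid for additive measures such as SUM).
    candidates = [dims for dims in SUMMARY_TABLES if wanted <= dims]
    if not candidates:
        return DETAIL_TABLE  # no suitable aggregation: fall back to the detailed data
    return SUMMARY_TABLES[min(candidates, key=len)]


print(route({"month"}))    # sales_by_month
print(route({"store"}))    # sales_by_month_store
print(route({"product"}))  # sales_fact
```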

Lesson 5: Data Warehousing Architecture

Business Analysis Framework

The business analyst gets information from the data warehouse to measure performance and make critical adjustments in order to gain an advantage over other businesses in the market.

Having a data warehouse offers the following advantages −
• Since a data warehouse can gather information quickly and efficiently, it can enhance business productivity.
• A data warehouse provides a consistent view of customers and items; hence, it helps manage customer relationships.
• A data warehouse also helps bring down costs by tracking trends and patterns over a long period in a consistent and reliable manner.

To design an effective and efficient data warehouse, we need to understand and analyze the business needs and construct a business analysis framework. Each person has a different view of the design of a data warehouse. These views are as follows −
• The top-down view − This view allows the selection of the relevant information needed for a data warehouse.
• The data source view − This view presents the information being captured, stored, and managed by the operational system.
• The data warehouse view − This view includes the fact tables and dimension tables. It represents the information stored inside the data warehouse.
• The business query view − This is the view of the data from the viewpoint of the end user.

Three-Tier Data Warehouse Architecture

Generally, a data warehouse adopts a three-tier architecture. The three tiers of the data warehouse architecture are as follows.

• Bottom Tier − The bottom tier of the architecture is the data warehouse database server, which is a relational database system. Back-end tools and utilities are used to feed data into the bottom tier; they perform the Extract, Clean, Load, and Refresh functions.
• Middle Tier − The middle tier contains the OLAP server, which can be implemented in either of the following ways:
o By Relational OLAP (ROLAP), which is an extended relational database management system. ROLAP maps operations on multidimensional data to standard relational operations.
o By Multidimensional OLAP (MOLAP), which directly implements multidimensional data and operations.
• Top Tier − This tier is the front-end client layer. It holds the query and reporting tools, analysis tools, and data mining tools.

The following diagram depicts the three-tier architecture of data warehouse –

Image source: www.tutorialspoint.com/dwh/dwh_architecture.htm
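To make the ROLAP idea concrete, the following sketch uses Python's built-in `sqlite3` module as a stand-in relational engine. The star-schema tables (`dim_date`, `fact_sales`) and the `roll_up` helper are hypothetical: a roll-up on the time dimension, a multidimensional operation, is expressed as an ordinary relational join plus `GROUP BY`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A tiny star schema: one fact table and one date dimension (hypothetical names).
conn.executescript(
    """
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
    CREATE TABLE fact_sales (date_id INTEGER REFERENCES dim_date, amount REAL);
    INSERT INTO dim_date VALUES (1, 2021, 'Q1'), (2, 2021, 'Q2'), (3, 2022, 'Q1');
    INSERT INTO fact_sales VALUES (1, 100), (1, 50), (2, 80), (3, 70);
    """
)


def roll_up(level: str) -> list[tuple]:
    """ROLAP-style: map a multidimensional roll-up to a relational join plus GROUP BY."""
    # Column lists come from a fixed whitelist, so building the SQL string is safe here.
    columns = {"year": "d.year", "quarter": "d.year, d.quarter"}[level]
    sql = (f"SELECT {columns}, SUM(f.amount) FROM fact_sales f "
           f"JOIN dim_date d ON f.date_id = d.date_id "
           f"GROUP BY {columns} ORDER BY {columns}")
    return conn.execute(sql).fetchall()


print(roll_up("year"))     # [(2021, 230.0), (2022, 70.0)]
print(roll_up("quarter"))  # [(2021, 'Q1', 150.0), (2021, 'Q2', 80.0), (2022, 'Q1', 70.0)]
```

A MOLAP server, by contrast, would store such totals directly in a multidimensional structure instead of computing them with SQL.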


Data Warehouse Models

From the perspective of data warehouse architecture, we have the following data
warehouse models −

• Virtual Warehouse
• Data mart
• Enterprise Warehouse

Virtual Warehouse

A set of views over operational databases is known as a virtual warehouse. A virtual warehouse is easy to build, but building one requires excess capacity on the operational database servers.

Data Mart

A data mart contains a subset of organization-wide data that is valuable to specific groups within an organization.

In other words, a data mart contains data specific to a particular group. For example, the marketing data mart may contain data related to items, customers, and sales. Data marts are confined to subjects.

Points to remember about data marts −
• Windows-based or Unix/Linux-based servers are used to implement data marts. They are implemented on low-cost servers.
• The implementation cycle of a data mart is measured in short periods of time, i.e., in weeks rather than months or years.
• The life cycle of a data mart may become complex in the long run if its planning and design are not organization-wide.
• Data marts are small in size.
• Data marts are customized by department.
• The source of a data mart is a departmentally structured data warehouse.
• Data marts are flexible.

Enterprise Warehouse

• An enterprise warehouse collects all the information and the subjects spanning an entire organization.
• It provides enterprise-wide data integration.
• The data is integrated from operational systems and external information providers.
• This information can vary from a few gigabytes to hundreds of gigabytes, terabytes, or beyond.

Load Manager
This component performs the operations required for the extract and load process.

The size and complexity of the load manager vary from one data warehouse solution to another.

Load Manager Architecture

The load manager performs the following functions −
• Extract the data from the source system.
• Fast-load the extracted data into a temporary data store.
• Perform simple transformations into a structure similar to the one in the data warehouse.

Image source: www.tutorialspoint.com/dwh/dwh_architecture.htm

Extract Data from Source

The data is extracted from the operational databases or the external information
providers.

Gateways are the application programs used to extract data. A gateway is supported by the underlying DBMS and allows a client program to generate SQL to be executed at a server. Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC) are examples of gateways.

Fast Load

• In order to minimize the total load window, the data needs to be loaded into the warehouse in the fastest possible time.
• Transformations affect the speed of data processing.
• It is more effective to load the data into a relational database prior to applying transformations and checks.
• Gateway technology is not suitable, since gateways tend not to be performant when large data volumes are involved.

Simple Transformations

While loading, it may be necessary to perform simple transformations. After these have been completed, we are in a position to do the complex checks. Suppose we are loading EPOS (electronic point of sale) sales transactions; we need to perform the following checks:
• Strip out all the columns that are not required within the warehouse.
• Convert all the values to the required data types.
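Both checks can be sketched in Python. The raw record layout, the column names, and the target types below are hypothetical; a real EPOS feed and warehouse schema would define their own. `Decimal` is used for the price to avoid binary floating-point rounding of money values.

```python
from decimal import Decimal

# Columns the hypothetical warehouse sales table needs, mapped to their target types.
REQUIRED_COLUMNS = {"store_id": int, "item_code": str, "quantity": int, "price": Decimal}


def simple_transform(raw: dict[str, str]) -> dict:
    """Strip unneeded columns and convert the remaining values to the required types."""
    return {column: convert(raw[column]) for column, convert in REQUIRED_COLUMNS.items()}


# A raw EPOS record as it might arrive from a till: every field is text,
# and some fields (till_no, cashier) are not needed in the warehouse.
raw_record = {"store_id": "12", "till_no": "3", "cashier": "J. Cruz",
              "item_code": "A-100", "quantity": "2", "price": "49.95"}
print(simple_transform(raw_record))
# {'store_id': 12, 'item_code': 'A-100', 'quantity': 2, 'price': Decimal('49.95')}
```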

Warehouse Manager

A warehouse manager is responsible for the warehouse management process. It consists of third-party system software, C programs, and shell scripts.

The size and complexity of warehouse managers vary between specific solutions.

Warehouse Manager Architecture

A warehouse manager includes the following −
• The controlling process
• Stored procedures or C with SQL
• Backup/Recovery tool
• SQL Scripts

Image source: www.tutorialspoint.com/dwh/dwh_architecture.htm

Operations Performed by Warehouse Manager

• A warehouse manager analyzes the data to perform consistency and referential integrity checks.
• Creates indexes, business views, and partition views against the base data.
• Generates new aggregations and updates existing aggregations. Generates normalizations.
• Transforms and merges the source data into the published data warehouse.
• Backs up the data in the data warehouse.
• Archives the data that has reached the end of its captured life.

Note − A warehouse manager also analyzes query profiles to determine which indexes and aggregations are appropriate.

Query Manager

 Query manager is responsible for directing the queries to the suitable tables.
 By directing the queries to appropriate tables, the speed of querying and response
generation can be increased.
 Query manager is responsible for scheduling the execution of the queries posed by
the user.
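Query redirection can be pictured as choosing the smallest table that can still answer a query. The sketch below assumes a hypothetical catalogue of one detail table and two summary tables, and it considers only which grouping dimensions a query needs. A real query manager would also check filters, measures, and how fresh each summary is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    dimensions: frozenset  # grouping columns the table is stored at
    row_count: int


# Hypothetical catalogue: one detail table and two summary tables.
TABLES = [
    Table("sales_detail", frozenset({"day", "store", "product"}), 50_000_000),
    Table("sales_by_month_store", frozenset({"month", "store"}), 12_000),
    Table("sales_by_month", frozenset({"month"}), 36),
]

# A month can be rolled up either from a month column or from a day column.
ROLLS_UP_FROM = {"month": {"month", "day"}}


def can_answer(table: Table, wanted: set) -> bool:
    """True if every requested dimension exists in, or rolls up from, the table."""
    for dim in wanted:
        sources = ROLLS_UP_FROM.get(dim, {dim})
        if not sources & table.dimensions:
            return False
    return True


def route(wanted: set) -> str:
    """Direct the query to the smallest table that can answer it."""
    candidates = [t for t in TABLES if can_answer(t, wanted)]
    if not candidates:
        raise ValueError(f"no table can answer a query grouped by {sorted(wanted)}")
    return min(candidates, key=lambda t: t.row_count).name


print(route({"month"}))           # a summary table suffices
print(route({"month", "store"}))  # needs the store-level summary
print(route({"product"}))         # only the detail table has product
```

Using row count as the cost is a simplification. It captures the idea in the text that sending a query to an appropriate (smaller, pre-aggregated) table speeds up response.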

Query Manager Architecture

The following diagram shows the architecture of a query manager. It includes the
following:

 Query redirection via C tool or RDBMS

 Stored procedures

 Query management tool

 Query scheduling via C tool or RDBMS

 Query scheduling via third-party software

Image source: www.tutorialspoint.com/dwh/dwh_architecture.htm

Detailed Information
Detailed information is not kept online; rather, it is aggregated to the next level of
detail and then archived to tape. The detailed information part of the data warehouse keeps the
detailed information in the starflake schema. Detailed information is loaded into the data
warehouse to supplement the aggregated data.

The following diagram shows a pictorial impression of where detailed information is
stored and how it is used.

Image source: www.tutorialspoint.com/dwh/dwh_architecture.htm

Note − If detailed information is held offline to minimize disk storage, we should make sure
that the data has been extracted, cleaned up, and transformed into starflake schema before
it is archived.
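The flow described above, aggregating detail to the next level and archiving the detail itself, might look like this in outline. The daily rows, the choice of month as the next level, and the in-memory CSV stream standing in for tape are all assumptions. The sketch also omits the starflake transformation that the note requires before archiving.

```python
import csv
import io
from collections import defaultdict

# Hypothetical daily detail rows: (sale_date, store, amount).
daily_detail = [
    ("2021-07-30", "S1", 120.0),
    ("2021-07-31", "S1", 80.0),
    ("2021-08-01", "S1", 50.0),
    ("2021-08-01", "S2", 70.0),
]


def aggregate_to_month(rows):
    """Roll daily amounts up to the next level of detail: (month, store)."""
    totals = defaultdict(float)
    for sale_date, store, amount in rows:
        totals[(sale_date[:7], store)] += amount  # "YYYY-MM" prefix of the date
    return dict(totals)


def archive(rows, stream):
    """Write the detail rows to an archive stream as CSV."""
    writer = csv.writer(stream)
    writer.writerow(["sale_date", "store", "amount"])
    writer.writerows(rows)


monthly = aggregate_to_month(daily_detail)  # stays online
archive_media = io.StringIO()               # stands in for tape
archive(daily_detail, archive_media)
print(monthly)
```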

Summary Information

Summary information is the part of the data warehouse that stores predefined
aggregations. These aggregations are generated by the warehouse manager. Summary
information must be treated as transient: it changes on the fly in order to respond to
changing query profiles.

The points to note about summary information are as follows −

 Summary information speeds up the performance of common queries.
 It increases the operational cost.
 It needs to be updated whenever new data is loaded into the data warehouse.
 It often is not backed up, since it can be generated afresh from the detailed
information.
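Because summary information is transient and derivable, one simple way to keep it correct is to rebuild it from the detailed data after every load. The sketch below does this with sqlite3. The table names and the full-rebuild strategy are assumptions; larger warehouses usually refresh summaries incrementally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_detail (month TEXT, store TEXT, amount REAL);
    CREATE TABLE sales_summary (month TEXT, store TEXT, total REAL);
""")


def refresh_summary(conn):
    """Rebuild the predefined aggregation entirely from the detailed data."""
    with conn:  # one transaction: readers never see a half-built summary
        conn.execute("DELETE FROM sales_summary")
        conn.execute("""
            INSERT INTO sales_summary (month, store, total)
            SELECT month, store, SUM(amount)
            FROM sales_detail
            GROUP BY month, store
        """)


def load(conn, rows):
    """Load new detail rows, then bring the summary up to date."""
    with conn:
        conn.executemany("INSERT INTO sales_detail VALUES (?, ?, ?)", rows)
    refresh_summary(conn)


load(conn, [("2021-08", "S1", 50.0), ("2021-08", "S1", 25.0)])
load(conn, [("2021-08", "S2", 70.0)])
summary = conn.execute(
    "SELECT month, store, total FROM sales_summary ORDER BY store"
).fetchall()
print(summary)
```

Since the summary table can always be regenerated this way, the sketch also shows why summary information need not be part of the backup.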

Assessment

Introduction:

This activity helps students understand data warehousing and identify its
various types and functions. It also helps them describe the Data Warehousing
Delivery Process and the Data Warehousing System Process and identify the Data
Warehousing Architecture.

Integrative Activity:
A. What exactly is data warehousing, and how does it differ from a database?

________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________

B. List the different types of data warehouses and their functions.

Types | Functions

C. Make a diagram that shows the Data Warehousing Delivery Process and explain each
process.

D. Make a diagram that shows the Data Warehousing System Process and explain each
process.

E. Identify the three tiers of the data warehouse architecture and identify its functions.

Three-Tier Data Warehouse Architecture | Functions

Module 4. Data Mining

Introduction

In this lesson, students will learn what data mining is. This lesson covers the concept of data
mining, its primary goals, and its processes.

Learning Objectives

At the end of this lesson, students should be able to:

1. Understand the need for analyses of large, complex, information-rich data sets.
2. Identify the goals and primary tasks of the data-mining process.
3. Describe the roots of data-mining technology.
4. Recognize the iterative character of a data-mining process and specify its basic
steps.
Lesson 1: Data Mining Concept

Data mining is the study of collecting, cleaning, processing, analyzing, and gaining
useful insights from data. A wide variation exists in terms of the problem domains,
applications, formulations, and data representations that are encountered in real-world
applications.

Therefore, “data mining” is a broad umbrella term that is used to describe these
different aspects of data processing. In the modern age, virtually all automated systems
generate some form of data either for diagnostic or analysis purposes. This has resulted in a
deluge of data, which has been reaching the order of petabytes or exabytes.

Some examples of different kinds of data are as follows:


World Wide Web
The number of documents on the indexed Web is now on the order of billions, and the
invisible Web is much larger. User accesses to such documents create Web access logs at
servers and customer behavior profiles at commercial sites.
Furthermore, the linked structure of the Web is referred to as the Web graph, which is
itself a kind of data. These different types of data are useful in various applications. For
example, the Web documents and link structure can be mined to determine associations
between different topics on the Web. On the other hand, user access logs can be mined to
determine frequent patterns of accesses or unusual patterns of possibly unwarranted
behavior.
Financial interactions
Most common transactions of everyday life, such as using an automated teller
machine (ATM) card or a credit card, can create data in an automated way. Such
transactions can be mined for many useful insights such as fraud or other unusual activity.
User interactions
Many forms of user interactions create large volumes of data. For example, the use
of a telephone typically creates a record at the telecommunication company with details
about the duration and destination of the call. Many phone companies routinely analyze such
data to determine relevant patterns of behavior that can be used to make decisions about
network capacity, promotions, pricing, or customer targeting.
Sensor technologies and the Internet of Things
A recent trend is the development of low-cost wearable sensors, smartphones, and
other smart devices that can communicate with one another. By one estimate, the number of
such devices exceeded the number of people on the planet in 2008. The implications of such
massive data collection are significant for mining algorithms.
Data mining is an iterative process within which progress is defined by discovery,
through either automatic or manual methods. Data mining is most useful in an exploratory
analysis scenario in which there are no predetermined notions about what will constitute an
“interesting” outcome. Data mining is the search for new, valuable, and nontrivial information
in large volumes of data.
It is a cooperative effort of humans and computers. Best results are achieved by
balancing the knowledge of human experts in describing problems and goals with the search
capabilities of computers.
Lesson 2: Primary Goals of Data Mining
In practice, the two primary goals of data mining tend to be prediction and
description. Prediction involves using some variables or fields in the data set to predict
unknown or future values of other variables of interest. Description, on the other hand,
focuses on finding patterns describing the data that can be interpreted by humans.

Therefore, it is possible to put data-mining activities into one of two categories:


1. Predictive data mining, which produces the model of the system described by the given
data set, or
2. Descriptive data mining, which produces new, nontrivial information based on the
available data set.
On the predictive end of the spectrum, the goal of data mining is to produce a model,
expressed as an executable code, which can be used to perform classification, prediction,
estimation, or other similar tasks. On the other, descriptive end of the spectrum, the goal is
to gain an understanding of the analyzed system by uncovering patterns and relationships in
large data sets. The relative importance of prediction and description for particular data
mining applications can vary considerably. The goals of prediction and description are
achieved by using data-mining techniques.

The following are the primary data-mining tasks:

1. Classification: discovery of a predictive learning function that classifies a data item into
one of several predefined classes.

2. Regression: discovery of a predictive learning function that maps a data item to a
real-valued prediction variable.

3. Clustering: a common descriptive task in which one seeks to identify a finite set of
categories or clusters to describe the data.

4. Summarization: an additional descriptive task that involves methods for finding a
compact description for a set (or subset) of data.

5. Dependency modeling: finding a local model that describes significant dependencies
between variables or between the values of a feature in a data set or in a part of a data set.

6. Change and deviation detection: discovering the most significant changes in the data
set.
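To make three of these tasks concrete, the following pure-Python sketch shows a toy version of regression (a least-squares line), classification (a nearest-centroid rule), and clustering (one-dimensional k-means). These are textbook techniques chosen for illustration, not the only or the recommended methods for each task, and the tiny data sets are invented.

```python
from statistics import mean


# Regression: fit y = a + b*x by ordinary least squares (real-valued output).
def fit_line(xs, ys):
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


# Classification: learn one centroid per predefined class from labelled points.
def train_nearest_centroid(points, labels):
    centroids = {}
    for label in set(labels):
        members = [p for p, l in zip(points, labels) if l == label]
        centroids[label] = mean(members)
    return centroids


def classify(centroids, point):
    """Assign the class whose centroid is closest to the point."""
    return min(centroids, key=lambda label: abs(point - centroids[label]))


# Clustering: 1-D k-means; no labels are given, groups are discovered.
def kmeans_1d(values, centers, iterations=10):
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return sorted(centers)


a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
centroids = train_nearest_centroid([1, 2, 9, 10], ["low", "low", "high", "high"])
label = classify(centroids, 3)
centers = kmeans_1d([1, 2, 3, 10, 11, 12], [0, 5])
print(a, b, label, centers)
```

Regression and classification are predictive: they learn from examples with known answers. Clustering is descriptive: it only reveals structure already present in the data.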

Lesson 3: Data Mining Process


Without trying to cover all possible approaches and all different views about data
mining as a discipline, let us start with one possible, sufficiently broad definition of data
mining: Data Mining is a process of discovering various models, summaries, and
derived values from a given collection of data. The word “process” is very important here.
Even in some professional environments, there is a belief that data mining simply consists of
picking and applying a computer-based tool to match the presented problem and
automatically obtaining a solution. This is a misconception based on an artificial idealization
of the world. There are several reasons why this is incorrect. One reason is that data mining
is not simply a collection of isolated tools, each completely different from the other and
waiting to be matched to the problem.

A second reason lies in the notion of matching a problem to a technique. Only very
rarely is a research question stated sufficiently precisely that a single and simple application
of the method will suffice. In fact, what happens in practice is that data mining becomes an
iterative process. One studies the data, examines it using some analytic technique, decides
to look at it another way, perhaps modifies it, and then goes back to the beginning and
applies another data-analysis tool, reaching either better or different results. This can go
round and round many times; each technique is used to probe slightly different aspects of
data—to ask a slightly different question of the data. What is essentially being described
here is a voyage of discovery that makes modern data mining exciting.

Still, data mining is not a random application of statistical, machine learning, and
other methods and tools. It is not a random walk through the space of analytic techniques
but a carefully planned and considered process of deciding what will be most useful,
promising, and revealing.

It is important to realize that the problem of discovering or estimating dependencies
from data or discovering totally new data is only one part of the general experimental
procedure used by scientists, engineers, and others who apply standard steps to draw
conclusions from the data.

The general experimental procedure adapted to data-mining problems involves the
following steps:

1. State the problem and formulate the hypothesis


Most data-based modeling studies are performed in a particular application domain.
Hence, domain-specific knowledge and experience are usually necessary in order to come
up with a meaningful problem statement. Unfortunately, many application studies tend to
focus on the data-mining technique at the expense of a clear problem statement. In this step,
a modeler usually specifies a set of variables for the unknown dependency and, if possible, a
general form of this dependency as an initial hypothesis. There may be several hypotheses
formulated for a single problem at this stage. The first step requires combined expertise
in the application domain and in data-mining modeling. In practice, this usually means a close
interaction between the data-mining expert and the application expert. In successful data-
mining applications, this cooperation does not stop in the initial phase; it continues during the
entire data-mining process.
2. Collect the data.
This step is concerned with how the data are generated and collected. In general,
there are two distinct possibilities. The first is when the data-generation process is under the
control of an expert (modeler): this approach is known as a designed experiment.
The second possibility is when the expert cannot influence the data generation
process: this is known as the observational approach. An observational setting, namely,
random data generation, is assumed in most data-mining applications. Typically, the
sampling distribution is completely unknown after data are collected, or it is partially and
implicitly given in the data-collection procedure. It is very important, however, to understand
how data collection affects its theoretical distribution, since such a priori knowledge can be
very useful for modeling and, later, for the final interpretation of results. Also, it is important
to make sure that the data used for estimating a model and the data used later for testing
and applying a model come from the same, unknown, sampling distribution. If this is not the
case, the estimated model cannot be successfully used in a final application of the results.
3. Preprocess the data
In the observational setting, data are usually “collected” from the existing databases,
data warehouses, and data marts. Data preprocessing usually includes at least two common
tasks:
a. Outlier detection (and removal). Outliers are unusual data values that are not
consistent with most observations. Commonly, outliers result from measurement, coding,
or recording errors; sometimes, however, they are natural but abnormal values. Such
nonrepresentative samples can seriously affect the model produced
later. There are two strategies for dealing with outliers:
a.1 Detect and eventually remove outliers as a part of the preprocessing phase.
a.2 Develop robust modeling methods that are insensitive to outliers.
b. Scaling, encoding, and selecting features
Data preprocessing includes several steps such as variable scaling and different
types of encoding. For example, one feature with the range [0, 1] and the other with the
range [–100, 1000] will not have the same weight in the applied technique; they will also
influence the final data-mining results differently. Therefore, it is recommended to scale
them and bring both features to the same weight for further analysis.
Also, application-specific encoding methods usually achieve dimensionality reduction
by providing a smaller number of informative features for subsequent data modeling.

These two classes of preprocessing tasks are only illustrative examples of a large
spectrum of preprocessing activities in a data-mining process. Data-preprocessing steps
should not be considered completely independent from other data-mining phases. In
every iteration of the data-mining process, all activities, together, could define new and
improved data sets for subsequent iterations.

Generally, a good preprocessing method provides an optimal representation for a
data-mining technique by incorporating a priori knowledge in the form of
application-specific scaling and encoding.
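Both preprocessing tasks can be sketched in a few lines. For outlier detection, the example uses the modified z-score, which measures distance from the median in units of the median absolute deviation (MAD). The 0.6745 factor and 3.5 cutoff are conventional choices, not requirements. For scaling, it uses min-max scaling onto [0, 1], applied to a feature with the range [–100, 1000] from the example above. The readings are made up.

```python
from statistics import median


def remove_outliers(values, threshold=3.5):
    """Drop values whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return list(values)  # no spread to judge against; keep everything
    return [v for v in values if 0.6745 * abs(v - med) / mad <= threshold]


def min_max_scale(values):
    """Rescale a feature linearly so its minimum maps to 0 and maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant feature carries no information
    return [(v - lo) / (hi - lo) for v in values]


readings = [10, 12, 11, 13, 12, 95, 11]
cleaned = remove_outliers(readings)
scaled = min_max_scale([-100, 450, 1000])
print(cleaned, scaled)
```

Median-based statistics are used here because a single extreme value such as 95 would inflate a mean and standard deviation and could partly mask itself. The median and MAD stay essentially unchanged, so the outlier still stands out.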

4. Estimate the model

The selection and implementation of the appropriate data-mining technique is the
main task in this phase. This process is not straightforward; usually, in practice, the
implementation is based on several models, and selecting the best one is an additional task.
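Selecting the best of several candidate models is commonly done by fitting each one on part of the data and comparing errors on held-out data drawn from the same distribution. In this minimal sketch, the two candidates (a mean baseline and a 1-nearest-neighbour predictor), the mean-squared-error criterion, and the alternate-row split are all illustrative assumptions. In practice, a random split or cross-validation is more usual.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline model: always predict the average training target."""
    average = mean(ys)
    return lambda x: average


def fit_nearest_neighbour(xs, ys):
    """Predict the target of the closest training point (1-nearest neighbour)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def select_model(candidates, train, test):
    """Fit every candidate on train, score it on test, return the best name."""
    scores = {name: mse(fit(*train), *test) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


xs = list(range(1, 11))
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.9, 17.2, 18.8, 21.1]
train = (xs[0::2], ys[0::2])  # x = 1, 3, 5, 7, 9
test = (xs[1::2], ys[1::2])   # x = 2, 4, 6, 8, 10
best, scores = select_model(
    {"mean": fit_mean, "nearest_neighbour": fit_nearest_neighbour}, train, test
)
print(best, scores)
```

Scoring on data the models did not see during fitting is what makes the comparison meaningful. Scored on the training rows instead, 1-nearest-neighbour would trivially have zero error.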

5. Interpret the model and draw conclusions

In most cases, data-mining models should help in decision-making. Hence, such
models need to be interpretable in order to be useful, because humans are not likely to base
their decisions on complex “black-box” models. Note that the goals of accuracy of the model
and accuracy of its interpretation are somewhat contradictory. Usually, simple models are
more interpretable, but they are also less accurate. Modern data-mining methods are
expected to yield highly accurate results using high-dimensional models. The problem of
interpreting these models, also very important, is considered a separate task, with specific
techniques to validate the results. A user does not want hundreds of pages of numerical
results: such output cannot easily be understood, summarized, interpreted, and used for
successful decision making.

Data Mining Process

Assessment

Introduction:

This activity helps students understand data mining by identifying the primary
goals and tasks of the data-mining process.

Integrative Activity:

A. Define Data Mining.

______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
________________________________________
B. Discuss the Primary Goals of Data Mining
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
________________________________________
C. List the steps of the Data Mining Process
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
________________________________________

D. Determine whether or not each of the following activities is a data-mining task. Discuss
your answer.
1. Dividing the customers of a company according to their age and sex.
___________________________________________________________________
_________________________________________________________
______________________________________________________________

2. Classifying the customers of a company according to the level of their debt.


___________________________________________________________________
___________________________________________________________________
____________________________________________________
3. Analyzing the total sales of a company in the next month based on the current month's sales.
___________________________________________________________________
___________________________________________________________________
____________________________________________________
4. Classifying a student database by department and sorting it by student
identification number.
___________________________________________________________________
___________________________________________________________________
____________________________________________________
5. Determining the influence of the number of new University of Louisville students on
the stock market value.
___________________________________________________________________
___________________________________________________________________
____________________________________________________
6. Estimating the future stock price of a company using historical records.
___________________________________________________________________
___________________________________________________________________
____________________________________________________
7. Monitoring the heart rate of a patient with abnormalities.
___________________________________________________________________
___________________________________________________________________
____________________________________________________
8. Monitoring seismic waves for earthquake activities.
___________________________________________________________________
___________________________________________________________________
____________________________________________________
9. Extracting frequencies of a sound wave.
___________________________________________________________________
___________________________________________________________________
____________________________________________________
10. Predicting the outcome of tossing a pair of dice.
___________________________________________________________________
___________________________________________________________________
____________________________________________________

Module 5: Data Security


Introduction
The purpose of information protection is to protect an organization’s valuable
resources, such as information, hardware, and software. Through the selection and
application of appropriate safeguards, security helps the organization meet its business
objectives or mission by protecting its physical and financial resources, reputation, legal
position, employees, and other tangible and intangible assets. In this lesson, students will
examine the elements of computer security, employee roles and responsibilities, and
common threats. Students will also examine the need for management controls, policies and
procedures, and risk analysis, and, finally, they will review a comprehensive list of tasks,
responsibilities, and objectives that make up a typical information protection program.

Learning Objectives

At the end of this lesson, students should be able to:

1. Define information security.
2. Recount the history of computer security, and explain how it evolved into information
security.
3. Define key terms and critical concepts of information security.
4. Enumerate the phases of the security systems development life cycle.
5. Describe the information security roles of professionals within an organization.

Lesson 1: The History of Information Security

Computer security is where the history of information security begins. The need for
computer security—that is, the need to protect physical locations, hardware, and software
from threats—arose during World War II, when the first mainframe computers were used to
aid computations for communication code breaking (as depicted in the figure below). Multiple
levels of security were implemented to protect these mainframes and maintain the integrity
of their data. Access to sensitive military locations, for example, was controlled by means of
badges, keys, and the facial recognition of authorized personnel by security guards. The
growing need to maintain national security eventually led to more complex and more
technologically sophisticated computer security safeguards.

During these early years, information security was a straightforward process composed
predominantly of physical security and simple document classification schemes. The primary
threats to security were physical theft of equipment, espionage against the products of the
systems, and sabotage. One of the first documented security problems that fell outside these
categories occurred in the early 1960s, when a systems administrator was working on an MOTD
(message of the day) file, and another administrator was editing the password file. A software
glitch mixed the two files, and the entire password file was printed on every output file.

Image Source: The Enigma. Earlier versions of the German code machine Enigma were first broken
by the Poles in the 1930s. The British and Americans managed to break later, more complex
versions during World War II. Source: Courtesy of National Security Agency

Lesson 2: Definition of Security

In general, security is “the quality or state of being secure—to be free from danger.”
In other words, protection against adversaries—from those who would do harm, intentionally
or otherwise—is the objective. National security, for example, is a multilayered system that
protects the sovereignty of a state, its assets, its resources, and its people. Achieving the
appropriate level of security for an organization also requires a multifaceted system.

A successful organization should have the following multiple layers of security in
place to protect its operations:

Physical security, to protect physical items, objects, or areas from unauthorized
access and misuse

Personnel security, to protect the individual or group of individuals who are
authorized to access the organization and its operations

Operations security, to protect the details of a particular operation or series of
activities

Communications security, to protect communications media, technology, and
content

Network security, to protect networking components, connections, and contents

Information security, to protect the confidentiality, integrity, and availability of
information assets, whether in storage, processing, or transmission. It is achieved via
the application of policy, education, training and awareness, and technology.

The C.I.A. triangle has been the industry standard for computer security since the
development of the mainframe. It is based on the three characteristics of information that
give it value to organizations: confidentiality, integrity, and availability. The security of these
three characteristics of information is as important today as it has always been, but the C.I.A.
triangle model no longer adequately addresses the constantly changing environment. The
threats to the confidentiality, integrity, and availability of information have evolved into a vast
collection of events, including accidental or intentional damage, destruction, theft,
unintended or unauthorized modification, or other misuse from human or nonhuman
threats. This new environment of many constantly evolving threats has prompted the
development of a more robust model that addresses the complexities of the current
information security environment.

Components of Information Security. Source: Course Technology/Cengage Learning

Lesson 3: Key Information Security Concepts

Access: A subject or object’s ability to use, manipulate, modify, or affect another
subject or object. Authorized users have legal access to a system, whereas hackers have
illegal access to a system. Access controls regulate this ability.

Asset: The organizational resource that is being protected. An asset can be logical,
such as a Web site, information, or data; or an asset can be physical, such as a person,
computer system, or other tangible object. Assets, and particularly information assets, are
the focus of security efforts; they are what those efforts are attempting to protect.

Attack: An intentional or unintentional act that can cause damage to or otherwise
compromise information and/or the systems that support it. Attacks can be active or passive,
intentional or unintentional, and direct or indirect. Someone casually reading sensitive
information not intended for his or her use is a passive attack. A hacker attempting to break
into an information system is an intentional attack. A lightning strike that causes a fire in a
building is an unintentional attack. A direct attack is a hacker using a personal computer to
break into a system. An indirect attack is a hacker compromising a system and using it to
attack other systems, for example, as part of a botnet (slang for robot network). This group
of compromised computers, running software of the attacker’s choosing, can operate
autonomously or under the attacker’s direct control to attack systems and steal user
information or conduct distributed denial-of-service attacks. Direct attacks originate from the
threat itself. Indirect attacks originate from a compromised system or resource that is
malfunctioning or working under the control of a threat.

Control, safeguard, or countermeasure: Security mechanisms, policies, or
procedures that can successfully counter attacks, reduce risk, resolve vulnerabilities, and
otherwise improve the security within an organization. The various levels and types of
controls are discussed more fully in the following modules.

Exploit: A technique used to compromise a system. This term can be a verb or a
noun. Threat agents may attempt to exploit a system or other information asset by using it
illegally for their personal gain. Or, an exploit can be a documented process to take
advantage of a vulnerability or exposure, usually in software, that is either inherent in the
software or is created by the attacker. Exploits make use of existing software tools or
custom-made software components.

Exposure: A condition or state of being exposed. In information security, exposure
exists when a vulnerability known to an attacker is present.

Loss: A single instance of an information asset suffering damage or unintended or
unauthorized modification or disclosure. When an organization’s information is stolen, it has
suffered a loss.

Protection profile or security posture: The entire set of controls and safeguards,
including policy, education, training and awareness, and technology, that the organization
implements (or fails to implement) to protect the asset. The terms are sometimes used
interchangeably with the term security program, although the security program often
comprises managerial aspects of security, including planning, personnel, and subordinate
programs.

Risk: The probability that something unwanted will happen. Organizations must
minimize risk to match their risk appetite—the quantity and nature of risk the organization is
willing to accept.

Subjects and objects: A computer can be either the subject of an attack—an agent
entity used to conduct the attack—or the object of an attack—the target entity. A computer
can be both the subject and object of an attack, when, for example, it is compromised by an
attack (object), and is then used to attack other systems (subject).

Threat: A category of objects, persons, or other entities that presents a danger to an
asset. Threats are always present and can be purposeful or undirected. For example,
hackers purposefully threaten unprotected information systems, while severe storms
incidentally threaten buildings and their contents.

Threat agent: The specific instance or a component of a threat. For example, all
hackers in the world present a collective threat, while Kevin Mitnick, who was convicted of
hacking into phone systems, is a specific threat agent. Likewise, a lightning strike, hailstorm,
or tornado is a threat agent that is part of the threat of severe storms.

Vulnerability: A weakness or fault in a system or protection mechanism that
opens it to attack or damage. Some examples of vulnerabilities are a flaw in a software
package, an unprotected system port, and an unlocked door. Some well-known
vulnerabilities have been examined, documented, and published; others remain latent (or
undiscovered).

Lesson 4: Critical Characteristics of Information

The value of information comes from the characteristics it possesses. When a
characteristic of information changes, the value of that information either increases or, more
commonly, decreases. Some characteristics affect information’s value to users more than
others do. This can depend on circumstances; for example, timeliness of information can be
a critical factor, because information loses much or all of its value when it is delivered too
late. Though information security professionals and end users share an understanding of the
characteristics of information, tensions can arise when the need to secure the information
from threats conflicts with the end users’ need for unhindered access to the information. For
instance, end users may perceive a tenth-of-a-second delay in the computation of data to be
an unnecessary annoyance. Information security professionals, however, may perceive that
tenth of a second as a minor delay that enables an important task, like data encryption. Each
critical characteristic of information, that is, each element of the expanded C.I.A. triangle, is
described in the sections that follow.

Availability

Availability enables authorized users (persons or computer systems) to access
information without interference or obstruction and to receive it in the required format.
Consider, for example, research libraries that require identification before entrance.
Librarians protect the contents of the library so that they are available only to authorized
patrons. The librarian must accept a patron’s identification before that patron has free
access to the book stacks. Once authorized patrons have access to the contents of the
stacks, they expect to find the information they need available in a usable format and
familiar language, which in this case typically means bound in a book and written in English.

Accuracy

Information has accuracy when it is free from mistakes or errors and it has the value
that the end user expects. If information has been intentionally or unintentionally modified, it
is no longer accurate. Consider, for example, a checking account. You assume that the
information contained in your checking account is an accurate representation of your
finances. Incorrect information in your checking account can result from external or internal
errors. If a bank teller, for instance, mistakenly adds or subtracts too much from your
account, the value of the information is changed. Or, you may accidentally enter an incorrect
amount into your account register. Either way, an inaccurate bank balance could cause you
to make mistakes, such as bouncing a check.

Authenticity

Authenticity of information is the quality or state of being genuine or original, rather
than a reproduction or fabrication. Information is authentic when it is in the same state in
which it was created, placed, stored, or transferred. Consider for a moment some common
assumptions about e-mail. When you receive e-mail, you assume that a specific individual or
group created and transmitted the e-mail; that is, you assume you know the origin of the e-mail.
This is not always the case. E-mail spoofing, the act of sending an e-mail message with a
modified field, is a problem for many people today, because often the modified field is the
address of the originator. Spoofing the sender’s address can fool e-mail recipients into
thinking that messages are legitimate traffic, thus inducing them to open e-mail they
otherwise might not have. Spoofing can also alter data being transmitted across a network,
as in the case of User Datagram Protocol (UDP) packet spoofing, which can enable the attacker to
get access to data stored on computing systems.
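
The idea that tampering with a sender field should be detectable can be illustrated with a message authentication code. The sketch below is a minimal illustration, assuming the sender and recipient already share a secret key (the key and addresses are hypothetical). It uses HMAC-SHA256 from Python’s standard library; real e-mail systems use other mechanisms, such as DKIM public-key signatures, so treat this only as a model of the concept.

```python
import hashlib
import hmac

# Hypothetical secret shared in advance by the genuine sender and the recipient.
SECRET_KEY = b"example-shared-secret"

def sign(message: bytes, key: bytes = SECRET_KEY) -> str:
    """Compute an HMAC-SHA256 tag over the entire message, including its header fields."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def is_authentic(message: bytes, tag: str, key: bytes = SECRET_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(message, key), tag)

original = b"From: registrar@example.edu\nSubject: Grades posted"
tag = sign(original)  # sent along with the message

# A spoofer changes the sender field but cannot compute a matching tag without the key.
spoofed = original.replace(b"registrar@example.edu", b"attacker@example.com")

print(is_authentic(original, tag))  # True
print(is_authentic(spoofed, tag))   # False
```

Because the spoofer does not know SECRET_KEY, changing the From line without invalidating the tag is not feasible. The constant-time compare_digest is used instead of == so that the comparison does not leak timing information.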

Confidentiality

Information has confidentiality when it is protected from disclosure or exposure to
unauthorized individuals or systems. Confidentiality ensures that only those with the rights
and privileges to access information are able to do so. When unauthorized individuals or
systems can view information, confidentiality is breached. To protect the confidentiality of
information, you can use a number of measures, including the following:
• Information classification
• Secure document storage
• Application of general security policies
• Education of information custodians and end users
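
The first measure, information classification, can be illustrated with a small access check. This is a minimal sketch, assuming a hypothetical four-level scheme (PUBLIC through SECRET) and invented file names; real classification schemes and access-control systems vary by organization.

```python
from enum import IntEnum

class Level(IntEnum):
    """Hypothetical classification levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    SECRET = 3

# Hypothetical documents, each labeled with a classification level.
DOCUMENTS = {
    "course_catalog.pdf": Level.PUBLIC,
    "faculty_roster.xlsx": Level.INTERNAL,
    "student_grades.db": Level.CONFIDENTIAL,
}

def can_read(clearance: Level, document: str) -> bool:
    """Permit reading only when the user's clearance is at least the document's level."""
    return clearance >= DOCUMENTS[document]

print(can_read(Level.INTERNAL, "faculty_roster.xlsx"))  # True
print(can_read(Level.INTERNAL, "student_grades.db"))    # False
```

Ordering the levels as integers keeps the rule to a single comparison; a user with INTERNAL clearance can read INTERNAL and PUBLIC material but nothing above it.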

Integrity

Information has integrity when it is whole, complete, and uncorrupted. The integrity of
information is threatened when the information is exposed to corruption, damage,
destruction, or other disruption of its authentic state. Corruption can occur while information
is being stored or transmitted. Many computer viruses and worms are designed with the
explicit purpose of corrupting data.
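
One common way to detect corruption during storage or transmission is to compare cryptographic hashes. This is a minimal sketch using SHA-256 from Python’s hashlib; the record contents are hypothetical. A plain hash detects accidental corruption, but an attacker who can change the data may also replace the stored digest, which is why deliberate tampering is usually guarded against with keyed codes or digital signatures.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

record = b"Account 1001: balance 2500.00"
stored_digest = fingerprint(record)  # computed and saved when the record is written

# Later, after storage or transmission, recompute the digest and compare.
corrupted = b"Account 1001: balance 9500.00"  # a single character has changed
print(fingerprint(record) == stored_digest)     # True: record is intact
print(fingerprint(corrupted) == stored_digest)  # False: corruption detected
```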

Utility

The utility of information is the quality or state of having value for some purpose or
end. Information has value when it can serve a purpose. If information is available, but is not
in a format meaningful to the end user, it is not useful.

Possession

The possession of information is the quality or state of ownership or control.
Information is said to be in one’s possession if one obtains it, independent of format or other
characteristics. While a breach of confidentiality always results in a breach of possession, a
breach of possession does not always result in a breach of confidentiality. For example,
assume a company stores its critical customer data using an encrypted file system. An
employee who has quit decides to take a copy of the tape backups to sell the customer
records to the competition. The removal of the tapes from their secure environment is a
breach of possession. However, because the data is encrypted, neither the former employee
nor anyone else can read it without the proper decryption key, so there is no breach of
confidentiality.

Lesson 5: Approaches to Information Security Implementation

The implementation of information security in an organization must begin
somewhere, and cannot happen overnight. Securing information assets is in fact an
incremental process that requires coordination, time, and patience. Information security can
begin as a grassroots effort in which systems administrators attempt to improve the security
of their systems. This is often referred to as a bottom-up approach. The key advantage of
the bottom-up approach is the technical expertise of the individual administrators. Working
with information systems on a day-to-day basis, these administrators possess in-depth
knowledge that can greatly enhance the development of an information security system.
They know and understand the threats to their systems and the mechanisms needed to
protect them successfully. Unfortunately, this approach seldom works, as it lacks a number
of critical features, such as participant support and organizational staying power.

The top-down approach has a higher probability of success. In this approach, the
project is initiated by upper-level managers who issue policy, procedures, and processes;
dictate the goals and expected objectives; and determine accountability for each required
action. This approach has strong upper-management support, a dedicated champion, usually
dedicated funding, a clear planning and implementation process, and the means of influencing
organizational culture. The most successful kind of top-down approach also involves a
formal development strategy referred to as a systems development life cycle.

Figure: Approaches to Information Security Implementation (Source: Course Technology/Cengage Learning)

Assessment No. 1:

Introduction:

This activity helps students examine the elements and components of computer
security and common threats. It will also help students enumerate the phases
of the security systems development life cycle and describe the information security roles of
professionals within an organization.

Integrative Activity:

1. In your own words, define information security.


______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
___________________________________________
2. Explain the components of Information Security.

______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________
3. Enumerate and discuss the phases of the security systems development life cycle.
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________
______________________________________________________________________
______________________________________________________________________
____________________________________________________
4. Why is there a need for multiple layers of security in an organization?
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________
5. Discuss briefly the different approaches to Information Security Implementation.
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________

MODULE 6: SOCIAL AND ETHICAL ISSUES IN INFORMATION SYSTEM

Introduction

Technology can be a double-edged sword. It can be the source of many benefits, but
it can also create new opportunities for invading your privacy and for enabling the reckless use
of your personal information in a variety of decisions about you. This module discusses the
social and ethical issues in information systems.

Learning Objectives

At the end of this lesson, students should be able to:
1. Explain the social and ethical issues in information systems.
2. Describe some examples of waste and mistakes in an IS environment, their causes,
and possible solutions.
3. Discuss the principles and limits of an individual’s right to privacy.
4. Explain the types and effects of computer crime.
5. Identify specific measures to prevent computer crime.
6. Explain the important effects of computers on the work environment and identify
specific actions to ensure the health and safety of employees.

Lesson 1: Information System Issues

Ethical, social, and political issues are closely linked. The introduction of new
technology has a ripple effect on the current equilibrium, creating new ethical, social, and
political issues that must be dealt with on the individual, social, and political levels. Social
and political institutions both require time to develop new behaviors, rules, and laws.

A Model for Thinking about Ethical, Social and Political Issues

Ethical, social, and political issues are closely linked. The ethical dilemma you may
face as a manager of information systems typically is reflected in social and political debate.

Figure: The Relationship Between Ethical, Social, and Political Issues in an Information Society

The introduction of new information technology has a ripple effect, raising new
ethical, social, and political issues that must be dealt with on the individual, social,
and political levels. These issues have five moral dimensions: information rights and
obligations, property rights and obligations, system quality, quality of life, and accountability
and control.

Five Moral Dimensions of the Information Age

The major ethical, social, and political issues raised by information systems include
the following moral dimensions:

1. Information rights and obligations. What information rights do individuals and
organizations possess with respect to themselves? What can they protect? (E.g., privacy
and Web site privacy, spyware, cookies)
2. Property rights and obligations. How will traditional intellectual property rights be
protected in a digital society in which tracing and accounting for ownership are difficult
and ignoring such property rights is so easy? (E.g., trade secret, copyright, and patent
law)
3. Accountability and control. Who can and will be held accountable and liable for the
harm done to individual and collective information and property rights?
4. System quality. What standards of data and system quality should we demand to
protect individual rights and the safety of society? (E.g., computer crime, spam/junk e-mail)
5. Quality of life. What values should be preserved in an information- and knowledge-based
society? Which institutions should we protect from violation? Which cultural values
and practices are supported by the new information technology? Examples include:
• Repetitive stress injury (RSI)
• Computer vision syndrome (CVS): any eyestrain condition related to computer
display screen use
• Technostress

Figure: How Cookies Identify Web Visitors

Key Technology Trends that Raise Ethical Issues

Profiling is the use of computers to combine data from multiple sources and create
electronic dossiers of detailed information on individuals. Nonobvious relationship
awareness (NORA) is a more powerful profiling technology. It can take information
about people from many disparate sources, such as employment applications, telephone
records, customer listings, and “wanted” lists, and correlate it to find obscure,
hidden connections that might help identify criminals or terrorists.
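
The core idea of combining data from several sources to surface hidden connections can be sketched in a few lines. This is a simplified illustration, not how any commercial NORA product works: the records, names, and fields are invented, and matching is exact, whereas real systems use fuzzy matching over far larger data sets. The example also shows why profiling raises privacy concerns, since two people can be linked without either of them knowing.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records from three separate sources (all names and values invented).
employment_applications = [
    {"name": "A. Cruz", "phone": "555-0101", "address": "12 Rizal St"},
    {"name": "B. Reyes", "phone": "555-0199", "address": "8 Mabini Ave"},
]
telephone_records = [
    {"name": "C. Santos", "phone": "555-0101", "address": "40 Luna Rd"},
]
wanted_list = [
    {"name": "D. Lim", "phone": "555-0777", "address": "8 Mabini Ave"},
]

def find_links(*sources):
    """Index every (field, value) pair, then report people who share one."""
    index = defaultdict(set)  # (field, value) -> names seen with that value
    for source in sources:
        for record in source:
            for field in ("phone", "address"):
                index[(field, record[field])].add(record["name"])
    links = set()
    for (field, value), names in index.items():
        for a, b in combinations(sorted(names), 2):
            links.add((a, b, field, value))
    return sorted(links)

links = find_links(employment_applications, telephone_records, wanted_list)
for link in links:
    print(link)
```

No single source connects B. Reyes to the person on the wanted list; the link appears only when the sources are combined on a shared address.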

Figure: Nonobvious Relationship Awareness (NORA)

Ethics in an Information Society

Basic Concepts: Responsibility, Accountability, and Liability

Ethical choices are decisions made by individuals who are responsible for the
consequences of their actions.

• Responsibility is a key element and means that you accept the potential costs, duties,
and obligations for the decisions you make.
• Accountability is a feature of systems and social institutions and means that mechanisms
are in place to determine who took action and who is responsible.
• Liability is a feature of political systems in which a body of laws is in place that permits
individuals to recover the damages done to them by other actors, systems, or
organizations.
• Due process is a related feature of law-governed societies and is a process in which
laws are known and understood, and there is an ability to appeal to higher authorities to
ensure that the laws are applied correctly.

The Moral Dimensions of Information Systems

Information Rights: Privacy and Freedom in the Internet Age
Privacy is the claim of individuals to be left alone, free from surveillance or
interference from other individuals or organizations, including the state.
Internet Challenges to Privacy

Internet technology has posed new challenges for the protection of individual
privacy. Information sent over this vast network of networks may pass through many
different computer systems before it reaches its final destination. Each of these systems
is capable of monitoring, capturing, and storing communications that pass through it.

Cookies are small text files deposited on a computer hard drive when a user visits a
Web site. Cookies identify the visitor’s Web browser software and track visits to the
Web site. Web beacons, also called Web bugs, are tiny objects invisibly embedded in e-mail
messages and Web pages that are designed to monitor the behavior of the user visiting a
Web site or sending e-mail. Spyware can secretly install itself on an Internet user’s computer
by piggybacking on larger applications. Once installed, the spyware calls out to Web sites to
send banner ads and other unsolicited material to the user, and it can also report the user’s
movements on the Internet to other computers.
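
The way a cookie lets a Web site recognize a returning browser can be sketched from the server’s side with Python’s http.cookies module. This is a minimal illustration; the cookie name visitor_id and the one-year lifetime are hypothetical choices, and in practice the browser stores the cookie and sends it back automatically with each request to the same site.

```python
import uuid
from http.cookies import SimpleCookie

def set_visitor_cookie() -> str:
    """First visit: create a random visitor ID and build the Set-Cookie header."""
    cookie = SimpleCookie()
    cookie["visitor_id"] = uuid.uuid4().hex               # hypothetical cookie name
    cookie["visitor_id"]["max-age"] = 60 * 60 * 24 * 365  # remember for about one year
    cookie["visitor_id"]["path"] = "/"
    return cookie.output()                                # a "Set-Cookie: visitor_id=..." line

def identify_visitor(cookie_header: str):
    """Later visit: read the Cookie header the browser sends back, if any."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("visitor_id")
    return morsel.value if morsel is not None else None

# Simulate a first visit, then a return visit carrying the same cookie.
set_cookie_header = set_visitor_cookie()
visitor_id = set_cookie_header.split("visitor_id=")[1].split(";")[0]
print(identify_visitor(f"visitor_id={visitor_id}") == visitor_id)  # True: same visitor recognized
```

The ID itself carries no personal data; tracking works because the same random value comes back on every visit, letting the site link those visits together.
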
Property Rights: Intellectual Property
Intellectual property is considered to be intangible property created by individuals or
corporations. Information technology has made it difficult to protect intellectual property
because computerized information can be so easily copied or distributed on networks.
Intellectual property is subject to a variety of protections under three different legal traditions:
trade secrets, copyright, and patent law.
Trade Secrets

Any intellectual work product (a formula, device, pattern, or compilation of data) used
for a business purpose can be classified as a trade secret, provided it is not based on
information in the public domain.
Copyright
Copyright is a statutory grant that protects creators of intellectual property from
having their work copied by others for any purpose during the life of the author plus an
additional 70 years after the author’s death.
Patents
A patent grants the owner an exclusive monopoly on the ideas behind an invention
for 20 years. The congressional intent behind patent law was to ensure that inventors of new
machines, devices, or methods receive the full financial and other rewards of their labor and
yet make widespread use of the invention possible by providing detailed diagrams for those
wishing to use the idea under license from the patent’s owner.
System Quality: Data Quality and System Errors
Three principal sources of poor system performance are (1) software bugs and errors,
(2) hardware or facility failures caused by natural or other causes, and (3) poor input data
quality. The software industry has not yet arrived at testing standards for producing software
of acceptable but not perfect performance.
Quality of Life: Equity, Access, and Boundaries
Balancing Power: Center versus Periphery
Lower-level employees may be empowered to make minor decisions, but the key
policy decisions may be as centralized as in the past.
Rapidity of Change: Reduced Response Time to Competition
Information systems have helped to create much more efficient national and
international markets. The now-more-efficient global marketplace has reduced the normal
social buffers that permitted businesses many years to adjust to competition. We stand the
risk of developing a “just-in-time society” with “just-in-time jobs” and “just-in-time”
workplaces, families, and vacations.

Maintaining Boundaries: Family, Work, and Leisure


The danger of ubiquitous computing, telecommuting, nomad computing, and the “do
anything anywhere” computing environment is that it is actually coming true. The traditional
boundaries that separate work from family and just plain leisure have been weakened. The
work umbrella now extends far beyond the eight-hour day.
Dependence and Vulnerability
Today our businesses, governments, schools, and private associations, such as
churches, are incredibly dependent on information systems and are, therefore, highly
vulnerable if these systems fail. The absence of standards and the criticality of some system
applications will probably call forth demands for national standards and perhaps regulatory
oversight.
Computer Crime and Abuse
New technologies, including computers, create new opportunities for committing
crimes by creating new valuable items to steal, new ways to steal them, and new ways to
harm others. Computer crime is the commission of illegal acts through the use of a computer
or against a computer system. Simply accessing a computer system without authorization or
with intent to do harm, even by accident, is now a federal crime.
Computer abuse
It is the commission of acts involving a computer that may not be illegal but that are
considered unethical. The popularity of the Internet and e-mail has turned one form of
computer abuse, spamming, into a serious problem for both individuals and
businesses. Spam is junk e-mail sent by an organization or individual to a mass audience of
Internet users who have expressed no interest in the product or service being marketed.

Lesson 2: Computer Waste and Mistakes

Computer waste – Inappropriate use of computer technology and resources.

Computer-related mistakes – Errors, failures, and other computer problems that make
computer output incorrect or not useful.

Computer Waste

• Discarding old software and even complete computer systems when they still have
value
• Building and maintaining complex systems that are never used to their fullest extent
• Using corporate time and technology for personal use
• Spam

Computer-Related Mistakes

Causes:

• Failure by users to follow proper procedures
• Unclear expectations and a lack of feedback
• Program development that contains errors
• Incorrect data entry by data-entry clerks

Preventing Computer-Related Waste and Mistakes

1. Establish policies and procedures regarding efficient acquisition, use, and disposal of
systems and devices.
• Training programs for individuals and workgroups
• Manuals and documents on how computer systems are to be maintained and used
• Approval of certain systems and applications to ensure compatibility and cost-effectiveness

Policies often focus on:

• Implementation of source data automation and the use of data editing to ensure data
accuracy and completeness
• Assignment of clear responsibility for data accuracy within each information system
• Training – Key aspect of implementation
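
Data editing, mentioned above as a way to ensure data accuracy and completeness, means checking each entry against rules before it is accepted. The sketch below assumes a hypothetical student enrollment record; the field names, the ID format, and the 1–30 unit range are illustrative assumptions, not rules from any particular system.

```python
import re
from datetime import date

# Hypothetical rules for a student enrollment record.
REQUIRED_FIELDS = ("student_id", "name", "birth_date", "units")
ID_PATTERN = re.compile(r"\d{4}-\d{5}")  # e.g. 2021-00123 (assumed format)

def edit_check(record: dict) -> list:
    """Return a list of data-entry errors; an empty list means the record is accepted."""
    errors = []
    # Completeness check: every required field must have a value.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            errors.append(f"missing {field}")
    # Format check on the ID.
    student_id = record.get("student_id")
    if isinstance(student_id, str) and student_id and not ID_PATTERN.fullmatch(student_id):
        errors.append("student_id does not match the expected format")
    # Range check on the unit load.
    units = record.get("units")
    if units not in (None, "") and (not isinstance(units, int) or not 1 <= units <= 30):
        errors.append("units must be a whole number from 1 to 30")
    # Reasonableness check on the birth date.
    birth_date = record.get("birth_date")
    if isinstance(birth_date, date) and birth_date > date.today():
        errors.append("birth_date is in the future")
    return errors

good = {"student_id": "2021-00123", "name": "Ana Cruz", "birth_date": date(2003, 5, 1), "units": 21}
bad = {"student_id": "21-123", "name": "", "birth_date": date(2003, 5, 1), "units": 45}
print(edit_check(good))  # []
print(edit_check(bad))   # three errors: missing name, bad ID format, units out of range
```

Rejecting the record at entry time, with a specific message, is cheaper than finding the error later in reports built on the bad data.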

2. Implement internal audits to measure actual results against established goals.

a. Monitoring policies and procedures

Monitor routine practices and take corrective action if necessary.

b. Reviewing policies and procedures

During review, people should ask the following questions:
• Do current policies cover existing practices adequately?
• Does the organization plan any new activities in the future?
• Are contingencies and disasters covered?

Safe Disposal of Personal Computers

• Deleting files and emptying the Recycle Bin does not make it impossible for
determined individuals to view the data.
• To make data unrecoverable, use disk-wiping software utilities that overwrite all
sectors of your disk drive.
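
The overwrite idea behind disk-wiping utilities can be shown at the level of a single file. This is a minimal sketch, not a substitute for a real wiping utility: on solid-state drives (which remap writes internally) and on journaling or copy-on-write file systems, overwriting a file in place may leave older copies of the data elsewhere on the device, which is why whole-drive wiping tools are recommended before disposal. The function name wipe_file and the three-pass default are assumptions for illustration.

```python
import os
import tempfile

def wipe_file(path: str, passes: int = 3, chunk_size: int = 64 * 1024) -> None:
    """Overwrite a file's bytes with random data several times, then delete it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))  # replace the old bytes in place
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push this pass to the storage device
    os.remove(path)

# Demo on a throwaway temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"customer record: Juan Dela Cruz, 0917-000-0000\n" * 100)
    demo_path = tmp.name

wipe_file(demo_path)
print(os.path.exists(demo_path))  # False
```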

Lesson 3: Computer Crime

Financial fraud, followed by virus attacks, is the leading cause of financial loss from
computer incidents. Computer crime is now global.

The Computer as a Tool to Commit Crime

1. Social engineering – Using social skills to get computer users to provide information to
access an information system or its data.
2. Dumpster diving – Going through the trash cans of an organization to find confidential
information, including information needed to access an information system.
3. Cyberterrorism
• Cyberterrorist – Intimidates or coerces a government to advance his or her political
or social objectives by launching computer-based attacks against computers,
networks, and the information stored on them.
4. Identity Theft
• Imposter obtains key pieces of personal identification information in order to
impersonate someone else
• Consumers can help protect themselves by:
  – Regularly checking their credit reports with major credit bureaus
  – Following up with creditors if their bills do not arrive on time
  – Not revealing any personal information in response to unsolicited e-mail or
    phone calls
5. Internet Gambling
• The size of the online gambling market is not known
• One estimate is that $10–20 billion is wagered on online poker alone each year
• Revenues generated by Internet gambling represent a major untapped source of
income for the state and federal governments

The Computer as the Object of Crime

Crimes fall into several categories such as:

1. Illegal access and use
2. Data alteration and destruction
3. Information and equipment theft
4. Software and Internet piracy
5. Computer-related scams
6. International computer crime

Illegal Access and Use

• Hacker – Learns about and uses computer systems
• Criminal hacker (cracker) – Gains unauthorized use or illegal access to computer
systems
• Script bunnies – Crackers with little technical skill who download ready-made programs
(scripts) that automate the job of breaking into computers
• Insider – Employee who compromises corporate systems
• Virus – Computer program file capable of attaching to disks or other files and
replicating itself repeatedly
• Worm – Parasitic computer program that replicates but, unlike a virus, does not infect
other computer program files
• Trojan horse – Disguises itself as a useful application or game and purposefully
does something the user does not expect

Using Antivirus Programs

Antivirus program

Runs in the background to protect your computer from dangers lurking on the
Internet and other possible sources of infected files.

Tips on using antivirus software:

• Run and update antivirus software often
• Scan all removable media before use
• Install software only from a sealed package or secure, well-known Web site
• Follow careful downloading practices
• If you detect a virus, take immediate action
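
One basic technique behind antivirus programs is signature matching: comparing files against fingerprints of known malicious files. The sketch below assumes a hypothetical signature set containing a single SHA-256 digest; real antivirus products use large, frequently updated signature databases plus heuristic and behavioral detection, which is one reason the tip to update antivirus software often matters.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical signature database: digests of files known to be malicious.
KNOWN_BAD = {hashlib.sha256(b"pretend this is a virus body").hexdigest()}

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def scan(folder: str) -> list:
    """Return every file under folder whose digest matches a known-bad signature."""
    return [p for p in Path(folder).rglob("*") if p.is_file() and sha256_of(p) in KNOWN_BAD]

# Demo: one "infected" file and one clean file in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "free_game.exe").write_bytes(b"pretend this is a virus body")
    (Path(folder) / "notes.txt").write_bytes(b"lecture notes")
    flagged = [p.name for p in scan(folder)]

print(flagged)  # ['free_game.exe']
```

Exact-hash signatures miss any variant that differs by even one byte, which is why real scanners do not rely on this technique alone.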

Spyware

Software installed on a personal computer to:

• Intercept or take partial control over the user’s interaction with the computer without
the user’s knowledge or permission

Information and Equipment Theft

Data and information - Assets or goods that can also be stolen

Password sniffer - Small program hidden in a network or a computer system that records
identification numbers and passwords

Patent and Copyright Violations

Software piracy

• The act of unauthorized copying or distribution of copyrighted software
• Penalties can be severe

Patent infringement – Occurs when someone makes unauthorized use of another’s patent

Computer-Related Scams

To avoid becoming a scam victim:

• Do not agree to anything in a high-pressure meeting or seminar
• Do not judge a company based on appearances
• Avoid any plan that pays commissions simply for recruiting additional distributors
• Do your homework

Preventing Computer-Related Crime

All states have passed computer crime legislation

Some believe that these laws are not effective because:

• Companies do not always actively detect and pursue computer crime
• Security is inadequate
• Convicted criminals are not severely punished

Crime Prevention by Corporations

Encryption - The process of converting an original electronic message into a form that can
be understood only by the intended recipients
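
The definition of encryption above can be made concrete with the simplest cipher that meets it, the one-time pad, chosen here only because Python’s standard library includes no general-purpose cipher. This is a teaching sketch under strong assumptions: the key must be truly random, as long as the message, used only once, and delivered secretly to the recipient. Those requirements make it impractical for everyday use, where organizations rely on standard algorithms such as AES through vetted cryptographic libraries.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Combine two equal-length byte strings with bitwise XOR."""
    return bytes(d ^ k for d, k in zip(data, key))

def otp_encrypt(plaintext: bytes) -> tuple:
    """Create a fresh random key as long as the message and XOR it with the message."""
    key = secrets.token_bytes(len(plaintext))
    return xor_bytes(plaintext, key), key

def otp_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """XOR-ing with the same key again restores the original message."""
    if len(key) != len(ciphertext):
        raise ValueError("key must be exactly as long as the ciphertext")
    return xor_bytes(ciphertext, key)

message = b"Transfer 500 to account 1001"
ciphertext, key = otp_encrypt(message)  # the key must reach the recipient secretly
recovered = otp_decrypt(ciphertext, key)
print(recovered == message)  # True
```

Without the key, the ciphertext is just random-looking bytes; only a recipient who holds the key can turn it back into the original message.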

To protect your computer from criminal hackers:

• Install strong user authentication and encryption capabilities on your firewall
• Install the latest security patches
• Disable guest accounts and null user accounts
• Turn audit trails on

Using Intrusion Detection Software

• Monitors system and network resources and notifies network security personnel
when it senses a possible intrusion
• Can provide false alarms that result in wasted effort
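
A simple rule an intrusion detection system might apply is to alert when one source causes many failed logins in a short time. The class below is a minimal sketch with a hypothetical threshold of five failures in 60 seconds and example addresses from documentation ranges; real intrusion detection software combines many such rules and signatures. The same rule also shows where false alarms come from: a legitimate user who mistypes a password repeatedly would trigger it too.

```python
from collections import defaultdict, deque

class FailedLoginMonitor:
    """Flag a source address that produces too many failed logins in a time window."""

    def __init__(self, threshold: int = 5, window_seconds: float = 60.0):
        self.threshold = threshold
        self.window = window_seconds
        self.failures = defaultdict(deque)  # source -> timestamps of recent failures

    def record_failure(self, source: str, timestamp: float) -> bool:
        """Record one failed login and return True if an alert should be raised."""
        times = self.failures[source]
        times.append(timestamp)
        # Forget failures that are older than the window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold

monitor = FailedLoginMonitor(threshold=5, window_seconds=60)

# Five failures ten seconds apart from one address: the fifth raises an alert.
burst = [monitor.record_failure("203.0.113.9", t) for t in range(0, 50, 10)]
# Five failures a hundred seconds apart from another address: never alerts.
slow = [monitor.record_failure("198.51.100.4", t) for t in range(0, 500, 100)]

print(burst)  # [False, False, False, False, True]
print(slow)   # [False, False, False, False, False]
```

Lowering the threshold catches attacks sooner but raises more false alarms; raising it does the opposite.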

Security Dashboard

Employed to provide a comprehensive display on a single computer screen of all the vital
data related to an organization’s security defenses

Data comes from a variety of sources, including:

• Firewalls
• Applications
• Servers
• Other software and hardware devices

Using Managed Security Service Providers (MSSPs)

• Monitor, manage, and maintain network security for both hardware and software
• Provide vulnerability scanning and Web blocking/filtering capabilities

Filtering and Classifying Internet Content

Filtering software

• Screens Internet content
• Used by companies to prevent employees from visiting nonwork-related Web sites
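
At its simplest, filtering software checks each requested address against a list of blocked sites. The sketch below assumes a hypothetical company blocklist (games.example and social.example are placeholder domains); commercial filters typically rely on large categorized databases rather than a hand-written set.

```python
from urllib.parse import urlparse

# Hypothetical company blocklist of nonwork-related domains (placeholders).
BLOCKED_DOMAINS = {"games.example", "social.example"}

def is_blocked(url: str) -> bool:
    """Block a URL whose host is a listed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)

print(is_blocked("https://games.example/play"))        # True
print(is_blocked("http://www.social.example/feed"))    # True
print(is_blocked("https://intranet.example/payroll"))  # False
```

Matching on "." plus the domain lets subdomains such as www.social.example be blocked without also blocking an unrelated host like notgames.example.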

Internet Content Rating Association (ICRA)

• Goals are to protect children from potentially harmful material, while also
safeguarding free speech on the Internet

Internet Libel Concerns

Companies should be aware that:

• Publishing Internet content to the world can subject them to different countries’ laws

Geolocation tools

• Match a user’s IP address with outside information to determine the actual geographic
location of the online user

Individuals

• Must be careful what they post on the Internet to avoid libel charges
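
The matching step performed by geolocation tools can be sketched with Python’s ipaddress module. This is a minimal illustration, assuming a hypothetical lookup table: the three ranges are reserved for documentation (RFC 5737) and the countries assigned to them are invented. Real tools use commercial databases of address allocations, and results can be wrong when users connect through VPNs or proxies.

```python
import ipaddress

# Hypothetical lookup table: RFC 5737 documentation ranges mapped to invented locations.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "Philippines"),
    (ipaddress.ip_network("198.51.100.0/24"), "Germany"),
    (ipaddress.ip_network("203.0.113.0/24"), "United States"),
]

def locate(ip: str) -> str:
    """Return the location of the first range containing the address, or 'unknown'."""
    address = ipaddress.ip_address(ip)
    for network, location in GEO_TABLE:
        if address in network:
            return location
    return "unknown"

print(locate("192.0.2.45"))  # Philippines
print(locate("10.1.2.3"))    # unknown
```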

Preventing Crime on the Internet

• Develop effective Internet usage and security policies for all employees
• Use a stand-alone firewall (hardware and software) with network monitoring
capabilities
• Deploy intrusion detection systems, monitor them, and follow up on their alarms
• Monitor managers and employees to make sure that they are using the Internet for
business purposes
• Use Internet security specialists to perform audits of all Internet and network activities

Lesson 4: Privacy

Issue of privacy

• Deals with the right to be left alone or to be withdrawn from public view

More data and information are produced and used today than ever before:

• “Who owns this information and knowledge?”

Privacy at Work

• There is a conflict between the rights of workers who want their privacy and the interests of
companies that demand to know more about their employees
• Nearly one-third of companies have fired an employee for violating corporate e-mail
policies

E-Mail Privacy

• Federal law – Permits employers to monitor e-mail sent and received by employees
• E-mail messages that have been erased from hard disks can be retrieved and used
in lawsuits
• Use of e-mail among public officials might violate “open meeting” laws

Instant Messaging Privacy

• Using instant messaging (IM) to send and receive messages, files, and images
introduces the same privacy issues associated with e-mail
• Do not send personal or private IMs at work

Privacy and Personal Sensing Devices

• RFID tags, essentially microchips with antennas, are embedded in many of the
products we buy. They generate radio transmissions that, if appropriate measures are
not taken, can lead to potential privacy concerns.

Privacy and the Internet

• Platform for Privacy Preferences (P3P) – Screening technology that shields users
from Web sites that do not provide the level of privacy protection they desire

Corporate Privacy Policies

Invasions of privacy- Can hurt business, turn away customers, and dramatically reduce
revenues and profits

Multinational companies – Face an extremely difficult challenge in implementing data
collection and dissemination processes and policies

Individual Efforts to Protect Privacy

• Find out what is stored about you in existing databases
• Be careful when you share information about yourself
• Be proactive to protect your privacy
• When purchasing anything from a Web site, make sure that you safeguard your
credit card numbers, passwords, and personal information.

Lesson 5: Work Environment

Computer technology and information systems – Have opened up numerous avenues
to professionals and nonprofessionals.

Despite increases in productivity and efficiency, computers and information systems
can raise other concerns.

Health Concerns

• Working with computers – Can cause occupational stress
• Training and counseling – Can often help the employee and deter problems
• Carpal tunnel syndrome (CTS) – Aggravation of the pathway for the nerves that
travel through the wrist (carpal tunnel)

Avoiding Health and Environmental Problems

• Many computer-related health problems – Are caused by a poorly designed work
environment
• Ergonomics – Science of designing machines, products, and systems to maximize the
safety, comfort, and efficiency of the people who use them

Ethical Issues in Information System

Code of ethics – States the principles and core values that are essential to a set of people
and thus governs their behavior.

Assessment:

Introduction:

This activity will help students understand the social and ethical issues in
information systems and describe some examples of waste and mistakes in an IS
environment, their causes, and possible solutions. This activity will also help students
identify specific measures to prevent computer crime.

Integrative Activity:

1. What specific principles can be used to guide ethical decisions?


______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________

2. Discuss the principles and limits of an individual’s right to privacy.


______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________
3. Explain the types and effects of computer crime.
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________
4. Identify specific measures to prevent computer crime.
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________
5. Outline measures for the ethical use of information systems.
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
____________________________
