Unit 5 Foundations of Business Intelligence Databases and Information Management

The document explains the distinction between data and information, emphasizing that data is raw facts while information is processed data that provides meaning. It discusses the challenges of traditional file environments, including data redundancy, inconsistency, and poor security, and introduces the database approach as a solution to these issues. Database Management Systems (DBMS) are highlighted as tools that centralize data management, reduce redundancy, and improve data accessibility for better business decision-making.

Uploaded by

Khem Raj Pant
Copyright
© All Rights Reserved

Unit 6: Foundations of Business Intelligence: Databases and Information Management

Data vs. Information

• Data is raw facts collected from the environment about physical phenomena or business transactions.
• Data can be in any form: numerical, textual, graphical, image, sound, video, etc. By itself it has no meaning.
It is the input to any system in an organization.
• For example, data would be the marks obtained by students in different subjects.
• On the other hand, information is defined as refined or processed data that has been transformed
into meaningful and useful form for specific users.
• For example, once the marks obtained by students are processed, they become information
that is meaningful and from which we can decide which student stood first, second, and so forth.
Information comes from data and takes the form of tables, graphs, diagrams, etc.
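
The step from data to information can be sketched in a few lines of Python. This is only an illustration: the student names, subjects, and marks are made up, and `rank_students` is a hypothetical helper, not something from the notes.

```python
# Raw data: marks obtained by students (hypothetical values).
marks = {
    "Asha": {"Math": 78, "Science": 85, "English": 90},
    "Bikash": {"Math": 92, "Science": 88, "English": 79},
    "Chandra": {"Math": 65, "Science": 70, "English": 72},
}


def rank_students(marks):
    """Process raw marks into a ranked list of (name, total) pairs."""
    totals = {name: sum(subjects.values()) for name, subjects in marks.items()}
    # Highest total first; this ordering is the "information".
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_students(marks)
for position, (name, total) in enumerate(ranking, start=1):
    print(position, name, total)
```

The dictionary of marks alone does not say who stood first; the sorted totals do.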

Organizing Data in a Traditional File Environment:

• An effective information system provides users with accurate, timely, and relevant information.
Accurate information is free of errors.
• Information is timely when it is available to decision makers when it is needed. Information is
relevant when it is useful and appropriate for the types of work and decisions that require it.
• Many businesses don’t have timely, accurate, or relevant information because the data in their
information systems have been poorly organized and maintained. That’s why data management
is so essential.

File Organization Terms and Concepts:

• A computer system organizes data in a hierarchy that starts with bits and bytes and progresses to
fields, records, files, and databases.
• A bit represents the smallest unit of data a computer can handle.
• A group of bits, called a byte, represents a single character, which can be a letter, a number, or
another symbol.
• A grouping of characters into a word, a group of words, or a complete number (such as a person’s
name or age) is called a field.
• A group of related fields, such as the student’s name, the course taken, the date, and the grade,
comprises a record; a group of records of the same type is called a file.
• For example, the records in Figure below could constitute a student course file.
• A group of related files makes up a database. The student course file illustrated in Figure below
could be grouped with files on students’ personal histories and financial backgrounds to create a
student database.
• A record describes an entity. An entity is a person, place, thing, or event on which we store and
maintain information.
• Each characteristic or quality describing a particular entity is called an attribute.
• For example, Student_ID, Course, Date, and Grade are attributes of the entity COURSE. The
specific values that these attributes can have are found in the fields of the record describing the
entity COURSE.
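
As a rough sketch, the hierarchy of field, record, file, and database can be mirrored with ordinary Python structures. The class name `CourseRecord` and all values below are assumptions chosen for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class CourseRecord:
    # Each attribute of the COURSE entity is stored in one field.
    student_id: int
    course: str
    date: str
    grade: str


# A file is a group of records of the same type (values are made up).
student_course_file = [
    CourseRecord(39044, "IS 101", "F22", "B+"),
    CourseRecord(59432, "IS 101", "F22", "A"),
    CourseRecord(64029, "IS 101", "F22", "C"),
]

# A database groups related files; only one file is shown here.
student_database = {"student_course": student_course_file}

attribute_names = [f.name for f in fields(CourseRecord)]
print(attribute_names)
```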

Compiled By: Krishna Bhandari www.genuinenotes.com


Problems with the Traditional File Environment:

• In most organizations, systems tended to grow independently without a company-wide plan.
Accounting, finance, manufacturing, human resources, and sales and marketing all developed
their own systems and data files.
• Each application, of course, required its own files and its own computer program to operate.
• For example, the human resources functional area might have a personnel master file, a payroll
file, a medical insurance file, a pension file, a mailing list file, and so forth until tens, perhaps
hundreds, of files and programs existed.
• In the company as a whole, this process led to multiple master files created, maintained, and
operated by separate divisions or departments.
• As this process goes on for 5 or 10 years, the organization is saddled with hundreds of programs
and applications that are very difficult to maintain and manage.
• The resulting problems are data redundancy and inconsistency, program-data dependence,
inflexibility, poor data security, and an inability to share data among applications.

Data Redundancy and Inconsistency

• Data redundancy is the presence of duplicate data in multiple data files so that the same data are
stored in more than one place or location.
• Data redundancy occurs when different groups in an organization independently collect the same
piece of data and store it independently of each other.
• Data redundancy wastes storage resources and also leads to data inconsistency, where the same
attribute may have different values.
• For example, in instances of the entity COURSE, the Date may be updated in some systems but
not in others.
• The same attribute, Student_ID, may also have different names in different systems throughout
the organization.
• Some systems might use Student_ID and others might use ID, for example.
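
A minimal sketch of how redundancy turns into inconsistency, assuming two departments each keep their own copy of the same course data. The file contents and the helper `find_inconsistent_dates` are hypothetical.

```python
# Two departments keep their own copy of the same course data.
# Note the different attribute names: Student_ID vs. ID.
registrar_file = [
    {"Student_ID": 101, "Course": "IS 101", "Date": "2024-01-15"},
    {"Student_ID": 102, "Course": "IS 101", "Date": "2024-01-15"},
]
finance_file = [
    {"ID": 101, "Course": "IS 101", "Date": "2024-01-15"},
    {"ID": 102, "Course": "IS 101", "Date": "2024-02-01"},  # updated here only
]


def find_inconsistent_dates(registrar, finance):
    """Return student IDs whose Date differs between the two files."""
    finance_dates = {(row["ID"], row["Course"]): row["Date"] for row in finance}
    mismatches = []
    for row in registrar:
        key = (row["Student_ID"], row["Course"])
        if key in finance_dates and finance_dates[key] != row["Date"]:
            mismatches.append(row["Student_ID"])
    return mismatches


inconsistent = find_inconsistent_dates(registrar_file, finance_file)
print(inconsistent)
```

The comparison itself has to know that `Student_ID` and `ID` mean the same thing, which is exactly the naming problem described above.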

Program-Data Dependence

• Program-data dependence refers to the coupling of data stored in files and the specific programs
required to update and maintain those files such that changes in programs require changes to the
data.
• Every traditional computer program has to describe the location and nature of the data with
which it works.
• In a traditional file environment, any change in a software program could require a change in the
data accessed by that program.
• One program might be modified from a five-digit to a nine-digit ZIP code.
• If the original data file were changed from five-digit to nine-digit ZIP codes, then other programs
that required the five-digit ZIP code would no longer work properly. Such changes could cost
millions of dollars to implement properly.
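
The ZIP code case can be sketched with a fixed-width record, assuming a hypothetical layout of a 10-character name, a 5-digit ZIP code, and then the city. The layout, the function `parse_record`, and the sample values are illustrative only.

```python
# The program hard-codes where each field sits in the record.
NAME_WIDTH = 10
ZIP_WIDTH = 5  # assumes five-digit ZIP codes


def parse_record(line):
    """Split a fixed-width record into name, ZIP code, and city."""
    name = line[:NAME_WIDTH].strip()
    zip_code = line[NAME_WIDTH:NAME_WIDTH + ZIP_WIDTH]
    city = line[NAME_WIDTH + ZIP_WIDTH:].strip()
    return name, zip_code, city


old_line = "Smith".ljust(NAME_WIDTH) + "90210" + "Beverly Hills"
# The data file is changed to nine-digit ZIP codes; the program is not.
new_line = "Smith".ljust(NAME_WIDTH) + "902101234" + "Beverly Hills"

print(parse_record(old_line))
print(parse_record(new_line))
```

With the new file format the unchanged program silently pushes the extra four digits into the city field, which is the kind of breakage program-data dependence causes.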

Poor Security

• Because there is little control or management of data, access to and dissemination of information
may be out of control.
• Management may have no way of knowing who is accessing or even making changes to the
organization’s data.

Lack of Data Sharing and Availability

• Because pieces of information in different files and different parts of the organization cannot be
related to one another, it is virtually impossible for information to be shared or accessed in a
timely manner.
• Information cannot flow freely across different functional areas or different parts of the
organization.
• If users find different values of the same piece of information in two different systems, they may
not want to use these systems because they cannot trust the accuracy of their data.

The Database Approach to Data Management:

• Database technology cuts through many of the problems of traditional file organization.
• A more rigorous definition of a database is a collection of data organized to serve many
applications efficiently by centralizing the data and controlling redundant data.
• Rather than storing data in separate files for each application, data are stored so as to appear to
users as being stored in only one location.
• A single database services multiple applications.
• For example, instead of a corporation storing employee data in separate information systems and
separate files for personnel, payroll, and benefits, the corporation could create a single common
human resources database.

Database Management Systems (DBMS):

• A database is an organized collection of logically related data that contains information relevant
to an enterprise.
• The database is also called the repository or container for a collection of data files.
• For example, a university database maintains information about students, courses, and grades in
the university.
• A database management system (DBMS) is software that permits an organization to centralize
data, manage them efficiently, and provide access to the stored data by application programs.
• The DBMS acts as an interface between application programs and the physical data files.
• When the application program calls for a data item, such as gross pay, the DBMS finds this item
in the database and presents it to the application program.
• Using traditional data files, the programmer would have to specify the size and format of each
data element used in the program and then tell the computer where they were located.
• A Database Management System (DBMS) is the set of programs used to store, retrieve, and
manipulate data in a convenient and efficient way.
• The main goal of a DBMS is to hide the underlying complexities of data
management from users and provide an easy interface to them.
• Some common examples of the DBMS software are Oracle, Sybase, Microsoft SQL Server, DB2,
MySQL, Postgres, Dbase, MS-Access etc.
• The database management software makes the physical database available for different logical
views required by users.
• For example, for the human resources database illustrated in Figure below, a benefits specialist
might require a view consisting of the employee’s name, social security number, and health
insurance coverage.

• A payroll department member might need data such as the employee’s name, social security
number, gross pay, and net pay.
• The data for all these views are stored in a single database, where they can be more easily
managed by the organization.
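
The idea of several logical views over one physical database can be sketched with Python's built-in `sqlite3` module. The table name, column names, and sample rows are assumptions made for this example; the notes do not specify a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT,
        ssn         TEXT,
        health_care TEXT,
        gross_pay   REAL,
        net_pay     REAL
    );
    INSERT INTO employee VALUES
        (1, 'A. Sharma', '000-00-0001', 'Plan A', 5000.0, 4100.0),
        (2, 'B. Thapa',  '000-00-0002', 'Plan B', 6200.0, 5000.0);

    -- Logical view for a benefits specialist.
    CREATE VIEW benefits_view AS
        SELECT name, ssn, health_care FROM employee;

    -- Logical view for the payroll department.
    CREATE VIEW payroll_view AS
        SELECT name, ssn, gross_pay, net_pay FROM employee;
""")

benefits_rows = conn.execute(
    "SELECT * FROM benefits_view ORDER BY name").fetchall()
payroll_rows = conn.execute(
    "SELECT * FROM payroll_view ORDER BY name").fetchall()
print(benefits_rows)
print(payroll_rows)
```

Both views read from the same `employee` table, so a change to an employee's name is made once and appears in both.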

How a DBMS Solves the Problems of the Traditional File Environment:

• A DBMS reduces data redundancy and inconsistency by minimizing isolated files in which the same
data are repeated.
• The DBMS may not enable the organization to eliminate data redundancy entirely, but it can help
control redundancy.
• Even if the organization maintains some redundant data, using a DBMS eliminates data
inconsistency because the DBMS can help the organization ensure that every occurrence of
redundant data has the same values.
• The DBMS uncouples programs and data, enabling data to stand on their own.
• Access and availability of information will be increased and program development and
maintenance costs reduced because users and programmers can perform ad hoc queries of data
in the database.
• The DBMS enables the organization to centrally manage data, their use, and security.

Relational DBMS

• Contemporary DBMS use different database models to keep track of entities, attributes, and
relationships.
• The most popular type of DBMS today for PCs as well as for larger computers and mainframes is
the relational DBMS.
• Relational databases represent data as two-dimensional tables (called relations). Tables may be
referred to as files. Each table contains data on an entity and its attributes.
• Microsoft Access is a relational DBMS for desktop systems, whereas DB2, Oracle Database, and
Microsoft SQL Server are relational DBMS for large mainframes and midrange computers.
• MySQL is a popular open-source DBMS, and Oracle Database Lite is a DBMS for small handheld
computing devices.

Capabilities of Database Management Systems

• A DBMS includes capabilities and tools for organizing, managing, and accessing the data in the
database.
• The most important are its data definition language, data dictionary, and data manipulation
language.
• DBMS have a data definition capability to specify the structure of the content of the database. It
would be used to create database tables and to define the characteristics of the fields in each
table. This information about the database would be documented in a data dictionary.
• A data dictionary is an automated or manual file that stores definitions of data elements and their
characteristics.
• A data manipulation language is used to add, change, delete, and retrieve the data in the
database.
• The key capabilities of database management systems are listed below:
o Querying and reporting
o Maintaining complex relationship among data
o Provide backup and recovery
o Data availability
o Maintaining data integrity
o Minimize data redundancy
o Improve data security
o Handling concurrent access anomalies
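
Data definition, the data dictionary, and data manipulation can be seen side by side in a small sketch using Python's built-in `sqlite3` module. The `student` table is a made-up example, and `PRAGMA table_info` is SQLite's own way of exposing field definitions; other DBMS products provide their own catalog tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: create a table and describe its fields.
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        grade      TEXT
    )
""")

# Data dictionary: SQLite exposes field definitions via PRAGMA table_info.
# Each row is (cid, name, type, notnull, dflt_value, pk).
data_dictionary = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")
]

# Data manipulation: add, change, delete, and retrieve rows.
conn.execute("INSERT INTO student VALUES (1, 'Asha', 'B')")
conn.execute("INSERT INTO student VALUES (2, 'Bikash', 'A')")
conn.execute("UPDATE student SET grade = 'A' WHERE student_id = 1")
conn.execute("DELETE FROM student WHERE student_id = 2")
remaining = conn.execute(
    "SELECT student_id, name, grade FROM student").fetchall()

print(data_dictionary)
print(remaining)
```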

Querying and reporting

• A database typically contains a huge amount of data. Querying filters the data and presents
only what the user requires.
• The most popular query language is SQL. It uses an English-like structured syntax for creating
queries.
• After data have been filtered from the database through a query language, it is equally necessary to
present them in an appropriate structure.
• A DBMS can generate reports on the data the user wants, in the structure the user wants.
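
A minimal sketch of the two steps, querying and then reporting, using Python's built-in `sqlite3` module. The `sales` table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Laptop", "East", 1200.0),
        ("Laptop", "West", 900.0),
        ("Camera", "East", 300.0),
        ("Camera", "West", 450.0),
    ],
)

# Query: filter to one region and total the amount per product.
rows = conn.execute(
    "SELECT product, SUM(amount) FROM sales "
    "WHERE region = ? GROUP BY product ORDER BY product",
    ("East",),
).fetchall()

# Report: present the filtered data in a fixed two-column layout.
report_lines = [f"{product:<10}{total:>10.2f}" for product, total in rows]
print("\n".join(report_lines))
```

The SQL statement does the filtering; the formatting step decides how the result is laid out for the reader.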

Designing Databases:

• To create a database, we must understand the relationships among the data, the type of data that
will be maintained in the database, how the data will be used, and how the organization will need
to change to manage data from a company-wide perspective.
• The database requires both a conceptual design and a physical design.
• The conceptual, or logical, design of a database is an abstract model of the database from a
business perspective, whereas the physical design shows how the database is actually arranged
on direct-access storage devices.
• The conceptual database design describes how the data elements in the database are to be
grouped.
• The design process identifies relationships among data elements and the most efficient way of
grouping data elements together to meet business information requirements.
• The process also identifies redundant data elements and the groupings of data elements required
for specific application programs.
• Groups of data are organized, refined, and streamlined until an overall logical view of the
relationships among all the data in the database emerges.
• The conceptual database design deals with two important concepts:
o Normalization and
o Entity relationship diagram

Normalization

• The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into
smaller relations is called normalization.
• While designing a database out of an entity–relationship model, the main problem existing in that
raw database is redundancy. Redundancy is storing the same data item in more than one place.
Redundancy creates several problems, such as the following:
o Extra storage space: storing the same data in many places takes a large amount of disk space.
o Entering the same data more than once during data insertion.
o Deleting data from more than one place during deletion.
o Modifying data in more than one place.
o Anomalies may occur in the database if insertion, deletion, modification, etc. are not done
properly. This creates inconsistency and unreliability in the database.
• To solve this problem, the raw database needs to be normalized. This is a step-by-step process of
removing different kinds of redundancy and anomaly at each step.
• At each step a specific rule is applied to remove a specific kind of redundancy or anomaly,
leaving the database leaner and cleaner.
• The process of reducing data redundancy and removing database modification anomaly in a
relational database is called normalization.
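
One normalization step can be sketched in plain Python. The example assumes that each course has exactly one instructor, so the instructor's name depends only on the course; the data and the helper `normalize` are hypothetical.

```python
# Unnormalized rows: the instructor's name repeats for every enrollment.
enrollments_flat = [
    {"student_id": 1, "course": "IS 101", "instructor": "Dr. Rai"},
    {"student_id": 2, "course": "IS 101", "instructor": "Dr. Rai"},
    {"student_id": 1, "course": "IS 205", "instructor": "Dr. Karki"},
]


def normalize(rows):
    """Split flat rows into a course relation and an enrollment relation."""
    courses = {}      # course -> instructor, stored once per course
    enrollments = []  # (student_id, course) pairs
    for row in rows:
        courses[row["course"]] = row["instructor"]
        enrollments.append((row["student_id"], row["course"]))
    return courses, enrollments


courses, enrollments = normalize(enrollments_flat)

# An instructor change now touches one place instead of many rows.
courses["IS 101"] = "Dr. Gurung"
print(courses)
print(enrollments)
```

In the flat version the same update would have to be repeated on every IS 101 row, and missing one of them would create an update anomaly.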

Entity Relationship (ER) diagram

• An E-R diagram is a specialized graphical tool that demonstrates the interrelationships among
various entities of a database. It is used to represent the overall logical structure of the database.
• While designing E-R diagrams, the emphasis is on the schema of the database and not on the
instances. This is because the schema of the database changes rarely, whereas the instances in
the entity and relationship sets change frequently. Thus, E-R diagrams are more useful in
designing the database.
• An E-R diagram focuses on high-level database design and hides low-level details of database
representation, so it can be used to communicate with users of the system while collecting
information.

• Another representation of ER diagram can be as follows:

Using Databases to Improve Business Performance and Decision Making:

• Businesses use their databases to keep track of basic transactions, such as paying suppliers,
processing orders, keeping track of customers, and paying employees.
• But they also need databases to provide information that will help the company run the business
more efficiently, and help managers and employees make better decisions.
• If a company wants to know which product is the most popular or who is its most profitable
customer, the answer lies in the data.
• For example, by analyzing data from customer credit card purchases, Louise’s Trattoria, a Los
Angeles restaurant chain, learned that quality was more important than price for most of its
customers, who were college-educated and liked fine wine.
• Acting on this information, the chain introduced vegetarian dishes, more seafood selections, and
more expensive wines, raising sales by more than 10 percent.
• In a large company, with large databases or large systems for separate functions, such as
manufacturing, sales, and accounting, special capabilities and tools are required for analyzing vast
quantities of data and for accessing data from multiple systems.
• These capabilities include data warehousing, data mining, and tools for accessing internal
databases through the Web.

Data Warehouses:

• A data warehouse is a database that stores current and historical data of potential interest to
decision makers throughout the company.
• The data originate in many core operational transaction systems, such as systems for sales,
customer accounts, and manufacturing, and may include data from Web site transactions.
• The data warehouse consolidates and standardizes information from different operational
databases so that the information can be used across the enterprise for management analysis and
decision making.
• The data warehouse makes the data available for anyone to access as needed, but the data cannot
be altered.
• A data warehouse system also provides a range of ad hoc and standardized query tools, analytical
tools, and graphical reporting facilities.
• Many firms use intranet portals to make the data warehouse information widely available
throughout the firm.
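
The consolidate-and-standardize step can be sketched as follows. The two source systems, their field names, and their date formats are assumptions invented for this example, and a tuple is used only as a loose stand-in for read-only storage.

```python
# Extracts from two operational systems with different conventions
# (field names, date formats, and values are made up).
sales_system = [
    {"cust": "C001", "date": "2024-03-01", "amount_usd": 120.0},
]
web_system = [
    {"customer_id": "c002", "order_date": "01/03/2024", "total": "75.50"},
]


def standardize_sales(row):
    """Map a sales-system row onto the common warehouse layout."""
    return {"customer_id": row["cust"].upper(),
            "date": row["date"],
            "amount": float(row["amount_usd"]),
            "source": "sales"}


def standardize_web(row):
    """Map a web-system row (DD/MM/YYYY dates) onto the common layout."""
    day, month, year = row["order_date"].split("/")
    return {"customer_id": row["customer_id"].upper(),
            "date": f"{year}-{month}-{day}",
            "amount": float(row["total"]),
            "source": "web"}


# A tuple, so rows cannot be added to or removed from the collection.
warehouse = tuple(
    [standardize_sales(r) for r in sales_system]
    + [standardize_web(r) for r in web_system]
)
print(warehouse)
```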

How does a data warehouse differ from a database?

• There are a number of fundamental differences which separate a data warehouse from a
database.
• The biggest difference between them is that most databases place an emphasis on a single
application, and this application will generally be one that is based on transactions.
• If the data is analyzed, it will be done within a single domain.
• In contrast, data warehouses deal with multiple domains simultaneously.

Data Marts:

• Companies often build enterprise-wide data warehouses, where a central data warehouse serves
the entire organization, or they create smaller, decentralized warehouses called data marts.
• A data mart is a subset of a data warehouse in which a summarized or highly focused portion of
the organization’s data is placed in a separate database for a specific population of users.
• For example, a company might develop marketing and sales data marts to deal with customer
information.
• A data mart typically focuses on a single subject area or line of business, so it usually can be
constructed more rapidly and at lower cost than an enterprise-wide data warehouse.

Tools for Business Intelligence: Multidimensional Data Analysis and Data Mining

• Once data have been captured and organized in data warehouses and data marts, they are
available for further analysis using tools for business intelligence.
• Business intelligence tools enable users to analyze data to see new patterns, relationships, and
insights that are useful for guiding decision making.
• Principal tools for business intelligence include software for database querying and reporting,
tools for multidimensional data analysis (online analytical processing), and tools for data mining.

Online analytical processing (OLAP): Multidimensional data analysis

• OLAP supports multidimensional data analysis, enabling users to view the same data in different
ways using multiple dimensions.
• Each aspect of information—product, pricing, cost, region, or time period—represents a different
dimension.
• Multidimensional data models are designed expressly to support data analyses.
• The goal of multidimensional data models is to make analysis simpler and faster for
executives, managers, and business professionals.
• These people are not interested in the overall architecture.
• Suppose your company sells five different products (Laptops, Computers, TVs, Cameras, and
Mobiles) in the East, West, North, and Central regions.
• If you wanted to ask a fairly straightforward question, such as how many Computers were sold in
the last week, you could easily find the answer by querying the sales database.
• But if you wanted to know how many Computers were sold in each of your sales regions and
compare actual results with projected sales, the querying becomes complicated.
• In such a case, OLAP is used.
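
The kind of question OLAP answers can be sketched with a tiny in-memory fact table. The unit figures are invented, and `roll_up` and `slice_by_product` are hypothetical helpers; real OLAP tools work on much larger, pre-aggregated cubes.

```python
from collections import defaultdict

# Fact rows: (product, region, actual units, projected units); made-up data.
facts = [
    ("Computers", "East", 120, 100),
    ("Computers", "West", 80, 90),
    ("Computers", "North", 60, 70),
    ("Laptops", "East", 200, 180),
    ("Laptops", "West", 150, 160),
]


def roll_up(facts, dimension):
    """Sum actual and projected units along one dimension.

    dimension is the column index: 0 for product, 1 for region.
    """
    totals = defaultdict(lambda: [0, 0])
    for row in facts:
        totals[row[dimension]][0] += row[2]
        totals[row[dimension]][1] += row[3]
    return {key: tuple(value) for key, value in totals.items()}


def slice_by_product(facts, product):
    """Keep one product and return actual minus projected units per region."""
    return {region: actual - projected
            for name, region, actual, projected in facts if name == product}


by_region = roll_up(facts, 1)
computer_variance = slice_by_product(facts, "Computers")
print(by_region)
print(computer_variance)
```

The same fact rows are viewed once by region and once by product, which is the sense in which the analysis is multidimensional.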

Data mining

• Data mining refers to extracting or "mining" knowledge, interesting information, or patterns from
large amounts of data.
• Data mining is a process of discovering interesting knowledge from large amounts of data stored
in databases, data warehouses, or other information repositories.
• It is the semi-automatic process of extracting and identifying patterns from stored data.
• A data mining application, or data mining tool, is typically a software interface which interacts
with a large database containing customer or other important data.
• Data mining is widely used by companies and public bodies for purposes such as marketing and the
detection of fraudulent activity. That is, data mining deals with "knowledge discovery in databases".
• There are a wide variety of data mining applications available, particularly for business uses, such
as Customer Relationship Management (CRM).
• These applications enable marketing managers to understand the behaviors of their customers
and also to predict the potential behavior of prospective clients.
• Data mining is a logical process used to search through large amounts of data in order to
find useful data.
• The goal of this technique is to find patterns that were previously unknown.
• Once these patterns are found they can further be used to make certain decisions for
development of their business.

Functions of Data Mining:

The types of information obtainable from data mining include associations, sequences, classifications,
clusters, and forecasts.

• Association: Association is one of the best-known data mining techniques. In association, a pattern
is discovered based on a relationship between items in the same transaction. That is the reason
why the association technique is also known as the relation technique. The association technique is used
in market basket analysis to identify a set of products that customers frequently purchase
together, for instance, books that tend to be bought together. If a customer buys a book, an
online bookstore may suggest other associated books. If a person buys a camera, the system may
suggest accessories that tend to be bought along with cameras.
• In Sequences, events are linked over time. We might find, for example, that if a house is
purchased, a new refrigerator will be purchased within two weeks 65 percent of the time, and an
oven will be bought within one month of the home purchase 45 percent of the time.
• Classification recognizes patterns that describe the group to which an item belongs by examining
existing items that have been classified and by inferring a set of rules. For example, businesses
such as credit card or telephone companies worry about the loss of steady customers.
Classification helps discover the characteristics of customers who are likely to leave and can
provide a model to help managers predict who those customers are so that the managers can
devise special campaigns to retain such customers.
• Clustering works in a manner similar to classification when no groups have yet been defined. A
data mining tool can discover different groupings within data, such as finding affinity groups for
bank cards or partitioning a database into groups of customers based on demographics and types
of personal investments.
• Although these applications involve predictions, Forecasting uses predictions in a different way.
It uses a series of existing values to forecast what other values will be. For example, forecasting
might find patterns in data to help managers estimate the future value of continuous variables,
such as sales figures.
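
Of the functions above, association is the easiest to sketch in code. The example below counts how often pairs of items appear in the same basket and computes a simple confidence value. The transactions are made up, and `pair_support` and `confidence` are hypothetical helper names; production tools use more scalable algorithms.

```python
from collections import Counter
from itertools import combinations

# Each transaction is the set of items bought together (made-up data).
transactions = [
    {"camera", "memory card", "tripod"},
    {"camera", "memory card"},
    {"camera", "bag"},
    {"book", "bookmark"},
]


def pair_support(transactions):
    """Count how many transactions contain each pair of items."""
    counts = Counter()
    for basket in transactions:
        # Sorting gives each pair one canonical (alphabetical) order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def confidence(transactions, antecedent, consequent):
    """Share of baskets with `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in transactions if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


support = pair_support(transactions)
camera_to_card = confidence(transactions, "camera", "memory card")
print(support[("camera", "memory card")], camera_to_card)
```

Here two of the three camera baskets also contain a memory card, so a store might suggest memory cards to camera buyers.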

Text mining

• Text mining is the discovery of patterns and relationships from large sets of unstructured data—
the kind of data we generate in e-mails, phone conversations, blog postings, online customer
surveys, and tweets.
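
A very small sketch of one text mining step, counting recurring terms in free-text comments. The comments and the stop-word list are invented for illustration; real text mining tools go well beyond word counts.

```python
import re
from collections import Counter

# Unstructured customer comments (made-up examples).
comments = [
    "Delivery was late and the box was damaged.",
    "Great product, but delivery was late.",
    "Late delivery again. Very disappointed.",
]

# Common words that carry little meaning on their own (assumed list).
STOP_WORDS = {"was", "and", "the", "but", "very", "again"}


def term_frequencies(texts):
    """Count words across all texts, ignoring case and stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


frequencies = term_frequencies(comments)
top_terms = frequencies.most_common(2)
print(top_terms)
```

The recurring terms point to a pattern (late deliveries) that no single comment shows on its own.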

Web mining

• The discovery and analysis of useful patterns and information from the World Wide Web or simply
web is called web mining.
• Web mining is the application of data mining techniques to extract interesting and potentially useful
knowledge from web data, including web documents, hyperlinks between documents, and usage logs
of web sites.
