Notes - Unit1 - 1
What is BI?
Business Intelligence (BI) is about getting the right information, to the right decision makers,
at the right time.
BI is an enterprise-wide platform that supports reporting, analysis and decision making.
In 1989, Howard Dresner (Gartner Group) defined BI as: “BI is a set of concepts and
methodologies to improve decision making in business through use of facts and fact-based
systems”.
❑ The goal of BI is improved decision making.
Yes, decisions were made earlier too (without BI). The use of BI should lead
to improved decision making.
❑ BI is more than just technologies.
It is a group of concepts and methodologies.
❑ It is fact based.
Decisions are no longer made on gut feeling or purely on a hunch. They have to
be backed by facts.
◼ BI uses a set of processes, technologies, and tools:
-To transform raw data into meaningful information.
◼ BI mines the information:
-To provide knowledge, and uses the knowledge gained to provide beneficial insights.
These insights then lead to impactful decision making, which in turn provides business benefits
such as:
• increased profitability,
• increased productivity,
• reduced cost,
• improved operations, etc.
The transformation of raw data to business benefits through BI may be depicted as:
raw data → information → knowledge → insights → impactful decisions → business benefits
◼ The term Business Intelligence (BI) refers to technologies, applications and practices
for the collection, integration, analysis, and presentation of business information.
◼ The purpose of Business Intelligence is to support better business decision making.
◼ Structured data is organized in semantic chunks (entities) with similar entities grouped
together to form relations or classes.
◼ Entities in the same group have the same descriptions, i.e. attributes.
◼ Descriptions for all entities in a group contains
◼ Data coming from sources such as Access databases, OLTP systems and SQL databases, as well
as spreadsheets such as Excel, is all in structured format.
◼ To summarize, structured data
❑ Consists of fully described data sets.
❑ Has clearly defined categories and sub-categories.
❑ Is placed neatly in rows and columns.
❑ Goes into records, so the database is
regulated by a well-defined structure.
❑ Can be indexed easily, either by the DBMS itself or
manually.
◼ Working with structured data is easy when it comes to Storage, Scalability, Security,
Update and Delete operations:
❑ Storage: Both predefined and user-defined data types help with
the storage of structured data.
❑ Scalability: Scalability is generally not an issue as the
data grows.
❑ Security: Ensuring security is easy.
❑ Update and Delete: Updating, deleting, etc. are easy because of
the structured form.
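The points above (rows and columns, a well-defined structure, easy indexing, easy update and delete) can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table name, column names and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical example table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")

# The schema fixes the attributes and data types of every row up front.
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " department TEXT NOT NULL,"
    " salary REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO employee (emp_id, name, department, salary) VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", "Sales", 52000.0),
        (2, "Ravi", "Finance", 61000.0),
        (3, "Meena", "Sales", 58000.0),
    ],
)

# Indexing: the DBMS can index any declared column.
conn.execute("CREATE INDEX idx_employee_department ON employee (department)")

# Update and delete: rows are addressed through their defined attributes.
conn.execute("UPDATE employee SET salary = 65000.0 WHERE emp_id = 2")
conn.execute("DELETE FROM employee WHERE emp_id = 3")

sales_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employee WHERE department = 'Sales' ORDER BY emp_id"
    )
]
ravi_salary = conn.execute(
    "SELECT salary FROM employee WHERE emp_id = 2"
).fetchone()[0]
print(sales_names, ravi_salary)
```

Because every row follows the same schema, the update, the delete and the indexed lookup each need only a single statement.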
3. Classification/Taxonomy:
• Taxonomy is classifying data on the basis of the relationships that exist between
data.
• Data can be arranged in groups and placed in hierarchies based on the taxonomy
prevalent in an organization. However, classifying unstructured data is difficult as
identifying relationships between data is not an easy task.
• In the absence of any structure or metadata or schema, identifying accurate
relationships and classifying is not easy.
• Since the data is unstructured, naming conventions or standards are not consistent
across an organization, thus making it difficult to classify data.
• Storage space: It is difficult to store and manage unstructured data. A lot of space
is required to store such data. It is difficult to store images, videos, audios, etc.
• Scalability: As the data grows, scalability becomes an issue and the cost of storing
such data grows.
• Retrieve information: Even if unstructured data is stored, it is difficult to retrieve
and recover information from it.
• Security: Ensuring security is difficult due to varied sources of data.
• Update and delete: Updating and deleting unstructured data are difficult because
there is no clear structure.
• Indexing and searching: Indexing unstructured data is difficult and error-prone as
the structure is not clear and attributes are not pre-defined. As a result, the search
results are not very accurate. Indexing becomes all the more difficult as the
volume of data grows.
Solutions to Storage Challenges of Unstructured Data
A few possible solutions are described below:
◼ Changing format:
❑ Unstructured data may be converted to formats which are easily managed,
stored and searched.
❑ For example, IBM is working on providing a solution which will convert audio,
video, etc. to text.
◼ Developing new hardware:
❑ While unstructured data such as a video or image file cannot be stored
neatly in a relational column, there is no such problem when it comes to
storing its metadata, such as the date and time of its creation, the owner or
author of the data, etc.
◼ Storing in XML (eXtensible Markup Language) format:
❑ Unstructured data may be stored in XML format which tries to give some
structure to it by using tags and elements.
◼ CAS (Content Addressable Storage):
❑ It organizes files based on their metadata and assigns a unique name to every
object stored in it.
❑ The object is retrieved based on its content and not its location.
❑ It is used extensively to store emails, etc.
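The idea behind content addressable storage, retrieving an object by its content rather than its location, can be sketched as follows. This is a toy in-memory illustration, not how any particular CAS product works: the class name, the use of SHA-256 as the unique name, and the sample metadata fields are all assumptions made for the example.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory store: each object is named by a digest of its bytes."""

    def __init__(self):
        self._objects = {}   # address -> content
        self._metadata = {}  # address -> metadata recorded at first store

    def put(self, content: bytes, **metadata) -> str:
        # The unique name is derived from the content itself (assumed: SHA-256).
        address = hashlib.sha256(content).hexdigest()
        if address not in self._objects:
            self._objects[address] = content
            self._metadata[address] = metadata
        return address

    def get(self, address: str) -> bytes:
        # Retrieval needs only the content-derived address, not a file path.
        return self._objects[address]

    def describe(self, address: str) -> dict:
        return self._metadata[address]


store = ContentAddressableStore()
addr_first = store.put(b"Quarterly report draft", owner="asha", kind="email")
addr_again = store.put(b"Quarterly report draft", owner="ravi", kind="email")
addr_other = store.put(b"Meeting notes", owner="ravi", kind="email")

print(addr_first == addr_again)  # identical content maps to the same address
print(store.get(addr_other))
```

One consequence of this design is that storing the same content twice does not create a second copy, which is one reason the approach suits archives such as email.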
• Schemas: These can be used to describe the structure of data. Schemas define the
constraints, content of the document, etc. The problem with schemas is that
requirements are ever changing, and the changes required in data also lead to changes
in schema.
• Graph-based data models: These can be used to describe data. This is a schema-less
approach and is also known as self-describing, as data is presented in such a way that
it explains itself. The relationships and hierarchies are represented in the form of a
tree-like structure where the vertices contain the object or entity and the leaves
contain data.
• XML: This is widely used to store and exchange semi-structured data. It allows the
user to define tags to store data in hierarchical or nested forms.
Schemas in XML are not tightly coupled to data.
• XML is widely used to store and exchange semi-structured data. It allows the user to
define tags and attributes to store the data in hierarchical form.
Schema and data are not tightly coupled in XML.
• Object Exchange Model (OEM) can be used to store and exchange semi-structured
data. OEM structures data in the form of a graph.
• RDBMS can be used to store the data by mapping the data to a relational schema and
then mapping it to a table.
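As a minimal sketch of the last two storage options, the example below parses a small XML document with user-defined tags and maps it to a relational table. The tag names, the table layout and the sample records are hypothetical; the mapping shown (one column per possible tag, with NULL where a tag is absent) is only one of several possible choices.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical semi-structured input: the two records do not share all tags.
xml_text = """
<customers>
  <customer id="1">
    <name>Asha</name>
    <email>asha@example.com</email>
  </customer>
  <customer id="2">
    <name>Ravi</name>
    <phone>555-0100</phone>
  </customer>
</customers>
""".strip()

root = ET.fromstring(xml_text)

# Map each <customer> element to one row of a fixed relational schema.
rows = []
for customer in root.findall("customer"):
    rows.append(
        (
            int(customer.get("id")),
            customer.findtext("name"),
            customer.findtext("email"),  # None when the tag is absent
            customer.findtext("phone"),  # None when the tag is absent
        )
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)"
)
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", rows)

stored = conn.execute(
    "SELECT id, name, email, phone FROM customer ORDER BY id"
).fetchall()
print(stored)
```

The missing tags surface as NULL columns, which shows the cost of forcing loosely structured data into a fixed relational schema.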
How to Extract Information from Semi-Structured Data?
Challenges faced:
Possible solutions:
◼ Indexing: Indexing data in a graph-based model enables quick search.
◼ OEM: This data modeling technique allows for the data to be stored in a graph-based
data model which is easier to index and search.
◼ XML: It allows data to be arranged in a hierarchical or tree-like structure which
enables indexing and searching.
◼ Mining tools: Various mining tools are available which search data based on graphs,
schemas, structures, etc.
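The indexing idea above can be sketched for self-describing, tree-like data. In the example below, nested Python dictionaries and lists stand in for a graph-based (OEM-style) model, and the index maps each label path to the leaf values found under it. The data, the function name `build_path_index` and the path notation are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical self-describing data: labels sit on the edges, values on the leaves.
data = {
    "library": {
        "book": [
            {"title": "Data Basics", "author": "Asha", "year": 2019},
            {"title": "Graph Notes", "author": "Ravi"},  # no "year" label
        ]
    }
}


def build_path_index(node, path=(), index=None):
    """Walk the tree once and record every leaf value under its label path."""
    if index is None:
        index = defaultdict(list)
    if isinstance(node, dict):
        for label, child in node.items():
            build_path_index(child, path + (label,), index)
    elif isinstance(node, list):
        # Repeated children share the label path of their parent edge.
        for child in node:
            build_path_index(child, path, index)
    else:
        index["/".join(path)].append(node)
    return index


index = build_path_index(data)
titles = index["library/book/title"]
years = index["library/book/year"]
print(titles, years)
```

Once the index is built, a search by label path is a single dictionary lookup instead of a walk over the whole structure.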
• DTDs (Document Type Definitions) provide partial schemas for XML documents.