Database Data: Definition - Unstructured Data Is A Generic Label For Describing Any Corporate Information That Is Not
DEFINITION -
Unstructured data is a generic label for describing any corporate information that is not
in a database. Unstructured data can be textual or non-textual. Textual unstructured data is
generated in media like email messages, PowerPoint presentations, Word documents,
collaboration software and instant messages. Non-textual unstructured data is generated in
media like JPEG images, MP3 audio files and Flash video files.
If left unmanaged, the sheer volume of unstructured data that’s generated each year within an
enterprise can be costly in terms of storage. Unmanaged data can also pose a liability if
information cannot be located in the event of a compliance audit or lawsuit. The information
contained in unstructured data is not always easy to locate. It requires that data in both electronic
and hard copy documents and other media be scanned so a search application can parse out
concepts based on words used in specific contexts. This is called semantic search. It is also
referred to as enterprise search.
In customer-facing businesses, the information contained in unstructured data can be analyzed to
improve customer relationship management and relationship marketing. As social media
applications like Twitter and Facebook go mainstream, the growth of unstructured data is
expected to far outpace the growth of structured data.
reports, documents, PowerPoint presentations and e-mail. Identifying customer complaints,
benchmarking marketing campaigns and identifying insurance claims fraud are just a few of the areas that
can tie the analysis of unstructured data to organizational profit. If general estimates of organizational
unstructured data are correct (between 53 and 85 percent), then organizations that capture only structured
data for analysis are missing potential opportunities for performance optimization, because they are using
less than half of their information resources.
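The "less than half" conclusion follows directly from the quoted range: structured data is whatever share is not unstructured. A minimal Python check of that arithmetic, using only the two percentages quoted above:

```python
# Share of organizational data estimated to be unstructured (percent),
# taken from the 53-85 percent range quoted in the text.
UNSTRUCTURED_LOW, UNSTRUCTURED_HIGH = 53, 85

def structured_share(unstructured_pct: int) -> int:
    """Structured data is the remainder of the 100 percent."""
    return 100 - unstructured_pct

# A higher unstructured estimate leaves a smaller structured share.
best_case = structured_share(UNSTRUCTURED_LOW)    # structured share at 53% unstructured
worst_case = structured_share(UNSTRUCTURED_HIGH)  # structured share at 85% unstructured

print(f"Structured-only analysis covers {worst_case}-{best_case}% of the data")
```

Even at the low end of the estimate, the structured share stays below 50 percent.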
Business examples of how the analysis of unstructured data benefits organizations abound. Call centers
provide a good example of how unstructured data can be leveraged to improve internal processes and
performance. Customer complaints may be monitored, but the type of complaint and levels of customer
satisfaction may not be. Freeform text fields within CRM applications can provide decision makers the
information they require to identify trends in customer dissatisfaction and recurring issues. These can
then be used to enhance the overall customer experience, thereby increasing satisfaction and reducing
customer churn rates. In general, freeform text within any organizational software application can be
analyzed to identify trend-based information to help identify areas for improvement.
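As a rough illustration of turning freeform CRM text into trend information, the Python sketch below tags call-center notes with complaint categories by keyword matching and counts them per month. The category names, keywords and sample notes are hypothetical, and keyword lists are only a stand-in for the richer language processing that commercial tools apply.

```python
from collections import Counter, defaultdict

# Hypothetical complaint categories and the keywords that signal them.
CATEGORY_KEYWORDS = {
    "billing": {"invoice", "charged", "refund", "bill"},
    "delivery": {"late", "delayed", "shipping"},
    "product_quality": {"broken", "defective", "cracked", "faulty"},
}

def categorize(note: str) -> set[str]:
    """Return every category whose keywords appear in a freeform note."""
    words = {w.strip(".,!?;:").lower() for w in note.split()}
    return {cat for cat, keywords in CATEGORY_KEYWORDS.items() if words & keywords}

def complaint_trends(notes: list[tuple[str, str]]) -> dict[str, Counter]:
    """Count complaint categories per month from (month, note) pairs."""
    trends: dict[str, Counter] = defaultdict(Counter)
    for month, note in notes:
        trends[month].update(categorize(note))  # each category counted once per note
    return dict(trends)

notes = [
    ("2009-01", "Customer was charged twice, wants a refund."),
    ("2009-01", "Package came late and the lid was cracked."),
    ("2009-02", "Delayed shipping again, second time this quarter."),
    ("2009-02", "Invoice total does not match the order."),
    ("2009-02", "Shipping took three weeks, very late."),
    ("2009-02", "Item arrived broken."),
]
trends = complaint_trends(notes)
for month in sorted(trends):
    print(month, trends[month].most_common())
```

A rising count for one category from month to month is the kind of recurring issue the paragraph above describes.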
Conclusion
The importance of unstructured data to the world of BI and BPM cannot be overstated. As
organizations compete to maintain competitive advantage, the way they leverage their data becomes key.
Identifying customer complaints, detecting quality issues in manufacturing and benchmarking against
competitors’ marketing campaigns are just a few of the applications of unstructured data analysis. They
mark the start of a transition toward analysis tools optimized to help drive organizational performance.
STRUCTURED
Structured information means information that has traditionally been classed as a report. Here,
data, characteristics, key figures, assignments and other attributes are presented in table or
diagram form. Trees, grids and other graphics are also common. These structures enable
diverse analyses. Reports do not always have to be created individually; a large proportion of
report creation can be automated. Searching, sorting, filtering, highlighting and exception
reporting can be applied as desired to individual attributes.
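Because structured report data has named attributes, the operations listed above map directly onto simple code. The Python sketch below filters, sorts and flags exceptions in a small table of report rows; the field names and the 10 percent shortfall threshold are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ReportRow:
    region: str
    product: str
    revenue: float
    target: float

    @property
    def variance(self) -> float:
        """Relative deviation from target (negative means below target)."""
        return (self.revenue - self.target) / self.target

rows = [
    ReportRow("North", "Widgets", 120_000, 100_000),
    ReportRow("South", "Widgets", 80_000, 100_000),
    ReportRow("North", "Gadgets", 95_000, 100_000),
    ReportRow("West", "Gadgets", 60_000, 50_000),
]

# Filter on one attribute: a single product line.
gadgets = [r for r in rows if r.product == "Gadgets"]

# Sort on another attribute: highest revenue first.
by_revenue = sorted(rows, key=lambda r: r.revenue, reverse=True)

# Exceptions: rows missing target by more than a hypothetical 10 percent.
EXCEPTION_THRESHOLD = -0.10
exceptions = [r for r in rows if r.variance < EXCEPTION_THRESHOLD]

# Highlight the exceptions in the sorted listing.
for r in by_revenue:
    flag = "  <-- exception" if r in exceptions else ""
    print(f"{r.region:<6} {r.product:<8} {r.revenue:>10,.0f} {r.variance:+.0%}{flag}")
```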
UNSTRUCTURED
In contrast, documents that contain body text, pictures or films, for example, are unstructured
information. They are often stored in different ways and are created individually and manually rather
than automatically. Apart from searches on attributes in the document master record or on document
folders, searching them usually means a free-text search against a text index created with a
special indexing program.
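A free-text search of this kind usually relies on an inverted index that maps each term to the documents containing it. The Python sketch below builds such an index and answers AND queries. The tokenizer is deliberately naive (lowercasing and splitting on non-alphanumeric characters) and the sample documents are hypothetical; real indexing programs add stemming, stop-word handling and relevance ranking.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Naive tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents that contain every query term (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))  # copy so the index is not mutated
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

docs = {
    "memo-17": "Quarterly sales review for the northern region",
    "minutes-03": "Minutes: sales team discussed pricing in the southern region",
    "spec-42": "Product specification for the new pricing engine",
}
index = build_index(docs)
print(search(index, "sales pricing"))
```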
information about the displayed data. Here the transition is blurred, and it is unclear what has the
higher status in a concrete document: the body text or the tables and diagrams.
Information is often not useful on its own; it needs to be explained by its context. This is, of course,
also true for the interaction of structured and unstructured information. Therefore, here too, working
with both types of information should be as similar as possible.
With information exchange, structured reports, like other documents, also have to be
exchanged and distributed, and users have to be able to add comments afterwards, not only on
paper. Here it is also important that comments can be attached to specific places within the
document. In addition, comments that have already been made in the data source should
also be accessible without obscuring the view of current data.
This means that reports and (in the case of report creation) their fundamental components must be
searchable by attributes and parts as well as by their description. A data source, for
example, can be searched for by its fields or attributes, such as author, as well as by its
descriptive text.
Lastly, it must be said that searching for documents by document master record attributes, rather than
through a text index, can be seen as master data reporting.
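To make the distinction concrete, the Python sketch below treats document master records as ordinary structured rows and searches them by attribute, optionally combined with a case-insensitive match on the descriptive text. No text index is involved. The record fields (author, doc_type, description) are hypothetical examples of master record attributes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DocumentMasterRecord:
    doc_id: str
    author: str
    doc_type: str
    description: str

def find_documents(records: list[DocumentMasterRecord],
                   text: Optional[str] = None,
                   **attributes: str) -> list[str]:
    """Return IDs of records whose attributes match exactly and, if `text`
    is given, whose description contains it (case-insensitive)."""
    hits = []
    for rec in records:
        if any(getattr(rec, name) != value for name, value in attributes.items()):
            continue
        if text is not None and text.lower() not in rec.description.lower():
            continue
        hits.append(rec.doc_id)
    return hits

records = [
    DocumentMasterRecord("D1", "Meyer", "report", "Monthly revenue by region"),
    DocumentMasterRecord("D2", "Meyer", "memo", "Comments on revenue forecast"),
    DocumentMasterRecord("D3", "Klein", "report", "Headcount by department"),
]
print(find_documents(records, doc_type="report", text="revenue"))
```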
Q2. Discuss the problems with unstructured data. How is it dealt with?
Take for example the statements "Jim rode in his Mustang" and "Jim rode on his mustang."
There is little difference in the wording but a vast difference in meaning. A human would correctly
recognize that one is talking about a car and the other a horse. He would also know that the first
sentence must have taken place in the last forty years, since Ford started selling Mustangs in 1964,
and odds are that it occurred on a paved road. The other sentence is more likely to have happened on
a dirt path in the western United States in the latter half of the nineteenth century. It is also
highly probable that Jim is an adult male. We also recognize that the sentence might contain a typo
(the "i" and "o" keys are right next to each other on the keyboard) or that the writer might simply
have bad grammar. By reading other sentences in the same document, we convert these probabilities
into certainties.
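One classic way software approximates this use of context is to compare the words around an ambiguous term with a short "signature" for each candidate meaning, a simplified form of the Lesk algorithm. The Python sketch below resolves "mustang" as car or horse that way. The two signature word lists are hand-written, hypothetical examples; real systems derive them from dictionaries or training data.

```python
import re
from typing import Optional

# Hypothetical sense signatures: words that tend to co-occur with each meaning.
SENSE_SIGNATURES = {
    "car": {"ford", "drove", "engine", "road", "highway", "gas", "parked", "garage"},
    "horse": {"saddle", "ranch", "trail", "hooves", "reins", "cattle", "stable", "gallop"},
}

def disambiguate(context: str) -> Optional[str]:
    """Pick the sense whose signature overlaps most with the context words.
    Returns None when no signature word appears or the top senses tie."""
    words = set(re.findall(r"[a-z]+", context.lower()))
    scores = {sense: len(words & sig) for sense, sig in SENSE_SIGNATURES.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if best[1] == 0 or best[1] == runner_up[1]:
        return None  # the context is not enough to decide
    return best[0]

print(disambiguate("Jim rode in his Mustang."))
print(disambiguate("Jim rode on his mustang. He loosened the saddle and "
                   "led it down the trail to the ranch."))
```

The single sentence stays ambiguous for this sketch; only the surrounding sentences tip the balance, mirroring how a human reader converts probabilities into certainties.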
We constantly engage in these types of analyses and decisions when we speak or read. They are
simple and fast for a human. But there is one problem.
"Humans are better than computers when it comes to less structured data," says Gartner, Inc.
(Stamford, CT) research vice president Alexander Linden. "The problem with humans is that they can't
scale well for large masses of data."
To overcome this scaling problem, companies such as ClearForest Corporation (New York,
N.Y.), Inxight Software, Inc. (Sunnyvale, Cal.), Megaputer Intelligence Inc. (Bloomington, Ind.)
and SPSS Inc. (Chicago, Ill.) have created products to analyze vast quantities of text information and
convert it into actionable intelligence.
The first step typically involves applying "natural language processing" algorithms, which
determine the meaning of sentences by taking into account context, grammar, synonyms and
colloquialisms. The software can then categorize the documents and group similar ones. Some tools allow
extraction of certain types of data, such as all company names or cities. Others present the information
in graphic form, making it easier to spot relationships.
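Two of the steps described above, extracting a particular type of data and grouping similar documents, can be sketched in a few lines of Python. This example uses a hypothetical gazetteer (a fixed list of single-word city names) for extraction and cosine similarity over word counts, with an assumed threshold of 0.3, for grouping. Commercial tools use trained language models and much larger reference lists.

```python
import math
import re
from collections import Counter

KNOWN_CITIES = {"chicago", "bloomington", "sunnyvale", "stamford"}  # hypothetical gazetteer

def words(text: str) -> list[str]:
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def extract_cities(text: str) -> set[str]:
    """Gazetteer lookup: return known city names mentioned in the text."""
    return {w.title() for w in words(text) if w in KNOWN_CITIES}

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def group_similar(docs: dict[str, str], threshold: float = 0.3) -> list[set[str]]:
    """Greedy grouping: a document joins the first group whose seed document is
    at least `threshold` similar to it; otherwise it starts a new group."""
    vectors = {doc_id: Counter(words(text)) for doc_id, text in docs.items()}
    groups: list[tuple[str, set[str]]] = []  # (seed document ID, member IDs)
    for doc_id, vec in vectors.items():
        for seed, members in groups:
            if cosine(vec, vectors[seed]) >= threshold:
                members.add(doc_id)
                break
        else:
            groups.append((doc_id, {doc_id}))
    return [members for _, members in groups]

reports = {
    "r1": "engine failure reported after landing gear inspection",
    "r2": "landing gear inspection found hydraulic leak",
    "r3": "passenger complaint about delayed meal service",
}
print(group_similar(reports))
print(extract_cities("Megaputer is based in Bloomington, and SPSS in Chicago."))
```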
Although this technology is still fairly new and not yet as accurate as traditional data
mining, its use is expanding. Dow Chemical, for example, is using it to conduct patent searches, and
manufacturers are using it to mine call center reports for common complaints. The Global Aviation
Information Network (GAIN), an international consortium of airlines, government agencies and
manufacturers, is developing tools to gather data from mechanic, pilot and flight attendant reports to
spot common mechanical problems so they can be corrected before they cause a disaster, a far better
method than sorting through airplane wreckage to try to determine the cause.
"We are trying to get smarter by looking at events that happen relatively frequently, but are
innocuous by themselves because of the robustness of the systems," says Christopher Hart, Assistant
Administrator for System Safety at the Federal Aviation Administration. "But if they are part of the
links in an accident chain, we are trying to stop those links before they cause an accident."
BI applications include the activities of decision support systems, query and reporting,
online analytical processing (OLAP), statistical analysis, forecasting, and data mining.
of data to keep the management informed about the state of their business. Other BI tools are used to
store and analyze data, such as data mining and data warehouses; decision support systems and
forecasting; document warehouses and document management; knowledge management; mapping,
information visualization and dashboarding; management information systems and geographic
information systems; trend analysis; and Software as a Service (SaaS).
Journal of Theoretical and Applied Information Technology
© 2005 - 2009 JATIT. All rights reserved. www.jatit.org
A data mart, as described by Inmon (1999), is a collection of subject areas organized for decision
support based on the needs of a given department. Finance has its data mart, marketing has its own,
sales has its own, and so on. And the data mart for marketing only faintly resembles anyone else's
data mart. Perhaps most importantly (Inmon, 1999), the individual departments own the hardware,
software, data and programs that constitute the data mart. Each department has its own
interpretation of what a data mart should look like, and each department's data mart is peculiar and
specific to its own needs. Similar to data warehouses, data marts contain operational data that helps
business experts strategize based on analyses of past trends and experiences. The key difference is
that the creation of a data mart is predicated on a specific, predefined need for a certain grouping
and configuration of select data. There can be multiple data marts inside an enterprise. A data mart
can support a particular business function, business process or business unit.
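The idea that each data mart holds a predefined grouping and configuration of select data for one department can be illustrated with a small Python sketch. The source rows, column names and department definitions are hypothetical, and in practice a data mart is usually a separate database or schema loaded by an ETL process rather than an in-memory filter.

```python
# Hypothetical operational source rows shared by all departments.
SOURCE_ROWS = [
    {"date": "2009-01-05", "region": "East", "product": "A",
     "revenue": 500.0, "cost": 300.0, "campaign": "Winter"},
    {"date": "2009-01-06", "region": "West", "product": "B",
     "revenue": 200.0, "cost": 150.0, "campaign": None},
    {"date": "2009-01-07", "region": "East", "product": "B",
     "revenue": 350.0, "cost": 200.0, "campaign": "Winter"},
]

# Each department predefines the columns it needs and which rows qualify.
MART_DEFINITIONS = {
    "finance": {
        "columns": ["date", "region", "revenue", "cost"],
        "where": lambda row: True,
    },
    "marketing": {
        "columns": ["date", "product", "campaign", "revenue"],
        "where": lambda row: row["campaign"] is not None,
    },
}

def build_mart(department: str) -> list[dict]:
    """Filter and project the source rows according to one department's definition."""
    spec = MART_DEFINITIONS[department]
    return [
        {col: row[col] for col in spec["columns"]}
        for row in SOURCE_ROWS
        if spec["where"](row)
    ]

finance_mart = build_mart("finance")
marketing_mart = build_mart("marketing")
print(marketing_mart)
```

The two marts share a source yet already "only faintly resemble" each other, which is the point Inmon makes about departmental ownership.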
BI tools are widely accepted as a new middleware between transactional applications and decision
support applications, thereby decoupling systems tailored to the efficient handling of business
transactions from systems tailored to the efficient support of business decisions. The capabilities of
BI include decision support, online analytical processing, statistical analysis, forecasting and data
mining. The following are the major components that constitute BI.
Data Sources
Data sources can be operational databases, historical data, external data (for example, from market
research companies or from the Internet) or information from the already existing data warehouse
environment. The data sources can be relational databases or any other data structure that supports
line-of-business applications. They can also reside on many different platforms and can contain
structured information, such as tables or spreadsheets, or unstructured information, such as
plain-text files or pictures and other multimedia information.
Ans: ISSUES IN BI
Ans: A framework for implementing Business Intelligence (BI) solutions at an organization
level is provided. The framework includes
1) an architecture module,
2) a process module,
3) an operations and support module,
4) a governance module, and
5) a delivery module.
The architecture module includes an execution architecture, an operations architecture
and a development architecture. The process module manages processes implemented by
Business Units (BUs) to provide BI solutions. The operations and support module manages the
infrastructural requirements of the framework. The governance module manages the
organizational requirements related to the implementation of the framework. The delivery
module manages the delivery of the processed business data to the BUs. The process module, the
operations and support module, the governance module and the delivery module are
implemented based on the architecture module. The framework is used as a Business
Intelligence Competency Center (BICC) for providing real-time BI solutions to BUs.
Business Benefits
The payback achieved by building the business intelligence infrastructure is a function of how efficiently it
operates, how well the infrastructure is supported and enhanced by the business organization as well as its capacity
for producing business insight from raw operational data. The business intelligence infrastructure delivers key
information to business users. For maximum impact, standards and procedures must be in place to provide key
business information proactively. This business intelligence infrastructure enables the organization to unlock the
information from the legacy systems, to integrate data across the enterprise and empower business users to become
information self-sufficient.
Providing managers and knowledge workers with new tools allowing them to see data in new ways
empowers them to make faster and better decisions. Rather than responding to a continuous stream of report requests,
the business intelligence platform provides business users self-service decision support via the Web or at the
desktop. The quantifiable benefits of providing such a business intelligence platform are decisions which increase
revenue by identifying and creating up-sell and cross-sell opportunities, improve "valued customer" profitability,
decrease costs or expenses by leveraging infrastructure and automating processes, decrease investment in assets
such as inventory, or improve productivity with better decision making and faster response to market changes or
other business events.
The following sections examine more closely each of the layers of the business intelligence infrastructure.
Data Integration. Based on the overall requirements of business intelligence, the data integration layer is required to
extract, cleanse and transform data into load files for the information warehouse. This layer begins with transaction-
level operational data and meta data about these operational systems. Typically this data integration is done using a
relational staging database and utilizing flat file extracts from source systems. The product of a good data staging
layer is high-quality data, a reusable infrastructure and meta data supporting both business and technical users.
Improved data quality can entail matching against third-party name/address/location databases, merging information
from disparate sources into the same information structure and eliminating duplicate, null and outlier values. Building
an efficient data integration process is a key component to delivering powerful business intelligence solutions. Often
this infrastructure needs to bring data across on a daily basis. In order to accomplish this, the processes need to be
efficient and automated. Operator alerts should automatically be generated when exception conditions occur. This
layer will generate significant meta data that must be captured and leveraged in the other layers to ensure proper
delivery, support and guidance to both system administrators and business intelligence users.
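The cleansing tasks named above (eliminating duplicate, null and outlier values) and the automatically generated operator alerts can be sketched as a single staging step. This Python example is a minimal illustration under stated assumptions: the record layout is hypothetical, duplicates are detected by a business key, and outliers use a median-absolute-deviation rule with an assumed cutoff of k = 5. The text does not prescribe any particular outlier method.

```python
import statistics

def cleanse(rows: list[dict], key: str = "order_id", value: str = "amount",
            k: float = 5.0) -> tuple[list[dict], list[str]]:
    """Return (clean_rows, alerts) after three hypothetical staging rules:
    1. drop rows whose business key was already seen (duplicates),
    2. drop rows containing any null (None) field,
    3. drop rows whose value lies more than k median absolute deviations
       from the median (outliers)."""
    alerts: list[str] = []

    seen, deduped = set(), []
    for row in rows:
        if row[key] in seen:
            alerts.append(f"duplicate {key}={row[key]} dropped")
            continue
        seen.add(row[key])
        deduped.append(row)

    complete = []
    for row in deduped:
        if any(v is None for v in row.values()):
            alerts.append(f"null field in {key}={row[key]} dropped")
        else:
            complete.append(row)
    if not complete:
        return [], alerts

    values = [row[value] for row in complete]
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    clean = []
    for row in complete:
        # If mad is 0 the values are too uniform for this rule, so keep everything.
        if mad and abs(row[value] - median) > k * mad:
            alerts.append(f"outlier {value}={row[value]} in {key}={row[key]} dropped")
        else:
            clean.append(row)
    return clean, alerts

raw = [
    {"order_id": 1, "customer": "ACME", "amount": 120.0},
    {"order_id": 2, "customer": "Globex", "amount": 95.0},
    {"order_id": 2, "customer": "Globex", "amount": 95.0},
    {"order_id": 3, "customer": None, "amount": 110.0},
    {"order_id": 4, "customer": "Initech", "amount": 105.0},
    {"order_id": 5, "customer": "Umbrella", "amount": 98000.0},
    {"order_id": 6, "customer": "Hooli", "amount": 130.0},
]
clean, alerts = cleanse(raw)
for msg in alerts:
    print("OPERATOR ALERT:", msg)
```

Returning the alerts rather than printing them inside the function keeps the routing decision (console, e-mail, pager) with the scheduler that runs the load.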
Distinguishing characteristics of business intelligence versus operational systems information are primarily along the
lines of purpose, data content, usage and response time requirements. The purpose of operational systems is to
support the day-to-day business processes instead of supporting long-term strategic decision-making which is the
purpose of business intelligence. Data content for operational systems is current and has real-time values as
opposed to historical data that is accurate as of a point in time for business intelligence. Operational data is highly
structured and repetitive, whereas business intelligence is highly unstructured, heuristic or analytical. Operational
support requires information within seconds, whereas business intelligence response time requirements range from
seconds to minutes.
Information Warehouse. The information warehouse layer consists of relational and/or OLAP cube services that
allow business users to gain insight into their areas of responsibility in the organization. Important in the warehouse
design is the definition of databases that provide information on conformed dimensions, or business variables that are
true across the whole enterprise. The information warehouse is usually developed incrementally over time and is
architected to include key business variables and business metrics in a structure that answers all the business analysis
questions required by the business groups. A common practice is to architect the information warehouse into a
sequence of data marts that can be developed within 90 days. The essential responsibility of the first data mart is to
build out the conformed dimensions for the enterprise and the business facts for a specific subject area of the
business. The infrastructure established during that initial data mart effort can be leveraged in subsequent data mart
development efforts.
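A conformed dimension is defined once and shared, with the same keys and attributes, by every data mart that uses it. The sketch below uses Python's built-in sqlite3 module to build one customer dimension shared by two hypothetical fact tables (sales and service calls), then compares both subject areas by the same customer attribute. All table names, column names and values are illustrative assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, segment TEXT);
CREATE TABLE fact_sales (customer_key INTEGER REFERENCES dim_customer, amount REAL);
CREATE TABLE fact_service_calls (customer_key INTEGER REFERENCES dim_customer, minutes REAL);

INSERT INTO dim_customer VALUES (1, 'ACME', 'Enterprise'), (2, 'Globex', 'SMB'), (3, 'Initech', 'SMB');
INSERT INTO fact_sales VALUES (1, 1000), (1, 500), (2, 300), (3, 200);
INSERT INTO fact_service_calls VALUES (1, 30), (2, 45), (2, 15), (3, 10);
""")

# Drill across: aggregate each fact table by the shared (conformed) segment
# attribute, then join the two result sets on that attribute.
query = """
SELECT s.segment, s.revenue, c.support_minutes
FROM (SELECT d.segment, SUM(f.amount) AS revenue
      FROM fact_sales f JOIN dim_customer d USING (customer_key)
      GROUP BY d.segment) AS s
JOIN (SELECT d.segment, SUM(f.minutes) AS support_minutes
      FROM fact_service_calls f JOIN dim_customer d USING (customer_key)
      GROUP BY d.segment) AS c
  ON s.segment = c.segment
ORDER BY s.segment
"""
rows = con.execute(query).fetchall()
for segment, revenue, minutes in rows:
    print(f"{segment:<10} revenue={revenue:>7.0f} support_minutes={minutes:>4.0f}")
```

Because both marts use the same dim_customer keys, the comparison needs no translation between departmental definitions of "customer".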
In order to architect this information warehouse layer correctly, the business requirements and key business
questions need to be defined. When this information is available, there will be additional insight into the business
derived from the underlying data that cannot be fully anticipated before the data is actually available. Key areas to
consider in defining requirements relate to the major functional areas of the organization. There are a few key
categories of business intelligence that should be considered: customer, operational and clickstream intelligence.
Customer Intelligence relates to customer, service, sales and marketing information viewed along time periods,
location/geography, product and customer variables. Business decisions that can be supported with customer
intelligence range from pricing, forecasting, promotion strategy, competitive analysis to up-sell strategy and customer
service resource allocation.
Operational Intelligence relates to finance, operations, manufacturing, distribution, logistics and human resource
information viewed along time periods, location/geography, product, project, supplier, carrier and employee. Business
decisions that can be supported with operational intelligence include budgeting, investment, pricing, hiring, training,
promotion, cost control, scheduling, service levels, defect prevention and capacity planning.
Clickstream Intelligence relates to Web sessions, sales and service information viewed along time periods,
products, customer, Web pages and request type. Business decisions that can be supported with clickstream
intelligence include website optimization, ad space pricing, promotion communication and up-sell strategy.
Automating the warehouse administration is essential for shortening the cycle time for bringing business intelligence
updates into the information warehouse. Features of an automated warehouse administration include operator alerts
for exceptions, automated exception handling based on predefined business rules, DBA alerts for key service
outages and a production schedule of tasks run according to business requirements.
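The sketch below illustrates automated exception handling based on predefined business rules: each rule pairs a condition on a load job's result with an action taken automatically, including an operator alert for late jobs. The job fields, rule names, the 5 percent reject limit and the actions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class LoadResult:
    job: str
    rows_loaded: int
    rows_rejected: int
    finished_on_time: bool

@dataclass
class Rule:
    name: str
    applies: Callable[[LoadResult], bool]
    action: str  # what the scheduler does automatically when the rule fires

# Hypothetical predefined business rules, checked in order.
RULES = [
    Rule("empty load", lambda r: r.rows_loaded == 0, "rerun job"),
    Rule("high reject rate",
         lambda r: r.rows_rejected > 0.05 * (r.rows_loaded + r.rows_rejected),
         "quarantine rejects and notify data steward"),
    Rule("late finish", lambda r: not r.finished_on_time, "alert operator"),
]

def handle(result: LoadResult) -> list[str]:
    """Return the automatic actions triggered by one job result."""
    actions = [f"{rule.name}: {rule.action}" for rule in RULES if rule.applies(result)]
    return actions or ["ok: no exceptions"]

print(handle(LoadResult("returns_nightly", 900, 100, False)))
```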
BI Applications. The most visible layer of the business intelligence infrastructure is the applications layer which
delivers the information to business users. Business intelligence requirements include scheduled report generation
and distribution, query and analysis capabilities to pursue special investigations and graphical analysis permitting
trend identification. This layer should enable business users to interact with the information to gain new insight into
the underlying business variables to support business decisions. Another important application is the balanced
scorecard that displays key performance indicator current values and the targets for financial, customer, internal
systems and human capital categories. The balanced scorecard is a summary of key business analytics rolled up to
the appropriate level for the user with capabilities to drill down into more detail. This detail relates to operational,
customer and clickstream intelligence described in the previous section.
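A scorecard of this kind compares each KPI's current value with its target, rolls the detail up to the level the user sees and keeps the detail available for drill-down. The Python sketch below assumes a hypothetical KPI (on-time delivery rate) reported by region, with the rolled-up value computed as total on-time orders over total orders; the target and figures are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class KpiDetail:
    region: str
    on_time: int  # orders delivered on time
    orders: int   # total orders

TARGET = 0.95  # hypothetical target for the on-time delivery rate

details = [
    KpiDetail("North", 940, 1000),
    KpiDetail("South", 470, 500),
    KpiDetail("West", 1960, 2000),
]

def rate(on_time: int, orders: int) -> float:
    return on_time / orders if orders else 0.0

def rollup(rows: list[KpiDetail]) -> float:
    """Summary value: total on-time orders divided by total orders."""
    return rate(sum(r.on_time for r in rows), sum(r.orders for r in rows))

def drill_down(rows: list[KpiDetail]) -> dict[str, tuple[float, bool]]:
    """Per-region rate and whether it meets the target."""
    return {r.region: (rate(r.on_time, r.orders), rate(r.on_time, r.orders) >= TARGET)
            for r in rows}

summary = rollup(details)
status = "on target" if summary >= TARGET else "below target"
print(f"On-time delivery: {summary:.1%} ({status})")
for region, (value, met) in drill_down(details).items():
    print(f"  {region:<6} {value:.1%} {'ok' if met else 'below target'}")
```

In this example the summary meets the target while two regions miss it, which is exactly what the drill-down capability is for.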
In order to achieve maximum velocity of business intelligence, continuous monitoring processes should be in place to
trigger alerts to business decision-makers, accelerating action toward resolving problems or compensating for
unforeseen business events. This proactive nature of business intelligence can provide tremendous business
benefits.
Portals. Presenting business intelligence on the Web through a portal is gaining considerable momentum. Web-
based portals are becoming commonplace as a single personalized point of access for key business information. All
major BI vendors have developed components which snap into the popular portal infrastructure. Portals are usually
organized by user community, such as suppliers, customers, employees and partners. Portals can reduce
the overall infrastructure costs of an organization as well as deliver great self-service and information access
capabilities.
Organizational Requirements. There are several organizational considerations that contribute to an efficient
business intelligence infrastructure. The first is to have a core business intelligence implementation and support team
dedicated to the principles of optimizing the business intelligence infrastructure. This core team minimally consists of
the following roles:
• BI architect responsible for the overall design and implementation of the business intelligence infrastructure
with special focus on the information architecture in the information warehouse layer
• ETL developer responsible for the design and development of the data staging layer
• BI analyst responsible for identifying the key business intelligence questions of the business decision-
makers and the key requirements of the BI applications layer
• Database administrator responsible for the physical implementation and support of the information
warehouse layer
• Business content managers responsible for delivering the required information to various user communities
of the portal layer
Additionally, there needs to be strong organizational support for business intelligence across the subject areas
comprising the business intelligence infrastructure. This is facilitated by a business intelligence steering committee
composed of technical managers and business sponsors. This committee ensures that the risks of the business
intelligence projects are mitigated with adequate scope control and good communication with the business sponsors.
Finally, there should be objective measures in place to track the effectiveness of the business intelligence
architecture. These measures should include areas such as business intelligence usage, ratings of business and
technical meta data available, service level metrics, business impact metrics and efficiency improvement metrics.
To determine the completeness and adequacy of your BI infrastructure, answer the following questions. Any "no"
answers indicate opportunity areas for improvement.
1. Do you have an effective data integration process in place to create required business intelligence on a
daily basis?
2. Are continuous monitoring processes in place to allow alerts to be communicated immediately to those who
need to take action?
3. Is your information delivery process automated?
4. Is your warehouse administration infrastructure completely automated?
5. Are alerting techniques used to communicate exceptions quickly so decisions are accelerated?
6. Are the key business questions being answered about your business areas of responsibility?
7. Is information available on standardized dimensions such as customer, product and geography?
8. Do you have adequate competitive information to answer key business questions?
9. Have you delivered scorecards on key performance indicators to top decision-makers?
10. Do you leverage your enterprise portal infrastructure to deliver business intelligence?
Mark Robinson manages Greenbrier & Russel's Business Intelligence practice. With more than 20 years of
experience in business and technology fields, Robinson has performed traditional leadership roles in IT
management, product management, practice management and solutions delivery as well as leading companies in
strategic transformation efforts through investments in business intelligence. As a consultant, he has been involved
in business transformation efforts that have begun with critical success factors studies, assessments and discovery
workshops that focus on the value of business intelligence as it aligns with the business strategy. In addition to
teaching, Robinson has been a speaker/educator on various topics in business intelligence with industry associations
including The Data Warehouse Institute and the Finance Executives Institute.
Q9. What are the important roles and responsibilities of a business intelligence
(BI) team?
Ans: The BI team typically is responsible for data definitions, source system
identification and connectivity, working with end users to define business
requirements and needs, performing data validation, writing BI policies and
procedures for steering committee approval, overseeing the implementation of
approved projects and assuring that the promised benefits of BI projects are
achieved. The business intelligence team is also tasked with reporting on the
progress of the BI program and defined projects to senior executives.
• Data quality vendors are being scooped up. There are fewer standalone data quality vendors these days.
Many have now become part of full suite vendors (those offering everything from ETL processing to the
ultimate delivery of BI analytics) or part of ETL packages. These purchases add to data integration
capabilities by improving the data that is being integrated.
• EII (enterprise information integration) vendors are also being purchased or forming exclusive partnerships
with other BI vendors. These strategic allegiances broaden the data integration capabilities of either the
standalone ETL or full-service vendors by adding the ability to deliver real-time integration capabilities.
• Small niche players are being acquired before they even make a name for themselves. IBM and Oracle, in
particular, seem to have a propensity to purchase these innovative and bright point-solution vendors. These
additions give these big companies quick solutions such as master data management (MDM) or customer
data integration (CDI) applications.
So – is this a good thing? On the plus side, consolidating your business intelligence solution into the hands of a single
vendor makes life simpler in terms of contracting, a single vendor interface and a reduction of finger-pointing when
things don’t go according to plan. There is no doubt that the one-stop-shopping mentality has its appeal.
The goal of the full-service vendor is to provide a fully integrated, end-to-end solution that satisfies the majority of
your needs. These vendors state that their offerings significantly reduce the cost and implementation time. These
products simplify maintenance, make enhancements easier and shorten the learning curve for implementers.
On the negative side, first, just because a company bought a technology does not mean that the technology is instantly
integrated into the rest of the vendor’s offerings. You may still struggle with integration issues for several years after
the technology was acquired. Second, there is a question of focus. The best-of-breed vendors have a single mission
in life – to make the best solution for a particular problem. Their solution is usually a surgical strike – complete, fast
and elegant. The full-service vendors usually can’t afford to expend that kind of energy on a point solution, especially
when they are devoting a lot of their energy to integrating their technologies. Their compromise may be a suite that
meets the majority of your needs, but not all. When shopping for best-of-breed versus full-service technologies, look
at the partnerships between companies. You should get a pretty good idea of who may be next on the auction block.
Finally, no single business intelligence vendor has a commanding lead in terms of sales or market share. That means
that there is still plenty of room for both types of vendors. It is still a place where small, innovative technologies can
effectively compete against their much larger and more established brethren.
To reduce data latency, we see more and more virtual BI components being created, including virtual operational
data stores (ODSs) and data marts using enterprise EII technologies. If your operational data is in fairly good shape
(minimal integration and data cleanup required), a virtual ODS or even an oper mart may be a solution to reducing
data latency. However, it is mandatory to monitor the effects of this environment on the operational systems.
For analysis latency, we are seeing technologies that offer business activity monitoring (BAM) or operational
dashboards as inline operational analytic engines that constantly serve the results to the business user and send
alerts or alarms immediately when thresholds are exceeded. Key performance indicators (KPIs), important metrics
delivered hourly or even more frequently, and consolidated current operational results can be displayed through the
dashboards or portals, giving the operations personnel insight into key events that are occurring.
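The threshold-driven alerting described above reduces to a small loop: each incoming operational event updates a running metric, and an alert is raised the moment a threshold is crossed. The Python sketch below assumes a hypothetical metric (average call-center wait time over the last N calls) and a hypothetical 60-second threshold. A real BAM product would read events from a message queue and push alerts to a dashboard or pager.

```python
from collections import deque
from typing import Optional

class WaitTimeMonitor:
    """Rolling average of the last `window` wait times, with a threshold alert."""

    def __init__(self, window: int, threshold_seconds: float):
        self.recent: deque = deque(maxlen=window)
        self.threshold = threshold_seconds
        self.alerting = False  # suppress repeat alerts until the metric recovers

    def observe(self, wait_seconds: float) -> Optional[str]:
        """Record one event; return an alert message when the threshold is first crossed."""
        self.recent.append(wait_seconds)
        average = sum(self.recent) / len(self.recent)
        if average > self.threshold and not self.alerting:
            self.alerting = True
            return f"ALERT: average wait {average:.0f}s exceeds {self.threshold:.0f}s"
        if average <= self.threshold:
            self.alerting = False  # recovered, so the next crossing alerts again
        return None

monitor = WaitTimeMonitor(window=3, threshold_seconds=60)
alerts = [msg for w in [30, 45, 50, 90, 120, 100, 20, 10] if (msg := monitor.observe(w))]
print(alerts)
```

Suppressing repeats until the metric recovers keeps a single sustained problem from flooding operations staff with duplicate alarms.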
While speeding the collection, analysis and display of operational data is certainly useful to the business, it must be
remembered that not all business intelligence data must be included. Many IT implementers do not perform thorough
due diligence to determine precisely what data must be included in an operational BI application. They make the
catastrophic error of including as much as they can, forcing all data to be “real time,” thereby creating an unwieldy
and unmanageable BI world.
The message is to carefully evaluate the push for real-time analytics. Understand the business need completely, and
you may find that it is really only a very small percentage of data that must be rushed into the hands of the business
consumers. Most analytic data can be hours, days or months old and still be relevant to the decision-making process.
These technologies have raised the awareness of predictive analytics or guided decision-making capabilities, making
it possible to embed these in operational flows. Companies are now able to perform operational or right-time BI,
giving the front-line workers the ability to access and use the results of these analytics, combined with operational
data, for their daily activities.
As exciting as these are, it must be remembered that these capabilities don’t just come into being serendipitously.
They must fit into the enterprise’s overall BI architecture and technological infrastructure or chaos will surely reign.
Operational BI also requires a thorough understanding of the business processes or workflow that it will enhance.
Without this understanding, BI implementers cannot know how or where to embed these valuable insights for
maximum benefit.
There are other technological trends in our business intelligence marketplace, but these seem to be the ones having
the highest impact on business intelligence today. It would be interesting to revisit these trends in six months to see if
other trends have emerged to captivate our attention and change the direction of our still young and growing industry.