Knowledge Management Complete Notes
Question Bank
UNIT – 1
Q1 Define Data, Information, Knowledge
Ans.
Data: Data is a collection of facts, measurements, observations, or words that can be used to generate
information. Data can include the number of people in a country, the value of sales of a product, or the number
of times a country has won a cricket match.
• Structured Data: This type of data is organized into a specific format, making it easy to search,
analyze and process. Structured data is found in relational databases and includes information such as
numbers, dates and categories.
• Unstructured Data: Unstructured data does not conform to a specific structure or format. It may include
text documents, images, videos, and other data that is not easily organized or analyzed without
additional processing.
Data can be classified as qualitative or quantitative. Qualitative data captures subjective qualities, while
quantitative data is numerical and can be measured.
Information: Information is processed data, organized, or structured in a way that makes it meaningful, valuable
and useful. It is data that has been given context, relevance and purpose. It gives knowledge, understanding and
insights that can be used for decision-making, problem-solving, communication and various other purposes.
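As an illustration, here is a short sketch (with made-up sales figures) of how raw data becomes information once it is summarized and given context:

```python
# Hypothetical example: raw sales figures (data) are processed into a
# meaningful summary (information) that supports decision-making.
monthly_sales = [120, 95, 140, 180, 160, 150]  # raw data: units sold per month

total = sum(monthly_sales)
average = total / len(monthly_sales)
best_month = monthly_sales.index(max(monthly_sales)) + 1  # 1-based month number

# The same numbers, given context and structure, become information:
summary = (f"Total units sold: {total}; monthly average: {average:.1f}; "
           f"best month: {best_month}")
print(summary)
```

The bare list of numbers is data; the summary sentence, which a manager could act on, is information.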
Knowledge: A mix of contextual information, experiences, rules, and values. Knowledge is richer and deeper
than information and more valuable because someone has thought deeply about that information and added their
own unique experience, judgment, and wisdom.
Knowledge can be gained and accumulated as “information combined with experience, context, interpretation,
reflection and is highly contextual”.
It is a high-value form of information that is ready for application to decision and actions within organizations.
Knowledge is increasingly being viewed as a commodity or an intellectual asset. It possesses some contradictory
characteristics that are radically different from those of other valuable commodities.
Types of Knowledge
1. Tacit knowledge 2. Explicit knowledge
Tacit Knowledge (Implicit Knowledge)
a. The word tacit means understood and implied without being stated.
b. Tacit knowledge is unique and cannot be explained clearly.
c. It is knowledge that people possess but find difficult to express.
d. The cognitive skills of an employee are a classic example of tacit knowledge.
e. Tacit knowledge is personal and varies with the education, attitude and perception of the individual.
f. It can be impossible to articulate, because tacit knowledge may even be subconscious.
g. Tacit knowledge is also subjective in character.
Explicit Knowledge
The word explicit means stated clearly and in detail, without any room for confusion. Explicit knowledge
is easy to articulate and is not subjective. It is not unique and does not differ between individuals.
It is impersonal, and it is easy to share with others.
Data is quantifiable; there can be data overload.
Information is quantifiable; there can be information overload.
Knowledge is not quantifiable; there is no knowledge overload.
Q 2- What do you understand by Decision support system? What are the components of DSS? Discuss the
characteristics of DSS.
DSS: A decision support system (DSS) is an interactive computer-based application that combines data and
mathematical models to help decision makers solve complex problems faced in managing public and private
enterprises and organizations.
A properly designed DSS is an interactive software-based system intended to help decision makers compile
useful information from raw data, documents, personal knowledge, and/or business models in order to
identify and solve problems and make decisions.
Choice: The best alternative is selected from among those deemed significant. Mathematical models and the
corresponding solution methods usually play a valuable role during the choice phase.
Implementation: When the best alternative has been selected by the decision maker, it is transformed into
actions by means of an implementation plan. This involves assigning responsibilities and roles to all those
involved into the action plan.
Control: Once the action has been implemented, it is finally necessary to verify and check that the original
expectations have been satisfied and the effects of the action match the original intentions.
It contains three elements:
Data: contains the database.
Models: a repository (collection) of mathematical models.
Interface: a module for handling the dialogue between the system and the users.
Components of DSS:
Data management: includes database required to make decisions.
Model Management: Collection of mathematical models derived from operations research, statistics and
financial analysis.
Interactions: Takes inputs from user specifically in graphics forms from browser and gives information and
knowledge generated by system.
Knowledge management: It allows decision makers to draw various forms of collective knowledge.
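The interplay of these modules can be sketched in a few lines; the data values, model and function names below are purely illustrative, not part of any real DSS product:

```python
# A minimal sketch of the DSS modules described above (illustrative names
# only; a real DSS would use a database and a richer model repository).

# Data management: the database of facts available to the decision maker.
data_store = {"fixed_costs": 50000, "unit_price": 25.0, "unit_cost": 15.0}

# Model management: a repository of mathematical models (here, one model).
def break_even_units(fixed_costs, unit_price, unit_cost):
    """Classic break-even model: units needed to cover fixed costs."""
    return fixed_costs / (unit_price - unit_cost)

models = {"break_even": break_even_units}

# Interaction: handles the dialogue between the user and the system.
def ask(model_name, **overrides):
    params = {**data_store, **overrides}   # user input can override stored data
    result = models[model_name](**params)
    return f"{model_name}: {result:.0f} units"

print(ask("break_even"))                   # uses stored data
print(ask("break_even", unit_price=30.0))  # what-if analysis with user input
```

The second call shows the "what-if" style of interaction a DSS supports: the user supplies an input, and the system re-runs the model against the stored data.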
Features of Decision Support System:
Effectiveness: It should help knowledge workers to reach more effective decisions.
Mathematical models: Mathematical models are applied to the data contained in data marts and data warehouses.
Integration in the decision-making process: Decision makers should be able to integrate a DSS into their
decision-making process according to their needs, rather than passively accepting what comes out of it.
Organizational role: DSS operate at different hierarchical levels within an enterprise.
Flexibility: A DSS must be flexible and adaptable in order to incorporate the changes required to reflect
modifications in the environment or in the decision-making process.
Advantages of a DSS:
• Enable informed decision-making. By taking multiple different data sources into account, DSS can
facilitate better, up-to-date and informed decisions.
• Consider different outcomes. DSS consider different business outcomes, as possible decisions are based
on current and historical company data.
• Increase efficiency. DSS automate the analysis of large data sets.
• Provide better collaboration. DSS tools might also include communication and collaboration features.
• Enable flexibility. DSS can be used by many different industries.
• Handle complexity. DSS can handle complex problems that have multiple interdependencies and
variables.
Disadvantages of a DSS:
• Cost. Expenses for developing, implementing and maintaining DSS can be high, which can limit their
use by smaller organizations.
• Dependence. Developing an over-reliance on a DSS can eventually crowd out the human judgment
involved in decision-making.
• Complexity. DSS must consider all aspects of a given problem, which requires a lot of data. They can
also be complex to design and implement.
• Security. Data that DSS use might involve sensitive or critical data, meaning that an increased focus on
security is required.
Types of decision
Strategic decisions: Decisions are strategic when they affect the entire organization or at least a substantial part
of it for a long period of time. They strongly influence the general objectives and policies of an enterprise. Taken
at a higher organizational level, usually by the company top management.
Tactical decision: Tactical decisions affect only parts of an enterprise and are usually restricted to a single
department. The time span is limited to a medium-term horizon, typically up to a year. Made by middle
managers.
Operational decision: Operational decisions are framed within the elements and conditions determined by
strategic and tactical decisions. They are usually made at a lower organizational level, by knowledge workers
responsible for a single activity or task such as sub-department heads, workshop foremen, back-office heads.
Q 3- What do you understand by Group Decision support system? What are the components of GDSS?
Discuss the characteristics of GDSS.
Ans. A Group Decision Support System (GDSS) is an information system that is designed to support decisions
made by groups in an organization. The main aim of GDSS is to facilitate group communication and foster
learning. A GDSS is helpful in situations involving meeting scheduling and documentation; brainstorming;
group discussions; visioning; planning; team building; etc. It enables users or group members to solve complex
problems, formulate detailed plans and proposals, manage conflicts, and effectively prioritize activities. GDSS
helps group members not only make better decisions but also complete their tasks more efficiently.
Components of GDSS
There are three fundamental types of components that compose GDSSs:
1. Software
The software part may consist of the following components: databases and database management capabilities,
a user/system interface with multi-user access, specific applications to facilitate group decision-makers'
activities, and modeling capabilities.
2. Hardware
The hardware part may consist of the following components: I/O devices, PCs or workstations, individual
monitors for each participant or a public screen for the group, and a network to link participants to each other.
3. People
The people may include decision-making participants and/or a facilitator. A facilitator is a person who directs
the group through the planning process.
Advantages of GDSS
1. Increased efficiency: Thanks to increasing computer processing power and improving communication and
network performance, the speed and quality of information processing and transmission create the
opportunity for higher efficiency. Efficiency gains depend on the performance of the hardware (e.g.,
PCs, LAN/WAN) and software.
2. Improved quality: In a GDSS, the outcome of a meeting or decision-making process depends on
communication facilities and decision support facilities. Those facilities can help decision-making
participants avoid the constraints imposed by geography. They also make information sharable and
reduce effort in the decision-making process. Therefore, those facilities contribute to meeting quality
improvement.
3. Leverage that improves the way meetings run: Leverage implies that the system does not merely speed
up the process (i.e., improve efficiency) but changes it fundamentally. In other words, leverage can be
achieved by providing better ways of meeting, such as the ability to execute multiple tasks at the
same time.
Q 4- What is Executive information system? How does decision support system help in business?
Ans. Executive support systems are intended to be used by senior managers directly to provide support for
non-programmed decisions in strategic management.
The information involved is often external, unstructured and even uncertain. The exact scope and context of
such information is often not known beforehand. Such intelligence information includes:
• Market intelligence
• Investment intelligence
• Technology intelligence
Examples of Intelligent Information
Following are some examples of intelligent information, which is often the source of an EIS:
• External databases
• Technology reports like patent records etc.
• Technical reports from consultants
• Market reports
• Confidential information about competitors
• Speculative information like market conditions
• Government policies
• Financial reports and information
Advantages of EIS
• Time management
• Increased communication capacity and quality
Disadvantages of EIS
Q 5- What is groupware technology? Discuss different groupware technologies. How is Groupware Design
Different from Traditional User Interface Design?
Groupware is technology designed to facilitate the work of groups. This technology may be used to
communicate, cooperate, coordinate, solve problems, compete, or negotiate. While traditional technologies like
the telephone qualify as groupware, the term is ordinarily used to refer to a specific class of technologies relying
on modern computer networks, such as email, newsgroups, videophones, or chat.
An organisation or a team might use groupware to work more efficiently in the following ways:
• Enables collaboration and communication on projects irrespective of employees' locations
• Minimises misunderstanding and errors in the workplace
• Helps in the efficient management of workflows and processes
• Ensures efficient information management in the organisation
Types of Groupware
Apart from ensuring transparency and productivity in teams, these tools ensure communication, conferencing
and coordination among team members. Here are two types of groupware:
1. Synchronous groupware
2. Asynchronous groupware
Synchronous groupware
Synchronous groupware comprises tools that allow multiple users to contribute to a project in real time.
Examples of synchronous groupware include video conferencing, chat systems, support systems and shared
whiteboards. Collaborating in real time helps organisations create and manage tasks. Synchronous groupware
also supports group meetings between professionals working on different teams. This can help companies train
multiple professionals simultaneously while reducing the resources a team manager uses.
Asynchronous groupware
Asynchronous groupware allows multiple users to contribute to projects at different times. It provides
services such as file sharing, structured messages, collaborative writing and email handling. A team might
prefer asynchronous groupware because it maximises the time people spend working on different projects.
These tools allow team members to access information and data from any location and at any time with an
internet connection. Users collaborate by accessing shared data and making modifications.
Advantages of using a groupware
Groupware helps assist people in working collaboratively while located in different places. Here are some
benefits of using groupware:
• Fosters creativity: Groupware fosters creativity among different users and enables team members to use
new ideas to improve the workflow and process.
• Facilitates communication: Groupware facilitates communication between team members through
chats, video conferencing, instant messaging and emails. Using communication tools, team members can
discuss issues before they become significant.
• Manages multiple tasks: Groupware helps manage multiple tasks, professionals and teams, making it
easier to understand goals at every level of the organisation. Knowing the goals at multiple levels can
increase production because everyone understands the task to complete.
• Provides structure: Groupware provides a structure which allows team members to view the goals and
purpose and set up schedules. This helps ensure team members can acquire important information,
compare notes and exchange ideas with others.
• Helps save documents: Using groupware, teams have an option to save documents like faxes, emails
and spreadsheets. This allows users to access files from anywhere using an internet connection.
• Saves travel costs: With groupware, organisations can save travel costs because employees do not need to
travel to different locations to conduct meetings. It also ensures team members can attend meetings while
working from home.
Disadvantages of using a groupware
Though groupware offers many benefits, it also has the following disadvantages:
• Expensive: Buying a subscription to these tools can be costly. An organisation might also incur training
costs while implementing these tools in the organisation.
• Dependent on a server: While most groupware tools are beneficial, some might be unreliable because
they depend on one server. When the server is down, no one can use the tool.
• Does not allow non-verbal communication: Groupware tools do not allow non-verbal communication
between team members. As a result, team members might not form strong professional relationships
with each other.
• Promotes overdependence on particular groupware: An organisation might depend upon one tool
because of the security issues involved and the cost of training. This dependence on one groupware might
be problematic in the long term.
Email is by far the most common groupware application (besides, of course, the traditional telephone). While
the basic technology is designed to pass simple messages between two people, even relatively basic email
systems today typically include useful features for forwarding messages, filing messages, creating mailing
groups, and attaching files to a message.
Newsgroups and mailing lists are similar in spirit to email systems except that they are intended for messages
among large groups of people instead of 1-to-1 communication. In practice the main difference between
newsgroups and mailing lists is that newsgroups only show messages to a user when they are explicitly requested
(an “on-demand” service), while mailing lists deliver messages as they become available (an “interrupt-driven”
interface).
Workflow systems allow documents to be routed through organizations through a relatively-fixed process.
Workflow systems may provide features such as routing, development of forms, and support for differing roles
and privileges.
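A minimal sketch of such routing, with a hypothetical three-step route and role names (not from any real workflow product):

```python
# Illustrative workflow sketch: a document moves through a fixed sequence
# of steps, each requiring a particular role.
ROUTE = [("draft", "author"), ("review", "editor"), ("approve", "manager")]

def advance(document, user_role):
    """Move the document to the next step if the user holds the required role."""
    step, required_role = ROUTE[document["step_index"]]
    if user_role != required_role:
        raise PermissionError(f"step '{step}' requires role '{required_role}'")
    document["step_index"] += 1
    document["done"] = document["step_index"] == len(ROUTE)
    return document

doc = {"step_index": 0, "done": False}
advance(doc, "author")
advance(doc, "editor")
advance(doc, "manager")
print(doc["done"])  # True once every step has been completed
```

The fixed `ROUTE` list is what "relatively-fixed process" means in practice: the sequence of steps and the roles allowed at each step are defined once, and the system enforces them.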
Group calendars allow scheduling, project management, and coordination among many people, and may
provide support for scheduling equipment as well. Typical features detect when schedules conflict or find
meeting times that will work for everyone.
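The conflict-detection idea can be sketched as follows, treating meetings as (start, end) hour intervals; the schedules are made up for illustration:

```python
# Sketch of the conflict check a group calendar performs: two meetings
# conflict when their half-open time intervals overlap.
def conflicts(slot_a, slot_b):
    """Intervals (start, end) overlap iff each starts before the other ends."""
    return slot_a[0] < slot_b[1] and slot_b[0] < slot_a[1]

def first_common_slot(busy_by_person, length, day=(9, 17)):
    """Find the earliest slot of the given length free for every participant."""
    start = day[0]
    while start + length <= day[1]:
        candidate = (start, start + length)
        if all(not conflicts(candidate, b)
               for busy in busy_by_person for b in busy):
            return candidate
        start += 1
    return None

busy = [[(9, 10), (13, 15)],   # person 1's existing meetings
        [(11, 12)]]            # person 2's existing meetings
print(first_common_slot(busy, 2))  # → (15, 17)
```

Real group calendars refine this basic interval-overlap test with time zones, recurring events and equipment bookings, but the core check is the same.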
Shared whiteboards allow two or more people to view and draw on a shared drawing surface even from
different locations. This can be used, for instance, during a phone call, where each person can jot down notes
(e.g., a name, phone number, or map) or to work collaboratively on a visual problem.
Video communications systems allow two-way or multi-way calling with live video, essentially a telephone
system with an additional visual component. Cost and compatibility issues limited early use of video systems to
scheduled videoconference meeting rooms.
Chat systems permit many people to write messages in real time in a public space. As each person submits a
message, it appears at the bottom of a scrolling screen. Chat groups are usually formed by having a listing of
chat rooms by name, location, number of people, topic of discussion, etc.
Multi-player games have always been reasonably common in arcades, but are becoming quite common on the
internet. Many of the earliest electronic arcade games were multi-user, for example, Pong, Space Wars, and car
racing games.
Electronic questionnaire: It is used by group members for planning meetings, and determining crucial issues
and related information for decision-making. By using electronic questionnaires, groups can acquire the required
information for finding optimal solutions and making effective decisions.
UNIT -II
Q 6- What is expert system? Explain different component of expert system also with advantages &
Disadvantages
Ans. An Expert System (ES) is a computer-based system that mimics the decision-making ability of a human
expert in a specific domain or field. It uses knowledge representation, inference mechanisms, and user interfaces
to provide expert-level advice, solutions, or recommendations.
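How an inference engine might chain rules together can be sketched with a tiny forward-chaining loop; the rules below are illustrative toy rules, not a real diagnostic knowledge base:

```python
# Minimal forward-chaining sketch: the inference engine repeatedly applies
# IF-THEN rules from the knowledge base to the known facts.
rules = [
    ({"fever", "cough"}, "flu_suspected"),
    ({"flu_suspected", "short_of_breath"}, "see_doctor"),
]

def infer(facts, rules):
    """Fire every rule whose conditions are satisfied, until nothing changes."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

result = infer({"fever", "cough", "short_of_breath"}, rules)
print("see_doctor" in result)  # True: the two rules chain together
```

Note how the second rule fires only because the first rule's conclusion becomes a new fact; this chaining is what distinguishes an inference engine from a simple lookup table.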
Components of an Expert System:
1. Knowledge Base (KB): Stores the facts and rules of the problem domain.
2. Inference Engine: Applies the rules in the KB to known facts in order to derive conclusions.
3. Knowledge Acquisition System (KAS): Helps experts transfer their knowledge to the KB.
4. User Interface: Lets users consult the system and view its advice and explanations.
Applications of Expert Systems:
1. Medical Diagnosis
2. Financial Planning
3. Engineering Design
4. Troubleshooting
5. Decision Support Systems
6. Robotics
7. Natural Language Processing
Advantages of Expert Systems:
1. Improved decision-making
2. Increased efficiency
3. Enhanced accuracy
4. Consistency
5. Scalability
6. Reduced costs
Disadvantages of Expert Systems:
2. Complexity
3. Maintenance difficulties
4. Explanation limitations
5. Limited domain expertise
Q7- Define Data Warehouse. Explain characteristics of data warehouse. Why do we need data
warehouse?
Ans Data Warehouse: A Data Warehouse (DW) is a centralized repository that stores data from various sources
in a single location, making it easier to access, analyze, and report data. It is a database specifically designed for
querying and analyzing data, rather than transactional processing.
What is Data Warehousing?
Data warehousing is the process of constructing and using a data warehouse. A data warehouse is constructed by
integrating data from multiple heterogeneous sources that support analytical reporting, structured and/or ad hoc
queries, and decision making. Data warehousing involves data cleaning, data integration, and data
consolidation.
Extract and Load Process: Data extraction takes data from the source systems. Data load takes the extracted
data and loads it into the data warehouse.
Note: Before loading the data into the data warehouse, the information extracted from the external sources must
be reconstructed.
Controlling the Process: Controlling the process involves determining when to start data extraction and
running consistency checks on the data. The controlling process ensures that the tools, the logic modules, and
the programs are executed in the correct sequence and at the correct time.
When to Initiate Extract
Data needs to be in a consistent state when it is extracted, i.e., the data warehouse should represent a single,
consistent version of the information to the user.
For example, in a customer profiling data warehouse in telecommunication sector, it is illogical to merge the list
of customers at 8 pm on Wednesday from a customer database with the customer subscription events up to 8 pm
on Tuesday. This would mean that we are finding the customers for whom there are no associated subscriptions.
Loading the Data: After extracting the data, it is loaded into a temporary data store where it is cleaned up and
made consistent.
Note: Consistency checks are executed only when all the data sources have been loaded into the temporary data
store.
Clean and Transform Process
Once the data is extracted and loaded into the temporary data store, it is time to perform Cleaning and
Transforming. Here is the list of steps involved in Cleaning and Transforming:
• Clean and transform the loaded data into a structure
• Partition the data
• Aggregation
Clean and Transform the Loaded Data into a Structure
Cleaning and transforming the loaded data helps speed up the queries. It can be done by making the data
consistent:
• within itself.
• with other data within the same data source.
• with the data in other source systems.
• with the existing data present in the warehouse.
Transforming involves converting the source data into a structure. Structuring the data increases the query
performance and decreases the operational cost. The data contained in a data warehouse must be transformed to
support performance requirements and control the ongoing operational costs.
Partition the Data
It will optimize the hardware performance and simplify the management of data warehouse. Here we partition
each fact table into multiple separate partitions.
Aggregation
Aggregation is required to speed up common queries. Aggregation relies on the fact that most common queries
will analyze a subset or an aggregation of the detailed data.
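The clean-transform-aggregate flow described above can be sketched on toy records; the field names and values are hypothetical, and a real warehouse load would use an ETL tool or SQL:

```python
# Sketch of the clean, transform and aggregate steps on toy source records.
from collections import defaultdict

raw = [
    {"region": " North ", "amount": "100"},
    {"region": "north",   "amount": "250"},
    {"region": "South",   "amount": "bad"},   # inconsistent record
    {"region": "South",   "amount": "300"},
]

# Clean and transform: normalise text fields and drop records failing checks.
clean = []
for r in raw:
    try:
        clean.append({"region": r["region"].strip().title(),
                      "amount": float(r["amount"])})
    except ValueError:
        pass  # in practice, rejected rows would be logged for review

# Aggregate: pre-compute totals per region to speed up common queries.
totals = defaultdict(float)
for r in clean:
    totals[r["region"]] += r["amount"]

print(dict(totals))  # {'North': 350.0, 'South': 300.0}
```

Note how " North " and "north" become the same key only after cleaning; making data consistent within itself is exactly what allows the aggregation step to produce correct totals.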
• Bottom Tier - The bottom tier of the architecture is the data warehouse database server. It is the relational
database system. We use the back-end tools and utilities to feed data into the bottom tier. These backend tools
and utilities perform the Extract, Clean, Load, and refresh functions.
• Middle Tier - In the middle tier, we have the OLAP Server that can be implemented in either of the following
ways.
o By Relational OLAP (ROLAP), which is an extended relational database management system. The ROLAP
maps the operations on multidimensional data to standard relational operations.
o By Multidimensional OLAP (MOLAP) model, which directly implements the multidimensional data and
operations.
• Top-Tier - This tier is the front-end client layer. This layer holds the query tools and reporting tools, analysis
tools and data mining tools. The following diagram depicts the three-tier architecture of a data warehouse:
Metadata
Metadata is simply defined as data about data. Data that is used to represent other data is known as
metadata. For example, the index of a book serves as metadata for the contents of the book. In other words, we
can say that metadata is the summarized data that leads us to the detailed data.
In terms of a data warehouse, we can define metadata as follows:
• Metadata is a roadmap to data warehouse.
• Metadata in data warehouse defines the warehouse objects.
• Metadata acts as a directory. This directory helps the decision support system to locate the contents of a data
warehouse.
Metadata Repository
Metadata repository is an integral part of a data warehouse system. It contains the following metadata:
• Business metadata - It contains the data ownership information, business definition, and changing policies.
• Operational metadata - It includes currency of data and data lineage. Currency of data refers to whether the
data is active, archived, or purged. Lineage of data means the history of the migrated data and the
transformations applied to it.
• Data for mapping from operational environment to data warehouse - This metadata includes source
databases and their contents, data extraction, data partitioning, cleaning, transformation rules, and data refresh
and purging rules.
• The algorithms for summarization - It includes dimension algorithms, data on granularity, aggregation,
summarizing, etc.
Data Cube
A data cube helps us represent data in multiple dimensions. It is defined by dimensions and facts. The
dimensions are the entities with respect to which an enterprise preserves the records.
Data Mart
Data marts contain a subset of organization-wide data that is valuable to specific groups of people in an
organization. In other words, a data mart contains only the data that is specific to a particular group. For
example, a marketing data mart may contain only data related to items, customers, and sales. Data marts are
confined to specific subjects.
Online Analytical Processing Server (OLAP)
Online Analytical Processing (OLAP) servers are based on the multidimensional data model. They allow
managers and analysts to gain insight into information through fast, consistent, and interactive access to it.
This section covers the types of OLAP servers, OLAP operations, and the differences between OLAP,
statistical databases and OLTP.
Types of OLAP Servers
We have four types of OLAP servers:
• Relational OLAP (ROLAP)
• Multidimensional OLAP (MOLAP)
• Hybrid OLAP (HOLAP)
• Specialized SQL Servers
Relational OLAP
ROLAP servers are placed between the relational back-end server and client front-end tools. To store and
manage warehouse data, ROLAP uses a relational or extended-relational DBMS.
ROLAP includes the following:
• Implementation of aggregation navigation logic.
• Optimization for each DBMS back-end.
• Additional tools and services.
Points to Remember
• ROLAP servers are highly scalable.
• ROLAP tools analyze large volumes of data across multiple dimensions.
• ROLAP tools store and analyze highly volatile and changeable data.
Advantages
• ROLAP servers can be easily used with existing RDBMS.
• Data can be stored efficiently, since zero facts (empty cells) need not be stored.
• ROLAP tools do not use pre-calculated data cubes.
• The DSS server of MicroStrategy adopts the ROLAP approach.
Disadvantages
• Poor query performance.
• Some limitations of scalability depending on the technology architecture that is utilized.
Multidimensional OLAP
MOLAP uses array-based multidimensional storage engines for multidimensional views of data. With
multidimensional data stores, the storage utilization may be low if the dataset is sparse. Therefore, many
MOLAP servers use two levels of data storage representation to handle dense and sparse datasets.
Points to Remember
• MOLAP tools process information with consistent response times regardless of the level of summarization
or calculation selected.
• MOLAP tools need to avoid many of the complexities of creating a relational database to store data for
analysis.
• MOLAP tools need fastest possible performance.
• MOLAP servers adopt two levels of storage representation to handle dense and sparse datasets.
• Denser sub-cubes are identified and stored as array structures.
• Sparse sub-cubes employ compression technology.
MOLAP Architecture
MOLAP includes the following components:
• Database server
• MOLAP server
• Front-end tool
Advantages
• MOLAP allows the fastest indexing into pre-computed summarized data.
• Helps the users connected to a network who need to analyze larger, less defined data.
• Easier to use, therefore MOLAP is suitable for inexperienced users.
Disadvantages
• MOLAP servers are not capable of containing detailed data.
• The storage utilization may be low if the data set is sparse.
Specialized SQL Servers: Specialized SQL servers provide advanced query language and query processing
support for SQL queries over star and snowflake schemas in a read-only environment.
OLAP Operations
Since OLAP servers are based on multidimensional view of data, we will discuss OLAP operations in
multidimensional data.
Here is the list of OLAP operations:
• Roll-up
• Drill-down
• Slice and dice
• Pivot (rotate)
Roll-up
Roll-up performs aggregation on a data cube in any of the following ways:
• By climbing up a concept hierarchy for a dimension
• By dimension reduction
The following diagram illustrates how roll-up works.
Drill-down
Drill-down is the reverse operation of roll-up. It is performed in either of the following ways:
• By stepping down a concept hierarchy for a dimension
• By introducing a new dimension
• Drill-down is performed by stepping down a concept hierarchy for the dimension time.
• Initially the concept hierarchy was "day < month < quarter < year."
• On drilling down, the time dimension is descended from the level of quarter to the level of month.
• When drill-down is performed, one or more dimensions are added to the data cube.
• It navigates the data from less detailed data to highly detailed data.
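Roll-up by climbing a concept hierarchy can be sketched with a plain dictionary standing in for the cube; the months, items and figures are made up:

```python
# Sketch of roll-up along the time hierarchy (month -> quarter) using a
# plain dict as the cube: keys are (month, item), values are sales.
from collections import defaultdict

cube = {("Jan", "Mobile"): 10, ("Feb", "Mobile"): 15, ("Mar", "Mobile"): 5,
        ("Apr", "Mobile"): 20}

MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}

def roll_up(cube):
    """Climb the concept hierarchy: aggregate months into quarters."""
    rolled = defaultdict(int)
    for (month, item), sales in cube.items():
        rolled[(MONTH_TO_QUARTER[month], item)] += sales
    return dict(rolled)

print(roll_up(cube))  # {('Q1', 'Mobile'): 30, ('Q2', 'Mobile'): 20}
```

Drill-down is simply the reverse direction: returning from the quarter-level aggregates to the stored month-level detail.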
Slice
The slice operation selects one particular dimension from a given cube and provides a new sub-cube. Consider
the following diagram that shows how slice works.
• Here slice is performed for the dimension "time" using the criterion time = "Q1".
• It forms a new sub-cube by fixing that single dimension value.
Dice
Dice selects two or more dimensions from a given cube and provides a new sub-cube. Consider the following
diagram that shows the dice operation.
The dice operation on the cube based on the following selection criteria involves three dimensions.
• (location = "Toronto" or "Vancouver")
• (time = "Q1" or "Q2")
• (item = "Mobile" or "Modem")
Pivot
The pivot operation is also known as rotation. It rotates the data axes in view in order to provide an alternative
presentation of data. Consider the following diagram that shows the pivot operation.
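Slice and dice can be sketched as filters over cube cells, reusing the location/time/item criteria from the text; the sales figures themselves are made up:

```python
# Sketch of slice and dice over a dict-based cube: keys are
# (location, time, item) and values are sales figures.
cube = {("Toronto", "Q1", "Mobile"): 5, ("Toronto", "Q2", "Modem"): 7,
        ("Vancouver", "Q1", "Mobile"): 3, ("New York", "Q1", "Mobile"): 9}

def slice_cube(cube, time):
    """Slice: fix one dimension (time), yielding a sub-cube without it."""
    return {(loc, item): v for (loc, t, item), v in cube.items() if t == time}

def dice_cube(cube, locations, times, items):
    """Dice: keep only the cells matching criteria on several dimensions."""
    return {k: v for k, v in cube.items()
            if k[0] in locations and k[1] in times and k[2] in items}

print(slice_cube(cube, "Q1"))
print(dice_cube(cube, {"Toronto", "Vancouver"}, {"Q1", "Q2"},
                {"Mobile", "Modem"}))
```

The dice call mirrors the three selection criteria above: the New York cell is excluded because its location matches none of the chosen values. Pivot would then just swap which dimension labels the rows and which labels the columns of the result.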
1. What do you understand by group decision support system? What are the components of GDSS?
2. What is executive information system? How does decision support system help in business? Explain.
3. Explain the term groupware technologies in detail.
4. Define business expert system in detail along with its benefits.
5. Explain in detail data warehousing tools and utilities functions.
6. What is data warehouse? What are the goals of a data warehouse?
7. How are data marts different from data warehouse?
8. Differentiate Datamart and Warehouse.
9. Define Metadata.
10. Differentiate OLAP and OLTP.
11. Explain the three-tier architecture of a data warehouse.
12. Differentiate MOLAP and ROLAP.
13. Write all the operations of OLAP
14. Define the architecture of ROLAP and MOLAP
15. Write all the Process Flow in Data Warehousing.
UNIT-III
Multi-dimensional analysis:
Data mining and knowledge discovery,
Data mining techniques,
Data mining of advanced databases.
What is knowledge discovery?
What are data mining technologies?
What are data mining functionalities?
What do you understand by frequent pattern mining?
Classification Vs Clustering.
Section B or C
What are the various steps of knowledge discovery? Discuss the role of data mining in knowledge discovery.
Explain with a diagrammatic illustration the steps involved in the process of knowledge discovery from
databases.
What is need of datamining? Explain different types of data mining techniques.
What is need of datamining? Discuss the evaluation of database system technologies.
Explain various methods for evaluating the accuracy of classification or prediction.
Describe classification and prediction. Discuss methods regarding classification.
Describe Apriori Algorithm for frequent pattern mining.
1. Datamining terminology
Data Mining
Data mining is the process of extracting useful information from large sets of data. In other words, data mining
is mining knowledge from data. This information can be used for any of the following applications −
• Market Analysis
• Fraud Detection
• Customer Retention
• Production Control
• Science Exploration
Data Mining Engine
The data mining engine is essential to the data mining system. It consists of a set of functional modules that
perform the following functions −
• Characterization
• Association and Correlation Analysis
• Classification
• Prediction
• Cluster analysis
• Outlier analysis
• Evolution analysis
Knowledge Base
This is the domain knowledge. This knowledge is used to guide the search or evaluate the interestingness of the
resulting patterns.
Knowledge Discovery
Some people treat data mining the same as knowledge discovery, while others view data mining as an essential
step in the process of knowledge discovery. Here is the list of steps involved in the knowledge discovery process −
• Data Cleaning
• Data Integration
• Data Selection
• Data Transformation
• Data Mining
• Pattern Evaluation
• Knowledge Presentation
User interface
The user interface is the module of the data mining system that enables communication between users and the
data mining system, for example to specify mining queries and tasks and to browse and visualize the discovered
patterns.
Data Cleaning
Data cleaning is a technique that is applied to remove the noisy data and correct the inconsistencies in data. Data
cleaning involves transformations to correct the wrong data. Data cleaning is performed as a data preprocessing
step while preparing the data for a data warehouse.
Data Selection
Data Selection is the process where data relevant to the analysis task are retrieved from the database. Sometimes
data transformation and consolidation are performed before the data selection process.
Clusters
Cluster refers to a group of similar kinds of objects. Cluster analysis refers to forming groups of objects that are
very similar to each other but highly different from the objects in other clusters.
Data Transformation
In this step, data is transformed or consolidated into forms appropriate for mining, by performing summary or
aggregation operations.
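A minimal sketch of this consolidation-by-aggregation step, using invented quarterly sales figures:

```python
from collections import defaultdict

# Invented detail records: (quarter, sale amount).
records = [("Q1", 100), ("Q1", 200), ("Q2", 150), ("Q2", 50)]

def aggregate(records):
    """Consolidate detail rows into per-quarter totals (a summary operation)."""
    totals = defaultdict(int)
    for quarter, amount in records:
        totals[quarter] += amount
    return dict(totals)

summary = aggregate(records)   # per-quarter sums, ready for mining
```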
Financial Data Analysis
• Design and construction of data warehouses for multidimensional data analysis and data mining.
Retail Industry
Data mining has great application in the retail industry because the industry collects large amounts of data on
sales, customer purchasing history, goods transportation, consumption and services. It is natural that the quantity
of data collected will continue to expand rapidly because of the increasing ease, availability and popularity of the
web.
Data mining in retail industry helps in identifying customer buying patterns and trends that lead to improved
quality of customer service and good customer retention and satisfaction. Here is the list of examples of data
mining in the retail industry −
• Design and Construction of data warehouses based on the benefits of data mining.
• Customer Retention.
Telecommunication Industry
Today the telecommunication industry is one of the fastest-emerging industries, providing various services such
as fax, pager, cellular phone, internet messenger, images, e-mail, web data transmission, etc. Due to the
development of new computer and communication technologies, the telecommunication industry is rapidly
expanding. This is why data mining has become very important in helping to understand the business.
Data mining in the telecommunication industry helps in identifying telecommunication patterns, catching
fraudulent activities, making better use of resources, and improving quality of service. Here is a list of examples
for which data mining improves telecommunication services −
Biological Data Analysis
In recent times, we have seen a tremendous growth in the field of biology such as genomics, proteomics,
functional Genomics and biomedical research. Biological data mining is a very important part of Bioinformatics.
Following are the aspects in which data mining contributes for biological data analysis −
• Alignment, indexing, similarity search and comparative analysis of multiple nucleotide sequences.
• Discovery of structural patterns and analysis of genetic networks and protein pathways.
• Association and correlation analysis, aggregation to help select and build discriminating attributes.
3. What is Knowledge Discovery?
Data mining is an essential step in the process of knowledge discovery. Here is the list of steps involved in the
knowledge discovery process −
• Data Cleaning − In this step, the noise and inconsistent data are removed.
• Data Integration − In this step, multiple data sources are combined.
• Data Selection − In this step, data relevant to the analysis task are retrieved from the database.
• Data Transformation − In this step, data is transformed or consolidated into forms appropriate for mining
by performing summary or aggregation operations.
• Data Mining − In this step, intelligent methods are applied in order to extract data patterns.
• Pattern Evaluation − In this step, the data patterns are evaluated.
• Knowledge Presentation − In this step, knowledge is represented.
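The steps above can be sketched as a chain of small functions; this is a toy illustration only, with all names, data, and thresholds invented.

```python
# Invented raw input with noise (None, non-numeric strings, stray spaces).
raw = [" 5", None, "3", "bad", "8 "]

def clean(data):
    """Data Cleaning: drop noise and inconsistent entries."""
    out = []
    for x in data:
        if x is None:
            continue
        x = x.strip()
        if x.isdigit():
            out.append(int(x))
    return out

def select(data):
    """Data Selection: keep only values relevant to the task (here, >= 4)."""
    return [x for x in data if x >= 4]

def transform(data):
    """Data Transformation: consolidate by aggregation."""
    return {"count": len(data), "total": sum(data)}

def mine(summary):
    """Data Mining: extract a (trivial) pattern from the summary."""
    return "high" if summary["total"] > 10 else "low"

pattern = mine(transform(select(clean(raw))))
```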
4. Data mining tasks: Data mining deals with the kind of patterns that can be mined. On the basis of the
kind of data to be mined, there are two categories of functions involved in data mining −
1. Descriptive
2. Classification & prediction
Descriptive Function
The descriptive function deals with the general properties of data in the database. Here is the list of descriptive
functions −
• Class/Concept Description
• Mining of Frequent Patterns
• Mining of Associations
• Mining of Correlations
• Mining of Clusters
Class/Concept Description
Class/Concept refers to the data to be associated with classes or concepts. For example, in a company, the
classes of items for sale include computers and printers, and the concepts of customers include big spenders and
budget spenders. Such descriptions of a class or a concept are called class/concept descriptions. These
descriptions can be derived in the following two ways −
• Data Characterization − This refers to summarizing the data of the class under study. This class under
study is called the Target Class.
• Data Discrimination − It refers to the mapping or classification of a class with some predefined group or
class.
Mining of Frequent Patterns
• Frequent Item Set − It refers to a set of items that frequently appear together, for example, milk and
bread.
• Frequent Subsequence − A sequence of patterns that occur frequently such as purchasing a camera is
followed by memory card.
• Frequent Sub Structure − Substructure refers to different structural forms, such as graphs, trees, or
lattices, which may be combined with item−sets or subsequences.
Mining of Associations
Associations are used in retail sales to identify patterns that are frequently purchased together. Mining of
associations refers to the process of uncovering the relationship among data and determining association rules.
For example, a retailer generates an association rule showing that 70% of the time milk is sold with bread and
only 30% of the time biscuits are sold with bread.
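Percentages like these are the support and confidence of a rule, and they can be computed directly from transactions. A minimal sketch on invented basket data:

```python
# Invented basket data: each transaction is a set of items.
transactions = [
    {"milk", "bread"}, {"milk", "bread"}, {"bread", "biscuits"},
    {"milk", "bread", "butter"}, {"bread"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """confidence(A -> B) = support(A union B) / support(A)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

conf = confidence({"milk"}, {"bread"}, transactions)
```

In this made-up data, every transaction containing milk also contains bread, so the rule milk → bread has confidence 1.0.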
Mining of Correlations
It is a kind of additional analysis performed to uncover interesting statistical correlations between associated
attribute-value pairs or between two item sets, to analyze whether they have a positive, negative or no effect on
each other.
Mining of Clusters
Cluster refers to a group of similar kinds of objects. Cluster analysis refers to forming groups of objects that are
very similar to each other but highly different from the objects in other clusters.
Classification & prediction: Classification is the process of finding a model (or function) that describes and
distinguishes data classes or concepts, for the purpose of being able to use the model to predict the class of
objects whose class label is unknown.
The derived model is based on the analysis of a set of training data (i.e., data objects whose class labels are
known). The derived model may be represented in various forms, such as classification (if-then) rules, decision
trees, or neural networks.
There are two forms of data analysis that can be used for extracting models describing important classes or to
predict future data trends. These two forms are as follows −
• Classification
• Prediction
Classification models predict categorical class labels; and prediction models predict continuous valued functions.
For example, we can build a classification model to categorize bank loan applications as either safe or risky, or a
prediction model to predict the expenditures in dollars of potential customers on computer equipment given their
income and occupation.
What is classification?
Following are the examples of cases where the data analysis task is Classification −
• A bank loan officer wants to analyze the data in order to know which customers (loan applicants) are
risky and which are safe.
• A marketing manager at a company needs to predict whether a customer with a given profile will buy a
new computer.
In both of the above examples, a model or classifier is constructed to predict the categorical labels. These labels
are risky or safe for loan application data and yes or no for marketing data.
What is prediction?
Following are the examples of cases where the data analysis task is Prediction −
Suppose the marketing manager needs to predict how much a given customer will spend during a sale at his
company. In this example we are asked to predict a numeric value. Therefore the data analysis task is an
example of numeric prediction. In this case, a model or a predictor will be constructed that predicts a
continuous-valued function or ordered value.
Note − Regression analysis is a statistical methodology that is most often used for numeric prediction.
• The classifier is built from the training set made up of database tuples and their associated class labels.
• Each tuple that constitutes the training set is assumed to belong to a predefined category or class. These
tuples can also be referred to as samples, examples, or data points.
Decision tree:
A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal node denotes a test
on an attribute, each branch denotes the outcome of a test, and each leaf node holds a class label. The topmost
node in the tree is the root node.
The following decision tree is for the concept buy_computer that indicates whether a customer at a company is
likely to buy a computer or not. Each internal node represents a test on an attribute. Each leaf node represents a
class.
The benefits of having a decision tree are as follows −
• It does not require any domain knowledge.
• It is easy to comprehend.
• The learning and classification steps of a decision tree are simple and fast.
5. Clustering vs Classification
Though clustering and classification appear to be similar processes, there is a difference between them based on
their meaning. In the data mining world, clustering and classification are two types of learning methods. Both these
methods characterize objects into groups by one or more features. The key difference between clustering and
classification is that clustering is an unsupervised learning technique used to group similar instances on the
basis of features whereas classification is a supervised learning technique used to assign predefined tags to
instances on the basis of features.
What is Clustering?
Clustering is a method of grouping objects in such a way that objects with similar features come together, and
objects with dissimilar features go apart. It is a common technique for statistical data analysis used in machine
learning and data mining. Clustering can be used for exploratory data analysis and generalization.
Clustering belongs to unsupervised data mining, and clustering is not a single specific algorithm, but a general
method to solve the task. Clustering can be achieved by various algorithms. The appropriate cluster algorithm and
parameter settings depend on the individual data sets. It is not an automatic task, but it is an iterative process of
discovery. Therefore, it is necessary to modify data processing and parameter modeling until the result achieves
the desired properties. K-means clustering and Hierarchical clustering are two common clustering algorithms used
in data mining.
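The iterative assign-and-update process behind K-means can be sketched in a few lines for one-dimensional data; the points and starting centers below are invented, and a real implementation would also handle convergence checks and multiple dimensions.

```python
def kmeans_1d(points, centers, iterations=10):
    """Minimal one-dimensional k-means sketch."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], [0.0, 5.0])
```

Starting from poor initial centers, the two groups {1, 2, 3} and {10, 11, 12} are still recovered after a few iterations, which illustrates why clustering is described above as an iterative process of discovery.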
What is Classification?
Classification is a process of categorization where objects are recognized, differentiated and understood on the
basis of the training set of data. Classification is a supervised learning technique where a training set and correctly
defined observations are available.
The algorithm which implements classification is often known as the classifier, and the observations are often
known as the instances. K-Nearest Neighbor algorithm and decision tree algorithms are the most famous
classification algorithms used in data mining.
What is the difference between Clustering and Classification?
Clustering: Clustering is an unsupervised learning technique used to group similar instances on the basis of
features.
Classification: Classification is a supervised learning technique used to assign predefined tags to instances on
the basis of features.
## Whether you choose supervised or unsupervised learning should be based on whether or not you know what
the 'categories' of your data are. If you know, use supervised learning. If you do not know, then use unsupervised
learning.
6. Apriori algorithm: The Apriori algorithm is an association rule mining algorithm used in data mining. It is
used to find the frequent item sets among a given number of transactions. It is a classic algorithm used in
data mining for learning association rules. It is nowhere near as complex as it sounds; on the contrary, it is
very simple. Let me give you an example to explain it. Suppose you have records of a large number of
transactions at a shopping center as follows:
Learning association rules basically means finding the items that are purchased together more frequently than
others.
For example, in such a table you might see that Item1 and Item2 are bought together frequently.
Now, we follow a simple golden rule: we say an item/itemset is frequently bought if it is bought at least 60% of
the time. Since there are 5 transactions here, it should be bought at least 3 times.
For simplicity
M = Mango
O = Onion
And so on……
Original table:
Transaction ID   Items Bought
T1               {M, O, N, K, E, Y}
T2               {D, O, N, K, E, Y}
T3               {M, A, K, E}
T4               {M, U, C, K, Y}
T5               {C, O, O, K, I, E}
Step 1: Count the number of transactions in which each item occurs. Note that 'O = Onion' is bought 4 times in
total, but it occurs in just 3 transactions.
Item   No. of transactions
M 3
O 3
N 2
K 5
E 4
Y 3
D 1
A 1
U 1
C 2
I 1
Step 2: Now remember we said an item is considered frequently bought if it is bought at least 3 times. So in this
step we remove all the items that are bought less than 3 times from the above table, and we are left with
Item   Number of transactions
M 3
O 3
K 5
E 4
Y 3
These are the single items that are bought frequently. Now let's say we want to find a pair of items that are
bought frequently. We continue from the above table (table in Step 2).
Step 3: We start making pairs from the first item, like MO, MK, ME, MY, and then we start with the second item,
like OK, OE, OY. We did not do OM because we already did MO when we were making pairs with M, and buying
a Mango and Onion together is the same as buying Onion and Mango together. After making all the pairs we get:
Item pairs
MO
MK
ME
MY
OK
OE
OY
KE
KY
EY
Step 4: Now we count how many times each pair is bought together. For example, M and O is bought together
only in {M, O, N, K, E, Y}, while M and K is bought together 3 times, in {M, O, N, K, E, Y}, {M, A, K, E} and
{M, U, C, K, Y}. After doing that for all the pairs we get:
Step 5: Golden rule to the rescue. Remove all the item pairs with number of transactions less than three and we
are left with
These are the pairs of items frequently bought together.
Now let’s say we want to find a set of three items that are brought together.
We use the above table (table in step 5) and make a set of 3 items.
Step 6: To make the set of three items we need one more rule (it’s termed as self-join),
It simply means that, from the item pairs in the above table, we find two pairs with the same first letter, so we get
· OK and OE, this gives OKE
· KE and KY, this gives KEY
Then we find how many times O,K,E are bought together in the original table and same for K,E,Y and we get the
following table
While we are on this, suppose you have sets of 3 items, say ABC, ABD, ACD, ACE, BCD, and you want to
generate item sets of 4 items: you look for two sets having the same first two letters.
· ABC and ABD -> ABCD
· ACD and ACE -> ACDE
And so on … In general, you have to look for sets having just the last letter/item different.
Step 7: So we again apply the golden rule: the item set must be bought together at least 3 times, which leaves us
with just OKE, since K, E, Y are bought together just two times.
Thus the set of three items that are bought together most frequently are O,K,E.
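The worked example above can be sketched in code. This is a simplified Apriori (level-wise counting with a self-join for candidate generation, but without the subset-pruning step), not a production implementation:

```python
from itertools import combinations

# The five transactions from the worked example; the duplicate O in T5
# counts only once per transaction, since each transaction is a set.
transactions = [
    {"M", "O", "N", "K", "E", "Y"},
    {"D", "O", "N", "K", "E", "Y"},
    {"M", "A", "K", "E"},
    {"M", "U", "C", "K", "Y"},
    {"C", "O", "O", "K", "I", "E"},
]
MIN_COUNT = 3  # the "golden rule": at least 60% of 5 transactions

def frequent_itemsets(transactions, min_count):
    """Return every frequent itemset mapped to its transaction count."""
    items = sorted(set().union(*transactions))
    frequent = {}
    candidates = [frozenset([i]) for i in items]
    size = 1
    while candidates:
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c for c, n in counts.items() if n >= min_count}
        frequent.update({c: counts[c] for c in level})
        # Join step: combine frequent size-k sets into size-(k+1) candidates.
        size += 1
        candidates = list({a | b for a, b in combinations(level, 2)
                           if len(a | b) == size})
    return frequent

freq = frequent_itemsets(transactions, MIN_COUNT)
```

Running this reproduces the hand computation: {K} appears 5 times, {M, K} 3 times, {O, K, E} 3 times, and {K, E, Y} is discarded because it occurs only twice.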
Several major data mining techniques have been developed and used in data mining projects recently, including
association, classification, clustering, prediction, sequential patterns and decision trees. We will briefly examine
those data mining techniques in the following sections.
Association
Association is one of the best-known data mining techniques. In association, a pattern is discovered based on a
relationship between items in the same transaction. That is why the association technique is also known as the
relation technique. The association technique is used in market basket analysis to identify a set of products that
customers frequently purchase together.
Retailers use the association technique to research customers' buying habits. Based on historical sales data,
retailers might find that customers often buy crisps when they buy beer, and therefore they can put beer and
crisps next to each other to save time for customers and increase sales.
Classification
Classification is a classic data mining technique based on machine learning. Basically, classification is used to
classify each item in a set of data into one of a predefined set of classes or groups. Classification methods make
use of mathematical techniques such as decision trees, linear programming, neural networks and statistics. In
classification, we develop software that can learn how to classify data items into groups. For example, we can
apply classification in an application that, "given all records of employees who left the company, predicts who
will probably leave the company in a future period." In this case, we divide the records of employees into two
groups named "leave" and "stay". We can then ask our data mining software to classify the employees into the
separate groups.
Clustering
Clustering is a data mining technique that forms meaningful or useful clusters of objects that have similar
characteristics, using automatic techniques. The clustering technique defines the classes and puts objects in
each class, while in classification techniques objects are assigned to predefined classes. To make the
concept clearer, we can take book management in the library as an example. In a library, there is a wide range of
books on various topics available. The challenge is how to keep those books in a way that readers can take
several books on a particular topic without hassle. By using the clustering technique, we can keep books that
have some kinds of similarities in one cluster or one shelf and label it with a meaningful name. If readers want to
grab books in that topic, they would only have to go to that shelf instead of looking for the entire library.
Prediction
Prediction, as its name implies, is a data mining technique that discovers the relationship between dependent
and independent variables. For instance, the prediction analysis technique can be used in sales to predict future
profit if we consider sales as the independent variable and profit as the dependent variable. Then, based on
historical sales and profit data, we can draw a fitted regression curve that is used for profit prediction.
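A minimal sketch of this idea, fitting a least-squares line to invented sales/profit figures and using it to predict:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit: return (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented historical data: sales (independent) vs. profit (dependent).
sales  = [10, 20, 30, 40]
profit = [ 3,  5,  7,  9]   # exactly linear here: profit = 0.2 * sales + 1

slope, intercept = fit_line(sales, profit)
predicted = slope * 50 + intercept   # predicted profit for sales of 50
```

Real data would of course not lie exactly on the line; the fitted curve then gives the best linear approximation rather than an exact prediction.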
Sequential Patterns
Sequential pattern analysis is a data mining technique that seeks to discover or identify similar patterns, regular
events or trends in transaction data over a business period.
In sales, with historical transaction data, businesses can identify sets of items that customers buy together at
different times of the year. Businesses can then use this information to recommend these items to customers
with better deals, based on their purchasing frequency in the past.
Decision trees
The decision tree is one of the most commonly used data mining techniques because its model is easy for users
to understand. In the decision tree technique, the root of the decision tree is a simple question or condition that
has multiple answers. Each answer then leads to a set of questions or conditions that help us characterize the
data so that we can make the final decision based on it. For example, we use the following decision tree to
determine whether or not to play tennis:
Starting at the root node, if the outlook is overcast then we should definitely play tennis. If it is rainy, we should
only play tennis if the wind is weak. And if it is sunny, then we should play tennis if the humidity is normal.
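These rules can be written directly as code, which is essentially what a decision tree is: a nested set of attribute tests (a toy sketch of the tree just described):

```python
def play_tennis(outlook, wind, humidity):
    """The play-tennis decision tree as nested if-then tests."""
    if outlook == "overcast":
        return True                   # overcast: always play
    if outlook == "rainy":
        return wind == "weak"         # rainy: play only if the wind is weak
    if outlook == "sunny":
        return humidity == "normal"   # sunny: play only if humidity is normal
    raise ValueError("unknown outlook")

play_tennis("overcast", "strong", "high")
```

Each `if` corresponds to an internal node of the tree, and each returned True/False to a leaf carrying the class label.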
We often combine two or more of those data mining techniques together to form an appropriate process that
meets the business needs.
Formally, Bayesian networks are directed acyclic graphs whose nodes represent variables, and whose arcs
encode conditional independencies between the variables. The graph provides an intuitive description of the
dependency model and defines a simple factorization of the joint probability distribution leading to a
tractable model which is compatible with the encoded dependencies. Efficient algorithms exist to learn both
the graphical and the probabilistic models from data, thus allowing for the automatic application of this
methodology in complex problems. Bayesian networks that model sequences of variables (such as, for
example, time series of historical records) are called dynamic Bayesian networks. Generalizations of
Bayesian networks that can represent and solve decision problems under uncertainty are called influence
diagrams.
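A tiny sketch of the factorization a Bayesian network encodes, for the two-node graph A → B; the probability tables are invented for illustration.

```python
# P(A) and P(B | A) for the two-node network A -> B (invented numbers).
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {True:  {True: 0.9, False: 0.1},
               False: {True: 0.2, False: 0.8}}

def joint(a, b):
    """The factorized joint the graph encodes: P(A, B) = P(A) * P(B | A)."""
    return p_a[a] * p_b_given_a[a][b]

# Marginal P(B = True), obtained by summing the factorized joint over A.
p_b_true = sum(joint(a, True) for a in (True, False))
```

For larger networks the same principle applies: the joint distribution factorizes into one conditional table per node given its parents, which is what keeps the model tractable.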
On the other hand, neural networks are nonlinear models inspired by the functioning of the brain, which have
been designed to solve different problems. Thus, multi-layer perceptrons are regression-like algorithms to
build a deterministic model y=f(x), relating a set of predictors, x, and predictands, y (figure below, left). Self-
Organizing Maps (SOM) are competitive networks designed for clustering and visualization purposes (right).
UNIT-IV
KNOWLEDGE MANAGEMENT
Define knowledge management and knowledge management process.
Explain the types of knowledge
Write briefly about the failure of knowledge management
Characteristics of knowledge
What are the different issues and challenges for knowledge management?
Difference between tacit and explicit knowledge
Define expert knowledge
Advantages and disadvantages of knowledge management
Limitations of knowledge management
Define knowledge architecture in detail
What are the different benefits of knowledge management?
Write down the phases of knowledge management
Define
1. RER
2. CASE STUDY
3. KNOWLEDGE BANK
4. KNOWLEDGE CAFÉ
5. KNOWLEDGE MARKETPLACE
6. COTS
7. BRAINSTORMING
8. ROI
“Knowledge management is really about recognizing that regardless of what business you are in, you are
competing based on the knowledge of your employees”
• Strategy: Knowledge management strategy must be dependent on corporate strategy. The objective is to
manage, share, and create relevant knowledge assets that will help meet tactical and strategic requirements.
• Organizational Culture: The organizational culture influences the way people interact, the context within
which knowledge is created, the resistance they will have towards certain changes, and ultimately the way they
share (or the way they do not share) knowledge.
• Organizational Processes: The right processes, environments, and systems that enable KM to be implemented
in the organization.
• Management & Leadership: KM requires competent and experienced leadership at all levels. There are a wide
variety of KM-related roles that an organization may or may not need to implement, including a CKO,
knowledge managers, knowledge brokers and so on. More on this in the section on KM positions and roles.
• Technology: The systems, tools, and technologies that fit the organization's requirements - properly designed
and implemented.
• Politics: The long-term support to implement and sustain initiatives that involve virtually all organizational
functions, which may be costly to implement (both from the perspective of time and money), and which often do
not have a directly visible return on investment.
Why is knowledge management useful? It is useful because it places a focus on knowledge as an actual asset,
rather than as something intangible. In so doing, it enables the firm to better protect and exploit what it knows,
and to improve and focus its knowledge development efforts to match its needs.
Types of Knowledge
Once knowledge is created, it exists within the organization. However, before it can be reused or shared it must
be properly recognized and categorized. Within business and KM, three types of knowledge are usually defined,
namely explicit, tacit, and embedded knowledge.
• Explicit Knowledge: This is largely a process of sorting through documents and other records, as well as
discovering knowledge within existing data and knowledge repositories. For the latter, IT can be used to uncover
hidden knowledge by looking at patterns and relationships within data and text. The main tools/practices in this
case include intelligence gathering, data mining (finding patterns in large bodies of data and information), and
text mining (text analysis to search for knowledge, insights, etc.). Intelligence gathering is closely linked to
expert systems (Bali et al 2009) where the system tries to capture the knowledge of an expert, though the extent
to which they are competent for this task is questionable (Botha et al 2008).
• Tacit knowledge: Discovering and detecting tacit knowledge is a lot more complex and often it is up to the
management in each firm to gain an understanding of what their company's experts actually know. Since tacit
knowledge is considered as the most valuable in relation to sustained competitive advantage, this is a crucial
step, a step that often simply involves observation and awareness. There are several qualitative and quantitative
tools/practices that can help in the process; these include knowledge surveys, questionnaires, individual
interviews, group interviews, focus groups, network analysis, and observation. IT can be used to help identify
experts and communities. Groupware systems and other social/professional networks as well as expert finders
can point to people who are considered experts, and may also give an indication of the knowledge these
people/groups possess.
• Embedded knowledge: This implies an examination and identification of the knowledge trapped inside
organizational routines, processes, products, etc., which has not already been made explicit. Management must
essentially ask "why do we do something a certain way?" This type of knowledge discovery involves
observation and analysis, and the use of reverse engineering and modeling tools.
ROI: Illustrating the return on investment (ROI) for a portal solution or knowledge management (KM) system
means measuring the ROI of improved processes and the increased economic value of employee performance.
Thus, rather than employing traditional notions of value and assets as found in standard accounting practices,
KM solutions are tools managers should use to support opportunities for process improvement and redesign.
ROI that measures
value from this perspective creates new areas of value from an organization's existing, undervalued assets. A
well-developed measurement methodology for implementing a KM system may illustrate ROI, justify
expenditures for implementing the system, and provide a format to ensure that process improvement occurs. A
well-thought-out KM system has the capability of becoming the “digital nervous system” of an organization,
tying all areas to the strategic goals of an organization.
KM Failure Factors
Based on the works of numerous researchers and authors, I arrived at two categories of factors, namely "causal"
and "resultant".
Causal factors refer to fundamental problems within the organization, which lead to conditions that are not suitable
for KM. They are not always easily visible and they lead to a number of symptoms, which I have termed “resultant”
factors.
Causal Failure Factors:
• Lack of performance indicators and measurable benefits
• Inadequate management support
• Improper planning, design, coordination, and evaluation
• Inadequate skill of knowledge managers and workers
• Problems with organizational culture
• Improper organisational structure
Resultant Failure Factors:
• Lack of widespread contribution
• Lack of relevance, quality, and usability
• Overemphasis on formal learning, systematisation, and determinant needs
• Improper implementation of technology
• Improper budgeting and excessive costs
• Lack of responsibility and ownership
• Loss of knowledge from staff defection and retirement
Knowledge management process
The operational processes are the processes of actually carrying out KM, i.e. knowledge collection, sharing,
update, etc. Before elaborating on the processes and their sub-processes in the following sections, an overview
of the model is provided below. The figure shows the main processes of the model and their basic dependencies.
Overview of the Main Processes.
The coordination processes underlie the operational processes. In the figure, this is shown by the rectangle
lying behind all other processes. The operational processes are presented as the following main processes:
"Identification of Need", "Sharing", "Creation", "Collection and Storage", and "Update". Please note that there
are two processes that represent the main process "Sharing" in the model: "Knowledge Pull" and "Knowledge
Push". The arrows connecting the processes
provide an overview of the interaction and knowledge flows. The picture in the middle represents the place
where the knowledge is stored. The purpose of this picture, showing a human and a machine, is to express the
variety of possible ways of storing knowledge, including both technical (databases, documents, videos) and non-
technical
(human mind) repositories.
The general concept of the process model is that within the co-ordination processes the operational processes are
planned and initiated; together these make up the KM system. The main processes are described in the
following. “Identification of Need for Knowledge” identifies and specifies a need for knowledge. “Sharing”
is initiated in order to find out whether knowledge that already exists in the system can be used. This covers both
searching for knowledge by a person who needs it (“Knowledge Pull”) and feeding knowledge to recipients
who are known to be in need of it (“Knowledge Push”). If the needed knowledge is not yet available,
“Creation of Knowledge” is initiated. The resulting new knowledge then has to be collected, which is done in
“Knowledge Collection and Storage”.
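The pull and push flows described above can be sketched in code. This is a minimal illustration only; the class and method names (`KnowledgeRepository`, `store`, `pull`, `subscribe`) are hypothetical and not part of the model itself.

```python
# Hypothetical sketch of the "Knowledge Pull" / "Knowledge Push" flows.

class KnowledgeRepository:
    """Stores knowledge items by topic (a technical repository)."""

    def __init__(self):
        self._items = {}          # topic -> knowledge item
        self._subscribers = {}    # topic -> recipients known to need it

    def subscribe(self, topic, recipient):
        """Register a recipient known to be in need of this topic."""
        self._subscribers.setdefault(topic, []).append(recipient)

    def store(self, topic, item):
        """Knowledge Collection and Storage: keep newly created knowledge."""
        self._items[topic] = item
        # Knowledge Push: feed the knowledge to known recipients.
        for recipient in self._subscribers.get(topic, []):
            recipient.receive(topic, item)

    def pull(self, topic):
        """Knowledge Pull: a person searches for the knowledge they need."""
        return self._items.get(topic)  # None -> "Creation" must be initiated


class Employee:
    def __init__(self, name):
        self.name = name
        self.inbox = []

    def receive(self, topic, item):
        self.inbox.append((topic, item))


repo = KnowledgeRepository()
alice = Employee("Alice")
repo.subscribe("onboarding", alice)        # Alice is known to need this topic
repo.store("onboarding", "Checklist v2")   # push happens at storage time
print(repo.pull("onboarding"))             # pull by anyone who searches
print(alice.inbox)
```

A `pull` that returns nothing corresponds to the case where “Creation of Knowledge” must be initiated.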
Characteristics of knowledge: The most important characteristic of knowledge is non-rivalry, which means that
one person’s use of an idea does not preclude another person using it at the same time.
Difference between knowledge and information
Information | Knowledge
Static | Dynamic
Independent of the individual | Dependent on the individual
Explicit | Tacit
Digital | Analogue
Easy to duplicate | Must be re-created
Easy to broadcast | Shared mainly face-to-face
No intrinsic meaning | Meaning has to be personally assigned
Tacit knowledge (knowing-how): knowledge embedded in the human mind through experience and work. It is
know-how and learning embedded within the minds of people: personal wisdom and experience, context-specific,
and more difficult to extract and codify. Tacit knowledge includes insights and intuitions.
Explicit knowledge (knowing-that): knowledge codified and digitized in books, documents, reports, memos,
etc. It is documented information that can facilitate action: knowledge that is easily identified, articulated, shared
and employed.
Expert system
Expert systems (ES) are one of the prominent research domains of AI. They were introduced by researchers at the
Computer Science Department of Stanford University.
Expert systems are computer applications developed to solve complex problems in a particular domain,
at the level of human intelligence and expertise.
An expert system is an artificial-intelligence-based system that converts the knowledge of an expert in a specific
subject into software code. This code can be merged with other such code (based on the knowledge of other
experts) and used for answering questions (queries) submitted through a computer. Expert systems typically
consist of three parts:
(1) a knowledge base, which contains the information acquired by interviewing experts, and the logic
rules that govern how that information is applied;
(2) an inference engine, which interprets the submitted problem against the rules and logic of the
information stored in the knowledge base; and
(3) a user interface, which allows the user to express the problem in a human language such as English.
Despite earlier high hopes, expert systems technology has found application only in areas where information
can be reduced to a set of computational rules, such as insurance underwriting or some aspects of securities
trading. Expert systems are also called rule-based systems.
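The three parts can be illustrated with a toy rule engine: a knowledge base of facts and if-then rules, and an inference engine that matches each rule's premises against the facts. The car-diagnosis facts and rules below are invented purely for illustration; a real expert system would elicit them from a domain expert.

```python
# Minimal forward-matching rule engine; all data here is illustrative.

# Knowledge base: facts plus if-then rules elicited from an expert.
facts = {"engine_cranks": False, "battery_charged": False}
rules = [
    # (premises, conclusion)
    ({"battery_charged": False}, "replace_battery"),
    ({"engine_cranks": False, "battery_charged": True}, "check_starter"),
]

def infer(facts, rules):
    """Inference engine: fire every rule whose premises match the facts."""
    conclusions = []
    for premises, conclusion in rules:
        if all(facts.get(key) == value for key, value in premises.items()):
            conclusions.append(conclusion)
    return conclusions

# The user interface of a full ES would accept the problem in English;
# here we simply pass the fact dictionary directly.
print(infer(facts, rules))  # ['replace_battery']
```

Production-grade systems add features such as chaining of conclusions back into the facts and explanation of which rules fired, which is how an ES justifies its conclusions.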
Characteristics of Expert Systems
• High performance
• Understandable
• Reliable
• Highly responsive
Capabilities of Expert Systems
Expert systems are capable of:
• Advising
• Instructing and assisting humans in decision making
• Demonstrating
• Deriving a solution
• Diagnosing
• Explaining
• Interpreting input
• Predicting results
• Justifying the conclusion
• Suggesting alternative options to a problem
Applications of Expert Systems
Expert systems can be applied wherever knowledge can be reduced to computational rules, for example in design
(camera lenses, automobiles), medical diagnosis, monitoring and process control, fault finding in vehicles or
computers, and finance (fraud detection, stock-trading advice, airline and cargo scheduling).
Managing knowledge in your business
• Capturing and recording business knowledge - ensure your business has processes in place to capture and
record business knowledge.
• Sharing information and knowledge – develop a culture within your business for sharing knowledge between
employees.
• Business strategy and goals – without clear goals or a business strategy in place, the knowledge gathered will be
of no use to your business.
• Knowledge management systems – these systems can be costly and complex to understand but when utilised
properly can provide huge business benefits. It is important that staff are fully trained on these systems so that
they collect and record the right data.
Advantages of knowledge management
Consider the measurable benefits of capturing and using knowledge more effectively in your business. The
following are all possible outcomes:
• An improvement in the goods or services you offer and the processes that you use to sell them. For example,
identifying market trends before they happen might enable you to offer products and services to customers
before your competitors.
• Increased customer satisfaction because you have a greater understanding of their requirements through
feedback from customer communications.
• An increase in the quality of your suppliers, resulting from better awareness of what customers want and what
your staff require.
• Improved staff productivity, because employees are able to benefit from colleagues' knowledge and expertise to
find out the best way to get things done. They'll also feel more appreciated in a business where their ideas are
listened to.
• Increased business efficiency, by making better use of in-house expertise.
• Better recruitment and staffing policies. For instance, if you have increased knowledge of what your customers
are looking for, you're better able to find the right staff to serve them.
• The ability to sell or license your knowledge to others. You may be able to use your knowledge and expertise in
an advisory or consultancy capacity. In order to do so, though, make sure that you protect your intellectual
property.
Knowledge Management System Architecture
Developing a KMS is a complex task and requires careful planning before selecting the tools to support the
knowledge processes. The designed system architecture should suit the organizational culture and business
needs. A KMS can range from a simple file folder to a complex business intelligence system that uses
advanced data visualization and artificial intelligence. Several KMS architectures that aim to support
knowledge management processes and collaboration in the organization have been studied; even where
architectures differ in terms of functions and services, their major components are comparable. A general
KMS architecture is proposed by Tiwana [Tiwana 02], who pointed out
that a KMS should comprise four major components: repository, collaborative platform, network, and culture.
1. Repository holds explicated formal and informal knowledge, such as declarative knowledge, procedural
knowledge, causal knowledge, and context. This component acts as the core of the KMS, storing and
retrieving knowledge for future use.
2. Collaborative platform supports distributed work and incorporates pointers, skills databases, expert
locators, and informal communications channels.
3. Network means both physical and social networks that support communication and conversation. A
physical network is a ‘hard’ network such as an intranet, shared space, and backbone. A social network is a
‘soft’ network such as Communities of Practice (CoP), associations, and working groups.
4. Culture is the enabler to encourage sharing and use of the KMS. Research has revealed that the greatest
difficulty in KM is ‘‘changing people’s behavior,’’ and the current biggest impediment to knowledge
transfer is ‘‘culture’’.
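One service of the collaborative platform, the expert locator over a skills database, can be sketched as below; the people, skills, and function name are invented examples, not part of Tiwana's architecture.

```python
# Hypothetical expert-locator sketch for the collaborative platform component.

# Skills database: person -> set of declared skills (illustrative data).
skills_db = {
    "Maria": {"data visualization", "python"},
    "Ravi": {"underwriting", "risk models"},
    "Chen": {"python", "databases"},
}

def locate_experts(skill):
    """Return, in sorted order, the people whose profile includes the skill."""
    return sorted(name for name, skills in skills_db.items() if skill in skills)

print(locate_experts("python"))  # ['Chen', 'Maria']
```

In practice such a locator would sit alongside informal communication channels, so that finding an expert leads directly into a conversation rather than just a lookup.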
These four components are considered the basic elements of any knowledge management system. However,
other tools can be integrated to enhance the quality of the system's services. Tiwana also proposed a seven-layer
KMS architecture [Tiwana 02], which integrates these four components with their supporting
information technologies.
The seven-layer KMS architecture mirrors the OSI model (Open Systems Interconnection basic
reference model): it represents the functions and tools of a KMS in terms of layers that the
knowledge passes through. This architecture might suit complex systems that require network and data
manipulation.
It is important to have a life cycle in building knowledge management systems, because the life cycle provides
structure and order to the process. Additionally, the life cycle provides a breakdown of the activities into
manageable steps, good documentation for possible changes in the future, coordination of the project for a timely
completion, and regular management review at each phase of the cycle.
Many organizations leap into a knowledge management solution (e.g. document management, data mining,
blogging, and community forums) without first considering the purpose or objectives they wish to fulfill or how
the organization will adopt and follow best practices for managing its knowledge assets long term.
A successful knowledge management program will consider more than just technology. An organization should
also consider:
• People. They represent how you increase the ability of individuals within the organization to influence
others with their knowledge.
• Processes. They involve how you establish best practices and governance for the efficient and accurate
identification, management, and dissemination of knowledge.
• Technology. It addresses how you choose, configure, and utilize tools and automation to enable
knowledge management.
• Structure. It directs how you transform organizational structures to facilitate and encourage cross-
discipline awareness and expertise.
• Culture. It embodies how you establish and cultivate a knowledge-sharing, knowledge-driven culture.
8 Steps to Implementation
Implementing a knowledge management program is no easy feat. You will encounter many challenges along the
way including many of the following:
• Inability to recognize or articulate knowledge; turning tacit knowledge into explicit knowledge.
• Cultural barriers (e.g. “this is how we've always done it” mentality).
The following eight-step approach will enable you to identify these challenges so you can plan for them, thus
minimizing the risks and maximizing the rewards. This approach was developed based on logical, tried-and-true
activities for implementing any new organizational program. The early steps involve strategy, planning, and
requirements gathering while the later steps focus on execution and continual improvement.
Define the following terms:
1. RER: Rapid Evidence Review: A Rapid Evidence Review is a way of reviewing research and
evidence on a particular issue. It looks at what has been done in a particular area and records the
main outcomes. Evidence reviews can be run in several ways; some are more exhaustive in their
execution and ambitious in their scope.
The RER provides a quicker but still useful way of gathering and consolidating
knowledge. It is a useful building block from which to start work on a new project.
2. CASE STUDY: A case study is a written examination of a project, or important part of a project.
It has a clear structure that brings out key qualitative and quantitative information from the
project. Case studies are also published with a broad audience in mind, so it is useful to bring the
most useful and transferable information to the fore.
3. KNOWLEDGE BANK: Knowledge banks are online services and resources which hold
information, learning and support, giving you the power to improve your organization. They are
typically used to showcase the work of an organization and provide signposts to documents,
articles and toolkits.
4. KNOWLEDGE CAFÉ: A knowledge café brings people together to have open, creative
conversations on topics of mutual interest. It can be organized in a meeting or workshop format,
but the emphasis should be on free-flowing dialogue that allows people to share ideas and learn from
each other. It encourages people to explore issues that require discussion in order to build
consensus around an issue.
6. COTS: Customized Off-The-Shelf (COTS) – this is the traditional and most popular way of
deploying application services. Based on organizational needs, candidate applications are
identified and then examined against the functional needs of the organization. A short test
period may follow to identify the most suitable application. Once an application is acquired,
customization of its standard features is usually performed to integrate it into the organization's
information system.
7. BRAINSTORMING: Used to bring different perspectives to a problem, find key areas to focus on
in a project, or test new methods. Brainstorming usually happens in a workshop or meeting, with
small and large groups working together on ideas.
8. ROI: Return on investment measures the gain or loss generated on an investment relative to the
amount of money invested. ROI is usually expressed as a percentage and is typically used for
personal financial decisions, to compare a company's profitability or to compare the efficiency of
different investments.
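As a quick worked example of the definition above, ROI = (gain − cost) / cost × 100; the figures below are invented for illustration.

```python
def roi_percent(gain, cost):
    """Return on investment as a percentage of the amount invested."""
    return (gain - cost) / cost * 100

# A KM system costing 50,000 that yields 65,000 in efficiency savings:
print(roi_percent(65_000, 50_000))  # 30.0 (a 30% return)
```

A negative result means the investment lost money relative to its cost.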
9. CoP: Communities of Practice (CoP) are also called knowledge communities, knowledge
networks, learning communities, communities of interest and thematic groups. They consist of
groups of people with different skill sets, development histories and experience backgrounds who
work together to achieve commonly shared goals (Ruggles, 1997). These groups are different
from teams and task forces. People in a CoP can perform the same job or collaborate on a shared
task, e.g. software developers, or work together on a product, e.g. engineers, marketers, and
manufacturing specialists.