Artificial Intelligence
On
Practical Application of AI in Commercial Sector
Submitted By
Md. Moshiur Rahman Khan
Reg No.: 00600
Roll No.: 21
L-IV, S-I
CSE, PSTU.
Abstract
At the time of writing, almost every computing group in the world is urgently
starting up courses in Artificial Intelligence. This report is aimed at computer
scientists at the postgraduate or faculty level who have taken, or are concurrently
taking, a course in Artificial Intelligence. However, a good proportion of it will
also be useful to readers who are interested in the practical applications of
Artificial Intelligence but do not have a computational background.
This report is specifically intended to spread knowledge about the commercial
sector, where AI has great and varied application in areas such as OCR, speech
recognition, document management, and library catalogue management.
Introduction
One of the most fascinating research issues today is the investigation of the true nature of
intelligence: the study of cognitive processes and models, of natural language and
perception, and of human knowledge representation and reasoning.
What can we do today that we could not do thirty years ago? It is fortunate that AI has
several areas in which there has been sustained research over the past twenty to thirty
years. These areas are chess, natural language, speech, vision, robotics and expert
systems. I would like to illustrate the progress by providing a historical perspective on
some of these areas.
What is OCR?
Optical Character Recognition (OCR) is the task of converting images of printed text
into editable, machine-readable text.
Zonal OCR
Zonal OCR helps to automate data extraction from digital images. However, zonal OCR,
and OCR in general, is not entirely accurate, and review of the extracted data will be
required.
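As a minimal illustration, the sketch below shows how zone-based extraction might look in Python, assuming the Pillow and pytesseract libraries are installed; the zone coordinates and field names are hypothetical placeholders for one particular form layout.

    from PIL import Image
    import pytesseract

    # Hypothetical zones: (left, top, right, bottom) pixel boxes for known form fields
    ZONES = {
        "invoice_number": (40, 30, 300, 70),
        "date": (320, 30, 520, 70),
    }

    def extract_zones(image_path):
        image = Image.open(image_path)
        results = {}
        for field, box in ZONES.items():
            region = image.crop(box)                         # isolate one zone
            results[field] = pytesseract.image_to_string(region).strip()
        return results

    # Extracted values still need human review, since OCR is not entirely accurate.
    print(extract_zones("scanned_form.png"))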
Application of OCR
Existing vehicle license plate identification and recognition systems are notable for
either their accuracy or their speed, but not for a combination of both. Using the concept
of color coherence vectors, an image recognition algorithm is presented which utilizes the
extracted plate region and compares it as a whole to other images of license plates in the
database. The application developed for testing this algorithm works with an accuracy of
88 percent and an average processing time of two seconds per image. Key words and
phrases: vehicle license plate recognition, color coherence vectors, mathematical
morphology.
The problem of vehicle license plate recognition has been an interesting one over the
years. The applications of such a system are vast, ranging from parking lot security to
traffic management. The recognition process deviates from the conventional approach of
using Optical Character Recognition (OCR) systems and instead utilizes the concept of
color coherence vectors.
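A color coherence vector classifies each pixel of a given color as coherent or incoherent depending on whether it lies in a large connected region of that color. The sketch below is a simplified illustration in Python, assuming OpenCV and NumPy are available; the quantization level and coherence threshold are illustrative choices, not the values used by the system described here.

    import cv2
    import numpy as np

    def color_coherence_vector(image_bgr, levels=4, tau=100):
        # Quantize each channel to 'levels' values, then form one label per pixel
        quant = (image_bgr // (256 // levels)).astype(np.int32)
        labels = quant[:, :, 0] * levels * levels + quant[:, :, 1] * levels + quant[:, :, 2]
        ccv = {}
        for color in np.unique(labels):
            mask = (labels == color).astype(np.uint8)
            n, comp = cv2.connectedComponents(mask)
            coherent = incoherent = 0
            for c in range(1, n):                  # component 0 is the background
                size = int((comp == c).sum())
                if size >= tau:
                    coherent += size               # pixels in large regions are coherent
                else:
                    incoherent += size
            ccv[int(color)] = (coherent, incoherent)
        return ccv

Two plate images can then be compared by summing, over all colors, the differences between their coherent and incoherent counts.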
Using fuzzy logic and neural network algorithms, the character regions are segmented and
the characters within them are identified.
The first stage of this algorithm involves the identification of the region within the image
wherein the license plate is enclosed. Firstly, an original image similar to the one shown
in Figure 1 is converted to monochrome using two different thresholds.
FIG 3: Monochrome image with a threshold of 158 between black and white
Further, the image shown in Figure 3 is subjected to the dilation operation with a mask
size of nine by nine, with white being the target color. The results are shown in
Figure 4.
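The thresholding and dilation steps just described might be sketched as follows in Python with OpenCV; the input filename is a placeholder, while the threshold of 158 and the nine-by-nine mask come from the text.

    import cv2
    import numpy as np

    # Convert the original image to grayscale, then to monochrome at threshold 158
    gray = cv2.cvtColor(cv2.imread("plate_image.jpg"), cv2.COLOR_BGR2GRAY)
    _, mono = cv2.threshold(gray, 158, 255, cv2.THRESH_BINARY)

    # Dilate with a nine-by-nine mask; white regions grow and merge
    kernel = np.ones((9, 9), np.uint8)
    dilated = cv2.dilate(mono, kernel)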
Hence, during the comparison of these parameters, the buckets representing the characters
are compared to each other, and the one with the least overall error is displayed.
Table 1 shows some of the successful and failed cases encountered during the testing of
this proposed method.
Neural networks are commonly used for digital character recognition: partly
because of their popularity and appeal, but mainly because they provide excellent results,
perhaps among the best known for character recognition. The inherent pattern
recognition abilities of layered neural networks lend themselves perfectly to this type of
task, by autonomously learning the complex mappings in high-dimensional input data.
There are various forms of multi-layered neural network models, the most common
and applicable to this task being the standard feed-forward connectionist model, usually
trained by way of backpropagation. In general, multi-layered neural networks
perform a non-linear function of the linearly weighted sum of inputs in a
distributed manner and can be very powerful.
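A minimal sketch of such a feed-forward classifier, assuming PyTorch; the layer sizes are illustrative choices for 28x28-pixel character images with 10 output classes.

    import torch
    import torch.nn as nn

    # Standard feed-forward model: flatten pixels, one hidden layer, 10 classes
    model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 128),
        nn.ReLU(),                 # non-linear function of the weighted sum of inputs
        nn.Linear(128, 10),
    )

    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    def train_step(images, labels):
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()            # backpropagation of the error
        optimizer.step()           # update the weights
        return loss.item()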
What is GOCR?
GOCR is an OCR (Optical Character Recognition) program, developed under the GNU
General Public License. It converts scanned images of text back into text files.
GOCR can be used with different front-ends, which makes it very easy to port to different
OSes and architectures. It can open many different image formats, and its quality has
been improving on a daily basis.
1. Combining both, using FineReader and InftyReader in a pipe, letting each system
do what it is good at and then 'voting'.
2. A top-level (Java) program to automate the process and fix some deficiencies.
3. The out-of-the-box setup being unusable: fine-tuning and gradually enhancing the OCR
procedure and program parameters so that OCR results would be acceptable for DML-CZ
purposes.
Indexing
Full text retrieval
Full text representation
Full text representation with XML mark-up
OCR works best with originals or very clear copies, and with mono-spaced fonts like
Courier. If you have a choice, use source material of this kind.
OCR Limitations
There are several limitations of OCR systems. They are illustrated here:
Using text from a source with a font size of less than 12 points, or from a
fuzzy copy, will result in more errors.
Except for tab stops and paragraph marks, most document formatting is
lost during text scanning (bold, italic, and underline are sometimes
recognized).
The output from a finished text scan will be a single-column editable text
file. This text file will always require spellchecking and proofreading, as
well as reformatting to the desired final layout.
Scanning plain text files or printouts from a spreadsheet usually works, but
the text must be imported into a spreadsheet and reformatted to match the
original.
Conclusion
Optical character recognition is the task of converting images of text into their editable
textual representations. Most OCR systems for machine print text need large collections
of font styles and canonical character representations, whereby the recognition process
involves template matching for the input character images. Such systems are font
dependent and suffer in accuracy when given documents printed in novel font styles.
Speech Recognition
AI speech research, like AI more generally, often makes recourse to introspection about
human abilities. This subsection illustrates the use of introspection at four phases: when
setting long-term goals, when setting short-term goals, for design, and when debugging.
What is it?
Speech recognition has a long history of being one of the difficult problems in Artificial
Intelligence and Computer Science. As one goes from problem solving tasks to
perceptual tasks, the problem characteristics change dramatically: knowledge poor to
knowledge rich; low data rates to high data rates; slow response time (minutes to hours)
to instantaneous response time.
Speech was one of the first task domains to use a number of new concepts, such as
blackboard models, beam search, and reasoning in the presence of uncertainty, which are
now used widely within AI.
Model Description
The proposed model (see Fig. 1) is implemented as a dynamic Bayesian network. For
each word in the vocabulary, the model essentially consists of two parallel HMMs, one
for the audio stream and one for the video stream, each having N states per word. The
joint evolution of the HMM states is constrained by synchrony requirements imposed by
additional random variables.
The variable c_t simply checks that the degree of asynchrony between the two streams is
in fact equal to a_t. This is done by having c_t always observed with value 1, and defining
its distribution as

$$P(c_t = 1 \mid q_t^a, q_t^v, a_t) = \begin{cases} 1 & \text{if } |q_t^a - q_t^v| = a_t \\ 0 & \text{otherwise} \end{cases}$$

where q_t^a and q_t^v denote the audio- and video-stream HMM states at time t. This
model therefore has only a few extra parameters in addition to the parameters of the
individual streams if we allow asynchrony by a maximum of k states.
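In code, this consistency check is just an indicator function; a minimal sketch, assuming integer state indices:

    def p_c_equals_one(q_audio, q_video, a_t):
        # c_t is observed as 1 only when the streams' asynchrony equals a_t
        return 1.0 if abs(q_audio - q_video) == a_t else 0.0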
To evaluate the model at different levels of audio noise, we added babble noise from the
NOISEX database to the clean audio, and extracted Mel-frequency cepstral coefficients
(MFCCs) from the clean and noisy waveforms. The audio observations consisted of 14
MFCCs, plus first and second derivatives, minus the first energy coefficient, resulting in
a 41-dimensional vector in each frame. The original visual features were sampled at
29.97 Hz; however, to enable state-synchronous audio-visual fusion as a baseline method,
they were interpolated to 100 Hz to match the audio frame rate.
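A comparable feature pipeline might be sketched as follows in Python, assuming the librosa library; the filename and sampling rate are placeholders.

    import numpy as np
    import librosa

    # 14 MFCCs per frame (shape: [14, n_frames])
    y, sr = librosa.load("utterance.wav", sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=14)

    # First and second derivatives, stacked: 14 * 3 = 42 dimensions
    delta = librosa.feature.delta(mfcc)
    delta2 = librosa.feature.delta(mfcc, order=2)
    features = np.vstack([mfcc, delta, delta2])

    # Drop the first (energy) coefficient: 41 dimensions per frame, as in the text
    features = features[1:, :]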
Japan recently initiated a seven-year, $120 million project as the first phase towards
developing a phone system in which a Japanese speaker can converse with, say, an
English speaker in real time. This requires solutions to a number of currently unsolved
problems: a speech recognition system capable of recognizing a large (possibly
unlimited) vocabulary and spontaneous, unrehearsed, continuous speech; a natural
sounding speech synthesis preserving speaker characteristics; and a natural language
translation system capable of dealing with ambiguity, non-grammaticality, and
incomplete phrases.
One can also use POS tags, which capture the syntactic role of each word, as the basis of
the equivalence classes (Jelinek, 1985). Consider the utterances "load the oranges" and
"the load of bananas". The word "load" is being used as an untensed verb in the first
example and as a noun in the second, while "oranges" and "bananas" are both being
used as plural nouns. The POS tag of a word is influenced by, and influences, the
neighboring words and their POS tags. To use POS tags in language modeling, the
typical approach is to sum over all of the POS possibilities. Below, we give the
derivation based on using trigrams.
$$\Pr(W_{1,N}) = \sum_{P_{1,N}} \Pr(W_{1,N}, P_{1,N}) \approx \sum_{P_{1,N}} \prod_{i=1}^{N} \Pr(w_i \mid p_i)\,\Pr(p_i \mid p_{i-2}\, p_{i-1})$$
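The sketch below illustrates this summation on a toy vocabulary in Python; the tagset and probability tables are made-up placeholders, and a real model would estimate them from a tagged corpus.

    from itertools import product

    TAGS = ["N", "V", "D"]                        # toy tagset
    P_WORD = {("the", "D"): 0.5, ("load", "N"): 0.2, ("load", "V"): 0.3,
              ("oranges", "N"): 0.1}              # toy Pr(w_i | p_i)

    def p_tag(p, p1, p2):
        return 1.0 / len(TAGS)                    # toy uniform Pr(p_i | p_{i-2} p_{i-1})

    def sentence_probability(words):
        total = 0.0
        for tags in product(TAGS, repeat=len(words)):   # sum over all POS sequences
            prob = 1.0
            for i, (w, p) in enumerate(zip(words, tags)):
                p1 = tags[i - 1] if i >= 1 else "<s>"
                p2 = tags[i - 2] if i >= 2 else "<s>"
                prob *= P_WORD.get((w, p), 0.0) * p_tag(p, p1, p2)
            total += prob
        return total

    print(sentence_probability(["the", "load"]))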
To add POS tags into the language model, we refrain from simply summing over all
POS sequences as prior approaches have done. Instead, we redefine the speech
recognition problem so that it finds the best word and POS sequence. Let P be a POS
sequence for the word sequence W. The goal of the speech recognizer is now to solve
the following:

$$(\hat{W}, \hat{P}) = \arg\max_{W,P} \Pr(W, P \mid A) = \arg\max_{W,P} \Pr(A \mid W, P)\,\Pr(W, P)$$

where A denotes the acoustic signal.
POS Probabilities
For estimating the POS probability distribution, the algorithm starts with a single node
containing all of the training data. It then finds a question to ask about the POS tags and
word identities of the preceding words (P_{1,i-1}, W_{1,i-1}) in order to partition the
node into two leaves, each being more informative as to which POS tag occurred than
the parent node.
Word Probabilities
Starting the decision tree algorithm with a separate root node for each POS tag has the
following advantages. First, words only take on a small set of POS tags; for instance,
a word that is a superlative adjective cannot be a comparative adjective. For the Wall
Street Journal, each token on average takes on 1.22 of the 46 POS tags. Second,
if we start with a root node for each POS tag, the number of words that need to be
distinguished at each node in the tree is much smaller than the full vocabulary size.
Questions about POS Tags
Figure 1 gives the classification tree that we built for the POS tags from the
Trains corpus. The algorithm starts with each token in a separate class and
iteratively finds two classes to merge that result in the smallest loss of information
about POS adjacency. Rather than stopping at a certain number of classes, one
continues until only a single class remains. However, the order in which classes were
merged gives a hierarchical binary tree, with the root corresponding to the entire
tagset, each leaf to a single POS tag, and intermediate nodes to groupings of tags
that occur in statistically similar contexts.
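A greatly simplified sketch of this greedy merging in Python; the bigram counts are toy placeholders, and "information about POS adjacency" is measured here as the mutual information between adjacent classes.

    import math

    # Toy adjacency counts: counts[(left_tag, right_tag)] = frequency
    counts = {("DT", "NN"): 40, ("NN", "VB"): 25, ("DT", "JJ"): 15, ("JJ", "NN"): 20}

    def mutual_information(bigrams):
        total = sum(bigrams.values())
        left, right = {}, {}
        for (a, b), n in bigrams.items():
            left[a] = left.get(a, 0) + n
            right[b] = right.get(b, 0) + n
        mi = 0.0
        for (a, b), n in bigrams.items():
            p = n / total
            mi += p * math.log2(p / ((left[a] / total) * (right[b] / total)))
        return mi

    def merge(bigrams, x, y):
        new = x + "+" + y                          # merged class keeps both names
        out = {}
        for (a, b), n in bigrams.items():
            a2 = new if a in (x, y) else a
            b2 = new if b in (x, y) else b
            out[(a2, b2)] = out.get((a2, b2), 0) + n
        return out

    classes = sorted({t for pair in counts for t in pair})
    while len(classes) > 1:
        # Merge the pair that retains the most adjacency information
        pairs = [(x, y) for i, x in enumerate(classes) for y in classes[i + 1:]]
        best = max(pairs, key=lambda xy: mutual_information(merge(counts, *xy)))
        counts = merge(counts, *best)
        classes = sorted({t for pair in counts for t in pair})
        print("merged:", best)                     # merge order defines the binary tree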
LVCSR (Large-Vocabulary Continuous Speech Recognition) Flow
Hidden Markov Models
[Figure: a trellis of K hidden states (1 … K) unrolled over time, with observations
x1, x2, x3, … emitted at each step]
A General Case HMM
The Viterbi Algorithm
The quantity δt(i) is defined as the best score along a single path, at time t, which
accounts for the first t observations and ends in state i:

$$\delta_t(i) = \max_{q_1, \dots, q_{t-1}} P(q_1 \cdots q_{t-1},\ q_t = i,\ o_1 \cdots o_t \mid \lambda)$$
The Viterbi algorithm is similar in implementation to the forward calculation; the major
difference is the maximization over previous states.
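A minimal sketch of the Viterbi recursion in Python with NumPy, working in the log domain for numerical stability; the matrix names follow the usual HMM notation rather than anything defined in this report.

    import numpy as np

    def viterbi(log_pi, log_A, log_B, obs):
        """log_pi: [K] initial, log_A: [K, K] transition, log_B: [K, M] emission."""
        K, T = len(log_pi), len(obs)
        delta = np.zeros((T, K))            # best score ending in state i at time t
        psi = np.zeros((T, K), dtype=int)   # backpointers to the best previous state
        delta[0] = log_pi + log_B[:, obs[0]]
        for t in range(1, T):
            scores = delta[t - 1][:, None] + log_A   # [K, K]: previous -> current
            psi[t] = scores.argmax(axis=0)           # maximization over previous states
            delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
        # Trace the single best path backwards
        path = [int(delta[-1].argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(psi[t][path[-1]]))
        return path[::-1]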
For speech understanding, and more generally, AI approaches have seemed more
promising than traditional science and engineering. This is probably because AI
methodology exploits introspection (§2.2) and aesthetic considerations (§2.3), both of
which seem to provide the right answers with relatively little effort.
However, AI methodology has not fulfilled this promise for speech. The engineering
approach, in contrast, has produced solid and impressive results. The significance of this
is not obvious. Some AI researchers believe that this means that the speech community
should be “welcomed back” to AI, as argued by several recent editorials. But the values
of AI and mainstream (engineering-style) speech research are so different, as seen in §2,
that reconciliation does not seem likely.
Other AI researchers take the view that the results of engineering work are not interesting,
presumably meaning that they are compelling neither introspectively nor aesthetically.
Many further believe that the AI approach to speech will be vindicated in the end. A few
strive towards this goal (I was one). However, AI goals conflict with other more
important goals, and so it is hard to be optimistic about future attempts to apply AI
methodology to the speech understanding problem.
For speech understanding, AI methodology turns out to be of little value. Whether this is
also true in other cases is a question of great interest.
Conclusion
Let me conclude by first saying that the field is more exciting than ever before. Our
recent advances are significant and substantial, and the mythical AI winter may have
turned into an AI spring. I see many flowers blooming. There are so many successes that
I never cease to be amazed by these new and creative uses of AI.
We are not, and never have been, an island unto ourselves. Finally, all parts of AI belong
together. Success in AI requires advances in all of its disparate parts, including chess,
cognitive science, logic, and connectionism.
Each of these experiments yields new insights that are crucial to the ultimate success of
the whole enterprise. What can you do? I believe the time has come for each of us to
become a responsible spokesman for the entire field. This requires some additional effort
on our part to be articulate and convincing about the progress and prospects of AI.
Finally, choose your favorite grand challenge relevant to the nation, and work on it.
Document Management
Introduction
Current document management technology grows out of the business community where
some 80% of corporate information resides in documents. The need for greater
efficiencies in handling business documents to gain an edge on the competition has
fueled the rapid development of Document Management Systems (DMS) over the last
two years.
Document management has replaced data management, the focus of computing for
the last twenty years, as the latest challenge facing information technologists. This
paper provides a general overview of document management, its associated standards,
trends for the year to come, and prominent vendors of document management
products.
Classification
Documents fall into two broad classes: static images and editable documents. These two
classes differ largely in the fact that images are static, while editable documents are
dynamic and changing. The functions associated with the two classes differ as well.
The elements of a DMS include software to perform all functions necessary to manage
the document across an organization from cradle to grave. Each element is described
below.
Underlying infrastructure. While not part of an application per se, an appropriate
underlying infrastructure is nevertheless a prerequisite to supporting a DMS. The
infrastructure is the set of desktop computers, workstations, and servers that are
interconnected by LANs and/or WANs.
Authoring. Authoring tools support document creation. Some more sophisticated tools
support structured or guided authoring, where authors are constrained by the system to
enter data in specified ways.
Workflow. Workflow is defined as the coordination of tasks, data, and people to make a
business process more efficient, effective, and adaptable to change. It is the control of
information throughout all phases of a process.
Storage. The core of the DMS is the database and search engines supporting storage and
retrieval of documents. Traditionally relational, DMS databases are moving toward
object-oriented designs.
Library services. Not to be confused with what librarians consider to be library services,
this is a term used specifically by the document management community to refer to
document control mechanisms such as checkin, checkout, audit trail, protection/security,
and version control.
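As a toy illustration of these document control mechanisms, the Python sketch below models checkout/checkin locking, version control, and an audit trail; it is a minimal sketch, not a description of any particular DMS product.

    import datetime

    class ManagedDocument:
        def __init__(self, name, content=""):
            self.name = name
            self.versions = [content]          # version control: all revisions kept
            self.checked_out_by = None         # checkout lock
            self.audit_trail = []              # who did what, and when

        def _log(self, user, action):
            self.audit_trail.append((datetime.datetime.now(), user, action))

        def checkout(self, user):
            if self.checked_out_by is not None:
                raise RuntimeError("already checked out by " + self.checked_out_by)
            self.checked_out_by = user
            self._log(user, "checkout")

        def checkin(self, user, new_content):
            if self.checked_out_by != user:
                raise RuntimeError("checkin requires a prior checkout")
            self.versions.append(new_content)  # revisions never overwrite each other
            self.checked_out_by = None
            self._log(user, "checkin")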
The electronic documents that are the very lifeblood of the modern business are all too
often taken totally for granted. Very few businesses take the time to consider the
expenses that they incur on a daily basis because of:
Time and effort wasted in locating documents. Recent research has indicated
that nearly 10% of an average office worker’s day is spent trying to locate
existing information and documents.
Redundant effort, necessitated because it is often easier to recreate something
than to try to find it.
Time and effort involved in figuring out who has the latest version of a
document, and recovering when various revisions overwrite each other.
Unnecessary usage of network storage devices and network bandwidth,
because the documents are dispersed everywhere across the enterprise, rather
than centralized.
Likewise, few businesses take the time to consider the substantial risks that they expose
themselves to on a daily basis because:
Security is applied haphazardly at best, which exposes important information
to scrutiny by potentially inappropriate people.
Critical documents are stored -- often exclusively -- on laptop computers that
could be lost, stolen, or damaged at any time.
Documents stored centrally on Windows network drives, once deleted, do not
go into a recycle bin, as is commonly believed. They simply disappear, and must
be restored slowly from tape backups (if you're lucky enough to have those).
No record exists of precisely who has viewed and/or edited a document. It’s
therefore impossible to audit a business process to uncover mistakes or
inefficiencies.
Static documents, such as digital images, are the least flexible: they cannot be edited or
made machine readable without further processing (i.e., optical character recognition).
Next along are static but editable documents, such as word processing documents and
spreadsheets. While modifiable, these documents are: a) tied explicitly to one application
(e.g., WordPerfect); b) considered to be "dumb" in that they contain little or no
information about themselves; and c) typically flat files, or information blobs, prohibiting
access to specific information elements within them. The most flexible class consists of
dynamic, virtual documents. They are not tied to one application or platform; they are
dynamic, constantly in a process of change; and they are "intelligent", carrying
information about their content and structure.
For librarians, the idea of a virtual document (a document that exists only briefly, and
which changes with each particular viewing and each particular viewer) presents a
nightmare scenario.
When considering the use of DMSs in libraries, questions arise such as: how well
does the prevailing business model of document management (controlling proprietary
documents as a means to an end, not as the end in itself) map onto library models? What
are the business-critical objectives of libraries in general, and of the National Library in
particular? What are the documents to be managed? Who are the clients? Are the
documents internal documents (e.g., memos and reports), publications emanating from
the National Library, or documents of which we are the custodians but for which we do
not necessarily hold copyright?
OLE (Object Linking and Embedding). This proprietary standard from Microsoft
allows objects in one application to be linked to objects in another. For example, a graph
in a word processing document can be linked to the original data in a spreadsheet
application.
Much technology has appeared in the last two years to support document management. In
fact, so much has appeared that the current problem facing organizations is the
integration of document management technologies. The overall trends in document
management, as reported in Open Information and Document Systems, are:
Most managers seeking document solutions are looking for compound document
management, electronic and paper delivery, document image management, and
workflow.
Object-orientation is the trend, but most DMSs are relational.
Advanced document viewers are appearing.
On-demand printing is growing rapidly.
A growing number of companies are providing DMSs. The companies vary widely in the
number of DMS functions they implement and in the manner in which the functions are
implemented. Major companies providing DMSs include:
Documentum
Electronic Book Technologies
Folio Corporation
Frame Technology
Fulcrum Technologies
Information Dimensions, Inc.
Interleaf
Open Text Corporation
Oracle Corporation
PC Docs, Inc.
A relational or object-relational database is the repository for the "objects" or file
references. The document management system acts as the file manager, pointing to
references or maintaining "objects" in the database. The DMS has an interface or
front-end to applications; profiles contain information to identify documents and are
used for file searches. Applications used for creating files, spreadsheets, engineering
diagrams, or publishing forms are integrated with the DMS or accessed through it.
The full range of functions that a document management system may perform includes
document identification, storage and retrieval, tracking, version control, workflow
management, and presentation.
It is important to note that document management is not yet a single technology, but
several. The major challenge at this time is the integration of several software packages
(those for image storage and retrieval, workflow management, compound document
management, and document presentation) into a single integrated system. To facilitate
this process, vendors of DMSs are forming alliances and creating common standards to
provide an open approach to the technologies.
In the longer term, experts in the field predict that document management functions will
cease to be implemented as separate, dedicated applications. They instead will be
incorporated as basic tools of operating systems, much like current file access
mechanisms.
Library Catalogue Management
Today, the library is under pressure to become more efficient while delivering even better
experiences for library users. Faced with these challenges, libraries have to modernize
nearly every aspect of their operations. Today's libraries want the underlying library
management system to help with a number of aims:
Bibliographic staff will find that the new cataloguing editor offers both more flexibility
and more guidance for record creation and editing. Real-time addition of new records to
the indexes that enable OPAC retrieval will make new stock immediately visible within
the catalogue. Enhanced facilities for defining format in MARC21 will enable multi-
media items to be more accurately and consistently catalogued.
Investment in new stock, and the deployment of staff time to administer selection and
acquisition, represents a huge proportion of library expenditure, so it is a key area for
delivering efficiency savings for many libraries. Today we are seeing many changes in
the way libraries view procurement and collection management and, over time, this will
have an impact on the use of the ILS.
In the public sector, the widespread implementation of CRM systems as a tool for
delivering e-government has led to services and systems being orientated around the
citizen and their interactions with local government services. In academic institutions the
fee-paying student is also increasingly the ‘customer’ around which appropriate services
and resources must be built.
Conclusion
Procuring a new Library Management System (LMS) is a complex process: this briefing
is a broad, practical, non-technical introduction. It is primarily intended for FE (further
education) library staff and their managers.
References
http://www.cs.toronto.edu/~smalik/downloads/report_407.pdf
Rao, V., and Rao, H. (1995). Neural Networks and Fuzzy Logic. MIS Press, New York.
Barber, D. (2004). Learning from Data Lecture Notes: Nearest Neighbour Classification.
http://www.anc.ed.ac.uk/~amos/lfd/lectures/lfd_2005_nearest_neighbour.pdf
http://www.infoRouter.com
http://www.unisys.com/public_sector
http://www.hclinfosystems.in
http://www.cs.colorado.edu/~martin/slp.html