Bidirectional Associative Memory
There are two types of associative memory, auto-associative and hetero-associative.[1] BAM is hetero-associative, meaning that given a pattern it can return another pattern which is potentially of a different size. It is similar to the Hopfield network in that both are forms of associative memory; however, Hopfield nets return patterns of the same size.
Topology
A BAM contains two layers of neurons, which we shall denote X and Y. Layers X and Y are fully connected to each other. Once the weights have been established, presenting an input pattern to layer X recalls the associated pattern on layer Y, and vice versa.
Procedure
Learning
Imagine we wish to store two associations, A1:B1 and A2:B2:

A1 = (1, 0, 1, 0, 1, 0), B1 = (1, 1, 0, 0)
A2 = (1, 1, 1, 0, 0, 0), B2 = (1, 0, 1, 0)

These are converted to bipolar form:

X1 = (1, -1, 1, -1, 1, -1), Y1 = (1, 1, -1, -1)
X2 = (1, 1, 1, -1, -1, -1), Y2 = (1, -1, 1, -1)

The weight matrix is the sum of the outer products, M = X1^T Y1 + X2^T Y2, where ^T denotes the transpose. So,

M =
 ( 2  0  0 -2)
 ( 0 -2  2  0)
 ( 2  0  0 -2)
 (-2  0  0  2)
 ( 0  2 -2  0)
 (-2  0  0  2)
Recall
To retrieve the association A1, we multiply A1 by M to get (4, 2, -2, -4), which, when thresholded (positive entries map to 1, the rest to 0), yields (1, 1, 0, 0), which is B1. To find the reverse association, multiply B1 by the transpose of M.
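To make the procedure concrete, here is a minimal sketch in Python with NumPy. The variable names and the simple positive-value threshold are choices made for this illustration, not part of the original description.

```python
import numpy as np

# Bipolar forms of the two associations from the example above
X = np.array([[1, -1, 1, -1, 1, -1],
              [1,  1, 1, -1, -1, -1]])
Y = np.array([[1,  1, -1, -1],
              [1, -1,  1, -1]])

# Learning: the sum of the outer products X_i^T Y_i gives the weight matrix M
M = X.T @ Y

def threshold(v):
    """Map positive sums to 1 and the rest to 0 (binary output)."""
    return (v > 0).astype(int)

# Recall: present the binary pattern A1 on layer X and read layer Y
A1 = np.array([1, 0, 1, 0, 1, 0])
print(A1 @ M)              # [ 4  2 -2 -4]
print(threshold(A1 @ M))   # [1 1 0 0]  -> B1

# Reverse recall: present B1 on layer Y and multiply by the transpose of M
B1 = np.array([1, 1, 0, 0])
print(threshold(B1 @ M.T))  # [1 0 1 0 1 0]  -> A1
```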
Recurrent neural network
A recurrent neural network (RNN) is a class of neural network where connections between units form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. This makes them applicable to tasks such as unsegmented connected handwriting recognition, where they have achieved the best known results.[1]
Architectures
Fully recurrent network
This is the basic architecture developed in the 1980s: a network of neuron-like units, each with a directed connection to every other unit. Each unit has a time-varying real-valued activation. Each connection has a modifiable real-valued weight. Some of the nodes are called input nodes, some output nodes, and the rest hidden nodes. Most architectures below are special cases. In supervised learning with discrete time steps, training sequences of real-valued input vectors become sequences of activations of the input nodes, one input vector at a time. At any given time step, each non-input unit computes its current activation as a nonlinear function of the weighted sum of the activations of all units from which it receives connections. There may be teacher-given target activations for some of the output units at certain time steps. For example, if the input sequence is a speech signal corresponding to a spoken digit, the final target output at the end of the sequence may be a label classifying the digit. For each sequence, its error is the sum of the deviations of all target signals from the corresponding activations computed by the network. For a training set of numerous sequences, the total error is the sum of the errors of all individual sequences. Algorithms for minimizing this error are mentioned in the section on training algorithms below. In reinforcement learning settings, no teacher provides target signals for the RNN; instead, a fitness function or reward function is occasionally used to evaluate the RNN's performance, which influences its input stream through output units connected to actuators that affect the environment. Again, compare the section on training algorithms below.
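As a rough illustration of the update just described, the following Python sketch steps a fully connected network through an input sequence; the sizes, the tanh nonlinearity and the convention of clamping the first few units to the input are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_inputs = 8, 3                             # assumed sizes for illustration
W = 0.1 * rng.normal(size=(n_units, n_units))        # a modifiable real-valued weight per connection

def step(activations, input_vector):
    """One discrete time step: every unit applies a nonlinear function to the weighted
    sum of the activations it receives; the input nodes are then clamped to the input."""
    new = np.tanh(W @ activations)
    new[:n_inputs] = input_vector                    # treat the first n_inputs units as input nodes
    return new

activations = np.zeros(n_units)
for x in rng.normal(size=(5, n_inputs)):             # one input vector per time step
    activations = step(activations, x)
```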
Hopfield network
The Hopfield network is of historic interest although it is not a general RNN, as it is not designed to process sequences of patterns. Instead it requires stationary inputs. It is a RNN in which all connections are symmetric. Invented by John Hopfield in 1982, it guarantees that its dynamics will converge. If the connections are trained using Hebbian learning then the Hopfield network can perform as robust content-addressable memory, resistant to connection alteration. A variation on the Hopfield network is the bidirectional associative memory (BAM). The BAM has two layers, either of which can be driven as an input, to recall an association and produce an output on the other layer.[2]
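A minimal sketch of Hebbian storage and asynchronous recall in a Hopfield network is shown below; the stored patterns and the update schedule are illustrative choices, not taken from the text.

```python
import numpy as np

# Two bipolar patterns to store (illustrative values)
patterns = np.array([[1,  1, 1, -1, -1, -1],
                     [1, -1, 1, -1,  1, -1]])

# Hebbian learning: symmetric weights, zero self-connections
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)

def recall(state, sweeps=5):
    """Asynchronous updates (one unit at a time), whose dynamics converge to a stored pattern."""
    state = state.copy()
    for _ in range(sweeps):
        for i in range(len(state)):
            state[i] = 1 if W[i] @ state >= 0 else -1
    return state

noisy = np.array([1, 1, -1, -1, -1, -1])   # first stored pattern with one bit flipped
print(recall(noisy))                        # -> [ 1  1  1 -1 -1 -1]
```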
Elman networks and Jordan networks
The following special case of the basic architecture above was employed by Jeff Elman. A three-layer network is used (arranged vertically as x, y, and z in the illustration), with the addition of a set of "context units" (u in the illustration). There are connections from the middle (hidden) layer to these context units, fixed with a weight of one.[3] At each time step, the input is propagated in a standard feed-forward fashion, and then a learning rule is applied. The fixed back-connections result in the context units always maintaining a copy of the previous values of the hidden units (since they propagate over the connections before the learning rule is applied). Thus the network can maintain a sort of state, allowing it to perform tasks such as sequence prediction that are beyond the power of a standard multilayer perceptron. Jordan networks, due to Michael I. Jordan, are similar to Elman networks; the context units are, however, fed from the output layer instead of the hidden layer. The context units in a Jordan network are also referred to as the state layer, and have a recurrent connection to themselves with no other nodes on this connection.[3] Elman and Jordan networks are also known as "simple recurrent networks" (SRN).
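The copy-back mechanism of an Elman network can be sketched as follows; the layer sizes, weights and nonlinearity are assumptions, and the learning rule itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 3, 5, 2                      # assumed sizes for illustration
W_xh = 0.1 * rng.normal(size=(n_hidden, n_in))       # input -> hidden (trainable)
W_ch = 0.1 * rng.normal(size=(n_hidden, n_hidden))   # context -> hidden (trainable)
W_hy = 0.1 * rng.normal(size=(n_out, n_hidden))      # hidden -> output (trainable)

def elman_step(x, context):
    """One time step: a standard feed-forward pass, with the context units
    supplying the previous hidden activations as extra input."""
    hidden = np.tanh(W_xh @ x + W_ch @ context)
    output = W_hy @ hidden
    return output, hidden

context = np.zeros(n_hidden)                         # context units start empty
for x in rng.normal(size=(4, n_in)):                 # a short input sequence
    y, hidden = elman_step(x, context)
    context = hidden                                 # fixed weight-one copy back: context <- hidden
```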
Bi-directional RNN
Invented by Schuster & Paliwal in 1997,[9] bi-directional RNNs (BRNN) use a finite sequence to predict or label each element of the sequence based on both the past and the future context of the element. This is done by adding the outputs of two RNNs, one processing the sequence from left to right, the other from right to left. The combined outputs are the predictions of the teacher-given target signals. This technique proved to be especially useful when combined with LSTM RNNs.[10]
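A sketch of the idea, with assumed sizes and randomly initialized weights: two independent RNNs read the finite sequence in opposite directions, and their outputs are added to form a prediction for every element.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hidden, n_out = 4, 6, 3                            # assumed sizes for illustration

def rnn_states(sequence, W, U):
    """Simple tanh RNN; returns the hidden state at every position of the sequence."""
    h = np.zeros(W.shape[0])
    states = []
    for x in sequence:
        h = np.tanh(W @ h + U @ x)
        states.append(h)
    return np.array(states)

W_f, U_f = 0.1 * rng.normal(size=(n_hidden, n_hidden)), 0.1 * rng.normal(size=(n_hidden, n_in))
W_b, U_b = 0.1 * rng.normal(size=(n_hidden, n_hidden)), 0.1 * rng.normal(size=(n_hidden, n_in))
V_f, V_b = 0.1 * rng.normal(size=(n_out, n_hidden)), 0.1 * rng.normal(size=(n_out, n_hidden))

sequence = rng.normal(size=(7, n_in))                      # the finite input sequence
forward = rnn_states(sequence, W_f, U_f)                   # left-to-right pass
backward = rnn_states(sequence[::-1], W_b, U_b)[::-1]      # right-to-left pass, realigned

# Each element's prediction adds the outputs of the two RNNs, as described above
predictions = forward @ V_f.T + backward @ V_b.T           # shape (7, n_out)
```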
Continuous-time RNN
A continuous time recurrent neural network (CTRNN) is a dynamical systems model of biological neural networks. A CTRNN uses a system of ordinary differential equations to model the effects on a neuron of the incoming spike train. CTRNNs are more computationally efficient
than directly simulating every spike in a network, as they do not model neural activations at this level of detail.[citation needed] For a neuron i in the network with action potential y_i, the rate of change of activation is given by:

τ_i dy_i/dt = -y_i + Σ_j w_ji σ(y_j - Θ_j) + I_i(t)

Where:

τ_i : Time constant of postsynaptic node
y_i : Activation of postsynaptic node
dy_i/dt : Rate of change of activation of postsynaptic node
w_ji : Weight of connection from presynaptic node j to postsynaptic node i
σ(x) : Sigmoid of x, e.g. σ(x) = 1/(1 + e^(-x))
y_j : Activation of presynaptic node
Θ_j : Bias of presynaptic node
I_i(t) : Input (if any) to node i
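A forward-Euler integration of this equation might look like the following sketch; the two-neuron network, its parameter values and the step size are assumptions chosen only to illustrate the update.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ctrnn_step(y, dt, tau, W, theta, I):
    """One Euler step of  tau_i * dy_i/dt = -y_i + sum_j w_ji * sigma(y_j - theta_j) + I_i(t)."""
    dydt = (-y + W.T @ sigmoid(y - theta) + I) / tau
    return y + dt * dydt

# Illustrative two-neuron network (parameter values are assumptions)
tau = np.array([1.0, 2.5])
W = np.array([[0.0, 5.0],
              [-5.0, 0.0]])          # W[j, i] = weight from presynaptic node j to postsynaptic node i
theta = np.array([-2.5, 2.5])
y = np.zeros(2)
for _ in range(1000):
    y = ctrnn_step(y, dt=0.01, tau=tau, W=W, theta=theta, I=np.zeros(2))
```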
CTRNNs have frequently been applied in the field of evolutionary robotics, where they have been used to address, for example, vision,[11] co-operation[12] and minimally cognitive behaviour.[13]
Hierarchical RNN
There are many instances of hierarchical RNN whose elements are connected in various ways to decompose hierarchical behavior into useful subprograms.[14][15]
Training
Gradient descent
To minimize total error, gradient descent can be used to change each weight in proportion to its derivative with respect to the error, provided the non-linear activation functions are differentiable. Various methods for doing so were developed in the 1980s and early 1990s by Paul Werbos, Ronald J. Williams, Tony Robinson, Jürgen Schmidhuber, Sepp Hochreiter, Barak Pearlmutter, and others. The standard method is called "backpropagation through time" or BPTT, and is a generalization of back-propagation for feed-forward networks;[19][20] like that method, it is an instance of automatic differentiation in the reverse accumulation mode of Pontryagin's minimum principle. A more computationally expensive online variant is called "Real-Time Recurrent Learning" or RTRL,[21][22] which is an instance of automatic differentiation in the forward accumulation mode with stacked tangent vectors. Unlike BPTT, this algorithm is local in time but not local in space.[23][24] There is also an online hybrid between BPTT and RTRL with intermediate complexity,[25][26] and there are variants for continuous time.[27] A major problem with gradient descent for standard RNN architectures is that error gradients vanish exponentially quickly with the size of the time lag between important events.[28][29] The Long short term memory architecture together with a BPTT/RTRL hybrid learning method was introduced in an attempt to overcome these problems.[6]
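The following sketch shows backpropagation through time for a small tanh RNN with a squared-error target at the end of the sequence; the architecture, variable names and single end-of-sequence target are simplifying assumptions, not a description of any particular implementation.

```python
import numpy as np

def bptt_grads(xs, target, W, U, V):
    """Minimal BPTT for a tanh RNN with error E = 0.5 * ||V h_T - target||^2 at the last step."""
    h = np.zeros(W.shape[0])
    hs, pre = [h], []
    for x in xs:                             # unroll forward in time
        a = W @ h + U @ x
        h = np.tanh(a)
        pre.append(a)
        hs.append(h)
    err = V @ h - target                     # dE/dy at the end of the sequence

    dW, dU, dV = np.zeros_like(W), np.zeros_like(U), np.zeros_like(V)
    dV += np.outer(err, h)
    dh = V.T @ err
    for t in reversed(range(len(xs))):       # walk back through the unrolled time steps
        da = dh * (1 - np.tanh(pre[t]) ** 2)
        dW += np.outer(da, hs[t])            # hs[t] is the hidden state before step t
        dU += np.outer(da, xs[t])
        dh = W.T @ da
    return dW, dU, dV

rng = np.random.default_rng(4)
W, U, V = 0.1 * rng.normal(size=(5, 5)), 0.1 * rng.normal(size=(5, 3)), 0.1 * rng.normal(size=(2, 5))
xs, target = rng.normal(size=(6, 3)), np.array([1.0, -1.0])
dW, dU, dV = bptt_grads(xs, target, W, U, V)
# A gradient-descent update would then be, e.g., W -= learning_rate * dW
```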
Global optimization methods
Training the weights of a recurrent neural network can also be treated as a global optimization problem, with a genetic algorithm searching over chromosomes that encode the network's weights. There are many chromosomes that make up the population; therefore, many different neural networks are evolved until a stopping criterion is satisfied. A common stopping scheme is: 1) when the neural network has learnt a certain percentage of the training data, or 2) when the minimum value of the mean-squared-error is satisfied, or 3) when the maximum number of training generations has been reached. The stopping criterion is evaluated by the fitness function, which takes the reciprocal of the mean-squared-error of each neural network during training. Therefore, the goal of the genetic algorithm is to maximize the fitness function and hence reduce the mean-squared-error. Other global (and/or evolutionary) optimization techniques, such as simulated annealing or particle swarm optimization, may also be used to seek a good set of weights.
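A toy version of this scheme might look like the sketch below: each chromosome is a flat vector of network weights, fitness is the reciprocal of the mean-squared-error, and training stops when the error is low enough or the generation limit is reached. The tiny network, dataset and mutation scheme are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # toy training data
t = np.array([0.0, 1.0, 1.0, 0.0])                            # targets (XOR, for illustration)

def mse(w):
    """Mean-squared-error of a tiny 2-2-1 tanh network encoded as a flat chromosome."""
    W1, b1 = w[:4].reshape(2, 2), w[4:6]
    W2, b2 = w[6:8], w[8]
    y = np.tanh(X @ W1.T + b1) @ W2 + b2
    return np.mean((y - t) ** 2)

population = [rng.normal(size=9) for _ in range(50)]           # chromosomes = weight vectors
for generation in range(200):                                  # stop at the generation limit...
    population.sort(key=lambda w: -1.0 / (mse(w) + 1e-12))     # fitness = reciprocal of the MSE
    if mse(population[0]) < 1e-3:                              # ...or when the error is small enough
        break
    parents = population[:10]
    population = parents + [p + rng.normal(scale=0.2, size=9)  # mutated offspring
                            for p in parents for _ in range(4)]
```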
Memory
Memory is the process of maintaining information over time. (Matlin, 2005) Memory is the means by which we draw on our past experiences in order to use this information in the present (Sternberg, 1999).
Memory is the term given to the structures and processes involved in the storage and subsequent retrieval of information. Memory is essential to all our lives. Without a memory of the past, we cannot operate in the present or think about the future. We would not be able to remember what we did yesterday, what we have done today or what we plan to do tomorrow. Without memory we could not learn anything. Memory is involved in processing vast amounts of information. This information takes many different forms, e.g. images, sounds or meaning. For psychologists the term memory covers three important aspects of information processing:
1. Memory Encoding
When information comes into our memory system (from sensory input), it needs to be changed into a form that the system can cope with, so that it can be stored. (Think of this as similar to changing your money into a different currency when you travel from one country to another).
For example, a word which is seen (on the whiteboard) may be stored if it is changed (encoded) into a sound or a meaning (i.e. semantic processing). There are three main ways in which information can be encoded (changed): 1. Visual (picture), 2. Acoustic (sound), 3. Semantic (meaning). For example, how do you remember a telephone number you have looked up in the phone book? If you can see it then you are using visual coding, but if you are repeating it to yourself you are using acoustic coding (by sound). Evidence suggests that the principal coding system in short-term memory (STM) is acoustic coding. When a person is presented with a list of numbers and letters, they will try to hold them in STM by rehearsing them (verbally). Rehearsal is a verbal process regardless of whether the list of items is presented acoustically (someone reads them out) or visually (on a sheet of paper). The principal encoding system in long-term memory (LTM) appears to be semantic coding (by meaning). However, information in LTM can also be coded both visually and acoustically.
2. Memory Storage
This concerns the nature of memory stores, i.e. where the information is stored, how long the memory lasts (duration), how much can be stored at any time (capacity) and what kind of information is held. The way we store information affects the way we retrieve it. There has been a significant amount of research regarding the differences between short-term memory (STM) and long-term memory (LTM). Most adults can store between 5 and 9 items in their short-term memory. Miller (1956) put this idea forward and called it the magic number 7. He thought that short-term memory capacity was 7 (plus or minus 2) items because it only had a certain number of slots in which items could be stored. However, Miller didn't specify the amount of information that can be held in each slot. Indeed, if we can chunk information together we can store a lot more information in our short-term memory. In contrast, the capacity of LTM is thought to be unlimited. Information can only be stored for a brief duration in STM (0-30 seconds), but LTM can last a lifetime.
3. Memory Retrieval
This refers to getting information out of storage. If we can't remember something, it may be because we are unable to retrieve it. When we are asked to retrieve something from memory, the differences between STM and LTM become very clear. STM is stored and retrieved sequentially. For example, if a group of participants are given a list of words to remember and then asked to recall the fourth word on the list, participants go through the list in the order they heard it in order to retrieve the information. LTM is stored and retrieved by association. This is why you can remember what you went upstairs for if you go back to the room where you first thought about it. Organizing information can help aid retrieval. You can organize information in sequences (such as alphabetically, by size or by time). Imagine a patient being discharged from hospital whose treatment involved taking various pills at various times, changing their dressing and doing exercises. If the doctor gives these instructions in the order in which they must be carried out throughout the day (i.e. in sequence of time), this will help the patient remember them.
be generalized to real life. As a result, many memory experiments have been criticized for having low ecological validity.
Memory has the ability to encode, store and recall information. Memories give an organism the capability to learn and adapt from previous experiences as well as build relationships. Encoding allows the perceived item of use or interest to be converted into a construct that can be stored within the brain and recalled later from short term or long term memory. Working memory stores information for immediate use or manipulation which is aided through hooking onto previously archived items already present in the long-term memory of an individual.
Visual encoding
Visual encoding is the process of encoding images and visual sensory information. This means that people can convert the new information that they store into mental pictures (Harrison & Semin, 2009, Psychology, New York, p. 222). Visual sensory information is temporarily stored within our iconic memory[1] and working memory before being encoded into permanent long-term storage.[2][3] Baddeley's model of working memory states that visual information is stored in the visuo-spatial sketchpad.[1] The amygdala is a complex structure that has an important role in visual encoding. It accepts visual input in addition to input from other systems and encodes the positive or negative values of conditioned stimuli.[4]
Elaborative Encoding
Elaborative encoding is the process of actively relating new information to knowledge that is already in memory. Memories are a combination of old and new information, so the nature of any particular memory depends as much on the old information already in our memories as it does on the new information coming in through our senses. In other words, how we remember something depends on how we think about it at the time. Many studies have shown that long-term retention is greatly enhanced by elaborative encoding.[5]
Acoustic encoding
Acoustic encoding is the encoding of auditory impulses. According to Baddeley, processing of auditory information is aided by the concept of the phonological loop, which allows input within our echoic memory to be sub-vocally rehearsed in order to facilitate remembering.[1] When we hear any word, we do so by hearing individual sounds, one at a time. Hence the memory of the beginning of a new word is stored in our echoic memory until the whole sound has been perceived and recognized as a word.[6] Studies indicate that lexical, semantic and phonological factors interact in verbal working memory. The phonological similarity effect (PSE) is modified by word concreteness. This emphasizes that verbal working memory performance cannot exclusively be attributed to phonological or acoustic representation, but also includes an interaction of linguistic representation.[7] What remains to be seen is whether linguistic representations are expressed at the time of recall or whether they participate in a more fundamental role in encoding and preservation.
There are three or four main types of encoding:

Acoustic encoding is the processing and encoding of sound, words and other auditory input for storage and later retrieval. This is aided by the concept of the phonological loop, which allows input within our echoic memory to be sub-vocally rehearsed in order to facilitate remembering.

Visual encoding is the process of encoding images and visual sensory information. Visual sensory information is temporarily stored within the iconic memory before being encoded into long-term storage. The amygdala (within the medial temporal lobe of the brain, which has a primary role in the processing of emotional reactions) fulfills an important role in visual encoding, as it accepts visual input in addition to input from other systems and encodes the positive or negative values of conditioned stimuli.

Tactile encoding is the encoding of how something feels, normally through the sense of touch. Physiologically, neurons in the primary somatosensory cortex of the brain react to vibrotactile stimuli caused by the feel of an object.

Semantic encoding is the process of encoding sensory input that has particular meaning or can be applied to a particular context, rather than deriving from a particular sense.

Did You Know: When presented with a visual stimulus, the part of the brain which is activated the most depends on the nature of the image. A blurred image, for example, activates the visual cortex at the back of the brain most. An image of an unknown face activates the associative and frontal regions most. An image of a face which is already in working memory activates the frontal regions most, while the visual areas are scarcely stimulated at all.
Fuzzy logic
Fuzzy logic is a form of many-valued logic; it deals with reasoning that is approximate rather than fixed and exact. Compared to traditional binary sets (where variables may take on true or false values), fuzzy logic variables may have a truth value that ranges in degree between 0 and 1. Fuzzy logic has been extended to handle the concept of partial truth, where the truth value may range between completely true and completely false.[1] Furthermore, when linguistic variables are used, these degrees may be managed by specific functions. The term "fuzzy logic" was introduced with the 1965 proposal of fuzzy set theory by Lotfi A. Zadeh.[2][3] Fuzzy logic has been applied to many fields, from control theory to artificial intelligence. Fuzzy logics, however, had been studied since the 1920s as infinite-valued logics, notably by Łukasiewicz and Tarski.[4]
Overview
Classical logic only permits propositions having a value of truth or falsity. The notion that 1+1=2 is an absolute, immutable, mathematical truth. However, there exist certain propositions with variable answers, such as asking various people to identify a color. The notion of truth does not fall by the wayside; rather, a means of representing and reasoning over partial knowledge is afforded, by aggregating all possible outcomes into a dimensional spectrum. Both degrees of truth and probabilities range between 0 and 1 and hence may seem similar at first. For example, let a 100 ml glass contain 30 ml of water. Then we may consider two concepts: Empty and Full. The meaning of each of them can be represented by a certain fuzzy set. Then one might define the glass as being 0.7 empty and 0.3 full. Note that the concept of emptiness would be subjective and thus would depend on the observer or designer. Another designer might equally well design a set membership function where the glass would be considered full for all values down to 50 ml. It is essential to realize that fuzzy logic uses truth degrees as a mathematical model of the vagueness phenomenon, while probability is a mathematical model of ignorance.
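The glass example can be written out directly; the linear membership functions below are just one possible design, reflecting the subjectivity noted above.

```python
def empty(volume_ml, capacity_ml=100.0):
    """One possible membership function for 'Empty': the degree falls linearly with the fill level."""
    return 1.0 - volume_ml / capacity_ml

def full(volume_ml, capacity_ml=100.0):
    """Complementary membership function for 'Full' under the same design."""
    return volume_ml / capacity_ml

print(empty(30), full(30))   # 0.7 and 0.3 for 30 ml of water in a 100 ml glass
```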
In this image, the meanings of the expressions cold, warm, and hot are represented by functions mapping a temperature scale. A point on that scale has three "truth values"one for each of the three functions. The vertical line in the image represents a particular temperature that the three arrows (truth values) gauge. Since the red arrow points to zero, this temperature may be interpreted as "not hot". The orange arrow (pointing at 0.2) may describe it as "slightly warm" and the blue arrow (pointing at 0.8) "fairly cold".
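The same idea in code: three membership functions over a temperature scale, each returning a truth value for a given temperature. The piecewise-linear shapes and breakpoints are assumptions chosen so that a cool temperature comes out roughly 0.8 cold, 0.2 warm and 0.0 hot, as in the description.

```python
def cold(t_celsius):
    """Illustrative piecewise-linear membership function; the breakpoints are assumptions."""
    return max(0.0, min(1.0, (15.0 - t_celsius) / 15.0))

def hot(t_celsius):
    return max(0.0, min(1.0, (t_celsius - 25.0) / 15.0))

def warm(t_celsius):
    return max(0.0, 1.0 - cold(t_celsius) - hot(t_celsius))

t = 3.0
print(cold(t), warm(t), hot(t))   # 0.8 cold, 0.2 warm, 0.0 hot
```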
Linguistic variables
While variables in mathematics usually take numerical values, in fuzzy logic applications non-numeric values are often used to facilitate the expression of rules and facts.[5] A linguistic variable such as age may have a value such as young or its antonym old. However, the great utility of linguistic variables is that they can be modified via linguistic hedges applied to primary terms. The linguistic hedges can be associated with certain functions.
Early applications
The Japanese were the first to utilize fuzzy logic for practical applications. The first notable application was on the high-speed train in Sendai, in which fuzzy logic was able to improve the economy, comfort, and precision of the ride.[6] It has also been used in recognition of hand written symbols in Sony pocket computers[citation needed], Canon auto-focus technology[citation needed], Omron auto-aiming cameras[citation needed], earthquake prediction and modeling at the Institute of Seismology Bureau of Metrology in Japan[citation needed], etc.
Example
Hard science with IF-THEN rules
Fuzzy set theory defines fuzzy operators on fuzzy sets. The problem in applying this is that the appropriate fuzzy operator may not be known. For this reason, fuzzy logic usually uses IF-THEN rules, or constructs that are equivalent, such as fuzzy associative matrices.
Rules are usually expressed in the form:

IF variable IS property THEN action

For example, a simple temperature regulator that uses a fan might look like this:

IF temperature IS very cold THEN stop fan
IF temperature IS cold THEN turn down fan
IF temperature IS normal THEN maintain level
IF temperature IS hot THEN speed up fan
There is no "ELSE"; all of the rules are evaluated, because the temperature might be "cold" and "normal" at the same time to different degrees. The AND, OR, and NOT operators of boolean logic exist in fuzzy logic, usually defined as the minimum, maximum, and complement; when they are defined this way, they are called the Zadeh operators. So for the fuzzy variables x and y:

NOT x = (1 - truth(x))
x AND y = minimum(truth(x), truth(y))
x OR y = maximum(truth(x), truth(y))
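Putting the pieces together, the sketch below evaluates all four fan rules for a given temperature and defines the Zadeh operators; the triangular membership functions and their breakpoints are assumptions, and the final defuzzification step (turning the rule degrees into a single fan speed) is not shown.

```python
# Zadeh operators, as defined above
def f_not(x): return 1.0 - x
def f_and(x, y): return min(x, y)
def f_or(x, y): return max(x, y)

def tri(t, lo, peak, hi):
    """Triangular membership function; the breakpoints used below are assumptions for this sketch."""
    if t <= lo or t >= hi:
        return 0.0
    return (t - lo) / (peak - lo) if t < peak else (hi - t) / (hi - peak)

def fan_rules(temp):
    """Evaluate every rule; there is no ELSE, each action gets a degree of applicability."""
    return {
        "stop fan":       tri(temp, -40.0, -10.0, 5.0),   # IF temperature IS very cold
        "turn down fan":  tri(temp, 0.0, 10.0, 20.0),     # IF temperature IS cold
        "maintain level": tri(temp, 15.0, 22.0, 30.0),    # IF temperature IS normal
        "speed up fan":   tri(temp, 25.0, 40.0, 80.0),    # IF temperature IS hot
    }

print(fan_rules(18.0))   # "cold" and "normal" both apply, to different degrees
print(f_and(f_not(tri(18.0, 0.0, 10.0, 20.0)), tri(18.0, 15.0, 22.0, 30.0)))  # "NOT cold AND normal"
```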
There are also other operators, more linguistic in nature, called hedges that can be applied. These are generally adverbs such as "very", or "somewhat", which modify the meaning of a set using a mathematical formula.
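For example, hedges are often implemented as simple functions of the membership degree; the particular formulas below (squaring for "very", square root for "somewhat") are conventional choices rather than definitions taken from this text.

```python
def very(truth):
    return truth ** 2          # concentration: "very warm" is a stricter version of "warm"

def somewhat(truth):
    return truth ** 0.5        # dilation: "somewhat warm" is a looser version of "warm"

print(very(0.8), somewhat(0.8))   # 0.64 and about 0.89
```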
Logical analysis
In mathematical logic, there are several formal systems of "fuzzy logic"; most of them belong among so-called t-norm fuzzy logics.
Monoidal t-norm-based propositional fuzzy logic MTL is an axiomatization of logic where conjunction is defined by a left-continuous t-norm, and implication is defined as the residuum of the t-norm. Its models correspond to MTL-algebras that are prelinear commutative bounded integral residuated lattices.

Basic propositional fuzzy logic BL is an extension of MTL logic where conjunction is defined by a continuous t-norm, and implication is also defined as the residuum of the t-norm. Its models correspond to BL-algebras.

Łukasiewicz fuzzy logic is the extension of basic fuzzy logic BL where standard conjunction is the Łukasiewicz t-norm. It has the axioms of basic fuzzy logic plus an axiom of double negation, and its models correspond to MV-algebras.

Gödel fuzzy logic is the extension of basic fuzzy logic BL where conjunction is the Gödel t-norm. It has the axioms of BL plus an axiom of idempotence of conjunction, and its models are called G-algebras.

Product fuzzy logic is the extension of basic fuzzy logic BL where conjunction is the product t-norm. It has the axioms of BL plus another axiom for cancellativity of conjunction, and its models are called product algebras.

Fuzzy logic with evaluated syntax (sometimes also called Pavelka's logic), denoted by EV, is a further generalization of mathematical fuzzy logic. While the above kinds of fuzzy logic have traditional syntax and many-valued semantics, in EV the syntax is evaluated as well. This means that each formula has an evaluation. Axiomatization of EV stems from Łukasiewicz fuzzy logic. A generalization of the classical Gödel completeness theorem is provable in EV.
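For reference, the three continuous t-norms named above can be written directly as functions of two truth degrees; this is a small sketch showing only the conjunction (t-norm) itself, not the corresponding implication (residuum), and the function names are descriptive.

```python
def lukasiewicz_and(a, b):
    return max(0.0, a + b - 1.0)   # Łukasiewicz t-norm

def goedel_and(a, b):
    return min(a, b)               # Gödel (minimum) t-norm

def product_and(a, b):
    return a * b                   # product t-norm

a, b = 0.7, 0.6
print(lukasiewicz_and(a, b), goedel_and(a, b), product_and(a, b))   # 0.3, 0.6, 0.42
```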
Synthesis of fuzzy logic functions given in tabular form
It is known that any boolean logic function can be represented using a truth table mapping each set of variable values into the set of values {0,1}. The task of synthesizing a boolean logic function given in tabular form is one of the basic tasks in traditional logic and is solved via the disjunctive (or conjunctive) perfect normal form. Each fuzzy (continuous) logic function can be represented by a choice table containing all possible variants of comparing the arguments and their negations. A choice table maps each variant into the value of an argument or the negation of an argument; for instance, for two arguments, a row of the choice table contains an ordering of the values x1, ¬x1, x2, ¬x2 together with the corresponding function value. The task of synthesizing a fuzzy logic function given in tabular form was solved in [7]. New concepts of constituents of minimum and maximum were introduced, and the sufficient and necessary conditions that a choice table defines a fuzzy logic function were derived.
Fuzzy databases
Once fuzzy relations are defined, it is possible to develop fuzzy relational databases. The first fuzzy relational database, FRDB, appeared in Maria Zemankova's dissertation. Later, some other models arose like the Buckles-Petry model, the Prade-Testemale Model, the Umano-Fukami model or the GEFRED model by J.M. Medina, M.A. Vila et al. In the context of fuzzy databases, some fuzzy querying languages have been defined, highlighting the SQLf by P. Bosc et al. and the FSQL by J. Galindo et al. These languages define some structures in order to include fuzzy aspects in the SQL statements, like fuzzy conditions, fuzzy comparators, fuzzy constants, fuzzy constraints, fuzzy thresholds, linguistic labels and so on. Much progress has been made to take fuzzy logic database applications to the web and let the world easily use them, for example: http://sullivansoftwaresystems.com/cgi-bin/fuzzy-logicmatch-algorithm.cgi?SearchString=garia This enables fuzzy logic matching to be incorporated into a database system or application.
Comparison to probability
Fuzzy logic and probability are different ways of expressing uncertainty. While both fuzzy logic and probability theory can be used to represent subjective belief, fuzzy set theory uses the concept of fuzzy set membership (i.e., how much a variable is in a set), and probability theory uses the concept of subjective probability (i.e., how probable do I think that a variable is in a set). While this distinction is mostly philosophical, the fuzzy-logic-derived possibility measure is inherently different from the probability measure, hence they are not directly equivalent. However, many statisticians are persuaded by the work of Bruno de Finetti that only one kind of mathematical uncertainty is needed and thus fuzzy logic is unnecessary. On the other hand, Bart Kosko argues[citation needed] that probability is a subtheory of fuzzy logic, as probability only handles one kind of uncertainty. He also claims[citation needed] to have proven a derivation of Bayes'
theorem from the concept of fuzzy subsethood. Lotfi A. Zadeh argues that fuzzy logic is different in character from probability, and is not a replacement for it. He fuzzified probability to fuzzy probability and also generalized it to what is called possibility theory. (cf.[8]) More generally, fuzzy logic is one of many different proposed extensions to classical logic, known as probabilistic logics, intended to deal with issues of uncertainty in classical logic, the inapplicability of probability theory in many domains, and the paradoxes of Dempster-Shafer theory.
Relation to ecorithms
Harvard's Dr. Leslie Valiant, co-author of the Valiant-Vazirani theorem, uses the term "ecorithms" to describe how many less exact systems and techniques like fuzzy logic (and "less robust" logic) can be applied to learning algorithms. Valiant essentially redefines machine learning as evolutionary. Ecorithms and fuzzy logic also have the common property of dealing with possibilities more than probabilities, although feedback and feedforward, basically stochastic "weights," are a feature of both when dealing with, for example, dynamical systems. In general use, ecorithms are algorithms that learn from their more complex environments (hence eco) to generalize, approximate and simplify solution logic. Like fuzzy logic, they are methods used to overcome continuous variables or systems too complex to completely enumerate or understand discretely or exactly. See in particular p. 58 of the reference comparing induction/invariance, robust, mathematical and other logical limits in computing, where techniques including fuzzy logic and natural data selection (ala "computational Darwinism") can be used to shortcut computational complexity and limits in a "practical" way (such as the brake temperature