Abstracts for Volume 6 (1995 issues) of IJNS are now available online. This is a free service from World Scientific, intended to enhance the scientific community's access to material contributed to our journals.
You may view the Contents of the following issues of this journal, and then download the Abstracts.
Volume 6 Number 1 (March 1995)
Noise-Enhanced Performance in a Cortical Associative Memory Model
H. Liljenström & X.-B. Wu
Pruning of a Large Network by Optimal Brain Damage and Surgeon: An Example from Biological Sequence Analysis
N. Tolstrup
Learning with Piece-Wise Linear Networks
M. Staley
Analysis of Training Set Parallelism for Backpropagation Neural Networks
S. K. Foo, P. Saratchandran & N. Sundararajan
Human-like Dynamic Programming Neural Networks for Dynamic Time Warping Speech Recognition
C.-C. Chiu & M. A. Shanblatt
Locally Activated Neural Networks and Stable Neural Controller Design for Nonlinear Dynamic Systems
H.-G. Kim & S.-Y. Oh
You may view the Abstracts of papers in this issue.
Ensemble Competitive Learning Neural Networks with Reduced Input Dimension
J. W. Kim, J. S. Ahn & S. Cho
Dynamical Recurrent Neural Networks - Towards Environmental Time Series Prediction
A. Aussem, F. Murtagh & M. Sarazin
On the Functional Equivalence of Fuzzy Inference Systems and Spline-Based Networks
K. J. Hunt, R. Haas & M. Brown
Fuzzy Neural Networks: Between Functional Equivalence and Probability
S. K. Halgamuge & M. Glesner
A Perspective and Critique of Adaptive Neurofuzzy Systems used for Modelling and Control Applications
M. Brown & C. J. Harris
You may view the Abstracts of papers in this issue.
Volume 6 Number 3 (September 1995)
On the Equivalence of Two-Layered Perceptrons with Binary Neurons
M. Blatt, E. Domany & I. Kanter
Methods of Training and Constructing Multilayer Perceptrons with Arbitrary Pattern Sets
X. Liang & S.-W. Xia
Learning in Recurrent Finite Difference Networks
F.-S. Tsung & G. W. Cottrell
Neural Network Based Dynamic Controllers for Industrial Robots
S.-Y. Oh, W.-C. Shin & H.-G. Kim
Use of Multilayer Feedforward Neural Networks as a Display Method for Multidimensional Distributions
L. Garrido, V. Gaitan, M. Serra-Ricart & X. Calbet
Adaptive Fuzzy Control of Unstable Nonlinear Systems
C.-J. Lin & C.-T. Lin
Training Neural Networks by Means of Genetic Algorithms Working on Very Long Chromosomes
P. G. Korning
Recognition of Telugu Characters using Neural Networks
M. B. Sukhaswami, P. Seetharamulu & A. K. Pujari
Human Chromosome Classification using Multilayer Perceptron Neural Network
B. Lerner, H. Guterman, I. Dinstein & Y. Romem
You may view the Abstracts of papers in this issue.
Nonlinear gated experts for time series: discovering regimes and avoiding overfitting
Andreas S. Weigend, Morgan Mangeas, & Ashok N. Srivastava
Can deterministic penalty terms model the effects of synaptic weight noise on network fault-tolerance?
Peter J. Edwards & Alan F. Murray
Comparison of multilayer neural network and nearest neighbor classifiers for handwritten digit recognition
Hong Yan
An application of Hamiltonian neurodynamics using Pontryagin's maximum (minimum) principle
Takamasa Koshizen & John Fulcher
Multilayer perceptrons to approximate complex valued functions
P. Arena, L. Fortuna, R. Re & M. G. Xibilia
Remote sensing operations
Fouad Badran, Carlos Mejia, Sylvie Thiria & Michel Crepon
A correlation significance learning scheme for auto-associative memories
Donq-Liang Lee & Wen-June Wang
Theoretical results for a class of neural networks
Steve G. Romaniuk
You may view the Abstracts of papers in this issue.
Volume 7 Number 1 (March 1996)
An attractor neural network model of classical conditioning
Sergio D. Serulnik & Moshe Gur
Analytical interpretation of feed-forward nets outputs after training
Lluis Garrido & Sergio Gomez
Speaker identification using time-delay HMEs
Ke Chen, Dahong Xie & Huisheng Chi
Hardware prototypes of a Boolean neural network and the simulated annealing optimization method
Jarkko Niittylahti
A unified neural bigradient algorithm for robust PCA and MCA
Liuyue Wang & Juha Karhunen
On the characteristics of the quadratic order associative memory that uses synchronous update and direct convergence
Jung-hua Wang
An adaptive Boolean automaton to model Circadian cycles
R. P. J. Perazzo & A. Schuschny
Role of chaos in trial-and-error problem solving by an artificial neural network
Ichiro Obana & Yasuhiro Fukui
1995 sees the International Journal of Neural Systems enter its sixth year of existence. It must now be considered a mature journal. But maturity is no goal in itself, and evolutionary changes must take place over time. It would be sad if a journal devoted to adaptive computation itself resisted adaptation.
Looking back, it appears to us that the field of neural networks has fulfilled quite a number of the original - then heavily hyped - expectations, somewhat in contrast to the lack of progress elsewhere in the field of artificial intelligence during the last thirty years.
The impact areas have mostly been applications and theory. Artificial neural networks are today part of a standard toolbox for computer modeling of complex nonlinear systems, used for example in optical character recognition, financial forecasting, industrial quality assessment, and understanding the internal structure of mappings, for example in the genetic code. The solid relationship between statistical physics and neural networks has provided an excellent theoretical framework for neural machine learning, shedding light on the black box interior of the networks and carrying the subject beyond engineering habits of trial and error. Though it would not be fair to say that artificial neural networks have led to breakthroughs in the understanding of the brain, the modeling of biological neural systems is still an important goal.
In the coming year you will see changes in the Editorial Board, reflecting developments in the field and its evolving cohort of active scientists. More reviews, perhaps even of a provocative and controversial nature, will appear, and it is our intention to expand the Letters section to accommodate more concise papers. We are trying to speed up the refereeing process, become even more critical in dealing with papers using toy examples and synthetic data, and ease the paper submission procedures by electronic means.
The scope of the journal is still so broad as to encompass all of the subjects that come together under the heading Studies of Neural Systems. The scientific pluralism that this field engenders is not caused by a political desire to soften the boundaries between scientific disciplines, but is a direct consequence of the close intermingling of physical, biological, psychological, and computer science aspects in natural neural systems.
Benny Lautrup and Søren Brunak
INTERACTION OF NEURONAL POPULATIONS WITH DELAY: EFFECT OF FREQUENCY MISMATCH AND FEEDBACK GAIN
V. MENON
Division of Neurobiology, LSA 129, University of California, Berkeley, CA 94720, USA
The effect of frequency mismatch, signal transduction delay and inter-population feedback gain on the interaction of neuronal populations, mediated by long-range excitation, is investigated using physiologically realistic system parameters. Self-consistent solutions for the frequency, amplitude, and relative phase of the component signals are derived for limit cycle oscillations. These solutions predict important qualitative features including discontinuous changes in frequency of oscillation and phase reversal between symmetric and anti-symmetric limit cycles. A singularity in the solutions is used to predict parameter regions in which limit cycles do not exist. If limit cycles exist at zero delay, it is shown that limit cycle and quasi-periodic attractors alternate as a function of delay. The implications of these results for estimating physiologically meaningful delays from observed phase shifts in EEG time series are discussed.
Spectral peaks for the quasi-periodic attractor occur at $m\nu_1\pm n\nu_2$, where the difference $\nu_1-\nu_2$ is approximately equal to the intrinsic population frequency mismatch $\delta\nu$. The cross-correlation function is amplitude modulated with a frequency equal to $\delta\nu/2$, indicating that the two populations slip in and out of phase with a mean correlation duration equal to $1/\delta\nu$. These findings underpin the dynamical basis of delay induced "desynchronization" of oscillations reported in computer simulations.
Bifurcation diagrams indicate that quasi-periodic attractors exist for a wide range of parameters in the presence of delay in long-range excitation and non-zero frequency mismatch. If the frequency mismatch is sufficiently large and the feed-back gain is sufficiently small, quasi-periodic attractors exist for all delays. Delays of a few milliseconds, much smaller than the system time scale, can destabilize limit cycle oscillations. The role of synaptic change in inducing bifurcations of limit cycles to quasi-periodic attractors and vice versa is discussed. The implication of these findings for the generation of chaos in distributed neural systems is discussed.
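A rough way to see how the modulation frequency $\delta\nu/2$ and the correlation duration $1/\delta\nu$ quoted in this abstract fit together is the textbook beat identity for two equal-amplitude sinusoids. This is only an idealization assumed here for illustration, not the paper's derivation:

$$
\cos(2\pi\nu_1 t) + \cos(2\pi\nu_2 t) = 2\cos\big(\pi(\nu_1+\nu_2)t\big)\,\cos\big(\pi(\nu_1-\nu_2)t\big)
$$

With $\nu_1-\nu_2 \approx \delta\nu$, the slow factor is $\cos\big(2\pi(\delta\nu/2)t\big)$, an envelope of frequency $\delta\nu/2$ whose successive zeros are $1/\delta\nu$ apart.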
and
X.-B. Wu
Department of Theoretical Physics
Royal Institute of Technology
S-100 44 Stockholm, Sweden
Spontaneous neuronal activity and synaptic noise are well-known phenomena, but their biological significance has not yet been assessed. Using a computer model of the olfactory cortex we show that such activity, expressed as temporal noise in the model, can reduce recall time in associative memory tasks. We investigate both additive and multiplicative noise, and find optimal noise levels for which the recall time reaches a minimum. In addition, we demonstrate that noise can induce state transitions, such that the system is pushed from one attractor state to another. For high enough noise levels the dynamics can change dramatically and, for example, switch from an oscillatory to a chaos-like behavior. We discuss these findings in light of their significance for neural information processing.
Optimal Brain Damage (OBD) and Optimal Brain Surgeon (OBS) represent two popular pruning procedures; however, pruning large networks trained on voluminous data sets using these methods easily becomes intractable. We present a number of approximations and discuss practical issues in real-world pruning, and use as an example a network trained to predict protein coding regions in DNA sequences. The efficiency of OBS on large networks is compared to OBD, and it turns out that OBD is preferable to OBS, since more weights can be removed using less computational effort.
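As a reminder of what OBD pruning computes, the following minimal sketch ranks weights by the standard OBD saliency, $\frac{1}{2} h_{kk} w_k^2$, and removes the least salient ones. The function names and the toy numbers are hypothetical, the diagonal Hessian is assumed to be given, and none of the approximations discussed in the paper are reproduced.

```python
def obd_saliencies(weights, hessian_diag):
    """OBD saliency of each weight: 0.5 * h_kk * w_k**2 (diagonal Hessian assumed given)."""
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]


def prune_lowest(weights, hessian_diag, n_prune):
    """Zero out the n_prune weights with the smallest saliency."""
    saliencies = obd_saliencies(weights, hessian_diag)
    order = sorted(range(len(weights)), key=lambda k: saliencies[k])
    to_remove = set(order[:n_prune])
    pruned = [0.0 if k in to_remove else w for k, w in enumerate(weights)]
    return pruned, sorted(to_remove)


# Hypothetical toy values: the largest weight (index 2) sits in a very flat
# direction of the error surface, so its saliency is small despite its size.
weights = [0.8, -0.05, 1.5, 0.3]
hessian_diag = [2.0, 4.0, 0.01, 1.0]
pruned, removed = prune_lowest(weights, hessian_diag, n_prune=2)
```

The toy case shows why saliency, rather than weight magnitude alone, decides what is removed.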
A new feedforward architecture is presented for empirical model building and regression. The network consists of two hidden layers of units, where each unit utilizes a piece-wise linear activation function. A procedure for determining both the number of units and their connectivity is developed. The most notable feature of the network is its associated learning algorithm which allows for recursive updating of the parameters. A smoothness constraint is employed to limit the range of solutions, so that practical models may be built with small amounts of data. The network is applied to some function estimation tasks, as well as to a forecasting problem using data from the Santa Fe Institute time-series competition.
Training set parallelism and network based parallelism are two popular paradigms for parallelizing a feedforward (artificial) neural network. Training set parallelism is particularly suited to feedforward neural networks with backpropagation learning where the size of the training set is large in relation to the size of the network. This paper analyzes training set parallelism for feedforward neural networks when implemented on a transputer array configured in a pipelined ring topology. Theoretical expressions for the time per epoch (iteration) and optimal size of a processor network are derived when the training set is equally distributed among the processing nodes. These show that the speed up is a function of the number of patterns per processor, communication overhead per epoch and the total number of processors in the topology. Further analysis of how to optimally distribute the training set on a given processor network when the number of patterns in the training set is not an integer multiple of the number of processors, is also carried out. It is shown that optimal allocation of patterns in such cases is a mixed integer programming problem. Using this analysis it is found that equal distribution of training patterns among the processors is not the optimal way to allocate the patterns even when the training set is an integer multiple of the number of processors. Extension of the analysis to processor networks comprising processors of different speeds is also carried out. Experimental results from a T805 transputer array are presented to verify all the theoretical results.
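The idea of training set parallelism can be sketched as follows: each processor accumulates the gradient over its own share of the patterns, and the partial sums are combined once per epoch. The sketch simulates the processors sequentially, uses a single linear unit as a stand-in for a network, and splits the patterns as evenly as possible; that even split is only the baseline case the abstract analyzes, not the optimal allocation it derives. All names are hypothetical.

```python
def split_patterns(n_patterns, n_procs):
    """Spread patterns as evenly as possible; early processors take the remainder."""
    base, extra = divmod(n_patterns, n_procs)
    return [base + (1 if p < extra else 0) for p in range(n_procs)]


def local_gradient(w, b, shard):
    """Summed squared-error gradient of a linear unit y = w*x + b over one shard."""
    gw = sum(2.0 * (w * x + b - t) * x for x, t in shard)
    gb = sum(2.0 * (w * x + b - t) for x, t in shard)
    return gw, gb


def parallel_epoch_gradient(w, b, data, n_procs):
    """Each simulated processor handles its shard; partial sums are then combined."""
    gw_total, gb_total, start = 0.0, 0.0, 0
    for size in split_patterns(len(data), n_procs):
        gw, gb = local_gradient(w, b, data[start:start + size])
        gw_total += gw
        gb_total += gb
        start += size
    return gw_total, gb_total


data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
sizes = split_patterns(len(data), 3)              # 10 patterns on 3 processors
serial = local_gradient(0.5, 0.0, data)           # one processor sees everything
parallel = parallel_epoch_gradient(0.5, 0.0, data, 3)
```

Because the epoch gradient is a sum over patterns, the combined result matches the serial one; the cost of the combining step is what the communication overhead term in the abstract refers to.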
MICHAEL A. SHANBLATT
Department of Electrical Engineering, Michigan State University
East Lansing, MI 48824, USA
This paper presents a human-like dynamic programming neural network method for speech recognition using dynamic time warping. The networks are configured, much like a human's, such that the minimum states of the network's energy function represent the near-best correlation between test and reference patterns. The dynamics and properties of the neural networks are analytically explained. Simulations for classifying speaker-dependent isolated words, consisting of 0 to 9 and A to Z, show that the method is better than conventional methods. The hardware implementation of this method is also presented.
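For readers unfamiliar with dynamic time warping, here is a minimal sketch of the conventional dynamic-programming formulation that the neural method is compared against. It is not the paper's network: the one-dimensional features, the absolute-difference local cost and the template names are assumptions made for illustration.

```python
def dtw_distance(test, reference):
    """Classic dynamic-programming DTW cost between two 1-D sequences."""
    n, m = len(test), len(reference)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning test[:i] with reference[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(test[i - 1] - reference[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(test, templates):
    """Return the label of the reference template with the lowest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(test, templates[label]))


# Hypothetical templates; the test pattern is a time-stretched version of "rise".
templates = {"rise": [0, 1, 2, 3], "fall": [3, 2, 1, 0]}
label = classify([0, 0, 1, 2, 2, 3], templates)
```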
A stable neural control scheme using a locally activated neural network has been proposed for a class of nonlinear dynamic systems. The locally activated neural network, for a given input, essentially selects a small subset of the network hidden nodes for output computation using the CMAC-like content addressing mechanism. This network aims to maintain local representations of the system dynamics. Thus, the global control performance in the concerned state space is achieved by the cooperation of many local control efforts and furthermore, real-time control can be facilitated because only a small sized network is involved to control and learn at any given time. The proposed control scheme is composed of two stages: (1) prediction error based learning in which the network attempts to learn the nonlinear basis functions of the plant inverse dynamics by a modified backpropagation learning rule; and (2) tracking error based learning in which the network weights are further fine-tuned using the basis set obtained in (1). This basis set spans the locally partitioned vector space of the system inverse dynamics when the prediction error based learning is achieved within a prescribed error tolerance. For uniform stability, the sliding mode control is introduced as a safety mechanism when the network has not sufficiently learned the plant dynamics yet. With suitable assumptions on the controlled plant, global stability and tracking error convergence proof has been given. Finally, the proposed control scheme is verified with computer simulation.
ASHOK N. SRIVASTAVA
Department of Elec. & Comp. Engineering and Center for Space
Construction, University of Colorado at Boulder, Boulder, CO 80309-0529, USA
Most traditional prediction techniques deliver a single point, usually the mean of a probability distribution. For multimodal processes, instead of predicting the mean, it is important to predict the full distribution. This article presents a new connectionist method to predict the conditional probability distribution in response to an input. The main idea is to transform the problem from a regression problem to a classification problem. The conditional probability distribution network can perform both direct predictions and iterated predictions, the latter task being specific for time series problems. We compare this new method to fuzzy logic and discuss important differences, and also demonstrate the architecture on two time series. The first is the benchmark laser series used in the Santa Fe competition, a deterministic chaotic system. The second is a time series from a Markov process which exhibits structure on two time scales. The network produces multimodal predictions for this series. We compare the predictions of the network with a nearest-neighbor predictor and find that the conditional probability network is more than twice as likely a model.
We present an application of a Potts glass to the clustering problem. Simulated annealing in the mean field approximation is used in order to avoid local minima. The resulting updating equations are completely parallel, and very easy to implement. The model has no free parameters except for the annealing parameters. We show how the model can be implemented for some special clustering problems. The $T \to 0$ limit of the Potts glass is identical to the vector quantization algorithm with certain increments. A comparative study of the Potts glass and vector quantization is also made, and it is shown that for difficult clustering problems, the Potts glass is far better than vector quantization.
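The relation between annealed soft assignments and hard vector quantization can be illustrated with a small sketch. It assumes a squared-distance energy and a Boltzmann-style assignment, which is a common mean-field form; the paper's actual energy function and update equations are not given in the abstract, so every name and formula below is an assumption.

```python
import math


def soft_assignments(point, centers, temperature):
    """Soft cluster membership proportional to exp(-d^2 / T) (assumed energy: squared distance)."""
    d2 = [sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers]
    lowest = min(d2)  # subtract the minimum for numerical stability
    weights = [math.exp(-(d - lowest) / temperature) for d in d2]
    total = sum(weights)
    return [w / total for w in weights]


centers = [[0.0, 0.0], [4.0, 0.0]]
point = [1.0, 0.0]
hot = soft_assignments(point, centers, temperature=100.0)   # nearly uniform
cold = soft_assignments(point, centers, temperature=0.01)   # effectively winner-take-all
```

At high temperature the point belongs almost equally to both clusters; as the temperature falls the membership collapses onto the nearest center, which is the hard assignment used in vector quantization.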
JESUNG AHN
Central Research Laboratory, GoldStar, 16, Woomyeon-Dong, Seocho-Gu,
Seoul, 137-140 South Korea
SEONGWON CHO
Department of Electrical and Control Engineering, Hong Ik University,
Mapo-Gu, Seoul, 121-791 South Korea
Conventional neural networks utilize all the dimensions of the original input patterns for training and classification. However, a particular attribute of the input patterns does not necessarily contribute to classification and may even cause misclassification in certain cases.
A new ensemble competitive learning method using the reduced input dimension is proposed. In contrast to the previous ensemble neural networks which adjust learning parameters, the proposed method takes advantage of the information in each dimension of the input patterns. Since the degree of contribution of each attribute to classification is not known beforehand, the different input data sets with one dimension reduced are presented to multiple neural networks. The classification information from each competitive learning neural network is then combined to make a final decision for classification.
In order to improve classification accuracy, the ambiguous output neurons are eliminated which cannot be assigned to any class after training. We use three consensus schemes to judge the classification using ensemble neural networks. The experimental results with remote sensing and speech data indicate the improved performance of the proposed method.
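The ensemble construction described in this abstract can be sketched as follows: one member per input dimension, each trained on the data with that dimension removed, with the member decisions then combined. In this sketch a nearest-centroid classifier stands in for the competitive learning network, and a plain majority vote stands in for the three consensus schemes, which the abstract does not specify. The elimination of ambiguous output neurons is not modeled, and all names and data are hypothetical.

```python
from collections import Counter


def drop_dim(vector, d):
    """Copy of vector with dimension d removed."""
    return vector[:d] + vector[d + 1:]


def fit_centroids(samples, labels):
    """Per-class mean vectors (stand-in for a trained competitive layer)."""
    sums, counts = {}, Counter(labels)
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def nearest(centroids, x):
    """Label of the centroid closest to x in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))


def ensemble_predict(samples, labels, x):
    """Train one member per dropped dimension and combine them by majority vote."""
    votes = []
    for d in range(len(x)):
        member = fit_centroids([drop_dim(s, d) for s in samples], labels)
        votes.append(nearest(member, drop_dim(x, d)))
    return Counter(votes).most_common(1)[0][0]


# Hypothetical data: the third attribute is large but carries no class information.
samples = [[0.0, 0.0, 9.0], [0.0, 1.0, -9.0], [5.0, 5.0, 9.0], [5.0, 6.0, -9.0]]
labels = ["a", "a", "b", "b"]
prediction = ensemble_predict(samples, labels, [5.0, 5.0, -9.0])
```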
EDITORIAL
K. J. HUNT
Systems Technology Research, Daimler-Benz AG, Alt-Moabit 96 A,
D-10559 Berlin, Germany
FIONN MURTAGH
Université René Descartes, UFR de Mathématiques et
Informatique, 45, rue des Saints-Pères, 75006 Paris, France
MARC SARAZIN
Space Telescope - European Coordinating Facility, European Southern Observatory, Karl-Schwarzschild-Str. 2, D-85748 Garching bei München, Germany
Dynamical Recurrent Neural Networks (DRNN) are a class of fully recurrent networks obtained by modeling synapses as autoregressive filters. By virtue of their internal dynamic, these networks approximate the underlying law governing the time series by a system of nonlinear difference equations of internal variables. They therefore provide history-sensitive forecasts without having to be explicitly fed with external memory. The model is trained by a local and recursive error propagation algorithm called temporal-recurrent-backpropagation. The efficiency of the procedure benefits from the exponential decay of the gradient terms backpropagated through the adjoint network. We assess the predictive ability of the DRNN model with meteorological and astronomical time series recorded around the candidate observation sites for the future VLT telescope. The hope is that reliable environmental forecasts provided with the model will allow the modern telescopes to be preset, a few hours in advance, in the most suited instrumental mode. In this perspective, the model is first appraised on precipitation measurements with traditional nonlinear AR and ARMA techniques using feedforward networks. Then we tackle a complex problem, namely the prediction of astronomical seeing, known to be a very erratic time series. A fuzzy coding approach is used to reduce the complexity of the underlying laws governing the seeing. Then, a fuzzy correspondence analysis is carried out to explore the internal relationships in the data. Based on a carefully selected set of meteorological variables at the same time-point, a nonlinear multiple regression, termed nowcasting, is carried out on the fuzzily coded seeing records. The DRNN is shown to outperform the fuzzy k-nearest neighbors method.
M. BROWN
Image, Speech and Intelligent Systems Research Group, Department
of Electronics and Computer Science, University of Southampton, UK
The conditions under which spline-based networks are functionally equivalent to the Takagi-Sugeno model of fuzzy inference are formally established. We consider a generalized form of basis function network whose basis functions are splines. The result admits a wide range of fuzzy membership functions which are commonly encountered in fuzzy systems design. We use the theoretical background of functional equivalence to develop a hybrid fuzzy-spline net for inverse dynamic modeling of a hydraulically driven robot manipulator.
Research in fuzzy neural networks, which started with application-oriented fuzzy system tuning and then moved to the automatic generation of fuzzy systems from data, is reaching a more mature stage, especially after the proof of functional equivalence of certain fuzzy models and neural networks. It is essential that the applicability of such developments is explored, with emphasis on the directions that research should follow. It can be shown that the nearest prototype classifier is functionally equivalent to an alternative fuzzy classifier model. Efficient, hardware-friendly training algorithms are developed for dynamic generation of an optimum number of nearest prototypes for neural classifiers, which enable the generation of fuzzy systems in real time. These systems are tested on complex applications, and simulation results are shown.
This paper outlines some of the theoretical and practical developments being made in neurofuzzy systems. As the name suggests, neurofuzzy networks were developed by fusing the ideas that originated in the fields of neural and fuzzy systems. A neurofuzzy network attempts to combine the transparent, linguistic, symbolic representation associated with fuzzy logic with the architecture and learning rules commonly used in neural networks. These hybrid structures have both a qualitative and a quantitative interpretation and can overcome some of the difficulties associated with solely neural algorithms which can usually be regarded as black box mappings, and with fuzzy systems where few modelling and learning theories existed. Both B-spline and Gaussian Radial Basis Function networks can be regarded as neurofuzzy systems and soft inductive learning algorithms can be used to extract unknown, qualitative information about the relationships contained in the training data. In a similar manner, qualitative rules or information about the network's structure can be used to initialise the system. These areas, coupled with the extensive work being carried out on theoretically analysing their modelling, convergence and stability properties, mean that this research topic is highly applicable in "intelligent" modelling and control problems. Apart from outlining this work, the paper also discusses a wide variety of open research questions and suggests areas where new efforts may be fruitfully applied.
ON THE EQUIVALENCE OF TWO-LAYERED PERCEPTRONS WITH BINARY NEURONS
MARCELO BLATT and EYTAN DOMANY
Department of Physics of Complex Systems, The Weizmann Institute
of Science, Rehovot 76100, Israel
IDO KANTER
Department of Physics, Bar Ilan University, 52900 Ramat Gan, Israel
We consider two-layered perceptrons consisting of N binary input units, K binary hidden units and one binary output unit, in the limit $N\gg K\ge 1$. We prove that the weights of a regular irreducible network are uniquely determined by its input-output map up to some obvious global symmetries. A network is regular if its K weight vectors from the input layer to the K hidden units are linearly independent. A (single layered) perceptron is said to be irreducible if its output depends on every one of its input units; and a two-layered perceptron is irreducible if the K+1 perceptrons that constitute such network are irreducible. By global symmetries we mean, for instance, permuting the labels of the hidden units. Hence, two irreducible regular two-layered perceptrons that implement the same Boolean function must have the same number of hidden units, and must be composed of equivalent perceptrons.
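The "obvious global symmetries" mentioned in this abstract can be checked directly on a toy network. The sketch below enumerates all inputs of a small two-layered perceptron and confirms that permuting the hidden units, together with their output weights, leaves the Boolean function unchanged. The ±1 coding, the presence of thresholds and the specific weights are assumptions made for illustration, and the toy sizes are far from the $N\gg K$ limit the paper treats.

```python
from itertools import product


def sign(z):
    """Binary neuron output in {-1, +1}."""
    return 1 if z >= 0 else -1


def two_layer_output(x, hidden_w, hidden_b, out_w, out_b):
    """Output of a two-layered perceptron with binary hidden and output units."""
    hidden = [sign(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(hidden_w, hidden_b)]
    return sign(sum(v * h for v, h in zip(out_w, hidden)) + out_b)


def truth_table(hidden_w, hidden_b, out_w, out_b, n_inputs):
    """Input-output map over all 2**n_inputs binary input vectors."""
    return [two_layer_output(x, hidden_w, hidden_b, out_w, out_b)
            for x in product((-1, 1), repeat=n_inputs)]


# Hypothetical network with N = 3 inputs and K = 2 hidden units.
hidden_w = [[1.0, -2.0, 0.5], [0.3, 0.7, -1.1]]
hidden_b = [0.2, -0.4]
out_w = [1.0, -1.5]
out_b = 0.25

original = truth_table(hidden_w, hidden_b, out_w, out_b, 3)
# Relabel the hidden units: reverse their order everywhere they appear.
swapped = truth_table(hidden_w[::-1], hidden_b[::-1], out_w[::-1], out_b, 3)
```

The paper's result is the converse and much stronger statement: for regular irreducible networks, such symmetries are the only freedom left once the input-output map is fixed.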
SHAOWEI XIA
Department of Automation, Tsinghua University,
Beijing 100084, China
This paper presents two compensation methods for multilayer perceptrons (MLPs) which are very difficult to train by traditional Back Propagation (BP) methods. For MLPs trapped in local minima, compensating methods can correct the wrong outputs one by one using constructing techniques until all outputs are right, so that the MLPs can skip from the local minima to the global minima. A hidden neuron is added as compensation for a binary input three-layer perceptron trapped in a local minimum; and one or two hidden neurons are added as compensation for a real input three-layer perceptron. For a perceptron of more than three layers, the second hidden layer from behind will be temporarily treated as the input layer during compensation, hence the above methods can also be used. Examples are given.
GARRISON W. COTTRELL
Institute for Neural Computation, Computer Science &
Engineering, University of California, San Diego, La Jolla,
California 92093, USA
A recurrent learning algorithm based on a finite difference discretization of continuous equations for neural networks is derived. This algorithm has the simplicity of discrete algorithms while retaining some essential characteristics of the continuous equations. In discrete networks learning smooth oscillations is difficult if the period of oscillation is too large. The network either grossly distorts the waveforms or is unable to learn at all. We show how the finite difference formulation can explain and overcome this problem. Formulas for learning time constants and time delays in this framework are also presented.
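To make the finite difference idea concrete, the sketch below applies a forward-Euler step to the commonly used continuous-time unit equation $\tau\,dy_i/dt = -y_i + \tanh\big(\sum_j w_{ij} y_j + b_i\big)$. This particular equation and discretization are assumptions made here, since the abstract does not state the paper's equations, and the learning rule itself is not shown.

```python
import math


def step(y, weights, bias, tau, dt):
    """One forward-Euler step of tau * dy/dt = -y + tanh(W y + b)."""
    new_y = []
    for i, y_i in enumerate(y):
        net = sum(w * y_j for w, y_j in zip(weights[i], y)) + bias[i]
        new_y.append(y_i + (dt / tau) * (-y_i + math.tanh(net)))
    return new_y


# Hypothetical two-unit network.
y0 = [0.1, -0.2]
weights = [[0.0, 1.0], [-1.0, 0.0]]
bias = [0.0, 0.0]
coarse = step(y0, weights, bias, tau=1.0, dt=1.0)  # dt == tau: the usual discrete update
fine = step(y0, weights, bias, tau=1.0, dt=0.1)    # dt << tau: a small, smooth change
```

With `dt` equal to `tau` the step reduces to the ordinary discrete-time update, while a smaller ratio makes the state change gradually, which is one way to read the abstract's remark about slow, smooth oscillations being hard for purely discrete networks.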
WEON-CHANG SHIN
Power Electronics Group, Research Lab., POSCON
HYO-GYU KIM
Automation Research Institute, Samsung Electronics Company
An industrial robot's dynamic performance is frequently measured by positioning accuracy at high speeds, so a good dynamic controller, one that can accurately compute robot dynamics at a servo rate high enough to ensure system stability, is essential. A real-time dynamic controller for an industrial robot is developed here using neural networks. First, an efficient time-selectable hidden layer architecture has been developed based on system dynamics localized in time, which lends itself to real-time learning and control along with enhanced mapping accuracy. Second, the neural network architecture has also been specially tuned to accommodate servo dynamics. This not only facilitates the system design through reduced sensing requirements for the controller but also enhances the control performance over the control architecture neglecting servo dynamics. Experimental results demonstrate the controller's excellent learning and control performance compared with a conventional controller, and thus its good potential for practical use in industrial robots.
VICENS GAITAN
Institut de Física d'Altes Energies, Universitat Autònoma
de Barcelona, E-08193 Bellaterra (Barcelona), Spain
MIQUEL SERRA-RICART and XAVIER CALBET
Instituto de Astrofísica de Canarias, E-38200, La Laguna (Tenerife), Spain
We present a new method based on multilayer feedforward neural nets for displaying an n-dimensional distribution in a projected space of 1, 2 or 3 dimensions. A fully nonlinear net with several hidden layers is used. Efficient learning is achieved using multi-seed backpropagation. Like principal component analysis (PCA), the proposed method is useful for extracting information on the structure of the data set, but unlike PCA, the transformation between the original distribution and the projected one is not restricted to be linear. Artificial examples and a real application are presented in order to show the reliability and potential of the method.
This paper addresses the structure and an associated on-line learning algorithm of a feedforward multilayer connectionist network for realizing the basic elements and functions of a traditional fuzzy logic controller. The proposed Fuzzy Adaptive Learning Control Network (FALCON) can be contrasted with the traditional fuzzy logic control systems in their network structure and learning ability. An on-line structure/parameter learning algorithm, called FALCON-ART, is proposed for constructing the FALCON dynamically. The FALCON-ART can partition the input/output space in a flexible way based on the distribution of the training data. Hence it can avoid the problem of combinatorial growing of partitioned grids in some complex systems. It combines the backpropagation learning scheme for parameter learning and the fuzzy ART algorithm for structure learning. More notably, the FALCON-ART can on-line partition the input/output spaces, tune membership functions, and find proper fuzzy logic rules dynamically without any a priori knowledge or even any initial information on these. The proposed learning scheme has been successfully used to control two unstable nonlinear systems. They are the seesaw system and the inverted wedge system.
In the neural network/genetic algorithm community, rather limited success in the training of neural networks by genetic algorithms has been reported. Whitley et al. (1991) claim that, due to "the multiple representations problem", genetic algorithms will not be able to train effectively multilayer perceptrons whose chromosomal representation of their weights exceeds 300 bits. In the following paper, by use of a "real-life problem", known to be non-trivial, and by a comparison with "classic" neural net training methods, I will try to show that the modest success of applying genetic algorithms to the training of perceptrons is caused not so much by the "multiple representations problem" as by the fact that problem-specific knowledge available is often ignored, thus making the problem unnecessarily tough for the genetic algorithm to solve. Special success is obtained by the use of a new fitness function, which takes into account the fact that the search performed by a genetic algorithm is holistic, and not local as is usually the case when perceptrons are trained by traditional methods.
P. SEETHARAMULU and ARUN K. PUJARI
Artificial Intelligence Laboratory, Department of Computer and
Information Science, University of Hyderabad, P.O. Central
University, Hyderabad - 500046 India
The aim of the present work is to recognize printed and handwritten Telugu characters using artificial neural networks (ANNs). Earlier work on recognition of Telugu characters has been done using conventional pattern recognition techniques. We make an initial attempt here of using neural networks for recognition with the aim of improving upon earlier methods which do not perform effectively in the presence of noise and distortion in the characters. The Hopfield model of neural network working as an associative memory is chosen for recognition purposes initially. Due to limitation in the capacity of the Hopfield neural network, we propose a new scheme named here as the Multiple Neural Network Associative Memory (MNNAM). The limitation in storage capacity has been overcome by combining multiple neural networks which work in parallel. It is also demonstrated that the Hopfield network is suitable for recognizing noisy printed characters as well as handwritten characters written by different "hands" in a variety of styles. Detailed experiments have been carried out using several learning strategies and results are reported. It is shown here that satisfactory recognition is possible using the proposed strategy. A detailed preprocessing scheme of the Telugu characters from digitized documents is also described.
Y. ROMEM
The Institute of Medical Genetics, Soroka Medical
Center, Beer-Sheva, Israel 84101
A multilayer perceptron (MLP) neural network (NN) has been studied for human chromosome classification. Only 10-20 examples were required for the MLP NN to reach its ultimate performance classifying chromosomes of five types. The empirical dependence of the entropic error on the number of examples was found to follow the 1/t function closely. Principal component analysis (PCA) was used both for network initialization and for feature reduction. The PCA demonstrated the importance of retaining most of the image information whenever small training sets are used. The MLP NN classifier outperformed the Bayes piecewise classifier in all the cases tested. The MLP classifier was found to be almost insensitive to the ratio of the number of training vectors to the number of features, whereas the piecewise classifier was highly dependent on this ratio.
NONLINEAR GATED EXPERTS FOR TIME SERIES: DISCOVERING REGIMES AND AVOIDING OVERFITTING
ANDREAS S. WEIGEND
Department of Computer Science and Institute of Cognitive Science, University of Colorado, Boulder, CO 80309-0430, USA
MORGAN MANGEAS
Electricité de France, Direction des Etudes et Recherches, 1, av. du général de Gaulle, 92141 Clamart, France; and Department of Computer Science, University of Colorado, Boulder, CO 80309-0430, USA
ASHOK N. SRIVASTAVA
Department of Electrical and Computer Engineering and Center for Space Construction, University of Colorado, Boulder, CO 80309-0529, USA
In the analysis and prediction of real-world systems, two of the key problems are nonstationarity (often in the form of switching between regimes), and overfitting (particularly serious for noisy processes). This article addresses these problems using gated experts, consisting of a (nonlinear) gating network, and several (also nonlinear) competing experts. Each expert learns to predict the conditional mean, and each expert adapts its width to match the noise level in its regime. The gating network learns to predict the probability of each expert, given the input. This article focuses on the case where the gating network bases its decision on information from the inputs. This can be contrasted to hidden Markov models where the decision is based on the previous state(s) (i.e. on the output of the gating network at the previous time step), as well as to averaging schemes over several predictors. In contrast, gated experts soft-partition the input space, only learning to model their region. This article discusses the underlying statistical assumptions, derives the weight update rules, and compares the performance of gated experts to standard methods on three time series: (1) a computer-generated series, obtained by randomly switching between two nonlinear processes; (2) a time series from the Santa Fe Time Series Competition (the light intensity of a laser in chaotic state); and (3) the daily electricity demand of France, a real-world multivariate problem with structure on several time scales. The main results are: (1) the gating network correctly discovers the different regimes of the process; (2) the widths associated with each expert are important for the segmentation task (and they can be used to characterize the sub-processes); and (3) there is less overfitting compared to single networks (homogeneous multilayer perceptrons), since the experts learn to match their variances to the (local) noise levels. This can be viewed as matching the local complexity of the model to the local complexity of the data.
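To make the roles of the gating network and the expert widths concrete, here is a rough sketch of the forward computation for a single time step, assuming Gaussian experts and a softmax gate. The function names and numerical values are hypothetical, the networks that would produce the gate scores and expert means are omitted, and the weight update rules derived in the article are not reproduced.

```python
import math


def softmax(scores):
    """Turn gating scores into probabilities that sum to one."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian(y, mean, sigma):
    """Gaussian density of target y under one expert's prediction and width."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gated_experts(y, gate_scores, expert_means, expert_sigmas):
    """Return the mixture density of y and each expert's posterior responsibility."""
    priors = softmax(gate_scores)
    joint = [g * gaussian(y, mu, s)
             for g, mu, s in zip(priors, expert_means, expert_sigmas)]
    density = sum(joint)
    posteriors = [j / density for j in joint]
    return density, posteriors


# Hypothetical values for one time step: two experts, equal gating scores.
density, posteriors = gated_experts(
    y=0.9,
    gate_scores=[0.0, 0.0],
    expert_means=[1.0, -1.0],   # each expert's predicted conditional mean
    expert_sigmas=[0.2, 0.5],   # each expert's own noise width
)
```

The posterior responsibilities show which expert explains the observed value best; here nearly all of the weight goes to the first expert, whose prediction is close to the target and whose width is narrow.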
This paper investigates fault tolerance in feedforward neural networks, for a realistic fault model based on analog hardware. In our previous work with synaptic weight noise we showed significant fault tolerance enhancement over standard training algorithms. We proposed that when introduced into training, weight noise distributes the network computation more evenly across the weights and thus enhances fault tolerance. Here we compare those results with an approximation to the mechanisms induced by stochastic weight noise, incorporated into training deterministically via penalty terms. The penalty terms are an approximation to weight saliency and therefore, in addition, we assess a number of other weight saliency measures and perform comparison experiments. The results show that the first term approximation is an incomplete model of weight noise in terms of fault tolerance. Also the error Hessian is shown to be the most accurate measure of weight saliency.
The basic Nearest Neighbor Classifier (NNC) is often inefficient in terms of the memory space and computing time needed if all training samples are used as prototypes. These problems can be solved by reducing the number of prototypes using clustering algorithms and optimizing the prototypes using a special neural network model. In this paper, we compare the performance of the multilayer neural network and an Optimized Nearest Neighbor Classifier (ONNC) for handwritten digit recognition applications. We show that an ONNC can have the same recognition performance as an equivalent neural network classifier. The ONNC can be efficiently implemented using strategies based on prototype and variable ranking, partial summation, and the triangle inequality for distances. It requires the same memory space as the neural network, but less training time and classification time.
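Of the speed-up strategies named above, partial summation is the simplest to show. The sketch below is an illustrative nearest-prototype search that abandons a distance computation once its running sum can no longer beat the best match so far. The function name and the toy prototypes are hypothetical, and the ranking and triangle-inequality strategies are not included.

```python
def nearest_prototype(query, prototypes):
    """Return (index, squared distance) of the closest prototype.

    The running sum for a candidate is abandoned as soon as it reaches the
    best complete distance found so far (partial summation).
    """
    best_index = -1
    best_dist = float("inf")
    for index, proto in enumerate(prototypes):
        partial = 0.0
        for q, p in zip(query, proto):
            partial += (q - p) ** 2
            if partial >= best_dist:
                break  # this prototype cannot beat the current best
        else:
            # Loop finished without a break: this is the new best match.
            best_index, best_dist = index, partial
    return best_index, best_dist


# Hypothetical three-feature prototypes and query.
prototypes = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
index, dist = nearest_prototype([0.9, 1.2, 1.0], prototypes)
```

The early exit never changes the answer, since squared terms only increase the sum; it only saves arithmetic, and it saves most when a good match is found early.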
Classical optimal control methods, notably Pontryagin's Maximum (Minimum) Principle (PMP), can be employed, together with Hamiltonians, to determine optimal system weights in artificial neural dynamical systems. A new learning rule based on weight equations derived using PMP is shown to be suitable for both discrete- and continuous-time systems, and moreover, can also be applied to feedback networks. Preliminary testing shows that this PMP learning rule compares favorably with Standard BackPropagation (SBP) on the XOR problem.
R. RE
Dipartimento di Matematica
University of Catania, v. le A. Doria 6, 95125, Catania, Italy
In this paper the approximation capabilities of different structures of complex feedforward neural networks, reported in the literature, have been theoretically analyzed. In particular a new density theorem for Complex Multilayer Perceptrons with complex valued non-analytic sigmoidal activation functions has been proven. Such a result makes Multilayer Perceptrons with complex valued neurons universal interpolators of continuous complex valued functions. Moreover the approximation properties of superpositions of analytic activation functions have been investigated, proving that such combinations are not dense in the set of continuous complex valued functions. Several numerical examples have also been reported in order to show the advantages introduced by Complex Multilayer Perceptrons in terms of computational complexity with respect to the classical real MLP.
We show that Neural Networks can efficiently model multivalued transfer functions. We propose a method related to conditional density approximation $p(\mathbf{y} \mid \mathbf{x})$ and test the validity of the approach on a remote sensing problem.
A new concept called correlation significance, for expanding the attraction regions around all the stored vectors (attractors) of an asynchronous auto-associative memory, is introduced. Since the well-known outer product rule adopts an equally weighted correlation matrix for the neuron connections, the attraction region around each attractor is not maximized. In order to maximize these attraction regions, we devise a rule under which all the correlations between two different components of two different stored patterns are unequally weighted. With this formalism, the connection matrix T of the asynchronous neural network is designed using a gradient descent approach. Additionally, an exponential-type error function is constructed such that the number of successfully stored vectors can be directly examined during the entire learning process. Finally, computer simulations demonstrate the efficiency and capability of this scheme.
The ability to derive minimal network architectures for neural networks has been at the center of attention for several years now. To date, numerous algorithms have been proposed to automatically construct networks. Unfortunately, these algorithms lack a fundamental theoretical analysis of their capabilities, and only empirical evaluations on a few selected benchmark problems exist. Some theoretical results have been provided for small classes of well-known benchmark problems such as parity and encoder functions, but these are of little value due to their restrictiveness. In this work we describe a general class of 2-layer networks with 2 hidden units capable of representing a large set of problems. The cardinality of this class grows exponentially with the number of inputs N. Furthermore, we outline a simple algorithm that allows us to determine whether a given function (problem) is a member of this class. The class considered in this paper includes the benchmark problems parity and symmetry. Finally, we expand this class to include an even larger set of functions and point out several interesting properties it exhibits.
Volume 7 Number 1 (March 1996)
Abstracts of Papers
AN ATTRACTOR NEURAL NETWORK MODEL OF CLASSICAL CONDITIONING
SERGIO D. SERULNIK
Department of Neurobiology, Brain Research Building, The Weizmann Institute of Science, 76100, Rehovot, Israel
MOSHE GUR
Department of Biomedical Engineering, Technion, 32000, Haifa, Israel
Living beings learn to associate known stimuli that exhibit
specific temporal correlations. This kind of learning is called
associative learning, and the process by which animals change
their responses according to the schedule of arriving stimuli is
called "classical conditioning". In this paper, a
conditionable neural network which exhibits features like
forward conditioning, dependency on the interstimulus interval,
and absence of backward and reverse conditioning is presented.
An asymmetric neural network was used and its ability to
retrieve a sequence of embedded patterns using a single
recalling input was exploited. The main assumption was that
synapses that respond with different time constants coexist in
the system. These synapses induce transitions between different
embedded patterns. The appearance of a correct transition when
only the first stimulus is applied, is interpreted as a
realization of the conditioning process. The model also allows
the analytical description of the conditioning process in terms
of internal and external or researcher-controlled variables.
ANALYTICAL INTERPRETATION OF FEED-FORWARD NETS OUTPUTS AFTER TRAINING
LLUIS GARRIDO
Departament d'Estructura i Constituents de la Matéria,
Facultat de Física, Universitat de Barcelona, Diagonal 647, 08028 Barcelona, Spain
SERGIO GOMEZ
Dept. d'Enginyeria Informática (ETSE), Univ. Rovira i Virgili, Tarragona (Spain)
The minimization quadratic error criterion which gives rise to
the back-propagation algorithm is studied using functional
analysis techniques. With them, we recover easily the well-known
statistical result which states that the searched global minimum
is a function which assigns, to each input pattern, the expected
value of its corresponding output patterns. Its application to
classification tasks shows that only certain output class
representations can be used to obtain the optimal Bayesian
decision rule. Finally, our method permits the study of other
error criterions, finding out, for instance, that absolute value
errors lead to medians instead of mean values.
SPEAKER IDENTIFICATION USING TIME-DELAY HMEs
KE CHEN, DAHONG XIE and HUISHENG CHI
National Lab. of Machine Perception and Center for Information Science, Peking University, Beijing 100871, China
In this paper, we extend the Hierarchical Mixture of
Experts (HME) to temporal processing and explore it for a
substantial problem, that of text-dependent speaker
identification. For a specific multiway classification, we
propose a generalized Bernoulli density instead of the
multinomial log density to avoid the instability during
training. Time-delay technique is applied for spatio-temporal
processing in the HME and a combining scheme is presented for
combining multiple time-delay HMEs in order to complete a
multi-scale analysis for the temporal data. Using the time-delay
HME along with the EM algorithm as well as the combination of
multiple time-delay HMEs, the speaker identification system has
a good performance and yields significantly fast training. We have
also addressed some issues about the time-delay techniques in
the HME.
HARDWARE PROTOTYPES OF A BOOLEAN NEURAL NETWORK AND THE SIMULATED ANNEALING OPTIMIZATION METHOD
JARKKO NIITTYLAHTI
Nokia Consumer Electronics, Meesmannstr. 105-107, D-44807 Bochum, Germany
A Boolean Neural Network is a neural network that operates with
binary weight values of "1" and "0". Otherwise it is
formally analogous to the Multilayer Perceptron (MLP).
Simulated Annealing is a stochastic optimization method that is
suitable for performing nonlinear multivariable optimization
tasks. Training a Boolean Neural Network is a well-suited
problem to this algorithm. However, the Simulated Annealing
method is computationally heavy, which makes the training
procedure slow. The training speed can be improved by using
custom designed hardware for the whole system including the
optimization method and the neural network.
Hardware prototypes of a Boolean Neural Network and the
Simulated Annealing optimization method have been
designed using discrete components. The Boolean
Neural Network implementation is basically a dynamically
configurable feedforward network of Boolean logic gates of two
inputs. The Simulated Annealing implementation is a general
purpose hardware tool for multivariable optimization tasks. Here
it is applied to do supervised training of the Boolean Neural
Network hardware.
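As a software illustration of training binary weights by Simulated Annealing, the sketch below anneals the 0/1 weights of a single threshold unit. This is a deliberately simplified stand-in, not the author's hardware design: the unit model, the fixed threshold of 2, the cooling schedule, and the target function (the AND of the first and third inputs) are all assumptions chosen for illustration.

```python
import math
import random
from itertools import product


def unit_output(weights, inputs, threshold=2):
    """Threshold unit with 0/1 weights: fires when enough selected inputs are on."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0


def count_errors(weights, samples):
    """Number of training samples the unit classifies wrongly."""
    return sum(unit_output(weights, x) != target for x, target in samples)


def anneal(samples, n_weights, steps=300, t_start=2.0, cooling=0.98, seed=0):
    """Search 0/1 weight vectors by flipping one bit per step."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_weights)]
    current_err = count_errors(current, samples)
    best, best_err = list(current), current_err
    temperature = t_start
    for _ in range(steps):
        candidate = list(current)
        candidate[rng.randrange(n_weights)] ^= 1  # flip one binary weight
        candidate_err = count_errors(candidate, samples)
        delta = candidate_err - current_err
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_err = candidate, candidate_err
            if current_err < best_err:
                best, best_err = list(current), current_err
        temperature *= cooling
    return best, best_err


# Hypothetical supervised task: output the AND of the first and third inputs.
samples = [(x, x[0] & x[2]) for x in product((0, 1), repeat=3)]
best_weights, best_error = anneal(samples, n_weights=3)
```

Each step re-evaluates the whole training set, which is the cost the abstract describes as computationally heavy and the motivation for moving both the annealer and the network into hardware.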
A UNIFIED NEURAL BIGRADIENT ALGORITHM FOR ROBUST PCA AND MCA
LIUYUE WANG and JUHA KARHUNEN
Helsinki University of Technology, Laboratory of Computer and Information Science, Rakentajanaukio 2 C, FIN-02150 Espoo, Finland
A new instantaneous-gradient search algorithm for computing a
principal component or minor component type solution is
proposed. The algorithm can use normalized Hebbian or
anti-Hebbian learning in a unified formula. Starting from
one-unit rule, a multi-unit algorithm is developed which can
simultaneously extract several robust counterparts of the
principal or minor eigenvectors of the data covariance matrix.
Standard principal or minor components emerge as special cases
from the general non-quadratic criterion. The learning rule is
analyzed mathematically, and the theoretical results are
verified by simulations. The proposed bigradient approach can be
applied to blind separation of independent source signals from
their linear mixtures.
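For readers unfamiliar with normalized Hebbian learning, the following sketch uses Oja's classical one-unit rule to estimate the first principal direction of two-dimensional data. It is a swapped-in, simpler technique shown only for orientation; it is not the bigradient algorithm proposed in the abstract, and the data, learning rate, and function name are assumptions.

```python
import math
import random


def oja_first_component(data, learning_rate=0.005, epochs=20):
    """Estimate the first principal direction with Oja's normalized Hebbian rule."""
    w = [1.0, 0.0]  # arbitrary starting direction
    for _ in range(epochs):
        for x in data:
            y = w[0] * x[0] + w[1] * x[1]  # unit output
            # Hebbian term y*x plus a decay term -y*y*w that keeps |w| near 1.
            w = [w[k] + learning_rate * y * (x[k] - y * w[k]) for k in range(2)]
    return w


# Hypothetical zero-mean 2-D data: large variance along (1, 1), small along (1, -1).
rng = random.Random(0)
s = 1.0 / math.sqrt(2.0)
data = []
for _ in range(500):
    a = rng.gauss(0.0, 1.5)
    b = rng.gauss(0.0, 0.3)
    data.append((s * (a + b), s * (a - b)))

w = oja_first_component(data)
norm = math.hypot(w[0], w[1])
alignment = abs(s * (w[0] + w[1]))  # projection of w onto (1, 1)/sqrt(2)
```

The weight vector settles close to unit length along the high-variance direction. Extracting a minor component instead requires an anti-Hebbian variant with explicit stabilization, which this sketch does not attempt.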
ON THE CHARACTERISTICS OF THE QUADRATIC ORDER ASSOCIATIVE MEMORY THAT USES SYNCHRONOUS UPDATE AND DIRECT CONVERGENCE
JUNG-HUA WANG
Computer Engineering Lab, Department of Electrical Engineering,
National Taiwan Ocean University, 2 Pei-Ning Rd., Keelung, Taiwan
A statistical method is applied to explore the characteristics
of a certain class of quadratic order associative memories: the
Synchronous Update Direct Convergence Memory (SQDM). The initial
input vector is required to converge, with a given value of
probability $P_{dc}$, to a stored codeword in just one
synchronous update. The memory capacity $m_s$ is derived by
using a figure of merit $N\Phi$, and its tight asymptotic bound
is found by using Tchebycheff's inequality. When the input
contains erroneous bits, the maximum allowable number of
attractors and their attraction radii are determined. The
existence of principal connections $T_{pr}$ useful in solving
the problem of proliferation of connections in the SQDM is
introduced. We prove that $T_{pr}$ is a set of which an
arbitrary connection $T_{ijk}$ satisfies
$\sqrt{m_s}\le|T_{ijk}|\le2\sqrt{m_s}$.
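The idea of converging in one synchronous update can be illustrated with a common textbook form of quadratic (third-order) correlation memory, in which $T_{ijk}$ is the sum over stored codewords of the product of three components. Whether this matches the paper's exact SQDM definition is an assumption, and the two stored codewords below are hypothetical.

```python
def build_quadratic_memory(patterns):
    """Connection tensor T[i][j][k] = sum over patterns of p_i * p_j * p_k."""
    n = len(patterns[0])
    T = [[[0] * n for _ in range(n)] for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    T[i][j][k] += p[i] * p[j] * p[k]
    return T


def synchronous_update(T, x):
    """Update every component at once from the same input vector x."""
    n = len(x)
    result = []
    for i in range(n):
        field = sum(T[i][j][k] * x[j] * x[k] for j in range(n) for k in range(n))
        result.append(1 if field >= 0 else -1)
    return result


# Two hypothetical, mutually orthogonal +1/-1 codewords of length 8.
stored = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
T = build_quadratic_memory(stored)

probe = list(stored[0])
probe[0] = -probe[0]  # one erroneous bit
recalled = synchronous_update(T, probe)
```

In this form the field on component i is the sum over codewords of the component times the squared overlap between the codeword and the input, so the codeword nearest the probe dominates. The tensor has N cubed entries, which is the proliferation of connections that principal connections are meant to reduce.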
AN ADAPTIVE BOOLEAN AUTOMATON TO MODEL CIRCADIAN CYCLES
R. P. J. PERAZZO and A. SCHUSCHNY
Centro de Estudios Avanzados, Univ. de Buenos Aires, Uriburu 950, (1114) Buenos Aires, Argentina
We propose a Boolean cellular automaton to model an artificial
adaptive living organism in order to investigate the development
of cyclic vital functions during a simulated evolutionary
process. The organism is endowed with a basic architecture
consisting of several sensor (input), motor (output) and
processing Boolean gates whose connectivity pattern is adapted
with a genetic algorithm. Cyclic searching behaviors develop
that are tuned to the spatial distribution of "food". Under
additional assumptions we also find that internal pacemakers can
develop to adapt plastically to the alternation of "light" and
"darkness". These pacemakers coexist with a "free running"
regime in which the circadian cycles persist even in the absence
of external periodic stimuli.
ROLE OF CHAOS IN TRIAL-AND-ERROR PROBLEM SOLVING BY AN ARTIFICIAL NEURAL NETWORK
ICHIRO OBANA and YASUHIRO FUKUI
Department of Applied Electronic Engineering, Tokyo Denki University,
Hatoyama, Hiki, Saitama Prefecture, 350-03, Japan
One role of chaotic neural activity is illustrated by means of
computer simulations of an imaginary agent's goal-oriented
behavior. The agent has a simplified neural network with seven
neurons and three legs. The neural network consists of one
photosensory neuron and three pairs of inter- and motor neurons.
The three legs whose movements are governed by the three motor
neurons allow the agent to walk in six concentric radial
directions on a plane. It is intended that the neural network
causes the agent to walk in a direction of greater brightness,
to reach finally the most brightly lit place on the plane. The
presence of only one sensory neuron has an important meaning.
That is, no immediate information on directions of greater
brightness is sensed by the agent. In other words, random
walking in the manner of trial-and-error problem solving must be
involved in the agent's walking. Chaotic firing of the motor
neurons is intended to play a crucial role in generating the
random walking. Brief random walking and rapid straight walking
in a direction of greater brightness were observed to occur
alternately in the computer simulation. Controlled chaos in
naturally occurring neural networks may play a similar role.
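The alternation between random and straight walking can be mimicked with a very small simulation. The sketch below is only loosely inspired by the abstract: it replaces the seven-neuron network with a logistic map as the chaotic source, and the light field, step length, and rule for changing heading are all assumptions.

```python
import math


def logistic(z, r=3.99):
    """One step of the logistic map, used here as a stand-in chaotic source."""
    return r * z * (1.0 - z)


def brightness(x, y):
    """Hypothetical light field: brightest at the origin."""
    return -math.hypot(x, y)


def walk(start, steps=200, step_length=0.1, z=0.3):
    """Keep the current heading while brightness rises; otherwise redraw it chaotically."""
    x, y = start
    heading = 0
    headings = []
    last = brightness(x, y)
    for _ in range(steps):
        angle = heading * math.pi / 3.0  # six directions, 60 degrees apart
        x += step_length * math.cos(angle)
        y += step_length * math.sin(angle)
        now = brightness(x, y)
        if now <= last:
            # No improvement sensed: pick a new heading from the chaotic signal.
            z = logistic(z)
            heading = min(5, int(6 * z))
        headings.append(heading)
        last = now
    return (x, y), headings


final_position, headings = walk(start=(3.0, 2.0))
```

Because the agent senses only a single brightness value, it cannot tell which direction is better in advance; it can only notice after a step that things did not improve, which is where the chaotic redraw of the heading takes over.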