Frontiers in Science

p-ISSN: 2166-6083    e-ISSN: 2166-6113

2019;  9(1): 20-32

doi:10.5923/j.fs.20190901.03

 

Neural Network Estimation of Some Noisy Asymmetric Dynamical Maps with Use FFT as Transfer Function

Salah H. Abid, Saad S. Mahmood, Yaseen A. Oraibi

Department of Mathematics, College of Education, AL-Mustansiriyah University, Iraq

Correspondence to: Salah H. Abid, Department of Mathematics, College of Education, AL-Mustansiriyah University, Iraq.

Email:

Copyright © 2019 The Author(s). Published by Scientific & Academic Publishing.

This work is licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0/

Abstract

The aim of this paper is to design a feed-forward artificial neural network (ANN) to estimate one-dimensional noisy asymmetric dynamical maps by selecting an appropriate network, transfer function and node weights. The proposed network is used together with the fast Fourier transform (FFT) as transfer function. For different cases of the system, the noisy asymmetric logistic, noisy asymmetric logistic-tent and noisy asymmetric tent-logistic maps, the experimental results of the proposed algorithm are compared empirically, by means of the mean square error (MSE), with the results of the same network using the traditional transfer functions logsig and tansig. The proposed algorithm outperforms the others in all cases, in terms of both speed and accuracy.

Keywords: FFT, Logsig, Tansig, Feed forward neural network, Transfer function, Noisy asymmetric map, Normal noise

Cite this paper: Salah H. Abid, Saad S. Mahmood, Yaseen A. Oraibi, Neural Network Estimation of Some Noisy Asymmetric Dynamical Maps with Use FFT as Transfer Function, Frontiers in Science, Vol. 9 No. 1, 2019, pp. 20-32. doi: 10.5923/j.fs.20190901.03.

1. Introduction

An artificial neural network (ANN) is a simplified mathematical model of the human brain. It can be implemented both in electronic hardware and in computer software. It is a parallel distributed processor with a large number of connections, an information-processing system that has certain performance characteristics in common with biological neural networks. ANNs have been developed as generalizations of mathematical models of human cognition or neural biology, based on the assumptions that:
1- Information processing occurs at many simple elements called neurons, which are fundamental to the operation of ANNs.
2- Signals are passed between neurons over connection links.
3- Each connection link has an associated weight which, in a typical neural net, multiplies the signal transmitted.
4- Each neuron applies an activation function (usually nonlinear) to its net input (the sum of the weighted input signals) to determine its output signal [16].
The units in a network are organized into a given topology by a set of connections, or weights.
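As a concrete illustration of points 1-4 above, the following minimal Python sketch computes the output of a single artificial neuron; the particular inputs, weights, bias and logistic activation are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

x = np.array([0.2, 0.7, 0.1])   # signals arriving over the connection links
w = np.array([0.5, -1.0, 2.0])  # weight associated with each link
b = 0.1                         # bias of the neuron

net_input = w @ x + b                         # sum of the weighted input signals
output = 1.0 / (1.0 + np.exp(-net_input))     # nonlinear activation (logistic sigmoid)
print(output)
```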
Ann is characterized by [31]:
1- Architecture: its pattern of connections between the neurons.
2- Training Algorithm: its method of determining the weights on the connections.
3- Activation function.
ANNs are often classified as single-layer or multilayer networks. In determining the number of layers, the input units are not counted as a layer, because they perform no computation. Equivalently, the number of layers in the net can be defined as the number of layers of weighted interconnection links between the slabs of neurons [47].

1.1. Multilayer Feed Forward Architecture [23]

In a layered neural network the neurons are organized in the form of layers. There are at least two layers: an input layer and an output layer. The layers between the input and the output layer (if any) are called hidden layers, and their computation nodes are correspondingly called hidden neurons or hidden units. Extra hidden neurons raise the network's ability to extract higher-order statistics from the (input) data.
An ANN is said to be fully connected when every node in each layer of the network is connected to every node in the adjacent forward layer; otherwise the network is called partially connected. Each layer consists of a certain number of neurons; each neuron is connected to the neurons of the previous layer through adaptable synaptic weights w and biases b.

1.2. Literature Review

Pan and Duraisamy in 2018 [34] studied the use of feedforward neural networks (FNN) to develop models of non-linear dynamical systems from data. Emphasis is placed on predictions at long times, with limited data availability. Inspired by global stability analysis, and the observation of a strong correlation between the local error and the maximal singular value of the Jacobian of the ANN, they introduced Jacobian regularization in the loss function. This regularization suppresses the sensitivity of the prediction to the local error and is shown to improve accuracy and robustness. A comparison between the proposed approach and sparse polynomial regression is presented in numerical examples ranging from simple ODE systems to nonlinear PDE systems, including vortex shedding behind a cylinder and instability-driven buoyant mixing flow. Furthermore, limitations of feedforward neural networks are highlighted, especially when the training data does not include a low-dimensional attractor. The need to model dynamical behavior from data is pervasive across science and engineering. Applications are found in diverse fields such as control systems [44], time series modeling [40], and describing the evolution of coherent structures [13]. While data-driven modeling of dynamical systems can be broadly classified as a special case of system identification [24], it is important to note certain distinguishing qualities: the learning process may be performed off-line, physical systems may involve very high dimensions, and the goal may involve the prediction of long-time behavior from limited training data. Artificial neural networks (ANNs) have attracted considerable attention in recent years in domains such as image recognition in computer vision [19, 38] and in control applications [13]. The success of ANNs arises from their ability to effectively learn low-dimensional representations from complex data and to build relationships between features and outputs. Neural networks with a single hidden layer and a nonlinear activation function are guaranteed to be able to approximate any Borel measurable function to any degree of accuracy on a compact domain [18]. The idea of leveraging neural networks to model dynamical systems has been explored since the 1990s. ANNs are prevalent in the system identification and time series modeling community [21, 29, 30, 36], where the mapping between inputs and outputs is of prime interest. Billings et al. [6] explored connections between neural networks and the nonlinear autoregressive moving average model with exogenous inputs (NARMAX). It was shown that neural networks with one hidden layer and a sigmoid activation function represent an infinite series consisting of polynomials of the input and state units. Elanayar and Shin [14] proposed the approximation of nonlinear stochastic dynamical systems using radial basis feedforward neural networks. Early work using neural networks to forecast multivariate time series of commodity prices [10] demonstrated their ability to model stochastic systems without knowledge of the underlying governing equations. Tsung and Cottrell [45] proposed learning the dynamics in phase space using a feedforward neural network with time-delayed coordinates. Paez and Urbina [32, 33, 46] modeled a nonlinear hardening oscillator using a neural network-based model combined with dimension reduction using canonical variate analysis (CVA).
Smaoui [41, 42, 43] pioneered the use of neural networks to predict fluid dynamic systems such as the unstable manifold model for bursting behavior in the 2-D Navier-Stokes and the Kuramoto-Sivashinsky equations. The dimensionality of the original PDE system is reduced by considering a small number of proper orthogonal decomposition (POD) coefficients [5]. Interestingly, similar ideas of using principal component analysis for dimension reduction can be traced back to work in cognitive science by Elman [15]. Elman also showed that knowledge of the intrinsic dimensions of the system can be very helpful in determining the structure of the neural network. However, in the majority of the results [41, 42, 43], the neural network model is only evaluated a few time steps from the training set, which might not be a stringent performance test if longer time predictions are of interest. ANNs have also been applied to chaotic nonlinear systems, which are challenging from a data-driven modeling perspective, especially if long time predictions are desired. Instead of minimizing the pointwise prediction error, Bakker et al. [4] satisfied the Diks' criterion in learning the chaotic attractor. Later, Lin et al. [22] demonstrated that even the simplest feedforward neural network for nonlinear chaotic hydrodynamics can show consistency in the time-averaged characteristics, power spectra, and Lyapunov exponent between the measurements and the model. A major difficulty in modeling dynamical systems is the issue of memory. It is known that even for a Markovian system, the corresponding reduced-dimensional system could be non-Markovian [11, 35]. In general, there are two main ways of introducing memory effects in neural networks. First, a simple workaround for feedforward neural networks (FNN) is to introduce time-delayed states in the inputs [12]. However, the drawback is that this could potentially lead to an unnecessarily large number of parameters [20]. To mitigate this, Bakker [4] considered following Broomhead and King [7] in reducing the dimension of the delay vector using weighted principal component analysis (PCA). The second approach uses output or hidden units as additional feedback. As an example, Elman's network [20] is a recurrent neural network (RNN) that incorporates memory in a dynamic fashion. Miyoshi et al. [26] demonstrated that recurrent RBF networks have the ability to reconstruct simple chaotic dynamics. Sato and Nagaya [39] showed that evolutionary algorithms can be used to train recurrent neural networks to capture the Lorenz system. Bailer-Jones et al. [3] used a standard RNN to predict the time derivative in discrete or continuous form for simple dynamical systems; this can be considered an RNN extension to Tsung's phase space learning [45]. Wang et al. [48] proposed a framework combining POD for dimension reduction with long short-term memory (LSTM) recurrent neural networks and applied it to a fluid dynamic system.

1.3. Fast Fourier Transform

The techniques that we now call the fast Fourier transform (FFT) were first proposed by Gauss in 1805 for calculating the coefficients in a trigonometric expansion of an asteroid's orbit [9]. The fast Fourier transform is an algorithm that calculates the values of the discrete Fourier transform more quickly; its speed comes from the fact that it avoids computing redundant terms by exploiting the symmetry of the transform. The algorithm was rediscovered by James W. Cooley and John W. Tukey, who published it in 1965 [12].
As we know it today, a periodic function f(x) can be represented by the trigonometric (Fourier) series

f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos(nx) + b_n \sin(nx) \right)        (1)

The coefficients can be determined as follows:

a_0 = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\, dx        (2)

a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos(nx)\, dx        (3)

b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin(nx)\, dx        (4)

The discrete Fourier transform (DFT) is one of the most powerful tools in digital signal processing. The DFT enables us to conveniently analyze and design systems in the frequency domain [1], and its form is:

X_k = \sum_{n=0}^{N-1} x_n e^{-i 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1        (5)
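To make the relationship between equation (5) and the Cooley-Tukey algorithm concrete, the following minimal Python sketch evaluates the DFT directly in O(N^2) operations and compares it with numpy's FFT implementation; the signal length and the random test signal are illustrative assumptions.

```python
import numpy as np

def dft_direct(x):
    """Direct evaluation of X_k = sum_n x_n * exp(-2j*pi*k*n/N), equation (5)."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape((N, 1))
    W = np.exp(-2j * np.pi * k * n / N)   # N x N matrix of twiddle factors
    return W @ x

x = np.random.randn(256)
X_slow = dft_direct(x)                    # O(N^2) direct DFT
X_fast = np.fft.fft(x)                    # Cooley-Tukey FFT, O(N log N)
print(np.allclose(X_slow, X_fast))        # True: both compute the same transform
```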

2. Noise Asymmetric Map Solution

In this section we explain how this approach can be used to find an approximate solution of the asymmetric map.
NA(x) is the solution to be computed. Let yt(x, p) denote a trial solution with adjustable parameters p.
In the proposed approach, the trial solution yt employs an FFNN, and the parameters p correspond to the weights and biases of the neural architecture. We choose a form for the trial function yt(x) such that yt(x, p) = N(x, p), where N(x, p) is a single-output FFNN with parameters (weights) p and n input units fed with the input vector x.

2.1. Computation of the Gradient

The error corresponding to each input vector xi is the value E(xi), which has to be forced to be near zero. Computation of this error value involves not only the FFNN output but also the derivatives of the output with respect to its inputs. Therefore, to compute the gradient of the error with respect to the network weights, consider a multilayer FFNN with n input units (where n is the dimension of the domain), two hidden layers (the first with H sigmoid units and the second with q sigmoid units) and a linear output unit.
For a given input vector x = (x1, x2, …, xn) the output of the FFNN is:

N(x, p) = \sum_{k=1}^{q} v_k \, \sigma\left( \sum_{i=1}^{H} s_{ki} \, \sigma\left( \sum_{j=1}^{n} w_{ij} x_j + b_i^{(1)} \right) + b_k^{(2)} \right)        (6)

where
w_{ij} denotes the weight connecting input unit j to hidden unit i of the first hidden layer,
s_{ki} denotes the weight connecting hidden unit i of the first hidden layer to hidden unit k of the second hidden layer,
v_k denotes the weight connecting hidden unit k of the second hidden layer to the output unit,
b_i^{(1)} denotes the bias of hidden unit i in the first hidden layer,
b_k^{(2)} denotes the bias of hidden unit k in the second hidden layer, and
σ is the transfer function.
The gradient of the suggested FFNN output with respect to the coefficients of the FFNN can be computed as follows. Writing z_i^{(1)} = \sum_{j=1}^{n} w_{ij} x_j + b_i^{(1)} and z_k^{(2)} = \sum_{i=1}^{H} s_{ki} \sigma(z_i^{(1)}) + b_k^{(2)},

\partial N / \partial v_k = \sigma(z_k^{(2)})        (7)

\partial N / \partial b_k^{(2)} = v_k \, \sigma'(z_k^{(2)})        (8)

\partial N / \partial s_{ki} = v_k \, \sigma'(z_k^{(2)}) \, \sigma(z_i^{(1)})        (9)

\partial N / \partial b_i^{(1)} = \sigma'(z_i^{(1)}) \sum_{k=1}^{q} v_k \, \sigma'(z_k^{(2)}) \, s_{ki}        (10)

\partial N / \partial w_{ij} = x_j \, \sigma'(z_i^{(1)}) \sum_{k=1}^{q} v_k \, \sigma'(z_k^{(2)}) \, s_{ki}        (11)
Once these derivatives are available, the derivative of the error performance with respect to the FFNN coefficients follows directly by the chain rule, and it is then easy to carry out the minimization.
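The following minimal Python sketch implements equations (6)-(11) for the two-hidden-layer network with a logistic-sigmoid transfer function; the array shapes, the random parameter values and the helper name forward_and_grads are assumptions made for illustration.

```python
import numpy as np

def sigma(z):
    """Logistic sigmoid transfer function."""
    return 1.0 / (1.0 + np.exp(-z))

def dsigma(z):
    """Derivative of the logistic sigmoid."""
    s = sigma(z)
    return s * (1.0 - s)

def forward_and_grads(x, w, b1, s, b2, v):
    """x: (n,), w: (H,n), b1: (H,), s: (q,H), b2: (q,), v: (q,)."""
    z1 = w @ x + b1                 # net input of the first hidden layer
    a1 = sigma(z1)
    z2 = s @ a1 + b2                # net input of the second hidden layer
    a2 = sigma(z2)
    N = v @ a2                      # linear output unit, equation (6)

    dN_dv  = a2                                  # equation (7)
    dN_db2 = v * dsigma(z2)                      # equation (8)
    dN_ds  = np.outer(dN_db2, a1)                # equation (9)
    dN_db1 = (s.T @ dN_db2) * dsigma(z1)         # equation (10)
    dN_dw  = np.outer(dN_db1, x)                 # equation (11)
    return N, (dN_dv, dN_db2, dN_ds, dN_db1, dN_dw)

# quick check with the 1-10-5-1 architecture of Section 3 and random parameters
rng = np.random.default_rng(0)
n, H, q = 1, 10, 5
out, grads = forward_and_grads(rng.uniform(size=n), rng.normal(size=(H, n)),
                               rng.normal(size=H), rng.normal(size=(q, H)),
                               rng.normal(size=q), rng.normal(size=q))
```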

3. Suggested Networks

It is well known that a multilayer FFNN with one hidden layer can approximate any function to any accuracy [28], but dynamical maps have more complicated behavior than other functions; thus, we suggest an FFNN containing two hidden layers, one input layer and one output layer to estimate a solution for the dynamical maps.
The suggested network divides the inputs into two parts: 60% for training and 40% for testing. The error quantity to be minimized is given by:

E(p) = \frac{1}{m} \sum_{i=1}^{m} \left( N(x_i, p) - NA(x_i) \right)^2        (12)

where m is the number of training points and x_i ∈ [0, 1]. It is easy to evaluate the gradient of the performance with respect to the coefficients using (7)-(11). The training algorithm of the FFNN with supervised training and the BFGS algorithm is given as follows. Assume that there is one node in the first layer (the input layer), ten nodes in the second layer (the first hidden layer), five nodes in the third layer (the second hidden layer) and one node in the fourth layer (the output layer).
Following the steps of the technique as an algorithm (a minimal code sketch of the complete training loop is given after Step 16):
Step 0: input and target.
Insert the input (x: x1, x2, x3,…., xn) and the target.
Step 1: allocating inputs.
Each input is passed to every neuron in the first hidden layer.
Step 2: initialization weights.
The weights and biases for all connections in the neural network are initialized from the uniform distribution.
Figure 1. Flowchart for training algorithm with BFGS
Step 3: select the following.
- the epoch parameter (maximum number of training epochs)
- the goal parameter (target error)
- the performance function (MSE)
Step 4: calculations in each node in the first hidden layer.
In each node of the first hidden layer, compute the sum of the products of the weights and inputs and add the bias to the result.
Step 5: compute the output of each node for the first hidden layer.
Apply the activation function to the sum obtained in Step 4; the resulting output is sent to the second hidden layer as input.
Step 6: calculations in each node in the second layer.
In each node of the second hidden layer, compute the sum of the products of the weights and inputs and add the bias to the result.
Step 7: compute the output of each node for the second hidden layer.
Apply the activation function to the sum obtained in Step 6; the resulting output is sent to the output layer as input.
Step 8: calculations in output layer.
There is only one neuron (node) in the output layer. Its net input is the sum of the products of the weights and inputs.
Step 9: compute the output of node in output layer
The value of the activation function applied to this net input is taken as the output of the overall network.
Step 10: compute the mean square error (MSE).
The mean square error is computed as follows:

MSE = \frac{1}{m} \sum_{i=1}^{m} (t_i - o_i)^2

where t_i is the target value and o_i is the network output for the i-th training point; the MSE is the measure of performance.
Step 11: checking.
When the MSE reaches a small value close to zero, stop the training and save the weights and biases. Otherwise, the training process goes to the next step.
Step 12: once the training rule has been selected, the law for updating the weights and biases between the hidden layers and the output layer is calculated.
Step 13: the update weights and bias in output layer.
At the end of each iteration, the weights and biases are updated as follows:

v(new) = v(old) - \alpha H^{-1} \nabla_v E, \qquad b(new) = b(old) - \alpha H^{-1} \nabla_b E

where (new) denotes the current iteration and (old) the previous iteration, \nabla E represents the gradient with respect to the weights and biases, \alpha is the parameter selected to minimize the performance function along the search direction, H^{-1} represents the inverse Hessian matrix, v is the weight in the output layer and b is the bias.
Step 14: the update of weights and bias in the first hidden layer.
Each hidden node in the first hidden layer updates its weights and bias as follows:

w(new) = w(old) - \alpha H^{-1} \nabla_w E, \qquad b(new) = b(old) - \alpha H^{-1} \nabla_b E

where w is the weight of the hidden layer and b is the bias.
Step 15: the update of weights and biases in the second hidden layer is carried out as follows:

s(new) = s(old) - \alpha H^{-1} \nabla_s E, \qquad b(new) = b(old) - \alpha H^{-1} \nabla_b E

where s is the weight of the hidden layer and b is the bias.
Step 16: return to Step 2 for the next iteration.
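As referenced above, the following minimal, self-contained Python sketch mirrors the spirit of Steps 0-16 for the 1-10-5-1 architecture, minimizing the MSE of equation (12) with the BFGS method from SciPy. The logistic-sigmoid transfer function, the synthetic noisy target series and all numeric choices are assumptions made for illustration; the paper's FFT transfer function and asymmetric maps are not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
H1, H2 = 10, 5                                  # hidden-layer sizes (Section 3)

def unpack(p):
    """Split the flat parameter vector into layer weights and biases."""
    i = 0
    w1 = p[i:i + H1].reshape(H1, 1); i += H1
    b1 = p[i:i + H1];                i += H1
    w2 = p[i:i + H2 * H1].reshape(H2, H1); i += H2 * H1
    b2 = p[i:i + H2];                i += H2
    v  = p[i:i + H2];                i += H2
    b3 = p[i]
    return w1, b1, w2, b2, v, b3

def net(x, p):
    """1-10-5-1 feedforward network with sigmoid hidden units and linear output."""
    w1, b1, w2, b2, v, b3 = unpack(p)
    a1 = 1.0 / (1.0 + np.exp(-(w1 @ x[None, :] + b1[:, None])))   # first hidden layer
    a2 = 1.0 / (1.0 + np.exp(-(w2 @ a1 + b2[:, None])))           # second hidden layer
    return v @ a2 + b3                                            # output unit

# Stand-in data: 1000 points on [0, 1] with additive normal noise, 60/40 split.
x = np.linspace(0.0, 1.0, 1000)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.05, x.shape)        # placeholder target
n_tr = int(0.6 * len(x))

def mse(p):
    """Performance function of equation (12) on the training part."""
    return np.mean((net(x[:n_tr], p) - y[:n_tr]) ** 2)

n_params = 2 * H1 + H2 * H1 + H2 + H2 + 1
p0 = rng.uniform(-0.5, 0.5, n_params)           # Step 2: uniform initial weights
result = minimize(mse, p0, method="BFGS", options={"maxiter": 200})
print("training MSE:", mse(result.x))
print("test MSE:", np.mean((net(x[n_tr:], result.x) - y[n_tr:]) ** 2))
```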

4. Asymmetric Logistic Map (ALM) [2]

The dynamical system for the asymmetric logistic map can be defined as follows:
(13)
where the parameters of the map and their admissible ranges are as given in [2].

4.1. Description of Training Process for Noisy Asymmetric Logistic Map (NALM)

We use the suggested network with the tansig, logsig and FFT transfer functions to train on data from the NALM with normal noise. The maximum number of epochs is chosen so as to reach high performance. The variances used in this case are 0.05, 0.5 and 15.
In this case, we train with a = 0.5 and four cases of values of the bifurcation parameter, in which the first and second parts of the ALM are as follows:
i- The two parts are noisy deterministic.
ii- The first part is noisy deterministic and the second part is noisy chaotic.
iii- The first part is noisy chaotic and the second part is noisy deterministic.
iv- The two parts are noisy chaotic.
It is worth mentioning that the run size is k = 1000.
Tables (1) to (9) contain the results.
Table (1). Time and MSE of approximate solution after many training trials with normal noise by using logsig transfer function when the variance equal to 0.05
Table (2). Time and MSE of approximate solution after many training trials with normal noise by using tansig transfer function when the variance equal to 0.05
Table (3). Time and MSE of approximate solution after many training trials with normal noise by using FFT transfer function when the variance equal to 0.05
Table (4). Time and MSE of approximate solution after many training trials with normal noise by using logsig transfer function when the variance equal to 0.5
Table (5). Time and MSE of approximate solution after many training trials with normal noise by using tansig transfer function when the variance equal to 0.5
Table (6). Time and MSE of approximate solution after many training trials with normal noise by using FFT transfer function when the variance equal to 0.5
Table (7). Time and MSE of approximate solution after many training trials with normal noise by using logsig transfer function when the variance equal to 15
Table (8). Time and MSE of approximate solution after many training trials with normal noise by using tansig transfer function when the variance equal to 15
Table (9). Time and MSE of approximate solution after many training trials with normal noise by using FFT transfer function when the variance equal to 15

4.2. Normal Distribution [37]

Let x be a normally distributed random variable with mean \mu and variance \sigma^2, where -\infty < x < \infty; its probability density function and cumulative distribution function are, respectively,

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)        (14)

F(x) = \frac{1}{2}\left[ 1 + \operatorname{erf}\left( \frac{x-\mu}{\sigma\sqrt{2}} \right) \right]        (15)

One can generate values from a normal random variable as follows,
(16)
where q represents the number of generated points and v represents the variance.
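As a concrete illustration, the following minimal Python sketch draws the additive normal noise used in the experiments for each of the three variances. It relies on numpy's normal generator (which is parameterized by the standard deviation, hence the square root) rather than the specific generation formula of equation (16), so it should be read as an assumed stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 1000                                        # run size used in the paper
for variance in (0.05, 0.5, 15):
    noise = rng.normal(loc=0.0, scale=np.sqrt(variance), size=k)
    # the sample variance should be close to the requested variance
    print(variance, noise.var())
```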

4.3. Results Discussion

The experimental results in Tables (1) to (9) show that the FFT transfer function gives the best MSE results in all cases (i) to (iv) with normal noise.
The logsig and tansig transfer functions perform well only in case (i), when the two parts of the NALM are deterministic with additive noise, but even then their performance is much worse than that of the FFT transfer function.

5. Tent Map

The one-dimensional tent map is expressed through the following equation:

x_{n+1} = \begin{cases} r\, x_n, & x_n < 1/2 \\ r\,(1 - x_n), & x_n \ge 1/2 \end{cases}        (17)

where x_0 is the initial value, x_n ∈ [0, 1] and r ∈ [0, 2] [17]. The tent map is an iterated function of a dynamical system that exhibits chaotic behaviours (orbits) and is governed by equation (17). It has a shape similar to that of the logistic map but with a corner; it is a one-dimensional map generating periodic and chaotic behaviour similar to that of the logistic map [25].
Figure 2. Bifurcation diagram for the tent map [27]
Figure 3. The tent map for different values of the parameter r [27]
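The following minimal Python sketch iterates equation (17) and adds normal noise of the kind used in the experiments; the value of r, the initial condition, the noise variance and the clipping of the state to [0, 1] are illustrative assumptions.

```python
import numpy as np

def tent_map(x, r):
    """Equation (17): r*x for x < 1/2, r*(1-x) otherwise."""
    return r * x if x < 0.5 else r * (1.0 - x)

rng = np.random.default_rng(2)
r, x0, k, variance = 1.9, 0.3, 1000, 0.05

orbit = np.empty(k)
orbit[0] = x0
for n in range(k - 1):
    noisy_step = tent_map(orbit[n], r) + rng.normal(0.0, np.sqrt(variance))
    orbit[n + 1] = np.clip(noisy_step, 0.0, 1.0)   # keep the state in [0, 1] (an assumption)
```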

6. Asymmetric Logistic-tent Map (ALTM) [2]

The dynamical system for the asymmetric logistic-tent map can be defined as follows:
(18)
where the parameters of the map and their admissible ranges are as given in [2].

6.1. Description of Training Process for Noisy Asymmetric Logistic-Tent Map (NALTM)

We use the suggested network with the tansig, logsig and FFT transfer functions to train on data from the NALTM with normal noise. The maximum number of epochs is chosen so as to reach high performance, and the variances used in this case are 0.05, 0.5 and 15. It is worth mentioning that the run size is k = 1000.
Tables (10) to (18) contain the results.
Table (10). Time and MSE of approximate solution after many training trials with normal noise by using logsig transfer function when the variance equal to 0.05
Table (11). Time and MSE of approximate solution after many training trials with normal noise by using tansig transfer function when the variance equal to 0.05
Table (12). Time and MSE of approximate solution after many training trials with normal noise by using FFT transfer function when the variance equal to 0.05
Table (13). Time and MSE of approximate solution after many training trials with normal noise by using logsig transfer function when the variance equal to 0.5
Table (14). Time and MSE of approximate solution after many training trials with normal noise by using tansig transfer function when the variance equal to 0.5
Table (15). Time and MSE of approximate solution after many training trials with normal noise by using FFT transfer function when the variance equal to 0.5
Table (16). Time and MSE of approximate solution after many training trials with normal noise by using logsig transfer function when the variance equal to 15
Table (17). Time and MSE of approximate solution after many training trials with normal noise by using tansig transfer function when the variance equal to 15
Table (18). Time and MSE of approximate solution after many training trials with normal noise by using FFT transfer function when the variance equal to 15

6.2. Results Discussion

The performance results show that using the FFT as transfer function is much superior to using tansig and logsig as transfer functions.
We see that the results of logsig for variances 0.05 and 0.5 (Tables 10 and 13) and those of the tansig transfer function (Tables 11 and 14) are close to each other in performance. When the variance increases to 15, the performance of both transfer functions becomes very poor.
As an overall conclusion, we can say that the proposed algorithm outperforms the others in all cases, in terms of both speed and accuracy.

7. Asymmetric Tent-logistic Map (ATLM)

The dynamical system for the asymmetric tent-logistic map can be defined as follows:
(19)
where the parameters of the map take values in their admissible ranges.

7.1. Description of Training Process for Noisy Asymmetric Tent-logistic Map (NATLM)

We use the suggested network with the tansig, logsig and FFT transfer functions to train on data from the NATLM with normal noise. The maximum number of epochs is chosen so as to reach high performance, and the variances used in this case are 0.05, 0.5 and 15. It is worth mentioning that the run size is k = 1000.
Tables (19) to (27) contain the results.
Table (19). Time and MSE of approximate solution after many training trials with normal noise by using logsig transfer function when the variance equal to 0.05
Table (20). Time and MSE of approximate solution after many training trials with normal noise by using tansig transfer function when the variance equal to 0.05
Table (21). Time and MSE of approximate solution after many training trials with normal noise by using FFT transfer function when the variance equal to 0.05
Table (22). Time and MSE of approximate solution after many training trials with normal noise by using logsig transfer function when the variance equal to 0.5
Table (23). Time and MSE of approximate solution after many training trials with normal noise by using tansig transfer function when the variance equal to 0.5
Table (24). Time and MSE of approximate solution after many training trials with normal noise by using FFT transfer function when the variance equal to 0.5
Table (25). Time and MSE of approximate solution after many training trials with normal noise by using logsig transfer function when the variance equal to 15
Table (26). Time and MSE of approximate solution after many training trials with normal noise by using tansig transfer function when the variance equal to 15
Table (27). Time and MSE of approximate solution after many training trials with normal noise by using FFT transfer function when the variance equal to 15

7.2. Results Discussion

From the tables of results, it is clear that when the noise is normal, for all of the considered variance values, the FFT transfer function is the best among all the other transfer functions.
We see that the results of logsig for variance 0.05 (Table 19) are better than those of the tansig transfer function (Table 20), whereas for variance 0.5 the tansig results (Table 23) are better than the logsig results (Table 22). When the variance increases to 15, the performance of both transfer functions becomes very poor.

8. Summary

From the above cases, it is clear that the suggested FFT transfer function in an artificial neural network gives good results and good accuracy in both the noisy deterministic and the noisy chaotic cases in comparison with the usual transfer functions (see Tables (1)-(27)). Therefore, we can conclude that the FFNN with FFT which we proposed can handle noisy asymmetric dynamical maps effectively and provide an accurate approximate solution throughout the whole domain, not only at the training set.
The estimation of noisy asymmetric dynamical maps by trained ANNs leads to the following observations:
1- The complexity of the computations increases as the number of sampling points of the noisy asymmetric dynamical maps (NALM, NALTM and NATLM) increases.
2- The FFNN with the FFT transfer function provides a solution with a very good performance function in comparison with the traditional logsig and tansig transfer functions.
3- The proposed FFNN with the FFT transfer function can be applied to noisy asymmetric dynamical maps.
4- The proposed FFT transfer function gave the best results, especially when the noisy asymmetric dynamical map is chaotic or highly chaotic.
5- In general, the experimental results show that the proposed FFNN with the FFT transfer function can handle noisy asymmetric dynamical maps effectively and provide an accurate approximate solution throughout the whole domain. Three points contribute to this: first, the neural network computations are parallel; second, the FFT analysis of the data and its computations are parallel; third, the FFT transfer function traces differences in the data back to their original sources, which reflects the homogeneity of the data and the sectors of the work.
Some future works can be recommended, as follows:
1- Using networks with three or more hidden layers.
2- Using a feedback (recurrent) neural network architecture.
3- Increasing the number of neurons in each hidden layer.
4- Using other random variables as noise on the asymmetric dynamical maps.
5- Choosing initial weights distributed according to different random variables.

References

[1]  Arar, S. (2017) “An Introduction to the Fast Fourier Transform”, https://www.allaboutcircuits.com/technical-articles/anintroduction-to-the-fast-fourier-transform/ 2017.
[2]  Abid, S.H. and Hasan, M.H. (2014) "About asymmetric noisy chaotic maps", International Journal of Basic and Applied Sciences, vol. 3, pp. 62-73.
[3]  Bailer-Jones, C., MacKay, D. and Withers, P. J. (1998) “A recurrent neural network for modelling dynamical systems,” Network: Computation in Neural Systems, vol. 9, no. 4, pp. 531–547.
[4]  Bakker, R., Schouten, J., Giles, C. Takens, F. and van den Bleek, C. (2000) “Learning chaotic attractors by neural networks”, Neural Computation, vol. 12, no. 10, pp. 2355–2383.
[5]  Berkooz, G., Holmes, P. and Lumley, J. (1993) “The proper orthogonal decomposition in the analysis of turbulent flows,” Annual Review of Fluid Mechanics, vol. 25, no. 1, pp. 539–575.
[6]  Billings, S., Jamaluddin, H. and Chen, S. (1992) “Properties of neural networks with applications to modelling nonlinear dynamical systems,” International Journal of Control, vol. 55, no. 1, pp. 193–224.
[7]  Broomhead, D. and King, G. (1986) “Extracting qualitative dynamics from experimental data,” Physica D: Nonlinear Phenomena, vol. 20, no. 2-3, pp. 217–236.
[8]  Brunton, S. Proctor, J. and Kutz, J. (2016) “Discovering governing equations from data by sparse identification of nonlinear dynamical systems”, Proceedings of the National Academy of Sciences of the United States of America, vol. 113, no. 15, pp. 3932–3937.
[9]  Burrus, C. and Johnson, S. (2012) “Fast Fourier Transforms”, http://cnx.org/content/col10550/1.22.
[10]  Chakraborty, K., Mehrotra, K., Mohan, C. and Ranka, S. (1992) “Forecasting the behavior of multivariate time series using neural networks,” Neural Networks, vol. 5, no. 6, pp. 961–970.
[11]  Chorin, A. and Hald, O. (2009) “Stochastic Tools in Mathematics and Science”, vol. 3 of Surveys and Tutorials in the Applied Mathematical Sciences, Springer.
[12]  Cooley, J.W. and Tukey, J.W. (1965) "An Algorithm for the Machine Calculation of Complex Fourier Series", Mathematics of Computation, 19 (90), 297-301.
[13]  Duriez, T., Brunton, S. and Noack, B. (2017) "Machine Learning Control – Taming Nonlinear Dynamics and Turbulence", Springer.
[14]  Elanayar, V. and Shin, Y. (1994) “Radial basis function neural network for approximation and estimation of nonlinear stochastic dynamic systems,” IEEE Transactions on Neural Networks, vol. 5, no. 4, pp. 594–603.
[15]  Elman, J. (1990) “Finding structure in time,” Cognitive Science, vol. 14, no. 2, pp. 179–211.
[16]  Galushkin, I. (2007) "Neural Networks Theory", Berlin Heidelberg.
[17]  Hassani E. and Eshghi, M. (2013) "Image Encryption Based on Chaotic Tent Map in Time and Frequency Domains",The ISC International Journal of Information Security, vol. 5, p. 97.
[18]  Hornik, K. (1991) “Approximation capabilities of multilayer feedforward networks,” Neural Networks, vol. 4, no. 2, pp. 251–257, 1991.
[19]  Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R. and Fei-Fei, L. (2014) “Largescale video classification with convolutional neural networks,” in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 1725–1732, Columbus, OH, USA.
[20]  Koskela, T. Lehtokangas, M., Saarinen, M. and Kaski, K. (1996) “Time series prediction with multilayer perceptron, FIR and Elman neural networks,” in Proceedings of the World Congress on Neural Networks, pp. 491–496, Citeseer.
[21]  Kuschewski, J., Hui, S. and Zak, S. H. (1993) “Application of feedforward neural networks to dynamical system identification and control,” IEEE Transactions on Control Systems Technology, vol. 1, no. 1, pp. 37–49.
[22]  Lin, H., Chen, W. and Tsutsumi, A. (2003) “Long-term prediction of nonlinear hydrodynamics in bubble columns by using artificial neural networks,” Chemical Engineering and Processing: Process Intensification, vol. 42, no. 8-9, pp. 611–620.
[23]  Mahdi, O. and Tawfiq, L. (2015) "Design Suitable Neural Networks to Solve Eigen Value Problems and It′s Application", M.Sc. Thesis, College of Education Ibn AL-Haitham, University of Baghdad, Iraq.
[24]  Mangan, N. Brunton, S., Proctor, J. and Kutz, J. (2016) “Inferring biological networks by sparse identification of nonlinear dynamics”, IEEE Transactions on Molecular, Biological and Multi-Scale Communications, vol. 2, no. 1, pp. 52–63.
[25]  Maqableh, M. (2012) "Analysis and Design Security Primitives Based on Chaotic Systems for Ecommerce", Durham University.
[26]  Miyoshi, T., Ichihashi, H., Okamoto, S. and Hayakawa, T. (1995) “Learning chaotic dynamics in recurrent RBF network,” in Proceedings of ICNN'95 - International Conference on Neural Networks, vol. 1, pp. 588–593, Perth, WA, Australia.
[27]  Mohsen M. M. A. (2017), "Multi-level Security by Using Chaotic Dynamical System”, M.Sc. Thesis, Technical College of Management / Baghdad, Middle Technical University, Iraq.
[28]  MacKay, D. (1992) Neural Computation, vol. 4, no. 3, pp. 415-447.
[29]  Narendra, K., and Parthasarathy, K. (1990) “Identification and control of dynamical systems using neural networks,” IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 4–27.
[30]  Narendra, K. and Parthasarathy, K. (1992) “Neural networks and dynamical systems,” International Journal of Approximate Reasoning, vol. 6, no. 2, pp. 109–131.
[31]  Oraibi, Y. and Tawfiq, L. (2013) "Fast Training Algorithms for Feed Forward Neural Networks", Ibn Al-Haitham Jour. for Pure & Appl. Sci., Vol. 26, No. 1, pp.276.
[32]  Paez, T. and Hunter, N. (1997) “Dynamical system modeling via signal reduction and neural network simulation,” Sandia National Labs, Albuquerque, NM (United States).
[33]  Paez, T. and Hunter, N. (2000) “Nonlinear system modeling based on experimental data,” Technical report, Sandia National Labs., Albuquerque, NM (US); Sandia National Labs., Livermore, CA (US).
[34]  Pan, S. and Duraisamy, K. (2018) “Long-Time Predictive Modeling of Nonlinear Dynamical Systems Using Neural Networks”, Hindawi, Complexity, Volume 2018, pp. 1–26.
[35]  Parish, E. and Duraisamy, K. (2016) “Reduced order modeling of turbulent flows using statistical coarse-graining,” in 46th AIAA Fluid Dynamics Conference, Washington, D.C., USA.
[36]  Polycarpou, M. and Ioannou, P. (1991) “Identification and control of nonlinear systems using neural network models: design and stability analysis,” University of Southern California.
[37]  Rubinstein, Y. and Kroese, D. (2007), “Simulation And The Monte Carlo Method”, Wiley, second Edition.
[38]  Russakovsky, O., Deng, J., Su, H. (2015) “Imagenet large scale visual recognition challenge,” International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252.
[39]  Sato, Y. and Nagaya, S. (1996) “Evolutionary algorithms that generate recurrent neural networks for learning chaos dynamics,” in Proceedings of IEEE International Conference on Evolutionary Computation, pp. 144–149, Nagoya, Japan.
[40]  Shumway, R. and Stoffer, D. (2000) “Time series analysis and its applications,” Studies in Informatics and Control, vol. 9, no. 4, pp. 375-376.
[41]  Smaoui, N. (1997) “Artificial neural network-based low-dimensional model for spatio-temporally varying cellular flames,” Applied Mathematical Modelling, vol. 21, no. 12, pp. 739–748.
[42]  Smaoui, N. (2001) “A model for the unstable manifold of the bursting behavior in the 2D navier–stokes flow,” SIAM Journal on Scientific Computing, vol. 23, no. 3, pp. 824–839.
[43]  Smaoui, N. and Al-Enezi, S. (2004) “Modelling the dynamics of nonlinear partial differential equations using neural networks,” Journal of Computational and Applied Mathematics, vol. 170, no. 1, pp. 27–58.
[44]  Tanaskovic, M., Fagiano, L., Novara, C. and Morari, M. (2017) “Data driven control of nonlinear systems: an on-line direct approach,” Automatica, vol. 75, pp. 1–10.
[45]  Tsung, F. and Cottrell, G. (1995) “Phase-space learning”, in Advances in Neural Information Processing Systems, pp. 481–488, MIT Press.
[46]  Urbina, A., Hunter, N. and Paez, T. (1998) “Characterization of nonlinear dynamic systems using artificial neural networks,” Technical report, Sandia National Labs, Albuquerque, NM (United States).
[47]  Villmann, T., Seiffert, U. and Wismüller, A. (2004) "Theory and Applications of Neural Maps", ESANN 2004 Proceedings - European Symposium on Artificial Neural Networks, pp. 25-38.
[48]  Wang, Z., Xiao, D., Fang, F., Govindan, R., Pain, C. and Guo, Y. (2018) “Model identification of reduced order fluid dynamics systems using deep learning,” International Journal for Numerical Methods in Fluids, vol. 86, no. 4, pp. 255–268.