EXPERIMENTAL RESULTS (see [4] for details)

**EXPERIMENT 1 - noisy classification.**
The first experiment is taken from
Pearlmutter and Rosenfeld
[9].
The task is to decide whether
the x-coordinate of a point in 2-dimensional space
exceeds zero (class 1) or does not (class 2).
Noisy training examples are generated as follows:
data points are drawn from a Gaussian with
zero mean and stdev 1.0, bounded to a fixed interval.
The data points are misclassified
with a certain probability.
The final input data are obtained by
adding zero-mean Gaussian noise with stdev 0.15 to the data points.
In a test with 2,000,000 data points,
it was found that the procedure above leads
to 9.27 per cent misclassified data.
*No* method will misclassify less
than 9.27 per cent, due to the
inherent noise in the data.
The training set is based on 200 fixed data points.
The test set
is based on 120,000 data points.
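The generation procedure above can be sketched as follows. The bounding interval and the misclassification probability are elided in this excerpt, so clipping to [-1, 1] and a 10 per cent label-flip rate are purely illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_noisy_data(n, flip_prob=0.1, input_noise_std=0.15):
    """Sketch of the noisy 2-D classification data described above.

    flip_prob and the clipping interval [-1, 1] are illustrative
    assumptions; the actual values are not given in this excerpt.
    """
    # Points from a zero-mean, unit-stdev Gaussian, bounded to an interval.
    points = np.clip(rng.normal(0.0, 1.0, size=(n, 2)), -1.0, 1.0)
    # Class 1 if the x-coordinate exceeds zero, class 2 otherwise.
    labels = np.where(points[:, 0] > 0.0, 1, 2)
    # Misclassify a fraction of the points to simulate label noise.
    flip = rng.random(n) < flip_prob
    labels = np.where(flip, 3 - labels, labels)  # swaps class 1 <-> class 2
    # Final inputs: add zero-mean Gaussian noise (stdev 0.15) to the points.
    inputs = points + rng.normal(0.0, input_noise_std, size=(n, 2))
    return inputs, labels
```

The label noise, not the input noise, is what produces the irreducible 9.27 per cent error floor mentioned above.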

**Results.**
10 conventional backprop (BP) nets were tested
against 10 equally initialized networks based on our new method
(``flat minima search'', FMS).
*After 1,000 epochs, the weights of our nets essentially stopped changing
(automatic ``early stopping''),
while backprop kept changing weights to learn the outliers in the
data set and overfit.*
In the end, our approach left
a single hidden unit with a strong weight
from the x-axis input. Unlike with backprop,
the other hidden units were effectively pruned away
(outputs near zero),
as was the y-axis input (zero weights to the hidden units).
It can be shown that this corresponds to an ``optimal'' net
with minimal numbers of units and weights.
Table 1 illustrates the superior performance of our approach.

**EXPERIMENT 2 - recurrent nets.**
The method works for continually running
fully recurrent nets as well.
At every time step,
a recurrent net with
sigmoid activations
sees an input vector from a stream
of randomly chosen input vectors.
The task is to switch on the first output unit whenever
a particular input had occurred two time steps ago,
and to switch on the second output unit without delay
in response to certain inputs.
The task can be solved by a single hidden unit.
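The structure of this task can be sketched as follows. The actual input vectors and which of them trigger which output unit are elided in this excerpt, so the one-hot inputs and the single trigger index below are purely illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input set: the actual vectors are not given in this excerpt.
INPUTS = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
TRIGGER = 0  # hypothetical index of the input that triggers both outputs

def make_stream(T):
    """Build an input stream and the two target output sequences."""
    idx = rng.integers(0, len(INPUTS), size=T)
    xs = np.stack([INPUTS[i] for i in idx])
    # First target: on whenever the trigger input occurred two time steps ago.
    y1 = np.zeros(T)
    y1[2:] = (idx[:-2] == TRIGGER).astype(float)
    # Second target: on without delay in response to the trigger input.
    y2 = (idx == TRIGGER).astype(float)
    return xs, y1, y2
```

The first target sequence is just the second one delayed by two steps, which is why a single hidden unit acting as a short memory suffices.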

**Results.**
With conventional recurrent net algorithms,
after training,
both hidden units were used to store the input vector.
Not so with our new approach.
We trained 20 networks. All of them
learned perfect solutions.
As with weight decay,
most weights to the output units decayed to zero.
But *unlike* with weight decay,
**strong inhibitory** connections (-30.0) switched off
one of the hidden units, effectively
pruning it away.

**EXPERIMENT 3 - stock market prediction.**
We predict the DAX (German stock market index) based on
fundamental (experiments 3.1 and 3.2) and technical (experiment 3.3)
indicators. We use strictly layered feedforward nets with
sigmoid units active in [-1,1], and the following performance
measures:

*Confidence:* an output whose absolute value exceeds the confidence
threshold counts as a confident prediction: a positive output predicts
a positive tendency, a negative output a negative tendency.
*Performance:*
the sum of confidently but incorrectly predicted DAX changes
is subtracted from
the sum of confidently and correctly predicted ones.
The result is divided by the sum of absolute DAX changes.
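The performance measure can be sketched as follows; this is a minimal sketch of the definition above (the exact details are in [4], and the function name is ours):

```python
import numpy as np

def confident_performance(outputs, dax_changes, threshold):
    """Sketch of the confidence-weighted performance measure.

    outputs: net outputs in [-1, 1]; |output| > threshold counts as a
    confident prediction, with the sign giving the predicted tendency.
    dax_changes: actual DAX changes for the same periods.
    """
    outputs = np.asarray(outputs, dtype=float)
    changes = np.asarray(dax_changes, dtype=float)
    confident = np.abs(outputs) > threshold
    # Signed agreement: +|change| when the predicted tendency is correct,
    # -|change| when it is incorrect.
    signed = np.sign(outputs) * changes
    # Correct minus incorrect confident changes, normalized by total movement.
    return signed[confident].sum() / np.abs(changes).sum()
```

For example, with outputs (0.9, -0.7, 0.1), changes (1.0, -2.0, 3.0), and threshold 0.6, only the first two predictions are confident and both are correct, giving (1.0 + 2.0) / 6.0 = 0.5.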

EXPERIMENT 3.1:
Fundamental inputs:
(a) German interest rate (*``Umlaufsrendite''*),
(b) industrial production divided by money supply, (c) business
sentiments (*``IFO Geschäftsklimaindex''*).
24 training examples, 68 test examples,
quarterly prediction,
confidence: 0.0/0.6/0.9,
architecture: (3-8-1).

EXPERIMENT 3.2:
Fundamental inputs:
(a), (b), (c) as in exp. 3.1,
(d) dividend rate,
(e) foreign orders in manufacturing industry.
228 training examples, 100 test examples,
monthly prediction,
confidence: 0.0/0.6/0.8,
architecture: (5-8-1).

EXPERIMENT 3.3:
Technical inputs:
(a) 8 most recent DAX-changes,
(b) DAX, (c)
change of 24-week relative strength index
(``RSI''),
(d) difference of ``5 week statistic'',
(e) ``MACD'' (difference of exponentially
weighted 6 week and 24 week DAX).
320 training examples, 100 test examples,
weekly predictions,
confidence: 0.0/0.2/0.4,
architecture: (12-9-1).

The following methods
are tested:
(1) Conventional backprop (BP),
(2) optimal brain surgeon (OBS [2]),
(3) weight decay (WD),
(4) flat minima search (FMS).

**Results.**
Our method clearly outperforms the other methods.
FMS is up to 63 per cent better than the best
competitor (see [4] for details).
