Use the general command NEURAL to enter the neural network module.
The network will probably be used mainly by people interested in QSAR techniques (structure-activity relationships for drugs). In principle the network should be able to replace the classical QSAR modules, but in practice I think it is best to use QSAR and the network in parallel.
Neural networks are normally used for pattern recognition. They are often useful for detecting hidden correlations in data. In the special case of QSAR problems a neural network should in principle be capable of finding correlations between the parameters of the variable active groups.
Most neural networks accept bit patterns as input. The WHAT IF neural network, however, expects real numbers as input, which greatly enhances its flexibility. In practice, the neural network will find a two- to three-fold smaller deviation between the observed and calculated binding constants than classical QSAR methods for 90 to 95 percent of all data points. For the other 5 to 10 percent of the data points, however, the correlation is five to ten times worse. Many experiments still have to be performed, but this looks to me like a good way of detecting outliers in the data set.
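To make the outlier idea concrete, here is a minimal sketch (not part of WHAT IF; the five-times-median threshold is my assumption) that flags data points whose prediction error is far above the median error:

```python
import numpy as np

def flag_outliers(observed, calculated, ratio=5.0):
    """Return the indices of points whose prediction error is more than
    `ratio` times the median error over the whole data set."""
    errors = np.abs(np.asarray(observed) - np.asarray(calculated))
    return np.where(errors > ratio * np.median(errors))[0]

# Point 1 is predicted far worse than the rest, so it is flagged.
print(flag_outliers([3.63, 0.55, 2.60, 0.61],
                    [3.60, 2.80, 2.55, 0.58]))   # -> [1]
```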
Most neural networks suffer badly from the multiple-minimum problem. The WHAT IF neural network uses an optimization scheme based on random neuron alterations. This ensures that, given sufficient CPU time, the global minimum can also be found.
The training phase can be very CPU intensive; the testing phase, however, is blazingly fast.
Please read some literature about neural networks, especially about the size of the data set and the corresponding network architecture. If there are not enough neurons in the net, the network will not generalize and the errors will be larger than necessary. If there are too many, the network will get over-trained and the predictions will become random. I suggest you start with twice as many neurons as there are variables in your data set. I also suggest that you do not use too many hidden layers; one, or sometimes two, will almost always work fine.
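As a back-of-the-envelope illustration of those rules of thumb (a sketch; the function name is mine, not a WHAT IF command):

```python
def suggested_architecture(n_variables, n_hidden_layers=1):
    """Rule of thumb from the text: one (rarely two) hidden layers,
    each with about twice as many neurons as there are variables."""
    return [n_variables] + [2 * n_variables] * n_hidden_layers + [1]

print(suggested_architecture(4))     # -> [4, 8, 1]
print(suggested_architecture(4, 2))  # -> [4, 8, 8, 1]
```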
Of course you can use the network for other purposes; the limitation is then that you need N (N less than 50) reals as input, and one real as output. I have personally also used it for secondary structure prediction, but that took a very large amount of CPU time.
Leave WHAT IF.
Restart WHAT IF.
Type:
neural exampl getset TRAIN.NEU netwrk 2 5 2.5 5.0 train 200 shoset grafic end scater grafic go

You now see the results graphically. You can rotate/translate it, etc. Click on CHAT, because it is now time to USE the net to predict values with the neural net.
Below you see the dataset with the answers given (per data point: the two input values and the expected output). The file without the answers is called TEST.NEU. So, type:
end getset TEST.NEU shoset

With `neural` you went to the neural network menu. The `exampl` command copied a training dataset, called TRAIN.NEU, that can be approximated with a non-linear function. With the `getset` command you read this dataset in. There are 30 data points. With `netwrk 2 5 2.5 5.0` you created a network architecture consisting of 2 hidden layers of 5 nodes each. WHAT IF will try to keep the values of the junctions between -2.5 and 2.5, but junctions outside the range -5.0 to 5.0 are forbidden. With `train` and `200` you told WHAT IF to do 200 rounds of network optimization. This will take a couple of minutes on an INDIGO workstation. You will see the error converge around a value of 0.20. That is a little bit bigger than the error that I put into this dataset (0.14). (Try more and wider hidden layers overnight, and you will see that the error can get smaller. This is called over-training: the network learns the data by heart, rather than extracting the hidden correlations.)

The `shoset` command gives two sets of output: the first half shows the input values, the observed results, the calculated results, and the error in the calculated results. The second half also displays the tolerance of the net (see below). The little excursion with `grafic` and `end` is needed to initialize the graphics window. The command `scater` (scatter, which is better English, is acceptable too) will make a scatter plot in which the data points are green and the calculated values red. The size of the cross is a measure of the error.

The second `shoset` command does the same as the first, but now the errors are of course irrelevant. You should just look at the calculated answers. The true answers are given below. If you were to take the trouble of calculating the RMS between the expected and calculated values in the test set (see the sketch after the data listing), you would probably find an RMS around 0.7. That nicely indicates one of the problems of neural nets: they are black boxes, very deep-black black boxes...
1.823 1.311 3.633
0.424 0.140 0.549
0.906 1.296 2.603
0.129 0.690 0.605
1.472 0.419 1.728
1.013 0.226 1.155
1.202 0.733 1.836
0.409 1.550 2.984
0.681 1.092 2.003
1.511 1.764 4.697
1.397 1.096 2.740
1.462 1.560 3.916
1.772 0.221 1.949
0.146 0.777 0.907
0.871 1.240 2.530
0.959 0.482 1.267
0.274 0.907 1.185
0.453 1.726 3.545
1.355 0.504 1.620
0.782 0.658 1.283
1.076 1.002 2.194
0.515 0.201 0.712
1.666 0.574 2.175
0.140 0.430 0.330
1.565 0.476 1.839
0.778 1.875 4.439
1.266 0.920 2.299
1.222 1.545 3.663
0.473 0.609 0.874
1.982 0.616 2.367
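If you want to compute that RMS yourself, a minimal sketch (assuming you pasted the listing above into answers.txt and the calculated outputs from `shoset` into calculated.txt; both file names are hypothetical):

```python
import numpy as np

# 30 rows of (input 1, input 2, expected output) from the listing above.
expected = np.loadtxt("answers.txt")[:, 2]

# The outputs calculated by the net, one value per line.
calculated = np.loadtxt("calculated.txt")

rms = np.sqrt(np.mean((expected - calculated) ** 2))
print(f"RMS over the test set: {rms:.2f}")   # the text suggests around 0.7
```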
If you have too many neurons in the network you will run into the over-training problem. That means that the network will not derive general rules, but rather will learn your data by heart. To circumvent this, you should not take too many training steps, or use fewer neurons, but there is no general recipe for determining the optimum. A good, but time-consuming, way of checking that you have the correct architecture and training length is the jack-knife method: take out the data points one after the other, train the network with all but this one data point, and for each training run determine, at the end, the error in the prediction of the output parameter (the binding constant in QSAR) for the one value that was removed from the data set.
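In pseudo-code the jack-knife loop looks like this (a sketch: `train` and `predict` stand in for the WHAT IF TRAIN and prediction steps):

```python
import numpy as np

def jackknife_rms(data, train, predict):
    """Leave-one-out estimate of the prediction error.

    data    : array of shape (n_points, n_inputs + 1), last column = output
    train   : function(training_rows) -> trained network
    predict : function(network, inputs) -> predicted output
    """
    errors = []
    for i in range(len(data)):
        rest = np.delete(data, i, axis=0)       # all points except point i
        net = train(rest)                       # retrain from scratch
        pred = predict(net, data[i, :-1])       # predict the left-out point
        errors.append(pred - data[i, -1])
    return np.sqrt(np.mean(np.square(errors)))  # RMS over the left-out points
```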
If there are data points that you trust more than others, you can make this clear to WHAT IF by putting those data points multiple times in the input data.
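For example, if your data set lives in a plain text file, duplicating the trusted rows before reading it in is all it takes (a sketch; file names and indices are made up):

```python
import numpy as np

data = np.loadtxt("mydata.neu")                     # hypothetical input file
trusted = [0, 7]                                    # rows you trust most
weighted = np.vstack([data] + [data[trusted]] * 2)  # they now count 3 times
np.savetxt("weighted.neu", weighted, fmt="%.3f")
```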
To start a training procedure you use the command TRAIN. You will be prompted for the number of rounds. I suggest that you start with one round to get an impression of the amount of CPU time needed. As WHAT IF trains the net incrementally, no training round will ever be lost.
Use the SHOSET command to see the progress of the training.
WARNING. Strange things will happen if the network architecture and the data set do not belong together!
The process of slowly decreasing the step size in Monte Carlo-like procedures, such as the one chosen to optimize the WHAT IF network, is often called simulated annealing.
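A minimal sketch of that idea (random alterations with a Metropolis-style acceptance and a slowly shrinking step size; this illustrates the principle and is not the actual WHAT IF code):

```python
import numpy as np

def anneal(weights, error_of, rounds=200, step=0.5, temp=1.0, cooling=0.99):
    """Monte Carlo optimization of the junction values.

    weights  : 1-D array of network junction values
    error_of : function(weights) -> training-set error
    """
    rng = np.random.default_rng()
    err = error_of(weights)
    for _ in range(rounds):
        trial = weights + step * rng.uniform(-1.0, 1.0, weights.shape)
        trial_err = error_of(trial)
        # Always accept improvements; sometimes accept worse moves while
        # the temperature is high, which is what allows escapes from
        # local minima toward the global minimum.
        if trial_err < err or rng.random() < np.exp((err - trial_err) / temp):
            weights, err = trial, trial_err
        step *= cooling   # slowly decrease the step size ...
        temp *= cooling   # ... and the acceptance temperature
    return weights, err
```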
A hard and a soft limit for the neurons will also be listed. During the training phase WHAT IF tries to keep the values for the neurons between plus and minus the soft limit. However, as soon as the absolute value of a neuron exceeds the hard boundary, a reset will be done for this neuron, even if this makes the whole performance worse. If this happens, you should increase the hard and soft boundaries. Be aware that the product of the width of the hidden layers and the hard limit should be of at least the same order of magnitude as the expected values at the output unit.
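In code, the hard-limit rule could look like this (a sketch of the behaviour described above; the text does not say what value a reset produces, so a fresh random value inside the soft range is assumed here):

```python
import numpy as np

def enforce_hard_limit(junctions, soft=2.5, hard=5.0):
    """Reset every junction whose absolute value exceeds the hard limit,
    even if that makes the overall performance worse."""
    rng = np.random.default_rng()
    j = np.array(junctions, dtype=float)
    over = np.abs(j) > hard
    j[over] = rng.uniform(-soft, soft, size=over.sum())  # assumed reset value
    return j

print(enforce_hard_limit([1.2, -6.3, 4.9]))  # only the -6.3 junction is reset
```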