Report Assignment 2 ANN

Background and intentions

This assignment is about perceptrons and the backpropagation algorithm. First there are two theoretical questions and then two implementations.

For the implementations I have used Java again. My programs will hopefully produce answers "online". Because of this I have elected not to show a lot of diagrams for different learning rates, momentum etc. If you want me to, I guess I could generate them from my program and process the data in MatLab or (hush) Excel.

I also want my programs to be "instructional" i.e. they should show how MLP works, and for instance how the hidden neurons contribute to the solution.

Task 1. Boolean functions

The question is how many of the 256 Boolean functions with three variables that are linearly separable (can be handled by a perceptron net without hidden units).


For 1 input there are 4 functions, all of them linearly separable.
For 2 inputs there are 16 functions. 14 are linearly separable. The other two are XOR and XNOR.
For 3 inputs I think there are 104 linearly separable functions.

To realize this I try to think like:
First we have 8 ways to pick one corner. Then 12 ways to pick two corners that are linearly separable (the edges). Three corners 24 ways (6 sides * 3 corners from 4) and four corners 6 ways (the sides). Then we can multiply this with 2 because of the "inverses". So 2*(8+12+24+6)=2*50=100. But then I almost forgot that I could pick 0 or 8 corners as well. So the final number is 100+4=104.

What happens when the number of inputs approaches infinity? Well I must at least make a guess. A good guess is normally a member of the set {0, pi, sqrt(3)/2} ;-)) and since I don't believe in the latter two my guess is 0.

Seriously, I started by looking at the ratios:
4/4=1, 14/16=0.875 and 104/256=0.40625. This is of course no proof, but maybe it supports my guess a little...
For the next step I tried to find this problem in some book. Rojas discusses the topic in his book "Neural Networks- A systematic introduction" (Springer 1996) and says:

"Although there has been extensive research on linearly separable functions in recent years, no formula for expressing the number of linearly separable functions as a function of n, [number of inputs] has yet been found. However we will provide some upper bounds for this number..."

But if there is an upper bound to the number of linearly separable functions then the ratio should in fact be 0 - and my guess correct! 

Task 2. A manually constructed feed-forward network

This network should solve the 3-parity problem.

The idea is that the hidden units detect one, two and three 1's respectively. The odd numbers (one and three) are then positive and the even number (two) is negative to the output unit. (This network uses binary inputs.)

Task 3

I should implement the backpropagation algorithm and apply it to the 3-parity problem. My program uses all combinations to train from. After the training it normally classifies each combination correctly.

Results

The resulting weights from the hidden layer to the output, sometimes behaves like my manually constructed net. The difference is only the order or a factor. The net uses many epochs to converge. This is due to that fact that I require a MSE less than 0.01. It is interesting to note that the net (normally) is very certain that 000 should not have a high output. It is also very good at determining the mixed patterns (001, 010, 011, 100, 101, 110). If it makes a mistake it is almost always for 111. Even when categorized correctly this pattern has a much smaller value than the other "high" patterns. So, the net is really counting the 1's.

Instruction for the applet

When you start the applet you will see a picture of the net. The weights are the numbers inside the squares and to the right of the rightmost node. The thresholds are the numbers inside the nodes. The color of the weight shows you to which hidden node it belongs. The start button will randomly initialize the weights and start training. After the training is complete the resulting weights are shown and you can use the scrollbar to cycle through the different patterns (displayed in white). The result (output) is then shown to the right.

Run the applet

Task 4

Here I use backpropagation to classify bitstrings. The length of the strings is 10 bits. The program uses a training set, a validation set and a test set. Each set is randomly generated and the program makes sure that a certain string only shows up in one of the sets. The size of these sets can be varied, as can the number of hidden units. My net may reject to classify a certain string. This is done for a result between 0.2 and 0.8.

Results

I get very similar results no matter the values for all parameters (within reasonable limits). My net classifies about 90% correctly. The difference between the fully trained net and the net that stopped its' training when the validation error is minimal is normally just that the latter rejects more and therefore miss-classifies fewer.

Not even the number of hidden units (if more than 3) affects the results a lot. I can't really explain this and therefore welcome any comments/explanations.

Instruction for the applet

When you start the applet you will see a blank screen and a bar where you can adjust the settings. The first three are the number of examples in the sets (training, validation and test). These must each be below 1000 and together below 1024. After that you may set the number of hidden units (between 1 and 10). E and A means learning rate and momentum respectively. The number of epochs determines how many times the training set is showed to the net. Finally there are two buttons to start and to leave the program.

When the program is started the number of completed epochs are shown on the screen. After completed training the results are displayed. The first is for complete training (the number of epochs set by the user) the second is for the weights with the lowest validation error during the training.

Run the applet