Fast Artificial Neural Network Library (FANN)
Reference Manual for latest CVS release
This section describes the current state of the implementation of the neural networks on the GPU using GLSL project. It includes both a shader code example and a small network implementation example. The code is still experimental and is not included in the FANN library, but is available for users to try out if they wish.
The idea of running neural networks on the GPU is to exploit the fact that many shader programs can run in parallel on the GPU. Since neural network computation consists largely of vector-matrix operations, the GPU may suit it well. The internal structure was designed with the MIMO structure in mind. One vector of neurons and a matrix of weights, together with an activation function, is called a Layer. A Layer produces an output and holds a pointer to an input vector of another Layer. The complete network is constructed of any number of such Layers with generic connections between them. When the network is constructed it must be traversed in the correct order of Layers, so that all inputs of Layer A are computed before the output of Layer A can be computed. The computation of an output is done by a call to a function: run(Layer * layer).
Each Layer struct holds all variables, such as the input vector and weight matrix, which are represented as textures. It also holds a shader program (fragment program) that is the core of the Layer. The textures are inputs to this program, and it renders an output to the output texture, which in turn is the input texture of another layer, and so on. It is NOT possible for the output and input of one layer to be the same. This should in theory allow for recurrent networks, with the drawback that immediate self-loops are not possible.
The values on the textures are 32-bit floating point variables, and they must be stored in a float vector to be copied onto the textures. For the weight matrices there is currently an issue that must be handled by the user: the width of the matrix must be a multiple of 4 (see the shader topic) even though the actual size can be arbitrary. E.g. if the size is 6x2 then the float vector must be

2 2 2 2 2 2 0 0
2 2 2 2 2 2 0 0

where the 2's are used in the shader while the 0's are not. The function CopyWeights(Mask)ToTexture handles this conversion automatically.
The implementation is mainly tested in a Windows environment using an NVIDIA card. This leaves a lot to be tested on other platforms and other cards. Also, due to texture size limitations, the maximum size of a vector is 4000 neurons.
To exploit the fact that the shader can perform operations on vectors of size 4 using one instruction, the values on a texture are stored in the rgba channels, giving 4 values per texel. Each shader can then calculate 4 values instead of 1. This is handled internally, but it has some minor drawbacks; e.g. layer offsets can only be multiples of 4.
Since both the input and output sizes as well as the offset are known when the network is created and the shader source is loaded, these values are added to the source code as #defines just before compilation. This allows the shader compiler to work with precalculated constants instead of performing the calculations at runtime. These defines should appear at the top of the shader source, but are actually inserted at the first occurrence of the ¤ character. This allows the user to put his own defines above this character when debugging with another compiler. Below is a shader example that computes the neuron potential and activates it using the sigmoid function.
To implement another activation function, simply modify the vector sum in the desired way. sum is a vec4 (vector of size 4) but can be treated as a single value; the operations will execute on all values in the vector.
vec4 sigmoid = (1.0/(1.0 + exp(-2.0 * sum))); // ACTIVATION FUNCTION
Tested: Windows XP, NVIDIA cards
Untested: Linux, Mac OS, ATI cards, other Windows versions except Vista
It does not work under Windows Vista due to poor OpenGL support. This might change in the future.
First, the OpenGL context must be initialized. This is done by the call
Then it is wise to test the system and check whether the library can run on it. test() returns a char* with info on what went wrong. If all is OK, it returns 0.
if ((error = test()) != 0)
    printf("%s\n", error); //print what went wrong
If the system passes the test it must be initialized, i.e. all external function pointers used are set up and other variables are given values. init() returns 1 if successful and 0 if it fails.
To create a standard 2-layer feed-forward network, 3 Layers are actually needed.
Each layer needs to know which shader program it shall run; the “sigmoid_sum.fp” program sums the weighted inputs and activates them using the sigmoid function. Also, no offsets are needed here.
layer *A, *B, *C;
Connect the layers in the desired order. Each layer gets an input vector on creation. This is used to forge the network into the desired shape.
setOutput(A, B); //sets the (initially empty) output vector pointer in A to point to the input of B
Then fill each layer's weight matrix with data. It is assumed that float arrays named weight_matrixA and weight_matrixB contain the weights.
//copy the weights to textures on the layers
Let the input to the net be stored in a float array named input, and copy the data to the input vector of layer A.
The net is executed by running the layers in the network in the correct order. This must be controlled by the user to allow for more complex structures, but in this example the order is simple.
To get the output back from the last layer (C), either use
that stores the output of layer C in a float array named output. Or use:
to print the output to stdout.
The following code creates a structure of layers (A-E are layers):

  B
 / \
A   D-E
 \ /
  C

A - 3 inputs, 36 outputs
B - 36 inputs, 16 outputs
C - 36 inputs, 22 outputs, offset by 16 to append to the 16 outputs of layer B
D - 38 inputs (the sum of B and C), 5 outputs
E - 5 inputs
Note that an arbitrary number of neurons can be placed in each layer (max 4000).