 Post subject: Vectorizing FANN
 Post Posted: Tue Mar 06, 2007 10:06 pm 
Site Admin

Joined: Tue Mar 06, 2007 10:03 pm
Posts: 134
Location: Corn Desert (IL, USA)
(Message originally posted to the sourceforge mailing list, copied here for completeness)

I haven't had time to work on this in the last week, but I still intend to. I thought I'd update people on what I've been thinking. First, I would like to create a test suite I can use for regression testing. I've added a method for setting the seed, so in the future people can run regression tests without having to define FANN_NO_SEED. That way you can test the same library you've compiled for use, which makes sense to me. I'm still working on this part.


Re: vectorizing while maintaining code readability (can I interest you in teraflops?):

The most difficult part of writing vector code is creating the proper memory structures to operate on. In the case of AltiVec, you can vectorize code automatically (and I have). But if you look at the result, the processor often spends much of its time translating to and from the correct memory structures.

A quick example: I vectorized some FORTRAN code and got a 1.5x speed improvement on AltiVec. Not terrible, but it should be better. So I looked at the code generated by the auto-vectorizer. The vector processor spent roughly *half* its time realigning data that wasn't 16-byte aligned in memory. The problem gets worse when you attempt to operate on a vector of values (like neuron sums, activations, or weights) that are stored in structures with unused data surrounding each value.

The conclusion here is that the most important part of creating vector code is creating the correct memory structure.
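To make the layout point concrete, here is a minimal sketch in C. It is not FANN's actual data structure: the names neuron_aos, layer_soa, round_up4, is_aligned16, alloc_aligned_floats, and layer_soa_init are hypothetical, and it assumes a C11 compiler (for aligned_alloc) and a 4-byte float. It contrasts an array-of-structs neuron with a struct-of-arrays layer whose arrays are padded to a multiple of four floats and 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Array of structs: each sum sits between unrelated fields, so four
 * consecutive sums are not contiguous and cannot be loaded as one vector. */
struct neuron_aos {
    float sum;
    float value;
    int   first_connection;
    int   last_connection;
};

/* Struct of arrays (hypothetical): every field is its own packed array. */
struct layer_soa {
    size_t count;   /* number of neurons, before padding */
    float *sums;    /* padded to a multiple of 4, 16-byte aligned */
    float *values;  /* padded to a multiple of 4, 16-byte aligned */
};

/* Round n up to the next multiple of 4 (one 4-float vector). */
size_t round_up4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Allocate a zeroed, 16-byte-aligned float array with room for n floats,
 * padded to whole vectors. Returns NULL for n == 0 or on failure. */
float *alloc_aligned_floats(size_t n)
{
    size_t bytes = round_up4(n) * sizeof(float);
    float *p;

    if (n == 0)
        return NULL;
    assert(bytes % 16 == 0);  /* aligned_alloc needs a multiple of the alignment */
    p = aligned_alloc(16, bytes);
    if (p != NULL)
        memset(p, 0, bytes);
    return p;
}

/* Returns 0 on success, -1 on allocation failure. */
int layer_soa_init(struct layer_soa *l, size_t count)
{
    l->count = count;
    l->sums = alloc_aligned_floats(count);
    l->values = alloc_aligned_floats(count);
    if (l->sums == NULL || l->values == NULL) {
        free(l->sums);
        free(l->values);
        l->sums = l->values = NULL;
        return -1;
    }
    return 0;
}
```

With this layout a vector unit can load sums[0..3] in one aligned access, and the zeroed padding lets a loop process whole vectors without a scalar tail.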

I've decided the best way to do this is to create macros for loading/setting weights and other neuron values. The macros used can be chosen at compile time along with the processing method used. Something like these:
fann_neuron_weight_store()
fann_neuron_weight_load()
fann_neuron_sum_store()
fann_neuron_sum_load()
...

They will be customized to use data structures that are optimized for fann_run(), so the utility functions that operate on the nets will be a bit slower, but fann_run() will be easier to tune. You've probably heard the 90%/10% rule: most programs spend 90% of their time in 10% of the code. These macros would be used in the other 90% of the code, where the processor spends the least amount of time, so that code stays maintainable. This leaves people free to create custom, fast solutions for fann_run().
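A rough sketch of what such accessor macros might look like, assuming a simple row-per-neuron weight array. The struct fann_layer_sketch, its fields, the macro argument lists, and layer_compute_sums are all made up for illustration; only the macro names follow the proposal above, and none of this is existing FANN API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical storage for one layer. A different build (AltiVec, SSE,
 * GPU) could swap this struct and the macros below at compile time. */
struct fann_layer_sketch {
    size_t num_neurons;
    size_t num_inputs;
    float *weights;  /* num_neurons * num_inputs values, one row per neuron */
    float *sums;     /* num_neurons values */
};

/* Scalar versions of the accessors: l = layer, n = neuron, i = input. */
#define fann_neuron_weight_load(l, n, i) \
    ((l)->weights[(n) * (l)->num_inputs + (i)])
#define fann_neuron_weight_store(l, n, i, v) \
    ((l)->weights[(n) * (l)->num_inputs + (i)] = (v))
#define fann_neuron_sum_load(l, n) \
    ((l)->sums[(n)])
#define fann_neuron_sum_store(l, n, v) \
    ((l)->sums[(n)] = (v))

/* A utility routine written only against the macros, so it keeps working
 * when the underlying layout changes. */
void layer_compute_sums(struct fann_layer_sketch *l, const float *inputs)
{
    size_t n, i;

    assert(l != NULL && inputs != NULL);
    for (n = 0; n < l->num_neurons; n++) {
        float sum = 0.0f;
        for (i = 0; i < l->num_inputs; i++)
            sum += fann_neuron_weight_load(l, n, i) * inputs[i];
        fann_neuron_sum_store(l, n, sum);
    }
}
```

Code outside fann_run() would only ever touch neurons through the macros, while a hand-tuned fann_run() could work on the raw arrays directly.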

Therefore, someone can create data structures that could be used with scalar, AltiVec, SSE, and even GPU processors. The possibility of using a GPU is particularly interesting, because it opens FANN up to operating on the scale of *teraflops* with consumer hardware. See also this recent slashdot story and associated links:
http://hardware.slashdot.org/article.pl ... 01/1519254

I've been doing a bunch of reading, and I believe that this is quite doable once these macros are in place (see also: http://www.gpgpu.org/ ). I would think that the ability to easily use massively parallel systems will open up new areas of ANN usage. (All this would probably also work with the Cell processor, but I'm not sure if that would be useful to users of libfann.)

To give people an idea of the speedups I'm talking about:
- Scalar
I expect this to run about the same speed as the original code. Packing neurons into a struct of arrays instead of an array of structs might speed the code up because of better cache usage. But using macros for neuron access in 90% of the code will slow things down a tad.

- AltiVec
At the least, it can operate on four floats at the same time, which allows up to a ~4x speedup. Lucky for PowerPC users, AltiVec also has hardware inverse, inverse exp(), inverse sqrt(), and some other useful functions. A patch for FANN v1.2 is optimized for AltiVec, and it was "between 5 and 20 (36 in one case!) times as fast". That's a 5x-20x speedup in real-world tests.

- SSE
SSE is similar to AltiVec, but is missing some hardware features (like the inverse exp()). I would guess that SSE would end up closer to a ~4x speedup.
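For a feel of the "four floats at a time" idea, here is a small sketch of a weighted sum written with SSE intrinsics from xmmintrin.h. The function name dot_product is hypothetical and this is not taken from FANN or the AltiVec patch. It uses unaligned loads so it works on any float array, and it falls back to plain scalar code when the compiler does not define __SSE__.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Sum of a[i] * b[i] for i in [0, n), e.g. weights times inputs for one neuron. */
float dot_product(const float *a, const float *b, size_t n)
{
    size_t i = 0;
    float total = 0.0f;

    assert(a != NULL && b != NULL);
#if defined(__SSE__)
    {
        __m128 acc = _mm_setzero_ps();
        float lanes[4];

        /* Four multiply-adds per iteration; unaligned loads are safe but
         * slower than aligned ones on older hardware. */
        for (; i + 4 <= n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);
            __m128 vb = _mm_loadu_ps(b + i);
            acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
        }
        /* Horizontal add of the four partial sums. */
        _mm_storeu_ps(lanes, acc);
        total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    }
#endif
    /* Scalar tail (or the whole loop when SSE is unavailable). */
    for (; i < n; i++)
        total += a[i] * b[i];
    return total;
}
```

If the arrays were padded and 16-byte aligned, the loads could become _mm_load_ps and the scalar tail would disappear, which is the memory-layout argument made earlier in this post.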

- GPU
This is where things really get interesting, and more so in the near future. Most modern GPUs have a texture processor that can do calculations using floating point numbers. They operate on vectors of 4 floating point values, and they can operate on many vectors simultaneously (8-32 pipelines?). As with all vector processors, their greatest speedup will occur with many interconnects between layers.

Imagine you want to use a 32x32 pixel display as an input to your net. That's 1024 input neurons. Two fully connected 1024 neuron layers will have roughly 1.05 million (1,048,576) interconnects (weights). As currently programmed, it would take that many passes to calculate the resulting sums. A GPU would treat the weights as a 1024x256 pixel texture (four floats per pixel) and the neuron sums as a 256x1 pixel texture. It could operate on 128 weights at a time in floating point, resulting in a 128x speedup (theoretically).

The same number of interconnects is used if the input is 64x64 pixels and your hidden layer is 256 neurons, and roughly the same number if the input is ~74x74 RGB pixels and the hidden layer is 64 neurons. By my calculations that's enough processing power to simulate a fruit fly brain in realtime. ;)
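The layer-size arithmetic above is easy to check with a few lines of C. The helper names connection_count and passes_needed are made up for this sketch, and the 128-weights-per-pass figure is the post's own guess (32 pipelines times 4 floats), not a measured number.

```c
#include <assert.h>

/* Weights between two fully connected layers. */
unsigned long connection_count(unsigned long inputs, unsigned long outputs)
{
    return inputs * outputs;
}

/* Passes needed when `lanes` weights are processed per pass (rounded up). */
unsigned long passes_needed(unsigned long connections, unsigned long lanes)
{
    assert(lanes > 0);
    return (connections + lanes - 1) / lanes;
}
```

For example, connection_count(1024, 1024) and connection_count(64 * 64, 256) both give 1,048,576, connection_count(74 * 74 * 3, 64) gives 1,051,392, and passes_needed(1048576, 128) gives 8192 passes instead of about a million.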


Thoughts? Misunderstandings? Miscalculations? If I worked on getting the scalar macros working, is there a graphics programmer out there who would get it running on GPUs (OpenGL)? Papers on working with massive ANNs?

Apologies if I'm confused on the GPU details. IANAGP
(I Am Not A GPU Programmer)
~Seth


Appendix A:
Fruit fly brain calculations
100,000 neurons total
* 1024 connections per neuron
* 100 operations per connection calculation
* 100 neuron firing events per second
~= 1 teraflops
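
The product above can be checked the same way. This is only a sketch with made-up helper names (mul_checked, brain_ops_per_second); the inputs are the rough figures listed in this appendix, not measurements.

```c
#include <assert.h>
#include <limits.h>

/* Multiply with an overflow check, since the result exceeds 32 bits. */
unsigned long long mul_checked(unsigned long long a, unsigned long long b)
{
    assert(b == 0 || a <= ULLONG_MAX / b);
    return a * b;
}

/* neurons * connections per neuron * ops per connection * firings per second */
unsigned long long brain_ops_per_second(unsigned long long neurons,
                                        unsigned long long connections,
                                        unsigned long long ops_per_connection,
                                        unsigned long long firings_per_second)
{
    unsigned long long ops = mul_checked(neurons, connections);
    ops = mul_checked(ops, ops_per_connection);
    return mul_checked(ops, firings_per_second);
}
```

Calling brain_ops_per_second(100000, 1024, 100, 100) returns 1,024,000,000,000 operations per second, i.e. about 1.02 teraflops.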

http://en.wikipedia.org/wiki/List_of_an ... of_neurons


 Post subject: Google Summer of Code
 Post Posted: Tue Mar 06, 2007 10:22 pm 

Joined: Tue Mar 06, 2007 7:24 pm
Posts: 264
Location: Copenhagen, Denmark
I can see that mentors can sign up for Google Summer of Code now. I will make sure to put up an idea page and all that very soon, so that Vincenzo and perhaps others can sign up as students.

_________________
Steffen Nissen - http://facebook.com/profile.php?id=595485027
Project Administrator - Fast Artificial Neural Network Library (FANN)
http://leenissen.dk/fann/


 Post subject:
 Post Posted: Tue Mar 06, 2007 10:28 pm 
Site Admin

Joined: Tue Mar 06, 2007 10:03 pm
Posts: 134
Location: Corn Desert (IL, USA)
Is there anything that I should do to sign up as a (backup) mentor? Or is the majority of the work on you because you are the head of the libfann project?


 Post subject: No
 Post Posted: Tue Mar 06, 2007 10:30 pm 

Joined: Tue Mar 06, 2007 7:24 pm
Posts: 264
Location: Copenhagen, Denmark
It doesn't seem like it. The registration form is here:
http://code.google.com/soc/org_signup.html

 Post subject:
 Post Posted: Tue Mar 06, 2007 10:33 pm 

Joined: Tue Mar 06, 2007 7:24 pm
Posts: 264
Location: Copenhagen, Denmark
I can see that you will need to have a Google account, so I will need to know your Google account email.

 Post subject:
 Post Posted: Tue Mar 06, 2007 10:51 pm 
Site Admin

Joined: Tue Mar 06, 2007 10:03 pm
Posts: 134
Location: Corn Desert (IL, USA)
My Google account name is "seth@pricepages.org"


 Post subject:
 Post Posted: Mon Mar 26, 2007 2:58 am 

Joined: Mon Mar 26, 2007 2:43 am
Posts: 2
Hello.

I'm an SoC prospective, and I'm interested in vectorizing FANN. Seth's research and post seem to have covered everything, which unfortunately makes it difficult for me to show that I've researched this on my own. I'm mainly interested in working with Intel's SSE (since AMD also supports SSE) and nVidia's CUDA (it'll be a reason to get a new GeForce 8).

If there's time, vectorization would naturally lead to parallelization, which could be a starting point for multi-threading.

I'm also wondering how much of a learning curve you would tolerate. I've manually optimized assembly in three classes (MIPS and x86), but have never used FANN and never taken any class on neural networks. Do you think I would be a suitable candidate?

Thanks.

Brian


 Post subject:
 Post Posted: Mon Mar 26, 2007 12:43 pm 

Joined: Tue Mar 06, 2007 7:24 pm
Posts: 264
Location: Copenhagen, Denmark
We have already half accepted an application for this particular project, but you are still welcome to send in an application.

You do, however, need to be very specific about the alterations that you plan to make in order to be considered.

 Post subject: Vectorizing FANN
 Post Posted: Thu Mar 29, 2007 9:02 pm 

Joined: Thu Mar 29, 2007 8:55 pm
Posts: 1
You might find it interesting to take a look at how the Quicknet3 multi-layer perceptron library is implemented. It does vectorized MLP training using ATLAS BLAS. It can also do multi-threaded training to gain a speedup from multiple core or hyperthreaded systems. The library can be downloaded at
http://www.icsi.berkeley.edu/Speech/qn.html


 Post subject:
 Post Posted: Thu Mar 29, 2007 9:49 pm 

Joined: Mon Mar 26, 2007 2:43 am
Posts: 2
Steffen Nissen wrote:
We have already half accepted an application for this particular project, but you are still welcome to send in an application.

You do, however, need to be very specific about the alterations that you plan to make in order to be considered.


I decided not to submit an application, if you haven't already noticed. Thanks for your quick reply earlier, and good luck selecting your mentees this year.

