Development of a Dynamically Extendable SpiNNaker Chip Computing Module

Development of a Dynamically ExtendableSpiNNaker Chip Computing Module

Rui Araujo, Nicolai Waniek, and Jorg Conradt

Technische Universitat Munchen, Neuroscientific System Theory,Karlstraße 45, 80333 Munchen, Germany

{rui.araujo,nicolai.waniek,conradt}@tum.de

http://www.nst.ei.tum.de

Abstract. The SpiNNaker neural computing project has created a hard-ware architecture capable of scaling up to a system with more thana million embedded cores, in order to simulate more than one billionspiking neurons in biological real time. The heart of this system is theSpiNNaker chip, a multi-processor System-on-Chip with a high level ofinterconnectivity between its processing units. Here we present a Dy-namically Extendable SpiNNaker Chip Computing Module that allowsa SpiNNaker machine to be deployed on small mobile robots. A non-neural application, the simulation of the movement of a flock of birds,was developed to demonstrate the general purpose capabilities of thisnew platform. The developed SpiNNaker machine allows the simulationof up to one million spiking neurons in real time with a single SpiNNakerchip and is scalable up to 256 computing nodes in its current state.

Keywords: mobile robotics, parallel computing module, spiking neuralnetworks simulations, SpiNNaker

1 Introduction

Abstract processes carried out in the biological brain are still one of the greatchallenges for computational neuroscience. Despite an increasing amount of ex-perimental data and knowledge of individual components, we have only insuf-ficient understanding of the operations of intermediate levels. However, thoseare believed to be the key elements in constructing thoughts and in processinginformation [5].

Large-scale neural network simulations will help increasing our knowledgeabout these intermediate levels. However, the huge amount of neurons in thehuman brain [6] and their connectivities [2] are difficult if not impossible tosimulate. General purpose digital hardware is ill suited to perform these simu-lations, as they are unable to cope with the massive parallelism carried out inthe brain. A possible approach is the usage of neuromorphic systems such asthe BrainScales [7] or the Neuro-Grid hardware [3] which emulate the neuralnetwork with a physical implementation of the individuals neurons.

The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-11179-7_103

1

https://www.researchgate.net/publication/220852624_Biologically-Inspired_Massively-Parallel_Architectures_-_Computing_Beyond_a_Million_Processors?el=1_x_8&enrichId=rgreq-14c487a4f9437e104705f96cb2bc1bc3-XXX&enrichSource=Y292ZXJQYWdlOzI2NzAyOTUwODtBUzoxODU4NzI1MDY5NTc4MjhAMTQyMTMyNjg2NzcxNw==

https://www.researchgate.net/publication/271320830_Total_Number_of_Synapses_in_the_Adult_Human_Neocortex?el=1_x_8&enrichId=rgreq-14c487a4f9437e104705f96cb2bc1bc3-XXX&enrichSource=Y292ZXJQYWdlOzI2NzAyOTUwODtBUzoxODU4NzI1MDY5NTc4MjhAMTQyMTMyNjg2NzcxNw==

https://www.researchgate.net/publication/252178274_Understanding_Brain_Development_in_Young_Children?el=1_x_8&enrichId=rgreq-14c487a4f9437e104705f96cb2bc1bc3-XXX&enrichSource=Y292ZXJQYWdlOzI2NzAyOTUwODtBUzoxODU4NzI1MDY5NTc4MjhAMTQyMTMyNjg2NzcxNw==

https://www.researchgate.net/publication/260869460_Silicon_Neurons_That_Compute?el=1_x_8&enrichId=rgreq-14c487a4f9437e104705f96cb2bc1bc3-XXX&enrichSource=Y292ZXJQYWdlOzI2NzAyOTUwODtBUzoxODU4NzI1MDY5NTc4MjhAMTQyMTMyNjg2NzcxNw==

2 Development of a SpiNNaker Chip Computing Module

Another possible approach is the SpiNNaker system, which is a massively-parallel computer architecture based on a multi-processor System-on-Chip (MP-SoC) technology that can scale up to a million cores. It is capable of simulatingup to a billion spiking neurons in biological real time with realistic levels ofinterconnectivity between the neurons.

The SpiNNaker system was motivated by the attempt to understand andstudy biological computing structures. Their high level of parallelism with fru-gal amounts of energy are opposed to traditional electronics designs, which weremostly driven by serial throughput just until recently. The biological approachto the design such many-cores architecture also brings new concerns in termsof fault-tolerance computation and efficiency. The SpiNNaker chip, the basicbuilding block of a SpiNNaker machine, relies on smaller processors than othermachines but in larger number, it has 18 highly efficient embedded ARM pro-cessors that allow the SpiNNaker system to be competitive according to twometrics, MIPS/mm2 and MIPS/W.

Fig. 1. The SpiNNaker Chip Computing Module Dataflow Architecture.

The currently available machines with SpiNNaker chips are relatively large,the minimum size at this moment is 105× 95mm, which limits their deploymenton systems with limited size as for example, small mobile robots, specially flyingones due to very strict weight and space constraints. Additionally the SpiNNakersystems currently require a workstation, usually a desktop or a laptop, connected


2

Development of a SpiNNaker Chip Computing Module 3

through an Ethernet connection to bootstrap the system every time it powerson and to feed the processing data into the system. This requirement seriouslylimits the independence and deployment capabilities of systems with embeddedSpiNNaker chips. At present, it necessary to add an wireless router in order tohave a mobile system with a SpiNNaker machine [4]. The drawbacks from thisapproach are fairly obvious, such as increased power consumption and spacerequirements since the typical wireless router consumes around 4 to 5 watts andeven though there are fairly small models available at the market it would stilltake up some space.

Furthermore the amount of extensibility provided by standard SpiNNakermachines is very limited since it only allows increases of computing power infixed amounts. The current single board SpiNNaker machines are available intwo versions, one with four chips and another with forty eight. These are wildlydifferent amounts of processing capability which make it difficult to create in-termediate solutions. It would be interesting to have the capability of selectinghow many SpiNNaker chips one needs to deploy without having to design newhardware.

The current requirements of the SpiNNaker architecture are not suitable for alot of applications where its processing power and capabilities would be helpful.It is then necessary to design a new solution that it can overcome the limitationsof the present options.

2 SpiNNaker Chip Computing Module

Traditional SpiNNaker systems rely on the host system to provide the input data.This limits the possibilities of interfacing the SpiNNaker chip with the externalenvironment. In order to overcome this restriction, we added a microcontrollerwhich is responsible for communicating with the outside world. Figure 1 presentsthe general architecture for the developed solution which involves: 1) the designof a PCB with a single SpiNNaker chip and a microcontroller, 2) the softwareto drive the microcontroller, and 3) an application that runs on the host systemto communicate with the board. We strived to keep the hardware as simple andsmall as possible to achieve one of our main goals: portability. Therefore, oursolution consists only of the necessary components.

Several requirements determined the usage of our system. For instance, back-wards compatibility with the standard tools used for the bigger SpiNNaker ma-chines, ybug and tubotron, was an important aspect that lead to the creationof the host system application. Having backwards compatibility allows usersof our novel system to simply apply their already established know-how aboutSpiNNaker and its tools.

Adding the microcontroller had various benefits. We could replace the Ether-net connection with a simpler, albeit slower, universal asynchronous receiver/transmitter(UART) at a baud rate of 12 MBps, which roughly translates into 1.2 Megabytes/seach way. Additionally, the microcontroller can boot the SpiNNaker chip withouta host system as it is capable of storing the previously transmitted image. The


3

https://www.researchgate.net/publication/278717782_Real-Time_Interface_Board_for_Closed-Loop_Robotic_Tasks_on_the_SpiNNaker_Neural_Computing_System?el=1_x_8&enrichId=rgreq-14c487a4f9437e104705f96cb2bc1bc3-XXX&enrichSource=Y292ZXJQYWdlOzI2NzAyOTUwODtBUzoxODU4NzI1MDY5NTc4MjhAMTQyMTMyNjg2NzcxNw==


Microcontroller

SpiNNaker ChipPrototype Debug I/O

SpiNNLink Port (optional)

Fig. 2. The SpiNNaker Chip Computing Module.

microcontroller and the application in the workstation act jointly to pretendthey are another SpiNNaker chip which is connected to the Ethernet, havingtheir own point to point (P2P) address and position in the grid. SpiNNaker ma-chines are a two dimensional grid of nodes, where a node is a SpiNNaker chip,each with its own P2P address, a 16 bit number.

2.1 Evaluation

We took measurements to characterize our computing module. The main ob-jective was to determine the input and output capacity in terms of SpiNNakerpackets since higher level protocols rely on these. Both cores ran at 180 MHzduring all tests. The methodology used for these procedures was as follows:

– change the source code to set and clear a GPIO pin around the action to bemeasured.

– use the switch present in the prototype board to trigger the transmission ofa packet.

– using an oscilloscope, measure the time taken between the changes in stateof the previously selected GPIO pin.

The results are compiled in Table 1. The first two lines represent the timetaken to transmit a packet, including the calculation of the parity bit and conse-quential addition to the header, while the following two lines represent the timetaken only during the symbol transmission. There is no such difference for theinput side since it is only reading symbols and placing them at the queue in thecorrect position.

It is possible to calculate the symbol transmission capacity with these results.Each symbol takes around 90 ns to be transmitted, which translates into a 11MSymbol transmission rate. The input capacity, around 43 % of the transmissionrate, lies at 4.78 MSymbols.


4


These numbers are fairly small when compared to the capacity on the SpiN-Naker side, which is around 62.4 MSymbols in both directions. This large dif-ference is explained by the fact the microcontroller implementation is entirelysoftware based as opposed to the SpiNNaker which is implemented in dedi-cated hardware. Furthermore, the difference between the transmission and thereception rate lies in the increased complexity when receiving symbols. Whereasreceiving includes a step to identify symbols, no such step is required when trans-mitting packets. Another analysis showed that the transmission and receptioncapacity exceeds the UART’s 12 MBps by far. This means that this communi-cation channel is currently the bottleneck.

The packet transmission rate resides at 455K packets per second for packetswith payload and 770K packets per second for packets without further payload.As for the packet reception rate, 435K packets per second can be received if theydo not include a payload, whereas the rate for packets with payload is around238K packets per second.

Action Time taken (µs)

Packet Transmission with payload 2.20Packet Transmission without payload 1.30Symbol transmission for packet with payload 1.80Symbol transmission for packet without payload 1.00Packet reception with payload 4.20Packet reception without payload 2.30

Table 1. Performance Measurements for the transmission and reception of SpiNNakerpackets.

3 Boids Model

We developed a case study for the SpiNNaker Chip Computing Module todemonstrate possible applications. SpiNNaker was obviously designed to sim-ulate spiking neural networks. however, we opted for a more general exampleand simulated the aggregate motion of a flock of birds while using distributedrules for each bird. The implemented model for this simulation is traditionallynamed the Boids model.

The Boids model is a distributed behavioral model [8] for flocks of flyingbirds or fish schools. It shares many characteristics with particle systems – largesets of individual items, each with its own behavior – but has several crucialdifferences. Traditional particle systems are usually used to model fire, smoke,or water. Each particle is created, ages, and finally dies, and is generally denotedby a point-like structure. In a boids simulation however, individual boids havea geometrical shape and, consequently, describe an orientation. Additionally,typical particles do not interact between themselves as opposed to e.g. a birdwhich must do so in order to flock appropriately.


5

https://www.researchgate.net/publication/2797343_Flocks_Herds_and_Schools_A_Distributed_Behavioral_Model?el=1_x_8&enrichId=rgreq-14c487a4f9437e104705f96cb2bc1bc3-XXX&enrichSource=Y292ZXJQYWdlOzI2NzAyOTUwODtBUzoxODU4NzI1MDY5NTc4MjhAMTQyMTMyNjg2NzcxNw==


Boids models are often referred to as a prime example of Artificial Life [1].They illustrate a variety of principles which appear in natural systems, such asemergence where complex behavior comes from the local interaction of simplerules. Another example is unpredictability. Although a bird usually does notbehave chaotically in a temporally local context, it is difficult or near impossibleto predict its behavior on a larger time scale.

A natural flock has certain behavioral rules that allows it to exist and survive.For instance, birds tend to avoid collisions with other birds but still stay close tothe flock in order to protect the whole flock against predators. We impose suchpossibly contradictory rules on our simulated boids to induce seemingly naturalflocking behaviors. The basic rules are:

– Collision Avoidance: avoid collisions with nearby flock members– Velocity Matching: attempt to match velocity with nearby flock members– Flock Centering: attempt to stay close to nearby flock members

Each behavior rule produces an acceleration which is a contribution to atunable weighted average. The relative strength of each rule will dictate thegeneral behavior of the flock. For example, if the flock centering behavior hasa very low impact then the flock will be very sparse while it will still follow acommon direction.

Since the model attempts to simulate the movement of birds, it must bebased on a semi-realistic model of flight. It does not need to take in considerationall physical forces like aerodynamic drag or even gravity but it must limit thevelocity and instantaneous accelerations to realistic values. These restrictionshelp modeling creatures with finite amounts of energy.

Fig. 3. A frame of the Boids Visualizer with 2176 birds.

3.1 Evaluation

We developed two distinct versions of the simulation. One incorporates the capa-bilities of our SpiNNaker Chip Computing Module to simulate the boids model


6

https://www.researchgate.net/publication/9035421_Artificial_Life_Organization_Adaptation_and_Complexity_from_the_Bottom_Up_TRENDS_Cognit_Sci?el=1_x_8&enrichId=rgreq-14c487a4f9437e104705f96cb2bc1bc3-XXX&enrichSource=Y292ZXJQYWdlOzI2NzAyOTUwODtBUzoxODU4NzI1MDY5NTc4MjhAMTQyMTMyNjg2NzcxNw==


which is described above, the other does not and executes the simulation ona standard desktop computer. Both implementations use a desktop computerto visualize the simulation results. The differences between the two simula-tions can be directly inspected with this approach. A screenshot of the simu-lation is shown in Figure 3 and video of the simulation can be found online athttps://www.youtube.com/watch?v=KiyhVRgxugY. The method used for theevaluation is as follows:

– select a number of birds for the simulation.– run the simulation on the SpiNNaker Chip Computing Module, and record

the frames per second (FPS) of the visualization.– run the simulation on the computer and, again, record the frames per second.

The results of this benchmark are compiled in Table 2. The central processingunit (CPU) of the computer used for running this simulation was an AMDPhenom II X4 945 running at 3 GHz.

The O(N2) complexity of the algorithm is clearly visible in the results. Theincrease of the number of birds leads to an ever increasing reduction of the framerate. The version that uses the SpiNNaker Chip Computing Module also suffersfrom a reduction in the frame rate due to the shear number of objects it hasto represent. The graphics code was not optimized to handle a large number ofbirds, but as it is the same code for both versions, the rendering time is negligible.The upper limit of 60 frames per second is due to the monitor’s internal refreshrate setting.

Clearly, the SpiNNaker Chip Computing Module is advantageous. The simu-lation delivers higher frame rates when using the module when compared to thedesktop version of the simulation.

Number of birds FPS on SCCM FPS on Desktop Computer

2176 60 604352 59 306528 43 15Table 2. Frames per second for the simulation with and without the SpiNNaker ChipComputing Module (SCCM).

4 Summary and Conclusions

We developed hardware and software for a novel module consisting of a singleSpiNNaker chip and an auxiliary microcontroller. The system can simulate hugeneural networks but still has a very low power consumption. In fact, its real-time I/O capabilities with limited power requirements make it highly suitablefor mobile power efficient systems. This point is even more emphasized by themodule’s small size. Furthermore, our module is designed to be easily extendable


7


by other SpiNNaker Chip Computing Modules or by other SpiNNaker systems.We demonstrated the capabilities of one single module using a boids simulation.The evaluation shows that our system is capable of simulating large numbersof distributed items. Hence, the module is ideal to simulate large networks ofartificial neurons.

We are currently working on a version which is even further reduced in size.It will thus be able to use the module on tiny robots, in neuroprosthetics, ormultiple connected modules to simulate even larger artificial neural networkswith only little space consumption.

Acknowledgments. This project would not have been possible without theprecious help from Steve Temple, Luis Plana and Francesco Gallupi from theUniversity of Manchester who supplied the SpiNNaker chips and provided es-sential guidance while studying the SpiNNaker system.

References

1. Bedau, M.A.: Artificial life: organization, adaptation and complexity from the bot-tom up. Trends in Cognitive Sciences 7(11), 505 – 512 (2003)

2. Brotherson, S.: Understanding Brain Development in Young Children (April 2009),http://www.ag.ndsu.edu/pubs/yf/famsci/fs609.pdf

3. Choudhary, S., Sloan, S., Fok, S., Neckar, A., Trautmann, E., Gao, P., Stewart,T., Eliasmith, C., Boahen, K.: Silicon neurons that compute. In: Villa, A., Duch,W., Erdi, P., Masulli, F., Palm, G. (eds.) Artificial Neural Networks and MachineLearning – ICANN 2012, Lecture Notes in Computer Science, vol. 7552, pp. 121–128.Springer Berlin Heidelberg (2012)

4. Denk, C., Llobet-Blandino, F., Galluppi, F., Plana, L., Furber, S., Conradt, J.: Real-time interface board for closed-loop robotic tasks on the spinnaker neural computingsystem. In: Mladenov, V., Koprinkova-Hristova, P., Palm, G., Villa, A.E., Appollini,B., Kasabov, N. (eds.) Artificial Neural Networks and Machine Learning – ICANN2013, Lecture Notes in Computer Science, vol. 8131, pp. 467–474. Springer BerlinHeidelberg (2013)

5. Furber, S., Brown, A.: Biologically-inspired massively-parallel architectures - com-puting beyond a million processors. In: Application of Concurrency to System De-sign, 2009. ACSD ’09. Ninth International Conference on. pp. 3–12 (2009)

6. Nguyen, T.: Total number of synapses in the adult human neocortex. Journal ofMathematical Modeling: One + Two 3(1) (2010)

7. Pfeil, T., Grubl, A., Jeltsch, S., Muller, E., Muller, P., Petrovici, M.A., Schmuker,M., Bruderle, D., Schemmel, J., Meier, K.: Six networks on a universal neuromorphiccomputing substrate. ArXiv e-prints (Oct 2012)

8. Reynolds, C.W.: Flocks, herds and schools: A distributed behavioral model. SIG-GRAPH Comput. Graph. 21(4), 25–34 (Aug 1987)


8























Development of a Dynamically Extendable SpiNNaker Chip Computing Module

Documents

Transcript of Development of a Dynamically Extendable SpiNNaker Chip Computing Module