What should the next-generation SpiNNaker system look like?
Protocol of the first meeting on 26.04.2016 at 10 pm
(thanks to Simon for taking notes)
The discussion covered numerous limitations of the current system that could be addressed in the next generation. They are grouped here according to the main sub-parts of the SpiNNaker system:
Processing
(processor cores + extensions -> implementing models)
- Floating-point calculations are required for certain models/applications (e.g. matrix multiplications/inversions, models that require ODE solvers)
- General agreement that we will be running more complex models in the future, perhaps with multi-compartment models, dendritic branches, etc. That would shift the communication/compute balance significantly towards the compute side.
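As a rough illustration of the kind of per-neuron arithmetic meant by the two points above (not an agreed model), the following minimal Python sketch does a forward-Euler update of a hypothetical two-compartment neuron; all names and constants are assumptions. The per-timestep work grows with the number of compartments, and ODE updates of this kind are far more convenient with native floating-point support than with the current fixed-point arithmetic.

    # Minimal sketch (illustration only, not an agreed model): a forward-Euler
    # update for a hypothetical two-compartment neuron. All names and constants
    # are assumptions; the point is that the per-neuron arithmetic grows with
    # the number of compartments and benefits from a native FPU.

    DT = 0.1          # ms, integration timestep
    TAU_M = 20.0      # ms, membrane time constant
    G_COUPLE = 0.05   # soma-dendrite coupling conductance (arbitrary units)
    E_REST = -65.0    # mV, resting potential

    def step_two_compartment(v_soma, v_dend, i_soma, i_dend):
        """One Euler step for a soma plus a single dendritic compartment."""
        coupling = G_COUPLE * (v_dend - v_soma)
        dv_soma = (-(v_soma - E_REST) / TAU_M + coupling + i_soma) * DT
        dv_dend = (-(v_dend - E_REST) / TAU_M - coupling + i_dend) * DT
        return v_soma + dv_soma, v_dend + dv_dend

    v_s, v_d = E_REST, E_REST
    for _ in range(1000):     # 100 ms of simulated time at 0.1 ms steps
        v_s, v_d = step_two_compartment(v_s, v_d, i_soma=0.5, i_dend=1.0)
    print(v_s, v_d)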
Memory
(internal/external -> state variables, synaptic matrix and parameters)
- Local memory could become a more critical constraint if complexity of models increases
- External memory access did not seem to be problematic (it may be masked by, or only become visible through, the communication limitations)
Communication
(internal and IO -> spike communication, interfacing to the outside world)
- Easier connectivity for AER sensors (to avoid needing an external FPGA; could the FPGA on the SpiNNaker board be employed?), and more generally for AER-compatible devices
- Direct WiFi connectivity may also be useful for independent systems (robots, drones, etc.); it would allow a bigger, stationary SpiNNaker system to control a mobile agent
- Data load times and data read times were felt to be a rather severe obstacle to using the current system
- Especially important for models with explicit connectivity (e.g. pyNN.FromListConnector) and for parameter sweeps; see the PyNN sketch after this list
- Improved bandwidth from/to host needed
- Spike packet drops are encountered in some benchmarks
- Better diagnostics would help to give more informative feedback (where, when, and which packets were dropped) to the user
- Longer routing keys in the SpiNNaker router could be beneficial for more targeted routing. They would also be required for interfacing with new/future retinas/AER chips with larger numbers of pixels/neurons: a current ATIS interface already uses all available bits in the routing key and would benefit from longer keys. Longer payloads might also help for certain applications.
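To illustrate the load-time point above: with explicit connectivity the complete connection list is generated on the host and has to be transferred to the machine. The following is a minimal PyNN sketch, assuming the sPyNNaker backend is importable as pyNN.spiNNaker; population sizes, weights and delays are made-up values.

    # Minimal PyNN sketch (assumes the sPyNNaker backend is available as
    # pyNN.spiNNaker; sizes, weights and delays are made-up values). With
    # FromListConnector the complete connection list is built on the host
    # and must be uploaded, which is where data-load time becomes the
    # bottleneck, especially when repeated for every run of a parameter sweep.
    import pyNN.spiNNaker as sim

    sim.setup(timestep=1.0)

    pre = sim.Population(1000, sim.IF_curr_exp())
    post = sim.Population(1000, sim.IF_curr_exp())

    # One explicit (pre, post, weight, delay) tuple per connection:
    # even this modest example is already one million entries to transfer.
    conn_list = [(i, j, 0.1, 1.0) for i in range(1000) for j in range(1000)]
    proj = sim.Projection(pre, post, sim.FromListConnector(conn_list),
                          synapse_type=sim.StaticSynapse())

    sim.run(100.0)
    sim.end()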
General
- Mobile systems (smaller than the 48-node board: one chip or a few chips in a compact package) would be quite useful, e.g. to fit on small robots
- Scalability: SpiNNaker is scalable in principle, but how can the constraints that apply to large-scale models be assessed? Communication bandwidth vs. the number of router entries seems to be a likely limiting factor (see the estimate sketched after this list)
- Do not only scale the network size, but also make models more detailed (dendritic branches, etc.)
- This could be investigated with two possible approaches: either a set of example networks, i.e. networks that users work with anyway, or dedicated synthetic tests that analyze the constraints in a more targeted fashion
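As a possible starting point for the scalability question, a back-of-envelope sketch in Python. The 1024-entry multicast routing table is taken as the assumed per-router limit, and the worst case assumes that no entries can be merged; the population count is a made-up example.

    # Back-of-envelope sketch for the scalability question above. Assumption
    # (to be checked): each SpiNNaker router has 1024 multicast routing
    # entries, and in the worst case every source population whose traffic
    # passes through a router needs its own entry (no merging, no default
    # routes).

    ROUTER_ENTRIES = 1024          # multicast table entries per router (assumption)
    populations = 5000             # hypothetical number of source populations

    # If a router in the middle of the machine has to distinguish all of them
    # individually, its table overflows by this factor:
    overflow = populations / ROUTER_ENTRIES
    print(f"router table overflow factor: {overflow:.1f}x")

    # Equivalently, keys would have to be assigned so that on average this
    # many populations share one routing entry:
    print(f"required average key merging: {overflow:.1f} populations per entry")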
Protocol of session on 29.04.2016 (Felix/Christian)
Network-on-chip improvements
- Secure packet transmission mode / avoid packet loss
- required for non-spike data (tbd; data that is more critical than, e.g., losing a couple of spikes)
- self-check features of links and higher NoC layers?
- Deadlock (FIFOs filling up) seems to be a large factor in current SpiNNaker spike packet loss; how can it be avoided?
- per-direction (directed) communication channels (since current SpiNNaker uses one event queue for both incoming and outgoing traffic, bidirectional communication can lead to deadlock)
- increase bandwidth/FIFO sizes
- Extend the routing-table key length for the increased system size of SpiNNaker 2
- Memory Address Mapping
Processor
- Double Precision floating point unit
- DMA/Memory Improvements (e.g. Read-Modify-Request)
- Memory Partition (shared SRAM access for more than one core)
- Embedding an FPGA-like structure as a configurable hardware accelerator
Configuration/setup
- Reduce configuration time (priority issue; see the load-time estimate after this list)
- increase external bandwidth
- implement on-SpiNNaker configuration (self-mapping, etc.)
- Online interactions/reconfiguration
- Protocols to implement this
- what needs to be reconfigured
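To make the configuration-time point concrete, a rough load-time estimate, assuming a single 100 Mbit/s Ethernet link into a board and an illustrative data volume; the real figures depend on the model and the tool chain.

    # Rough load-time estimate motivating the priority on configuration time.
    # Assumptions: data enters a board through a single 100 Mbit/s Ethernet
    # link (as on current boards), the volume is dominated by the synaptic
    # matrices, and protocol overhead is ignored. Numbers are illustrative only.

    LINK_MBIT_PER_S = 100.0

    def load_time_seconds(data_megabytes, link_mbit_per_s=LINK_MBIT_PER_S):
        """Time to push `data_megabytes` of configuration data through the link."""
        return data_megabytes * 8.0 / link_mbit_per_s

    # Example: 1 GB of synaptic data (made-up figure) over one 100 Mbit/s link.
    print(f"{load_time_seconds(1024):.0f} s")   # ~82 s, before any protocol overhead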
General
- Benchmark/Profiling/Debug Features
- Better detection of routing/execution errors (real-time violations)
Protocol of session on 03.05.2016 (Andreas/Sebastian)
- State-recording might require additional memory bandwidth and capacity
- 0.1 ms timestep --> UMAN to list the implications of that
- TUD evaluates HMC as a memory solution, with a focus on power reduction at low utilization (e.g. sleep modes of the SerDes transceivers)
- Portable systems might power down the HMC or use fewer than 4 links for power-saving reasons.
- Memory discussion ongoing with UMAN
- Hardware accelerators:
- e.g. exp, log, sqrt, logistic function … (TUD provides an initial list, to be discussed with UMAN; evaluate HW overhead); see the sketch after this list
- DMA that supports memory access of arrays
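For reference, a small Python sketch of where the listed functions typically appear in per-timestep updates; constants are made up. This is only meant to illustrate where hardware acceleration would be used, not to fix the accelerator list.

    # Sketch of where the proposed accelerator functions show up in typical
    # per-timestep updates (illustration only; constants are made up).
    import math

    DT = 0.1        # ms, timestep
    TAU_SYN = 5.0   # ms, synaptic time constant

    # exp: exponential decay of a synaptic current each timestep.
    def decay_synapse(i_syn):
        return i_syn * math.exp(-DT / TAU_SYN)

    # logistic function: e.g. a sigmoidal rate/transfer function.
    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    # sqrt: e.g. Euclidean distance for distance-dependent connectivity.
    def distance(x1, y1, x2, y2):
        return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

    print(decay_synapse(1.0), logistic(0.5), distance(0, 0, 3, 4))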