The group identified three distinct benchmarking classes:

1) Behavioural benchmarks for whole networks

2) Dynamical reproduction benchmarks for specific neuron and synapse models

3) Benchmarks for comparing against biological datasets

With respect to the first, it was quickly acknowledged that any metrics developed

will be highly task-specific. Most participants expressed general dissatisfaction

with present benchmarks, which tend to use ad hoc metrics open to mathematical

criticism. Two examples were presented of more formally justifiable metrics. The

first considers a synfire chain. The dynamics of the synfire chain can be

expressed in terms of initial spike activity vs. synchrony, in which there is

a clear separatrix between networks whose long-term dynamics will dissipate

(i.e. the synfire waves will disappear) and networks whose long-term dynamics

remain stable and convergent (i.e. the synfire waves propagate indefinitely with

increasing synchronisation). Since the network can be characterised in this way

it was proposed that the degree to which the implemented network reproduces this

theoretical separatrix could serve as a benchmark. (The 'degree of match' remains

something that must be formally defined.) The second example considers object

tracking. If one has a ball following a ballistic trajectory with air resistance

neglected (as can be achieved, for example, by simulating a ball on a computer

screen), object-tracking performance can be quantified as the degree of match,

in both time and position, to the trajectory. Again the physics of the problem

admits an analytic solution, against which the degree of match can be defined.
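As an illustration of the tracking case, the sketch below (a hypothetical helper, not anything agreed by the group) scores a tracker's position estimates against the closed-form ballistic trajectory; the function names and the RMS choice are assumptions for illustration:

```python
import math

def ballistic_position(t, x0, y0, vx, vy, g=9.81):
    """Analytic position of the ball at time t, neglecting air resistance."""
    return (x0 + vx * t, y0 + vy * t - 0.5 * g * t * t)

def tracking_error(estimates, x0, y0, vx, vy, g=9.81):
    """RMS distance between tracker estimates [(t, x, y), ...] and the
    analytic trajectory -- one possible 'degree of match' figure."""
    sq = 0.0
    for t, x, y in estimates:
        xr, yr = ballistic_position(t, x0, y0, vx, vy, g)
        sq += (x - xr) ** 2 + (y - yr) ** 2
    return math.sqrt(sq / len(estimates))
```

A perfect tracker scores zero; any systematic offset shows up directly as the RMS of that offset.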

One member considered the idea that separate benchmark figures on a variety of

tasks could be combined into a Quality of Service figure. However, most

participants noted that, since there is no way to normalise individual metrics

given the radically different nature of the tasks and measurements, such an

approach must be considered dubious. In the end the group indicated that in

essence reasonable metrics will be in large part a matter of good experimental

design. It must be considered essential that the expected behaviour can be

defined using some closed-form mathematical expression so that the actual network

can be compared relative to an absolute reference.
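For the synfire-chain example above, comparing implemented behaviour against a closed-form separatrix might be sketched as follows. Note that `sigma_crit` is a made-up placeholder, not the real synfire-chain boundary, and the agreement fraction is only one candidate 'degree of match':

```python
def sigma_crit(a):
    """Hypothetical closed-form separatrix: critical temporal spread below
    which a pulse packet of a spikes is predicted to propagate.
    Placeholder expression for illustration only."""
    return 0.1 * a

def predicted_stable(a, sigma):
    """Theoretical prediction: does a packet of a spikes with spread
    sigma propagate indefinitely?"""
    return sigma < sigma_crit(a)

def separatrix_match(runs):
    """Fraction of simulated runs [(a, sigma, survived), ...] whose
    observed outcome agrees with the theoretical prediction."""
    agree = sum(1 for a, sigma, survived in runs
                if predicted_stable(a, sigma) == survived)
    return agree / len(runs)
```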

The second type of benchmarks was felt to be the easiest, because hardware and

other systems are necessarily reproducing a model that can be defined

mathematically. Platforms can then be compared relative to a reference simulator

which is considered to give the definitive solution. There was some question as

to what the reference simulator should be: some participants had worked with

Mathematica to produce high-quality results, but the exact nature of the tests

for similarity remained to be confirmed. In general, however, the group was in consensus

that model matching could in principle be benchmarked in this way provided

representational precision was high enough in the reference simulator.
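As an illustration of such model-matching, one simple test of similarity (an assumption here, since the group left the exact tests open) is the root-mean-square deviation between a state trace from the platform under test and one from the high-precision reference, sampled at the same time points:

```python
import math

def trace_rmse(reference, platform):
    """Root-mean-square deviation between, e.g., membrane-potential traces
    from a reference simulator and from the platform under test, sampled
    at identical time points."""
    if len(reference) != len(platform):
        raise ValueError("traces must share the same sample points")
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, platform))
                     / len(reference))
```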

With the third type of metric the group found more difficulty. There are issues

related to the fact that there is no absolute reference for comparison - data is

simply data. Furthermore there are issues in the case of spike comparison with

respect to spike matching: over an interval in which the modelled network and

the data produce exactly the same number of spikes, some sort of pairing could

possibly be done, but once the spike counts diverge, identifying a given spike

with a given expected spike is much more problematic. Various sorts of

sliding-window comparisons could be made, but this introduces a significant ad hoc

component in the size and shape of the window: for example, all spikes could be

convolved with a Gaussian kernel, but what should the width and gain of the kernel

be? The outlook here was less definite. Various metrics were proposed, with the

overall idea that there ought to be some sort of distance metric between the

dataset and the model; what this distance metric should be remained the subject

of further work.
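One candidate of the kind discussed, sketched here purely for illustration: convolve each spike train with a Gaussian kernel and take the L2 distance between the filtered signals. The kernel width stays a free, ad hoc parameter, which is exactly the difficulty raised above:

```python
import math

def gaussian_filtered(spikes, times, width):
    """Convolve a spike train (a list of spike times) with a Gaussian
    kernel and sample the result at the given time points."""
    return [sum(math.exp(-((t - s) ** 2) / (2.0 * width ** 2))
                for s in spikes)
            for t in times]

def spike_distance(train_a, train_b, times, width):
    """L2 distance between the Gaussian-filtered versions of two spike
    trains -- one candidate distance metric; width is the ad hoc choice."""
    fa = gaussian_filtered(train_a, times, width)
    fb = gaussian_filtered(train_b, times, width)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fa, fb)))
```

Because the comparison is between smoothed signals rather than individual spikes, it needs no one-to-one spike pairing and degrades gracefully when spike counts differ.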