A Formal Mathematical Framework For Physiological Observations, Experiments and Analyses
1. Department of Biology, University of Leicester, University Road, Leicester LE1 7RH 2. School of Computer Science, University of Nottingham, Jubilee Campus, Nottingham NG8 1BB
Abstract
Experiments can be complex and produce large volumes of heterogeneous data, which makes their execution, analysis, independent replication and meta-analysis difficult. We propose a mathematical model for experimentation and analysis in physiology that addresses these problems. We show that experiments can be composed from time-dependent quantities, and be expressed as purely mathematical equations. Our structure for representing physiological observations can carry information of any type and therefore provides a precise ontology for a wide range of observations. Our framework is concise, allowing entire experiments to be defined unambiguously in a few equations. To demonstrate that our approach can be implemented, we show the equations we have used to run and analyse two non-trivial experiments describing visually stimulated neuronal responses and dynamic clamp of vertebrate neurons. Our ideas could provide a theoretical basis for developing new standards of data acquisition, analysis and communication in neurophysiology.
Introduction
Reproducibility and transparency are cornerstones of the scientific method. As scientific experiments and analyses become increasingly complex, rely more heavily on computer code and produce larger volumes of data, the feasibility of independent verification and replication has in practice been undermined. Primary data sharing is now standard in applications of high-throughput biology, but not in the many fields that produce heterogeneous observations [1,2]. Even when raw data and code used for experiments and analysis are fully disclosed, only a minority of findings can be reproduced without discrepancies [3-5]. It is often not realistic to verify that computer code used in analyses correctly implements an intended mathematical algorithm, yet errors can undermine a large body of work [6]. The combination of bias, human error and unverified software has led to the suggestion that many published research findings are flawed [7,8].
Some of these problems may be mitigated by developing explicit models of experimentation, evidence representation and analysis. A good model should simultaneously: (i) introduce a system of categorisation directly relevant to the scientific field, such that scientists can define experiments and reason about observations in familiar terms, (ii) be machine executable, therefore unambiguous and practically useful, and (iii) consist exclusively of terms that directly correspond to mathematical entities, enabling algebraic reasoning about and manipulation of procedures.
Experiments are difficult to formalise in terms of relations between mathematical objects because they produce heterogeneous data [2], and because they interact with the physical world. In creating a mathematical framework for experiments, we take advantage of progress in embedding input and output [9,10] into programming languages where the only mechanism of computation is the evaluation of mathematical functions [11]. Here, we define a formal framework for physiology that satisfies the above criteria.
We show that there is a large conceptual overlap between physiological experimentation and Functional Reactive Programming (FRP; [12,13]), a concise and purely functional formulation of time-dependent reactive computer programs. Consequently, physiological experiments can be concisely defined in the vocabulary of signals and events introduced by FRP. Such a language does not describe the physical components of biological organisms; it has no concept of networks, cells or proteins. Instead it describes the observation and calculation of the mathematical objects that constitute physiological evidence (observations).
This framework provides:
(i) An explicitly defined ontology of physiological observations. Physiological databases have not been widely adopted [14,15] despite many candidates [16-19]. This contrasts with bioinformatics and neuroanatomy, where databases are routinely used [20,21]. We suggest that a flexible, concise and simple structure for physiological quantities can remedy some of the shortcomings [1,15] of existing databases and thus facilitate the sharing of physiological data and metadata [22].
(ii) A concise language for describing complex experiments and analysis procedures in physiology using only mathematical equations. Experimental protocols can be communicated unambiguously, highlighting differences between studies and facilitating replication and meta-analysis. The provenance [23-25] of any observation can be extracted as a single equation that includes post-acquisition processing and censoring. In addition, analysis procedures in languages with a clear mathematical denotation are verifiable since their implementation closely follows their specification [26].
(iii) The theoretical basis for new tools that are practical, powerful and generalise to complex and multi-modal experiments. To demonstrate this, we have implemented our framework as a new programming language and used it for non-trivial neurophysiological experiments and data analyses.
A strength of our approach is that its individual elements could, alternatively, be adopted separately or in different ways to suit different demands.
Results
To introduce the calculus of physiological evidence (CoPE), we first define some terminology and basic concepts. We assume that time is global and is represented by a real number, as in classical physics. An experiment is an interaction between an observer and a number of organisms during a defined time period. An experiment consists of one or more trials: non-overlapping time periods during which the observer is running a program, that is, instructions for manipulating the environment and for constructing mathematical objects, the observations. The analyses are further programs to be run during or after the experiment that construct other mathematical objects pertaining to the experiment. In the sections that follow, we give precise definitions of these concepts using terms from programming language theory and type theory, while providing an introduction to the terms for a general audience.
What kinds of mathematical objects can be used as physiological evidence? We answer this question within simple type theory [27,28], which introduces an intuitive classification of mathematical objects by assigning to every object exactly one type. These types include base types, such as integers $\mathbb{Z}$, text strings String and the Boolean type Bool with the two values True and False, as well as the real numbers $\mathbb{R}$ (which can be approximated in a programming language with Float). These base types are familiar to users of most programming languages. In addition, modern type systems, including simple type theory, allow types to be arbitrarily combined in several ways. For instance, if $\alpha$ and $\beta$ are types, the type $\alpha \times \beta$ is the pair formed by one element of $\alpha$ and one of $\beta$; $[\alpha]$ is a list of $\alpha$s; and $\alpha \to \beta$ is the type of functions that calculate a value in the type $\beta$ from a value in $\alpha$. The ability to write flexible type schemata and generic functions containing type variables ($\alpha$, $\beta$, ...), which can later be substituted with any concrete type, is called parametric polymorphism [27] (templates and generics in the programming languages C++ and Java, respectively) and is essential to the simplicity and flexibility of CoPE.
We distinguish three type schemata in which physiological evidence can be values. These differ in the manner in which measurements appear in a temporal context, but all derive their flexibility from parametric polymorphism. Signals capture the notion of quantities that change in time. In physiology, observed time-varying quantities often represent scalar quantities, such as membrane voltages or muscle force, but there are also examples of non-scalar signals such as the two- or three-dimensional location of an animal or of a body part. Here, we generalise this notion such that for any type $\alpha$, a signal of $\alpha$ is defined as a function from time to a value in $\alpha$, written formally as:
$\mathrm{Signal}\ \alpha = \mathrm{Time} \to \alpha$
For instance, the output of a differential voltage amplifier might be captured in a Signal Float.
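The definition of a signal as a function from time to a value of any type can be illustrated outside CoPE. The following is a minimal Python sketch, not the CoPE implementation; the names `membrane_voltage`, `position` and `sample_at`, and all numerical values, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")

Time = float
# Signal α = Time → α: a signal is modelled as a plain function of time.
Signal = Callable[[Time], A]


def membrane_voltage(t: Time) -> float:
    """A scalar signal (Signal Float); the values are made up for illustration."""
    return -0.065 + 0.001 * t


def position(t: Time) -> Tuple[float, float]:
    """A non-scalar signal: a two-dimensional location (also made up)."""
    return (0.5 * t, 2.0)


def sample_at(s: "Signal[A]", t: Time) -> A:
    """Generic observation: works for a signal of any value type."""
    return s(t)


v = sample_at(membrane_voltage, 2.0)
xy = sample_at(position, 2.0)
```

The single generic function `sample_at` accepts both signals, which is the practical benefit of parametric polymorphism that the text describes.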
To model occurrences pertaining to specific instants in time, FRP defines events as a list of pairs of time points and values in a type $\alpha$, called the tags:
$\mathrm{Event}\ \alpha = [\mathrm{Time} \times \alpha]$
For example, an event can be constructed from a number-valued signal to represent the time of the largest amplitude of the signal, with that amplitude in the tag. Events that do not have a value of interest to associate with the time point at which they occurred can be tagged with the unit type (), which has only one element (that is, no information). Events can therefore represent measurements where the principal information is when something happened, or measurements that concern what happened.
A third kind of information describes the properties of whole time periods. We define a duration of type $\alpha$ as a list of pairs, of which the first component is a pair denoting a start time and an end time. The last component is again a value of any type $\alpha$:
$\mathrm{Duration}\ \alpha = [\mathrm{Time} \times \mathrm{Time} \times \alpha]$
Durations are useful for manipulating information about a whole trial or a single annotation of an entire experiment, but could also be observations in their own right, such as open times of individual ion channels, or periods in which activity of a system exceeds a set threshold (e.g. during bursts of action potentials). Lastly, durations could be used for information that spans multiple trials but not an entire experiment, for instance, the presence or absence of a drug.
Since signals, events and durations can be instantiated for any type, they form a simple but flexible framework for representing many physiological quantities. We show a list of such examples primarily drawn from neurophysiology in Table 1. A framework in any type system that does not support parametric polymorphism would have to represent these quantities fundamentally differently, thus removing the possibility of re-using common analysis procedures. Although parametric polymorphism is conceptually simple and the distinctions we are introducing are intuitive, common biomedical ontologies [29] cannot accommodate these definitions.
Calculating with signals and events
From direct observations, one often needs to process events and signals, create new events from signals, filter data and calculate statistics.
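As an illustration of this kind of processing, events and durations can be modelled as ordinary lists and manipulated with generic list operations. The sketch below is written in Python rather than CoPE; the helper names `during` and `count_during` and the example data are assumptions made for this illustration and are not taken from the CoPE library.

```python
from typing import List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")

Time = float
Event = List[Tuple[Time, A]]                    # Event α = [Time × α]
Duration = List[Tuple[Tuple[Time, Time], A]]    # Duration α = [Time × Time × α]

# An event tagged with the unit value (None stands in for ()).
spikes: "Event[None]" = [(0.12, None), (0.50, None), (0.53, None), (1.40, None)]
# A duration annotating a period of the experiment (hypothetical data).
drug: "Duration[str]" = [((0.4, 1.0), "drug present")]


def during(durs: "Duration[A]", evs: "Event[B]") -> "Event[B]":
    """Keep only the occurrences that fall inside one of the durations."""
    return [(t, tag) for (t, tag) in evs
            if any(t0 <= t <= t1 for ((t0, t1), _) in durs)]


def count_during(durs: "Duration[A]", evs: "Event[B]") -> "Duration[int]":
    """Summarise each duration by the number of occurrences it contains."""
    return [((t0, t1), sum(1 for (t, _) in evs if t0 <= t <= t1))
            for ((t0, t1), _) in durs]


spikes_in_drug = during(drug, spikes)
counts = count_during(drug, spikes)
```

Because both helpers ignore the concrete tag types, the same code applies to events and durations carrying any kind of value.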
Here, we formulate these transformations in terms of the lambda calculus [11], a family of formal languages for computation based solely on evaluating functions. These languages, unlike conventional programming languages, retain an important characteristic of mathematics: a term can freely be replaced by another term with identical meaning. This property (referential transparency; [30]) facilitates algebraic manipulation of and reasoning about programs [26]. The lambda calculus allows functions to be used as first-class entities: that is, they can be referenced by variables and passed as arguments to other functions. On the other hand, the lambda calculus disallows changing the value of variables or global states. These properties together mean that the lambda calculus combines verifiable correctness with a high level of abstraction, leading to programs that are in practice more concise [31] than those written in conventional programming languages. The lambda calculus, or variants thereof, has been used as a foundation for mathematics [32], classical [33] and quantum mechanics [34], evolutionary biochemistry [35], mechanized theorem provers [36,37] and functional programming languages [38].
In the lambda calculus, calculations are performed by function abstraction and application. λx → e denotes the function with argument x and body e, and f e the application of the function f to the expression e (more conventionally written f(e)). For instance, the function add2 = λx → x + 2, which can be written more conveniently as add2 x = x + 2, adds two to its argument; hence add2 3 = (λx → x + 2) 3 = 3 + 2 by substituting arguments in the function body.
We now present the concrete syntax of CoPE, in which we augment the lambda calculus with constructs to define and manipulate signals and events. This calculus borrows some concepts from earlier versions of FRP, but focuses exclusively on signals and events as mathematical objects and their relations. It does not have any control structures for describing sequences of system configurations, where a signal expression depends on the occurrence of events [12,13], although such constructs may be useful for simulations. As a result, CoPE is quite different from conventional FRP, which is also reflected in its implementation.
Let the construct {: e :} denote a signal with the value of the expression e at every time point, and let the construct <: s :> denote the current value of the signal s in the temporal context created by the surrounding {: ... :} braces.
For instance, {: 1 :} denotes the signal that always has the value 1; and the function smap defined as
smap f s = {: f <: s :> :}
transforms, for any two types $\alpha$ and $\beta$, the signal s of $\alpha$ into a signal of $\beta$ by applying the function f of type $\alpha \to \beta$ to the value of the signal at every time point. The differential operator D differentiates a real-valued signal with respect to time, such that D s denotes its first derivative and D D s the second derivative of the signal s. When the differential operator appears on the left-hand side of a definition, it introduces a differential equation (see Example 2 below).
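The constant signal, smap and the differential operator can be sketched with signals represented as Python functions of time. This is an illustration only: the central-difference approximation of D and its step size `h` are assumptions of this sketch and say nothing about how CoPE evaluates derivatives.

```python
import math
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
Time = float
Signal = Callable[[Time], A]


def const(x: A) -> "Signal[A]":
    """{: x :} -- the signal that has the value x at every time point."""
    return lambda t: x


def smap(f: Callable[[A], B], s: "Signal[A]") -> "Signal[B]":
    """smap f s = {: f <: s :> :} -- apply f to the signal value at every time."""
    return lambda t: f(s(t))


def D(s: "Signal[float]", h: float = 1e-5) -> "Signal[float]":
    """Numerical stand-in for D: central difference with step h (an assumption)."""
    return lambda t: (s(t + h) - s(t - h)) / (2.0 * h)


seconds: "Signal[float]" = lambda t: t   # time itself as a signal
sine = smap(math.sin, seconds)           # Signal Float
first = D(sine)                          # approximately cos t
second = D(D(sine))                      # approximately -sin t
```

Since smap is generic in both type variables, the same function can map a numeric signal to a signal of any other type, for example shapes or strings.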
Events and durations can be manipulated as lists. Thus, a large number of transformations can be defined with simple recursive equations including filters, folds and scans that are pivotal in functional programming languages [31]. In addition, we have added a special construct to detect events from existing signals. For instance, a threshold detector generates an occurrence of an event whenever the value of a signal crosses a specific level from below. Here, we generalise the threshold detector to an operator ?? that takes a predicate (i.e., a function of type $\alpha \to \mathrm{Bool}$), applies it to the instantaneous value of a signal, and generates an event whenever the predicate becomes true. For instance,
(λx → x > 5) ?? s
denotes the event that occurs whenever the value of the signal s starts to satisfy the predicate λx → x > 5; i.e., whenever it becomes greater than 5 after having been smaller. The expression (λx → x > 5) ?? s thus defines a threshold detector restricted to threshold crossings with a positive slope.
Table S1 in the supplementary information presents an informal overview of the syntax of CoPE; Table S2 details the types and names of some of the functions we have defined using these definitions.
Interacting with the physical world
In the previous examples, signals, events and durations exist as purely mathematical objects. To describe experiments, however, it must also be possible to observe particular values from real-world systems, and to create controlled stimuli to perturb these systems. For this purpose, we introduce sources and sinks that act as a bridge between purely mathematical equations and the physical world. A source is an input port through which the value of some external quantity can be observed during the course of an experiment by binding it to a variable. If the quantity is time-varying, the bound variable will denote a signal. For instance, binding a variable to a source denoting a typical analog-to-digital converter yields a signal of real numbers.
However, a source may also refer to a time-invariant quantity. The construct
identifier <* source
binds the value or signal resulting from the observation of the source during the course of an experiment to the variable identifier. For a concrete example, the following code defines a simple experiment:
v <* ADC 0 (kHz 20)
where kHz x = 1000 · x. This describes the observation of the voltage signal on channel 0 of an analog-to-digital converter at 20 kHz, binding the whole signal to the variable v. We have also used sources to sample values from probability distributions (see Supplementary Information).
In addition to making appropriate observations, an experiment may also involve a perturbation of the experimental preparation. For example, the manipulation could control the amount of electric current injected into a cell. Alternatively, non-numeric signals are used below to generate visual stimuli on a computer screen. Such manipulations require the opposite of a source, a sink: an output port connected to a physical device capable of effecting the desired perturbation. The value at the output at any point in time during an experiment is defined by connecting the corresponding sink to a signal. This is done through the following construct, mirroring the source construct introduced above:
signal *> sink
As a concrete example, suppose we wish to output a sinusoidal stimulus. We first construct a time-varying signal describing the desired shape of the stimulus. In this case, we read a clock source that yields a signal counting the number of seconds since the experiment started:
seconds <* clock
The sine wave can now be defined as:
sineWave = {: A · sin (f · <: seconds :> + p) :}
where A, f and p are Float-valued constants specifying the amplitude, frequency and phase, respectively. We then write
sineWave *> DAC 0 (kHz 20)
to send the sineWave signal to channel 0 of a digital-to-analog converter at 20 kHz.
We have implemented this calculus as a new programming language and used this software to define and run two detailed experiments in neurophysiology.
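The sine wave example can be mimicked without hardware by treating a source as something that yields a signal and a sink as something that samples a signal at a fixed rate. The sketch below is in Python; `clock` and `run_sink` are hypothetical stand-ins for the clock source and the DAC sink, and the amplitude, frequency and phase values are arbitrary.

```python
import math


def kHz(x: float) -> float:
    """kHz x = 1000 * x, as in the text."""
    return 1000.0 * x


def clock():
    """Stand-in for the clock source: seconds since the experiment started."""
    return lambda t: t


def run_sink(signal, rate_hz: float, t_end: float):
    """Stand-in for a sink: sample the signal at rate_hz from 0 up to t_end.

    A real sink would drive a device; here the samples are simply returned.
    """
    n = int(round(t_end * rate_hz))
    return [signal(i / rate_hz) for i in range(n)]


# Arbitrary illustrative constants: amplitude, angular frequency (50 Hz), phase.
A, f, p = 2.0, 2.0 * math.pi * 50.0, 0.0

seconds = clock()                                        # seconds <* clock
sine_wave = lambda t: A * math.sin(f * seconds(t) + p)   # the sineWave signal
samples = run_sink(sine_wave, kHz(20), 0.02)             # 20 ms at 20 kHz
```

Keeping the signal definition separate from the sampling step mirrors the separation in the text between the mathematical definition of a stimulus and the device that realises it.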
Example 1
In locusts, the Descending Contralateral Movement Detector (DCMD) neuron signals the approach of looming objects to a distributed nervous system [39]. We have constructed several experiments in CoPE to record the response of DCMD to visual stimuli that simulate objects approaching with different velocities. To generate these stimuli, we augmented CoPE with primitive three-dimensional geometric shapes. Let the expression cube l denote a cube centred on the origin, with side length l, translate (x, y, z) s the shape that results from translating the shape s by the vector (x, y, z) and colour (r, g, b) s the shape identical to s except with the colour intensity red r, green g and blue b. These primitives are sufficient for the experiments reported here. More complex stimuli can alternatively be defined in and loaded from external files; for instance, the source texture loads an image, such that
tx <* texture "image.tga"
loads the binary image in image.tga and creates a new function tx, which when applied to a shape returns a similar but appropriately textured shape. Thus, tx (cube 1) denotes a textured cube. Sources for loading complex polygons could be defined similarly.
Since signals in CoPE are polymorphic, they can carry not just numeric values but also shapes, so we represent visual stimuli as values in Signal Shape. The looming stimulus consists of a cube of side length l approaching a locust with constant velocity v. The time-varying distance from the locust to the cube in real-world coordinates is a real-valued signal:
distance = {: v · (<: seconds :> - 5) :}
The distance signal is the basis of the shape-valued signal loomingSquare representing the approaching square:
loomingSquare = {: colour (0, 0, 0) (translate (0, 0, <: distance :>) (cube l)) :}
loomingSquare differs from conventional protocols [40] for stimulating DCMD in that it describes an object that passes through the physical screen and the observer, and when displayed would thus disappear from the screen just before collision. In order not to evoke a large OFF response [41] at this point, the object is frozen in space as it reaches the plane of the surface onto which the animation is projected [42]. To achieve this effect, we define a new signal that has a lower bound of the distance from the eye to the visual display screen, zscreen:
distance′ = {: max zscreen <: distance :> :}
where max x y returns the larger of the two numbers x and y. loomingSquare′ is identical to loomingSquare except for the use of distance′. Finally, loomingSquare′ is connected to a screen signal sink that represents a visual display unit capable of projecting three-dimensional shapes onto a two-dimensional surface.
loomingSquare′ *> screen
In our experiments, the extracellular voltage from the locust nerve (connective), in which the DCMD forms the largest amplitude spike, was amplified, filtered (see methods and Listing 1 in Supplementary Information) and digitised:
voltage <* ADC 0 (kHz 20)
loomingSquare′ and voltage thus define a single object approach and the recording of the elicited response. This approach was repeated every 4 minutes, with different values of $l/|v|$. Figure 1 shows $l/|v|$ as values with type Duration Float, together with the distance′ and voltage signals for the first five trials of one experiment on a common time scale.
The simplest method for detecting spikes from a raw voltage trace is to search for threshold crossings, which works well in practice for calculating DCMD activity from recordings of the locust connectives [40]. (We have also implemented in CoPE a spike identification algorithm based on template matching; see supplementary information.) If the threshold voltage for spike detection is vth, the event spike can be calculated with
spike = tag () ((λv → v > vth) ?? voltage)
where tag replaces every tag in some event with a fixed value, so that spike has type Event (). The spike event detected by threshold crossing is displayed on the common time scale in Figure 1. The top row displays the spike rate histogram
Hspike = {: length (filter (between <: delay seconds :> <: seconds :> ∘ fst) spike) :}
for each trial. This definition exploits the list semantics of events by using the generic list-processing function filter, which takes as arguments a predicate p and a list xs, and returns the list of elements in xs for which the predicate holds. Here the predicate is fst (which returns the first element of a pair, here the occurrence time) composed (∘) with the function between = λx → λy → λz → z > x ∧ z < y, which tests whether the last of three numbers lies between the first two.
We examined how the DCMD spike response varied with changes in $l/|v|$, using the average of Hspike for three different values of $l/|v|$, and the number of spikes (length spike) and the largest value of Hspike for each approach, plotted against the value of $l/|v|$ [42]. The code that describes and executes these trials is given in the Supplementary
Information (Listing 1). This code includes a description, captured in CoPE variables and with appropriate temporal context, of the experimental context that is not machine-executable. This description is based on proposed standards for minimal information about electrophysiological experiments [43]. A table of correspondences between these standards and CoPE variables is given in Table S3.
This experiment demonstrates that the calculus of physiological evidence can adequately and concisely describe visual stimuli, spike recording and relevant analyses for activation of a locust looming detection circuit. To demonstrate the versatility of this framework, we next show that it can be used to implement dynamic clamp in an in vivo patch clamp recording experiment.
Example 2
Dynamic clamp experiments [44,45] permit the observation of real neuronal responses to added simulated ionic conductances; for instance, a synaptic conductance or an additional Hodgkin-Huxley type voltage-sensitive membrane conductance. A dynamic clamp experiment requires that the current injected into a cell is calculated at every timepoint based on the recorded membrane potential. Here, we use CoPE to investigate the effect of an A-type potassium conductance [46] on the response of a zebrafish spinal motor neuron to synaptic excitation.
The output current i is calculated at each time-step from the simulated conductance g and the measured membrane voltage Vm:
Vm <* ADC 0 (kHz 20)
i = {: (<: Vm :> - E) · <: g :> :}
i *> DAC 0 (kHz 20)
The experiment is thus characterised by the conductance signal g (for clarity, here we omit the amplifier-dependent input and output gains). In the simplest case, g is independent of Vm; for instance, when considering linear synaptic conductances [47]. We first consider the addition of a simulated fast excitatory synaptic conductance to a real neuron. Simple models of synapses approximate the conductance waveform with an alpha function [48]:
alphaf amp τ = {: amp · τ² · <: seconds :> · exp (-<: seconds :> · τ) :}
To simulate a barrage of synaptic input to a cell, this waveform is convolved with a simulated presynaptic spike train. The spike train itself is first bound from a source representing a random probability distribution, in this case a series of recurrent events of type Event () for which the inter-occurrence interval is Poisson distributed. Our standard library contains a function convolveSE which convolves an impulse response signal with a numerically-tagged event, such that the impulse response is multiplied by the tag before convolution.
preSpike <* poissonTrain rate
gsyn = convolveSE (alphaf amp τ) (tag 1 preSpike)
The signal gsyn could be used directly in a dynamic clamp experiment using the above template (i.e. g = gsyn). Here, we will examine other conductances that modulate the response of the cell to synaptic excitation.
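The construction of the synaptic conductance and of the injected current can be sketched numerically. The Python code below is an illustration under several assumptions that are not stated in the text: the alpha function is taken to be zero before its onset, the spike train is generated as a homogeneous Poisson process (exponentially distributed intervals), the parameter written `tau` is a rate-like constant, and all numerical values are arbitrary. The function names mirror, but are not, the CoPE functions.

```python
import math
import random


def alpha_f(amp: float, tau: float):
    """amp * tau^2 * t * exp(-t * tau); assumed zero for t < 0 (causal)."""
    return lambda t: amp * tau ** 2 * t * math.exp(-t * tau) if t >= 0 else 0.0


def poisson_train(rate: float, t_end: float, rng: random.Random):
    """Unit-tagged event with exponentially distributed intervals (assumption)."""
    t, out = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= t_end:
            return out
        out.append((t, None))


def tag(x, ev):
    """Replace every tag in an event with the fixed value x."""
    return [(t, x) for (t, _) in ev]


def convolve_se(impulse, ev):
    """Sum of impulse responses, each shifted to an occurrence and scaled by its tag."""
    return lambda t: sum(w * impulse(t - t0) for (t0, w) in ev)


def dynamic_clamp_current(vm, g, e_rev: float):
    """i = {: (<: Vm :> - E) * <: g :> :} for signals vm and g."""
    return lambda t: (vm(t) - e_rev) * g(t)


rng = random.Random(0)                            # fixed seed for repeatability
pre_spike = poisson_train(50.0, 1.0, rng)         # 50 events/s for 1 s
g_syn = convolve_se(alpha_f(1e-9, 500.0), tag(1.0, pre_spike))
vm = lambda t: -0.065                             # stand-in for a recorded voltage
i = dynamic_clamp_current(vm, g_syn, 0.0)
```

In a real dynamic clamp the voltage signal comes from the amplifier and the current must be computed within each sampling interval; this sketch only reproduces the algebra of the definitions.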
Both the subthreshold properties of a cell and its spiking rate can be regulated by active ionic conductances, which can also be examined with the dynamic clamp. In the Hodgkin-Huxley formalism for ion channels, the conductance depends on one or more state variables, for
which the forward and backward rate constants depend on the membrane voltage. We show the equations for the activation gate of an A-type potassium current ([46]; following [49], but using SI units and absolute voltages). The equations for inactivation are analogous (see Listing 2 in supplementary information). We write the forward and backward rates as functions of the membrane voltage
$$\alpha_a\, v = \frac{k_{\alpha a1}\,(v + k_{\alpha a2})}{1 - \exp\left(\frac{-k_{\alpha a2} - v}{k_{\alpha a3}}\right)}$$
$$\beta_a\, v = \ldots$$
with $k_{\alpha a1}$ = inverseVolts $2.0 \times 10^5$, $k_{\alpha a2}$ = volts 0.0469, $k_{\alpha a3} = k_{\beta a3}$ = volts 0.01, $k_{\beta a1}$ = inverseVolts $1.75 \times 10^5$, $k_{\beta a2}$ = volts 0.0199, and volts = inverseVolts = λx → x.
The time-varying state of the activation gate is given by a differential equation. We use the notation
D x = {: f (x, <: seconds :>) :}
to denote the ordinary differential equation that is conventionally written $\frac{dx}{dt} = f(x, t)$, with starting conditions explicitly assigned to the variable x0. The differential equation for the activation variable a is
D a = {: α_a <: Vm :> · (1 - <: a :>) - β_a <: Vm :> · <: a :> :}
a0 = 0
The inactivation state signal b is defined similarly. The current signal from this channel is calculated from Ohm's law:
iA = {: gA · <: a :> · <: b :> · (<: Vm :> - E) :}
This is added to the signal i defined above (with g = gsyn) to give the output current, thus completing the definition of this experiment:
{: <: i :> + <: iA :> :} *> DAC 0 (kHz 20)
Figure 3A and 3B show the voltage response to a unitary synaptic conductance and a train of synaptic inputs, respectively, with gA ranging from 0 to 100 nS. By varying the value of rate, we can examine the input-output relationship of the neuron by measuring the frequency of postsynaptic spikes. Spikes were detected from the first derivative of the Vm signal with
spike = tag () ((λv → v > vth) ?? D Vm)
and the spike frequency calculated with the frequencyDuring function. This relationship between the postsynaptic spike frequency and the simulated synaptic input rate is plotted in Figure 3C for four different values of gA. The code for Example 2 is given in the Supplementary Information (Listing 2).
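Spike detection from the derivative of a voltage signal, followed by a frequency calculation, can be sketched on a sampled time grid. The Python code below is a hedged illustration: `crossings` is a discrete-time stand-in for the ?? operator, `D` is a central-difference approximation, and `frequency_during` assumes that the frequency is the number of occurrences divided by the length of each duration, which may differ from the definition of frequencyDuring in the CoPE library. The sinusoidal test voltage and the threshold are arbitrary.

```python
import math


def D(s, h: float = 1e-5):
    """Central-difference stand-in for the differential operator D."""
    return lambda t: (s(t + h) - s(t - h)) / (2.0 * h)


def crossings(pred, s, t_start: float, t_end: float, dt: float):
    """Sampled stand-in for `pred ?? s`.

    An occurrence is emitted at the first sample where pred becomes True
    after having been False; the tag is the signal value at that sample.
    """
    events = []
    n = int(round((t_end - t_start) / dt))
    prev = pred(s(t_start))
    for k in range(1, n + 1):
        t = t_start + k * dt
        cur = pred(s(t))
        if cur and not prev:
            events.append((t, s(t)))
        prev = cur
    return events


def tag(x, ev):
    """Replace every tag in an event with the fixed value x."""
    return [(t, x) for (t, _) in ev]


def frequency_during(durs, ev):
    """Occurrences per second within each duration (assumed definition)."""
    return [((t0, t1), sum(1 for (t, _) in ev if t0 <= t < t1) / (t1 - t0))
            for ((t0, t1), _) in durs]


vm = lambda t: math.sin(2.0 * math.pi * 5.0 * t)   # arbitrary 5 Hz test signal
vth = 15.0                                         # threshold on dVm/dt
spike = tag(None, crossings(lambda v: v > vth, D(vm), 0.0, 1.0, 1e-4))
trial = [((0.0, 1.0), None)]
rates = frequency_during(trial, spike)
```

The time resolution of detected occurrences is limited by the sampling step `dt`, which is a property of this discretised sketch rather than of the mathematical definition of ??.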
Discussion
We present a new approach to performing and communicating experimental science. Our use of typed, functional and reactive programming overcomes at least two long-standing issues in bioinformatics: the need for a flexible ontology to share heterogeneous data from physiological experiments [15] and a language for describing experiments and data provenance unambiguously [23,50].
The types we have presented form a linguistic framework and an ontology for physiology. Thanks to the flexibility of parametric polymorphism, our ontology can form the basis for the interchange of physiological data and metadata without imposing unnecessary constraints on what can be shared. The ontology is non-hierarchical and would be difficult to formulate in the various existing semantic web ontology frameworks (Web Ontology Language, [29], or Resource Description Framework), which lack parametric polymorphism and functional abstraction. Nevertheless, by specifying the categories of mathematical objects that constitute evidence, it is an ontology in the classical sense of cataloguing the categories within a specific domain, and providing a vocabulary for that domain. We emphasise again that it is an ontology of evidence, not of the biological entities that give rise to this evidence. It is unusual as an ontology for scientific knowledge in being embedded in a computational framework, such that it can describe not only mathematical objects but also their transformations and observations.
Recent work on metadata representation [51] has focused on delineating the information needed to repeat an experiment [52,43], but in practice it is often not clear a priori what aspects of an experiment could influence its outcome. With CoPE, our main goal was to describe unambiguously the machine-executable aspects of the metadata of an experiment. Nevertheless, any information that can be captured in a type can be represented in the temporal contexts provided by CoPE. That is, it can exist as signals, events or durations, as we have demonstrated in the code listings in the Supplementary Materials. Here, we make no fundamental distinction between the representation of data and metadata. All relevant information about an experiment is indexed by time and thus linked by overlap on a common time scale.
Parametric polymorphism and first-class functions are generally associated with research-oriented languages such as Haskell and ML rather than mainstream programming languages such as C++ or Java. It is likely that much of CoPE could be implemented in C++, where template metaprogramming implements a static form of parametric polymorphism and template functors can be used to represent functions. These must all be resolved at compile-time, however, so it is difficult to use dynamically calculated or arbitrarily complex functions. The importance of true first-class functions and parametric polymorphism is becoming increasingly well recognized, and these features are now being implemented in the mainstream programming languages C++, C#, and Java. We therefore expect that it will soon be possible to implement the formalism we are proposing in a wide range of programming languages.
Our mathematical definitions are unambiguous and concise, unlike typical definitions written in natural language, and are more powerful than those specified by graphical user interfaces or in formal languages that lack a facility for defining abstractions. Our framework is not only a theoretical formalism, but we also demonstrate that it can be implemented as a very practical tool. This tool consists of a collection of computer programs for executing experiments and analyses, and carrying out data management. Controlling experimentation and analysis with programs that share data formats can be highly efficient, and eliminates many sources of human error. Existing tools use differential equations to define dynamic clamp experiments (Model Reference Current Injection (MRCI); [53]) or simulations (X-Windows Phase Plane (XPP); [54]). Here, we show that a general (polymorphic) definition of signals and events, embedded in the lambda calculus, can define a much larger range of experiments and the evidence that they produce.
Our experiment definitions have the further advantage that they compose; that is, more complex experiments can be formulated by joining together elementary building blocks. Our full approach is particularly relevant to the execution of very complex and multi-modal experiments, which may need to be dynamically reconfigured based on previous observations, or to disambiguate difficult judgements about evidence [55]. Even if used separately, however, individual aspects of CoPE can make distinct contributions to scientific methodology. For instance, our ontology for physiological evidence can be used within more conventional programming languages or web applications that facilitate data sharing. In a similar way, the capabilities of CoPE for executing and analysing experiments could provide a robust core for innovative graphical user interfaces.

We expect the formalism presented here to be applicable outside neurophysiology. Purely temporal information from other fields could be represented using signals, events and durations, or other kinds of temporal context formalised in type theory. In addition, the concepts of signals, events and durations could be generalised to allow not just temporal but also spatial or spatiotemporal contexts to be associated with specific values. Such a generalisation is necessary for CoPE to accommodate data from the wider neuroscience community, including functional neuroanatomy and microscopy, and other scientific disciplines that observe and manipulate spatiotemporal data.

We have argued that observational data, experimental protocols and analyses formulated in CoPE (or similar frameworks) are less ambiguous and more transparent than those described using many current formulations. The structure of CoPE presents additional opportunities for mechanically excluding some types of procedural errors in drawing inferences from experiments. For instance, CoPE could incorporate an extension to simple type theory [56] that adds not only a consistency check for dimensional units but also powerful aspects of dimensional analysis, such as Buckingham's π-theorem. Furthermore, in our formulation, experimentally observed values exist as mathematical objects within a computational framework that can be used to define probability distributions. Hierarchical probabilistic notation [57] permits the construction of flexible statistical models for the directly observed data. This means that CoPE could in principle be used to turn such probabilistic models into powerful data analysis tools accessible from within the CoPE formalism. Such data analysis procedures can largely be automated, for instance by calculating parameter estimates using statistical packages such as WinBUGS [58] or MLwiN [59].
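The consistency check for dimensional units mentioned above can be illustrated with a small Python sketch. It is only a run-time approximation of the idea: the type-theoretic extension in [56] would reject such errors statically, before a program runs. The class `Quantity`, the two-unit basis (metre, second) and the helper names are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import Tuple

# Exponents of (metre, second); a deliberately minimal, hypothetical basis.
Dim = Tuple[int, int]


@dataclass(frozen=True)
class Quantity:
    value: float
    dim: Dim

    def __add__(self, other: "Quantity") -> "Quantity":
        # Addition is only meaningful between quantities of the same dimension.
        if self.dim != other.dim:
            raise TypeError(f"dimension mismatch: {self.dim} vs {other.dim}")
        return Quantity(self.value + other.value, self.dim)

    def __truediv__(self, other: "Quantity") -> "Quantity":
        # Division subtracts the dimension exponents.
        dim = (self.dim[0] - other.dim[0], self.dim[1] - other.dim[1])
        return Quantity(self.value / other.value, dim)


def metres(x: float) -> Quantity:
    return Quantity(x, (1, 0))


def seconds(x: float) -> Quantity:
    return Quantity(x, (0, 1))


size = metres(0.298)
ratio = seconds(0.04)
speed = size / ratio            # dimension (1, -1), i.e. metres per second

try:
    size + ratio                # adding a length to a time is rejected
    mismatch_detected = False
except TypeError:
    mismatch_detected = True
```

A static version of this check would report the invalid addition when the experiment definition is compiled rather than when it is executed.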
Alternatively it would be possible to compile descriptions of probabilistic models in CoPE to the specification format used by the AutoBayes [60] system, which would then be used to generate efficient code for statistical inference. When run, this code would return data to CoPE. This analysis workflow could be integrated seamlessly into an implementation of CoPE such that both the hierarchical model structure and the returned parameters would be defined by and tagged with the appropriate temporal contexts. This methodology could be used to quantify different aspects of uncertainty in the measurements, taking into account all available information, while largely avoiding ad-hoc transformations of data. Integrating well-developed statistical tools with data acquisition and manipulation within CoPE would create a powerful platform for validating inferences drawn from physiological experiments.
Experimental Procedures
Language implementation
We have used two different implementation strategies for reasons of rapid development and execution efficiency. For purposes of experimentation and simulation, we have implemented a prototype compiler that can execute some programs that contain signals and events defined by mutual recursion, as is necessary for the experiments in this paper. The program is transformed by the compiler into a normal form that is translated to an imperative program which iteratively updates variables corresponding to signal values, with a time step that is set explicitly. The program is divided into a series of stages, where each stage consists of the signals and events defined by mutual recursion, subject to the constraints of input/output sources and sinks. This ensures that signal expressions can reference values of other signals at arbitrary time points (possibly in the future) as long as referenced signals are computed in an earlier stage. To calculate a new value from existing observations after data acquisition, we have implemented the calculus of physiological evidence as a domain-specific language embedded in the purely functional programming language Haskell. For hard real-time dynamic-clamp experiments, we have built a compiler back-end targeting the LXRT (user-space) interface to the RTAI (Real-time application interface; http://rtai.org) extensions of the Linux kernel, and the Comedi (http://comedi.org) interface to data acquisition hardware. Geometric shapes were rendered using OpenGL (http://opengl.org). All code used for experiments, data analysis and generating figures is available at http://github.com/glutamate/bugpan under the GNU General Public License (GPL).

Locust experiments
Locusts were maintained at 1,600 m⁻³ in 50 × 50 × 50 cm cages under a standard light and temperature regime of 12 h light at 36 °C : 12 h dark at 25 °C. They were fed ad lib with fresh wheat seedlings and bran flakes.
Recordings from locust DCMD neurons were performed as previously described [61]. Briefly, locusts were fixed in plasticine with the ventral side upwards. The head was fixed with wax and the connectives were exposed through an incision in the soft tissue of the neck. A pair of silver wire hook electrodes were placed underneath a connective and the electrodes and connective enclosed in petroleum jelly. The electrode signal was amplified 1000x and bandpass filtered 50–5000 Hz, before analog-to-digital conversion at 18 bits and 20 kHz with a National Instruments PCI-6281 board. The locust was placed in front of a 22-inch CRT monitor running with a vertical refresh rate of 160 Hz. All aspects of the visual stimulus displayed on this monitor and of the analog-to-digital conversion performed by the National Instruments board were controlled by programs written in CoPE running on a single computer. The code for running the trials described in Example 1, including relevant metadata, is given in Listing 1 (Supplementary Materials).

Zebrafish experiments
Zebrafish were maintained according to established procedures [62] in approved tank facilities, in compliance with the Animals (Scientific Procedures) Act 1986 and according to University of Leicester guidelines. Intracellular patch-clamp recordings from motor neurons in the spinal cord of a 2-day old zebrafish embryo were performed as previously described [63]. We used a National Instruments PCI-6281 board to record the output from a BioLogic patch-clamp amplifier in current-clamp mode, filtered at 3 kHz and digitised at 10 kHz, with the output current calculated at the same rate by programs written in CoPE targeted to the RTAI backend (see above). The measured jitter for updating the output voltage was 6 µs and was similar to that measured with the RTAI latency test tool for the experiment control computer. The code for the zebrafish experiment trials, including relevant metadata, is given in Listing 2 (Supplementary Materials).

Author Contributions
T.N. designed and implemented CoPE, carried out the experiments and data analyses, and wrote the draft of the paper. H.N. contributed to the language design, helped clarify the semantics, and wrote several sections of the manuscript. T.M. contributed to the design of the experiments and the data analysis, and made extensive comments on drafts of the manuscript. All authors obtained grant funding to support this project as described in the acknowledgements.
Acknowledgements
We would like to thank Jonathan McDearmid for help with the zebrafish recordings and Angus Silver, Guy Billings, Antonia Hamilton, Nick Hartell and Rodrigo Quian Quiroga for critical comments on the manuscript. This work was funded by a Human Frontier Science Project fellowship to T.N., a Biotechnology and Biological Sciences Research Council grant to T.M. and T.N., a BBSRC Research Development Fellowship to T.M., and Engineering and Physical Sciences Research Council grants to H.N.
References
[1] Gardner, D., Abato, M., Knuth, K. H. & Robert, A. 2005 Neuroinformatics for Neurophysiology: The Role, Design, and Use of Databases. In Databasing the Brain: From Data to Knowledge (ed. Koslow, S. H. & Subramaniam, S.), pp. 47–68. Hoboken, NJ: Wiley.
[2] Tukey, J. W. 1962 The future of data analysis. Annals of Mathematical Statistics 33, 1–67.
[3] Ioannidis, J. P., Allison, D. B., Ball, C. A., Coulibaly, I., Cui, X., Culhane, A. C., Falchi, M., Furlanello, C., Game, L., Jurman, G., et al. 2008 Repeatability of published microarray gene expression analyses. Nature Genetics 41, 149–155. (DOI 10.1038/ng.295)
[4] Baggerly, K. A. & Coombes, K. R. 2009 Deriving chemosensitivity from cell lines: forensic bioinformatics and reproducible research in high-throughput biology. Annals of Applied Statistics 3, 1309–1334. (DOI 10.1214/09-AOAS291)
[5] McCullough, B. D. 2007 Got replicability? The Journal of Money, Credit and Banking archive. Econ Journal Watch 4, 326–337.
[6] Chang, G., Roth, C. B., Reyes, C. L., Pornillos, O., Chen, Y. & Chen, A. P. 2006 Retraction. Science 314, 1875. (DOI 10.1126/science.314.5807.1875b)
[7] Ioannidis, J. P. 2005 Why most published research findings are false. PLoS Medicine 2, e124. (DOI 10.1371/journal.pmed.0020124)
[8] Merali, Z. 2010 Computational science: error. Nature 467, 775–777. (DOI 10.1038/467775a)
[9] Peyton Jones, S. 2002 Tackling the Awkward Squad: monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell. In Engineering theories of software construction (ed. Hoare, T., Broy, M. & Steinbruggen, R.), pp. 47–96. Amsterdam: IOS Press.
[10] Wadler, P. 1995 Monads for Functional Programming. Lecture Notes In Computer Science 925, 24–52.
[11] Church, A. 1941 The Calculi of Lambda-Conversion. Princeton, NJ: Princeton University Press.
[12] Elliott, C. & Hudak, P. 1997 Functional reactive animation. International Conference on Functional Programming 32, 263–273.
[13] Nilsson, H., Courtney, A. & Peterson, J. 2002 Functional reactive programming, continued. Proceedings of the 2002 ACM SIGPLAN workshop on Haskell, 51–64.
[14] Herz, A. V., Meier, R., Nawrot, M. P., Schiegel, W. & Zito, T. 2008 G-Node: an integrated tool-sharing platform to support cellular and systems neurophysiology in the age of global neuroinformatics. Neural Networks 21, 1070–1075. (DOI 10.1016/j.neunet.2008.05.011)
[15] Amari, S., Beltrame, F., Bjaalie, J. G., Dalkara, T., De Schutter, E., Egan, G. F., Goddard, N. H., Gonzalez, C., Grillner, S., Herz, A., et al. 2002 Neuroinformatics: the integration of shared databases and tools towards integrative neuroscience. J. Integrative Neuroscience 1, 117–128.
[16] Jessop, M., Weeks, M. & Austin, J. 2010 CARMEN: a practical approach to metadata management. Philosophical Transactions of the Royal Society London A 368, 4147–4159. (DOI 10.1098/rsta.2010.0147)
[17] Teeters, J. L., Harris, K. D., Millman, K. J., Olshausen, B. A. & Sommer, F. T. 2008 Data sharing for computational neuroscience. Neuroinformatics 6, 47–55. (DOI 10.1007/s12021-008-9009-y)
[18] Frishkoff, G., LePendu, P., Frank, R., Liu, H. & Dou, D. 2009 Development of Neural Electromagnetic Ontologies (NEMO): ontology-based tools for representation and integration of event-related brain potentials. Nature Precedings (DOI 10.1038/npre.2009.3458.1)
[19] Katz, P. S., Calin-Jageman, R., Dhawan, A., Frederick, C., Guo, S., Dissanayaka, R., Hiremath, N., Ma, W., Shen, X., Wang, H. C., et al. 2010 NeuronBank: a tool for cataloging neuronal circuitry. Frontiers in Systems Neuroscience 4, 9. (DOI 10.3389/fnsys.2010.00009)
[20] Rodriguez-Tome, P. 1996 The European Bioinformatics Institute (EBI) databases. Nucleic Acids Research 24, 6–12. (DOI 10.1093/nar/24.1.6)
[21] Ascoli, G. A., Donohue, D. E. & Halavi, M. 2007 NeuroMorpho.Org: a central resource for neuronal morphologies. J. Neuroscience 27, 9247–9251. (DOI 10.1523/JNEUROSCI.2055-07.2007)
[22] Insel, T. R., Volkow, N. D., Li, T., Battey, J. F. & Landis, S. C. 2003 Neuroscience networks: data-sharing in an information age. PLoS Biology 1, E17. (DOI 10.1371/journal.pbio.0000017)
[23] Pool, R. 2002 Bioinformatics: Converting Data to Knowledge, Workshop Summary. Washington, DC: National Academy Press.
[24] Mackenzie-Graham, A. J., Van Horn, J. D., Woods, R. P., Crawford, K. L. & Toga, A. W. 2008 Provenance in Neuroimaging. NeuroImage 42, 178–195. (DOI 10.1016/j.neuroimage.2008.04.186)
[25] Van Horn, J. D. & Toga, A. W. 2009 Is it time to re-prioritize neuroimaging databases and digital repositories? NeuroImage 47, 1720–1734. (DOI 10.1016/j.neuroimage.2009.03.086)
[26] Bird, R. & Moor, O. D. 1996 The Algebra of Programming. London: Prentice Hall.
[27] Pierce, B. C. 2002 Types and Programming Languages. Cambridge, MA: MIT Press.
[28] Hindley, J. R. 2008 Basic Simple Type Theory. Cambridge: Cambridge University Press.
[29] Bechhofer, S. 2007 OWL Web Ontology Language Reference. W3C Recommendation
[30] Whitehead, A. N. & Russell, B. 1927 Principia Mathematica. Cambridge: Cambridge University Press.
[31] Hughes, J. 1989 Why functional programming matters. The Computer Journal 32, 98.
[32] Martin-Löf, P. 1985 Intuitionistic Type Theory. Naples: Prometheus Books.
[33] Sussman, G. J. & Wisdom, J. 2001 Structure and Interpretation of Classical Mechanics. Cambridge, MA: The MIT Press.
[34] Karczmarczuk, J. 2003 Structure and interpretation of quantum mechanics: a functional framework. Proceedings of the 2003 ACM SIGPLAN workshop on Haskell, 50–61.
[35] Fontana, W. & Buss, L. W. 1994 What would be conserved if the tape were played twice? PNAS 91, 757–761.
[36] de Bruijn, N. G. 1968 The mathematical language AUTOMATH, its usage and some of its extensions. Lecture notes in Mathematics 125, 29–61.
[37] Harrison, J. 2009 Handbook of Practical Logic and Automated Reasoning. Cambridge: Cambridge University Press.
[38] McCarthy, J. 1960 Recursive functions of symbolic expressions and their computation by machine, Part I. Communications of the ACM 3, 184–195.
[39] Rind, F. C. & Simmons, P. J. 1992 Orthopteran DCMD neuron: a reevaluation of responses to moving objects. I. Selective responses to approaching objects. J. Neurophysiology 68, 1654–1666.
[40] Gabbiani, F., Mo, C. & Laurent, G. 2001 Invariance of angular threshold computation in a wide-field looming-sensitive neuron. J. Neuroscience 21, 314–329.
[41] O'Shea, M. & Rowell, C. H. 1976 The neuronal basis of a sensory analyser, the acridid movement detector system. II. Response decrement, convergence, and the nature of the excitatory afferents to the fan-like dendrites of the LGMD. J. Experimental Biology 65, 289–308.
[42] Hatsopoulos, N., Gabbiani, F. & Laurent, G. 1995 Elementary computation of object approach by wide-field visual neuron. Science 270, 1000–1003.
[43] Gibson, F., Overton, P. G., Smulders, T. V., Schultz, S. R., Eglen, S. J., Ingram, C. D., Panzeri, S., Bream, P., Sernagor, E., Cunningham, M., et al. 2008 Minimum Information about a neuroscience investigation (MINI) electrophysiology. Nature Precedings (DOI 10.1038/npre.2008.1720.1)
[44] Robinson, H. P. & Kawai, N. 1993 Injection of digitally synthesized synaptic conductance transients to measure the integrative properties of neurons. J. Neuroscience Methods 49, 157–65.
[45] Sharp, A. A., O'Neil, M. B., Abbott, L. F. & Marder, E. 1993 Dynamic clamp: computer-generated conductances in real neurons. J. Neurophysiology 69, 992–995.
[46] Connor, J. A. & Stevens, C. F. 1971 Voltage clamp studies of a transient outward membrane current in gastropod neural somata. J. Physiology 213, 21–30.
[47] Mitchell, S. J. & Silver, R. A. 2003 Shunting inhibition modulates neuronal gain during synaptic excitation. Neuron 38, 433–445.
[48] Carnevale, N. T. & Hines, M. L. 2006 The NEURON Book. Cambridge: Cambridge University Press.
[49] Traub, R. D., Wong, R. K., Miles, R. & Michelson, H. 1991 A model of a CA3 hippocampal pyramidal neuron incorporating voltage-clamp data on intrinsic conductances. J. Neurophysiology 66, 635–650.
[50] Murray-Rust, P. & Rzepa, H. S. 2002 Scientific publications in XML - towards a global knowledge base. Data Science 1, 84–98.
[51] Bower, M. R., Stead, M., Brinkmann, B. H., Dufendach, K. & Worrell, G. A. 2009 Metadata and annotations for multi-scale electrophysiological data. Conf Proc IEEE Eng Med Biol Soc 2009, 2811–2814. (DOI 10.1109/IEMBS.2009.5333570)
[52] Taylor, C. F., Paton, N. W., Lilley, K. S., Binz, P., Julian, R. K., Jones, A. R., Zhu, W., Apweiler, R., Aebersold, R., Deutsch, E. W., et al. 2007 The minimum information about a proteomics experiment (MIAPE). Nature Biotechnology 25, 887–893. (DOI 10.1038/nbt1329)
[53] Raikov, I., Preyer, A. & Butera, R. J. 2004 MRCI: a flexible real-time dynamic clamp system for electrophysiology experiments. J. Neuroscience Methods 132, 109–123.
[54] Ermentrout, B. 1987 Simulating, Analyzing, and Animating Dynamical Systems: A Guide to XPPAUT for Researchers and Students. Philadelphia, PA: Society for Industrial Mathematics.
[55] Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S. & Baker, C. I. 2009 Circular analysis in systems neuroscience: the dangers of double dipping. Nature Neuroscience 12, 535–540. (DOI 10.1038/nn.2303)
[56] Kennedy, A. J. 1997 Relational parametricity and units of measure. Proceedings of the 1997 Symposium on Principles of Programming Languages 25, 442–455.
[57] Gelman, A. & Hill, J. 2006 Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press.
[58] Gilks, W. R., Thomas, A. & Spiegelhalter, D. J. 1994 A language and program for complex Bayesian modelling. The Statistician 43, 169. (DOI 10.2307/2348941)
[59] Goldstein, H., Browne, W. & Rasbash, J. 2002 Multilevel modelling of medical data. Stat Med 21, 3291–3315. (DOI 10.1002/sim.1264)
[60] Fischer, B. & Schumann, J. 2003 AutoBayes: a system for generating data analysis programs from statistical models. J. Functional Programming 13, 483–508. (DOI 10.1017/S0956796802004562)
[61] Matheson, T., Rogers, S. M. & Krapp, H. G. 2004 Plasticity in the visual system is correlated with a change in lifestyle of solitarious and gregarious locusts. J. Neurophysiology 91, 1–12. (DOI 10.1152/jn.00795.2003)
[62] Westerfield, M. 1994 The Zebrafish Book: A Guide for the Laboratory Use of Zebrafish (Danio Rerio). Eugene, OR: Univ. of Oregon Press.
[63] McDearmid, J. R., Liao, M. & Drapeau, P. 2006 Glycine receptors regulate interneuron differentiation during spinal network development. PNAS 103, 9679–9684. (DOI 10.1073/pnas.0504871103)
Figure Legends
Figure 1. Diagram of an experiment to record the looming response from a locust DCMD neuron, showing the first five recorded trials from one animal. Experiment design: blue lines, simulated object size-to-approach speed ratio (l/|v|) for given approach trial; red lines, simulated object distance; red triangles, apparent collision time. Observed signal: black lines, recorded extracellular voltage. The largest amplitude deflections are DCMD spikes. Analysis: green dots, DCMD spikes, with randomly jittered vertical placement for display; thin black line, spike rate histogram with 50 ms bin size. The inter-trial interval of four minutes is not shown.

Figure 2. A, Spike rate histograms for approaches with l/|v| [...] ms bin size, with collision time indicated by a black triangle. B, Scatter plot of number of counted spikes against approach l/|v|. [...] of spiking against l/|v|.
Figure 3. A, recorded intracellular voltage following conductance injections of a unitary simulated synaptic conductance, in the presence of A-type potassium conductances of increasing magnitude (values given are for the maximal conductance gA). B, as A, but with a simulated presynaptic spike train generated by a Poisson process (here with a mean rate of 120 s⁻¹; the spike trains used to test the different levels of A-type conductance are identical). C, the postsynaptic spike rate plotted against the rate of simulated presynaptic inputs, with gA as in A.
Quantity                          Type
Voltage across the cell membrane  Signal Float
Ion concentration                 Signal Float
Animal location in 2D             Signal (Float × Float)
Action potential                  Event ()
Action potential waveforms        Event (Signal Float)
Spike detection threshold         Duration Float
Spike interval                    Duration ()
Synaptic potential amplitude      Event Float
Drug present                      Duration ()
Trial with parameter α            Duration α
Visual stimulus                   Signal Shape
Lab notebook                      Event String
[Figure 1. Traces of l/|v| (s), distance (m), voltage, spikes and spike frequency (s⁻¹); scale bar 2 s.]
[Figure 2. A, spike rate (s⁻¹) against time (s) for l/|v| = 0.04 s, 0.02 s and 0.01 s. B, spikes per trial against l/|v| (s). C, peak spike rate (s⁻¹) against l/|v| (s).]
[Figure 3. A and B, Vm (mV) for 0 nS, 10 nS, 40 nS and 100 nS. C, postsynaptic rate (s⁻¹) against presynaptic rate (s⁻¹).]
Supplementary Information
Expression             Denotes
\x -> e                The function that takes argument x and returns e
f x                    Apply the function (or function-value expression) f to the value x
x                      The value of the variable x
e :: t                 Annotation: the expression e has type t
let x = e in y         Define x as the value of e in the expression y
if p then e1 else e2   If p is True then yield e1; if p is False yield e2
(x, y)                 The pair (Cartesian product) of x and y
{: e :}                The signal whose value is given by expression e
<: s :>                The value of the signal s, in the temporal context of the surrounding {: ... :} brackets
D s                    The derivative of signal s. When used on the left-hand side of a definition, it introduces a differential equation
s_0                    The initial value of the signal s (can also be used on the left-hand side of a definition)
delay s                The signal s, delayed by a short time period
p ?? s                 Events that occur when the value of s satisfies the predicate p
x <* src               (Top-level only) Bind the value x to the observation of the source src
e *> snk               (Top-level only) Send the value e to the sink snk
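To make the event-forming construct `p ?? s` more concrete, the Python sketch below applies a predicate to a uniformly sampled signal. Several details are assumptions made for this example and not statements about CoPE: the signal is a list of samples with a fixed time step, an event is emitted only at the onset of each run of samples that satisfy the predicate, and each event is tagged with the sample value. The name `events_where` is hypothetical.

```python
from typing import Callable, List, Tuple


def events_where(pred: Callable[[float], bool],
                 samples: List[float],
                 dt: float,
                 t0: float = 0.0) -> List[Tuple[float, float]]:
    """Return (time, value) pairs at each onset of pred becoming true."""
    events = []
    previous = False
    for i, value in enumerate(samples):
        current = pred(value)
        if current and not previous:
            # Onset: the predicate was false at the previous sample.
            events.append((t0 + i * dt, value))
        previous = current
    return events


# Hypothetical samples; the threshold plays the role of a multiple of the noise SD.
samples = [0.0, 0.1, 2.0, 2.5, 0.2, -3.0, 0.0]
threshold = 1.5
putative = events_where(lambda v: v < -threshold or v > threshold, samples, dt=1.0)
```

With these inputs the two supra-threshold excursions yield one event each, at the first sample of each excursion.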
Function :: Type
    Description

adjustDur :: (Time × Time -> Time × Time) -> Duration α -> Duration α
    Apply a function to adjust the beginning and end of each duration occurrence
area :: Signal Float -> Event Float
    Calculate the centre of mass (time of the event) and area (tag of the event) of a signal
around :: Event α -> Signal β -> Signal β
    Align signal around event occurrences
baseline :: Float -> Float -> Signal Float -> Signal Float
    Subtract from a signal its mean value between two time points
before :: f α -> g β -> g β
    All Events/Durations/Signals (g) occurring before the first occurrence of an Event/Duration/Signal (f), with an analogous after
burst :: Float -> Event α -> Duration ()
    Durations when successive inter-event occurrence intervals are smaller than a set minimum
convolveSE :: Signal Float -> Event Float -> Signal Float
    Convolve a signal with an event
... :: Duration α -> f β -> f β
    Events/Durations/Signals (f) that lie within occurrences in a duration
... :: Duration α -> Event β -> Duration Float
    Count events in each occurrence
... :: Event α -> Event β -> Duration α
    Create a duration from start and stop events
... :: Event α -> Event Float
    Replace the tag of each occurrence with the time period to the next occurrence
... :: Float -> Event α -> Event α
    Delay each event occurrence by a fixed amount of time
... :: Signal α -> Event α
    Peak value of each signal segment
... :: Int -> Signal Float -> Signal Float
    Smooth a signal with the binomial filter
... :: β -> f α -> f β
    Change all tags of events or durations (f) to a fixed value
... :: (α -> Bool) -> f α -> f α
    Exclude events or durations (f) where the tag does not satisfy a predicate
Table S2. Examples of common operations in CoPE for generic manipulation of signals, events and durations.
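As a rough illustration of one operation from Table S2, the sketch below aligns a sampled signal around event occurrences, in the spirit of `around`. It is written in Python and is not the CoPE implementation: the sampled-list representation, the window given in numbers of samples, and the helper `mean_segment` are assumptions introduced for this example.

```python
from typing import Any, List, Tuple


def around(events: List[Tuple[float, Any]],
           samples: List[float],
           dt: float,
           before: int,
           after: int) -> List[List[float]]:
    """One segment per event, from `before` samples before it to `after` samples after it."""
    segments = []
    for t, _tag in events:
        centre = round(t / dt)
        lo, hi = centre - before, centre + after
        # Events too close to either end of the recording are skipped.
        if lo >= 0 and hi < len(samples):
            segments.append(samples[lo:hi + 1])
    return segments


def mean_segment(segments: List[List[float]]) -> List[float]:
    """Pointwise mean of equally long segments, e.g. to build a spike template."""
    return [sum(column) / len(column) for column in zip(*segments)]


# Hypothetical recording with two spike-like deflections and their event times.
samples = [0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 2.0, 6.0, 2.0, 0.0]
spikes = [(2.0, ()), (7.0, ())]

segments = around(spikes, samples, dt=1.0, before=1, after=1)
template = mean_segment(segments)
```

Each segment is centred on its event, so averaging them pointwise gives a mean waveform.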
    module Looming where
    lov = inverseSecs 4.0000e-2
    l = m 0.2980
    v = l/(lov*2)
    distance = {: (min (v*(<: seconds :> - 5))) (-0.1800) :}
    loomingSquare = {: colour (0,0,0) (translate (0, 0, <: distance :>) (cube l)) :}
    loomingSquare *> screen ""
    _tmax=secs 6
    _dt = secs 5.0e-5
    voltage <* ADC 0 (kHz 20)
    voltage *> store ""
    m l = l
    inverseSecs x = x
    kHz r = 1000*r
    secs t = t
    metaData x = [((0,tmax), x)]
    electrophysiologyType = metaData "Extracellular"
    genus = metaData "schistocerca"
    species = metaData "gregaria"
    morph = metaData "Gregarious"
    developmentalStage = metaData "Adult"
    locationStructure = metaData "Neck connectives"
    electrodeConfiguration = metaData "Bipolar Hook"
    targetCellType = metaData "FETi"
    recordingCondition = metaData "invivo awake"
    containingDevice = metaData "Room temperature"
    lowPassCut = metaData (kHz 5)
    highPassCut = metaData (Hz 50)
    amplifier = metaData "NeuroLog NL104"
    amplifierGain = metaData 1000

Listing 1. Unformatted code for experiment trials for Example 1, related to Figure 1 and 2.
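The stimulus geometry in Listing 1 can be checked with a short worked example. The Python sketch below evaluates the `distance` signal at a few time points using the constants from the listing. Reading the units as metres and seconds is an assumption based on the helper names `m` and `secs`; the function name `distance` mirrors the listing but the code is only a numerical illustration.

```python
# Constants copied from Listing 1 (units assumed: seconds and metres).
lov = 4.0e-2            # l/|v| ratio
size = 0.2980           # l in the listing
v = size / (lov * 2)    # approach speed, as in `v = l/(lov*2)`


def distance(t: float) -> float:
    """Distance signal of Listing 1: approach at speed v, clamped at -0.18."""
    return min(v * (t - 5.0), -0.18)


start = distance(0.0)    # far from the animal at trial start
at_five = distance(5.0)  # clamped value once the approach has ended
at_end = distance(6.0)   # remains clamped until _tmax

# Time at which the unclamped ramp reaches the clamp value.
t_clamp = 5.0 - 0.18 / v
```

Under these assumptions the object starts about 18.6 m away, approaches at about 3.7 m/s and stops 0.18 m away shortly before t = 5 s.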
    module DynamicClamp where
    gampa = nS 0.5
    rate = 50
    gmaxk = nS 10
    _tmax = seconds 2
    _dt = seconds 5.0e-5
    alpha tau t = if t<0.0 then 0.0 else (t/tau)*exp (1-t/tau)
    gsyn = {: gampa* (alpha 0.005 <: seconds:>) :}
    stage gsyn -1
    rawv, celli, vm, a, b, iA :: Signal Float
    preSpike :: [(Float, ())]
    preSpike <* poissonTrain rate
    rawv <* ADC 0 (kHz 20)
    vm = {: <: rawv:> * 0.10 :}
    gsyn = convolveSE gsyn (tag 1 (forget 0.1 preSpike))
    D a = {: alphaa <: vm:> * ( 1 - <:a:>) - betaa <:vm:> * <: a :> :}
    a_0 = 0.025
    D b = {: alphab <: vm:> * ( 1 - <:b:>) - betab <:vm:> * <: b :> :}
    b_0 = 0.9
    alphaa, betaa, alphab, betab :: Float -> Float
    alphaa v = kaa1*(v+kaa2)/((exp ((-kaa2 -v)/kaa3)) -1)
    betaa v = kba1*(v+kba2)/((exp ((v+kba2)/kba3))-1)
    alphab v = kab1*(v+kab2)/((exp ((-kab2 -v)/kab3)) -1)
    betab v = kbb1*(v+kbb2)/((exp ((v+kbb2)/kbb3))-1)
    kaa1 = inverseVolts (-2.0e5)
    kaa2 = volts 0.0469
    kaa3 = volts 0.01
    kba1 = inverseVolts 1.75e5
    kba2 = volts 0.0199
    kba3 = volts 0.01
    iA = {: gmaxk * <:a:> * <:b:>*(0.08+<:vm:>) :}
    iA_0 = 0
    celli = {: (0-<:vm:>) * <:gsyn:> - <: iA:> :}
    outv = {: <:celli:> * 1.0e9 :}
    outv *> DAC 0 (kHz 20)
    vm *> store ""
    metaData x = [((0,tmax), x)]
    electrophysiologyType = metaData "Intracellular"
    genus = metaData "danio"
    species = metaData "rerio"
    age = metaData "2 dpf"
    locationStructure = metaData "Spinal Cord"
    cellType = metaData "Spinal cord motoneuron"
    recordingCondition = metaData "invivo awake"
    containingDevice = metaData "Room temperature"
    electrodeConfiguration = metaData "Patch clamp"
    lowPassCut = metaData (kHz 3)
    amplifier = metaData "BioLogic RK400"

Listing 2. Code for the trials in Example 2, related to Figure 3.
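The gating equations in Listing 2 are differential equations of the form D a = α(1 - a) - β a, advanced with the explicit time step `_dt`. The Python sketch below shows how one such equation could be stepped. The forward Euler scheme is an assumption made for this example, since the text only states that signal variables are updated iteratively with a set time step. The constant rates `alpha` and `beta` are hypothetical stand-ins for the voltage-dependent rate functions of the listing.

```python
from typing import List


def euler_gate(alpha: float, beta: float, a0: float,
               dt: float, n_steps: int) -> List[float]:
    """Integrate da/dt = alpha*(1 - a) - beta*a with fixed-step forward Euler."""
    a = a0
    trace = [a]
    for _ in range(n_steps):
        da_dt = alpha * (1.0 - a) - beta * a
        a = a + dt * da_dt      # one explicit update per time step
        trace.append(a)
    return trace


# Initial value and time step from Listing 2; the rates (in 1/s) are hypothetical.
dt = 5.0e-5
n_steps = 40000                 # 2 s of simulated time, matching _tmax
trace = euler_gate(alpha=100.0, beta=300.0, a0=0.025, dt=dt, n_steps=n_steps)

steady_state = 100.0 / (100.0 + 300.0)   # analytical fixed point alpha/(alpha+beta)
```

With constant rates the variable relaxes to α/(α + β) with time constant 1/(α + β), which gives a simple check on the update loop.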
Metadata representation

Variable name            Type              MINI: Electrophysiology item
(implicit)                                 Date and time
experimenter             Duration String   Responsible person or role
experimentalContext      Duration String   Experimental Context
electrophysiologyType    Duration String   Electrophysiology Type
genus                    Duration String   Genus
species                  Duration String   Species
strain                   Duration String   Strain
cellLine                 Duration String   Cell line
geneticCharacteristics   Duration String   Genetic characteristics
clinicalInformation      Duration String   Clinical Information
sex                      Duration String   Sex
age                      Duration String   Age
developmentalStage       Duration String   Developmental Stage
subjectLabel             Duration String   Subject label
subjectIdentifier        Duration String   Subject identifier
subjectDetails           Duration String   Associated subject details
preparationProtocol      Duration String   Preparation protocol
locationStructure        Duration String   Location structure
brainArea                Duration String   Brain area
sliceThickness           Duration Float    Slice thickness
sliceOrientation         Duration String   Slice orientation
cellType                 Duration String   Cell type
behaviouralEvent         Duration String   Behavioural Event
behaviouralEquipment     Duration String   Behavioural Equipment
recordingCondition       Duration String   Recording Condition
containingDevice         Duration String   Containing device
solutions                Duration String   Solutions
flowSpeed                Duration Float    Solution flow speed
electrode                Duration String   Electrode
electrodeConfiguration   Duration String   Electrode configuration
electrodeImpedance       Duration Float    Electrode impedance
amplifier                Duration String   Amplifier
amplifierGain            Duration Float    Amplifier
filter                   Duration String   Filter
lowPassCut               Duration Float    Filter settings
highPassCut              Duration Float    Filter settings
recorder                 Duration String   Recorder
(implicit)                                 Data format
(implicit)                                 Sampling Rate
(implicit)                                 File Location
Table S3. Meta-data representation in CoPE variables based on Minimum Information about a Neuroscience Investigation (MINI): Electrophysiology [43]. Additional information, or information from other kinds of physiological experiments can be added as needed, and irrelevant, inappropriate or unknown variables can be left blank or unassigned. Information about the task, stimulus and time series data (Ref 43, sections 4,5 and 8) is represented by machine-executable equations in CoPE ((implicit) in Table).
Sources and Sinks
Sources and sinks link the functional equations with the physical world. We stress that they are not expressions, do not evaluate to values (unlike all other constructs), and can only be used at the top level.

What happens if the same source is observed more than once in a description of an experiment? If the source refers to a single, physical input port such as a channel of an analog-to-digital converter, the result will necessarily be the same, because the same entity is being observed within a single run of the experiment. Such sources are called idempotent. Idempotency ensures that separate experiments referring to a common external variable can be composed easily with a predictable outcome. However, there are other kinds of sources, notably random sources as discussed below, where idempotency is not desirable. Each occurrence of a non-idempotent source is thus a separate, independent source, even if the name of the source and the parameters happen to be the same.

What happens if the same sink is defined more than once? One could imagine combining the defining signals in various ways. For example, in the case of a simple numerical signal, they could simply be added, mirroring superposition of waves in the physical world. However, as our signals are more general, it is not always clear what the appropriate notion of addition should be. For example, if we have signals carrying images, and we wish to output these to a single graphical display, it is likely that we also need to describe aspects such as which one should be on top. Thus, for flexibility and clarity, combination of output signals has to be described explicitly, and it is an error to define a sink more than once in an experimental description.

There are also operations in experiments that are not related to real-world observation or to purely functional computation. One example is sampling from probability distributions. We have implemented sources corresponding to common parametrised probability distributions, such that experiments can sample values from the distributions and use these values in computations or connect them to sinks. However, these sources are not idempotent as it is important that there are no accidental correlations. Sharing of a single random signal, when needed, can be described by binding that signal to a variable as discussed above and using the variable to refer to the signal instead of repeating the reference to the random source. In this more general view, sources and sinks bridge referentially transparent and non-transparent computations.
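The three rules described above (idempotent physical sources, independent random sources, and single definition of sinks) can be sketched as a small registry. The Python code below is only an illustration of those rules, not of how CoPE implements them. The class `Experiment` and its methods `observe`, `sample_uniform` and `send` are hypothetical names.

```python
import random
from typing import Any, Callable, Dict


class Experiment:
    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._observed: Dict[str, Any] = {}   # one shared observation per physical source
        self._sinks: Dict[str, Any] = {}      # each sink may be defined only once

    def observe(self, source: str, read: Callable[[], Any]) -> Any:
        """Idempotent source: observing it again yields the same observation."""
        if source not in self._observed:
            self._observed[source] = read()
        return self._observed[source]

    def sample_uniform(self) -> float:
        """Non-idempotent source: every occurrence is an independent draw."""
        return self._rng.random()

    def send(self, sink: str, value: Any) -> None:
        """Defining the same sink twice is an error, so outputs are combined explicitly."""
        if sink in self._sinks:
            raise ValueError(f"sink {sink!r} defined more than once")
        self._sinks[sink] = value


exp = Experiment()
readings = iter([1.0, 2.0])                    # stand-in for a hardware channel
first = exp.observe("ADC 0", lambda: next(readings))
second = exp.observe("ADC 0", lambda: next(readings))

draw_a = exp.sample_uniform()
draw_b = exp.sample_uniform()                  # independent of draw_a
shared = draw_a                                # sharing is done by binding to a variable

exp.send("DAC 0", first + shared)
try:
    exp.send("DAC 0", second)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Binding `draw_a` to `shared` corresponds to referring to one random signal through a variable, while calling the random source twice gives two unrelated values.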
To demonstrate the versatility of CoPE, we present here an extended spike detection algorithm that incorporates template matching. We rst detect putative spikes as any deection that falls outside three times the standard deviation of the band-pass ltered signal. noiseSD = sigSD ecVolts
    putatives = (λv → v < (-3 * noiseSD) ∨ v > (3 * noiseSD)) ?? ecVolts
For the purposes of this demonstration, we use the spikes found with a manual threshold, manualSpikes, as presented in the main body of this paper, to construct a template of the largest spikes from the extracellular waveform ecVolts with which to compare putative spikes. manualSpikes are not used again, so the template matching algorithm presented here is a refinement of the manual threshold analysis.
    template = take 1 $ averageSigs $ limitSigs (-0.001) 0.001 $ around manualSpikes ecVolts
Note that $ is used to avoid excessive parentheses. f $ y is defined as f y but guarantees that y is treated as a separate term. For instance, f $ x + y = f (x + y) but f x + y = (f x) + y. Here, around aligns the extracellular waveform ecVolts around manualSpikes events, returning one signal segment per event occurrence. limitSigs t1 t2 cuts each signal segment such that only the signal between timepoints t1 and t2 remains. Finally, averageSigs averages a list of signal segments and returns a list of three signals: the mean, the mean plus the s.e.m. and the mean minus the s.e.m. We only need the first element (take 1) of this list, shown in Figure S1A.
Our standard functions for data analysis operate on lists of event or duration occurrences. In order to use these with individual timepoints in the subsequent analysis, we write a simple utility function that takes a timepoint and creates such a list containing a single event occurrence, tagged with the unit type.
    ev t = [(t, ())]
Finally, we implement simple template matching as a single function rms that transforms putative spikes into spikes tagged with the goodness-of-fit (here, the root mean square difference between the signal and the template). We align the extracellular voltage ecVolts around the event occurrence ev t, and subtract this signal from the template where this is defined (-1 to 1 ms).
This new signal is transformed (smap) with the square function (^2) before summation and taking the square root. This root mean square value is paired with the event time, such that
the rms function can be used to create a new list of events (rmsEvs, below) with the appropriate temporal context and a real-valued tag designating the extent of template match.
    rms (t, _) = (t, sqrt $ sumSig $ smap (^2) $ template - (around (ev t) ecVolts))
A single bi- or triphasic spike may generate several putative spikes as positive and negative deflections exceed the threshold. We use the onAdjacent function to step through pairs of adjacent spikes. onAdjacent is parametrised by a function that decides which out of two successive spikes should be kept; here, f expresses that if two spikes occur within 2 ms, then only the one with the lowest r.m.s. value should be kept.
    f (t1, rms1) (t2, rms2) | t2 - t1 > 0.002 = [(t1, rms1), (t2, rms2)]
                            | rms1 < rms2     = [(t1, rms1)]
                            | otherwise       = [(t2, rms2)]
    rmsEvs = onAdjacent f $ map rms putatives
A histogram of the r.m.s. values for a stable recording period of 40 approaches is shown in Figure S1B, and rmsEvs is shown with a short signal segment in Figure S1C. The histogram can be used to set cut-offs for different components; Figure S1D shows the average waveforms for putative spikes for two such groups.
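The steps of this analysis can be sketched in plain Haskell over sampled data. The sketch rests on stated assumptions rather than on the CoPE implementation: signals are reduced to lists of (time, value) samples, ?? is read as emitting an event each time the predicate switches from false to true, and onAdjacent is read as letting the surviving event of a pair meet the next event. The names Event, Signal, crossings, templateDistance and keepBest are hypothetical; keepBest plays the role of f above.

```haskell
type Event a = (Double, a)          -- (time in seconds, tag)
type Signal  = [(Double, Double)]   -- sampled signal: (time in seconds, value)

-- Assumed reading of (??): one event each time the predicate on the signal
-- value switches from False to True between consecutive samples.
crossings :: (Double -> Bool) -> Signal -> [Event ()]
crossings p sig =
  [ (t, ()) | ((_, v0), (t, v1)) <- zip sig (drop 1 sig), not (p v0), p v1 ]

-- Square root of the summed squared differences between a template and an
-- equally sampled signal segment, mirroring sqrt $ sumSig $ smap (^2).
templateDistance :: [Double] -> [Double] -> Double
templateDistance template segment =
  sqrt (sum (zipWith (\a b -> (a - b) ^ (2 :: Int)) template segment))

-- Assumed onAdjacent: compare each event with its successor. If both are
-- kept, emit the first and continue from the second; if at most one is
-- kept, the survivor is compared with the next event.
onAdjacent :: (Event a -> Event a -> [Event a]) -> [Event a] -> [Event a]
onAdjacent f (e1 : e2 : rest) = case f e1 e2 of
  (a : b : _) -> a : onAdjacent f (b : rest)
  kept        -> onAdjacent f (kept ++ rest)
onAdjacent _ es = es

-- The decision function f from the text: events more than 2 ms apart are
-- both kept; otherwise only the one with the lower r.m.s. tag survives.
keepBest :: Event Double -> Event Double -> [Event Double]
keepBest (t1, rms1) (t2, rms2)
  | t2 - t1 > 0.002 = [(t1, rms1), (t2, rms2)]
  | rms1 < rms2     = [(t1, rms1)]
  | otherwise       = [(t2, rms2)]
```

Under these assumptions, applying onAdjacent keepBest to four tagged events at 0.100, 0.101, 0.1015 and 0.200 s with tags 0.3, 0.1, 0.2 and 0.4 retains the events at 0.101 s and 0.200 s: the first three fall within 2 ms of each other, so only the best-matching one survives.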
[Figure S1, panels A-D. Axis labels: time (ms), time (s), voltage, frequency, r.m.s.]
Figure S1. A, template used to identify spikes. B, histogram of root mean square differences from template for each putative spike over 40 stable trials. C, a segment of the extracellular voltage (top) with putative events, plotted with their r.m.s. differences from the template. D, averages of spikes with low (around ((<0.1) // rmsEvs) ecVolts) and high (around ((>0.2) // rmsEvs) ecVolts) r.m.s. difference from the template.