Research Article  Open Access
Estimating Latent Attentional States Based on Simultaneous Binary and Continuous Behavioral Measures
Abstract
Cognition is a complex and dynamic process. Estimating latent attentional states from behavioral measures is an essential goal in many sequential behavioral tasks. Here, we propose a probabilistic modeling and inference framework for estimating the attentional state using simultaneous binary and continuous behavioral measures. The proposed model extends the standard hidden Markov model (HMM) by explicitly modeling the state duration distribution, which yields a special case of the hidden semi-Markov model (HSMM). We validate our methods using computer simulations and experimental data. In computer simulations, we systematically investigate the impacts of model mismatch and the latency distribution. For the experimental data collected from a rodent visual detection task, we validate the results with the predictive log-likelihood. Our work is useful for many behavioral neuroscience experiments, where the common goal is to infer the discrete (binary or multinomial) state sequences from multiple behavioral measures.
1. Introduction
1.1. Motivation
In behavioral neuroscience experiments, a common task is to estimate the latent attentional or cognitive state (i.e., the “mind”) of the subject based on behavioral outcomes. The latent cognitive state may account for an internal neural process, such as motivation or attention. This is important since one can relate the latent attentional or cognitive state to simultaneous neurophysiological recordings or imaging to seek the “neural correlates” in different brain regions (such as the visual cortex, parietal cortex, and thalamus) [1–4]. Naive determination of such latent states might lead to erroneous interpretations of the results and, in some cases, even undermine the scientific conclusions. Therefore, it is important to formulate a principled approach to estimate the latent state underlying the behavioral task, such as attention, detection, learning, or decision making [5–9].
In a typical experimental setup of an attention task, animals or human subjects are instructed to follow a certain (such as visual or auditory) cue to direct their attention to execute the task. At each trial, the experimentalist observes the animal’s or subject’s behavioral outcome (either a binary or a multiple choice) as well as the latency (or reaction time) from the cue onset until the execution. However, the observed behavioral choice does not necessarily reflect the underlying attentional or cognitive state. For instance, a “correct” behavioral choice can be due to either unattended random exploration or attended execution. In contrast, an “incorrect” behavioral choice can be induced by unattended random exploration or by an attended yet erroneous decision. Therefore, a simple and direct assignment of behavioral outcomes to attentional states can lead to a false statement or misinterpretation of the behavior. To avoid such errors, it is essential to incorporate a priori knowledge or all experimental evidence to estimate the latent state. One direct behavioral measure is the statistics of the latency. Another source of prior information is the task difficulty and the animal’s overall performance. Based on the animal’s experience (naive versus well-trained) or the task difficulty, one can make a reasonable assumption about the dynamics of the latent state process. A similar rationale also applies to other cognitive tasks that involve latent states, such as learning, planning, and decision making.
Markovian or semi-Markovian models are powerful tools to characterize the temporal dependence of time series data. Markovian models assume history independence beyond the consecutive states (whether it is first-order or high-order dependence), whereas semi-Markovian models allow history dependency; they are therefore more flexible and include Markovian models as special cases. In addition, semi-Markovian models can often be transformed into Markovian models by embedding or augmentation (such as the triplet Markov model) [10]. Typically, Markovian or semi-Markovian models presume probability distributions (for the state transition as well as the likelihood function) that are stationary in time, although this assumption may deviate from real-life data, which often exhibit different degrees of nonstationarity. Despite such deviation, we still believe that Markovian or semi-Markovian models are appropriate for modeling a large class of behavioral data. In addition, statistical models can be adapted to accommodate nonstationarity via online learning, especially for large data sets [11–13].
1.2. State of the Art
In the literature, few works have attempted to estimate latent attentional or cognitive states based on simultaneous binary and continuous behavioral measures [15]. In [15], the latent cognitive state was modeled as a continuous-valued random-walk process (which is Markovian). The inference was tackled by an expectation-maximization (EM) algorithm [16, 17] based on state space analysis [18, 19].
Alternatively, the attentional state can also be characterized by a discrete or binary variable. Assuming that the attentional state is Markovian or semi-Markovian, one can model the latent process via a hidden Markov model (HMM) [20, 21], a variable-duration HMM [22], or a hidden semi-Markov model (HSMM) [23–27]. We use the semi-Markovian assumption here. The contribution of this paper is twofold. First, motivated by neuroscience experiments, we formulate the behavioral attention task as a latent state Markovian problem, which may open a new avenue for data analysis in behavioral neuroscience. Specifically, we extend the explicit-duration HMM (or HSMM) to mixed observations (with discrete behavioral outcome and continuous behavioral latency) and derive the associated statistical inference algorithm. This can be viewed as modeling conditionally independent variables with parametric observation distributions in the HMM or HSMM [28]. Second, we apply the proposed method to analyze preliminary experimental data collected from a mouse visual attention task.
The rest of the paper is organized as follows. Section 2 presents the method, detailing the probabilistic modeling and maximum likelihood inference for the HSMM. Section 3 presents the results from simulated data as well as experimental data collected from freely behaving mice performing a visual detection task. We conclude the paper with discussions in Section 4.
2. Method
2.1. Probabilistic Modeling
We formulate the attention process as a hidden semi-Markov chain of two states, where (0: unattended; 1: attended) denotes the latent binary attention variables at trial . Conditioned on the attention state , we observe discrete (here, binary) choice outcomes (0: incorrect; 1: correct) and continuous, nonnegative latency measures . Unlike the HMM, the HSMM implies that the current state depends not only on the previous state, but also on the duration of the previous state [25, 29]. To model such time dependence, we introduce an explicit-duration HMM. Specifically, let denote the remaining sojourn time of the current state . In general, the probability distribution of the sojourn time is where the indicator function if and zero otherwise. In the case of modeling intertrial dependence, the sojourn time is a discrete random variable ; therefore, the explicit duration distribution can be characterized by a matrix , where () and the integer is the maximum duration possible in any state or the maximum interval between any two consecutive state transitions. Because of the state history dependence, the state transition is only allowed at the end of the sojourn:
Similar to the standard HMM, the HSMM is also characterized by a transition probability matrix (), where , as well as an emission probability matrix , where and . The initial state probability is denoted by . For all matrices , , and , the sum of the matrix rows is equal to one.
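The generative structure described above can be illustrated with a minimal simulation sketch. It is not the author's code: the function names and all parameter values are hypothetical, chosen only to show how a state is held for a sampled sojourn time and how a transition is drawn only when that sojourn ends.

```python
import random

def sample_categorical(probs, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off

def simulate_hsmm(n_trials, init, trans, dur, emit, rng):
    """Simulate states and binary outcomes from an explicit-duration HSMM.

    init[i]     : initial state probability
    trans[i][j] : probability of moving from state i to state j at the end of a sojourn
    dur[i][d-1] : probability that a sojourn in state i lasts d trials
    emit[i][k]  : probability of binary outcome k in state i
    """
    states, outcomes = [], []
    state = sample_categorical(init, rng)
    while len(states) < n_trials:
        d = sample_categorical(dur[state], rng) + 1  # sojourn length in trials
        for _ in range(d):
            if len(states) == n_trials:
                break
            states.append(state)
            outcomes.append(sample_categorical(emit[state], rng))
        state = sample_categorical(trans[state], rng)  # transition only at sojourn end
    return states, outcomes

# Hypothetical parameter values, for illustration only.
rng = random.Random(0)
init = [0.5, 0.5]
trans = [[0.0, 1.0], [1.0, 0.0]]          # no self-transitions (an assumption)
dur = [[0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]  # maximum duration of 3 trials
emit = [[0.5, 0.5], [0.1, 0.9]]           # rows: unattended, attended
states, outcomes = simulate_hsmm(200, init, trans, dur, emit, rng)
```

With self-transitions excluded, as assumed here, no run of identical states in the simulated sequence can exceed the maximum duration.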
Furthermore, we assume the conditional independence between the binary behavioral measure and the continuous behavioral measure ; this implies that where is characterized by a probability density function (PDF) parameterized by . Since the latency variable is nonnegative, we can model it with a probability distribution with positive support, such as the exponential, gamma, lognormal, or inverse Gaussian distribution. For illustration purposes, here we model the latency variable with a lognormal distribution : where denotes the univariate latency variable; is normally distributed with the mean and variance ; and . The lognormal distribution belongs to the exponential family.
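The conditional independence assumption can be made concrete with a short sketch of the joint observation likelihood. The argument names (`p_correct`, `mu`, `sigma`) are hypothetical and indexed by state; the lognormal density uses its standard form, in which the logarithm of the latency is normal with mean `mu` and standard deviation `sigma`.

```python
import math

def lognormal_pdf(z, mu, sigma):
    """Density of a lognormal variable: log(z) is normal with mean mu and std sigma."""
    if z <= 0:
        return 0.0
    return math.exp(-(math.log(z) - mu) ** 2 / (2 * sigma ** 2)) / (
        z * sigma * math.sqrt(2 * math.pi)
    )

def joint_emission(y, z, state, p_correct, mu, sigma):
    """p(y, z | state) = p(y | state) * p(z | state) under conditional independence."""
    p_y = p_correct[state] if y == 1 else 1.0 - p_correct[state]
    return p_y * lognormal_pdf(z, mu[state], sigma[state])
```

For example, `joint_emission(1, 0.8, 1, [0.5, 0.9], [1.0, 0.0], [0.5, 0.5])` evaluates the likelihood of a correct trial with latency 0.8 under the second state, using made-up parameter values.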
Notes.
(i) It is possible to convert a semi-Markovian chain () to a Markovian chain by defining an augmented state and defining a triplet Markov chain (TMC) [10]. Triplet Markov models (TMMs) are general and rich and include many Markov-type models as special cases.
(ii) If multivariate observations from behavioral measures become available, we can introduce multiple probability distributions (independent case) or multivariate probability distributions (correlated case) to characterize statistical dependency [30].
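The state augmentation mentioned in note (i) can be sketched as follows. This is the standard embedding that pairs each state with its remaining sojourn time, not necessarily the triplet construction of [10]; the function name and parameter values are hypothetical.

```python
def augment_to_markov(trans, dur):
    """Build a Markov transition matrix over (state, remaining sojourn time) pairs.

    trans[i][j] : state transition probability at the end of a sojourn
    dur[i][d-1] : probability that a sojourn in state i lasts d trials
    """
    n_states, max_dur = len(trans), len(dur[0])
    index = {
        (s, r): s * max_dur + (r - 1)
        for s in range(n_states)
        for r in range(1, max_dur + 1)
    }
    size = n_states * max_dur
    big = [[0.0] * size for _ in range(size)]
    for (s, r), row in index.items():
        if r > 1:
            # Sojourn continues: same state, remaining time decreases by one.
            big[row][index[(s, r - 1)]] = 1.0
        else:
            # Sojourn ends: draw the next state and its full duration.
            for j in range(n_states):
                for d in range(1, max_dur + 1):
                    big[row][index[(j, d)]] = trans[s][j] * dur[j][d - 1]
    return big, index

# Hypothetical parameter values, for illustration only.
trans = [[0.0, 1.0], [1.0, 0.0]]
dur = [[0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]
big, index = augment_to_markov(trans, dur)
```

Each row of the augmented matrix sums to one, so the pair process is an ordinary Markov chain, at the cost of a state space that grows with the maximum duration.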
2.2. Likelihood Inference
The goal of statistical inference is to estimate the unknown latent state sequences and the unknown variables . Following the derivation of [29], here we present an expectation-maximization (EM) algorithm for simultaneous binary and continuous observations.
We first define a forward variable as the joint posterior probability of and : and the marginal posterior probability. In addition, we define the ratio of the filtered conditional probability over the predicted conditional probability: where the third step of (7) follows from as well as the Markovian property and the last step of (7) follows from the conditional independence between and .
To compute the predictive probability, we define and where . Therefore, the observed data likelihood is given by Conditional on the parameters , the expected complete data log-likelihood is written as Optimizing the expected complete data log-likelihood with respect to the unknown parameters yields the maximum likelihood estimate.
Similar to [29], we introduce notations for two conditional probabilities: where denotes the conditional probability of state starting at state and lasting for time units given the observations; and denotes the conditional probability of state transition from to . Note the consistency holds for .
To derive the forward-backward updates, we further define a backward variable as the ratio of the smoothed conditional probability over the predicted conditional probability : where the third equality of (14) follows from
For notational convenience, we define another four sets of random variables: where and represent the forward and backward recursions, respectively. Note that we also have [29]
2.3. EM Algorithm
The EM algorithm for the explicit-duration HMM consists of a forward-backward algorithm (E-step) and the reestimation (M-step). The E- and M-steps are run alternately to optimize the expected log-likelihood of the complete data (12).
In the E-step (the forward-backward algorithm), we recursively update the forward variable and backward variable ; note that when , the forward-backward algorithm reduces to the standard Baum-Welch algorithm used for the HMM. Specifically, in the forward update, with an initial value . And in the backward update, with an initial value for any . In the end, we obtain the smoothed conditional probabilities , , and and .
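As a point of reference for the standard HMM special case mentioned above, the forward (filtering) half of the recursion can be sketched for mixed binary and lognormal observations. This sketch covers only the ordinary HMM, not the explicit-duration recursion of [29]; all names and parameter layouts are hypothetical.

```python
import math

def lognormal_pdf(z, mu, sigma):
    """Lognormal density; returns 0 outside the positive support."""
    if z <= 0:
        return 0.0
    return math.exp(-(math.log(z) - mu) ** 2 / (2 * sigma ** 2)) / (
        z * sigma * math.sqrt(2 * math.pi)
    )

def hmm_forward(ys, zs, init, trans, p_correct, mu, sigma):
    """Scaled forward pass; returns filtered state probabilities and the log-likelihood."""
    n_states = len(init)
    filtered, loglik = [], 0.0
    pred = list(init)  # predicted state distribution for the first trial
    for y, z in zip(ys, zs):
        # Likelihood of the mixed observation under each state
        # (binary outcome and latency are conditionally independent).
        lik = []
        for s in range(n_states):
            p_y = p_correct[s] if y == 1 else 1.0 - p_correct[s]
            lik.append(p_y * lognormal_pdf(z, mu[s], sigma[s]))
        joint = [pred[s] * lik[s] for s in range(n_states)]
        norm = sum(joint)  # predictive density of the current observation
        loglik += math.log(norm)
        post = [j / norm for j in joint]
        filtered.append(post)
        # One-step prediction for the next trial.
        pred = [
            sum(post[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return filtered, loglik
```

Normalizing at every trial keeps the recursion numerically stable, and the accumulated log normalizers give the observed-data log-likelihood used to monitor EM convergence.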
In the M-step, we use the smoothed probabilities for reestimating the model parameters : where , , , and are normalizing constants such that the sum of probabilities is equal to one. In addition, the unbiased maximum likelihood estimates of in the lognormal distribution are given by where .
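A minimal sketch of a posterior-weighted update for the lognormal parameters of one state follows. It assumes the plain weighted maximum likelihood form, without any bias correction, so it may differ in normalization from the estimator used in the paper; `gamma` is a hypothetical name for the smoothed state probabilities.

```python
import math

def lognormal_m_step(zs, gamma):
    """Reestimate the mean and variance of log-latency for one state.

    zs    : positive latencies, one per trial
    gamma : smoothed posterior probability of the state at each trial
    """
    total = sum(gamma)
    logs = [math.log(z) for z in zs]
    mu = sum(g * x for g, x in zip(gamma, logs)) / total
    var = sum(g * (x - mu) ** 2 for g, x in zip(gamma, logs)) / total
    return mu, var
```

Trials with a posterior probability near zero for the state contribute almost nothing to its estimates, which is how the soft state assignment from the E-step enters the update.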
Upon algorithmic convergence (convergence is declared when the increment in log-likelihood between consecutive iterations falls below a small threshold, say ), we compute the maximum a posteriori (MAP) estimates of the state and duration as
2.4. Model Selection
In practice, the maximum length of state duration is usually unknown, and we need to estimate the order of the HSMM (since the state dimensionality is fixed here). In statistics, common model selection criteria include the Akaike information criterion (AIC) and the Bayesian information criterion (BIC): where denotes the total number of free parameters in the model. An alternative order estimator has been suggested [25]: with .
It should be emphasized that the AIC and BIC are only asymptotically optimal in the presence of a large number of samples. In practice, experimental behavioral data sequences are often short, and therefore these criteria should be used with caution or combined with other criteria.
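The two criteria can be computed directly from the maximized log-likelihood using their standard definitions. In the sketch below, the candidate orders, log-likelihood values, and parameter counts are hypothetical inputs supplied by the user.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_samples):
    """Bayesian information criterion: -2 log L + k log n."""
    return -2.0 * loglik + n_params * math.log(n_samples)

def select_order(logliks, n_params, n_samples):
    """Return the candidate orders that minimize the AIC and the BIC.

    logliks[d]  : maximized log-likelihood for candidate order d
    n_params[d] : number of free parameters for candidate order d
    """
    scores = {
        d: (aic(ll, n_params[d]), bic(ll, n_params[d], n_samples))
        for d, ll in logliks.items()
    }
    best_aic = min(scores, key=lambda d: scores[d][0])
    best_bic = min(scores, key=lambda d: scores[d][1])
    return best_aic, best_bic
```

Because the BIC penalty grows with the logarithm of the sample size, it tends to favor smaller orders than the AIC once the sequence has more than about eight trials.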
2.5. Alternative Parametric Formulation
Previously, we have assumed a nonparametric probability for (), which has degrees of freedom. Alternatively, we may assume that the state duration is modeled by a parametric distribution, such as the geometric distribution where , , and . In this case, the probabilistic model has degrees of freedom.
For the associated EM algorithm, the E-step remains similar (replacing the calculation of ), whereas the M-step includes an additional step to update the parameters of the parametric distribution. For instance, in the case of the geometric distribution, the parameter is updated as which is similar to the method of moments in maximum likelihood estimation.
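A moment-matching update for a geometric duration parameter can be sketched as follows, assuming an untruncated geometric distribution on durations of one trial or more. The paper's exact update, which may account for truncation at the maximum duration, is not reproduced here, and `expected_counts` is a hypothetical name for E-step quantities.

```python
def geometric_pmf(d, p):
    """P(duration = d) for d = 1, 2, ... under a geometric distribution."""
    return p * (1.0 - p) ** (d - 1)

def update_geometric(expected_counts):
    """Moment-matching update of the geometric parameter for one state.

    expected_counts[d-1] : expected number of sojourns of length d (from the E-step)
    """
    n_sojourns = sum(expected_counts)
    total_time = sum((d + 1) * c for d, c in enumerate(expected_counts))
    return n_sojourns / total_time  # reciprocal of the mean sojourn length
```

The estimate is the reciprocal of the expected mean duration, which coincides with the maximum likelihood estimate for an untruncated geometric distribution.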
3. Results
3.1. Simulated Data
Setup. In computer simulations, we set the total number of trials as , with the maximum state duration . We simulate the state sequences and observations using the following matrices: The structure of the matrix implies that, for the unattended state, the highest probability is for a state duration of two; for the attended state, the highest probability is for a state duration of three. Conditional on the attentional state, the latency variable is assumed to follow a lognormal distribution: (for the unattended state) and (for the attended state). The two distributions overlap by approximately 13.5% in area (Figure 1). One realization of the simulated latent attentional state sequence and behavioral sequence is shown in Figure 2. Comparing Figures 2(d) and 2(e) in this illustration, we can see that the estimate using both behavioral measures is more accurate and closer to the ground truth (Figure 2(a)).
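The overlap between two lognormal latency distributions can be approximated numerically. The sketch below integrates the pointwise minimum of the two densities; since the overlap area is unchanged by the logarithmic transform, the integration is carried out on the log-latency scale, where both densities are normal. The function names and the parameter values in the usage line are hypothetical and are not the values used in the paper.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal variable with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def lognormal_overlap(mu0, sigma0, mu1, sigma1, n_grid=20001):
    """Area under min(p0, p1) for two lognormal densities (trapezoidal rule)."""
    lo = min(mu0 - 8 * sigma0, mu1 - 8 * sigma1)
    hi = max(mu0 + 8 * sigma0, mu1 + 8 * sigma1)
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        x = lo + k * h
        f = min(normal_pdf(x, mu0, sigma0), normal_pdf(x, mu1, sigma1))
        weight = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezoid end points
        total += weight * f
    return total * h

# Hypothetical parameters: equal spread, log-means two standard deviations apart.
overlap = lognormal_overlap(0.0, 1.0, 2.0, 1.0)
```

An overlap close to one means the latency carries little information about the state, whereas an overlap close to zero means the latency alone nearly determines it.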
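Figure 2: One realization of the simulated latent attentional state sequence and behavioral sequence, with the state estimates; panels (a)-(f).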
Assessment. Given the observations and , we run the inference algorithm to estimate the state sequence . In the simulation where the ground truth is known, the estimation error is defined as In addition, we define the baseline error as and further compute the relative improvement percentage (RIP): A higher value of RIP implies a greater improvement in the state estimate. For comparison, we run the HSMM-EM algorithm to compute two error rates, one using binary observations only (method 1), the other using both binary and continuous observations (method 2). We also apply the standard HMM-EM algorithm to analyze the same data using both binary and continuous observations. Furthermore, we consider two scenarios for the HSMM. In the first scenario, we assume that the true model order is known. In the second scenario, we vary the model order by from the true model order (i.e., model mismatch).
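The assessment quantities can be sketched as follows. The exact definitions are not reproduced in this text, so the sketch assumes that the estimation error is the fraction of trials whose estimated state differs from the ground truth and that the RIP is the reduction in error relative to the baseline error, expressed as a percentage. The baseline error is passed in as a number because its definition is not restated here.

```python
def error_rate(estimated, truth):
    """Fraction of trials where the estimated state differs from the true state."""
    mismatches = sum(1 for e, t in zip(estimated, truth) if e != t)
    return mismatches / len(truth)

def relative_improvement(err, baseline_err):
    """Assumed RIP: percentage reduction in error relative to the baseline error."""
    return 100.0 * (baseline_err - err) / baseline_err
```

Under these assumptions, an estimate with a 10% error rate against a 40% baseline error gives an RIP of 75%.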
We compare the RIP statistic based on 100 independent Monte Carlo runs (although the setup is the same, the simulated state sequences and behavioral outcomes are different in each run). The results are summarized in Tables 1 and 2. In both cases, we found that the HSMM (method 2) using both binary and continuous measures yields the best RIP statistic. As expected, when there is a model mismatch from the data, the accuracy of the state estimate degrades.


The results of the HSMM estimate certainly depend on the exact simulation setup. It is expected that when the two-state latency distributions overlap heavily (see Figure 1), the estimation error may increase; on the other hand, if the semi-Markovian dynamics can be well approximated by Markovian dynamics, the difference between the HSMM and HMM will become small. To investigate this issue, we systematically change one of the lognormal distributions (i.e., ) while keeping other parameters unchanged. Essentially, when and are close in value, there will be a strong overlap in the latency distributions. As seen in Table 3, as decreases, the distribution overlap gradually increases; consequently, the performance also gradually degrades. However, the HSMM (method 2) using both binary and continuous behavioral measures still significantly outperforms the HSMM (method 1; compare Table 1), even in the extreme situation where .

Testing the Robustness to Semi-Markovian Assumption. In addition, we test the robustness of our HSMM and the semi-Markovian assumption for Markovian-driven data. To do that, we generate data from a simple Markovian chain (with a similar setup as before) and then run the HMM-EM and HSMM-EM algorithms to compare their RIP. The Monte Carlo results are summarized in Table 4. As seen in this case, the HMM result is slightly more accurate (yet not statistically significant) than the HSMM results because of the nature of the Markovian chain; meanwhile, it also confirms the robustness of the HSMM to the Markovian or semi-Markovian assumption.

Testing the Robustness to Nonstationarity. Next, we test the robustness of the HSMM and the EM algorithm to nonstationarity. We test two types of nonstationarity: a change in the state transition and a slow drift of the parameters in the likelihood model. In the first case, we consider that the state transition in the second half of the data sequences is governed by a slightly different probability: yet the other model parameters and remain unchanged. We reestimate the state sequences from simulated data (using HSMM method 2) from 100 independent Monte Carlo runs and obtain the RIP () statistic as .
In the second case, we allow the parameters of the lognormal distribution to drift slightly in the second half of the data sequences: , (state 1) and , (state 2), yet the other model parameters and remain unchanged. Namely, in the second half, the mean and standard deviation statistics of the latency are reduced for both states and their mode gap is also narrowed. For the new data, we reestimate the state sequences from 100 independent Monte Carlo runs and obtain the RIP () statistic as .
The result of the first case is not significantly different from that of the stationary setup, and the estimation accuracy in the second case is slightly reduced. The reduction is mostly because the latency variable is more informative in determining the attentional state. Overall, it is concluded that the HSMM method with mixed observations is rather robust to data nonstationarity.
3.2. Experimental Data
Protocol and Animal Behavior. All experiments were performed in VGAT-cre mice and conducted according to the guidelines of the Institutional Animal Care and Use Committee at Massachusetts Institute of Technology and the US National Institutes of Health. All behavioral and physiological data were collected by Dr. Michael Halassa and his team. For details, see [14, 31].
Mice were trained on a visual detection task that requires attentional engagement. Experiments were conducted in a standard modular test chamber. The front wall contained two white light emitting diodes, 6.5 cm apart, mounted below two nose-pokes. A third nose-poke with response detector was centrally located on the grid floor, 6 cm away from the base wall, and two small Plexiglas walls (3 × 5 cm), opening at an angle of 20°, served as a guide to the poke. All nose-pokes contained an infrared LED/infrared phototransistor pair for response detection. At the level of the floor-mounted poke, two headphone speakers were introduced into each sidewall of the box, allowing for sound delivery. Trial logic was controlled by custom software running on a microcontroller. Liquid reward consisting of 10 μL of evaporated milk was delivered directly to the lateral nose-pokes via a single-syringe pump.
A white noise auditory stimulus signaled the opportunity to initiate a trial. Mice were required to hold their snouts for 0.5–0.7 s in the floor-mounted nose-poke unit for successful initiation (stimulus anticipation period). Following initiation, a stimulus light (0.5 s) was presented either to the left or to the right. Responding at the corresponding nose-poke resulted in a liquid reward (10 μL evaporated milk) dispensed directly at the nose-poke (see Figure 3).
Model Selection and Assessment of Behavioral Data. The animal behavior (performance and latency) varied across experimental sessions. The number of trials per session varied between 73 and 152 (mean ± SD: ). The average error rate of the visual detection task across a total of 20 sessions from two animals is % (mean ± SD; minimum 6%, maximum 51%). Although the number of states is fixed to two, the model order parameter remains to be determined. For the two experimental sessions studied here, the basic statistics are shown in Table 5. Notably, for Dataset 1, the average latency is longer (yet statistically nonsignificant, , rank-sum test) in incorrect trials than in correct trials, whereas for Dataset 2, the average latency is shorter (yet statistically nonsignificant, , rank-sum test) in incorrect trials than in correct trials.

We use 80% of the data samples for parameter estimation and the remaining 20% for evaluation. In model selection, we compute the AIC and BIC to select a suboptimal . The model selection results for the two experimental datasets are shown in Figure 4. Specifically, we found that, for Dataset 1, there is no local minimum within the range of 2 to 9 based on both criteria, whereas for Dataset 2, there is a local minimum based on the AIC. As a demonstration, Figure 5 presents the estimated state sequences from Dataset 2 based on (Dataset 2). Notably, the estimate of state sequences is nearly identical using (if based on the predictive log-likelihood of Table 6). In this case, we observe a relatively large discrepancy between the observed behavioral outcomes and the estimated state sequences. This may be partially due to the high error rate (around 51%) in behavior during this session; notably, unlike most of the other sessions, this dataset has an abnormal statistic in that the average error-trial latency is shorter than the average correct-trial latency. Other possible reasons include the insufficiency of the HSMM, model mismatch, or a local maximum of the EM optimization. Some of these points will be discussed in the next section.
Since there is no “ground truth” for the attentional state sequences, we also compute the predictive log-likelihood of the 20% held-out data (Table 6). In Table 6, the lowest predictive log-likelihood value is obtained for for Dataset 1 and for Dataset 2.

4. Discussion
In this paper, we have proposed a probabilistic modeling and inference framework for estimating latent attentional states based on simultaneous binary and continuous behavioral measures. The proposed model extends the standard HMM by explicitly modeling the state duration distribution, which yields a special case of the HSMM. The semi-Markovian assumption provides greater flexibility to characterize latent state dynamics.
Estimation of latent attentional states allows us to better interpret the neurophysiological data. Our framework for estimating attentional states is by no means limited to the behavioral measures considered here. In human attention tasks, we may also incorporate other sorts of behavioral measures, such as psychophysics [32].
Bayesian Inference and Model Extension. For the simultaneous binary and continuous behavioral measures, we have extended the maximum-likelihood-based EM algorithm of [29] for estimating the HSMM parameters, and we have used the AIC or BIC for model selection. The likelihood inference may not yield a consistent estimate given a small sample size (in our setup, the sample size is around 100, whereas the degrees of freedom in the parameters are around 10–14). This imposes a strong limitation of the likelihood method on model selection in the presence of short behavioral data sequences. An alternative approach is to consider Bayesian inference, using either variational or sampling-based Bayesian methods [33–35]. The Bayesian methods may potentially help alleviate the local optimum problem experienced in the likelihood-based EM optimization. Another possibility is to employ the Monte Carlo EM algorithm [26], in which the E-step replaces the traditional Baum-Welch algorithm with reversible jump Markov chain Monte Carlo (MCMC) sampling (where the number of transitions is unknown), and the state estimate is given by the average of Monte Carlo samples [26, 36]. In this case, the estimate obtained from the standard EM algorithm can serve as the initial point for the reversible jump MCMC algorithm [26]. Development of efficient Bayesian inference algorithms will be the subject of future work.
The HSMM, or the explicit-duration HMM, is closely related to other work in the literature, such as the sticky HMM [37], sticky HDP-HMM [38], and HDP-HSMM [39]. In these lines of work, the number of states is characterized by a hierarchical Dirichlet process (HDP). Although this is not an issue in our paper (i.e., the number of states is fixed to be two), it may be considered in other multiple-state estimation scenarios. Another possible model extension is to consider a nonparametric Bayesian formulation that allows infinite state duration in the HSMM (provided that a large amount of data becomes available).
Verification of Experimental Data Analysis. In experimental data analyses, it is likely that our proposed probabilistic model is insufficient to capture the underlying state dynamics (e.g., nonstationary or switching state dynamics [40]), or that there might be a model mismatch between the empirical latency distribution and the assumed parametric distribution (e.g., lognormal, gamma, or inverse Gaussian). In all analyses, we have witnessed two types of estimation results: one is that the outcome is correct, yet the state is determined to be unattended (i.e., , ); another is that the outcome is incorrect, yet the state is identified to be attended (i.e., , ). Since there is no ground truth, it would be reassuring to have another independent measure to corroborate the attentional state estimate. Alternatively, according to prior knowledge or practical requirements, one may need to formulate a “behaviorally constrained” model and derive a specific “constrained” inference algorithm. This line of research remains to be investigated in future work.
The ultimate goal of behavioral analysis is to corroborate the neurophysiological data. Therefore, it is also important to verify the results by examining the neural correlates of the attention tasks. This can be in the form of neuronal firing rate, spike timing, phase synchrony, oscillatory dynamics (power or phase), or LFP evoked potentials, by which one can establish a robust relationship between the attended state and the physiology. In the absence of ground truth, we can rely on the “consistency truth” (condition 1: , and condition 2: , ) and compare their differences in neural correlates. However, detailed experimental investigation of attentional neural correlates is beyond the scope of the current paper.
Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The author thanks Dr. Michael Halassa (New York University Neuroscience Institute) for kindly providing the animal behavior data for analysis. Z. Chen is supported by the NSF-CRCNS (Collaborative Research in Computational Neuroscience) Award (IIS-1307645) from the US National Science Foundation.
References
 J. T. Coull, “Neural correlates of attention and arousal: insights from electrophysiology, functional neuroimaging and psychopharmacology,” Progress in Neurobiology, vol. 55, no. 4, pp. 343–361, 1998. View at: Publisher Site  Google Scholar
 S. Treue, “Neural correlates of attention in primate visual cortex,” Trends in Neurosciences, vol. 24, no. 5, pp. 295–300, 2001. View at: Publisher Site  Google Scholar
 C. C. Rodgers and M. R. DeWeese, “Neural correlates of task switching in prefrontal cortex and primary auditory cortex in a novel stimulus selection task for rodents,” Neuron, vol. 82, no. 5, pp. 1157–1170, 2014. View at: Publisher Site  Google Scholar
 G. Rainer and E. K. Miller, “Neural ensemble states in prefrontal cortex identified using a hidden Markov model with a modified EM algorithm,” Neurocomputing, vol. 3233, pp. 961–966, 2000. View at: Publisher Site  Google Scholar
 J. H. Reynolds, T. Pasternak, and R. Desimone, “Attention increases sensitivity of V4 neurons,” Neuron, vol. 26, no. 3, pp. 703–714, 2000. View at: Publisher Site  Google Scholar
 T. J. Buschman and E. K. Miller, “Topdown versus bottomup control of attention in the prefrontal and posterior parietal cortices,” Science, vol. 315, no. 5820, pp. 1860–1862, 2007. View at: Publisher Site  Google Scholar
 J. M. Dantzker, “Bursting on the scene: how thalamic neurons grab your attention,” PLoS Biology, vol. 4, no. 7, article e250, 2006. View at: Publisher Site  Google Scholar
 T. D. Barnes, Y. Kubota, D. Hu, D. Z. Jin, and A. M. Graybiel, “Activity of striatal neurons reflects dynamic encoding and recoding of procedural memories,” Nature, vol. 437, no. 7062, pp. 1158–1161, 2005. View at: Publisher Site  Google Scholar
 S. Wirth, E. Avsar, C. C. Chiu et al., “Trial outcome and associative learning signals in the monkey hippocampus,” Neuron, vol. 61, no. 6, pp. 930–940, 2009. View at: Publisher Site  Google Scholar
 P. Lanchantin, J. LapuyadeLahorgue, and W. Pieczynski, “Unsupervised segmentation of randomly switching data hidden with nonGaussian correlated noise,” Signal Processing, vol. 91, no. 2, pp. 163–175, 2011. View at: Publisher Site  Google Scholar
 G. Mongillo and S. Deneve, “Online learning with hidden Markov models,” Neural Computation, vol. 20, no. 7, pp. 1706–1716, 2008. View at: Publisher Site  Google Scholar  MathSciNet
 W. Khreich, E. Granger, A. Miri, and R. Sabourin, “A survey of techniques for incremental learning of HMM parameters,” Information Sciences, vol. 197, pp. 105–130, 2012. View at: Publisher Site  Google Scholar
 M. J. Johnson and A. S. Willsky, “Stochastic variational inference for Bayesian time series models,” in Proceedings of the 31st International Conference on Machine Learning, 2014. View at: Google Scholar
 M. M. Halassa, Z. Chen, R. D. Wimmer et al., “Statedependent architecture of thalamic reticular subnetworks,” Cell, vol. 158, no. 4, pp. 808–821, 2014. View at: Publisher Site  Google Scholar
 M. J. Prerau, A. C. Smith, U. T. Eden et al., “Characterizing learning by simultaneous analysis of continuous and binary measures of performance,” Journal of Neurophysiology, vol. 102, no. 5, pp. 3060–3072, 2009. View at: Publisher Site  Google Scholar
 A. C. Smith and E. N. Brown, “Estimating a statespace model from point process observations,” Neural Computation, vol. 15, no. 5, pp. 965–991, 2003. View at: Publisher Site  Google Scholar
 A. C. Smith, L. M. Frank, S. Wirth et al., “Dynamic analysis of learning in behavioral experiments,” Journal of Neuroscience, vol. 24, no. 2, pp. 447–461, 2004. View at: Publisher Site  Google Scholar
 Z. Chen, R. Barbieri, and E. N. Brown, “Statespace modeling of neural spike train and behavioral data,” in Statistical Signal Processing for Neuroscience and Neurotechnology, K. Oweiss, Ed., pp. 175–218, Elsevier, 2010. View at: Google Scholar
 Z. Chen and E. Brown, “State space model,” Scholarpedia, vol. 8, no. 3, Article ID 30868, 2013. View at: Publisher Site  Google Scholar
 L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989. View at: Publisher Site  Google Scholar
 O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer, New York, NY, USA, 2005.
 C. D. Mitchell, M. P. Harper, and L. H. Jamieson, “On the complexity of explicit duration HMM's,” IEEE Transactions on Speech and Audio Processing, vol. 3, no. 3, pp. 213–217, 1995.
 K. P. Murphy, “Hidden semi-Markov models (HSMMs),” Tech. Rep., Massachusetts Institute of Technology (MIT), Cambridge, Mass, USA, 2002.
 Y. Guédon, “Estimating hidden semi-Markov chains from discrete sequences,” Journal of Computational and Graphical Statistics, vol. 12, no. 3, pp. 604–639, 2003.
 S.-Z. Yu, “Hidden semi-Markov models,” Artificial Intelligence, vol. 174, no. 2, pp. 215–243, 2010.
 Z. Chen, S. Vijayan, R. Barbieri, M. A. Wilson, and E. N. Brown, “Discrete- and continuous-time probabilistic models and algorithms for inferring neuronal UP and DOWN states,” Neural Computation, vol. 21, no. 7, pp. 1797–1862, 2009.
 J. M. McFarland, T. T. G. Hahn, and M. R. Mehta, “Explicit-duration hidden Markov model inference of UP-DOWN states from continuous signals,” PLoS ONE, vol. 6, no. 6, Article ID e21606, 2011.
 W. Zucchini and I. L. MacDonald, Hidden Markov Models for Time Series: An Introduction, vol. 110 of Monographs on Statistics and Applied Probability, Chapman and Hall, New York, NY, USA, 2009.
 S.-Z. Yu and H. Kobayashi, “Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model,” IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
 N. Brunel and W. Pieczynski, “Unsupervised signal restoration using hidden Markov chains with copulas,” Signal Processing, vol. 85, no. 12, pp. 2304–2315, 2005.
 Z. Chen, R. D. Wimmer, M. A. Wilson, and M. M. Halassa, “Thalamic circuit mechanisms link sensory processing in sleep and attention,” Unpublished.
 J. Liechty, R. Pieters, and M. Wedel, “Global and local covert visual attention: evidence from a Bayesian hidden Markov model,” Psychometrika, vol. 68, no. 4, pp. 519–541, 2003.
 K. Hashimoto, Y. Nankaku, and K. Tokuda, “A Bayesian approach to hidden semi-Markov model based speech synthesis,” in Proceedings of the 10th Annual Conference of International Speech Communication Association (Interspeech '09), pp. 1751–1754, 2009.
 M. Dewar, C. Wiggins, and F. Wood, “Inference in hidden Markov models with explicit state duration distributions,” IEEE Signal Processing Letters, vol. 19, no. 4, pp. 235–238, 2012.
 Z. Chen, “An overview of Bayesian methods for neural spike train analysis,” Computational Intelligence and Neuroscience, vol. 2013, Article ID 251905, 17 pages, 2013.
 S. W. Linderman, M. J. Johnson, M. A. Wilson, and Z. Chen, “A nonparametric Bayesian approach to uncovering rat hippocampal population codes during spatial navigation,” http://arxiv.org/pdf/1411.7706v1.pdf.
 J. Paisley and L. Carin, “Hidden Markov models with stick-breaking priors,” IEEE Transactions on Signal Processing, vol. 57, no. 10, pp. 3905–3917, 2009.
 E. B. Fox, E. B. Sudderth, M. I. Jordan, and A. S. Willsky, “An HDP-HMM for systems with state persistence,” in Proceedings of the 25th International Conference on Machine Learning, pp. 312–319, July 2008.
 M. J. Johnson and A. S. Willsky, “Bayesian nonparametric hidden semi-Markov models,” Journal of Machine Learning Research, vol. 14, pp. 673–701, 2013.
 J. Lapuyade-Lahorgue and W. Pieczynski, “Unsupervised segmentation of hidden semi-Markov nonstationary chains,” Signal Processing, vol. 92, no. 1, pp. 29–42, 2012.
Copyright
Copyright © 2015 Zhe Chen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.