Abstract

We consider functional data analysis when the observations at each location are functional rather than scalar. When the dynamics of the underlying functional-valued process at each location are of interest, it is desirable to recover partial derivatives of a sample function, especially from sparse and noise-contaminated measurements. We propose a novel approach based on estimating derivatives of eigenfunctions of marginal kernels to obtain a representation for the functional-valued process and its partial derivatives in a unified framework, in which the number of locations and the number of observations at each location for each individual can grow at any rate relative to the sample size. We derive almost sure rates of convergence for the procedures and further establish consistency results for the recovered partial derivatives.

1. Introduction

With the rapid advance of computational and analytical technology, many time-dynamic processes are monitored and recorded continuously during a time interval or intermittently at several discrete time points. Functional data analysis (FDA) is a powerful tool for the analysis and theory of data that are in the form of functions, images, shapes, or more general objects. Traditional functional data typically consist of a random sample of independent real-valued functions, which can be viewed as realizations of a one-dimensional stochastic process. A general introduction to the available methods in this field can be found in Ramsay and Silverman [1] and Wang et al. [2].

Many recent developments in FDA concern multivariate functional data and spatially indexed functional data. Chen and Müller [3] introduced a methodology for repeatedly observed, and thus dependent, functional data, covering the case where each curve is recorded on a regular and dense grid while the recording times of the repeated curves are often sparse and random.

We consider special situations where the observations at each location are functional rather than scalar. For s ∈ 𝒮 and t ∈ 𝒯, we consider the stochastic process X = {X(s,t) : s ∈ 𝒮, t ∈ 𝒯} and denote its value at time t by X(·, t), which is a square integrable random function with argument s. Chen et al. [4] proposed marginal FPCA and product FPCA models for X(s,t) and developed estimation methods and theoretical results under designs that are dense and regular in s. In practice, we may deal with functional data that are dense and random in the s direction; in these cases, presmoothing of the individual curves at each location is necessary. However, it is also possible that we are faced with sparse and random designs in s. Moreover, curves at some locations may be densely observed while curves at other locations are sparsely observed. In this paper, we aim to recover X(s,t) by estimating the bivariate mean function and the marginal covariance function and then performing FPCA, all in a unified framework. This unified framework allows the number of locations and the number of observations at each location for each individual to grow at any rate relative to the sample size. The proposed procedure thus avoids the challenging issue of classifying which scenario we face, and hence of deciding which methodology to use, when dealing with real data.

On the other hand, it is often of interest to recover derivatives of a sample of random functions, especially when the dynamics of the underlying processes are of interest. Since currently available statistical methods for estimating derivatives require densely observed data, it is quite challenging to recover derivatives from sparse functional data with noise-contaminated measurements. Liu and Müller [5] proposed an approach based on estimating derivatives of eigenfunctions to obtain a representation for the derivatives of a sample of sparsely observed one-dimensional functions. Our further aim in this paper is to recover the partial derivatives of the underlying functional-valued process at each location, that is, the dth partial derivative of X(s,t) with respect to s, which is denoted by X^(d)(s,t) = ∂^d X(s,t)/∂s^d. The whole procedure is likewise cast in a unified framework in which the functional data can be either densely or sparsely observed.

The article is organized as follows. In Section 2, we introduce the model and all estimation procedures for recovering both the functional-valued process and its partial derivatives. We establish the uniform almost sure convergence rates of the procedures in Section 3, where we also discuss the rates corresponding to some special scenarios. Issues related to the proposed procedures are discussed in Section 4. In Section 5, simulation studies are conducted to evaluate the performance of our procedures. All technical lemmas and proofs are collected in the Appendix.

2. Models and Estimation

2.1. Representations

Consider the process X = {X(s,t) : s ∈ 𝒮, t ∈ 𝒯} with mean function μ(s,t) = E{X(s,t)} for all s ∈ 𝒮 and t ∈ 𝒯 and covariance function

G((s,t), (s′,t′)) = Cov{X(s,t), X(s′,t′)}, s, s′ ∈ 𝒮, t, t′ ∈ 𝒯.

Chen et al. [4] proposed a representation

X(s,t) = μ(s,t) + Σ_{j=1}^∞ ξ_j(t) φ_j(s),   (2)

where the φ_j are the eigenfunctions of the operator in L²(𝒮) with kernel

G_S(s, s′) = ∫_𝒯 G((s,t), (s′,t)) dt,

ξ_j(t) = ∫_𝒮 {X(s,t) − μ(s,t)} φ_j(s) ds are the random coefficients of the expansion of the centred processes in {φ_j}_{j≥1}, and

ξ_j(t) = Σ_{k=1}^∞ χ_{jk} ψ_{jk}(t)

is the Karhunen–Loève expansion of the random functions ξ_j in L²(𝒯). Here, for each j, the ψ_{jk} are the eigenfunctions of the operator with kernel

C_j(t, t′) = Cov{ξ_j(t), ξ_j(t′)},

and χ_{jk} = ∫_𝒯 ξ_j(t) ψ_{jk}(t) dt are the FPC scores of ξ_j.

Based on the representation of X(s,t) shown in (2), we can write X^(d)(s,t) as

X^(d)(s,t) = μ^(d)(s,t) + Σ_{j=1}^∞ ξ_j(t) φ_j^(d)(s),

where μ^(d)(s,t) = ∂^d μ(s,t)/∂s^d is the dth partial derivative of μ(s,t) with respect to s and φ_j^(d) is the dth derivative of φ_j on 𝒮.

Denote by λ_1 ≥ λ_2 ≥ ⋯ ≥ 0 the eigenvalues of G_S; then the eigenfunctions φ_j are the solutions of the eigenequations

∫_𝒮 G_S(s, s′) φ_j(s′) ds′ = λ_j φ_j(s),

under the side conditions of unit norm and orthogonality to all previous eigenfunctions. Upon taking the dth derivative with respect to s on both sides of these eigenequations,

(∂^d/∂s^d) ∫_𝒮 G_S(s, s′) φ_j(s′) ds′ = λ_j φ_j^(d)(s).

If G_S^(d)(s, s′) = ∂^d G_S(s, s′)/∂s^d exists and is bounded and integrable for all s, s′ ∈ 𝒮, interchanging integration and differentiation leads to

φ_j^(d)(s) = λ_j^{-1} ∫_𝒮 G_S^(d)(s, s′) φ_j(s′) ds′.

One can then estimate derivatives by approximating X^(d)(s,t) with the truncated representation

X_{J,K}^(d)(s,t) = μ^(d)(s,t) + Σ_{j=1}^J Σ_{k=1}^{K_j} χ_{jk} ψ_{jk}(t) φ_j^(d)(s),

with finite J and K_1, …, K_J.
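Numerically, the plug-in relation above reduces to an eigendecomposition of a discretized kernel followed by a quadrature. The following sketch is illustrative only (a synthetic rank-two kernel with known eigenstructure; numpy assumed): it recovers φ_1′ from ∂G_S/∂s and checks it against the analytic derivative.

    import numpy as np

    # Toy check of phi_j^(d)(s) = lambda_j^{-1} * Int G_S^(d)(s, s') phi_j(s') ds' for d = 1
    s = np.linspace(0.0, 1.0, 201)
    ds = s[1] - s[0]
    phi1 = np.sqrt(2) * np.sin(np.pi * s)          # eigenvalue 2.0
    phi2 = np.sqrt(2) * np.sin(2 * np.pi * s)      # eigenvalue 0.5
    G_S = 2.0 * np.outer(phi1, phi1) + 0.5 * np.outer(phi2, phi2)

    # Eigendecomposition of the discretized integral operator
    lam, vecs = np.linalg.eigh(G_S * ds)
    order = np.argsort(lam)[::-1]
    lam, vecs = lam[order], vecs[:, order]
    phi_hat1 = vecs[:, 0] / np.sqrt(ds)            # L2-normalized eigenfunction
    phi_hat1 *= np.sign(phi_hat1[len(s) // 2])     # resolve the sign ambiguity

    # Differentiate the kernel in its first argument, then plug in
    dG_S = np.gradient(G_S, ds, axis=0)            # finite-difference d/ds G_S(s, s')
    dphi1 = (dG_S @ phi_hat1) * ds / lam[0]

    # Compare with the analytic derivative sqrt(2)*pi*cos(pi*s): small discretization error
    print(np.max(np.abs(dphi1 - np.sqrt(2) * np.pi * np.cos(np.pi * s))))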

2.2. Estimation

Time-indexed functional data consist of a sample of n independent subjects or units. For the ith subject, suppose we observe

Y_{ilm} = X_i(S_{il}, T_{im}) + ε_{ilm}, l = 1, …, L_i, m = 1, …, M_i,

which means that, at each time point T_{im}, m = 1, …, M_i, X_i(·, T_{im}) is recorded at a grid of functional points S_{il}, l = 1, …, L_i. Here, the ε_{ilm} are additive measurement errors, assumed to be iid with mean zero and finite variance σ², and assumed independent of all X_i, S_{il}, and T_{im}.
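For intuition, data of exactly this form can be simulated as follows; the process, basis functions, and parameters below are hypothetical choices for illustration, not the designs analyzed in this paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_subject(max_L=8, max_M=6, sigma=0.1):
        # Random rectangular design: L_i locations S_il and M_i time points T_im
        L, M = rng.integers(2, max_L + 1), rng.integers(2, max_M + 1)
        S, T = np.sort(rng.uniform(size=L)), np.sort(rng.uniform(size=M))
        # Toy process: X(s,t) = s*t + xi_1 psi_1(t) phi_1(s) + xi_2 psi_2(t) phi_2(s)
        xi1, xi2 = rng.normal(0, np.sqrt(2.0)), rng.normal(0, np.sqrt(0.5))
        X = (np.outer(S, T)
             + xi1 * np.outer(np.sqrt(2) * np.sin(np.pi * S), np.cos(2 * np.pi * T))
             + xi2 * np.outer(np.sqrt(2) * np.sin(2 * np.pi * S), np.sin(2 * np.pi * T)))
        Y = X + rng.normal(0.0, sigma, size=(L, M))   # iid additive noise
        return S, T, Y

    data = [sample_subject() for _ in range(100)]      # n = 100 subjects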

Our approach is based on the local-polynomial smoother [6].

Step 1. Estimation of the mean function and its partial derivatives.
For fixed and some bandwidths and , let , where and . Then, we obtain a smoothing estimator as
where is a symmetric probability density function on and . Here, the kernel function may differ between occasions. Local estimators of and are then given by
respectively.
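As one concrete instance of this step, the sketch below fits a bivariate local linear surface with a Gaussian product kernel and equal weight per observation; these are illustrative choices and need not match the weighting used above. The intercept estimates μ(s,t) and the s-slope estimates its first partial derivative; for the dth derivative one would raise the polynomial order in s to at least d and rescale the corresponding coefficient by d!.

    import numpy as np

    def gauss(u):
        return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

    def local_linear_mu(s0, t0, S, T, Y, h_s, h_t):
        """Weighted least squares fit of Y on (S - s0, T - t0) near (s0, t0);
        returns (mu_hat, d mu / d s hat) as the intercept and s-slope."""
        w = np.sqrt(gauss((S - s0) / h_s) * gauss((T - t0) / h_t))  # sqrt weights
        X = np.column_stack([np.ones_like(S), S - s0, T - t0])
        beta, *_ = np.linalg.lstsq(X * w[:, None], Y * w, rcond=None)
        return beta[0], beta[1]

    rng = np.random.default_rng(1)
    S, T = rng.uniform(size=2000), rng.uniform(size=2000)
    Y = S * T + rng.normal(0.0, 0.1, size=2000)    # true mu(s,t) = s*t
    print(local_linear_mu(0.5, 0.5, S, T, Y, 0.1, 0.1))  # roughly (0.25, 0.5)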

Step 2. Estimation of , , and .
Note that
where . On the other hand, if exists for all and is bounded and integrable for all , then
where . Thus, in order to estimate and , we first estimate and .
To this end, we estimate and based on the following procedure. For fixed points and some bandwidths , , and (for d = 0, we choose ), let , where , , and . Then, we obtain an estimator as
where . Smoothing estimators of and are then given by
respectively. We can then obtain that
To estimate , we first estimate by , where
with some bandwidths and . Then, is estimated in the same way as , but with and ; we then estimate by
where |𝒮| and |𝒯| are the Lebesgue measures of 𝒮 and 𝒯, respectively.
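For orientation, the kind of identity this step exploits can be written, in the notation of Section 2.1, as

G_S(s, s′) = ∫_𝒯 Cov{X(s,t), X(s′,t)} dt = ∫_𝒯 E{X(s,t) X(s′,t)} dt − ∫_𝒯 μ(s,t) μ(s′,t) dt,

so a bivariate smoother applied to the raw cross-products Y_{ilm} Y_{il′m} with l ≠ l′ (same time point, distinct locations, whose noise terms are independent and hence do not bias the cross-products) targets the first integral, plugging in μ̂ from Step 1 handles the second, and Ĝ_S^(d) follows by differentiating in the first argument. This is a schematic reading of the construction, not the exact weighting displayed above.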

Remark 1. In practice, the empirical estimator of [4] can be used and retains its convergence rate for designs that are dense and regular in s, that is, when all curves are observed at the same regular grid of locations. On the other hand, after presmoothing the individual curves, the empirical estimator is also applicable to designs that are dense and random in s, and as the designs get denser, the overall convergence rate is maintained under appropriate regularity conditions. Under these circumstances, a further smoothing estimator can also be obtained based on the empirical estimators. Similar results hold for the estimation of the mean function.
However, in practice, it is possible that some sample curves are densely observed while others are sparsely observed in the s direction. Moreover, in dealing with real data, it may even be difficult to classify which scenario we are faced with and hence to decide which methodology to use.

Step 3. Estimation of the eigenfunctions φ_j and their derivatives φ_j^(d) and of the eigenvalues λ_j of the operator in L²(𝒮) with kernel G_S, as well as estimation of the FPC functions ξ_j(t).
The estimated eigenfunctions φ̂_j and estimated eigenvalues λ̂_j can be obtained by standard methods for computing the eigenvalues and eigenfunctions of an integral operator with a symmetric kernel. Then, we have

φ̂_j^(d)(s) = λ̂_j^{-1} ∫_𝒮 Ĝ_S^(d)(s, s′) φ̂_j(s′) ds′.

For designs that are dense in s, one can obtain the ξ̂_{ij}(t) by interpolating numerical approximations of the integrals

ξ̂_{ij}(t) = ∫_𝒮 {X_i(s,t) − μ̂(s,t)} φ̂_j(s) ds.

On the other hand, for designs that are sparse in s, one can estimate the ξ_{ij}(t) by the PACE approach [7].
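A minimal numerical sketch of this step, assuming the estimated kernel is available on an equispaced grid over 𝒮 (all inputs below are synthetic):

    import numpy as np

    def marginal_fpca(G_hat, ds, n_comp):
        """Eigenvalues/eigenfunctions of the integral operator with kernel G_hat."""
        lam, vecs = np.linalg.eigh(G_hat * ds)           # discretized operator
        order = np.argsort(lam)[::-1][:n_comp]
        return lam[order], vecs[:, order] / np.sqrt(ds)  # L2-normalized phi_j

    def scores_by_integration(x_centered, phi, ds):
        """xi_j(t) = Int {X(s,t) - mu(s,t)} phi_j(s) ds, by the rectangle rule."""
        return phi.T @ x_centered * ds

    # Toy check with a rank-one kernel whose eigenpair is known
    s = np.linspace(0.0, 1.0, 201)
    ds = s[1] - s[0]
    phi1 = np.sqrt(2) * np.sin(np.pi * s)
    lam, phi = marginal_fpca(2.0 * np.outer(phi1, phi1), ds, 1)
    print(lam[0], scores_by_integration(3.0 * phi1, phi, ds))  # ~2.0 and ~[+/-3.0]

The printed score carries the usual sign indeterminacy of eigenfunctions; for designs that are sparse in s, conditional-expectation (PACE) scores replace the rectangle-rule integral, as noted above.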

Step 4. Estimation of the eigenfunctions ψ_{jk} of the operator with kernel C_j and of the FPC scores χ_{ijk}.
This is a standard FPCA of the one-dimensional processes ξ_j(t). For each fixed j, one obtains estimates of the FPC scores χ_{ijk} and eigenfunctions ψ_{jk}, for designs that are dense in t [1] and for designs that are sparse in t [7]. One can also adapt the approach of Li and Hsing [8], which is suitable for both sparse and dense functional data, to the one-dimensional processes ξ̂_{ij}(t).
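The dense-in-t case can be sketched as follows, with the hypothetical input Xi holding the recovered curves ξ̂_{ij}(t) of a fixed j on a common grid (one row per subject):

    import numpy as np

    def fpca_1d(Xi, dt, n_comp):
        """Sample-covariance FPCA of curves on a common grid; returns
        eigenvalues, eigenfunctions psi_jk, and FPC scores chi_ijk."""
        Xi_c = Xi - Xi.mean(axis=0)          # xi_j is centred in theory; centre anyway
        C_hat = Xi_c.T @ Xi_c / Xi.shape[0]  # estimate of the kernel C_j(t, t')
        lam, vecs = np.linalg.eigh(C_hat * dt)
        order = np.argsort(lam)[::-1][:n_comp]
        psi = vecs[:, order] / np.sqrt(dt)   # L2-normalized psi_jk
        chi = Xi_c @ psi * dt                # chi_ijk = Int xi_ij(t) psi_jk(t) dt
        return lam[order], psi, chi

    rng = np.random.default_rng(2)
    t = np.linspace(0.0, 1.0, 101)
    dt = t[1] - t[0]
    psi_true = np.sqrt(2) * np.cos(np.pi * t)
    Xi = rng.normal(0.0, 1.0, size=(200, 1)) @ psi_true[None, :]
    lam, psi, chi = fpca_1d(Xi, dt, 1)
    print(lam[0], np.var(chi))               # both close to 1.0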
In this step, for each j, we are able to approximate ξ_{ij}(t) by

ξ̂_{ij}(t) = Σ_{k=1}^{K_j} χ̂_{ijk} ψ̂_{jk}(t).
After selecting appropriate numbers of included components J and K_j, j = 1, …, J, we obtain the overall representation

X̂_i^(d)(s,t) = μ̂^(d)(s,t) + Σ_{j=1}^J Σ_{k=1}^{K_j} χ̂_{ijk} ψ̂_{jk}(t) φ̂_j^(d)(s).

The numbers of included components J and K_j, j = 1, …, J, can be selected via a variety of methods, including the fraction of variance explained (FVE) criterion [8], leave-one-curve-out cross-validation [9], pseudo-AIC [10], and pseudo-BIC [7, 11]. We illustrate these procedures in Section 4.
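Among these, the FVE rule is the simplest to state; a minimal sketch with illustrative eigenvalues:

    import numpy as np

    def select_by_fve(eigvals, threshold=0.95):
        """Smallest J such that the first J eigenvalues explain >= threshold of
        the total variance; eigvals must be sorted in decreasing order."""
        fve = np.cumsum(eigvals) / np.sum(eigvals)
        return int(np.searchsorted(fve, threshold) + 1)

    print(select_by_fve(np.array([5.0, 2.0, 1.0, 0.5, 0.3])))  # -> 4

The same rule can be applied once to the estimated marginal eigenvalues to choose J and then to the eigenvalues of each estimated C_j to choose K_j.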

3. Asymptotic Theory

We first define the notation and conditions to be used. The numbers of observations M_i and L_i may depend on n as well, namely, M_i = M_i(n) and L_i = L_i(n); however, for simplicity, we continue to write M_i and L_i. Define

For any bandwidths , we also define

From now on, without loss of generality, we assume that the domain of the process is 𝒮 × 𝒯 = [0,1] × [0,1]. Some assumptions needed for the asymptotic theory are as follows. We use C as a generic constant that can take different values at different places.

Now, we state the assumptions:

Assumption 1. All second-order partial derivatives of μ are uniformly continuous and bounded on 𝒮 × 𝒯. Furthermore, μ^(d)(s,t) exists and is uniformly continuous and bounded on 𝒮 × 𝒯.

Assumption 2. All second-order partial derivatives of D are uniformly continuous and bounded on its domain. Furthermore, the dth-order partial derivative of D with respect to s exists and is uniformly continuous and bounded on its domain.

Assumption 3. Let and be the density functions of S and T, respectively. Assume that both are bounded away from 0 and that and hold for some positive constants and .

Assumption 4. Let be the joint density function of and be the joint density function of . Both are bounded above and bounded away from 0. Furthermore, assume that both have continuous and bounded second-order derivatives uniformly on their domains.

Assumption 5. and for some . as , and

Assumption 6. and for some . as and .

Assumption 7. and for some . as and

Assumption 8. Assume that the autocovariance operator generated by is positive definite, such that its eigenvalues satisfy and its eigenfunctions satisfy .

Assumption 9. Let K(·) be a symmetric probability density function on , . Define
Assume that is a nonsingular matrix.
Assumptions (1) and (2) are regular smoothness conditions on the mean function μ and the covariance function D. Since we do not impose any parametric structure on the distribution of X, assumptions (3) and (4) are required for the derivation of uniform convergence. The moment conditions in (5)–(7) are similar to those in (C.5)–(C.7) of Li and Hsing [8] and hold rather generally. Assumption (8) is similar to condition (B4) in Liu and Müller [5] and is needed for Theorem 4. When , the standard normal density is an example of a kernel satisfying (13).

3.1. Uniform Convergence Rates of μ̂ and μ̂^(d)

We establish the uniform convergence rates of μ̂ and μ̂^(d). First, we give some definitions and notation. For , , and , let

By some simple algebra, we can obtain that satisfies
where and

It then follows that and .

Theorem 1. Under assumptions (1), (3)–(5), and (9), let , then

Remark 2. We discuss special cases of Theorem 1 under dense or sparse designs.

(1) For designs that are sparse in both s and t: if both and are bounded, then for all and . Thus, Theorem 1 implies that
By choosing and satisfying , can achieve its optimal convergence rate . Moreover, can also achieve its optimal convergence rate by further choosing , that is, .

(2) For designs that are dense in both s and t: if and , satisfying and , then for all and , and . Thus, Theorem 1 yields that
Moreover, if Assumption (5) is replaced by a stronger version, in which we assume that and are bounded, then, if and satisfy and , Theorem 1 implies that

(3) For designs that are sparse in s and dense in t: if is bounded and , where , then , that is,
Furthermore, if and , then Theorem 1 shows that

(4) For designs that are dense in s and sparse in t: if is bounded and , where , then , that is,
Furthermore, if and , then Theorem 1 shows that

3.2. Uniform Convergence Rates of Ĝ_S, Ĝ_S^(d), and σ̂²

We next establish the convergence rates of Ĝ_S, Ĝ_S^(d), and σ̂². Since the marginal quantities integrate over t, this integration results in extra smoothing, which leads to faster convergence rates for Ĝ_S and Ĝ_S^(d) than for the corresponding unintegrated surface estimators, respectively. A similar conclusion holds for σ̂².

For , , and , let
then we can obtain that
with and

Theorem 2. Under assumptions (1)–(6) and (9), let , then

Remark 3. We discuss special cases of Theorem 2. In fact, the convergence rate in Theorem 2 is not affected by whether the design is dense or sparse in t; hence, we only discuss the different designs with respect to s.

(1) For designs that are sparse in s: if is bounded, then
On the other hand, since
we choose and . Whether is chosen as in Remark 2(1) or as in Remark 2(3), Theorem 2 implies that

(2) For designs that are dense in s: if , then
Assume that and , where and . If Assumption (5) is replaced by a stronger version, in which we assume that and are bounded, then Theorem 2 implies that

Theorem 3. Under assumptions (1)–(7) and (9),

Remark 4. As in Remark 3, we discuss the convergence rate of under special cases.

(1) For designs that are sparse in s: if is bounded, then
which results in

(2) For designs that are dense in s: if , satisfying , then
If , then

3.3. Uniform Convergence Rates in FPCA

We next establish the convergence rates in the FPCA. Let J be a fixed positive integer.

Theorem 4. Under assumptions (1)–(9), for :
(1)
(2)
(3)
(4)
(5) For each ,

The consistency of guarantees the appropriateness of the estimation procedures in Step 4. The proofs of the theorems are given in the Appendix.

4. Related Issues

In this section, we discuss a few issues related to the implementation of our proposed methods.

4.1. Selection of Bandwidths

The performance of the estimators depends on the choice of bandwidths for and , and the best bandwidths vary with the M_i and L_i. Bandwidth selection turns out to be very challenging and is hence an important problem for future research. Lacking a better approach, we suggest picking the bandwidths by minimizing the integrated mean squared error (IMSE): for each function above, one calculates the IMSE over a range of candidate bandwidths and selects the one that minimizes it.
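Since the IMSE is computable only when the truth is known (e.g., in simulations), leave-one-curve-out cross-validation is a natural data-driven surrogate. The sketch below scores candidate bandwidths by out-of-curve prediction error, reusing the toy local linear smoother from the Step 1 sketch; the Gaussian kernel and the common bandwidth for both directions are illustrative assumptions.

    import numpy as np

    def gauss(u):
        return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

    def local_linear_mu(s0, t0, S, T, Y, h_s, h_t):
        w = np.sqrt(gauss((S - s0) / h_s) * gauss((T - t0) / h_t))
        X = np.column_stack([np.ones_like(S), S - s0, T - t0])
        beta, *_ = np.linalg.lstsq(X * w[:, None], Y * w, rcond=None)
        return beta[0]

    def cv_score(h_s, h_t, subjects):
        """Leave-one-curve-out prediction error of the mean estimate;
        subjects is a list of flat (S_i, T_i, Y_i) triples, one per subject."""
        err = 0.0
        for i, (Si, Ti, Yi) in enumerate(subjects):
            rest = [sub for k, sub in enumerate(subjects) if k != i]
            S = np.concatenate([r[0] for r in rest])
            T = np.concatenate([r[1] for r in rest])
            Y = np.concatenate([r[2] for r in rest])
            pred = np.array([local_linear_mu(a, b, S, T, Y, h_s, h_t)
                             for a, b in zip(Si, Ti)])
            err += np.mean((Yi - pred) ** 2)
        return err / len(subjects)

    rng = np.random.default_rng(3)
    subjects = []
    for _ in range(20):
        S, T = rng.uniform(size=10), rng.uniform(size=10)
        subjects.append((S, T, S * T + rng.normal(0.0, 0.1, size=10)))
    # prints the CV-selected common bandwidth from the candidate grid
    print(min([0.05, 0.1, 0.2, 0.4], key=lambda h: cv_score(h, h, subjects)))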

4.2. Selection of the K_j and J in the Overall Representations (25) and (26)

In practice, the choice of the number of components J and the K_j to be included in (25) can be based on the leave-one-curve-out cross-validation method [9] or on the fraction of variance explained (FVE) by the first J components [4]. One can also adopt AIC [10]- or BIC [11]-type criteria; see Yao et al. [7] for one-dimensional functional data.

For bivariate functional data, a pseudo-Gaussian log-likelihood is given by
where . One can choose J by minimizing (resp., ) with respect to J.

For each , define

The number of components is selected by minimizing (resp., ) with respect to .
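Schematically, such criteria are minimized over candidate truncations as below; the residual sums of squares, variance estimate, and penalty form are illustrative placeholders rather than the exact formulas above.

    import numpy as np

    def pseudo_ic(rss, n_params, N, sigma2, kind="BIC"):
        """Pseudo-Gaussian information criterion: -2 log-likelihood + penalty,
        with rss the residual sum of squares of the truncated fit over all N
        pooled observations and sigma2 the error-variance estimate."""
        loglik = -0.5 * N * np.log(2.0 * np.pi * sigma2) - rss / (2.0 * sigma2)
        penalty = np.log(N) if kind == "BIC" else 2.0
        return -2.0 * loglik + penalty * n_params

    rss = {1: 310.0, 2: 240.0, 3: 236.0}     # illustrative residuals per J
    print(min(rss, key=lambda J: pseudo_ic(rss[J], J, 1000, 0.25)))  # -> 3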

Appendix

This is a five-part appendix organized as follows. Appendix A states the technical lemmas needed for our main results; their proofs are lengthy and tedious and are therefore provided in the online supplementary material. Appendices B–E provide the proofs of Theorems 1–4, respectively.

A. Technical Lemmas

The technical lemmas needed for our main results are stated below.

Lemma A.1. Let or for , , and . Suppose, for some , that
Define
and . Let and be any positive sequences tending to 0, and . Then, if , we have

The proof of Lemma A.1 is provided in the supplementary material to save space.

Lemma A.2. Let be as in Lemma A.1 and assume that (A.1) holds. For bandwidths and and nonnegative integers p and q, let
Let , and assume that , , and ; then we have

The proof of Lemma A.2 is provided in the supplementary material to save space.

Lemma A.3. Let , , or for , , and . Suppose, for some , that
Define
and . Let , , and be any positive sequences tending to 0, and . Then, if , we have

The proof of Lemma A.3 is provided in the supplementary material to save space.

Lemma A.4. Let be as in Lemma A.3 and assume that (A.6) holds. For bandwidths , , and and nonnegative integers p, q, and r, let
where . Let , and assume that , , , and ; then we have

The proof of Lemma A.4 is provided in the supplementary material to save space.

Lemma A.5. Let be as in Lemma A.1 and assume that (A.1) holds. For bandwidths and nonnegative integers p, let
Then, we have
Let , and assume that and ; then we have

The proof of Lemma A.5 is provided in the supplementary material to save space.

Lemma A.6. Let be as in Lemma A.3 and assume that (A.6) holds. For bandwidths and and nonnegative integers p and q, let
where .
Let , and assume that , , and ; then we have
Let , and assume that and ; then we have

The proof of Lemma A.6 is provided in the supplementary material to save space.

B. Proof of Theorem 1

Recall that and . Note that we can write
where, for , ,

By Taylor's expansion and Lemma A.2, uniformly in ,
where , from which we conclude that, uniformly in ,

We next consider . For any interior point ,
where is the joint density of and . Since is symmetric, we can further obtain that
and hence, uniformly for ,

Thus, uniformly for , . The same rate can be achieved for boundary points. Note that
and thus Theorem 1 holds.

C. Proof of Theorem 2

Recall that

To bound and , we consider and first.

Note that

Now write
where, for , , and .

By Taylor's expansion and Lemma A.4, uniformly in ,
where . Thus, uniformly in ,

We next consider . For any interior point ,
where is the joint density function of , , and T, and is bounded away from 0. Hence, uniformly for ,
and thus,

The same rate can be achieved for boundary points, from which we conclude that, uniformly for ,

Now we consider and . Recall that

Since
we consider the first and the second terms on the right-hand side of (C.13) separately. Now recall that is the first component of ; then, by (C.8), uniformly for ,
where is the th component of . Note that

Since, by Lemma A.6, uniformly for ,
it follows from (C.14) that, uniformly for ,

We next look into the second term on the right-hand side of (C.13). By Lemma A.5 and a derivation similar to that leading to (C.17), uniformly for ,

Thus, combining (C.13), (C.17), and (C.18) leads to (E.6), which is

By mirroring the derivations above, we can prove (E.8), which is

Theorem 2 holds.

D. Proof of Theorem 3

Let be the integral operator with kernel . The following Lemma A.7 is needed for the proofs of Theorem 3 and Theorem 4.

Lemma A.7. For any bounded measurable function ψ on ,

The proof of Lemma A.7 is provided in the supplementary material.

Proof of Theorem 3. Since for all , ,
note that is a special case of in the proof of Lemma A.7 with , , and . Then, according to the proof of Lemma A.7, we can obtain that has the same rate as , with and .
According to the expression (C.5) of , we can obtain that
Obviously, and , which together with Lemma A.6 leads to
which is also the rate of .
On the other hand, based on the definition of , by applying arguments similar to the proofs of Lemmas A.1, A.2, and A.5 and Theorem 2 to , it is easy to show that
Since and , combining (D.2)–(D.5) leads to Theorem 3.

E. Proof of Theorem 4

(1) By the expansion in [12] and Bessel's inequality, we have, for some constant C,
where is the Hilbert–Schmidt norm of . Then, it follows from Theorem 2 and Lemma A.7 that
and hence
Theorem 4(1) holds.
(2) By (4.9) in Hall et al. [13],
Similarly to the argument leading to the rate of in Lemma A.7, we can obtain that
Next, we write
and similarly, we can show that
Combining (E.2) and (E.5)–(E.8) proves Theorem 4(2).
(3) For any ,
By the Cauchy–Schwarz inequality, uniformly for all ,
Thus, by Lemma A.7, we have, uniformly for all ,
By the triangle inequality and Theorem 4(2),
Theorem 4(3) holds.
(4) Note that, for all ,
By the Cauchy–Schwarz inequality, uniformly for all ,
On the other hand, by an argument similar to that in the proof of Lemma A.7, we can show that, uniformly for all ,
It then follows from Theorem 4(2) that
which proves Theorem 4(4).
(5) The uniform consistency of is straightforward, and the detailed discussion is omitted.
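For the reader's convenience, the operator-perturbation inequalities invoked in parts (1) and (2) above take the following standard form for compact self-adjoint operators (cf. (4.9) in Hall et al. [13]). With δ_j denoting the minimal gap among the first j + 1 eigenvalues of G_S (positive under Assumption 8),

|λ̂_j − λ_j| ≤ ‖Ĝ_S − G_S‖_op ≤ ‖Ĝ_S − G_S‖_HS,  ‖φ̂_j − φ_j‖ ≤ (2√2/δ_j) ‖Ĝ_S − G_S‖_HS,

so, for fixed j, each eigenvalue and eigenfunction inherits the uniform almost sure rate of Ĝ_S established in Theorem 2, with constants depending on the spectral gap.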

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

Ma's and Zhou's research was partially supported by the Fundamental Research Funds for the Central Universities (nos. JBK140507 and JBK1806002) of China. Zhou's research was also partially supported by the National Natural Science Foundation of China (NSFC) (no. 11571282).

Supplementary Materials

Some technical lemmas needed for our main results are stated and proved in the Supplementary Material.