Statistical Properties of Eyegaze Trace Data

F. Murtagh and M. Farid {f.murtagh, m.farid}


We confirm the fractal nature of the eyegaze trace data studied, finding a Hurst parameter value of 0.66. We derive the log-log variance-scale plot from the Haar à trous wavelet transform. We verify that the variance is finite.

1. Background

How predictable is eye-gaze behavior when, for example, an expert radiologist examines a mammogram? To answer this we need to find common patterns in fixation and saccade traces.

Self-similarity and fractal behaviour have become common modeling tools in recent years for telecommunications traffic, among other fields. Willinger et al. (1995) present a comprehensive overview. Other properties of such data are closely associated: long-tailed distributions, long memory processes, and burstiness. All of these properties have profound implications for our ability to understand what is going on at a given time-point (i.e. nowcasting) in regard to telecommunications traffic, or to establish the limits to how far we can predict the behaviour of such data. Some work of ours which discusses aspects of this (establishing fractal behavior, decomposing with a wavelet transform, fitting a dynamic recurrent connectionist model for forecasting) can be found in Aussem and Murtagh (1998).

Classic records of eye movement graphs were given by Yarbus (1967), where he showed for example the fixation points and saccades during free examination by a subject of a picture, or a face. Battro (1992) has investigated self-similar behavior in eye movement data from an experimental psychology perspective. He presents a fascinating review of his "microdiscovery" of fractal understanding of such data in his web text (see URL following citation to Battro, 1992). His experimental findings are that there is fractal behavior in the amplitude of saccades, that the angle of saccades is isotropic, and that a collection of saccades can be self-similar in time.

We will use a set of 29716 raw time and vertical position values. Vertical and horizontal eye-gaze positions are strongly correlated insofar as they are both based on the fixation point. However we will make a working assumption that in saccadic movement, and in random movements around a given fixation point, the vertical and horizontal eye-gaze positions are uncorrelated. On the basis of this assumption, we will examine vertical position data alone. The time sequence was assembled from consecutive but non-contiguous time segments. Without loss of generality, we will take the time sequence as a single one.

2. Data Properties: One Eyegaze Positional Coordinate

The data used, and the first order differenced signal, are shown in the following figure.

To see how burstiness appears to exist on all time scales, the following figures shows: top - original differenced data; middle - rebinned data with 1 time step corresponding to 5 in the top data; bottom - rebinned data with 1 time step corresponding to 5 in the middle data, or 1 time step corresponding to 25 in the top data.

Burstiness on all (spatial or time) scales is as opposed to Poisson random shot noise which becomes increasingly uniform through the smoothing effect of decreasing scale. Burstiness is associated with self-similar or fractal behavior. One measure of self-similarity is the Hurst parameter, which we will now quantify. We will use a wavelet transform to do this (see Abry et al., 1998).

We use the Haar à trous wavelet transform algorithm because of our interest in later implementation in a real-time control setting. This redundant transform was described in Zheng et al. (1999) and in Renaud et al. (2002). The following figure shows the original (differenced signal) data in the upper left, followed by resolution scales 1 and 2 of the wavelet transform in row 1. There follow resolution scales 3 through 13. The final smooth version of the data is in the bottom right. A simple addition of all resolution levels, plus the final smoothed version of the data, allows the input data (top left) to be exactly reconstructed. The Haar à trous transfrom is implemented asymmetrically, from right (final time point) to left. From the smoothed (bottom right) plot, we see that the transformed data is unusable for early time points. Therefore we will use transform signals from time point 10000 onwards. The resolution scale signals remain close to zero mean, having done this.

A log-log plot of the variances of the resolution scales versus aggregation factor is shown in the following figure. It is relatively linear. A least squares fit gives its slope as -1.5458. Hence beta = 1.5458, and the Hurst parameter is H = 1 - beta/2 = 0.2271. It is usually considered necessary for 0.5 < H < 1 for self-similarity to hold. We therefore find that the fractal nature of this data set is small. A plot of a histrogram also shows that heavy-tailed behavior, also a characteristic of self-similar data, is not in evidence. The data used in this first analysis is often outside the viewing area boundary. As such, the fact that we find no fractal behavior here is appropriate if the data has not been properly selected to begin with.

We note that Hagiwara et al. (2001) find that H changes frequently in real network traffic, underlining the importance of having a fast method to estimate it. We will investigate the feasibility of this, based on the Haar à trous wavelet transform which is potentially very efficient.

We looked at a number of noise models for the data, including additive Gaussian, multiplicative Gaussian, and non-stationary. A Poisson model seems to perform best. The following figure shows the original data, a filtered signal using a Poisson noise model, and the difference between the two.

3. Two Eyegaze Positional Coordinates

A different data set is now used. The following figure shows the horizontal and vertical traces. The scene observed by the subject was a 9-point calibration grid, arranged as a 3 x 3 point grid.

We will now examine the distances between successive eyegaze locations. If two successive eyegaze positions are (x1, y1) and (x2, y2), then the succession of values used are: sqrt[ (x1-x2)^2 + (y1-y2)^2 ]. A 9-level Haar à trous wavelet transform is shown in the following. The first (i.e., upper left) panel shows the input data, followed by (from left to right, and from top to bottom) the sequence of wavelet scales. The Haar wavelet function is clearly visible at some of these scales.

A log-log plot of the variances of the resolution scales versus aggregation factor is shown in the following figure. The initial 1000 values are ignored due to ill-defined transform values in this interval. A least squares linear fit gives its slope as -0.6737. Hence beta = 0.6737, and the Hurst parameter is H = 1 - beta/2 = 0.66315. Since self-similarity is associated with 0.5 < H < 1 we therefore confirm the fractal nature of this data set. A plot of a histrogram confirms heavy-tailed behavior.

Finally, we will look at the cleaning of this data. The following figure shows from top to bottom: horizontal trace; median filter applied with a length 17 kernel; vertical trace; and a median filter applied again with a kernel length of 17. To further clean these data, it would be a straightforward objective to quantize (i.e., cluster) each trace. Fitting a Gaussian mixture to the 1-dimensional signals would be appropriate. This would provide a clear, cluster-wise straight line, representation of the traces.

4. Finite or Infinite Variance?

Tsakalides et al. (2000) discuss the property that in symmetric alpha-stable distributions, moments of order less than beta exist, i.e. E abs(X)^p < infinity, for p < beta. For a value of beta as found above, we should find finite variance. As discussed in Tsakalides et al. (2000), an infinite variance is as much to be feared as the Gaussian distribution of infinite support, i.e. it may be fully satisfactory as a practial model. A test for infinite or finite variance is to plot the sample variance based on n observations against n. If the data are from a population with finite variance, then by the law of large numbers, the sample variance should converge to the population variance as n increases. Otherwise we will find oscillation, with typical large jumps. The following figure shows plots for 3000 values of n, starting with value 1000, and with value 4000. We find large jumps, but these are caused by the spikes in the given data. Corresponding plots for median-smoothed data (kernel 17) are shown next. Here we verify finite variance.

5. Signal Prediction

A straightfoward autoregressive model was fit to the first 3798 values of the signal (7597 values in total). Order 10, AR(10), was found to be best, giving a standard deviation prediction error of 63.25.

An autoregressive model by scale, for each of 9 scales resulting from a Haar à trous transform, assuming a non-stationary signal, and using an AR(1) model at each scale, gave a standard deviation prediction error of 54.51.

The best multiresolution modeling, based on the method described in Renaud et al. (2002) gave a standard deviation prediction error of 62.71. This was for an AR(2) multiresolution model.


  1. P. Abry, D. Veitch and P. Flandrin, "Long-range dependence: revisting aggregation with wavelets", Journal of Time Series Analysis, 19, 253-266, 1998.
  2. A. Aussem and F. Murtagh, "A neuro-wavelet strategy for Web traffic forecasting", in Research in Official Statistics (published by Eurostat, Statistical Office of the European Communities), Vol. 1, No. 1, 65-87, 1998.
  3. A.M. Battro, "La temperatura de la mirada", Laboratorio de Investigaciones Sensoriales, Conicet, Buenos Aires, 1992. See also A.M. Battro, "A fractal story",
  4. R.S. Blum, Y. Zhang, B.M. Sadler and R.J. Kozick, ``On the approximation of correlated non-Gaussian noise PDFs using Gaussian mixture models, Conference on the Applications of Heavy Tailed Distributions in Economics, Engineering and Statistics, June, 1999, American University, Washington DC.
  5. T. Hagiwara, H. Doi, H. Tode and H. Ikeda, "High-speed calculation method of the Hurst parameter based on real traffic", IEICE Trans. Inf. & Syst., E84-D, 578-587, 2001.
  6. S.A. Kalim and L. Sacks, "An investigaion using wavelet analysis to detect a change in the characteristics of self-similar traffic", 4 pp.
  7. O. Renaud, J.L. Starck and F. Murtagh, "Prediction based on a multiscale decomposition", preprint, 2002.
  8. P. Tsakalides, P. Reveliotis and C.L. Nikias, "Scalar quantisation of heavy-tailed signals", IEE Vis. Image Signal Process., 147, 475-484, 2000.
  9. W. Willinger, M. Taqqu, W.E. Leland and D. Wilson, "Self-similarity in high-speed packed traffic: analysis and modeling of Ethernet traffic measurements", Statistical Science, 10, 67-85, 1995.
  10. A.L. Yarbus, Eye Movements and Vision, Plenum, New York, 1967.
  11. G. Zheng, J.L. Starck, J.G. Campbell and F. Murtagh, "Multiscale transforms for filtering financial data streams", Journal of Computational Intelligence in Finance, 7, 18-35, 1999.