11.2 Statistics of vectors

We turn now to the trickier problem of sets of measured vectors. We will consider the case in which all vectors are assumed to have a length of one, i.e., these are unit vectors. Unit vectors are just “directions”. Paleomagnetic directional data are subject to a number of factors that lead to scatter. These include:

  1. uncertainty in the measurement caused by instrument noise or sample alignment errors,
  2. uncertainties in sample orientation,
  3. uncertainty in the orientation of the sampled rock unit,
  4. variations among samples in the degree of removal of a secondary component,
  5. uncertainty caused by the process of magnetization,
  6. secular variation of the Earth’s magnetic field, and
  7. lightning strikes.

Some of these sources of scatter (e.g., items 1, 2 and perhaps 6 above) lead to a symmetric distribution about a mean direction. Other sources of scatter contribute to distributions that are wider in one direction than another. For example, in the extreme case, item 4 leads to a girdle distribution whereby directions are smeared along a great circle. It would be handy to be able to calculate a mean direction for data sets and to quantify the scatter.



Figure 11.2: Hypothetical data sets drawn from Fisher distributions with vertical true directions with κ = 5 (a-c), κ = 10 (d-f), κ = 50 (g-i). Estimated D, I, κ, α95 shown in insets.


In order to calculate mean directions with confidence limits, paleomagnetists rely heavily on the special statistics known as Fisher statistics (Fisher, 1953), which were developed for assessing dispersion of unit vectors on a sphere. They are applicable to directional data that are dispersed in a symmetric manner about the true direction. We show some examples of such data in Figure 11.2, with scatter ranging from high in the top row to rather concentrated in the bottom row. All the data sets were drawn from a Fisher distribution with a vertical true direction.

In most instances, paleomagnetists assume a Fisher distribution for their data because the statistical treatment allows calculation of confidence intervals, comparison of mean directions, comparison of scatter, etc. The average inclination, calculated as the arithmetic mean of the inclinations, will never be vertical unless all the inclinations are vertical. In the following, we will demonstrate the proper way to calculate mean directions and confidence regions for directional data that are distributed in the manner shown in Figure 11.2. We will also briefly describe several useful statistical tests that are popular in the paleomagnetic literature.

11.2.1 Estimation of Fisher statistics

R. A. Fisher developed a probability density function applicable to many paleomagnetic directional data sets, known as the Fisher distribution (Fisher, 1953). In Fisher statistics each direction is given unit weight and is represented by a point on a sphere of unit radius. The Fisher distribution function P_dA(α) gives the probability per unit angular area of finding a direction within an angular area, dA, centered at an angle α from the true mean. The angular area, dA, is expressed in steradians, with the total angular area of a sphere being 4π steradians. Directions are distributed according to the Fisher probability density, given by:

$$P_{dA}(\alpha) = \frac{\kappa}{4\pi\sinh\kappa}\exp(\kappa\cos\alpha), \tag{11.3}$$

where α is the angle between the unit vector and the true direction and κ is a precision parameter such that as κ →∞, dispersion goes to zero.

We can see in Figure 11.3a the probability of finding a direction within an angular area dA centered α degrees away from the true mean for different values of κ. κ is a measure of the concentration of the distribution about the true mean direction. The larger the value of κ, the more concentrated the directions; κ is 0 for a distribution of directions that is uniform over the sphere and approaches ∞ for directions concentrated at a point.



Figure 11.3: a) Probability of finding a direction within an angular area, dA centered at an angle α from the true mean. b) Probability of finding a direction at angle α away from the true mean direction.


If ϕ is taken as the azimuthal angle about the true mean direction, the probability of a direction within an angular area, dA, can be expressed as

$$P_{dA}(\alpha)\,dA = P_{dA}(\alpha)\sin(\alpha)\,d\alpha\,d\phi.$$

The sin α term arises because the area of a band of width dα varies as sin α. It should be understood that the Fisher distribution is normalized so that

$$\int_{\phi=0}^{2\pi}\int_{\alpha=0}^{\pi} P_{dA}(\alpha)\sin(\alpha)\,d\alpha\,d\phi = 1. \tag{11.4}$$

Equation 11.4 simply indicates that the probability of finding a direction somewhere on the unit sphere must be unity. The probability of finding a direction in a band of width dα between α and α + dα is given by:

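$$\begin{aligned}
P_{d\alpha}(\alpha)\,d\alpha &= \int_{\phi=0}^{2\pi} P_{dA}(\alpha)\,dA = 2\pi P_{dA}(\alpha)\sin(\alpha)\,d\alpha \\
&= \frac{\kappa}{2\sinh\kappa}\exp(\kappa\cos\alpha)\sin\alpha\,d\alpha.
\end{aligned} \tag{11.5}$$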

This probability (for κ = 5, 10, 50, 100) is shown in Figure 11.3b, where the effect of the sin α term is apparent. Equation 11.3 for the Fisher distribution function suggests that declinations are symmetrically distributed about the mean. In “data” coordinates, this means that the declinations are uniformly distributed from 0° to 360°. Furthermore, the probability Pα of finding a direction at an angle α away from the mean decays exponentially.
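As a rough numerical check of Equations 11.3 to 11.5, the following Python sketch evaluates both densities and integrates the band density over 0 ≤ α ≤ π. The function names and the midpoint-rule integrator are illustrative choices, not part of the original text, and `math.sinh` will overflow for very large κ.

```python
import math

def fisher_pdf_area(alpha, kappa):
    """Equation 11.3: probability per steradian at angle alpha (radians) from the true mean."""
    return kappa / (4.0 * math.pi * math.sinh(kappa)) * math.exp(kappa * math.cos(alpha))

def fisher_pdf_alpha(alpha, kappa):
    """Equation 11.5: probability per radian of alpha, after integrating over azimuth."""
    return 2.0 * math.pi * fisher_pdf_area(alpha, kappa) * math.sin(alpha)

def integrate(f, a, b, n=20000):
    """Composite midpoint rule; n is an arbitrary choice for this sketch."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# The band density should integrate to one for any kappa (Equation 11.4).
for kappa in (5, 10, 50, 100):
    total = integrate(lambda a: fisher_pdf_alpha(a, kappa), 0.0, math.pi)
    print(kappa, total)
```

The per-area density is largest at α = 0, whereas the band density vanishes there because of the sin α factor, which is the difference between panels a and b of Figure 11.3.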

Because the intensity of the magnetization has little to do with the validity of the measurement (except for very weak magnetizations), it is customary to assign unit length to all directions. The mean direction is calculated by first converting the individual moment directions (m_i) (see Figure 11.4), which may be expressed as declination and inclination (D_i, I_i), to cartesian coordinates (x_1, x_2, x_3) by the methods given in Chapter 2. Following the logic for vector addition explained in Appendix A.3.2, the length of the vector sum, or resultant vector R, is given by:

$$R^2 = \left(\sum_i x_{1i}\right)^2 + \left(\sum_i x_{2i}\right)^2 + \left(\sum_i x_{3i}\right)^2. \tag{11.6}$$

The relationship of R to the N individual unit vectors is shown in Figure 11.4. R is always < N and approaches N only when the vectors are tightly clustered. The mean direction components are given by:

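$$\bar{x}_1 = \frac{1}{R}\left(\sum_i x_{1i}\right); \quad \bar{x}_2 = \frac{1}{R}\left(\sum_i x_{2i}\right); \quad \bar{x}_3 = \frac{1}{R}\left(\sum_i x_{3i}\right). \tag{11.7}$$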

These cartesian coordinates can, of course, be converted back to geomagnetic elements (D, I) by the familiar method described in Chapter 2.
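The calculation in Equations 11.6 and 11.7 can be sketched in a few lines of Python. This is a minimal illustration: the function names are hypothetical, the example directions are invented, and the coordinate convention (x_1 north, x_2 east, x_3 down) is assumed to match the one used in Chapter 2.

```python
import math

def dir_to_cart(dec, inc):
    """Declination/inclination in degrees -> unit vector (x1 north, x2 east, x3 down)."""
    d, i = math.radians(dec), math.radians(inc)
    return (math.cos(i) * math.cos(d), math.cos(i) * math.sin(d), math.sin(i))

def cart_to_dir(x1, x2, x3):
    """Cartesian components -> (declination, inclination) in degrees."""
    dec = math.degrees(math.atan2(x2, x1)) % 360.0
    inc = math.degrees(math.atan2(x3, math.hypot(x1, x2)))
    return dec, inc

def fisher_mean(directions):
    """Return (mean declination, mean inclination, R) for a list of (D, I) pairs."""
    sums = [0.0, 0.0, 0.0]
    for dec, inc in directions:
        for j, component in enumerate(dir_to_cart(dec, inc)):
            sums[j] += component
    R = math.sqrt(sum(s * s for s in sums))   # Equation 11.6
    mean = [s / R for s in sums]              # Equation 11.7
    dec, inc = cart_to_dir(*mean)
    return dec, inc, R

# Hypothetical directions, invented for illustration only
data = [(350.0, 60.0), (10.0, 60.0), (0.0, 50.0), (0.0, 70.0)]
mean_dec, mean_inc, R = fisher_mean(data)
print(mean_dec, mean_inc, R)
```

Note that the mean inclination obtained this way is not the arithmetic mean of the four inclinations (which would be exactly 60), because the averaging is done on the vector components.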



Figure 11.4: Vector addition of eight unit vectors (m_i) to yield resultant vector R. [Figure redrawn from Butler, 1992.]


Having calculated the mean direction, the next objective is to determine a statistic that can provide a measure of the dispersion of the population of directions from which the sample data set was drawn. One measure of the dispersion of a population of directions is the precision parameter, κ. From a finite sample set of directions, κ is unknown, but a best estimate of κ can be calculated by

$$\kappa \simeq k = \frac{N-1}{N-R}, \tag{11.8}$$

where N is the number of data points. Using this estimate of κ, we estimate the circle of 95% confidence (p = 0.05) about the mean, α95, by:

$$\alpha_{95} = \cos^{-1}\left[1 - \frac{N-R}{R}\left(\left(\frac{1}{p}\right)^{\frac{1}{N-1}} - 1\right)\right]. \tag{11.9}$$
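Equations 11.8 and 11.9 translate directly into code. The sketch below is one possible Python rendering; the function names and the example values of N and R are hypothetical, and the clamp on the arccosine argument is an added safeguard for very scattered data rather than part of the original formula. It assumes N > R, so that the denominator in Equation 11.8 is non-zero.

```python
import math

def fisher_k(N, R):
    """Equation 11.8: best estimate k of the precision parameter kappa."""
    return (N - 1.0) / (N - R)

def alpha95(N, R, p=0.05):
    """Equation 11.9: radius, in degrees, of the (1 - p) confidence circle about the mean."""
    cos_a = 1.0 - (N - R) / R * ((1.0 / p) ** (1.0 / (N - 1.0)) - 1.0)
    cos_a = max(-1.0, cos_a)   # added guard: keeps acos defined for very scattered data
    return math.degrees(math.acos(cos_a))

# Hypothetical resultant for ten directions
N, R = 10, 9.7
print(fisher_k(N, R), alpha95(N, R))
```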

In the classic paleomagnetic literature, α95 was further approximated by:

$$\alpha'_{95} \simeq \frac{140}{\sqrt{kN}},$$

which is reliable for k larger than about 25 (see Tauxe et al., 1991). By direct analogy with Gaussian statistics (Equation 11.2), the angular variance of a sample set of directions is:

$$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}\Delta_i^2, \tag{11.10}$$

where Δi is the angle between the ith direction and the calculated mean direction. The estimated circular (or angular) standard deviation is S, which can be approximated by:

$$CSD \simeq \frac{81}{\sqrt{k}}, \tag{11.11}$$

which is the circle containing ~63% of the data.

Some practitioners use the statistic δ given by:

$$\delta = \cos^{-1}\left(\frac{R}{N}\right), \tag{11.12}$$

because of its ease of calculation and the intuitive appeal (e.g., Figure 11.4) that δ decreases as R approaches N. In practice, when N ≳ 10 to 20, CSD and δ are close to equal.
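The three dispersion measures (S from Equation 11.10, CSD from Equation 11.11 and δ from Equation 11.12) can be compared on a single data set. The following Python sketch does so under stated assumptions: the function names are hypothetical, the four directions are invented, and the coordinate convention (x_1 north, x_2 east, x_3 down) is assumed to match Chapter 2.

```python
import math

def dir_to_cart(dec, inc):
    """Declination/inclination in degrees -> unit vector (x1 north, x2 east, x3 down)."""
    d, i = math.radians(dec), math.radians(inc)
    return (math.cos(i) * math.cos(d), math.cos(i) * math.sin(d), math.sin(i))

def dispersion(directions):
    """Return S, CSD and delta (all in degrees) for a list of (D, I) pairs."""
    vecs = [dir_to_cart(d, i) for d, i in directions]
    N = len(vecs)
    sums = [sum(v[j] for v in vecs) for j in range(3)]
    R = math.sqrt(sum(s * s for s in sums))
    mean = [s / R for s in sums]
    k = (N - 1.0) / (N - R)                               # Equation 11.8
    # Angle between each direction and the calculated mean, in degrees
    angles = []
    for v in vecs:
        dot = sum(a * b for a, b in zip(v, mean))
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, dot)))))
    S = math.sqrt(sum(a * a for a in angles) / (N - 1.0))  # Equation 11.10
    csd = 81.0 / math.sqrt(k)                              # Equation 11.11
    delta = math.degrees(math.acos(min(1.0, R / N)))       # Equation 11.12
    return {"S": S, "CSD": csd, "delta": delta}

# Hypothetical directions, invented for illustration only
data = [(350.0, 60.0), (10.0, 60.0), (0.0, 50.0), (0.0, 70.0)]
stats = dispersion(data)
print(stats)
```

With only four directions, δ falls noticeably below the other two measures, in line with the remark that δ and CSD agree only for larger N.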

When we calculate the mean direction, a dispersion estimate, and a confidence limit, we are supposing that the observed data came from random sampling of a population of directions accurately described by the Fisher distribution. But we do not know the true mean of that Fisherian population, nor do we know its precision parameter κ. We can only estimate these unknown parameters. The calculated mean direction of the directional data set is the best estimate of the true mean direction, while k is the best estimate of κ. The confidence limit α95 is a measure of the precision with which the true mean direction has been estimated. One is 95% certain that the unknown true mean direction lies within α95 of the calculated mean. The obvious corollary is that there is a 5% chance that the true mean lies more than α95 from the calculated mean.

11.2.2 Some illustrations

Having buried the reader in mathematical formulations, we present the following illustrations to develop some intuitive appreciation for the statistical quantities. One essential concept is the distinction between statistical quantities calculated from a directional data set and the unknown parameters of the sampled population.

Consider the various sets of directions plotted as equal area projections (see Chapter 2) in Figure 11.2. These are all synthetic data sets drawn from Fisher distributions with a single, vertical mean direction. Each of the three diagrams in a row is a replicate sample from the same distribution. The top row was drawn from a distribution with κ = 5, the middle with κ = 10 and the bottom row with κ = 50. For each synthetic data set, we estimated D, I and α95 (shown as insets to the equal area diagrams).

There are several important observations to be taken from these examples. Note that the calculated mean direction is never exactly the true mean direction (I = +90°). The calculated mean inclination Ī varies from 78.6° to 89.3°, and the mean declinations fall within all quadrants of the equal-area projection. The calculated mean direction thus randomly dances about the true mean direction and deviates from the true mean by between 0.7° and 11.4°. The calculated k statistic varies considerably among replicate samples as well. The variation of k and differences in angular variance of the data sets with the same underlying distribution are simply due to the vagaries of random sampling.

The confidence limit α95 varies from 19.9° to 4.3° and is shown by the circle surrounding the calculated mean direction (shown as a triangle). For these directional data sets, only one (Figure 11.2e) has a calculated mean that is more than α95 from the true mean. However, if 100 such synthetic data sets had been analyzed, on average five would have a calculated mean direction removed from the true mean direction by more than the calculated confidence limit α95. That is, the true mean direction would lie outside the circle of 95% confidence, on average, in 5% of the cases.
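The 5% figure can be explored with a small Monte Carlo experiment. The Python sketch below is only an illustration: the text does not say how its synthetic data sets were generated, so the inverse-transform sampler used here is an assumption, as are the function names and the chosen values of κ, N, the number of trials and the random seed.

```python
import math
import random

def draw_fisher(kappa, rng):
    """Draw one unit vector from a Fisher distribution about the vertical (x3) axis.

    Inverse-transform sampling of cos(alpha); this sampler is an assumption
    of the sketch, not a method given in the text.
    """
    u = rng.random()
    cos_a = 1.0 + math.log(u + (1.0 - u) * math.exp(-2.0 * kappa)) / kappa
    cos_a = max(-1.0, min(1.0, cos_a))
    sin_a = math.sqrt(1.0 - cos_a * cos_a)
    phi = 2.0 * math.pi * rng.random()   # azimuth is uniform about the true mean
    return (sin_a * math.cos(phi), sin_a * math.sin(phi), cos_a)

def fraction_outside(kappa=20.0, n=20, trials=2000, seed=1):
    """Fraction of synthetic data sets whose true mean lies outside alpha95."""
    rng = random.Random(seed)
    misses = 0
    for _trial in range(trials):
        sums = [0.0, 0.0, 0.0]
        for _draw in range(n):
            v = draw_fisher(kappa, rng)
            sums = [s + c for s, c in zip(sums, v)]
        R = math.sqrt(sum(s * s for s in sums))
        # Cosine of alpha95 from Equation 11.9 with p = 0.05 (1/p = 20)
        cos_a95 = 1.0 - (n - R) / R * (20.0 ** (1.0 / (n - 1.0)) - 1.0)
        # The true mean is (0, 0, 1), so the cosine of the angular error is x3 / R
        if sums[2] / R < cos_a95:
            misses += 1
    return misses / trials

print(fraction_outside())
```

The printed fraction should come out near 0.05, with some scatter from one seed to the next because the number of trials is finite.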

It is also important to appreciate which statistical quantities are fundamentally dependent upon the number of observations N. Neither the k value (Equation 11.8) nor the estimated angular deviation CSD (Equation 11.11) is fundamentally dependent upon N. These statistical quantities are estimates of the intrinsic dispersion of directions in the Fisherian population from which the data set was sampled. Because that dispersion is not affected by the number of times the population is sampled, the calculated statistics estimating that dispersion should not depend fundamentally on the number of observations N. However, the confidence limit α95 should depend on N; the more individual measurements there are in our sample, the greater must be the precision (and accuracy) in estimating the true mean direction. This increased precision should be reflected by a decrease in α95 with increasing N. Indeed, Equation 11.9 indicates that α95 depends approximately on 1/√N.
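The approximate 1/√N dependence can be seen by holding k fixed and varying N. In the Python sketch below, Equation 11.8 is inverted to give the R that corresponds to a chosen k, and Equation 11.9 is then evaluated alongside the classic approximation 140/√(kN). The function name and the value k = 30 are hypothetical choices made for illustration.

```python
import math

def alpha95_for_k(N, k, p=0.05):
    """alpha95 in degrees (Equation 11.9) for a data set of size N with estimate k."""
    R = N - (N - 1.0) / k    # Equation 11.8 solved for R
    cos_a = 1.0 - (N - R) / R * ((1.0 / p) ** (1.0 / (N - 1.0)) - 1.0)
    return math.degrees(math.acos(cos_a))

k = 30.0   # hypothetical precision estimate, held fixed
for N in (5, 10, 20, 40, 80):
    exact = alpha95_for_k(N, k)
    approx = 140.0 / math.sqrt(k * N)
    print(N, round(exact, 2), round(approx, 2))
```

Quadrupling N roughly halves α95, and the two columns converge as N grows.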

Figure 11.5 illustrates these dependencies of calculated statistics on number of directions in a data set. This diagram was constructed as follows:

  1. We drew a synthetic data set of N = 30 from a Fisher distribution with a κ of 29.2 (equivalent to a circular standard deviation S of 15°).
  2. Starting with the first four directions in the synthetic data set, a subset of N = 4 was used to calculate k, CSD and δ using Equations 11.8, 11.11, and 11.12, respectively. In addition, α95 (using Equation 11.9) was calculated. Resulting values of CSD, δ and α95 are shown in Figure 11.5 as a function of N.
  3. For each succeeding value of N in Figure 11.5, the next direction from the N = 30 synthetic data set was added to the previous subset of directions, continuing until the full N = 30 synthetic data set was used.
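The three steps above can be sketched in Python as follows. This is an approximation of the procedure, not the code used to make Figure 11.5: the sampler (inverse-transform sampling about a vertical mean), the function names and the random seed are all assumptions, so the numbers will differ from those in the figure.

```python
import math
import random

def draw_fisher(kappa, rng):
    """One unit vector from a Fisher distribution about the vertical (x3) axis.

    Inverse-transform sampling of cos(alpha); an assumption of this sketch.
    """
    u = rng.random()
    cos_a = 1.0 + math.log(u + (1.0 - u) * math.exp(-2.0 * kappa)) / kappa
    cos_a = max(-1.0, min(1.0, cos_a))
    sin_a = math.sqrt(1.0 - cos_a * cos_a)
    phi = 2.0 * math.pi * rng.random()
    return (sin_a * math.cos(phi), sin_a * math.sin(phi), cos_a)

def running_stats(kappa=29.2, n_total=30, seed=0):
    """Rows of (N, CSD, delta, alpha95) for growing subsets, N = 4 .. n_total."""
    rng = random.Random(seed)
    vecs = [draw_fisher(kappa, rng) for _ in range(n_total)]      # step 1
    rows = []
    for N in range(4, n_total + 1):                               # steps 2 and 3
        sums = [sum(v[j] for v in vecs[:N]) for j in range(3)]
        R = math.sqrt(sum(s * s for s in sums))
        k = (N - 1.0) / (N - R)                                   # Equation 11.8
        csd = 81.0 / math.sqrt(k)                                 # Equation 11.11
        delta = math.degrees(math.acos(min(1.0, R / N)))          # Equation 11.12
        cos_a95 = 1.0 - (N - R) / R * (20.0 ** (1.0 / (N - 1.0)) - 1.0)
        a95 = math.degrees(math.acos(max(-1.0, cos_a95)))         # Equation 11.9
        rows.append((N, csd, delta, a95))
    return rows

for N, csd, delta, a95 in running_stats():
    print(N, round(csd, 1), round(delta, 1), round(a95, 1))
```

Running it with different seeds gives a feel for how much the small-N estimates wander before settling down.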

The effects of increasing N are readily apparent in Figure 11.5, in which we show a comparison of the two estimates of S, CSD and δ. Although not fundamentally dependent upon N, in practice the estimated angular standard deviation, CSD, deviates from S for values of N < 15, only approaching the correct value when N ≥ 15. As expected, the calculated confidence limit α95 decreases approximately as 1/√N, showing a dramatic decrease in the range 4 < N < 10 and a more gradual decrease for N > 10.



Figure 11.5: Dependence of estimated angular standard deviation, CSD and δ, and confidence limit, α95, on the number of directions in a data set. An increasing number of directions were selected from a Fisherian sample of directions with angular standard deviation S = 15° (κ = 29.2), shown by the horizontal line.


If directions are converted to VGPs as outlined in Chapter 2, the transformation distorts a rotationally symmetric set of data into an elliptical distribution, and the associated α95 may no longer be appropriate. Ironically, it is more likely that the VGPs are spherically symmetric, implying that most sets of directions are not! Cox and Doell (1960) suggested the following for 95% confidence regions in VGPs:

$$dm = \alpha_{95}\frac{\cos\lambda}{\cos I}, \quad dp = \frac{1}{2}\alpha_{95}\left(1 + 3\sin^2\lambda\right), \tag{11.13}$$

where dm is the semi-axis parallel to the meridians (lines of longitude), dp is the semi-axis parallel to the parallels (lines of latitude), and λ is the site paleolatitude.
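Equation 11.13 is simple to evaluate once α95, the mean inclination and the paleolatitude are known. The Python sketch below uses a hypothetical function name and invented input values; when no paleolatitude is supplied it falls back on the dipole formula tan I = 2 tan λ, which is assumed here to be the relation used in Chapter 2. The expression for dm is undefined for a vertical mean inclination, where cos I = 0.

```python
import math

def vgp_confidence(a95, inc, lat=None):
    """Equation 11.13: return (dm, dp) in degrees.

    a95 and inc are in degrees; lat is the site paleolatitude in degrees.
    If lat is omitted it is derived from inc via the dipole formula
    (an assumption of this sketch).
    """
    i = math.radians(inc)
    if lat is None:
        lam = math.atan(0.5 * math.tan(i))   # tan I = 2 tan(lambda)
    else:
        lam = math.radians(lat)
    dm = a95 * math.cos(lam) / math.cos(i)
    dp = 0.5 * a95 * (1.0 + 3.0 * math.sin(lam) ** 2)
    return dm, dp

# Hypothetical mean direction with alpha95 = 5 degrees and I = 60 degrees
print(vgp_confidence(5.0, 60.0))
```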