Analysis 1

From Advanced Labs Wiki


Mean and Variance

The measurements you make in the lab are subject to error. We can formalize this idea by considering a measurement of some quantity <math>x</math>. This may be the number of counts per minute in the ML or RS lab, a reading of the Hall voltage, and so on. Nature determines the expectation value of the measurement; here we use the mean as the expectation value (rather than, say, the median):

<math>\mu = \langle x \rangle. </math>

Suppose you take <math>N</math> measurements of <math>x</math>, labeled <math>x_i</math> with <math>i = 1, \ldots, N</math>. The sample mean is then

<math>m = \frac{1}{N} \sum_{i=1}^{N} x_i </math>

which should be familiar. In order to quantify the error associated with our data, we introduce the variance <math>\sigma^2</math>

<math>\sigma^2 = \langle (x-\mu)^2 \rangle. </math>

This is the second central moment of the distribution (the second moment about the mean). The square is essential because, from the definition of the mean, <math> \langle (x-\mu) \rangle = 0 </math>: positive and negative deviations cancel on average. The square root of the variance is the standard deviation <math>\sigma</math>. It is generally the standard deviation that you quote as an "error," because it has the same units as the measurement (the variance has the units of the measurement squared). Analogously to the mean, we can compute the sample variance from our dataset <math>x_i</math>:

<math>s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - m)^2 </math>.
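The sample mean and sample variance translate directly into code. Below is a minimal Python sketch, assuming the measurements are held in a plain list; the function names and the five readings are hypothetical, invented for illustration. The last line cross-checks the result against the standard library's `statistics.variance`, which also divides by <math>N-1</math>.

```python
import math
import statistics

def sample_mean(xs):
    """Sample mean m = (1/N) * sum of x_i."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance s^2 = sum of (x_i - m)^2, divided by N - 1 (needs N >= 2)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two measurements")
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (n - 1)

# Hypothetical readings (arbitrary units), invented for this example.
readings = [9.8, 10.1, 10.0, 9.9, 10.2]
m = sample_mean(readings)
s2 = sample_variance(readings)
s = math.sqrt(s2)  # sample standard deviation, same units as the readings
print(m, s2, s)
print(math.isclose(s2, statistics.variance(readings)))  # cross-check against the stdlib
```

For these readings <math>m \approx 10.0</math> and <math>s^2 \approx 0.025</math>, so <math>s \approx 0.16</math>.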

Why do we use <math>N-1</math> instead of <math>N</math> in the denominator for the sample variance? We'll come back to that later.
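Until then, a quick numerical experiment shows the effect (it illustrates the point but does not explain it). The Python sketch below assumes Gaussian measurements with true variance <math>\sigma^2 = 1</math> and <math>N = 3</math> points per dataset; these values, the seed, and the function name are arbitrary choices for illustration. Averaged over many simulated datasets, dividing by <math>N-1</math> lands near <math>\sigma^2</math>, while dividing by <math>N</math> comes out systematically low, near <math>\frac{N-1}{N}\sigma^2 = 2/3</math>.

```python
import random

def compare_estimators(n=3, trials=50_000, sigma=1.0, seed=1):
    """Average the variance estimate computed with N-1 and with N over many datasets."""
    rng = random.Random(seed)
    total_nm1 = 0.0
    total_n = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(xs) / n
        ss = sum((x - m) ** 2 for x in xs)  # squared deviations from the *sample* mean
        total_nm1 += ss / (n - 1)
        total_n += ss / n
    return total_nm1 / trials, total_n / trials

avg_nm1, avg_n = compare_estimators()
print(avg_nm1, avg_n)  # near 1.0 and near 2/3, respectively
```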

For some kinds of data, the underlying distribution lets you read the variance directly off the data themselves. An example relevant to your labs is counting, which follows the Poisson distribution: if you have counted a total of <math>M</math> muons or alpha particles in some time interval, the variance of this number is <math>M</math>, and the standard deviation is therefore <math>\sqrt M</math>. See the suggested texts for a discussion of the Poisson distribution (but this is all you need to know for the class).
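As a worked example of the counting rule, the Python sketch below assumes a hypothetical run of 400 counts in 10 minutes and treats the counting time as exact. Dividing by an exact time simply scales the standard deviation by the same factor, so the rate is 40 ± 2 counts per minute, a relative error of <math>1/\sqrt{M} = 5\%</math>.

```python
import math

def count_error(m_counts):
    """Poisson counting: the variance of a count M is M, so sigma = sqrt(M)."""
    return math.sqrt(m_counts)

# Hypothetical run: 400 counts recorded in 10 minutes (time assumed exact).
counts = 400
minutes = 10.0
sigma_counts = count_error(counts)   # 20.0
rate = counts / minutes              # 40.0 counts per minute
sigma_rate = sigma_counts / minutes  # 2.0 counts per minute
print(f"{rate} +/- {sigma_rate} counts/min")  # 40.0 +/- 2.0 counts/min
```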

Simple Error Propagation

Given a function <math>f(x)</math>, what is the error on <math>f</math> if you know the mean <math>\mu_x</math> and the variance <math>\sigma_x^2</math> of <math>x</math>? To make things more interesting, let <math>f</math> also depend on a second variable <math>y</math>, with mean <math>\mu_y</math> and variance <math>\sigma_y^2</math>. Start with the definition of the variance:

<math>\sigma_f^2 = \langle (f(x,y) - \mu_f)^2 \rangle</math>,

where <math>\sigma_f^2</math> and <math>\mu_f</math> are the variance and the mean of <math>f(x,y)</math>. Expanding <math>f</math> to first order in a Taylor series about the means <math>\mu_x</math> and <math>\mu_y</math>, we have

<math> f(x,y) \approx f(\mu_x, \mu_y) + \frac{\partial f}{\partial x} (x - \mu_x) + \frac{\partial f}{\partial y} (y-\mu_y) </math>

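where the partial derivatives are evaluated at <math>(\mu_x, \mu_y)</math>. Substituting this into the expression for <math>\sigma_f^2</math> and recognizing that, to this order, <math> \mu_f = f(\mu_x, \mu_y) </math> (the linear terms average to zero), we have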

<math>\sigma_f^2 = \langle (\frac{\partial f}{\partial x} (x - \mu_x) + \frac{\partial f}{\partial y} (y-\mu_y))^2 \rangle</math>.

Expanding the square produces a cross term proportional to <math>\langle (x - \mu_x)(y - \mu_y) \rangle</math>. If <math>x</math> and <math>y</math> are independent, this expectation value is zero, and the final expression for the variance of <math>f</math> is

<math>\sigma_f^2 = \left\langle \left(\frac{\partial f}{\partial x}\right)^2 (x - \mu_x)^2 + \left(\frac{\partial f}{\partial y}\right)^2 (y-\mu_y)^2 \right\rangle = \left(\frac{\partial f}{\partial x}\right)^2 \sigma_x^2 + \left(\frac{\partial f}{\partial y}\right)^2 \sigma_y^2 </math>.
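One way to build confidence in the two-variable formula is to compare it with a brute-force simulation. The Python sketch below assumes <math>f = xy</math> with independent Gaussian inputs <math>x = 2.0 \pm 0.1</math> and <math>y = 3.0 \pm 0.2</math> (made-up values); here <math>\partial f/\partial x = \mu_y</math> and <math>\partial f/\partial y = \mu_x</math>, so the formula gives <math>\sigma_f = \sqrt{(3 \cdot 0.1)^2 + (2 \cdot 0.2)^2} = 0.5</math>. Because the relative errors are small, the linearized result should agree closely with the simulated spread; for large relative errors the first-order expansion can be noticeably off.

```python
import math
import random

def linear_sigma_product(mu_x, sig_x, mu_y, sig_y):
    """sigma_f for f = x*y from the first-order formula: df/dx = mu_y, df/dy = mu_x."""
    return math.sqrt((mu_y * sig_x) ** 2 + (mu_x * sig_y) ** 2)

def simulated_sigma_product(mu_x, sig_x, mu_y, sig_y, n=100_000, seed=2):
    """Sample standard deviation of x*y over n draws of independent Gaussian x and y."""
    rng = random.Random(seed)
    fs = [rng.gauss(mu_x, sig_x) * rng.gauss(mu_y, sig_y) for _ in range(n)]
    mean_f = sum(fs) / n
    return math.sqrt(sum((f - mean_f) ** 2 for f in fs) / (n - 1))

# Made-up inputs: x = 2.0 +/- 0.1, y = 3.0 +/- 0.2
lin = linear_sigma_product(2.0, 0.1, 3.0, 0.2)
sim = simulated_sigma_product(2.0, 0.1, 3.0, 0.2)
print(lin, sim)  # both close to 0.5
```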

The two-variable result extends to any number of independent variables <math>x_i</math>:

<math>\sigma_f^2 = \sum_i \left(\frac{\partial f}{\partial x_i}\right)^2 \sigma_{x_i}^2 </math>.
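The general formula can be applied to any function of independent inputs. In the Python sketch below, the helper `propagate` is hypothetical, and its partial derivatives are estimated numerically with central finite differences at the means; that is a choice made here for generality, and analytic derivatives are exact whenever you can write them down. The step-size parameter `h` is an assumption that works for well-behaved functions of order-one inputs.

```python
import math

def propagate(f, means, sigmas, h=1e-6):
    """First-order error propagation for independent inputs.

    sigma_f^2 = sum_i (df/dx_i)^2 * sigma_i^2, with each partial derivative
    estimated by a central finite difference evaluated at the means.
    """
    var = 0.0
    for i, (mu, sig) in enumerate(zip(means, sigmas)):
        step = h * max(1.0, abs(mu))  # scale the step to the size of the input
        up = list(means)
        down = list(means)
        up[i] = mu + step
        down[i] = mu - step
        dfdx = (f(*up) - f(*down)) / (2.0 * step)
        var += (dfdx * sig) ** 2
    return math.sqrt(var)

# Example with made-up numbers: f = x * y, x = 2.0 +/- 0.1, y = 3.0 +/- 0.2
print(propagate(lambda x, y: x * y, [2.0, 3.0], [0.1, 0.2]))  # approximately 0.5
```

For a plain sum every partial derivative is 1, so the errors add in quadrature: <math>\sqrt{0.3^2 + 0.4^2 + 1.2^2} = 1.3</math>.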

Error on the Sample Mean

Armed with simple error propagation, we can now find the variance of the sample mean of <math>N</math> independent measurements. The expression for the mean is

<math>m = \frac{1}{N} \sum_{i=1}^{N} x_i </math>

Differentiating with respect to the <math> i^{th} </math> measurement <math>x_i</math> gives

<math> \frac{\partial m}{\partial x_i} = \frac{1}{N} </math>.

Assuming that all measurements have the same variance, <math>\sigma_{x_i}^2 = \sigma^2</math>, the error propagation equation yields

<math> \sigma_m^2 = \sum_{i=1}^{N} \left(\frac{\partial m}{\partial x_i}\right)^2 \sigma_{x_i}^2 = \sum_{i=1}^{N} \frac{1}{N^2} \sigma^2 = \frac{\sigma^2}{N} </math>.
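A short simulation makes <math>\sigma_m^2 = \sigma^2/N</math> concrete. The Python sketch below assumes Gaussian measurements with <math>\sigma = 2</math> and <math>N = 10</math> (arbitrary values, as are the seed and the function name). It repeats the "take <math>N</math> measurements and average them" experiment many times; the spread of the resulting means should come out near <math>\sigma/\sqrt{N} \approx 0.632</math>, not near <math>\sigma = 2</math>.

```python
import math
import random

def spread_of_means(n_per_mean=10, sigma=2.0, repeats=20_000, seed=3):
    """Sample standard deviation of many sample means, each from n_per_mean draws."""
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        xs = [rng.gauss(0.0, sigma) for _ in range(n_per_mean)]
        means.append(sum(xs) / n_per_mean)
    grand_mean = sum(means) / repeats
    return math.sqrt(sum((m - grand_mean) ** 2 for m in means) / (repeats - 1))

spread = spread_of_means()
expected = 2.0 / math.sqrt(10)  # sigma / sqrt(N)
print(spread, expected)
```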

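Thus the variance of the mean of <math>N</math> measurements equals the variance of any one measurement divided by <math>N</math>; equivalently, the standard deviation of the mean is <math>\sigma/\sqrt{N}</math>. In practice you do not know <math>\sigma</math>, so you take the sample variance <math>s^2</math>, as defined in the first section of this tutorial, and divide it by <math>N</math> to estimate the variance of the mean. Its square root, <math>s/\sqrt{N}</math>, is the error you quote on the mean.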
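Putting the recipe together, here is a minimal Python sketch using the standard library's `statistics.mean` and `statistics.stdev` (the latter divides by <math>N-1</math>, matching <math>s^2</math> above). The helper name `mean_with_error` and the five readings are hypothetical, chosen only for illustration.

```python
import math
import statistics

def mean_with_error(xs):
    """Return (m, s / sqrt(N)): the sample mean and the estimated error on the mean."""
    n = len(xs)
    m = statistics.mean(xs)
    s = statistics.stdev(xs)  # sample standard deviation, N - 1 in the denominator
    return m, s / math.sqrt(n)

readings = [9.8, 10.1, 10.0, 9.9, 10.2]  # hypothetical data
m, sigma_m = mean_with_error(readings)
print(f"{m:.3f} +/- {sigma_m:.3f}")  # 10.000 +/- 0.071
```

Note that the error on the mean (about 0.071) is smaller than the scatter of a single reading (<math>s \approx 0.16</math>) by a factor of <math>\sqrt{5}</math>.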