Analysis 1

From Advanced Labs Wiki
[[2013 | Back to 2013 Main Page]]
 
[[2012 | Back to 2012 Main Page]]


==Mean and Variance==


The measurements you make in the lab are subject to error. We can formalize this idea by thinking of a measurement of some quantity <math>x</math>. This may be the number of counts per minute from the ML or the RS labs, or it might be a reading of the Hall voltage, etc. Nature determines the expectation value of the measurement. We'll use the '''mean''' as the expectation value (instead of, say, the median):


<math>\mu = \langle x \rangle. </math>


If you take a lot of data, so that you have <math>N</math> measurements <math>x_i</math> of <math>x</math>, then the '''sample mean''' is given by


<math>m = \frac{1}{N} \sum^N_i x_i </math>


which should be familiar. In order to quantify the error associated with our data, we introduce the '''variance''' <math>\sigma^2</math>


<math>\sigma^2 = \langle (x-\mu)^2 \rangle. </math>


This is the "second moment" of the distribution of data. Note that the square is essential because, from the definition of the mean, <math> \langle (x-\mu) \rangle = 0 </math>. The square root of the variance is the '''standard deviation''' <math>\sigma</math>. It is generally the standard deviation that you quote as an "error" as it has the same units as the measurement. (The variance has units of the measurement units squared.) Analogously to the mean, we can compute the '''sample variance''' from our dataset <math>x_i</math>:


<math>s^2 = \frac{1}{N-1} \sum^{N}_i (x_i - m)^2 </math>.


Why do we use <math>N-1</math> instead of <math>N</math> in the denominator for the sample variance? We'll come back to that later.
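The sample mean and sample variance above can be computed directly. A minimal sketch in Python (the measurement values here are made up for illustration):

```python
# Sample mean and sample variance (with the N - 1 denominator) for a
# small, hypothetical set of measurements x_i.
data = [4.1, 3.9, 4.3, 4.0, 4.2]  # made-up measurements of x

N = len(data)
m = sum(data) / N                               # sample mean
s2 = sum((x - m) ** 2 for x in data) / (N - 1)  # sample variance

print(m, s2)
```

Note that library routines differ on the denominator: for example, NumPy's `np.var` uses <math>N</math> by default and needs `ddof=1` to give the sample variance, while Python's `statistics.variance` uses <math>N-1</math>.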
 
In some cases, the data are characterized by a distribution whose variance is readily available from the data itself. An example relevant to your labs is the variance on counts, which is derived from the Poisson distribution. In this case, if you have counted a total of, say, <math>M</math> muons or alpha particles in some number of seconds, the variance on this number is <math>M</math> (and the standard deviation is therefore <math>\sqrt M</math>). See the suggested texts for a discussion of the Poisson distribution (but this is all you need to know for the class).
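For counting data this error estimate follows directly from the count itself. A short sketch (the count value is made up):

```python
import math

M = 1452  # hypothetical total number of muons counted

variance = M          # Poisson variance equals the count itself
sigma = math.sqrt(M)  # standard deviation on the count

# Quote the result as M +/- sqrt(M):
print(f"{M} +/- {sigma:.0f} counts")
```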


==Simple Error Propagation==
Given a function <math>f(x)</math>, what is the error on <math>f</math> if you know the mean <math>\mu</math> and the variance <math>\sigma^2</math> of <math>x</math>? Let's make things interesting and introduce another variable <math>y</math>. Let's start with our expression for variance


<math>\sigma_f^2 = \langle (f(x,y) - \mu_f)^2 \rangle</math>,
 
where <math>\sigma_f^2</math> and <math>\mu_f</math> are the variance and the mean of <math>f(x,y)</math>. Expanding <math>f</math> to first order around the means of <math>x</math> and <math>y</math>, we have
 
<math> f(x,y) \approx f(\mu_x, \mu_y) + \frac{\partial f}{\partial x} (x - \mu_x) + \frac{\partial f}{\partial y} (y-\mu_y) </math>
 
Substituting this into the expression for <math>\sigma_f</math> and recognizing that <math> \mu_f =  f(\mu_x, \mu_y) </math>, we have
 
<math>\sigma_f^2 = \langle (\frac{\partial f}{\partial x} (x - \mu_x) + \frac{\partial f}{\partial y} (y-\mu_y))^2 \rangle</math>.
 
Because <math>x</math> and <math>y</math> are independent, the expectation value <math>\langle (x - \mu_x)(y - \mu_y) \rangle = 0</math>, and the final expression for the variance of <math>f</math> is
 
<math>\sigma_f^2 = \langle \frac{\partial f}{\partial x}^2 (x - \mu_x)^2 + \frac{\partial f}{\partial y}^2 (y-\mu_y)^2 \rangle = \frac{\partial f}{\partial x}^2 \sigma_x^2 + \frac{\partial f}{\partial y}^2 \sigma_y^2 </math>.
 
This can be extended to an arbitrary number of variables such that


<math>\sigma_f^2 = \sum_i \frac{\partial f}{\partial x_i}^2 \sigma_{x_i}^2 </math>.
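As a concrete check of this formula, one can propagate errors through a simple function by hand. A sketch with made-up means and standard deviations, using <math>f(x,y) = xy</math>:

```python
import math

# Hypothetical measurements: means and standard deviations of x and y.
mu_x, sigma_x = 10.0, 0.2
mu_y, sigma_y = 5.0, 0.1

# For f(x, y) = x * y, the partial derivatives are
# df/dx = y and df/dy = x, evaluated at the means.
dfdx = mu_y
dfdy = mu_x

# Error propagation: sigma_f^2 = (df/dx)^2 sigma_x^2 + (df/dy)^2 sigma_y^2.
sigma_f2 = dfdx ** 2 * sigma_x ** 2 + dfdy ** 2 * sigma_y ** 2
sigma_f = math.sqrt(sigma_f2)

print(mu_x * mu_y, sigma_f)  # f at the means, and its propagated error
```

For a product like this, dividing through by <math>f^2</math> recovers the familiar rule that relative errors add in quadrature.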


==Error on the Sample Mean==


Armed with the derivation of simple error propagation, we can now address the variance of the sample mean of <math>N</math> independent measurements. The expression for the mean is
 
<math>m = \frac{1}{N} \sum^N_i x_i </math>
 
Therefore, for the <math> i^{th} </math> random variable, we have
 
<math> \frac{\partial m}{\partial x_i} = \frac{1}{N} </math>.
 
And thus, assuming that the variance is the same for all measurements, <math>\sigma_{x_i} = \sigma</math>, the error propagation equation yields
 
<math> \sigma_m^2 = \sum_i^N \frac{\partial m}{\partial x_i}^2 \sigma_{x_i}^2= \sum_i^N \frac{1}{N^2} \sigma^2 = \frac{\sigma^2}{N} </math>.
 
Therefore, the variance on the mean of <math>N</math> measurements is equal to the variance of any one measurement divided by <math>N</math>. In practice you take the sample variance, as defined in the first section of this tutorial, and divide it by <math>N</math> in order to get an estimate of the variance on the mean.
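The <math>1/N</math> scaling of the variance on the mean can also be checked numerically. A sketch using Python's random module (the distribution parameters are made up, and the check is statistical rather than exact):

```python
import random
import statistics

random.seed(0)

sigma = 2.0  # standard deviation of a single (simulated) measurement
N = 100      # measurements per "experiment"

# Repeat the experiment many times and look at the spread of the means.
means = []
for _ in range(2000):
    xs = [random.gauss(0.0, sigma) for _ in range(N)]
    means.append(sum(xs) / N)

spread = statistics.stdev(means)  # observed std. dev. of the mean
expected = sigma / N ** 0.5       # sigma / sqrt(N) from the derivation
print(spread, expected)
```

The two printed numbers should agree to within the statistical fluctuation of the simulation.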

Latest revision as of 03:36, 28 January 2013
