Analysis 3: Difference between revisions

Revision as of 23:36, 10 February 2012

Back to 2012 Main Page

Linear Models

A linear model is a model that is linear in the parameters. Recall the example of the trajectory from the previous tutorials. In this example the data <math>d_i</math> were modeled by a quadratic function:

<math> d_i = a t_i^2 + b t_i + c </math>.

Note that this is quadratic in the sense that it is quadratic in time, not in the parameters <math>a,b,c</math>. With respect to the parameters, this model is linear. An example of a nonlinear model is <math> d_i = cos( 2\pi \nu t_i ) </math>, where you are fitting a frequency parameter <math>\nu</math>. In this section you'll learn how to fit linear models to data.

Matrix Notation

When dealing with linear systems, it's generally easier to work with matrices. Continuing with the above example of the trajectory, we introduce the parameter vector <math>\vec{p}</math>, which is a column vector with the parameters <math>a,b,c</math>. We also introduce matrix <math>M</math> with <math>N</math> rows and 3 columns. The first, second, and third columns of <math>M</math> have the elements <math> [ t_1^2, ..., t_N^2 ] </math>, <math> [ t_1, ..., t_N ] </math>, and <math> [ 1, ..., 1 ] </math>. Finally we introduce the data vector <math>\vec{d}</math>, which contains all the data. Given these definitions we can write

<math> \vec{d} = M\vec{p} </math>.

Fitting a Linear Model

In the previous tutorial, we the goodness of fit parameter <math>\chi^2</math>. For the trajectory example, we have

<math> \chi^2 = \sum_i^N \frac{(d_i - (a t_i^2 + b t_i + c))^2}{s_i^2} </math>,

We can continue to simplify our analysis by introducing the noise correlation matrix <math>D</math>, which for our purposes will be an <math>N \times N</math> matrix with <math>s_i^2</math> on the diagonal. With this matrix the <math>\chi^2</math> can be written as

<math> \chi^2 = (\vec{d}-M\vec{p})^T D^{-1} (\vec{d}-M\vec{p}) </math>.

The best fit parameters will minimize this function. Therefore, setting the gradient of this function with respect to the parameters <math>\vec{p}</math> equal to zero, we get an equation for the best fit parameters

<math> M^T D^{-1} M \vec{p} = M^T D^{-1} \vec{d}</math>.

Therefore, the vector of best fit parameters can be obtained by computing the right hand side of this equation and applying the inverse of the matrix <math>M^T D^{-1} M</math>.

Now you have the best fit parameters given the data. The next step is to compute the <math>\chi^2</math> associated with these parameters. See the tutorial on goodness of fit, to evaluate whether the best fit model is a good fit. If it is a good fit, then you can believe your model and go onto propagate errors to your model.

Errors on the Model Parameters

In the first analysis tutorial, we already learned to propagate errors of a function of the data.

<math>\sigma_f^2 = \sum_i \frac{\partial f}{\partial x_i}^2 \sigma_{x_i}^2 </math>.

We can rewrite in terms of the gradient <math>\nabla f</math> and the noise correlation matrix <math>D</math>:

<math>\sigma_f^2 = (\nabla f)^T D \nabla f </math>

The best fit parameters are functions of the data. From the above derivations, the parameter function (written as a vector) is:

Let <math> \delta_i </math> be a vector with 1 in the <math>i^{th}</math> position and zero elsewhere. The <math>i^{th}</math> parameter is then

<math> p_i = delta_i^T M^T D^{-1} M)^{-1} M^T D^{-1} \vec{d} </math>

then the partial derivative of the <math>i^{th}</math> parameter with respect to the <math>j^{th}</math> datum is

<math>\frac{ \partial p_i}{ \partial d_j } = \delta_i^T M^T D^{-1} M)^{-1} M^T D^{-1} \delta_j </math>

where this step follows from the fact that the expression for the best fit parameter is linear in the data. It follows that the gradient of the <math>i^{th}</math> parameter is

<math> \nabla p_i = (\delta_i^T (M^T D^{-1} M)^{-1} M^T D^{-1})^T </math>,

where the transpose makes this a column vector. Substituting this into the matrix-formatted error propagation equation, we the error on the $i^{th}$ parameter:

<math> \sigma_p^2 = (\delta_i^T (M^T D^{-1} M)^{-1} M^T D^{-1} ) D (\delta_i^T M^T D^{-1} M)^{-1} M^T D^{-1} \delta_i)^T </math>

<math> = \delta_i^T (M^T D^{-1} M)^{-1} M^T D^{-1} D D^{-1} M (M^T D^{-1} M)^{-1} \delta_i </math>

<math> = \delta_i^T (M^T D^{-1} M)^{-1} M^T D^{-1} M (M^T D^{-1} M)^{-1} \delta_i </math>

<math> = \delta_i^T (M^T D^{-1} M)^{-1} \delta_i </math>

What this result shows is that the variance on the <math>i^{th}</math> parameter, is the <math>i^{th}</math> diagonal element of (M^T D^{-1} M)^{-1}. In analogy with the D matrix, this matrix is called the parameter covariance matrix.

@@ Line 11: / Line 11: @@
 ==Matrix Notation==
-When dealing with linear systems, it's generally easier to work with matrices. Continuing with the above example of the trajectory, we introduce the parameter vector <math>\vec{p}</math>, which is a column vector with the parameters <math>a,b,c</math>. We also introduce matrix $M$ with  <math>N</math> rows and   3 columns. The first, second, and third columns of <math>M</math> have the elements <math> [ t_1^2,  ..., t_N^2 ] </math>, <math> [ t_1,  ..., t_N ] </math>, and <math> [ 1,  ..., 1 ] </math>. Finally we introduce the data vector <math>\vec{d}</math>, which contains all the data. Given these definitions we can write
+When dealing with linear systems, it's generally easier to work with matrices. Continuing with the above example of the trajectory, we introduce the parameter vector <math>\vec{p}</math>, which is a column vector with the parameters <math>a,b,c</math>. We also introduce matrix <math>M</math> with  <math>N</math> rows and   3 columns. The first, second, and third columns of <math>M</math> have the elements <math> [ t_1^2,  ..., t_N^2 ] </math>, <math> [ t_1,  ..., t_N ] </math>, and <math> [ 1,  ..., 1 ] </math>. Finally we introduce the data vector <math>\vec{d}</math>, which contains all the data. Given these definitions we can write
 <math> \vec{d} = M\vec{p} </math>.