Analysis 3
Latest revision as of 00:16, 11 February 2012
Linear Models
A linear model is a model that is linear in the parameters. Recall the example of the trajectory from the previous tutorials. In this example the data <math>d_i</math> were modeled by a quadratic function:
<math> d_i = a t_i^2 + b t_i + c </math>.
Note that this is quadratic in the sense that it is quadratic in time, not in the parameters <math>a,b,c</math>. With respect to the parameters, this model is linear. An example of a nonlinear model is <math> d_i = cos( 2\pi \nu t_i ) </math>, where you are fitting a frequency parameter <math>\nu</math>. In this section you'll learn how to fit linear models to data.
Matrix Notation
When dealing with linear systems, it's generally easier to work with matrices. Continuing with the above example of the trajectory, we introduce the parameter vector <math>\vec{p}</math>, which is a column vector with the parameters <math>a,b,c</math>. We also introduce the <math>N \times 3</math> matrix <math>M</math>, whose first, second, and third columns have the elements <math> [ t_1^2, ..., t_N^2 ] </math>, <math> [ t_1, ..., t_N ] </math>, and <math> [ 1, ..., 1 ] </math>. Finally we introduce the data vector <math>\vec{d}</math>, which contains all the data. Given these definitions we can write
<math> \vec{d} = M\vec{p} </math>.
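As a concrete sketch, the design matrix <math>M</math> for the quadratic trajectory model can be assembled column by column. The sample times and parameter values below are made up purely for illustration:

```python
import numpy as np

# Hypothetical sample times (an assumption for the example, not tutorial data).
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
N = len(t)

# Design matrix M: columns are t^2, t, and 1, matching d_i = a*t_i^2 + b*t_i + c.
M = np.column_stack([t**2, t, np.ones(N)])

# For a known parameter vector p = (a, b, c), the noiseless model is M @ p.
p = np.array([-4.9, 20.0, 1.0])   # illustrative values only
d = M @ p
```

Each row of <math>M</math> evaluates the three basis functions at one sample time, so the matrix product reproduces the quadratic model at every point.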
Fitting a Linear Model
In the previous tutorial, we introduced the goodness of fit statistic <math>\chi^2</math>. For the trajectory example, we have
<math> \chi^2 = \sum_{i=1}^N \frac{(d_i - (a t_i^2 + b t_i + c))^2}{s_i^2} </math>.
We can continue to simplify our analysis by introducing the data covariance matrix <math>D</math>, which for our purposes will be an <math>N \times N</math> diagonal matrix where the <math>i^{th}</math> diagonal element is the variance <math>s_i^2</math> on the <math>i^{th}</math> datum. With this matrix the <math>\chi^2</math> can be written as
<math> \chi^2 = (\vec{d}-M\vec{p})^T D^{-1} (\vec{d}-M\vec{p}) </math>.
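The sum form and the matrix form of <math>\chi^2</math> are the same number, which can be checked numerically. All values below (times, errors, measurements, parameters) are invented for the check:

```python
import numpy as np

# Invented data: sample times, per-point standard deviations s_i, measurements.
t = np.array([0.0, 1.0, 2.0, 3.0])
s = np.array([0.5, 0.5, 1.0, 1.0])
d = np.array([1.2, 16.0, 21.5, 17.0])
a, b, c = -4.9, 20.0, 1.0
p = np.array([a, b, c])
M = np.column_stack([t**2, t, np.ones_like(t)])

# chi^2 as the explicit sum over data points
chi2_sum = np.sum((d - (a * t**2 + b * t + c))**2 / s**2)

# chi^2 in matrix form with the diagonal data covariance matrix D
D = np.diag(s**2)
r = d - M @ p
chi2_mat = r @ np.linalg.inv(D) @ r
```

The two expressions agree because <math>D^{-1}</math> is diagonal with entries <math>1/s_i^2</math>, so the quadratic form reduces to the weighted sum of squared residuals.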
The best fit parameters will minimize the <math>\chi^2</math> function. Therefore, setting the gradient of this function with respect to the parameters <math>\vec{p}</math> equal to zero, we get an equation for the best fit parameters
<math> M^T D^{-1} M \vec{p} = M^T D^{-1} \vec{d}</math>.
Therefore, the vector of best fit parameters can be obtained by computing the right hand side of this equation and applying the inverse of the matrix <math>M^T D^{-1} M</math>.
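A minimal sketch of this fit, using simulated data (the true parameters and noise level are assumptions of the example). Solving the linear system directly is preferable to forming the explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: quadratic trajectory plus Gaussian noise with known errors.
t = np.linspace(0.0, 4.0, 50)
s = np.full_like(t, 0.3)                 # assumed per-point standard deviations
true_p = np.array([-4.9, 20.0, 1.0])     # assumed true parameters
M = np.column_stack([t**2, t, np.ones_like(t)])
d = M @ true_p + rng.normal(0.0, s)

Dinv = np.diag(1.0 / s**2)               # inverse data covariance matrix

# Normal equations: (M^T D^-1 M) p = M^T D^-1 d
A = M.T @ Dinv @ M
rhs = M.T @ Dinv @ d
p_fit = np.linalg.solve(A, rhs)
```

With well-sampled data the fitted vector `p_fit` should land close to the true parameters, up to the noise.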
Now you have the best fit parameters given the data. The next step is to compute the <math>\chi^2</math> associated with these parameters. See the tutorial on goodness of fit to evaluate whether the best fit model is a good fit. If it is a good fit, then you can believe your model and go on to propagate errors to your model.
Errors on the Model Parameters
In the first analysis tutorial, we already learned to propagate errors on the data to errors on a function of the data.
<math>\sigma_f^2 = \sum_i \frac{\partial f}{\partial x_i}^2 \sigma_{x_i}^2 </math>.
We can rewrite this in matrix form in terms of the gradient <math>\nabla f</math> and the data covariance matrix <math>D</math>:
<math>\sigma_f^2 = (\nabla f)^T D \nabla f </math>.
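For a function that is linear in the data, the gradient is constant and the two forms of the propagation formula can be compared directly. The weights and errors below are arbitrary illustrative values:

```python
import numpy as np

# A simple linear function of the data, f(x) = w . x, whose gradient is just w.
w = np.array([2.0, -1.0, 0.5])            # arbitrary weights
sigma_x = np.array([0.1, 0.2, 0.3])       # assumed errors on the data
D = np.diag(sigma_x**2)                   # uncorrelated data covariance matrix

# Sum form: sigma_f^2 = sum_i (df/dx_i)^2 sigma_i^2
var_sum = np.sum(w**2 * sigma_x**2)

# Matrix form: sigma_f^2 = (grad f)^T D (grad f)
var_mat = w @ D @ w
```

When <math>D</math> is diagonal the matrix form collapses to the familiar sum; the matrix form also handles correlated errors, which the sum form cannot.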
The best fit parameters are functions of the data. From the above derivations, the parameter function (written as a vector) is:
<math> \vec{p} = (M^T D^{-1} M)^{-1} M^T D^{-1} \vec{d} </math>
Let <math> \delta_i </math> be a column vector with 1 in the <math>i^{th}</math> position and zero elsewhere. The <math>i^{th}</math> parameter is then
<math> p_i = \delta_i^T (M^T D^{-1} M)^{-1} M^T D^{-1} \vec{d} </math>,
and the partial derivative of the <math>i^{th}</math> parameter with respect to the <math>j^{th}</math> datum is
<math>\frac{ \partial p_i}{ \partial d_j } = \delta_i^T (M^T D^{-1} M)^{-1} M^T D^{-1} \delta_j </math>.
(Note that <math>\delta_i</math> has the dimensionality of the parameters, whereas <math>\delta_j</math> has the dimensionality of the data.) This last step follows from the fact that the expression for the best fit parameter is linear in the data. It follows that the gradient of the <math>i^{th}</math> parameter is
<math> \nabla p_i = (\delta_i^T (M^T D^{-1} M)^{-1} M^T D^{-1})^T </math>,
where the transpose makes this a column vector. Substituting this into the matrix-formatted error propagation equation, we obtain the error on the <math>i^{th}</math> parameter:
<math> \sigma_{p_i}^2 = (\delta_i^T (M^T D^{-1} M)^{-1} M^T D^{-1}) D (\delta_i^T (M^T D^{-1} M)^{-1} M^T D^{-1})^T </math>
<math> = \delta_i^T (M^T D^{-1} M)^{-1} M^T D^{-1} D D^{-1} M (M^T D^{-1} M)^{-1} \delta_i </math>
<math> = \delta_i^T (M^T D^{-1} M)^{-1} M^T D^{-1} M (M^T D^{-1} M)^{-1} \delta_i </math>
<math> = \delta_i^T (M^T D^{-1} M)^{-1} \delta_i </math>
In words: the variance on the <math>i^{th}</math> parameter is the <math>i^{th}</math> diagonal element of <math>(M^T D^{-1} M)^{-1}</math>. In analogy with the data covariance matrix <math>D</math>, this matrix is called the parameter covariance matrix.
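The analytic parameter covariance can be checked with a Monte Carlo experiment: simulate many noisy data sets, fit each one, and compare the scatter of the fitted parameters with the square roots of the diagonal of <math>(M^T D^{-1} M)^{-1}</math>. All setup values below are assumptions of the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed experimental setup: sample times, errors, and true parameters.
t = np.linspace(0.0, 4.0, 30)
s = np.full_like(t, 0.2)
true_p = np.array([-4.9, 20.0, 1.0])
M = np.column_stack([t**2, t, np.ones_like(t)])
Dinv = np.diag(1.0 / s**2)

# Analytic parameter covariance matrix: (M^T D^-1 M)^-1.
cov_p = np.linalg.inv(M.T @ Dinv @ M)
sigma_p = np.sqrt(np.diag(cov_p))        # 1-sigma errors on a, b, c

# Monte Carlo: fit many simulated data sets and measure the parameter scatter.
fits = []
for _ in range(2000):
    d = M @ true_p + rng.normal(0.0, s)
    fits.append(np.linalg.solve(M.T @ Dinv @ M, M.T @ Dinv @ d))
emp_sigma = np.std(np.array(fits), axis=0)
```

The empirical scatter `emp_sigma` should agree with the analytic errors `sigma_p` to within the Monte Carlo sampling noise, confirming the derivation above.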