\documentclass[12pt]{article}
\usepackage{geometry} % give some flexibility to format
\geometry{verbose,tmargin=0.75in,bmargin=0.75in,lmargin=1in,rmargin=1in} % needs the package geometry to work
\usepackage{amsmath}
\newcommand{\lmaInvCov}{{\boldsymbol \alpha}(\vec{p}, \lambda)}
\newcommand{\chisqd}[1]{\frac{\partial \chi^2}{\partial p_{#1}}} % single derivative
\newcommand{\chisqdd}[2]{\frac{\partial^2 \chi^2}{\partial p_{#1} \partial p_{#2}}} % double derivative
\begin{document}
\title{Data Reduction Notes}
\author{Tobias Marriage}
\date{Feb 2011}
\maketitle

\section{Non-linear Fitting}

To interpret a dataset with measurements $\{d_i\}$, we have a model which, given $n$ model parameters $\vec{p}$, gives corresponding values $\{m_i(\vec{p})\}$. The chi-squared for this model is
\begin{equation}
\chi^2(\vec{p}) = \sum_i \frac{[d_i - m_i(\vec{p})]^2}{\sigma_i^2},
\label{eq:chi-sq}
\end{equation}
where $\sigma_i^2$ is the estimate of the variance on $d_i$. The errors on the $d_i$ are assumed uncorrelated.

For linear models, the chi-squared function is a quadratic in the parameters, so we can write
\begin{equation}
\chi^2(\vec{p}) = \chi^2(\vec{p}_0) + \frac{d\chi^2}{d\vec{p}}(\vec{p}_0)\, (\vec{p}-\vec{p}_0) + \frac{1}{2}\frac{d^2\chi^2}{d\vec{p}^{\,2}}(\vec{p}_0)\, (\vec{p}-\vec{p}_0)^2,
\end{equation}
where $\vec{p}_0$ is an arbitrary starting point in parameter space. The minimizing ``jump'' in parameter space, $\delta\vec{p} = (\vec{p}_{min}-\vec{p}_0)$, is given by
\begin{eqnarray}
0 & = & \frac{d\chi^2}{d\vec{p}}(\vec{p}_0) + \frac{d^2\chi^2}{d\vec{p}^{\,2}}(\vec{p}_0)\,\delta\vec{p} \nonumber \\
\delta\vec{p} & = & -\left(\frac{d^2\chi^2}{d\vec{p}^{\,2}}(\vec{p}_0)\right)^{-1} \frac{d\chi^2}{d\vec{p}}(\vec{p}_0).
\label{eq:quad_decent}
\end{eqnarray}
Note that in general the first derivative of chi-squared is a vector (the gradient), and the second derivative is a matrix (the hessian).

Many times, however, the chi-squared function is not purely quadratic. This is the case for non-linear models, and then the above prescription for a purely quadratic function can fail. Instead we must follow our nose, potentially taking many steps around parameter space to find the minimum of the chi-squared. One easy prescription is to step down the gradient of chi-squared. The step in parameter space is then
\begin{equation}
\delta\vec{p} = -C\, \frac{d\chi^2}{d\vec{p}}(\vec{p}_0),
\label{eq:grad_decent}
\end{equation}
where $C$ is a positive diagonal matrix of dimension $n\times n$ which sets the step size (the minus sign takes the step downhill). In general, we can use both step formulae (Eq~\ref{eq:quad_decent} and Eq~\ref{eq:grad_decent}) to explore the parameter space in search of the best fit values.

\subsection{Gradient and Hessian}

The next step towards an algorithm for non-linear model fitting is constructing the gradient vector and hessian matrix of the chi-squared function (Eq~\ref{eq:chi-sq}). The gradient is straightforward. The $j^{th}$ element of this vector is
\begin{equation}
\chisqd{j}(\vec{p}_0) = - 2 \sum_i \frac{[d_i - m_i(\vec{p}_0)]}{\sigma_i^2} \left( \frac{\partial m_i}{\partial p_j}(\vec{p}_0) \right).
\label{eq:gradient}
\end{equation}
And the $(j,k)^{th}$ element of the hessian is
\begin{eqnarray}
\chisqdd{j}{k}(\vec{p}_0) & = & - 2 \sum_{i} \left[ -\sigma_i^{-2} \left( \frac{\partial m_i}{\partial p_j}(\vec{p}_0) \right) \left( \frac{\partial m_i}{\partial p_k}(\vec{p}_0) \right) + \frac{[d_i - m_i(\vec{p}_0)]}{\sigma_i^2} \left( \frac{\partial^2 m_i}{\partial p_j\partial p_k}(\vec{p}_0) \right) \right] \nonumber \\
& \approx & 2 \sum_{i} \sigma_i^{-2} \left( \frac{\partial m_i}{\partial p_j}(\vec{p}_0) \right) \left( \frac{\partial m_i}{\partial p_k}(\vec{p}_0) \right).
\label{eq:hessian}
\end{eqnarray}
The second term in the sum on the first line tends to cancel for a good model near the chi-squared minimum: the quantity $d_i - m_i(\vec{p}_0)$ will be randomly positive and negative across the data such that its sum over $i$ should be small. (Why isn't this the case for Eq~\ref{eq:gradient}? My hypothesis is that both of these terms are small near the minimum and it is the first term in the hessian which drives the final steps of the search.) The second term is also supposedly a numerical nuisance, so we drop it and take the second line above as the effective expression for the hessian.
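
To make these formulae concrete, consider a small worked example (the two-parameter exponential model here is purely illustrative and is not used elsewhere in these notes): $m_i(\vec{p}) = p_1 e^{-p_2 t_i}$, evaluated at known times $t_i$. The model derivatives are
\begin{equation}
\frac{\partial m_i}{\partial p_1} = e^{-p_2 t_i}, \qquad \frac{\partial m_i}{\partial p_2} = -p_1 t_i\, e^{-p_2 t_i},
\end{equation}
so Eq~\ref{eq:gradient} and Eq~\ref{eq:hessian} give, for instance,
\begin{eqnarray}
\chisqd{1} & = & -2 \sum_i \sigma_i^{-2} \left[d_i - p_1 e^{-p_2 t_i}\right] e^{-p_2 t_i}, \nonumber \\
\chisqdd{1}{2} & \approx & -2 \sum_i \sigma_i^{-2}\, p_1 t_i\, e^{-2 p_2 t_i}.
\end{eqnarray}
The remaining gradient and hessian elements follow in the same way.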
\subsection{Levenberg-Marquardt}

The final step towards solving a non-linear least squares system is providing a prescription for stepping through parameter space, and in particular for choosing the constant in Eq~\ref{eq:grad_decent}. The Levenberg-Marquardt approach is the classic way to do this \cite{levenberg:1944}. In this approach the constant for gradient descent is chosen to be proportional to the inverse of the second derivative with respect to the parameter in question. Following this prescription, Eq~\ref{eq:quad_decent} and Eq~\ref{eq:grad_decent} can be written
\begin{eqnarray}
-\left[\begin{matrix} \chisqdd{1}{1} & \ldots & \chisqdd{1}{n} \\ \vdots & \ddots & \vdots\\ \chisqdd{n}{1} & \ldots & \chisqdd{n}{n} \end{matrix}\right] \left[\begin{matrix} \delta p_1 \\ \vdots \\ \delta p_n\end{matrix} \right] & = & \left[\begin{matrix} \chisqd{1} \\ \vdots \\ \chisqd{n}\end{matrix} \right], \label{eq:lma0}\\
-\lambda \left[\begin{matrix} \chisqdd{1}{1} & & 0 \\ & \ddots & \\ 0 & & \chisqdd{n}{n} \end{matrix}\right] \left[\begin{matrix} \delta p_1 \\ \vdots \\ \delta p_n\end{matrix} \right] & = & \left[\begin{matrix} \chisqd{1} \\ \vdots \\ \chisqd{n}\end{matrix} \right], \label{eq:lma1}
\end{eqnarray}
where $\lambda$ is a constant. Eq~\ref{eq:lma0} and Eq~\ref{eq:lma1} specify, respectively, the quadratic (Eq~\ref{eq:quad_decent}) and gradient (Eq~\ref{eq:grad_decent}) contributions to $\delta\vec{p}$. These equations can be combined into a single operation on the gradient by scaling the diagonal of the hessian in Eq~\ref{eq:lma0} by $(1+\lambda)$:
\begin{equation}
\left[\begin{matrix} \delta p_1 \\ \vdots \\ \delta p_n\end{matrix} \right] = -\left( \left[\begin{matrix} \chisqdd{1}{1} (1 + \lambda) & \ldots & \chisqdd{1}{n} \\ \vdots & \ddots & \vdots\\ \chisqdd{n}{1} & \ldots & \chisqdd{n}{n} (1 + \lambda) \end{matrix}\right] \right)^{-1} \left[\begin{matrix} \chisqd{1} \\ \vdots \\ \chisqd{n}\end{matrix} \right].
\end{equation}
For $\lambda \ll 1$ this reduces to the quadratic step (Eq~\ref{eq:lma0}); for $\lambda \gg 1$ the diagonal dominates and the step reduces to the gradient descent prescription (Eq~\ref{eq:lma1}). Setting the matrix in parentheses to $\lmaInvCov$, the expression simplifies to
\begin{equation}
\delta \vec{p} = -\lmaInvCov^{-1} \nabla_p \chi^2.
\label{eq:lma3}
\end{equation}
As a prescription for choosing $\lambda$ and stepping around parameter space, here is a suggestion from the ever useful Numerical Recipes books \cite{press:1992}. Start with a modest value of $\lambda$ (say $\lambda \sim 10^{-3}$) and then iterate:
\begin{enumerate}
\item Solve Eq~\ref{eq:lma3} for $\delta\vec{p}$.
\item If $\chi^2(\vec{p}+\delta\vec{p}) > \chi^2(\vec{p})$, do \emph{not} set $\vec{p} = \vec{p}+\delta\vec{p}$ (reject the step) and increase $\lambda$ by a factor of 10. Return to start.
\item If $\chi^2(\vec{p}+\delta\vec{p}) < \chi^2(\vec{p})$, set $\vec{p} = \vec{p}+\delta\vec{p}$ and decrease $\lambda$ by a factor of 10. Return to start.
\end{enumerate}
You stop iterating this sequence when an accepted step decreases the chi-squared, $\chi^2(\vec{p}) - \chi^2(\vec{p}+\delta\vec{p})$, by an amount much less than one.
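
For reference, here is a minimal numerical sketch of this prescription in Python with \texttt{numpy}. It is only an illustration under assumptions made for the example (the same illustrative exponential model as above, invented function and variable names, and an arbitrary stopping tolerance); it is not code used for any analysis in these notes. The loop builds the gradient (Eq~\ref{eq:gradient}) and approximate hessian (Eq~\ref{eq:hessian}), solves Eq~\ref{eq:lma3} for the step, and applies the accept/reject rule above.
\begin{verbatim}
import numpy as np

def lm_fit(t, d, sigma, p0, lam=1e-3, max_iter=100, tol=1e-4):
    # Minimal Levenberg-Marquardt sketch for the illustrative
    # model m_i(p) = p[0] * exp(-p[1] * t_i).
    def model(p):
        return p[0] * np.exp(-p[1] * t)

    def jacobian(p):
        # Columns are dm_i/dp_j for the assumed model.
        e = np.exp(-p[1] * t)
        return np.column_stack([e, -p[0] * t * e])

    def chisq(p):
        return np.sum(((d - model(p)) / sigma) ** 2)

    p = np.asarray(p0, dtype=float)
    w = 1.0 / sigma ** 2
    for _ in range(max_iter):
        r = d - model(p)
        J = jacobian(p)
        grad = -2.0 * J.T @ (w * r)          # gradient of chi-squared
        hess = 2.0 * (J * w[:, None]).T @ J  # approximate hessian
        # LM matrix: hessian with diagonal scaled by (1 + lambda)
        alpha = hess + lam * np.diag(np.diag(hess))
        dp = np.linalg.solve(alpha, -grad)   # the LM step
        if chisq(p + dp) < chisq(p):
            drop = chisq(p) - chisq(p + dp)
            p = p + dp                       # accept, relax lambda
            lam /= 10.0
            if drop < tol:                   # accepted step barely helps
                break
        else:
            lam *= 10.0                      # reject, more gradient descent
    return p

# Usage: fake data from the assumed model, then recover parameters.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 50)
sigma = 0.05 * np.ones_like(t)
d = 2.0 * np.exp(-0.7 * t) + rng.normal(0.0, sigma)
print(lm_fit(t, d, sigma, p0=[1.0, 1.0]))
\end{verbatim}
Because a step is only accepted when the chi-squared decreases, the loop can never move uphill; the price is an occasional wasted solve while $\lambda$ grows.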
\begin{thebibliography}{99}
\bibitem{levenberg:1944} Kenneth Levenberg. ``A Method for the Solution of Certain Non-Linear Problems in Least Squares''. {\it The Quarterly of Applied Mathematics} 2: 164--168 (1944).
\bibitem{press:1992} William Press et al. {\it Numerical Recipes in C}. Cambridge: Cambridge University Press, 1992.
\end{thebibliography}

\end{document}