Getting started with Non-Linear Least-Squares Fitting — Non-Linear Least-Squares Minimization and Curve-Fitting for Python (2024)

The lmfit package is designed to provide simple tools to help you build complex fitting models for non-linear least-squares problems and apply these models to real data. This section gives an overview of the concepts and describes how to set up and perform simple fits. Some basic knowledge of Python, numpy, and modeling data is assumed.

To do a non-linear least-squares fit of a model to data, or for a variety of other optimization problems, the main task is to write an objective function that takes the values of the fitting variables and calculates either a scalar value to be minimized or an array of values that is to be minimized in the least-squares sense. For many data fitting processes, the least-squares approach is used, and the objective function should return an array of (data-model), perhaps scaled by some weighting factor such as the inverse of the uncertainty in the data. For such a problem, the chi-square (\(\chi^2\)) statistic is often defined as:

\[\chi^2 = \sum_i^{N} \frac{[y^{\rm meas}_i - y_i^{\rm model}({\bf{v}})]^2}{\epsilon_i^2}\]

where \(y^{\rm meas}_i\) is the set of measured data, \(y_i^{\rm model}({\bf{v}})\) is the model calculation, \({\bf{v}}\) is the set of variables in the model to be optimized in the fit, and \(\epsilon_i\) is the estimated uncertainty in the data.
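To make the statistic concrete, here is a minimal numpy sketch that evaluates \(\chi^2\) for some made-up measurements, model values, and uncertainties (all numbers here are illustrative, not from a real fit):

```python
import numpy as np

# Hypothetical measured data, model values, and uncertainties
y_meas = np.array([1.1, 2.0, 2.9])
y_model = np.array([1.0, 2.0, 3.0])
eps = np.array([0.1, 0.1, 0.1])

# chi-square: sum of squared, uncertainty-scaled residuals
chi2 = np.sum(((y_meas - y_model) / eps) ** 2)
print(chi2)  # approximately 2.0
```

Note that a least-squares solver is handed the array of scaled residuals `(y_meas - y_model) / eps`, not this scalar; the solver forms the sum of squares internally.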

In a traditional non-linear fit, one writes an objective function that takes the variable values and calculates the residual \(y^{\rm meas}_i - y_i^{\rm model}({\bf{v}})\), or the residual scaled by the data uncertainties, \([y^{\rm meas}_i - y_i^{\rm model}({\bf{v}})]/{\epsilon_i}\), or some other weighting factor. As a simple example, one might write an objective function like this:

from numpy import exp, sin

def residual(vars, x, data, eps_data):
    amp = vars[0]
    phaseshift = vars[1]
    freq = vars[2]
    decay = vars[3]

    model = amp * sin(x*freq + phaseshift) * exp(-x*x*decay)

    return (data - model) / eps_data

To perform the minimization with scipy.optimize, one would do:

from scipy.optimize import leastsq

vars = [10.0, 0.2, 3.0, 0.007]
out = leastsq(residual, vars, args=(x, data, eps_data))
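Put together with some synthetic data, a complete run might look like the sketch below. The "true" parameter values, the noise level, and the starting guess (chosen near the truth, since a sinusoidal model has many local minima) are all made up for illustration:

```python
import numpy as np
from scipy.optimize import leastsq

def residual(vars, x, data, eps_data):
    amp, phaseshift, freq, decay = vars
    model = amp * np.sin(x*freq + phaseshift) * np.exp(-x*x*decay)
    return (data - model) / eps_data

# Synthetic data from known "true" parameters (illustrative values)
x = np.linspace(0, 10, 201)
true_vals = [5.0, 0.5, 2.0, 0.02]
np.random.seed(0)
data = (true_vals[0] * np.sin(x*true_vals[2] + true_vals[1])
        * np.exp(-x*x*true_vals[3])
        + np.random.normal(scale=0.1, size=x.size))
eps_data = 0.1 * np.ones_like(x)

# Initial guess near the true values
vars0 = [7.0, 0.4, 2.1, 0.01]
out, ier = leastsq(residual, vars0, args=(x, data, eps_data))
print(out)  # best-fit [amp, phaseshift, freq, decay]
```

With default settings, leastsq returns the best-fit array and an integer status flag; recovering uncertainties requires asking for the full output and scaling the covariance matrix yourself.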

Though it is wonderful to be able to use Python for such optimization problems, and the scipy library is robust and easy to use, the approach here is not terribly different from how one would do the same fit in C or Fortran. There are several practical challenges to using this approach, including:

  1. The user has to keep track of the order of the variables and their meaning – vars[0] is the amplitude, vars[2] is the frequency, and so on, although there is no intrinsic meaning to this order.
  2. If the user wants to fix a particular variable (not vary it in the fit), the residual function has to be altered to have fewer variables, and have the corresponding constant value passed in some other way. While reasonable for simple cases, this quickly becomes significant work for more complex models, and greatly complicates modeling for people not intimately familiar with the details of the fitting code.
  3. There is no simple, robust way to put bounds on values for the variables, or to enforce mathematical relationships between the variables. In fact, the optimization methods that do provide bounds require bounds to be set for all variables with separate arrays that are in the same arbitrary order as the variable values. Again, this is acceptable for small or one-off cases, but becomes painful if the fitting model needs to change.

These shortcomings are due solely to the use of traditional arrays of variables, which closely matches the implementation of the underlying Fortran code. The lmfit module overcomes these shortcomings by using a core feature of Python – objects. The key concept in lmfit is to use Parameter objects instead of plain floating point numbers as the variables for the fit. By using Parameter objects (or the closely related Parameters – a dictionary of Parameter objects), one can:

  1. not care about the order of variables, but refer to Parameters by meaningful names.
  2. place bounds on Parameters as attributes, without worrying about order.
  3. fix Parameters, without having to rewrite the objective function.
  4. place algebraic constraints on Parameters.

To illustrate the value of this approach, we can rewrite the above exampleas:

from numpy import exp, sin

from lmfit import minimize, Parameters

def residual(params, x, data, eps_data):
    amp = params['amp'].value
    pshift = params['phase'].value
    freq = params['frequency'].value
    decay = params['decay'].value

    model = amp * sin(x*freq + pshift) * exp(-x*x*decay)

    return (data - model) / eps_data

params = Parameters()
params.add('amp', value=10)
params.add('decay', value=0.007)
params.add('phase', value=0.2)
params.add('frequency', value=3.0)

out = minimize(residual, params, args=(x, data, eps_data))

At first look, we simply replaced a list of values with a dictionary, accessed by name – not a huge improvement. But each named Parameter in the Parameters object holds additional attributes to modify its value during the fit. For example, Parameters can be fixed or bounded. This can be done when they are defined:

params = Parameters()
params.add('amp', value=10, vary=False)
params.add('decay', value=0.007, min=0.0)
params.add('phase', value=0.2)
params.add('frequency', value=3.0, max=10)

(where vary=False will prevent the value from changing in the fit, and min=0.0 will set a lower bound on that parameter's value), or after being defined, by setting the corresponding attributes after they have been created:

params['amp'].vary = False
params['decay'].min = 0.10

Importantly, our function to be minimized remains unchanged.

The params object can be copied and modified to make many user-level changes to the model and fitting process. Of course, most of the information about how your data is modeled goes into the objective function, but the approach here allows some external control; that is, control by the user of the objective function instead of just by its author.

Finally, in addition to the Parameters approach to fitting data, lmfit allows you to easily switch optimization methods without rewriting your objective function, and provides tools for writing fitting reports and for better determining the confidence levels for Parameters.

© Copyright 2014, Matthew Newville, The University of Chicago, Till Stensitzki, Freie Universitat Berlin. Created using Sphinx 1.1.3.
