Curve fitting using scipy and lmfit | Mandeep Singh Basson (2024)

Introduction

Data from experiments or simulations exists as discrete numbers, which I usually store as text or binary files. Let’s look at the data from one of my experiments:
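
Loading such a file takes only a couple of lines of NumPy. Here is a minimal sketch; the file name 'creep_test.txt' and its two-column time/stress layout are my assumptions for illustration, not the actual file from the experiment.

# Reading two-column (time, stress) data from a text file.
# 'creep_test.txt' is a hypothetical file name.
import numpy as np

data = np.loadtxt('creep_test.txt')
time, stress = data[:, 0], data[:, 1]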

As soon as I see this data, I can tell that there is a non-linear (maybe exponential) relationship between time and stress. Wouldn’t it be awesome if I could fit a curve through this data so that I can predict/extrapolate the stress value at any given time? Well, one of the prime tasks in geotechnical engineering is trying to fit meaningful relationships to sometimes quite absurd and random data.

The easiest way to fit a function to data would be to import the data into Excel and use its predefined Trendline function. The Trendline option is quite robust for a common set of functions (linear, power, exponential, etc.), but it lacks the complexity and rigor often required in engineering applications. This is where our best friend Python comes into the picture.

In this tutorial, we will look into:
1. scipy’s curve_fit function
2. lmfit module (which is what I use most of the time)

1. Generate data for a linear fitting

Let’s generate some data whose fit would be a straight line with the equation:

\begin{equation}y = m x + c\end{equation}

where m is the slope of the line, c is the intercept at x = 0, and x (Time) and y (Stress) are our data. The y axis data is usually the value measured during the experiment/simulation, and we are trying to find how the y axis quantity depends on the x axis quantity.

# Importing numpy for creating data and matplotlib for plotting
import numpy as np
import matplotlib.pyplot as plt

# Creating x axis data
x = np.linspace(0, 10, 100)

# Creating random data for the y axis
# Here the slope (m) is predefined to be 2.39645
# The intercept is 0
# The random.normal method just adds some noise to the data
y = 2.39645 * x + np.random.normal(0, 2, 100)

# Plotting the data
plt.scatter(x, y, c='black')
plt.xlabel('Time (sec)')
plt.ylabel('Stress (kPa)')
plt.show()

2. Using scipy’s curve_fit function

Scipy has a powerful module for curve fitting, which uses the non-linear least squares method to fit a function to our data. The documentation is available at https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html.

To use curve_fit on the above data, we need to define a linear function that will be used for the fitting. The output will be the slope (m) and intercept (c) for our data, along with the covariance in these values; fitting statistics such as $R^2$ or $\chi^2$ we will compute ourselves a little later.

# Calling scipy's curve_fit function from the optimize module
from scipy.optimize import curve_fit

# Defining a fitting function
def linear_fit(x, m, c):
    return m * x + c

'''
1. Using the curve_fit function to fit the random linear data.
2. params returns an array with the best-fit values of the fitting
   parameters. In our case the first entry in params is the slope m
   and the second entry is the intercept c.
3. covariance returns the covariance matrix for our fitted parameters.
4. The first argument f is the defined fitting function.
5. xdata and ydata are the x and y data we generated above.
'''
params, covariance = curve_fit(f=linear_fit, xdata=x, ydata=y)
print('Slope (m) is ', params[0])
print('Intercept (c) is ', params[1])
print(covariance)
Slope (m) is  2.430364409972301
Intercept (c) is  -0.11571245549643683
[[ 0.00434152 -0.02170757]
 [-0.02170757  0.14544807]]

The slope returned by curve_fit is slightly lower or higher than the input slope. This is due to the random noise introduced into the data. Also, because of this noise, the intercept is non-zero.
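
A practical aside: since the noise is drawn randomly, every run of the data-generation snippet gives slightly different fitted values. If you want reproducible numbers, seed NumPy's generator before creating the data. A minimal sketch (the seed 42 and the variable y_repro are arbitrary choices of mine, not part of the original walkthrough):

# Seeding the random generator makes the noisy data, and hence the
# fitted slope and intercept, reproducible across runs.
rng = np.random.default_rng(42)  # arbitrary seed
y_repro = 2.39645 * x + rng.normal(0, 2, 100)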

What if we want the line to pass through the origin (x = 0, y = 0)? In other words, what if we want the intercept to be zero?

We can force curve_fit to work within bounds we define for the different fitting parameters. Let’s force the intercept (c) to be zero and the slope (m) to stay between 2 and 3.

Also, let’s plot the two sets of fitted parameters we obtained along with the data we generated.

'''
1. The bounds argument provides the bounds for our fitting parameters.
2. The format for bounds is ((lower bound of m, lower bound of c),
   (upper bound of m, upper bound of c)).
3. curve_fit doesn't have an argument to hold a parameter fixed, so I
   gave c a very small upper bound to force it close to 0.
'''
params2, covariance2 = curve_fit(f=linear_fit, xdata=x, ydata=y,
                                 bounds=((2, 0), (3, 0.00000001)))
print('Slope (m) is ', params2[0])
print('Intercept (c) is ', params2[1])
print(covariance2)

plt.scatter(x, y, c='black')
plt.xlabel('Time (sec)')
plt.ylabel('Stress (kPa)')
plt.plot(x, linear_fit(x, params[0], params[1]), c='red', ls='-', lw=5)
plt.plot(x, linear_fit(x, params2[0], params2[1]), c='blue', ls='-', lw=2)
plt.show()
Slope (m) is  2.4130947618073093
Intercept (c) is  9.39448469276871e-15
[[ 0.0043456  -0.02172798]
 [-0.02172798  0.14558476]]

What if we want some fit statistics, such as $R^2$?

Although the covariance matrix provides information about the variance of the fitted parameters, most people use the standard deviation and the coefficient of determination ($R^2$) to estimate the ‘goodness’ of fit. I use them all the time for simpler fittings (linear, exponential, power, etc.).

# Getting the standard deviations of the fitted parameters
standarddevparams2 = np.sqrt(np.diag(covariance2))

# Getting the R^2 value for the fitting
# Read more at https://en.wikipedia.org/wiki/Coefficient_of_determination
# Step 1 : Get the residuals for the fitting
residuals = y - linear_fit(x, params2[0], params2[1])
# Step 2 : Get the sum of squares of the residuals
squaresumofresiduals = np.sum(residuals**2)
# Step 3 : Get the total sum of squares using the mean of the y values
squaresum = np.sum((y - np.mean(y))**2)
# Step 4 : Get the R^2
R2 = 1 - (squaresumofresiduals / squaresum)

print('The slope (m) is ', params2[0], '+-', standarddevparams2[0])
print('The intercept (c) is ', params2[1], '+-', standarddevparams2[1])
print('The R^2 value is ', R2)
The slope (m) is  2.4130947618073093 +- 0.06592113029190362
The intercept (c) is  9.39448469276871e-15 +- 0.3815557049874131
The R^2 value is  0.9327449525970685

Well, the $R^2$ value for our fit is < 1, so it is not a perfect fit (although 0.93 is not bad; it all depends on what you are looking for). Now look at the standard deviation of the intercept. Scipy’s curve_fit is not able to exactly force the intercept to be zero, which causes that high standard deviation and a lower $R^2$ value.
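
Before reaching for another library, note one workaround (my own aside, not part of the original flow): a parameter can be fixed exactly by simply leaving it out of the fitting function.

# Fixing the intercept at exactly zero by omitting it from the model;
# curve_fit then only sees the slope as a free parameter.
def linear_fit_origin(x, m):
    return m * x

params_origin, _ = curve_fit(f=linear_fit_origin, xdata=x, ydata=y)
print('Slope (m) through the origin is ', params_origin[0])

This is clean for one fixed parameter, but rewriting the function for every combination of fixed and bounded parameters quickly becomes unmanageable.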

This is where lmfit (my favorite fitting package) comes into play. As the complexity of the fitting function and parameter bounds increases, curve_fit becomes less accurate and more cumbersome.

3. Using the lmfit module

Lmfit provides a high-level interface to non-linear optimization and curve fitting problems for Python. It builds on and extends many of the optimization methods of scipy.optimize, is quite mature, and provides a number of useful enhancements and quality-of-life improvements.

The most welcome feature is the Parameters and results classes, which, along with the minimize function, provide a seamless pythonic way to handle any curve fitting need. Personally, I love it and have been using it for a couple of years for many, many curve fittings in different coordinate systems. With lmfit I can clearly and concisely define my fitting functions and parameter bounds, and display quite a lot of statistical data for my fit.

Let’s try our linear fitting using lmfit!

'''
1. Importing the module. If you don't have lmfit, you can install it
   with pip or conda. Instructions at
   https://lmfit.github.io/lmfit-py/installation.html
2. Parameters is the main parameters class.
3. minimize is the main fitting function. It takes our parameters and
   returns the best-fit parameters.
4. fit_report provides the parameter fit values and various
   statistical measures.
'''
from lmfit import Parameters, minimize, fit_report

# Define the fitting function; it returns the residuals
def linear_fitting_lmfit(params, x, y):
    m = params['m']
    c = params['c']
    y_fit = m * x + c
    return y_fit - y

# Defining the various parameters
params = Parameters()
# The slope is bounded between a min value of 1.0 and a max value of 3.0
params.add('m', min=1.0, max=3.0)
# The intercept is fixed at a value of 0.0
params.add('c', value=0.0, vary=False)

# Calling the minimize function. args contains the x and y data.
fitted_params = minimize(linear_fitting_lmfit, params, args=(x, y,),
                         method='least_squares')

# Getting the fitted values
m = fitted_params.params['m'].value
c = fitted_params.params['c'].value

# Printing the fitted values
print('The slope (m) is ', m)
print('The intercept (c) is ', c)

# Pretty printing all the statistical data
print(fit_report(fitted_params))
The slope (m) is  2.4130947619915073
The intercept (c) is  0.0
[[Fit Statistics]]
    # fitting method   = least_squares
    # function evals   = 7
    # data points      = 100
    # variables        = 1
    chi-square         = 362.059807
    reduced chi-square = 3.65716977
    Akaike info crit   = 130.663922
    Bayesian info crit = 133.269093
[[Variables]]
    m:  2.41309476 +/- 0.03303994 (1.37%) (init = 1)
    c:  0 (fixed)

Seems like a lot of work for a simple linear fit?

Trust me, as soon as your fitting functions become complex you will run towards lmfit with its concise and powerful way of defining and bounding parameters.
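
As a taste of that conciseness, lmfit also offers a higher-level Model class that builds the residual function and the Parameters object for you. Here is a minimal sketch of the same linear fit with the intercept fixed at zero (my own illustration under those assumptions, not part of the original walkthrough), reusing the linear_fit function defined earlier:

from lmfit import Model

# Model wraps a plain function; the first argument is taken as the
# independent variable and the remaining arguments become parameters.
lmodel = Model(linear_fit)
lmodel.set_param_hint('m', value=2.0, min=1.0, max=3.0)
lmodel.set_param_hint('c', value=0.0, vary=False)
result = lmodel.fit(y, x=x)
print(result.fit_report())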

Sometimes you need to look beyond just least squares ;)

4. Power law curve fitting

Power law fitting is probably one of the most common types of fitting in engineering. Many engineering parameters saturate after some time, a behavior captured by a power function with a diminishing slope. The basic equation is described as:

\begin{equation} y = a x^b\end{equation}

The best example would be how the area of a square changes with its side length. The area of a square depends on its length as:

\begin{equation} \text{Area} = \text{Length}^2\end{equation}

I frequently use power laws to study the variation of stiffness with stress and to create constitutive laws for materials. Let’s see how to do a power fit with scipy’s curve_fit and lmfit.

# Creating random dummy data for our power fitting
# a = 12.56
# b = 0.25
x = np.linspace(0, 10, 100)
y = 12.56 * x**0.25 + np.random.normal(0, 2, 100)

# Defining the fitting function for curve_fit
def power_fit(x, a, b):
    return a * x**b

# Calling the curve_fit function
params, covariance = curve_fit(f=power_fit, xdata=x, ydata=y)
print('a is ', params[0])
print('b is ', params[1])
print(covariance)

plt.scatter(x, y, c='black')
plt.xlabel('Time (sec)')
plt.ylabel('Stress (kPa)')
plt.plot(x, power_fit(x, params[0], params[1]), c='red', ls='-', lw=5)
plt.show()
a is  12.582417620337397
b is  0.25151997896349065
[[ 0.13306355 -0.00554453]
 [-0.00554453  0.00026803]]
from lmfit import Parameters, minimize, fit_report

# Define the fitting function; it returns the residuals
def power_fitting_lmfit(params, x, y):
    a = params['a']
    b = params['b']
    y_fit = a * x**b
    return y_fit - y

# Defining the various parameters
params = Parameters()
# a is bounded between a min value of 1.0 and a max value of 23.0
params.add('a', min=1.0, max=23.0)
# b is bounded between a min value of 0.0 and a max value of 1.0
params.add('b', min=0.0, max=1.0)

# Calling the minimize function. args contains the x and y data.
fitted_params = minimize(power_fitting_lmfit, params, args=(x, y,),
                         method='least_squares')

# Getting the fitted values
a = fitted_params.params['a'].value
b = fitted_params.params['b'].value

# Printing the fitted values
print('The a is ', a)
print('The b is ', b)

# Pretty printing all the statistical data
print(fit_report(fitted_params))

plt.scatter(x, y, c='black')
plt.xlabel('Time (sec)')
plt.ylabel('Stress (kPa)')
plt.plot(x, a * x**b, c='red', ls='-', lw=5)
plt.show()
The a is  12.582417884599595
The b is  0.25151996619258815
[[Fit Statistics]]
    # fitting method   = least_squares
    # function evals   = 11
    # data points      = 100
    # variables        = 2
    chi-square         = 380.338017
    reduced chi-square = 3.88100017
    Akaike info crit   = 137.589019
    Bayesian info crit = 142.799359
[[Variables]]
    a:  12.5824179 +/- 0.36477882 (2.90%) (init = 1)
    b:  0.25151997 +/- 0.01637166 (6.51%) (init = 0)
[[Correlations]] (unreported correlations are < 0.100)
    C(a, b) = -0.928
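
Note the strong correlation C(a, b) = -0.928 in the report: in a power law, a and b trade off against each other. A classic cross-check (my own aside, not in the original tutorial) is to linearize the model, since $\log y = \log a + b \log x$, and fit a straight line in log-log space. Beware that this weights the residuals differently, so the answer will not exactly match the direct fit.

# Log-log linearization of y = a * x**b, dropping points where the
# logarithm is undefined (x <= 0, or y <= 0 due to noise).
mask = (x > 0) & (y > 0)
b_loglog, loga_loglog = np.polyfit(np.log(x[mask]), np.log(y[mask]), 1)
print('a (log-log) is ', np.exp(loga_loglog))
print('b (log-log) is ', b_loglog)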

5. Transcendental function fitting

Let’s try a slightly more complex function this time. Assume you have a sine wave, but one whose amplitude is decreasing in time. This happens when your oscillator loses its energy faster than it can be supplied, giving us a damped sine wave. The equation of a damped sine wave is:

\begin{equation} y(t) = A e^{-\lambda t} \cos(\omega t + \phi)\end{equation}

where A is the amplitude of the wave, $\lambda$ is the decay constant, $\omega$ is the angular frequency, and $\phi$ is the phase angle. This is super simple with the lmfit module.

# Creating random dummy data for our damped sine fitting
# A = 2.5
# lambda = 0.5
# omega = 7.2
# phi = 3.14
t = np.linspace(0, 6, 100)
y = 2.5 * np.exp(-0.5 * t) * np.cos(7.2 * t + 3.14) + np.random.normal(0, 0.25, 100)

# Define the fitting function; it returns the residuals
def decaying_sine(params, t, y):
    A = params['A']
    lambdas = params['lambdas']  # lambda has a different meaning in Python
    omega = params['omega']
    phi = params['phi']
    y_fit = A * np.exp(-lambdas * t) * np.cos(omega * t + phi)
    return y_fit - y

# Defining the various parameters
params = Parameters()
params.add('A', min=1.0, max=5.0)
params.add('lambdas', min=-1.0, max=1.0)
params.add('omega', min=0.0, max=10.0)
params.add('phi', min=0.0, max=10.0)

# Calling the minimize function with the Nelder-Mead method.
# args contains the t and y data.
fitted_params = minimize(decaying_sine, params, args=(t, y,), method='nelder')

# Getting the fitted values
A = fitted_params.params['A'].value
lambdas = fitted_params.params['lambdas'].value
omega = fitted_params.params['omega'].value
phi = fitted_params.params['phi'].value

# Pretty printing all the statistical data
print(fit_report(fitted_params))

plt.scatter(t, y, c='black')
plt.xlabel('Time (sec)')
plt.ylabel('Stress (kPa)')
plt.plot(t, A * np.exp(-lambdas * t) * np.cos(omega * t + phi), c='red', ls='-', lw=5)
plt.show()
[[Fit Statistics]]
    # fitting method   = Nelder-Mead
    # function evals   = 357
    # data points      = 100
    # variables        = 4
    chi-square         = 6.70880707
    reduced chi-square = 0.06988341
    Akaike info crit   = -262.174903
    Bayesian info crit = -251.754223
##  Warning: uncertainties could not be estimated:
    this fitting method does not natively calculate uncertainties
    and numdifftools is not installed for lmfit to do this. Use
    `pip install numdifftools` for lmfit to estimate uncertainties
    with this fitting method.
[[Variables]]
    A:        2.51443557 (init = 1)
    lambdas:  0.49479403 (init = -1)
    omega:    7.21220270 (init = 0)
    phi:      3.11888095 (init = 0)
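
The warning itself suggests one fix (pip install numdifftools). Another common pattern, sketched below as my own suggestion rather than the original author's, is a two-stage fit: let Nelder-Mead find the right basin, then refit with least_squares, which estimates uncertainties natively, starting from the Nelder-Mead solution.

# Refining the Nelder-Mead solution with least_squares so that lmfit
# can report parameter uncertainties without numdifftools.
fitted_params_ls = minimize(decaying_sine, fitted_params.params,
                            args=(t, y,), method='least_squares')
print(fit_report(fitted_params_ls))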

Both scipy and lmfit are powerful curve fitting tools, and hopefully this tutorial helps you appreciate their power!
