First, I'd like to provide a little explanation on what my code is supposed to do. It is part of a middle-sized project. I restructured the code to work on its own, and also added little comments to help you understand what I'm doing. This is one of the 5 methods how spectrograms can be analyzed. Now let me give you a little physical background (I won't go in the details):
A spectrogram is a dataset, x values mean frequency, y values mean intensity. We either have it on it's own, or we can normalize it (if sam
and ref
args are given). _handle_input
function below takes care of reading the given inputs right. In this example dataset, we have it normalized. The first step is to determine the minimum and maximum points in the given dataset: we can manually feed the minimum and maximum places through minx
and maxx
arguments. If it's not used, we will use scipy's argrelextrema
as default. This example we will not over-complicate things, so let's use the defaults.
In the next step we need to specify a reference point (ref_point
arg, preferably somewhere in the middle of x values), and get the relative positions of minimums and maximums to that point. Then we define an order:
The closest extremal points are 1st order, the 2nd closest points are 2nd order, etc. Then, we multiply the order by pi, and that will be the y value of the spectral phase (which we want to calculate).
Now, our job to fit a polynomial to the spectral phase graph (the lower graph on the picture). The order varies from 1 to 5. From the fitted parameters then we can calculate the dispersion's coefficients, which is the purpose of the whole method. The calculation is described at the poly's docstrings. For example:
def poly_fit5(x, b0, b1, b2, b3, b4, b5): """ Taylor polynomial for fit b1 = GD b2 = GDD / 2 b3 = TOD / 6 b4 = FOD / 24 b5 = QOD / 120 """ return b0+b1*x+b2*x**2+b3*x**3+b4*x**4+b5*x**5
We need the GD, GDD, etc., so the method I wrote returns these params, not the fitted parameters. I also added a decorator called show_disp
, which immediately prints the results in the right format.
Here is the full code: Please, first download the dataset above, scroll down and uncomment the given examples to see how it's working.
import operator from functools import wraps from math import factorial import numpy as np import matplotlib.pyplot as plt from scipy.optimize import curve_fit from scipy.signal import argrelextrema try: from lmfit import Model _has_lmfit = True except ImportError: _has_lmfit = False __all__ = ['min_max_method'] def find_nearest(array, value): array = np.asarray(array) idx = (np.abs(value - array)).argmin() return array[idx], idx def show_disp(f): #decorator to immediately print out the dispersion results @wraps(f) def wrapping(*args, **kwargs): disp, disp_std, stri = f(*args, **kwargs) labels = ['GD', 'GDD','TOD', 'FOD', 'QOD'] for i in range(len(labels)): print(labels[i] + ' = ' + str(disp[i]) + ' +/- ' + str(disp_std[i]) + ' 1/fs^{}'.format(i+1)) return disp, disp_std, stri return wrapping def _handle_input(init_x, init_y, ref, sam): """ Instead of handling the inputs in every function, there is this method. Parameters ---------- init_x: array-like x-axis data init_y: array-like y-axis data ref, sam: array-like reference and sample arm spectrum evaluated at init_x Returns ------- init_x: array-like unchanged x data y_data: array-like the proper y data """ if (len(init_x) > 0) and (len(init_y) > 0) and (len(sam) > 0): y_data = (init_y-ref-sam)/(2*np.sqrt(ref*sam)) elif (len(ref) == 0) or (len(sam) == 0): y_data = init_y elif len(init_y) == 0 or len(init_x) == 0: raise ValueError('Please load the spectrum!\n') else: raise ValueError('Input shapes are wrong.\n') return init_x, y_data @show_disp def min_max_method(init_x, init_y, ref, sam, ref_point, maxx=None, minx=None, fit_order=5, show_graph=False): """ Calculates the dispersion with minimum-maximum method Parameters ---------- init_x: array-like x-axis data init_y: array-like y-axis data ref, sam: array-like reference and sample arm spectra evaluated at init_x ref_point: float the reference point to calculate order maxx and minx: array-like the accepted minimal and maximal places should be passed to these args fit_order: int degree of polynomial to fit data [1, 5] show_graph: bool if True returns a matplotlib plot and pauses execution until closing the window Returns ------- dispersion: array-like [GD, GDD, TOD, FOD, QOD] dispersion_std: array-like [GD_std, GDD_std, TOD_std, FOD_std, QOD_std] fit_report: lmfit report (if available) """ x_data, y_data = _handle_input(init_x, init_y, ref, sam) # handling default case when no minimum or maximum coordinates are given if maxx is None: maxInd = argrelextrema(y_data, np.greater) maxx = x_data[maxInd] if minx is None: minInd = argrelextrema(y_data, np.less) minx = x_data[minInd] _, ref_index = find_nearest(x_data, ref_point) # getting the relative distance of mins and maxs coordinates from the ref_point relNegMaxFreqs = np.array([a for a in (x_data[ref_index]-maxx) if a<0]) relNegMinFreqs= np.array([b for b in (x_data[ref_index]-minx) if b<0]) relNegFreqs = sorted(np.append(relNegMaxFreqs, relNegMinFreqs)) # reversing order because of the next loop relNegFreqs = relNegFreqs[::-1] relPosMaxFreqs = np.array([c for c in (x_data[ref_index]-maxx) if c>0]) relPosMinFreqs= np.array([d for d in (x_data[ref_index]-minx) if d>0]) relPosFreqs = sorted(np.append(relPosMinFreqs,relPosMaxFreqs)) negValues = np.zeros_like(relNegFreqs) posValues = np.zeros_like(relPosFreqs) for freq in range(len(relPosFreqs)): posValues[freq] = np.pi*(freq+1) for freq in range(len(relNegFreqs)): negValues[freq] = np.pi*(freq+1) # making sure the data is ascending in x # FIXME: Do we even need this? x_s = np.append(relPosFreqs, relNegFreqs) y_s = np.append(posValues, negValues) L = sorted(zip(x_s,y_s), key=operator.itemgetter(0)) fullXValues, fullYValues = zip(*L) # now we have the data, let's fit a function if _has_lmfit: if fit_order == 5: fitModel = Model(poly_fit5) params = fitModel.make_params(b0 = 0, b1 = 1, b2 = 1, b3 = 1, b4 = 1, b5 = 1) result = fitModel.fit(fullYValues, x=fullXValues, params=params, method='leastsq') elif fit_order == 4: fitModel = Model(poly_fit4) params = fitModel.make_params(b0 = 0, b1 = 1, b2 = 1, b3 = 1, b4 = 1) result = fitModel.fit(fullYValues, x=fullXValues, params=params, method='leastsq') elif fit_order == 3: fitModel = Model(poly_fit3) params = fitModel.make_params(b0 = 0, b1 = 1, b2 = 1, b3 = 1) result = fitModel.fit(fullYValues, x=fullXValues, params=params, method='leastsq') elif fit_order == 2: fitModel = Model(poly_fit2) params = fitModel.make_params(b0 = 0, b1 = 1, b2 = 1) result = fitModel.fit(fullYValues, x=fullXValues, params=params, method='leastsq') elif fit_order == 1: fitModel = Model(poly_fit1) params = fitModel.make_params(b0 = 0, b1 = 1) result = fitModel.fit(fullYValues, x=fullXValues, params=params, method='leastsq') else: raise ValueError('Order is out of range, please select from [1,5]') else: if fit_order == 5: popt, pcov = curve_fit(poly_fit5, fullXValues, fullYValues, maxfev = 8000) _function = poly_fit5 elif fit_order == 4: popt, pcov = curve_fit(poly_fit4, fullXValues, fullYValues, maxfev = 8000) _function = poly_fit4 elif fit_order == 3: popt, pcov = curve_fit(poly_fit3, fullXValues, fullYValues, maxfev = 8000) _function = poly_fit3 elif fit_order == 2: popt, pcov = curve_fit(poly_fit2, fullXValues, fullYValues, maxfev = 8000) _function = poly_fit2 elif fit_order == 1: popt, pcov = curve_fit(poly_fit1, fullXValues, fullYValues, maxfev = 8000) _function = poly_fit1 else: raise ValueError('Order is out of range, please select from [1,5]') try: if _has_lmfit: dispersion, dispersion_std = [], [] for name, par in result.params.items(): dispersion.append(par.value) dispersion_std.append(par.stderr) # dropping b0 param, it's not needed dispersion = dispersion[1:] dispersion_std = dispersion_std[1:] for idx in range(len(dispersion)): dispersion[idx] = dispersion[idx] * factorial(idx+1) dispersion_std[idx] = dispersion_std[idx] * factorial(idx+1) # fill up the rest with zeros while len(dispersion) < 5: dispersion.append(0) dispersion_std.append(0) fit_report = result.fit_report() else: fullXValues = np.asarray(fullXValues) #for plotting dispersion, dispersion_std = [], [] for idx in range(len(popt)-1): dispersion.append(popt[idx+1]*factorial(idx+1)) while len(dispersion)<5: dispersion.append(0) while len(dispersion_std)<len(dispersion): dispersion_std.append(0) fit_report = '\nTo display detailed results, you must have lmfit installed.' if show_graph: fig = plt.figure(figsize=(7,7)) fig.canvas.set_window_title('Min-max method fitted') plt.plot(fullXValues, fullYValues, 'o', label='dataset') try: plt.plot(fullXValues, result.best_fit, 'r--', label='fitted') except Exception: plt.plot(fullXValues, _function(fullXValues, *popt), 'r--', label='fitted') plt.legend() plt.grid() plt.show() return dispersion, dispersion_std, fit_report #This except block exists because otherwise errors would crash PyQt5 app, and we can display the error #to the log dialog in the application. except Exception as e: return [], [], e def poly_fit5(x, b0, b1, b2, b3, b4, b5): """ Taylor polynomial for fit b1 = GD b2 = GDD / 2 b3 = TOD / 6 b4 = FOD / 24 b5 = QOD / 120 """ return b0+b1*x+b2*x**2+b3*x**3+b4*x**4+b5*x**5 def poly_fit4(x, b0, b1, b2, b3, b4): """ Taylor polynomial for fit b1 = GD b2 = GDD / 2 b3 = TOD / 6 b4 = FOD / 24 """ return b0+b1*x+b2*x**2+b3*x**3+b4*x**4 def poly_fit3(x, b0, b1, b2, b3): """ Taylor polynomial for fit b1 = GD b2 = GDD / 2 b3 = TOD / 6 """ return b0+b1*x+b2*x**2+b3*x**3 def poly_fit2(x, b0, b1, b2): """ Taylor polynomial for fit b1 = GD b2 = GDD / 2 """ return b0+b1*x+b2*x**2 def poly_fit1(x, b0, b1): """ Taylor polynomial for fit b1 = GD """ return b0+b1*x ## To test with the data I provided: # a,b,c,d = np.loadtxt('test.txt', unpack=True, delimiter=',') # min_max_method(a,b,c,d, 2.5, fit_order=2, show_graph=True) ## To see how a usual dataset looks like: # x, y = _handle_input(a,b,c,d) # plt.plot(x, y) # plt.grid() # plt.show()
Obviously there are many things to improve (ex. determining the fit order looks horrible, but I could not find a better way.) I hope this was not too hard to follow. If something's unclear, just ask. I appreciate every suggestion in the code.
EDIT:
fixed
dispersion[idx] = dispersion[idx] / factorial(idx+1) dispersion_std[idx] = dispersion_std[idx] / factorial(idx+1) ... dispersion.append(popt[idx+1]/factorial(idx+1))
to
dispersion[idx] = dispersion[idx] * factorial(idx+1) dispersion_std[idx] = dispersion_std[idx] * factorial(idx+1) ... dispersion.append(popt[idx+1]*factorial(idx+1))