MR Estimation

mrestimator.full_analysis(data, targetdir, title, dt, dtunit, fitfunctions, tmin=None, tmax=None, coefficientmethod=None, substracttrialaverage=False, numboot='auto', seed='auto', loglevel=None, steps=None, targetplot=None)[source]

Wrapper function that performs the following four steps:

  • check data with input_handler()
  • calculate correlation coefficients via coefficients()
  • fit autocorrelation function with fit()
  • export/plot using the OutputHandler

Usually it should suffice to tweak the arguments and call the wrapper function (multiple times). Calling the underlying functions individually gives slightly more control, though.

Parameters:
  • data (str, list or numpy.ndarray) – Passed to input_handler(). Ideally, import and check data first. A string is assumed to be the path to file(s) that is then imported as pickle or plain text. Alternatively, you can provide a list or ndarray containing strings or already imported data. In the latter case, input_handler() attempts to convert it to the right format.
  • targetdir (str) – String containing the path to the target directory where files are saved with the filename title
  • title (str) – String for the filenames. Also sets the main title of the results figure.
  • dt (float) – How many dtunits separate the measurements of the provided data. For example, if measurements are taken every 4ms: dt=4, dtunit=’ms’.
  • dtunit (str) – Unit description/name of the time steps of the provided data.
  • tmin (float) – Smallest time separation to use for coefficients, in units of dtunit.
  • tmax (float) – Maximum time separation to use for coefficients. For example, to fit the autocorrelation between 8ms and 2s set: tmin=8, tmax=2000, dtunit=’ms’ (independent of dt).
Other Parameters:
 
  • coefficientmethod (str, optional) – ts or sm, method used for determining the correlation coefficients. See the coefficients() function for details. Default is ts.
  • substracttrialaverage (bool, optional) – Substract the average across all trials before calculating correlation coefficients. Default is False.
  • numboot (int, optional) – Number of bootstrap samples to draw. This repeats every fit numboot times so that we can provide an uncertainty estimate of the resulting branching parameter and autocorrelation time. Per default, bootstrapping is only applied in coefficeints() as most of computing time is needed for the fitting. Thereby we have uncertainties on the \(r_k\) (which will be plotted) but each fit is only done once. Default is numboot=0.
  • loglevel (str) – The loglevel to use for console output and the logfile created as title.log in the targetdir. ‘ERROR’, ‘WARNING’, ‘INFO’ or ‘DEBUG’. Per default inherited from the current mre console level, (usually ‘INFO’, if not changed by the user).
  • steps (~numpy.array, optional) – Overwrites tmin and tmax. Specify the fitrange in steps \(k\) for which to compute coefficients \(r_k\). Note that \(k\) provided here would need to be multiplied with units of [dt * dtunit] to convert back to (real) time. If an array of length two is provided, e.g. steps=(minstep, maxstep), all enclosed integer values will be used. Arrays larger than two are assumed to contain a manual choice of steps. Strides other than one are possible. Default is None.
  • targetplot (matplotlib.axes.Axes, optional) – You can provide a matplotlib axes element (i.e. a subplot of an existing figure) to plot the correlations into. The axis will be passed to the OutputHandler and all plotting will happen within that axes. Per default, a new figure is created - that cannot be added as a subplot to any other figure later on. This is due to the way matplotlib handles subplots.
Returns:

OutputHandler – that is associated with the correlation plot, fits and coefficients. Also saves meta data and plotted pdfs to targetdir.

Example

# test data, subsampled branching process
bp = mre.simulate_branching(m=0.95, h=10, subp=0.1, numtrials=50)

mre.full_analysis(
    data=bp,
    targetdir='./output',
    title='Branching Process',
    dt=1, dtunit='step',
    tmin=0, tmax=1500,
    fitfunctions=['exp', 'exp_offs', 'complex'],
    )

plt.show()
mrestimator.coefficients(data, steps=None, dt=1, dtunit='ms', method=None, numboot=100, seed=3141, description=None, desc=None)[source]

Calculates the coefficients of correlation \(r_k\).

Parameters:
  • data (ndarray) – Input data, containing the time series of activity in the trial structure. If a one dimensional array is provieded instead, we assume a single trial and reshape the input.
  • steps (array, optional) – Specify the steps \(k\) for which to compute coefficients \(r_k\). If an array of length two is provided, e.g. steps=(minstep, maxstep), all enclosed integer values will be used. Default is (1, 1500). Arrays larger than two are assumed to contain a manual choice of steps. Strides other than one are possible.
  • dt (float, optional) – The size of each step in dtunits. Default is 1.
  • dtunit (str, optional) – Units of step size. Default is ‘ms’.
  • description (str, optional) – Set the description of the CoefficientResult. By default all results of functions working with this set inherit its description (e.g. plot legends).
Other Parameters:
 
  • method (str, optional) – The estimation method to use, either ‘trialseparated’ ('ts', default) or ‘stationarymean’ ('sm'). 'ts' calculates the \(r_k\) for each trial separately and averages over trials. Each trials contribution is weighted with its variance. 'sm' assumes the mean activity and its variance to be constant across all trials.
  • numboot (int, optional) – Enable bootstrapping to generate numboot (resampled) series of trials from the provided one. This allows to approximate statistical errors, returned in stderrs. Default is numboot=100.
  • seed (int or None, optional) – If bootstrapping (numboot>0), a custom seed can be passed to the random number generator used for resampling. Per default, it is set to the same value every time coefficients() is called to return consistent results when repeating the analysis on the same data. Set to None to prevent (re)seeding. For more details, see numpy.random.RandomState.
Returns:

CoefficientResult – The output is grouped and can be accessed using its attributes (listed below).

class mrestimator.CoefficientResult[source]

Result returned by coefficients(). Subclassed from namedtuple.

Attributes are set to None if the specified method or input data do not provide them. All attributes of type ndarray and lists are one-dimensional.

Variables:
  • coefficients (ndarray) – Contains the coefficients \(r_k\), has length numsteps. Access via .coefficients[step]
  • steps (ndarray) – Array of the \(k\) values matching coefficients.
  • dt (float) – The size of each step in dtunits. Default is 1.
  • dtunit (str) – Units of step size. Default is ‘ms’.
  • stderrs (ndarray or None) – Standard errors of the \(r_k\).
  • trialactivities (ndarray) – Mean activity of each trial in the provided data. To get the global mean activity, use np.mean(trialactivities). Has lenght numtrials
  • description (str) – Description (or name) of the data set, by default all results of functions working with this set inherit its description (e.g. plot legends).
  • numtrials (int,) – Number of trials that contributed.
  • numboot (int,) – Number of bootstrap replicas that were created.
  • numsteps (int,) – Number of steps in coefficients, steps and stderrs.
  • bootstrapcrs (list) – List containing the numboot CoefficientResult instances that were calculated from the resampled input data. The List is empty if bootstrapping was skipped (numboot=0).
  • trialcrs (list) – List of the CoefficientResult instances calculated from individual trials. Only has length numtrials if the trialseparated method was used, otherwise it is empty.

Note

At the time of writing, ndarray behaves a bit unexpected when creating arrays with objects that are sequence like (such as CoefficientResult and FitResult), even when specifying dtype=object. Numpy converts the objects into an ndimensional structure instead of creating the (probably desired) 1d-array. To work around the issue, use a list or manually create the array with dtype=object and add the entries after creation.

Example

import numpy as np
import matplotlib.pyplot as plt
import mrestimator as mre

# branching process with 15 trials
bp = mre.simulate_branching(m=0.995, a=10, numtrials=15)

# the bp returns data already in the right format
rk = mre.coefficients(bp, dtunit='step')

# fit
ft = mre.fit(rk)

# plot coefficients and the autocorrelation fit
mre.OutputHandler([rk, ft])
plt.show()

# print the coefficients
print(rk.coefficients)

# get the documentation
print(help(rk))

# rk is inherited from namedtuple with all the bells and whistles
print(rk._fields)
../_images/example_cres.png
mrestimator.fit(data, fitfunc=<function f_exponential>, steps=None, fitpars=None, fitbnds=None, maxfev=None, ignoreweights=True, numboot=0, quantiles=None, seed=10815, desc=None, description=None)[source]

Estimate the Multistep Regression Estimator by fitting the provided correlation coefficients \(r_k\). The fit is performed using scipy.optimize.curve_fit() and can optionally be provided with (multiple) starting fitparameters and bounds.

Parameters:
  • data (CoefficientResult or array) – Correlation coefficients to fit. Ideally, provide this as CoefficientResult as obtained from coefficients(). If arrays are provided, the function tries to match the data.
  • fitfunc (callable, optional) – The model function, f(x, …). Directly passed to curve_fit(): It must take the independent variable as the first argument and the parameters to fit as separate remaining arguments. Default is f_exponential. Other builtin options are f_exponential_offset and f_complex.
  • steps (array, optional) – Specify the steps \(k\) for which to fit (think fitrange). If an array of length two is provided, e.g. steps=(minstep, maxstep), all enclosed values present in the provdied data, including minstep and maxstep will be used. Arrays larger than two are assumed to contain a manual choice of steps and those that are also present in data will be used. Strides other than one are possible. Ignored if data is not passed as CoefficientResult. Default: all values given in data are included in the fit.
Other Parameters:
 
  • fitpars (~numpy.ndarray, optional) – The starting parameters for the fit. If the provided array is two dimensional, multiple fits are performed and the one with the smallest sum of squares of residuals is returned.
  • fitbounds (~numpy.ndarray, optional) – Lower and upper bounds for each parameter handed to the fitting routine. Provide as numpy array of the form [[lowpar1, lowpar2, ...], [uppar1, uppar2, ...]]
  • numboot (int, optional) – Number of bootstrap samples to compute errors from. Default is 0
  • seed (int or None, optional) – If numboot is not zero, provide a seed for the random number generator. If seed=None, seeding will be skipped. Per default, the rng is (re)seeded everytime fit() is called so that every repeated call returns the same error estimates.
  • quantiles (list, optional) – If numboot is not zero, provide the quantiles to return (between 0 and 1). See numpy.quantile. Defaults are [.125, .25, .4, .5, .6, .75, .875]
  • maxfev (int, optional) – Maximum iterations for the fit.
  • description (str, optional) – Provide a custom description.
Returns:

FitResult – The output is grouped and can be accessed using its attributes (listed below).

class mrestimator.FitResult[source]

Result returned by fit(). Subclassed from namedtuple.

Variables:
  • tau (float) – The estimated autocorrelation time in dtunits. Default is ‘ms’.
  • mre (float) – The branching parameter estimated from the multistep regression.
  • fitfunc (callable) – The model function, f(x, …). This allows to fit directly with popt. To get the (TeX) description of a (builtin) function, use math_from_doc(fitfunc).
  • popt (array) – Final fitparameters obtained from the (best) underlying scipy.optimize.curve_fit(). Beware that these are not corrected for the time bin size, this needs to be done manually (for time and frequency variables).
  • pcov (array) – Final covariance matrix obtained from the (best) underlying scipy.optimize.curve_fit().
  • ssres (float) – Sum of the squared residuals for the fit with popt. This is not yet normalised per degree of freedom.
  • steps (array) – The step numbers \(k\) of the coefficients \(r_k\) that were included in the fit. Think fitrange.
  • dt (float) – The size of each step in dtunits. Default is 1.
  • dtunit (str) – Units of step size and the calculated autocorrelation time. Default is ‘ms’. dt and dtunit are inherited from CoefficientResult. Overwrite by providing data from coefficients() and the desired values set there.
  • quantiles (list or None) – Quantile values (between 0 and 1, inclusive) calculated from bootstrapping. See numpy.quantile. Defaults are [.125, .25, .4, .5, .6, .75, .875]
  • tauquantiles (list or None) – Resulting \(\tau\) values for the respective quantiles above.
  • mrequantiles (list or None) – Resulting \(m\) values for the respective quantiles above.
  • description (str) – Description, inherited from CoefficientResult. description provided to fit() takes priority, if set.

Example

import numpy as np
import matplotlib.pyplot as plt
import mrestimator as mre

bp = mre.simulate_branching(m=0.99, a=10, numtrials=15)
rk = mre.coefficients(bp, dtunit='step')

# compare the builtin fitfunctions
m1 = mre.fit(rk, fitfunc=mre.f_exponential)
m2 = mre.fit(rk, fitfunc=mre.f_exponential_offset)
m3 = mre.fit(rk, fitfunc=mre.f_complex)

# plot manually without using OutputHandler
plt.plot(rk.steps, rk.coefficients, label='data')
plt.plot(rk.steps, mre.f_exponential(rk.steps, *m1.popt),
    label='exponential m={:.5f}'.format(m1.mre))
plt.plot(rk.steps, mre.f_exponential_offset(rk.steps, *m2.popt),
    label='exp + offset m={:.5f}'.format(m2.mre))
plt.plot(rk.steps, mre.f_complex(rk.steps, *m3.popt),
    label='complex m={:.5f}'.format(m3.mre))

plt.legend()
plt.show()
../_images/example_fitres.png