MR Estimation¶
-
mrestimator.
full_analysis
(data, targetdir, title, dt, dtunit, fitfunctions, tmin=None, tmax=None, coefficientmethod=None, substracttrialaverage=False, numboot='auto', seed='auto', loglevel=None, steps=None, targetplot=None)[source]¶ Wrapper function that performs the following four steps:
- check data with input_handler()
- calculate correlation coefficients via coefficients()
- fit autocorrelation function with fit()
- export/plot using the OutputHandler
Usually it should suffice to tweak the arguments and call the wrapper function (multiple times). Calling the underlying functions individually gives slightly more control, though.
Parameters: - data (str, list or numpy.ndarray) – Passed to input_handler(). Ideally, import and check data first. A string is assumed to be the path to file(s) that is then imported as pickle or plain text. Alternatively, you can provide a list or ndarray containing strings or already imported data. In the latter case, input_handler() attempts to convert it to the right format.
- targetdir (str) – String containing the path to the target directory where files are saved with the filename title
- title (str) – String for the filenames. Also sets the main title of the results figure.
- dt (float) – How many dtunits separate the measurements of the provided data. For example, if measurements are taken every 4ms: dt=4, dtunit=’ms’.
- dtunit (str) – Unit description/name of the time steps of the provided data.
- tmin (float) – Smallest time separation to use for coefficients, in units of dtunit.
- tmax (float) – Maximum time separation to use for coefficients. For example, to fit the autocorrelation between 8ms and 2s set: tmin=8, tmax=2000, dtunit=’ms’ (independent of dt).
Other Parameters: - coefficientmethod (str, optional) – ts or sm, method used for determining the correlation
coefficients. See the
coefficients()
function for details. Default is ts. - substracttrialaverage (bool, optional) – Substract the average across all trials before calculating correlation coefficients. Default is False.
- numboot (int, optional) – Number of bootstrap samples to draw. This repeats every fit numboot times so that we can provide an uncertainty estimate of the resulting branching parameter and autocorrelation time. Per default, bootstrapping is only applied in coefficeints() as most of computing time is needed for the fitting. Thereby we have uncertainties on the \(r_k\) (which will be plotted) but each fit is only done once. Default is numboot=0.
- loglevel (str) – The loglevel to use for console output and the logfile created as title.log in the targetdir. ‘ERROR’, ‘WARNING’, ‘INFO’ or ‘DEBUG’. Per default inherited from the current mre console level, (usually ‘INFO’, if not changed by the user).
- steps (~numpy.array, optional) – Overwrites tmin and tmax.
Specify the fitrange in steps \(k\) for which to compute
coefficients \(r_k\).
Note that \(k\) provided here would need
to be multiplied with units of [dt * dtunit] to convert
back to (real) time.
If an array of length two is provided, e.g.
steps=(minstep, maxstep)
, all enclosed integer values will be used. Arrays larger than two are assumed to contain a manual choice of steps. Strides other than one are possible. Default is None. - targetplot (matplotlib.axes.Axes, optional) – You can provide a matplotlib axes element (i.e. a subplot of an existing figure) to plot the correlations into. The axis will be passed to the OutputHandler and all plotting will happen within that axes. Per default, a new figure is created - that cannot be added as a subplot to any other figure later on. This is due to the way matplotlib handles subplots.
Returns: OutputHandler – that is associated with the correlation plot, fits and coefficients. Also saves meta data and plotted pdfs to targetdir.
Example
# test data, subsampled branching process bp = mre.simulate_branching(m=0.95, h=10, subp=0.1, numtrials=50) mre.full_analysis( data=bp, targetdir='./output', title='Branching Process', dt=1, dtunit='step', tmin=0, tmax=1500, fitfunctions=['exp', 'exp_offs', 'complex'], ) plt.show()
-
mrestimator.
coefficients
(data, steps=None, dt=1, dtunit='ms', method=None, numboot=100, seed=3141, description=None, desc=None)[source]¶ Calculates the coefficients of correlation \(r_k\).
Parameters: - data (ndarray) – Input data, containing the time series of activity in the trial structure. If a one dimensional array is provieded instead, we assume a single trial and reshape the input.
- steps (array, optional) – Specify the steps \(k\) for which to compute coefficients
\(r_k\).
If an array of length two is provided, e.g.
steps=(minstep, maxstep)
, all enclosed integer values will be used. Default is(1, 1500)
. Arrays larger than two are assumed to contain a manual choice of steps. Strides other than one are possible. - dt (float, optional) – The size of each step in dtunits. Default is 1.
- dtunit (str, optional) – Units of step size. Default is ‘ms’.
- description (str, optional) – Set the description of the
CoefficientResult
. By default all results of functions working with this set inherit its description (e.g. plot legends).
Other Parameters: - method (str, optional) – The estimation method to use, either ‘trialseparated’ (
'ts'
, default) or ‘stationarymean’ ('sm'
).'ts'
calculates the \(r_k\) for each trial separately and averages over trials. Each trials contribution is weighted with its variance.'sm'
assumes the mean activity and its variance to be constant across all trials. - numboot (int, optional) – Enable bootstrapping to generate numboot (resampled) series of trials from the provided one. This allows to approximate statistical errors, returned in stderrs. Default is numboot=100.
- seed (int or None, optional) – If bootstrapping (numboot>0), a custom seed can be passed to the
random number generator used for
resampling. Per default, it is set to the same value every time
coefficients() is called to return consistent results
when repeating the analysis on the same data. Set to None to
prevent (re)seeding. For more details, see
numpy.random.RandomState
.
Returns: CoefficientResult
– The output is grouped and can be accessed using its attributes (listed below).
-
class
mrestimator.
CoefficientResult
[source]¶ Result returned by coefficients(). Subclassed from
namedtuple
.Attributes are set to None if the specified method or input data do not provide them. All attributes of type
ndarray
and lists are one-dimensional.Variables: - coefficients (ndarray) – Contains the coefficients \(r_k\), has length
numsteps. Access via
.coefficients[step]
- steps (ndarray) – Array of the \(k\) values matching coefficients.
- dt (float) – The size of each step in dtunits. Default is 1.
- dtunit (str) – Units of step size. Default is ‘ms’.
- stderrs (ndarray or None) – Standard errors of the \(r_k\).
- trialactivities (ndarray) – Mean activity of each trial in the provided data.
To get the global mean activity, use
np.mean(trialactivities)
. Has lenght numtrials - description (str) – Description (or name) of the data set, by default all results of functions working with this set inherit its description (e.g. plot legends).
- numtrials (int,) – Number of trials that contributed.
- numboot (int,) – Number of bootstrap replicas that were created.
- numsteps (int,) – Number of steps in coefficients, steps and stderrs.
- bootstrapcrs (list) – List containing the numboot
CoefficientResult
instances that were calculated from the resampled input data. The List is empty if bootstrapping was skipped (numboot=0). - trialcrs (list) – List of the
CoefficientResult
instances calculated from individual trials. Only has length numtrials if the trialseparated method was used, otherwise it is empty.
Note
At the time of writing,
ndarray
behaves a bit unexpected when creating arrays with objects that are sequence like (such asCoefficientResult
andFitResult
), even when specifying dtype=object. Numpy converts the objects into an ndimensional structure instead of creating the (probably desired) 1d-array. To work around the issue, use a list or manually create the array with dtype=object and add the entries after creation.Example
import numpy as np import matplotlib.pyplot as plt import mrestimator as mre # branching process with 15 trials bp = mre.simulate_branching(m=0.995, a=10, numtrials=15) # the bp returns data already in the right format rk = mre.coefficients(bp, dtunit='step') # fit ft = mre.fit(rk) # plot coefficients and the autocorrelation fit mre.OutputHandler([rk, ft]) plt.show() # print the coefficients print(rk.coefficients) # get the documentation print(help(rk)) # rk is inherited from namedtuple with all the bells and whistles print(rk._fields)
- coefficients (ndarray) – Contains the coefficients \(r_k\), has length
numsteps. Access via
-
mrestimator.
fit
(data, fitfunc=<function f_exponential>, steps=None, fitpars=None, fitbnds=None, maxfev=None, ignoreweights=True, numboot=0, quantiles=None, seed=10815, desc=None, description=None)[source]¶ Estimate the Multistep Regression Estimator by fitting the provided correlation coefficients \(r_k\). The fit is performed using
scipy.optimize.curve_fit()
and can optionally be provided with (multiple) starting fitparameters and bounds.Parameters: - data (CoefficientResult or array) – Correlation coefficients to fit. Ideally, provide this as
CoefficientResult
as obtained fromcoefficients()
. If arrays are provided, the function tries to match the data. - fitfunc (callable, optional) – The model function, f(x, …).
Directly passed to curve_fit():
It must take the independent variable as
the first argument and the parameters to fit as separate remaining
arguments.
Default is
f_exponential
. Other builtin options aref_exponential_offset
andf_complex
. - steps (array, optional) – Specify the steps \(k\) for which to fit (think fitrange).
If an array of length two is provided, e.g.
steps=(minstep, maxstep)
, all enclosed values present in the provdied data, including minstep and maxstep will be used. Arrays larger than two are assumed to contain a manual choice of steps and those that are also present in data will be used. Strides other than one are possible. Ignored if data is not passed as CoefficientResult. Default: all values given in data are included in the fit.
Other Parameters: - fitpars (~numpy.ndarray, optional) – The starting parameters for the fit. If the provided array is two dimensional, multiple fits are performed and the one with the smallest sum of squares of residuals is returned.
- fitbounds (~numpy.ndarray, optional) – Lower and upper bounds for each parameter handed to the fitting
routine. Provide as numpy array of the form
[[lowpar1, lowpar2, ...], [uppar1, uppar2, ...]]
- numboot (int, optional) – Number of bootstrap samples to compute errors from. Default is 0
- seed (int or None, optional) – If numboot is not zero, provide a seed for the random number
generator. If
seed=None
, seeding will be skipped. Per default, the rng is (re)seeded everytime fit() is called so that every repeated call returns the same error estimates. - quantiles (list, optional) – If numboot is not zero, provide the quantiles to return
(between 0 and 1). See
numpy.quantile
. Defaults are[.125, .25, .4, .5, .6, .75, .875]
- maxfev (int, optional) – Maximum iterations for the fit.
- description (str, optional) – Provide a custom description.
Returns: FitResult
– The output is grouped and can be accessed using its attributes (listed below).- data (CoefficientResult or array) – Correlation coefficients to fit. Ideally, provide this as
-
class
mrestimator.
FitResult
[source]¶ Result returned by fit(). Subclassed from
namedtuple
.Variables: - tau (float) – The estimated autocorrelation time in dtunits. Default is ‘ms’.
- mre (float) – The branching parameter estimated from the multistep regression.
- fitfunc (callable) – The model function, f(x, …). This allows to fit directly with popt.
To get the (TeX) description of a (builtin) function,
use
math_from_doc(fitfunc)
. - popt (array) – Final fitparameters obtained from the (best) underlying
scipy.optimize.curve_fit()
. Beware that these are not corrected for the time bin size, this needs to be done manually (for time and frequency variables). - pcov (array) – Final covariance matrix obtained from the (best) underlying
scipy.optimize.curve_fit()
. - ssres (float) – Sum of the squared residuals for the fit with popt. This is not yet normalised per degree of freedom.
- steps (array) – The step numbers \(k\) of the coefficients \(r_k\) that were included in the fit. Think fitrange.
- dt (float) – The size of each step in dtunits. Default is 1.
- dtunit (str) – Units of step size and the calculated autocorrelation time.
Default is ‘ms’.
dt and dtunit are inherited from
CoefficientResult
. Overwrite by providing data fromcoefficients()
and the desired values set there. - quantiles (list or None) – Quantile values (between 0 and 1, inclusive) calculated from
bootstrapping. See
numpy.quantile
. Defaults are[.125, .25, .4, .5, .6, .75, .875]
- tauquantiles (list or None) – Resulting \(\tau\) values for the respective quantiles above.
- mrequantiles (list or None) – Resulting \(m\) values for the respective quantiles above.
- description (str) – Description, inherited from
CoefficientResult
. description provided tofit()
takes priority, if set.
Example
import numpy as np import matplotlib.pyplot as plt import mrestimator as mre bp = mre.simulate_branching(m=0.99, a=10, numtrials=15) rk = mre.coefficients(bp, dtunit='step') # compare the builtin fitfunctions m1 = mre.fit(rk, fitfunc=mre.f_exponential) m2 = mre.fit(rk, fitfunc=mre.f_exponential_offset) m3 = mre.fit(rk, fitfunc=mre.f_complex) # plot manually without using OutputHandler plt.plot(rk.steps, rk.coefficients, label='data') plt.plot(rk.steps, mre.f_exponential(rk.steps, *m1.popt), label='exponential m={:.5f}'.format(m1.mre)) plt.plot(rk.steps, mre.f_exponential_offset(rk.steps, *m2.popt), label='exp + offset m={:.5f}'.format(m2.mre)) plt.plot(rk.steps, mre.f_complex(rk.steps, *m3.popt), label='complex m={:.5f}'.format(m3.mre)) plt.legend() plt.show()