CDR Package API
Complete API for all public classes and methods in this package.
cdr.backend module
cdr.config module
- class cdr.config.Config(path)[source]
Bases: object
Parses an *.ini file and stores settings needed to define a set of CDR experiments.
- Parameters:
path – Path to *.ini file
- build_cdr_settings(settings, add_defaults=True, global_settings=None, is_cdr=True, is_cdrnn=False)[source]
Given a settings object parsed from a config file, compute CDR parameter dictionary.
- Parameters:
settings – settings from a ConfigParser object.
add_defaults – bool; whether to add default settings not explicitly specified in the config.
global_settings – dict or None; dictionary of global defaults for parameters missing from settings.
is_cdr – bool; whether this is a CDR(NN) model.
is_cdrnn – bool; whether this is a CDRNN model.
- Returns:
dict; dictionary of settings key-value pairs.
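The layering described above (explicit settings, then global defaults, then built-in defaults) can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: the DEFAULTS table, the build_settings helper, the section names, and the precedence order are all assumptions made for the example.

```python
import configparser

# Hypothetical built-in defaults; the real package defines its own.
DEFAULTS = {"history_length": "128", "future_length": "0"}


def build_settings(section, add_defaults=True, global_settings=None):
    """Merge one config section with global settings and built-in defaults.

    Assumed precedence (lowest to highest): defaults, global, explicit.
    """
    out = {}
    if add_defaults:
        out.update(DEFAULTS)
    if global_settings:
        out.update(global_settings)
    out.update(dict(section))
    return out


# Hypothetical *.ini content with one global section and one model section.
INI_TEXT = """
[global]
history_length = 64

[model_a]
future_length = 8
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)
settings = build_settings(parser["model_a"], global_settings=dict(parser["global"]))
print(settings)
```

Values stay as strings here because ConfigParser does not convert types on its own.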
cdr.data module
- cdr.data.add_responses(names, y)[source]
Add response variable(s) to a dataframe, applying any preprocessing required by the formula string.
- Parameters:
names – str or list of str; name(s) of dependent variable(s).
y – pandas DataFrame; response data.
- Returns:
pandas DataFrame; response data with any missing ops applied.
- cdr.data.build_CDR_impulse_data(X, first_obs, last_obs, X_in_Y_names=None, X_in_Y=None, impulse_names=None, history_length=128, future_length=0, int_type='int32', float_type='float32')[source]
Construct impulse data arrays in the required format for CDR fitting/evaluation for a single response array.
- Parameters:
X – list of pandas tables; impulse (predictor) data.
first_obs – list of index vectors (list, pandas series, or numpy vector) of first observations; the list contains vectors of row indices, one for each element of X, of the first impulse in the time series associated with the response. If None, inferred from Y.
last_obs – list of index vectors (list, pandas series, or numpy vector) of last observations; the list contains vectors of row indices, one for each element of X, of the last impulse in the time series associated with the response. If None, inferred from Y.
X_in_Y_names – list of str; names of predictors contained in Y rather than X. If None, no such predictors.
X_in_Y – pandas DataFrame or None; table of predictors contained in Y rather than X. If None, no such predictors.
impulse_names – list of str; names of columns in X to be used as impulses by the model. If None, all columns returned.
history_length – int; maximum number of history (backward) observations.
future_length – int; maximum number of future (forward) observations.
int_type – str; name of int type.
float_type – str; name of float type.
- Returns:
triple of numpy arrays; let N, T, I, R respectively be the number of rows in Y, history length, number of impulse dimensions, and number of response dimensions. Outputs are (1) impulses with shape (N, T, I), (2) impulse timestamps with shape (N, T, I), and (3) impulse mask with shape (N, T, I).
- cdr.data.build_CDR_response_data(responses, Y=None, first_obs=None, last_obs=None, Y_time=None, Y_gf=None, X_in_Y_names=None, X_in_Y=None, Y_category_map=None, response_to_df_ix=None, gf_names=None, gf_map=None)[source]
Construct response data arrays in the required format for CDR fitting/evaluation for one or more response arrays.
- Parameters:
responses – list of str; names of columns in Y to be used as responses (dependent variables) by the model.
Y – list of pandas tables, or None; response data. If None, does not return a response array.
first_obs – list of list of index vectors (list, pandas series, or numpy vector) of first observations, or None; the list contains one element for each response array. Inner lists contain vectors of row indices, one for each element of X, of the first impulse in the time series associated with each response. If None, inferred from Y.
last_obs – list of list of index vectors (list, pandas series, or numpy vector) of last observations, or None; the list contains one element for each response array. Inner lists contain vectors of row indices, one for each element of X, of the last impulse in the time series associated with each response. If None, inferred from Y.
Y_time – list of response timestamp vectors (list, pandas series, or numpy vector), or None; vector(s) of response timestamps, one for each response array. Needed to timestamp any response-aligned predictors (ignored if none in model).
Y_gf – list of pandas DataFrame, or None; data frames containing random grouping factor levels, if applicable.
X_in_Y_names – list of str; names of predictors contained in Y rather than X (must be present in all elements of Y). If None, no such predictors.
X_in_Y – list of pandas DataFrame or None; tables (one per response array) of predictors contained in Y rather than X (must be present in all elements of Y). If None, no such predictors.
Y_category_map – dict or None; map from category labels to integers for each categorical response.
response_to_df_ix – dict or None; map from response names to lists of indices of the response files that contain them.
gf_names – list or None; list of names of random grouping factor variables. If None and Y_gf provided, will use all columns of Y_gf.
gf_map – list of dict or None; list of maps from random grouping factor levels to their indices, one map per grouping factor variable in gf_names.
- Returns:
7-tuple of numpy arrays; let N, R, XF, YF, Z, and K respectively be the number of rows (sum total number of rows in Y), number of response dimensions, number of distinct predictor files (X), number of distinct response files (Y), number of random grouping factor variables, and number of response-aligned predictors. Outputs are (1) responses with shape (N, R) or None if Y is None, (2) an XF-tuple of first observation vectors indexing start indices for each entry in X, (3) a YF-tuple of last observation vectors indexing end indices for each entry in X, (4) response timestamps with shape (N,), (5) response masks (masking out any missing response variables per row) with shape (N, R), (6) random grouping factor matrix with shape (N, Z), or None if no random grouping factors provided, and (7) response-aligned predictors with shape (N, K).
- cdr.data.c(df)[source]
Zero-center pandas series or data frame
- Parameters:
df – pandas Series or DataFrame; input data.
- Returns:
pandas Series or DataFrame; centered data.
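Zero-centering subtracts the mean from every value so that the result has mean zero. A minimal plain-Python sketch of the operation follows; it works on a list rather than a pandas object, and the helper name center is hypothetical (it does not call cdr.data.c).

```python
from statistics import mean


def center(values):
    """Return a copy of values with the mean subtracted from each element."""
    mu = mean(values)
    return [v - mu for v in values]


centered = center([1.0, 2.0, 6.0])
print(centered)
```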
- cdr.data.compare_elementwise_perf(a, b, y=None, mode='err')[source]
Compare model performance elementwise.
- Parameters:
a – numpy vector; vector of elementwise scores (or predictions if mode is corr) for model a.
b – numpy vector; vector of elementwise scores (or predictions if mode is corr) for model b.
y – numpy vector or None; vector of observations. Used only if mode is corr.
mode – str; type of performance metric. One of err, loglik, or corr.
- Returns:
numpy vector; vector of elementwise performance differences.
- cdr.data.compute_filter(y, field, cond)[source]
Compute filter given a field and condition
- Parameters:
y – pandas DataFrame; response data.
field – str; name of column on whose values to filter.
cond – str; string representation of condition to use for filtering.
- Returns:
numpy vector; boolean mask to use for pandas subsetting operations.
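To make the idea of a string condition concrete, here is a rough sketch that turns a condition such as "<= 3000" into a boolean mask over a column of values. The condition syntax (a comparison operator followed by a number), the OPS table, and the compute_mask helper are assumptions for illustration; the syntax actually accepted by compute_filter is not specified above.

```python
import operator

# Comparison operators recognized by this sketch (an assumption).
OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "!=": operator.ne,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def compute_mask(column, cond):
    """Return a list of booleans, True where the value satisfies cond."""
    cond = cond.strip()
    # Check two-character operators before one-character ones.
    for symbol in ("<=", ">=", "!=", "==", "<", ">"):
        if cond.startswith(symbol):
            threshold = float(cond[len(symbol):])
            return [OPS[symbol](value, threshold) for value in column]
    raise ValueError(f"Unrecognized condition: {cond!r}")


mask = compute_mask([100.0, 250.0, 3100.0], "<= 3000")
print(mask)
```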
- cdr.data.compute_filters(Y, filters=None)[source]
Compute filters given a filter map.
- Parameters:
Y – pandas DataFrame; response data.
filters – list; list of key-value pairs mapping column names to filtering criteria for their values.
- Returns:
numpy vector; boolean mask to use for pandas subsetting operations.
- cdr.data.compute_partition(y, modulus, n)[source]
Given a splitID column, use modular arithmetic to partition data into n subparts.
- Parameters:
y – pandas DataFrame; response data.
modulus – int; modulus to use for splitting, must be at least as large as n.
n – int; number of subparts in the partition.
- Returns:
list of numpy vectors; one boolean vector per subpart of the partition, selecting only those elements of y that belong.
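One way to partition by modular arithmetic is sketched below. The assignment rule is an assumption chosen for illustration (the lowest remainders go to the first subpart and each remaining remainder gets its own subpart); the rule used by compute_partition itself is not stated above. The names partition_masks, train, dev, and test are hypothetical.

```python
def partition_masks(split_ids, modulus, n):
    """Return n boolean masks that partition split_ids by remainder."""
    if modulus < n:
        raise ValueError("modulus must be at least as large as n")
    split_ids = list(split_ids)
    cutoff = modulus - n  # remainders 0..cutoff fall in the first subpart
    masks = [[(i % modulus) <= cutoff for i in split_ids]]
    for k in range(1, n):
        # Each remaining remainder value defines one further subpart.
        masks.append([(i % modulus) == cutoff + k for i in split_ids])
    return masks


train, dev, test = partition_masks(range(8), modulus=4, n=3)
print(train)
print(dev)
print(test)
```

With modulus 4 and n 3, half of the IDs land in the first subpart and a quarter in each of the other two.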
- cdr.data.compute_splitID(y, split_fields)[source]
Map tuples in columns designated by split_fields into integer ID to use for data partitioning.
- Parameters:
y – pandas DataFrame; response data.
split_fields – list of str; column names to use for computing split ID.
- Returns:
numpy vector; integer vector of split IDs.
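Mapping tuples of column values to integer IDs can be sketched as follows. This version numbers each distinct tuple in order of first appearance, which is an assumption; compute_splitID may derive its integers differently. The helper name and the column names subject and sentid are hypothetical.

```python
def compute_split_ids(rows, split_fields):
    """Assign one integer ID per distinct tuple of split_fields values."""
    ids = {}
    out = []
    for row in rows:
        key = tuple(row[field] for field in split_fields)
        if key not in ids:
            ids[key] = len(ids)  # next unused integer
        out.append(ids[key])
    return out


rows = [
    {"subject": "s1", "sentid": 0},
    {"subject": "s1", "sentid": 0},
    {"subject": "s1", "sentid": 1},
    {"subject": "s2", "sentid": 0},
]
split_ids = compute_split_ids(rows, ["subject", "sentid"])
print(split_ids)
```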
- cdr.data.compute_time_mask(X_time, first_obs, last_obs, history_length=128, future_length=0, int_type='int32', float_type='float32')[source]
Compute mask for expanded impulse data zeroing out non-existent impulses.
- Parameters:
X_time – pandas Series; timestamps associated with each impulse in X.
first_obs – pandas Series; vector of row indices in X of the first impulse in the time series associated with each response.
last_obs – pandas Series; vector of row indices in X of the last preceding impulse in the time series associated with each response.
history_length – int; maximum number of history (backward) observations.
future_length – int; maximum number of future (forward) observations.
int_type – str; name of int type.
float_type – str; name of float type.
- Returns:
numpy array; boolean impulse mask.
- cdr.data.corr_cdr(X_2d, impulse_names, impulse_names_2d, time, time_mask)[source]
Compute correlation matrix, including correlations across time where necessitated by 2D predictors.
- Parameters:
X_2d – numpy array; the impulse data. Must be of shape (batch_len, history_length+future_length, n_impulses), can be computed from sources by build_CDR_impulse_data().
impulse_names – list of str; names of columns in X_2d to be used as impulses by the model.
impulse_names_2d – list of str; names of columns in X_2d that designate 2D predictors.
time – 3D numpy array; array of timestamps for each event in X_2d.
time_mask – 3D numpy array; array of masks over padding events in X_2d.
- Returns:
pandas DataFrame; the correlation matrix.
- cdr.data.expand_impulse_sequence(X, X_time, first_obs, last_obs, window_length, int_type='int32', float_type='float32', fill=0.0)[source]
Expand out impulse stream in X for each response in the target data.
- Parameters:
X – pandas DataFrame; impulse (predictor) data.
X_time – pandas Series; timestamps associated with each impulse in X.
first_obs – pandas Series; vector of row indices in X of the first impulse in the time series associated with each response.
last_obs – pandas Series; vector of row indices in X of the last preceding impulse in the time series associated with each response.
window_length – int; number of steps in time dimension of output
int_type – str; name of int type.
float_type – str; name of float type.
fill – float; fill value for padding cells.
- Returns:
3-tuple of numpy arrays; the expanded impulse array, the expanded timestamp array, and a boolean mask zeroing out locations of non-existent impulses.
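The expansion step can be pictured as cutting one fixed-length window out of the impulse stream for each response and padding windows that are too short. The sketch below uses plain lists and a single predictor column. It assumes half-open index ranges [first, last), keeps the most recent impulses when the range is longer than the window, and pads on the left; these conventions and the helper name expand_windows are assumptions, not the package's documented behavior.

```python
def expand_windows(x, x_time, first_obs, last_obs, window_length, fill=0.0):
    """Build per-response windows of impulses, timestamps, and a validity mask."""
    impulses, times, mask = [], [], []
    for first, last in zip(first_obs, last_obs):
        # Keep at most window_length of the most recent impulses.
        start = max(first, last - window_length)
        n_real = max(last - start, 0)
        pad = window_length - n_real
        impulses.append([fill] * pad + list(x[start:start + n_real]))
        times.append([fill] * pad + list(x_time[start:start + n_real]))
        # False marks padding cells, True marks real impulses.
        mask.append([False] * pad + [True] * n_real)
    return impulses, times, mask


x = [10.0, 11.0, 12.0, 13.0]
x_time = [0.0, 0.5, 1.0, 1.5]
impulses, times, mask = expand_windows(
    x, x_time, first_obs=[0, 0], last_obs=[1, 4], window_length=3
)
print(impulses)
print(mask)
```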
- cdr.data.filter_invalid_responses(Y, dv, crossval_factor=None, crossval_fold=None)[source]
Filter out rows with non-finite responses.
- Parameters:
Y – pandas table or list of pandas tables; response data.
dv – str or list of str; name(s) of column(s) containing the dependent variable(s)
crossval_factor – str or None; name of column containing the selection variable for cross validation. If None, no cross validation filtering.
crossval_fold – list or None; list of valid values for cross-validation selection. Used only if crossval_factor is not None.
- Returns:
2-tuple of pandas DataFrame and pandas Series; valid data and indicator vector used to filter out invalid data.
- cdr.data.get_first_last_obs_lists(y)[source]
Convenience utility to extract out all first_obs and last_obs columns in Y sorted by file index
- Parameters:
y – pandas DataFrame; response data.
- Returns:
pair of list of str; first_obs column names and last_obs column names
- cdr.data.get_rangf_array(Y, rangf_names, rangf_map)[source]
Collect random grouping factor indicators as numpy integer arrays that can be read by Tensorflow. Returns vertical concatenation of GF arrays from each element of Y.
- Parameters:
Y – pandas table or list of pandas tables; response data.
rangf_names – list of str; names of columns containing random grouping factor levels (order is preserved, changing the order will change the resulting array).
rangf_map – list of dict; map for each random grouping factor from levels to unique indices.
- Returns:
- cdr.data.get_time_windows(X, Y, series_ids, forward=False, window_length=128, t_delta_cutoff=None, verbose=True)[source]
Compute row indices in X of initial and final impulses for each element of y. Assumes time series are already sorted by series_ids.
- Parameters:
X – pandas DataFrame; impulse (predictor) data.
Y – pandas DataFrame; response data.
series_ids – list of str; column names whose jointly unique values define unique time series.
forward – bool; whether to compute forward windows (future inputs) or backward windows (past inputs, used if forward is False).
window_length – int; maximum size of time window to consider. If np.inf, no bound on window size.
t_delta_cutoff – float or None; maximum distance in time to consider (can help improve training stability on data with large gaps in time). If 0 or None, no cutoff.
verbose – bool; whether to report progress to stderr
- Returns:
2-tuple of numpy vectors; first and last impulse observations (respectively) for each response in y
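For intuition, the backward-window case can be sketched for a single, time-sorted series using binary search. This is an illustrative simplification, not the package's algorithm: it ignores series_ids and forward windows, treats impulses simultaneous with the response as part of the window, and returns half-open index ranges. The helper name time_windows is hypothetical.

```python
from bisect import bisect_right


def time_windows(x_time, y_time, window_length=128, t_delta_cutoff=None):
    """Return (first_obs, last_obs) index lists for one sorted series."""
    first_obs, last_obs = [], []
    for t in y_time:
        # One past the last impulse whose timestamp is <= t.
        last = bisect_right(x_time, t)
        first = max(0, last - window_length)
        if t_delta_cutoff:
            # Drop impulses that lie too far back in time.
            while first < last and t - x_time[first] > t_delta_cutoff:
                first += 1
        first_obs.append(first)
        last_obs.append(last)
    return first_obs, last_obs


first_obs, last_obs = time_windows(
    [0.0, 1.0, 2.0, 3.0, 10.0], [2.5, 10.0], window_length=3, t_delta_cutoff=5.0
)
print(first_obs, last_obs)
```

In the second response, the large gap before time 10.0 exceeds the cutoff, so only the most recent impulse stays in the window.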
- cdr.data.preprocess_data(X, Y, formula_list, series_ids, filters=None, history_length=128, future_length=0, t_delta_cutoff=None, all_interactions=False, verbose=True, debug=False)[source]
Preprocess CDR data.
- Parameters:
X – list of pandas tables; impulse (predictor) data.
Y – list of pandas tables; response data.
formula_list – list of Formula; CDR formula for which to preprocess data.
series_ids – list of str; column names whose jointly unique values define unique time series.
filters – list; list of key-value pairs mapping column names to filtering criteria for their values.
history_length – int; maximum number of history (backward) observations.
future_length – int; maximum number of future (forward) observations.
t_delta_cutoff – float or None; maximum distance in time to consider (can help improve training stability on data with large gaps in time). If 0 or None, no cutoff.
all_interactions – bool; add powerset of all conformable interactions.
verbose – bool; whether to report progress to stderr
debug – bool; print debugging information
- Returns:
7-tuple; predictor data, response data, filtering mask, response-aligned predictor names, response-aligned predictors, 2D predictor names, and 2D predictors
- cdr.data.s(df)[source]
Rescale pandas series or data frame by its standard deviation
- Parameters:
df – pandas Series or DataFrame; input data.
- Returns:
pandas Series or DataFrame; rescaled data.
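Rescaling by the standard deviation divides every value by the spread of the data without shifting the mean. The plain-Python sketch below uses the sample standard deviation (ddof=1), which is an assumption; the description above does not say whether the function uses the sample or population estimate. The helper name rescale is hypothetical.

```python
from statistics import stdev


def rescale(values):
    """Divide each value by the sample standard deviation of values."""
    sd = stdev(values)  # sample standard deviation (ddof=1), an assumption
    return [v / sd for v in values]


rescaled = rescale([2.0, 4.0, 6.0])
print(rescaled)
```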
- cdr.data.split_cdr_outputs(outputs, lengths)[source]
Takes a dictionary of arbitrary depth containing CDR outputs with their labels as keys and splits each output into a list of outputs with lengths corresponding to lengths. Useful for aligning CDR outputs to response files, since multiple response files can be provided, which are underlyingly concatenated by CDR. Recursively modifies the dict in place.
- Parameters:
outputs – dict of arbitrary depth with numpy arrays at the leaves; the source CDR outputs
lengths – array-like vector of lengths to split the outputs into
- Returns:
dict; same key-val structure as outputs but with each leaf split into a list of len(lengths) vectors, one for each length value.
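The recursive, in-place splitting described above can be sketched with nested dictionaries and plain lists standing in for numpy arrays. The helper name split_outputs and the example keys are hypothetical; this illustrates the behavior and does not call the package.

```python
def split_outputs(outputs, lengths):
    """Recursively replace each leaf with a list of consecutive slices."""
    for key in list(outputs):
        value = outputs[key]
        if isinstance(value, dict):
            split_outputs(value, lengths)  # descend into nested dicts
        else:
            pieces, start = [], 0
            for length in lengths:
                pieces.append(value[start:start + length])
                start += length
            outputs[key] = pieces  # modify the dict in place
    return outputs


# Two response files of 2 and 3 rows, concatenated into 5-row outputs.
outputs = {
    "preds": {"rt": [1, 2, 3, 4, 5]},
    "loglik": [0.1, 0.2, 0.3, 0.4, 0.5],
}
split_outputs(outputs, lengths=[2, 3])
print(outputs)
```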
cdr.formula module
- class cdr.formula.Formula(bform_str, standardize=True)[source]
Bases: object
A class for parsing R-style mixed-effects CDR model formula strings and applying them to CDR data matrices.
- Parameters:
bform_str – str; an R-style mixed-effects CDR model formula string
- ablate_impulses(impulse_ids)[source]
Remove impulses in impulse_ids from fixed effects (retaining in any random effects).
- Parameters:
impulse_ids – list of str; impulse IDs
- Returns:
None
- apply_formula(X, Y, X_in_Y_names=None, all_interactions=False, series_ids=None)[source]
Extract all data and compute all transforms required by the model formula.
- Parameters:
X – list of pandas tables; impulse data.
Y – list of pandas tables; response data.
X_in_Y_names – list or None; list of column names for response-aligned predictors (predictors measured for every response rather than for every input) if applicable, None otherwise.
all_interactions – bool; add powerset of all conformable interactions.
series_ids – list of str or None; list of ids to use as grouping factors for lagged effects. If None, lagging will not be attempted.
- Returns:
triple; transformed X, transformed y, response-aligned predictor names
- apply_op(op, arr)[source]
Apply op op to array arr.
- Parameters:
op – str; name of op.
arr – numpy or pandas array; source data.
- Returns:
numpy array; transformed data.
- apply_op_2d(op, arr, time_mask)[source]
Apply op to 2D predictor (predictor whose value depends on properties of the response).
- Parameters:
op – str; name of op.
arr – numpy or array; source data.
time_mask – numpy array; mask for padding cells
- Returns:
numpy array; transformed data
- apply_ops(impulse, X)[source]
Apply all ops defined for an impulse
- Parameters:
impulse – Impulse object; the impulse.
X – list of pandas tables; table containing the impulse data.
- Returns:
pandas table; table augmented with transformed impulse.
- apply_ops_2d(impulse, X_2d_predictor_names, X_2d_predictors, time_mask)[source]
Apply all ops defined for a 2D predictor (predictor whose value depends on properties of the response).
- Parameters:
impulse – Impulse object; the impulse.
X_2d_predictor_names – list of str; names of 2D predictors.
X_2d_predictors – numpy array; source data.
time_mask – numpy array; mask for padding cells
- Returns:
2-tuple; list of new predictor name, numpy array of predictor values
- static bases(family)[source]
Get the number of bases of a spline kernel.
- Parameters:
family – str; name of IRF family
- Returns:
int or None; number of bases of spline kernel, or None if family is not a spline.
- build(bform_str, standardize=True)[source]
Construct internal data from formula string
- Parameters:
bform_str – str; source string.
- Returns:
None
- categorical_transform(X)[source]
Get transformed formula with categorical predictors in X expanded.
- Parameters:
X – list of pandas tables; input data.
- Returns:
Formula; transformed Formula object
- compute_2d_predictor(predictor_name, X, first_obs, last_obs, history_length=128, future_length=None, minibatch_size=50000)[source]
Compute 2D predictor (predictor whose value depends on properties of the most recent impulse).
- Parameters:
predictor_name – str; name of predictor
X – pandas table; input data
first_obs – pandas Series or 1D numpy array; row indices in X of the start of the series associated with each regression target.
last_obs – pandas Series or 1D numpy array; row indices in X of the most recent observation in the series associated with each regression target.
minibatch_size – int; minibatch size for computing predictor, can help with memory footprint
- Returns:
2-tuple; new predictor name, numpy array of predictor values
- initialize_nns()[source]
Initialize a dictionary mapping ids to metadata for all NN components in this CDR model
- Returns:
dict; mapping from NN str id to NN object storing metadata for that NN.
- insert_impulses(impulses, irf_str, rangf=None)[source]
Insert impulses in impulse_ids into fixed effects and all random terms.
- Parameters:
impulse_ids – list of str; impulse IDs
- Returns:
None
- static irf_params(family)[source]
Return list of parameter names for a given IRF family.
- Parameters:
family – str; name of IRF family
- Returns:
list of str; parameter names
- static is_LCG(family)[source]
Check whether a kernel is LCG.
- Parameters:
family – str; name of IRF family
- Returns:
bool; whether the kernel is LCG (linear combination of Gaussians)
- pc_transform(n_pc, pointers=None)[source]
Get transformed formula with impulses replaced by principal components.
- Parameters:
n_pc – int; number of principal components in transform.
pointers – dict; map from source nodes to transformed nodes.
- Returns:
list of IRFNode; tree forest representing current state of the transform.
- process_ast(t, terms=None, has_intercept=None, ops=None, rangf=None, impulses_by_name=None, interactions_by_name=None, under_irf=False, under_interaction=False)[source]
Recursively process a node of the Python abstract syntax tree (AST) representation of the formula string and insert data into internal representation of model formula.
- Parameters:
t – AST node.
terms – list or None; CDR terms computed so far, or None if no CDR terms computed.
has_intercept – dict; map from random grouping factors to boolean values representing whether that grouping factor has a random intercept. None is used as a key to refer to the population-level intercept.
ops – list; names of ops computed so far, or None if no ops computed.
rangf – str or None; name of rangf for random term currently being processed, or None if currently processing fixed effects portion of model.
- Returns:
None
- process_irf(t, input_irf, ops=None, rangf=None, nn_inputs=None, impulses_by_name=None, interactions_by_name=None)[source]
Process data from AST node representing part of an IRF definition and insert data into internal representation of the model.
- Parameters:
t – AST node.
input_irf – IRFNode, Impulse, InterationImpulse, or NNImpulse object; child IRF of current node
ops – list of str, or None; ops applied to IRF. If None, no ops applied
rangf – str or None; name of rangf for random term currently being processed, or None if currently processing fixed effects portion of model.
nn_inputs – tuple or None; tuple of input impulses to neural network IRF, or None if not a neural network IRF.
- Returns:
IRFNode object; the IRF node
- re_transform(X)[source]
Get transformed formula with regex predictors expanded based on matches to the columns in X.
- Parameters:
X – list of pandas tables; input data.
- Returns:
Formula; transformed Formula object
- remove_impulses(impulse_ids)[source]
Remove impulses in impulse_ids from the model (both fixed and random effects).
- Parameters:
impulse_ids – list of str; impulse IDs
- Returns:
None
- response_names()[source]
Get list of names of modeled response variables.
- Returns:
list of str; names of modeled response variables.
- responses()[source]
Get list of modeled response variables.
- Returns:
list of Impulse; modeled response variables.
- to_lmer_formula_string(z=False, correlated=True)[source]
Generate an lme4-style LMER model string representing the structure of the current CDR model. Useful for 2-step analysis in which data are transformed using CDR, then fitted using LME.
- Parameters:
z – bool; z-transform convolved predictors.
correlated – bool; whether to use correlated random intercepts and slopes.
- Returns:
str; the LMER formula string.
- class cdr.formula.IRFNode(family=None, impulse=None, p=None, irfID=None, coefID=None, ops=None, fixed=True, rangf=None, nn_impulses=None, nn_config=None, impulses_as_inputs=True, inputs_to_add=None, inputs_to_drop=None, param_init=None, trainable=None, response_params_list=None)[source]
Bases: object
Data structure representing a node in a CDR IRF tree. For more information on how the CDR IRF structure is encoded as a tree, see the reference on CDR IRF trees.
- Parameters:
family – str; name of IRF kernel family.
impulse – Impulse object or None; the impulse if terminal, else None.
p – IRFNode object or None; the parent IRF node, or None if no parent (parent nodes can be connected after initialization).
irfID – str or None; string ID of node if applicable. If None, automatically-generated ID will describe node’s family and structural position.
coefID – str or None; string ID of coefficient if applicable. If None, automatically-generated ID will describe node’s family and structural position. Only applicable to terminal nodes, so this property will not be used if the node is non-terminal.
ops – list of str, or None; ops to apply to IRF node. If None, no ops.
fixed – bool; whether node exists in the model’s fixed effects structure.
rangf – list of str, str, or None; names of any random grouping factors associated with the node.
nn_impulses – tuple or None; tuple of input impulses to neural network IRF, or None if not a neural network IRF.
nn_config – dict or None; dictionary of settings for NN IRF component.
impulses_as_inputs – bool; whether to include impulses in input of a neural network IRF.
inputs_to_add – list of Impulse/NNImpulse or None; list of impulses to add to input of neural network IRF.
inputs_to_drop – list of Impulse/NNImpulse or None; list of impulses to remove from input of neural network IRF (keeping them in output).
param_init – dict; map from parameter names to initial values, which will also be used as prior means.
trainable – list of str, or None; trainable parameters at this node. If None, all parameters are trainable.
response_params_list – list of 2-tuple of str, or None; response distribution parameters modeled by this IRF, with each parameter represented as a pair (DIST_NAME, PARAM_NAME). DIST_NAME can be None, in which case the IRF will apply to any distribution parameter matching PARAM_NAME.
- ablate_impulses(impulse_ids)[source]
Remove impulses in impulse_ids from fixed effects (retaining in any random effects).
- Parameters:
impulse_ids – list of str; impulse IDs
- Returns:
None
- add_child(t)[source]
Add child to this node in the IRF tree
- Parameters:
t – IRFNode; child node.
- Returns:
IRFNode; child node with updated parent.
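The parent/child bookkeeping described here can be illustrated with a toy tree node. The Node class below is a hypothetical stand-in, not IRFNode: it borrows only the parent attribute name p from the constructor signature above, and the children list is an assumption. Returning the child makes it convenient to build a chain in one expression.

```python
class Node:
    """Toy tree node that tracks its parent and its children."""

    def __init__(self, name):
        self.name = name
        self.p = None        # parent node, None for the root
        self.children = []   # hypothetical container for child nodes

    def add_child(self, t):
        """Attach t below this node and return t with its parent updated."""
        t.p = self
        self.children.append(t)
        return t


root = Node("ROOT")
# Chain two calls: ROOT -> A -> B
leaf = root.add_child(Node("A")).add_child(Node("B"))
print(leaf.name, leaf.p.name, leaf.p.p.name)
```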
- add_interactions(response_interactions)[source]
Add a ResponseInteraction object (or list of them) to this node.
- Parameters:
response_interactions – ResponseInteraction or list of ResponseInteraction; response interaction(s) to add
- Returns:
None
- add_rangf(rangf)[source]
Add random grouping factor name to this node.
- Parameters:
rangf – str; random grouping factor name
- Returns:
None
- atomic_irf_by_family()[source]
Get map from IRF kernel family names to list of IDs of IRFNode instances belonging to that family.
- Returns:
dict from str to list of str; IRF IDs by family.
- atomic_irf_param_init_by_family()[source]
Get map from IRF kernel family names to maps from IRF IDs to maps from IRF parameter names to their initialization values.
- Returns:
dict; parameter initialization maps by family.
- atomic_irf_param_trainable_by_family()[source]
Get map from IRF kernel family names to maps from IRF IDs to lists of trainable parameters.
- Returns:
dict; trainable parameter maps by family.
- bases()[source]
Get the number of bases of node.
- Returns:
int or None; number of bases of node, or None if node is not a spline.
- categorical_transform(X, expansion_map=None)[source]
Generate transformed copy of node with categorical predictors in X expanded. Recursive. Returns a tree forest representing the current state of the transform. When run from ROOT, should always return a length-1 list representing a single-tree forest, in which case the transformed tree is accessible as the 0th element.
- Parameters:
X – list of pandas tables; input data.
expansion_map – dict; internal variable. Do not use.
- Returns:
list of IRFNode; tree forest representing current state of the transform.
- coef2impulse()[source]
Get map from coefficient IDs dominated by node to lists of corresponding impulses.
- Returns:
dict; map from coefficient IDs to lists of corresponding impulses.
- coef2terminal()[source]
Get map from coefficient IDs dominated by node to lists of corresponding terminal IRF nodes.
- Returns:
dict; map from coefficient IDs to lists of corresponding terminal IRF nodes.
- coef_by_rangf()[source]
Get map from random grouping factor names to associated coefficient IDs dominated by node.
- Returns:
dict; map from random grouping factor names to associated coefficient IDs.
- coef_id()[source]
Get coefficient ID for this node.
- Returns:
str or None; coefficient ID, or None if non-terminal.
- coef_names()[source]
Get list of names of coefficients dominated by node.
- Returns:
list of str; names of coefficients dominated by node.
- fixed_coef_names()[source]
Get list of names of fixed coefficients dominated by node.
- Returns:
list of str; names of fixed coefficients dominated by node.
- fixed_interaction_names()[source]
Get list of names of fixed interactions dominated by node.
- Returns:
list of str; names of fixed interactions dominated by node.
- formula_terms()[source]
Return data structure representing formula terms dominated by node, grouped by random grouping factor. Key None represents the fixed portion of the model (no random grouping factor).
- Returns:
dict; map from random grouping factors to data structure representing formula terms. Data structure contains 2 fields, 'impulses' containing impulses and 'irf' containing IRF Nodes.
- has_coefficient(rangf)[source]
Report whether rangf has any coefficients in this subtree
- Parameters:
rangf – Random grouping factor
- Returns:
bool; whether rangf has any coefficients in this subtree
- has_composed_irf()[source]
Check whether node dominates any IRF compositions.
- Returns:
bool; whether node dominates any IRF compositions.
- has_irf(rangf)[source]
Report whether rangf has any IRFs in this subtree
- Parameters:
rangf – Random grouping factor
- Returns:
bool; whether rangf has any IRFs in this subtree
- impulse2coef()[source]
Get map from impulses dominated by node to lists of corresponding coefficient IDs.
- Returns:
dict; map from impulses to lists of corresponding coefficient IDs.
- impulse2terminal()[source]
Get map from impulses dominated by node to lists of corresponding terminal IRF nodes.
- Returns:
dict; map from impulses to lists of corresponding terminal IRF nodes.
- impulse_names(include_interactions=False, include_nn=False, include_nn_inputs=True)[source]
Get list of names of impulses dominated by node.
- Parameters:
include_interactions – bool; whether to return impulses defined by interaction terms.
include_nn – bool; whether to return NN transformations of impulses.
include_nn_inputs – bool; whether to return input impulses to NN transformations.
- Returns:
list of str; names of impulses dominated by node.
- impulse_set(include_interactions=False, include_nn=False, include_nn_inputs=True, out=None)[source]
Get set of impulses dominated by node.
- Parameters:
include_interactions – bool; whether to return impulses defined by interaction terms.
include_nn – bool; whether to return NN transformations of impulses.
include_nn_inputs – bool; whether to return input impulses to NN transformations.
out – set or None; initial set to modify.
- Returns:
list of Impulse; impulses dominated by node.
- impulses(include_interactions=False, include_nn=False, include_nn_inputs=True)[source]
Get alphabetically sorted list of impulses dominated by node.
- Parameters:
include_interactions – bool; whether to return impulses defined by interaction terms.
include_nn – bool; whether to return NN transformations of impulses.
include_nn_inputs – bool; whether to return input impulses to NN transformations.
- Returns:
list of Impulse; impulses dominated by node.
- impulses_by_name(include_interactions=False, include_nn=False, include_nn_inputs=True)[source]
Get dictionary mapping names of impulses dominated by node to their corresponding impulses.
- Parameters:
include_interactions – bool; whether to return impulses defined by interaction terms.
include_nn – bool; whether to return NN transformations of impulses.
include_nn_inputs – bool; whether to return input impulses to NN transformations.
- Returns:
list of Impulse; impulses dominated by node.
- impulses_from_response_interaction()[source]
Get list of any impulses from response interactions associated with this node.
- Returns:
list of Impulse; impulses dominated by node.
- interaction_by_rangf()[source]
Get map from random grouping factor names to associated interaction IDs dominated by node.
- Returns:
dict
; map from random grouping factor names to associated interaction IDs.
- interaction_names()[source]
Get list of names of interactions dominated by node.
- Returns:
list
ofstr
; names of interactions dominated by node.
- interactions()[source]
Return list of all response interactions used in this subtree, sorted by name.
- Returns:
list
ofResponseInteraction
- interactions2inputs()[source]
Get map from IDs of ResponseInteractions dominated by node to lists of IDs of their inputs.
- Returns:
dict
; map from IDs of ResponseInteractions nodes to lists of their inputs.
- irf_by_rangf()[source]
Get map from random grouping factor names to IDs of associated IRF nodes dominated by node.
- Returns:
dict
; map from random grouping factor names to IDs of associated IRF nodes.
- irf_to_formula(rangf=None)[source]
Generate a representation of this node’s impulse response kernel in formula string syntax.
- Parameters:
rangf – random grouping factor for which to generate the stringification (fixed effects if rangf==None).
- Returns:
str
; formula string representation of node
- is_LCG()[source]
Check the non-parametric type of a node’s kernel, or return None if parametric.
- Parameters:
family – str; name of IRF family.
- Returns:
str or None; name of kernel type if non-parametric, else None.
- local_name()[source]
Get descriptive name for this node, ignoring its position in the IRF tree.
- Returns:
str
; name.
- nns_by_key(nns_by_key=None)[source]
Get a dict mapping NN keys to objects associated with them.
- Parameters:
nns_by_key – dict or None; dictionary to modify. Empty if None.
- Returns:
dict; map from string keys to list of associated IRFNode and/or NNImpulse objects.
- node_table()[source]
Get map from names to nodes of all nodes dominated by node (including self).
- Returns:
dict
; map from names to nodes of all nodes dominated by node.
- nonparametric_coef_names()[source]
Get list of names of nonparametric coefficients dominated by node.
- Returns:
list of str; names of spline coefficients dominated by node.
- static pointers2namemmaps(p)[source]
Get a map from source to transformed IRF node names.
- Parameters:
p – dict; map from source to transformed IRF nodes.
- Returns:
dict; map from source to transformed IRF node names.
- re_transform(X, expansion_map=None)[source]
Generate transformed copy of node with regex-matching predictors in X expanded. Recursive. Returns a tree forest representing the current state of the transform. When run from ROOT, should always return a length-1 list representing a single-tree forest, in which case the transformed tree is accessible as the 0th element.
- Parameters:
X – list of
pandas
tables; input data.expansion_map –
dict
; Internal variable. Do not use.
- Returns:
list
ofIRFNode
; tree forest representing current state of the transform.
- remove_impulses(impulse_ids)[source]
Remove impulses in impulse_ids from the model (both fixed and random effects).
- Parameters:
impulse_ids – list of str; impulse IDs.
- Returns:
None
- supports_non_causal()[source]
Check whether model contains only IRF kernels that lack the causality constraint t >= 0.
- Returns:
bool; whether model contains only IRF kernels that lack the causality constraint t >= 0.
- terminal2coef()[source]
Get map from IDs of terminal IRF nodes dominated by node to lists of corresponding coefficient IDs.
- Returns:
dict
; map from IDs of terminal IRF nodes to lists of corresponding coefficient IDs.
- terminal2impulse()[source]
Get map from terminal IRF nodes dominated by node to lists of corresponding impulses.
- Returns:
dict
; map from terminal IRF nodes to lists of corresponding impulses.
- terminal_names()[source]
Get list of names of terminal IRF nodes dominated by node.
- Returns:
list
ofstr
; names of terminal IRF nodes dominated by node.
- terminals()[source]
Get list of terminal IRF nodes dominated by node.
- Returns:
list
ofIRFNode
; terminal IRF nodes dominated by node.
- terminals_by_name()[source]
Get dictionary mapping names of terminal IRF nodes dominated by node to their corresponding nodes.
- Returns:
dict
; map from node names to nodes
- unablate_impulses(impulse_ids)[source]
Insert impulses in impulse_ids into fixed effects (leaving random effects structure unchanged).
- Parameters:
impulse_ids – list of str; impulse IDs.
- Returns:
None
- unary_nonparametric_coef_names()[source]
Get list of names of non-parametric coefficients with no siblings dominated by node. Because unary splines are non-parametric, their coefficients are fixed at 1: a trainable coefficient would be perfectly confounded with the spline parameters. Splines dominating multiple coefficients are excepted, since the same kernel shape must be scaled in different ways.
- Returns:
list
ofstr
; names of unary spline coefficients dominated by node.
- class cdr.formula.Impulse(name, ops=None, is_re=False)[source]
Bases:
object
Data structure representing an impulse in a CDR model.
- Parameters:
name –
str
; name of impulseops –
list
ofstr
, orNone
; ops to apply to impulse. IfNone
, no ops.is_re –
bool
; whether impulse is a regular expression search pattern
- categorical(X)[source]
Checks whether impulse is categorical in a dataset
- Parameters:
X – list of pandas tables; data to check.
- Returns:
bool; True if impulse is categorical in X, False otherwise.
- expand_categorical(X)[source]
Expand any categorical predictors in X into 1-hot columns.
- Parameters:
X – list of
pandas
tables; input data- Returns:
2-tuple of
pandas
table,list
ofImpulse
; expanded data, list of expandedImpulse
objects
- expand_re(X)[source]
Expand any regular expression predictors in X into a sequence of all matching columns.
- Parameters:
X – list of
pandas
tables; input data- Returns:
list
ofImpulse
; list of expandedImpulse
objects
- get_matcher()[source]
Return a compiled regex matcher to compare to data columns
- Returns:
re
object
- class cdr.formula.ImpulseInteraction(impulses, ops=None)[source]
Bases:
object
Data structure representing an interaction of impulse-aligned variables (impulses) in a CDR model.
- Parameters:
impulses –
list
ofImpulse
; impulses to interact.ops –
list
ofstr
, orNone
; ops to apply to interaction. IfNone
, no ops.
- expand_categorical(X)[source]
Expand any categorical predictors in X into 1-hot columns.
- Parameters:
X – list of
pandas
tables; input data.- Returns:
3-tuple of
pandas
table,list
ofImpulseInteraction
,list
oflist
ofImpulse
; expanded data, list of expandedImpulseInteraction
objects, list of lists of expandedImpulse
objects, one list for each interaction.
- expand_re(X)[source]
Expand any regular expression predictors in X into a sequence of all matching columns.
- Parameters:
X – list of
pandas
tables; input data- Returns:
2-tuple of
list
ofImpulseInteraction
,list
oflist
ofImpulse
; list of expandedImpulseInteraction
objects, list of lists of expandedImpulse
objects, one list for each interaction.
- impulses()[source]
Get list of impulses dominated by interaction.
- Returns:
list
ofImpulse
; impulses dominated by interaction.
- class cdr.formula.NN(nodes, nn_type, rangf=None, nn_key=None, nn_config=None)[source]
Bases:
object
Data structure representing a neural network within a CDR model.
- Parameters:
nodes – list of IRFNode and/or NNImpulse objects; nodes associated with this NN.
nn_type – str; name of NN type ('irf' or 'impulse').
rangf – str or list of str; random grouping factors for which to build random effects for this NN.
nn_key – str or None; key uniquely identifying this NN node (constructed automatically if None).
nn_config – dict or None; map of NN config fields to their values for this NN node.
- all_impulse_names()[source]
Get list of all impulse names associated with this NN component.
- Returns:
list
ofstr
: All impulse names associated with this NN component.
- class cdr.formula.NNImpulse(impulses, impulses_as_inputs=True, inputs_to_add=None, inputs_to_drop=None, nn_config=None)[source]
Bases:
object
Data structure representing a feedforward neural network transform of one or more impulses in a CDR model.
- Parameters:
impulses –
list
ofImpulse
; impulses to transform.impulses_as_inputs –
bool
; whether to include impulses as NN inputs.inputs_to_add –
list
ofImpulse
orNone
; extra impulses to add to NN input.inputs_to_drop –
list
ofImpulse
orNone
; output impulses to drop from NN input.nn_config –
dict
orNone
; map of NN config fields to their values for this NN node.
- expand_categorical(X)[source]
Expand any categorical predictors in X into 1-hot columns.
- Parameters:
X – list of
pandas
tables; input data.- Returns:
3-tuple of
pandas
table,list
ofNNImpulse
,list
oflist
ofImpulse
; expanded data, list of expandedNNImpulse
objects, list of lists of expandedImpulse
objects, one list for each interaction.
- expand_re(X)[source]
Expand any regular expression predictors in X into a sequence of all matching columns.
- Parameters:
X – list of
pandas
tables; input data- Returns:
2-tuple of list of NNImpulse, list of list of Impulse; list of expanded NNImpulse objects, list of lists of expanded Impulse objects, one list for each expanded NNImpulse.
- impulses()[source]
Get list of output impulses dominated by NN.
- Returns:
list
ofImpulse
; impulses dominated by NN.
- class cdr.formula.ResponseInteraction(responses, rangf=None)[source]
Bases:
object
Data structure representing an interaction of response-aligned variables (containing at least one IRF-convolved impulse) in a CDR model.
- Parameters:
responses –
list
of terminalIRFNode
,Impulse
, and/orImpulseInteraction
objects; responses to interact.rangf –
str
or list ofstr
; random grouping factors for which to build random effects for this interaction.
- add_rangf(rangf)[source]
Add random grouping factor name to this interaction.
- Parameters:
rangf –
str
; random grouping factor name- Returns:
None
- contains_member(x)[source]
Check if object is a member of the set of responses belonging to this interaction
- Parameters:
x –
IRFNode
,Impulse
, and/orImpulseInteraction
object; object to check.- Returns:
bool
; whether x is a member of the set of responses
- dirac_delta_responses()[source]
Get list of response-aligned Dirac delta variables dominated by interaction.
- Returns:
list
ofImpulse
and/orImpulseInteraction
objects; Dirac delta variables dominated by interaction.
- irf_responses()[source]
Get list of IRFs dominated by interaction.
- Returns:
list
ofIRFNode
objects; terminal IRFs dominated by interaction.
- nn_impulse_responses()[source]
Get list of NN impulse terms dominated by interaction.
- Returns:
list
ofNNImpulse
objects; NN impulse terms dominated by interaction.
- cdr.formula.pythonize_string(s)[source]
Convert string to valid python variable name
- Parameters:
s –
str
; source string- Returns:
str
; pythonized string
- cdr.formula.standardize_formula_string(s)[source]
Standardize a formula string, removing notational variation. IRF specifications C(...) are sorted alphabetically by the IRF call name, e.g. Gamma(). The order of impulses within an IRF specification is preserved.
- Parameters:
s – str; the formula string to be standardized.
- Returns:
str; standardization of s.
cdr.io module
- cdr.io.read_tabular_data(X_paths, Y_paths, series_ids, categorical_columns=None, sep=' ', verbose=True)[source]
Read impulse and response data into pandas dataframes and perform basic pre-processing.
- Parameters:
X_paths – str or list of str; path(s) to impulse (predictor) data (multiple tables are concatenated). Each path may also be a ;-delimited list of paths to files containing predictors with different timestamps, where the predictors in each file are all timestamped with respect to the same reference point.
Y_paths – str or list of str; path(s) to response data (multiple tables are concatenated). Each path may also be a ;-delimited list of paths to files containing different response variables with different timestamps, where the response variables in each file are all timestamped with respect to the same reference point.
series_ids – list of str; column names whose jointly unique values define unique time series.
categorical_columns – list of str; column names that should be treated as categorical.
sep – str; string representation of field delimiter in input data.
verbose – bool; whether to log progress to stderr.
- Returns:
2-tuple of list(
pandas
DataFrame); (impulse data, response data). X and Y each have one element for each dataset in X_paths/Y_paths, each containing the column-wise concatenation of all column files in the path.
cdr.kwargs module
- class cdr.kwargs.Kwarg(key, default_value, dtypes, descr, aliases=None, default_value_cdrnn='same', suppress=False)[source]
Bases:
object
Data structure for storing keyword arguments and their docstrings.
- Parameters:
key – str; Key.
default_value – Any; Default value.
dtypes – list or class; List of classes or single class. Members can also be specific required values, either None or values of type str.
descr – str; Description of kwarg.
default_value_cdrnn – Any; Default value for CDRNN if distinct from CDR. If 'same', CDRNN uses default_value.
suppress – bool; Whether to suppress documentation for this kwarg. Useful for hiding deprecated or little-used kwargs in order to simplify autodoc output.
- dtypes_str()[source]
String representation of dtypes permitted for kwarg.
- Returns:
str
; dtypes string.
- get_type_name(x)[source]
String representation of name of a dtype
- Parameters:
x – dtype; the dtype to name.
- Returns:
str
; name of dtype.
- in_settings(settings)[source]
Check whether kwarg is specified in a settings object parsed from a config file.
- Parameters:
settings – settings from a
ConfigParser
object.- Returns:
bool
; whether kwarg is found in settings.
- kwarg_from_config(settings, is_cdrnn=False)[source]
Given a settings object parsed from a config file, return value of kwarg cast to appropriate dtype. If missing from settings, return default.
- Parameters:
settings – settings from a
ConfigParser
object ordict
.is_cdrnn –
bool
; whether this is for a CDRNN model.
- Returns:
value of kwarg
- cdr.kwargs.cdr_kwarg_docstring()[source]
Generate docstring snippet summarizing all CDR kwargs, dtypes, and defaults.
- Returns:
str
; docstring snippet
cdr.model module
cdr.opt module
cdr.plot module
- cdr.plot.plot_heatmap(m, row_names, col_names, outdir='.', filename='eigenvectors.png', plot_x_inches=7, plot_y_inches=5, cmap='Blues')[source]
Plot a heatmap. Used in CDR for visualizing eigenvector matrices in principal components models.
- Parameters:
m – 2D
numpy
array; source data for plot.row_names –
list
ofstr
; row names.col_names –
list
ofstr
; column names.outdir –
str
; output directory.filename –
str
; filename.plot_x_inches –
float
; width of plot in inches.plot_y_inches –
float
; height of plot in inches.cmap –
str
; name ofmatplotlib
cmap
object (determines colors of plotted IRF).
- Returns:
None
- cdr.plot.plot_irf(plot_x, plot_y, irf_names, lq=None, uq=None, density=None, sort_names=True, prop_cycle_length=None, prop_cycle_map=None, outdir='.', filename='irf_plot.png', irf_name_map=None, plot_x_inches=6, plot_y_inches=4, ylim=None, cmap='gist_rainbow', legend=True, xlab=None, ylab=None, use_line_markers=False, use_grid=True, transparent_background=False, dpi=300, dump_source=False)[source]
Plot impulse response functions.
- Parameters:
plot_x –
numpy
array with shape (T,1); time points for which to plot the response. For example, if the plots contain 1000 points from 0s to 10s, plot_x could be generated asnp.linspace(0, 10, 1000)
.plot_y –
numpy
array with shape (T, N); response of each IRF at each time point.irf_names –
list
ofstr
; CDR ID’s of IRFs in the same order as they appear in axis 1 of plot_y.lq –
numpy
array with shape (T, N), orNone
; lower bound of credible interval for each time point. IfNone
, no credible interval will be plotted.uq –
numpy
array with shape (T, N), orNone
; upper bound of credible interval for each time point. IfNone
, no credible interval will be plotted.sort_names –
bool
; alphabetically sort IRF names.prop_cycle_length –
int
orNone
; Length of plotting properties cycle (defines step size in the color map). IfNone
, inferred from irf_names.prop_cycle_map –
list
ofint
, orNone
; Integer indices to use in the properties cycle for each entry in irf_names. IfNone
, indices are automatically assigned.outdir –
str
; output directory.filename –
str
; filename.irf_name_map –
dict
ofstr
tostr
; map from CDR IRF ID’s to more readable names to appear in legend. Any plotted IRF whose ID is not found in irf_name_map will be represented with the CDR IRF ID.plot_x_inches –
float
; width of plot in inches.plot_y_inches –
float
; height of plot in inches.ylim – 2-element
tuple
orlist
; (lower_bound, upper_bound) to use for y axis. IfNone
, automatically inferred.cmap –
str
; name ofmatplotlib
cmap
object (determines colors of plotted IRF).legend –
bool
; include a legend.xlab –
str
orNone
; x-axis label. IfNone
, no label.ylab –
str
orNone
; y-axis label. IfNone
, no label.use_line_markers –
bool
; add markers to IRF lines.use_grid –
bool
; whether to show a background grid.transparent_background –
bool
; use a transparent background. IfFalse
, uses a white background.dpi –
int
; dots per inch.dump_source –
bool
; Whether to dump the plot source array to a csv file.
- Returns:
None
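The array shapes expected by plot_irf can be easy to get wrong, so here is a minimal sketch of inputs with the documented layout, built with NumPy only. The IRF names and the exponential curves are made up for illustration, and the interval bounds are arbitrary bands rather than real credible intervals. The call to cdr.plot.plot_irf is left as a comment because it needs the cdr package to be installed.

```python
import numpy as np

T = 1000
# Time axis, generated as suggested in the plot_x description above.
plot_x = np.linspace(0, 10, T)

# Hypothetical IRF IDs, one per column of plot_y.
irf_names = ["exp_fast", "exp_slow"]
rates = np.array([2.0, 0.5])

# plot_y has shape (T, N): one row per time point, one column per IRF.
plot_y = rates[None, :] * np.exp(-rates[None, :] * plot_x[:, None])

# Illustrative bounds only (same shape as plot_y).
lq = 0.9 * plot_y
uq = 1.1 * plot_y

# With the cdr package available, these could be passed along as:
# cdr.plot.plot_irf(plot_x, plot_y, irf_names, lq=lq, uq=uq, outdir=".")
```

If the installed version strictly requires plot_x with shape (T, 1), `plot_x[:, None]` provides it.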
- cdr.plot.plot_irf_as_heatmap(plot_x, plot_y, irf_names, sort_names=True, outdir='.', filename='irf_hm.png', irf_name_map=None, plot_x_inches=6, plot_y_inches=4, ylim=None, cmap='seismic', xlab=None, ylab=None, transparent_background=False, dpi=300, dump_source=False)[source]
Plot impulse response functions as a heatmap.
- Parameters:
plot_x –
numpy
array with shape (T,1); time points for which to plot the response. For example, if the plots contain 1000 points from 0s to 10s, plot_x could be generated asnp.linspace(0, 10, 1000)
.plot_y –
numpy
array with shape (T, N); response of each IRF at each time point.irf_names –
list
ofstr
; CDR ID’s of IRFs in the same order as they appear in axis 1 of plot_y.sort_names –
bool
; alphabetically sort IRF names.outdir –
str
; output directory.filename –
str
; filename.irf_name_map –
dict
ofstr
tostr
; map from CDR IRF ID’s to more readable names to appear in legend. Any plotted IRF whose ID is not found in irf_name_map will be represented with the CDR IRF ID.plot_x_inches –
float
; width of plot in inches.plot_y_inches –
float
; height of plot in inches.ylim – 2-element
tuple
orlist
; (lower_bound, upper_bound) to use for y axis. IfNone
, automatically inferred.cmap –
str
; name ofmatplotlib
cmap
object (determines colors of plotted IRF).xlab –
str
orNone
; x-axis label. IfNone
, no label.ylab –
str
orNone
; y-axis label. IfNone
, no label.transparent_background –
bool
; use a transparent background. IfFalse
, uses a white background.dpi –
int
; dots per inch.dump_source –
bool
; Whether to dump the plot source array to a csv file.
- Returns:
None
- cdr.plot.plot_qq(theoretical, actual, actual_color='royalblue', expected_color='firebrick', outdir='.', filename='qq_plot.png', plot_x_inches=6, plot_y_inches=4, legend=True, xlab='Theoretical', ylab='Empirical', ticks=True, as_lines=False, transparent_background=False, dpi=300)[source]
Generate quantile-quantile plot.
- Parameters:
theoretical –
numpy
array with shape (T,); theoretical error quantiles.actual –
numpy
array with shape (T,); empirical errors.actual_color –
str
; color for actual values.expected_color –
str
; color for expected values.outdir –
str
; output directory.filename –
str
; filename.plot_x_inches –
float
; width of plot in inches.plot_y_inches –
float
; height of plot in inches.legend –
bool
; include a legend.xlab –
str
orNone
; x-axis label. IfNone
, no label.ylab –
str
orNone
; y-axis label. IfNone
, no label.as_lines –
bool
; render QQ plot using lines. Otherwise, use points.transparent_background –
bool
; use a transparent background. IfFalse
, uses a white background.dpi –
int
; dots per inch.
- Returns:
None
- cdr.plot.plot_surface(x, y, z, lq=None, uq=None, density=None, bounds_as_surface=False, outdir='.', filename='surface.png', irf_name_map=None, plot_x_inches=6, plot_y_inches=4, xlim=None, ylim=None, zlim=None, plot_type='wireframe', cmap='coolwarm', xlab=None, ylab=None, zlab='Response', title=None, transparent_background=False, dpi=300, dump_source=False)[source]
Plot an IRF or interaction surface.
- Parameters:
x –
numpy
array with shape (M,N); x locations for each plot point, copied N times.y –
numpy
array with shape (M,N); y locations for each plot point, copied M times.z –
numpy
array with shape (M,N); z locations for each plot point.lq –
numpy
array with shape (M,N), orNone
; lower bound of credible interval for each plot point. IfNone
, no credible interval will be plotted.uq –
numpy
array with shape (M,N), orNone
; upper bound of credible interval for each plot point. IfNone
, no credible interval will be plotted.bounds_as_surface –
bool
; whether to plot interval bounds using additional surfaces. IfFalse
, bounds are plotted with vertical error bars instead. Ignored if lq, uq areNone
.outdir –
str
; output directory.filename –
str
; filename.irf_name_map –
dict
ofstr
tostr
; map from CDR IRF ID’s to more readable names to appear in legend. Any plotted IRF whose ID is not found in irf_name_map will be represented with the CDR IRF ID.plot_x_inches –
float
; width of plot in inches.plot_y_inches –
float
; height of plot in inches.xlim – 2-element
tuple
orlist
orNone
; (lower_bound, upper_bound) to use for x axis. IfNone
, automatically inferred.ylim – 2-element
tuple
orlist
orNone
; (lower_bound, upper_bound) to use for y axis. IfNone
, automatically inferred.zlim – 2-element
tuple
orlist
orNone
; (lower_bound, upper_bound) to use for z axis. IfNone
, automatically inferred.plot_type –
str
; name of plot type to generate. One of["contour", "surf", "trisurf"]
.cmap –
str
; name ofmatplotlib
cmap
object (determines colors of plotted IRF).legend –
bool
; include a legend.xlab –
str
orNone
; x-axis label. IfNone
, no label.ylab –
str
orNone
; y-axis label. IfNone
, no label.zlab –
str
orNone
; z-axis label. IfNone
, no label.use_line_markers –
bool
; add markers to IRF lines.transparent_background –
bool
; use a transparent background. IfFalse
, uses a white background.dpi –
int
; dots per inch.dump_source –
bool
; Whether to dump the plot source array to a csv file.
- Returns:
None
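The (M,N) grids described for x, y and z can be built with numpy.meshgrid. The sketch below assumes that x varies along axis 0 and y along axis 1, which is one reading of "copied N times" and "copied M times"; the variable names and the surface itself are illustrative only.

```python
import numpy as np

M, N = 50, 40
x_vals = np.linspace(0.0, 2.0, M)    # e.g. delay from impulse
y_vals = np.linspace(-1.0, 1.0, N)   # e.g. predictor value

# indexing="ij" yields (M, N) arrays: x_vals is repeated across the N
# columns and y_vals is repeated across the M rows.
x, y = np.meshgrid(x_vals, y_vals, indexing="ij")

# An arbitrary surface for illustration: linear effect decaying over time.
z = y * np.exp(-x)

# With the cdr package available, these could be passed along as:
# cdr.plot.plot_surface(x, y, z, outdir=".", filename="surface.png")
```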
cdr.signif module
- cdr.signif.correlation_test(y, x1, x2, nested=False, verbose=True)[source]
Perform a parametric test of difference in correlation with observations between two prediction vectors, based on Steiger (1980).
- Parameters:
y –
numpy
vector; observation vector.x1 –
numpy
vector; first prediction vector.x2 –
numpy
vector; second prediction vector.nested –
bool
; assume that the second model is nested within the first.verbose –
bool
; report progress logs to standard error.
- Returns:
- cdr.signif.permutation_test(a, b, n_iter=10000, n_tails=2, mode='loss', agg='mean', nested=False, verbose=True)[source]
Perform a paired permutation test for significance.
- Parameters:
a –
numpy
array; first error/loss/prediction matrix, shape (n_item, n_model).b –
numpy
array; second error/loss/prediction matrix, shape (n_item, n_model).n_iter –
int
; number of resampling iterations.n_tails –
int
; number of tails.mode –
str
; one of["mse", "loglik"]
, the type of error used (SE’s are averaged while loglik’s are summed).agg –
str
; aggregation function over ensemble components. E.g.,'mean'
,'median'
,'min'
,'max'
.nested –
bool
; assume that the second model is nested within the first.verbose –
bool
; report progress logs to standard error.
- Returns:
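For readers unfamiliar with the technique, a generic paired permutation test can be sketched in a few lines of NumPy. This is not the package's implementation: the function name, the sign-flipping formulation, the use of the mean as test statistic and the add-one p-value correction are all assumptions made for illustration, and the sketch handles only 1-D per-item loss vectors.

```python
import numpy as np

def paired_permutation_test_sketch(a, b, n_iter=10000, n_tails=2, seed=0):
    """Hypothetical helper: paired permutation test on per-item losses."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rng = np.random.default_rng(seed)
    diff = a - b
    observed = diff.mean()
    # Swapping a[i] and b[i] flips the sign of the i-th paired difference.
    signs = rng.choice([-1.0, 1.0], size=(n_iter, diff.shape[0]))
    permuted = (signs * diff[None, :]).mean(axis=1)
    if n_tails == 2:
        hits = np.sum(np.abs(permuted) >= np.abs(observed))
    else:
        hits = np.sum(permuted >= observed)
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (hits + 1) / (n_iter + 1)
    return p_value, observed

rng = np.random.default_rng(1)
loss_a = rng.normal(1.0, 0.1, size=200)
loss_b = loss_a - 0.5  # second model has uniformly lower loss
p_value, observed = paired_permutation_test_sketch(loss_a, loss_b, n_iter=2000)
```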
cdr.synth module
- class cdr.synth.SyntheticModel(n_pred, irf_name, irf_params=None, coefs=None, fn=None, interactions=False, ranef_range=None, n_ranef_levels=None)[source]
Bases:
object
A data structure representing a synthetic “true” model for empirical validation of CDR fits. Contains a randomly generated set of IRFs that can be used to convolve data, and provides methods for sampling data with particular structure and convolving it with the true IRFs in order to generate a response vector.
- Parameters:
n_pred – int; Number of predictors in the synthetic model.
irf_name – str; Name of IRF kernel to use. One of ['Exp', 'Normal', 'Gamma', 'ShiftedGamma'].
irf_params – dict or None; Dictionary of IRF parameters to use, with parameter names as keys and numeric arrays as values. Values must each have n_pred cells. If None, parameter values will be randomly sampled.
coefs – numpy array or None; Vector of coefficients to use, where len(coefs) == n_pred. If None, coefficients will be randomly sampled.
fn – str or None; Effect shape to use. One of ['quadratic', 'exp', 'logmod', 'linear']. If None, linear effects.
interactions – bool; Whether there are randomly sampled pairwise interactions (same bounds as those used for coefs).
ranef_range – float or None; Maximum magnitude of simulated random effects. If 0 or None, no random effects.
n_ranef_levels – int or None; Number of random effects levels. If 0 or None, no random effects.
- convolve(X, t_X, t_y, history_length=None, err_sd=None, allow_instantaneous=True, ranef_level=None, verbose=True)[source]
Convolve data using the model’s IRFs.
- Parameters:
X – numpy array; 2-D array of predictors.
t_X – numpy array; 1-D vector of predictor timestamps.
t_y – numpy array; 1-D vector of response timestamps.
history_length –
int
orNone
; Drop preceding events more thanhistory_length
steps into the past. IfNone
, no history clipping.err_sd –
float
orNone
; Standard deviation of Gaussian noise to inject into responses. IfNone
, use the empirical standard deviation of the response vector.allow_instantaneous –
bool
; Whether to compute responses whent==0
.ranef_level –
str
orNone
; Random effects level to use (orNone
to use population-level effect)verbose –
bool
; Verbosity.
- Returns:
(2-D numpy array, 1-D numpy array); Matrix of convolved predictors, vector of responses
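The idea behind convolving impulses with an IRF can be sketched as follows: each response timestamp receives a weighted sum of all earlier impulses, with weights given by the IRF evaluated at the elapsed time. This is a simplified illustration, not the package's code. It assumes a single IRF shared by all predictors, and the names convolve_sketch and exp_irf are hypothetical.

```python
import numpy as np

def exp_irf(t):
    """Hypothetical exponential IRF with rate 1."""
    return np.exp(-t)

def convolve_sketch(X, t_X, t_y, irf, allow_instantaneous=True):
    """Hypothetical helper: convolve impulses X (timestamps t_X) to times t_y."""
    X = np.asarray(X, dtype=float)
    t_X = np.asarray(t_X, dtype=float)
    t_y = np.asarray(t_y, dtype=float)
    # delta[j, i] is the time elapsed from impulse i to response j.
    delta = t_y[:, None] - t_X[None, :]
    # Only impulses at or before the response contribute (causal kernel).
    mask = delta >= 0 if allow_instantaneous else delta > 0
    weights = np.where(mask, irf(np.maximum(delta, 0.0)), 0.0)
    # (n_response, n_impulse) @ (n_impulse, n_pred) -> (n_response, n_pred)
    return weights @ X

X = np.array([[1.0], [2.0]])
t_X = np.array([0.0, 1.0])
t_y = np.array([1.0, 2.0])
X_conv = convolve_sketch(X, t_X, t_y, exp_irf)
```

The dense weight matrix grows with the product of the number of impulses and responses, which is the kind of memory cost that history clipping is meant to limit.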
- convolve_v2(X, t_X, t_y, err_sd=None, allow_instantaneous=True, verbose=True)[source]
Convolve data using the model’s IRFs. Alternate memory-intensive implementation that is faster for small arrays but can exhaust resources for large ones.
- Parameters:
X – numpy array; 2-D array of predictors.
t_X – numpy array; 1-D vector of predictor timestamps.
t_y – numpy array; 1-D vector of response timestamps.
err_sd –
float
; Standard deviation of Gaussian noise to inject into responses.allow_instantaneous –
bool
; Whether to compute responses whent==0
.verbose –
bool
; Verbosity.
- Returns:
(2-D numpy array, 1-D numpy array); Matrix of convolved predictors, vector of responses
- get_curves(n_time_units=None, n_time_points=None, ranef_level=None)[source]
Extract response curves as an array.
- Parameters:
n_time_units –
float
; Number of units of time over which to extract curves.n_time_points –
int
; Number of samples to extract for each curve (resolution of curve)ranef_level –
str
orNone
; Random effects level to use (orNone
to use population-level effect)
- Returns:
numpy array; 2-D numpy array with shape
[T, K]
, whereT
is n_time_points andK
is the number of predictors in the model.
- irf(x, coefs=False, ranef_level=None)[source]
Computes the values of the model’s IRFs elementwise over a vector of timepoints.
- Parameters:
x – numpy array; 1-D array with shape
[N]
containing timepoints at which to query the IRFs.coefs –
bool
; Whether to rescale responses by coefficientsranef_level –
str
orNone
; Random effects level to use (orNone
to use population-level effect)
- Returns:
numpy array; 2-D array with shape
[N, K]
containing values of the model’sK
IRFs evaluated at the timepoints in x.
- plot_irf(n_time_units=None, n_time_points=None, dir='.', filename='synth_irf.png', plot_x_inches=6, plot_y_inches=4, cmap='gist_rainbow', legend=False, xlab=None, ylab=None, use_line_markers=False, transparent_background=False)[source]
Plot impulse response functions.
- Parameters:
n_time_units – float; number of time units to use for plotting.
n_time_points – int; number of points to use for plotting.
dir –
str
; output directory.filename –
str
; filename.plot_x_inches –
float
; width of plot in inches.plot_y_inches –
float
; height of plot in inches.cmap –
str
; name ofmatplotlib
cmap
object (determines colors of plotted IRF).legend –
bool
; include a legend.xlab –
str
orNone
; x-axis label. IfNone
, no label.ylab –
str
orNone
; y-axis label. IfNone
, no label.use_line_markers –
bool
; add markers to IRF lines.transparent_background –
bool
; use a transparent background. IfFalse
, uses a white background.
- Returns:
None
- sample_data(m, n=None, X_interval=None, y_interval=None, rho=None, align_X_y=True)[source]
Samples synthetic predictors and time vectors
- Parameters:
m –
int
; Number of predictors.n –
int
; Number of response query points.X_interval –
str
,float
,list
,tuple
, orNone
; Predictor interval model. IfNone
, predictor offsets are randomly sampled from an exponential distribution with parameter1
. Iffloat
, predictor offsets are evenly spaced with interval X_interval. Iflist
ortuple
, the first element is the name of a scipy distribution to use for sampling offsets, and all remaining elements are positional arguments to that distribution.y_interval –
str
,float
,list
,tuple
, orNone
; Response interval model. IfNone
, response offsets are randomly sampled from an exponential distribution with parameter1
. Iffloat
, response offsets are evenly spaced with interval y_interval. Iflist
ortuple
, the first element is the name of a scipy distribution to use for sampling offsets, and all remaining elements are positional arguments to that distribution.rho –
float
; Level of pairwise correlation between predictors.align_X_y –
bool
; Whether predictors and responses are required to be sampled at the same points in time.
- Returns:
(2-D numpy array, 1-D numpy array, 1-D numpy array); Matrix of predictors, vector of predictor timestamps, vector of response timestamps
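The two simplest interval models described above (exponential offsets when the interval is None, even spacing when it is a float) can be sketched as cumulative sums of offsets. The helper name is hypothetical, and starting the series at the first offset rather than at 0 is an assumption.

```python
import numpy as np

def sample_timestamps_sketch(n, interval=None, seed=0):
    """Hypothetical helper: build n increasing timestamps from offsets."""
    rng = np.random.default_rng(seed)
    if interval is None:
        # Offsets drawn from an exponential distribution with parameter 1.
        offsets = rng.exponential(scale=1.0, size=n)
    else:
        # Evenly spaced offsets of the given size.
        offsets = np.full(n, float(interval))
    # Timestamps are the running total of the offsets.
    return np.cumsum(offsets)

t_random = sample_timestamps_sketch(5)
t_even = sample_timestamps_sketch(5, interval=0.5)
```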
cdr.util module
- cdr.util.filter_models(names, filters=None, cdr_only=False)[source]
Return models contained in names that are permitted by filters, preserving order in which filters were matched. Filters can be ordinary strings, regular expression objects, or string representations of regular expressions. For a regex filter to be considered a match, the expression must entirely match the name. If filters is zero-length, returns names.
- Parameters:
names –
list
ofstr
; pool of model names to filter.filters –
list
of{str, SRE_Pattern}
orNone
; filters to apply in order. IfNone
, no additional filters.cdr_only –
bool
; ifTrue
, only returns CDR models. IfFalse
, returns all models admitted by filters.
- Returns:
list
ofstr
; names in names that pass at least one filter, or all of names if no filters are applied.
- cdr.util.filter_names(names, filters)[source]
Return elements of names permitted by filters, preserving order in which filters were matched. Filters can be ordinary strings, regular expression objects, or string representations of regular expressions. For a regex filter to be considered a match, the expression must entirely match the name.
- Parameters:
names –
list
ofstr
; pool of names to filter.filters –
list
of{str, SRE_Pattern}
; filters to apply in order
- Returns:
list
ofstr
; names in names that pass at least one filter
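The full-match rule and the filter-order behaviour described above can be sketched with the standard re module. This is an illustrative re-implementation under assumptions, not the package's code: the helper name is hypothetical, and plain strings are simply compiled as regular expressions.

```python
import re

def filter_names_sketch(names, filters):
    """Hypothetical helper: keep names fully matched by any filter."""
    out = []
    for f in filters:
        pattern = f if isinstance(f, re.Pattern) else re.compile(f)
        for name in names:
            # fullmatch requires the expression to cover the entire name.
            if pattern.fullmatch(name) and name not in out:
                out.append(name)
    return out

names = ["CDR_main", "CDR_ablated", "LME_baseline"]
selected = filter_names_sketch(names, ["LME.*", "CDR_main", "CDR"])
```

Here the last filter, `"CDR"`, selects nothing because it matches only a prefix of each name, and the result follows the order of the filters rather than the order of names.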
- cdr.util.get_random_permutation(n)[source]
Draw a random permutation of the integers 0 to n - 1. Used to shuffle arrays of length n. For example, a permutation and its inverse can be generated by calling p, p_inv = get_random_permutation(n). To randomly shuffle a vector x of length n, call x[p]. To un-shuffle x after it has already been shuffled, call x[p_inv].
- Parameters:
n – length of the permutation (exclusive maximum value).
- Returns:
2-tuple of numpy arrays; the permutation and its inverse
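The relationship between a permutation and its inverse can be sketched with plain numpy. The function random_permutation_sketch and its seed argument are hypothetical and exist only to make the example reproducible; the library's own implementation may differ.

```python
import numpy as np


def random_permutation_sketch(n, seed=None):
    """Hypothetical stand-in for get_random_permutation; not the library's code."""
    rng = np.random.default_rng(seed)
    p = rng.permutation(n)       # random ordering of 0, ..., n - 1
    p_inv = np.empty_like(p)
    p_inv[p] = np.arange(n)      # p_inv[p[i]] = i, so p_inv undoes p
    return p, p_inv


x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
p, p_inv = random_permutation_sketch(len(x), seed=0)
shuffled = x[p]                  # shuffle
restored = shuffled[p_inv]       # un-shuffle
print(np.array_equal(restored, x))  # True
```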
- cdr.util.load_cdr(dir_path, suffix='')[source]
Convenience method for reconstructing a saved CDR object. First loads in metadata from m.obj, then uses that metadata to construct the computation graph. Then, if saved weights are found, these are loaded into the graph.
- Parameters:
dir_path – Path to directory containing the CDR checkpoint files.
suffix – str; file suffix.
- Returns:
The loaded CDR instance.
- cdr.util.mae(true, preds)[source]
Compute mean absolute error (MAE).
- Parameters:
true – True values
preds – Predicted values
- Returns:
float; MAE
- cdr.util.mse(true, preds)[source]
Compute mean squared error (MSE).
- Parameters:
true – True values
preds – Predicted values
- Returns:
float; MSE
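For reference, the standard definitions of the two error metrics documented above (mean absolute error and mean squared error) can be written in a few lines of numpy. The names mae_sketch and mse_sketch are hypothetical; this is a sketch of the textbook formulas and is not taken from the library's source.

```python
import numpy as np


def mae_sketch(true, preds):
    """Mean of absolute differences between true and predicted values."""
    true = np.asarray(true, dtype=float)
    preds = np.asarray(preds, dtype=float)
    return float(np.mean(np.abs(true - preds)))


def mse_sketch(true, preds):
    """Mean of squared differences between true and predicted values."""
    true = np.asarray(true, dtype=float)
    preds = np.asarray(preds, dtype=float)
    return float(np.mean((true - preds) ** 2))


true = [1.0, 2.0, 3.0, 4.0]
preds = [1.5, 2.0, 2.0, 5.0]
print(mae_sketch(true, preds))  # 0.625
print(mse_sketch(true, preds))  # 0.5625
```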
- cdr.util.names2ix(names, l, dtype=<class 'numpy.int32'>)[source]
Generate 1D numpy array of indices in l corresponding to names in names
- Parameters:
names – list of str; names to look up in l
l – list of str; list of names from which to extract indices
dtype – numpy dtype object; return dtype
- Returns:
numpy array; indices of names in l
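The lookup described above amounts to finding the position of each name in l. A minimal sketch follows; names2ix_sketch and the column names are hypothetical, and the behavior for names missing from l (here, a ValueError from list.index) is an assumption of the sketch rather than documented library behavior.

```python
import numpy as np


def names2ix_sketch(names, l, dtype=np.int32):
    """Hypothetical stand-in for names2ix; not the library's code."""
    # list.index returns the first position of each name in l.
    return np.array([l.index(name) for name in names], dtype=dtype)


# Hypothetical predictor names, used only to exercise the sketch.
columns = ['rate', 'surprisal', 'word_length']
ix = names2ix_sketch(['word_length', 'rate'], columns)
print(ix.tolist())  # [2, 0]
```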
- cdr.util.nested(model_name_1, model_name_2)[source]
Check whether two CDR models are nested with 1 degree of freedom
- Parameters:
model_name_1 – str; name of first model
model_name_2 – str; name of second model
- Returns:
bool; True if models are nested with 1 degree of freedom, False otherwise
- cdr.util.pca(X, n_dim=None, dtype=<class 'numpy.float32'>)[source]
Perform principal components analysis on a data table.
- Parameters:
X – numpy or pandas array; the input data
n_dim – int or None; maximum number of principal components. If None, all components are retained.
dtype – numpy dtype; return dtype
- Returns:
5-tuple of numpy arrays; transformed data, eigenvectors, eigenvalues, input means, and input standard deviations
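One way to produce a 5-tuple of this shape is to standardize the columns and eigendecompose their covariance matrix. The sketch below does that with numpy; pca_sketch is a hypothetical name, and the choices made here (standardizing before decomposition, sorting components by decreasing eigenvalue, truncating only the eigenvectors) are assumptions, not confirmed details of the library's implementation.

```python
import numpy as np


def pca_sketch(X, n_dim=None, dtype=np.float32):
    """Hypothetical stand-in for pca; not the library's code."""
    X = np.asarray(X, dtype=dtype)
    means = X.mean(axis=0, keepdims=True)
    sds = X.std(axis=0, keepdims=True)
    Z = (X - means) / sds                     # standardize each column
    C = np.cov(Z, rowvar=False)               # column-wise covariance
    eigenval, eigenvec = np.linalg.eigh(C)    # eigh: symmetric input
    order = np.argsort(eigenval)[::-1]        # largest variance first
    eigenval, eigenvec = eigenval[order], eigenvec[:, order]
    if n_dim is not None:
        eigenvec = eigenvec[:, :n_dim]        # keep leading components
    Z_pc = (Z @ eigenvec).astype(dtype)       # project onto components
    return Z_pc, eigenvec.astype(dtype), eigenval.astype(dtype), means, sds


# Synthetic data with one strongly correlated pair of columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.1 * X[:, 2]
Xpc, vecs, vals, means, sds = pca_sketch(X, n_dim=2)
print(Xpc.shape, vecs.shape, vals.shape)  # (200, 2) (3, 2) (3,)
```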
- cdr.util.percent_variance_explained(true, preds)[source]
Compute percent variance explained.
- Parameters:
true – True values
preds – Predicted values
- Returns:
float; percent variance explained
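A common definition of percent variance explained is 100 * (1 - MSE / Var(true)). The sketch below implements that formula; percent_variance_explained_sketch is a hypothetical name, and whether the library uses exactly this denominator or clips negative values is not stated here, so treat it as an assumption.

```python
import numpy as np


def percent_variance_explained_sketch(true, preds):
    """Hypothetical stand-in; assumes 100 * (1 - MSE / Var(true))."""
    true = np.asarray(true, dtype=float)
    preds = np.asarray(preds, dtype=float)
    mse = np.mean((true - preds) ** 2)
    return float(100.0 * (1.0 - mse / np.var(true)))


true = [1.0, 2.0, 3.0, 4.0]
perfect = percent_variance_explained_sketch(true, true)
mean_only = percent_variance_explained_sketch(true, [2.5] * 4)
print(perfect)    # 100.0
print(mean_only)  # 0.0
```

Under this definition, predicting the mean of the true values explains none of the variance, and a perfect prediction explains all of it.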