tabensemb.trainer.Trainer._bootstrap_fit

method

Trainer._bootstrap_fit(program: str, df: DataFrame, derived_data: Dict[str, ndarray], focus_feature: str, model_name: str, n_bootstrap: int = 1, grid_size: int = 30, refit: bool = True, resample: bool = True, percentile: float = 100, x_min: float | None = None, x_max: float | None = None, CI: float = 0.95, average: bool = True, inspect_attr_kwargs: Dict | None = None) → Tuple[ndarray, ndarray, ndarray, ndarray]

Perform bootstrap resampling, fit the selected model on each resampled dataset, and assign a sequence of values to the selected feature to see how the prediction changes with respect to that feature.

Cook, Thomas R., et al. Explaining Machine Learning by Bootstrapping Partial Dependence Functions and Shapley Values. No. RWP 21-12. 2021.
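The procedure described above can be illustrated with a minimal, self-contained sketch. It is not the code used by Trainer._bootstrap_fit: the helper name bootstrap_partial_dependence is hypothetical, an ordinary least-squares fit stands in for the selected model, and the grid simply spans the observed range of the feature.

```python
import numpy as np


def bootstrap_partial_dependence(X, y, focus_col, n_bootstrap=20, grid_size=30, seed=0):
    """Hypothetical helper: bootstrapped partial dependence of one feature.

    A linear least-squares model is an assumed stand-in for the real model.
    Returns the grid and a (grid_size, n_bootstrap) array of averaged predictions.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Sequential values assigned to the focus feature.
    grid = np.linspace(X[:, focus_col].min(), X[:, focus_col].max(), grid_size)
    curves = np.empty((grid_size, n_bootstrap))
    for b in range(n_bootstrap):
        # Bootstrap: draw n rows with replacement.
        idx = rng.integers(0, n, size=n)
        Xb, yb = X[idx], y[idx]
        # Fit the stand-in model (linear regression with an intercept column).
        design = np.column_stack([Xb, np.ones(n)])
        coef, *_ = np.linalg.lstsq(design, yb, rcond=None)
        for g, value in enumerate(grid):
            # Overwrite the focus feature for every sample, then predict.
            X_mod = Xb.copy()
            X_mod[:, focus_col] = value
            pred = np.column_stack([X_mod, np.ones(n)]) @ coef
            # Average the predictions over all samples for this run.
            curves[g, b] = pred.mean()
    return grid, curves


# Synthetic data: the target depends linearly on feature 0 with slope 2.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=200)

grid, curves = bootstrap_partial_dependence(X, y, focus_col=0)
mean_curve = curves.mean(axis=1)  # averaged across bootstrap runs
```

Each column of curves is one bootstrap run, so the spread across columns at a given grid point reflects how much the fitted dependence varies between resampled datasets.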

Parameters:
program

The selected model base.

model_name

The selected model in the model base.

df

The tabular dataset.

derived_data

The derived data calculated using derive_unstacked().

focus_feature

The feature to which sequential values are assigned.

n_bootstrap

The number of bootstrap runs, each consisting of resampling, fitting, and assigning the sequential values.

grid_size

The number of sequential values.

refit

Whether to fit the model on the bootstrap dataset (with warm_start=True).

resample

Whether to do bootstrap resampling. Setting it to False is only recommended when n_bootstrap=1.

percentile

The percentile of the feature used to generate sequential values.

x_min

The lower limit of the generated sequential values. It will override the left percentile.

x_max

The upper limit of the generated sequential values. It will override the right percentile.

CI

The confidence level of the interval computed from the bootstrapped predictions.

average

If True, the CI is calculated on results of shape (grid_size, n_bootstrap), where predictions for all samples are averaged for each bootstrap run. If False, the CI is calculated on results of shape (grid_size, n_bootstrap*len(df)).
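The interplay of percentile, x_min, x_max, CI, and average can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: make_grid and interval_from_predictions are hypothetical helpers, percentile is assumed to select a central range of the feature (trimming equal tails on both sides), and a simple percentile interval is used, whereas the library may compute its interval differently.

```python
import numpy as np


def make_grid(values, grid_size=30, percentile=100, x_min=None, x_max=None):
    """Hypothetical helper: sequential values for the focus feature.

    Assumption: `percentile` keeps the central `percentile` percent of the data;
    explicit x_min / x_max override the corresponding percentile bound.
    """
    tail = (100 - percentile) / 2
    lower = np.percentile(values, tail) if x_min is None else x_min
    upper = np.percentile(values, 100 - tail) if x_max is None else x_max
    return np.linspace(lower, upper, grid_size)


def interval_from_predictions(preds, CI=0.95, average=True):
    """Hypothetical helper: mean and percentile interval per grid point.

    preds has shape (grid_size, n_bootstrap, n_samples).
    """
    if average:
        # Average over samples first: (grid_size, n_bootstrap).
        results = preds.mean(axis=2)
    else:
        # Pool every sample of every run: (grid_size, n_bootstrap * n_samples).
        results = preds.reshape(preds.shape[0], -1)
    alpha = (1 - CI) / 2
    mean = results.mean(axis=1)
    left = np.percentile(results, 100 * alpha, axis=1)
    right = np.percentile(results, 100 * (1 - alpha), axis=1)
    return mean, left, right


feature_values = np.arange(101.0)
grid_central = make_grid(feature_values, grid_size=5, percentile=90)
grid_override = make_grid(feature_values, grid_size=5, percentile=90, x_min=0.0)

rng = np.random.default_rng(0)
preds = rng.normal(size=(5, 10, 20))  # (grid_size, n_bootstrap, n_samples)
mean_avg, left_avg, right_avg = interval_from_predictions(preds, average=True)
mean_all, left_all, right_all = interval_from_predictions(preds, average=False)
```

With average=False the interval also captures the spread between individual samples, so it is typically much wider than the interval over per-run averages.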

Returns:
np.ndarray

The generated sequential values for the feature.

np.ndarray

Averaged predictions on the sequential values across multiple bootstrap runs and all samples.

np.ndarray

The lower (left) bound of the confidence interval.

np.ndarray

The upper (right) bound of the confidence interval.