tabensemb.trainer.Trainer._bootstrap_fit

method

Trainer._bootstrap_fit(program: str, df: DataFrame, derived_data: Dict[str, ndarray], focus_feature: str, model_name: str, n_bootstrap: int = 1, grid_size: int = 30, refit: bool = True, resample: bool = True, percentile: float = 100, x_min: float | None = None, x_max: float | None = None, CI: float = 0.95, average: bool = True, inspect_attr_kwargs: Dict | None = None) → Tuple[ndarray, ndarray, ndarray, ndarray]

Perform bootstrap resampling, fit the selected model on the resampled data, and assign a sequence of values to the selected feature to observe how the prediction changes with respect to that feature.

Cook, Thomas R., et al. Explaining Machine Learning by Bootstrapping Partial Dependence Functions and Shapley Values. No. RWP 21-12. 2021.
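The procedure above can be sketched in plain NumPy. This is a minimal illustration of bootstrapped partial dependence, not the tabensemb implementation. Several assumptions are involved: an ordinary least-squares model (the hypothetical helpers `fit_linear` and `predict_linear`) stands in for the selected model, the grid spans the feature's full observed range, predictions are made on the resampled rows, and the interval is taken from percentiles of the per-run averaged curves (the `average=True` case).

```python
import numpy as np


def fit_linear(X, y):
    # Hypothetical stand-in model: ordinary least squares with an intercept.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def bootstrap_pdp(X, y, feature, n_bootstrap=20, grid_size=30, CI=0.95, seed=0):
    rng = np.random.default_rng(seed)
    # Assumption: the grid covers the full observed range of the feature.
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), grid_size)
    curves = np.empty((grid_size, n_bootstrap))
    for b in range(n_bootstrap):
        # Resample rows with replacement and refit the stand-in model.
        idx = rng.integers(0, len(X), size=len(X))
        Xb, yb = X[idx], y[idx]
        coef = fit_linear(Xb, yb)
        for i, value in enumerate(grid):
            # Assign the same grid value to the focus feature for every row.
            X_mod = Xb.copy()
            X_mod[:, feature] = value
            # Average the predictions over all rows of this run.
            curves[i, b] = predict_linear(coef, X_mod).mean()
    # Two-sided percentile interval across bootstrap runs.
    alpha = (1 - CI) / 2
    lower = np.percentile(curves, 100 * alpha, axis=1)
    upper = np.percentile(curves, 100 * (1 - alpha), axis=1)
    return grid, curves.mean(axis=1), lower, upper


# Synthetic data whose target depends linearly on feature 0 with slope 2.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)
grid, mean, lower, upper = bootstrap_pdp(X, y, feature=0)
```

Because the stand-in model is linear, each bootstrapped curve is a straight line whose slope is that run's coefficient for feature 0, so the averaged curve should rise with a slope close to 2. With a nonlinear model the curves can bend, and the spread between `lower` and `upper` then reflects how much the refitted models disagree.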

Parameters:

program: The selected model base.
df: The tabular dataset.
derived_data: The derived data calculated using derive_unstacked().
focus_feature: The feature to which the sequential values are assigned.
model_name: The selected model in the model base.
n_bootstrap: The number of bootstrap runs, each consisting of resampling, fitting, and assigning.
grid_size: The number of sequential values generated for focus_feature.
refit: Whether to fit the model on the bootstrap dataset (with warm_start=True).
resample: Whether to perform bootstrap resampling. Setting it to False is only recommended when n_bootstrap=1.
percentile: The percentile of the feature used to generate sequential values.
x_min: The lower limit of the generated sequential values. It will override the left percentile.
x_max: The upper limit of the generated sequential values. It will override the right percentile.
CI: The confidence level of the interval computed from the bootstrapped predictions.
average: If True, the CI is calculated on results of shape (grid_size, n_bootstrap), where the predictions for all samples are averaged within each bootstrap run. If False, the CI is calculated on results of shape (grid_size, n_bootstrap*len(df)).
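The interaction of percentile, x_min, x_max, CI, and average can be illustrated with NumPy. This is a sketch under assumptions that this page does not confirm: percentile is taken to be split evenly between the two tails (so percentile=100 spans the full observed range), raw predictions are assumed to be laid out as (grid_size, n_bootstrap, n_samples), and the interval is assumed to come from percentiles of the results. `make_grid` and `confidence_band` are hypothetical helper names, not part of tabensemb.

```python
import numpy as np


def make_grid(values, grid_size=30, percentile=100.0, x_min=None, x_max=None):
    # Assumption: percentile is split evenly between the left and right tails.
    tail = (100.0 - percentile) / 2
    # x_min / x_max, when given, override the corresponding percentile bound.
    left = np.nanpercentile(values, tail) if x_min is None else x_min
    right = np.nanpercentile(values, 100.0 - tail) if x_max is None else x_max
    return np.linspace(left, right, grid_size)


def confidence_band(preds, CI=0.95, average=True):
    # Assumed layout of preds: (grid_size, n_bootstrap, n_samples).
    grid_size = preds.shape[0]
    if average:
        # Average over samples within each run -> (grid_size, n_bootstrap).
        results = preds.mean(axis=2)
    else:
        # Keep every sample of every run -> (grid_size, n_bootstrap * n_samples).
        results = preds.reshape(grid_size, -1)
    alpha = (1 - CI) / 2
    lower = np.percentile(results, 100 * alpha, axis=1)
    upper = np.percentile(results, 100 * (1 - alpha), axis=1)
    return results, lower, upper


rng = np.random.default_rng(0)
feature = rng.uniform(0, 10, size=500)
grid = make_grid(feature, grid_size=30, percentile=90)
preds = rng.normal(size=(30, 5, 100))  # placeholder predictions, not real output
res_avg, lo_avg, hi_avg = confidence_band(preds, average=True)
res_all, lo_all, hi_all = confidence_band(preds, average=False)
```

With average=False the band reflects the spread of individual-sample predictions rather than the spread of averaged curves, so it is typically much wider than the band obtained with average=True.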

Returns:

np.ndarray: The generated sequential values for the feature.
np.ndarray: Predictions on the sequential values, averaged across all bootstrap runs and all samples.
np.ndarray: The lower bound of the confidence interval.
np.ndarray: The upper bound of the confidence interval.