tabensemb.trainer.Trainer._bootstrap_fit
method
- Trainer._bootstrap_fit(program: str, df: DataFrame, derived_data: Dict[str, ndarray], focus_feature: str, model_name: str, n_bootstrap: int = 1, grid_size: int = 30, refit: bool = True, resample: bool = True, percentile: float = 100, x_min: float | None = None, x_max: float | None = None, CI: float = 0.95, average: bool = True, inspect_attr_kwargs: Dict | None = None) → Tuple[ndarray, ndarray, ndarray, ndarray]
Perform bootstrap resampling, fit the selected model on the resampled data, and assign sequential values to the selected feature to see how the prediction changes with respect to that feature.
Cook, Thomas R., et al. Explaining Machine Learning by Bootstrapping Partial Dependence Functions and Shapley Values. No. RWP 21-12. 2021.
- Parameters:
- program
The selected model base.
- model_name
The selected model in the model base.
- df
The tabular dataset.
- derived_data
The derived data calculated using derive_unstacked().
- focus_feature
The feature to assign sequential values.
- n_bootstrap
The number of bootstrapping, fitting, and assigning runs.
- grid_size
The number of sequential values.
- refit
Whether to fit the model on the bootstrap dataset (with warm_start=True).
- resample
Whether to do bootstrap resampling. Setting this to False is only recommended when n_bootstrap=1.
- percentile
The percentile of the feature used to generate sequential values.
- x_min
The lower limit of the generated sequential values. It will override the left percentile.
- x_max
The upper limit of the generated sequential values. It will override the right percentile.
- CI
The confidence level of the interval computed from the bootstrapped predictions.
- average
If True, CI will be calculated on results of shape (grid_size, n_bootstrap), where predictions for all samples are averaged for each bootstrap run. If False, CI will be calculated on results of shape (grid_size, n_bootstrap*len(df)).
- Returns:
- np.ndarray
The generated sequential values for the feature.
- np.ndarray
Averaged predictions on the sequential values across multiple bootstrap runs and all samples.
- np.ndarray
The lower bound of the confidence interval.
- np.ndarray
The upper bound of the confidence interval.
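
The procedure described above (resample, refit, sweep the focus feature over a grid, then summarize across runs) can be illustrated with a minimal NumPy-only sketch. This is not the tabensemb implementation: the names bootstrap_partial_dependence and fit_linear are hypothetical, the model is refitted from scratch rather than warm-started, and both the symmetric treatment of percentile and the percentile-based interval are assumptions about how the limits and CI might be computed.

```python
import numpy as np


def fit_linear(X, y):
    """Hypothetical stand-in model: least-squares fit, returns a predict function."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([X_new, np.ones(len(X_new))]) @ coef


def bootstrap_partial_dependence(fit, X, y, focus_col, n_bootstrap=20,
                                 grid_size=30, percentile=100, ci=0.95,
                                 average=True, seed=0):
    rng = np.random.default_rng(seed)
    # Assumption: `percentile` is split symmetrically between both tails.
    tail = (100 - percentile) / 2
    x_min, x_max = np.percentile(X[:, focus_col], [tail, 100 - tail])
    grid = np.linspace(x_min, x_max, grid_size)

    runs = []
    for _ in range(n_bootstrap):
        # Bootstrap resampling: draw len(X) rows with replacement.
        idx = rng.integers(0, len(X), size=len(X))
        X_b, y_b = X[idx], y[idx]
        predict = fit(X_b, y_b)  # refit on the resampled data

        # Assign each grid value to the focus feature for all samples.
        preds = np.empty((grid_size, len(X_b)))
        for i, value in enumerate(grid):
            X_mod = X_b.copy()
            X_mod[:, focus_col] = value
            preds[i] = predict(X_mod)

        # average=True keeps one column per run; otherwise keep every sample.
        runs.append(preds.mean(axis=1, keepdims=True) if average else preds)

    # Shape (grid_size, n_bootstrap) or (grid_size, n_bootstrap * len(X)).
    results = np.hstack(runs)
    alpha = (1 - ci) / 2
    # Assumption: a percentile interval across the columns of `results`.
    lower, upper = np.quantile(results, [alpha, 1 - alpha], axis=1)
    return grid, results.mean(axis=1), lower, upper


# Synthetic data where the target depends linearly on feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)
grid, mean, lower, upper = bootstrap_partial_dependence(fit_linear, X, y, focus_col=0)
```

The four returned arrays mirror the order of the return values listed above: the grid of sequential values, the averaged predictions, and the lower and upper interval bounds. Passing average=False widens the interval in this sketch because the spread across individual samples is included in addition to the spread across bootstrap runs.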