tabensemb.data.datamodule.DataModule.set_data

method

DataModule.set_data(df: DataFrame, cont_feature_names: List[str], cat_feature_names: List[str], label_name: List[str], derived_stacked_features: List[str] | None = None, derived_data: Dict[str, ndarray] | None = None, warm_start: bool = False, verbose: bool = True, all_training: bool = False, train_indices: ndarray | None = None, val_indices: ndarray | None = None, test_indices: ndarray | None = None)

Set up the DataModule with a DataFrame. This performs data splitting, imputation, derivation, and processing.
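The following is a minimal sketch of how the main inputs to set_data might be prepared with pandas. The column names and values are made up for illustration, and `datamodule` is a hypothetical, already-constructed DataModule instance (its construction is not covered on this page), so the actual call is left as a comment.

```python
import pandas as pd

# Toy tabular dataset; all column names and values are illustrative only.
df = pd.DataFrame(
    {
        "temperature": [20.1, 25.3, 30.0, 22.4],
        "pressure": [1.01, 0.98, 1.05, 1.00],
        "material": ["steel", "aluminum", "steel", "copper"],
        "strength": [350.0, 210.0, 360.0, 220.0],
    }
)

cont_feature_names = ["temperature", "pressure"]  # continuous features
cat_feature_names = ["material"]                  # categorical features
label_name = ["strength"]                         # a list, even for one target

# Optional sanity check (not part of the documented API): every named
# column should exist in df before it is passed to set_data.
missing = [
    c
    for c in cont_feature_names + cat_feature_names + label_name
    if c not in df.columns
]
assert not missing, f"Columns not found in df: {missing}"

# Hypothetical call; `datamodule` is assumed to be an existing DataModule.
# datamodule.set_data(df, cont_feature_names, cat_feature_names, label_name)
```

Note that label_name is typed as List[str], so a single target is still passed as a one-element list.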

Parameters:

df: The tabular dataset. If the df attribute of an existing DataModule (DataModule.df) is passed here, it should first be inverse-transformed using categories_inverse_transform().
cont_feature_names: A list of the names of continuous features in the tabular dataset.
cat_feature_names: A list of the names of categorical features in the tabular dataset.
label_name: A list of target names. Multi-target tasks are experimental.
derived_stacked_features: A list of derived features in the tabular dataset. If not None, only these features are retained after derivation, and all AbstractFeatureSelectors will be skipped.
derived_data: Derived data calculated by data derivers whose “stacked” argument is set to False, i.e., unstacked data. If it is given, unstacked derivations are skipped.
warm_start: Whether to process the data with already-fitted data processors rather than fitting them again.
verbose: Whether to print detailed information while setting up the data.
all_training: Whether all samples are used for training.
train_indices: Manually specify the training indices.
val_indices: Manually specify the validation indices.
test_indices: Manually specify the testing indices.
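As a rough illustration of specifying the split manually, the sketch below builds disjoint train/validation/test index arrays with NumPy. It assumes the indices are positional row indices into df (these coincide with index labels for a default RangeIndex); the 60/20/20 ratio, the row count, and the random seed are arbitrary choices, not library defaults. The set_data call is left as a comment because `datamodule`, `df`, and the feature lists are hypothetical here.

```python
import numpy as np

n_samples = 100  # assumed number of rows in df
rng = np.random.default_rng(seed=0)
perm = rng.permutation(n_samples)  # shuffled row positions 0..n_samples-1

# 60/20/20 split using integer arithmetic to avoid float rounding.
n_train = n_samples * 6 // 10
n_val = n_samples * 2 // 10
train_indices = perm[:n_train]
val_indices = perm[n_train : n_train + n_val]
test_indices = perm[n_train + n_val :]

# In this sketch the three sets are disjoint and together cover every row.
assert len(np.intersect1d(train_indices, val_indices)) == 0
assert len(np.intersect1d(train_indices, test_indices)) == 0
assert len(np.intersect1d(val_indices, test_indices)) == 0
assert len(train_indices) + len(val_indices) + len(test_indices) == n_samples

# Hypothetical call; `datamodule`, `df`, and the feature lists are assumed.
# datamodule.set_data(
#     df, cont_feature_names, cat_feature_names, label_name,
#     train_indices=train_indices,
#     val_indices=val_indices,
#     test_indices=test_indices,
# )
```

Shuffling with a seeded generator keeps the split reproducible across runs, which is usually what you want when comparing models on the same partition.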