tabensemb.data.datamodule.DataModule.set_data

method

DataModule.set_data(df: DataFrame, cont_feature_names: List[str], cat_feature_names: List[str], label_name: List[str], derived_stacked_features: List[str] | None = None, derived_data: Dict[str, ndarray] | None = None, warm_start: bool = False, verbose: bool = True, all_training: bool = False, train_indices: ndarray | None = None, val_indices: ndarray | None = None, test_indices: ndarray | None = None)

Set up the datamodule with a DataFrame. Data splitting, imputation, derivation, and processing will be performed.

Parameters:
df

The tabular dataset. Note that if a DataModule.df is passed here, it should be inverse-transformed first using categories_inverse_transform().

cont_feature_names

A list of continuous features in the tabular dataset.

cat_feature_names

A list of categorical features in the tabular dataset.

label_name

A list of targets. Multi-target tasks are experimental.

derived_stacked_features

A list of derived features in the tabular dataset. If not None, only these features are retained after derivation, and all AbstractFeatureSelectors will be skipped.

derived_data

The derived data calculated using data derivers whose argument “stacked” is set to False, i.e. unstacked data. If this argument is given, unstacked derivations will be skipped.

warm_start

Whether to process the data with the already-fitted data processors (rather than fitting them again).

verbose

Whether to print progress information.

all_training

Whether all samples are used for training.

train_indices

Manually specify the training indices.

val_indices

Manually specify the validation indices.

test_indices

Manually specify the testing indices.
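Example

The following is a minimal sketch, not taken from the library's own documentation, of how the manual split arguments might be used. It builds three disjoint index arrays with NumPy and passes them to set_data() using only the keyword arguments listed in the signature above. The helper names make_split_indices and set_data_with_manual_split, the toy column names, and the split fractions are hypothetical. The sketch also assumes that datamodule is an already-constructed DataModule and that the indices are positions in a DataFrame with a default integer index; constructing the DataModule itself is not shown.

```python
import numpy as np
import pandas as pd


def make_split_indices(n_samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Return disjoint, sorted train/val/test index arrays covering range(n_samples).

    Hypothetical helper for illustration; it is not part of tabensemb.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    n_val = int(round(n_samples * val_frac))
    n_test = int(round(n_samples * test_frac))
    val = np.sort(shuffled[:n_val])
    test = np.sort(shuffled[n_val:n_val + n_test])
    train = np.sort(shuffled[n_val + n_test:])
    return train, val, test


def set_data_with_manual_split(datamodule, df, cont_feature_names,
                               cat_feature_names, label_name):
    """Call set_data() with explicitly specified splits.

    `datamodule` is assumed to be an existing DataModule instance.
    Only keyword arguments documented in the signature above are used.
    """
    train, val, test = make_split_indices(len(df))
    datamodule.set_data(
        df,
        cont_feature_names=cont_feature_names,
        cat_feature_names=cat_feature_names,
        label_name=label_name,
        train_indices=train,
        val_indices=val,
        test_indices=test,
        verbose=False,
    )
    return train, val, test


# Toy dataset with two continuous features, one categorical feature, one target.
df = pd.DataFrame({
    "x1": np.linspace(0.0, 1.0, 10),
    "x2": np.linspace(1.0, 2.0, 10),
    "color": ["red", "blue"] * 5,
    "y": np.arange(10, dtype=float),
})

# With an existing DataModule bound to `datamodule`, the call would look like:
# set_data_with_manual_split(datamodule, df, ["x1", "x2"], ["color"], ["y"])
```

Fixing the seed keeps the split reproducible across runs. The three arrays are built from a single permutation, so no sample appears in more than one partition.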