tabensemb.data.datamodule.DataModule.set_data
method
- DataModule.set_data(df: DataFrame, cont_feature_names: List[str], cat_feature_names: List[str], label_name: List[str], derived_stacked_features: List[str] | None = None, derived_data: Dict[str, ndarray] | None = None, warm_start: bool = False, verbose: bool = True, all_training: bool = False, train_indices: ndarray | None = None, val_indices: ndarray | None = None, test_indices: ndarray | None = None)
Set up the datamodule with a DataFrame. Data splitting, imputation, derivation, and processing will be performed.
- Parameters:
- df
The tabular dataset. Note that if a DataModule.df is passed here, it should be inverse-transformed first using categories_inverse_transform().
- cont_feature_names
A list of continuous features in the tabular dataset.
- cat_feature_names
A list of categorical features in the tabular dataset.
- label_name
A list of targets. Multi-target tasks are experimental.
- derived_stacked_features
A list of derived features in the tabular dataset. If not None, only these features are retained after derivation, and all AbstractFeatureSelectors will be skipped.
- derived_data
The derived data calculated using data derivers whose argument “stacked” is set to False, i.e. unstacked data. If it is given, unstacked derivations will be skipped.
- warm_start
Whether to use fitted data processors to process the data.
- verbose
Verbosity.
- all_training
Whether all samples are used for training.
- train_indices
Manually specify the training indices.
- val_indices
Manually specify the validation indices.
- test_indices
Manually specify the testing indices.
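As an illustration of the manual-split parameters above, here is a minimal sketch that builds disjoint index arrays with NumPy and forwards them to set_data using the keyword names from the signature. The helpers make_split_indices and set_data_with_manual_split are hypothetical and not part of tabensemb. The sketch assumes that the indices refer to row positions in df and that the three arrays should be disjoint and supplied together; neither assumption is confirmed by the description above.

```python
import numpy as np


def make_split_indices(n_samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Return disjoint (train, val, test) index arrays covering range(n_samples)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_frac))
    n_val = int(round(n_samples * val_frac))
    test_idx = shuffled[:n_test]
    val_idx = shuffled[n_test:n_test + n_val]
    train_idx = shuffled[n_test + n_val:]
    return train_idx, val_idx, test_idx


def set_data_with_manual_split(datamodule, df, cont_feature_names,
                               cat_feature_names, label_name, seed=0):
    """Hypothetical helper: pass a fixed, reproducible split to set_data.

    `datamodule` is any object exposing the set_data method documented above.
    Indices are assumed to be row positions in `df` (an assumption).
    """
    train_idx, val_idx, test_idx = make_split_indices(len(df), seed=seed)
    datamodule.set_data(
        df,
        cont_feature_names=cont_feature_names,
        cat_feature_names=cat_feature_names,
        label_name=label_name,
        train_indices=train_idx,
        val_indices=val_idx,
        test_indices=test_idx,
    )
    return train_idx, val_idx, test_idx
```

Fixing the seed makes the split reproducible across runs, which can be useful when comparing models on identical partitions. If every sample should be used for training, the all_training flag described above is likely the simpler route.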