tabensemb.config.UserConfig.from_uci

method

static UserConfig.from_uci(name: str, datafile_name: str | None = None, column_names: List[str] | None = None, save_zip: bool = False, max_retries=3, timeout=20, sep=',') → UserConfig | None

Search for, download, and configure a dataset from https://archive.ics.uci.edu/. The dataset is extracted and saved to a .csv file, and a corresponding UserConfig is returned. Only tabular datasets for the “Classification” and “Regression” tasks are supported. Integer features are treated as continuous features.

Parameters:
name

The name of the dataset, such as “Heart Disease” or “Iris”. The name is searched for on the website, and the dataset is configured if a match is found.

datafile_name

The name of the “.data” file in the downloaded .zip file. If it is None and the dataset contains more than one file with the suffix “.data”, the function prints the available names.

column_names

Labels of the columns in the “.data” file in the downloaded .zip file. If not given, the names recorded on the website are used. These names can be in the wrong order (“Auto MPG” is a typical example), so in that case a warning is logged and save_zip is set to True so that the user can check the “.name” file in the .zip file for the correct order.

save_zip

Whether the downloaded .zip file should be stored.

max_retries

The maximum number of attempts for urllib.request.urlopen.

timeout

The timeout, in seconds, for urllib.request.urlopen.

sep

The delimiter passed to pd.read_csv.
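To illustrate how max_retries and timeout might interact, here is a minimal sketch of a retry loop around urllib.request.urlopen. It is not the library's implementation: the helper name open_with_retries and its opener argument are hypothetical, introduced only so that the loop can be exercised without network access.

```python
import urllib.error
import urllib.request


def open_with_retries(url, max_retries=3, timeout=20, opener=urllib.request.urlopen):
    """Open `url`, trying at most `max_retries` times.

    Hypothetical helper for illustration only; `opener` defaults to
    urllib.request.urlopen and can be replaced by a stub in tests.
    """
    last_error = None
    for _ in range(max_retries):
        try:
            # `timeout` bounds each individual attempt, in seconds.
            return opener(url, timeout=timeout)
        except OSError as exc:
            # urllib.error.URLError and socket timeouts are both OSError subclasses.
            last_error = exc
    raise ConnectionError(
        f"Giving up on {url} after {max_retries} attempts"
    ) from last_error
```

Under these assumptions, the worst-case waiting time for one request is roughly max_retries × timeout seconds, which is worth keeping in mind when raising either value.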

Returns:
UserConfig | None

The configuration of the dataset. If the dataset cannot be configured automatically, None is returned and the reason is printed.
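Because the method can return None, callers may want to check the result before using it. The following is a minimal usage sketch based on the signature documented above; the wrapper name load_uci_config is hypothetical and not part of tabensemb, and the keyword values simply repeat the documented defaults.

```python
from typing import List, Optional


def load_uci_config(name: str, column_names: Optional[List[str]] = None):
    """Return a UserConfig for a UCI dataset, or raise if none could be built.

    Hypothetical convenience wrapper, shown for illustration only.
    """
    # Imported lazily so the helper can be defined without tabensemb installed.
    from tabensemb.config import UserConfig

    cfg = UserConfig.from_uci(
        name,
        column_names=column_names,  # pass explicitly if the website order is unreliable
        max_retries=3,
        timeout=20,
        sep=",",
    )
    if cfg is None:
        # from_uci prints the reason before returning None.
        raise RuntimeError(f"Dataset {name!r} could not be configured automatically")
    return cfg
```

With tabensemb installed and network access available, load_uci_config("Iris") would request the “Iris” dataset. For datasets such as “Auto MPG”, passing column_names explicitly avoids relying on the order recorded on the website.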