tabensemb.config.UserConfig.from_uci
method
- static UserConfig.from_uci(name: str, datafile_name: str | None = None, column_names: List[str] | None = None, save_zip: bool = False, max_retries=3, timeout=20, sep=',') -> UserConfig | None
Search for, download, and configure a dataset from https://archive.ics.uci.edu/. The dataset is extracted and saved as a .csv file, and a corresponding UserConfig is returned. Only tabular datasets for “Classification” and “Regression” tasks are supported. Integer features are treated as continuous features.
- Parameters:
- name
The name of the dataset, e.g. “Heart Disease” or “Iris”. The name is searched for on the website, and the dataset is configured if a match is found.
- datafile_name
The name of the “.data” file in the downloaded .zip file. If None and the dataset contains more than one file with the “.data” suffix, the function prints the available names.
- column_names
Column labels for the “.data” file in the downloaded .zip file. If not given, the names recorded on the website are used. These names can be in the wrong order (“Auto MPG” is a typical example), so in that case a warning is logged and save_zip is set to True so that the user can check the “.name” file in the .zip file for the correct order.
- save_zip
Whether the downloaded .zip file should be stored.
- max_retries
The maximum number of attempts made with urllib.request.urlopen.
- timeout
The timeout passed to urllib.request.urlopen.
- sep
The delimiter passed to pd.read_csv.
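The interaction between max_retries and timeout can be illustrated with a minimal sketch. This is not the library's implementation: fetch_with_retries, its opener argument, and its wait argument are hypothetical names introduced here, and the only real call is urllib.request.urlopen, whose timeout is given in seconds and applies to each attempt separately.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retries(url, max_retries=3, timeout=20,
                       opener=urllib.request.urlopen, wait=0.0):
    """Return the response body as bytes, trying up to max_retries times.

    Hypothetical helper for illustration only; `opener` is injectable so the
    retry logic can be exercised without network access.
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            # The timeout (in seconds) bounds each individual attempt.
            with opener(url, timeout=timeout) as response:
                return response.read()
        except OSError as err:
            # urllib.error.URLError and socket timeouts are OSError subclasses.
            last_error = err
            if attempt < max_retries:
                time.sleep(wait)
    raise RuntimeError(
        f"Giving up on {url} after {max_retries} attempts"
    ) from last_error
```

Under these assumptions, the worst-case waiting time is roughly max_retries times timeout, which is worth keeping in mind when raising either value.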
- Returns:
- UserConfig
The configuration of the dataset. If the dataset cannot be configured automatically, None is returned and the reason is printed.
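Because the method may return None instead of raising, callers will usually want to check the result before using it. The following is a hedged usage sketch: configure_uci_dataset and its loader argument are hypothetical names introduced here, while the import path and the from_uci keyword arguments follow the signature documented above.

```python
def configure_uci_dataset(name, loader=None, **kwargs):
    """Call UserConfig.from_uci and fail loudly if no config is returned.

    Hypothetical wrapper for illustration; `loader` defaults to
    UserConfig.from_uci and exists only to make the wrapper testable.
    """
    if loader is None:
        # Imported lazily so the sketch can be read without tabensemb installed.
        from tabensemb.config import UserConfig
        loader = UserConfig.from_uci
    cfg = loader(name, **kwargs)
    if cfg is None:
        # from_uci prints the reason itself; this only stops execution early.
        raise RuntimeError(
            f"Dataset {name!r} could not be configured automatically"
        )
    return cfg
```

With tabensemb installed and network access available, a call such as configure_uci_dataset("Iris", max_retries=5, timeout=30) would forward the keyword arguments to from_uci unchanged.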