apt.utils package

Subpackages

Submodules

apt.utils.dataset_utils module

apt.utils.dataset_utils.get_adult_dataset_pd()

Loads the UCI Adult dataset from datasets/adult or downloads it from https://archive.ics.uci.edu/ml/machine-learning-databases/adult/ if necessary.

Returns:

Dataset and labels as pandas dataframes, returned as a nested tuple ((x_train, y_train), (x_test, y_test)).
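The nested return shape is easy to unpack incorrectly, so a minimal sketch of unpacking and sanity-checking it may help. `summarize_splits` is a hypothetical helper, not part of apt.utils, and it is exercised here on small stand-in lists rather than on the real dataset.

```python
def summarize_splits(splits):
    """Unpack ((x_train, y_train), (x_test, y_test)) and report split sizes.

    Hypothetical helper for illustration only; not part of apt.utils.
    """
    (x_train, y_train), (x_test, y_test) = splits
    # Features and labels of each split must line up row for row.
    if len(x_train) != len(y_train) or len(x_test) != len(y_test):
        raise ValueError("features and labels differ in length")
    total = len(x_train) + len(x_test)
    return {
        "train_rows": len(x_train),
        "test_rows": len(x_test),
        "test_fraction": len(x_test) / total if total else 0.0,
    }


# Stand-in data (made up) with the same nesting as the documented return value.
stand_in = (
    ([[39, "Private"], [50, "Self-emp"], [38, "Private"]], [0, 0, 1]),
    ([[28, "Private"]], [1]),
)
summary = summarize_splits(stand_in)
```

Because `len()` also works on pandas dataframes, the same helper could in principle be called as `summarize_splits(get_adult_dataset_pd())` once the package is installed.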

apt.utils.dataset_utils.get_diabetes_dataset_np(test_set: float = 0.3)

Loads the Diabetes dataset from scikit-learn.

Parameters:

test_set (float) – Proportion of the data to hold out as the test split (x_test, y_test). The value should be between 0 and 1. Default is 0.3.

Returns:

Dataset and labels as numpy arrays, returned as a nested tuple ((x_train, y_train), (x_test, y_test)).
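To make the meaning of test_set concrete, here is a minimal standard-library sketch of a proportional split. It is not the library's implementation: the function name `split_by_proportion`, the `seed` parameter, the shuffling, and the rounding rule are all assumptions made for illustration.

```python
import random


def split_by_proportion(x, y, test_set=0.3, seed=0):
    """Split parallel sequences into train and test parts.

    Illustrative only; the shuffling and rounding rule are assumptions,
    not the documented behaviour of apt.utils.dataset_utils.
    """
    if not 0 < test_set < 1:
        raise ValueError("test_set must be between 0 and 1")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable

    n_test = round(len(x) * test_set)  # assumed rounding rule
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(x, train_idx), pick(y, train_idx)), (pick(x, test_idx), pick(y, test_idx))


features = list(range(10))
labels = [v * 2 for v in features]
(x_train, y_train), (x_test, y_test) = split_by_proportion(features, labels, test_set=0.3)
```

With ten rows and test_set=0.3, this sketch holds out three rows for testing and keeps seven for training, and each label stays paired with its feature row.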

apt.utils.dataset_utils.get_german_credit_dataset_pd(test_set: float = 0.3)

Loads the UCI German credit dataset from datasets/german or downloads it from https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/ if necessary.

Parameters:

test_set (float) – Proportion of the data to hold out as the test split (x_test, y_test). The value should be between 0 and 1. Default is 0.3.

Returns:

Dataset and labels as pandas dataframes, returned as a nested tuple ((x_train, y_train), (x_test, y_test)).

apt.utils.dataset_utils.get_iris_dataset_np(test_set: float = 0.3)

Loads the Iris dataset from scikit-learn.

Parameters:

test_set (float) – Proportion of the data to hold out as the test split (x_test, y_test). The value should be between 0 and 1. Default is 0.3.

Returns:

Dataset and labels as numpy arrays, returned as a nested tuple ((x_train, y_train), (x_test, y_test)).

apt.utils.dataset_utils.get_nursery_dataset_pd(raw: bool = True, test_set: float = 0.2, transform_social: bool = False)

Loads the UCI Nursery dataset from datasets/nursery or downloads it from https://archive.ics.uci.edu/ml/machine-learning-databases/nursery/ if necessary.

Parameters:
  • raw (bool) – If True, no preprocessing is applied to the data. If False, categorical features are one-hot encoded and the data is scaled using sklearn’s StandardScaler. Default is True.

  • test_set (float) – Proportion of the data to hold out as the test split (x_test, y_test). The value should be between 0 and 1. Default is 0.2.

  • transform_social (bool) – If True, converts the social feature to a binary feature for the purpose of attribute inference: the original value ‘problematic’ is mapped to 1 and all other original values are mapped to 0. Default is False.

Returns:

Dataset and labels as pandas dataframes, returned as a nested tuple ((x_train, y_train), (x_test, y_test)).
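The mapping described for transform_social can be sketched in a few lines. This is only an illustration of the documented rule ('problematic' becomes 1, everything else becomes 0). `binarize_social` is a hypothetical name, and the other category names in the sample are assumed from the UCI dataset description rather than taken from this package.

```python
def binarize_social(values, positive="problematic"):
    """Map the 'social' feature to 1 for `positive` and 0 for any other value.

    Hypothetical helper mirroring the documented transform_social rule.
    """
    return [1 if value == positive else 0 for value in values]


# Sample values; 'nonprob' and 'slightly_prob' are assumed category names.
social = ["nonprob", "slightly_prob", "problematic", "nonprob"]
binary = binarize_social(social)
```

Reducing the feature to two classes turns attribute inference on social into a binary prediction task, which appears to be the purpose stated in the parameter description.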

Module contents