apt.utils.datasets package
Submodules
apt.utils.datasets.datasets module
The AI Privacy Toolbox (datasets). Implementation of utility classes for dataset handling
- class apt.utils.datasets.datasets.ArrayDataset(x: ndarray | DataFrame | List | Tensor | csr_matrix, y: ndarray | DataFrame | List | Tensor | csr_matrix | None = None, features_names: list | None = None, **kwargs)
Bases:
Dataset
Dataset that is based on x and y arrays (e.g., numpy/pandas/list…)
- Parameters:
x (numpy array or pandas DataFrame or list or pytorch Tensor) – collection of data samples
y (numpy array or pandas DataFrame or list or pytorch Tensor, optional) – collection of labels
features_names (list of strings, optional) – The feature names, in the order that they appear in the data
- get_labels() ndarray
Get labels
- Returns:
labels as numpy array
- get_predictions() ndarray
Get predictions
- Returns:
predictions as numpy array
- get_samples() ndarray
Get data samples
- Returns:
data samples as numpy array
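The behavior described above can be approximated with a minimal sketch. `MiniArrayDataset` below is a hypothetical stand-in written for illustration, not the actual apt implementation; it only handles lists and numpy arrays, whereas the real class also accepts pandas, pytorch, and scipy inputs:

```python
import numpy as np

class MiniArrayDataset:
    """Illustrative sketch of an ArrayDataset-like container (not the apt class)."""

    def __init__(self, x, y=None, features_names=None):
        # Normalize supported inputs (lists, numpy arrays) to numpy arrays
        self.x = np.asarray(x)
        self.y = np.asarray(y) if y is not None else None
        self.features_names = features_names

    def get_samples(self):
        # Data samples as a numpy array
        return self.x

    def get_labels(self):
        # Labels as a numpy array, or None if none were provided
        return self.y

data = MiniArrayDataset([[1, 2], [3, 4]], y=[0, 1], features_names=["a", "b"])
print(data.get_samples().shape)  # (2, 2)
```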
- class apt.utils.datasets.datasets.Data(train: Dataset | None = None, test: Dataset | None = None, **kwargs)
Bases:
object
Class for storing train and test datasets.
- Parameters:
train (Dataset) – the training set
test (Dataset, optional) – the test set
- get_test_labels() Collection[Any]
Get test set labels
- Returns:
test labels, or None if no test labels provided
- get_test_predictions() Collection[Any]
Get test set predictions
- Returns:
test predictions, or None if no test predictions provided
- get_test_samples() Collection[Any]
Get test set samples
- Returns:
test samples, or None if no test data provided
- get_train_labels() Collection[Any]
Get train set labels
- Returns:
training labels, or None if no training labels provided
- get_train_predictions() Collection[Any]
Get train set predictions
- Returns:
training predictions, or None if no training predictions provided
- get_train_samples() Collection[Any]
Get train set samples
- Returns:
training samples, or None if no training data provided
- class apt.utils.datasets.datasets.Dataset(**kwargs)
Bases:
object
Base Abstract Class for Dataset
- abstract get_labels() Collection[Any]
Return labels
- Returns:
the labels
- abstract get_predictions() ndarray
Get predictions
- Returns:
predictions as numpy array
- abstract get_samples() Collection[Any]
Return data samples
- Returns:
the data samples
- class apt.utils.datasets.datasets.DatasetFactory
Bases:
object
Factory class for dataset creation
- classmethod create_dataset(name: str, **kwargs) Dataset
Factory command to create dataset instance.
This method gets the appropriate Dataset class from the registry and creates an instance of it, passing in the parameters given in kwargs.
- Parameters:
name (string) – The name of the dataset to create.
kwargs (keyword arguments as expected by the class) – dataset parameters
- Returns:
An instance of the dataset that is created.
- classmethod register(name: str) Callable
Class method to register Dataset to the internal registry
- Parameters:
name (string) – dataset name
- Returns:
a Callable that returns the registered dataset class
- registry = {}
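The register/create pattern behind DatasetFactory can be sketched in plain Python. `MiniDatasetFactory` and `ToyDataset` are hypothetical names used only for illustration; the real factory's registry contents and error handling may differ:

```python
from typing import Callable, Dict

class MiniDatasetFactory:
    """Illustrative sketch of a registry-based dataset factory."""

    registry: Dict[str, type] = {}

    @classmethod
    def register(cls, name: str) -> Callable:
        # Decorator that stores the decorated class in the registry under `name`
        def wrapper(dataset_cls: type) -> type:
            cls.registry[name] = dataset_cls
            return dataset_cls
        return wrapper

    @classmethod
    def create_dataset(cls, name: str, **kwargs):
        # Look up the registered class and instantiate it with the given kwargs
        if name not in cls.registry:
            raise ValueError(f"Unknown dataset: {name}")
        return cls.registry[name](**kwargs)

@MiniDatasetFactory.register("toy")
class ToyDataset:
    def __init__(self, size: int = 3):
        self.samples = list(range(size))

ds = MiniDatasetFactory.create_dataset("toy", size=5)
print(len(ds.samples))  # 5
```

The decorator form lets each Dataset subclass register itself at import time, so `create_dataset` can construct any known dataset from its name alone.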
- class apt.utils.datasets.datasets.DatasetWithPredictions(pred: ndarray | DataFrame | List | Tensor | csr_matrix, x: ndarray | DataFrame | List | Tensor | csr_matrix | None = None, y: ndarray | DataFrame | List | Tensor | csr_matrix | None = None, features_names: list | None = None, **kwargs)
Bases:
Dataset
Dataset that is based on arrays (e.g., numpy/pandas/list…). Includes predictions from a model, and possibly also features and true labels.
- Parameters:
pred (numpy array or pandas DataFrame or list or pytorch Tensor) – collection of model predictions
x (numpy array or pandas DataFrame or list or pytorch Tensor, optional) – collection of data samples
y (numpy array or pandas DataFrame or list or pytorch Tensor, optional) – collection of labels
features_names (list of strings, optional) – The feature names, in the order that they appear in the data
- get_labels() ndarray
Get labels
- Returns:
labels as numpy array
- get_predictions() ndarray
Get predictions
- Returns:
predictions as numpy array
- get_samples() ndarray
Get data samples
- Returns:
data samples as numpy array
- class apt.utils.datasets.datasets.PytorchData(x: ndarray | DataFrame | List | Tensor | csr_matrix, y: ndarray | DataFrame | List | Tensor | csr_matrix | None = None, **kwargs)
Bases:
Dataset
Dataset for pytorch models.
- Parameters:
x (numpy array or pandas DataFrame or list or pytorch Tensor) – collection of data samples
y (numpy array or pandas DataFrame or list or pytorch Tensor, optional) – collection of labels
- get_item(idx: int) Tensor
Get the sample and label according to the given index
- Parameters:
idx (int) – the index of the sample to return
- Returns:
the sample and label as pytorch Tensors. Returned as a tuple (sample, label)
- get_labels() ndarray
Get labels.
- Returns:
labels as numpy array
- get_predictions() ndarray
Get predictions
- Returns:
predictions as numpy array
- get_sample_item(idx: int) Tensor
Get the sample according to the given index
- Parameters:
idx (int) – the index of the sample to return
- Returns:
the sample as a pytorch Tensor
- get_samples() ndarray
Get data samples.
- Returns:
samples as numpy array
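The index-based access that get_item and get_sample_item provide can be sketched as follows. `MiniIndexedDataset` is a hypothetical illustration using numpy arrays in place of pytorch Tensors (so it runs without torch); the real PytorchData returns Tensors:

```python
import numpy as np

class MiniIndexedDataset:
    """Illustrative sketch of index-based access as in PytorchData (numpy stands in for torch)."""

    def __init__(self, x, y):
        self.x = np.asarray(x, dtype=np.float32)
        self.y = np.asarray(y)

    def get_sample_item(self, idx):
        # The sample at position idx
        return self.x[idx]

    def get_item(self, idx):
        # (sample, label) tuple, mirroring a torch Dataset's __getitem__
        return self.x[idx], self.y[idx]

ds = MiniIndexedDataset([[0.0, 1.0], [2.0, 3.0]], [0, 1])
sample, label = ds.get_item(1)
print(sample.tolist(), label)  # [2.0, 3.0] 1
```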
- class apt.utils.datasets.datasets.StoredDataset(**kwargs)
Bases:
Dataset
Abstract Class for a Dataset that can be downloaded from a URL and stored in a file
- static download(url: str, dest_path: str, filename: str, unzip: bool | None = False) None
Download the dataset from URL
- Parameters:
url (string) – dataset URL, the dataset will be requested from this URL
dest_path (string) – local dataset destination path
filename (string) – local dataset filename
unzip (boolean, optional) – flag indicating whether or not to perform extraction. Default is False.
- Returns:
None
- static extract_archive(zip_path: str, dest_path: str | None = None, remove_archive: bool | None = False)
Extract dataset from archived file
- Parameters:
zip_path (string) – path to archived file
dest_path (string, optional) – directory path to extract the file to
remove_archive (boolean, optional) – whether to remove the archive file after extraction. Default is False.
- Returns:
None
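A helper of this shape can be sketched with the standard library. `extract_zip` is a hypothetical zip-only illustration; the actual extract_archive may support additional archive formats:

```python
import os
import zipfile

def extract_zip(zip_path: str, dest_path: str = None,
                remove_archive: bool = False) -> None:
    """Illustrative sketch of an extract_archive-style helper (zip only)."""
    if dest_path is None:
        # Default to the directory containing the archive
        dest_path = os.path.dirname(zip_path) or "."
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_path)
    if remove_archive:
        # Optionally delete the archive once its contents are extracted
        os.remove(zip_path)
```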
- abstract load(**kwargs)
Load dataset
- Returns:
None
- abstract load_from_file(path: str)
Load dataset from file
- Parameters:
path (string) – the path to the file
- Returns:
None
- static split_debug(datafile: str, dest_datafile: str, ratio: int, shuffle: bool | None = True, delimiter: str | None = ',', fmt: str | list | None = None) None
Split the data and take only a part of it
- Parameters:
datafile (string) – dataset file path
dest_datafile (string) – destination path for the partial dataset file
ratio (int) – the proportion of the dataset to keep in the partial dataset file
shuffle (boolean, optional) – whether to shuffle the data or not. Default is True.
delimiter (string, optional) – dataset delimiter. Default is “,”
fmt (string or sequence of strings, optional) – format for the correct data saving. As defined by numpy.savetxt(). Default is None.
- Returns:
None
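The idea behind split_debug, keeping only a shuffled fraction of a data file for quick debugging, can be sketched with the standard library. `split_debug_lines` is a hypothetical line-based illustration; the real method works with a delimiter and numpy.savetxt-style fmt handling:

```python
import random

def split_debug_lines(datafile: str, dest_datafile: str, ratio: float,
                      shuffle: bool = True, seed: int = 0) -> None:
    """Illustrative sketch: save only a fraction of a data file's rows."""
    with open(datafile) as f:
        lines = f.readlines()
    if shuffle:
        # Fixed seed so the debug subset is reproducible
        random.Random(seed).shuffle(lines)
    # Keep at least one row even for very small ratios
    keep = max(1, int(len(lines) * ratio))
    with open(dest_datafile, "w") as f:
        f.writelines(lines[:keep])
```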
- apt.utils.datasets.datasets.array2numpy(arr: ndarray | DataFrame | List | Tensor | csr_matrix) ndarray
Converts from INPUT_DATA_ARRAY_TYPE to a numpy array
- apt.utils.datasets.datasets.array2torch_tensor(arr: ndarray | DataFrame | List | Tensor | csr_matrix) Tensor
Converts from INPUT_DATA_ARRAY_TYPE to a pytorch Tensor
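A conversion of the array2numpy kind can be sketched with duck typing, dispatching on the methods each supported type exposes. `to_numpy` is a hypothetical illustration and not the apt implementation (which may dispatch on concrete types instead):

```python
import numpy as np

def to_numpy(arr) -> np.ndarray:
    """Illustrative sketch of a to-numpy conversion over several array types."""
    if isinstance(arr, np.ndarray):
        return arr
    if hasattr(arr, "toarray"):
        # scipy sparse matrices densify via toarray()
        return arr.toarray()
    if hasattr(arr, "to_numpy"):
        # pandas DataFrame/Series expose to_numpy()
        return arr.to_numpy()
    if hasattr(arr, "detach"):
        # pytorch Tensors must be detached and moved to CPU first
        return arr.detach().cpu().numpy()
    # Plain lists and other array-likes
    return np.asarray(arr)

print(to_numpy([[1, 2], [3, 4]]).shape)  # (2, 2)
```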
Module contents
The AI Privacy Toolbox (datasets). Implementation of datasets utility components for datasets creation, load, and store