5.1.1.4. FedEval.dataset

5.1.1.4.1. Submodules

5.1.1.4.2. Package Contents

5.1.1.4.2.1. Classes

FedData

By default, FedData produces datasets for horizontal federated learning

mnist

By default, FedData produces datasets for horizontal federated learning

cifar10

By default, FedData produces datasets for horizontal federated learning

cifar100

By default, FedData produces datasets for horizontal federated learning

ConfigurationManager

The base class of singletons.

FedVerticalMatrix

By default, FedData produces datasets for horizontal federated learning

wine

By default, FedData produces datasets for horizontal federated learning

mnist_matrix

By default, FedData produces datasets for horizontal federated learning

synthetic_matrix_horizontal

By default, FedData produces datasets for horizontal federated learning

synthetic_matrix_vertical

By default, FedData produces datasets for horizontal federated learning

synthetic_matrix_horizontal_memmap

By default, FedData produces datasets for horizontal federated learning

ml25m_matrix

By default, FedData produces datasets for horizontal federated learning

vertical_linear_regression

By default, FedData produces datasets for horizontal federated learning

vertical_linear_regression_memmap

By default, FedData produces datasets for horizontal federated learning

ml25m_matrix_memmap

By default, FedData produces datasets for horizontal federated learning

ml100k_lr

By default, FedData produces datasets for horizontal federated learning

sentiment140

By default, FedData produces datasets for horizontal federated learning

shakespeare

By default, FedData produces datasets for horizontal federated learning

celeba

CelebA Median: Top 1000 clients

femnist

FEMnistLarge, 3500 Clients

5.1.1.4.2.2. Functions

shuffle(X, Y)

Input (X, Y) pairs, shuffle and return it.

load_synthetic(m, n, alpha)

load_synthetic_large_scale(m, n, alpha)

normalize_text(text)

Final cleanup of text by removing non-alpha characters like ‘\n’, ‘\t’… and non-latin characters + stripping.

hashtags_preprocess(x)

Creating a hashtag token and processing the formatting of hashtags, i.e. separate uppercase words

allcaps_preprocess(x)

If text/word written in uppercase, change to lowercase and tag with <allcaps>.

glove_preprocess(text)

To be consistent with use of GloVe vectors, we replicate most of their preprocessing.

tweet2Vec(tweet, word2vectors)

Takes in a processed tweet, tokenizes it, converts to GloVe embeddings

get_data_shape(dataset_name)

class FedEval.dataset.FedData

By default, FedData produces datasets for horizontal federated learning

property need_regenerate: bool
_load_and_process_data()
abstract load_data()
iid_data(save_file=True)
_save_dataset_files(dataset: List[Mapping[str, List[numpy.ndarray]]]) → None
non_iid_data(save_file=True, called_in_iid=False) → List[Mapping[str, List[numpy.ndarray]]]
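
In horizontal federated learning, clients share the same feature space but hold different samples. The sketch below illustrates only that idea; it is not FedEval's implementation. The function name `split_horizontally`, the round-robin assignment, and the `"x"`/`"y"` dictionary keys are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def split_horizontally(
    x: Sequence, y: Sequence, num_clients: int
) -> List[Dict[str, list]]:
    """Give each client a disjoint set of samples (rows); every client keeps all features."""
    if num_clients <= 0:
        raise ValueError("num_clients must be positive")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    clients = []
    for c in range(num_clients):
        # Round-robin assignment (an assumption): sample i goes to client i % num_clients.
        idx = range(c, len(x), num_clients)
        clients.append({"x": [x[i] for i in idx], "y": [y[i] for i in idx]})
    return clients


features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
labels = [0, 1, 0, 1]
parts = split_horizontally(features, labels, num_clients=2)
```

Each entry of `parts` holds whole rows, so the number of features per sample is unchanged; a vertical split would instead divide the columns among clients.
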
FedEval.dataset.shuffle(X, Y)

Input (X, Y) pairs, shuffle and return it.
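
A minimal sketch of shuffling paired data, assuming `X` and `Y` are equal-length sequences. The name `shuffle_pairs` and the `seed` parameter are hypothetical and not part of FedEval; the point is only that one shared permutation keeps each input aligned with its label.

```python
import random
from typing import Optional, Sequence, Tuple


def shuffle_pairs(
    x: Sequence, y: Sequence, seed: Optional[int] = None
) -> Tuple[list, list]:
    """Shuffle x and y with the same permutation so pairs stay aligned."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    order = list(range(len(x)))
    # A single permutation is applied to both sequences.
    random.Random(seed).shuffle(order)
    return [x[i] for i in order], [y[i] for i in order]


xs, ys = shuffle_pairs([1, 2, 3, 4], [10, 20, 30, 40], seed=0)
```
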

class FedEval.dataset.mnist

Bases: FedEval.dataset.FedDataBase.FedData

By default, FedData produces datasets for horizontal federated learning

load_data()
class FedEval.dataset.cifar10

Bases: FedEval.dataset.FedDataBase.FedData

By default, FedData produces datasets for horizontal federated learning

load_data()
class FedEval.dataset.cifar100

Bases: FedEval.dataset.FedDataBase.FedData

By default, FedData produces datasets for horizontal federated learning

load_data()
class FedEval.dataset.ConfigurationManager(data_config: RawConfigurationDict = _DEFAULT_D_CFG, model_config: RawConfigurationDict = _DEFAULT_MDL_CFG, runtime_config: RawConfigurationDict = _DEFAULT_RT_CFG, thread_safe: bool = False)

Bases: FedEval.config.singleton.Singleton, ConfigurationManagerInterface, ClientConfigurationManagerInterface, ServerConfigurationManagerInterface, _CfgYamlInterface, _CfgJsonInterface, _CfgFileInterface, _RoledConfigurationInterface

The base class of singletons. Each class in the inheritance tree can own only one instance.
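
The "one instance per class in the inheritance tree" behaviour can be sketched as follows. This is a hypothetical stand-in, not the code of `FedEval.config.singleton.Singleton`; the class names `ServerSettings` and `ClientSettings` exist only for this example.

```python
import threading


class Singleton:
    """Illustrative base class: each subclass keeps at most one instance of itself."""

    _instances = {}
    _lock = threading.Lock()

    def __new__(cls, *args, **kwargs):
        # Instances are keyed by the concrete class, so siblings do not share one.
        with Singleton._lock:
            if cls not in Singleton._instances:
                Singleton._instances[cls] = super().__new__(cls)
        return Singleton._instances[cls]


class ServerSettings(Singleton):
    pass


class ClientSettings(Singleton):
    pass


first = ServerSettings()
second = ServerSettings()
other = ClientSettings()
```

Note that in this sketch `__init__` would still run on every call; a real implementation that takes constructor arguments typically guards initialization so it happens only once.
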

property data_unique_id
property config_unique_id
property data_dir_name: str

The output directory of the clients’ data.

Returns:

the name of the data directory.

Return type:

str

property log_dir_path: str

the path of the base of log directory.

property history_record_path: str

the path of the history record.

property job_id: str
property encoding: str

the encoding scheme during (de)serialization.

property data_config_filename: str
property model_config_filename: str
property runtime_config_filename: str
property data_config: _DataConfig
property model_config: _ModelConfig
property runtime_config: _RuntimeConfig
property num_of_train_clients_contacted_per_round: int

the number of clients selected to participate in the main federated process in each round.

property num_of_eval_clients_contacted_per_round: int

the number of clients selected to participate in evaluation in each round.

property role: FedEval.config.role.Role

return the role of this runtime entity.

Raises:

AttributeError – called without role configured.

Returns:

the role of this runtime entity.

Return type:

Role

__init_once_lock
__initiated = False
_init_file_names(data_config_filename: str = DEFAULT_D_CFG_FILENAME_YAML, model_config_filename: str = DEFAULT_MDL_CFG_FILENAME_YAML, runtime_config_filename: str = DEFAULT_RT_CFG_FILENAME_YAML) → None
classmethod generate_unique_id(data_config: dict, model_config: dict, runtime_config: dict)
static _get_md5(config_string)
static load_configs(src_path, serializer: str | _CfgSerializer = _CfgSerializer.YAML, data_config_filename: str = DEFAULT_D_CFG_FILENAME_YAML, model_config_filename: str = DEFAULT_MDL_CFG_FILENAME_YAML, runtime_config_filename: str = DEFAULT_RT_CFG_FILENAME_YAML, encoding=_DEFAULT_ENCODING) → Tuple[RawConfigurationDict, RawConfigurationDict, RawConfigurationDict]
static save_configs(data_cfg: RawConfigurationDict, model_cfg: RawConfigurationDict, runtime_cfg: RawConfigurationDict, dst_path, data_config_filename: str = DEFAULT_D_CFG_FILENAME_YAML, model_config_filename: str = DEFAULT_MDL_CFG_FILENAME_YAML, runtime_config_filename: str = DEFAULT_RT_CFG_FILENAME_YAML, encoding=_DEFAULT_ENCODING, serializer: str | _CfgSerializer = _CfgSerializer.YAML)
static from_files(src_path: str, data_config_filename: str = DEFAULT_D_CFG_FILENAME_YAML, model_config_filename: str = DEFAULT_MDL_CFG_FILENAME_YAML, runtime_config_filename: str = DEFAULT_RT_CFG_FILENAME_YAML, serializer: str | _CfgSerializer = _CfgSerializer.YAML, encoding=_DEFAULT_ENCODING)
to_files(dst_dir_path: str, serializer: str | _CfgSerializer = _CfgSerializer.YAML, encoding: str | None = None) → None
__init_role() → None
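
The signatures `generate_unique_id(data_config, model_config, runtime_config)` and `_get_md5(config_string)` suggest that an identifier is derived by hashing the configuration. How FedEval serializes the dictionaries before hashing is not documented here, so the sketch below is an assumption: it uses JSON with sorted keys, and the names `config_md5` and `unique_id` are hypothetical.

```python
import hashlib
import json


def config_md5(config: dict) -> str:
    """Return the MD5 hex digest of a canonical JSON rendering of config."""
    # sort_keys makes the digest independent of dictionary insertion order.
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def unique_id(data_config: dict, model_config: dict, runtime_config: dict) -> str:
    """Combine the three configurations and hash them into one identifier."""
    combined = {"data": data_config, "model": model_config, "runtime": runtime_config}
    return config_md5(combined)


id_a = unique_id({"dataset": "mnist", "clients": 10}, {"lr": 0.1}, {})
id_b = unique_id({"clients": 10, "dataset": "mnist"}, {"lr": 0.1}, {})
id_c = unique_id({"dataset": "mnist", "clients": 20}, {"lr": 0.1}, {})
```

With this scheme, two runs with the same settings map to the same identifier regardless of key order, while any changed value yields a different one.
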
class FedEval.dataset.FedVerticalMatrix

Bases: FedEval.dataset.FedDataBase.FedData, abc.ABC

By default, FedData produces datasets for horizontal federated learning

non_iid_data(*args)
iid_data(save_file=True)
class FedEval.dataset.wine

Bases: FedVerticalMatrix

By default, FedData produces datasets for horizontal federated learning

load_data()
class FedEval.dataset.mnist_matrix

Bases: FedVerticalMatrix

By default, FedData produces datasets for horizontal federated learning

load_data()
FedEval.dataset.load_synthetic(m, n, alpha)
FedEval.dataset.load_synthetic_large_scale(m, n, alpha)
class FedEval.dataset.synthetic_matrix_horizontal

Bases: FedEval.dataset.FedDataBase.FedData

By default, FedData produces datasets for horizontal federated learning

load_data()
class FedEval.dataset.synthetic_matrix_vertical

Bases: FedEval.dataset.FedDataBase.FedData

By default, FedData produces datasets for horizontal federated learning

load_data()
class FedEval.dataset.synthetic_matrix_horizontal_memmap

Bases: FedEval.dataset.FedDataBase.FedData

By default, FedData produces datasets for horizontal federated learning

load_data()
iid_data(save_file=True)
class FedEval.dataset.ml25m_matrix

Bases: FedEval.dataset.FedDataBase.FedData

By default, FedData produces datasets for horizontal federated learning

load_data()
class FedEval.dataset.vertical_linear_regression

Bases: FedVerticalMatrix

By default, FedData produces datasets for horizontal federated learning

load_data()
class FedEval.dataset.vertical_linear_regression_memmap

Bases: FedVerticalMatrix

By default, FedData produces datasets for horizontal federated learning

load_data()
iid_data(save_file=True)
class FedEval.dataset.ml25m_matrix_memmap

Bases: vertical_linear_regression_memmap

By default, FedData produces datasets for horizontal federated learning

load_data()
class FedEval.dataset.ml100k_lr

Bases: FedVerticalMatrix

By default, FedData produces datasets for horizontal federated learning

load_data()
FedEval.dataset.normalize_text(text)

Final cleanup of text by removing non-alpha characters like ‘\n’, ‘\t’… and non-latin characters + stripping.

inputs:
  • text (str): tweet to be processed

return:
  • text (str): preprocessed tweet
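
A rough sketch of this kind of cleanup, assuming it means collapsing whitespace control characters (such as newlines and tabs), dropping characters outside basic ASCII as a crude proxy for "non-latin", and stripping the ends. The function name `normalize_text_sketch` is hypothetical and the exact rules FedEval applies may differ.

```python
import re


def normalize_text_sketch(text: str) -> str:
    """Drop non-ASCII characters, collapse whitespace runs, and strip the ends."""
    # Assumption: "non-latin" is approximated by "not encodable as ASCII".
    text = text.encode("ascii", errors="ignore").decode("ascii")
    # Newlines, tabs, and repeated spaces all become a single space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


cleaned = normalize_text_sketch("  hello\n\tworld 你好 ")
```
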

FedEval.dataset.hashtags_preprocess(x)

Creating a hashtag token and processing the formatting of hashtags, i.e. separate uppercase words if possible, all letters to lowercase.

inputs:
  • x (regex group): x.group(1) contains the text associated with a hashtag

Returns:

preprocessed text

Return type:

  • text (str)
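
Because `x` is a regex match object, a function like this is normally passed as the replacement callback to `re.sub`. The sketch below is an illustration under assumptions: the `<hashtag>` token spelling, the pattern `#(\S+)`, and the name `hashtag_sketch` are not taken from FedEval.

```python
import re


def hashtag_sketch(match: "re.Match") -> str:
    """Turn '#SomeTag' into '<hashtag> some tag'."""
    body = match.group(1)
    if not body.isupper():
        # Insert a space before every uppercase letter except the first character,
        # so "MachineLearning" becomes "Machine Learning".
        body = re.sub(r"(?<!^)(?=[A-Z])", " ", body)
    return "<hashtag> " + body.lower()


text = re.sub(r"#(\S+)", hashtag_sketch, "loving #MachineLearning and #NLP")
```

Hashtags written entirely in uppercase are left unsplit here, since splitting them would separate every letter.
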

FedEval.dataset.allcaps_preprocess(x)

If text/word written in uppercase, change to lowercase and tag with <allcaps>.

inputs:
  • x (regex group): x.group() contains the text

Returns:

preprocessed text

Return type:

  • text (str)
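
The same callback pattern applies here. In this sketch the pattern `\b[A-Z]{2,}\b` (words of two or more uppercase ASCII letters) and the name `allcaps_sketch` are assumptions; only the lowercase-then-tag behaviour comes from the description above.

```python
import re


def allcaps_sketch(match: "re.Match") -> str:
    """Lowercase the matched word and append the <allcaps> tag."""
    return match.group().lower() + " <allcaps>"


tagged = re.sub(r"\b[A-Z]{2,}\b", allcaps_sketch, "this is GREAT news")
```

Requiring at least two letters keeps single capitals such as "I" from being tagged, which is a design choice of this sketch rather than documented behaviour.
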

FedEval.dataset.glove_preprocess(text)

To be consistent with use of GloVe vectors, we replicate most of their preprocessing. Therefore the word distribution should be close to the one used to train the embeddings. Adapted from https://nlp.stanford.edu/projects/glove/preprocess-twitter.rb

inputs:
  • text (str): tweet to be processed

Returns:

preprocessed tweet

Return type:

  • text (str)

FedEval.dataset.tweet2Vec(tweet, word2vectors)

Takes in a processed tweet, tokenizes it, converts to GloVe embeddings (or zeroes if words are unknown) and applies average pool to obtain one vector for that tweet.

inputs:
  • tweet (str): one raw tweet from the dataset

  • word2vectors (dict): GloVe words mapped to GloVe vectors

Returns:

resulting sentence vector (shape: (200,))

Return type:

  • embeddings (np.array)
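
The average-pooling step can be sketched without NumPy as follows. Whitespace tokenization, the `dim` parameter, and the name `average_embedding` are assumptions for illustration; the described function returns a NumPy array of shape (200,), whereas this sketch returns a plain list.

```python
from typing import Dict, List


def average_embedding(
    tweet: str, word2vectors: Dict[str, List[float]], dim: int = 200
) -> List[float]:
    """Average the vectors of all tokens, using zeros for unknown words."""
    tokens = tweet.split()  # assumption: simple whitespace tokenization
    if not tokens:
        return [0.0] * dim
    total = [0.0] * dim
    for token in tokens:
        # Unknown words contribute a zero vector but still count in the average.
        vector = word2vectors.get(token, [0.0] * dim)
        for i, value in enumerate(vector):
            total[i] += value
    return [value / len(tokens) for value in total]


toy_vectors = {"good": [1.0, 3.0], "day": [3.0, 1.0]}
vec = average_embedding("good day unknownword", toy_vectors, dim=2)
```
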

class FedEval.dataset.sentiment140

Bases: FedEval.dataset.FedDataBase.FedData

By default, FedData produces datasets for horizontal federated learning

load_data()
class FedEval.dataset.shakespeare

Bases: FedEval.dataset.FedDataBase.FedData

By default, FedData produces datasets for horizontal federated learning

load_data()
class FedEval.dataset.celeba

Bases: FedEval.dataset.FedDataBase.FedData

CelebA Median: Top 1000 clients

load_data()
class FedEval.dataset.femnist

Bases: FedEval.dataset.FedDataBase.FedData

FEMnistLarge, 3500 Clients
FEMnistMedian, 1000 Clients
FEMnistSmall, 100 Clients

load_data()
FedEval.dataset.get_data_shape(dataset_name: str)