5.1.1.4. FedEval.dataset
5.1.1.4.1. Submodules
5.1.1.4.2. Package Contents
5.1.1.4.2.1. Classes
FedData | By default, FedData produces datasets for horizontal federated learning
mnist | By default, FedData produces datasets for horizontal federated learning
cifar10 | By default, FedData produces datasets for horizontal federated learning
cifar100 | By default, FedData produces datasets for horizontal federated learning
ConfigurationManager | the base class of singletons.
FedVerticalMatrix | By default, FedData produces datasets for horizontal federated learning
wine | By default, FedData produces datasets for horizontal federated learning
mnist_matrix | By default, FedData produces datasets for horizontal federated learning
synthetic_matrix_horizontal | By default, FedData produces datasets for horizontal federated learning
synthetic_matrix_vertical | By default, FedData produces datasets for horizontal federated learning
synthetic_matrix_horizontal_memmap | By default, FedData produces datasets for horizontal federated learning
ml25m_matrix | By default, FedData produces datasets for horizontal federated learning
vertical_linear_regression | By default, FedData produces datasets for horizontal federated learning
vertical_linear_regression_memmap | By default, FedData produces datasets for horizontal federated learning
ml25m_matrix_memmap | By default, FedData produces datasets for horizontal federated learning
ml100k_lr | By default, FedData produces datasets for horizontal federated learning
sentiment140 | By default, FedData produces datasets for horizontal federated learning
shakespeare | By default, FedData produces datasets for horizontal federated learning
celeba | CelebA Median: Top 1000 clients
femnist | FEMnistLarge, 3500 Clients
5.1.1.4.2.2. Functions
shuffle(X, Y) | Shuffle the input (X, Y) pairs and return them.
load_synthetic(m, n, alpha) |
load_synthetic_large_scale(m, n, alpha) |
normalize_text(text) | Final cleanup of text by removing non-alpha and non-Latin characters, then stripping.
hashtags_preprocess(x) | Creating a hashtag token and processing the formatting of hashtags, i.e. separate uppercase words
allcaps_preprocess(x) | If text/word written in uppercase, change to lowercase and tag with <allcaps>.
glove_preprocess(text) | To be consistent with use of GloVe vectors, we replicate most of their preprocessing.
tweet2Vec(tweet, word2vectors) | Takes in a processed tweet, tokenizes it, converts to GloVe embeddings
get_data_shape(dataset_name) |
- class FedEval.dataset.FedData
By default, FedData produces datasets for horizontal federated learning
- property need_regenerate: bool
- _load_and_process_data()
- abstract load_data()
- iid_data(save_file=True)
- _save_dataset_files(dataset: List[Mapping[str, List[numpy.ndarray]]]) -> None
- non_iid_data(save_file=True, called_in_iid=False) -> List[Mapping[str, List[numpy.ndarray]]]
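The listing above gives only the signatures of iid_data and non_iid_data, not their partitioning logic. As a rough illustration of what an IID horizontal split into the documented return shape (a list of mappings from strings to lists of arrays) could look like, here is a minimal sketch. The function name iid_partition, the "data" key, and the seed parameter are hypothetical and are not taken from FedEval.

```python
from typing import Dict, List

import numpy as np


def iid_partition(x: np.ndarray, y: np.ndarray, num_clients: int,
                  seed: int = 0) -> List[Dict[str, List[np.ndarray]]]:
    """Split (x, y) into num_clients roughly equal, randomly drawn shards."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))            # one permutation shared by x and y
    shards = np.array_split(order, num_clients)
    # "data" is a hypothetical key; FedEval's real keys are not shown in this listing.
    return [{"data": [x[idx], y[idx]]} for idx in shards]


x = np.arange(20).reshape(10, 2)   # 10 samples, 2 features
y = np.arange(10)                  # label i belongs to sample i
clients = iid_partition(x, y, num_clients=3)
```

Every client receives whole samples (all features of a row), which is what makes the split horizontal. A non-IID variant would typically replace the uniform permutation with a label-skewed assignment.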
- FedEval.dataset.shuffle(X, Y)
Shuffle the input (X, Y) pairs and return them.
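The key property of shuffling paired data is that features and labels must be reordered by the same permutation. The sketch below shows one way to do this with NumPy; shuffle_pairs and its seed parameter are hypothetical names, and this is not FedEval's implementation.

```python
import numpy as np


def shuffle_pairs(X, Y, seed=None):
    """Return X and Y reordered by the same random permutation."""
    X, Y = np.asarray(X), np.asarray(Y)
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same number of samples")
    order = np.random.default_rng(seed).permutation(len(X))
    return X[order], Y[order]


X = np.arange(10).reshape(5, 2)   # row i is [2*i, 2*i + 1]
Y = np.arange(5)                  # label i belongs to row i
Xs, Ys = shuffle_pairs(X, Y, seed=42)
```

After the call, each row of Xs still lines up with its original label in Ys.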
- class FedEval.dataset.mnist
Bases: FedEval.dataset.FedDataBase.FedData
By default, FedData produces datasets for horizontal federated learning
- load_data()
- class FedEval.dataset.cifar10
Bases: FedEval.dataset.FedDataBase.FedData
By default, FedData produces datasets for horizontal federated learning
- load_data()
- class FedEval.dataset.cifar100
Bases: FedEval.dataset.FedDataBase.FedData
By default, FedData produces datasets for horizontal federated learning
- load_data()
- class FedEval.dataset.ConfigurationManager(data_config: RawConfigurationDict = _DEFAULT_D_CFG, model_config: RawConfigurationDict = _DEFAULT_MDL_CFG, runtime_config: RawConfigurationDict = _DEFAULT_RT_CFG, thread_safe: bool = False)
Bases: FedEval.config.singleton.Singleton, ConfigurationManagerInterface, ClientConfigurationManagerInterface, ServerConfigurationManagerInterface, _CfgYamlInterface, _CfgJsonInterface, _CfgFileInterface, _RoledConfigurationInterface
the base class of singletons. Each cls on the inheritance tree can own only one instance.
- property data_unique_id
- property config_unique_id
- property data_dir_name: str
The output directory of the clients’ data.
- Returns:
the name of the data directory.
- Return type:
str
- property log_dir_path: str
the path of the base log directory.
- property history_record_path: str
the path of the history record.
- property job_id: str
- property encoding: str
the encoding scheme during (de)serialization.
- property data_config_filename: str
- property model_config_filename: str
- property runtime_config_filename: str
- property data_config: _DataConfig
- property model_config: _ModelConfig
- property runtime_config: _RuntimeConfig
- property num_of_train_clients_contacted_per_round: int
the number of clients selected to participate in the main federated process in each round.
- property num_of_eval_clients_contacted_per_round: int
the number of clients selected to participate in evaluation in each round.
- property role: FedEval.config.role.Role
return the role of this runtime entity.
- Raises:
AttributeError – called without role configured.
- Returns:
the role of this runtime entity.
- Return type:
FedEval.config.role.Role
- __init_once_lock
- __initiated = False
- _init_file_names(data_config_filename: str = DEFAULT_D_CFG_FILENAME_YAML, model_config_filename: str = DEFAULT_MDL_CFG_FILENAME_YAML, runtime_config_filename: str = DEFAULT_RT_CFG_FILENAME_YAML) -> None
- classmethod generate_unique_id(data_config: dict, model_config: dict, runtime_config: dict)
- static _get_md5(config_string)
- static load_configs(src_path, serializer: str | _CfgSerializer = _CfgSerializer.YAML, data_config_filename: str = DEFAULT_D_CFG_FILENAME_YAML, model_config_filename: str = DEFAULT_MDL_CFG_FILENAME_YAML, runtime_config_filename: str = DEFAULT_RT_CFG_FILENAME_YAML, encoding=_DEFAULT_ENCODING) -> Tuple[RawConfigurationDict, RawConfigurationDict, RawConfigurationDict]
- static save_configs(data_cfg: RawConfigurationDict, model_cfg: RawConfigurationDict, runtime_cfg: RawConfigurationDict, dst_path, data_config_filename: str = DEFAULT_D_CFG_FILENAME_YAML, model_config_filename: str = DEFAULT_MDL_CFG_FILENAME_YAML, runtime_config_filename: str = DEFAULT_RT_CFG_FILENAME_YAML, encoding=_DEFAULT_ENCODING, serializer: str | _CfgSerializer = _CfgSerializer.YAML)
- static from_files(src_path: str, data_config_filename: str = DEFAULT_D_CFG_FILENAME_YAML, model_config_filename: str = DEFAULT_MDL_CFG_FILENAME_YAML, runtime_config_filename: str = DEFAULT_RT_CFG_FILENAME_YAML, serializer: str | _CfgSerializer = _CfgSerializer.YAML, encoding=_DEFAULT_ENCODING)
- to_files(dst_dir_path: str, serializer: str | _CfgSerializer = _CfgSerializer.YAML, encoding: str | None = None) -> None
- __init_role() -> None
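The class description states that each class on the inheritance tree can own only one instance. One common way to get that behaviour in Python is to cache instances per class in __new__, as in the sketch below. This is an illustration only, not the code of FedEval.config.singleton.Singleton; SingletonSketch, DataSettings, and ModelSettings are hypothetical names.

```python
import threading


class SingletonSketch:
    """Hypothetical stand-in: each class in the hierarchy gets at most one instance."""

    _instances = {}
    _lock = threading.Lock()

    def __new__(cls, *args, **kwargs):
        # Key the cache by cls so a subclass never reuses its parent's instance.
        with SingletonSketch._lock:
            if cls not in SingletonSketch._instances:
                SingletonSketch._instances[cls] = super().__new__(cls)
        return SingletonSketch._instances[cls]


class DataSettings(SingletonSketch):
    pass


class ModelSettings(SingletonSketch):
    pass


first = DataSettings()
second = DataSettings()
other = ModelSettings()
```

Here first and second refer to the same object, while other is a separate instance owned by ModelSettings. Note that __init__ would still run on every call in this sketch, so a real implementation usually also guards initialization so that it happens only once.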
- class FedEval.dataset.FedVerticalMatrix
Bases: FedEval.dataset.FedDataBase.FedData, abc.ABC
By default, FedData produces datasets for horizontal federated learning
- non_iid_data(*args)
- iid_data(save_file=True)
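This listing does not describe how FedVerticalMatrix partitions data. As a general illustration of the difference from the horizontal case, a vertical split gives every party all samples but only a subset of the feature columns. The sketch below assumes a plain contiguous column split; vertical_split is a hypothetical helper, not a FedEval function.

```python
import numpy as np


def vertical_split(x, num_parties):
    """Split the feature columns of x into num_parties contiguous blocks."""
    return np.array_split(x, num_parties, axis=1)


x = np.arange(24).reshape(4, 6)    # 4 samples, 6 features
parts = vertical_split(x, 3)       # three parties, 2 feature columns each
```

Stacking the parts back together column-wise recovers the original matrix, which is a quick check that no feature was dropped or duplicated.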
- class FedEval.dataset.wine
Bases: FedVerticalMatrix
By default, FedData produces datasets for horizontal federated learning
- load_data()
- class FedEval.dataset.mnist_matrix
Bases: FedVerticalMatrix
By default, FedData produces datasets for horizontal federated learning
- load_data()
- FedEval.dataset.load_synthetic(m, n, alpha)
- FedEval.dataset.load_synthetic_large_scale(m, n, alpha)
- class FedEval.dataset.synthetic_matrix_horizontal
Bases: FedEval.dataset.FedDataBase.FedData
By default, FedData produces datasets for horizontal federated learning
- load_data()
- class FedEval.dataset.synthetic_matrix_vertical
Bases: FedEval.dataset.FedDataBase.FedData
By default, FedData produces datasets for horizontal federated learning
- load_data()
- class FedEval.dataset.synthetic_matrix_horizontal_memmap
Bases: FedEval.dataset.FedDataBase.FedData
By default, FedData produces datasets for horizontal federated learning
- load_data()
- iid_data(save_file=True)
- class FedEval.dataset.ml25m_matrix
Bases: FedEval.dataset.FedDataBase.FedData
By default, FedData produces datasets for horizontal federated learning
- load_data()
- class FedEval.dataset.vertical_linear_regression
Bases: FedVerticalMatrix
By default, FedData produces datasets for horizontal federated learning
- load_data()
- class FedEval.dataset.vertical_linear_regression_memmap
Bases: FedVerticalMatrix
By default, FedData produces datasets for horizontal federated learning
- load_data()
- iid_data(save_file=True)
- class FedEval.dataset.ml25m_matrix_memmap
Bases: vertical_linear_regression_memmap
By default, FedData produces datasets for horizontal federated learning
- load_data()
- class FedEval.dataset.ml100k_lr
Bases: FedVerticalMatrix
By default, FedData produces datasets for horizontal federated learning
- load_data()
- FedEval.dataset.normalize_text(text)
Final cleanup of text: removes non-alphabetic and non-Latin characters, then strips the result.
- inputs:
text (str): tweet to be processed
- Returns:
text (str): preprocessed tweet
- FedEval.dataset.hashtags_preprocess(x)
Creates a hashtag token and normalizes the formatting of hashtags: words marked by uppercase letters are separated where possible, and all letters are converted to lowercase.
- inputs:
x (regex group): x.group(1) contains the text associated with a hashtag
- Returns:
text (str): preprocessed text
- FedEval.dataset.allcaps_preprocess(x)
If a text/word is written in uppercase, change it to lowercase and tag it with <allcaps>.
- inputs:
x (regex group): x.group() contains the text
- Returns:
text (str): preprocessed text
- FedEval.dataset.glove_preprocess(text)
To be consistent with the use of GloVe vectors, we replicate most of their preprocessing, so the word distribution should be close to the one used to train the embeddings. Adapted from https://nlp.stanford.edu/projects/glove/preprocess-twitter.rb
- inputs:
text (str): tweet to be processed
- Returns:
text (str): preprocessed tweet
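hashtags_preprocess and allcaps_preprocess both take a regex match, which suggests they are used as replacement callbacks for re.sub. The sketch below shows that callback pattern under stated assumptions: the two regular expressions, the order of the substitutions, and the names hashtag_sketch and allcaps_sketch are illustrative and are not taken from FedEval or from the GloVe script.

```python
import re


def hashtag_sketch(match):
    """Emit a <hashtag> token, then the hashtag body split at capital letters."""
    body = match.group(1)
    # Split "HelloWorld" into ["Hello", "World"]; fall back to the whole body.
    words = re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", body) or [body]
    return "<hashtag> " + " ".join(w.lower() for w in words)


def allcaps_sketch(match):
    """Lowercase an all-uppercase word and append the <allcaps> tag."""
    return match.group().lower() + " <allcaps>"


text = "GREAT day #HelloWorld"
text = re.sub(r"#(\w+)", hashtag_sketch, text)         # group(1) is the hashtag body
text = re.sub(r"\b[A-Z]{2,}\b", allcaps_sketch, text)  # whole-word uppercase runs
print(text)  # great <allcaps> day <hashtag> hello world
```

Passing a function to re.sub lets each match be rewritten with logic that a plain replacement string cannot express, such as splitting a hashtag body into words.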
- FedEval.dataset.tweet2Vec(tweet, word2vectors)
Takes in a processed tweet, tokenizes it, converts the tokens to GloVe embeddings (or zeros for unknown words), and applies average pooling to obtain one vector for that tweet.
- inputs:
tweet (str): one raw tweet from the dataset
word2vectors (dict): GloVe words mapped to GloVe vectors
- Returns:
embeddings (np.array): resulting sentence vector (shape: (200,))
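The pooling step described for tweet2Vec can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and a plain dict lookup; tweet_to_vec_sketch, EMBED_DIM, and the toy vectors are hypothetical, and the real function may tokenize differently.

```python
import numpy as np

EMBED_DIM = 200  # matches the documented output shape (200,)


def tweet_to_vec_sketch(tweet, word2vectors, dim=EMBED_DIM):
    """Average the vectors of whitespace tokens; unknown words count as zeros."""
    tokens = tweet.split()
    if not tokens:
        return np.zeros(dim)
    zero = np.zeros(dim)
    vectors = [word2vectors.get(tok, zero) for tok in tokens]
    return np.mean(vectors, axis=0)


# Toy embeddings standing in for real GloVe vectors.
toy = {"good": np.ones(EMBED_DIM), "day": np.full(EMBED_DIM, 3.0)}
vec = tweet_to_vec_sketch("good day unknownword", toy)
```

Because the unknown word contributes a zero vector but still counts in the average, each component of vec is (1 + 3 + 0) / 3. Whether unknown words should be counted or skipped is a design choice; the description above implies they are counted as zeros.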
- class FedEval.dataset.sentiment140
Bases: FedEval.dataset.FedDataBase.FedData
By default, FedData produces datasets for horizontal federated learning
- load_data()
- class FedEval.dataset.shakespeare
Bases: FedEval.dataset.FedDataBase.FedData
By default, FedData produces datasets for horizontal federated learning
- load_data()
- class FedEval.dataset.celeba
Bases: FedEval.dataset.FedDataBase.FedData
CelebA Median: Top 1000 clients
- load_data()
- class FedEval.dataset.femnist
Bases: FedEval.dataset.FedDataBase.FedData
FEMnistLarge, 3500 Clients
FEMnistMedian, 1000 Clients
FEMnistSmall, 100 Clients
- load_data()
- FedEval.dataset.get_data_shape(dataset_name: str)