3.5. Customization

3.5.1. Customize the dataset

There are two steps to add a new dataset into FedEval:

Step 1: Create a new script in $ProjectPath/FedEval/dataset/ with the following content: inherit the FedData class in $ProjectPath/FedEval/dataset/FedDataBase.py, and override the load_data function.

Step 2: Set the shape information in $ProjectPath/FedEval/dataset/__init__.py

Next we will give a concrete example showing how to add a new dataset through the above two steps.

Assume we want to add the IMDB sentiment classification dataset.

Step 1 (Example):

Create the FedIMDB.py file with the following content:

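import os

import cv2  # provided by the opencv-python package
import numpy as np
import tensorflow as tf
from sklearn.utils import shuffle  # assumed source of shuffle(x, y) used below

from .FedDataBase import FedData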

# Example of using data from tf.keras.datasets
class imdb(FedData):
    def load_data(self):
        # load the data
        (x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data()
        x = np.concatenate((x_train, x_test), axis=0)
        y = np.concatenate((y_train, y_test), axis=0)
        # data processing (padding to the same length)
        fixed_len = int(np.median([len(e) for e in x]))
        for i in range(len(x)):
            if len(x[i]) > fixed_len:
                x[i] = x[i][-fixed_len:]
            else:
                x[i] = [0] * (fixed_len-len(x[i])) + x[i]
        x = np.array(list(x))
        y = np.expand_dims(y, axis=-1)
        
        # Set self.num_class (Required)
        # Currently we only support classification tasks, because the simulation 
        # of non-IID data needs the class labels. The non-classification tasks will
        # be supported in the future.
        self.num_class = y.shape[-1]
        
        """
        Shape of x: [50000, 176], i.e., [#samples, the feature dims]
        Shape of y: [50000, 1], i.e., [#samples, the label dims]
        """
        return x, y

# Example of using data from an arbitrary source
class intel(FedData):
    def load_data(self):
        
        # Step 1 : Put the data in $ProjectPath/FedEval/data
        # For example, the data path is $ProjectPath/FedEval/data/intel_image_classification
        # Data used in this example could be downloaded at : 
        # https://www.kaggle.com/puneet6060/intel-image-classification/version/2
        data_path = os.path.join(os.path.dirname(self.local_path), 'data', 'intel_image_classification')
		
        # By default, we load all images in seg_train as training data
        train_image_path = os.path.join(data_path, 'seg_train', 'seg_train')
        image_labels = sorted(os.listdir(train_image_path))
        x = []
        y = []
        for image_label in image_labels:
            curr_label = image_labels.index(image_label)
            for image_file in os.listdir(os.path.join(train_image_path, image_label)):
                tmp_image_data = cv2.imread(os.path.join(train_image_path, image_label, image_file))
                tmp_image_data = cv2.resize(tmp_image_data, (150, 150), interpolation=cv2.INTER_AREA)
                x.append(tmp_image_data)
                y.append(curr_label)

        """
        You may add image preprocessing steps here.
        """
		
        # Formatting the data into np.array
        x = np.array(x).astype(np.float32)
        y = np.expand_dims(np.array(y).astype(np.int32), -1)
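        # One-hot encode the labels. self.num_class is only set further below,
        # so the number of label folders is used here instead.
        y = tf.keras.utils.to_categorical(y, len(image_labels))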

        # Shuffle the data
        x, y = shuffle(x, y)

        # Set self.num_class (Required)
        # Currently we only support classification tasks, because the simulation 
        # of non-IID data needs the class labels. The non-classification tasks will
        # be supported in the future.
        self.num_class = y.shape[-1]
        
        print(x.shape, y.shape)

        return x, y
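
The padding step of the imdb loader above can be looked at in isolation. The following is a minimal sketch, not part of FedEval: the helper name pad_or_truncate and the toy input are made up for illustration. It keeps the last fixed_len tokens of long sequences and left-pads short ones with zeros, as the loop in load_data does.

```python
def pad_or_truncate(sequences, fixed_len, pad_value=0):
    """Left-pad short sequences and keep the last `fixed_len` tokens of long ones."""
    result = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > fixed_len:
            # Keep the tail of the sequence, as in x[i][-fixed_len:]
            result.append(seq[-fixed_len:])
        else:
            # Prepend padding so that every sequence has the same length
            result.append([pad_value] * (fixed_len - len(seq)) + seq)
    return result


# Hypothetical toy "reviews" encoded as word indices
reviews = [[5, 8, 2], [7], [1, 2, 3, 4, 5]]
padded = pad_or_truncate(reviews, fixed_len=3)
print(padded)  # [[5, 8, 2], [0, 0, 7], [3, 4, 5]]
```

Because all rows now have the same length, np.array(padded) yields a regular 2-D array of shape [#samples, fixed_len].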
    

Note (for non-IID datasets): The IMDB dataset was not originally collected for federated learning, so we do not know who generated each data sample in the real world. If such information is available, we can use it by setting self.identity. For example, if load_data(self) sets self.identity = [10, 9, 10], the dataset was generated by three participants who contributed 10, 9, and 10 samples respectively. Besides setting self.identity, we must sort the samples (x) and labels (y) by participant, so that x[:10], y[:10] belong to the first participant in self.identity, x[10:19], y[10:19] to the second, and the last 10 samples in x, y to the third.
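
The sorting requirement in the note can be sketched as follows. This is an illustrative helper, not FedEval code: the function name group_by_owner and the owner_ids array (one participant identifier per sample) are assumptions. It reorders x and y so that each participant's samples are contiguous and returns the per-participant counts in the same order, which is the form the note describes for self.identity.

```python
import numpy as np


def group_by_owner(x, y, owner_ids):
    """Sort samples so each participant's data is contiguous.

    Returns the sorted x, the sorted y, and the per-participant sample
    counts listed in the same order as the sorted data.
    """
    owner_ids = np.asarray(owner_ids)
    # A stable sort keeps the original order of samples within a participant
    order = np.argsort(owner_ids, kind="stable")
    # np.unique returns the participants in sorted order, matching `order`
    _, counts = np.unique(owner_ids, return_counts=True)
    return x[order], y[order], counts.tolist()


# Hypothetical toy data: six samples produced by three participants
x = np.arange(6).reshape(6, 1)
y = np.array([[0], [1], [0], [1], [1], [0]])
owners = np.array(["bob", "alice", "bob", "carol", "alice", "bob"])

x_sorted, y_sorted, identity = group_by_owner(x, y, owners)
print(identity)  # [2, 3, 1]
```

Inside load_data, the returned counts would be assigned to self.identity before returning the sorted x and y.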

Step 2 (Example):

Modify $ProjectPath/FedEval/dataset/__init__.py so that the new dataset can be used from outside the package.

Two substeps, i.e., 2.1 and 2.2, are required.

from .FedImage import *
from .Sentiment140 import *
from .Shakespeare import *
# Step 2.1 Add the import information
from .FedIMDB import imdb, intel

# Used by the server, because it cannot reach the raw data
def get_data_shape(dataset):
    if dataset == 'celeba':
        x_size = (None, 54, 44, 3)
        y_size = (None, 2)
    elif dataset == 'femnist':
        x_size = (None, 28, 28, 1)
        y_size = (None, 62)
    elif dataset == 'mnist':
        x_size = (None, 28, 28, 1)
        y_size = (None, 10)
    elif dataset == 'cifar10':
        x_size = (None, 32, 32, 3)
        y_size = (None, 10)
    elif dataset == 'cifar100':
        x_size = (None, 32, 32, 3)
        y_size = (None, 100)
    elif dataset == 'shakespeare':
        x_size = (None, 80)
        y_size = (None, 80)
    elif dataset == 'sentiment140':
        x_size = (None, 25, 200)
        y_size = (None, 1)
    # Step 2.2 Add the shape information
    elif dataset == 'imdb':  # the name matches the class name, i.e., class imdb(FedData)
        x_size = (None, 176)
        y_size = (None, 1)
    elif dataset == 'intel':  # the name matches the class name, i.e., class intel(FedData)
        x_size = (None, 150, 150, 3)
        y_size = (None, 6)
    else:
        raise ValueError('Unknown dataset', dataset)
    return x_size, y_size

Finish: Now the IMDB and Intel Image Classification datasets are available in FedEval by setting dataset to imdb or intel in the data config.
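
Since the shapes in get_data_shape are entered by hand, it can help to compare them against what load_data actually returns. The helper below is a small sketch for that purpose and is not part of FedEval: the name shapes_match is made up, and the zero arrays stand in for real data.

```python
import numpy as np


def shapes_match(array, declared_shape):
    """Return True if array.shape agrees with declared_shape (None matches any size)."""
    if array.ndim != len(declared_shape):
        return False
    return all(d is None or d == s for s, d in zip(array.shape, declared_shape))


# Stand-ins for the (x, y) returned by imdb.load_data()
x = np.zeros((8, 176), dtype=np.int64)
y = np.zeros((8, 1), dtype=np.int64)

print(shapes_match(x, (None, 176)), shapes_match(y, (None, 1)))  # True True
print(shapes_match(x, (None, 150, 150, 3)))  # False
```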

3.5.2. Customize the ML model

We build all ML models in FedEval by subclassing tf.keras.Model.

Two steps are required to add a customized ML model:

Step 1: Create a new script in $ProjectPath/FedEval/model/ with the following content: inherit from tf.keras.Model and build your own model.

Step 2: Import the model in $ProjectPath/FedEval/model/__init__.py

Next we give a concrete example showing how to add a new ML model in FedEval.

Assume we want to create an MLP with 2 layers, where each layer has 256 hidden units.

Step 1 (Example):

Create a new file mlp_example.py in $ProjectPath/FedEval/model/, and build the model as follows:

import tensorflow as tf


class MLP_simplify(tf.keras.Model):

    def __init__(self, target_shape, **kwargs):
        super().__init__()

        # The last dimension of the label shape is the number of classes
        num_classes = target_shape[-1]

        self.dense1 = tf.keras.layers.Dense(256)
        self.dense2 = tf.keras.layers.Dense(256)
        # Do not name this attribute `output`: tf.keras.Model already defines
        # a read-only `output` property
        self.out_layer = tf.keras.layers.Dense(num_classes, activation='softmax')

    def call(self, inputs, training=None, mask=None):
        x = self.dense1(inputs)
        x = self.dense2(x)
        x = self.out_layer(x)
        return x

Step 2 (Example):

Modify $ProjectPath/FedEval/model/__init__.py as in the following example:

from .MLP import MLP
from .LeNet import LeNet
from .StackedLSTM import StackedLSTM
# Step 2 : Import the new model
from .mlp_example import MLP_simplify

Finish: Now this MLP model is available in FedEval by setting MLModel/name to MLP_simplify in the model config.
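
The import in __init__.py matters because the model is selected by its class name in the config. FedEval's own lookup code is not shown in this section, so the following is only a guess at the general mechanism: the stand-in classes, the model_namespace object, the resolve_model helper, and the nested layout of model_config are all assumptions made for illustration.

```python
import types


# Stand-ins for the classes exported by the model package
class MLP:
    pass


class MLP_simplify:
    pass


# Plays the role of the package namespace populated by __init__.py
model_namespace = types.SimpleNamespace(MLP=MLP, MLP_simplify=MLP_simplify)


def resolve_model(name, namespace):
    """Look up a model class by the name given in the config."""
    try:
        return getattr(namespace, name)
    except AttributeError:
        raise ValueError(f"Unknown model: {name}") from None


# Assumed layout of the relevant part of the model config
model_config = {'MLModel': {'name': 'MLP_simplify'}}
model_cls = resolve_model(model_config['MLModel']['name'], model_namespace)
print(model_cls.__name__)  # MLP_simplify
```

Under this assumption, a class that is not imported in __init__.py cannot be found by name, which is why Step 2 is required.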

3.5.3. Customize the FL strategies

Two steps are required to add a new federated learning strategy:

Step 1: Create a new script in $ProjectPath/FedEval/strategy/ with the following content: inherit from the FedAvg class in $ProjectPath/FedEval/strategy/FedAvg.py, and override the functions that differ in your federated learning strategy.

Step 2: Import the strategy in $ProjectPath/FedEval/strategy/__init__.py

Next we will give an example showing how to add FL strategies into FedEval.

Step 1 (Example):

Create a new script FedAvg_example.py in $ProjectPath/FedEval/strategy/, with the following content:

import os
import pickle
import numpy as np
import tensorflow as tf

from ..aggregator import aggregate_weighted_average
from ..model import *
from ..dataset import get_data_shape
from ..utils import ParamParser
from .FedAvg import FedAvg


class FedAvg_example(FedAvg):

    def __init__(self, role, data_config, model_config, runtime_config, param_parser=ParamParser):
        # By default, the init function will do the following things:
        # (1) initialize the ML model at self.ml_model
        # (2) load the dataset if current role is client
        # (3) initialize some variables if current role is server, e.g., self.params
        super().__init__(role, data_config, model_config, runtime_config)

    # (1) Host functions
    def host_get_init_params(self):
        """
        This function is used by the server to initialize the params.
        The params are randomly initialized in FedAvg.
        Update this function if you have a different initialization method.
        Otherwise, this function is not required to override.
        """
        self.params = self.ml_model.get_weights()
        return self.params

    # (1) Host functions
    def update_host_params(self, client_params, aggregate_weights):
        """
        This function is used by the server to update the global params.
        The clients' params are weighted averaged in FedAvg.
        Update this function if you have a different aggregation method.
        Otherwise, this function is not required to override.
        """
        self.params = aggregate_weighted_average(client_params, aggregate_weights)
        return self.params

    # (2) Client functions
    def set_host_params_to_local(self, host_params, current_round):
        """
        This function is used by the clients to set global model to local.
        The global model is directly set to local in FedAvg.
        Update this function if you have a different method.
        Otherwise, this function is not required to override.
        """
        if self.callback is not None:
            host_params = self.callback.on_setting_host_to_local(host_params)
        self.current_round = current_round
        self.ml_model.set_weights(host_params)

    # (2) Client functions
    def fit_on_local_data(self):
        """
        This function is used by the clients to train on local data.
        Update this function if you have a different method.
        Otherwise, this function is not required to override.
        """
        if self.callback is not None:
            self.train_data, model = self.callback.on_client_train_begin(
                data=self.train_data, model=self.ml_model.get_weights()
            )
            self.ml_model.set_weights(model)
        self.local_params_pre = self.ml_model.get_weights()
        train_log = self.ml_model.fit(
            x=self.train_data['x'], y=self.train_data['y'],
            epochs=self.model_config['FedModel']['E'],
            batch_size=self.model_config['FedModel']['B']
        )
        train_loss = train_log.history['loss'][-1]
        self.local_params_cur = self.ml_model.get_weights()
        return train_loss, self.train_data_size

    # (2) Client functions
    def retrieve_local_upload_info(self):
        """
        This function is used by the clients to get the uploading params.
        The whole model is uploaded by clients in FedAvg.
        Update this function if you have a different uploading solution.
        Otherwise, this function is not required to override.
        """
        model = self.ml_model.get_weights()
        if self.callback is not None:
            model = self.callback.on_client_upload_begin(model)
        return model

    # (2) Client functions
    def local_evaluate(self):
        """
        This function is used by the clients to evaluate on local data.
        Update this function if you have a different evaluation method.
        Otherwise, this function is not required to override.
        """
        evaluate = {}
        # val and test
        val_result = self.ml_model.evaluate(x=self.val_data['x'], y=self.val_data['y'])
        test_result = self.ml_model.evaluate(x=self.test_data['x'], y=self.test_data['y'])
        metrics_names = self.ml_model.metrics_names
        # Reformat
        evaluate.update({'val_' + metrics_names[i]: float(val_result[i]) for i in range(len(metrics_names))})
        evaluate.update({'test_' + metrics_names[i]: float(test_result[i]) for i in range(len(metrics_names))})
        # TMP
        evaluate.update({'val_size': self.val_data_size})
        evaluate.update({'test_size': self.test_data_size})
        return evaluate

Step 2 (Example):

Modify $ProjectPath/FedEval/strategy/__init__.py as in the following example:

from .FedAvg import FedAvg, FedSGD
from .FedSTC import FedSTC
from .FedProx import FedProx
from .FedOpt import FedOpt
from .FedSCA import FedSCA
from .MFedAvg import MFedAvg, MFedSGD
from .LocalCentral import LocalCentral
# Import the new strategy
from .FedAvg_example import FedAvg_example

Finish: Now this FedAvg_example strategy is available in FedEval by setting FedModel/name to FedAvg_example in the model config.
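
To make the aggregation hook more concrete, here is a standalone sketch of what a replacement for update_host_params might compute. It assumes that client_params is a list with one entry per client, each entry being a list of per-layer arrays as returned by get_weights(). The implementation of aggregate_weighted_average is not shown in this section, so weighted_average below is only an assumed equivalent, and coordinate_median is a hypothetical alternative rule chosen for illustration.

```python
import numpy as np


def weighted_average(client_params, aggregate_weights):
    """Layer-wise weighted average of the clients' parameters."""
    weights = np.asarray(aggregate_weights, dtype=np.float64)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    num_layers = len(client_params[0])
    return [
        sum(w * params[layer] for w, params in zip(weights, client_params))
        for layer in range(num_layers)
    ]


def coordinate_median(client_params):
    """Layer-wise, element-wise median of the clients' parameters."""
    num_layers = len(client_params[0])
    return [
        np.median(np.stack([params[layer] for params in client_params]), axis=0)
        for layer in range(num_layers)
    ]


# Three hypothetical clients, each holding a two-layer model
clients = [
    [np.array([0.0, 0.0]), np.array([[1.0]])],
    [np.array([4.0, 8.0]), np.array([[5.0]])],
    [np.array([100.0, 0.0]), np.array([[3.0]])],
]

avg = weighted_average(clients, aggregate_weights=[1, 3, 0])
med = coordinate_median(clients)
print(avg[0], med[0])  # [3. 6.] [4. 0.]
```

In a strategy subclass, the overridden update_host_params would assign the result to self.params and return it, as the FedAvg_example listing does with aggregate_weighted_average.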