---
tags: [vertical, tabular, advanced, fds]
dataset: [Titanic]
framework: [torch, pandas, scikit-learn]
---

# Vertical Federated Learning with Flower

[View on GitHub](https://github.com/adap/flower/blob/main/examples/vertical-fl)

This example showcases how to perform Vertical Federated Learning using Flower. We'll use the [Titanic dataset](https://huggingface.co/datasets/julien-c/titanic-survival) to train simple regression models for binary classification. We will go into more detail below, but the main idea of Vertical Federated Learning is that each client holds a different set of features of the same dataset, while the server holds the labels of that dataset.

|                       | Horizontal Federated Learning (HFL or just FL)                                                                                                                                                          | Vertical Federated Learning (VFL)                                                                                                                                                                                                                                                                                                                                  |
| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Data Distribution     | Clients have different data instances but share the same feature space. Think of different hospitals having different patients' data (samples) but recording the same types of information (features). | Each client holds different features for the same instances. Imagine different institutions holding various tests or measurements for the same group of patients.                                                                                                                                                                                                   |
| Model Training        | Each client trains a model on its local data, which contains all the feature columns for its samples.                                                                                                   | Clients train models on their respective features without having access to the complete feature set. Each model only sees a vertical slice of the data (hence the name 'Vertical').                                                                                                                                                                                 |
| Aggregation           | The server aggregates the local models by averaging their parameters or gradients to update a global model.                                                                                             | The server aggregates the updates, such as gradients or parameters, which are then used to update the global model. However, since each client sees only a part of the features, the server typically plays a more complex role, sometimes coordinating more sophisticated aggregation strategies that may involve secure multi-party computation techniques.       |
| Privacy Consideration | The raw data stays on the client's side; only model updates are shared, which helps maintain privacy.                                                                                                   | VFL is designed to ensure that no participant can access the complete feature set of any sample, thereby preserving data privacy.                                                                                                                                                                                                                                   |

|               HFL               |               VFL               |
| :-----------------------------: | :-----------------------------: |
| ![HFL diagram](_static/hfl.png) | ![VFL diagram](_static/vfl.png) |

These diagrams illustrate HFL vs VFL using a simplified version of what we will be building in this example. Note that on the VFL side, the server holds the labels (the `Survived` column) and will be the only party capable of performing evaluation.
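To make the distinction concrete, here is a tiny, hypothetical pandas sketch of the two splits. It is not part of the example code; the toy DataFrame and variable names are purely illustrative:

```python
import pandas as pd

# Toy Titanic-like table: four passengers, three features, and the label.
df = pd.DataFrame(
    {
        "Age": [22, 38, 26, 35],
        "Sex": ["male", "female", "female", "male"],
        "Pclass": [3, 1, 3, 1],
        "Survived": [0, 1, 1, 0],
    }
)

# HFL: each client holds different rows (samples) with ALL feature columns.
hfl_client_0 = df.iloc[:2]  # passengers 0-1, every column
hfl_client_1 = df.iloc[2:]  # passengers 2-3, every column

# VFL: each client holds different columns (features) for the SAME rows,
# and only the server keeps the label column.
vfl_client_0 = df[["Age", "Sex"]]  # all passengers, one feature slice
vfl_client_1 = df[["Pclass"]]      # all passengers, another feature slice
vfl_server = df[["Survived"]]      # all passengers, labels only
```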
## Set up the project

### Clone the project

Start by cloning the example project:

```shell
git clone --depth=1 https://github.com/adap/flower.git _tmp \
    && mv _tmp/examples/vertical-fl . \
    && rm -rf _tmp \
    && cd vertical-fl
```

This will create a new directory called `vertical-fl` with the following structure:

```shell
vertical-fl
├── vertical_fl
│   ├── __init__.py
│   ├── client_app.py   # Defines your ClientApp
│   ├── server_app.py   # Defines your ServerApp
│   └── task.py         # Defines your model, training and data loading
├── pyproject.toml      # Project metadata like dependencies and configs
└── README.md
```

### Install dependencies and project

Install the dependencies defined in `pyproject.toml`, as well as the `vertical_fl` package:

```bash
pip install -e .
```

## Vertical data partitioning

In this example we use the [VerticalSizePartitioner](https://flower.ai/docs/datasets/ref-api/flwr_datasets.partitioner.VerticalSizePartitioner.html#flwr_datasets.partitioner.VerticalSizePartitioner) from [Flower Datasets](https://flower.ai/docs/datasets/) to vertically split the dataset into three partitions (one for each client), with the target column (i.e., whether the passenger survived the Titanic sinking) available to the `ServerApp` only.

```python
from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import VerticalSizePartitioner

partitioner = VerticalSizePartitioner(
    partition_sizes=[2, 3, 2],                   # three partitions with 2, 3, and 2 features
    active_party_columns="Survived",             # the target
    active_party_columns_mode="create_as_last",  # an additional partition will be created
)                                                # that contains only the target column

fds = FederatedDataset(
    dataset="julien-c/titanic-survival",
    partitioners={"train": partitioner},
)

# Load all partitions
partitions = [
    fds.load_partition(i) for i in range(fds.partitioners["train"].num_partitions)
]

for partition in partitions:
    print(partition.column_names)
# ['Age', 'Sex'] <----------------------------------- ClientApp #0
# ['Fare', 'Siblings/Spouses Aboard', 'Name'] <------ ClientApp #1
# ['Parents/Children Aboard', 'Pclass'] <------------ ClientApp #2
# ['Survived'] <------------------------------------- ServerApp
```

You can control the number of partitions, as well as how many features each holds, by modifying `feature-splits` (defaults to \[`2,3,2`\]) in the `[tool.flwr.app.config]` section of the `pyproject.toml`.
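With the labels on the server and the features spread across clients, training follows a split-learning pattern: each client computes an embedding from its feature slice, the server concatenates the embeddings, computes the loss on the labels it holds, and sends back the gradient of each client's embedding slice (this matches the `Requesting embeddings` / `Sending gradients` messages in the logs below). The following is a minimal, self-contained sketch of one such round with hypothetical models and dummy data; the example's actual implementation lives in `task.py`, `client_app.py`, and `server_app.py`:

```python
import torch
import torch.nn as nn

# Hypothetical models: each client maps its feature slice (2, 3, and 2
# features, as in the partitioning above) to a 4-dimensional embedding.
client_models = [nn.Linear(2, 4), nn.Linear(3, 4), nn.Linear(2, 4)]
# The server head maps the concatenated embeddings to a survival logit.
server_model = nn.Linear(3 * 4, 1)
criterion = nn.BCEWithLogitsLoss()

# Clients: compute embeddings from their local features (dummy batches here).
features = [torch.randn(8, 2), torch.randn(8, 3), torch.randn(8, 2)]
embeddings = [model(x) for model, x in zip(client_models, features)]

# Server: treat the received embeddings as inputs, concatenate them, and
# compute the loss against the labels only it holds (dummy labels here).
received = [emb.detach().requires_grad_() for emb in embeddings]
labels = torch.randint(0, 2, (8, 1)).float()
loss = criterion(server_model(torch.cat(received, dim=1)), labels)
loss.backward()  # fills .grad on the server head and on each received embedding

# Clients: receive the gradient of their embedding slice and backpropagate
# it through their local model (an optimizer step would follow).
for emb, rec in zip(embeddings, received):
    emb.backward(rec.grad)
```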
## Run the project

You can run your Flower project in both _simulation_ and _deployment_ mode without making changes to the code. If you are starting with Flower, we recommend using the _simulation_ mode, as it requires fewer components to be launched manually. By default, `flwr run` will make use of the Simulation Engine.

### Run with the Simulation Engine

> [!NOTE]
> Check the [Simulation Engine documentation](https://flower.ai/docs/framework/how-to-run-simulations.html) to learn more about Flower simulations and how to optimize them.

By default, the example runs for 250 rounds with three clients. Launch it with the default settings with:

```bash
flwr run .
```

The expected last lines of the log should look like:

```shell
...
INFO :      --- ServerApp Round 250 / 250 ---
INFO :      Requesting embeddings from 3 nodes...
INFO :      Received 3/3 results
INFO :      Round 250, Loss: 0.3096, Accuracy: 84.94%
INFO :      Sending gradients to 3 nodes...
INFO :
INFO :      === Final Results ===
INFO :      Round 25 -> Loss: 0.6458 | Accuracy: 61.89%
INFO :      Round 50 -> Loss: 0.5888 | Accuracy: 67.76%
INFO :      Round 75 -> Loss: 0.5265 | Accuracy: 72.49%
INFO :      Round 100 -> Loss: 0.4939 | Accuracy: 74.41%
INFO :      Round 125 -> Loss: 0.4499 | Accuracy: 78.13%
INFO :      Round 150 -> Loss: 0.4012 | Accuracy: 81.51%
INFO :      Round 175 -> Loss: 0.3691 | Accuracy: 82.07%
INFO :      Round 200 -> Loss: 0.3710 | Accuracy: 82.75%
INFO :      Round 225 -> Loss: 0.3172 | Accuracy: 84.44%
INFO :      Round 250 -> Loss: 0.3096 | Accuracy: 84.94%
```

You can also override some of the settings for your `ClientApp` and `ServerApp` defined in `pyproject.toml`. For example:

```bash
flwr run . --run-config "num-server-rounds=500 learning-rate=0.05"
```

### Run with the Deployment Engine

Follow this [how-to guide](https://flower.ai/docs/framework/how-to-run-flower-with-deployment-engine.html) to run the same app in this example with Flower's Deployment Engine. After that, you might be interested in setting up [secure TLS-enabled communications](https://flower.ai/docs/framework/how-to-enable-tls-connections.html) and [SuperNode authentication](https://flower.ai/docs/framework/how-to-authenticate-supernodes.html) in your federation.

If you are already familiar with how the Deployment Engine works, you may want to learn how to run it using Docker. Check out the [Flower with Docker](https://flower.ai/docs/framework/docker/index.html) documentation.
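In either case, once your SuperLink and SuperNodes are running, starting the run boils down to pointing `flwr run` at a federation defined in `pyproject.toml`. A hypothetical invocation (the federation name below is an assumption borrowed from the how-to guide's local setup and may differ in yours):

```bash
# Assumes a federation named `local-deployment` is defined under
# [tool.flwr.federations.local-deployment] in pyproject.toml,
# pointing at your SuperLink's address
flwr run . local-deployment
```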