Flame is a platform that enables developers to compose and deploy federated learning (FL) training workloads easily. The system comprises a service (control plane) and a Python library (data plane). The service manages machine learning workloads, while the Python library facilitates the composition of ML workloads and is also responsible for executing FL workloads. Because the library is extensible, Flame can support a variety of experiments and use cases.
We have improved Flame with a redesigned control plane and data plane (called LIFL) for efficient FL aggregation at scale. LIFL leverages shared memory processing to achieve high-performance communication for hierarchical aggregation. LIFL also introduces locality-aware placement to maximize the benefits of shared memory processing. It scales resources precisely and reuses them carefully for hierarchical aggregation, achieving the highest degree of parallelism while minimizing aggregation time and resource consumption.
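To make the idea of hierarchical aggregation concrete, here is a minimal sketch in plain Python. It is not Flame's or LIFL's implementation and uses none of their APIs; the names `Update`, `weighted_average`, and `hierarchical_average` are hypothetical, and sample-weighted averaging (FedAvg-style) is assumed as the aggregation rule. It shows only that aggregating per group and then merging the group results gives the same model as one flat aggregation.

```python
from typing import List, Sequence, Tuple

# Hypothetical type: (model weights, number of training samples behind them).
Update = Tuple[List[float], int]


def weighted_average(updates: Sequence[Update]) -> Update:
    """Combine updates into one sample-weighted average (FedAvg-style)."""
    total = sum(count for _, count in updates)
    if total == 0:
        raise ValueError("updates must contain at least one sample")
    merged = [0.0] * len(updates[0][0])
    for weights, count in updates:
        for i, w in enumerate(weights):
            merged[i] += w * count / total
    # Return the total sample count so the result can be merged again upstream.
    return merged, total


def hierarchical_average(groups: Sequence[Sequence[Update]]) -> Update:
    """Aggregate each group at a leaf aggregator, then merge the leaf results."""
    leaf_results = [weighted_average(group) for group in groups]
    return weighted_average(leaf_results)


if __name__ == "__main__":
    groups = [
        [([1.0, 2.0], 10), ([3.0, 4.0], 30)],  # clients served by leaf aggregator A
        [([5.0, 6.0], 60)],                    # clients served by leaf aggregator B
    ]
    print(hierarchical_average(groups))
```

Because each leaf result carries its total sample count, the top-level merge weights the leaves correctly, which is what allows leaf aggregators to run in parallel without changing the final model.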
The target runtime environment is Linux, although development has mainly been carried out on macOS. Set up a development environment first; for more details, refer to here.
This repo has the following directory structure:
flame
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── Makefile -> build/Makefile
├── README.md
├── api (specification of REST API for flame apiserver)
├── build (configuration files for building flame binaries and container image)
├── cmd (source files for flame control plane)
├── docs (document folder)
├── examples (example folder)
├── fiab (dev/test env in a single box)
├── go.mod
├── go.sum
├── lib (Python library for core flame data plane)
├── lint.sh
├── pkg (Go packages for cmd)
└── scripts (utility scripts)
The full documentation can be found here. It is updated on a regular basis.
We welcome feedback, questions, and issue reports.
- Maintainers' email address: [email protected]
- GitHub Issues
@inproceedings{flame2023,
author = {Harshit Daga and Jaemin Shin and Dhruv Garg and Ada Gavrilovska and Myungjin Lee and Ramana Rao Kompella},
title = {Flame: Simplifying Topology Extension in Federated Learning},
year = {2023},
booktitle = {Proceedings of the 2023 ACM Symposium on Cloud Computing},
keywords = {Federated Learning, Distributed Machine Learning},
series = {SoCC '23}
}
@inproceedings{lifl-mlsys24,
author = {Qi, Shixiong and Ramakrishnan, K. K. and Lee, Myungjin},
title = {LIFL: A Lightweight, Event-Driven Serverless Platform for Federated Learning},
year = {2024},
booktitle = {Proceedings of Machine Learning and Systems},
}