Software Reliability for Concurrent and Distributed Systems
Led by Burcu Kulahcioglu Ozkan
Develop techniques and tools for increasing the reliability of concurrent and distributed systems.
Today’s software is evolving in the direction of more concurrency and decentralization. With the increasing use of mobile devices and cloud services, the applications we use today are deployed to geo-replicated distributed systems, easily accessible from everywhere. However, the increased complexity of the software systems makes it more difficult to reason about possible behaviors of a system and to produce correct software.
It is challenging to implement distributed systems correctly since their behavior is more complicated than that of classical sequential programs. Nondeterminism in the delivery order of concurrent messages, network failures, and node crashes may produce subtle executions that lead to buggy behavior. It is difficult for programmers to consider all possible executions during system design and implementation. Making distributed systems reliable therefore requires different techniques than those designed for sequential software.
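To make the delivery-order nondeterminism concrete, here is a minimal Python sketch. It is an illustration only, not part of any tool from this research line, and the names `deliver`, `explore_delivery_orders`, and the two client messages are hypothetical. It enumerates every order in which two concurrent writes could reach a single replica and records the state the replica ends up in.

```python
from itertools import permutations


def deliver(order):
    """Apply write messages to an empty key-value replica in the given order."""
    state = {}
    for _sender, key, value in order:
        # Each delivered write simply overwrites the previous value.
        state[key] = value
    return state


def explore_delivery_orders(messages):
    """Map every possible delivery order to the final replica state it produces."""
    return {order: deliver(order) for order in permutations(messages)}


# Two concurrent writes to the same key (hypothetical clients and values).
messages = [("client_a", "x", 1), ("client_b", "x", 2)]

outcomes = explore_delivery_orders(messages)
final_values = {state["x"] for state in outcomes.values()}

for order, state in outcomes.items():
    senders = [sender for sender, _key, _value in order]
    print(senders, "->", state)
```

Even in this tiny case the final value of `x` depends on the order chosen by the network. If all n messages are concurrent, there are n! possible delivery orders, so exhaustive enumeration of this kind stops scaling quickly.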
This research line aims to build program analysis, testing, and debugging methods for concurrent programs and distributed systems. Our research interests span a broad spectrum of concurrent programs: multi-threaded, asynchronous, event-driven, and distributed systems.
We aim to build software analysis and testing methods for systems including (but not limited to):
- Decentralized consensus systems and blockchains
- Distributed systems with weak consistency and weak isolation
- Distributed systems with microservice architecture
- Shared-memory multicore programs
Recent Awards:
- “Distinguished paper award” for our paper “Randomized Testing of Byzantine Fault Tolerant Algorithms” at OOPSLA’23.
- “Stellar Academic Research Grant” for the research proposal “Feedback-guided fault-injection testing of blockchain systems” from the Stellar Development Foundation.
- Ripple Bug Bounty Program Award for a bug that our recent work discovered in Ripple’s XRP Ledger. Levin Winter’s contribution to the bug fix is acknowledged in the release notes of XRP Ledger version 1.10.0.
- “Amazon Research Award” for the research proposal “Coverage-directed randomized testing of distributed systems” in Fall 2022.
Related MSc Courses:
- CS4405 - Analysis of Concurrent and Distributed Programs (course page)
- IN4315 - Software Architecture (course page)
Contact us if you are interested in working on software testing, program analysis, concurrent programming, distributed systems, or blockchains.