Confidential Data Analytics

Staff - Faculty of Informatics

Date: 5 September 2024 / 14:30 - 17:30

USI East Campus, Room D0.03

You are cordially invited to attend the PhD Dissertation Defence of Shamiek Mangipudi on Thursday 5 September 2024 at 14:30 in room D0.03.

Abstract:
Cloud computing has introduced a new paradigm in the computing world. Widely known for its ability to harness the power of hyperscale compute infrastructures, cloud computing also brings the inevitable downside that cloud users must process data on untrusted third-party servers, whose vulnerabilities are often exploited. While data at rest and in transit is secured by cryptographic measures, and despite best efforts from cloud service providers (CSPs) to secure the cloud stack, new kinds of attacks that target either code or data in use, involving hypervisor and container breakouts, firmware compromise, insider threats, etc., have drawn significant attention. These downsides deter widespread adoption of cloud infrastructures for data processing, including processing of critical data such as personally identifiable information (PII), financial data, and medical records, compelling many to forgo the substantial economies of scale offered by the cloud. An emerging trend of hardware-based trusted execution environment (TEE) technologies aims to remove the CSP from the trusted computing base (TCB), further reducing the attack surface and thereby protecting data in use. On the other hand, there exist viable software solutions based on algorithmic approaches like partially homomorphic encryption (PHE) to protect data in use. Given these options, it becomes hard to adopt the right solution or a combination thereof, as the solutions differ in development interfaces, programmability efforts, availability, security guarantees, threat models, and performance characteristics. In response, first, we lay the formal foundations for a solution that uses Intel Software Guard Extensions (SGX) and PHE either in isolation or in combination, through our hybrid approach to distributed confidentiality-preserving data analytics (Hydra), which guarantees end-to-end confidentiality through a novel formulation of noninterference (NI).
Then, second, through Scylla, a more system-centric thrust, we develop an approach that modifies Spark’s runtime, generalizes to all virtual machine (VM)-like TEEs, and can also use process-based TEEs like SGX. An accompanying heuristic API supports a variety of heuristics that define how and where to plug in various TEE and PHE realizations. Finally, we gather the overarching theme from our two previous solutions and present a new paradigm dubbed security policy as code (SPaC), which aims to narrow the gap between security policies and the software systems that implement them. All in all, our solutions present a modular approach that eases the development of secure distributed data processing applications for people from diverse backgrounds who are not necessarily well versed in the intricacies of security.

Dissertation Committee:
- Prof. Patrick Thomas Eugster, Università della Svizzera italiana, Switzerland (Research Advisor)
- Prof. Carlo Alberto Furia, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Marc Langheinrich, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Stephen Chong, Harvard University (External Member)
- Prof. Pascal Felber, University of Neuchâtel (External Member)