Student Projects

The DaST group is always interested in supervising student projects and theses that contribute to its ongoing research. If you are interested, please email a group member with your choice of project from the list below and your CV.

Current Student Projects & Theses

Surgical Phase Recognition using Vision-Language Models (MSc thesis, March 2025 - )
with ETH Computer Vision Lab
Student: Junyong Cao
Leveraging Multi-Task Learning for Improved Spatial Transcriptomics Prediction from Hematoxylin and Eosin Images (MSc thesis, Feb 2025 - )
with Max Planck ETH Center for Learning Systems
Student: David Lebrec
Location specific collaborator suggestion based on papers (MSc thesis, Feb 2025 - )
with ETH Neuroinformatics
Student: Grigor Dochev
Machine Learning-Guided Combinatorial Optimization for Cost-Effective Passenger Rebooking (MSc thesis, Dec 2024 - )
with SWISS Airlines
Student: Georgia Lazaridou
Symbolic Matrix Decompositions in Symbolica (MSc independent study, Nov 2024 - )
Student: Zisen Liu
Randomized Linear Algebra over Database Joins (MSc thesis, Oct 2024 - )
Student: Deborah Stäubler
Automatic Analysis Of The Marginal Bone Level Around Dental Implants (MSc project, June 2024 - )
with Prof Toda and Ms Settecase (UZH Clinic of Reconstructive Dentistry)
Students: Xinyao Cao, Xuan Ji, Teodora Petrovic, Nina Rubesa, Xue Wang

Completed Student Projects & Theses

Implementing factorized incremental view maintenance on top of Flink (MSc thesis, April - December 2024)
Student: Yizhi Zhang
Counterfactual prediction-based analysis of spatial transcriptomics (MSc thesis, April - Oct 2024)
with Center of Experimental Rheumatology (University Hospital Zurich)
Student: Ming Yi
Dynamic Model Counting (internship, Feb - July 2024)
Student: Rémy Kimbrough (École Normale Supérieure, Paris)
Tensor Decomposition by FAQs (MSc thesis, April - Oct 2024)
Student: Yuchen He
Query Language Support for the FRANTIC query engine (BSc thesis, March - Sept 2024)
Student: Timo Lennard Tietje
Efficient Computation of Banzhaf and Shapley Value for Database Queries (MSc thesis, March - Oct 2024)
Student: Samuel Andreas Brügger
Improving Prompting Strategies for Reasoning Tasks (MSc thesis, Jan - July 2024)
with NLPED lab (ETH)
Student: Adarsh Shivam
Leveraging Artificial Intelligence to Enhance Education in High Schools (MSc thesis, Feb 2024 - July 2024)
with NLPED lab (ETH)
Student: Alessandro Vanzo
Wrist-worn Accelerometer-based Chronic Disease Classification: Model-driven and Data-Driven Approaches (MSc thesis, Nov 2023 - May 2024)
with Health IS Lab (ETH)
Student: Songyi Han
Efficient Query Maintenance using Maximal Hierarchical Subqueries (MSc thesis, Nov 2023 - May 2024)
Student: Neng Xu
Dynamic Computation with Static and Dynamic Relations (MSc thesis, Nov 2023 - April 2024)
Student: Zheng Luo
A Hand-Tailored Benchmark for Celltyping on ASAP and BGEE Resources (MSc project, April 2023 - March 2024)
with Swiss Institute of Bioinformatics
Students: Ming Yi, Liyuan Rong
Cardinality Estimation using Lp-norms on Degree Sequences (MSc thesis, Sept 2023 - March 2024)
Student: Luis Torrejón Machado
SemantIQ: Semantic Query Answering with Natural Language on Knowledge Graphs (MSc project, Oct 2023 - Feb 2024)
Student: Jan Luca Sheerer (ETH)
M-Flow: Incremental View Maintenance of Multiple Query Workloads under Updates (MSc thesis, Feb - Nov 2023)
Student: Rui Zhou (ETH)
Diverse Answers to Queries (MSc project, Jan - Nov 2023)
Students: Jun Tu, Thi Phuong Anh Pham, Julian Novoa Martin
Applications of Transformer-Based NLP in Recruitment: A Use Case in Education (MSc project, March - Oct 2023)
with the UZH Chair for Development and Emerging Markets
Students: Alessandro Vanzo, Le Hoang Minh Trinh, Maria Korobeynikova
KroneDB: Factorized Databases meet Kronecker Products (MSc thesis, Feb - Oct 2023)
Student: Thomas Rolf Mannhart
CaVieR: Multiple Query Maintenance using Cascading View Trees (MSc thesis, Jan - Aug 2023)
Student: Johann Schwabe
Adaptive Factorised Data Representations via Reinforcement Learning (MSc thesis, Jan - Aug 2023)
Student: Christoph Mayer
Convex Hull for SVM (MSc project, April - July 2023)
Students: Pascal Severin Andermatt, Luis Torrejon Machado
Federated Learning for Respiratory Disease Classification from Audio Recordings in a Pandemic Scenario (MSc thesis, Jan - July 2022)
with Department of Management, Technology and Economics at ETH Zurich
Student: Mesut Ceylan
Approximate Convex Hulls (MSc project, July 2022 - Jan 2023)
Students: Pascal Severin Andermatt, Tobias Fankhauser, Paul-Philipp Luley, Luis Torrejon Machado
Matrix Profile (Independent study, March - Sept 2022)
Student: Christoph Mayer
Dynamic Query Evaluation (Independent study, Dec 2021 - June 2022)
Student: Erlin Zylalaj
Feature Engineering (MSc project, July 2021 - Jan 2022)
Students: Carmen Christen, Lukas Vollenweider, Christoph Mayer

Strand 1: In-Database Machine Learning

Goal: Train machine learning (regression, classification) models over data matrices defined by relational queries. The project may focus more on theoretical or systems-building aspects, depending on the strength and interest of the student. The exact task (e.g., which ML task) can be agreed with the project supervisor.

Outcome: Good understanding of the research landscape on in-database machine learning; Design of a novel learning algorithm that exploits the relational structure of the underlying data to lower the computational complexity of the learning task; Implement the algorithm and benchmark it against existing solutions; Write a detailed report.
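To illustrate how relational structure can lower the cost of learning, here is a minimal sketch under a toy assumption: two relations R(k, x) and S(k, y) joined on k, and a one-feature linear regression y ≈ a + b·x over the join. The sums that least squares needs (count, Σx, Σy, Σx², Σxy, Σy²) can be assembled from per-key aggregates of each relation, so the join is never materialised. All names here are hypothetical and chosen for illustration only; this is not an existing DaST system.

```python
from collections import defaultdict

def per_key_stats(rel):
    """Map join key -> [count, sum of values, sum of squared values] for (key, value) pairs."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for k, v in rel:
        s = stats[k]
        s[0] += 1
        s[1] += v
        s[2] += v * v
    return stats

def gram_over_join(R, S):
    """(n, Σx, Σy, Σx², Σxy, Σy²) over R(k, x) ⋈ S(k, y) without materialising the join."""
    sr, ss = per_key_stats(R), per_key_stats(S)
    n = sx = sy = sxx = sxy = syy = 0.0
    for k in sr.keys() & ss.keys():
        cr, xr, qr = sr[k]
        cs, ys, qs = ss[k]
        # Every R-tuple with key k meets every S-tuple with key k, so sums scale by counts.
        n += cr * cs
        sx += xr * cs
        sy += ys * cr
        sxx += qr * cs
        syy += qs * cr
        sxy += xr * ys  # within one key group, Σ over pairs of x*y = (Σx)(Σy)
    return n, sx, sy, sxx, sxy, syy

def gram_naive(R, S):
    """Same aggregates computed from the materialised join, for comparison."""
    pairs = [(x, y) for kr, x in R for ks, y in S if kr == ks]
    return (float(len(pairs)),
            sum(x for x, _ in pairs), sum(y for _, y in pairs),
            sum(x * x for x, _ in pairs), sum(x * y for x, y in pairs),
            sum(y * y for _, y in pairs))

def fit_line(n, sx, sy, sxx, sxy, syy):
    """Least-squares coefficients (a, b) of y ≈ a + b*x from the aggregates alone."""
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b
```

The factorised version reads each input tuple once, whereas the materialised join can contain up to |R|·|S| tuples; this gap is the kind of saving the strand aims to generalise to richer models and queries.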

Context: Recent publications and talks (PDFs and videos) are available on our publications page.

Prerequisites:

Good programming skills in C++ or Julia
Conversant with: computational complexity; data structures and algorithms; databases (good grades in undergraduate courses on these topics); elements of machine learning (good grades in courses on Foundations of Data Science or Stats)
Desire to push the frontier of knowledge at the interface of databases and machine learning

Strand 2: Linear Algebra over Relational Data

Goal: Understand fundamental linear algebra operations over matrices, where the matrices are defined by queries over relational data. The operations of interest, e.g., various types of matrix decompositions such as QR using Givens rotations, can be agreed with the project supervisor. The project may focus more on theoretical or systems-building aspects, depending on the strength and interest of the student.

Outcome: Good understanding of the research landscape on linear algebra over relational data; Design of an algorithm that exploits the relational structure of the underlying data to lower the computational complexity of the linear algebra task; Implement the algorithm and benchmark it against existing solutions; Write a detailed report.
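For readers new to QR via Givens rotations, here is a minimal dense sketch in plain Python. It assumes the matrix is given explicitly as a list of rows and does not exploit any relational structure; it only shows the rotation steps that a join-aware algorithm would need to reorganise. The function names are illustrative.

```python
import math

def givens(a, b):
    """Return (c, s) such that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r

def qr_givens(A):
    """QR decomposition of an m x n matrix (list of rows) via Givens rotations.

    Returns (Q, R) with Q orthogonal (m x m), R upper triangular (m x n), and A = Q R.
    """
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Zero out the entries below the diagonal in column j, bottom-up.
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1][j], R[i][j])
            # Apply the rotation G to rows i-1 and i of R (R <- G R).
            for k in range(n):
                top, bot = R[i - 1][k], R[i][k]
                R[i - 1][k] = c * top + s * bot
                R[i][k] = -s * top + c * bot
            # Keep A = Q R by updating columns i-1 and i of Q (Q <- Q G^T).
            for k in range(m):
                left, right = Q[k][i - 1], Q[k][i]
                Q[k][i - 1] = c * left + s * right
                Q[k][i] = -s * left + c * right
    return Q, R
```

Each rotation touches only two rows, which is what makes Givens rotations attractive when rows are produced lazily, for example tuple by tuple from a query result.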

Context: Several recent MSc theses supervised by the DaST group on matrix decompositions (QR, SVD, and quadratically regularised low-rank approximation) give a good idea of this strand of work (PDFs available at https://fdbresearch.github.io/).

Prerequisites:

Good programming skills in C++
Conversant with: linear algebra; computational complexity; data structures and algorithms; databases (good grades in undergraduate courses on these topics)
Desire to work on a hot research topic

Strand 3: Intersection Joins

Goal: Consider databases where some columns can hold multi-dimensional intervals instead of scalar values. Whereas for scalar values a common notion of "agreement" is given by equality joins, for intervals one uses the intersection join, which pairs tuples whose intervals overlap.
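A minimal sketch of the intersection-join predicate, assuming closed intervals and representing a d-dimensional interval (a box) as a tuple of (low, high) pairs, one per dimension. Two boxes intersect exactly when their intervals overlap in every dimension. The nested-loop join below is only a reference point for correctness, not one of the algorithms this project targets; all names are illustrative.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # closed interval [low, high]
Box = Tuple[Interval, ...]      # one interval per dimension

def intervals_overlap(p: Interval, q: Interval) -> bool:
    """Closed intervals [a, b] and [c, d] overlap iff a <= d and c <= b."""
    return p[0] <= q[1] and q[0] <= p[1]

def boxes_intersect(p: Box, q: Box) -> bool:
    """Multi-dimensional intervals intersect iff they overlap in every dimension."""
    return all(intervals_overlap(a, b) for a, b in zip(p, q))

def nested_loop_intersection_join(R: Sequence[Tuple[str, Box]],
                                  S: Sequence[Tuple[str, Box]]) -> List[Tuple[str, str]]:
    """All pairs (r_id, s_id) whose boxes intersect, using |R| * |S| comparisons."""
    return [(rid, sid) for rid, rb in R for sid, sb in S if boxes_intersect(rb, sb)]
```

For one-dimensional boxes the predicate reduces to the familiar test max(a, c) <= min(b, d).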
Much research has been dedicated to the evaluation of queries over databases with scalar values. In particular, research in the past decade proposed new and surprising algorithms for query evaluation that achieve worst-case optimality. They were also shown to perform much better than the standard join algorithms in a variety of settings.
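To make the worst-case optimal idea concrete for equality joins, here is a minimal sketch of the generic-join strategy for the triangle query Q(a, b, c) = R(a, b), S(b, c), T(a, c). Instead of joining two relations at a time, it binds one variable at a time (order a, b, c) and intersects the candidate values offered by every relation that mentions that variable. The set-of-pairs representation and function names are illustrative assumptions; an actual worst-case optimal implementation would use sorted indexes or tries so that each intersection costs time proportional to its smaller input.

```python
from collections import defaultdict

def index(pairs):
    """Map first attribute -> set of second attributes."""
    idx = defaultdict(set)
    for u, v in pairs:
        idx[u].add(v)
    return idx

def triangles_generic_join(R, S, T):
    """All (a, b, c) with (a, b) in R, (b, c) in S, (a, c) in T, binding a, then b, then c."""
    R_ab, S_bc, T_ac = index(R), index(S), index(T)
    out = []
    # Candidate a-values: first column of R intersected with first column of T.
    for a in R_ab.keys() & T_ac.keys():
        # Candidate b-values: partners of a in R that also start some S-tuple.
        for b in R_ab[a].intersection(S_bc):
            # Candidate c-values: partners of b in S that are also partners of a in T.
            for c in S_bc[b] & T_ac[a]:
                out.append((a, b, c))
    return sorted(out)
```

A pairwise plan such as (R ⋈ S) ⋈ T may build an intermediate result much larger than the final output, which is exactly what the variable-at-a-time strategy avoids.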
This project will investigate novel algorithms for the evaluation of queries with intersection joins. The project can focus on the more theoretical or systems aspects of the problem. This is to be discussed with the supervisor.

Outcome: Good understanding of the research landscape on worst-case optimal join algorithms; Design and implementation of algorithms; Benchmarking against existing solutions; Write a detailed report with the findings.

Context: There are no existing results by the DaST group on this line of research. However, there is a solid body of literature on two-way intersection joins and on worst-case optimal equality joins.
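For orientation on the two-way case, here is a minimal sketch of a classic sort-and-sweep intersection join for one-dimensional closed intervals. It processes all endpoints in sorted order, tracks which intervals are currently open on each side, and pairs each newly opened interval with every open interval of the other relation. This is one textbook technique chosen for illustration, not a DaST result; the input format (id, low, high) with unique ids per relation is an assumption.

```python
def sweep_intersection_join(R, S):
    """Pairs (r_id, s_id) of overlapping closed 1-D intervals.

    R and S are lists of (id, low, high) with unique ids per relation.
    Runs in O(N log N + OUT) time for N input intervals and OUT output pairs.
    """
    START, END = 0, 1  # at equal coordinates, starts sort before ends (closed intervals)
    events = []
    for side, rel in ((0, R), (1, S)):
        for iid, lo, hi in rel:
            events.append((lo, START, side, iid))
            events.append((hi, END, side, iid))
    events.sort()
    active = (set(), set())  # ids of currently open intervals, per side
    out = []
    for _, kind, side, iid in events:
        if kind == START:
            # Every open interval on the other side overlaps the one starting here.
            for other in active[1 - side]:
                out.append((iid, other) if side == 0 else (other, iid))
            active[side].add(iid)
        else:
            active[side].remove(iid)
    return sorted(out)
```

Each overlapping pair is reported exactly once, when the later of its two intervals starts; extending this to multi-dimensional intervals and to joins of more than two relations is where the open questions lie.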

Prerequisites:

Good programming skills in C++
Conversant with: computational complexity; data structures and algorithms; databases (good grades in undergraduate courses on these topics)
Desire to work on a hot research topic

Strand 4: Incremental Maintenance of Analytical Workloads

Goal: Maintain analytics (query results, machine learning models) over relational databases under updates.
Intensive research has been dedicated to the evaluation of queries over static relational databases. One outstanding achievement in this line of work was the design of algorithms that evaluate join queries in worst-case optimal time. It was shown that these algorithms perform much better than the standard join algorithms for cyclic queries.
Incremental view maintenance considers the setting where the database is subject to frequent updates, e.g., insertions and deletions of tuples, or where the data arrives as a stream of tuples. It aims to provide algorithms that refresh the query result after each update as fast as possible. The naive approach, which recomputes the query result from scratch after each update, is too time-consuming. The literature proposes techniques such as delta processing and view materialisation that perform better than the naive approach.
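A minimal sketch of delta processing, assuming a toy query: the number of tuples in R(a, b) ⋈ S(b, c) under bag semantics. For every join value b, the class keeps how many R- and S-tuples carry that value. An insert (mult = +1) or delete (mult = -1) of R(a, b) changes the result by mult times the number of S-tuples with value b, and symmetrically for S, so each update costs expected constant time instead of a full recomputation. The class and method names are hypothetical, and deletes are assumed to target tuples that are present.

```python
from collections import Counter

class JoinCountIVM:
    """Maintains |R(a, b) ⋈ S(b, c)| under single-tuple inserts and deletes."""

    def __init__(self):
        self.r_deg = Counter()  # b -> number of R-tuples with join value b
        self.s_deg = Counter()  # b -> number of S-tuples with join value b
        self.count = 0

    def update_R(self, a, b, mult):
        """Apply an insert (mult=+1) or delete (mult=-1) of R(a, b); a does not affect the count."""
        # Delta rule: Δ|R ⋈ S| = |ΔR ⋈ S| = mult * (number of S-tuples with value b).
        self.count += mult * self.s_deg[b]
        self.r_deg[b] += mult

    def update_S(self, b, c, mult):
        """Apply an insert (mult=+1) or delete (mult=-1) of S(b, c); c does not affect the count."""
        self.count += mult * self.r_deg[b]
        self.s_deg[b] += mult
```

The materialised per-value counts are small auxiliary views; choosing which views to keep is the central design question in IVM.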
The DaST group built a system called F-IVM that implements a novel maintenance mechanism over fast-evolving data. Experiments showed that F-IVM outperforms state-of-the-art systems. On the theoretical side, we designed the first IVM approach that maintains the number of triangles in a database in worst-case optimal time. The approach was further extended to general triangle and hierarchical queries. Partly due to our exciting recent results, incremental maintenance is currently a very hot topic investigated by several research groups within the database community.
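The delta idea also applies to the triangle count. A minimal sketch, assuming set semantics and a cyclic triangle query over R(a, b), S(b, c), T(c, a): inserting or deleting R(a, b) changes the count by the number of values c with S(b, c) and T(c, a), and analogously for S and T. This naive delta evaluation costs time proportional to the adjacency sets it intersects, which can be linear in the database size; it is not the worst-case optimal approach of the ICDT 2019 paper cited below, which among other things partitions values into heavy and light ones by degree. All class names are hypothetical.

```python
from collections import defaultdict

class BinaryRelation:
    """A set of pairs (x, y), indexed in both directions."""

    def __init__(self):
        self.fwd = defaultdict(set)  # x -> {y}
        self.bwd = defaultdict(set)  # y -> {x}

    def apply(self, x, y, mult):
        """Insert (mult > 0) or delete (mult < 0) the pair (x, y)."""
        if mult > 0:
            self.fwd[x].add(y)
            self.bwd[y].add(x)
        else:
            self.fwd[x].discard(y)
            self.bwd[y].discard(x)

class TriangleCountIVM:
    """Maintains |{(a, b, c) : R(a, b), S(b, c), T(c, a)}| under single-tuple updates.

    Set semantics: callers insert only absent tuples and delete only present ones.
    """

    def __init__(self):
        self.R, self.S, self.T = BinaryRelation(), BinaryRelation(), BinaryRelation()
        self.count = 0

    def update_R(self, a, b, mult):
        # Δ = mult * |{c : S(b, c) and T(c, a)}|, evaluated before R changes.
        self.count += mult * len(self.S.fwd[b] & self.T.bwd[a])
        self.R.apply(a, b, mult)

    def update_S(self, b, c, mult):
        # Δ = mult * |{a : R(a, b) and T(c, a)}|
        self.count += mult * len(self.R.bwd[b] & self.T.fwd[c])
        self.S.apply(b, c, mult)

    def update_T(self, c, a, mult):
        # Δ = mult * |{b : R(a, b) and S(b, c)}|
        self.count += mult * len(self.R.fwd[a] & self.S.bwd[c])
        self.T.apply(c, a, mult)
```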
This line of projects will investigate novel algorithms for the maintenance of different query classes, such as conjunctive and aggregate queries, or of more complex analytics, such as machine learning models, over fast-evolving relational data. The project can focus more on theoretical or systems-building aspects, depending on the strength and interest of the student.
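Maintaining a machine learning model can reduce to maintaining aggregates. A minimal sketch, assuming a one-feature linear regression y ≈ a + b·x over a single relation of (x, y) tuples: the least-squares coefficients depend only on five sums (count, Σx, Σy, Σx², Σxy), each of which changes additively under an insert or delete, so the model can be refreshed in constant time per update. Over a join, these sums would themselves have to be maintained incrementally. The class is a hypothetical illustration, not F-IVM's design.

```python
class RegressionIVM:
    """Keeps the least-squares fit y ≈ a + b*x up to date under tuple inserts and deletes."""

    def __init__(self):
        self.n = self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, x, y, mult):
        """mult=+1 inserts the tuple (x, y); mult=-1 deletes a previously inserted copy."""
        self.n += mult
        self.sx += mult * x
        self.sy += mult * y
        self.sxx += mult * x * x
        self.sxy += mult * x * y

    def coefficients(self):
        """Return (a, b) from the maintained sums; needs at least two distinct x values."""
        b = (self.n * self.sxy - self.sx * self.sy) / (self.n * self.sxx - self.sx ** 2)
        a = (self.sy - b * self.sx) / self.n
        return a, b
```

For example, inserting (0, 1), (1, 3), (2, 5) and an outlier (3, 100), then deleting the outlier, leaves the fit at a = 1, b = 2 without rescanning the data.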

Outcome: Good understanding of the research landscape on incremental maintenance techniques; Design of a novel maintenance algorithm; Implement the algorithm and benchmark it against existing solutions; Write a detailed report.

Context: Recent publications and talks (PDFs and videos) are available on our publications page.

Maintaining Triangle Queries under Updates. [arxiv]
Ahmet Kara, Hung Q. Ngo, Milos Nikolic, Dan Olteanu, Haozhe Zhang. To appear in ACM Trans. Database Syst. 2020.
Trade-offs in Static and Dynamic Evaluation of Hierarchical Queries. [paper, arxiv, video]
Ahmet Kara, Milos Nikolic, Dan Olteanu, Haozhe Zhang. In PODS 2020.
Counting Triangles under Updates in Worst-Case Optimal Time. [paper, arxiv]
Ahmet Kara, Hung Q. Ngo, Milos Nikolic, Dan Olteanu, Haozhe Zhang. In ICDT 2019 (best paper award).
F-IVM: Learning over Fast Evolving Relational Data. [paper, arxiv, video]
Milos Nikolic, Haozhe Zhang, Ahmet Kara, Dan Olteanu. In SIGMOD 2020 (demonstration).
Incremental View Maintenance with Triple Lock Factorisation Benefits. [paper, arxiv]
Milos Nikolic, Dan Olteanu. In SIGMOD 2018.

Prerequisites:

Good programming skills in C++
Conversant with: computational complexity; data structures and algorithms; databases (good grades in undergraduate courses on these topics)
Desire to work on a hot research topic

Master's Project: Implementation of an Efficient Algorithm for the Dynamic Evaluation of Hierarchical Queries [presentation]

Department of Informatics Data Systems and Theory
