Master’s Thesis: Automated Medical Report Generation From Doctor-Patient Conversations in an Orthopaedic Ambulatory Clinic

14.08.2025, Theses, Bachelor's and Master's Theses

Project Background

The increasing documentation burden in everyday clinical practice is considered a factor contributing to the growing strain and exhaustion of medical staff [1]. Studies show that physicians spend an average of around 37% of their working time on documentation in both inpatient and outpatient care [2]. At the same time, recent advances in Natural Language Processing (NLP) and Automatic Speech Recognition (ASR) are enabling new approaches to support medical documentation.

The aim of this master's thesis is to develop and evaluate an AI model that automatically generates medical reports in German from recorded doctor-patient conversations, thereby helping to reduce the documentation workload in clinical practice.

Your Tasks

Simulated conversations from the field of endoprosthetics at the Clinic for Orthopaedics and Sports Orthopaedics, TUM University Hospital, serve as the dataset. Using simulated data is intended as a proof of concept that lays the foundation for future application to real doctor-patient conversations.

After the proof-of-concept phase, the developed prototype is to be integrated into an active learning pipeline that processes real audio recordings of doctor-patient conversations and automatically generates draft medical letters as text suggestions. In this phase, the focus is on continuously improving the model through human feedback, so that the content and linguistic quality of the generated medical letters steadily improve.

In the next step, potential optimizations through the use of a Retrieval-Augmented Generation (RAG) pipeline and preference-based learning (Preference Alignment Training) will be explored [3, 4, 5]. The goal is to further increase the contextual accuracy and relevance of the text generation and to better align the model with the needs and expectations of physicians.
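As a rough illustration of the preference-based learning mentioned above, the Direct Preference Optimization objective from [5] can be sketched for a single preference pair. The function name, argument names, and the default value of beta are illustrative assumptions and not part of the project. In this setting, the "chosen" response could be a draft letter that a physician preferred and the "rejected" response a draft that was rated lower.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (hypothetical helper, for illustration).

    Each argument is the summed log-probability that a model assigns to a
    complete response: "policy" is the model being trained, "ref" is the
    frozen reference model. beta scales how strongly the policy is pushed
    away from the reference.
    """
    # Implicit reward of each response: log-ratio of policy to reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)

    # Loss is -log(sigmoid(z)), written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

The loss falls below log(2) once the policy favours the preferred draft more strongly than the reference model does, and rises above it in the opposite case.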

The task includes, among other things:

Literature review of existing speech-to-text systems in the clinical context and of automated medical report generation.
Design and development of a model for transforming conversation transcripts into structured medical letters.
Evaluation of the generated texts with regard to quality, completeness, and clinical relevance, incorporating physician feedback.
Documentation of the results and derivation of recommendations.

What We Offer

Contribution to a practice-relevant research project in healthcare.
Supervision by an interdisciplinary team from medicine, computer science, and AI research.
Opportunity to publish the results.
Flexible start date.
Workstations for students in our office.

Prerequisites

Prior knowledge in NLP and ASR.
Interest in the medical context and in interdisciplinary collaboration.
Very good knowledge of German (at least C1). The recorded conversations and the medical reports are exclusively in German.

How to Apply

Send an email to laura.amenda@tum.de and marton.szep@tum.de with the following information:

Your CV
A brief introduction outlining your background and motivation
Your preferred start date
Academic transcripts

We look forward to receiving your email!

References

[1] Amanda J Moy, Jessica M Schwartz, RuiJun Chen, Shirin Sadri, Eugene Lucas, Kenrick D Cato, Sarah Collins Rossetti, Measurement of clinical documentation burden among physicians and nurses using electronic health records: a scoping review, Journal of the American Medical Informatics Association, Volume 28, Issue 5, May 2021, Pages 998–1008
https://doi.org/10.1093/jamia/ocaa325
[2] Pinevich, Y., Clark, K. J., Harrison, A. M., Pickering, B. W., & Herasevich, V. (2021). Interaction time with electronic health records: a systematic review. Applied clinical informatics, 12(04), 788-799.
https://doi.org/10.1055/s-0041-1733909
[3] Chen, Z., Deng, Y., Yuan, H., Ji, K., & Gu, Q. (2024). Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models. Proceedings of the 41st International Conference on Machine Learning, 6621–6642. https://proceedings.mlr.press/v235/chen24j.html
[4] Hong, J., Lee, N., & Thorne, J. (2024). ORPO: Monolithic Preference Optimization without Reference Model. In Y. Al-Onaizan, M. Bansal, & Y.-N. Chen (Eds.), Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (pp. 11170–11189). Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.emnlp-main.626
[5] Rafailov, R., Sharma, A., Mitchell, E., Manning, C. D., Ermon, S., & Finn, C. (2023). Direct Preference Optimization: Your Language Model is Secretly a Reward Model. Advances in Neural Information Processing Systems, 36, 53728–53741.
Quiroz, J.C., Laranjo, L., Kocaballi, A.B. et al. Challenges of developing a digital scribe to reduce clinical documentation burden. npj Digit. Med. 2, 114 (2019). https://doi.org/10.1038/s41746-019-0190-1
Shreya J Shah, Anna Devon-Sand, Stephen P Ma, Yejin Jeong, Trevor Crowell, Margaret Smith, April S Liang, Clarissa Delahaie, Caroline Hsia, Tait Shanafelt, Michael A Pfeffer, Christopher Sharp, Steven Lin, Patricia Garcia, Ambient artificial intelligence scribes: physician burnout and perspectives on usability and documentation burden, Journal of the American Medical Informatics Association, Volume 32, Issue 2, February 2025, Pages 375–380,
https://doi.org/10.1093/jamia/ocae295
Stephen P Ma, April S Liang, Shreya J Shah, Margaret Smith, Yejin Jeong, Anna Devon-Sand, Trevor Crowell, Clarissa Delahaie, Caroline Hsia, Steven Lin, Tait Shanafelt, Michael A Pfeffer, Christopher Sharp, Patricia Garcia, Ambient artificial intelligence scribes: utilization and impact on documentation time, Journal of the American Medical Informatics Association, Volume 32, Issue 2, February 2025, Pages 381–385,
https://doi.org/10.1093/jamia/ocae304
van Buchem, M.M., Boosman, H., Bauer, M.P. et al. The digital scribe in clinical practice: a scoping review and research agenda. npj Digit. Med. 4, 57 (2021). https://doi.org/10.1038/s41746-021-00432-5

Contact: laura.amenda@tum.de, marton.szep@tum.de