
Humboldt-Universität zu Berlin - Faculty of Mathematics and Natural Sciences - Process-Driven Architectures

Distributed Data Processing (VL)

Prof. Dr. Matthias Weidlich

Dr. Han van der Aa



Data analytics refers to the ability to extract information from data. It has to cope with rapidly growing volumes of data as well as increasing complexity of analysis questions and methods. These trends are no longer matched by performance improvements of single processing units (CPU/GPU cores). As such, sequential processing of data on a single machine is no longer a viable option. Rather, systems for data analytics need to embrace parallel and distributed computation in order to achieve scalability by increasing the number of processing units.



This lecture introduces models and methods for building distributed data processing systems. It covers foundational aspects, ranging from data models through encoding and replication schemes to notions of consistency and consensus. It also covers practical implementations of distributed data processing based on infrastructures such as Akka, Spark, Flink, and Kafka.

The course will be given in English.

Exercises are integrated into the lecture. Solutions to these exercises will be collected and graded.



There will be a written exam at the end of the semester. Successful completion of the exercises is a prerequisite for taking the final exam and earning the credit points (LP).


Credit Points

The course counts for 6 LP and is open to students of: Informatik, Master of Science (M.Sc.); Informatik, Master of Education (M.Ed.); Wirtschaftsinformatik, Master of Science (M.Sc.). The related area of specialisation is "Daten- und Wissensmanagement" (Data and Knowledge Management).



Lecture (VL): Thu 9-13, RUD 26, Room 1'307


See AGNES for further details: