Humboldt-Universität zu Berlin - Faculty of Mathematics and Natural Sciences - Process-Driven Architectures

Distributed Data Processing (VL)

Prof. Dr. Matthias Weidlich

Dr. Han van der Aa

Content

Data analytics refers to the ability to extract information from data. It must cope with rapidly growing data volumes as well as the increasing complexity of analysis questions and methods. Performance improvements of single processing units (CPU/GPU cores) no longer keep pace with these trends, so sequential processing of data on a single machine is no longer a viable option. Instead, systems for data analytics need to embrace parallel and distributed computation, achieving scalability by increasing the number of processing units.

Structure

This lecture introduces models and methods for building systems for distributed data processing. This includes foundational aspects, ranging from data models through encoding and replication schemes to notions of consistency and consensus. At the same time, the lecture covers practical implementations of distributed data processing based on infrastructures such as Akka, Spark, Flink, and Kafka.

The course will be given in English.

Exercises are integrated in the lecture. Solutions to these exercises will be collected and graded.

Exam

There will be a written exam at the end of the semester. Successful completion of the exercises is a prerequisite for taking the final exam and earning the LP (credit points).

Credit Points

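The course counts for 6 LP and is open to students of the following programmes: Informatik (Computer Science), Master of Science (M.Sc.); Informatik (Computer Science), Master of Education (M.Ed.); and Wirtschaftsinformatik (Business Informatics), Master of Science (M.Sc.). The related area of specialisation is "Daten- und Wissensmanagement" (Data and Knowledge Management).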

Dates

Lecture (VL): Thursdays, 9-13, RUD 26, Room 1'307

See AGNES for further details.