Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Institut für Informatik

PhD defense of Jörgen Brandt

When: 04.12.2020, from 13:00 (Europe/Berlin, UTC+1)
Where: online (Zoom)

The PhD defense of Jörgen Brandt will take place on Friday, 4.12.2020, starting at 13:00. The title of Jörgen's dissertation is:

Cuneiform – A Functional Language for Large-Scale Data Analysis

The defense will be held via Zoom (an "Informatik-Account" is needed to access the link).


Bioinformatics and next-generation sequencing data analyses often form large and complex pipelines. The tools and libraries making up the processing steps in these pipelines come from different sources and have different interfaces, which hampers integrating them into data analysis frameworks. Also, these pipelines process large data sets. Thus, users need to parallelize independent processing steps. Scientific workflow systems are the state of the art in large-scale scientific data analysis for bioinformatics and next-generation sequencing. A scientific workflow system allows researchers to describe a data analysis pipeline as a scientific workflow which integrates external software, defines the data dependencies forming a data analysis pipeline, and parallelizes independent processing steps. Scientific workflow systems consist of a workflow language, which provides a user interface, and an execution environment. The workflow language determines how users express workflows, reuse and compose workflow fragments, and integrate external software, how the scientific workflow system identifies independent processing steps, and how we derive optimizations from a workflow’s structure. The execution environment schedules and runs data processing operations.

In this thesis we present Cuneiform, a workflow language, and its distributed execution environment. For Cuneiform’s design we take the perspective of programming languages. We adopt methods from functional programming for composition and for expressing data dependencies. We apply operational semantics and type systems to define well-formedness, consistency, and reduction of Cuneiform workflows. For the design of the distributed execution environment we take the perspective of distributed systems. We apply Petri nets to define the communication patterns among the distributed execution environment’s agents. We show how to use Cuneiform to (i) integrate foreign tools and libraries from languages like R or Python by wrapping them in functions, (ii) create complex workflows by composing simple workflows, and (iii) use language features common in functional programming like conditional execution, iteration, folding, recursion, or higher-order functions in the context of bioinformatics and next-generation sequencing applications. The execution environment underlying Cuneiform distributes independent foreign function applications to run these applications in parallel.
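
The ideas of wrapping foreign tools in functions, composing workflows, and running independent applications in parallel can be illustrated with a small Python sketch. This is neither Cuneiform syntax nor the implementation described in the thesis; the helper `foreign`, the two stand-in tools, and the names `pipeline` and `run_all` are hypothetical and chosen only for illustration. A local thread pool stands in for a distributed execution environment.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def foreign(snippet):
    """Wrap a foreign code snippet as an ordinary Python function.

    Hypothetical helper: the snippet runs in a separate interpreter
    process, reads its argument from argv, and writes its result to stdout.
    """
    def apply(arg):
        result = subprocess.run(
            [sys.executable, "-c", snippet, str(arg)],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    return apply


# Two stand-ins for external tools (assumed for illustration only).
to_upper = foreign("import sys; print(sys.argv[1].upper())")
reverse = foreign("import sys; print(sys.argv[1][::-1])")


def pipeline(read):
    """A workflow built from simpler ones by plain function composition."""
    return reverse(to_upper(read))


def run_all(reads):
    """Apply the pipeline to independent inputs in parallel.

    The applications share no data, so they can run concurrently;
    pool.map returns the results in input order.
    """
    with ThreadPoolExecutor() as pool:
        return list(pool.map(pipeline, reads))


if __name__ == "__main__":
    print(run_all(["acgt", "ttga"]))
```

In this sketch the data dependency between `to_upper` and `reverse` is expressed purely by function nesting, while the absence of dependencies between the inputs is what allows them to be processed in parallel.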

Some contemporary workflow languages like Swift or Nextflow also promote a functional style. However, to our knowledge, Cuneiform is the only external, statically typed workflow language providing bioinformaticians advanced features of functional programming in a distributed setting.