Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Institut für Informatik

Diploma Thesis Defense, Mihail Vieru

  • When: 12 February 2016, 11:00 to 12:30
  • Where: Rudower Chaussee 25, Building IV, Room 112

Mr. Mihail Vieru will defend his diploma thesis on the topic:
"Graph Distance and Textual Similarity Joins on Big Data using Apache Flink"


A common operation on big data analysis platforms such as Apache Flink is joining data sets that have both graph and textual dimensions. Graph data may represent a social network like Facebook or a collection of linked documents like the World Wide Web. In the latter case, the text contained in a web site can be represented as a set of words, and the hyperlinks that interconnect web sites form the graph edges. A frequent operation on such data is finding all vertex pairs in the graph that contain similar textual information, e.g., users with similar interests or web sites with similar content. A common additional condition is that the vertices must lie within a specified distance of each other, i.e., the number of edges connecting them must not exceed a specified threshold. In this thesis we have designed, implemented, and evaluated joins that combine textual similarity and graph distance at the same time using Apache Flink. Our work focuses on efficiently combining parallel and distributed approaches for the all-pairs shortest path and set similarity join problems.
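
To make the join condition concrete, the following is a minimal single-machine sketch in Python. It is not the thesis's Flink implementation and makes several assumptions that the abstract does not state: the graph is undirected, textual similarity is measured with Jaccard similarity on word sets, and distances are computed with a hop-bounded breadth-first search from every vertex. All function and parameter names (`jaccard`, `within_distance`, `distance_similarity_join`, `max_hops`, `min_sim`) are hypothetical and chosen for illustration only.

```python
from collections import deque


def jaccard(a, b):
    """Jaccard similarity of two word sets (an assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def within_distance(adj, source, max_hops):
    """Breadth-first search that stops expanding after max_hops edges.

    Returns a dict mapping each reachable vertex to its hop count.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if dist[v] == max_hops:
            continue  # do not expand beyond the distance threshold
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def distance_similarity_join(adj, tokens, max_hops, min_sim):
    """Naive join: vertex pairs within max_hops whose word sets are similar.

    adj    -- adjacency lists of an undirected graph (assumption)
    tokens -- vertex -> set of words
    Returns sorted tuples (u, v, hops, similarity) with u < v.
    """
    results = []
    for u in tokens:
        for v, hops in within_distance(adj, u, max_hops).items():
            # u < v reports each unordered pair once and skips u == v.
            if u < v and v in tokens:
                sim = jaccard(tokens[u], tokens[v])
                if sim >= min_sim:
                    results.append((u, v, hops, sim))
    return sorted(results)
```

On a path graph a, b, c, d in which a and d carry identical word sets, calling the join with `max_hops=2` leaves out the pair (a, d) despite its similarity of 1.0, because the two vertices are three edges apart. This sketch runs one search per vertex and compares every pair within range, so it only illustrates the semantics of the join, not the parallel and distributed techniques that the thesis combines.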

All interested parties are cordially invited!