Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Modellierung und Analyse komplexer Systeme


Current online social networks are massive and still growing. For example, Facebook has over 500 million active users sharing over 30 billion items per month. The scale within these data streams has outstripped traditional graph analysis methods. Monitoring requires dynamic analysis rather than repeated static analysis. The massive state behind multiple persistent queries requires shared data structures and not problem-specific representations. We present a framework based on the STINGER data structure that can monitor a global property, connected components, on a graph of 16 million vertices at rates of up to 240 000 updates per second on a 32 processor Cray XMT. For very large scale-free graphs, our implementation uses novel batching techniques that exploit the scale-free nature of the data and run over three times faster than prior methods. Our framework handles, for the first time, real-world data rates, opening the door to higher-level analytics such as community and anomaly detection.