Code Generation in Serializers and Comparators of Apache Flink (ICOOOLPS 2017 - 12th Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems)

Who

Gábor Horváth, Norbert Pataki, Márton Balassi

Track

ICOOOLPS 2017

When

Mon 19 Jun 2017 15:00 - 15:30 at Vertex WS216 - Afternoon I Chair(s): Edd Barrett

Abstract

There is a shift in the Big Data world. Applications used to be I/O bound, but InfiniBand and SSDs have reduced I/O overhead and more sophisticated algorithms have been developed, so the CPU has become a bottleneck for some applications. Even when an application is I/O bound, reducing CPU usage on state-of-the-art CPUs can lead to lower electricity costs.

Apache Flink is an open source framework for processing streams of data and batch jobs. It uses serialization for a wide variety of purposes: not only for sending data over the network, saving it to disk, or providing fault tolerance, but also because some operators can work on the serialized representation of the data instead of on Java objects. This approach can improve performance significantly. Flink has a custom serialization method that enables operators to work on the serialized format.
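To illustrate what working on the serialized representation can mean, here is a minimal sketch in plain Java. It is not Flink's API: the class `SerializedCompareSketch`, the record layout (a 4-byte int key followed by an 8-byte double payload), and both method names are assumptions made for this example only. The comparison reads the key straight out of the byte arrays, so no record object is ever rebuilt.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not Flink code: compare two records by key
// without deserializing them into Java objects.
class SerializedCompareSketch {

    // Assumed layout: [int key (4 bytes, big-endian)][double payload (8 bytes)]
    static byte[] serialize(int key, double payload) {
        return ByteBuffer.allocate(12).putInt(key).putDouble(payload).array();
    }

    // Reads only the key bytes at offset 0 of each serialized record.
    static int compareSerialized(byte[] left, byte[] right) {
        int leftKey = ByteBuffer.wrap(left).getInt(0);
        int rightKey = ByteBuffer.wrap(right).getInt(0);
        return Integer.compare(leftKey, rightKey);
    }
}
```

A sort over such records could call `compareSerialized` on the raw buffers and skip the cost of allocating and populating an object per comparison.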

Currently, Apache Flink uses reflection to serialize Plain Old Java Objects (POJOs). Reflection in Java is notoriously slow, and reflective code is harder for the JIT compiler to optimize. As a Google Summer of Code project in 2016, we implemented code generation for POJO serializers and comparators to improve the performance of Apache Flink. Flink has an elaborate type system that provides a lot of information about the types that need to be serialized. Using this information, it is possible to generate specialized code with great performance.
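The contrast between reflective and specialized serialization can be sketched in plain Java. Everything here is an assumption for illustration rather than Flink's actual serializer code: the POJO `Point`, the class `PojoSerializationSketch`, and both method names are hypothetical, and the specialized method is written by hand to stand in for what a code generator might emit for one known type.

```java
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch, not Flink code.
class PojoSerializationSketch {

    // Example POJO with public fields (an assumption for this sketch).
    public static class Point {
        public int id;
        public double value;
        public Point(int id, double value) { this.id = id; this.value = value; }
    }

    // Generic path: discovers fields at run time and dispatches on their type.
    static byte[] serializeReflective(Object pojo) {
        Field[] fields = pojo.getClass().getFields();
        // getFields() has no guaranteed order, so sort by name for a stable layout.
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        ByteBuffer buffer = ByteBuffer.allocate(64); // large enough for this sketch
        try {
            for (Field field : fields) {
                Class<?> type = field.getType();
                if (type == int.class) {
                    buffer.putInt(field.getInt(pojo));
                } else if (type == double.class) {
                    buffer.putDouble(field.getDouble(pojo));
                } else {
                    throw new IllegalArgumentException("unsupported type: " + type);
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return Arrays.copyOf(buffer.array(), buffer.position());
    }

    // Specialized path: the field names and types are fixed in the code,
    // so there is no lookup or type dispatch per record.
    static byte[] serializeSpecialized(Point point) {
        return ByteBuffer.allocate(12).putInt(point.id).putDouble(point.value).array();
    }
}
```

Both methods produce the same bytes for a `Point`, but the specialized version consists of direct field reads, which is the kind of straight-line code a JIT compiler can typically inline more easily.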

We achieved a more than 6x performance improvement in serialization, which translated to about a 20% performance improvement in overall Flink jobs.

Link to Publication

http://dl.acm.org/citation.cfm?id=3098579&CFID=775126081&CFTOKEN=88871719

DOI

https://doi.org/10.1145/3098572.3098579

File attachments

preprint (a4-horvath.pdf)	464KiB
Slides (ICOOOLPS2017.pdf)	4.36MiB

Gábor Horváth

Eötvös Loránd University, Faculty of Informatics, Department of Programming Languages and Compilers

Norbert Pataki

Eötvös Loránd University, Faculty of Informatics, Department of Programming Languages and Compilers

Márton Balassi

Hungarian Academy of Sciences