
Data literacy is becoming increasingly important in the modern world. While spreadsheets make simple data analytics accessible to a large number of people, creating transparent scripts that can be checked, modified, reproduced and formally analyzed requires expert programming skills. In this paper, we describe the design of a data exploration language that makes the task more accessible by embedding advanced programming concepts into a simple core language.

The core language uses type providers, but we employ them in a novel way – rather than providing types with members for accessing data, we provide types with members that allow the user to also compose rich and correct queries using just member access ("dot"). This way, we recreate functionality that usually requires complex type systems (row polymorphism, type state and dependent typing) in an extremely simple object-based language.
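The idea of composing a query through member access alone can be illustrated with a small sketch. This is not the language or type provider described in the abstract: it is a Python approximation in which objects are generated from a table's columns, and each member returns another generated object offering only the operations that remain valid. All names (`provide_table`, `group_data`, `by_team`, `sum_gold`) and the sample rows are made up for illustration.

```python
# Hypothetical sketch of "dot-driven" query composition.
# Names and data are illustrative assumptions, not the paper's actual API.

ROWS = [  # made-up sample data
    {"athlete": "A", "team": "X", "gold": 2},
    {"athlete": "B", "team": "Y", "gold": 1},
    {"athlete": "C", "team": "X", "gold": 3},
]

def aggregate(rows, key, col):
    """Sum column `col` for each distinct value of column `key`."""
    totals = {}
    for r in rows:
        totals[r[key]] = totals.get(r[key], 0) + r[col]
    return totals

def provide_aggregate(rows, key, columns):
    """Generate an object with a sum_<col> member for each numeric, non-key column."""
    members = {}
    for col in columns:
        numeric = all(isinstance(r[col], (int, float)) for r in rows)
        if col != key and numeric:
            members["sum_" + col] = property(
                lambda self, c=col: aggregate(rows, key, c))
    return type("Aggregate", (), members)()

def provide_group_by(rows, columns):
    """Generate an object with a by_<col> member for each column."""
    members = {}
    for col in columns:
        members["by_" + col] = property(
            lambda self, c=col: provide_aggregate(rows, c, columns))
    return type("GroupBy", (), members)()

def provide_table(rows):
    """Entry point: generate the root object from the columns of the data."""
    columns = list(rows[0].keys())
    members = {
        "group_data": property(lambda self: provide_group_by(rows, columns)),
    }
    return type("Table", (), members)()

medals = provide_table(ROWS)
# The whole query is written with member access only.
result = medals.group_data.by_team.sum_gold
print(result)
```

In this sketch an invalid step such as `medals.group_data.by_team.sum_athlete` fails because no such member is generated for a non-numeric column. Python reports this only at run time; in a statically typed setting with type providers, the same mistake would presumably be rejected by the type checker and never offered by editor auto-completion.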

We formalize our approach using an object-based calculus and prove that programs constructed using the provided types represent valid data transformations. We discuss a case study developed using the language, together with additional editor tooling that bridges some of the gaps between programming and spreadsheets. We believe that this work provides a pathway towards democratizing data science – our use of type providers significantly reduces the complexity of languages that one needs to understand in order to write scripts for exploring data.

Tomas is a Visiting Researcher at the Alan Turing Institute, working on tools for open data-driven storytelling. He is building tools that integrate with modern data sources (open government data, data published by citizen initiatives) and let users easily create analyses and visualizations that are linked to the original data source, making the analyses more transparent and reproducible, but also easy to adapt. His early work on the project can be found at http://thegamma.net.

Tomas’ many other interests include open-source and functional programming (he is an active contributor to the F# ecosystem), programming language theory (his PhD thesis on “coeffects” develops a theory of context-aware programming languages), but also understanding programming through the perspective of philosophy of science.

Fri 23 Jun

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

13:40 - 15:20
Language and Library Design
ECOOP Research Papers at Auditorium, Vertex Building
Chair(s): Sophia Drossopoulou Imperial College London
13:40
25m
Talk
IceDust 2: Derived Bidirectional Relations and Calculation Strategy Composition
ECOOP Research Papers
Daco Harkes Delft University of Technology, Eelco Visser Delft University of Technology
14:05
25m
Talk
Mixed Messages: Measuring Conformance and Non-Interference in TypeScript
ECOOP Research Papers
Jack Williams University of Edinburgh, J. Garrett Morris University of Edinburgh, UK, Philip Wadler University of Edinburgh, UK, Jakub Zalewski
14:30
25m
Talk
EVF: An Extensible and Expressive Visitor Framework for Programming Language Reuse
ECOOP Research Papers
Weixin Zhang University of Hong Kong, Bruno C. d. S. Oliveira The University of Hong Kong
14:55
25m
Talk
Data exploration through dot-driven development
ECOOP Research Papers
Tomas Petricek Alan Turing Institute