Querying incomplete numerical data: between certain and possible answers
- Speaker(s): Liat Peterfreund
- Affiliation: CNRS, LIGM, Paris-Est University
- Date: May 17, 2023, 2:15 p.m.
- Room: 5050
- Seminar: Automata Theory
Queries with aggregation and arithmetic operations, as well as incomplete data, are common in real-world databases, but we lack a good understanding of how they should interact. On the one hand, systems based on SQL provide ad hoc rules for numerical nulls; on the other, theoretical research largely concentrates on the standard notions of certain and possible answers, which are often meaningless in the presence of numerical attributes and aggregates. In this work, we define a principled compositional framework for databases with numerical nulls and for answering queries with arithmetic and aggregation over them. We assume that missing values are given by probability distributions associated with marked nulls, which yields a model of probabilistic bag databases. We concentrate on queries that resemble standard SQL with arithmetic and aggregation, and show that they are measurable and that their outputs have a finite representation. Moreover, since the classical forms of answers provide little information in the numerical setting, we look at the probability that numerical values in output tuples belong to specific intervals. Even though computing these probabilities exactly is intractable, we give efficient approximation algorithms for them. The talk is based on joint work with Marco Console and Leonid Libkin, which will be presented at PODS 2023.
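To make the setting concrete, here is a minimal Python sketch of the idea of asking for the probability that an aggregate over marked nulls falls in an interval. It uses plain Monte Carlo sampling, which is an illustrative stand-in and not the approximation algorithm from the paper. The function name, the encoding of nulls as strings, and the salary data are all hypothetical.

```python
import random

def estimate_interval_probability(rows, null_samplers, aggregate,
                                  low, high, trials=20_000, seed=0):
    """Estimate P(low <= aggregate(completed bag) <= high) by sampling.

    rows          -- a bag (list) of numbers and marked nulls (strings)
    null_samplers -- maps each null name to a function rng -> float
                     (assumed encoding of the null's distribution)
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One valuation: each marked null gets a single value,
        # shared by all of its occurrences in the bag.
        valuation = {name: draw(rng) for name, draw in null_samplers.items()}
        completed = [valuation[v] if isinstance(v, str) else v for v in rows]
        if low <= aggregate(completed) <= high:
            hits += 1
    return hits / trials

# Hypothetical bag: two known salaries and the marked null "n1" occurring twice.
salaries = [50.0, 70.0, "n1", "n1"]
samplers = {"n1": lambda rng: rng.uniform(40.0, 80.0)}

# SUM = 120 + 2 * n1, so SUM lies in [220, 260] exactly when n1 lies in [50, 70].
p = estimate_interval_probability(salaries, samplers, sum, 220.0, 260.0)
print(p)
```

In this toy case the exact probability is 0.5, so the estimate should land close to that value. Note that the certain answer for SUM would be empty and the possible answers would form a whole range, which is the sense in which the classical notions say little here.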