Spreadsheet as a relational database engine
This page contains materials related to the paper
Jerzy Tyszkiewicz:
Spreadsheet As a Relational Database Engine,
in Proc. ACM SIGMOD International Conference on Management of Data:195-206, Indianapolis, 2010.
The following materials can be downloaded:
-
-
-
-
-
Spreadsheet implementation of the simpler query from the slides:
SELECT
name, AVG(income)
FROM
Incomes
GROUP BY name
HAVING COUNT(*)>3
|
xlsx |
xls |
Excel < 2007 | NO | YES |
Excel ≥ 2007 | YES> | YES |
OpenOffice 3.x | YES | YES |
gnumeric 1.10.x | NO | YES |
- The difference between .xls and .xlsx versions is ONLY the file format.
-
The spreadsheet can handle up to 500 rows of data from both relations, because it has 500 rows of formulas in all tabs.
- Tabs correspond to the operators in the query plan. Blue and green columns are their inputs, red one the outputs.
Spreadsheet implementation of the more complicated query from the slides:
SELECT
Families.id, Families.name, AVG(Incomes.income)
FROM
Families JOIN Incomes
ON Families.id=Incomes.id
GROUP BY Families.id, Families.name
HAVING COUNT(*)>3
|
xlsx |
xls |
Excel < 2007 | NO | YES |
Excel ≥ 2007 | YES> | YES |
OpenOffice 3.x | NO | YES |
gnumeric 1.10.x | NO | YES |
- The difference between .xls and .xlsx versions is the file format AND the use of SUMIFS and COUNTIFS functions, and that whole column arguments are replaced by a cell:cell type of reference.
-
The spreadsheet can handle up to 500 rows of data from both relations, because it has 500 rows of formulas in all tabs.
- Tabs correspond to the operators in the query plan. Blue and green columns are their inputs, red one the outputs.
- For the purpose of illustration, the implementation uses a many-to-many join. It is a good exercise to replace it with a many-to-one join (see below).
Excel implementations of all the operators of the relational algebra.
|
xlsx |
xls |
Excel < 2007 | NO | YES |
Excel ≥ 2007 | YES> | YES |
OpenOffice 3.x | NO | YES |
gnumeric 1.10.x | NO | YES |
- Inputs are blue and green, outputs red. Cells with yellow background should not be copied.
- Some of the operators require input to be in a specific form (e.g. many-to-many join requires input relations to be sorted on the column on which the join is performed). Read the paper to check the details.
- Operators expect empty rows in input columns to be filled with ="" formula, and are designed to work with numerical values in the columns.
- A good exercise might be to take the spreadsheet for the slides
query (point above) and replace the many-to-many join with a
many-to-one join, which is suficient in this case. If you do so, you
may eliminate sorting operations, since many-to-one join does not
require its input relations to bo sorted, unlike the many-to-many join.
- Version for earlier Excels, OpenOffice and gnumeric will be available soon.
If you make something interesting with it, please let me know: