Relational representation of RDF data by merging Characteristic Sets

Marios Meimaris, George Papastefanatos, Panos Vassiliadis

Summary

Characteristic sets (CSs) organize RDF triples based on the set of properties associated with their subject nodes. This concept has recently been used in indexing techniques, as it can capture the implicit schema of RDF data. While most CS-based approaches yield significant improvements in space and query performance, they fail to perform well when answering complex query workloads in the presence of schema heterogeneity, i.e., when the number of CSs becomes very large, resulting in a highly partitioned data organization. In this paper, we address this problem by introducing a novel technique for merging CSs based on their hierarchical structure. Our method employs a lattice to capture the hierarchical relationships between CSs, identifies dense CSs, and merges dense CSs with their ancestors, thus reducing the number of CSs as well as the links between them. We implemented our algorithm on top of a relational backbone, where each merged CS is stored in a relational table, and we performed an extensive experimental study to evaluate the impact of merging on the storage and querying of RDF datasets. The results indicate significant improvements.
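To make the idea concrete, the following is a minimal Python sketch of characteristic set extraction and a simplified merge step. It is an illustration under stated assumptions, not the algorithm from the paper: the function names, the sample triples, the density criterion (a CS is treated as dense when it has at least `min_subjects` subjects), the ancestor test (strict subset of properties), and the tie-break rule are all hypothetical choices made for this example.

```python
from collections import defaultdict


def characteristic_sets(triples):
    """Map each characteristic set (a frozenset of properties) to its subjects."""
    props_by_subject = defaultdict(set)
    for subject, prop, _obj in triples:
        props_by_subject[subject].add(prop)

    subjects_by_cs = defaultdict(set)
    for subject, props in props_by_subject.items():
        subjects_by_cs[frozenset(props)].add(subject)
    return dict(subjects_by_cs)


def merge_into_dense(subjects_by_cs, min_subjects):
    """Fold each non-dense CS into a dense CS whose properties strictly contain it.

    Assumption: 'dense' means at least `min_subjects` subjects. A CS with no
    dense strict superset keeps its own table.
    """
    dense = [cs for cs, subs in subjects_by_cs.items() if len(subs) >= min_subjects]
    tables = {cs: set(subjects_by_cs[cs]) for cs in dense}

    for cs, subs in subjects_by_cs.items():
        if cs in tables:
            continue
        candidates = [d for d in dense if cs < d]  # strict subset test
        if candidates:
            # Hypothetical tie-break: the smallest superset leaves the fewest
            # empty (NULL) columns for the merged rows.
            target = min(candidates, key=lambda d: (len(d), sorted(d)))
            tables[target] |= subs
        else:
            tables[cs] = set(subs)
    return tables


# Made-up sample data: (subject, property, object)
triples = [
    ("alice", "name", "Alice"), ("alice", "email", "a@x.org"), ("alice", "worksFor", "acme"),
    ("bob", "name", "Bob"), ("bob", "email", "b@x.org"), ("bob", "worksFor", "acme"),
    ("carol", "name", "Carol"), ("carol", "email", "c@x.org"),
    ("dave", "name", "Dave"),
    ("paper1", "title", "On RDF"),
]

cs = characteristic_sets(triples)
tables = merge_into_dense(cs, min_subjects=2)
```

In this sample, four characteristic sets collapse into two tables: the CSs {name} and {name, email} are folded into the dense CS {name, email, worksFor}, while {title} has no dense superset and stays separate. In a relational layout, the merged rows for carol and dave would carry NULLs in the columns they lack.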

Texts

Marios Meimaris, George Papastefanatos, Panos Vassiliadis. Hierarchical Property Set Merging for SPARQL Query Optimization. 22nd International Workshop On Data Warehousing and OLAP (DOLAP 2020), Copenhagen, Denmark, March 30, 2020. [Online proceedings at CEUR].

Local copy of the paper (PDF)

Presentations

Presentation for DOLAP 2020 paper available as (PDF) and (PPT).

Video for the DOLAP 2020 paper