OntoSQL
Usage
TODO Saturation only
So far, saturation can be run from the rdf-db jar, but the properties have to be in conf/dataLoading.properties: the path is hard-coded. See the example below.
mburon@mburon-pc [27-05-2021 18:11] /home/mburon/inria/code/rdf-sum4qa
└─>java -cp query-session-deps/rdf-db/target/RDFDBDirectory/ontosql-rdfdb-1.0.13-SNAPSHOT-with-dependencies.jar fr.inria.cedar.ontosql.rdfdb.graphsaturator.RDFGraphSaturator
18:12:13,749 ERROR DataLoading:256 - There is no properties file, you must provide one using -pf option.
mburon@mburon-pc [27-05-2021 18:12] /home/mburon/inria/code/rdf-sum4qa
└─>java -cp query-session-deps/rdf-db/target/RDFDBDirectory/ontosql-rdfdb-1.0.13-SNAPSHOT-with-dependencies.jar fr.inria.cedar.ontosql.rdfdb.graphsaturator.RDFGraphSaturator -pf run-exp.properties
18:12:27,406 ERROR DataLoading:256 - There is no properties file, you must provide one using -pf option.
mburon@mburon-pc [27-05-2021 18:12] /home/mburon/inria/code/rdf-sum4qa
└─>mkdir conf
mburon@mburon-pc [27-05-2021 18:15] /home/mburon/inria/code/rdf-sum4qa
└─>cp run-exp.properties conf/dataloading.properties
mburon@mburon-pc [27-05-2021 18:16] /home/mburon/inria/code/rdf-sum4qa
└─>mv conf/dataloading.properties conf/dataLoading.properties
mburon@mburon-pc [27-05-2021 18:16] /home/mburon/inria/code/rdf-sum4qa
└─>java -cp query-session-deps/rdf-db/target/RDFDBDirectory/ontosql-rdfdb-1.0.13-SNAPSHOT-with-dependencies.jar fr.inria.cedar.ontosql.rdfdb.graphsaturator.RDFGraphSaturator -pf run-exp.propertiesq
18:16:28,074 INFO DataLoading:259 - Loading configuration
18:16:28,083 INFO DataLoading:111 - Abort graph saturation (disabled in properties file)
mburon@mburon-pc [27-05-2021 18:16] /home/mburon/inria/code/rdf-sum4qa
└─>java -cp query-session-deps/rdf-db/target/RDFDBDirectory/ontosql-rdfdb-1.0.13-SNAPSHOT-with-dependencies.jar fr.inria.cedar.ontosql.rdfdb.graphsaturator.RDFGraphSaturator
18:18:06,721 INFO DataLoading:259 - Loading configuration
18:18:06,725 INFO DataLoading:85 - Fetching RDF type dictionary-encode value
18:18:06,962 INFO DataLoading:108 - RDF graph saturation took 100 ms
Known issues
TODO Story of Q09 on lubm 1M
The following conjunctive query (CQ), Q09, performs poorly in OntoSQL when reformulation is used.
Q09<$X, $OX, $EX, $Y, $OY, $TY> :-
    triple($X, <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#memberOf>, $OX),
    triple($X, <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#emailAddress>, $EX),
    triple($OX, <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#subOrganizationOf>, $SO),
    triple($X, $P, $SO),
    triple($SO, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, $SOC),
    triple($P, <http://www.w3.org/2000/01/rdf-schema#range>, $SOC),
    triple($SOC, <http://www.w3.org/2000/01/rdf-schema#subClassOf>, <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#Organization>),
    triple($X, <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#advisor>, $Y),
    triple($Y, <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#memberOf>, $OY),
    triple($OY, <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#subOrganizationOf>, $SO),
    triple($Y, <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#telephone>, $TY);
For example, this is one of its reformulations:
<> :-
    triple($X,<72646>,$OX),
    triple($X,<247727>,$EX),
    triple($OX,<32354>,$SO),
    triple($X,<193849>,$SO),
    triple($FV_3508,<193849>,$SO),    <-- redundant triple
    triple($X,<191723>,$Y),
    triple($Y,<15214>,$OY),
    triple($OY,<32354>,$SO),
    triple($Y,<173563>,$TY);
Removing the redundant triple greatly speeds up the evaluation of the corresponding SQL query below, from 2 s to 350 ms on the summary graph.
SELECT *
FROM summary_triples AS tt_7,
     summary_triples AS tt_4,
     --summary_triples AS tt_5,
     summary_triples AS tt_6,
     summary_triples AS tt_3,
     summary_triples AS tt_1,
     summary_triples AS tt_2,
     summary_triples AS tt_8,
     summary_triples AS tt_9
WHERE tt_7.p=15214
  AND tt_4.p=193849
  --AND tt_5.p=193849
  AND tt_6.p=191723
  AND tt_3.p=32354
  AND tt_1.p=72646
  AND tt_2.p=247727
  AND tt_8.p=32354
  AND tt_9.p=173563
  AND tt_4.s=tt_6.s
  AND tt_4.s=tt_1.s
  AND tt_4.s=tt_2.s
  AND tt_6.s=tt_2.s
  AND tt_1.s=tt_2.s
  AND tt_3.s=tt_1.o
  AND tt_7.s=tt_6.o
  AND tt_7.s=tt_9.s
  AND tt_6.o=tt_9.s
  AND tt_7.o=tt_8.s
  --AND tt_4.o=tt_5.o
  AND tt_4.o=tt_3.o
  AND tt_4.o=tt_8.o
  --AND tt_5.o=tt_3.o
  --AND tt_5.o=tt_8.o
We also observe some inelegant redundancy in the column equalities. A transitive reduction should be computed, but it does seem to change the performance.
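To make the redundancy concrete, here is a minimal Python sketch of one way to reduce a set of column equalities. It is not OntoSQL code: the function name reduce_equalities is hypothetical, and the technique is a union-find pass that keeps an equality only when it connects two columns not already known to be equal. For a symmetric relation such as equality, this yields a minimal set with the same transitive closure.

```python
def reduce_equalities(equalities):
    """Return a sub-list of `equalities` with the same equivalence classes.

    `equalities` is a list of (column, column) pairs, e.g. ("tt_4.s", "tt_6.s").
    An equality is kept only if its two sides are not already connected
    by the equalities kept so far (union-find with path halving).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = []
    for left, right in equalities:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_left] = root_right  # merge the two classes
            kept.append((left, right))
    return kept


# The five equalities on the subject columns taken from the SQL query above.
subject_equalities = [
    ("tt_4.s", "tt_6.s"),
    ("tt_4.s", "tt_1.s"),
    ("tt_4.s", "tt_2.s"),
    ("tt_6.s", "tt_2.s"),
    ("tt_1.s", "tt_2.s"),
]
print(reduce_equalities(subject_equalities))
# prints [('tt_4.s', 'tt_6.s'), ('tt_4.s', 'tt_1.s'), ('tt_4.s', 'tt_2.s')]
```

On these five equalities the last two are implied by the first three, so only three joins conditions remain in the WHERE clause for that group of columns.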
TODO Need for pre-pruning
You mean there is no triple in encoded with p = 2032005? (on lubm10m?)
TODO Apply one or several prunings before the computation of the UCQ cover
So far, the cover computation is embedded in the reformulation process, which may be a poor design choice. The cover computation is done for both approaches (CQ minimization and generalization), so it can be applied directly in the query session whenever we want.
I wonder whether there is any benefit in applying the pruning before the cover computation. For example, on Q14 we spend several seconds computing the cover (ref + cover, in fact) for 1152 reformulations, of which only 7 remain after pruning, so the cover computation would benefit from it.
It may be wise to prune at several levels of the reformulation computation. For example, we could prune between the reformulations /Rc and /Ra, or inside the reformulation /Ra, just after instantiating the classes and properties. This approach is not correct, however: since we prune with respect to the unsaturated graph, we cannot prune partial reformulations.
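The ordering discussed here (prune complete reformulations first, then compute the cover on what is left) can be sketched as follows in Python. This is only an illustration under strong assumptions: a CQ is modelled as a list of (subject, property, object) string triples with variables prefixed by $, the names prune_by_property and prune_then_cover are hypothetical, and the pruning test is a deliberately cheap necessary condition (a constant property with no triple in the graph means no answer), not the summary-based pruning itself. The cover computation is passed in as a callable because its implementation is not shown in these notes.

```python
def prune_by_property(reformulations, graph_properties):
    """Drop every CQ that mentions a constant property absent from the graph.

    Such a CQ cannot have any answer on that graph. Atoms whose property
    is a variable (prefixed by "$") are ignored by this test.
    """
    kept = []
    for cq in reformulations:
        constant_props = {p for (_, p, _) in cq if not p.startswith("$")}
        if constant_props <= graph_properties:
            kept.append(cq)
    return kept


def prune_then_cover(reformulations, graph_properties, cover):
    """Prune complete reformulations first, then run `cover` on the survivors."""
    return cover(prune_by_property(reformulations, graph_properties))


# Hypothetical input: two reformulations and the set of properties
# that occur in the (unsaturated) graph.
reformulations = [
    [("$X", "72646", "$OX"), ("$OX", "32354", "$SO")],
    [("$X", "2032005", "$OX"), ("$OX", "32354", "$SO")],
]
graph_properties = {"72646", "32354"}

# Identity stands in for the real cover computation.
survivors = prune_then_cover(reformulations, graph_properties, lambda cqs: cqs)
print(len(survivors))
```

Because only complete reformulations are tested, the sketch stays within the correctness constraint stated above; it does not prune partial reformulations.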
I was thinking of a crazy idea: if the saturation inversion theorem applies, we could prune inside the reformulation process using the summary of the saturated graph, in order to (i) guide the reformulation, and thus limit the reformulation time, and (ii) limit the number of reformulations explored, and thus limit the number of queries sent to the summary.
Analysis of the times of Damian's queries on lubm1m
I say that pruning performs poorly when the pruning time is greater than or equal to the evaluation time of the reformulation without pruning on the initial graph. This should not happen if the DBMS makes good choices.
- Q01 and Q02: poor pruning performance, with an average number of refs = 136
- Q03: good
- Q04: good
- Q05: good
- Q06: good
- Q07: good
- Q08: good
- Q09: hard query with nearly 11,667 refs; we get an error without pruning, and with pruning, more than 80% of the total time is spent in reformulation+cover and the rest in pruning; all reformulations are empty: 0 answers!
- Q10: good
- Q11: nothing to gain from pruning, 1 non-empty ref
- Q12: nothing to gain from pruning, 1 non-empty ref
- Q13: good
- Q14: pruning very effective: for 250 ms of pruning, the evaluation drops from 18.5 s to 1.1 s
- Q15: more than 1900 refs, 70% of the time (with pruning) spent on ref+cover, no answer, 100% of the refs pruned; pruning very effective: 2 s of pruning vs 200 s of evaluation without pruning.
- Q16: good
- Q17: good
- Q18: ok
- Q19: more than 1500 refs, 30% of the time for ref+cover (with pruning); pruning very useful: for 1 s of pruning, the evaluation time drops from 135 s to less than 3 s!
- Q20: ok
- Q21: ok
- Q22: ok
- Q23: poor pruning performance!
- Q24: poor pruning performance!
- Q25: poor pruning performance!
- Q26: poor pruning performance! 1272 refs; ref+cover = 50% of the time with pruning
- Q27: poor pruning performance!
- Q28: ref+cover time > timeout
Conclusion: on queries with many reformulations, too much time is spent on ref+cover and the pruning-based method loses -> poor pruning performance.
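The criterion used in this analysis (pruning time greater than or equal to the evaluation time without pruning) can be written as a one-line check. The Python sketch below is only illustrative: the function name is hypothetical, and the three timing pairs are the ones reported in the list for Q14, Q15 and Q19, converted to seconds.

```python
def pruning_performs_poorly(pruning_time_s, eval_time_without_pruning_s):
    """True when pruning costs at least as much as evaluating without it."""
    return pruning_time_s >= eval_time_without_pruning_s


# (pruning time, evaluation time without pruning), in seconds,
# as reported above for three of the queries.
reported = {
    "Q14": (0.25, 18.5),
    "Q15": (2.0, 200.0),
    "Q19": (1.0, 135.0),
}

for query, (pruning_s, eval_s) in reported.items():
    verdict = "poor" if pruning_performs_poorly(pruning_s, eval_s) else "worth it"
    print(query, verdict)
```

Note that this check compares pruning against evaluation only; the time spent in ref+cover, which dominates on the queries with many reformulations, is outside it.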