OntoSQL

Published on 26/06/2023

Usage

TODO Saturation only

Up to now, saturation can be run from the rdf-db jar, but the properties must be in conf/dataLoading.properties: the path is hard-coded, and in the session below passing a file with -pf alone was not enough. See the example below.

mburon@mburon-pc [27-05-2021 18:11] /home/mburon/inria/code/rdf-sum4qa 
└─>java -cp query-session-deps/rdf-db/target/RDFDBDirectory/ontosql-rdfdb-1.0.13-SNAPSHOT-with-dependencies.jar fr.inria.cedar.ontosql.rdfdb.graphsaturator.RDFGraphSaturator 
18:12:13,749 ERROR DataLoading:256 - There is no properties file, you  must provide one using -pf option.
mburon@mburon-pc [27-05-2021 18:12] /home/mburon/inria/code/rdf-sum4qa 
└─>java -cp query-session-deps/rdf-db/target/RDFDBDirectory/ontosql-rdfdb-1.0.13-SNAPSHOT-with-dependencies.jar fr.inria.cedar.ontosql.rdfdb.graphsaturator.RDFGraphSaturator -pf run-exp.properties
18:12:27,406 ERROR DataLoading:256 - There is no properties file, you  must provide one using -pf option.
mburon@mburon-pc [27-05-2021 18:12] /home/mburon/inria/code/rdf-sum4qa 
└─>mkdir conf
mburon@mburon-pc [27-05-2021 18:15] /home/mburon/inria/code/rdf-sum4qa 
└─>cp run-exp.properties conf/dataloading.properties
mburon@mburon-pc [27-05-2021 18:16] /home/mburon/inria/code/rdf-sum4qa 
└─>mv conf/dataloading.properties  conf/dataLoading.properties
mburon@mburon-pc [27-05-2021 18:16] /home/mburon/inria/code/rdf-sum4qa 
└─>java -cp query-session-deps/rdf-db/target/RDFDBDirectory/ontosql-rdfdb-1.0.13-SNAPSHOT-with-dependencies.jar fr.inria.cedar.ontosql.rdfdb.graphsaturator.RDFGraphSaturator -pf run-exp.propertiesq
18:16:28,074  INFO DataLoading:259 - Loading configuration
18:16:28,083  INFO DataLoading:111 - Abort graph saturation (disabled in properties file)
mburon@mburon-pc [27-05-2021 18:16] /home/mburon/inria/code/rdf-sum4qa 
└─>java -cp query-session-deps/rdf-db/target/RDFDBDirectory/ontosql-rdfdb-1.0.13-SNAPSHOT-with-dependencies.jar fr.inria.cedar.ontosql.rdfdb.graphsaturator.RDFGraphSaturator
18:18:06,721  INFO DataLoading:259 - Loading configuration
18:18:06,725  INFO DataLoading:85 - Fetching RDF type dictionary-encode value
18:18:06,962  INFO DataLoading:108 - RDF graph saturation took 100 ms

Known issues

TODO Story of Q09 on lubm 1M

The following conjunctive query (CQ) Q09 performs poorly in OntoSQL when reformulation is used.

Q09<$X, $OX, $EX, $Y, $OY, $TY> :-
   triple($X, <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#memberOf>, $OX),
   triple($X, <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#emailAddress>, $EX),
   triple($OX, <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#subOrganizationOf>, $SO),
   triple($X, $P, $SO),
   triple($SO, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, $SOC),
   triple($P, <http://www.w3.org/2000/01/rdf-schema#range>, $SOC),
   triple($SOC, <http://www.w3.org/2000/01/rdf-schema#subClassOf>, <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#Organization>),
   triple($X, <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#advisor>, $Y),
   triple($Y, <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#memberOf>, $OY),
   triple($OY, <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#subOrganizationOf>, $SO),
   triple($Y, <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#telephone>, $TY);

For example, here is one of its reformulations:

<> :- 
	triple($X,<72646>,$OX),
	triple($X,<247727>,$EX),
	triple($OX,<32354>,$SO),
	triple($X,<193849>,$SO),
	triple($FV_3508,<193849>,$SO), <-- redundant triple
	triple($X,<191723>,$Y),
	triple($Y,<15214>,$OY),
	triple($OY,<32354>,$SO),
	triple($Y,<173563>,$TY);
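
The redundant atom in the reformulation above can be detected mechanically. The following is a minimal Python sketch, not the OntoSQL implementation: it assumes set semantics, represents atoms as (subject, predicate, object) tuples, and only applies a sufficient condition (an atom is dropped when renaming the variables that occur nowhere else maps it onto another atom). The names drop_redundant_atoms and folds_onto are hypothetical.

```python
def is_var(term):
    """Variables are written with a leading '$', as in the queries above."""
    return term.startswith("$")


def folds_onto(atom, target, free_vars):
    """True if renaming only `free_vars` turns `atom` into `target`."""
    mapping = {}
    for term, goal in zip(atom, target):
        if term in free_vars:
            # A free variable must be renamed consistently.
            if mapping.setdefault(term, goal) != goal:
                return False
        elif term != goal:
            return False
    return True


def drop_redundant_atoms(body, head_vars=()):
    """Remove atoms subsumed by another atom, one at a time (hypothetical helper)."""
    body = list(body)
    changed = True
    while changed:
        changed = False
        for i, atom in enumerate(body):
            others = body[:i] + body[i + 1:]
            used_elsewhere = {t for other in others for t in other} | set(head_vars)
            # Only variables local to this atom (and absent from the head) may be renamed.
            free_vars = {t for t in atom if is_var(t) and t not in used_elsewhere}
            if any(folds_onto(atom, other, free_vars) for other in others):
                del body[i]
                changed = True
                break
    return body


# The Boolean reformulation shown above (empty head).
reformulation = [
    ("$X", "<72646>", "$OX"),
    ("$X", "<247727>", "$EX"),
    ("$OX", "<32354>", "$SO"),
    ("$X", "<193849>", "$SO"),
    ("$FV_3508", "<193849>", "$SO"),
    ("$X", "<191723>", "$Y"),
    ("$Y", "<15214>", "$OY"),
    ("$OY", "<32354>", "$SO"),
    ("$Y", "<173563>", "$TY"),
]
minimized = drop_redundant_atoms(reformulation)
print(len(reformulation), "->", len(minimized))  # 9 -> 8
```

Here $FV_3508 occurs in a single atom, so renaming it to $X folds that atom onto triple($X,<193849>,$SO). Atoms are removed one at a time so that two atoms that subsume each other are not both dropped.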

Removing the redundant triple considerably speeds up the evaluation of the corresponding SQL query below, from 2 s to 350 ms on the summary graph.

SELECT * FROM
summary_triples AS tt_7,
summary_triples AS tt_4,
--summary_triples AS tt_5,
summary_triples AS tt_6,
summary_triples AS tt_3,
summary_triples AS tt_1,
summary_triples AS tt_2,
summary_triples AS tt_8,
summary_triples AS tt_9
WHERE tt_7.p=15214
AND tt_4.p=193849
--AND tt_5.p=193849
AND tt_6.p=191723
AND tt_3.p=32354
AND tt_1.p=72646
AND tt_2.p=247727
AND tt_8.p=32354
AND tt_9.p=173563

AND tt_4.s=tt_6.s
AND tt_4.s=tt_1.s
AND tt_4.s=tt_2.s
AND tt_6.s=tt_2.s
AND tt_1.s=tt_2.s

AND tt_3.s=tt_1.o

AND tt_7.s=tt_6.o
AND tt_7.s=tt_9.s
AND tt_6.o=tt_9.s

AND tt_7.o=tt_8.s

--AND tt_4.o=tt_5.o
AND tt_4.o=tt_3.o
AND tt_4.o=tt_8.o
--AND tt_5.o=tt_3.o
--AND tt_5.o=tt_8.o

We also observe some inelegant redundancy in the equalities between columns. A transitive reduction should be computed, but it does seem to change the performance.
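
One way to compute such a reduction is sketched below in Python, under the assumption that all join conditions are plain column equalities: columns are grouped into equivalence classes with a union-find structure, and each class of n columns is re-emitted as a chain of n-1 equalities. The function name reduce_equalities is hypothetical and this is not how OntoSQL generates its SQL.

```python
def reduce_equalities(pairs):
    """Return a minimal list of equalities with the same transitive closure."""
    parent = {}

    def find(col):
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]  # path halving
            col = parent[col]
        return col

    # Merge the classes of the two sides of every equality.
    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_left] = root_right

    # Group columns by the representative of their class.
    classes = {}
    for col in list(parent):
        classes.setdefault(find(col), []).append(col)

    # Emit one chain of equalities per class.
    reduced = []
    for members in classes.values():
        members.sort()
        reduced.extend(zip(members, members[1:]))
    return reduced


# The five subject equalities of the SQL query above.
subject_equalities = [
    ("tt_4.s", "tt_6.s"),
    ("tt_4.s", "tt_1.s"),
    ("tt_4.s", "tt_2.s"),
    ("tt_6.s", "tt_2.s"),
    ("tt_1.s", "tt_2.s"),
]
for left, right in reduce_equalities(subject_equalities):
    print(f"AND {left}={right}")
```

On this input the five equalities over four columns shrink to three, which is the minimum needed to keep the four columns equal.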

TODO Need for pre-pruning

Do you mean that there is no triple in encoded with p = 2032005? (on lubm10m?)

[Screenshot: 2021-05-31_09-48-33_Capture d’écran de 2021-05-20 13-59-32.png]

TODO Apply one or several pruning steps before computing the UCQ cover

Up to now, the cover computation is embedded in the reformulation process, which may be a poor design choice. The cover is computed in both approaches (CQ minimization and generalization), so it could be applied directly in the query session, whenever we want. I wonder whether there is any benefit in pruning before the cover computation. For example, on Q14 we spend several seconds computing the cover (in fact ref + cover) for 1152 reformulations, of which only 7 remain after pruning, so the cover computation could benefit from it. It may also be wise to prune at several stages of the reformulation computation: for example, between the reformulations /Rc and /Ra, or inside the reformulation /Ra, just after instantiating the classes and properties. However, this last approach is not correct: since we prune with respect to the unsaturated graph, we cannot prune partial reformulations.
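
The ordering discussed above (prune complete reformulations first, then compute the cover on the survivors) can be sketched as follows in Python. This is an illustration under strong assumptions, not the OntoSQL code: pruning is modelled by a caller-supplied test that says whether a CQ may have answers on the summary, the cover keeps only the CQs not contained in another one, and the toy containment test is only valid for CQs that share variable names. All function names, and the predicate id <999>, are hypothetical.

```python
def cover(cqs, contained_in):
    """Keep only the CQs that are not contained in another kept CQ."""
    kept = []
    for cq in cqs:
        if any(contained_in(cq, other) for other in kept):
            continue  # cq adds no answers
        kept = [other for other in kept if not contained_in(other, cq)]
        kept.append(cq)
    return kept


def prune_then_cover(cqs, may_have_answers, contained_in):
    """Prune first, so that the cover only compares the surviving CQs."""
    survivors = [cq for cq in cqs if may_have_answers(cq)]
    return cover(survivors, contained_in)


# Toy stand-ins: a CQ is a frozenset of atoms, the summary is reduced to
# the set of predicates it contains, and containment is atom-set inclusion.
summary_predicates = {"<72646>", "<32354>"}


def may_have_answers(cq):
    return all(p in summary_predicates for (_, p, _) in cq)


def contained_in(cq, other):
    return other <= cq


refs = [
    frozenset({("$X", "<72646>", "$O")}),
    frozenset({("$X", "<72646>", "$O"), ("$O", "<32354>", "$S")}),
    frozenset({("$X", "<999>", "$O")}),  # hypothetical predicate absent from the summary
]
result = prune_then_cover(refs, may_have_answers, contained_in)
print(len(refs), "->", len(result))  # 3 -> 1
```

Since the cover compares reformulations with each other, shrinking its input from 1152 to 7 reformulations, as reported for Q14, is where the gain would come from.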

I was thinking of a crazy idea: if the saturation inversion theorem applies, we could prune within the reformulation process using the summary of the saturated graph, in order to (i) guide the reformulation, and thus limit the reformulation time, and (ii) limit the number of reformulations explored, and thus limit the number of queries sent to the summary.

Analysis of the running times of Damian's queries on lubm1m

I say that pruning performs badly when the pruning time is greater than or equal to the evaluation time of the reformulation without pruning on the initial graph. This should not happen if the DBMS makes good choices.

Q01 and Q02: bad pruning performance, with an average number of reformulations of 136
Q03: good
Q04: good
Q05: good
Q06: good
Q07: good
Q08: good
Q09: hard query with nearly 11,667 reformulations; we get an error both without and with pruning; more than 80% of the total time is spent in reformulation+cover and the rest in pruning; all reformulations are empty: 0 answers!
Q10: good
Q11: nothing to gain from pruning, 1 non-empty reformulation
Q12: nothing to gain from pruning, 1 non-empty reformulation
Q13: good
Q14: very effective pruning: for 250 ms of pruning, evaluation drops from 18.5 s to 1.1 s
Q15: more than 1900 reformulations, 70% of the time (with pruning) spent on ref+cover, no answer, 100% of the reformulations pruned; very effective pruning: 2 s of pruning vs 200 s of evaluation without pruning
Q16: good
Q17: good
Q18: ok
Q19: more than 1500 reformulations, 30% of the time spent on ref+cover (with pruning); pruning is very useful: for 1 s of pruning, the evaluation time drops from 135 s to less than 3 s!
Q20: ok
Q21: ok
Q22: ok
Q23: bad pruning performance!
Q24: bad pruning performance!
Q25: bad pruning performance!
Q26: bad pruning performance! 1272 reformulations; ref+cover = 50% of the time with pruning
Q27: bad pruning performance!
Q28: ref+cover time > timeout
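
The criterion stated before the list and the figures reported for Q14, Q15 and Q19 can be checked with a few lines of Python. This is only a worked calculation on the numbers quoted above: the function names are hypothetical, the ref+cover time is left out because it is paid in both settings, and the 3 s used for Q19 is the reported upper bound, so its speedup is a lower bound.

```python
def pruning_performs_badly(pruning_s, eval_without_pruning_s):
    """Criterion from the notes: pruning costs at least as much as plain evaluation."""
    return pruning_s >= eval_without_pruning_s


def speedup(pruning_s, eval_with_pruning_s, eval_without_pruning_s):
    """Evaluation time without pruning divided by pruning + pruned evaluation."""
    return eval_without_pruning_s / (pruning_s + eval_with_pruning_s)


# (pruning time, evaluation with pruning, evaluation without pruning), in seconds.
timings = {
    "Q14": (0.25, 1.1, 18.5),
    "Q19": (1.0, 3.0, 135.0),  # 3.0 is an upper bound ("less than 3 s")
}

for name, (pruning_s, pruned_eval_s, plain_eval_s) in timings.items():
    bad = pruning_performs_badly(pruning_s, plain_eval_s)
    print(name, "bad pruning:", bad, "speedup:", speedup(pruning_s, pruned_eval_s, plain_eval_s))

# Q15: 2 s of pruning against 200 s of evaluation without pruning.
print("Q15", "bad pruning:", pruning_performs_badly(2.0, 200.0))
```

Under these figures Q14 gains a factor of about 13.7 and Q19 at least 33.75, and none of the three queries meets the bad-performance criterion.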

Conclusion: on queries with many reformulations, too much time is spent on ref+cover and the pruning-based method loses, hence the bad pruning performance.