Description
Currently, we are storing the reconciled Wikidata QID in two different ways:
- We store the QID with exact match (P2888) whenever the dataset already has a URI for the entity (e.g. https://www.diamm.ac.uk/people/1).
- We replace all instances of a particular string (e.g. "J. S. Bach") with a QID if the dataset does not have a URI for that particular entity.
Case 1: Dataset contains URI for entity — using exact match (P2888)
When a dataset already defines a URI (e.g., https://www.diamm.ac.uk/people/1), we use the Wikidata property exact match (P2888) to link that local URI to the corresponding Wikidata entity. This approach ensures the original URI from the dataset is preserved.
For example, if https://www.diamm.ac.uk/people/1 was reconciled to https://www.wikidata.org/entity/Q1339, there would be a triple in our graph stating:
<https://www.diamm.ac.uk/people/1> wdt:P2888 <https://www.wikidata.org/entity/Q1339> .
However, all other statements related to this entity would use only the original dataset URI, never the Wikidata URI. For example:
<https://www.diamm.ac.uk/people/1> wdt:P569 "01-01-1200"^^xsd:dateTime .
<https://www.diamm.ac.uk/people/1> wdt:P1449 "Beltrandus de Francia" .
<https://www.diamm.ac.uk/sources/1> wdt:P50 <https://www.diamm.ac.uk/people/1> .
In this case, a SPARQL query must first retrieve the DIAMM URI and then follow exact match (P2888) from it to obtain the Wikidata QID.
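A minimal sketch of such a two-step query, using the example triples above. It assumes that `wdt:` is bound to the standard Wikidata direct-property namespace and that the triples are in the default graph. Neither is stated in this issue, so adjust the prefix or wrap the pattern in a `GRAPH` clause as needed.

```sparql
# Assumption: standard Wikidata direct-property namespace.
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?person ?wikidata
WHERE {
  # Step 1: the author is stored as a local DIAMM URI.
  <https://www.diamm.ac.uk/sources/1> wdt:P50 ?person .
  # Step 2: follow exact match (P2888) from the local URI to Wikidata.
  ?person wdt:P2888 ?wikidata .
}
```

Because the second pattern is not optional, authors stored directly as Wikidata URIs (Case 2 below) would not match this query.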
Case 2: Dataset does not contain URI for entity — replacing string with QID
This case applies to most values in our datasets, since it is much more common to have strings rather than URIs.
For example, if "Anonymous" (only string, no URI) was reconciled to https://www.wikidata.org/entity/Q4233718, the Wikidata URI would be directly placed within the triple:
<https://www.diamm.ac.uk/sources/1> wdt:P50 <https://www.wikidata.org/entity/Q4233718> .
<https://www.diamm.ac.uk/compositions/1> wdt:P86 <https://www.wikidata.org/entity/Q4233718> .
In this case, the SPARQL query retrieves the Wikidata URI directly.
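For comparison, a sketch of the direct query for this case, under the same assumptions as before (`wdt:` bound to the standard Wikidata direct-property namespace, triples in the default graph):

```sparql
# Assumption: standard Wikidata direct-property namespace.
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?composer
WHERE {
  # The object is already a Wikidata URI, so no P2888 hop is needed.
  <https://www.diamm.ac.uk/compositions/1> wdt:P86 ?composer .
}
```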
Problem with Having Two Different Schemas
Storing QIDs in two different ways confuses the LLM, since the two cases require two different SPARQL queries.
Another issue is that a SPARQL query can retrieve a mix of Wikidata URIs and local URIs, instead of retrieving the complete set of one or the other.
For example, this query retrieves a mix of Wikidata URIs and The Global Jukebox URIs:
SELECT ?culture
WHERE {
  GRAPH gj: {
    ?ensemble a gj:Ensemble ;
              wdt:P2596 ?culture .
  }
}
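To show what a query currently has to do to get a uniform result, here is a hedged sketch that covers both storage patterns at once. It is an illustration, not a proposed fix. The `gj:` namespace IRI is a hypothetical placeholder, and the sketch assumes that any exact match (P2888) triples are stored in the same graph as the ensemble data.

```sparql
# Assumption: standard Wikidata direct-property namespace.
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
# Hypothetical placeholder; substitute the real Global Jukebox namespace.
PREFIX gj:  <https://example.org/gj/>

SELECT ?ensemble ?cultureWikidata
WHERE {
  GRAPH gj: {
    ?ensemble a gj:Ensemble ;
              wdt:P2596 ?culture .
    # Case 1: ?culture is a local URI linked to Wikidata via exact match.
    OPTIONAL { ?culture wdt:P2888 ?match . }
  }
  # Case 2: ?culture is already a Wikidata URI, so fall back to it.
  BIND(COALESCE(?match, ?culture) AS ?cultureWikidata)
}
```

A local URI with no exact match still comes back unchanged. Such rows would need an extra filter, for example on the `https://www.wikidata.org/entity/` prefix used in the examples above.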