Course Project


Winter 2019
Instructor: Raghava Mutharaju
IIIT-Delhi

Logistics


  • Group project with team size of 2 or 3
    • If the team size is 1, you will automatically receive some extra credit
  • Each group works on a different dataset, and preferably a different domain as well
  • There will be intermediate deadlines
  • Expected deliverables
    • Demo (team)
    • Code, OWL file, query file (team)

Project Tasks


  1. Find the data that you want to model and query
  2. Build an ontology to model the data
  3. Convert the data into RDF triples that conform with the ontology you built
  4. Store the triples in a triple store
  5. Query (templates will be provided) the triples and show the results
  6. Add SHACL constraints (if time permits/extra credit)
  7. Develop an application on top of this structured data (extra credit)

Data


Ontology

  • Minimum requirements
    • 10 classes
    • 2 or 3 levels of class hierarchy
    • 3-5 object properties
    • 3-5 data properties
    • Property hierarchy
      • 1-2 subproperty relations
      • 1 property chain (encouraged but optional)
    • Domain and Range for all the properties
    • 3-5 Existential class expressions
    • 3 Disjoint classes
    • Use at least 1-2 ontology design patterns
  • Deadline: March 12, 2019, 11:59 pm
    • Submit the OWL file (it can be changed later)
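
As a sketch, a fragment of such an ontology in Turtle might cover several of the requirements at once (a hypothetical cooking domain matching the cr-ont prefix used later in these slides; all IRIs and names are placeholders):

```turtle
@prefix cr-ont: <http://example.org/cooking-ontology#> .
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .

# Class hierarchy (two levels)
cr-ont:Dish        a owl:Class .
cr-ont:VeganDish   a owl:Class ; rdfs:subClassOf cr-ont:Dish .
cr-ont:Ingredient  a owl:Class .

# Disjoint classes
cr-ont:Dish owl:disjointWith cr-ont:Ingredient .

# Object property with domain and range, plus a subproperty
cr-ont:hasIngredient a owl:ObjectProperty ;
    rdfs:domain cr-ont:Dish ;
    rdfs:range  cr-ont:Ingredient .
cr-ont:hasMainIngredient a owl:ObjectProperty ;
    rdfs:subPropertyOf cr-ont:hasIngredient .

# Existential class expression: every Dish has some Ingredient
cr-ont:Dish rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty cr-ont:hasIngredient ;
    owl:someValuesFrom cr-ont:Ingredient
] .
```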

RDF Graph

  • Convert the data in the CSV file(s) to a set of triples
  • This is your instance data
  • Import the ontology and use its classes and properties in the triples
    • cr-data:raita rdf:type cr-ont:Dish
    • cr-data:raita cr-ont:hasIngredient cr-data:curd
    • cr-data:curd rdf:type cr-ont:Ingredient
  • Ignore empty and NULL values in the CSV
  • Be careful with complex cell values (they might have to be broken down into multiple separate triples)
  • Make use of a CSV library such as Apache Commons CSV
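
As an illustration of breaking down a complex cell value, a row such as `raita,"curd; cucumber"` in a hypothetical recipes CSV could expand into several triples (prefixes and IRIs are placeholders matching the cr-ont/cr-data examples above):

```turtle
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix cr-ont:  <http://example.org/cooking-ontology#> .
@prefix cr-data: <http://example.org/cooking-data#> .

# Row: raita,"curd; cucumber" — the multi-valued ingredients
# cell is split on ';' and each value becomes its own triple
cr-data:raita    rdf:type cr-ont:Dish ;
                 cr-ont:hasIngredient cr-data:curd ,
                                      cr-data:cucumber .
cr-data:curd     rdf:type cr-ont:Ingredient .
cr-data:cucumber rdf:type cr-ont:Ingredient .
```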

Minimum Requirements

  • Mapping file containing the mapping between the column number/name in the CSV and the class/property in the ontology. This can be in any format
  • Number of triples in the RDF Graph should not be less than 500k
  • Make use of at least one existing vocabulary such as Dublin Core, FOAF, schema.org, SKOS, SIOC, CC, etc. Modify your ontology accordingly
  • Choose any triple store (next slide). There should be an even spread of triple stores among the project groups
  • Load the triples into the triple store
  • Deadline: April 10, 2019
  • Deliverable: Submit the following screenshots
    • Mapping file
    • Your code/program that includes the triple store connection string and the result of the SPARQL count query
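
A sketch of the SPARQL count query referred to above, to run after loading the data:

```sparql
# Counts all triples in the default graph;
# the result should be at least 500k
SELECT (COUNT(*) AS ?count)
WHERE { ?s ?p ?o }
```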

Triple Stores


  1. Jena TDB
  2. Virtuoso
  3. GraphDB
  4. Blazegraph
  5. Stardog (trial version available for 60 days)
  6. RDFox
  7. AllegroGraph
  8. RDF4J
  9. 4store
  10. Mulgara

SPARQL Queries

  • Form meaningful SPARQL queries based on your data/triples and the broad templates given in the next two slides
  • Use at least two variables in each query
  • Put each query in a separate text file
  • Write code to read each query from the file
  • Use an API to submit the query to the triple store and collect the results
  • Write the results to a file
  1. At least 3 triple patterns involved in subject-subject joins. This is called a star query. You can use FILTER to limit the results (in case there are more than 100 results)
  2. At least 3 triple patterns involved in subject-object joins. This is called a chain query. Get only 100 results using the LIMIT clause
  3. At least 3 triple patterns involved in object-object joins. This is another form of star query. You can use FILTER/LIMIT to restrict the number of results
  4. Two star queries with 3 triple patterns each, joined on a common object. You can use FILTER/LIMIT to restrict the number of results
  5. Two star queries with 3 triple patterns each, connected to each other like a chain, i.e., the subject/object of the first star query is connected to the object/subject of the other star query. You can use FILTER/LIMIT to restrict the number of results
  6. Query involving reasoning: assuming that A $\sqsubseteq$ B is in your ontology and a_1, a_2, ..., a_n are instances of A (and not of B) in your RDF graph, write a query to check whether a_1, a_2, ..., a_n are also instances of B
  7. Query involving reasoning: your ontology has p rdfs:domain C for an object property p, and s p o is in your RDF graph. Write a query to check whether s rdf:type C holds. If s rdf:type C is already asserted in your graph, delete that triple and then run the query
  8. Property path query involving a sequence and at least one triple pattern. The sequence should contain at least two properties
  9. Property path query involving an alternative and at least one triple pattern. The alternative should contain at least two properties
  10. ASK query with GROUP BY and HAVING clauses and at least 3 triple patterns, such that the output is true
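
As a sketch, the first two templates could look like the following against the cooking data used elsewhere in these slides (cr-ont:hasCuisine, cr-ont:grownIn, and cr-ont:Region are hypothetical names; per the instructions above, each query would go in its own file):

```sparql
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX cr-ont: <http://example.org/cooking-ontology#>

# Star query: three triple patterns joined on the subject ?dish
SELECT ?dish ?ing ?cuisine
WHERE {
  ?dish rdf:type cr-ont:Dish ;
        cr-ont:hasIngredient ?ing ;
        cr-ont:hasCuisine ?cuisine .     # hypothetical property
}
LIMIT 100
```

```sparql
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX cr-ont: <http://example.org/cooking-ontology#>

# Chain query: the object of each pattern is the subject of the next
SELECT ?dish ?ing ?region
WHERE {
  ?dish   cr-ont:hasIngredient ?ing .
  ?ing    cr-ont:grownIn ?region .       # hypothetical property
  ?region rdf:type cr-ont:Region .       # hypothetical class
}
LIMIT 100
```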

SHACL Validation

  1. Construct a shapes graph for the following
    • Validate the datatype of a string data property (check whether the values involved with this property are indeed what you asserted in your ontology). Make use of any of the following to further validate the values connected to this data property: minCount, maxCount, minLength, maxLength
    • Validate the type of object to which the object property is connected, and check whether that node is indeed an IRI
  2. Construct a shapes graph for the following
    • Either a) validate the datatype of a string data property and use a regular expression to check whether the values of this property are well-formed (e.g., phone numbers, zip codes), or b) if you have date or numeric datatypes, make use of lessThan or lessThanOrEquals (e.g., startDate is lessThan endDate) and validate the datatype to be numeric
    • Make use of closed and ignoredProperties
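
A minimal Turtle sketch of a shapes graph for the first task, reusing the hypothetical cooking ontology from earlier slides (cr-ont:dishName is a placeholder data property):

```turtle
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix cr-ont: <http://example.org/cooking-ontology#> .

cr-ont:DishShape
    a sh:NodeShape ;
    sh:targetClass cr-ont:Dish ;
    # Datatype plus cardinality/length checks on a string data property
    sh:property [
        sh:path cr-ont:dishName ;      # hypothetical data property
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxLength 100 ;
    ] ;
    # Object type check plus node-kind check on an object property
    sh:property [
        sh:path cr-ont:hasIngredient ;
        sh:nodeKind sh:IRI ;
        sh:class cr-ont:Ingredient ;
    ] .
```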

Libraries