Semantic Web: Mining Knowledge from the Web


Raghava Mutharaju
Knowledgeable Computing and Reasoning Lab
IIIT-Delhi

About Myself

Work Experience and Education


  • Assistant Professor (CSE), IIIT-D
  • Research Scientist, GE Research Center, New York
  • Internships at IBM Research, Bell Labs, Xerox Research, Stardog
  • Software Engineer, CA Technologies, Hyderabad

  • PhD from Wright State University
  • M.Tech from MNNIT, Allahabad
  • B.Tech from JNTU, Hyderabad

Research Interests


  • Semantic Web
    • Ontology Modelling
    • Ontology Reasoning
    • Knowledge Graphs
    • SPARQL Query Processing
  • Big Data
  • IoT

Introduction

Abdul Kalam Wiki
Image source: https://en.wikipedia.org/wiki/A._P._J._Abdul_Kalam
  1. Who was the president of India who was also a scientist?
  2. Who was the scientist who worked at DRDO and was a Bharat Ratna?
  3. Who was the president of India with an aerospace engineering background?
  4. Who worked at DRDO and ISRO and was called the “People’s President”?
  5. Which Bharat Ratna recipient was from Rameswaram?

Humans

  • can read the text
  • can understand the text
  • can make inferences based on the text
  • can answer the questions

Machines

  • can read the text
  • cannot understand the text
  • cannot make any inferences
  • may answer some of the questions

Knowledge Graph

Abdul Kalam KG
Knowledge Graph involving Abdul Kalam

Knowledge Graph


  • Capture the knowledge in a structured form
  • Machine processable
  • Inferences can be drawn
  • KG can be linked to other related KGs
    • Presidents of India
    • Books written by presidents
    • Details of DRDO and ISRO
  • Machines have a better chance of answering questions using KGs
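
To make the last point concrete, here is a minimal sketch of how a machine could answer two of the earlier questions once the facts are stored as triples. The entity and predicate names (such as `heldPosition` and `workedAt`) are assumptions chosen for illustration, not a standard vocabulary, and the graph is a hand-written toy.

```python
# Toy knowledge graph as (subject, predicate, object) triples.
# Entity and predicate names are illustrative assumptions, not a real vocabulary.
KG = {
    ("Abdul_Kalam", "heldPosition", "President_of_India"),
    ("Abdul_Kalam", "occupation", "Scientist"),
    ("Abdul_Kalam", "workedAt", "DRDO"),
    ("Abdul_Kalam", "workedAt", "ISRO"),
    ("Abdul_Kalam", "award", "Bharat_Ratna"),
    ("Abdul_Kalam", "bornIn", "Rameswaram"),
    ("Pratibha_Patil", "heldPosition", "President_of_India"),
    ("C_V_Raman", "occupation", "Scientist"),
    ("C_V_Raman", "award", "Bharat_Ratna"),
}


def subjects(kg, predicate, obj):
    """Return all subjects s such that (s, predicate, obj) is in the graph."""
    return {s for (s, p, o) in kg if p == predicate and o == obj}


def answer(kg, *conditions):
    """Return the entities that satisfy every (predicate, object) condition."""
    results = None
    for predicate, obj in conditions:
        matches = subjects(kg, predicate, obj)
        results = matches if results is None else results & matches
    return results or set()


# Question 1: a president of India who was also a scientist.
q1 = answer(KG, ("heldPosition", "President_of_India"), ("occupation", "Scientist"))

# Question 5: a Bharat Ratna recipient from Rameswaram.
q5 = answer(KG, ("award", "Bharat_Ratna"), ("bornIn", "Rameswaram"))
```

Each question becomes an intersection of simple lookups, which is why structured facts are easier for a machine to use than free text.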

Knowledge Graph

Abdul Kalam KG
Knowledge Graph involving Abdul Kalam

Domain(born in, Person)     Range(born in, Place)     Range(worked at, Organization)
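
The domain and range statements above let a machine infer the type of an entity from the relations it takes part in. The following is a rough sketch of that inference in plain Python, in the style of RDFS domain and range entailment. The camelCase predicate names and the `type` predicate are assumptions made for this example.

```python
# Schema statements from the slide; the camelCase predicate names are an assumption.
DOMAIN = {"bornIn": "Person"}
RANGE = {"bornIn": "Place", "workedAt": "Organization"}

FACTS = {
    ("Abdul_Kalam", "bornIn", "Rameswaram"),
    ("Abdul_Kalam", "workedAt", "DRDO"),
    ("Abdul_Kalam", "workedAt", "ISRO"),
}


def infer_types(facts, domain, range_):
    """Derive (entity, 'type', Class) triples from domain and range declarations."""
    inferred = set()
    for s, p, o in facts:
        if p in domain:
            # The subject of p must belong to the domain class of p.
            inferred.add((s, "type", domain[p]))
        if p in range_:
            # The object of p must belong to the range class of p.
            inferred.add((o, "type", range_[p]))
    return inferred


TYPES = infer_types(FACTS, DOMAIN, RANGE)
```

None of the type facts were stated explicitly; they follow from the schema, for example that Rameswaram is a Place and DRDO is an Organization.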

Knowledge Graph

Abdul Kalam KG
Expanded Knowledge Graph involving Abdul Kalam
Google Abdul Kalam KG
Google's knowledge card for Abdul Kalam

Knowledge Graph

Knowledge Graph
Image source: https://goo.gl/S2F3mH

Semantic Web

  • The purpose is to provide structure to the Web and to data in general
  • Move from a web of documents to a web of data
  • Proposed by Tim Berners-Lee in 1999
  • Semantic Web technologies and W3C standards
    • RDF (Resource Description Framework)
    • OWL (Web Ontology Language)
    • SPARQL (query language)
    • SHACL (Shapes Constraint Language)
LOD
Image source: https://lod-cloud.net/

RDF


  • Resource Description Framework (RDF) is a data model that is used to describe resources
    • Physical things
    • Abstract concepts
    • Numbers and strings
  • We use the terms resource and entity synonymously here
  • It is a universal, machine-readable data exchange format
  • Resources are described using triples (subject, predicate, object)
  • A triple captures the relationship (predicate) between a subject and an object
  • <Delhi> <capitalOf> <India>
  • An RDF triple is also called an RDF statement
  • A set of triples can be represented as a directed labelled graph
RDF Graph
Image source: https://www.w3.org/TR/rdf11-concepts/rdf-graph.svg
Sample RDF Graph
Image source: http://www.obitko.com/tutorials/ontologies-semantic-web/rdf-graph-and-syntax.html
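
As a small illustration of the triple model, the sketch below stores triples as Python tuples, builds the corresponding directed labelled graph, and writes the triples in N-Triples syntax. The `http://example.org/` namespace and the second triple are assumptions added for the example; only IRIs are handled, not literals or blank nodes.

```python
from collections import defaultdict

# Hypothetical namespace used only for illustration.
EX = "http://example.org/"

triples = [
    (EX + "Delhi", EX + "capitalOf", EX + "India"),
    (EX + "India", EX + "locatedIn", EX + "Asia"),
]


def to_graph(triples):
    """Build an adjacency map: subject node -> list of (edge label, object node)."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return dict(graph)


def to_ntriples(triples):
    """Serialize IRI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


graph = to_graph(triples)
document = to_ntriples(triples)
```

Subjects and objects become nodes and each predicate becomes a labelled edge from subject to object, so the same data can be read either as a list of statements or as a graph.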

Mining Knowledge

Abdul Kalam Wiki
Image source: https://en.wikipedia.org/wiki/A._P._J._Abdul_Kalam

Knowledge Graph Construction


  • Manual
    • Good quality KG (entities and relations)
    • Not scalable
  • Automatic
    • Quality unpredictable, but generally not good
    • Scalable: large amount of text, any domain

Automatic Knowledge Graph Construction


  • Steps
    • Part-of-speech tagging
      • Identify nouns, verbs, adjectives, etc.
    • Dependency parsing
      • Recognize the syntactic structure of the sentence along with the dependencies among the words
    • Named Entity Recognition
      • Identify person, location, organization, etc.
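
As a rough illustration of the named entity recognition step, the sketch below uses a dictionary (gazetteer) lookup. This is a deliberately simple stand-in: practical pipelines typically rely on trained models for tagging, parsing, and entity recognition. The gazetteer contents and label names are assumptions made for the example.

```python
import re

# Tiny hand-made gazetteer; a real system would use a trained model instead.
GAZETTEER = {
    "Abdul Kalam": "PERSON",
    "Rameswaram": "LOCATION",
    "India": "LOCATION",
    "DRDO": "ORGANIZATION",
    "ISRO": "ORGANIZATION",
}


def find_entities(text, gazetteer=GAZETTEER):
    """Return (mention, label, start offset) for every gazetteer hit, in text order."""
    # Longer names first so that multi-word names win over their substrings.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.group(1), gazetteer[m.group(1)], m.start()) for m in pattern.finditer(text)]


sentence = "Abdul Kalam was born in Rameswaram and worked at DRDO and ISRO."
entities = find_entities(sentence)
```

A lookup like this cannot recognize names it has never seen, which is one reason statistical models are preferred for open-domain text.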

Automatic Knowledge Graph Construction


  • Steps
    • Coreference resolution
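      • Find all expressions that refer to the same entity (e.g., the pronouns I, we, he, she, it)
    • Entity resolution and linking
      • Uniquely identify mentions that refer to the same entity
      • Examples: Obama, Barack Obama, President (one entity, several mentions); Paris, Amazon (one name, several possible entities)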
    • Information extraction
      • Define the domain (high level concepts and relations)
      • Learn extractors/templates
      • Rank/score the candidate facts
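
The entity linking and information extraction steps can be sketched with an alias table and a few hand-written templates. This is only a toy under stated assumptions: the alias table, the regular-expression templates, and the confidence scores are all invented for illustration, whereas real systems learn extractors and scores from data. Coreference resolution is not covered here.

```python
import re

# Alias table for entity linking; the canonical IDs are hypothetical.
ALIASES = {
    "Kalam": "Abdul_Kalam",
    "Abdul Kalam": "Abdul_Kalam",
    "A. P. J. Abdul Kalam": "Abdul_Kalam",
}

# Hand-written extraction templates: (regex, predicate, assumed confidence score).
TEMPLATES = [
    (re.compile(r"(?P<s>[A-Z][\w. ]*?) was born in (?P<o>[A-Z]\w+)"), "bornIn", 0.9),
    (re.compile(r"(?P<s>[A-Z][\w. ]*?) worked at (?P<o>[A-Z]\w+)"), "workedAt", 0.8),
]


def link(mention):
    """Map a surface mention to a canonical entity ID (falls back to the mention itself)."""
    mention = mention.strip()
    return ALIASES.get(mention, mention.replace(" ", "_"))


def extract(sentences):
    """Apply every template to every sentence; return scored candidate triples, best first."""
    candidates = []
    for sentence in sentences:
        for pattern, predicate, score in TEMPLATES:
            for m in pattern.finditer(sentence):
                candidates.append((link(m.group("s")), predicate, link(m.group("o")), score))
    return sorted(candidates, key=lambda c: c[3], reverse=True)


facts = extract([
    "Abdul Kalam was born in Rameswaram.",
    "Kalam worked at DRDO.",
])
```

Both sentences yield triples about the same canonical entity even though the surface mentions differ, and the score gives a simple way to rank or filter candidate facts.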

Automatic Knowledge Graph Construction

Existing OpenIE Systems


  • OpenIE 5.0
  • FRED
  • ClausIE
  • MinIE

  • Demo

List of papers/code/data


Challenges


  • Large scale benchmark for all the OpenIE systems. Results should be reproducible.
  • OpenIE systems for non-English languages
  • Handle all types of sentences from different domains
    • Long/short sentences, with clauses, numerical values, etc.


Conclusion

  • For machines to understand and make sense of data, the data has to be in a structured form
  • Knowledge can be captured and processed by machines if it is in the form of Knowledge Graphs
  • Automatically building Knowledge Graphs is hard
    • The quality of automatically extracted triples is often poor
    • Some of the open challenges were discussed