Introduction to Semantic Web


Raghava Mutharaju
Knowledgeable Computing and Reasoning Lab
IIIT-Delhi

About Myself

Work Experience and Education


  • Assistant Professor (CSE), IIIT-D
  • Research Scientist, GE Research Center, New York
  • Internships at IBM Research, Bell Labs, Xerox Research, Stardog
  • Software Engineer, CA Technologies, Hyderabad

  • PhD from Wright State University
  • M.Tech from MNNIT, Allahabad
  • B.Tech from JNTU, Hyderabad

Research Interests


  • Semantic Web
    • Ontology Modelling
    • Ontology Reasoning
    • Knowledge Graphs
    • SPARQL Query Processing
  • Big Data
  • IoT

Motivating Scenarios


  • Big Data
  • Information Retrieval

Big Data

Data Explosion
Image source: https://aci.info/2014/07/12/the-data-explosion-in-2014-minute-by-minute-infographic/

3Vs + 2Vs


  • Characteristics of Big Data
    • Volume
    • Velocity
    • Variety
    • Veracity
    • Value
Big Data 5Vs
Image source: https://www.edureka.co/blog/what-is-big-data/

Information Retrieval


Semantic Web


  • Two parts of the Semantic Web
    • Semantics
    • Web

History of the Web

Web 1.0


  • "Read only" Web
  • Static websites
  • Users cannot interact with the web page
  • This was the period before 1999

Web 2.0


  • "Read Write" Web
  • Users can also create content on the Web
  • Blogs and social media platforms such as YouTube, Twitter, Facebook, and Instagram
  • This was the period starting from 1999

Web 3.0


  • "Read Write Execute" Web
  • Machines can interpret the data and "understand" what it means
    • Number 42 in a Wiki page could refer to age, price, weight etc.
  • Semantic markup/annotations (schema.org) can be used to indicate the meaning of the data
  • Similar to the Web of documents, we can have Web of data (Linked Data)
  • This is referred to as the Semantic Web
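
The "number 42" example above can be sketched in a few lines of Python. This is only an illustration: the dictionary mimics a JSON-LD annotation using the schema.org terms Product, Offer, price, and priceCurrency, while the product name, the currency, and the helper meaning_of_42 are made up for this sketch.

```python
import json

# Hypothetical annotation: the bare number 42 is tagged as a price
# using schema.org vocabulary terms.
annotation = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Book",  # made-up product name
    "offers": {
        "@type": "Offer",
        "price": 42,
        "priceCurrency": "USD",  # assumed currency
    },
}


def meaning_of_42(doc):
    """Return the property names under which the value 42 appears."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if value == 42:
                    found.append(key)
                walk(value)

    walk(doc)
    return found


# The serialized form is roughly what would be embedded in a web page.
markup = json.dumps(annotation, indent=2)
print(meaning_of_42(annotation))
```

Without the annotation a program sees only the number 42; with it, the program can look up the property name and treat the value as a price.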

The Semantic Web Vision

  • Pete wants to book an appointment with a doctor who is nearby and has good ratings
  • The scheduling software agent needs to communicate with the following and suggest the best possible plan
    • Pete's calendar
    • List of doctors/healthcare providers and their addresses
    • Distance between his home and the doctor's address
    • Healthcare provider should be covered by Pete's insurance

The Semantic Web. Tim Berners-Lee, James Hendler, Ora Lassila. Scientific American. May 2001.

Understanding Data


  • Humans can read the text on a webpage and understand it, but a machine cannot
  • Unless a machine can understand the data and interact with other machines/agents, Pete's task cannot be automated
  • Vocabularies (more formally, ontologies) can be used to annotate the data and make it more understandable for a machine
    • Ontology is a shared understanding of the World
  • Individual data silos can be connected together to form a Web of Data, called Linked Data

TimBL's TED Talk on Linked Data

YouTube video link: https://www.youtube.com/watch?v=OM6XIICm_qo

Are We There Yet?


  • Pete's task
    • Google Assistant
    • Cortana
    • Siri

  • It is not yet possible to completely automate Pete's task

Let's come to the Semantics part

Artificial Intelligence

  • It is the study of the general principles of building intelligent agents.
  • An agent is any device that can perceive its environment through sensors and react to it by taking action to achieve a stated goal.
  • Mimicking human senses, i.e., sight, sound, touch, smell, taste, is a form of perception.
  • Intelligent agents should be able to interpret and process other forms of input such as text, semi-structured, and structured data.
  • Output of an intelligent agent could be in multiple forms such as movement (reaching a destination), decision taken, sound etc.
  • Several applications of AI have not only improved our day-to-day lives but also help save lives
    • Google Maps
    • Web Search
    • Intelligent Assistants (Siri, Cortana etc.)
    • Targeted advertising
    • Diagnosis of diseases
    • Autonomous vehicles
    • Playing games (Jeopardy, Chess, Go etc.)
    • Robots

Subfields of AI


  • Planning
  • Natural Language Processing
  • Learning (Machine Learning)
  • Computer Vision
  • Robotics
  • Knowledge Representation and Reasoning
  • Artificial Neural Networks

Knowledge Representation and Reasoning (KRR)

  • Techniques to capture knowledge about the world in a form that machines can understand
    • A Car is a type of Vehicle
    • Car has exactly four wheels
    • Car has at least two doors and at most four doors
  • Reasoning is the process of deriving new facts (knowledge) based on existing facts
    • All birds fly
    • Pigeon is a bird
    • Can Pigeon fly?
  • KRR is the field of AI that helps an agent to use what it knows (background knowledge) to decide what to do
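
The bird example above can be sketched as a tiny forward-chaining reasoner. The tuple encoding of facts and rules, and the names is_a, can, and infer, are assumptions chosen for this illustration and are not a standard KRR format.

```python
# Known fact from the slide: Pigeon is a bird.
facts = {("Pigeon", "is_a", "Bird")}

# Rule from the slide: "All birds fly".
# Each rule maps a condition (predicate, object) to a conclusion (predicate, object).
rules = [(("is_a", "Bird"), ("can", "Fly"))]


def infer(facts, rules):
    """Apply the rules repeatedly until no new facts are derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (cond_pred, cond_obj), (new_pred, new_obj) in rules:
            for subj, pred, obj in list(derived):
                if pred == cond_pred and obj == cond_obj:
                    new_fact = (subj, new_pred, new_obj)
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived


closure = infer(facts, rules)
print(("Pigeon", "can", "Fly") in closure)
```

The fact that Pigeon can fly was never stated; it is derived from the existing fact and the rule, which is the sense of "reasoning" used here.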

KRR Formalisms


  • Different mechanisms to capture knowledge and reason over it
    • Frames
    • Semantic Nets
    • Logic
      • First Order Logic
      • Description Logics
      • Ontologies
      • Resource Description Framework (RDF)

Semantic Web

Three Themes


  1. Building Models
    • Describe the world in abstract terms to simplify its understanding
  2. Computing with Knowledge
    • Machines that can do logical deduction/inference from encoded knowledge in order to draw meaningful conclusions
  3. Exchanging Information
    • Transmission of complex information between machines that allows distribution, interlinking, and reconciliation of knowledge
    • RSS - some versions of RSS use RDF

Enabling Technologies


  • Purpose is to provide structure to the Web and to the data in general
  • Move from web of documents to web of data (Linked Data)
  • Semantic Web technologies and W3C standards
    • RDF (Resource Description Framework)
    • OWL (Web Ontology Language)
    • SPARQL (query language)
    • SHACL (Shapes Constraint Language)
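
To give a feel for what querying RDF data involves, here is a rough pure-Python analogue of matching a single triple pattern, the basic building block of a SPARQL query. It is not a SPARQL engine; the match function, the plain-string encoding of resources, and the sample data are assumptions made for this sketch.

```python
# Sample data: (subject, predicate, object) triples as plain strings.
triples = [
    ("Delhi", "capitalOf", "India"),
    ("Paris", "capitalOf", "France"),
]


def match(pattern, triples):
    """Return variable bindings for one triple pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Loosely corresponds to: SELECT ?city WHERE { ?city :capitalOf :India }
print(match(("?city", "capitalOf", "India"), triples))
```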

Linked Data


https://lod-cloud.net/

Structured Data (RDF) Demo


RDF Graph, Property Graph, Knowledge Graph

RDF Graph


  • Triples that describe any resource
  • <Delhi> <capitalOf> <India>
  • A set of triples forms a directed labelled graph: subjects and objects are nodes, predicates are edge labels
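
A minimal sketch of how triples give rise to a directed labelled graph, using plain Python tuples rather than an RDF library. The second triple and the build_graph helper are added here for illustration only.

```python
from collections import defaultdict

# Each triple is (subject, predicate, object).
triples = [
    ("Delhi", "capitalOf", "India"),
    ("India", "locatedIn", "Asia"),  # extra triple, added for illustration
]


def build_graph(triples):
    """Map each subject node to its outgoing (edge label, target node) pairs."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


graph = build_graph(triples)
print(graph["Delhi"])
print(graph["India"])
```

Because the object of one triple (India) is the subject of another, the two triples join into a single connected graph.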

Property Graph

Property Graph
Image source: https://neo4j.com/developer/graph-database/

Knowledge Graph

Knowledge Graph
Image source: https://goo.gl/S2F3mH

Knowledge Graph


  • There is no standard definition of Knowledge Graphs
  • It is a graph that captures knowledge in the form of entities, relationships between them, properties, and additional information including provenance
    • "Things" not strings. Things should have semantics.
    • E.g., what does it mean to be a "Person", an "Organization", etc.?
    • Things are entities that have properties and are connected by relationships
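
Since there is no standard definition, the following is only one possible sketch of the ingredients listed above: typed entities with properties, a labelled relationship between them, and provenance attached to the relationship. The class names, the field names, and all data values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    type: str  # the "thing" this entity is, e.g. Person or Organization
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    source: Entity
    label: str
    target: Entity
    provenance: str  # where this piece of knowledge came from


# Made-up example data.
person = Entity("Alice", "Person", {"age": 42})
org = Entity("ExampleCorp", "Organization")
works_for = Relationship(person, "worksFor", org, provenance="hypothetical HR record")

print(works_for.source.type, works_for.label, works_for.target.type)
```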

Applications


  • Knowledge Graphs are used in several domains and by several commercial enterprises
    • Healthcare
    • Geoscience
    • Industrial domains such as manufacturing, power, oil and gas
    • Web Search
    • Recommender systems
    • Conversational agents (chatbots, QA systems)
    • Google, Microsoft, Amazon, eBay, LinkedIn, GE, Accenture etc.

Gartner's Hype Cycle

Gartner's Hype Cycle of Emerging Technologies, 2018
Image source: https://goo.gl/LDGrwP

References


  • Textbook: Foundations of Semantic Web Technologies. Pascal Hitzler et al. CRC Press.
  • Reference book: Artificial Intelligence. Stuart Russell, Peter Norvig. Pearson.
  • Reference book: Knowledge Representation and Reasoning. Ronald Brachman, Hector Levesque. Morgan Kaufmann.