Swoogle

Dec 27, 2020

Swoogle can be thought of as a Google for the semantic web: it is a search engine for semantic web documents (SWDs). It began as a research project at the University of Maryland, Baltimore County (UMBC) and has since gained popularity as a system for collecting and retrieving semantic web documents.

Swoogle homepage accessible via http://swoogle.umbc.edu/

Any online document written in RDF or OWL can be considered a semantic web document. These documents typically have a .rdf or .owl extension, and Swoogle also accepts other extensions such as .rss, .n3 and .daml. Swoogle uses crawler-based indexing and retrieval for SWDs only; other documents such as HTML, PDF and image files are ignored by the crawler.
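To make the extension rule concrete, here is a minimal sketch of how a crawler might pre-filter URLs by extension. The function name `looks_like_swd` and the example URLs are made up for illustration, and this is not Swoogle's actual code; a real crawler would presumably also check the document's content, since an extension is only a hint.

```python
import posixpath
from urllib.parse import urlparse

# Extensions the article lists as accepted by Swoogle.
SWD_EXTENSIONS = {".rdf", ".owl", ".rss", ".n3", ".daml"}


def looks_like_swd(url: str) -> bool:
    """Return True if the URL path ends in one of the listed extensions.

    Hypothetical helper: it only looks at the extension, not the content.
    """
    path = urlparse(url).path          # drops any query string or fragment
    _, ext = posixpath.splitext(path)  # e.g. "/a/wine.owl" -> ".owl"
    return ext.lower() in SWD_EXTENSIONS


# Made-up candidate URLs for illustration.
candidates = [
    "http://example.org/ontologies/wine.owl",
    "http://example.org/people/alice.RDF",
    "http://example.org/index.html",
    "http://example.org/paper.pdf",
]

# Keep only the URLs that look like semantic web documents.
swd_urls = [u for u in candidates if looks_like_swd(u)]
print(swd_urls)
```

The HTML and PDF links are dropped, which mirrors the behaviour described above where non-SWD documents are ignored.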

Now, let’s look at how Swoogle can be used.

1. Ontology search for reuse

When describing a resource in an RDF document, existing URIs for that resource should be reused as much as possible to facilitate intelligent decisions. This prevents multiple ontologies for the same resource from being added to the web haphazardly and promotes a common understanding of the resource, which is one of the basic ideas of the semantic web. Ontology reuse can therefore be seen as a major use of Swoogle.

Swoogle helps us search for available ontologies in a given domain that match our needs. Queries use key terms, much like a Google search, and all matching ontologies are returned. Unlike Google, however, the search results are links that point to SWDs.

2. Finding specific instance data

As with ontology reuse, we are also interested in reusing the URIs of instances that already exist. Swoogle supports this through searches constrained by the classes and properties that the resources use.
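The idea of searching with constraints on classes and properties can be sketched over a small in-memory set of triples. This is not Swoogle's query interface; `find_instances` and the example.org data are assumptions for illustration, while the FOAF and rdf:type URIs are the standard ones.

```python
from typing import List, NamedTuple, Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Made-up instance data that reuses FOAF terms.
TRIPLES = [
    Triple("http://example.org/people#alice", RDF_TYPE, FOAF + "Person"),
    Triple("http://example.org/people#alice", FOAF + "name", "Alice"),
    Triple("http://example.org/people#bob", RDF_TYPE, FOAF + "Person"),
    Triple("http://example.org/orgs#acme", RDF_TYPE, FOAF + "Organization"),
    Triple("http://example.org/orgs#acme", FOAF + "name", "Acme"),
]


def find_instances(
    triples: List[Triple], class_uri: str, property_uri: Optional[str] = None
) -> List[str]:
    """Return instance URIs of the given class, optionally requiring a property.

    Hypothetical helper, not part of Swoogle.
    """
    matches = {
        t.subject for t in triples
        if t.predicate == RDF_TYPE and t.obj == class_uri
    }
    if property_uri is not None:
        # Keep only instances that also use the requested property.
        matches &= {t.subject for t in triples if t.predicate == property_uri}
    return sorted(matches)


# All persons, then only the persons that have a foaf:name.
print(find_instances(TRIPLES, FOAF + "Person"))
print(find_instances(TRIPLES, FOAF + "Person", FOAF + "name"))
```

Adding the property constraint narrows the result from both people to only the one described with a name, which is the kind of filtering that makes an existing instance URI easier to find and reuse.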

3. Navigation in the semantic web

Swoogle’s crawler collects metadata about each SWD it crawls, such as the document type, whether the document is embedded in another document, whether the document is legal, and the namespaces it uses.

These namespaces make it possible to navigate to other documents that use the same namespace, which helps in understanding how the semantic web is linked internally.
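Namespace-based navigation can be pictured as an index from each namespace to the documents that use it. The sketch below is an assumption about how such an index could look, not a description of Swoogle's internals; the document URLs are made up, and splitting a URI at the last "#" or "/" is a common heuristic rather than a rule.

```python
from collections import defaultdict
from typing import Dict, List, Set


def namespace_of(uri: str) -> str:
    """Split a term URI at its last '#', or failing that its last '/'."""
    for sep in ("#", "/"):
        idx = uri.rfind(sep)
        if idx != -1:
            return uri[: idx + 1]
    return uri


# Made-up documents and the term URIs each one uses.
DOC_TERMS: Dict[str, List[str]] = {
    "http://example.org/a.rdf": [
        "http://xmlns.com/foaf/0.1/Person",
        "http://purl.org/dc/elements/1.1/title",
    ],
    "http://example.org/b.rdf": ["http://xmlns.com/foaf/0.1/name"],
    "http://example.org/c.owl": ["http://www.w3.org/2002/07/owl#Class"],
}


def build_namespace_index(doc_terms: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each namespace to the set of documents that use it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc, terms in doc_terms.items():
        for term in terms:
            index[namespace_of(term)].add(doc)
    return index


def related_documents(
    doc: str, doc_terms: Dict[str, List[str]], index: Dict[str, Set[str]]
) -> List[str]:
    """Return the other documents that share at least one namespace with doc."""
    related: Set[str] = set()
    for term in doc_terms[doc]:
        related |= index[namespace_of(term)]
    related.discard(doc)
    return sorted(related)


index = build_namespace_index(DOC_TERMS)
print(related_documents("http://example.org/a.rdf", DOC_TERMS, index))
```

Here a.rdf and b.rdf are linked because both use the FOAF namespace, while c.owl uses only OWL terms and has no neighbours in this tiny data set.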

Swoogle Architecture

Swoogle’s architecture can be broken into five main components:

> SWD discovery component

> Metadata collection component

> Data analysis component

> Indexation and retrieval component

> User interface

SWD Discovery

Swoogle uses APIs provided by Google to find potential SWDs by querying for documents ending in .rdf, .owl and .daml. The returned results are treated as seed URLs for Swoogle. Each seed URL is fed to a focused crawler, which visits only the website that contains the seed and the pages directly linked from it. In this way the crawler finds more SWDs than Google alone would.
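The seed-and-focused-crawl idea can be sketched without any network access by walking a small, made-up link graph. This is one possible reading of the description above, not Swoogle's implementation: pages on the seed's own site are followed further, while off-site pages that are directly linked are visited but not expanded. The `LINKS` dictionary and `focused_crawl` are hypothetical.

```python
from collections import deque
from typing import Dict, List
from urllib.parse import urlparse

# Made-up link graph standing in for the web: page URL -> outgoing links.
LINKS: Dict[str, List[str]] = {
    "http://a.example/onto.owl": [
        "http://a.example/index.html",
        "http://b.example/data.rdf",
    ],
    "http://a.example/index.html": ["http://a.example/more.rdf"],
    "http://a.example/more.rdf": [],
    "http://b.example/data.rdf": ["http://b.example/deep.rdf"],
    "http://b.example/deep.rdf": [],
}

# Extensions used in the seed query described above.
SWD_SUFFIXES = (".rdf", ".owl", ".daml")


def focused_crawl(seed: str, links: Dict[str, List[str]]) -> List[str]:
    """Breadth-first crawl limited to the seed's site plus directly linked pages."""
    seed_host = urlparse(seed).netloc
    seen = {seed}
    queue = deque([seed])
    found: List[str] = []
    while queue:
        url = queue.popleft()
        if url.lower().endswith(SWD_SUFFIXES):
            found.append(url)
        # Off-site pages are visited but their links are not followed.
        if urlparse(url).netloc != seed_host:
            continue
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return found


print(focused_crawl("http://a.example/onto.owl", LINKS))
```

Starting from a single seed, the crawl picks up a second SWD on the same site and one on a directly linked site, but it never reaches deep.rdf, which sits one hop further into the other site.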

Swoogle also uses Swooglebot to find SWDs.

Metadata Collection

The metadata collection component collects three types of metadata:

> document metadata

> content metadata

> relation metadata

The calculation of rankings using metadata

Similar to Google’s PageRank, Swoogle has its own ranking algorithms, OntoRank and TermRank. OntoRank evaluates the importance of each document based on the metadata collected by the crawler, whereas TermRank is used to order the terms (such as classes and properties) found in those documents.
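Since the comparison is with PageRank, a small PageRank sketch may help build intuition. To be clear, this is the generic PageRank power iteration and not OntoRank itself, whose exact formula is not given here; the link graph, the damping factor of 0.85 and the iteration count are assumptions for illustration.

```python
from typing import Dict, List


def pagerank(
    links: Dict[str, List[str]], damping: float = 0.85, iterations: int = 50
) -> Dict[str, float]:
    """Generic PageRank by power iteration (illustrative, not OntoRank)."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Made-up graph: two data documents that both refer to one ontology.
DOC_LINKS = {
    "a.rdf": ["foaf.owl"],
    "b.rdf": ["foaf.owl"],
    "foaf.owl": [],
}

ranks = pagerank(DOC_LINKS)
for doc, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{doc}: {score:.3f}")
```

The ontology that both documents point to ends up with the highest score, which is the intuition behind ranking a semantic web document by how other documents refer to it.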