Created: August 21, 2009
Updated: September 25, 2009
QUOTES
"A Semantic Web Primer, Second Edition"
Grigoris Antoniou and Frank van Harmelen
2008
Cambridge, Massachusetts: The MIT Press
----------
(p. xvii)
Not surprisingly, this call by Tim Berners-Lee has received tremendous at-
tention by researchers and practitioners alike. There is now an International
Semantic Web Conference series, a Semantic Web Journal published by Else-
vier, as well as industrial committees that are looking at the first generation
of standards for the Semantic Web.
----------
(p. xvii)
The book of-
fers a gentle introduction to Semantic Web concepts, including XML, DTDs,
and XML schemas, RDF and RDFS, OWL, logic, and inference.
----------
(p. 2)
Also, results of Web searches are not readily accessible by other
software tools; search engines are often isolated applications.
----------
(p. 6)
Part of this direction involves wikis, collections of Web pages that allow
users to add content (usually structured text and hypertext links) via a
browser interface. Wiki systems allow for collaborative knowledge creation
because they give users almost complete freedom to add and change infor-
(p. 7)
mation without ownership of content, access restrictions, or rigid workflows.
----------
(p. 11)
However, in more recent years, ontology has become one of the many
words hijacked by computer science and given a specific technical meaning
that is rather different from the original one. Instead of "ontology" we now
speak of "an ontology." For our purposes, we will use T. R. Gruber's defini-
tion, later refined by R. Studer: An ontology is an explicit and formal specification
of a conceptualization.
----------
(p. 13)
And third, automated reasoners can deduce (infer) conclusions from the
given knowledge, thus making implicit knowledge explicit. Such reason-
ers have been studied extensively in AI.
----------
(p. 14)
For example, our previous examples in-
volved rules of the form, "If conditions, then conclusion," where conditions
and conclusion are simple statements, and only finitely many objects needed
to be considered. This subset of logic, called Horn logic, is tractable and
supported by efficient reasoning tools.
----------
(p. 15)
1.3.4 Agents
Agents are pieces of software that work autonomously and proactively. Con-
ceptually they evolved out of the concepts of object-oriented programming
and component-based software development.
----------
(p. 16)
1.3.5 The Semantic Web versus Artificial Intelligence
As we have said, most of the technologies needed for the realization of the
Semantic Web build upon work in the area of artificial intelligence. Given
that AI has a long history, not always commercially successful, one might
worry that, in the worst case, the Semantic Web will repeat AI's errors: big
promises that raise too high expectations, which turn out not to be fulfilled
(at least not in the promised time frame).
(p. 17)
This worry is unjustified. The realization of the Semantic Web vision does
not rely on human-level intelligence; in fact, as we have tried to explain, the
challenges are approached in a different way. The full problem of AI is a
deep scientific one, perhaps comparable to the central problems of physics
(explain the physical world) or biology (explain the living world). So seen,
the difficulties of achieving human-level Artificial Intelligence within ten or
twenty years, as promised at some points in the past, should not have come
as a surprise.
But on the Semantic Web partial solutions will work. Even if an intelligent
agent is not able to come to all conclusions that a human user might draw, the
agent will still contribute to a Web much superior to the current Web. This
brings us to another difference. If the ultimate goal of AI is to build an intel-
ligent agent exhibiting human-level intelligence (and higher), the goal of the
Semantic Web is to assist human users in their day-to-day online activities.
It is clear that the Semantic Web will make extensive use of current AI tech-
nology and that advances in that technology will lead to a better Semantic
Web. But there is no need to wait until AI reaches a higher level of achieve-
ment; current AI technology is already sufficient to go a long way toward
realizing the Semantic Web vision.
----------
(p. 18)
Figure 1.3 shows the "layer cake" of the Semantic Web (due to Tim Berners-
Lee), which describes the main layers of the Semantic Web design and vision.
----------
(p. 19)
This classical layer stack is currently being debated. Figure 1.4 shows an
alternative layer stack that takes recent developments into account. The main
differences, compared to the stack in figure 1.3, are the following:
(p. 20)
o The ontology layer is instantiated with two alternatives: the current stan-
dard Web ontology language OWL, and a rule-based language. Thus an
alternative stream in the development of the Semantic Web appears.
o DLP is the intersection of OWL and Horn logic, and serves as a common
foundation.
----------
(p. 21)
o Key technologies include explicit metadata, ontologies, logic and infer-
encing, and intelligent agents.
----------
(p. 25)
Today HTML (hypertext markup language) is the standard language in
which Web pages are written. HTML, in turn, was derived from SGML (standard
generalized markup language), an international standard (ISO 8879) for
the definition of device- and system-independent methods of representing
information, both human- and machine-readable.
----------
(p. 25)
Languages conforming to SGML are called SGML applications. HTML is
such an application; it was developed because SGML was considered far too
complex for Internet-related purposes. XML (extensible markup language) is
another SGML application, and its development was driven by shortcomings
of HTML.
----------
(p. 28)
Just as people cannot communicate effectively if they don't use a common
language, applications on the WWW must agree on common vocabularies
if they need to communicate and collaborate. Communities and business
sectors are in the process of defining their specialized vocabularies, creat-
ing XML applications (or extensions; thus the term extensible in the name of
XML). Such XML applications have been defined in various domains, for
example, mathematics (MathML), bioinformatics (BSML), human resources
(HRML), astronomy (AML), news (NewsML), and investment (IRML).
Also, the W3C has defined various languages on top of XML, such as SVG
and SMIL. This approach has also been taken for RDF (see chapter 3).
It should be noted that XML can serve as a uniform data exchange format
between applications. In fact, XML's use as a data exchange format between
applications nowadays far outstrips its originally intended use as document
markup language.
----------
(p. 29)
2.2.1 Prolog
The prolog consists of an XML declaration and an optional reference to ex-
ternal structuring documents. Here is an example of an XML declaration:
It specifies that the current document is an XML document, and defines the
version and the character encoding used in the particular system (such as
UTF-8, UTF-16, and ISO 8859-1). The character encoding is not mandatory,
but its specification is considered good practice.
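A minimal sketch of why declaring the encoding is good practice, using only Python's standard library. It parses the same ISO-8859-1 bytes twice, once with a declaration and once without. The element `city` and its content are hypothetical. Without a declaration (and without external information such as a byte order mark), an XML parser has to assume UTF-8, and here that assumption fails.

```python
import xml.etree.ElementTree as ET

# Hypothetical element; "ü" is encoded as the single byte 0xFC in ISO-8859-1.
BODY = "<city>Zürich</city>"
DECLARATION = '<?xml version="1.0" encoding="ISO-8859-1"?>\n'


def parse_latin1(with_declaration: bool) -> ET.Element:
    """Encode the document as ISO-8859-1 bytes and hand them to the parser."""
    text = (DECLARATION if with_declaration else "") + BODY
    return ET.fromstring(text.encode("iso-8859-1"))


# With the declaration, the parser decodes the bytes as ISO-8859-1.
print(parse_latin1(True).text)

# Without it, the parser assumes UTF-8, in which the byte 0xFC is invalid.
try:
    parse_latin1(False)
except ET.ParseError as err:
    print("parse error:", err)
```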
----------
(p. 29)
A reference to external structuring documents looks like this:
Here the structuring information is found in a local file called book.dtd.
Instead, the reference might be a URL. If only a locally recognized name or
only a URL is used, then the label SYSTEM is used. If, however, one wishes
to give both a local name and a URL, then the label PUBLIC should be used
instead.
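As a rough illustration, Python's `xml.parsers.expat` reports both parts of the prolog through callbacks. The root name `book`, the public identifier and the URL below are hypothetical. In XML syntax, a PUBLIC declaration carries a public identifier followed by a system identifier (usually a URL). A SYSTEM declaration carries only the system identifier.

```python
from xml.parsers import expat


def read_prolog(doc: str) -> dict:
    """Collect what expat reports about the XML declaration and the DOCTYPE."""
    info = {}
    parser = expat.ParserCreate()

    def on_xml_decl(version, encoding, standalone):
        info["version"], info["encoding"] = version, encoding

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info["root"], info["system"], info["public"] = name, system_id, public_id

    parser.XmlDeclHandler = on_xml_decl
    parser.StartDoctypeDeclHandler = on_doctype
    # expat does not validate and, by default, does not fetch the external DTD.
    parser.Parse(doc, True)
    return info


# Hypothetical documents: a local DTD file, and a public identifier plus a URL.
local = '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE book SYSTEM "book.dtd">\n<book/>'
public = ('<!DOCTYPE book PUBLIC "-//Example//DTD Book//EN" '
          '"http://www.example.org/dtd/book.dtd">\n<book/>')
print(read_prolog(local))
print(read_prolog(public))
```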
2.2.2 Elements
XML elements represent the "things" the XML document talks about, such
as books, authors, and publishers. They compose the main concept of XML
documents. An element consists of an opening tag, its content, and a closing
tag. For example,
David Billington
Tag names can be chosen almost freely; there are very few restrictions.
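The tags around "David Billington" were lost when these notes were transcribed. The sketch below assumes a tag name of `lecturer` purely for illustration. It parses such an element with Python's ElementTree, rebuilds it in code, and shows one of the few naming restrictions: a name may not begin with a digit.

```python
import xml.etree.ElementTree as ET

# Opening tag, content, closing tag ("lecturer" is an assumed tag name).
element = ET.fromstring("<lecturer>David Billington</lecturer>")
print(element.tag, "->", element.text)

# Build the same element in code and serialize it back to markup.
built = ET.Element("lecturer")
built.text = "David Billington"
markup = ET.tostring(built, encoding="unicode")
print(markup)

# One of the few restrictions: an element name may not start with a digit.
try:
    ET.fromstring("<2ndLecturer>x</2ndLecturer>")
except ET.ParseError as err:
    print("rejected:", err)
```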
----------
(p. 34)
There are two ways of defining the structure of XML documents: DTDs,
the older and more restricted way, and XML Schema, which offers extended
possibilities, mainly for the definition of data types.
2.3.1 DTDs
External and Internal DTDs
The components of a DTD can be defined in a separate file (external DTD) or
within the XML document itself (internal DTD). Usually it is better to use ex-
ternal DTDs, because their definitions can be used across several documents;
otherwise duplication is inevitable, and the maintenance of consistency over
time becomes difficult.
----------
(p. 47)
The same is true for XML documents, for
which there exist a number of proposals for query languages, such as XQL,
XML-QL, and XQuery.
----------
(p. 66)
Note that the first two formalizations include essentially an opposite nesting
although they represent the same information. So there is no standard way
of assigning meaning to tag nesting.
Although often called a "language" (and we commit this sin ourselves in
this book), RDF (Resource Description Framework) is essentially a data model.
Its basic building block is an object-attribute-value triple, called a statement.
----------
(p. 66)
The name RDF Schema is now
widely regarded as an unfortunate choice. It suggests that RDF Schema has a
similar relation to RDF as XML Schema has to XML, but in fact this is not the
case. XML Schema constrains the structure of XML documents, whereas RDF
Schema defines the vocabulary used in RDF data models.
----------
(p. 68)
URI schemes have been defined not only
for Web locations but also for such diverse objects as telephone numbers,
ISBN numbers, and geographic locations. There has been a long discussion
about the nature of URIs, even touching philosophical questions (for exam-
ple, what is an appropriate unique identifier for a person?), but we will not
go into detail here. In general, we assume that a URI is the identifier of a Web
resource.
3.2.2 Properties
Properties are a special kind of resources; they describe relations between
resources, for example "written by", "age", "title", and so on. Properties in
RDF are also identified by URIs (and in practice by URLs). This idea of using
URIs to identify "things" and the relations between them is quite impor-
tant. This choice gives us in one stroke a global, worldwide, unique naming
scheme. The use of such a scheme greatly reduces the homonym problem
that has plagued distributed data representation until now.
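A small sketch of how URIs avoid homonyms. Two properties can share the short name "title" and still be distinguished by their full URIs. The Dublin Core URI is real. The HR vocabulary, the subjects and the values are hypothetical.

```python
from urllib.parse import urldefrag

# Dublin Core "title": the title of a resource.
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
# Hypothetical vocabulary: a person's job title.
HR_TITLE = "http://www.example.org/hr#title"


def local_name(uri: str) -> str:
    """The short name after '#', or else after the last '/'."""
    fragment = urldefrag(uri).fragment
    return fragment or uri.rstrip("/").rsplit("/", 1)[-1]


statements = {
    ("http://www.example.org/report42", DC_TITLE, "Annual Report"),
    ("http://www.example.org/alice", HR_TITLE, "Senior Lecturer"),
}
short_names = {local_name(p) for _, p, _ in statements}
full_names = {p for _, p, _ in statements}
print(short_names)      # a single short name for two different properties
print(len(full_names))  # but two distinct property URIs
```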
----------
(p. 68)
Literals are atomic values (strings), the
structure of which we do not discuss further.
----------
(p. 68)
We can think of this triple (x, P, y) as a logical formula P(x,y), where the
binary predicate P relates the object x to the object y. In fact, RDF offers only
(p. 69)
binary predicates (properties).
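A minimal sketch of the reading of a triple (x, P, y) as the atom P(x, y). It assumes a graph is simply a set of triples. Prefixed names such as `ex:book1` stand in for hypothetical URIs.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement (x, P, y): object x, attribute P, value y."""
    x: str
    p: str
    y: str


def holds(graph: set, p: str, x: str, y: str) -> bool:
    """Read the triple (x, P, y) as the binary predicate P(x, y)."""
    return Triple(x, p, y) in graph


# Hypothetical resources and properties.
graph = {
    Triple("ex:book1", "ex:writtenBy", "ex:antoniou"),
    Triple("ex:book1", "ex:title", '"A Semantic Web Primer"'),
}
print(holds(graph, "ex:writtenBy", "ex:book1", "ex:antoniou"))
print(holds(graph, "ex:writtenBy", "ex:antoniou", "ex:book1"))
```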
----------
(p. 70)
The descriptions are given in a certain order; in other words, the XML
syntax imposes a serialization. The order of descriptions (or resources) is not
significant according to the abstract model of RDF. This again shows that the
graph model is the real data model of RDF and that XML is just a possible
serial representation of a graph.
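To illustrate that order is a property of the serialization and not of the graph, the sketch below compares two orderings of the same hypothetical triples. As lists they differ; as sets (graphs) they are equal.

```python
# Two serializations of the same hypothetical statements, in different orders.
serialization_a = [
    ("ex:book1", "ex:title", '"A Semantic Web Primer"'),
    ("ex:book1", "ex:publisher", "ex:mitPress"),
]
serialization_b = list(reversed(serialization_a))


def as_graph(triples):
    """Abstract away the serialization order: a graph is a set of triples."""
    return frozenset(triples)


print(serialization_a == serialization_b)                      # False: order differs
print(as_graph(serialization_a) == as_graph(serialization_b))  # True: same graph
```

This plain set comparison ignores blank nodes. Comparing graphs that contain blank nodes requires a graph-isomorphism check instead.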
----------
(p. 72)
Although the solution is
sound, the problem remains that the original predicate with three arguments
was simpler and more natural.
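Because RDF offers only binary predicates, a three-argument predicate has to be spread over several triples. A common encoding uses an auxiliary node. The sketch below uses that encoding, which may differ in detail from the book's own example, with a hypothetical `taught(lecturer, course, semester)` fact. It shows that one ternary atom becomes four triples.

```python
from itertools import count

_fresh = count(1)


def encode_ternary(predicate, a, b, c):
    """Encode predicate(a, b, c) as binary triples around a fresh auxiliary node."""
    node = f"_:n{next(_fresh)}"  # blank-node label chosen for illustration
    return {
        (node, "rdf:type", f"ex:{predicate}"),
        (node, "ex:arg1", a),
        (node, "ex:arg2", b),
        (node, "ex:arg3", c),
    }


def decode_ternary(triples, predicate):
    """Recover the ternary facts from the auxiliary-node encoding."""
    nodes = {s for s, p, o in triples if p == "rdf:type" and o == f"ex:{predicate}"}
    facts = set()
    for node in nodes:
        args = {p: o for s, p, o in triples if s == node}
        facts.add((args["ex:arg1"], args["ex:arg2"], args["ex:arg3"]))
    return facts


triples = encode_ternary("taught", "ex:lecturer1", "ex:course42", "ex:spring2009")
print(len(triples))                       # four triples for one ternary fact
print(decode_ternary(triples, "taught"))  # the original fact, recovered
```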
(p. 73)
Another problem with RDF has to do with the handling of properties. As
mentioned, properties are special kinds of resources. Therefore, properties
themselves can be used as the object in an object-attribute-value triple (state-
ment). While this possibility offers flexibility, it is rather unusual for model-
ing languages, and can be confusing for modelers.
Also, the reification mechanism is quite powerful and appears misplaced
in a simple language like RDF. Making statements about statements intro-
duces a level of complexity that is not necessary for a basic layer of the Se-
mantic Web. Instead, it would have appeared more natural to include it in
the more powerful layers, which provide richer representational capabilities.
Finally, the XML-based syntax of RDF is well suited for machine process-
ing but is not particularly human-friendly.
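A sketch of the overhead that reification adds. It uses the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). The statement being described and the `ex:claimedBy` property are hypothetical. Saying one thing about one statement takes five triples, and the reified statement itself is not asserted.

```python
def reify(statement, node):
    """Describe a statement with the RDF reification vocabulary (four triples)."""
    s, p, o = statement
    return {
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    }


# Hypothetical statement and a hypothetical claim about who asserted it.
original = ("ex:book1", "ex:writtenBy", "ex:antoniou")
graph = reify(original, "_:st1") | {("_:st1", "ex:claimedBy", "ex:alice")}
print(len(graph))         # five triples to say one thing about one statement
print(original in graph)  # False: reifying a statement does not assert it
```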
----------
(p. 73)
And ultimately the
Semantic Web will not be programmed in RDF, but rather with user-friendly
tools that will automatically translate higher representations into RDF.
----------
(p. 87)
On the other hand, this handling of
properties deviates from the standard approach that has emerged in the area
of modeling and object-oriented programming. It is another idiosyncratic
feature of RDF/RDFS.
----------
(p. 93)
3.5.8 Example: Motor Vehicles
Here we present a simple ontology of motor vehicles. The class relationships
are shown in figure 3.7.
----------
(p. 97)
The formal language we use is predicate logic, universally accepted as the
foundation of all (symbolic) knowledge representation. Formulas used in the
formalization are referred to as axioms.
----------
(p. 102)
3.8 A Direct Inference System for RDF and RDFS
As stated, the axiomatic semantics detailed in section 3.7 can be used for
automated reasoning with RDF and RDF Schema. However, it requires a
first-order logic proof system to do so. This is a very heavy requirement and
also one that is unlikely to scale when millions of statements are involved
(e.g., millions of statements of the form Type(?r, ?c)).
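A direct inference system applies simple rules until nothing new follows, without a general first-order prover. The sketch below implements two RDFS entailment rules from the W3C RDF Semantics specification: rdfs11 (subClassOf is transitive) and rdfs9 (types propagate to superclasses). It computes a naive fixpoint. The class names are hypothetical and loosely modeled on the motor-vehicle example.

```python
def rdfs_closure(triples):
    """Apply two RDFS rules until no new triples appear (naive fixpoint).

    rdfs11: (c subClassOf d), (d subClassOf e)  =>  (c subClassOf e)
    rdfs9:  (c subClassOf d), (r type c)        =>  (r type d)
    """
    graph = set(triples)
    while True:
        sub = {(s, o) for s, p, o in graph if p == "rdfs:subClassOf"}
        new = {(c, "rdfs:subClassOf", e) for c, d in sub for d2, e in sub if d == d2}
        new |= {(r, "rdf:type", d) for r, p, c in graph if p == "rdf:type"
                for c2, d in sub if c == c2}
        if new <= graph:
            return graph
        graph |= new


facts = {
    ("ex:miniVan", "rdfs:subClassOf", "ex:passengerVehicle"),
    ("ex:passengerVehicle", "rdfs:subClassOf", "ex:motorVehicle"),
    ("ex:myCar", "rdf:type", "ex:miniVan"),
}
closure = rdfs_closure(facts)
print(("ex:myCar", "rdf:type", "ex:motorVehicle") in closure)
```

Each round recomputes every rule over the whole graph. Semi-naive evaluation, which joins only against newly derived triples, is the usual refinement when millions of statements are involved.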
----------
(p. 105)
The SPARQL Query Language is a W3C Candidate Recommendation for
querying RDF, and as such is fast becoming the standard query language for
this purpose. At the time of writing, almost all major RDF query tools had
begun implementing support for the SPARQL query language. Even though
other query languages (e.g., SeRQL and RQL) have existed longer and
have a more mature implementation base and more expressive feature set,
they typically are supported by only one or two tools, hindering interoper-
ability. We therefore concentrate here on the SPARQL query language.
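At the core of a SPARQL SELECT query is the matching of triple patterns containing variables against the graph, joined on shared variables. The sketch below is not SPARQL. It is a hypothetical, minimal matcher for that core idea, and it answers a query equivalent to the one in the comment over made-up data.

```python
def match(pattern, graph, binding=None):
    """Yield extensions of `binding` that make `pattern` match a triple in `graph`."""
    binding = binding or {}
    for triple in graph:
        b = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):          # a variable: bind it or check consistency
                if b.setdefault(term, value) != value:
                    break
            elif term != value:               # a constant: must match exactly
                break
        else:
            yield b


def select(patterns, graph):
    """Join several triple patterns, like a SPARQL basic graph pattern."""
    results = [{}]
    for pattern in patterns:
        results = [b2 for b in results for b2 in match(pattern, graph, b)]
    return results


# Roughly: SELECT ?c WHERE { ?c a ex:Course . ?c ex:isTaughtBy ex:lecturer1 . }
graph = {
    ("ex:c1", "rdf:type", "ex:Course"),
    ("ex:c1", "ex:isTaughtBy", "ex:lecturer1"),
    ("ex:c2", "rdf:type", "ex:Course"),
    ("ex:c2", "ex:isTaughtBy", "ex:lecturer2"),
}
rows = select([("?c", "rdf:type", "ex:Course"),
               ("?c", "ex:isTaughtBy", "ex:lecturer1")], graph)
print(rows)
```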
----------
(p. 109)
o RDF has an XML-based syntax to support syntactic interoperability. XML
and RDF complement each other because RDF supports semantic inter-
operability.
----------
(p. 113)
However, the Web Ontology Working Group of W3C identified a number
of characteristic use cases of the Semantic Web that would require much
more expressiveness than RDF and RDF Schema offer.
----------
(p. 114)
4.2.1 Requirements for Ontology Languages
Ontology languages allow users to write explicit, formal conceptualizations
of domain models. The main requirements are a well-defined syntax, effi-
cient reasoning support, a formal semantics, sufficient expressive power and
convenience of expression.
----------
(p. 114)
Of course, it is questionable whether the XML-based RDF syntax is very
user-friendly; there are alternatives better suited to human users (for exam-
ple, see the OIL syntax). However, this drawback is not very significant
because ultimately users will be developing their own ontologies using au-
thoring tools, or more generally, ontology development tools, instead of writing
them directly in DAML+OIL or OWL.
----------
(p. 115)
A formal semantics and reasoning support are usually provided by map-
ping an ontology language to a known logical formalism, and by using auto-
mated reasoners that already exist for those formalisms. OWL is (partially)
mapped on a description logic, and makes use of existing reasoners such as
FaCT and RACER. Description logics are a subset of predicate logic for which
efficient reasoning support is possible.
----------
(p. 117)
This includes the possibility (also present in
RDF) of changing the meaning of the predefined (RDF or OWL) primitives
by applying the language primitives to each other. For example, in OWL
Full, we could impose a cardinality constraint on the class of all classes, es-
sentially limiting the number of classes that can be described in any ontology.
----------
(p. 117)
The disadvantage of OWL Full is that the lan-
guage has become so powerful as to be undecidable, dashing any hope of
complete (or efficient) reasoning support.
4.3.2 OWL DL
In order to regain computational efficiency, OWL DL (short for Description
Logic) is a sublanguage of OWL Full that restricts how the constructors from
OWL and RDF may be used.
----------
(p. 118)
4.3.3 OWL Lite
An even further restriction limits OWL DL to a subset of the language con-
structors. For example, OWL Lite excludes enumerated classes, disjointedness
statements, and arbitrary cardinality.
----------
(p. 119)
OWL builds on RDF and RDF Schema and uses RDF's XML-based syntax.
Since this is the primary syntax for OWL, we use it here, but RDF/XML does
not provide a very readable syntax. Because of this, other syntactic forms for
OWL have also been defined:
----------
(p. 129)
Unlike typical database systems, OWL does not adopt the unique-names as-
sumption; just because two instances have a different name or ID does not
imply that they are different individuals.
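A sketch of what dropping the unique-names assumption means in practice. Names are merged only when an `owl:sameAs` statement says so. Even then, the number of groups is only an upper bound on the number of individuals, because distinct groups may still denote the same one. The names are hypothetical.

```python
def candidate_individuals(names, same_as):
    """Group names, merging only those asserted equal via owl:sameAs.

    Without the unique-names assumption, the number of groups is an upper
    bound on the number of individuals, not an exact count.
    """
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for a, b in same_as:
        parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


names = ["ex:jSmith", "ex:johnSmith", "ex:maryJones"]
groups = candidate_individuals(names, [("ex:jSmith", "ex:johnSmith")])
print(len(names), "names, at most", len(groups), "individuals")
```

Under a unique-names assumption, the `owl:sameAs` assertion above would itself be a contradiction.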
----------
(p. 129)
Because such inequality statements occur frequently, and the required num-
ber of such statements would explode if we wanted to state the inequality of
a large number of individuals, OWL provides a shorthand notation to assert
the pairwise inequality of all individuals in a given list:
(p. 130)
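The "explosion" is quadratic. Separating every pair of n individuals with `owl:differentFrom` takes n(n-1)/2 statements, which is 3, 45 and 4950 for n = 3, 10 and 100. A single `owl:AllDifferent` statement listing the n members (via `owl:distinctMembers` in OWL 1) replaces them all. The individual names below are hypothetical.

```python
from itertools import combinations


def pairwise_different_from(individuals):
    """Every owl:differentFrom statement needed to separate all pairs."""
    return [(a, "owl:differentFrom", b) for a, b in combinations(individuals, 2)]


for n in (3, 10, 100):
    names = [f"ex:i{k}" for k in range(n)]
    print(n, "individuals need", len(pairwise_different_from(names)), "statements")
```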
----------
(p. 131)
4.5.2 OWL DL
In order to exploit the formal underpinnings and computational tractability
of Description Logics, the following constraints must be obeyed in an OWL
DL ontology:
----------
(p. 133)
This had led to the definition
of an interesting sublanguage of OWL DL, named OWL DLP. OWL
DLP is not part of the official W3C OWL species layering but is nevertheless
sufficiently interesting to deserve some discussion here.
----------
(p. 134)
Traditionally, systems such as databases and logic-programming systems
have tended to support closed worlds and unique names, whereas know-
ledge representation systems and theorem provers support open worlds and
non-unique names. Ontologies are sometimes in need of one and sometimes
in need of the other. Consequently, discussions can be found in the literature
and on the WWW about whether OWL should be more like a knowledge rep-
resentation system or more like a database system. This debate was nicely
resolved by Volz and Horrocks (see the Grosof, Horrocks, Volz and Decker
item in the further readings), who identified a fragment of OWL called DLP
(Description Logic Programming). This fragment is the largest fragment on
which the choice for CWA and UNA does not matter, see figure 4.3. That is,
OWL DLP is weak enough so that the differences between the choices don't
show up. The advantage of this is that people or applications that wish to
make different choices on these assumptions can still exchange ontologies in
OWL DLP without harm. Of course, as soon as they go outside OWL DLP,
they will notice that they draw different conclusions from the same state-
ments. In other words, they will notice that they disagree on the semantics.
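To make the CWA/OWA distinction concrete, the sketch below answers the same ground query under both assumptions. The facts are hypothetical. Under the closed-world assumption a missing fact is false. Under the open-world assumption it is unknown unless its negation is explicitly known.

```python
def ask(facts, known_false, atom, closed_world):
    """Answer a ground query: True, False, or None for 'unknown'."""
    if atom in facts:
        return True
    if atom in known_false or closed_world:
        return False
    return None  # open world: the fact is neither stated nor refuted


facts = {("ex:course1", "ex:isTaughtBy", "ex:lecturer1")}
query = ("ex:course1", "ex:isTaughtBy", "ex:lecturer2")
print(ask(facts, set(), query, closed_world=True))   # False: absent means false
print(ask(facts, set(), query, closed_world=False))  # None: absent means unknown
```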
----------
(p. 157)
Knowledge representation had been studied long before the emergence of
the World Wide Web, in the area of artificial intelligence and, before that,
in philosophy. In fact, it can be traced back to ancient Greece; Aristotle is
considered to be the father of logic. Logic is still the foundation of knowledge
representation, particularly in the form of predicate logic (also known as first-
order logic).
----------
(p. 158)
+ Predicate logic is unique in the sense that sound and complete proof systems
do exist. More expressive logics (higher-order logics) do not have
such proof systems.
----------
(p. 158)
Another subset of predicate logic with efficient proof systems comprises
the so-called rule systems (also known as Horn logic or definite logic programs).
A rule has the form
A1, ..., An -> B
where Ai and B are atomic formulas.
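For ground (variable-free) atoms, rules of the form A1, ..., An -> B can be evaluated by simple forward chaining: fire any rule whose body is already known, and repeat until nothing changes. The sketch below does this. The atoms are hypothetical, and a fact is written as a rule with an empty body.

```python
def forward_chain(rules, facts):
    """Derive every atom entailed by ground Horn rules (body, head) and facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and all(a in known for a in body):
                known.add(head)
                changed = True
    return known


rules = [
    ((), "enrolled(ann)"),                                # a fact: empty body
    (("enrolled(ann)",), "student(ann)"),
    (("student(ann)", "hasCard(ann)"), "mayBorrow(ann)"),
]
print(sorted(forward_chain(rules, {"hasCard(ann)"})))
```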
----------
(p. 159)
It is interesting to note that description logics and Horn logic are orthog-
onal in the sense that neither of them is a subset of the other. For example,
it is impossible to express the relation uncleOf(X,Y). This relation requires
the ability to constrain the value of the property brotherOf of one term (X)
to be the value of the property childOf of another term (Y). Stated another
way, the property brotherOf applied to X must produce a result that is also
a value of childOf when applied to Y. This "joining" of predicates is beyond
the expressive capabilities of OWL. On the other hand, this piece of
knowledge can easily be represented using rules:
brother(X,Y), childOf(Z,Y) -> uncle(X,Z)
On the other hand, rules cannot (in the general case) assert (a) nega-
tion/complement of classes; (b) disjunctive information, for instance, that a
person is either a man or a woman; (c) existential quantification, for instance,
that all persons have a father. In contrast, OWL is able to express complement
and union of classes and certain forms of existential quantification.
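The uncle rule is a join of two relations on the shared variable Y. The sketch below applies it once over hypothetical data. It reads brother(X,Y) as "X is a brother of Y" and childOf(Z,Y) as "Z is a child of Y".

```python
def uncle_rule(brother, child_of):
    """brother(X,Y), childOf(Z,Y) -> uncle(X,Z): join both relations on Y."""
    return {(x, z) for x, y in brother for z, y2 in child_of if y == y2}


brother = {("bob", "carl")}                    # bob is a brother of carl
child_of = {("dana", "carl"), ("eve", "fay")}  # dana is a child of carl
print(uncle_rule(brother, child_of))           # {('bob', 'dana')}
```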
----------
(p. 162)
In general, all variables occurring in
a rule are implicitly universally quantified.
In summary, a rule r
B1, ..., Bn -> A
is interpreted as the following formula, denoted by pl(r):
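The formula itself is not transcribed in these notes. Given the preceding sentence, which says all variables are implicitly universally quantified, the standard logic-programming reading of B1, ..., Bn -> A is the following. Here X1, ..., Xk are taken to be all the variables occurring in A, B1, ..., Bn. This is a reconstruction, not a copy of the book's page. The second line gives the equivalent clause form.

```latex
pl(r) \;=\; \forall X_1 \cdots \forall X_k \,\bigl( (B_1 \land \cdots \land B_n) \rightarrow A \bigr)
\;\equiv\; \forall X_1 \cdots \forall X_k \,\bigl( A \lor \lnot B_1 \lor \cdots \lor \lnot B_n \bigr)
```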
----------
(p. 167)
The computation of such most general witnesses is the primary aim of a
proof system, called SLD resolution, the presentation of which is beyond the
scope of this book.
5.5 Description Logic Programs (DLP)
As stated at the beginning of this chapter, Horn logic and description logics
are orthogonal. In attempting to achieve their integration in one framework,
the simplest approach is to consider the intersection of both logics, that is, the
part of one language that can be translated in a semantics-preserving way to
the other language, and vice versa. In our case, the "intersection"
of Horn logic and OWL is called Description Logic Programs (DLP); it is the
Horn-definable part of OWL, or stated another way, the OWL-definable part
of Horn logic.
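A hypothetical sketch of the "intersection" idea for a tiny fragment: class axioms SubClassOf(sub, sup) built from named classes, intersections ("And") and unions ("Or"). It covers no properties or restrictions. A union on the left and an intersection on the right both split into several Horn rules. A union on the right, such as "a person is either a man or a woman", has no Horn translation. Rules are printed in Prolog-style `head :- body` notation, which is equivalent to `body -> head` in these notes. The class names are made up.

```python
def to_horn(sub, sup):
    """Translate SubClassOf(sub, sup) into Horn rules, or None if not Horn-definable.

    Classes are names (str) or tuples ("And", ...) / ("Or", ...) over names.
    """
    # Left side: intersection -> one conjunctive body; union -> one rule per disjunct.
    if isinstance(sub, str):
        bodies = [[sub]]
    elif sub[0] == "And":
        bodies = [list(sub[1:])]
    else:
        bodies = [[c] for c in sub[1:]]
    # Right side: intersection -> one rule per conjunct; union cannot be a Horn head.
    if isinstance(sup, str):
        heads = [sup]
    elif sup[0] == "And":
        heads = list(sup[1:])
    else:
        return None
    return [f"{h}(X) :- " + ", ".join(f"{b}(X)" for b in body)
            for body in bodies for h in heads]


print(to_horn(("And", "Student", "Employee"), "WorkingStudent"))
print(to_horn(("Or", "Van", "Truck"), "MotorVehicle"))
print(to_horn("Person", ("Or", "Man", "Woman")))  # None: outside DLP
```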
----------
(p. 179)
o DLP and SWRL are two important ways of combining OWL with Horn
rules. DLP is essentially the intersection of OWL and Horn logic, whereas
SWRL is a much richer language.
----------
(p. 187)
Elsevier is experimenting with the possibility of providing access to multi-
ple information sources in the area of the life sciences through a single inter-
face, using EMTREE as the single underlying ontology against which all the
vertical information sources are indexed (see figure 6.1).
Semantic Web technology plays multiple roles in this architecture. First,
RDF is used as an interoperability format between heterogeneous data
sources.
----------
(p. 191)
Modern
browsers such as Mozilla Firefox have built-in RSS readers that allow subscribing
to RSS feeds attached to Web pages.
----------
(p. 191)
There is also an AJAX-based interface for browsing and searching the
publication collection (see figure 6.3) which builds queries and displays the
results.
----------
(p. 193)
For example, at the Vrije Universiteit Am-
sterdam an RDF front end to the local LDAP database has been developed
that represents its data as FOAF profiles.
----------
(p. 193)
The crawler in openacademia collects the FOAF profiles and publication
files. The crawling can be restricted to a specific domain, which can be useful
for limiting the data collection to the domain of an institute. All data are
subsequently stored in Sesame, an RDF database.
----------
(p. 195)
The query answering
requires publication data as well as information on department membership
and personal data from either the LDAP database or the self-maintained pro-
file of the researcher.
----------
(p. 196)
The first ontology (SWRC) de-
scribes different generic aspects of bibliographic metadata (and would be
valid across many different research domains); the second ontology (ACM
Topic Ontology.) describes specific categories of literature for the computer
science domain.
----------
(p. 197)
Ontologies help to measure the semantic similarity between
the different answers and to remove apparent duplicates as identified by the
similarity function.
----------
(p. 197)
The screenshot in figure 6.6 indicates how the use cases are realized in Bib-
ster.
----------
(p. 201)
This application scenario is now realistic enough that com-
panies like Unicorn (Israel), Ontoprise (Germany), Network Inference (UK),
and others are staking their business interests on this use of Semantic Web
technology.
----------
(p. 211)
SOAP, WSDL, UDDI, and BPEL4WS are the standard technology combination
to build a Web service application. However, they fail to achieve the
goals of automation and interoperability because they require humans in the
loop.
----------
(p. 220)
This section is based on a usecase from the OWL Requirements document
(see Suggested Reading).
----------
(p. 226)
As a consequence, there is no correct ontology of a
specific domain. An ontology is by necessity an abstraction of a particular
domain, and there are always viable alternatives.
----------
(p. 229)
Some ontologies are carefully crafted by a large team of experts over many
years. An example in the medical domain is the cancer ontology from the
National Cancer Institute in the United States. Examples in the cultural
domain are the Art and Architecture Thesaurus (AAT), containing 125,000
terms, and the Union List of Artist Names (ULAN), with 220,000 entries
on artists.
----------
(p. 231)
The general question of importing ontologies and establishing mappings
between different ontologies is still wide open, and is considered to be one of
the hardest (and most urgent) Semantic Web research issues.
----------
(p. 241)
Arguably the best current editor is Protege, but
we have also had good experiences with OILed, and OntoEdit.
----------
(p. 247)
Fallacy 2: The Semantic Web requires everybody to subscribe to a
single predefined meaning for the terms they use.
Of course, the meaning of terms cannot be predefined for global use; in addi-
tion, meaning is fluid and contextual. The motto of the Semantic Web is not
the enforcement of a single ontology but rather "let a thousand ontologies
blossom."
----------
ERRATA
(p. 18, 6th paragraph)
At the bottom we find XML
...should be...
Next to the bottom we find XML
[the bottom layer shown is Unicode/URI, not XML]
(pp. 113, 220)
(p. 113) use case
...is inconsistent with...
(p. 220) usecase
(p. 116, 1st bullet)
we cannot say that
...should be...
in RDF Schema we cannot say that
[the scope of the statement is not clear, and is incorrect as a general statement]
(pp. 134, 167)
(p. 134) DLP is defined as "Description Logic Programming"
...which is inconsistent with...
(p. 167) DLP is defined as "Description Logic Programs"
(p. 159, 1st full paragraph)
it is impossible to express the relation
...should be...
in OWL it is impossible to express the relation
[the scope of the statement is not clear, and is incorrect as a general statement]
(p. 196, last paragraph)
(ACM Topic Ontology.)
...should be...
(ACM Topic Ontology)
(p. 225, 1st paragraph)
How can tools and techniques best be appliled?
...should be...
How can tools and techniques best be applied?
(pp. 237, 238)
(p. 238) mentions the query language SPARQL
...which is inconsistent with...
(p. 237) Figure 7.1 shows RQL as the query language