Created: August 27, 2009
Updated: September 26, 2009

QUOTES

"Explorer's Guide to the Semantic Web"
Thomas B. Passin
2004
Greenwich, CT: Manning Publications Co.
----------
(p. xiii) I found the Semantic Web the hardest of all to get a handle on, because it seemed to range from obvious extensions of what I already knew on the one extreme, to extremely complex and advanced integrations of logic, semantics, and artificial intelligence on the other. I now know that I was not the only one to become bewildered, yet fascinated, by the Semantic Web.
----------
(p. xix) Because Extensible Markup Language (XML) plays a prominent role in some of the important technologies such as RDF, Topic Maps, and OWL, a reading acquaintance with XML will be useful.
----------
(p. xx) Mind-mapping is an informal way to capture ideas and associations between them.
----------
(p. xx) Purchase of Explorer's Guide to the Semantic Web includes free access to a private web forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the author and from other users. To access the forum and subscribe to it, point your web browser to www.manning.com/passin. This page provides information on how to get on the forum once you are registered, what kind of help is available, and the rules of conduct on the forum.
----------
(p. 2) There are many different ideas of what it is, not just one. In this chapter, we examine a range of ideas about what the Semantic Web should be. Some of them may seem futuristic or impractical, but a great deal of work is going on in all the areas we'll examine.
----------
(p. 3) In brief, the Semantic Web is supposed to make data located anywhere on the Web accessible and understandable, both to people and to machines. This is more a vision than a technology. In this book, we'll explore the technologies that will play roles in bringing the vision to life.
As you might expect, there are many different ideas about what this general vision encompasses. An almost overwhelming number of different ideas exists about the supposed nature of the Semantic Web, and that's the first lesson to learn: The Semantic Web is a fluid, evolving, informally defined concept rather than an integrated, working system.
----------
(p. 4)
o The automated infrastructure view--"In his recent Scientific American article Berners-Lee argues that the Semantic Web is infrastructure and not an application. We couldn't agree more." (Tuttle et al 2001) "Therefore, the real problem is the lack of an easy automation framework in the current Web..." (Garcia and Delgado 2001)
----------
(p. 5) Data about other data is often called meta data. For example, the ISBN number and the author's name are meta data about a novel.
----------
(p. 10) The Web is not the whole Internet, and it would be possible to develop many capabilities of the Semantic Web using other means besides the World Wide Web. But because the Web is so widespread, and because its basic operations are relatively simple, most of the technologies being contemplated for the Semantic Web are based on the current Web, sometimes with extensions.
----------
(p. 12) Other Internet protocols can be used, and additional messaging layers can be carried over HTTP as well (such as SOAP, whose name no longer stands for anything). There is some controversy over what methods should be used for the Web--as distinct from the Internet, which includes much more than the World Wide Web--and whether the Semantic Web architecture should restrict itself to the simpler architecture of the current Web.
----------
(p. 14) This diagram, sometimes called the "Semantic Web layer cake," has been reproduced often, and our own version of it is depicted in figure 1.1.
----------
(p. 17) However, the idea of meaning is complicated and has many levels, and RDF deals with only two of them: assigning properties to things and relating one thing to another.
----------
(p. 18) The Semantic Web is not a cut-and-dried, integrated technology. It's a concept of how computers, people, and the Web can work together more effectively than is possible now. Because it's visionary, it has no one definition.
----------
(p. 21) RDF and similar technologies can represent data and meta data equally well--they make no distinction between them.
----------
(p. 22) To play its role in describing data and meta data, RDF (and other potential languages for the role) needs to include the following capabilities:
o Able to describe most kinds of data that will be available
o Able to describe the structural design of data sets
o Able to describe relationships between bits of data
----------
(p. 22) That, in a nutshell, is a data model.
----------
(p. 23) A URI reference is a URI plus optional characters, such as the so-called fragment identifier (the part that follows the # sign after a URI, if any).
----------
(p. 25) Anonymous resources are also called blank nodes or b-nodes.
----------
(p. 33) As mentioned in section 2.1.3, such nodes are called anonymous nodes, or sometimes blank nodes or b-nodes.
----------
(p. 37) The detailed instructions are contained in XML files using the XUL (Extensible User interface Language) format.
----------
(p. 86) To the extent that advanced logical reasoning will become important to the Semantic Web (and many people think that it will be highly important), RDF will likely have an advantage over Topic Maps for some time to come.
----------
(p. 95) A Wiki (short for WikiWiki) is a site that lets its members (and, often, any reader) edit its articles and add comments.
----------
(p. 108) Library science knows many specialized distinctions between kinds of information to search for: inventory control, establishing the provenance of documents (their source or origin), locating similar items, finding similar sets (such as different editions of the same work), and many more.
----------
(p. 117) The project known as TAP (http://tap.stanford.edu/), being developed at Stanford University, has evolved a scheme for doing this that dovetails well with the design of RDF.
----------
(p. 118) RDF uses unique URIs as identifiers. But it's also possible to describe something without using its identifier: "my car is the small silver convertible parked near the end of the block."
----------
(p. 120) An approach that is frequently used in research papers and is now starting to show up on some search sites is to make a preliminary query using a web site like Google (or more than one site) and then to perform additional analysis. In (Amitay 2000), the work starts by searching for "Albert Einstein" on Google and other search sites. Each hit is analyzed by computer to find the best description of its contents, which are then presented to the user or filtered out. The hits are analyzed by looking for certain structural features in the pages; other research uses different strategies.
----------
(p. 125) Bloggers link to each others' blogs, and rankings showing the blogs with the most links pointing to them are becoming available.
----------
(p. 133) An ontology establishes the things that a system can talk and reason about. This means the vocabulary, but as chapter 7 discusses, there's more to it than just a collection of words and names--the terms have logical relationships to each other that need to be specified, and this in turn means that any ontology system must adopt some variety of logic, either formally or informally.
On an abstract level, logic is used to represent knowledge, at least the kind of knowledge that is well defined enough to be clearly stated and discussed.
----------
(p. 133) SOAP is a standard for packaging a message, usually in XML. It's most often used with web services (see chapter 8).
----------
(p. 134)
6.1.6 The representation of knowledge
Logic as a formal discipline deals mostly with formal languages that can express a subset of everything that can be articulated using natural languages. The formal description of data and information thus naturally involves logic. Many systems of notation have been developed to express logical descriptions. RDF, for one, has a basis in formal logic. Not all languages have the same ability to express information, and RDF's ability is fairly weak, although it's adequate for capturing simple data and meta data about specific subjects.
----------
(p. 134) What is certain is that RDF as it exists now can't express many things that will need to be communicated. Some of these are discussed elsewhere in the chapter, including negation of a statement, quantification ("for all..."), and modal logic.
----------
(p. 135) The variety of "logics" can seem overwhelming: first-order logic, second-order logic, temporal logic, modal logic, fuzzy logic, description logic, and many more.
----------
(p. 136)
6.2.1 First-order logic: the logic of individual things
First-order logic (FOL) lets you make statements about things and collections of things, their types and properties, and to qualify them, describe them, or relate them to other things. Statements like "This ball is red" and "Robert Smith is married to Jane Smith" are simple examples. FOL doesn't (in its usual flavors) say much about properties or relationships--usually called predicates--themselves (predicates like "red" and "is married to," for example), beyond specifying their types and where they may be used.
FOL is generally considered the most significant and most complete logic, because the higher-order logics can be considered extensions of it. FOL is also considered fundamental because (with a bit of extension) it can define all of mathematics. This represents a great deal of expressive power.
----------
(p. 137) Logicians have been developing FOL subsets specialized for classifying things for many years. A certain group of these forms of logic has come to be called Description Logics. (Nardi and Brachman 2003, Description Logics, Horrocks 2002) Since they're designed for classification tasks, they're good for creating ontologies. There are multiple families of Description Logics, representing slightly different subsets of FOL. Description logics are carefully designed so that they're amenable to mathematical analysis and so that results can be mathematically proven to be computable. That is a strength. On the other hand, the price of this mathematical tractability comes in the form of limitations on their expressive power--that is, in the kinds of things that can be said using them.
----------
(p. 139) On the other hand, the calculations required to perform some of these reasoning tasks can take a long time. It's even possible that certain calculations might not ever finish. What's more, we can't always prove whether some calculations will finish, depending on the complexity of the problem. As mentioned in section 6.2.3, by restricting the kind of logical descriptions that are allowed, calculations can at least be guaranteed to complete and possibly to complete within known times. Once again a tradeoff exists between computational complexity and the ability to decide on the one hand, versus expressiveness on the other. It's the old tradeoff of risk versus opportunity; only experience will let us learn the proper mix.
----------
(p. 139) Because computer systems can't communicate without sharing a language, it's sometimes argued that universal, world-wide vocabularies will be necessary for the Semantic Web. Otherwise, how could my computer understand yours? Historically, such universality has happened only a few times, most notably with the World Wide Web and its Universal Resource Locators, Hypertext Transfer Protocol, and Hypertext Markup Language.
----------
(p. 145) Alternatively, the club might say that a person is a member if they have paid their dues this year. This method specifies a criterion rather than an enumeration. A definition that defines the criteria for inclusion in a class is sometimes called its intension (not its intention).
----------
(p. 146)
Classes as sets
There are many ways to define classes, but most current ontology languages for the Web base them on sets. A class is considered to be a set of individuals that are collected together for the purpose of classification.
----------
(p. 147) The Web Ontology Language (OWL) family of languages, which you'll meet in section 7.4, has a subtly different definition of a class. Instead of having the class contain the instances, it's considered to be associated with them. This slight shift allows various mathematical theorems to be proven that would otherwise be difficult to prove.
----------
(p. 151)
7.3.1 Frameworks
Section 7.4 discusses some of the more prominent ontology languages--which might better be called ontology construction languages--that are likely to play a role in the Semantic Web. They aren't specific ontologies but rather provide frameworks for constructing ontologies. Typically, the framework will provide a syntax, a vocabulary, and some predefined terms. You could almost say that an ontology framework is an ontology for constructing ontologies.
----------
(p. 156) A collection of statements or triples can be called a triples store for convenience.
A collection of triples is also called a graph, because it can be represented as a collection of nodes (the resources and literal values) connected by lines (the properties).
----------
(p. 161)
Flavors of OWL
OWL comes in three sublanguages or flavors, called OWL Lite, OWL DL, and OWL Full.
----------
(p. 165) In OWL, there are two kinds of properties: object properties and datatype properties. Datatype properties let you make statements about datatypes.
----------
(p. 168) Because DAML is so similar to OWL, we don't illustrate it here. We mention it mainly because DAML-S, a language for web services (see chapter 8), uses DAML. DAML-S is being reworked to use OWL, so at some point DAML will probably fade out of use. At the time of this writing, it's still possible to find interesting experimental DAML sites using leads provided by Google.
----------
(p. 172) When you know that you have to put "%20" when you want a space in a URL (and "%22" for a quotation mark), it's easy to see how to ask for any other search.
----------
(p. 174) The Internet isn't just the Web, carrying as it does many other protocols, such as email, FTP (File Transfer Protocol), and instant messaging. The term Web essentially means the combination of HTTP (HyperText Transfer Protocol), URLs for addressing, and hyperlinks for referring to web resources (web pages and so forth that can be accessed using web methods).
----------
(p. 175) (SOAP originally stood for Simple Object Access Protocol, but in the newest version, the name is no longer considered to be an acronym.)
----------
(p. 177) Although most current work on XML web services takes it for granted that SOAP will be used, there is no reason why RDF, let's say, couldn't be used. However, there don't seem to be any viable RDF proposals that would have any hope of supplanting SOAP, especially in the world of commercial web services--certainly not in the short term.
Providing data in RDF form would offer advantages, such as established semantics, extensible ontologies, and standard processing tools. The new version of SOAP makes it more practical to encode RDF data right in a SOAP message (Ogbuji 2002).
----------
(p. 180) Because the typical XML web service doesn't use this pattern for interactions between client and server. You might (p. 181) find this surprising, given the success of the Web to date, but the developers of web services saw things differently. They designed XML web services around the remote procedure call (RPC).
8.3.3 Remote procedure calls
When SOAP was first developed, it was seen mainly as a way to perform RPCs. Most distributed systems in the past (that is, before the Web) used an RPC style of interaction, and an RPC approach seems natural to many programmers. RPC tries to emulate typical programming techniques--that is to say, function or method calls. A method call is a function call belonging to a programming object. A function is a procedure that computes something.
----------
(p. 187)
What is WSDL?
WSDL is an XML-based language that describes key technical data needed for connecting to a service.
----------
(p. 207) Agents have been used in commercial service for several years, although they may not meet all the criteria mentioned above for intelligent, autonomous, distributed agents. Some industrial process controllers have many of the characteristics of agents; but they tend to be closely tied to the exact industrial process they run, and they certainly aren't open to relatively uncontrolled environments and the requests of other unknown agents. Still, these agent systems have had to solve many of the same problems as agents on the Semantic Web.
----------
(p. 217) We consider an agent to be a software component that acts for people with some degree of autonomy.
By intelligent, we mean the agent has some awareness of its environment and can adapt its behavior to changes in that environment and to the responses of other agents.
----------
(p. 245) If the goal is universal interchange of information, it's often thought that a universal vocabulary will be necessary. However, a universal ontology would suffer from all the problems of a centralized repository and more--presuming that (p. 246) such a thing could be developed. Experience shows that global agreement on a large vocabulary is next to impossible in computer disciplines and in business, just as with natural languages. The next step down from a universal ontology would be a basic set of concepts that would be used by all ontologies, sometimes called a Standard Upper Ontology (SUO). It's far from certain whether one SUO, general enough for all applications and adaptable enough to be extended for any and all uses, is feasible. It isn't even known whether people function that way; that is, do people come with an SUO built in? Several efforts are in progress to develop SUOs. A few are available now, but it will probably be years before SUOs stabilize and become generally accepted (if this should ever happen).
----------
(p. 247) Strong AI denoted an approach whereby a system's behavior was established primarily by logical reasoning, usually based on a set of rules. The problem turned out to be that the rules could never cover all situations, and the reasoning was too rigid. As a result, the classic approach to AI was discredited, at least in the opinion of many. With the huge increase of computer power in recent years, the field of logical reasoning has seen a resurgence of sorts.
Many researchers involved in developing RDF, OWL, and similar technologies talk about the Semantic Web as if it will be nothing more than a massive exercise in logical reasoning--based on established ontologies that supply logical relationships and possibly on extensive sets of rules--that gets applied to the vast storehouse of data scattered across the Web. People who remain skeptical of this view sometimes talk about it as being a resurgence of strong AI (the implication is usually rather negative).
----------
(p. 248) In some cases, meta data of interest can be extracted by automated processing. In addition, the structure of a document can imply certain meta data (this approach can be very effective when it's possible). In other cases, it's possible to apply natural language processing (NLP) to textual documents with varying success.
----------
(p. 249) We don't know how to tell a machine to be suspicious of an offered Super Bowl ticket because it seems likely to have been stolen. We can't easily record that information along with the reasons it came to be suspicious.
----------
(p. 249)
11.4 How semantic will it be?
Pure mathematics, of which formal logic is commonly considered a branch, deals entirely in symbols and rules for combining them and manipulating them. Geometry, which for most of us is mainly about diagrams, can be done without using geometric diagrams at all, just by using rules and symbols. Computers do little except to manipulate symbols. Semantics, on the other hand, is usually taken to refer to the "meaning" of language. If we consider that there's a distinction between symbol manipulation and meaning, then the Semantic Web could be said to be a web of symbols rather than a web of meaning. The computers and software agents will only be manipulating symbols. The Semantic Web, its computers and agents, will manipulate the symbols using formal techniques, and only the people will understand their meaning.
For a good summary of this kind of view, see Butler (2003). But in another sense, from an operational point of view, the system could be said to understand the meaning of its data and instructions. If a person or machine takes action appropriate to its environment and internal state, it can be (p. 250) said to understand its situation. This is especially the case when no clear, direct, cause-and-effect relationship exists between the stimulus, the environment, and the result. (It's true that this criterion would apply to an ant making a detour around an obstacle, which normally isn't considered to require much understanding, but the point remains.) Goal-seeking activity comes primarily from the response to feedback signals that indicate how far the current state is from the desired one. An ordinary room thermostat attempts to maintain a constant room temperature, a simple form of goal-seeking behavior. When the goal, internal behavior, and environment are all complex, then the organism or machine seems to be self-directed, regardless of whether the perception is an illusion. If the goals and rules that affect behavior can change during the activity, and change because of that activity, then the illusion of intelligent behavior becomes more compelling. In this sense, the Semantic Web may deserve its name after all.
----------
(p. 250) One of the interesting things about the development of technologies for the Semantic Web is that all of them can be useful in other settings--situations that have nothing to do with the Semantic Web.
----------
(p. 251) Web logging seems to be following the same path as FOAF (the informal Friend of a Friend language) and RSS (the news summary language), but more quickly.
----------
(p. 252)
Published Subject Indicators
If you need to publish definitions of certain concepts and terms, consider publishing them as Published Subject Indicators (PSIs, as discussed in chapter 3).
Each PSI is denoted by a URI; the PSI will be usable by Topic Map processors, and RDF and OWL systems can use the URIs too. Human-readable descriptions located at the URIs will be useful to developers and anyone else you had in mind when you decided to publish them.
----------
(p. 253)
URI descriptions
If you make up a URI to denote a particular resource, concept, and so on, as you often will when you use RDF and OWL, consider making the URI point to a real web page that contains a description of the item indicated by the URI. In theory, these URIs don't have to point to a real location; but if you follow this convention, it will be easier for everyone to discover what you mean by the URI.
----------
(p. 258) The following example of RDF--FOAF data is always published in RDF--was created using the FOAF-a-matic web site (www.ldodds.com/foaf/foaf-a-matic.html). FOAF-a-matic makes it simple to create a starter FOAF page.
----------
(p. 270) (Butler 2003) Butler, Mark. 2003. Is the Semantic Web Hype? www-uk.hpl.hp.com/people/marbut/isTheSemanticWebHype.pdf.
----------
(p. 270) (Description Logics) Description Logics. http://dl.kr.org.
----------
(p. 271) (Horrocks 2002) Horrocks, Ian. 2002. Description Logic: Axioms and Rules. European Conference on Artificial Intelligence 2002. www.cs.man.ac.uk/~horrocks/Slides/dagstuhlS070202.pdf
----------
(p. 272) (Nardi and Brachman 2003) Nardi, Daniele, and Ronald Brachman. 2003. An Introduction to Description Logics. In The Description Logic Handbook. Cambridge: Cambridge University Press. www.cs.man.ac.uk/~franconi/dl/course/dlhb/dlhb-01.pdf.
----------

ERRATA

(p. 5)
the ISBN number and
...should be...
the ISBN and
[the "N" in "ISBN" already stands for "number"]

(p. 247)
Strong AI denoted an approach whereby a system's behavior was established primarily by logical reasoning
...should be...
Strong methods of AI denoted an approach whereby a system's behavior was established primarily by logical reasoning
[author is confusing "strong methods" with "strong AI", which are very different concepts]

(p. 270, Butler 2003 reference)
www-uk.hpl.hp.com/people/marbut/isTheSemanticWebHype.pdf
...should be...
http://web.archive.org/web/20040427115352/http://www.hpl.hp.com/personal/marbut/isTheSemanticWebHype.pdf
[the web page has moved to archives]