Created: August 10, 2009
Updated: September 25, 2009

QUOTES

"Semantic Web for Dummies"
Jeffrey T. Pollock
2009
Hoboken, NJ: Wiley Publishing, Inc.

----------
(p. v)

Author's Acknowledgements

Semantic Web is a passion for me. Without the inspiration of Tim Berners-Lee, Jim Hendler, Ora Lassila, Deb McGuinness, Ian Horrocks, and others like them, I would not have ever embraced this vision for the future.

----------
(p. 13)

Rebirthing Artificial Intelligence

The science of artificial intelligence (AI) goes through ups-and-downs in the academic community. In times past, artificial intelligence research has seemed to hold the promise of radical new computers and the keys to new forms of life, but after years of failed promises, the research funding for AI inevitably dries up. This boom-and-bust cycle for AI has repeated itself many times throughout the 1960s, '70s, '80s, and '90s. Now, the boom cycle has come again, largely due to the Semantic Web excitement.

New research funding since the late 1990s into the areas of knowledge representation (KR) and AI for the Web has grown substantially worldwide, with particular growth in Europe and Asia. The Semantic Web has been yet another source of rebirth of AI, and most of the Semantic Web roots go deep into KR and AI problems that originally emerged several decades ago. For academics and researchers, these AI foundations of the Semantic Web are the most interesting and fruitful.

----------
(p. 14)

Also in 1999, the Defense Departments of the United States and the European Union (EU) Commission independently opened research topics in the area of intelligent agents. Both the United States and the EU had recognized that in order for software to act more autonomously--without the constant updating by human engineers--the software needed a better data format than XML, relational databases, or the Unified Modeling Language (UML) could provide. So the U.S.
Defense Advanced Research Projects Agency (DARPA) created DAML (DARPA Agent Markup Language), and the EU created OIL (Ontology Inference Layer). These two formats were remarkably similar and were eventually combined to form DAML+OIL, and that finally turned into OWL (Web Ontology Language).

----------
(p. 15)

The legacy of artificial intelligence

Some folks are savvy enough about the roots of the Semantic Web to trace back core ideas and concepts to their artificial intelligence (AI) legacy. For some, the AI origin of the Semantic Web alone is enough to dismiss the whole thing as an ivory-tower exercise in futility. Originally based in the logical foundations of Semantic Networks and Description Logics (each well-known domains of AI research), most mathematicians and AI researchers see those AI foundations as anachronisms from the 1970s that don't have a place in modern computing. It's true that the Semantic Web formats are grounded in these mathematical foundations that are almost 40 years old, but it's also true that the Semantic Web fundamentally alters these older AI concepts and catapults them into the Web page by making them dependent on URIs (Universal Resource Indicators) and compatible with XML. In fact, this combination of AI roots with Web foundations is what makes the Semantic Web so compelling and so different from other modern software languages.

Politics of standards movements

Professional software engineers accept that committee-based designs are often the worst of all worlds. Although the W3C does a phenomenal job of avoiding "groupthink" and anti-patterns (common patterns of incorrect solutions) in their specifications, the Semantic Web is often rightly criticized as accepting design trade-offs intended to appeal to small minorities.

----------
(p. 18)

Garlik is a very successful startup using Semantic Web data aimed at protecting the privacy of its customers and preventing identity theft.

----------
(p.
19)

Object Management Group (OMG) is the global standards organization that maintains the Unified Modeling Language and other software modeling formats that apply to databases, online analytical processing (OLAP), and data warehousing. OMG is also incorporating the Semantic Web into its core specifications as a metamodel for many of its core reference models. Finally, OASIS (Organization for the Advancement of Structured Information Standards) is also leveraging the Semantic Web formats in its community for a host of standards that aim to improve data processing for security, data centers, and Web server process definitions.

----------
(p. 22)

+ Stay organized on the Web and in your Web browser. Try the Adaptive Blue Glue toolbar, which uses Semantic Web metadata to better link your actions and predict what you might want to do next.

----------
(p. 27)

Understandably, the corresponding hype about this new phenomenon has produced inflated expectations for Web 2.0 businesses that result in high-profile, high-value acquisitions of iconic Web 2.0 companies like YouTube, Flickr, and (p. 28) MySpace. Others, like Facebook, are still independent despite billion-dollar takeover offers from traditional media companies that would benefit from access to their databases of information about their millions of users.

----------
(p. 30)

Instead of software services becoming simply about behavior and programming interfaces, the Web 3.0 and Semantic Web movement are enabling the publication and consumption of the data and data models as services inside cloud computing systems (software applications that are hosted entirely via Web protocols and services).

----------
(p.
30)

Network computing, Software as a Service (SaaS) business models, distributed computing applications, and grid computing are all part of this Web 3.0 movement--the data, applications, and processing of software are all becoming virtual, shared, and open as services hosted within clusters of adaptive service clouds.

----------
(p. 31)

This new federated (when data is stored and retrieved from different locations during a single query) data Web enables new levels of data integration, portability, and application interoperability, thereby making data as openly accessible and linkable as Web pages.

----------
(p. 31)

Some people think that machine intelligence will emerge in an organic fashion, as an outgrowth of communities of intelligent people putting data on the Web (such as with Web 2.0 applications like del.icio.us, Flickr, and Digg) and Semantic Web applications that extract meaning and order from that same data to automate the way people interact with it. Automation and intelligence in the data are key promises that the Semantic Web has yet to fulfill. If that vision comes to fruition in the next ten years, it will be with the aid of other technology areas like natural language processing, machine learning, machine reasoning, and autonomous software agents.

----------
(p. 34)

A company called Adaptive Blue is exploring this path with a product called Glue (www.glue.com). Glue is a browser toolbar (see Figure 2-2) that tells you when you're looking at Web content that your friends have looked at and lets you know what they thought about it. Glue also gives you recommendations about other topics that you might want to check out. You don't have to belong to a social network or go to a Web page to get this kind of interaction: It's right there in your browser application. Using such browser smarts is one way that the Web of tomorrow seems smarter than what you use today.

----------
(p.
36)

If you're an existing blogger who wants to get started with a semantic blog, you'll very likely need to install an extension to your Web browser. After your extension is installed, visit your favorite blog Web site with that browser and then begin to write your post.

----------
(p. 37)

Wikis are in some ways the defining application of the Web 2.0 movement--they were the first widespread application of technology that allowed groups to work on the same Web content. But regular Wikis are pretty basic technology by today's standards: They generally consist of some places for people to type unstructured text and insert uncategorized hyperlinks into a Web page. Although Wikis are pretty good at version control on Web pages, they don't really have a whole lot of smarts built in to them in other ways.

----------
(p. 42)

The Semantic Web can enable social network data portability with a format called FOAF (Friend of a Friend). The FOAF format is already widely used by millions of people, and some social networks already allow import and export of FOAF data so that their users can keep and reuse all that data that they upload to the services. If you think about it, this is really quite a leap from the social network literally owning your personal data to taking back control and ownership of your own data on your own terms. This is one small example of how the Semantic Web can help you regain control over your data.

----------
(p. 43)

This chapter is a comprehensive examination of top business and chief information officer (CIO) issues with a focus on how the Semantic Web can help.

----------
(p. 54)

For the technically astute, think "ontology-based plain old Java objects (POJO) layer without compiled code." In simpler terms, the semantic application will be capable of substantial evolution without requiring a programmer to rigidly encode the program's execution path in advance.
Likewise, because each of these semantic applications will make all their data Web-addressable in an open graph data format, the costs of reusing that data in other systems will be trivial.

----------
(p. 55)

Semantic Web directories

A semantic directory will be where you go to find something in the business. It will include the regular search capabilities that you've become used to. It will also include a way to read all those LDAP (Lightweight Directory (p. 56) Access Protocol) and Active Directory registries that are strewn about within a typical large global enterprise.

----------
(p. 64)

Almost 10 years ago, XML was wrought from SGML as a way to give structure to documents and messages.

----------
(p. 65)

Key Semantic Web specifications were commissioned by U.S. and European government agencies in the early 2000s because their defense research scientists knew that RDBMS, UML, and XML technologies could not, by themselves, solve the information challenges of the next century.

----------
(p. 65)

+ OASIS, which controls many business domain-specific data specifications, is adopting RDF and OWL as a core feature in standards for Documents, Data Centers, Security, and Business Process Management.

----------
(p. 66)

In short, the Semantic Web can help smash the silos of data that currently cost the enterprise time and money to make interoperable. Start training and planning for it. Talk to your vendors about it now.

----------
(p. 78)

Really Simple Syndication (RSS)

Really Simple Syndication (RSS) allows Web users to view some of your site's content without actually having to visit your site directly. RSS provides a syndication infrastructure for content to be easily distributed and consumed. RSS is quite popular; in fact, the syndication Web site www.syndic8.com alone currently links to more than 500,000 RSS feeds worldwide.

----------
(p.
230)

+ Semantic Association for Web Service Description Language (SAWSDL) is a W3C standard for annotating service-oriented architecture Web services with RDF or OWL (or any other ontology) metadata to aid in the simpler discovery of services.

----------
(p. 90)

Actually grounded in several old ideas from the artificial intelligence (AI) community dating as far back as the 1950s, the intellectual heritage of the Semantic Web can be traced back to some of the following roots:

+ Graph systems, network databases, and semantic networks
+ Frame languages and object-oriented systems
+ Expert systems, description logic programs, and knowledge representation

Far from impractical, these central ideas from AI have been deliberately combined with Web architecture technologies like HTTP and the URI (Uniform Resource Identifier) to make AI more practical in today's Web-centric world.

----------
(p. 91)

The Gartner Hype Cycle

As one of the premier business analyst firms in the software sector, Gartner's voice on technology trends carries far and is listened to intently. One of Gartner's established ways of defining the maturity of a new technology is to plot its progress on the Gartner Hype Cycle. This Hype Cycle is used to aid in Gartner's analysis of nearly every technology it covers.

----------
(p. 92)

As analog "machines" capable of high degrees of pattern recognition and an unparalleled aptitude for guessing, humans can usually make sense of what they find on the Web. Machines, on the other hand, cannot.

----------
(p. 93)

Sure, some new programming languages have surfaced--like Ajax, Flash, Ruby on Rails, JSON, and a more liberal use of XML--as shown in Figure 5-1, but these have been incremental improvements upon the existing Web platform and haven't fundamentally changed the fact that the Web is driven by documents and pages.

----------
(p.
94)

New ways to harness community tagging projects (where groups of people create hierarchies of tags) allow people to build folksonomies, which are vocabularies that evolve much like natural language evolves--in small pockets of communities. The term mashup is now a common part of the lingo, used to describe when people reuse other people's content in their own way.

----------
(p. 97)

When Ted Codd wrote his seminal paper on relational databases in 1970, there was little certainty that his ideas were valuable. Even his employer, IBM, dragged its feet in implementing his breakthrough concepts.

----------
(p. 102)

Object databases

Many considered object databases a failure in the past few years. The object database was originally conceived as an alternative to the relational database to become a more natural way of storing data for object-oriented software programs. The benefits were supposed to bring a simpler object mapping to storage and fast pointer-based object retrieval.

----------
(p. 104)

Unfortunately, there is no mathematical consistency across object-oriented languages, so it's not possible to create a general-purpose declarative data modeling framework. In simple terms, the classical object database is by definition a data silo unique unto itself and not suitable for any large-scale information management problems.

----------
(p. 105)

Some may say that because the Semantic Web is based on Web standards like the URI (Uniform Resource Identifier) it is inherently federated, or geographically distributed.

----------
(p. 110)

First of all, XML and its schema language, XSD, are not true data models. They weren't intended to be. They're document models. The difference is in how strong the model semantics are required to be.

----------
(p. 112)

Seeing Why Object Orientation Is a Heuristic

Object-oriented programming (OOP) is a software programming style that isn't grounded in an underlying mathematical model.
Unlike the Semantic Web, which is grounded entirely on a complete mathematical model, the object-oriented heuristics offer an approach toward structuring software programs that is based upon rules of thumb and past experiences. Object-oriented heuristics cover both the structure of the data objects as well as their behavior.

----------
(p. 114)

The Semantic Web can't replace any programming language, nor is it intended to replace UML; however, it does provide a more common-sense way to model data than to rely on UML or depend solely on your programming language.

Seeing a New Beginning for Artificial Intelligence (AI)

Long in the doldrums, the AI winter has lasted decades. The AI winter is a phrase used by software industry insiders to describe the long periods when AI fell out of favor with mainstream software. Although many wish it weren't true, the Semantic Web is indeed built upon certain formalism that emerged from the artificial intelligence community--but so are object-oriented systems, search engines, and relational databases. Nonetheless, it's still hip in some software circles to disavow any AI ancestry once a given technology becomes wildly popular.

Factually speaking, the roots of Semantic Web languages lie in both semantic nets (network data models) and in description logics (a type of frame logic that is a decidable subset of first-order logic). Both of these areas of AI fall within the category known as knowledge representation.

(p. 115)

In the artificial intelligence community, the study of knowledge representation (KR) revolves around finding optimal ways to encode human knowledge in machine-understandable structures. This long-standing area of research has produced many different types of formalisms for encoding knowledge--several of which are the ancestors of the modern Semantic Web languages RDF and OWL.

Even the relational database structure is a type of knowledge representation--albeit a very restricted type.
Historically, the various techniques for representing knowledge in computer systems have been localized and built within silos that had few means to interact with data outside their own system. Prolog programs and large systems like Cyc have typically had to work with data that's held closely to their local format and semantics. Following are the two biggest differences with the Semantic Web that haven't ever happened before in the span of computer science:

+ The standardization of a formal model theory for data
+ The intersection of an AI KR language with Web architecture

Taken together, the fact that there's a community standing behind the Semantic Web formalisms, and that it's built upon the Web architecture for boundary-less scale of distribution, this represents a breakthrough of substantial proportions beyond what AI has yet achieved.

----------
(p. 117)

Metadata is data about data. Now that I have that definition out of the way, what else is there to say? The unfortunate truth about that oft-quoted definition of metadata is that it's so vague that it's all but useless in practice. When a software developer or architect talks about metadata, you have to be aware of the context. You see, the word metadata is so overloaded with different meanings that it can mean many different things. For example, the metadata in a word-processing document is different from the metadata in a document content repository, which is different than the metadata in the word processing software program, which in turn uses Web metadata for publishing the document format, and so on and so forth. You really have to pay attention to precisely what people mean when they use the word metadata.

The real problem with metadata is that it should be a very serious and formal discipline for software development, yet it has become relegated to the trash bin of overused, meaningless catchphrases bandied about in an already jargon-filled industry.
Metadata is an important topic to understand because metadata is what the Semantic Web is really all about. Unlike the many kinds of conventional, informal, undisciplined kinds of software metadata that I cover in this chapter, the Semantic Web was designed from the ground up to be about linking and references and model-driven.

----------
(p. 118)

Even the most advanced types of metadata--take the Semantic Web metadata for example--are simply ways of enriching data and information so that it may preserve its meaning outside of its original context. This is why Semantic Web languages are part of a type of artificial intelligence (AI) called knowledge representation (KR). KR is one of the fundamental foundations of the entire AI discipline--the Semantic Web families of KR are just one type of modern KR format.

----------
(p. 119)

This kind of distinction between data, information, and knowledge may seem superficial to some or nothing but a semantic game for others, but for many, it's the mark of a fundamentally different and more powerful layer of metadata.

----------
(p. 119)

At the purest level, data exists without a data type and outside of a particular software

----------
(p. 122)

For example, here are some typical relationship types from UML (Unified Modeling Language), XML, and OWL (Web Ontology Language) models:

+ Inheritance/superclass/subclass
+ Aggregation
+ Composition
+ Hierarchy/taxonomy
+ Unions
+ Intersection
+ Disjointedness
+ Equivalence

----------
(p. 129)

Programming frameworks go beyond the basic language features to pre-implement additional features that developers can use to further simplify the construction of complex software applications.

Most of the major software providers have implemented their own frameworks, some of which are resold and some of which are freely accessible via open-source arrangements.
IBM uses the Eclipse Model Framework (EMF), shown in Figure 6-5, which is the underlying programming model for any Eclipse-based project. Developers can use the EMF core (ECore) objects in their own applications to take advantage of prebuilt features that are available only to programs that use the ECore model.

----------
(p. 138)

MOF, depicted in Figure 6-10, is the overarching framework for UML, MDA, and CWM. These modeling advances are significantly beyond the scope of typical data warehouse and business intelligence applications.

----------
(p. 149)

The challenge with using probabilistic data representations in practice is that you can't ever be 100-percent certain that your algorithms have found every match you need. For example, when you query using search algorithms such as Google, you see only a very small set of all possible matches to your queries--you may even see some false matches as well.

----------
(p. 154)

Triplify me!

RDF has a model framework based on the idea of a triple. A complete RDF triple, or statement, must have the following three parts:

+ The thing the statement describes
+ The properties of the thing the statement describes
+ The values of those properties the statement describes

----------
(p. 155)

The basic structure for sentences reacquaints us to the term triple as a grammar school concept. When formally speaking about the data specification, the term triple refers to the subject, predicate, and object (in that order) of an RDF statement. Because every RDF statement must have exactly these (p. 156) three items, it's also referred to as an RDF triple or just plain triple. Other terms sometimes used to describe the concept of a triple are facts, assertions, and of course statements.

----------
(p. 157)

A collection of RDF triples is commonly referred to as a RDF graph.

----------
(p. 160)

What do I mean when I say, "RDF is XML?" What I really mean is that RDF provides an XML syntax for representing RDF graphs.
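
A note on the triple and graph terminology quoted above: the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything from the book. The `Triple` type, the `objects` helper, and the `http://example.org/` names are all hypothetical, and the code stores plain strings rather than using a real RDF library.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object (in that order)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace used only for this illustration.
EX = "http://example.org/"

# A "graph" here is simply a set of triples.
graph: Set[Triple] = {
    Triple(EX + "book1", EX + "title", "Semantic Web for Dummies"),
    Triple(EX + "book1", EX + "author", EX + "pollock"),
    Triple(EX + "pollock", EX + "name", "Jeffrey T. Pollock"),
}


def objects(g: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """Return every object asserted for the given subject and predicate."""
    return {t.obj for t in g if t.subject == subject and t.predicate == predicate}


titles = objects(graph, EX + "book1", EX + "title")
```

The set of triples is the graph itself. An XML serialization, or N3 or Turtle, would just be a different way of writing the same three statements down.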
Essentially, RDF is XML (plus more). However, XML is not RDF.

----------
(p. 161)

As you can see, this looks a lot like valid XML syntax. It is. And because XML was designed for machines first, humans second, so was RDF. Not as pretty as a picture, but machines like it!

----------
(p. 169)

N3

N3 stands for Notation3 and is a shorthand notation for representing RDF graphs. N3 was designed to be easily read by humans, and it isn't an XML-compliant language.

----------
(p. 170)

Turtle

Turtle is a more verbose subset of N3, and an extension of N-Triples, which I discuss next. The previous N3 example is valid Turtle. Turtle stands for Terse RDF Triple Language. This particular serialization is popular among developers of the Semantic Web. Consequently, many tools are available to support this format. Turtle files typically have a .ttl extension.

----------
(p. 171)

Some people would say that microformats and RDF stand on opposite sides of the format spectrum: Microformats can be small and loose, whereas RDF is a little heavy and can be verbose. Although RDF itself is a framework, not just a format, many pundits can't resist comparing the two data languages.

----------
(p. 171)

Microformats are a collection of formats (tags) for embedding document metadata within Web pages, XHTML, and HTML. Their ability to be embedded in HTML is seen by some as a major advantage over plain RDF. Later in this section, I show you how eRDF and RDFa allow you to achieve the same results. For now, I take a quick look at a few points regarding microformats to help differentiate between the two:

Microformats

+ Were designed for humans first, machines second
+ Solve a specific problem
+ Reuse building blocks from widely adopted standards
+ Are a way of thinking about data
+ Are NOT a new language
+ Are NOT infinitely extensible and open-ended
+ Are NOT a panacea for all taxonomies and ontologies

----------
(p.
179)

RDF as a language is truly the foundation for the Semantic Web, but it is still only a small part of the total Semantic Web vision. Chapter 8 supplies the definitive primer for the Web Ontology Language, and Chapter 9 explains more details about several proposals still in progress, including the use of business rules as part of the Semantic Web.

----------
(p. 183)

Either of the two OWL models would be interpreted exactly the same by an OWL inference engine (referred to as a reasoner).

----------
(p. 184)

Discovering the Various Species of OWL

The "species" of OWL, as I refer to them (the W3C calls them sublanguages), are specific versions of the OWL 1.0 language that are optimized for unique purposes and are distinguished by the language expressivity of the allowable axioms and constructors used in the OWL model. The DL in OWL-DL stands for description logics--a family of knowledge representation languages that have historically been developed in the artificial intelligence community.

----------
(p. 185)

Exploring the Foundations of OWL

OWL's foundation rests on a family of knowledge representation languages called description logics (DL). DL allows you to describe concepts and logic-based semantics for a particular domain in a formal, well-structured way. DL is based on first-order predicate logic, which is a deductive reasoning system with foundations in mathematics. This means that, with OWL, you can express facts and rely on a proven query foundation based on mathematics to discover the implications of those facts. In a general sort of way, you can think of DL as a more powerful type of relational algebra that enables us to develop more powerful databases. That's why OWL databases are usually called knowledgebases--because they allow more expression and dynamism than regular databases.
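
A note on the passage above: the idea of expressing facts and then discovering their implications can be sketched with a toy forward-chaining loop. This is a deliberately simplified illustration, not how a description logic reasoner actually works. The `infer` function, the two hard-coded rules, and the class names (`Sedan`, `Car`, `Vehicle`) are assumptions made up for the example.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (subject, predicate, object)


def infer(facts: Set[Fact]) -> Set[Fact]:
    """Naive forward chaining over two hard-coded rules until nothing new appears.

    Rule 1: A subClassOf B and B subClassOf C  =>  A subClassOf C
    Rule 2: x type A       and A subClassOf B  =>  x type B
    """
    inferred = set(facts)
    changed = True
    while changed:
        new: Set[Fact] = set()
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                # Chain a fact onto any subclass statement about its object.
                if o == s2 and p2 == "subClassOf" and p in ("subClassOf", "type"):
                    new.add((s, p, o2))
        changed = not new <= inferred
        inferred |= new
    return inferred


# Hypothetical asserted facts.
facts: Set[Fact] = {
    ("Sedan", "subClassOf", "Car"),
    ("Car", "subClassOf", "Vehicle"),
    ("myCar", "type", "Sedan"),
}

closure = infer(facts)
```

Only three facts are asserted, yet the closure also contains statements nobody wrote down, such as `("myCar", "type", "Vehicle")`. A real OWL reasoner covers far more constructs, but the pattern of asserted facts in and entailed facts out is the same.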
Open-world assumption

The open-world assumption (OWA) is a monumental, cannot-be-exaggerated difference between Semantic Web data languages and regular relational databases. OWA is an assumption made in more formal logic systems that often confuses even the most seasoned ontologist. To explain the OWA, it helps to first explain its opposite--the closed-world assumption (CWA). The CWA is an assumption that states that any statement that is not known to be true is false.

----------
(p. 187)

Individuals represent physical or virtual concepts the ontology is describing.

----------
(p. 203)

In other words, if two individuals are asserted to be disjoint, the OWL reasoner will always conclude that those instances are provably not equivalent.

----------
(p. 204)

Human thinkers typically follow the UNA--especially where it comes to solving riddles. Here's a riddle that provides a clear metaphor. Two sons and two fathers went to a pizza restaurant. They ordered three pizzas. When they came out, everyone had a whole pizza. How can that be? Most people would assume that there were four people who entered the pizza restaurant, "two sons and two fathers," and focus on the word whole as some attempt to trick them. But in fact, there were three people: a grandfather, a father, and a son. Remember the absence of the Unique Name Assumption when you're using OWL.

----------
(p. 210)

Every object property has an implied inverse. For example, partOf is a natural choice for the inverse of assembledFrom. But it is implied or unnamed. In other words, the subjects of partOf are the set of all objects of assembledFrom. A Venn diagram in Figure 8-13 illustrates this point.

----------
(p. 225)

For technical purists, RDF and OWL are the heart and soul of the Semantic Web. However, several other "neighboring" technologies may not be considered as core to the Semantic Web, but are no doubt essential to its success.
Natural language processing (NLP) technology, business rule languages, and various data vocabularies built with RDF/OWL may all be instrumental to the long-term success of the Semantic Web despite the fact that many people do not consider them a central feature of the core technologies. These various Semantic Web enablers are the topic of this chapter.

----------
(p. 225)

Revisiting the Semantic Web Stack

The defining picture of the Semantic Web is sometimes called the "layer cake." The logical architecture diagram in Figure 9-1 is the visual depiction of how the core technologies of the Semantic Web should fit together.

In practice the technology represented by each of these individual architecture layers is in a different state of maturity. Figure 9-2 shows which technologies are highly mature, mostly mature, and still immature.

----------
(p. 226)

One point that may not be obvious to a casual observer is that the technologies described in Figures 9-1 and 9-2 are not nearly enough to write an entire software application. In fact, to put it into context, the entire family of Semantic Web languages is only capable of replacing some of the data definition aspects of conventional object-oriented programming language and relational databases. To put it bluntly, there is no such thing as a "pure" Semantic Web application: There will always be some sort of procedural application code required to surface the Semantic Web data into regular software applications.

----------
(p. 227)

XML is a hotly debated topic in the Semantic Web community because the first versions of the RDF and OWL specifications were encoded exclusively in XML, but the inelegance of XML for encoding has prompted a movement to enable Semantic Web languages encoding in other formats. A few of those alternative formats, like N3, Turtle, and N-Triples, are described in Chapter 7.

----------
(p.
228)

RIF and SWRL

The Rule Interchange Format (RIF) is a Working Group (an approved action committee) within the W3C.

----------
(p. 228)

By far, the most widely deployed focus area for business rules are production rule systems.

----------
(p. 229)

Software frameworks (open source, or from commercial vendors) that supply all the component parts of a Semantic Web framework in a single collection exist, and they've each implemented their own unifying logic to make everything work together, but each of those software frameworks do the unification in a different way. Thus, although the RDF and OWL remain standard and portable, the implementation of the application does not.

----------
(p. 232)

Developing Easy RDF Models

Say that you understand RDF, OWL, and SPARQL, and you think this stuff is the best thing since you learned how to upload photos on Facebook. But if you've honestly and truly been following along with the technical examples, you probably realize that you would never want to put up with the hassle of creating RDF, OWL, or SPARQL by hand-coding it into your favorite text editor while having to cut and paste from a spreadsheet containing your business data in another window. So how can you easily create RDF?

----------
(p. 233)

Although Protege is most widely used in the academic community, its fully featured support for OWL and RDF is garnering it a wider following in commercial enterprises as well. Because it's free, Protege may well continue to be a leading ontology editor. The source code is also freely available under the open-source Mozilla Public License (MPL).

----------
(p. 234)

TopQuadrant is a long-time pioneer in the Semantic Web field. Traditionally focused on consulting engagements, the company's shift toward software products started with the very successful TopBraid Composer, shown in Figure 9-5.
The Composer tool comes in multiple editions and is more than just a modeling tool: It's like a toolbox for developing complete Semantic Web applications. Beyond the class modeling, data modeling, SPARQL queries, and source code editing, the Composer tool also enables data source mappings, geography mapping, form generation, scripting, and various conversion utilities for XML and e-mail messages. ---------- (p. 245) The newer OWL 1.1 specification has begun to define fragments of OWL logics that can be safely used as self-contained entailment levels, with well-defined consequences for moving from one level to the next. The following list describes a few of the more commonplace entailment levels that are commonly used today: ---------- (p. 271) As seen in Figure 11-1, business applications may sometimes be integrated at the database tier, the logic tier, or sometimes even the interface tier (not shown). ---------- (p. 275) If you need to brush up on your SOA fundamentals, you can find out all you need to know about from the recently updated 2nd edition of Service Oriented Architecture For Dummies (Wiley). ---------- (p. 275) Although these highly dynamic use cases aren't for every business, some companies that depend on close operations with partners can use this Semantic Web extension to BPEL as a way to be more flexible and dynamic. ---------- (p. 280) Governance is one of the most catchy, overused, and ill-defined buzz words in enterprise software. Depending on who you talk to, it could mean something as trivial as making sure you have a strong password, or something as all-encompassing as surviving a Sarbanes-Oxley audit by the government. Governance is big business today, but mostly for professional services organizations that supply auditors and technical staff to help shore-up and stabilize the enterprise computing environment. 
In fact, governance is a broad collection of management, security, and audit processes that span many different kinds of IT systems. ---------- (p. 285) The main substantial limitation to this approach is that the OWL knowledge-base does not and cannot ever scale to the levels of a relational database. Both in terms of query speed and in amount of data, the OWL knowledge-base is always behind a comparable relational database. ---------- (p. 291) Eli Lilly: Targeted drug assessment ---------- (p. 291) At Eli Lilly, the Semantic Web is used to extend the capabilities of the Target Assessment Tool (TAT). Scientists and researchers use TAT to evaluate candidate drugs in light of scientific and business requirements. ---------- (p. 295) But then the reality of the Semantic Web sinks in--its Achilles heel and main weakness has always been scalability. Scalability means different things to different people, but for the purposes of discussing Semantic Web architectures, scalability questions are typically about the following: + How much data the system can take + How expressive the reasoning on the data can be + How fast the system can calculate the newly inferred data ---------- (p. 301) The Sesame project and Hewlett Packard's Jena software are popular frameworks that employ this approach. ---------- (p. 302) Another recent development has been the experimentation with distributed B+tree systems like Google BigTable, Yahoo! Hadoop, and speciality Semantic Web implementations like the open-source projects called BigData and Mulgara. ---------- (p. 304) + Tableau reasoning system: Another common OWL reasoning technology is based on the tableau system. A tableau reasoning system applies inferences within datasets that are kept consistent as part of its core operations. Thus the tableau reasoning system can guarantee computational correctness, but it trades efficiency, especially on smaller datasets. ---------- (p. 
313) Imagine a battlefield situation where a single application running on a laptop in a tent needs to be capable of running effectively with no network access, yet automatically connect to and use data from other nearby command centers (tanks, planes, boats) as well as data coming all the way from Washington, D.C., via a Global Information Grid (GIG). ---------- (p. 316) Many business use cases can benefit from seeing uncertain data, and Semantic Web technology gives those businesses a more comprehensive set of tools to work with. ---------- (p. 317) However, if your application's core functionality is dependent on these NLP engines, approach the Semantic Web cautiously and deliberately because these technologies inject additional technical risks of failure into your project. ---------- (p. 319) Google and others are keeping a close eye on the Semantic Web evolution and won't be blindsided by a new startup that ruins their business. If you're looking for the next software juggernaut, don't look in the search industry! ---------- (p. 319) But a realistic assessment of your first project with the Semantic Web should start much smaller. Why confront the many limitations of scale (see Chapter 12) if you don't have to? Start small, act fast, and build a system that can grow with you over time! ---------- (p. 319) Blue Ocean Strategies A Blue Ocean Strategy is defined in the book of the same name by Chan and Mauborgne. Essentially, this is the idea that, in a particular market, you're either competing in a crowded marketplace where products become commodities and growth is increasingly difficult over time, or you're competing in new industries that are largely untainted by competition. The Blue Ocean is where demand is created and the rules of the marketplace have not yet been defined. 
The Red Ocean is where competition is cutthroat and the foundations have essentially viewed themselves as Blue Ocean innovators, producing software that is fundamentally a new way of doing things that dramatically disrupts the old ways. ---------- (p. 323) Systems with a need for exceptionally robust and flexible levels of data security shouldn't be considering RDF/OWL-based systems. As of 2009, there are very few widely deployed RDF/OWL platforms that can compare with built-in data level security features of most relational databases. ---------- (p. 330) + Semantic Web Rule Language (SWRL): Not yet approved, this language proposal is part of the Rule Interchange Format Working Group at the Semantic Web. SWRL is a working draft of a rule language that offers more complex and powerful rule extensions to OWL. It's proposed in such a way that it can leverage OWL classes and individuals within rule definitions. ---------- (p. 331) Building a complete Semantic Web solution requires you to use non-standard technology. Even if you make every effort to use standards whenever possible, there are many different ways to use the Semantic Web languages that would leave your system incompatible with other Semantic Web applications. For example, your application would still require procedural programs like Java or C++ to make your system executable. The way you choose to implement the logic in your application is precisely the decision that determines how standardized and portable your solution is. ---------- (p. 333) On any given week, you might be able to find a few hundred open positions in the United States and Canada for Semantic Web skills like RDF, OWL, and graph data modeling, but there aren't enough experienced developers to meet the demand. Your project is competing with many other projects for the developers who already have hands-on experience. 
Of course, any experienced software developer can learn RDF/OWL in a fairly short period, but the experiences of using these languages on a real project are priceless. ---------- (p. 340) Twine is an interest networking Web site designed to let people share links, comments, files, and more about topics they're interested in. When Twine launched as a beta, it mostly attracted people involved with the Semantic Web. But since then, the diversity of people on Twine has grown rapidly, and now a quick look at the Top 100 Twines (interest categories) show interests as diverse as green business and investing, science discoveries, geopolitics, sustainable living, and thousands more. ---------- (p. 349) The technology at the heart of TripIt is the Itinerator, which is TripIt's patent-pending and proprietary Semantic Web technology for automatically creating itineraries from travel confirmation e-mails. ---------- (p. 352) Long on the forefront of new technology, the BBC (British Broadcasting Corporation) is no slouch when it comes to using advanced software technologies. The BBC was on the first wave of Web 2.0 technology, and it also is an early adopter of new communication mediums like Twitter. It shouldn't come as any surprise then, that it's also pushing the limits of Semantic Web in the online world of BBC.com. One of the first forays into the Semantic Web by the BBC was rolled out in order to provide direct access to the actual data backing BBC content and programs. ---------- (p. 363) The net effect is that TopBraid is flexible enough to be used as a content management system and wiki. ---------- (p. 369) In this case, the facts are that the Semantic Web is still in need of a killer app, and the media wishes this killer app would be in the search engine space. ---------- (p. 371) Even if you wanted to think narrowly about the Semantic Web as a "catalog framework," you would have to conclude that it was the most powerful catalog framework ever conceived. 
---------- (p. 373) Clay Shirky is a regarded author of many forward-thinking technology works, and a vocal skeptic of the Semantic Web. In one particular article he wrote back in 2003, he attempts to trivialize the foundation logic of OWL--description logics. Well, Shirky mistakenly assumes too much of the role for OWL in the Semantic Web. ---------- (p. 374) The Semantic Web Is Artificial Intelligence (Again) At the beginning of the 21st century, AI was still a bad word. An AI winter had long iced-over the prospects for artificial intelligence to revolutionize computing. At various points throughout the history of AI research, the media has turned against it, and the funding ran dry. So to call the Semantic Web just another AI technology is to insult the technology and dismiss it as an abject failure. This particular assertion--that the Semantic Web is artificial intelligence--is true. However, the underlying premise that AI is bad is actually a myth worth debunking. Artificial intelligence is a term coined in 1956, and it refers to the creation of intelligent machines. The AI field of research is broad and deep, encompassing areas from speech understanding to the encoding of human knowledge and brain simulation. Several spectacular failures through the years have contributed to the widely held perception that AI as a whole is a failure, such as in the areas of speech understanding, machine translation, and expert systems. Compounding this perception of failure, the media has widely promoted some few successes that seem trivial in the big picture. IBM's Deep Blue beating Gary Kasparov at chess was a substantial feat, but understandably underwhelming in comparison to all that was promised from AI as a whole. 
Nearly all modern software technology like object-oriented systems, business rule engines, relational databases, modern machine code compilers, and countless other algorithms and solution patterns have made their way from the realm of AI science fiction to become workplace science fact. Industries like financial services, life sciences, pharmaceuticals, manufacturing, and retail are all dependent on AI technology for the very core of their operations. So what if the Semantic Web is AI? Lots of cool stuff was AI, and lots of technology that made people very rich was AI. Maybe when the Semantic Web goes entirely mainstream, everyone will forget this pesky little detail and just wallow in the glory of Web 3.0. ---------- (p. 378) Things got a little better with RSS (another RDF Semantic Web application, albeit a simple one) because you can now subscribe to a set of feeds and have them appear in a particular place. ---------- (p. 380) For evidence, you need only look to the Linking Open Data project hosted by the W3C, where hundreds of organizations are placing their RDF and OWL data on the Web and making it interoperable with the basic standards for linking open data. Projects like DBpedia and Freebase look to organize the world's content into RDF browse-able formats and place the data on a cloud of servers (such as Amazon's A9) for you to make your own Semantic Web application. ---------- (p. 381) The area of knowledge representation (KR) is clearly the core of the Semantic Web, and in that area, there's still a ways to go to reach its fullest potential. ---------- (p. 386) Why not try a simple answer like, "The Semantic Web is a new computer language for describing all the knowledge that people could ever save in books or computers. It lets programmers connect facts and ideas that would otherwise be located in all sorts of different places, making it much easier for people to find things they need even though there is so much information in the world"? 
That may not be the best definition of the Semantic web, but it might be one that your grandmother could understand and appreciate. If you have a technologically savvy grandmother and she asks you, "Isn't that what Google is for?" you can reply, "Sort of, but Google just helps people find words in documents, whereas the Semantic Web helps people find ideas and concepts in any kind of data." If your grandmother is very curious and she asks you what the difference is between finding words and finding ideas, just buy her a copy of this book! ---------- (p. 388) In fact, if you're really serious about building cool applications that require data from all sorts of different places, Calais, shown in Figure 18-1, may be your killer app. ---------- (p. 388) The fourth release of Calais goes beyond the ability to extract semantic data from your content to link that extracted semantic data to datasets from dozens of other information sources such as Wikipedia, Freebase, and the (p. 389) CIA World Fact Book. ---------- (p. 389) Read Up on RDF and OWL Modeling or Attend Training This book is a broad and comprehensive look at the Semantic Web, but it isn't a deep treatise on how to code with RDF and OWL or how to apply best practices for ontology modeling. A book that I've found quite useful for hands-on projects is Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL by Dean Allemang and James Hendler (published by Morgan Kaufmann). ---------- (p. 391) Many Semantic Web businesses started just this way, with a few notes and a picture on the back of a napkin. ---------- (p. 392) Most people make a judgment about your idea within the first few seconds of your pitch. If you don't pass this critical "sniff test," you may not get the chance to try again! Ask Zepheira You've probably never heard of Zepheira before: It is a niche consultancy with a disproportionately large big brain trust. 
Its key partners are long-time leaders of the Semantic Web standards and veteran entrepreneurs who have each seen several Semantic Web startup businesses come and go. With the lessons learned from successes and failures, it may provide the critical input you need to succeed the first time around. ---------- Summary of "Ten Next Steps to Take from Here" (Chapter 18) 1. Try Twine 2. Explore Yahoo! SearchMonkey 3. Check Out Calais 4. Read Up on RDF and OWL Modeling or Attend Training 5. Read the RDF and OWL Specifications 6. Contact Your Trusted Vendors 7. Write Down and Assess New Ideas 8. Ask Zepheira 9. Prototype Using Open-Source and Free Software 10. Sell Your Boss on the Idea! ---------- ERRATA (p. 93, last paragraph) some new programming languages have surfaced--like Ajax, Flash, ...should be... some new programming languages and techniques have surfaced--like Ajax, Flash, [Ajax is not a language] (p. 118, last paragraph) Semantic Web languages are part of a type of artificial intelligence ...should be... Semantic Web languages are part of a branch of artificial intelligence [languages are not a type of AI] (p. 157, 1st unboxed paragraph) a RDF graph ...should be... an RDF graph (p. 191, last paragraph) the subject and the object the triple are both individuals ...should be... the subject and the object in the triple are both individuals (p. 259, 6th bullet) can working with the raw formats ...should be... can work with the raw formats (p. 267, 4th paragraph) have one or more ECM solution ...should be... have one or more ECM solutions (p. 275, 1st incomplete paragraph) you can find out all you need to know about from ...should be... you can find out all you need to know about them from (p. 303, 1st paragraph) can support billions of triple in main memory ...should be... can support billions of triples in main memory (p. 340, 2nd paragraph) a quick look at the Top 100 Twines (interest categories) show interests ...should be... 
a quick look at the Top 100 Twines (interest categories) shows interests (p. 373, 3rd paragraph) Shirky mistakenly assumes too much of the role for OWL ...should be... Shirky mistakenly assumes too little of the role for OWL [Shirky believes OWL is only for syllogisms] (p. 384, last paragraph) but also sells coupons a la the Semantic Web ...should be... but also sells coupons à la the Semantic Web