Digital Enterprise Research Institute, www.deri.ie

Produce and Consume Linked Data with Drupal!
Stéphane Corlosquet, Renaud Delbru, Tim Clark, Axel Polleres and Stefan Decker
ISWC 2009
scorlosquet@gmail.com
DERI NUI Galway, MGH
October 27th, 2009
Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Loads of Data on the Web in CMS

Some Motivations
Status of the current web
– Data contained in millions of documents
– Disparate platforms and systems
– Wide range of topics (personal blogs, news, etc.)
– Various types of resources (text, pictures, video, etc.)
Note: lots of structured data in Content Management Systems
Problem
– Not possible to reuse this data outside the CMS (except RSS)
– Not available in a unified machine-readable format

So, here's our idea of CMS:
[Figure 3.5: Extended example in a typical Linked Data eco-system. Labels in the figure: PROJECT BLOGS, DBLP, REMOTE DRUPAL SITE, SPARQL endpoint (three times), Tim.]
Query shown in the figure:
    SELECT ?name ?title WHERE {
      ?person foaf:made ?pub .
      ?person rdfs:label ?name .
      ?pub dc:title ?title .
      FILTER regex(?title, "knowledge", "i")
    }

Approach
Our goal
– Integrate "any" CMS site into the Web of Data
A challenging task
– Little incentive for users to annotate their data manually
– Site owners do not have the resources to convert their data to RDF
– Per-site schema: each site is different and its structure cannot be predefined
Solutions
– Expose the CMS site structure in a unified format AUTOMATICALLY!
– Use Semantic Web standards (RDFa, SPARQL)
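
To make the query from the eco-system figure concrete, here is a minimal sketch in plain Python that applies the same pattern (person made publication, person has label, publication has title, title matches "knowledge" case-insensitively) to a hand-written list of triples. It is not a SPARQL engine. The sample triples and the names `TRIPLES`, `objects` and `publications_matching` are assumptions made up for illustration.

```python
import re

# Hypothetical sample data as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:tim", "rdfs:label", "Tim"),
    ("ex:tim", "foaf:made", "ex:pub1"),
    ("ex:tim", "foaf:made", "ex:pub2"),
    ("ex:pub1", "dc:title", "Knowledge integration for neuroscience"),
    ("ex:pub2", "dc:title", "A note on stem cells"),
    ("ex:ann", "rdfs:label", "Ann"),
    ("ex:ann", "foaf:made", "ex:pub3"),
    ("ex:pub3", "dc:title", "Sharing knowledge across sites"),
]


def objects(triples, subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def publications_matching(triples, pattern):
    """Mimic the SELECT ?name ?title query with a case-insensitive regex filter."""
    regex = re.compile(pattern, re.IGNORECASE)
    rows = []
    for person, predicate, pub in triples:
        if predicate != "foaf:made":
            continue  # only ?person foaf:made ?pub bindings are of interest
        for name in objects(triples, person, "rdfs:label"):
            for title in objects(triples, pub, "dc:title"):
                if regex.search(title):  # FILTER regex(?title, pattern, "i")
                    rows.append((name, title))
    return rows


RESULTS = publications_matching(TRIPLES, "knowledge")
for name, title in RESULTS:
    print(name, "|", title)
```

In a real deployment the same question would be sent as SPARQL to each endpoint; the sketch only shows which bindings the query selects.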

Approach
Implementation in Drupal
Why?
– One of the most popular CMS out there
– Modules to take the burden off the site users
What our modules allow:
1. Automatic site vocabulary generation
2. Mapping Content Models to existing ontologies
3. Data endpoint for SPARQL querying
4. Lazy loading of external data (data import)

Pre-Existing work
"Semantic Content Management Systems"
Ontology-based CMS:
– Semantic community Web portals (2000)
– OntoWebber: Model-Driven Ontology-Based Web Site Management (2001)
Our approach is the reverse: from existing CMS structure to ontologies

The Drupal CMS
Drupal*
– Easy to use
– Large community
– Popular on the Web
– Hundreds of thousands of sites
– Modular design
Drupal site workflow
– Site administrator: sets up the site and installs the modules they like or need
– Site editors: create the content of the site following the schema defined by the site administrator
* http://drupal.org/

Drupal: Content Construction Kit
Content Construction Kit (CCK) module
– GUI for extending the internal schema of a Drupal site
– Used on many Drupal sites
– Can build new types of pages, known as content types
– Can create fields for each content type
– Fields can be of various types: plain text fields, dates, email addresses, file uploads, references to other pages

Drupal: Content Construction Kit
Demo use case: project blogs site*
– Community site
– Various content:
  – People
  – Organizations
  – Projects
  – Blogs
[Figure 3.5 (repeated): Extended example in a typical Linked Data eco-system]
* http://drupal.deri.ie/projectblogs/
... one for bridging the DBLP SPARQL endpoint to the project blogs website, and a second for bridging the Science Collaboration Framework website. When visiting Tim's profile page, the relevant publication information will be fetched from both the DBLP and SCF websites, and either new nodes will be created on the site or older ones will be updated if necessary.

Drupal: Content Construction Kit
CCK User Interface
The fields form for the Person content type is displayed in Figure 2.11. This form makes it easy to reorder the fields by drag and drop, add new fields, remove existing fields or access the configuration form for a field.
[Figure 2.12: Defining constraints on the gender field in Drupal's CCK]
Figures 2.9, 2.10, 2.11 and 2.12 show the typical look and feel of a Drupal page and administrative interface for the Person content type, without our extensions installed. This content type offers fields such as name, homepage, email, colleagues, blog url, current project, past projects, publications, contributions.
[Figure 2.9: User profile page built with Drupal's CCK]
An example of a node (page) of the type Person is depicted in Figure 2.9, where all ...

What do we add?
[Diagram: items 1 and 2]

1. Site Vocabulary
Automatic site vocabulary in RDFS/OWL from CCK
– Describes the content types and fields
– Content type <=> RDF class
– Field <=> RDF property
– RDFa output on site
– http://siteurl/ns#

1. Site Vocabulary
Automatic site vocabulary in RDFS/OWL
Field constraints
Example with cardinalities:
– the name of a Person is required
– max. 5 projects per person

Search examples are shown in Figure 3.2. Details on improving the ranking algorithm can be found in [45].

2. Mapping Content Models to existing ontologies
3.2.3 Mapping process
The terms suggested by both the import service and the ontology search [...] be mapped to each content type and their fields. For mapping content types, [...] choose among the classes of the imported ontologies and, for fields, one [...] among properties. The local terms will be linked with rdfs:subClassOf [...] rdfs:subPropertyOf statements, e.g. site:Person rdfs:subClassOf foaf:Person, to the mapped [...] site vocabulary; wherever a mapping is defined, extra triples using the m[...] are exposed in the RDFa of the page.
Additionally, we allow inverse reuse of existing properties. E.g., assume [...] administrator imports a vocabulary ex: that defines a relation between c[...] regions and goods that this region/country produces via the property ex:produces [...] user interface also allows fields to be related to the inverse of imported properties. For instance, the origin field could be related to ex:produces in such an inverse [...] resulting in site:origin rdfs:subPropertyOf [...]
– Import of any vocabulary published online
– External ontology search service
– Local terms are subclasses/subproperties of public terms
– Ensure "safe" vocabulary re-use:
  – only subclassing/subproperty avoids "redefinition"
  – adding cardinalities might introduce inconsistencies; still, possible to avoid in the user interface

2. Mapping Content Models to existing ontologies
RDF mappings page
[Figure 3.2: RDF mappings management through the Drupal interface: RDF class map...]

What do we add?
[Diagram: items 1, 2 and 3]

3. Data endpoint for complex querying
Local RDF data exposed in a SPARQL endpoint
– Enables interoperability across sites
– Built on the PHP ARC2 library
– All RDF data indexed in the endpoint
– Each page stored as a graph and kept up to date
[Figure 3.6 (left): a list of SPARQL results]

What do we add?
[Diagram: items 1, 2, 3 and 4]

4. Lazy loading of external data
Lazy loading (caching) of distant RDF resources
– Enables interoperability across sites
– Built on the PHP ARC2 library
– CONSTRUCT query to map distant schema to local schema
[Figure 3.6: A list of SPARQL results (left) and an RDF SPARQL Proxy profile form]
[Diagram: lazy loading of distant RDF resources]

Where is it used?

Science Collaboration Framework
– Web application toolkit based on Drupal
– Enables online scientific collaboration
  – publishing, annotating, sharing and discussing any content
  – articles, papers, reviews, perspectives, interviews, news, biographies
  – profile information on community members
– Targets biomedicine communities, but generic in essence
– Networked sites producing Linked Data

SCF collaborating sites
– StemBook (stem cell articles and reviews): http://www.stembook.org/

SCF collaborating sites
– Michael J. Fox Foundation (Parkinson's disease): http://www.pdonlineresearch.org/

Conclusion
The structure of CMS sites contains valuable schema information
Our suggested "workflow":
– site vocabulary from the local structure (RDF CCK)
– enables out-of-the-box RDF export: expose your Drupal site to the Web of Data without any additional effort from site admin or content editors (RDF CCK)
– mapping to existing RDF vocabularies improves integration in the LOD cloud (evoc)
– SPARQL endpoint
– Lazy loading of RDF resources (RDF Proxy)

Conclusion
Drupal 6 modules available for download
– http://drupal.org/project/rdfcck
– http://drupal.org/project/evoc
– http://drupal.org/project/sparql_ep
– http://drupal.org/project/rdfproxy
Online prototype
– http://drupal.deri.ie/projectblogs/

Good news from Drupal 7:
– RDF mapping feature committed to Drupal 7 core
– RDFa output by default (blogs, forums, comments, etc.) using FOAF, SIOC, DC, SKOS
– Download development snapshot: http://ftp.drupal.org/files/projects/drupal-7.x-dev.tar.gz
– Currently more than 200,000* sites on Drupal 6
  – waiting to make the switch to Drupal 7
  – waiting to massively increase the amount of RDF data on the Web
Discussion: http://groups.drupal.org/semantic-web
* http://drupal.org/project/usage/drupal
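
As a recap of workflow steps 1 and 2 (content type <=> RDF class, field <=> RDF property, cardinality constraints, and rdfs:subClassOf / rdfs:subPropertyOf links to public terms), the following is a minimal sketch in Python that turns a small content model into Turtle statements. It is not the code of the RDF CCK or evoc modules, whose actual output may differ. The `CONTENT_TYPES` structure, the function names, the `foaf:name` mapping and the use of `owl:Restriction` for cardinalities are assumptions for illustration; `http://siteurl/ns#` is the placeholder namespace from the slides.

```python
# Namespace prefixes. "site" uses the placeholder URL from the slides;
# the others are the standard W3C and FOAF namespaces.
PREFIXES = {
    "site": "http://siteurl/ns#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Hypothetical content model, loosely following the Person example:
# the name is required (min 1) and a person has at most 5 projects.
CONTENT_TYPES = {
    "Person": {
        "maps_to": "foaf:Person",
        "fields": {
            "name": {"maps_to": "foaf:name", "min": 1},
            "projects": {"max": 5},
        },
    },
}


def cardinality(type_name, field, keyword, value):
    """Return one OWL cardinality restriction as a Turtle statement."""
    return (
        'site:%s rdfs:subClassOf [ a owl:Restriction ; owl:onProperty site:%s ; '
        'owl:%s "%d"^^xsd:nonNegativeInteger ] .' % (type_name, field, keyword, value)
    )


def site_vocabulary(content_types):
    """Return the site vocabulary as a list of Turtle lines."""
    lines = ["@prefix %s: <%s> ." % item for item in PREFIXES.items()]
    for type_name, spec in content_types.items():
        # Content type <=> RDF class
        lines.append("site:%s a rdfs:Class ." % type_name)
        if "maps_to" in spec:
            # Local class is a subclass of the public term, never a redefinition.
            lines.append("site:%s rdfs:subClassOf %s ." % (type_name, spec["maps_to"]))
        for field, info in spec["fields"].items():
            # Field <=> RDF property
            lines.append("site:%s a rdf:Property ; rdfs:domain site:%s ." % (field, type_name))
            if "maps_to" in info:
                lines.append("site:%s rdfs:subPropertyOf %s ." % (field, info["maps_to"]))
            if "min" in info:
                lines.append(cardinality(type_name, field, "minCardinality", info["min"]))
            if "max" in info:
                lines.append(cardinality(type_name, field, "maxCardinality", info["max"]))
    return lines


VOCAB = site_vocabulary(CONTENT_TYPES)
print("\n".join(VOCAB))
```

Keeping the mappings as subclass and subproperty statements, rather than reusing the public terms directly, mirrors the "safe" re-use point made in the mapping slides: the site's own constraints stay attached to site: terms only.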