Core Linked Data Interest Group


About this Group



Purpose: Provides a forum for discussion of issues related to Linked Library Data and the role of library metadata in the Semantic Web. Goals include: raising awareness of Semantic Web technologies, such as the Resource Description Framework (RDF) and the use of URIs as identifiers within bibliographic descriptions; promoting research on linked data challenges, such as domain modeling and vocabulary selection and design; and informing the ongoing development of existing metadata standards for Libraries, Archives and Cultural Heritage Institutions.

Related Groups:

This interest group is part of Core's Metadata and Collections Section.


 

Linked Library Data Interest Group Activities at ALA Annual 2014


    Posted Jul 14, 2014 07:38 PM
    Edited by System Nov 20, 2020 11:34 AM

    At ALA Annual 2014 in Las Vegas, Nevada, the ALCTS/LITA Linked Library Data Interest Group (I) hosted an all-day preconference; (II) co-hosted 2 sessions entitled "International Developments in Library Linked Data: Think Globally, Act Globally"; and (III) hosted an IG-specific session featuring a talk by Jon Phipps. The IG did not conduct a formal business meeting at this year's Annual, nor did we participate in an overall ALCTS or LITA IG/Committee session, because of heavy commitments elsewhere.


    (I) "Practical Linked Data with Open Source," an all-day preconference held June 27, 2014, sponsored by LITA and organized/hosted by the ALCTS/LITA Linked Library Data Interest Group.


    The preconference schedule was posted before the event at https://connect.ala.org/communities/community-home/librarydocuments/viewdocument?DocumentKey=bee6206a-0eee-4017-9e52-90e292d36e6c.


    65-75 people attended the preconference.


    After introductions, Dan Scott began the day with a brief talk, "Structured data for libraries: RDFa and schema.org." Attendees then worked on a set of codelabs for about 75 minutes, after which Dan gave a 15-minute talk, "Structured data in the wild." Dan Scott's slides accompanying the talks and his codelabs can be accessed at http://stuff.coffeecode.net/2014/lld_preconference/.


    Following a break, Galen Charlton delivered his talk, "Hybridizing MARC: Using Linked Data in Your ILS Now." Slides that accompanied this talk can be found at https://connect.ala.org/communities/community-home/librarydocuments/viewdocument?DocumentKey=bee6206a-0eee-4017-9e52-90e292d36e6c.


    This took us to lunch, where attendees were encouraged to stay and continue working on Dan Scott's exercises. Dan also provided 2 additional exercises for lunchtime "keeners" who stayed in the room to work: "A short demonstration in Python of how to crawl a sitemap and extract RDFa from library catalogues" and "a codelab showing how to create a Google Custom Search Engine over library catalogues enriched with RDFa + schema.org." About 40 keeners worked on these. How did they nourish themselves? Richard Urban picked up some pizzas! As in the first set of codelabs, indefatigable presenters circulated around the room in case there were questions.
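
    For readers who did not attend, the idea behind the first lunchtime exercise can be sketched roughly as follows. This is not Dan Scott's codelab (his materials are at the URL given above); it is a minimal standard-library sketch that assumes the catalogue publishes a standard sitemap and marks up records with RDFa Lite `property` attributes. The function and class names are made up for illustration, and the parser deliberately ignores nesting, `typeof` scoping, and prefixes.

```python
from html.parser import HTMLParser
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(sitemap_xml):
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


class RDFaPropertyParser(HTMLParser):
    """Collect (property, value) pairs from RDFa Lite `property` attributes.

    Simplification: an element's text value ends at the first closing tag,
    so properties wrapping nested markup are not handled correctly.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = None   # property whose text content is being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        # content/href/src carry the value when present; otherwise use text.
        for key in ("content", "href", "src"):
            if attrs.get(key):
                self.pairs.append((prop, attrs[key]))
                return
        self._open = prop
        self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open is not None:
            self.pairs.append((self._open, "".join(self._text).strip()))
            self._open = None


def extract_rdfa(html):
    """Return the (property, value) pairs found in one HTML page."""
    parser = RDFaPropertyParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


def crawl(sitemap_url, limit=5):
    """Fetch a sitemap, then yield (url, pairs) for the first `limit` pages."""
    with urlopen(sitemap_url) as resp:
        urls = sitemap_urls(resp.read().decode("utf-8"))
    for url in urls[:limit]:
        with urlopen(url) as resp:
            yield url, extract_rdfa(resp.read().decode("utf-8", errors="replace"))
```

    Calling `crawl()` with a catalogue's sitemap URL would yield each record URL with the properties found on that page. A real crawler would also respect robots.txt, throttle requests, and use a full RDFa processor rather than this attribute scan.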


    After lunch, Richard Urban delivered a talk, "Linked Data Patterns for Libraries, Archives, and Museums." Slides that accompanied this talk are available at https://dl.dropboxusercontent.com/u/3881880/LODpreConf.pdf. During the talk, Richard introduced the LODLAM protopattern wiki at http://lodlampatterns.org/protopattern. Attendees were asked to consider LD problems unique to LODLAM and to start creating patterns that address those problems. To facilitate this, tables engaged in a two-part discussion: first a problem discussion, then a solution discussion. The results of each discussion were reviewed by all, led by Richard Urban.


    The main problem that emerged from the tables was reconciling and managing faculty identities. Other problems included: managing thesauri; linking to entities using entity URIs, especially as part of a cataloging workflow; turning text strings to URIs; making descriptions about a resource adhere; and locating available expertise at the local level.


    The solutions wrap-up centered on the main problem of faculty identities, and most of that discussion focused on authority and trusted sources of data. Richard Urban reminded us of LD's open world assumption, where it is assumed we cannot know all about all at any given point, meaning our representations are incremental. Richard Wallis encouraged all to take authority from wherever/whenever we see fit, and not to be overcautious. The role of metadata/provenance metadata/named graphs was also emphasized in the area of trust and authority. Finally, Dan Scott approached solutions to mapping MARC to schema.org; his table discussed implementation issues in this area, such as clarifying the value of a MARC fixed field using values in a MARC variable field to create accurate schema.org descriptions.


    The next portion of the preconference broke attendees into nine tables, each with a topic to discuss. Jodi Schneider's notes on the discussion -- taken during the event -- are available at https://connect.ala.org/communities/community-home/librarydocuments/viewdocument?DocumentKey=bee6206a-0eee-4017-9e52-90e292d36e6c. The following is another description of the nine tables:


    (1) How to start a linked data project, facilitated by Sylvia Southwick. This was the most populated table. Discussion included: the importance of cleaning data sets, the importance of having a data model, the idea of just doing your projects -- don't even tell administrators -- and producing prototypes. Specific tools discussed included the use of Open Refine for reconciliation and the Europeana Data Model.


    (2) The real benefits of LOD, or the business case for LOD, facilitated by Richard Wallis and Sarah Quimby. This was the second most populated table. The business case included the success of the Bibliothèque nationale de France redesign, the phenomenon of businesses making widespread use of LD, the success of Columbia University's use of schema.org in their digital repository, the ease with which authority data can be auto-updated using LD, and how the BIBFRAME model provides an excellent answer to the need to edit a record hundreds or thousands of times.


    (3) Disambiguating resources, facilitated by Kevin Ford. This was the third most populated table. Identifiers were discussed at this table. ISBNs and the quality of data were engaged. The importance of the provenance LOD cloud was emphasized.


    (4) Getting items into web search, facilitated by Dan Scott. Discussion included republishing data, HTML/RDFa as a quasi-stable structure, and the complication of mapping when the property "sameAs" is not accurate or sufficient.


    (5) Enhancing MARC data, facilitated by Galen Charlton. Focused on what we can do now, for example, plugging in URIs where applicable, as that will improve the transition to RDF and BIBFRAME.


    (6) Striving for change. Discussion included the search for innovators among current staff; the role of LD in bridging our many repositories (catalog, finding aids, digital collections, etc.); how to bridge diverse vocabularies; combining theoretical knowledge with IT; a challenge to the concept of triples and ideas on how to go beyond the triple in conceiving of LD.


    (7) Enhancing/reconciling/enriching non-MARC metadata, facilitated by Theo Gerontakos. Discussion included the role of ontologies and schemas in LD; reconciliation as a formidable task (Open Refine and Silk Server were discussed as reconciliation services); how to reconcile LCSH strings to URIs and the role of FAST; the role of xml databases in tech services workflows and, by extension, in LD workflows; and whether or not LD can play a role in managing ETDs and/or IRs.


    (8) Teaching LD, facilitated by Richard Urban. Discussion focused on LD in the LIS curriculum. The lack of teaching materials was noted. The role of coding and programming in the schools was engaged. Questions were posed about what schools are actually doing to integrate LD.


    (9) Visual images. Discussion probed the nature of visual image data, as at least a LAM issue -- perhaps a LODLAM issue -- and raised concerns about images as metadata, and the general problem of “having metadata” as opposed to “being metadata.”
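
    As a small illustration of the string-to-URI reconciliation discussed at table (7), the sketch below matches heading strings against a local label-to-URI table after normalizing case, spacing, and trailing periods. It is exact matching only, not what Open Refine or Silk do, and the headings and example.org URIs are invented placeholders rather than real authority data.

```python
import unicodedata


def normalize(label):
    """Normalize a heading so trivially different strings compare equal."""
    text = unicodedata.normalize("NFKC", label).casefold().strip().rstrip(".")
    # Collapse whitespace inside each part of a "--" subdivided heading.
    parts = [" ".join(part.split()) for part in text.split("--")]
    return "--".join(parts)


def build_index(authority):
    """Map normalized labels to URIs; `authority` is a label -> URI dict."""
    return {normalize(label): uri for label, uri in authority.items()}


def reconcile(headings, index):
    """Split headings into those matched to a URI and those left unmatched."""
    matched, unmatched = {}, []
    for heading in headings:
        uri = index.get(normalize(heading))
        if uri is not None:
            matched[heading] = uri
        else:
            unmatched.append(heading)
    return matched, unmatched


# Hypothetical authority table: labels and URIs are placeholders.
AUTHORITY = {
    "Libraries--Automation": "http://example.org/authority/1",
    "Metadata": "http://example.org/authority/2",
}

if __name__ == "__main__":
    index = build_index(AUTHORITY)
    found, missing = reconcile(
        ["libraries -- automation.", "Metadata ", "Linked data"], index
    )
    print(found)
    print(missing)
```

    Headings left in the unmatched list are the ones that would need fuzzy matching or human review, which is where the table's description of reconciliation as a formidable task applies.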


    After the round table discussions, Jodi Schneider closed the day with a talk "How Can Structured & Linked Data Serve Users?" Slides that accompanied this talk are available at https://connect.ala.org/communities/community-home/librarydocuments/viewdocument?DocumentKey=bee6206a-0eee-4017-9e52-90e292d36e6c.


     


    (II) "International Developments in Library Linked Data: Think Globally, Act Globally." Saturday, June 28, 8:30-11:30 AM.


    The Linked Library Data Interest Group (LLDIG) co-hosted this event with the ALCTS International Relations Committee (IRC), with the bulk of the organizing done by David Miller of the ALCTS IRC. Part one took place in the morning from 8:30-10:00 and was moderated by LLDIG co-chair Theo Gerontakos. Part two took place later, from 10:30-11:30, and was moderated by LLDIG co-chair Sarah Quimby. There were a total of 5 presentations with question and answer periods. Approximately 200 people attended part one, and approximately 150 people attended part two.


    Part 1:
    Richard Wallis spoke first. Slides that accompanied his talk can be found at http://www.slideshare.net/rjw/linked-data-from-library-entities-to-the-web-of-data.


    Jodi Schneider spoke second. Slides that accompanied her talk can be found at http://ala14.ala.org/files/ala14/Jodi_Schneider_LD_intl.pptx_0.pdf, with the embedded video (on the "FreeYourMetadata" YouTube channel) available at https://www.youtube.com/watch?v=MnM3tHWAsSA&noredirect=1.


    Neil Wilson spoke third. Slides that accompanied his talk can be found at http://www.slideshare.net/nw13/bl-lod-ala-presentation-june-2014.


    Part 2:
    Gordon Dunsire spoke first. Slides that accompanied his talk can be found at either http://www.gordondunsire.com/pubs/pres/RDAGlobal.pptx or http://www.slideshare.net/GordonDunsire/rda-global-ss.


    Reinhold Heuvelmann spoke second. Slides that accompanied his talk can be found at https://dl.dropboxusercontent.com/u/40360858/2014-06-28_BIBFRAME_Heuvelmann.pdf or http://de.slideshare.net/sollbruchstelle/2014-0628-bibframeheuvelmann.


     


    (III) Linked Library Data Interest Group Managed Discussion featuring Jon Phipps. Sunday, June 29, 8:30-10:00 AM.


    Sarah Quimby started the session with some information on the LITA/ALCTS Linked Library Data Interest Group:



    • We have a new Co-Chair, Violeta Ilik, Semantic Technologies Librarian at Texas A&M University Libraries.

    • The two-year term of the previous Co-Chair, Theo Gerontakos, has ended; Sarah Quimby will continue as Co-Chair for another year.


    Jon Phipps delivered a presentation, "RDA FTW or WTF." Slides that accompanied the discussion are available at http://connect.ala.org/files/ALA%20LITA%20Presentation.pptx. This was a remarkable talk that covered many topics, including:



    • Jon's history of metadata standards

    • notions on the open world assumption of RDF

    • a description of the global web of data as a "brain" that only knows what we tell it

    • Anglo-American Centrism in software development

    • RDA's effort to NOT be Anglo-American centric

    • reflections on diverse requirements for semantics and cross-cultural linking in global metadata

    • lots of thoughts on MARC21: its limited syntax, its rich semantics achieved through committee with frequent requests from "below," how MARC21 is very bad with FRBR, how its semantics is unfortunately tied to its syntax, how it's hard to extend, and how it cannot be locally extended without local pain 

    • the importance of the RDA data model (based on FRBR and DCAM), formalized in 2013

    • opaque identifiers in RDA RDF (supporting RDA's commitment to cross-cultural semantic alignment)

    • lexical aliases to make RDA opaque identifiers easier to work with

    • unconstrained RDA RDF (no domain or range constraints, assists cross-domain mapping)

    • RDA's use of git/GitHub, rdaregistry.info, and the Open Metadata Registry (OMR)

    • some problems with BIBFRAME (discards MARC 21 semantics, redefines existing properties, redefines frbr:work, ‘borrows’ semantics from RDA without reference, discards FRBR semantics, unique approach to frbr:work, proprietary approach to MARC 21 mapping)

    • some problems with schema.org (oriented toward global search engines, redefines frbr:work, hard (not impossible) to extend, constrained by limitations of HTML-based container, hard to map, hard to translate, lack of instructions, ‘unique’ approach to frbr:work, proprietary approach to MARC 21 mapping)

    • some problems with RDA (lots of cruft from AACR2 and MARC 21, minimal community involvement in development of the data model, strong commitment to FRBR)

    • MARC as our shared single point of view; linked open data is forcing the consideration of multiple points of view

    • recommendation: harvest globally: "be conservative in what you do, be liberal in what you accept from others" (RFC 793)

    • recommendation: process locally: aggregate, validate, map and cherry-pick

    • recommendation: publish globally: use many standards, be consistent, be precise, make data "knowable."


    Jon's talk took the full session, so there was no discussion, even though the session was advertised as a managed discussion, the IG's usual format. This was somewhat by design, however. Sarah and Theo had requested table rounds to facilitate discussion; when we arrived, we saw that the room had no tables, only chairs facing front. Sarah and Theo asked Jon whether he could speak a little longer than originally planned. Jon complied! Thank you, Jon! Had Jon left, say, 20 minutes at the end of the session, we surely would have managed a "lively discussion" as promised, but the Co-Chairs felt, when all was done, that the extended talk was the best of all possible results.


    Respectfully submitted,


    Theo Gerontakos