MODS and RDF Call 2015-11-02

Time: 9am PDT / Noon EDT

Call-In Info: 712-775-7035 (Access Code: 960009)

Homework Reminder: 

Moderator: Steven Anderson (Boston Public Library)

Primary Notetaker:  Rebecca Fraimow (etherpad link: https://etherpad.wikimedia.org/p/RDF-MODS-20151102)

Attendees:

Agenda:

  1. Conversion code update
    1. Purchased a cheap domain for $1: http://mods2rdf.xyz (the Blacklight portion still needs to be fixed / configured).
    2. Upload form is located at: http://mods2rdf.xyz/uploads
      1. Takes a single MODS XML record and processes its title using our alpha RDF title mapping (simple or complex).
        1. Note: While the complex mapping works in ActiveFedora, a bug in the Fedora admin UI causes it to throw an error there. This looks to be fixed in the Fedora 4.4.1 release (the fix is currently "In Review"): https://jira.duraspace.org/browse/FCREPO-1811
      2. Site / Code will be updated to support more elements.
        1. Status of support and issues will be located at: <to add>.
        2. The codebase is located at: https://github.com/boston-library/mods2rdf
    3. In order to access the Fedora repository admin UI from the link of uploaded samples, use the following credentials:
      1. Username: darthvader
      2. Password: tapdancing
    4. Let sanderson know if you want to contribute or have questions / thoughts on it!
       
  2. Library of Congress Type of Resource Mappings (brief initial notes lost due to etherpad connection issue) (https://goo.gl/8jiEs4)
    1. Shawn at NYPL: How are people using Type of Resource? Is more than one value ever used? Should a manuscript value map to manuscript and not text, or would people map something to manuscript as well as text? Manuscript could be seen as a refinement of text, which could be useful for faceting. Or maybe we don't recommend specific mappings?
      1. Q: Is NYPL using this for faceting?
        1. Yes. With an either/or mapping, making the "text" facet also include "manuscript" items adds complexity in their system.
      2. From BPL's point of view, there is no problem with mapping an element with manuscript="yes" both to the type given by the element value (be it "text" or something else) and to manuscript. Others agreed that this approach would be fine.
    2. 'digital' vs. 'multimedia'?
      1. 'Digital' might be 'born-digital'?
        1. Kelcy: then are you not specifying that it's text if it is a born-digital text document? If you're using digital to specify born-digital, you would also want to specify more specifically what it is.
        2. There's another MODS element where you can specify whether something is born-digital or has been reformatted through digitization: digital origin, under physical description, covers the 'digital' aspect.
        3. So 'digital' as a resource type is maybe a different nuance than that. MODS Type of Resource is about the content, not about the origin.
      2. In the end, we will use the mapping document only for the standard mappings that we were able to research. Mappings and usage beyond what we can determine the values to mean will be up to each institution, as there isn't enough guidance on some of the resourceType values.
    3. Any other mapping concerns?
      1. Karen from Northwestern: thinks it looks good. If more detail is needed, another element could be used: Internet Media Type, which draws on MIME types and could define exactly what kind of file this is.
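
As an illustration of the Type of Resource approach discussed above (dcterms:type pointing at a Library of Congress resourceTypes URI, with manuscript="yes" adding a second value rather than replacing the first), here is a minimal Python sketch. It is not taken from the mods2rdf codebase. The function name, the example subject URI, and the short codes in the lookup table ("txt", "img", "car", "man") are assumptions for illustration and should be checked against the group's mapping document.

```python
import xml.etree.ElementTree as ET

MODS_NS = "{http://www.loc.gov/mods/v3}"
DCTERMS_TYPE = "http://purl.org/dc/terms/type"
LOC_TYPES = "http://id.loc.gov/vocabulary/resourceTypes/"

# Illustrative subset only. These codes are assumptions; verify each one
# against the shared mapping document before relying on it.
TYPE_CODES = {
    "text": "txt",
    "still image": "img",
    "cartographic": "car",
}
MANUSCRIPT_CODE = "man"  # assumed code for the manuscript resource type


def type_of_resource_triples(subject_uri, mods_xml):
    """Return (subject, predicate, object) tuples for each typeOfResource."""
    root = ET.fromstring(mods_xml)
    triples = []
    for el in root.iter(MODS_NS + "typeOfResource"):
        value = (el.text or "").strip()
        code = TYPE_CODES.get(value)
        if code:
            triples.append((subject_uri, DCTERMS_TYPE, LOC_TYPES + code))
        # manuscript="yes" adds a second type alongside the element value.
        if el.get("manuscript") == "yes":
            triples.append((subject_uri, DCTERMS_TYPE, LOC_TYPES + MANUSCRIPT_CODE))
    return triples


SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <typeOfResource manuscript="yes">text</typeOfResource>
</mods>"""

for triple in type_of_resource_triples("http://example.org/object/1", SAMPLE):
    print(triple)
```

With this shape, a "text" facet and a "manuscript" facet can each be built from the same predicate, which is the both/and behavior the group agreed on. Values the table does not cover are skipped, leaving those decisions to each institution.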
         
  3. MODS Type of Resource Collaboration update (https://goo.gl/UfzQfc)
    1. Last time, everyone seemed to have the same mapping for Type of Resource: dcterms:type pointing to the Library of Congress resourceTypes. Are there any differences or nuances not captured in this document?
      1. No objections raised.
         
  4. MODS Genre discussion (MODS Genre Individual Institution Usage And RDF Conversion)
    1. Boston Public Library
      1. Decided to use schema:genre as the predicate for the value. It supports both string literals and URIs and seemed to be a fairly common predicate.
        1. Mostly have URIs. But some harvested data has a genre that did not come from an available linked data source (text only).
      2. We also use display_label to indicate a "general genre" (top-level faceting) vs. a "specific genre". We don't need to keep this in the data: the distinction could live in the application logic, which would hold a list that lets it put the values in the correct Solr fields when an object is indexed.
      3. Emily: they had looked at schema:genre as well. Does it look analogous to the same concept?
        1. Steven: had emailed UC Santa Barbara about their choice (rdaregistry:formOfWork), since it was different from ours. Their response was that it wasn't an exact match, since sometimes you have Format values in your genre field. (rdaregistry:formOfWork isn't much better as a match, though.)
          1. However, while the values associated with the predicate would not be a 100% match, Steven isn't that bothered by it.
    2. Indiana University
      1. Outlined two possible approaches in their document:
        1. One uses the predicate edm:hasType and the other uses the Dublin Core Elements Type predicate for the mapping.
          1. Using DC with LoC Genre/Form terms, which is where the source data tends to come from, so that seemed like the priority. The order of authorities was TGM, then LoC.
      2. At DLF, a question was asked: is there a reason we're not looking at how DPLA is mapping? The DPLA MAP uses the Europeana Data Model hasType for Genre and refers to the AAT vocabulary, and it seems to work. Every Genre seen in the IU examples fit, and it would line up what we're mapping to with what DPLA is doing.
        1. Julie general question: should we be looking at these in terms of mapping precedents that are already out there? 
        2. Julie and Jen from IU are leaning towards the option DPLA uses out of their two options, but hadn't looked into schema:genre much for comparison.
        3. Emory University kind of has the same question -- also had Bibframe on the list, but doesn't know what the complexities of working with Bibframe are like.
        4. Steven: looks like the EDM hasType can be a literal or a URI, is that correct?
          1. Julie: not positive. 
            1. Steven: what does hasType have in its range value? 
              1. Julie: DPLA map doesn't have a ton of information, has to go back to look at the Europeana data model. 
        5. In the end, it appears DPLA restricts to URI(?) while the official Europeana data model allows for either a URI or a literal.
        6. With a requirement to support both URIs and Literals, it seems we would be alright going with the Europeana interpretation of the predicate.
          1. But will look more into this still!
      3. Any further questions or comments?
        1. Kelsey at Amherst: Julie makes a good point about using existing mappings where that makes sense.
        2. Reminder: OCLC distributes RDF data using Schema.org, so any MARC record is available via Schema, and we could see what they've already done with regard to Type of Resource and Genre. How would one go about seeing that? Go to worldcat.org and bring up any record; the linked data information is at the very bottom.
    3. Emory University
      1. Still kind of brainstorming. Most of the MODS Genre entries may or may not have the authority; a lot of them will be literals, but usually based on a controlled vocabulary (AAT or MARC Genre Terms). Personally would be interested in finding a mapping that could accommodate a literal or a URI.
        1. Thinking specifically about Hydra using the RDF vocab gem; not sure about the status of that, so it might be good to find something more stable.
          1. Steven: Schema currently does exist in the RDF vocabularies, don't know about EDM but assume it does because DPLA is using it. Predicates would be defined in those shared libraries already. 
        2. Looking at EDM's schema and Bibframe but had not dug into details of that.
    4. UC Santa Barbara mapping
      1. Their output maps Genre to the RDA Registry formOfWork, specifically the English-notation form, formOfWork.en. Steven had wondered why they had chosen it, since Form of Work had only been used in a very few locations.
      2. Their email response gives the full explanation of the reasoning.
        1. It mostly also points out how neither schema:genre nor formOfWork is actually appropriate for what we typically store in MODS Genre.
    5. New York Public Library
      1. Didn't really settle on a predicate.
        1. Do want to use both URIs and non-URIs with it though.
        2. Likes the idea of using EDM and doing the same thing as with EDM Agents for names: even if a term is already in an authority file, they would mint and store it locally, so that they could also store local terms as SKOS concepts.
          1. Essentially this sounds similar to what we ended up doing for Names. Mint a local object and skos:exactMatch when appropriate.
          2. Another possibility would be to define their own SKOS concept schemes to group certain terms. The semantics of Genre are very swirly and mean different things to different people, and NYPL has aggregated content from different divisions, so using a very broad predicate while keeping the potential to refine categories by putting them in different schemes might be one strategy for managing terms.
          3. Steven: could you write up a quick example of why you would want a local Genre that then points to the remote Genre?
            1. Yes, NYPL will draft something up in document for next time. 
            2. Steven: it's an interesting approach, but it complicates the data model. Maybe something worth supporting for the complicated mapping though.
        3. Does NYPL favor hasType? 
          1. Don't know if they favor it, but it would be more interoperable with DPLA 
    6. Wrap-up
      1. Steven: currently leaning towards EDM hasType as the mapping; if DPLA is using it, that would make mapping to them much more straightforward. Anybody have an objection to this?
        1. Comment: As long as we think it can accommodate literals.
          1. The official EDM definition says that it does, as far as we can tell.
      2. If anybody comes up with a better idea before the next meeting, we can discuss it then, but for the initial pass that seems like it would work. If NYPL could go ahead and show a use case for why you might mint something locally, that would be useful in determining whether to support it as a group or to have a simple mapping only.
      3. Anything else for Genre that we should cover as a group or work on for next time?
        1. No comments
      4. Steven will create a collaborative document for this default mapping.
        1. Will depend on what we see from NYPL whether we have just a simple case or both a simple and complex case.
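
To make the two Genre cases under discussion concrete, here is a minimal Python sketch. The simple case emits edm:hasType with either a URI (when the MODS genre element carries a valueURI) or a literal. The complex case is only a guess at the locally minted pattern NYPL described, pending their write-up: a local SKOS concept that points to the remote term with skos:exactMatch. The function names, the tagged-tuple representation of objects, and all example.org URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_NS = "{http://www.loc.gov/mods/v3}"
EDM_HAS_TYPE = "http://www.europeana.eu/schemas/edm/hasType"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def genre_triples(subject_uri, mods_xml):
    """Simple case: one edm:hasType triple per genre, URI or literal."""
    root = ET.fromstring(mods_xml)
    triples = []
    for el in root.iter(MODS_NS + "genre"):
        uri = el.get("valueURI")
        label = (el.text or "").strip()
        if uri:
            triples.append((subject_uri, EDM_HAS_TYPE, ("uri", uri)))
        elif label:
            triples.append((subject_uri, EDM_HAS_TYPE, ("literal", label)))
    return triples


def local_genre_triples(subject_uri, local_uri, label, remote_uri=None):
    """Complex case (assumed shape): mint a local concept, link it outward."""
    triples = [
        (subject_uri, EDM_HAS_TYPE, ("uri", local_uri)),
        (local_uri, RDF_TYPE, ("uri", SKOS + "Concept")),
        (local_uri, SKOS + "prefLabel", ("literal", label)),
    ]
    if remote_uri:
        # Only assert a match when the term exists in an external authority.
        triples.append((local_uri, SKOS + "exactMatch", ("uri", remote_uri)))
    return triples


SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <genre valueURI="http://example.org/vocab/photographs">Photographs</genre>
  <genre>Scrapbooks</genre>
</mods>"""

for triple in genre_triples("http://example.org/object/1", SAMPLE):
    print(triple)
```

The simple case keeps one triple per genre value, while the complex case adds at least two more triples per term, which is the extra data-model cost Steven raised.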

  5. Next MODS top-level element: Origin Info
    1. Seems like a complex element, should we break it down and do two of the sub-elements at a time?
    2. Steven: we can take a stab at it and then refine it during several meetings. Can always do an initial first stab and then a second stab based on what we couldn't figure out or what we had questions on.
      1. Comment: it is still likely too large for what one can accomplish before the next meeting.
        1. Do certain sub-elements go together as a group we can tackle first?
          1. Consensus seems to be to work on the "date" elements under origin info first and the remainder once that is complete.
             
  6. Final Wrap-Up
    1. Steven will contact Steve from Northwestern to see about getting everyone access to Fedora, and will process more elements. Will also try to get the Blacklight portion of the application working.
    2. Genre will use EDM hasType for now. NYPL will show use cases for a locally minted Genre that holds the remote URI within the local copy, so the group can determine whether or not to support a complex case.
    3. Each institution will map Origin Info for the Date elements.