ACRL Artificial Intelligence (AI) Interest Group

To provide a forum for discussing the impact of AI on libraries and related topics, facilitating the exchange of ideas, best practices, and collaborative initiatives among library professionals.
Community members can post as a new Discussion or email ALA-ACRL-AI-IG@ConnectedCommunity.org.
Before you post: please note job postings are prohibited on ALA Connect. Please see the Code of Conduct for more information.
  • 1.  Google NotebookLM - Copyright Questions

    Posted Jul 15, 2025 10:49 AM

    Hi all,

    Does anyone know whether uploading PDFs from paywalled sources (e.g., PDFs you can get through the library) to Google's NotebookLM is copyright infringement? I've been speaking with some faculty at my institution who are using NotebookLM and uploading all kinds of PDFs and other documents to their Notebooks, and the question of copyright, of course, came up. Google's stance on copyright and NotebookLM is "Respect copyright laws. Do not share copyrighted content without the necessary rights to do so. It's our policy to respond to clear notices of alleged copyright infringement. Repeated infringement of intellectual property rights, including copyright, will result in account termination."

    I assume this means you cannot share your Notebook, but what are folks advising (if anything) to faculty and researchers who are using NotebookLM for their own research/courses? Thanks in advance!

    Kelsey



    ------------------------------
    Kelsey Diemand Irvine
    Associate Director for Research & Collections
    Wentworth Institute of Technology
    Boston MA
    She/Her/Hers
    ------------------------------


  • 2.  RE: Google NotebookLM - Copyright Questions

    Posted Jul 15, 2025 01:11 PM

    I am working on putting together a short FAQ about this topic. Here's what I've learned so far as background. I wish it was simpler!

    1.  Steven Johnson is one of the product managers of NotebookLM (and also a well-known non-fiction author). He was interviewed in a podcast.
    "As an author, Johnson clarifies that no material uploaded to the model is used to train NotebookLM or Google Gemini; it's only sent to the model's context window, or 'short-term memory.' Johnson explains that if you 'have the right to use [the material] under copyright, you can use it inside of Notebook.'"
    https://every.to/podcast/inside-the-pod-the-ai-research-assistant-you-ve-been-dreaming-of 

    So to me that sounds like it's fine for individual instructors or students at a university to upload PDFs from our licensed ejournals into a Notebook for their own individual use - since you are the only person who can see it. 

    Also Google says "We value your privacy and never use your personal data to train NotebookLM."

    2. But I wondered, what about publicly sharing a Notebook with material you have the right to use? (a relatively new feature)

    In the help article "Use public notebooks and featured notebooks in NotebookLM," Google says that "Public sharing is only enabled for consumer accounts. It's currently disabled for Workspace Enterprise or Education accounts."

    (I have a Gemini Pro account, which is an individual consumer account, so that gives me NotebookLM Pro.)

    Google also says: "If you have upgraded NotebookLM to include premium features, you can specify if you want to share your notebook "chat only" or "full notebook". Sharing your notebook "chat only" will prevent viewers from interacting directly with artifacts and sources."

    So I take that to mean that I can publicly share a Notebook I created that used copyrighted sources, but I should share it as "chat only" instead of "full notebook" to avoid copyright concerns. (I want to verify this first, though, and I think it may be safer to recommend not publicly sharing Notebooks that contain copyrighted material at all.)

    Also, as of today there is a new feature called "Featured Notebooks." Anyone signed in to NotebookLM can view one of those (such as one shared by The Economist or The Atlantic) and click on the sources in them to read the full articles... which, of course, the publishers are making available themselves as the owners.

    So I'll try to distill this down to a few simple sentences! Anyone else have more info on this? Thanks!



    ------------------------------
    Nicole Hennig
    ELearning Developer
    University of Arizona Libraries
    nhennig@arizona.edu
    ------------------------------



  • 3.  RE: Google NotebookLM - Copyright Questions

    Posted Jul 15, 2025 02:23 PM

    Hi, Kelsey.

    As long as it's for private use and they're not sharing the PDFs with others, I'd say they're likely not violating copyright. I would still remind them not to share the materials (or Notebooks that include those materials) with others. If they do want to share a Notebook, I would recommend that they include a short summary and link out to the source or DOI instead of uploading the full text.

     

    Regards,
    Cynthia Soll
    Research Librarian
    254-299-8343


  • 4.  RE: Google NotebookLM - Copyright Questions

    Posted Jul 30, 2025 10:37 AM

    Hi everyone,

    This is such an important and complex topic!

    I appreciated Nicole's overview, which I believe is correct as far as copyright itself is concerned.

    However, my understanding is that, aside from copyright, there are other legal issues to consider when feeding full-text articles to AI tools (whether NotebookLM, ChatGPT, or anything else). Namely, publishers can have terms of service that prohibit inputting their content into AI systems regardless of copyright (so this would apply to open access articles as well) and, in some cases, regardless of whether the data is used to train the model. Sometimes it is allowed depending on the intended use (commercial or not); sometimes the type of subscriber (academic or not) is also a factor. My understanding is that this counts as text and data mining (TDM), and policies usually frame it this way.

    In my work this has come up more in the context of teams wanting to do AI-assisted systematic reviews, where they want to feed a model a batch of PDFs for screening or data extraction. We have had to go over copyright issues as well as other legal issues. I am not a lawyer, I'm a university medical librarian, but I am trying to follow these developments closely. My stance is essentially to tell teams that the onus is on them to check the terms of use of each publisher whose articles they want to work with, and to consult legal services when in doubt.

    Some examples of publishers' current positions on TDM:

    • Wiley - from their footer, they initially seem to prohibit it:

      "Copyright © 1999-2025 John Wiley & Sons, Inc or related companies. All rights reserved, including rights for text and data mining and training of artificial intelligence technologies or similar technologies."
      But then they have a page about TDM that seems to allow it for academic subscribers:
      "Academic subscribers can perform TDM under license (or in accordance with statutory rights under applicable legislation) on subscribed content for non-commercial purposes at no extra cost." (https://onlinelibrary.wiley.com/library-info/resources/text-and-datamining)

    • NEJM's footer is similar:
      "Copyright © 2025 Massachusetts Medical Society. All rights reserved, including those for text and data mining, AI training, and similar technologies."
      Their terms of service add more nuance - it seems like individuals could use materials that they have license to access within NotebookLM or other closed loop tools:
      "You may not scrape, copy, display, distribute, modify, publish, reproduce, store, transmit, post, translate, or create derivative works from, including with artificial intelligence tools, or in any way exploit any part of the Content except that you may make use of the Content for your own personal, noncommercial use, provided you keep intact all copyright and other proprietary rights notices.
       YOU MAY NOT USE CONTENT TO TRAIN AI MODELS OR USE CONTENT FOR OTHER PURPOSES SUCH AS TRAINING A MACHINE LEARNING OR ARTIFICIAL INTELLIGENCE MODEL WITHOUT EXPRESS PERMISSION."
    • Springer allows it for academic subscribers, for non-commercial use, though it adds some requirements on the maximum number of requests per second or minute (depending on whether an API is used) and has stipulations about data storage (no access for third parties, must only be kept for "the duration of the TDM project", etc.) (https://www.springernature.com/gp/researchers/text-and-data-mining)
    • Elsevier seems to allow it (https://www.elsevier.com/about/policies-and-standards/text-and-data-mining): 
      "We have adopted a license-based approach that automatically enables researchers at subscribing institutions to text mine for non-commercial research purposes and to gain access to full-text content in XML for this purpose."
    • Liebert also seems to allow it for non-commercial purposes:
      "You do not need a license to undertake non-commercial TDM of any content you have lawful access to or of Sage's website." (https://home.liebertpub.com/customer-support/text-and-data-mining/193)

    As I mentioned, I don't think it is our role to break down all the details for each team; we are not research assistants working for them. We simply need to be able to explain that copyright is not the only legal issue to think about, and to advise them to look at TDM policies as well if they want to input PDFs into AI tools.

    For me this prompts a few other reflections. We already know that the digital divide will continue to grow as generative AI becomes widespread, and that there will be income-related inequalities. Researchers who have access to funds will have high-powered, high-quality tools that allow them to do more in less time while protecting their data with top-tier subscription versions, and researchers who do not have access to funds will have to make do with manual work and limited AI tools that are more likely to use their data for training purposes. Now there is a whole other layer: less conscientious researchers are more likely to just upload whatever they want to whichever tools they please, regardless of the publisher's terms of service. Sure, our role is to make them aware of the issues and to stress that it's their responsibility to use these tools ethically, which includes these legal aspects pertaining to TDM over and above copyright issues, but we all know that many of them really do not give a thought to licensing, copyright, etc. So labs with more money and fewer scruples will soar ahead, and those without funds and with more scruples will work in an entirely different way.

    Anyway, if others have done deep dives into the TDM-related aspects of using AI tools with PDFs and have drawn different conclusions, I am all ears!



    ------------------------------
    Amy Bergeron
    Librarian
    Université De Montréal
    ------------------------------