Hi everyone,
This is such an important and complex topic!
I appreciated Nicole's overview, which I believe is correct as far as copyright itself is concerned.
However, my understanding is that, aside from copyright, there are other legal issues to consider when feeding full-text articles to AI tools (whether NotebookLM, ChatGPT, or anything else). Namely, publishers can have terms of service that prohibit inputting their content into AI systems regardless of copyright status (so this would apply to open access articles as well) and, in some cases, regardless of whether the data is used to train the model. Sometimes it is allowed depending on the intended use (commercial or not), and sometimes the type of subscriber (academic or not) is also a factor. My understanding is that this counts as text and data mining (TDM), and publisher policies usually frame it this way.
In my work this has come up more in the context of teams wanting to do AI-assisted systematic reviews, where they want to feed a model a batch of PDFs to do screening or data extraction. We have had to go over copyright issues as well as other legal issues. I am not a lawyer (I'm a university medical librarian), but I am trying to follow these developments closely. My stance is essentially to tell teams that the onus is on them to verify the terms of use for each publisher whose articles they want to work with, and to consult legal services in cases of doubt.
Some examples of publishers' current positions on TDM:
- Wiley - from their footer, they initially seem to prohibit it:
"Copyright © 1999-2025 John Wiley & Sons, Inc or related companies. All rights reserved, including rights for text and data mining and training of artificial intelligence technologies or similar technologies."
But then they have a page about TDM that seems to allow it for academic subscribers:
"Academic subscribers can perform TDM under license (or in accordance with statutory rights under applicable legislation) on subscribed content for non-commercial purposes at no extra cost." (https://onlinelibrary.wiley.com/library-info/resources/text-and-datamining)
- NEJM's footer is similar:
"Copyright © 2025 Massachusetts Medical Society. All rights reserved, including those for text and data mining, AI training, and similar technologies."
Their terms of service add more nuance - it seems like individuals could use materials that they are licensed to access within NotebookLM or other closed-loop tools:
"You may not scrape, copy, display, distribute, modify, publish, reproduce, store, transmit, post, translate, or create derivative works from, including with artificial intelligence tools, or in any way exploit any part of the Content except that you may make use of the Content for your own personal, noncommercial use, provided you keep intact all copyright and other proprietary rights notices.
YOU MAY NOT USE CONTENT TO TRAIN AI MODELS OR USE CONTENT FOR OTHER PURPOSES SUCH AS TRAINING A MACHINE LEARNING OR ARTIFICIAL INTELLIGENCE MODEL WITHOUT EXPRESS PERMISSION."
- Springer allows it for academic subscribers, for non-commercial use, though it adds some requirements on the maximum number of requests per second or minute (depending on whether the API is used), and has stipulations about data storage (no access for third parties, must only be kept for "the duration of the TDM project", etc.) (https://www.springernature.com/gp/researchers/text-and-data-mining)
- Elsevier seems to allow it (https://www.elsevier.com/about/policies-and-standards/text-and-data-mining):
"We have adopted a license-based approach that automatically enables researchers at subscribing institutions to text mine for non-commercial research purposes and to gain access to full-text content in XML for this purpose."
- Liebert (now part of Sage, hence the wording of their policy) also seems to allow it for non-commercial purposes:
"You do not need a license to undertake non-commercial TDM of any content you have lawful access to or of Sage's website." (https://home.liebertpub.com/customer-support/text-and-data-mining/193)
As I mentioned, I don't think it is our role to break down all the details for each team; we are not research assistants working for them. We simply need to be able to explain that copyright is not the only legal issue to think about, and to advise them to look at TDM policies as well if they want to input PDFs into AI tools.
For me this brings up a few other reflections. We already know that the digital divide will continue to grow as generative AI becomes widespread, and there will be income-related inequalities. Researchers who have access to funds will have high-powered, high-quality tools that allow them to do more in less time while protecting their data with top-tier subscription versions, and researchers who do not have access to funds will have to make do with manual work and limited AI tools that are more likely to use their data for training purposes. Now there is a whole other layer: less conscientious researchers are more likely to just upload whatever they want to whichever tools they please, regardless of the publisher's terms of service. Sure, our role is to make them aware of the issues and to assert that it's their responsibility to use these tools ethically, which includes these legal aspects pertaining to TDM over and above copyright issues, but we all know that many of them really do not give a thought to licensing, copyright, etc. So labs with more money and fewer scruples will soar ahead, and those without funds and with more scruples will work in an entirely different way.
Anyway, if others have done deep dives into the TDM-related aspects of using AI tools with PDFs and have drawn different conclusions, I am all ears!
------------------------------
Amy Bergeron
Librarian
Université De Montréal
------------------------------
Original Message:
Sent: Jul 15, 2025 10:48 AM
From: Kelsey Diemand Irvine
Subject: Google NotebookLM - Copyright Questions
Hi all,
Does anyone know if uploading PDFs from behind pay-walled sources (i.e. PDFs you can get from the library) to Google's NotebookLM is copyright infringement? I've been speaking with some faculty at my institution who are using NotebookLM and uploading all kinds of PDFs and other documents to their Notebooks, and the question of copyright, of course, came up. Google's stance on copyright and NotebookLM is "Respect copyright laws. Do not share copyrighted content without the necessary rights to do so. It's our policy to respond to clear notices of alleged copyright infringement. Repeated infringement of intellectual property rights, including copyright, will result in account termination."
I assume this means you cannot share your Notebook, but what are folks advising (if anything) to faculty and researchers who are using NotebookLM for their own research/courses? Thanks in advance!
Kelsey
------------------------------
Kelsey Diemand Irvine
Associate Director for Research & Collections
Wentworth Institute of Technology
Boston MA
She/Her/Hers
------------------------------