ACRL Artificial Intelligence (AI) Interest Group

To provide a forum for discussing the impact of AI on libraries and related topics, facilitating the exchange of ideas, best practices, and collaborative initiatives among library professionals.

Hugging Face releases first AI models trained exclusively on open data

    Posted Dec 07, 2024 12:24 PM

    Worthwhile read from Hugging Face: on Dec 5th they released the Pleias 1.0 family of small language models, "the first ever models trained exclusively on open data ... the first fully EU AI Act compliant models."

    On their training data: "We are moving away from the standard format of web archives. Instead, we use our new dataset composed of uncopyrighted and permissibly licensed data, Common Corpus. To create this dataset, we had to develop an extensive range of tools to collect, to generate, and to process pretraining."

    On the models: 

    • "multilingual, offering strong support for multiple European languages
    • safe, showing the lowest results on the toxicity benchmark
    • performant for key tasks, such as knowledge retrieval
    • able to run efficiently on consumer-grade hardware locally (CPU-only, without quantisation)"

    Read the full press release here: https://huggingface.co/blog/Pclanglais/common-models

    Warmly,

    ------------------------------
    Heather Sardis
    Associate Director for Technology and Strategic Planning
    Massachusetts Institute of Technology
    ------------------------------