
ACRL Artificial Intelligence (AI) Interest Group

Hugging Face releases first AI models trained exclusively on open data

Heather Sardis
Posted Dec 07, 2024 12:24 PM

Worthwhile read from Hugging Face: on Dec 5th they released the Pleias 1.0 family of small language models, "the first ever models trained exclusively on open data ... the first fully EU AI Act compliant models". Definitely worth a look.

On their training data: "We are moving away from the standard format of web archives. Instead, we use our new dataset composed of uncopyrighted and permissibly licensed data, Common Corpus. To create this dataset, we had to develop an extensive range of tools to collect, to generate, and to process pretraining data."

On the models:

"multilingual, offering strong support for multiple European languages

safe, showing the lowest results on the toxicity benchmark

performant for key tasks, such as knowledge retrieval

able to run efficiently on consumer-grade hardware locally (CPU-only, without quantisation)"

Read the full press release here: https://huggingface.co/blog/Pclanglais/common-models

Warmly,

------------------------------
Heather Sardis
Associate Director for Technology and Strategic Planning
Massachusetts Institute of Technology
------------------------------
