Switzerland has developed an open set of models, Apertus, that is trained on a documented training set, “developed with due consideration to Swiss data protection laws, Swiss copyright laws, and the transparency obligations under the EU AI Act.”
EPFL, ETH Zurich, and the Swiss National Supercomputing Centre (CSCS) have released Apertus, Switzerland’s first large-scale open, multilingual language model, a milestone in generative AI for transparency and diversity. Trained on 15 trillion tokens across more than 1,000 languages (40% of the data is non-English), Apertus includes many languages that have so far been underrepresented in LLMs, such as Swiss German, Romansh, and many others. Apertus serves as a building block for developers and organizations for future applications such as chatbots, translation systems, or educational tools.
This project should interest us in Canada as we are talking about sovereign AI. Should Canada develop its own open models? What advantages would that provide? Here are some I can think of:
- It could provide an open and well-maintained set of LLMs that researchers and companies could build on or use without fear that access could be changed or withdrawn, or that data about their usage could be logged.
- It could be designed to protect privacy and to encourage adherence to relevant and changing Canadian laws and best practices.
- It could be trained on open and well-documented bilingual data that would reflect Canadian history, culture, and values.
- It could be iteratively retrained as issues like bias are shown to be tied to parts of the training data. It could also be retrained for new capabilities as needed by Canadians.
- It could include ethically accessed Indigenous training sets developed in consultation with Indigenous communities. Further, it could be made available to Indigenous scholars and communities with support for the development of culturally appropriate AI tools.
- We could archive code, data, weights, and documentation in such a way that Canadians could check, test, and reproduce the work.
I wonder if we could partner with Switzerland to build on their model, or with other countries that share similar values to produce a joint model.