Course start: 30 September 2024

CALLISTO: The basics of corpora and NLP for low-resource languages.

University of Vienna, Centre national de la recherche scientifique

Eva Vetter, Universität Wien

Scientific classification:

  • Other social sciences (509)
  • Linguistics (602)
  • Other humanities (605)

  • Scope: 4 units
  • Effort: 1 hour/week
  • Current participants: 317
  • Licence: CC BY-NC-ND 4.0
  • Course start: 30 September 2024
  • Course end: -
  • Current status: Ongoing course
  • Available languages:
    • Deutsch (de)
    • English (en)
    • Español - Internacional (es)
    • Français (fr)
All courses on iMooX are and remain free for everyone!

Course details

General information about the course

Welcome to this course for anyone interested in the humanities, social sciences and linguistics who wants to learn about working with corpora of minoritized languages!

Discover how corpora of this type can be created, and learn about the key frameworks you need to work on and with them.

This short introductory course offers insights into how language corpora support the progress of research and society and how they contribute to the digital survival of languages with a limited corpus.

"In the field of information technology, all language communities are entitled to have at their disposal equipment adapted to their linguistic system and tools and products in their language, so as to derive full advantage from the potential offered by such technologies for self-expression, education, communication, publication, translation and information processing and the dissemination of culture in general."

UNESCO – The United Nations Educational, Scientific and Cultural Organization. Universal Declaration of Linguistic Rights (Barcelona, 1996). [accessed on Dec. 05, 2023 at: https://culturalrights.net/descargas/drets_culturals389.pdf]

Course content

This course consists of 4 lessons and is designed to be followed in order, step by step. The following main topics are covered:

Lesson 1: NLP (Natural Language Processing) and the future of languages

Lesson 2: Suitable corpora for NLP

Lesson 3: FAIR corpus sharing

Lesson 4: Legal and ethical caution

Learning goals

  • Identify the issues involved in sharing corpora for minoritized languages 
  • Understand the development chain for NLP applications 
  • Distinguish between the different corpus types that can be used for NLP development 
  • Identify some of the characteristics of a good corpus for NLP 
  • Integrate the FAIR principles into your corpus sharing practices
  • Examine the legal framework that applies to corpora 

Prerequisites

As this course is designed as an introduction to working with corpora of minority languages, no special prior knowledge is required.

Course schedule

The course consists of the four short lessons mentioned above, each containing written content, interactive elements and a final quiz. The individual lessons are unlocked according to performance: Once you have successfully completed a lesson and the associated quiz, you will gain immediate access to the next lesson.

Certificate

For actively participating in the course, you will receive an automatically issued certificate that includes your name, the course name and the completed lessons. Please note that this certificate merely confirms that you answered at least 75% of the self-assessment questions correctly.

Licence

This course is licensed under CC BY-NC-ND 4.0.

Additional content

Course overview

  • Lesson 1: NLP and the future of languages
  • Lesson 2: Which corpus for NLP
  • Lesson 3: Towards corpus sharing with the FAIR principles
  • Lesson 4: Legal and ethical precautions
    • Legal questions in linguistics

Course Instructor

Eva Vetter, Universität Wien



Course creators



Mélanie Jouitteau, IKER (CNRS)
Mélanie Jouitteau is a researcher in linguistics at the CNRS, specialising in Breton, minority languages and participatory research. Aware of the fragile future of human linguistic diversity in the context of the technologisation of human relations, she focuses her research on creating data acquisition solutions to support NLP development for languages with restricted corpora.



Lynda Kehli, Inist – DoRANum (CNRS)
Lynda Kehli is in charge of pedagogical engineering and training in STI (Scientific and Technical Information) at Inist-CNRS. With the Formation-DoRANum team, she is developing a new generation of blended learning courses combining digital learning, webinars and other methods such as serious games. The aim is to support the scientific community in opening up and sharing research data.




Organisation

Lena Kratochwil, University of Vienna
Lena Kratochwil is studying to become a teacher of French and German and works as a student assistant in the language teaching and learning research team (Centre for Teacher Education / Institute of Linguistics).


Eva Vetter, University of Vienna
Eva Vetter is a professor in the domain of language teaching and learning research. In her sociolinguistic work, she positions the issue of linguistic minorization in the context of equity (in education) and human rights.
http://orcid.org/0000-0003-0504-6991

Partners

The Research Centre for Basque Language and Texts (IKER) - a joint research unit (Unité Mixte de Recherche – UMR5478) administered by the Centre National de la Recherche Scientifique (CNRS), the University Bordeaux Montaigne and the University of Pau and Pays de l'Adour (UPPA).

Centre national de la recherche scientifique
