
Workshops

Training workshops will be offered on Tuesday, October 21, from 9:30 a.m. to 12:30 p.m., before the start of the conference.
They are free and open to all participants, but registration is required by email at jlc2025@sciencesconf.org, specifying the workshop you wish to attend. Your registration is binding, as the number of places is limited.

We warmly thank our colleagues who are organizing these workshops.

NB: the workshops will be held in French.

AVAA Toolkit (9:30 a.m. to 12:30 p.m.): a toolkit to support the analysis of interactions based on multimodal corpora
Video collection and processing (11:00 a.m. to 12:30 p.m.)

Collection: audiovisual equipment, points of view, which parameters to use, storage, and GDPR
Processing: import, storage, editing, synchronization, anonymization, automatic transcription

CORLI GUM
TXM for beginners, written corpora (9:30 a.m. to 12:30 p.m.)
TXM for beginners, spoken corpora (9:30 a.m. to 12:30 p.m.)

===============================================================================================

AVAA Toolkit: a toolkit to support the analysis of interactions from multimodal corpora

Introduction
The AVAA Toolkit (Audio and Video Annotations Analysis Toolkit) software offers numerous features for interaction analysis. It can be used at different stages of research on annotated audiovisual corpora: processing, mining, and data visualization, as well as the annotation process itself, through an intercoding procedure that supports the collaborative construction of coding items.

Facilitator
Clotilde George, University of Lorraine, language sciences researcher, associate member of ATILF

Topic & Objective
Getting started with the AVAA Toolkit software, which is suitable for analyzing annotated audiovisual corpora (particularly those annotated with ELAN). Creating combined primary and secondary data collections.

Resources & Prerequisites
Software: www.avaa-toolkit.org
Quick overview: https://avaa-toolkit.org/features/
Documentation: https://avaa-toolkit.org/documentation

Participants must have a corpus of annotations aligned with the signal (eaf, azp, cha, textgrid formats, etc.).

Terms
Duration: 3 hours, 9:30 a.m. to 12:30 p.m.
Number of participants: 10 maximum

Pre-installation of software: yes (www.avaa-toolkit.org). Please contact dev@avaa-toolkit.org if you have any difficulties installing the software.

=============================================================================================

Audiovisual data collection

Description
The production and processing of corpora involve methodological considerations, technical knowledge, and legal and ethical issues. We will look at the different types of video and audio recording equipment available. What equipment can be used depending on the field and research questions? In addition to traditional camcorders and lapel microphones, we will look at equipment such as 360° cameras, subjective cameras, and action cameras.

We will address the issue of formats, editing, and exporting audiovisual materials. We will carry out the entire processing chain from import to export of synchronized files. Then, we will perform automatic transcription of audiovisual files using several methods.

Facilitators
Justine Lascar, CNRS research engineer at the ICAR laboratory, head of the Audiovisual Engineering Corpus (CIA) unit
Léa Mouton, CNRS assistant engineer at the ICAR laboratory, member of the Audiovisual Engineering Corpus (CIA) unit

Resources
No software installation required
CIA page https://icar.cnrs.fr/recherche/services/

Terms
Duration: 1.5 hours, 11:00 a.m. to 12:30 p.m.
Number of participants: 10 maximum

=============================================================================================

CORLI GUM

Introduction
The CORLI-GUM project aims to provide training in tool-based linguistic annotation and to collaboratively build a multi-annotated resource for French. Largely inspired by the GUM resource developed at Georgetown University (https://corpling.uis.georgetown.edu/gum/), this project offers a comprehensive framework that gives teachers involved in university courses in NLP and tool-based linguistics the opportunity to involve their students in the linguistic annotation of non-standard texts. Several layers of annotation are proposed, based on annotation guides validated by the community: annotation of tokens and syntactic dependencies according to the Universal Dependencies model, annotation of named entities according to the Quaero model, annotation of coreference according to the Democrat model, and annotation of discourse markers according to the Crible & Degand model. Beyond the creation of a multi-annotated resource, the project invites students to apply their knowledge of linguistics to natural data that can sometimes be confusing (text messages, oral or online discussions, technical texts) and to discuss the definitions of each with each other during the adjudication process.

Facilitator
Lydia-Mai Ho-Dac, CORLI & Université Toulouse Jean Jaurès/CLLE

Topic & Objective
Presentation of the project and how it works from an educational perspective. Discover and experiment with annotation layers: annotation guides and annotation procedure with INCEpTION.

Resources & Prerequisites
A connected web browser
Knowledge of linguistic analysis

Terms
Duration: 1.5 hours, 11:00 a.m. to 12:30 p.m.
Number of participants: 20 maximum

=============================================================================================

TXM for beginners (written corpora)

Introduction
TXM is a software program that allows you to search corpora and extract concordances and statistics.
The training is intended for beginners and will consist of two parts.

1. Importing a corpus
We will see how to organize and import your corpora. Demo corpora will be provided, but you can also bring your own corpus (plain text or XML, but *not* PDF), and we will see what we can do with it.
If you bring your own corpus, you will need to send it to me a few days before the training.

2. Searching a corpus
We will see how to explore the corpus and search it using the CQL query language (also used by other software).

Facilitator
Achille Falaise, Laboratoire de Linguistique Formelle (LLF - UMR7110)

Prerequisites
You must have a computer with TXM installed. You can download TXM here: https://txm.gitpages.huma-num.fr/textometrie/files/software/TXM/0.8.4/
You do not need the latest version. However, please make sure that TXM starts up properly! I will not be able to provide technical support during the training.
We will also use a plain text editor (https://www.sublimetext.com/ is recommended) and a spreadsheet program (https://fr.libreoffice.org/download/telecharger-libreoffice/ is recommended; Excel is also a spreadsheet program, but it often causes problems for the way we will be using it).

Terms
Duration: 3 hours, 9:30 a.m. to 12:30 p.m.
Number of participants: 15 maximum

==============================================================================================

TXM for beginners (spoken corpora)

TXM is software that allows you to search corpora and extract concordances and statistics. This training course focuses on the analysis of spoken corpora (text corpora generally composed of orthographic transcriptions of spoken language).
The training course is intended for beginners and will consist of three parts.

1. Preparing a corpus
We will look at how to prepare a corpus of orthographic transcriptions in order to structure it in a format that can be used with TXM. In particular, we will convert formats (from ELAN, CLAN, or Text files, for example) and clean up the transcriptions (deleting annotations that cannot be used by TXM, for example).

2. Importing a corpus
We will see how to organize and import your corpora. Demo corpora will be provided, but you can also bring your own corpus. If you bring your own corpus, you will need to provide it to me a few days before the training.

3. Searching a corpus
We will see how to explore the corpus and perform searches using the CQL query language (also used by other software).
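
As an illustration of the cleanup mentioned in part 1, here is a minimal Python sketch. It is not the procedure used in the workshop: it assumes, purely for the example, that annotations to delete are written in square brackets or parentheses, and the function name `clean_transcription` is hypothetical. Real corpora (ELAN, CLAN) may follow other conventions.

```python
import re

# Assumption for this sketch: annotations appear in square brackets or
# parentheses, e.g. "[laughs]" or "(0.5)". Adapt the pattern to your corpus.
ANNOTATION = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def clean_transcription(line: str) -> str:
    """Remove bracketed annotations and collapse the leftover whitespace."""
    without_annotations = ANNOTATION.sub(" ", line)
    return " ".join(without_annotations.split())


if __name__ == "__main__":
    print(clean_transcription("oui [laughs] je pense (0.5) que non"))
```

Applied line by line to a plain text export, a function like this leaves only the orthographic transcription.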

Facilitator
Loïc Liegeois, Language Research Laboratory (LRL)

We will also use a plain text editor (https://www.sublimetext.com/ is recommended) and a spreadsheet program (https://fr.libreoffice.org/download/telecharger-libreoffice/ is recommended; Excel is also a spreadsheet program, but it often causes problems for the way we will be using it).

Terms
Duration: 3 hours, 9:30 a.m. to 12:30 p.m.
Number of participants: 20 maximum
