Keynotes

Quentin Feltgen

University of Ghent

Resampling techniques in corpus data statistical analysis

In this talk, I intend to present a general approach to statistical analysis, which consists in resampling data to build distributions over relevant quantities. An observed empirical value can then be compared to such a distribution in order to assess its significance. Compared to more traditional statistical tests, this approach is extremely versatile and can be tailored to address virtually any research question, without much concern for the underlying distribution of the data (e.g. one does not need to check for normality). From an epistemological point of view, resampling data, insofar as it probes their statistical structure, provides not only a data analysis toolkit but also a way forward in unravelling the underlying properties of language organization.

I will present the main idea behind resampling, highlight one key caveat of these techniques when applied to language data, and detail three applications of resampling techniques: the study of a linguistic pattern’s productivity, the comparison of diachronic dynamics across different types of a given construction, and the automatic detection of semantic change in semi-schematic constructions.
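
As an illustration of the general idea only, here is a minimal sketch of one common resampling scheme, a permutation test. The abstract does not say which scheme the talk uses, so the function name, the parameters and the toy data (imagined per-text frequencies of a construction in two periods) are all assumptions made for this example.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Hypothetical helper for illustration; not taken from the talk.
    """
    rng = random.Random(seed)
    # Observed empirical value: absolute difference between the two means.
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        # Resample by shuffling the group labels over the pooled data.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)


# Toy data (invented): frequencies per text in two periods.
period_1 = [10, 11, 12, 13, 14, 15, 16, 17]
period_2 = [1, 2, 3, 4, 5, 6, 7, 8]
p = permutation_p_value(period_1, period_2, n_resamples=2_000)
print(f"estimated p-value: {p:.4f}")
```

No normality check is involved: the reference distribution is built from the data themselves. Note that shuffling labels treats the observations as exchangeable, which is itself an assumption of this sketch.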

 

Francesca Frontini

Computational linguistics institute Antonio Zampolli, CNR Pisa

Towards FAIR Specialized Corpora: a Bilingual Corpus in the Wastewater and Stormwater Domain

In this presentation, we will explore the challenges related to the creation, annotation, and dissemination of a multilingual corpus dedicated to the field of wastewater and stormwater networks. We will specifically address the stages of corpus creation, alignment, and annotation, with a particular focus on named entity recognition and information extraction methods. A major objective of this work is to ensure compliance with the FAIR principles (Findable, Accessible, Interoperable, Reusable), through the integration of the corpus in the CLARIN infrastructure.

 

Biagio Ursi

University of Orléans

Linguistics and interactional corpora: queries, exploitations and comparisons

In this talk, I will present my current research trajectories in the field of interactional corpus linguistics, focusing on three areas.
Firstly, the interrogation of spoken corpora that researchers can carry out on freely available online databases in order to provide a sequential and multimodal analysis of conversational exchanges. Secondly, the exploitation of corpus studies that researchers can consider from an applied linguistics perspective (mainly for didactic purposes). Finally, comparisons of the use of certain linguistic structures that can be drawn from different corpora of spoken languages in interaction. For this last part, I will propose contributions to the comparative study of discourse markers in talk-in-interaction in two Romance languages, French and Italian.

 

Geoffrey Williams

University of Grenoble Alpes

Corpus linguistics: from exploratory origins to a necessary future

So-called “Large Language Models” have become the flavour of the month, and have superseded Web as Corpus in language engineering. Their usability in computing cannot be denied, but do they really add value to corpus linguistics?
To answer this question, it is necessary to return to basics and the accepted definition of the concept of ‘corpus’, the very heart of corpus linguistics, but also to look at the very basics of all scientific enquiry.
To illustrate the importance of balance and representativity in corpus building, I shall use examples drawn from special language corpora and the key theoretical issues that underlie all creation and analysis of corpora: the contexts of culture and situation.


 