Welcome to my own little personal space...
Bio
Dr. Djamé Seddah is a former tenured associate professor at Sorbonne University, now in a full-time senior research position at INRIA Paris in the Almanach team. His interests cover the field of natural language processing, mainly wide-coverage multilingual syntactic analysis, the syntax-semantics interface, language models for low-resource languages, etc. A specialist in the construction of annotated corpora (Sequoia corpus, French Social Media Bank, French Question Bank, Narabizi Treebank, etc.), he participated in the development of the CamemBERT, PagnolXL, CamemBERTa and ModernCamemBERT language models, as well as character-based models for dialectal and highly noisy languages.
His current research focuses on language models and possible ways of avoiding their weaponization (content detection, bias detection and mitigation, etc.). To this end, together with Benoit Sagot and Eric de la Clergerie, he's extremely involved in the development of an upcoming series of LLMs that focus on French but ssshhh...
Contact me if you're interested !
NEWS (2025)
- Arij Riabi defended her PhD thesis on March 18th, "Small is Beautiful: Addressing Resource Scarcity, Language Variation, and Transfer Challenges for Automatic Detection of Harmful Language". Congrats !
NEWS (2024)
- I defended my HDR (Habilitation à diriger des recherches) on September 18th, "From French Statistical Parsing to Low-Resource Language Modeling: a Research Journey". Joy and all that..
- Our BPI "Communs Numériques" project Scribe got accepted! With LightOn, Aleia and Idris/CNRS ! Same for "Code Common" with Software Heritage. I'm the PI for Almanach for both.
- Our Action Exploratoire SALM, Sociologically Aware Language Modeling, got accepted ! Co-PI with Jean-Philippe Cointet (Sciences Po). Work with Chloé Clavel and Alexander Kindle.
- I'm a PSAI, Prairie 2, Chair !
OLD NEWS (2023)
- Fantastic: we got a paper accepted at ACL 2023 ! Check it out here, it's about a new SOTA LM for French ! Congrats Wissam !
- Got promoted to the Hors-Classe this year !
- Ghazi Felhi defended his PhD thesis :)
- I got interviewed by France Info about ChatGPT and its impact on student production (Replay here) and by L'Express magazine about possible ways to detect language-model-generated content (article here) !
- Gave a cool talk at Goteborg in January :)
Publications page updated, many papers were accepted! Very good 2022 year :)
I'll be giving a talk next week at Goteborg in Sweden :)
Benjamin Muller defended his thesis in November 2022, congrats!
We got a paper accepted at NAACL, bravo Ghazi and Joseph !
I gave an interview in January on France Culture for La méthode scientifique ("7mn reportage")
Great news! I gave a tutorial at the NLP winter school in the Alps (Sometime between January 17th and January 21st, 2022) !!
Also, I gave a cool talk at the ENS Lyon as part of their IXXI seminar series. Recording (in French) here
I'm invited to give a talk at the DFKI on CamemBERT and character-based BERT models (October 12th)!
Another cool year (2021): One huge European project accepted! 3 papers accepted at EACL, NAACL and EMNLP ! Many work-in-progress papers accepted at cool workshops! Congrats to my students!
Great year so far (2020) (besides that whole global pandemic of course): 3 papers accepted at ACL, one at LREC. Co-organization of the first two IWPT shared tasks on parsing Enhanced Universal Dependencies and super interesting collaborative work on UGC treebanking.
Another talk given at the NLP Paris Meetup last January about CamemBERT, the French contextual language model that was actually the first large-scale BERT model to be released for a language other than English. A talk on the same topic was also given in front of all of Axa's worldwide R&D groups.
Together with Marie Candito, I'll be organizing the 2019 edition of the Treebanks and Linguistic Theories conference in Paris. It'll be a joint event with Depling 2019 and the UD 2019 workshop. (Actually it was a whole week of conferences, called the SyntaxFest 2019. Check that out: SyntaxFest 2019 Website)
Our 2017 #ParsingTragedy system was re-evaluated and finally ranked #3 overall :) This year's system (mostly by my students Ganesh Jawahar and Benjamin Muller) ranked #10, with a nice model though (ELMo and external lexicon features, Dozat's neural model).
Teaming up with the Stanford NLP group (Sebastian Schuster and Chris Manning). We ranked #1 and #3 at the Extrinsic Evaluation 2017 Shared Task :)
We unofficially scored #6 at the #ParsingTragedy shared task. Why unofficial? Because a bug in the official metadata led our parser to fall back to delexicalized mode for all languages. Official ranking: #26, but #3 on POS tagging. Check out our paper
(...)
Co-organizing the SPMRL 2014 Shared Task (with Reut Tsarfaty and Sandra Kübler)
Check it out :
http://www.spmrl.org/spmrl2014-sharedtask.html
It's still ongoing. This year with an emphasis on semi-supervised parsing!
Honorably participating in SemEval Task 8 on broad-coverage semantic parsing (two transition-based graph parsers + syntactic features)
Co-organizing the SPMRL 2013 Shared Task (with Reut Tsarfaty and Sandra Kübler)
Check it out :
http://www.spmrl.org/spmrl2013-sharedtask.html
More up-to-date wiki is here :
http://dokufarm.phil.hhu.de/spmrl2013/doku.php
Ranked #2 and #3 on the constituency parsing track of the SANCL 2012 Google shared task, with a system based on self-training, delicate part-of-speech, normalisation and hard clustering (with Benoit Sagot)
(last updated May 12th, 2025)