Research Article
Towards a Set of Morphosyntactic Labels for the Fulani Language: An Approach Inspired by the EAGLES Recommendations and Fulani Grammar
Zouleiha Alhadji Ibrahima,
Charles Moudina Varmantchaonala,
Dayang Paul*
,
Kolyang
Issue:
Volume 9, Issue 2, December 2025
Pages:
110-121
Received:
4 June 2025
Accepted:
27 June 2025
Published:
21 July 2025
DOI:
10.11648/j.ajai.20250902.12
Downloads:
Views:
Abstract: This paper details the development of a morphosyntactic label set for the Adamawa dialect of the Fulani language (Fulfulde), addressing the critical lack of digital resources and automatic processing tools for this significant African language. The primary objective is to facilitate the creation of a training corpus for morphosyntactic tagging, there by aiding linguists and advancing Natural Language Processing (NLP) applications for Fulani. The proposed label set is meticulously constructed based on a dual methodological approach: it draws heavily from the well-established EAGLES (Expert Advisory Group on Language Engineering Standards) recommendations to ensure corpus reuse and cross-linguistic comparability, while simultaneously incorporating an in-depth analysis of Fulani grammatical specificities. This adaptation is crucial given the morphological richness and complex grammatical structure of Fulani, including its elaborate system of approximately 25 noun classes, unique adjective derivations, and intricate verbal conjugations. The resulting tagset comprises 15 mandatory labels and 54 recommended labels. While some EAGLES categories like "article" and "residual" are not supported, new categories such as "participle," "ideophone," "determiner," and "particle" are introduced to capture the nuances of Fulani grammar. The recommended tags further detail the mandatory categories, subdividing nouns into proper, common singular, and common plural; verbs based on voice and conjugation (infinitive active, middle, passive; conjugated active affirmative/negative, middle affirmative/negative, passive affirmative/negative); and adjectives and pronouns into more specific types based on demonstrative, possessive, subject, object, relative, emphatic, interrogative, and indefinite functions. Participles are divided into singular and plural, adverbs into time, place, manner, and negation, numbers into singular and plural, and determiners into singular and plural. Particles are further broken down into dicto-modal, abdominal, interrogative, emphatic, postposed, and postposed negative. The categories of preposition, conjunction, interjection, unique, punctuation, and ideophone remain indivisible. This meticulously defined tag set was utilized to manually annotate 5,186 words from Dominique Noye’s Fulfulde-French dictionary, creating a valuable, publicly accessible resource for linguistic research and NLP development. Furthermore, the paper outlines a robust workflow for automatic morphosyntactic tagging of Fulfulde sentences, leveraging a Hidden Markov Model (HMM) in conjunction with the Viterbi algorithm. This approach, which extracts transition and emission probabilities from the annotated corpus, enables the disambiguation of morphosyntactic categories within context, considering the specific syntactic and lexical patterns of the Adamawa dialect. Ultimately, this work significantly contributes to the digitization and standardization of the Fulani language, enhancing the performance of linguistic tools and fostering its integration into digital technologies and multilingual systems.
Abstract: This paper details the development of a morphosyntactic label set for the Adamawa dialect of the Fulani language (Fulfulde), addressing the critical lack of digital resources and automatic processing tools for this significant African language. The primary objective is to facilitate the creation of a training corpus for morphosyntactic tagging, the...
Show More