Abstract
This study explores linguistic individuality, that is, each individual’s unique repertoire of recurrently used units (sequences of words, morphemes or parts of speech), through a corpus-based analysis. Whilst previous research has tended to focus on collective linguistic features, this study targets fine-grained, individual-specific patterns that can be identified through computational authorship verification techniques. Some of these sequences are highly specific to an individual, such as Tony Blair’s use of "entirely understand" (Mollin 2009). However, the repertoires of different individuals also intuitively overlap, for instance in very common lexical bundles such as "I said to him" (Biber et al. 2021). As such, it is more often the combination of a large number of core grammatical constructions, rather than a small number of noticeably idiosyncratic phrases, that results in greater variation between authors than within one individual’s language (Barlow 2013). The present study found that, across 18 authors who each wrote two summaries of the same text 30 days apart, only one character 7-gram occurred in all of the texts. Every author used at least one long character n-gram (7-9 characters) in both of their texts that no other author used. The study also explores whether the component units within the n-grams differ between those that are consistent and entirely individual and those that are consistent but shared with other members of the group. This research enhances our understanding of why authorship analysis methods work, provides empirical evidence for cognitive linguistic theories of individuality, which only a limited number of existing studies have investigated, and exemplifies the benefits and possibilities of applying corpus linguistic methodologies to authorship analysis problems.
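The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the study's actual pipeline. The function names, the normalisation (lowercasing and collapsing whitespace) and the operational definitions are all assumptions made for illustration: an n-gram counts as "consistent" if it occurs in every text by an author, as "unique" if it is consistent and occurs in no text by any other author, and as "shared" if it is consistent but also occurs in at least one other author's text.

```python
def char_ngrams(text, n_min=7, n_max=9):
    """Return the set of character n-grams of length n_min..n_max in text."""
    # Assumed normalisation: lowercase and collapse runs of whitespace.
    text = " ".join(text.lower().split())
    return {
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    }


def profile_authors(texts_by_author, n_min=7, n_max=9):
    """Split each author's consistent n-grams into unique and shared sets.

    texts_by_author maps an author label to a list of that author's texts;
    every author is assumed to have at least one text.
    """
    grams = {
        author: [char_ngrams(t, n_min, n_max) for t in texts]
        for author, texts in texts_by_author.items()
    }
    result = {}
    for author, sets in grams.items():
        # N-grams that occur in every text by this author.
        consistent = set.intersection(*sets)
        # N-grams that occur in any text by any other author.
        others = set().union(
            *(s for a, ss in grams.items() if a != author for s in ss)
        )
        result[author] = {
            "unique": consistent - others,
            "shared": consistent & others,
        }
    return result


def common_to_all(texts_by_author, n=7):
    """Return the character n-grams of length n that occur in every text."""
    all_sets = [
        char_ngrams(t, n, n)
        for texts in texts_by_author.values()
        for t in texts
    ]
    return set.intersection(*all_sets)


# Invented toy data, not drawn from the study's corpus.
toy = {
    "A": ["I entirely understand the point made here.",
          "We entirely understand why it matters."],
    "B": ["The point made here is clear to me.",
          "It is clear to me that the point stands."],
}
profiles = profile_authors(toy)
```

In this toy data, "entirel" is consistent for author A and absent from B's texts, so it lands in A's unique set, whereas "the poi" is consistent for B but also appears in one of A's texts, so it lands in B's shared set.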
List of references
Barlow, Michael. 2013. Individual Differences and Usage-based Grammar. International Journal of Corpus Linguistics 18, 443-478.
Biber, Douglas, Stig Johansson, Geoffrey N Leech, Susan Conrad & Edward Finegan. 2021. Lexical Expressions in Speech and Writing. In Grammar of Spoken and Written English, 979-1030. Amsterdam: John Benjamins Publishing Company.
Mollin, Sandra. 2009. “I entirely understand” is a Blairism: The Methodology of Identifying Idiolectal Collocations. International Journal of Corpus Linguistics 14, 367-392.
Original language | English |
---|---|
Publication status | Published - 30 Jun 2025 |
Event | International Corpus Linguistics Conference, Aston University/Birmingham City University/University of Birmingham, Birmingham, United Kingdom. Duration: 30 Jun 2025 → 3 Jul 2025. Conference number: 13 |
Conference
Conference | International Corpus Linguistics Conference |
---|---|
Abbreviated title | CL2025 |
Country/Territory | United Kingdom |
City | Birmingham |
Period | 30/06/25 → 3/07/25 |
Impacts
Forensic linguistic authorship analysis of disputed texts
Nini, A. (Participant)
Impact: Legal impacts, Societal impacts