Assessing the suitability of forensic authorship analysis methodologies for speech data

James Tompkinson, Andrea Nini

Research output: Contribution to conference › Abstract › peer-review

Abstract

The development of new analytical methods and frameworks that could be integrated into forensic speaker comparison (FSC) work is a core focus of research in forensic speech science. In this paper, we explore the applicability to speech data of methods that have been used in forensic authorship analysis (FAA). Our work addresses two main questions: 1) whether methods borrowed from authorship analysis can be used to analyse discrete phonetic variables within a likelihood-ratio-based framework, and 2) whether combining auditory phonetic analysis with “higher-order” features (Gold and French 2011) such as lexis, grammar and morphology, which are frequently considered in FAA tasks, can be used for speaker comparison. Our work builds on research by Sergidou et al. (2023), who showed that frequent words have some speaker-discriminatory power and argued that this could be useful in FSC casework. We extend this work to examine how phonetic variation can be incorporated into such a framework. We analysed transcribed speech data from a random sample of 30 speakers from the West Yorkshire Regional English Database (Gold 2020) across two different speaking styles (Task 1 and Task 2), using two well-known authorship analysis methods which incorporate the likelihood ratio (LR) framework: Cosine Delta (Ishihara 2021) and Phi n-gram tracing (Nini 2023). We applied these methods to transcripts which had been adapted to represent a range of phonetic features (vocalised hesitation markers, syllable-initial realisations of /θ/, intervocalic word-medial /t/, syllable-initial /l/ and realisations of the -ing suffix) to assess 1) whether algorithms used in FAA are similarly effective on phonetic feature sets of this kind and 2) whether the combination of “higher-order” linguistic features with segmental phonetic analysis would achieve greater speaker-discriminatory power.
Our findings support previous research which has suggested that methods used to discriminate between authors can be usefully applied to transcribed speech data. We find that Cosine Delta and Phi n-gram tracing are both effective in performing speaker comparison on transcribed speech data. In addition, our results show that a logistic-regression-calibrated Cosine Delta using the consonant phonetic features alone already offers valuable information. The analytical framework for this project, in which phonetic information is embedded in transcripts and then subjected to authorship analysis techniques using the likelihood ratio paradigm, could potentially be used to evaluate auditory phonetic variables systematically within a likelihood-ratio approach, even when the phonetic features are discrete.
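To illustrate the kind of comparison score involved, the following is a minimal sketch of a Cosine Delta style distance, assuming the common formulation in which relative frequencies of the most frequent tokens are z-scored across samples and compared by cosine distance. It is not the implementation used in the study. The function names, the toy transcripts, the use of spellings such as "walkin" to encode a realisation of the -ing suffix, and the small feature count are all hypothetical choices made for illustration. The calibration step that turns such scores into likelihood ratios is not shown.

```python
import math
from collections import Counter


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary item in one transcript."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total for w in vocab]


def zscore_columns(rows):
    """z-score each feature (column) across all samples (rows).

    Assumption: population standard deviation; features with zero
    variance are mapped to 0.0 so they do not affect the distance.
    """
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / n)
           for c, m in zip(cols, means)]
    return [[(x - m) / s if s > 0 else 0.0
             for x, m, s in zip(row, means, sds)]
            for row in rows]


def cosine_distance(u, v):
    """1 minus the cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def cosine_delta(samples, n_features=4):
    """Pairwise distances between samples over the most frequent tokens."""
    pooled = Counter(t for toks in samples.values() for t in toks)
    vocab = [w for w, _ in pooled.most_common(n_features)]
    names = list(samples)
    z = zscore_columns([relative_freqs(samples[n], vocab) for n in names])
    vec = dict(zip(names, z))
    return {(a, b): cosine_distance(vec[a], vec[b])
            for i, a in enumerate(names) for b in names[i + 1:]}


# Hypothetical toy transcripts: two samples from speaker A, one from B.
toy_samples = {
    "A1": "um i was walkin and talkin um".split(),
    "A2": "um i was walkin and talkin um".split(),
    "B1": "uh i was walking and talking uh".split(),
}
distances = cosine_delta(toy_samples)
for pair, d in distances.items():
    print(pair, round(d, 3))
```

In a realistic setting the feature set would be far larger than four tokens, and the raw distances would serve only as scores to be calibrated before any likelihood ratio is reported.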

References
Gold, E. (2020). WYRED - West Yorkshire Regional English Database 2016-2019. [data collection]. UK Data Service. SN: 854354, DOI: 10.5255/UKDA-SN-854354
Ishihara, S. (2021). Score-based likelihood ratios for linguistic text evidence with a bag-of-words model. Forensic Science International, 327, 110980.
Nini, A. (2023). A Theory of Linguistic Individuality for Authorship Analysis. Elements in Forensic Linguistics. Cambridge University Press.
Sergidou, E. K., Scheijen, N., Leegwater, J., Cambier-Langeveld, T., & Bosma, W. (2023). Frequent-words analysis for forensic speaker comparison. Speech Communication, 150, 1-8.
Original language: English
Number of pages: 29
Publication status: Published - 21 Jul 2025
Event: Annual Conference of the International Association for Forensic Phonetics and Acoustics - Leiden University, The Hague, Netherlands
Duration: 20 Jul 2025 → 23 Jul 2025
Conference number: 33
https://www.universiteitleiden.nl/en/events/2025/07/iafpa

Conference

Conference: Annual Conference of the International Association for Forensic Phonetics and Acoustics
Abbreviated title: IAFPA
Country/Territory: Netherlands
City: The Hague
Period: 20/07/25 → 23/07/25
