Bayesian Uncertainty Meets Stylometry: Rethinking Trust in Authorship Attribution

Iqra Zahid

Department of Computer Science

Student thesis: PhD

Abstract

Authorship Attribution (AA) is a foundational task in Natural Language Processing that seeks to identify the author of a given text by uncovering consistent linguistic and stylistic patterns. While classical stylometric approaches, which rely on handcrafted features, have shown efficacy, they often struggle with scalability. At the same time, the rise of large language models (LLMs) has led to notable performance gains in AA, but these models remain opaque and difficult to interpret, particularly in high-stakes or low-resource scenarios. This thesis bridges the gap between traditional stylometry and modern deep learning, offering a unified framework that prioritises interpretability, generalisability, and predictive transparency. We begin by examining the reliability of linguistic feature-based methods across diverse domains, including human- and machine-generated texts, and introduce the largest static feature set to date for stylometric analysis. These features are then used to explore authorial fingerprints across languages, modalities, and text types. To address the limitations of single-loss neural models, we propose a multi-loss fusion framework that integrates contrastive and angular losses with standard classification objectives. This improves model robustness and class separation in challenging multiclass attribution tasks. Building on this, we develop BEDAA (Bayesian Enhanced DeBERTa for Authorship Attribution), which introduces Monte Carlo Dropout for uncertainty-aware predictions, enabling calibrated, interpretable outputs without modifying the underlying architecture. Together, these contributions chart a methodological evolution from static stylometry to deep representation learning and Bayesian uncertainty estimation. The resulting framework demonstrates improved accuracy, robustness, and interpretability across cross-domain, multilingual, and machine-generated authorship attribution tasks, including source code. This work lays a foundation for more trustworthy and generalisable authorship analysis in real-world applications.

Date of Award	9 Jun 2025
Original language	English
Awarding Institution	The University of Manchester
Supervisor	Riza Theresa Batista-Navarro (Supervisor) & Andrea Nini (Supervisor)

Keywords

NLP
Authorship Attribution
Uncertainty Quantification
