A METHOD OF AUTHORIAL REPRESENTATIONS OF TEXTS FORMING USING CONTRASTIVE LEARNING

VIKTORIIA BADZ; VASYL TESLYUK

doi:10.31891/2307-5732-2026-365-94

Authors

VIKTORIIA BADZ, Lviv Polytechnic National University, https://orcid.org/0009-0002-8114-2723
VASYL TESLYUK, Lviv Polytechnic National University, https://orcid.org/0000-0002-5974-9310

Keywords:

authorship attribution, contrastive learning, transformer models, text embeddings, metric learning, latent space, stylometry

Abstract

The problem of authorship attribution remains one of the fundamental challenges in computational linguistics, digital forensics, and intelligent information systems, particularly in the context of rapidly growing volumes of unstructured textual data. Although modern transformer-based architectures provide high-quality contextual embeddings, their latent representations are not explicitly optimized for discriminating between authorial styles. As a result, texts produced by different authors may form overlapping clusters in the embedding space, which negatively affects classification robustness and interpretability.

The paper presents a method for forming authorial representations of texts using supervised contrastive learning, aimed at improving the separability of author classes in the feature space. The proposed approach integrates transformer-based encoders with a contrastive metric learning module that explicitly shapes the embedding geometry by minimizing intra-class variance and maximizing inter-class distances. Positive and negative text pairs are constructed from author labels (texts by the same author form positive pairs, texts by different authors form negative pairs), and a contrastive loss function is applied to enforce discriminative representation learning. The method comprises five stages: text preprocessing, contextual embedding extraction, pair construction, contrastive optimization, and author-level aggregation followed by classification.
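The contrastive optimization stage can be illustrated with a short sketch. The abstract does not state which contrastive loss is used, so the example below assumes a common supervised contrastive (SupCon-style) formulation over a batch of text embeddings; the function name `supervised_contrastive_loss`, the `temperature` value, and the toy embeddings are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, author_ids, temperature=0.1):
    """SupCon-style loss over a batch (an assumed formulation, not the paper's).

    embeddings: array of shape (n_texts, dim), e.g. encoder outputs.
    author_ids: array of shape (n_texts,), one author label per text.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(author_ids)
    n = len(labels)

    # L2-normalise so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature

    # A text is never compared with itself.
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)

    # Row-wise log-softmax over all other texts in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(
        np.exp(sim - row_max).sum(axis=1, keepdims=True)
    )

    # Positive pairs: same author, different text. All other pairs are negatives.
    positive = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = positive.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0  # no positive pairs in this batch

    # Average log-probability of positives per anchor, then over anchors.
    sum_pos = np.where(positive, log_prob, 0.0).sum(axis=1)
    per_anchor = -sum_pos[has_pos] / pos_count[has_pos]
    return float(per_anchor.mean())


# Toy batch: two tight groups of 2-D "embeddings" (hypothetical data).
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_aligned = supervised_contrastive_loss(toy, [0, 0, 1, 1])
loss_mixed = supervised_contrastive_loss(toy, [0, 1, 0, 1])
```

In this toy batch the loss is lower when the author labels match the geometric groups (`loss_aligned`) than when they cut across them (`loss_mixed`), which is the pressure that pulls same-author texts together and pushes different authors apart during training.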

Experimental evaluation was conducted on benchmark authorship attribution datasets, including PAN-2019, IMDB62, and the Blog Authorship Corpus. The proposed method was compared with baseline transformer classifiers trained without contrastive optimization. The results show a consistent improvement in classification accuracy, macro-averaged F1-score, and clustering quality metrics: the contrastive framework makes embeddings of texts by the same author more compact while increasing the distances between clusters of different authors.
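Embedding compactness and cluster separation can be checked with a simple diagnostic. The abstract does not name the clustering quality metrics that were used, so the sketch below assumes one plain option: the mean cosine distance between texts of the same author compared with the mean cosine distance between texts of different authors. The function name `intra_inter_cosine_distance` and the sample data are hypothetical.

```python
import numpy as np


def intra_inter_cosine_distance(embeddings, author_ids):
    """Return (mean same-author distance, mean different-author distance).

    Assumes the batch contains at least one same-author pair and at least
    one different-author pair.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(author_ids)

    # Cosine distance = 1 - cosine similarity of L2-normalised vectors.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    dist = 1.0 - z @ z.T

    # Count each unordered pair once and skip the diagonal.
    upper = np.triu(np.ones(dist.shape, dtype=bool), k=1)
    same_author = labels[:, None] == labels[None, :]

    intra = dist[upper & same_author]
    inter = dist[upper & ~same_author]
    return float(intra.mean()), float(inter.mean())


# Hypothetical embeddings for two authors, two texts each.
sample = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
intra_mean, inter_mean = intra_inter_cosine_distance(sample, [0, 0, 1, 1])
```

A smaller intra-author mean together with a larger inter-author mean indicates more compact and better separated author clusters; computing the pair of values before and after contrastive training gives a quick view of the effect described above.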

The scientific contribution of this study lies in the development of a supervised contrastive learning framework specifically tailored to the formation of authorial representations. The practical significance of the results lies in improving the reliability of automated authorship attribution systems and enabling their application in digital forensics, plagiarism detection, cybersecurity monitoring, and large-scale text analytics. The proposed method can be extended to multilingual and cross-domain scenarios, forming a foundation for further research in discriminative author modeling and metric learning in natural language processing.
