Defending Against Authorship Identification Attacks

🔬 Research Summary by Haining Wang, a Ph.D. student at Indiana University Bloomington, specializing in natural language processing and large language models.

[Original paper by Haining Wang]

Overview: Writings reveal one’s identity, even when personally identifying information is removed or the text is sent through Tor or protected by end-to-end encryption. This manuscript surveys the techniques used to uncover individual authors’ identities and the defenses against them, and closes with practical suggestions for resisting such attacks.

Introduction

What are authorship identification attacks?

Suppose you have just discovered that your company is engaging in unethical activities. Driven by a sense of justice, you decide to blow the whistle. To avoid retaliation, you choose to post a few lines on social media using a throwaway email address and a masked IP address. Before hitting ‘enter,’ a thought strikes you: does this precaution truly guarantee anonymity?

The answer is NO. Studies have shown that a person’s writing style can reveal their identity.

Using only the frequency distribution of the most common words, one basic machine learning method identifies the correct author 70% of the time from a pool of 40 candidates. Given that a company can access an employee’s past emails and reports, the chances of evading such linguistic forensics are slim.
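To make the idea concrete, here is a minimal sketch of one classic most-frequent-words approach, Burrows' Delta. It is not necessarily the method behind the 70% figure. The sketch assumes each candidate is represented by one sample of known writing and uses a deliberately crude tokenizer; the names `alice` and `bob` and their texts are invented for illustration. A disputed text is attributed to the candidate whose z-scored word frequencies are closest on average.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def rel_freqs(tokens: list[str], vocab: list[str]) -> list[float]:
    """Relative frequency of each vocabulary word in a token list."""
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in vocab]


def burrows_delta(known: dict[str, str], disputed: str, k: int = 30) -> str:
    """Attribute `disputed` to the candidate with the smallest Delta score."""
    tokenized = {author: tokenize(text) for author, text in known.items()}
    # Feature set: the k most frequent words across all known writing.
    corpus = Counter(tok for toks in tokenized.values() for tok in toks)
    vocab = [w for w, _ in corpus.most_common(k)]
    profiles = {a: rel_freqs(toks, vocab) for a, toks in tokenized.items()}

    # Standardize each word's frequency across the candidates (z-scores).
    n = len(profiles)
    means = [sum(p[i] for p in profiles.values()) / n for i in range(len(vocab))]
    # A word used at the same rate by every candidate gets std 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((p[i] - means[i]) ** 2 for p in profiles.values()) / n) or 1.0
        for i in range(len(vocab))
    ]

    def zscores(freqs: list[float]) -> list[float]:
        return [(f - m) / s for f, m, s in zip(freqs, means, stds)]

    target = zscores(rel_freqs(tokenize(disputed), vocab))

    def delta(author: str) -> float:
        # Mean absolute difference between the candidate's and the disputed z-scores.
        z = zscores(profiles[author])
        return sum(abs(x - y) for x, y in zip(z, target)) / len(vocab)

    return min(profiles, key=delta)


known = {
    "alice": "the cat sat on the mat and the dog sat on the rug and the bird sat on the perch",
    "bob": "a cat is by a tree but a dog is by a house but a bird is by a nest",
}
print(burrows_delta(known, "the fox sat on the log and the owl sat on the branch"))  # prints: alice
```

The disputed sentence shares no topic words with Alice's sample beyond ‘sat’; it is her habitual ‘the … on … and’ pattern that gives her away.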

In this digital era, every text, whether a tweet, a blog post, or a research paper, can potentially be used to link its author to their later writings, a task known as authorship identification. Misuse of authorship identification raises significant privacy concerns, particularly for whistleblowers, journalists, activists, and individuals living under oppressive regimes.

Key Insights

How are people fingerprinted through their writings?

The devil is in the details. Modern machine learning-based methods predominantly exploit telltale signs that authors are least aware of, such as their use of function words. These words, like ‘is’ and ‘that,’ carry minimal semantic weight but are essential for grammar. Such indicators, characterized by their high frequency, wide dispersion, and independence from content, are deeply embedded in one’s writing style. Even if we refrain from using our favorite emojis and spelling variations (e.g., writing ‘folx’ instead of ‘folks’), it is difficult to disrupt the overall pattern of function word use. Indeed, both field studies and analyses of large text corpora indicate that individuals’ writing styles can be distinguished by their word choice and syntax.
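The following sketch illustrates why surface edits barely touch a function-word fingerprint. The 15-word list is a tiny subset chosen only for this example (practical systems use much longer lists), and the "profile" is simply each function word's rate per 100 word tokens.

```python
import re
from collections import Counter

# Illustrative subset of English function words (an assumption for this sketch;
# real stylometric systems use lists of hundreds of such words).
FUNCTION_WORDS = ["the", "a", "an", "of", "to", "and", "in", "that",
                  "is", "it", "for", "on", "with", "as", "but"]


def function_word_profile(text: str) -> dict[str, float]:
    """Rate of each function word per 100 word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())  # emojis and punctuation are ignored
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return {w: 100 * counts[w] / total for w in FUNCTION_WORDS}


original = "Hey folks, the plan is that we meet on Monday and talk it over 😀"
disguised = "Hey folx, the plan is that we meet on Monday and talk it over"

# Dropping the emoji and respelling 'folks' leaves the function-word profile unchanged.
print(function_word_profile(original) == function_word_profile(disguised))  # True
```

The cosmetic changes a writer is most likely to notice are exactly the ones this kind of feature ignores.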

How can one defend against authorship fingerprinting?

Manual obfuscations

One may simply conceal one’s style by deliberately writing differently; another approach is to imitate an author with a distinctive voice, such as Cormac McCarthy. Field studies have shown that non-professionals can effectively alter their style in a new 500-word essay, reducing the accuracy of standard authorship identification models trained on their previous writings to nearly chance level. This is a valuable fallback: it requires only manual effort and works even when trustworthy computational resources are unavailable. However, extra caution is necessary for messages longer than a few paragraphs or when long-term anonymity is desired.

Rules-based obfuscations

Of course, there are also tools for obfuscating one’s style. For instance, a document can be manipulated automatically with a set of rules that alter sentence structure, spelling, punctuation, and word choice. Compared to manual evasion, where the writer keeps the meaning intact as a matter of course, such rules must be explicitly designed so that the altered document still conveys its intended meaning. (Randomly replacing every word in an essay would make the author hard to detect, but the result would not be very useful.) For this reason, synonym substitution is the most popular rule-based method: researchers replace original words with synonyms (or near-synonyms) drawn from thesauri (e.g., WordNet) or word embeddings (e.g., GloVe) to disrupt the patterns linking the current text to previous writings.
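A minimal sketch of rule-based synonym substitution might look like this. The hand-written `THESAURUS` is a hypothetical stand-in for WordNet or embedding neighbours, and the single rule simply takes the first listed synonym while preserving initial capitalization. Real systems also check context and part of speech, which this sketch does not.

```python
import re

# Hypothetical mini-thesaurus standing in for WordNet or embedding neighbours.
THESAURUS = {
    "big": ["large", "huge"],
    "quickly": ["rapidly", "swiftly"],
    "said": ["stated", "remarked"],
    "buy": ["purchase"],
}


def substitute_synonyms(text: str) -> str:
    """Replace every word that has a thesaurus entry with its first listed synonym."""
    def replace(match: re.Match) -> str:
        word = match.group(0)
        options = THESAURUS.get(word.lower())
        if not options:
            return word  # no entry: keep the original word
        new = options[0]
        return new.capitalize() if word[0].isupper() else new

    # Only alphabetic runs are matched, so punctuation and spacing are preserved.
    return re.sub(r"[A-Za-z]+", replace, text)


print(substitute_synonyms("She said they would buy a big house quickly."))
# She stated they would purchase a large house rapidly.
```

Because the rule is fixed, the same input always yields the same output, which is exactly the predictability discussed next.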

Rule-based approaches have proven very effective at evading authorship fingerprinting in a series of authorship identification challenges posed to researchers, perhaps because they can disrupt virtually any aspect of writing style. However, research has shown that such straightforward perturbations are vulnerable to reverse engineering: an adversary could easily build a model to see through the thin disguise. Moreover, badly crafted or overly aggressive rules can make the text look suspicious. Researchers have therefore used heuristics to make the application of rules less rigid and predictable. Although computationally demanding, such methods are harder to anticipate and therefore more effective at obscuring writing style.
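One simple way to make rule application less predictable is sketched below. Each eligible word is rewritten only with probability `p`, and its replacement is drawn at random from the available synonyms. The thesaurus is again a hypothetical stand-in, and the sketch illustrates the general idea rather than any specific published heuristic. A fixed word-to-word mapping could be undone with a reverse lookup table, whereas here which words change, and into what, varies from run to run.

```python
import random
import re
from typing import Optional

# Hypothetical mini-thesaurus standing in for WordNet or embedding neighbours.
THESAURUS = {
    "big": ["large", "huge"],
    "quickly": ["rapidly", "swiftly"],
    "said": ["stated", "remarked"],
    "buy": ["purchase"],
}


def randomized_substitution(text: str, p: float = 0.5, seed: Optional[int] = None) -> str:
    """Replace each eligible word with probability p, choosing a random synonym."""
    rng = random.Random(seed)  # seeded for reproducibility; pass no seed for fresh randomness

    def replace(match: re.Match) -> str:
        word = match.group(0)
        options = THESAURUS.get(word.lower())
        if not options or rng.random() >= p:
            return word  # no entry, or this occurrence was randomly skipped
        new = rng.choice(options)
        return new.capitalize() if word[0].isupper() else new

    return re.sub(r"[A-Za-z]+", replace, text)


sentence = "She said they would buy a big house quickly."
for seed in range(3):
    print(randomized_substitution(sentence, p=0.7, seed=seed))
```

The parameter `p` is the knob between naturalness and disruption: `p=0` leaves the text untouched, while `p=1` rewrites every word the thesaurus covers.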

Obfuscations with generative models

Recent efforts have focused on using generative models to alter writing style. A common tool is round-trip translation: translating text into another language and back, in the hope that the process changes the style along the way. Increasing the number of ‘trips’ and using more diverse languages can strengthen the effect. Another approach is style transfer, a form of monolingual ‘translation’ that suits the task because the goal is a change in style rather than a change in language. For example, a language model can be fine-tuned on different versions of the Bible, taking advantage of the distinctive styles of, say, the King James Version and the International Children’s Bible. However, high-quality parallel corpora within a single language (the same content written in different styles) are hard to find, so researchers resort to training frameworks that do not rely on them. These models have shown varied efficacy in reducing the performance of authorship identification.
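Round-trip translation is easy to express as a small pipeline. The sketch assumes a `translate(text, source, target)` function supplied by whatever local machine translation engine you use. That interface is hypothetical and no real engine is bundled here. The stub in the example only records each hop, to show the order of translations. Adding pivot languages corresponds to taking extra ‘trips’.

```python
from typing import Callable, Sequence

# Hypothetical interface: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def round_trip(text: str, translate: Translator, pivots: Sequence[str], source: str = "en") -> str:
    """Translate text through each pivot language in turn, then back to the source."""
    current, lang = text, source
    for pivot in pivots:
        current = translate(current, lang, pivot)
        lang = pivot
    return translate(current, lang, source)  # final hop back to the original language


# Stub translator that only logs each hop; it does not translate anything.
hops = []


def logging_stub(text: str, src: str, tgt: str) -> str:
    hops.append((src, tgt))
    return text


round_trip("Some sentence.", logging_stub, ["fr", "de", "ja"])
print(hops)  # [('en', 'fr'), ('fr', 'de'), ('de', 'ja'), ('ja', 'en')]
```

Keeping the translator as a plain function argument makes it easy to swap in an offline engine, which matters when the goal is to avoid sending text to a remote server.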

Between the lines

Open research challenges

Despite advances in defending against authorship analysis, significant challenges remain:

How can we introduce randomness into perturbations without compromising the writing’s relevance and naturalness?
Which post-transformation style is most effective: generic, specific, or somewhere in between? Is more complex or simpler better?
How can we develop accessible software and deliver it to users who need to anonymize their writings?

Tips for those seeking anonymity

Unfortunately, no existing software is considered suitable for practical use by general users, mainly because of the difficulty of delivering it: online servers are susceptible to traffic analysis, so it is unrealistic to expect use of such a service to go undetected in the places where it is most needed. Here are some tips if authorship fingerprinting threatens you:

Have faith in yourself and make a determined effort to write differently; you can do it on your own, at least for a 500-word essay.
Use a local translator, such as TranslateLocally; translating into another language and back can help.
Have experience with local large language models such as Llama 2? Good for you! Prompts like ‘Paraphrase the following content…’ are a valid option.
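For the last tip, here is a minimal sketch that sends a paraphrase request to a model running on your own machine, so the text never leaves it. It assumes an Ollama server, one of several local runtimes and not one named above, listening on its default port with a `llama2` model already pulled. It calls Ollama's `/api/generate` endpoint with streaming disabled. The prompt wording is only an example.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_paraphrase_request(text: str, model: str = "llama2") -> dict:
    """JSON body for a single, non-streaming generation request."""
    prompt = (
        "Paraphrase the following content in plain, neutral English. "
        "Keep the meaning but change the wording and sentence structure:\n\n" + text
    )
    return {"model": model, "prompt": prompt, "stream": False}


def paraphrase_locally(text: str, model: str = "llama2", url: str = OLLAMA_URL) -> str:
    """Send the request to the local server and return the generated paraphrase."""
    payload = json.dumps(build_paraphrase_request(text, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # With streaming disabled, the whole answer arrives in the "response" field.
        return json.loads(response.read())["response"]


# Usage (requires the local server to be running):
# print(paraphrase_locally("Your draft text goes here."))
```

Because the endpoint is on `localhost`, nothing in this exchange crosses the network. That addresses the traffic-analysis concern raised above, provided the model itself was obtained safely.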