Machine learning has revealed exactly how much of a Shakespeare play was written by someone else

Literary analysts have long noticed the hand of another author in Shakespeare’s Henry VIII. Now a neural network has identified the specific scenes in question—and who actually wrote them.

Emerging Technology from the arXiv

November 22, 2019

For much of his life, William Shakespeare was the house playwright for an acting company called the King’s Men that performed his plays on the banks of the River Thames in London. When Shakespeare died in 1616, the company needed a replacement and turned to one of the most prolific and famous playwrights of the time, a man named John Fletcher.

Fletcher’s fame has since faded. But in 1850, a literary analyst named James Spedding noticed a remarkable similarity between Fletcher’s plays and passages in Shakespeare’s Henry VIII. Spedding concluded that Fletcher and Shakespeare must have collaborated on the play.

The evidence comes from studies of each author’s linguistic idiosyncrasies and how they crop up in Henry VIII. For example, Fletcher often writes ye instead of you, and ’em instead of them. He also tended to add the word sir or still or next to a standard ten-syllable pentameter line, creating an extra eleventh syllable.
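To make the idea of a linguistic marker concrete, here is a minimal Python sketch that measures how often a passage prefers ye over you and ’em over them. The function names and the list of marker pairs are hypothetical, chosen only to mirror the examples above; this is not the procedure Spedding or Plecháč used.

```python
import re
from collections import Counter

# Hypothetical marker pairs based on the examples in the text:
# (form Fletcher is said to prefer, the more usual form).
MARKER_PAIRS = [("ye", "you"), ("'em", "them")]

def tokenize(text):
    """Lowercase the text and split it into words.

    A leading apostrophe is kept so that 'em survives as a token.
    Curly apostrophes are normalized to straight ones first.
    """
    text = text.replace("\u2019", "'")
    return re.findall(r"'?[a-z]+", text.lower())

def marker_preferences(text):
    """For each pair, return the share of uses that take the preferred form.

    The value is None when neither form appears in the text.
    """
    counts = Counter(tokenize(text))
    prefs = {}
    for preferred, usual in MARKER_PAIRS:
        total = counts[preferred] + counts[usual]
        prefs[preferred] = counts[preferred] / total if total else None
    return prefs

if __name__ == "__main__":
    sample = "Tell 'em ye are come; let them know you and ye agree."
    print(marker_preferences(sample))
```

A passage that scores high on both markers looks more like Fletcher by this crude measure, although a handful of words on their own would be weak evidence.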

These characteristics allowed Spedding and other analysts to suggest that Fletcher must have been involved. But exactly how the play was divided is highly disputed. And other critics have suggested that another English dramatist, Philip Massinger, was actually Shakespeare’s coauthor.

Which is why analysts and historians would dearly love to determine, once and for all, who wrote which parts of Henry VIII.

Enter Petr Plecháč at the Czech Academy of Sciences in Prague, who says he has solved the problem using machine learning to identify the authorship of more or less every line of the play. “Our results highly support the canonical division of the play between William Shakespeare and John Fletcher proposed by James Spedding,” says Plecháč.

The new approach is straightforward in principle. Machine-learning algorithms have been used for some years to identify distinctive patterns in the way authors write.

The technique uses one body of the author’s work to train the algorithm and a different, smaller body of work to test it. However, because an author’s literary style can change over a lifetime, it is important that the training works share the style of the text in question, which in practice means choosing works written in the same period.

Once the algorithm has learned the style in terms of the most commonly used words and rhythmic patterns, it is able to recognize it in texts it has never seen.
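As a rough illustration of what "style in terms of the most commonly used words" can mean, the Python sketch below turns a text into a vector of relative word frequencies. The function names and the vocabulary size are assumptions made for this example, and rhythmic patterns are left out because the article does not describe how they are encoded.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))

def most_frequent_words(texts, n):
    """Return the n most common words across a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n)]

def frequency_vector(text, vocabulary):
    """Return the relative frequency of each vocabulary word in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in vocabulary]

if __name__ == "__main__":
    training_texts = ["the cat and the dog", "the bird"]
    vocabulary = most_frequent_words(training_texts, 3)
    print(vocabulary)
    print(frequency_vector("the cat the", vocabulary))
```

Vectors like these, computed for passages of known authorship, are the kind of input a classifier can be trained on.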

Plecháč follows exactly this technique. He first trains the algorithm to recognize Shakespeare’s style using other plays written at the same time as Henry VIII. These plays are The Tragedy of Coriolanus, The Tragedy of Cymbeline, The Winter’s Tale, and The Tempest.

He then trains the algorithm to recognize the work of John Fletcher using plays he wrote at this time—Valentinian, Monsieur Thomas, The Woman’s Prize, and Bonduca.

Finally, he lets the algorithm loose on Henry VIII and asks it to determine the author of the text, using a rolling window technique to scroll through the play.
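The rolling window idea can be sketched in a few lines of Python. The article does not say which classifier Plecháč used, so this example swaps in a simple nearest-profile rule over word frequencies. The window size, step, vocabulary and all function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))

def profile(lines, vocabulary):
    """Relative frequencies of the vocabulary words in a list of lines."""
    tokens = [tok for line in lines for tok in tokenize(line)]
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in vocabulary]

def distance(a, b):
    """Squared Euclidean distance between two frequency vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rolling_attribution(play_lines, author_lines, vocabulary, window=5, step=1):
    """Slide a window over the play and label each position.

    Each window gets the author whose profile is nearest (a stand-in
    for a trained classifier). Returns (start_index, author) pairs.
    """
    profiles = {name: profile(lines, vocabulary)
                for name, lines in author_lines.items()}
    results = []
    last_start = max(len(play_lines) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = profile(play_lines[start:start + window], vocabulary)
        best = min(profiles, key=lambda name: distance(chunk, profiles[name]))
        results.append((start, best))
    return results

if __name__ == "__main__":
    authors = {"A": ["you and you see them"], "B": ["ye and ye see 'em"]}
    play = ["you know them", "you said you", "ye know 'em", "ye said ye"]
    vocab = ["you", "ye", "them", "'em"]
    for start, author in rolling_attribution(play, authors, vocab, window=2):
        print(start, author)
```

Because consecutive windows overlap, the label can shift gradually from one author to the other, which is how a method of this kind can flag a change of hand in the middle of a scene rather than only at scene boundaries.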

The results are interesting. They tend to agree with Spedding’s analysis that Fletcher wrote scenes amounting to almost half the play. However, the algorithm allows a more fine-grained approach that reveals how the authorship sometimes changes not just for new scenes, but also towards the end of previous ones. For example, in Act 3, Scene 2, the model suggests a mixed authorship after line 2081 and finds that Shakespeare takes over completely at line 2200, before the start of Act 4, Scene 1.

Plecháč also trained his model to recognize the work of Philip Massinger but finds little evidence of his involvement. “The participation of Philip Massinger is rather unlikely,” he concludes.

That’s interesting work that shows how linguists and literary analysts are using machine learning to better understand our literary past.

However, there is much work ahead. For example, when machine vision algorithms were trained to recognize artistic style, computer scientists quickly worked out how to extract a style and apply it to other images, using a technique known as neural style transfer. Overnight, it became possible to give an ordinary photograph the style of a Van Gogh or a Monet.

That raises the question of whether a similar technique is possible for text. Might it be possible to transform an essay, or indeed an article for MIT Technology Review, into the style of Shakespeare or John Fletcher, for example?

Sadly, not yet, other than in the trivial way of replacing words like them with ’em and so on. This is largely because the underlying structure of communication is not well enough understood by linguists or their algorithmic charges.

Ref: arxiv.org/abs/1911.05652 : Relative contributions of Shakespeare and Fletcher in Henry VIII: An Analysis Based on Most Frequent Words and Most Frequent Rhythmic Patterns