A View from Emerging Technology from the arXiv
How an AI Algorithm Learned to Write Political Speeches
Political speeches are often written for politicians by trusted aides and confidantes. Could an AI algorithm do as well?
“Ask not what your country can do for you; ask what you can do for your country.”
—John F. Kennedy, 1961
When it comes to political speeches, great ones are few and far between. But ordinary political speeches, those given in U.S. congressional floor debates, for example, are numerous.
They are also remarkably similar. These speeches tend to follow a standard format, repeat similar arguments, and even use the same phrases to indicate a particular political affiliation or opinion. It’s almost as if there is some kind of algorithm that determines their content.
That raises an interesting question. Is it possible for a machine to write these kinds of political speeches automatically?
Today, we get an answer thanks to the work of Valentin Kassarnig at the University of Massachusetts, Amherst, who has created an artificial intelligence machine that has learned how to write political speeches that are remarkably similar to real speeches.
The approach is straightforward in principle. Kassarnig used a database of almost 4,000 political speech segments from 53 U.S. Congressional floor debates to train a machine-learning algorithm to produce speeches of its own.
These speeches comprise more than 50,000 sentences, each containing 23 words on average. Kassarnig also categorized the speeches by political party, whether Democrat or Republican, and by whether they were in favor of or against a given topic.
Of course, the devil is in the details of how to analyze this database. Having tried a number of techniques, Kassarnig settled on an approach based on n-grams, sequences of “n” words or phrases. He first analyzed the text using a part-of-speech approach that tags each word or phrase with its grammatical role (whether a noun, verb, adjective, and so on).
He then looked at 6-grams and the probability of a word or phrase appearing given the five that appear before it. “That allows us to determine very quickly all words which can occur after the previous five ones and how likely each of them is,” he says.
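To make the 6-gram idea concrete, here is a minimal sketch in Python of how such a lookup table might be built and queried. It is an illustration, not Kassarnig's code: the function names and the toy corpus are invented for this example, and the part-of-speech tags are left out for brevity.

```python
from collections import Counter, defaultdict


def build_ngram_table(sentences, n=6):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for words in sentences:
        for i in range(len(words) - n + 1):
            context = tuple(words[i:i + n - 1])
            table[context][words[i + n - 1]] += 1
    return table


def next_word_probabilities(table, context):
    """Return the relative frequency of every word seen after this context."""
    counts = table.get(tuple(context), Counter())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Toy corpus (hypothetical, not taken from the real debate data).
toy_corpus = [
    "i yield back the balance of my time".split(),
    "i yield back the balance of the time".split(),
]
table = build_ngram_table(toy_corpus)
probs = next_word_probabilities(table, "yield back the balance of".split())
print(probs)  # {'my': 0.5, 'the': 0.5}
```

Because the table is keyed on the five preceding words, finding every possible next word and its likelihood is a single dictionary lookup, which is the quick retrieval the quote describes.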
The process of generating speeches automatically follows from this. Kassarnig begins by telling the algorithm what type of speech it is supposed to write, whether for Democrats or Republicans. The algorithm then explores the 6-gram database for that category to find the entire set of 5-grams that have been used to start one of these speeches.
The algorithm then chooses one of these 5-grams at random to start its speech. It then chooses the next word from all those that can follow this 5-gram. “Then the system starts to predict word after word until it predicts the end of the speech,” he says.
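The generation loop can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published implementation: the `<END>` marker, the function names, the `max_words` cap, and the choice to sample the next word in proportion to how often it was observed are all assumptions made here, and the part-of-speech tags and topic handling are omitted.

```python
import random
from collections import Counter, defaultdict

END = "<END>"  # hypothetical end-of-speech marker, an assumption of this sketch


def train(speeches, n=6):
    """Collect the opening (n-1)-grams and a context -> next-word count table."""
    starts = []
    table = defaultdict(Counter)
    for words in speeches:
        tokens = words + [END]
        if len(tokens) < n:
            continue  # too short to yield even one n-gram
        starts.append(tuple(tokens[:n - 1]))
        for i in range(len(tokens) - n + 1):
            table[tuple(tokens[i:i + n - 1])][tokens[i + n - 1]] += 1
    return starts, table


def generate(starts, table, rng, n=6, max_words=200):
    """Pick a random opening, then add one word at a time until END appears."""
    speech = list(rng.choice(starts))
    while len(speech) < max_words:
        counts = table.get(tuple(speech[-(n - 1):]))
        if not counts:
            break  # context never seen in training
        candidates = list(counts)
        weights = [counts[word] for word in candidates]
        word = rng.choices(candidates, weights=weights, k=1)[0]
        if word == END:
            break
        speech.append(word)
    return " ".join(speech)


# One toy "speech" for a single category (hypothetical data).
speeches = ["mr speaker i rise in support of this bill".split()]
starts, table = train(speeches)
print(generate(starts, table, random.Random(0)))
# mr speaker i rise in support of this bill
```

With a single training speech, every context has exactly one continuation, so the sketch simply reproduces its input. Variety only appears once many speeches from the same category share 5-word contexts, which is why a corpus of thousands of segments matters.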
There are a few tricks along the way, of course. The algorithm knows, for example, the probability that a particular topic will appear in a speech. It then chooses topics by working out what other topics the speech already contains and determining how well these are being covered.
The results are surprisingly good. Here is an example of an automatically generated Democratic speech:
“Mr. Speaker, for years, honest but unfortunate consumers have had the ability to plead their case to come under bankruptcy protection and have their reasonable and valid debts discharged. The way the system is supposed to work, the bankruptcy court evaluates various factors including income, assets and debt to determine what debts can be paid and how consumers can get back on their feet. Stand up for growth and opportunity. Pass this legislation.”
That’s impressive given that there is no training involved other than the initial part-of-speech tags, the 6-gram analysis of the political speech database, and a little bit of magic sauce. Kassarnig has evaluated these speeches against criteria such as grammatical correctness, sentence transition, speech structure, and content, and found that they generally perform well. “In particular, the grammatical correctness and the sentence transitions of most speeches were very good,” he says.
Nevertheless, Kassarnig is not optimistic about his algorithm’s chances of taking the political stage by storm. “Despite the good results it is very unlikely that these methods will be actually used to generate speeches for politicians,” he says, presumably because the kind of unscrupulous politician who might exploit his algorithm is so rare (cough).
However, the algorithm could be used to generate other kinds of texts. Kassarnig suggests that it could produce news stories, given other stories on the same incident. Another option could be to produce blog posts about arXiv papers, given a large database of similar stories (ahem).
And he encourages anybody to have a go, saying that all of his source code is available on GitHub (https://github.com/valentin012/conspeech). “We explicitly encourage others to try using, modifying and extending it,” he says. “Feedback and ideas for improvement are most welcome.”
Ref: arxiv.org/abs/1601.03313 : Political Speech Generation