Game of Thrones is brought into play through machine learning. See how?

By ridhigrg | May 13, 2019

No one wants the beloved series to end. Some, like the geeks at Pachyderm, have gone to great lengths to extend its lifespan, to the point of employing ML to serve the Iron Throne.

This is a new example of style transfer, where ML identifies the essential characteristics of a genre in order to create its own examples, as we've seen before with art and even with cooking.

But first of all, what is Pachyderm and where does that word come from?

Pachyderm comes from the Greek for 'thick-skinned', referring to the elephant in its logo. The elephant itself denotes Pachyderm's relation to Hadoop, whose logo is also an elephant, while posing as its newer and better counterpart.

Pachyderm, like Hadoop, is an analytics engine, following the same philosophy but built to go where Hadoop falls short. So, in place of writing jobs in Java and running them on the JVM, it uses Docker containers deployed on Kubernetes, which can run jobs written in any language. In place of HDFS, it uses the Pachyderm File System and, in place of MapReduce, it uses Pachyderm Pipelines to pipe containers together, as shown in this GoT ML example. For more about the latest release, see Pachyderm Gets Faster And Gets Funding.

The aim of Pachyderm is to let programmers unfamiliar with MapReduce or Java write their analytics applications with whatever tools they see fit. Such developers can do their big data processing without specialized know-how, which is what makes Pachyderm a much more accessible option than Hadoop.
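To make the contrast with MapReduce concrete, here is a minimal sketch of what a Pachyderm pipeline specification looks like, written as a Python dict that mirrors the JSON accepted by `pachctl create pipeline`. The repo, image and command names are illustrative placeholders, not the ones used in the GoT project:

```python
import json

# A minimal Pachyderm pipeline spec, expressed as a Python dict and
# dumped to the JSON that `pachctl create pipeline -f spec.json` accepts.
# All names below are placeholders for illustration.
train_pipeline = {
    "pipeline": {"name": "train-model"},
    "transform": {
        # Any Docker image, in any language, can do the processing.
        "image": "example/tf-trainer:latest",
        "cmd": ["python", "/code/train.py"],
    },
    "input": {
        # Watch the 'scripts' repo; each new commit re-runs the pipeline.
        # Input data is mounted at /pfs/scripts, results go to /pfs/out.
        "pfs": {"repo": "scripts", "glob": "/"}
    },
}

with open("train_pipeline.json", "w") as f:
    json.dump(train_pipeline, f, indent=2)
```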

The GoT example involves deploying Pachyderm on Kubernetes, backed by an AWS S3 bucket for storage, with a Pachyderm repository holding the input and output data used.

The input data is the preprocessed GoT script of Season 1 Episode 1. It has been labeled to help the recurrent neural network distinguish between dialogue and the other elements of the script regarded as "exposition", mainly stage directions and scene-setting (a sketch of how such labeling might be produced follows the sample below):

<open-exp> First scene opens with three Rangers riding through a tunnel, leaving the Wall, and going into the woods <eos> (Eerie music in background) One Ranger splits off and finds a campsite full of mutilated bodies , including a child hanging from a tree branch <eos> A birds-eye view shows the bodies arranged in a shield-like pattern <eos> The Ranger rides back to the other two <eos> <close-exp>

<boname> WAYMAR_ROYCE <eoname> What d 'you expect <question> They 're savages <eos> One lot steals a goat from another lot and before you know it, they 're ripping each other to pieces <eos>

<boname> WILL <eoname> I 've never seen wildlings do a thing like this <eos> I 've never seen a thing like this, not ever in my life <eos>

<boname> WAYMAR_ROYCE <eoname> How close did you get <question>

<boname> WILL <eoname> Close as any man would <eos>

<boname> GARED <eoname> We should head back to the wall <eos>

<boname> ROYCE <eoname> Do the dead frighten you <question>

<boname> GARED <eoname> Our orders were to track the wildlings <eos> We tracked them <eos> They won 't trouble us no more <eos>

<boname> ROYCE <eoname> You don 't think he 'll ask us how they died <question> Get back on your horse <eos>

<open-exp> GARED grumbles <eos> <close-exp>
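To make the format concrete, here is a minimal, hypothetical labeling sketch. It assumes raw lines come in a simple "NAME: dialogue" form and ignores the tokenization details (apostrophes and commas split off as separate tokens) visible in the sample above; the project's actual preprocessing may well differ:

```python
import re

# Hypothetical raw formats, assumed for this sketch:
#   "GARED: We should head back to the wall."      <- spoken line
#   "(Eerie music in background)"                  <- exposition
NAME_LINE = re.compile(r"^([A-Z_ ]+):\s*(.*)$")

def tag_speech(text: str) -> str:
    """Swap sentence-ending punctuation for the markers used above."""
    return text.replace("?", " <question>").replace(".", " <eos>").strip()

def label_line(line: str) -> str:
    """Wrap one raw script line in the training markers."""
    m = NAME_LINE.match(line.strip())
    if m:
        name, speech = m.groups()
        return f"<boname> {name.strip()} <eoname> {tag_speech(speech)}"
    # Anything that isn't a spoken line is treated as exposition.
    return f"<open-exp> {tag_speech(line.strip().strip('()'))} <close-exp>"

print(label_line("GARED: We should head back to the wall."))
# -> <boname> GARED <eoname> We should head back to the wall <eos>
```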

Pachyderm takes this locally stored data and starts the first pipeline, feeding it to a container running TensorFlow to train the model. After processing, TF writes the model's output, as well as the TF-specific files, back to the repo.
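What runs inside that container isn't shown in the article, so as a rough illustration of the kind of training code such a container might run, here is a minimal word-level LSTM language model in Keras. The file paths, model sizes and epoch count are all placeholders, not the project's actual code:

```python
import numpy as np
import tensorflow as tf

SEQ_LEN = 20  # context window; an illustrative value

# Load the labeled script; /pfs/scripts is where Pachyderm would mount
# the input repo, and the filename is a placeholder.
words = open("/pfs/scripts/s01e01_labeled.txt").read().split()
vocab = sorted(set(words))
word_to_id = {w: i for i, w in enumerate(vocab)}
ids = np.array([word_to_id[w] for w in words])

# Build (context window, next word) training pairs.
X = np.stack([ids[i:i + SEQ_LEN] for i in range(len(ids) - SEQ_LEN)])
y = ids[SEQ_LEN:]

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(len(vocab), 64),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(len(vocab)),  # logits over the vocabulary
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.fit(X, y, epochs=10, batch_size=64)

# Whatever lands in /pfs/out gets committed to the pipeline's output repo.
model.save("/pfs/out/model.keras")
```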

Then the second pipeline begins, taking as input the files output in the previous step and feeding them to another TF instance that generates the new scripts. As simple as that, without the complexity of MapReduce.
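The generation step could then look roughly like this, continuing the hypothetical sketch above: load the trained model and repeatedly sample the next word, with a temperature knob controlling how adventurous the sampling is. Again, this is illustrative, not the project's code, and it assumes the seed tokens appear in the training vocabulary:

```python
def sample_script(model, word_to_id, seed, n_words=100, temperature=0.8):
    """Generate text by repeatedly sampling the next word from the model."""
    id_to_word = {i: w for w, i in word_to_id.items()}
    ids = [word_to_id[w] for w in seed.split()]
    for _ in range(n_words):
        context = np.array([ids[-SEQ_LEN:]])  # at most the last SEQ_LEN words
        logits = model.predict(context, verbose=0)[0]
        # Temperature < 1 sharpens the distribution, > 1 flattens it.
        probs = tf.nn.softmax(logits / temperature).numpy().astype("float64")
        probs /= probs.sum()  # renormalize for np.random.choice
        ids.append(int(np.random.choice(len(probs), p=probs)))
    return " ".join(id_to_word[i] for i in ids)

# The first pipeline's output repo is mounted as this pipeline's input.
# In a real pipeline the vocabulary would be saved alongside the model;
# here we reuse word_to_id from the training sketch above.
model = tf.keras.models.load_model("/pfs/train-model/model.keras")
print(sample_script(model, word_to_id, seed="<boname> CERSEI <eoname>"))
```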

The first scripts generated didn't make a lot of sense, but the quality can be improved considerably by training the model to higher accuracy, as well as by feeding the network more scripts. Be aware, however, that pushing for higher accuracy also increases the processing time from hours to days, depending on the underlying infrastructure.

The project's GitHub repo contains scripts generated at low model accuracy. Not bad, given that the whole process lasted under an hour, and while they won't be winning any Emmys, they represent a considerable feat of style transfer insofar as they capture the "very essence" of GoT:

[ OLENNA ]: Then where would she serve that Grand Master and nine are men? And soon, we need to rule that to a man who wore on the walls.

[ CERSEI ]: The world doesn't 't make there . .

[ CERSEI ]: Home . Oh.

{{ MAESTER UNELLA takes the leaves, merchants pushes againt }} 

[ SAM ]: Doesn 't make me wasn't asking a woman was better prison {{ Something moving with Bronn and scans to another side, where Jon grabs his sword and is added on the actor 's main performance of life . Inside , as a number of moments cuts like that, we 're thrown to Daenerys, who seems to stop. }}.

[ LADY_CRANE ]: impression Ser later. Let him teach him. All I always have to banners south of my life, Meryn.

{{ Stannis runs away and Melisandre walks through the lift door, no smiles). }}.

This neural net didn't know English, much less grammar, but it did pick up on some pieces of structure:

it learned that each line is either a spoken line or an exposition line

it learned that spoken lines begin with a name
    it learned which words were names
    it learned how to open/close the brackets around a name
    it learned the types of words (present-tense verbs, I/you nouns) used in spoken lines

it learned how to open/close the brackets for exposition lines
    it learned what type of words (3rd person) is used
Adding more scripts would automatically start the pipelines again, retraining the model and generating new scripts. The project's repo also includes instructions on how to replicate the experiment yourself, so have a look and judge for yourself whether the quality of the output holds true to the series' plot.
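Kicking that off is just a matter of committing new data to the input repo. A sketch, assuming the pachctl CLI is installed and the input repo is the hypothetical 'scripts' repo from the spec above; note that the exact CLI syntax varies between Pachyderm versions:

```python
import subprocess

# Commit a new episode to the input repo; Pachyderm detects the new
# commit and re-runs both pipelines automatically.
subprocess.run(
    ["pachctl", "put", "file",
     "scripts@master:/s01e02_labeled.txt",
     "-f", "s01e02_labeled.txt"],
    check=True,
)
```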

Say, why not let the Network write Season 9 and keep everyone happy?

Source: HBO