*The Following Press Release Was Issued By Flix Agency*
Andrew Paley Releases New Video And Announces Split 7″ With Days N Daze
Alongside the release of his new album “Scattered Light,” singer/songwriter Andrew Paley has created a generative art project, built with the help of an artificial intelligence system he set up himself, which serves as the new video for his song “Sequels.” The video is both a reminiscence of the film classics of the 80s and a critique of the Trump era. The result is impressive, and not only for film nerds.
Andrew Paley has also announced a new split 7″ with the Fat Wreck Chords folk-punk band Days N Daze, which will be released soon via SBÄM and Flail Records. Days N Daze have covered the song “Caroline.” The original by Andrew Paley is also featured on the vinyl EP.
“Caroline” by Andrew Paley as video: https://youtu.be/gaTQR5QF6gI
Here you can find the most important information about the video from the artist himself:
How did the idea develop?
I’m working on a PhD in artificial intelligence, but my research is in a different space (conversational systems, natural language processing, democratization of information access). That said, over the past couple of years, I’ve been working on a set of independent research spikes in the space of generative art. It’s been a fun way to explore the edge of the possible and come to understand a space that’s outside my regular work and it’s also given me a way to merge the research and music halves of my life. Earlier in the year, I created a couple of videos for other songs (“Give Up” and “Stay Safe”) based on getting GANs to dream up some otherworldly imagery and sync it to music, but I wanted to take this video in a different direction.
I’ve had a longstanding interest in and concern about the space of deepfakes and what they’ll do to our relationship to media and information. Somewhere in there, I decided one way to get a handle on what to make of them would be to actually get hands-on with what’s possible. Conceptually, I thought that mining the movies I loved growing up would make for some great fodder (though I admittedly underestimated just how much editing would be involved). So when I read about the Wav2Lip work a couple of months back, everything just lined up.
How were the scenes selected?
There are different ideas that I explore throughout the song — thematic connections, little callbacks, conceptual evolutions across scenes/movies — and they tie in various ways to the themes of the song itself. Collectively, it’s also in some way me taking snapshots from an earlier era and pulling them forward into the chaos of now. It’s an experiment in leveraging the shared language of cultural touchstones and reimagining them as raw material — in repurposing the familiar as a means of going a few steps down into some new form of uncanny valley in this budding era of synthetic media. It left me all the more convinced that the next five to 10 years are going to be pretty wild (and potentially dangerous) as this space evolves. So, I guess this video is both a reminder of where we’ve been and also a hint of where we might be headed.
Andrew: First, the slightly technical background, though a simplified version: I leveraged a model that had been trained in a generative adversarial network (GAN) architecture. It’s a now-common approach to training generative models that involves two competing models — one that creates (the generator) and one that judges the output of the generator as passable or not (the discriminator). These two models play off each other until the generator is capable of reliably creating “passable” work. More specifically, there has been much research into lip re-syncing in videos over the past few years, but previous techniques either required the model be trained on the face it’s trying to re-animate and/or didn’t do very well with arbitrary moving images. Recent research — as in the past few months — has yielded a model that actually can pull off re-animating arbitrary faces in videos pretty well, and it’s called Wav2Lip. Reading about it piqued my curiosity, so I thought I’d see what I could do with it.
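To make the generator/discriminator interplay described above concrete, here is a minimal toy sketch in plain Python. It is not Wav2Lip and not the code used for the video. It assumes a one-dimensional "dataset" (numbers drawn around 4.0), a generator of the form G(z) = a*z + b, and a logistic discriminator D(x) = sigmoid(w*x + c). All names and hyperparameters are invented for illustration.

```python
import math
import random


def sigmoid(t):
    # Numerically safe logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=200, batch=64, lr=0.01,
                  real_mean=4.0, real_std=0.5, seed=0):
    """Alternate discriminator and generator updates on 1-D toy data."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters: G(z) = a*z + b
    w, c = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w*x + c)

    for _ in range(steps):
        reals = [rng.gauss(real_mean, real_std) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fakes = [a * z + b for z in noise]

        # Discriminator step: minimise -log D(real) - log(1 - D(fake)).
        grad_w = grad_c = 0.0
        for x in reals:
            d = sigmoid(w * x + c)
            grad_w += -(1.0 - d) * x
            grad_c += -(1.0 - d)
        for x in fakes:
            d = sigmoid(w * x + c)
            grad_w += d * x
            grad_c += d
        w -= lr * grad_w / batch
        c -= lr * grad_c / batch

        # Generator step: minimise -log D(G(z)), i.e. try to look "passable".
        grad_a = grad_b = 0.0
        for z in noise:
            d = sigmoid(w * (a * z + b) + c)
            grad_a += -(1.0 - d) * w * z
            grad_b += -(1.0 - d) * w
        a -= lr * grad_a / batch
        b -= lr * grad_b / batch

    return {"a": a, "b": b, "w": w, "c": c}


if __name__ == "__main__":
    print(train_toy_gan())
```

In this toy, the discriminator first learns that larger values look "real," and the generator responds by shifting its offset b upward toward the real data. Simple setups like this tend to oscillate rather than settle cleanly, which hints at why training real generative models takes so much tuning.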
So, the inputs to the model are an audio file (the track of my singing from the song) and a video file (the face of a person that I’d like to reanimate to appear as though they were singing the words), and the output is the re-render of their face synced to my singing. To get the video as it is, I mined movies from the late 80s through early 00s that left a mark on me growing up, and pieced slices of scenes together to create a visual that matched the song — and then I went slice by slice reprocessing the faces to match the lyrics. There was an enormous amount of experimentation — lots of videos/inputs required tweaking and some just simply didn’t work for various reasons — but it was a really interesting foray into a pretty fascinating space.
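The slice-by-slice workflow described above could be organized roughly as follows. This is a hedged sketch, not the artist's actual tooling: the file paths and the `Slice` and `wav2lip_command` names are hypothetical, and the flag names (`--checkpoint_path`, `--face`, `--audio`, `--outfile`) are assumed from the public Wav2Lip repository's inference script and should be checked against its README. The code only builds the commands and does not run anything.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slice:
    face_video: str   # clip containing the face to re-animate (hypothetical path)
    vocal_audio: str  # matching segment of the vocal track (hypothetical path)
    outfile: str      # where the re-rendered clip should be written


def wav2lip_command(s: Slice, checkpoint: str) -> List[str]:
    # Flag names are an assumption based on the public Wav2Lip inference script.
    return [
        "python", "inference.py",
        "--checkpoint_path", checkpoint,
        "--face", s.face_video,
        "--audio", s.vocal_audio,
        "--outfile", s.outfile,
    ]


def plan_jobs(slices: List[Slice], checkpoint: str) -> List[List[str]]:
    # One model run per slice: audio segment + face clip in, re-synced clip out.
    return [wav2lip_command(s, checkpoint) for s in slices]


if __name__ == "__main__":
    demo = [
        Slice("clips/scene_01.mp4", "vocals/line_01.wav", "out/scene_01.mp4"),
        Slice("clips/scene_02.mp4", "vocals/line_02.wav", "out/scene_02.mp4"),
    ]
    for cmd in plan_jobs(demo, "checkpoints/wav2lip.pth"):
        print(" ".join(cmd))
```

Keeping each slice as its own job makes it easy to re-run only the clips that need tweaking, which fits the amount of trial and error described in the answer.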