What's the Deal With Robot Comedy?
This is a guest post. The views expressed here are solely those of the author and do not represent positions of IEEE Spectrum or the IEEE.
In my mythical free time outside of professorhood, I'm a stand-up comedian and improviser. As a comedian, I've often found myself wishing I could banter with modern commercial AI assistants. They don't have enough comedic skills for my taste! This longing for cheeky AI eventually led me to study autonomous robot comedians, and to teach my own robot how to perform stand-up.
I've been fascinated with the relationship between comedy and AI since even before I started doing comedy on my own in 2013. When I moved to Los Angeles in 2017 as a postdoctoral scholar for the USC Interaction Lab, I began performing in roughly two booked comedy shows per week, and I found myself with an opportunity to put a robot onstage that was too good to pass up.
Programming a NAO robot for stand-up comedy is complicated. Some joke concepts came easily, but most were challenging to evoke. It can be tricky to write original comedy for a robot since robots have been part of television and cinema for quite some time. Despite this legacy, we wanted to come up with a perspective for the robot that was fresh and not derivative.
Another challenge was that in my human stand-up comedy, I write almost entirely from real-life experience, and I've never been a robot! I tried different thought exercises: imagining myself to be a robot with different annoyances, likes, dislikes, and "life" experiences. My improv comedy training with the Upright Citizens Brigade started to come in handy, as I could play-act being a robot, map classic (and even somewhat overdone) human jokes to fit robot experiences, and imagine things like, "What is a robot family?", "What is a robot relationship like?", and "What are drugs for a robot?"
As a robotics professor, you never quite know how thousands of dollars of improv classes will come into play in your professional life until they suddenly do! Along the way, I sought inspiration and premises from my comedy colleagues (especially fellow computer scientist/comedian Ajitesh Srivastava), although (at least for now) the robot's final material is all written by myself and my husband, John. Early in our writing process, we made the awkward misstep of naming the robot Jon as well, and now when people ask how John's doing, sometimes I don't know which entity they're talking about.
Searching for a voice for Jon was also a bit of a puzzle. We found the built-in NAO voice to be too childlike, and many modern text-to-speech voices to be too human-like for the character we were aiming to create. We sought an alternative that was distinctly robotic while still comprehensible, settling on Amazon Polly. Text-to-speech researchers would probably be astounded by the mounds of SSML (Speech Synthesis Markup Language) that we wrote to get the robot to clearly pronounce phrases that humans (or at least humans in the training dataset) have almost certainly never said, such as "I want to backpropagate all over your hidden layers" or "My only solace is re-reading Sheryl Sand-bot's hit book, 'Dial In.'" For now, we hand-engineered the SSML and also hand-selected robot movements to layer over each joke. Some efforts have been made by the robotics and NLP communities to automate these types of processes, but I don't know of any foolproof solution (yet)!
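For readers who have not seen SSML, here is a minimal Python sketch of what hand-tuning a line might look like. It is an illustration under stated assumptions, not the markup actually used for Jon: the helper name `to_ssml`, the choice of respelling "backpropagate" as "back propagate," the 90 percent speaking rate, and the 500-millisecond pause are all hypothetical. The `speak`, `prosody`, `sub`, and `break` elements themselves are standard SSML.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET


def to_ssml(text, aliases=None, rate="medium", pause_ms=0):
    """Wrap one spoken line in SSML (hypothetical helper, for illustration).

    aliases  -- maps a written word to the respelling the voice should say
    rate     -- value for the <prosody rate="..."> attribute
    pause_ms -- silence appended after the line, in milliseconds
    """
    body = escape(text)
    for word, spoken in (aliases or {}).items():
        # <sub alias="..."> keeps the written word but speaks the alias.
        tagged = f"<sub alias={quoteattr(spoken)}>{escape(word)}</sub>"
        body = body.replace(escape(word), tagged)
    pause = f'<break time="{pause_ms}ms"/>' if pause_ms > 0 else ""
    return f"<speak><prosody rate={quoteattr(rate)}>{body}</prosody>{pause}</speak>"


ssml = to_ssml(
    "I want to backpropagate all over your hidden layers",
    aliases={"backpropagate": "back propagate"},  # assumed respelling
    rate="90%",
    pause_ms=500,
)
ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
print(ssml)
```

The resulting string could then be handed to a text-to-speech service that accepts SSML input. The simple string replacement here would misbehave if one alias contained another aliased word, which is one small reason this kind of markup tends to end up hand-engineered.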
During the first two performances of the robot, I encountered several cases in which the audience could not clearly hear the setup of a joke when they laughed long enough at the previous joke. This lapse in audibility is a big impediment to "getting the joke." One way to address this problem is to lengthen the pause after each joke.
As shown in the video, this option is workable, but falls short of deftly timed robot comedy. Luckily, my humble studio apartment contained a full battery of background noises and two expert human laughers. My husband and I modulated all aspects of apartment background noise, cued up laugh tracks, and laughed enthusiastically in search of a sensing strategy that would let the robot pause when it heard uproarious laughter, and then carry on once the crowd calmed down. The resulting audio processing tactic involved counting the number of sounds in each ~0.2-second period after the joke and watching for a moving-average-filtered version of this signal to drop below an experimentally determined threshold.
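The wait-for-quiet idea can be sketched in a few lines of Python. This is a rough illustration, not the robot's actual code: it assumes the per-window sound counts have already been extracted from the microphone, and the function name `windows_until_quiet`, the threshold of 2.0, and the three-window averaging length are made-up values standing in for the experimentally tuned ones.

```python
from collections import deque

WINDOW_SECONDS = 0.2  # length of each counting window, as described above


def windows_until_quiet(sound_counts, threshold=2.0, avg_len=3):
    """Return the index of the first window at which the moving average of
    the last `avg_len` counts falls below `threshold`, or None if the room
    never gets that quiet. Threshold and avg_len are assumed values."""
    recent = deque(maxlen=avg_len)  # keeps only the newest avg_len counts
    for i, count in enumerate(sound_counts):
        recent.append(count)
        # Only judge once the averaging window is full.
        if len(recent) == avg_len and sum(recent) / avg_len < threshold:
            return i
    return None


# Made-up counts for a laugh that fades over about a second and a half.
counts = [9, 8, 7, 6, 3, 1, 0, 0]
quiet_at = windows_until_quiet(counts)
if quiet_at is not None:
    wait = (quiet_at + 1) * WINDOW_SECONDS
    print(f"Start the next joke about {wait:.1f} seconds after the punchline")
```

Averaging over several windows keeps a single quiet instant in the middle of a laugh from triggering the next joke too early, at the cost of a minimum wait of `avg_len` windows even when nobody laughs.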
Human comics not only vie for their jokes to be heard over audience laughter, but they also read the room and adapt to joke success and failure. For maximal entertainment, we wanted our robot to be able to do this, too. By summing the laughter signal described above over the most intense 1 second of the post-joke response, we were able to obtain rudimentary estimates of joke success based on thresholding and filtering the audio signal. This experimental strategy was workable but not perfect; its joke ratings matched labels from a human rater about 60 percent of the time and were judged as different but acceptable an additional 15 percent of the time. The robot used its joke success judgements to decide between possible celebratory or reconciliatory follow-on jokes. Even when the strategy was failing, the robot produced behavior that seemed genuinely sarcastic, which the audience loved.
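A toy version of this rating step might look like the following Python sketch. It assumes the same kind of per-window sound counts (five 0.2-second windows per second), and everything else is hypothetical: the function names, the three rating labels, and the two threshold values are placeholders, since the real ones were tuned experimentally.

```python
def peak_laughter(sound_counts, windows_per_second=5):
    """Largest total over any run of consecutive windows spanning one
    second, i.e. the most intense second of the response."""
    if len(sound_counts) <= windows_per_second:
        return sum(sound_counts)
    current = sum(sound_counts[:windows_per_second])
    best = current
    for i in range(windows_per_second, len(sound_counts)):
        # Slide the one-second span forward by one window.
        current += sound_counts[i] - sound_counts[i - windows_per_second]
        best = max(best, current)
    return best


def rate_joke(sound_counts, hit_threshold=25, miss_threshold=8):
    """Map the peak score to a coarse rating (thresholds are assumed)."""
    score = peak_laughter(sound_counts)
    if score >= hit_threshold:
        return "hit"
    if score <= miss_threshold:
        return "miss"
    return "neutral"


# Hypothetical mapping from rating to the kind of follow-on joke to tell.
FOLLOW_UP = {"hit": "celebratory", "miss": "reconciliatory", "neutral": None}

rating = rate_joke([9, 8, 7, 6, 3, 1, 0, 0])
print(rating, "->", FOLLOW_UP[rating])
```

Looking only at the loudest second, rather than the whole response, means a short but explosive laugh still counts as a success.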
By this point, we were fairly sure that robot timing and adaptiveness of spoken sequences were important to comedic effectiveness, but we didn't have any actual empirical evidence of this. As I stepped into my current role as an assistant professor at Oregon State University, it was the perfect time to design an experiment and begin gathering data! We recorded audio from 32 performances of Jon the Robot at comedy venues in Corvallis and Los Angeles, and began to crunch the numbers.
Our results showed that a robot with good timing was significantly funnier, a good confirmation of what the comedy community already expected. Adaptivity actually didn't make the robot funnier over the course of a full performance, but it did improve the audience's initial response to jokes about 80 percent of the time.
While this research was certainly fun to conduct, there were also some challenges and missteps along the way. One (half serious/half silly) problem was that we designed the robot to have a male voice, and as soon as I brought it to the heavily male-dominated local comedy scene, the robot quickly began to get more offers of stage time than I did. This felt like a careless oversight on my part: my own male-voiced robot was taking away my stage time! (Or sometimes I gave it up to Jon the Robot, for the sake of data.)
All of the robot's audiences were very receptive, but some individual crowd members mildly heckled the robot. Because of our carefully crafted writing, most of these hecklers were eventually won over by the robot's active evaluation of the crowd, but a few weren't. One audience member angrily left the performance, grumbling directly at the robot to "write your own jokes." While all of Jon's jokes are original material, the robot doesn't know how to generate its own comedy. At least, not that we're ready to tell you about yet.
Writing comedy material for robots, especially as a roboticist myself, can also feel like a bit of a minefield. It's easy to get people to laugh at quips about robot takeovers, and robot jokes that are R-rated are also reliably funny, if not particularly creative. Getting the attendees of a performance to learn something about robotics while also enjoying themselves is of great interest to me as a robotics professor, but comedy shows can lose momentum if they turn too instructional. My current approach to writing material for shows includes a bit of all of the above concepts. In the end, simply getting people to genuinely laugh is a great triumph.
Hopefully by now you're excited about robot comedy! If so, you're in luck: Jon the Robot performs quarterly in Corvallis, Ore., and is going on tour, starting with the ACM/IEEE International Conference on Human-Robot Interaction this year in Cambridge, U.K. And trust me, there's nothing like "live" (er, well, "physically embodied") robot comedy!
Naomi Fitter is an assistant professor in the Collaborative Robotics and Intelligent Systems (CoRIS) Institute at Oregon State University, where her Social Haptics, Assistive Robotics, and Embodiment (SHARE) research group aims to equip robots with the ability to engage and empower people in interactions from playful high-fives to challenging physical therapy routines. She completed her doctoral work in the GRASP Laboratory's Haptics Group and was a postdoctoral scholar in the University of Southern California's Interaction Lab from 2017 to 2018. Naomi's not-so-secret pastime is performing stand-up and improv comedy.