How to program a speaking robot

– shrinking the gap between expectations and reality.

When I was offered an internship as a “Robot personality designer” with Furhat Robotics, a small Swedish company that produces one of the world's most sophisticated interactive social robots, I did not think twice. I was thrilled to try the role, part of a campaign on future job titles. I got to take baby steps in robot programming, and a starstruck selfie with robot creator Samer Al Moubayed.

This is how I got connected with the Furhat.

Social robot head and computer

The Furhat I have been assigned to program looks at me with big eyes and a friendly, open expression. It is not really possible to mistake it for a human. But the facial mimicry, the natural posture of the head, the soft 3D-animated face projected from the inside onto its silicone mask, and the fact that it can speak and listen to me, create a surprisingly likeable and realistic persona.

It is easy to like the Furhat. But how easy will it be to program it? I will get back to that in a moment. First a description of what we are dealing with.

Below the movable head are a wide-angle camera, stereo microphones, and dual loudspeakers, giving the current 3.5 kg torso great abilities to talk, register, attend to and react to what is going on around it. It looks really sophisticated.

Making the Furhat move and speak in a way that makes sense to social beings like you and me is just one part. Behind the seemingly simple, human-like features of this basic Furhat model lie hours and hours of complex development, start-up struggles, and expert knowledge of human social interaction, well beyond PhD level.

Crowdsourcing and the development of AI resources online are some of the elements the team believes will help in creating great applications for the Furhat platform. And great scripting.

Maybe, as a UX writer, I could be great at this? In fact, crowdsourcing is one of the reasons why I got the chance to sign up at the Furhat Robotics developer pages, download the latest developer kit, and try some robot interaction programming, not only as an intern but also at home.

Instructions on a screen and an instructor

I personally find voice interaction and robotics super interesting, have attended talks about ethics, algorithms, and legal implications in AI decision making, and have read massive amounts of articles on topics like “When do algorithms become digital persons, and should these persons be legal subjects?” and “Does it actually have any legal implications that the robot Sophia was granted citizenship in Saudi Arabia?”.

But this was my first interaction with the real deal.

During one intense day I was introduced to the Furhat unit I was going to work with, given an introduction to the SDK (software development kit) I was supposed to try out, assigned a go-to person, and cheered on by the crew with calls like “Come on, you can do it!” (conversations are mostly in English, since the skilled team is an international bunch of people).

Then I just had to throw myself in the water and try to swim, while the team did their best to transfer detailed knowledge of the small secrets and bits that make the robot speak and behave in a natural way. The result of my first day as a robot programmer was not very impressive. And my first demo was, to put it kindly, very instructive. But the Furhatters kept cheering me on.

My go-to person Quinten and frontend developer Sam

I thought I was ready for this. But I did not expect the interaction to feel so real. Or the interaction programming to be so demanding. This is not only scripting, but also a three-dimensional design of tonality, expressions, posture and responses, through classic programming with “if”, “else” and other conditional statements and constructs.
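To give a feel for what that conditional scripting looks like, here is a minimal sketch in plain Python. It is not the actual Furhat SDK, just an illustration of how a single conversation turn can be structured as a chain of if/else branches; all the keywords and replies are made up.

```python
def respond(utterance: str) -> str:
    """Pick a reply with plain if/else branching, the way a scripted
    conversation turn is structured (hypothetical sketch, not the
    real Furhat SDK)."""
    text = utterance.lower().strip()
    if "hello" in text or "hi " in text:
        return "Hello there! Nice to meet you."
    elif "name" in text:
        return "My name is Furhat. What's yours?"
    elif "bye" in text:
        return "Goodbye! Come back soon."
    else:
        # A fallback branch keeps the conversation alive when
        # nothing matches.
        return "Interesting! Tell me more."
```

Even this toy version shows the core problem: every branch the user might take has to be anticipated, and the else-branch is what the user hears the rest of the time.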

Maybe you’ve heard the news about non-biased recruiting using robotics, machine learning and AI-based processes? Or the possibility of interviewing refugees at EU borders with robots?

It has been all over the media, at least in Sweden, and as a journalist and tech geek turned UX writer, I already knew (before visiting Furhat Robotics) that voice interaction, interactive social robots and chatbots have taken the stage as the next big hype. But the Furhat is clearly not just a show robot or a hype. Its manifestations are models defined just well enough to show us how social robot interaction can, and most likely will, become useful in daily urban life during the next five to ten years.

OK, we are impressed. But can this really work? Well, TNG Recruiting and Furhat Robotics are not just crazy pioneers who dare to stick their necks out and risk being mocked for their visions. They are extremely knowledgeable researchers who should be taken very seriously. So are the professors at a Swedish university who programmed a Furhat to sound depressed, mimic depression, and then used it to train their students, with very good results.

In Frankfurt, Germany, Furhats were introduced in 2018 as concierge robots at the international airport and a central railway station, where they serve travelers with information. During fall 2019 Furhats will also be introduced in schools in Stockholm. But they are not yet ready to be sold to consumers.

The robots in the promotion videos are of course a bit extra cheeky, but what takes place is still not so far from reality. It all depends on the programming.

The next big question is whether the social robot can live up to the expectations. There is a multitude of creative ideas: people would like to use it in teaching, research, health care, service and many other settings. Possibilities span from roles as a test leader and information officer to storyteller and interpreter. With a robot connected to the internet, the options and visions of what can be done are almost unlimited for developers. But how soon can we expect the robots to be fully interactive? Or on the market for consumers?

I managed to ask the very busy CEO Samer Al Moubayed for a selfie in front of one of the robot models, Petra. I look crazy in the photo, but that’s OK. After all, it’s not every day you get to meet a famous robot creator.

I also learned that since 2011, when their first robot model was exhibited at the London Science Museum, the researchers and developer teams at Furhat Robotics have managed to create, and scale, their first commercial interactive platform, which since last year has been bought and rented by teaching institutions and business organisations worldwide.

Just before my internship, Furhat Robotics had received another investment of 2.5 million euros from the EU. And the demand for B2B cooperation projects is soaring. Among the companies mentioned as interested are Honda, Intel, Merck, Toyota, KPMG and Disney.

But consumers will have to wait, maybe five more years.

Aminata with founder Samer

Samer Al Moubayed, researcher, specialist in computer science, speech technology and animation (his doctoral thesis was titled “Bringing the avatar to life”), and a former intern at Disney Research, founded Furhat Robotics in 2014 together with research colleagues Gabriel Skantze, Jonas Beskow and Preben Wik. At that point they were all experts in different academic areas at the KTH Royal Institute of Technology in Stockholm. Through the EIT Digital Accelerator the company could start growing with the help of investments and business angels.

Today the founders have travelled around the world a couple of times to talk about their robot and their research. In the video below (from 2016) Samer Al Moubayed, together with founding colleagues from the KTH Royal Institute of Technology, demonstrates some early models to a reporter from Bloomberg.

Did you spot the fur hat in the video? To make the head look less mechanical and more human, just before a demo was about to take place in the lab, the researchers were looking for something to cover the head with (at this earlier stage of development, the back of the robot’s head did not have a plastic skull, just a mish-mash of cables and wires). It was winter and cold outside, so what they finally found and put on the head was a fur hat. And that’s how the robot got its name.

An extra touch of magic (and wit) will probably always be needed when programming a social robot. Making the Furhat speak in ways that make sense to social beings like you and me gets easier when the tonality doesn’t have to be dead serious. Making people laugh a little can help bridge the glitches in interactive conversations, while resources and voice recognition are still not good enough (unless, of course, you are programming for users with specific needs and impairments, for instance when it comes to detecting tonality or irony). Taking breathing pauses when speaking also makes the robot sound more human.

Socially weird answers to users’ questions are what developers want to avoid; they can quickly break the connection established with the user.

But through machine learning and artificial intelligence, the systems keep learning what works in live conversations with humans, and what does not. Large amounts of data are continuously gathered and analysed. There is a multitude of resources that can be shared and used online to create interactivity and adapt the functions of robots to the environment where they will be placed. Google is far ahead in creating sharable online resources for developers, for example synthesized voices in 40 different languages, and good-enough speech recognition in the largest languages. In this way, reality keeps closing in on expectations for spoken interaction.

discussions in front of a computer screen

To help robots have conversations with humans, developers use different aspects of NLU (natural language understanding), machine learning, artificial intelligence, speech recognition systems, synthesized voices, and advanced research on language and social interaction between people.

Much of this knowledge has been built into the design of the first commercial Furhat model, and into the definitions of its attention span, mimicry, movements, facial expressions and voice. The platform even comes with a set of random jokes, ready to be built into conversations.

The Furhat, with its soft and cartoon-like facial appearance that makes it impossible to mistake the robot for a real person, is perceived as very “human” and non-threatening by most users. In one review, a reporter from Forbes compared Furhat to other similar robots and ranked it the best by far. The video below (from the article) shows early Furhat robot models in use as “test instructors” at a science museum in Sweden.

Social robotics, with truly realistic interacting units (not just “show robots” programmed to carry out a single specific conversation), is an extremely exciting but very complex area of science.

Predicting what a person could possibly want to ask or say to a robot is almost impossible. There are many ways to handle this, but if, or when, the robot is unable to respond to the user in a “human” way, the magic spell can be broken very quickly. The way developers choose to tackle that challenge could be decisive. This is especially true in the school world, where kids will probably keep scrutinizing the robots’ performance without mercy.
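One common way to tackle the unpredictability, sketched here in plain Python under my own assumptions (the threshold and retry counts are invented, and this is not the Furhat SDK), is to decide each turn based on how confident the recognizer is, re-prompting a limited number of times before gracefully changing the subject:

```python
def handle_turn(confidence: float, retries: int,
                threshold: float = 0.6, max_retries: int = 2):
    """Decide how to react when the speech recognizer is unsure.
    Hypothetical sketch: returns an action name plus the updated
    retry counter."""
    if confidence >= threshold:
        return ("answer", retries)        # confident: answer the user
    if retries < max_retries:
        return ("reprompt", retries + 1)  # unsure: ask the user to rephrase
    return ("deflect", 0)                 # give up gracefully, change topic
```

The design point is the last branch: rather than repeating “I didn’t understand” forever, which is exactly the un-human behaviour that breaks the spell, the robot steers the conversation somewhere it can handle.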

Another challenge is to “popularize” the software used with the Furhat platform. As an end user of a Furhat you would like to be able to put all the exciting functions of the robot to use, while having a not-too-difficult time learning how to program it. If the learning curve is too steep, no one will be happy. Programming should be fun.

Changing the face of the robot

OK, enough said. I will now get back to playing with the Furhat SDK and have the virtual robot do things my way (great feeling). Right now my goal is to create a conversation that can go on for at least 30 to 60 seconds without any noticeable glitches.

As an experienced radio producer I also know one thing that is extremely important when programming a talking robot. The fact that humans breathe will, now and then, make them stop talking for some milliseconds before they continue. Mimicking this makes automated speech seem more natural and less mechanical. To make it happen, you have to create small pauses in the robot’s spoken sentences. Small details, big impact.
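A small sketch of that idea: many text-to-speech systems accept SSML-style break tags, so a helper can insert a short pause after each sentence. The tag format and 250 ms duration here are assumptions; check what markup your particular TTS engine accepts.

```python
import re

def add_breath_pauses(text: str, pause_ms: int = 250) -> str:
    """Insert short SSML-style break tags after sentence-ending
    punctuation, mimicking breathing pauses (sketch; the exact
    markup a given TTS engine accepts may differ)."""
    break_tag = f'<break time="{pause_ms}ms"/>'
    # Match . ! or ? followed by whitespace, and slip a pause in between.
    return re.sub(r'([.!?])\s+', r'\1 ' + break_tag + ' ', text)
```

For example, `add_breath_pauses("Hello. How are you?")` yields `'Hello. <break time="250ms"/> How are you?'`, so the robot pauses briefly between the greeting and the question.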

Maybe you too will soon be able to talk with a Furhat, in a nearby place.

Want to try programming a social robot? More reading for developers and designers: https://www.furhatrobotics.com/developers/

Sources: Furhat Robotics, Forbes, Bloomberg, Jusektidningen Karriär, Computer Sweden, the news site Entreprenör, Dagens Industri, Tidningen Innovation, KTH Royal Institute of Technology.

Photo © Aminata Merete Grut, except for the selfie, where Samer Al Moubayed took the photo (with my iPhone), or when stated otherwise. Videos from Bloomberg, TNG and Furhat.

Additional reading: • Software engineer Yoav Luft talks about his work with Petra, the health pre-screening robot: Behind the scenes: Programming the world’s most advanced robot. • The Swedish newspaper Ny Teknik lists the most successful new tech companies in Sweden 2019: Här är alla vinnarna på 33-listan 2019 (“Here are all the winners of the 33 list 2019”). • Heading towards GITEX, the largest tech event in the Arab world, through telecom and tech giant Etisalat, the Furhat will start interacting with audiences in the Middle East. • Follow the press updates from Furhat.

UX Designer & UX Writer, with some Frontend skills. Journalist. IP/Author's Rights. Tech. New Digital Media. Networking across the Globe. Ex board mbr RSF/SE.