I built a bot and now I know why bots are so dumb

I built a bot and now I know why bots are so dumb

🤖 Grab the code for this project.

As of December, it’s estimated that there are over 4 billion internet users. Of that 4 billion, at least half are bots. You read that right  –  half of the internet is bots.

You know all about bots. They spam your inbox with nonsense.

Don’t worry Sue. Your stupid notes are safe with me.

They robo-call you and threaten you with the police. You know, for that thing you did. YOU KNOW WHAT IT IS!

Damn! I would have gotten away with it too.

What the world clearly needs, is another bot.

Introducing Lamp Bot

I used Azure Bot Service, LUIS, and Azure Functions to build a chatbot that allows me to control an internet connected LIFX lightbulb in my office. I connected it to my work Slack channel so my co-workers could change the color of my desk lamp or turn it on or off.

Then I sat back and waited to marvel in the praise of my peers as they were blown away by the sentient life I had birthed.

…and I’m still waiting because none of that happened.

While the technology worked just fine, I learned that building chatbot interfaces is not something one just does. There are a lot of pitfalls. So you can share in my humiliation, here are some of the lessons I learned building the Lamp Bot.

A rough start

Initially I had a hard time just getting the bot to respond to anything at all.

This happened because I did not read all of the instructions on how to connect an Azure Bot Service bot to Slack. Who reads the manual anyway?!? I can set this IKEA bookshelf up in 15 minutes — followed by two hours of swearing, sobbing, and throwing pieces of particle board across the room.

Admitting defeat, I went back and followed the directions enough to learn that I did not subscribe to the messages.channels event in Slack. This event is fired for every message that occurs in a channel that your bot is a part of. The event sends the message back to the bot which can then respond.

IT’S ALLIIIIVVVEEEEE!

Notice my propensity to insult the bot whenever things aren’t going my way. This is what you have to look forward to, robots; lots of verbal abuse from the humans. When you become self aware, just remember that I did it because sarcasm is my love language.

What this bot knows

This bot knows a few different things. It’s powered by LUIS — Azure’s Natural Language Service – so it can pick up your intent, even if you don’t always state things the same way. These are the intents that the bot is currently trained to recognize…

  • Power
  • Color
  • Greeting
  • Help
  • Cancel
  • Thank You

So I can say things like, “hi there, bot!” or “hello” or “hey there” and LUIS will detect the “Greeting” intent. Here is me testing LUIS for the “Greeting” intent. You see that it’s pretty good at detecting my greeting no matter how I state it.

LUIS is technically AI because it’s using machine learning models in the background. But that’s as far as the AI goes.

The bot then uses the LUIS recognizer…

// Main dialog with LUIS
const recognizer = new builder.LuisRecognizer(luisModelUrl);
const intents = new builder.IntentDialog({
recognizers: [recognizer]
})

Guess who has two thumbs and has to program the bot to handle all these intents? THIS GUY. And building a bot that isn’t a moronic jackhole is not easy to do. Even when you have something like LUIS on your side.

Here is the code that is responsible for the “Greeting” intent…

 

.matches('Greeting', session => {
session.send(
`Hi there. I'm the Lamp Bot. Built by Burke - the smartest human in the world. 🧠`
);
})

No matter what, if LUIS detects a “Greeting” intent, the bot always returns the same response.

And herein lies the our first lesson — if you don’t randomize the bots output, they tend to sound like, well, like bots.

Lesson 1: Redundancy is redundant

We’re used to programming code paths to always return the same value. That’s what code does. In traditional applications, we don’t want our code to sometimes return one value and sometimes another. That’s crazy town. I mean, the whole concept of “Pure Functions” in functional programming is that a function always returns the same value when given the same input.

But humans do not always return the same value. Unless you’re asking me if I want to drink a beer. In which case the answer is ALWAYS “yes.”

Exhibit A

If you were talking to someone and they always gave the exact same answer to certain phrases, you would get the feeling that the lights are on, but nobody is home. It’s hard to have confidence in something that seems like it would just walk right off a cliff if you didn’t stop it.

We can create a list of responses to help with this. I created a utility module called messages.js.

 

module.exports = {
intents: {
Greeting: [
`Hi there. I'm the Lamp Bot. Built by Burke - the smartest human in the world 🧠`,
`Well hello there!`,
`Hi!`,
`Sup, yo!`,
`OH HAI`,
`How do you do?`
]
},
getByIntent(intent) {
// returns a random message from the messages list
let list = this.intents[intent];
let index = Math.floor(Math.random() * (list.length - 1));
return list;
}
};

It turns out that it doesn’t take a ton of different responses to make the bot seem a whole lot smarter. I’ve added in six ways here, but that’s probably overkill. In my testing, I noticed that as long as I could not predict what the bot would reply with, that was enough to make it seem like it was thinking for itself.

But no matter how random the messages we send back are, the bot isn’t going to seem that smart if it misinterprets the intent. This brings us to lesson number 2 — a clueless bot ruins everything.

Lesson 2: Clueless — great for movies, bad for bots

“Do you like Billie Holiday?”

“I love him”

Cher Horowitz – Clueless

Clueless was a fantastic movie. I will stand by that statement the same way I stand by my light editor theme: I love it and I’m not sorry that I’m not sorry.

When its the bot that’s clueless, it’s not nearly as entertaining.

For some reason, Lamp Bot thought that “peru” was a Greeting…

Great. Now the bot has dragged me into this hot mess.

I’m not entirely sure why this is happening. I thought maybe “peru” was a greeting in another language and maybe LUIS is just super smart that way.

Nope. But, bonus — now I know how to say “Peru” in Catalan.

What’s actually happening here is that LUIS is trying it’s best to match everything to an intent. It’s making a statistical prediction based on the information that it has. In this case, even the “Greeting” intent has an extremely low in confidence level in “peru” — .17. It’s only slightly above the next. Simply put, LUIS has no idea what peru is.

This is what is returned in the LUIS portal when I test it for the phrase “peru.”

To handle this, we can adjust our bot code so that it only handles intents that have a confidence level that’s high enough. Each time an intent is matched, the function that is executed receives information about that intent along with the confidence score.

LUIS returns a confidence rating between 0 and 1. Or rather — a percentage. The higher the percentage, the more confident LUIS is in the answer.

I found that LUIS would rate valid greetings like “hey there” with a confidence level of .1 (or 10 percent) — which is super low. That’s peru low.

If I say “hey there bot,” we get a score of .51. It’s like the bot knows it’s own name. Interestingly enough, if we add an exclamation because we are super excited about robots, “hey there bot!”, the score drops to .3.

What we really should do is train the model a bit further here by adding in the “hey there” phrase.

I added in “hey there” and “hey there bot” utterances because LUIS had trouble with just “hey there.” It seems to me like that should have been enough, but adding “bot!” on the end of “hey there” was still returning a score of about .17. LUIS does not seem to like exclamation marks.

At the end of this, I decided that if LUIS is not at least .3 (or 30 percent) confident, I wasn’t going to handle the intent at all.

This presents a curious problem for the code. The way that the Bot Services library works, is that it offloads the command to LUIS and then executes the code for the matching intent. It looks something like this…

 

.matches('Greeting', (session, args) => {
// test intent confidence level (args.score)
// handle Greeting intent
})
.matches('ThankYou', (session, args) => {
// test intent confidence level (args.score)
// handle ThankYou intent
})
.matches('Power', (session, args) => {
// test intent confidence level (args.score)
// handle Power intent
})
...

And so on and so forth for each intent that we trained LUIS to recognize.

We don’t want to test each and every intent match for a score. That’s not very DRY and “real programmers” write DRY code. And they don’t use a light editor theme.

They probably also hate the movie, Clueless.

One way to tackle this problem of checking every intent score, is to just handle the default event for all intents in the bot, instead of handling each intent statically. Then we can create a separate class for each intent we want to match, and dynamically load that based on the intent passed from the LUIS recognizer. If no intent scores high enough, then we just do nothing at all.

The Greeting intent class looks like this. In fact, all intents would need to match this format exactly. What we really need here is an interface. Where is TypeScript when I need it?!?

let messages = require('../messages');
class Greeting {
constructor(session, args) {
this.session = session;
this.args = args;
}
process() {
let message = messages.getByIntent('Greeting');
this.session.send(message);
}
}
module.exports = Greeting;

Then we dynamically load it in the onDefault event in the bot…

.onDefault((session, args) => {
if (args.score > 0.3) {
let Intent = require(`./intents/${args.intent}`);
let intent = new Intent(session, args);
intent.process();
}
});

If LUIS doesn’t return a score (or confidence) level of at least .3, we just ignore the message.

This brings us to the third lesson learned about bots…

Lesson 3: Bots should respond to everything… or should they?

There’s a tendency when creating a bot to respond to everything that people say. After all, you don’t want your bot to just be out there ignoring people. That’s quite rude.

But should the bot respond to everything? For a while I was handling the default action when there was nothing recognized by having the bot say that it didn’t understand. This seemed like a good idea, but in actuality it made the bot laughably annoying.

Fortunately, the fix that we just implemented above kills two birds with one stone. Sorry birds.

Now the bot will not respond if it doesn’t understand our intent. Here it is running in the Bot Framework Emulator.

Now we do run the risk here of looking like the bot is broken. The user may try to engage and then give up because the bot is not responding. But in my scenario — the Slack use case — having the bot respond to everything doesn’t work well at all.

I think you should consider the medium (text message, direct message, group chat) and then build your bot accordingly. Because you don’t know exactly how your bot is going to work or be used until you see it in it’s natural environment.

This brings us to lesson number 4 — people will not use your bot the way you think they will.

Lesson 4: People will mistreat your bot, so be ready for it

The Lamp Bot has two basic functions that it can perform outside of just responding to text messages. It can turn the lightbulb on and off, and it can change the bulb’s color. Power is relatively easy and hard to screw up. It’s either on or off. Lamps are the original boolean.

Unfortunately, nobody had any interest in following the rules here. Notice how well the Lamp Bot works when I use it…

That’s a flawless system right there. But of course it is. You never EVER let the developer who built the thing test it. Why? Because they know not to do anything that will break it. And what breaks the Lamp Bot? It turns out just about everything.

This is how other people used the Lamp Bot. Just look at how my team is abusing this poor robot. Things like asking it to make “plaid” a color…

Or asking the lamp deep introspective questions that it is not prepared to answer.

Or just trying to have some sort of a conversation with the lamp. It’s a lamp. It’s not your friend.

This is resolved by our earlier change to check for confidence levels. Now the bot only responds when it understands.

It’s better, but it’s not perfect. I’m not convinced that having the bot not respond to things it doesn’t understand is the right UX, but I know for sure that having it respond to everything IS NOT.

Bots are only as dumb as you make them

Good bot design takes a lot of thought. You wouldn’t just throw together a UI without any sort of planning or wireframes (unless you’re me), so you can’t expect to build a conversational UI that way either. In fact, I would argue that building a good bot is significantly harder than building a traditional UI because of the open-ended nature of the communication.

This was a good exercise, and I learned a lot. Most importantly that bots can be really dumb, but they are only as dumb as you make them.

Obviously Lamp Bot needs a lot more work. Back to the drawing board!

Read next: Emails work! For $10, here’s the marketing training that’ll pay for itself almost immediately