OK, ROBOT
REBOOT (VOICE)
WEEK 7
A worm that uses peristaltic wave motion and recoils when it is in proximity to humans
Develop a Semi-Autonomous Robot
This robot is semi-autonomous and nature-like. :) My goal with this project was to make a semi-autonomous robot that communicates consent through its movement. Its default state is to move linearly (peristaltic wave motion), but when it senses something is close, it will sometimes recoil by contracting to half its size. Other times, it will express comfort by engaging in a slightly more subtle wave before ultimately deciding to continue moving along. I wanted to create a robot that can reject human interaction. It’s more reflective of nature for a non-human species to be afraid of us (we are also afraid of us).
How it works:
• The mesh has 6 segments made of vertex joints with union 4mm push connectors, 1/8” poly tubing, and 1 motor per segment
• Circumferential cable → wraps around the inside of the mesh → when the motor spins, it pulls the cable in, the worm's diameter decreases, and the segment lengthens
• Longitudinal cable → forms a zigzag line through the mesh → when the motor spins, the diameter increases and the segment shortens
• Double spool that also serves as a mount for the motors → moves the cables in opposite directions
• Peristaltic wave → the 6 segments move sequentially; the contraction moves the robot forward
• ToF sensor and IMU → measure how close a person is, which triggers the recoil response
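To make the sequencing concrete, here is a minimal sketch of that control loop. It is not the actual firmware: setSegment() and readDistanceMM() are placeholder stubs standing in for the motor and ToF code, and the threshold and delay values are invented numbers.

```cpp
// Minimal sketch of the behavior above; setSegment() and readDistanceMM() are
// placeholder stubs, and the threshold/delay values are invented.
const int NUM_SEGMENTS = 6;
const int RECOIL_DISTANCE_MM = 200;      // assumed "too close" distance

long readDistanceMM() { return 1000; }   // stand-in for the ToF reading
void setSegment(int i, bool contracted) { /* spin that segment's motor */ }

void setup() {}

void loop() {
  if (readDistanceMM() < RECOIL_DISTANCE_MM) {
    // Recoil: contract every segment at once so the body shrinks.
    for (int i = 0; i < NUM_SEGMENTS; i++) setSegment(i, true);
    delay(2000);
    return;
  }
  // Peristaltic wave: the contraction travels segment by segment, and that
  // traveling contraction is what pulls the worm forward.
  for (int i = 0; i < NUM_SEGMENTS; i++) {
    setSegment(i, true);    // longitudinal cable pulls in: shorter, wider
    delay(400);
    setSegment(i, false);   // circumferential cable pulls in: longer, thinner
  }
}
```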
My own research on the subject has opened up a can of worms ;’) but reading about the advancements made in soft robotics has made me feel a little bit optimistic about our role in the world. It feels like we’re building on things for the greater good, instead of just for the sake of building BIGGER. A few inventions that come to mind are soft biomedical devices for supportive prosthetics and less invasive surgical tools. The research papers I used for this project (on the peristaltic wave motion of worms) demonstrate the worm’s ability to be malleable and adaptable enough to move through confined spaces. WALL-E could NEVER. There are also examples of other worm inventions that can navigate rougher terrain (think of a city after a hurricane has torn through it). Worm motion, coupled with a set of wheels, could allow a robot to roll through debris, over fallen trees, and even climb steep inclines. It could serve as a vital tool in search and rescue missions and carry supplies where cars and drones can't go. As an act of contrition, we have an opportunity to engage with the living world in a more responsible way. I like the idea that we can imagine our role in the world as one that moves with it and doesn't behave destructively for the sake of innovation.
Issues I encountered: The vertex pieces and motor mount from the research paper are verrryyyy specific (a lot of geometry that my brain couldn’t manage in the time I had). I ended up improvising by taking two 4x4mm push-connectors, HAND-modifying them, and using M3 screws to connect them. The idea is to let the vertex hinge through a 30-40 degree range. Also, the carrier board for the motors would intermittently not communicate with my Mac. I’m going to custom-design the vertex pieces and motor mounts this week and use a different carrier board.
References:
For this project, I’m using the mesh and peristaltic wave design by PhD students and postdoc/research engineers at Case Western Reserve University (Department of Mechanical and Aerospace Engineering, Biologically Inspired Robotics Lab). The details can be found in these papers:
WEEK 6
Two examples of Robotic Art
One direct and one indirect
Can’t Help Myself, 2016 (Sun Yuan and Peng Yu)
Can't Help Myself is a giant robotic arm that’s programmed to sweep up dark oil that spills from its own body. Over the course of three years, we can watch footage of its slow decline and eventual “death” in 2019, when it finally succumbs to its own mechanical exhaustion.
If I had to guess, the robot is programmed to keep rotating so that it can keep visualizing the oil. I'm not sure if there is an "else if" programmed in it that might indicate there is no more oil to clean up, but I assume there is because, unbeknownst to the robot, the artists have designed it so that the oil never stops spilling. It just keeps cleaning, and it never gets the chance to reach the “else if” because that condition never arrives. And there’s no human perspective that comes in to reassess and decide if it should keep going.
What makes this feel like art is that we can immediately identify with the robot. It’s doing exactly what a robot would do, input -> output, but something is unsettling about the robot’s unawareness of the built-in, never-ending oil sabotage. Meaning, we know the mission is futile, but the robot doesn’t, and that is just sad.
It is such a robot thing to keep chipping away at the task, and a very human thing to feel the burden of blind belief, where we have to invoke our own resilience before eventually succumbing to our limitations. Also, at a base level, it feels like you are witnessing a living thing slowly die. Maybe when we look at the robot, it brings up our collective feelings of existential dread, where we project our own experiences in blind faith as well as our relentless devotion to survival.
Rhythm 0, 1974 (Marina Abramovic)
In Rhythm 0, the artist stands completely still in a room with various objects ranging from a rose to a loaded gun, inviting the audience to do whatever they want to her.
Rhythm 0, to me, is an example of indirect robotic art because in it she resigns herself to self-objectification. Aside from mirroring the participants, she sets out to act as somewhat of a catatonic puppet. For example, if they looked at her, she would then look back, and if they decided to move her arms, she’d slowly move them. And this is all with the understanding that there would be no intervention from her or the gallery.
As you can imagine, the participants let their curiosity guide them: some of the acts involved offering her a rose, and others involved committing sexual acts. Some cut off her clothes with razors, and eventually cut her skin. This would escalate to one participant picking up the loaded gun and raising it to her head. And because a protective group had formed within the audience over the course of that evening, those protectors eventually fought the gun out of the participant's hands.
This form of self-objectification opened a world of discussion on human predictability and the presence of "no consequences," where, much like in the Stanford prison experiment, they found that if you give humans unlimited control without consequences, there is a whole world of possibilities that you are exposing yourself to. We tend to build robots and objects for our own convenience and design them with the understanding that at some point, they'll be handled not with care or gentleness, but with brutality.
WEEK 5
Radio(head) Robot
The Future-BOT: Choose an existing inanimate object and turn it into a robot.
radio.html
This week, I used stepper motors to create a voice-activated robot that plays two different Radiohead songs. The logic is: once you open the .html file in Chrome, it prompts you to connect to the Arduino Uno, and once the page loads, you’re able to click “Start Radio”. After this, it listens for “Hey Radio”. Once you say “Hey Radio”, it asks, “Do you want to hear a sad or anxiety-inducing song?” It listens for “sad” or “anxious”: if you say “sad”, the stepper motors play No Surprises by Radiohead, and if you say “anxious”, Everything in Its Right Place plays.
After that, it listens for either “Hey Radio” or “Stop”, which stops the music. I also have a switch that acts as a master enabler, where if it’s in the off setting, nothing plays, but if it’s on, the radio robot is able to initiate the stepper motors.
TLDR: You speak → Web Speech API → JavaScript understands it → Web Serial API → Arduino Serial → Motors play music (a rough sketch of this flow follows the list below).
In the .html, I’m using things that don’t require any installs (all built-in browser tools):
- the “Agnes” voice
- Speech Synthesis API
- Web Serial API (to communicate with the Arduino)
- Web Speech API
- Vanilla JavaScript
- plain HTML/CSS
- the help of Claude. Claude helped put together the HTML portion of the project and helped solve the puzzle of keeping the hardware switch from overriding the Robot.
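Putting the TLDR above into code, here’s a rough sketch of the browser side. This isn’t the actual radio.html: the one-byte “S”/“A” serial commands and the function names are stand-ins I’m using for illustration, and connecting has to happen from a button click because Web Serial requires a user gesture.

```js
// Rough sketch of the voice → serial flow; the 'S'/'A' command bytes are
// invented for this example, not necessarily what radio.html sends.
let writer;

async function connectArduino() {               // must run from a click (Web Serial rule)
  const port = await navigator.serial.requestPort();
  await port.open({ baudRate: 9600 });
  writer = port.writable.getWriter();
}

function say(text, onDone) {
  const u = new SpeechSynthesisUtterance(text);  // Speech Synthesis API
  if (onDone) u.onend = onDone;
  speechSynthesis.speak(u);
}

function listenOnce(onResult) {
  const rec = new webkitSpeechRecognition();     // Web Speech API (Chrome)
  rec.onresult = (e) => onResult(e.results[0][0].transcript.toLowerCase());
  rec.start();
}

function waitForWakeWord() {
  listenOnce((heard) => {
    if (!heard.includes("hey radio")) return waitForWakeWord();
    say("Do you want to hear a sad or anxiety-inducing song?", () =>
      listenOnce(async (choice) => {
        const cmd = choice.includes("sad") ? "S" : "A";  // S = No Surprises, A = Everything in Its Right Place
        await writer.write(new TextEncoder().encode(cmd));
        waitForWakeWord();  // the real page also listens for "Stop" at this point
      })
    );
  });
}
```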
For the stepper motors: WHAT IS HAPPENING HERE?? (in the code and overall):
THERE ARE 6 FILES IN THE ARDUINO SKETCH.
I mainly did this to keep everything clean and compartmentalized. That way, if I need to update each song's details, I can just pull it up and not worry about breaking anything.
1) okrobot.ino:
- Declares the pin arrays for STEP and DIR (X,Y,Z).
- Chooses the play/stop input pin and the LED pin.
- Calls the library setup, then keeps the engine running in loop().
2) okrobot.h:
- A list of functions that the main Arduino sketch calls to control the stepper music: like setup, start/stop, run loop, optional LED attach. Keeps things clear so the .cpp can be edited safely.
3) okrobot.cpp:
- Implements the playback stuff.
- Handles all the playback/timing for the motors.
- Each motor stores its own info: pins, current note, and timing.
- Uses two clocks: microseconds for fast steps and milliseconds for note lengths.
- Switch uses INPUT_PULLUP: ON = enables voice control, OFF = killswitch (but ON doesn't automatically start music)
- The switch is wired to Abort (A0) to stop playback (the robot starts it).
- The EN pin is handled by the code (it turns the stepper drivers on while playing and off when stopped).
- Clear variable names (STEP_PINS, DIR_PINS, TRACKS) make editing easier for me (lol).
- LED blinks SOS with a small timing pattern that runs in the background.
4) 5) 6) everything.h, nosurprises.h, okrobot_tonesglue.h
- The song files are where I keep all the musical info (frequencies, length) for each motor.
- The tonesglue file: this needs to come before the individual song note files. It basically says, “Here are all the musical notes that will be played,” so the song files know what to play. (A condensed sketch of how these pieces fit together is below.)
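Since the file descriptions above stay abstract, here is a heavily condensed sketch of how the pieces fit together for a single motor. This is not the real okrobot library code; the pin numbers, note values, and wiring assumptions (active-low EN on the driver, switch pulling A0 LOW) are stand-ins to show the two-clock idea and the tonesglue-style note defines.

```cpp
// Condensed illustration (NOT the real okrobot files): tonesglue-style note
// defines, one track for one motor, micros() for step pulses, millis() for
// note lengths, and an INPUT_PULLUP abort switch.
#define NOTE_C4 262   // tonesglue idea: give every playable pitch a name (Hz)
#define NOTE_E4 330
#define NOTE_G4 392

const uint8_t STEP_PIN = 2, DIR_PIN = 5, EN_PIN = 8, ABORT_PIN = A0;

// A "track" is just frequencies (Hz) and note lengths (ms) for one motor.
const int MELODY[]    = {NOTE_C4, NOTE_E4, NOTE_G4};
const int DURATIONS[] = {400, 400, 800};
const int NOTE_COUNT  = 3;

int noteIndex = 0;
bool stepLevel = false;
unsigned long noteStartedMs = 0, lastStepUs = 0;

void setup() {
  pinMode(STEP_PIN, OUTPUT);
  pinMode(DIR_PIN, OUTPUT);
  pinMode(EN_PIN, OUTPUT);
  pinMode(ABORT_PIN, INPUT_PULLUP);   // assumed wiring: the switch pulls A0 LOW to stop
  digitalWrite(DIR_PIN, HIGH);        // direction doesn't matter for the sound
  digitalWrite(EN_PIN, LOW);          // LOW = driver enabled on most stepper drivers
  noteStartedMs = millis();
}

void loop() {
  // Abort switch: stop stepping and turn the driver off.
  if (digitalRead(ABORT_PIN) == LOW) {
    digitalWrite(EN_PIN, HIGH);
    return;
  }
  digitalWrite(EN_PIN, LOW);

  // Millisecond clock: move to the next note once this one has run its length.
  if (millis() - noteStartedMs >= (unsigned long)DURATIONS[noteIndex]) {
    noteIndex = (noteIndex + 1) % NOTE_COUNT;
    noteStartedMs = millis();
  }

  // Microsecond clock: toggle the STEP pin at the note's frequency, so the
  // stepper "sings" that pitch while it turns.
  unsigned long halfPeriodUs = 500000UL / MELODY[noteIndex];
  if (micros() - lastStepUs >= halfPeriodUs) {
    stepLevel = !stepLevel;
    digitalWrite(STEP_PIN, stepLevel);
    lastStepUs = micros();
  }
}
```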
WEEK 3
Passive-aggressive IVR Experience
Imperfect robot interaction that uses speech-to-text (STT)
thebank.html
This week, May and I built an IVR system called “The Bank.”
It uses speech-to-text (STT) as the input and browser-based speech synthesis as the output/voice to simulate a very imperfect customer service exchange (the one you’d normally have before speaking with a human). This “agent” is a gatekeeper in many ways! The thing that makes the system imperfect is that the IVR agent never gets your name right: it can only ever understand your name to be “Maxine”, and it doesn’t allow you to correct it. It does eventually pass you on to a representative, but you’ll most likely need to restate all of your information. Also, it’s a funny idea that a robot would impersonate someone by calling. I guess crazier things have happened.
Constraints:
- Uses existing web technologies: browser SpeechRecognition API (voice input) and speech synthesis (voice output)
- Runs entirely in the browser
- Voice is the primary input method
- Has multiple exchanges (name, number, riddle)
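As one concrete example of the “imperfect” part, here is a rough sketch of the name exchange. It is not the actual thebank.html; the prompts and function names are approximations, but the key move is the same: whatever the caller answers, the agent “hears” Maxine and moves on.

```js
// Sketch of the name exchange: the caller's answer is deliberately ignored.
const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;

function say(text, onDone) {
  const u = new SpeechSynthesisUtterance(text);   // browser speech synthesis
  if (onDone) u.onend = onDone;
  speechSynthesis.speak(u);
}

function listenOnce(onResult) {
  const rec = new Recognition();                  // browser SpeechRecognition
  rec.onresult = (e) => onResult(e.results[0][0].transcript);
  rec.start();
}

function askName() {
  say("Thank you for calling The Bank. What is your name?", () =>
    listenOnce(() => {
      // No matter what was actually said, the agent proceeds with "Maxine".
      say("Great, Maxine. Let's verify your account number.", askAccountNumber);
    })
  );
}

function askAccountNumber() { /* next exchange: account number, then the riddle */ }

// In the real page this kicks off after a click, since the speech APIs need a user gesture.
```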
What is the user expectation for this experience?
How does the BOT's personality influence the interaction?
How does the implemented technology enhance or constrain the experience?
The caller expects efficiency and speed, but the limitations we set (speech recognition errors, forced flow) make the system inefficient and a waste of time. I think this reflects a lot of our experiences with IVR technology; it’s meant to save time for both the caller and the company, but it often has the opposite effect. I find Alex’s voice to be cloyingly neutral and yet also passive-aggressive; it’s somehow managing to hit all of the markers of a “helper” by repeating what it incorrectly heard, but never asking, “Did I get that right?”
WEEK 2
A possessive health check-in experience.
Command-line interaction that uses text-to-speech
ritual.mjs
I used Eleven Labs to clone my voice and then customized a ritual.mjs file to engage in a possessive “health check-in experience”.
Constraints:
- Takes text inputs and delivers speech outputs.
- Triggered on the terminal with a single command and runs without a GUI.
VOICE TRAINING:
- In Eleven Labs, I used around 4 voice memos. In two of them, I just rambled, and in the other two, I read a few excerpts from this Reddit post, where folks offer scripts that help to optimize the cloning process. Let me tell you, this clone sounds exactly like me, and it’s extremely bizarre listening to it.
- I grabbed an API key as well as a Voice ID, both of which I used in the ritual.mjs file.
- CODE: A combination of this Medium article by Ivan Koop, assistance from Claude, as well as node.js stuff.
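For reference, here is the shape of the Eleven Labs call in a ritual.mjs-style script. This is a simplified sketch, assuming the v1 text-to-speech endpoint, Node 18+ (for built-in fetch), and macOS’s afplay for playback; the model ID and the check-in line are placeholders, not necessarily what my actual file uses.

```js
// Sketch of the speech-out half of ritual.mjs; the real script wraps calls
// like this in the possessive check-in dialogue.
import { writeFile } from "node:fs/promises";
import { execSync } from "node:child_process";

const API_KEY  = process.env.ELEVENLABS_API_KEY;   // from my Eleven Labs account
const VOICE_ID = process.env.ELEVENLABS_VOICE_ID;  // the cloned voice's ID

async function speak(text) {
  const res = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}`, {
    method: "POST",
    headers: { "xi-api-key": API_KEY, "Content-Type": "application/json" },
    body: JSON.stringify({ text, model_id: "eleven_monolingual_v1" }),
  });
  await writeFile("line.mp3", Buffer.from(await res.arrayBuffer()));
  execSync("afplay line.mp3");   // macOS audio player; swap for your platform
}

await speak("Hi, it's me. Well, you. Have you had any water today?");
```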
Does this interaction serve any purpose?
Can you make this interaction more playful?
How does the technology enhance or constrain the experience?
This interaction is meant to read as a practical health check-in assistant bot. While previously the chatbot may have engaged in an impersonal way, this time it's starting to show possessiveness and judgment. I think I meant it to be funny due to the uncanniness and how it weaves in and out of practicality and insubordination. After hearing my own voice back, I ended up feeling weirdly comforted by the chat's possessiveness. It made me think to myself, “wow, this voice is obsessively caring for me... in the way I should be”, which was an unexpected feeling.