3 + 1

16 years ago we were invited to display our musical chairs prototype in front of the Presidential Office Building in Taipei but somehow forgot one of the chairs we used. Though the replacement worked, it was rather embarrassing. Actually, considering how simplistic the concept was, I’m surprised we were invited at all. We embedded pressure sensors in pillows, which allowed people to create music by activating prepared samples and manipulating various parameters as they moved to the music. Though the artist’s statement was ever so slightly more eloquent, essentially four people created a musical performance with their ass. An in-action shot is below.

Alexa and Siri sitting in a tree

I’ve had an interest in voice user interfaces for some time, but living in Taiwan kept my experiences limited to Siri, due, I suppose, to limited support for local languages and I’m sure a host of legal/licensing issues. In the past I’ve been underwhelmed by Siri’s natural language processing, perhaps due to my dreams of interactions more akin to science fiction, like in the film Her.

Siri does often surprise, as she did by suggesting on my iPhone lock screen that I call my wife on her birthday.

When Amazon had the Echo Dots for sale at half price, I bought four, which is excessive, but I justified the expense since they can function as a decent Bluetooth speaker for each of the kids’ rooms. The sound that comes out of them is about what you would expect from a speaker of this size, and better balanced than most of the Bluetooth speakers we have purchased in the past. Pairing two together makes for a pleasant enough bedroom listening experience. I haven’t spent a great deal of time with the devices, but here are some insights thus far:

  • Siri is interesting in that, after a short training period, it will only respond to your voice, which is great when you consider the hilarity that would ensue with the potential number of iPhones in one room. Alexa out of the box has no such limitation. So unless you speak quietly, one person activates all the Echo Dots at similar times. You can set up a voice profile, but it’s buried within the iPhone app and not readily apparent that you can do so.
  • It’s much more enjoyable interacting with a device with decent sound output but I don’t find Alexa’s responses to be as smooth as Siri’s.
  • The Apple Music skill is US-only. A big disappointment to my kids.
  • My kids love music, but Alexa won’t play anything on their accounts. Alexa won’t play podcasts either. I have Prime, so I get a subset of music, but the catalog and experience pale in comparison to Apple Music.
  • I paired two Echo Dots together for stereo separation, but this only works for music and not for anything else. When playing a sleep-sounds skill, it plays through one speaker only. This seems like an incredibly glaring oversight on their part, which I hope they fix in the near future.
  • When you pair two speakers, the volume increases when you play anything other than music. It’s as if the Echo is compensating for the loss of the other.
  • Alexa skills have discoverability problems.
  • Skills are only interesting in theory, as they require the user to remember a long set of commands. UI should not add to the user’s cognitive load – I feel like I am back to writing down common commands for Unix because I don’t use the system enough to commit them to memory. While I understand you are supposed to be able to ask Alexa what skills are available, I’ve never been able to get it to work. There is no equivalent to typing “help” in Alexa.
  • Skills seem very similar to traditional interactive voice-response systems, very unnatural.
  • I’ve yet to be able to get it to play a podcast. This is likely due to not linking one of the music sources that contains podcasts, but Alexa describes it as an unknown error.
  • All the common requests (weather, etc.) seem to be handled with aplomb.
  • You need to think very clearly before you ask Alexa (or Siri) to perform a request. I suspect most don’t. There is no mid-sentence error correction.
  • There is no way to get Alexa to repeat a response – no “please repeat that”. Often a response to a request is too long, and it’s difficult to fit all the information into working memory.
  • Most surprising to me was when I asked a more advanced query, “Alexa, I would like to buy toothpaste”, and it responded with an error message stating that I would have to change the primary language of my account. I thought Amazon would have this part nailed down tight. At least I can be assured that my kids won’t be buying their favourite treats via the Echo in my bedroom.
  • The most fun usage (for me) thus far is the Rooster skill. I walked into my son’s bedroom early this AM and asked Alexa to play Rooster, and it proceeded to loudly play a variety of rooster crowing sounds. He hates me now.

Other than hardware that many people can afford, I have yet to see any major advances with Alexa over my experiences with Siri. It’s a pity that Apple hasn’t developed a similar smart speaker “for the rest of us”; $450 for each HomePod is not money well spent, I think. Despite their limitations, I do look forward to digging deeper into skills and routines – in our effort to keep the kids’ eyes away from screens, I hope to develop my own.

Ethics yes, politics no

All voice assistants encourage only one relationship dynamic — the servile companion: Always there for you, empathetic, cheerful, like a friend. But equally ready at all times to take orders and carry out tasks, like a servant. It is no accident that the personality of the servile companion is enacted by a female voice — society is intimately familiar with women as casual servants in the roles of secretaries, housewives and mothers. As Ben Parr of Octane AI puts it, “We’re basically training our kids that they can bark commands at a female and she will respond.”

This italicized line is absolute bullshit and contradicts the reasoning behind using female voices in voice UI stated previously in the article. Mother and wife as casual servant? I’ll have to mention that to my wife, or to any partner of any man I know, and see how far it flies. The author appears to lack an understanding of the current state of voice interfaces – we can’t have a conversation with Alexa or Siri; we can only give tasks. It has nothing to do with servitude, and everything to do with replacing what we do with our fingers or a pointing device with our voice. We are a long way off from proactive assistants.

But from a brand perspective, the quest for universal likeability is misplaced. In personality design as in brand design, pandering to users can be self-destructive. Good brands don’t merely follow. Good brands are like good people. They believe in something and they stand for it. Standing for something is polarizing, but it’s the difference between expected and inspiring. Why shouldn’t a voice assistant balk when a user shouts a slur? Why shouldn’t it promote diversity, just like most corporations do in their annual reports?

Generally an interesting article, but with these bombs of stupidity thrown in. In the past we had to face a different politics in design – corporate backstabbing, fiefdoms and the like – now we have to deal with unsubstantiated drivel.

Why Our Voice Assistants Need Ethics

Design for the ears to provide information, to communicate and to experience.

I haven’t finished absorbing all that is contained in the article, but it’s really worth digging into if you have any interest in the UX of sound. Sharing this also gives me a chance to complain about the poor sound UX (is that a term?) of the credit/debit card terminals at Walmart in Charlottetown. That extra beep drives me crazy, as it implies an error.

As we move into an artificially intelligent world whose logics of operation often exceed our own understanding, perhaps we should linger a bit longer on those blips and clicks. Compressed within the beep is a whole symphony of historical resonances, socio-technical rhythms, political timbres, and cultural harmonies. Rather than simply signaling completion, marking a job done right, a beep instead intones the complex nature of our relationships to technology — and the material world more generally.

Things that Beep: A Brief History of Product Sound Design

Natural sound is as essential as visual information because sound tells us about things we can’t see and it does so while our eyes are occupied elsewhere. Natural sounds reflect the complex interaction of natural objects; the way one part moves against another, the material of which the parts are made. Sounds are generated when materials interact and the sound tells us whether they are hitting, sliding, breaking or bouncing. Sounds differ according to the characteristics of the objects and they differ on how fast things are going and how far they are from us.
Bill Gaver

Detour: Location-Aware Audio Walks

Detour is a brand new way to experience the world. Gorgeous audio walks in San Francisco that reveal hidden stories, people and places all over SF. Each hour-long Detour takes you at your own pace, your own schedule, alone or synced with friends.

Lovely idea, an “immersive location-aware audio walk service”, perfect for people like me who might land in a place alone and might appreciate a guide to the more interesting places off the beaten path. Similar to something I explored before.

Moff – a wearable smart toy

Moff bills itself as a “wearable smart toy”. Grab a broom and strum, and the Moff bracelet will emit a guitar sound. Grab a banana and point it like a gun, and your shots ring out. Swing a pretend tennis racket, and you’ll hear a swat sound.

I love this idea; its weakness and strength is its reliance on a smartphone.

Ambient wants to make the world calmer

With Ambient the physical environment becomes an interface to digital information rendered as subtle changes in form, movement, sound, color or light.
Current information interfaces are either interruptive or too detailed. For the first time in history, ubiquitous wireless networks can affordably deliver digital information anytime, anywhere. The result for most of us is cacophony. Ambient wants to make the world calmer. Ambient Devices.

From an old ambient interface project of mine called, unoriginally, Girls Ambient Room.

Approaches to the design of sounds

In his thesis ‘Auditory Information Design’, Barrass (1998) describes seven approaches to the design of sounds that particularly support information-processing activities – syntactic, semantic, pragmatic, perceptual, task-oriented, connotative, and device-oriented.

The syntactic approach focuses on the organization of auditory elements into more complex messages. The semantic approach focuses on the metaphorical meaning of the sound. The pragmatic method focuses on the psychoacoustic discrimination of the sounds. The perceptual method focuses on the significance of the relations between the sounds. The task-oriented method designs the sounds for a particular purpose. The connotative method is concerned with the cultural and aesthetic implications of the sound. The device-oriented method focuses on the transportability of the sounds between different devices, and the optimization of the sounds for a specific device (Barrass 1998).

It’s great reading if you are interested in Audio Interfaces.

Unique qualities of sound

Some of the ways that sound is unique are as follows:

  • Sound can provide information about the interior of an object. Our ears perceive patterns of moving air from vibrating objects and sound can carry information about the consistency and hollowness of objects.
  • Sound also communicates information very quickly (Brewster 1998)
  • “Sound exists in time and over space, vision exists in space and over time” (Gaver 1989)
  • Sound is not bound to a specific location

A drawback is that one cannot turn away from sound, nor close one’s ears to unpleasant sounds.

Audio User Experience – Q&A

Kathy Sierra was nice enough to send me an email asking me some thoughts on audio/sound. I sometimes need this impetus to write down even the briefest thoughts on a subject (and these are just brief sketches). The following are her questions and my answers.

Do you agree with me that the power of audio/sound is being greatly overlooked in so many areas of product design, user experience, etc. (as opposed to areas where sound is recognized as crucial, like movies and commercials)?

Yes, I agree, but there is a good reason – and I would also extend your characterization of crucial to include games and toys.
Movies and commercials are passive shared experiences. Task based products are interactive and not generally shared. It’s an obvious but crucial difference. Everyone outside of China may agree that noise is something that we would rather not experience. But sound is not noise.
Sound is distinguished from noise by the simple fact that sound can provide information.
Sound answers questions; sound supports activities and tasks, so sound is inherently useful. Consider the information provided by the click when the bolt on a door slides open, the sound of your zipper when you close a pair of pants, the whistle of a kettle when your water has finished boiling, the sound of a river moving in the distance, the sound of liquid boiling, of food frying, and the sounds of people talking in the distance. In the workplace there are the sounds of keys being pressed on a computer keyboard.
Natural sound is as essential as visual information because sound tells us about things we can’t see and it does so while our eyes are occupied elsewhere. Natural sounds reflect the complex interaction of natural objects; the way one part moves against another, the material of which the parts are made. Sounds are generated when materials interact and the sound tells us whether they are hitting, sliding, breaking or bouncing. Sounds differ according to the characteristics of the objects and they differ on how fast things are going and how far they are from us.
An extension of the statement that tasks are not shared is that the environments in which the tasks are completed are – one person’s sound is another’s noise. Visual displays are not as intrusive as auditory ones.
So the question of whether auditory interfaces would or should be used is primarily a question of implementation – how to restrict the information inherent in sound to the person meant to be receiving it? When we solve this problem cheaply, I think we will see a great deal more use of sound in other products.

Do you see any areas of great leverage — places where audio/sound could be incorporated that could make a big difference in either usability, user experience (even if simply for more *pleasure* in the experience)?

I hesitate to use these buzzwords, but with the popularization of Ajax/Web 2.0 interfaces it may be a good time for people to start experimenting further with sound in online application interfaces. Since these interfaces load data in real time, we lose a vital visual clue from pages loading or refreshing. Sometimes the data change happens so fast we can’t follow any clues.
But these ideas are always met with criticism. An example from Jeffrey Veen: “I stopped counting how many times I tore the headphones from my ears when a site started blaring music or ‘interaction’ cues like pops, whistles, or explosions whenever I moused over something. Am I the only one who listens to music while using my computer?”
I love children’s toys and gain much inspiration from them. Cheap sensors which elicit wonderfully fun feedback. We should have these in everything. Imagine buying a jacket that, when you closed the snaps, sounded “heavier” than it feels or looks. Like the difference in sound between the door closing on a Lada and a Benz. Lots of possibilities.
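To make the idea concrete, here is a minimal, entirely hypothetical Python sketch of how sound might be used with restraint in such interfaces: a small gate that rate-limits completion cues so rapid background data updates stay informative rather than becoming the noise critics complain about. The class name, interval, and injectable clock are my own inventions for illustration, not any real toolkit’s API.

```python
import time

class EarconGate:
    """Decide whether an audio completion cue should actually sound.

    Hypothetical sketch: rate-limits cues so bursts of background
    updates produce one audible cue instead of a barrage.
    """

    def __init__(self, min_interval=2.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between audible cues
        self.clock = clock                # injectable for testing
        self._last_played = None

    def should_play(self):
        """Return True if enough time has passed to sound another cue."""
        now = self.clock()
        if self._last_played is None or now - self._last_played >= self.min_interval:
            self._last_played = now
            return True
        return False

# Simulate three quick data refreshes and one later refresh.
t = [0.0]
gate = EarconGate(min_interval=2.0, clock=lambda: t[0])
events = []
for moment in (0.0, 0.5, 1.0, 3.0):
    t[0] = moment
    events.append(gate.should_play())
# events == [True, False, False, True]
```

The design choice worth noting is that the decision logic is kept separate from playback, so the same gate could front any sound output the interface uses.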

Any other comments on your “Adult Chair” experience? What you learned from observing users interacting, etc.?

The adult chairs were just a small part of a broader set of objectives in creating non-elitist interfaces for musical expression. Though all of my work at that time was prototypes, just manifestations of some ideas I had, I was harshly criticized for the lack of “new science” or extended interactivity. Basically my work was too simple, due to using off-the-shelf tech and the short lengths of time that people were engaged in the activities. I rejected this criticism, mostly, because I knew the critics didn’t understand the goals of the project and weren’t looking at people actually using the prototypes. Though it was never intended to be so, this project ended up being the greatest champion of user-centered design for me personally. We videotaped a lot of sessions and gathered a lot of anecdotal data, which drove later iterations of the design.
Some of the conclusions:

  • It’s really hard to design interfaces that have no visual responses. In a game we developed around an interface similar to Adult Chairs (hulabaloo), children kept looking for flashing lights or some kind of physical response. Eventually they learned to use their ears only, which was good, as it was a music appreciation game. Children here are very conditioned to visual response.
  • People love being surprised and they want to have fun. They don’t care if the technology came from Radio Shack – they care if you can make them smile.
  • Features, options, and controls are not needed to allow people to have fun for a short period of time. To keep them engaged for long periods of time people want that control.

Any other thoughts or tips for the rest of us?

I think too often when people think of audio interfaces they immediately think of the horrible implementations in Yahoo IM, ICQ, and Flash sites with hip-hop soundtracks. It can be intelligently and elegantly designed.
Another thought is the difficulty in designing “gray sound”. Computer user interfaces are gray – not thought-provoking, sitting in the background, purposely boring. Icons and language localizations aside, I think they work everywhere. But how do we design auditory signals that work everywhere? Cultural differences abound, and what data is there to help us?
I live and work in Taiwan, arguably the noisiest group of people anywhere (I’m guessing). They “appear” to have a tolerance for noise and a need for sound that is far different from my own. Because their environment is so full of aural cues, how do we design for them? A Japanese garden is a place of tranquility. A Canadian park, a place of clean nature. A Taiwanese park is frequently experienced with a soundtrack, as they pump in music and nature sounds to keep it from becoming quiet. Quiet seems to make them uncomfortable. This is just one example of what is acceptable or normal for levels of aural cues across three different locations and cultures. I think localizing audio interfaces will be quite challenging.

Richard Etter’s Melodious Walkabout

A project quite similar to my proposal Guidebot, for which I unfortunately couldn’t generate enough interest to get a budget to take it beyond a simple exhibition poster. It’s great that Richard Etter was able to take something similar, and likely more capable, and make it real. His project is likely a far better fit for the Fraunhofer Institute for Applied Information Technology than mine was for where I was working at the time. Fraunhofer investigates human-centered computing in a process context.

A variety of navigation systems have been developed that use a GUI-based interaction style. However visual navigation systems are often inappropriate in the dynamic mobile context since the user has to watch the device and cannot keep his eyes on his surrounding environment. Auditory navigation systems are more convenient, mobile users can easily interact with the system and are not visually distracted. But most auditory systems navigate the traveler by using precise spoken instructions and speech requires high attention.

Read: Melodious Walkabout – a new approach to navigation – Richard Etter

Adult chairs exhibited in Taipei

Never underestimate the power of fun.
Today one of my tangible interface experiments is being exhibited in Taipei as part of an industry showcase – apparently the President of Taiwan is going to have a look, though after yesterday’s election I doubt he will have much enthusiasm for the fun my piece seems to provide. The continued quasi-popularity of Adult Chairs always surprises me, as it was without a doubt the simplest interface I had made. I think it proves how important simplicity, discovery (surprise), and fun can be in creating these types of products (interfaces). At least in the context of everything else that was being produced by the company.
Adult chairs were shown in a greater exhibition I had back in January and were part of a project I was running called smenms. They were very simple prototypes which consisted of 7 pressure sensors (couldn’t afford 8) hidden in 4 pillows, which when activated controlled a simple parameter of music. I wanted to embed sensors in ordinary objects that would allow people to interact with music and sound through a form that had a completely different use. It was hoped that by making the interface invisible and a part of ordinary objects we could invoke a sense of wonder and surprise, and hopefully engage people in the creation of sound and music in a whole new way.
The whole project followed an iterative development cycle, with every cycle producing an increasingly expressive musical interface. This was a slightly different version of the first iteration, which utilized a simple on/off interaction metaphor. In first experimenting with different sounds, music, and parameters to control (I originally wanted people to control wildly different parameters, but no one found it fun – it sounded too “post modern”), I finally settled on controlling the volume of separate pre-composed tracks. I was disappointed in the amount of expression, but the audience was enthused – perhaps the lively music I wrote and produced carried the day.
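For the curious, the per-pillow mapping from pressure to track volume could have been as simple as the following Python sketch. The `floor` and `ceiling` calibration values are invented for illustration; the prototype’s actual scaling isn’t documented here.

```python
def pressure_to_volume(raw, floor=40, ceiling=900):
    """Map a raw pressure-sensor reading to a track volume in [0.0, 1.0].

    Hypothetical calibration: readings at or below `floor` (an empty
    pillow) silence the track; readings at or above `ceiling` (someone
    fully seated) play it at full volume; in between, scale linearly.
    """
    if raw <= floor:
        return 0.0
    if raw >= ceiling:
        return 1.0
    return (raw - floor) / (ceiling - floor)

# One volume per pillow/track, e.g. four simultaneous readings:
volumes = [pressure_to_volume(r) for r in (10, 470, 900, 1023)]
# volumes == [0.0, 0.5, 1.0, 1.0]
```

With each pillow tied to one pre-composed track, sitting and shifting weight mixes the tracks in and out – which is all the expressiveness the first iteration needed.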
Yesterday Chientai and I were setting it up for likely the last time. I will miss projects like this.
Here is an example output of the song the chairs controlled called Sit and Dance.
More info. here and a related article which perhaps should have been the title of this one Never Underestimate the Power of Fun.

Notes on (Audio) Iconic Interface Design

Notes of interest for audio interfaces from an aging but good paper, “The Use of Metaphors in Iconic Interface Design”.
“Icons are used extensively for communication purposes. The term icon has been adapted from its Russian origins – ‘ikon’, meaning a religious painting or statue. Within the context of computing the word is used to refer to a small image which embeds ‘meaning’.”
“… some of the most popular of the new developments are ‘picons’, ‘micons’ and ‘earcons’. Picons are essentially icons that embed a picture (as opposed to a symbol). Similarly, micons are composed of moving pictures or video clips. Earcons, or auditory icons, are based upon the use of sounds and are usually embedded in sonic sequences …”
“… metaphors can play a part in the development of a functional specification. Indeed, metaphor usage should be made explicit at the design stage of application development so that maximum benefits can be attained. In this way a whole range of functions can be identified for which icons are required. In addition, metaphors can assist interface design by providing ideas for individual icon designs.
Second, the use of metaphors can have a significant impact upon end-users. Within an end-user interface metaphors can provide cues for the recognition of iconic symbolism …”
“As well as textual augmentation, audio augmentation may also take place.
Of course, serious problems can arise from the use of textual and audio augmentation – particularly, with respect to international interface designs. Obviously, if an interface is intended for international use then any textual labels which have been attached to it must be dynamically switchable between the target languages.”
How do you localise audio interfaces? The desktop metaphor seems to be transcending culture, but what about audio metaphors? A click sound is a click sound, but what about more complex functions?
Read the full paper: Metaphors in Iconic Interface design

Beeps and blips

“Sounds are critical,” said Donald Norman, co-founder of the Nielsen Norman Group and a University of California San Diego professor emeritus in the departments of cognitive science and psychology. “You have to spend the same type of attention to designing sound as visual appearance. Companies these days always hire graphic artists. They need to hire sound artists.”

Audio in the Computer User-Interface

“A number of studies have shown how audio contributes to the interaction process in order to provide a richer, more robust environment than with mere graphic feedback. Auditory feedback can present further information when the bandwidth of graphic information has been exhausted, as is often the case with the present emphasis on graphic presentation. By expanding conventional interfaces in another dimension, sounds make tasks easier and more productive. Other studies have even shown certain types of information to be represented better by sound than through graphics or text. Additionally, audio feedback may complement graphics and text to create valuable redundancy, reinforcing or reconfirming a concept in the user’s mind.”
Noise Between Stations: Audio in the Computer User-Interface