if you could voice-enable squeeze boxen...



netchord
2016-05-27, 16:25
what would you want to be able to do? from squeezeboxen or any networked audio system? beyond just replicating functionality that already exists, is there something that's unique about a voice-controlled system?

Amazon Echo/Alexa, Google Assistant, Siri...

pippin
2016-05-28, 09:05
Any kind of complex stuff, like deep search or directly setting complex alarms and such.
It's the only thing I'd ever consider using voice for, but it's hard... The search part would probably only work for the local library on SB

bpa
2016-05-28, 10:33
IIRC Touch has a microphone - so any eager user can use it.

mherger
2016-05-30, 01:15
> what would you want to be able to do? from squeezeboxen or any
> networked audio system? beyond just replicating functionality that
> already exists, is there something that's unique about a
> voice-controlled system?

It adds some level of suspense to your music listening experience: does
it really understand what I want? Or will it play the great group
"Prints" again, instead of Prince? What about AC/DC's "Health Bells"?

Really, I don't get all the hype around voice controlled systems. In
particular not in a music listening environment: do you really want to
shut down the party just so your system does understand what you're
yelling at it? I doubt it...

And no, I didn't make up the above examples. Those are the results I got from
one such voice recognition system. Ok, I'm not a native English speaker.
But results don't get any better at all once you want to talk to it in
German. "Spiel mir Prinz". Nah...

--

Michael

epoch1970
2016-05-30, 04:59
> Really, I don't get all the hype around voice controlled systems. In
> particular not in a music listening environment

Having played with TTS (the non-AI other way around) I also doubt very much that a voice interface works with music, esp. when it comes to pronunciation.

But I believe it is possible to explain the hype. The web has killed the PC, the apps are killing the web, what could kill the apps? When you have a big database, enough CPU to do a bit of AI, and your name isn't Apple, I guess it is tempting enough to buy an audio speaker maker for scrap and try to swing a voice UI.
A networked speaker is something people can imagine and buy, so now you have your beachhead. And since use for music control doesn't really work (plus music is already monetized) while IoT is looming large, you prepare to make the speaker a hub for the smart home.
Does a voice-controlled home make more sense than voice controlled music? I don't think it can make less sense. But I think some marketers have seen one too many Iron Man movies.
I'd be content with an app-controlled home (assuming I don't need a working internet link to open the front door, etc.)

Dogberry2
2016-05-31, 08:13
I love the voice-controlled system in my car, where I can say "play album Abbey Road" or "play playlist Jazz Recliner" and it just plays. While driving at 70mph, that's much better than having to manually and visually scroll through menus to find what I want; I keep my eyes on the road and my hands upon the wheel, and key the system to listen for a command with a flick of a thumb on a button right there. Handy, convenient.

But for a home system, I can't see it being either needed or as convenient. Keying the system to listen, when you could be anywhere, walking around, moving from room to room, or even just sitting in your standard listening chair but with music already playing? Nah. Not so convenient. It's an entirely different environment with different requirements: I find that a phone/tablet/Controller/whatever is just fine for home system use. No shouting, portable, easy, and full-featured. I can't see a voice controlled system being any improvement at home.

EricBergan
2016-05-31, 10:33
Actually, the main thing I want is the play controls - "pause", "play", "next", "who is this". If I'm doing something else (reading, computer, etc.) while listening, I'd rather not have to pick up tablet/phone for these. Changing playlists would also be nice, but I wouldn't use that as often.
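
A minimal sketch of how those four play controls could be wired up, assuming the recognizer hands back a plain-text transcript and the player is driven through Logitech Media Server's JSON-RPC interface (`slim.request`, normally posted to `/jsonrpc.js`). The phrase table, the function names and the player id are hypothetical, and the CLI terms should be checked against the server's own CLI documentation. The code only builds the request; sending it is left out.

```python
import json

# Hypothetical phrase table: spoken command -> LMS CLI command terms.
# The terms follow the LMS command set as commonly documented;
# verify them against your server before relying on this.
VOICE_COMMANDS = {
    "pause": ["pause", "1"],
    "play": ["play"],
    "next": ["playlist", "index", "+1"],
    "who is this": ["artist", "?"],
}


def normalize(transcript):
    """Lower-case and collapse whitespace so 'Who  is this ' still matches."""
    return " ".join(transcript.lower().split())


def build_request(transcript, player_id):
    """Return a JSON-RPC payload for a recognized phrase, or None otherwise."""
    command = VOICE_COMMANDS.get(normalize(transcript))
    if command is None:
        return None
    return {"id": 1, "method": "slim.request", "params": [player_id, command]}


if __name__ == "__main__":
    # The player id is a made-up MAC address used only for illustration.
    print(json.dumps(build_request("Next", "00:04:20:aa:bb:cc")))
```

Anything outside the table returns `None`, so an unrecognized phrase does nothing instead of triggering a guess.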

epoch1970
2016-05-31, 10:35
> I love the voice-controlled system in my car...

Yes, I've thought about this and for me the only room in the home where you may be highly constrained in terms of attention or physical interaction is the kitchen. Cooking is a bit like driving, accidents happen if you don't pay attention. (If I were spending any time in my kitchen) I suppose I'd appreciate voice control, even for music.

epoch1970
2016-05-31, 10:41
> Actually, the main thing I want is the play controls - "pause", "play", "next", "who is this". If I'm doing something else (reading, computer, etc.) while listening, I'd rather not have to pick up tablet/phone for these. Changing playlists would also be nice, but I wouldn't use that as often.

That's why I still love my IR remotes. So quick and simple.
(the "who is this" case could be funny...)

netchord
2016-05-31, 14:16
the pronunciation problem is solvable, was in fact largely solved several years ago, but not enough of the newer systems take advantage of data that's available. it's actually better in the car for the most part, at least those systems that rely on Nuance AVR.

and i agree about the kitchen as a primary use environment. still, it's much easier to just say "play me xxxxx" than to grab my phone, unlock it, scroll/click perhaps several menu layers deep, to find what i want.

ideally, there'd be some sort of AI + voice, so one could just walk in the house, say "i feel fucking great, play music..." and the system would understand "fucking great" is greater than merely great, and play music accordingly.

and the reverse as well.

pippin
2016-05-31, 18:55
> Yes, I've thought about this and for me the only room in the home where you may be highly constrained in terms of attention or physical interaction is the kitchen. Cooking is a bit like driving, accidents happen if you don't pay attention. (If I were spending any time in my kitchen) I suppose I'd appreciate voice control, even for music.

It's a bit of a fallacy to think that voice control doesn't require attention. Quite to the contrary, the lack of visual input (that you can easily revisit, too) means it requires MORE attention than other means of control, not less.
It's just that you don't have to take your eyes off the road and you have the hands free (especially in the kitchen).

A good example is hands-free phone docks in cars. They are now required pretty much everywhere but that's just a subsidy for vendors. Studies show they have exactly no impact on the accident rates because it's the calls themselves that distract people, not holding the phone.

mherger
2016-05-31, 21:08
> I love the voice-controlled system in my car, where I can say "play
> album Abbey Road" or "play playlist Jazz Recliner" and it just plays.

How does it handle mis-recognitions? Would it just play whatever it
thinks it understood, or would it ask for confirmation? Is this working
on your own, limited collection of music, or on some music service?

E.g. I would have a hard time pronouncing simple artist names like "ABBA"
or "Prince". I think I never managed to get the latter. It would always
pick some artist "Prints", even if I tried to get "prinnsss" :-).
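
One way the "play or ask for confirmation" question could be handled for a local library is sketched below. It assumes the recognizer returns a text transcript and that fuzzy string similarity (Python's standard `difflib`) is a good enough stand-in for phonetic matching, which a real system would probably not accept. The library contents, the two thresholds and the function names are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical local library; a real one would come from the server database.
LIBRARY_ARTISTS = ["Prince", "ABBA", "AC/DC", "The Beatles"]

# Assumed cut-offs, chosen for illustration only.
PLAY_THRESHOLD = 0.85     # at or above this: play without asking
CONFIRM_THRESHOLD = 0.60  # between the two: ask "did you mean ...?"


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve_artist(heard):
    """Return ('play' | 'confirm' | 'reject', best matching artist or None)."""
    best = max(LIBRARY_ARTISTS, key=lambda name: similarity(heard, name))
    score = similarity(heard, best)
    if score >= PLAY_THRESHOLD:
        return ("play", best)
    if score >= CONFIRM_THRESHOLD:
        return ("confirm", best)
    return ("reject", None)


if __name__ == "__main__":
    for heard in ["Prince", "Prints"]:
        print(heard, "->", resolve_artist(heard))
```

With this library, "Prints" lands in the confirmation band for "Prince" rather than being played blindly. Matching against a whole music service instead of a small collection would make near-misses much more common, so the thresholds would need retuning.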

--

Michael

mherger
2016-05-31, 21:20
> the pronunciation problem is solvable, was in fact largely solved
> several years ago

Not sure. Maybe for the all EN speaking community. But the problem for
me and large parts of what the Americans call "the rest of the world" is
that commands would be in the local native language (or not even native,
as for me this would be Swiss German, not German). But the parameters or
data in music very often is in English. "spiel mir Thunderstruck von
AC/DC". Good luck with that! Funnily enough Siri seems pretty much up to
date, as it at least returned (translated for your understanding) "can't
find Axel AC DC" :-).

--

Michael

mherger
2016-05-31, 21:53
> data that's available. it's actually better in the car for the most
> part, at least those systems that rely on Nuance AVR.

How do you know which system they're using? And what is "Nuance AVR"
anyway? Google doesn't really show up anything interesting... And would
you know a music related example application using it?

--

Michael

epoch1970
2016-05-31, 23:44
> ideally, there'd be some sort of AI + voice

"Calling Greg. Hold on."

epoch1970
2016-05-31, 23:47
> It's a bit of a fallacy to think that voice control doesn't require attention. Quite to the contrary, the lack of visual input (that you can easily revisit, too) means it requires MORE attention than other means of control, not less.

Point taken, my mistake.

drmatt
2016-06-01, 01:06
> > data that's available. it's actually better in the car for the most
> > part, at least those systems that rely on Nuance AVR.
>
> How do you know which system they're using? And what is "Nuance AVR"
> anyway? Google doesn't really show up anything interesting... And would
> you know a music related example application using it?
>
> --
>
> Michael

Nuance is the company that supplies cloud voice recognition tech behind the scenes for a lot of third parties such as Samsung TVs, kids' toys and so on. They have some genuinely pretty reliable voice recognition going on; it's quite impressive to use.

And the name will be buried deep in the Ts+Cs because they have to send a sound recording out to them and, as a third party, they have to notify you of this fact.

But I have to say I doubt that a VR system could pick up my voice in my lounge over my HiFi running at an unhealthy volume level without me shouting louder than I'd like, and certainly loud enough that I'd rather reach for a remote.

netchord
2016-06-01, 04:58
> > data that's available. it's actually better in the car for the most
> > part, at least those systems that rely on Nuance AVR.
>
> How do you know which system they're using? And what is "Nuance AVR"
> anyway? Google doesn't really show up anything interesting... And would
> you know a music related example application using it?
>
> --
>
> Michael

AVR = Automatic Voice Recognition; TTS = Text To Speech.

most car based implementations rely on Nuance, and there's some music centric DNA in the nuance system that i'm familiar with. it could still get better of course.

Alexa (Amazon) rolled their own, likewise Siri and Cortana.

for weird pronunciations (AC/DC, Sade, 311) and natural language examples (CCR, The Boss, The King, The Stones) one needs specific rules hard-coded (AC/DC= ack slash Dee See) into the AVR database. a lot of work went into this several years ago. additionally, there was a lot of time spent optimizing systems for non-native english speakers. it could always be better of course, and similar work would need to be done for other languages.

point is these problems are solvable, so i'm wondering what experiences one could enable.
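
The hard-coded rules described above can be pictured as a lookup table from spoken forms to catalog names. This is only a sketch: the table, the function name and the transcript spellings are hypothetical (only "ack slash Dee See" comes from the post), and a production recognizer would more likely store phonetic entries in its lexicon than match text after the fact.

```python
# Hypothetical spoken-form table. Keys are how a recognizer might
# transcribe the utterance (illustrative guesses, lower-cased);
# values are the names as they appear in the music catalog.
SPOKEN_ALIASES = {
    "ack slash dee see": "AC/DC",  # the example rule from the post
    "ay see dee see": "AC/DC",
    "shah day": "Sade",
    "three eleven": "311",
    "ccr": "Creedence Clearwater Revival",
    "the boss": "Bruce Springsteen",
    "the king": "Elvis Presley",
    "the stones": "The Rolling Stones",
}


def canonical_artist(transcript):
    """Map a transcribed artist phrase to its catalog name.

    Unknown phrases are returned unchanged so normal search can still run.
    """
    key = " ".join(transcript.lower().split())
    return SPOKEN_ALIASES.get(key, transcript)


if __name__ == "__main__":
    for phrase in ["Ack Slash Dee See", "the boss", "Prince"]:
        print(phrase, "->", canonical_artist(phrase))
```

Each additional language would need its own set of keys, which is the "similar work" mentioned for non-English speakers.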

pippin
2016-06-01, 07:02
One has to understand that there are _two_ aspects to voice recognition:

One is understanding the speech, that is, determining what people actually said. That's what Nuance does and what they are pretty good at (even Siri and Google are said to use their system).

The other is the AI to determine what the user wants to tell the system, which is all about semantics and that's what Siri, Google et al. are working on.
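
To make the split concrete, here is a toy version of the second stage only. It assumes the first stage has already produced a transcript, and it handles nothing beyond the rigid "play album ..." / "play playlist ..." style of command mentioned earlier in the thread. The `Intent` type, the pattern and the function name are hypothetical; real semantic interpretation is far more involved than a regular expression.

```python
import re
from typing import NamedTuple, Optional


class Intent(NamedTuple):
    action: str  # e.g. "play"
    kind: str    # "album", "playlist" or "artist"
    name: str    # free-text title as transcribed


# A deliberately rigid command grammar: play <kind> <name>
COMMAND_PATTERN = re.compile(
    r"^(play)\s+(album|playlist|artist)\s+(.+)$", re.IGNORECASE
)


def parse_intent(transcript: str) -> Optional[Intent]:
    """Turn a transcript into an Intent, or None if it fits no known command."""
    match = COMMAND_PATTERN.match(transcript.strip())
    if match is None:
        return None
    action, kind, name = match.groups()
    return Intent(action.lower(), kind.lower(), name.strip())


if __name__ == "__main__":
    print(parse_intent("play album Abbey Road"))
    print(parse_intent("i feel great, play music"))
```

The second call returns `None`: the words were transcribed perfectly, but a fixed grammar has no idea what the speaker wants, which is exactly the gap between the two aspects.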

The speech recognition itself is pretty much solved, I saw impressive systems (e.g. from Nuance but also Dragon etc.) almost 20 years ago. The problem for actual voice interaction, though, has always been to determine what the user wants...
A sub-problem of this is to detect which language a phrase or even sub-phrase is spoken in.
Here I'm underwhelmed by some systems, especially Siri. I usually use my phone in English which solves a good part of the "Music" problem (except for German music, of course) but it means I can no longer navigate in Germany because for the life of me I can't figure out how Siri thinks "Hackescher Markt" should be pronounced in English :) And that would actually be such an easy problem because hey, Siri knows where I am so she should be aware street names are German in Berlin...

How good a solution VR is pretty much depends on the expectations. I know people who like it a lot and who are willing to learn hacks to use it:



> for weird pronunciations (AC/DC, Sade, 311) and natural language examples (CCR, The Boss, The King, The Stones) one needs specific rules hard-coded (AC/DC= ack slash Dee See) into the AVR database. a lot of work went into this several years ago. additionally, there was a lot of time spent optimizing systems for non-native english speakers. it could always be better of course, and similar work would need to be done for other languages.
>
> point is these problems are solvable, so i'm wondering what experiences one could enable.

That's the point: some people think VR will understand them like they _mean_ things (not even other people will always understand what you _say_). They are usually quickly disappointed.
Then there are people (like me) who find it deeply awkward to talk to their hardware aloud in the presence of other people (and I find it disturbing if others do it) which significantly limits the usefulness if you are not home alone.
And then there are people who love to have a complete hands-off interaction and are willing to learn to use the system (and learn what kind of pronunciation it expects). For them this can already work really well even today.

I think voice interaction will always stay an interaction model that only works for part of the population but I'm pretty sure it can work pretty well for those who like it.

mherger
2016-06-01, 07:24
> The speech recognition itself is pretty much solved, I've seen
> impressive systems (e.g. from Nuance but also Dragon etc.)

hehe... Dragon, now a Nuance product :-)

http://www.nuance.de/dragon/index.htm

--

Michael

cliveb
2016-06-01, 07:36
> ... I keep my eyes on the road and my hands upon the wheel, ...

Presumably while listening to "Seven Little Girls Sitting in the Back Seat"?
(Sorry, couldn't resist)

netchord
2016-06-01, 17:25
> I think voice interaction will always stay an interaction model that only works for part of the population but I'm pretty sure it can work pretty well for those who like it.

and if you were one of that part of the population, what would you like to do with it?

pippin
2016-06-01, 17:50
Didn't I say that close to the top of this thread?