CLI and titles with non-ASCII characters



adhawkins
2008-01-18, 09:23
Hi all,

I'm trying to use the CLI to extract the database of tracks from
SqueezeCenter.

The docs say:

"For strings, SqueezeCenter uses the UTF-8 charset. Some extended queries
allow to return data in a different charset."

However, when I ask for track information for one of my albums where the
track names contain non-ASCII characters, I get the following (apologies for
the wrapping):

Sent: titles 0 100 album_id:461 tags:adt
Received: titles 0 100 album_id%3A461 tags%3Aadt count%3A14 id%3A7441
title%3A30%20Minutes artist%3At.A.T.u. duration%3A197.59 tracknum%3A4
id%3A7449 title%3A30%20Minutes%20(remix) artist%3At.A.T.u.
duration%3A353.724 tracknum%3A12 id%3A7439
title%3AAll%20the%20Things%20She%20Said artist%3At.A.T.u. duration%3A214.309
tracknum%3A2 id%3A7443 title%3AClowns%20(Can%20You%20See%20Me%20Now%3F)
artist%3At.A.T.u. duration%3A192.366 tracknum%3A6 id%3A7442
title%3AHow%20Soon%20Is%20Now%3F artist%3At.A.T.u. duration%3A195.945
tracknum%3A5 id%3A7444 title%3AMalchik%20Gay artist%3At.A.T.u.
duration%3A189.571 tracknum%3A7 id%3A7450 title%3AMalchik%20Gay%20(remix)
artist%3At.A.T.u. duration%3A303.673 tracknum%3A13 id%3A7451
title%3ANe%20Ver%2C%20Ne%20Boisia%20(Eurovision%202003) artist%3At.A.T.u.
duration%3A185.678 tracknum%3A14 id%3A7438 title%3ANot%20Gonna%20Get%20Us
artist%3At.A.T.u. duration%3A262.922 tracknum%3A1 id%3A7440
title%3AShow%20Me%20Love artist%3At.A.T.u. duration%3A256 tracknum%3A3
id%3A7448 title%3AShow%20Me%20Love%20(extended%20version) artist%3At.A.T.u.
duration%3A310.413 tracknum%3A11 id%3A7445 title%3AStars artist%3At.A.T.u.
duration%3A248.268 tracknum%3A8 id%3A7447
title%3A%D0%9D%D0%B0%D1%81%20%D0%BD%D0%B5%20%D0%B4%D0%BE%D0%B3%D0%BE%D0%BD%D1%8F%D1%82
artist%3At.A.T.u. duration%3A262.322 tracknum%3A10 id%3A7446
title%3A%D0%AF%20%D1%81%D0%BE%D1%88%D0%BB%D0%B0%20%D1%81%20%D1%83%D0%BC%D0%B0
artist%3At.A.T.u. duration%3A214.727 tracknum%3A9
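
For reference, the response format above (space-separated tokens, each percent-encoded, with a colon between key and value once decoded) could be split up roughly as follows. This is only a sketch in Python 3: `parse_cli_response` is a hypothetical helper name, not part of SqueezeCenter, and the sample is one track's worth of the output above joined back onto a single line.

```python
from urllib.parse import unquote


def parse_cli_response(line):
    """Split a CLI response line into (key, value) pairs.

    Tokens are separated by whitespace and percent-encoded individually.
    Tokens with no colon after decoding (for example the echoed command
    name) are returned with a value of None.
    """
    pairs = []
    for token in line.split():
        decoded = unquote(token)  # percent-decode, reading the bytes as UTF-8
        # Split on the first colon only, so values may contain colons.
        key, sep, value = decoded.partition(":")
        pairs.append((key, value if sep else None))
    return pairs


sample = ("id%3A7443 title%3AClowns%20(Can%20You%20See%20Me%20Now%3F) "
          "artist%3At.A.T.u. duration%3A192.366 tracknum%3A6")
fields = parse_cli_response(sample)
print(fields)
```

Splitting on whitespace before decoding matters here: a decoded title can itself contain spaces, so decoding the whole line first would make the fields impossible to separate.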

The bit I'm particularly interested in is (for example)

title%3A%D0%AF%20%D1%81%D0%BE%D1%88%D0%BB%D0%B0%20%D1%81%20%D1%83%D0%BC%D0%B0

Once I 'url-decode' this, it seems to me that a lot of the characters will
end up with byte values above 127. This doesn't fit with my understanding of
what UTF-8 should look like. However, perhaps I'm just misunderstanding
UTF-8!
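
As a quick check on that point: in UTF-8, every character outside ASCII is encoded as two or more bytes, each with a value of 128 or higher, so high byte values after percent-decoding are expected rather than a sign of a different charset. The sketch below (Python 3, standard library only) takes the title of track 7446 from the response above, written as one unbroken token, and looks at the bytes before decoding them.

```python
from urllib.parse import unquote_to_bytes

# Percent-encoded title of track 7446, as a single token.
encoded = "%D0%AF%20%D1%81%D0%BE%D1%88%D0%BB%D0%B0%20%D1%81%20%D1%83%D0%BC%D0%B0"

raw = unquote_to_bytes(encoded)   # bytes: percent-decoding only, no charset applied
text = raw.decode("utf-8")        # str: the bytes interpreted as UTF-8

# Every byte that is not a plain ASCII space belongs to a two-byte sequence.
high_bytes = [b for b in raw if b > 127]

print(len(raw), "bytes,", len(text), "characters")
print(text)
```

Each Cyrillic letter here occupies two bytes, which is why the byte count is larger than the character count.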

What I'm trying to do is get this string into Python, and then into Python's
XML library such that I can write out a correctly formed UTF-8 XML file with
the title in it. However, when I write out the XML file I'm actually getting
'binary' data. While IE can display this correctly, I suspect that this is
just a fluke.

Can anyone give me a clue as to how to convert the above string (in Python
perhaps) into the correct Unicode representation, which I can then
(hopefully) write out as UTF-8?
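
One possible route, sketched under some assumptions: this uses Python 3, where `str` is already Unicode and `urllib.parse.unquote` decodes percent-escapes as UTF-8 by default, together with the standard `xml.etree.ElementTree` module. The element names `tracks`, `track` and `title` are made up for illustration. On Python 2 the decoding step would instead be along the lines of `urllib.unquote(s).decode('utf-8')`.

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# Percent-encoded title of track 7446, as a single token.
encoded = "%D0%AF%20%D1%81%D0%BE%D1%88%D0%BB%D0%B0%20%D1%81%20%D1%83%D0%BC%D0%B0"
title = unquote(encoded)  # Unicode string, decoded as UTF-8

# Build a small document; the element names are hypothetical.
root = ET.Element("tracks")
track = ET.SubElement(root, "track", id="7446")
ET.SubElement(track, "title").text = title

# Serialise to UTF-8 bytes with an XML declaration naming the encoding.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
xml_bytes = buffer.getvalue()

print(xml_bytes.decode("utf-8"))
```

Viewed in a hex dump or an editor that assumes a single-byte charset, the title in the resulting file will still look like 'binary' data, because the non-ASCII characters are stored as multi-byte UTF-8 sequences. A viewer that honours the encoding declaration should show the Cyrillic text.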

I'm new to Python, and still a little confused as to how Unicode, UTF-8 etc.
all work, so any pointers would be greatly appreciated.

Thanks

Andy