Character encoding issues with "Browse music folder"



mcfly
2005-05-25, 11:26
Hello all,

Since SlimServer "upgraded" to UTF-8, the "Browse Music Folder" function no
longer works properly with UTF-8 encoded files/directories and doesn't work at
all for files/directories encoded in ISO-8859-1 (the directory is not even
displayed). I'm running the latest slimserver-svn with Perl 5.8.4 on Debian
Sid.

The script below can be used to reproduce this. It creates a directory
"UmlautTest" in the current directory and puts two subdirectories in it, both
with an o-umlaut in their names. One name is encoded in ISO-8859-1, the other
in UTF-8. The script requires openssl on your box. Run it in the root directory
of your music folder.

---8<---8<---8<---
#!/bin/bash
set -e

# Base64 of the UTF-8 encoded name "utf-8_o-umlaut-ouml" (ends in bytes c3 b6)
UTF8="dXRmLThfby11bWxhdXQtw7Y="
# Base64 of the ISO-8859-1 encoded name "iso-8859-1_o-umlaut-ouml" (ends in byte f6)
ISO88591="aXNvLTg4NTktMV9vLXVtbGF1dC32"

TD="${1:-"UmlautTest"}"

if [ ! -d "$TD" ]; then
    mkdir "$TD"
fi

cd "$TD"

# Decode each name and create a directory with exactly those bytes
mkdir "$(echo "$UTF8" | openssl base64 -d)"
mkdir "$(echo "$ISO88591" | openssl base64 -d)"

echo "OK, Now browse '$TD' with your Slimserver..."
---8<---8<---8<---
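
To double-check which bytes each Base64 string really decodes to, something like the following should do. It is only a sketch: it assumes GNU coreutils `base64` and `od` rather than openssl, and `hex_of_b64` is a made-up helper name.

```shell
#!/bin/bash
# hex_of_b64 (hypothetical helper): decode a Base64 string and print the
# resulting bytes as space-separated hex on a single line.
hex_of_b64() {
    printf '%s\n' "$1" | base64 -d | od -An -tx1 | tr '\n' ' ' | tr -s ' ' | sed 's/^ //; s/ $//'
}

# o-umlaut is two bytes (c3 b6) in UTF-8 and one byte (f6) in ISO-8859-1,
# so the last bytes of each line show which encoding the name uses.
hex_of_b64 "dXRmLThfby11bWxhdXQtw7Y="; echo       # "utf-8_o-umlaut-" + o-umlaut
hex_of_b64 "aXNvLTg4NTktMV9vLXVtbGF1dC32"; echo   # "iso-8859-1_o-umlaut-" + o-umlaut
```

The first line should end in `c3 b6` and the second in `f6`.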


Attached is how the utf-8 directory is displayed in my browser (FF 1.0.4 /
MacOSX)...

Before I file a bug: maybe a Perl/UTF-8 expert can take pity on me and shed some
light on what's going wrong?

Cheers,

Michel

mcfly
2005-05-25, 12:49
Me, Myself and I wrote:

> ... and doesn't work at all for files/directories encoded
> in ISO-8859-1 (the directory will not even be displayed). I'm running
> latest slimserver-svn using Perl 5.8.4 on Debian SID.

Now that I changed the LANG variable in my slimserver-start script from
en_US.utf8 to POSIX, the UTF-8 directory is displayed correctly, but still no
sign of the iso-8859-1 dir...

Dan Sully
2005-05-26, 12:13
* mma shaped the electrons to say...

>> ... and doesn't work at all for files/directories encoded
>>in ISO-8859-1 (the directory will not even be displayed). I'm running
>>latest slimserver-svn using Perl 5.8.4 on Debian SID.
>
>Now that I changed the LANG variable in my slimserver-start script from
>en_US.utf8 to POSIX, the UTF-8 directory is displayed correctly, but still
>no sign of the iso-8859-1 dir...

Michel - I'm working on some changes that will help the Unicode situation in the filesystem.

Beware, though, that your test is somewhat invalid. Filesystems will create
directories with the octets you generated - but depending on the current locale
setting, only one of them can be displayed correctly. This holds true for
tools such as 'ls' as well.
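
A rough shell illustration of that point, assuming `iconv` is installed. The helper names are made up for this sketch and the directory name is shortened.

```shell
#!/bin/bash
# The same visible name, stored as two different octet sequences.
utf8_name=$(printf 'o-umlaut-\303\266')    # o-umlaut as UTF-8: c3 b6
latin1_name=$(printf 'o-umlaut-\366')      # o-umlaut as ISO-8859-1: f6

# is_valid_utf8 (hypothetical helper): succeeds only if the octets are
# well-formed UTF-8; iconv exits non-zero on an invalid sequence.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# as_seen_in_latin1 (hypothetical helper): what a Latin-1 tool makes of
# the octets. Every byte becomes its own character, so the UTF-8 o-umlaut
# turns into two characters instead of one.
as_seen_in_latin1() {
    printf '%s' "$1" | iconv -f ISO-8859-1 -t UTF-8
}
```

Under a UTF-8 locale only `utf8_name` is well-formed, and under a Latin-1 locale the UTF-8 name comes out as two junk characters. Whichever locale is chosen, one of the two directories cannot be shown properly.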

-D
--
<fuz> deregulation will lead to greater competition, consumer choice, and lower prices.
my name is elmer fudd. I own a mansion and a yacht.

mcfly
2005-05-26, 12:48
Dan Sully wrote:
> Beware though, that your test is somewhat invalid. Filesystems will create
> directories with the octets you generated - but based on the current locale
> setting, only one will be able to be displayed correctly. This holds true
> for things such as a 'ls'.
Agreed... And I can live with slimserver not showing filenames correctly, but it
should not hide a file/directory because of some (strangely-)encoded name.

I thought that slimserver would operate exactly the same no matter how $LANG is
set but it doesn't look like it. So what is your recommendation for the $LANG
variable in start-scripts?

Dan Sully
2005-05-26, 12:52
* mma shaped the electrons to say...

>Agreed... And I can live with slimserver not showing filenames correctly,
>but it should not hide a file/directory because of some (strangely-)encoded name.

Well, it's not just "showing" the directories - it's also checking to see if
it can read them. And that means doing a stat() against the properly encoded
octets for your locale.
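
A small sketch of why such an existence check can fail, assuming `iconv` and `mktemp` are available. `exists_as_dir` is a made-up stand-in for the stat() call, and this is not SlimServer's actual code.

```shell
#!/bin/bash
# Create a directory whose name ends in o-umlaut, encoded as UTF-8 (c3 b6).
demo_root=$(mktemp -d)
real_name=$(printf 'o-umlaut-\303\266')
mkdir "$demo_root/$real_name"

# Simulate a program that wrongly assumes the name is ISO-8859-1 and
# converts it "to UTF-8" before use: the octets become c3 83 c2 b6.
wrong_name=$(printf '%s' "$real_name" | iconv -f ISO-8859-1 -t UTF-8)

# exists_as_dir (hypothetical): stand-in for the stat() check on a path.
exists_as_dir() { [ -d "$1" ]; }

exists_as_dir "$demo_root/$real_name"  && echo "original octets: found"
exists_as_dir "$demo_root/$wrong_name" || echo "re-encoded octets: not found"
```

The directory is on disk the whole time. The lookup fails only because it is made with different octets than the ones stored in the filesystem.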

>I thought that slimserver would operate exactly the same no matter how
>$LANG is set but it doesn't look like it. So what is your recommendation
>for the $LANG variable in start-scripts?

If you're on a Debian system, I would recommend making sure your system is
completely UTF-8: run dpkg-reconfigure locales, and then set LANG/LC_CTYPE to en_US.UTF-8

Please note though, that if you have previously ripped directories/files that
were created in a different locale, they will not show up correctly in 'ls'
or SlimServer.
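
To find such leftovers before switching locales, a sketch along these lines may help. It assumes bash, `find` with `-print0`, and `iconv`. `list_non_utf8_names` is a made-up helper name.

```shell
#!/bin/bash
# list_non_utf8_names (hypothetical helper): print every path below the
# given directory whose last component is not well-formed UTF-8.
list_non_utf8_names() {
    find "$1" -print0 | while IFS= read -r -d '' path; do
        name=${path##*/}    # last path component only
        if ! printf '%s' "$name" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            printf '%s\n' "$path"
        fi
    done
}
```

Usage would be something like `list_non_utf8_names /path/to/music`. Names it reports were most likely written under a non-UTF-8 locale and would need renaming, for example with a tool such as convmv, to display correctly under a UTF-8 locale.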

-D
--
It is dark. You are likely to be eaten by a grue.

mcfly
2005-05-26, 13:19
Dan Sully wrote:

> If you're on a Debian system, I would recommend making sure your system is
> completely UTF8. dpkg-reconfigure locales, and then set LANG/LC_CTYPE to
> en_US.UTF-8


OK, did a dpkg-reconfigure locales, chose en_US.iso88591, en_US.utf8, no default
locale.

Now, with "LANG=en_US.utf8 export LANG" at the start of my
/etc/init.d/slimserver script, I get the following in my browser:

iso-8859-1_o-umlaut-ö
utf-8_o-umlaut-ö

Setting LANG to en_US.iso88591 produces

iso-8859-1_o-umlaut-ö
utf-8_o-umlaut-ö

Setting LANG to POSIX gives

iso-8859-1_o-umlaut-ö
utf-8_o-umlaut-ö


Not setting LANG at all (unset LANG)

iso-8859-1_o-umlaut-ö
utf-8_o-umlaut-ö


To sum it up: After fiddling with locales, the iso88591-encoded directory shows
up again but the UTF-8 encoded file will no longer be displayed correctly... I'm
really confused now. Is there some sort of cache on the server-side?

Dan Sully
2005-05-26, 13:23
* mma shaped the electrons to say...

>To sum it up: After fiddling with locales, the iso88591-encoded directory
>shows up again but the UTF-8 encoded file will no longer be displayed
>correctly... I'm really confused now. Is there some sort of cache on the
>server-side?

It's currently broken - I'm working on a patch that will hopefully not break
anything else. Look for it shortly. Also, the locale should be: en_US.UTF-8

-D
--
"It has become appallingly obvious that our technology has exceeded our humanity." - Albert Einstein

mcfly
2005-05-26, 13:27
Dan Sully wrote:

> It's currently broken - I'm working on a patch that will hopefully not break
> anything else. Look for it shortly.

OK, I will stop nagging and look forward to your patch :-)

> Also, the locale should be: en_US.UTF-8
Are you sure? Doing "locale -a" gives:

C
en_US
en_US.iso88591
en_US.utf8
POSIX

Dan Sully
2005-05-26, 16:30
* mma shaped the electrons to say...

> > It's currently broken - I'm working on a patch that will hopefully not
> > break anything else. Look for it shortly.
>
>OK, I will stop nagging and look forward to your patch :-)

It's attached.

Can someone with a Windows machine please test this on a directory structure
that has accented characters in both directory and file names? My Windows box
is hosed at the moment.

> > Also, the locale should be: en_US.UTF-8
>Are you sure? doing "locale -a" gives:
>
>C
>en_US
>en_US.iso88591
>en_US.utf8
>POSIX

My mistake - I guess it's ok then.
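
As far as I know, glibc compares the codeset part of a locale name after lower-casing it and dropping punctuation, which would explain why "en_US.UTF-8" and "en_US.utf8" end up meaning the same locale. A sketch of that normalization follows. `normalize_locale` is a made-up helper, and modifiers such as "@euro" are not handled.

```shell
#!/bin/bash
# normalize_locale (hypothetical helper): lower-case the codeset part of a
# locale name and drop every character that is not a letter or digit.
normalize_locale() {
    local name=$1 codeset
    case $name in
        *.*)
            codeset=$(printf '%s' "${name#*.}" | tr -d -c '[:alnum:]' | tr '[:upper:]' '[:lower:]')
            printf '%s.%s\n' "${name%%.*}" "$codeset"
            ;;
        *)
            printf '%s\n' "$name"    # no codeset part, e.g. C, POSIX, en_US
            ;;
    esac
}

normalize_locale en_US.UTF-8         # prints en_US.utf8
normalize_locale en_US.ISO-8859-1    # prints en_US.iso88591
```

Both results match the spellings that "locale -a" lists above.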

-D
--
There was supposed to be a big kaboom.

mherger
2005-05-26, 23:14
> Can someone with a Windows machine please test this on a directory
> structure
> that has accented characters in both directory and file names? My

Hmm... with the patch it misses exactly those folders/files, without the
patch it's fine. (W2000/SP4, cp1252)

--

Michael

-----------------------------------------------------------
Help translate SlimServer by using the
StringEditor Plugin (http://www.herger.net/slim/)

Dan Sully
2005-05-26, 23:26
* Michael Herger shaped the electrons to say...

>>Can someone with a Windows machine please test this on a directory
>>structure
>>that has accented characters in both directory and file names? My
>
>Hmm... with the patch it misses exactly those folders/files, without the
>patch it's fine. (W2000/SP4, cp1252)

And if you uncomment out the from_to lines?

We're still getting reports of "SlimServer not finding all my files" from
mostly Windows people (I think).

-D
--
For every new fool-proof invention there is a new and improved fool.

mherger
2005-05-26, 23:40
>> Hmm... with the patch it misses exactly those folders/files, without
>> the patch it's fine. (W2000/SP4, cp1252)
>
> And if you uncomment out the from_to lines?

They're back again.

> We're still getting reports of "SlimServer not finding all my files" from
> mostly Windows people (I think).

I think Michel mentioned using a mac.

--

Michael


Dan Sully
2005-05-26, 23:48
* Michael Herger shaped the electrons to say...

>>>Hmm... with the patch it misses exactly those folders/files, without
>>>the patch it's fine. (W2000/SP4, cp1252)
>>
>>And if you uncomment out the from_to lines?

Ok - try this slightly updated patch.

>>We're still getting reports of "SlimServer not finding all my files" from
>>mostly Windows people (I think).
>
>I think Michel mentioned using a mac.

I saw Debian.. but I could be wrong.

-D
--
It does not do to leave a live Dragon out of your calculations..

mherger
2005-05-27, 00:06
> Ok - try this slightly updated patch.

Ok! Got my album "Ohrewörm" ("earworms") with songs like "Schtärneföifi".

>> I think Michel mentioned using a mac.
>
> I saw Debian.. but I could be wrong.

You mean "Debian SID" isn't exactly mac? I guess you're right :-).

--

Michael


Dan Sully
2005-05-27, 00:11
* Michael Herger shaped the electrons to say...

>>Ok - try this slightly updated patch.
>
>Ok! Got my album "Ohrewörm" ("earworms") with songs like "Schtärneföifi".

And does the patch work on your SME box as well? Better or worse than before?

-D
--
<Djall> and I also learned that a meat vortex takes meat away from you.

mcfly
2005-05-27, 00:25
Michael Herger wrote:

> I think Michel mentioned using a mac.

I use MacOSX (with Firefox 1.0.4) at the client side, but my server runs
Debian Sid...

mherger
2005-05-27, 00:34
>>> Ok - try this slightly updated patch.
>>
>> Ok! Got my album "Ohrewörm" ("earworms") with songs like
>> "Schtärneföifi".
>
> And does the patch work on your SME box as well? Better or worse than
> before?

With the stock Perl 5.6.x I see the folders, but not contents. I don't
know any more whether this is better or worse...

With Perl 5.8 the files aren't found any more. This is definitely worse
than before. If you want to test: your account is still there :-)

--

Michael


Dan Sully
2005-05-27, 08:32
* Michael Herger shaped the electrons to say...

>>And does the patch work on your SME box as well? Better or worse than
>>before?
>
>With the stock Perl 5.6.x I see the folders, but not contents. I don't
>know any more whether this is better or worse...
>
>With Perl 5.8 the files aren't found any more. This is definitely worse
>than before. If you want to test: your account is still there :-)

Heh, ok. What is your LANG/LC_CTYPE set to?

-D
--
<dr.pox> what're the units of the coefficient of agnosticity? I don't knows per hour?

Dan Sully
2005-05-27, 15:07
Ok - here's round 3 of this patch. Tested on Linux, Windows (local & samba), OSX (local & samba).

-D
--
<Nigel> Please refrain from fearing the reaper.

mherger
2005-05-28, 01:24
> Heh, ok. What is your LANG/LC_CTYPE set to?

en.US

--
Michael

-----------------------------------------------------------
Help translate SlimServer by using the
SlimString Translation Helper (http://www.herger.net/slim/)

Dan Sully
2005-05-28, 01:27
* Michael Herger shaped the electrons to say...

>> Heh, ok. What is your LANG/LC_CTYPE set to?
>
>en.US

Ok - can you try out my latest patch?

You should probably be set to de_DE.ISO-8859-1 for your locale.

-D
--
There was supposed to be a big kaboom.

mherger
2005-05-28, 01:32
> Ok - can you try out my latest patch?
>
> You should probably be set to de_DE.ISO-8859-1 for your locale.

I'm out of town, but will give it a try tomorrow (sunday) night.

--
Michael


mcfly
2005-05-28, 02:59
Dan Sully wrote:
> Ok - here's round 3 of this patch. Tested on Linux, Windows (local &
> samba), OSX (local & samba).

Your patch is looking pretty good here; both subdirs (UTF-8, ISO-8859-1) in my
UmlautTest directory show up correctly when browsing the music folder. LANG is
set to en_US.utf8 in my slimserver start script. Thanks!

However, two (small) issues remain:

1. If I browse the "UmlautTest" folder for the first time, the "o-umlaut" in the
UTF-8 encoded directory name looks garbled. Reloading the page fixes this. This
can be easily reproduced by copying the existing UmlautTest directory to e.g.
UmlautTest2 and browsing the new dir.

2. When browsing "UmlautTest", I see this in the slimserver log on each page reload:

Malformed UTF-8 character (unexpected end of string) at
/usr/share/perl/5.8/File/Spec/Unix.pm line 25.
Malformed UTF-8 character (unexpected end of string) at
/usr/share/perl/5.8/File/Spec/Unix.pm line 26.
Malformed UTF-8 character (unexpected end of string) at
/usr/share/perl/5.8/File/Spec/Unix.pm line 27.
Malformed UTF-8 character (unexpected end of string) at
/usr/share/perl/5.8/File/Spec/Unix.pm line 28.
Malformed UTF-8 character (unexpected end of string) at
/usr/share/perl/5.8/File/Spec/Unix.pm line 29.

Dan Sully
2005-05-28, 10:32
* mma shaped the electrons to say...

>Your patch is looking pretty good here; both subdirs (UTF-8, ISO-8859-1) in
>my UmlautTest directory show up correctly when browsing the music folder.
>LANG is set to en_US.utf8 in my slimserver start script. Thanks!
>
>However, two (small) issues remain:
>
>1. If I browse the "UmlautTest" folder for the first time, the "o-umlaut"
>in the UTF-8 encoded directory name looks garbled. Reloading the page fixes
>this. This can be easily reproduced by copying the existing UmlautTest
>directory to e.g. UmlautTest2 and browsing the new dir.

Hmm.. I'm not seeing that here. What browser are you using?

>2. When browsing "UmlautTest", I see this in the slimserver log on each
>page reload:

This is because SlimServer doesn't know how to deal with the iso-8859-1
encoded directory when your LC_CTYPE is UTF-8. There's no metadata or anything
else that I can use to distinguish it, as UTF-8 encompasses char 1-255.
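
For what it's worth, a common fallback in this situation is to keep names that are well-formed UTF-8 and treat everything else as ISO-8859-1. This is only a sketch of that heuristic, not what the patch does. It assumes `iconv`, and `display_name` is a made-up helper.

```shell
#!/bin/bash
# display_name (hypothetical helper): return a UTF-8 string for a raw name.
# Well-formed UTF-8 passes through unchanged; anything else is ASSUMED to
# be ISO-8859-1, which is a guess and not a real detection.
display_name() {
    if printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
        printf '%s' "$1"
    else
        printf '%s' "$1" | iconv -f ISO-8859-1 -t UTF-8
    fi
}
```

The guess goes wrong for names in other 8-bit charsets such as cp1252, and for Latin-1 names whose bytes happen to form valid UTF-8, so it cannot replace real metadata about the encoding.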

-D
--
<phone> i am a sausage fan

mherger
2005-05-29, 23:31
Dan,

> Ok - here's round 3 of this patch. Tested on Linux, Windows (local &
> samba), OSX (local & samba).

This is still fine with Perl 5.8, but there are the same problems with 5.6 as
before: folders are shown correctly, but not their content. When I change LOCALE
from en_US to en_US.utf8, even folder names are no longer displayed correctly.
Umlauts are replaced by a comma (e.g. "Bj,rk" instead of "Björk").

--

Michael
