Home of the Squeezebox™ & Transporter® network music players.
  1. #11
    Senior Member
    Join Date
    Aug 2012
    Location
    Austria
    Posts
    1,041
    Quote Originally Posted by doggod View Post
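    natürlich kenne ich das deutsche und sollte mich wirklich daran erinnern, dass auch verwendet wird. Eigentlich einmal Deutsch gelernt ... das war erst vor > 30 Jahren. Vielleicht erklärt das die Sache ...
    [In English: Of course I know German and really should remember that it is also used. I actually learned German once ... that was only > 30 years ago. Maybe that explains it ...]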
    Ok I admit, used google translate for that one, most likely very poor german?
    Actually, this isn't bad at all...

    One thing though, so if i understand this right, it (this encoding "issue") has nothing to do with the Trackstat plug-in?
    But oh shit, just come to think about. You wrote "(in UTF-8, generally used by modern file systems) " ...my main library from windows naturally is on a NTFS disk BUT my test disk (both are external USB disks) is ExFat! Could this be the reason? Should I make me a "testlibrary" on a NTFS disk instead? Would that work under linux? For now I use Dietpi which is based on Debian.
    I glossed over some aspects in my explanation to keep it simple, but I guess we'll need more details.
    The file system (FS) is only part of it, there's also the OS (which will have a system-wide default encoding) and the application (Perl / LMS / Trackstat in this case).
    So even if the FS could in theory use Unicode (both NTFS and ExFAT can), the OS can still encode file names differently (Windows-1252 in the case of legacy Microsoft Windows), which determines how they are saved in the FS. If such an FS is mounted on a system using a different encoding, the OS (or more precisely the FS driver) can either convert the names (in Linux, via the codepage and nls mount options; that's why you can use an NTFS drive with Windows-1252 file names even though Linux uses UTF-8 as system encoding) or fail gracelessly (the Windows approach).
    Trackstat on Windows will therefore get file names as Windows-1252 encoded strings, and this is how they are written to the XML files (percent-encoded, but still). A more portable method would have been to convert these names from Windows-1252 to UTF-8 before percent-encoding and writing the XML, which would have made all of this a non-issue.
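To make the difference concrete, here is a minimal Python sketch (not part of Trackstat or LMS) showing how the same name turns into different bytes, and therefore different percent-encoded strings, under the two encodings. The sample name "Öyster" is only an illustration.

```python
from urllib.parse import quote

name = "Öyster"  # illustrative sample only

# The same character becomes different bytes depending on the encoding.
cp1252_bytes = name.encode("cp1252")  # one byte for the umlaut
utf8_bytes = name.encode("utf-8")     # two bytes for the umlaut

# Percent-encoding works on those bytes, one %XX group per byte.
cp1252_url = quote(name, encoding="cp1252")
utf8_url = quote(name, encoding="utf-8")

print(cp1252_bytes, cp1252_url)  # b'\xd6yster' %D6yster
print(utf8_bytes, utf8_url)      # b'\xc3\x96yster' %C3%96yster
```

Converting to UTF-8 before percent-encoding, as suggested above, corresponds to always using the second form.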

    I think I understand the thing with different encodings in general terms. But from your explanation:
    "Blue%2520%25C3%2596yster%2520Cult means c396 (hexadecimal) is Ö (in UTF-8, generally used by modern file systems)
    Blue%2520%25D6yster%2520Cult means d6 (hexadecimal) is Ö (in Windows-1252, used by legacy components of Microsoft Windows)"

    There's parts in your explanation missing that confuses me. You say "c396" but I read "%2520%25C3%2596" from "Blue" to "yster" ...
    I can see there's a "C3" but then there's "%25" before the "96" part, how does one know that one should read it as "c396"?
    I'm trying hard to see a pattern but sadly can't :-/ Does "%2520" mean "space"? What does "%25" before "C3" and "D6" mean?
    I assume you have read and understood percent-encoding (see #3).
    In the case of the Trackstat XML file, there's another twist: For no apparent reason, Trackstat encodes the file name twice (but the folder name only once).
    So decoding %2520%25C3%2596 once gives %20%C3%96 (since %25 is a literal %), and decoding it a second time gives " Ö", since %20 is a space and %C3%96 is Ö. (Percent-encoding uses one %XX group per byte, but a UTF-8 character can span multiple bytes, so the two bytes C3 and 96 are combined into c396.)
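The two decoding passes can be reproduced with Python's standard library. This is only a sketch to illustrate the explanation above, using the two example strings from this thread; the variable names are made up.

```python
from urllib.parse import unquote

linux_entry = "Blue%2520%25C3%2596yster%2520Cult"  # written on a UTF-8 system
windows_entry = "Blue%2520%25D6yster%2520Cult"     # written on a Windows-1252 system

# First pass: %25 turns back into a literal %, nothing else changes.
linux_once = unquote(linux_entry)      # Blue%20%C3%96yster%20Cult
windows_once = unquote(windows_entry)  # Blue%20%D6yster%20Cult

# Second pass: the %XX groups become bytes, which are then decoded as text.
linux_twice = unquote(linux_once)                       # bytes read as UTF-8 (the default)
windows_twice = unquote(windows_once, encoding="cp1252")  # bytes read as Windows-1252

# Reading the Windows-1252 byte as UTF-8 fails: D6 alone is not valid UTF-8,
# so Python substitutes the replacement character U+FFFD.
windows_wrong = unquote(windows_once)

print(linux_twice)
print(windows_twice)
print(windows_wrong)
```

The last line is essentially what happens when the Windows export is read on a UTF-8 system: the byte d6 on its own does not form a valid character.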

    Is there any converter tool that can be used on a *.xml text file with Windows-1252 encoded file URL's that convert it to UTF-8?
    Not that I know of.

    Hmm, maybe I'm asking the wrong question? I guess what I first and foremost would like to understand is exactly what makes (if we stay with the example) Ö appear as %25D6 in the XML backup file created by Trackstat on Windows, and as %25C3%2596 on the Linux install?
    Is it only because of two different OSs?
    OSes that use different default system encodings, and an application (Trackstat) that doesn't use a cross-platform file format.

    Can any setting make a change to this?
    Not that I know of.

    Or is it just as simple as there's no way around it other than the long and hard way of searching and editing the *.xml backup file? Also, can it mean that basically there could be more or other "characters/letters" that are not interpreted correctly by the LMS/Trackstat Linux install?
    Probably any letter not in the English alphabet (umlauts, accents, etc.)

    If so I think I give up my linux project.
    Well, umlauts are only 6 search/replace operations. It'll depend if you also have accents, nordic characters etc.
    Also note that this is not really about Linux, it will affect you when you migrate away from legacy Windows (possibly even if the destination is a newer Windows, see here - not sure about that)
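As a rough sketch of what those search/replace pairs would look like, the following Python snippet generates them. It assumes the six umlauts meant are ä, ö, ü, Ä, Ö, Ü and that both encoding passes encode every non-ASCII byte; the helper name double_encode is made up for this example.

```python
from urllib.parse import quote

def double_encode(text, encoding):
    """Percent-encode twice, as the Trackstat file names appear to be."""
    once = quote(text, safe="", encoding=encoding)
    return quote(once, safe="")

# Assumption: these are the six umlauts in question.
umlauts = "äöüÄÖÜ"

# Map the Windows-1252 form (search) to the UTF-8 form (replace).
replacements = {
    double_encode(ch, "cp1252"): double_encode(ch, "utf-8") for ch in umlauts
}

for old, new in replacements.items():
    print(old, "->", new)
```

Accented or Nordic characters would simply add more entries to the same table.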

    Personally, I simply avoid any non-English characters in file names for files which may end up somewhere else (e.g. car audio systems etc.), although nowadays UTF-8 is mostly a safe bet.
    Various SW: Web Interface | Playlist Editor / Generator | Music Classification | Similar Music | Announce | EventTrigger | LMSlib2go | ...
    Various HowTos: build a self-contained LMS | Bluetooth/ALSA | Control LMS with any device | ...

  2. #12
    FANTASTIC!, thank you so much Ronald0 for your time giving such a lengthy and wonderful explanation! One simply has to love this forum and the knowledge gathered here!

    Your explanation is kind of what I had figured out from reading and googling the hints given in the earlier posts, but you put all my thoughts into an understandable and wonderful text!

    Reading and learning naturally raises more questions; for me this is natural human learning behaviour. Particularly your last bit gives me an idea, or shall we say an interesting question.

    Well, umlauts are only 6 search/replace operations. It'll depend if you also have accents, nordic characters etc.
    Also note that this is not really about Linux, it will affect you when you migrate away from legacy Windows (possibly even if the destination is a newer Windows, see here - not sure about that)
    I read your link and honestly don't understand every bit of it, but in general, yes understandable. For me it seems strange that MS when "upgrading" to Win 10 would maybe have? given up on not being able of reading file/directory locations from "win legacy OS"? So my "idea" is, do you think that going through a Win 10 LMS install and from there to Linux would solve the "xml/URL" issue? Idea is if Win 10 LMS can read the "legacy" exported xml and then perhaps a second export would give a xml with UTF-8 URL's?
    (hope you understand and excuse my sad novice thinking).

    Also would like to give a tip to anyone else on the same quest reading this thread about a site I found which is really helpful.
    > https://www.w3schools.com/tags/ref_urlencode.ASP

    And nah, I haven't given up my linux quest. Not yet at least. ;-)
    Last edited by doggod; 2020-07-29 at 03:54. Reason: spelling error

  3. #13
    Quote Originally Posted by doggod View Post
    For me it seems strange that MS when "upgrading" to Win 10 would maybe have? given up on not being able of reading file/directory locations from "win legacy OS"? So my "idea" is, do you think that going through a Win 10 LMS install and from there to Linux would solve the "xml/URL" issue? Idea is if Win 10 LMS can read the "legacy" exported xml and then perhaps a second export would give a xml with UTF-8 URL's?
    First of all, I have no idea if LMS on Windows can use Unicode at all (apparently, an application has to be specifically designed to do so). Also, Unicode on Windows isn't UTF-8, but UTF-16 (although Microsoft now seems to have changed their mind, according to Wikipedia: "As of May 2019, Microsoft seems to have reversed course and now supports and recommends using UTF-8". According to other sources, the UTF-8 support on Windows is currently too buggy to use, though)
    And finally, there is a basic flaw in your approach: If Trackstat runs with UTF-8 encoding (no matter on which OS), it will not be able to read its own data (which is exactly what's happening now). Your new approach is functionally identical to what you are doing now...

    Anyway, writing a conversion script should (hopefully!) only take a couple of minutes. You can either upload your Trackstat file somewhere I can D/L it, or I'll upload the script and you can run it locally (note, however, that with the latter option, I have no way of testing if the conversion actually works)

  4. #14
    Quote Originally Posted by Roland0 View Post
    First of all, I have no idea if LMS on Windows can use Unicode at all (apparently, an application has to be specifically designed to do so). Also, Unicode on Windows isn't UTF-8, but UTF-16 (although Microsoft now seems to have changed their mind, according to Wikipedia: "As of May 2019, Microsoft seems to have reversed course and now supports and recommends using UTF-8". According to other sources, the UTF-8 support on Windows is currently too buggy to use, though)
    And finally, there is a basic flaw in your approach: If Trackstat runs with UTF-8 encoding (no matter on which OS), it will not be able to read its own data (which is exactly what's happening now). Your new approach is functionally identical to what you are doing now...
    I couldn't help myself but actually try my idea ..."the long and winding road". Installed a version of Win 10 pro (19041.264) as a VM and then LMS.
    No problem to import from my older win 7 lms. But of course the export thing didn't work as I was hoping for. Well, "if you don't go you don't know". Stupid idea really but on the other hand one never knows with "computing", sooo many variables. One thing I've found out is that the trackstat URL's in LMS actual database is encoded as win 1252 in Win and UTF-8 in LMS linux. I guess it makes sense and is natural for anyone more clever than me but I just thought mentioning it as info for anyone else reading who's on my level of knowledge.

    Anyway, writing a conversion script should (hopefully!) only take a couple of minutes. You can either upload your Trackstat file somewhere I can D/L it, or I'll upload the script and you can run it locally (note, however, that with the latter option, I have no way of testing if the conversion actually works)
    That sounds fantastic if you want to help like that! Should I send you my smaller "test library" maybe? It's about 2300 files or so, <1MB (main is approx. 40-50MB). From my tests on it I've found at a minimum these characters/letters as "problems"; , , , , , , , , , , . Most likely many more "problems" on my main library. Think I can use "google drive" for an upload? Send you a link in a PM maybe? But you need to give me some instructions how to use it. I know what a "batfile" is on windows and how to use it. Think batfile is a sort of "script"? But on linux I'm completely blindfolded if that's what you've in mind for your script?
    Last edited by doggod; 2020-07-31 at 02:42.

  5. #15
    Quote Originally Posted by doggod View Post
    One thing I've found out is that the trackstat URL's in LMS actual database is encoded as win 1252 in Win and UTF-8 in LMS linux.
    Not only Trackstat's, all URLs (LMS' library will have the same issues if you switch encoding)


    Should I send you my smaller "test library" maybe? It's about 2300 files or so, <1MB (main is approx. 40-50MB).
    Makes no difference to me. Just make sure to compress the file (zstd > level 15, lzma2 (e.g. 7z, xz) will reduce 50MB to ~3MB)

    From my tests on it I've found at a minimum these characters/letters as "problems; , , , , , , , , , , . Most likely many more "problems" on my main library.
    In theory, all characters should be correct after the conversion.

    Think I can use "google drive" for an upload? Send you a link in a PM maybe?
    Any service I can download from without registration is fine.

    But you need to give me some instructions how to use it. I know what a "batfile" is on windows and how to use it. Think batfile is a sort of "script"? But on linux I'm completely blindfolded if that's what you've in mind for your script?
    It's a Python script you run from the CLI. If you upload your complete trackstat file, you won't have to run it locally (although of course you can if you prefer)

  6. #16
    I've sent you a PM with a download link for my "test library". (I chose it 'cause I'll need to do adjustments to my main library before completely migrating to linux LMS, and it's quite a bit of work which I don't have the time for at the moment. And it's constantly changing anyway, so for sure I will have to do it later)

    In theory, all characters should be correct after the conversion.
    Yes that's what I was hoping for, would save so much time and struggle. Really hope it'll work out. :-)

    It's a Python script you run from the CLI. If you upload your complete trackstat file, you won't have to run it locally (although of course you can if you prefer)
    I've been googling a bit about Python scripts and think/hope I'll get it to work. Although a little instruction would be very helpful. How can I get the script later on if it turns out to work as hoped for? Will you publish it here maybe?

  7. #17
    Quote Originally Posted by doggod View Post
    I've been googling a bit about Python scripts and think/hope I'll get it to work. Although a little instruction would be very helpful.
    If you want to run it on Windows, the PDF attached to this post has some hints. On Linux, simply run it from the command prompt (after making sure Python is installed)

  8. #18
    While the script generally seems to work, one issue remains (as discussed in PM).
    There are two encoding passes
    - the first doesn't encode characters it should (according to the specs)
    - the second (which is pointless to begin with) also doesn't encode characters it should (but different ones)

    currently, the script has this:
    Code:
    safe_chars_firstenc = ':/[]@!$&()*+,='
    safe_chars_secondenc = '()!'
    i.e. characters in safe_chars_firstenc will not be encoded by the first pass, characters in safe_chars_secondenc will not be encoded by the second pass.
    Since this is getting tedious, I'll leave it to you to find out if there are any changes necessary to get exactly the output TrackStat expects (simply add/remove characters from these two variables).
    You can get the script from here
    Run
    Code:
    ts_cp1252-to-utf8.py --help
    to see its options
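For readers who just want to see the idea, here is a minimal sketch of how such a conversion could work for a single double-encoded file name. This is not the actual ts_cp1252-to-utf8.py script; the function name convert_name is hypothetical, and only the two safe_chars variables are taken from the post above.

```python
from urllib.parse import quote, unquote

# Values quoted in the post above; adjust them to match what TrackStat expects.
safe_chars_firstenc = ':/[]@!$&()*+,='
safe_chars_secondenc = '()!'

def convert_name(double_encoded):
    """Hypothetical helper: re-encode one double-encoded name from cp1252 to UTF-8."""
    # Undo the second encoding pass (only %25 -> % changes here).
    once = unquote(double_encoded)
    # Undo the first pass, reading the bytes as Windows-1252.
    text = unquote(once, encoding="cp1252")
    # Redo both passes, this time producing UTF-8 bytes.
    first = quote(text, safe=safe_chars_firstenc, encoding="utf-8")
    return quote(first, safe=safe_chars_secondenc)

print(convert_name("Blue%2520%25D6yster%2520Cult"))
```

For the example from earlier in the thread this prints Blue%2520%25C3%2596yster%2520Cult. Folder names, which are encoded only once, would need one decoding and one encoding pass instead of two.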

  9. #19
    Quote Originally Posted by Roland0 View Post
    While the script generally seems to work, one issue remains (as discussed in PM).
    There are two encoding passes
    - the first doesn't encode characters it should (according to the specs)
    - the second (which is pointless to begin with) also doesn't encode characters it should (but different ones)

    currently, the script has this:
    Code:
    safe_chars_firstenc = ':/[]@!$&()*+,='
    safe_chars_secondenc = '()!'
    Since this is getting tedious, I'll leave it to you to find out if there are any changes necessary to get exactly the output TrackStat expects (simply add/remove characters from these two variables)
    Yep agree, very tedious. I'll see what I manage. Thank you so much for all your help and time you spent. Fantastic! :-)

    Hope you eventually can find some time and answer one thing; you say "currently, the script has this",
    I hope/suppose that wasn't the setting used for the last xml you uploaded to me? Wondering since I notice that the ( ) characters appear in both "safe_chars". But as we discussed they actually were encoded in that last file (to %2528 & %2529), but they shouldn't have been according to the line in the example above (if I understood you right?).

  10. #20
    Quote Originally Posted by doggod View Post
    you say "currently, the script has this",
    I hope/suppose that wasn't the setting used for the last xml you uploaded to me?
    Correct

    Wondering since I notice that the ( ) characters appear in both "safe_chars". But as we discussed they actually were encoded in that last file (to %2528 & %2529), but they shouldn't have been according to the line in the example above (if I understood you right?).
    Yes, I've added a number of characters to be excluded from encoding (for both passes). You should notice the changes when you run it yourself.
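The effect of excluding characters can be checked in isolation. The snippet below is a small illustration (again not the actual script, and the sample name "(Live)" is made up) of why parentheses came out as %2528 and %2529 when they were not in the safe lists, and why they stay untouched once they are.

```python
from urllib.parse import quote

def encode_twice(text, safe_first, safe_second):
    """Apply two percent-encoding passes with separate lists of excluded characters."""
    return quote(quote(text, safe=safe_first), safe=safe_second)

sample = "(Live)"  # hypothetical file name fragment

# Parentheses not excluded: ( -> %28 -> %2528 and ) -> %29 -> %2529.
encoded = encode_twice(sample, safe_first="", safe_second="")

# Parentheses excluded in both passes: they are left as they are.
untouched = encode_twice(sample, safe_first="()", safe_second="()")

print(encoded)
print(untouched)
```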
