Home of the Squeezebox™ & Transporter® network music players.
  1. #41
    Senior Member
    Join Date
    May 2010
    Location
    London, UK
    Posts
    923
    Quote Originally Posted by KaPa View Post
    Main conclusion:
    - The names of the subdirectories have a big impact on the time it takes to scan my library when they contain Greek characters. I suspect that the problem is worse when characters with a diaeresis (umlaut) are used. I should underline that the final results of the scan, although delayed, are OK.
    - On the other hand, I have many directories with Greek names that scan very fast without any problem.

    So the problem has to be analyzed further by someone who understands the technical details of the log file.
    That has been a very productive piece of detective work.

    After some fiddling around, I was able to reproduce your issue, using the files you provided. I had to make sure that I had both albums, and that they were stored in the same sub-folder, suitably named with Greek characters.

    Code:
    /media/Music/podcast/Τα ζειμπεκικα που εγραψαν ιστορια/Τα Ζεϊμπεκικα που εγραψαν ιστορια Vol.4
    /media/Music/podcast/Τα ζειμπεκικα που εγραψαν ιστορια/Τα Ζεϊμπεκικα που εγραψαν ιστορια Vol.3
    The scan "hung" for about 100 minutes when it reached the second folder, with a log trace matching the trace which you noted in your own scanner log.
    Code:
    [21-10-22 17:39:59.5134] Slim::Schema::_createOrUpdateAlbum (1130) -- Searching for an album with: (
      [
        "albums.title = ?",
        "albums.disc IS NULL",
        "albums.discc IS NULL",
        "tracks.url LIKE ?",
      ],
      [
        "\x{3A4}\x{3B1} \x{3B6}\x{3B5}\x{3CA}\x{3BC}\x{3C0}\x{3AD}\x{3BA}\x{3B9}\x{3BA}\x{3B1} \x{3C0}\x{3BF}\x{3C5} \x{3AD}\x{3B3}\x{3C1}\x{3B1}\x{3C8}\x{3B1}\x{3BD} \x{3B9}\x{3C3}\x{3C4}\x{3BF}\x{3C1}\x{3AF}\x{3B1}",
        "file:///media/Music/podcast/%CE%A4%CE%B1%20%CE%B6%CE%B5%CE%B9%CE%BC%CF%80%CE%B5%CE%BA%CE%B9%CE%BA%CE%B1%20%CF%80%CE%BF%CF%85%20%CE%B5%CE%B3%CF%81%CE%B1%CF%88%CE%B1%CE%BD%20%CE%B9%CF%83%CF%84%CE%BF%CF%81%CE%B9%CE%B1/%CE%A4%CE%B1%20%CE%96%CE%B5%CF%8A%CE%BC%CF%80%CE%B5%CE%BA%CE%B9%CE%BA%CE%B1%20%CF%80%CE%BF%CF%85%20%CE%B5%CE%B3%CF%81%CE%B1%CF%88%CE%B1%CE%BD%20%CE%B9%CF%83%CF%84%CE%BF%CF%81%CE%B9%CE%B1%20Vol.4%",
      ],
    )
    [21-10-22 19:19:25.3203] Slim::Schema::_createOrUpdateAlbum (1294) -- Creating album 'Τα ζεϊμπέκικα που έγραψαν ιστορία' with columns:
    Setting the logging level of (database.info) - Metadata & Parsing Logging is enough to obtain the log trace. I had to ensure that the scan had made reasonable progress at this point (i.e. a reasonable number of tracks were already in the DB). Scanning with "Look for new and changed media" did the trick, as my DB is already full at that point.

    The problem lies in the tracks.url LIKE element of the "search for album" specification. A SQL LIKE statement uses the character '%' as a "wild card", and file URLs can be chock-full of '%' characters. I think that, in this case, the search space became too wide to achieve a swift result.

    I also think it can be easily fixed, by "escaping" the '%' characters in the file URL before undertaking the search. I have tested a quick modification to the LMS code which does this, with a thoroughly effective result, and will propose an appropriate change to LMS after due testing.
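The effect described above can be reproduced outside LMS. The following is a minimal sketch in Python with the standard sqlite3 module, not the LMS Perl code: the table layout, the sample URLs and the helper names `escape_like` and `count_matches` are all assumptions made for illustration. It shows an unescaped URL prefix matching a row it should not, and the same prefix matching only the intended row once the wildcards are escaped.

```python
import sqlite3

def escape_like(text, esc="^"):
    """Escape the escape character itself, then the LIKE wildcards '%' and '_'."""
    # The escape character goes first so the escapes added later are not doubled.
    for ch in (esc, "%", "_"):
        text = text.replace(ch, esc + ch)
    return text

def count_matches(conn, pattern, escaped):
    """Count rows whose url matches the LIKE pattern, optionally with ESCAPE '^'."""
    sql = "SELECT COUNT(*) FROM tracks WHERE url LIKE ?"
    if escaped:
        sql += " ESCAPE '^'"
    return conn.execute(sql, (pattern,)).fetchone()[0]

# Hypothetical data: one percent-encoded URL and one look-alike with no '%' at all.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (url TEXT)")
conn.executemany(
    "INSERT INTO tracks VALUES (?)",
    [
        ("file:///music/%CE%A4%CE%B1/01.mp3",),
        ("file:///music/XCEXA4XCEXB1/01.mp3",),
    ],
)

basename = "file:///music/%CE%A4%CE%B1/"

# Unescaped: every '%' in the URL is a wildcard, so both rows match.
unescaped_hits = count_matches(conn, basename + "%", escaped=False)

# Escaped: only the trailing '%' is a wildcard, so only the real row matches.
escaped_hits = count_matches(conn, escape_like(basename) + "%", escaped=True)

print(unescaped_hits, escaped_hits)
```

Besides returning wrong candidates, a pattern that begins with or is riddled with wildcards gives the database much more matching work per row, which is consistent with the long "hang" seen in the scan.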

    The relevant piece of code has been around in LMS for at least 10 to 12 years, probably more. I guess no one has knowingly suffered this problem before you hit LMS with it, perhaps because most systems will be using predominantly ASCII based file/folder names. You noted that changing part of the folder structure to contain more ASCII characters solved the immediate problem, and I guess that is because the removal of a significant number of '%' characters from the file URL being searched for sufficiently closed down the search space. But a fix to LMS is a much better solution, and I think I have that.
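To get a feel for how many accidental wildcards a Greek folder name contributes compared with an ASCII one, here is a small Python sketch using the standard `urllib.parse.quote`. The helper name `wildcard_count` and the ASCII spelling of the folder are assumptions for illustration; LMS builds its file URLs in Perl, and this only mimics the percent-encoding.

```python
from urllib.parse import quote

def wildcard_count(path):
    """Number of '%' characters in the percent-encoded form of a path."""
    # quote() leaves '/' alone by default and encodes non-ASCII text as UTF-8 bytes.
    return quote(path).count("%")

greek_path = "/media/Music/podcast/Τα ζειμπεκικα"
ascii_path = "/media/Music/podcast/Ta zeimpekika"  # hypothetical renamed folder

# Each Greek letter is two UTF-8 bytes, hence two '%XX' escapes; a space is one.
print(wildcard_count(greek_path))   # 12 letters * 2 + 1 space = 25
print(wildcard_count(ascii_path))   # only the space is encoded = 1
```

In an unescaped LIKE pattern each of those '%' characters acts as a wildcard, so renaming folders to ASCII reduces them to a handful, which fits the observation that the rename worked around the problem.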

    Quote Originally Posted by KaPa View Post
    Other conclusions:
    Greek characters appear in three different encodings in the log files, which makes it difficult to identify problems:
    a) ANSI (?) in the "discovering files/directories" scanning process
    b) URL % encoding for file names and directories
    c) normal Greek in tags
    I am trying out some LMS changes to solve (a). If successful I shall propose an appropriate change. I'm still not sure about how best to approach (b). We can be thankful for (c) !

  2. #42
    Junior Member
    Join Date
    Oct 2021
    Location
    Athens, Greece
    Posts
    16
    Quote Originally Posted by mrw View Post
    That has been a very productive piece of detective work.
    .....

    I also think it can be easily fixed, by "escaping" the '%' characters in the file URL before undertaking the search. I have tested a quick modification to the LMS code which does this, with a thoroughly effective result, and will propose an appropriate change to LMS after due testing.

    The relevant piece of code has been around in LMS for at least 10 to 12 years, probably more. I guess no one has knowingly suffered this problem before you hit LMS with it, perhaps because most systems will be using predominantly ASCII based file/folder names. You noted that changing part of the folder structure to contain more ASCII characters solved the immediate problem, and I guess that is because the removal of a significant number of '%' characters from the file URL being searched for sufficiently closed down the search space. But a fix to LMS is a much better solution, and I think I have that.
    .....

    I am trying out some LMS changes to solve (a). If successful I shall propose an appropriate change. I'm still not sure about how best to approach (b). We can be thankful for (c) !
    Thanks very much mrw.
    It's such a relief to see that the many hours spent tracking down the problem were not wasted, and that my findings will help improve the platform.

    Please let me also ask the following naive question:
    I noticed that in some parts of the log file the Greek characters of the file names are substituted by "corresponding" Latin characters. For example, in the scanner.log file I uploaded (line numbers from notepad++):
    Code:
    line 1068 [21-10-21 17:32:22.3574] Slim::Schema::_newTrack (1574)   titlesearch : 09 EIMAETOS KhORIS PhTERA   <------ Latin characters
    ................
    line 1077 [21-10-21 17:32:22.3595] Slim::Schema::_newTrack (1574)   titlesort : 09 ΕΙΜ΄ΑΗΤΟΣ ΧΩΡΙΣ ΦΤΕΡΑ <------- Greek characters
    ....................
    
    line 1093 [21-10-21 17:32:22.3646] Slim::Schema::_createOrUpdateAlbum (1297) --- titlesearch : TA ZEIMPEKIKA POU EGRAPsAN ISTORIA  <------ Latin characters
    ............................
    line 1096 [21-10-21 17:32:22.3654] Slim::Schema::_createOrUpdateAlbum (1297) --- titlesort : ΤΑ ΖΕΪΜΠΈΚΙΚΑ ΠΟΥ ΈΓΡΑΨΑΝ ΙΣΤΟΡΊΑ  <---------Greek characters
    What is doing this character substitution? Is this part of the LMS code, e.g. defining variables (like "titlesearch") to help with searching the relevant fields?
    Last edited by KaPa; 2021-10-23 at 13:41.

  3. #43
    Babelfish's Best Boy mherger
    Join Date
    Apr 2005
    Location
    Switzerland
    Posts
    20,626

    Problem scanning music files with nonenglish file names

    > Who is doing this character substitution? Is this a part of the LMS code
    > e.g define variables (like "titlesearch") to help searching the relevant
    > fields?


    Yes, there's a method to "Turn a UTF-8 string into it's US-ASCII
    equivalent." [sic!]

    https://github.com/Logitech/slimserv...e.pm#L423-L435

  4. #44
    Babelfish's Best Boy mherger
    Join Date
    Apr 2005
    Location
    Switzerland
    Posts
    20,626

    > The scan "hung" for about 100 minutes when it reached the second folder,
    > with a log trace matching the trace which you noted in your own scanner
    > log.
    >

    ....
    > The problem lies in the -tracks.url LIKE- element of the "search for
    > album" specification. A SQL LIKE statement uses the character '%' as a
    > "wild card", and file URLs can be chock-full of '%' characters. I think
    > that, in this case, the search space became too wide to achieve a swift
    > result.


    Oh wow... now that's an excellent catch!

  5. #45
    Babelfish's Best Boy mherger
    Join Date
    Apr 2005
    Location
    Switzerland
    Posts
    20,626
    mrw - would the following patch help with this issue?

    Code:
    diff --git a/Slim/Schema.pm b/Slim/Schema.pm
    index 864a03915..f6c55c210 100644
    --- a/Slim/Schema.pm
    +++ b/Slim/Schema.pm
    @@ -1113,8 +1113,8 @@ sub _createOrUpdateAlbum {
     				# as a last resort if both DISC and DISCC are unknown.
     				(!$checkDisc && !defined $disc && !defined $discc && !$extId)
     			) {
    -				push @{$search}, 'tracks.url LIKE ?';
    -				push @{$values}, "$basename%";
    +				push @{$search}, 'tracks.url LIKE ? ESCAPE "\"';
    +				push @{$values}, sqlEscapeUrl($basename) . '%';
     				$join = 1;
     			}
     
    @@ -1346,6 +1346,12 @@ sub _createOrUpdateAlbum {
     	return $albumHash->{id};
     }
     
    +sub sqlEscapeUrl {
    +	my $url = shift;
    +	$url =~ s/\%/\\%/g;
    +	return $url;
    +}
    +
     # Years have their own lookup table.
     sub _createYear {
     	my ($self, $year) = @_;
    Michael

    "It doesn't work - what shall I do?" - "Please check your server.log and/or scanner.log file!"
    (LMS: Settings/Information)

  6. #46
    Senior Member
    Join Date
    May 2010
    Location
    London, UK
    Posts
    923
    Quote Originally Posted by mherger View Post
    mrw - would the following patch help with this issue?
    One should escape, as well, the chosen escape character and the other SQL wild card ( '_' from memory).

    Here's the slightly tidied up proof of concept that I tried out earlier for comparison.

    I think that '^' was not the best escape character to have chosen. Ideally it would be one that rarely appears in a URL and doesn't have the special semantics in a [...] character class that '^' has. '\' or '/' definitely don't work for me - too many walking toothpicks.

    I think the regex works correctly, but may not be optimal for speed?
    Having your subroutine makes the reason for doing it clearer, although I think it is only needed once in this particular script file.

    Code:
    --- Schema.pm.original	2009-08-03 16:14:33.000000000 +0100
    +++ Schema.pm.new	2021-10-24 13:51:51.111825750 +0100
    @@ -1114,8 +1114,12 @@
     				# as a last resort if both DISC and DISCC are unknown.
     				(!$checkDisc && !defined $disc && !defined $discc && !$extId)
     			) {
    -				push @{$search}, 'tracks.url LIKE ?';
    -				push @{$values}, "$basename%";
    +				push @{$search}, "tracks.url LIKE ? ESCAPE '^'";
    +				# Prepare SQL escaped copy of $basename
    +				# Escape our esc char '^' and the SQL wildcards '_' & '%'
    +				my $escaped_bname = $basename;
    +				$escaped_bname =~ s/([%_^])/^$1/g; 
    +				push @{$values}, "$escaped_bname%";
     				$join = 1;
     			}
    There are a number of other places in LMS that may also be impacted, not necessarily all URLs. Any 'LIKE' may want escaping. But I haven't reviewed them yet. grep says about 75 'LIKES' to review. And I think it should be done.
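The point about escaping '_' and the escape character as well as '%' can be checked with a short sketch. This is Python with the standard sqlite3 module, not LMS code: `escape_percent_only`, `escape_all`, `matches` and the sample URLs are hypothetical names and data, and '^' is assumed as the escape character, as in the proof of concept above.

```python
import sqlite3

ESC = "^"  # assumed escape character, matching ESCAPE '^' in the query below

def escape_percent_only(text):
    """Escape only '%', leaving '_' and the escape character untouched."""
    return text.replace("%", ESC + "%")

def escape_all(text):
    """Escape the escape character first, then both LIKE wildcards."""
    for ch in (ESC, "%", "_"):
        text = text.replace(ch, ESC + ch)
    return text

def matches(pattern, value):
    """Evaluate: value LIKE pattern ESCAPE '^' in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    row = conn.execute("SELECT ? LIKE ? ESCAPE '^'", (value, pattern)).fetchone()
    conn.close()
    return bool(row[0])

prefix = "file:///music/My_Album/"
wanted = "file:///music/My_Album/01.mp3"
unwanted = "file:///music/MyXAlbum/01.mp3"

# With only '%' escaped, '_' still matches any single character.
loose = matches(escape_percent_only(prefix) + "%", unwanted)

# With everything escaped, only the literal underscore matches.
strict_unwanted = matches(escape_all(prefix) + "%", unwanted)
strict_wanted = matches(escape_all(prefix) + "%", wanted)

print(loose, strict_unwanted, strict_wanted)
```

A single helper along these lines, applied wherever a user-derived string is interpolated into a LIKE pattern, would keep the escaping rules in one place during the review of the remaining LIKE uses.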

  7. #47
    Babelfish's Best Boy mherger
    Join Date
    Apr 2005
    Location
    Switzerland
    Posts
    20,626

    > One should escape, as well, the chosen escape character and the other
    > SQL wild card ( '_' from memory).


    Right.

    > I think that '^' was not the best escape character to have chosen,


    ^ can be confusing in a regex, too...

    > I think the regex works correctly, but may not be optimal for speed ?


    It would hopefully still gain some time by speeding up the SQL request.
    Plus it's not called often, is it?

    > Having your subroutine makes the reason for doing it clearer. Although I
    > think only needed once in this particular script file.


    I believe this line should use it, too:

    https://github.com/Logitech/slimserv...r/Local.pm#L65
