FTS: Just a quick question



dolodobendan
2018-09-18, 06:37
Trying to understand how FTS works in respect to prioritizing results, I had a look at the library.db.

Do I assume correctly that if w10 has one hit or fewer in a search (only then!), w1 kicks in, providing more results by widening the search?

w10 | results > 1 | showing w10 results only
w10 | results ≤ 1 | showing w10 result + w1 results

dolodobendan
2018-09-18, 06:41
But in w1 it has to be an exact match for the whole string, no wildcards?

mherger
2018-09-18, 08:34
> Trying to understand how FTS works in respect to prioritizing results, I
> had a look at the library.db.

https://github.com/Logitech/slimserver/blob/ce5feac2b2c0b43966c725f38eb86d3988061e9f/Slim/Plugin/FullTextSearch/Plugin.pm#L316

I'd have to read those links in there again to fully understand what I
did :-).

Basically w10 has the highest weight, w1 the lowest. The overall weight is
calculated by adding the different weighted values. E.g. w10 has a weight
of 10'000, w1 a weight of 1. Which means that if a keyword is found in
w10, it's prioritized higher than when it's found in one of the other
columns (there are w5 and w3, too).
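
A minimal sketch of the additive weighting described above, written in Python rather than the plugin's Perl. Only the weights 10'000 for w10 and 1 for w1 come from this post; the values for w5 and w3, the helper name `score`, and the idea that each hit simply contributes its column's weight are assumptions for illustration.

```python
# Hypothetical per-column weights. Only w10 = 10000 and w1 = 1 are stated
# in the thread; the w5 and w3 values are made-up placeholders.
WEIGHTS = {"w10": 10_000, "w5": 100, "w3": 10, "w1": 1}


def score(hits_per_column):
    """Sum the weighted hit counts over all columns (no either/or)."""
    return sum(WEIGHTS[col] * hits for col, hits in hits_per_column.items())


title_hit = score({"w10": 1})        # one hit in the title column
path_hits = score({"w1": 50})        # fifty hits in the lowest column
both = score({"w10": 1, "w1": 3})    # hits in several columns add up

print(title_hit, path_hits, both)    # 10000 50 10003
```

With weights this far apart, a single w10 hit outranks any realistic number of w1 hits, but w1 hits still count whenever they exist.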

What is being stored in those various columns depends on the type of
record. E.g. title/name would always be in w10, w5 would be the year of
the album/track etc. See pretty much at the top of that file. For albums
you'd get the individual tracks' names in w1, allowing you to find an album
by a track title etc.
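
As a rough illustration of that layout, here is a Python sketch of an album record that carries its tracks' titles in w1. The function `album_row`, the dict layout, and the second track title are hypothetical; the real column contents are defined near the top of the linked Plugin.pm.

```python
def album_row(album_title, year, track_titles):
    """Build a hypothetical fulltext row for an album."""
    return {
        "type": "album",
        "w10": album_title,             # title/name always goes in w10
        "w5": str(year),                # year of the album
        "w1": " ".join(track_titles),   # track names, lowest weight
    }


# "Some Other Track" is a placeholder, not real library content.
row = album_row("Like A Prayer", 1989, ["Spanish Eyes", "Some Other Track"])

# A search for a track title can now hit the album through w1.
print("Spanish Eyes" in row["w1"])      # True
```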

> Do I assume correctly that if w10 has one hit or fewer in a search
> (only then!), w1 kicks in, providing more results by widening the
> search?

It's not either/or, but the sum of the individual weights. If there's a
hit on w10, then it's pretty difficult for any of the other columns to
compensate, as w10 gets weighted heavily.

--

Michael

dolodobendan
2018-09-18, 13:56
Thank you for explaining that. I have the feeling that sometimes (maybe when using different ways of search), the w1 results are prioritized. Not in a way that they supersede w10, but in a way that they are there at all.

One search in Squeeze Ctrl for Win 10 gets 2 results for songs, 0 for albums; another one gets 51 for songs and 1 for albums. The first search could have shown more results if w1 was included, but it was not; only w10 was used. The second search showed one w10 hit and 50 w1 hits. The w10 hit was the title, explicitly not the album, in both searches, so I would not expect them to be weighted differently. Yet there are w1 results with search 2.

I'll update my test environment and have a look at it this weekend.

mherger
2018-09-18, 22:03
> One search in Squeeze Ctrl for Win 10 gets 2 results for songs, 0 for
> albums; another one gets 51 for songs and 1 for albums. The first search
> could have shown more results if w1 was included, but it was not; only
> w10 was used. The second search showed one w10 hit and 50 w1 hits. The
> w10 hit was the title, explicitly not the album, in both searches, so I
> would not expect them to be weighted differently. Yet there are w1
> results with search 2.

I don't follow. Maybe you should give us examples with names and actual
content. Setting plugin.fulltext to debug mode can give you some more
insight, too.

--

Michael

dolodobendan
2018-09-19, 04:09
> I don't follow. Maybe you should give us examples with names and actual
> content. Setting plugin.fulltext to debug mode can give you some more
> insight, too.

Can't blame you, working on that!

dolodobendan
2018-09-19, 06:05
Ok, here it goes (partially).

I had a look at library.db/fulltext:


id : 82405
type : track
w10 : Spanish Eyes SPANISH EYES
w5 : 1989 Like A Prayer LIKE A PRAYER Pop POP
w3 : ARTIST:Madonna Artist:Madonna ARTIST:MADONNA Artist:MADONNA flc stereo
w1 : 854608 854kbps 44100 44.1 16 /share/Multimedia/Music/FLACs/Modern/Madonna/Like A Prayer/10 - Madonna - Spanish Eyes.flac

(w10: Is the second, upper case set of entries (SPANISH EYES) used for case insensitive searches?)

So I tried to trigger w1 directly by searching for the first thing in there, 854608.

[Attachment 25671]

That's why I came up with:

> But in w1 it has to be an exact match for the whole string, no wildcards?

[Attachment 25672]

(A search for the sample size only got the song, but not the album containing it as a result. A search for the title gives both. Just wondering why that is; under normal circumstances I do not search titles by their sample size.)

[Attachment 25673]

But nobody would use the complete path in a search. So I tried that also:

/share/Multimedia/Music/FLACs/Modern/Madonna/

As there are no spaces in this path, I'd expect it to trigger only when matching the whole string.

[Attachment 25674]

But it does not. The first entry does not share a path with Madonna:

/share/Multimedia/Music/FLACs/Modern/Die Ärzte/Das Beste von kurz nach früher bis jetze/

(The path was ok before I timed out here. After logging in again it looked like this.) Now some kind of wildcard seems to have been used, in this case even superseding better matches.

[Attachment 25675]

dolodobendan
2018-09-19, 06:46
I was able to resolve yesterday's mystery. Mea culpa. :o

One search was a popular term, the other was not. That's why for the first search only w10 results were shown. Please don't hit me.

mherger
2018-09-19, 08:25
> (w10: Is the second, upper case set of entries (SPANISH EYES) used for
> case insensitive searches?)

We have a "title" and a "titlesearch" attribute in the database. The
latter would be a simplified version of the former. No special
characters, no dots etc. "Björk ft. Me" -> "BJORK FT ME". I put both in
the fulltext index to remain flexible.
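
A small Python sketch of that kind of simplification, reproducing the example above. The exact rules the server applies are not spelled out in this thread, so the steps below (strip accents, drop punctuation, upper-case) and the function name `titlesearch` are assumptions. Unlike the server, this sketch does not transliterate non-Latin scripts.

```python
import re
import unicodedata


def titlesearch(title):
    """Approximate a simplified search form of a title (assumed rules)."""
    # Decompose accented letters, then drop the combining marks (ö -> o).
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace anything that is not a word character or whitespace with a space.
    cleaned = re.sub(r"[^\w\s]", " ", stripped)
    # Collapse runs of whitespace and upper-case the result.
    return " ".join(cleaned.split()).upper()


print(titlesearch("Björk ft. Me"))    # BJORK FT ME
```

Indexing both the original and the simplified form means a query can match either spelling.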

> (A search for the sample size only got the song, but not the album
> containing it as a result. A search for the title gives both. Just

As you can see from the code I've referenced I'm only adding select
attributes of a track to the album index. Sample size obviously isn't
part of it.

> But nobody would use the complete path in a search. So I tried that
> also.
> Code:
> --------------------
> /share/Multimedia/Music/FLACs/Modern/Madonna/
> --------------------
> As there are no spaces in this path, I'd expect it to trigger only when
> matching the whole string.

I guess the tokenizer would split that string on the /. Resulting in
"Madonna" matching the first track's title. And as title is a w10 item
it appears top of the list.
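
To illustrate that guess, a Python sketch of a tokenizer that splits on every non-alphanumeric character. Which tokenizer the fulltext index really uses is not stated here, so `tokenize` is a stand-in with assumed behaviour, not the server's implementation.

```python
import re


def tokenize(text):
    """Split on every character that is not a letter or digit (assumed)."""
    return [t.lower() for t in re.findall(r"[^\W_]+", text)]


query = "/share/Multimedia/Music/FLACs/Modern/Madonna/"
tokens = tokenize(query)
print(tokens)
# ['share', 'multimedia', 'music', 'flacs', 'modern', 'madonna']

# Each token is searched on its own, so a title that merely contains
# "Madonna" can match even though it shares no path with the query.
print("madonna" in tokens)    # True
```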

--

Michael

dolodobendan
2018-09-19, 09:25
> > (w10: Is the second, upper case set of entries (SPANISH EYES) used for
> > case insensitive searches?)
>
> We have a "title" and a "titlesearch" attribute in the database. The
> latter would be a simplified version of the former. No special
> characters, no dots etc. "Björk ft. Me" -> "BJORK FT ME". I put both in
> the fulltext index to remain flexible.

I asked because I was wondering whether this might have to do with some searches being case sensitive.



w3: ARTIST:Земфира Artist:Земфира ARTIST:ZIeMFIRA Artist:ZIeMFIRA flc stereo

w3: ARTIST:Madonna Artist:Madonna ARTIST:MADONNA Artist:MADONNA flc stereo

w3: ARTIST:Die Ärzte Artist:Die Ärzte ARTIST:DIE ARZTE Artist:DIE ARZTE flc stereo


MaDoNnA is found, whereas земфира or ärzte is not. (There should be Cyrillic characters and umlauts there.)



> As you can see from the code I've referenced I'm only adding select
> attributes of a track to the album index. Sample size obviously isn't
> part of it.


I'm pretty lousy at reading source code, but I should've seen that.




> I guess the tokenizer would split that string on the /. Resulting in
> "Madonna" matching the first track's title. And as title is a w10 item
> it appears top of the list.

Can't thank you enough for your explanations!

mherger
2018-09-19, 23:56
Oh, non-latin characters are case sensitive. That kind of rings a bell... I think I looked into this before, but stopped for some reason. Will re-look :-)
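
One possible explanation, not confirmed in this thread, is a tokenizer that folds case only for ASCII letters (SQLite's FTS `simple` tokenizer is documented to behave that way). The Python sketch below mimics such ASCII-only folding; `ascii_fold` and `matches` are illustrative helpers, not server code.

```python
def ascii_fold(text):
    """Lower-case only A-Z, leaving every other character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def matches(query, indexed):
    """Compare two strings after ASCII-only case folding."""
    return ascii_fold(query) == ascii_fold(indexed)


print(matches("MaDoNnA", "Madonna"))    # True: Latin letters fold alike
print(matches("земфира", "Земфира"))    # False: З and з stay distinct
print(matches("ärzte", "Ärzte"))        # False: Ä and ä stay distinct
```

A Unicode-aware fold such as Python's `str.casefold()` would treat all three pairs as equal.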

mherger
2018-09-20, 01:10
Another update is on its way which should fix Die Ärzte. Don't know
about Cyrillic... (and it fixes another issue which was actually
breaking search altogether under some circumstances).

--

Michael

dolodobendan
2018-09-20, 05:47
> I think I looked into this before, but stopped for some reason. Will re-look :-)

I probably bugged you with questions at the time.


> Another update is on its way which should fix Die Ärzte. Don't know
> about Cyrillic... (and it fixes another issue which was actually
> breaking search altogether under some circumstances).

That's great, thank you so much!

dolodobendan
2018-09-20, 05:57
Now neither Ärzte nor ärzte can be found. Same for Cyrillic searches. Do I need a rescan?

mherger
2018-09-20, 06:30
> Now neither Ärzte nor ärzte can be found. Same for Cyrillic searches. Do
> I need a rescan?

Oh... crap... yes, you might need a rescan.

but you'd find "arzte"?

--

Michael

dolodobendan
2018-09-20, 08:29
> Oh... crap... yes, you might need a rescan.


Then I'll do so and get back to you. ;)



> but you'd find "arzte"?


Yes, but I think that worked before, too.

dolodobendan
2018-09-21, 05:07
It works now as expected, even with Cyrillic letters! Thank you!