Patch for font rendering code [Slim\Display\Lib\Fonts.pm]



SWB
2008-01-11, 16:11
Hello all,

I'm new to all of this ... this is my first contribution to any open-source project, and everything I know about the SlimServer/SqueezeCenter code and Perl, I figured out in the past two days. So I'm probably doing this completely wrong, but here goes, anyway. ;-) (Feel free to set me straight!)

This goes in Slim\Display\Lib\Fonts.pm, after line 252 in the 6.5.4 code, which is inside sub string, immediately before the declaration for my $unpackTemplate. I looked at a recent nightly build, and it looks like this will also work in 7.x without changes, although I haven't tried it.



# Our bitmap fonts are actually cp1252 (Windows-Latin1), NOT iso-8859-1.
# The cp1252 encoding has 27 printable characters in the range [\x80-\x9F] .
# In iso-8859-1, this range is occupied entirely by non-printing control codes.
# The Unicode codepoints for the characters in this range are > 255, so instead
# of displaying these characters with our bitmapped font, the code in this
# sub will normally either replace them with characters from a TTF font
# (if present) or transliterate them into the range [\x00-\x7F] .
#
# To prevent this (and allow our full bitmap font to be used whenever
# possible), the following remaps the affected Unicode codepoints to their
# locations in cp1252.
$string =~ s/\x{0152}/\x8C/g; # LATIN CAPITAL LIGATURE OE
$string =~ s/\x{0153}/\x9C/g; # LATIN SMALL LIGATURE OE
$string =~ s/\x{0160}/\x8A/g; # LATIN CAPITAL LETTER S WITH CARON
$string =~ s/\x{0161}/\x9A/g; # LATIN SMALL LETTER S WITH CARON
$string =~ s/\x{0178}/\x9F/g; # LATIN CAPITAL LETTER Y WITH DIAERESIS
$string =~ s/\x{017D}/\x8E/g; # LATIN CAPITAL LETTER Z WITH CARON
$string =~ s/\x{017E}/\x9E/g; # LATIN SMALL LETTER Z WITH CARON
$string =~ s/\x{0192}/\x83/g; # LATIN SMALL LETTER F WITH HOOK
$string =~ s/\x{02C6}/\x88/g; # MODIFIER LETTER CIRCUMFLEX ACCENT
$string =~ s/\x{02DC}/\x98/g; # SMALL TILDE
$string =~ s/\x{2013}/\x96/g; # EN DASH
$string =~ s/\x{2014}/\x97/g; # EM DASH
$string =~ s/\x{2018}/\x91/g; # LEFT SINGLE QUOTATION MARK
$string =~ s/\x{2019}/\x92/g; # RIGHT SINGLE QUOTATION MARK
$string =~ s/\x{201A}/\x82/g; # SINGLE LOW-9 QUOTATION MARK
$string =~ s/\x{201C}/\x93/g; # LEFT DOUBLE QUOTATION MARK
$string =~ s/\x{201D}/\x94/g; # RIGHT DOUBLE QUOTATION MARK
$string =~ s/\x{201E}/\x84/g; # DOUBLE LOW-9 QUOTATION MARK
$string =~ s/\x{2020}/\x86/g; # DAGGER
$string =~ s/\x{2021}/\x87/g; # DOUBLE DAGGER
$string =~ s/\x{2022}/\x95/g; # BULLET
$string =~ s/\x{2026}/\x85/g; # HORIZONTAL ELLIPSIS
$string =~ s/\x{2030}/\x89/g; # PER MILLE SIGN
$string =~ s/\x{2039}/\x8B/g; # SINGLE LEFT-POINTING ANGLE QUOTATION MARK
$string =~ s/\x{203A}/\x9B/g; # SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
$string =~ s/\x{20AC}/\x80/g; # EURO SIGN
$string =~ s/\x{2122}/\x99/g; # TRADE MARK SIGN

andyg
2008-01-11, 16:54
Hi, thanks for your patch. Can you explain what bug this fixes?

SWB
2008-01-11, 17:14
It's in the comments in the code, but I'll try to explain it a different way.

The bitmap fonts actually use the cp1252 (Windows-Latin1) encoding, not iso-8859-1. This is a good thing, because cp1252 is essentially a superset of iso-8859-1: it contains all the printable characters of iso-8859-1, plus 27 additional printable characters in the range [\x80-\x9F], where iso-8859-1 contains only non-printable control codes. Some of the useful characters in this range (to me, anyway) include typographer's quotes, en and em dashes, the trademark symbol, etc.
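A small Perl sketch can check the count above. It assumes a hypothetical hash, `%CP1252_EXTRA`, holding the same 27 pairs as the patch (Unicode codepoint => cp1252 byte), and lists which byte positions in 0x80-0x9F those characters occupy and which stay unassigned.

```perl
use strict;
use warnings;

# Hypothetical table: Unicode codepoint => cp1252 byte, for the 27 printable
# characters cp1252 places in 0x80-0x9F (same pairs as the patch).
my %CP1252_EXTRA = (
    0x0152 => 0x8C, 0x0153 => 0x9C, 0x0160 => 0x8A, 0x0161 => 0x9A,
    0x0178 => 0x9F, 0x017D => 0x8E, 0x017E => 0x9E, 0x0192 => 0x83,
    0x02C6 => 0x88, 0x02DC => 0x98, 0x2013 => 0x96, 0x2014 => 0x97,
    0x2018 => 0x91, 0x2019 => 0x92, 0x201A => 0x82, 0x201C => 0x93,
    0x201D => 0x94, 0x201E => 0x84, 0x2020 => 0x86, 0x2021 => 0x87,
    0x2022 => 0x95, 0x2026 => 0x85, 0x2030 => 0x89, 0x2039 => 0x8B,
    0x203A => 0x9B, 0x20AC => 0x80, 0x2122 => 0x99,
);

# Invert the table: cp1252 byte => Unicode codepoint.
my %byte_to_cp = reverse %CP1252_EXTRA;

# Split 0x80-0x9F into assigned (printable) and unassigned byte positions.
my @printable = grep {  exists $byte_to_cp{$_} } 0x80 .. 0x9F;
my @unused    = grep { !exists $byte_to_cp{$_} } 0x80 .. 0x9F;

printf "%d printable characters in 0x80-0x9F\n", scalar @printable;
printf "unassigned: %s\n", join ' ', map { sprintf '0x%02X', $_ } @unused;

# None of the 27 codepoints fits in a single iso-8859-1 byte (all are > 0xFF).
my @too_small = grep { $_ <= 0xFF } keys %CP1252_EXTRA;
printf "codepoints <= 0xFF: %d\n", scalar @too_small;
```

Run as-is, it should report 27 printable characters, the five unassigned positions 0x81, 0x8D, 0x8F, 0x90 and 0x9D, and no codepoints at or below 0xFF.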

Without this patch, one of two things happens, depending on whether or not a Unicode TrueType font (such as CODE2000.ttf) is available to SlimServer/SqueezeCenter:

If a TrueType font is available, the TrueType font is used to display these characters instead of the bitmap font, even though the characters actually exist in the bitmap font. One implication of this is that the characters do not match the rest of the text, since they're displayed in a different font. Also, I suspect there is some performance penalty to rendering the TrueType font instead of using the bitmap font, but it may be insignificant.

If a TrueType font is not available, SlimServer/SqueezeCenter uses Unidecode to transliterate these characters to the 7-bit ASCII range. As a result, typographer's quotes become plain quotes, the euro symbol becomes EU, the bullet becomes *, etc.

In neither case are the characters from the bitmap font used, even though they exist in the font.

This patch fixes that, ensuring that all characters in the bitmap font are actually used, reverting to the TrueType font or transliteration only for characters outside of cp1252.
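The behaviour described above can be modelled as a per-character decision. The sketch below is a simplified, hypothetical model (`render_path`, `$have_ttf` and `$with_patch` are invented for illustration); it is not the actual `sub string` code in Fonts.pm, which works on whole strings.

```perl
use strict;
use warnings;

# The 27 codepoints that cp1252 places in 0x80-0x9F (same set as the patch).
my %IN_CP1252_EXTRA = map { $_ => 1 } (
    0x0152, 0x0153, 0x0160, 0x0161, 0x0178, 0x017D, 0x017E, 0x0192, 0x02C6,
    0x02DC, 0x2013, 0x2014, 0x2018, 0x2019, 0x201A, 0x201C, 0x201D, 0x201E,
    0x2020, 0x2021, 0x2022, 0x2026, 0x2030, 0x2039, 0x203A, 0x20AC, 0x2122,
);

# Hypothetical model of where one character's glyph comes from:
#   'bitmap'   - drawn from the cp1252 bitmap font
#   'ttf'      - drawn from a Unicode TrueType font such as CODE2000.ttf
#   'translit' - replaced by a 7-bit ASCII transliteration
sub render_path {
    my ($char, $have_ttf, $with_patch) = @_;
    my $cp = ord $char;

    # Codepoints up to 0xFF already fit in one byte of the bitmap font.
    return 'bitmap' if $cp <= 0xFF;

    # With the patch, the 27 cp1252 extras are remapped to 0x80-0x9F first.
    return 'bitmap' if $with_patch && $IN_CP1252_EXTRA{$cp};

    # Everything else falls back to the TrueType font or transliteration.
    return $have_ttf ? 'ttf' : 'translit';
}

for my $with_patch (0, 1) {
    for my $have_ttf (0, 1) {
        printf "patch=%d ttf=%d: EURO SIGN -> %s, U+4E2D -> %s\n",
            $with_patch, $have_ttf,
            render_path("\x{20AC}", $have_ttf, $with_patch),
            render_path("\x{4E2D}", $have_ttf, $with_patch);
    }
}
```

In this model the euro sign switches to 'bitmap' only when the patch is applied, while a character outside cp1252 (U+4E2D here) keeps using the TrueType or transliteration fallback either way.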

andyg
2008-01-11, 17:27
OK that makes sense. Can you post some screenshots of the difference
in character renderings using SoftSqueeze? That may help to convince
people.

Also, your patch probably needs some optimization, running that many
regexes on every bit of display text will be too slow. And these
chars are certainly used very rarely, so the code should have as small
a performance impact as possible.
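One possible way to address the performance concern, not the approach taken later in this thread, is Perl's `tr///` operator: its lookup table is built when the code is compiled, and it replaces every listed character in a single pass without the regex engine. A minimal sketch, assuming a hypothetical helper name `remap_cp1252_tr`; the search and replacement lists must stay aligned position by position.

```perl
use strict;
use warnings;

# Hypothetical helper: map the 27 cp1252 extras to their cp1252 byte values
# in one tr/// pass. Position N in the first list maps to position N in the second.
sub remap_cp1252_tr {
    my ($string) = @_;
    $string =~ tr[\x{0152}\x{0153}\x{0160}\x{0161}\x{0178}\x{017D}\x{017E}\x{0192}\x{02C6}\x{02DC}\x{2013}\x{2014}\x{2018}\x{2019}\x{201A}\x{201C}\x{201D}\x{201E}\x{2020}\x{2021}\x{2022}\x{2026}\x{2030}\x{2039}\x{203A}\x{20AC}\x{2122}]
                 [\x8C\x9C\x8A\x9A\x9F\x8E\x9E\x83\x88\x98\x96\x97\x91\x92\x82\x93\x94\x84\x86\x87\x95\x85\x89\x8B\x9B\x80\x99];
    return $string;
}

# Curly quotes and the euro sign become single cp1252 bytes; ASCII is untouched.
my $out = remap_cp1252_tr("\x{201C}Price: \x{20AC}5\x{201D}");
printf "%vX\n", $out;    # prints 93.50.72.69.63.65.3A.20.80.35.94
```

The trade-off is readability: the two lists have to be kept in the same order by hand, whereas a hash keeps each codepoint next to its byte.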

SWB
2008-01-11, 20:14
> Also, your patch probably needs some optimization, running that many regexes on every bit of display text will be too slow.

I wondered about that. I'll see what I can do on that front.

Here are some screenshots (this and the next two posts).

First, here is what it looks like now, without a TrueType font (so transliteration is being performed).

SWB
2008-01-11, 20:16
Here's what it looks like now with the CODE2000.TTF font installed.

SWB
2008-01-11, 20:21
And here's what it looks like with my patch.

I've also included with this post a small FLAC file (zipped) with metadata that demonstrates this.

SWB
2008-01-11, 21:22
Ok, here's an update that does it with one regex. How's this?



# Our bitmap fonts are actually cp1252 (Windows-Latin1), NOT iso-8859-1.
# The cp1252 encoding has 27 printable characters in the range [\x80-\x9F] .
# In iso-8859-1, this range is occupied entirely by non-printing control codes.
# The Unicode codepoints for the characters in this range are > 255, so instead
# of displaying these characters with our bitmapped font, the code in this
# sub will normally either replace them with characters from a TTF font
# (if present) or transliterate them into the range [\x00-\x7F] .
#
# To prevent this (and allow our full bitmap font to be used whenever
# possible), the following remaps the affected Unicode codepoints to their
# locations in cp1252.

my %cp1252mapping = (
"\x{0152}" => "\x8C", # LATIN CAPITAL LIGATURE OE
"\x{0153}" => "\x9C", # LATIN SMALL LIGATURE OE
"\x{0160}" => "\x8A", # LATIN CAPITAL LETTER S WITH CARON
"\x{0161}" => "\x9A", # LATIN SMALL LETTER S WITH CARON
"\x{0178}" => "\x9F", # LATIN CAPITAL LETTER Y WITH DIAERESIS
"\x{017D}" => "\x8E", # LATIN CAPITAL LETTER Z WITH CARON
"\x{017E}" => "\x9E", # LATIN SMALL LETTER Z WITH CARON
"\x{0192}" => "\x83", # LATIN SMALL LETTER F WITH HOOK
"\x{02C6}" => "\x88", # MODIFIER LETTER CIRCUMFLEX ACCENT
"\x{02DC}" => "\x98", # SMALL TILDE
"\x{2013}" => "\x96", # EN DASH
"\x{2014}" => "\x97", # EM DASH
"\x{2018}" => "\x91", # LEFT SINGLE QUOTATION MARK
"\x{2019}" => "\x92", # RIGHT SINGLE QUOTATION MARK
"\x{201A}" => "\x82", # SINGLE LOW-9 QUOTATION MARK
"\x{201C}" => "\x93", # LEFT DOUBLE QUOTATION MARK
"\x{201D}" => "\x94", # RIGHT DOUBLE QUOTATION MARK
"\x{201E}" => "\x84", # DOUBLE LOW-9 QUOTATION MARK
"\x{2020}" => "\x86", # DAGGER
"\x{2021}" => "\x87", # DOUBLE DAGGER
"\x{2022}" => "\x95", # BULLET
"\x{2026}" => "\x85", # HORIZONTAL ELLIPSIS
"\x{2030}" => "\x89", # PER MILLE SIGN
"\x{2039}" => "\x8B", # SINGLE LEFT-POINTING ANGLE QUOTATION MARK
"\x{203A}" => "\x9B", # SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
"\x{20AC}" => "\x80", # EURO SIGN
"\x{2122}" => "\x99" # TRADE MARK SIGN
);

$string =~ s/(\x{0152}|\x{0153}|\x{0160}|\x{0161}|\x{0178}|\x{017D}|\x{017E}|\x{0192}|\x{02C6}|\x{02DC}|\x{2013}|\x{2014}|\x{2018}|\x{2019}|\x{201A}|\x{201C}|\x{201D}|\x{201E}|\x{2020}|\x{2021}|\x{2022}|\x{2026}|\x{2030}|\x{2039}|\x{203A}|\x{20AC}|\x{2122})/$cp1252mapping{$1}/eg;
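A further refinement, sketched here under assumptions rather than taken from the thread, is to build the table and the pattern once at file scope instead of on every call to `sub string`, and to use a character class instead of a 27-way alternation. `%CP1252_EXTRA`, `%CP1252_MAP`, `$CP1252_RE` and `remap_cp1252` are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical file-scope table (Unicode codepoint => cp1252 byte),
# built once when the module is loaded.
my %CP1252_EXTRA = (
    0x0152 => 0x8C, 0x0153 => 0x9C, 0x0160 => 0x8A, 0x0161 => 0x9A,
    0x0178 => 0x9F, 0x017D => 0x8E, 0x017E => 0x9E, 0x0192 => 0x83,
    0x02C6 => 0x88, 0x02DC => 0x98, 0x2013 => 0x96, 0x2014 => 0x97,
    0x2018 => 0x91, 0x2019 => 0x92, 0x201A => 0x82, 0x201C => 0x93,
    0x201D => 0x94, 0x201E => 0x84, 0x2020 => 0x86, 0x2021 => 0x87,
    0x2022 => 0x95, 0x2026 => 0x85, 0x2030 => 0x89, 0x2039 => 0x8B,
    0x203A => 0x9B, 0x20AC => 0x80, 0x2122 => 0x99,
);

# Character-level lookup: "\x{201C}" => "\x93", and so on.
my %CP1252_MAP;
$CP1252_MAP{ chr $_ } = chr $CP1252_EXTRA{$_} for keys %CP1252_EXTRA;

# One character class listing all 27 codepoints as \x{....} escapes,
# compiled once with qr//.
my $class = join '', map { sprintf '\x{%04X}', $_ } sort { $a <=> $b } keys %CP1252_EXTRA;
my $CP1252_RE = qr/([$class])/;

sub remap_cp1252 {
    my ($string) = @_;
    # No /e needed: the replacement side simply interpolates the hash lookup.
    $string =~ s/$CP1252_RE/$CP1252_MAP{$1}/g;
    return $string;
}

my $out = remap_cp1252("\x{201C}Hi\x{201D} \x{2013} \x{2122}");
printf "%vX\n", $out;
```

Keeping the hash as the single source of truth means the pattern can never drift out of sync with the mapping, which is easy to get wrong when the alternation is typed by hand.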

andyg
2008-01-11, 22:03
Well those screenshots definitely look better, I'll do some benchmarks on your code and see what we can do, thanks.

gerph
2008-01-12, 02:03
> Ok, here's an update that does it with one regex. How's this?
>
> ...



Can I suggest wrapping this section in something like...



if ($string =~ /[\x{0152}-\x{2122}]/o)
{
    # ... do the processing
}


so that these checks are only done if there are any characters in that range. This is effectively a fast reject: many (Western) strings will be plain ISO 8859-1 and will avoid the penalty of such an expensive regex.
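A minimal sketch of that fast-reject test on its own, assuming a hypothetical helper named `needs_remap`. It binds the check explicitly to the string being rendered and shows that the range is a superset: some strings pass it without containing any of the 27 remapped characters.

```perl
use strict;
use warnings;

# Every codepoint in the remapping table lies between U+0152 and U+2122,
# so one range class is a cheap superset test for "might need remapping".
my $MAYBE_CP1252_EXTRA = qr/[\x{0152}-\x{2122}]/;

# Hypothetical helper: returns 1 if the remapping pass could change $string.
# Note the explicit binding to $string; a bare /.../ would test $_ instead.
sub needs_remap {
    my ($string) = @_;
    return $string =~ $MAYBE_CP1252_EXTRA ? 1 : 0;
}

print needs_remap("Caf\x{E9} cr\x{E8}me"), "\n";     # 0: plain iso-8859-1
print needs_remap("\x{201C}quoted\x{201D}"), "\n";   # 1: typographer's quotes
print needs_remap("\x{0416}"), "\n";                 # 1: Cyrillic, in range but never remapped
print needs_remap("\x{4E2D}"), "\n";                 # 0: CJK, above U+2122
```

The Cyrillic case is the cost of using a single range: the expensive substitution still runs but changes nothing. That is harmless, since the prefilter only decides whether the full pattern is tried.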

Can I also suggest that any of the regular expressions use the 'o' specifier to ensure that the regular expression is only compiled once, not every time it's executed, e.g.:



$string =~ s/(\x{0152}|\x{0153}|\x{0160}|\x{0161}|\x{0178}|\x{017D}|\x{017E}|\x{0192}|\x{02C6}|\x{02DC}|\x{2013}|\x{2014}|\x{2018}|\x{2019}|\x{201A}|\x{201C}|\x{201D}|\x{201E}|\x{2020}|\x{2021}|\x{2022}|\x{2026}|\x{2030}|\x{2039}|\x{203A}|\x{20AC}|\x{2122})/$cp1252mapping{$1}/ego;


AIUI, the 'o' only applies to the left side of the expression, i.e. its compilation, so the right side being executed here makes no difference.
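For reference, Perl's `o` modifier only changes behaviour when a pattern interpolates a variable: it freezes the first interpolation and reuses it on every later execution. A pattern written entirely with literals and `\x{...}` escapes, like the alternation above, has nothing to interpolate and is compiled once when the code is compiled, with or without `o`. The hypothetical subs below (not from the thread) show the effect of `/o` and its classic pitfall.

```perl
use strict;
use warnings;

# With /o, $pat is interpolated and the pattern compiled on the first call only;
# later calls keep using that first pattern, whatever $pat says.
sub matches_once {
    my ($pat, $s) = @_;
    return $s =~ /$pat/o ? 1 : 0;
}

# Without /o, the pattern is recompiled whenever the interpolated $pat changes.
sub matches_fresh {
    my ($pat, $s) = @_;
    return $s =~ /$pat/ ? 1 : 0;
}

print matches_once('a', 'abc'), "\n";    # prints 1: /a/ is compiled and frozen
print matches_once('z', 'xyz'), "\n";    # prints 0: still matching /a/
print matches_fresh('z', 'xyz'), "\n";   # prints 1
```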

andyg
2008-01-12, 07:22
I've applied this as change 16211. It performs very well, thanks guys.