Unicode support and text direction



Avi Schwartz
2005-11-01, 11:08
I have two questions regarding the state of the Unicode support:


1) Is Unicode supported only for SB2 and above? Is there no way to
use it with a SB1 with the graphics display upgrade? Since there is no
way to have two sets of tags, one for each device, it is a case of all or
nothing. Either upgrade all my devices or stick with English. I know
which one Slim Devices will prefer. :-)


2) While Hebrew letters appear fine on SB2, the characters are in the
reverse order. If it were English, then it will appear as hsilgnE. Is
there anything I can do to fix it besides entering the text in reverse?
Also it would be nice to have the SB2 change the scrolling direction
when displaying a language that is written from right to left.


Avi

--
Avi Schwartz
http://public.xdi.org/=avi.schwartz

Dan Sully
2005-11-01, 11:12
* Avi Schwartz shaped the electrons to say...

>1) Is Unicode supported only for SB2 and above? Is there no way to
>use it with a SB1 with the graphics display upgrade? Since there is no
>way to have two sets of tags, one for each device, it is a case of all or
>nothing. Either upgrade all my devices or stick with English. I know
>which one Slim Devices will prefer. :-)

It should work on SB1, but of course the resolution is lower. I've not tested it though.

>2) While Hebrew letters appear fine on SB2, the characters are in the
>reverse order. If it were English, then it will appear as hsilgnE. Is
>there anything I can do to fix it besides entering the text in reverse?
>Also it would be nice to have the SB2 change the scrolling direction
>when displaying a language that is written from right to left.

The plan is to support it eventually. Patches welcome.

-D
--
"My pockets hurt." - Homer Simpson

Avi Schwartz
2005-11-01, 11:24
Dan Sully wrote:

> * Avi Schwartz shaped the electrons to say...
>
>> 1) Is Unicode supported only for SB2 and above? Is there no way
>> to use it with a SB1 with the graphics display upgrade? Since there
>> is no way to have two sets of tags, one for each device, it is a
>> case of all or nothing. Either upgrade all my devices or stick with
>> English. I know which one Slim Devices will prefer. :-)
>
>
> It should work on SB1, but of course the resolution is lower. I've not
> tested it though.


It does not work on the SB1+G. All I am getting is the old and
familiar strange collection of characters. On the SB2, on the other hand,
it works fine.

>
>> 2) While Hebrew letters appear fine on SB2, the characters are in the
>> reverse order. If it were English, then it will appear as hsilgnE.
>> Is there anything I can do to fix it besides entering the text in
>> reverse? Also it would be nice to have the SB2 change the scrolling
>> direction when displaying a language that is written from right to left.
>
>
> The plan is to support it eventually. Patches welcome.

OK, fair enough. I am sure you don't have too many clients with right
to left needs. Time to go and get myself a Perl book I guess...

Avi

--
Avi Schwartz
http://public.xdi.org/=avi.schwartz

Marc Sherman
2005-11-01, 12:01
Avi Schwartz wrote:
>
> OK, fair enough. I am sure you don't have too many clients with right
> to left needs. Time to go and get myself a Perl book I guess...

Check out FriBidi -- it's a bidi library with a Perl module:

http://imagic.weizmann.ac.il/~dov/freesw/FriBidi/

It looks like the Perl module may be rather out-of-date relative to the
current official GNU release, though:

http://fribidi.org/wiki/

- Marc

Triode
2005-11-01, 12:36
> 2) While Hebrew letters appear fine on SB2, the characters are in the
> reverse order. If it were English, then it will appear as hsilgnE. Is
> there anything I can do to fix it besides entering the text in reverse?
> Also it would be nice to have the SB2 change the scrolling direction
> when displaying a language that is written from right to left.
>
Changing text order is conceptually easy, changing the direction of scrolling is harder...

Is there an easy way to tell which text should be ordered the other way Dan?

Dan Sully
2005-11-01, 12:51
* Triode shaped the electrons to say...

>Changing text order is conceptually easy, changing the direction of
>scrolling is harder...
>
>Is there an easy way to tell which text should be ordered the other way Dan?

Not easily - there would need to be a user pref, and some guessing.

-D
--
"It has become appallingly obvious that our technology has exceeded our humanity." - Albert Einstein

pfarrell
2005-11-01, 12:52
Dan Sully said:
> * Triode shaped the electrons to say...
>>Changing text order is conceptually easy, changing the direction of
>>scrolling is harder...
>>
>>Is there an easy way to tell which text should be ordered the other way
>> Dan?
>
> Not easily - there would need to be a user pref, and some guessing.

This rapidly gets really hard, and probably belongs over on the dev-list.

You can easily tell the script given a Unicode string, and so have a
dispatch that sets it one way or the other for Hebrew, Arabic and others
that don't follow the western European tradition.

But you have to at least allow a library of tunes to have mixed
albums, some in English or French and some in Middle Eastern languages.
So you at least have to switch the scrolling direction by tune/song.

To do this right, you have to handle Hebrew/Arabic words in English
titles, and vice versa, which is where it starts to get very tricky. I'm
only smart enough to know it is hard, I don't know all the rules.


Pat
http://www.pfarrell.com

Avi Schwartz
2005-11-01, 13:05
Dan Sully wrote:

> * Triode shaped the electrons to say...
>
>> Changing text order is conceptually easy, changing the direction of
>> scrolling is harder...
>>
>> Is there an easy way to tell which text should be ordered the other
>> way Dan?
>
>
> Not easily - there would need to be a user pref, and some guessing.
>
Can we rely on certain character ranges within the Unicode character set?
The biggest problem as I see it is what to do when RTL and LTRs are mixed.

Avi

--
Avi Schwartz
http://public.xdi.org/=avi.schwartz

pfarrell
2005-11-01, 14:56
On Tue, 2005-11-01 at 14:05 -0600, Avi Schwartz wrote:

> Can we rely on certain character ranges within the Unicode character set?

Yes, the language character sets are uniquely identified. Hebrew
characters are in the range: 0590-05FF

I'm not sure if Yiddish uses the same or different characters/range.

Doing this will have performance impact, so you probably want a
system-wide preference to turn on testing. It probably will increase
memory usage as well.
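
For illustration, a range test along these lines might look like the Python sketch below. The function names and the `enabled` flag (standing in for the system-wide preference) are hypothetical and not part of the server code; only the Hebrew range itself comes from the post above.

```python
# Hebrew block as quoted above: U+0590 through U+05FF.
HEBREW_FIRST = 0x0590
HEBREW_LAST = 0x05FF


def is_hebrew(ch: str) -> bool:
    """Return True if the single character ch lies in the Hebrew block."""
    return HEBREW_FIRST <= ord(ch) <= HEBREW_LAST


def contains_hebrew(text: str, enabled: bool = True) -> bool:
    """Scan text for Hebrew characters.

    `enabled` is a hypothetical stand-in for a system-wide preference,
    so users who never need the test can skip the scan entirely.
    """
    if not enabled:
        return False
    return any(is_hebrew(ch) for ch in text)
```

The scan is linear in the length of the string, so the cost per title is small, but the switch lets it be avoided altogether.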

> The biggest problem as I see it is what to do when RTL and LTRs are mixed.

That is very hard. So don't do that.


--
Pat Farrell
http://www.pfarrell.com

Michaelwagner
2005-11-01, 17:27
> I'm not sure if Yiddish uses the same or different characters/range.

Yiddish uses the same characters (when written in "proper script") as Hebrew.

However:

The very famous Andrews Sisters song "Bei Mir Bist Du Shaen" (often incorrectly written as "Bei Mir Bist Du Schoen" - which would be German) is almost always written in English, despite the fact that the title is Yiddish.

Songs may have two titles, depending on which language (and alphabet) is available. I seem to recall the ID3 standard has support for subtitles, etc, but I don't think it has explicit support for alternate versions of the same title in different languages.

There is a Language tag, but that is for the language of the material recorded, not for the title.

There is also a problem with titles that are in mixed language. For instance, the title of a song by the Toten Hosen: Never Mind The Hosen - Here's Die Roten Rosen. Imagine doing the same thing in Hebrew - which way should it scroll? (it seems to me in fact that the Geshash haHiver did something like that)

http://www.dietotenhosen.de/en/veroeffentlichungen_discographie_1987.php

The problem gets very complex very fast.

But I think you could do well to say: if the lion's share of the Unicode code points are from a right-to-left language, the text and scroll direction should be reversed. That would cover 90%+ of the cases.
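
A rough Python sketch of that "lion's share" rule, under the assumption that each character's direction is taken from the bidirectional class reported by Python's standard `unicodedata` module ("R" and "AL" counted as right-to-left, "L" as left-to-right, everything else ignored). The function name is made up for the example.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # strong right-to-left bidi classes


def majority_direction(text: str) -> str:
    """Return "rtl" if most strongly-directional characters are right-to-left."""
    rtl = ltr = 0
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in RTL_CLASSES:
            rtl += 1
        elif bidi == "L":
            ltr += 1
        # Spaces, digits and punctuation do not vote.
    return "rtl" if rtl > ltr else "ltr"
```

A tie, or a title with no letters at all, falls back to left-to-right here; that default is a design choice, not something the thread settles.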

It seems to me this came up before, in another context. Someone wanted to mount a slim upside down under a kitchen cabinet, and wanted to know if the display could run upside down.

I think he gave up and mounted it another way, leaving the question of whether the software can perform this trick unanswered.

Triode
2005-11-03, 03:38
>
> It seems to me this came up before, in another context. Someone wanted
> to mount a slim upside down under a kitchen cabinet, and wanted to know
> if the display could run upside down.
>
> I think he gave up and mounted it another way, leaving the question of
> whether the software can perform this trick unanswered.
>

The software is responsible for building bitmaps which are sent to the client. So clearly the software _could_ build the bitmaps rotated by 180 degrees. However, all the code assumes normal orientation of the display, so this would need a reasonable amount of rewriting. All the animation routines in the firmware would be reversed too (scrolling bottom line would become scrolling top line). In short, rotating the display hardware is definitely easier...
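
To make the rotation itself concrete, here is a tiny Python sketch. It assumes a bitmap held as a list of rows of 0/1 pixels, which is purely an illustrative layout and not the format the server actually sends to the client.

```python
def rotate_180(bitmap):
    """Rotate a bitmap by 180 degrees.

    bitmap is a list of rows, each a list of 0/1 pixels (an assumed
    layout for this example). Reversing the row order flips it top to
    bottom; reversing each row flips it left to right.
    """
    return [row[::-1] for row in reversed(bitmap)]
```

The rotation is the easy part; the rewriting effort mentioned above is in everything that assumes which edge of the display text enters and leaves from.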

dean
2005-11-03, 14:48
I wonder if it's possible for the string rendering code to look at
the character range and then simply change the order of the rendering
of the characters based on some rules:

look at a character
if (it's in the range of characters that should be rendered right to left) {
    scan forward until you find a non-space character that should be
        rendered left to right
    render the characters in reverse order until you get back to the
        first character
    skip forward to after the last character rendered
}

also, the scrolling code could look at the first character that it's
asked to scroll and render the animation right to left if it's in the
right-to-left range.
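
The rules above could be sketched in Python roughly as follows. This is only an approximation of the idea, not the full Unicode bidirectional algorithm: it assumes a character counts as right-to-left when `unicodedata.bidirectional` reports class "R" or "AL", and the function names are invented for the example.

```python
import unicodedata


def is_rtl(ch):
    """True for characters with a strong right-to-left bidi class."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


def reorder_for_display(text):
    """Reverse each right-to-left run so it can be drawn left to right."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        if not is_rtl(text[i]):
            out.append(text[i])
            i += 1
            continue
        # Extend the run over right-to-left characters and the spaces
        # between them, stopping at the first non-space character that
        # is not right-to-left.
        j = i
        while j < n and (is_rtl(text[j]) or text[j].isspace()):
            j += 1
        # Give back trailing spaces so they stay between the run and
        # whatever follows it.
        while j > i and text[j - 1].isspace():
            j -= 1
        out.append(text[i:j][::-1])
        i = j
    return "".join(out)


def scroll_direction(text):
    """Pick the scroll direction from the first character only."""
    if text and is_rtl(text[0]):
        return "rtl"
    return "ltr"
```

Digits, punctuation and combining marks all end a run in this sketch, so pointed Hebrew or numbers inside a right-to-left title would come out wrong.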


On Nov 3, 2005, at 2:38 AM, Triode wrote:

>>
>> It seems to me this came up before, in another context. Someone
>> wanted
>> to mount a slim upside down under a kitchen cabinet, and wanted to
>> know
>> if the display could run upside down.
>>
>> I think he gave up and mounted it another way, leaving the
>> question of
>> whether the software can perform this trick unanswered.
>>
>
> The software is responsible for building bitmaps which are sent to
> the client. So clearly the software _could_ build the bitmaps
> rotated by 180 degrees. However all the code assumes normal
> orientation of the display so this would need a reasonable amount
> of rewriting. All the animation routines in the firmware would be
> reversed too (scrolling bottom line would become scrolling top
> line). In short, rotating the display hardware is definitely easier...
>

Michaelwagner
2005-11-03, 19:19
I would suggest slightly different logic.

Separate the test for determining text direction, and make it something that the user interface can influence (plug-in, default, not sure).

I'm sure Avi would prefer the default went the other way, because probably more of his collection is in Hebrew.

But, in general, any heuristic like "guess based on code points" is doomed to failure, because it is a heuristic. Unless Unicode encompasses rendering direction (does it?), you're guessing about languages you don't know enough about.

So make it something the user can influence (or replace?).
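
One way to picture that separation is the Python sketch below. Everything in it is hypothetical: the `user_pref` argument stands in for whatever preference the interface would expose, and the default detector (first strongly-directional character wins, using `unicodedata.bidirectional`) is just one replaceable guess.

```python
import unicodedata
from typing import Callable, Optional


def guess_by_first_strong(text: str) -> str:
    """Default guess: the first strongly-directional character decides."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return "rtl"
        if bidi == "L":
            return "ltr"
    return "ltr"  # no letters at all: fall back to left-to-right


def text_direction(
    text: str,
    user_pref: Optional[str] = None,
    detector: Callable[[str], str] = guess_by_first_strong,
) -> str:
    """Return "ltr" or "rtl"; an explicit user preference always wins."""
    if user_pref in ("ltr", "rtl"):
        return user_pref
    return detector(text)
```

The rendering code would only ever call `text_direction`, so a plug-in could swap in a different detector without touching it.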

The rest of the code - how it's implemented - can be built-in as Dean describes. But is rendering direction important? If the characters paint themselves in the other order, is that important? Provided the customer can read them properly afterwards, does anyone care? I'm not sure. It's not a rhetorical question - I really don't know if it matters.

For a demonstration project almost 20 years ago, we got a dual-daisywheel printer (or one wheel with both alphabets on it, don't remember) and printed side by side columns of the first page of the Hebrew Bible (which starts out "In the beginning") and an English translation (I think we used King James but it's too long ago and the details are indistinct).

We reversed the string in memory to calculate space requirements, but we actually rendered it left to right, including the Hebrew, because it was faster in the hardware and no one reads the type ball as it's typing.

Avi Schwartz
2005-11-03, 20:41
Michaelwagner wrote:

>I would suggest slightly different logic.
>
>Separate the test for determining text direction, and make it something
>that the user interface can influence (plug-in, default, not sure).
>
>I'm sure Avi would prefer the default went the other way, because
>probably more of his collection is in Hebrew.
>
>
Actually, not. Most of my collection is in English, but I still have a
substantial Hebrew collection.

>But, in general, any heuristic like "guess based on code points" is
>doomed to failure, because it is a heuristic. Unless Unicode encompasses
>rendering direction (does it?), you're guessing about languages you
>don't know enough about.
>
>
I found the following information:
"The UNICODE specification assigns directionality to characters and
defines a (complex) algorithm for determining the proper directionality
of text."

Wikipedia says that "The Unicode standard also includes a number of
related items, such as character properties, text normalization forms,
and bidirectional display order (for the correct display of text
containing both right-to-left scripts, such as Arabic or Hebrew, and
left-to-right scripts)." But I don't think that directionality is a
property of an individual character, i.e., not a flag that can be
examined.

I started reading the BIDI algorithm specification and it is a great
sleeping pill: http://www.unicode.org/reports/tr9/

>So make it something the user can influence (or replace?).
>
>The rest of the code - how it's implemented - can be built-in as Dean
>describes. But is rendering direction important? If the characters
>paint themselves in the other order, is that important? Provided the
>customer can read them properly afterwards, does anyone care? I'm not
>sure. It's not a rhetorical question - I really don't know if it
>matters.
>
>
Are you talking about the scrolling direction? If you are, then yes, it
is important. Try to read this sentence from right to left through a
small gap and you'll see.

--
Avi Schwartz
http://public.xdi.org/=avi.schwartz

Avi Schwartz
2005-11-03, 21:08
Avi Schwartz wrote:

> I found the following information:
> "The UNICODE specification assigns directionality to characters and
> defines a (complex) algorithm for determining the proper
> directionality of text."
>
> Wikipedia says that "The Unicode standard also includes a number of
> related items, such as character properties, text normalization forms,
> and bidirectional display order (for the correct display of text
> containing both right-to-left scripts, such as Arabic or Hebrew, and
> left-to-right scripts)." But I don't think that directionality is
> part of the property of a character. i.e., not a flag within that can
> be examined.

I just found an interesting utility for the Mac called UnicodeChecker
which led me to a set of text files called the Unicode Database which
describe the properties of each character including its direction. So
yes, there is a way to get this information.
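
As an illustration of reading that per-character direction, Python's standard `unicodedata` module exposes the bidirectional class from the same database. The small helper below is only a sketch (the name `bidi_class` is made up), and the server itself is written in Perl, so this shows the idea rather than the code that would be used.

```python
import unicodedata


def bidi_class(ch: str) -> str:
    """Return the bidirectional class of one character, e.g. "L" or "R"."""
    return unicodedata.bidirectional(ch)


# Latin A, Hebrew alef, Arabic alef, a digit and a space.
for ch in ("A", "\u05d0", "\u0627", "1", " "):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {bidi_class(ch)}")
```

Letters report a strong class ("L" for Latin, "R" for Hebrew, "AL" for Arabic), while digits and spaces report weak or neutral classes, which is why mixed text needs more than a per-character lookup.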

Avi

--
Avi Schwartz
http://public.xdi.org/=avi.schwartz

Michaelwagner
2005-11-03, 22:11
> Are you talking about the scrolling direction?

No. Rendering direction.

If it takes 1/10th of second for an entire screen (not scrolled) to show up on the screen, do you care in what order the characters appear on the screen?

Avi Schwartz
2005-11-03, 22:32
Michaelwagner wrote:

>Avi Schwartz Wrote:
>
>
>>Are you talking about the scrolling direction?
>
>No. Rendering direction.
>>
>>
>
>If it takes 1/10th of second for an entire screen (not scrolled) to
>show up on the screen, do you care in what order the characters appear
>on the screen?
>
>
Of course not.

--
Avi Schwartz
http://public.xdi.org/=avi.schwartz

Marc Sherman
2005-11-04, 11:48
dean blackketter wrote:
> I wonder if it's possible for the string rendering code to look at the
> character range and then simply change the order of the rendering of
> the characters based on some rules:
>
> look at a character
> if (it's in the range of characters that should be rendered right to
> left) {
> scan forward until you find a non-space character that should be
> rendered left to right
> render the characters in reverse order until you get back to the
> first character
> skip forward to after the last character rendered
> }

Unicode specifies a pretty complex algorithm to get bidi working correctly.

Once again, allow me to suggest the FriBidi perl module:
http://imagic.weizmann.ac.il/~dov/freesw/FriBidi/

- Marc