Re: unicode macro rehearsal

From: leif h silli (lhs@RUSSISK.NO)
Date: Tue Jan 09 2001 - 05:17:28 AEDT


On 08 January 2001, Jens Oestergaard Petersen <oesterg@HUM.KU.DK> wrote:


>Zev Handel (zev@SOCRATES.BERKELEY.EDU) made the following posting to the CHINESE-MAC@YORKU.CA list on 14 April last year:

>Some weeks back someone on this list inquired about converting
>formatted files from one Chinese encoding to another.  The general
>consensus was that this can be done for text files, but not for
>formatted files (like Word or Nisus documents).
>
>It is however possible to do it with Nisus files.  Unlike most word
>processors, Nisus stores all its formatting information in the
>resource fork, leaving only straight text in the data fork.  (This is
>why applications that read text files only, like BBEdit, can open
>Nisus files.)
--
<--->

>I used the Chinese Text Converter instead of the Hanzi Converter 1.5 (somehow, couldn't make Cyclone work) and converted a text with GB Chinese to UTF-8, conserving all formatting. Did the same with a text with Big5 in it. I thought this trick only worked because two bytes of Simplified Chinese were of the same length as two bytes of Traditional Chinese, making the formatting apply to the "same places" in the two documents, but it works with UTF-8 (which has three bytes in the CJK range) as well, so some other principle must be at work. No conversion takes place in the footnotes, though, so one will have to move these to the document itself.

Interesting, of course. I have tried something similar: I opened a Nisus file in the text editor Pepper and then used it to convert from Mac Latin to Unicode.

However, it could not convert a mixed text of Mac Latin and Mac Cyrillic. This is probably due to a limitation of Pepper itself. In fact, one can do the same with Style. Open an NW file in Style or Pepper and the file will retain the NW file icon. And both treat one-byte text that is not Mac Roman the same way: it becomes unreadable accented Latin text.

And ... this is in fact one of the *negative* results of the NW file format. Because all the language information is really a kind of style, it is (I suppose) stored in the resource fork. So when you use Style to open the NW file and it only sees the data fork, the result isn't very positive.
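The effect can be reproduced outside of Style or Pepper. Here is a minimal sketch in Python (the sample string and variable names are my own assumptions, and this is not how either application works internally): the same one-byte data is read once as Mac Roman, as a reader without the language info would do, and once as Mac Cyrillic.

```python
# Hypothetical sample text; any Cyrillic string would do.
text = "Привет"

# The bytes as they would sit in the data fork of a Mac Cyrillic document.
raw = text.encode("mac_cyrillic")

# Without the language/style info, a reader falls back to Mac Roman:
# every byte is valid there too, but it maps to a different character.
misread = raw.decode("mac_roman")

# Once the encoding is known, conversion to Unicode (UTF-8 here) is trivial.
recovered = raw.decode("mac_cyrillic")
utf8 = recovered.encode("utf-8")

print(raw.hex())          # the raw one-byte codes
print(ascii(misread))     # the wrongly decoded text, escaped for safe printing
print(ascii(recovered))   # the correctly decoded text
```

Nothing in the bytes themselves says "Cyrillic"; that information has to come from somewhere else, which in NW is presumably the resource fork.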

If, however, you save your text as SimpleText (e.g. with the ST filter included with Style), you can open the styled NW file in Style and convert the whole thing to Unicode just by using Style to save as Unicode.

With the Asian encodings and the UTF encodings this is different, because they use combinations of character codes (bytes) to represent single characters. Thus, they don't rely on style info for their sign value.

>Just a little question: a lot was said last year about applications utilizing the Text Encoding Converter. Why are there no applications drawing on the Unicode Converter? According to Apple documentation, this supports "multiple  encoding runs" - isn't this one of the things we are/were looking for?

To your little question: in fact, I think many applications use the Unicode Converter. I am not sure, but I think the Unicode Converter is part of TEC. At least they can cooperate. I think the Unicode Converter is basically for converting e.g. from an internal Unicode format to external Mac, Windows etc. encodings, while TEC is for applications which don't use Unicode as an internal format.

 Leif


