“Butterflies stir a breeze
and the ripples flow unceasingly:
far away the cyclones swirl.
It's a whole, connected world.”*
A short description for Read Me haters
Cyclone is a text conversion utility that uses Apple's Text Encoding
Converter.
Highlights include:
Version history
Details for the curious
License/distribution:
Cyclone is free for any use (except for abuse).
Distribution is encouraged; CD makers should not hesitate
to send a complimentary copy to the author :-)
The author retains copyright for the program and does not wish
to see any modified/incomplete versions distributed.
Author:
Cyclone Requirements:
Tested with:
Theory of operation
Text Encoding Converter (called TEC) is a Mac OS engine for handling
different languages using different character sets. It supports
many standards and is robust and pretty fast. Many applications
use it for their internal conversion needs, and that's great, but
I could not find a plain converter using this engine.
So here comes Cyclone.
Highlights revisited
Because Cyclone uses TEC's conversion maps, it will grow with
TEC even if the program itself is no longer developed. When more
encodings appear in future incarnations of TEC, or any maps are
corrected or modified, Cyclone is supposed to use them as if nothing
had changed. There is one exception: I hard-coded the names of encodings
because I was not satisfied with the names returned by TEC. If Cyclone
does not find the name of a new encoding in its own resources, it will
use the name given by TEC. TEC does not change line endings properly,
so I added this option (any bugs in this area are mine); see the
“More details” section for specifications.
Cyclone can convert many files dragged onto it or chosen from the standard
file dialog (Navigation Services is needed for multiple selection).
The conversion is streamed, so the size of input and output files
is not limited, but of course the more memory you give to Cyclone
via “Get Info”, the larger the chunks of text it can read in
and convert at a time. The difference in speed can be significant. Clipboard
conversion is limited by the size of Cyclone's memory. For safety reasons,
no memory outside Cyclone's own heap is used.
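To illustrate what streamed conversion means, here is a minimal sketch in Python. It is not Cyclone's code and does not use TEC; it relies on Python's own incremental codecs, and the function name convert_stream and the chunk_size parameter are made up for this example. The point is only that a fixed-size buffer is enough to convert a file of any length, and that a larger buffer means fewer read/convert/write rounds.

```python
import codecs
import io


def convert_stream(src, dst, from_enc, to_enc, chunk_size=64 * 1024):
    """Convert a binary stream chunk by chunk (hypothetical helper)."""
    # Incremental codecs remember partial multi-byte sequences between
    # chunks, so a character split across two reads is still decoded.
    decoder = codecs.getincrementaldecoder(from_enc)()
    encoder = codecs.getincrementalencoder(to_enc)()
    while True:
        chunk = src.read(chunk_size)
        final = not chunk  # an empty read means end of input
        text = decoder.decode(chunk, final=final)
        dst.write(encoder.encode(text, final=final))
        if final:
            break


# Usage: convert ISO 8859-2 text to UTF-8 with a deliberately tiny buffer.
source = io.BytesIO("Zażółć gęślą jaźń".encode("iso8859_2"))
target = io.BytesIO()
convert_stream(source, target, "iso8859_2", "utf-8", chunk_size=4)
```

The tiny chunk_size in the usage lines is only there to show that the result does not depend on where the chunk boundaries fall.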
Conversions
When you look at the conversion dialog you will see two sets
of pop-ups: left for input, right for output. Choose the standard/platform
first, then the specific encoding, and lastly the variant (if any variant
exists for the given encoding). You may choose whatever you want
for input and output encodings, but be aware that not
all conversions make sense: you cannot translate from Chinese
to Greek with TEC (not yet :-)). Sometimes you will get an error,
but sometimes not. You are responsible for choosing valid encodings
for input and output. You may use content sniffers, which can
help with the input encoding (see the “More details” section for a description
of sniffers), but do not rely on them.
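The "sometimes an error, sometimes not" behaviour is common to encoding converters in general. The following sketch uses Python's built-in codecs (not TEC, whose error reporting may differ) to show both cases: an impossible output encoding is reported, while a wrong input encoding can pass silently.

```python
greek = "Ελληνικά"
chinese = "中文"

# Greek text fits the Greek 8-bit character set, so this works.
greek_bytes = greek.encode("iso8859_7")

# Chinese characters have no place in that set: the converter complains.
try:
    chinese.encode("iso8859_7")
    chinese_failed = False
except UnicodeEncodeError:
    chinese_failed = True

# A wrong *input* encoding may go unnoticed. Every byte value is a valid
# ISO 8859-1 character, so Big5 bytes decode without error into nonsense.
garbled = chinese.encode("big5").decode("iso8859_1")
```

In the last case no error is raised, yet the result has nothing to do with the original text, which is why the choice of input encoding remains your responsibility.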
Preferences
I implemented the following options to make my life easier (and
hopefully yours too):
Multiple file settings
More details for the very curious
Content sniffers
Content sniffing is a feature offered by TEC and used by Cyclone
when enabled in the preferences.
When this option is active, Cyclone tries to suggest which input
encoding is used. Unfortunately, the current TEC version (1.5)
can guess content ONLY for Far Eastern languages. So if you are using
these languages frequently, this option is for you. Otherwise
you will be annoyed that Cyclone (or TEC, to be precise) suggests
Chinese or Japanese every time you want to convert plain ASCII text.
This option is turned off by default.
Content sniffing is not working correctly.
I do not use it and people do not seem to care about it, which is why it has not been fixed yet.
Sniffers available in TEC 1.4.3 and 1.5 (in order of appearance):
Macintosh:
Line Breaks
As mentioned before, TEC does not change the line breaks to match
the output standard. For example, when you convert from Mac to
Windows, everything is converted OK except for the line endings, which
remain in the Mac standard. So an option to change the line breaks
has been added. Here are the rules for output standards:
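As a general illustration of this kind of line-break conversion (not Cyclone's own code or its exact rules), here is a minimal Python sketch. It assumes the conventional byte sequences for 8-bit text: CR for classic Mac OS, LF for Unix and CR LF for Windows. The names LINE_ENDINGS and convert_line_breaks are invented for the example.

```python
# Conventional line endings for 8-bit text (an assumption of this sketch).
LINE_ENDINGS = {"mac": b"\r", "unix": b"\n", "windows": b"\r\n"}


def convert_line_breaks(data: bytes, target: str) -> bytes:
    """Rewrite every line ending in data to the target platform's form."""
    # Normalise to LF first. CR LF must be handled before a lone CR,
    # otherwise each Windows line break would turn into two breaks.
    normalised = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalised.replace(b"\n", LINE_ENDINGS[target])


# Usage: Mac text going to Windows.
windows_text = convert_line_breaks(b"first\rsecond\r", "windows")
```

This byte-level approach is only suitable for 8-bit, ASCII-compatible encodings; 16-bit Unicode text needs the same replacement done on characters rather than bytes.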
Unicode and HTML
HTML writers, please note that if you are building a page where
most (or all) characters are ASCII, the encoding of choice
for you is Unicode UTF-8. If all characters are ASCII, the length
of your page will be exactly the same as if Unicode were not used.
To inform a browser that Unicode UTF-8 is used, type:
<META HTTP-EQUIV="content-type" CONTENT="text/html;charset=UTF-8">
between <HEAD> and </HEAD> at the beginning of your file.
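The claim about page length is easy to check. The short Python sketch below (the sample page string is made up) shows that pure ASCII text produces exactly the same bytes in ASCII and in UTF-8, while a non-ASCII character needs more than one byte.

```python
page = "<HTML><HEAD><TITLE>Hello</TITLE></HEAD><BODY>Plain ASCII</BODY></HTML>"

ascii_bytes = page.encode("ascii")
utf8_bytes = page.encode("utf-8")

# For pure ASCII text the two encodings are byte-for-byte identical.
identical = ascii_bytes == utf8_bytes

# A character outside ASCII (here U+00E9, e with acute) takes two bytes.
accented = "\u00e9".encode("utf-8")
```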
Another UTF-8 issue is line breaks (again). I tried UTF-8
text in recent versions of Netscape, Explorer and iCab and found that it works
fine provided that you do NOT use the PS (paragraph separator) = 0xE280A9, which is
recognized by neither Netscape nor Explorer. If you are
using Cyclone for HTML conversions from any 8-bit encoding to
Unicode UTF-8, you can be sure that this unwanted line break will
not occur. If you are converting from standard Unicode where the
PS (= 0x2029) is used, you will get the unwanted breaks.
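The two numbers above are the same character in two forms: 0x2029 is the Unicode code point of the paragraph separator and 0xE280A9 is its three-byte UTF-8 encoding. The Python sketch below confirms this and shows one possible way to get rid of the character before publishing a page; the helper name replace_paragraph_separators is invented for the example and is not a Cyclone feature.

```python
PS = "\u2029"  # Unicode PARAGRAPH SEPARATOR

# The UTF-8 form of U+2029 is the byte sequence E2 80 A9.
ps_utf8 = PS.encode("utf-8")


def replace_paragraph_separators(text: str, line_break: str = "\n") -> str:
    """Swap every PS for an ordinary line break (hypothetical helper)."""
    return text.replace(PS, line_break)


# Usage: clean text that came from a 16-bit Unicode source.
cleaned = replace_paragraph_separators("first paragraph\u2029second paragraph")
```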
Standard Unicode (16 bit) may also be used for creating HTML pages,
but it seems that only Netscape is able to handle it, and only if a byte-order mark
is present at the beginning of the file (yes, you guessed it,
Cyclone puts the needed mark there :-) )
If you convert HTML pages with Cyclone, it would be nice if you gave
it credit. You are not obliged to; it just helps to spread the
word and make Unicode more popular. You may add something
like this:
This page has been converted to Unicode by <A HREF="http://www.ire.pw.edu.pl/~tkukiel/cyclone.html">Cyclone</A>
Unicode and MS Office 98 for Mac
I had a chance to try MS Word's Unicode export feature. Beware:
it “eats” some characters, and there is no apparent logic to
which characters are affected. At least one character in every
hundred is lost. The solution? Save in plain text format and use
Cyclone for conversion. If you are using different languages in
one text, you must separate the chunks that use the same encoding
and convert them one by one.
More Unicode notes
The registered type for standard Unicode (UTF-16) text is 'utxt'
(used for file and clipboard), while plain 8-bit text uses 'TEXT'.
You may not be able to see the content of the clipboard or paste
it if the application you use does not support Unicode. Unicode
UTF-8 and UTF-7 remain 'TEXT'.
Each standard Unicode (UTF-16) text produced by Cyclone has a
byte-order mark (0xFEFF) at the beginning to ensure 100% portability.
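To show what the byte-order mark does, here is a small Python sketch. It assumes big-endian byte order, which is what a classic Mac would naturally write but is not stated above, and the function name to_utf16_with_bom is invented for the example.

```python
import codecs


def to_utf16_with_bom(text: str) -> bytes:
    """Encode text as big-endian UTF-16 preceded by a byte-order mark."""
    # Assumption: big-endian output. The mark U+FEFF is then stored as
    # the bytes FE FF, which tells a reader how to interpret the rest.
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


data = to_utf16_with_bom("Cyclone")

# A reader that honours the mark needs no other hint about byte order:
# Python's generic "utf-16" codec reads the mark and then drops it.
decoded = data.decode("utf-16")
```

Without the mark, a reader on a little-endian machine would have to guess the byte order, which is the portability problem the mark avoids.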
Scripting
Beginning with version 1.1, “Cyclone” is scriptable via AppleScript. Please see
the sample scripts provided in the “Scripting” folder. A document entitled
“Encodings Dictionary” contains predefined encoding names which can be used in scripts.
Available AppleScript commands:
convert <file_list> from <encoding> to <encoding>
convert clipboard from <encoding> to <encoding>
convert text <some_text> from <encoding> to <encoding>
Beginning with version 1.3, you may pass an Internet name for an encoding.
This option is available with any “convert” command: “convert”,
“convert text”, “convert clipboard”:
convert some_file from "ISO-8859-1" to "UTF-8"
Setting Options:
set option <an_option>
The future
Cyclone is quite static now: the software is stable and, as far as I know, not very buggy. There are some feature requests, and I have my own ideas to implement, but I am simply too busy with my regular job to do it now.
Small print
The author gives no warranty for this software and takes no responsibility
for any damages that it may cause. If you cannot accept this, please
delete your copy.
All trademarks are the property of their owners.
* the quotation is from Peter Hammill (“Gaia”).