“Butterflies stir a breeze
and the ripples flow unceasingly:
far away the cyclones swirl.
It's a whole, connected world.”*
Details for the curious
More details for the very curious
A short description for Read Me haters
Cyclone is a text converting utility that uses the Apple Text Encoding Converter.
Details for the curious
Cyclone is free for any use (except for abuse).
Any distribution is encouraged. CD makers should not hesitate to send a complimentary copy to the author :-)
The author retains copyright for the program and does not wish to see any modified/incomplete versions distributed.
Theory of operation
Text Encoding Converter (called TEC) is a Mac OS engine for handling different languages using different character sets. It supports many standards and is robust and pretty fast. Many applications use it for their internal conversion needs, which is great, but I could not find a plain converter built on this engine. So here comes Cyclone.
Because Cyclone uses TEC's conversion maps, it will grow with TEC even if the program itself is not developed further. When more encodings appear in future incarnations of TEC, or when any maps are corrected or modified, Cyclone is supposed to pick them up without any change. (One exception: I hard-coded the names of encodings because I was not satisfied with the names returned by TEC. If Cyclone does not find the name of a new encoding in its own resources, it will use the name given by TEC.) TEC does not change line endings properly, so I added this option (any bugs in this area are mine); see the “More details” section for specifications.
Cyclone can convert many files dragged onto it or chosen from the standard file dialog (Navigation Services are needed for multiple selection). The conversion is streamed, so the size of input and output files is not limited, but of course the more memory you give to Cyclone via “Get Info”, the larger the chunks of text it can read in and convert at a time. The speed difference can be significant. Clipboard conversion is limited by the size of Cyclone's memory. For safety reasons, no memory outside Cyclone's own heap is used.
When you look at the conversion dialog you will see two sets of pop-ups: left for input, right for output. Choose the standard/platform first, then the specific encoding, and lastly the variant (if any variant exists for the given encoding). You may choose whatever you want for the input and output encodings, but be aware that not all conversions make sense: you cannot translate from Chinese to Greek with TEC (not yet :-)). Sometimes you will get an error, but sometimes not. You are responsible for choosing valid encodings for input and output. You may use content sniffers, which can help with the input encoding (see the “More details” section for a description of sniffers), but do not rely on them.
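The idea of streamed conversion can be illustrated with a minimal Python sketch. This is not Cyclone's code and does not use TEC. It assumes Python's standard codecs module as a stand-in converter, and the function name convert_stream and the chunk_size parameter are made up for the illustration. It shows why the file size does not matter (only one chunk is held in memory at a time) and why a larger chunk means fewer read/convert/write rounds.

```python
import codecs
import io

def convert_stream(src, dst, src_encoding, dst_encoding, chunk_size=64 * 1024):
    """Convert bytes read from src and write them to dst, one chunk at a time."""
    # Incremental codecs keep state between calls, so a multi-byte character
    # split across two chunks is still decoded correctly.
    decoder = codecs.getincrementaldecoder(src_encoding)()
    encoder = codecs.getincrementalencoder(dst_encoding)()
    while True:
        chunk = src.read(chunk_size)
        final = not chunk  # an empty read means end of input
        text = decoder.decode(chunk, final)
        dst.write(encoder.encode(text, final))
        if final:
            break

# Tiny demonstration with an absurdly small chunk size (3 bytes),
# so that multi-byte UTF-8 characters get split between chunks.
original = "café, naïve, Łódź"
source = io.BytesIO(original.encode("utf-8"))
target = io.BytesIO()
convert_stream(source, target, "utf-8", "utf-16-be", chunk_size=3)
result = target.getvalue()
```

With real files you would open the input and output in binary mode and pass the file objects instead of the in-memory buffers used here.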
I implemented the following options to make my life easier (and hopefully yours too):
Multiple file settings
More details for the very curious
Content sniffing is a feature offered by TEC and used by Cyclone when checked in preferences.
When this option is active, Cyclone tries to suggest which input encoding is used. Unfortunately, the current TEC version (1.5) can guess content ONLY for Far East languages. So if you use these languages frequently, this option is for you. Otherwise you will be annoyed that Cyclone (or TEC, to be precise) suggests Chinese or Japanese every time you want to convert plain ASCII text.
This option is turned off by default.
Content sniffing is not working correctly.
I do not use it and people seem not to care about it — this is why it is not fixed yet.
Sniffers available in TEC 1.4.3 and 1.5 (in order of appearance):
As mentioned before, TEC does not change the line breaks to match the output standard. For example when you convert from Mac to Windows, everything is converted OK except for line endings, which remain in Mac standard. So the option to change the line breaks has been added. Here are the rules for output standards:
Unicode and HTML
HTML writers, please note that if you are building a page where most (or all) characters are ASCII, the encoding of choice for you is Unicode UTF-8. If all characters are ASCII, the length of your page will be exactly the same as if no Unicode were used.
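The claim about page length is easy to check. The following short Python sketch (the sample page text is made up for the illustration) encodes the same ASCII-only page as ASCII and as UTF-8 and compares the results. It also shows that a non-ASCII character takes more than one byte in UTF-8.

```python
# A made-up page that contains ASCII characters only.
page = "<HTML><HEAD><TITLE>Hello</TITLE></HEAD><BODY>Plain ASCII text</BODY></HTML>"

ascii_bytes = page.encode("ascii")
utf8_bytes = page.encode("utf-8")

# For pure ASCII input the two encodings produce identical bytes.
same_bytes = ascii_bytes == utf8_bytes

# A non-ASCII character such as e-acute needs two bytes in UTF-8.
accented = "\u00e9".encode("utf-8")
```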
To inform a browser that the Unicode UTF-8 is used, type:
<META HTTP-EQUIV="content-type" CONTENT="text/html;charset=UTF-8">
between <HEAD> and </HEAD> at the beginning of your file.
Another UTF-8 issue is line breaks (again). I tried UTF-8 text in recent versions of Netscape, Explorer and iCab and found that it works fine provided that you do NOT use PS (the paragraph separator, 0xE280A9 in UTF-8), which is recognized by neither Netscape nor Explorer. If you are using Cyclone for HTML conversions from any 8-bit encoding to Unicode UTF-8, you are safe: this unwanted line break will not occur. If you are converting from standard Unicode text where PS (= 0x2029) is used, you will get the unwanted breaks.
Standard Unicode (16-bit) may also be used for creating HTML files, but it seems that only Netscape is able to handle it, and only if a byte-order mark is present at the beginning of the file (yes, you guessed it, Cyclone puts the needed mark :-) )
If you convert HTML files with Cyclone, it would be nice if you gave it credit (you are not obliged to), just to spread the word and help Unicode become more popular. You may add something like this:
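The two numbers quoted for PS are the same character in two forms: 0x2029 is the Unicode code point and 0xE280A9 is its three-byte UTF-8 encoding. The Python sketch below confirms this. It also shows one possible workaround for text that already contains PS, namely replacing it with an ordinary line break before the page is written. That workaround is an assumption of this example (the helper name replace_ps is made up), not something Cyclone is documented to do.

```python
# U+2029 PARAGRAPH SEPARATOR, the "PS" mentioned in the text.
PS = "\u2029"

# Its UTF-8 form is the three bytes E2 80 A9.
ps_utf8 = PS.encode("utf-8")

def replace_ps(text, newline="\n"):
    """Replace every paragraph separator with an ordinary line break."""
    return text.replace(PS, newline)

cleaned = replace_ps("first paragraph\u2029second paragraph")
```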
This page has been converted to Unicode by <A HREF="http://www.ire.pw.edu.pl/~tkukiel/cyclone.html">Cyclone</A>
Unicode and MS Office 98 for Mac
I had a chance to try MS Word's Unicode export feature. Beware: it “eats” some characters, and there is no apparent logic to which characters are affected. Out of every hundred characters, at least one is lost. The solution? Save in a plain text format and use Cyclone for the conversion. If you are using different languages in one text, you must separate the chunks that use the same encoding and convert them one by one.
More Unicode notes
The registered type for standard Unicode (UTF-16) text is 'utxt' (used for file and clipboard), while plain 8-bit text uses 'TEXT'. You may not be able to see the content of the clipboard or paste it if the application you use does not support Unicode. Unicode UTF-8 and UTF-7 remain 'TEXT'.
Each standard Unicode (UTF-16) text produced by Cyclone has a byte-order mark (0xFEFF) at the beginning to ensure 100% portability.
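To make the byte-order mark concrete, here is a small Python sketch. It assumes big-endian byte order (the natural order on classic Mac hardware, but an assumption of this example as far as Cyclone's actual output is concerned). The mark is the character 0xFEFF written as the first 16-bit unit, and a reader uses it to tell which byte order the rest of the file is in.

```python
import codecs

text = "Cyclone"

# Big-endian UTF-16 with the byte-order mark written first (assumed order).
with_bom = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

# The first two bytes are FE FF, which is 0xFEFF in big-endian order.
starts_with_bom = with_bom[:2] == b"\xfe\xff"

# The generic "utf-16" decoder reads the mark, picks the byte order,
# and does not include the mark in the decoded text.
decoded = with_bom.decode("utf-16")
```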
Beginning with version 1.1, “Cyclone” is scriptable via AppleScript. Please see the sample scripts provided in the “Scripting” folder. A document entitled “Encodings Dictionary” contains predefined encoding names which can be used in scripts.
Available AppleScript commands:
convert <file_list> from <encoding> to <encoding>
convert clipboard from <encoding> to <encoding>
convert text <some_text> from <encoding> to <encoding>
Beginning with version 1.3 you may pass an Internet name for an encoding.
This option is available with any “convert” command: “convert”, “convert text”, “convert clipboard”:
convert some_file from "ISO-8859-1" to "UTF-8"
set option <an_option>
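For readers who want to see what a conversion addressed by Internet names amounts to, here is a rough Python analogue of the "ISO-8859-1" to "UTF-8" command shown above. It does not drive Cyclone and does not use TEC or AppleScript. It relies on Python's own codec registry, which happens to accept the same Internet names, and the function name convert_text is made up for the illustration.

```python
import codecs

def convert_text(data, from_name, to_name):
    """Re-encode bytes, addressing both encodings by their Internet names."""
    # codecs.lookup raises LookupError for a name the registry does not know,
    # so an unknown name fails early instead of producing garbage.
    codecs.lookup(from_name)
    codecs.lookup(to_name)
    return data.decode(from_name).encode(to_name)

# "cafe" with e-acute: one byte (E9) in ISO-8859-1, two bytes (C3 A9) in UTF-8.
latin1_bytes = b"caf\xe9"
utf8_bytes = convert_text(latin1_bytes, "ISO-8859-1", "UTF-8")
```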
Cyclone is quite static now — the software is stable and not very buggy as far as I know. There are some feature requests and I have my own ideas to implement, but I am simply too busy with my regular job to do it now.
The author gives no warranty for this software and takes no responsibility for any damages that it may cause. If you cannot accept it, please delete your copy.
All trademarks are properties of their owners.
* the quotation is from Peter Hammill (“Gaia”).