|
UnicodeConverter
Usage: UnicodeConverter comes in three versions (Java, Windows, and .NET executables). They share a similar graphical user interface and can convert multiple files, a directory (including its subdirectories), or an entire website. All three support conversion of text, RTF, HTML, and Word/Excel files. The Java program requires Java 2 Runtime Environment 1.4 or later. The Windows version requires the Microsoft Java Virtual Machine; most problems encountered while launching the program can be solved by installing or updating to the latest Microsoft VM (build 3805 or later) through Microsoft Windows Update. The .NET version requires the Microsoft .NET Framework Redistributable.
Preparations
To ensure successful conversion of HTML files in legacy formats and to minimize post-conversion editing, some pre-conversion conditioning needs to be performed on the HTML source files. Removing obsolete dynamic font links (.pfr or .eot) and their associated ActiveX control scripts (e.g., tdserver.js) is recommended (yellow text in the illustration), since leaving them in needlessly slows down page download.
It may also be necessary to change the document's original fonts to more common fonts for the same original encoding.
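For a large site, the removal of dynamic font links and the tdserver.js script can be scripted instead of edited by hand. The following is a minimal sketch, not part of UnicodeConverter. It assumes the font links appear as `<link ...>` tags that reference a .pfr or .eot file and that the script is loaded through a `<script ... tdserver.js ...></script>` element; the function names `strip_font_links` and `clean_tree` are hypothetical. Files are processed as raw bytes so that the legacy-encoded text is left exactly as it was.

```python
import re
from pathlib import Path

# Assumed markup shapes (not taken from the guide):
#   <link ... something.pfr ...>  or  <link ... something.eot ...>
#   <script ... tdserver.js ...> ... </script>
FONT_LINK = re.compile(rb'<link\b[^>]*\.(?:pfr|eot)\b[^>]*>\s*', re.IGNORECASE)
TD_SCRIPT = re.compile(
    rb'<script\b[^>]*tdserver\.js[^>]*>.*?</script>\s*',
    re.IGNORECASE | re.DOTALL,
)


def strip_font_links(html: bytes) -> bytes:
    """Remove dynamic font links and the tdserver.js script from raw HTML bytes."""
    html = FONT_LINK.sub(b'', html)
    return TD_SCRIPT.sub(b'', html)


def clean_tree(root: str) -> int:
    """Rewrite every .htm/.html file under root in place; return how many changed."""
    changed = 0
    for path in Path(root).rglob('*'):
        if path.is_file() and path.suffix.lower() in {'.htm', '.html'}:
            original = path.read_bytes()
            cleaned = strip_font_links(original)
            if cleaned != original:
                path.write_bytes(cleaned)
                changed += 1
    return changed
```

Because `clean_tree` overwrites files in place, it is safer to run it on a copy of the site. Pages whose font references use a different markup shape would still need manual editing.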
These basic editing tasks should be done before the actual conversion and can be performed quickly with MDI (multiple document interface) text editors, which can open multiple files and run a global find/replace on all open files at once. CuteHTML, TextPad, UltraEdit, EditPlus, and EditPad are some text editors that offer this feature. They can be found and downloaded from http://www.download.com.
Running UnicodeConverter
The resulting Unicode output files are placed in an x_Unicode directory located at the same tree level as the source directory that contains the original files, which remain unchanged. UTF-8 HTML files are viewable in any Unicode-enabled web browser, such as Firefox, Netscape, Mozilla, Internet Explorer, Opera, or Safari. The default fonts for the output files are Times New Roman and Arial. Users can change to other Unicode-compliant fonts using Unicode-compatible word processors or HTML editors, such as Word, FrontPage, or Composer. Do not use Unicode-incompatible editors (such as Notepad on Win9x/Me) to edit UTF-8 files. Doing so would corrupt the UTF-8 byte sequences, rendering the characters or the file unreadable. It is assumed that the Unicode Composite text files to be converted were saved in UTF-8 format.
Note: It is recommended that Microsoft Word/Excel have no files open when you convert Word/Excel documents. Open files may cause errors or slow down the conversion process. |
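The assumption that Unicode Composite text files are saved in UTF-8 can be checked before conversion. The sketch below is an illustration and not part of UnicodeConverter; the names `is_valid_utf8` and `non_utf8_files`, and the default `.txt` suffix, are assumptions. It simply tries to decode each file as UTF-8 and reports the files that fail.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False


def non_utf8_files(root: str, suffixes=('.txt',)):
    """List files under root (with the given suffixes) that are not valid UTF-8."""
    return sorted(
        p for p in Path(root).rglob('*')
        if p.is_file()
        and p.suffix.lower() in suffixes
        and not is_valid_utf8(p.read_bytes())
    )
```

A file that contains only ASCII characters also passes this check, since ASCII is a subset of UTF-8, so the check flags files that are definitely not UTF-8 but cannot confirm which encoding a passing file was written in.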