Usage:
UnicodeConverter comes in three versions: a Java program, a Windows executable, and a .NET executable. All three share a similar graphical user interface and can convert multiple files, a directory (including its subdirectories), or an entire website. All of them support conversion of text, RTF, HTML, and Word/Excel/PowerPoint files.
The Java program requires Java Runtime Environment 6 or later.
The .NET version requires Microsoft .NET Framework 4.0 Redistributable.
The Windows version requires the Microsoft Java Virtual Machine to run. Most problems encountered while launching the program can be solved by installing or updating to the latest Microsoft VM (build 3805 or later) through Microsoft Windows Update. Note: This version has been deprecated.
Preparations
To ensure successful conversion of HTML files in legacy formats and to minimize post-conversion editing, the HTML source files need some conditioning before conversion. It is recommended to remove obsolete dynamic font links (.pfr or .eot) and the associated ActiveX control scripts (e.g., tdserver.js), because leaving them in needlessly slows down page download. In the illustration below, these are the FONTDEF link and the tdserver.js script (shown in yellow in the original).
<html>
<head>
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
<title>HISTORY OF VIETNAM</title>
<link REL="FONTDEF" SRC="http://www.vietnam.org/vni.pfr">
<script LANGUAGE="JavaScript" SRC="http://www.vietnam.org/tdserver.js">
</script>
<link>
</head>
<body bgcolor="#FFFFFF" link="#FF0000" vlink="#FF0000">
<font FACE="VNI-Times">
<h1>HISTORY OF VIETNAM</h1>
The fonts in the original document may also need to be changed to the more common ones for its encoding, as listed below.
Encoding | Fonts for original HTML document |
VNI | VNI-Times, VNI Times, VNI-Aptima, VNI Aptima, VNI-Helve, VNI Helve |
VPS | VPS Times, VPS Helv |
VISCII | VI Times, VI Arial, HoangYen, MinhQuân, PhuongThao, ThaHuong, UHoài |
TCVN3 | .VnTime, .VnArial |
VIQR | No font formatting |
These basic editing tasks should be done before the actual conversion. They can be performed quickly with MDI (multiple document interface) text editors, which can open multiple files and run a global find/replace across all open files at once. CuteHTML, TextPad, UltraEdit, EditPlus, and EditPad are some of the text editors that offer these features. They can be found and downloaded from http://www.download.com.
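The removal of obsolete font links described above can also be scripted. The following is a minimal Python sketch, not part of UnicodeConverter; the function name strip_font_links and both patterns are assumptions modeled only on the tags shown in the illustration, so other markup variants may need different patterns.

```python
import re

# Assumption: obsolete font links look like the <link REL="FONTDEF" ...> tag
# in the illustration above.
FONTDEF_LINK = re.compile(
    r'<link\b[^>]*\bREL\s*=\s*["\']?FONTDEF["\']?[^>]*>\s*',
    re.IGNORECASE,
)

# Assumption: the ActiveX helper is loaded by a <script> tag whose SRC ends
# in tdserver.js, followed by its closing </script>.
TDSERVER_SCRIPT = re.compile(
    r'<script\b[^>]*\bSRC\s*=\s*["\'][^"\']*tdserver\.js["\'][^>]*>.*?</script>\s*',
    re.IGNORECASE | re.DOTALL,
)


def strip_font_links(html: str) -> str:
    """Return html with FONTDEF links and tdserver.js scripts removed."""
    html = FONTDEF_LINK.sub("", html)
    return TDSERVER_SCRIPT.sub("", html)


sample = (
    '<head>\n'
    '<link REL="FONTDEF" SRC="http://www.vietnam.org/vni.pfr">\n'
    '<script LANGUAGE="JavaScript" SRC="http://www.vietnam.org/tdserver.js">\n'
    '</script>\n'
    '<title>HISTORY OF VIETNAM</title>\n'
    '</head>'
)
cleaned = strip_font_links(sample)
```

When applying this to legacy files, reading and writing them with the latin-1 codec keeps every other byte unchanged, since that codec maps each byte to exactly one character and back.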
Running UnicodeConverter
- Java: Launch the conversion program by double-clicking the UnicodeConverter.jar file or icon, or by executing one of the following commands at the command line:
java -jar UnicodeConverter.jar
or
javaw -jar UnicodeConverter.jar
Note: Be sure the directory that contains the UnicodeConverter.jar file is the current directory.
Windows or .NET: Launch Uni.exe from the Windows desktop or Explorer.
- Select the encoding of the source files and click Select Files if you want
to convert files or click the Entire directory, including sub checkbox to
switch to directory selection mode.
To convert a directory and its subdirectories with the Windows program, select any file in that directory instead: the Windows file dialog cannot select a directory, so the selected file tells the program which directory to convert. If the directory has no file to select, create an empty file in it with the same file extension as the type of file you want to convert.
- Use the file filter to choose the type of files to work on. Select the files or
directory to be converted from the file dialog box. Multiple files can be selected
by clicking on them while holding down the Shift or Control key.
- Click Convert. A message box will appear shortly indicating the directory
where the output files are placed. During conversion, a status panel will pop up
showing a list of the files that have been processed.
- Done.
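For the Java version, the launch step can be wrapped in a small script that takes care of the current-directory requirement noted above. This is only a sketch: build_launch and launch are hypothetical helper names, and it assumes java or javaw is available on the PATH.

```python
import subprocess
from pathlib import Path


def build_launch(jar_path, windowed=False):
    """Return (command, working_directory) for starting the jar.

    The working directory is the jar's own directory, so the jar can be
    referenced by its bare file name as in the commands above.
    """
    jar = Path(jar_path)
    launcher = "javaw" if windowed else "java"  # javaw runs without a console window
    return [launcher, "-jar", jar.name], jar.parent


def launch(jar_path, windowed=False):
    """Start the converter GUI and return the process handle."""
    command, working_dir = build_launch(jar_path, windowed)
    return subprocess.Popen(command, cwd=working_dir)
```

Calling launch("tools/UnicodeConverter.jar") would then run java -jar UnicodeConverter.jar from inside the tools directory; that path is an example only.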
The resulting Unicode output files are placed in an x_Unicode directory
located at the same tree level as the source directory that contains the original
files, which remain unchanged. UTF-8 HTML files are viewable in any Unicode-enabled
web browser, such as Firefox, Netscape, Mozilla, Internet Explorer, Opera, or Safari.
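A script that post-processes the output needs to locate this directory. The sketch below assumes that the x in x_Unicode stands for the name of the source directory; the text does not say so explicitly, so verify it against an actual run. The function name expected_output_dir is hypothetical.

```python
from pathlib import Path


def expected_output_dir(source_dir):
    """Guess the output directory for a given source directory.

    ASSUMPTION: "x" in x_Unicode is the source directory's own name, and the
    output directory is a sibling of the source directory.
    """
    source = Path(source_dir)
    return source.parent / f"{source.name}_Unicode"
```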
The default fonts for the output files are Times New Roman and Arial. Users can change to other Unicode-compliant fonts using Unicode-compatible word processors or HTML editors, such as Word, FrontPage, or Dreamweaver. Do not use Unicode-incompatible editors (such as Notepad on Win9x/Me) to edit UTF-8 files. Doing so would corrupt the UTF-8 byte sequences, rendering the characters or the file unreadable.
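If a file may have been edited in an incompatible editor, a quick check can tell whether its bytes still form valid UTF-8. This is a minimal sketch using only Python's built-in decoder; the function name first_invalid_utf8_offset is hypothetical and the check is not part of UnicodeConverter.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start  # offset where decoding failed
    return None


intact = "Việt Nam".encode("utf-8")
# Simulated corruption: the last byte of the three-byte sequence for "ệ" is lost.
damaged = b"Vi\xe1\xbbt Nam"
```

Here first_invalid_utf8_offset(intact) returns None, while the damaged bytes yield an offset. To check a file, read it in binary mode and pass its bytes to the function.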
It is assumed that the Unicode Composite text files to be converted were saved in UTF-8 format.
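For readers unfamiliar with the term, the sketch below illustrates composite text on the assumption that "Unicode Composite" here means Vietnamese text written with combining marks rather than fully precomposed letters. It uses Python's standard unicodedata module and does not reproduce UnicodeConverter's own conversion logic; the function names are hypothetical.

```python
import unicodedata


def uses_combining_marks(text: str) -> bool:
    """True if any character in text is a combining mark."""
    return any(unicodedata.combining(ch) for ch in text)


def to_precomposed(text: str) -> str:
    """Fold base letters and combining marks into precomposed letters (NFC)."""
    return unicodedata.normalize("NFC", text)


# "ê" (U+00EA) followed by COMBINING DOT BELOW (U+0323)
composite = "Vi\u00ea\u0323t"
# Single precomposed letter "ệ" (U+1EC7)
precomposed = "Vi\u1ec7t"

# Composite text is saved as UTF-8 like any other Unicode text.
composite_bytes = composite.encode("utf-8")
```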
Note: It is recommended that Microsoft Word/Excel/PowerPoint have no files
open while you convert Word/Excel/PowerPoint documents. Open files may cause errors or slow
down the conversion process.
Tip: Keep the number of text boxes within Word documents
to a few; having too many will slow down conversion significantly.