UnicodeConverter

Usage:

UnicodeConverter comes in three versions: Java, Windows, and .NET executables. They share a similar graphical user interface and can convert multiple files, a directory (including its subdirectories), or an entire website. All three support conversion of text, RTF, HTML, and Word/Excel files.

The Java program requires Java Runtime Environment 6 or later.

The Windows version requires the Microsoft Java Virtual Machine. Most problems encountered while launching the program can be solved by installing or updating to the latest Microsoft VM (build 3805 or later) through Microsoft Windows Update.

The .NET version requires Microsoft .NET Framework 2.0 Redistributable.

Preparations

To ensure successful conversion of HTML files in legacy formats and to minimize post-conversion editing, the HTML source files need some conditioning before conversion. Removing obsolete dynamic font links (.pfr or .eot) and the associated ActiveX control scripts (e.g., tdserver.js) is recommended, because leaving them in needlessly slows down page download. In the illustration below, examples are the link element that references vni.pfr and the script element that references tdserver.js.

<html>
<head>
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
<title>HISTORY OF VIETNAM</title>
<link REL="FONTDEF" SRC="http://www.vietnam.org/vni.pfr">
<script LANGUAGE="JavaScript" SRC="http://www.vietnam.org/tdserver.js">

</script>
<link>

</head>

<body bgcolor="#FFFFFF" link="#FF0000" vlink="#FF0000">
<font FACE="VNI-Times">

<h1>HISTORY OF VIETNAM</h1>

The fonts used in the original document may need to be changed to the more common fonts for its encoding, as listed below.

Encoding   Fonts for original HTML document
VNI        VNI-Times, VNI Times, VNI-Aptima, VNI Aptima, VNI-Helve, VNI Helve
VPS        VPS Times, VPS Helv
VISCII     VI Times, VI Arial, HoangYen, MinhQuân, PhuongThao, ThaHuong, UHoài
TCVN3      .VnTime, .VnArial
VIQR       No font formatting
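As an illustration of such a font change, the sketch below rewrites FACE attribute values according to a mapping supplied by the user and checks that each target font appears in the table above. It is only a sketch: the names LISTED_FONTS and replace_fonts are hypothetical, the font VNI-Book in the usage line is merely an example of a font that is not in the table, and the pattern assumes double-quoted FACE values holding a single font name.

```python
import re

# Fonts per encoding, copied from the table above (VIQR has no font formatting).
LISTED_FONTS = {
    "VNI": ["VNI-Times", "VNI Times", "VNI-Aptima", "VNI Aptima",
            "VNI-Helve", "VNI Helve"],
    "VPS": ["VPS Times", "VPS Helv"],
    "VISCII": ["VI Times", "VI Arial", "HoangYen", "MinhQuân",
               "PhuongThao", "ThaHuong", "UHoài"],
    "TCVN3": [".VnTime", ".VnArial"],
}

# Assumed pattern: a double-quoted FACE attribute with one font name.
FACE_ATTR = re.compile(r'(\bface\s*=\s*")([^"]*)(")', re.IGNORECASE)


def replace_fonts(html, encoding, mapping):
    """Rewrite FACE values found in mapping; targets must be listed fonts."""
    listed = LISTED_FONTS[encoding]
    for target in mapping.values():
        if target not in listed:
            raise ValueError(f"{target!r} is not a listed {encoding} font")

    def swap(match):
        old = match.group(2)
        # Fonts that are not keys of the mapping are left unchanged.
        return match.group(1) + mapping.get(old, old) + match.group(3)

    return FACE_ATTR.sub(swap, html)
```

For example, replace_fonts(html, "VNI", {"VNI-Book": "VNI-Times"}) would change every FACE="VNI-Book" to FACE="VNI-Times" and leave other fonts alone.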

These basic editing tasks should be done before the actual conversion. They can be performed quickly with MDI (multiple document interface) text editors, which can open multiple files and run a global find/replace on all open files at once. CuteHTML, TextPad, UltraEdit, EditPlus, and EditPad are some of the text editors that offer these features. They can be found and downloaded from http://www.download.com.
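Where no such editor is at hand, a global find/replace across a directory tree can be approximated with a short script. The sketch below is an assumption-laden illustration, not a feature of UnicodeConverter: the function name replace_in_tree and the default .htm/.html filter are hypothetical. It works on raw bytes so that the legacy-encoded characters in the source files are never decoded or re-encoded.

```python
from pathlib import Path


def replace_in_tree(root, replacements, suffixes=(".htm", ".html")):
    """Apply byte-level (old, new) replacements to matching files under root.

    Returns the list of files that were actually modified.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        original = path.read_bytes()
        updated = original
        for old, new in replacements:
            updated = updated.replace(old, new)
        if updated != original:
            # Only rewrite files whose content really changed.
            path.write_bytes(updated)
            changed.append(path)
    return changed
```

A call such as replace_in_tree("site", [(b"VNI Times", b"VNI-Times")]) would rewrite matching HTML files in place, so it is best run on a copy of the source directory.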

Running UnicodeConverter

  1. Java: Launch the conversion program by double-clicking the Uni.jar file or icon, or by executing one of the following commands at the command line:

        java -jar Uni.jar
    or
        javaw -jar Uni.jar

    Note: Be sure the directory that contains the Uni.jar file is the current directory.

    Windows or .NET: Launch Uni.exe from the Windows desktop or Windows Explorer.

  2. Select the encoding of the source files. To convert individual files, click Select Files; to convert a directory, check the Entire directory, including sub checkbox to switch to directory selection mode.

    To convert a directory and its subdirectories with the Windows program, select any file in that directory instead: the Windows file dialog cannot select a directory, so the chosen file tells the program which directory to convert. If the directory contains no file that can be selected, create an empty file there with the same extension as the type of file you want to convert.


     
  3. Use the file filter to choose the type of files to work on. Select the files or directory to be converted from the file dialog box. To select multiple files, click them while holding down the Shift or Control key.



     
  4. Click Convert. A message box will appear shortly indicating the directory where the output files are placed. During conversion, a status panel will pop up showing a list of the files that have been processed.



     
  5. Done.
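Two of the steps above lend themselves to a small helper script: starting the Java version with the Uni.jar directory as the current directory (step 1), and making sure a directory contains a file to select as a cue for the Windows program (step 2). The following Python sketch assumes that java or javaw is on the PATH; the function names and the file name "cue" are hypothetical, and only the commands java -jar Uni.jar and javaw -jar Uni.jar come from the instructions.

```python
import subprocess
from pathlib import Path


def launch_java_version(jar_dir, console=True):
    """Start Uni.jar with jar_dir as the current directory (see step 1)."""
    launcher = "java" if console else "javaw"
    return subprocess.Popen([launcher, "-jar", "Uni.jar"], cwd=jar_dir)


def ensure_cue_file(directory, extension):
    """Return a file with the given extension, creating an empty one if needed.

    The returned file is what would be picked in the Windows file dialog
    to indicate the directory to convert (see step 2).
    """
    directory = Path(directory)
    for candidate in sorted(directory.iterdir()):
        if candidate.is_file() and candidate.suffix.lower() == extension.lower():
            return candidate
    cue = directory / f"cue{extension}"  # "cue" is an arbitrary placeholder name
    cue.touch()
    return cue
```

For instance, ensure_cue_file("C:/site/empty_dir", ".html") would create an empty cue.html only if the directory holds no .html file yet.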

The resulting Unicode output files are placed in an x_Unicode directory located at the same tree level as the source directory that contains the original files. The original files remain unchanged. UTF-8 HTML files can be viewed in any Unicode-enabled web browser, such as Firefox, Netscape, Mozilla, Internet Explorer, Opera, or Safari.
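A script that post-processes the results needs to know where to look. The sketch below computes the expected location under one explicit assumption that the text does not spell out: that the x in x_Unicode stands for the name of the source directory. The function name expected_output_dir is hypothetical; if the program names the directory differently, adjust the sketch accordingly.

```python
from pathlib import Path


def expected_output_dir(source_dir):
    """Return the sibling directory assumed to hold the converted files."""
    source = Path(source_dir)
    # Assumption: "x" in "x_Unicode" is the source directory's own name.
    return source.parent / f"{source.name}_Unicode"
```

Under that assumption, a source directory /data/site would map to /data/site_Unicode.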

The default fonts for the output files are Times New Roman and Arial. Users can change to other Unicode-compliant fonts using Unicode-compatible word processors or HTML editors, such as Word, FrontPage, or Dreamweaver. Do not edit UTF-8 files with Unicode-incompatible editors (such as Notepad on Windows 9x/Me). Doing so can corrupt the UTF-8 byte sequences, rendering the characters or the whole file unreadable.
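To see why such editors are risky, consider how a single Vietnamese letter is stored. The Python sketch below is an illustration only: it shows what an editor that assumes Windows-1252 would display for UTF-8 text, and how the loss of one byte of a multi-byte sequence makes the data invalid. The helper is_valid_utf8 is a hypothetical name, and the byte replacement merely simulates one possible kind of damage.

```python
def is_valid_utf8(data):
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


text = "Việt Nam"
utf8_bytes = text.encode("utf-8")  # "ệ" occupies three bytes: E1 BB 87

# What an editor that assumes Windows-1252 would show: "Viá»‡t Nam"
misread = utf8_bytes.decode("cp1252")

# Simulated damage: the middle byte of the three-byte sequence is replaced.
damaged = utf8_bytes.replace(b"\xbb", b"?")

intact_ok = is_valid_utf8(utf8_bytes)   # True
damaged_ok = is_valid_utf8(damaged)     # False
```

Running is_valid_utf8 on a file's bytes after editing is a quick way to confirm that it still decodes as UTF-8.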

It is assumed that the Unicode Composite text files to be converted were saved in UTF-8 format.

Note: It is recommended that no files be open in Microsoft Word/Excel while you convert Word/Excel documents. Open files may cause errors or slow down the conversion process.

Tip: Keep the number of text boxes in Word documents to a minimum; having too many slows down conversion significantly.