Recode commandline utility

Release 1.3 - ...

Recode is a commandline utility that converts a text file from one encoding to another. For example, it can convert a file that uses the Windows-1252 codepage to UTF-8 so that the file can be transferred to a Unix system. The commandline tool itself is a Windows program that requires the .NET framework.

Versions:

  • Recode utility: v0.1
  • .NET framework: v3.5
  • This information: 2010-10-10

Use case

The utility was introduced in a scenario where received files had to be loaded into a SQL Server database. The files were loaded with BULK INSERT using .fmt format files. This approach supports files in Windows-1252 or UTF-16 encoding. The supplied files, however, were encoded in UTF-8. An extra step using the recode utility solved the issue.

Invocation

Invoke recode from a Windows command prompt (DOS box).

recode infile outfile /help /? /list /info /in= /out= /bom /force /auto /test

Invoking recode without parameters prints a brief usage summary:

Usage:
recode infile outfile /help /? /list /info /in= /out= /bom /force /auto /test

Recodes an input file to an output file. Version 0.1.

infile   Specification of the input file.
outfile  Specification of the output file.
/help    Requests the usage summary (this text).
/?       Equal to /help.
/list    Requests a list of the supported encodings.
/info    Displays information about the encoding of the input file.
/in=     Encoding of the input file. Use the name, code page or description of
         the encoding as returned by /list.
/out=    Encoding of the output file. Supply a value as for /in=.
/[no]bom Produce an output file with or without byte order mark. The default is
         /bom.
/force   Forces recoding, even if there is a mismatch between the contents of
         the input file and the encoding specified for the input file.
/auto    Autodetect the encoding of the input file. This works for UTF with BOM,
         and this works for ASCII. This does not work for code page encodings.
/test    Test by performing an inverse recode of the output file and by
         comparing the result to the original input file.

A typical invocation:

recode in.txt out.txt /in=1252 /out=utf-8
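
To make the effect of the typical invocation above concrete, the following Python sketch performs the same kind of conversion on a byte string. It is an illustration only, not the utility's implementation: the function name recode_bytes and its parameters are hypothetical, and the sketch assumes that the documented default /bom results in a UTF-8 byte order mark at the start of the output.

```python
import codecs


def recode_bytes(data, in_enc="cp1252", out_enc="utf-8", bom=True):
    """Decode `data` from in_enc and re-encode it as out_enc.

    Hypothetical helper for illustration; it only handles the BOM for UTF-8.
    """
    # Strict decoding: raises UnicodeDecodeError if a byte is undefined in in_enc.
    text = data.decode(in_enc)
    out = text.encode(out_enc)
    # Python's "utf-8" codec never writes a BOM, so add it explicitly.
    if bom and codecs.lookup(out_enc).name == "utf-8":
        out = codecs.BOM_UTF8 + out
    return out


# "café €5" in Windows-1252: é is byte 0xE9, € is byte 0x80.
sample = b"caf\xe9 \x805"
converted = recode_bytes(sample)
converted_nobom = recode_bytes(sample, bom=False)
```

In UTF-8 the two non-ASCII characters take two and three bytes respectively, so the output is longer than the input even without the byte order mark.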

Parameters

Commandline parameters

infile
    Specification of the input file.

outfile
    Specification of the output file.

/help
    Results in a usage summary.

/?
    Results in a usage summary.

/list (default: /nolist)
    Produces a list of the supported encodings. When using /list the utility performs no further operation.

/info
    Analyzes the input file and displays the encoding information that could be detected. How much information is available depends on the presence of a byte order mark and on the file contents. When using /info the utility performs no further operation. /info is the default if the commandline contains one file specification. /noinfo is the default if the commandline contains two file specifications.

/in=
    Specification of the input file encoding, for example /in=1252 or /in=Windows-1252. Specify an encoding name, a codepage number or an encoding description as returned by /list. Typical encodings are utf-16, Windows-1252, us-ascii, iso-8859-1 and utf-8.

/out=
    Specification of the output file encoding, as for /in=.

/[no]bom (default: /bom)
    Use /bom to let the output file start with a byte order mark. This applies to the UTF encodings. The byte order mark helps text editors and other applications detect the proper encoding. Use /nobom to omit the byte order mark.

/force (default: /noforce)
    The utility analyzes the input file and in a number of cases is able to detect its encoding. This makes it possible to report a mismatch between the detected encoding and the encoding specified with /in=. Use /force to report a mismatch as a warning and attempt the recoding anyway. Use /noforce to report a mismatch as an error and abort recoding.

/auto (default: /noauto)
    Requests autodetection of the encoding of the input file. This works for UTF-8, UTF-16 and UTF-32 files that have a byte order mark, and for files that are entirely ASCII. It also works for UTF-8 files without a byte order mark, with some risk that a file is well-formed UTF-8 but was not actually written as UTF-8. It does not work for files that use a codepage. If the utility fails to autodetect the encoding, it reports an error.

/test (default: /notest)
    Tests the result of the recoding by performing an inverse recoding of the output file and comparing that result to the original input file. This makes it possible to detect a loss of information when recoding to a narrower encoding, for example from Windows-1252 to ASCII or from UTF-8 to Windows-1252. A successful test does not prove that the assumed encoding of the input file is correct. It is, for example, possible to recode a binary file such as a jpg image from Windows-1252 to UTF-8. The inverse recoding reproduces the original input file, yet the input file is not Windows-1252.
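
The descriptions of /auto and /test above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the utility's source: the names detect_encoding and round_trip_ok are hypothetical, the detection order is one plausible reading of the /auto description, and the sketch assumes that characters missing from the output encoding are replaced by "?" during recoding.

```python
import codecs

# Check the UTF-32 BOMs before UTF-16: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_encoding(data):
    """Return an encoding name, or None when detection fails (codepage files)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    if all(byte < 0x80 for byte in data):
        return "ascii"
    try:
        data.decode("utf-8")  # strict decode: rejects malformed sequences
    except UnicodeDecodeError:
        return None
    # Well-formed UTF-8 without a BOM: probably UTF-8, but not guaranteed.
    return "utf-8"


def round_trip_ok(data, in_enc, out_enc):
    """Recode, recode back, and compare with the original bytes."""
    try:
        # Assumption: unencodable characters are replaced by "?".
        recoded = data.decode(in_enc).encode(out_enc, errors="replace")
        return recoded.decode(out_enc).encode(in_enc) == data
    except UnicodeError:
        return False
```

With these helpers, round_trip_ok(b"caf\xe9", "cp1252", "ascii") is False because the é is lost, while round_trip_ok(bytes(range(256)), "latin-1", "utf-8") is True even though the input is arbitrary binary data. This mirrors the caveat that a successful test does not prove the assumed input encoding is correct.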

Commandline result

The commandline utility returns 0 on success and 1 on failure. This makes it possible to integrate the utility in batch files and to act on failures. The following cases are reported as failures:

  • Missing information on the commandline, for example a missing specification of the output encoding.
  • A missing input file or an input file that cannot be read.
  • Failure to write the output file, for example due to permissions or because of a lack of disk space.
  • A mismatch between the encoding detected for the input file and the encoding specified on the commandline, unless /force is specified.
  • An attempt to encode the output file as ASCII when non-ASCII characters have already been detected in the input file, unless /force is specified.
  • An error during the actual decoding of the input file or encoding of the output file.
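
The exit code convention above can also be used from a script that drives the utility. The following Python sketch shows one way to do that. The helper names build_recode_command and run_step are hypothetical, and the sketch assumes that recode is available on the PATH. It uses only the documented /in= and /out= parameters.

```python
import subprocess
import sys


def build_recode_command(infile, outfile, in_enc, out_enc, exe="recode"):
    """Build the argument list for a recode call (exe name is an assumption)."""
    return [exe, infile, outfile, f"/in={in_enc}", f"/out={out_enc}"]


def run_step(command):
    """Run a command and return True when it exits with code 0."""
    completed = subprocess.run(command)
    if completed.returncode != 0:
        # Any non-zero exit code is treated as a failure.
        print(f"step failed with exit code {completed.returncode}", file=sys.stderr)
        return False
    return True
```

A load script could call run_step(build_recode_command("in.txt", "out.txt", "1252", "utf-8")) and skip the subsequent database import when the result is False.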

Resources

The commandline utility, the source and a manual page are available for download in the see-also section.
