Recode commandline utility
Recode is a commandline utility that converts a text file from one encoding to another. For example, it can convert a file that uses the Windows-1252 code page to UTF-8 so the file can be transferred to a Unix system. The commandline tool itself is a Windows program that requires the .NET Framework. Versions:
- Recode utility: v0.1
- .NET framework: v3.5
- This information: 2010-10-10
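To show why such a conversion is needed, the following Python snippet encodes the same text in Windows-1252 and in UTF-8. It then shows what happens when the bytes are read with the wrong encoding. It is only an illustration: recode itself is a .NET program, and Python names the Windows-1252 code page "cp1252".

```python
# The same text as raw bytes in two encodings.
text = "café €"

cp1252_bytes = text.encode("cp1252")  # Windows-1252: one byte per character
utf8_bytes = text.encode("utf-8")     # UTF-8: "é" takes 2 bytes, "€" takes 3

print(cp1252_bytes)  # b'caf\xe9 \x80'
print(utf8_bytes)    # b'caf\xc3\xa9 \xe2\x82\xac'

# Reading Windows-1252 bytes as UTF-8 fails outright...
try:
    cp1252_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)

# ...while reading UTF-8 bytes as Windows-1252 silently produces garbage.
print(utf8_bytes.decode("cp1252"))  # cafÃ© â‚¬
```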
Use case
The utility was introduced for a scenario in which received files had to be loaded into a SQL Server database. The files were loaded with BULK INSERT using .fmt format files. This approach supported files encoded in Windows-1252 or UTF-16. The supplied files, however, were encoded in UTF-8. An extra step using the recode utility solved the issue.
Invocation
Invoke recode from a command prompt (a DOS box) using the following syntax:
```
recode infile outfile /help /? /list /info /in= /out= /bom /force /auto /test
```
Invoking recode without parameters prints a brief usage summary:
```
Usage: recode infile outfile /help /? /list /info /in= /out= /bom /force /auto /test

Recodes an input file to an output file. Version 0.1.

  infile     Specification of the input file.
  outfile    Specification of the output file.
  /help      Requests the usage summary (this text).
  /?         Equal to /help.
  /list      Requests a list of the supported encodings.
  /info      Displays information about the encoding of the input file.
  /in=       Encoding of the input file. Use the name, code page or
             description of the encoding as returned by /list.
  /out=      Encoding of the output file. Supply a value as for /in=.
  /[no]bom   Produce an output file with or without byte order mark.
             The default is /bom.
  /force     Forces recoding, even if there is a mismatch between the
             contents of the input file and the encoding specified for
             the input file.
  /auto      Autodetect the encoding of the input file. This works for
             UTF with BOM, and this works for ASCII. This does not work
             for code page encodings.
  /test      Test by performing an inverse recode of the output file and
             by comparing the result to the original input file.
```
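The /auto rules in the usage text can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the utility's .NET implementation. The function name autodetect and the returned names (Python codec names) are hypothetical. The sketch checks for a byte order mark first, then for pure ASCII, and finally tries a strict UTF-8 decode. That last step carries the risk that a code-page file happens to be valid UTF-8.

```python
import codecs
from typing import Optional, Tuple

# UTF-32 LE must be tested before UTF-16 LE: its BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def autodetect(data: bytes) -> Optional[Tuple[str, bool]]:
    """Guess the encoding of data; return (codec name, has_bom) or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, True
    if all(b < 0x80 for b in data):
        return "ascii", False        # pure 7-bit data (also true for an empty file)
    try:
        data.decode("utf-8")         # strict decode: is this well-formed UTF-8?
        return "utf-8", False        # probably UTF-8, but not guaranteed
    except UnicodeDecodeError:
        return None                  # likely a code page; which one cannot be told
```

For example, `autodetect("café".encode("cp1252"))` returns None. This matches the documented limitation that /auto does not work for code page encodings.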
A typical invocation, converting a Windows-1252 file to UTF-8, is:
```
recode in.txt out.txt /in=1252 /out=utf-8
```
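The invocation above decodes the input with one encoding and encodes the result with another. Here is a minimal Python sketch of that step. It assumes strict error handling and the default /bom behavior. The recode function and the BOM_FOR table are hypothetical, and Python spells the code page "cp1252" rather than "1252".

```python
import codecs
from pathlib import Path

# BOM written per output encoding when bom=True; code pages get none.
BOM_FOR = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
    "utf-32-le": codecs.BOM_UTF32_LE,
    "utf-32-be": codecs.BOM_UTF32_BE,
}


def recode(infile: str, outfile: str, enc_in: str, enc_out: str, bom: bool = True) -> None:
    """Decode infile with enc_in and write it to outfile encoded with enc_out."""
    text = Path(infile).read_bytes().decode(enc_in)   # raises on undecodable bytes
    if text.startswith("\ufeff"):
        text = text[1:]                                # drop an input BOM so it is not duplicated
    data = text.encode(enc_out)                        # raises on unencodable characters
    if bom:
        data = BOM_FOR.get(enc_out.lower(), b"") + data
    Path(outfile).write_bytes(data)


# Roughly corresponds to: recode in.txt out.txt /in=1252 /out=utf-8
# recode("in.txt", "out.txt", "cp1252", "utf-8")
```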
Parameters
Parameter | Default | Description |
---|---|---|
infile | | Specification of the input file. |
outfile | | Specification of the output file. |
/help | | Displays a usage summary. |
/? | | Displays a usage summary; same as /help. |
/list | /nolist | Produces a list of the supported encodings. When /list is used, the utility performs no further operation. |
/info | /info with one file specification, /noinfo with two | Analyzes the input file and displays the encoding information that could be detected. How much information is available depends on whether the file has a byte order mark and on the file contents. When /info is used, the utility performs no further operation. |
/in= | | Encoding of the input file, for example /in=1252 or /in=Windows-1252. Specify an encoding name, a code page number or an encoding description as returned by /list. Typical encodings are utf-16, Windows-1252, us-ascii, iso-8859-1 and utf-8. |
/out= | | Encoding of the output file, specified as for /in=. |
/[no]bom | /bom | Use /bom to start the output file with a byte order mark. This applies to the UTF encodings. The byte order mark helps text editors and other applications detect the proper encoding. Use /nobom to omit the byte order mark. |
/force | /noforce | The utility analyzes the input file and in some cases can detect its encoding. This makes it possible to report a mismatch between the detected encoding and the encoding specified with /in=. Use /force to report a mismatch as a warning and attempt the recoding anyway. Use /noforce to report a mismatch as an error and abort recoding. |
/auto | /noauto | Autodetects the encoding of the input file. This works for UTF-8, UTF-16 and UTF-32 files that have a byte order mark, and for files that are entirely ASCII. It also works for UTF-8 files without a byte order mark, with some risk that a file conforms to the UTF-8 format without actually being UTF-8. It does not work for files that use a code page. If the utility fails to autodetect the encoding, it reports an error. |
/test | /notest | Tests the result by recoding the output file back to the input encoding and comparing that result to the original input file. This detects a loss of information when recoding to a narrower encoding, for example from Windows-1252 to ASCII or from UTF-8 to Windows-1252. A successful test does not prove that the encoding assumed for the input file is correct. For example, a binary file such as a JPEG image can be recoded from Windows-1252 to UTF-8, and the inverse recoding reproduces the original file, even though the file is not Windows-1252 text. |
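The /test idea can be illustrated with a small Python sketch. It assumes that unencodable characters are replaced by "?" during the forward recoding, because the utility's actual fallback behavior is not documented here. recode_bytes and inverse_test are hypothetical names. The first case shows the test catching a loss when narrowing UTF-8 to Windows-1252. The second shows the start of a JPEG file passing the test even though it is not text.

```python
def recode_bytes(data: bytes, enc_in: str, enc_out: str) -> bytes:
    """Forward recoding; characters missing from enc_out become '?' (assumption)."""
    return data.decode(enc_in).encode(enc_out, errors="replace")


def inverse_test(original: bytes, recoded: bytes, enc_in: str, enc_out: str) -> bool:
    """Recode the output back to enc_in and compare with the original bytes."""
    back = recoded.decode(enc_out).encode(enc_in)
    return back == original


# Narrowing UTF-8 -> Windows-1252: "Ω" has no Windows-1252 code, "€" does (0x80).
text_in = "Ω = 5 €".encode("utf-8")
narrow = recode_bytes(text_in, "utf-8", "cp1252")
print(narrow)                                             # b'? = 5 \x80'
print(inverse_test(text_in, narrow, "utf-8", "cp1252"))   # False: information was lost

# Start of a JPEG file treated as Windows-1252: these particular bytes all map to
# characters, so the round trip succeeds even though the data is not text at all.
jpeg_head = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10]) + b"JFIF\x00"
as_utf8 = recode_bytes(jpeg_head, "cp1252", "utf-8")
print(inverse_test(jpeg_head, as_utf8, "cp1252", "utf-8"))  # True
```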
Commandline result
The commandline utility returns 0 on success and 1 on failure. This makes it possible to call the utility from batch files and to take action when it fails. The following cases are reported as failures:
- Missing information on the commandline, for example a missing specification of the output encoding.
- A missing input file or an input file that cannot be read.
- Failure to write the output file, for example due to permissions or because of a lack of disk space.
- A mismatch between the encoding detected for the input file and the encoding specified on the commandline, unless /force is specified.
- An attempt to encode the output file as ASCII when non-ASCII characters have been detected in the input file, unless /force is specified.
- An error during the actual decoding of the input file or encoding of the output file.
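The following Python sketch shows how the failure cases listed above can map onto exit codes 0 and 1. It is not the utility's code, and several parts are assumptions. run is a hypothetical name. The mismatch check is reduced to the case where the input is declared ASCII but contains non-ASCII bytes. Treating /force as "replace undecodable or unencodable data" is also an assumption.

```python
import sys
from pathlib import Path

ASCII_NAMES = {"ascii", "us-ascii"}


def run(infile: str, outfile: str, enc_in: str, enc_out: str, force: bool = False) -> int:
    """Recode infile to outfile; return 0 on success and 1 on failure."""
    if not all([infile, outfile, enc_in, enc_out]):
        print("error: missing file or encoding specification", file=sys.stderr)
        return 1
    try:
        data = Path(infile).read_bytes()
    except OSError as exc:                         # missing or unreadable input
        print(f"error: cannot read {infile}: {exc}", file=sys.stderr)
        return 1
    # Simplified mismatch check: declared ASCII, but bytes >= 0x80 are present.
    if enc_in.lower() in ASCII_NAMES and any(b >= 0x80 for b in data):
        if not force:
            print("error: input is not ASCII", file=sys.stderr)
            return 1
        print("warning: input is not ASCII, recoding anyway", file=sys.stderr)
    errors = "replace" if force else "strict"      # assumption: /force tolerates bad data
    try:
        text = data.decode(enc_in, errors=errors)
        # ASCII output requested although the input holds non-ASCII characters.
        if enc_out.lower() in ASCII_NAMES and not text.isascii() and not force:
            print("error: input has non-ASCII characters", file=sys.stderr)
            return 1
        out = text.encode(enc_out, errors=errors)
    except (UnicodeError, LookupError) as exc:     # decode/encode failure or unknown name
        print(f"error: {exc}", file=sys.stderr)
        return 1
    try:
        Path(outfile).write_bytes(out)
    except OSError as exc:                         # permissions, disk full, ...
        print(f"error: cannot write {outfile}: {exc}", file=sys.stderr)
        return 1
    return 0
```

From a batch file, the real utility's exit code can be tested with `if errorlevel 1` directly after the recode call.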
Resources
The commandline utility, the source and a manual page are available for download in the see-also section.