Encoding conversion in Java

Is there any free Java library that I can use to convert a string from one encoding to another, something like iconv? I'm using Java version 1.3.

Asked by: Ryan361 | Posted: 28-01-2022

Answer 1

You don't need a library beyond the standard one - just use Charset. (You can just use the String constructors and getBytes methods, but personally I don't like just working with the names of character encodings. Too much room for typos.)

EDIT: As pointed out in comments, you can still use Charset instances but have the ease of use of the String methods: new String(bytes, charset) and String.getBytes(charset).
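
A minimal sketch of that approach, converting ISO-8859-1 bytes to UTF-8 bytes by way of a String. The class and method names (Transcode, convert) are made up for illustration. Note that Charset needs Java 1.4 or later and the Charset-taking String overloads need Java 6, so on Java 1.3 only the name-based forms such as new String(bytes, "ISO-8859-1") would be available.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper class, not part of any library.
final class Transcode {
    // Decode the bytes using the source charset, then re-encode with the target.
    static byte[] convert(byte[] input, Charset from, Charset to) {
        String text = new String(input, from);
        return text.getBytes(to);
    }

    public static void main(String[] args) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        Charset utf8 = Charset.forName("UTF-8");

        // "caf" followed by e-acute (0xE9) in ISO-8859-1
        byte[] latin1Bytes = { 'c', 'a', 'f', (byte) 0xE9 };
        byte[] utf8Bytes = convert(latin1Bytes, latin1, utf8);

        // prints [99, 97, 102, -61, -87]: the accented letter becomes 0xC3 0xA9
        System.out.println(Arrays.toString(utf8Bytes));
    }
}
```

Looking the Charset up once and passing the instance around keeps the encoding names in a single place, which is the point made above about typos.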

See "URL Encoding (or: 'What are those "%20" codes in URLs?')".

Answered by: Walter993 | Posted: 01-03-2022

Answer 2

CharsetDecoder should be what you are looking for, no?

Many network protocols and files store their characters with a byte-oriented character set such as ISO-8859-1 (ISO-Latin-1).
However, Java's native character encoding is Unicode UTF16BE (Sixteen-bit UCS Transformation Format, big-endian byte order).

See Charset. That doesn't mean UTF-16 is the default charset (i.e. the default "mapping between sequences of sixteen-bit Unicode code units and sequences of bytes"):

Every instance of the Java virtual machine has a default charset, which may or may not be one of the standard charsets.
[US-ASCII, ISO-8859-1 a.k.a. ISO-LATIN-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16]
The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system.
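
To see what this means on a particular machine, something like the following can be run. It is only a sketch: the class name CharsetInfo is invented here, and Charset.defaultCharset() requires Java 5 or later.

```java
import java.nio.charset.Charset;

// Hypothetical class name, used only for this illustration.
final class CharsetInfo {
    // The six standard charsets listed above.
    static final String[] STANDARD = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    // Returns true if the running JVM supports every standard charset.
    static boolean allStandardSupported() {
        for (String name : STANDARD) {
            if (!Charset.isSupported(name)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The value printed here depends on the platform and its settings.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Standard charsets supported: " + allStandardSupported());
    }
}
```

Because the default varies between machines, code that converts between encodings is usually safer naming the charset explicitly rather than relying on the default.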

This example demonstrates how to convert ISO-8859-1 encoded bytes in a ByteBuffer to a string in a CharBuffer and vice versa.

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;

// Create the encoder and decoder for ISO-8859-1
Charset charset = Charset.forName("ISO-8859-1");
CharsetDecoder decoder = charset.newDecoder();
CharsetEncoder encoder = charset.newEncoder();

try {
    // Convert a string to ISO-8859-1 bytes in a ByteBuffer.
    // The new ByteBuffer is ready to be read.
    ByteBuffer bbuf = encoder.encode(CharBuffer.wrap("a string"));

    // Convert ISO-8859-1 bytes in a ByteBuffer to a CharBuffer and then to a string.
    // The new CharBuffer is ready to be read.
    CharBuffer cbuf = decoder.decode(bbuf);
    String s = cbuf.toString();
} catch (CharacterCodingException e) {
    // Malformed input, or a character that ISO-8859-1 cannot represent
}
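
The same decoder/encoder pattern can be stretched into an iconv-style conversion between two different encodings. The following is a rough sketch under that assumption; NioTranscode and convert are illustrative names, not an existing API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

// Hypothetical helper class for illustration only.
final class NioTranscode {
    // Decode with the source charset, then encode with the target charset.
    // Freshly created decoders and encoders report malformed or unmappable
    // input by throwing CharacterCodingException rather than substituting
    // a replacement character.
    static byte[] convert(byte[] input, String from, String to)
            throws CharacterCodingException {
        CharBuffer chars = Charset.forName(from).newDecoder().decode(ByteBuffer.wrap(input));
        ByteBuffer out = Charset.forName(to).newEncoder().encode(chars);
        byte[] result = new byte[out.remaining()];
        out.get(result);
        return result;
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] utf8 = { (byte) 0xC3, (byte) 0xA9 }; // e-acute in UTF-8
        byte[] latin1 = convert(utf8, "UTF-8", "ISO-8859-1");
        // prints "1 byte(s), first = 0xe9"
        System.out.println(latin1.length + " byte(s), first = 0x"
                + Integer.toHexString(latin1[0] & 0xFF));
    }
}
```

Unlike the String constructors and getBytes, which silently substitute a replacement for anything they cannot convert, this version fails loudly, for example when asked to encode the euro sign as ISO-8859-1.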

Answered by: Aldus688 | Posted: 01-03-2022

Answer 3

I would just like to add that if the String was originally encoded using the wrong encoding, it might be impossible to convert it to another encoding without errors. The question does not say that the conversion here is from a wrong encoding to the correct one, but I personally stumbled on this question because of exactly that situation, so this is a heads-up for others as well.

This answer to another question explains why the conversion does not always yield correct results: https://stackoverflow.com/a/2623793/4702806
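
A small sketch of the problem, assuming UTF-8 bytes that were mistakenly decoded with some other charset. The class and method names are invented for this illustration.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not taken from the linked answer.
final class WrongCharsetDemo {
    static final Charset UTF8 = Charset.forName("UTF-8");
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");
    static final Charset ASCII = Charset.forName("US-ASCII");

    // Decode UTF-8 bytes with the wrong charset, then try to undo the mistake
    // by re-encoding with that same charset and decoding as UTF-8.
    static String roundTrip(String original, Charset wrong) {
        String garbled = new String(original.getBytes(UTF8), wrong);
        return new String(garbled.getBytes(wrong), UTF8);
    }

    public static void main(String[] args) {
        String original = "caf\u00E9";
        // ISO-8859-1 maps every byte to a character, so the bytes survive.
        System.out.println(original.equals(roundTrip(original, LATIN1))); // true
        // US-ASCII has no mapping for bytes >= 0x80; they are replaced and lost.
        System.out.println(original.equals(roundTrip(original, ASCII)));  // false
    }
}
```

Whether the damage can be undone therefore depends on whether the wrong charset happened to preserve every byte; once bytes have been replaced, the original text is gone.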

Answered by: Brad414 | Posted: 01-03-2022

Answer 4

It is a whole lot easier if you think of Unicode as a character set (which it actually is: very basically, the numbered set of all known characters). You can encode it as UTF-8 (1-4 bytes per character, depending on the character) or maybe UTF-16 (2 bytes per character, or 4 bytes using surrogate pairs).
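
To make the difference concrete, here is a short sketch that measures how many bytes a few sample characters occupy in each encoding. EncodedLength and byteLength are illustrative names; UTF-16BE is used rather than UTF-16 so that no byte order mark is counted.

```java
import java.nio.charset.Charset;

// Hypothetical class name, for illustration only.
final class EncodedLength {
    // Number of bytes the string occupies in the named charset.
    static int byteLength(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",             // U+0041
            "\u00E9",        // U+00E9, e with acute accent
            "\u20AC",        // U+20AC, euro sign
            "\uD83D\uDE00"   // U+1F600, outside the Basic Multilingual Plane
        };
        for (String s : samples) {
            System.out.println("U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase()
                    + ": UTF-8 = " + byteLength(s, "UTF-8")
                    + " bytes, UTF-16BE = " + byteLength(s, "UTF-16BE") + " bytes");
        }
    }
}
```

The four samples take 1, 2, 3 and 4 bytes in UTF-8, and 2, 2, 2 and 4 bytes in UTF-16BE.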

Back in the mists of time Java used UCS-2 to encode the Unicode character set. This could only handle 2 bytes per character and is now obsolete. It was a fairly obvious hack to add surrogate pairs and move up to UTF-16.
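
One visible consequence of surrogate pairs is that String.length() counts UTF-16 code units, not characters. A minimal sketch (SurrogateDemo is a made-up name; codePointCount requires Java 5 or later):

```java
// Hypothetical demo class, for illustration only.
final class SurrogateDemo {
    // Number of Unicode code points, as opposed to UTF-16 code units.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00"; // U+1F600 stored as a surrogate pair
        System.out.println(s.length());                             // 2
        System.out.println(codePoints(s));                          // 1
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
    }
}
```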

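A lot of people think they should have used UTF-8 in the first place. When Java was originally written, though, Unicode defined fewer than 65,536 characters, so a fixed 16-bit encoding looked sufficient; characters beyond that range were only assigned later.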

Answered by: Jack966 | Posted: 01-03-2022
