AWT TextArea seems to count a CR-LF sequence as one character

I have this problem:

The text "ABCD\r\nEFGHJ", loaded from a file, is matched against the Java regex "EFGH". The Matcher object, of course, reports that the match starts at position 6, because it counts \r and \n as two positions.

I put the original text in an AWT TextArea component and then call select(6,10) to highlight the matched area. Guess what... the highlight starts at the letter 'F', one position further than it should.

If more than one CR-LF pair precedes the matched area, the highlight shifts forward by one position per pair.

Does anyone have a simple solution?

Asked by: Anna658 | Posted: 21-01-2022

Answer 1

Simple solution: remove all \r from the text... :-P

Not as stupid as it sounds, unless you have inconsistent end of lines (it can happen) and want to keep them unchanged... And that's probably what the component does anyway.
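A minimal sketch of that idea, assuming the component counts each line break as a single position, as the question reports. The class and method names are made up for illustration, and the TextArea calls are left as a comment so the example runs without a display:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripCarriageReturns {

    // Returns a copy of the text with every '\r' removed; the input is not modified.
    static String stripCr(String text) {
        return text.replace("\r", "");
    }

    public static void main(String[] args) {
        String original = "ABCD\r\nEFGHJ";
        String display = stripCr(original); // "ABCD\nEFGHJ"

        Matcher m = Pattern.compile("EFGH").matcher(display);
        if (m.find()) {
            // Offsets are now relative to the text without '\r'.
            System.out.println(m.start() + ".." + m.end());
            // With an AWT TextArea one would then do something like:
            // textArea.setText(display);
            // textArea.select(m.start(), m.end());
        }
    }
}
```

Because the regex runs on the stripped string, the match offsets and the offsets passed to select() refer to the same text.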

Answered by: Dainton926 | Posted: 22-02-2022

Answer 2

I can't mess with the text because it is protocol data, and the \r and \n characters have semantics that have nothing to do with display or line separation. I just want a component that treats each input character separately and counts it as one displayed character, no matter how it is rendered.

Answered by: Max686 | Posted: 22-02-2022

Answer 3

If the \r\n pairs are consistent, you can remove the \r's before running the regex, then put them back before handing the text off to whatever comes next. Or change a copy, if that works better. This way, your regex finds the position in a way consistent with what AWT is expecting.
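A related sketch that leaves the data untouched. It is a different technique from the one described above: it runs the regex on the original text and then shifts the offsets by the number of \r\n pairs in front of them. It assumes the component counts every \r\n pair as one position, as the question reports; the class name CrlfOffsets and the method toDisplayIndex are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CrlfOffsets {

    // Translates an index in the original text into the index the component
    // is assumed to use, by subtracting one for every complete "\r\n" pair
    // that lies before that index.
    static int toDisplayIndex(String text, int index) {
        int pairs = 0;
        for (int i = 0; i + 1 < index; i++) {
            if (text.charAt(i) == '\r' && text.charAt(i + 1) == '\n') {
                pairs++;
            }
        }
        return index - pairs;
    }

    public static void main(String[] args) {
        String original = "ABCD\r\nEFGHJ"; // protocol data, never modified

        Matcher m = Pattern.compile("EFGH").matcher(original);
        if (m.find()) {
            int start = toDisplayIndex(original, m.start()); // 6 -> 5
            int end = toDisplayIndex(original, m.end());     // 10 -> 9
            System.out.println(start + ".." + end);
            // textArea.select(start, end);
        }
    }
}
```

Lone \r or \n characters are not adjusted here; whether the component also treats those specially would need to be checked on the target platform.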

Answered by: Julian820 | Posted: 22-02-2022
