Channel: UltraEdit, UltraCompare, UEStudio forums

Find/Replace/Regular Expressions • Regex find/replace messing up diacritic characters

I am working with a large text file and need to insert a line break after every character in the text. Unlike the other text tools I have tried, UltraEdit handles this amount of text really quickly (pretty much everything else times out).

I can use (Perl) regex to find each character using either

Code:

(.)
or

Code:

(\X)

and replace with

Code:

$1\n
This inserts the line breaks, but UltraEdit corrupts every character that carries a diacritic in the original string, replacing it with Unicode replacement characters. So, for example, if my input string is

Code:

Union à Dieuchez Denys l'Aréopagite
the output of the find/replace operation is 

Code:

Union �� Dieuchez Denys l'Ar��opagite
Each of the accented characters à and é is replaced by a sequence of two U+FFFD REPLACEMENT CHARACTER code points.
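Two replacement characters per accented letter would be consistent with the file being stored as UTF-8 and the replace acting on single bytes instead of whole characters, since à and é each take two bytes in UTF-8. That is only a guess about the cause, not something confirmed for UltraEdit. The small Python sketch below (the function names are made up for illustration) reproduces the same symptom by putting a line break after every byte and then decoding the result:

```python
def split_bytes_with_newlines(text: str) -> str:
    """Insert a newline after every UTF-8 *byte*, then decode leniently.

    Assumption: this imitates a tool that treats each byte as a character.
    """
    raw = text.encode("utf-8")
    pieces = [bytes([b]) for b in raw]            # one piece per byte
    broken = b"".join(p + b"\n" for p in pieces)  # newline after each byte
    # Orphaned lead and continuation bytes each decode to U+FFFD.
    return broken.decode("utf-8", errors="replace")


def split_chars_with_newlines(text: str) -> str:
    """Insert a newline after every code point (the intended result)."""
    return "".join(ch + "\n" for ch in text)


sample = "\u00e0"  # à as a single precomposed code point
byte_wise = split_bytes_with_newlines(sample)
char_wise = split_chars_with_newlines(sample)

print(byte_wise.count("\ufffd"))  # number of replacement characters
print(char_wise == "\u00e0\n")    # character survives when split per code point
```

If this is what is happening, the thing to check would be the encoding UltraEdit detected for the file, since the regex itself looks correct.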


Is there a way to prevent this? I tested short strings like this in TextMate and Sublime Text, and they didn’t mess up the diacritics like this, but they can’t handle my large text file.
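As a fallback outside the editor, the same transformation can be sketched as a short Python script that streams the file in chunks, so the whole text never has to fit in memory. This assumes the file is UTF-8; the function name and chunk size are arbitrary choices for illustration. It splits on code points like `(.)`, not on grapheme clusters like `(\X)`, so a base letter followed by a separate combining accent would end up on two lines.

```python
import io


def insert_linebreaks(src, dst, chunk_chars=1 << 20):
    """Copy text from src to dst, adding "\n" after every character.

    src and dst are text-mode file objects, so decoding is done per
    character and multi-byte UTF-8 sequences are never cut in half.
    Existing line-break characters are copied unchanged, mirroring
    the fact that "." does not match a newline.
    """
    while True:
        chunk = src.read(chunk_chars)
        if not chunk:
            break
        dst.write("".join(ch if ch in "\r\n" else ch + "\n" for ch in chunk))


# In-memory demonstration with a deliberately tiny chunk size.
demo_in = io.StringIO("l'Ar\u00e9")
demo_out = io.StringIO()
insert_linebreaks(demo_in, demo_out, chunk_chars=2)
print(demo_out.getvalue())
```

For a real file, the two objects would be opened with something like `open("input.txt", encoding="utf-8", newline="")` and `open("output.txt", "w", encoding="utf-8", newline="")`, where both file names are placeholders.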

statistics: Posted by tiro_j19:26 - 1 day ago — Replies 2 — Views 37


