I am working with a large text file and need to insert a line break after every character in the text. Unlike other text tools I have tried, UltraEdit is able to handle the large amount of text really quickly (pretty much everything else I have tried times out).
I can use a (Perl) regex to find each character, using either
Code:
(.)
or
Code:
(\X)
and replace with
Code:
$1\n
This inserts the line breaks, but UltraEdit loses any characters with diacritics in the original string, replacing them with replacement characters. So, for example, if my input string is
Code:
Union à Dieuchez Denys l'Aréopagite
the output of the find/replace operation is
Code:
Union �� Dieuchez Denys l'Ar��opagite
Each of the individual characters à and é is being replaced by a sequence of two U+FFFD REPLACEMENT CHARACTER codes.
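One possible explanation, which I have not confirmed for UltraEdit itself: if the file is UTF-8 and the find/replace treats each byte as a character, then à and é (two bytes each in UTF-8) get split into two invalid fragments, and each fragment is shown as U+FFFD. The Python sketch below only reproduces that symptom under this assumption; the function name `split_bytes_then_decode` is made up for illustration.

```python
def split_bytes_then_decode(text: str) -> str:
    """Encode to UTF-8, then decode every byte on its own.

    A lone byte of a multi-byte sequence is not valid UTF-8, so each one
    decodes to U+FFFD when errors="replace" is used.
    """
    raw = text.encode("utf-8")
    return "".join(bytes([b]).decode("utf-8", errors="replace") for b in raw)


# à and é are written as escapes so the precomposed forms are unambiguous.
sample = "Union \u00e0 Dieuchez Denys l'Ar\u00e9opagite"
print(split_bytes_then_decode(sample))
```

Under this assumption the ASCII letters survive and each accented letter turns into exactly two replacement characters, which matches the output above.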
Is there a way to prevent this? I tested short strings like this in TextMate and Sublime Text, and they didn’t mess up the diacritics like this, but they can’t handle my large text file.
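As a fallback outside the editor, the same transformation could be sketched in Python by streaming the file in text mode, so that decoding happens before the split. This is not an UltraEdit setting, just a workaround sketch. It assumes the file is UTF-8; the names `break_after_each_char` and `break_file` and the chunk size are my own choices. Like `(.)`, it works per code point rather than per grapheme, so a decomposed accent (base letter plus combining mark) would still end up on its own line.

```python
import io


def break_after_each_char(src, dst, chunk_size=1 << 20):
    """Copy text from src to dst, adding a newline after every character.

    Existing newlines are copied unchanged, mirroring the fact that (.)
    does not match a line break by default.
    """
    while True:
        chunk = src.read(chunk_size)  # text mode: counts characters, not bytes
        if not chunk:
            break
        dst.write("".join(c if c == "\n" else c + "\n" for c in chunk))


def break_file(in_path, out_path, encoding="utf-8"):
    """Apply break_after_each_char to a file on disk (encoding is assumed)."""
    with open(in_path, encoding=encoding) as src, \
            open(out_path, "w", encoding=encoding) as dst:
        break_after_each_char(src, dst)


# Small in-memory check; a tiny chunk size forces several reads.
out = io.StringIO()
break_after_each_char(io.StringIO("Ar\u00e9o"), out, chunk_size=2)
print(out.getvalue())
```

Because the file is read in fixed-size chunks of decoded text, memory use stays bounded regardless of file size, and a multi-byte character is never cut in half at a chunk boundary.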
statistics: Posted by tiro_j — 19:26 - 1 day ago — Replies 2 — Views 37