Questions tagged [character-encoding]
Questions about representations of characters and character sets, such as ASCII, UTF-8, and EBCDIC. Often encountered when moving files between operating systems that encode line endings with carriage returns, newline characters, or both.
422 questions
6 votes
1 answer
314 views
Revert filenames after they were garbled by using different encoding
I have a file СМП бваг™вга† The first three letters are proper Cyrillic and the remaining part is mojibake. "Mojibake is the garbled or gibberish text that is the result of text being decoded ...
0 votes
1 answer
49 views
Output of echo uses different encoding than the one specified according to LANG and LC_CTYPE
It is my understanding that the LANG and LC_CTYPE environment variables define the encoding used by shell commands when writing to stdout. However, after executing LANG=de_DE.iso88591 LC_CTYPE=de_DE....
0 votes
2 answers
100 views
To have or not Byte Order Mark (BOM) in UTF-8 text files? (Linux)
Is it advisable to include a Byte Order Mark (BOM) in UTF-8 text files on Linux? Is it correct to say that byte order (even for multi-byte characters) is already strictly defined by the UTF-8 standard? ...
0 votes
0 answers
51 views
Advanced CLI tool/code to determine text encoding (besides enca)
Looking for an advanced CLI tool or code to determine a text's code page and language (besides enca). Goal: automate as much as possible the conversion of hundreds or thousands of 8-bit text files (including non-ASCII ...
0 votes
0 answers
19 views
file -i reports two different charsets for the same file on the same FS
I'm a bit confused about the behavior of the file -i command. I searched for a while and gave up, since I don't have sufficient knowledge of encodings or of the Linux file command (to stay concise ...
-2 votes
1 answer
56 views
Convert subtitles so they are encoded correctly (Polish characters and even `"` get wrongly encoded)
Wrong encoding: 1 00:01:27,879 --> 00:01:31,216 No i dupa. Koniec z darmowym wi-fi. 2 00:01:33,009 --> 00:01:34,972 - Ki-jung! - No? 3 00:01:35,219 --> 00:01:39,183 Kobieta z góry ...
0 votes
0 answers
47 views
What does a locale’s codeset get used for?
According to glibc’s manual: Most locale names follow XPG syntax and consist of up to four parts: language[_territory[.codeset]][@modifier] For example, you could have a locale named zh_CN.GB18030 ...
2 votes
1 answer
126 views
removing hidden control characters in filenames
I have a huge number of files spread across a large directory structure that have hidden control characters in their names. ls lists them as, e.g.: '614.7-4-F1-00-090-007-RozvadØ'$'\302\237'' RP1-...
1 vote
1 answer
136 views
How can I set the character set to Latin-1 or MCS when using serial-getty?
I'd like to use my old VT420 terminal as the system console. Adding RS232 ports and setting up serial-getty are not a problem, but: For years, almost all Linux distros have been using UTF-8 as the ...
11 votes
3 answers
2k views
UTF-8 characters in POSIX shell script *comments* - anything against it?
I would like to include a couple of non-ASCII characters in my POSIX shell script comments. Note this is in no way a duplicate of e.g. "Which character encodings are supported by posix?" as ...
0 votes
0 answers
16 views
Displaying European accents without Xorg [duplicate]
I'm setting up Arch Linux on a new computer, and I intend to use bspwm as a window manager. I know bspwm runs on Xorg, and there is plenty of information online on how to display region-specific ...
0 votes
1 answer
123 views
regex: how come the trademark symbol matches a-z?
Sorry if this is a repeat or a basic question, but it is hard to search for a ™. I'm writing a script to remove weird characters from file names. How come the trademark symbol ™ matches [^a-z] ??? $ ...
0 votes
1 answer
216 views
nmtui is not rendering correctly
When using nmtui, ┌ and │ are added in places where they definitely should not be (see attached screenshot): How can I solve this?
4 votes
2 answers
1k views
How can I convert full-width characters to half-width characters (and vice versa)?
Here is my simple problem: how can I convert half-width to full-width characters from the command line? I thought this would be built into my iconv command-line tool, but I did not find anything here: $ iconv -l | ...
0 votes
1 answer
1k views
If I have a JSON string, how do I calculate the number of bytes needed when stored?
I have a formatted JSON string displayed in a web page. I am trying to understand how many bytes this JSON string requires. If I copy it and pipe it to wc -c I get 1000 ...
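As a rough illustration of the byte-count question above: the stored size of a string depends on the encoding used to write it, not on its character count. The sketch below is not taken from the question or its answer. The helper name `stored_size` and the sample JSON document are hypothetical, chosen only to show how non-ASCII characters change the count.

```python
import json


def stored_size(text: str, encoding: str = "utf-8") -> int:
    """Return how many bytes `text` occupies once encoded (hypothetical helper)."""
    return len(text.encode(encoding))


# Hypothetical sample document; ensure_ascii=False keeps ë and ™ as literal
# characters instead of \uXXXX escapes.
sample = json.dumps({"name": "Zoë", "symbol": "™"}, ensure_ascii=False)

char_count = len(sample)                       # characters, not bytes
utf8_bytes = stored_size(sample)               # ë takes 2 bytes, ™ takes 3
utf16_bytes = stored_size(sample, "utf-16-le") # 2 bytes per character here, no BOM

print(char_count, utf8_bytes, utf16_bytes)
```

Note that wc -c counts bytes as received, so a trailing newline added by echo or by a copy-and-paste step is included in its total.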