Introduction
In this article, we briefly discuss program encoding in Ruby and then look at the difference between a file's source encoding and a process's default external encoding.

Before turning to source encoding and the default external encoding, let's look at what program encoding is.
What is Program Encoding?
A Ruby program is fundamentally just a sequence of characters. The lexical rules of the Ruby language are defined using characters of the ASCII character set. Comments begin with the # character (ASCII code 35), for instance, and the allowed whitespace characters are horizontal tab (ASCII 9), newline (10), vertical tab (11), form feed (12), carriage return (13), and space (32). All Ruby keywords are written in ASCII characters, and all operators and other punctuation are drawn from the ASCII character set.
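The character codes mentioned above can be checked from Ruby itself. The following is only a small sketch; the constant name LEXICAL_CODES is an illustrative choice and not part of Ruby.

```ruby
# ASCII codes of the whitespace characters and the comment marker.
# String#ord returns the integer code of a string's first character.
LEXICAL_CODES = {
  "horizontal tab"  => "\t".ord,
  "newline"         => "\n".ord,
  "vertical tab"    => "\v".ord,
  "form feed"       => "\f".ord,
  "carriage return" => "\r".ord,
  "space"           => " ".ord,
  "comment marker"  => "#".ord
}

# Print each name next to its numeric code.
LEXICAL_CODES.each do |name, code|
  puts format("%-16s %d", name, code)
end
```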
By default, the Ruby 1.9 interpreter assumes that Ruby source code is encoded in ASCII. (Ruby 2.0 and later assume UTF-8 instead.)
This encoding is not required, however. The interpreter can also process files that use other encodings, as long as those encodings can represent the entire ASCII character set. To translate the bytes in a source file into characters, the Ruby interpreter needs to know which encoding to use. Ruby files can identify their own encodings, or you can tell the interpreter how they are encoded. The next section touches on both approaches.
The Ruby interpreter is quite flexible about the characters that may appear in a Ruby program. Some ASCII characters have specific meanings, and some ASCII characters are not allowed in identifiers, but beyond that, a Ruby program may contain any characters permitted by its encoding. Identifiers may contain characters outside the ASCII character set. The same is true of comments, string literals, and regular expression literals: they may contain any characters other than the delimiter character that marks the end of the comment or literal.
In ASCII-encoded files, strings may contain arbitrary bytes, including those that represent nonprinting control characters. (Using raw bytes like this is not advised; Ruby string literals support escape sequences, so any character can be included by its numeric code instead.) If the file is written in the UTF-8 encoding, comments, strings, and regular expressions may contain arbitrary Unicode characters. If the file is encoded using the Japanese SJIS or EUC encodings, strings may include Kanji characters.
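As a rough illustration of the advice about escape sequences, the sketch below builds strings with non-ASCII and nonprinting characters while the source text itself stays pure ASCII. The method name escaped_samples and the sample strings are made up for this example.

```ruby
# Escape sequences keep the source file pure ASCII while still
# producing non-ASCII or nonprinting characters in the string.
def escaped_samples
  {
    unicode_escape: "caf\u00E9",   # \u escapes always yield a UTF-8 string
    byte_escape:    "caf\xC3\xA9", # the same character written as two raw bytes
    control_chars:  "\x00\x07"     # NUL and BEL, written by numeric code
  }
end

samples = escaped_samples
puts samples[:unicode_escape].encoding   # UTF-8
# Both spellings produce the same byte sequence.
puts samples[:unicode_escape].bytes.to_a == samples[:byte_escape].bytes.to_a   # true
p samples[:control_chars].bytes.to_a     # [0, 7]
```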
Source Encoding and Default External Encoding
In Ruby 1.9, it is important to understand the difference between the source encoding of a Ruby file and the default external encoding of a Ruby process. As mentioned above, the source encoding tells the Ruby interpreter how to read the characters in a script. Source encodings are typically set with coding comments. A Ruby program may consist of many files, and each file may have a different source encoding.
The source encoding of a file determines the encoding of the string literals in that file.
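The relationship between a file's source encoding and its string literals can be observed with the __ENCODING__ keyword. This is a minimal sketch that assumes the block is saved as its own file, so that the coding comment sits on the first line; the method name source_encoding_report is hypothetical.

```ruby
# coding: utf-8
# The coding comment above only takes effect on the first line of a
# file (or on the second line, directly after a shebang line).

# Pairs the file's source encoding with the encoding of a literal.
def source_encoding_report
  {
    source_encoding:  __ENCODING__,   # encoding the interpreter assumed for this file
    literal_encoding: "text".encoding # encoding given to a plain string literal
  }
end

report = source_encoding_report
puts report[:source_encoding]
puts report[:literal_encoding]
```

Both lines should print the same encoding name, since a plain literal without escapes takes its encoding from the file that contains it.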
The default external encoding is something different: it is the encoding Ruby uses by default when reading from files and streams. It is global to the Ruby process and does not change from file to file. In most cases, it is determined by the locale your computer is set to. However, you can also set it explicitly with command-line options, as described below. The default external encoding does not affect the encoding of string literals.
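To make the role of the default external encoding concrete, the sketch below reads the same temporary file twice: once with no encoding hint and once with an explicit one. It assumes Encoding.default_internal is nil, which is Ruby's usual setting (otherwise data is transcoded on read); the method name read_encodings and the file contents are illustrative only.

```ruby
require "tempfile"

# Returns the encodings Ruby attaches to data read from a file:
# once with no hint, once with an explicit external encoding.
def read_encodings
  Tempfile.create("encoding_demo") do |f|
    f.write("plain ASCII text")
    f.flush
    # With no hint, the data is tagged with the default external encoding.
    default_read = File.read(f.path)
    # An explicit encoding overrides the default for this read only.
    explicit_read = File.read(f.path, encoding: "ISO-8859-1")
    { default: default_read.encoding, explicit: explicit_read.encoding }
  end
end

result = read_encodings
puts result[:default]    # same as Encoding.default_external
puts result[:explicit]   # ISO-8859-1
```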
The -K interpreter option is one way to specify the source encoding. What this option actually does is set the process's default external encoding and then use that encoding as the default source encoding.
The -K option exists in Ruby 1.9 for backward compatibility with Ruby 1.8, but it is not the recommended way to set the default external encoding. Two newer options, -E and --encoding, let you specify an encoding by its full name rather than by a one-character abbreviation.
For example:
ruby -E utf-8 # Encoding name follows -E
ruby -Eutf-8 # The space is optional
ruby --encoding utf-8 # Encoding following --encoding with a space
ruby --encoding=utf-8 # Or use an equals sign with --encoding
You can query the default external encoding with Encoding.default_external. This class method returns an Encoding object. Use Encoding.locale_charmap to obtain the name (as a string) of the character encoding derived from the locale. This method is always based on the locale setting and ignores command-line options that override the default external encoding.
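The two query methods can be compared side by side. The sketch below first reports the values for the current process, then starts a child interpreter with -E to show that the option changes Encoding.default_external but not Encoding.locale_charmap. The method names external_encoding_report and child_report are hypothetical, and the example assumes the environment allows starting a child process.

```ruby
require "rbconfig"

# Encodings as seen by the current process.
def external_encoding_report
  {
    default_external: Encoding.default_external, # an Encoding object
    locale_charmap:   Encoding.locale_charmap    # a String naming the locale's encoding
  }
end

# Runs a child interpreter with -E and returns what it reports as
# [default external encoding name, locale charmap].
def child_report(encoding_name)
  script = 'print Encoding.default_external.name, "\n", Encoding.locale_charmap'
  output = IO.popen([RbConfig.ruby, "-E", encoding_name, "-e", script], &:read)
  output.split("\n")
end

report = external_encoding_report
puts report[:default_external].name
puts report[:locale_charmap]

child_external, child_charmap = child_report("ISO-8859-1")
puts child_external   # the encoding passed to -E
puts child_charmap    # still derived from the locale
```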