Vim, Putty, SSH and MySQL in UTF-8 and Chinese Ideograms Display

Created: Friday, October 14, 2011 20:36:00
Updated: Friday, June 29, 2018 21:51:53

On Debian Linux 5 (Lenny), vim (or vi) is configured by default for the ISO Latin-1 charset. Unix systems in general, as well as console environments such as MS-DOS and the MySQL native client (mysql.exe), are not natively Unicode, so displaying multi-byte UTF-8 text such as Chinese writing is not straightforward.

Webmasters typically manage their Debian Linux server remotely over an SSH session, for example with PuTTY from a Windows 7 system.

A Chinese edition of Windows is not required to show Chinese ideograms: a Western edition, French or English, works as well.

To display files encoded in UTF-8 in vim over an SSH session, whether they contain Simplified Chinese or Arabic, one must configure both ends: Linux and Windows 7.

At the Linux Side

One must install the UTF-8 locales and the Chinese fonts, and modify the configuration file ~/.vimrc.

  1. Generate the UTF-8 Locales under Linux

    • Connect to your Linux server over SSH with PuTTY, as root or with another administrator account.
    • Run the command
      # dpkg-reconfigure locales
    • Select the locales that you need for your multi-byte languages.
      To select a locale, press the [SPACE] bar on its checkbox. For Simplified Chinese, with a few extras, I chose these:

      en_US ISO-8859-1
      zh_CN GB2312
      zh_CN.GBK GBK
      zh_CN.UTF-8 UTF-8

      You can also choose the option All locales. You would then have every available locale, but they would take several dozen MB of disk space on your host and several minutes to install.
  2. Install Chinese UTF-8 Fonts under Linux

    • Once the locales are generated on your Linux system, add some Chinese fonts to your server. Among the free Chinese fonts available on the web for Unix systems, there is the Arphic bundle, from a Taiwanese publisher. We need these:

      ttf-arphic-gbsn00lp (AR PL SungtiL GB)
      ttf-arphic-gkai00mp (AR PL KaitiM GB)
      ttf-arphic-bsmi00lp (AR PL Mingti2L Big5)
      ttf-arphic-bkai00mp (AR PL KaitiM Big5)

      To install these fonts, run the command
      # apt-get install ttf-arphic-bkai00mp ttf-arphic-bsmi00lp ttf-arphic-gbsn00lp ttf-arphic-gkai00mp
      This creates the files bkai00mp.ttf, bsmi00lp.ttf, gbsn00lp.ttf and gkai00mp.ttf under the folder /usr/share/fonts/truetype/arphic. They are sufficient to display Simplified Chinese characters.
  3. Modify the vim configuration file ~/.vimrc

    • In the per-user configuration file ~/.vimrc, add the following lines:

      "...
      set fileencodings=utf-8,gb2312,gbk,gb18030
      set termencoding=utf-8
      set encoding=utf-8
      "...

      All three directives are required to display UTF-8 encoded Chinese ideograms.
  4. The Debian Linux 5 (Lenny) side is now ready.

    • The attached file utf8sample.txt is in UTF-8 without BOM. When one types the Unix command
      # vi utf8sample.txt
      the Chinese text is still not displayed correctly. Something is still blocking: the Unicode Chinese ideograms sent by Linux over SSH are rendered by Windows as Latin-1.
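
The three Linux-side prerequisites above (locale, fonts, ~/.vimrc) can be checked from the shell before moving on. The following is a minimal sketch, not part of the original procedure: the helper names normalize_locale, locale_listed and vimrc_has are hypothetical, and the font folder is the one named in step 2. It only reports what it finds and changes nothing.

```shell
#!/bin/sh
# Sketch: report whether the Linux-side prerequisites are in place.
# normalize_locale, locale_listed and vimrc_has are hypothetical helpers.

# Lower-case a locale name and drop hyphens, so that "zh_CN.UTF-8" and
# the "zh_CN.utf8" spelling printed by `locale -a` compare equal.
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# Succeed if the locale named in $1 appears in the list read from stdin.
locale_listed() {
    wanted=$(normalize_locale "$1")
    while IFS= read -r line; do
        [ "$(normalize_locale "$line")" = "$wanted" ] && return 0
    done
    return 1
}

# Succeed if file $1 contains a "set <option>=..." line for option $2.
vimrc_has() {
    [ -f "$1" ] && grep -Eq "^[[:space:]]*set[[:space:]]+$2=" "$1"
}

if locale -a 2>/dev/null | locale_listed zh_CN.UTF-8; then
    echo "locale zh_CN.UTF-8: generated"
else
    echo "locale zh_CN.UTF-8: missing (run dpkg-reconfigure locales)"
fi

for font in gbsn00lp gkai00mp bsmi00lp bkai00mp; do
    if [ -f "/usr/share/fonts/truetype/arphic/$font.ttf" ]; then
        echo "font $font.ttf: installed"
    else
        echo "font $font.ttf: not found"
    fi
done

for opt in fileencodings termencoding encoding; do
    if vimrc_has "$HOME/.vimrc" "$opt"; then
        echo "~/.vimrc sets $opt"
    else
        echo "~/.vimrc does not set $opt"
    fi
done
```

The locale comparison is normalized because `locale -a` usually prints the name as zh_CN.utf8 rather than zh_CN.UTF-8.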

At the Windows Side (Windows 7, Windows Vista, Windows XP)

Modern Windows supports Unicode by default, so there is nothing to do for UTF-8 locales under Windows. Nor is there anything to do for fonts, since a Unicode font, Arial, is installed by default. Arial can fill the role even though it is not ideal: it is not a fixed-pitch font, which is unfavorable for programming, whether in English or in French. But as a first step, it is sufficient.

  1. Configure the Translation of PuTTY under Windows

    • Run putty.exe under Windows. Click on Category/Window/Translation and, in the Remote character set combobox, select UTF-8.
  2. Select a Chinese UTF-8 font for PuTTY under Windows

    • Then choose Category/Window/Appearance, first check the checkbox [x] Allow selection of variable-pitch fonts, and click on the button [Change...].
    • In the Fonts popup window, select a Unicode font. The TrueType SimSun (SimSun.ttf) is preferable; if you do not have it, use Arial. Click on [OK].
  3. Define the PuTTY SSH session for the connection under Windows

    • Then select Category/Session and enter your server domain name or IP address in Host Name (or IP address). Give a name in Saved Sessions, here Session Linux, and click on the button [Save].
  4. You are now ready for an SSH connection: click on the button [Open].

Tests

You are now connected to your remote Linux host.

  1. Test Shell

    • First we test whether our PuTTY SSH session is in Unicode, by running a shell command that prints a Unicode character:
      # printf "\xc3\xa9\n"
      It shows é on a Unicode terminal. On a Latin-1 terminal, it gives Ã©.
    • Here is the change after the operation, as reported by the command
      # locale
      The default locale C becomes the POSIX locale, which takes UTF-8 into account. LANG is not predefined, so it is selected by default by Linux.
  2. Test vi, vim editor

    • The attached file utf8sample.txt is in UTF-8 without BOM. When one types the Unix command
      # vi utf8sample.txt
      the long-awaited Chinese text appears. Another test file, utf8bomsample.txt, in UTF-8 with BOM (Byte Order Mark), gives the same display result. Do not worry about the [dos] file format indication: the Chinese text was typed under Windows with NJStar and saved on the local PC.
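
The two tests above can be complemented by looking at the bytes themselves. The sketch below is an illustration under assumptions: hex_bytes, has_bom and is_utf8 are hypothetical helper names, and it relies on the standard od, head and iconv utilities being installed. It first shows why the two bytes of é turn into two characters on a Latin-1 terminal, then reports for each sample file whether it is valid UTF-8 and whether it starts with a BOM.

```shell
#!/bin/sh
# Sketch: inspect the test character and the sample files byte by byte.
# hex_bytes, has_bom and is_utf8 are hypothetical helper names.

# Print stdin as one continuous lower-case hex string.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# Succeed if the file starts with the UTF-8 byte order mark EF BB BF.
has_bom() {
    [ "$(head -c 3 "$1" | hex_bytes)" = "efbbbf" ]
}

# Succeed if iconv can decode the whole file as UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Octal \303\251 is the same byte pair as \xc3\xa9, in a form that
# every POSIX printf understands.
echo "bytes of the test character: $(printf '\303\251' | hex_bytes)"
# Decoding those two bytes as Latin-1 yields two characters, not one.
echo "read as Latin-1: $(printf '\303\251' | iconv -f ISO-8859-1 -t UTF-8)"

for f in utf8sample.txt utf8bomsample.txt; do
    [ -f "$f" ] || { echo "$f: not found, skipped"; continue; }
    if is_utf8 "$f"; then enc="valid UTF-8"; else enc="not valid UTF-8"; fi
    if has_bom "$f"; then bom="with BOM"; else bom="without BOM"; fi
    echo "$f: $enc, $bom"
done
```

Run it in the folder that holds the sample files; files that are missing are simply skipped.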

More Chinese on vi, vim

The order of the configuration does not matter. You can start with the Linux side or with the Windows 7 side. Moreover, simply to display Chinese ideograms in vim, you do not need to define the exact locale yourself: a stock Linux does it for you.

If you would also like a Chinese vim interface, you must define the locale through the shell environment variable $LANG, choosing among the following:

zh_CN GB2312
zh_CN.GBK GBK
zh_CN.UTF-8 UTF-8

Please note that on each line the locale name and the charset are separated by a space, and only the first part is the valid locale name. For example, here is a valid declaration:

# export LANG="zh_CN.UTF-8"

With this setting, the shell interface is also in the language zh_CN.UTF-8 and shows the system messages in Chinese. Here are some test commands to run:

# date
# df
# ls nonexistent.file
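
To try Chinese system messages without switching the whole session, the locale can be set for a single command. This is a minimal sketch: run_in_locale is a hypothetical helper name, and setting LC_ALL in addition to LANG is an assumption made here so that any LC_* variable already defined in the session does not mask the change.

```shell
#!/bin/sh
# Sketch: run one command under a given locale, leaving the session as is.
# run_in_locale is a hypothetical helper name.
run_in_locale() {
    loc=$1
    shift
    # env passes the variables to the child process only; LC_ALL is set
    # too because it takes precedence over LANG and the other LC_* values.
    env LANG="$loc" LC_ALL="$loc" "$@"
}

run_in_locale zh_CN.UTF-8 date
run_in_locale zh_CN.UTF-8 ls nonexistent.file || true
echo "session LANG is still: ${LANG:-unset}"
```

If the zh_CN.UTF-8 locale has not been generated, the commands still run, but their messages stay in the default language.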

MySQL Native Client and Chinese

This setting is also valid for the MySQL native client /usr/bin/mysql. To display contents stored in Simplified Chinese in a table, run mysql with

# mysql --default-character-set=utf8 -u<username> -p<password> <your_dbname>

Or launch mysql without the option --default-character-set, then, in the MySQL console, run a special definition query:

mysql> SET NAMES 'utf8' COLLATE utf8_general_ci;

Then, still in the MySQL console, launch a query like

mysql> SELECT NameCn FROM products WHERE (ID=358);

The Chinese characters will be displayed in the result.
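
To confirm which character sets a connection actually uses, the session variables can be listed from the shell. The sketch below is only an illustration: DB_USER and DB_NAME are placeholder variables that do not come from this article, and the client is started only when it is installed and the script runs in a terminal, because -p prompts for the password.

```shell
#!/bin/sh
# Sketch: list the character sets of a MySQL connection opened in UTF-8.
# DB_USER and DB_NAME are placeholders; replace them with your own values.
DB_USER=${DB_USER:-your_username}
DB_NAME=${DB_NAME:-your_dbname}
QUERY="SHOW VARIABLES LIKE 'character_set_%';"

if command -v mysql >/dev/null 2>&1 && [ -t 0 ]; then
    # -p without a value makes the client prompt for the password.
    mysql --default-character-set=utf8 -u "$DB_USER" -p "$DB_NAME" -e "$QUERY"
else
    echo "mysql not run (client missing or no terminal); query: $QUERY"
fi
```

With the option in place, character_set_client, character_set_connection and character_set_results should all report utf8.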

MySQL Native Client and Chinese in Windows

It is impossible to show UTF-8 Chinese in the MySQL native client mysql.exe, since it is launched from the Windows console (cmd.exe, the MS-DOS prompt). The console does not accept Asian fonts for the standard terminal, whether FangSong, SimSun or others. MySQL is therefore limited to the console fonts approved by Microsoft: Raster Terminal, TrueType Lucida Console and Consolas, none of which offers Chinese ideograms. Modifying the Windows Registry has no effect on Chinese fonts, and neither does changing the code page to Unicode. After all this hard work, the Chinese ideograms still appear as hollow squares in the MySQL console. This means that the font is a Unicode font, which supports the encoding of Chinese characters, but the glyphs, here the Chinese ideograms, were not drawn by the creator of the loaded font.

To show Simplified Chinese contents stored in a MySQL table, use the freeware MySQL GUI Administrator Tools, or phpMyAdmin in a browser.

