Saturday, April 3, 2010

Inputting Tamil characters in Vim on Windows XP

Open the file with gvim. The file must be encoded in UTF-8.
Here I am opening a file that was created earlier in Notepad with some Tamil text and saved in UTF-8 encoding.
:set enc=utf-8
:set gfn=*
This pops up a window for choosing a font.
Select TheneeUni and click OK.
Any Tamil text already in the file should now be displayed.
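
To avoid repeating these steps in every session, the same settings can go into the vimrc. This is only a sketch: it assumes the TheneeUni font is installed under exactly that name, and the h12 point size is an arbitrary value chosen for illustration.

```vim
" Sketch for a vimrc on Windows. The font must already be installed;
" the :h12 point size is an assumed value, adjust it to taste.
if has('gui_running') && has('win32')
  set encoding=utf-8
  set guifont=TheneeUni:h12
endif
```

The has('gui_running') check keeps the font setting away from console Vim, where 'guifont' has no effect.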
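To input a Tamil character, go to insert mode and type Ctrl-V u xxxx (use Ctrl-Q instead if Ctrl-V is mapped to paste, as it is with mswin.vim),
where xxxx stands for the 4 hex digits of the character's Unicode code point.
For example, to input Tamil அ (U+0B85), type Ctrl-V u 0b85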
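
If the code point of a character is not known, Vim itself can report it. The lines below are a small sketch that assumes 'encoding' is already utf-8; they use the built-in functions char2nr() and nr2char(). In normal mode, the ga command shows the same information for the character under the cursor.

```vim
" Print the code point of a character as four hex digits.
echo printf('U+%04X', char2nr('அ'))

" Append the character with code point U+0B85 after the cursor,
" without typing the hex digits by hand in insert mode.
execute "normal! a" . nr2char(0x0B85)
```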

Thanks to
gvim does support Unicode, but it may be easier or harder depending on your OS and its settings. The easiest is of course if you start gvim in a Unicode locale, or, on Unix, if you run a version compiled for the GTK2 toolkit (which uses Unicode by default). Here is a code snippet which you can paste into your vimrc to enable support for Unicode in all versions which have Unicode support compiled-in.
if has("multi_byte")    " if not, we need to recompile
  if &enc !~? '^u'      " if the locale 'encoding' starts with u or U
                        " then Unicode is already set
    if &tenc == ''
      let &tenc = &enc  " save the keyboard charset
    endif
    set enc=utf-8       " to support Unicode fully, we need to be able
                        " to represent all Unicode codepoints in memory
  endif
  set fencs=ucs-bom,utf-8,latin1
  setg bomb             " default for new Unicode files
  setg fenc=latin1      " default for files created from scratch
else
  echomsg 'Warning: Multibyte support is not compiled-in.'
endif
You must also set a 'guifont' which includes the glyphs you will need, but most fonts don't cover the whole range of "assigned" Unicode codepoints from U+0000 (well, U+0020 since 0-1F are not "printable") to U+10FFFF (well, U+10FFFD since anything ending in FFFE or FFFF is invalid). If you are like me, you will have to set different fonts at different times depending on what languages you're editing at any particular moment. Courier New has (in my experience) a wide coverage for "alphabetic" languages (Latin, Greek, Cyrillic, Hebrew, Arabic); for Far Eastern scripts you will need some other font such as FZ FangSong or MingLiU.
With the above settings, Unicode files will be recognised when possible:
- Any file starting with a BOM will be properly recognised as the appropriate Unicode encoding (out of, IIUC, UTF-8, UTF-16be, UTF-16le, UTF-32be and UTF-32le).
- Files with no BOM will still be recognised as UTF-8 if they include nothing that is invalid in UTF-8.
- Fallback is to Latin1.
- The above means that 7-bit US-ASCII will be diagnosed as UTF-8; this is not a problem as long as you don't add to them any characters with the high bit set, since the codepoints U+0000 to U+007F have both the same meaning and the same representation in ASCII and UTF-8. The first time you add a character above 0x7F to such a file, you will have to save it with, for instance,
:setlocal fenc=latin1
if you want it to be encoded in Latin1. From then on, the file (containing one or more bytes with high bit set in combinations invalid in UTF-8) will be recognised as Latin1 by the 'fileencodings' heuristics set above.
- It also means that for non-UTF-8 Unicode files with no BOM, or in general for anything not autodetected (such as 8-bit files other than Latin1), you will have to specify the encoding yourself (e.g. ":e ++enc=utf-16le filename.txt").
Also with the above settings, new files will be created in Latin1. To create a new file in UTF-8, use for instance
        :setlocal fenc=utf-8
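
When it is unclear what Vim detected for a file, the buffer-local options can be queried, and the encoding can be overridden or converted. The following is a sketch of that workflow using only standard commands; utf-16le is just an example value.

```vim
" Show which encoding was detected for the current buffer,
" and whether a BOM will be written on save.
setlocal fileencoding? bomb?

" Re-read the current file, forcing a specific encoding.
" (Fails if the buffer has unsaved changes; use :edit! to discard them.)
edit ++enc=utf-16le

" Convert the buffer to UTF-8 without a BOM the next time it is written.
setlocal fileencoding=utf-8 nobomb
write
```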

See also:
        :help Unicode
        :help 'encoding'
        :help 'termencoding'
        :help 'fileencodings'
        :help 'fileencoding'
        :help 'bomb'
        :help ++opt

