ASCII

Computer Methods in Chemical Engineering



ASCII Table

In a computer, each letter of the English alphabet is represented by a series of eight bits, i.e., eight 0's and 1's. The rule for representing a character with a number is specified in a standard called ASCII (pronounced "askee"), which stands for "American Standard Code for Information Interchange". I passed out this table in class. Click here to download this table; you will need to save it on your disk because an internet browser distorts the table and does not display it the same way as DOS does. Only the first 128 codes (i.e., ASCII 0 through ASCII 127) are common among nearly all computers, whether they are a PC, a Mac, a UNIX or VMS mainframe, a printer, or some other computer-related equipment. Virtually everyone follows this standard. Note that only 7 bits are required to yield the first 128 combinations. In the early days, the 8th bit was used as the parity bit. Nowadays, computers make use of this bit to code another 128 characters, mostly the accented characters of the Western European languages. Note that all characters in the upper half of the table (i.e., ASCII 128 through ASCII 255) have the highest bit set to "1". Strictly speaking, these upper 128 codes are extensions and not part of the ASCII standard itself. The U.S. DOS has its own set of characters for the second half of the codes; however, this coding is NOT unique among different computers, or even among different DOS settings or software. For example, DOS and Windows have different uses for ASCII 128 through ASCII 255, although these two products both come from the same company! The ISO-8859-1 standard includes the first 128 characters plus the accented ones.
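The disagreement over the upper 128 codes is easy to see in practice. The short Python sketch below decodes the same byte values with three code pages that ship with Python. The codec names cp437 (U.S. DOS), cp1252 (Western European Windows), and latin-1 (ISO-8859-1) are Python's names, and the helper decode_everywhere is made up for this illustration.

```python
# The same byte means different things once the highest bit is set.
CODECS = ("cp437", "cp1252", "latin-1")  # U.S. DOS, Windows, ISO-8859-1

def decode_everywhere(value):
    """Return {codec: character} for a single byte value (0-255).

    A few values are undefined in cp1252 and would raise UnicodeDecodeError;
    the values used below are defined in all three code pages.
    """
    return {codec: bytes([value]).decode(codec) for codec in CODECS}

# ASCII 65 is below 128, so every code page agrees on it.
print(decode_everywhere(65))

# ASCII 128 and ASCII 233 have the highest bit set, so the code pages disagree.
print(decode_everywhere(128))
print(decode_everywhere(233))

# Only 7 bits are needed for the lower half; the 8th bit marks the upper half.
print(format(65, "08b"), format(233, "08b"))
```

Running it on a few more values above 127 shows how little the DOS and Windows tables have in common.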

For example, 01000001 in binary, which is 65 in decimal and 41 in hexadecimal, is the letter "A". Note that upper-case and lower-case letters have distinct representations that differ by 32 decimal or 20 hexadecimal. To enter a character that your PC keyboard does not have, hold down the "ALT" key and type the ASCII number on the numeric keypad (not the number keys at the top of the keyboard). When you release the ALT key, the character is entered. For example, ALT-65 gives you an "A". Another example: blank space is ASCII 32 decimal or ASCII 20 hexadecimal; thus, a blank space in a URL is represented as "%20", e.g., "www.nogood.edu/nobody/%20/abc."
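These conversions can be checked with a few lines of Python. This is only a sketch using the built-in ord, chr, and format functions and the standard library's urllib.parse.quote; the file name "my file.txt" is made up for the example.

```python
from urllib.parse import quote

letter = "A"
code = ord(letter)                       # character -> ASCII number
print(code, format(code, "02X"), format(code, "08b"))   # 65 41 01000001

# Upper and lower case differ by 32 decimal (20 hexadecimal).
lower = chr(code + 32)
print(lower, ord(lower))                 # a 97

# A blank space (ASCII 32, hex 20) becomes %20 inside a URL.
print(quote("my file.txt"))              # my%20file.txt
```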

There are a number of tricks with nonstandard characters. For example, you can create files named with graphical characters. You can also create a file whose name contains ASCII 255, another blank character that does not show up but differs completely from the blank space character of ASCII 32.
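A quick way to convince yourself that the two blanks are different characters is to compare them in Python. The sketch assumes the U.S. DOS code page (cp437 in Python's naming), where byte 255 is a second, non-breaking blank.

```python
# Assumption: the U.S. DOS code page (cp437) is in effect.
blank_255 = bytes([255]).decode("cp437")
blank_32 = chr(32)

# Both print as empty space, but repr() reveals that they are not the same.
print(repr(blank_255), repr(blank_32))
print(blank_255 == blank_32)
```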


Control Codes

ASCII 0 through ASCII 31 are known as control characters. Of these, ASCII 1 through ASCII 26 are generated by pressing the CTRL key together with a letter key, e.g., CTRL-A gives ASCII 1 and CTRL-Z gives ASCII 26. These characters have special meanings that are nearly universal across computers and associated equipment. Some are listed below.
   -------------------------------------------------
   CTRL Key      Meaning
   -------------------------------------------------
   CTRL-C        kill a program
   CTRL-D        eof (end-of-file) character in UNIX
   CTRL-G        bell
   CTRL-H        backspace
   CTRL-I        horizontal tab
   CTRL-J        line feed
   CTRL-K        vertical tab
   CTRL-L        form feed (page eject)
   CTRL-M        carriage return (enter key)
   CTRL-P        toggle display echo to printer
   CTRL-Q        restart a suspended program
   CTRL-S        suspend a program
   CTRL-Z        eof (end-of-file) character in DOS
   -------------------------------------------------
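The relationship between a letter and its control code is simple arithmetic: CTRL clears the upper bits of the letter's code, so CTRL-A is 1 and CTRL-Z is 26. The sketch below checks several rows of the table against Python's escape sequences; the function name ctrl_code is made up for this illustration.

```python
def ctrl_code(letter):
    """ASCII code produced by CTRL-<letter>: keep only the lowest five bits."""
    return ord(letter.upper()) & 0x1F

# Compare with the escape sequences Python already knows.
table = [("G", "\a"), ("H", "\b"), ("I", "\t"), ("J", "\n"),
         ("K", "\v"), ("L", "\f"), ("M", "\r")]
for letter, escape in table:
    print("CTRL-" + letter, ctrl_code(letter), ctrl_code(letter) == ord(escape))
```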

Side Bar -- Chinese

Compared to the 26 letters of English, Chinese has tens of thousands of characters. (Yes, my sons spend every weekend learning them.) Naturally, one byte does not provide an adequate number of combinations. Each Chinese character is therefore represented by two bytes, or 16 bits. Two bytes give 2^16 = 65,536 (64K) combinations, just enough to handle the rich variety of Chinese characters. To enable U.S.-based hardware and software to handle Chinese, a slightly modified operating system only needs to look at two bytes at a time and render the characters on a display based on a different lookup table. Thus, you can run U.S. application software in Chinese without any further modification to the application software itself. Neat, isn't it?

Unfortunately, mainly because of politics, there are several Chinese coding systems. The most common ones are GuoBiao (GB), which is commonly used in mainland China and Singapore; Big5, which is popular in Taiwan and Hong Kong; and Unicode. GB contains about 6,700 Chinese characters, where the first and second bytes are both coded from Hex A1 to Hex FE. On the other hand, Big5 contains about 13,000 characters, where the first byte is coded from Hex A1 to FE and the second byte from Hex 40 to 7E or Hex A1 to FE. Note that the standard one-byte ASCII characters are intermixed with the two-byte GB and Big5 codes. Furthermore, to make matters worse, many languages, including Japanese and Korean, use a subset or a variant of the Chinese characters (kanji). Each of these languages, in turn, has several encoding systems. Some people speculate that Unicode, which is adopted in MS IE5, will eventually become the standard, and we will no longer need to worry about the encoding scheme. In the meantime, see how to configure your Windows to read Chinese/Japanese/Korean.
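To see why the choice of coding system matters, the sketch below encodes one Chinese character three ways and then reads the GB bytes with the wrong table. It assumes Python's gb2312, big5, and utf-16-be codecs as stand-ins for GB, Big5, and Unicode. The two range-checking helpers are made up for this illustration, and the Big5 check takes the second byte to run from Hex 40 to 7E (Hex 7F is the DEL control code) or from Hex A1 to FE.

```python
ch = "中"
gb = ch.encode("gb2312")
b5 = ch.encode("big5")
uni = ch.encode("utf-16-be")
for name, data in (("GB", gb), ("Big5", b5), ("Unicode (UTF-16)", uni)):
    print(name, data.hex())

def in_gb_range(data):
    """Both bytes between Hex A1 and Hex FE."""
    return len(data) == 2 and all(0xA1 <= b <= 0xFE for b in data)

def in_big5_range(data):
    """First byte A1-FE; second byte 40-7E or A1-FE."""
    first, second = data
    return 0xA1 <= first <= 0xFE and (0x40 <= second <= 0x7E or 0xA1 <= second <= 0xFE)

print(in_gb_range(gb), in_big5_range(b5))

# Reading GB bytes with the Big5 table does not give back the original character.
print(gb.decode("big5", errors="replace") == ch)
```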

Side Bar -- Nature's Way of Representing Genetic Information

The biological system, from humans to lowly bacteria, uses a nearly uniform quaternary "number system". All genetic information is stored in DNA, which has four possible states: the nucleotide bases A, T, C, and G. We can simply assign these bases the numbers 0, 1, 2, and 3. A living organism has an "alphabet system" of 20 amino acids. Two nucleotide bases give 4^2 = 16 combinations, which is just a little short of the 20 amino acids. However, three nucleotide bases give 4^3 = 64 combinations, which is sufficient to represent the 20-amino-acid alphabet. Indeed, nature does actually code each amino acid with three nucleotide bases. Isn't nature just as smart as can be!

Summary of Analogy
                                 Computer            Nature
  ------------------------------------------------------------
  # of Possible States           2 (0 & 1)       4 (A, T, C, G)
  # of Characters              26+ letters     20 amino acids
  # of Digits Needed             8 (bits)         3 (bases)
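The counting argument behind the analogy can be reproduced in a few lines of Python. This is a minimal sketch; the function name digits_needed is made up for this illustration.

```python
from itertools import product

def digits_needed(states, symbols):
    """Smallest number of digits n such that states**n >= symbols."""
    n = 0
    while states ** n < symbols:
        n += 1
    return n

print(4 ** 2, 4 ** 3)               # 16 64
print(digits_needed(4, 20))         # 3 bases are enough for 20 amino acids
print(digits_needed(2, 128))        # 7 bits are enough for 128 ASCII codes

# Enumerate every three-base combination of A, T, C, and G.
codons = ["".join(triplet) for triplet in product("ATCG", repeat=3)]
print(len(codons), codons[:4])
```

Note that digits_needed gives the bare minimum; computers use 8 bits per character rather than 7 because the byte is the natural storage unit.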


Return to Computer Methods in Chemical Engineering (ENCH250)

Forward comments to:
Nam Sun Wang
Department of Chemical & Biomolecular Engineering
University of Maryland
College Park, MD 20742-2111
301-405-1910 (voice)
301-314-9126 (FAX)
e-mail: nsw@umd.edu
©1996-2006 by Nam Sun Wang