REPRESENTATION OF DATA ON COMPUTER

Data Representation refers to the methods used internally to represent information stored in a computer. Computers store lots of different types of information:
  • numbers
  • text
  • graphics of many varieties (stills, video, animation)
  • sound
At least, these all seem different to us. However, ALL types of information stored in a computer are stored internally in the same simple format: a sequence of 0's and 1's. How can a sequence of 0's and 1's represent things as diverse as your photograph, your favorite song, a recent movie, and your term paper?
It all depends on how we interpret the information. Computers use numeric codes to represent all the information they store. These codes are similar to those you may have used as a child to encrypt secret notes: let 1 stand for A, 2 stand for B, etc. With this code, any written message can be represented numerically. The codes used by computers are a bit more sophisticated, and they are based on the binary number system (base two) instead of the more familiar (for the moment, at least!) decimal system. Computers use a variety of different codes. Some are used for numbers, others for text, and still others for sound and graphics.

Memory Structure in a Computer

  • Memory consists of bits (0 or 1)
    • a single bit can represent two pieces of information
  • bytes (= 8 bits)
    • a single byte can represent 256 = 2x2x2x2x2x2x2x2 = 2^8 pieces of information
  • words (= 2, 4, or 8 bytes)
    • a 2-byte word can represent 256^2 = 65,536 pieces of information
  • Byte addressable: each byte has its own address.
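The counts in the list above all follow one rule: n bits give 2^n different patterns. Here is a small Python sketch of that rule. The function name `distinct_values` is my own, and the word sizes are just the 2, 4 and 8 bytes mentioned above.

```python
def distinct_values(bits: int) -> int:
    """Number of different bit patterns that fit in `bits` bits (2 ** bits)."""
    return 2 ** bits

# Sizes taken from the list above; a "word" is 2, 4 or 8 bytes depending on the machine.
units = {"bit": 1, "byte": 8, "2-byte word": 16, "4-byte word": 32, "8-byte word": 64}

for name, bits in units.items():
    print(f"{name:<12} {bits:>2} bits -> {distinct_values(bits):,} values")
```

A 2-byte word comes out at 65,536 values, which is the same as 256 x 256.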

Binary Numbers

Normally we write numbers using the digits 0 to 9. This is called base 10. However, any positive integer (whole number) can also be represented by a sequence of 0's and 1's. Numbers in this form are said to be in base 2, and they are called binary numbers. Base 10 numbers use a positional system based on powers of 10 to indicate their value. The number 123 is really 1 hundred + 2 tens + 3 ones. Each position is worth the next higher power of 10 as you move from right to left (ones, tens, hundreds, and so on). Base 2 works the same way, just with powers of 2 (ones, twos, fours, eights, and so on). The number 101 in base 2 is really 1 four + 0 twos + 1 one, which equals 5 in base 10.
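The positional rule can be written out directly: multiply each digit by the base raised to its position, counting positions from the right starting at 0, and then add everything up. Here is a minimal sketch. The name `positional_value` is hypothetical, and it only handles bases up to 10 because it reads each digit with `int()`. Python's built-in `int("101", 2)` does the same job.

```python
def positional_value(digits: str, base: int) -> int:
    """Sum digit * base**position, with position 0 at the rightmost digit (bases <= 10)."""
    total = 0
    for position, digit in enumerate(reversed(digits)):
        total += int(digit) * base ** position
    return total

print(positional_value("123", 10))  # 1*100 + 2*10 + 3*1 = 123
print(positional_value("101", 2))   # 1*4 + 0*2 + 1*1 = 5
```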

Text

Text can be represented easily by assigning a unique numeric value to each symbol used in the text. For example, the widely used ASCII code (American Standard Code for Information Interchange) defines 128 different symbols (all the characters found on a standard keyboard, plus a few extra), and assigns each one a unique numeric code between 0 and 127. In ASCII, "A" is 65, "B" is 66, "a" is 97, "b" is 98, and so forth. When you save a file as "plain text", it is traditionally stored using ASCII. ASCII uses 1 byte per character. Because 1 byte gives only 256 possible values, there is room for the 128 standard characters plus 128 non-standard ones. The code value for any character can be converted to base 2, so any written message made up of ASCII characters can be converted to a string of 0's and 1's.
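To see the text-to-bits conversion in action, the sketch below turns each character into its ASCII code and then writes that code as 8 binary digits (one byte). The function name `to_ascii_bits` is my own. Python's `encode("ascii")` raises an error for any character outside the 128 ASCII symbols.

```python
def to_ascii_bits(message: str) -> list[str]:
    """Return the 8-bit binary pattern of each character's ASCII code."""
    codes = message.encode("ascii")  # raises UnicodeEncodeError for non-ASCII characters
    return [format(code, "08b") for code in codes]

for ch, bits in zip("AaBb", to_ascii_bits("AaBb")):
    print(ch, ord(ch), bits)  # e.g. "A 65 01000001"
```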

Graphics

Graphics that are displayed on a computer screen consist of pixels: the tiny "dots" of color that collectively "paint" a graphic image on a computer screen. The pixels are organized into many rows on the screen. In one common configuration, each row is 640 pixels long, and there are 480 such rows. Another configuration (and the one used on the screens in the lab) is 800 pixels per row with 600 rows, which is referred to as a "resolution of 800x600." Each pixel has two properties: its location on the screen and its color.
A graphic image can be represented by a list of pixels. Imagine all the rows of pixels on the screen laid out end to end in one long row. This gives the pixel list, and a pixel's location in the list corresponds to its position on the screen. A pixel's color is represented by a binary code, and consists of a certain number of bits. In a monochrome (black and white) image, only 1 bit is needed per pixel: 0 for black, 1 for white, for example. A 16 color image requires 4 bits per pixel. Modern display hardware allows for 24 bits per pixel, which provides an astounding array of 16.7 million possible colors for each pixel!
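The pixel list and the bits-per-pixel idea can be turned into a little arithmetic. The sketch below assumes the rows are laid end to end, top row first, as described above, and it ignores any file headers or compression. All three function names are hypothetical.

```python
def pixel_index(row: int, col: int, width: int) -> int:
    """Position in the pixel list of the pixel at (row, col), rows laid end to end."""
    return row * width + col

def bits_per_pixel(colors: int) -> int:
    """Fewest bits whose patterns can label `colors` different colors (at least 1)."""
    return max(1, (colors - 1).bit_length())

def image_bytes(width: int, height: int, bpp: int) -> int:
    """Uncompressed size in bytes of a width x height image at bpp bits per pixel."""
    return width * height * bpp // 8

print(pixel_index(row=1, col=0, width=800))          # 800: first pixel of the second row
print(bits_per_pixel(2), bits_per_pixel(16))         # 1 4
print(bits_per_pixel(2 ** 24), 2 ** 24)              # 24 16777216
print(image_bytes(800, 600, 24))                     # 1440000
```

An 800x600 screen at 24 bits per pixel comes to 1,440,000 bytes. That is why a single uncompressed image can easily go over a megabyte.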

Compression

Files today are so information-rich that they have become very large. This is particularly true of graphics files. With so many pixels in the list, and so many bits per pixel, a graphic file can easily take up over a megabyte of storage. Files containing large software applications can require 50 megabytes or more! This causes two problems: it becomes costly to store the files (requires many floppy disks or excessive room on a hard drive), and it becomes costly to transmit these files over networks and phone lines because the transmission takes a long time. In addition to studying how various types of data are represented, you will have the opportunity today to look at a technique known as data compression. The basic idea of compression is to make a file shorter by removing redundancies (repeated patterns of bits) from it. This shortened file must of course be de-compressed - have its redundancies put back in - in order to be used. However, it can be stored or transmitted in its shorter compressed form, saving both time and money.
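One of the simplest ways to squeeze out redundancy is run-length encoding (RLE). It replaces each run of repeated bits with the bit and the length of the run. I am only using RLE to illustrate the general idea; it is not necessarily the technique used in the lab, and the function names below are my own.

```python
from itertools import groupby

def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Replace each run of identical symbols with a (symbol, run_length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Put the redundancy back: expand each (symbol, run_length) pair."""
    return "".join(symbol * count for symbol, count in pairs)

row = "0" * 10 + "1" * 4 + "0" * 8   # e.g. one row of a monochrome image
encoded = rle_encode(row)
print(encoded)                        # [('0', 10), ('1', 4), ('0', 8)]
assert rle_decode(encoded) == row     # decompression restores the row exactly
```

RLE only pays off when the data really does contain long runs. A row that alternates 0101... would actually get longer, which is why real file formats use more elaborate schemes.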

Data Representation

Most of us write numbers in Arabic form, i.e., 1, 2, 3, ..., 9. Some people write them differently, such as I, II, III, IV, ..., IX. Whatever the representation, most human beings can understand at least the two types I mentioned. Unfortunately, the computer can't. The computer is the most stupid thing you will ever encounter in your life.
Modern computers are built from transistors, and each transistor is either ON or OFF at any given moment. Therefore the computer can only recognize two values, 0 for OFF and 1 for ON, and each such value is called a BIT. There is nothing in between bit 0 and bit 1 (e.g., there is no such thing as bit 0.5). Hence computers can be said to be discrete machines. A number system that consists of only two digits is called the binary system. To distinguish between the numbering systems, the numbers humans normally use, i.e., 0, 1, 2, 3, 4, ..., will be called decimal numbers from now on (since they are based on 10 digits).
How, then, can a computer understand numbers larger than 1? The answer is simple: 2 is just 1 + 1 (like 10 = 9 + 1 for humans). When the digits are added, the overflow is carried over to the next position on the left. So (decimal) 2 is represented in binary as 10. To further illustrate the relationship, I have listed the numbers 0 to 9 in both systems for comparison:
Decimal   Binary
0         0000 0000
1         0000 0001
2         0000 0010
3         0000 0011
4         0000 0100
5         0000 0101
6         0000 0110
7         0000 0111
8         0000 1000
9         0000 1001
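The table above can be rebuilt by starting at zero and adding 1 each time using the carry rule: 1 + 1 overflows to 0 and carries 1 to the left. Here is a sketch of that rule working on strings of binary digits. The name `add_one` is hypothetical, and Python's `format(n, "08b")` gives the same 8-digit result directly.

```python
def add_one(bits: str) -> str:
    """Add 1 to a binary string: 1 + 1 writes 0 and carries 1 to the left."""
    digits = list(bits)
    i = len(digits) - 1
    while i >= 0:
        if digits[i] == "0":
            digits[i] = "1"          # no overflow, so we are done
            return "".join(digits)
        digits[i] = "0"              # 1 + 1 overflows: write 0, carry 1 left
        i -= 1
    return "1" + "".join(digits)     # the carry ran off the left end: add a new digit

value = "00000000"
for n in range(10):
    print(n, value[:4], value[4:])   # same layout as the table, e.g. "2 0000 0010"
    assert value == format(n, "08b")
    value = add_one(value)
```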
You may ask why I always write 8 binary digits. Well, the smallest addressable unit of the computer's memory is called a BYTE, which consists of 8 BITS. One byte allows up to 256 different combinations (2^8 = 256), enough for the values 0 to 255. What happens when we have numbers greater than 255? The computer simply uses more bytes to hold the value: 2 bytes allow 65,536 (2^16) combinations, enough for 0 to 65,535, and so forth.
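You can work out how many bytes a value needs from the number of binary digits it takes up, rounded up to whole bytes. The sketch below assumes non-negative (unsigned) whole numbers and uses a helper name of my own, `bytes_needed`. It also uses Python's `int.to_bytes` to show the actual bytes.

```python
def bytes_needed(n: int) -> int:
    """Minimum whole bytes to hold the non-negative integer n (at least 1)."""
    return max(1, (n.bit_length() + 7) // 8)

for n in (255, 256, 65_535, 65_536):
    print(n, bytes_needed(n), n.to_bytes(bytes_needed(n), "big").hex(" "))
# 255 fits in one byte (ff); 256 needs two (01 00); 65,536 needs three (01 00 00)
```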

ASCII FORMAT

Not only does the computer not understand the (decimal) numbers you use, it doesn't even understand letters like "ABCDEFG...". The fact is, it doesn't care. Whatever letters you input, the computer just stores them and gives them back to you when you instruct it to. It saves these letters in the same binary format as numbers, according to a fixed pattern. On PCs and many other systems (including DOS, Windows 95/98/NT, and UNIX), the pattern is called ASCII (pronounced ask-ee), which stands for American Standard Code for Information Interchange.
In this format, the letter "A" is represented by "0100 0001", which is more often referred to as decimal 65 in the ASCII table. Every standard character has its own code in that table. When comparing characters, the computer actually looks up the associated ASCII codes and compares the ASCII values instead of the characters. Therefore the letter "B", which has an ASCII value of 66, is greater than the letter "A", which has an ASCII value of 65.
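Comparing text "by ASCII code" can be spelled out one character at a time. The first pair of characters that differ decides the result, and if one string runs out first, it counts as smaller. The function `compare_by_codes` is a hypothetical helper. For plain ASCII text it gives the same answer as Python's own `<` and `>` operators.

```python
def compare_by_codes(a: str, b: str) -> int:
    """Return -1, 0 or 1 by comparing the character codes of a and b in order."""
    for x, y in zip(a, b):
        if ord(x) != ord(y):
            return -1 if ord(x) < ord(y) else 1
    # All shared characters are equal, so the shorter string comes first.
    return (len(a) > len(b)) - (len(a) < len(b))

print(ord("A"), ord("B"))           # 65 66
print(compare_by_codes("B", "A"))   # 1, because 66 > 65
print(compare_by_codes("a", "Z"))   # 1: lowercase codes (97-122) come after uppercase (65-90)
```

A side effect of comparing codes is that every uppercase letter sorts before every lowercase letter, so "Zebra" comes before "apple".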

Data Types

The computer stores data in different formats, or types. The number 10 can be stored as a numeric value, as in "10 dollars", or as characters, as in the address "10 Main Street". So how can the computer tell? Once again, the computer doesn't care; it is your responsibility to ensure that you get the correct data out of it. (For illustration, the characters "10" and the numeric value 10 are represented by 0011-0001-0011-0000 and 0000-1010 respectively; you can see how different they are.) Different programming languages have different data types, although the fundamental ones are usually very similar.
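You can reproduce the two bit patterns above in Python. Here a text string plays the role of the characters "10" and an `int` plays the role of the number 10. The helper `grouped_bits` is my own; it simply prints each byte as two 4-bit groups, the way the paragraph does.

```python
def grouped_bits(data: bytes) -> str:
    """Write each byte as two 4-bit groups joined by '-'."""
    groups = []
    for byte in data:
        bits = format(byte, "08b")
        groups += [bits[:4], bits[4:]]
    return "-".join(groups)

print(grouped_bits("10".encode("ascii")))     # 0011-0001-0011-0000 (characters '1', '0')
print(grouped_bits((10).to_bytes(1, "big")))  # 0000-1010 (the number 10)
print(int("10") + 5, "10" + "5")              # 15 105: arithmetic vs. joining text
```

The last line shows why the type matters. Adding to the number 10 does arithmetic, while "adding" to the text "10" just glues characters together.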

