History of Computers - Shannon-Fano Coding

From SJS Wiki

Introduction

Shannon-Fano coding was the first form of lossless compression, compression in which no information is lost, so the original data can be reconstructed exactly. It was first described in Claude Shannon's "A Mathematical Theory of Communication," published in the Bell System Technical Journal in 1948.[1][2]

Overview

Consider a page containing 100 characters drawn from an alphabet of only A, B, C, D, and E, where A's frequency is 30%, B's is 26%, C's is 24%, D's is 11%, and E's is 9%. First, the letters are sorted by frequency and divided into two sets whose combined frequencies are as close to equal as possible. Set 1, composed of A and B (56%), is assigned 0 as the first digit, while set 2, composed of C, D, and E (44%), is assigned 1 as the first digit. This step is then repeated within each set until every letter is in a set by itself: A is assigned another 0 while B is assigned a 1; C is assigned a 0 while D and E are assigned another 1; finally, D is assigned a 0 while E is assigned another 1. The result is a binary code for each letter (A's code is 00, B's is 01, C's is 10, D's is 110, and E's is 111).[3] These codes can be used in place of the fixed three bits that would otherwise be needed for each letter of a five-letter alphabet.
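The splitting procedure above can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the cited sources; the function name and the "split where the halves are most nearly equal" rule are my own choices, but they reproduce the codes derived above for this alphabet.

```python
def shannon_fano(symbols):
    """Assign Shannon-Fano codes to (symbol, frequency) pairs.

    `symbols` must already be sorted by frequency, highest first.
    Returns a dict mapping each symbol to its binary code string.
    """
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return  # a single symbol: its code is complete
        total = sum(f for _, f in group)
        # Choose the split point that makes the two halves' totals
        # as close to equal as possible.
        running, best_i, best_diff = 0, 1, total
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)
            if diff < best_diff:
                best_i, best_diff = i, diff
        left, right = group[:best_i], group[best_i:]
        for s, _ in left:
            codes[s] += "0"   # first half gets a 0 digit
        for s, _ in right:
            codes[s] += "1"   # second half gets a 1 digit
        split(left)
        split(right)

    split(symbols)
    return codes

freqs = [("A", 30), ("B", 26), ("C", 24), ("D", 11), ("E", 9)]
print(shannon_fano(freqs))
# {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
```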

[Figure: diagram of the Shannon-Fano splitting algorithm][4]

This saves a considerable amount of space. Storing this page with a fixed-length three-bit code would require 300 bits, while Shannon-Fano coding requires only 220 bits,[5] reducing the size of the document by about 27%. The space is saved because the most frequent characters need only two bits instead of three. While this might not seem like much, at scale a 27% reduction can mean gigabytes or even terabytes of data saved.
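The totals can be checked with a short calculation. This sketch simply recomputes the bit counts from the frequencies and the codes derived in the example above:

```python
freqs = {"A": 30, "B": 26, "C": 24, "D": 11, "E": 9}
codes = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}

# A 5-letter alphabet needs 3 bits per character at fixed length.
fixed_bits = 3 * sum(freqs.values())
# Shannon-Fano: each character costs the length of its code.
fano_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
savings = 1 - fano_bits / fixed_bits

print(fixed_bits, fano_bits)   # 300 220
print(round(savings * 100))    # 27
```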

Significance

While it is not an optimal form of lossless compression (Huffman coding, developed shortly afterward, produces codes that are at least as short), Shannon-Fano coding was the first form of lossless compression, a field of information theory that grows more vital as the amount of data being stored keeps increasing. Lossless compression will play a large part in the future of computing, and Shannon-Fano coding pioneered it.

References

  1. http://www.britannica.com/biography/Claude-Shannon
  2. http://www.britannica.com/topic/information-theory#ref710425
  3. https://www.youtube.com/watch?v=dJCck1OgsIA - A YouTube video created by www.Stat-Lab.com, a website devoted to the study of statistics, that explains the method of Shannon-Fano coding
  4. https://en.wikipedia.org/wiki/Shannon%E2%80%93Fano_coding
  5. http://www.binaryessence.com/dct/en000046.htm - Note: this source provided an explanation of the method only

-Sebastian Varma Svarma (talk) 20:36, 14 September 2015 (CDT)