Image Compression WIP

Meta Description

What is image compression? Learn how binary numbers can be used to create and send graphical images over computer networks.

Learning Objectives

Understand how images are compressed in a computer system.

Exercise artistic and creative abilities.

Understand the basic concepts of the binary number system.

Learn how to use graphs to represent images.

Key Terms

Binary system
A number system which represents numbers using two digits: 0 and 1. Binary numbers are used in digital systems, such as computers.

Bit
A binary digit: a single 1 or 0 in the binary system.

Compression Ratio
This is a term used in compression to quantify the reduction in data after a file has been operated on by a compression algorithm. For example, a compression ratio of 1:700 means that the compressed file is 700 times smaller than the original file.

Data compression
A process which electronic devices such as computers use to transform large files into smaller files containing the same information.

Pixel
A picture element, the smallest element on a display screen.

Run-length coding
A lossless type of data compression where a run of repeated characters is stored as a single character together with a count of the number of times it is repeated. For example, AAA becomes 3A.

Step 1
Prepare a list of directions that tells the participant how to create the image on the graph paper. The directions take the form of lines of numbers that indicate how to colour in the paper. An empty box represents a binary zero (0) and a coloured-in box represents a binary one (1). The list has as many lines as there are rows on the paper, and each line contains a sequence of numbers describing how the corresponding row is to be filled in.

The first number in a line always represents the number of consecutive blank (white) boxes at the start of the row. The second number represents the number of consecutive black boxes that follow, and so on, alternating between white and black. For example, if a line contains the numbers 1,3,1, the first box should remain white, the next three boxes should be filled in black and the last box should remain white. If a line starts with a 0, the row begins with black boxes. For example, a line with 0,5 results in a row of 5 black boxes.

The list can be designed so that the blank and filled boxes form an image, such as a letter.
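When preparing the directions, the rule described in Step 1 can be checked with a short program. The following is a minimal Python sketch, not part of the original activity; the function name encode_row is chosen here purely for illustration. It turns one row of boxes (0 for white, 1 for black) into the list of numbers a participant would read.

```python
from itertools import groupby

def encode_row(bits):
    """Turn a row of boxes (0 = white, 1 = black) into run lengths.

    The first number always counts white boxes, so a row that starts
    with a black box gets a leading 0.
    """
    runs = [(value, len(list(group))) for value, group in groupby(bits)]
    lengths = [count for _, count in runs]
    if runs and runs[0][0] == 1:
        lengths.insert(0, 0)  # no white boxes before the first black run
    return lengths

print(encode_row([0, 1, 1, 1, 0]))  # [1, 3, 1]
print(encode_row([1, 1, 1, 1, 1]))  # [0, 5]
print(encode_row([0, 0, 0, 0, 0]))  # [5]
```

The leading 0 is what lets the reader assume that every line starts with white, so no extra marker is needed to say which colour comes first.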

Step 2
Present the audience with the sheet and explain how the graph paper needs to be filled in. Instruct participants to go through the list, reading the numbers associated with a given row in the paper and filling in the boxes accordingly.

• The size of the paper, the number of divisions and the design of the artwork depend on how complex the demonstrator would like the activity to be.
• Participants could create the instruction list themselves and use it to fill in the graph paper.
• Always recycle or reuse any paper after the activity.

Start by asking what computers use to generate pictures and images on a screen. The answer should be pixels, which is short for ‘picture element’. The computer uses binary coding to generate the necessary structure and colour for the picture. Consider a two-tone picture, for example, a black and white picture. Each pixel can be either black or white, represented by a 1 or a 0 respectively, which are binary digits. Hence, black and white pictures require a single bit of data to represent the colour of each pixel.

What does a white box represent?
A 0 bit.

What does a black box represent?
A 1 bit.

What if the picture has colours?
The number of bits used to represent the colour of each pixel must be increased.

Why is the image described using these lists of numbers instead of a series of 1s and 0s?
Compression: the lists describe the same image using less data.

How is this related to computer science?
Computers use this technique to store and transmit data more efficiently.

Each line gives the number of boxes that should or should not be filled in. An example can be used to show this graphically.

The letter A can be represented using the following lines:

5

1,3,1

1,1,1,1,1

1,1,1,1,1

1,3,1

1,1,1,1,1

1,1,1,1,1

5

5

5

The lines in the above script are compressed. Take the first line as an example: the 5 means that all five boxes should be left empty, as white pixels. The original uncompressed line would be 0, 0, 0, 0, 0, with one bit for each empty box. The number 5 can be written as 101 in binary, which takes 3 bits, whereas the uncompressed line requires 5 bits. Small savings like this on every line can add up to a substantial reduction in the size of an image file.

Another example would be the 1,3,1 in the second line. The numbers in the line represent that the first box should be left empty (white pixel), the next 3 boxes should be filled in using a black marker (black pixels) and the last box should be left empty (white pixel). The original uncompressed line would be 0, 1, 1, 1, 0 in this case.
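The two examples above can be reproduced by expanding each line of numbers back into boxes. The sketch below is an illustrative Python version of what a participant does by hand; the names decode_row, render and LETTER_A are chosen here and do not come from the original activity. It draws a black box as '#' and a white box as '.'.

```python
def decode_row(lengths):
    """Expand run lengths into boxes; runs alternate white (0) and black (1)."""
    bits = []
    colour = 0  # every line starts by counting white boxes
    for count in lengths:
        bits.extend([colour] * count)
        colour = 1 - colour  # switch between white and black
    return bits

# The ten lines of the letter A from the script above.
LETTER_A = [
    [5],
    [1, 3, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 3, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [5],
    [5],
    [5],
]

def render(rows):
    """Draw each decoded row with '#' for black and '.' for white."""
    return "\n".join(
        "".join("#" if bit else "." for bit in decode_row(row)) for row in rows
    )

print(decode_row([1, 3, 1]))  # [0, 1, 1, 1, 0]
print(render(LETTER_A))
```

Printing the rendered rows shows the shape of the letter, which is a quick way to check a list of directions before handing it to participants.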

Thus, image compression is important for transmission and storage. The more compressed the image the less time it takes for it to transfer from one computer to another and the less space it occupies on a hard drive, allowing for more files to be stored on the computer. On a large scale, this saves a lot of time and reduces costs spent on data storage.
(Source: https://www.youtube.com/watch?v=VsjpPs146d8)

Consider a two-tone picture, for example a black and white image. Each pixel can be either black or white, which can be represented in binary by either a 1 or a 0. Hence, black and white pictures require a single bit of data for each pixel. For a colourful picture, the number of bits used must be increased. For example, 8 bits can represent 256 different colours, since 2^8 = 256. The eight place values add up to 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 = 255, so the possible values run from 0 to 255.
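The relationship between bits per pixel and the number of available colours can be checked with a few lines of Python. This is a small illustrative sketch; the function name colour_count is an assumption made for this example.

```python
def colour_count(bits_per_pixel):
    """Number of distinct values that fit in the given number of bits."""
    return 2 ** bits_per_pixel

for bits in (1, 2, 4, 8):
    print(f"{bits} bit(s) per pixel -> {colour_count(bits)} colours")

# The eight place values of 8 bits add up to the largest value, 255,
# so the 256 possible values run from 0 to 255.
place_values = [2 ** position for position in range(8)]
print(place_values, sum(place_values))
```

Each extra bit doubles the number of colours a pixel can take, which is also why colour images need more data than two-tone images of the same size.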

An image could be described as a stream of 1s and 0s. However, describing each line in the compressed form shown in this activity produces exactly the same image using a smaller amount of data. This is a brief and basic example of data compression. It is essentially how fax machines work: a document is loaded into the fax machine and the machine transmits a sequence of lines through a modem to the recipient, whose fax machine then uses the lines to print an exact copy of the document.

Let us take the following line as an example which contains information related to an image:

BBBBBBBBBBBBBBBAABBBBBBBBBBBBBAAAABBBBBBABBBBBBBB

This line can be compressed in a simple line as follows:

15B2A13B4A6B1A8B

The compressed line describes the same piece of information in 16 characters instead of 49 characters, which is approximately a third of the original length. Real image files can contain millions of characters, so imagine the advantages gained by using compression algorithms, not just for images but also for text files and video files.
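The compression of this line can be sketched in Python as follows. This is a minimal illustration rather than a production encoder: the function names rle_encode and rle_decode are chosen here, and the decoder assumes that the original text contains no digits.

```python
import re
from itertools import groupby

def rle_encode(text):
    """Replace each run of a repeated character with its count and the character."""
    return "".join(f"{len(list(group))}{char}" for char, group in groupby(text))

def rle_decode(encoded):
    """Rebuild the original text (assumes the text itself contains no digits)."""
    pairs = re.findall(r"(\d+)(\D)", encoded)
    return "".join(char * int(count) for count, char in pairs)

# The example line, built from its runs: 15 B, 2 A, 13 B, 4 A, 6 B, 1 A, 8 B.
line = "B" * 15 + "A" * 2 + "B" * 13 + "A" * 4 + "B" * 6 + "A" + "B" * 8

encoded = rle_encode(line)
print(encoded)                        # 15B2A13B4A6B1A8B
print(len(line), "->", len(encoded))  # 49 -> 16
print(rle_decode(encoded) == line)    # True
```

Decoding returns exactly the original line, which is what makes this a lossless method. Note that a line with few repeats, such as ABAB, would grow to 1A1B1A1B, so this approach only pays off when the data contains long runs.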

This type of compression is known to programmers as ‘run-length coding’ and it is an effective way to reduce the amount of storage space required to store an image. Without compression, images would take longer to transmit and the huge file sizes would make it impractical to show images on web pages.

However, compression does have some disadvantages. When a user edits an image in photo editing software, such as Adobe Photoshop, it is best to work from a raw, uncompressed file. Raw files store all the data coming from the image sensor of the camera that took the picture, including information on brightness and white balance, amongst other things. Compressed image files typically do not keep this information, since leaving it out has little effect on the visible quality or detail of the finished image. Editing uncompressed files is easier because users can modify all the data coming from the image sensor.

Nowadays, using more complex algorithms and techniques, computers are able to compress images to a hundredth of their original size, which allows a larger number of images to be stored on a hard disk and uploaded to web pages.

Application
Image compression is important because of the growing need to store data more efficiently and to make data transfer over computer networks, such as the internet, faster. It may not seem like it, but image compression is all around us. For example, images on Facebook are compressed to keep the application running smoothly, which is why new posts seem to load almost instantaneously.

Research
A lossless compression algorithm identifies and eliminates statistical redundancy without removing any information from the file. Conversely, lossy compression also discards the less important information in order to shrink the file further. Research is currently under way to incorporate both types of compression into a single algorithm. This would result in a single file with elements of lossy and lossless compression, so that files become smaller and can be transferred more efficiently while still preserving all essential information.
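The difference between the two approaches can be illustrated with Python's standard zlib module, which performs lossless compression. In the sketch below, the pixel values are made-up greyscale numbers, and rounding each value down to a multiple of 16 is only a stand-in for a lossy step; real lossy image formats use far more sophisticated methods.

```python
import zlib

# Hypothetical greyscale pixel values (0 to 255), repeated to form a larger sample.
pixels = bytes([10, 12, 11, 200, 203, 201, 10, 12] * 100)

# Lossless: decompressing returns exactly the original bytes.
packed = zlib.compress(pixels)
print(zlib.decompress(packed) == pixels)  # True

# Lossy (illustrative only): round each value down to a multiple of 16
# before compressing. The fine detail is discarded and cannot be recovered.
quantised = bytes(value // 16 * 16 for value in pixels)
packed_lossy = zlib.compress(quantised)
restored = zlib.decompress(packed_lossy)
print(restored == pixels)  # False

print(len(pixels), len(packed), len(packed_lossy))
```

The lossless round trip gives back every original value, while the lossy version returns a close approximation in which similar shades have been merged.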

A genome sequence is typically stored as a 3 GB text-based file. As DNA research becomes more prominent and developed, the need to store these files more efficiently becomes more important. This article discusses how JDNA has been implemented to compress genome sequencing files. JDNA is a free, open-source Java tool which has been compared to other state-of-the-art tools such as FRESCO, performing to a similar standard and, in some aspects, faster than the norm. The compression ratio reached by these algorithms is on average about 1:700.

Participants could be asked to create their own images and translate them into lines of compressed data which can then be passed on to other students who would try to recreate the images from the sequence of numbers given to them.

Education

Time Required

• ~30 minutes

• Preparation: 10 minutes

• Conducting: 15 minutes

• Clean Up: 2 minutes

Cost

Recommended Age

Number of People

Supervision

Materials

A sheet of paper divided into boxes. Graph paper can be used, but a more specific artwork requires custom-made paper with a specific number of divisions.

Black marker

Contributors

Sources

Colour by Numbers—Image Representation

Compression (Video)

Run-Length Coding (Video)

Encoding images (Beginner)

How Image Compression Works: The Basics (Beginner)

Pixel (Beginner)

Cite this Experiment