M4JPEG - Free and Open Source Steganography Software | A detailed look at Steganography

Information Hiding
and its Applications
Steganography and Watermarking
A detailed look at Steganography
   ◦ Text Steganography
   ◦ Hypertext Steganography
   ◦ Audio Steganography
   ◦ Image Steganography
   ◦ Steganography in Open System
Image Steganography Techniques
   ◦ Spatial Domain LSB Insertion
   ◦ Masking and Filtering
   ◦ DCT-based Steganography
   ◦ Wavelet-based Steganography
How to Detect Steganography
   ◦ Blind Detection
   ◦ Analytical Detection

A detailed look at Steganography

Lets look at what a theoretically perfect secret communication (Steganography) would consist of. To illustrate this concept, we will use three fictitious characters named Amy, Bret and Crystal. Amy wants to send a secret message (M) to Bret using a cover (C) which can be sent to Bret without raising suspicion. Amy then changes the cover message (C) to a stego-object (S) by embedding the secret message (M) into the cover message (C) by using a stego-key (K). Amy should then be able to send the stego object (S) to Bret without being detected by Crystal. Bret will then be able to read the secret message (M) because he knows the stego-key (K) used to embed it into the cover message (C).

In a perfect system, a normal cover should not be distinguishable from a stego-object, neither by a human nor by a computer looking for statistical patterns. In practice, however, this is not always the case. In order to embed secret data into a cover message, the cover must contain a sufficient amount of redundant data or noise. This is because the embedding process Steganography replaces this redundant data with the secret message. This limits the types of data that we can use with Steganography.

In practice, there are basically three types of steganographic protocols used. They are Pure Steganography, Secret Key Steganography and Public Key Steganography. Pure Steganography is defined as a steganographic system that does not require the exchange of a cipher such as a stego-key. This method of Steganography is the least secure means by which to communicate secretly because the sender and receiver can rely only upon the presumption that no other parties are aware of this secret message. Secret Key Steganography is defined as a steganographic system that requires the exchange of a secret key (stego-key) prior to communication. Secret Key Steganography takes a cover message and embeds the secret message inside of it by using a secret key (stego-key). Only the parties who know the secret key can reverse the process and read the secret message.

Public Key Steganography takes the concepts from Public Key Cryptography as explained below. Public Key Steganography is defined as a steganographic system that uses a public key and a private key to secure the communication between the parties wanting to communicate secretly. The sender will use the public key during the encoding process and only the private key, which has a direct mathematical relationship with the public key, can decipher the secret message. Public Key Steganography provides a more robust way of implementing a steganographic system because it can utilize a much more robust and researched technology in Public Key Cryptography.

Text-based steganography:

Encoding secret messages in text can be a very challenging task. This is because text files have a very small amount of redundant data to replace with a secret message. Another drawback is the ease of which text based Steganography can be altered by an unwanted parties by just changing the text itself or reformatting the text to some other form (from .TXT to .PDF, etc.). There are numerous methods by which to accomplish text based Steganography. Below a few of the more popular encoding methods are introduced.

Line-shift encoding involves actually shifting each line of text vertically up or down by as little as 3 centimeters. Depending on whether the line was up or down from the stationary line would equate to a value that would or could be encoded into a secret message.

Word-shift encoding works in much the same way that line-shift encoding works; only we use the horizontal spaces between words to equate a value for the hidden message. This method of encoding is less visible than line-shift encoding but requires that the text format support variable spacing.

Feature specific encoding involves encoding secret messages into formatted text by changing certain text attributes such as vertical/horizontal length of letters such as b, d, T, etc. All three of these text-based encoding methods require either the original file or the knowledge of the original files formatting to be able to decode the secret message.

hyper text-based steganography:

Steganography with hypertext can also be done a variety of different ways, like the text files the hyper text files have a small amount of redundant data to replace with a secret message so encoding secret messages in hyper text can be a very challenging task so it has the same drawbacks of the text-based steganography.

There are numerous methods by which to accomplish hyper text-based Steganography, we have the same methods that have been mentioned in text-based steganography, line-shift encoding and word-shift encoding. In addition hiding data could be performed by use of Hypertext comment notations that can take place within the hypertext file. In other hand we can hide secret messages in the web visible page by a special Arrangement of the contents on a given web page or by the presence or absence of content elements like images and phrases.

Audio-based Steganography:

Encoding secret messages in audio is the most challenging technique to use when dealing with Steganography. This is because the human auditory system (HAS) has such a dynamic range that it can listen over. To put this in perspective, the (HAS) perceives over a range of power greater than one million to one and a range of frequencies greater than one thousand to one making it extremely hard to add or remove data from the original data structure. The only weakness in the (HAS) comes at trying to differentiate sounds (loud sounds drown out quiet sounds) and this is what must be exploited to encode secret messages in audio without being detected.

There are two concepts to consider before choosing an encoding technique for audio. They are the digital format of the audio and the transmission medium of the audio. There are three main digital audio formats typically in use. They are Sample Quantization, Temporal Sampling Rate and Perceptual Sampling.

Sample Quantization which is a 16-bit linear sampling architecture used by popular audio formats such as (WAV and AIFF). Temporal Sampling Rate uses selectable frequencies (in the KHz) to sample the audio. Generally, the higher the sampling rate is, the higher the usable data space gets. The last audio format is Perceptual Sampling. This format changes the statistics of the audio drastically by encoding only the parts the listener perceives, thus maintaining the sound but changing the signal. This format is used by the most popular digital audio on the Internet today in ISO MPEG (MP3).

Transmission medium (path the audio takes from sender to receiver) must also be considered when encoding secret messages in audio. There are four possible transmission mediums:
1) Digital end to end - from machine to machine without modification.
2) Increased/decreased resampling - the sample rate is modified but remains digital.
3) Analog and resampled - signal is changed to analog and resampled at a different rate.
4) Over the air - signal is transmitted into radio frequencies and resampled from a microphone.

We will now look at three of the more popular encoding methods for hiding data inside of audio. They are low-bit encoding, phase-coding and spread spectrum:

• Low-bit encoding embeds secret data into the least significant bit (LSB) of the audio file. This method is easy to incorporate but is very susceptible to data loss due to channel noise and resampling.

• Phase coding substitutes the phase of an initial audio segment with a reference phase that represents the hidden data. This can be thought of, as sort of an encryption for the audio signal by using what is known as Discrete Fourier Transform (DFT), which is nothing more than a transformation algorithm for the audio signal.

• Spread spectrum encodes the audio over almost the entire frequency spectrum. It then transmits the audio over different frequencies which will vary depending on what spread spectrum method is used. Direct Sequence Spread Spectrum (DSSS) is one such method that spreads the signal by multiplying the source signal by some pseudo random sequence known as a (CHIP). The sampling rate is then used as the chip rate for the audio signal communication. Spread spectrum encoding techniques are the most secure means by which to send hidden messages in audio, but it can introduce random noise to the audio thus creating the chance of data loss.

Image-based Steganography:

Coding secret messages in digital images is by far the most widely used of all methods in the digital world of today? This is because it can take advantage of the limited power of the human visual system (HVS). Almost any plain text, cipher text, image and any other media that can be encoded into a bit stream can be hidden in a digital image. With the continued growth of strong graphics power in computers and the research being put into image based Steganography, this field will continue to grow at a very rapid pace.

Before diving into coding techniques for digital images, a brief explanation of digital image architecture and digital image compression techniques should be explained. To a computer, an image is an array of numbers that represent light intensities at various points, or pixels. These pixels make up the images raster data. When dealing with digital images for use with Steganography, 8-bit and 24-bit per pixel image files are typical. Both have advantages and disadvantages, as we will explain below. 8-bit images are a great format to use because of their relatively small size. The drawback is that only 256 possible colors can be used which can be a potential problem during encoding. Usually a gray scale color palette is used when dealing with 8-bit images such as (GIF) because its gradual change in color will be harder to detect after the image has been encoded with the secret message. 24-bit images offer much more flexibility when used for Steganography. The large numbers of colors (over 16 million) that can be used go well beyond the human visual system (HVS), which makes it very hard to detect once a secret message, has been encoded. The other benefit is that a much larger amount of hidden data can be encoded into a 24-bit digital image as opposed to an 8-bit digital image. The one major drawback to 24-bit digital images is their large size (usually in MB) makes them more suspect than the much smaller 8-bit digital images (usually in KB) when sent over an open system such as the Internet.

Digital image compression is a good solution to large digital images such as the 24-bit images mentioned earlier. There are two types of compression used in digital images, lossy and lossless. Lossy compression such as (JPEG) greatly reduces the size of a digital image by removing excess image data and calculating a close approximation of the original image. Lossy compression is usually used with 24-bit digital images to reduce its size, but it does carry one major drawback. Lossy compression techniques increase the possibility that the uncompressed secret message will lose parts of its contents because of the fact that lossy compression removes what it sees as excess image data. Lossless compression techniques, as the name suggests, keeps the original digital image in tact without the chance of loss. It is for this reason that it is the compression technique of choice for steganographic uses. Examples of lossless compression techniques are (GIF and BMP). The only drawback to lossless image compression is that it doesn't do a very good job at compressing the size of the image data.

We will now discuss a couple of the more popular digital image encoding techniques used today. They are least significant bit (LSB) encoding and masking and filtering techniques:

Least significant bit (LSB) encoding is by far the most popular of the coding techniques used for digital images. By using the LSB of each byte (8 bits) in an image for a secret message, you can store 3 bits of data in each pixel for 24-bit images and 1 bit in each pixel for 8-bit images. As you can see, much more information can be stored in a 24-bit image file. Depending on the color palette used for the cover image (i.e., all gray), it is possible to take 2 LSB's from one byte without the human visual system (HVS) being able to tell the difference. The only problem with this technique is that it is very vulnerable to attacks such as image changes and formatting (changing from .GIF to .JPEG).

Masking and filtering techniques for digital image encoding such as Digital Watermarking (integrating a companies logo on there web content) are more popular with lossy compression techniques such as (JPEG). This technique actually extends an image data by masking the secret data over the original data as opposed to hiding information inside of the data. Some experts argue that this is definitely a form of Information Hiding, but not technically Steganography. The beauty of Masking and filtering techniques are that they are immune to image manipulation which makes there possible uses very robust.

There are many other techniques that use complex algorithms, image transformation techniques and image encryption techniques are still relatively new, but show promise to be more secure and robust ways to use digital images in Steganography.

Steganography in an open system:

In this section we will look at some of the possible applications for steganography. The three most popular and researched uses for steganography in an open systems environment are covert channels, embedded data and digital watermarking.

Covert channels in TCP/IP involve masking identification information in the TCP/IP headers to hide the true identity of one or more systems. This can be very useful for any secure communications needs over open systems such as the Internet when absolute secrecy is needed for an entire communication process and not just one document as mentioned next. Using containers (cover messages) to embed secret messages into is by far the most popular use of Steganography today. This method of Steganography is very useful when a party must send a top secret, private or highly sensitive document over an open systems environment such as the Internet. By embedding the hidden data into the cover message and sending it, you can gain a sense of security by the fact that no one knows you have sent more than a harmless message other than the intended recipients.

Although not a pure steganographic technique, digital watermarking is very common in Today’s world and does use Steganographic techniques to embed information into documents. Digital watermarking is usually used for copy write reasons by companies or entities that wish to protect their property by either embedding their trademark into their property or by concealing serial numbers/license information in software, etc. Digital watermarking is very important in the detection and prosecution of software pirates and digital thieves.

Reference:
• A Detailed Look at Steganographic Techniques and their Use in an Open-Systems Environment, by Bret Dunbar, January 2002.