A detailed look at Steganography
Let's look at what a theoretically perfect secret
communication (Steganography) would consist of.
To illustrate this concept, we will use three
fictitious characters named Amy, Bret and
Crystal. Amy wants to send a secret message (M)
to Bret using a cover (C) which can be sent to
Bret without raising suspicion. Amy then changes
the cover message (C) to a stego-object (S) by
embedding the secret message (M) into the cover
message (C) by using a stego-key (K). Amy should
then be able to send the stego-object (S) to
Bret without being detected by Crystal. Bret
will then be able to read the secret message (M)
because he knows the stego-key (K) used to embed
it into the cover message (C).
In a perfect system, a normal cover should not be
distinguishable from a stego-object, neither by
a human nor by a computer looking for
statistical patterns. In practice, however, this
is not always the case. In order to embed secret
data into a cover message, the cover must
contain a sufficient amount of redundant data or
noise. This is because the embedding process
replaces this redundant data with
the secret message. This limits the types of
data that we can use with Steganography.
In practice, there are basically three types of
steganographic protocols used. They are Pure
Steganography, Secret Key Steganography and
Public Key Steganography. Pure Steganography is
defined as a steganographic system that does not
require the prior exchange of a secret such as a
stego-key. This method of Steganography is the
least secure means by which to communicate
secretly because the sender and receiver can
rely only upon the presumption that no other
parties are aware of this secret message.
Secret Key Steganography is defined as a
steganographic system that requires the exchange
of a secret key (stego-key) prior to
communication. Secret Key Steganography takes a
cover message and embeds the secret message
inside of it by using a secret key (stego-key).
Only the parties who know the secret key can
reverse the process and read the secret message.
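The role of the stego-key can be illustrated with
a minimal Python sketch. It assumes the cover is a
plain byte string, that the key seeds a
pseudo-random choice of byte positions, and that
the receiver already knows the message length.
The function names are hypothetical, and
random.Random is not cryptographically secure, so
this shows the idea only and is not a hardened
scheme.

```python
import random

def _positions(key, cover_len, nbits):
    # Sender and receiver derive the same byte positions from the key.
    # random.Random is NOT cryptographically secure (illustration only).
    return random.Random(key).sample(range(cover_len), nbits)

def embed(cover, message, key):
    # Turn the message into a list of bits, most significant bit first.
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover too small for message")
    stego = bytearray(cover)
    for pos, bit in zip(_positions(key, len(cover), len(bits)), bits):
        stego[pos] = (stego[pos] & 0xFE) | bit  # overwrite the LSB only
    return bytes(stego)

def extract(stego, msg_len, key):
    # Revisit the same positions in the same order and collect the LSBs.
    positions = _positions(key, len(stego), msg_len * 8)
    bits = [stego[pos] & 1 for pos in positions]
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

cover = bytes(range(256))                        # stand-in cover data (C)
stego = embed(cover, b"hi Bret", "shared-stego-key")
recovered = extract(stego, 7, "shared-stego-key")
```

Without the key, an observer does not know which
bytes carry the message or in what order to read
them.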
Public Key Steganography takes the concepts from
Public Key Cryptography as explained below.
Public Key Steganography is defined as a
steganographic system that uses a public key and
a private key to secure the communication
between the parties wanting to communicate
secretly. The sender will use the public key
during the encoding process and only the private
key, which has a direct mathematical
relationship with the public key, can decipher
the secret message. Public Key Steganography
provides a more robust way of implementing a
steganographic system because it can build on
the mature, well-researched technology of
Public Key Cryptography.
Text-based steganography:
Encoding secret messages in text can be a very
challenging task. This is because text files
have a very small amount of redundant data to
replace with a secret message. Another drawback
is the ease with which text-based Steganography
can be destroyed by an unwanted party, simply by
changing the text itself or converting the
text to some other format (from .TXT to .PDF,
etc.). There are numerous methods by which to
accomplish text-based Steganography. A few
of the more popular encoding methods are
introduced below.
Line-shift encoding involves shifting
each line of text vertically up or down by a
very small distance (published schemes use
shifts on the order of 1/300 of an inch).
Whether a line sits above or below its
unshifted reference position encodes one bit
of the secret message.
Word-shift encoding works in much the same way
as line-shift encoding, except that the
horizontal spacing between words is varied to
encode the hidden message. This method of
encoding is less visible than line-shift
encoding but requires that the text format
support variable spacing.
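As a rough plain-text approximation of
word-shift encoding, the sketch below encodes
each bit as one space (0) or two spaces (1)
between consecutive words. Real word-shift
schemes move words by fractions of a space in
formatted documents; the whole-space variant and
the function names here are assumptions made to
keep the example self-contained.

```python
import re

def embed_word_shift(cover_text, bits):
    words = cover_text.split()
    if len(bits) > len(words) - 1:
        raise ValueError("cover text has too few word gaps")
    pieces = [words[0]]
    for i, word in enumerate(words[1:]):
        # Gap i carries bit i; gaps past the message stay at one space.
        gap = "  " if i < len(bits) and bits[i] == "1" else " "
        pieces.append(gap + word)
    return "".join(pieces)

def extract_word_shift(stego_text, nbits):
    # Measure every run of spaces; a double space means 1, a single means 0.
    gaps = re.findall(r" +", stego_text)
    return "".join("1" if len(g) == 2 else "0" for g in gaps[:nbits])

cover_text = "Amy sends Bret a harmless looking note today"
stego_text = embed_word_shift(cover_text, "10110")
decoded = extract_word_shift(stego_text, 5)
```

Any tool that normalizes whitespace destroys the
message, which reflects the fragility of
text-based methods described earlier.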
Feature-specific encoding involves encoding
secret messages into formatted text by changing
certain text attributes, such as the vertical
or horizontal length of letters such as b,
d, T, etc. All three of these text-based
encoding methods require either the original
file or knowledge of the original file's
formatting to be able to decode the secret
message.
Hypertext-based steganography:
Steganography with hypertext can also be done in
a variety of ways. Like plain text files,
hypertext files contain only a small amount of
redundant data to replace with a secret message,
so encoding secret messages in hypertext is
equally challenging and shares the drawbacks of
text-based steganography.
There are numerous methods by which to
accomplish hypertext-based Steganography. The
line-shift and word-shift encoding methods
described for text-based steganography apply
here as well. In addition, data can be hidden
in hypertext comments, which are stored in the
hypertext file but are not displayed by the
browser.
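Hiding data in hypertext comments can be
sketched as follows. The "sg:" marker, the
Base64 encoding and the function names are all
hypothetical choices for illustration. Base64 is
an encoding, not encryption, and anyone who
views the page source will see the comment.

```python
import base64
import re

MARK = "sg:"  # hypothetical tag so the receiver can find the right comment

def hide_in_comment(html, secret):
    payload = base64.b64encode(secret.encode("utf-8")).decode("ascii")
    comment = "<!-- " + MARK + payload + " -->"
    # Insert the comment just before </body>; browsers do not render it.
    return html.replace("</body>", comment + "</body>", 1)

def reveal_from_comment(html):
    pattern = r"<!-- " + re.escape(MARK) + r"([A-Za-z0-9+/=]+) -->"
    match = re.search(pattern, html)
    if match is None:
        return None  # no marked comment in this page
    return base64.b64decode(match.group(1)).decode("utf-8")

page = "<html><body><p>Welcome to my page.</p></body></html>"
stego_page = hide_in_comment(page, "meet at noon")
revealed = reveal_from_comment(stego_page)
```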
On the other hand, we can hide secret messages
in the visible web page itself, through a
special arrangement of the contents on the page
or through the presence or absence of content
elements such as images and phrases.
Audio-based Steganography:
Encoding secret messages in audio is the most
challenging technique to use when dealing with
Steganography. This is because the human
auditory system (HAS) is sensitive over such a
wide dynamic range. To put this in
perspective, the HAS perceives over a range of
power greater than one million to one and a
range of frequencies greater than one thousand
to one, making it extremely hard to add or
remove data from the original signal without
being noticed. The only weakness in the HAS is
its difficulty in differentiating sounds (loud
sounds drown out quiet sounds), and this is
what must be exploited to encode secret
messages in audio without being detected.
There are two concepts to consider before
choosing an encoding technique for audio. They
are the digital format of the audio and the
transmission medium of the audio.
There are three main digital audio formats
typically in use. They are Sample Quantization,
Temporal Sampling Rate and Perceptual Sampling.
Sample Quantization is a 16-bit linear
sampling architecture used by popular audio
formats such as WAV and AIFF. Temporal
Sampling Rate uses selectable frequencies (in
kHz) to sample the audio. Generally, the
higher the sampling rate is, the more
usable data space there is. The last audio
format is Perceptual Sampling. This format
changes the statistics of the audio drastically
by encoding only the parts the listener
perceives, thus maintaining the sound but
changing the signal. This format is used by the
most popular digital audio format on the
Internet today, ISO MPEG (MP3).
The transmission medium (the path the audio takes from
sender to receiver) must also be considered when
encoding secret messages in audio. There are
four possible transmission mediums:
1) Digital end to end - from machine to machine
without modification.
2) Increased/decreased resampling - the sample
rate is modified but remains digital.
3) Analog and resampled - signal is changed to
analog and resampled at a different rate.
4) Over the air - signal is played into the
air and resampled with a
microphone.
We will now look at three of the more popular
encoding methods for hiding data inside of
audio. They are low-bit encoding, phase-coding
and spread spectrum:
• Low-bit encoding embeds secret data into the
least significant bit (LSB) of each audio
sample. This method is easy to incorporate but
is very susceptible to data loss due to channel
noise and resampling.
• Phase coding substitutes the phase of an
initial audio segment with a reference phase
that represents the hidden data. This can be
thought of as a sort of encryption of the audio
signal, carried out using the Discrete Fourier
Transform (DFT), which is nothing more than a
transformation algorithm for the audio signal.
• Spread spectrum encodes the audio over almost
the entire frequency spectrum. It then transmits
the audio over different frequencies, which
vary depending on which spread spectrum method
is used. Direct Sequence Spread Spectrum (DSSS)
is one such method; it spreads the signal by
multiplying the source signal by a
pseudo-random sequence known as a chip. The
sampling rate is then used as the chip rate for
the audio signal communication. Spread spectrum
encoding techniques are the most secure means
by which to send hidden messages in audio, but
they can introduce random noise to the audio,
thus creating the chance of data loss.
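The fragility of low-bit encoding noted in the
list above can be seen in a small Python sketch.
It assumes the audio is available as a list of
signed 16-bit sample values. The sample values,
the function names and the "channel" that shifts
every sample by one quantization step are all
made up for illustration.

```python
def embed_lsb(samples, bits):
    if len(bits) > len(samples):
        raise ValueError("not enough samples for the message")
    stego = list(samples)
    for i, bit in enumerate(bits):
        # Clear the least significant bit, then set it to the message bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego

def extract_lsb(samples, nbits):
    return [s & 1 for s in samples[:nbits]]

samples = [0, 1200, -1200, 2500, -2500, 3100, -3100, 400]
bits = [1, 0, 1, 1, 0, 0, 1, 0]

stego = embed_lsb(samples, bits)
clean = extract_lsb(stego, len(bits))      # read back over a perfect channel

# Crude stand-in for channel noise: every sample is off by one step.
noisy = [s + 1 for s in stego]
damaged = extract_lsb(noisy, len(bits))    # every recovered bit is flipped
```

A change of one quantization step is inaudible,
yet it is enough to flip every hidden bit, which
is why this method suits only the digital
end-to-end medium.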
Image-based Steganography:
Coding secret messages in digital images is by
far the most widely used of all methods in the
digital world of today. This is because it can
take advantage of the limited power of the human
visual system (HVS). Almost any plain text,
cipher text, image and any other media that can
be encoded into a bit stream can be hidden in a
digital image. With the continued growth of
strong graphics power in computers and the
research being put into image based
Steganography, this field will continue to grow
at a very rapid pace.
Before diving into coding techniques for digital
images, a brief explanation of digital image
architecture and digital image compression
techniques is in order.
To a computer, an image is an array of numbers
that represent light intensities at various
points, or pixels. These pixels make up the
image's raster data. When dealing with digital
images for use with Steganography, 8-bit and
24-bit per pixel image files are typical. Both
have advantages and disadvantages, as we will
explain below. 8-bit images are a great format
to use because of their relatively small size.
The drawback is that only 256 possible colors
can be used which can be a potential problem
during encoding. Usually a grayscale color
palette is used when dealing with 8-bit images
such as GIF because its gradual change in
color makes changes harder to detect after the
image has been encoded with the secret message. 24-bit
images offer much more flexibility when used for
Steganography. The large number of colors (over
16 million) that can be used goes well beyond
what the human visual system (HVS) can
distinguish, which makes changes very hard to
detect once a secret message has been
encoded. The other benefit is that a much larger
amount of hidden data can be encoded into a
24-bit digital image as opposed to an 8-bit
digital image. The one major drawback to 24-bit
digital images is that their large size (usually
in MB) makes them more suspect than the much
smaller 8-bit digital images (usually in KB)
when sent over an open system such as the
Internet.
Digital image compression is a good solution to
large digital images such as the 24-bit images
mentioned earlier. There are two types of
compression used in digital images: lossy and
lossless. Lossy compression such as JPEG
greatly reduces the size of a digital image by
removing excess image data and calculating a
close approximation of the original image. Lossy
compression is usually used with 24-bit digital
images to reduce their size, but it does carry
one major drawback. Lossy compression techniques
increase the possibility that the hidden
secret message will lose parts of its contents,
because lossy compression
removes what it sees as excess image data.
Lossless compression techniques, as the name
suggests, keep the original digital image
intact without the chance of loss. It is for
this reason that lossless compression is the
technique of choice for steganographic uses.
Examples of lossless image formats are GIF and
BMP. The only drawback to lossless image
compression is that it doesn't do a very good
job of reducing the size of the image data.
We will now discuss a couple of the more popular
digital image encoding techniques used today.
They are least significant bit (LSB) encoding
and masking and filtering techniques:
Least significant bit (LSB) encoding is by far
the most popular of the coding techniques used
for digital images. By using the LSB of each
byte (8 bits) in an image for a secret message,
you can store 3 bits of data in each pixel for
24-bit images and 1 bit in each pixel for 8-bit
images. As you can see, much more information
can be stored in a 24-bit image file. Depending
on the color palette used for the cover image
(e.g., all gray), it is possible to take the 2
LSBs from each byte without the human visual
system (HVS) being able to tell the difference.
The only problem with this technique is that it
is very vulnerable to attacks such as image
manipulation and format conversion (changing
from .GIF to .JPEG).
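The 3-bits-per-pixel figure for 24-bit images
can be made concrete with a short Python sketch.
It assumes the image is already decoded into a
list of (R, G, B) tuples. No image library is
used, and the function names are hypothetical.

```python
def capacity_bits(width, height, channels=3):
    # One hidden bit per color byte: 3 bits per 24-bit pixel.
    return width * height * channels

def embed_pixels(pixels, message):
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    flat = [c for px in pixels for c in px]   # R, G, B, R, G, B, ...
    if len(bits) > len(flat):
        raise ValueError("image too small for message")
    for i, bit in enumerate(bits):
        flat[i] = (flat[i] & 0xFE) | bit      # replace the LSB only
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]

def extract_pixels(pixels, msg_len):
    flat = [c for px in pixels for c in px]
    out = bytearray()
    for i in range(msg_len):
        byte = 0
        for c in flat[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | (c & 1)
        out.append(byte)
    return bytes(out)

cover_pixels = [(200, 120, 64)] * 16          # a 4 x 4 single-color image
stego_pixels = embed_pixels(cover_pixels, b"Hi")
hidden = extract_pixels(stego_pixels, 2)
```

By the same arithmetic, a 1024 x 768 image in
24-bit color offers 2,359,296 bits, or 294,912
bytes, of hiding space, while no color value
changes by more than 1 out of 255.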
Masking and filtering techniques for digital
image encoding, such as Digital Watermarking
(integrating a company's logo into their web
content), are more popular with lossy
compression formats such as JPEG. This technique
actually extends the image data by masking the
secret data over the original data as opposed to
hiding information inside of the data. Some
experts argue that this is definitely a form of
Information Hiding, but not technically
Steganography. The beauty of masking and
filtering techniques is that they are largely
immune to image manipulation, which makes their
possible uses very robust.
There are many other techniques that use complex
algorithms, image transformation techniques and
image encryption techniques. These are still
relatively new, but show promise as more secure
and robust ways to use digital images in
Steganography.
Steganography in an open system:
In this section we will look at some of the
possible applications for steganography. The
three most popular and researched uses for
steganography in an open systems environment are
covert channels, embedded data and digital
watermarking.
Covert channels in TCP/IP involve masking
identification information in the TCP/IP headers
to hide the true identity of one or more
systems. This can be very useful for secure
communication over open systems such as
the Internet when secrecy is needed for
an entire communication process and not just
for one document, as discussed next.
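One commonly described way to build a covert
channel in TCP/IP headers is to carry data in a
header field that ordinary traffic fills with
arbitrary values. The Python sketch below packs
two message bytes into the 16-bit Identification
field of an IPv4 header. The choice of field is
an assumption, since the text above does not
name one, and the function names are
hypothetical. Nothing is sent on the network and
the checksum is left at zero for brevity.

```python
import socket
import struct

# Layout of the standard 20-byte IPv4 header without options.
IPV4_FMT = "!BBHHHBBH4s4s"

def build_header(src, dst, covert):
    if len(covert) != 2:
        raise ValueError("identification field holds exactly 16 bits")
    ident = int.from_bytes(covert, "big")
    return struct.pack(
        IPV4_FMT,
        (4 << 4) | 5,           # version 4, header length of 5 words
        0,                      # type of service
        20,                     # total length: header only in this sketch
        ident,                  # identification field carries covert bytes
        0,                      # flags and fragment offset
        64,                     # time to live
        socket.IPPROTO_TCP,     # protocol number
        0,                      # checksum: a real sender must compute it
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )

def read_covert(header):
    ident = struct.unpack(IPV4_FMT, header[:20])[3]
    return ident.to_bytes(2, "big")

header = build_header("192.0.2.1", "192.0.2.2", b"OK")
covert = read_covert(header)
```

A longer message would be spread over many
packets, two bytes at a time.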
Embedding secret messages into containers
(cover messages) is by far the most popular
use of Steganography today. This method of
Steganography is very useful when a party must
send a top secret, private or highly sensitive
document over an open systems environment such
as the Internet. By embedding the hidden data
into the cover message and sending it, you
gain a measure of security from the fact that
no one other than the intended recipients knows
you have sent more than a harmless message.
Although not a pure steganographic technique,
digital watermarking is very common in today's
world and does use steganographic techniques to
embed information into documents. Digital
watermarking is usually used for copyright
reasons by companies or entities that wish to
protect their property by either embedding their
trademark into their property or by concealing
serial numbers/license information in software,
etc. Digital watermarking is very important in
the detection and prosecution of software
pirates and digital thieves.
Reference:
• A Detailed Look at Steganographic Techniques
and their Use in an Open-Systems Environment, by
Bret Dunbar, January 2002.