IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 47, NO. 4, MAY 2001
Quantization Index Modulation: A Class of Provably Good Methods for Digital Watermarking and Information Embedding

Brian Chen, Member, IEEE, and Gregory W. Wornell, Senior Member, IEEE
Abstract: We consider the problem of embedding one signal (e.g., a digital watermark) within another "host" signal to form a third, "composite" signal. The embedding is designed to achieve efficient tradeoffs among the three conflicting goals of maximizing the information-embedding rate, minimizing the distortion between the host signal and composite signal, and maximizing the robustness of the embedding. We introduce new classes of embedding methods, termed quantization index modulation (QIM) and distortion-compensated QIM (DC-QIM), and develop convenient realizations in the form of what we refer to as dither modulation. Using deterministic models to evaluate digital watermarking methods, we show that QIM is "provably good" against arbitrary bounded and fully informed attacks, which arise in several copyright applications, and in particular, it achieves provably better rate-distortion-robustness tradeoffs than currently popular spread-spectrum and low-bit(s) modulation methods. Furthermore, we show that for some important classes of probabilistic models, DC-QIM is optimal (capacity-achieving) and regular QIM is near-optimal. These include both additive white Gaussian noise (AWGN) channels, which may be good models for hybrid transmission applications such as digital audio broadcasting, and mean-square-error-constrained attack channels that model private-key watermarking applications.

Index Terms: Data hiding, digital audio broadcasting, dither modulation, digital watermarking, hybrid transmission, information embedding, quantization index modulation (QIM), steganography.

Manuscript received June 14, 1999; revised September 20, 2000. This work was supported in part by the MIT Lincoln Laboratory Advanced Concepts Committee, the National Science Foundation under Grant CCR-0073520, Microsoft Research, and a National Defense Science and Engineering Graduate Fellowship. The authors are with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139 USA (e-mail: [email protected]; [email protected]). Communicated by T. E. Fuja, Associate Editor At Large. Publisher Item Identifier S 0018-9448(01)02834-6.

I. INTRODUCTION

A number of applications have emerged recently [1] that require the design of systems for embedding one signal, sometimes called an "embedded signal" or "watermark," within another signal, called a "host signal." The embedding must be done such that the embedded signal is "hidden," i.e., causes no serious degradation to its host. At the same time, the embedding must be robust to common degradations of the watermarked signal: the watermark must survive whenever the host signal does. In some applications these degradations are the result of benign processing and transmission; in other cases they result from deliberate attacks.

Several of these applications relate to copyright notification and enforcement for audio, video, and images that are distributed in digital formats. In these cases, the embedded signal either notifies a recipient of any copyright or licensing restrictions or inhibits or deters unauthorized copying. For example, this embedded signal could be a digital "fingerprint" that uniquely identifies the original purchaser of the copyrighted work. If illicit copies of the work were made, all copies would carry this fingerprint, thus identifying the owner of the copy from which all illicit copies were made. In another example, the embedded signal could either enable or disable copying by some duplication device that checks the embedded signal before proceeding with duplication. Such a system has been proposed for allowing a copy-once feature in digital video disc recorders [2]. Alternatively, a standards-compliant player could check the watermark before deciding whether or not to play the disc [3].

Other applications include automated monitoring of the airplay of advertisements on commercial radio broadcasts. Advertisers can embed a digital watermark within their ads and count the number of times the watermark occurs during a given broadcast period, thus ensuring that their ads are played as often as promised. In other applications, the embedded signal may be used for authentication of, or detection of tampering with, the host signal. For example, a digital signature could be embedded in a military map. A number of other
national security applications are described in [4] and include covert communication, sometimes called "steganography" or low-probability-of-detection communication, and so-called traitor tracing, a version of the digital fingerprinting application described above used for tracing the source of leaked information.

One final application for which the digital watermarking methods developed in this paper are well-suited is the backward-compatible upgrading of an existing communication system, an example of which is the so-called hybrid in-band on-channel digital audio broadcasting [5], [6]. In this application, one would like to simultaneously transmit a digital signal with existing analog (AM and/or FM) commercial broadcast radio without interfering with conventional analog reception. Thus, the analog signal is the host signal and the digital signal is the watermark. Since the embedding does not degrade the host signal too much, conventional analog receivers can demodulate the analog host signal. In addition,
next-generation digital receivers can decode the digital signal embedded within the analog signal, which may be all or part of a digital audio signal, an enhancement signal used to refine the analog signal, or simply supplemental information such as station identification or traffic information. More generally, the host signal in these hybrid transmission systems could be some other type of analog signal such as video [7], or even a digital waveform; for example, a digital pager signal could be embedded within a digital cellular telephone signal.

In general, designers of information-embedding systems for these kinds of applications seek to achieve high embedding rates with high levels of robustness and low levels of embedding-induced distortion. However, these three goals conflict in general. Thus, in this paper we characterize methods in terms of the efficiency with which they trade off rate, distortion, and robustness. For instance, for any minimum embedding-rate requirement and maximum acceptable level of embedding distortion, the more efficient an embedding method is, the higher the robustness that can be achieved.

A great many information-embedding algorithms have been proposed [1] in this still-emerging field. Some of the earliest proposed methods [8], [9], [7] employ a quantize-and-replace strategy: after first quantizing the host signal, these systems change the quantization value to embed information. A simple example of such a system is so-called low-bit(s) modulation (LBM), where the least significant bit(s) in the quantization of the host signal are replaced by a binary representation of the embedded signal. More recently, additive spread-spectrum-based methods, which embed information by linearly combining the host signal with a small pseudonoise signal that is modulated by the embedded signal, have received considerable attention in the literature as an alternative to LBM-type methods [10]-[13].
In this paper, we show that both LBM-type strategies and additive spread-spectrum are, in general, not good choices for most information-embedding and digital watermarking applications. As an alternative, this paper introduces a new class of information-embedding strategies we refer to as "quantization index modulation" (QIM) that is, in general, preferable and in many specific scenarios optimal. We further develop computationally efficient implementations of QIM in the form of what we refer to as "dither modulation." We evaluate both specific realizations of uncoded and coded QIM, and the asymptotic performance limits of coded QIM using information-theoretic analysis. Other emerging information-theoretic results on the digital watermarking problem are developed in, e.g., [14]-[20].

The specific organization of the paper is as follows. In Section II, we develop two useful equivalent models for the information-embedding problem. In Section III, we classify traditional approaches to this problem, and in the process identify some of their shortcomings. Section IV introduces the QIM class of embedding methods, and Section V develops practical realizations that are compared to corresponding implementations of traditional approaches. Next, Section VI establishes conditions under which different forms of QIM are optimal in an information-theoretic sense. We then evaluate the methods of this paper in the context of Gaussian models for unintentional attacks in Section VII, and in the context of some general intentional attack models in Section VIII. Finally, Section IX contains some concluding remarks.

Fig. 1. General information-embedding problem model. A message m is embedded in the host-signal vector x using some embedding function s(x; m). A perturbation vector n corrupts the composite signal s. The decoder extracts an estimate m-hat of m from the noisy channel output y.
II. PROBLEM MODEL

Two mathematically equivalent models for the information-embedding problem are useful in our development.
A. Distortion-Constrained Multiplexing Model

The information-embedding problem is naturally and generally described by Fig. 1. In this figure, there is a host-signal vector $x \in \mathbb{R}^N$ into which we wish to embed some information $m$.1 We wish to embed at a rate of $R_m$ bits per dimension (host-signal sample), so we can think of $m$ as an integer in the set $\{1, 2, \ldots, 2^{N R_m}\}$.

An embedding function maps the host signal $x$ and embedded information $m$ to a composite signal $s \in \mathbb{R}^N$ subject to some distortion constraint. Various distortion measures may be of interest, an example of which is the squared-error distortion

$D(s, x) = \frac{1}{N} \| s - x \|^2$   (1)

or its expectation $E[D(s, x)]$. The composite signal is subjected to various common signal processing manipulations such as lossy compression, addition of random noise, and resampling, as well as deliberate attempts to remove the embedded information. These manipulations occur in some channel, which produces an output signal $y \in \mathbb{R}^N$. For future convenience, we define a perturbation vector $n$ to be the difference $y - s$, as shown in Fig. 1; we consider cases of both signal-independent and signal-dependent perturbation vectors in this paper.

A decoder extracts, i.e., forms an estimate $\hat{m}$ of, the embedded information based on the channel output $y$. We focus primarily on the "host-blind" case of interest in most applications, where $x$ is not available to the decoder, in contrast to the "known-host" case, where the decoder can separately observe $x$. (See, e.g., [14]-[17] for information-theoretic treatments of some aspects of the known-host case.) Our interest is in decoders that produce reliable estimates whenever the channel is not too severe, where reliable means either that $\hat{m} = m$ deterministically or that $\Pr[\hat{m} \neq m] \le \epsilon$ for sufficiently small $\epsilon$. In

1The vector $x$ is any convenient representation of all or part of the host signal. In the case of a host image, it could be a vector of pixel values or discrete cosine transform (DCT) coefficients, for example. In the case of a host audio waveform, this vector could be a vector of samples, spectral parameters, or linear prediction coding (LPC) coefficients, for example.
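As a concrete illustration of the squared-error measure (1), the following minimal Python sketch (ours, not the authors') computes the per-dimension distortion between a host vector and a composite vector; the function name and sample values are illustrative:

```python
import numpy as np

def squared_error_distortion(s, x):
    """Per-dimension squared-error distortion D(s, x) = ||s - x||^2 / N, as in (1)."""
    s = np.asarray(s, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum((s - x) ** 2) / s.size)

# Example: a composite signal that perturbs each host sample by 0.5
x = np.zeros(4)
s = np.full(4, 0.5)
print(squared_error_distortion(s, x))  # 0.25
```

The expected distortion $E[D(s, x)]$ would simply average this quantity over the host and message distributions.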
such cases, the tolerable severity of the channel degradations is a measure of the robustness of an informationembedding system.
B. Equivalent Superchannel Model

An alternative representation of the model of Fig. 1 is shown in Fig. 2. The two models are equivalent since any embedding function $s(x, m)$ can be written as the sum of the host signal and a host-dependent distortion signal $e(x, m)$, i.e.,

$s(x, m) = x + e(x, m)$,

simply by defining the distortion signal to be $e(x, m) = s(x, m) - x$. Thus, one can view $e$ as the input to a superchannel that consists of the cascade of an adder and the true channel. The host signal $x$ is a state of this superchannel that is known at the encoder. The measure of distortion $D(s, x)$ between the composite and host signals maps onto a host-dependent measure of the size of the distortion signal $e$. For example, the squared-error distortion (1) equals the power of $e$, i.e., $\frac{1}{N} \| e \|^2$.

Fig. 2. Equivalent superchannel model for information embedding. The composite signal is the sum of the host signal, which is the state of the superchannel, and a host-dependent distortion signal.

Therefore, one can view information-embedding problems as power-limited communication over a superchannel with a state that is known at the encoder.2 As we will develop, this view will be convenient for determining achievable rate-distortion-robustness tradeoffs of various information-embedding and decoding methods.

2Cox et al. have also recognized that one may view watermarking as communications with side information known at the encoder [21].

C. Channel Models

In general, the channel model is either a characterization of the degradations that can actually occur to the composite signal, or alternatively, a description of the class of degradations to which the embedder and decoder must be robust, i.e., the system is designed to work against all degradations described by this particular model. The latter viewpoint is particularly useful in the context of intentional attacks. We consider both probabilistic and deterministic channel models. In the probabilistic case, we specify the channel input-output relationship in terms of the conditional probability law $p(y \mid s)$. Implicitly, this specification also describes the conditional probability law of the perturbation vectors $n$ against which the system must be robust, since $n = y - s$. In the deterministic case, the channel input-output relationship is described most generally in terms of the set of possible outputs for every given input, or equivalently, in terms of the set of tolerable perturbation vectors for every given input.

III. CLASSES OF EMBEDDING METHODS

An extremely large number of embedding methods have been proposed in the literature [22], [23], [1]. Broadly, for our purposes these can be divided into two classes: 1) host-interference nonrejecting methods and 2) host-interference rejecting methods.

Host-interference nonrejecting methods have the general property that the host signal effectively acts as a source of interference in the system, and generally result from system designs that do not allow the encoder in Fig. 2 to sufficiently exploit knowledge of the host signal $x$. The simplest of such methods have purely additive embedding functions of the form

$s(x, m) = x + w(m)$,   (2)

where $w(m)$ is typically a pseudonoise sequence. Such embedding methods are often referred to as additive spread-spectrum methods, and some of the earliest examples are described in [24], [25], [10], [26], [11], [12]. Typically, $w(m)$ takes the form

$w(m) = a(m)\, v$,   (3)

where $v$ is a unit-energy spreading vector and $a(m)$ is a scalar function of the message.3

It is often convenient to view additive spread-spectrum as perturbation of a projection. In particular, substituting (3) into (2), we obtain

$s(x, m) = x + a(m)\, v$,   (4)

which, when projected onto $v$ and using the fact that $v$ has unit energy, gives

$\tilde{s}(\tilde{x}, m) = \tilde{x} + a(m)$,   (5)

where $\tilde{x}$ is the corresponding projection of the host signal, i.e.,

$\tilde{x} = v^T x$.   (6)

Finally, substituting (5) back into (4) yields the composite-signal reconstruction from projections

$s(x, m) = x + \big( \tilde{s}(\tilde{x}, m) - \tilde{x} \big)\, v$.   (7)

From (2), we see that for this class of embedding methods, the host signal acts as additive interference that inhibits the decoder's ability to estimate $m$. Consequently, even in the absence of any channel perturbations ($y = s$), one can usually embed only a small amount of information. Thus, these methods are useful primarily when either the host signal is available at the decoder (as assumed in, e.g., [26]) or when the host-signal interference is much smaller than the channel interference.

3Technically, spread-spectrum systems (2) for which (3) applies are classified as amplitude-modulation additive spread-spectrum methods, but since there is no risk of confusion in this paper, we will use the term "additive spread-spectrum" to specifically mean those systems based on amplitude modulation.
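The additive spread-spectrum relations (2)-(7) can be sketched in a few lines of Python. This is an illustrative toy, not the paper's implementation: the dimension, the spreading vector, and the antipodal message mapping `a(m)` are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 8
v = rng.standard_normal(N)
v /= np.linalg.norm(v)          # unit-energy spreading vector, as in (3)

def a(m):
    """Hypothetical antipodal scalar message mapping a(m)."""
    return 1.0 if m == 1 else -1.0

def embed_ss(x, m):
    """Additive spread-spectrum embedding s = x + a(m) v, as in (2)-(4)."""
    return x + a(m) * v

x = rng.standard_normal(N)
s = embed_ss(x, m=1)

# Projection onto v, as in (5)-(6): s_tilde = x_tilde + a(m)
x_tilde = v @ x
s_tilde = v @ s
print(round(s_tilde - x_tilde, 6))  # 1.0, i.e., a(1)
```

Note that the correlator statistic `v @ s` equals `x_tilde + a(m)`: without knowledge of `x`, the host term `x_tilde` remains as additive interference, which is exactly the host-interference problem described above.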
Information-embedding systems can achieve host-interference rejection when knowledge of the host signal at the encoder is adequately exploited in the system design. Examples include LBM and, more generally, quantize-and-replace systems. In LBM systems, the least significant bit(s) in the binary representation of a host sample are simply replaced with message bits. A class of quantize-and-replace systems that we refer to as generalized LBM systems implement a vector generalization of this embedding strategy. Generalized LBM embedding functions are of the form

$s(x; m) = q(x) + d(m)$,   (8)

where $q(\cdot)$ represents the coarse quantizer that determines the most significant bits, and $d(m)$ is determined only by the (modulated) least significant bits. A defining characteristic of generalized LBM systems is that the embedding never alters the most significant bits of the host signal, which is expressed in terms of the constraint

$q(s(x; m)) = q(x)$.   (9)

Without loss of generality, we may assume that good generalized LBM quantizers are unbiased, i.e.,

$E[x - q(x)] = 0$.   (10)

One example of a generalized LBM system is that developed in [7], where LBM is effectively applied to a pseudorandom projection of the form (6). Thus, the embedding is of the form (7), where the projected composite signal is now of the form

$\tilde{s}(\tilde{x}; m) = q(\tilde{x}) + d(m)$,   (11)

with $q(\cdot)$ a uniform, scalar quantization function of step size $\Delta$ and $d(m)$ a perturbation value. It is convenient to think of this class of generalized LBM systems as "spread LBM" systems.

While generalized LBM systems are host-interference rejecting, they are unnecessarily constrained in a way that makes them generally inefficient and vulnerable to various classes of attacks, which in turn limits the range of applications for which they can be used. Avoiding these constraints in the process of developing optimal information-embedding systems naturally gives rise to a new and general class of host-interference-rejecting embedding methods called QIM, which we develop in the sequel.
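A toy sketch of a generalized LBM embedding of the form (8) with constraint (9) follows; the coarse step size and the two-level perturbation `d(m)` are our illustrative choices, not values from the paper:

```python
import numpy as np

DELTA = 1.0   # coarse quantizer step size (an illustrative assumption)

def q_coarse(x):
    """Coarse uniform quantizer q(.) determining the 'most significant bits'."""
    return DELTA * np.round(np.asarray(x, dtype=float) / DELTA)

def d(m):
    """Hypothetical perturbation encoding the modulated least significant
    bit; |d(m)| < DELTA/2, so the constraint q(s) = q(x) in (9) holds."""
    return -DELTA / 4 if m == 0 else DELTA / 4

def embed_lbm(x, m):
    """Generalized LBM embedding s(x; m) = q(x) + d(m), as in (8)."""
    return q_coarse(x) + d(m)

x = np.array([0.3, -1.2, 2.6])
s = embed_lbm(x, m=1)
# The embedding never alters the most significant bits: q(s) == q(x)
print(np.array_equal(q_coarse(s), q_coarse(x)))  # True
```

Because `d(m)` stays within half a coarse cell, requantizing the composite signal always recovers the same coarse value, which is exactly the defining constraint (9).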
IV. QUANTIZATION INDEX MODULATION

To develop the QIM concept, we begin by viewing the embedding function $s(x, m)$ as an ensemble of functions of $x$, indexed by $m$. We denote the functions in this ensemble as $s(x; m)$ to emphasize this view. If the embedding-induced distortion is to be small, each function in the ensemble must be close to an identity function in some sense, so that

$s(x; m) \approx x \quad \text{for all } m$.   (12)

That the system needs to be robust to perturbations suggests that the points in the range of one function in the ensemble should be far away in some sense from the points in the range of any other function. For example, one might desire at the very least that the ranges be nonintersecting. Otherwise, even in the absence of any perturbations, there will be some values of $s$ from which one will not be able to uniquely determine $m$. In fact, it is precisely this nonintersection property that leads to host-signal interference rejection.

The nonintersection property, along with the approximate-identity property (12), which suggests that the ranges of each of the functions "cover" the space of possible (or at least highly probable) host-signal values $x$, suggests that the functions be discontinuous. Quantizers are just such a class of discontinuous, approximate-identity functions. Thus, "QIM" refers to embedding information by first modulating an index or sequence of indices with the embedded information and then quantizing the host signal with the associated quantizer or sequence of quantizers.

Fig. 3. QIM for information embedding. The points marked with x's and o's belong to two different quantizers, each with its associated index. The minimum distance d_min measures the robustness to perturbations, and the sizes of the quantization cells, one of which is shown in the figure, determine the distortion. If m = 1, the host signal is quantized to the nearest x. If m = 2, the host signal is quantized to the nearest o.

Fig. 3 illustrates this QIM information-embedding technique. In this example, one bit is to be embedded, so that $m \in \{1, 2\}$. Thus, we require two quantizers, and their corresponding sets of reconstruction points are represented in Fig. 3 with x's and o's. If $m = 1$, the host signal is quantized with the x-quantizer, i.e., $s$ is chosen to be the x closest to $x$. If $m = 2$, the host signal is quantized with the o-quantizer.

As $x$ varies, the composite signal value $s$ varies from one x-point to another or from one o-point to another, but it never varies between an x-point and an o-point. Thus, even with an infinite-energy host signal, one can determine $m$ if the channel perturbations are not too severe. The x-points and o-points are both quantizer reconstruction points and signal constellation points,4 and we may view the design of QIM systems as the simultaneous design of an ensemble of source codes (quantizers) and channel codes (signal constellations).

Conveniently, properties of the quantizer ensemble can be related directly to the performance parameters of rate, distortion, and robustness. For example, the number of quantizers in the ensemble determines the information-embedding rate $R_m$. The sizes and shapes of the quantization cells determine the embedding-induced distortion, all of which arises from quantization

4One set of points, rather than one individual point, exists for each value of m.
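The two-quantizer picture of Fig. 3 can be sketched with a pair of offset uniform scalar quantizers. This is an illustrative instance only: the step size, the index set {0, 1}, and the DELTA/2 offset between the two reconstruction-point sets are our choices, not the paper's.

```python
import numpy as np

DELTA = 1.0   # quantization step size (an illustrative assumption)

def embed_qim(x, m):
    """Scalar QIM: quantize x with the quantizer indexed by m in {0, 1}.
    The two quantizers' reconstruction-point sets are offset by DELTA/2."""
    offset = m * DELTA / 2.0
    return DELTA * np.round((x - offset) / DELTA) + offset

def decode_qim(y):
    """Minimum-distance decoding in the spirit of (15): requantize y with
    each quantizer and pick the index whose points lie closest to y."""
    errs = [np.sum((y - embed_qim(y, m)) ** 2) for m in (0, 1)]
    return int(np.argmin(errs))

x = np.array([0.37, -2.81, 5.12])
s = embed_qim(x, m=1)
y = s + 0.2          # per-sample perturbation below DELTA / 4
print(decode_qim(y))  # 1
```

Since adjacent x-points and o-points here are DELTA/2 apart in each coordinate, any per-sample perturbation smaller than DELTA/4 leaves the output closer to the correct point set, illustrating how the cell geometry sets the robustness.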
error. Finally, for many classes of channels, the minimum distance

$d_{\min} \triangleq \min_{(i, j):\, i \neq j}\; \min_{x_1, x_2} \| s(x_1; i) - s(x_2; j) \|$   (13)

between the sets of reconstruction points of different quantizers in the ensemble effectively determines the robustness of the embedding.5 It is important to emphasize that, in contrast to the case where the host signal is known at the receiver, the minimum-distance decoder needs to choose from all reconstruction points of the quantizers, not just those corresponding to the actual host signal $x$. In particular, the minimum-distance decoder makes decisions according to the rule6

$\hat{m}(y) = \arg\min_{m}\; \min_{x} \| y - s(x; m) \|$.   (14)

If, as is often the case, the quantizers map $x$ to the nearest reconstruction point, then (14) can be rewritten as

$\hat{m}(y) = \arg\min_{m} \| y - q(y; m) \|$.   (15)

(While the minimum-distance decoder is especially convenient to implement and analyze, a variety of other potentially useful decoders are discussed in [27].) Intuitively, the minimum distance measures the size of perturbation vectors that can be tolerated by the system. For example, if the channel perturbations are bounded according to7

$\| n \|^2 \le N \sigma_n^2$,   (16)

then the minimum-distance decoder is guaranteed not to make an error as long as

$\frac{d_{\min}^2}{4} > N \sigma_n^2$.   (17)

In the case of a classical additive white Gaussian noise (AWGN) channel with a noise variance of $\sigma_n^2$, at high signal-to-noise ratio (SNR) the minimum distance also characterizes the error probability of the minimum-distance decoder [28],

$\Pr[\hat{m} \neq m] \approx Q\left( \frac{d_{\min}}{2 \sigma_n} \right)$,

where $Q(\cdot)$ is the Gaussian $Q$-function

$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt$.   (18)

5When the host signal is known at the decoder, as is the case in some applications of interest, the more natural minimum-distance measures are $d_{\min}(x) = \min_{(i,j):\, i \neq j} \| s(x; i) - s(x; j) \|$ or $d_{\min} = \min_{x} \min_{(i,j):\, i \neq j} \| s(x; i) - s(x; j) \|$.

6Alternatively, if the host signal $x$ is known at the decoder, $\hat{m}(y; x) = \arg\min_{m} \| y - s(x; m) \|$.

7We refer to this as the bounded perturbation channel and will revisit this deterministic channel in Section VIII-B1.

A. Distortion-Compensated QIM

Distortion compensation is a type of post-quantization processing that can improve the achievable rate-distortion-robustness tradeoffs of QIM methods. To see this, we begin by noting that for a fixed rate and a given quantizer ensemble, scaling8 all quantizers by $1/\alpha$, where $0 < \alpha \le 1$, increases $d_{\min}$ by a factor of $1/\alpha$, thereby increasing the robustness of the embedding. However, the embedding-induced distortion also increases by a factor of $1/\alpha^2$. Adding back a fraction $1 - \alpha$ of the quantization error to the quantization value removes, or compensates for, this additional distortion. The resulting embedding function is

$s(x; m) = q(x; m, \Delta/\alpha) + (1 - \alpha)\left[ x - q(x; m, \Delta/\alpha) \right]$,   (19)

where $q(x; m, \Delta/\alpha)$ is the $m$th quantizer of an ensemble whose reconstruction points have been scaled by $1/\alpha$, so that two reconstruction points separated by a distance $d$ before scaling are separated by a distance $d/\alpha$ after scaling. The first term in (19) represents normal QIM embedding. We refer to the second term as the distortion-compensation term.

The quantization error added back is a source of interference to the decoder. Typically, the probability density functions (pdfs) of the quantization error for all quantizers in the QIM ensemble are similar. Therefore, the distortion-compensation term in (19) is effectively statistically independent of $m$ and can be treated as independent noise. Thus, decreasing $\alpha$ leads to a greater minimum distance, but for a fixed embedding-induced distortion, the distortion-compensation interference at the decoder increases. One optimality criterion for choosing $\alpha$ is to maximize the following "SNR" at the decision device:

$\mathrm{SNR}(\alpha) = \frac{d_1^2}{(1 - \alpha)^2 D_s + \alpha^2 \sigma_n^2}$,

where this SNR is defined as the ratio between the squared minimum distance between quantizers and the total interference energy from both distortion-compensation interference and channel interference. Here, $d_1$ is the minimum distance when $\alpha = 1$ and is a characteristic of the particular quantizer ensemble, and $D_s$ is the embedding-induced distortion. One can easily verify that the optimal scaling parameter $\alpha$ that maximizes this SNR is

$\alpha = \frac{\mathrm{DNR}}{\mathrm{DNR} + 1}$,   (20)

where DNR is the (embedding-induced) distortion-to-noise ratio $D_s / \sigma_n^2$.

As we will see, suitably coded versions of this distortion-compensated QIM with precisely the parameter setting (20) also have important asymptotic optimality properties. Before developing these properties, let us first investigate some constraints

8If a reconstruction point is at $q$, it is "scaled" by $1/\alpha$ by moving it to $q/\alpha$.
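Distortion-compensated QIM per (19) and (20) can be sketched as follows. The two-quantizer scalar ensemble, the step size, and the noise variance are illustrative assumptions of ours, not values from the paper.

```python
import numpy as np

DELTA = 1.0   # base quantizer step size at alpha = 1 (an illustrative assumption)

def q_m(x, m, step):
    """mth quantizer of the scaled ensemble: a uniform quantizer with the
    given step, offset by step/2 for m = 1 (hypothetical two-quantizer ensemble)."""
    offset = m * step / 2.0
    return step * np.round((x - offset) / step) + offset

def embed_dcqim(x, m, alpha):
    """DC-QIM, as in (19): quantize with the ensemble scaled by 1/alpha,
    then add back the fraction (1 - alpha) of the quantization error."""
    q = q_m(x, m, DELTA / alpha)
    return q + (1.0 - alpha) * (x - q)

# Optimal scaling (20): alpha = DNR / (DNR + 1), with DNR = D_s / sigma_n^2
D_s = DELTA**2 / 12.0      # embedding-induced distortion of the base quantizer
sigma_n2 = 0.05            # assumed channel-noise variance
alpha = (D_s / sigma_n2) / (D_s / sigma_n2 + 1.0)

x = np.array([0.37, -2.81, 5.12])
s = embed_dcqim(x, m=1, alpha=alpha)

# Compensation keeps |s - x| within DELTA/2 per sample, even though the
# quantizer ensemble itself was coarsened to step DELTA / alpha:
print(np.all((s - x) ** 2 <= (DELTA / 2) ** 2))  # True
```

The compensation term exactly cancels the extra distortion of the coarsened quantizer (the residual error is `alpha` times the scaled quantizer's error), which is why the embedding distortion stays at the level of plain QIM with step DELTA.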
that are useful to impose on QIM systems to facilitate implementation.
V. DITHER MODULATION: AN IMPLEMENTATION OF QIM

A key aspect of the design of QIM systems involves the choice of practical quantizer ensembles for such systems, which we now explore. In the process, we obtain additional insights into the design, performance evaluation, and implementation of QIM embedding methods, particularly those of low complexity. A convenient structure to consider is that of so-called dithered quantizers [29], [30], which have the property that the quantization cells and reconstruction points of any given quantizer in the ensemble are shifted versions of the quantization cells and reconstruction points of any other quantizer in the ensemble. In non-watermarking contexts, the shifts typically correspond to pseudorandom vectors called dither vectors. For information-embedding purposes, the dither vector can be modulated with the embedded signal, i.e., each possible embedded signal maps uniquely onto a different dither vector $d(m)$. The host signal is quantized with the resulting dithered quantizer to form the composite signal. Specifically, we start with some base quantizer $q(\cdot)$, and the embedding function is

$s(x; m) = q(x + d(m)) - d(m)$.

We call this type of information embedding "dither modulation." We discuss several low-complexity realizations of such dither-modulation methods in the sequel.

A. Coded Binary Dither Modulation with Uniform Scalar Quantization

Coded binary dither modulation with uniform, scalar quantization is one such realization.9 We assume that $R_m \le 1$. The dither vectors in a coded binary dither modulation system are constructed as follows.

i) The $k = N R_m$ information bits $\{b_1, \ldots, b_k\}$ representing the embedded message $m$ are error-correction coded using a rate-$kL/N$ code to obtain a coded bit sequence $\{z_1, \ldots, z_{N/L}\}$. (In the uncoded case, $z_i = b_i$ and $L = 1/R_m$.) We divide the host signal $x$ into $N/L$ nonoverlapping blocks of length $L$ and embed the $i$th coded bit $z_i$ in the $i$th block, as described below.

ii) Two length-$L$ dither sequences $d[k; 0]$ and $d[k; 1]$ and one length-$L$ sequence of uniform, scalar quantizers with step sizes $\Delta_1, \ldots, \Delta_L$ are constructed with the constraint

$d[k; 1] = \begin{cases} d[k; 0] + \Delta_k/2, & d[k; 0] < 0 \\ d[k; 0] - \Delta_k/2, & d[k; 0] \ge 0 \end{cases} \qquad k = 1, \ldots, L.$   (21)

This constraint ensures that the two corresponding $L$-dimensional dithered quantizers are the maximum possible distance from each other. For example, a pseudorandom sequence of $\pm \Delta_k/4$ values and its negative satisfy this constraint. One could alternatively choose $d[k; 0]$ pseudorandomly with a uniform distribution over $[-\Delta_k/2, \Delta_k/2]$.10 Also, the two dither sequences need not be the same for each length-$L$ block.

iii) The $i$th block of $x$ is quantized with the dithered quantizer using the dither sequence $d[\cdot\,; z_i]$.

A detailed assessment of the complexity of this QIM realization is developed in [15], [27].

The minimum-distance properties of coded binary dither modulation are readily deduced. In particular, any two distinct coded bit sequences differ in at least $d_H$ places, where $d_H$ is the minimum Hamming distance of the error-correction code. For each of these blocks, the reconstruction points of the corresponding quantizers are shifted relative to each other by $\Delta_k/2$ in the $k$th dimension. Thus, the square of the minimum distance (13) over all $N$ dimensions is

$d_{\min}^2 = d_H \sum_{k=1}^{L} \big( d[k; 0] - d[k; 1] \big)^2 = d_H \sum_{k=1}^{L} \frac{\Delta_k^2}{4} = \frac{\gamma_c}{R_m} \cdot \frac{1}{L} \sum_{k=1}^{L} \frac{\Delta_k^2}{4}$,   (22)

where to obtain the second equality we have used (21), and where, in the third expression, $\gamma_c$ is the gain of the error-correction code,

$\gamma_c = \frac{kL}{N}\, d_H$,   (23)

i.e., the product of the code rate and its minimum Hamming distance. In the high signal-to-distortion ratio (SDR) regime of primary interest for high-fidelity applications, the quantization cells are sufficiently small that the host signal can be modeled as uniformly distributed within each cell. In this case, the expected squared-error distortion of a uniform, scalar quantizer with step size $\Delta_k$ is the familiar

$E\big[ (x - q(x))^2 \big] = \frac{\Delta_k^2}{12}$.   (24)

Thus, the overall average expected distortion (1) is

$D_s = \frac{1}{L} \sum_{k=1}^{L} \frac{\Delta_k^2}{12}$.   (25)

Combining (22) and (25) yields the "distortion-normalized" squared minimum distance

$d_{\mathrm{norm}}^2 \triangleq \frac{d_{\min}^2}{D_s} = \frac{3 \gamma_c}{R_m}$,   (26)

9By scalar quantization, we mean that the high-dimensional base quantizer $q(\cdot)$ is the Cartesian product of scalar quantizers.

10A uniform distribution for the dither sequence implies that the quantization error is statistically independent of the host signal and leads to fewer "false contours," both of which are generally desirable properties from a perceptual viewpoint [29].
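Steps i)-iii) for a single block (in the uncoded case, with no error-correction code) might be sketched as follows; the block length, the per-dimension step sizes, and the random dither draws are our illustrative choices:

```python
import numpy as np

L = 4                                     # block length (one coded bit per block)
DELTA = np.array([1.0, 0.8, 1.2, 1.0])    # per-dimension step sizes (assumption)

rng = np.random.default_rng(1)
# Dither for bit 0 drawn uniformly over [-DELTA/2, DELTA/2]; dither for
# bit 1 derived from it via the constraint (21).
d0 = rng.uniform(-DELTA / 2, DELTA / 2)
d1 = d0 + np.where(d0 < 0, DELTA / 2, -DELTA / 2)

def embed_block(x, z):
    """Dithered uniform quantization of one length-L block, step iii):
    s = q(x + d(z)) - d(z), with q(.) the per-dimension uniform quantizer."""
    d = d1 if z else d0
    return DELTA * np.round((x + d) / DELTA) - d

def decode_block(y):
    """Minimum-distance decoding of one block, cf. (15)."""
    errs = [np.sum((y - embed_block(y, z)) ** 2) for z in (0, 1)]
    return int(np.argmin(errs))

x = rng.standard_normal(L)
for z in (0, 1):
    s = embed_block(x, z)
    y = s + 0.05                  # mild deterministic perturbation
    print(decode_block(y), end=" ")   # 0 1
```

The two dithered lattices are offset by DELTA_k/2 in each dimension, so a 0.05 perturbation (well under DELTA_k/4 here) always decodes correctly, matching the minimum-distance argument leading to (22).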
Fig. 4. Dither modulation with uniform quantization step sizes.
Fig. 5. Transform dither modulation with nonuniform quantization step sizes.
a quantity that can be used to characterize the achievable performance of QIM realizations more generally, as we will develop.
B. Spread-Transform Dither Modulation

One special class of coded binary dither modulation methods is what we refer to as spread-transform dither modulation (STDM). We now develop its properties and quantify its advantages over other forms of dither modulation, over additive spread-spectrum methods, and over spread LBM.
To introduce STDM, we begin by observing that the distortion-normalized squared minimum distance (26) of binary dither modulation with uniform scalar quantization does not depend on the step-size sequence $\Delta_1, \ldots, \Delta_L$, i.e., on the distribution of the distortion across samples within the length-$L$ block. Thus, one is free to choose any distribution without sacrificing $d_{\mathrm{norm}}^2$, so the $\Delta_k$'s can be chosen to optimize other characteristics of the embedding.
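The extreme choice of putting all the distortion in one transform direction is STDM. A sketch under stated assumptions (the dimension, the unit spreading vector `v`, and the DELTA/2 quantizer offset are ours): only the component of the host along `v` is quantized, and the composite signal is reconstructed as in (7).

```python
import numpy as np

rng = np.random.default_rng(2)

N = 8
DELTA = 1.0                       # step size for the projected component (assumption)
v = rng.standard_normal(N)
v /= np.linalg.norm(v)            # unit-norm spreading direction

def embed_stdm(x, m):
    """Spread-transform dither modulation: quantize only the component of x
    along v with one of two quantizers offset by DELTA/2, then reconstruct
    via (7): s = x + (s_tilde - x_tilde) v."""
    x_tilde = v @ x
    offset = m * DELTA / 2.0
    s_tilde = DELTA * np.round((x_tilde - offset) / DELTA) + offset
    return x + (s_tilde - x_tilde) * v

def decode_stdm(y):
    """Decode from the projection alone, cf. (15)."""
    y_tilde = v @ y
    errs = []
    for m in (0, 1):
        offset = m * DELTA / 2.0
        q = DELTA * np.round((y_tilde - offset) / DELTA) + offset
        errs.append((y_tilde - q) ** 2)
    return int(np.argmin(errs))

x = 3.0 * rng.standard_normal(N)
s = embed_stdm(x, m=1)
y = s + 0.1 * v                   # perturbation along v, well under DELTA / 4
print(decode_stdm(y))  # 1
```

Components of the perturbation orthogonal to `v` do not affect the decoder at all, which is one way to see why the error behavior of the Fig. 6 system differs from that of Fig. 4 despite the identical minimum distance.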
To understand this property, consider Figs. 4-6, each of which shows the reconstruction points of two quantizers for embedding one bit in a block of two samples. For each of the three systems, the minimum distance d_min and the average squared-error distortion per sample are identical. Thus, the robustness against bounded perturbations is the same in each case. However, the quantization differs in each case. In Fig. 4, where scalar quantization is applied to each sample separately, the quantization step sizes are the same for both samples. In Figs. 5 and 6, the samples are first pretransformed and the resulting coefficients quantized unevenly. In particular, a unitary transform (coordinate rotation) is applied to the pair of samples before quantization; the first transform coefficient is the component of the host signal in the direction of the vector v depicted. In Fig. 5, the step size for quantizing the first transform coefficient is larger than that used to quantize the second transform coefficient, which lies in the direction orthogonal to v. Finally, in the extreme case of Fig. 6, the step size for the first coefficient is larger still, and that for the second coefficient is zero, i.e., all embedding occurs in the first coefficient. In this case, the reconstruction points become reconstruction lines, so to embed a given bit, the host signal is quantized to the nearest point on a line labeled with that bit's symbol.
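The geometry of Fig. 6 can be sketched in code. In the minimal sketch below (the function names, the step size, and the dither convention of 0 for one bit and Δ/2 for the other are our own illustrative choices, not the paper's), one bit is embedded in a two-sample block by dithered uniform quantization of the projection onto a spreading vector v, and decoded by choosing the nearer lattice:

```python
import numpy as np

def stdm_embed(x, v, delta, bit):
    """Embed one bit by dithered uniform quantization of the projection
    of host block x onto the unit-norm spreading vector v.
    Illustrative dither convention: 0 for bit 0, delta/2 for bit 1."""
    xt = x @ v                               # component of x along v
    d = 0.0 if bit == 0 else delta / 2       # dither selects the quantizer
    q = np.round((xt - d) / delta) * delta + d
    return x + (q - xt) * v                  # only the component along v changes

def stdm_decode(y, v, delta):
    """Minimum-distance decoding on the projection of the channel output."""
    yt = y @ v
    dists = []
    for bit in (0, 1):
        d = 0.0 if bit == 0 else delta / 2
        q = np.round((yt - d) / delta) * delta + d
        dists.append(abs(yt - q))
    return int(np.argmin(dists))

rng = np.random.default_rng(0)
v = np.array([1.0, 1.0]) / np.sqrt(2)        # pseudorandom in practice
x = rng.normal(0, 10, size=2)
s = stdm_embed(x, v, delta=4.0, bit=1)

# The embedding distorts x only in the direction of v.
assert np.allclose(s - x, ((s - x) @ v) * v)
# The two lattices are delta/2 apart, so perturbations along v shorter
# than delta/4 = 1.0 cannot cause a decoding error.
assert stdm_decode(s + 0.9 * v, v, delta=4.0) == 1
```

Note that a perturbation orthogonal to v has no effect at all on the decision variable, which is the host-interference-rejection property developed next.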
Fig. 6. Transform dither modulation with quantization of only a single transform component. The quantization step size for the component of the host signal orthogonal to v is zero.

While the three systems corresponding to Figs. 4-6 have the same minimum distance, the number of perturbation vectors of minimum length that cause decoding errors is higher for the case of Fig. 4 than for the case of Fig. 6. (For intermediate cases such as the one shown in Fig. 5, where quantization step sizes in different dimensions are different but nonzero, the number of perturbation vectors of minimum length that cause decoding errors is the same as in Fig. 4, but these vectors are not orthogonal.) Thus, for probabilistic channels, such as additive noise channels, the probability of error is generally different in each case. For example, suppose a bit is embedded and the composite signal is the corresponding labeled point in Figs. 4 and 6. If the channel output lies in the decision region defined by the dashed box in Fig. 4, or by the two dashed lines in Fig. 6, then the decoder will correctly determine which bit was embedded. If the perturbation vector places the channel output outside the decision region, however, the decoder will make an error with very high probability. (There is some possibility that the channel output is outside the decision region but is still closer to another point carrying the correct bit label than to the closest point carrying the wrong label. These events, however, are very unlikely for many perturbation probability distributions that are of practical interest.) Since the decision region of Fig. 6
contains the decision region of Fig. 4, it follows that the probability of a correct decision in the case of nonuniform quantization step sizes is higher.

The unitary transform in the case of Fig. 6 not only facilitates a comparison of Figs. 4 and 6, but also serves to spread any embedding-induced distortion over frequency and time/space when a peak distortion constraint is imposed, for example. Although the distortion is concentrated in only one transform coefficient, if the energy of v is spread over space/time and frequency (for example, if v is chosen pseudorandomly), then the distortion will also be spread.

As we will see in subsequent sections of this paper, dither-modulation methods have considerable performance advantages over previously proposed additive spread-spectrum and spread LBM methods in a variety of contexts. However, much effort has already been invested in optimizing both additive spread-spectrum and spread LBM systems, for example, by exploiting perceptual properties of the human visual and auditory systems or designing receiver front ends to mitigate effects of geometric and other distortions. An additional advantage of STDM specifically over other forms of dither modulation is that one can easily convert existing additive spread-spectrum and spread LBM systems into STDM systems while retaining the other optimized components of the system. In particular, it suffices to replace the addition step of additive spread spectrum, i.e., (5), or the quantize-and-replace step of spread LBM, i.e., (11), with the dithered quantization step of STDM, i.e.,
s(x, m) = x + ( q(x~; Delta, d(m)) - x~ ) v,   (27)

where x~ = v^T x is the projection of the host signal onto the unit-length spreading vector v, and where the quantizer step size Delta is chosen so that the expected distortion in both cases is the same.

SNR Advantage of STDM: In this section, we quantify the performance gain of STDM over additive spread spectrum and spread LBM from an SNR perspective that applies to a broad range of contexts. We focus our analysis on the representative case of embedding one bit in a length-N block using a unit-length spreading vector v. Because, as (5), (11), and (27) reflect, in each case the embedding occurs entirely in the projection of x onto v, a one-dimensional problem results. In addition, because all of the embedding-induced distortion occurs only in the direction of v, the distortion in each case also has the same temporal/spatial distribution and frequency distribution. Thus, one would expect that any perceptual effects due to time/space masking or frequency masking are the same in each case. Therefore, squared-error distortion and SNR-type measures are more meaningful measures of distortion when comparing these embedding methods than one might expect in other more general contexts, where squared-error distortion may fail to capture certain perceptual effects.

SNR advantage of STDM over additive spread spectrum: Considering the case of additive spread spectrum first, since the watermark in (5) is added along v, the "signal" energy is epsilon(a~) = D_s N. The decoder in both cases makes a decision based on y~ = v^T y, the projection of the channel output onto v. In the case of additive spread spectrum, y~ = x~ + a~(m) + n~, while in the case of STDM, y~ = q(x~; Delta, d(m)) + n~, where n~ is the projection of the perturbation vector n onto v. We let epsilon(.) be some measure of energy; for example, epsilon(a) = a^2 in the case of a deterministic variable a, or epsilon(a) = E[a^2] when a is random. The energy of the interference or "noise" is sigma^2_x~ + epsilon(n~) for additive spread spectrum, but only epsilon(n~) for STDM, i.e., the host-signal interference for STDM is zero. Thus, the SNR at the decision device is

SNR_SS = epsilon(a~) / ( sigma^2_x~ + epsilon(n~) )

for additive spread spectrum and

SNR_STDM = (d^2_min / 4) / epsilon(n~)

for STDM, where the "signal" energies epsilon(a~) and d^2_min/4 are given by

epsilon(a~) = D_s N   (28)

and

d^2_min / 4 = (3/4) D_s N.   (29)

Thus, the advantage of STDM over additive spread spectrum is

SNR_STDM / SNR_SS = (3/4) ( sigma^2_x~ + epsilon(n~) ) / epsilon(n~),   (30)

which is typically very large since the channel perturbations are usually much smaller than the host signal if the channel output is to be of reasonable quality. For example, if the host signal-to-channel-noise ratio is 30 dB and x~ and n~ are uncorrelated, then the SNR advantage (30) of STDM over additive spread spectrum is 28.8 dB.11
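A Monte Carlo sketch makes the interference comparison concrete. The parameter values below are our own illustrative choices (a 20-dB host-to-noise example), not the paper's; the final line checks the 28.8-dB figure quoted in the text for a 30-dB host signal-to-channel-noise ratio:

```python
import numpy as np

# Illustrative parameters: projected host and noise standard deviations.
rng = np.random.default_rng(1)
sigma_x, sigma_n, trials = 10.0, 1.0, 200_000

x_t = rng.normal(0, sigma_x, trials)   # x~: host projected onto v
n_t = rng.normal(0, sigma_n, trials)   # n~: perturbation projected onto v

# Additive spread spectrum: x~ + n~ appears as interference at the
# decision device; STDM quantizes x~ away, leaving only n~.
ss_interference = np.mean((x_t + n_t) ** 2)   # approx sigma_x^2 + sigma_n^2
stdm_interference = np.mean(n_t ** 2)         # approx sigma_n^2

assert ss_interference / stdm_interference > 90   # roughly 101 here

# The 28.8-dB example in the text: a 30-dB host-to-noise ratio combined
# with the 3/4 signal-energy factor gives 10*log10(0.75 * (1 + 1000)).
assert round(10 * np.log10(0.75 * 1001), 1) == 28.8
```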
SNR advantage of STDM over spread LBM: Spread-transform dither modulation methods also have an SNR advantage over spread LBM methods. As we show in Appendix A, the distortion-normalized squared minimum distance (26) of LBM is 2.43 dB worse than that of dither modulation in the case of coded binary embedding with uniform, scalar quantization. Thus, for a fixed rate and embedding-induced distortion, the squared minimum distance, and hence the SNR at the decision device, for spread LBM will be 2.43 dB worse than that of STDM, i.e.,12

SNR_STDM / SNR_SLBM = 7/4, i.e., 2.43 dB.   (31)
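The decibel figures quoted here and in footnote 12 follow directly from the stated factors; a two-line arithmetic check:

```python
import math

# Binary spread LBM vs. STDM: a factor of 7/4 (see Fig. 7); for M-ary
# embedding the gain grows to a factor of 2 as M -> infinity (footnote 12).
binary_gain_db = 10 * math.log10(7 / 4)
asymptotic_gain_db = 10 * math.log10(2)

assert round(binary_gain_db, 2) == 2.43
assert round(asymptotic_gain_db, 2) == 3.01
```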
This SNR advantage is illustrated in Fig. 7, where the quantizer reconstruction points and embedding intervals for both spread
11Note that while the high-SDR approximation (30) predicts that STDM is worse than additive spread spectrum by a factor of 4/3, i.e., 1.25 dB, when epsilon(x~) = 0 (as would be the case, for example, if the host signal had very little energy in the direction of v), in fact, if one chooses d(m) = +/- Delta/4, then it is straightforward to verify that STDM performs as well as additive spread spectrum in this low-SDR regime.
12Appendix A also shows that for M-ary embedding the SNR gain grows to a factor of 2 (3 dB) as M -> infinity.
Fig. 7. Spread-transform dither modulation versus spread LBM. The embedding interval boundaries of spread LBM, which are shown with solid lines, are the same for both kinds of labeled points. In contrast, in the case of STDM, the embedding intervals for one label, shown by solid lines, differ from those for the other label, shown by dashed lines. An SNR advantage of 7/4 = 2.43 dB for STDM results.

LBM and STDM are shown, with the same embedding-induced squared-error distortion for both cases.

The preceding analysis establishes some important advantages of QIM methods over common information-embedding methods. In fact, it turns out that QIM methods are asymptotically optimal in many key scenarios of interest. To develop these results, we next examine information embedding within an information-theoretic framework.

VI. INFORMATION-THEORETIC OPTIMALITY OF QIM

This section explores the best possible rate-distortion-robustness performance that one could hope to achieve with any information-embedding system. Our analysis leads to insights about some properties and characteristics of good information-embedding methods, i.e., methods that achieve performance close to the information-theoretic limits. In particular, a canonical "hidden QIM" structure emerges for information embedding that consists of 1) preprocessing of the host signal, 2) QIM embedding, and 3) postprocessing of the quantized host signal to form the composite signal. One incurs no loss of optimality by restricting one's attention to this simple structure. We also derive sufficient conditions under which only distortion-compensation postprocessing is required. As we develop in Sections VII and VIII, these conditions are satisfied in several important cases of practical interest.

A. Communication over Channels with Side Information

The superchannel model of Section II-B and Fig. 2 facilitates our analysis, i.e., we view information embedding as the transmission of a host-dependent distortion signal e over a superchannel with side information or state x that is known at the encoder. In this section, we also restrict our attention to a squared-error distortion constraint and a memoryless channel with known law p(y|s), where y_i and s_i are the ith components of y and s, respectively.13 Then, the superchannel is also memoryless and has probability law p(y | e, x) = prod_i p(y_i | x_i + e_i).

The capacity [31] of this superchannel is the reliable information-embedding rate that is asymptotically achievable with long signal lengths N. In nonwatermarking contexts, Gel'fand and Pinsker [32] and Heegard and El Gamal [33] have determined the capacity of such a channel in the case of a random state vector with independent and identically distributed (i.i.d.) components when the encoder sees the entire state vector before choosing the channel input e. In this case, the capacity is

C = max [ I(u; y) - I(u; x) ],   (32)
where I(.;.) denotes mutual information, u is an auxiliary random variable, and the maximization is over conditional distributions p(u|x) together with the mapping from u and x to e. Since s = x + e, we can think of y in (32) as being generated from s, and s, in turn, from u and x. While the mapping from u to e is, in general, probabilistic, from convexity properties of mutual information one can deduce that the maximizing distribution in (32) always has the property that e is a deterministic function of u and x [32].
In the case of watermarking, the maximization (32) is subject to a distortion constraint on e. A formal proof of the extension of (32) to include this constraint is developed in [20]. Other researchers [18], [19], [16] are working on extending or have extended these results to the case where the channel law is not fixed but rather is chosen by an attacker subject to a distortion constraint. A related information-theoretic formulation can be found in [14].
As we shall see in the next section, one way to interpret (32) is that I(u; y) is the total number of bits per host-signal sample that can be transmitted through the channel, and I(u; x) is the number of bits per sample that are allocated to the host signal x. The difference between the two is the number of bits per host-signal sample that can be allocated to the embedded information m.
1) Hidden QIM: As we show in this subsection, one can achieve the capacity (32) by a type of "hidden" QIM, i.e., QIM that occurs in a domain represented by the auxiliary random variable u. One moves into and out of this domain with pre- and postquantization processing.
13Extension of the results in this section to the case where the channel is only blockwise memoryless is straightforward by letting y_i and s_i be the ith blocks, rather than ith scalar components, of y and s. In this case, information rates are measured in bits per block, rather than bits per sample.
Fig. 8. Capacity-achieving "hidden QIM." One embeds by choosing a codeword u that is jointly distortion-typical with x from the mth quantizer's codebook. The distortion function is e(u; x). The decoder finds a codeword that is jointly typical with y. If this codeword is in the ith subset, then m^ = i.
To develop this optimality of hidden QIM, we begin by adding an interpretation in terms of quantization (source coding) to the proof of the achievability of capacity by Gel'fand and Pinsker [32], the result of which is summarized as follows. Fig. 8 shows an ensemble of quantizers, one for each possible value of the message m, where each source codeword (quantizer reconstruction vector) u is randomly drawn from the i.i.d. distribution p(u), which is the marginal distribution corresponding to the host-signal distribution and the maximizing conditional distribution from (32). Although the source codebooks are, therefore, random, both the encoder and decoder, of course, know the codebooks. Each codebook contains roughly 2^{N I(u;x)} codewords, so there are roughly 2^{N [R_m + I(u;x)]} codewords in total.
QIM embedding in this domain corresponds to finding a vector u in the mth quantizer's codebook that is jointly distortion-typical with x and generating the composite signal from u and x. By distortion-typical, we mean that u and x are jointly typical and satisfy the distortion constraint, i.e., the function e(u, x) is the distortion function in the u domain. Since the mth quantizer's codebook contains more than 2^{N I(u;x)} codewords, the probability that there is no u that is jointly distortion-typical with x is small.14 Thus, the selection of a codeword u from the mth quantizer is the quantization part of QIM, and the generation of e, and, therefore, s, from the codeword u and x is the postquantization processing.
The decoder finds a u that is jointly typical with the channel output y and declares m^ = i if this u is in the ith quantizer's codebook. Because the total number of codewords is less than 2^{N I(u;y)}, the probability that a codeword other than the embedded one is jointly typical with y is small. Also, the probability that the embedded u is jointly typical with y is close to 1.15 Thus, the probability of error Pr[m^ != m] is small, and we can indeed achieve the capacity (32) with QIM in the u domain.
The remaining challenge, therefore, is to determine the right preprocessing and postprocessing given a particular channel (attack) p(y|s). As mentioned above, for a number of important cases, it turns out that the only processing required is postquantization distortion compensation. We discuss these cases in the next subsection.
2) Optimality of Distortion-Compensated QIM: When distortion-compensated QIM (DC-QIM) as introduced in Section IV-A is viewed as an instance of hidden QIM, we obtain a u that is a quantized version of x. We show in this section that suitably coded versions of DC-QIM can achieve capacity whenever the maximizing distribution in (32) is of a form such that the postprocessing is linear, i.e., when, without loss of generality, e is generated according to

e = u - alpha x.   (33)
To see that DC-QIM can achieve capacity when the maximizing pdf in (32) satisfies (33), we show that one can construct an ensemble of random DC-QIM codebooks that satisfy (33). First, we observe that quantizing alpha x is equivalent to quantizing x with a scaled version of the quantizer and scaling back the result, i.e.,

q(alpha x; Delta, m) = alpha q(x; Delta/alpha, m),   (34)

where alpha is as defined following (19). Then, rearranging terms in the DC-QIM embedding function (19) and substituting (34) into the result, we obtain

s(x, m) = x + [ q(alpha x; Delta, m) - alpha x ].   (35)

We construct our random DC-QIM codebooks by choosing the codewords u of q(.; Delta, m) from the i.i.d. distribution p(u), the one implied by the maximizing pdf in (32) together with the host pdf. (Equivalently, we choose the codewords of q(.; Delta/alpha, m) in (19) from the distribution of u/alpha.) Our quantizers q(.; Delta, m) choose a codeword that is jointly distortion-typical with alpha x. The decoder looks for a codeword in all of the codebooks that is jointly typical with the channel output. Then, following the achievability argument of Section VI-A1, we can achieve a rate I(u; y) - I(u; x). From (35), we see that

e = s - x = q(alpha x; Delta, m) - alpha x.

Since u = q(alpha x; Delta, m), we see that e = u - alpha x. Thus, if the maximizing distribution in (32) satisfies (33), our DC-QIM codebooks can also have this distribution and, hence, achieve capacity (32).
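The scaling identity (34) and the linearity of the resulting postprocessing (33) can be verified numerically. The sketch below uses an undithered uniform quantizer for brevity (the function name and parameter values are our own illustrative choices):

```python
import numpy as np

def quantize(x, delta):
    """Uniform scalar quantizer with step size delta (dither omitted for brevity)."""
    return np.round(x / delta) * delta

rng = np.random.default_rng(2)
x = rng.normal(0, 5, size=1000)
alpha, delta = 0.6, 1.0

# (34): quantizing alpha*x with step delta equals alpha times the result
# of quantizing x with the scaled step delta/alpha.
u = quantize(alpha * x, delta)
assert np.allclose(u, alpha * quantize(x, delta / alpha))

# DC-QIM embedding of the form s = x + alpha*(q(x; delta/alpha) - x).
s = x + alpha * (quantize(x, delta / alpha) - x)

# (35)/(33): the distortion signal e = s - x equals u - alpha*x,
# i.e., the postquantization processing is linear in u and x.
assert np.allclose(s - x, u - alpha * x)
```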
As a final comment, it is worth emphasizing that QIM systems are optimal in other important scenarios as well. As one example, in the noise-free case y = s, which arises, for example,

14This principle is, of course, one of the main ideas behind the rate-distortion theorem [31, Ch. 13].

15These principles are, of course, two of the main ideas behind the classical channel coding theorem [31, Ch. 8].
when a discrete-valued composite signal is transmitted over a digital channel with no errors, QIM is optimal even without distortion compensation, and achieves capacity [27].   (36)

As a second example, and as shown in [27], QIM is optimal even when the host signal is also available at the decoder, achieving the capacity

C = max I(e; y | x)   (37)

determined by Heegard and El Gamal [33]. We next examine some key scenarios in which the optimality condition (33) is met.
VII. GAUSSIAN CHANNELS

In this section, we examine the ultimate performance limits of information-embedding methods when the host signal is white and Gaussian, the channel is an AWGN channel, and the host and channel noise are independent of one another. Extensions to colored host and/or colored channel cases are developed in [15], [27]. Our main result of the section is that DC-QIM is optimal for this class of channels, and that, in addition, the optimum distortion-compensation parameter alpha is given by (20), which maximized SNR in uncoded DC-QIM systems.

In general, the embedding strategies optimized for Gaussian channel models can be expected to be good designs for a variety of applications in which one primarily requires robustness against unintentional attacks.16 And while Gaussian host models are not always accurate, the better the host-signal interference rejection properties of an information-embedding system, the smaller the role we might expect the host-signal model to play in determining the ultimate performance of such systems.

A. Capacities and the Optimality of DC-QIM

Specializing the formulation of Section VI-A to the Gaussian scenario of interest, with the zero-mean, variance-sigma^2_x variables denoting elements of the N-dimensional host-signal vector x, and, similarly, the zero-mean, variance-sigma^2_n variables denoting elements of the corresponding noise vector n, the distortion constraint can be expressed as E[e_i^2] <= D_s, with the corresponding constraint on e in (32) being the same. We see that squared-error distortion-constrained, Gaussian information embedding is equivalent to power-constrained communication over a Gaussian channel with Gaussian side information known at the encoder, a case for which Costa [34] has determined the capacity to be, expressed in terms of the (embedding-induced) distortion-to-noise ratio DNR = D_s / sigma^2_n,

C = (1/2) log2(1 + DNR).   (38)

Remarkably, the capacity is independent of the host-signal variance sigma^2_x and, in fact, as we shall discuss later in this section, is the same as in the case when the host signal is known at the decoder. Note that this implies that an infinite-energy host signal causes no decrease in capacity in this Gaussian case, i.e., good information-embedding systems can completely reject host-signal interference in the Gaussian case.

Based on our earlier results, to establish the optimality of DC-QIM for this channel, it suffices to verify that (33) is satisfied. This follows from the proof [34] of (38). In particular, as shown in [34], the pdf that maximizes (32) is indeed one implied by (33), for some parameter alpha, where alpha is chosen so that E[e^2] = D_s and so that the pair e and x are independent. To see this, note that for a fixed value of alpha, an achievable rate is [34]

R(alpha) = (1/2) log2( D_s (D_s + sigma^2_x + sigma^2_n) / [ D_s sigma^2_x (1 - alpha)^2 + sigma^2_n (D_s + alpha^2 sigma^2_x) ] ),

which can also be written in terms of the DNR and the host SNR (SNR = sigma^2_x / sigma^2_n) as

R(alpha) = (1/2) log2( DNR (DNR + SNR + 1) / [ DNR SNR (1 - alpha)^2 + DNR + alpha^2 SNR ] ).   (39)

16Indeed, these models can even apply to optimal, i.e., rate-distortion achieving [31], lossy compression of a Gaussian source, as discussed in [27].
This rate is maximized by setting (cf. (20))

alpha = DNR / (DNR + 1),   (40)

from which we conclude that the rate (38) is achievable. To establish that (38) is also the maximum achievable rate, it suffices to show that it is the capacity when x is known at the decoder, since one obviously cannot do better in the host-blind case. To develop the known-host capacity, first recall that the capacity is given by (37). Again, the maximization is subject to a distortion constraint, which in the case of white noise is E[e^2] <= D_s. Because subtracting a known constant from y does not change mutual information, we can equivalently write

C = max I(e; y - x | x).
Noting that y - x = e + n, we immediately conclude that in the case of an AWGN channel the known-host capacity is indeed given by (38), where the maximizing distribution for e is a zero-mean Gaussian distribution with variance D_s.
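The optimization over alpha can be checked numerically. The sketch below evaluates Costa's achievable rate in the normalized form of (39) for illustrative DNR and SNR values (our own choices) and confirms that the maximizing alpha matches (40) and attains the capacity (38):

```python
import numpy as np

def costa_rate(alpha, dnr, snr):
    """Costa's achievable rate (bits/sample) for power-constrained
    communication with Gaussian encoder side information, in terms of
    DNR = Ds/sigma_n^2 and SNR = sigma_x^2/sigma_n^2."""
    num = dnr * (dnr + snr + 1)
    den = dnr * snr * (1 - alpha) ** 2 + dnr + alpha ** 2 * snr
    return 0.5 * np.log2(num / den)

dnr, snr = 0.5, 1000.0                      # illustrative values
alphas = np.linspace(0.01, 0.99, 9801)      # grid search over alpha
rates = [costa_rate(a, dnr, snr) for a in alphas]
best = alphas[int(np.argmax(rates))]

# The maximizing alpha is DNR/(DNR+1), cf. (20) and (40), and the maximum
# equals the host-independent capacity 0.5*log2(1+DNR) of (38).
assert abs(best - dnr / (dnr + 1)) < 1e-3
assert abs(max(rates) - 0.5 * np.log2(1 + dnr)) < 1e-6
```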
In the known-host case, additive spread spectrum is optimal, and optimal additive spread-spectrum systems superimpose zero-mean i.i.d. Gaussian sequences with variance D_s onto the host signal. However, it is important to note that QIM is optimal in this case as well: as discussed in [15], quantizers of optimal QIM systems have reconstruction sequences chosen i.i.d. from a zero-mean Gaussian distribution with variance sigma^2_x + D_s. Hence, yet another attractive property of QIM methods is that they are optimal in more general Gaussian broadcast scenarios, where some intended recipients of the embedded information know the host signal and some do not.
As a final comment, several of the methods we have discussed can be optimal in the small host-signal interference scenario sigma^2_x -> 0. In fact, the capacity (38) is rather immediate in this scenario: Fig. 2 reduces to the classical communication problem considered in, e.g., [31], with y = e + n, so that the capacity is the usual mutual information between e and y maximized over all distributions for e such that E[e^2] <= D_s. In the AWGN channel case, specifically, (38) results. Examining (39) in the associated regime SNR -> 0, we see that distortion-compensated QIM with any alpha, including alpha = 1 (regular QIM), is optimal in this small host-interference scenario. As one might expect, additive spread-spectrum systems can be capacity-achieving in this limit as well, as we will see more explicitly in Section VII-C4.
B. Capacities for Hybrid Transmission

In this section, we consider scenarios corresponding to applications in which information embedding is part of a hybrid transmission scheme. We investigate two classes of such schemes: analog-digital and digital-digital transmission. In the former class, the host is an analog signal, as arises in, for example, the digital audio broadcasting application. In the latter class, the host signal is itself a digital signal, which has implications for broadcast transmission and related applications [31, Ch. 14]. In both cases, one is generally most concerned with the quality of the received signals, i.e., the channel output, rather than the channel input (composite signal).

1) Analog Host Signals: In this subsection, we determine how reliable embedding at a given rate impacts the quality with which an analog host signal is received and can be decoded with its conventional receiver from a noisy channel. In general, the effect of the embedding is to create an additional noise source DNR times as strong as the channel noise, and, therefore, the received signal quality drops by a factor of (1 + DNR), or
10 log10(1 + DNR) dB.   (41)
For example, in the scenario analyzed in Section VII-A, optimum DC-QIM results in an embedding-induced distortion that looks like white noise with variance D_s. With no embedding, one would have had a received host SNR of SNR = sigma^2_x / sigma^2_n. Due to the additional interference from the embedding-induced distortion, however, the received host SNR drops to SNR / (1 + DNR), a drop by a factor of (1 + DNR).

Since the capacity in bits per dimension (bits per host-signal sample) is given by (38), and there are two independent host-signal samples per second for every hertz of host-signal bandwidth [28], the capacity in bits per second per hertz (b/s/Hz) is

log2(1 + DNR) b/s/Hz.   (42)

Taking the ratio of (42) to (41), we see that the "value" in embedded rate of each decibel drop in received host-signal quality is

log2(1 + DNR) / [ 10 log10(1 + DNR) ] = (log2 10)/10 = 0.3322 b/s/Hz/dB.   (43)

Thus, the available embedded digital rate in bits per second depends only on the bandwidth of the host signal and the tolerable degradation in received host-signal quality, and is approximately 1/3 b/s for every hertz of bandwidth and every decibel drop in received host SNR. It is worth noting that, as developed in [15], [27], these results carry over to the case of colored host and/or colored channel signals as well. Additional insights into the performance limits of such systems when the digital signal is specifically information for refining the analog signal, as arises in applications involving the upgrading of analog infrastructure, are developed in [20].

2) Coded Digital Host Signals: When the host signal is a coded digital signal, an alternative measure of the received host-signal quality is the capacity of the corresponding host digital channel. For example, in the case of white noise and a white host signal,17 if there were no embedding, the capacity corresponding to a host digital signal power of sigma^2_x and a noise variance of sigma^2_n would be

(1/2) log2(1 + SNR).

Embedding an additional digital signal within the host digital signal drops the host digital capacity to (1/2) log2(1 + SNR/(1 + DNR)) due to the drop in received host SNR of (1 + DNR).

Unlike in the case of an analog host signal, if one must actually lower the rate of the coded host digital signal as a result of the embedding, then one may have to redesign both the digital encoder that generates this coded digital host signal and the corresponding decoder. Thus, depending on the designed noise margin of the original digital host signal, backward compatibility may or may not be possible. However, even when digital-digital transmission cannot be backward-compatible, using information embedding for simultaneous transmission of two digital signals is potentially attractive from the point of view of complexity and privacy. In particular, the decoder for the host signal need not decode (nor know how to decode) the embedded signal, and vice versa. As discussed further in [27], this is qualitatively different behavior from the superposition coding and successive cancellation decoding one might otherwise use for simultaneous transmission of two digital signals, where one of the receivers needs to decode both messages to receive its own. Interestingly, the information-embedding approach is equally efficient. To see this, we note that the embedded digital channel rate is given by (38),
(1/2) log2(1 + DNR),

so that the combined rate of the two channels is

(1/2) log2(1 + DNR) + (1/2) log2(1 + SNR/(1 + DNR)) = (1/2) log2(1 + SNR + DNR).

Since the associated expended power is sigma^2_x + D_s, we conclude that this digital-over-digital transmission strategy is indeed efficient: the combined rate is as large as the achievable rate using a single digital signal with this same total power.

17As is well known [31], white Gaussian coded signals are capacity-achieving for transmission over AWGN channels, so this is a good model for the host signal in this case.
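Both hybrid-transmission results reduce to short arithmetic identities, checked below for illustrative DNR and SNR values (our own choices):

```python
import math

dnr, snr = 3.0, 1000.0   # illustrative values

# (41)-(43): the embedded rate per decibel of received-host-SNR drop is a
# constant, log2(10)/10, approximately 1/3 b/s/Hz/dB, regardless of DNR.
rate_bshz = math.log2(1 + dnr)        # (42), in b/s/Hz
drop_db = 10 * math.log10(1 + dnr)    # (41), in dB
assert abs(rate_bshz / drop_db - math.log2(10) / 10) < 1e-12
assert round(math.log2(10) / 10, 4) == 0.3322

# Digital-over-digital efficiency: the embedded rate plus the reduced host
# rate equals the capacity of one signal with the combined power.
combined = 0.5 * math.log2(1 + dnr) + 0.5 * math.log2(1 + snr / (1 + dnr))
assert abs(combined - 0.5 * math.log2(1 + snr + dnr)) < 1e-12
```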
C. Gaps to Capacity

In Section VII-A, we saw that DC-QIM is a capacity-achieving strategy. In this section, for comparison, we evaluate the degree to which specific strategies such as regular QIM (i.e., without distortion compensation), coded additive spread spectrum, uncoded STDM, and uncoded generalized LBM can each approach capacity, and hence the performance of DC-QIM, when suitably optimized. We quantify the performance of these systems in terms of the additional DNR required to achieve the same rate as a capacity-achieving system.

1) Regular QIM Gap to Capacity: As we now show, the performance of the best QIM methods without distortion compensation can approach the Gaussian capacity at high rates and is within 4.3 dB of capacity at low rates, indicating that the QIM class is large enough to include very good embedding functions and decoders.
To develop a lower bound on the achievable rate of QIM without distortion compensation, we begin by specializing (39) to the case alpha = 1, resulting in

R_QIM = (1/2) log2( DNR (DNR + SNR + 1) / (DNR + SNR) ),   (44)

where to achieve this bound we choose reconstruction points from the pdf implied by (33).18 The right-hand side of (44) is generally not the capacity of QIM, however, i.e., QIM systems can achieve a rate greater than the lower bound (44). Indeed, the right-hand side of (44) actually becomes vacuous in the limit of low DNR.

A tighter lower bound is obtained by developing a different lower bound on the capacity of a particular subclass of QIM methods we refer to as "spread-transform QIM." In spread-transform QIM, which is a generalization of STDM as developed in Section V-B, the host-signal vector x is projected onto N/L orthonormal vectors to obtain N/L transformed host-signal samples, which are quantized using QIM. Because projection onto the vectors represents a change of orthonormal basis, the transformed host-signal samples and the transformed noise samples, which are the projections of the original noise vector n onto the orthonormal vectors, are still independent, zero-mean, Gaussian random variables with the same variance as the original host signal and noise samples, respectively. However, if the distortion per original host-signal sample is D_s, then the distortion per transformed host-signal sample is L D_s. Thus, we obtain a "spreading gain" of L in terms of DNR, but the number of bits embedded per original host-signal sample is only 1/L times the number of bits embedded per transformed host-signal sample. Thus, one can determine an achievable rate of spread-transform QIM by appropriately modifying (44) to obtain

R_ST-QIM = (1/(2L)) log2( L DNR (L DNR + SNR + 1) / (L DNR + SNR) ).   (45)

To upper-bound the gap between QIM and capacity, we first recognize from (45) that the minimum DNR required for QIM to achieve a rate R asymptotically in the limit of large SNR is

DNR_min = 2^{2LR} / L,   (46)

which is minimized at L = (log2 e)/(2R). However, L must be a positive integer, even in the limit of large N.19 Thus, if one sets

L = max{ 1, round( (log2 e)/(2R) ) },   (47)

then (46) remains a valid upper bound on the required DNR for a QIM method to achieve a rate R. From (38) we see that the minimum DNR required for a capacity-achieving method to achieve a rate R is 2^{2R} - 1, which when combined with (46) yields the following upper bound on the gap between QIM and the Gaussian capacity:

DNR_QIM / DNR_cap = 2^{2LR} / [ L (2^{2R} - 1) ].   (48)

This expression is plotted in Fig. 9, where L is given by (47).

We now examine the asymptotic limits of (48) at low and high rates. Equation (47) implies 2LR -> log2 e, i.e., 2^{2LR} -> e, in the limit of small R, so in this limit (48) approaches

DNR_QIM / DNR_cap -> e, as R -> 0.

Thus, the gap is at most a factor of e (approximately 4.3 dB) in the limit of low rates. In the limit of large R, (47) implies L = 1, so (48) approaches

DNR_QIM / DNR_cap = 2^{2R} / (2^{2R} - 1) -> 1, as R -> infinity.

Thus, QIM asymptotically achieves capacity at high embedding rates.

As we described in Section VII-B, in hybrid transmission applications one may be concerned about the degradation to the received host signal, which is (1 + DNR) rather than DNR. The

18The pdf of the reconstruction points u in this case is N(0, sigma^2_x + D_s), which is not the same as the well-known rate-distortion optimal pdf [31] for quantizing Gaussian random variables, which is N(0, sigma^2_x - D_s).
19Note that since round(N/L) lies within 0.5 of N/L, one can, indeed, approach this optimum spreading gain L in the limit of large N, even though N/L must be a positive integer less than or equal to N.
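The gap bound (46)-(48) and its two limits can be checked numerically (function name is ours; the 4.3-dB figure matches the text):

```python
import math

def qim_gap_db(rate):
    """Upper bound (48), in dB, on the DNR gap between spread-transform
    QIM (alpha = 1) and the Gaussian capacity, with the integer
    spreading factor L chosen per (47)."""
    L = max(1, round(math.log2(math.e) / (2 * rate)))
    gap = 2 ** (2 * L * rate) / (L * (2 ** (2 * rate) - 1))
    return 10 * math.log10(gap)

# Low-rate limit: the gap approaches a factor of e, about 4.3 dB.
assert abs(qim_gap_db(1e-4) - 10 * math.log10(math.e)) < 0.01
# High-rate limit: the gap approaches 1 (0 dB), i.e., QIM achieves capacity.
assert qim_gap_db(8.0) < 0.01
```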
Fig. 9. DNR gap between spread-transform QIM and Gaussian capacity (achieved by DC-QIM). The maximum gap is a factor of e (approximately 4.3 dB).

Fig. 10. Received host SNR gap (1 + DNR) between spread-transform QIM and capacity (achieved by DC-QIM).
gap in DNR (48) is larger than the gap in a corresponding upper bound on (1 + DNR), which is

(1 + DNR_QIM) / (1 + DNR_cap) = (1 + 2^{2LR}/L) / 2^{2R}.

This gap is plotted in Fig. 10 as a function of the rate in b/s/Hz. Again, L is given by (47), since minimizing DNR_QIM also minimizes (1 + DNR_QIM). Thus, for example, at the (near) worst case digital rate of 1 b/s/Hz, using QIM requires at most a 1.6-dB greater drop in analog channel quality than the approximately 3-dB drop required for DC-QIM (Section VII-B1).
2) Uncoded STDM Gap to Capacity: The results above can be compared with the achievable performance of uncoded binary STDM with uniform scalar quantization as a minimal-complexity realization of QIM.
The gap between uncoded STDM and capacity can easily
be quantified for low rates
, which are typical in
many applications, at a given probability of error. A straightfor
ward union bound on the biterror probability of uncoded binary
STDM with uniform scalar quantization is (see Fig. 6)
Fig. 11. Uncoded STDM gap to Gaussian capacity. The solid curve shows the biterror probability for uncoded STDM as a function of ratenormalized distortiontonoise ratio (DNR ). The dashed curve is the minimum required DNR for reliable information embedding for any embedding method.
For low embedding rates minimum required DNR error is
,
so the
for arbitrarily low probability of
DNR
1.4 dB
(50)
This bound is reasonably tight for low error probabili
ties, and from (26) we can write this probability of error
in terms of the ratenormalized distortiontonoise ratio
DNR
DNR
DNR
DNR
(49)
A capacityachieving method can achieve arbitrarily low prob
ability of error as long as
, which using (38) can
be expressed as
DNR
The probability of error of STDM is plotted as a function of DNR_norm in Fig. 11. The required DNR_norm for a given P_b can be compared to (50) to determine the gap to capacity. For example, at an error probability of 10⁻⁶, uncoded STDM is about 13.6 dB from capacity. One can reduce this gap by at least 9.3 dB through channel coding, vector quantization, and nondithered quantization. The remaining gap (at most 4.3 dB) is the gap between QIM and capacity and can be closed with distortion compensation. As shown in [15], [27], it is fairly easy to close the gap between uncoded STDM (with uniform scalar quantizers) and capacity by about 6 dB using practical channel codes and distortion compensation.

3) Uncoded Spread LBM Gap to Capacity: The gap to capacity for uncoded binary spread LBM based on uniform, scalar
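The 13.6-dB figure can be checked numerically. The sketch below is illustrative only: it assumes the union bound takes the form P_b ≈ 2Q(sqrt(3·DNR_norm/4)) for uncoded binary STDM, and compares the DNR_norm this requires at P_b = 10⁻⁶ against the low-rate capacity requirement DNR_norm ≥ 2 ln 2.

```python
import math
from statistics import NormalDist

# Invert the assumed union bound P_b ~= 2 Q(sqrt(3/4 * DNR_norm)) at
# P_b = 1e-6 to find the DNR_norm that uncoded STDM needs, then compare
# with the low-rate capacity requirement DNR_norm >= 2 ln 2 (~1.4 dB).
p_b = 1e-6
z = NormalDist().inv_cdf(1.0 - p_b / 2.0)   # Q^{-1}(P_b / 2)
dnr_stdm = z * z / 0.75                     # required rate-normalized DNR
dnr_capacity = 2.0 * math.log(2.0)          # 2 ln 2

gap_db = 10.0 * math.log10(dnr_stdm / dnr_capacity)
print(round(gap_db, 1))  # ~13.6 dB gap to capacity for uncoded STDM
```

The same computation with the 7/4-worse minimum distance of spread LBM reproduces the roughly 16-dB gap quoted below.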
quantization also follows readily from the results of Appendix A, which show that the distortion-normalized minimum distance for this form of spread LBM is a factor of

7/4 ≈ 2.43 dB

worse than that of STDM (26). Thus, the LBM counterpart to (49) is that the bit-error probability of uncoded spread LBM is

P_b ≈ 2 Q( sqrt( (3/7) DNR_norm ) ).   (51)

Thus, the gap to capacity of uncoded binary spread LBM at an error probability of 10⁻⁶ is about 16 dB, 2.4 dB more than the 13.6-dB gap of uncoded binary STDM. Furthermore, as also discussed in Appendix A, for M-ary implementations the gap widens by an additional 0.6 dB as M grows large.
4) Coded Additive Spread-Spectrum Gap to Capacity: For additive spread spectrum, the distortion signal in Fig. 2 is not a function of the host signal. The distortion constraint is still in force, so that in the Gaussian case considered here, the achievable rate of an additive spread-spectrum method is the well-known [31] Gaussian channel capacity, treating both the host signal and the channel noise as interference sources:²⁰

R_AddSS = (1/2) log2( 1 + DNR / (SNR + 1) ),   (52)

where, again, SNR is the ratio between the host-signal variance and the channel-noise variance. Comparing (52) to (38), we see that the gap to capacity of additive spread spectrum, measured as the ratio of required DNRs, is

SNR + 1,   (53)

which is typically large, since the SNR must be large so that channel noise will not excessively degrade signal quality. In fact, in the high signal-to-distortion ratio (SDR) limit, where SDR = SNR/DNR → ∞, the achievable rate of additive spread spectrum (52) clearly approaches zero, again reflecting the inability of additive spread-spectrum methods to reject host-signal interference like other methods.

At the opposite extreme, when SNR → 0, the host interference is small, so the gap (53) disappears, and, indeed, additive spread spectrum is an optimum embedding strategy for this case, along with both DC-QIM and QIM, as discussed at the end of Section VII-A. The other scenario in which additive spread spectrum can be optimal is when the host is known at the decoder, which also corresponds to a noninterfering-host situation.

5) Known-Host Case: As discussed at the end of Section VII-A, both capacity-achieving QIM and capacity-achieving additive spread-spectrum methods exist when the host signal is known at the decoder. Although QIM realizations in the form of coded dither modulation with uniform, scalar quantization are not optimal in this case, for AWGN channels one can achieve performance within about 1.53 dB of capacity, as we show below. We consider the case of dither signals with a uniform distribution over the quantization interval. In this case, the quantization error is uniformly distributed over the interval [−Δ/2, Δ/2] and statistically independent of the host signal [29]. Thus, the achievable rate is slightly lower than in the case where the equivalent noise is Gaussian. The entropy power inequality can be used to show that the decrease in achievable rate is bounded by [36]

C − R_DM ≤ (1/2) log2( (πe/6)(1 + DNR) / (DNR + πe/6) ).   (54)

This gap approaches the upper limit of (1/2) log2(πe/6) ≈ 0.2546 b/dimension as the DNR gets large. For any finite DNR, the gap is smaller. By subtracting the upper bound on the gap (54) from the capacity (38), one obtains a lower bound on the achievable rate of this type of dither modulation:

R_DM ≥ (1/2) log2( 1 + (6/(πe)) DNR ).   (55)

Thus, dither modulation with uniform scalar quantization in this case is at most a factor of πe/6 (≈ 1.53 dB) from capacity.

20. This rate is also the capacity when n is non-Gaussian, but still independent of s, and a correlation detector is used for decoding [35].
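The two constants quoted above, 0.2546 b/dimension and 1.53 dB, are both expressions of the single factor πe/6 (the penalty of a uniform rather than Gaussian quantization-error distribution), and can be verified directly with the standard library:

```python
import math

factor = math.pi * math.e / 6.0           # pi*e/6, uniform-vs-Gaussian penalty
rate_gap = 0.5 * math.log2(factor)        # max rate loss, b/dimension
dnr_gap_db = 10.0 * math.log10(factor)    # equivalent DNR gap in decibels

print(round(rate_gap, 4))    # 0.2546 b/dimension
print(round(dnr_gap_db, 2))  # 1.53 dB
```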
VIII. INTENTIONAL ATTACKS

We now turn our attention from AWGN channel models for unintentional attacks to some
alternative models for intentional attacks. Intentional, distortion-constrained attacks may be encountered in copyright, authentication, and covert communication applications. In these kinds of applications, attackers generally attempt to remove or alter the embedded information, and they face a distortion constraint on their signal manipulations so that the integrity of the host signal is not compromised.

An attacker's ability to prevent reliable watermark decoding depends on the amount of knowledge that the attacker has about the embedding and decoding processes. To limit such knowledge, some digital watermarking systems use keys, parameters that allow appropriate parties to embed and/or decode the embedded signal. The locations of the modulated bits and the pseudo-noise vectors in additive spread-spectrum and generalized LBM systems are examples of keys. If only certain parties privately share the keys to both embed and decode information, and no one else can do either of these two functions, then the watermarking system is a private-key system. Alternatively, if some parties possess keys that allow them to either embed or decode, but not both, then the system is a public-key system, since these keys can be made available to the public for use in one of these two functions without allowing the public to perform the other function.

However, in some scenarios it may be desirable to allow everyone to embed and decode watermarks without the use of keys. For example, in a copyright ownership notification system, everyone could embed the ASCII representation of a copyright notice such as "Property of ..." in their copyrightable works. Such a system is analogous to the system currently used to place copyright notices in (hard copies of) books, a system in which there is no
need for a central authority to store, register, or maintain separate keys (there are none) or watermarks (all watermarks are English messages) for each user. The widespread use of such a universally accessible "no-key" system requires only standardization of the decoder, so that everyone will agree on the decoded watermark and, hence, on the owner of the copyright. We analyze both private-key and no-key systems in the sequel, and establish the attractiveness of QIM in both cases.

A. Attacks on Private-Key Systems

Although the attacker does not know the key in a private-key scenario, he or she may know the basic algorithm used to embed the watermark. In [16], Moulin and O'Sullivan model such a scenario by assuming that the attacker knows the codebook distribution, but not the actual codebook. As we now develop, exploiting results of Moulin and O'Sullivan in this private-key scenario, we determine that DC-QIM methods are optimal (capacity-achieving) against squared-error distortion-constrained attackers.

Moulin and O'Sullivan have derived both the capacity-achieving distribution and an explicit expression for the capacity (32) in the case where the host is white and Gaussian and the attacker faces an expected perturbation energy constraint. In this case, the capacity is given in [16] as a function of the distortion-to-perturbation ratio DNR and the host-signal-to-perturbation ratio SNR. The maximizing distribution is such that the embedding perturbation is statistically independent of the host signal, with distortion-compensation parameter [16]

α = DNR / (DNR + 1).   (56)
Since this distribution satisfies the condition (33), we can infer from our analysis in Section VI-A2 that DC-QIM can be used to achieve capacity against these attacks. Moreover, (56) gives the optimal distortion-compensation parameter.

Moulin and O'Sullivan have also considered the case of host signals that are not necessarily Gaussian but that have zero mean and finite variance, as well as bounded and continuous pdfs. In the limit of small embedding and attack distortions (high SDR), a limit of interest in high-fidelity applications, the capacity approaches a function of the DNR alone, and the capacity-achieving distribution is again one in which the embedding perturbation is statistically independent of the host signal [16]. Since this distribution satisfies the condition (33), we can again conclude that distortion-compensated QIM can achieve capacity in this high-fidelity limit. The capacity-achieving distortion-compensation parameter is likewise given in [16] in terms of the DNR.
B. Attacks on No-Key Systems

In contrast to the scenario above, in no-key systems an attacker has full knowledge of the embedding and decoding processes, including all codebooks. For this case, the deterministic models we develop in this section are better suited to characterizing the associated worst case in-the-clear (i.e., fully informed) attacks. With these models, we show that QIM methods in general, and dither modulation in particular, are robust and achieve provably better rate distortion-robustness tradeoffs than both additive spread-spectrum and generalized LBM techniques.

We consider two models for such attackers: 1) a bounded perturbation channel model, in which the squared-error distortion between the channel input and channel output is bounded, and 2) a bounded host-distortion channel model, in which the squared-error distortion between the host signal and channel output is bounded. In each case, we develop conditions under which error-free decoding is possible with various implementations of QIM and DC-QIM, and we quantify their advantages over the corresponding realizations of additive spread spectrum and generalized LBM.

1) Bounded Perturbation Channel: The bounded perturbation channel is one in which the attacker can perturb the composite signal in any way it desires (based on its full knowledge of the composite signal and the embedding algorithm), provided the energy in the perturbation vector does not exceed a prescribed level, i.e., (16), which reflects a requirement that the attacker not excessively degrade the original composite signal. Thus, this channel model imposes only a maximum distortion²¹ or minimum SNR constraint between the channel
input and output.

Binary dither modulation with uniform scalar quantization: One can combine the guaranteed error-free decoding condition (17) for a minimum-distance decoder (15) with the distortion-normalized minimum distance (26) of binary dither modulation with uniform scalar quantization to compactly express its achievable performance as (57) or, equivalently, its achievable rate as (58). One can view the achievable rate (58) as the deterministic counterpart to the more conventional notions of achievable rates and capacities of random channels discussed in Sections VI and VII.

21. Some types of distortion, such as geometric distortions, can be large in terms of squared error, yet still be small perceptually. However, in some cases, these distortions can be mitigated either by preprocessing at the decoder or by embedding information in parameters of the host signal that are less affected (in terms of squared error) by these distortions. For example, a simple delay or shift may cause large squared error, but the magnitudes of the discrete Fourier transform coefficients are relatively unaffected.
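As a concrete illustration of this minimum-distance guarantee, the following sketch (illustrative step size and dither values, not the paper's exact construction) embeds one bit per sample with a pair of dithered uniform quantizers whose reconstruction points are offset by half a step, and confirms that any perturbation of magnitude below a quarter step never flips the decoded bit:

```python
import random

DELTA = 1.0  # quantizer step size (illustrative choice)

def embed(x, bit):
    """Quantize host sample x with the dithered quantizer for `bit`.
    The two dithers are 0 and DELTA/2, so the reconstruction points of
    the two quantizers interleave with spacing DELTA/2."""
    d = 0.0 if bit == 0 else DELTA / 2.0
    return DELTA * round((x - d) / DELTA) + d

def decode(y):
    """Minimum-distance decoder: pick the bit whose reconstruction
    points lie closest to the received sample y."""
    dists = []
    for bit in (0, 1):
        d = 0.0 if bit == 0 else DELTA / 2.0
        q = DELTA * round((y - d) / DELTA) + d
        dists.append(abs(y - q))
    return 0 if dists[0] <= dists[1] else 1

random.seed(1)
errors = 0
for _ in range(1000):
    x = random.uniform(-50.0, 50.0)
    bit = random.randint(0, 1)
    s = embed(x, bit)
    # Any perturbation smaller than d_min/2 = DELTA/4 cannot flip the bit.
    n = random.uniform(-0.99 * DELTA / 4, 0.99 * DELTA / 4)
    errors += (decode(s + n) != bit)
print(errors)  # 0: no decoding errors below the minimum-distance bound
```

Note that the decoder never uses the host signal x, which is the host-interference rejection property discussed in Section IV.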
Additive spread spectrum: The nonzero minimum distance of QIM methods offers quantifiable robustness to perturbations, even when the host signal is not known at the decoder. In contrast, additive spread-spectrum methods offer relatively little robustness if the host signal is not known at the decoder. As discussed in Section III, these methods have linear embedding functions of the form

s(x, m) = x + w(m),   (59)

where w(m) is a pseudo-noise vector. From the definition of minimum distance (13), d_min = 0, i.e., the minimum distance is zero. Thus, although these methods may be effective when the host signal is known at the decoder, when the host signal is not known, they offer no guaranteed robustness to perturbations, i.e., no achievable rate expression analogous to (58) exists for additive spread spectrum. As is evident from (59), in an additive spread-spectrum system, the host signal x is an additive interference, which is often much larger than w(m) due to the distortion constraint. In contrast, the quantization that occurs with QIM provides immunity against this host-signal interference, as discussed in Section IV.²²

TABLE I. ATTACKER'S DISTORTION PENALTIES. THE DISTORTION PENALTY IS THE ADDITIONAL DISTORTION THAT AN ATTACKER MUST INCUR TO SUCCESSFULLY REMOVE A WATERMARK. A DISTORTION PENALTY LESS THAN 0 dB INDICATES THAT THE ATTACKER CAN ACTUALLY IMPROVE THE SIGNAL QUALITY AND REMOVE THE WATERMARK SIMULTANEOUSLY.
Generalized LBM: As shown in Appendix A, the distortion-normalized minimum distance of generalized binary LBM with uniform scalar quantization is about 2.43 dB worse than that of the corresponding dither-modulation strategy. Therefore, its achievable rate distortion-robustness performance is also about 2.43 dB worse than (57). Again, as also developed in the appendix, for M-ary implementations the gap grows to 3 dB for large M.
2) Bounded Host-Distortion Channel: As an alternative to the bounded perturbation channel, some attackers may work with a distortion constraint between the channel output and the host signal, rather than the channel input, since this distortion is the most direct measure of degradation to the host signal. For example, if attackers have partial knowledge of the host signal, which may be in the form of a probability distribution, so that they can calculate this distortion, then it may be appropriate to bound the expected distortion between the host signal and the channel output, where this expectation is taken over the conditional probability density of the host given the composite signal.²³ We refer to this as the bounded host-distortion channel.
For this channel, we measure robustness to attacks by the minimum expected distortion required for a successful attack, where the expectation is taken with respect to this conditional density. The ratio between this minimum expected attack distortion and the expected embedding-induced distortion is the distortion penalty that the attacker must pay to remove the watermark and, hence, is a figure of merit measuring the robustness-distortion tradeoff at a given rate. Distortion penalties for the primary methods of interest are derived below and summarized in Table I for the high-SDR regime of primary interest. As this table reflects, among the methods considered, only QIM methods (including binary dither modulation with uniform scalar quantization) are robust enough that the attacker must degrade the host-signal quality to remove the watermark.
Regular QIM: We first consider the robustness of regular QIM. For any distortion measure, as long as each reconstruction point lies at the minimum-distortion point of its respective quantization cell, the QIM distortion penalty is greater than or equal to unity, since any output that an attacker generates must necessarily lie away from this minimum-distortion point. Equality occurs only if each quantization cell has at least two minimum-distortion points, one of which lies in the incorrect decoder decision region. For expected squared-error distortion, the minimum-distortion point of each quantization cell is its centroid, and one can express this distortion penalty in terms of the distortion-normalized minimum distance and the signal length N, as we show below.

We use C_s to denote the quantization cell whose reconstruction point is s, and p(x | x ∈ C_s) to denote the conditional pdf of the host signal x given that x ∈ C_s. Again, for sufficiently small quantization cells, this pdf can often be approximated as uniform over C_s, for example. Since s is the centroid of C_s,

E[ x | x ∈ C_s ] = s.   (60)

Also, the expected squared error per letter embedding-induced distortion given x ∈ C_s is

D_s(s) = (1/N) E[ ||s − x||² | x ∈ C_s ].   (61)

22. Another way to understand this host-signal interference rejection is to consider, for example, that a quantized random variable has finite entropy while a continuous random variable has infinite entropy.

23. Note that if the attacker has full knowledge of the host signal, he or she can trivially remove the embedded information by setting the channel output equal to the host signal itself, in which case the attack distortion is zero. We restrict our attention to the more realistic scenario in which an attacker has only partial knowledge of the host, in the form of a conditional pdf.
The most general attack can always be represented as y = s + n, where n may be a function of s. The resulting distortion given x ∈ C_s is

D_a(s) = (1/N) E[ ||y − x||² | x ∈ C_s ] = (1/N) ||n||² + D_s(s),

where we have used (61), the fact that the conditional pdf integrates to one, and (60) to obtain the last line. For a successful attack, the perturbation must move the composite signal at least halfway to a reconstruction point in an incorrect decoder decision region, i.e., ||n||² ≥ d²_min/4, so

D_a(s) ≥ d²_min/(4N) + D_s(s).

Averaging both sides of this expression over all quantization cells yields

E[D_a] ≥ d²_min/(4N) + D,

so that our figure of merit for QIM methods is

E[D_a]/D ≥ 1 + d²_norm/4.   (62)

Thus, for any QIM method of nonzero distortion-normalized minimum distance d²_norm, the attacker's distortion penalty is always greater than unity (0 dB), indicating that to remove the watermark, the attacker must degrade the host-signal quality beyond the initial distortion caused by the embedding of the watermark.

Binary dither modulation with uniform, scalar quantization: In this case, (26) gives d²_norm in (62). Moreover, due to the uniformity of the quantizers, the bound (62) is met with equality, so that the attacker's distortion penalty specializes to

E[D_a]/D = 1 + d²_norm/4.   (63)

Because the Hamming distance of a block code cannot exceed the number of coded bits, the definition (23) of d²_norm implies d²_norm ≤ 3. Thus, an upper bound for the distortion penalty (63) in this case is

1 + 3/4 = 7/4 ≈ 2.43 dB.

Although this penalty may seem modest, it is larger than that obtainable by either additive spread spectrum or generalized LBM, as we show below. Larger distortion penalties are not possible because in-the-clear attackers can concentrate all their distortion in the minimum-distance direction in N-dimensional space. As a final note, (63) implies that binary dither modulation with uniform, scalar quantization can defeat any attacker whose distortion budget falls below (63), an expression whose counterpart for the bounded perturbation channel was (57). Thus, the corresponding achievable rates are given by the counterpart to (58).

Distortion-compensated QIM: An in-the-clear attacker of a DC-QIM system knows the quantizers and can determine the watermark m after observing the composite signal s. If the quantization cells are contiguous, so that the distortion-compensation term in (19) does not move s out of the cell containing x, then the attacker can recover the original host signal exactly by inverting (19). Thus, the attacker's distortion penalty is −∞ decibels. We see that although DC-QIM is optimal against both independent additive Gaussian noise attacks and squared-error distortion-constrained attacks in private-key scenarios, it is in some sense "maximally suboptimal" against in-the-clear attacks. Regular QIM, on the other hand, is almost as good as DC-QIM against additive Gaussian noise attacks (Section VII) and is also resistant to in-the-clear attacks, as discussed above. Thus, regular QIM methods may offer an attractive compromise when one requires resistance to both intentional attacks and unintentional attacks in a no-key system.
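The inversion attack described above can be sketched numerically. The block below uses a simplified scalar DC-QIM embedder with illustrative parameters; the specific embedding form s = x + α(q(x) − x) is an assumption standing in for (19). A fully informed attacker, who knows the quantizers and the embedded bit, recovers the host exactly:

```python
import random

DELTA = 1.0   # quantizer step size (illustrative)
ALPHA = 0.7   # distortion-compensation parameter (illustrative)

def qim_point(v, bit):
    """Reconstruction point of the dithered uniform quantizer for `bit`."""
    d = 0.0 if bit == 0 else DELTA / 2.0
    return DELTA * round((v - d) / DELTA) + d

def dc_qim_embed(x, bit):
    """DC-QIM sketch: move the host only a fraction ALPHA of the way to
    the reconstruction point, s = x + ALPHA * (q(x) - x)."""
    return x + ALPHA * (qim_point(x, bit) - x)

def in_the_clear_attack(s, bit):
    """Fully informed attacker: since s stays inside the quantization
    cell of x, the attacker recovers q(x) from s and inverts the
    embedding, x = (s - ALPHA * q) / (1 - ALPHA)."""
    q = qim_point(s, bit)
    return (s - ALPHA * q) / (1.0 - ALPHA)

random.seed(2)
worst = 0.0
for _ in range(1000):
    x = random.uniform(-50.0, 50.0)
    bit = random.randint(0, 1)
    s = dc_qim_embed(x, bit)
    x_hat = in_the_clear_attack(s, bit)
    worst = max(worst, abs(x_hat - x))
print(worst < 1e-9)  # True: the attacker recovers the host exactly
```

Setting ALPHA = 1 recovers regular QIM, for which the composite signal carries no residual information about the host's position within its cell and the inversion is no longer possible.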
Additive spread spectrum: Since the embedding function of an additive spread-spectrum system is (2), the resulting distortion is D = (1/N) E[ ||w(m)||² ]. An attacker with full knowledge of the embedding and decoding processes can decode the message m and, hence, reproduce the corresponding pseudo-noise vector w(m). Therefore, the attacker can completely remove the watermark by subtracting w(m) from s to obtain the original host signal, i.e., y = s − w(m) = x. Hence, the resulting distortion penalty is −∞ decibels.

Because the additive spread-spectrum embedding function combines the host signal and watermark in a simple linear way, anyone who can extract the watermark can easily remove it. Thus, these methods are not suitable for universally accessible no-key digital watermarking applications. By contrast, the advantage of QIM is that it effectively hides the host signal even when the embedded information is known.
Generalized LBM: Recall that the embedding function of a generalized LBM system can be written as (8) with the property (9). Good generalized LBM systems also have the property that the reconstruction points of the coarse quantizer are at the centroids of the quantization cells, as we shall assume. One attack that completely removes information about m is to output these reconstruction points. Since each such point is at a minimum-distortion point of its quantization cell, the distortion penalty is at most 0 dB, with equality only if both the composite-signal point and the output point are minimum-distortion points. Thus, an attacker can remove the watermark without causing additional distortion to the host signal. This
result applies regardless of whether error-correction coding is used. Thus, in contrast to dither modulation, error-correction coding does not improve LBM in this context. When, in addition, it is the least significant bit of a uniform, scalar quantizer that is modulated, the results in Appendix A imply further that the attack distortion is Δ²/12 while the embedding-induced distortion is 7Δ²/48. Thus, the distortion penalty is 4/7, i.e., about −2.43 dB. Again, when many less significant bits are modulated, the results of the appendix can be used to establish that the magnitude of this negative penalty grows to 3 dB.
IX. CONCLUDING REMARKS

We have seen that QIM methods are provably better than additive spread spectrum and generalized LBM against bounded perturbation and in-the-clear attacks, and are near-optimal for Gaussian channels, for which DC-QIM is optimal. Furthermore, dither modulation is a practical implementation of QIM that exhibits many of the attractive performance properties of QIM. The convenient structure of dither modulation, which is easily combined with error-correction coding, allows the system designer to achieve different rate distortion-robustness tradeoffs by tuning parameters such as the quantization step size. Also, one can conveniently upgrade previously developed additive spread-spectrum and spread LBM systems to spread-transform dither-modulation systems by replacing the respective addition and quantize-and-replace steps with a dithered quantization step.

In the course of our investigation, a number of rather intriguing results have emerged. For example, the information-embedding capacity in the Gaussian case does not depend at all on whether the host signal is available during decoding; DC-QIM is optimal in both scenarios and achieves perfect rejection of host-signal interference, even in the high-SDR regime. Also somewhat surprisingly, DC-QIM, the optimal embedding strategy for Gaussian channels and for typical attacks in private-key systems, is "maximally suboptimal" against in-the-clear attacks. On the other hand, regular QIM, which has performance within 4.3 dB of DC-QIM in the Gaussian case, performs better than any other currently known method against in-the-clear attacks, which arise, for example, in copyright notification applications where no-key architectures are used. In particular, unlike additive spread-spectrum and generalized LBM methods, QIM and dither-modulation methods force an attacker to pay a distortion penalty. Thus, QIM emerges as a universally good embedding strategy against a wide variety of intentional and unintentional attacks.
For hybrid transmission strategies, using DC-QIM for digital-over-analog transmission (in, for example, digital audio broadcasting applications) allows embedding rates of about 1/3 b/s/Hz for every decibel drop in analog signal quality. In digital-over-digital transmission (in broadcast applications, for example), DC-QIM is as efficient as any single digital transmission, and thus as good as the alternative superposition-coding and successive-cancellation-decoding approach.

Many important directions for further research remain. At one end of the spectrum, further insights into the fundamental principles and structure of information embedding and digital watermarking systems will come from the development of still better general attack models. Those emerging from game-theoretic formulations and arbitrarily varying channel models appear to be an important starting point in this respect. At the same time, many of the results in this paper have important implications for
practical applications, and the most effective implementations of QIM and DC-QIM embedding systems for these applications will take into account, in detail, the specific types of geometric distortions and other attacks that typically arise. For example, in image watermarking applications, embedders and decoders ultimately need to be robust to a wide range of often surprisingly challenging attacks, ranging from scaling and rotation to cropping and column replacement. A great deal of future work is needed in this area to enable the use of QIM techniques in watermarking applications, and indeed these represent some especially interesting design challenges.

APPENDIX A
LBM DISTORTION-NORMALIZED MINIMUM DISTANCE

In this appendix we calculate the distortion-normalized minimum distance of binary LBM with uniform, scalar quantization. We assume that the host signal and embedded signal are statistically independent. Since the embedding function of any good generalized LBM method can be written as (8) with (10), the expected distortion is
D = (1/N) E[ ||q(x) − x||² ] + (1/N) E[ ||d||² ],   (64)

where q denotes the coarse quantizer, d the least-significant-bit adjustment, and where we have used (10) and the independence of the host signal and the embedded bits to obtain the final line.
We analyze coded binary LBM with uniform scalar quantization, an LBM system in which each bit in a sequence of coded bits is repeated and embedded in a block of host samples with a sequence of uniform, scalar quantizers. The embedding is accomplished by modulating the least significant bit of each quantizer. Each uniform, scalar quantizer is illustrated in Fig. 12. The coarse quantizer q has a step size of Δ, and the least-significant-bit adjustment element d equals ±Δ/4.

Fig. 12. Low-bit modulation with a uniform, scalar quantizer. The quantizer has a step size of Δ/2, and the least significant bit (lsb) is modulated. Reconstruction points of one type have an lsb of 0; points of the other type have an lsb of 1. This process is equivalent to first quantizing using a quantizer with a step size of Δ and then adding ±Δ/4.

Comparing this scheme to coded binary dither modulation with uniform scalar quantization as described in Section V-A, we see that this scheme has the same minimum distance, i.e., (22). Restricting attention to the high-SDR regime, in which the host signal can be modeled as uniformly distributed within each cell of q, as was used to develop (24) in Section V-A, the first term in (64) is

(1/N) E[ ||q(x) − x||² ] = Δ²/12,   (65)

the same as the expected distortion (25) of the corresponding dither-modulation system. The second term in (64), however, is

(1/N) E[ ||d||² ] = Δ²/16.   (66)

Thus, the overall expected distortion is

D = Δ²/12 + Δ²/16 = 7Δ²/48,

and the distortion-normalized squared minimum distance is a factor of 4/7 smaller than that of the corresponding dither-modulation system. By comparing with (26), we see that binary coded LBM with uniform scalar quantization is worse than the corresponding dither-modulation system by a factor of

7/4 ≈ 2.43 dB.   (67)

Also, note that the result (67) is invariant to the actual distribution of the quantizer step sizes and invariant to any preprocessing of the host signal by a unitary transformation. Thus, the gap between STDM and spread LBM is also given by (67).

In other variants of LBM, the gap can be worse. For instance, consider M-ary coded implementations of dither modulation and LBM based on uniform scalar quantization in which the M sets of reconstruction points together form a regular lattice. Then the minimum distances of the two schemes remain equal (but generally different from the binary case), and the first term in (64) remains (65). However, as M gets large, d becomes effectively uniformly distributed over the range [−Δ/2, Δ/2], so the second term in (64) changes from (66) to Δ²/12, the same as (65). Thus, the gap (67) grows to a factor of 2 (3 dB) in this large-M limit.

ACKNOWLEDGMENT

The authors thank Prof. A. Lapidoth for calling their attention to the paper by Costa, and an anonymous reviewer for helpful comments that improved the clarity of the paper.
REFERENCES

[1] M. D. Swanson, M. Kobayashi, and A. H. Tewfik, "Multimedia data-embedding and watermarking technologies," Proc. IEEE, vol. 86, pp. 1064-1087, June 1998.
[2] I. J. Cox and J.-P. M. G. Linnartz, "Some general methods for tampering with watermarks," IEEE J. Select. Areas Commun., vol. 16, pp. 587-593, May 1998.
[3] J.-P. Linnartz, T. Kalker, and J. Haitsma, "Detecting electronic watermarks in digital video," in Proc. 1999 IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 4, Phoenix, AZ, Mar. 1999, pp. 2071-2074.
[4] G. Arce, C. G. Boncelet Jr., R. F. Graveman, and L. M. Marvel, "Applications of information hiding," in Proc. 3rd Annu. Federated Laboratory Symp. Advanced Telecommunications & Information Distribution Research Program, College Park, MD, Feb. 1999, pp. 423-427.
[5] H. C. Papadopoulos and C.-E. W. Sundberg, "Simultaneous broadcasting of analog FM and digital audio signals by means of adaptive precanceling techniques," IEEE Trans. Commun., vol. 46, pp. 1233-1242, Sept. 1998.
[6] B. Chen and C.-E. W. Sundberg, "Broadcasting data in the FM band by means of adaptive contiguous band insertion and precancelling techniques," in Proc. 1999 IEEE Int. Conf. Communications, vol. 2, Vancouver, BC, Canada, June 1999, pp. 823-827.
[7] M. D. Swanson, B. Zhu, and A. H. Tewfik, "Data hiding for video-in-video," in Proc. 1997 IEEE Int. Conf. Image Processing, vol. 2, Piscataway, NJ, 1997, pp. 676-679.
[8] J. M. Barton, "Method and apparatus for embedding authentication information within digital data," United States Patent 5 646 997, July 1997.
[9] K. Tanaka, Y. Nakamura, and K. Matsui, "Embedding secret information into a dithered multilevel image," in Proc. 1990 IEEE Military Communications Conf., 1990, pp. 216-220.
[10] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, "Techniques for data hiding," IBM Syst. J., vol. 35, no. 3-4, pp. 313-336, 1996.
[11] I. J. Cox, J. Killian, T. Leighton, and T. Shamoon, "A secure, robust watermark for multimedia," in Information Hiding: 1st Int. Workshop Proc., June 1996, pp. 185-206.
[12] J. R. Smith and B. O. Comiskey, "Modulation and information hiding in images," in Information Hiding: 1st Int. Workshop Proc., June 1996, pp. 207-226.
[13] J. R. Hernandez, F. Perez-Gonzalez, J. M. Rodriguez, and G. Nieto, "Performance analysis of a 2-D-multipulse amplitude modulation scheme for data hiding and watermarking of still images," IEEE J. Select. Areas Commun., pp. 510-524, May 1998.
[14] J. A. O'Sullivan, P. Moulin, and J. M. Ettinger, "Information theoretic analysis of steganography," in Proc. 1998 IEEE Int. Symp. Information Theory, Cambridge, MA, Aug. 1998, p. 297.
[15] B. Chen and G. W. Wornell, "Implementations of quantization index modulation methods for digital watermarking and information embedding of multimedia," J. VLSI Signal Processing Syst. Signal, Image, and Video Technol. (special issue on Multimedia Signal Processing), vol. 27, pp. 7-33, Feb. 2001.
[16] P. Moulin and J. A. O'Sullivan, "Information-theoretic analysis of information hiding," preprint, Oct. 1999.
[17] N. Merhav, "On random coding error exponents of watermarking systems," IEEE Trans. Inform. Theory, vol. 46, pp. 420-430, Mar. 2000.
[18] A. Cohen and A. Lapidoth, "On the Gaussian watermarking game," in Proc. IEEE Int. Symp. Information Theory, Sorrento, Italy, June 2000, p. 48.
[19] P. Moulin and J. O'Sullivan, "Information-theoretic analysis of information hiding," in Proc. IEEE Int. Symp. Information Theory, Sorrento, Italy, June 2000, p. 19.
[20] R. J. Barron, B. Chen, and G. W. Wornell, "The duality between information embedding and source coding with side information and some applications," IEEE Trans. Inform. Theory, submitted for publication.
[21] I. J. Cox, M. L. Miller, and A. L. McKellips, "Watermarking as communications with side information," Proc. IEEE, vol. 87, pp. 1127-1141, July 1999.
[22] F. Hartung and M. Kutter, "Multimedia watermarking techniques," Proc. IEEE, vol. 87, pp. 1079-1107, July 1999.
[23] F. A. P. Petitcolas, R. J. Anderson, and M. G. Kuhn, "Information hiding: A survey," Proc. IEEE, vol. 87, pp. 1062–1078, July 1999.
[24] A. Z. Tirkel, G. A. Rankin, R. van Schyndel, W. J. Ho, N. R. A. Mee, and C. F. Osborne, "Electronic water mark," in Proc. Conf. Digital Image Computing, Technology and Applications, Sydney, Australia, Dec. 1993, pp. 666–672.
[25] R. van Schyndel, A. Z. Tirkel, and C. F. Osborne, "A digital watermark," in Proc. 1st IEEE Int. Conf. Image Processing, vol. 2, Austin, TX, Nov. 1994, pp. 86–90.
[26] I. J. Cox, J. Kilian, F. T. Leighton, and T. Shamoon, "Secure spread spectrum watermarking for multimedia," IEEE Trans. Image Processing, vol. 6, pp. 1673–1687, Dec. 1997.
[27] B. Chen, "Design and analysis of digital watermarking, information embedding, and data hiding systems," Ph.D. dissertation, MIT, Cambridge, MA, June 2000.
[28] E. A. Lee and D. G. Messerschmitt, Digital Communication, 2nd ed. Boston, MA: Kluwer Academic, 1994.
[29] N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[30] R. Zamir and M. Feder, "On lattice quantization noise," IEEE Trans. Inform. Theory, vol. 42, pp. 1152–1159, July 1996.
[31] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[32] S. I. Gel'fand and M. S. Pinsker, "Coding for channel with random parameters," Probl. Contr. Inform. Theory, vol. 9, no. 1, pp. 19–31, 1980.
[33] C. Heegard and A. A. El Gamal, "On the capacity of computer memory with defects," IEEE Trans. Inform. Theory, vol. IT-29, pp. 731–739, Sept. 1983.
[34] M. H. M. Costa, "Writing on dirty paper," IEEE Trans. Inform. Theory, vol. IT-29, pp. 439–441, May 1983.
[35] A. Lapidoth, "Nearest neighbor decoding for additive non-Gaussian noise channels," IEEE Trans. Inform. Theory, vol. 42, pp. 1520–1529, Sept. 1996.
[36] F.-W. Sun and H. C. A. van Tilborg, "Approaching capacity by equiprobable signaling on the Gaussian channel," IEEE Trans. Inform. Theory, vol. 39, pp. 1714–1716, Sept. 1993.