Quantization index modulation: A class of provably good methods for digital watermarking and information embedding, B Chen, GW Wornell

Tags:
Content: IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 47, NO. 4, MAY 2001
Quantization Index Modulation: A Class of Provably Good Methods for Digital Watermarking and Information Embedding Brian Chen, Member, IEEE, and Gregory W. Wornell, Senior Member, IEEE
Abstract--We consider the problem of embedding one signal (e.g., a digital watermark) within another "host" signal to form a third, "composite" signal. The embedding is designed to achieve efficient tradeoffs among the three conflicting goals of maximizing the information-embedding rate, minimizing the distortion between the host signal and the composite signal, and maximizing the robustness of the embedding. We introduce new classes of embedding methods, termed quantization index modulation (QIM) and distortion-compensated QIM (DC-QIM), and develop convenient realizations in the form of what we refer to as dither modulation. Using deterministic models to evaluate digital watermarking methods, we show that QIM is "provably good" against arbitrary bounded and fully informed attacks, which arise in several copyright applications, and in particular, it achieves provably better rate-distortion-robustness tradeoffs than currently popular spread-spectrum and low-bit(s) modulation methods. Furthermore, we show that for some important classes of probabilistic models, DC-QIM is optimal (capacity-achieving) and regular QIM is near-optimal. These classes include both additive white Gaussian noise (AWGN) channels, which may be good models for hybrid transmission applications such as digital audio broadcasting, and mean-square-error-constrained attack channels that model private-key watermarking applications.

Index Terms--Data hiding, digital audio broadcasting, dither modulation, digital watermarking, hybrid transmission, information embedding, quantization index modulation (QIM), steganography.

Manuscript received June 14, 1999; revised September 20, 2000. This work was supported in part by the MIT Lincoln Laboratory Advanced Concepts Committee, the National Science Foundation under Grant CCR-0073520, Microsoft Research, and a National Defense Science and Engineering Graduate Fellowship. The authors are with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139 USA (e-mail: [email protected]; [email protected]). Communicated by T. E. Fuja, Associate Editor At Large. Publisher Item Identifier S 0018-9448(01)02834-6.

I. INTRODUCTION

A NUMBER of applications have emerged recently [1] that require the design of systems for embedding one signal, sometimes called an "embedded signal" or "watermark," within another signal, called a "host signal." The embedding must be done such that the embedded signal is "hidden," i.e., causes no serious degradation to its host. At the same time, the embedding must be robust to common degradations of the watermarked signal--the watermark must survive whenever the host signal
does. In some applications these degradations are the result of benign processing and transmission; in other cases they result from deliberate attacks. Several of these applications relate to copyright notification and enforcement for audio, video, and images that are distributed in digital formats. In these cases, the embedded signal either notifies a recipient of any copyright or licensing restrictions or inhibits or deters unauthorized copying. For example, this embedded signal could be a digital "fingerprint" that uniquely identifies the original purchaser of the copyrighted work. If illicit copies of the work were made, all copies would carry this fingerprint, thus identifying the owner of the copy from which all illicit copies were made. In another example, the embedded signal could either enable or disable copying by some duplication device that checks the embedded signal before proceeding with duplication. Such a system has been proposed for allowing a copy-once feature in digital video disc recorders [2]. Alternatively, a standards-compliant player could check the watermark before deciding whether or not to play the disc [3]. Other applications include automated monitoring of airplay of advertisements on commercial radio broadcasts. Advertisers can embed a digital watermark within their ads and count the number of times the watermark occurs during a given broadcast period, thus ensuring that their ads are played as often as promised. In other applications, the embedded signal may be used for authentication of--or detection of tampering with--the host signal. For example, a digital signature could be embedded in a military map. A number of other national security applications are described in [4] and include covert communication, sometimes called "steganography" or low probability of detection communication, and so-called traitor tracing, a version of the digital fingerprinting application described above used for tracing the source of leaked information. One final application for which the digital watermarking methods developed in this paper are well-suited is the backward-compatible upgrading of an existing communication system, an example of which is the so-called hybrid in-band on-channel digital audio broadcasting [5], [6]. In this application, one would like to simultaneously transmit a digital signal with existing analog (AM and/or FM) commercial broadcast radio without interfering with conventional analog reception. Thus, the analog signal is the host signal and the digital signal is the watermark. Since the embedding does not degrade the host signal too much, conventional analog receivers can demodulate the analog host signal. In addition,
next-generation digital receivers can decode the digital signal embedded within the analog signal, which may be all or part of a digital audio signal, an enhancement signal used to refine the analog signal, or simply supplemental information such as station identification or traffic information. More generally, the host signal in these hybrid transmission systems could be some other type of analog signal such as video [7], or even a digital waveform--for example, a digital pager signal could be embedded within a digital cellular telephone signal. In general, designers of information embedding systems for these kinds of applications seek to achieve high embedding rates with high levels of robustness and low levels of embedding-induced distortion. However, in general, these three goals are conflicting. Thus, in this paper we characterize methods in terms of the efficiency with which they trade off rate, distortion, and robustness. For instance, for any minimum embedding rate requirement and maximum acceptable level of embedding distortion, the more efficient an embedding method is, the higher the robustness that can be achieved. A great many information-embedding algorithms have been proposed [1] in this still emerging field. Some of the earliest proposed methods [8], [9], [7] employ a quantize-and-replace strategy: after first quantizing the host signal, these systems change the quantization value to embed information. A simple example of such a system is the so-called low-bit(s) modulation (LBM), where the least significant bit(s) in the quantization of the host signal are replaced by a binary representation of the embedded signal. More recently, additive spread-spectrum-based methods, which embed information by linearly combining the host signal with a small pseudo-noise signal that is modulated by the embedded signal, have received considerable attention in the literature as an alternative to LBM-type methods [10]­[13]. In this paper, we show that both LBM-type strategies and additive spread spectrum are, in general, not good choices for most information embedding and digital watermarking applications. As an alternative, this paper introduces a new class of information-embedding strategies we refer to as "quantization index modulation" (QIM) that is, in general, preferable and in many specific scenarios optimal. We further develop computationally efficient implementations of QIM in the form of what we refer to as "dither modulation." We evaluate both specific realizations of uncoded and coded QIM, and the asymptotic performance limits of coded QIM using information-theoretic analysis. Other emerging information-theoretic results on the digital watermarking problem are developed in, e.g., [14]­[20]. The specific organization of the paper is as follows. In Section II, we develop two useful equivalent models for the information-embedding problem. In Section III, we classify traditional approaches to this problem, and in the process identify some of their shortcomings. Section IV introduces the QIM class of embedding methods, and Section V develops practical realizations that are compared to corresponding implementations of traditional approaches. Next, Section VI establishes conditions under which different forms of QIM are optimal in an information-theoretic sense. We then evaluate the methods of this paper in the context of Gaussian models for unintentional attacks in Section VII, and in the context of some general inten-
tional attack models in Section VIII. Finally, Section IX contains some concluding remarks.

Fig. 1. General information-embedding problem model. A message m is embedded in the host-signal vector x using some embedding function s(x; m). A perturbation vector n corrupts the composite signal s. The decoder extracts an estimate m^ of m from the noisy channel output y.
II. PROBLEM MODEL

Two mathematically equivalent models for the information-embedding problem are useful in our development.
A. Distortion-Constrained Multiplexing Model
The information-embedding problem is naturally and generally described by Fig. 1. In this figure, there is a host-signal vector x into which we wish to embed some information m.1 We wish to embed at a rate of R_m bits per dimension (host-signal sample), so we can think of m as an integer in the set {1, 2, ..., 2^(N R_m)}, where N is the length of x.

An embedding function maps the host signal x and embedded information m to a composite signal s subject to some distortion constraint. Various distortion measures may be of interest, an example of which is the squared-error distortion

D(s, x) = (1/N) ||s - x||^2     (1)

or its expectation E[D(s, x)]. The composite signal s is subjected to various common signal processing manipulations such as lossy compression, addition of random noise, and resampling, as well as deliberate attempts to remove the embedded information. These manipulations occur in some channel, which produces an output signal y. For future convenience, we define a perturbation vector n to be the difference n = y - s, as shown in Fig. 1; we consider cases of both signal-independent and signal-dependent perturbation vectors in this paper.

A decoder extracts--i.e., forms an estimate m^ of--the embedded information based on the channel output y. We focus primarily on the "host-blind" case of interest in most applications, where x is not available to the decoder, in contrast to the "known-host" case, where the decoder can separately observe x. (See, e.g., [14], [17] for information-theoretic treatments of some aspects of the known-host case.) Our interest is in decoders that produce reliable estimates whenever the channel is not too severe, where reliable means either that m^ = m deterministically or that Pr[m^ != m] < epsilon for sufficiently small epsilon. In such cases, the tolerable severity of the channel degradations is a measure of the robustness of an information-embedding system.

1The vector x is any convenient representation of all or part of the host signal. In the case of a host image, it could be a vector of pixel values or discrete cosine transform (DCT) coefficients, for example. In the case of a host audio waveform, this vector could be a vector of samples, spectral parameters, or linear prediction coding (LPC) coefficients, for example.
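To make the notation concrete, the following Python sketch (ours, not from the paper) walks through the embed/channel/decode pipeline of Fig. 1 with a deliberately trivial additive embedder and evaluates the squared-error distortion of (1); all function names and parameter values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

N = 1000                      # host-signal length (samples)
x = rng.normal(0.0, 1.0, N)   # host-signal vector

def embed(x, m):
    """Placeholder embedding function s(x, m); any real scheme must
    trade off rate, distortion, and robustness."""
    w = 0.1 if m == 1 else -0.1      # tiny host-independent offset
    return x + w

def distortion(s, x):
    """Squared-error distortion of (1): D(s, x) = (1/N) ||s - x||^2."""
    return np.mean((s - x) ** 2)

m = 1
s = embed(x, m)                      # composite signal
n = rng.normal(0.0, 0.05, N)         # perturbation vector (here AWGN)
y = s + n                            # channel output; n = y - s by definition

print("embedding distortion D(s, x):", distortion(s, x))
print("perturbation power (1/N)||n||^2:", np.mean(n ** 2))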
B. Equivalent Super-Channel Model

An alternative representation of the model of Fig. 1 is shown in Fig. 2. The two models are equivalent since any embedding function s(x, m) can be written as the sum of the host signal x and a host-dependent distortion signal e(x, m), i.e.,

s(x, m) = x + e(x, m)

simply by defining the distortion signal to be e(x, m) = s(x, m) - x. Thus, one can view e as the input to a super-channel that consists of the cascade of an adder and the true channel. The host signal x is a state of this super-channel that is known at the encoder. The measure of distortion D(s, x) between the composite and host signals maps onto a host-dependent measure of the size of the distortion signal e; for example, the squared-error distortion (1) equals the power of the distortion signal, (1/N) ||e||^2.

Fig. 2. Equivalent super-channel model for information embedding. The composite signal is the sum of the host signal, which is the state of the super-channel, and a host-dependent distortion signal.

Therefore, one can view information-embedding problems as power-limited communication over a super-channel with a state that is known at the encoder.2 As we will develop, this view will be convenient for determining achievable rate-distortion-robustness tradeoffs of various information-embedding and -decoding methods.

2Cox et al. have also recognized that one may view watermarking as communications with side information known at the encoder [21].

C. Channel Models

In general, the channel model is either a characterization of the degradations that can actually occur to the composite signal, or, alternatively, a description of the class of degradations to which the embedder and decoder must be robust, i.e., the system is designed to work against all degradations described by this particular model. The latter viewpoint is particularly useful in the context of intentional attacks. We consider both probabilistic and deterministic channel models. In the probabilistic case, we specify the channel input-output relationship in terms of the conditional probability law p(y|s). Implicitly, this specification also describes the conditional probability law of the perturbation vectors n against which the system must be robust, since n = y - s. In the deterministic case, the channel input-output relationship is described most generally in terms of the set of possible outputs for every given input, or, equivalently, in terms of the set of tolerable perturbation vectors for every given input.

III. CLASSES OF EMBEDDING METHODS

An extremely large number of embedding methods have been proposed in the literature [22], [23], [1]. Broadly, for our purposes these can be divided into two classes: 1) host-interference nonrejecting methods and 2) host-interference rejecting methods.

Host-interference nonrejecting methods have the general property that the host signal is effectively a source of interference in the system, and generally result from system designs that do not allow the encoder in Fig. 2 to sufficiently exploit knowledge of the host signal x. The simplest of such methods have purely additive embedding functions of the form

s(x, m) = x + w(m)     (2)

where w(m) is typically a pseudo-noise sequence. Such embedding methods are often referred to as additive spread-spectrum methods, and some of the earliest examples are described in [24], [25], [10], [26], [11], [12]. Typically, w(m) takes the form

w(m) = a(m) v     (3)

where v is a unit-energy spreading vector and a(m) is a scalar function of the message.3

It is often convenient to view additive spread spectrum as perturbation of a projection. In particular, substituting (3) into (2), we obtain

s(x, m) = x + a(m) v     (4)

which, when projected onto v and using the fact that v has unit energy, gives

s~ = x~ + a(m)     (5)

where x~ is the corresponding projection of the host signal, i.e.,

x~ = v^T x.     (6)

Finally, substituting (5) back into (4) yields the composite-signal reconstruction from projections

s(x, m) = x + (s~ - x~) v.     (7)

From (2), we see that for this class of embedding methods, the host signal acts as additive interference that inhibits the decoder's ability to estimate m. Consequently, even in the absence of any channel perturbations (n = 0), one can usually embed only a small amount of information. Thus, these methods are useful primarily when either the host signal is available at the decoder (as assumed in, e.g., [26]) or when the host-signal interference is much smaller than the channel interference.

3Technically, spread-spectrum systems (2) for which (3) applies are classified as amplitude-modulation additive spread-spectrum methods, but since there is no risk of confusion in this paper, we will use the term "additive spread-spectrum" to mean specifically those systems based on amplitude modulation.
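The projection view of additive spread spectrum in (2)-(7) can be sketched as follows; the antipodal choice of a(m), the noise level, and the simple sign-based decoder are our own illustrative assumptions, and the example is meant only to show how the untouched host projection x~ acts as interference at the decoder.

import numpy as np

rng = np.random.default_rng(1)

N = 512
x = rng.normal(0.0, 10.0, N)          # strong host signal (acts as interference)
v = rng.normal(0.0, 1.0, N)
v /= np.linalg.norm(v)                # unit-energy spreading vector, cf. (3)

def a(m):
    # scalar message function; antipodal values are just an example
    return 0.5 if m == 1 else -0.5

def embed_ss(x, m):
    # s(x, m) = x + a(m) v, cf. (2)-(4)
    return x + a(m) * v

m = 2
s = embed_ss(x, m)
n = rng.normal(0.0, 0.1, N)
y = s + n

# decoding from the projection onto v, cf. (5)-(6):
y_tilde = v @ y                        # = x_tilde + a(m) + n_tilde
m_hat = 1 if y_tilde > 0 else 2        # x_tilde is untouched host interference
print("true m:", m, " decoded m:", m_hat, " (frequently wrong: the host interferes)")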
Information embedding systems can achieve host-interference rejection when knowledge of the host signal at the encoder is adequately exploited in system design. Examples include LBM and, more generally, quantize-and-replace systems. In LBM systems, the least significant bit(s) in the binary representation of a host sample are simply replaced with message bits. A class of quantize-and-replace systems that we refer to as generalized LBM systems implement a vector generalization of this embedding strategy. Generalized LBM embedding functions are of the form

s(x, m) = q(x) + d(m)     (8)

where q(.) represents the coarse quantizer that determines the most significant bits, and d(m) is determined only by the (modulated) least significant bits. A defining characteristic of generalized LBM systems is that the embedding never alters the most significant bits of the host signal, which is expressed in terms of the constraint

q(s(x, m)) = q(x).     (9)

Without loss of generality, we may assume that good generalized LBM quantizers are unbiased, i.e.,

E[x - q(x)] = 0.     (10)

One example of a generalized LBM system is that developed in [7], where LBM is effectively applied to a pseudorandom projection of the form (6). Thus, the embedding is of the form (7), where s~ is now of the form

s~(x~, m) = q(x~) + d(m)     (11)

with q(.) a uniform, scalar quantization function of step size Delta and d(m) a perturbation value. It is convenient to think of this class of generalized LBM systems as "spread LBM" systems.

While generalized LBM systems are host-interference rejecting, they are unnecessarily constrained in a way that makes them generally inefficient and vulnerable to various classes of attacks, which in turn limits the range of applications for which they can be used. Avoiding these constraints in the process of developing optimal information-embedding systems naturally gives rise to a new and general class of host-interference rejecting embedding methods called QIM, which we develop in the sequel.
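Before turning to QIM, here is a minimal sketch of the scalar quantize-and-replace (LBM-style) embedding of (8)-(9); the step size and the +/- Delta/4 perturbation values are illustrative assumptions of ours, not the paper's parameters.

import numpy as np

DELTA = 1.0   # coarse quantizer step size (illustrative)

def coarse_quantize(x):
    # q(x): keep only the "most significant bits" of each sample
    return DELTA * np.round(x / DELTA)

def embed_lbm(x, bit):
    # s(x, m) = q(x) + d(m): the coarse bits are never altered, cf. (8)-(9)
    d = DELTA / 4 if bit == 1 else -DELTA / 4
    return coarse_quantize(x) + d

def decode_lbm(y):
    # recover the "least significant" residual and threshold it
    residual = y - coarse_quantize(y)
    return (residual > 0).astype(int)

x = np.array([0.30, -1.12, 2.57, 0.91])
s = embed_lbm(x, bit=1)
print("composite:", s)
print("decoded bits (no noise):", decode_lbm(s))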
IV. QUANTIZATION INDEX MODULATION

To develop the QIM concept, we begin by viewing the embedding function s(x; m) as an ensemble of functions of x, indexed by m. We denote the functions in this ensemble as s(x; m) to emphasize this view. If the embedding-induced distortion is to be small, each function in the ensemble must be close to an identity function in some sense, so that

s(x; m) ~ x,   for all m.     (12)

That the system needs to be robust to perturbations suggests that the points in the range of one function in the ensemble should be far away in some sense from the points in the range of any other function. For example, one might desire at the very least that the ranges be nonintersecting. Otherwise, even in the absence of any perturbations, there will be some values of s from which one will not be able to uniquely determine m. In fact, it is precisely the nonintersection property that leads to host-signal interference rejection.

The nonintersection property, along with the approximate-identity property (12), which suggests that the ranges of each of the functions "cover" the space of possible (or at least highly probable) host-signal values x, suggests that the functions be discontinuous. Quantizers are just such a class of discontinuous, approximate-identity functions. Then, "QIM" refers to embedding information by first modulating an index or sequence of indices with the embedded information and then quantizing the host signal with the associated quantizer or sequence of quantizers.

Fig. 3 illustrates this QIM information-embedding technique. In this example, one bit is to be embedded, so that m is either 1 or 2. Thus, we require two quantizers, and their corresponding sets of reconstruction points are represented in Fig. 3 with x's and o's. If m = 1, the host signal is quantized with the x-quantizer, i.e., s is chosen to be the x point closest to x. If m = 2, x is quantized with the o-quantizer.

Fig. 3. QIM for information embedding. The points marked with x's and o's belong to two different quantizers, each with its associated index. The minimum distance d_min measures the robustness to perturbations, and the sizes of the quantization cells, one of which is shown in the figure, determine the distortion. If m = 1, the host signal is quantized to the nearest x. If m = 2, the host signal is quantized to the nearest o.

As m varies, the composite signal value varies from one x point to another or from one o point to another, but it never varies between an x point and an o point. Thus, even with an infinite-energy host signal, one can determine m if the channel perturbations are not too severe. The x points and o points are both quantizer reconstruction points and signal constellation points,4 and we may view design of QIM systems as the simultaneous design of an ensemble of source codes (quantizers) and channel codes (signal constellations).

Conveniently, properties of the quantizer ensemble can be related directly to the performance parameters of rate, distortion, and robustness. For example, the number of quantizers in the ensemble determines the information-embedding rate R_m. The sizes and shapes of the quantization cells determine the embedding-induced distortion, all of which arises from quantization error.

4One set of points, rather than one individual point, exists for each value of m.
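The following sketch (ours) instantiates the two-quantizer picture of Fig. 3 with a pair of uniform scalar quantizers offset by half a step, together with a nearest-reconstruction-point decoder in the spirit of (14)-(15); the step size and noise level are arbitrary assumptions.

import numpy as np

DELTA = 1.0  # quantizer step size (illustrative)

# Two quantizers whose reconstruction lattices are offset by DELTA/2:
# the "x" lattice for m = 1 and the "o" lattice for m = 2 of Fig. 3.
OFFSETS = {1: 0.0, 2: DELTA / 2}

def quantize(z, m):
    """Quantize z with the m-th quantizer (the embedding step of QIM)."""
    d = OFFSETS[m]
    return DELTA * np.round((z - d) / DELTA) + d

def embed_qim(x, m):
    return quantize(x, m)

def decode_qim(y):
    """Minimum-distance decoding: requantize y with every quantizer and
    pick the message whose reconstruction is closest."""
    dists = {m: np.sum((y - quantize(y, m)) ** 2) for m in OFFSETS}
    return min(dists, key=dists.get)

rng = np.random.default_rng(2)
x = rng.normal(0.0, 5.0, 64)          # host signal (can be arbitrarily strong)
m = 2
s = embed_qim(x, m)
y = s + rng.normal(0.0, 0.05 * DELTA, x.size)   # mild channel perturbation

print("decoded:", decode_qim(y), "(host interference is rejected)")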
Finally, for many classes of channels, the minimum distance

d_min = min over (i, j), i != j, of min over (x_i, x_j) of || s(x_i; i) - s(x_j; j) ||     (13)

between the sets of reconstruction points of different quantizers in the ensemble effectively determines the robustness of the embedding.5

It is important to emphasize that, in contrast to the case where the host signal is known at the receiver, the minimum-distance decoder needs to choose from all reconstruction points of the quantizers, not just those corresponding to the actual host signal x. In particular, the minimum-distance decoder makes decisions according to the rule6

m^(y) = arg min over m of min over x of || y - s(x; m) ||.     (14)

If, which is often the case, the quantizers map to the nearest reconstruction point, then (14) can be rewritten as

m^(y) = arg min over m of || y - s(y; m) ||.     (15)

(While the minimum-distance decoder is especially convenient to implement and analyze, a variety of other potentially useful decoders are discussed in [27].) Intuitively, the minimum distance measures the size of perturbation vectors that can be tolerated by the system. For example, if channel perturbations are bounded according to7

||n||^2 <= N sigma_n^2     (16)

then the minimum-distance decoder is guaranteed to not make an error as long as

d_min^2 / (N sigma_n^2) > 4.     (17)

In the case of a classical additive white Gaussian noise (AWGN) channel with a noise variance of sigma_n^2, at high signal-to-noise ratio (SNR) the minimum distance also characterizes the error probability of the minimum-distance decoder [28], which is governed by the Gaussian Q-function

Q(x) = (1 / sqrt(2 pi)) Integral from x to infinity of exp(-t^2 / 2) dt.     (18)

5When the host signal x is known at the decoder, as is the case in some applications of interest, then the more natural minimum distance is d_min(x) = min over i != j of || s(x; i) - s(x; j) || or d_min = min over x of min over i != j of || s(x; i) - s(x; j) ||.

6Alternatively, if the host signal x is known at the decoder, m^(y; x) = arg min over m of || y - s(x; m) ||.

7We refer to this as the bounded perturbation channel and will revisit this deterministic channel in Section VIII-B1.

A. Distortion-Compensated QIM

Distortion compensation is a type of postquantization processing that can improve the achievable rate-distortion-robustness tradeoffs of QIM methods. To see this, we begin by noting that for a fixed rate and a given quantizer ensemble, scaling8 all quantizers by alpha < 1 increases d_min by a factor of 1/alpha, thereby increasing the robustness of the embedding. However, the embedding-induced distortion also increases by a factor of 1/alpha^2. Adding back a fraction 1 - alpha of the quantization error to the quantization value removes, or compensates for, this additional distortion. The resulting embedding function is

s(x, m) = q(x; m, Delta/alpha) + (1 - alpha) [ x - q(x; m, Delta/alpha) ]     (19)

where q(x; m, Delta/alpha) is the mth quantizer of an ensemble whose reconstruction points have been scaled by alpha so that two reconstruction points separated by a distance d before scaling are separated by a distance d/alpha after scaling. The first term in (19) represents normal QIM embedding. We refer to the second term as the distortion-compensation term.

8If a reconstruction point is at q, it is "scaled" by alpha by moving it to q/alpha.

The quantization error added back is a source of interference to the decoder. Typically, the probability density functions (pdfs) of the quantization error for all quantizers in the QIM ensemble are similar. Therefore, the distortion-compensation term in (19) is effectively statistically independent of m and can be treated as independent noise. Thus, decreasing alpha leads to greater minimum distance, but for a fixed embedding-induced distortion, the distortion-compensation interference at the decoder increases. One optimality criterion for choosing alpha is to maximize the following "SNR" at the decision device:

SNR(alpha) = d_1^2 / [ (1 - alpha)^2 D_s + alpha^2 sigma_n^2 ]

where this SNR is defined as the ratio between the squared minimum distance between quantizers and the total interference energy from both distortion-compensation interference and channel interference. Here, d_1 is the minimum distance when alpha = 1 and is a characteristic of the particular quantizer ensemble. One can easily verify that the optimal scaling parameter alpha that maximizes this SNR is

alpha = DNR / (DNR + 1)     (20)

where DNR is the (embedding-induced) distortion-to-noise ratio D_s / sigma_n^2.

As we will see, suitably coded versions of this distortion-compensated QIM with precisely the parameter setting (20) also have important asymptotic optimality properties. Before developing these properties, let us first investigate some constraints
that are useful to impose on QIM systems to facilitate implementation.
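Before turning to those constraints, the sketch below (ours) implements a DC-QIM embedding of the form (19) with the scaling parameter chosen as in (20), using uniform scalar quantizers; the binary message, dither offsets, and numerical values are illustrative assumptions.

import numpy as np

def quantize(x, delta, dither):
    """Uniform scalar quantizer with step `delta`, shifted by `dither`."""
    return delta * np.round((x - dither) / delta) + dither

def embed_dcqim(x, m, delta, alpha):
    """DC-QIM embedding in the spirit of (19): quantize with step
    delta/alpha, then add back the fraction (1 - alpha) of the error."""
    dither = 0.0 if m == 1 else (delta / alpha) / 2
    q = quantize(x, delta / alpha, dither)
    return q + (1.0 - alpha) * (x - q)

# Scaling parameter of (20): alpha = DNR / (DNR + 1).
Ds, sigma_n2 = 0.1, 0.05
dnr = Ds / sigma_n2
alpha = dnr / (dnr + 1.0)

rng = np.random.default_rng(3)
x = rng.normal(0.0, 3.0, 10_000)
delta = np.sqrt(12.0 * Ds)            # so that Delta^2/12 matches Ds, cf. (24)
s = embed_dcqim(x, m=2, delta=delta, alpha=alpha)
print("alpha:", round(alpha, 3))
print("measured embedding distortion:", round(np.mean((s - x) ** 2), 4),
      " target Ds:", Ds)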
V. DITHER MODULATION: AN IMPLEMENTATION OF QIM

A key aspect of the design of QIM systems involves the choice of practical quantizer ensembles for such systems, which we now explore. In the process, we obtain additional insights into the design, performance evaluation, and implementation of QIM embedding methods, particularly those of low complexity. A convenient structure to consider is that of so-called dithered quantizers [29], [30], which have the property that the quantization cells and reconstruction points of any given quantizer in the ensemble are shifted versions of the quantization cells and reconstruction points of any other quantizer in the ensemble. In nonwatermarking contexts, the shifts typically correspond to pseudorandom vectors called dither vectors. For information-embedding purposes, the dither vector can be modulated with the embedded signal, i.e., each possible embedded signal maps uniquely onto a different dither vector d(m). The host signal is quantized with the resulting dithered quantizer to form the composite signal. Specifically, we start with some base quantizer q(.), and the embedding function is

s(x; m) = q(x + d(m)) - d(m).

We call this type of information embedding "dither modulation." We discuss several low-complexity realizations of such dither-modulation methods in the sequel.

A. Coded Binary Dither Modulation with Uniform Scalar Quantization

Coded binary dither modulation with uniform, scalar quantization is one such realization.9 We assume that . The dither vectors in a coded binary dither modulation system are constructed as follows.

i) The N R_m information bits representing the embedded message m are error-correction coded using a rate-k_u/k_c code to obtain a coded bit sequence, where

(21)

(In the uncoded case, k_c = k_u and the coded bit sequence is simply the information bit sequence.) We divide the host signal x into nonoverlapping blocks of length L and embed the ith coded bit in the ith block, as described below.

ii) Two length-L dither sequences d[k, 0] and d[k, 1] and one length-L sequence of uniform, scalar quantizers with step sizes Delta_1, ..., Delta_L are constructed with the constraint that the two dither sequences are offset from one another by Delta_k/2 in the kth component. This constraint ensures that the two corresponding L-dimensional dithered quantizers are the maximum possible distance from each other. For example, a pseudorandom sequence of +/- Delta_k/4 and its negative satisfy this constraint. One could alternatively choose d[k, 0] pseudorandomly with a uniform distribution over [-Delta_k/2, Delta_k/2].10 Also, the two dither sequences need not be the same for each length-L block.

iii) The ith block of x is quantized with the dithered quantizer using the dither sequence selected by the ith coded bit.

A detailed assessment of the complexity of this QIM realization is developed in [15], [27].

The minimum-distance properties of coded binary dither modulation are readily deduced. In particular, any two distinct coded bit sequences differ in at least d_H places, where d_H is the minimum Hamming distance of the error-correction code. For each of these blocks, the reconstruction points of the corresponding quantizers are shifted relative to each other by Delta_k/2 in the kth dimension. Thus, the square of the minimum distance (13) over all dimensions is

(22)

where to obtain the second equality we have used (21), and where, in the third line, the quantity defined in (23) is the gain of the error-correction code

(23)

In the high signal-to-distortion ratio (SDR) regime of primary interest for high-fidelity applications, the quantization cells are sufficiently small that the host signal can be modeled as uniformly distributed within each cell. In this case, the expected squared-error distortion of a uniform, scalar quantizer with step size Delta is the familiar

Delta^2 / 12.     (24)

Thus, the overall average expected distortion (1) is

D_s = (1/L) Sum over k = 1, ..., L of Delta_k^2 / 12.     (25)

Combining (22) and (25) yields the "distortion-normalized" squared minimum distance

(26)

a quantity that can be used to characterize the achievable performance of QIM realizations more generally, as we will develop.

9By scalar quantization, we mean that the high-dimensional base quantizer q(.) is the Cartesian product of scalar quantizers.

10A uniform distribution for the dither sequence implies that the quantization error is statistically independent of the host signal and leads to fewer "false contours," both of which are generally desirable properties from a perceptual viewpoint [29].
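A minimal uncoded instance of the construction in steps i)-iii) can be sketched as follows (ours); it uses the uniform-dither option of footnote 10, a single step size Delta for all samples, and no error-correction code, all of which are simplifying assumptions.

import numpy as np

def make_dithers(n, delta, rng):
    """Dither pair per step ii): d[k,0] uniform over [-delta/2, delta/2],
    and d[k,1] offset by +/- delta/2 so the two dithered quantizers are
    maximally separated."""
    d0 = rng.uniform(-delta / 2, delta / 2, n)
    d1 = d0 + np.where(d0 < 0, delta / 2, -delta / 2)
    return d0, d1

def dithered_quantize(x, delta, d):
    # s = q(x + d) - d with a uniform scalar base quantizer q(.)
    return delta * np.round((x + d) / delta) - d

def embed_dm(x, bits, L, delta, dithers):
    """Uncoded binary dither modulation: the i-th bit selects the dither
    sequence used to quantize the i-th length-L block, cf. step iii)."""
    s = x.copy()
    for i, b in enumerate(bits):
        blk = slice(i * L, (i + 1) * L)
        s[blk] = dithered_quantize(x[blk], delta, dithers[b][blk])
    return s

def decode_dm(y, n_bits, L, delta, dithers):
    bits = []
    for i in range(n_bits):
        blk = slice(i * L, (i + 1) * L)
        errs = [np.sum((y[blk] - dithered_quantize(y[blk], delta, dithers[b][blk])) ** 2)
                for b in (0, 1)]
        bits.append(int(np.argmin(errs)))
    return bits

rng = np.random.default_rng(4)
L, delta, n_bits = 8, 1.0, 16
x = rng.normal(0.0, 4.0, L * n_bits)
dithers = make_dithers(L * n_bits, delta, rng)
bits = [int(b) for b in rng.integers(0, 2, n_bits)]
s = embed_dm(x, bits, L, delta, dithers)
y = s + rng.normal(0.0, 0.05, x.size)
print("all bits recovered:", decode_dm(y, n_bits, L, delta, dithers) == bits)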
Fig. 4. Dither modulation with uniform quantization step sizes.

Fig. 5. Transform dither modulation with nonuniform quantization step sizes.
B. Spread-Transform Dither Modulation

One special class of coded binary dither modulation methods is what we refer to as spread-transform dither modulation (STDM). We now develop its properties and quantify its advantages over other forms of dither modulation, over additive spread-spectrum methods, and over spread LBM.

To introduce STDM, we begin by observing that the distortion-normalized squared minimum distance (26) of binary dither modulation with uniform scalar quantization does not depend on the sequence of step sizes Delta_1, ..., Delta_L, i.e., on the distribution of the distortion across samples within the length-L block. Thus, one is free to choose any distribution without sacrificing this quantity, so the Delta_k's can be chosen to optimize other characteristics of the embedding.

To understand this property, consider Figs. 4-6, each of which shows the reconstruction points of two quantizers for embedding one bit in a block of two samples. For each of the three systems, the minimum distance and the average squared-error distortion per sample are identical. Thus, the robustness against bounded perturbations is the same in each case. However, the quantization differs in each case. In Fig. 4, where scalar quantization is applied to each sample separately, the quantization step sizes are the same for both samples. In Figs. 5 and 6, the samples are first pretransformed and the resulting coefficients quantized unevenly. In particular, a unitary transform (coordinate rotation) is applied to the pair of samples before quantization; the first transform coefficient is the component of the host signal in the direction of the vector v depicted. In Fig. 5, the step size for quantizing the first transform coefficient is larger than that used to quantize the second transform coefficient, which lies in the direction orthogonal to v. Finally, in the extreme case of Fig. 6, the step size for the first coefficient is larger still, and that for the second coefficient is zero, i.e., all embedding occurs in the first coefficient. In this case, the reconstruction points become reconstruction lines, so to embed one bit value, the host signal is quantized to the nearest point on a line labeled with an x; to embed the other bit value, the host signal is quantized to the nearest point on a line labeled with an o.
Fig. 6. Transform dither modulation with quantization of only a single transform component. The quantization step size for the component of the host signal orthogonal to v is zero.

While the three systems corresponding to Figs. 4-6 have the same minimum distance, the number of perturbation vectors of minimum length that cause decoding errors is higher for the case of Fig. 4 than for the case of Fig. 6. (For intermediate cases such as the one shown in Fig. 5, where quantization step sizes in different dimensions are different but nonzero, the number of perturbation vectors of minimum length that cause decoding errors is the same as in Fig. 4, but these vectors are not orthogonal.) Thus, for probabilistic channels, such as additive noise channels, the probability of error is generally different in each case. For example, suppose a bit is embedded and the composite signal is the labeled point in Figs. 4 and 6. If the channel output lies in the decision region defined by the dashed box in Fig. 4 and defined by the two dashed lines in Fig. 6, then the decoder will correctly determine the embedded bit. If the perturbation vector places the channel output outside the decision region, however, the decoder will make an error with very high probability. (There is some possibility that the channel output is outside the decision region but is still closer to some other x point than to the closest o point. These events, however, are very unlikely for many perturbation probability distributions that are of practical interest.) Since the decision region of Fig. 6
contains the decision region of Fig. 4, it follows that the probability of a correct decision in the case of nonuniform quantization step sizes is higher.

The unitary transform in the case of Fig. 6 not only facilitates a comparison of Figs. 4 and 6, but also serves to spread any embedding-induced distortion over frequency and time/space when a peak distortion constraint is imposed, for example. Although the distortion is concentrated in only one transform coefficient, if the energy of v is spread over space/time and frequency--for example, if v is chosen pseudorandomly--then the distortion will also be spread.

As we will see in subsequent sections of this paper, dither-modulation methods have considerable performance advantages over previously proposed additive spread-spectrum and spread LBM methods in a variety of contexts. However, much effort has already been invested in optimizing both additive spread-spectrum and spread LBM systems, for example, by exploiting perceptual properties of the human visual and auditory systems or designing receiver front-ends to mitigate effects of geometric and other distortions. An additional advantage of STDM specifically over other forms of dither modulation is that one can easily convert existing additive spread-spectrum and spread LBM systems into STDM systems while retaining the other optimized components of the system. In particular, it suffices to replace the addition step of additive spread spectrum, i.e., (5), or the quantize-and-replace step of spread LBM, i.e., (11), with the dithered quantization step of STDM, i.e.,

s~(x~; m) = q(x~ + d(m)) - d(m).     (27)

SNR Advantage of STDM: In this section, we quantify the performance gain of STDM over additive spread spectrum and spread LBM from an SNR perspective that applies to a broad range of contexts. We focus our analysis on the representative case of embedding one bit in a length-L block using a unit-length spreading vector v. Because, as (5), (11), and (27) reflect, in each case the embedding occurs entirely in the projection of x onto v, a one-dimensional problem results. In addition, because all of the embedding-induced distortion occurs only in the direction of v, the distortion in each case also has the same temporal/spatial distribution and frequency distribution. Thus, one would expect that any perceptual effects due to time/space masking or frequency masking are the same in each case. Therefore, squared-error distortion and SNR-type measures are more meaningful measures of distortion when comparing these embedding methods than one might expect in other more general contexts where squared-error distortion may fail to capture certain perceptual effects.

SNR advantage of STDM over additive spread spectrum: Considering the case of additive spread spectrum first, since the embedded information in (5) is carried entirely by a(m), we have

(28)

For STDM (27),

(29)

where the quantizer step size is chosen so that the expected distortion in both cases is the same, and where we have used the fact that the two dither values are chosen such that the corresponding dithered quantizers are offset by half a quantization step. The decoder in both cases makes a decision based on y~, the projection of the channel output y onto v. In the case of additive spread spectrum, y~ = x~ + a(m) + n~, while in the case of STDM, y~ = s~ + n~, where n~ is the projection of the perturbation vector n onto v. We let E(.) be some measure of energy; for example, E(z) = z^2 in the case of a deterministic variable z, or E(z) = E[z^2] when z is random. The energy of the interference or "noise" is E(x~) + E(n~) for additive spread spectrum, but only E(n~) for STDM, i.e., the host-signal interference for STDM is zero. Thus, the SNR at the decision device, SNR_SS for additive spread spectrum and SNR_STDM for STDM, is obtained by dividing the respective "signal" energies of (28) and (29) by these interference energies. Thus, the advantage of STDM over additive spread spectrum is the ratio

SNR_STDM / SNR_SS     (30)

which is typically very large since the channel perturbations are usually much smaller than the host signal if the channel output is to be of reasonable quality. For example, if the host-signal-to-channel-noise ratio is 30 dB and x~ and n~ are uncorrelated, then the SNR advantage (30) of STDM over additive spread spectrum is 28.8 dB.11

SNR advantage of STDM over spread LBM: Spread-transform dither modulation methods also have an SNR advantage over spread LBM methods. As we show in Appendix A, the distortion-normalized squared minimum distance (26) of LBM is 2.43 dB worse than that of dither modulation in the case of coded binary embedding with uniform, scalar quantization. Thus, for a fixed rate and embedding-induced distortion, the squared minimum distance, and hence the SNR at the decision device, for spread LBM will be 2.43 dB worse than that of STDM, i.e.,12

SNR_STDM / SNR_spread LBM = 7/4   (2.43 dB).     (31)

This SNR advantage is illustrated in Fig. 7, where the quantizer reconstruction points and embedding intervals for both spread LBM and STDM are shown, with the same embedding-induced squared-error distortion for both cases.

11Note that while the high SDR approximation (30) predicts that STDM is worse than additive spread spectrum by a factor of 4/3 (about 1.25 dB) when x~ is approximately zero (as would be the case, for example, if the host signal had very little energy in the direction of v), in fact, if one chooses d_min appropriately then it is straightforward to verify that STDM performs as well as additive spread spectrum in this low SDR regime.

12Appendix A also shows that for M-ary embedding the SNR gain grows to 2 (3 dB) as M goes to infinity.
Fig. 7. Spread-transform dither modulation versus spread LBM. The embedding interval boundaries of spread LBM, which are shown with solid lines, are the same for both x points and o points. In contrast, in the case of STDM, the x-point embedding intervals, shown by solid lines, differ from the o-point embedding intervals, shown by dashed lines. An SNR advantage of 7/4 = 2.43 dB for STDM results.

The preceding analysis establishes some important advantages of QIM methods over common information-embedding methods. In fact, it turns out that QIM methods are asymptotically optimal in many key scenarios of interest. To develop these results, we next examine information embedding within an information-theoretic framework.

VI. INFORMATION-THEORETIC OPTIMALITY OF QIM

This section explores the best possible rate-distortion-robustness performance that one could hope to achieve with any information-embedding system. Our analysis leads to insights about some properties and characteristics of good information-embedding methods, i.e., methods that achieve performance close to the information-theoretic limits. In particular, a canonical "hidden QIM" structure emerges for information embedding that consists of 1) preprocessing of the host signal, 2) QIM embedding, and 3) postprocessing of the quantized host signal to form the composite signal. One incurs no loss of optimality by restricting one's attention to this simple structure. We also derive sufficient conditions under which only distortion-compensation postprocessing is required. As we develop in Sections VII and VIII, these conditions are satisfied in several important cases of practical interest.

A. Communication over Channels with Side Information

The super-channel model of Section II-B and Fig. 2 facilitates our analysis, i.e., we view information embedding as the transmission of a host-dependent distortion signal e over a super-channel with side information or state x that is known at the encoder. In this section, we also restrict our attention to a squared-error distortion constraint

(1/N) E[ ||s - x||^2 ] <= D_s

and a memoryless channel with known pdf p(y_i | s_i), where y_i and s_i are the ith components of y and s, respectively.13 Then, the super-channel is also memoryless and has probability law

p(y_i | e_i, x_i) = p(y_i | s_i = e_i + x_i).

The capacity [31] of this super-channel is the reliable information-embedding rate that is asymptotically achievable with long signal lengths. In nonwatermarking contexts, Gel'fand and Pinsker [32] and Heegard and El Gamal [33] have determined the capacity of such a channel in the case of a random state vector x with independent and identically distributed (i.i.d.) components when the encoder sees the entire state vector before choosing the channel input e. In this case, the capacity is

C = max [ I(u; y) - I(u; x) ]     (32)
where I(.;.) denotes mutual information and u is an auxiliary random variable. Since s = x + e, we can think of y in (32) as being generated from s and, in turn, from e and x. While the mapping from u to e is, in general, probabilistic, from convexity properties of mutual information, one can deduce that the maximizing distribution in (32) always has the property that e is a deterministic function of u and x [32].

In the case of watermarking, the maximization (32) is subject to a distortion constraint E[e^2] <= D_s. A formal proof of the extension of (32) to include this constraint is developed in [20]. Other researchers [18], [19], [16] are working on extending or have extended these results to the case where the channel law p(y|s) is not fixed but rather is chosen by an attacker subject to a distortion constraint. A related information-theoretic formulation can be found in [14].

As we shall see in the next section, one way to interpret (32) is that I(u; y) is the total number of bits per host-signal sample that can be transmitted through the channel, and I(u; x) is the number of bits per sample that are allocated to the host signal x. The difference between the two is the number of bits per host-signal sample that can be allocated to the embedded information m.

1) Hidden QIM: As we show in this subsection, one can achieve the capacity (32) by a type of "hidden" QIM, i.e., QIM that occurs in a domain represented by the auxiliary random variable u. One moves into and out of this domain with pre- and postquantization processing.

13Extension of results in this section to the case where the channel is only block-wise memoryless is straightforward by letting y_i and s_i be the ith blocks, rather than ith scalar components, of y and s. In this case, information rates are measured in bits per block, rather than bits per sample.
Fig. 8. Capacity-achieving "hidden QIM." One embeds by choosing a codeword u that is jointly distortion-typical with x from the mth quantizer's codebook. The distortion function is e(u; x). The decoder finds a codeword that is jointly typical with y. If this codeword is in the ith subset, then m^ = i.
To develop this optimality of hidden QIM, we begin by adding an interpretation in terms of quantization (source coding) to the proof of the achievability of capacity by Gel'fand and Pinsker [32], the result of which is summarized as follows. Fig. 8 shows an ensemble of quantizers, one per message value, where each source codeword (quantizer reconstruction vector) u is randomly drawn from the i.i.d. distribution p(u), which is the marginal distribution corresponding to the host-signal distribution p(x) and the maximizing conditional distribution p(u|x) from (32). Although the source codebooks are, therefore, random, both the encoder and decoder, of course, know the codebooks. Each codebook contains approximately 2^(N I(u; x)) codewords, so there are approximately 2^(N I(u; y)) codewords in total.

QIM embedding in this u-domain corresponds to finding a vector u in the mth quantizer's codebook that is jointly distortion-typical with x and generating the composite signal s from u and x. By distortion-typical, we mean that u and x are jointly typical and that the induced distortion satisfies the constraint, i.e., the function e(u; x) is the distortion function in the u-domain. Since the mth quantizer's codebook contains more than 2^(N I(u; x)) codewords, the probability that there is no u that is jointly distortion-typical with x is small.14 Thus, the selection of a codeword from the mth quantizer is the quantization part of QIM, and the generation of e, and, therefore, s, from the codeword u and x is the postquantization processing.

The decoder finds a u that is jointly typical with the channel output y and declares m^ = i if this u is in the ith quantizer's codebook. Because the total number of codewords is less than 2^(N I(u; y)), the probability that a u other than the one selected at the encoder is jointly typical with y is small. Also, the probability that the selected u is jointly typical with y is close to 1.15 Thus, the probability of error Pr[m^ != m] is small, and we can indeed achieve the capacity (32) with QIM in the u-domain.

The remaining challenge, therefore, is to determine the right preprocessing and postprocessing given a particular channel (attack) p(y|s). As mentioned above, for a number of important cases, it turns out that the only processing required is postquantization distortion compensation. We discuss these cases in the next subsection.
2) Optimality of Distortion-Compensated QIM: When distortion-compensated QIM (DC-QIM) as introduced in Section IV-A is viewed as an instance of hidden QIM, we obtain that u is a quantized version of the (scaled) host signal. We show in this section that suitably coded versions of DC-QIM can achieve capacity whenever the maximizing distribution in (32) is of a form such that the postprocessing is linear, i.e., when, without loss of generality, u is generated according to

u = e + alpha x.     (33)

To see that DC-QIM can achieve capacity when the maximizing pdf in (32) satisfies (33), we show that one can construct an ensemble of random DC-QIM codebooks that satisfy (33). First, we observe that quantizing the scaled host alpha x is equivalent to quantizing x with a scaled version of the quantizer and scaling back the result, i.e.,

q(alpha x; m, Delta) = alpha q(x; m, Delta/alpha)     (34)

where q(x; m, Delta/alpha) is as defined following (19). Then, rearranging terms in the DC-QIM embedding function (19) and substituting (34) into the result, we obtain

s(x; m) = x + [ q(alpha x; m, Delta) - alpha x ].     (35)

We construct our random DC-QIM codebooks by choosing the codewords of q(.; m, Delta) from the i.i.d. distribution p(u), the one implied by the maximizing pdf in (32) together with the host pdf p(x). (Equivalently, we choose the codewords of q(.; m, Delta/alpha) in (19) from the correspondingly scaled distribution.) Our quantizers q(.; m, Delta) choose a codeword that is jointly distortion-typical with alpha x. The decoder looks for a codeword in all of the codebooks that is jointly typical with the channel output. Then, following the achievability argument of Section VI-A1, we can achieve a rate I(u; y) - I(u; x). From (35), we see that

e(x, m) = q(alpha x; m, Delta) - alpha x.

Since u = q(alpha x; m, Delta), we see that u = e + alpha x. Thus, if the maximizing distribution in (32) satisfies (33), our DC-QIM codebooks can also have this distribution and, hence, achieve capacity (32).

14This principle is, of course, one of the main ideas behind the rate-distortion theorem [31, Ch. 13].

15These principles are, of course, two of the main ideas behind the classical channel coding theorem [31, Ch. 8].
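The algebraic rearrangement in (34)-(35), as reconstructed above, is easy to check numerically; the sketch below (ours) compares the two forms of the DC-QIM embedding on random data, with an arbitrary step size, scaling parameter, and dither.

import numpy as np

def q(x, delta, dither=0.0):
    """Uniform scalar quantizer with step `delta` and shift `dither`."""
    return delta * np.round((x - dither) / delta) + dither

rng = np.random.default_rng(6)
x = rng.normal(0.0, 3.0, 100_000)
delta, alpha = 1.0, 0.6
dither = 0.37                         # arbitrary per-message dither

# DC-QIM embedding as written in (19): quantize with step delta/alpha,
# then add back a fraction (1 - alpha) of the quantization error.
q_scaled = q(x, delta / alpha, dither / alpha)
s_19 = q_scaled + (1 - alpha) * (x - q_scaled)

# Rearranged form in the spirit of (34)-(35): s = x + [q(alpha*x) - alpha*x],
# i.e., the distortion signal is a quantization error of the scaled host.
s_35 = x + (q(alpha * x, delta, dither) - alpha * x)

print("max |s_19 - s_35| =", np.max(np.abs(s_19 - s_35)))   # ~ 0 up to rounding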
As a final comment, it is worth emphasizing that QIM systems are optimal in other important scenarios as well. As one example, in the noise-free case y = s, which arises, for example, when a discrete-valued composite signal is transmitted over a digital channel with no errors, QIM is optimal even without distortion compensation, and achieves capacity [27]

(36)

As a second example, and as shown in [27], QIM is optimal even when the host signal is also available at the decoder, achieving the capacity

(37)

determined by Heegard and El Gamal [33]. We next examine some key scenarios when the optimality condition (33) is met.

VII. GAUSSIAN CHANNELS

In this section, we examine the ultimate performance limits of information-embedding methods when both the host signal is white and Gaussian, the channel is an AWGN channel, and the host and channel noise are independent of one another. Extensions to colored host and/or colored channel cases are developed in [15], [27]. Our main result of the section is that DC-QIM is optimal for this class of channels, and that, in addition, the optimum distortion-compensation parameter is also given by (20), which maximized SNR in uncoded DC-QIM systems.

In general, the embedding strategies optimized for Gaussian channel models can be expected to be good designs for a variety of applications in which one primarily requires robustness against unintentional attacks.16 And while Gaussian host models are not always accurate, the better the host-signal interference rejection properties of an information-embedding system, the smaller the role we might expect the host-signal model to play in determining the ultimate performance of such systems.

A. Capacities and the Optimality of DC-QIM

Specializing the formulation of Section VI-A to the Gaussian scenario of interest, with the zero-mean, variance-sigma_x^2 variables denoting elements of the N-dimensional host-signal vector x, and, similarly, the zero-mean, variance-sigma_n^2 variables denoting elements of the corresponding noise vector n, the distortion constraint can be expressed as

(1/N) E[ ||s - x||^2 ] <= D_s

with the corresponding constraint on e in (32) being E[e^2] <= D_s. We see that squared-error distortion-constrained, Gaussian information embedding is equivalent to power-constrained communication over a Gaussian channel with Gaussian side information known at the encoder, a case for which Costa [34] has determined the capacity to be, expressed in terms of the (embedding-induced) DNR = D_s / sigma_n^2,

C = (1/2) log2(1 + DNR).     (38)

16Indeed, these models can even apply to optimal, i.e., rate-distortion achieving [31], lossy compression of a Gaussian source, as discussed in [27].

Remarkably, the capacity is independent of the signal variance sigma_x^2 and, in fact, as we shall discuss later in this section, is the same as in the case when the host signal x is known at the decoder. Note that this implies that an infinite-energy host signal causes no decrease in capacity in this Gaussian case, i.e., good information-embedding systems can completely reject host-signal interference in the Gaussian case.

Based on our earlier results, to establish the optimality of DC-QIM for this channel, it suffices to verify that (33) is satisfied. This follows from the proof [34] of (38). In particular, as shown in [34], the pdf that maximizes (32) is indeed one implied by (33) for some parameter alpha, where alpha is chosen as a function of the DNR, and where e is chosen so that E[e^2] = D_s and so that e and x are independent. To see this, note that for a fixed value of alpha, an achievable rate R(alpha) is [34]

which can also be written in terms of the DNR and the host SNR (SNR = sigma_x^2 / sigma_n^2) as

R(alpha) = (1/2) log2 [ DNR (DNR + SNR + 1) / ( DNR SNR (1 - alpha)^2 + DNR + alpha^2 SNR ) ].     (39)

This rate is maximized by setting (cf. (20))

alpha = DNR / (DNR + 1)     (40)

from which we conclude that the rate (38) is achievable.

To establish that (38) is also the maximum achievable rate, it suffices to show that it is the capacity when x is known at the decoder, since one obviously cannot do better in the host-blind case. To develop the known-host capacity, first recall that the capacity is given by (37). Again, the maximization is subject to a distortion constraint, which in the case of white noise is E[e^2] <= D_s. Because subtracting a known constant from y does not change mutual information, we can equivalently write the known-host capacity as a maximization of I(e; y - x | x). Noting that y - x = e + n, we immediately conclude that in the case of an AWGN channel the known-host capacity is indeed given by (38), where the maximizing distribution is a zero-mean Gaussian distribution with variance D_s.
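A quick numerical check of (38)-(40) as written above: the sketch below (ours) evaluates the rate expression of (39) on a grid of alpha and confirms that it peaks at alpha = DNR/(DNR + 1) with value (1/2) log2(1 + DNR); the operating point is an arbitrary example.

import numpy as np

def rate(alpha, dnr, snr):
    """Achievable rate of (39) (Costa's formula in DNR/host-SNR form,
    as reconstructed above)."""
    num = dnr * (dnr + snr + 1)
    den = dnr * snr * (1 - alpha) ** 2 + dnr + alpha ** 2 * snr
    return 0.5 * np.log2(num / den)

dnr, snr = 0.5, 100.0                  # example operating point
alphas = np.linspace(0.01, 0.99, 981)
rates = rate(alphas, dnr, snr)

alpha_opt = dnr / (dnr + 1)            # (40), same form as (20)
capacity = 0.5 * np.log2(1 + dnr)      # (38)

print("best alpha on grid: %.3f   alpha of (40): %.3f"
      % (alphas[np.argmax(rates)], alpha_opt))
print("max rate on grid: %.4f   capacity (38): %.4f bits/sample"
      % (rates.max(), capacity))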
In the known-host case, additive spread spectrum is optimal, and optimal additive spread-spectrum systems superimpose zero-mean i.i.d. Gaussian sequences with variance D_s onto the host signal. However, it is important to note that QIM is also optimal in this case as well--as discussed in [15], quantizers of optimal QIM systems have reconstruction sequences chosen i.i.d. from a zero-mean Gaussian distribution of appropriate variance. Hence, yet another attractive property of QIM methods is that they are optimal in more general Gaussian broadcast scenarios, where some intended recipients of the embedded information know the host signal and some do not.

As a final comment, several of the methods we have discussed can be optimal in the small host-signal interference
scenario, i.e., when the host-signal variance is negligible. In fact, the capacity (38) is rather immediate in this scenario: Fig. 2 reduces to the classical communication problem considered in, e.g., [31], so that the capacity is the usual mutual information between the channel input e and the output y maximized over all input distributions such that E[e^2] <= D_s. In the AWGN channel case, specifically, (38) results. Examining (39) in the associated regime SNR -> 0, we see that distortion-compensated QIM with any alpha, including alpha = 1 (regular QIM), is optimal in this small host-interference scenario. As one might expect, additive spread-spectrum systems can be capacity-achieving in this limit as well, which we will see more explicitly in Section VII-C4.
B. Capacities for Hybrid Transmission

In this section, we consider scenarios corresponding to applications in which information embedding is part of a hybrid transmission scheme. We investigate two classes of such schemes: analog-digital and digital-digital transmission. In the former class, the host is an analog signal, as arises in, for example, the digital audio broadcasting application. In the latter class, the host signal is itself a digital signal, which has implications for broadcast transmission and related applications [31, Ch. 14]. In both cases, one is generally most concerned with the quality of the received signals, i.e., the channel output, rather than the channel input (composite signal).

1) Analog Host Signals: In this subsection, we determine how reliable embedding at a given rate impacts the quality with which an analog host signal is received and can be decoded with its conventional receiver from a noisy channel. In general, the effect of the embedding is to create an additional noise source DNR times as strong as the channel noise, and, therefore, the received signal quality drops by a factor of 1 + DNR, or
10 log10(1 + DNR) dB.     (41)

For example, in the scenario analyzed in Section VII-A, optimum DC-QIM results in an embedding-induced distortion that looks like white noise with variance D_s. With no embedding, one would have had a received host SNR of SNR = sigma_x^2 / sigma_n^2. Due to the additional interference from the embedding-induced distortion, however, the received host SNR drops to SNR / (1 + DNR), a drop of 1 + DNR. Since the capacity in bits per dimension (bits per host-signal sample) is given by (38), and there are two independent host-signal samples per second for every hertz of host-signal bandwidth [28], the capacity in bits per second per hertz (b/s/Hz) is

log2(1 + DNR) b/s/Hz.     (42)

Taking the ratio between (42) and (41), we see that the "value" in embedded rate of each decibel drop in received host-signal quality is

log2(1 + DNR) / [ 10 log10(1 + DNR) ] = (1/10) log2(10) ~ 0.3322 b/s/Hz/dB.     (43)
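The rate-per-decibel figure in (43) is easy to reproduce; the short check below (ours) uses the expressions for (41) and (42) as reconstructed above and confirms that the ratio is independent of the DNR.

import numpy as np

# Ratio of embedded rate (42) to quality drop (41); independent of DNR,
# as (43) asserts.  (Formulas as reconstructed above.)
for dnr in (0.1, 1.0, 10.0):
    rate_bps_hz = np.log2(1 + dnr)            # (42)
    drop_db = 10 * np.log10(1 + dnr)          # (41)
    print("DNR = %5.1f : %.4f b/s/Hz per dB" % (dnr, rate_bps_hz / drop_db))

print("closed form log2(10)/10 =", np.log2(10) / 10)   # ~ 0.3322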
Thus, the available embedded digital rate in bits per second depends only on the bandwidth of the host signal and the tolerable degradation in received host-signal quality, and is approximately 1/3 b/s for every hertz of bandwidth and every decibel drop in received host SNR. It is worth noting that, as developed in [15], [27], these results carry over to the colored host and/or colored channel cases as well. Additional insights into the performance limits of such systems when the digital signal is specifically information for refining the analog signal, as arises in applications involving the upgrading of analog infrastructure, are developed in [20].

2) Coded Digital Host Signals: When the host signal is a coded digital signal, an alternative measure of the received host-signal quality is the capacity of the corresponding host digital channel. For example, in the case of white noise and a white host signal,17 if there were no embedding, the capacity corresponding to the host digital signal power and the channel-noise variance would be

$$\frac{1}{2}\log_2(1+\mathrm{SNR}).$$
Embedding an additional digital signal within the host digital signal drops the host digital capacity to

$$\frac{1}{2}\log_2\!\left(1+\frac{\mathrm{SNR}}{1+\mathrm{DNR}}\right)$$

due to the drop in received host SNR by the factor $1+\mathrm{DNR}$. Unlike in the case of an analog host signal, if one must actually lower the rate of the coded host digital signal as a result of the embedding, then one may have to redesign both the digital encoder that generates this coded digital host signal and the corresponding decoder. Thus, depending on the designed noise margin of the original digital host signal, backward compatibility may or may not be possible. However, even when digital-digital transmission cannot be backward-compatible, using information embedding for simultaneous transmission of two digital signals is potentially attractive from the point of view of complexity and privacy. In particular, the decoder for the host signal need not decode (nor know how to decode) the embedded signal, and vice versa. As discussed further in [27], this is qualitatively different behavior from the superposition coding and successive cancellation decoding one might otherwise use for simultaneous transmission of two digital signals, where one of the receivers needs to decode both messages to receive its own. Interestingly, the information-embedding approach is equally efficient. To see this, we note that the embedded digital channel rate is given by (38), so that the combined rate of the two channels is

$$\frac{1}{2}\log_2(1+\mathrm{DNR}) + \frac{1}{2}\log_2\!\left(1+\frac{\mathrm{SNR}}{1+\mathrm{DNR}}\right) = \frac{1}{2}\log_2(1+\mathrm{DNR}+\mathrm{SNR}).$$
Since the associated total expended power is the host-signal power plus the embedding-induced distortion, we conclude that this digital-over-digital transmission strategy is indeed efficient: the combined rate is as large as the achievable rate using a single digital signal with this same total power.

17As is well known [31], white Gaussian coded signals are capacity-achieving for transmission over AWGN channels, so this is a good model for the host signal in this case.
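The efficiency claim is a simple identity on the AWGN capacity formula; the following sketch (our own check, with arbitrary illustrative powers) confirms that the embedded rate plus the reduced host rate equals the rate of a single signal using the total power:

```python
import math

def awgn_rate(snr):
    return 0.5 * math.log2(1 + snr)

sigma_x2, d1, sigma_n2 = 10.0, 1.0, 0.5            # host power, embedding distortion, noise
snr, dnr = sigma_x2 / sigma_n2, d1 / sigma_n2

embedded = awgn_rate(dnr)                          # (38): embedded digital channel
host     = awgn_rate(snr / (1 + dnr))              # host channel after embedding
combined = embedded + host
single   = awgn_rate((sigma_x2 + d1) / sigma_n2)   # one signal with the total power
print(f"combined = {combined:.6f}   single = {single:.6f}")   # identical
```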
C. Gaps to Capacity
In Section VII-A, we saw that DC-QIM is a capacity-
achieving strategy. In this section, for comparison, we eval-
uate the degree to which specific strategies such as regular
QIM (i.e., without distortion compensation), coded additive
spread-spectrum, uncoded STDM, and uncoded generalized
LBM can each approach capacity--and hence the performance
of DC-QIM--when suitably optimized. We quantify the perfor-
mance of these systems in terms of the additional DNR required
to achieve the same rate as a capacity-achieving system.
1) Regular QIM Gap to Capacity: As we now show, the per-
formance of the best QIM methods without distortion compen-
sation can approach the Gaussian capacity at high rates and is
within 4.3 dB of capacity at low rates, indicating that the QIM
class is large enough to include very good embedding functions
and decoders.
To develop a lower bound on the achievable rate of QIM without distortion compensation, we begin by specializing (39) to the case $\alpha = 1$, resulting in

$$R_{\mathrm{QIM}} \geq \frac{1}{2}\log_2\!\left(\frac{\mathrm{DNR}\,(\mathrm{DNR}+\mathrm{SNR}+1)}{\mathrm{DNR}+\mathrm{SNR}}\right) \tag{44}$$
where to achieve this bound we choose reconstruction points from the pdf implied by (33).18 The right-hand side of (44) is generally not the capacity of QIM, however--i.e., QIM systems can achieve a rate greater than the lower bound (44). Indeed, the right-hand side of (44) actually approaches $-\infty$ in the limit of low DNR.
A tighter lower bound is obtained by developing a different lower bound on the capacity of a particular subclass of QIM methods we refer to as "spread-transform QIM." In spread-transform QIM, which is a generalization of STDM as developed in Section V-B, the length-$N$ host-signal vector is projected onto $N/L$ orthonormal vectors to obtain $N/L$ transformed host-signal samples, which are quantized using QIM. Because projection onto the vectors represents a change of orthonormal basis, the transformed host-signal samples and the transformed noise samples, which are the projections of the original noise vector onto the orthonormal vectors, are still independent, zero-mean, Gaussian random variables with the same variance as the original host signal and noise samples, respectively. However, if the distortion per original host-signal sample is $D_1$, then the distortion per transformed host-signal sample is $LD_1$. Thus, we obtain a "spreading gain" of $L$ in terms of DNR, but the number of bits embedded per original host-signal sample is only $1/L$ times the number of bits embedded per transformed host-signal sample. Thus, one can determine an achievable rate of spread-transform QIM by appropriately modifying (44) to obtain

$$R_{\mathrm{ST\text{-}QIM}} \geq \frac{1}{2L}\log_2\!\left(\frac{L\,\mathrm{DNR}\,(L\,\mathrm{DNR}+\mathrm{SNR}+1)}{L\,\mathrm{DNR}+\mathrm{SNR}}\right). \tag{45}$$

To upper-bound the gap between QIM and capacity we first recognize from (45) that the minimum DNR required for QIM to achieve a rate $R_m$ asymptotically with large $L$ is

$$\mathrm{DNR}_{\mathrm{QIM}} = \frac{2^{2LR_m}}{L}, \tag{46}$$

which is minimized at $L = 1/(2R_m\ln 2)$.19 However, $L$ cannot be less than one, even in the limit of large $N$. Thus, if one sets $L$ to have

$$L = \max\!\left\{\frac{1}{2R_m\ln 2},\ 1\right\}, \tag{47}$$

then (46) remains a valid upper bound on the required DNR for a QIM method to achieve a rate $R_m$. From (38) we see that the minimum DNR required for a capacity-achieving method to achieve a rate $R_m$ is $2^{2R_m}-1$, which when combined with (46) yields the following upper bound on the gap between QIM and the Gaussian capacity:

$$\frac{\mathrm{DNR}_{\mathrm{QIM}}}{\mathrm{DNR}_{\mathrm{cap}}} = \frac{2^{2LR_m}}{L\,(2^{2R_m}-1)}. \tag{48}$$

18The pdf of the reconstruction points $u = s$ in this case is $\mathcal{N}(0, \sigma_x^2 + D_1)$, which is not the same as the well-known rate-distortion optimal pdf [31] for quantizing Gaussian random variables, which is $\mathcal{N}(0, \sigma_x^2 - D_1)$.
This expression is plotted in Fig. 9, where $L$ is given by (47). We now examine the asymptotic limits of (48) at low and high rates. Equation (47) implies $2^{2LR_m} = e$ in the limit of small $R_m$, so in this limit (48) approaches

$$\frac{\mathrm{DNR}_{\mathrm{QIM}}}{\mathrm{DNR}_{\mathrm{cap}}} \to e \quad \text{as } R_m \to 0.$$

Thus, the gap is at most a factor of $e$ (approximately 4.3 dB) in the limit of low rates. In the limit of large $R_m$, (47) implies $L = 1$, so (48) approaches

$$\frac{\mathrm{DNR}_{\mathrm{QIM}}}{\mathrm{DNR}_{\mathrm{cap}}} = \frac{2^{2R_m}}{2^{2R_m}-1} \to 1 \quad \text{as } R_m \to \infty.$$

Thus, QIM asymptotically achieves capacity at high embedding rates. As we described in Section VII-B, in hybrid transmission applications one may be concerned about the degradation to the received host signal, which is $1+\mathrm{DNR}$ rather than DNR.
19Note that since $N/(N/L + 0.5) \leq N/\mathrm{round}(N/L) \leq N/(N/L - 0.5)$, one can, indeed, approach this optimum spreading gain $L$ in the limit of large $N$ even though $N/L$ must be a positive integer less than or equal to $N$.
Fig. 9. DNR gap between spread-transform QIM and Gaussian capacity (achieved by DC-QIM). The maximum gap is a factor of e (approximately 4.3 dB).

Fig. 10. Received host SNR gap (1 + DNR) between spread-transform QIM and capacity (achieved by DC-QIM).
The gap in DNR (48) is larger than the corresponding gap in received host SNR, which is bounded by

$$\frac{1+\mathrm{DNR}_{\mathrm{QIM}}}{1+\mathrm{DNR}_{\mathrm{cap}}}.$$

This gap is plotted in Fig. 10 as a function of the rate in b/s/Hz. Again, $L$ is given by (47), since minimizing $\mathrm{DNR}_{\mathrm{QIM}}$ also minimizes $1+\mathrm{DNR}_{\mathrm{QIM}}$. Thus, for example, at the (near) worst case digital rate of 1 b/s/Hz, using QIM requires at most 1.6 dB more drop in analog channel quality than the approximately 3-dB drop required for DC-QIM (Section VII-B1).
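Using (46)-(48) in the form given above, the following sketch evaluates both the DNR gap of Fig. 9 and the received-host-SNR gap of Fig. 10; it reproduces the roughly 4.3-dB low-rate limit and the roughly 1.6-dB figure quoted for 1 b/s/Hz. The conversion from b/s/Hz to bits per dimension assumes two samples per second per hertz, as in Section VII-B1; this is a numerical check under those assumptions, not the paper's code.

```python
import math

def dnr_cap(rate):                 # minimum DNR for a capacity-achieving method
    return 2 ** (2 * rate) - 1

def dnr_qim(rate):                 # upper bound (46) with L chosen as in (47)
    L = max(1.0 / (2 * rate * math.log(2)), 1.0)
    return 2 ** (2 * L * rate) / L

for r_bps_hz in (0.01, 0.1, 1.0, 4.0):
    r = r_bps_hz / 2               # b/s/Hz -> bits per dimension
    gap_dnr = 10 * math.log10(dnr_qim(r) / dnr_cap(r))
    gap_snr = 10 * math.log10((1 + dnr_qim(r)) / (1 + dnr_cap(r)))
    print(f"{r_bps_hz:5.2f} b/s/Hz: DNR gap {gap_dnr:4.2f} dB (<= 4.34), "
          f"host-SNR gap {gap_snr:4.2f} dB")
```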
2) Uncoded STDM Gap to Capacity: The results above can
be compared to the achievable performance of uncoded binary
STDM with uniform scalar quantization as a minimal-com-
plexity realization of QIM.
The gap between uncoded STDM and capacity can easily
be quantified for low rates $R_m \ll 1$, which are typical in many applications, at a given probability of error. A straightforward union bound on the bit-error probability of uncoded binary STDM with uniform scalar quantization is (see Fig. 6)

$$P_b \leq 2Q\!\left(\frac{d_{\min}}{2\sigma_n}\right).$$

This bound is reasonably tight for low error probabilities, and from (26) we can write this probability of error in terms of the rate-normalized distortion-to-noise ratio $\mathrm{DNR}_{\mathrm{norm}} \triangleq \mathrm{DNR}/R_m$ as

$$P_b \leq 2Q\!\left(\sqrt{\tfrac{3}{4}\,\mathrm{DNR}_{\mathrm{norm}}}\right). \tag{49}$$

A capacity-achieving method can achieve arbitrarily low probability of error as long as $R_m < C$, which using (38) can be expressed as $\mathrm{DNR} > 2^{2R_m}-1$. For low embedding rates, the minimum required $\mathrm{DNR}_{\mathrm{norm}}$ for arbitrarily low probability of error is, therefore,

$$\mathrm{DNR}_{\mathrm{norm}} > 2\ln 2 \approx 1.4\ \mathrm{dB}. \tag{50}$$

Fig. 11. Uncoded STDM gap to Gaussian capacity. The solid curve shows the bit-error probability for uncoded STDM as a function of rate-normalized distortion-to-noise ratio (DNR_norm). The dashed curve is the minimum required DNR_norm for reliable information embedding for any embedding method.

The probability of error of STDM is plotted as a function of DNR_norm in Fig. 11. The required DNR_norm for a given probability of error can be compared to (50) to determine the gap to capacity. For example, at an error probability of 10^{-6}, uncoded STDM is about 13.6 dB from capacity. One can reduce this gap by at least 9.3 dB through channel coding, vector quantization, and nondithered quantization. The remaining gap (at most 4.3 dB) is the gap between QIM and capacity and can be closed with distortion compensation. As shown in [15], [27], it is fairly easy to close the gap between uncoded STDM (with uniform scalar quantizers) and capacity by about 6 dB using practical channel codes and distortion compensation.

3) Uncoded Spread LBM Gap to Capacity: The gap to capacity for uncoded binary spread LBM based on uniform, scalar
quantization also follows readily from the results of Appendix A, which shows that the distortion-normalized minimum distance for this form of spread LBM is a factor of 7/4 (about 2.43 dB) worse than that of STDM (26). Thus, the LBM counterpart to (49) is that the bit-error probability of uncoded spread LBM is

$$P_b \leq 2Q\!\left(\sqrt{\tfrac{3}{7}\,\mathrm{DNR}_{\mathrm{norm}}}\right). \tag{51}$$

Thus, the gap to capacity of uncoded binary spread LBM at an error probability of $10^{-6}$ is about 16 dB, 2.4 dB more than the 13.6-dB gap of uncoded binary STDM. Furthermore, as also discussed in Appendix A, for $M$-ary implementations the gap widens by an additional 0.6 dB as $M$ grows large.
4) Coded Additive Spread-Spectrum Gap to Capacity: For additive spread spectrum, where the composite signal is the host signal plus a message-dependent perturbation, the distortion signal in Fig. 2 is not a function of the host signal. The distortion constraint is still that the per-sample embedding distortion not exceed $D_1$, so that in the Gaussian case considered here, the achievable rate of an additive spread-spectrum method is the well-known [31] Gaussian channel capacity, treating both the host signal and the channel noise as interference sources20
$$R_{\mathrm{SS}} = \frac{1}{2}\log_2\!\left(1+\frac{\mathrm{DNR}}{1+\mathrm{SNR}}\right) \tag{52}$$

where, again, SNR is the ratio between the host-signal variance and the channel-noise variance. Comparing (52) to (38), we see that the gap to capacity of additive spread spectrum is

$$\frac{\mathrm{DNR}_{\mathrm{SS}}}{\mathrm{DNR}_{\mathrm{cap}}} = 1 + \mathrm{SNR}, \tag{53}$$

which is typically large, since SNR must be large so that channel noise will not excessively degrade signal quality. In fact, in the high signal-to-distortion ratio (SDR) limit where $\mathrm{SDR} = \sigma_x^2/D_1 \to \infty$, the achievable rate of additive spread-spectrum (52) clearly approaches zero, again reflecting the inability of additive spread-spectrum methods to reject host-signal interference like other methods.

At the opposite extreme, when SNR → 0 the host interference is small so the gap (53) disappears, and, indeed, additive spread spectrum is an optimum embedding strategy for this case, along with both DC-QIM and QIM as discussed at the end of Section VII-A.

The other scenario in which additive spread spectrum can be optimal is when the host is known at the decoder, which also corresponds to a noninterfering host situation.

20This rate is also the capacity when n is non-Gaussian, but still independent of s, and a correlation detector is used for decoding [35].
5) Known-Host Case: As discussed at the end of Section VII-A, both capacity-achieving QIM and capacity-achieving additive spread-spectrum methods exist when the host signal is known at the decoder. Although QIM realizations in the form of coded dither modulation with uniform, scalar quantization are not optimal in this case, for AWGN channels one can achieve performance within about 1.53 dB of capacity, as we show below. We consider the case of dither signals with a uniform distribution over the interval $[-\Delta/2, \Delta/2)$. In this case, the quantization error is uniformly distributed over the interval $[-\Delta/2, \Delta/2)$ and statistically independent of the host signal [29]. Thus, the achievable rate is slightly lower than the case where the embedding-induced distortion is Gaussian. The entropy power inequality can be used to show that the decrease in achievable rate is bounded by [36]

$$\frac{1}{2}\log_2\!\left(\frac{1+\mathrm{DNR}}{1+\frac{6}{\pi e}\,\mathrm{DNR}}\right). \tag{54}$$

This gap approaches the upper limit of $\frac{1}{2}\log_2\frac{\pi e}{6} \approx 0.2546$ b/dimension as the DNR gets large. For any finite DNR, the gap is smaller. By subtracting the upper bound on the gap (54) from the capacity (38), one obtains a lower bound on the achievable rate of this type of dither modulation

$$\frac{1}{2}\log_2\!\left(1+\frac{6}{\pi e}\,\mathrm{DNR}\right). \tag{55}$$

Thus, dither modulation with uniform scalar quantization in this case is at most a factor of $\pi e/6$, or about 1.53 dB, from capacity.
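A quick numerical check of (54) and (55) as given above: the rate loss from uniform (rather than Gaussian) embedding-induced distortion is largest at high DNR, where it saturates at (1/2) log2(pi e/6) bits per dimension, i.e., about 1.53 dB in DNR. The snippet is our own check under these expressions.

```python
import math

def capacity(dnr):                       # (38)
    return 0.5 * math.log2(1 + dnr)

def uniform_dither_lower_bound(dnr):     # (55)
    return 0.5 * math.log2(1 + (6 / (math.pi * math.e)) * dnr)

for dnr_db in (0, 10, 20, 40):
    dnr = 10 ** (dnr_db / 10)
    gap_bits = capacity(dnr) - uniform_dither_lower_bound(dnr)   # bounded by (54)
    print(f"DNR = {dnr_db:2d} dB: rate gap <= {gap_bits:.4f} b/dim")

print("asymptotic gap:", 0.5 * math.log2(math.pi * math.e / 6), "b/dim")       # 0.2546
print("equivalent DNR penalty:", 10 * math.log10(math.pi * math.e / 6), "dB")  # 1.53
```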
VIII. INTENTIONAL ATTACKS We now turn our attention from AWGN channel models for unintentional attacks, to some alternative models for intentional attacks. Intentional, distortion-constrained attacks may be encountered in copyright, authentication, and covert communication applications. In these kinds of applications, attackers generally attempt to remove or alter the embedded information, and face a distortion constraint on their signal manipulations so that the integrity of the host signal is not compromised. An attacker's ability to prevent reliable watermark decoding depends on the amount of knowledge that the attacker has about the embedding and decoding processes. To limit such knowledge, some digital watermarking systems use keys, parameters that allow appropriate parties to embed and/or decode the embedded signal. The locations of the modulated bits and the pseudo-noise vectors in an additive spread-spectrum and generalized LBM systems are examples of keys. If only certain parties privately share the keys to both embed and decode information, and no one else can do either of these two functions, then the watermarking system is a private-key system. Alternatively, if some parties possess keys that allow them to either embed or decode, but not both, then the system is a public-key system since these keys can be made available to the public for use in one of these two functions without allowing the public to perform the other function. However, in some scenarios it may be desirable to allow everyone to embed and decode watermarks without the use of keys. For example, in a copyright ownership notification system, everyone could embed the ASCII representation of a copyright notice such as, "Property of ..." in their copyrightable works. Such a system is analogous to the system currently used to place copyright notices in (hardcopies of) books, a system in which there is no
need for a central authority to store, register, or maintain separate keys--there are none--or watermarks--all watermarks are English messages--for each user. The widespread use of such a universally accessible "no-key" system requires only standardization of the decoder so that everyone will agree on the decoded watermark, and hence, the owner of the copyright. We analyze both private-key and no-key systems in the sequel, and establish the attractiveness of QIM in both cases. A. Attacks on Private-Key Systems Although the attacker does not know the key in a private-key scenario, he or she may know the basic algorithm used to embed the watermark. In [16], Moulin and O'Sullivan model such a scenario by assuming that the attacker knows the codebook distribution, but not the actual codebook. As we now develop, exploiting results of Moulin and O'Sullivan in this private-key scenario, we determine that DC-QIM methods are optimal (capacity-achieving) against squared-error distortion-constrained attackers. Moulin and O'Sullivan have derived both the capacityachieving distribution and an explicit expression for the capacity (32) in the case where the host is white and Gaussian and the attacker faces an expected perturbation energy constraint . In this case, the capacity is [16] DNR
SNR SNR
DNR DNR
where DNR
is the distortion-to-perturbation
ratio and SNR
is the host-signal-to-perturba-
tion ratio. The maximizing distribution is such that [16]
with
statistically independent of and
DNR (56) DNR
Since this distribution satisfies the condition (33), we can infer from our analysis in Section VI-A2 that DC-QIM can be used to achieve capacity against these attacks. Moreover, (56) gives the optimal distortion-compensation parameter. Moulin and O'Sullivan have also considered the case of host signals that are not necessarily Gaussian but that have zero mean, finite variance, as well as bounded and continuous pdfs. In the limit of small (high SDR) and , a limit of interest in high-fidelity applications, the capacity approaches
the capacity approaches a limiting expression that depends only on DNR [16], and the capacity-achieving distribution is again of the distortion-compensated form, with a component that is statistically independent of the host signal [16]. Since this distribution satisfies the condition (33), we can again conclude that distortion-compensated QIM can achieve capacity in this high-fidelity limit. The capacity-achieving distortion-compensation parameter is again an explicit function of DNR [16].
B. Attacks on No-Key Systems In contrast to the scenario above, in no-key systems an attacker has full knowledge of the embedding and decoding processes, including all codebooks. For this case, some deterministic models we develop in this section are better for characterizing the associated worst case in-the-clear (i.e., fully informed) attacks. With these models, we show that QIM methods in general, and dither modulation in particular, are robust and achieve provably better rate distortion­robustness tradeoffs than both additive spread-spectrum and generalized LBM techniques. We consider two models for such attackers: 1) a bounded perturbation channel model in which the squared-error distortion between the channel input and channel output is bounded and 2) a bounded host-distortion channel model in which the squared-error distortion between the host signal and channel output is bounded. In each case, we develop conditions under which error-free decoding is possible with various implementations of QIM and DC-QIM, and quantify their advantages over the corresponding realizations of additive spread spectrum and generalized LBM. 1) Bounded Perturbation Channel: The bounded perturbation channel is one in which the attacker can perturb the composite signal in any way it desires (based on its full knowledge of the composite signal and the embedding algorithm), provided the energy in the perturbation vector does not exceed a prescribed level, i.e., (16), which reflects a requirement that the attacker not excessively degrade the original composite signal. Thus, this channel model imposes only a maximum distortion21 or minimum SNR constraint between the channel input and output. Binary dither modulation with uniform scalar quantization: One can combine the guaranteed error-free decoding condition (17) for a minimum-distance decoder (15) with the distortion-normalized minimum distance (26) of binary dither modulation with uniform scalar quantization to compactly express its achievable performance as
(57), or, equivalently, its achievable rate as (58). One can view the achievable rate (58) as the deterministic counterpart to the more conventional notions of achievable rates and capacities of random channels discussed in Sections VI and VII.

21Some types of distortion, such as geometric distortions, can be large in terms of squared error, yet still be small perceptually. However, in some cases, these distortions can be mitigated either by preprocessing at the decoder or by embedding information in parameters of the host signal that are less affected (in terms of squared error) by these distortions. For example, a simple delay or shift may cause large squared error, but the magnitudes of the discrete Fourier transform coefficients are relatively unaffected.
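As a concrete illustration of the minimum-distance robustness being invoked here, the sketch below embeds one bit per host sample with binary dither modulation and uniform scalar quantization, and verifies that any perturbation smaller than a quarter of the quantization step leaves the minimum-distance decoder's output unchanged. The sample-wise, uncoded construction and all names are our own simplification, not the paper's (57)-(58) analysis.

```python
import random

def dm_embed(x, bit, delta):
    """Binary dither modulation: two uniform quantizers offset by delta/2."""
    dither = 0.0 if bit == 0 else delta / 2.0
    return round((x - dither) / delta) * delta + dither

def dm_decode(y, delta):
    """Pick the coset (bit value) whose reconstruction points are closer to y."""
    d0 = abs(y - round(y / delta) * delta)
    d1 = abs(y - dm_embed(y, 1, delta))
    return 0 if d0 <= d1 else 1

random.seed(1)
delta, ok = 1.0, True
for _ in range(1000):
    bit = random.randint(0, 1)
    x = random.uniform(-50, 50)                     # host sample
    s = dm_embed(x, bit, delta)
    p = random.uniform(-0.99, 0.99) * delta / 4     # bounded perturbation, |p| < delta/4
    ok &= (dm_decode(s + p, delta) == bit)
print("all bits survived bounded perturbations:", ok)
```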
Additive spread spectrum: The nonzero minimum distance of QIM methods offers quantifiable robustness to perturbations, even when the host signal is not known at the decoder. In contrast, additive spread-spectrum methods offer relatively little robustness if the host signal is not known at the decoder. As discussed in Section III, these methods have linear embedding functions of the form
$$\mathbf{s}(\mathbf{x}, m) = \mathbf{x} + \mathbf{w}(m) \tag{59}$$

where $\mathbf{w}(m)$ is a pseudo-noise vector. From the definition of minimum distance (13), two different messages can produce the same composite signal for suitably chosen host signals, i.e., the minimum distance is zero.

TABLE I
ATTACKER'S DISTORTION PENALTIES. THE DISTORTION PENALTY IS THE ADDITIONAL DISTORTION THAT AN ATTACKER MUST INCUR TO SUCCESSFULLY REMOVE A WATERMARK. A DISTORTION PENALTY LESS THAN 0 dB INDICATES THAT THE ATTACKER CAN ACTUALLY IMPROVE THE SIGNAL QUALITY AND REMOVE THE WATERMARK SIMULTANEOUSLY
Thus, although these methods may be effective when the
host signal is known at the decoder, when the host signal is not
known, they offer no guaranteed robustness to perturbations,
i.e., no achievable rate expression analogous to (58) exists for
additive spread spectrum. As is evident from (59), in an additive
spread-spectrum system, is an additive interference, which
is often much larger than due to the distortion constraint.
In contrast, the quantization that occurs with QIM provides
immunity against this host-signal interference, as discussed in
Section IV.22
Generalized LBM: As shown in Appendix A, the distor-
tion-normalized minimum distance of generalized binary LBM
with uniform scalar quantization is about 2.43 dB worse than
that of the corresponding dither-modulation strategy. Therefore,
its achievable rate-distortion-robustness performance is also
about 2.43 dB worse than (57). Again, as also developed in the
appendix, for -ary implementations, the gap grows to 3 dB
for large .
2) Bounded Host-Distortion Channel: As an alternative to
the bounded perturbation channel, some attackers may work
with distortion constraint between the channel output and the
host signal, rather than the channel input, since this distortion
is the most direct measure of degradation to the host signal. For
example, if attackers have partial knowledge of the host signal,
which may be in the form of a probability distribution, so that
they can calculate this distortion, then it may be appropriate to
bound the expected distortion between the channel output and the host signal, where this expectation is taken over the conditional probability density of the host signal given the composite signal.23
We refer to this as the bounded host-distortion channel.
For this channel, we measure robustness to attacks by the minimum expected distortion $D_a$ for a successful attack, where the expectation is taken with respect to this same conditional density. The ratio between $D_a$ and the expected embedding-induced distortion $D_1$ is the
distortion penalty that the attacker must pay to remove the wa-
termark and, hence, is a figure of merit measuring the robust-
ness­distortion tradeoff at a given rate. Distortion penalties for
the primary methods of interest are derived below and summa-
rized in Table I for the high SDR regime of primary interest.
As this table reflects, among these methods considered, only
QIM methods (including binary dither modulation with uniform
scalar quantization) are robust enough such that the attacker
must degrade the host-signal quality to remove the watermark.
Regular QIM: We first consider the robustness of regular
QIM. For any distortion measure, as long as each reconstruc-
tion point lies at the minimum-distortion point of its respective
quantization cell, the QIM distortion penalty is greater than or
equal to since any output that an attacker generates must nec-
essarily lie away from this minimum-distortion point. Equality
occurs only if each quantization cell has at least two minimum-
distortion points, one of which lies in the incorrect decoder de-
cision region. For expected squared-error distortion, the min-
imum-distortion point of each quantization cell is its centroid,
and one can express this distortion penalty in terms of the dis-
tortion-normalized minimum distance and the signal length $N$,
as we show below.
We use $\mathcal{X}(\mathbf{s})$ to denote the quantization cell containing the composite signal $\mathbf{s}$ and $p_{\mathbf{x}|\mathbf{s}}$ to denote the conditional pdf of the host signal $\mathbf{x}$ given that $\mathbf{x} \in \mathcal{X}(\mathbf{s})$. Again, for sufficiently small quantization cells, this pdf can often be approximated as uniform over $\mathcal{X}(\mathbf{s})$, for example. Since $\mathbf{s}$ is the centroid of $\mathcal{X}(\mathbf{s})$,

$$\mathbf{s} = \int_{\mathcal{X}(\mathbf{s})} \mathbf{x}\, p_{\mathbf{x}|\mathbf{s}}(\mathbf{x})\, d\mathbf{x}. \tag{60}$$
22Another way to understand this host-signal interference rejection is to con-
sider, for example, that a quantized random variable has finite entropy while a
continuous random variable has infinite entropy.
23Note that if the attacker has full knowledge of the host signal, he or she can trivially remove the embedded information by setting the channel output equal to the host signal, so that $D_a = 0$. We restrict our attention to the more realistic scenario in which an attacker has only partial knowledge of the host, in the form of a conditional pdf.
Also, the expected squared error per letter embedding-induced distortion given $\mathbf{s}$ is

$$D_1(\mathbf{s}) = \frac{1}{N}\int_{\mathcal{X}(\mathbf{s})} \|\mathbf{s}-\mathbf{x}\|^2\, p_{\mathbf{x}|\mathbf{s}}(\mathbf{x})\, d\mathbf{x}. \tag{61}$$
The most general attack can always be represented as an additive perturbation of the composite signal, where the perturbation may be a function of the composite signal. The resulting per-letter distortion to the host, given the composite signal, is

$$D_a(\mathbf{s}) = \frac{1}{N}E\big[\|\mathbf{y}-\mathbf{x}\|^2\,\big|\,\mathbf{s}\big] = \frac{1}{N}\|\mathbf{y}-\mathbf{s}\|^2 + D_1(\mathbf{s}),$$

where we have used (61), the fact that $p_{\mathbf{x}|\mathbf{s}}$ is a pdf and, thus, integrates to one, and (60) to obtain the last expression. For a successful attack, the channel output must lie in an incorrect decoder decision region, so $\|\mathbf{y}-\mathbf{s}\|^2 \geq d_{\min}^2/4$ and, hence,

$$D_a(\mathbf{s}) \geq D_1(\mathbf{s}) + \frac{d_{\min}^2}{4N}.$$

Averaging both sides of this expression over all quantization cells yields $D_a \geq D_1 + d_{\min}^2/(4N)$, so that our figure of merit $D_a/D_1$ for QIM methods is

$$\frac{D_a}{D_1} \geq 1 + \frac{d_{\min}^2}{4ND_1}. \tag{62}$$
Thus, for any QIM method of nonzero distortion-normalized minimum distance, the attacker's distortion penalty is always greater than unity (0 dB), indicating that to remove the watermark, the attacker must degrade the host-signal quality beyond the initial distortion caused by the embedding of the watermark.

Binary dither modulation with uniform, scalar quantization: In this case, (26) gives the distortion-normalized minimum distance in (62). Moreover, due to the uniformity of the quantizers, the bound (62) is met with equality, so that the attacker's distortion penalty specializes to

$$\frac{D_a}{D_1} = 1 + \frac{d_{\min}^2}{4ND_1}, \tag{63}$$

with the minimum distance that of binary dither modulation (26). Because the Hamming distance of a block code cannot exceed the number of coded bits, the definition (23) of the distortion-normalized minimum distance implies that an upper bound for the distortion penalty (63) in this case is a factor of 7/4, or about 2.43 dB. Although this penalty may seem modest, it is larger than that obtainable by either additive spread spectrum or generalized LBM, as we show below. Larger distortion penalties are not possible because in-the-clear attackers can concentrate all their distortion in the minimum-distance direction in $N$-dimensional space. As a final note, (63) implies that binary dither modulation with uniform, scalar quantization can defeat any attacker whose allowed expected distortion falls below this penalty times the embedding-induced distortion, an expression whose counterpart for the bounded perturbation channel was (57). Thus, the corresponding achievable rates are given by the analog of (58).
Distortion-compensated QIM: An in-the-clear attacker of a DC-QIM system knows the quantizers and can determine the watermark after observing the composite signal. If the quantization cells are contiguous so that the distortion-compensation term in (19) does not move the composite signal out of the cell containing the host signal, then an attacker can recover the original host signal with the following attack:

$$\mathbf{y} = \mathbf{q} + \frac{\mathbf{s}-\mathbf{q}}{1-\alpha} = \mathbf{x},$$

where $\mathbf{q}$ denotes the reconstruction point of the quantization cell containing $\mathbf{s}$ (and $\mathbf{x}$), and where the equality follows simply by inverting (19). Thus, the attacker's distortion penalty $D_a/D_1$ is $-\infty$ decibels. We see that although DC-QIM is optimal against both independent additive Gaussian noise attacks and squared-error-distortion-constrained attacks in private-key scenarios, it is in some sense "maximally suboptimal" against in-the-clear attacks. Regular QIM, on the other hand, is almost as good as DC-QIM against additive Gaussian noise attacks (Section VII) and also resistant to in-the-clear attacks as discussed above. Thus, regular QIM methods may offer an attractive compromise when one requires resistance to both intentional attacks and unintentional attacks in a no-key system.
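The host-recovery attack just described is easy to reproduce with the scalar DC-QIM embedder sketched earlier: the attacker requantizes the composite sample (using the decoded message's coset) to find the cell's reconstruction point and then rescales the residual by 1/(1 - alpha). This is an illustrative construction with our own names, assuming the attacker has already recovered the embedded bit.

```python
import random

def dc_qim_embed(x, bit, delta, alpha):
    step = delta / alpha
    dither = 0.0 if bit == 0 else step / 2.0
    q = round((x - dither) / step) * step + dither
    return q + (1.0 - alpha) * (x - q)

random.seed(3)
delta, alpha = 1.0, 0.6
for bit in (0, 1, 0):
    x = random.gauss(0.0, 20.0)
    s = dc_qim_embed(x, bit, delta, alpha)
    # informed attack: requantize s with the (decoded) bit's dither to find the
    # cell's reconstruction point, then undo the distortion compensation
    step = delta / alpha
    dither = 0.0 if bit == 0 else step / 2.0
    q_hat = round((s - dither) / step) * step + dither
    x_hat = q_hat + (s - q_hat) / (1.0 - alpha)
    print(f"x = {x:+.6f}   recovered = {x_hat:+.6f}   error = {abs(x - x_hat):.1e}")
```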
Additive spread spectrum: Since the embedding function of an additive spread-spectrum system is (2), the resulting distortion is simply the power of the added pseudo-noise vector. An attacker with full knowledge of the embedding and decoding processes can decode the message and, hence, reproduce the corresponding pseudo-noise vector. Therefore, the attacker can completely remove the watermark by subtracting this vector from the composite signal to recover the original host signal exactly. Hence, the resulting distortion penalty is $-\infty$ decibels.
Because the additive spread-spectrum embedding function
combines the host signal and watermark
in a simple
linear way, anyone that can extract the watermark, can easily re-
move it. Thus, these methods are not suitable for universally ac-
cessible no-key digital watermarking applications. By contrast,
the advantage of QIM is that it effectively hides the host signal
even when the embedded information is known.
Generalized LBM: Recall that the embedding function of a generalized LBM system can be written as (8) with a quantizer having the property (9). Good generalized LBM systems also have the property that the reconstruction points of this quantizer are at the centroids of the quantization cells, as we shall assume. One attack that completely removes information about the embedded message is to output these reconstruction points. Since each reconstruction point is at a minimum-distortion point of its quantization cell, $D_a/D_1 \leq 1$, i.e., at most 0 dB, with equality only if both the composite signal and the attacker's output are minimum-distortion points. Thus, an attacker can remove the watermark without causing additional distortion to the host signal. This
result applies regardless of whether error-correction coding is used. Thus, in contrast to dither modulation, error-correction coding does not improve LBM in this context. When, in addition, it is the least significant bit of a uniform, scalar quantizer that is modulated, the results in Appendix A imply further that the embedding-induced distortion is 7/4 times the coarse quantization error, while the attack above incurs only the coarse quantization error. Thus, $D_a/D_1 = 4/7$, i.e., about $-2.43$ dB. Again, when many less significant bits are modulated, the results of the appendix can be used to establish that the penalty worsens to $-3$ dB.
IX. CONCLUDING REMARKS We have seen that QIM methods are provably better than additive spread spectrum and generalized LBM against bounded perturbation and in-the-clear attacks and are near-optimal for Gaussian channels, for which DC-QIM is optimal. Furthermore, dither modulation is a practical implementation of QIM that exhibits many of the attractive performance properties of QIM. The convenient structure of dither modulation, which is easily combined with error-correction coding, allows the system designer to achieve different rate distortion­robustness tradeoffs by tuning parameters such as the quantization step size. Also, one can conveniently upgrade previously developed additive spread-spectrum and spread LBM systems to spread-transform dither-modulation systems by replacing the respective addition and quantize-and-replace steps with a dithered quantization step. In the course of our investigation, a number of rather intriguing results have emerged. For example, the information-embedding capacity in the Gaussian case does not depend at all on whether the host signal is available during decoding, and DC-QIM is optimal in both scenarios, and achieves perfect rejection of host-signal interference, even in the high-SDR regime. Also somewhat surprisingly, the optimal embedding strategy for Gaussian channels and for typical attacks in private-key systems, DC-QIM is "maximally suboptimal" against in-the-clear attacks. On the other hand, regular QIM, which has performance within 4.3 dB of DC-QIM in the Gaussian case, performs better than any other currently known method against in-the-clear attacks, which arise in copyright notification applications where no-key architectures are used, for example. In particular, unlike additive spread-spectrum and generalized LBM methods, QIM and dither-modulation methods force an attacker to pay a distortion penalty. Thus, QIM emerges as a universally good embedding strategy against a wide variety of intentional and unintentional attacks. For hybrid transmission strategies, using DC-QIM for digitalover-analog transmission (in, for example, digital audio broadcasting applications) allows embedding rates of about 1/3 b/s/Hz for every decibel drop in analog signal quality. In digital-overdigital transmission (in broadcast applications, for example),
DC-QIM is as efficient as any single digital transmission, and thus as good as the alternative superposition coding and successive cancellation-decoding approach. Many important directions for further research remain. At one end of the spectrum, further insights into the fundamental principles and structure of information embedding and digital watermarking systems will come from the development of still better general attack models. Those emerging from game-theoretic formulations and arbitrarily varying channel models appear to be an important starting point in this respect. At the same time, many of the results in this paper have important implications for practical applications, and the most effective implementations of QIM and DC-QIM embedding systems for these applications will take into account, in detail, the specific types of geometric distortions and other attacks that typically arise. For example, in image watermarking applications, embedders and decoders ultimately need to be robust to a wide range of often surprisingly challenging attacks, ranging from scaling and rotation, to cropping and column replacement. A great deal of future work is needed in this area to enable the use of QIM techniques in watermarking applications, and indeed these represent some especially interesting design challenges. APPENDIX A LBM DISTORTION-NORMALIZED MINIMUM DISTANCE In this appendix we calculate the distortion-normalized minimum distance of binary LBM with uniform, scalar quantization. We assume that the host signal and embedded signal are statistically independent. Since the embedding function of any good generalized LBM method can be written as (8) with (10), the expected distortion is
$$D_1 = E\big[(q(\mathbf{x})-\mathbf{x})^2\big] + E\big[a^2\big], \tag{64}$$

i.e., the sum of the coarse quantization error and the expected squared value of the message-dependent least-significant-bit adjustment $a$, where we have used (10) and the independence of the host signal and the embedded signal to obtain this form.
We analyze coded binary LBM with uniform scalar quantization, an LBM system in which each bit in a sequence of coded bits is repeated and embedded in a block of host-signal samples using a sequence of uniform, scalar quantizers. The embedding is accomplished by modulating the least significant bit of each quantizer. Each uniform, scalar quantizer is illustrated in Fig. 12. The coarse quantizer has a step size of $\Delta$, and the least-significant-bit adjustment element equals $\pm\Delta/4$.

Comparing this scheme to coded binary dither modulation with uniform scalar quantization as described in Section V-A, we see that this scheme has the same minimum distance, i.e., (22). Restricting attention to the high SDR regime in which the host signal can be modeled as uniformly distributed within each cell of the coarse quantizer,
Fig. 12. Low-bit modulation with a uniform, scalar quantizer. The quantizer has a step size of Δ/2, and the least significant bit (lsb) is modulated. All reconstruction points marked with one marker have an lsb of 0; points marked with the other have an lsb of 1. This process is equivalent to first quantizing using a quantizer with a step size of Δ and adding ±Δ/4.
as was used to develop (24) in Section V-A, the first term in (64) is

$$E\big[(q(\mathbf{x})-\mathbf{x})^2\big] = \frac{\Delta^2}{12}, \tag{65}$$

the same as the expected distortion (25) of the corresponding dither modulation system. The second term in (64), however, is

$$E\big[a^2\big] = \left(\frac{\Delta}{4}\right)^2 = \frac{\Delta^2}{16}. \tag{66}$$

Thus, the overall expected distortion is $D_1 = \Delta^2/12 + \Delta^2/16 = 7\Delta^2/48$, and the distortion-normalized squared minimum distance is a factor of 7/4 smaller than that of the corresponding dither-modulation system. By comparing with (26), we see that binary coded LBM with uniform scalar quantization is worse than the corresponding dither modulation system by

$$10\log_{10}\frac{7}{4} \approx 2.43\ \mathrm{dB}. \tag{67}$$
Also, note that the result (67) is invariant to the actual distribution of the dither values, and invariant to any preprocessing of the host signal by a unitary transformation. Thus, the gap between STDM and spread LBM is also given by (67).
In other variants of LBM, the gap can be worse. For instance, in the case of $M$-ary coded implementations of dither modulation and LBM based on uniform scalar quantization where the $M$ sets of reconstruction points together form a regular lattice, the minimum distances of the two schemes remain equal (but generally different from the binary case), and the first term in (64) remains (65). However, as $M$ gets large, the adjustment becomes effectively uniformly distributed over a full quantizer step, so the second term in (64) changes from (66) to the same as (65). Thus, the gap (67) grows to a factor of 2 (3 dB) in this large-$M$ limit.

ACKNOWLEDGMENT

The authors thank Prof. A. Lapidoth for calling their attention to the paper by Costa, and an anonymous reviewer for helpful comments that improved the clarity of the paper.
REFERENCES [1] M. D. Swanson, M. Kobayashi, and A. H. Tewfik, "Multimedia dataembedding and watermarking technologies," Proc. IEEE, vol. 86, pp. 1064­1087, June 1998. [2] I. J. Cox and J.-P. M. G. Linnartz, "Some general methods for tampering with watermarks," IEEE J. Select. Areas Commun., vol. 16, pp. 587­593, May 1998. [3] J.-P. Linnartz, T. Kalker, and J. Haitsma, "Detecting electronic watermarks in digital video," in Proc. 1999 IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 4, Phoenix, AZ, Mar. 1999, pp. 2071­2074. [4] G. Arce, C. G. Boncelet Jr., R. F. Graveman, and L. M. Marvel, "Applications of Information Hiding," in Proc. 3rd Annu. Federated Laboratory Symp. Advanced Telecommunications & Information Distribution Research Program, College Park, MD, Feb. 1999, pp. 423­427. [5] H. C. Papadopoulos and C.-E. W. Sundberg, "Simultaneous broadcasting of analog FM and digital audio signals by means of adaptive precanceling techniques," IEEE Trans. Commun., vol. 46, pp. 1233­1242, Sept. 1998. [6] B. Chen and C.-E. W. Sundberg, "Broadcasting data in the FM band by means of adaptive contiguous band insertion and precancelling techniques," in Proc. 1999 IEEE Int. Conf. Communications, vol. 2, Vancouver, BC, Canada, June 1999, pp. 823­827. [7] M. D. Swanson, B. Zhu, and A. H. Tewfik, "Data hiding for video-invideo," in Proc. 1997 IEEE Int. Conf. image processing, vol. 2, Piscataway, NJ, 1997, pp. 676­679. [8] J. M. Barton, "Method and apparatus for embedding authentication information within digital data," United States Patent 5 646 997, July 1997. [9] K. Tanaka, Y. Nakamura, and K. Matsui, "Embedding secret information into a dithered multi-level image," in Proc. 1990 IEEE Military Communications Conf., 1990, pp. 216­220. [10] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, "Techniques for data hiding," IBM Syst. J., vol. 35, no. 3­4, pp. 313­336, 1996. [11] I. J. Cox, J. Killian, T. Leighton, and T. Shamoon, "A secure, robust watermark for multimedia," in Information Hiding. 1st Int. Workshop Proc. , June 1996, pp. 185­206. [12] J. R. Smith and B. O. Comiskey, "Modulation and information hiding in images," in Information Hiding. 1st Int. Workshop Proc., June 1996, pp. 207­226. [13] J. R. Hernandez, F. Perez-Gonzalez, J. M. Rodriguez, and G. Nieto, "Performance analysis of a 2-D-multipulse amplitude modulation scheme for data hiding and watermarking of still images," IEEE J. Select. Areas Commun., pp. 510­524, May 1998. [14] J. A. O'Sullivan, P. Moulin, and J. M. Ettinger, "Information theoretic analysis of steganography," in Proc. 1998 IEEE Int. Symp. Information Theory, Cambridge, MA, Aug. 1998, p. 297. [15] B. Chen and G. W. Wornell, "Implementations of quantization index modulation methods for digital watermarking and information embedding of multimedia," J. VLSI Signal Processing Syst. Signal, Image, and Video Technol. (special issue on Multimedia Signal Processing), vol. 27, pp. 7­33, Feb. 2001. [16] P. Moulin and J. A. O'Sullivan, "Information-theoretic analysis of information hiding," preprint, Oct. 1999. [17] N. Merhav, "On random coding error exponents of watermarking systems," IEEE Trans. Inform. Theory, vol. 46, pp. 420­430, Mar. 2000. [18] A. Cohen and A. Lapidoth, "On the Gaussian watermarking game," in Proc. IEEE Int. Symp. Information Theory, Sorrento, Italy, June 2000, p. 48. [19] P. Moulin and J. O'Sullivan, "Information-theoretic analysis of information hiding," in Proc. IEEE Int. Symp. Inform. Theory, Sorrento, Italy, June 2000, p. 19. [20] R. J. Barron, B. 
Chen, and G. W. Wornell, "The duality between information embedding and source coding with side information and some applications," IEEE Trans. Inform. Theory, submitted for publication. [21] I. J. Cox, M. L. Miller, and A. L. McKellips, "Watermarking as communications with side information," Proc. IEEE, vol. 87, pp. 1127­1141, July 1999. [22] F. Hartung and M. Kutter, "Multimedia watermarking techniques," Proc. IEEE, vol. 87, pp. 1079­1107, July 1999.
[23] F. A. P. Petitcolas, R. J. Anderson, and M. G. Kuhn, "Information hiding--A survey," Proc. IEEE, vol. 87, pp. 1062­1078, July 1999. [24] A. Z. Tirkel, G. A. Rankin, R. van Schyndel, W. J. Ho, N. R. A. Mee, and C. F. Osborne, "Electronic water mark," in Proc. Conf. Digital Image Computing, Technology and Applications, Sydney, Australia, Dec. 1993, pp. 666­672. [25] R. van Schyndel, A. Z. Tirkel, and C. F. Osborne, "A digital watermark," in Proc. 1st IEEE Int. Conf. Image Processing, vol. 2, Austin, TX, Nov. 1994, pp. 86­90. [26] I. J. Cox, J. Killian, F. T. Leighton, and T. Shamoon, "Secure spread spectrum watermarking for multimedia," IEEE Trans. Image Processing, vol. 6, pp. 1673­1687, Dec. 1997. [27] B. Chen, "Design and analysis of digital watermarking, information embedding, and data hiding systems," Ph.D. dissertation, MIT, Cambridge, MA, June 2000. [28] E. A. Lee and D. G. Messerschmitt, Digital Communication, 2nd ed. Boston, MA: Kluwer Academic, 1994. [29] N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video. Englewood Cliffs, NJ: PrenticeHall, 1984.
[30] R. Zamir and M. Feder, "On lattice quantization noise," IEEE Trans. Inform. Theory, vol. 42, pp. 1152­1159, July 1996. [31] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991. [32] S. I. Gel'fand and M. S. Pinsker, "Coding for channel with random parameters," Probl. Contr. Inform. Theory, vol. 9, no. 1, pp. 19­31, 1980. [33] C. Heegard and A. A. E. Gamal, "On the capacity of computer memory with defects," IEEE Trans. Inform. Theory, vol. IT-29, pp. 731­739, Sept. 1983. [34] M. H. M. Costa, "Writing on dirty paper," IEEE Trans. Inform. Theory, vol. IT-29, pp. 439­441, May 1983. [35] A. Lapidoth, "Nearest neighbor decoding for additive non-Gaussian noise channels," IEEE Trans. Inform. Theory, vol. 42, pp. 1520­1529, Sept. 1996. [36] F.-W. Sun and H. C. A. van Tilborg, "Approaching capacity by equiprobable signaling on the Gaussian channel," IEEE Trans. Inform. Theory, vol. 39, pp. 1714­1716, Sept. 1993.
