inventory level, demand distribution, prior distribution, optimal policy, inventory cost, period, zt, lost sales, newsvendor model
Content:
Inventory Control with Unobservable Lost Sales and Bayesian Updates Xiangwen Lu Institute of Transportation Studies
University of California, Irvine, CA 92697
[email protected] JingSheng Song Fuqua School of Business,
Duke University Durham, NC 27708
[email protected] Kaijie Zhu Department of IELM,
Hong Kong University of
SCIENCE AND TECHNOLOGY Clear Water Bay, Kowloon, Hong Kong
[email protected] October 27, 2006 Abstract We study a finitehorizon lostsales inventory model. The demand distribution is unknown, and is dynamically updated based on the previous sales data in a Bayesian fashion. While the literature on inventory problems with censored demand data has mostly focused on multiperiod newsvendor models, in which the leftovers at each period must be salvaged, the focus here is systems in which inventory can be carried over to fulfill future demand. We derive a samplepath expression of the first derivative of the optimality equation to characterize the key tradeoff of the problem and compare it with the multiperiod newsvendor models. Unlike in the latter, the myopic solution here is no longer a lower bound on the optimal inventory level. Based on this expression, we develop tractable bounds on the optimal inventory level and employ them to devise heuristic policies. Finally, to assess the effectiveness of these heuristic policies, we develop upper bounds on their value loss relative to the optimal cost. Numerical examples are presented to illustrate the results. Key words: Stochastic; inventory; unknown demand distribution; unobservable lost sales; Bayesian updating; dynamic programming. 1
1 Introduction Stochastic inventory models mostly assume that the demand distribution is known. While this is often not the case in reality, this assumption makes the models much more tractable to yield simple solutions. These simple solutions can then be implemented as approximate solutions to
real systems with unknown demand distributions. The drawback, however, is that it is difficult to assess the effectiveness of these solutions when the demand distribution is indeed unknown. Various researchers have taken a different approach  They develop models in which demand distributions have unknown parameters that can be updated as sales information becomes available. These problems are considerably more difficult to solve, because the state space grows rapidly as time evolves due to a longer demand history. The problem becomes even harder when demand data is not completely observable, such as unobservable lost sales (termed censored data). The thrusts of this stream of research are therefore 1) identifying conditions under which simple solutions can be devised and 2) developing insights and effective approximate solutions. This paper falls into this second category of research. Our focus is on inventory
control problems of nonperishable products with unobservable lost sales. As explained below, the combination of these two features in the model makes the problem the most complicated to analyze thus far. Our results represent one of the earliest efforts to tackle the difficulty. We consider a periodicreview inventory system over a finite planning horizon. Demand in each period is stochastic and follows a distribution in a parametric family. However, the parameter of the demand distribution is unknown and is subject to a prior belief, which can be updated periodically with the observed sales data using the Bayesian approach. At the beginning of each period, based on the current status of inventory and the latest knowledge on demand distribution, we make the replenishment decision and place an order accordingly. The order arrives immediately. (We can imagine orders are placed at the end of each day and the shipments arrive the next morning. This is valid in many retail settings, especially with the advanced
information technology and crossdocking logistics.) During the period, demand occurs and is fulfilled from the inventory if feasible. At the end of the period, if there is any inventory left, it will be carried over to the next period and an inventoryholding cost is incurred. If the demand exceeds the inventory level, the unsatisfied portion is lost, resulting in a shortagepenalty cost. The objective is to minimize the total expected cost over the planning horizon. Earlier works on dynamic Bayesian inventory models with unknown demand distributions assume demand data is fully observable. To overcome the high dimensionality of the dynamic program, two approaches have been used. The first approach is to identify a sufficient statistic of the sales data, which reduces the state space to a fixed dimension. The second approach is built on 2
the first one and is often called the statespace reduction technique (see Scarf 1959, 1960, Karlin 1960, and Azoury 1985), which reduces the problem to a singledimension dynamic program. These techniques are applicable to certain types of demand distributions, including Uniform, Weibull, and Gamma distributions. Lovejoy (1990) further develops conditions under which the singledimension Bayesian dynamic program can be reduced to a static
optimization problem (the myopic optimization problem). When demand data is not fully observable due to unobservable lost sales, the information dynamics become more complex. On one hand, the inventory decision is based on the knowledge of the demand distribution. On the other hand, the demanddistribution updating relies on all the sales data obtained previously, which, in turn, depends on inventory decisions made previously. This complication makes the application of the two dimensionreduction approaches mentioned above extremely limited. As a matter of fact, the literature only shows that a sufficient statistic exists for newsvendor distributions (Braden and Freimer 1991) and that the statespace reduction technique is applicable to the Weibull distribution, a specific newsvendor distribution (Lariviere and Porteus 1999). This partly explains why censored data has not received much attention in the inventorycontrol literature until more recently. The existing papers on inventory problems with censored data largely focus on perishable products. Nahmias (1994) and Agrawal and Smith (1996) study how to use censored data to obtain reliable demand estimates. Harpaz, Lee, and Winkler (1982), Lariviere and Porteus (1999), Ding, Puterman, and Bisi (2002) and Lu, Song, and Zhu (2004) study the tradeoffs in the dynamic inventory decisions for perishable products in various contexts. More specifically, when products are perishable, inventory leftover at the end of a period must be salvaged, so the corresponding model is also termed the censored newsvendor model. The main tradeoff in this model is between the potential inventory overage cost incurred in the current period and potential loss of demand information due to insufficient inventory. One major finding is that the myopic solution, which minimizes the inventory cost in the current period but ignores the impact of learning from observed sales data on future periods, is a lower bound on the optimal inventory level. That is, to obtain the informational benefit, we should stock more than the myopic solution. In addition, Lariviere and Porteus (1999) present several managerial insights. We also note that Bensoussan, Cakanyildirim, and Sethi (2005, 2006) use unnormalized probabilities to linearize the dynamic programming formulation, from which they show the existence of an optimal feedback solution and provide a different approach for proving the myopic solution is a lower bound on the optimal solution. Despite the fact that nonperishable products prevail in the retail industry, to our knowledge, Chen and Plambeck (2005) and Huh and Rusmevichientongy (2006) are the only two works that study these systems. Chen and Plambeck (2005) analyze a twoperiod model, while Huh and 3
Rusmevichientongy (2006) examine a multiperiod model using a nonparametric approach (as opposed to the parametric approach adopted in the works mentioned above). Indeed, carrying inventory between periods in the nonperishableproduct context adds one more layer of complexity to models with censored demand data. Because the starting inventory level is no longer necessarily zero at the beginning of a period, it is possible that we do not need to place an order in that period, a situation we term overstock. This phenomenon requires a significantly different treatment than that in the newsvendor model (as a matter of fact, the multiperiod stochastic inventorycontrol literature is primarily devoted to analyzing the dynamics of inventory carryover and the economics of overstock). We study the optimal inventory decision by an incremental analysis. That is, we consider the scenario in which we increase the inventory level by a small amount in a period. Then, we analyze how such an increment of the inventory level brings better demand information in that period and is carried over to future periods. This approach is expected to be applicable to many inventorycontrol problems with information updating, pricing, and inventory carryover. We make two main contributions over the literature. The first contribution is theoretical. We derive an explicit expression for the first derivative of the optimality equation, which provides a samplepath view of the key tradeoffs. In particular, the optimal decision needs to tradeoff three parts  the inventory cost in the current period, the inventory cost in the future, and the informational benefit of higher inventory level in the future. This result generalizes that of the censored newsvendor model in which the tradeoff is between the inventory cost in the current period and the future informational benefit; see Lu, Song, and Zhu (2004). It helps to explain why, unlike in the censored newsvendor model, the myopic solution can be either higher or lower than the optimal inventory level. It also lays a foundation for development of approximations. The second contribution is computational. Due to the curse of dimensionality, it is extremely difficult if not intractable to compute the optimal policy. To counter this difficulty, we develop tractable lower and upper bounds on the optimal inventory level and use weighted averages of these bounds to approximate the optimal policy. To assess the effectiveness of these approximations, we derive an upper bound on the relative cost error of any heuristic policy. It is worth noting that several authors have developed bounds on the optimal inventory levels and/or on the relative cost errors for inventory models with demand forecasting updates using noncensored data; see, e.g., Morton (1978), Lovejoy (1990, 1992), Morton and Pentico (1995), and Lu, Song, and Regan (2003). However, those results and analysis cannot be applied to model examined here. For one thing, when demand data is not censored, increasing inventory does not bring any informational benefit. So, for a given sample path, the demand information obtained by applying any heuristic is exactly the same as that obtained by applying the optimal policy. But this is no 4
longer true here. The remainder of the paper is organized as follows: In Section 2, we introduce the basic notation and discuss two simpler models to prepare the readers to better understand our notation and approach in the latter analysis. In Section 3, we derive the first derivative function for the optimality equations. In Section 4, we develop bounds and approximations for the optimal inventory level and present an analytical approach to evaluate the performance of the approximate solutions. We finally illustrate our approach by a numerical study, followed by concluding remarks. All the proofs are presented in the appendix.
2 Notation and Preliminaries
We first introduce some common notation throughout the paper. We consider a T horizon periodic
review inventory control problem. We use t as the time index, t = 1, ..., T . We denote by Zt the demand in period t, and assume Z1, · · · , ZT to be independent and identically distributed random variables. The demand distribution has a probability density function (pdf) f (·) and a cumulative
distribution function (cdf) F (·), where f (·) is continuous and represents a parameter (or
a vector of parameters) for which we have only some prior knowledge.
For any integer j 1, define Z[t, t + j) = Zt + ... + Zt+j1. Let ut,t+j denote the vector
(ut, ..., ut+j). For example, Zt,t+j = zt,t+j means Zt = zt, ..., Zt+j = zt+j. We use 1(A) to denote
the indicator function of event A.
Denote by h and p the unit inventoryholding, and shortagepenalty costs in period t, respec
tively. Because the problem can always be transformed so that the unit ordering cost is zero (see
Veinott 1963 and Morton 1978), we can omit this cost from our consideration. Thus, given that y
is the inventory level in period t after ordering and zt is the realized demand, the inventoryholding
or shortagepenalty cost is
r (y, zt) = h(y  zt)+ + p (zt  y)+ .
To facilitate understanding of the notation and analysis of the original problem with an unknown demand distribution and unobservable lost sales, in the rest of this section we discuss two simpler models: (i) the traditional lostsales model with a known demand distribution; (ii) a lostsales model with an unknown demand distribution and observable lost sales.
2.1 Lost Sales with Known Demand Distribution Consider the traditional lostsales inventory model in which the demand distribution is known, i.e., the value of is given. Therefore we can suppress in the notation and denote by f (·) and F (·) the pdf and cdf of Zt.
5
Let Vt (x) be the optimal expected cost from period t to T given that the inventory level before ordering in period t is x. We have the following optimality equations:
VT +1 (x) = 0, x 0,
Vt (x) = min {Gt (y)} , x 0, 1 t T, yx
where
Gt (y)
=
C Z
(y) +
+
E
Ј Vt+1
Ў (y

Zt)+ў¤
,
(1)
C(y) =
r (y, zt) f (zt) dzt.
0
It can be shown that Gt (y) is a convex function. Let st be the minimizer of Gt (y). Then, the optimal policy is an orderupto policy with the optimal orderupto level st for period t.
For any given period t, if we minimize only the cost in the current period by solving
C0(y) = (p + h)F (y)  p = 0,
then we obtain the myopic orderupto level µ¶ sm = F 1 p , p+h
which is independent of t.
However, to determine the optimal inventory level st , we need to tradeoff the marginal inventory cost in the current period, C0 (y) and the marginal inventory cost in the future periods, t(y) =
d dy
E
[Vt+1
((y
 Zt)+)].
Below
we
take
a
samplepath
approach
to
show
how
to
do
this
by
backward
iterations. This approach is instrumental to understand the analysis for more
Complex Systemslatter. More specifically, note that, because VT +1 = 0, sT = sm. Suppose we have obtained st+1, ..., sT . We shall show that G0t (y) can be computed through C0 and st+1, ..., sT . Thus, G0t (y) = 0 is readily solvable to yield st . Suppose we are in period t and the inventory level after ordering is y. Then, no replenishment
order will be placed until the stopping time
(t, y) = min{t + j  (y  Z[t, t + j))+ < st+j, j 1}.
(Throughout the paper, min{} = .) Note that, if (t, y) > t + 1, then no lost sales occur in period t. Similarly, if (t, y) > t+i, then no lost sales happen from period t through period t+i1. On the other hand, if (t, y) = t + 1, then lost sales may happen in period t, in which case the initial inventory level in period t + 1 is zero. We have:
6
Theorem 1 In the lostsales inventorycontrol problem with a known demand distribution, the first
derivative function of Gt (y) is
0
0
Gt (y) = C (y) + t(y),
where
t(y)
=
d E dy
Ј Vt+1
Ў (y

Zt)+ў¤
=
TXt
E[1(
(t,
y)
>
t
+
i)C0(y

Z[t,
t
+
i))]
0.
(2)
i=1
Note that a similar result has been shown in the literature. See, e.g., Morton (1978). Because t(y) is always nonnegative, we have st sm. On the other hand, recall that sT = sm, we have (T  1, sm) = T . This implies T 1(sm) = 0 and hence G0T 1(sm) = 0. So, sT 1 = sm. Continuing in the same fashion, we can show that st = sm for all t. That is, the myopic policy is optimal.
2.2 Observable Lost Sales with Unknown Distribution
Next, suppose in f (·) is unknown but lost sales are observable. Observable lost sales may
occur in practice when orders are collected through a call center or mail order catalogs, or via the Internet.
At the beginning of period t, given the prior distribution ^t of , we use Z
m^ t (zt) = f (zt) ^t () d,
(3)
Z
M^ t (zt) = F (zt) ^t () d
(4)
as the pdf and cdf for Zt, respectively. At the end of period t, after we observe the demand realization zt, we update ^t to ^t+1, the posterior distribution of , based on the Bayes formula:
^t+1
(^t, zt)
=
R
f
f (zt)^t (zt0 ) ^t
() (0 )
d0
.
(5)
In this setting, the state of the inventory system comprises both the current inventory level and
the prior distribution of the demand. Let Vt (x, ^t) be the optimal expected cost from period t to T given that the inventory level before ordering in period t is x and the prior distribution of is
^t. Then, we have the following optimality equations:
VT +1 (x, ^T +1) = 0, x 0,
Vt (x, ^t)
=
min yx
{Gt
(y,
^t)}
,
x 0,
1 t T,
7
where
Gt (y, ^t)
= =
Ct Ct
(y, (y,
^t) ^t)
+ +
E Z
Ј Vt+1
Ў (y

Zt)+,
ў¤ ^t+1
+
Vt+1
Ў (y

zt)+,
^t+1(·^t,
zt
ў )
m^ t(zt)dzt,
(6)
Z +
0
Ct (y, ^t) =
r (y, zt) m^ t (zt) dzt.
0
It can be verified that Gt (y, ^t) is a convex function in y. Let st (^t) be its minimizer. Then, the optimal policy is an orderupto policy with statedependent orderupto level st (^t). Suppose we are in period t with prior distribution t. Let y be the inventory level after ordering. Let (t, y) = min{t + j  (y  Z[t, t + j))+ < st+j(^t+j), j 1}.
Then, there will be no order placement till period (t, y). Following the same logic as that of Theorem 1, we obtain:
Theorem 2 In the inventorycontrol problem with an unknown demand distribution and observable lost sales, the first derivative function of G0t(y, ^t) is
0 Gt(y,
^t)
=
Ct0(y,
^t)
+
t(y,
^t),
(7)
where
TXt
t(y, ^t) = E[1( (t, y) > t + i)Ct0+i(y  Z[t, t + i), ^t+i))] 0.
(8)
i=1
Theorem 2 indicates that, similar to the setting of a known demand distribution, the tradeoff
is between the marginal inventory cost in the current period, Ct0(y, ^t), and the marginal inventory
cost in the future periods, t(y, ^t).
The myopic orderupto level that minimizes Ct(y, ^t) is
µ¶
smt (^t) = M^ t1
p p+h
.
From G0t(st (^t) , ^t) = Ct0(st (^t) , ^t) + t(st (^t) , ^t) = 0, the optimal orderupto level satisfies
st
(^t)
=
M^ t1
µ
p

t(st (^t) p+h
,
^t)
¶
.
Because t 0, st (^t) smt (^t), i.e., the myopic orderupto level is an upper bound on the optimal orderupto level. Again, sT (^T ) = smT (^T ). However, the computation of st (^t), t < T, is in general much harder than that in the previous subsection. This is because ^t is updated from ^1 successively based on
8
the history of the observed demands z1, ..., zt1, so the state in period t consists of (x, z1, ..., zt1), whose dimension is increasing in t. Fortunately, for a wide range of demand distributions, the history of (z1, ..., zt1) can be represented by a sufficient statistic, wt. Thus, the state space in period t can be reduced to consist of only x and wt, whose dimension is fixed for all t. So the curse of dimensionality can be circumvented and the computation of the optimal policy becomes much easier. Scarf (1959, 1960) and Azoury (1985) show that the state space can be further reduced to a single dimension if f (·) satisfy certain additional conditions. The distributions that satisfy these conditions include Uniform, Weibull, and Gamma.
3 Unobservable Lost Sales with Unknown Distribution
We now turn to the main focus of this paper  The parameter in the demand distribution is unknown and lost sales are unobservable.
3.1 Bayesian Updates At the beginning of period t, our knowledge about the demand in that period is a prior distribution of , denoted by t. Using this prior, we can derive the pdf mt and cdf Mt in the same fashion as in (3) and (4). However, because the demand realization zt during period t is not fully observable due to unobservable lost sales, the demanddistribution updating at the end of t is much more complicated than (5). In fact, the observed sales information in period t consists of two parts, denoted by
ot = (ot,1, ot,2),
where ot,1 is the sales quantity and can take any nonnegative value, while ot,2 is the observation
status taking values of e or c, representing that the observation of the demand is exact or censored,
respectively.
For an illustration, suppose the stocked inventory is y and the demand realization is zt. If
y > zt, then the sales quantity is zt and the observation of the demand is exact. Otherwise, the
sales quantity is y and the observation is censored. Thus, ot is determined by y and zt, which we
express as
Ѕ
ot = y zt ,
(zt, e) (y, c)
if y > zt if y zt
.
For expositional simplicity, in the remainder of the paper, we use the shorthand notation zte and yc to represent (zt, e) and (y, c), respectively. For example, with the sales information 10e, we know
9
that that the demand is exactly 10. But with 10c, we can only tell that the demand is no less than
10.
At the end of period t, based on the observed sales information ot, we update t to t+1, the
posterior distribution of . We define
Ѕ
l (ot) =
f (ot,1)
if ot,2 = e,
1  F (ot,1) if ot,2 = c.
(9)
as the likelihood function of the sales information ot for any given . Based on the Bayes formula,
we have
t+1 (t, ot)
=
R
l
l (ot) (ot0 )
t t
() (0 )
d0
.
(10)
We can then obtain the marginal density function of the demand distribution for period t + 1 as
R
mt+1 (zt+1t, ot) =
f (Rzt+1) l (ot) t () d . l (ot) t () d
(11)
3.2 The Dynamic Program and Its Complications
Let Vt (x, t) be the optimal expected cost from period t to T given that the inventory level before ordering in period t is x and the prior distribution of is t. The optimality equations for this problem are
VT +1 (x, T +1) = 0, x 0,
(12)
Vt (x, t) = min {Gt (y, t)} , x 0, 1 t T,
(13)
yx
where
Gt (y, t)
=
Ct
(y,
t)
+
EZ
Ј Vt+1 y
Ў (y

Zt)+,
ў¤ t+1
(14)
= Ct (y, t) + Vt+1 (y  zt, t+1(·t, zte)) mt(zt)dzt + Vt+1 (0, t+1(·t, yc) [1  Mt(y)],
Z +
0
Ct (y, t) =
r (y, zt) mt (zt) dzt.
(15)
0
Similar to the model with observable lost sales studied in Section 2.2, the sufficient statistic
and the statespace reduction technique can reduce the dimensionality of the dynamic program.
However, the applicability of these is extremely limited. In fact, the literature only shows that
a sufficient statistic exists for newsvendor distributions (Braden and Freimer 1991) and that the
statespace reduction technique is applicable only to the Weibull distribution, a specific newsvendor
distribution (Lariviere and Porteus 1999).
For a general demand distribution, unobservable lost sales make it difficult to identify a fixed
dimension sufficient statistic or apply the statespace reduction technique. Therefore, the compu
tation of the optimal policy is very difficult or even intractable. This is well recognized by the
10
literature. For example, Braden and Freimer (1991) claim that "if a modeler feels that no member
of the families we characterize is a reasonable approximation, then he will almost surely encounter
serious analytic and computational problems if his data includes censored observations."
A second complication in this model is that the convexity of expected costs may not necessarily
hold. In the two settings studied in Section 2, inventory decisions have no impact on demand
realizations, and thus the convexity are shown to be true via induction. But in the setting of
censored demand, an inventory decision can affect demand realizations in future periods, which
makes the preservation of convexity collapse and therefore implies that the optimal policy is not
necessarily an orderupto policy.
More specifically, because Ct (y, t) as y , Gt (y, t) has minimizers. Due to the
possible existence of multiple minimizers, the optimal policy may resemble the (s, S) policy but
has multiple threshold (k1)} for k 2, (1)
values. Let (1) = arg min{Gt (y, t) , y = 0, and (k) = min{y : y > (k1), Gt
0},
(k)
=
arg Ў
min{Gўt
(y,
t)
(y, t) = Gt (k), t } for k
,y
> 2.
Thus, it is straightforward that (k) < (k) < (k+1) < (k+1). The optimal policy is the following:
If (k) < x < (k), an order is placed and brings the inventory level to (k). If (k) x (k+1),
no order is placed.
Formally, we can define
st (x, t) = arg minyxGt (y, t) .
That is, st (x, t) is the optimal inventory level in period t given that the inventory level before
ordering is x and the prior distribution of is t. Following the above explanation, we in fact have
st (x, t)
=
Ѕ
(k) x
when (k) < x < (k) when (k) x (k+1).
3.3 The First Order Condition Because the computation of the original optimality equation is intractable, we seek approximate solutions. In order to do so, in this subsection we derive an expression for the first order condition, providing a deeper understanding of the tradeoff in decision making and laying a foundation for the development of approximations in Section 4. We introduce the following convention: Assume that we are given t , the prior distribution of at the beginning of period t, and ot,t+i1, the sales information from period t to t + i  1 (note that ot,t+i1 depend on both demand realizations and inventory decisions in those periods). We can then compute the posterior distribution t+i (see Lemma 10 in the appendix). Thus, in terms of notation, we often replace t+i by (t, ot,t+i1) to highlight the impact of inventory decisions inbetween on the posterior distribution. For example, we equate Gt+i (y, t+i) with Gt+i (y, t, ot,t+i1), with the understanding that t+i is computed from t and ot,t+i1.
11
To make the exposition more compact, we introduce the following notation for history of information:
Ht+1 = (t, ot) , і
ґ
і
ґ
Ht+i = (t, ot, st+1 xt+1, H^t+1 zt+1, · · · , st+i1 xt+i1, H^t+i1 zt+i1), 2 i T  t, (16)
where xt+j and H^t+j are, respectively, the initial inventory level and the sales information used to compute the order quantity in period t + j.
With this notation, we define a suboptimal policy that has the following expected costtogo:
і
ґ TXt h
i
Jt+1 st+1(xt+1, H^t+1), Ht+1 = E Ct+i(st+i(xt+i, H^t+i), Ht+i) ,
i=1
і
ґ
where H^t+i = H^t+i1 st+i1(xt+i1, H^t+i1) zt+i1 and xt+i = st+i1(xt+i1, H^t+i1)zt+i1
for all i 2. Under this policy, in any period from period t + 1 on, while the actual information
is Ht+i, an "optimal" order quantity is placed as if the obtained information were H^t+i. On the
other hand, given costtogo is Gt+1
Ўxst+t+11a(xntd+1H, tH+t1+, 1i)f,wHet+fo1lўlo. wThtheereofoprteim, wale
policy from conclude
period
t
+
1
on,
the
resulting
Gt+1
Ўst+1(xt+1,
Ht+1),
ў Ht+1
Jt+1
і st+1(xt+1,
H^ t+1 ),
ґ Ht+1
.
(17)
Also, let
(t, y) = min{t + j  (y  Z[t, t + j))+ < st+j((y  Z[t, t + j))+, t+j), j 1}.
Then, the next order placement will not occur until period (t, y). Note that for any sample path
z such that (t, y) > t + i, we have y  z[t, t + i) st+i((y  z[t, t + i))+, t+i) > 0, which implies
zt+k is less than the inventory level in period t + k, k = 0, 1 ,... , i  1. So, before t + i, all demand
observations are exact. This implies that, conditional on (t, y) > t + i, we can replace t+i by
(t, zet,t+i1),
where
zet,t+i1
is
the Ў
exact
demand ў
realization
from
period
t
to
t
+
i

1.
In addition, let Zet,t+i1 = Zte, ..., Zte+i1 , where Zte+j is definіed recursively as the randoґm
event that ot+j,1 = Zt+j and ot+j,2 = e (or equivalently, Zt+j < st+j y  Z[t, t + j), t, Zet,t+j1 ).
Using the above notation, we can now present the expression for the first derivative of Gt:
Theorem 3 In the inventorycontrol problem with an unknown demand distribution and unobserv
able lost sales, the first derivative of Gt (y, t) is
0 Gt
(y,
t)
=
Ct0(y,
t)
+
t(y,
t)
+
t(y,
t),
(18)
where
TXt
TXt
t(y, t) = t,t+i(y, t) 0, t(y, t) = t,t+i1(y, t) 0
(19)
i=1
i=1
12
t,t+i(y, t) = E[1( (t, y) > t + i)Ct0+i(y  Z[t, t + i), t, Zet,t+i1)],
i 1,
t,t(y, t)
=
Ј Gt+1
Ўst+1
Ў 0,
Hte+1ў
,
Hte+1
ў

Jt+1
Ўst+1
Ў 0,
Htc+1ў
,
Hte+1ў¤
mt
(y)
,
t,t+i1(y, t) = E[ЎGt+i(st+i(0, Hte+,ei), Hte+,ei)  Jt+i(st+i(0, Hte+,ci), Hte+,ei)ў
Ч1( (t, y) > t + i  1)mt+i1(y  Z[t, t + i  1)Zet,t+i2)],
i 2,
and Hte+1 = (t, ye), Htc+1 = (t, yc), Hte+,ei = (t, Zet,t+i2, (y  Z[t, t + i  1))e), and Hte+,ci = (t, Zet,t+i2, (y  Z[t, t + i  1))c) for i 2. Moreover, the terms of t(y, t) and t(y, t) have the following properties, for any 1 i T  t,
t,t+i1(y, t) 0,
TXt
TXt
t,t+j(y, t) +
t,t+j1(y, t) 0,
j=i
j=i+1
TXt t,t+j(y, t) 0.
j=i
(20) (21) (22)
The decomposition of G0t (y, t) in (18) indicates that to make the right inventory decision, in addition to trade off the marginal inventory cost in the current period Ct0(y, t) and the marginal inventory cost in the future periods t(y, t) as in the case of known demand distribution or observable lost sales, we also need to take into account the marginal informational benefit in the future resulting from increasing the inventory level in period t (since t(y, t) 0, its absolute value is the marginal benefit). To facilitate understanding of the tradeoff and the consequent analysis in the next section, below we interpret the components of t(y, t). Observe that to derive G0t (y, t), we need to compare two scenarios  the inventory level after ordering in period t is y (scenario 1) or y + dy (scenario 2). The two scenarios have a cost difference in period t as follows: Ct (y + dy, t)  Ct (y, t) = Ct0 (y, t) dy + o(dy), which gives the first term of Ct0 (y, t) in (18). For the remaining terms, we discuss three cases of demand realization zt: y < zt < y + dy, zt > y + dy, and zt < y.
Case 1: y < zt < y + dy. In scenario 1, we obtain a censored observation of yc and make
the inventory decision in any future period based on this observation, which results in a cost
of
Jt+1
Ўst+1
Ў 0,
ў Htc+1
,
ў Hte+1 .
ingly make inventory decisions
In scenario afterwards,
2, we obtain an which results in
exact a cost
observation of ye and accord
of
Gt+1
Ўst+1
Ў 0,
ў Hte+1
,
ў Hte+1 .
13
The probability of y < zt < y + dy is approximately mt (y) dy. In short, compared with scenario 1, the additional inventory dy in scenario 2 allows us to obtain a more accurate observation in period t. This gives a cost reduction (i.e., an informational value) of t,t(y, t)dy, which leads to the first term in t(y, t). Case 2: zt > y + dy. In scenarios 1 and 2, we obtain censored observations of yc and (y + dy)c, respectively. Theorem 3 indicates that, in this case, compared with scenario 1, having additional inventory dy in scenario 2 still misses the true demand information and brings a negligible informational value (i.e., a value of only o(dy)). Case 3: zt < y. In scenarios 1 and 2, the inventory levels before ordering in period t + 1 are y  zt and y + dy  zt respectively, while the obtained observations are both zte. So the prior knowledge in period t + 1 becomes t+1 = (t, zte). Case 3a. If an order is placed in period t + 1 in scenario 1, i.e., (t, y) = t + 1, then an order will be placed in scenario 2 as well, which will bring the inventory to the same level as that in scenario 1. That is, the additional inventory dy can be offset by reducing the ordering quantity in scenario 2 by dy. So, given both the updated distributions and inventory level (after ordering) in period t + 1 are the same, the expected costs incurred from period t + 1 to T will be the same in the two scenarios. Case 3b. If no order is placed in period t+1 in scenario 1, i.e., (t, y) > t+1, then there is no need to place order in scenario 2 either. The inventory level ("after ordering") in period t + 1 will then be y  zt and y  zt + dy for scenarios 1 and 2, respectively. Thus, the inventory costs incurred in period t+1 are Ct+1(y zt, t, zte) and Ct+1(y +dy zt, t, zte) in the two scenarios. This leads to a cost difference of Ct+1(y + dy  zt, t, zte)  Ct+1(y  zt, t, zte) = Ct0+1(y zt, t, zte)dy +o(dy), from which we obtain t,t+1(y, t). Meanwhile, the additional inventory dy carried to period t + 1 can allow a more accurate observation to be obtained in t + 1 (if y  zt < zt+1 < y  zt + dy happens) and thus can bring an informational value, t,t+1(y, t)dy. From this we obtain the second term in t(y, t). Similarly, if (t, y) > t + i (which implies zt < y, zt + zt+1 < y, till z[t,t+i) < y), the additional inventory dy carried from period t to period t+i brings both the marginal inventory cost incurred in that period, t,t+i(y, t), and the informational value due to a more accurate observation obtained in that period, t,t+i(y, t). Trivially we have t,T (y, t) = 0, because a more accurate observation in the last period has no value at all. Theorem 3 also presents some properties of t(y, t) and t(y, t). More specifically, (20) means that the higher the inventory level y in period t, the larger the informational benefit obtained in 14
any future period. (21) indicates that when there is no order placed between period t + 1 to t + i,
increasing the inventory level y in period t increases the total expected cost from period t + i to
T . Because this cost consists of both the marginal inventory cost and the marginal informational
benefit, (21) essentially implies that the magnitude of the former outweighs that of the latter,
i.e.,
PT t j=1
t,t+j
(y,
t)

PT t j=2
t,t+j1(y,
t).
For example, t,t+1(y, t) + · · · + t,T (y, t)
t,t+1(y, t)  · · ·  t,T 1(y, t) means that the total marginal inventory cost incurred in periods
t + 1, · · · , T due to stocking dy in period t is no less than the total marginal informational benefit
from exact demand observations in periods t + 1, · · · , T  1 brought by dy. As a result, (22) shows
that the marginal inventory cost incurred from period t + i to T is nonnegative (but the marginal
inventory cost in a particular period, t,t+i(y, t), is not necessarily nonnegative).
Finally, we remark that (18) generalizes that for the censored newsvendor model. In that
model, without inventory carryover, we place an order in every period. Thus, (t, y) t + 1 and
1( (t, y) > t + i) = 0 for any i = 1, · · · , T  t and y. So, the terms t,t+1(y, t), · · · , t,T (y, t) and t,t+1(y, t), · · · , t,T 1(y, t) vanish from (18), yielding
G0t (y, t) = Ct0(y, t) + t,t(y, t),
(23)
an expression shown in Lu, Song, and Zhu (2004) for the censored newsvendor model.
3.4 The Myopic Solution
We define the myopic solution for this model as follows
smt (x, t) = arg minyxCt (y, t) .
Note that
µ¶
smt (0, t) = Mt1
p p+h
,
smt (x, t) = max{x, smt (0, t)}.
(24)
Recall that in the censored newsvendor model (i.e., no inventory carryover), the myopic solution
is a lower bound on the optimal inventory level (see Harpaz, Lee, and Winkler 1982, Lariviere and
Porteus 1999, Lu, Song, and Zhu 2004, and Bensoussan, Cakanyildirim, and Sethi 2006). Since
that model can be treated as a special case of the one studied here, we obtain that result from (23)
smt (0,
t)
=
Mt1
µ p
p +
¶ h
st (0,
t)
=
Mt1
µ p

t,t(st (0, p+h
t),
¶ t)
due to t,t(y, t) 0.
With inventory carryover considered in this paper, the above result may not necessarily hold.
We obtain from the first derivative function (18)
st (0,
t)
=
Mt1
µ
p

t(st (0,
t), t) p+
 h
t(st (0,
t
),
t
)
¶
.
15
Because t(st (0, t), t) 0 and t(st (0, t), t) 0, the sign of t(st (0, t), t) + t(st (0, t), t) is indeterminate. Therefore, the myopic solution, smt (0, t), can be either larger or smaller than st (0, t), depending on which term is dominant, the marginal informational benefit in the future, t(st (0, t), t), or the marginal inventory cost in the future periods, t(st (0, t), t). We summarize our observation in Theorem 4. We also present in this theorem a sufficient condition under which the myopic policy is indeed a lower bound (it still remains an open question to derive a condition for the myopic solution being an upper bound). Theorem 4 We have the following results with respect to the myopic solution of the inventorycontrol problem with an unknown demand distribution and unobservable lost sales: (a) In general, the myopic solution can be either larger or smaller than the optimal inventory level. (b) However, when the updated myopic inventory level is reachable for any demand realization in period t, i.e., smt (0, t)  zt smt+1(0, t+1) for all t, then the myopic inventory level is a lower bound on the optimal inventory level, i.e., smt (x, t) st (x, t). Part (a) is consistent with Chen and Plambeck (2005)'s finding in a twoperiod model.
4 Approximation and Performance Evaluation
In this section, we develop approximations for the optimal inventory levels. Our approach is to develop bounds on G0t in Theorem 3, from which we develop bounds on the optimal inventory level and use the bounds to construct heuristic policies. We also discuss how to assess the effectiveness of these heuristic policies. We first define some new notation. Recall that we have defined smt (x, t) as the myopic inventory level. Here we denote by s`t (x, t) and sut (x, t) the lower and upper bounds on the optimal inventory level st (x, t), respectively. For ease of exposition, in parallel to the notation presented in (16), we additionally let
Ht+1() = (t, ot) , і
ґ
і
ґ
Ht+i() = (t, ot, st+1 xt+1, H^t+1 zt+1, · · · , st+i1 xt+i1, H^t+i1 zt+i1),
= m, `, u, 2 i T  t,
where H^t+i = Ht+i() and, for i 2, xt+i = st+i1(xt+i1, H^t+i1)  zt+i1. We can explain Ht+i(u) as follows: Given that we update t to t+1 with the observation ot and that the inventory level before ordering is xt+1 in period t + 1, we successively choose upper bounds
16
on the optimal inventory levels afterwards, and Ht+i(u) represents the resulting sales history by period t. Ht+i(`) and Ht+i(m) can be explained in a similar way.
4.1 Approximation of Optimal Inventory Level According to (18), we have that G0t (y, t) Ct0(y, t) + t(y, t) by applying (19) and G0t (y, t) Ct0(y, t)+t,t(y, t) by setting i = 1 in (21), where t(y, t) is the marginal inventory cost incurred in periods t + 1, · · · , T due to stocking dy additionally in period t and t,t(y, t) is the marginal informational benefit from an exact demand observation in period t brought by dy. Thus, to derive bounds on G0t (y, t), it is sufficient to derive bounds on t(y, t) and t,t(y, t). We derive bounds on t(y, t) in Lemma 5.
Lemma 5 Given sut+i((y  z[t, t + i))+, t+i) and s`t+i((y  z[t, t + i))+, t+i) for i = 1, · · · , T  t, we have the following bounds on t(y, t):
TXt E[1( u (t, y) > t + i)Ct0+i(y  Z[t, t + i), t, Zet,t+i1)] t(y, t)
i=1
TXt
E[1(
`
(t,
y)
>
t
+
i

1)1(Amt+i
0 )Ct+i
(y

Z[t,
t
+
i),
t,
Zet,t+i1)],
i=1
where
(t, y) = min{t + j  (y  Z[t, t + j))+ < st+j((y  Z[t, t + j))+, t+j), j 1}, = `, u,
Amt+i = {y  Z[t, t + i) smt+i(0, t+i)}.
ЎЎ
ў
ў
Jant+d1TiЎonstdL+ee1rmiЎv0me, Habot6cu+(n1bўd)s,hHoonwte+1tўto,.t(dWye,reivpte)r,ebwsoeenutdnedinrsiLvoeenmsGempta+ar16aЎ(tsaet)l+yh1boЎwo0u,tnHodtdes+eo1rўniv,GeHbtt+eo+1u1nўs.dts+Wo1en0nJ,otH+te1te+Ўt1shta+,t1HtЎhte0+e,1Hliktca+en1lidў
,
ў Hte+1
hood function of ot = ye is f (y) while the likelihood function of ot = yc is 1  F (y). Lemma
6(a) essentially indicates that the difference between these two likelihood functions determines the
difference
between
Jt+1
Ўst+1
Ў 0,
Htc+1ў ,
Hte+1ў
and
Gt+1
Ўst+1
Ў 0,
Htc+1ў
, Htc+1ў.
Lemma 6
(a)
We
have
the
following
bounds
on
Jt+1
Ўst+1
Ў 0,
Htc+1ў
,
Hte+1ў:
Gt+1
Ўst+1
Ў 0,
Htc+1ў
,
Htc+1ў
[1
 Mt (y)] mt (y)
min
Ѕ 1
Jt+1
Ўst+1
Ў 0,
Htc+1ў
,
Hte+1ў
Gt+1
Ўst+1
Ў 0,
Htc+1ў
,
ѕ f (y)
 F (y)
Htc+1ў
[1
 Mt (y)] mt (y)
max
Ѕ
1
f 
(y) F (y)
ѕ
.
17
(b)
We
have
the
following
bounds
on
Gt+1
Ўst+1
(xt+1
,
Ht+1)
,
ў Ht+1 :
TXt ECt+i(smt+i(0, Ht+i(u)), Ht+i(u))
i=1
Gt+1
Ўst+1
(xt+1,
Ht+1)
,
ў Ht+1
TXt
ECt+i(smt+i(xt+i,
Ht+i(m)),
Ht+i(m)),
i=1
with Ht+1(u) = Ht+1(m) = Ht+1.
Applying Lemmas 5 and 6, we obtain
Theorem 7 gt` (y, t) G0t (y, t) gtu (y, t), where
"
#
gt` (y, t)
=
0 Ct(y, t) +
TXt ECt+i(smt+i(0, Ht+i(u)), Ht+i(u))
mt (y)

"TXt
i=1 ECt+i(smt+i(xt+i,
Ht+i(m)),
# Ht+i(m))
[1

Mt
(y)]
max
Ѕ
1
f 
(y) F (y)
ѕ
,
i=1
gtu (y, t)
=
0 Ct
(y,
t)
+
E[1(Amt+1)Ct0+1(y

zt,
t,
Zte)]
+
TXt
0 E[Ct+i(y

Z[t,
t
+
i),
t,
Zet,t+i1)1(
`
(t,
y)
>
t
+
i

1)1(Amt+i)],
i=2
with Ht+1(u) = (t, ye) and Ht+1(m) = (t, yc).
Remark It is possible to obtain tighter bounding functions for G0t (y, t) than those presented in Theorem 7. For example, based on (18), we can also claim G0t (y, t) Ct0(y, t) + t(y, t) + t,t(y, t) by applying (20) for i = 2, ..., T  t and G0t (y, t) Ct0(y, t) + t,t(y, t) + t,t+1(y, t) + t,t+1(y, t) by setting i = 2 in (21). By adapting and applying Lemma 5 and Lemma 6 accordingly, we can obtain much more complicated but possibly tighter bounding functions than those presented in Theorem 7. However, the bounding functions presented in Theorem 7 are easier to present and compute. Remark The above analysis and results can be easily carried over to the censored newsvendor model in which the inventory is perishable. In that case, as shown in (23), we have G0t (y, t) = Ct0(y, t) + t,t(y, t). Thus, deriving bounds on t,t(y, t) is sufficient for deriving bounding functions on G0t (y, t). More specifically, since t,t(y, t) 0, we directly have G0t (y, t) Ct0(y, t). On the other hand, we can use Lemma 6 to derive a lower bound on t,t(y, t). Solving gtu (y, t) = 0 and gt` (y, t) = 0 yields a lower and an upper bound on the optimal inventory level, respectively. More specifically, we have: 18
Corollary 8 Let sut (0, t) be the biggest solution to gt` (y, t) = 0 and s`t(0, t) be the smallest solution to gtu (y, t) = 0 over y 0. Then s`t(x, t) st (x, t) sut (x, t) for all x 0. Moreover, for x < s`t(0, t), s`t(x, t) = s`t(0, t) and sut (x, t) = sut (0, t); for x sut (0, t), s`t(x, t) = sut (x, t) = x. 4.2 Heuristics and Performance Evaluation We can obtain a general class of heuristics by averaging the lower and upper bounds on the optimal inventory level with weights, i.e.,
sHt (x, t) = sut (x, t) + (1  )s`t(x, t), where [0, 1].
Generating heuristics by weighting bounds has been widely used in the literature, see, e.g., Morton and Pentico (1995), Shang and Song (2003), Lu, Song and Regan (2003). An appropriate value for can come from trial and error, based on the cost error bounds derived below. For a given heuristic, we denote by VtH (x, t) the resulting expected cost from period t to period T , given that the inventory level before ordering is x in period t and the prior distribution of is t. The relative cost error of the heuristic to the optimal policy is VtH (x, t)  Vt (x, t) Ч 100%. Vt (x, t) Due to the curse of dimensionality of the dynamic program, it is intractable to compute the cost of the optimal policy, Vt (x, t). So, we develop an upper bound on VtH (x, t)  Vt (x, t) and a lower bound on Vt (x, t) (in Theorem 9), which lead to an upper bound on the relative cost error. We call this upper bound the relative cost error bound, and use it to measure the performance of the heuristic.
Theorem 9
(a) VtH (x, t)  Vt(x, t) has the following upper bound:
"TXt
#
max{B1 (t, t) , B2 (t, t)} + E
max{B1(t + i, t+i), B2(t + i, t+i)} ,
i=1
where
і
ґ
B1 (t, t) B2 (t, t)
= =
mmaxisn`ts(Ht0,(0t,)t)yysHt s(ut0,(0t,) t{)gntug(t`y(,y,t)}t)osЎHts(ut 0(,0, t)t)s`ts(Ht0,(0,t)t
, ў )
.
Here, the posterior distributions (t+i) are computed based on the observations obtained from applying the heuristic.
19
(b) Vt(x, t) has the following lower bound: TXt Ct(smt (0, t), t) + ECt+i(smt+i(0, Ht+i(u)), Ht+i(u)) i=1 with Ht+1(u) = (t, sut (0, t) Zt).
4.3 Numerical Study
We now present a numerical study. One purpose of this study is to illustrate our approach of
developing and evaluating heuristics. Another purpose is to evaluate the myopic policy and compare
its performance with that of the developed heuristics. As we know, the myopic policy is easy to
compute, but it remains a question how effective such a policy can be.
We assume f (·) is a
Normal Distribution with unknown mean and known variance 2, where
takes its value from the set = {1, 2, 3} = {100, 200, 300}. In other words, there are three
possible values for the mean of the normal distribution. Although we treat as a continuous
R
P
variable in the previous analysis, by replacing the integration with the summation , we can
show that the analysis holds for the case where is a discrete variable. We note that assigning
a discrete set of values to the unknown parameter is often used in
practical applications of the
Bayesian approach, see, e.g., Eppen and Iyer 1997 for an application to fashion products.
We consider a fourperiod planning horizon. We assume h = 1, p = 10, and = 100, while
parameterizing 1. To parameterize 1 over a single parameter, we assume tіhat the priorґprobability
of 1 is and the
prior probabilitycompute the variance of as V ar ()
and
decreasing
in
for
Ј
4 9
,
¤ 1.
of =
2
equals Ў
that
of ў3
(i.e.,
2500 1 + 8  92 , which
1 =
,
1 2
,
1 2
).
is increasing in for
We can
Ј 0,
4 9
¤
We
report
in
Table
1
s`1(0, 1),
su1 (0, 1),
and
an
upper
bound
on
V1H
(0,1)V1(0,1) V1(0,1)
.
No
informa
tion
related
to
s`t(0, t),
sut (0, t),
or
VtH (0,t)Vt(0,t) Vt(0,t)
is
reported
because
t,
the
distribution
of
in
an intermediate period, depends on the sample path of observations and decisions and thus is not
unique.
Besides,
V1H (0,1)V1(0,1) V1(0,1)
represents
the
effectiveness
of
a
heuristic
over
the
entire
planning
horizon, because it is the cost difference between the costtogo from period 1 of the heuristic and
that of optimal policy.
We briefly discuss the calculation of the lower and upper bounds in the numerical study. The
calculation is conducted via a backward manner. That is, prior to deriving sut (0, t) and s`t(0, t) for
period t, we have the bounds for periods beyond t. More specifically, we derive sut (0, t) bny evaluato
ing gt` (y, t) (see Theorem 7 and Corollary 8), where Ct0(y, t) and [1  Mt (y)] max
f (y) 1F (y)
are straightforward to compute, smt+i is obtained immediately from the newsvendor solution, and
ECt+i comes from summing up the costs over all the sample paths of zt+1, · · · , zt+i1 weighted
20
by their respective probabilities. Similarly, we derive s`t(0, t) by evaluating gtu (y, t), where in computing ECt0+i, we only need to consider those sample paths with both ` (t, y) > t + i  1 and Amt+i being true.
Prior distribution 1
ЎЎЎЎЎЎЎЎЎ29498901913592379,,,,,,,,,111751124913291619888
, , , , , , , , ,
1ў
2 4
ў
9 7
ў
118ў
3 5
ў
128ў
9 1
ў
6 1
ў
9 1
ў
18
(1, 0, 0)
Heuristic from weighted bounds
s`1(0, 1)
su1 (0, 1)
Relative cost error bound
399
441
1.92%
389
435
2.83%
377
427
3.13%
364
418
3.32%
349
407
4.18%
331
393
4.24%
311
374
4.18%
286
346
3.87%
258
301
2.34%
229
234
0.01%
Myopic policy
sm1 (0, 1)
Relative cost error bound
400
8.78%
393
8.27%
384
7.94%
375
7.39%
362
7.13%
346
6.76%
326
6.21%
300
4.96%
268
2.78%
234
0.00%
Table 1: Effectiveness of heuristic policies
Throughout the numerical study, we uniformly choose = 0.7. We find that for the scenarios
studied, the relative cost error bounds under this value of are relatively small. Of course, for a
particular scenario, searching for the best value of may lead to a further reduction of the relative
cost error bound.
As
we
have
shown
that
V
ar ()
is
increasing
in
for
Ј 0,
4¤ 9
and
decreasing
otherwise,
the
result in Table 1 essentially indicates that the performance of the developed heuristics depends on
the uncertainty on . In particular, as the uncertainty on increases, the performance becomes
worse.
We also report in Table 1 the myopic solution in period 1 (i.e., sm1 (0, 1)) and the relative cost
error іbounґd for this policy. For a given 1, sm1 (0, 1) is computed immediately from sm1 (0, 1) =
M11
p p+h
. The
numerical results indicate that the myopic policy under performs our heuristics.
It is interesting to note that the gap between s`1(0, 1) and sm1 (0, 1) tends to increase when the
demand uncertainty becomes higher (on the other hand, the gap between su1 (0, 1) and sm1 (0, 1)
is relatively stable except when 1 converges to (1, 0, 0)). An increased gap between s`1(0, 1)
and sm1 (0, 1) indicates that the optimal inventory level s1(0, 1) might become more likely to be
located between s`1(0, 1) and sm1 (0, 1) and therefore be smaller than sm1 (0, 1). The insight of the
above observation is the following: A larger demand uncertainty could increase the possibility of
overstock. As a result, the marginal inventory cost due to overstock may become higher than the
marginal informational benefit, which makes the myopic solution an upper bound on the optimal
inventory level.
21
5 Concluding Remarks We analyzed a finitehorizon periodicreview inventory model with an unknown demand distribution and unobservable lost sales. In each period, an inventory decision needs to be made based on the prior belief of the demand distribution, and at the end of the period, the prior distribution of the next period will be updated based on this period's censored demand information. Apart from the majority of the inventory literature on censored demand data, we assumed inventory leftovers at each period can be carried over to future periods. This particular feature complicates the analysis tremendously. We derived an explicit expression of the firstorder condition for determining the optimal inventory level in each period. The result depicts a clear picture of various tradeoffs in the decision making. It allows us to see that, unlike in the models without inventory carryover, the myopic inventory level in this model can be either smaller or larger than the optimal inventory level. It also helps us construct lower and upper bounds on the optimal inventory level and then use the weighted averages of the bounds to generate heuristics. We also developed an analytical approach to evaluate the performance of the heuristics and illustrated our approach by a numerical study. We showed that the myopic policy can underperform the developed heuristics significantly. Finally, by comparing the model with two simpler ones  one with a known demand distribution and the other with an unknown demand distribution but observable lost sales  we demonstrated the value of
Information Acquisition. In particular, if companies can use technology to learn more accurately how customer demand is distributed, so that an i.i.d. demand model can be truthfully employed, then the inventory policy is of the simplest form  the static myopic policy. If the demand distribution is difficult to estimate but mechanisms can be used to track the lost sales, then for a variety of distribution families, the computation of the optimal inventory level can be reduced to a singledimension problem, which is quite manageable. In either case, the additional information makes inventory management much easier. Acknowledgments This research was supported in part by
National Science Foundation under grant DMI0353552 and National Natural Science Foundation of China under award No. 70328001. 22
Appendix
Proof of Theorem 1 Using the Dominated Convergence Theorem, we can show
(E
Ј Vt+1
Ў (y

Zt)+ў¤)0
=
E[Vt0+1((y

Zt)+)].
Thus, we can concentrate on Vt0+1((y  zt)+) for any demand realization zt. Since
Ѕ Vt+1((y  zt)+) =
Gt+1(st+1) Gt+1((y  zt)+)
when (y  zt)+ st+1 when (y  zt)+ > st+1
,
we have
Ѕ
Vt0+1((y  zt)+) =
0 G0t+1((y  zt)+) > 0
when (y  zt)+ st+1 when (y  zt)+ > st+1
.
Thus, we obtain t(y) 0 and
0 Vt+1((y

zt)+)
=
1((y

zt)+
st+1)G0t+1((y

zt)+)
= 1((y  zt)+ st+1)G0t+1(y  zt) = 1( (t, y) > t + 1)G0t+1(y  zt)
= 1( (t, y) > t + 1)C0(y  zt) + 1( (t, y) > t + 1)E[Vt0+2((y  zt  Zt+1)+)],
where the second equality comes from st+1 0 and the third one comes from the definition of (t, y). Similarly, for any demand realization zt+1, we have
0 Vt+2((y

zt

zt+1)+)
=
1((y

zt

zt+1)+
st+2)G0t+2((y

zt

zt+1)+)
= 1((y  zt  zt+1)+ st+2)G0t+2(y  zt  zt+1) = 1( (t, y) > t + 2)G0t+2(y  zt  zt+1)
=
1(
(t,
y)
>
t
+
2)C0(y

zt

zt+1)
+
1(
(t,
y)
>
t
+
0 2)E[Vt+3((y

zt

zt+1

Zt+2)+)].
Continuing in the same fashion and noting that (t, y) > t + i implies (t, y) > t + i  1, we
obtain (2).
2
We present in Lemma 10 and 11 some technical properties used in the proof of Theorem 3. For conciseness, in the following proofs, we let xt+i = st+i1(xt+i1, Htc+i1)  zt+i1, Hte+i = (t, ye, ot+1,t+i1) and Htc+i = (t, yc, ot+1,t+i1) for i 2, where ot+j = st+j(xt+j, Htc+j) zt+j for 1 j i  1, xt+1 = 0, and, as defined in Theorem 3, Hte+i = (t, ye) and Htc+i = (t, yc).
Lemma 10
(a)
t+i (t, ot,t+i1) =
R
ij=10l(ot+j )t() ( ij=10l ot+j 0 )t(0 )d0
.
(b)
mt+i (zt+it, ot,t+i1) =
. R
f
(Rzt+iij=)10l(ijo=t10+lj(ot)+jt())dt()d
23
Proof We prove by induction. It is straightforward that the result holds for period t + 1. Assume that the result holds for period t + i. For period t + i + 1, we have
t+i+1 (t, ot,t+i)
=
R
l
l (ot+i) (ot+i0 )
t+i t+i
(t, ot,t+i1) (0 t, ot,t+i1)
d0
=
R Z
l
l (ot+i) l (ot+i1) · · · l (ot) t () (ot+i0 ) l (ot+i10 ) · · · l (ot0 ) t (0 )
d0
,
and
mt+i+1 (zt+i+1t, ot,t+i) =
f (zt+i+1) t+i+1 (t, ot,t+i) d
R
= f (zRt+i+1) l (ot+i) l (ot+i1) · · · l (ot) t () d l (ot+i) l (ot+i1) · · · l (ot) t () d
2
Lemma 11 We have the following:
(a)
Gt+1
Ўst+1
Ў 0,
ў Htc+1
,
ў Htc+1
=
[1

Mt
(y)]1
Ч
PT t i=1
R + 0
·
·
·
R + 0
rt+i
Ўst+i(xt+i,
Htc+i),
ў zt+i
R
ij=1f
(zt+j )
[1

F
(y)]
t
()
ddzt+i
·
·
·
dzt+1;
(b)
Jt+1
Ўst+1
Ў 0,
Htc+1ў
,
Hte+1ў
=
[mt
(y)]1
Ч
PT t i=1
R + 0
·
·
·
R + 0
rt+i
Ўst+i(xt+i,
Htc+i),
ў zt+i
R
ij=1f
(zt+j )
f
(y)
t
()
ddzt+i
·
·
·
dzt+1.
Proof (a) We first claim for given 1 k < i
Z
+ Z ···
+
rt+i
Ўst+i(xt+i,
Htc+i),
ў zt+i
mt+k
Ў zt+k
Htc+k
ў
=
0R Ч
0 ij=k+R1f (zt+j) kj=0l (ot+j) kj=0l (ot+j) t () d
t
()
Z
+ Z ···
+
rt+i
Ўst+i(xt+i,
Htc+i),
ў zt+i
0
0
d dzt+i · · · dzt+k
R
ij=kRf(ztkj+=j01l )(ot+kj=j01l)(ott+(j)d)t
()
d
dzt+i
·
·
·
dzt+k
.
(25)
Ў
ў
Applying Lemma 10(b) to mt+k zt+kHtc+k , the left hand of (25) equals
=
ZZЧ0R+st +k(·xij·=t·+kZk+0,RH1+tfc+lk((zr)ottZt+++ij+kЎs)t)+l·(·io(·jkxtZ=+t0+1k+li ,(oH)tr+tct++kjj=ii)0Ў1,)lszt(t++to(iti+(ўx)jRtd+)if,H(tRz(tc+t+)ik)dkj,=z)td01+lzit(ўkj+o=itR+01·lj·
0
0
0
Ч
R
ij=k+R1fl((zott++jk))l(ojkt=+01kl(o)t+kjj=01)l(to(t+)j d)
t
()
d
dzt+i
·
·
(ot+j) t () d ) t () d · dzt+k f (Rzt+kjk=)01l (jko=t+01lj · dzt+k+1dzt+k
(ot+j) t () ) t () d
d
24
+ Ч
Z +
Z
+ Z ···
+ rt+i Ўst+i(xt+i, Htc+i),
R st+k(xt+k,Htc+k) 0 ij=k+R1f (zt+j) l l (ot+k)
0 (ot+k) kj=01l (ot+j) jk=01l (ot+j) t () d
t
()
d
ў zt+i dzt+i
R
···
f (Rzt+kkj=)01l (jko=t+01lj dzt+k+1 dzt+k .
(ot+j) t () d ) t () d (26)
The first term on the right hand of (26) equals (note l (ot+k) = f (zt+k) for zt+k < st+k(xt+k, Htc+k))
=
ZЧZЧ00rRssttt+++ikk((Ўxxijs=ttt+++kkk+i,,RHH(1xttfcc++tf+kk(z))i(,tzZZ+Ht00+j++tck+ i))f)··, ··z(··tz+kjZZt=+i00ў01++kld (zo)trR+tt++i kjj·i=·Ў01·ij)s=ldt+(zkto+ti+(t(1+xkf)j+t+d(1zid)R,tz+Htjt+tc(+kkj,)i=))f,0d1zl(t(z+dotz+itў+tk+jRi·))·f·d(kjRt=zz(tt01++l)kk(djk+o=t1)0+1dljz(jkto+=)tk+01ljt((o)t+)djt())dt(()2d7)
while
the
second
term
equals
(note
l
(ot+k )
=
1F
Ўst+k (xt+k ,
ў Htc+k)
for
zt+k
st+k (xt+k ,
Htc+k ))
= = =
ЧЧЧZZZZЧs00srrRRtt++++++tt++ kk((iixxЎЎ··ttijij··ss++==··ttkk++kkZZ,,HH++ii00RR((tt11++ccxx++ff kkttЈЈ++))((11zzriiZZR,,ttt00+++HH++ijjFFtt ccЎ++ijsЎЎii))=··t))ss+ЈЈk··,,tt11+··izz++(ZZtt1kkx++00f((tii++xx+FFўў( ttzidd++,ЎЎtzz+HkkrssRtt,,tjtt+++++tcHH+iiikk)i··ttccЎ(()++··ijЈxxs,=··kk1tttz+dd))++ktRzz++ikk(tt1i,,ўў++xFўfHH¤¤kkt+RЎ++(ttcckj++zsi=11Rkjkj,ttkkd==+0Ј+H1))z1l0011jkttc(ll(++oўўx((kjk)it=oo¤¤t)F+.f+tt,01++jzklЎ(jjkkjjt,(zs==+Ho)tt00+i11+t))ў+tllckkR+t((jR((ttkoox)((tt)t++))f+))jjdjkўk=(jkRddt¤,=z0(1H))t01l+l)(tkjck+tt(o=dkjo((tk=01+t))0l+1))j(ljddo(ўkjto)¤=+)tdd+01jzztljkjttt(++(=()oii01))t··l+)dt··(dj(t··o(ddt+)zz))djtt++dtkk()++11)tdd(zt(+)2kd8)
Combining (27) and (28) leads to the claim stated in (25).
We note
Gt+1
Ўst+1
Ў 0,
Htc+1ў
,
Htc+1ў
=
TXt Z
+ Z ···
+
rt+i
Ўst+i(xt+i,
Htc+i),
ў zt+i
ij=1mt+j
Ў zt+j
Htc+j
ў
dzt+i
·
·
·
dzt+1
.
i=1 0
0
(29)
25
For each term on the right hand of (29), we obtain
Z
+ Z ···
+
rt+i
Ўst+i(xt+i,
Htc+i),
ў zt+i
ij=1mt+j
Ў zt+j
Htc+j
ў
dzt+i
·
·
·
dzt+1
= =
Z0
+
·
·
·
Z0+
rt+i
Ўst+i(xt+i
,
Htc+i
),
ў zt+i
0
0
Чij=11mt+j
Ў zt+j
Htc+j
ў
dzt+i
·
·
·
dzt+1
Z
+ Z ···
+
rt+i
Ўst+i(xt+i
,
Htc+i
),
ў zt+i
0
0
R
R
f (Rzt+iij=)10l (ijo=t10+lj(o)t+jt())dt () d
ij=1Rf (zt+j) l (ot) t l (ot) t () d
()
d
dzt+i
·
·
·
dzt+1,
Ў
ў
where the first equality comes from applying Lemma 10(b) to mt+i zt+iHtc+i and the second one
comes from applying the above claim recursively from k = i  1 to 1. So we obtain the result stated
in part (a).
(b) We can prove this part by following the same logic as that of part (a) but replacing
Ў
ў
Ў
ў
mt+i zt+iHtc+i by mt+i zt+iHte+i .
2
Proof of Theorem 3 In short, the proof of this theorem consists of the following three parts: First, we analyze the derivative of Gt (y, t), which yields t,t(y, t), the marginal informational benefit from an exact demand observation in period t brought by dy. Then, we analyze inventory carryover between periods, which yields t,t+1(y, t), · · · , t,T 1(y, t) and t,t+1(y, t), · · · , t,T (y, t), the other marginal informational benefits and the marginal inventory costs. Finally, we prove the properties related to t,t+i(y, t)'s and t,t+i(y, t)'s, as stated in (20), (21), and (22).
First, to analyze G0t (y, t), we note
Gt (y, t)
= =
Ct(y, Ct(y,
t) t)
+ +
EZ G+t+1GЎts+t1+Ў1sЎt(+y1Ў(Zy t)+z,t)+t,,yt,
ў Zt y
, t, ў zt ,
ў y Zt ў t, y zt
mt
(zt)
dzt
= Ct(y, t) + Z0y Gt+1 Ўst+1 (y  zt, t, zte) , t, zteў mt (zt) dzt
+Gt+1
Ўst+1
0Ў 0,
Htc+1ў
,
Htc+1
ў
[1

Mt
(y)]
,
which gives
G0t (y, t)
=
Ct0Z(yy, + 0
dtG) +t+1GЎts+t1+Ў1s(ty+1dЎyz0t,,Htte+, z1teў),,Htte+, z1teўўmmtt((yz)t)
dzt
d + dy
© Gt+1
Ўst+1
Ў 0,
Htc+1ў
,
Htc+1ў
[1

Mt
Є (y)]
.
(30)
26
We study the term
d dy
© Gt+1
Ўst+1
Ў 0,
Htc+1
ў
,
ў Htc+1
[1

Mt
Є (y)]
in
(30).
We
note
that
st+1
Ў 0,
ў Htc+1
wviseeltohtpreeeoatpthtseitmo+ra1elmЎ0in,hvHeerntc+et)1o.ўryTashleucvsoe,nl,isnwtahcnoitcmh(pil.ueea.t,idnfisgxtetodh)eGattne+rd1m(osstbt+d+dty11a((©0i0n,G,HHttctc+++111))Ў,Hsttc++11)Ў=0,
0 (that ў Htc+1 ,
is, we ў Htc+1
apply the enЄ [1  Mt (y)] ,
= =
+Чddddddyyymddy©ZZt+00GZ++10t +Ў+z1ЈrtЎ+rts+Gt1+t1+tH1+Ў1Ўs2tcЎ+stЎ0+t1s+,1ўtH1+Ў[1Ў02tc+0(,x,H1HўtM+tc,+tc2+Ht1,(1ўHtcyў+,)tc,z1+]ztўd+2t[+z)11,t1ў+Hўm1t+cM+t+2Gt1ў(tЎm+yz)2tt]++ЄЎ11stЎH+z2ttc+(+x11tў+H[21tc,+H1ўtcM+[12t)(,yH)M]tc+dtz2(ўty+¤)1] dzt+1.
Since
ot+1
=
st+1
Ў 0,
Htc+1ў
zt+1,
we
treat
ot+1
as
constant.
For
st+2(xt+2, Htc+2),
for
any
sample
path, we note the following two facts:
(a) If st+2(xt+2, Htc+2) > st+1(0, Htc+1)  zt+1 (i.e., there is an order placed in period t + 2), then
we
obtain
( ) Gt+2 st+2(xt+2,Htc+2),Htc+2 st+2(xt+2,Htc+2)
= 0.
(b)
Itfhesnt+w2(exot+b2t,aHintc+G2t)+2=Ўsstt++21((x0t,+H2,tc+H1tc)+2),zHt+tc+1 2(ўi.=e.,Gtth+e2reЎsits+1n(o0,
order Htc+1)
placed  zt+1,
in period t + ў Htc+2 , where,
2), as
stated above, st+1(0, Htc+1) is treated as constant.
Thus,
in
computing
the
term
d dy
R + 0
Gt+2
Ўst+2(xt+2,
Htc+2),
ў Htc+2
mt+1
Ў
ў
zt+1Htc+1 [1

Mt
(y)] dzt+1,
we treat both st+2(xt+2, Htc+2) and ot+1 as constant and thus obtain
= =
Ч+ddddddyyymddyZZZt+000Z+++20 Ў+zGZZt+00Zt2+++0 H2+Ў [tcrrs+ttt+2G++ў222tm+(ЎЎxs3sttt+tЎ+++s1222tЎ(,+(xzHx3ttt(++t+cx+122t2,+,H)HH3,t,cHt+ctc+H+1tc22+ўtc)+)2,[,31ўzz)tmt,++H2t2M+ўўtc+1mt+3Ў(ўtzy+Gtm)+2]t+1tЎd+z3zH2ttЎ++tЎsc+22zt+d1tH+ўz32t(t[c++1xH12t+ўtc+3mM,2Hўtt+m(tc1+ytЎ3)+z])1t,d+HzЎ1ztt+ct+H+131tcў+]H1ўtc+[11ў Mt (y)] dzt+2dzt+1
Ч [1  Mt (y)] dzt+2dzt+1.
Since ot+2 = st+2(xt+2, Htc+2) zt+2, we treat ot+2 as constant. Recursively, we obtain
d dy
© Gt+1
Ўst+1
Ў 0,
Htc+1ў
,
Htc+1ў
[1

Mt
Є (y)]
27
=
d dy
TXt i=1
Z + 0
·
·
·
Z + 0
rt+i
Ўst+i(xt+i,
Htc+i),
ў zt+i
ij=1mt+j
Ў zt+j
Htc+j
ў
Ч [1  Mt (y)] dzt+i · · · dzt+1.
where we treat st+i(xt+i, Htc+i) and ot+1,t+i1 as constant. Continuing the derivation by following Lemma 11 yields
d dy
© Gt+1
Ўst+1
Ў 0,
Htc+1ў
,
Htc+1ў
[1

Mt
Є (y)]
=
d dy
TXt i=1
Z + 0
·
·
·
Z + 0
rt+i
Ўst+i(xt+i,
Htc+i),
ў zt+i
Z
ij=1f
(zt+j )
[1

F
(y)]
t
()
ddzt+i
·
·
·
dzt+1
=
TXt i=1
Z + 0
·
·
·
Z + 0
rt+i
Ўst+i(xt+i,
Htc+i),
ў zt+i
Z
ij=1f
(zt+j )
d dy
[1

F
(y)]
t
()
ddzt+i
·
·
·
dzt+1
=
TXt Z 
+ Z ···
+
rt+i
Ўst+i(xt+i,
Htc+i),
ў zt+i
Z
ij=1f (zt+j) f (y) t () ddzt+i · · · dzt+1
=
i=1 Jt+1
0 Ўst+1
Ў 0,
0 Htc+1
ў
,
Hte+1ў
mt
(y).
(31)
From (30) and (31), we obtain
G0t (y, t)
= =
Ct0Z(yy, + 0 Ct0(y,
dttG)) ++t+1ЈGЎt,stt(+t+y11,Ў(sytt)++1dyЎzZ0t0,,yHtd,teG+zte1t)ў+,1,HЎts,tet+z+t1e1ўў(my tJ(dtzy+ztt)1,dЎszttt,+z1te)Ў,0, Ht, tzc+te1ўўm, tH(tze+t)1ўd¤zmt. t
(y) (32)
It is worthwhile mentioning that the proof so far, if applied to the censored newsvendor model,
can already yield the first derivative function, as stated in Lu, Song, and Zhu (2004).
Next,
we
analyze
Ry 0
dGt+1
(st+1
(yzt dy
,t ,zte
),t
,zte)
mt
(zt) dzt
in
(30),
which
is
caused
by
inven
tory carryover. We note that if (t, y) = t + 1 holds (i.e., an order is placed in period t + 1),
st+1 (y  zt, t, zte) is a reachable minimum of Gt+1 (·, t, zte), which leads to
dGt+1 Ўst+1 (y
ў  zt, t, zte) , t, zte
=
0.
(33)
dy
Thus, we obtain
= =
ZZE00ЈyyGdd0tGG+tt1++(11yЎ(ystd+Zy1tz,(ty,t,tdZ,yzztett)e,)11t(,(z(te(t)t,,,yy))t>,>ztettў++m1t1)()¤zm,t)td(zztt) dzt
(34)
28
where the first equality comes from (33) and the second one comes from the fact that (t, y) > t + 1
implies zt < y. From (32) and (34), we obtain
G0t
(y,
t)
=
Ct0(y,
t)
+
t,t(y,
t)
+
E
ЈG0t+1
(y

Zt,
t,
Zte)
1
(
(t,
y)
>
t
+
¤ 1)
.
(35)
Applying the above analysis to G0t+1 (y  zt, t, zte) yields
G0t+1 (y  zt, t, zte) = Ct0+1 (y  zt, t, zte) + [Gt+2 Ўst+2 (0, t, zte, (y  zt)e) , t, zte, (y  zt)eў
Jt+2 Ўst+2 (0, t, zte, (y  zt)c) , t, zte, (y  zt)eў]mt+1 (y  ztt, zte)
+E
ЈG0t+2
Ў y

zt

Zt+1,
t,
zte,
Zte+1ў
1
( (t,
y)
>
t
+
¤ 2)
,
which leads to
0 Gt (y, t)
=
Ct0(y, t) + t,t(y, t) + t,t+1(y, t)
+t,t+1(y,
t)
+
E
ЈG0t+2
Ў y

Zt

Zt+1,
t,
Zte,
Zte+1ў
1
( (t,
y)
>
t
+
¤ 2)
.
Recursively, we have
G0t (y, t) = Ct0(y, t) + TXt t,t+i(y, t) + TXt t,t+i1(y, t).
i=1
i=1
(36)
Finally, we prove (20), (21), and (22). We note
Gt+1
Ўst+1(xt+1,
Ht+1),
ў Ht+1
Jt+1
і st+1(xt+1,
H^ t+1 ),
ґ Ht+1
,
as
we
have
shown
in
(17),
because
Gt+1
Ўst+1(xt+1,
Ht+1),
ў Ht+1
is
the
optimal
costtogo
and
is
no
greater than the expected cost generated by any other policy. As a result, we have t,t(y, t) 0.
In general, by a similar logic, we argue
Gt+i(st+i(0, Hte+i), Hte+i) Jt+i(st+i(0, Htc+i), Hte+i),
(37)
which in turn leads to (20), i.e., t,t+i1(y, t) 0. For (21) and (22), we claim
G0t+1 (y  zt, t, zte) 1 ( (t, y) > t + 1) 0.
(38)
That is, when (t, y) > t + 1 holds for a given sample path of zt (i.e., st+1(y  zt, t, zte) = y  zt, or equivalently, no order is placed in period t + 1), we must have G0t+1 (y  zt, t, zte) 0. The argument can be proved by contradiction. We assume G0t+1 (y  zt, t, zte) < 0 when (t, y) > t + 1
29
holds. Then, froіm G0t+1 (ґy  zt, t, zte) < 0, we obtain that there exists an inventory level y0 > y zt such that Gt+1 y0, t, zte < Gt+1 (y  zt, t, zte). In other words, y  zt cannot be a minimizer (i.e., st+1(y  zt, t, zte) > y  zt, or equivalently, an order needs to be placed in period t + 1), which contradicts with the fact that (t, y) > t + 1 holds. In general, we have
G0t+i(y  z[t, t + i), t, zet,t+i1)1( (t, y) > t + i) 0.
(39)
From (35), (36), and (38), we obtain
E
ЈG0t+1
(y

Zt,
t,
Zte)
1
( (t,
y)
>
t
+
¤ 1)
=
TXt
t,t+i(y,
t)
+
TXt
t,t+i1(y,
t)
0,
i=1
i=2
which, together with t,t+i1 0, leads to
TXt t,t+i(y, t) 0. i=1
So we obtain the results stated in (21) and (22).
2
Proof of Theorem 4 We use induction to show st+i(0, t+i) smt+i(0, t+i) under the given
condition. The result holds for period T , because we know that the myopic inventory level is
optimal. Now, assume this is true for t + i, i 1, that is, st+i(0, t+i) smt+i(0, t+i). We proceed to show that st (0, t) smt (0, t). Note that according to the condition, we have smt (0, t)  zt smt+1(0, t+1). Furthermore, according to the induction assumption, we know that smt+1(0, t+1) st+1(0, t+1). Thus, we have
smt (0, t)  zt smt+1(0, t+1) st+1(0, t+1).
So, for any y smt (0, t), we have t(y, t) = 0, which implies G0t (y, t) = Ct0(y, t) + t(y, t) + t(y, t) = Ct0(y, t) + t(y, t) 0.
Therefore, we have smt (0, t) st (0, t).
2
Proof of Lemma 5 By the definition of upper bound (i.e., sut+i(y  z[t, t + i), t, zet,t+i1)), we
note the fact that
1( (t, y) > t + i) 1( u(t) > t + i) 0.
(40)
We have, for given zet,t+i1,
Ct0+i(y  z[t, t + i), t, zet,t+i1)1( (t, y) > t + i)
TXt
X i1
X k1
+
E[Ct0+k(y  zt+j  Zt+j, t, zet,t+i1, Zet+i,t+k1)1( (t, y) > t + k)] 0,
k=i+1
j=0
j=i
(41)
30
which comes from inductively using (37) and (39). We obtain
TXt t(y, t) = E[Ct0+i(y  Z[t, t + i), t, Zet,t+i1)1( (t, y) > t + i)] i=1 TXt = E[{Ct0+1 (y  Zt, t, Zte) + E[Ct0+i(y  Z[t, t + i), t, Zet,t+i1)1( (t, y) > t + i)]}1( (t, y) > t + 1)] i=2 TXt E[{Ct0+1 (y  Zt, t, Zte) + E[Ct0+i(y  Z[t, t + i), t, Zet,t+i1)1( (t, y) > t + i)]}1( u(t) > t + 1)], i=2 where the inequality comes from (40) and (41). Similarly, we can show
TXt E[Ct0+i(y  Z[t, t + i), t, Zet,t+i1)1( (t, y) > t + i]
i=2 E[{Ct0+2
Ў y

Zt

Zt+1,
t,
Zte,
Zte+1ў
TXt + E[Ct0+i(y  Z[t, t + i), t, Zet,t+i1)1( (t, y) > t + i)]}1( u(t) > t + 2)].
i=3
Recursively, we obtain
TXt t(y, t) E[Ct0+i(y  Z[t, t + i), t, Zet,t+i1)1( u(t) > t + i)]. i=1 For t(y, t)'s upper bound, we note
1( `(t) > t + i) 1( (t, y) > t + i) 0.
We also note
Ct0+i(y  z[t, t + i), t, zet,t+i1)1(Amt+i)
=
max
©Ct0+i(y

z[t,
t
+
i),
t,
zet,t+i1),
Є 0
Ct0+i(y  z[t, t + i), t, zet,t+i1)1( (t, y) > t + i).
Applying the above two facts, we obtain
0 Ct+1(y

zt,
t,
zte)1(Amt+1)
0 Ct+1(y

zt,
t,
zte)1( (t,
y)
>
t
+
1),
and, for 2 i T  t and given zet,t+i1,
0 Ct+i(y

z[t,
t
+
i),
t,
zet,t+i1)1(
`
(t)
>
t
+
i

1)1(Amt+i)
0 Ct+i(y

z[t,
t
+
i),
t,
zet,t+i1)1(
(t,
y)
>
t
+
i

1)1(Amt+i)
Ct0+i(y  z[t, t + i), t, zet,t+i1)1( (t, y) > t + i  1)1( (t, y) > t + i)
=
0 Ct+i(y

z[t,
t
+
i),
t,
zet,t+i1)1(
(t,
y)
>
t
+
i),
31
which in turn can lead to
t(y, t) TXt E[Ct0+i(y  Z[t, t + i), t, Zet,t+i1)1( ` (t) > t + i  1)1(Amt+i)]. i=1 2 Proof of Lemma 6 (a) We use a samplepath approach to prove for the upper bound; the proof for the lower bound is similar. We follow the proof of Theorem 3 and obtain
Jt+1
Ўst+1
Ў 0,
Htc+1
ў
,
Hte+1ў
= TXt ECt+i Ўst+i(xt+i, Htc+i), Hte+iў
=
i=1 TXt i=1
Z + 0
·
·
·
Z + 0
rt+i
Ўst+i(xt+i,
Htc+i),
ў zt+i
R
R
f
(zt+i)
ij=11l
(ot+j )
f
f (zt+i) ij=11l (ot+j) [1 
(y) t () d F (y)] t ()
d
ЧЧTX i=m1tijZa=0x1+mЅt1+··jf·ЎZ(Fzy0t+(+yj)Hr)ttcѕ++jiўЎijsd=tz+1t+im(ixt·+t·+j·iЎd,zHztt++tc+j1iH[)1,tcz+mt+jMtўi(ўdytzRR()ty+)iff]·(((·zzf·rttd++ozmiit+))(12[17ijij)==)m1111Mtll (((yoot tt()++yjj)]))
[1 [1
 
F F
(y)] (y)]
t t
() ()
d d
=
TXt Z
+ Z ···
+
rt+i
Ўst+i(xt+i,
Htc+i),
ў zt+i
ij=1mt+j
Ў zt+j
Htc+j
ў
dzt+i
·
·
·
dzt+1
i=1 0
0 Ѕ
ѕ
Ч [1  Mt (y)] max mt (y)
f (y) 1  F (y)
= =
GTX i=t1+t E1 ЎCstt++i1Ў(0s,t+Hi(tc+x1t+),i,HHtc+tc+1iў),[1Htmc+Mitў(yt[1()ym)]Mtm(yta()xyЅ)]
Ѕ
f
max
1
f (y) 1  F (y)
(y) F (y) ѕ .
ѕ
The key in this proof is the inequality, which we obtain by applying a change of measure from ye
to yc.
(b)
Gt+1
Ўst+1
(x,
Ht+1)
,
ў Ht+1
is
the
optimal
expected
cost
from
period
t+1
to
T
given
that
wsto+eluu1Ttp.ioodAndanetirenuivpeetapcatehorlopbweto+reui1rnoddwb,oiotuwnhnhdGtihctohe+n1oisbGЎsP stet++rTi1v1=aЎ(1sttxitE,o+HnC1 t(to++xt1i,a()Hsn,mttdH++i1tt(+)hx1,etў+Hiniit,s+vHet1nhўt+t,eoiw(ermxyep)plee)rc,vetHeseledtn+btcei(ofbmosert)leo)o.wofrfdosoellrmoinwegicnilgsaixtmhisenumpsyeeordipoiidnc
the derivation.
Claim 1: Ct+i (y, t, ot,t+i1) Ct+i(smt+i(0, t, ot,t+i1), t, ot,t+i1) for y 0.
32
Claim 2: Gt+i(st+i (x, t, ot,t+i1) , t, ot,t+i1) Gt+i(st+i(x0 , t, ot,t+i1), t, ot,t+i1) for 0 x x0. Claim 3: st+i (x, t, ot,t+i1) sut+i(x0 , t, ot,t+i1) for 0 x x0 . Claim 4: EGt+1(st+1((y  Zt)+, t, y Zt), t, y Zt) EGt+1(st+1(0, t, y0 Zt), t, y0 Zt) for 0 y y0. With the above claims, we can show
Gt+1(st+1(x, Ht+1), Ht+1) = Ct+1(st+1(x, Ht+1), Ht+1) +EGt+2(st+2((st+1(x, Ht+1)  Zt+1)+, Ht+1, st+1(x, Ht+1) Zt+1), Ht+1, st+1(x, Ht+1) Zt+1) Ct+1(smt+1(0, Ht+1), Ht+1) +EGt+2(st+2(0, Ht+1, sut+1(x, Ht+1) Zt+1), Ht+1, sut+1(x, Ht+1) Zt+1) = Ct+1(smt+1(0, Ht+1), Ht+1) +ECt+2(st+2(0, Ht+1, sut+1(x, Ht+1) Zt+1), Ht+1, sut+1(x, Ht+1) Zt+1) +EGt+3(st+3((st+2(0, Ht+1, sut+1(x, Ht+1) Zt+1)  Zt+2)+, Ht+1, sut+1(x, Ht+1) Zt+1, st+2(0, Ht+1, sut+1(x, Ht+1) Zt+1) Zt+2), Ht+1, sut+1(x, Ht+1) Zt+1, st+2(0, Ht+1, sut+1(x, Ht+1) Zt+1) Zt+2)
···
TXt
ECt+i(smt+i(0, Ht+i(u)), Ht+i(u)).
i=1
To complete the proof, we show that the above claims hold. Claim 1 comes directly from the fact that smt+i(0, t, ot,t+i1) solves miny0 Gt+i (y, t, ot,t+i1). For Claim 2, we note
Gt+i
Ўst+i
(x,
t,
ot,t+i1)
,
t,
ў ot,t+i1
=
min yx
Gt+i
(y,
t,
ot,t+i1)
,
and
Gt+i(st+i(x0
,
t,
ot,t+i1),
t,
ot,t+i1)
=
min yx0
Gt+i
(y,
t,
ot,t+i1)
,
which can lead to Claim 2 because of x x0. For Claim 3, we have sut+i(x0, t, ot,t+i1) st+i(x0, t, ot,t+i1) holds by the definition of upper bound. We note that st+i (x, t, ot,t+i1) solves minyx Gt+i (y, t, ot,t+i1) and st+i(x0 , t, ot,t+i1)
solves minyx0 Gt+i (y, t, ot,t+i1), which can lead to Claim 3. For Claim 4, we note
EGt+1
Ўst+1
Ў (y

Zt)+,
t,
y
ў Zt
,
t,
y
ў Zt
EGt+1
Ўst+1
(0,
t,
y
Zt)
,
t,
y
ў Zt
33
from Claim 2.
To
show
that
Claim
4
holds,
we
need
to
show
( dEGt+1 st+1(0,t,yZt),t,yZt) dy
0,
which comes from
dEGt+1
Ўst+1
(0,
t,
y
Zt)
,
t,
y
ў Zt
=
d dy
ЅZ y 0
Gt+1
dy Ўst+1 (0,
t,
zte)
,
t,
zteў
mt
(zt)
dzt
+
Gt+1
Ўst+1
Ў 0,
Htc+1ў
,
Htc+1ў
[1

Mt
ѕ (y)]
= =
Gt+1
Ўst+1
Ў 0,
Hte+1ў
,
Ј Gt+1
Ўst+1
Ў 0,
Hte+1ў
Hte+1ў mt , Hte+1ў 
(Jyt)++1 Ўsddty+1©GЎ0t,+H1 tЎc+s1t+ў1,
Ў 0,
Htc+1ў
,
Hte+1ў¤ mt
Htc+1ў [1 (y) 0.

Mt
Є (y)]
We note that the last equality comes from (31). We thus complete the proof of this lemma. 2
Proof of Theorem 7 We note G0t (y, t) Ct0(y, t) + t(y, t) by applying (19) and G0t (y, t) Ct0(y, t) + t,t(y, t) by setting i = 1 in (21). Thus, applying Lemma 5 and 6 leads to the result
stated in this theorem.
2
Proof of Theorem 9 (a) We note
VtH (x, t)  Vt(x, t) = VtH (x, t)  Gt(st (x, t), t)
= =
Gt Gt
(sHt (sHt
(x, (x,
t), t),
t) t)
 
Gt(st (x, Gt(st (x,
t t
), ),
t t
) )
+ +
VntCHt((xs,Ht (tx),tG),t(st)Ht +(xE, [Vt)tH,+1tі) ЎsHt
(x,
t)

Ztў+
,
ґo t+1 ]

n Ct(sHt
(x,
t),
t)
+
E[Vt+1
іЎsHt
(x,
t)

Ztў+
,
ґo t+1 ]
=
G+Et(s[VHt tH(+x1,іЎts),Ht (tx),tG) t(sZt (tўx+, ,t)t,+1tґ)

Vt+1
іЎsHt
(x,
t)

Ztў+
,
ґ t+1 ],
(42)
where both t+1's in (42) refer to the same posterior distribution obtained from the observation sHt (x, t) Zt in period t. We then develop an upper bound on Gt(sHt (x, t), t)  Gt(st (x, t), t), for which we discuss two cases, sHt (x, t) st (x, t) and sHt (x, t) < st (x, t). For sHt (x, t) st (x, t), we have
Gt(sHt (x, t), t)  Gt(st (x, t), t)
=
0 Gt
(y,
t)
st
(x,t
)ysHtn(x,t
)
ЎsHt
(x, t oі
)

st (x,
t
ў )
ґ
maxs`t(x,t)ysHt (x,t)
0 Gt (y, t)
sHt (x, t)  s`t(x, t)
і
ґ
maxs`t (x,t )ysHt
(x,t)
{gtu
(y,
t
)}
і
sHt
(x,
t
)

s`t
(x,
t) ґ
(from
Theorem
7)
maxs`t(0,t)ysHt (0,t) {gtu (y, t)} sHt (0, t)  s`t(0, t) ,
where the last inequality comes from the fact that sHt (x, t)  s`t(x, t) = sHt (0, t)  s`t(0, t) when x < sut (0, t) and sHt (x, t)  s`t(x, t) = 0 when x sut (0, t). For sHt (x, t) < st (x, t), by a
34
similar analysis, we can obtain
Gt(sHt (x,
t),
t)

Gt(st (x,
t),
t)
minsHt (0,t)ysut (0,t)
n gt`
(y,
o t)
Ўsut (0,
t)

sHt (0,
ў t)
.
We combine the results of the two cases together and obtain from (42)
VtH (x, t)  Vt(x, t) max{B1(t, t), B2(t, t)}
+
E[VtH+1
іЎsHt
(x,
t
)

Zt
ў+
,
t+1
ґ

Vt+1
іЎsHt
(x,
t)

Ztў+
,
ґ t+1 ].
Applying
the
above
recursively
to
VtH+1
іЎ sHt
(x,
t)

Ztў+
,
ґ t+1

Vt+1
іЎ sHt (x,
t)

Ztў+
,
ґ t+1
completes the proof.
(b) We have
Vt(x, t) = Gt(st (x, t), t)
=
Ct
(st (x,
t),
t)
+
E
Gt+1
Ўst+1
Ў(st (x,
t)

Zt
)+
,
t
,
st (x,
t)
ў Zt
,
t,
st
(x,
t)
ў Zt
Ct(smt (0, t), t) + EGt+1 Ўst+1 (0, t, sut (x, t) Zt) , t, sut (x, t) Ztў
(from Claim 1 and 4 in Proof of Lemma 6) = Ct(smt (0, t), t) + EGt+1 Ўst+1 (0, t, sut (x, t) Zt) , t, sut (x, t) Ztў TXt Ct(smt (0, t), t) + ECt+i(smt+i(0, Ht+i(u)), Ht+i(u)).(from Lemma 6(b)) i=1
2
References [1] Agrawal, N. and S. A. Smith, 1996. Estimating Negative Binomial Demand for Retail Inventory Management with Unobservable Lost Sales. Naval Research Logistics 43, 839861. [2] Azoury, K. S., 1985. Bayes Solution to Dynamic Inventory Models under Unknown Demand Distribution. Management Science 31, 11501160. [3] Bensoussan, A., M. Cakanyildirim, and S. P. Sethi, 2005. A Multiperiod Newsvendor Problem with Partially Observed Demand. Forthcoming Mathematics of
operations research. [4] Bensoussan, A., M. Cakanyildirim, and S. P. Sethi, 2006. Censored Newsvendor Model Revisited with Unnormalized Probabilities.
Working Paper, University of Texas at Dallas. [5] Braden, D. J. and M. Freimer, 1991. Informational Dynamics of Censored Observations. Management Science 37, 13901404. 35
[6] Chen, L. and E. L. Plambeck, 2005. Dynamic Inventory Management with Learning about the Demand Distribution and Substitution Probability. Working paper,
Stanford University. [7] Ding, X. M., L. Puterman, and A. Bisi, 2002. The Censored Newsvendor and the Optimal Acquisition of Information. Operations Research 50, 528537. [8] Eppen, G. D. and A. V. Iyer, 1997. Improved Fashion Buying with Bayesian Updates. Operations Research 45, 805819. [9] Harpaz, G., W. Y. Lee, and R. L. Winkler, 1982. Learning, Experimentation, and the Optimal Output Decision of a Competitive Firm. Management Science 28, 589603. [10] Huh W. T. and P. Rusmevichientongy, 2006. A NonParametric Approach to Stochastic Inventory Planning with Lost Sales and Censored Demand. Working paper,
Columbia University. [11] Karlin, S., 1960. Dynamic Inventory Policy with Varying Stochastic Demands. Management Science 6, 231258. [12] Lariviere, M. A. and E. L. Porteus, 1999. Stalking Information: Bayesian Inventory Management with Unobserved Lost Sales. Management Science 45, 346363. [13] Lovejoy, W. S., 1990. Stopped Myopic Policies in Some Inventory Models with Uncertain Demand Distributions. Management Science 36, 724738. [14] Lovejoy, W. S., 1992. Stopped Myopic Policies in Some Inventory Models with Generalized Demand Processes. Management Science 38, 688707. [15] Lu, X., J. S. Song, and A. Regan, 2003. Inventory Planning with Forecast Updates: Approximate Solutions and Cost Error Bounds. Forthcoming in Operations Research. [16] Lu, X., J. S. Song, and K. Zhu, 2004. Analysis of PerishableInventory Systems with Censored Demand Data. Forthcoming in Operations Research. [17] Morton, T. E., 1978. The Nonstationary Infinite Horizon Inventory Problem. Management Science 24, 14741482. [18] Morton, T. E. and D. W. Pentico, 1995. The Finite Horizon Nonstationary Stochastic Inventory Problem: Nearmyopic Bounds, Heuristics, Testing. Management Science 41, 334343. [19] Nahmias, S., 1994.
demand estimation in Lost Sales Inventory Systems. Naval Research Logistics 41, 739757. 36
[20] Scarf, H., 1959. Bayes Solutions of the Statistical Inventory Problem. Annals of
Mathematical Statistics 30, 490508. [21] Scarf, H., 1960. Some Remarks on Bayes Solution to the Inventory Problem. Naval Research Logistics Quarterly 7, 591596. [22] Shang, K. H. and J. S. Song, 2003. Newsvendor Bounds and Heuristic for Optimal Inventory Policies in Serial
supply chains. Management Science 49, 618638. [23] Veinott, A. F., Jr., 1963. Optimal Stockage Policies with Nonstationary Stochastic Demands. H. Scarf, D. Gilford, M. Shelly, eds. Multistage Inventory Models and Techniques. Office of Naval Research Monograph Series on
Mathematical Methods in Logistics, Stanford University Press,
Stanford, CA, 85115. 37
X Lu, JS Song, K Zhu