1 Introduction

The sampling formula of Ewens [8] defines a probability distribution over unordered

partitions of a positive integer n as follows: for a sequence of non-negative integers

(m1, . . . , mn) with i = 1, . . . , n, is

i imi = n, the probability that the partition has mi parts of size i,

k

n!

Pn,(m1, . . . , mn) = (1;n)

n j=1

j mj

mj

!

(1)

where k = m1 + . . . + mn is the total number of parts, > 0 is a parameter of the distribution, and (1;n) = ( + 1) . . . ( + n - 1). For background and applications to genetics see [11, 12, 13, 6, 5, 19]. This note presents a family of distributions Pn,, for random partitions of n, with two parameters 0 and 0 1. For = 0, > 0, Pn,,0 = Pn, as in (1). As indicated at the end of this introduction, the case 0 < < 1 arises naturally in the study of certain random partitions associated with stable processes with index . In particular the case = 1/2 is related to random partitions induced by the zeros of Brownian motion.

Research supported by N.S.F. Grant MCS91-07531

1

Here is the formula for the partition distribution Pn,,: for non-negative integers (m1, . . . , mn) with i imi = n,

Pn,,(m1,

.

.

.

,

mn)

=

N (m1,

.

.

.

,

mn) (

(;k-1) + Ї)(1;n-1)

n [Ї(1;j-1)]mj j=1

(2)

where Ї = 1 - ;

n! N (m1, . . . , mn) = [j!]mj mj!

is the number of partitions of a set of n elements into mi classes of size i, i = 1, 2, . . . , n; and for a real number a and non-negative integer m,

(a;m) =

1 for m = 0 ( + a) . . . ( + (m - 1)a) for m = 1, 2, . . .

Proposition 1 For each > 0 and 0 < 1, formula (2) defines a probability distribution on unordered partitions of n. These distributions are consistent in the sense of Kingman [17]: if a set of n + 1 elements is partitioned into random subsets with sizes distributed according to Pn+1,,, and independently one of the n + 1 elements picked uniformly at random is deleted, the induced partition of n elements is distributed according to Pn,,. There are two trivial cases of Proposition 1. For = 0 the distribution Pn,0, is concentrated on the partition with a single component of size n. For = 1 the distribution Pn,,1 is concentrated on the partition with n components of size 1. For > 0, 0 < 1, formula (2) assigns strictly positive probability to all possible partitions of n. In this case Proposition 1 is an immediate consequence of the next proposition:

Proposition 2 Fix > 0, 0 < 1. For i = 1, 2, . . . let Xi be independent random variables with beta (Ї, -+i) distributions. Let (Pi) be the random discrete probability distribution on {1, 2, . . .} defined as follows:

Pi = XЇ1XЇ2 . . . XЇi-1Xi

(3)

where xЇ = 1 - x. Given (Pi), let Y1, Y2, . . . be independent and identically distributed on {1, 2, . . .} according to (Pi). Define an equivalence relation n on {1, . . . n} by i n j iff Yi = Yj. Then the random partition of n induced by n has distribution Pn,, as in (2).

Proposition 2 effectively determines Kingman's representation [18] of the partition structure defined by formula (2). Proposition 2 is derived in Section 2 as a consequence of the following result.

2

Proposition 3 For (Pi) and (Yn) as in Proposition 2, let T1 = 1 Tn+1 = inf{m > Tn : Ym {YT1, . . . , YTn}, the sequence of indices at which new Y -values appear. Then (PT1, PT2, . . .) =d (P1, P2, . . .). In other words, the random probability distribution defined by (3) is invariant under size biased random permutation. Proposition 3 is an immediate consequence of the work of Perman, Pitman and Yor [20, 21], who showed that a sequence of random variables with representation (3) is obtained by size biased sampling of a certain point process derived from a Poisson point process on (0, ). Another derivation of Proposition 3 is given in Pitman[23]. Let B = (Bt, 0 t 1) be a stochastic process, for example a Brownian motion or Brownian bridge. Independent of B let U1, . . . , Un be i.i.d. uniform [0, 1]. Define an equivalence relation n on {1, . . . , n} by i n j iff Ui and Uj fall in the same excursion of B away from zero: that is to say Bt = 0 for all t between Ui and Uj. Let Pn be the distribution of the unordered partition of n induced by n . According to the results of [21], if B is any of the processes considered below, then the size-biased presentation of the lengths of maximal subintervals of [0, 1] that are free of zeros of B defines a random discrete distribution (Pi) of the form described in Proposition 2. Consequently, Proposition 2 implies · If B is standard Brownian motion, then Pn = Pn,1/2,1/2 · If B is standard Brownian bridge, then Pn = Pn,1,1/2 Somewhat more generally, · If B is a Bessel process of dimension , 0 < < 2, started at B0 = 0, then Pn = Pn,, for = 1 - /2 · If B is a Bessel bridge of dimension , 0 < < 2, starting and ending at 0, then Pn = Pn,2, for = 1 - /2 Corollary 3.15 of [21] shows how to construct for any 0 < < 1 and > 0 a process B absolutely continuous with respect to a Bessel bridge of dimension = 1 - 2 such that the zeros of B induce the partition distribution Pn,,. 3

2 Derivation of the Formula

Let (Pi) be as in (3), and (Yn) an iid sample from (Pi) as in the statement of Proposition 2. Formula (1) for the distribution of unordered partition of n induced by the values of Y1, . . . , Yn is a consequence of the following formula for the distribution of the ordered partition induced by the same values, with order defined by the order in which values appear in the sequence. The following proposition extends a formula due to Donnelly and Tavare [7] in case = 0.

Proposition 4 Fix n, , , and for (n1, . . . , nk) a sequence of integers with 1 ni n,

k i=1

ni

=

n,

let

Qn,(n1, . . . , nk)

denote

the

probability

that

for

each

1

j

n,

the

jth

value to appear in the Y sequence appears nj times among Y1, . . . , Yn. Then

Qn,,(n1,

.

.

.

,

nk )

=

#(n1,

.

.

.

,

(;k-1) nk) (Ї +

k i=1

Ї(1;ni-1)

)(1;n-1)

(4)

where

n!

#(n1, . . . , nk) = nk(nk + nk-1) . . . (nk + . . . + n1) ki=1(ni - 1)!

(5)

Proof. It is elementary that #(n1, . . . , nk) as in (4) is the number of different ways to arrange n1 values of one type, n2 of a second, and so on, subject to the constraint that the first value is of the first type, the next distinct value of the second type, and so on. The other factor in (4) is the probability of any given arrangement of this kind. Indeed, for any given arrangement, due to the conclusion of Proposition 3, by conditioning on the successive P -values to appear this probability is found to be

m1(n1 - 1, n2 + . . . + nk) m2(n2 - 1, n3 + . . . + nk) . . .

. . . mk-1(nk-1 - 1, nk) mk(nk - 1, 0)

(6)

where

mi(r, s) = E(XirXЇis)

(7)

=

B(Ї + r, - + i + s) B(Ї, - + i)

(8)

with

(a)(b)

B(a, s) =

(9)

(a + b)

4

the beta function. To illustrate the derivation of (6) for k = 3, the probability of any particular arrangement of kind (n1, . . . , nk) is found to be

E[X1n1-1XЇ1(XЇ1X2)n2-1(XЇ1XЇ2)(XЇ1XЇ2X3)n3-1]

=

E

[X1n1

-1

XЇ11+n2

X -1+1+n3-1 n2 2

-1

XЇ21+n3

-1

X3n3-1]

= E[X1n1-1XЇ1n2+n3 ]E[X2n2-1XЇ2n3]E[X3n3-1].

A similar calculation yields (6) for any other value of k. Now substituting (8) in (6) and using (9) and (r + 1) = r(r) allows (6) to be manipulated into the form shown in (4).

Proof of Proposition 2. what must be shown is that when formula (4) is summed over the k!/(m1!m2! . . . mn!) distinct (n1, . . . , nk) such that #{i : ni = j} = mj, the result is formula (1). In case = 0 this is a well known fact which amounts to the identity

1

1

1

n1

nk (nk + nk-1) . . . (nk + . . . n1) = i=1 imimi! ,

(10)

where the sum is over the same range. (This identity is easily understood by computing

the probability that a random permutation of n elements has mi cycles of length i in two different ways).

The result in case 0 < < 1 is obtained by using the identity (10) to simplify the

sum in that case, after noting that for every (n1, . . . nk) whose distribution is given by

(m1, . . . , mn),

k

n

(ni - 1)! = [(j - 1)!]mj

i=1

j=1

and similarly

k

n

Ї(1;ni-1) = [Ї(1;j-1)]mj

i=1

j=1

so both these products are constants so far as the summation is concerned.

3 Questions According to Kingman [17], the Ewens sampling distributions (Pn,,0) are characterized among all consistent families (Pn) by the following feature:

5

For a random partition with distribution Pn, if the part containing a point picked uniformly from {1, . . . , n}, independently of the random partition is deleted, then given that k points remain, the distribution of the remaining partition of k is Pk, for every 1 k n. A corresponding property of (Pn,,), follows easily from Proposition 2 and 3. For a random partition with distribution Pn,,, if the part containing a point picked uniformly from {1, . . . , n}, independently of the random partition, is deleted, then given that k points remain, the distribution of the remaining partition of k is Pk,+,, for every 1 k n. It would be interesting if this property could be used to characterize the family (Pn,,). This motivates the following questions: Question 5 Suppose that (Pn) and (Qn) are two consistent partition distributions, and that for every n 2, when the part containing a random element is deleted from a Pn partition, given that k elements remain these are partitioned according to Qk, 1 k n. Does this imply Pn = Pn,, for some 0, 0 < 1? Question 6 Suppose for each i = 0, 1, 2, . . . that (Pni) is a consistent family of partition distributions, and that for every i 1 and n 2, when the part containing a random element is deleted from a Pni partition, given that k elements remain these are partitioned according to Pki+1, 1 k n. Does this imply Pn0 = Pn,, (hence Pni = Pn,+i,) for some 0, 0 1? By arguing as in Hoppe [12], using Kingman's representation of the partition distributions, these questions have the same answers as the next two questions respectively: Question 7 Does the two parameter family of joint distributions for (Xi) described in Proposition 2 comprise all joint distributions for a sequence of random variables (Xi) with values in [0, 1] such that both X1 is independent of (X2, X3, . . .), and (Pi) defined by (3) is invariant under size biased permutation? Question 8 Are the beta distributions for Xi displayed in Proposition 2 the only possible distributions for independent Xi such that (Pi) defined by (3) is invariant under size biased permutation? It appears possible to construct an example to show that the answer to Question 7 is no, hence also the answer to Question 5 is no. Some results related to Question 8 may be found in Pitman [23]. 6

A Subsequent literature The results of this technical report were published in [22]. The two-parameter family was characterized in various ways in [16, 23, 32]. Applications to Brownian and Bessel excursions appear in [25]. The associated two-parameter family of Poisson-Dirichlet distributions was decribed in [29]. The family has found applications in the theory of processes of fragmentation and coagulation [1, 2, 3, 26]. Other papers describing various aspects of the two-parameter family are [9, 30, 31]. Various generalizations are treated in [24], [28], [10]. Much of this literature is reviewed in [27]. See Carlton [4], [14], [15] for related studies from the perspective of Bayesian non-parametric statistics. References [1] D.J. Aldous and J. Pitman. The standard additive coalescent. Ann. Probab., 26:17031726, 1998. [2] J. Bertoin and J. Pitman. Two coalescents derived from the ranges of stable subordinators. Electron. J. Probab., 5:no. 7, 17 pp., 2000. [3] E. Bolthausen and A.-S. Sznitman. On Ruelle's probability cascades and an abstract cavity method. Comm. Math. Phys., 197(2):247276, 1998. [4] M. A. Carlton. Applications of the two-paramter Poisson-Dirichlet distribution. PhD thesis, U.C.L.A., 1999. [5] R. A. Doney. On the asymptotic behaviour of first passage times for transient random walk. Probability Theory and Related Fields, 81:239 246, 1989. [6] P. Donnelly. Partition structures, Poґlya urns, the Ewens sampling formula, and the ages of alleles. Theoretical Population Biology, 30:271 288, 1986. [7] P. Donnelly and S. Tavarґe. The ages of alleles and a coalescent. Adv. Appl. Probab., 18:119 & 1023, 1986. [8] W.J. Ewens. The sampling theory of selectively neutral alleles. Theor. Popul. Biol., 3:87 112, 1972. [9] S. Feng and F. M. Hoppe. Large deviation principles for some random combinatorial structures in population genetics and Brownian motion. Ann. Appl. Probab., 8(4):975994, 1998. 7

[10] A. Gnedin and J. Pitman. Regenerative composition structures. Technical Report 644, Dept. Statistics, U.C. Berkeley, 2003. Available at http://www.stat.berkeley.edu/tech-reports/. [11] F. M. Hoppe. Poґlya-like urns and the Ewens sampling formula. Journal of Mathematical Biology, 20:91 94, 1984. [12] F. M. Hoppe. Size-biased filtering of Poisson-Dirichlet samples with an application to partition structures in genetics. Journal of Applied Probability, 23:1008 1012, 1986. [13] F. M. Hoppe. The sampling theory of neutral alleles and an urn model in population genetics. Journal of Mathematical Biology, 25:123 159, 1987. [14] L. F. James. Generalized weighted gamma random measures and the two-parameter Poisson Dirchlet distribution. Preprint, 2001. [15] L. F. James. Poisson calculus for spatial neutral to the right processes. arXiv:math.PR/0305053, 2003. [16] S. Kerov. Coherent random allocations and the Ewens-Pitman formula. PDMI Preprint, Steklov Math. Institute, St. Petersburg, 1995. [17] J. F. C. Kingman. Random partitions in population genetics. Proc. R. Soc. Lond. A., 361:120, 1978. [18] J. F. C. Kingman. The representation of partition structures. J. London Math. Soc., 18:374380, 1978. [19] J. F. C. Kingman. The Mathematics of Genetic Diversity. SIAM, 1980. [20] M. Perman. Order statistics for jumps of normalized subordinators. Stoch. Proc. Appl., 46:267281, 1993. [21] M. Perman, J. Pitman, and M. Yor. Size-biased sampling of Poisson point processes and excursions. Probab. Th. Rel. Fields, 92:2139, 1992. [22] J. Pitman. Exchangeable and partially exchangeable random partitions. Probab. Th. Rel. Fields, 102:145158, 1995. [23] J. Pitman. Random discrete distributions invariant under size-biased permutation. Adv. Appl. Prob., 28:525539, 1996. 8

[24] J. Pitman. Some developments of the Blackwell-MacQueen urn scheme. In T.S. Ferguson et al., editor, Statistics, Probability and game theory; Papers in honor of David Blackwell, volume 30 of Lecture Notes-Monograph Series, pages 245267. Institute of Mathematical Statistics, Hayward, California, 1996. [25] J. Pitman. Partition structures derived from Brownian motion and stable subordinators. Bernoulli, 3:7996, 1997. [26] J. Pitman. Coalescents with multiple collisions. Ann. Probab., 27:18701902, 1999. [27] J. Pitman. Combinatorial stochastic processes. Technical Report 621, Dept. Statis- tics, U.C. Berkeley, 2002. Lecture notes for St. Flour course, July 2002. Available via www.stat.berkeley.edu. [28] J. Pitman. Poisson-Kingman partitions. In D. R. Goldstein, editor, Science and Statistics: A Festschrift for Terry Speed, volume 30 of Lecture Notes-Monograph Series, pages 134. Institute of Mathematical Statistics, Hayward, California, 2003. [29] J. Pitman and M. Yor. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab., 25:855900, 1997. [30] H. Yamato and M. Sibuya. Moments of some statistics of Pitman sampling formula. Bull. Inform. Cybernet., 32(1):110, 2000. [31] H. Yamato, M. Sibuya, and T Nomachi. Ordered sample from two-parameter gem distribution. Statist. Probab. Lett., 55:1927, 2001. [32] S.L. Zabell. The continuum of inductive methods revisited. In J. Earman and J. D. Norton, editors, The Cosmos of Science, Pittsburgh-Konstanz Series in the Philosophy and History of Science, pages 351385. University of Pittsburgh Press/UniversitЁatsverlag Konstanz, 1997. 9

J Pitman

Copyright © 2018 doc.uments.com