public int sample()

in commons-rng-sampling/src/main/java/org/apache/commons/rng/sampling/distribution/RejectionInversionZipfSampler.java [96:178]


        public int sample() {
            // The paper describes an algorithm for exponents larger than 1
            // (Algorithm ZRI).
            // The original method uses
            //   H(x) = (v + x)^(1 - q) / (1 - q)
            // as the integral of the hat function.
            // This function is undefined for q = 1, which is the reason for
            // the limitation of the exponent.
            // If instead the integral function
            //   H(x) = ((v + x)^(1 - q) - 1) / (1 - q)
            // is used,
            // for which a meaningful limit exists for q = 1, the method works
            // for all positive exponents.
            // The following implementation uses v = 0 and generates integral
            // number in the range [1, numberOfElements].
            // This is different to the original method where v is defined to
            // be positive and numbers are taken from [0, i_max].
            // This explains why the implementation looks slightly different.

            while (true) {
                final double u = hIntegralNumberOfElements + rng.nextDouble() * r;
                // u is uniformly distributed in (hIntegralX1, hIntegralNumberOfElements]

                final double x = hIntegralInverse(u);
                int k = (int) (x + F_1_2);

                // Limit k to the range [1, numberOfElements] if it would be outside
                // due to numerical inaccuracies.
                if (k < 1) {
                    k = 1;
                } else if (k > numberOfElements) {
                    k = numberOfElements;
                }

                // Here, the distribution of k is given by:
                //
                //   P(k = 1) = C * (hIntegral(1.5) - hIntegralX1) = C
                //   P(k = m) = C * (hIntegral(m + 1/2) - hIntegral(m - 1/2)) for m >= 2
                //
                //   where C = 1 / (hIntegralNumberOfElements - hIntegralX1)

                if (k - x <= s || u >= hIntegral(k + F_1_2) - h(k)) {

                    // Case k = 1:
                    //
                    //   The right inequality is always true, because replacing k by 1 gives
                    //   u >= hIntegral(1.5) - h(1) = hIntegralX1 and u is taken from
                    //   (hIntegralX1, hIntegralNumberOfElements].
                    //
                    //   Therefore, the acceptance rate for k = 1 is P(accepted | k = 1) = 1
                    //   and the probability that 1 is returned as random value is
                    //   P(k = 1 and accepted) = P(accepted | k = 1) * P(k = 1) = C = C / 1^exponent
                    //
                    // Case k >= 2:
                    //
                    //   The left inequality (k - x <= s) is just a short cut
                    //   to avoid the more expensive evaluation of the right inequality
                    //   (u >= hIntegral(k + 0.5) - h(k)) in many cases.
                    //
                    //   If the left inequality is true, the right inequality is also true:
                    //     Theorem 2 in the paper is valid for all positive exponents, because
                    //     the requirements h'(x) = -exponent/x^(exponent + 1) < 0 and
                    //     (-1/hInverse'(x))'' = (1+1/exponent) * x^(1/exponent-1) >= 0
                    //     are both fulfilled.
                    //     Therefore, f(x) = x - hIntegralInverse(hIntegral(x + 0.5) - h(x))
                    //     is a non-decreasing function. If k - x <= s holds,
                    //     k - x <= s + f(k) - f(2) is obviously also true which is equivalent to
                    //     -x <= -hIntegralInverse(hIntegral(k + 0.5) - h(k)),
                    //     -hIntegralInverse(u) <= -hIntegralInverse(hIntegral(k + 0.5) - h(k)),
                    //     and finally u >= hIntegral(k + 0.5) - h(k).
                    //
                    //   Hence, the right inequality determines the acceptance rate:
                    //   P(accepted | k = m) = h(m) / (hIntegrated(m+1/2) - hIntegrated(m-1/2))
                    //   The probability that m is returned is given by
                    //   P(k = m and accepted) = P(accepted | k = m) * P(k = m) = C * h(m) = C / m^exponent.
                    //
                    // In both cases the probabilities are proportional to the probability mass function
                    // of the Zipf distribution.

                    return k;
                }
            }
        }