## High probability Histogram testing Sample complexity $$ n^{1/3} k^{1/3} \epsilon^{-4/3} \log^{1/3} (1/\delta) + n^{1/2} \epsilon^{-2} \log^{1/2} (1/\delta) + \epsilon^{-2} \log (1/\delta) $$ ### Assume all intervals have equal size 1. Run independent test 2. Sum over the weight of the $i$-th element of each interval to get a marginal distribution on the domain $[n/k]$. Assert the distribution is uniform. ### Assume all intervals are within the size of $n/k$. Sub-divide each element in the interval such that the size of the interval exceeds $n/k$. Then, we truncate the distribution to the first $n/k$ elements after division. Then, we lose at most half of the total variation distance. ### Decompose large intervals If an interval has $x \cdot n/k$ elemments, we break it into $x$ sub-intervals. This introduces at most $k$ extra sub-intervals. Then, we assert the distribution after decomposition is an $O(k)$ histogram. Then, treat one interval as one element, we get a distribution on the domain at most $[2k]$. We put "interval" around elements corresponding to the sub-intervals. Assert the distribution on $[2k]$ is uniform on these intervals (there should exactly k such interval). Here, we no longer invoke the independent tester. Instead, we use closeness tester directly. ## Unknown interval Maybe we can make some minor improvement if $k<=n^{1/4}$. Through some rough computation, it seems the sample complexity is: $$ \sqrt{n} \epsilon^{-2} + n^{1/8} k \epsilon^{-5/2} \log^{1/2} k + n^{1/4} k \epsilon^{-1} \log(n \epsilon^{-1}) $$ The idea is to use the lipchitzness of the high probability histogram tester constructed above (I assume it "inherits" the property from the fundament closeness tester). First, we divide the domain into $n^{1/4} k \epsilon^{-1}$ intervals, where each of them has mass within $O( \frac{\epsilon}{n^{1/4}k})$. Then, we run histogram tester to assert whether all the intervals are uniform. Of course, they are not - there are bad intervals. So after we get the test statistics, we are allowed to remove at most $k$ intervals, where the total number of samples do not exceed $n^{1/4} \epsilon^{-1}$. Then, we assert the distribution on the domain $[n^{1/4} k \epsilon^{-1}]$ is a k histogram, where we try over all possible partitions (also, we are allowed to exclude at most k bad intervals).