In Order to test that every possible tree shape can be obtained we need to know 2 things:
Question 1 is easy answered by checking values for Catalan numbers - wiki , OEIS
Question 2 can be approached via ranking which loosely means assigning a position in a to a combinatorial object, while very often there’s no natural way of ranking objects each sound ranking approach is unambiguous.
We’ll use approach with Weight Sequences described in Enumerating, Ranking and Unranking Binary Trees paper.
In short it works like this:
leaf
(i.e. null
) nodes that are reachable via its
left link.The interesting property of such weight sequences is that value at position n
is at most n
,
so then those weight sequences can be reinterpreted as integers using Factoradic number system
The boilerplate will be functional interface mapping from a Tree<?>
to a list of
integers, inside there’ll be creation of a list and recursive call to do the real work.
WeightSequence class outline
public class WeightSequence implements Function<Tree<?>, List<Integer>> {
@Override
public List<Integer> apply(Tree<?> tree) {
List<Integer> sequence = new ArrayList<>();
recursive(tree, sequence);
return sequence;
}
sequence
is used for keeping state so far in recursion, so it needs to be passed around to calls.
Inside recursive call idea is as follows. Since we’re about to count null
values we need to just return 1
whenever null is encountered - that’s termination condition.
Now for a recursive call go to left and then to right now, while additionally append obtained value
to sequence
each time left recursive call returns.
Once right recursive call returns return sum of leftCount
and rightCount
.
WeightSequence recursive part
private int recursive(Tree<?> root, List<Integer> sequence) {
if (root == null) {
return 1;
}
int leftCount = recursive(root.left(), sequence);
sequence.add(leftCount);
int rightCount = recursive(root.right(), sequence);
return leftCount + rightCount;
}
In factoradic “number system” (it’s not a real number system because it doesn’t have finite amount of available digits)
for n-th position in a number a $n$
possible symbols can be chosen.
So for some digit $x$
in position $y$
we’d denote this as $x_y$
and then $x \in [0, .., y-1]$
.
This should be checked in code to avoid unsound inputs.
Also, because there’s only one possible value in 0th position we’ll skip it because it’s always the same and doesn’t contribute to final value.
Given following digits in their respective positions
Then interpreting this as an integer works the following way
In more compact form:
To make it practical for recreational implementation let’s use Long
type to represent $X$
,
but this requires additional consideration on data type sizes.
So MAX_VALUE for Long
is $2^63-1
and maximal value representable with $N$
factoradic digits is $(N+1)! - 1
. Following that we need to include check for a maximal value of N
for which $(N+1)! - 1 < 2^63 - 1
to avoid trying to convert sequences that won’t fit into Long
type.
The max value for which above inequality holds is $N=19$
, hence we’ll not convert sequences longer than 19 elements.
As usual let’s start with some boilerplate for functional interface.
FactoradicConverter class outline
public class FactoradicConverter implements ToLongFunction<List<Integer>>, Function<List<Integer>, Long> {
@Override
public Long apply(List<Integer> factoradic) {
// LOGIC GOES HERE
}
@Override
public long applyAsLong(List<Integer> value) {
return 0;
}
}
Now let’s translate reasoning from the previous section into logic inside apply
method.
FactoradicConverter class outline
public Long apply(List<Integer> factoradic) {
if(factoradic.size()>19)throw new ArithmeticException();
if(factoradic.stream().anyMatch(integer->integer< 0))throw new ArithmeticException();
long result=0L;
long base=1L;
for(int i=0;i<factoradic.size();i++){
base*=(i+1);
int curr=factoradic.get(i);
if(curr>i+1)throw new ArithmeticException("Wrong factoradic number");
result+=base*curr;
}
return result;
}
At this point we’re able to convert each Tree
instance into a positive integer using intermediate
weight sequence representation which is interpreted as factoradic number.
This way we can easily keep track of different Tree shapes seen so far in a unit test.
So the test inputs would be:
$n$
of a tree$C_n$
Let’s step aside from the topic drawing random trees for a moment and just focus on a problem of collecting all
$X$
different equiprobable values from some collection. This problem is called Coupon Collectors Problem
and let’s call the number of trails needed to achieve full collection $K_X$
.
So we have:
Where $H_X$
is Xth Harmonic Number
Now our input parameter is size $n$
of a Tree
which results in $C_n$
different shapes to collect
and expected number of trails will be $E(K_{C_n}$
.
Tabular results for this are bellow and here’s WolframAlpha computation for the same.
$N$ |
$C_N$ |
$H_{C_N}$ |
$C_N H_{C_N}$ |
---|---|---|---|
1 | 1 | 1 | 1 |
2 | 2 | 1.5 | 3 |
3 | 5 | 2.28333 | 11 |
4 | 14 | 3.25156 | 46 |
5 | 42 | 4.32674 | 182 |
6 | 132 | 5.4638 | 721 |
7 | 429 | 6.63984 | 2848 |
8 | 1430 | 7.84299 | 11215 |
9 | 4862 | 9.06652 | 44081 |
10 | 16796 | 10.3061 | 173102 |
11 | 58786 | 11.5589 | 679501 |
12 | 208012 | 12.8226 | 2667248 |
Above table clearly indicates that the test will quickly become unpractical in terms of required memory and running time,
but it can still be executed for small values of tree size - N
.
First let’s create test outline in Java and auxiliary table with Catalan numbers for convenience,
test will be parameterised along int
parameter treeSize
. Also because in case of wrong implementation
this test could run indefinitely a Timeout
will be placed as a guard against such scenario.
All Tree Shapes are possible test outline
// NOTE: Catalan numbers array is filled in correctly starting from idx = 0,
// while test will be using values from 1
private static final int[] catalan = new int[]{1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012, 742900, 2674440, 9694845};
@Timeout(value = 60, unit = SECONDS)
@Property
void allTreeShapesArePossibleForGivenSize(@ForAll @IntRange(min = 1, max = 13) int treeSize) {
}
While doing the test we’ll need ability to use FactoradicConverter
and WeightSequence
,
they’ll be placed in test class as additional fields.
New fields for test logic
private final WeightSequence weightSequence = new WeightSequence();
private FactoradicConverter converter = new FactoradicConverter();
Last thing is logic of the test itself, it’ll need a Set
to keep track of the distinct shapes seen so far
and a loop to repeatedly draw a tree until all shapes expected are drawn.
All Tree Shapes are possible test logic
Set<Long> distinctShapes = new TreeSet<>();
int count = catalan[treeSize];
while (distinctShapes.size() < count) {
Tree<Integer> tree = generator.andThen(trimmer).apply(treeSize);
long treeId =
weightSequence
.andThen(converter)
.apply(tree);
assertThat(this.treeSize.apply(tree)).isEqualTo(treeSize);
distinctShapes.add(treeId);
}
assertThat(distinctShapes.size()).isEqualTo(count);
In this second unit test it is established that the implementation is following theoretical expectation and allows to get all possible tree shapes for a given size. This was achieved via additional helper functions of generating weight sequence for a tree and converting it to integer by reinterpeting as a factoradic.
Next note will cover testing expectation of a uniform probability distribution of all possible trees. Note that in case the probabilites were uneven this test would still pass, although it’d affect the running time. Intuitively running time distortion would be more pronounced the more uneven the real distribution would be.