In Order to test that every possible tree shape can be obtained we need to know 2 things:

- How many shapes can a Tree with N nodes take?
- How to tell shapes apart in a way that’s convenient for programming?

Question 1 is easy answered by checking values for Catalan numbers - wiki , OEIS

Question 2 can be approached via **ranking** which loosely means
assigning a position in a to a combinatorial object, while very often there’s no
natural way of ranking objects each sound ranking approach is unambiguous.

We’ll use approach with Weight Sequences described in Enumerating, Ranking and Unranking Binary Trees paper.

In short it works like this:

- Traverse nodes in infix order left to right
- For each node output the amount of
`leaf`

(i.e.`null`

) nodes that are reachable via its left link.

The interesting property of such weight sequences is that value at position `n`

is at most `n`

,
so then those weight sequences can be reinterpreted as integers using Factoradic number system

The boilerplate will be functional interface mapping from a `Tree<?>`

to a list of
integers, inside there’ll be creation of a list and recursive call to do the real work.

`WeightSequence class outline`

```
public class WeightSequence implements Function<Tree<?>, List<Integer>> {
@Override
public List<Integer> apply(Tree<?> tree) {
List<Integer> sequence = new ArrayList<>();
recursive(tree, sequence);
return sequence;
}
```

`sequence`

is used for keeping state so far in recursion, so it needs to be passed around to calls.

Inside recursive call idea is as follows. Since we’re about to count `null`

values we need to just return `1`

whenever null is encountered - that’s termination condition.

Now for a recursive call go to left and then to right now, while additionally append obtained value
to `sequence`

each time left recursive call returns.

Once right recursive call returns return sum of `leftCount`

and `rightCount`

.

`WeightSequence recursive part`

```
private int recursive(Tree<?> root, List<Integer> sequence) {
if (root == null) {
return 1;
}
int leftCount = recursive(root.left(), sequence);
sequence.add(leftCount);
int rightCount = recursive(root.right(), sequence);
return leftCount + rightCount;
}
```

In factoradic “number system” (it’s not a real number system because it doesn’t have finite amount of available digits)
for n-th position in a number a `$n$`

possible symbols can be chosen.

So for some digit `$x$`

in position `$y$`

we’d denote this as `$x_y$`

and then `$x \in [0, .., y-1]$`

.
This should be checked in code to avoid unsound inputs.

Also, because there’s only one possible value in 0th position we’ll skip it because it’s always the same and doesn’t contribute to final value.

Given following digits in their respective positions

$$d_1 d_2 d_3 d_4 ... $$

Then interpreting this as an integer works the following way

$$ X = d_1 1! + d_2 2! + ... + d_n n! $$

In more compact form:

$$ X = \sum_{i = 1}^{n} d_i i! $$

To make it practical for recreational implementation let’s use `Long`

type to represent `$X$`

,
but this requires additional consideration on data type sizes.

So MAX_VALUE for `Long`

is `$2^63-1`

and maximal value representable with `$N$`

factoradic digits is `$(N+1)! - 1`

. Following that we need to include check for a maximal value of N
for which `$(N+1)! - 1 < 2^63 - 1`

to avoid trying to convert sequences that won’t fit into `Long`

type.
The max value for which above inequality holds is `$N=19$`

, hence we’ll not convert sequences longer than 19 elements.

As usual let’s start with some boilerplate for functional interface.

`FactoradicConverter class outline`

```
public class FactoradicConverter implements ToLongFunction<List<Integer>>, Function<List<Integer>, Long> {
@Override
public Long apply(List<Integer> factoradic) {
// LOGIC GOES HERE
}
@Override
public long applyAsLong(List<Integer> value) {
return 0;
}
}
```

Now let’s translate reasoning from the previous section into logic inside `apply`

method.

`FactoradicConverter class outline`

```
public Long apply(List<Integer> factoradic) {
if(factoradic.size()>19)throw new ArithmeticException();
if(factoradic.stream().anyMatch(integer->integer< 0))throw new ArithmeticException();
long result=0L;
long base=1L;
for(int i=0;i<factoradic.size();i++){
base*=(i+1);
int curr=factoradic.get(i);
if(curr>i+1)throw new ArithmeticException("Wrong factoradic number");
result+=base*curr;
}
return result;
}
```

At this point we’re able to convert each `Tree`

instance into a positive integer using intermediate
weight sequence representation which is interpreted as factoradic number.

This way we can easily keep track of different Tree shapes seen so far in a unit test.

So the test inputs would be:

- The size
`$n$`

of a tree - The n-th Catalan number
`$C_n$`

Let’s step aside from the topic drawing random trees for a moment and just focus on a problem of collecting all
`$X$`

different equiprobable values from some collection. This problem is called Coupon Collectors Problem
and let’s call the number of trails needed to achieve full collection `$K_X$`

.

So we have:

$$ E(K_X) = X * H_X $$

Where `$H_X$`

is Xth Harmonic Number

Now our input parameter is size `$n$`

of a `Tree`

which results in `$C_n$`

different shapes to collect
and expected number of trails will be `$E(K_{C_n}$`

.

Tabular results for this are bellow and here’s WolframAlpha computation for the same.

`$N$` |
`$C_N$` |
`$H_{C_N}$` |
`$C_N H_{C_N}$` |
---|---|---|---|

1 | 1 | 1 | 1 |

2 | 2 | 1.5 | 3 |

3 | 5 | 2.28333 | 11 |

4 | 14 | 3.25156 | 46 |

5 | 42 | 4.32674 | 182 |

6 | 132 | 5.4638 | 721 |

7 | 429 | 6.63984 | 2848 |

8 | 1430 | 7.84299 | 11215 |

9 | 4862 | 9.06652 | 44081 |

10 | 16796 | 10.3061 | 173102 |

11 | 58786 | 11.5589 | 679501 |

12 | 208012 | 12.8226 | 2667248 |

Above table clearly indicates that the test will quickly become unpractical in terms of required memory and running time,
but it can still be executed for small values of tree size - `N`

.

First let’s create test outline in Java and auxiliary table with Catalan numbers for convenience,
test will be parameterised along `int`

parameter `treeSize`

. Also because in case of wrong implementation
this test could run indefinitely a `Timeout`

will be placed as a guard against such scenario.

`All Tree Shapes are possible test outline`

```
// NOTE: Catalan numbers array is filled in correctly starting from idx = 0,
// while test will be using values from 1
private static final int[] catalan = new int[]{1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012, 742900, 2674440, 9694845};
@Timeout(value = 60, unit = SECONDS)
@Property
void allTreeShapesArePossibleForGivenSize(@ForAll @IntRange(min = 1, max = 13) int treeSize) {
}
```

While doing the test we’ll need ability to use `FactoradicConverter`

and `WeightSequence`

,
they’ll be placed in test class as additional fields.

`New fields for test logic`

```
private final WeightSequence weightSequence = new WeightSequence();
private FactoradicConverter converter = new FactoradicConverter();
```

Last thing is logic of the test itself, it’ll need a `Set`

to keep track of the distinct shapes seen so far
and a loop to repeatedly draw a tree until all shapes expected are drawn.

`All Tree Shapes are possible test logic`

```
Set<Long> distinctShapes = new TreeSet<>();
int count = catalan[treeSize];
while (distinctShapes.size() < count) {
Tree<Integer> tree = generator.andThen(trimmer).apply(treeSize);
long treeId =
weightSequence
.andThen(converter)
.apply(tree);
assertThat(this.treeSize.apply(tree)).isEqualTo(treeSize);
distinctShapes.add(treeId);
}
assertThat(distinctShapes.size()).isEqualTo(count);
```

In this second unit test it is established that the implementation is following theoretical expectation and allows to get all possible tree shapes for a given size. This was achieved via additional helper functions of generating weight sequence for a tree and converting it to integer by reinterpeting as a factoradic.

Next note will cover testing expectation of a uniform probability distribution of all possible trees. Note that in case the probabilites were uneven this test would still pass, although it’d affect the running time. Intuitively running time distortion would be more pronounced the more uneven the real distribution would be.