Algorithm Problem #2 Follow-Up #3: Testing that every tree shape can be obtained

Context

In Order to test that every possible tree shape can be obtained we need to know 2 things:

How many shapes can a Tree with N nodes take?
How to tell shapes apart in a way that’s convenient for programming?

Question 1 is easy answered by checking values for Catalan numbers - wiki , OEIS

Question 2 can be approached via ranking which loosely means assigning a position in a to a combinatorial object, while very often there’s no natural way of ranking objects each sound ranking approach is unambiguous.

We’ll use approach with Weight Sequences described in Enumerating, Ranking and Unranking Binary Trees paper.

In short it works like this:

Traverse nodes in infix order left to right
For each node output the amount of leaf (i.e. null) nodes that are reachable via its left link.

The interesting property of such weight sequences is that value at position n is at most n, so then those weight sequences can be reinterpreted as integers using Factoradic number system

Weight Sequence

The boilerplate will be functional interface mapping from a Tree<?> to a list of integers, inside there’ll be creation of a list and recursive call to do the real work.

WeightSequence class outline

public class WeightSequence implements Function<Tree<?>, List<Integer>> {

    @Override
    public List<Integer> apply(Tree<?> tree) {
        List<Integer> sequence = new ArrayList<>();

        recursive(tree, sequence);

        return sequence;
    }

sequence is used for keeping state so far in recursion, so it needs to be passed around to calls.

Inside recursive call idea is as follows. Since we’re about to count null values we need to just return 1 whenever null is encountered - that’s termination condition.

Now for a recursive call go to left and then to right now, while additionally append obtained value to sequence each time left recursive call returns.

Once right recursive call returns return sum of leftCount and rightCount.

WeightSequence recursive part

private int recursive(Tree<?> root, List<Integer> sequence) {
    if (root == null) {
       return 1;
    }

    int leftCount = recursive(root.left(), sequence);
    sequence.add(leftCount);
    int rightCount = recursive(root.right(), sequence);

    return leftCount + rightCount;
}

Interpreting Weight Sequences as Integers

In factoradic “number system” (it’s not a real number system because it doesn’t have finite amount of available digits) for n-th position in a number a $n$ possible symbols can be chosen.

So for some digit $x$ in position $y$ we’d denote this as $x_y$ and then $x \in [0, .., y-1]$ . This should be checked in code to avoid unsound inputs.

Also, because there’s only one possible value in 0th position we’ll skip it because it’s always the same and doesn’t contribute to final value.

Given following digits in their respective positions

$$d_1 d_2 d_3 d_4 ... $$

Then interpreting this as an integer works the following way

$$ X = d_1 1! + d_2 2! + ... + d_n n! $$

In more compact form:

$$ X = \sum_{i = 1}^{n} d_i i! $$

To make it practical for recreational implementation let’s use Long type to represent $X$ , but this requires additional consideration on data type sizes.

So MAX_VALUE for Long is $2^63-1 and maximal value representable with $N$ factoradic digits is $(N+1)! - 1. Following that we need to include check for a maximal value of N for which $(N+1)! - 1 < 2^63 - 1 to avoid trying to convert sequences that won’t fit into Long type. The max value for which above inequality holds is $N=19$ , hence we’ll not convert sequences longer than 19 elements.

Implementing Factoradic Converter

As usual let’s start with some boilerplate for functional interface.

FactoradicConverter class outline

public class FactoradicConverter implements ToLongFunction<List<Integer>>, Function<List<Integer>, Long> {

    @Override
    public Long apply(List<Integer> factoradic) {
        // LOGIC GOES HERE
    }

    @Override
    public long applyAsLong(List<Integer> value) {
        return 0;
    }
}

Now let’s translate reasoning from the previous section into logic inside apply method.

FactoradicConverter class outline

public Long apply(List<Integer> factoradic) {
    if(factoradic.size()>19)throw new ArithmeticException();
    if(factoradic.stream().anyMatch(integer->integer< 0))throw new ArithmeticException();

    long result=0L;

    long base=1L;

    for(int i=0;i<factoradic.size();i++){
    base*=(i+1);

    int curr=factoradic.get(i);
    if(curr>i+1)throw new ArithmeticException("Wrong factoradic number");

    result+=base*curr;
    }

    return result;
}

Putting this together into unit test

At this point we’re able to convert each Tree instance into a positive integer using intermediate weight sequence representation which is interpreted as factoradic number.

This way we can easily keep track of different Tree shapes seen so far in a unit test.

So the test inputs would be:

The size $n$ of a tree
The n-th Catalan number $C_n$

Let’s step aside from the topic drawing random trees for a moment and just focus on a problem of collecting all $X$ different equiprobable values from some collection. This problem is called Coupon Collectors Problem and let’s call the number of trails needed to achieve full collection $K_X$ .

So we have:

$$ E(K_X) = X * H_X $$

Where $H_X$ is Xth Harmonic Number

Now our input parameter is size $n$ of a Tree which results in $C_n$ different shapes to collect and expected number of trails will be $E(K_{C_n}$ .

Tabular results for this are bellow and here’s WolframAlpha computation for the same.

$N$	$C_N$	$H_{C_N}$	$C_N H_{C_N}$
1	1	1	1
2	2	1.5	3
3	5	2.28333	11
4	14	3.25156	46
5	42	4.32674	182
6	132	5.4638	721
7	429	6.63984	2848
8	1430	7.84299	11215
9	4862	9.06652	44081
10	16796	10.3061	173102
11	58786	11.5589	679501
12	208012	12.8226	2667248

Above table clearly indicates that the test will quickly become unpractical in terms of required memory and running time, but it can still be executed for small values of tree size - N.

First let’s create test outline in Java and auxiliary table with Catalan numbers for convenience, test will be parameterised along int parameter treeSize. Also because in case of wrong implementation this test could run indefinitely a Timeout will be placed as a guard against such scenario.

All Tree Shapes are possible test outline

// NOTE: Catalan numbers array is filled in correctly starting from idx = 0,
// while test will be using values from 1
private static final int[] catalan = new int[]{1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012, 742900, 2674440, 9694845};

@Timeout(value = 60, unit = SECONDS)
@Property
void allTreeShapesArePossibleForGivenSize(@ForAll @IntRange(min = 1, max = 13) int treeSize) {

}

While doing the test we’ll need ability to use FactoradicConverter and WeightSequence, they’ll be placed in test class as additional fields.

New fields for test logic

private final WeightSequence weightSequence = new WeightSequence();
private FactoradicConverter converter = new FactoradicConverter();

Last thing is logic of the test itself, it’ll need a Set to keep track of the distinct shapes seen so far and a loop to repeatedly draw a tree until all shapes expected are drawn.

All Tree Shapes are possible test logic

Set<Long> distinctShapes = new TreeSet<>();

int count = catalan[treeSize];

while (distinctShapes.size() < count) {
Tree<Integer> tree = generator.andThen(trimmer).apply(treeSize);

long treeId =
weightSequence
.andThen(converter)
.apply(tree);

assertThat(this.treeSize.apply(tree)).isEqualTo(treeSize);

distinctShapes.add(treeId);
}

assertThat(distinctShapes.size()).isEqualTo(count);

Summary

In this second unit test it is established that the implementation is following theoretical expectation and allows to get all possible tree shapes for a given size. This was achieved via additional helper functions of generating weight sequence for a tree and converting it to integer by reinterpeting as a factoradic.

Next note will cover testing expectation of a uniform probability distribution of all possible trees. Note that in case the probabilites were uneven this test would still pass, although it’d affect the running time. Intuitively running time distortion would be more pronounced the more uneven the real distribution would be.