Aronson sequences (an ongoing research project): pt.2

Aronson sequences (an ongoing research project): pt.2

- 23 mins

Introduction

So it’s been a few wild months and I’ll be starting a PhD in Computer Vision real soon (wait, what?!). I’ll be takeing the time I have until then to keep working on blog-posts, as these have me packaging my thoughts in a way which focus and clarify them. This post is a continuation of the last one, as promised.

Generating Aronson sequences

If you need a reminder of past events, be sure to look back to the first part of this blog-post series.

The initial goal in this post is to answer in a satisfactory way the last question provided in the previous blog post, regarding the set of $(A_T(\to)\cap A_T(\gets))$. I thought up a few possible strategies for attempting to address this question.

The simplest one strategy would be the following: supposing we had all of the time in the world, we could begin to generate random integer sequences, and for each such sequence check whether it is in either $A_T(\to)$, or $A_T(\gets)$ or both. The only additional program this strategy would require then is a verifier.

A different, slightly more sophisticated possible strategy for finding $(A_T(\to)\cap A_T(\gets))$ is the following: suppose we had a magic black box which knows how to generate sequences from either set. We could then begin enumerating strings one-by-one from each set, keeping track of all enumerated sequences for each set and checking for each new sequence generated from either set whether it was already generated by the other set.

What are the problems with each of these approaches (which one would take longer? Which requires more memory?)

Let’s begin then with the task of writing an efficient verifier for $A_T(\to)$ and see which problems or concepts come up.

Verifier for Aronson sequences

A verifier is a computer program which provides ‘proof’ that a certain input is a solution to a given problem. More specifically, the infamous NP-class of computational problems can be characterised using properties of the verifier for a given problem (read more about this here). A simple example would be the problem of factorizing an integer, with a verifier in this case being a program which multiplies inputs and compares to the integer of choice. The following simple program constitutes a verifier for factorization

from functools import reduce
from operator import mul

def verifier(inp, *args):
    return reduce(mul,args) == inp

>>> verifier(30,3,2,5)
True
>>> verifier(30,3,2,4)
False

In the Aronson case, given an input sequence, a verifier would need to generate a sentence from the sequence and check that the sequence elements map to occurences of the letter of choice (‘T’ in this case). Such a verifier has approx. $\Theta(N)$ time complexity, as it would need to generate the relevant ordinal for each element in the sequence, and concatenate this to what was previously generated. We ignore additional operations (ordinal representations grow in $\Theta( \log k)$ for input k, as shown in the previous blog-post, meaning concatenation doesn’t happen in constant time) for the sake of simplicity.

A simple verifier which “gets the job done” would be the following:

def verifier1(letter, indices, forward=True):

    sequence = get_sequence(letter, indices, forward)
    # Reverse the sequence if forward is False
    if not forward:
        sequence = sequence[::-1]
    # Check if the input letter matches at the specified indices
    try:
        return all(sequence[i - 1] == letter for i in indices)
    except IndexError:
        return False

Where we treat the subroutine get_sequence() as a black box function which generates the underlying sentence from a sequence, ignoring the actual implementation of this function (it is not particularly interesting or relevant for our purposes). The purpose of the try/except blocks is to prevent errors resulting from sequences with elements larger than the length of the resulting generating sentence (i.e. think “T is the first, one hundredth letter”).

What else can be said about this specific implementation? The all() function does lazy evaluation (see this), with the bulk of computation being done by the get_sequence() subroutine. In the case of wrong inputs (say an extremely long input sequence with a wrong element at the beginning), this function would then be extremely inefficient! We can take this into account by implementing a second verifier which adresses this shortcoming.

Second verifier

# consts
START = " is the "
END = " letter"

def verifier2(letter, indices, forward=True):
    # Determine the initial sequence based on the direction (forward or reverse)
    s = (letter + START).replace(" ", "") if forward else END.replace(" ", "")[::-1]
    for idx in indices:
        # Add the ordinal to the sequence
        s += (n2w(idx) if forward else n2w(idx)[::-1])
        try:
            # Check if the letter matches the expected position
            if s[idx - 1] != letter:
                return False
        except IndexError:
            # If the index is out of range, return False
            return False
    return True

Now let’s write a simple script for which the second verifier will run much faster than the first, by generating the first n elements in Aronson’s sequence (we’ll use a helper function for this), and then “contaminating” these by planting consecutive elements [..,x,x+1,..] somewhere within the sequence. This would correspond to 2 consecutive ‘t’s within the generating sentence- making the sentence most-likely false and leading to verifier2() exiting early. Here’s some intuition for this methodology:

#Aronson
def aronson(n, letter, forward=True, inp=None):
    return list(islice(agen_better(letter, forward=forward, inp=inp), n - len(inp)))

>>> seq = aronson(100,'t') 
>>> intervals = [seq[i] - seq[i-1] for i in range(1,len(seq))]
>>> sum([interval == 1 for interval in intervals])
#Number of consecutive elements within first 100 terms of Aronson's sequence 
0

And here’s then is the test script with the use-case for which verifier2() runs much faster than verifier1()

def test_verifiers(n, letter *verifiers):
    #generate first n indices in forward Aronson sequence
    seq = aronson(n,letter)
    execution_times = {}

    # Loop over verifiers
    for _ in range(n):
        # Generate a 'contaminated' copy for both verifiers at each iteration
        seq_copy = copy.deepcopy(seq)
        random_int = random.randint(1, n - 1)
        #modify original sequence
        seq_copy[(random_int - 1):(random_int + 1)] = range((seq_copy[random_int] - 1), (seq_copy[random_int] + 1))

        # Loop over provided verifiers
        for verifier in verifiers:
            start_time = timeit.default_timer()
            verifier(letter, seq_copy)
            end_time = timeit.default_timer()

            # Store the execution time for the verifier
            execution_times.setdefault(verifier.__name__, []).append(end_time - start_time)

    # Calculate and print average times
    for verifier_name, times in execution_times.items():
        avg_time = sum(times) / n
        print(f"Average execution time for {verifier_name}: {avg_time:.3f} seconds")

>>> test_verifiers(256, 't', verifier1, verifier2)
Average execution time for verifier1: 0.009 seconds
Average execution time for verifier2: 0.004 seconds

We now change these function signature names to verifier_slow and verifier_fast as a result. But wait, there’s a problem with verifier_fast() we didn’t talk about. Notice the following

>>> verifier_fast('l',[1,23])
False

where we already established in the previous blogpost that “L is the first, twenty-third letter” is a correct sentence! This means then that verifier_fast() doesn’t work correctly for forward-referring sequences (how about verifier_slow())? We use then the following final verifier version

def verifier(letter, indices, forward=True, forward_referring=True):
    if forward_referring:
        #slower, correct for edge-cases
        return verifier_slow(letter, indices, forward)
    else:
        #faster, demands building sentence sequentially from ordinals one by one
        return verifier_fast(letter, indices, forward)

Notice that we can utilize verifier_fast() only if we have prior-knowledge that the sequence of question is not forward-referring, as the process of checking this would be as computationally expensive as the runtime of verifier_slow() (why?). We take on a ‘pessimistic’ view at the moment, that we may potentially investigate forward-referring sequences, and use the faster version only when we are assured that it is correct.

Verifier use-cases

Generating variations on Aronson sequences from input

Our first use-case for our implemented verifier has to do with extending the generating algorithm proposed in the previous blogpost . We saw already that this algorithm looks backwards to generate new elements, while we take into account additional self-referring (‘T is the tenth, twelfth,…’) and forward-referring sequences in our characterisation of generalized Aronson sequences. We will talk more into depth in the next blog-post (!) about possible approaches for generating such sequences, as this is a non-trivial task, but before doing so, we notice the following:

Say we have an initial self-referring sequence which is not self-contained, for example [10,12] (check this!). Then this sequence can be extended indefinitely with backward-referring elements in a way which could not impact the resulting sequence’s correctness (is this true also in the case of forward-referring sequences? Why not?).

Using the previous example, we would get:

‘T is the tenth (self-referring), twelfth (self-referring), seventeenth (backward-referring), twenty-fourth (backward-referring).. letter’

This allows us to generate new Aronson sequences using our original algorithm, by starting from some input sequence and then generating new elements on the fly from there!

Here’s how this could be done

# class for error messages related to the Verifier
class VerifierError(Exception):
    def __init__(self, message="Verifier failed", input_data=None):
        self.message = message
        self.input_data = input_data  # Store the input that caused the failure
        super().__init__(self.message)

    def __str__(self):
        # Customize the string representation to include input data
        return f"{self.message}: {self.input_data}"

#updated Aronson
def aronson(n, letter, forward=True, inp=None, forward_referring=True):
    if inp is None:
        return list(islice(agen_better(letter, forward=forward), n))
    else:
        return inp + list(islice(agen_better(letter, forward=forward, inp=inp, 
            forward_referring=forward_referring), n - len(inp)))

#updated generating function
def agen_better(letter, forward=True, inp=None, forward_referring = True):
    
    def update_from_inp(forward, inp, letter, s):

        if not verifier(letter, inp, forward, forward_referring):
            raise VerifierError(input_data=inp)
        for i in inp:
            s += (n2w(i) if forward else n2w(i)[::-1])
        idx = max(inp)  # inp[-1] if monotonically rising
        s = s[idx:]
        return s, idx

    # Initialize sequence based on direction
    s = (letter + START).replace(" ", "") if forward else END.replace(" ", "")[::-1]
    idx = 0
    if inp:
        # update to given point
        s, idx = update_from_inp(forward, inp, letter, s)
    while True:
        #we saw this already
        try:
            # Find the next occurrence of the letter. If not inside s.find() returns -1, so repeats infinitely
            idx_rel = 1 + s.find(letter)
        except ValueError:
            # safety measure
            break  # Stop if letter is not found
        idx += idx_rel
        yield idx
        # Adjust the sequence for the next iteration
        s = s[idx_rel:] + (n2w(idx) if forward else n2w(idx)[::-1])

Notice we use the verifier to check that the initial input sequence is correct, as otherwise the resulting sequence won’t be correct either.

Here’s a possible way to use the extended algorithm:

>>> n = 5
>>> letter = 't'
>>> inp_seqs = [[], [1, 11], [10, 12]]
>>> for inp in inp_seqs:
#       none of these sequences is forward-referring
        extend = aronson(n,letter,forward=True,inp=inp,forward_referring=False)
        print_sequence(letter, extend)

#Aronson
t is the first, fourth, eleventh, sixteenth, twenty-fourth letter

#First variation generated by removing second element 
t is the first, eleventh, eighteenth, twenty-fourth, twenty-eighth letter

#Second variation generated by giving self-referring input, converges with previous 
t is the tenth, twelfth, seventeenth, twenty-fourth, twenty-eighth letter

And we can now generate as many variations as we want on Aronson’s sequence using the following procedure


def generate_variations(n, letter, forward=True, inp=None):
    # where to start generating variations from
    orig = aronson(n, letter, forward, inp)
    start_idx = -1
    #keep track of generated sequences and which idxs were removed
    stack = [(orig, start_idx)]
    while stack:
        cur, cur_idx = stack.pop()
        yield cur
        # only remove indices forward
        for idx in range(cur_idx + 1, n - 1):
            # slice out current index to generate a new sequence
            inp = cur[:idx] + [cur[idx + 1]]
            try:
                #extend the series to length n
                extend = aronson(n, letter, forward, inp)
            except VerifierError:
                # verifier returned False- meaning sequence is incorrect and can't be extended
                continue
            stack.append((extend, idx))

>>> seen = []
>>> [print(seq) for seq in generate_variations(5, 't')
       if seq not in seen]

#these are all valid, Aronson sequences
[1, 4, 11, 16, 24]
[1, 4, 11, 24, 26]
[1, 4, 16, 21, 25]
[1, 4, 16, 25, 27]
[1, 11, 18, 24, 28]
[1, 11, 18, 28, 30]
[1, 11, 24, 30, 32]
[1, 11, 24, 32, 36]
[4, 11, 19, 25, 29]
[4, 11, 19, 29, 31]
[4, 11, 25, 30, 32]
[4, 11, 25, 32, 36]

Finding the intersection of sets

Now we can use our verifier to look for some sequences in both sets of sequences $(A_T(\to)\cap A_T(\gets))$! Going with the previously mentioned first strategy, a good place to start would be to look for more singleton sequences, as the search space over these sequences is bounded.

Let’s understand this intuitively: there must be a minimal m for which the following two hold:

  1. $n>= m -> len(ordinal(n)) >= len(ordinal(m))$
  2. $m > len(“t is the” + ordinal(m) + “letter”)$

with this m constituting an upper bound on our search for a singleton sequence, as any larger n>=m cannot be mapped to an index within the sentence.

We can find a minimal upper bound using the following code, where we start from a heuristic upper bound (m=100) and then ‘fix’ it to something smaller.

START = " is the "
END = " letter" #approximately same length as START
def find_minimal_upper_bound():
    lower = len(START.replace(" ", ""))
    upper = 100  # For n >= 100, len(ordinal(n)) >= len("one-hundred")

    for m in range(2 * lower, upper):
        len_ord_m = len(n2w(m))
        # Check that all larger ordinals are at least as long
        is_candidate = all([len(n2w(k)) >= len_ord_m for k in range(m + 1, upper)])
        # Ensure m is long enough relative to surrounding text
        is_long_enough = (m >= 2 * lower + len_ord_m)

        if is_candidate and is_long_enough:
            return m
>>> find_minimal_upper_bound()
40

Now that we have this upper-bound, we can bound the search space and use our verifiers to find singleton sequences within the intersection of sets $(A_T(\to)\cap A_T(\gets))$

def find_singletons(letter):
    upper = find_singleton_upper_bound()
    return {(i,) for i in range(upper) if verifier(letter, (i,), True) and verifier(letter, (i,), False)}

>>> singletons = find_singletons('t')
>>> singletons
{(19,), (4,)}
>>> for seq in singletons:
#       safety measure
        assert(verifier(letter, seq, True))
        assert(verifier(letter, seq, False))
#No AssertionError

Surprise: we’ve found a new singleton sequence ([19]) in $(A_T(\to)\cap A_T(\gets))$! We notice that this sequence is forward-referring in both directions. However, the search range become in fact unbounded when we look at longer and longer sequences, meaning that the same strategy is unfeasible if we want to find arbitrarily large non-singleton sequences. Let’s now rephrase the original question:

To attempt to answer this question we will use an algorithm based on the second previously discussed strategy:

The motivation for this algorithm is the following:

>>> n=5
>>> letter = 't'
>>> sum([verifier(letter,s) for s in power_set(aronson(letter,n))])/2**n
0.40625

meaning that a sufficiently large percentage of sub-sequences of the Aronson sequence are correct Aronson sequences themselves! The code implementation looks like this, where we always choose to use the slower verifier to take into account forward-referring sequences as well.

def intersect_aronson_sets(n, letter):
    def power_set(seq):
        return set(chain.from_iterable(combinations(seq, r) for r in range(len(seq) + 1)))

    # Initialize sets to accumulate power sets for both directions
    power_sets = [set(), set()]
    # Create generators for both directions
    gen_forward = generate_variations(n, letter, True)
    gen_backward = generate_variations(n, letter, False)
    for seq_fwd, seq_bwd in zip(gen_forward, gen_backward):
        power_sets[0].update(power_set(seq_fwd))
        power_sets[1].update(power_set(seq_bwd))

    # Compute intersection of both power sets
    common_subsets = power_sets[0] & power_sets[1]
    # Filter by verifier
    return {i for i in intersection if verifier(letter,i,forward=True) and verifier(letter,i,forward=False)}

>>> print(intersect_aronson_sets(n, letter))
{(19,), (4,), ()}

No luck, it seems that we will have to work harder than this to find non-singleton sequences in the intersection of sets (if such exist at all).

Conclusion

We discovered a valuable tool for our search problem in the form of a verifier, and have shown how it helps us generate new, correct Aronson sequences, as well as received more insight into the nature of the opening enquiry, about the intersection of sets of Aronson sequences. Not bad for a day’s work.

Until next time!

comments powered by Disqus