Aronson sequences (an ongoing research project): pt.1

Monday. January 27, 2025 - 15 mins

Introduction

Wow! A new blog-post sequence, after I didn’t even come close to finishing the last one (still have like 7 posts to write about the Fourier Transform).

I’m starting this one though in order to document in real-time what feels like my first dive into research in the realms of theoretical computer science/theory of computation/etc… This is very exciting for me, and I think of the act of documentation as equivalent to new parents taking a picture of their newborn every day, or documenting every new word which comes out of their toddler’s mouth.

Anyway, the stimulus to this new sequence was the following event: my parents called me the other day in order to throw my stuff out of a storage space, among which is included the book “Metamagical Themas” (recommended!) by Douglas Hofstadter. I used to be kind of obsessed and fascinated by this guy around 4 years ago, and particularly remember an essay of his about self-referential sentences, one of which is known as the Aronson sequence, and is in fact the subject at hand. One of my early programming projects was writing a failed implementation of this thing in Python, or more specifically asking for help on Stack Overflow.

Alrighty, let’s define Aronson’s sequence first thing!

Aronson’s Sequence

Aronson’s sequence is generated from the following sentence

“T is the first, fourth, eleventh, sixteenth… letter (in this sentence, not counting spaces or commas).”

For the sake of brevity we will omit the part in parenthesis when talking about this sentence from now on, as it is only a syntactical clarification as to what the sentence ‘means’. Everything which follows in this blog-post will then be equally true when taking the clarification into account.

The sequence itself is then the list of enumerated ordinals generated by the sentence

1, 4, 11, 16…

Some observations about this sentence and its underlying sequence:

It is self-referential, in the sense that it counts the appearances of the letter ‘t’ within itself.
Moreover, it attempts to count every occurence of the letter ‘t’. We refer to such a sequence as being self-contained, and will give a precise definition of this term shortly.
The sequence is semantically correct, or ‘meaningful’ from the natural language perspective.
Each ordinal number in the generating sentence refers backwards, meaning to an earlier index where the letter ‘t’ appeared.

This last fact gives us some intuition as to whether the generating sentence (and sequence) are infinite or not. Intuitively- if the gap between ordinals and their referenced index grows indefinitely as the sentence expands, then the sequence must be infinite, since the probability of an arbitrarily long substring lacking ‘t’s approaches zero.¹.

Self-containment: a precise definition

In the case that the Aronson sequence is infinite, then it cannot count all occurrences of ‘t’ within itself, making self-containment impossible. However, we can extend the definition of this notion to the infinite case, using the idea of sentence prefixes.

We define

The prefix of a generating sentence is any finite left-to-right substring of the sentence.

In the case of Aronson’s sequence, “T”, “T is the”, “T is the first, fourth, eleventh,” are all such prefixes

Now we define self-containment as the property that for any prefix of the generating sentence the sequence can be extended arbitrarily to capture all ‘t’s within that prefix.

This broader definition allows us now to treat Aronson’s sequence as being self-contained, while staying true in case the sequence is in fact finite!

Point of reflection: what about the ‘t’s occurring within the word “letter”? Given that the sequence is infinite, are those captured at any point? Does this matter for our definition of self-containment?

Implementation in Python

The following wonderfully simple code, due to Michael Branicky, generates the Aronson sequence

from num2words import num2words
from itertools import islice

#helper function for stripping spaces, commas and dashes from ordinals
def n2w(n):
    os = num2words(n, ordinal=True).replace(" and", "")
    return os.replace(", ", "").replace(" ", "").replace("-", "")

def agen(): # generator of terms
    s, idx = "tisthe", 0
    while True:
        idx_rel = 1 + s.index("t")
        idx += idx_rel
        yield idx
	#Look for next index
        s = s[idx_rel:] + n2w(idx)

#enumerate first N terms of the sequence
N=10
print(list(islice(agen(), N)))

This code yields the result

[1, 4, 11, 16, 24, 29, 33, 35, 39, 45]

Can this code give us any intuition as to whether the Aronson sequence is infinite or not? Going back to the notion of the gap between an ordinal and the index it refers to, and the effect of this on the ‘longevity’ of the sequence, we can simply inspect what happens to the substring s over time, as well as the number of ‘t’s within each such substring! We compute the first 10000 terms of the sequence, revealing

#A(n) (len(s), #t in s)
A(1): (6, 2),
A(10): (60, 13),
A(100): (1239, 127),
A(1000): (22878, 2452),
A(10000): (289786, 36618),
A(100000): (3848300, 398603)

Meaning both the substring s and the number of t’s within the substring are monotonically increasing. Looking at the ratio between the two

#A(n) num(t's)/len(s)
A(1): 33.33%
A(10): 21.67%
A(100): 10.25%
A(1000): 10.72%
A(10000): 12.64%
A(100000): 10.36%

Reveals that it stays more or less constant (with a lower bound of 1/10), meaning that both s and the number of ts have approximately the same growth order of O(n). Seems like Aronson’s sequence won’t be ending anytime soon…

Generalizations of Aronson’s sequence

Let’s start thinking of ways to generalize Aronson’s sequence! The most obvious generalization to begin with is to other letters in the alphabet. This gives a ‘family’ (set) of Aronson sequences over all letters, where we demand that each such sequence be self-contained. Note that for some letters the sequence will consume itself immediately (anything outside of the letters ‘isthelr’ specifically), making these ‘outliers’ boring in a sense (“J is the first letter”).

However, some of the other inliers yield interesting generating sentences! For example, the following generating sentence

‘L is the first, twenty-third letter.’

is semantically true, self-contained, and one of its ordinals looks-forward to an index beyond itself.

We will define such sequences as ‘forward-referring’, whereas Aronson’s sequence was ‘backwards-referring’ (look back to observations about the sequence if necessary).

Quite obviously, expanding agen() to generate sequences over different letter, via

def agen(letter): # generator of terms
    s, idx = letter + "isthe", 0
    while True:
        idx_rel = 1 + s.index(letter)
        idx += idx_rel
        yield idx
	#Look for next index
        s = s[idx_rel:] + n2w(idx)

is not enough to generate this self-contained sequence, because agen(letter) generates new terms only by looking backwards, meaning we must come up with a different algorithm to generate forward-referring sequences².

Next, let’s stay for a moment with the letter ‘t’, but relax other constraints to obtain a larger family of Aronson-based sequences, via the following definition

$$A_T(\to) = \left\{ s \mid s = "\text{T is the } X, Y, Z \dots \text{ letter}" \text{ where X, Y, Z are ordinals,} \\ \text{and the sentence is semantically correct with regard to the arrow-direction} \right\} $$

This set now contains any prefix of the original generating sentence (‘T is the first letter’, ‘T is the first, fourth letter, etc…’), and more interestingly- it allows us to generate entirely new sequences!

Variations on Aronson’s sequence

Two methods to generate variations on Aronson’s sequence are now proposed:

First, we can begin with a subset of the original, such as

t is the first, (skip fourth), eleventh, eighteenth, twenty-fourth, twenty-eighth, thirtieth, thirty-fourth, fortieth, forty-second, forty-sixth, fifty-second, fifty-fourth letter

Which diverges from the original sequence!

Or we can similarly flip two terms in a way which does not break the semantic correctness

t is the first, eleventh, fourth, eighteenth, twenty-fourth, thirtieth, thirty-fourth, thirty-sixth, fortieth, forty-sixth, forty-eighth, fifty-second, fifty-fifth letter

Notice that this last sequence is most probably also infinitely long, but not monotonically-increasing.

The notion that we can always omit or flip terms in backwards-referring sequence reveals that there are infinite variations on the Aronson sequence, meaning $|A_T(\to)| \geq \aleph_0$ !

Self-referring and converging sequences

Another interesting example of an infinite generating sentence is

t is the tenth, twelfth, seventeenth, twenty-fourth, twenty-eighth, thirtieth… letter

This sentence is obviously not self-contained, and seems to be a ‘parallel’ sequence to Aronson’s, in the sense it never seems to converge with the original. Notice also that the first ordinal number here (‘tenth’) is self-referring, different from either the backwards or forwards-referring sequences previously presented.

It does however converge with the following variation:

t is the first, eleventh, eighteenth, twenty-fourth, twenty-eighth, thirtieth… letter

Where we refer to both of these sequences as having the same ‘tail’. At some point we will attempt to use the notion of tails to construct equivalence classes over the set of generalized Aronson sequences.²

Reverse Aronson sequence

In case you got bored with different classes and types of generating sequences in the set of generalized Aronson sequences, here’s another new idea to shake things up: let’s flip the direction of the arrow in the set definition to define a new set of sequences! Thus our new set is $A_T(\gets)$, obtained by starting at the end of the generating sentence, counting the letter of choice backwards, and iteratively inserting ordinals in a way which allows for the generating sentence to be semantically correct.

The self-contained reverse Aronson sequence is then

“(counting backwards) t is the …thirteenth, eleventh, fourth, third, letter.”

With the parenthesis being once more a clarification to make the generating sentence semantically correct. This generates the sequence

3, 4, 11, 13…

Question: is this generating sentence also infinite?

Notice that this sequence can in fact be easily generated with the previous algorithm agen() via a simple modification. The resulting implementation would be

def agen_rev():
    s, idx = 'letter', 0
    while True:
        try:
	#count from the end
            idx_rel = 1 + s[::-1].index('t')
        except ValueError:
            break
        idx += idx_rel
        yield idx
	#slice part which was scanned
        s = n2w(idx) + s[:-idx_rel:]

#enumerate first N terms of the sequence
N=10
print(list(islice(agen_rev(), N)))
[3, 4, 11, 16, 24, 29, 33, 35, 39, 45]

However, this code is highly inefficient, as we’ve seen that the substring s grows linearly in the number of terms to be computed! Good thing we did that previous analysis.

Better implementation of agen_rev()

A more efficient implementation is to flip only the ordinal number computed at each iteration and add it to what has already been computed. This means that the only operational overhead is flipping the ordinals added to s at each iteration.

Bad news: these also increase monotonically with respect to the number of digits, meaning they are unbounded (ten, one-hundred and ten, one-thousand one-hundred and ten, etc…). Let’s discover the order of growth once more by investigating the maximum length of an ordinal number within a given range:

for i in range(1, 12):
    val_low, val_high = 10**(i-1), 10**i
    print((val_low, val_high), "->", max([len(n2w(i)) for i in range(val_low, val_high)]))

#range -> #max(length(ordinal))
(1, 10) -> 7
(10, 100) -> 14
(100, 1000) -> 26
(1000, 10000) -> 40
(10000, 100000) -> 47
(100000, 1000000) -> 59

Meaning that this growth rate is sublinear, or much better! This makes sense when we think of the growth of ordinal representation being equivalent to decimal representation (the number of bits necessary to represent a number). We can guess then that the order in this case is O(logn).

We can now modify the original agen() function to compute the Aronson sequence as efficiently in either direction, using simple branching logic

# Extension of Michael Branicky's implementation in OEIS, allowing for generating backwards sequences
# see https://oeis.org/A005224
def agen_better(forward=True):
    # Initialize sequence based on direction
    s = "tisthe" if forward else "rettel"
    idx = 0
    while True:
        try:
            # Find the next occurrence of the letter. If not inside s.find() returns -1, so repeats infinitely
            idx_rel = 1 + s.find('t')
        except ValueError:
            break  # Stop if letter is not found
        idx += idx_rel
        yield idx
        # Adjust the sequence for the next iteration
        s = s[idx_rel:] + (n2w(idx) if forward else n2w(idx)[::-1])

Let’s verify that this algorithm runs faster than the previous by timing how long each implementation takes to generate N=1000 terms of the reverse Aronson sequence. The results are

islice(agen_rev(), N) runtime: 0.050075 seconds
islice(agen_better(False), N) runtime: 0.040534 seconds

Meaning the new implementation is approximately 20% better. Notice that the two algorithms have different orders of complexity (O(n) vs. O(logn)), meaning the performance gap will grow with N.

Conclusion

So long for the first post in this sequence. I’ll be updating it daily as I discover new things about Aronson sequences, implement more efficient algorithms, etc…

Here’s are some question to conclude:

Can we generate a sequence in $A_T(\to)$ which is monotonically-decreasing and infinitely long?
$\text{Does } |A_T(\gets)| \geq \aleph_0$?
$\text{What is } (A_T(\to)\cap A_T(\gets))$?

So far we know ‘T is the fourth letter.’ is in both sets, but there might just be a spicier sequence than that (or infinitely many)!

See ya next time!

Note to self: prove this does in fact happen rigorously in the near future. ↩
Will address this in a future post ↩ ↩²