Confident Array Slicing and Sizing

04 Nov, 2023

You should normalize how you work with array indexes. It's one of those situations where you should adopt, learn, and memorize a convention. It's useful in problems involving the traversal and manipulation of arrays and other structures that rely on indexes. It allows you to be consistent and confident about your edge cases.

Without a convention, you have to think about off-by-one errors, midpoints, and boundaries every time you perform such operations. With a predictable convention, you don't linger over such details.

Quick, what's the middle index of an array containing 11 items?

If you're clever, you visualized this simple case and observed that there are 5 items on either side of the sixth item. If you're a programmer used to working with the popular zero-based indexing, it might take you an additional fraction of a second to shift the sixth item to index 5. You might even come up with an off-the-cuff heuristic for the midpoint of any odd-sized array: (size - 1)/2 or, more precisely, floor(size/2).

But what if I asked you to find the midpoint of a random slice in the same array, when only given the slice's starting index (which may not be at the beginning of the array) and its length? You probably have a vague idea of the path leading to a solution. You know the general idea behind a formula: start + size/2, but because of off-by-one situations, you might be unsure of the specifics.

Similar uncertainty may creep up and haunt your mental arithmetic when asked any of the following questions about a slice:

Given indexes of its first and last item, how many items does it contain?
Given its size and its last item's index, what's the slice's starting index?
Given its size and the index of its first item, what's the slice's midpoint?

These questions aren't difficult to answer. You could probably work out the right steps instinctively, but you also know that there are further details to consider regarding indices and the inclusion or exclusion of boundaries.

By adopting and sticking to a convention, you will make manipulating arrays and slices easier on yourself.

Given an array A whose indexes range from 0 to z, and a slice S of A going from index x to y inclusively, such that 0 <= x <= y <= z:

A       S
{ 0 ... [ x ... y ] ... z }

let's define:

first: the index (on the array) of the slice's first item (i.e. x).
last: the index (on the array) of the slice's last item (i.e. y).
stop: the position directly after last, i.e. stop = last + 1. This excludes stop from the slice and makes it generally non-addressable (it should be considered out of bounds) in the context of a problem.
start: synonymous with first, but serving as a terminological counterpart to stop.

A       S
{ 0 ... [ start ... last ] stop ... z }

From the above definitions, we'll mostly use start and stop as they are more useful for the remaining manipulations. Thus, if you're only given first and last in a problem, make sure to redefine them in terms of start and stop, or at least add the latter ones.

start = first
stop = last + 1
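
As a minimal Python sketch of this conversion (the helper name `to_half_open` is hypothetical, introduced only for illustration):

```python
def to_half_open(first, last):
    """Convert an inclusive (first, last) pair into a half-open (start, stop) pair."""
    start = first
    stop = last + 1  # stop is the position directly after last
    return start, stop


start, stop = to_half_open(3, 9)
print(start, stop)  # 3 10
```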

Also, note that the terminology start and stop is simply my expression of the ideas above. It's a personal convention that I'm borrowing from Python. Feel free to keep or change it to something you feel more comfortable with (e.g. begin:end). The important thing is to be clear and to remain consistent.

Determine `size`, given `start` and `stop`

size = stop - start

E.g. Given start = 3 and stop = 10, the slice's first item is at index 3 and its last item, at index 9 (last = stop - 1). Its size is 10 - 3 or 7.

Calculating `start` or `stop`

The implications of the previous formula are obvious:

start = stop  - size
stop  = start + size
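
These relations are easy to sanity-check in Python, whose range() is also half-open. The sketch below reuses start = 3 and stop = 10; the variable names are only illustrative.

```python
start, stop = 3, 10

size = stop - start                  # 7
indexes = list(range(start, stop))   # [3, 4, 5, 6, 7, 8, 9]

assert len(indexes) == size
assert indexes[0] == start           # first item
assert indexes[-1] == stop - 1       # last item

# Recover either boundary from the other one and the size.
assert start == stop - size
assert stop == start + size
```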

Slice

Let's define the notation [a:b] as representing all elements in a slice from position a up to, but not including position b. Given stop = last + 1, [start:stop] fits that convention. This should feel familiar if you're acquainted with Python's slice or range syntax.

Thus you can calculate the size of a slice [a:b] with b - a.

If the size is zero, it implies that b - a = 0, which happens when a = b. Thus the slice [a:a] is equivalent to the empty slice [].
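
For readers who want to see the [a:b] notation in action, here is a short sketch using Python's built-in list slicing. The array contents and the values of a and b are arbitrary.

```python
A = list("abcdefghij")   # 10 items, indexes 0..9
a, b = 3, 7

S = A[a:b]               # items at indexes 3, 4, 5, 6
print(S)                 # ['d', 'e', 'f', 'g']

assert len(S) == b - a   # size of [a:b] is b - a
assert A[a:a] == []      # [a:a] is the empty slice
```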

Calculating the `midpoint`

Let's define midpoint as a position that splits the slice into two parts, with the right part having at most one more element than the left.

We'll call R the value which, when added to start, gives us the midpoint. With this definition, R and midpoint can be calculated as follows:

R = floor((stop - start)/2)
midpoint = start + R

We could alternately define R as the value which, when subtracted from stop, gives us the midpoint. With that definition, the formulas become:

R = ceil((stop - start)/2)
midpoint = stop - R

Shorthand formulas for midpoint can be derived:

midpoint = floor((start + stop)/2)
midpoint = ceil((start + stop - 1)/2)

I'd suggest that floor((start + stop)/2) is arguably the easier formula to memorize. Some languages have a floor-division operator, which makes it even easier (e.g. (start + stop)//2 in Python).
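
If you'd rather not take the equivalence of these four formulas on faith, a brute-force check over small ranges is a reasonable sanity test. This is only a sketch: the helper name `midpoints` is made up for illustration, and math.ceil on a float division is fine here only because the numbers are small.

```python
import math


def midpoints(start, stop):
    """Return the set of values produced by the four midpoint formulas."""
    size = stop - start
    return {
        start + size // 2,                  # start + floor(size/2)
        stop - math.ceil(size / 2),         # stop - ceil(size/2)
        (start + stop) // 2,                # floor((start + stop)/2)
        math.ceil((start + stop - 1) / 2),  # ceil((start + stop - 1)/2)
    }


# All four formulas should agree for every slice, including empty ones.
for start in range(0, 20):
    for stop in range(start, 40):
        assert len(midpoints(start, stop)) == 1

print(midpoints(3, 10))  # {6}
```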

E.g. Given an array containing 27 items,

{ 0, ... 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, ... 26 }

What's the midpoint of the slice between index 6 and 17 inclusively?

start = 6
last = 17
stop = last + 1
stop = 18
m = floor((start + stop)/2)
m = 12

{ ... 4, 5, [ 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 ], 18 ... }
                                  ——
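
The same worked example, sketched in Python. The array simply holds its own indexes so the output is easy to compare with the diagram.

```python
A = list(range(27))            # { 0, 1, ..., 26 }

first, last = 6, 17
start, stop = first, last + 1  # 6, 18

m = (start + stop) // 2
print(m)                       # 12
print(A[start:m])              # [6, 7, 8, 9, 10, 11]
print(A[m:stop])               # [12, 13, 14, 15, 16, 17]
```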

Why favor this approach to picking the midpoint?

When a problem involves using the midpoint of slices, it should technically be possible to reach a solution with an alternative definition of its location. The one shown above is generally preferred because it has built-in consistency with the start:stop convention.

Observe that when splitting a slice [start:stop] at its midpoint, the lower half [start:midpoint] excludes midpoint, while the upper half [midpoint:stop] includes it, regardless of whether the slice is even- or odd-sized.

Consider the following even-sized slice. It technically has two middle items, C and D, but the approach I demonstrated earlier picks the second, which it automatically excludes from the lower partition while including it in the upper one.

     [ A | B | C | D | E | F ]
                   —

If we used an approach where the midpoint should be part of the lower divide, while the size difference between partitions remains at most 1, C would become the midpoint and we'd need to modify the slice formula slightly to use it, such that the lower half would be represented by [start:midpoint+1], and the upper by [midpoint+1:stop]. This is valid, but is arguably more complicated than simply favoring the higher midpoint.
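
To make the comparison concrete, here is a sketch of both choices on the six-item slice above. The lower-midpoint formula floor((start + stop - 1)/2) is my own assumption for illustration; the text above does not prescribe one.

```python
S = list("ABCDEF")
start, stop = 0, len(S)

# Higher midpoint: the convention used throughout this article.
upper_mid = (start + stop) // 2          # 3, i.e. 'D'
lower_half = S[start:upper_mid]          # ['A', 'B', 'C']
upper_half = S[upper_mid:stop]           # ['D', 'E', 'F']

# Lower midpoint (assumed formula): the slices need a +1 adjustment.
lower_mid = (start + stop - 1) // 2      # 2, i.e. 'C'
alt_lower_half = S[start:lower_mid + 1]  # ['A', 'B', 'C']
alt_upper_half = S[lower_mid + 1:stop]   # ['D', 'E', 'F']

assert lower_half == alt_lower_half
assert upper_half == alt_upper_half
```

Both approaches produce the same halves here; the difference is only in how much index arithmetic you have to carry around.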

Splitting a slice at the midpoint

Many algorithms involve dividing input slices at their midpoint (e.g. merge sort, binary search). With the described midpoint = floor((start+stop)/2) formula, it's important to remember that the midpoint will always be part of the upper slice in a split, whether the slice is even- or odd-sized.

[ start ... ] | [ midpoint ... ] stop

Splitting slices with 2 or more items

A slice that contains 2 or more items, when split at its midpoint, will always result in two slices of at least one item each.

2x    >= 2 
2x/2  >= 2/2
x     >= 1

Splitting an even-sized slice will result in two slices of equal size.

[ 2x items ] -> [ x items ] | [ x items ]

Splitting an odd-sized slice, on the other hand, will produce a right child larger than its sibling by exactly one item.

[ 2x + 1 items ] -> [ x items ] | [ x + 1 items ]
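
These size properties can be checked mechanically. The sketch below uses a hypothetical `split` helper that returns the two halves as (start, stop) pairs; the starting offset of 10 is arbitrary.

```python
def split(start, stop):
    """Split [start:stop] at its midpoint into two half-open (start, stop) pairs."""
    mid = (start + stop) // 2
    return (start, mid), (mid, stop)


for size in range(2, 50):
    (l_start, l_stop), (r_start, r_stop) = split(10, 10 + size)
    left_size = l_stop - l_start
    right_size = r_stop - r_start

    assert left_size >= 1 and right_size >= 1   # neither half is empty
    assert left_size + right_size == size       # nothing lost or duplicated
    assert right_size - left_size == size % 2   # 0 if even, 1 if odd
```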

Splitting a singleton slice

What happens when the slice to split only has one item? Take [ A ] i.e. start = x and stop = x + 1.

midpoint = floor((start + stop)/2)
    = floor((x + x + 1)/2)
    = floor((2x + 1)/2)
    = x

The midpoint is exactly at start. As we've seen, the midpoint always belongs to the right child, so the singleton element ends up in the right child slice while the left child is the empty slice.

[ A ] = [start:midpoint] + [midpoint:stop]
      = [start:start] + [start:stop]
      = [] + [ A ]

This property is important to remember when establishing the base case of certain algorithms that involve shrinking slices. Splitting a singleton slice yields a right child identical to the original slice, so if you fail to account for it, you may enter infinite recursion (or loops). In most problems, you can handle this case with this condition:

if stop - start <= 1:
    # return
    # or return stop - start
    # or return [start:stop]
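
As one possible illustration, here is a sketch of a merge sort that works on [start:stop] index pairs and uses that base case. It is a plain textbook-style version written for this example, not a tuned implementation; the function name and default arguments are my own choices.

```python
def merge_sort(items, start=0, stop=None):
    """Return a sorted list of the items in items[start:stop]."""
    if stop is None:
        stop = len(items)

    # Base case: empty or singleton slices are already sorted.
    # Without it, a singleton would keep splitting into [] + itself.
    if stop - start <= 1:
        return items[start:stop]

    mid = (start + stop) // 2
    left = merge_sort(items, start, mid)   # [start:mid]
    right = merge_sort(items, mid, stop)   # [mid:stop]

    # Merge the two sorted halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

Because every slice of 2 or more items splits into two non-empty halves, each recursive call works on a strictly smaller slice and the recursion is guaranteed to reach the base case.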

Finding the kth-to-last item of a slice

If given stop, the index of the kth-to-last item (for 1 <= k <= size) can be retrieved with

index = stop - k

such that

last_item = stop - 1
second_to_last = stop - 2
third_to_last = stop - 3
etc...

Thus, to access the kth item from the end of the slice, you can index the array with

A[stop - k]

If the "slice" happens to actually be the entire array, stop has the same value as the size. The kth item from the end of the array thus becomes A[size - k]. Some programming languages (e.g. Python) take advantage of this property and offer a shorthand notation: A[-k].
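
A short Python sketch of both cases. The array contents and the slice boundaries are arbitrary values chosen for illustration.

```python
A = list(range(100, 110))   # indexes 0..9 hold 100..109
start, stop = 2, 8          # slice covers 102..107

k = 2
print(A[stop - k])          # 106, the second-to-last item of the slice
assert A[stop - 1] == A[start:stop][-1]   # last item of the slice

# When the slice is the whole array, stop equals the size.
size = len(A)
assert A[size - k] == A[-k]               # both are 108
```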