.. _inference:

Range Inference
===============

In Tensor Comprehensions, loops are implicit and output tensors sizes
are inferred. Concretely, when a user write something like the
following stencil:

.. code::

    A(i) = B(i) + C(i-2)

Tensor Comprehensions must deduce the range of values that :code:`i` iterates
over, and from this we derive the size of the tensor :code:`A`. We also must
infer the size of loops that reduce, for example, the implicit loop
over :code:`k` in a matrix multiply:

.. code::

    def matmul(float(M, K) A, float(K, N) B) -> (C) {
        C(m, n) +=! A(m, r_k) * B(r_k, n)
    }


If this range inference procedure fails to match the user's intent,
then in the first case the output will not be the size they expect,
and in the second case the output *values* will be incorrect, as
either too few or too many terms were included in the summation.

To program productively, one must be able to mentally emulate the
written code on the abstract machine defined by the language
semantics. Regardless of how well-defined it is, if your source code
doesn't do what you think it does, you have a bug. Thus it's critical
that users build a mental model of how we infer ranges, and are able
to do range inference in their heads as they write code. If this
requires more thought than writing explicit loops would, we have
failed.

With this in mind, we eschew heavy-duty mathematical tools, and take a
more straightforward approach for the sake of usability. We infer
ranges only in cases where we feel they are obvious, and require
explicit annotation elsewhere. We intend to fine-tune this boundary in
the future depending on what users find surprising.

The Range Inference Algorithm
-----------------------------

To a good first approximation, we infer rectangular ranges that are as
large as possible without reading out of bounds on the inputs. If
there is more than one way to do this, we throw an error and require
explicit annotation using a :code:`where` clause.

This rule is sufficient to understand the matrix multiply case
above. Maximizing the range of :code:`i` gives it the range of the number of
rows of B. Similarly maximizing the range of :code:`j` gives it the range of
the columns of C. :code:`k` is used twice, so making :code:`k` as large as
possible gives it the lesser of the number of columns of B and the
number of rows of C. These in turn are constrained to be equal by the
type signature of the function (they are both :code:`K`).

Now consider a stencil:

.. code::

    A(i) += B(i + k) * K(k)

There are multiple ways in which we could maximize the ranges of :code:`i`
and :code:`k`. If we first maximize :code:`i`, we might say that it ranges over
the entirety of B. This forces :code:`k` to take on a single value only,
which will not result in the output one expects from a convolution (it
ignores most of the kernel!). If we first maximize :code:`k`, so that it
ranges over the entirety of :code:`K`, then in order to not read out of bounds
the range of :code:`i` must be smaller, and we get an output that is
slightly smaller than the input. This is the behavior we prefer.

In order to make this unambiguous without requiring explicit
annotation in this simple case, range inference proceeds in rounds. We
maintain a set of unresolved variables. Initially it contains all
variables not constrained with an explicit :code:`where` clause. In each
round, we consider all the tensor argument expressions that contain a
single unresolved variable, and construct a boolean expression that
states the access is not out-of-bounds. We then use tools from
Halide (:code:`solve_for_inner_interval`) to find the maximal range for
the variable that satisfies this condition, given the ranges of
variables already resolved in previous rounds. If the variable was
already constrained this round by some other tensor access, we take
the intersection of the inferred ranges.

For the stencil above, in the first round we ignore the expression
:code:`B(i + k)` because it contains multiple unresolved variables. We use
the expression :code:`K(k)` to deduce a range for :code:`k`. In the second
round, :code:`B(i + k)` now contains a single unresolved variable, and we
use the already-inferred range of :code:`k` to deduce a maximal range for
:code:`i`.

Preconditions
-------------

While this procedure produces easy-to-justify ranges for each
variable, it is not sufficient to ensure that no out-of-bounds reads
occur. For example consider:

.. code::

    A(i, j) = B(i) * C(i + j) * D(j)

In the first round, we can resolve both :code:`i` and :code:`j`, using :code:`B(i)` and
:code:`D(j)` respectively. This guarantees that there are no out-of-bounds reads
on :code:`B` and :code:`D`, and defines the size of :code:`A`. The size of :code:`C` could be used to further
restrain the range of :code:`i` or :code:`j` to avoid out-of-bounds reads on :code:`C`. But there
is no second round since the ranges for all variables have been resolved; and
even if there would be, the expression :code:`i + j` would not translate into a
unique rectangular shape for :code:`i` and :code:`j`. So we are left with the requirement
that :code:`C` is large enough to cover all reads. In some cases we can statically
prove this condition (for example if the sizes of :code:`B`, :code:`C`, and :code:`D` are known
constants). In general we emit a compile-time warning.

We intend to add runtime checking of these conditions in the future. However, for some preconditions, it is never desirable to
check them at runtime. Consider a lookup table:

.. code::

    def lut(float(J) B, float(I) C) -> A {
      A(i) = B(C(i))
    }

The range of :code:`i` is constrained by its use in :code:`C`, but we are left with
the additional precondition that the *values* in :code:`C` over that range
never exceed the size of :code:`B`. Checking this at runtime would require an
expensive bounds check in the inner loop. As with the previous case,
we currently just emit a compile-time notification that this unchecked
precondition exists. The user can suppress it and make this code
unconditionally safe by explicitly clamping the expression :code:`C(i)` to be
within the bounds of :code:`B` like so:

.. code::

    def lut(float(J) B, float(I) C) -> A {
      A(i) = B(max(min(C(i), J-1), 0))
    }

Though of course this also has a performance impact.

Worked Examples
---------------

We now describe how range inference reasons about several more complex
examples. If you find a confusing case, feel free to request that we
add it to this section.

Inverted indexing
~~~~~~~~~~~~~~~~~

.. code::

    def reverted(float(I) B) -> A {
      A(i) = B(10 - i)
    }

From the use in :code:`B`, range inference constructs the condition:

.. code::

  0 <= 10 - i < I

This is rearranged by Halide's solver to give the following range:

.. code::

  9 - I <= i < 11

Strided indexing with constant stride
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code::

    def subsample_2(float(I) B) -> A {
      A(i) = B(2*i)
    }

From the use, range inference constructs the condition:

.. code::

  0 <= 2*i < I

This is rearranged into:

.. code::

  0 <= i < (I+1)/2

Note that the division is integer division, which rounds towards
negative infinity in Tensor Comprehensions and Halide.

Strided indexing with offsets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code::

    def average_pool_2(float(I) B) -> A {
      A(i) = B(2*i) + B(2*i + 1)
    }

From the uses, range inference constructs the conditions:

.. code::

  0 <= 2*i < I
  0 <= 2*i + 1 < I

These are rearranged into:

.. code::

  0 <= i < (I+1)/2
  0 <= i < I/2

The intersection of these two ranges is:

.. code::

  0 <= i < I/2

One could write the equivalent code:

.. code::

    def average_pool_2(float(I) B) -> A {
      A(i) = B(2*i + k) where k in 0:2
    }

The syntax :code:`where k in lb:ub` is inclusive of the lower bound
and exclusive of the upper bound: it constrains the range of :code:`k` to
the integers between :code:`lb` and :code:`ub-1`, here :code:`k` may only take the values
:code:`0` and :code:`1`.

Since the variable :code:`k` is already resolved by the where clause. From the
use of :code:`i`, range inference constructs the condition:

.. code::

  0 <= 2*i + k < I

We eliminate k by taking the conjunction of the expression over all
values of k, using Halide's ``and_condition_over_domain``. For the
lower bound, :code:`k == 0` dominates. For the upper bound, :code:`k == 1`
dominates.

.. code::

  0 <= 2*i && 2*i + 1 < I

This is equivalent to the intersection of the conditions in the
unrolled case, and so we get the same result:

.. code::

  0 <= i < I/2

Strided indexing with dynamic stride
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code::

    def subsample_2(float(I) B, int(1) S) -> A {
      A(i) = B(S(0)*i)
    }

The value of :code:`S(0)` is not fixed until runtime, so we can't resolve the
size of :code:`A` or the range of the loop. This case throws a compile-time
error. A :code:`where` clause that defines the range of :code:`i` is required.

Constant fill using an exists clause
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code::

    def constant_fill(float(N) A, float c) -> B {
      B(i) = c where exists A(i)
    }

An :code:`exists` clause allows you to add additional expressions to the range
inference process without having the expressions affect the actual computation.
In this example, it allows you to say that :code:`B(i)` should have the same size as
:code:`A(i)`, but be filled with a constant value :code:`c`.  That is, you should infer the
range of :code:`B(i)` to exist at all the places where :code:`A(i)` exists.
It is equivalent to writing the expression :code:`true ? c : A(i)`, but with
clearer intentions.