Getting Started

Tensor Comprehensions (TC) integrates with PyTorch for both training and inference. Using TC with PyTorch, you can express an operator in Einstein notation and get a fast CUDA implementation of that layer with just a few lines of code (examples below).

Here are a few cases where TC can be useful:

  • specialize your layer for uncommon tensor sizes and get better performance than off-the-shelf libraries, or
  • experiment with layer fusion, for example fusing group convolution, ReLU, and FC, or
  • synthesize new layers and get an efficient kernel automatically, or
  • synthesize layers for tensors with unconventional memory layouts.

TC makes it easy to synthesize CUDA kernels for such cases and more. By providing TC integration with PyTorch, we hope to make it easy for PyTorch users to express their operations and bridge the gap between research and engineering.

Installation

See the Installation Guide for instructions.

Example

For demonstration purposes, we illustrate a simple matmul operation backed by TC.

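import tensor_comprehensions as tc
import torch

# TC definition: the sizes M, K, and N are taken from the input tensors.
mm = """
def matmul(float(M,K) A, float(K,N) B) -> (output) {
    output(m, n) +=! A(m, r_k) * B(r_k, n)
}
"""
# Define the TC with naive (untuned) mapping options.
TC = tc.define(mm, tc.make_naive_options_factory())
# A is 3x4 and B is 4x5, so the result C is 3x5.
A, B = torch.randn(3, 4).cuda(), torch.randn(4, 5).cuda()
C = TC.matmul(A, B)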

With a few lines of code, you can get a functional CUDA implementation for an operation expressed in TC. Note, however, that this simple example uses naive mapping options and is not expected to be fast. Read the documentation to find out more.
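
To make the index notation concrete, here is a minimal pure-Python sketch of how a TC matmul statement can be read as a loop nest, using the same input shapes as the example (3x4 and 4x5). It is not how TC generates or runs code. The helper name matmul_reference is hypothetical and exists only for illustration. The sketch assumes the usual reading of the notation: an index that appears only on the right-hand side (here r_k) is summed over, and `+=!` sets each output element to zero before accumulating.

```python
def matmul_reference(A, B):
    """Loop-nest reading of: output(m, n) +=! A(m, r_k) * B(r_k, n).

    A is an M x K list of rows, B is a K x N list of rows.
    Hypothetical helper for illustration; not part of the TC API.
    """
    M, K = len(A), len(A[0])
    K_b, N = len(B), len(B[0])
    if K != K_b:
        raise ValueError(f"inner dimensions differ: {K} vs {K_b}")

    # '+=!' : every output element starts at 0, the identity for a sum.
    output = [[0.0] * N for _ in range(M)]

    # m and n appear on the left-hand side, so they index the output.
    for m in range(M):
        for n in range(N):
            # r_k appears only on the right-hand side, so it is reduced.
            for r_k in range(K):
                output[m][n] += A[m][r_k] * B[r_k][n]
    return output


# Same shapes as the example above: (3, 4) times (4, 5) gives (3, 5).
A = [[float(i + j) for j in range(4)] for i in range(3)]
B = [[float(i * j) for j in range(5)] for i in range(4)]
C = matmul_reference(A, B)
print(len(C), len(C[0]))  # prints: 3 5
```

A reference like this can be handy for checking a kernel's result on small inputs. The shape check also shows why the inner dimension K of both inputs has to agree.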