# Autotuner¶

The genetic-algorithm-based autotuner tries to optimize a TC by tuning the available mapping options.

Each autotuning session starts with a set (population) of candidate options which can be initialized randomly and/or from known starting points. Each candidate is benchmarked and the best ones have a higher chance of surviving and breeding to produce the next generation of candidates. This procedure is repeated for a pre-defined number of generations. In the end, the best candidate is returned.

At the end of each generation new candidates must be selected. Each candidate is either a combination of parent candidates (crossover) or one that survives from the previous generation. Both types are potentially randomly changed (mutation). The top candidates (elites) survive intact (without mutations) between generations.

## Parameters for Autotuning¶

The parameters that control the autotuner’s behavior are the following:

• Number of generations: The number of tuning generation to be run.
• Population size: The number of candidates in each generation.
• Number of elites: The number of best candidates that are preserved intact between generations (without any mutations).
• Crossover rate: The rate at which new candidates are bred instead of just surviving across generations.
• Mutation rate: The rate at which candidate options are randomly changed (mutated).
• Number of threads: The number of threads that are used to compile different candidates in parallel.
• GPUs: A comma separated list of GPUs (ids) to use for evaluating candidates (e.g., “0,1,2,3”).
• RNG state: The state used to seed the tuner’s RNG.
• min_launch_total_threads: Prune out kernels mapped to fewer than this many threads and block. Set this to 1 to avoid pruning.

## Caching¶

After each autotuning session the best candidates’ profiling information and compilation results are stored in a cache. They can be subsequently retrieved to seed a new autotuning session.