Tensor Comprehensions
tc::ExecutionEngine Class Reference

#include <execution_engine.h>


Classes

struct  ExecutorInfo
 

Public Member Functions

 ExecutionEngine ()=default
 
void define (const std::string &language)
 
void define (const std::vector< lang::TreeRef > &treeRefs)
 
void addTC (const std::string &tc)
 
std::vector< const DLTensor * > inferOutputTensorInfo (const std::string &name, const std::vector< const DLTensor * > &inTensorPtrs)
 
lang::TreeRef treeForFunction (const std::string &name)
 
size_t compile (const std::string &name, const std::vector< const DLTensor * > &inputs, const MappingOptions &options)
 Returns a handle for the compiled kernel.
 
Duration run (size_t handle, const std::vector< const DLTensor * > &inputs, const std::vector< DLTensor * > &outputs, bool profile=false, std::function< bool(const ExecutorInfo *)> pruningFunction=[](const ExecutorInfo *){return false;})
 
void uncheckedRun (size_t handle, const std::vector< const void * > &inputs, const std::vector< void * > &outputs)
 
void clear (size_t handle)
 

Private Member Functions

size_t getHandle (const std::string &name, const std::vector< const DLTensor * > &inputsInfo, const MappingOptions &options)
 
std::unique_ptr< ExecutorInfo > makeExecutorInfo (const std::string &name, const std::vector< const DLTensor * > &inputsInfo, const MappingOptions &options)
 
size_t emplaceExecutor (std::unique_ptr< ExecutorInfo > p)
 

Private Attributes

std::mutex executorInfoMutex
 For thread safety, perform all cheap operations under this lock.
 
std::vector< std::unique_ptr< ExecutorInfo > > executors_
 
std::map< std::string, lang::TreeRef > tcNameMap_
 
size_t uidCounter = 0
 

Detailed Description

The goal of this API is to provide a pathway for executing the kernels of multiple TCs: given a language string that may define several TCs, users should be able to run any of them simply by calling run with the name of the function and the inputs to run on.
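For concreteness, a minimal usage sketch. It assumes the caller has already described its data as dlpack DLTensor descriptors and that MappingOptions exposes a naive factory; the TC definition itself is illustrative, not part of this page.

#include <execution_engine.h>

#include <vector>

void runMatmul(const std::vector<const DLTensor*>& inputs,
               const std::vector<DLTensor*>& outputs) {
  // One language string may define any number of TCs.
  static constexpr auto lang = R"TC(
    def matmul(float(M, K) A, float(K, N) B) -> (C) {
      C(m, n) +=! A(m, k) * B(k, n)
    }
  )TC";

  tc::ExecutionEngine engine;
  engine.define(lang);  // registers every TC found in the string

  // Compile "matmul" for these concrete input shapes; the returned handle
  // identifies the compiled executor. makeNaiveMappingOptions() is assumed
  // here as a placeholder for whatever options the caller actually wants.
  size_t handle = engine.compile(
      "matmul", inputs, tc::MappingOptions::makeNaiveMappingOptions());

  // Execute the compiled kernel and fill the outputs.
  engine.run(handle, inputs, outputs);
}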

Constructor & Destructor Documentation

tc::ExecutionEngine::ExecutionEngine ( )
default

Member Function Documentation

void tc::ExecutionEngine::addTC ( const std::string &  tc)
void tc::ExecutionEngine::clear ( size_t  handle)
size_t tc::ExecutionEngine::compile ( const std::string &  name,
const std::vector< const DLTensor * > &  inputs,
const MappingOptions &  options 
)

Returns a handle for the compiled kernel.

void tc::ExecutionEngine::define ( const std::string &  language)

Create ExecutionEngine::tcNameMap_ from the language string passed in; a single string may define multiple TCs.

void tc::ExecutionEngine::define ( const std::vector< lang::TreeRef > &  treeRefs)

Create ExecutionEngine::tcNameMap_ from already-parsed TC trees; supports multiple TCs.
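As a sketch of the name-based lookup that define enables (the TC definitions are illustrative), a single call registers every function in the string, each of which can then be retrieved or compiled by name:

static constexpr auto lang = R"TC(
  def relu(float(N) I) -> (O) {
    O(n) = fmax(I(n), 0)
  }
  def scale(float(N) I) -> (O) {
    O(n) = 2 * I(n)
  }
)TC";

tc::ExecutionEngine engine;
engine.define(lang);  // tcNameMap_ now holds entries for "relu" and "scale"
lang::TreeRef reluTree = engine.treeForFunction("relu");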

size_t tc::ExecutionEngine::emplaceExecutor ( std::unique_ptr< ExecutorInfo >  p)
private
size_t tc::ExecutionEngine::getHandle ( const std::string &  name,
const std::vector< const DLTensor * > &  inputsInfo,
const MappingOptions &  options 
)
private
std::vector<const DLTensor*> tc::ExecutionEngine::inferOutputTensorInfo ( const std::string &  name,
const std::vector< const DLTensor * > &  inTensorPtrs 
)

Get the output tensor metadata that the calling framework can use to allocate storage for the outputs.
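Continuing the earlier sketch, the returned DLTensor pointers carry dlpack metadata (ndim, shape, dtype) but no data; the framework is expected to allocate buffers matching that geometry before calling run:

// Query output metadata for "matmul" given concrete inputs. Allocation is
// framework-specific and only indicated by the comment in the loop body.
std::vector<const DLTensor*> outInfo =
    engine.inferOutputTensorInfo("matmul", inputs);
for (const DLTensor* t : outInfo) {
  // t->ndim and t->shape give the geometry of one output, t->dtype its
  // element type; allocate a matching DLTensor here and pass it to run.
}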

std::unique_ptr<ExecutorInfo> tc::ExecutionEngine::makeExecutorInfo ( const std::string &  name,
const std::vector< const DLTensor * > &  inputsInfo,
const MappingOptions &  options 
)
private
Duration tc::ExecutionEngine::run ( size_t  handle,
const std::vector< const DLTensor * > &  inputs,
const std::vector< DLTensor * > &  outputs,
bool  profile = false,
std::function< bool(const ExecutorInfo *)>  pruningFunction = [](const ExecutorInfo *){return false;} 
)

Run a compiled TC on the given tensor inputs and fill the outputs with the result. The TC is looked up by the handle returned by compile. If profile is true, the kernel runtime is returned.

The pruning function returns true if the run should not proceed (e.g. if too few threads are mapped, which would likely result in catastrophic performance). In that case, run returns Duration::max().
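A sketch of profiling with a pruning callback. Since this page does not document ExecutorInfo's members, the predicate below is deliberately trivial; a real one would inspect the executor and veto runs expected to perform badly. Duration is assumed to live in namespace tc.

// Profile one run, pruning nothing.
tc::Duration time = engine.run(
    handle,
    inputs,
    outputs,
    /*profile=*/true,
    [](const tc::ExecutionEngine::ExecutorInfo*) { return false; });
if (time == tc::Duration::max()) {
  // A true-returning predicate pruned the run; no kernel was executed.
}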

lang::TreeRef tc::ExecutionEngine::treeForFunction ( const std::string &  name)
inline
void tc::ExecutionEngine::uncheckedRun ( size_t  handle,
const std::vector< const void * > &  inputs,
const std::vector< void * > &  outputs 
)

This is the "low-latency" mode in which raw pointers to data in GPU address space are propagated directly. No tensor metadata is available for checking, so it is the user's responsibility to ensure that shapes and strides match; if they do not, a segfault is likely.
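A sketch of that path; dA, dB and dC stand for caller-owned device pointers (e.g. from cudaMalloc) and are assumptions of this example:

// Nothing is checked here: the pointers must already match the shapes and
// strides the kernel was compiled for.
void lowLatencyRun(tc::ExecutionEngine& engine, size_t handle,
                   const void* dA, const void* dB, void* dC) {
  std::vector<const void*> rawInputs = {dA, dB};
  std::vector<void*> rawOutputs = {dC};
  engine.uncheckedRun(handle, rawInputs, rawOutputs);
}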

Member Data Documentation

std::mutex tc::ExecutionEngine::executorInfoMutex
private

For thread safety, perform all cheap operations under this lock.

std::vector<std::unique_ptr<ExecutorInfo> > tc::ExecutionEngine::executors_
private
std::map<std::string, lang::TreeRef> tc::ExecutionEngine::tcNameMap_
private
size_t tc::ExecutionEngine::uidCounter = 0
private

The documentation for this class was generated from the following file:
execution_engine.h