LLVM Environment Reference

LLVM is a production-grade compiler used throughout industry. It defines a machine-independent intermediate representation (IR) and comprises a family of tools with frontends for C, C++, OpenCL, and many other languages.

CompilerGym exposes the LLVM IR optimizer for reinforcement learning through an LlvmEnv environment.

Installation 

The LLVM environments are self-installing and work out of the box. A pack of required runtime data is downloaded and cached on first use of the environments.

Datasets 

We provide several datasets of open-source LLVM-IR benchmarks for use:

Dataset	Num. Benchmarks [1]	Description	Validatable [2]
benchmark://anghabench-v1	1,041,333	Compile-only C/C++ functions extracted from GitHub [Homepage, Paper]	No
benchmark://blas-v0	300	Basic linear algebra kernels [Homepage, Paper]	No
benchmark://cbench-v1	23	Runnable C benchmarks [Homepage, Paper]	Partially
benchmark://chstone-v0	12	Benchmarks for C-based High-Level Synthesis [Homepage, Paper]	No
benchmark://clgen-v0	996	Synthetically generated OpenCL kernels [Homepage, Paper]	No
benchmark://github-v0	49,738	Compile-only C/C++ objects from GitHub [Paper]	No
benchmark://jotaibench-v0	18,761	Compile-only C/C++ functions extracted from GitHub [Homepage]	No
benchmark://linux-v0	13,894	Compile-only object files from C Linux kernel [Homepage]	No
benchmark://mibench-v1	40	C benchmarks [Paper]	No
benchmark://npb-v0	122	NASA Parallel Benchmarks [Paper]	No
benchmark://opencv-v0	442	Compile-only object files from C++ OpenCV library [Homepage, Paper]	No
benchmark://poj104-v1	49,816	Solutions to programming problems [Homepage, Paper]	No
benchmark://tensorflow-v0	1,985	Compile-only object files from C++ TensorFlow library [Homepage, Paper]	No
generator://csmith-v0	∞	Random conformant C99 programs [Homepage, Paper]	No
generator://llvm-stress-v0	∞	Randomly generated LLVM-IR [Documentation]	No
Total	1,177,462

1: Values are for the Linux datasets. Some of the datasets contain fewer benchmarks on macOS.
2: A validatable dataset is one where the behavior of the benchmarks can be checked by compiling the programs to binaries and executing them. If the benchmarks crash, or are found to have different behavior, then validation fails. This type of validation is used to check that the compiler has not broken the semantics of the program. See compiler_gym.bin.validate.

All of the above datasets are available for use with the LLVM environment. See compiler_gym.envs.llvm.datasets for API details.
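Each dataset in the table is named by a URI, and individual benchmarks extend that URI with a path, as in the benchmark://npb-v0/50 module ID shown later on this page. The snippet below is a minimal sketch of splitting such a URI with the Python standard library. The helper name split_benchmark_uri is hypothetical and is not part of the CompilerGym API; it assumes only the URI layout visible in the table.

```python
from urllib.parse import urlparse


def split_benchmark_uri(uri):
    """Split a benchmark URI into (scheme, dataset, path).

    Hypothetical helper for illustration; it is not a CompilerGym API.
    """
    parsed = urlparse(uri)
    # The dataset name sits where a hostname would in an http:// URL.
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


print(split_benchmark_uri("benchmark://npb-v0/50"))
print(split_benchmark_uri("generator://csmith-v0"))
```

A URI that names only a dataset, such as generator://csmith-v0, yields an empty path.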

We characterize the datasets below in radial plots which show, clockwise from the top: the average number of instructions per benchmark, the density of branching instructions, the density of memory operations, and the density of arithmetic instructions. For example, comparing blas-v0 and cbench-v1 shows that blas-v0 consists of smaller programs with a similar density of branches, a higher density of arithmetic operations and relatively few memory operations. cbench-v1, in contrast to the small linear algebra kernels of blas-v0, contains larger programs with a higher density of memory operations and fewer arithmetic operations.

Observation Spaces 

We provide several observation spaces for LLVM based on published compiler research.

LLVM-IR 

Observation space	Shape
Ir	str_list<>[0,inf])
BitcodeFile	str_list<>[0,4096.0])

A serialized representation of the LLVM-IR can be accessed as a string through the Ir observation space:

>>> env.observation["Ir"]
'; ModuleID = \'benchmark://npb-v0/50\'\n ..."use-soft-float"="false" }\n'

Alternatively the module can be serialized to a bitcode file on disk:

>>> env.observation["BitcodeFile"]
'/home/user/.cache/compiler_gym/service/2020-12-21T11:55:41.716711-6f4f0669/module-5a8b9fcf.bc'

Note

Files generated by the BitcodeFile observation space are put in a temporary directory that is removed when env.close() is called.
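Because the temporary directory is removed on close, a bitcode file that is needed later has to be copied elsewhere first. The following is a minimal sketch using only the standard library. The helper keep_copy is a hypothetical name, and the temporary file created in the demo merely stands in for the path returned by env.observation["BitcodeFile"].

```python
import shutil
import tempfile
from pathlib import Path


def keep_copy(bitcode_path, dest_dir):
    """Copy a file into dest_dir and return the path of the copy.

    Hypothetical helper: bitcode_path stands in for the string returned by
    env.observation["BitcodeFile"]. Call it before env.close().
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(bitcode_path, dest_dir / Path(bitcode_path).name))


# Demo with a stand-in file; no CompilerGym environment is involved.
with tempfile.TemporaryDirectory() as service_dir, tempfile.TemporaryDirectory() as out:
    fake_bitcode = Path(service_dir) / "module.bc"
    fake_bitcode.write_bytes(b"BC")
    saved = keep_copy(fake_bitcode, out)
    print(saved.name)
```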

InstCount 

Observation space	Shape
InstCount	Box(0, 9223372036854775807, (70,), int64)
InstCountDict	Dict(AShrCount:int<0,inf>, AddCount:int<0,inf>, AddrSpaceCastCount:int<0,inf>, AllocaCount:int<0,inf>, AndCount:int<0,inf>, AtomicCmpXchgCount:int<0,inf>, AtomicRMWCount:int<0,inf>, BitCastCount:int<0,inf>, BrCount:int<0,inf>, CallBrCount:int<0,inf>, CallCount:int<0,inf>, CatchPadCount:int<0,inf>, CatchRetCount:int<0,inf>, CatchSwitchCount:int<0,inf>, CleanupPadCount:int<0,inf>, CleanupRetCount:int<0,inf>, ExtractElementCount:int<0,inf>, ExtractValueCount:int<0,inf>, FAddCount:int<0,inf>, FCmpCount:int<0,inf>, FDivCount:int<0,inf>, FMulCount:int<0,inf>, FNegCount:int<0,inf>, FPExtCount:int<0,inf>, FPToSICount:int<0,inf>, FPToUICount:int<0,inf>, FPTruncCount:int<0,inf>, FRemCount:int<0,inf>, FSubCount:int<0,inf>, FenceCount:int<0,inf>, FreezeCount:int<0,inf>, GetElementPtrCount:int<0,inf>, ICmpCount:int<0,inf>, IndirectBrCount:int<0,inf>, InsertElementCount:int<0,inf>, InsertValueCount:int<0,inf>, IntToPtrCount:int<0,inf>, InvokeCount:int<0,inf>, LShrCount:int<0,inf>, LandingPadCount:int<0,inf>, LoadCount:int<0,inf>, MulCount:int<0,inf>, OrCount:int<0,inf>, PHICount:int<0,inf>, PtrToIntCount:int<0,inf>, ResumeCount:int<0,inf>, RetCount:int<0,inf>, SDivCount:int<0,inf>, SExtCount:int<0,inf>, SIToFPCount:int<0,inf>, SRemCount:int<0,inf>, SelectCount:int<0,inf>, ShlCount:int<0,inf>, ShuffleVectorCount:int<0,inf>, StoreCount:int<0,inf>, SubCount:int<0,inf>, SwitchCount:int<0,inf>, TotalBlocksCount:int<0,inf>, TotalFuncsCount:int<0,inf>, TotalInstsCount:int<0,inf>, TruncCount:int<0,inf>, UDivCount:int<0,inf>, UIToFPCount:int<0,inf>, URemCount:int<0,inf>, UnreachableCount:int<0,inf>, UserOp1Count:int<0,inf>, UserOp2Count:int<0,inf>, VAArgCount:int<0,inf>, XorCount:int<0,inf>, ZExtCount:int<0,inf>)
InstCountNorm	Box(0.0, 1.0, (69,), float32)
InstCountNormDict	Dict(AShrDensity:int<0,inf>, AddDensity:int<0,inf>, AddrSpaceCastDensity:int<0,inf>, AllocaDensity:int<0,inf>, AndDensity:int<0,inf>, AtomicCmpXchgDensity:int<0,inf>, AtomicRMWDensity:int<0,inf>, BitCastDensity:int<0,inf>, BrDensity:int<0,inf>, CallBrDensity:int<0,inf>, CallDensity:int<0,inf>, CatchPadDensity:int<0,inf>, CatchRetDensity:int<0,inf>, CatchSwitchDensity:int<0,inf>, CleanupPadDensity:int<0,inf>, CleanupRetDensity:int<0,inf>, ExtractElementDensity:int<0,inf>, ExtractValueDensity:int<0,inf>, FAddDensity:int<0,inf>, FCmpDensity:int<0,inf>, FDivDensity:int<0,inf>, FMulDensity:int<0,inf>, FNegDensity:int<0,inf>, FPExtDensity:int<0,inf>, FPToSIDensity:int<0,inf>, FPToUIDensity:int<0,inf>, FPTruncDensity:int<0,inf>, FRemDensity:int<0,inf>, FSubDensity:int<0,inf>, FenceDensity:int<0,inf>, FreezeDensity:int<0,inf>, GetElementPtrDensity:int<0,inf>, ICmpDensity:int<0,inf>, IndirectBrDensity:int<0,inf>, InsertElementDensity:int<0,inf>, InsertValueDensity:int<0,inf>, IntToPtrDensity:int<0,inf>, InvokeDensity:int<0,inf>, LShrDensity:int<0,inf>, LandingPadDensity:int<0,inf>, LoadDensity:int<0,inf>, MulDensity:int<0,inf>, OrDensity:int<0,inf>, PHIDensity:int<0,inf>, PtrToIntDensity:int<0,inf>, ResumeDensity:int<0,inf>, RetDensity:int<0,inf>, SDivDensity:int<0,inf>, SExtDensity:int<0,inf>, SIToFPDensity:int<0,inf>, SRemDensity:int<0,inf>, SelectDensity:int<0,inf>, ShlDensity:int<0,inf>, ShuffleVectorDensity:int<0,inf>, StoreDensity:int<0,inf>, SubDensity:int<0,inf>, SwitchDensity:int<0,inf>, TotalBlocksDensity:int<0,inf>, TotalFuncsDensity:int<0,inf>, TruncDensity:int<0,inf>, UDivDensity:int<0,inf>, UIToFPDensity:int<0,inf>, URemDensity:int<0,inf>, UnreachableDensity:int<0,inf>, UserOp1Density:int<0,inf>, UserOp2Density:int<0,inf>, VAArgDensity:int<0,inf>, XorDensity:int<0,inf>, ZExtDensity:int<0,inf>)

The InstCount observation space is a 70-dimension integer feature vector in the range [0,∞]. The first three features are the total number of instructions, the total number of basic blocks, and the total number of functions. The remaining features are the number of instructions of each of the 67 different types in the program.

Use the InstCount observation space to access the feature vectors as an np.array, and InstCountDict to receive them as a self-documented dictionary, keyed by the name of each feature.

The table below provides a description of each of the 70 features, with the index in which they appear in the InstCount and InstCountNorm spaces, and their name as they appear in the keys of the InstCountDict and InstCountNormDict spaces. See the LLVM instruction reference for the meaning of the counted instructions.

Index	Name	Description
0	TotalInsts	Total instruction count
1	TotalBlocks	Basic block count
2	TotalFuncs	Function count
3	Ret	Ret instruction count
4	Br	Br instruction count
5	Switch	Switch instruction count
6	IndirectBr	IndirectBr instruction count
7	Invoke	Invoke instruction count
8	Resume	Resume instruction count
9	Unreachable	Unreachable instruction count
10	CleanupRet	CleanupRet instruction count
11	CatchRet	CatchRet instruction count
12	CatchSwitch	CatchSwitch instruction count
13	CallBr	CallBr instruction count
14	FNeg	FNeg instruction count
15	Add	Add instruction count
16	FAdd	FAdd instruction count
17	Sub	Sub instruction count
18	FSub	FSub instruction count
19	Mul	Mul instruction count
20	FMul	FMul instruction count
21	UDiv	UDiv instruction count
22	SDiv	SDiv instruction count
23	FDiv	FDiv instruction count
24	URem	URem instruction count
25	SRem	SRem instruction count
26	FRem	FRem instruction count
27	Shl	Shl instruction count
28	LShr	LShr instruction count
29	AShr	AShr instruction count
30	And	And instruction count
31	Or	Or instruction count
32	Xor	Xor instruction count
33	Alloca	Alloca instruction count
34	Load	Load instruction count
35	Store	Store instruction count
36	GetElementPtr	GetElementPtr instruction count
37	Fence	Fence instruction count
38	AtomicCmpXchg	AtomicCmpXchg instruction count
39	AtomicRMW	AtomicRMW instruction count
40	Trunc	Trunc instruction count
41	ZExt	ZExt instruction count
42	SExt	SExt instruction count
43	FPToUI	FPToUI instruction count
44	FPToSI	FPToSI instruction count
45	UIToFP	UIToFP instruction count
46	SIToFP	SIToFP instruction count
47	FPTrunc	FPTrunc instruction count
48	FPExt	FPExt instruction count
49	PtrToInt	PtrToInt instruction count
50	IntToPtr	IntToPtr instruction count
51	BitCast	BitCast instruction count
52	AddrSpaceCast	AddrSpaceCast instruction count
53	CleanupPad	CleanupPad instruction count
54	CatchPad	CatchPad instruction count
55	ICmp	ICmp instruction count
56	FCmp	FCmp instruction count
57	PHI	PHI instruction count
58	Call	Call instruction count
59	Select	Select instruction count
60	UserOp1	UserOp1 instruction count
61	UserOp2	UserOp2 instruction count
62	VAArg	VAArg instruction count
63	ExtractElement	ExtractElement instruction count
64	InsertElement	InsertElement instruction count
65	ShuffleVector	ShuffleVector instruction count
66	ExtractValue	ExtractValue instruction count
67	InsertValue	InsertValue instruction count
68	LandingPad	LandingPad instruction count
69	Freeze	Freeze instruction count

Example values:

>>> env.observation["InstCount"]
array([406198,  46981,   3795,   3712,  41629,   1489,      0,      0,
            0,    151,      0,      0,      0,      0,     49,   5393,
          301,   3548,    157,   1132,    748,    152,    296,    270,
           42,     72,      0,   1228,    408,   1251,   2433,    878,
         1022,  22963, 107948,  53284,  59136,      0,      0,      0,
         2815,   7711,   3082,     14,    327,     16,    566,    328,
          888,    844,      0,  32345,      0,      0,      0,  14341,
          682,   1622,  30668,    257,      0,      0,      0,      0,
            0,      0,      0,      0,      0,      0])
>>> env.observation["InstCountDict"]
{'TotalInstsCount': 406198, 'TotalBlocksCount': 46981, 'TotalFuncsCount':
3795, 'RetCount': 3712, 'BrCount': 41629, 'SwitchCount': 1489,
'IndirectBrCount': 0, 'InvokeCount': 0, 'ResumeCount': 0,
'UnreachableCount': 151, 'CleanupRetCount': 0, 'CatchRetCount': 0,
'CatchSwitchCount': 0, 'CallBrCount': 0, 'FNegCount': 49, 'AddCount': 5393,
'FAddCount': 301, 'SubCount': 3548, 'FSubCount': 157, 'MulCount': 1132,
'FMulCount': 748, 'UDivCount': 152, 'SDivCount': 296, 'FDivCount': 270,
'URemCount': 42, 'SRemCount': 72, 'FRemCount': 0, 'ShlCount': 1228,
'LShrCount': 408, 'AShrCount': 1251, 'AndCount': 2433, 'OrCount': 878,
'XorCount': 1022, 'AllocaCount': 22963, 'LoadCount': 107948, 'StoreCount':
53284, 'GetElementPtrCount': 59136, 'FenceCount': 0, 'AtomicCmpXchgCount':
0, 'AtomicRMWCount': 0, 'TruncCount': 2815, 'ZExtCount': 7711, 'SExtCount':
3082, 'FPToUICount': 14, 'FPToSICount': 327, 'UIToFPCount': 16,
'SIToFPCount': 566, 'FPTruncCount': 328, 'FPExtCount': 888, 'PtrToIntCount':
844, 'IntToPtrCount': 0, 'BitCastCount': 32345, 'AddrSpaceCastCount': 0,
'CleanupPadCount': 0, 'CatchPadCount': 0, 'ICmpCount': 14341, 'FCmpCount':
682, 'PHICount': 1622, 'CallCount': 30668, 'SelectCount': 257,
'UserOp1Count': 0, 'UserOp2Count': 0, 'VAArgCount': 0,
'ExtractElementCount': 0, 'InsertElementCount': 0, 'ShuffleVectorCount': 0,
'ExtractValueCount': 0, 'InsertValueCount': 0, 'LandingPadCount': 0,
'FreezeCount': 0}

The derived spaces InstCountNorm and InstCountNormDict return the instruction counts normalized to the total number of instructions (index 0 in the table above). The first feature is omitted, yielding a 69-dimensional feature vector:

>>> env.observation["InstCountNorm"]
array([1.1566034e-01, 9.3427347e-03, 9.1384007e-03, 1.0248450e-01,
       3.6657001e-03, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
       3.7173988e-04, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
       0.0000000e+00, 1.2063082e-04, 1.3276776e-02, 7.4101792e-04,
       8.7346565e-03, 3.8651100e-04, 2.7868182e-03, 1.8414665e-03,
       3.7420174e-04, 7.2870863e-04, 6.6470046e-04, 1.0339785e-04,
       1.7725346e-04, 0.0000000e+00, 3.0231562e-03, 1.0044363e-03,
       3.0797787e-03, 5.9896898e-03, 2.1615075e-03, 2.5160143e-03,
       5.6531545e-02, 2.6575217e-01, 1.3117741e-01, 1.4558417e-01,
       0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 6.9301180e-03,
       1.8983353e-02, 7.5874329e-03, 3.4465949e-05, 8.0502609e-04,
       3.9389659e-05, 1.3934091e-03, 8.0748799e-04, 2.1861261e-03,
       2.0778044e-03, 0.0000000e+00, 7.9628654e-02, 0.0000000e+00,
       0.0000000e+00, 0.0000000e+00, 3.5305440e-02, 1.6789841e-03,
       3.9931266e-03, 7.5500123e-02, 6.3269638e-04, 0.0000000e+00,
       0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
       0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
       0.0000000e+00], dtype=float32)
>>> math.isclose(env.observation["InstCountNorm"][2:].sum(), 1)
True
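The relationship between the raw and normalized vectors can be reproduced by hand. The sketch below assumes only what the text states: every feature after the first is divided by the total instruction count. It uses the example InstCount values shown above. The function name normalize_counts and the all-zero result for an empty module are assumptions made for illustration, not documented CompilerGym behavior.

```python
import math

# Example InstCount vector copied from the example values above.
inst_count = [
    406198, 46981, 3795, 3712, 41629, 1489, 0, 0,
    0, 151, 0, 0, 0, 0, 49, 5393,
    301, 3548, 157, 1132, 748, 152, 296, 270,
    42, 72, 0, 1228, 408, 1251, 2433, 878,
    1022, 22963, 107948, 53284, 59136, 0, 0, 0,
    2815, 7711, 3082, 14, 327, 16, 566, 328,
    888, 844, 0, 32345, 0, 0, 0, 14341,
    682, 1622, 30668, 257, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0,
]


def normalize_counts(counts):
    """Divide every feature after the first by the total instruction count."""
    total = counts[0]
    if total == 0:
        # Assumption: an empty module maps to all-zero densities.
        return [0.0] * (len(counts) - 1)
    return [c / total for c in counts[1:]]


norm = normalize_counts(inst_count)
print(len(norm))
# The per-instruction densities (everything after the block and function
# densities) account for every instruction, so they sum to 1.
print(math.isclose(sum(norm[2:]), 1.0))
```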

The InstCount observation space and its derivatives are cheap to compute, deterministic, and platform independent.

Autophase 

Observation space	Shape
Autophase	Box(0, 9223372036854775807, (56,), int64)
AutophaseDict	Dict(ArgsPhi:int<0,inf>, BB03Phi:int<0,inf>, BBHiPhi:int<0,inf>, BBNoPhi:int<0,inf>, BBNumArgsHi:int<0,inf>, BBNumArgsLo:int<0,inf>, BeginPhi:int<0,inf>, BlockLow:int<0,inf>, BlockMid:int<0,inf>, BranchCount:int<0,inf>, CriticalCount:int<0,inf>, NumAShrInst:int<0,inf>, NumAddInst:int<0,inf>, NumAllocaInst:int<0,inf>, NumAndInst:int<0,inf>, NumBitCastInst:int<0,inf>, NumBrInst:int<0,inf>, NumCallInst:int<0,inf>, NumEdges:int<0,inf>, NumGetElementPtrInst:int<0,inf>, NumICmpInst:int<0,inf>, NumLShrInst:int<0,inf>, NumLoadInst:int<0,inf>, NumMulInst:int<0,inf>, NumOrInst:int<0,inf>, NumPHIInst:int<0,inf>, NumRetInst:int<0,inf>, NumSExtInst:int<0,inf>, NumSelectInst:int<0,inf>, NumShlInst:int<0,inf>, NumStoreInst:int<0,inf>, NumSubInst:int<0,inf>, NumTruncInst:int<0,inf>, NumXorInst:int<0,inf>, NumZExtInst:int<0,inf>, TotalBlocks:int<0,inf>, TotalFuncs:int<0,inf>, TotalInsts:int<0,inf>, TotalMemInst:int<0,inf>, UncondBranches:int<0,inf>, binaryConstArg:int<0,inf>, const32Bit:int<0,inf>, const64Bit:int<0,inf>, morePreds:int<0,inf>, numConstOnes:int<0,inf>, numConstZeroes:int<0,inf>, onePred:int<0,inf>, onePredOneSuc:int<0,inf>, onePredTwoSuc:int<0,inf>, oneSuccessor:int<0,inf>, returnInt:int<0,inf>, testUnary:int<0,inf>, twoEach:int<0,inf>, twoPred:int<0,inf>, twoPredOneSuc:int<0,inf>, twoSuccessor:int<0,inf>)

The Autophase observation space is a 56-dimension integer feature vector summarizing the static LLVM-IR representation. It is described in:

Haj-Ali, A., Huang, Q. J., Xiang, J., Moses, W., Asanovic, K., Wawrzynek, J., & Stoica, I. (2020). AutoPhase: Juggling HLS phase orderings in random forests with deep reinforcement learning. Proceedings of Machine Learning and Systems, 2, 70-81.

Use the Autophase observation space to access the feature vectors as an np.array, and AutophaseDict to receive them as a self-documented dictionary, keyed by the name of each feature.

The table below provides a description of each of the 56 features, with the index in which they appear in the Autophase vector, and their name as they appear in the keys of the AutophaseDict dictionary.

Index	Name	Description
0	BBNumArgsHi	Number of BB where total args for phi nodes is greater than 5
1	BBNumArgsLo	Number of BB where total args for phi nodes is [1, 5]
2	onePred	Number of basic blocks with 1 predecessor
3	onePredOneSuc	Number of basic blocks with 1 predecessor and 1 successor
4	onePredTwoSuc	Number of basic blocks with 1 predecessor and 2 successors
5	oneSuccessor	Number of basic blocks with 1 successor
6	twoPred	Number of basic blocks with 2 predecessors
7	twoPredOneSuc	Number of basic blocks with 2 predecessors and 1 successor
8	twoEach	Number of basic blocks with 2 predecessors and 2 successors
9	twoSuccessor	Number of basic blocks with 2 successors
10	morePreds	Number of basic blocks with more than 2 predecessors
11	BB03Phi	Number of basic blocks with Phi node count in range (0, 3]
12	BBHiPhi	Number of basic blocks with more than 3 Phi nodes
13	BBNoPhi	Number of basic blocks with no Phi nodes
14	BeginPhi	Number of Phi-nodes at beginning of BB
15	BranchCount	Number of branches
16	returnInt	Number of calls that return an int
17	CriticalCount	Number of critical edges
18	NumEdges	Number of edges
19	const32Bit	Number of occurrences of 32-bit integer constants
20	const64Bit	Number of occurrences of 64-bit integer constants
21	numConstZeroes	Number of occurrences of constant 0
22	numConstOnes	Number of occurrences of constant 1
23	UncondBranches	Number of unconditional branches
24	binaryConstArg	Binary operations with a constant operand
25	NumAShrInst	Number of AShr instructions
26	NumAddInst	Number of Add instructions
27	NumAllocaInst	Number of Alloca instructions
28	NumAndInst	Number of And instructions
29	BlockMid	Number of basic blocks with between 15 and 500 instructions
30	BlockLow	Number of basic blocks with less than 15 instructions
31	NumBitCastInst	Number of BitCast instructions
32	NumBrInst	Number of Br instructions
33	NumCallInst	Number of Call instructions
34	NumGetElementPtrInst	Number of GetElementPtr instructions
35	NumICmpInst	Number of ICmp instructions
36	NumLShrInst	Number of LShr instructions
37	NumLoadInst	Number of Load instructions
38	NumMulInst	Number of Mul instructions
39	NumOrInst	Number of Or instructions
40	NumPHIInst	Number of PHI instructions
41	NumRetInst	Number of Ret instructions
42	NumSExtInst	Number of SExt instructions
43	NumSelectInst	Number of Select instructions
44	NumShlInst	Number of Shl instructions
45	NumStoreInst	Number of Store instructions
46	NumSubInst	Number of Sub instructions
47	NumTruncInst	Number of Trunc instructions
48	NumXorInst	Number of Xor instructions
49	NumZExtInst	Number of ZExt instructions
50	TotalBlocks	Number of basic blocks
51	TotalInsts	Number of instructions (of all types)
52	TotalMemInst	Number of memory instructions
53	TotalFuncs	Number of non-external functions
54	ArgsPhi	Total arguments to Phi nodes
55	testUnary	Number of Unary operations

Example values:

>>> env.observation["Autophase"]
array([   0,    0,   26,   25,    1,   26,   10,    1,    8,   10,    0,
          0,    0,   37,    0,   36,    0,    2,   46,  175, 1664, 1212,
        263,   26,  193,    0,   59,    6,    0,    3,   32,    0,   36,
         10, 1058,   10,    0,  840,    0,    0,    0,    1,  416,    0,
          0,  148,   60,    0,    0,    0,   37, 3008, 2062,    9,    0,
       1262])
>>> env.observation["AutophaseDict"]
{'BBNumArgsHi': 0, 'BBNumArgsLo': 0, 'onePred': 26, 'onePredOneSuc': 25,
 'onePredTwoSuc': 1, 'oneSuccessor': 26, 'twoPred': 10, 'twoPredOneSuc': 1,
 'twoEach': 8, 'twoSuccessor': 10, 'morePreds': 0, 'BB03Phi': 0,
 'BBHiPhi': 0, 'BBNoPhi': 37, 'BeginPhi': 0, 'BranchCount': 36,
 'returnInt': 0, 'CriticalCount': 2, 'NumEdges': 46, 'const32Bit': 175,
 'const64Bit': 1664, 'numConstZeroes': 1212, 'numConstOnes': 263,
 'UncondBranches': 26, 'binaryConstArg': 193, 'NumAShrInst': 0,
 'NumAddInst': 59, 'NumAllocaInst': 6, 'NumAndInst': 0, 'BlockMid': 3,
 'BlockLow': 32, 'NumBitCastInst': 0, 'NumBrInst': 36, 'NumCallInst': 10, ... }
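When only a few Autophase features are needed, the index column of the table above is enough to pick them out of the raw vector. The following is a minimal sketch that uses the example vector shown above. The names FEATURE_INDEX and select_features are hypothetical, and the memory-instruction fraction is just one illustrative derived quantity.

```python
# Example Autophase vector copied from the example values above.
autophase = [
    0, 0, 26, 25, 1, 26, 10, 1, 8, 10, 0,
    0, 0, 37, 0, 36, 0, 2, 46, 175, 1664, 1212,
    263, 26, 193, 0, 59, 6, 0, 3, 32, 0, 36,
    10, 1058, 10, 0, 840, 0, 0, 0, 1, 416, 0,
    0, 148, 60, 0, 0, 0, 37, 3008, 2062, 9, 0,
    1262,
]

# Indices taken from the feature table above (subset only).
FEATURE_INDEX = {
    "TotalBlocks": 50,
    "TotalInsts": 51,
    "TotalMemInst": 52,
    "TotalFuncs": 53,
}


def select_features(vector, names):
    """Return a {name: value} dict for the requested feature names."""
    return {name: vector[FEATURE_INDEX[name]] for name in names}


summary = select_features(autophase, ["TotalInsts", "TotalMemInst"])
mem_fraction = summary["TotalMemInst"] / summary["TotalInsts"]
print(summary)
print(f"memory instruction fraction: {mem_fraction:.3f}")
```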

Inst2vec 

Observation space	Shape
Inst2vec	ndarray_list<>[0,inf])
Inst2vecEmbeddingIndices	int32_list<>[0,inf])
Inst2vecPreprocessedText	str_list<>[0,inf])

The inst2vec observation space represents LLVM-IR as a sequence of embedding vectors, one per LLVM statement, using embeddings trained offline on a large corpus of LLVM-IR. It is described in:

Ben-Nun, T., Jakobovits, A. S., & Hoefler, T. (2018). Neural code comprehension: A learnable representation of code semantics. In Advances in Neural Information Processing Systems (pp. 3585-3597).

The inst2vec methodology comprises three steps, all of which are exposed as observation spaces:

Step 1: pre-processing

The LLVM-IR statements are pre-processed to remove literals and identifiers and to simplify the expressions. The Inst2vecPreprocessedText observation space returns a list of pre-processed strings, one per statement. This is useful if you want to normalize the IR but then compute your own embeddings.

>>> env.observation["Inst2vecPreprocessedText"]
['opaque = type opaque', ..., 'ret i32 <%ID>']

Step 2: encoding

Each of the pre-processed statements is mapped to an index into a vocabulary of over 8k LLVM-IR statements. If a statement is not found in the vocabulary, it maps to a special !UNK vocabulary item. Using the Inst2vecEmbeddingIndices observation space returns a list of vocabulary indices. This would be useful if you want to learn your own embeddings using the same vocabulary, or if you want to use the inst2vec pre-trained embeddings but are processing them on a GPU where you have already allocated and copied the embedding table, minimizing transfer sizes.

>>> env.observation["Inst2vecEmbeddingIndices"]
[8564, 8564, 5, 46, ..., 257]
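The encoding step amounts to a dictionary lookup with a fallback for unknown statements. The sketch below illustrates the idea with a made-up three-entry vocabulary. The real inst2vec vocabulary has over 8k entries with different indices, and the names TOY_VOCAB and encode are hypothetical.

```python
# Toy vocabulary for illustration only; the indices are made up.
TOY_VOCAB = {
    "!UNK": 0,
    "ret i32 <%ID>": 1,
    "opaque = type opaque": 2,
}


def encode(statements, vocab, unknown="!UNK"):
    """Map each pre-processed statement to a vocabulary index.

    Statements missing from the vocabulary map to the index of `unknown`.
    """
    unk_index = vocab[unknown]
    return [vocab.get(statement, unk_index) for statement in statements]


statements = [
    "opaque = type opaque",
    "a statement that is not in the toy vocabulary",
    "ret i32 <%ID>",
]
print(encode(statements, TOY_VOCAB))  # [2, 0, 1]
```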

Step 3: embedding

The vocabulary indices are mapped to 200-D embedding vectors, producing an np.array of shape (num_statements, 200). This could be fed into an LSTM to produce a program embedding.

>>> env.observation["Inst2vec"]
array([[-0.26956588,  0.47407162, -0.36637706, ..., -0.49256894,
         0.8016193 ,  0.71160674],
       [-0.59749085,  0.63315004, -0.0308373 , ...,  0.14833118,
         0.86420786,  0.44808227],
       [-0.59749085,  0.63315004, -0.0308373 , ...,  0.14833118,
         0.86420786,  0.44808227],
       ...,
       [-0.37584195,  0.43671703, -0.5360456 , ...,  0.6030259 ,
         0.82574934,  0.6306344 ],
       [-0.59749085,  0.63315004, -0.0308373 , ...,  0.14833118,
         0.86420786,  0.44808227],
       [-0.43074277,  0.8589559 , -0.35770646, ...,  0.28785184,
         0.8492773 ,  0.8914213 ]], dtype=float32)
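The embedding step is a row lookup in a table indexed by vocabulary index. The sketch below shows the shape of the result with a made-up table of three 2-D vectors; the real table holds 200-D vectors. The names TOY_EMBEDDINGS and embed are hypothetical.

```python
# Toy embedding table: 3 vocabulary entries, 2 dimensions each (made-up values).
TOY_EMBEDDINGS = [
    [0.0, 0.0],
    [0.1, 0.2],
    [0.5, 0.6],
]


def embed(indices, table):
    """Look up one embedding row per vocabulary index."""
    return [list(table[i]) for i in indices]


rows = embed([2, 0, 1, 2], TOY_EMBEDDINGS)
# One row per statement, one column per embedding dimension.
print((len(rows), len(rows[0])))  # (4, 2)
# Statements that share a vocabulary index receive identical rows.
print(rows[0] == rows[3])  # True
```

Keeping the table separate from the indices is what makes the Inst2vecEmbeddingIndices space convenient when the table already resides on an accelerator: only the integer indices need to be transferred.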

ProGraML 

Observation space	Shape
Programl	str_list<>[0,inf]) -> json://networkx/MultiDiGraph

The ProGraML representation is a graph-based representation of LLVM-IR which includes control-flow, data-flow, and call-flow. This graph is represented as an nx.MultiDiGraph. ProGraML is described in:

Cummins, C., Fisches, Z. V., Ben-Nun, T., Hoefler, T., & Leather, H. (2020). ProGraML: Graph-based Deep Learning for Program Optimization and Analysis. arXiv preprint arXiv:2003.10536.

Each node in the graph represents an instruction, a variable, or a constant. A text attribute on each node can be used to produce an initial node embedding. Each edge in the graph has a type and a position. There are three types of edges: call edges, data edges, and control edges. An edge position is a non-negative integer which encodes the operand order for data edges and the branch number for control edges. The diagram below visualizes the ProGraML graph for a small program.

In the above diagram, each blue rectangular node represents an instruction, the red diamonds are variables, the red ovals are constants, and the edges between the nodes represent relations: blue edges are control flow, red edges are data flow, and green edges are call flow.

Example usage:

>>> G = env.observation["Programl"]
>>> G
<networkx.classes.multidigraph.MultiDiGraph object at 0x7f9d8050ffa0>
>>> G.number_of_nodes()
6326
>>> G.nodes[1000]
{'block': 8, 'features': {'full_text': ['%439 = load double, double* @tmp2, align 8']}, 'function': 0, 'text': 'load', 'type': 0}
>>> G.edges[0, 1, 0]
{'flow': 2, 'position': 0}
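Edge attributes like the one above can be grouped by type. The following is a minimal sketch that counts edges per flow type from (source, target, attributes) tuples, the shape that G.edges(data=True) yields on a networkx graph. The integer-to-name mapping in FLOW_NAMES is an assumption and should be checked against the ProGraML documentation. The edge list is hand-written so that the example runs without CompilerGym or networkx.

```python
from collections import Counter

# Assumed mapping from the integer 'flow' attribute to an edge type.
FLOW_NAMES = {0: "control", 1: "data", 2: "call"}


def count_edges_by_flow(edges):
    """Count edges per flow type.

    `edges` is an iterable of (source, target, attributes) tuples.
    """
    return Counter(
        FLOW_NAMES.get(attrs["flow"], "unknown") for _, _, attrs in edges
    )


# Hand-written stand-in for a small graph's edge list.
toy_edges = [
    (0, 1, {"flow": 2, "position": 0}),
    (1, 2, {"flow": 0, "position": 0}),
    (3, 2, {"flow": 1, "position": 0}),
    (4, 2, {"flow": 1, "position": 1}),
]
counts = count_edges_by_flow(toy_edges)
print(dict(counts))
```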

Hardware Information 

Observation space	Shape
CpuInfo	Dict(cores_count:int, l1d_cache_count:int, l1d_cache_size:int, l1i_cache_count:int, l1i_cache_size:int, l2_cache_count:int, l2_cache_size:int, l3_cache_count:int, l3_cache_size:int, l4_cache_count:int, l4_cache_size:int, name:str_list<>[0,inf]))

Essential performance information about the host CPU can be accessed as a JSON dictionary, extracted using the cpuinfo library.

This observation space is used for obtaining information about the target hardware. The values are independent of the compiler and program state.

Example usage:

>>> env.observation["CpuInfo"]
{'cores_count': 8, 'l1d_cache_count': 8, ...}

Cost Models 

Observation space	Shape
IrInstructionCount	Box(0, 9223372036854775807, (1,), int64)
IrInstructionCountO0	Box(0, 9223372036854775807, (1,), int64)
IrInstructionCountO3	Box(0, 9223372036854775807, (1,), int64)
IrInstructionCountOz	Box(0, 9223372036854775807, (1,), int64)
ObjectTextSizeBytes	Box(0, 9223372036854775807, (1,), int64)
ObjectTextSizeO0	Box(0, 9223372036854775807, (1,), int64)
ObjectTextSizeO3	Box(0, 9223372036854775807, (1,), int64)
ObjectTextSizeOz	Box(0, 9223372036854775807, (1,), int64)

Raw values from the cost models used to compute rewards.

Runtime 

🏗️ Experimental API: This runtime observation space is still in an experimental state and is not yet stable. There may be bugs and breaking changes in future releases.

Observation space	Shape
IsRunnable	int<0,1>
Runtime	float64_list<>[0,inf])

Compile and run the benchmark, returning a list of wall-clock execution times. Times are returned as floating-point values in seconds. The number of times that the benchmark is executed is determined by the LlvmEnv.runtime_observation_count property.

Not all benchmarks are runnable. To check if the current benchmark is runnable, use the IsRunnable observation space, which is 1 if the benchmark is runnable and 0 otherwise. Requesting the Runtime observation space for a benchmark that is not runnable returns an empty list.
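Code that consumes the Runtime observation therefore has to handle both a list of several measurements and an empty list. The sketch below reduces a list of runtimes to a single value. Using the median is an assumption made here for robustness to outliers, not something the environment prescribes, and summarize_runtimes is a hypothetical helper name.

```python
import statistics


def summarize_runtimes(runtimes):
    """Return the median runtime in seconds, or None for an empty list.

    An empty list is what a non-runnable benchmark produces.
    """
    if not runtimes:
        return None
    return statistics.median(runtimes)


print(summarize_runtimes([0.52, 0.48, 0.50]))  # 0.5
print(summarize_runtimes([]))  # None
```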

Build Time 

🏗️ Experimental API: This build time observation space is still in an experimental state and is not yet stable. There may be bugs and breaking changes in future releases.

Observation space	Shape
IsBuildable	int<0,1>
Buildtime	float64_list<>[0,inf])

Compile the benchmark to a binary and return a list containing a single wall-clock build time, in seconds.

Not all benchmarks are buildable. To check if the current benchmark is buildable, use the IsBuildable observation space, which is 1 if the benchmark is buildable and 0 otherwise. Requesting the Buildtime observation space for a benchmark that is not buildable returns an empty list.

Reward Spaces 

The goal of CompilerGym tasks is to minimize a cost function \(C(s)\) which takes as input the current program state \(s\) and produces a real-valued cost. At a given timestep, reward is the reduction in cost from the previous state \(s_{t-1}\) to the current state \(s_t\):

\[R(s_t) = C(s_{t-1}) - C(s_t)\]

Reward can be normalized using the cost of the program before any optimizations are applied as the scaling factor:

\[R(s_t) = \frac{C(s_{t-1}) - C(s_t)}{C(s_{t=0})}\]

Normalized rewards are indicated by a Norm suffix on the reward space name.

Alternatively, rewards can be normalized by comparison to a baseline policy. The baseline policies are derived from existing LLVM optimization levels: -O3 and -Oz. When a baseline policy is used, reward is the reduction in cost from the previous state, scaled by the reduction in cost achieved by applying the baseline policy to produce a baseline state \(s_b\):

\[R(s_t) = \frac{C(s_{t-1}) - C(s_t)}{{C(s_{t=0})} - C(s_b)}\]

These reward spaces are indicated by the baseline policy name as a suffix, e.g. the reward space IrInstructionCountO3 is IrInstructionCount reward normalized to the -O3 baseline policy.
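The three reward definitions above translate directly into code. The sketch below uses made-up cost values and hypothetical function names. It does not handle the degenerate case where the baseline leaves the cost unchanged, which would divide by zero, and it makes no claim about how CompilerGym treats that case.

```python
def raw_reward(prev_cost, cost):
    """R(s_t) = C(s_{t-1}) - C(s_t)"""
    return prev_cost - cost


def norm_reward(prev_cost, cost, initial_cost):
    """Raw reward scaled by the cost of the unoptimized program."""
    return (prev_cost - cost) / initial_cost


def baseline_reward(prev_cost, cost, initial_cost, baseline_cost):
    """Raw reward scaled by the cost reduction achieved by the baseline."""
    return (prev_cost - cost) / (initial_cost - baseline_cost)


# Made-up costs: the program starts at 1000, and the baseline reaches 600.
initial, baseline = 1000, 600
costs = [1000, 900, 700, 600]  # cost after each of three steps

total = sum(
    baseline_reward(prev, cur, initial, baseline)
    for prev, cur in zip(costs, costs[1:])
)
print(total)  # 1.0
```

Because the per-step rewards telescope, the cumulative baseline-scaled reward is the total cost reduction divided by the baseline's reduction. A cumulative reward of 1.0 therefore means the episode matched the baseline, which is consistent with the success threshold of 1.0 listed for the baseline-scaled reward spaces.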

IR Instruction Count 

Reward space	Baseline Policy	Range	Success Threshold	Deterministic?	Platform dependent?
IrInstructionCount		(-inf, inf)		Yes	No
IrInstructionCountNorm		(-inf, 1.0)		Yes	No
IrInstructionCountO3	`-O3`	(-inf, inf)	1.0	Yes	No
IrInstructionCountOz	`-Oz`	(-inf, inf)	1.0	Yes	No

The number of LLVM-IR instructions in the program can be used as a reward signal either using the raw change in instruction count (IrInstructionCount), or by scaling the changes in instruction count to the improvement made by the baseline -O3 or -Oz LLVM pipelines. LLVM-IR instruction count is fast to evaluate, deterministic, and platform-independent, but is not a measure of true codesize reduction as it does not take into account the effects of lowering.

Codesize 

Reward space	Baseline Policy	Range	Success Threshold	Deterministic?	Platform dependent?
ObjectTextSizeBytes		(-inf, inf)		Yes	Yes
ObjectTextSizeNorm		(-inf, 1.0)		Yes	Yes
ObjectTextSizeO3	`-O3`	(-inf, inf)	1.0	Yes	Yes
ObjectTextSizeOz	`-Oz`	(-inf, inf)	1.0	Yes	Yes

The ObjectTextSizeBytes reward signal returns the size of the .TEXT section of the module after lowering to an object file, before linking. This is more expensive to compute than IrInstructionCount. The object file code size depends on the target platform; see CompilerEnv.compiler_version.

Action Space 

The LLVM action space exposes the selection of semantics-preserving optimization transforms as a discrete space.

Action	Description
-add-discriminators	Add DWARF path discriminators
-adce	Aggressive Dead Code Elimination
-aggressive-instcombine	Combine pattern based expressions
-alignment-from-assumptions	Alignment from assumptions
-always-inline	Inliner for always_inline functions
-argpromotion	Promote ‘by reference’ arguments to scalars
-attributor	Deduce and propagate attributes
-barrier	A No-Op Barrier Pass
-bdce	Bit-Tracking Dead Code Elimination
-break-crit-edges	Break critical edges in CFG
-simplifycfg	Simplify the CFG
-callsite-splitting	Call-site splitting
-called-value-propagation	Called Value Propagation
-canonicalize-aliases	Canonicalize aliases
-consthoist	Constant Hoisting
-constmerge	Merge Duplicate Global Constants
-constprop	Simple constant propagation
-coro-cleanup	Lower all coroutine related intrinsics
-coro-early	Lower early coroutine intrinsics
-coro-elide	Coroutine frame allocation elision and indirect calls replacement
-coro-split	Split coroutine into a set of functions driving its state machine
-correlated-propagation	Value Propagation
-cross-dso-cfi	Cross-DSO CFI
-deadargelim	Dead Argument Elimination
-dce	Dead Code Elimination
-die	Dead Instruction Elimination
-dse	Dead Store Elimination
-reg2mem	Demote all values to stack slots
-div-rem-pairs	Hoist/decompose integer division and remainder
-early-cse-memssa	Early CSE w/ MemorySSA
-elim-avail-extern	Eliminate Available Externally Globals
-ee-instrument	Instrument function entry/exit with calls to e.g. mcount() (pre inlining)
-flattencfg	Flatten the CFG
-float2int	Float to int
-forceattrs	Force set function attributes
-inline	Function Integration/Inlining
-insert-gcov-profiling	Insert instrumentation for GCOV profiling
-gvn-hoist	Early GVN Hoisting of Expressions
-gvn	Global Value Numbering
-globaldce	Dead Global Elimination
-globalopt	Global Variable Optimizer
-globalsplit	Global splitter
-guard-widening	Widen guards
-hotcoldsplit	Hot Cold Splitting
-ipconstprop	Interprocedural constant propagation
-ipsccp	Interprocedural Sparse Conditional Constant Propagation
-indvars	Induction Variable Simplification
-irce	Inductive range check elimination
-infer-address-spaces	Infer address spaces
-inferattrs	Infer set function attributes
-inject-tli-mappings	Inject TLI Mappings
-instsimplify	Remove redundant instructions
-instcombine	Combine redundant instructions
-instnamer	Assign names to anonymous instructions
-jump-threading	Jump Threading
-lcssa	Loop-Closed SSA Form Pass
-licm	Loop Invariant Code Motion
-libcalls-shrinkwrap	Conditionally eliminate dead library calls
-load-store-vectorizer	Vectorize load and Store instructions
-loop-data-prefetch	Loop Data Prefetch
-loop-deletion	Delete dead loops
-loop-distribute	Loop Distribution
-loop-fusion	Loop Fusion
-loop-guard-widening	Widen guards (within a single loop, as a loop pass)
-loop-idiom	Recognize loop idioms
-loop-instsimplify	Simplify instructions in loops
-loop-interchange	Interchanges loops for cache reuse
-loop-load-elim	Loop Load Elimination
-loop-predication	Loop predication
-loop-reroll	Reroll loops
-loop-rotate	Rotate Loops
-loop-simplifycfg	Simplify loop CFG
-loop-simplify	Canonicalize natural loops
-loop-sink	Loop Sink
-loop-reduce	Loop Strength Reduction
-loop-unroll-and-jam	Unroll and Jam loops
-loop-unroll	Unroll loops
-loop-unswitch	Unswitch loops
-loop-vectorize	Loop Vectorization
-loop-versioning-licm	Loop Versioning For LICM
-loop-versioning	Loop Versioning
-loweratomic	Lower atomic intrinsics to non-atomic form
-lower-constant-intrinsics	Lower constant intrinsics
-lower-expect	Lower ‘expect’ Intrinsics
-lower-guard-intrinsic	Lower the guard intrinsic to normal control flow
-lowerinvoke	Lower invoke and unwind, for unwindless code generators
-lower-matrix-intrinsics	Lower the matrix intrinsics
-lowerswitch	Lower SwitchInst’s to branches
-lower-widenable-condition	Lower the widenable condition to default true value
-memcpyopt	MemCpy Optimization
-mergefunc	Merge Functions
-mergeicmps	Merge contiguous icmps into a memcmp
-mldst-motion	MergedLoadStoreMotion
-sancov	Pass for instrumenting coverage on functions
-name-anon-globals	Provide a name to nameless globals
-nary-reassociate	Nary reassociation
-newgvn	Global Value Numbering
-pgo-memop-opt	Optimize memory intrinsic using its size value profile
-partial-inliner	Partial Inliner
-partially-inline-libcalls	Partially inline calls to library functions
-post-inline-ee-instrument	Instrument function entry/exit with calls to e.g. mcount() (post inlining)
-functionattrs	Deduce function attributes
-mem2reg	Promote Memory to Register
-prune-eh	Remove unused exception handling info
-reassociate	Reassociate expressions
-redundant-dbg-inst-elim	Redundant Dbg Instruction Elimination
-rpo-functionattrs	Deduce function attributes in RPO
-rewrite-statepoints-for-gc	Make relocations explicit at statepoints
-sccp	Sparse Conditional Constant Propagation
-slp-vectorizer	SLP Vectorizer
-sroa	Scalar Replacement Of Aggregates
-scalarizer	Scalarize vector operations
-separate-const-offset-from-gep	Split GEPs to a variadic base and a constant offset for better CSE
-simple-loop-unswitch	Simple unswitch loops
-sink	Code sinking
-speculative-execution	Speculatively execute instructions
-slsr	Straight line strength reduction
-strip-dead-prototypes	Strip Unused Function Prototypes
-strip-debug-declare	Strip all llvm.dbg.declare intrinsics
-strip-nondebug	Strip all symbols, except dbg symbols, from a module
-strip	Strip all symbols from a module
-tailcallelim	Tail Call Elimination
-mergereturn	Unify function exit nodes
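
To illustrate what a discrete space over passes means in practice, the following sketch maps integer action indices to flags and joins them into an opt-style command line. The six-flag list and its ordering are a hypothetical subset chosen for this example, as are the helper names and the input.bc/output.bc file names; the real environment defines the full list and its own ordering.

```python
# Hypothetical subset of the flags in the table above. The ordering here is
# an assumption for illustration, not the ordering used by the environment.
PASS_FLAGS = ["-adce", "-dce", "-gvn", "-instcombine", "-mem2reg", "-simplifycfg"]


def actions_to_flags(actions, flags=PASS_FLAGS):
    """Translate a sequence of discrete action indices into pass flags."""
    for action in actions:
        if not 0 <= action < len(flags):
            raise IndexError(f"action {action} is outside the space of size {len(flags)}")
    return [flags[action] for action in actions]


def opt_commandline(actions, flags=PASS_FLAGS):
    """Render an episode as a single opt-style invocation (file names are placeholders)."""
    return " ".join(["opt", *actions_to_flags(actions, flags), "input.bc", "-o", "output.bc"])


# An episode is just an ordered list of indices; the same index may repeat.
episode = [4, 3, 1]
print(opt_commandline(episode))
```

Because an episode is an ordered list, [4, 3, 1] and [1, 3, 4] are different episodes and may produce different code.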

FAQ 

Is this really a sequential decision process?

Yes. Compilers frequently package individual transformations as “optimization passes” which are then applied in a sequential order. Usually this order is fixed (e.g. real world example). The CompilerGym LLVM environment replaces that fixed order with a sequential decision process where any pass may be applied at any stage.

When does the environment consider an episode “done”?

The compiler itself has no signal for termination. Actions are like rewrite rules: it is up to the user to decide when no further improvement can be achieved from more rewrites. For example, a simple random search can use “patience”, stopping after a fixed number of consecutive steps without improvement. The only exception is when the compiler crashes or the code ends up in an unexpected state, in which case the episode has to be aborted. This does happen in practice.
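
A minimal sketch of patience-based termination follows. ToyEnv is a hypothetical stand-in written for this example, not the CompilerGym API; it assumes a gym-style step() that returns an (observation, reward, done, info) tuple. The function name and its parameters are likewise illustrative.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a gym-style environment (not CompilerGym)."""

    def reset(self):
        self.size = 100
        return self.size

    def step(self, action):
        # Even-numbered actions shrink the "program" until it reaches 60.
        old_size = self.size
        if action % 2 == 0 and self.size > 60:
            self.size -= 5
        return self.size, old_size - self.size, False, {}


def random_search_with_patience(env, num_actions, patience, rng):
    """Take random actions until `patience` consecutive steps bring no improvement."""
    env.reset()
    total = best = 0.0
    steps_since_improvement = 0
    trajectory = []
    while steps_since_improvement < patience:
        action = rng.randrange(num_actions)
        _, reward, done, _ = env.step(action)
        trajectory.append(action)
        if done:
            break  # e.g. the compiler crashed; abort the episode
        total += reward
        if total > best:
            best, steps_since_improvement = total, 0
        else:
            steps_since_improvement += 1
    return best, trajectory


best, trajectory = random_search_with_patience(ToyEnv(), 4, patience=5, rng=random.Random(0))
print(f"best cumulative reward {best} after {len(trajectory)} steps")
```

A larger patience value spends more steps per episode in exchange for a lower chance of stopping just before a useful action.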

How do I run this on my own program?

By compiling your program to an unoptimized LLVM bitcode file. This can be done automatically for C/C++ programs using the env.make_benchmark() API, or you can do this yourself using clang:

$ clang -emit-llvm -c -O0 -Xclang -disable-O0-optnone -Xclang -disable-llvm-passes myapp.c

Then pass the path of the generated .bc file to the CompilerGym command-line tools using the --benchmark flag, e.g.

$ bazel run -c opt //compiler_gym/bin:random_search -- \
    --env=llvm-ic-v0 \
    --benchmark=file:///$PWD/myapp.bc
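
The two steps above can also be scripted. The sketch below only builds the clang argument list and the file:/// URI; the helper names are hypothetical, and the explicit -o argument is an addition that is not part of the command shown above.

```python
from pathlib import Path


def clang_bitcode_command(source, output):
    """Build the clang invocation that emits unoptimized LLVM bitcode."""
    return [
        "clang", "-emit-llvm", "-c", "-O0",
        "-Xclang", "-disable-O0-optnone",
        "-Xclang", "-disable-llvm-passes",
        str(source),
        "-o", str(output),  # assumption: name the output file explicitly
    ]


def benchmark_uri(bitcode_path):
    """Turn a (possibly relative) path into an absolute file:/// URI."""
    return Path(bitcode_path).resolve().as_uri()


print(" ".join(clang_bitcode_command("myapp.c", "myapp.bc")))
print(benchmark_uri("myapp.bc"))
```

To actually compile, the returned list can be passed to subprocess.run(..., check=True), assuming clang is available on the PATH.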

Should I always try different actions?

Not necessarily. Some optimization actions can usefully be called multiple times, after other actions. An example of this is dead code elimination, which can be used to “clean up mess” generated by a previous action. So repeating the same action in a different context can bring improvements.
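
The effect can be seen on a toy example. The instruction format and both passes below are simplified stand-ins invented for this sketch; they are not LLVM's implementations. Dead code elimination removes nothing when run first, but removes an instruction once copy propagation has left it unused.

```python
# Toy IR: each instruction is (dest, op, operands). A "ret" has no dest and
# marks the value that leaves the function.
PROGRAM = [
    ("a", "input", ()),
    ("b", "copy", ("a",)),
    ("c", "add", ("b", "b")),
    (None, "ret", ("c",)),
]


def dce(program):
    """Drop instructions whose result is never used, repeating until stable."""
    program = list(program)
    while True:
        used = {value for _, _, operands in program for value in operands}
        kept = [ins for ins in program if ins[1] == "ret" or ins[0] in used]
        if len(kept) == len(program):
            return kept
        program = kept


def copy_propagate(program):
    """Rewrite uses of `x = copy y` to use y; the copy itself is left behind."""
    copies = {dest: ops[0] for dest, op, ops in program if op == "copy"}
    return [(dest, op, tuple(copies.get(v, v) for v in ops)) for dest, op, ops in program]


print(len(dce(PROGRAM)))                  # nothing is dead yet
print(len(dce(copy_propagate(PROGRAM))))  # the copy is now dead and is removed
```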