In the “abstract” droidlet agent, the controller chooses whether to put Tasks on the Task Stack based on the memory state. In the locobot agent and the craftassist agent subclasses, it consists of

  • a DSL

  • a neural semantic parser, which translates natural language into partially specified programs over the DSL

  • a Dialogue Manager, Dialogue Stack, and Dialogue objects.

  • the Interpreter, a special Dialogue Object that takes partially specified programs from the DSL and fully specifies them using the Memory

  • a set of “default behaviors”, run randomly when the Task Stack and Dialogue Stack are empty

Dialogue Objects behave similarly to Tasks, except that they affect the agent’s environment directly only by causing the agent to issue utterances (or indirectly by pushing Task Objects onto the Task Stack). In particular, each Dialogue Object has a .step() that is run when it is the highest priority object on the Stack. Dialogue Objects, like Task Objects, are modular: a learned model or a heuristic can mediate the Dialogue Object, and the same model or heuristic script can be used across many different agents.

The Dialogue Manager puts Dialogue Objects on the Dialogue Stack, either on its own or at the request of a Dialogue Object. In the locobot and craftassist agents, the manager is powered by a neural semantic parser.

A sketch of the controller’s operation is then

if new utterance from human:
     logical_form = semantic_parser.translate(new command)
     if the logical_form denotes a command:
         push Interpreter(logical_form, agent_memory) onto the DialogueStack
     else if the logical_form denotes some other kind of dialogue the agent can handle:
         push some other appropriate DialogueObject on the DialogueStack
if the Dialogue Stack is not empty:
     step the highest priority DialogueObject
if TaskStack is empty:
     maybe place default behaviors on the stack
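
The sketch above can be made concrete with a small, self-contained toy. This is only an illustration under stated assumptions, not the droidlet implementation: ToyInterpreter, ToySay, controller_step, the plain Python lists standing in for the two stacks, and the injected parse function are all hypothetical names introduced here.

```python
from typing import Callable, Dict, List, Optional


class ToyInterpreter:
    """Hypothetical stand-in for the Interpreter: pushes one task, then finishes."""

    def __init__(self, logical_form: Dict, task_stack: List[str]):
        self.logical_form = logical_form
        self.task_stack = task_stack
        self.finished = False

    def step(self) -> Optional[str]:
        # Put a task on the task stack; nothing is said to the user.
        self.task_stack.append(self.logical_form["action_type"])
        self.finished = True
        return None


class ToySay:
    """Hypothetical stand-in for any other DialogueObject: utters one line."""

    def __init__(self, utterance: str):
        self.utterance = utterance
        self.finished = False

    def step(self) -> Optional[str]:
        self.finished = True
        return self.utterance


def controller_step(
    chat: Optional[str],
    parse: Callable[[str], Dict],
    dialogue_stack: list,
    task_stack: List[str],
    said: List[str],
) -> None:
    """One pass of the controller sketch (assumed structure, for illustration)."""
    if chat is not None:
        logical_form = parse(chat)
        if logical_form["dialogue_type"] == "HUMAN_GIVE_COMMAND":
            dialogue_stack.append(ToyInterpreter(logical_form, task_stack))
        elif logical_form["dialogue_type"] == "GET_MEMORY":
            dialogue_stack.append(ToySay("let me check my memory"))
    if dialogue_stack:
        top = dialogue_stack[-1]  # highest priority object
        utterance = top.step()
        if utterance is not None:
            said.append(utterance)
        if top.finished:
            dialogue_stack.pop()
    if not task_stack:
        # The real agent only "maybe" does this; the toy always does.
        task_stack.append("default_behavior")
```

Calling controller_step once per agent tick with chat=None when nobody has spoken reproduces the three "if" branches of the sketch in order.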

Dialogue Stack and Manager

The Dialogue Stack holds Dialogue Objects, and steps them.

class base_agent.dialogue_stack.DialogueStack(agent, memory)[source]

This class organizes and steps DialogueObjects.


Its methods:

  • append a dialogue_object to the stack

  • clear the current stack

  • get the item on top of the DialogueStack

  • process and step through the top-of-stack dialogue object
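
As a rough illustration of these four operations, here is a minimal stack sketch. The class and method names (ToyDialogueStack, append, clear, peek, step) and the CountDown helper are assumptions made for this example; the real DialogueStack takes an agent and a memory and may behave differently, for instance in how it removes finished objects.

```python
from typing import List, Optional


class CountDown:
    """Toy dialogue object that finishes after a fixed number of steps."""

    def __init__(self, steps: int):
        self.remaining = steps
        self.finished = False

    def step(self) -> str:
        self.remaining -= 1
        self.finished = self.remaining <= 0
        return "steps left: {}".format(self.remaining)


class ToyDialogueStack:
    """Hypothetical minimal stack; not the droidlet DialogueStack."""

    def __init__(self) -> None:
        self.stack: List[CountDown] = []

    def append(self, dialogue_object: CountDown) -> None:
        self.stack.append(dialogue_object)

    def clear(self) -> None:
        self.stack = []

    def peek(self) -> Optional[CountDown]:
        return self.stack[-1] if self.stack else None

    def step(self) -> Optional[str]:
        # Discard finished objects, then step whatever is now on top.
        while self.stack and self.stack[-1].finished:
            self.stack.pop()
        if not self.stack:
            return None
        return self.stack[-1].step()
```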

The Dialogue Manager operates the Stack, and chooses whether to place Dialogue Objects on it.

class base_agent.nsp_dialogue_manager.NSPDialogueManager(agent, dialogue_object_classes, opts)[source]

Dialogue manager driven by neural network.

  • ~NSPDialogueManager.dialogue_objects (dict) – Dictionary specifying the DialogueObject class for each dialogue type. Keys are dialogue types. Values are corresponding class names. Example dialogue objects: {‘interpreter’: MCInterpreter, ‘get_memory’: GetMemoryHandler, ‘put_memory’: … }

  • ~NSPDialogueManager.safety_words (List[str]) – Set of blacklisted words or phrases. Commands containing these are automatically filtered out.

  • ~NSPDialogueManager.botGreetings (dict) – Different types of greetings that trigger scripted responses. Example: { “hello”: [“hi bot”, “hello”] }

  • ~NSPDialogueManager.model (TTADBertModel) – Semantic Parsing model that takes text as input and outputs a logical form. To use a new model here, ensure that the subfolder directory structure mirrors the current model/dataset directories. See TTADBertModel.

  • ~NSPDialogueManager.ground_truth_actions (dict) – A key-value with ground truth logical forms. These will be queried first (via exact string match), before running the model.

  • ~NSPDialogueManager.dialogue_object_parameters (dict) – Set the parameters for dialogue objects. Sets the agent, memory and dialogue stack.

  • agent – a droidlet agent, eg. CraftAssistAgent

  • dialogue_object_classes (dict) – Dictionary specifying the DialogueObject class for each dialogue type. See dialogue_objects definition above.

  • opts (argparse.Namespace) –

    Parsed command line arguments specifying parameters in agent.

    param --nsp_models_dir

    Path to directory containing all files necessary to load and run the model, including args, tree mappings and the checkpointed model. Semantic parsing models used by the current project are in ttad_bert_updated, e.g. ttad_bert_updated/caip_test_model.pth

    param --nsp_data_dir

    Path to directory containing all datasets used by the NSP model. Note that this data is not used in inference; instead, the agent loads from the ground truth data directory.

    param --ground_truth_data_dir

    Path to directory containing ground truth datasets loaded by the agent at runtime. It may optionally include a file of blacklisted words (safety.txt), a file of greetings (greetings.json), and .txt files with (text, logical_form) pairs in datasets/.

    See ArgumentParser for full list of command line options.

get_logical_form(s: str, model, chat_as_list=False) → Dict[source]

Get logical form output for a given chat command. First check the ground truth file for the chat string. If not in ground truth, query semantic parsing model to get the output.

  • s (str) – Input chat provided by the user.

  • model (TTADBertModel) – Semantic parsing model, pre-trained and loaded by agent


Logical form representation of the task. See the paper for a more in-depth explanation of logical forms.

Return type

Dict

>>> get_logical_form("destroy this", model)
{
    "dialogue_type": "HUMAN_GIVE_COMMAND",
    "action_sequence": [{
        "action_type": "DESTROY",
        "reference_object": {
            "filters": {"contains_coreference": "yes"},
            "text_span": [0, [1, 1]]
        }
    }]
}

handle_logical_form(speaker: str, d: Dict, chatstr: str) → Optional[dialogue_object.DialogueObject][source]

Return the appropriate DialogueObject to handle an action dict d. d should have spans filled (via process_spans).

maybe_get_dialogue_obj(chat: Tuple[str, str]) → Optional[dialogue_object.DialogueObject][source]

Process a chat and maybe modify the dialogue stack.


chat (Tuple[str, str]) – Incoming chat from a player. Format is (speaker, chat), e.g. ("player1", "build a red house")


DialogueObject or empty if no action is needed.
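
The lookup order described for this manager (filter on safety words, then an exact string match against the ground truth logical forms, then the model) might be sketched as below. The function lookup_logical_form and its parameters are hypothetical, and treating "containing" as a case-insensitive substring test against lowercase safety words is an assumption of this sketch, not a statement about NSPDialogueManager.

```python
from typing import Callable, Dict, List, Optional


def lookup_logical_form(
    chat: str,
    safety_words: List[str],
    ground_truth_actions: Dict[str, Dict],
    model_parse: Callable[[str], Dict],
) -> Optional[Dict]:
    """Hypothetical helper mirroring the lookup order described above."""
    # Commands containing a blacklisted word or phrase are filtered out.
    # Assumption: safety words are stored in lowercase.
    lowered = chat.lower()
    if any(word in lowered for word in safety_words):
        return None
    # Ground truth logical forms are queried first, by exact string match.
    if chat in ground_truth_actions:
        return ground_truth_actions[chat]
    # Otherwise fall back to the semantic parsing model.
    return model_parse(chat)
```

Passing the model as a plain callable keeps the sketch independent of any particular parser class.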

Semantic Parser

The training of the semantic parsing model we use is described in detail here; the interface is

class base_agent.ttad.ttad_transformer_model.query_model.TTADBertModel(model_dir, data_dir, model_name='caip_test_model')[source]

TTAD model class that loads a pretrained model and runs inference in the agent.

  • ~TTADBertModel.tokenizer (str) – Pretrained tokenizer used to tokenize input. Runs end-to-end tokenization, e.g. punctuation splitting and BPE.

  • ~TTADBertModel.dataset (CAIPDataset) – CAIP (CraftAssist Instruction Parsing) Dataset. Note that this is empty during inference.

  • ~TTADBertModel.encoder_decoder (EncoderDecoderWithLoss) – Transformer model class.

  • model_dir (str) – Path to directory containing all files necessary to load and run the model, including args, tree mappings and the checkpointed model. Semantic parsing models used by the current project are in ttad_bert_updated, e.g. ttad_bert_updated/caip_test_model.pth

  • data_dir (str) – Path to directory containing all datasets used by the NSP model. Note that this data is not used in inference; instead, the agent loads from the ground truth data directory.

parse(chat, noop_thres=0.95, beam_size=5, well_formed_pen=100.0)[source]

Given an incoming chat, query the parser and return a logical form. Uses beam search decoding, see beam_search


chat (str) – Preprocessed chat command from a player. Used as text input to parser.


Logical form.

Return type


Dialogue Objects

The generic Dialogue Object is

class base_agent.dialogue_objects.dialogue_object.DialogueObject(agent, memory, dialogue_stack, max_steps=50)[source]

DialogueObject class controls the agent’s use of the dialogue stack.


  • agent – the agent process

  • memory – agent’s memory

  • dialogue_stack – a stack on which dialogue objects are placed

  • finished – whether this object has finished processing

  • awaiting_response – whether this object is awaiting the speaker’s response to a question

  • max_steps – finish after this many steps to avoid getting stuck

  • current_step – current step count

  • progeny_data – data from progeny DialogueObjects, for example used to answer a clarification


from typing import Any, Dict, Optional, Tuple

class DummyGetMemoryHandler(DialogueObject):
    def __init__(self, speaker_name: str, action_dict: Dict, **kwargs):
        # initialize the base DialogueObject (agent, memory, dialogue_stack)
        super().__init__(**kwargs)
        self.speaker_name = speaker_name
        self.action_dict = action_dict

    def step(self) -> Tuple[Optional[str], Any]:
        # check for dialogue type "GET_MEMORY"
        assert self.action_dict["dialogue_type"] == "GET_MEMORY"
        memory_type = self.action_dict["filters"]["memory_type"]
        if memory_type not in ("AGENT", "REFERENCE_OBJECT"):
            raise ValueError("Unknown memory_type={}".format(memory_type))
        # mark as finished before returning, otherwise this line is never reached
        self.finished = True
        # handle these two by writing them to memory
        # (handle_reference_object is assumed to be defined elsewhere)
        return self.handle_reference_object()

Check if the object is finished processing.


The Dialogue Stack runs this object’s .step().


Returns a pair: the string to be uttered by the agent (or None), and progeny_data, the data for the parent of this object (or None).

Return type


A DialogueObject’s main method is .step(). Some other Dialogue Objects:

class base_agent.dialogue_objects.dialogue_object.Say(response_options, **kwargs)[source]

This class represents a sub-type of DialogueObject to say / send a chat to the user.


response_options – a list of responses to pick the final response from

class base_agent.dialogue_objects.dialogue_object.AwaitResponse(wait_time=800, **kwargs)[source]

This class represents a sub-type of DialogueObject to await a response from the user.

  • init_time – initial time

  • response – the response to the question asked

  • wait_time – how long should we await the response

  • awaiting_response – a flag to mark where we are awaiting the response
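
To show how these four attributes could interact, here is a hedged sketch of an await-with-timeout object. It is not the droidlet AwaitResponse: ToyAwaitResponse, the injected now and get_chat callables, and the reading of wait_time as a count in the same units as now() are all assumptions made for this example.

```python
from typing import Callable, Optional, Tuple


class ToyAwaitResponse:
    """Hypothetical await-with-timeout object; not the droidlet class."""

    def __init__(
        self,
        now: Callable[[], int],
        get_chat: Callable[[], Optional[str]],
        wait_time: int = 800,
    ):
        self.now = now
        self.get_chat = get_chat
        self.init_time = now()  # time at which waiting started
        self.wait_time = wait_time
        self.response: Optional[str] = None
        self.awaiting_response = True
        self.finished = False

    def step(self) -> Tuple[Optional[str], Optional[str]]:
        chat = self.get_chat()
        if chat is not None:
            # A response arrived: record it and stop waiting.
            self.response = chat
            self.finished = True
        elif self.now() - self.init_time > self.wait_time:
            # Waited too long: give up with no response.
            self.finished = True
        self.awaiting_response = not self.finished
        # Nothing is uttered; the response is handed back as progeny data.
        return None, self.response
```

Injecting the clock keeps the sketch testable without depending on how the agent actually measures time.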

class base_agent.dialogue_objects.dialogue_object.BotStackStatus(**kwargs)[source]

This class represents a sub-type of the DialogueObject to answer the user’s questions about the current status of the bot.


ing_mapping – A map from task name to present continuous tense language.

class base_agent.dialogue_objects.dialogue_object.ConfirmTask(question, tasks, **kwargs)[source]

This class represents a sub-type of the DialogueObject to ask a clarification question about something.

  • question – the question to ask the user

  • tasks – list of task objects

  • asked – flag to denote whether the clarification has been asked for

class base_agent.dialogue_objects.dialogue_object.ConfirmReferenceObject(reference_object, **kwargs)[source]

This class represents a sub-type of the DialogueObject to confirm if the reference object is correct.

  • bounds – general area of reference object to point at

  • pointed – flag determining whether the agent pointed at the area

  • asked – flag determining whether the confirmation was asked for


The Interpreter uses the world state (via memory) and a natural language utterance, parsed by the semantic parser into a logical form over the agent’s DSL, to choose a Task to put on the Task Stack. The locobot and craftassist Interpreters are not the same, but the bulk of the work is done by the shared subinterpreters (in the files * here). The subinterpreters, registered in the main Interpreter here (and for the specialized versions here and here), roughly follow the structure of the DSL. This arrangement allows replacing the (currently heuristic) subinterpreters with learned versions or specializing them to new agents.

class base_agent.dialogue_objects.interpreter.Interpreter(speaker: str, action_dict: Dict, **kwargs)[source]
This class processes incoming chats and modifies the task stack.
Handlers should add/remove/reorder tasks on the stack, but not execute them.
Most of the logic of the interpreter is run in the subinterpreters or task handlers.
The keyword args in __init__ match the base DialogueObject class
  • speaker – The name of the player/human/agent who uttered the chat resulting in this interpreter

  • action_dict – The logical form, e.g. returned by a semantic parser

Keyword Arguments
  • agent – the agent running this Interpreter

  • memory – the agent’s memory

  • dialogue_stack – a DialogueStack object where this Interpreter object will live
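
The idea of registered, replaceable subinterpreters can be illustrated with a toy. Everything here is a sketch under assumptions: SketchInterpreter, resolve_reference, the "reference_objects" registry key, the "last_attended_object" memory key, the "name" field, and the (task name, target) task tuples are invented for this example and do not describe the real Interpreter’s internals.

```python
from typing import Callable, Dict, List, Tuple

Task = Tuple[str, str]  # (task name, target); a made-up task representation


def resolve_reference(ref: Dict, memory: Dict) -> str:
    """Toy heuristic subinterpreter: fill in a coreference from memory."""
    if ref.get("filters", {}).get("contains_coreference") == "yes":
        return memory["last_attended_object"]  # hypothetical memory key
    return ref["name"]  # hypothetical field for explicitly named objects


class SketchInterpreter:
    """Hypothetical interpreter that fully specifies a logical form via memory."""

    def __init__(self, speaker: str, action_dict: Dict, memory: Dict, task_stack: List[Task]):
        self.speaker = speaker
        self.action_dict = action_dict
        self.memory = memory
        self.task_stack = task_stack
        self.finished = False
        # Registry of subinterpreters; an entry can be swapped for a learned model.
        self.subinterpret: Dict[str, Callable[[Dict, Dict], str]] = {
            "reference_objects": resolve_reference,
        }

    def step(self) -> Tuple[None, None]:
        for action in self.action_dict["action_sequence"]:
            resolve = self.subinterpret["reference_objects"]
            target = resolve(action["reference_object"], self.memory)
            # Add the task to the stack, but do not execute it here.
            self.task_stack.append((action["action_type"], target))
        self.finished = True
        return None, None
```

Replacing self.subinterpret["reference_objects"] with another callable of the same shape changes how references are resolved without touching step().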