Agents

We instantiate a droidlet agent on a Locobot and an agent in Minecraft using the Craftassist framework (the droidlet project evolved from Craftassist).

Locobot

Locobot Perception

We have a high-level pipeline that runs many of the perception handlers that exist. This pipeline is split into Perception and SlowPerception.

Pipelines

Perception consists only of the fast processing modules that can run in the main thread and typically need to run at a high frequency. SlowPerception consists of heavier modules, such as face recognition and object detection, that can run at a much lower frequency in a background thread.
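
A minimal sketch of this split, assuming placeholder run_fast_models/run_heavy_models helpers (not droidlet APIs): fast handlers run inline every step, while the heavy handlers run in a background process whose latest result is fetched when ready.

    import multiprocessing as mp
    import queue

    def run_fast_models(frame):
        return {"frame": frame}          # placeholder for cheap per-step handlers

    def run_heavy_models(frame):
        return {"detections": []}        # placeholder for detection, face recognition, ...

    def slow_worker(in_q, out_q):
        while True:
            frame = in_q.get()
            if frame is None:            # sentinel to shut the worker down
                break
            out_q.put(run_heavy_models(frame))

    if __name__ == "__main__":
        in_q, out_q = mp.Queue(maxsize=1), mp.Queue()
        mp.Process(target=slow_worker, args=(in_q, out_q), daemon=True).start()

        def perceive(frame, force=False):
            fast_out = run_fast_models(frame)
            try:
                in_q.put_nowait(frame)             # hand the newest frame to the slow worker
            except queue.Full:
                pass
            try:
                slow_out = out_q.get(block=force)  # force=True waits for the slow result
            except queue.Empty:
                slow_out = None
            return fast_out, slow_out

        print(perceive("frame-0", force=True))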

class locobot.agent.perception.Perception(agent, model_data_dir)[source]

Home for all perceptual modules used by the LocobotAgent.

It provides a multiprocessing mechanism to run the more compute intensive perceptual models (for example our object detector) as a separate process.

Parameters
  • agent (LocoMCAgent) – reference to the LocobotAgent

  • model_data_dir (string) – path for all perception models (default: ~/locobot/agent/models/perception)

log(rgb_depth, detections, humans, old_rgb_depth, old_xyz)[source]

Log all relevant data from the perceptual models for the dashboard.

All the data here is sent to the dashboard using a socketio emit event. This data is then used by the dashboard for different debugging visualizations. Add any data that you would want to fetch on the dashboard here.

Parameters
  • rgb_depth (RGBDepth) – the current RGBDepth frame. This frame might be different from old_rgb_depth, which is the RGBDepth frame for which SlowPerception has been run.

  • detections (list[Detections]) – list of all detections

  • humans (list[Human]) – list of all humans detected

  • old_rgb_depth (RGBDepth) – the RGBDepth frame for which detections and humans are being sent. This is the frame for which SlowPerception has been run.

  • old_xyz (list[floats]) – the (x, y, yaw) of the robot at the time of old_rgb_depth
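
A rough sketch of the kind of emit log() performs, assuming a python-socketio Server instance; the event name and payload fields here are illustrative, not droidlet's actual dashboard protocol.

    import socketio

    sio = socketio.Server(async_mode="threading")

    def log_to_dashboard(rgb_depth, detections, humans, old_xyz):
        # Arrays and tuples are converted to plain lists so the payload is JSON-serializable.
        sio.emit("perception_payload", {
            "image_shape": list(rgb_depth.rgb.shape),
            "num_detections": len(detections),
            "num_humans": len(humans),
            "robot_x_y_yaw": list(old_xyz),
        })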

perceive(force=False)[source]

Called by the core event loop for the agent to run all perceptual models and save their state to memory. It fetches the results of SlowPerception if they are ready.

Parameters

force (boolean) – set to True to wait for the SlowPerception models to finish and run all perceptual models sequentially (a useful debugging tool) (default: False)

setup_vision_handlers()[source]

Set up all vision handlers by defining an attribute dict of the different perception handlers.

class locobot.agent.perception.SlowPerception(model_data_dir)[source]

Home for all slow perceptual modules used by the LocobotAgent.

It is run as a separate process and has all the compute intensive perceptual models.

Parameters

model_data_dir (string) – path for all perception models (default: ~/locobot/agent/models/perception)

perceive(rgb_depth)[source]

Run all perception handlers on the current RGBDepth frame.

Parameters

rgb_depth (RGBDepth) – input frame to run all perception handlers on.

Returns

rgb_depth (RGBDepth): the input frame that all perception handlers were run on

detections (list[Detections]): list of all detections

humans (list[Human]): list of all humans detected

setup_vision_handlers()[source]

Set up all vision handlers by defining an attribute dict of the different perception handlers.

Components

These pipelines are powered by components that can be strung together in arbitrary ways to create your own custom pipeline:
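
For example, a rough sketch of such a chain, assuming an agent instance and a model_data_dir path are already in scope (the handler signatures follow the class docs below):

    from locobot.agent.perception.handlers import (
        DetectionHandler,
        InputHandler,
        MemoryHandler,
    )

    input_handler = InputHandler(agent, read_from_camera=True)
    detection_handler = DetectionHandler(model_data_dir)
    memory_handler = MemoryHandler(agent)

    rgb_depth = input_handler.handle()                # fetch the current sensor frame
    detections = detection_handler.handle(rgb_depth)  # run the object detector
    memory_handler.handle(rgb_depth.rgb, detections)  # write the detected WorldObjects to memory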

class locobot.agent.perception.handlers.InputHandler(agent, read_from_camera=True)[source]

Fetches all input sensor data from the robot.

Parameters
  • agent (LocoMCAgent) – reference to the agent instance

  • read_from_camera (boolean) – enable/disable reading from the camera sensor; when disabled, input may instead be read from a local video (default: True)

handle()[source]

Implement this method to hold the core execution logic.

class locobot.agent.perception.handlers.DetectionHandler(model_data_dir)[source]

Class for object detector.

We use a modified Mask R-CNN with an additional head that predicts object properties.

Parameters

model_data_dir (string) – path to the model directory

handle(rgb_depth)[source]

the inference logic for the handler lives here.

Parameters

rgb_depth (RGBDepth) – the current input frame to run inference on.

Returns

list of detections found

Return type

detections (list[Detections])

class locobot.agent.perception.handlers.HumanPoseHandler(model_data_dir)[source]

Class for Human Pose estimation.

We use a keypoint estimator.

Parameters

model_data_dir (string) – path to the model directory

handle(rgb_depth)[source]

Implement this method to hold the core execution logic.

class locobot.agent.perception.handlers.FaceRecognitionHandler(face_ids_dir=None)[source]

Class for Face Recognition models.

We use a keypoint estimator.

Parameters
  • face_ids_dir (string) – path to faces used to seed the face recognizer with

  • silent (boolean) – boolean to suppress all self.draw() calls

detect_faces(rgb_depth)[source]

Finds all of the faces in a given image, labels those it recognizes, and saves them in the object.

Parameters

rgb_depth – the captured picture by Locobot camera

get_encoded_faces()[source]

looks through the faces folder and encodes all the faces.

Returns

dict of (name, image encoded)

static get_facenet_boxes(img)[source]

get the face locations from facenet module.

get_union(locations1, locations2)[source]

Get the union of two sets of bounding boxes.

handle(rgb_depth)[source]

Implement this method to hold the core execution logic.

static overlap(box1, box2)[source]

Check whether two bounding boxes overlap. Each box’s coordinates are (y1, x2, y2, x1).
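
A hedged sketch of such an overlap test (not necessarily the droidlet implementation), using the (y1, x2, y2, x1) coordinate order documented above:

    def boxes_overlap(box1, box2):
        y1a, x2a, y2a, x1a = box1
        y1b, x2b, y2b, x1b = box2
        # No overlap if one box lies entirely to the left/right of, or above/below, the other.
        if x2a <= x1b or x2b <= x1a:
            return False
        if y2a <= y1b or y2b <= y1a:
            return False
        return True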

class locobot.agent.perception.handlers.LaserPointerHandler[source]

Identifies a specially prepared laser pointer in an image, to use it for proprioception.

static get_hsv_laser_mask(image)[source]

get the masked image with the laser dot using HSV color model.

static get_laser_point_center(mask)[source]

get the center of a masked laser pointer in screen coordinates.

handle(rgb_depth)[source]

Implement this method to hold the core execution logic.
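
A minimal OpenCV sketch of the HSV-mask idea: threshold the image in HSV space around a bright laser color, then take the mask's centroid as the dot's screen position. The HSV bounds here are assumptions, not the values used by LaserPointerHandler.

    import cv2
    import numpy as np

    def hsv_laser_mask(bgr_image):
        hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
        lower = np.array([150, 70, 200])     # assumed bounds for a bright magenta dot
        upper = np.array([180, 255, 255])
        return cv2.inRange(hsv, lower, upper)

    def laser_point_center(mask):
        m = cv2.moments(mask)
        if m["m00"] == 0:
            return None                      # no laser dot found
        return int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])  # (x, y) in pixels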

class locobot.agent.perception.handlers.TrackingHandler[source]

Class for real-time 2D object tracking.

We use the results of DetectionHandler and the norfair tracking library (https://github.com/tryolabs/norfair)

handle(rgb_depth, detections)[source]

run tracker on the current rgb_depth frame for the detections found
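
A rough sketch of how norfair can be driven with detection centers (this shows assumed norfair usage, not the droidlet wrapper's exact code):

    import numpy as np
    from norfair import Detection, Tracker

    def center_distance(detection, tracked_object):
        return np.linalg.norm(detection.points - tracked_object.estimate)

    tracker = Tracker(distance_function=center_distance, distance_threshold=30)

    def track(detection_centers):
        # detection_centers: list of (x, y) pixel centers from the detector
        norfair_detections = [Detection(points=np.array([c])) for c in detection_centers]
        return tracker.update(detections=norfair_detections)   # list of tracked objects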

class locobot.agent.perception.handlers.MemoryHandler(agent)[source]

Class for saving the state of the world parsed by all perceptual models to memory.

The MemoryHandler also performs object reidentification and is responsible for maintaining a consistent state of the world using the output of all other perception handlers.

Parameters

agent (LocoMCAgent) – reference to the agent.

handle(rgb, objects, dedupe=True)[source]

Run the memory handler for the current rgb frame and detected objects.

This is also where each WorldObject is assigned a unique entity id (eid).

Parameters
  • rgb (np.array) – current RGB image for which the handlers are being run

  • objects (list[WorldObject]) – a list of all WorldObjects detected

  • dedupe (boolean) – whether to deduplicate objects. If False, all objects are considered to be new/novel objects that the agent hasn’t seen before. If True, the handler tries to identify whether it has seen a WorldObject before and updates it if it has.

is_novel(rgb, world_obj, dedupe)[source]

This is long-term tracking (not in-frame): it does some feature matching to figure out if we’ve seen this exact instance of an object before.

It uses the cosine similarity of conv features and, separately, the distance between the two objects.
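
A hedged sketch of that re-identification test; the thresholds and feature extraction are assumptions, not droidlet's actual values:

    import numpy as np

    def is_same_instance(feat_new, feat_old, xyz_new, xyz_old,
                         sim_thresh=0.9, dist_thresh=0.5):
        # Appearance check: cosine similarity between conv feature vectors.
        cos_sim = np.dot(feat_new, feat_old) / (
            np.linalg.norm(feat_new) * np.linalg.norm(feat_old) + 1e-8
        )
        # Spatial check: Euclidean distance between the two object positions, in meters.
        dist = np.linalg.norm(np.asarray(xyz_new) - np.asarray(xyz_old))
        return cos_sim > sim_thresh and dist < dist_thresh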

Data Structures

The components use data structure classes to hold metadata such as object information; these classes also have convenience functions registered on them.

class locobot.agent.perception.RGBDepth(rgb, depth, pts)[source]

Class for the current RGB, depth and point cloud fetched from the robot.

Parameters
  • rgb (np.array) – RGB image fetched from the robot

  • depth (np.array) – depth map fetched from the robot

  • pts (list[(x,y,z)]) – list of x, y, z coordinates of the point cloud corresponding to the rgb and depth maps.

get_coords_for_point(point)[source]

fetches xyz from the point cloud in pyrobot coordinates and converts it to canonical world coordinates.
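
A hedged sketch of that conversion, assuming pyrobot's usual convention (x forward, y left, z up) and the canonical convention described later on this page (x right, y up, z forward); the exact mapping in droidlet may differ in sign conventions.

    import numpy as np

    def pyrobot_to_canonical(xyz):
        x, y, z = xyz
        # right = -left, up = up, forward = forward
        return np.array([-y, z, x])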

class locobot.agent.perception.WorldObject(label, center, rgb_depth, mask=None, xyz=None)[source]
get_xyz()[source]

returns xyz in canonical world coordinates.

class locobot.agent.perception.Human(rgb_depth: locobot.agent.perception.handlers.core.RGBDepth, keypoints: locobot.agent.perception.handlers.human_pose.HumanKeypoints)[source]

Instantiation of the WorldObject that is used by the human pose estimator.

class locobot.agent.perception.HumanKeypoints(nose, left_eye, right_eye, left_ear, right_ear, left_shoulder, right_shoulder, left_elbow, right_elbow, left_wrist, right_wrist, left_hip, right_hip, left_knee, right_knee, left_ankle, right_ankle)
property left_ankle

Alias for field number 15

property left_ear

Alias for field number 3

property left_elbow

Alias for field number 7

property left_eye

Alias for field number 1

property left_hip

Alias for field number 11

property left_knee

Alias for field number 13

property left_shoulder

Alias for field number 5

property left_wrist

Alias for field number 9

property nose

Alias for field number 0

property right_ankle

Alias for field number 16

property right_ear

Alias for field number 4

property right_elbow

Alias for field number 8

property right_eye

Alias for field number 2

property right_hip

Alias for field number 12

property right_knee

Alias for field number 14

property right_shoulder

Alias for field number 6

property right_wrist

Alias for field number 10
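
HumanKeypoints is a namedtuple (hence the "alias for field number N" entries above), with fields in the usual COCO keypoint order. A small usage sketch, where the 17 (x, y) placeholder points are illustrative:

    from locobot.agent.perception import HumanKeypoints

    coco_points = [(0, 0)] * 17        # placeholder keypoints in COCO order
    kp = HumanKeypoints(*coco_points)
    print(kp.nose)                     # same as kp[0]
    print(kp.left_wrist)               # same as kp[9]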

class locobot.agent.perception.Detection(rgb_depth: locobot.agent.perception.handlers.core.RGBDepth, class_label, properties, mask, bbox, face_tag=None, center=None, xyz=None)[source]

Instantiation of the WorldObject that is used by the detector.

Locobot PyRobot interface

We have a RemoteLocobot object that runs on the robot, and marshals data back and forth from the robot to the server. Additionally, on the server-side, we have a LoCoBotMover class that communicates with RemoteLocobot and provides a low-level API to the robot.
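
A rough usage sketch of the server-side mover; the IP address and the particular sequence of calls are illustrative, not a prescribed workflow.

    from locobot.agent.locobot_mover import LoCoBotMover

    mover = LoCoBotMover(ip="192.168.0.42", backend="habitat")   # hypothetical IP

    rgb_depth = mover.get_rgb_depth()                  # RGBDepth frame from the robot
    x, z, yaw = mover.get_base_pos_in_canonical_coords()
    mover.move_relative([(0.5, 0.0, 0.0)])             # move 0.5 m forward (pyrobot coordinates)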

class locobot.robot.RemoteLocobot(backend='locobot', backend_config=None)[source]

PyRobot interface for the Locobot.

Parameters
  • backend (string) – the backend for the Locobot (“habitat” for the Locobot in Habitat, and “locobot” for the physical LoCoBot) (default: “locobot”)

  • backend_config (dict) – the backend config used for connecting to Habitat (default: None)

close_gripper()[source]

Commands gripper to close fully.

command_finished()[source]

Returns whether previous executed command finished execution.

Returns

command execution state (True if finished executing, False if still executing)

Return type

bool

dip_pix_to_3dpt()[source]

Return the point cloud at current time step in robot base frame.

Returns

point coordinates (shape: \([N, 3]\)) in metric unit

Return type

list

get_base_state(state_type)[source]

Returns the base pose of the robot in the (x,y, yaw) format as computed either from Wheel encoder readings or Visual-SLAM.

Parameters

state_type (string) – Requested state type. Ex: Odom, SLAM, etc

Returns

pose of the form [x, y, yaw]

Return type

list

get_camera_state()[source]

Return the current pan and tilt joint angles of the robot camera in radian.

Returns

A list the form [pan angle, tilt angle]

Return type

list

get_current_pcd(in_cam=False, in_global=False)[source]

Return the point cloud at current time step.

Parameters

in_cam (bool) – return points in camera frame, otherwise, return points in base frame

Returns

tuple (pts, colors)

pts: point coordinates (shape: \([N, 3]\)) in metric unit

colors: rgb values (shape: \([N, 3]\))

Return type

tuple(list, list)

get_depth()[source]

Returns the depth image perceived by the camera.

Returns

depth image in meters, dtype-> float32

Return type

np.ndarray or None

get_depth_bytes()[source]

Returns the depth image perceived by the camera.

Returns

depth image in meters, dtype-> bytes

Return type

np.ndarray or None

get_end_eff_pose()[source]

Return the end effector pose w.r.t. ‘ARM_BASE_FRAME’.

Returns

tuple (trans, rot_mat, quat)

trans: translational vector in metric unit (shape: \([3, 1]\))

rot_mat: rotational matrix (shape: \([3, 3]\))

quat: rotational matrix in the form of quaternion (shape: \([4,]\))

Return type

tuple (list, list, list)

get_gripper_state()[source]

Return the gripper state.

Returns

gripper state: -1 = unknown gripper state, 0 = gripper is fully open, 1 = gripper is closing, 2 = there is an object in the gripper, 3 = gripper is fully closed

Return type

int

get_img_resolution()[source]

Return the image height and width.

get_intrinsics()[source]

Returns the intrinsic matrix of the camera.

Returns

the intrinsic matrix (shape: \([3, 3]\))

Return type

list

get_joint_positions()[source]

Return the arm joint angles, ordered from the base joint (index 0) to the wrist joint (index -1).

Returns

joint_angles in radians

Return type

list

get_joint_velocities()[source]

Return the arm joint velocities, ordered from the base joint (index 0) to the wrist joint (index -1).

Returns

joint velocities in rad/sec

Return type

list

get_pan()[source]

Return the current pan joint angle of the robot camera.

Returns

pan joint angle in radians

Return type

float

get_pcd_data()[source]

Gets all the data to calculate the point cloud for a given rgb, depth frame.

get_rgb()[source]

Returns the RGB image perceived by the camera.

Returns

image in RGB [h, w, c] format, dtype -> uint8

Return type

np.ndarray or None

get_rgb_bytes()[source]

Returns the RGB image perceived by the camera.

Returns

image in RGB [h, w, c] format, dtype -> bytes

Return type

np.ndarray or None

get_tilt()[source]

Return the current tilt joint angle of the robot camera in radian.

Returns

tilt joint angle

Return type

float

get_transform(src_frame, dst_frame)[source]

Return the transform from src_frame to dst_frame.

Parameters
  • src_frame (str) – source frame

  • dst_frame (str) – destination frame

Returns

tuple (trans, rot_mat, quat)

trans: translational vector (shape: \([3, 1]\))

rot_mat: rotational matrix (shape: \([3, 3]\))

quat: rotational matrix in the form of quaternion (shape: \([4,]\))

Return type

tuple(np.ndarray, np.ndarray, np.ndarray)

go_home()[source]

Moves the robot base to the origin: (x, y, yaw) = (0, 0, 0).

go_to_absolute(xyt_position, use_map=False, close_loop=True, smooth=False)[source]

Moves the robot base to given goal state in the world frame.

Parameters
  • xyt_position (list or np.ndarray) – The goal state of the form (x,y,yaw) in the world (map) frame.

  • use_map (bool) – When set to “True”, ensures that the controller uses only free space on the map to move the robot.

  • close_loop (bool) – When set to “True”, ensures that the controller operates in closed loop, taking odometry into account.

  • smooth (bool) – When set to “True”, ensures that the motion leading to the goal is smooth.

go_to_relative(xyt_position, use_map=False, close_loop=True, smooth=False)[source]

Moves the robot base to the given goal state relative to its current pose.

Parameters
  • xyt_position (list or np.ndarray) – The relative goal state of the form (x,y,yaw)

  • use_map (bool) – When set to “True”, ensures that the controller uses only free space on the map to move the robot.

  • close_loop (bool) – When set to “True”, ensures that the controller operates in closed loop, taking odometry into account.

  • smooth (bool) – When set to “True”, ensures that the motion leading to the goal is smooth.

grasp(dims=[(240, 480), (100, 540)])[source]
Parameters

dims (list) – List of tuples of min and max indices of the image axis to be considered for grasp search

move_ee_xyz(displacement, eef_step=0.005, plan=False)[source]

Keep the current orientation of the arm fixed and move the end effector in the (x, y, z) directions.

Parameters
  • displacement (list or np.ndarray) – (delta_x, delta_y, delta_z) in metric unit

  • eef_step (float) – resolution (m) of the interpolation on the cartesian path

  • plan (bool) – use MoveIt to plan a path to move to the desired pose. If False, it does linear interpolation along the path, simply using the IK solver to find the sequence of desired joint positions, and then calls set_joint_positions

open_gripper()[source]

Commands gripper to open fully.

pix_to_3dpt(rs, cs, in_cam=False)[source]

Get the 3D points of the pixels in RGB images in metric unit.

Parameters
  • rs (list or np.ndarray) – rows of interest in the RGB image. It can be a list or 1D numpy array which contains the row indices.

  • cs (list or np.ndarray) – columns of interest in the RGB image. It can be a list or 1D numpy array which contains the column indices.

  • in_cam (bool) – return points in camera frame, otherwise, return points in base frame

Returns

tuple (pts, colors)

pts: point coordinates in metric unit (shape: \([N, 3]\))

colors: rgb values (shape: \([N, 3]\))

Return type

tuple(list, list)
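
A small usage sketch, assuming bot is a RemoteLocobot instance: query the metric 3D points behind a few pixels of the current RGB image.

    rows = [100, 120, 140]   # pixel rows of interest (rs)
    cols = [200, 220, 240]   # pixel columns of interest (cs)
    pts, colors = bot.pix_to_3dpt(rows, cols, in_cam=False)
    # pts[i] is the (x, y, z), in meters, of pixel (rows[i], cols[i]) in the base frame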

reset()[source]

This function resets the pan and tilt joints of the robot camera by actuating them to their home configuration [0.0, 0.0].

set_ee_pose(position, orientation, plan=False)[source]

Commands robot arm to desired end-effector pose (w.r.t. ‘ARM_BASE_FRAME’). Computes IK solution in joint space and calls set_joint_positions.

Parameters
  • position (list or np.ndarray) – position of the end effector in metric (shape: \([3,]\))

  • orientation (list or np.ndarray) – orientation of the end effector (can be rotation matrix, euler angles (yaw, pitch, roll), or quaternion) (shape: \([3, 3]\), \([3,]\) or \([4,]\)) The convention of the Euler angles here is z-y’-x’ (intrinsic rotations), which is one type of Tait-Bryan angles.

  • plan (bool) – use MoveIt to plan a path to move to the desired pose

set_joint_positions(target_joint, plan=False)[source]

Sets the desired joint angles for all arm joints.

Parameters
  • target_joint (list) – list of target joint angles in radians, of length equal to the number of joints (5 for the Locobot), ordered from the base joint (index 0) to the wrist joint (index -1)

  • plan (bool) – whether to use moveit to plan a path. Without planning, there is no collision checking and each joint will move to its target joint position directly.

set_joint_velocities(target_vels)[source]

Sets the desired joint velocities for all arm joints.

Parameters

target_vels (list) – target joint velocities in radians/sec, a list of length equal to the number of joints (5 for the Locobot), ordered from the base joint (index 0) to the wrist joint (index -1)

set_pan(pan, wait=True)[source]

Sets the pan joint angle of robot camera to the specified value.

Parameters
  • pan (float) – value to be set for pan joint in radian

  • wait (bool) – wait until the pan angle is set to the target angle.

set_pan_tilt(pan, tilt, wait=True)[source]

Sets both the pan and tilt joint angles of the robot camera to the specified values.

Parameters
  • pan (float) – value to be set for pan joint in radian

  • tilt (float) – value to be set for the tilt joint in radian

  • wait (bool) – wait until the pan and tilt angles are set to the target angles.

set_tilt(tilt, wait=True)[source]

Sets the tilt joint angle of robot camera to the specified value.

Parameters
  • tilt (float) – value to be set for the tilt joint in radian

  • wait (bool) – wait until the tilt angle is set to the target angle.

stop()[source]

stops robot base movement.

transform_pose(XYZ, current_pose)[source]

Transforms the point cloud into the geocentric frame to account for the camera position.

Parameters
  • XYZ – …x3

  • current_pose – camera position (x, y, theta (radians))

Returns

…x3

Return type

XYZ

class locobot.agent.locobot_mover.LoCoBotMover(ip=None, backend='locobot')[source]

Implements methods that call the physical interfaces of the Locobot.

Parameters
  • ip (string) – IP of the Locobot.

  • backend (string) – backend where the Locobot lives, either “habitat” or “locobot”

get_base_pos()[source]

Return Locobot (x, y, yaw) in the robot base coordinates as illustrated in the docs:

https://www.pyrobot.org/docs/navigation

get_base_pos_in_canonical_coords()[source]

get the current Locobot position in the canonical coordinate system instead of the Locobot’s global coordinates as stated in the Locobot documentation: https://www.pyrobot.org/docs/navigation.

The standard coordinate system: the camera looks at (0, 0, 1), its right direction is (1, 0, 0), and its up direction is (0, 1, 0).

Returns

(x, z, yaw) of the Locobot base in standard coordinates

get_pan()[source]

get yaw in radians.

get_rgb_depth()[source]

Fetches rgb, depth and pointcloud in pyrobot world coordinates.

Returns

an RGBDepth object

get_tilt()[source]

get pitch in radians.

grab_nearby_object(bounding_box=[(240, 480), (100, 540)])[source]
Parameters

bounding_box – region in image to search for grasp

look_at(obj_pos, yaw_deg, pitch_deg)[source]

Executes “look at” by setting the pan, tilt of the camera or turning the base if required.

Uses both the base state and object coordinates in canonical world coordinates to calculate expected yaw and pitch.

Parameters
  • obj_pos (list) – object coordinates as saved in memory.

  • yaw_deg (float) – yaw in degrees

  • pitch_deg (float) – pitch in degrees

Returns

string “finished”
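
A hedged sketch of the geometry behind "look at": given the camera position and an object position, both in canonical coordinates (x right, y up, z forward), compute a pan (yaw) and tilt (pitch) that point the camera at the object. The axis and sign conventions here are assumptions, not droidlet's exact code.

    import numpy as np

    def yaw_pitch_to_target(cam_pos, obj_pos):
        dx = obj_pos[0] - cam_pos[0]   # right
        dy = obj_pos[1] - cam_pos[1]   # up
        dz = obj_pos[2] - cam_pos[2]   # forward
        yaw = np.degrees(np.arctan2(dx, dz))                 # turn left/right toward the object
        pitch = np.degrees(np.arctan2(dy, np.hypot(dx, dz))) # tilt up is positive (assumed)
        return yaw, pitch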

move_absolute(xyt_positions)[source]

Command to execute a move to an absolute position.

It receives positions in canonical world coordinates and converts them to pyrobot’s coordinates before calling the bot APIs.

Parameters
  • xyt_positions – a list of (x_c, y_c, yaw) positions for the bot to move to. (x_c, y_c, yaw) are in canonical world coordinates.

move_relative(xyt_positions)[source]

Command to execute a relative move.

Parameters
  • xyt_positions – a list of relative (x, y, yaw) positions for the bot to execute. (x, y, yaw) are in pyrobot’s coordinates.

reset_camera()[source]

reset the camera to 0 pan and tilt.

turn(yaw)[source]

turns the bot by the yaw specified.

Parameters

yaw – the yaw to execute, in degrees.

Craftassist

Details for setting up and running the Cuberite server and Craftassist agent are here.

Craftassist Perception

The craftassist perception modules are mostly heuristic.

class craftassist.agent.low_level_perception.LowLevelMCPerception(agent, perceive_freq=5)[source]

Perceive the world at a given frequency and update agent memory.

Updates positions of other players, mobs, the agent itself, and changed blocks, taking this information directly from the Cuberite server.

Parameters
  • agent (LocoMCAgent) – reference to the minecraft Agent

  • perceive_freq (int) – if not forced, how many Agent steps between perception

perceive(force=False)[source]

Every n agent steps (defined by perceive_freq), update in agent memory the location/pose of all agents, players, and mobs; item stack positions; and changed blocks.

Parameters

force (boolean) – set to True to run all perceptual heuristics right now, as opposed to waiting for perceive_freq steps (default: False)

class craftassist.agent.heuristic_perception.PerceptionWrapper(agent, perceive_freq=20)[source]

Perceive the world at a given frequency and update agent memory.

Runs three basic heuristics:
  • The first finds “interesting” visible (e.g. touching air) blocks and creates InstSegNodes.

  • The second finds “holes” and creates InstSegNodes.

  • The third finds connected components of “interesting” blocks and creates BlockObjectNodes. Note that this only places something in memory if the connected component is newly within 15 blocks of the agent but was not recently placed (if it were newly visible because of a block placement, it would be handled via maybe_add_block_to_memory in low_level_perception.py).

Parameters
  • agent (LocoMCAgent) – reference to the minecraft Agent

  • perceive_freq (int) – if not forced, how many Agent steps between perception

perceive(force=False)[source]

Called by the core event loop for the agent to run all perceptual models and save their state to memory.

Parameters

force (boolean) – set to True to run all perceptual heuristics right now, as opposed to waiting for perceive_freq steps (default: False)
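
A minimal sketch of the frequency gating that both Craftassist perception wrappers use: heuristics run only every perceive_freq agent steps unless force=True. The attribute names (agent.count, run_heuristics) are illustrative, not the exact source.

    class GatedPerception:
        def __init__(self, agent, perceive_freq=20):
            self.agent = agent
            self.perceive_freq = perceive_freq

        def perceive(self, force=False):
            due = self.perceive_freq > 0 and self.agent.count % self.perceive_freq == 0
            if not (due or force):
                return                    # skip this agent step
            self.run_heuristics()         # placeholder for the heuristics described above

        def run_heuristics(self):
            pass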

However, there are also voxel models for semantic segmentation; one is here. Its interface is:

class craftassist.agent.voxel_models.subcomponent_classifier.SubcomponentClassifierWrapper(agent, model_path, perceive_freq=0)[source]

Perceive the world at a given frequency and update agent memory.

creates InstSegNodes and tags them

Parameters
  • agent (LocoMCAgent) – reference to the minecraft Agent

  • model_path (str) – path to the segmentation model

  • perceive_freq (int) – if not forced, how many Agent steps between perception. If 0, does not run unless forced

perceive(force=False)[source]

Run the classifiers and put tags in memory.

Parameters

force (boolean) – set to True to run all perceptual heuristics right now, as opposed to waiting for perceive_freq steps (default: False)