The Scene Understanding and Modeling Challenge

The SUMO challenge encourages the development of algorithms for complete understanding of 3D indoor scenes from 360° RGB-D panoramas, with the goal of enabling social AR and VR research and experiences. The target 3D models of indoor scenes include all visible layout elements and objects, complete with pose, semantic information, and texture. Submitted algorithms are evaluated at three levels of complexity, corresponding to the three tracks of the challenge: oriented 3D bounding boxes, oriented 3D voxel grids, and oriented 3D meshes.

[Figure: 360° RGB-D input (360° RGB and 360° depth panoramas) alongside the complete 3D scene output (3D texture + pose; 3D semantic + instance labels)]


The SUMO challenge dataset is derived from processing scenes from the SUNCG dataset to produce 360° RGB-D images represented as cubemaps and corresponding 3D mesh models of all visible scene elements. The mesh models are further processed into a bounding box and voxel-based representation. The dataset format is described in detail here.
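To make the cubemap RGB-D representation concrete, the sketch below back-projects one cubemap face's depth map into 3D points. It assumes each face is a 90°-field-of-view pinhole view and that the depth map stores distance along the optical axis; the function name and these conventions are illustrative assumptions, not the official SUMO data format (which is documented separately).

```python
import numpy as np

def face_depth_to_points(depth, fov_deg=90.0):
    """Back-project one cubemap face's depth map into 3D points in the
    face's camera frame. A sketch only: assumes a pinhole model with the
    given field of view and z-depth values (not the official SUMO API)."""
    h, w = depth.shape
    # Focal length in pixels implied by the face's field of view.
    f = (w / 2.0) / np.tan(np.radians(fov_deg) / 2.0)
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0  # principal point at the center
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / f
    y = (v - cy) * z / f
    return np.stack([x, y, z], axis=-1)  # shape (H, W, 3)
```

Stacking the six faces' point clouds (after rotating each into a common frame) yields the full 360° geometry of the panorama.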

59K Indoor Scenes

1024 × 1024 RGB Images

1024 × 1024 Depth Maps

2D Semantic Information

3D Semantic Information

3D Object Pose

3D Element Texture

3D Bounding Box Scene Representation

3D Voxel Grid Scene Representation

3D Mesh Scene Representation


The SUMO Challenge is organized into three performance tracks based on the output representation of the scene. A scene is represented as a collection of elements, each of which models one object in the scene (e.g., a wall, the floor, or a chair). An element is represented in one of three increasingly descriptive representations: bounding box, voxel grid, or surface mesh. For each element in the scene, a submission contains the outputs listed below for its track.

3D Bounding Box Track

3D Bounding Box

3D Object Pose

Semantic Category of Element

3D Voxel Grid Track

3D Bounding Box

3D Object Pose

Semantic Category of Element

Location and RGB Color of Occupied 3D Voxels

3D Mesh Track

3D Bounding Box

3D Object Pose

Semantic Category of Element

Element's textured mesh (in .glb format)
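The per-element outputs above could be organized in a structure like the following. This is purely an illustrative sketch: the field names and layout are assumptions, not the official SUMO submission format.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class SceneElement:
    """One element of a scene submission (illustrative sketch only;
    field names are assumptions, not the official SUMO API)."""
    category: str               # semantic category, e.g. "chair" or "wall"
    rotation: np.ndarray        # 3x3 rotation matrix (object pose)
    translation: np.ndarray     # 3-vector (object pose)
    box_min: np.ndarray         # bounding-box corner, element frame
    box_max: np.ndarray         # opposite bounding-box corner
    voxel_centers: Optional[np.ndarray] = None  # (N, 3), voxel track only
    voxel_colors: Optional[np.ndarray] = None   # (N, 3) RGB, voxel track only
    mesh_path: Optional[str] = None             # path to a .glb, mesh track only

    def track(self) -> str:
        """Most descriptive track this element's data supports."""
        if self.mesh_path is not None:
            return "mesh"
        if self.voxel_centers is not None:
            return "voxels"
        return "bounding_box"
```

Note how the tracks nest: every element carries a bounding box, pose, and category, while the voxel and mesh tracks add progressively richer geometry and appearance on top.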


Evaluation of a 3D scene focuses on four key aspects: Geometry, Appearance, Semantics, and Perceptual quality (GASP). Details of the metrics for each track are provided here.
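As a flavor of the geometry component, one common measure of agreement between a predicted and a ground-truth box is 3D intersection-over-union. The sketch below handles the axis-aligned case only; the official GASP metrics (which also cover oriented boxes, appearance, semantics, and perceptual quality) are defined in the challenge documentation.

```python
import numpy as np

def box_iou_3d(min_a, max_a, min_b, max_b):
    """Intersection-over-union of two axis-aligned 3D boxes, each given by
    its minimum and maximum corners. A simplified sketch of one geometry
    measure, not the official SUMO evaluation code."""
    min_a, max_a = np.asarray(min_a, float), np.asarray(max_a, float)
    min_b, max_b = np.asarray(min_b, float), np.asarray(max_b, float)
    # Overlap extent along each axis; negative means the boxes are disjoint.
    inter_dims = np.minimum(max_a, max_b) - np.maximum(min_a, min_b)
    inter = float(np.prod(np.clip(inter_dims, 0.0, None)))
    vol_a = float(np.prod(max_a - min_a))
    vol_b = float(np.prod(max_b - min_b))
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two unit cubes offset by half their width along one axis overlap in half a unit of volume, giving an IoU of 0.5 / 1.5 = 1/3.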


Upload your results to our EvalAI submission page (under construction).


Winners of the 2018 pilot edition of the SUMO Challenge will be announced at the SUMO Workshop held in conjunction with ACCV 2018 in Perth, Australia on December 3rd. See the official SUMO Challenge Contest Rules.

3D Mesh Track - 1st Prize

$2500 cash prize

Titan X GPU

Oral Presentation

3D Voxel Track - 2nd Prize

$2000 cash prize

Titan X GPU

Oral Presentation

3D Bounding Box Track - 3rd Prize

$1500 cash prize

Titan X GPU

Oral Presentation


SUMO Announcement at CVPR18

June 22nd, 2018

SUMO Challenge Launch

August 29th, 2018

Challenge Submission Deadline

November 16th, 2018

SUMO Workshop - Perth, Australia

December 3rd, 2018


Daniel Huber


Lyne Tchapmi

Stanford University

Frank Dellaert


Advisory Board

Ilke Demir


Thomas Funkhouser

Princeton University

Leonidas Guibas

Stanford University

Jitendra Malik

UC Berkeley

Silvio Savarese

Stanford University

Facebook Team

Bahram Dahi

Jay Huang

Nandita Nayak

John Princen

Ruben Sethi

Challenge Advisors

Iro Armeni

Angel Chang

Kevin Chen

Christopher Choy

JunYoung Gwak

Manolis Savva

Alexander (Sasha) Sax

Richard Skarbez

Shuran Song

Amir R. Zamir