3D-Scene-Language to Action Model
3D-LOTUS and 3D-LOTUS++: Helping robots to generalize better
My work on 3D-LOTUS and 3D-LOTUS++ is about making robots better at understanding and carrying out tasks from visual observations and language instructions. These systems aim to help robots adapt to new and challenging scenarios without requiring large amounts of additional training.
3D-LOTUS: This policy uses a 3D point-cloud representation of the robot’s surroundings to plan its actions. It’s fast, precise, and performs exceptionally well on the tasks it has been trained on.
3D-LOTUS++: Taking things a step further, this version pairs the motion-planning skills of 3D-LOTUS with a large language model that breaks an instruction down into simple steps and a vision-language model that locates the objects each step refers to. This modular design lets it handle completely new tasks much better; a rough sketch of the pipeline is shown below.
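To make the division of labour between the three components concrete, here is a minimal Python sketch of how such a plan-ground-act loop could be wired together. Everything in it (plan_subtasks, ground_object, MotionPolicy, robot.observe, robot.execute) is a hypothetical placeholder interface chosen for illustration, not the actual 3D-LOTUS++ code.

```python
# Minimal sketch of a modular "plan, ground, act" pipeline in the spirit of
# 3D-LOTUS++. All names here are hypothetical placeholders for illustration
# only, not the project's actual API.

from dataclasses import dataclass
from typing import Any


@dataclass
class SubTask:
    action: str        # e.g. "pick", "place", "open"
    object_query: str  # natural-language description of the target object


def plan_subtasks(instruction: str) -> list[SubTask]:
    """LLM role: break a free-form instruction into simple, executable steps."""
    raise NotImplementedError  # call a language model of your choice here


def ground_object(object_query: str, observation: Any) -> Any:
    """VLM role: locate the queried object in the scene and return the
    corresponding 3D points (e.g. a segmented region of the point cloud)."""
    raise NotImplementedError  # call a vision-language model / detector here


class MotionPolicy:
    """Stand-in for the 3D-LOTUS-style motion policy: maps the scene point
    cloud, the grounded target points and the current step to an action."""

    def predict_action(self, scene_points: Any, target_points: Any,
                       subtask: SubTask) -> Any:
        raise NotImplementedError


def run_episode(instruction: str, robot: Any, policy: MotionPolicy) -> None:
    # 1. Task planning: the LLM decomposes the instruction into steps.
    for subtask in plan_subtasks(instruction):
        done = False
        while not done:
            obs = robot.observe()  # RGB-D observation with a point cloud
            # 2. Grounding: the VLM localizes the object this step refers to.
            target = ground_object(subtask.object_query, obs)
            # 3. Motion: the point-cloud policy predicts the next action.
            action = policy.predict_action(obs.point_cloud, target, subtask)
            done = robot.execute(action)
```

The design point this sketch tries to capture is that each module can be swapped independently: the language model never sees pixels, the vision-language model never plans, and the motion policy only maps 3D points and a simple step to an action, which is what helps the overall system cope with instructions it was never trained on.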
Both systems were evaluated on GemBench, a benchmark designed to measure how well robot manipulation policies generalize across increasing levels of difficulty, from small changes to the training setup to entirely new tasks. Learn more at the GemBench Project page.