A new approach uses crowdsourced feedback to help train robots.

Researchers at MIT, Harvard University, and the University of Washington have unveiled Human Guided Exploration (HuGE), a method that uses crowdsourced feedback to expedite the training of AI agents.

Typically, reinforcement learning relies on a reward function carefully crafted by human experts, making the process time-consuming and challenging, particularly for complex tasks.
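To illustrate why this is a bottleneck, here is a hypothetical hand-engineered reward function for a toy reach-and-grasp task (the function, its terms, and its weights are illustrative assumptions, not from the HuGE paper):

```python
import math

# Hypothetical example of the kind of hand-engineered reward function
# HuGE aims to avoid: an expert must choose the shaping terms and tune
# their weights, and every new task needs a new function.

def handcrafted_reward(gripper_xy, block_xy, grasped):
    """Dense reward for a toy 2-D reach-and-grasp task.
    The weights (-1.0, +10.0) are illustrative only -- tuning
    such constants is the time-consuming expert work."""
    distance = math.dist(gripper_xy, block_xy)
    reward = -1.0 * distance          # shaping term: move gripper closer
    if grasped:
        reward += 10.0                # sparse bonus: task accomplished
    return reward

print(handcrafted_reward((3.0, 4.0), (0.0, 0.0), False))  # -5.0
print(handcrafted_reward((0.0, 0.0), (0.0, 0.0), True))   # 10.0
```

Even in this toy setting, a poorly balanced distance penalty or grasp bonus can steer the agent toward unintended behavior, which is why reward engineering scales poorly to complex tasks.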

HuGE, however, leverages nonexpert feedback gathered asynchronously from users worldwide, allowing the AI agent to learn swiftly, even with potentially error-filled data.

MIT Professor Pulkit Agrawal says a bottleneck in robotic agent design lies in engineering the reward function. 

HuGE addresses scalability by crowdsourcing reward function design, enabling nonexperts to provide valuable feedback. 

Lead author Marcel Torne says “the reward function guides the agent to what it should explore, instead of telling it exactly what it should do”, facilitating robust learning even with imprecise human supervision.

The researchers decoupled the process into two algorithms that comprise HuGE. 

The goal selector algorithm continuously updates with human feedback to guide the agent's exploration, while the agent explores autonomously in a self-supervised manner. 

This two-pronged approach enhances learning efficiency, narrowing down exploration areas and adapting to infrequent and asynchronous feedback.
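The decoupling described above can be sketched in a toy setting. In this hypothetical simplification (states are integers on a 1-D line and the unknown objective is to reach a target; the real method trains neural networks on crowdsourced image comparisons), one component selects goals from noisy human comparisons while the other explores in a self-supervised way, with no reward function:

```python
import random

TARGET = 10  # the objective, unknown to the agent itself

def noisy_comparison(goal_a, goal_b, error_rate=0.2):
    """Crowdsourced feedback: a nonexpert says which candidate goal
    looks closer to the target, and is wrong error_rate of the time."""
    better, worse = sorted((goal_a, goal_b), key=lambda g: abs(TARGET - g))
    return better if random.random() > error_rate else worse

def goal_selector(frontier, rounds=3):
    """Component 1: pick the next exploration goal from previously
    reached states via a small tournament of noisy comparisons."""
    best = random.choice(frontier)
    for _ in range(rounds):
        best = noisy_comparison(best, random.choice(frontier))
    return best

def explore_episode(goal, extra_steps=3):
    """Component 2: the agent reaches its chosen goal on its own,
    then takes a few random exploratory steps to push the frontier
    outward -- self-supervised, with no reward function."""
    visited = list(range(min(0, goal), max(0, goal) + 1))
    state = goal
    for _ in range(extra_steps):
        state += random.choice([-1, 1])
        visited.append(state)
    return visited

random.seed(0)
frontier = [0]                      # states reached so far
for episode in range(500):
    goal = goal_selector(frontier)  # human feedback narrows the search
    frontier.extend(explore_episode(goal))
    if TARGET in frontier:
        break

print("reached target:", TARGET in frontier)
```

Because the comparisons only steer *where* the agent explores rather than prescribing its actions, occasional wrong answers slow learning but do not derail it, mirroring the robustness to imprecise supervision described above.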

Testing HuGE in simulated and real-world tasks, researchers found it accelerated learning on long sequences of actions, such as stacking blocks or navigating mazes. 

In real-world tests involving 109 nonexpert users across 13 countries, HuGE outperformed other methods in training robotic arms for tasks like drawing the letter “U” and object manipulation. 

Notably, nonexpert crowdsourced data outperformed synthetic data, showcasing HuGE's scalability.

Torne says the method’s promise lies in its ability to learn autonomously, without requiring humans to reset the environment between attempts, underscoring its potential to scale.

Future work aims to refine HuGE for natural language and physical interaction applications, emphasising alignment with human values. The research was funded in part by the MIT-IBM Watson AI Lab.
