Minecraft Meets AI Research
Research Background & Objectives
Embodied AI, a subfield of artificial intelligence, focuses on developing agents capable of reasoning and interacting within physical or simulated environments, mimicking human-like cognitive and physical behaviors. Minecraft has emerged as a valuable testbed for embodied AI research due to its voxel-based, open-world, sandbox nature. The environment presents diverse challenges such as navigation, resource collection, crafting, and multi-agent collaboration, offering a scalable and customizable framework for testing and benchmarking AI capabilities.
To evaluate agents in such complex domains, researchers have developed metrics that assess skills such as generalization, efficiency, and robustness. Minecraft’s variety of tasks and procedural generation features allow for systematic evaluation of both specific skills and emergent behaviors.
Recent innovations in video diffusion models and world modeling further enhance agent capabilities. Video diffusion models support predictive modeling by learning to generate plausible future states conditioned on past observations, enabling robust planning in uncertain environments. World models, which encapsulate compact representations of an agent’s environment, integrate perception, planning, and control, letting agents learn and adapt more efficiently. Complemented by model predictive control (MPC), which enables agents to plan actions over short horizons by simulating future states, these video and world models have the potential to be particularly impactful in Minecraft, where the environment’s complexity requires agents to internalize dynamics and adapt to changing scenarios.
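To make the planning loop concrete, here is a minimal sketch of random-shooting MPC over a learned one-step world model. Everything in it (the WorldModel class, the reward function, the horizon, and the candidate count) is an illustrative assumption rather than a component of any existing system.

```python
# A minimal sketch of random-shooting MPC over a learned world model (PyTorch).
# The WorldModel, reward function, and all hyperparameters below are
# illustrative assumptions, not components of an existing system.
import torch


class WorldModel(torch.nn.Module):
    """Predicts the next latent state from the current latent state and an action."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(state_dim + action_dim, hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))


def mpc_plan(model, state, reward_fn, action_dim, horizon=10, n_candidates=256):
    """Sample candidate action sequences, roll them out inside the model,
    and return the first action of the highest-scoring sequence."""
    actions = torch.randn(n_candidates, horizon, action_dim)  # candidate sequences
    states = state.expand(n_candidates, -1)                   # broadcast current state
    returns = torch.zeros(n_candidates)
    with torch.no_grad():
        for t in range(horizon):
            states = model(states, actions[:, t])   # simulate one step ahead
            returns += reward_fn(states)            # reward_fn scores each candidate
    best = returns.argmax()
    return actions[best, 0]                         # execute only the first action
```

At every environment step the agent would re-plan from its newest state, which is what keeps these short-horizon rollouts tolerant of model error.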
Our overall aim is to be the first to develop an embodied AI agent that includes speech in its perception and action spaces. The impact would be profound: imagine interacting with an agent in an open-world environment that you can speak to as if it were another human. Written, text-based exchanges with LLMs are already arguably passing the Turing test. We believe that, with support from NSERC and our industrial partners, we can make significant strides toward passing such a test through the more challenging naturalistic interface of speech, perception, and action in a complex open-universe world. Moreover, and most critically, the model architectures developed in this research program are immediately repurposable to other embodied AI domains, including digital assistants.
Our Plan
Step 1
Collect a substantial amount of multimodal Minecraft play data. Hosted at plaicraft.ai, we have developed a world-spanning, AWS-EC2-based platform for collecting first-person interactive Minecraft play in a shared server world, requiring only a browser on participants’ computers. Under the guidance of UBC’s Behavioural Research Ethics Board (BREB), our study (classified as minimal risk) captures incoming and outgoing audio, keypresses, mouse movements, and video from players around the world engaging in multiplayer Minecraft play. What makes this dataset unique is its interactive speech data: naturalistic speech that arises as human agents interact in a shared open-world Minecraft environment.
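For concreteness, one time-aligned slice of a recorded session might be represented roughly as follows; the field names and types are our assumptions for illustration, not the actual storage schema.

```python
# Hypothetical layout of one time-aligned slice of a recorded play session.
# Field names and types are illustrative assumptions, not the real schema.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PlaySlice:
    session_id: str            # anonymized session identifier
    t_start: float             # slice start time, seconds from session start
    t_end: float               # slice end time
    video_frames: List[bytes]  # encoded first-person video frames
    audio_in: bytes            # microphone audio (the player's speech)
    audio_out: bytes           # game and voice-chat audio heard by the player
    keypresses: List[Tuple[float, str]] = field(default_factory=list)            # (timestamp, key)
    mouse_moves: List[Tuple[float, float, float]] = field(default_factory=list)  # (timestamp, dx, dy)
```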
Step 2
Develop an embodied AI agent performance benchmark. Evaluating agent performance effectively in such complex environments requires direct comparisons between human and AI abilities, and building evaluation frameworks that align with human cognitive benchmarks remains a significant challenge. Our plaicraft.ai environment gives us a potentially unparalleled opportunity to develop quantitative metrics for evaluating embodied agent performance.
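One simple family of such metrics normalizes an agent’s task score against human play collected on the same task. The sketch below normalizes against a random-policy baseline and the median human score; the exact formula is an assumption for illustration, not a finalized benchmark.

```python
# Illustrative human-normalized score: 0.0 is random-policy level,
# 1.0 is median-human level. The formula is an assumption, not a final metric.
from statistics import median


def human_normalized_score(agent_score: float,
                           human_scores: list[float],
                           random_score: float) -> float:
    human_level = median(human_scores)
    if human_level == random_score:
        return 0.0
    return (agent_score - random_score) / (human_level - random_score)
```

For example, an agent score of 12 with a random baseline of 2 and a median human score of 20 gives a normalized score of roughly 0.56.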
Step 3
Design, build, and train a performant embodied AI agent. Our approach involves inventing flexible, conditional multi-modal diffusion models. The challenges here are multifaceted but within reach, and our experience gives us confidence in our ability to explore the space of architectures that such multi-modal data requires.
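As a rough illustration, the sketch below shows a single epsilon-prediction training step for a diffusion model that denoises future video latents conditioned on a fused multimodal context (past frames, speech audio, and actions). The module interfaces, the toy noise schedule, and the batch layout are all assumptions, not our actual architecture.

```python
# Sketch of one training step for a conditional multi-modal diffusion model.
# The denoiser/context_encoder interfaces, the toy linear noise schedule, and
# the batch layout are illustrative assumptions, not our actual architecture.
import torch


def diffusion_training_step(denoiser, context_encoder, batch, optimizer, n_timesteps=1000):
    """Epsilon-prediction objective: noise the target latents at a random timestep
    and train the denoiser to recover that noise given the multimodal context."""
    x0 = batch["future_video_latents"]               # (B, D) clean target latents
    context = context_encoder(batch["past_video_latents"],
                              batch["audio_features"],
                              batch["actions"])      # (B, C) fused conditioning

    t = torch.randint(0, n_timesteps, (x0.shape[0],))    # random timesteps
    alpha_bar = 1.0 - (t.float() + 1) / n_timesteps      # toy schedule, untuned
    alpha_bar = alpha_bar.unsqueeze(-1)

    noise = torch.randn_like(x0)
    x_t = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * noise

    pred_noise = denoiser(x_t, t, context)           # condition on timestep + context
    loss = torch.nn.functional.mse_loss(pred_noise, noise)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full agent, additional output heads (for actions or speech, say) could share the same conditioning pathway; this sketch covers only the core denoising objective.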
Participate in our Research!
How It Works:
- Sign Up: Visit plaicraft.ai and enter your email to join.
- Consent: Complete a quick form to formally consent to participate in the project.
- Play: Receive an access token, dive into Minecraft, and play for science.
Why Participate?
- Contribute to Science: Your gameplay is a critical contribution to AI advancements.
- Free Gameplay: Enjoy Minecraft at no cost, with the added benefit of contributing to scientific research.
- Exclusive Access: Researchers and students can gain early access to the collected data by helping sign up participants — a fantastic opportunity for academic exploration and extra credit.
Stay tuned! Keep an eye on our plaicraft.ai blog and follow @frankdonaldwood on Twitter for updates, insights, and more as we embark on this remarkable journey together. Let’s craft the future of AI, one block at a time.