SIGMA: An open-source mixed-reality system for research on physical task assistance (Microsoft)

What would it take to build an interactive AI system that could assist you with any task in the physical world, just as a real-time expert would? To begin exploring the core competencies that such a system would require, we developed and released the Situated Interactive Guidance, Monitoring, and Assistance (SIGMA) system, an open-source research platform and testbed prototype for studying mixed-reality task assistance. SIGMA provides a basis for researchers to explore, understand, and develop the capabilities required to enable in-stream task assistance in the physical world.

[…] building AI systems that collaborate with people in the physical world—including not just mixed-reality task assistants but also interactive robots, smart factory floors, autonomous vehicles, and so on—requires going beyond the ability to generate relevant instructions and content. To be effective, these systems also require physical and social intelligence.

[…] the system is implemented as a client-server architecture: a lightweight client application runs on the HoloLens 2 device (configured in Research Mode (opens in new tab)), which captures and sends a variety of multimodal data streams—including RGB (red-green-blue), depth, audio, head, hand, and gaze tracking information—live to a more powerful desktop server. The desktop server implements the core functionality of the application and streams information and commands to the client app for what to render on the device. This architecture enables researchers to bypass current compute limitations on the headset and creates opportunities for porting the application to other mixed-reality devices.

SIGMA is built on top of Platform for Situated Intelligence (also known as \psi), an open-source framework that provides the fabric, tools, and components for developing and researching multimodal integrative-AI systems. The underlying \psi framework enables fast prototyping and provides a performant streaming and logging infrastructure. The framework provides infrastructure for data replay, enabling data-driven development and tuning at the application level. Finally, Platform for Situated Intelligence Studio provides extensive support for visualization, debugging, tuning and maintenance.