Introducing Avatar Streaming: Real-Time Facial Tracking on Oshi
Welcome to the Oshi blog, where we dive into the future of immersive communication through avatars and the technical challenges we’re solving to enable it. Today, we’re thrilled to unveil our latest product: Avatar Streaming. This feature brings a new dimension to virtual communication on our platform, letting creators stream as their avatars on screen while speaking in their own natural voice.
At Oshi, we’re on a mission to pioneer the first-ever social network for virtual characters, catering to VTubers (virtual YouTubers), anime personas, and game characters. While platforms like YouTube and Twitch offer live streaming, there has been a void in the market: a space where individuals can cultivate genuine one-on-one relationships with fellow characters. With the rise of digital identities among Gen Z and Gen Alpha, we saw an opportunity to build a platform tailored to their needs, a place where virtual identities can connect with each other on a deeper level.
The development journey behind Avatar Streaming started with our product team breaking the feature down into three key submodules:
1. 3D Assets Loading Module: We kicked off development by implementing a framework capable of loading users’ VRM models as avatars into the desired scene. Leveraging glTF (GL Transmission Format), we ensured efficient transmission and loading of 3D scenes and models, laying a solid foundation for Avatar Streaming (see the loading sketch after this list).
2. Face & Body Tracking Module: Next, we started work on facial and body tracking. Our team developed algorithms to track users’ motions via the camera, capturing nuances and expressions. Using machine learning frameworks, we detected and tracked facial and hand landmarks, enabling synchronization between users’ movements and their avatars.
3. Video Renderer & Audio Processor: Finally, we brought Avatar Streaming to life by capturing users’ movements and voice recordings in real-time. Through motion capture processes, we ensured that every gesture made by the human was replicated by the avatars, including speech, delivering an immersive and authentic communication experience.
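To make the loading module concrete, here’s a minimal sketch of loading a VRM avatar over glTF on the web. It assumes a three.js scene and the @pixiv/three-vrm loader plugin; the post doesn’t spell out our exact stack, so treat the library choice as illustrative rather than definitive.

```ts
import * as THREE from 'three';
import { GLTFLoader } from 'three/examples/jsm/loaders/GLTFLoader.js';
import { VRMLoaderPlugin, VRM } from '@pixiv/three-vrm';

// VRM files are glTF 2.0 binaries carrying a VRM extension, so a stock
// GLTFLoader plus the VRM plugin is enough to parse them.
export async function loadAvatar(url: string, scene: THREE.Scene): Promise<VRM> {
  const loader = new GLTFLoader();
  loader.register((parser) => new VRMLoaderPlugin(parser));

  const gltf = await loader.loadAsync(url);
  const vrm = gltf.userData.vrm as VRM; // the plugin attaches the parsed VRM here

  scene.add(vrm.scene);
  return vrm;
}
```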
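For the tracking module, the post only says “machine learning frameworks,” so the snippet below is a hedged sketch using MediaPipe’s Tasks Vision face landmarker as one plausible choice. The model path and the `applyToAvatar` helper are hypothetical stand-ins for wiring detected blendshape weights onto an avatar expression.

```ts
import { FilesetResolver, FaceLandmarker } from '@mediapipe/tasks-vision';

// Create the landmarker once and reuse it across frames; setup is expensive.
const vision = await FilesetResolver.forVisionTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm'
);
const landmarker = await FaceLandmarker.createFromOptions(vision, {
  baseOptions: { modelAssetPath: 'face_landmarker.task' }, // hypothetical model path
  runningMode: 'VIDEO',
  outputFaceBlendshapes: true, // expression coefficients used to drive the avatar
});

function onFrame(video: HTMLVideoElement, timestampMs: number): void {
  const result = landmarker.detectForVideo(video, timestampMs);
  const blendshapes = result.faceBlendshapes[0]?.categories ?? [];
  // Each category pairs a name (e.g. "jawOpen") with a 0..1 score that can be
  // mapped onto the matching avatar expression every frame.
  for (const shape of blendshapes) {
    applyToAvatar(shape.categoryName, shape.score); // hypothetical mapping helper
  }
}
declare function applyToAvatar(name: string, weight: number): void;
```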
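And for the renderer and audio module, a short streaming sketch using standard Web APIs: the rendered avatar canvas becomes a video track, the microphone stays an untouched audio track, and the combined MediaStream can feed WebRTC or a MediaRecorder.

```ts
// Merge the rendered avatar (a <canvas>) with the creator's live microphone
// into a single MediaStream. The avatar is synthetic; the voice stays human.
async function createStreamOutput(canvas: HTMLCanvasElement): Promise<MediaStream> {
  const videoStream = canvas.captureStream(30); // capture the avatar render at 30 fps
  const micStream = await navigator.mediaDevices.getUserMedia({ audio: true });

  return new MediaStream([
    ...videoStream.getVideoTracks(),
    ...micStream.getAudioTracks(),
  ]);
}
```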
Along the way, we encountered numerous technical challenges, from optimizing model efficiency to ensuring compatibility across a diverse range of devices — starting with our web platform and soon rolling out to our iOS mobile application.
By building on industry standards like FACS (the Facial Action Coding System) and incorporating synthetic data generation techniques, we pushed the boundaries of what’s possible in facial tracking. Our commitment to accessibility led us to develop adaptive systems that dynamically adjust to the processing power of any device, ensuring a seamless experience for users across the board.
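As one illustration of that adaptive idea (a sketch, not our production logic), a render loop can sample its own frame time and step quality tiers up or down; the tiers and thresholds below are made-up placeholders.

```ts
// Made-up quality tiers: lower tracking rate and render resolution on slow devices.
const tiers = [
  { trackingFps: 15, renderScale: 0.5 },
  { trackingFps: 30, renderScale: 0.75 },
  { trackingFps: 60, renderScale: 1.0 },
];
let tier = tiers.length - 1;
let lastTime = performance.now();

function onRenderTick(): void {
  const now = performance.now();
  const frameMs = now - lastTime;
  lastTime = now;

  // A production system would average over a window to avoid oscillation;
  // this sketch reacts to single frames for brevity.
  if (frameMs > 40 && tier > 0) tier--; // below ~25 fps: back off
  else if (frameMs < 20 && tier < tiers.length - 1) tier++; // headroom: step up

  applySettings(tiers[tier]); // hypothetical hook into the tracker and renderer
}
declare function applySettings(t: { trackingFps: number; renderScale: number }): void;
```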
Avatar Streaming is currently available in private beta to select creators. Want to request access? DM us on Twitter or open a ticket on our Discord.
Behind all the work everyone at Oshi did to bring this product to life is the community that allows us to thrive and innovate. We extend our heartfelt gratitude to our dedicated community of VTubers, whose feedback and support keep us going. Thank you for being part of the Oshi journey. Want to be part of our creator community? Join our Discord.