Producing Spatial Video
Apple re-uses as much technology across platforms as possible, so it’s no surprise that video experiences on the Vision Pro are built on top of existing technologies like HTTP Live Streaming, MPEG-4, and HEVC. The display’s native refresh rate is 90Hz, and the system can switch to 96Hz so that 24fps content plays back at an even multiple of its frame rate. 3D video produced for the Vision Pro contains both left-eye and right-eye content for every single frame, each captured from a different lens or angle. This creates parallax, the 3D effect that the Vision Pro presents. The difference between the two eye views is calculated during encoding using a technique called 2D Plus Delta, which allows your content to degrade gracefully when viewed in 2D. Video is encoded in Multiview HEVC (MV-HEVC for short).
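The 2D Plus Delta idea can be sketched with a toy example: store one eye’s frame plus a per-pixel delta, and reconstruct the other eye from the two. Real MV-HEVC inter-view prediction is far more sophisticated than simple pixel differences; this only illustrates why the scheme degrades gracefully, since a 2D player can simply keep the left-eye frame and ignore the delta.

```python
# Toy "2D Plus Delta" reconstruction (not real MV-HEVC prediction).
left  = [[10, 20], [30, 40]]   # left-eye "frame" (made-up pixel values)
delta = [[ 1, -2], [ 0,  3]]   # stored difference between the eye views

# Right-eye frame = left-eye frame + delta, element by element.
right = [[l + d for l, d in zip(lr, dr)] for lr, dr in zip(left, delta)]
print(right)  # → [[11, 18], [30, 43]]

# A 2D player degrades gracefully: it just displays `left` unchanged.
```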
Parallax content is presented relative to the screen plane: closer content appears in front of the plane, and content further away appears behind it. Apple suggests storing this parallax contour information in a timed metadata track. The primary use of this information is to ensure that captions do not conflict with other 3D elements in a video. The metadata is stored in a tiled format to allow for efficient processing, with a recommended 10x10 tile grid as a balance between resolution and storage cost. Generating parallax contour information is currently left as an exercise for the content creator.
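To make the caption-placement use case concrete, here is a hedged sketch of how a player might consult a tiled contour map: find the largest disparity in any tile a caption rectangle overlaps, so the caption can be rendered in front of everything beneath it. The grid values, coordinate convention, and function below are illustrative assumptions, not Apple’s actual metadata format.

```python
GRID = 10  # Apple's recommended tile grid is 10x10

def max_disparity_under(rect, contour):
    """Largest disparity in any tile the caption rect overlaps.

    rect:    (x, y, w, h) in normalized 0..1 screen coordinates (assumed).
    contour: GRID x GRID list of per-tile disparity values (assumed).
    """
    x, y, w, h = rect
    col0, col1 = int(x * GRID), min(int((x + w) * GRID), GRID - 1)
    row0, row1 = int(y * GRID), min(int((y + h) * GRID), GRID - 1)
    return max(contour[r][c]
               for r in range(row0, row1 + 1)
               for c in range(col0, col1 + 1))

# Flat scene at disparity 0, except one bottom tile that pops forward.
contour = [[0.0] * GRID for _ in range(GRID)]
contour[9][5] = 0.04

caption_rect = (0.3, 0.85, 0.4, 0.1)  # bottom-center caption area
print(max_disparity_under(caption_rect, contour))  # → 0.04
```

A caption renderer could then place the caption at a disparity slightly greater than this maximum, keeping it in front of nearby 3D content.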
Apple’s approach to re-using and extending existing tools, workflows, and standards should make it easier for vendors to incrementally roll out support for 3D video without major disruption of content creation workflows.
In the Destination Video example project, Apple shows how to play video in visionOS. In DestinationVideo/Model/Videos.json there are links to several HTTP Live Streaming playlists, but it’s not clear to me whether any of them support MV-HEVC:
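For comparison, a multivariant playlist that advertises a stereo MV-HEVC rendition might look roughly like the fragment below. The REQ-VIDEO-LAYOUT attribute and its CH-STEREO value come from the HTTP Live Streaming 2nd Edition draft; the bandwidth, codec string, resolution, and URI are made-up placeholders.

```
#EXTM3U
#EXT-X-VERSION:12
#EXT-X-STREAM-INF:BANDWIDTH=20000000,CODECS="hvc1.2.4.L153.b0",RESOLUTION=2160x2160,REQ-VIDEO-LAYOUT="CH-STEREO"
stereo/prog_index.m3u8
```

A client that cannot render stereo video would skip variants carrying a REQ-VIDEO-LAYOUT it doesn’t support, which is one way the playlists in Videos.json could reveal whether MV-HEVC renditions are present.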
Links and more information
Technical links from Apple:
- HEVC Stereo Video describes interoperable stereo bitstreams at a very low level.
- Extensions to ISOBMFF for Stereo Video explains the Video Extended Usage box and other pieces of stereoscopic metadata.
- Stereo Video Contour Map Metadata describes in detail the video contour timed metadata used for caption placement.
- The latest draft of HTTP Live Streaming 2nd Edition contains information about delivering stereoscopic video over HLS, including the REQ-VIDEO-LAYOUT attribute.
Videos from Apple: