# MRPS v4 Adaptive Mono/Stereo RGB-D Snapshot — Normative Format Specification

**Format identifier:** `mr-phase-shift-snapshot/v4`  
**Documentation revision:** `v4-doc1`  
**Scope:** the encoder and decoder shipped beside this file  
**Coordinate frame:** camera/viewer-local at capture time; no room/world registration

The words **MUST**, **MUST NOT**, **SHOULD**, **SHOULD NOT**, and **MAY** define player conformance requirements.

## 1. Purpose and authority order

An MRPS v4 PNG stores a visible 2048×2048 RGB/phase-depth atlas plus the native WebXR depth samples and all transforms needed to reconstruct metric geometry. A conforming player uses sources in this order:

1. **Embedded native metric depth chunks** (`mdPN`, or `mdPL` + `mdPR`) are authoritative for size, distance, geometry, and occlusion.
2. **Per-depth-view sensor matrices and transforms** are authoritative for unprojection into capture-local metres.
3. **RGB mapping records** are authoritative for sampling the stored RGB without changing its aspect ratio.
4. **Capture baseline/IPD-equivalent fields** are diagnostics and provenance; complete matrices remain authoritative.
5. **The four-tile phase atlas** is a fallback for image/video workflows and MUST NOT replace native depth when exact metric reconstruction is required.

The captured geometry is attached to the playback viewer/camera origin and orientation. The format intentionally contains no room anchor, world map, or persistent physical-world coordinate.

## 2. Files and PNG container

Each capture normally downloads matching `.png` and `.json` files. The JSON is a convenient sidecar copy. The PNG is self-contained and contains the same JSON plus native depth payloads.

A player MUST scan the raw PNG chunk stream; loading the PNG only through an image API will expose the 2048×2048 pixels but can hide custom depth chunks.

| PNG chunk | Occurrence | Meaning |
|---|---:|---|
| Standard `IHDR`/`IDAT`/`IEND` etc. | normal PNG rules | Visible 2048×2048 atlas. |
| `iTXt` with keyword `mr-phase-shift-metadata` | exactly one expected | UTF-8 JSON metadata, uncompressed. |
| `mdPN` | mono capture | Native depth for view ID `mono`. |
| `mdPL` | stereo capture | Native depth for view ID `left`. |
| `mdPR` | stereo capture | Native depth for view ID `right`. |

A player MUST use `metricDepth.views[].chunkType` as the actual manifest instead of assuming a chunk name solely from eye labels. Unknown ancillary chunks MAY be ignored. PNG CRCs SHOULD be verified.

## 3. Native depth payload (`MRD1`)

Every `mdP*` chunk contains this payload. Header integers and the `float64` scale are little-endian. PNG chunk lengths and PNG CRC words remain big-endian according to PNG itself.

| Byte offset | Size | Type | Field | Meaning |
|---:|---:|---|---|---|
| 0 | 4 | ASCII | `magic` | Must equal `MRD1`. |
| 4 | 1 | `uint8` | `version` | Must equal `1`. |
| 5 | 1 | `uint8` | `formatCode` | `1 = uint16`, `2 = float32`. |
| 6 | 1 | `uint8` | `rawByteOrderCode` | Byte order of raw samples: `1 = little`, `2 = big`. |
| 7 | 1 | reserved | — | Written as zero; ignore when reading v1. |
| 8 | 4 | `uint32 LE` | `width` | Depth-buffer columns. |
| 12 | 4 | `uint32 LE` | `height` | Depth-buffer rows. |
| 16 | 8 | `float64 LE` | `rawValueToMeters` | Multiply a raw sample by this value to obtain camera-plane Z in metres. |
| 24 | 4 | `uint32 LE` | `rawByteLength` | Number of following raw bytes. |
| 28 | variable | bytes | `rawBytes` | `width × height` samples in row-major order and the declared raw byte order. |

Validation requirements:

```text
rawByteLength == width * height * bytesPerSample
payloadByteLength == 28 + rawByteLength
```

A raw value of zero is invalid/no measurement. A robust player also rejects non-finite and non-positive float samples. For valid samples:

```text
zMeters = rawSample * rawValueToMeters
```

Depth is camera-plane Z, not Euclidean ray length. It describes distance from the sensor/camera plane along the sensor's forward axis.

## 4. Coordinate and matrix conventions

The metadata contract is:

```text
units: metres
handedness: right-handed
axes: +X right, +Y up, -Z forward
matrix storage: column-major, 16 numbers
vector convention: column vectors
name AFromB: transforms coordinates expressed in B into A
translation indices: matrix[12], matrix[13], matrix[14]
clip space: WebGL NDC, x/y/z in [-1,+1]
```

The capture-local origin is the `XRViewerPose` at capture time. Therefore `captureLocalFromSensor` places the depth sensor relative to the capture camera origin, while `captureLocalFromView` places the XR render view relative to that same origin.

A player MUST use the supplied projection matrices directly rather than reconstructing a symmetric FOV, because an XR projection may be asymmetric or sheared.

### 4.1 Normalized coordinates

Normalized view and normalized depth-buffer coordinates both use top-left origin, +X right, +Y down. For a depth pixel centre:

```text
depthUv = [(column + 0.5) / width,
           (row    + 0.5) / height,
           0,
           1]
viewUvH = normViewFromNormDepthBuffer * depthUv
viewUv  = viewUvH.xy / viewUvH.w
```

The inverse direction, useful when sampling depth by RGB/view UV, is:

```text
depthUvH = normDepthBufferFromNormView * [viewU, viewV, 0, 1]
depthUv  = depthUvH.xy / depthUvH.w
```

## 5. Atlas layout

The visible PNG is 2048×2048 pixels with top-left image origin:

| Region | Pixel rectangle | Mono | Stereo |
|---|---|---|---|
| top-left | `(0,0,1024,1024)` | mono RGB | left RGB |
| top-right | `(1024,0,1024,1024)` | unused black | right RGB |
| bottom-left | `(0,1024,1024,1024)` | mono phase atlas | left phase atlas |
| bottom-right | `(1024,1024,1024,1024)` | unused black | right phase atlas |

Each 1024×1024 depth atlas contains four copies of the same logical 512×512 depth grid:

| Tile | Atlas-local rectangle | Value |
|---|---|---|
| coarse | `(0,0,512,512)` | `zNorm` |
| inverse | `(512,0,512,512)` | `1 - zNorm` |
| sine | `(0,512,512,512)` | `0.5 + 0.5*sin(zNorm*cycles*2π)` |
| cosine | `(512,512,512,512)` | `0.5 + 0.5*cos(zNorm*cycles*2π)` |

The alpha byte is a state, not ordinary transparency:

```text
0   = invalid sample or outside active mapped RGB region
128 = valid native sample farther than maxDistanceMeters; only a lower bound
255 = valid sample represented within the phase range
```

## 6. RGB aspect and crop mapping

`rgb.mappings` and each depth view's `rgbAndAtlasMapping` describe the exact source-to-stored transform. Rectangles use top-left pixel origin.

For a mapping object:

- `sourceRectPixels` selects the logical source view: the whole frame for mono or one half for SBS.
- `sampledSourceRectPixels` is the actual source crop after `cover`; it equals `sourceRectPixels` for `contain`.
- `normalizedViewRect` expresses `sampledSourceRectPixels` relative to `sourceRectPixels` in normalized view coordinates.
- `storedActiveRectPixels` is the region inside the 1024×1024 RGB slot where source pixels were written.
- `storedActiveRectNormalized` is that same region divided by 1024.
- `storedResolution` is the RGB slot size, normally 1024×1024.

To map a reconstructed normalized-view UV to the stored RGB slot:

```text
tx = (viewU - normalizedViewRect.x) / normalizedViewRect.width
ty = (viewV - normalizedViewRect.y) / normalizedViewRect.height

storedU = storedActiveRectNormalized.x + tx * storedActiveRectNormalized.width
storedV = storedActiveRectNormalized.y + ty * storedActiveRectNormalized.height
```

Use the sample only when `tx` and `ty` are in `[0,1]`. Convert slot UV to the full PNG by adding the slot's `output.*` rectangle offset. This mapping preserves capture aspect ratio for both `contain` and `cover`.

Under this project's explicit calibration assumption, an assigned RGB source view has the exact FOV, optical axis, and distortion model of its associated depth sensor.

## 7. Complete metadata field reference

All fields emitted by the supplied encoder are defined below. “Informational” strings describe the encoder decision and are not executable code unless this specification says otherwise.

### 7.1 Root fields

| Path | Type | Meaning / player use |
|---|---|---|
| `schema` | string | Format identifier. A v4 player MUST require major version `v4`; unknown major versions must be rejected or handled by another decoder. |
| `captureId` | string | Unique capture identifier shared by the PNG and sidecar JSON. Do not derive geometry from it. |
| `capturedAt` | ISO-8601 string | Wall-clock creation time. Diagnostic only. |
| `files` | object | Names and PNG embedding information. |
| `assumptions` | object | Explicit RGB/depth calibration assumptions used by this revision. |
| `coordinateSystem` | object | Units, axes, matrix and origin contract. |
| `source` | object | APIs/session/reference-space provenance. |
| `captureTrigger` | object or null | Action that requested this snapshot. Diagnostic only. |
| `viewConfiguration` | object | Mono/stereo layout and capture baseline information. |
| `timing` | object | Best-effort RGB/depth pairing timestamps. |
| `output` | object | Visible atlas dimensions and quadrant rectangles. |
| `rgb` | object | RGB source, layout and aspect mapping. |
| `depthSession` | object | Granted WebXR depth session configuration. |
| `metricDepth` | object | Authoritative depth-chunk manifest. |
| `phaseDepthFallback` | object | Non-authoritative four-tile encoding parameters. |
| `reconstruction` | object | Human-readable reconstruction formulas/rules. |
| `occlusion` | object | Validity and suggested tolerance guidance. |
| `depth` | object | Full per-view calibration and native-buffer descriptors. |
| `atlasStatistics` | object | Logical 512×512 phase-atlas sample counts per active view. |

### 7.2 `files`

| Field | Type | Meaning |
|---|---|---|
| `imageFilename` | string | Suggested/downloaded PNG filename. |
| `jsonFilename` | string | Matching sidecar filename. |
| `pngMetadata.chunkType` | string | `iTXt`. |
| `pngMetadata.keyword` | string | `mr-phase-shift-metadata`. |
| `pngMetadata.compression` | string | `none`; the decoder does not accept compressed iTXt. |
| `imageIsSelfContained` | boolean | `true` means metric payloads and metadata are inside the PNG. |

### 7.3 `assumptions`

| Field | Type | Meaning |
|---|---|---|
| `rgbCameraUsesExactDepthSensorFieldOfView` | boolean | RGB is treated as sharing the depth sensor projection. |
| `rgbCameraUsesExactDepthSensorOpticalAxis` | boolean | RGB and depth optical axes are treated as identical. |
| `rgbCameraUsesExactDepthSensorDistortionModel` | boolean | No additional RGB lens-distortion correction is required under the stated assumption. |
| `note` | string | Human-readable statement of that assumption. |

A player that cannot accept these assumptions MUST reject the capture or request external RGB calibration.

### 7.4 `coordinateSystem`

| Field | Type | Meaning |
|---|---|---|
| `units` | string | Must be `meters`. |
| `handedness` | string | Must be `right-handed`. |
| `axes` | string | `+X right, +Y up, -Z forward`. |
| `origin` | string | Viewer/camera pose at capture time. |
| `worldAlignmentRequired` | boolean | `false`; place capture at playback camera origin. |
| `matrixStorage` | string | `column-major`. |
| `vectorConvention` | string | Column vectors; `AFromB` naming rule. |
| `clipSpace` | string | WebGL NDC range and convention. |

### 7.5 `source`

| Field | Type | Meaning |
|---|---|---|
| `xrSessionMode` | string | Session mode used, normally `immersive-ar`. |
| `rgbApi` | string | RGB acquisition API (`MediaDevices.getUserMedia`). |
| `depthApi` | string | WebXR depth API used. |
| `referenceSpaceType` | string | Actual reference space used to obtain viewer/view transforms, e.g. `local` or `viewer`. Capture-local removes dependence on its persistent origin. |

### 7.6 `captureTrigger`

`captureTrigger` may be null. Known variants:

| Variant/field | Type | Meaning |
|---|---|---|
| `type` | string | `webxr-select`, `dom-event`, `programmatic`, or implementation/test label. |
| `eventType` | string or null | DOM/WebXR event name such as `select`, `click`, or `pointerup`. |
| `targetRayMode` | string or null | For WebXR input: `gaze`, `tracked-pointer`, `screen`, or `transient-pointer`. |
| `handedness` | string | For WebXR input: `none`, `left`, or `right`. |
| `profiles` | string[] | Runtime input profile names. |
| `pointerType` | string or null | DOM pointer type such as `touch`, `mouse`, or `pen`. |

No trigger field changes projection or geometry.

### 7.7 `viewConfiguration`

| Field | Type | Meaning |
|---|---|---|
| `mode` | `mono` or `stereo` | Capture view mode. |
| `primaryViewCount` | integer | Number of non-observer XR primary views accepted: 1 or 2. |
| `eyeOrder` | string[] | XR eye labels in output/depth-view order. |
| `viewIds` | string[] | Stable MRPS IDs (`mono`, or `left`,`right`) in output order. |
| `supportsPlaybackOnDifferentViewConfiguration` | boolean | True because geometry is reconstructed in metric capture-local space before playback projection. |
| `stereoBaseline` | object or null | Stereo diagnostics; null for mono. |

`stereoBaseline` fields:

| Field | Type | Meaning |
|---|---|---|
| `viewLeftToRightMeters` | number[3] | Right XR-view origin minus left XR-view origin, expressed in capture-local axes. |
| `viewOriginDistanceMeters` | number | Euclidean magnitude of `viewLeftToRightMeters`. |
| `depthSensorLeftToRightMeters` | number[3] | Right depth-sensor origin minus left sensor origin in capture-local axes. |
| `depthSensorOriginDistanceMeters` | number | Magnitude of the depth-sensor vector. |
| `ipdEquivalentMeters` | number | Duplicate convenience value equal to `viewOriginDistanceMeters`. It is not guaranteed biological IPD. |
| `ipdEquivalentSource` | string | Explains how the value was derived. |
| `depthCameraBaselineMeters` | number | Duplicate convenience value equal to `depthSensorOriginDistanceMeters`. |
| `isGuaranteedBiologicalInterpupillaryDistance` | boolean | Always false. |
| `isGuaranteedPhysicalRgbCameraSeparation` | boolean | Always false; getUserMedia does not provide this calibration. |
| `mayBeAnonymizedOrQuantizedByUserAgent` | boolean | Warns that exposed XR view separation can be privacy-adjusted. |
| `useAsPlaybackEyeSeparation` | boolean | Always false. Playback uses current runtime views. |

The complete per-view/sensor transforms are authoritative. Baseline scalars are useful for diagnostics, stereo fusion checks, and provenance only.

### 7.8 `timing`

| Field | Type | Meaning |
|---|---|---|
| `synchronizationMethod` | string | RGB next-frame capture followed by the next complete XR depth set. |
| `hardwareSynchronized` | boolean | False. No hardware exposure synchronization is claimed. |
| `video` | object | RGB capture timing record. |
| `xrFrameTimePerformanceMs` | number | XR animation-frame timestamp in the `performance.now()` timebase. |
| `estimatedVideoToDepthSkewMs` | number or null | `xrFrameTimePerformanceMs - video.frameCallbackAtPerformanceMs`. Positive means the chosen XR/depth frame occurred later. |

`timing.video` fields:

| Field | Type | Meaning |
|---|---|---|
| `mode` | string | `next-video-frame`, timeout fallback, or no-callback fallback path. |
| `requestedAtPerformanceMs` | number | Time RGB frame capture was requested. |
| `frameCallbackAtPerformanceMs` | number | `requestVideoFrameCallback` callback timestamp or fallback timestamp. |
| `copiedAtPerformanceMs` | number | Time the RGB image copy completed. |
| `metadata` | object or null | Sanitized browser video-frame metadata. |

Known `timing.video.metadata` numeric fields are `presentationTime`, `expectedDisplayTime`, `width`, `height`, `mediaTime`, `presentedFrames`, `processingDuration`, `captureTime`, `receiveTime`, and `rtpTimestamp`. A browser may omit all of them.

### 7.9 `output`

| Field | Type | Meaning |
|---|---|---|
| `width`, `height` | integer | Full PNG size, currently 2048×2048. |
| `topLeft`, `topRight`, `bottomLeft`, `bottomRight` | object | Quadrant descriptors. |
| `depthTileSize` | integer | Logical phase tile size, currently 512. |

Each quadrant descriptor contains `role`, `x`, `y`, `width`, and `height`; active regions also contain `viewId`. `role` identifies RGB, phase atlas, or `unused`. Coordinates use top-left PNG origin.

### 7.10 `rgb`

| Field | Type | Meaning |
|---|---|---|
| `sourceWidth`, `sourceHeight` | integer | Captured RGB frame dimensions before atlas mapping. |
| `sourceAspect` | number | `sourceWidth / sourceHeight`. |
| `mediaStreamVideoTrackCount` | integer | Number of video tracks returned by the MediaStream. This is not physical lens count. |
| `physicalCameraCount` | null | Deliberately unknown. |
| `physicalCameraCountReason` | string | Explanation of why the count is not guessed. |
| `selectedLayout` | string | UI selection: `auto`, `mono`, or `sbs`. |
| `resolvedLayout` | string | Actual source interpretation: `mono` or `sbs`. |
| `fitMode` | string | `contain` or `cover`. |
| `trackSettings` | object or null | Snapshot of `MediaStreamTrack.getSettings()`; diagnostic capture settings, not intrinsic calibration. |
| `mappings` | object | `mono`, or `left` and `right`, each using the mapping object defined in section 6. |
| `calibrationSource` | string | States that corresponding depth sensor geometry is used under the explicit assumption. |

Common `trackSettings` fields include `width`, `height`, `aspectRatio`, `frameRate`, `facingMode`, `resizeMode`, `deviceId`, and `groupId`; browser/vendor additions may occur. A player must not derive focal length from these settings.

### 7.11 `depthSession`

| Field | Type | Meaning |
|---|---|---|
| `usage` | string | Granted WebXR depth usage, expected `cpu-optimized`. |
| `dataFormat` | string | Granted depth format, normally `float32` or `unsigned-short`. |
| `type` | string or null | Granted depth type (`raw` or `smooth`) when exposed by the runtime. |
| `matchDepthViewRequested` | boolean | Whether session creation requested matching depth/view timing. This is not proof of RGB synchronization. |

### 7.12 `metricDepth`

| Field | Type | Meaning |
|---|---|---|
| `preferredSource` | string | Tells the player to prefer embedded native chunks. |
| `exactDeviceSamplesPreserved` | boolean | Raw CPU depth bytes were copied without phase quantization. Runtime-provided depth itself can still be filtered/quantized. |
| `conversion` | string | Human-readable raw-to-metre rule. |
| `views` | array | Canonical chunk manifest in output order. |

Each manifest item has:

| Field | Type | Meaning |
|---|---|---|
| `viewId` | string | MRPS view identifier. |
| `eye` | string | XR eye label. |
| `xrViewIndex` | integer | Runtime XRView index when available. |
| `chunkType` | 4-character string | PNG chunk containing that view's `MRD1` payload. |

### 7.13 `phaseDepthFallback`

| Field | Type | Meaning |
|---|---|---|
| `purpose` | string | Fallback warning; native depth is authoritative. |
| `encoding` | string | Encoding name. |
| `maxDistanceMeters` | number | Denominator used for phase-atlas normalization only. |
| `phaseCycles` | integer | Number of sine/cosine cycles across `[0,maxDistance]`. |
| `zNormDefinition` | string | `clamp(zMeters/maxDistanceMeters,0,1)`. |
| `validUnclippedDecodeMeters` | string | Inverse for valid, unclipped samples. |
| `nominalPhaseQuantizationStepMeters` | number | `maxDistance/(phaseCycles*255)`; a nominal indicator, not an accuracy guarantee. |
| `tiles` | object | Formula for each 512×512 tile. |
| `alphaState` | object | Meanings of alpha values `0`, `128`, and `255`. JSON object keys are strings. |

### 7.14 `reconstruction`

All fields in this object are human-readable restatements of the normative math in sections 4 and 9:

| Field | Meaning |
|---|---|
| `preferredInput` | Use native metric chunks. |
| `rawDepthPixelToNormalizedView` | Depth-pixel centre and normalized-coordinate transform steps. |
| `normalizedViewToSensorPoint` | Projection-inverse ray and camera-plane-Z scaling steps. |
| `sensorPointToCaptureLocal` | Sensor-to-capture-local transform. |
| `attachAtPlaybackViewer` | Place capture-local origin at current playback viewer pose. |
| `renderForEachPlaybackView` | Per-playback-view transformation and projection steps. |
| `playbackIpdRule` | Never scale geometry by capture/playback IPD. |
| `captureBaselinePurpose` | Baseline is calibration/provenance, not size scale. |
| `physicalScale` | One coordinate unit equals one metre. |

Players SHOULD implement the normative algorithm rather than evaluating these strings.

### 7.15 `occlusion`

| Field | Type | Meaning |
|---|---|---|
| `source` | string | Occlusion comes from the reconstructed metric surface. |
| `invalidSamplesCreateHoles` | boolean | Invalid native samples must not create geometry or occluders. |
| `clippedPhaseSamplesAreNotExactOccluders` | boolean | Alpha-128 phase values are lower bounds, not exact surfaces. |
| `embeddedNativeDepthRecommended` | boolean | Exact occlusion should use native depth. |
| `recommendedStartingToleranceMeters` | number | Initial z-comparison tolerance; tune for sensor noise. |

### 7.16 `depth.views[]` and `depth.byId`

`depth.views` is the canonical ordered list. `depth.byId` is a duplicated convenience index keyed by `viewId`. A player SHOULD parse `depth.views`; it MAY verify that `depth.byId[id]` agrees and otherwise ignore the duplicate.

Every depth-view object contains:

| Field | Type | Meaning |
|---|---|---|
| `viewId` | string | `mono`, `left`, or `right`. |
| `eye` | string | XR eye label `none`, `left`, or `right`. |
| `viewArrayIndex` | integer | Position in the `XRViewerPose.views` array used during capture. |
| `xrViewIndex` | integer | `XRView.index` when exposed, otherwise array index. |
| `nativeBuffer` | object | Native dimensions, type, scale, ordering, and payload descriptor. |
| `normalizedCoordinates` | object | View/depth normalized-coordinate transforms. |
| `xrViewGeometry` | object | Capture XR render-view projection and pose. |
| `sensorGeometry` | object | Authoritative depth-sensor projection and pose. |
| `rgbAndAtlasMapping` | object | Exact aspect/crop mapping shared by RGB and phase atlas for this view. |

#### `nativeBuffer`

| Field | Type | Meaning |
|---|---|---|
| `width`, `height` | integer | Native depth dimensions. |
| `dataFormat` | string | WebXR session format. |
| `elementType` | string | Serialized sample type: `uint16` or `float32`. |
| `rawValueToMeters` | number | Device-provided metric scale. |
| `invalidRawValue` | number | Zero. |
| `rowOrder` | string | `top-to-bottom`. |
| `storageOrder` | string | `row-major`. |
| `payload` | object | PNG chunk and `MRD1` header descriptor. |

#### `nativeBuffer.payload`

| Field | Type | Meaning |
|---|---|---|
| `chunkType` | string | Matching custom PNG chunk. |
| `payloadMagic` | string | `MRD1`. |
| `headerVersion` | integer | `1`. |
| `headerBytes` | integer | `28`. |
| `rawElementType` | string | `uint16` or `float32`. |
| `rawByteOrder` | string | Raw-sample byte order. |
| `rawByteLength` | integer | Raw bytes only. |
| `payloadByteLength` | integer | Header plus raw bytes. |
| `sha256OfRawBytes` | hex string or null | Optional integrity digest; null when WebCrypto digest was unavailable. |
| `headerLayoutLittleEndian` | object | Human-readable offset dictionary matching section 3. Its values are documentation strings, not data. |

`headerLayoutLittleEndian` keys are `magic`, `version`, `formatCode`, `rawByteOrderCode`, `width`, `height`, `rawValueToMeters`, `rawByteLength`, and `rawBytes`.

#### `normalizedCoordinates`

| Field | Type | Meaning |
|---|---|---|
| `origin` | string | `top-left`. |
| `xDirection` | string | `right`. |
| `yDirection` | string | `down`. |
| `normDepthBufferFromNormView` | number[16] | Maps normalized view UV into normalized depth-buffer UV. |
| `normViewFromNormDepthBuffer` | number[16] | Inverse used when unprojecting native depth pixels. |

#### `xrViewGeometry`

| Field | Type | Meaning |
|---|---|---|
| `projectionMatrix` | number[16] | Capture XRView projection. Use for source-view diagnostics/rendering, not as a substitute for sensor projection when sensor data exists. |
| `projectionMatrixInverse` | number[16] or null | Stored inverse. |
| `captureLocalFromView` | number[16] | XR view coordinates to capture-local. |
| `viewFromCaptureLocal` | number[16] | Inverse. |
| `viewport` | object or null | `XRWebGLLayer.getViewport(view)` result (`x`,`y`,`width`,`height`). It is framebuffer layout metadata, not camera intrinsics. |

#### `sensorGeometry`

| Field | Type | Meaning |
|---|---|---|
| `projectionMatrix` | number[16] | Authoritative depth-sensor projection. |
| `projectionMatrixInverse` | number[16] | Inverse used for unprojection. |
| `captureLocalFromSensor` | number[16] | Sensor coordinates to capture-local metres. |
| `sensorFromCaptureLocal` | number[16] | Inverse. |
| `projectionSource` | string | `XRDepthInformation.projectionMatrix` or explicit XRView fallback. A player may warn when fallback was used. |
| `transformSource` | string | `XRDepthInformation.transform` or explicit XRView fallback. |
| `depthMeaning` | string | Camera-plane Z from the depth sensor. |

### 7.17 `atlasStatistics`

The active keys are `mono`, or `left` and `right`. Each statistics object contains:

| Field | Meaning |
|---|---|
| `valid` | Logical 512×512 samples with valid native depth within phase range. |
| `clipped` | Valid samples beyond `maxDistanceMeters`. |
| `invalid` | Invalid native depth or pixels outside the active mapped image region. |
| `total` | `valid + clipped + invalid`, normally `512×512 = 262144`. Counts logical samples once, not all four repeated tiles. |

## 8. Canonical versus diagnostic data

A proper player follows these rules:

- `metricDepth.views` is the canonical depth-payload manifest.
- `depth.views` is the canonical ordered calibration list.
- `depth.byId` is a redundant lookup index.
- `sensorGeometry` is authoritative for depth unprojection; `xrViewGeometry` describes the associated render view.
- Full transforms are authoritative over baseline scalar summaries.
- `rawValueToMeters` inside the binary payload is authoritative for that payload; metadata should match and can be cross-checked.
- `output` and mapping records are authoritative for image/atlas placement.
- `phaseDepthFallback` is never authoritative when an embedded native chunk is present.
- Informational formula strings are documentation, not code to evaluate.

## 9. Normative player algorithm

### 9.1 Parse and validate

1. Verify PNG signature and chunk bounds.
2. Verify PNG CRCs unless operating in an explicitly trusted environment.
3. Parse the uncompressed `iTXt` record whose keyword is `mr-phase-shift-metadata`.
4. Require `schema == "mr-phase-shift-snapshot/v4"` or an explicitly supported compatible v4 revision.
5. For every item in `metricDepth.views`, locate and parse the named `mdP*` chunk.
6. Verify `MRD1`, payload version, dimensions, byte lengths, element type, and byte order.
7. If `sha256OfRawBytes` is non-null, verify it.
8. Match each manifest item to `depth.views[].viewId`.

### 9.2 Reconstruct one native depth pixel

```text
raw = samples[row * width + column]
if raw is zero, non-finite, or non-positive: no geometry

zMeters = raw * rawValueToMeters

d = [(column+0.5)/width, (row+0.5)/height, 0, 1]
vh = normViewFromNormDepthBuffer * d
viewU = vh.x / vh.w
viewV = vh.y / vh.w

ndc = [2*viewU - 1, 1 - 2*viewV, -1, 1]
rh = sensorProjectionInverse * ndc
ray = rh.xyz / rh.w
if ray.z >= 0 or approximately zero: reject sample

sensorPoint = ray * (zMeters / -ray.z)
ch = captureLocalFromSensor * [sensorPoint, 1]
captureLocalPoint = ch.xyz / ch.w
```

This guarantees `sensorPoint.z == -zMeters` apart from floating-point error and preserves physical metres.

### 9.3 Associate RGB

Use the pixel's `viewU`,`viewV` and section 6 to obtain stored RGB UV. Do not stretch the entire 1024×1024 slot over geometry when letterboxing or cropping was used.

For mono capture, use the mono slot. For stereo capture, use matching left/right RGB and depth when present. A player MAY fuse the two metric point clouds/meshes into one camera-local surface, but it must preserve visibility holes and avoid averaging unrelated foreground/background samples across disocclusions.

### 9.4 Attach at playback camera origin and render with a different IPD/FOV

Let:

```text
referenceFromCaptureLocal = playbackViewerPose.transform.matrix
referenceFromPlaybackView = playbackXRView.transform.matrix
playbackViewFromReference = inverse(referenceFromPlaybackView)
playbackViewFromCaptureLocal = playbackViewFromReference * referenceFromCaptureLocal
clip = playbackXRView.projectionMatrix
     * playbackViewFromCaptureLocal
     * [captureLocalPoint, 1]
```

Run this separately for every current playback XRView. This uses the playback device's current view offsets/IPD and projection. Never multiply geometry by a capture-IPD/playback-IPD ratio.

A non-XR player performs the same operation with its own camera pose and projection, placing capture-local origin at that camera.

### 9.5 Occlusion/clipping

Invalid native samples create holes and must not write depth. For live real-world occlusion, compare captured and current live depths after both are expressed in the same playback view/camera convention. A common starting test is:

```text
if liveRealZMeters < capturedFragmentZMeters - toleranceMeters:
    discard captured fragment
```

Start near `occlusion.recommendedStartingToleranceMeters` and tune for sensor noise. Phase alpha 128 is only a lower bound and must not be used as an exact occluder.

### 9.6 Phase fallback decoder

At one logical 512×512 coordinate, read the four grayscale bytes and alpha:

```text
c  = coarse / 255
ci = inverse / 255
s  = 2*sine   / 255 - 1
q  = 2*cosine / 255 - 1
cleanCoarse = clamp(0.5 * (c + (1-ci)), 0, 1)
phase01 = atan2(s, q) / (2*pi)
if phase01 < 0: phase01 += 1
cycle = round(cleanCoarse * phaseCycles - phase01)
zNorm = clamp((cycle + phase01) / phaseCycles, 0, 1)
zMeters = zNorm * maxDistanceMeters
```

- alpha 0: invalid;
- alpha 128: `zMeters` is a lower-bound placeholder only;
- alpha 255: valid fallback estimate.

## 10. Error handling and forward compatibility

A player MUST fail the affected view on missing depth chunk, malformed payload, impossible dimensions, singular required matrix, or mismatch between payload bytes and header. It SHOULD continue displaying RGB when metric depth fails, but must not claim calibrated 3D playback.

Unknown metadata fields SHOULD be ignored. Unknown **major** schema versions MUST NOT be treated as v4. Missing optional diagnostic fields must not block reconstruction when all canonical fields are present.

## 11. Supplied decoder API

`mrps_v4_decoder.js` installs `globalThis.MRPSv4`:

```js
const parsed = MRPSv4.parsePng(pngBytes);
const record = MRPSv4.resolveDepthRecord(parsed, 'left'); // or 'mono'/'right'
const meters = MRPSv4.getDepthMeters(record, column, row);
const point = MRPSv4.unprojectNativePixel(parsed, 'left', column, row);
const baseline = MRPSv4.getCaptureStereoBaseline(parsed);
const projected = MRPSv4.projectForPlaybackXRView(
  point.captureLocalPointMeters,
  currentXRView,
  currentXRViewerPose
);
```

Additional exports: `nativePixelToNormalizedView`, `makePlaybackViewFromCaptureLocal`, `projectCaptureLocalPoint`, `decodePhaseSamples`, `mat4Multiply`, `mat4Invert`, and `mat4Transform`.

## 12. Package files

- `quest_3_mr_phase_shift_encoder_adaptive_v4.html` — real-device encoder.
- `mrps_v4_decoder.js` — reference parser, unprojection, baseline, and playback-projection helpers.
- `MRPS_v4_SPEC.md` — this complete field and player contract.
- `mrps-v4-metadata.schema.json` — machine-readable schema with descriptions.
- `mrps-v4-depth-payload.ksy` — machine-readable `MRD1` binary header layout.
- `example_mono_metadata.json` / `example_stereo_metadata.json` — representative metadata.
- `MRPS_v4_CONFORMANCE_CHECKLIST.md` — implementation acceptance checklist.
