For those of you who do not know what Azure Kinect is… it is basically a developer kit with advanced AI sensors that provide sophisticated computer vision and speech models.
Topics covered in this post:
- Hardware
- Field of View RGB / Depth
- SDK
1 – Hardware
This is the hardware presented by Microsoft. As you can see, this Kinect, weighing only 440 g, has:
- RGB Camera:
- OV12A10 12MP CMOS rolling shutter sensor.
- USB video class-compatible, and can also be used without the Sensor SDK (keep reading for more information on the SDK).
- Color space: BT.601 full range [0..255]
- Depth Camera:
- 1-Megapixel Time-of-Flight (ToF) imaging chip enabling higher modulation frequencies and depth precision.
- Two NIR Laser diodes enabling near and wide Field-of-View (FoV) depth modes.
- Depth outside the indicated range may still be reported, depending on object reflectivity.
- Implements the Amplitude Modulated Continuous Wave (AMCW) ToF principle. Casts modulated illumination in the near IR (NIR) spectrum onto the scene. It then records an indirect measurement of the time it takes the light to travel from the camera to the scene and back.
- IR emitters.
- Motion Sensor (IMU):
- LSM6DSMUS includes an accelerometer and gyroscope sampled at 1.6 kHz, reporting to the host at 208 Hz. Origin [0,0,0]; both coordinate systems are right-handed.
- Microphone array:
- 7-microphone circular array, identified as a standard USB audio class 2.0 device.
- Sensitivity: –22 dBFS (94 dB SPL, 1 kHz)
- Signal-to-noise ratio: > 65 dB
- Acoustic overload point: 116 dB SPL
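The AMCW principle described above boils down to simple phase arithmetic: depth is proportional to the measured phase shift, and each modulation frequency has a maximum unambiguous range, which is why the chip's support for higher modulation frequencies matters for precision. A minimal Python sketch of that relationship (the 200 MHz and 320 MHz frequencies below are illustrative values, not Microsoft's published numbers):

```python
import math

C = 299_792_458.0  # speed of light in m/s

def depth_from_phase(phase_rad, f_mod_hz):
    # The light covers 2 * depth on its round trip, and one full 2*pi
    # phase cycle corresponds to c / f_mod metres of optical travel.
    return (C * phase_rad) / (4.0 * math.pi * f_mod_hz)

def ambiguity_range_m(f_mod_hz):
    # Beyond this depth the phase wraps past 2*pi and aliases back,
    # so a higher modulation frequency means a shorter unambiguous range.
    return C / (2.0 * f_mod_hz)
```

A phase shift of pi therefore lands exactly halfway through the ambiguity range, and combining several modulation frequencies lets a ToF camera resolve the wrap-around.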
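The sensitivity and overload figures fit together neatly: sensitivity is the digital level produced by a 94 dB SPL reference tone, so full scale (0 dBFS) is reached at exactly 94 + 22 = 116 dB SPL, which is the acoustic overload point. A quick Python check of that arithmetic (the function name is mine, not part of any SDK):

```python
def spl_to_dbfs(spl_db, sensitivity_dbfs=-22.0, ref_spl_db=94.0):
    # Sensitivity is specified as the dBFS output for a 94 dB SPL, 1 kHz
    # tone, so each extra dB of SPL adds one dB of digital level.
    return sensitivity_dbfs + (spl_db - ref_spl_db)
```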
2 – Field of View RGB / Depth
The best way to understand the field of view, i.e. the angle that the sensors “see,” is through this diagram.
- Marker 1. in the diagram shows the RGB camera in 4:3 mode from a distance of 2000 mm.
- Markers 2. and 3. show the depth camera views (the depth camera is tilted 6 degrees downward from the color camera). It’s important to understand that this camera transmits modulated IR images to the host PC; the depth engine software then converts the raw signal into depth maps. As described in the image, the supported modes are:
- NFOV (Narrow field-of-view): These modes are ideal for scenes with smaller extents in X and Y but larger in Z. One of the illuminators in this mode is aligned with the depth camera case, not tilted.
- WFOV (Wide field-of-view): These modes are ideal for scenes with larger extents in X and Y but smaller in Z. The illuminator used in this view is tilted an additional 1.3 degrees downward relative to the depth camera.
- The depth camera supports 2×2 binning modes (at the cost of lowering image resolution), to extend the Z-range in comparison to the corresponding unbinned modes we described before.
Note: When depth is in NFOV mode, the RGB camera has better pixel overlap in 4:3 than in 16:9 resolutions.
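To get a feel for these angles, you can convert a field of view into the linear extent it covers at a given distance. A small Python sketch, assuming the commonly quoted FoVs of roughly 75°×65° for NFOV and 120°×120° for WFOV (check Microsoft's spec sheet for the exact values of each mode):

```python
import math

def footprint_mm(distance_mm, fov_deg):
    # Linear extent covered by an angular field of view at a distance:
    # half the angle on each side of the optical axis.
    return 2.0 * distance_mm * math.tan(math.radians(fov_deg / 2.0))

# At the 2000 mm distance used in the diagram, NFOV spans roughly 3 m
# horizontally, while WFOV spans almost 7 m.
nfov_width = footprint_mm(2000.0, 75.0)
wfov_width = footprint_mm(2000.0, 120.0)
```

This is why NFOV suits scenes that are deep but narrow, and WFOV the opposite.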
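The 2×2 binning trade-off is easy to picture: four neighbouring depth pixels are pooled into one, which averages out noise (extending the usable Z-range) but halves the resolution in each axis. A toy Python illustration of the pooling step (the real binning happens on the sensor, not in host code):

```python
def bin_2x2(depth):
    # Average each 2x2 block of pixels into one output pixel.
    # Input: a list of rows with even width and height.
    h, w = len(depth), len(depth[0])
    return [
        [
            (depth[r][c] + depth[r][c + 1]
             + depth[r + 1][c] + depth[r + 1][c + 1]) / 4.0
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]
```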
3 – SDK
The Azure Kinect DK offers the following SDKs:
- Sensor SDK (primarily a C API; the documentation also covers the C++ wrapper):
- Note: Can provide color images in the BGRA pixel format. The host CPU performs the conversion from the MJPEG images received from the device.
- Features included:
- Depth and RGB Camera control and access.
- Motion sensor.
- Synchronized Depth-RGB camera streaming with configurable delay between cameras.
- Device synchronization.
- Camera frame metadata.
- Device calibration data access.
- Azure Kinect Viewer
- Azure Kinect Recorder
- Firmware update tool
- Body Tracking SDK (primarily a C API):
- Body Segmentation
- Anatomically correct skeleton for each partial or full body in FOV.
- Unique identity for each body.
- Can track bodies over time.
- Viewer tool to track bodies in 3D.
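To illustrate what a configurable delay between the depth and RGB cameras means in practice, here is a hypothetical Python sketch that pairs depth and color timestamps after removing a fixed color-vs-depth offset. The Sensor SDK already delivers synchronized images inside a single capture; this is only the idea behind it, not the SDK's API:

```python
def match_captures(depth_ts_us, color_ts_us, delay_us=0, tol_us=1000):
    # Pair each depth timestamp with the nearest color timestamp,
    # after subtracting the configured color-vs-depth delay.
    pairs = []
    for d in depth_ts_us:
        best = min(color_ts_us, key=lambda c: abs((c - delay_us) - d))
        if abs((best - delay_us) - d) <= tol_us:
            pairs.append((d, best))
    return pairs
```

With a configured 80 µs delay, color frames arriving 80 µs after their depth counterparts pair up cleanly; frames outside the tolerance are dropped rather than mismatched.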
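As a rough illustration of what “unique identity for each body, tracked over time” means, here is a toy nearest-centroid matcher in Python. The Body Tracking SDK's actual tracker is far more sophisticated (it is model-based and works on the depth stream); this sketch only shows how IDs can persist across frames:

```python
def dist(a, b):
    # Euclidean distance between two 3D points (mm).
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def assign_ids(prev, current, next_id, max_jump_mm=300.0):
    # prev: dict id -> body centroid from the last frame.
    # current: list of centroids detected this frame.
    # Greedily reuse the closest previous id if the body moved less
    # than max_jump_mm; otherwise mint a fresh id.
    out, unused = {}, dict(prev)
    for c in current:
        if unused:
            pid = min(unused, key=lambda i: dist(unused[i], c))
            if dist(unused[pid], c) <= max_jump_mm:
                out[pid] = c
                del unused[pid]
                continue
        out[next_id] = c
        next_id += 1
    return out, next_id
```

A body that moves a few centimetres between frames keeps its id, while a newly appearing body gets the next unused one.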
Photo Source: Microsoft.