Ever wondered what Win32, MFC, WPF, .NET and all those acronyms are about?
Here is a nice article that sheds some light on the software development roadmap for the Microsoft world (past, present and future): Windows 8 for software developers: the Longhorn dream reborn?.
A really nice article that covers the sound and speech features included in the Kinect (which so far have stayed under the radar of most FOSS driver hackers/developers/users), and which will definitely pave the way for truly multimodal NUIs.
Interesting post on how the MS Kinect may actually work.
Some (unofficial and still to be confirmed) specs summarized from the post linked above (and comments):
- The Kinect appears to be a 640×480 30fps video camera that knows the *depth* of every single pixel in the frame. It does this by projecting a pattern of dots over the scene with a near-infrared laser and using a detector that establishes the parallax shift of the dot pattern for each pixel in the detector. Parallax seems to be more robust than intensity: some sources said that certain materials (hair in particular) caused large fluctuations in intensity, so intensity doesn’t seem like a useful channel to probe for depth data.
- The depth buffer is only 320×480 (unconfirmed). It seems that the hardware will happily give a 640×480 version (this is Xbox 360 API memory, so the upscaling may actually occur on the Xbox 360), but the hardware itself only gets enough data to fill 320×480.
- Alongside this there is a regular RGB video camera that captures a standard video frame. This RGBZ (or ‘D’) data is then packaged up and sent to the host over USB.
- It seems that the Kinect framerate (for RGB image and depth buffer) is 30Hz.
- The Kinect does not identify shapes within the field of view and does not attempt to map skeletal outlines of the shapes it sees. For that, you would need to take each one of the 640×480 frames and copy it into a framebuffer so it can be processed by a vision library like OpenCV. Typical operations would be to threshold the depth image to get the “closest” pixels, then run a blob analysis on that region of interest (ROI) to group these pixels into identifiable features, and then track those blobs over their lifetime.
- The Kinect uses a pattern of laser dots to detect depth, as can be seen in this video (and another one, and another one, and another one ;-)) and in these images. There seems to be a 3×3 checkerboard effect in that dot pattern (no clue why yet… any suggestions?).
So, processing all this data seems to be quite heavy (especially if you try to do it on an embedded board, like the guy from the post above). A full-fledged multicore PC/Mac running OpenCV and/or OpenCL will give you the juice required for advanced image processing.
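The threshold / blob / track steps mentioned in the list above can be sketched in a few lines. This is only a minimal illustration in plain Python, not how any Kinect driver or OpenCV itself does it: the function names (`threshold_nearest`, `find_blobs`, `match_blobs`), the depth units (millimetres) and the convention that a depth of 0 means "no reading" are all my own assumptions, and the tiny hand-made frames stand in for real 640×480 depth buffers. A real pipeline would hand these steps to an optimized vision library.

```python
import math
from collections import deque

def threshold_nearest(depth, max_depth):
    """Binary mask: True where the pixel has a reading (> 0, assumed) and is close enough."""
    return [[0 < d <= max_depth for d in row] for row in depth]

def find_blobs(mask, min_area=1):
    """Group 4-connected True pixels into blobs (area, bounding box, centroid)."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from this unvisited pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            pixels = []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(pixels) >= min_area:
                xs = [p[0] for p in pixels]
                ys = [p[1] for p in pixels]
                blobs.append({
                    "area": len(pixels),
                    "bbox": (min(xs), min(ys), max(xs), max(ys)),
                    "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
                })
    return blobs

def match_blobs(prev, curr, max_dist):
    """Greedy tracking: pair each previous blob with the nearest unused current blob."""
    pairs, used = [], set()
    for i, p in enumerate(prev):
        best, best_d = None, max_dist
        for j, c in enumerate(curr):
            if j in used:
                continue
            d = math.hypot(p["centroid"][0] - c["centroid"][0],
                           p["centroid"][1] - c["centroid"][1])
            if d <= best_d:
                best, best_d = j, d
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs

# Two toy depth frames (values in mm, assumed): a near 2x2 object moves one pixel right,
# a second near object on the right edge stays put, everything else is far background.
frame_a = [
    [0,    900,  900,  3000, 3000, 3000],
    [0,    900,  900,  3000, 3000, 800],
    [3000, 3000, 3000, 3000, 3000, 800],
]
frame_b = [
    [0,    3000, 900,  900,  3000, 3000],
    [0,    3000, 900,  900,  3000, 800],
    [3000, 3000, 3000, 3000, 3000, 800],
]

blobs_a = find_blobs(threshold_nearest(frame_a, max_depth=1000))
blobs_b = find_blobs(threshold_nearest(frame_b, max_depth=1000))
tracks = match_blobs(blobs_a, blobs_b, max_dist=2.0)
```

Here `tracks` pairs each blob index in the first frame with the index of the blob it most likely became in the second one; blobs that move further than `max_dist` between frames are simply left unmatched.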
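To get a rough feel for why this is heavy, here is a back-of-the-envelope estimate of the raw data you would have to chew through per second. The frame size and the 30Hz rate come from the unofficial specs above; the bytes per pixel (3 for RGB, 2 for depth) are purely my assumption, since the actual formats are not confirmed, and the helper name is made up for the example.

```python
def stream_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Uncompressed size of a video stream in bytes per second."""
    return width * height * bytes_per_pixel * fps

# Assumed pixel formats: 24-bit RGB and 16-bit depth (not confirmed anywhere).
rgb_rate = stream_bytes_per_second(640, 480, 3, 30)
depth_rate = stream_bytes_per_second(640, 480, 2, 30)
total_rate = rgb_rate + depth_rate

print(f"RGB:   {rgb_rate / 1e6:.2f} MB/s")    # RGB:   27.65 MB/s
print(f"Depth: {depth_rate / 1e6:.2f} MB/s")  # Depth: 18.43 MB/s
print(f"Total: {total_rate / 1e6:.2f} MB/s")  # Total: 46.08 MB/s
```

Under those assumptions that is about 46 MB of pixels per second before you have done any actual vision work on them.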
Finally, some quite interesting resources for Kinect-related stuff:
Actually, what I found really impressive doesn’t come from M$… it comes from SAMSUNG: their new SR40 sensor/vision-enabled display really rocks!!
Truly amazing concept. I wonder if current technology already allows this level of interaction, though. Anyway, I really think this is the way to go (and I hope Microsoft doesn’t once again ditch a great idea developed in its labs – what happened to PhotoSynth?!?)
More examples here.