Gamers’ interest may come and go, but artists are always exploring the potential of computer vision for expression. Microsoft this month has resurrected the Kinect, albeit in pricey, limited form. Let’s fit it to the family tree.
Time flies: musicians and electronic artists have now had access to readily available computer vision since the turn of this century. That initially looked like webcams, paired with libraries like the free OpenCV (still a viable option), and later repurposed gaming devices from Sony and Microsoft platforms.
And then came Kinect. Kinect was a darling of live visual projects and art installations, because of its relatively sophisticated skeletal tracking and various artist-friendly developer tools.
A full ten years ago, I was writing about the Microsoft project and interactions, in its first iteration as the pre-release Project Natal. Xbox 360 support followed in 2010, Windows support in 2012 – while digital artists quickly hacked in Mac (and rudimentary Linux) support. An artists in music an digital media quickly followed.
For those of you just joining us, Kinect shines infrared light at a scene, and takes an infrared image (so it can work irrespective of other lighting) which it converts into a 3D depth map of the scene. From that depth image, Microsoft’s software can also track the skeleton image of one or two people, which lets you respond to the movement of bodies. Microsoft and partner PrimeSense weren’t the only to try this scheme, but they were the ones to ship the most units and attract the most developers.
We’re now on the third major revision of the camera hardware.
2010: Original Kinect for Xbox 360. The original. Proprietary connector with breakout to USB and power. These devices are far more common, as they were cheaper and shipped more widely. Despite the name, they do work with both open drivers for respective desktop systems.
2012: Kinect for Windows. Looks and works almost identically to Kinect for 360, with some minor differences (near mode).
Raw use of depth maps and the like for the above yielded countless music videos, and the skeletal tracking even more numerous and typically awkward “wave your hands around to play the music” examples.
Here’s me with a quick demo for the TED organization, preceded by some discussion of why I think gesture matter. It’s… slightly embarrassing, only in that it was produced on an extremely tight schedule, and I think the creative exploration of what I was saying about gesture just wasn’t ready yet. (Not only had I not quite caught up, but camera tech like what Microsoft is shipping this year is far better suited to the task than the original Kinect camera was.) But the points I’m making here have some fresh meaning for me now.
2013: Kinect for Xbox One. Here’s where things got more interesting – because of a major hardware upgrade, these cameras are far more effective at tracking and yield greater performance.
- Active IR tracking for use in the dark*
- Wider field of vision
- 6 skeletons (people) instead of two
- More tracking features, with additional joints and creepier features like heart rate and facial expression
- 1080p color camera
- Faster performance/throughput (which was key to more expressive results)
Clarification: The Kinect One uses a Time of Flight calculation in place of the Structured Light (“Light Coding”) technique of the original Kinects – a fancy way of saying that it gets its depth by measuring how long it takes for light emitted to return to the sensor, instead of projecting a bunch of dots on the subject and calculating the position. That isn’t terrifically important to the end user or developer, but it does enable the active IR technique mentioned above, and Microsoft credits it for the enhanced sensing performance in more complex scenes and larger scenes. (You can read up on that on their 2013 blog. This should also be true of the Azure Kinect, below.
Kinect One, the second camera (confusing!), definitely allowed more expressive applications. One high point for me was the simple but utterly effective work of Chris Milk and team, “The Treachery of Sanctuary.”
And then it ended. Microsoft unbundled the camera from Xbox One, meaning developers couldn’t count on gamers owning the hardware, and quietly discontinued the last camera at the end of October 2017.
Everything old is new again
I have mixed feelings – as I’m sure you do – about these cameras, even with the later results on Kinect One. For gaming, the devices were abandoned – by gamers, by developers, and by Microsoft as the company ditched the Xbox strategy. (Parallel work at Sony didn’t fare much better.)
It’s hard to keep up with consumer expectations. By implying “computer vision,” any such technology has to compete with your own brain – and your own brain is really, really, really good. “Sensors” and “computation” are all merged in organic harmony, allowing you to rapidly detect the tiniest nuance. You can read a poker player’s tell in an instant, while Kinect will lose the ability to recognize that your leg is attached to your body. Microsoft launched Project Natal talking about seeing a ball and kicking a ball, but… you can do that with a real ball, and you really can’t do that with a camera, so they quite literally got off on the wrong foot.
It’s not just gaming, either. On the art side, the very potential of these cameras to make the same demos over and over again – yet another magic mirror – might well be their downfall.
So why am I even bothering to write this?
Simple: the existing, state-of-the-art Kinect One camera is now available on the used market for well under a hundred bucks – for less than the cost of a mid-range computer mouse. Microsoft’s gaming business whims are your budget buy. The computers to process that data are faster and cheaper. And the software is more mature.
So while digital art has long been driven by novelty … who cares? Actual music and art making requires practice and maturity of both tools and artist. It takes time. So oddly while creative specialists were ahead of the curve on these sorts of devices, the same communities might well innovate in the lagging cycle of the same technology.
And oh yeah – the next generation looks very powerful.
Kinect: The Next Generation
Let’s get the bad news out of the way first: the new Kinect is both more expensive ($400) and less available (launching only in the US and China… in June). Ugh. And that continues Microsoft’s trend here of starting with general purpose hardware for mass audiences and working up to … wait, working up to increasingly expensive hardware for smaller and smaller groups of developers.
That is definitely backwards from how this is normally meant to work.
But the good news here is unexpected. Kinect was lost, and now is found.
The safe bet was that Microsoft would just abandon Kinect after the gaming failure. But to the company’s credit, they’ve pressed on, with some clear interest in letting developers, researchers, and artists decide what this thing is really for. Smart move: those folks often come up with inspiration that doesn’t fit the demands of the gaming industry.
So now Kinect is back, dubbed Azure Kinect – Microsoft is also hell-bent on turning Azure “cloud services” into a catch-all solution for all things, everywhere.
And the hardware looks … well, kind of amazing. It might be described as a first post-smartphone device. Say what? Well, now that smartphones have largely finalized their sensing capabilities, they’ve oddly left the arena open to other tech defining new areas.
For a really good write-up, you’ll want to read this great run-down:
Here are the highlights, though. Azure Kinect is the child of Kinect and HoloLens. It’s a VR-era sensor, but standalone – which is perfect for performance and art.
Fundamentally, the formula is the same – depth camera, conventional RGB camera, some microphones, additional sensors. But now you get more sensing capabilities and substantially beefed-up image processing.
1MP depth camera (not 640×480) – straight off of HoloLens 2, Microsoft’s augmented reality plaform
Two modes: wide and narrow field of view
4K RGB camera (with standard USB camera operation)
7 microphone array
Gyroscope + accelerometer
And it connects both by USB-C (which can also be used for power) or as a standalone camera with “cloud connection.” (You know, I’m pretty sure that means it has a wifi radio, but oddly all the tech reporters who talked to Microsoft bought the “cloud” buzzword and no one says outright. I’ll double-check.)
Also, now Microsoft supports both Windows and Linux. (Ubuntu 18.04 + OpenGL v 4.4).
Downers: 30 fps operation, limited range.
Somehing something, hospitals or assembly lines Azure services, something that looks like an IBM / Cisco ad:
That in itself is interesting. Artists using the same thing as gamers sort of … didn’t work well. But artists using the same tool as an assembly line is something new.
And here’s the best part for live performance and interaction design – you can freely combine as many cameras as you want, and sync them without any weird tricks.
All in all, this looks like it might be the best networked camera, full stop, let alone best for tracking, depth sensing, and other applications. And Microsoft are planning special SDKs for the sensor, body tracking, vision, and speech.
Also, the fact that it doesn’t plug into an Xbox is a feature, not a bug to me – it means Microsoft are finally focusing on the more innovative, experimental uses of these cameras.
So don’t write off Kinect now. In fact, with Kinect One so cheap, it might be worth picking one up and trying Microsoft’s own SDK just for practice.
For more on this, if you’re in Berlin, Stanislav Glazov and Valentin Tszin will show how they connect computer vision (using Kinect) as a link between choreography and music. They also recently explored these fields with techno legend Dasha Rush at CTM Festival.