The device is said to be powered by a high-end graphics processing unit and central processing unit, as well as a "Holographic Processing Unit" (HPU), whatever that is. The device will run Windows Holographic, which is baked into Windows 10.
I have a few questions about the technical hurdles based on what's been shown.
Microsoft stated that the device is untethered, with no need to connect to external devices. The first issue this raises for me is battery life. Plotting the X, Y, Z position of a model in real time, whilst also accounting for micro-movements of the user's head, must take some horsepower, which ultimately requires battery power.
Is the device constantly scanning the environment for objects and surfaces? In the Minecraft image above we can see that the AR models appear as though they're resting on surfaces. This is nothing new, the AR bots in The Playroom can stand on surfaces, but that camera isn't moving every frame.
The Kinect sensor has depth mapping, but it isn't as accurate as the image above suggests the HoloLens is.
Part of the bridge structure, resting on the sofa, is out of view behind the sofa's armrest. That suggests pixel precision: the depth sensor knows exactly where the sofa arm is, the software works out what that means for the models, and they're rendered accordingly. All within one frame, before it has to do it all over again.
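To make that concrete, here's a minimal sketch of per-pixel occlusion against a depth map. All the names here are mine, not Microsoft's, and a real AR pipeline would do this on the GPU, but the core test is the same: a hologram pixel only survives if it's closer to the eye than the real world at that pixel.

```python
import numpy as np

def composite(hologram_rgb, hologram_depth, sensor_depth):
    """Hide hologram pixels that sit behind real-world geometry.

    hologram_rgb:   H x W x 3 rendered hologram colours
    hologram_depth: H x W     distance of each hologram pixel from the eye
    sensor_depth:   H x W     distance of the real world at the same pixel
    """
    # A hologram pixel is visible only where it's closer than the world.
    visible = hologram_depth < sensor_depth
    out = np.zeros_like(hologram_rgb)
    out[visible] = hologram_rgb[visible]
    return out

# Toy example: a 1x2 image where the sofa arm (1.0 m away) occludes the
# bridge (1.5 m away) in the first pixel but not the second.
holo = np.array([[[255, 0, 0], [255, 0, 0]]], dtype=np.uint8)
holo_d = np.array([[1.5, 1.5]])
world_d = np.array([[1.0, 2.0]])
print(composite(holo, holo_d, world_d))  # first pixel blanked, second kept
```

The hard part isn't this comparison, it's getting `sensor_depth` accurate to the pixel, every frame, from a moving headset.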
Speaking of frames, Oculus VR have recently stated that around 90 frames per second is needed to hit the 'sweet spot' and immerse the user in VR. I'd hazard a guess that something similar would be required for AR.
This potentially means the device will scan the room, create a depth map, work out which objects are in front of the models, and render those models in 3D space to pixel-perfect accuracy, without lag, 90 times a second.
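The back-of-the-envelope arithmetic makes that sound even harder: at 90 fps the entire pipeline has to fit in roughly 11 ms per frame. The stage names and millisecond split below are purely my guesswork, just to show how quickly the budget disappears.

```python
TARGET_FPS = 90
frame_budget_ms = 1000 / TARGET_FPS
print(f"{frame_budget_ms:.2f} ms per frame")  # 11.11 ms per frame

# Hypothetical split of that budget across the stages described above.
stages = {
    "depth capture": 3.0,
    "scene understanding": 3.0,
    "render + composite": 4.0,
    "display latency": 1.0,
}
assert sum(stages.values()) <= frame_budget_ms  # 11.0 ms just fits
```

And all of that has to run on a battery-powered headset, which loops back to the battery-life question above.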
This also leads to the question of what happens if somebody walks through or stands inside your hologram. The videos Microsoft have released show multiple users looking at the same objects simultaneously from different viewpoints. Even in the live demo, both the user and the camera had different views of the models.
This makes me question the tethering again. If they're working over a network and both manipulating a model, one version must be authoritative. They're not looking at an identical model.
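One common way round that in networked software is an authoritative host: every edit funnels through a single canonical copy of the model, and each headset renders the latest state it has received. I have no idea if HoloLens works this way; this is just a toy sketch of the pattern, with names of my own invention.

```python
class SharedModel:
    """Authoritative copy of a shared hologram's state."""

    def __init__(self):
        self.version = 0
        self.transform = {"pos": (0.0, 0.0, 0.0), "scale": 1.0}

    def apply_edit(self, edit):
        # All clients' edits pass through here, so there's one truth.
        self.transform.update(edit)
        self.version += 1
        return self.version, dict(self.transform)

host = SharedModel()
# Two users manipulate the same model; the host serialises their edits.
v1, state1 = host.apply_edit({"scale": 2.0})
v2, state2 = host.apply_edit({"pos": (1.0, 0.0, 0.0)})
print(v2, state2)  # both headsets converge on version 2
```

The trade-off is latency: the non-host user sees their own edits slightly late, which matters a lot more at 90 fps than it does in a chat app.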
Another concern is the lack of input devices. It appears to work via voice commands and gestures. This is fine for Siri or Kinect: if either of those fails there are alternative inputs, a controller or an on-screen keyboard. With HoloLens and Windows Holographic they need to work consistently, 100% of the time. They can't be janky like Siri or Kinect. Also, everyone feels like an idiot gesturing at devices that give no haptic feedback.
And finally: how do you draw something black on a transparent screen?
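This isn't a nitpick. See-through AR displays are additive: light from the display is added to light from the real world, and you can never subtract. "Black" means emitting nothing, so the world just shows through. A quick sketch of the maths, with intensities simplified to a 0 to 1 scale:

```python
def perceived(background, display):
    """Additive see-through optics: the eye sees world light plus
    display light, clamped per channel to a 0..1 intensity range."""
    return [min(b + d, 1.0) for b, d in zip(background, display)]

white_wall = [0.9, 0.9, 0.9]
black_hologram = [0.0, 0.0, 0.0]  # "black" emits no light at all
print(perceived(white_wall, black_hologram))  # [0.9, 0.9, 0.9]: the wall wins
```

So a black hologram in front of a bright wall is invisible unless the hardware can also darken the incoming world light, which a simple transparent display can't do.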
I really hope Microsoft have pulled this off, and that they've managed to solve all the limitations of current technology in a nice consumer package that isn't priced too high. I'll buy one. I'd probably even work on small projects if something like Unity supports it.
Also, as Forbes pointed out, a holographic Cortana hanging out, answering my Google searches and telling me the weather, would be pretty rad.