Some time ago, I wrote that one might use the frame motion information that the Raspberry Pi's graphics chip outputs while capturing video to estimate the motion of the attached quadrocopter in 3D space. It is definitely possible to retrieve this information from the full video (see this paper and this video), but the 2D motion vectors produced during video encoding do not seem accurate enough to carry it, as this paper written by Maurizio Pilu at HP Labs Bristol in 1997 demonstrates.
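For reference, the encoder's motion vectors are cheap to get at: picamera can write them to a side channel via the `motion_output` option, one record per 16x16 macroblock. The sketch below parses such raw bytes with numpy, assuming the record layout documented for picamera (signed x/y byte pair plus a 16-bit SAD score, with one spare macroblock column per row); the synthetic test frame and the crude mean-motion estimate are my own illustration, not part of any library.

```python
import numpy as np

# Assumed per-macroblock record layout from picamera's motion_output docs:
# signed x/y vector components plus an unsigned 16-bit
# sum-of-absolute-differences (SAD) score.
MV_DTYPE = np.dtype([('x', 'i1'), ('y', 'i1'), ('sad', 'u2')])

def frame_shape(width, height):
    # One record per 16x16 pixel macroblock, plus one spare column per row.
    return (height // 16, width // 16 + 1)

def parse_motion_frame(buf, width, height):
    """Decode one frame's worth of raw motion-vector bytes into a 2D array."""
    rows, cols = frame_shape(width, height)
    return np.frombuffer(buf, dtype=MV_DTYPE, count=rows * cols).reshape(rows, cols)

def mean_motion(frame):
    """Average x/y motion over the frame - a very crude global-motion estimate."""
    return float(frame['x'].mean()), float(frame['y'].mean())

if __name__ == '__main__':
    w, h = 640, 480
    rows, cols = frame_shape(w, h)
    # Synthetic frame: uniform motion of (3, -2) in every macroblock.
    fake = np.zeros((rows, cols), dtype=MV_DTYPE)
    fake['x'] = 3
    fake['y'] = -2
    frame = parse_motion_frame(fake.tobytes(), w, h)
    print(mean_motion(frame))  # (3.0, -2.0)
```

Even this trivial averaging shows why the encoder's vectors are tempting: they arrive essentially for free per frame, just not with the accuracy Pilu's paper would require for camera-motion recovery.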
For me, this means that getting motion estimation from the video takes more computational effort, possibly using the Deep Learning strategies currently hyped by all kinds of press. Some weeks ago I worked through the MNIST expert tutorial of Google's TensorFlow and must say that I like the idea of defining a computation model in Python and then running it on a high-performance execution backend (CPU or GPU). At least on my quick look, the Python API was quite easy to use, and I would like to apply it in the following scenario: once the altitude sensor fusion (see my last blog entry) is working, I will try to train a convolutional recurrent network to predict the height above ground and the vertical speed (as estimated from the other sensors) from the video stream alone.
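To make the idea concrete, here is a minimal numpy sketch of the convolutional part of such a regression network: one convolution, a ReLU, global average pooling, and a two-unit linear head producing [height, vertical speed]. All shapes and weights here are hypothetical placeholders for illustration; a real version would live in TensorFlow, stack more layers, and add the recurrent part across frames.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(img, kernel):
    """Naive 'valid' 2D correlation over a single-channel image (demo only)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def predict(frame, kernel, w_out, b_out):
    """Conv -> ReLU -> global average pool -> linear head with 2 outputs."""
    features = np.maximum(conv2d(frame, kernel), 0.0)
    pooled = features.mean()          # collapse the feature map to one scalar
    return w_out * pooled + b_out     # [height above ground, vertical speed]

if __name__ == '__main__':
    frame = rng.random((32, 32))          # stand-in for a grayscale video frame
    kernel = rng.standard_normal((3, 3))  # one learned 3x3 filter
    w_out = rng.standard_normal(2)        # weights of the 2-unit output layer
    b_out = np.zeros(2)
    print(predict(frame, kernel, w_out, b_out).shape)  # (2,)
```

Training would then simply mean regressing these two outputs against the fused altitude and vertical-speed estimates recorded alongside each frame.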
Unfortunately, such a network will most likely not run fast enough on the Raspberry Pi 2, so I won't be able to use it in the air (but I am still interested in how easy it is to get this running in TensorFlow).