I was asked what I do when I'm not working. I do other fun stuff, like physics and math. Recently I tried to do motion tracking with Blender, and it led to some nice things (though not many results yet). So I'll post about it here.
So what is that? It's a method for modifying a video recording. From its name you can understand that the computer will track moving points in the video. That can be useful for two things: finding out how the camera was moving, if you are tracking points that didn't move while you were recording, and finding out how an object was moving. Once you know that, Blender allows you to add animations into the video, which look like they were there when you were recording. For an example, you can check the new movie from the Blender Foundation, Tears of Steel.
Getting a camera
I hoped to do something like that (but much less impressive). So I recorded a video with my photo camera, only to notice that even though it can get an acceptable resolution in photos, it only does 352×288 pixel videos. So I went to Walmart for a new camera. Now one bad thing about Walmart is that the people who work there have no idea what they are selling. All the camera boxes said what resolution photos they would do, but only one said anything about video (and that was much too low). I asked, but they didn't know. So I just picked one and hoped it would be good. It was acceptable. The video is fine, the only annoying thing is that it makes a sound when switching it on and off, and when taking a photo. But I'll live with it.
I just put some markers on my table and made a video of them. Then I tried to let Blender reconstruct the camera motion using its default settings. It worked pretty well. But when I tried to construct the table in Blender, I noticed that it was deformed; the reconstructed shape didn't have any right angles. So that needed some tuning.
The things that need tuning are the ''sensor size'' and the ''focal length'' of the lens. With expensive cameras, these values are probably given. But not with a cheap one like mine. It did specify a focal length, but no sensor size.
Of course that's great! It gives me an excuse to use physics and find that it helps me accomplish things. And because that's so nice, I'm sharing it here.
So why would Blender want to know a focal length and a sensor size, and what are these things? For that, I shall give a short explanation of classical optics. The classical part is that light particles (photons) travel in straight lines from one point to another (non-classical optics treats them as waves).
One point of a subject emits light in all directions. We can use a lens to focus this light. If the setup is right, the lens will focus all the light from one point to another point (on the other side of the lens). Every point in the subject has a corresponding point on the other side, and those points together are called the image. When making a photo, the idea is to focus the subject onto the sensor, so that the sensor will record a sharp image of it.
The picture above shows how it works. The blue line on the right is the subject. The black vertical line is the lens. The colored lines are light paths from the top of the subject. The lens will focus any light from that point in the subject onto one spot in the image, which is the blue line on the left. So that is where the sensor must be if you want to make a sharp picture.
Real cameras have more than one lens in them, so the story is more complex. However, most of the time it's ok to just do the math as if it was a single lens. The sensor size and focal length then no longer have a physical meaning, but that's no problem, because it's all in a closed box anyway and having a model of a box that behaves the same is good enough.
There are a few formulas which can be used here. First of all, there is 1/f=1/v+1/b, where f is the focal length of the lens, v is the distance from the lens to the subject and b is the distance from the lens to the image. (The letters I'm using make sense if you speak Dutch.) Furthermore, the magnification is defined as B/V, the size of the image divided by the size of the subject, and it is equal to b/v, the ratio of their respective distances to the lens.
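As a quick numerical check of these two formulas, here is a minimal Python sketch. The function names and the example numbers (a 5 mm lens, a subject 1000 mm away) are illustrative assumptions, not measurements from the camera in this post.

```python
def image_distance(f, v):
    """Solve 1/f = 1/v + 1/b for b (all lengths in the same unit)."""
    if v == f:
        raise ValueError("subject at the focal point: image is at infinity")
    return 1.0 / (1.0 / f - 1.0 / v)


def magnification(f, v):
    """Magnification B/V, which equals b/v."""
    return image_distance(f, v) / v


# Hypothetical numbers: a 5 mm lens and a subject 1000 mm away.
f_mm, v_mm = 5.0, 1000.0
b_mm = image_distance(f_mm, v_mm)
print(f"b = {b_mm:.3f} mm, magnification = {magnification(f_mm, v_mm):.5f}")
```

For a subject far away compared to f, the image lands just behind the focal length, which is why b is only slightly larger than f here.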
Finally, I shall define a ratio R as the number of pixels per mm on the sensor. I shall need to convert between pixels and millimeters, because all the distances on the ''real'' side of the lens are measured in real units such as millimeters, but the sizes in the image are only measured in pixels.
I know the number of pixels on my sensor, so if I also know R, I will know the sensor size in mm. Besides that, I will need to find the focal length of the lens; I don't trust the manufacturer for that.
To do that, I took a picture with a known v (distance from lens to subject) and V (size of subject). The picture gives me Bp, the size of the image in pixels. I do not know b (the distance from the lens to the sensor). When I find R, I can easily compute B (the size of the image) from Bp, of course.
So before looking at the photo, let's rework the formulas into something more useful for this problem.
b/v=B/V, so b=vB/V and 1/b=V/(vB).
Substituting that b into the other formula gives:
1/f=1/v+V/(vB), so 1/f-1/v=V/(vB)
The left part of that can be written as
1/f-1/v=(v-f)/(fv)
Now I convert B to Bp using R: R was the number of pixels per mm, so B×R=Bp, or B=Bp/R:
(v-f)/(fv)=VR/(vBp)
Multiplying the v out on both sides and dividing by VR gives
1/Bp=(v-f)/(VRf)=(1/(VRf))×(v-f)
This is a strange way of writing it, but I have a reason for that. The above formula (the leftmost and rightmost expressions only) is of the form
y=a(x-c)
In this case y=1/Bp, a=1/(VRf), x=v and c=f. This is the formula for a straight line. It means that if I do this computation for several subjects, each with equal size V, but different v and Bp, and I plot the results, it should give me a straight line. The steepness of the line is a (=1/VRf) and the crossing with the horizontal axis is c (=f).
So if I first use this crossing to determine f, I can then use that f and the steepness of the line (and the known V) to compute R. I'll use that to compute the sensor size in mm, and then my calibration is complete!
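The procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data below are synthetic (generated from an assumed f of 5 mm, R of 200 px/mm and V of 150 mm), the 1920 px frame width is made up, and the function names are hypothetical. It is not the script used for the measurements in this post.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(vs, bps, V):
    """Recover f and R from distances vs, image sizes bps (px) and subject size V.

    Uses 1/Bp = a * (v - c) with a = 1/(V*R*f) and c = f.
    """
    a, intercept = fit_line(vs, [1.0 / bp for bp in bps])
    f = -intercept / a          # crossing with the horizontal axis
    R = 1.0 / (a * V * f)       # pixels per mm
    return f, R


# Synthetic test data (NOT measurements): assumed f = 5 mm, R = 200 px/mm,
# subject size V = 150 mm, generated with Bp = V*R*f / (v - f).
true_f, true_R, V = 5.0, 200.0, 150.0
vs = [300.0, 600.0, 900.0, 1200.0, 1500.0]
bps = [V * true_R * true_f / (v - true_f) for v in vs]

f, R = calibrate(vs, bps, V)
sensor_width_mm = 1920 / R      # 1920 px wide frame is an assumed value
print(f"f = {f:.3f} mm, R = {R:.1f} px/mm, sensor width = {sensor_width_mm:.2f} mm")
```

Because the synthetic points lie exactly on a line, the fit recovers the assumed f and R; with real measurements the points scatter and the crossing becomes sensitive to small errors in the slope.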
At this point, a mathematician would say "the problem can be solved" and stop. But I'm a physicist, and I want to finish it. Then I can see what it means, and I can use it in Blender. So I need several subjects, all of the same size, but with different distances to the camera. I did many of them in a single picture:
The result of plotting them is this graph:
The units are 1/pixels on the vertical axis, and the length of my kitchen tiles on the horizontal axis. The red line is the fit through the points. The computed values for a and c are a=0.00020/px/tilesize (the strange unit is due to the inverse on the vertical axis) and c=-0.104 tilesize.
Using the results
So we knew that f=c, so f=-0.104 tilesize. With a measured tilesize of 155 mm, this means f=-16.1 mm. That is a problem; the lens of a camera must have a positive focal length. Negative lenses don't produce an image that you can capture by putting a sensor in the right place. If I do the calculation for computing the sensor size anyway, it results in a negative sensor size. These things are fine for a mathematician, but they don't make a lot of sense in the real world.
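The arithmetic behind these numbers can be written out as a short Python sketch. The fit values and the tile size are the ones quoted above; the variable names are hypothetical. The subject size V is not stated in this post, so the sketch only computes the product V×R, which is enough to see the sign problem.

```python
# Fit results and tile size quoted in the post.
a_per_px_per_tile = 0.00020   # slope, in 1/px per tile length
c_tiles = -0.104              # crossing with the horizontal axis, in tile lengths
tile_mm = 155.0               # measured tile size in mm

f_mm = c_tiles * tile_mm                        # focal length, since f = c
a_per_px_per_mm = a_per_px_per_tile / tile_mm   # slope per mm of distance

# From a = 1/(V*R*f) it follows that V*R = 1/(a*f), in pixels.
# With a positive slope and a negative f this product is negative, so for a
# positive subject size V the pixel density R (and the sensor size) is negative.
VR_px = 1.0 / (a_per_px_per_mm * f_mm)

print(f"f = {f_mm:.1f} mm")
print(f"V*R = {VR_px:.0f} px")
```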
So what does all this mean? The results are nonsense. That always means that the assumptions were wrong. In this case, it means that my camera is more complex than I had hoped, and that a single lens is not a proper model to use for it.
But I can cheat. If I say that my camera was a few centimeters more back, then the whole graph will move a bit to the right, and the crossing goes to the positive part. The reconstructed camera will be placed slightly further from the scene than I really was during the recording, but I can live with that. This is what's so cool about physics: when you understand the situation, you know what you have to change to make it work the way you want it.
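The effect of this shift can be checked numerically. In the sketch below, the slope and crossing are the fit values from this post converted to millimeters, while the 30 mm shift is a hypothetical stand-in for "a few centimeters"; the post does not give the exact value. Writing v' = v + shift turns 1/Bp = a(v-c) into 1/Bp = a(v'-(c+shift)), so the slope stays the same and the crossing moves right by the shift.

```python
def shifted_fit(a, c, shift):
    """Line 1/Bp = a*(v - c) rewritten in terms of v' = v + shift.

    The slope is unchanged and the crossing moves right by `shift`.
    """
    return a, c + shift


a = 0.00020 / 155.0     # slope from the post, converted to 1/px per mm
c = -0.104 * 155.0      # crossing from the post, in mm (about -16.1)
shift_mm = 30.0         # hypothetical: camera assumed 30 mm further back

a_new, f_new = shifted_fit(a, c, shift_mm)

# The predicted 1/Bp for the same physical subject is identical
# before and after the shift.
v = 1000.0
assert abs(a * (v - c) - a_new * ((v + shift_mm) - f_new)) < 1e-12

print(f"effective focal length: {f_new:.1f} mm")
```

Any shift larger than about 16.1 mm makes the effective focal length positive; which value works best in Blender is something to find by trial.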
So that's what I've been doing recently. I don't have any modified videos yet, but I'll show them when I do.
Sunday May 13 2018 00:49:24