## Introduction

The primary difference between an image and a video is that a video is a sequence of images in a specific order. In more technical terms, a video has a temporal structure in addition to the spatial structure found in images. Information in a video is recorded not only spatially, i.e., as the objects or people in each frame, but also sequentially, as the way they change from frame to frame. This extra information is what makes classifying videos more challenging: we need an algorithm that treats the motion in the video as a key feature.

## Optical Flow

Optical flow is the pattern of apparent motion of objects between two consecutive frames, caused by the movement of the object or of the camera.

Optical flow rests on two major assumptions:

- The pixel intensities (brightness) of an object do not change between consecutive frames.
- Neighbouring pixels have similar motion.

Optical flow is a per-pixel prediction: it estimates how each pixel's brightness moves across the frame over time.

Consider a pixel with intensity I(x, y, t), where x and y are its coordinates and time t is the third dimension. Suppose the pixel moves by a distance (dx, dy) in time dt.

Since the pixel intensity is assumed to remain unchanged, we can write:

`I(x, y, t) = I(x + dx, y + dy, t + dt)`

Here, the change in location (dx, dy) is what the flow field predicts.

Suppose there is an RGB pixel with values (255, 255, 255) at coordinates (11, 15) at time t = 1 second. Optical flow assumes that the same RGB values, (255, 255, 255), will still be on the screen at t = 2 seconds. If there is motion, the change in coordinates gives the displacement vector. Let's say that at t = 2 seconds the pixel is at (16, 23). The optical flow displacement vector for this pixel is then (16 - 11, 23 - 15) = (5, 8). This means that if we take each pixel in the previous frame and apply its displacement vector, we should get the new frame.
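The arithmetic in this example can be checked with a short Python sketch. The helper names `displacement` and `apply_flow` are hypothetical, chosen only for illustration, and coordinates are assumed to be (x, y) tuples.

```python
def displacement(p_prev, p_next):
    """Displacement vector that moves a pixel from p_prev to p_next."""
    return (p_next[0] - p_prev[0], p_next[1] - p_prev[1])


def apply_flow(p, flow):
    """Move a pixel position by its displacement vector."""
    return (p[0] + flow[0], p[1] + flow[1])


# Pixel position at t = 1 s and at t = 2 s, as in the example above.
flow = displacement((11, 15), (16, 23))
print(flow)                        # (5, 8)
print(apply_flow((11, 15), flow))  # (16, 23)
```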

Going back to the equation we derived earlier,

`I(x, y, t) = I(x + dx, y + dy, t + dt)`

Expanding the right-hand side as a first-order Taylor series, cancelling the common term I(x, y, t), and dividing by dt gives the optical flow equation:

`f_x u + f_y v + f_t = 0`

Here, `f_x` and `f_y` are the spatial image gradients and `f_t` is the gradient along time. `u = dx/dt` and `v = dy/dt` are the unknowns, which can be solved for using the Lucas-Kanade method.
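For readers who want the intermediate step, the derivation can be sketched as follows. It assumes the displacement is small enough for a first-order Taylor expansion to be accurate. The partial-derivative notation is standard rather than taken from the text above.

```latex
\begin{aligned}
I(x + dx,\, y + dy,\, t + dt)
  &\approx I(x, y, t)
   + \frac{\partial I}{\partial x}\,dx
   + \frac{\partial I}{\partial y}\,dy
   + \frac{\partial I}{\partial t}\,dt \\
% Brightness constancy: the left-hand side equals I(x, y, t), so that term cancels.
\frac{\partial I}{\partial x}\,dx
   + \frac{\partial I}{\partial y}\,dy
   + \frac{\partial I}{\partial t}\,dt &= 0 \\
% Divide by dt.
f_x\,u + f_y\,v + f_t &= 0,
\qquad
f_x = \frac{\partial I}{\partial x},\;
f_y = \frac{\partial I}{\partial y},\;
f_t = \frac{\partial I}{\partial t},\;
u = \frac{dx}{dt},\;
v = \frac{dy}{dt}
\end{aligned}
```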

### Lucas-Kanade Method

Recall the second assumption of optical flow: neighbouring pixels have similar motion. The Lucas-Kanade method uses this assumption by taking a 3x3 patch around a point and assuming that all 9 pixels share the same motion (u, v). We can compute `f_x`, `f_y` and `f_t` at each of these 9 pixels, which gives 9 equations in only 2 unknowns. Such a system is overdetermined and generally has no exact solution, so a more convenient approach is a least-squares fit. With each sum taken over the 9 pixels of the patch, the equation for finding u and v is given by:

`[u; v] = [Σ f_x², Σ f_x f_y; Σ f_x f_y, Σ f_y²]^(-1) [-Σ f_x f_t; -Σ f_y f_t]`

Source - __link__
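The least-squares fit over a patch can be sketched in plain Python as below. This is a minimal illustration, not a production implementation. It assumes the gradients `f_x`, `f_y`, `f_t` have already been computed for each pixel of the patch, and the function name `lucas_kanade_patch` is hypothetical. It solves the 2x2 normal equations `Σf_x² u + Σf_x f_y v = -Σf_x f_t` and `Σf_x f_y u + Σf_y² v = -Σf_y f_t` with Cramer's rule.

```python
def lucas_kanade_patch(fx, fy, ft):
    """Least-squares (u, v) for one patch from per-pixel gradients.

    fx, fy, ft are equal-length sequences holding the spatial and temporal
    gradients at each pixel of the patch (9 values for a 3x3 patch).
    """
    # Sums that make up the 2x2 normal equations.
    sxx = sum(a * a for a in fx)
    sxy = sum(a * b for a, b in zip(fx, fy))
    syy = sum(b * b for b in fy)
    sxt = sum(a * c for a, c in zip(fx, ft))
    syt = sum(b * c for b, c in zip(fy, ft))

    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        # Flat or edge-only patches do not determine both u and v.
        raise ValueError("patch has too little texture to solve for (u, v)")

    # Cramer's rule on [[sxx, sxy], [sxy, syy]] [u, v]^T = [-sxt, -syt]^T.
    u = (-sxt * syy + sxy * syt) / det
    v = (-syt * sxx + sxy * sxt) / det
    return u, v


# Synthetic gradients for a 3x3 patch (made-up values, not real image data).
fx = [1, 2, 0, -1, 3, 1, 0, 2, -2]
fy = [0, 1, 2, 1, -1, 3, 1, 0, 2]
# Build ft so that every pixel satisfies f_x*u + f_y*v + f_t = 0 with (u, v) = (5, 8).
ft = [-(a * 5 + b * 8) for a, b in zip(fx, fy)]

u, v = lucas_kanade_patch(fx, fy, ft)
print(u, v)  # recovers 5.0 8.0
```

The explicit check on the determinant reflects a practical limit of the method: on a patch with no texture, or with gradients in only one direction, the 2x2 system is singular and the motion cannot be recovered from that patch alone.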