Inspiration
Hello people! You know life is getting fantastic again when you have ample time to execute your projects. (I just finished my Year 1 finals!!!) Anyway, this table tennis tracker thingy has been on my project list for quite a while. It was a fortunate coincidence that got me intrigued by this computer vision topic. There are two main reasons I'm doing this:
- I happened to connect with someone on LinkedIn who had done this project before
- During a company visit, I found out that XX company's employees do this as a fun after-work project
These two things combined made me want to give this project a shot.
Videography
Before creating an OpenCV tracking model for a table tennis ball, I first needed to record a video of myself playing table tennis. During recording, I shot from several angles, filtered out the unwanted parts, and compiled the usable cuts. In the end, I had two short videos.
First Attempt: Filter out the ball and grab contours
I planned to use OpenCV's cv2.inRange function to filter out the ball by adjusting the lower and upper HSV boundary values.
But my video turned out to be unsuitable for this method: the background and the ping pong table are similar in colour to the white ball, causing the filter to pick them up too. I tried modifying the video's colours in Premiere Pro, but that didn't help much either.
Second Attempt: Use Background Subtraction to Track the ball
This method tracks the ball's movement by generating a foreground mask (a binary image containing the pixels that belong to moving objects in the scene) from a static camera. I found it while googling how to track moving objects with OpenCV. OpenCV provides several background subtraction functions; I chose one of them and got the result below.
As you can see, this method removes the other unnecessary white traces. Only the ball is left in the processed frame.
Background Subtraction in OpenCV
We can treat a video as a three-dimensional object with parameters x, y and t, where x is width, y is height and t is time; it is a sequence of images combined. A simple background subtraction approach works in three steps: first, estimate the background at time t. Next, subtract the background from the current input frame. Finally, apply a threshold to the absolute difference to get the foreground mask. The fundamental question here is how we decide which frame is the background. Note: this illustration is extracted from the OpenCV documentation.
Frame Differencing
My first instinct was to subtract Image(x, y, t-1) from Image(x, y, t), i.e. to assume the background is always the previous frame (t-1) of the current input frame (t). Google tells me this method is called frame differencing. However, the problem with frame differencing grows when a uniformly coloured object moves slowly or is large. Imagine a big white ball with radius 10 moving slowly from position (x, y) at time t to (x, y+1) at time t+1. At time t, pixel (x, y) is white. At time t+1, (x, y) is still white (even though the ball is moving) because the ball has a radius of 10. As a result, the algorithm fails to highlight the centre of the moving object, because the absolute difference there falls below the threshold. Illustration as follows (not to scale):
Mean and Median Filtering
With Mean Filtering, instead of using only the last frame, we use the average of the previous N frames as the background model:

B(x, y, t) = (1/N) · Σᵢ₌₁ᴺ I(x, y, t − i)

Hence, the foreground mask is computed as:

|I(x, y, t) − B(x, y, t)| > Th
However, the mean is easily dragged around by random noise and outlier frames, so Median Filtering is introduced: use the median of the previous N frames as the background model.

B(x, y, t) = median{ I(x, y, t − i) : i = 1, …, N }

The foreground mask for Median Filtering is computed the same way:

|I(x, y, t) − B(x, y, t)| > Th
Gaussian Mixture-based Background
This approach is used in OpenCV's BackgroundSubtractorMOG function. The initial idea was proposed by P. KadewTraKuPong and R. Bowden in the paper "An improved adaptive background mixture model for real-time tracking with shadow detection". The main idea of the algorithm is to model each background pixel with a mixture of K Gaussians (K = 3 to 5) and update its parameters over time. If a pixel matches one of those Gaussians, we classify it as background.
More info about the Gaussian Mixture Model can be found here.
Find corners in the video, add circles and export
After successfully implementing background subtraction with the BackgroundSubtractorMOG function, all that was left was to identify the white colour in the processed frames and draw circles on the original frames. This can be achieved with cv2's goodFeaturesToTrack and circle functions.
Result
Final video: Click Here
Github Code: Read Me
References
- An Improved Adaptive Background Mixture Model for Realtime Tracking with Shadow Detection
- Background Subtraction in OpenCV
- Introduction to Computer Vision by Udacity
Thank you for reading!