This is my master’s thesis in Computer Science, on optimally allocating a computational budget while encoding video, done at the Eindhoven University of Technology.
Video encoding is a process that transforms raw video data into a compressed format. It is used to reduce the storage and/or transmission requirements of video, as uncompressed video data takes up a lot of space. The encoding process requires a lot of resources in the form of working memory, computational power and still a fairly large amount of storage if you want to achieve good quality. These resource requirements are not fixed but depend on the contents of the video being encoded or decoded. This means that the exact resource requirements cannot be known beforehand and fluctuate over time. In an environment with restricted resource availability, which may also be fluctuating, the encoding of video therefore becomes challenging.
A solution to this problem is using a video encoder that is scalable, meaning that it can adapt its operation to function under some restricted set of resources. We distinguish between two types of scalability: rate scalability and complexity scalability. The goal of rate scalability is to have the encoder produce an output bitstream that will survive transmission across a channel of fluctuating capacity. The goal of complexity scalability is to enable the encoder to continue producing chunks of output within some deadline, even under fluctuating computational resources.
Complexity scalability is useful when the encoded video must be produced within some timeframe. If there are no such requirements, it does not matter how much (computational) time encoding takes. In that case, even if the amount of available computational resources changes there is no point in adapting to it.
An important goal of scalable video encoding is that at every point in time, we want to achieve the best quality possible for the amount of resources available at that time. Available resources must not go to waste, which means that we want to achieve finegrained scalability. The encoder must be able to operate at any point in its scalability range (from minimum to maximum).
In this thesis, we study an H.264/AVC encoder operating at a fixed bitrate under an arbitrarily restriced amount of computational power and with a deadline on every frame, i.e., the video must be produced in real-time. This scenario is consistent with video conferencing on a device with limited computational power, such as a mobile phone.
We show that we can guarantee that the encoder does not miss a deadline for encoding a frame by introducing a complexity budget in the encoder, placing a hard upper bound on the computational requirements of a single frame. Our complexity budget limits the number of Sum of Absolute Differences (SAD) computations that are performed, which is the most expensive operation of the motion estimation process of video encoding.
To obtain the best video output quality for a given frame budget we introduce an algorithm that allocates a frame’s complexity budget to its constituents, called “macroblocks”, based on the distortion with the best reference block block found during motion estimation of the previous frame. A high distortion indicates a bad match and is a sign that the macroblock could benefit from extra steps of motion search. We show that this allocation strategy results in higher quality than both the trivial Uniform Allocation strategy and the Motion History Matrix (MHM) allocation technique, which uses the motion vectors found during motion estimation to estimate the complexity needs. We achieve quality gains of around 1 dB Peak Signal-to-Noise Ratio (PSNR) at around 30 dB PSNR encoding the “city” sequence at 1500 kbps.