
Reading Chunks from a Buffer

What follows is a brain dump of something I need to do from time to time, so I thought it would be a good idea to write down the steps involved in reading and parsing chunks from a source, for example a file.

When do you want to read chunks anyway? Well, let's take an example for which I often need to read chunks: reading NALs from an h264 file (annex-b). When parsing raw h264 data you don't know the byte offsets in the file where the video frames start and stop. Therefore you need to parse it and detect the annex-b headers. Because the file can be very large, I don't want to read the complete file into a buffer at once. A more sensible approach is to read and parse the file in chunks of, let's say, 128 kilobytes. When you combine reading chunks and parsing these chunks there are several things you need to keep in mind:

How many bytes are still available in the source file?
How many bytes did the parser parse?
How many h264 bytes were read but didn't get parsed yet?
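To make the annex-b detection a bit more concrete, here is a minimal sketch that scans a buffer for the 3-byte start code 00 00 01 (the 4-byte variant 00 00 00 01 simply contains it one byte later). The function name find_start_code is my own and not part of any library; a real parser does a lot more than this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

/* Hypothetical helper: returns the offset of the first 00 00 01 start code
   at or after `from`, or `size` when the buffer contains none. */
std::size_t find_start_code(const std::uint8_t* data, std::size_t size, std::size_t from) {
  assert(from <= size);
  for (std::size_t i = from; i + 3 <= size; ++i) {
    if (0x00 == data[i] && 0x00 == data[i + 1] && 0x01 == data[i + 2]) {
      return i;
    }
  }
  return size;
}
```

Note that a buffer ending in 00 00 may hold the first part of a start code whose last byte only arrives with the next read. That is one reason why a parser can leave bytes unparsed at the end of a chunk.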

Determining the number of bytes to read

A first approach could be something like this: we keep track of how many bytes there are still in the source buffer (e.g. the file). Let's call the variable that holds the number of bytes that are still available bytes_left_to_read. Then every time we read from the source, we try to read the maximum number of bytes, which is normally the size of the chunk. But be aware that the last read from the source is special. For example, let's say the file size is 101 bytes and we're reading in chunks of 10 bytes, then we need to read only 1 byte for the last read. Using std::min<size_t>(chunk_size, bytes_left_to_read) we can determine how many bytes we can still read.

// Be aware that this is an incomplete example.
while (bytes_left_to_read > 0) {
  bytes_to_read = std::min<size_t>(chunk_size, bytes_left_to_read);
  bytes_left_to_read -= bytes_to_read;
}

This is all good and simple, and it makes sure that we only read the number of bytes that fit in our chunk and never more than the number of bytes left in our source.
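As a quick check of the 101 byte example, the loop can be wrapped in a small function that records the size of every read. This is only a sketch; plan_reads is a name I made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

/* Hypothetical helper: returns the size of every read needed to consume
   `total_size` bytes when reading at most `chunk_size` bytes at a time. */
std::vector<std::size_t> plan_reads(std::size_t total_size, std::size_t chunk_size) {
  assert(chunk_size > 0);
  std::vector<std::size_t> sizes;
  std::size_t bytes_left_to_read = total_size;
  while (bytes_left_to_read > 0) {
    std::size_t bytes_to_read = std::min<std::size_t>(chunk_size, bytes_left_to_read);
    sizes.push_back(bytes_to_read);
    bytes_left_to_read -= bytes_to_read;
  }
  return sizes;
}
```

Calling plan_reads(101, 10) gives ten reads of 10 bytes followed by one final read of 1 byte.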

But the approach above won't work, because you can't always read the full chunk_size. When the parser didn't parse the complete previous chunk, there are still some bytes left that you need to parse after the next read. Therefore we need to reduce chunk_size by bytes_available_in_chunk, which holds the number of source bytes that have been read but still need to be parsed. So a better approach is to use: std::min<size_t>(chunk_size - bytes_available_in_chunk, bytes_left_to_read);

while (bytes_left_to_read > 0) {
  bytes_to_read = std::min<size_t>(chunk_size - bytes_available_in_chunk, bytes_left_to_read);
  bytes_left_to_read -= bytes_to_read;
}
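The same calculation as a tiny standalone function, so you can try a couple of values. The name next_read_size is hypothetical; the variable names follow the ones used above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

/* Hypothetical helper: how many bytes can we read next, given the bytes
   that are still waiting in the chunk and the bytes left in the source? */
std::size_t next_read_size(std::size_t chunk_size,
                           std::size_t bytes_available_in_chunk,
                           std::size_t bytes_left_to_read) {
  assert(bytes_available_in_chunk <= chunk_size);
  std::size_t free_space = chunk_size - bytes_available_in_chunk;
  return std::min<std::size_t>(free_space, bytes_left_to_read);
}
```

With a chunk of 10 bytes, 2 unparsed bytes and 50 bytes left in the file, this returns 8. When it returns 0 while there are still bytes left in the source, the chunk is full and the parser made no progress; the read loop has to treat that as an error or it will spin forever.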

Reading data into the buffer and parsing it

Once we've determined how many bytes we can read, we need to read the bytes into our chunk. Because the chunk can still hold valid bytes that we need to parse, we cannot simply copy new bytes to the start of the buffer. New bytes need to be stored after the bytes which are still available from our previous read.

But let's take one step back. Let's say we've just read a complete chunk of 10 bytes but only parsed 8 bytes. We need to move the last 2 bytes, which are still waiting to be parsed, to the beginning of our chunk before we start reading fresh bytes. For this we use memmove, which is safe to use when the source and destination ranges overlap. The number of bytes that we need to move is what we called bytes_available_in_chunk, and we calculate it by subtracting bytes_parsed from the number of valid bytes in the chunk (10 - 8 = 2 in this example). Note that chunk_size - bytes_parsed only gives the right answer when the chunk was completely filled, which is not the case for the last, shorter read.

bytes_available_in_chunk -= bytes_parsed;
memmove(chunk, chunk + bytes_parsed, bytes_available_in_chunk);
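Here is the move step as a small self-contained sketch, using the 10 byte chunk from the example. The helper name compact_chunk is my own. It subtracts the parsed bytes from the number of valid bytes in the chunk, which also works when the chunk wasn't completely filled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

/* Hypothetical helper: moves the unparsed tail of `chunk` to the front and
   returns how many bytes are still waiting to be parsed. */
std::size_t compact_chunk(std::uint8_t* chunk,
                          std::size_t bytes_available_in_chunk,
                          std::size_t bytes_parsed) {
  assert(bytes_parsed <= bytes_available_in_chunk);
  std::size_t remaining = bytes_available_in_chunk - bytes_parsed;
  if (remaining > 0 && bytes_parsed > 0) {
    /* memmove and not memcpy: the two ranges may overlap. */
    std::memmove(chunk, chunk + bytes_parsed, remaining);
  }
  return remaining;
}
```

For a chunk holding the bytes 0 to 9 of which 8 were parsed, compact_chunk(chunk, 10, 8) returns 2 and the chunk now starts with the bytes 8 and 9.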

Once we've moved the remaining bytes to the start of the buffer we repeat the steps described above which leads to something like this:

#include <algorithm> /* std::min */
#include <cstring>   /* memmove */
#include <fstream>   /* std::ifstream */
#include <stdint.h>  /* uint8_t */
#include <string>

/* VideoH264Creator, its `parser` member, SX_ERROR and H264_PARSE_OK come from the surrounding project. */
int VideoH264Creator::create(const std::string& inpath) {

  const size_t chunk_size = 1024 * 28;
  size_t bytes_available_in_chunk = 0;
  size_t file_size = 0;
  size_t bytes_left_to_read = 0;
  size_t bytes_to_read = 0;
  size_t bytes_read = 0;
  size_t bytes_parsed = 0;
  int parse_result = H264_PARSE_OK;
  uint8_t buffer[chunk_size];

  if (0 == inpath.size()) {
    SX_ERROR("Given input path is empty.");
    return -1;
  }

  /* Open the input file (h264, annex-b) */
  std::ifstream ifs(inpath.c_str(), std::ios::in | std::ios::binary);
  if (false == ifs.is_open()) {
    SX_ERROR("Failed to open: %s", inpath.c_str());
    return -2;
  }

  /* Check the file size; tellg() returns -1 on failure. */
  ifs.seekg(0, std::ifstream::end);
  std::streamoff end_pos = ifs.tellg();
  ifs.seekg(0, std::ifstream::beg);

  if (end_pos <= 0) {
    SX_ERROR("Input file is empty or its size could not be determined.");
    return -3;
  }

  file_size = (size_t)end_pos;
  bytes_left_to_read = file_size;

  while (bytes_left_to_read > 0) {

    /* We can only read the remaining free space in the chunk, or what's still remaining in the file. */
    bytes_to_read = std::min<size_t>(chunk_size - bytes_available_in_chunk, bytes_left_to_read);
    if (0 == bytes_to_read) {
      SX_ERROR("The chunk is full but the parser didn't consume any bytes.");
      return -4;
    }

    /* We read new bytes, after the bytes which are still available. */
    ifs.read((char*)buffer + bytes_available_in_chunk, bytes_to_read);
    bytes_read = ifs.gcount();
    if (0 == bytes_read) {
      SX_ERROR("Failed to read from: %s", inpath.c_str());
      return -5;
    }

    /* Increment the number of valid bytes using the number of bytes we just read. */
    bytes_available_in_chunk += bytes_read;

    /* parse_result is not inspected here; check it as your parser requires. */
    parse_result = parser.parse(buffer, bytes_available_in_chunk, bytes_parsed);

    if (bytes_parsed > bytes_available_in_chunk) {
      SX_ERROR("Number of bytes parsed is bigger than the number of bytes given. Not supposed to happen.");
      return -6;
    }

    /* Remove the parsed bytes from our buffer by moving what's left to the start. */
    bytes_available_in_chunk -= bytes_parsed;
    memmove(buffer, buffer + bytes_parsed, bytes_available_in_chunk);

    /* Reduce the number of bytes still to read from the file. */
    bytes_left_to_read -= bytes_read;
  }

  return 0;
}
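If you want to play with the loop without an h264 file or my parser, the following is a minimal sketch that runs the same steps over any std::istream. ToyParser, read_in_chunks and parse_string are names I made up for this example: the toy parser simply consumes complete 3-byte records and leaves an incomplete tail unparsed, which stands in for a real parser that stops at the last complete NAL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

/* Hypothetical stand-in for the h264 parser. */
struct ToyParser {
  std::size_t records = 0;

  /* Returns the number of bytes consumed from `data`. */
  std::size_t parse(const std::uint8_t* data, std::size_t size) {
    (void)data; /* the toy parser only looks at the size */
    std::size_t complete = size / 3;
    records += complete;
    return complete * 3;
  }
};

/* Feeds `total_size` bytes from `in` through `parser`, `chunk_size` bytes at
   a time. Returns 0 on success and stores the unparsed tail size. */
int read_in_chunks(std::istream& in, std::size_t total_size, std::size_t chunk_size,
                   ToyParser& parser, std::size_t& bytes_unparsed) {
  assert(chunk_size > 0);
  std::vector<std::uint8_t> buffer(chunk_size);
  std::size_t bytes_available_in_chunk = 0;
  std::size_t bytes_left_to_read = total_size;

  while (bytes_left_to_read > 0) {
    /* Only the free space in the chunk, or what is left in the source. */
    std::size_t bytes_to_read =
        std::min<std::size_t>(chunk_size - bytes_available_in_chunk, bytes_left_to_read);
    if (0 == bytes_to_read) {
      return -1; /* chunk is full and the parser made no progress */
    }

    /* New bytes go after the bytes that are still waiting to be parsed. */
    in.read(reinterpret_cast<char*>(buffer.data()) + bytes_available_in_chunk,
            static_cast<std::streamsize>(bytes_to_read));
    std::size_t bytes_read = static_cast<std::size_t>(in.gcount());
    if (0 == bytes_read) {
      return -2; /* the source ended earlier than expected */
    }
    bytes_available_in_chunk += bytes_read;
    bytes_left_to_read -= bytes_read;

    std::size_t bytes_parsed = parser.parse(buffer.data(), bytes_available_in_chunk);
    if (bytes_parsed > bytes_available_in_chunk) {
      return -3;
    }

    /* Move the unparsed tail to the start of the chunk. */
    bytes_available_in_chunk -= bytes_parsed;
    std::memmove(buffer.data(), buffer.data() + bytes_parsed, bytes_available_in_chunk);
  }

  bytes_unparsed = bytes_available_in_chunk;
  return 0;
}

/* Usage sketch: run the loop over an in-memory string instead of a file. */
int parse_string(const std::string& bytes, std::size_t chunk_size,
                 ToyParser& parser, std::size_t& bytes_unparsed) {
  std::istringstream in(bytes);
  return read_in_chunks(in, bytes.size(), chunk_size, parser, bytes_unparsed);
}
```

Feeding 101 bytes through a 10 byte chunk gives 33 complete records and leaves 2 bytes unparsed. After the first read every read is 9 bytes, because 1 unparsed byte is carried over each time. Whatever is left when the source is exhausted still has to be handled by the caller, for example by flushing the parser.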
