Fast OpenGL blur shader

Several optimized solutions have been proposed for applying a Gaussian blur to a texture with OpenGL. Normally a blur shader fetches the color of a specific pixel, then fetches a number of pixels around it and combines all values using weights. These weights are based on the Gaussian function. The number of pixels you fetch for each output pixel determines the quality of the blur effect, but every extra fetch adds cost and slows down your rendering.
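As a minimal sketch of how such weights can be computed on the CPU: the function below evaluates the Gaussian at integer pixel distances and normalizes the result. The name `gaussian_weights` and the parameters `radius` and `sigma` are illustrative and are not part of the code later in this post, which takes the variance (sigma squared) instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

/* Returns the normalized one-sided weights of a 1D Gaussian kernel.
   weights[0] belongs to the center tap, weights[i] is shared by the
   taps at distance -i and +i. `radius` and `sigma` are illustrative. */
std::vector<float> gaussian_weights(int radius, float sigma) {
  assert(radius >= 0 && sigma > 0.0f);

  std::vector<float> weights(radius + 1);
  float sum = 0.0f;

  for (int i = 0; i <= radius; ++i) {
    weights[i] = std::exp(-(float(i) * float(i)) / (2.0f * sigma * sigma));
    /* Every tap except the center one is used twice (left and right). */
    sum += (i == 0) ? weights[i] : 2.0f * weights[i];
  }

  /* Normalize so the complete kernel sums to 1.0 and the image keeps its brightness. */
  for (float& w : weights) {
    w /= sum;
  }

  return weights;
}
```

Calling `gaussian_weights(4, 2.0f)` gives the five weights of a 9-tap kernel; a larger `radius` means more fetches per output pixel.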

Daniel Rákos described an interesting solution that reduces the number of texture fetches, which are costly operations compared to, let's say, multiplying some value in a shader. Reducing the number of texture fetches therefore speeds up the blur shader. You can read up on the details in Daniel's article. In short, he describes a technique that makes use of the linear sampling feature of OpenGL. Linear sampling is highly optimized and basically 'free' compared to the extra texture fetches you need without this optimization.

Instead of doing 9 texture fetches (per horizontal and vertical blur step) you only need 5 fetches for the same kernel size. The trick is that a color value fetched between two texels is automatically interpolated by the GPU hardware; by combining this with adjusted weights you need fewer fetches. For this to work you have to change the weights and texture coordinates (offsets) slightly, so that one interpolated fetch returns the weighted sum of the two texels it replaces.
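To see why one interpolated fetch can replace two discrete ones, here is a small CPU sketch that emulates linear filtering on a single row of texels. It is not GPU code: `sample_linear`, `two_fetches` and `one_fetch` are hypothetical helpers, and positions are given in texel units with integer values at texel centers, which simplifies how OpenGL addresses textures.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

/* Emulates linear filtering on a 1D row of texels. `x` is a position in
   texel units; integer values are texel centers (a simplification). */
float sample_linear(const std::vector<float>& texels, float x) {
  assert(texels.size() >= 2);
  assert(x >= 0.0f && x <= float(texels.size() - 1));
  std::size_t i0 = (std::size_t) std::floor(x);
  std::size_t i1 = (i0 + 1 < texels.size()) ? i0 + 1 : i0;
  float t = x - float(i0);
  return texels[i0] * (1.0f - t) + texels[i1] * t;
}

/* Reference: two discrete fetches at offsets o1 and o2 with weights w1 and w2. */
float two_fetches(const std::vector<float>& texels, int center,
                  int o1, float w1, int o2, float w2) {
  return texels[center + o1] * w1 + texels[center + o2] * w2;
}

/* Optimized: one filtered fetch at the merged offset, scaled by the merged weight. */
float one_fetch(const std::vector<float>& texels, int center,
                int o1, float w1, int o2, float w2) {
  float w = w1 + w2;
  float o = (float(o1) * w1 + float(o2) * w2) / w;
  return sample_linear(texels, float(center) + o) * w;
}
```

For neighbouring offsets (`o2 == o1 + 1`) both functions return the same value up to floating point rounding, because the merged offset lands between the two texels at exactly the ratio of their weights.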

The image below shows the formula for adjusting the normal Gaussian weights and offsets for the optimized version that uses linear sampling. Credits for this formula go to Daniel.
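Written out, and reconstructed from the new_weights and new_offsets computation in the source code further down rather than copied from the image, the adjustment for two neighbouring taps with weights w_1, w_2 and offsets o_1, o_2 is:

```latex
\[
w_{\text{new}} = w_1 + w_2,
\qquad
o_{\text{new}} = \frac{o_1 w_1 + o_2 w_2}{w_1 + w_2}
\]
```

The same merge is applied to the next pair (w_3, w_4 with o_3, o_4), while the center tap keeps its weight and an offset of 0.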

Blur performance and quality results

Another trivial way to speed up the blur is to simply reduce the size of the texture that you want to blur. In the images below I've used several different sizes and numbers of blur steps. When you reduce the size of the input texture you'll see that the quality drops a lot, though this can easily be fixed by applying a couple more blur passes. You have to find the best mix between scale and the number of blur passes.
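To get a feeling for this trade-off you can estimate the number of texture fetches per blur. The sketch below is a rough cost model only: `blur_fetch_count` and its parameters are illustrative names, and it ignores the cost of the downscale, the final upscale and any texture cache effects.

```cpp
#include <cassert>
#include <cstdint>

/* Rough number of texture fetches for a separable blur.
   `divisor` is the factor by which width and height are reduced and
   `passes` is the number of horizontal + vertical blur rounds.
   All names are illustrative; this is an estimate, not a measurement. */
uint64_t blur_fetch_count(uint32_t width, uint32_t height, uint32_t divisor,
                          uint32_t passes, uint32_t fetches_per_pixel) {
  assert(divisor > 0);
  uint64_t w = width / divisor;
  uint64_t h = height / divisor;
  /* Each pass runs one horizontal and one vertical step over every pixel. */
  return w * h * 2u * passes * fetches_per_pixel;
}
```

Under these assumptions a 1280 x 720 texture blurred once with 5 fetches per pixel needs 9216000 fetches, while two passes at half the width and height need 4608000.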

Source code

The code below can be used as an example of how to implement an optimized blur shader with a fixed kernel size and a user-defined blur amount (the code passes it to the Gaussian as s2, i.e. sigma squared).

#ifndef GFX_BLUR_H
#define GFX_BLUR_H
 
#include <glad/glad.h>
 
#define ROXLU_USE_LOG
#define ROXLU_USE_OPENGL
#define ROXLU_USE_MATH
#include <tinylib.h>
 
static const char* BLUR_VS = ""
  "#version 330\n"
  ""
  " const vec2[] pos = vec2[4]("
  "   vec2(-1.0, 1.0),"
  "   vec2(-1.0, -1.0),"
  "   vec2(1.0, 1.0),"
  "   vec2(1.0, -1.0)"
  "   );"
  ""
  "const vec2 texcoords[4] = vec2[] ("
  "  vec2(0.0, 1.0), "
  "  vec2(0.0, 0.0), "
  "  vec2(1.0, 1.0), "
  "  vec2(1.0, 0.0)  "
  "); "
  ""
  "out vec2 v_tex;"
  ""
  "void main() {"
  "  gl_Position = vec4(pos[gl_VertexID], 0.0, 1.0);"
  "  v_tex = texcoords[gl_VertexID];"
  " }" 
  "";
 
 
namespace gfx { 
  class Blur {
  public:
    Blur();
    ~Blur();
    int init(double amount);
    void blurX(float w, float h);
    void blurY(float w, float h);
 
  public:
    GLuint vao;
    GLuint vert;
    GLuint frag_y;
    GLuint frag_x;
    GLuint prog_x;
    GLuint prog_y;
    GLint xtex_w;
    GLint xtex_h;
    GLint ytex_w;
    GLint ytex_h;
  };
} /* namespace gfx */
 
#endif

The implementation contains both the optimized and the non-optimized version. You can switch between them with the #if in the Blur::init() function.

#include <string>
#include <vector>
#include <sstream>
#include <math.h>
#include <gfx/Blur.h>
 
namespace gfx {
 
  /* -------------------------------------------------------------------------------- */
 
  static float gauss(float x, float s2);
 
  /* -------------------------------------------------------------------------------- */
 
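  /* Initialize every member, in declaration order; -1 marks an unresolved uniform location. */
  Blur::Blur() 
    :vao(0)
    ,vert(0)
    ,frag_y(0)
    ,frag_x(0)
    ,prog_x(0)
    ,prog_y(0)
    ,xtex_w(-1)
    ,xtex_h(-1)
    ,ytex_w(-1)
    ,ytex_h(-1)
  {
  }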
 
  Blur::~Blur() {
  }
 
  int Blur::init(double amount) {
    RX_VERBOSE("Creating blur shader - check the effect of having the first sum like: sum = weights[0] * 2.0, is better");
 
#if 1
    /* OPTIMIZED VERSION */
    float sum = 0.0;
    float weights[5] = { 0.0f } ;
    float offsets[5] = { 0.0, 1.0, 2.0, 3.0, 4.0 } ;
 
    /* Calculate the weights */
    weights[0] = gauss(0, amount);
    sum = weights[0];  //     sum = weights[0] * 2.0;
    for (int i = 1; i < 5; ++i) {
      weights[i] = gauss(i, amount);
      sum += 2.0 * weights[i];
    }
    for (int i = 0; i < 5; ++i) {
      weights[i] /= sum;
    }
 
    /* merge neighbouring taps: 3 weights/offsets result in 5 fetches (center + 2 on each side) */
    float new_weights[3] = { weights[0], weights[1] + weights[2], weights[3] + weights[4] } ;
    float new_offsets[3] = { 0.0f };
    new_offsets[0] = 0.0f;
    new_offsets[1] = ( (weights[1] * offsets[1]) + (weights[2] * offsets[2]) ) / new_weights[1];
    new_offsets[2] = ( (weights[3] * offsets[3]) + (weights[4] * offsets[4]) ) / new_weights[2];
 
    /* create the shader */
    std::stringstream ss_open;
    ss_open << "#version 330\n"
            << "uniform sampler2D u_tex;\n"
            << "uniform float u_tex_w;\n"
            << "uniform float u_tex_h;\n"
            << "in vec2 v_tex;\n"
            << "layout( location = 0 ) out vec4 fragcolor;\n"
            << "\n"
            << "void main() {\n"
            << "  float sy = 1.0 / u_tex_h;\n"
            << "  float sx = 1.0 / u_tex_w;\n"
            << "";
 
    /* create the texture lookups */
    std::stringstream ss_y, ss_x;
    ss_y << "  fragcolor = texture(u_tex, v_tex) * " << new_weights[0] << ";\n";
    ss_x << "  fragcolor = texture(u_tex, v_tex) * " << new_weights[0] << ";\n";
 
    for (int i = 1; i < 3; ++i) {
      ss_y << "  fragcolor += texture(u_tex, vec2(v_tex.s, v_tex.y + (" << new_offsets[i] << " * sy))) * " << new_weights[i] << ";\n";
      ss_y << "  fragcolor += texture(u_tex, vec2(v_tex.s, v_tex.y - (" << new_offsets[i] << " * sy))) * " << new_weights[i] << ";\n";
      ss_x << "  fragcolor += texture(u_tex, vec2(v_tex.s + (" << new_offsets[i] << " * sx), v_tex.t)) * " << new_weights[i] << ";\n";
      ss_x << "  fragcolor += texture(u_tex, vec2(v_tex.s - (" << new_offsets[i] << " * sx), v_tex.t)) * " << new_weights[i] << ";\n";
    }
 
    ss_y << "}\n";
    ss_x << "}\n";
 
#else 
    /* UNOPTIMIZED */
    float sum = 0.0;
    float weights[5] = { 0.0f } ;
    float offsets[5] = { 0.0, 1.0, 2.0, 3.0, 4.0 } ;
 
    /* Calculate the weights */
    weights[0] = gauss(0, amount);
    sum = weights[0];  //     sum = weights[0] * 2.0;
    for (int i = 1; i < 5; ++i) {
      weights[i] = gauss(i, amount);
      sum += 2.0 * weights[i];
    }
    for (int i = 0; i < 5; ++i) {
      weights[i] /= sum;
    }
 
    /* create the shader */
    std::stringstream ss_open;
    ss_open << "#version 330\n"
            << "uniform sampler2D u_tex;\n"
            << "uniform float u_tex_w;\n"
            << "uniform float u_tex_h;\n"
            << "in vec2 v_tex;\n"
            << "layout( location = 0 ) out vec4 fragcolor;\n"
            << "\n"
            << "void main() {\n"
            << "  float sy = 1.0 / u_tex_h;\n"
            << "  float sx = 1.0 / u_tex_w;\n"
            << "";
 
 
    /* create the texture lookups */
    std::stringstream ss_y, ss_x;
    ss_y << "  fragcolor = texture(u_tex, v_tex) * " << weights[0] << ";\n";
    ss_x << "  fragcolor = texture(u_tex, v_tex) * " << weights[0] << ";\n";
 
    for (int i = 1; i < 5; ++i) {
      ss_y << "  fragcolor += texture(u_tex, vec2(v_tex.s, v_tex.y + (" << offsets[i] << ".0 * sy))) * " << weights[i] << ";\n";
      ss_y << "  fragcolor += texture(u_tex, vec2(v_tex.s, v_tex.y - (" << offsets[i] << ".0 * sy))) * " << weights[i] << ";\n";
      ss_x << "  fragcolor += texture(u_tex, vec2(v_tex.s + (" << offsets[i] << ".0 * sx), v_tex.t)) * " << weights[i] << ";\n";
      ss_x << "  fragcolor += texture(u_tex, vec2(v_tex.s - (" << offsets[i] << ".0 * sx), v_tex.t)) * " << weights[i] << ";\n";
    }
 
    ss_y << "}\n";
    ss_x << "}\n";
#endif
 
    std::string yfrag = ss_open.str() + ss_y.str();
    std::string xfrag = ss_open.str() + ss_x.str();
 
    /* create the shaders */
    vert = rx_create_shader(GL_VERTEX_SHADER, BLUR_VS);
    frag_x = rx_create_shader(GL_FRAGMENT_SHADER, xfrag.c_str());
    frag_y = rx_create_shader(GL_FRAGMENT_SHADER, yfrag.c_str());
    prog_x = rx_create_program(vert, frag_x, true);
    prog_y = rx_create_program(vert, frag_y, true);
 
    /* set the texture binding points */
    glUseProgram(prog_x);
    glUniform1i(glGetUniformLocation(prog_x, "u_tex"), 0);
    xtex_w = glGetUniformLocation(prog_x, "u_tex_w");
    xtex_h = glGetUniformLocation(prog_x, "u_tex_h");
 
    glUseProgram(prog_y);
    glUniform1i(glGetUniformLocation(prog_y, "u_tex"), 0);
    ytex_w = glGetUniformLocation(prog_y, "u_tex_w");
    ytex_h = glGetUniformLocation(prog_y, "u_tex_h");
 
    /* create our vao. */
    glGenVertexArrays(1, &vao);
 
    return 0;
  }
 
  void Blur::blurX(float w, float h) {
 
    /* make sure init has been called. */
    if (0 == prog_x || 0 == prog_y) {
      RX_ERROR("Shaders not initialized");
      return;
    }
 
    glBindVertexArray(vao);
    glUseProgram(prog_x);
    glUniform1f(xtex_w, w);
    glUniform1f(xtex_h, h);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
  }
 
  void Blur::blurY(float w, float h) {
 
    /* make sure init has been called. */
    if (0 == prog_x || 0 == prog_y) {
      RX_ERROR("Shaders not initialized");
      return;
    }
 
    glBindVertexArray(vao);
    glUseProgram(prog_y);
    glUniform1f(ytex_w, w);
    glUniform1f(ytex_h, h);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
  }
 
 
  /* -------------------------------------------------------------------------------- */
 
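  /* s2 is the variance (sigma squared) of the Gaussian. The constant factor c
     cancels out again because Blur::init() normalizes the weights by their sum. */
  static float gauss(float x, float s2) {
    double c = 1.0 / (2.0 * 3.14159265359 * s2);
    double e = -(x * x) / (2.0 * s2);
    return (float) (c * exp(e));
  }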
 
} /* namespace gfx */
