I was at SIGGRAPH 2014 to present, among other things, a poster of a new relighting method using Voxel Cone Tracing. My schedule was as per usual completely overloaded. Nevertheless, here are some impressions.

My favorite part of the conference is the courses, specifically these three:

These are great synchronization points for me. In Advances in Real-Time Rendering I especially enjoyed three talks: Bart Wronski's (@BartWronsk) Volumetric Fog method for Assassin's Creed 4, and two talks about SSR from Eidos and Guerrilla Games, a technique I'm currently looking into for simulating low-cost reflections in AR. Also, Wade Brainerd (@wadetb) gave a great talk that ran completely inside the Call of Duty engine, which I think is a perfect way to showcase what you have live on the "slides".

Physically Based Shading had the usual very informative introduction by Naty Hoffman (@renderwonk) (here are the expanded course notes from last year), followed by an important in-depth talk by Eric Heitz about the masking function in microfacet BRDFs. One of the fun talks was delivered at the end by Pixar, which I guess was a clear demonstration of how artists abuse Renderman to circumvent everything physically based that the programmers introduced in the first place.

Although I'm not (yet) into path tracing, I regularly visit the Light Transport session to delve into the general sampling and filtering madness that dominates the entire subfield. The course usually starts with a nice introduction to path integral formulations and then expands on papers which show different methods to cope with the noise. One talk I found interesting, by Ondřej Karlík, presented the Corona Renderer, a project focused on ease of use, with fewer but more intuitive parameters to manipulate.

The rest of the conference, such as the material of the paper sessions Fast Rendering and Light Transport, unfortunately still rests on my backlog.

I missed the Geek Bar this year. This place comes in handy when multiple sessions overlap in the same time slot, which happens every time for GI/rendering talks. The Geek Bar streams multiple sessions into the same room, and you can switch channels with your headset.

Vancouver as a location was pretty nice, had some good party locations and was a welcome change to LA. You can see some photos I took right here.

On publication videos

I have reviewed my fair share of papers over the years and seen some heinous crimes in video editing for the customary media attachment: blurry, too large, strange codecs only recognized by VLC, mouse cursor and other GUI elements visible on screen, double letterboxing or outright interesting choices of aspect ratio. Several factors play into this, such as the usual last-minute pressure when compiling a paper, the oh-gawd-we-surpassed-the-maximum-attachment-size limitations of most submission systems, or the fact that the constant recompression just takes too long and the authors settle for what they got.

It is clear, however, that the root of this confusion is that video encoding borders on black magic. There are certainly tools like iMovie which tremendously help to overcome most troubles in standard editing and encoding, but somewhere between incompatible codecs and file containers, video bitrates, quantization and scaling issues, frame re-timing, color compression, and the sheer boredom of just wanting a video, everyone will give up sooner or later.

In this post I'm going to cover the topic of how to create a web-browser compatible, pixel-matching HD or Full-HD, H.264 compressed MP4 of a reasonable size from a recording of a graphics application. But first, we need some ground rules.

Getting it right

The first commandment, from which all others are derived, is that you shall absolutely and unequivocally avoid re-encoding your video multiple times. With each re-encoding, the quality of your video will degenerate into the image above. This is especially true if you know there will be yet another re-encoding step ahead where the settings are beyond your reach, for instance when uploading to Youtube: in this case a mostly unprocessed video is your best option.

These days the world has settled on a few standard aspect ratios: 16:9 for movies, 16:10 for displays. If you are recording 4:3 videos you're doing it wrong! Because we're creating a movie for the web, you may end up uploading it to streaming platforms such as Youtube and Vimeo, which only deal with 16:9, so let's just stick with that.

On the resolution front the choices have been reduced to 720p (1280x720 pixels) and 1080p (1920x1080 pixels). We'll most likely see new standards emerge for the upcoming 4K madness. In order to get a nice video file at the end, make sure that whatever you intend to record comes in one of those two resolutions.

Not going for a standard aspect ratio and a standard resolution will force editing tools to rescale your video, causing sampling issues which will lead to blurry image quality. Sometimes you can't avoid having a resolution that isn't 720p or 1080p (capturing webcam videos for instance), but if you control the output just stick with the default. Videos which do not conform to either aspect ratio or resolution should be cropped and scaled manually with an external tool so that you control the quality and output instead of getting lucky with whatever your editing tool does to those videos.

Finally, all your videos should run with the same frame rate! If they don't you either have to re-time everything to the lowest common denominator (which doesn't require anything special but may not be great), or magically re-construct the missing frames with some optical-flow tool like Apple Cinema Tools or Twixtor (a lot of manual work ahead).

Dumping data

A popular way to obtain videos of a D3D or GL application is via hooks like Fraps, NVIDIA ShadowPlay or DXTory. On Mac OS X, you can use Quicktime (File -> New Screen Recording) to record portions of the screen. If you haven't done so already, check out a demo of one of them and try to record from a running game. An important decision has to be made upfront about resource allocation: record uncompressed video for maximum quality and speed at very high data rates which can easily fill even the largest of hard disks, or sacrifice potentially vital CPU time to compress live during recording. There is no easy answer; it depends on your preferences and hardware. Personally, I prefer to record uncompressed and later downsize the videos into smaller, compressed segments for archiving. This is no option for recording World of Warcraft boss fights of 10 minutes and more, though.
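To put the disk-space warning into numbers: the raw data rate is just width × height × bytes per pixel × frame rate. A quick sketch (the resolutions are the standard ones from above, the helper name is mine):

```cpp
#include <cassert>
#include <cstdint>

// Uncompressed data rate of a raw RGBA video stream in bytes per second.
std::uint64_t raw_data_rate(std::uint64_t width, std::uint64_t height,
                            std::uint64_t bytes_per_pixel, std::uint64_t fps)
{
    return width * height * bytes_per_pixel * fps;
}
```

At 1080p, RGBA and 60 fps this comes out to 497,664,000 bytes per second, roughly 475 MiB/s or about 1.8 TB per hour of footage, which is why live compression can be tempting despite the CPU cost.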

I haven't played around with Fraps too much and recently found a way to completely avoid it by piping raw RGBA frames from my D3D app directly into FFmpeg. You may use the following class to do the same.

#include <cstdio>
#include <ctime>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

#include <D3D11.h>

#define tstring       std::wstring
#define tstringstream std::wstringstream
#define tcerr         std::wcerr
#define tclog         std::wclog

template<typename T>
void safe_release(T& obj)
{
    if (obj)
    {
        obj->Release();
        obj = nullptr;
    }
}

class video_recorder
{
    UINT                    width_;
    UINT                    height_;
    UINT                    fps_;
    ID3D11Texture2D*        ffmpeg_texture_;
    FILE*                   ffmpeg_;
    tstring                 path_;

public:
    video_recorder() :
        width_(0), height_(0), fps_(0),
        ffmpeg_texture_(nullptr), ffmpeg_(nullptr)
    {}

    void create(ID3D11Device* device, UINT width, UINT height, UINT fps, const tstring& path = L"../../data")
    {
        width_  = width;
        height_ = height;
        fps_    = fps;
        path_   = path;

        // Staging texture the back buffer is copied into for CPU readback.
        D3D11_TEXTURE2D_DESC desc;
        desc.Width = width;
        desc.Height = height;
        desc.MipLevels = 1;
        desc.ArraySize = 1;
        desc.SampleDesc.Count = 1;
        desc.SampleDesc.Quality = 0;
        desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
        desc.Usage = D3D11_USAGE_STAGING;
        desc.BindFlags = 0;
        desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
        desc.MiscFlags = 0;

        if (device->CreateTexture2D(&desc, nullptr, &ffmpeg_texture_) != S_OK)
            tcerr << L"Failed to create staging texture for recording" << std::endl;
    }

    void start_recording(bool compressed = false)
    {
        if (!ffmpeg_texture_)
            return;

        std::time_t t = std::time(nullptr);

        std::tm tm;
        localtime_s(&tm, &t);

        tstringstream file;
        file << path_ << L"/record-" << std::put_time(&tm, L"%Y%m%d-%H%M%S") << L".mp4";

        // adapted from Miles Macklin's Real-Time Video Capture with FFmpeg
        tstringstream cmd;
        cmd << L"ffmpeg -r " << fps_ << L" -f rawvideo -pix_fmt rgba "
            << L"-s " << width_ << L"x" << height_ << L" "
            << L"-i - "
            << L"-threads 2 -y "
            << L"-c:v libx264 "
            << (compressed ? L"-preset fast " : L"-preset ultrafast -qp 0 ")
            << make_absolute_path(file.str()); // helper from the surrounding codebase

        tclog << L"Recording video with: " << cmd.str() << std::endl;

#ifdef UNICODE
        ffmpeg_ = _wpopen(cmd.str().c_str(), L"wb");
#else
        ffmpeg_ = _popen(cmd.str().c_str(), "wb");
#endif

        if (!ffmpeg_)
            tcerr << L"Failed to initialize ffmpeg" << std::endl;
    }

    void stop_recording()
    {
        if (ffmpeg_)
            _pclose(ffmpeg_);

        ffmpeg_ = nullptr;
    }

    void add_frame(ID3D11DeviceContext* context, ID3D11RenderTargetView* rtv)
    {
        ID3D11Resource* resource;
        rtv->GetResource(&resource);
        add_frame(context, resource);
        resource->Release();
    }

    void add_frame(ID3D11DeviceContext* context, ID3D11Resource* resource)
    {
        context->CopyResource(ffmpeg_texture_, resource);

        UINT subresource = D3D11CalcSubresource(0, 0, 0);

        D3D11_MAPPED_SUBRESOURCE msr;
        context->Map(ffmpeg_texture_, subresource, D3D11_MAP_READ, 0, &msr);

        // Assumes msr.RowPitch == width_ * 4; otherwise copy row by row.
        if (msr.pData && ffmpeg_)
            fwrite(msr.pData, width_ * height_ * 4, 1, ffmpeg_);

        context->Unmap(ffmpeg_texture_, subresource);
    }

    void destroy()
    {
        stop_recording();
        safe_release(ffmpeg_texture_);
    }
};

Essentially the video_recorder class creates an open pipe to ffmpeg using the libx264 codec in lossless mode (-qp 0) by default. add_frame() copies the resource behind a RenderTargetView to a staging texture, which is then read back on the CPU and fed to the pipe.

The crucial line is where the cmd stringstream is compiled: here you may want to toggle compressed to have direct compression enabled and choose custom settings for -qp and -crf to control the compression rate. Either way, be warned that the resulting video is NOT compatible with MP4 support in browsers, because most of them only play content in YUV 4:2:0 color space and these settings output RGBA-sourced content. I don't have access to D3D11.1 where the staging texture can have DXGI_FORMAT_NV12, but if you do you may want to try this.

If you need maximum performance though, save the video uncompressed and compress it afterwards.

$ ffmpeg -i record-xxxx-xxxx.mp4 -c:v libx264 \
         -pix_fmt yuv420p -crf 23 out.mp4

This will give you a nice, compressed MP4 which modern browsers can directly display without plugins. You can upload it to your webpage and embed it with an HTML5 video tag, or put it in a Dropbox folder so people can watch it directly. If you plan to cut and edit the video though, skip this step.

Embedding videos into a webpage is easily done with the following code.

<video style="width:100%" controls autoplay>
    <source src="http://foo/bar.mp4" type="video/mp4">
    Your browser does not support the video tag.
</video>
Below you can see a sample video I grabbed from my renderer and uploaded to Dropbox. The video should run on any modern browser without any plugins.


When it comes to video editing my weapon of choice is Final Cut 7, which I'll use to illustrate an important point: most if not all bigger editing tools use a special intermediate codec for cutting a video. Apple tools such as FCP and iMovie use the ProRes codec, which comes in several flavors (Proxy, LT, HQ). If you value your time, don't want your editing tool to arbitrarily recompress your snippets, and want to maintain full control, consider re-encoding your raw FFmpeg dump manually with such a codec. In the case of FCP, this will also allow you to use blending operations and other effects without FCP's further interference.

At this point I cannot overstate the magnificent gloriosity of MPEG Streamclip: it's an easy-to-use free frontend for the Quicktime compressor on Windows and Mac OS X, and it has a GUI for batch processing. I usually dump all my videos into the batch list and encode them with Apple ProRes 422 (HQ) at 100% quality. You can also script this behavior with FFmpeg if you're good with the command line. If you have a funny resolution, MPEG Streamclip also has options to crop and scale the video. A very similar awesome tool is Handbrake, which comes with neat presets for common end devices such as smartphones and tablets. Yet another tool for simple cutting, scaling and cropping operations is VirtualDub, which however doesn't handle the encoding side as nicely.

Converting to the final format

When you are done with the clip, the video is exported into its final format. Here you should take care to use a standard container (no, not WMV/ASF) with a standard video and audio codec (no, not Fraps or Cinepak or DivX or ...) so that the video will work in Safari, Chrome, Firefox and Opera directly and play with the default video player of the operating system, even on mobile devices. You may use the FFmpeg command above (and add audio settings if necessary), or use MPEG Streamclip like this:

  • Open your uncompressed, edited video
  • Go to File -> Export to MPEG-4
  • Choose H.264 compression
  • Select 100% quality
  • Limit the data rate to 3000-8000 kbps
  • Disable sound if there is none in the video
  • Export the video

And that's it, you're done. Prepare some template Photoshop intro and outro images (containing the paper title, author names and conference logos) and creating new videos from recorded material becomes a matter of minutes!

The checklist

  • All videos have the same resolution and framerate
  • All videos are either 720p or 1080p
  • You have re-encoded them with the intermediate codec of your editing tool
  • You have created a project in said editing tool with the same resolution and framerate of your videos
  • The final output is H.264

Another helpful resource is the Youtube advanced encoding page, which has two tables for recommended bitrates for standard and high quality movies.






References

NVIDIA, ShadowPlay

Miles Macklin, Real-Time Video Capture with FFmpeg

Google, Youtube Advanced Encoding

Notes on importance sampling

Some tutorials on importance sampling specular terms that are out in the wild have what I found to be an information gap: the step from the PDF to the actual sampling function is missing. Hopefully this step-by-step guide can help out one or two other confused readers. I checked all integrals with Maxima and Wolfram Alpha. If you only have a free account at Wolfram Alpha, you might run into computation-time limits, which is why I will also write down the commands to do everything in Maxima. If your result doesn't look like the one noted here, try simplifying it first (in the Maxima menu Simplify -> Simplify Expression) or put it into Wolfram Alpha and check the Alternate form section!


The PDF for a normalized Phong BRDF with the specular power notation is:

$$ p(\theta, \phi) = \frac{\left(n+1\right)}{2 \pi} \cos^n \theta \sin \theta $$

We generate two functions to sample theta and phi independently. To do this, we first integrate \(p(\theta, \phi)\) along the domain of \( \phi \):

integrate((n+1)/(2*%pi) * cos(t)^n * sin(t), p, 0, 2*%pi)

$$ p(\theta) = \int^{2 \pi}_0 p(\theta, \phi) d\phi = \left(n+1\right) \cos^n \theta \sin \theta $$

This is the marginal density function of \( \theta \). Note that the extra \( \sin \theta \) at the end is there because we are dealing with differential solid angles in spherical coordinates. We can retrieve the PDF for \( \phi \) via the conditional probability \( p(\phi | \theta) \), the conditional density function:

$$ p(\phi | \theta) = \frac{p(\theta, \phi)}{p(\theta)} = \frac{1}{2 \pi} $$

For isotropic NDFs this is always the case, no matter which PDF you will integrate over the domain of \( \phi \), so all you need to calculate for the next PDF is the function for \( \theta \).

Now that we have two functions for each variable, we can integrate both to generate a CDF each. The case of \( \phi \) is trivial:

integrate(1/(2*%pi), p, 0, s)

$$ P(s_\phi) = \int_0^{s_\phi} p(\phi)d\phi = \int_0^{s_\phi} \frac{1}{2 \pi} d\phi = \frac{s_\phi}{2 \pi} $$

If we set \( P(s_\phi) \) to a random variable \( \xi_\phi \) and solve for \( s_\phi \), we get:

$$ P(s_\phi) = \frac{s_\phi}{2 \pi} = \xi_\phi $$

$$ s_\phi = 2 \pi \xi_\phi $$

Again, sampling around the azimuthal angle of an isotropic NDF is always the same because the material is rotationally invariant along this angle, so you don't need to derive this formula again.

We now repeat the process and create a CDF for \( p(\theta) \):

integrate((n+1) * cos(t)^n * sin(t), t, 0, s)

$$ P(s_\theta) = \int_0^{s_\theta} p(\theta) d\theta = 1 - \cos^{n+1} s_\theta $$

Just like in the case of \( \phi \), we set \( P(s_\theta) \) to a random variable again and solve for \( s \):

solve(1 - cos(s)^(n+1) = x, s)

$$ P(s_\theta) = 1 - \cos^{n+1} s_\theta = \xi_\theta $$

$$ s_\theta = \cos^{-1}\left( \left( 1 - \xi_\theta \right)^\frac{1}{n+1} \right) $$

Therefore, a GLSL shader which generates important directions for a Phong NDF from random, uniform values looks like this:

vec2 importance_sample_phong(vec2 xi, float n)
{
    float phi = 2.0 * PI * xi.x;
    float theta = acos(pow(1.0 - xi.y, 1.0 / (n + 1.0)));
    return vec2(phi, theta);
}

Note that since \( \xi_\theta \in [0,1) \) the expression \( 1 - \xi_\theta \) is also a random variable in the same range.

You can for instance use a low-discrepancy sequence as a source for uniformly distributed random values xi and generate important samples which aren't clumped.
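To convince yourself that the derived mapping really follows the intended distribution, you can push stratified \( \xi \) values through it and compare the empirical CDF against \( 1 - \cos^{n+1} s \). A small CPU-side sketch of this check (the exponent and threshold below are arbitrary test values, the function names are mine):

```cpp
#include <cassert>
#include <cmath>

// CPU mirror of importance_sample_phong: maps a uniform xi in [0,1)
// to a polar angle theta distributed according to the Phong PDF.
double phong_sample_theta(double xi, double n)
{
    return std::acos(std::pow(1.0 - xi, 1.0 / (n + 1.0)));
}

// Fraction of stratified samples with theta <= s; should approach the
// analytic CDF P(s) = 1 - cos^(n+1)(s) as the sample count grows.
double sampled_cdf(double s, double n, int count)
{
    int below = 0;
    for (int i = 0; i < count; ++i) {
        double xi = (i + 0.5) / count;       // stratified midpoints
        if (phong_sample_theta(xi, n) <= s)
            ++below;
    }
    return static_cast<double>(below) / count;
}
```

Since \( \theta(\xi) \) is monotonic, the fraction of samples below \( s \) converges to the analytic CDF, which is exactly the property importance sampling relies on.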

Trowbridge-Reitz aka GGX

The GGX normal distribution function is part of a microfacet model, which gives you the probability of micro-surface normals oriented along a certain direction. The term is normalized such that integrating it over the hemisphere, weighted by the projected micro-surface area, yields one:

$$ \int_\Omega D(\mathbf{m})(\mathbf{n} \cdot \mathbf{m}) d\mathbf{m} = 1 $$

The NDF itself is defined as:

$$ D(\mathbf{m}) = \frac{\alpha^2}{\pi \left( \left( \mathbf{n} \cdot \mathbf{m} \right)^2 \left(\alpha^2 - 1 \right) +1 \right)^2} $$
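This normalization is easy to verify numerically: integrate \( D(\mathbf{m})(\mathbf{n} \cdot \mathbf{m}) \) over the hemisphere in spherical coordinates and check that the result is one. A quick sketch (the roughness values are arbitrary, the function names are mine):

```cpp
#include <cassert>
#include <cmath>

const double PI = 3.14159265358979323846;

// GGX / Trowbridge-Reitz NDF, expressed via the cosine of the angle
// between the micro-surface normal m and the macro-surface normal n.
double ggx_d(double cos_theta, double alpha)
{
    double a2 = alpha * alpha;
    double t  = cos_theta * cos_theta * (a2 - 1.0) + 1.0;
    return a2 / (PI * t * t);
}

// Midpoint-rule integration of D(m) (n.m) over the hemisphere.
double ggx_normalization(double alpha, int steps)
{
    double sum = 0.0;
    double dt  = (PI / 2.0) / steps;
    for (int i = 0; i < steps; ++i) {
        double theta = (i + 0.5) * dt;
        // 2*pi comes from the trivial phi integration; sin(theta) is the
        // spherical-coordinates Jacobian, cos(theta) the projection factor.
        sum += 2.0 * PI * ggx_d(std::cos(theta), alpha)
             * std::cos(theta) * std::sin(theta) * dt;
    }
    return sum;
}
```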

Just like above, we start out with the PDF for GGX:

$$ p(\theta, \phi) = \frac{\alpha^2}{\pi \left( \cos^2 \theta \left(\alpha^2 - 1 \right) +1 \right)^2} \cos\theta \sin\theta $$

As in the case of Phong, we create two functions for \( \theta \) and \( \phi \). First let's create \( p(\theta) \):

integrate((a^2*cos(t)*sin(t))/(%pi*((a^2-1)*cos(t)^2+1)^2), p, 0, 2*%pi)

$$ p(\theta) = \int^{2 \pi}_0 p(\theta, \phi) d\phi = \frac{2 \alpha^2}{\left( \cos^2 \theta \left( \alpha^2 -1 \right) + 1 \right)^2} \cos \theta \sin \theta$$

The integration for \( \phi \) is the same as above, so we skip it and instead now create the CDF for \( p(\theta) \):

integrate((2*a^2*cos(t)*sin(t))/((a^2-1)*cos(t)^2+1)^2, t, 0, s)

$$ P(s_\theta) = \int_0^{s_\theta} p(\theta) d\theta = $$

$$ 2 \alpha^2 \left( \frac{1}{ \left( 2 \alpha^4 - 4 \alpha^2 + 2 \right) \cos^2 s_\theta + 2 \alpha^2 - 2} - \frac{1}{2 \alpha^4 - 2 \alpha^2 } \right) $$

Setting the CDF to a random variable and solving for \( s \) yields:

solve(2*a^2*(1/((2*a^4-4*a^2+2)*cos(s)^2+2*a^2-2) - 1/(2*a^4-2*a^2)) = x, s)

$$ P(s_\theta) = \xi_\theta $$

$$ s_\theta = \cos^{-1} \left( \sqrt{ \frac{1 - \xi_\theta}{\left( \alpha^2 -1 \right) \xi_\theta + 1}} \right) $$

A simple GLSL function to generate important directions looks like this:

vec2 importance_sample_ggx(vec2 xi, float a)
{
    float phi = 2.0 * PI * xi.x;
    float theta = acos(sqrt((1.0 - xi.y) /
                            ((a*a - 1.0) * xi.y + 1.0)));
    return vec2(phi, theta);
}
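As a sanity check that the inversion is correct, you can feed \( s_\theta(\xi) \) back into the CDF from above and recover \( \xi \). A CPU-side sketch of both functions (the roughness value in the check is arbitrary, the function names are mine):

```cpp
#include <cassert>
#include <cmath>

// CPU mirror of importance_sample_ggx's theta term.
double ggx_sample_theta(double xi, double alpha)
{
    double a2 = alpha * alpha;
    return std::acos(std::sqrt((1.0 - xi) / ((a2 - 1.0) * xi + 1.0)));
}

// The CDF P(s_theta) derived above, written out term by term.
double ggx_cdf(double s, double alpha)
{
    double a2 = alpha * alpha;
    double c2 = std::cos(s) * std::cos(s);
    return 2.0 * a2 *
        (1.0 / ((2.0*a2*a2 - 4.0*a2 + 2.0) * c2 + 2.0*a2 - 2.0)
       - 1.0 / (2.0*a2*a2 - 2.0*a2));
}
```

Round-tripping a few \( \xi \) values through sample and CDF should reproduce them up to floating-point error.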


The ultimate goal of mathematics is to eliminate any need for intelligent thought. — Alfred North Whitehead

Other NDFs can be used to create important samples with the exact same procedure: given a hemispherical PDF (i.e. the specular part of the BRDF), create two independent PDFs for \( \theta \) and \( \phi \), integrate both from \( 0 \) to \( s \) (i.e. create a CDF for each), set the result to \( \xi \) and solve for \( s \).
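One practical detail both shader snippets leave out: the sampled \( (\phi, s_\theta) \) pair still has to be turned into an actual direction vector, typically in a local frame where the surface normal is the z-axis. A minimal helper might look like this (the struct and function names are mine):

```cpp
#include <cassert>
#include <cmath>

struct vec3 { double x, y, z; };

// Convert a sampled (phi, theta) pair into a unit direction in the
// tangent frame where the surface normal is the +z axis.
vec3 spherical_to_direction(double phi, double theta)
{
    double st = std::sin(theta);
    return { st * std::cos(phi), st * std::sin(phi), std::cos(theta) };
}
```

The resulting vector then has to be rotated into world space with the tangent frame of the shading point before it can be used to sample an environment map or trace a ray.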


Walter et al, Microfacet Models for Refraction through Rough Surfaces

Wikipedia, Constructions of low-discrepancy sequences

Physically based AR


A particularly brutal aspect of real-time rendering in AR environments is that your object is directly exposed to the physical reality around it for comparison. Unlike in your typical game engine, masking or hiding physical inaccuracies in the model used to simulate the reflection of light off your virtual object is not going to work very well when all the other objects on screen behave differently. If the augmented object doesn't match up with reality, a knee-jerk reaction is to edit the responsible shader and simply tweak the output with a bunch of random multipliers until everything looks right again. However, before all the magic-number adjustments get out of hand, it might be time to review the basics and avoid a non-physically-based-shading-post-traumatic-stress-disorder™.

Image based lit Stanford dragon: diffuse SH + specular environment map sampling.

Many real-time engine writers have concentrated on getting their equations right; the most recent summary of this effort is the Siggraph Physically Based Shading Course (2012 Edition, 2013 Edition). I will roughly lay out the basic principles of PBS and refer to other very good online blogs which have covered most of the details that need not be repeated ad absurdum.

Consider the standard form of the rendering equation:

$$ L\left(x, \mathbf{\omega_o}\right) = \int_\Omega f_r \left(x, \mathbf{\omega_i}, \mathbf{\omega_o}\right) L\left(x, \mathbf{\omega_i}\right) \cos \theta_i d\mathbf{\omega_i} $$

When shading your object, it is important that light reflection off the surface behaves in a physically plausible way, that is, the BRDF \( f_r \) has to respect certain conditions:

  • Positivity: the value of the BRDF is always positive.

$$ f_r \left(x, \mathbf{\omega_i}, \mathbf{\omega_o} \right) \geq 0 $$

  • Energy conservation: the total amount of energy reflected over all directions of the surface must be less than or equal to the total amount of energy incident to it. In practical terms this means that the visible energy (i.e. reflected light) can at best decrease after bouncing off a surface, while the rest turns to heat or some other form which isn't part of the simulation. A non-emissive surface cannot emit more light than it received.

$$ M = \int_\Omega L \left(x, \mathbf{\omega_o}\right) \cos \theta_o d\mathbf{\omega_o} \le \int_\Omega L \left(x, \mathbf{\omega_i}\right) \cos \theta_i d\mathbf{\omega_i} = E$$

$$ \forall \mathbf{\omega_o}, \int_\Omega f_r(x, \mathbf{\omega_o}, \mathbf{\omega_i}) cos \theta_i d\mathbf{\omega_i} \leq 1 $$

  • Helmholtz reciprocity: the standard assumption in geometric optics is that exchanging the incoming and outgoing light directions \( \mathbf{\omega_i} \) and \( \mathbf{\omega_o} \) in the BRDF doesn't change the outcome.

$$ f_r \left(x, \mathbf{\omega_i}, \mathbf{\omega_o} \right) = f_r \left(x, \mathbf{\omega_o}, \mathbf{\omega_i} \right) $$

  • Superposition: the BRDF is a linear function. Contribution of different light sources may be added up independently. There is a debate about whether this is a property of the BRDF or light.

It is usually the energy conservation of the specular term where things go wrong first. Without normalization, Blinn-Phong for instance, a popular and easy way to model specular reflection, can easily be too bright at small specular powers and lose too much energy as the power increases.
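The effect is easy to demonstrate numerically: integrating a lobe \( k \cos^n \theta \) times the projected-area factor over the hemisphere yields \( 2 \pi k / (n+2) \), so with a fixed \( k \) the reflected energy drifts as \( n \) changes, while the normalization factor \( (n+2)/(2\pi) \) pins it to one. A sketch of that integral (exponents are arbitrary test values, the function name is mine):

```cpp
#include <cassert>
#include <cmath>

const double PI = 3.14159265358979323846;

// Integrate k * cos^n(theta) * cos(theta) over the hemisphere.
// Analytically this is 2*pi*k / (n + 2), so k = (n + 2) / (2*pi)
// yields exactly 1 for any n -- the Blinn-Phong normalization argument.
double lobe_energy(double k, double n, int steps)
{
    double sum = 0.0;
    double dt  = (PI / 2.0) / steps;
    for (int i = 0; i < steps; ++i) {
        double t = (i + 0.5) * dt;
        sum += 2.0 * PI * k * std::pow(std::cos(t), n)
             * std::cos(t) * std::sin(t) * dt;
    }
    return sum;
}
```

With an unnormalized lobe (\( k = 1 \)) and a small exponent the integral exceeds one, i.e. the term reflects more energy than it receives; with large exponents it shrinks toward zero.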

But there is more! Many renderers assume that there are materials which are perfectly diffuse, i.e. they scatter incident light in all directions equally. There is no such thing. John Hable demonstrated this by showing materials which would be considered to be diffuse reflectors. You can read more in his article Everything has Fresnel.

So here we are, with a BRDF that can output too much energy, darkens too quickly and can't simulate the shininess of real world objects because the model is built with unrealistic parameters. How do we proceed?

One solution is to find a normalization factor for Blinn-Phong to fix the energy issues with the model and add a Fresnel term. There are also several other reflection models to choose from: Oren-Nayar, Cook-Torrance, Ashikhmin-Shirley, Ward...

Microfacet Models

Physically based BRDF models are built on the theory of microfacets, which states that the surface of an object is composed of many tiny flat mirrors, each with its own orientation. The idea of a microfacet model is to capture the appearance of a macro-surface not by integration over its micro-facets, but by statistical means.

A microfacet model looks like this (with the notation for direct visibility):

$$ f_\mu \left( \mathbf{l}, \mathbf{v} \right) = \frac{k_d}{\pi} + \frac{ F \left( \mathbf{l}, \mathbf{h} \right) G \left( \mathbf{l}, \mathbf{v}, \mathbf{h} \right) D \left( \mathbf{h} \right) } { 4 \left( \mathbf{n} \cdot \mathbf{l} \right) \left( \mathbf{n} \cdot \mathbf{v} \right) } $$


$$ \mathbf{h} = \frac{\mathbf{l} + \mathbf{v}}{\left\| \mathbf{l} + \mathbf{v} \right\|} $$

This specular part of the BRDF has three important components:

  • the Fresnel factor \( F \)
  • a geometric term \( G \)
  • a Normal Distribution Function (NDF) \( D \)

The Fresnel term \( F \) simulates the Fresnel behavior and can be implemented with the Schlick approximation.
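The Schlick approximation is simple enough to state in two lines; here a C++ sketch for clarity (a shader would use the GLSL equivalent, and the typical dielectric \( F_0 \approx 0.04 \) below is just an illustrative value):

```cpp
#include <cassert>
#include <cmath>

// Schlick's approximation of the Fresnel factor. f0 is the reflectance
// at normal incidence, cos_theta the cosine between l and h; the factor
// rises to 1 at grazing angles.
double fresnel_schlick(double f0, double cos_theta)
{
    return f0 + (1.0 - f0) * std::pow(1.0 - cos_theta, 5.0);
}
```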

The geometric term \( G \) models the self-occlusion behavior of the microfacets on the surface and can be thought of as a visibility factor for a micro-landscape which simply depends on one parameter for the surface roughness.

The NDF \( D \) is the term that gives you the distribution of microfacet normals across the surface from a certain point of view. If more microfacets are oriented in the half-vector direction \( \mathbf{h} \), the specular highlight will be brighter. \( D \) is the density of normals oriented in direction \( \mathbf{h} \).


To sum up, the idea here is to start out with the correct shading model to avoid inconsistencies that might turn up later (I'm speaking from negative experience tweaking arbitrary parameters of my old AR renderer). Turning to such a model might not produce visible differences immediately, but it will become noticeable once GI is added to the system, where the light bounces multiply any error made right at the start.

A neat way to check how you are on the reality-scale is Disney's BRDF Explorer, which comes with GLSL implementations of several microfacet terms and other BRDFs (have a look at the brdf/ subdirectory and open one of the .brdf files).

Disney's BRDF explorer

You can download measured materials from the MERL database, load them in the BRDF Explorer and compare them to see how well analytical BRDFs match up to reality.

More References

Hill et al., Siggraph PBS 2012

Hill et al., Siggraph PBS 2013

Brian Karis, Specular BRDF Reference

John Hable, Everything has Fresnel

Rory Driscoll, Energy Conservation In Games

Rory Driscoll, Physically-Based Shading

Simon Yeung, Microfacet BRDF

Sébastien Lagarde, PI or not to PI in game lighting equation

Wikipedia, Schlick's approximation

Christian Schüler, The Blinn-​Phong Normalization Zoo

D3D Book, Cook-Torrance

Disney, BRDF Explorer


A magic lantern


When adding a virtual object into a real scene, the renderer has to shade the object depending on the incident light it receives. Furthermore, the renderer might also take care of changes to reality caused by the object, such as a drop-shadow. But before we proceed, let's not reinvent the wheel here: great inspiration can be drawn from the knowledge the film industry has gathered on how to augment reality properly, and from taking a step back to learn from history.

Introduction to the macabre

The earliest form of an augmented reality can be found shortly after the introduction of the Lanterna Magica in the mid 17th century.

Giovanni Fontana, Christiaan Huygens and/or Athanasius Kircher invented the first apparatus to project images onto a flat surface with a type of lantern that focuses light through a painted glass image. The glass can be moved through a slit on the side of the projection apparatus, much like in a more modern slide projector.

From F. Marion, The Wonders Of Optics, 1869

It wasn't long until the device became known as the "lantern of fear" when it was used to conjure up the occult. Apparitions of the dead projected onto smoke via rapidly exchanging images convinced an entire generation that some stage performers had actually attained a special connection to the afterlife. Combined with the bizarre fascination for spiritualism at that time, these shows caused such commotion that eventually the authorities stepped in to end it all.

What remained however was a new kind of artist: a cross-breed between magician and scientist.

Moving images

The French illusionist and filmmaker Georges Méliès accidentally discovered what became known as the stop trick, whereby a movie shot is halted and the filmed scene is substituted with something different. A great example is his short movie The Hilarious Poster, filmed in 1906. A regular poster on a street takes on a life of its own, with horseplay directed at passers-by.

The Hilarious Poster by Georges Méliès

In the modern film industry, early effects such as the lightsaber glow seen in the first Star Wars movies were inserted via rotoscoping, a technique where the movie is edited and enhanced frame by frame with a new painted overlay. Initially, the laser sword was a rod with Scotchlite attached to it (the material used for traffic sign reflectors), but it turned out that it didn't reflect enough light for the movie, so the effect was added later.

Rotoscoping however is tedious, and surely there are better, more automated ways to add effects to a movie after it has been shot. A curious case is Williams and Chou's "Interface", shown at SIGGRAPH 1985, featuring a shiny robot kissed by a woman. The texture-mapped robot was rendered and added afterwards, while the reflection of this romantic scene on its polished surfaces was filmed using a mirroring ball. Lance Williams had published a paper two years earlier which introduced mip-mapping to the world of computer graphics. In an almost Fermat-like manner, the paper contains a paragraph which reads:

If we represent the illumination of a scene as a two-dimensional map, highlights can be effectively antialiased in much the same way as textures. Blinn and Newell [1] demonstrated specular reflection using an illumination map. The map was an image of the environment (a spherical projection of the scene, indexed by the X and Y components of the surface normals) which could be used to cast reflections onto specular surfaces. The impression of mirrored facets and chrome objects which can be achieved with this method is striking; Figure (16) provides an illustration. Reflectance mapping is not, however, accurate for local reflections. To achieve similar results with three dimensional accuracy requires ray-tracing.

Figure 16, courtesy of Lance Williams

Using Jim Blinn's modified environment mapping technique with real images instead of computer-generated ones did the trick. Course notes from Miller and Hoffman at SIGGRAPH 1984 added the idea of pre-convolving these images to simulate reflection off other glossy or diffuse materials. Eventually, the concept became reality when the technique was used in The Flight of the Navigator in 1986, followed by The Abyss in 1989 and Terminator 2 in 1991.

Trailer: The Flight of the Navigator
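To make the two ideas above concrete, here is a minimal numpy sketch of environment mapping and cosine-lobe pre-convolution. It is purely illustrative and not taken from any of the cited papers; all function names, the latitude-longitude parametrization, and the map resolutions are my own assumptions.

```python
import numpy as np

def reflect(d, n):
    """Mirror an incident direction d about the unit surface normal n."""
    return d - 2.0 * np.dot(d, n) * n

def spherical_uv(r):
    """Map a unit direction to [0,1]^2 via a latitude-longitude projection."""
    u = 0.5 + np.arctan2(r[0], -r[2]) / (2.0 * np.pi)
    v = np.arccos(np.clip(r[1], -1.0, 1.0)) / np.pi
    return u, v

def sample_env(env, r):
    """Nearest-neighbour lookup of direction r in a lat-long map env[h, w, 3]."""
    h, w = env.shape[:2]
    u, v = spherical_uv(r)
    return env[min(int(v * h), h - 1), min(int(u * w), w - 1)]

def texel_dirs(h, w):
    """Unit direction at the centre of every texel of a lat-long map."""
    theta = (np.arange(h) + 0.5) / h * np.pi            # polar angle from +Y
    phi = ((np.arange(w) + 0.5) / w - 0.5) * 2.0 * np.pi
    st = np.sin(theta)[:, None]
    x = st * np.sin(phi)[None, :]
    y = np.cos(theta)[:, None] * np.ones(w)
    z = -st * np.cos(phi)[None, :]
    return np.stack([x, y, z], axis=-1)                 # (h, w, 3)

def diffuse_convolve(env, out_h=4, out_w=8):
    """Pre-convolve env with a cosine lobe into a small irradiance map."""
    h, w = env.shape[:2]
    L = texel_dirs(h, w)
    # per-texel solid angle; rows shrink towards the poles by sin(theta)
    sin_t = np.sin((np.arange(h) + 0.5) / h * np.pi)
    dw = np.broadcast_to(((np.pi / h) * (2.0 * np.pi / w) * sin_t)[:, None], (h, w))
    N = texel_dirs(out_h, out_w)
    out = np.zeros((out_h, out_w, 3))
    for i in range(out_h):
        for j in range(out_w):
            cos = np.clip(L @ N[i, j], 0.0, None)       # clamped cosine lobe
            out[i, j] = np.einsum('hw,hwc->c', cos * dw, env) / np.pi
    return out
```

For a mirror, you reflect the view direction and sample the map directly; for a diffuse surface, you sample the pre-convolved irradiance map with the surface normal instead.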

What makes this technique so elegant is that it circumvents the need to track individual light sources in your scene. One could also extract single point light sources from a hemispherical image, but small highlights reflected off shiny objects can easily mislead such an algorithm and cause flickering at low sampling rates. A reflection map instead represents the entire lighting configuration and is therefore physically more plausible than a handful of point lights.
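For comparison, the point-light extraction mentioned above can be sketched as a luminance-weighted mean over environment samples. This is a toy illustration under my own assumptions (Rec. 709 luminance weights, pre-sampled directions), not a production algorithm; note how a single bright moving highlight would drag the extracted direction around, which is exactly the flickering problem.

```python
import numpy as np

def dominant_light(dirs, radiance):
    """Collapse an environment into one stand-in directional light.

    dirs: (n, 3) unit sample directions; radiance: (n, 3) linear RGB.
    Returns the luminance-weighted mean direction and the total RGB energy.
    """
    lum = radiance @ np.array([0.2126, 0.7152, 0.0722])  # Rec. 709 luminance
    d = (dirs * lum[:, None]).sum(axis=0)
    d /= np.linalg.norm(d)
    return d, radiance.sum(axis=0)
```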

There is, however, no inherent necessity to augment reality with objects that have to look physically plausible. First and foremost, it is important that whatever has been added to the scene is at its proper place at all times (i.e. geometrically registered). One grandiose example is Jessica Rabbit's "Why Don't You Do It Right":

A scene from: Who Framed Roger Rabbit

As you can see, Jessica sometimes vanishes behind other objects or people as if she actually occupied the correct position in the 3D scene. Since Jessica is a cartoon character, her painters took care of this by hand, but a renderer can use invisible impostors to create a virtualized depth buffer of the real scene if none exists.
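The impostor trick boils down to a per-pixel depth test against stand-in geometry that is rendered for depth only, never for colour. A minimal numpy sketch of that compositing step, with all buffer names hypothetical:

```python
import numpy as np

def composite(camera_rgb, phantom_depth, virtual_rgb, virtual_depth, virtual_mask):
    """Depth-test a rendered virtual layer against a virtualized depth buffer.

    phantom_depth comes from rendering invisible impostor geometry of the real
    scene (depth writes only, no colour). Wherever the virtual object lies
    behind an impostor, the camera image wins and the object is occluded.
    """
    visible = virtual_mask & (virtual_depth < phantom_depth)
    out = camera_rgb.copy()
    out[visible] = virtual_rgb[visible]
    return out
```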

Note that the clip features another important property, scene consistency: Jessica's reflection on the polished catwalk can be seen 14 seconds into the clip. The editors knew about the reflective material and drew the reflection to match the camera's viewing angle. The shadow of her hand is also visible on some dude's face at exactly the one-minute mark. Jessica's presence in the scene changes reality (and not just by making the audience go nuts)!

Of course, ever since stop motion was declared extinct by Phil Tippett while producing the animations for Jurassic Park, computer-generated effects in movies have become the norm. But irrespective of the technique used to augment something into a real scene, capturing reality correctly and keeping light interaction consistent is key to creating the proper illusion of a mixed reality. Real and virtual light sources, direct and indirect, influence both the virtual and the real space. The importance of this is nowhere more apparent than when watching a bad, low-budget sci-fi movie.

Because your typical SFX director may go overboard with the special effects in a movie scene, it can happen that the only real thing left is the actor himself. Humans are sensitive to facial features, and it is therefore necessary to edit the appearance of skin reflections to match the scene. You don't want to end up with a green aura from the studio set engulfing the actor. SIGGRAPH 2000 brought a revolution to the film industry with the introduction of the Light Stage, in which the actor's face is captured multiple times under varying illumination. After enough samples have been gathered, one can effectively invert the rendering equation for each pixel on the actor's face, extract the BSDF of the skin, and reproduce it under different illumination.

The Digital Emily Project
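Full BSDF extraction is involved, but the core observation behind the Light Stage is simply that light transport is linear: an image under any new lighting is a weighted sum of the one-light-at-a-time (OLAT) captures. A minimal sketch of that relighting step, with shapes and names assumed by me:

```python
import numpy as np

def relight(olat_images, novel_lighting):
    """Relight a face from one-light-at-a-time captures.

    olat_images: (n_lights, h, w, 3), one image per stage light.
    novel_lighting: (n_lights,) how brightly the new environment shines
    from each stage light's direction.
    Linearity of light transport makes the result a simple weighted sum.
    """
    return np.tensordot(novel_lighting, olat_images, axes=1)
```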

This method works not just on faces, but on materials in general. There is a broad range of reconstruction algorithms for materials, but the basic setup remains a dome that captures samples of a surface under varying illumination conditions. Every time simulated light is supposed to interact with a real surface, that surface's properties have to be known up front.

Light and Magic

So in essence, the evolution of the film industry, from the simple projectors of 17th-century theater to the intricate details produced for The Lord of the Rings movies, already provides plenty of knowledge on the problem of correct augmentation. Over time, artists figured out several methods to capture real light, honor light interaction between real and virtual surfaces, and improve overall consistency by reconstructing the world in ever more elaborate ways.

In the next articles, I will go into detail on how to translate most of these methods into real-time algorithms. We'll start off creating a basic pipeline for an AR renderer, then shade virtual geometry first before moving on to relighting reality.


Fulgence Marion, The Wonders Of Optics

Jim Blinn, Texture and reflection in computer generated images

Wikipedia, Rotoscoping

Lance Williams, Pyramidal parametrics

Disney, The Flight of the Navigator (Trailer)

Paul Debevec, The Story of Reflection Mapping

Georges Méliès, The Hilarious Poster

Wikipedia, Georges Méliès

Robert Zemeckis, Who Framed Roger Rabbit

USC, The Light Stages at UC Berkeley and USC ICT