\documentclass{article}
\usepackage{algpseudocode}
\title{An Implementation of a Simple, Efficient Method for Realistic
This report details the implementation of a method for rendering and
animating clouds by Dobashi et al.~\cite{dobashi2000simple} in
OpenGL. It currently implements the cellular automata, shading and
rendering parts as described in the paper, as well as additional
features for debugging and viewing the underlying process.
This paper is just one of many from Dobashi and his collaborators'
work on the rendering of clouds and other atmospheric
effects~\cite{dobashi1996display,dobashi1998animation}. The process of
generating such clouds begins with cellular automata that simulate cloud
movement and growth. Whilst not physically accurate, the simulation is
remarkably effective for how simple it is.
\subsection{Cellular automata}
The method for the cellular automata extends earlier
work~\cite{nagel1992self} with two extra rules. The clouds
are simulated on a 3D grid, with three binary variables for each cell:
humidity, transition (written as
\textit{act} in the paper, for activation) and clouds. The clouds variable
determines what actually gets rendered in the end. The three rules that
operate on these variables are as follows:
\item[\bf Growth] Causes clouds to appear over time.
\item[\bf Extinction] Causes clouds to disappear over time.
\item[\bf Advection] Causes clouds to move over time (due to wind).
In addition to the three variables, there are also ellipsoids of
probabilities generated for extinction, humidity and transition. These
are generated randomly at the beginning and are used for the
extinction and advection rules. They are static in that they do not
change over time, and they simulate air parcels.
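Although the paper gives these rules as boolean expressions, they may be easier to see as code. Below is a minimal sketch of my reading of the growth and stochastic extinction/re-supply updates for a single cell; the \texttt{Cell} struct, the function names, and the way the neighbour test \texttt{fact} is passed in are my own choices, not the paper's.

```c
#include <stdbool.h>

/* One cell: the three binary variables described above. */
typedef struct { bool hum, act, cld; } Cell;

/* Growth rule: `fact` is the OR of `act` over the neighbour offsets
   used by the paper (computed by the caller from the grid). */
Cell grow(Cell c, bool fact) {
    Cell n;
    n.hum = c.hum && !c.act;         /* humidity is consumed by activation */
    n.cld = c.cld || c.act;          /* activation turns into cloud        */
    n.act = !c.act && c.hum && fact; /* activation spreads into humid air  */
    return n;
}

/* Stochastic rules: pext, phum, pact come from the random ellipsoids;
   rext, rhum, ract are fresh uniform samples in [0, 1). */
Cell stochastic(Cell c, double rext, double rhum, double ract,
                double pext, double phum, double pact) {
    Cell n = c;
    n.cld = c.cld && (rext >= pext); /* extinction with probability pext */
    n.hum = c.hum || (rhum < phum);  /* humidity re-supplied             */
    n.act = c.act || (ract < pact);  /* activation re-seeded             */
    return n;
}
```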
After each time step of the cellular automata, the grid of binary
cloud values is converted to a grid of continuous densities, which
represent a field of metaballs of varying density.
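The paper obtains these densities by smoothing the binary distribution. As a stand-in for the paper's exact filter, a plain box average over each cell's neighbourhood conveys the idea; the grid dimensions and the \texttt{IDX} helper here are my own.

```c
/* Continuous density as the mean of the binary cld field over a
   (2r+1)^3 neighbourhood -- a simple stand-in for the paper's
   smoothing filter (the paper's also involves nearby time steps). */
#define NX 8
#define NY 8
#define NZ 8
#define IDX(i, j, k) ((i) + NX * ((j) + NY * (k)))

float densityAt(const unsigned char *cld, int i, int j, int k, int r) {
    int sum = 0, count = 0;
    for (int dk = -r; dk <= r; dk++)
        for (int dj = -r; dj <= r; dj++)
            for (int di = -r; di <= r; di++) {
                int x = i + di, y = j + dj, z = k + dk;
                if (x < 0 || x >= NX || y < 0 || y >= NY ||
                    z < 0 || z >= NZ)
                    continue; /* cells outside the grid are skipped */
                sum += cld[IDX(x, y, z)];
                count++;
            }
    return count > 0 ? (float)sum / (float)count : 0.0f;
}
```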
Before each metaball is rendered to give the final cloud image, the
amount (and colour) of light reaching it needs to be
calculated. Because other cloud ``particles'' can obscure and scatter
the light, this needs to be calculated individually for each
metaball. The general approach taken by the paper is to start with a
blank white buffer, projected \textit{orthogonally} from the sun's
point of view, and, starting from the metaball closest to the sun,
multiply the attenuation ratios onto the buffer. This slowly blots out
the frame, blocking out the light as more and more metaballs are
added. Each time a metaball's attenuation ratio is multiplied onto the
buffer, the centre pixel of that metaball is read, multiplied by the
colour of the sun, and stored for use later in the rendering step.
The attenuation ratios are stored as textures like the ones shown in
Figure~\ref{fig:textures}. The ratio differs for each density, but
because it is not proportional to the density, a discrete set of
textures (64 in this paper) is generated instead, and the attenuation
ratio texture closest to each metaball's density is selected. These
textures are precalculated just once, before rendering begins.
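Selecting the closest texture then amounts to rounding the density into the table. A small sketch (the function name and the clamping are mine; \texttt{NQ} is the table of 64 textures):

```c
/* Map a metaball density in [0, 1] to the nearest of NQ
   precalculated attenuation-ratio textures. */
#define NQ 64
int textureIndex(float density) {
    int idx = (int)(density * (NQ - 1) + 0.5f); /* round to nearest */
    if (idx < 0) idx = 0;
    if (idx > NQ - 1) idx = NQ - 1;
    return idx;
}
```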
The metaballs are rendered as billboards: flat surfaces that are
always orientated towards the camera. To multiply the attenuation
ratios, the OpenGL \texttt{GL\_MODULATE} texture blend parameter is
\subsection{Rendering}
Once the colour of light reaching each metaball has been calculated,
the metaballs are rendered from the camera's perspective, onto
the buffer that the user will actually see. The buffer is cleared with
the colour of the sky, and then, beginning with the metaball furthest
from the camera, each metaball is again rendered as a billboard using
the attenuation ratio texture. This time, the textures are blended
instead of modulated. This is not to be confused with the
fragment blending that occurs with \texttt{GL\_BLEND}.
\section{Implementation}
\subsection{Cellular automata}
The implementation of the cellular automata simulation
was the most straightforward part of the process. I did not attempt to
implement it with bitwise operations --- the cost of simulating with
8-bit booleans was negligible given the other costs described later
on. The paper states that the humidity and activation fields are set
``randomly'', but then delegates the details to~\cite{nagel1992self},
which I was unable to access. I therefore took some artistic liberties
and chose non-uniform probabilities of 0.01 and 0.005
\subsection{Metaballs}
Once the binary cells are converted to a grid of continuous densities,
these densities need to be expressed as a set of metaballs. There was
some ambiguity here, as the paper just states that ``the continuous
density distribution is expressed by a set of metaballs'', so based on
the implementation in~\cite{dobashi1998animation} I interpreted this as
a fixed grid of metaballs, one at each point, assigning a density to each.
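Under that interpretation, building the metaball set is a direct copy from the density grid; a minimal sketch (the \texttt{Metaball} struct and all names are mine):

```c
/* One metaball per grid cell: a fixed centre plus the cell's density. */
typedef struct { float x, y, z, density; } Metaball;

void gridToMetaballs(const float *density, Metaball *out,
                     int nx, int ny, int nz, float spacing) {
    for (int k = 0; k < nz; k++)
        for (int j = 0; j < ny; j++)
            for (int i = 0; i < nx; i++) {
                int idx = i + nx * (j + ny * k);
                out[idx] = (Metaball){ i * spacing, j * spacing,
                                       k * spacing, density[idx] };
            }
}
```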
One of the biggest challenges I faced when interpreting the paper was
figuring out how the billboard textures are precalculated. The paper
provides a vague diagram stating that both the attenuation ratio and
cumulative density are stored, but doesn't explain how to calculate
the former. I can only assume that it must be possible to calculate
this in a physically accurate way. Again, artistic liberties were
taken in Listing~\ref{lst:textures} to guess at a method for
generating the result in Figure~\ref{fig:textures}.
\includegraphics[width=2in]{attenuationTextures}~
\includegraphics[width=2in]{shadowMap}
\caption{Attenuation ratio textures for varying densities, and the
shadow map produced as a byproduct of shading.}\label{fig:textures}
\begin{lstlisting}[language=C,basicstyle=\footnotesize,float,caption={The
method used for generating attenuation ratio textures},label={lst:textures}]
for (int j = 0; j < 32; j++) {
  for (int i = 0; i < 32; i++) {
    // TODO: properly calculate this
    // instead of whatever this is
    float r = distance(vec2(i, j), vec2(16, 16)) / 16;
    float density = (float)d / NQ;
    float x = 3 * density * (metaballField(r) / normFactor);
    data[i + j * 32] = 1 - fmin(1, x);
The process of displaying a frame is given in the pseudocode of
Algorithm~\ref{alg:display}. The first main procedure that occurs is
shading the clouds, which calculates the colour of the light reaching
\subsubsection{Billboards}
As mentioned earlier, the metaballs are rendered as billboards that
always face the camera. This can be done quite easily, by copying
the transpose of the orientation part of the view matrix into the
orientation part of the model matrix. Because the transpose of a
rotation matrix is its inverse, this cancels the camera's rotation,
so the metaball's orientation matches exactly that of the camera's.
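In code this looks roughly as follows, assuming column-major $4\times4$ matrices as OpenGL uses; the function is my sketch, and the copy is transposed, since the transpose of the view rotation is its inverse:

```c
/* Build a model matrix whose rotation cancels the view rotation, so
   the billboard faces the camera.  Matrices are column-major float
   arrays, m[col * 4 + row], as OpenGL expects. */
void billboard(float model[16], const float view[16],
               float dx, float dy, float dz) {
    for (int col = 0; col < 3; col++)
        for (int row = 0; row < 3; row++)
            model[col * 4 + row] = view[row * 4 + col]; /* transpose */
    model[3] = model[7] = model[11] = 0.0f;  /* bottom row */
    model[12] = dx;                          /* translation */
    model[13] = dy;
    model[14] = dz;
    model[15] = 1.0f;
}
```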
\mathit{view} &= \begin{bmatrix}
v_{00} & v_{01} & v_{02} & v_{03} \\
v_{10} & v_{11} & v_{12} & v_{13} \\
v_{20} & v_{21} & v_{22} & v_{23} \\
v_{30} & v_{31} & v_{32} & v_{33}
\mathit{model} &= \begin{bmatrix}
v_{00} & v_{10} & v_{20} & d_x \\
v_{01} & v_{11} & v_{21} & d_y \\
v_{02} & v_{12} & v_{22} & d_z \\
\subsubsection{Blending}
\texttt{glBlendFunc} controls how the
fragment shader output and the existing buffer fragment are blended
together before writing to the buffer. Whilst shading, \texttt{GL\_ZERO}
is set for the fragment shader and \texttt{GL\_SRC\_ALPHA} is set for
the buffer. This means that the final fragment written is calculated
\mathit{bufferFrag} \gets \mathit{shaderFrag} * 0 + \mathit{bufferFrag} * \mathit{shaderFrag.alpha}
The effect of this is that the original buffer of RGBA(1,1,1,1) is
gradually blotted out, as the original white colour is multiplied away
by the attenuation ratios' alpha.
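This progressive blotting can be modelled per channel on the CPU to see the numbers; \texttt{shadeBlend} is my toy model of this blend state, not GL code:

```c
/* Model of glBlendFunc(GL_ZERO, GL_SRC_ALPHA) for one channel: the
   shader colour is discarded and the buffer is scaled by the incoming
   alpha, so repeated draws multiply the attenuation ratios together. */
float shadeBlend(float bufferFrag, float shaderFrag, float shaderAlpha) {
    return shaderFrag * 0.0f + bufferFrag * shaderAlpha;
}
```

Starting from the white buffer value $1.0$, two metaballs with attenuation alphas $0.5$ and $0.8$ leave $1.0 \times 0.5 \times 0.8 = 0.4$ of the light.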
\subsubsection{Texture modes}
The paper specifies that the texture mode should be set to
\texttt{GL\_MODULATE} with \texttt{glTexEnv}. After trying this,
my program kept crashing. As it turns out, this is part of the old
fixed-function pipeline. My implementation was shader based, so I
could no longer use this texture mode. Thankfully, the formulae used
for the texture modes are listed in the man pages, and the logic could
be reimplemented in the shader, as shown in Listing~\ref{lst:shader}.
\begin{lstlisting}[language=C,basicstyle=\footnotesize,float,caption={Logic
for old fixed-function pipeline texture modes, reimplemented in the
shader},label={lst:shader}]
// Cf = color from fragment, Ct = color from texture
// Cc = color from texture environment
// not set, defaults to (0,0,0,0)
// Af = alpha from fragment, At = alpha from texture
// C = output color, A = output alpha
float f = texture(tex, texCoord).r;
// the +0.02 is a hack to get lighter clouds!
// can be thought of as ambient light
FragColor = color * (f + 0.02);
// C = Cf * (1-Ct) + Cc * Ct
vec3 C = color.rgb * (1 - f);
float A = color.a * f;
FragColor = vec4(C, A);
\caption{Rendering and displaying a frame}\label{alg:display}
\Procedure{shadeClouds}{}
\State Sort metaballs in ascending order of distance from the sun
\State \Call{glUniform}{modulate, true}
\State \Call{glBlendFunc}{GL\_ZERO, GL\_SRC\_ALPHA}
\For{$k\gets \mathit{metaballs}$}
\State Rotate metaball to face the sun
\State \Call{glBindTexture}{textures[$\mathit{k.density}$]}
\State \Call{glDrawElements}{}
\State $c \gets \textrm{pixel at centre of metaball}$
\State $\mathit{k.col} \gets c * \textrm{colour of sun}$
\State Bonus: store the current framebuffer as a free shadow map
\Procedure{renderClouds}{}
\State Sort metaballs in descending order of distance from the camera
\State \Call{glUniform}{modulate, false}
\State \Call{glBlendFunc}{GL\_ONE, GL\_SRC\_ALPHA}
\For{$k \gets \mathit{metaballs}$}
\State Rotate metaball to face the camera
\State \Call{glBindTexture}{textures[$\mathit{k.density}$]}
\State \Call{glUniform}{$\mathit{color}, \mathit{k.col}$}
\State \Call{glDrawElements}{}
\Procedure{display}{}
\State $\mathit{projection} \gets $ \Call{orthographic}{}
\State $\mathit{view} \gets$ \Call{lookAt}{$\mathit{sunPos}, \mathit{sunPos} + \mathit{sunDir}$}
\State Clear buffer with RGBA(1,1,1,1)
\State \Call{shadeClouds}{}
\State $\mathit{projection} \gets $ \Call{perspective}{}
\State $\mathit{view} \gets$ \Call{lookAt}{$\mathit{camPos}, \mathit{camPos} + \mathit{camDir}$}
\State Clear buffer with sky colour
\State \Call{renderClouds}{}
\subsection{Rendering}
\subsubsection{Buffers}
The rendering process is similar to shading, in that we draw each
metaball onto a buffer. Originally the shading process happened in an
off-screen framebuffer so that the default framebuffer was left
undisturbed, but since rendering clears the buffer and draws on top of
it anyway, this was unnecessary and was eventually removed, so that
everything happens in the one buffer. However, care was needed to make
sure that the orthographic projection used in the shading process was
large enough that all the metaballs would fit within the same
dimensions as the frame being drawn to.
\subsubsection{Blending}
The blending for rendering uses the equation:
\mathit{bufferFrag} \gets \mathit{shaderFrag} * 1 + \mathit{bufferFrag} * \mathit{shaderFrag.alpha}
Unlike the shading process, the fragment from the shader is actually
added into the buffer this time. Likewise, the texture is no longer
modulated, so the uniform variable passed to the shader is updated accordingly.
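The resulting back-to-front compositing can again be modelled per channel on the CPU; \texttt{renderBlend} is my toy model of this blend state, not GL code:

```c
/* Model of glBlendFunc(GL_ONE, GL_SRC_ALPHA): the shaded metaball
   colour is added on top of the existing buffer value, which is first
   attenuated by the metaball's alpha. */
float renderBlend(float bufferFrag, float shaderFrag, float shaderAlpha) {
    return shaderFrag * 1.0f + bufferFrag * shaderAlpha;
}
```

For example, drawing a metaball of colour $0.5$ and alpha $0.4$ over a sky value of $1.0$ yields $0.5 + 1.0 \times 0.4 = 0.9$.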
It would have been nearly impossible to implement this correctly if I
weren't able to visualize all the different steps in the
process. Several debug modes were added, which can be accessed by
pressing the numbers 0--4 on the keyboard. Currently they are all
implemented together in the main shader, toggled by boolean uniforms
passed into the shader. This is inefficient: even when not debugging,
the shaders still pay the price for all the branching and checking.
They should be moved out into a separate shader that is switched to
whenever a debug mode is entered.
\textbf{Key} & \textbf{Mode} \\
0 & Shaded \& rendered \\
2 & Metaball shading \\
3 & Transition probabilities \\
4 & Extinction probabilities
Despite this course being \textit{Real-time Rendering}, the rendering
process here was very much \textbf{not} real time. The authors of the
paper report that it took around 10 seconds to render a single frame for
a grid of $256\times128\times20$. On my machine in 2020, it took around 12
seconds. Something was definitely wrong, so I profiled the program and
found that the vast majority of time was being spent inside
\texttt{shadeClouds}: namely, waiting on the \texttt{glReadPixels}
call. \texttt{glReadPixels} forces synchronization of the entire GPU
pipeline, blocking until the GPU has finished drawing and has
transferred the requested buffer data back. This was happening for
every single metaball, killing performance entirely. For perspective,
only about 1 second was spent doing the simulation and rendering.
It isn't clear exactly what the authors used to read the pixel at each
metaball. What was clear was that the pixel data wasn't needed
immediately: it is only needed for the rendering process, not the
shading. So I turned to pixel buffer objects (PBOs) --- buffer objects
in OpenGL that can be used for \emph{asynchronous} pixel
transfers. The general workflow for packing (downloading) the data
from the GPU asynchronously from within a loop looked like this:
\begin{lstlisting}[language=C,basicstyle=\footnotesize,breaklines=true]
glBindBuffer(GL_PIXEL_PACK_BUFFER, pboBufs[pboIdx]);
glReadPixels(screenPos.x, screenPos.y, 64, 64, GL_RGBA,
             GL_UNSIGNED_BYTE, NULL);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pboBufs[pboIdx + 1]);
GLubyte *pixel = (GLubyte *)glMapBufferRange(GL_PIXEL_PACK_BUFFER,
    0, 4 * sizeof(GLubyte), GL_MAP_READ_BIT);
// ... use pixel[0..3] here ...
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
\texttt{glReadPixels} now
queued an asynchronous read to the PBO that was bound and returned
immediately. Meanwhile, we would map the buffer of \textit{another}
PBO that already had \texttt{glReadPixels} called on it, and so had
Because the time it takes to (issue the commands to) draw a metaball
is so short, a large ring of 512 PBOs was set up to minimize the
time spent waiting on downloads to finish. Reading the pixels is
still the bottleneck, but it now takes only 2.8 seconds instead of 16.
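The indexing behind this ring can be made explicit. With a ring of \texttt{N\_PBOS} buffers, the read issued on one iteration is mapped \texttt{N\_PBOS}$- 1$ iterations later, matching the bind-next-and-map pattern in the snippet above (the helper names are mine):

```c
/* Ring of N_PBOS pixel buffer objects.  On iteration k we issue an
   asynchronous glReadPixels into slot writeSlot(k) and map slot
   readSlot(k), whose read was issued N_PBOS - 1 iterations earlier
   and so has had time to complete. */
#define N_PBOS 512
int writeSlot(int k) { return k % N_PBOS; }
int readSlot(int k)  { return (k + 1) % N_PBOS; }
```

The first \texttt{N\_PBOS}$- 1$ maps hold stale data and must be skipped, and the final reads must be drained after the loop.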
\section{Improvements}
Despite the nice speed-up from using PBOs, the constant back-and-forth
between the CPU and GPU from reading and writing is undesirable. In
the future I would like to explore methods of storing the pixel values
somewhere on the GPU and, once all the metaballs are shaded, doing one
single download to the CPU. I have discussed this with
people in the \#OpenGL channel on the Freenode IRC network, and
apparently it is possible to achieve this within a shader.
Using bitfields and bitwise operations for the simulation would also
undoubtedly improve performance, and the last contribution of the
paper, shafts of light, remains to be implemented.
Lastly, a lot of implementation details still remain unclear to
me. Perhaps two sets of textures were meant to be stored:
one for the attenuation ratios, and one for the cumulative densities?
The attenuation ratios would then be used for the shading part,
whilst the cumulative densities would be used for the rendering
part. But again, we would need to figure out how to calculate both in
\includegraphics[width=0.3\textwidth]{render0}~
\includegraphics[width=0.3\textwidth]{render1}~
\includegraphics[width=0.3\textwidth]{render2}
\bibliographystyle{acm}
\bibliography{report}