report/report.tex

   1 \documentclass{article}
   2
   3 \usepackage{listings}
   4 \usepackage{algorithm}
   5 \usepackage{algpseudocode}
   6 \usepackage{graphicx}
   7 \usepackage{amsmath}
   8
   9 \title{An Implementation of a Simple, Efficient Method for Realistic
  10   Animation of Clouds}
  11 \author{Luke Lau}
  12 \begin{document}
  13 \maketitle
  14
  15 \section{Abstract}
  16 This report details the implementation of a method for rendering and
  17 animating clouds by Dobashi et.\ al~\cite{dobashi2000simple} in
  18 OpenGL. It currently implements the cellular automata, shading and
  19 rendering parts as described in the paper, as well as additional
  20 features for debugging and viewing the underlying process.
  21
  22 \section{Background}
  23 This paper is just one of many from Dobashi and Yoshinori's work into
  24 the rendering of cloud and other atmospheric
  25 effects~\cite{dobashi1996display,dobashi1998animation}. The process of
  26 generating such clouds begins with cellular automata to simulate cloud
  27 movement and growth. Whilst not physically accurate, it is remarkably
  28 effective for how simple it is.
  29
  30 \subsection{Cellular automata}
  31 The method for cellular automata is an extension of earlier
  32 work~\cite{nagel1992self}, extended with two extra rules. The clouds
  33 are simulated on a 3D grid, with three binary variables for each cell:
  34 Humidity, transition (written as
  35 \textit{act} in the paper for activation) and clouds. The clouds variable
  36 determines what actually gets rendered in the end. The four rules that
  37 operate on these variables are as follows:
  38 \begin{enumerate}
  39 \item[\bf Growth] Causes clouds to appear over time.
  40 \item[\bf Extinction] Causes clouds to disappear over time.
  41 \item[\bf Advection] Causes clouds to move over time (due to wind).
  42 \end{enumerate}
  43 In addition to the three variables, there are also ellipsoids of
  44 probabilities generated for extinction, humidity and transition. These
  45 are generated randomly at the beginning and are used for the
  46 extinction and advection rules. They are static in that they do not
  47 change over time, and simulate air parcels.
  48
  49 After each time step in the cellular automata, the grid of binary
  50 cloud values is converted to a grid of continuous densities, which
  51 represent a field of metaballs of varying density.
  52
  53 \subsection{Shading}
  54 Before each metaball is rendered to give the final cloud image, the
  55 amount (and colour) of light reaching it needs to be
  56 calculated. Because other cloud ``particles'' can obscure and scatter
  57 the light, this needs to be calculated individually. The general
  58 approach taken by the paper is to start with a blank white buffer,
  59 projected \textit{orthogonally} from the suns point of view, and
  60 starting from the closest metaball, multiply the attenuation ratios
  61 onto the buffer. This slowly blots out the frame, blocking out the
  62 light as more and more metaballs are added. Each time a metaball's
  63 attenuation ratio is rendered into the buffer, the centre pixel of
  64 that metaball is read, multiplied by the colour of the sun, and then
  65 stored for later use in the rendering step. Although we do not take
  66 advantage of it in this paper, as a byproduct of calculating this, we
  67 get a shadow map of the clouds for free!
  68
  69 The attenuation ratios are stored as textures like the ones shown in
  70 Figure~\ref{fig:textures}. These are different for each density, but
  71 because the attenuation ratio isn't proportional to the density a
  72 discrete number of textures (64 for this paper) are generated
  73 instead. Then the attenuation ratio texture closest to each metaball's
  74 density can be selected.
  75 These textures need to be precalculated beforehand, just once.
  76
  77 The metaballs are rendered as billboards: flat surfaces that are
  78 always orientated towards the camera. To multiply the attenuation
  79 ratios, the OpenGL \texttt{GL\_MODULATE} texture blend parameter is
  80 used.
  81
  82 \subsection{Rendering}
  83 Once the colour of light reaching each metaball has been calculated,
  84 the metaballs are then rendered from the camera's perspective, onto
  85 the buffer that the user will actually see. The buffer is cleared with
  86 the colour of the sky, and then beginning with the metaball furthest
  87 from the camera, each metaball is again rendered as a billboard using
  88 the attenuation ratio texture. This time, the textures are blended
  89 instead of modulated. This is not to be confused with the
  90 fragment blending that occurs with \texttt{GL\_BLEND}.
  91
  92 \section{Implementation}
  93 \subsection{Cellular automata}
  94 The implementation of the cellular automata simulation
  95 was the most straightforward part of the process. I did not attempt to
  96 implement it with bitwise operations --- the cost of simulation with
  97 8-bit booleans was negligible given the other costs described later
  98 on.  The paper states that the humidity and activation fields are set
  99 ``randomly'', but then delegates the details to~\cite{nagel1992self},
 100 which I was unable to access. Therefore I made some artistic liberties
 101 and chose a non-uniform probability of 0.01 and 0.005
 102 respectively.
 103
 104 \subsection{Metaballs}
 105 Once the binary cells are converted to a grid of continuous densities,
 106 they need to be mapped to metaballs that can be rendered. There was a
 107 bit of ambiguity here as the paper just states ``the continuous
 108 density distribution is expressed by a set of metaballs'', so given
 109 the implementation in~\cite{dobashi1998animation}, I interpreted this
 110 as a fixed grid of fixed-sized metaballs at each point, with a density
 111 assigned to each one.
 112
 113 One of the biggest challenges I faced when interpreting the paper was
 114 figuring out how the billboard textures were precalculated. The paper
 115 provides a vague diagram stating that both the attenuation ratio and
 116 cumulative density are stored, but doesn't explain how to calculate
 117 the former. I can only assume that it must be possible to calculate
 118 this in a physically accurate way. Again, artistic liberties were
 119 taken here in Listing~\ref{lst:textures} to guess a method for
 120 generating the result in Figure~\ref{fig:textures}. Only the
 121 ``attenuation ratio'' is stored.
 122
 123 \begin{figure}
 124   \centering
 125   \includegraphics[width=2in]{attenuationTextures}~
 126   \includegraphics[width=2in]{shadowMap}
 127   \caption{Attenuation ratio textures for varying densities, and the
 128     shadow map produced as a byproduct of shading.}\label{fig:textures}
 129 \end{figure}
 130
 131
 132 \begin{lstlisting}[language=C,basicstyle=\footnotesize,float,caption=The
 133   method for used for generating attenuation ratio textures,label={lst:textures}]
 134 float data[32 * 32];
 135 for (int j = 0; j < 32; j++) {
 136   for (int i = 0; i < 32; i++) {
 137     // TODO: properly calculate this
 138     // instead of whatever this is
 139     float r = distance(vec2(i, j), vec2(16, 16)) / 16;
 140     float density = (float)d / NQ;
 141     float x = (3 * density * (metaballField(r) / normFactor));
 142     data[i + j * 32] = 1 - fmin(1, x);
 143   }
 144 }
 145 \end{lstlisting}
 146
 147 \subsection{Shading}
 148 The process of displaying a frame looks is given in the pseudocode of
 149 Algorithm~\ref{alg:display}. The first main procedure that occurs is
 150 shading the clouds, which calculates the colour of the light reaching
 151 each metaball.
 152
 153 \subsubsection{Billboards}
 154 As mentioned earlier, the metaballs are rendered as billboards that
 155 always face the camera. This can be done quite easily, by just copying
 156 the orientation part of the view matrix onto the orientation part of
 157 the model matrix. That way, the metaball's orientation matches
 158 exactly that of the camera's.
 159 \begin{align*}
 160   \mathit{view} &= \begin{bmatrix}
 161     v_{00} & v_{01} & v_{02} & v_{03} \\
 162     v_{10} & v_{11} & v_{12} & v_{13} \\
 163     v_{20} & v_{21} & v_{22} & v_{23} \\
 164     v_{30} & v_{31} & v_{32} & v_{33}
 165   \end{bmatrix} &&
 166                    \mathit{model} &= \begin{bmatrix}
 167                      v_{00} & v_{01} & v_{02} & dx \\
 168                      v_{10} & v_{11} & v_{12} & dx \\
 169                      v_{20} & v_{21} & v_{22} & dx \\
 170                      0 & 0 & 0 & 1 \\
 171                    \end{bmatrix}
 172 \end{align*}
 173
 174 \subsubsection{Blending}
 175 \texttt{glBlendFunc} controls how the
 176 fragment shader output and the existing buffer fragment are blended
 177 together before writing to the buffer. Whilst shading, \texttt{GL\_ZERO}
 178 is set for the fragment shader and \texttt{GL\_SRC\_ALPHA} is set for
 179 the buffer. This means that the final fragment written is calculated
 180 as:
 181 \[
 182   \mathit{bufferFrag} \gets \mathit{shaderFrag} * 0 + \mathit{bufferFrag} * \mathit{shaderFrag.alpha}
 183 \]
 184 The effect of this is that the original buffer of RGBA(1,1,1,1) is
 185 gradually blotted out, as the original white colour is multiplied away
 186 by the attenuation ratios' alpha.
 187
 188 \subsubsection{Texture modes}
 189 The paper specifies that the texture mode should be set to
 190 \texttt{GL\_MODULATE} with \texttt{glTexEnv}. After trying this out,
 191 my program kept on crashing. As it turns out, this is part of the old
 192 fixed function pipeline. My implementation was shader based, so I
 193 could no longer use this texture mode. Thankfully, the formulae used
 194 for the texture modes are listed in the Man pages, and the logic could
 195 be reimplemented in the shader, shown in Listing~\ref{lst:shader}.
 196
 197 \begin{lstlisting}[language=C,basicstyle=\footnotesize,float,caption={Logic
 198   for old fixed-function pipeline texture modes, reimplemented in the
 199   shader},label={lst:shader}]
 200 // Cf = color from fragment, Ct = color from texture
 201 // Cc = color from texture environment
 202 //      not set, defaults to (0,0,0,0)
 203 // Af = alpha from fragment, At = alpha from texture
 204 // C = output color, A = output alpha
 205 float f = texture(tex, texCoord).r;
 206 if (modulate) {
 207   // GL_MODULATE: C
 208   // C = Cf * Ct
 209   // A = Af * At
 210   // the +0.06 is a hack to get lighter clouds!
 211   // can be thought of as ambient light
 212   FragColor = color * (f + 0.02);
 213 } else {
 214   // GL_BLEND:
 215   // C = Cf * (1-Ct) + Cc * Ct
 216   // A = Af * At
 217   vec3 C = color.rgb * (1 - f);
 218   float A = color.a * f;
 219   FragColor = vec4(C, A);
 220 }
 221 \end{lstlisting}
 222
 223 \begin{algorithm}
 224   \caption{Rendering and displaying a frame}\label{alg:display}
 225   \begin{algorithmic}
 226     \Procedure{shadeClouds}{}
 227     \State Sort metaballs in ascending distance from the sun
 228     \State \Call{glUniform}{modulate, true}
 229     \State \Call{glBlendFunc}{GL\_ZERO, GL\_SRC\_ALPHA}
 230     \For{$k\gets \mathit{metaballs}$}
 231     \State Rotate metaball to face the sun
 232     \State \Call{glBindTexture}{textures[$\mathit{k.density}$]}
 233     \State \Call{glDrawElements}{}
 234     \State $c \gets \textrm{pixel at center of metaball}$
 235     \State $\mathit{k.col} \gets c * \textrm{colour of sun}$
 236     \EndFor
 237     \State Bonus: Store the current framebuffer as a free shadow map
 238     \EndProcedure
 239
 240     \Procedure{renderClouds}{}
 241     \State Sort metaballs in descending distance from the camera
 242     \State \Call{glUniform}{modulate, false}
 243     \State \Call{glBlendFunc}{GL\_ONE, GL\_SRC\_ALPHA}
 244     \For{$k \gets \mathit{metaballs}$}
 245     \State Rotate metaball to face the sun
 246     \State \Call{glBindTexture}{textures[$k$.density]}
 247     \State \Call{glUniform}{$\mathit{color}, \mathit{k.col}$}
 248     \State \Call{glDrawElements}{}
 249     \EndFor
 250     \EndProcedure
 251
 252     \Procedure{display}{}
 253     \State $\mathit{projection} \gets $ \Call{orthographic}{}
 254     \State $\mathit{view} \gets$ \Call{lookAt}{$\mathit{sunPos}, \mathit{sunPos} + \mathit{sunDir}$}
 255     \State Clear buffer with RGBA(1,1,1,1)
 256     \State \Call{shadeClouds}{}
 257     \State $\mathit{projection} \gets $ \Call{perspective}{}
 258     \State $\mathit{view} \gets$ \Call{lookAt}{$\mathit{camPos}, \mathit{camPos} + \mathit{camDir}$}
 259     \State Clear buffer with sky colour
 260     \State \Call{renderClouds}{}
 261     \EndProcedure
 262   \end{algorithmic}
 263 \end{algorithm}
 264
 265 \subsection{Rendering}
 266 \subsubsection{Buffers}
 267 The rendering process is similar to shading, in that we draw each
 268 metaball onto a buffer. Originally the shading process happened in an
 269 off-screen framebuffer so that the default framebuffer wouldn't be
 270 disturbed. But since the buffer is cleared before rendering anyway,
 271 there was no need and it was eventually removed so that everything
 272 happens in the one buffer. Caution was needed however to ensure that
 273 the orthographic projection in the shading process was large enough to
 274 fit all the metaballs in the frame.
 275
 276 \subsubsection{Blending}
 277 The blending for rendering uses the equation:
 278 \[
 279   \mathit{bufferFrag} \gets \mathit{shaderFrag} * 1 + \mathit{bufferFrag} * \mathit{shaderFrag.alpha}
 280 \]
 281 Unlike the shading process, the fragment from the shader actually gets
 282 added into the buffer this time round. And likewise, the texture is no
 283 longer modulated, so the uniform variable passed to the shader is updated.
 284
 285 \subsection{Debug}
 286 It would have been near impossible to implement this correctly if I
 287 weren't able to visualize all the different steps in this
 288 process. Different debug modes were added, which can be accessed by
 289 typing the numbers 0--4 on the keyboard. Currently they are implemented
 290 all together in the main shader, toggled by boolean uniforms passed
 291 into the shader. This is pretty inefficient, as when not debugging the
 292 shaders still need to pay the price for all the branching and checking.
 293 They should moved out into a separate shader and switched to whenever
 294 a debug mode is entered.
 295 \begin{table}[h]
 296   \centering
 297   \begin{tabular}{c|c}
 298     \hline
 299     \textbf{Key} & \textbf{Mode} \\
 300     \hline
 301     0 & Shaded \& rendered \\
 302     1 & Cloud density \\
 303     2 & Metaball shading \\
 304     3 & Transition probabilities \\
 305     4 & Extinction probabilities
 306   \end{tabular}
 307 \end{table}
 308
 309 \section{Evaluation}
 310 Despite this course being \textit{Real-time Rendering}, the rendering
 311 process here was very much \textbf{not} real time. The authors of the
 312 paper say that it took around 10 secondsto render a single frame for
 313 a grid of $256\times128\times20$ --- in 2005. On my machine in 2020, this took around 12
 314 seconds. Something was definitely wrong, so I profiled the program and
 315 found that the vast majority of time was being spent inside
 316 \texttt{shadeClouds}: namely waiting on the \texttt{glReadPixels}
 317 call. \texttt{glReadPixels} forces synchronization of the entire GPU
 318 pipeline and blocks until the GPU is finished drawing and has
 319 transferred the requested buffer data back. This was happening for
 320 every single metaball, killing performance entirely. For perspective,
 321 about 1 second was spent doing the simulation and rendering.
 322
 323 It isn't clear what exactly the authors used to read the pixel at each
 324 metaball. But what was clear was that the pixel data wasn't needed
 325 immediately. It's only needed for the rendering process, not the
 326 shading. So I turned to pixel buffer objects (PBO) --- a buffer object
 327 in OpenGL that can be used for \emph{asynchronous} pixel
 328 transfers. The general workflow for packing (downloading) the data
 329 from the GPU asynchronously from within a loop looked like this:
 330 \begin{lstlisting}[language=C,basicstyle=\footnotesize,breaklines=true]
 331 glBindBuffer(GL_PIXEL_PACK_BUFFER, pboBufs[pboIdx]);
 332 glReadPixels(screenPos.x, screenPos.y, 64, 64, GL_RGBA,
 333 GL_UNSIGNED_BYTE, NULL);
 334 glBindBuffer(GL_PIXEL_PACK_BUFFER, pboBufs[pboBuf + 1]);
 335 GLubyte *pixel = (GLubyte *)glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, 4 * sizeof(GLubyte), GL_MAP_READ_BIT);
 336 glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
 337 glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
 338 \end{lstlisting}
 339 \texttt{glReadPixels} now
 340 queued an asynchronous read to the PBO that was bound and returned
 341 immediately. Meanwhile, we would map the buffer of \textit{another}
 342 PBO that already had \texttt{glReadPixels} called on it, and so had
 343 data ready for us.
 344
 345 Because the time it takes to (issue the commands to) draw a metaball
 346 is so short, a large buffer of 512 PBOs was set up to minimize the
 347 time spent waiting on downloads to finish. Reading the pixels is still
 348 the bottleneck, but thanks to PBOs it now only takes 2.8 seconds instead
 349 of 16.
 350
 351 \section{Improvements}
 352 Despite the nice speed up by using PBOs, the constant thrashing back
 353 and forth the GPU is doing by reading and writing is undesirable. In
 354 the future I would like to explore methods of storing the pixel values
 355 somewhere on the GPU, and once all the metaballs are shaded, then
 356 doing one single download to the CPU. I have discussed this with
 357 people on the \#OpenGL channel on the Freenode IRC network, and
 358 apparently it is possible to achieve something within a shader.
 359
 360 Using bitfields and bitwise operations for the simulation would also
 361 undoubtedly improve performance, and the last contribution of the
 362 paper, shafts of lights, remain to be implemented.
 363
 364 Lastly, a lot of implementation details still remain unclear to
 365 me. Perhaps it was meant that two sets of textures were to be stored:
 366 one for the attenuation ratios, and one for the cumulative densities?
 367 Then the attenuation ratios would have been used for the shading part
 368 whilst the cumulative densities would have been used the rendering
 369 part. But again, we would need to figure out how to calculate both in
 370 the first place.
 371
 372 \vspace{.2in}
 373 \includegraphics[width=0.3\textwidth]{render0}~
 374 \includegraphics[width=0.3\textwidth]{render1}~
 375 \includegraphics[width=0.3\textwidth]{render2}
 376
 377 \bibliographystyle{acm}
 378 \bibliography{report}
 379
 380 \end{document}