2 title: 🚰 Hunting leaks with GDB
6 There's currently a nasty memory leak in
7 [haskell-ide-engine](https://github.com/haskell/haskell-ide-engine) which leaks
8 all the cached module information about from GHC. It seems to only occur on
9 certain platforms, but on platforms unfortunate to be affected by it, a sizeable
10 portion memory is leaked every time the user types or edits their document. For
11 a small module of about 60 lines, this was 30-ishMB per edit. (Yikes)
13 During this year's ZuriHac, Matthew Pickering, Daniel Gröber and I tried to get
14 this sorted out once and for all: It ended up taking the entire weekend, and I
15 learnt more about the RTS than I ever intended to.
19 The first step in the investigation began with figuring out what exactly was
20 leaking. To do this we built `hie` with [profiling
21 enabled](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/profiling.html):
22 This injects *call site* annotations into your code, and causes your executable
23 to be linked with one of the versions of the RTS that was built with profiling.
27 # or in our case, for cabal projects
28 $ cabal v2-install :hie --enable-profiling
31 GHC comes with a bunch of different RTS libraries, each built with a
32 different combination of features.
35 $ ls `ghc --print-libdir`/rts | grep rts
36 libHSrts-ghc8.6.5.dylib
38 libHSrts_debug-ghc8.6.5.dylib
40 libHSrts_l-ghc8.6.5.dylib
43 libHSrts_thr-ghc8.6.5.dylib
45 libHSrts_thr_debug-ghc8.6.5.dylib
47 libHSrts_thr_l-ghc8.6.5.dylib
52 `thr` stands for threading, `debug` for debug, `l` for [event
53 logging](#bonus-event-logs) and `p` for profiling. If you pass GHC `-v`, you should
54 see `-lHSrts_thr_p` or equivalent when linking: We're bringing in the RTS built
55 with threading and profiling enabled.
57 Now we could profile our executable by running it with specific flags passed to
58 the RTS. To see what's available, we ran `hie +RTS --help`
61 hie: -h<break-down> Heap residency profile (hp2ps) (output file <program>.hp)
62 hie: break-down: c = cost centre stack (default)
64 hie: d = closure description
65 hie: y = type description
67 hie: b = biography (LAG,DRAG,VOID,USE)
71 Here we used use the retainer breakdown, `-hr`, to help us find what exactly is
72 being retained i.e. leaking.
73 We originally just configured the language client to launch `hie` with the flag,
74 but we found ourselves doing it so often that we created a test scenario with
75 [lsp-test](https://hackage.haskell.org/package/lsp-test) and automated the
81 build-depends: base, lsp-test
83 import Language.Haskell.LSP.Test
84 import Language.Haskell.LSP.Types
87 runSession ("hie +RTS -hr") fullCaps "../test" $ do
88 doc <- openDoc "TestFile.hs" "haskell"
91 sendNotification TextDocumentDidSave (DidSaveTextDocumentParams doc)
92 waitForDiagnostics -- wait before sending it again
95 Then we profiled and got some data:
98 $ hp2ps -c hie.ps # c for colour
101 ![Retainer Profile](/images/retainBefore.png)
103 You can see `tcRnImports` and `zonkEvBinds` gradually eating up more memory over
104 time: These are parts of the GHC API and so I narrowed it down to most likely
105 coming from our `UriCaches`, where we store the `TypecheckedModule` and
109 type UriCaches = Map.Map FilePath UriCacheResult
111 data UriCacheResult = UriCacheSuccess UriCache
114 data UriCache = UriCache
115 { cachedInfo :: !CachedInfo
116 , cachedPsMod :: !ParsedModule
117 , cachedTcMod :: !(Maybe TypecheckedModule)
118 -- | Data pertaining to the typechecked module,
119 -- not the parsed module
120 , cachedData :: !(Map.Map TypeRep Dynamic)
124 # The weak pointer trap
126 Now that we knew *what* was leaking, the question turned to *where* was it leaking.
127 The next steps are taken from [Simon Marlow's excellent blog
128 post](http://simonmar.github.io/posts/2018-06-20-Finding-fixing-space-leaks.html),
129 in which he details how he tracked down and fixed multiple memory leaks in GHCi.
131 The trick is to create a weak pointer -- a pointer that the garbage collector
132 ignores when looking for retainers. Point it to the offending type and store it
133 alongside the original, strongly referenced one.
134 Whenever you know the object should have been deallocated, force garbage
135 collection by calling `performGC` and dereference the weak pointer. If the
136 pointer is null, it was released, otherwise something is still holding onto it:
137 the object is being leaked.
139 We modified `UriCache` to look like the following:
142 data UriCacheResult = UriCacheSuccess (Weak UriCache) UriCache
145 mkUriCacheSuccess :: UriCache -> IO UriCacheResult
146 mkUriCacheSuccess uc = do
147 -- second arg is callback when GC is run
148 ptr <- mkWeakPtr uc Nothing
149 return (UriCacheSucces ptr uc)
152 We knew that when caching a module in the `cacheModule` function, it should
153 replace the old cached module, so that seemed like a sensible place to add in
154 the `performGC` trap.
159 res <- liftIO $ mkUriCacheSuccess newUc
161 maybeOldUc <- Map.lookup uri' . uriCaches <$> getModuleCache
163 modifyCache $ \gmc ->
164 gmc { uriCaches = Map.insert uri' res (uriCaches gmc) }
168 Just (UriCacheSuccess wkPtr _) -> liftIO $ do
170 derefed <- deRefWeak wkPtr
172 Just _ -> error "Leaking!"
173 Nothing -> hPutStrLn stderr "Not leaking"
177 With our trap set up in place, we ran the test again.
182 2019-06-20 00:01:32.650578 [ThreadId 4] - Using stack GHC version
183 2019-06-20 00:01:33.826226 [ThreadId 4] - Run entered for HIE(hie) Version 0.10.0.0, Git revision d679eb595a6a1583a8b63a3fdd5964f84a8d00f6 (dirty) (2801 commits) x86_64 ghc-8.6.5
184 2019-06-20 00:01:33.826997 [ThreadId 4] - Current directory:/Users/luke/Source/haskell-ide-engine
185 2019-06-20 00:01:33.836594 [ThreadId 9] - Using stack GHC version
187 CallStack (from HasCallStack):
188 error, called at ./Haskell/Ide/Engine/ModuleCache.hs:255:19 in hie-plugin-api-0.10.0.0-inplace:Haskell.Ide.Engine.ModuleCache
189 CallStack (from -prof):
190 Haskell.Ide.Engine.ModuleCache.cacheModule (Haskell/Ide/Engine/ModuleCache.hs:(221,1)-(260,29))
191 Haskell.Ide.Engine.ModuleCache.CAF (<entire-module>)
194 Nice. At this point we knew the *what* and the *where*, all we needed to know
197 # Poking about in GDB
199 For this there is the convenient `findPtr` function included in
200 the debug RTS. It's a C function that we can call during runtime from a
201 debugger to find out what is referencing a closure. To access it we first needed
202 to pass two flags to GHC[1](footnote:1) (or in our case set them in Cabal):
205 $ ghc -debug -fwhole-archive-hs-libs Main.hs
208 `-debug` brings in the version of RTS with debug stuff in it, whilst
209 `-fwhole-archive-hs-libs` is needed to prevent the linker from optimising out
210 the `findPtr` function: Nothing actually calls it, but we still want it
211 available in our final binary.
213 After rebuilding `hie`, it was time to hijack:
217 $ gdb -pid `pidof hie`
218 # alternatively, beforehand:
219 $ gdb -ex "set follow-fork-mode child" ./Leak.hs
222 Now that we were in, we needed to find out the address of the [info
223 table](https://gitlab.haskell.org/ghc/ghc/wikis/commentary/rts/storage/heap-objects#info-tables):
224 We're looking for a symbol that ends in `UriCacheSuccess_con_info`.
227 (gdb) info functions UriCacheSuccess_con_info
228 All functions matching regular expression "UriCacheSuccess_con_info":
230 Non-debugging symbols:
231 0x0000000111956650 hiezmpluginzmapizm0zi10zi0zi0zminplace_HaskellziIdeziEngineziGhcModuleCache_UriCacheSuccess_con_info_dsp
232 0x0000000111956668 hiezmpluginzmapizm0zi10zi0zi0zminplace_HaskellziIdeziEngineziGhcModuleCache_UriCacheSuccess_con_info
235 We plugged in the rather gnarly
236 `hiezmpluginzmapizm0zi10zi0zi0zminplace_HaskellziIdeziEngineziGhcModuleCache_UriCacheSuccess_con_info`
237 into `findPtr`, which will go ahead and find all closures in the heap referring
238 to a `UriCacheSuccess`:
241 (gdb) p (void)findPtr(...UriCacheSuccess_con_info, 1)
242 0x4205499088 = hie-plugin-api-0.10.0.0-inplace:Haskell.Ide.Engine.GhcModuleCache.UriCacheSuccess(0x4205499139, 0x4205499110)
243 0x42154a88a8 = hie-plugin-api-0.10.0.0-inplace:Haskell.Ide.Engine.GhcModuleCache.UriCacheSuccess(0x42154be8d1, 0x42154a8a30)
246 And now we have the culprit retainers lined up right before us. We can go deeper
247 with another call to `findPtr`: here the 1 passed to it means to follow the
251 (gdb) p (void)findPtr(0x4205499088, 1)
253 == in ./Leak.hs stderr ==
255 0x4205499088 = hie-plugin-api-0.10.0.0-inplace:Haskell.Ide.Engine.GhcModuleCache.UriCacheSuccess(0x4205499139, 0x4205499110)
256 0x42154a88a8 = hie-plugin-api-0.10.0.0-inplace:Haskell.Ide.Engine.GhcModuleCache.UriCacheSuccess(0x42154be8d1, 0x42154a8a30)
257 0x4205499148 = THUNK(0x43b9600, 0x4205499072, 0x4205499089, 0x42154a93c1)
259 0x42054990a0 = hie-plugin-api-0.10.0.0-inplace:Haskell.Ide.Engine.PluginsIdeMonads.IdeState(0x4205499148, 0x42154a9370, 0xc19ff3a, 0x42154a9390)
261 0x42154be928 = TVAR(value=0x42054990a1, wq=0xc353d90, num_updates=0)
263 0x42154be918 = base:GHC.Conc.Sync.TVar(0x42154be928)
266 0x42154bf548 = FUN/1(0x43f76b0, 0x42154be919)
267 0x42154bf5c8 = FUN/1(0x43f76b0, 0x42154be919)
274 Is that an unevaluated thunk? Let's take a closer look at it.
277 0x4205499148 = THUNK(0x43b9600, 0x4205499072, 0x4205499089, 0x42154a93c1)
280 The first address we see on the right is the address of the thunk on the heap.
281 To the right of the `=` is its contents. We can see the layout of such a thunk
282 in GHC's source, at `rts/storage/Closures.h`:
286 StgThunkHeader header;
287 struct StgClosure_ *payload[];
291 const StgInfoTable* info;
292 #if defined(PROFILING)
295 StgSMPThunkHeader smp;
303 So the first address we're seeing, `0x43b9600`, is the `const StgInfoTable* info`
304 pointer. It's then followed by a word of padding, which `findPtr` leaves out for
305 us. (The profiling enabled version of the RTS also passes about an extra word of
308 The remaining values are the free variables of the function. There's a lot more
309 to talk about [the runtime representation of
310 thunks](https://gitlab.haskell.org/ghc/ghc/wikis/commentary/rts/storage/heap-objects#thunks),
311 , more than I could ever fit into a blog post, so I'll return to our leak.
315 We now knew the thunk that was retaining the `UriCache`s, but we didn't know
316 where it was being created. We could figure out where it's coming from by
317 compiling `hie` with debug information: not to be confused with linking with the
318 debug version of the RTS.
321 $ ghc -g3 -debug -fwhole-archive-hs-libs Main.hs
323 # (make sure to stop cabal from just stripping it back out)
324 $ cabal v2-build hie --enable-debug-info --disable-executable-stripping --disable-library-stripping
325 # or in your cabal.project
330 executable-stripping: False
331 library-stripping: False
334 This attached source locations to the info tables, so after another rebuild and
335 another GDB session, we could simply print out the source code listing for the
339 0x42104e1148 = THUNK(0x43c6280, 0x42104e1072, 0x42104e1089, 0x4217654a31)
342 (gdb) list *0x43c6280
343 56 -- ---------------------------------------------------------------------
345 58 modifyCache :: (HasGhcModuleCache m) => (GhcModuleCache -> GhcModuleCache) -> m ()
346 59 modifyCache f = do
347 60 mc <- getModuleCache
348 61 setModuleCache (f mc)
350 63 -- ---------------------------------------------------------------------
351 64 -- | Runs an IdeM action with the given Cradle
352 65 withCradle :: (GM.GmEnv m) => GM.Cradle -> m a -> m a
355 And there we had it: unevaluated thunks of `f` piling up and eating memory.
358 this](https://github.com/mpickering/haskell-ide-engine/commit/c3bdae0f7df63c8bdb4f360285feeb7eadff8966)
359 is usually [making sure to strictly
360 evaluate](https://github.com/haskell/haskell-ide-engine/pull/1292/files) the
361 [offending thunk](https://github.com/haskell/haskell-ide-engine/pull/1305/files).
363 We used this technique to hunt down several retainers, but be warned: stifling
364 one retainer isn't necessarily guaranteed to fix the memory leak, as it is
365 possible (and likely) that another retainer out exists and is still holding on.
367 But now we know what we're doing. We'll be using this to squash more memory
368 leaks down the line, and hopefully you can too.
372 In addition to profiling via `-hr`, it's possible to capture event logs. They
373 can be thought of as a more modern version of the heap profiles.
374 If your executable was compiled with `-debug`, all you need to do is pass `-l`
375 alongside your regular profiling flags:
381 This will produce a little `exe.eventlog` file which you can use with
382 profiling tools like ThreadScope.
384 Matthew Pickering has made a handy tool,
385 [eventlog2html](https://mpickering.github.io/eventlog2html/), to convert these
386 into interactive graphs. I'll leave you with the trace of our memory leak
394 1. GHC 8.6.5 and earlier don't come with an RTS with both debugging and
395 profiling enabled, so we've disable profiling here. This has
396 [been fixed in GHC 8.8](https://github.com/ghc/ghc/commit/d36b1ffac9960db70043aaab43c931ce217912ba#diff-21dcbd7f6f00e9a602377d3f69bff4ba).