
NodeJS CPU heavy tick and memory leak debugging

https://hackmd.io/p/js-paris-nodejs



Max Wu

CTO of HackMD

I love great products and building them.
I'm developing my dream with code.


Node.js

Node.js is a JavaScript runtime built on Chrome’s V8 JavaScript engine.


Node.js

  • non-blocking event loop
  • single threaded
  • use Garbage Collector to free up memory

ref.


It's super easy to write a Node.js application.
But it's hard to hunt the Ghost in the shell.
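To see why a single slow handler can take the whole service down, here is a minimal sketch (a hypothetical example of mine, not code from the talk) of a server where one synchronous, CPU-heavy call blocks the entire event loop:

const http = require('http');

// Hypothetical CPU-heavy synchronous work: while this loop runs,
// the single-threaded event loop cannot serve any other request.
function busyWork(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {}
}

http.createServer((req, res) => {
  busyWork(3000); // every request blocks the whole process for ~3 seconds
  res.end('done\n');
}).listen(3000);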



Once upon a time




The CPU usage goes too high



It stops responding to requests

and recovers after a few minutes



It might be restarted by the scheduler due to unresponsiveness

(health check failed)



We need to find out who's the killer



Node.js profiling

built-in tool to profile the app

node --prof app.js
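For the rest of this walkthrough, assume app.js looks roughly like the hypothetical sketch below: an endpoint that parses Markdown synchronously with marked on every request. The real application was HackMD itself; this is only an illustration, and it assumes the classic marked() call style of older versions.

const http = require('http');
const marked = require('marked'); // the Markdown parser that will show up in the profile

// A reasonably large Markdown document so each request burns noticeable CPU
const bigDocument = '# title\n' + 'some **bold** text with [a link](https://example.com)\n'.repeat(5000);

http.createServer((req, res) => {
  const html = marked(bigDocument); // synchronous parsing on the event loop
  res.end(html);
}).listen(3000);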


Use ApacheBench to generate some load

ab -k -c 20 -n 250 "http://localhost:3000"

20 concurrent connections, 250 requests in total


Get the output of ApacheBench

Concurrency Level:      20
Time taken for tests:   46.932 seconds
Complete requests:      250
Failed requests:        0
Keep-Alive requests:    250
Total transferred:      50250 bytes
HTML transferred:       500 bytes
Requests per second:    5.33 [#/sec] (mean)
Time per request:       3754.556 [ms] (mean)
Time per request:       187.728 [ms] (mean, across all concurrent requests)
Transfer rate:          1.05 [Kbytes/sec] received

...

Percentage of the requests served within a certain time (ms)
  50%   3755
  66%   3804
  75%   3818
  80%   3825
  90%   3845
  95%   3858
  98%   3874
  99%   3875
 100%   4225 (longest request)

The profiling tool will generate a log file

named like:
isolate-0xnnnnnnnnnnnn-v8.log

wait for a while to collect more data


Analyze the log file

process the log file to get the statistics with this command

node --prof-process isolate-0xnnnnnnnnnnnn-v8.log > processed.txt


View the result

The file is broken up into sections which are again broken up by language

[Bottom up (heavy) profile]:
  Note: percentage shows a share of a particular caller in the total
  amount of its parent calls.
  Callers occupying less than 1.0% are not shown.

   ticks parent  name
  98984   37.6%  UNKNOWN

  83688   31.8%  /lib/x86_64-linux-gnu/libc-2.19.so
   2162    2.6%    /usr/local/bin/node
    583   27.0%      Builtin: StringPrototypeReplace
    148   25.4%        LazyCompile: *InlineLexer.output /hackmd/node_modules/marked/lib/marked.js:585:40
     66   44.6%          LazyCompile: *Parser.tok /hackmd/node_modules/marked/lib/marked.js:1043:32
     22   33.3%            LazyCompile: ~Parser.parse /hackmd/node_modules/marked/lib/marked.js:992:34
     21   31.8%            LazyCompile: *Parser.parse /hackmd/node_modules/marked/lib/marked.js:992:34
     19   28.8%            LazyCompile: *Parser.tok /hackmd/node_modules/marked/lib/marked.js:1043:32
      4    6.1%            RegExp: \\bXbox *\\d+[.\\w_]* {3}
     37   25.0%          LazyCompile: *InlineLexer.output /hackmd/node_modules/marked/lib/marked.js:585:40
     19   51.4%            LazyCompile: *Parser.tok /hackmd/node_modules/marked/lib/marked.js:1043:32
      9   24.3%            LazyCompile: ~Parser.tok /hackmd/node_modules/marked/lib/marked.js:1043:32
      6   16.2%            LazyCompile: *InlineLexer.output /hackmd/node_modules/marked/lib/marked.js:585:40
      1    2.7%            RegExp: \\bChrome\\b
...

Here you are!

We found that the marked library, which is used for parsing Markdown, causes the problem.


FlameGraph

might be another helpful option

ref.
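One way to produce a flamegraph (a sketch of my own, not from the talk; it assumes the third-party 0x package is acceptable in your environment):

npm install -g 0x    # flamegraph tooling for Node.js
0x app.js            # run the app, apply some load, then press Ctrl+C;
                     # 0x renders an interactive HTML flamegraph of the captured stacks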


Yet another story


Once upon a time



The Memory usage goes too high




It fills the instance memory every few hours

and gets killed by the OS scheduler


Symptom

  • the response time becomes higher and higher, until the memory usage reaches 100% or above
  • the application stops responding
  • when there is not enough swap left, the server can even fail to accept SSH connections

This is the well-known "memory leak" problem


We need to find out who's the killer

again



What tools we have

  • inspector
  • memwatch-next & heapdump

Enable the inspector

use the built-in inspector to understand your app; attach Chrome DevTools via chrome://inspect and take heap snapshots from the Memory tab

node --inspect app.js


Use ApacheBench to generate some load

ab -k -c 20 -n 250 "http://localhost:3000"

20 concurrent connections, 250 requests in total



There is no graphical environment on the production server

Let's make the server generate the memory dump itself


memwatch-next

triggers events when the memory usage is still growing after 5 consecutive GCs


const Memwatch = require('memwatch-next');
const Util = require('util'); // needed for Util.inspect below (missing in the original snippet)

/**
 * Check for memory leaks
 */
let hd = null;

Memwatch.on('leak', (info) => {
  console.log('memwatch::leak');
  console.error(info);
  if (!hd) {
    hd = new Memwatch.HeapDiff();
  } else {
    const diff = hd.end();
    console.error(Util.inspect(diff, true, null));
    trace.report('memwatch::leak', { HeapDiff: hd }); // trace: app-specific reporter, defined elsewhere
    hd = null;
  }
});

Memwatch.on('stats', (stats) => {
  console.log('memwatch::stats');
  console.error(Util.inspect(stats, true, null));
  trace.report('memwatch::stats', { Stats: stats }); // trace: app-specific reporter, defined elsewhere
});

View the heapdiff

memwatch::stats           // <-- GC stats

{ num_full_gc: 227,
num_inc_gc: 991,
heap_compactions: 227,

usage_trend: 1.3,         // <-- the usage trending

estimated_base: 49219293,
current_base: 47057208,
min: 41628720,
max: 50299360 }

memwatch::leak            // <-- the leak event

growth=1362440, reason=heap growth over 5 consecutive GCs (1m 56s) - 40.32 mb/hr

// show the heap diff

{ before: { nodes: 401400, size_bytes: 44743648, size: '42.67 mb' },
after: { nodes: 407650, size_bytes: 46306635, size: '44.16 mb' },
change:
{ size_bytes: 1562987,
size: '1.49 mb',
freed_nodes: 24397,
allocated_nodes: 30647,
details:
[ { what: 'Array',
size_bytes: 557728,
size: '544.66 kb',
'+': 5438,
'-': 2962 },
{ what: 'ArrayBuffer',
size_bytes: 560,
size: '560 bytes',
'+': 21,
'-': 14 },
{ what: 'Buffer',
size_bytes: 400,
size: '400 bytes',
'+': 12,
'-': 7 },
{ what: 'BufferList',
size_bytes: 1200,
size: '1.17 kb',
'+': 48,
'-': 23 },
{ what: 'Client', size_bytes: 0, size: '0 bytes', '+': 5, '-': 5 },
{ what: 'Closure',
size_bytes: 28080,
size: '27.42 kb',
'+': 1525,
'-': 1021 },
{ what: 'Code',
size_bytes: 405144,
size: '395.65 kb',
'+': 315,
'-': 61 },
...

It would be better if we could inspect the whole heap


heapdump

dumps a snapshot of the heap of a Node.js app to a file so it can be inspected later


const http = require('http');
const heapdump = require('heapdump');

const requestLogs = [];

const server = http.createServer((req, res) => {
  if (req.url === '/heapdump') {
    heapdump.writeSnapshot((err, filename) => {
      console.log('Heap dump written to', filename);
    });
  }
  requestLogs.push({ url: req.url, date: new Date() });
  res.end(JSON.stringify(requestLogs));
});

server.listen(3000);
console.log('Server listening to port 3000. Press Ctrl+C to stop it.');
console.log(`Heapdump enabled. Run "kill -USR2 ${process.pid}" or send a request to "/heapdump" to generate a heapdump.`);
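As a side note, newer Node.js releases (v11.13+, if I recall the version correctly) can produce the same kind of .heapsnapshot file without any extra dependency, using the built-in v8 module. A minimal sketch:

const v8 = require('v8');

// Writes a .heapsnapshot file into the current working directory
// and returns the generated filename.
const filename = v8.writeHeapSnapshot();
console.log('Heap snapshot written to', filename);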

Trigger a dump with the USR2 signal

$ node index.js
Server listening to port 3000. Press Ctrl+C to stop it.
Heapdump enabled. Run "kill -USR2 29431" or send a request to "/heapdump" to generate a heapdump.

$ kill -USR2 29431
$ curl http://localhost:3000/heapdump
$ ls
heapdump-31208326.300922.heapsnapshot  heapdump-31216569.978846.heapsnapshot

Load the dump into Chrome DevTools (open the Memory tab and load the .heapsnapshot file)


Look into the dev tools


Here you are!

We found that the gray-matter library, which is used for generating the page title, causes the problem.

ref.


Everything now goes back to normal


Last but not least

the best debugger for Node.js!


ndb

ndb is an improved debugging experience for Node.js, enabled by Chrome DevTools


Install

npm install -g ndb


Opening ndb

ndb app.js



Placing Breakpoints

ref.
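Besides clicking in the source gutter, a breakpoint can also be placed directly in code with a debugger statement, which pauses execution whenever DevTools (or ndb) is attached. A minimal hypothetical example:

const http = require('http');

http.createServer((req, res) => {
  debugger; // when running under ndb, execution pauses here on each request
  res.end('ok\n');
}).listen(3000);

Run it as "ndb app.js" and hit the endpoint to stop at the statement.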


TL;DR

  • look into the CPU: profiling
  • look into the Memory: inspecting
  • let the memory talk: memory dumping
  • the step-by-step debugger: ndb

Thanks for listening!

do you have any questions?


References

