Ruby Debugging at Unix
● Process crashes
● Stuck daemons
● Broken network connections
● Memory leaks
● File system errors
Solutions?
● How to deal with such problems?
● How to debug them?
● You can debug Ruby...
● You can't debug black boxes!
Plan of attack
● Debugging UNIX processes
● Networking issues
● At Ruby-level
● At C-level
● File system issues
● Approach
Ruby is a UNIX process
ps — UNIX tool for process monitoring
ps abilities
● View all processes for a specific:
● User
● Group
● Terminal
● PID
● Parent
● View all threads
● View the process list as a tree
ps usage
sudo ps aux | grep ruby — find the PIDs of all ruby processes
sudo ps auxH — show all processes with their threads
sudo ps auxf — tree view of all system processes
ps aux | grep ruby
cris@home:/home/cris ps aux |grep ruby
cris 21067 0.2 4.0 140236 125756 pts/0 Sl+ 12:56 1:19 ruby
script/server --debugger
cris@home:/home/cris ps aux|grep firefox
cris 19296 43.1 8.0 424828 249048 ? Rl 10:43 256:04
/usr/lib/firefox-3.5.9/firefox
cris@home:/home/cris ps aux|grep nginx
root 1649 0.0 0.0 4664 736 ? Ss May17 0:00 nginx: master
process /usr/sbin/nginx
www-data 1650 0.0 0.0 5200 1824 ? S May17 0:02 nginx: worker
process
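When the same check has to run from a script rather than a terminal, the columns of `ps aux` can be split in Ruby. The following is a minimal sketch, assuming the usual Linux column order (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND); `PsEntry`, `parse_ps_line` and `find_processes` are hypothetical names introduced only for this illustration.

```ruby
# Field names follow the usual `ps aux` header on Linux (an assumption).
PsEntry = Struct.new(:user, :pid, :cpu, :mem, :vsz, :rss,
                     :tty, :stat, :start, :time, :command)

# Parse one line of `ps aux` output into a PsEntry.
# The first ten columns are whitespace-separated; the command is the rest.
def parse_ps_line(line)
  fields = line.strip.split(/\s+/, 11)
  return nil if fields.size < 11
  user, pid, cpu, mem, vsz, rss, tty, stat, start, time, command = fields
  PsEntry.new(user, pid.to_i, cpu.to_f, mem.to_f, vsz.to_i, rss.to_i,
              tty, stat, start, time, command)
end

# Return the entries whose command matches the pattern (like `| grep`).
def find_processes(ps_output, pattern)
  ps_output.each_line
           .map { |line| parse_ps_line(line) }
           .compact
           .select { |entry| entry.command =~ pattern }
end

sample = "cris 21067 0.2 4.0 140236 125756 pts/0 Sl+ 12:56 1:19 ruby script/server --debugger"
entry = parse_ps_line(sample)
puts "PID #{entry.pid}, RSS #{entry.rss} KB: #{entry.command}"
```

On a live system the input could come from backticks, for example `find_processes(`ps aux`, /ruby/)`.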
ps auxH
cris@home:/home/cris ps auxH|egrep 'ruby|firefox|nginx'
root 1649 0.0 0.0 4664 736 ? Ss May17 0:00 nginx: master process
www-data 1650 0.0 0.0 5200 1824 ? S May17 0:02 nginx: worker process
cris 19296 41.8 8.1 424828 250992 ? Rl 10:43 250:13 /usr/lib/firefox-3.5.9/firefox
cris 19296 0.0 8.1 424828 250992 ? Sl 10:43 0:02 /usr/lib/firefox-3.5.9/firefox
cris 19296 0.0 8.1 424828 250992 ? Sl 10:43 0:01 /usr/lib/firefox-3.5.9/firefox
cris 19296 1.3 8.1 424828 250992 ? Rl 10:43 8:15 /usr/lib/firefox-3.5.9/firefox
cris 19296 0.0 8.1 424828 250992 ? Sl 10:43 0:00 /usr/lib/firefox-3.5.9/firefox
cris 19296 0.0 8.1 424828 250992 ? Sl 10:43 0:00 /usr/lib/firefox-3.5.9/firefox
cris 19296 0.0 8.1 424828 250992 ? Sl 10:43 0:04 /usr/lib/firefox-3.5.9/firefox
cris 19296 0.0 8.1 424828 250992 ? Sl 10:43 0:00 /usr/lib/firefox-3.5.9/firefox
cris 19296 0.0 8.1 424828 250992 ? Sl 12:44 0:00 /usr/lib/firefox-3.5.9/firefox
cris 19296 0.0 8.1 424828 250992 ? Sl 15:01 0:00 /usr/lib/firefox-3.5.9/firefox
cris 21067 0.2 4.0 140236 125756 pts/0 Sl+ 12:56 1:00 ruby script/server --debugger
cris 21067 0.0 4.0 140236 125756 pts/0 Rl+ 12:56 0:19 ruby script/server --debugger
ps auxf
cris@home:/home/cris ps auxf
postgres 23499 0.0 0.0 42148 1164 ? S Feb27 0:56 /usr/bin/postgres -D /var/lib/postgresql
postgres 25278 0.0 0.3 42280 6976 ? Ss Apr09 0:01 \_ postgres: writer process
postgres 25279 0.0 0.0 42148 816 ? Ss Apr09 0:01 \_ postgres: wal writer process
postgres 25280 0.0 0.0 42280 1012 ? Ss Apr09 0:01 \_ postgres: autovacuum launcher process
postgres 25281 0.0 0.0 13604 936 ? Ss Apr09 0:09 \_ postgres: stats collector process
www-data 312 0.0 0.0 2556 572 ? Ss Mar02 4:41 redis-server /etc/redis.conf
www-data 13956 0.0 0.0 0 0 ? Z 17:43 0:00 \_ [redis-server] <defunct>
www-data 423 0.0 0.1 6556 2880 ? Sl 13:54 0:00 PassengerNginxHelperServer passenge
www-data 438 0.0 0.6 22512 12196 ? S 13:54 0:00 \_ Passenger spawn server
www-data 10722 1.0 4.3 91968 77148 ? S 17:27 0:09 \_ Passenger ApplicationSpawner:
www-data 442 0.0 0.0 6000 800 ? Ss 13:54 0:00 nginx: master process
www-data 451 0.0 0.1 6300 1836 ? S 13:54 0:00 \_ nginx: worker process
www-data 467 0.0 0.1 6300 1840 ? S 13:54 0:00 \_ nginx: worker process
www-data 9651 0.0 0.0 1892 512 ? Ss 14:37 0:00 QUEUE=default rake resque:run
www-data 9685 0.1 4.3 91960 77680 ? S 14:37 0:13 \_ resque-1.5.0: Waiting for default
www-data 9661 0.0 0.0 1888 504 ? Ss 14:37 0:00 QUEUE=uploader rake resque:run
www-data 9696 0.1 4.3 91956 77704 ? S 14:37 0:12 \_ resque-1.5.0: Waiting for uploader
kill — send a signal to a process
kill usage
kill -N PID — common form
ps aux|grep Passenger
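The same signalling is available from Ruby through `Process.kill`. The sketch below forks a child that handles TERM and then signals it from the parent; the pipe used to wait until the handler is installed is an assumption of this example, not something from the slides.

```ruby
# Fork a child that shuts down cleanly on SIGTERM, then signal it.
reader, writer = IO.pipe

pid = fork do
  reader.close
  Signal.trap("TERM") { exit 0 }   # clean shutdown when TERM arrives
  writer.puts "ready"              # tell the parent the handler is installed
  writer.close
  sleep                            # sleep until a signal arrives
end

writer.close
reader.gets                        # block until the child reports "ready"
reader.close

Process.kill("TERM", pid)          # equivalent to `kill -TERM PID`
_, status = Process.wait2(pid)     # reap the child to avoid a zombie
puts "child #{pid} exited with status #{status.exitstatus}"
```

Reaping the child with `Process.wait2` matters here: a child that is never waited for shows up in `ps` as `<defunct>`.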
where:
-s — maximum length of strings to print
-tt — prefix each line with the time, including microseconds
-p — PID of the process to attach to
strace usage example
cris@home:/home/cris strace -tt -s 1000 -p 23987
Process 23987 attached - interrupt to quit
09:03:38.907564 select(8, [7], [], [], NULL) = ? ERESTARTNOHAND (To be restarted)
09:04:01.141896 --- SIGABRT (Aborted) @ 0 (0) ---
Process 23987 detached
strace filtering via grep
cris@home:/home/cris strace -tt -s 1000 -p 679 2>&1 | grep -v sigprocmask
Process 679 attached - interrupt to quit
09:15:43.725633 select(8, [7], [], [], NULL) = 1 (in [7])
09:15:48.007594 accept(7, {sa_family=AF_INET, sin_port=htons(38510),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
09:15:48.007890 fcntl64(5, F_GETFL) = 0x2 (flags O_RDWR)
09:15:48.007956 fstat64(5, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
09:15:48.008090 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0xb7896000
09:15:48.008167 _llseek(5, 0, 0xbf9fa68c, SEEK_CUR) = -1 ESPIPE (Illegal seek)
09:15:48.008248 fcntl64(5, F_GETFL) = 0x2 (flags O_RDWR)
09:15:48.008304 fstat64(5, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
09:15:48.008430 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0xb7895000
09:15:48.008496 _llseek(5, 0, 0xbf9fa68c, SEEK_CUR) = -1 ESPIPE (Illegal seek)
09:15:48.008749 setsockopt(5, SOL_TCP, TCP_CORK, [1], 4) = 0
09:15:48.010250 select(6, [5], [], [], {0, 0}) = 1 (in [5], left {0, 0})
09:15:48.010734 gettimeofday({1274336148, 10755}, NULL) = 0
09:15:48.011137 select(8, [5 7], [], [], NULL) = 1 (in [5])
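With `-tt` timestamps in the output, long pauses between system calls can be found mechanically, which helps when looking for the call a stuck daemon is blocked in. This is a rough sketch, assuming the `HH:MM:SS.microseconds` prefix shown above and a trace that does not cross midnight; `strace_seconds` and `strace_gaps` are hypothetical helper names.

```ruby
# Convert an `strace -tt` timestamp (HH:MM:SS.usec) to seconds since midnight.
def strace_seconds(stamp)
  hours, minutes, seconds = stamp.split(":")
  hours.to_i * 3600 + minutes.to_i * 60 + seconds.to_f
end

# Return [gap_in_seconds, line] pairs for every line that was followed by a
# pause longer than `threshold`. Lines without a timestamp are ignored.
def strace_gaps(lines, threshold = 1.0)
  stamped = lines.map { |line| [line[/\A\d{2}:\d{2}:\d{2}\.\d+/], line] }
                 .select { |stamp, _| stamp }
  gaps = stamped.each_cons(2).map do |(t1, line1), (t2, _)|
    gap = strace_seconds(t2) - strace_seconds(t1)
    [gap, line1] if gap > threshold
  end
  gaps.compact
end

trace = [
  "Process 679 attached - interrupt to quit",
  "09:15:43.725633 select(8, [7], [], [], NULL) = 1 (in [7])",
  "09:15:48.007594 accept(7, ...) = 5",
  "09:15:48.007890 fcntl64(5, F_GETFL) = 0x2 (flags O_RDWR)"
]

strace_gaps(trace).each do |gap, line|
  puts format("%.2fs spent after: %s", gap, line)
end
```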
Networking: HTTP
http://tools.ietf.org/html/rfc2616
HTTP issues
● Session
● Cookies
● Mime-types
● Encoding
● AJAX-requests
HTTP investigation tools
Client Side:
Firefox: Firebug plugin (F12)
Safari, Chrome: Developer Tools (Ctrl-Shift-i)
Opera: DragonFly (Ctrl-Shift-i)
Server Side:
telnet, nc — handy for raw, hand-typed requests
curl, wget — for more complex HTTP requests
Firebug for HTTP-debug
Debugging HTTP with WebKit
Debugging HTTP with Opera
All text-oriented protocols can be debugged via telnet
Telnet
● HTTP
● Memcache
● Redis
● POP3
● STOMP
Debugging HTTP via telnet
cris@home:/home/cris telnet google.com.ua 80
Trying 74.125.87.103...
Connected to google.com.ua.
Escape character is '^]'.
GET http://www.google.com.ua/ HTTP/1.0
Host: www.google.com.ua
HTTP/1.0 200 OK
Date: Mon, 17 May 2010 20:37:38 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=windows-1251
Set-Cookie: PREF=ID=a2699d840401b148:TM=1274128658:LM=1274128658:S=nCiLvHHjeNDpnIFo;
expires=Wed, 16-May-2012 20:37:38 GMT; path=/; domain=.google.com.ua
Server: gws
X-XSS-Protection: 1; mode=block
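The telnet session above can be reproduced from Ruby with a plain TCP socket, which is convenient when the check has to be repeated. This is a minimal sketch, not a full HTTP client; `raw_http_get` and `parse_head` are hypothetical helpers, and repeated headers such as Set-Cookie are collapsed to the last value.

```ruby
require "socket"

# Send a hand-written HTTP/1.0 GET and return the raw response text,
# headers included, much as a telnet session would show it.
def raw_http_get(host, port, path = "/")
  socket = TCPSocket.new(host, port)
  socket.write("GET #{path} HTTP/1.0\r\nHost: #{host}\r\n\r\n")
  socket.read   # an HTTP/1.0 server closes the connection when it is done
ensure
  socket.close if socket
end

# Split a raw response into its status line and a hash of headers.
def parse_head(raw)
  head = raw.split("\r\n\r\n", 2).first.to_s
  status, *header_lines = head.split("\r\n")
  headers = header_lines.map { |line| line.split(": ", 2) }
                        .select { |pair| pair.size == 2 }
                        .to_h
  [status, headers]
end
```

A call might look like `status, headers = parse_head(raw_http_get("127.0.0.1", 3000, "/users"))`, after which `headers["Content-Type"]` and `headers["Set-Cookie"]` can be inspected for the encoding and session problems listed earlier.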
http://www.faqs.org/rfcs/rfc793.html
tcpdump
where:
-i — interface to listen on
-s — number of bytes to capture from each packet (default only 68 bytes)
-A — print packet contents in ASCII
tcp port 3000 is the filter expression (see man pcap-filter)
tcpdump in action
cris@home:/home/cris sudo tcpdump -iany -A -s 0 tcp port 3000
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
22:52:15.133580 IP localhost.53478 > localhost.3000: Flags [S], seq 19847865, win 32792, options [mss
16396,sackOK,TS val 23239272 ecr 0,nop,wscale 6], length 0
E..<@.@.@...........................;=....@....
.b.h........
22:52:15.133595 IP localhost.3000 > localhost.53478: Flags [S.], seq 18122975, ack 19847866, win 32768,
options [mss 16396,sackOK,TS val 23239272 ecr 23239272,nop,wscale 6], length 0
E..<..@.@.<...............................@....
.b.h.b.h....
22:52:15.133607 IP localhost.53478 > localhost.3000: Flags [.], ack 1, win 513, options [nop,nop,TS val
23239272 ecr 23239272], length 0
E..4@.@.@..................................
.b.h.b.h
22:52:15.133630 IP localhost.53478 > localhost.3000: Flags [P.], seq 1:1423, ack 1, win 513, options
[nop,nop,TS val 23239272 ecr 23239272], length 1422
E...@.@.@..`...............................
.b.h.b.hGET /users/4?_=1274298735118 HTTP/1.0
X-Real-IP: 127.0.1.1
X-Forwarded-For: 127.0.1.1
Host: 127.0.0.1
Connection: close
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100401 Ubuntu/9.10 (karmic)
Firefox/3.5.9
Accept: */*
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
X-Requested-With: XMLHttpRequest
Referer: http://127.0.0.1/users
Cookie: __utma=38596661.1128088787.1267800550.1274193594.1274197735.24;
tcpdump debugging use-cases
● Check network activity on a selected port or interface
● Check whether a connection is really being used
● Check exactly what we send
● Capture all the data that is sent
tcpdump advanced usage example
require 'rubygems'
require 'ruby-debug'
debugger
puts "hello debug world"
Ruby-debug: commands
n — execute the next statement
s — step into the statement
p expression — evaluate 'expression' and print the result
l — list the source code around the current line
Enter — repeat the last command
h — help
h command — help for a particular command
ruby-debug with Rails
Git: http://github.com/tmm1/gdb.rb
Slides: http://www.slideshare.net/tmm1/debugging-ruby
Division by half
● Memory leaks
● Performance bottlenecks
● Process crash
Division by half: mongrel leaks
1. Divide by half at the controller level
2. Divide by half at the action level
3. Divide by half at the source code level
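The three steps above are the same bisection applied at different levels. A rough sketch of one level follows, assuming exactly one culprit and a check that gives the same answer every time; the controller names and the `leaky` variable are made up to simulate the check.

```ruby
# Find the single culprit in `suspects` by repeatedly testing one half.
# The block must return true when the problem reproduces with only the
# given subset exercised.
def bisect(suspects)
  candidates = suspects.dup
  while candidates.size > 1
    half = candidates.first(candidates.size / 2)
    # Keep the half that reproduces the problem, otherwise the other half.
    candidates = yield(half) ? half : candidates.drop(half.size)
  end
  candidates.first
end

controllers = %w[users posts comments uploads search admin]
leaky = "uploads"   # hypothetical culprit, used only to simulate the check
checks = 0

found = bisect(controllers) do |subset|
  checks += 1
  # In practice: send traffic only to these controllers and watch memory.
  subset.include?(leaky)
end

puts "culprit: #{found} (#{checks} checks)"
```

Once the controller is known, the same function can be run over its actions, and then over blocks of source within the action.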
Memory leaks
Memory leaks in the wild
● Using the 'load' method
● Working with big data arrays
● Selecting big data arrays from DB
load 'file'
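One plausible way `load` contributes to growth is that it re-executes the file on every call, so anything the file appends to a long-lived structure is appended again. The sketch below contrasts it with `require`; the global `$registered` and the temporary plugin file are assumptions made for the demonstration.

```ruby
require "tempfile"

$registered = []   # stands in for any long-lived structure a file adds to
after_load = nil
after_require = nil

Tempfile.create(["plugin", ".rb"]) do |file|
  file.write("$registered << Object.new\n")
  file.flush

  3.times { load file.path }      # the file body runs on every call
  after_load = $registered.size

  3.times { require file.path }   # the file body runs at most once
  after_require = $registered.size
end

puts "after load: #{after_load}, after require: #{after_require}"
```

The three `load` calls add three objects, while the three `require` calls add only one more, because `require` records the path in `$LOADED_FEATURES` and skips it afterwards.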
C-level:
Valgrind
File system debugging
● Leaked file descriptors (forgetting to close a file)
● Leaked tempfiles
● Monitoring read/write file operations
● Monitoring filesystem activity
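For the first two items, the usual Ruby remedy is the block form of `File.open` and `Tempfile.create`, which closes (and, for tempfiles, removes) the file when the block ends. A small sketch under that assumption, with `leaked` and `scoped` as illustrative variable names:

```ruby
require "tempfile"

leaked = nil
scoped = nil

# Block form of Tempfile.create: the tempfile is removed when the block ends.
Tempfile.create("demo") do |tmp|
  tmp.write("data")
  tmp.flush

  # Leak-prone: the descriptor stays open until close is called
  # or the garbage collector finalizes the object.
  leaked = File.open(tmp.path)
  leaked.read

  # Safer: the block form closes the file even if the block raises.
  File.open(tmp.path) do |f|
    scoped = f
    f.read
  end

  puts "leaked closed? #{leaked.closed?}, scoped closed? #{scoped.closed?}"
  leaked.close   # explicit cleanup so the demo itself leaks nothing
end
```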
lsof — list open files
lsof — netstat for files
lsof
● Lets you see all open files for a particular:
● User
● Process
● Directory or any device
● And also all open sockets
● Lets you find out which process is using a given file
● And much more... (see man lsof)
lsof or netstat
sudo lsof -i -U -n
works like:
where:
-s — number of bytes to show
-e expression — expression for event filtering
Logfile: dealing with group