某个子站是php写的,访问的时候nginx时不时会冒出现502错误,高峰时更频繁,检查php-fpm的日志发现大量的 child exited on signal 7 (SIGBUS),并且和accesslog里的502时间完全吻合,排除了php进程过载的可能,然后又排除了apc的嫌疑。
1,安装strace
yum install strace -y
2,查看php-fpm进程
vagrant@vagrant-ubuntu-precise-64:~$ ps -ef | grep php-fpm
root 2105 1 0 04:02 ? 00:00:02 php-fpm: master process (/etc/php5/fpm/php-fpm.conf)
www-data 2113 2105 0 04:02 ? 00:00:02 php-fpm: pool www
www-data 18481 2105 0 07:05 ? 00:00:01 php-fpm: pool www
www-data 18513 2105 0 07:06 ? 00:00:03 php-fpm: pool www
vagrant 19312 6379 0 10:14 pts/4 00:00:00 grep --color=auto php-fpm
3,调试进程输出日志到文件
vagrant@vagrant-ubuntu-precise-64:~$ strace -f -ff -t -d -p 60798 -e trace=file -o /root/trace/trace.log
Process 2105 attached - interrupt to quit
Process 19349 attached
Process 19350 attached
4,查看日志文件
tail -f /temp/trace.log
14:31:33 open("/www/web/hexinv1/api_web/ThinkPHP/Library/Think/Template/TagLib.class.php", O_RDONLY) = 5
14:31:33 stat("./Application/Runtime/Cache/Erp", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
14:31:34 getcwd("/www/web/hexinv1/api_web", 4096) = 25
14:31:34 open("/www/web/hexinv1/api_web/Application/Runtime/Cache/Erp/ccba9ab6388cca08ce1c762cf5ac043e.php", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 5
14:31:34 getcwd("/www/web/hexinv1/api_web", 4096) = 25
14:31:34 open("/www/web/hexinv1/api_web/Application/Runtime/Cache/Erp/ccba9ab6388cca08ce1c762cf5ac043e.php", O_RDONLY) = 5
14:31:34 --- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRERR, si_addr=0x7f635b686000} ---
14:31:34 +++ killed by SIGBUS +++
5.后端修改缓存,将框架日志关闭后恢复正常,再也没有出现 child exited on signal 7 (SIGBUS)