After alerts fired and users complained that the system was not responsive, I took a closer look and saw hundreds of concurrent users from all over the world. My first reaction was: wow, what happened? Did we get mentioned on some high-volume site?
Well, the bad news was that most of that traffic came from script kiddies. The usual approach is to fight them and block them with every measure you have; I wrote an article on that.
But this attacker seems able to avoid those traps. I still need to figure out how they did it, but that will be for another blog post.
Instead, I decided to take up the challenge. My poor system was sitting at 80 to 90% CPU usage, using 12GB of memory, and swapping like hell.
While the system was reasonably tuned (or so I thought), it was clearly not up to this challenge. I spent hours tuning, mostly on the database side, and got it to handle the traffic at an easy 1 to 2% CPU usage and just 3GB of RAM, down from 12GB (and no more swapping).
What were the most important findings?
- Disable memcached. It should provide better performance, but on a single system with an SSD and a well-tuned MySQL InnoDB engine it turned out it doesn't.
- Although the main database is InnoDB and reasonably tuned, MySQL also uses a system-only database (mysql) that is MyISAM, and that needed attention too. Increasing key_buffer_size up to the size of the MyISAM data made a huge impact.
- Increasing query_cache_size to exceed the size of the database itself also made quite an impact on performance.
- Increasing max_connections to 512, since my value of 256 was insufficient and caused some users to get timeouts or "lost DB connection" errors in the logs. This usually increases memory use a lot, since buffers are allocated per connection; reducing those buffers to a more suitable level made the extra connections possible without sacrificing memory.
- Reducing innodb_buffer_pool_size to just beyond the size of the database saved me a lot of memory without impacting performance. The mysqld performance guidance recommends using 80 to 90% of your available system memory for it, but that is not a good recommendation here: sizing it just big enough to hold the whole database is more than enough.
- Reduce logging. PHP was logging a lot of unnecessary stuff to the database, growing the total DB to 10 times its true size and therefore forcing me to keep a large buffer pool. I also reduced Apache logging and instructed PHP to log only errors, not warnings.
- While I was already offloading traffic, I decided to try Cloudflare (http://www.cloudflare.com). They offer free support for SSL-based websites and lots of nice features to help you improve performance. You would think this was the best move of all. It wasn't: since the site is very dynamic, it did not make a huge impact. It did reduce round-trip delay for users, though, so it helped them anyway.
- Using OPcache for opcode caching was already in place, but it's recommended anyway.
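Pulling the MySQL settings above together, a my.cnf fragment might look like the sketch below. The exact sizes are assumptions for a small single-server setup, not my actual values; they must be measured against your own database:

```ini
# /etc/mysql/my.cnf -- sketch only; all sizes are assumptions, tune to your data
[mysqld]
# InnoDB buffer pool: just large enough to hold the whole database,
# not the 80-90% of RAM the generic guidance suggests.
innodb_buffer_pool_size = 2G

# Key buffer covering the MyISAM tables in the `mysql` system database.
key_buffer_size = 64M

# Query cache sized to exceed the data set.
query_cache_type = 1
query_cache_size = 256M

# More connections, but smaller per-connection buffers so memory stays flat.
max_connections  = 512
sort_buffer_size = 256K
read_buffer_size = 128K
join_buffer_size = 256K
```

Remember that sort_buffer_size, read_buffer_size, and join_buffer_size can each be allocated per connection, so 512 connections multiply whatever you set here.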
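To pick sensible values for key_buffer_size and innodb_buffer_pool_size, you first need to know how much data each storage engine actually holds. A standard way to measure that is a query against information_schema (the schema filter is an example, adjust as needed):

```sql
-- Data + index size per storage engine, in MB
SELECT engine,
       ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS total_mb,
       ROUND(SUM(index_length) / 1024 / 1024, 1)               AS index_mb
FROM information_schema.tables
WHERE table_schema NOT IN ('performance_schema')
GROUP BY engine;
```

The MyISAM index_mb figure is what key_buffer_size should cover; the InnoDB total_mb is the floor for innodb_buffer_pool_size.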
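On the PHP side, the logging and OPcache points above map to a handful of php.ini directives. This is a sketch with assumed values, not my exact configuration:

```ini
; php.ini -- sketch; values are assumptions
; Log only errors, not warnings or notices.
error_reporting = E_ERROR
log_errors      = On
display_errors  = Off

; OPcache: keep compiled scripts in shared memory so PHP
; doesn't recompile them on every request.
opcache.enable                = 1
opcache.memory_consumption    = 128
opcache.max_accelerated_files = 10000
opcache.validate_timestamps   = 1
```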
The final result: while the script kiddies were still doing their dirty work, my system sat at 1 to 2% CPU usage, page load time was cut by 50% (while under attack), and memory use was down to only 3GB (from the 12GB I started with).
I am not convinced I could handle much larger attacks, and I still need to shut down the useless traffic, but I learned a lot and now know for sure that anyone can improve their site's performance if they are willing to.
If you want us/me to do it for you, you know how to find me. Use the contact form at https://www.centillien.com/contact/NetCare for further information and quotes.
That's all for today.
NetCare is an experienced ICT organization that delivers practical and workable solutions in the areas of secondment, software development, and our social intranet application MyVox, in order to add...