Recommendation for open-source Nginx log analyzer for website analytics

I am looking for a solution (preferably a CLI) that can analyze my logs and give me important statistics about my website, which uses nginx as the web server. The top data points I'm interested in are:

* Total no. of requests
* Total no. of unique IPs
* URLs of the most frequent requests
* Total downtime vs uptime

I am currently evaluating GoAccess [https://github.com/allinurl/goaccess](https://github.com/allinurl/goaccess)
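
For reference, an invocation along these lines (assuming the stock combined log format) seems to be the basic usage: `goaccess /var/log/nginx/access.log --log-format=COMBINED` for the terminal dashboard, or add `-o report.html` for a static HTML report.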

**Is there any better and simpler solution out there?**

I'd prefer a couple of bash commands that can get the above data without any external dependency, if possible. For example, I can get the number of unique IPs with the following command: `sudo awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | wc -l` (although I'm not sure whether access.log is missing historical data, as I see many other log files in the nginx log folder).
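
For the historical data, something like this might work to also include the rotated logs (assuming GNU gzip's `zcat -f`, which passes uncompressed files through unchanged): `sudo sh -c 'zcat -f /var/log/nginx/access.log*' | awk '{print $1}' | sort -u | wc -l`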

2 thoughts on “Recommendation for open-source Nginx log analyzer for website analytics”

  1. Actually, thank you for your suggestion. GoAccess looks pretty useful. The older ones I know of turn up if you search for Apache log analyzers, and I believe they all require you to fire up a browser in some way. I'm sure you can adjust/change your logging format to work with these analyzers as well. I'll list them in case you find them handy. I haven't set up or used analytics in a long time, so I don't know whether they suit your needs, but they should retain historical data, provided it is still present in your current logs.

    * https://awstats.sourceforge.io/
    * https://www.c-amie.co.uk/software/analog/
    * Matomo (formerly Piwik), an alternative to Google Analytics: https://matomo.org/free-software/

    There's also Webalizer, but its site didn't load when I looked it up.

  2. Don’t forget about Google Analytics — it’s very good and keeps getting better and more useful

    To be honest, I use grep, awk, cut, sort, uniq, and `ccze -A` (to colorize output) on the command line way more than anything else. It's worth learning them and their key options – it will pay off many times over.

    A few quick command-line examples:

    # case-insensitive match for 'error' in nginx access.log and colorize output

    `grep -i error /var/log/nginx/access.log | ccze -A`

    # how many 404 errors today?

    `grep "\" 404 " /var/log/nginx/access.log | grep "18/Jul/2020" | wc -l`

    # what caused the 404 errors, how many times did each one happen, sorted by count, and colorized

    `grep "\" 404 " /var/log/nginx/access.log | grep "18/Jul/2020" | cut -d \" -f 2 | sort | uniq -c | sort -rh | ccze -A`

    and to answer your examples:

    # Total # of http requests

    `grep "18/Jul/2020" /var/log/nginx/access.log | wc -l`

    # Total # of http requests that generated a 200 response code:

    `grep "18/Jul/2020" /var/log/nginx/access.log | grep "\" 200 " | wc -l`

    # total number of unique IPs for today's requests (if you're behind a firewall or VPC, your webserver needs to be configured properly to forward customer IPs, and customers could be behind firewalls too, so… don't put too much weight on this stat)

    `grep "18/Jul/2020" /var/log/nginx/access.log | awk '{print $1}' | sort -u | wc -l`

    # unique IPs today, sorted by # of requests

    `grep "18/Jul/2020" /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -rh`
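
    If there's a proxy or load balancer in front and your `log_format` appends `$http_x_forwarded_for` as the last field (an assumption – check your nginx config; the value can also be a comma-separated list), the same idea works on that field instead of `$1`:

    `grep "18/Jul/2020" /var/log/nginx/access.log | awk '{print $NF}' | sort -u | wc -l`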

    # URLs of today's requests – do you mean the referrer URL or the request?

    top 20 referrer URLs for today:

    `grep "18/Jul/2020" /var/log/nginx/access.log | cut -d \" -f 4 | sort | uniq -c | sort -rh | head -20`

    top 20 webserver requests for today:

    `grep "18/Jul/2020" /var/log/nginx/access.log | cut -d \" -f 2 | sort | uniq -c | sort -rh | head -20`
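
    The date is hardcoded in all of the above; to always match the current day (nginx's default `$time_local` starts with day/month/year in that same format), a substitution along these lines should work – `LC_ALL=C` keeps the month abbreviation in English, which is what nginx writes:

    `grep "$(LC_ALL=C date +%d/%b/%Y)" /var/log/nginx/access.log | wc -l`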

    ===

    Now, total downtime vs. uptime… that’s not as easy as it looks.

    server uptime tracking: `apt-get install uptimed`; then run `uprecords` – there's your server uptime

    webserver uptime – probably easiest to assume this is equal to the above, assuming the webserver autostarts. If your webserver is mysteriously dying… then you have a whole other problem… you can monitor a running process using `supervisord`, `monit`, or a slew of other tools
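
    As a crude, cron-able stand-in for those tools (just a sketch – it assumes systemd and the stock `nginx` service name):

    # restart nginx if no nginx process is running

    `pgrep -x nginx > /dev/null || systemctl restart nginx`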

    What you really care about monitoring is APPLICATION availability. If your app is written in PHP and uses a MySQL backend, then you need a 'ping' that involves an HTTP request to a PHP file that does the simplest database query. If it succeeds, then all is good. If it fails, it doesn't matter that your webserver is running… Run this as often as you feel necessary, and set up alerting or autoremediation if it fails.
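
    A sketch of such a ping from cron (the `/health.php` endpoint and the alert address are hypothetical, and `mail` assumes mailx is installed):

    # alert if the app health endpoint doesn't answer with HTTP 2xx within 10 seconds

    `curl -fsS --max-time 10 https://example.com/health.php > /dev/null || echo "app health check failed" | mail -s "app down" admin@example.com`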

