[GTALUG] logrotate problem

Giles Orr gilesorr at gmail.com
Tue Oct 29 13:12:08 EDT 2019


On Thu, 17 Oct 2019 at 13:59, Giles Orr <gilesorr at gmail.com> wrote:

>
> We have a bunch of new(ish) Debian 10 VMs, and logrotate is failing to
> rotate our non-standard logs.  Unfortunately we deleted all the old Debian
> 9 VMs before I noticed this problem, so they're not readily available for
> comparison.  The logrotate config files worked fine on Debian 9
> (provisioning is with Ansible, so it's consistent).  The failures aren't
> detailed enough to help.  Here's the config:
>
>     # /etc/logrotate.d/ruby
>     /opt/rubyapp/log/*.log {
>             daily
>             missingok
>             rotate 28
>             compress
>             delaycompress
>             copytruncate
>     }
>
> The parent configuration is standard Debian 10:
>
>     # /etc/logrotate.conf
>     # (system-supplied comments removed)
>     weekly
>     rotate 4
>     create
>     include /etc/logrotate.d
>
> Unfortunately my paranoia is such that I'm redacting or modifying machine
> names and folder names ... I apologize for that.  But I don't think the
> path involved is the problem.
>
> Here's one of the errors:
>
>     # systemctl status logrotate.service
>     ● logrotate.service - Rotate log files
>        Loaded: loaded (/lib/systemd/system/logrotate.service; static; vendor preset: enabled)
>        Active: failed (Result: exit-code) since Thu 2019-10-17 00:00:17 EDT; 12h ago
>          Docs: man:logrotate(8)
>                man:logrotate.conf(5)
>       Process: 29004 ExecStart=/usr/sbin/logrotate /etc/logrotate.conf (code=exited, status=1/FAILURE)
>      Main PID: 29004 (code=exited, status=1/FAILURE)
>
>     Oct 17 00:00:01 acctserver systemd[1]: Starting Rotate log files...
>     Oct 17 00:00:14 acctserver logrotate[8621]: error: unable to open /opt/rubyapp/log/newrelic_agent.log.1 for compression
>     Oct 17 00:00:14 acctserver logrotate[8621]: error: unable to open /opt/rubyapp/log/puma.stderr.log.1 for compression
>     Oct 17 00:00:14 acctserver logrotate[8621]: error: unable to open /opt/rubyapp/log/puma.stdout.log.1 for compression
>     Oct 17 00:00:14 acctserver logrotate[8621]: error: unable to open /opt/rubyapp/log/traffic.log.1 for compression
>     Oct 17 00:00:17 acctserver systemd[1]: logrotate.service: Main process exited, code=exited, status=1/FAILURE
>     Oct 17 00:00:17 acctserver systemd[1]: logrotate.service: Failed with result 'exit-code'.
>     Oct 17 00:00:17 acctserver systemd[1]: Failed to start Rotate log files.
>
> Here's the folder contents:
>
>     # cd /opt/rubyapp/log
>     # ls -l
>     -rw-rw-r--+ 1 root root        1982 Oct 16 15:08 newrelic_agent.log
>     -rw-rw-r--+ 1 root root        7194 Oct 16 13:37 newrelic_agent.log.1
>     -rw-rw-r--+ 1 root root        2549 Oct 10 17:45 newrelic_agent.log.2.gz
>     -rw-rw-r--+ 1 root root      154290 Oct 17 12:34 puma.stderr.log
>     -rw-rw-r--+ 1 root root      573253 Oct 16 13:37 puma.stderr.log.1
>     -rw-rw-r--+ 1 root root      512648 Oct 10 17:45 puma.stderr.log.2.gz
>     -rw-rw-r--+ 1 root root         238 Oct 16 15:08 puma.stdout.log
>     -rw-rw-r--+ 1 root root         722 Oct 16 13:37 puma.stdout.log.1
>     -rw-rw-r--+ 1 root root         701 Oct 10 17:45 puma.stdout.log.2.gz
>     -rw-rw-r--+ 1 root root  4747006453 Oct 17 12:37 traffic.log
>     -rw-rw-r--+ 1 root root 15668065757 Oct 10 17:55 traffic.log.1
>     -rw-rw-r--+ 1 root root   850646513 Sep 20 18:12 traffic.log.2.gz
>
> I note that in /var/log/ - where logrotate continues to work fine - files
> are mostly owned by root:adm (what is 'adm', and does it matter in this
> context?) and the permissions are 640 rather than 664.  There are ACLs
> attached to the files/folder shown above ... does _that_ matter?  Where
> this gets weirder is that if I run 'logrotate --force
> /etc/logrotate.d/ruby' the logs get rotated fine.  It works when run by
> hand, but fails when run from a systemd timer.  That suggests a difference
> in permissions, but don't timers run as root:root?
>
> Any thoughts appreciated.  As you can see, these are damn big logs, and we
> have this problem across multiple machines so I'd really like to fix it ...
>
> Errors on other servers aren't always consistent with this: a fix for this
> may or may not help with them, so I may be coming back for more.
>
> Thanks all.
>

I thought I should report the slightly unsatisfactory conclusion to this
month-plus epic fight.

I'm still fairly sure that the problem is caused by the ACLs applied to the
folder and files.  That would suggest that the systemd timer that runs
'logrotate' each night has some weird not-quite-root permissions.  I did
try to use logrotate's "su" directive to run logrotate as the user/group
that has the added perms.  This didn't work either.  For all that, I never
conclusively proved it was a perms or ACLs issue.
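
For reference, a minimal sketch of what the "su" attempt looks like in the
config.  The names 'appuser' and 'appgroup' are placeholders (the real
user/group are redacted), and as noted above this did not fix rotation
under the timer:

    # /etc/logrotate.d/ruby
    /opt/rubyapp/log/*.log {
            # 'appuser' and 'appgroup' are hypothetical names
            su appuser appgroup
            daily
            missingok
            rotate 28
            compress
            delaycompress
            copytruncate
    }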

Having previously and repeatedly proven that running logrotate by hand as
root always successfully rotated the logs, I moved /etc/logrotate.d/ruby to
/etc/logrotate.ruby.conf and it's now run from a nightly root cron job
instead of the systemd timer.  Essentially I've reverted the behaviour -
for the one application log folder - from Debian 10 to Debian 9.  <sigh>
I'm not happy about adding complexity in the form of an oddball exception,
but with Ansible automation it's not too hard to maintain.
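
A minimal sketch of the cron side, assuming a file under /etc/cron.d.  The
file name 'logrotate-ruby' and the 00:15 run time are illustrative
assumptions, not the actual values in use:

    # /etc/cron.d/logrotate-ruby  (hypothetical file name and schedule)
    # m  h  dom mon dow  user  command
    15   0  *   *   *    root  /usr/sbin/logrotate /etc/logrotate.ruby.conf

Because /etc/logrotate.ruby.conf sits outside /etc/logrotate.d, the
'include' line in /etc/logrotate.conf no longer picks it up, so only the
cron job handles these logs.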

-- 
Giles
https://www.gilesorr.com/
gilesorr at gmail.com