[GTALUG] On the subject of backups.

nick xerofoify at gmail.com
Mon May 4 19:28:10 EDT 2020



On 2020-05-04 2:12 p.m., Alvin Starr via talk wrote:
> On 5/4/20 1:26 PM, Lennart Sorensen via talk wrote:
>> On Mon, May 04, 2020 at 04:38:28PM +0200, ac via talk wrote:
>>> Hi Alvin,
>>>
>>> On a 2TB dataset, with +-600k files, I have piped tree to less with
>>> limited joy, it took a few hours and at least I could search for
>>> what I was looking for... - 15TB and 100M is another animal though
>>> and as disk i/o will be your bottleneck, anything will take long, no?
>>>
>>> now, for my own info/interest, can you tell me which fs is used for this
>>> ext3?
>> Hmm, sounds awful slow.
>>
>> Just for fun I ran find on one of my drives:
>>
>> # time find /data | wc -l
>> 1825463
>> real    3m57.208s
>>
>> That is with 5.3T used out of 6.0TB.
>>
>> Running it a second time when it is cached takes 7.7s.  Tree takes 14.7s.
>>
>> Another volume:
>> # time find /mythdata | wc -l
>> 54972
>>
>> real    0m1.924s
>>
>> That is with 15 TB out of 15 TB in use (yes that one always fills up
>> for some reason).
>>
>> Both of those are lvm volumes with ext4 on top of software raid6 using
>> 5400rpm WD red drives.
>>
>> Seems either XFS is unbelievably bad, or there isn't enough RAM to cache
>> the filesystem metadata if you are having a problem with 100M files.
>> I only have a measly 32GB in my home machine.
> 
> I believe the directory hierarchy has a lot to do with the performance.
> It seems that the listing time is non-linear, although I do not believe it's an N^2 kind of problem.
> I would have said the same as you before I started having to deal with tens of millions of files.
> 
> 
> 

The first question I would ask is how large the actual files are compared with the space they take up on disk.
Some filesystems pack small files, or the tails of files, into shared blocks or into metadata so that a file
smaller than a block does not waste a whole one. Assuming a) the files are fairly small and b) there are a great
many of them, I would be curious whether ReiserFS or btrfs helps, since both can merge small files or tails into
metadata blocks. Packing the small files more tightly can help, because disk seeks seem to be the problem here;
my own disks slow to something like 10 MB/s on ext4 with lots of small files because of all the seeking.
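If it were my data I would start by getting a rough picture of apparent size versus allocated space, something
like the following with GNU find and awk (treat /data as a stand-in for the real mount point, this is only a
sketch):

# find /data -type f -printf '%s %b\n' | \
    awk '{sz += $1; blk += $2 * 512} END {print "apparent:", sz, "allocated:", blk, "files:", NR}'

The %s field is the logical file size and %b is allocated 512-byte blocks, so a big gap between the two totals
would confirm the files are mostly smaller than a block, which is exactly the case where tail packing pays off.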

The other question, as already pointed out in this thread, is how much memory the page cache and the other
kernel caches are getting. I would start by checking /proc/meminfo; not having enough RAM to cache the
filesystem metadata may well be the explanation that was already suggested.
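A minimal check would be something like this (run as root; the exact slab names vary between kernels and
filesystems, so take it as a sketch):

# grep -E 'MemTotal|MemFree|^Cached|SReclaimable|Slab' /proc/meminfo
# slabtop -o | head -20

Cached and SReclaimable show how much of RAM is going to the page cache and to reclaimable slab (dentries and
inodes), and slabtop shows whether the dentry and inode caches are anywhere near big enough to hold the
metadata for 100M files.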

Maybe that helps,

Nick

