[GTALUG] On the subject of backups.

Alvin Starr alvin at netvel.net
Mon May 4 14:12:39 EDT 2020


On 5/4/20 1:26 PM, Lennart Sorensen via talk wrote:
> On Mon, May 04, 2020 at 04:38:28PM +0200, ac via talk wrote:
>> Hi Alvin,
>>
>> On a 2TB dataset, with +-600k files, I have piped tree to less with
>> limited joy, it took a few hours and at least I could search for
>> what I was looking for... - 15TB and 100M is another animal though
>> and as disk i/o will be your bottleneck, anything will take long, no?
>>
>> now, for my own info/interest, can you tell me which fs is used for this
>> ext3?
> Hmm, sounds awful slow.
>
> Just for fun I ran find on one of my drives:
>
> # time find /data | wc -l
> 1825463
> real    3m57.208s
>
> That is with 5.3T used out of 6.0TB.
>
> Running it a second time when it is cached takes 7.7s.  Tree takes 14.7s.
>
> Another volume:
> # time find /mythdata | wc -l
> 54972
>
> real    0m1.924s
>
> That is with 15 TB out of 15 TB in use (yes that one always fills up
> for some reason).
>
> Both of those are lvm volumes with ext4 on top of software raid6 using
> 5400rpm WD red drives.
>
> Seems either XFS is unbelievable bad, or there isn't enough ram to cache
> the filesystem metadata if you are having a problem with 100M files.
> I only have a measly 32GB in my home machine.

I believe the directory hierarchy has a lot to do with the performance.
It seems that the listing time is non-linear, although I do not believe 
it's an N^2 kind of problem.
I would have said the same as you before I started having to deal with 
tens of millions of files.
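
At small scale the hierarchy effect can be probed with a throwaway tree. 
This is a sketch only: the paths and counts below are made up for 
illustration, and at tens of millions of files the gap would be far 
larger than anything visible here.

```shell
# Sketch: build two throwaway trees with the same file count but
# different shapes, then time a full listing of each.
tmp=$(mktemp -d)

# Flat: 1000 files in one directory.
mkdir "$tmp/flat"
for i in $(seq 1 1000); do : > "$tmp/flat/f$i"; done

# Deep: 100 nested directories, 10 files each.
d="$tmp/deep"; mkdir "$d"
for i in $(seq 1 100); do
    d="$d/d$i"; mkdir "$d"
    for j in $(seq 1 10); do : > "$d/f$j"; done
done

time find "$tmp/flat" | wc -l    # 1001 entries (dir + 1000 files)
time find "$tmp/deep" | wc -l    # 1101 entries (101 dirs + 1000 files)
rm -rf "$tmp"
```

Dropping caches between runs (echo 3 > /proc/sys/vm/drop_caches, as 
root) gives cold-cache numbers comparable to a first pass over a real 
volume.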



-- 
Alvin Starr                   ||   land:  (647)478-6285
Netvel Inc.                   ||   Cell:  (416)806-0133
alvin at netvel.net              ||
