Deleting many large files spikes filer CPU

From: Stephen C. Losen (scl@sasha.acc.virginia.edu)
Date: Fri Jan 25 2008 - 08:02:59 EST


    We have a fairly heavily loaded FAS960c pair that contains storage
    for our University wide email system. Most of the email storage
    is NFS files with the email servers running Unix and Communigate Pro.
    We are transitioning to MS Exchange, so these filers also have some
    FC SAN LUNs for our emerging Exchange service.

    The other day we cleaned up about a hundred NFS email inboxes, averaging
    about 100 MB each, though a few were approaching 1 GB. We removed the files
    from an NFS client, and immediately after the rm command returned, we
    experienced a serious performance problem on the 960s.

    sysstat indicated that the CPU was pegged at near 100% while all I/O
    throughput (network, disk, FC SAN) and all file ops (NFS, FCP) dropped to
    almost nothing. Something grabbed the filer CPU for a minute or two, which
    seriously impacted all of our email servers. We had to restart them all.

    I suspect that the CPU load was caused by processing related to reclaiming
    the disk blocks freed by the file deletes. But no blocks were actually
    freed, because the volume had snapshots that were newer than the deleted
    files. Perhaps the number of snapshots (41) was a factor.

    I opened a case with NetApp on this, but reproducing the problem would have
    dire consequences for our production email systems, so we can't send them
    performance metrics.

    I checked the bug database online on NOW and didn't find anything that
    seemed to apply and wasn't already marked fixed. I did see a very old bug
    (4157), first fixed in DOT 5.1, where WAFL would deadlock if many large
    files were deleted all at once.

    I was just curious whether anyone else has run into anything like this.
    We are running DOT 7.2.3. In the future when we delete a lot of big
    files, we'll do them one at a time, with sleeps in between.
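    The one-at-a-time approach described above could be scripted on the NFS
    client roughly as follows. This is a minimal sketch, not a tested fix:
    the helper name throttled_delete and the 30-second default pause are
    hypothetical choices, and pacing the removals only spreads out whatever
    work the filer does per delete. As noted above, blocks still held by
    newer snapshots are not actually freed until those snapshots go away.

```python
import os
import time


def throttled_delete(paths, pause_seconds=30.0, sleep=time.sleep):
    """Remove files one at a time, pausing between removals.

    Hypothetical helper: pause_seconds is an arbitrary default, and `sleep`
    is injectable so the pacing can be tested without real delays.
    Returns the list of paths that were actually removed.
    """
    paths = list(paths)
    removed = []
    for i, path in enumerate(paths):
        # Sleep before every removal except the first, so the filer gets
        # a quiet interval between consecutive large deletes.
        if i > 0:
            sleep(pause_seconds)
        try:
            os.remove(path)
        except FileNotFoundError:
            # Already gone (e.g. removed by someone else); skip it.
            continue
        removed.append(path)
    return removed
```

    Typical use would be throttled_delete(list_of_inbox_paths), watching
    sysstat on the filer during the first few removals before letting the
    rest run.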

    Steve Losen scl@virginia.edu phone: 434-924-0640

    University of Virginia ITC Unix Support



    This archive was generated by hypermail 2b29 : Fri Jan 25 2008 - 09:20:58 EST