Re: Deleting many large files spikes filer CPU

From: Blake Golliher (thelastman@gmail.com)
Date: Fri Jan 25 2008 - 13:07:52 EST


    I've seen this too. It can happen with large file deletions and with
    many, many small file deletions. Mostly it has to do with running out
    of zombie processes to reap the deletes. As Chris said, your best bet
    is to delete slowly and cautiously.
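Steve's plan of deleting the files one at a time with sleeps in between is easy to script. This is a minimal sketch in Python, assuming the files are removed from an NFS client. The script name `slow_delete.py`, the function `slow_delete`, and the 30-second default pause are hypothetical choices for illustration, not NetApp guidance. The script only spaces out the deletes; this thread does not establish whether that avoids the CPU spike on a given Data ONTAP release.

```python
#!/usr/bin/env python3
"""Delete files one at a time, pausing between removals."""
import os
import sys
import time

# Hypothetical default pause; tune it while watching the filer with sysstat.
PAUSE_SECONDS = 30


def slow_delete(paths, pause=PAUSE_SECONDS, sleep=time.sleep):
    """Remove each path in turn, sleeping between removals.

    Returns the list of paths that were removed, in order.
    """
    removed = []
    for i, path in enumerate(paths):
        if i > 0:
            # Give the filer time to process the previous delete.
            sleep(pause)
        os.remove(path)
        removed.append(path)
        print(f"removed {path}", flush=True)
    return removed


if __name__ == "__main__":
    slow_delete(sys.argv[1:])
```

Run it as, for example, `python3 slow_delete.py /path/to/inbox1 /path/to/inbox2` (placeholder paths). The `sleep` parameter is injectable so the pacing can be checked without actually waiting.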

    -Blake

    On Jan 25, 2008 6:50 AM, Chris Blackmor <chris.blackmor@amd.com> wrote:
    > There is a known issue regarding large file deletions. I know that
    > NetApp is actively working on this, but I cannot speak to an ETA on its fix.
    >
    > The work around at this point is "don't do that", or at least,
    > "don't do that all at once". Yes, it does seem silly but until
    > they have a fix for this, that's all anyone can do. Your workaround
    > is the "right" thing at this point.
    > C-
    >
    >
    > Stephen C. Losen wrote:
    > > We have a fairly heavily loaded FAS960c pair that contains storage
    > > for our University wide email system. Most of the email storage
    > > is NFS files with the email servers running Unix and Communigate Pro.
    > > We are transitioning to MS Exchange, so these filers also have some
    > > FC SAN LUNs for our emerging Exchange service.
    > >
    > > The other day we cleaned up about a hundred NFS email inboxes, averaging
    > > about 100 MB each, though a few were approaching 1 GB. We removed the files
    > > from an NFS client, and immediately after the rm command returned, we experienced a
    > > serious performance problem on the 960s.
    > >
    > > sysstat indicated that the CPU was pegged at near 100% while all I/O
    > > throughput (network, disk, FC SAN) and all file ops (NFS, FCP) dropped to
    > > almost nothing. Something grabbed the filer CPU for a minute or two, which
    > > seriously impacted all of our email servers. We had to restart them all.
    > >
    > > I suspect that the CPU load was caused by some processing having to do with
    > > recovering disk blocks freed by the file deletes. But no blocks were
    > > actually freed because the volume had snapshots that were newer than the
    > > deleted files. Perhaps the number of snapshots (41) was a factor.
    > >
    > > I opened a case with NetApp on this, but repeating the problem will have
    > > dire consequences on our production email systems, so we can't send them
    > > performance metrics.
    > >
    > > I checked bugs online on NOW and didn't find anything that seemed to apply
    > > that wasn't marked fixed. I did see a very old bug (4157) first fixed in
    > > DOT 5.1, where WAFL would deadlock if many large files were deleted all at
    > > once.
    > >
    > > I was just curious if anyone else has run into anything like this.
    > > We are running DOT 7.2.3. In the future when we delete a lot of big
    > > files, we'll do them one at a time, with sleeps in between.
    > >
    > > Steve Losen scl@virginia.edu phone: 434-924-0640
    > >
    > > University of Virginia ITC Unix Support
    >
    > --
    > Chris Blackmor
    > Advanced Micro Devices
    > Phone: (512) 602-1608
    > Fax: (512) 602-5155
    > Email: chris.blackmor@amd.com
    > "A good horse never comes in a bad color!" (Author Unknown)
    > My comments are mine, and mine alone.



    This archive was generated by hypermail 2b29 : Fri Jan 25 2008 - 13:44:20 EST