Re: snapmirror job "hung" and abort won't

From: R.P. Aditya (aditya@grot.org)
Date: Wed Feb 06 2008 - 09:03:11 EST

    On Wed, Feb 06, 2008 at 10:39:03AM +0100, De Wit Tom (Consultant) wrote:
    > Did you already try to put the destination volume offline ? Normally
    > this also breaks all running transfers (or aborting ones) to that
    > volume.

    Thanks, tried that; it didn't help. I suspect the problem is on the source
    filer, probably a lock that isn't getting cleaned up, since the snapshots
    for the transfer are locked too.

    Thanks,
    Adi

    > This helped me a few times with hanging transfers ...
    >
    > Grtz,
    > Tom
    >
    > -----Original Message-----
    > From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com]
    > On Behalf Of R.P. Aditya
    > Sent: Tuesday 5 February 2008 18:37
    > To: toasters@mathworks.com
    > Subject: Re: snapmirror job "hung" and abort won't
    >
    > Based on off-list suggestions, I've tried turning off snapmirror and
    > unlicensing snapmirror; unfortunately, neither helped.
    >
    > > ssh root@boxcar snapmirror off
    > > ssh root@boxcar snapmirror status
    > Snapmirror is off.
    > Source            Destination        State   Lag       Status
    > boxcar:ctfs2008   flatcar:ctfs2008   Source  00:11:26  Idle
    > boxcar:orabackup  flatcar:orabackup  Source  -         Aborting
    > boxcar:watsadmin  flatcar:watsadmin  Source  00:11:26  Idle
    >
    > Same for unlicensing and relicensing. A quiesce on the destination filer
    > shows:
    >
    > flatcar> snapmirror quiesce flatcar:orabackup
    > snapmirror quiesce: in progress
    > This can be a long-running operation. Use Control - C (^C) to interrupt.
    > snapmirror quiesce: orabackup : destination is not in snapmirrored state
    >
    > Regarding another query: this job has been running for months, so it was
    > not interrupted during a baseline transfer.
    >
    > Thanks,
    > Adi
    >
    > On Mon, Feb 04, 2008 at 09:15:02PM +0000, R.P. Aditya wrote:
    > > I have a filer pair where the source filer shows this:
    > >
    > > Snapmirror is on.
    > > Source            Destination        State   Lag       Status
    > > boxcar:ctfs2008   flatcar:ctfs2008   Source  00:04:43  Idle
    > > boxcar:orabackup  flatcar:orabackup  Source  -         Aborting
    > > boxcar:watsadmin  flatcar:watsadmin  Source  01:04:45  Transferring (1352 MB done)
    > >
    > > and the destination filer, after various attempts to abort the job on
    > > both the source and the destination, shows:
    > >
    > > Snapmirror is on.
    > > Source                 Destination        State         Lag        Status
    > > boxcar-vif0:ctfs2008   flatcar:ctfs2008   Snapmirrored  00:04:29   Idle
    > > boxcar-vif0:orabackup  flatcar:orabackup  Broken-off    150:34:31  Idle
    > > boxcar-vif0:watsadmin  flatcar:watsadmin  Snapmirrored  01:04:31   Transferring (1566 MB done)
    > >
    > > The boxcar:orabackup to flatcar:orabackup job is in a bad state, and
    > > sending
    > >
    > > snapmirror abort -h flatcar:orabackup
    > >
    > > on the source filer doesn't do anything. The CLI has been unresponsive
    > > since that command was issued, so I have to send commands via ssh, and
    > > there are snapshots from the snapmirror that are still busy:
    > >
    > > Volume orabackup
    > > working...
    > >
    > > %/used      %/total     date          name
    > > ----------  ----------  ------------  --------
    > > 40% (40%)   28% (28%)   Jan 29 15:34  flatcar(0101184681)_orabackup.3664 (busy)
    > > 40% ( 1%)   28% ( 0%)   Jan 29 14:34  flatcar(0101184681)_orabackup.3663 (busy)
    > >
    > > NetApp support recommended rebooting the source, which seems a bit
    > > drastic (and hard to do midweek), especially since the two other
    > > snapmirror jobs are working fine and everything else is healthy.
    > >
    > > The immediate problem is that those snapshots are eating a lot of space,
    > > and when I try to delete them manually I get:
    > >
    > > > snap delete -a -f orabackup
    > > snap delete -a: Remaining snapshots are currently in use by dump,
    > > snap restore, SnapMirror, a CIFS share, RAID mirroring, LUNs or
    > > retained by SnapLock.
    > > Please try to delete remaining snapshots later.
    > >
    > > This problem started when the destination filer suffered a power outage,
    > > presumably in the middle of a snapmirror transfer on the orabackup volume.
    > >
    > > any ideas short of a reboot?
    > >
    > > Thanks,
    > > Adi
    >
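
The symptom in this thread can be spotted mechanically. The source reports `Aborting`, the destination reports `Broken-off`, and the transfer's snapshots stay `(busy)`. What follows is a minimal Python sketch. It assumes you have captured the text of `snapmirror status` from each filer and the `snap list` output for the affected volume (for example over ssh, as in the thread). The helper names and the "stuck" heuristic are hypothetical; this is not a NetApp tool or a documented rule. The parser also assumes unwrapped, whitespace-separated rows. The sample data is the output quoted above.

```python
import re
from typing import Dict, List, NamedTuple


class MirrorRow(NamedTuple):
    """One row of `snapmirror status` output."""
    source: str
    destination: str
    state: str
    lag: str
    status: str


# Assumes each row is on one line (before any mail-client wrapping):
# source, destination, state and lag are single tokens; status is the rest.
ROW_RE = re.compile(r"^(\S+:\S+)\s+(\S+:\S+)\s+(\S+)\s+(\S+)\s+(.+)$")


def parse_status(text: str) -> Dict[str, MirrorRow]:
    """Index rows by destination, which both filers report identically
    (the source name differs: boxcar vs boxcar-vif0)."""
    rows = {}
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m:  # header and "Snapmirror is on." lines do not match
            row = MirrorRow(*m.groups())
            rows[row.destination] = row
    return rows


def find_stuck(source_rows: Dict[str, MirrorRow],
               dest_rows: Dict[str, MirrorRow]) -> List[str]:
    """Hypothetical heuristic: the source still shows an abort in progress,
    but the destination no longer considers the relationship Snapmirrored."""
    stuck = []
    for dest, src in source_rows.items():
        dst = dest_rows.get(dest)
        if (src.status.startswith("Aborting")
                and dst is not None
                and dst.state != "Snapmirrored"):
            stuck.append(dest)
    return stuck


def busy_snapshots(snap_list_text: str) -> List[str]:
    """Return snapshot names whose `snap list` line ends in "(busy)"."""
    names = []
    for line in snap_list_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[-1] == "(busy)":
            names.append(parts[-2])
    return names


# Sample data: the output quoted in this thread, unwrapped.
SOURCE = """\
Snapmirror is on.
Source Destination State Lag Status
boxcar:ctfs2008 flatcar:ctfs2008 Source 00:04:43 Idle
boxcar:orabackup flatcar:orabackup Source - Aborting
boxcar:watsadmin flatcar:watsadmin Source 01:04:45 Transferring (1352 MB done)
"""

DEST = """\
Snapmirror is on.
Source Destination State Lag Status
boxcar-vif0:ctfs2008 flatcar:ctfs2008 Snapmirrored 00:04:29 Idle
boxcar-vif0:orabackup flatcar:orabackup Broken-off 150:34:31 Idle
boxcar-vif0:watsadmin flatcar:watsadmin Snapmirrored 01:04:31 Transferring (1566 MB done)
"""

SNAPS = """\
Volume orabackup
working...
%/used %/total date name
---------- ---------- ------------ --------
40% (40%) 28% (28%) Jan 29 15:34 flatcar(0101184681)_orabackup.3664 (busy)
40% ( 1%) 28% ( 0%) Jan 29 14:34 flatcar(0101184681)_orabackup.3663 (busy)
"""

if __name__ == "__main__":
    print(find_stuck(parse_status(SOURCE), parse_status(DEST)))
    print(busy_snapshots(SNAPS))
```

This only flags the source/destination disagreement and lists the snapshots that are holding space. It does not release the lock or the busy snapshots, but it makes the mismatch easy to spot when several relationships are being watched.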



    This archive was generated by hypermail 2b29 : Wed Feb 06 2008 - 09:49:13 EST