How to balance volume priority (some NFS vs. CIFS)

From: Adam McDougall (mcdouga9@egr.msu.edu)
Date: Sun Feb 08 2009 - 21:19:22 EST


    I have not opened a case with NetApp yet, but probably will if no one has
    any good ideas; I just like to pick people's brains before going
    official. Thanks for any input.

    A few months ago we moved a file share off a Windows server onto our
    FAS3040 NetApp running 7.2.4 and shared it out via CIFS. It contains
    software install files and scripts and, depending on scheduled jobs, it
    can get hit pretty hard and push out approximately 1 Gbit/sec. That
    has been drastically affecting the service times for our other shares on
    that filer, and it's the response-sensitive NFS shares we care about,
    such as mail and web files, that are affected the most. It doesn't
    really seem to be a disk bottleneck because the disk read/sec in sysstat
    is usually only half of what the filer is pushing out to the network, so
    I assume it's reading some data from cache. The CIFS software install
    share gets hit either by 1 to 60+ CIFS clients, each reading files on
    and off for hours at a time, or sometimes by hundreds of clients at
    once for a smaller set of files (such as to update one software
    package across a large set of PCs). I've been able
    to reproduce the slowdown with just 4 CIFS clients on gigabit
    downloading a large file from the share. Sometimes it only causes a
    modest slowdown in the NFS response time but sometimes email messages
    being moved between folders will stall for 8 seconds or much longer, which
    is pretty much unacceptable. I don't think it's a bottleneck in my core
    network because I've done tests where the slow NFS client is on the same
    switch as the filer, which is connected via two gig links using LACP.
    Also, in the normal situation where the slowdown is encountered, mail
    (NFS) traffic is flowing through a different gig uplink than the hungry
    CIFS clients.

    Goal: reduce the impact of greedy clients (primarily known ones, but
    hopefully unexpected ones too) on the response time of the rest of the
    filer's clients. I don't care if the CIFS software share must accept
    slower data rates. I'd rather not run away from the problem; I want to
    learn what I can do to prevent my filer from being held hostage by
    greedy clients. I do have another 3040 I could move the share to, but
    that filer also has volumes that would be affected negatively in the
    same way, and I'd rather not concede defeat and go back to hosting the
    share on a dedicated Windows server. I can
    try different code versions in a test environment if I need to, but I'd
    like to think this kind of situation would have come up already and have
    a solution at hand.

    I've played around with na_priority, setting the mail and website
    volumes to high or veryhigh priority and the software share to low or
    verylow, but that hasn't made a measurable impact. I'm not really sure
    what to tweak or check next.

    Here is an example from sysstat when I am simulating the slowdown
    condition with 4 CIFS clients on gigabit fetching the same file.

     CPU    NFS  CIFS  HTTP       Net kB/s     Disk kB/s    Tape kB/s  Cache
                                in     out   read  write  read  write    age
      6%   2058   167     0    751    1543   2196      0     0      0     11
      6%   2590   164     0    699    2238   2904     32     0      0     11
     10%   2183   223     0   1241    4471   5072  17872     0      0     11
     11%   3299   799     0   1577   22194   4935   1183     0      0     11
     22%   3298  3072     0   3005  107869   9128     24     0      0     11
     18%   2532  1986     0   2270   87651   2078      0     0      0     11
     18%   2198  2200     0   1696  105941   8032      8     0      0     11
     16%   3597  1650     0   1890   84691   3528     24     0      0     11
     23%   4946  2216     0   2604  112741  14664      0     0      0     11
     22%   4075  2041     0   2324  100380  21568      0     0      0     11
     21%   3272  2246     0   2862  115380   4688     24     0      0     11
     21%   4117  2092     0   2686  109165   3864      8     0      0     11
     26%   4188  2136     0   3436  115081  21900      0     0      0     11
    ......(skip)
     30%   7487  1773     0   4261   93385  10156   3328     0      0      6
     25%   4566  1900     0   3339   96655  13764   9808     0      0      7
     24%   2965  2202     0   2477  111493  11772   5475     0      0      8
     23%   5256  1986     0   3093  102409  10508     24     0      0      8
     19%   2979  2068     0   1810  102282   9926      0     0      0      8
     20%   3164  2323     0   2301  111209   1560      8     0      0      8
     23%   7082  2165     0   2322  103816   2292     24     0      0      8
     22%  11780  1158     0   2763   55501   1760      0     0      0      8
     20%  12032   675     0   3820   36504   2452      0     0      0      8
     23%  16269  1122     0   3914   54034   4460     24     0      0      6
     18%   8991  1030     0   2739   48400   4568      8     0      0      6
     10%   3903   237     0   1346    4494   3828      0     0      0      6
     11%   3912   219     0   1623    4301   3808   6508     0      0      6
      8%   2402   224     0    868    2027   2744   8712     0      0      6
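
    As a rough cross-check of the figures above, here is a minimal Python
    sketch that converts a few of the sysstat samples to Gbit/sec and
    compares disk reads to network output. The names (Sample,
    net_out_gbit, disk_to_net_ratio) are made up for illustration, the
    rows are copied by hand from the output above, and it assumes that
    sysstat's "kB" means 1024 bytes.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """One sysstat interval; only the columns used here are kept."""
    cpu_pct: int
    nfs_ops: int
    cifs_ops: int
    net_out_kb: int    # "Net kB/s out" column
    disk_read_kb: int  # "Disk kB/s read" column


# Rows copied by hand from the sysstat output in this message.
SAMPLES = [
    Sample(6, 2058, 167, 1543, 2196),       # before the CIFS test starts
    Sample(22, 3298, 3072, 107869, 9128),   # CIFS test running
    Sample(21, 3272, 2246, 115380, 4688),   # CIFS test running
    Sample(23, 16269, 1122, 54034, 4460),   # CIFS test winding down
]


def net_out_gbit(kb_per_s: int) -> float:
    """Convert kB/s to Gbit/s (assumption: 1 kB = 1024 bytes)."""
    return kb_per_s * 1024 * 8 / 1e9


def disk_to_net_ratio(sample: Sample) -> float:
    """Fraction of outbound network traffic matched by disk reads."""
    return sample.disk_read_kb / sample.net_out_kb


for s in SAMPLES:
    print(
        f"cpu={s.cpu_pct:>2}%  nfs={s.nfs_ops:>5}  cifs={s.cifs_ops:>4}  "
        f"out={net_out_gbit(s.net_out_kb):.2f} Gbit/s  "
        f"disk/net={disk_to_net_ratio(s):.2f}"
    )
```

    With these samples, the loaded intervals come out at roughly 0.9
    Gbit/sec outbound while disk reads stay under a tenth of that, which
    fits the guess that most of the test file is being served from cache.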


