DeDupe causing slow backups

Last post 05-26-2011, 2:49 PM by Paul Hutchings. 53 replies.
  • DeDupe causing slow backups
    Posted: 09-18-2010, 12:02 AM

    We are running Simpana 8 with SP4.

    We have implemented Block Level DeDupe recently and are finding that DeDupe has slowed down our Full backups substantially. When monitoring our servers the network throughput rarely exceeds 100Mbit/sec. If I disable DeDupe and run the same backup I get speeds of around 230+Mbit/sec.

    I have tried applying the nDeDupTurbo registry key and I have moved the DeDupe DB store to its own RAID1 volume. What I am noticing is extremely high IO on this volume and I believe this is the cause of the slow backups.

    Software compression and Deduplication are set to happen on the clients.

    I was wondering if anyone else is experiencing this issue and if they have been able to work around it.

    Current DeDupe Perf Stats:

    DiskPerf Version        : 1.1
    Path Used               : S:\Temp
    Read-Write type         : SEQUENCE
    Block Size              : 512
    Block Count             : 4096
    File Count              : 1024
    Total Bytes Written     : 2147483648
    Time Taken to Write(S)  : 43.91
    Throughput Write(GB/H)  : 163.97
    Total Bytes Read        : 2147483648
    Time Taken to Read(S)   : 18.90
    Throughput Read(GB/H)   : 380.88
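As a sanity check, the GB/h figures above follow directly from the byte counts and elapsed seconds (a quick back-of-the-envelope script, not a CommVault tool):

```python
# Recompute DiskPerf's GB/h throughput from raw bytes and seconds.
def throughput_gb_per_hour(total_bytes: int, seconds: float) -> float:
    return total_bytes / 2**30 / seconds * 3600

write = throughput_gb_per_hour(2147483648, 43.91)
read = throughput_gb_per_hour(2147483648, 18.90)
print(f"Write: {write:.2f} GB/h")  # 163.97, matching the reported write figure
print(f"Read:  {read:.2f} GB/h")   # 380.95, close to the reported 380.88
```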

  • Re: DeDupe causing slow backups
    Posted: 09-18-2010, 4:15 PM

    Hi,


Because you're using RAID 1, the write throughput for your DDB is not very good... Have you looked at these two links:

    http://documentation.commvault.com/commvault/release_8_0_0/books_online_1/english_us/features/single_instance/single_instance_how_to.htm#Evaluate

    http://documentation.commvault.com/commvault/release_8_0_0/books_online_1/english_us/features/single_instance/single_instance.htm#Overview

    Somewhere in there you will find:

    "Ensure that the average read throughput of the disk is around 500 GB per hour, and the average write throughput of the disk is around 400 GB per hour. Calculate the average read and write throughputs from multiple samples (between three and ten), for a FILECOUNT of 500."

* For a DDB drive, RAID 0 is recommended, with RAID 10 as the second-best choice. How many disks do you have in use right now? What type are they? FC? 15K?

    * Also, how many SIDBs are you running on that MediaAgent? No more than 4 per MA is a rule of thumb; 2 to 3 is considered optimum.

    * How much RAM does the MediaAgent have? You should aim at 2 GB per SIDB.

    * For the throughputs you're hoping for, you should be running an x64 OS (2008 x64 would be a good choice; that kernel was built for speed).

     

    Cheers

     

     

  • Re: DeDupe causing slow backups
    Posted: 09-18-2010, 5:37 PM

    Thanks for your reply. I had stumbled across that doco after I posted.

    I am using two 76GB SAS 15K RPM drives in a RAID1 for the DDB. RAID0 is a scary proposition when there is no redundancy in the event of failure. RAID10 means throwing a lot of disk at a small amount of data.

    We will be upgrading the CommServe to a Windows 2008 64-bit server in the next few months. Currently the server has 4GB RAM and we have two DDB instances running on the MA.

    I spent the remainder of the day trying to improve performance of my backups. I found that when I sealed my dedupe store my backups instantly went from 80Mbit/sec to 200+Mbit/sec.

    I think I need to schedule to seal my Store on a regular basis. Maybe every 20 to 30 days to ensure performance does not degrade.

     

  • Re: DeDupe causing slow backups
    Posted: 09-24-2010, 10:38 PM

    Wgrixti,

    The DDB is not required to reside on a redundant RAID set, because if the DDB is lost, all the data on disk is still restorable. What we need for the DDB is the fastest I/O possible, and RAID 0 gives you this. Also make sure to move the page file to its own RAID 0 drive; this is a Microsoft recommendation for servers with very high I/O. I've seen many cases where moving the page file from the C: drive to its own drive improved performance greatly. What is your retention, and what is the "Do not deduplicate against objects older than" setting set to?

    AaronA

  • Re: DeDupe causing slow backups
    Posted: 09-24-2010, 10:43 PM

    Aaron thanks for your reply.

    So if we lose the DDB it will have no impact on restores. Is that only for data that has gone to tape and been re-hydrated, or will restores from disk continue to work as well?

    If there is no impact I can convert my current RAID1 to a RAID0 for better IO.

    The "Do not deduplicate against objects older than" setting matches my disk retention period, which is 7 days.

  • Re: DeDupe causing slow backups
    Posted: 09-27-2010, 8:19 AM

    Wgrixti,

     

    Correct, you can lose the DDB and your backups on disk in deduplicated form will still be restorable. In fact, the DDB is not even contacted during a normal restore, so have no worries about losing the DDB. Your data is safe.

    I would recommend converting to RAID 0. When doing this, even though the data is safe, you don't want to have to re-baseline. To prevent this, I would recommend the following steps.

     

    1. Stop the services on the MediaAgent.

    2. Wait for the SIDB2 process to shut down.

    3. Move the entire DDB directory, starting with the CV_SIDB folder, to another server.

    4. Convert the drives to RAID 0.

    5. Move CV_SIDB back to the original server with the same directory structure. Otherwise the DDB will not start and a ticket will need to be opened.

    6. Start the MediaAgent services and run a test backup.
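A rough sketch of steps 3 and 5 in Python (the paths in the comment are hypothetical placeholders; the key point is that the relative directory structure must survive the round trip, or the DDB will not start):

```python
import shutil
from pathlib import Path

def copy_preserving_structure(src: Path, dst: Path) -> None:
    """Copy the whole CV_SIDB tree, then verify every relative path survived."""
    shutil.copytree(src, dst)
    src_rel = sorted(p.relative_to(src) for p in src.rglob("*"))
    dst_rel = sorted(p.relative_to(dst) for p in dst.rglob("*"))
    assert src_rel == dst_rel, "directory structure changed -- DDB will not start"

# Hypothetical paths -- adjust to your environment:
# copy_preserving_structure(Path(r"S:\CV_SIDB"), Path(r"\\staging\CV_SIDB"))
```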

    Regarding the "Do not deduplicate against objects older than" setting:

    It is only recommended to have this setting and retention match if you are using a Unix MA. This is because CV doesn't support sparse files on Unix, so to clean up the disk, CV recommends having these settings match. This setting effectively lays down a new baseline, which is the same as sealing the store. With such a low retention and this setting in use, you're not getting the best bang for your buck with deduplication.

  • Re: DeDupe causing slow backups
    Posted: 09-27-2010, 10:44 AM

    We're seeing slow backups with block-level dedupe and worse performance when restoring or aux copying deduped data. We have 24GB of RAM on the DDB MA and 8 x 15K SAS drives in RAID 10. We're moving away from CV dedupe due to the huge performance hit and moving to a specialized appliance for deduping.

  • Re: DeDupe causing slow backups
    Posted: 09-27-2010, 1:55 PM

    Sanman,

     

    Most performance issues can be easily resolved by reviewing your dedup configuration.

    I would recommend opening a ticket to have your config reviewed.

    AaronA

  • Re: DeDupe causing slow backups
    Posted: 09-27-2010, 1:59 PM

    AaronA:

    Sanman,

     

    Most performance issues can be easily resolved by reviewing your dedup configuration.

    I would recommend opening a ticket to have your config reviewed.

    AaronA

     

    I can't even tell you how many reviews we have had of various configs; in the end, generic software running on Windows will never be as fast or effective as a specialized appliance.

  • Re: DeDupe causing slow backups
    Posted: 09-27-2010, 5:39 PM

    Aaron,

     

    Thanks for all the info.

    Can you explain how there is no impact on restores if the DDB RAID 0 set were to fail? Bear in mind we dedupe to disk but do not dedupe to tape.

    With the "Do not dedupe" setting, from what you said, we should increase this to something like 30 days to see more of a benefit, even though our SP retention is only 7 days.

    Wayne

  • Re: DeDupe causing slow backups
    Posted: 09-28-2010, 4:48 PM
    Wgrixti,

    The information necessary to complete a restore from a deduplicated backup is maintained in the backup object on the magnetic library. In the event that the RAID 0 set holding the DDB is lost, the impact is that when the DDB is placed on a new LUN, the next full backup is a new baseline rather than a deduplicated job. Hope that helps!
  • Re: DeDupe causing slow backups
    Posted: 09-30-2010, 12:33 PM

    I am seeing a 6x drop in tape performance.  Before going to dedupe, I was seeing 250+GB/hr.  Ever since, tape speed is 30-60GB/hr.  I know that the data has to be rehydrated, but the huge drop in performance makes copying current data to tape very difficult.

  • Re: DeDupe causing slow backups
    Posted: 09-30-2010, 1:46 PM

    Colin:

    I am seeing a 6x drop in tape performance.  Before going to dedupe, I was seeing 250+GB/hr.  Ever since, tape speed is 30-60GB/hr.  I know that the data has to be rehydrated, but the huge drop in performance makes copying current data to tape very difficult.

     

    We see the same thing and are planning on dumping CV Dedupe and going to Data Domain in the next 60 days, the CV dedupe perfomance is horrible and I have 8 cores and 2008 R2 media agents with 8+ GB of RAM.

  • Re: DeDupe causing slow backups
    Posted: 09-30-2010, 2:44 PM

    Good afternoon Sanman,

    How many DDBs are you running at the same time?

    What is the drive configuration for the DDB server? Please include O/S drive and DDB volume.

    Microsoft released a kernel patch for Windows 2008 R2 that improves performance under heavy I/O.

     

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;982383

     

     

    AaronA

  • Re: DeDupe causing slow backups
    Posted: 09-30-2010, 2:48 PM

    AaronA:

    Good afternoon Sanman,

    How many DDBs are you running at the same time?

    What is the drive configuration for the DDB server? Please include O/S drive and DDB volume.

    Microsoft released a kernel patch for Windows 2008 R2 that improves performance under heavy I/O.

     http://support.microsoft.com/default.aspx?scid=kb;EN-US;982383

     AaronA

     

    DDB is on a RAID 10 array that consists of 8 x 15K drives, also houses index cache.

  • Re: DeDupe causing slow backups
    Posted: 09-30-2010, 2:50 PM

    Wgrixti,

     

    I apologize for the delay.

    During a restore we don't touch the DDB, because all the intelligence for the deduped data is incorporated into the archive data in the mag library.

     

    The "Do not dedupe" setting should be set to a high number. I'd recommend setting it to a value higher than your seal interval. If you don't seal, then set it to 1825, the highest value recognized by the system; this effectively moves the setting out of the way.

    If you are seeing performance issues and don't feel you are getting the most out of dedupe, then I would recommend opening a TR to have your configuration reviewed.

    AaronA

     

     

  • Re: DeDupe causing slow backups
    Posted: 09-30-2010, 2:50 PM

    @sanman: Interesting. Could you describe your environment in more detail? I'm looking for details on how the 15K RPM disks have been configured (same RAID group or dedicated ones) on the R2 MediaAgents.

    Can you also provide details for the backup target itself: spindle speed and capacity, the RAID configuration, how many spindles per RAID group, etc.?

     

  • Re: DeDupe causing slow backups
    Posted: 09-30-2010, 3:27 PM

    When aux copying, be mindful of the multiplexing factor and also of combining streams. My rule of thumb is not to go over a multiplexing factor of 4; performance seems to tank fourfold if you go higher, because the drive has to unwind all that data to get to it. If you are combining streams, the same thing applies. Also try to set these factors based on your data set: if you are running a lot of Windows servers, I wouldn't go higher than 3 on multiplexing, and combine no more than 3 or 4 streams to 1. Unix/Linux can go higher; they just write cleaner and faster than Windows. This may help your aux copy/restore issues.

     

    For backups, are you writing to disk? There is an advanced setting that you should match up with when you close off your DDB.

    In the storage policy properties, go to the Deduplication tab, then the Advanced tab, and set "Do not deduplicate against objects older than" to the same value as your DDB seal interval.

     

  • Re: DeDupe causing slow backups
    Posted: 09-30-2010, 10:46 PM

    All,

    Thank you for all the great information. I have made some configuration changes but will need to wait until my new server arrives before I can make some major changes.

    From my perspective, I wish I had been made aware of the dedupe requirements by CommVault Sales and Pre-Sales so I would not have been blindsided by these issues. If I had been aware of the requirements, both hardware and configuration, I think I would not be in so much pain right now; I also would have deferred my dedupe implementation until the new server was available.

    Wayne

  • Re: DeDupe causing slow backups
    Posted: 10-01-2010, 7:46 AM

    MA #1 is an 8-core 2.3GHz AMD with 8GB RAM, 2008 R2, with 4Gb FC and a 10Gb NIC.

    MA #2 is an 8-core 2.3GHz AMD with 8GB RAM, 2008 R2, with 4Gb FC and a 10Gb NIC.

    Mag Library server is an 8-core 3GHz Intel with 20GB RAM, 2008 R2, and a 10Gb NIC, with the following drives:

    8 x 300GB 15K RAID 10 LUN contains shared index cache and 4 DDBs

    30 x 1TB SATA RAID 6 LUN Contains 2 MagLibrary paths

    30 x 1TB SATA RAID 6 LUN Contains 2 MagLibrary paths

    All LUNs are formatted using 64K clusters and are aligned to a 64K boundary.

    Both MAs access a shared LTO5 FC library and then the respective MagLibrary paths on the common MagLibrary server.

    We also go into the policy and, under Data Paths, set the properties to a 64K block size, which improves performance. We have also tweaked all NIC settings for optimal performance with additional buffers and RSS queues.
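Partition alignment like the 64K boundary mentioned above can be verified with simple modular arithmetic (a generic check, not specific to any vendor tool):

```python
# A 64K-aligned partition starts at a byte offset divisible by 65536.
def is_aligned(offset_bytes: int, boundary: int = 64 * 1024) -> bool:
    return offset_bytes % boundary == 0

print(is_aligned(1_048_576))  # True: the common 1 MiB starting offset
print(is_aligned(63 * 512))   # False: the legacy 63-sector offset (32,256 bytes)
```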

     

  • Re: DeDupe causing slow backups
    Posted: 10-01-2010, 12:04 PM
    Thanks sanman, that explains a lot

    Two key elements to deduplication performance are I/O isolation for DDB (or SIDB) and I/O distribution across the magnetic library paths (or mount paths)

    I/O Isolation for DDB

    In addition to fast disk, it is important to ensure that the DDB I/O path is not shared with any other activity for maximum performance. The best way to do that is to create separate RAID groups for each volume that needs to be presented to the DDB Managers (MA1 and MA2). In our example setup we have the 15K RPM spindles configured as follows:

    RAID GROUP 1: 2 spindles, RAID 1. This volume contains the Windows OS.

    RAID GROUP 2: 2 spindles, RAID 0. This volume contains the OS pagefile.

    RAID GROUP 3: 4 spindles, RAID 0. This volume contains the DDB (or SIDB).

    By creating RAID groups in this manner, you are taking advantage of multiple channels on your FC card and ensuring that the IO for the OS is routed on a different channel than the IO for the DDB.

    The MA in this setup is Windows 2008 R2 x64, 2 CPU quad core, 16 GB RAM. With this we have seen sustained backup throughput of 2.5 to 3 TB/hr per MediaAgent on real world datasets.

    I think the fact that all the 15K RPM spindles are on a single RAIDGROUP and the OS, Pagefile, DDB and Index Cache all share it may be hindering your performance. If the index cache needs to be collocated, I would put it on the same volume as the pagefile. You should avoid putting anything else on the DDB volumes. If your policies allow it, I would even suggest turning off anti-virus on the DDB volume.

    Also, as AaronA noted earlier, anything other than RAID 0 will not drive peak performance. You can use RAID 10 or RAID 5 on the DDB volume for additional comfort, but that will come at the expense of performance.

    I/O Distribution across mount paths

    This may be the reason why your aux copy performance is slow. Remember that when you deduplicate data, you will read blocks that could be scattered across the volume. A key consideration here is to distribute the read and write workload across the different volumes equally. Also, the more LUNs you have on a single RAID group, the more burden you place on the spindle heads.

    In our example setup, we configured 7 x 1TB RAID 5 RAID GROUPs and two LUNs per RAID GROUP of about 3 TB each. We presented these LUNs as mount points (not drive letters), which were then configured as mount paths on the disk library. We also had the disk library set up to do "spill and fill" to distribute I/O equally across these volumes. We were able to attach 64 TB of usable disk behind a single MA. Assuming a conservative 10:1 dedupe ratio, that translates to 640 TB of logical data behind a single MA.
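The capacity arithmetic in that example works out as follows (a sketch; the 10:1 ratio is the conservative assumption stated above):

```python
def raid5_usable_tb(disks: int, disk_tb: float) -> float:
    """RAID 5 loses one disk's worth of capacity to parity."""
    return (disks - 1) * disk_tb

def logical_capacity_tb(usable_tb: float, dedupe_ratio: float) -> float:
    """Logical (pre-dedupe) data that fits on the usable disk."""
    return usable_tb * dedupe_ratio

print(raid5_usable_tb(7, 1.0))      # 6.0 TB usable per 7 x 1TB RAID 5 group
print(logical_capacity_tb(64, 10))  # 640 TB logical behind a single MA
```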

    When we setup the disk target using these principles, our AuxCopy to tape speed was about 70-80% of the speed when reading from a non-dedupe disk.

    In your case, I suggest 15 x 1TB RAID 6 RAID GROUPs and 3-4 LUNs per RAID GROUP. If you are worried about wasting spindles because of smaller RAID GROUPs, then RAID 5 will do as well, since you are creating a second copy on tape anyway for added protection. With this you get 12-16 mount points on which you can perform "spill and fill". For optimal use, I suggest you also set the cumulative number of streams across all storage policies to be a multiple of 12-16, to ensure equal distribution. You should also check whether data aging jobs are running at the same time as aux copy. While normally there is no harm running the two simultaneously, sometimes when your retention settings are low, aux copy can act as a trigger for data aging as well, increasing the read burden on the disk library spindles.

    This may be something that Colin may want to look at as well for auxcopy optimization.

    As for Data Domain or any appliance approach, there really is no secret sauce in my opinion. If you peel under the layers of any appliance, you will see that a lot of the performance claims derive from the disk raid configuration and optimal stream distribution across different volumes. Typically, in the larger appliances, you will find 16x1TB RAID 6 with multiple LUNs per RAIDGROUP and streams that distribute equally across these LUNs. Even there, Auxcopy will run 50% slower than when using non-dedupe disk as source, because of rehydration.

    The downside to this closed appliance approach is that you still need MediaAgents that are powerful enough that can drive performance. Since deduplication occurs in the appliance, you are forced to write everything from the MA to the appliance, requiring thicker pipes. Why not take full advantage of these MA servers and have them do deduplication as well. By reducing data on the MA, you are reducing the data transferred between the MA and the disk target. More importantly, you get the benefit of incrementally adding disk capacity, versus having to go buy full shelves when you only need a few TBs. Moreover, if the appliance is full, you have no choice but to buy another one of larger size.

    So yes an appliance may simplify life initially for the first few weeks, but pretty soon you will find yourself facing the same issues that you were having before you switched over to a backup to disk solution.

    Would love to continue this discussion, so keep the comments rolling in.
  • Re: DeDupe causing slow backups
    Posted: 10-01-2010, 12:26 PM

    Actually the config I specified is only our latest config. We had the SIDB on a dedicated host with dedicated disk and saw NO difference. I have watched Perfmon and Resource Monitor, and with our config the SIDB does not impose or utilize much; with 20GB of RAM and a 512MB cache controller, the disk I/O is very much buffered and cached. Actually the OS seems to be using 12+GB of RAM as disk cache. This is the 3rd or 4th different config we have tried, each time at the suggestion of CV support, so at this point I am thoroughly convinced CV dedupe is the bottleneck. We have run CVDiskPerf against the disk and can sustain over 375MB/sec. We also turned off software compression to lessen the load on the CPUs; with dedupe and software compression, the CPUs (all 8) live over 80% and performance seems to drop. I also don't buy the appliance argument you mention; in the end, having a dedicated dedupe box that is highly optimized for dedupe is a far better way to go. If you need to grow the MA or the dedupe appliance, it is the same either way. All of our testing shows CV is somewhat inefficient at moving data in general, so the less we burden it with other tasks (compression, dedupe), the better results we get.

  • Re: DeDupe causing slow backups
    Posted: 10-01-2010, 1:25 PM

    sanman:

    Actually the config I specified is only our latest config. We had the SIDB ona  dedicated host with dedicated disk and saw NO difference. I have watched perfmon and Resource monitor and with our config SIDB does not impose much or utilize much, with 20GB or RAM and 512MB cache controller the disk IO is very much buffered and cached.

    Out of curiosity, when using Perfmon, did you monitor the read and write average queue length on the SIDB volume? I find that is the best indicator of whether the SIDB is the bottleneck.

    Also keep in mind that aux copy performance is independent of the SIDB. That really is a function of how the backup disk targets have been set up and how the streams are distributed across the different mount paths.

  • Re: DeDupe causing slow backups
    Posted: 10-02-2010, 12:33 AM

    I would really like to get a firm understanding of what I need to purchase when I buy our new MA next year to get better dedupe performance. Would CV Support help out here, or would I be bumped to Pro Services?

     

  • Re: DeDupe causing slow backups
    Posted: 10-04-2010, 1:07 PM

    Wgrixti,

    This is something that I can assist you with.

    I will contact you offline tomorrow to have a further conversation regarding the MediaAgent recommendation.

    AaronA

     

  • Re: DeDupe causing slow backups
    Posted: 11-25-2010, 3:40 PM

    Hello Zahid,

    This is very interesting information. I'm currently testing Simpana in our lab before deploying it to our production environment. Is there documentation on how to recover the DDB? I wouldn't mind putting it on RAID 0 for the added performance if I'm 100% confident I can restore it in case we lose a drive.

    This raises another question though... if we don't need the DDB to recover data and we can rebuild it in case it is lost, what is its use? I'm guessing it is some kind of indexing DB that just helps speed the whole backup/restore process? What would happen in case we lose the DDB during a backup/restore job?

    I thought the DDB was actually holding all blocks that weren't unique and that if losing this, you'd lose everything since you'd have a bunch of "holes" in your backed up data. If this isn't the case, where is that data and the pointers being kept? Does this all apply for a global deduplication store as well?

    Thanks!

    Philippe

  • Re: DeDupe causing slow backups
    Posted: 11-29-2010, 9:25 AM

    PhilippeMorine,

    What version are you testing? In v9 there is a process that creates a recovery point for the DDB, called DDB Availability. This process will protect the DDB in case of disk corruption and then automatically rebuild it. In v8 there is no way to protect the DDB; if you lose the drive the DDB resides on, a new DDB will be created.

     

    The DDB is used for backup and data aging. The DDB is not used for restores.

    AaronA.

     

     

     

  • Re: DeDupe causing slow backups
    Posted: 11-29-2010, 9:44 AM

    Hello AaronA,

    Thanks for your reply. We are indeed using v9. So if I understand right, I don't have to do anything. If the drive crashes, once it is back online, Simpana will see it and start to rebuild the database? When will it try to rebuild it?

    What if the drive doesn't come back in time for the next scheduled backup? I'm guessing it will fail until the drive is back online?

    Thanks!
    Philippe

  • Re: DeDupe causing slow backups
    Posted: 11-29-2010, 11:11 AM

    Philippe.Morin,

    If the DDB drive goes down, then the recovery point will be lost as well. Support can provide you a registry key that allows the recovery point to be moved to another set of disks.

    The recovery process will begin after the backup jobs determine the DDB is missing or corrupted. The jobs will go pending until the recovery process completes. Once completed, the backups will resume and run to completion.

    If the disk does not come up in time for the next backup, then a manual move of the DDB will need to occur. Once the DDB has been moved in the GUI, the system will be able to recover the DDB.

    AaronA

  • Re: DeDupe causing slow backups
    Posted: 11-29-2010, 11:51 AM
    Hi Philippe,

    There are several things to consider when designing a solution for high availability; the DDB is just one of those considerations. Before we go into availability, let me answer your original query. All the DDB contains is the hash signatures of the data blocks and the reference counts to those signatures (I am simplifying, of course, but the concept stands). During backup, the MediaAgents refer to the DDB to make a quick determination on whether to write a data block or not. If the signature does not exist in the DDB, we insert it; if it does exist, we increase the reference count.

    The key differentiation we have is that our index also contains this exact same information, i.e. the index (or catalog) contains information about where all the data blocks for each file being backed up are located on the disk library. And as you know, this index is distributed, self-managed and self-protected. In fact, every job automatically protects its own index as well (called the Archive Index phase).

    Because of this, we never need to refer to the DDB for restores. A restore job simply identifies the point in time and the file(s) to be restored. It then refers to the index to identify which blocks on disk to read. Since the index is self-protected, if the index is not available, we automatically rebuild it from the backup disk target (a process called Index Restore). This catalog has a couple of other advantages.

    - Since the index points to exact blocks (across different backups) that constitute a file, there is no rehydration penalty or delay

    - If index restore is required, we only rebuild the part of the index that is necessary for the restore

    - The index is stored on all backup copies. In case of DR, you can start recovering your data in as little as 20 mins, as long as you have your backups on the DR site

    As you can see, the DDB is simply a reference table for us, and data is still available even if the DDB is lost. The Data Aging process needs the DDB, but only to reduce the reference counts. When the reference count for a block drops to 0, the corresponding data block can be deleted from disk. This does not mean that you cannot age data if you lose the DDB; there are processes called macro pruning that will allow you to age data even if the DDB is not available.
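The signature table and reference counts Zahid describes can be sketched as a toy model (an illustration of the concept only, not CommVault's actual implementation):

```python
import hashlib

class ToyDDB:
    """Toy dedupe database: hash signature -> reference count."""
    def __init__(self):
        self.refs = {}  # signature -> reference count

    def backup_block(self, block: bytes) -> bool:
        """Return True if the block must be written to disk (new signature)."""
        sig = hashlib.sha256(block).hexdigest()
        new = sig not in self.refs
        self.refs[sig] = self.refs.get(sig, 0) + 1
        return new

    def age_block(self, block: bytes) -> bool:
        """Data aging: decrement the count; True means the block can be pruned."""
        sig = hashlib.sha256(block).hexdigest()
        self.refs[sig] -= 1
        if self.refs[sig] == 0:
            del self.refs[sig]
            return True
        return False

ddb = ToyDDB()
print(ddb.backup_block(b"blockA"))  # True  -> write the block to disk
print(ddb.backup_block(b"blockA"))  # False -> duplicate, just bump the count
print(ddb.age_block(b"blockA"))     # False -> still referenced
print(ddb.age_block(b"blockA"))     # True  -> count hit 0, block can be pruned
```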

    Back to the availability question, you need to be aware of many components that can impact your backups. The most important is the backup disk itself. If you choose to perform backups to disks that are directly attached to the MediaAgents (FC or iSCSI), and you lose the MA, you have lost access to the backup target as well. If GridStor is available, backups will restart on other MediaAgents and disk LUNs, but you cannot restore data written by the first MA because the disk LUN attached to that MA is not available. You need to make the disk LUN available to other MAs, which could be a manual process. To avoid this you may want to think about using shared storage as the backup disk target. Also, it is prudent to always create a second backup to a different disk device using DASH copy, or to tape or cloud storage using SILO. This ensures that even if you lose the first backup target, you still have access to data while you rebuild the original MA/LUNs. If the loss is permanent, you can simply promote the secondary copy to be the primary copy (assuming it is local and meets performance criteria) and continue doing backups.

    If you choose to use a NAS disk target, you need to ensure that network access to the share itself is highly available too.

    Back to DDB availability, the deduplication store has the option (enabled by default) to create a copy of the DDB every 8 hours (configurable). When the DDB is lost, the next backup process detects this loss and attempts to rebuild the DDB. This involves copying back the saved DDB copy and then reinserting all signatures/reference pointers from that point back into the DDB. Remember that our index contains all the information necessary to rebuild the DDB. How long this process takes depends on how far back the DDB was restored to and how many backup jobs ran since the last DDB copy, but it is never more than a couple of hours. Backups will remain suspended during this time, as soon as the DDB is available, backups resume automatically. If this happens in the middle of the night, you may not even know about this outage.

    The DDB backup copy by default is created in the same location as the DDB. We recommend creating this DDB copy on a network share (if you are using indexing server or shared index, recommend creating the DDB copy there). This covers against all types of losses. If the original DDB LUN (or MA) is unavailable temporarily, the DDB rebuild process will recover automatically. If the outage is long term or permanent, you can simply change the DDB host to an alternate MediaAgent from the Storage Policy properties and the DDB will be recovered to the alternate MA.

    If you want no manual intervention at all and you have multiple MAs, you can simply deploy these MAs as active-active nodes in a cluster with the DDB volume as one of the cluster resources. If one of the physical servers is unavailable, the MA fails over to the other node and the DDB rebuild process automatically rebuilds the DDB on the other node (if necessary).

    As you can see there are multiple options available to ensure uninterrupted operations. It is a matter of picking which one is most suitable to your environment and your availability needs.





    Zahid
  • Re: DeDupe causing slow backups
    Posted: 11-29-2010, 12:49 PM

    This may be "lame," replying to such an old thread; for all I know you already have some Data Domains in place by now.

    I noticed you formatted your LUNs with a 64kB block size. This is fine for the MagLib store LUNs, but not for the DDB, which should be formatted using a 4kB block size.

    Furthermore, the DDBs I run on my CLARiiON CX4-240 are put on a RAID 10 8-disk RAID group. I attain rates with CVDiskPerf of 725 MB/sec read and 650 MB/sec write.

    The MA initially had 8 GB of RAM and ran 2008 R2. Simulations have shown I could run an SIDB for up to 160TB.

    When I increased RAM substantially, the test would continue running, not reaching the 1000 µs insert mark for days... so RAM is definitely a factor as well! 2GB per SIDB doesn't seem like a lot...

     

    Kind regards,

    Renaat

  • Re: DeDupe causing slow backups
    Posted: 11-29-2010, 1:25 PM

    Zahid,

    Thanks for all the information. This is very interesting.

    I am still testing it (v9) in the lab but this is what our production environment should look like once we're ready:

    We have 6 regions which have 1 MA each and back up to disk. Each region has its own storage policy with a secondary copy sent to the DR site through DASH copy. The regions are very small, so the backed-up files are on the same disks as the DDB (I know this doesn't help performance, but our regions are only 100GB total, so I figured it wasn't worth spending money on more hardware as the backup time is already very short).

    HQ has 1 MA and backup to disk. Backed up files are sent to direct FC attached drives. The DDB should be located on a couple of disks on our SAN (which also hosts our production files) in what will most likely be a RAID-0 after what I'm reading in this thread. I'd like to send the DDB backup to the FC attached drives that host the backed up files as I guess the performance impact would be minimal (1 copy every 8 hours). HQ's storage policy also has a secondary copy sent to the DR site through DASH copy.

    DR has only 1 MA with FC attached drives which will host both the regions/HQ copies and its own DDB (I'm guessing the performance impact of having the DDB on the same disks would be minimal since the server will most likely be waiting on the WAN anyway).

    Now, from what you've said, I have 3 more questions....

    1: Does this setup make any sense? It does to me, but I could be missing something you pros would see!

    2: How do I back up the DDB to another drive? AaronA mentioned a registry key. I searched the documentation but could only find the key to change the backup interval.

    3: When reading the documentation about DASH copies, there are checkboxes to enable both disk- and network-optimized DASH copies. When I look at my v9 interface, though, I can only choose one of the two. Has this been changed from v8 to v9 without the v9 screenshot being updated? I obviously went for the network-optimized option but would rather have both options checked if possible! :-)

    Thanks!
    Philippe

  • Re: DeDupe causing slow backups
    Posted: 02-01-2011, 5:21 AM

    Hello Philippe,

    concerning point 2.:

    When the snap occurs, only the active deduplication database is snapped. Once the store is sealed, the backup folder will remain in place until all the data in the store has aged and data aging deletes the store folder.

    Note:

    When sizing the DDB volume, the backup folder size needs to be considered: creating a recovery point of the DDB will double the footprint of the active DDB.

    The backup folder can be moved from the default location to less expensive local drives or a network share by using registry keys.

    Keep in mind that, given the size of the DDB, the speed of that location might matter in case a rebuild becomes necessary.
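    The sizing note above reduces to simple arithmetic; a quick sketch (the helper name and GB units are illustrative, not a CommVault formula):

```python
def ddb_volume_gb(active_ddb_gb, recovery_points=1):
    """Minimum DDB volume size when each retained recovery point (the
    backup folder) adds another full copy of the active DDB footprint."""
    return active_ddb_gb * (1 + recovery_points)

# A 200 GB active DDB with one recovery point needs at least 400 GB:
print(ddb_volume_gb(200))
```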

     

    Best Regards

    Bernd

  • Re: DeDupe causing slow backups
    Posted: 02-01-2011, 6:54 AM

    Zahid:
    As for Data Domain or any appliance approach, there really is no secret sauce in my opinion. If you peel under the layers of any appliance, you will see that a lot of the performance claims derive from the disk RAID configuration and optimal stream distribution across different volumes. Typically, in the larger appliances, you will find 16 x 1TB RAID 6 with multiple LUNs per RAIDGROUP and streams that distribute equally across these LUNs. Even there, Auxcopy will run 50% slower than when using non-dedupe disk as source, because of rehydration. The downside to this closed appliance approach is that you still need MediaAgents powerful enough to drive performance. Since deduplication occurs in the appliance, you are forced to write everything from the MA to the appliance, requiring thicker pipes. Why not take full advantage of these MA servers and have them do deduplication as well? By reducing data on the MA, you are reducing the data transferred between the MA and the disk target. More importantly, you get the benefit of incrementally adding disk capacity, versus having to buy full shelves when you only need a few TBs. Moreover, if the appliance is full, you have no choice but to buy another one of larger size. So yes, an appliance may simplify life initially for the first few weeks, but pretty soon you will find yourself facing the same issues that you were having before you switched over to a backup-to-disk solution. Would love to continue this discussion, so keep the comments rolling in.

    In the end, by the time I spend all the $$ to build the ultimate CV dedupe MA setup, I am into Data Domain territory. We have tried numerous different configs, and in the end the bottleneck becomes the rehydration performance for aux copies. I can get excellent backup performance, but the overhead for the aux copy to rehydrate makes it crawl. With DD I get a highly optimized hw/sw appliance and can let the CV MAs do what they do best: move data. Offloading the dedupe and rehydration to a dedicated appliance will give us the best performance. We tried the DDB in every possible config, even with a dedicated media agent, and in the end we found too many problems with CV dedupe. We use CV dedupe at our remote locations that have lighter load requirements and it works fine there, but for moving and deduping 18 TB, CV dedupe cracks under the pressure.

     

  • Re: DeDupe causing slow backups
    Posted: 02-02-2011, 12:34 AM

    Very interesting post, sanman, and I can agree with many of your points. Shame Data Domains are not cheap, hey.

  • Re: DeDupe causing slow backups
    Posted: 02-02-2011, 5:42 PM

    Data Domains have also been recommended to me.  It seems like they do a good job, but again, they're ridonculously expensive.  There should be a better way to achieve similar results without having to purchase Data Domains, right?

    Can't live without Data Domains, but can't afford their expense either.

    -Andy

  • Re: DeDupe causing slow backups
    Posted: 02-03-2011, 6:42 AM

    Andy Gray:

    Data Domains have also been recommended to me.  It seems like they do a good job, but again, they're ridonculously expensive.  There should be a better way to achieve similar results without having to purchase Data Domains, right?

    Can't live without Data Domains, but can't afford their expense either.

    -Andy

     

    That is pretty much the theory behind CV and others offering dedupe in their backup products: to slot in lower than DD. DD is expensive, and I suspect that if they get serious competition one day the price will come down, but if I were to factor in all the man hours we have spent fighting with CV dedupe, the DD is looking more attractive. By the time you set up media agents with the CPU power, RAM and disk performance you need to get reasonable numbers out of CV dedupe, it gets expensive. With DD I will need smaller media agents that are optimized for I/O, as they are only going to move data. Having the media agent juggle I/O and then dedupe (hash creation, compression, etc.) is a huge load on a box. We have sat and watched our 8-core boxes run at 100% through most of our backups because of this load, which has to have an impact on throughput. Additionally, the DD does variable-block dedupe; CV uses fixed-length blocks.
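    The fixed-length point above is easy to see in miniature: a fixed-block scheme hashes every N-byte block and stores a block only when its hash is new. A toy sketch, not CV's actual implementation (the block size and SHA-256 hash are arbitrary choices):

```python
import hashlib

def dedupe_fixed(data, block_size=128 * 1024):
    """Fixed-length block dedupe: hash each block, store it only if the
    hash is unseen. Returns (store, refs), where refs is the ordered
    list of hashes needed to rebuild the stream."""
    store = {}   # hash -> block: the "DDB" and block store in one dict
    refs = []
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        h = hashlib.sha256(block).hexdigest()
        if h not in store:
            store[h] = block
        refs.append(h)
    return store, refs

def rehydrate(store, refs):
    """Rebuild the original stream -- the step that makes aux copies
    read-heavy, since real stores scatter blocks across disk."""
    return b"".join(store[h] for h in refs)
```

    Insert one byte at the front of the data and every subsequent fixed block shifts and re-hashes as new; that is exactly the weakness variable (content-defined) chunking addresses.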

  • Re: DeDupe causing slow backups
    Posted: 02-03-2011, 6:33 PM

    sanman:

    but if I were to factor in all the man hours we have spent fighting with CV dedupe, the DD is looking more attractive. By the time you set up media agents with the CPU power, RAM and disk performance you need to get reasonable numbers out of CV dedupe, it gets expensive.

    Yes, I tend to agree with this; unfortunately the man hours of post-install work don't really get logged on capital expenditure forms, so the business never sees this.

  • Re: DeDupe causing slow backups
    Posted: 02-03-2011, 7:26 PM

    Hey!

    Just chiming in to say that I am now getting tape speeds of up to 200GB/hr.

    Problem was not so much dedupe as incorrectly configured values for streams on the storage policies.

  • Re: DeDupe causing slow backups
    Posted: 02-03-2011, 8:28 PM

    Yeah, we have seen 200 GB/hr, but remember LTO4 is capable of about 400 GB per hour, so you're still not pushing the tape hard enough.

  • Re: DeDupe causing slow backups
    Posted: 02-03-2011, 8:49 PM

    Well, I went from 30-60GB/hr to 200GB/hr.  All things considered, I am very happy with dedupe.  I saw little to no slowdowns with the backups and tuning some parameters has improved aux copy speed.

  • Re: DeDupe causing slow backups
    Posted: 02-03-2011, 10:14 PM

    Colin:

    Well, I went from 30-60GB/hr to 200GB/hr.  All things considered, I am very happy with dedupe.  I saw little to no slowdowns with the backups and tuning some parameters has improved aux copy speed.

     

    Problem is, a decent media agent should be able to double that performance, which means something is slowing you down. We were getting 300-400 GB/hr on one MA with dedupe across several streams; problem is, with 18 TB to back up I need more than that.
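    The arithmetic behind that is easy to check against the figures quoted above:

```python
def backup_window_hours(data_tb, rate_gb_per_h):
    """Hours to move data_tb terabytes at rate_gb_per_h gigabytes/hour
    (decimal units, 1 TB = 1000 GB)."""
    return data_tb * 1000.0 / rate_gb_per_h

# 18 TB through a single MA at the quoted 300-400 GB/hr:
print(backup_window_hours(18, 400))   # 45.0 hours, best case
print(backup_window_hours(18, 300))   # 60.0 hours, worst case
```

    Hence the need for more than one MA's worth of throughput.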

  • Re: DeDupe causing slow backups
    Posted: 02-03-2011, 10:32 PM

    Sanman,

    Is that disk or tape speed?

  • Re: DeDupe causing slow backups
    Posted: 02-04-2011, 6:49 AM

    Colin:

    Sanman,

    Is that disk or tape speed?

    That is throughput on the media agent to mag library (disk), but we had a lot of gear to get there.

    The media agent was 8 x 3 GHz CPU with 16 GB RAM, and it had 60 x 7K SATA drives locally in 2 RAID 6 logical drives. I would have numerous streams and jobs going concurrently, the main job being Fibre Channel attached for my 4 TB of VMware VADP backups. We eventually split the MA into two 8 x 2.4 GHz, 12 GB RAM systems, and they now feed the same disk that is on the file server. We noticed some disadvantages to this, as now there is an additional network hop. We still get some good throughput, but as I have said before, the bottleneck for us is when we aux copy all that deduped data to tape to take it offsite: we get maybe 25% of that throughput, due to several factors, namely the rehydration of the data and the huge fragmentation of the data CV writes to disk. You would think the backup would suffer, but the OS and array controller buffer the writes very well; when the data is scattered all over the drives there is no way to cache/buffer that. We have played with every setting from segment size and write blocks to NIC buffers to optimize our backups. If you never have to rehydrate your data for an aux copy then CV dedupe is not bad, but if you have to rehydrate and aux copy that data it is painful. For now we have redirected our NDMP straight to tape until our Data Domain is set up, because 4 TB would take maybe 21 hours to back up (many other jobs running concurrently), but the rehydration and aux copy would take 2-3 days!!!
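    The fragmentation effect described above can be put into rough numbers: when every deduplicated chunk read pays a head seek, effective throughput collapses even though the array's sequential rate is unchanged. A back-of-envelope model (the 200 MB/s, 128 KB chunk and 8 ms seek figures are illustrative assumptions, not CV internals):

```python
def effective_read_gb_per_h(seq_mb_per_s, chunk_kb, seek_ms):
    """Read throughput when each chunk_kb-sized chunk costs one seek of
    seek_ms on a disk that streams at seq_mb_per_s sequentially."""
    chunk_mb = chunk_kb / 1024.0
    per_chunk_s = chunk_mb / seq_mb_per_s + seek_ms / 1000.0
    return (chunk_mb / per_chunk_s) * 3600.0 / 1024.0   # MB/s -> GB/h

print(round(effective_read_gb_per_h(200, 128, 0)))   # 703 GB/h: pure sequential
print(round(effective_read_gb_per_h(200, 128, 8)))   # 51 GB/h: one seek per chunk
```

    That order-of-magnitude drop is consistent with the aux copy rates of a quarter of backup speed reported in this thread.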

  • Re: DeDupe causing slow backups
    Posted: 02-14-2011, 3:48 AM

    It must be the disk</sarcasm>

    :)

    Seriously though, we have had some major throughput problems with our deduplication setup.

    To begin with, we had a primary deduplicated copy via two MediaAgents (each with 2 x 3 GHz Intel and 8 GB RAM) to two separate Sun 6140 SANs sporting around 45 TB total. We then ran into problems auxiliary-copying data to tape, with ridiculous speeds between 30 and 60 GB/hr in some cases.

    After enough cries from support about hardware not being up to spec, I replaced the lot. The CS is now a dual X5660 Nehalem, with a 4-spindle SAS2 RAID 10 for the database and a 4-spindle SAS2 RAID 10 for the index cache too. The DDB manager is a dedicated dual X5660, with 24 GB of RAM, and 2 x 4-spindle SAS2 RAID 0 for the DDBs. The data movers are still Dell PE2900s with dual 3.0 GHz Intel CPUs, each sporting 4 x 4Gb FC HBAs, two dedicated to the tape library, two for magnetic.

    At some point (forgive me, this is all a bit blurry!) we decided to try a secondary magnetic copy. We put in a dual X5660 with 24 GB of RAM and 114 TB of raw direct-attached near-line SAS (over 4 trays with SAS2 controllers). Copy performance isn't much (or any?) better on that either. No disk queue, low atime during auxiliary copy, but CV takes 2 minutes to read a chunk (or so I'm told...)

    So, six months (or more) down the track, we're back to "It must be the disk"... And V9 will be much less read intensive...

    Frustrating to say the least.

  • Re: DeDupe causing slow backups
    Posted: 02-14-2011, 4:44 AM

    Colin,

    You never gave any detail on your increase in performance. Are you able to share the changes you made to improve your copy speeds?

    Wayne

  • Re: DeDupe causing slow backups
    Posted: 02-14-2011, 7:02 AM

    While we have been mixed overall on CV, I will say that when we switched from Veritas to CV we noticed an immediate 25%-30% drop in throughput on the exact same hardware. We worked with CV for months with no resolution, and we gave up. My overall impression is that CV's claim to fame is usability and reporting, as well as usually fast adoption of new technologies. CV performance is a crap shoot, and dedupe, while nice on paper, has been a pain for anything beyond the few TB of backup data in our remote locations. If NetBackup became more user friendly it would gain back the market share it lost to CV, because it screamed and I would get bug fixes in a few days.

  • Re: DeDupe causing slow backups
    Posted: 04-15-2011, 12:38 PM

    Zahid:
    Thanks sanman, that explains a lot.

    Two key elements to deduplication performance are I/O isolation for the DDB (or SIDB) and I/O distribution across the magnetic library paths (or mount paths).

    I/O Isolation for DDB

    In addition to fast disk, it is important to ensure that the DDB I/O path is not shared by any other activity for maximum performance. The best way to do that is to create separate RAID groups for each volume that needs to be presented to the DDB Managers (MA1 and MA2). In our example setup we have the 15K RPM spindles configured as follows:

    RAIDGROUP 1: 2 spindles, RAID 1. This volume contains the Windows OS.
    RAIDGROUP 2: 2 spindles, RAID 0. This volume contains the OS pagefile.
    RAIDGROUP 3: 4 spindles, RAID 0. This volume contains the DDB (or SIDB).

    By creating RAID groups in this manner, you are taking advantage of multiple channels on your FC card and ensuring that the I/O for the OS is routed on a different channel than the I/O for the DDB. The MA in this setup is Windows 2008 R2 x64, 2 quad-core CPUs, 16 GB RAM. With this we have seen sustained backup throughput of 2.5 to 3 TB/hr per MediaAgent on real-world datasets.

    I think the fact that all the 15K RPM spindles are on a single RAIDGROUP, and that the OS, pagefile, DDB and index cache all share it, may be hindering your performance. If the index cache needs to be collocated, I would put it on the same volume as the pagefile. You should avoid putting anything else on the DDB volumes. If your policies allow it, I would even suggest turning off anti-virus on the DDB volume. Also, as AaronA noted earlier, anything other than RAID 0 will not drive peak performance. You can use RAID 10 or RAID 5 on the DDB volume for additional comfort, but that will come at the expense of performance.

    I/O Distribution across mount paths

    This may be the reason why your Auxcopy performance is slow. Remember that when you deduplicate data, you will read data that could be scattered across the volume. A key consideration here is to distribute the read and write workload across the different volumes equally. Also, the more LUNs you have on a single RAIDGROUP, the more burden you place on the spindle heads. In our example setup, we configured 7 x 1TB RAID 5 RAIDGROUPs and two LUNs per RAIDGROUP of about 3 TB each. We presented these LUNs as mount points (not drive letters), which were then configured as mount paths on the disk library. We also had the disk library set up to do "spill and fill" to distribute I/O equally across these volumes. We were able to attach 64 TB of usable disk behind a single MA. Assuming a conservative 10:1 dedupe ratio, that translates to 640 TB of logical data behind a single MA. When we set up the disk target using these principles, our Auxcopy-to-tape speed was about 70-80% of the speed when reading from non-dedupe disk.

    In your case, I suggest 15 x 1TB RAID 6 GROUPs and 3-4 LUNs per RAIDGROUP. If you are worried about wasting spindles because of smaller RAIDGROUPs, then RAID 5 will do as well, since you are creating a second copy on tape anyway for added protection. With this you get 12-16 mount points on which you can perform "spill and fill". For optimal use, I suggest you set the cumulative number of streams across all storage policies to be a multiple of 12-16 as well, to ensure equal distribution.

    You should also check to see if data aging jobs are running at the same time as Auxcopy. While normally there is no harm running the two simultaneously, sometimes when your retention settings are low, Auxcopy could act as a trigger for data aging as well, increasing the read burden on the disk library spindles. This may be something that Colin may want to look at as well for Auxcopy optimization.

    As for Data Domain or any appliance approach, there really is no secret sauce in my opinion. If you peel under the layers of any appliance, you will see that a lot of the performance claims derive from the disk RAID configuration and optimal stream distribution across different volumes. Typically, in the larger appliances, you will find 16 x 1TB RAID 6 with multiple LUNs per RAIDGROUP and streams that distribute equally across these LUNs. Even there, Auxcopy will run 50% slower than when using non-dedupe disk as source, because of rehydration. The downside to this closed appliance approach is that you still need MediaAgents powerful enough to drive performance. Since deduplication occurs in the appliance, you are forced to write everything from the MA to the appliance, requiring thicker pipes. Why not take full advantage of these MA servers and have them do deduplication as well? By reducing data on the MA, you are reducing the data transferred between the MA and the disk target. More importantly, you get the benefit of incrementally adding disk capacity, versus having to buy full shelves when you only need a few TBs. Moreover, if the appliance is full, you have no choice but to buy another one of larger size. So yes, an appliance may simplify life initially for the first few weeks, but pretty soon you will find yourself facing the same issues that you were having before you switched over to a backup-to-disk solution. Would love to continue this discussion, so keep the comments rolling in.
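    The "streams as a multiple of mount paths" rule quoted above is round-robin arithmetic: if spill-and-fill hands each new stream to the next mount path in turn (an assumption about its behavior, not CV's documented algorithm), any stream count that isn't a multiple of the path count leaves some paths busier than others. A sketch:

```python
from collections import Counter

def spill_and_fill(num_streams, mount_paths):
    """Assign streams to mount paths round-robin; return load per path."""
    load = Counter({p: 0 for p in mount_paths})
    for s in range(num_streams):
        load[mount_paths[s % len(mount_paths)]] += 1
    return load

paths = ["mp%02d" % i for i in range(12)]
print(sorted(spill_and_fill(24, paths).values()))   # 24 streams on 12 paths: 2 each
print(sorted(spill_and_fill(30, paths).values()))   # 30 streams: six paths at 2, six at 3
```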

    Zahid I just wanted to say thank you for this fantastic post.

    I'm planning on rebuilding our MA next week and I've copied and pasted this as a bit of a reference on how to do it.

    I do have one question (for now :)) - is there a performance reason for using mount paths over drive letters for the maglib mount paths, or is it just because you start running out of drive letters?

    Also (OK, so two questions), our maglib storage array is an EqualLogic PS6000, so it's a single large RAID 50 set (16 x 1TB) - from a performance perspective, would you suggest carving it up into, say, 5 x 2TB volumes or 10 x 1TB volumes, and if there is a preference, why? Just so I understand it.

    I've only briefly experimented with aux copy of deduped data under 9.0 SP1b, but I did find that the read-ahead registry key seemed to make a big difference so long as multiplexing meant several streams were being pulled.

    Thanks,

    Paul

  • Re: DeDupe causing slow backups
    Posted: 04-15-2011, 1:09 PM

    Paul Hutchings:

    Zahid I just wanted to say thank you for this fantastic post.

    I'm planning on rebuilding our MA next week and I've copied and pasted this as a bit of a reference on how to do it.

    I do have one question (for now :)) - is there a performance reason for using mount paths over drive letters for the maglib mount paths, or is it just because you start running out of drive letters?

    Also (OK, so two questions), our maglib storage array is an EqualLogic PS6000, so it's a single large RAID 50 set (16 x 1TB) - from a performance perspective, would you suggest carving it up into, say, 5 x 2TB volumes or 10 x 1TB volumes, and if there is a preference, why? Just so I understand it.

    I've only briefly experimented with aux copy of deduped data under 9.0 SP1b, but I did find that the read-ahead registry key seemed to make a big difference so long as multiplexing meant several streams were being pulled.

    Thanks,

    Paul

    Hi Paul,

    Spot on with the mount paths over drive letters. We are increasingly seeing folks attaching up to 50 mount points behind a single dedupe mediaagent and it is easier to configure disk libraries that way.

    On your question around volume size, EqualLogic is fairly unique in the sense that the controller is able to shift I/O resources appropriately regardless of the LUN layout. However, as a rule of thumb, the number of mount paths configured should depend on the number of streams you want to drive with the MediaAgent. The more mount paths you have, the easier it is to distribute Storage Policy streams across the different LUNs, leading to better performance. On the flip side, on some systems too many LUNs on a single RAID group may actually hamper performance, due to excessive drive head movement as multiple streams direct writes across the LUNs on the same RAID group.

    Another consideration is drive maintenance operation. If at some point you need to run drive maintenance tools like defrag, it may be faster to do so on smaller volumes than on larger volumes.

    We have found that 2 TB is the ideal size for mount paths, especially if you are going to be adding a lot more capacity.

    For your particular case, the EqualLogic controller is smart enough to optimize I/O regardless of how the volumes are configured. Be sure to enable the "Spill and Fill" option on the disk library to ensure all mount paths are written to equally.

    Zahid

  • Re: DeDupe causing slow backups
    Posted: 04-15-2011, 1:47 PM

    Thanks Zahid, 2tb fits in quite nicely with what I had in mind.

    I'm not an expert on the EqualLogic, but I presume there's little benefit in thin provisioning? I say that because they're mount paths - they're going to fill up, it's "when" rather than "if", so you'd be stupid to overprovision, and therefore there's little point in thin provisioning them in the first place?

    Until we either a) replace the MA server "head" or b) see a performance impact that gives us no choice, we're going to be left hosting the SIDB on a volume on the Equallogic as well - I guess that other than potentially slowing down our backups (D2D with dedupe should hopefully still be a big improvement over backup to tape) there is no major issue with this?

    I'd love to stick in a big fat RAID 0/RAID 10 just for the DDB, but this box didn't get specced as a dedicated dedupe MA; it started life just to back up to tape. Bolting on this EqualLogic is very much an afterthought, thanks to a great deal to get us onto CLA.

    Cheers,
    Paul 
