Deduplication with Extended Retention, Data Aging and Pruning insight

Last post 02-10-2020, 2:13 AM by RHor. 6 replies.
  • Deduplication with Extended Retention, Data Aging and Pruning insight
    Posted: 02-07-2020, 7:30 AM

    Hi All,

    I wanted to get a better understanding of how the Deduplication Database and data pruning are impacted by Extended Retention settings.
    The documentation states that it is not recommended to use Extended Retention Rules on a deduplicated storage policy, as it may affect how data is pruned. The reason is that some of the extended retention jobs may hold on to unique data which would otherwise be aged along with the basic retention jobs.

    "More significant space would be reclaimed from the disk library only when the extended retention jobs are aged."

    This is perfectly understandable.

    Following on from that, the documentation recommends using a selective copy in case someone needs long-term retention. Now, this seems counter-intuitive.

    Let's assume that we are following best practice and we use
    DDB1 for daily copy (30Days Ret) then another
    DDB2 for Monthly Copy (1Y Ret) and then another
    DDB3 for Yearly Copies (6 Year Ret)

    In this example our DDBs hold the following amounts of unique blocks:
    DDB1 10TB
    DDB2 20TB
    DDB3 10TB
    This requires 40TB of disklib space. Data in different DDBs is not deduplicated against each other, so from a disklib used-space point of view it would be better to keep everything in 1 DDB.

    If we assume that there are:
    5TB in DDB2 that would become secondary blocks in DDB3 and
    5TB in DDB1 that would become secondary blocks in both DDB2 and DDB3
    then aux copying all of this data to a single DDB4 would in theory take only 30TB of disk space on the disklib, which seems to me a strong pro. DDB4 would be bigger than DDB1, DDB2 or DDB3, but given the sizes it will still be way under the DDB max size, and as long as Q&I times are good there should be no issue.
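
    The 40TB versus 30TB comparison can be checked with a small set model. This is only a sketch: it assumes 1 block ID stands for 1TB and that the same 5TB of blocks appears in all three copies, which is one reading of the figures above that reproduces the 30TB total. The names (make_blocks, shared and so on) are made up for illustration.

```python
# Toy model: each block ID stands for 1 TB of unique data (illustrative assumption).
def make_blocks(prefix, count):
    """Return a set of `count` distinct block IDs with the given prefix."""
    return {f"{prefix}-{i}" for i in range(count)}

# Assumed: the same 5 TB of blocks is present in the daily, monthly and yearly copies.
shared = make_blocks("shared", 5)
ddb1 = shared | make_blocks("daily", 5)      # 10 TB of unique blocks
ddb2 = shared | make_blocks("monthly", 15)   # 20 TB of unique blocks
ddb3 = shared | make_blocks("yearly", 5)     # 10 TB of unique blocks

# Separate DDBs do not deduplicate against each other, so sizes simply add up.
separate_tb = len(ddb1) + len(ddb2) + len(ddb3)

# A single DDB stores every distinct block once, so its size is the union.
single_tb = len(ddb1 | ddb2 | ddb3)

print(separate_tb, single_tb)  # 40 30
```

    If the overlapping blocks were different sets rather than the same 5TB, the single-DDB total would change accordingly.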

    So the disk space required for holding jobs is a strong pro for using 1 DDB with mixed retention; this, however, is the opposite of what is recommended.

    So what I don't understand is
    "As a result, when basic retention jobs are aged, comparatively less space gets reclaimed from the disk library due to some of the unique blocks pertaining to extended retention jobs."

    Which is true of course, but with regard to disklib space it does not make sense, because creating another selective copy only creates duplicates of the same unique blocks, which consumes more space. The same holds for the other DDB with the selective copy: no space will be reclaimed there either.
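
    To make the quoted behaviour concrete, here is a toy reference-counting sketch. It is not how Commvault implements the DDB; it only assumes that a block can be pruned when no remaining job references it. The job names, block IDs and the reclaimable function are hypothetical.

```python
from collections import Counter

def reclaimable(jobs, aged):
    """Return the blocks that can be pruned once the jobs named in `aged` expire.

    `jobs` maps a job name to the set of block IDs it references.
    """
    # Count how many jobs reference each block.
    refs = Counter(block for blocks in jobs.values() for block in blocks)
    # Drop the references held by the aged jobs.
    for name in aged:
        refs.subtract(jobs[name])
    # Only blocks touched by aged jobs and now unreferenced can be freed.
    aged_blocks = set().union(*(jobs[name] for name in aged))
    return {block for block in aged_blocks if refs[block] == 0}

jobs = {
    "daily_1": {"a", "b", "c"},
    "daily_2": {"b", "c", "d"},
    "monthly_1": {"a", "b"},  # stands in for an extended retention job
}

freed = reclaimable(jobs, ["daily_1", "daily_2"])
print(sorted(freed))  # ['c', 'd']
```

    Blocks "a" and "b" stay on disk because the longer-retained job still references them, which is the "comparatively less space gets reclaimed" effect. In a separate DDB the same blocks would simply be stored a second time.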

    Despite all of the above I assume that the documentation is right about DDB and extended retention; it just doesn't explain why. What I'm looking for is a more in-depth view of how data aging and pruning work. Can anyone shed some light on the matter?

  • Re: Deduplication with Extended Retention, Data Aging and Pruning insight
    Posted: 02-07-2020, 10:11 AM

    Hi Robert,

    There are two points here

    - What you explained is basically a Global DDB for your secondary copies. It would certainly help save more space.

    - If you mix different data types, you lose some % of dedupe savings. So 3 DDBs will always offer a better savings ratio than 1 single huge DDB if you have different data types. In fact, from SP17, creating a new DDB will by default automatically create 3 DDBs to hold the different types of data (VMs, FS, Database).

    Hope this helps.



  • Re: Deduplication with Extended Retention, Data Aging and Pruning insight
    Posted: 02-07-2020, 11:26 AM

    Hi Bob,


    Thanks for the reply.

    You are right of course: if you mix different types of data you always get a worse dedupe %, because it is relative to the amount of data that you keep in the DDB, and I know about the new feature in SP17. This, however, is not the case in my post.

    What I'm trying to understand is how using extended retention rules impacts data aging compared to using selective copies to different DDBs, and why the recommendation is to use selective copies even though that will take more space than 1 single DDB. The documentation doesn't really explain this.

    For the sake of simplicity we can assume that all data in my example is VMs.

  • Re: Deduplication with Extended Retention, Data Aging and Pruning insight
    Posted: 02-09-2020, 5:09 AM

    That documentation regarding not mixing of short and long term retention by using Extended Retention does not apply if you are using DDB v4 Gen 2 technology (Garbage Collection through Mark and Sweep).  Commvault now state that each archive file is saved in a separate secondary table file and is deleted on a per-archive file basis upon expiration. This avoids any unwanted increase in DDB size, even when mixed retention is used. There is no need to separate short term and long-term data in a different deduplication database, which can lead to a complete re-baseline of the data.
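
    For anyone unfamiliar with the term, the general mark-and-sweep idea looks roughly like the sketch below. This is the textbook technique only, not Commvault's DDB internals (which, as noted, work on a per-archive file basis); the store layout, job names and mark_and_sweep function are hypothetical.

```python
def mark_and_sweep(store, live_jobs):
    """Mark every block referenced by a live job, then delete the rest.

    `store` maps block ID -> size; `live_jobs` maps job name -> set of block IDs.
    Returns the set of block IDs that were swept.
    """
    # Mark phase: collect every block still referenced by a retained job.
    marked = set()
    for blocks in live_jobs.values():
        marked |= blocks
    # Sweep phase: anything in the store that was not marked is removed.
    swept = set(store) - marked
    for block in swept:
        del store[block]
    return swept

store = {"a": 1, "b": 1, "c": 1, "d": 1}   # block ID -> size in hypothetical units
live_jobs = {"monthly_1": {"a", "b"}}      # only the long-retention job is still live

removed = mark_and_sweep(store, live_jobs)
print(sorted(removed), sorted(store))  # ['c', 'd'] ['a', 'b']
```

    The point of the sketch is that a periodic pass decides what is still needed, instead of updating per-block reference counts every time a job ages.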

    To upgrade to v4 Gen 2, the DDB must have been upgraded to Commvault 11, and if it is a Transactional DDB it must first be downgraded to a non-transactional DDB (the downgrade is done through a Commvault Support ticket). If you meet those criteria, the DDB can be upgraded after performing a compaction (this takes a few hours and backups cannot run during this time). After that, enabling Garbage Collection is as simple as ticking the box on DDB properties.

  • Re: Deduplication with Extended Retention, Data Aging and Pruning insight
    Posted: 02-09-2020, 6:40 PM

    Hi Anthony,


    That is some great news! From what I read in your post, the Garbage Collection feature has a great impact on how data aging with deduplication works, yet the documentation is very unclear about this:

    "The Garbage Collection feature optimizes the DDB performance by reducing the DDB disk IO during the data pruning process. With the Garbage Collection feature, the DDB disk is parsed once every 24 hours to mark the CommServe job records for pruning and the data aging process prunes these marked CommServe job records."

    I also have to admit that even though I work with Commvault on a daily basis and try to stay up to date, I had no idea that Garbage Collection is such a huge change! Even though, from what I read, this has been the default since SP14, they still throw a warning message at you when you try to apply extended retention.

    1) Is there any doc/KB/article where I can read some more about it?

    Also, I just jumped into two different CommCell environments (SP14 and SP18) and I don't see the Enable garbage collection and pruning log journaling options.

    2) Do I need to run a compaction on DDB first for this option to appear?

    3) And last but not least - any downsides of turning it on? Is it safe to run this on production?

  • Re: Deduplication with Extended Retention, Data Aging and Pruning insight
    Posted: 02-09-2020, 7:27 PM

    You won't be the first to have missed the memo. Whilst CV do put out quarterly newsletters for each SP, even those who follow the product closely can miss some of their best stuff like this, and they really should evangelise it more. To be honest, the only hitch I have seen so far is that on one occasion I had to repeat the secondary compaction a few times because it wasn't properly registering the completion of the operation, but it worked fine once I properly exited out of all the Commvault directories/applications on the MediaAgent. Compaction can take a few hours to run, but if you have a secondary copy with a good link then you can work from one copy whilst the other is being compacted.

    The settings you should have are found here and here. Are they not there?

  • Re: Deduplication with Extended Retention, Data Aging and Pruning insight
    Posted: 02-10-2020, 2:13 AM

    They really should!

    In SP18 there is no such option in the Media Management -> Deduplication tab.

    There is one, however, in DDB Properties (I was looking in the wrong place earlier). I guess they just removed the option to show this and now it always shows.

    Thank you Anthony, I will look into this and test it as soon as possible.
