AWS Cloud Configuration

Last post 09-01-2020, 2:27 PM by TechnoSide2. 8 replies.
  • AWS Cloud Configuration
    Posted: 07-08-2020, 7:32 PM


    We are planning to start sending our secondary copies to the AWS Cloud, and it was recommended that we use a combined storage class, consisting of S3-IA/Glacier storage, to accommodate all our data.  We have several hundred TBs of data that will need to be copied, and we will very infrequently have a need to restore from it.  The vast majority of our restores will be satisfied by the primary on-premises copy.  It was also recommended that we host the MA/DDB in the cloud to help lower costs.  Does this sound like a reasonable configuration based on our needs?
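    For context, here is the back-of-envelope math I've been doing. All prices and the metadata fraction are placeholder assumptions on my part, not actual AWS rates:

```python
# Rough monthly storage-cost sketch for secondary copies in AWS.
# PRICE_* values are placeholder assumptions (USD per GB-month), NOT current
# AWS rates, and metadata_fraction is an assumed split, not a measured one.

PRICE_S3_IA = 0.0125      # assumed S3 Standard-IA price
PRICE_GLACIER = 0.004     # assumed S3 Glacier price

def monthly_cost(total_tb, metadata_fraction=0.05):
    """Compare keeping everything in S3-IA vs. a combined class where only
    a small metadata fraction stays in S3-IA and the bulk goes to Glacier."""
    total_gb = total_tb * 1024
    all_ia = total_gb * PRICE_S3_IA
    combined = (total_gb * metadata_fraction * PRICE_S3_IA
                + total_gb * (1 - metadata_fraction) * PRICE_GLACIER)
    return all_ia, combined

all_ia, combined = monthly_cost(300)  # e.g. 300 TB of secondary copies
print(f"All S3-IA: ${all_ia:,.2f}/month")
print(f"Combined:  ${combined:,.2f}/month")
```

    Even if the exact numbers are off, the point is that the combined class only pays off if the Glacier-resident bulk stays put, which is why the re-baseline question below matters to us.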

    Commvault recommends that we seal the DDB every 8-12 months.  Can someone please explain in simple terms why the DDB seal is necessary?  I am concerned because with each seal we will be forced to re-baseline all our data (hundreds of TBs), and this could take a very long time.

    According to the documentation, micro-pruning is supported with a combined storage class.  Does this mean that jobs will be aged/pruned from the Glacier storage object?  I thought one of the reasons we had to perform the DDB seal was to facilitate pruning, since pruning wouldn't occur without the seal?


    - Bill

  • Re: AWS Cloud Configuration
    Posted: 07-09-2020, 2:08 PM

    Yes - it is recommended to have a MediaAgent (MA) in the cloud, as it will help with doing synthetic fulls vs. getting charged unnecessary egress costs if you don't have one.


    As for the DDB seal, while it is a best practice, it will honestly just cost more if you don't depend on the copy for actual restores.

    Micro-pruning is supported for live tiers (S3, S3-IA) and is not supported in the Glacier tiers.

  • Re: AWS Cloud Configuration
    Posted: 07-12-2020, 7:04 PM


    Yes - it is recommended to have a MediaAgent (MA) in the cloud, as it will help with doing synthetic fulls vs. getting charged unnecessary egress costs if you don't have one.

    BillyK wants to send secondary copies to the AWS Cloud, so won't the chunk-metadata-read-intensive synthetic full operation be performed on the primary copy, thus avoiding the egress cost?


    As for the DDB seal, while it is a best practice, it will honestly just cost more if you don't depend on the copy for actual restores.

    Indeed, because deduplicated Combined Storage tiers require a recall of the deduplicated volumes from archive.  The only exception to this is Windows/Unix File System backups (V2 Indexed): for these there is no gain from sealing the deduplication databases, because the entire deduplication database does not need to be recalled for the restore.


    Micro-pruning is supported for live tiers (S3, S3-IA) and is not supported in the Glacier tiers.

    Micro-pruning seems to work best in the hot tier.  I've yet to see micro-pruning support S3 commodity storage.

  • Re: AWS Cloud Configuration
    Posted: 07-14-2020, 6:47 PM

    We have a new workflow on the Commvault Store which supports restores for all agents, so there is no need to use the command-line method, which recalls all data in the DDB.  We have already updated our documentation for versions 11.16 to 11.19, and from 11.20 onward this workflow is made available by default on the CommServe.

    About which method to use, combined storage class (S3-IA/Glacier) vs. direct archive storage class (Glacier): if restores from the archive copy are very infrequent, then I would recommend using the direct archive storage class itself.

    For the direct archive storage class you need to seal the DDB frequently (every 6-12 months), since we can't do micro-pruning and sealing is the only way to prune data, via macro-pruning.  For the combined storage class, micro-pruning is supported, but it operates not on each individual dedupe block but on 8 MB dedupe block files, so there is a chance that some data may not get pruned if any dedupe blocks in a file are still being referenced by new jobs.  To cover these cases it's recommended to seal the DDB at a longer interval (12 months) than you would use for the direct archive storage class.
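    The container-granularity point above is the key one, and can be sketched with a toy model (this is an illustration only, not Commvault internals):

```python
# Toy illustration of container-level micro-pruning: a dedupe container file
# can only be deleted when *every* block inside it is unreferenced, so one
# still-referenced block keeps the whole 8 MB file alive.
# Container and block names here are made up for the example.

def prunable_containers(containers, referenced_blocks):
    """containers: {container_id: set of block ids stored in that file}.
    A container is prunable only if none of its blocks is still referenced."""
    return {cid for cid, blocks in containers.items()
            if not (blocks & referenced_blocks)}

containers = {
    "SFILE_1": {"b1", "b2", "b3"},
    "SFILE_2": {"b4", "b5"},
    "SFILE_3": {"b6"},
}
# Jobs referencing b1..b3 and b6 have aged off, but a new job still uses b4.
still_referenced = {"b4"}

print(sorted(prunable_containers(containers, still_referenced)))
# SFILE_2 survives because b4 is still referenced, even though b5 is not.
```

    Sealing the DDB eventually clears such stragglers, because once no new job references the sealed store, whole containers age off together.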

    Please refer to documentation for more details about the new workflow.

  • Re: AWS Cloud Configuration
    Posted: 07-14-2020, 8:05 PM

    I read about these changes just yesterday. :) The "Cloud Storage Archive Recall V2" workflow for 11.16 -> 11.19 is a very recent addition that deprecated the "Cloud Index File Recaller" workflow (along with the removal of all the documentation on that deprecated workflow).  The "Cloud Storage Archive Recall" workflow (not to be confused with the V2 naming convention, which only applies to prior Feature Releases) is baked into 11.20; support for On-Demand Recall has significantly increased and no longer requires the command-line recall method.

    It would be welcome if nice changes like this made it into bulletins like the Feature Release 11.20 newsletter, so customers can change their practices accordingly.

    Prasad, as for the recommendation to write very infrequently accessed data direct to the Archive Storage Class instead of Combined: the "Public Cloud Architecture Guide for Amazon Web Services" for 11.19 states "Do not use S3 Glacier/Deep Archive storage directly, instead use Commvault combined storage classes".  Has Commvault recently updated the way data is managed in Cloud Archive storage to make writing direct to Archive/Glacier feasible?

  • Re: AWS Cloud Configuration
    Posted: 08-28-2020, 9:32 AM

    We too want to move to S3-IA/Glacier storage for long-term archive.  I have a simple question on how to set this up; the documentation doesn't seem to call that out clearly.  Is it as simple as choosing the S3-IA/Glacier storage class when setting up the library?  What are the requirements on the AWS side?  Is this a special type of bucket that my AWS team needs to set up?  I too am concerned about the every-8-10-months seal of the DDB.  We have hundreds of TBs as well that we will be sending to the cloud.  I want to take advantage of the combined storage class to save on cost.

    Thanks in advance!

  • Re: AWS Cloud Configuration
    Posted: 08-29-2020, 4:43 PM

    Yes, you are right.  When configuring the AWS library, select the storage class you need.

    From the AWS side, you need the account; with that information you are able to create the S3 cloud storage library and bucket from Commvault.  If a bucket has already been created on the AWS side, you can specify that bucket during the configuration.

    The document below should help with the seal DDB question:





  • Re: AWS Cloud Configuration
    Posted: 09-01-2020, 8:28 AM

    Hi all,

    Additional note - depending on the storage class selected during the cloud library creation, specific APIs are used to ensure the backup data is written to the right tier.

    So within Commvault, if you create the cloud library as S3 IA/Glacier (Combined Tier Storage), only the CHUNK metadata will be written to S3 IA, and the remaining SFILE containers will be written to Glacier.

    Also, with the latest Cloud Storage Archive Recall (V2) in conjunction with S3 IA/Glacier (Combined Tier Storage), a restore will access the CHUNK metadata to identify the associated vectors and only recall the required SFILEs (which removes the need to first recall the CHUNK metadata from archive and then figure out which additional SFILEs need to be recalled).
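    Conceptually, the selective recall above looks something like this (a sketch with a made-up metadata layout, not the actual Commvault on-disk format):

```python
# Sketch of selective recall: the CHUNK metadata living in S3-IA maps each
# dedupe block to the Glacier SFILE container holding it, so a restore only
# recalls the containers that actually contain the requested blocks.
# Block ids and SFILE names below are hypothetical examples.

chunk_metadata = {
    "b1": "SFILE_10",
    "b2": "SFILE_10",
    "b3": "SFILE_11",
    "b4": "SFILE_12",
}

def sfiles_to_recall(blocks_needed, metadata):
    """Return only the archive containers required for this restore."""
    return {metadata[b] for b in blocks_needed}

print(sorted(sfiles_to_recall({"b1", "b2", "b3"}, chunk_metadata)))
# → ['SFILE_10', 'SFILE_11']  (SFILE_12 is never recalled)
```

    Because the lookup happens against metadata in the live tier, no archive recall is wasted on figuring out what to recall.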

    Here are more details on Cloud Storage Archive Recall (V2) Workflow - 



  • Re: AWS Cloud Configuration
    Posted: 09-01-2020, 2:27 PM

    Some more thoughts from my side.  The documentation seems to recommend that the most cost-effective option is S3 IA only.  If we have to reseal the DDB every 6 months, wouldn't a new set of FULL/VSF blocks need to be sent up to IA/Glacier every 6 months when the new DDB is created, hence not saving you much money?
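    Rough math on that overlap, with numbers I've picked purely for illustration (and a deliberately crude "full overlap" assumption, since the sealed baseline is held alongside the new one until retention expires):

```python
# Back-of-envelope sketch of the re-baseline overlap after a DDB seal.
# Assumption (mine, simplified): after each seal a fresh full baseline is
# written while the sealed store keeps its data until retention expires,
# so both copies are stored simultaneously for a stretch of time.

def overlap_tb_months(baseline_tb, retention_months, seal_interval_months):
    """Extra TB-months of storage per seal cycle from holding the old
    sealed baseline alongside the new one (crude full-overlap bound)."""
    overlap_months = min(retention_months, seal_interval_months)
    return baseline_tb * overlap_months

# e.g. 300 TB baseline, 12-month retention, sealing every 6 months:
print(overlap_tb_months(300, 12, 6), "TB-months of duplicated storage per seal")
```

    However you tune the numbers, frequent sealing eats into whatever the cheaper tier saves, which is why I'm questioning the 6-month cadence.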

    My scenario: I'm already on S3 IA storage, but I'm looking to create a new library with S3 IA/Glacier to save on cost.  Some of the aux-copied data currently lives only on the S3 IA storage.  So I'll have to set up a new aux copy from this source S3 IA library to the new library that has the combined storage class S3-IA/Glacier.  The data will be very infrequently accessed, and only used for extended-retention purposes.

    If my AWS team has already created the new bucket, do I need to have them scratch it and let Commvault create the bucket?  Or will the Commvault API know how to use this bucket and send the data to Glacier?

    I'm just trying to see if it'll ultimately save us on cost or not. 

    I'm no expert in this AWS storage and I appreciate all the additional conversations. 

The content of the forums, threads and posts reflects the thoughts and opinions of each author, and does not represent the thoughts, opinions, plans or strategies of Commvault Systems, Inc. ("Commvault") and Commvault undertakes no obligation to update, correct or modify any statements made in this forum. Any and all third party links, statements, comments, or feedback posted to, or otherwise provided by this forum, thread or post are not affiliated with, nor endorsed by, Commvault.