Our largest issue with Commvault at the moment is getting pruning to play nicely with DASH copies/backups. If pruning runs long and overlaps the window in which backups/DASH copies operate, both slow down drastically and cascade into a slow mess of pruning and DASH copy backlog and queued jobs.
I am really sorry about the issues you are experiencing with Commvault, and I hope my answers assist you in any way possible.
a.) The three main Commvault operations that make intensive use of IOPS on the Media Agents are Synthetic Full jobs, Aux Copy jobs, and Data Aging jobs (Media Agent physical pruning).
b.) If any of these three schedules collide, performance of both Aux/DASH Copy jobs and pruning jobs will degrade, with Synthetic Full jobs taking the highest precedence, then Aux Copy, then Data Aging.
c.) Commvault provides a means to speed up the physical pruning threads on Media Agents; however, this consumes high IOPS, as more Media Agent resources are utilised during the operation.
Please add the Additional Key to your Media Agent:
MediaAgent --> DedupMaxDiskZerorefPrunerThreadsForStore
Type --> DWORD (Integer)
Note*- For Disk Libraries only - increases pruning throughput on the Media Agent. Default value is 3 (start from a low value and work up to a higher one).
Helps generate more physical pruning threads when magnetic pruning is perceived to be slow.
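On a Windows Media Agent, an additional setting like this can also be created directly in the registry. The hive path, the instance name "Instance001", and the value of 6 below are assumptions for a default install, not taken from this thread - verify the instance name on your own deployment and raise the value gradually from the default of 3:

```shell
reg add "HKLM\SOFTWARE\CommVault Systems\Galaxy\Instance001\MediaAgent" /v DedupMaxDiskZerorefPrunerThreadsForStore /t REG_DWORD /d 6 /f
```

Restarting Commvault services on the Media Agent after changing the key is a sensible precaution so the new thread count is picked up.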
We've got local SSDs with Q&I times ~300 µs, and finalize times generally from under 1 second up to 5 seconds. When pruning runs alongside DASH copies/backups, both of the above can spike drastically.
Yes, this is expected, as both DASH Copies and pruning are CPU-intensive operations. The best recommendation is to stagger the schedules so that the two operations don't run at the same time.
Also, what is the Front End Size of the data being backed up? The average Q&I time should be under 200 milliseconds.
Please refer to the Hardware Specifications for Deduplication Mode and ensure the Media Agents hosting the DDBs meet the hardware specs:
We've got auto-kill set on our DASH copies before DDB backups and before the main backup window kicks off. Operational windows only go so far, particularly if you've already got a backlog from a DDB recon; once a backlog has built up I find I need to use every available hour to get it back down again, and operational windows can result in wasted time not doing something that needs doing. (Also, SMM_DONOTPRUNE only affects data aging, not pending deletes already sent out to the MAs.)
Ideally we'd like more control and reporting of pruning. Currently the most information we can get from the GUI is the slope of the graph.
While I can grep finalize times from SIDBPrune and track current pending deletes per MA & the size thereof from SIDBEngine, this isn't enough information to make changes and see if something is helping or hindering.
The previously provided additional key DedupMaxDiskZerorefPrunerThreadsForStore will help increase the number of pruning threads. However, if something is hindering pruning, the SIDBPrune.log and SIDBPhysicalDeletes.log will show us errors that could be affecting physical pruning on the Media Agent.
At the moment, the pending delete records are only visible via the SIDBStore in the CommCell Console, the graph, and SIDBEngine.log; we cannot see the application size of pending physical deletes at this stage.
Stuff that'd be useful:
Add the size (TB) of the pending deletes per GDSP and MA to the GUI somewhere.
Please raise a Commvault Support case to see if a Customer Modification Request could be raised for this feature.
Add pending deletes and size reclaimed per hour counters to either the commserve or private reporting server (both preferred).
Please raise a Commvault Support case to see if a Customer Modification Request could be raised for this feature.
Alternatively, you can change the "Interval between disk space check" setting to 15 (Control Panel --> Media Management --> Interval between disk space check).
Then right-click the Library --> Properties tab, and check the amount of disk space reclaimed.
Does CV support have a parser that grabs all the "Avg" times from SIDBPrune.log and does something useful with them (graphs, or groups values by hour)? If so, this would be good to release to customers.
Yes, we do have a counter that allows analysis of the Avg times in SIDBPrune.log, but it is not used for graphing. Please enable the setting below on your Media Agent and it will analyse the average finalize time to prune a chunk of data in your CVMagnetic Library:
Increase Deduped Counters on your Media Agent
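In the meantime, the per-hour grouping described above can be approximated with a short awk sketch. The sample lines and their layout below are hypothetical, not the real SIDBPrune.log format, so the regex and field positions will need adjusting to match the actual log on your Media Agent:

```shell
# Hypothetical SIDBPrune-style sample lines for demonstration only.
cat > /tmp/sidbprune_sample.log <<'EOF'
4520 1a2c 06/21 14:03:22 PruneChunk Avg [100] ms
4520 1a2c 06/21 14:41:09 PruneChunk Avg [300] ms
4520 1a2c 06/21 15:02:51 PruneChunk Avg [250] ms
EOF

# Bucket the bracketed "Avg" values by date+hour and print per-hour averages.
result=$(awk '
  match($0, /Avg \[[0-9]+\]/) {
    v = substr($0, RSTART + 5, RLENGTH - 6)   # digits inside the brackets
    h = $3 " " substr($4, 1, 2)               # date + hour bucket
    sum[h] += v; n[h]++
  }
  END { for (h in sum) printf "%s:00 avg=%.1f n=%d\n", h, sum[h]/n[h], n[h] }
' /tmp/sidbprune_sample.log | sort)
printf '%s\n' "$result"
```

Running this against a real log (swap the sample path for your SIDBPrune.log) gives a quick way to see whether a tuning change moved the hourly averages in the right direction.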
Also, while we're on a similar topic: the non-error "Unable to acquire the Lock to File" for pruning on partitioned GDSPs. I thought AFIDs to prune were unique to an MA, so why are multiple MAs trying to prune the chunk of a stream at the same time?
As you already know, the SFILE.idx is locked so that two pruning MAs do not attempt to prune the same chunk at the same time. So, if there is already one MA that has locked an SFILE.idx, any other MAs trying to prune that chunk at the same time will hit this error.
However, when it comes to pruning, multiple Media Agents can prune as long as they have read/write access to the library's mount paths, regardless of whether a mount path is associated with a DDB. You can set each Media Agent to be a pruner for a particular library, but you cannot set a particular Media Agent to prune based on DDB. AFIDs are unique to the DDB, but pruning happens on the Media Agent, which is why all Media Agents try to prune at the mount path/library level and not at the DDB level for data written to libraries.
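The benign contention described above can be illustrated by analogy with flock(1). This is a generic sketch of advisory-lock behaviour, not Commvault's actual locking code: two "pruners" race for the same index file's lock, and the loser simply skips the chunk and moves on:

```shell
# Two simulated pruners contend for one advisory lock on a demo file.
LOCK=/tmp/sfile_demo.lock

# "MA1" takes the lock first and holds it for a second while it works.
( flock -n 9 && sleep 1 && echo "MA1: lock held, chunk pruned" ) 9>"$LOCK" &
sleep 0.2   # give MA1 time to acquire the lock

# "MA2" tries non-blocking; failing to get the lock is harmless.
result=$( ( flock -n 9 && echo "MA2: chunk pruned" \
            || echo "MA2: lock busy, skipping (benign)" ) 9>"$LOCK" )
wait
printf '%s\n' "$result"
```

The second pruner's failed attempt here is the analogue of the "Unable to acquire the Lock to File" message: noise from a race that the design expects and tolerates.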
To set your Media Agent to prune for a specific Library, please make use of the below command:
qoperation execscript -sn SetDeviceControllerProps -si operation -si LibraryName -si MountPathName -si MediaAgentName
*Note- The above setting applies to Cloud Libraries only, not to Disk Libraries.
I wish you all the best with Commvault. Please do let me know if you have any questions.