HyperScale and Drill Holes = A lot of space occupied = Intensive Space Reclamation process needed

Last post 12-22-2018, 7:53 AM by Wwong. 6 replies.
  • HyperScale and Drill Holes = A lot of space occupied = Intensive Space Reclamation process needed
    Posted: 11-27-2018, 8:14 AM
    Hi,

    We have HyperScale and no one told us that this scale-out architecture seems to need
    Space Reclamation. Space Reclamation is a task "hidden" inside the Data Verification task.

    According to the note at https://documentation.commvault.com/commvault/v11/article?p=93142.htm:
    "To work around the drill holes limitation on HyperScale, automatically start a DV2 space
    reclamation job for scale-out configured mount paths."

    it seems that HyperScale does not support drill holes, hence micro-pruning does not free up
    blocks and space reclamation is needed. This space reclamation should start automatically,
    but I cannot see this automatic process taking place.
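
    For anyone wondering what "drill holes" actually is: on Linux it is the fallocate(2)
    call with FALLOC_FL_PUNCH_HOLE, which frees a block range inside a file without shrinking
    the file. Below is a minimal sketch of that primitive (assuming 64-bit Linux; the file
    name is made up). On a filesystem that does not support the flag, the call fails with
    EOPNOTSUPP, which is essentially the Gluster limitation being discussed here.

    import ctypes, os

    FALLOC_FL_KEEP_SIZE  = 0x01
    FALLOC_FL_PUNCH_HOLE = 0x02   # must be combined with KEEP_SIZE

    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_long, ctypes.c_long]  # off_t on LP64

    def punch_hole(path, offset, length):
        """Deallocate a byte range inside a file; reads of it return zeros."""
        fd = os.open(path, os.O_WRONLY)
        try:
            if libc.fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                              offset, length) != 0:
                # EOPNOTSUPP here means the filesystem cannot punch holes,
                # which is the limitation discussed in this thread
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
        finally:
            os.close(fd)

    # hypothetical file and byte range, purely for illustration
    punch_hole("/tmp/example.sfile", 4 * 1024 * 1024, 128 * 1024)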

    This is not a minor issue. For example, in our case we have written 115 TB to disk
    + 33% erasure coding overhead = 153 TB, but we have 271 TB occupied on disk (out of
    350 TB), so our HyperScale platform is beyond sane limits. There are about 118 TB
    that we cannot account for.

    The solution? Space Reclamation, a process you can launch at different levels
    (20%, 40%, 60%, and 80%); even the least aggressive level (80%) takes two days to complete.
    I think Space Reclamation should be an independent task, and customers should be warned
    about it whenever HyperScale is discussed.

    Of course I opened an incident with Commvault, but the community forum is always a good
    place to look for help. Am I misunderstanding something in what I have exposed here?

    Thanks a lot.
  • Re: HyperScale and Drill Holes = A lot of space occupied = Intensive Space Reclamation process needed
    Posted: 11-27-2018, 10:21 AM

    I am checking if this is correct.

    Regarding your space usage calculation, how many nodes are in a block?

    I am assuming 3 nodes looking at your calculation; the erasure coding would then be 4+2, which results in (115 TB / 4) * 6 = 172.5 TB in use. That leaves 271 - 172.5 = 98.5 TB which cannot be identified.
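
    For reference, here is that arithmetic as a quick sketch (a 4+2 disperse set stores 6 fragments on disk for every 4 data fragments):

    # Back-of-envelope check of the erasure coding overhead (4+2 disperse:
    # every 4 data fragments are stored with 2 redundancy fragments).
    written_tb   = 115.0           # front-end data written to the library
    data, parity = 4, 2            # Gluster disperse 4+2
    on_disk_tb   = written_tb * (data + parity) / data
    print(f"expected on disk: {on_disk_tb:.1f} TB")        # 172.5 TB
    print(f"unaccounted for:  {271 - on_disk_tb:.1f} TB")  # 98.5 TB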

    Most of the environments I see have an overhead of 10-15% due to not being able to perform hole drilling.
    Even accounting for erasure coding, your numbers would mean more than 10-15% overhead,
    so I am not sure you are only looking at effects caused by the lack of hole drilling.

    Will get back to you when I have an answer to your question regarding the usage of hole drilling.


    Jos Meijer
    Senior Technical Consultant
  • Re: HyperScale and Drill Holes = A lot of space occupied = Intensive Space Reclamation process needed
    Posted: 11-27-2018, 10:26 AM

    Just received confirmation that Gluster, which is used for the disk library, currently does not support hole drilling.


    Jos Meijer
    Senior Technical Consultant
  • Re: HyperScale and Drill Holes = A lot of space occupied = Intensive Space Reclamation process needed
    Posted: 11-27-2018, 10:37 AM

    Red Hat released Gluster Storage 3.4 a short while ago, which supports hole drilling, or as they call it, Punch Hole support. I am not sure in which SP release this will be implemented.

    But for now you are correct and you will have to run space reclamation.


    Jos Meijer
    Senior Technical Consultant
  • Re: HyperScale and Drill Holes = A lot of space occupied = Intensive Space Reclamation process needed
    Posted: 11-29-2018, 6:24 AM

    Hi,

    Thank you for your comments!

    We have 4+2 but with 6 nodes. I understand the same calculation applies: (115 TB / 4) * 6 = 172.5 TB.

    We launched a "Quick Verification of Deduplication Database". This task takes almost one hour to complete and generates a report in the file DDBMntPathInfo.log (note: without the "o", not as stated in http://documentation.commvault.com/commvault/v11/article?p=100399.htm). This file is generated on each Media Agent selected in the task (five in our case).

    Summing the 80% column of all "Reclaimable" lines across all MAs, the reclamation task should free up about 28 TB. And it's true: after launching the reclamation task at level 1 (= 80%, see http://documentation.commvault.com/commvault/v11/article?p=100399.htm), 27.7 TB were recovered. Following the progression 27.7 TB (80%) -> 33.1 TB (60%) -> 36.1 TB (40%) -> 38.2 TB (20%), we can estimate that at 0% the reclaimable space would be about 42 TB. So we can suppose there are 56.5 TB which cannot be identified (172.5 + 42 = 214.5, then 271 - 214.5 = 56.5 TB).

    MA1

    320649 4e489 11/26 09:54:26 2348772 Reclaimable: @ 20% [  2314598070204 (  2155.64 GB), 15.13%], @ 40% [  2174075216664 (  2024.77 GB), 14.21%]

    320649 4e489 11/26 09:54:26 2348772              @ 60% [  1996678856231 (  1859.55 GB), 13.05%], @ 80% [  1693216405721 (  1576.93 GB), 11.07%]

    MA2

    59134 e6fe 11/26 09:54:26 2348772 Reclaimable: @ 20% [ 19062449017312 ( 17753.29 GB), 15.59%], @ 40% [ 18008857018772 ( 16772.05 GB), 14.73%]

    59134 e6fe 11/26 09:54:26 2348772              @ 60% [ 16461588937894 ( 15331.05 GB), 13.46%], @ 80% [ 13735460137479 ( 12792.14 GB), 11.23%]

    MA3

    306623 4adbf 11/26 09:54:26 2348772 Reclaimable: @ 20% [  4646115277276 (  4327.03 GB), 15.26%], @ 40% [  4389793835701 (  4088.31 GB), 14.42%]

    306623 4adbf 11/26 09:54:26 2348772              @ 60% [  4035474081552 (  3758.33 GB), 13.26%], @ 80% [  3368023087733 (  3136.72 GB), 11.06%]

    MA4

    280224 446a0 11/26 09:54:26 2348772 Reclaimable: @ 20% [  8838102087792 (  8231.12 GB), 15.94%], @ 40% [  8391798588495 (  7815.47 GB), 15.14%]

    280224 446a0 11/26 09:54:26 2348772              @ 60% [  7745489481682 (  7213.55 GB), 13.97%], @ 80% [  6554864682325 (  6104.69 GB), 11.82%]

    MA5

    245541 3bf25 11/26 09:54:26 2348772 Reclaimable: @ 20% [  7148041261442 (  6657.13 GB), 16.42%], @ 40% [  6762634347911 (  6298.19 GB), 15.53%]

    245541 3bf25 11/26 09:54:26 2348772              @ 60% [  6197695591997 (  5772.05 GB), 14.23%], @ 80% [  5168525825806 (  4813.56 GB), 11.87%]

     

    It is important to remark that, looking at the reclaimable space at 20%, we have an overhead of 22% (38.2 * 100 / 172.5) as a consequence of drill holes not being supported on HyperScale.
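
    As a sanity check, here is a small sketch that sums the per-MA "Reclaimable" GB values exactly as quoted in the log excerpts above; it reproduces both the progression and the overhead figure (within rounding):

    # Summing the per-MA "Reclaimable" figures from DDBMntPathInfo.log,
    # taken verbatim from the five log excerpts above.
    reclaimable_gb = {            # threshold %: [MA1..MA5]
        80: [1576.93, 12792.14, 3136.72, 6104.69, 4813.56],
        60: [1859.55, 15331.05, 3758.33, 7213.55, 5772.05],
        40: [2024.77, 16772.05, 4088.31, 7815.47, 6298.19],
        20: [2155.64, 17753.29, 4327.03, 8231.12, 6657.13],
    }
    for level, values in sorted(reclaimable_gb.items(), reverse=True):
        print(f"@{level}%: {sum(values) / 1024:.1f} TB")
    # -> @80%: 27.8 TB, @60%: 33.1 TB, @40%: 36.1 TB, @20%: 38.2 TB

    # Overhead relative to the 172.5 TB that should be in use:
    print(f"overhead at 20%: {sum(reclaimable_gb[20]) / 1024 / 172.5:.1%}")  # about 22%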

    The file DDBMntPathInfo.log also shows some other interesting information:

    MA1

    320649 4e489 11/26 09:54:26 2348772 Blks Size (in Bytes) - Total [29759247681789], Valid [12876394023195], PhyRemoved [14464491798447], Dormant [2418361860147, 15.81%]

    MA2

    59134 e6fe 11/26 09:54:26 2348772 Blks Size (in Bytes) - Total [228165275241312], Valid [102567067344913], PhyRemoved [105895604788823], Dormant [19702603107576, 16.11%]

    MA3

    306623 4adbf 11/26 09:54:26 2348772 Blks Size (in Bytes) - Total [57331403490678], Valid [25620957499204], PhyRemoved [26892359648858], Dormant [4818086342616, 15.83%]

    MA4

    280224 446a0 11/26 09:54:26 2348772 Blks Size (in Bytes) - Total [108522909869856], Valid [46309834727598], PhyRemoved [53082951452161], Dormant [9130123690097, 16.47%]

    MA5

    245541 3bf25 11/26 09:54:26 2348772 Blks Size (in Bytes) - Total [83947325559037], Valid [36158805216303], PhyRemoved [40407979760693], Dormant [7380540582041, 16.95%]

     

    According to this, summing all numbers per MA, we have:

    461.7 TB Total, 203.3 TB Valid, 218.9 TB PhyRemoved, and 39.5 TB Dormant.

    I don't understand these numbers, because we don't have 461.7 TB of disk, and this figure matches neither our HyperScale back end nor its front end. If anyone can clarify these numbers and confirm whether my calculations are correct... Another question is about the automatic reclamation task: I can't see any reclamation task scheduled. Is this task embedded in the code and launched when a critical occupation level is reached? If so, what is the critical occupation threshold, and can it be changed?
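
    For what it's worth, summing the quoted "Blks Size" byte counts does reproduce these totals (the figures above appear truncated rather than rounded), and Valid + PhyRemoved + Dormant adds up exactly to Total, so Total looks like a cumulative logical block size (including blocks already physically removed) rather than current disk usage:

    # Summing the per-MA "Blks Size" byte counts quoted above; the totals
    # come out in binary TB (TiB), reproducing the 461.7 figure.
    TIB = 1024 ** 4
    total      = [29759247681789, 228165275241312, 57331403490678,
                  108522909869856, 83947325559037]
    valid      = [12876394023195, 102567067344913, 25620957499204,
                  46309834727598, 36158805216303]
    phyremoved = [14464491798447, 105895604788823, 26892359648858,
                  53082951452161, 40407979760693]
    dormant    = [2418361860147, 19702603107576, 4818086342616,
                  9130123690097, 7380540582041]
    assert sum(total) == sum(valid) + sum(phyremoved) + sum(dormant)
    for name, vals in [("Total", total), ("Valid", valid),
                       ("PhyRemoved", phyremoved), ("Dormant", dormant)]:
        print(f"{name}: {sum(vals) / TIB:.1f} TiB")
    # -> Total: 461.8, Valid: 203.3, PhyRemoved: 219.0, Dormant: 39.5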

    It seems that the lack of drill hole support causes a very large overhead on HyperScale, far beyond 10-15%. I think this could be a key factor when someone is considering acquiring a HyperScale solution.

    Thanks a lot!!!

  • Re: HyperScale and Drill Holes = A lot of space occupied = Intensive Space Reclamation process needed
    Posted: 12-20-2018, 3:39 PM

    Indeed, it's an old problem, and not limited to HyperScale; it also happens when using NAS as backup mount paths. NAS is so convenient with GridStor etc., but it does have drawbacks.

    I was the one who raised this topic with Commvault development 3 years ago, after noticing my NAS storage spiraling out of control and the numbers not adding up. There was no documentation on this, so I was on my own figuring it out. It turned out I had 55% whitespace... with 800+ TB of storage, you quickly understand how much storage was "stuck" in my case, and how hard it was to sell this to a manager. Just adding more and more space wasn't defensible at all, knowing more than half of the used space was whitespace.

    Making new storage policies and aux copying would've required even more space, not to mention the time and performance impact of full backups. So: bad ideas all round.

    Hence the space reclamation feature.  Performance was not so great when it came out, and last summer some tweaks were made to make it perform better. 

    Unfortunately, as soon as you've run space reclamation, the reality is that things just start all over again: whitespace increases as you age data, so you need to run this regularly.

    This isn't scheduled automatically; you have to schedule it yourself if that is what you want.

    DDBMntPathInfo.log also shows a total for all mountpaths from a DV2 job:

    11151 2b8f 11/19 14:28:06 3761049 ====================================================================
    11151 2b8f 11/19 14:28:06 3761049 TOTAL for all MPs [5], EngId [34], Frag. threshold [40%], Attempt # [1], Datapath MAs # [2]
    11151 2b8f 11/19 14:28:06 3761049 Chunks - Total [107872]
    11151 2b8f 11/19 14:28:06 3761049 Fragmented: >= 20% [ 41626], >= 40% [ 30105]
    11151 2b8f 11/19 14:28:06 3761049 >= 60% [ 23740], >= 80% [ 18349]
    11151 2b8f 11/19 14:28:06 3761049 Sfile containers - Total [3104122]
    11151 2b8f 11/19 14:28:06 3761049 Fragmented: >= 20% [ 1160946], >= 40% [ 937671]
    11151 2b8f 11/19 14:28:06 3761049 >= 60% [ 669454], >= 80% [ 410773]
    11151 2b8f 11/19 14:28:06 3761049 Blks Count - Total [632407110], Valid [192265657], PhyRemoved [377438852], Dormant [62702601, 24.59%]
    11151 2b8f 11/19 14:28:06 3761049 Reclaimable: @ 20% [ 59935573, 23.51%], @ 40% [ 54534583, 21.39%]
    11151 2b8f 11/19 14:28:06 3761049 @ 60% [ 42780173, 16.78%], @ 80% [ 27578914, 10.82%]
    11151 2b8f 11/19 14:28:06 3761049 Blks Size (in Bytes) - Total [38554025165261], Valid [13210169907875], PhyRemoved [22191712006979], Dormant [3152143250407, 19.26%]
    11151 2b8f 11/19 14:28:06 3761049 Reclaimable: @ 20% [ 3034835155584 ( 2826.41 GB), 18.55%], @ 40% [ 2792531209427 ( 2600.75 GB), 17.07%]
    11151 2b8f 11/19 14:28:06 3761049 @ 60% [ 2023627613733 ( 1884.65 GB), 12.37%], @ 80% [ 1498039433596 ( 1395.16 GB), 9.16%]
    11151 2b8f 11/19 14:28:06 3761049 ==================================================================== 


    When using GridStor and using all MAs for the DV2 job, you should indeed SUM these values to get an idea of how much whitespace can be reclaimed. Would be cool to get this into Command Center at some point!
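
    If you want to automate that summing, a rough sketch along these lines works. The regex assumes the log format quoted in this thread; the log paths are hypothetical, and a "TOTAL for all MPs" section, if present in a file, would be double counted and should be excluded first:

    import re
    from collections import defaultdict

    # matches e.g. "@ 20% [  2314598070204 (  2155.64 GB), 15.13%]"
    PATTERN = re.compile(r"@\s*(\d+)%\s*\[\s*(\d+)\s*\(")

    def sum_reclaimable(paths):
        totals = defaultdict(int)   # threshold % -> bytes across all logs
        for path in paths:
            with open(path) as log:
                for line in log:
                    for level, size in PATTERN.findall(line):
                        totals[int(level)] += int(size)
        return totals

    # hypothetical locations of the per-MA log files
    logs = [f"/opt/commvault/Log_Files/MA{i}/DDBMntPathInfo.log"
            for i in range(1, 6)]
    for level, size in sorted(sum_reclaimable(logs).items(), reverse=True):
        print(f"@{level}%: {size / 1024**4:.1f} TiB reclaimable")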

    SP14 added incremental support for space reclamation (space reclamation was previously always preceded by a full Quick DDB verification that populates the DDBMntPathInfo.log files, which could take a very long time on large DDBs).

    And there's more stuff to come still... ;)

     

    With regards to hole punching on Gluster for HyperScale: I've been talking to development about this for over a year :-) It was my first question to them as soon as HyperScale came out :-)

    SP14 is just out, and the HyperScale images still seem to be based on RHGS 3.3.1:

    # cat /etc/redhat-storage-release
    Red Hat Gluster Storage Server 3.3.1
    # gluster volume get all cluster.op-version
    Option                Value
    ------                -----
    cluster.op-version    31101

    That is just one release behind 3.4. Fingers crossed it makes it into the next SP! They will get there at some point, I just don't know when.

    Renaat

  • Re: HyperScale and Drill Holes = A lot of space occupied = Intensive Space Reclamation process needed
    Posted: 12-22-2018, 7:53 AM

    Hi All 

    Just want to provide some additional reference information: as of SP12, "Defragmentation Of Disks" for Gluster on HyperScale is an automated process that runs when the Storage Pool reaches the High Water Mark percentage - http://documentation.commvault.com/commvault/v11/article?p=105346.htm

    So this feature was introduced to overcome the whitespace issue.

    Nonetheless, you can still run "Space Reclamation" manually.

    Hopefully this helps 

    Thank you 

    Winston
