Collecting access time metadata for analytics

Last post 10-17-2018, 9:30 AM by BillyK. 29 replies.
  • Collecting access time metadata for analytics
    Posted: 09-26-2018, 9:46 AM

    Hello,

    We need to start collecting access time metadata to be used for Commvault FLA via the webconsole.  We've confirmed that NTFS access time is enabled on our servers (NTFSDisableLastAccessUpdate = 0).  Is this the only setting we need on a Windows 2012 R2 server?  We've also modified all our subclients to include "Catalog additional file and system attributes".  Last night we ran our first set of backups (incrementals) with the new subclient settings.  When reviewing the analytics reports this morning, we noticed that access time metadata was collected for only a very small sample of the data (~300 GB out of 155 TB total).

    Question:  After configuring subclients to collect this metadata, does a full backup need to run in order to obtain the access time for ALL our data?  Is the access time only being collected for files that are backed up or should it be collected for everything that is scanned? (i.e., last night's incremental should have captured all the access times?)  Do we need to wait for the weekly synthetic full cycle?  Please advise on what we need to do in order to collect the access time metadata for ALL our data.

     

    Thanks,

    - Bill 

  • Re: Collecting access time metadata for analytics
    Posted: 09-26-2018, 9:52 AM

    A synthetic full won't do it for you, since at most it runs an incremental and then merges the incrementals within that cycle. You need a regular full backup, which reads the file system itself, in order to retrieve the access time for all the data.


    Jos Meijer
    Senior Technical Consultant
  • Re: Collecting access time metadata for analytics
    Posted: 09-26-2018, 9:56 AM

    I was afraid of that.  Regarding the servers, is that the only registry setting that should be needed for enabling access time?  Also, at the FS agent level, I did choose "track access time" but we haven't rebooted our servers.  I assumed that a reboot wasn't needed as NTFS access time is already enabled in the registry on our servers.  If the server still needs to be rebooted for another reason, please advise.  Thank you for the quick response.

    - Bill

  • Re: Collecting access time metadata for analytics
    Posted: 09-26-2018, 11:01 AM

    To enable this you need:

    • registry key or fsutil command to set disablelastaccess to 0
    • enable 'Catalog additional file and system attributes' within commvault advanced options of the subclient
    • reboot the server where disablelastaccess setting has been changed
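
    The first step can be verified before rebooting. A minimal sketch, assuming `fsutil behavior query disablelastaccess` prints a line of the form `DisableLastAccess = N`; the function names are hypothetical, and the even/odd reading of values 2 and 3 on newer Windows builds is an assumption:

```python
import re
import subprocess


def parse_disable_last_access(output: str) -> int:
    """Extract the numeric DisableLastAccess value from fsutil output."""
    match = re.search(r"DisableLastAccess\s*=\s*(\d+)", output, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unexpected fsutil output: {output!r}")
    return int(match.group(1))


def last_access_updates_enabled(value: int) -> bool:
    """Interpret the value: 0 means last-access updates are enabled, 1 disabled.

    Assumption: newer Windows builds may also report 2 (enabled) and
    3 (disabled), so an even value is treated as enabled.
    """
    return value % 2 == 0


def query_last_access() -> int:
    """Run fsutil on the local Windows server and return the parsed value."""
    result = subprocess.run(
        ["fsutil", "behavior", "query", "disablelastaccess"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_disable_last_access(result.stdout)
```

    On the file server itself, `last_access_updates_enabled(query_last_access())` should return True once the setting is 0.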

    Jos Meijer
    Senior Technical Consultant
  • Re: Collecting access time metadata for analytics
    Posted: 10-04-2018, 9:10 AM

    If a full backup is required to collect all the "access time" metadata, that would imply that going forward, as files are accessed but not modified, Commvault would not collect that metadata and would have no knowledge of these changes, correct?  That would render Commvault essentially useless as far as maintaining up-to-date metadata for access time.  We were hoping that we could collect this data with every backup (presumably during the scan phase) and then generate CV analytics reports based on access time.  Can someone please clarify this for me?

     

    Thanks,

    - Bill

  • Re: Collecting access time metadata for analytics
    Posted: 10-05-2018, 5:29 AM

    To my knowledge this is correct, but I agree that this is not how you would want it to function, which makes me wonder if I am wrong.

    Will check this.

    You could also use Data Cube for file systems to run an online crawl of the server; that way you don't need to run a full backup. I'm not sure, though, whether analytics will be populated. You might need to build a report to show the information.


    Jos Meijer
    Senior Technical Consultant
  • Re: Collecting access time metadata for analytics
    Posted: 10-08-2018, 8:59 AM

    Hi Jos,

    Any update on this? Thank you so much for your response.

  • Re: Collecting access time metadata for analytics
    Posted: 10-08-2018, 9:58 AM

    Not yet, due to having 2 kids and past two days being the weekend I didn't have time to arrange this.
    Currently working to retrieve analytics for a newly installed server, will get back to you soon.


    Jos Meijer
    Senior Technical Consultant
  • Re: Collecting access time metadata for analytics
    Posted: 10-08-2018, 10:06 AM

    No urgency, thanks.  Another issue that I noticed regarding analytics: if the client in question is being backed up via both an FS and a VSA agent, the metrics get counted twice (not just once for the subclient in the computer group that you're running analytics on).

  • Re: Collecting access time metadata for analytics
    Posted: 10-08-2018, 10:55 AM

    Having a slight issue with the v10 method (Webconsole - Analytics - File Analytics) showing the Data Analytics information. Not sure yet what's happening there.

    So I started testing with the V11 method of analytics (Webconsole - Analytics - File System) and created a new File System Data Cube.

    • Enabled lastaccesstime on the FS of a Windows 2016 server, rebooted
    • Created 2 text files
    • Performed an online crawl of the file system while text file 1 was excluded
    • Created a custom report based on file name and access time, showed the access time of text file 2 as expected
    • Removed the exclusion of text file 1 and performed an incremental crawl. This initiates a file scan, which should be the same as an incremental backup scan, since it uses a file system agent to perform this action
    • Refreshed the report information and still showing only access time for text file 2

    My conclusion is that when performing an incremental backup, only the delta related files are being scanned.
    Therefore no file access information is added to the index server for files which have not been changed between the full and the incremental backup.

    If you want to index the access time for all files you will need to perform a full backup or use data cube to perform a file system online crawl of the file system locations you want to index.
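
    The behaviour observed in this test can be modelled in a few lines. This is only a sketch of the conclusion above, not Commvault's actual scan logic; `FileState`, `full_crawl`, and `incremental_crawl` are hypothetical names, and the "modified since last crawl" rule is an assumption drawn from the test result:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FileState:
    mtime: float  # last modification time (epoch seconds)
    atime: float  # last access time (epoch seconds)


def full_crawl(files: Dict[str, FileState]) -> Dict[str, float]:
    """Index the access time of every file."""
    return {path: state.atime for path, state in files.items()}


def incremental_crawl(
    files: Dict[str, FileState],
    index: Dict[str, float],
    last_crawl_time: float,
) -> Dict[str, float]:
    """Refresh the index only for files that are new or modified since the last crawl."""
    updated = dict(index)
    for path, state in files.items():
        if path not in index or state.mtime > last_crawl_time:
            updated[path] = state.atime
    return updated


# Two files created at t=100, full crawl taken at t=200.
files = {
    "a.txt": FileState(mtime=100.0, atime=100.0),
    "b.txt": FileState(mtime=100.0, atime=100.0),
}
index = full_crawl(files)

# At t=300, a.txt is only read and b.txt is modified.
files["a.txt"].atime = 300.0
files["b.txt"].mtime = 300.0
files["b.txt"].atime = 300.0

after_incremental = incremental_crawl(files, index, last_crawl_time=200.0)
after_full = full_crawl(files)

print(after_incremental)  # a.txt keeps the stale access time 100.0
print(after_full)         # both files show 300.0
```

    In this model, the file that was only read keeps its old access time in the index until the next full crawl.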

    When I get the v10 method of file analytics working again I will double check, but I really doubt that there will be a different outcome.


    Jos Meijer
    Senior Technical Consultant
  • Re: Collecting access time metadata for analytics
    Posted: 10-08-2018, 10:56 AM

    Bill,

    Jos is right. Backup-based FLA will not include accurate access time information. Incremental backups update the metadata only for files that have been modified, not for files that have merely been accessed. Hence, you will not be able to build a reliable report based on access time. FLA reports are good for reports based on file size and modified time.

    For reports based on access time, you will need to use Data Cube. Please see the link below for more details on how to set it up. You can reach out to me directly if you need a demo.

     

    http://documentation.commvault.com/commvault/v11/article?p=43863.htm

     

    Thanks,

    Praveen

  • Re: Collecting access time metadata for analytics
    Posted: 10-08-2018, 11:18 AM

    Thanks Praveen.  That being said, how does OnePass work then for a policy based on access time?  (We're going to be implementing OnePass soon)  Is it doing something different where it can check the access time with every backup, including incrementals?  Thanks again and I'll look into Data Cube.

     

    - Bill

  • Re: Collecting access time metadata for analytics
    Posted: 10-16-2018, 12:10 PM

    Looking to set up Data Cube to perform a crawl on some of our fileserver volumes.  I'm a bit confused as to what values are needed for Index server and Access node.  I have a fileserver (name=dalfs01); what values do I enter if I want to crawl G:\ on dalfs01, for example?  Are there any specific software packages that need to be deployed to the fileserver to support this crawl?  All I'm looking to do at the moment is crawl some of our fileserver filesystems so that I can retrieve access time metrics to be viewed via the filesystem data connector on the web console.  Thanks.

  • Re: Collecting access time metadata for analytics
    Posted: 10-16-2018, 12:16 PM

    If I want to run the crawl directly on the server as opposed to a UNC path, do I need to install the media agent package on the fileserver?  Thanks.

  • Re: Collecting access time metadata for analytics
    Posted: 10-16-2018, 1:32 PM

    You will need to install the index server package on a server and configure the role for data analytics.

    Media agent with file system agent can be used as access node.

    If you want to scan UNC you need to have file system agent and media agent installed on the client.


    Jos Meijer
    Senior Technical Consultant
  • Re: Collecting access time metadata for analytics
    Posted: 10-16-2018, 1:45 PM

    Thanks Jos.  I have the crawl working fine across a UNC path from my index server, which also has the MA package installed.  Are you saying that if I want to run the scan locally on the fileserver, I need to install both the MA and index server packages locally? (Even though I'm going to be using a different index server)  Before even going down this road, is this even necessary or would you just use UNC paths?  I was thinking that maybe performing the scan locally would be faster.  Thanks again!

  • Re: Collecting access time metadata for analytics
    Posted: 10-16-2018, 2:17 PM

    I have used UNC, but a local install of Media Agent and File System Agent might be faster.

    No, do not install the index server package on the file server client.

    If you do not back up the server via a local agent, I would stay with UNC.

    Tomorrow I am very busy and have little time, but I do want to test this to see if it is faster and if you can re-use existing backup data, so you can offload the process to the MA which is handling the disk library.

    Will test later this week or at the beginning of next week, depending on my workload. Will get back to you.


    Jos Meijer
    Senior Technical Consultant
  • Re: Collecting access time metadata for analytics
    Posted: 10-16-2018, 2:51 PM

     

    Hi Jos,

    There seems to be a huge difference in performance when running the scan locally as opposed to via UNC.

     

    Testing Scenario #1 (Scan over UNC from index server)

    index server = index server

    access node = index server

    875GB took 25 minutes

     

     

    Testing Scenario #2 (Scan local on fileserver)

    index server = index server

    access node = fileserver

    875GB took 44 seconds

    To get this configuration working, I had to install the MA and index server packages on the fileserver and then create a new index server via the commserve.  I still specified the same index server as in testing scenario #1.  Although a new index server was created on the fileserver, it doesn't appear to be using the directory structure at all, which is kind of what I expected.

     

    * incremental crawl was unchecked for both tests
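
    To put the two timings side by side, a quick back-of-the-envelope calculation using only the figures above. The rates are "data volume covered per second"; a crawl presumably reads file metadata rather than file contents, so they are not disk throughput:

```python
def scan_rate_gb_per_s(size_gb: float, seconds: float) -> float:
    """Volume of data covered by the crawl per second of wall-clock time."""
    return size_gb / seconds


size_gb = 875.0
unc_seconds = 25 * 60   # scenario #1: 25 minutes over UNC
local_seconds = 44      # scenario #2: 44 seconds locally

unc_rate = scan_rate_gb_per_s(size_gb, unc_seconds)
local_rate = scan_rate_gb_per_s(size_gb, local_seconds)
speedup = unc_seconds / local_seconds

print(f"UNC:     {unc_rate:.2f} GB/s")    # UNC:     0.58 GB/s
print(f"Local:   {local_rate:.2f} GB/s")  # Local:   19.89 GB/s
print(f"Speedup: {speedup:.1f}x")         # Speedup: 34.1x
```

    That is roughly a 34x difference for the same 875 GB volume.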

  • Re: Collecting access time metadata for analytics
    Posted: 10-16-2018, 3:12 PM

    Nice! But why install the index server on the file server?

    With the media agent package on the file server you can assign the file server as access node and you should achieve the optimal speed. The index package would not do anything as you are assigning the index to the other index server.


    Jos Meijer
    Senior Technical Consultant
  • Re: Collecting access time metadata for analytics
    Posted: 10-16-2018, 3:17 PM

    I would receive a connection-related error without creating the index server.  I'm guessing that by creating an index server, it spawns a process that listens for incoming connections that datacube/scan connects to.  It wouldn't work until I created one.  I'll test again on another server with just the MA installed just to confirm.

  • Re: Collecting access time metadata for analytics
    Posted: 10-16-2018, 3:25 PM

    I just ran a netstat on the index server and I see an established connection to port 20000 on the fileserver.  I suspect that when running the scan locally, the index server still needs an established connection to this analytics port and the process that listens on the port isn't spawned until the index server is created.
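
    The same check can be scripted from the index server instead of reading netstat output. A minimal sketch using a plain TCP connect; port 20000 is the port seen above, while `port_is_open` is a hypothetical helper. A successful connect only shows that something is listening, not which process it is:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

    For example, `port_is_open("dalfs01", 20000)` run on the index server would be expected to return False on a fileserver where nothing is listening on that port.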

  • Re: Collecting access time metadata for analytics
    Posted: 10-16-2018, 3:35 PM

    Odd.. it is not logical to need an index server on the data source.

    I am also very curious about both our findings.

    Then we can compare scenarios and find the difference, if we have a different outcome :)


    Jos Meijer
    Senior Technical Consultant
  • Re: Collecting access time metadata for analytics
    Posted: 10-16-2018, 4:03 PM

    Jos,

    I tested on another fileserver but this time I just installed the MA software and not the index and sure enough, the crawl failed right out of the gate with the following error:

     

    Error Code: [72:106]

    Description: Failed to send data to Index Engine. Please verify that the Index Engine is running.
    Source: DAVMS221162, Process: FileScan   

     

    It looks like the index software is required on the fileserver in order to communicate back to the index server.  

  • Re: Collecting access time metadata for analytics
    Posted: 10-16-2018, 4:15 PM

    You might be having issues with just the connection to the index server, try creating a client group for your server with MA only and assign a two way firewall policy between that client group and the index server client group.

    If that works, then the Solr engine security is too strict, and the tunnel bypasses it.


    Jos Meijer
    Senior Technical Consultant
  • Re: Collecting access time metadata for analytics
    Posted: 10-16-2018, 4:57 PM

    If there was a connectivity issue, I wouldn't think that it would just immediately resolve after creating an index server on the fileserver.  Something else seems to be at play here.  Can you please confirm when you get a chance?  I suspect that you'll see the same thing.  Thanks.

  • Re: Collecting access time metadata for analytics
    Posted: 10-17-2018, 5:34 AM

    As expected, I provided a VM with the MA and FS agent.

    Created an FS Data Cube with the VM itself as the access node.

    Initial test failed with error: Solr ping request to server failed

    Made a two-way network topology in order to bypass the security restrictions of SOLR. This provided a tunnel between the index server and the VM, and it works.


    Jos Meijer
    Senior Technical Consultant
  • Re: Collecting access time metadata for analytics
    Posted: 10-17-2018, 8:56 AM

    Still makes no sense as adding an index server resolves the issue on my end every time.  I have a call into Commvault regarding this.  

    Regarding access time metrics, do you agree that I would need to perform the crawl without the incremental crawl checkbox selected IF I want to maintain up-to-date access time metrics every time the job is run?  I'm guessing that with it checked, it's only going to pull access time metrics for any files that have "changed", meaning modified since the last crawl.  Thank you.

  • Re: Collecting access time metadata for analytics
    Posted: 10-17-2018, 8:58 AM

    What packages needed to be installed on the VM?  MA and Index Store or just MA?  Thanks.

  • Re: Collecting access time metadata for analytics
    Posted: 10-17-2018, 9:04 AM

    It doesn't make sense IPv4-wise, but it does when looking at security within SOLR.

    But regarding access times, yes a full crawl will index that for all files. Incremental only for changed files.

    Only MA and a File System agent need to be installed on the VM, not the index server package.


    Jos Meijer
    Senior Technical Consultant
  • Re: Collecting access time metadata for analytics
    Posted: 10-17-2018, 9:30 AM

    Thanks Jos.  I will try the 2-way firewall configuration without the Index Store package being installed.  One last thing: when viewing the metrics from the webconsole, although it shows you a bar graph for access time, it does not show an access time column when viewing the actual files at the bottom of the page (only modified time).  I am looking at the Size Distribution Dashboard report.  Is there any way to view the access time for each of the individual files, or is that an expected limitation?  Thanks again!
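
    Independently of the web console, per-file access times can be spot-checked directly on the file server to compare against what the report shows. A minimal sketch using only the Python standard library; this is not a Commvault feature, and `iter_access_times` and `not_accessed_since` are hypothetical helpers:

```python
import os
import time
from typing import Iterator, List, Optional, Tuple


def iter_access_times(root: str) -> Iterator[Tuple[str, float]]:
    """Yield (path, last access time in epoch seconds) for every file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                yield path, os.stat(path).st_atime
            except OSError:
                continue  # skip files that vanish or cannot be read


def not_accessed_since(root: str, days: float, now: Optional[float] = None) -> List[str]:
    """Return the sorted paths whose last access is older than the given number of days."""
    reference = time.time() if now is None else now
    cutoff = reference - days * 86400
    return sorted(path for path, atime in iter_access_times(root) if atime < cutoff)
```

    For example, `not_accessed_since("G:\\", 365)` would list files on G:\ whose recorded access time is more than a year old, assuming last-access updates are enabled on the volume.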
