Media mount and usage errors - going on months

Last post 11-19-2019, 11:06 AM by DM_MP. 20 replies.
  • Media mount and usage errors - going on months
    Posted: 09-09-2019, 5:23 PM

    I have opened several incidents with Support, but we cannot seem to find a resolution. Over the past several weeks, going on months, I have been getting Media Mount and Usage error alerts on all 3 media agents and on pretty much all mount paths.

    The latest SE noticed the following message:

    [TYPE] Information [TIME] 2019/09/09 01:06:43  [SOURCE] Galaxy [COMPUTER] xxx.xxx.local[DESCRIPTION] Failed in acquiring the lock on file [E:\Program Files\Commvault\ContentStore\Log Files\PerformanceMetrics.log] Module [PerformanceMetrics] Error[0x80070306:{CQiLogger::LockLockfile(1326)} + {CQiLogger::LockLockfile(1320)/W32.774.(One or more errors occurred while processing the request.)-Timed out acquiring mutex on Global\QiLogger_PerformanceMetrics}]
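    For what it's worth, the error code in that message can be decoded. 0x80070306 is an HRESULT: facility 7 means it wraps a plain Win32 error, and the low 16 bits (0x306 = 774) are that Win32 code, which the message itself labels W32.774 ("One or more errors occurred while processing the request"). A small Python sketch of the decoding, following the HRESULT bit layout from winerror.h (the function names are made up for illustration):

```python
FACILITY_WIN32 = 7

def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into its failure bit, facility and code fields."""
    return {
        "failure": bool((hr >> 31) & 1),  # bit 31: 1 = error
        "facility": (hr >> 16) & 0x7FF,   # bits 16-26: facility (7 = Win32)
        "code": hr & 0xFFFF,              # bits 0-15: facility-specific code
    }

def hresult_from_win32(err: int) -> int:
    """Python equivalent of the HRESULT_FROM_WIN32 macro."""
    if err <= 0:
        return err
    return (err & 0xFFFF) | (FACILITY_WIN32 << 16) | 0x80000000

print(decode_hresult(0x80070306))    # {'failure': True, 'facility': 7, 'code': 774}
print(hex(hresult_from_win32(774)))  # 0x80070306
```

    The same mapping turns the Win32 error 183 discussed later in the thread into the HRESULT 0x800700B7.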

    Has anyone else encountered multiple Media Mount and Usage alerts, and/or does anyone know what "mutex on Global" even means?

    Thanks. 
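    On the "mutex on Global" question: in Windows, a named kernel object whose name starts with Global\ lives in the global namespace, so every process on the server that opens Global\QiLogger_PerformanceMetrics contends for the same mutex. Named mutexes are normally created with CreateMutex and waited on with a timeout through WaitForSingleObject. The message reads as: the logger waited for that mutex to serialize access to PerformanceMetrics.log, the wait timed out, and it gave up on locking the file. The Python sketch below is only an in-process analogy (a threading.Lock stands in for the named mutex, and the timings are made up), not Commvault's implementation. The holder in the sketch simulates a writer that keeps the mutex for a long time. A stalled disk write could be one reason for that, but this is an assumption.

```python
import threading
import time

# In-process stand-in for the machine-wide named mutex Global\QiLogger_PerformanceMetrics.
log_mutex = threading.Lock()

def write_log_line(line: str, sink: list, timeout_s: float) -> bool:
    """Append `line` only if the mutex is acquired within timeout_s seconds."""
    if not log_mutex.acquire(timeout=timeout_s):
        return False  # the equivalent of "Timed out acquiring mutex"
    try:
        sink.append(line)
        return True
    finally:
        log_mutex.release()

def hold_mutex(seconds: float, acquired: threading.Event) -> None:
    """Simulate another writer that holds the mutex for a long time."""
    with log_mutex:
        acquired.set()
        time.sleep(seconds)

if __name__ == "__main__":
    lines: list = []
    acquired = threading.Event()
    holder = threading.Thread(target=hold_mutex, args=(0.5, acquired))
    holder.start()
    acquired.wait()
    print(write_log_line("sample", lines, timeout_s=0.1))  # False: still held
    holder.join()
    print(write_log_line("sample", lines, timeout_s=0.1))  # True
```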

  • Re: Media mount and usage errors - going on months
    Posted: 09-09-2019, 7:19 PM

    Is the CommServe or are the MediaAgents clustered?

  • Re: Media mount and usage errors - going on months
    Posted: 09-09-2019, 8:48 PM

    The errors are showing up on our media agents. We have 2 on one campus and a single media agent at a different campus. They are not clustered; the OS is Win2k12 R2 and the hardware is HPE ProLiant DL380 Gen9. Each campus location has its own dedicated storage (NetApp E2700 series), all within a single CommCell environment. 

  • Re: Media mount and usage errors - going on months
    Posted: 09-09-2019, 9:13 PM

    Given that it is happening on all three MediaAgents, it is possible that the problem is coming from the CommServe. What version and service pack of Commvault are you running, and what is the build number of the SQL Server on the CommServe?

  • Re: Media mount and usage errors - going on months
    Posted: 09-10-2019, 8:29 AM

    I am running v11 SP15 HPK21. The SQL Server on the CommServe is SQL Server 2014, build 12.0.2269.0 (X64), dated Jun 10 2015.

    There is no antivirus on the media agents, but I happened to notice that SEP (Symantec Endpoint Protection) is installed but disabled on the CommServe. Should it be completely uninstalled? 

  • Re: Media mount and usage errors - going on months
    Posted: 09-10-2019, 10:00 AM

    OK, thanks. That is an old revision of SQL Server 2014. Can you see errors in the Program Files\Microsoft SQL Server\MSSQL.n\MSSQL\LOG\ERRORLOG file?
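    To see how far behind a build is, it can be compared numerically with the baseline build of each SQL Server 2014 service pack. A Python sketch; the baseline numbers are taken from Microsoft's published SQL Server 2014 build list and should be double-checked there, and cumulative updates between baselines are not distinguished:

```python
def parse_build(build: str) -> tuple:
    """'12.0.2269.0' -> (12, 0, 2269, 0), so builds compare numerically."""
    return tuple(int(part) for part in build.split("."))

# Baseline builds for SQL Server 2014 (verify against Microsoft's build list).
SQL2014_BASELINES = [
    ((12, 0, 2000, 8), "RTM"),
    ((12, 0, 4100, 1), "SP1"),
    ((12, 0, 5000, 0), "SP2"),
    ((12, 0, 6024, 0), "SP3"),
]

def sql2014_level(build: str) -> str:
    """Return the highest SQL Server 2014 baseline the build is at or above."""
    version = parse_build(build)
    if version[:2] != (12, 0):
        raise ValueError(f"{build} is not a SQL Server 2014 (12.0.x) build")
    level = "older than RTM"
    for baseline, name in SQL2014_BASELINES:
        if version >= baseline:
            level = name
    return level

print(sql2014_level("12.0.2269.0"))  # RTM
```

    By this comparison, 12.0.2269.0 sits between RTM and SP1, which is consistent with calling it an old revision.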

  • Re: Media mount and usage errors - going on months
    Posted: 09-10-2019, 12:47 PM

    I checked the SQL ERRORLOG files on both CommServe servers (active and standby) and didn't see any errors around the date/time the Media Mount and Usage alert was raised. 

    Support is seeing this when they parse the most recent logs:

    redacted\cvd

    1760  2150  09/09 01:27:50 1611889 6700604-# [DM_BASE    ] **ERROR** MountActiveVolume Failed to mount volume 0  MediaGroupId = 205 MA hostname- redacted MediaManager returned error [752], errorString [A mount path already exists at the given location. Cannot create a file when that file already exists. ]

    1760  2150  09/09 01:27:50 1611889 6700604-# [DM_BASE    ] Going to check whether the mount should be retried for the above mediamanager error

     

    1760  2150  09/09 01:27:50 1611889 6700604-# [DM_BASE    ] **ERROR** MountActiveVolume Failed to mount volume 0  MediaGroupId = 205 MA hostname- redacted MediaManager returned error [752], errorString [A mount path already exists at the given location. Cannot create a file when that file already exists. ]
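    When going through cvd.log, it can help to count these MountActiveVolume failures per MediaGroupId and MediaManager error code, to see whether they cluster on particular mount paths. A Python sketch; the regular expression is written against the line format quoted above and assumes other lines follow the same pattern:

```python
import re
from collections import Counter

# Matches lines like: "... MountActiveVolume Failed to mount volume 0  MediaGroupId = 205
# MA hostname- ... MediaManager returned error [752], errorString [...]"
MOUNT_ERR = re.compile(
    r"MountActiveVolume Failed to mount volume (?P<volume>\d+)\s+"
    r"MediaGroupId = (?P<group>\d+).*?"
    r"MediaManager returned error \[(?P<code>\d+)\]"
)

def count_mount_errors(lines) -> Counter:
    """Count (MediaGroupId, MediaManager error code) pairs in mount-failure lines."""
    counts: Counter = Counter()
    for line in lines:
        match = MOUNT_ERR.search(line)
        if match:
            counts[(int(match["group"]), int(match["code"]))] += 1
    return counts
```

    Pass it the lines of the cvd.log file (any iterable of strings works); the two failures quoted above would be counted as (205, 752) twice.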

     

     

    The error code 183 returned by the OS indicates that the file already exists:

    C:\>net helpmsg 183

    Cannot create a file when that file already exists.

     

    This might happen if the device did not respond in time, the request timed out, and CV then retried creating the same file.
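    The retry theory is easy to reproduce in miniature: if the first create succeeded on disk but its completion was not seen in time, a retry of the same non-idempotent create fails with "already exists" (Win32 error 183). A Python sketch; the function names and the volume folder name V_12345 are hypothetical, and this only illustrates the failure mode, not how Commvault actually creates volumes:

```python
import os
import tempfile

def create_volume_folder(mount_path: str, volume_name: str) -> str:
    """Non-idempotent create: a second call for the same name fails."""
    path = os.path.join(mount_path, volume_name)
    os.mkdir(path)  # FileExistsError if present (Win32 error 183 / POSIX EEXIST)
    return path

def create_volume_folder_idempotent(mount_path: str, volume_name: str) -> str:
    """Retry-safe variant: an existing folder counts as success."""
    path = os.path.join(mount_path, volume_name)
    os.makedirs(path, exist_ok=True)
    return path

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as mount_path:
        # First attempt: the folder is created, but imagine the reply arrives too late.
        create_volume_folder(mount_path, "V_12345")
        # The caller timed out and retries the same create.
        try:
            create_volume_folder(mount_path, "V_12345")
        except FileExistsError as exc:
            print("retry failed:", exc.strerror)
        print(create_volume_folder_idempotent(mount_path, "V_12345"))
```

    On Windows the printed strerror is the same text that net helpmsg 183 shows.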

    The health check on the dedicated storage device showed no issues. 

  • Re: Media mount and usage errors - going on months
    Posted: 09-10-2019, 6:24 PM

    Hi wsimps01

    The original error looks like an issue with updating the logs, and could be related to slowness on the disk where the MediaAgent binaries are installed. 

    The subsequent mount errors suggest that the volume already exists in the current mount path. This type of issue can be caused by:

    • Slow storage at the time of writing: the volume gets created, but by the time Commvault retries, it finds the volume already exists; or
    • A DR restore of the CommServe: when Commvault attempts to re-write the same volume, the volume already exists on disk, so it reports the error and skips writing. 
    As you mentioned you are using NetApp arrays, I presume you are using CIFS-presented volumes for the disk storage. 
     
    Can we also check the network side to confirm whether there are TCP resets or packet drops causing these intermittent errors?
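    A packet capture on both ends is the proper way to see resets and drops, but a crude client-side check is to open TCP connections to the MediaAgent at a fixed interval and tally how each attempt ends. A Python sketch; the host name below is hypothetical, and port 8400 is assumed here as the commonly used CVD port, so confirm the port your CommCell actually uses. It only catches failures while connecting, not resets on data connections that are already established:

```python
import socket
import time
from collections import Counter

def probe(host: str, port: int, attempts: int = 10, timeout_s: float = 3.0,
          interval_s: float = 1.0) -> Counter:
    """Open and immediately close a TCP connection repeatedly; tally the outcomes."""
    results: Counter = Counter()
    for i in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                results["ok"] += 1
        except socket.timeout:
            results["timeout"] += 1      # no answer within timeout_s
        except ConnectionResetError:
            results["reset"] += 1        # connection reset during setup
        except ConnectionRefusedError:
            results["refused"] += 1      # nothing listening on the port
        except OSError:
            results["other_error"] += 1  # DNS failure, unreachable network, etc.
        if i < attempts - 1:
            time.sleep(interval_s)
    return results
```

    For example, probe("ma01.example.local", 8400, attempts=60, interval_s=10.0) samples for roughly ten minutes; running it during the backup window and comparing the timeout/reset counts with the alert times would show whether connection setup fails when the alerts fire.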
     
    Regards
     
    Winston 
  • Re: Media mount and usage errors - going on months
    Posted: 09-25-2019, 12:17 PM

    Hi,

    I have the exact same error on all four of my MAs since the SP16 update.

     

    Failed in acquiring the lock on file [D:\Programm Files\CommVault\Log Files\PerformanceMetrics.log] Module [PerformanceMetrics] Error[0x80070306:{CQiLogger::LockLockfile(1318)} + {CQiLogger::LockLockfile(1312)/W32.774.(One or more errors occurred while processing the request.)-Timed out acquiring mutex on Global\QiLogger_PerformanceMetrics}]

    And then

    Backup error with:

    • Failure Reason: Failed to mount the disk media in library [*"MA"_DiskLib00] with mount path [D:\V5020_Diskpool_RDC\RDC_Diskpool_Volume_P00_03] on MediaAgent [*"MA"]. Reason: A mount path already exists at the given location. Advice: Please enter a different mount path.

    The open Commvault support case hasn't been much help so far and isn't making progress :(


    BR

  • Re: Media mount and usage errors - going on months
    Posted: 10-09-2019, 3:51 PM

    I'm here to add that we too are experiencing this same exact issue.  I've opened a case with support today and sent logs.

    We upgraded to SP16 Hotfix Pack 9 about 4 to 6 weeks ago. It was fine until last night, when all jobs hung for about 30 minutes. After some time, the library came back on its own and the jobs resumed. We see the following Information event in the media agent's Event Viewer (Windows Logs > Application):

    Failed in acquiring the lock on file [C:\Program Files\Commvault\ContentStore\Log Files\PerformanceMetrics.log] Module [PerformanceMetrics] Error[0x80070306:{CQiLogger::LockLockfile(1318)} + {CQiLogger::LockLockfile(1312)/W32.774.(One or more errors occurred while processing the request.)-Timed out acquiring mutex on Global\QiLogger_PerformanceMetrics}]

    We also received a lot of failed/hung backup messages:

    Failure Reason: A mount path already exists at the given location

    Failure Reason: Failed to mount the disk media in library [LibraryName] with mount path [C:\MountPoints\DataStore#] on MediaAgent [MediaAgentName]. Reason: A mount path already exists at the given location. Advice: Please enter a different mount path.

     

    Media Agent is Server 2012R2. Storage is EMC VNX series SAN.

  • Re: Media mount and usage errors - going on months
    Posted: 10-09-2019, 4:40 PM

    May I know the support ticket number. We will check the logs and get back on this. 

  • Re: Media mount and usage errors - going on months
    Posted: 10-10-2019, 3:05 PM

    Hi Prasad - Ticket Number: 191009-333

  • Re: Media mount and usage errors - going on months
    Posted: 10-11-2019, 9:27 AM

    I've updated my case with CV support with the following info, but I wanted to note it here - in case it helps anyone else.

    We have a dedicated VM proxy at our DR site for LiveSync of VMs.  The VM proxy became unresponsive overnight, and all LiveSync operations have been hung for 8+ hours.  I found the same messages in the Windows Application Event Viewer repeatedly since 1AM:

    Log Name:      Application

    Source:        Galaxy

    Date:          10/11/2019 1:03:30 AM

    Event ID:      1

    Task Category: None

    Level:         Information

    Keywords:      Classic

    User:          N/A

    Computer:      <Virtual Appliance> redacted 

    Description:

    Failed in acquiring the lock on file [C:\Program Files\Commvault\ContentStore\Log Files\PerformanceMetrics.log] Module [PerformanceMetrics] Error[0x80070306:{CQiLogger::LockLockfile(1318)} + {CQiLogger::LockLockfile(1312)/W32.774.(One or more errors occurred while processing the request.)-Timed out acquiring mutex on Global\QiLogger_PerformanceMetrics}]

     

    This proxy client is Win2012R2 - 4 CPU 16GB RAM - VM Tools up to date.  Latest Win Updates installed.  It performs no other roles.

  • Re: Media mount and usage errors - going on months
    Posted: 11-01-2019, 9:34 AM

    Bumping this thread for awareness. The same issue occurred again, with the same symptoms: all SAN activity paused, all backup jobs hung, and the same warnings appeared in the System log for the duration of the issue.

    Failed in acquiring the lock on file [C:\Program Files\Commvault\ContentStore\Log Files\PerformanceMetrics.log] Module [PerformanceMetrics] Error[0x80070306:{CQiLogger::LockLockfile(1318)} + {CQiLogger::LockLockfile(1312)/W32.774.(One or more errors occurred while processing the request.)-Timed out acquiring mutex on Global\QiLogger_PerformanceMetrics}]

    After about 30 minutes, the storage came back online on its own.  All backup operations resumed.

    CV case notes have been updated.  We did not experience this issue prior to our upgrade to SP16.  (upgraded from SP11 to SP16 in early August 2019)

  • Re: Media mount and usage errors - going on months
    Posted: 11-01-2019, 10:10 AM

    DM_MP, are you seeing any mutex errors on any or all of your media agents during this time? If so, what hardware are your media agents? I still haven't found a resolution, and the issue continues to happen at random. In the last incident, Support showed me a lot of mutex errors on the Win2k12 R2 media agent(s) and suggested the cause may be hardware/firmware. I am running HP ProLiant DL380s.

  • Re: Media mount and usage errors - going on months
    Posted: 11-01-2019, 10:23 AM

    We only have a single MA in our main data center. It's a Dell PowerEdge R430. We have another MA at our DR site; however, its primary function is to receive AUX data and run periodic SQL log restores to our DR database servers, so it's not nearly as 'busy' as our production Media Agent. It is also a Dell PE R430.

    Both Run Win 2012R2 w/ the latest Win Updates.

    I filtered the Windows System Log on our DR Media Agent for Source:"Galaxy" , and it returned zero results.  So, it doesn't appear to be impacting this MA - yet.

    The VMware proxy I mentioned in another post, which reported the mutex errors, is a VM, so in this instance it relies on the VMware virtual hardware profile.

    The last time this happened, I had my storage team check the SAN for the time the issue occurred. Aside from I/O flatlining during that period, there were no other warnings or errors reported on the SAN. There was just a 20-30 minute stretch of no I/O.
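    To see whether the MediaAgent itself experienced the same stall the SAN shows, a host-side probe can time a small write-and-flush to a scratch folder on the mount path at a fixed interval. A Python sketch; the directory, probe file name, threshold and interval are all hypothetical. If I/O is completely stalled, the write simply blocks, so the elapsed time reported when it finally returns approximates the length of the stall:

```python
import os
import time
from datetime import datetime

def timed_write(directory: str, size: int = 4096) -> float:
    """Write and fsync a small probe file in `directory`; return elapsed seconds."""
    path = os.path.join(directory, "latency_probe.tmp")  # hypothetical probe file
    start = time.perf_counter()
    with open(path, "wb") as fh:
        fh.write(b"\0" * size)
        fh.flush()
        os.fsync(fh.fileno())  # force the data out to the storage device
    elapsed = time.perf_counter() - start
    os.remove(path)
    return elapsed

def watch(directory: str, threshold_s: float = 1.0, interval_s: float = 30.0,
          iterations: int = 0) -> list:
    """Probe every interval_s seconds; record and print writes slower than threshold_s.

    iterations=0 means run until interrupted.
    """
    slow = []
    count = 0
    while iterations == 0 or count < iterations:
        elapsed = timed_write(directory)
        if elapsed > threshold_s:
            stamp = datetime.now().isoformat(timespec="seconds")
            slow.append((stamp, elapsed))
            print(f"{stamp} slow write: {elapsed:.1f}s in {directory}")
        count += 1
        time.sleep(interval_s)
    return slow
```

    Pointing watch() at a scratch folder (not inside Commvault's own volume folders) with a threshold of a few seconds would give a host-side timeline to set next to the SAN's I/O graph.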

    I'm willing to entertain a f/w or driver issue, however it's peculiar to see it on a VM as well as physical hardware.  It's also interesting that we did not encounter this issue prior to the SP16 upgrade.

  • Re: Media mount and usage errors - going on months
    Posted: 11-01-2019, 4:39 PM

    A bit of speculation here, but I'll toss it out there.

    When this issue occurred yesterday evening, a tape AUX copy job had already been running for about 2 hours. The disk library and the tape library are both fiber-attached to the Media Agent, so all data flow would be self-contained within the MA. It should be noted that there were no indications that this tape AUX copy operation went into a pending/failed state at any time. CV support also confirmed that our CVD services continued to process data throughout the outage. I must assume the data being processed was this tape AUX copy.

    Conversely, any new backup operation hung with a message that the MA was unreachable, or the library was offline/unavailable.

    Consider that there was also an AUX copy scheduled to run during this outage.  This was a network based AUX copy from our Main MA to our DR MA.  It failed to start, due to inability to contact our Production MA.

    So the MA continues to process data internally (the tape AUX copy), but any attempt to contact it over the network for new backups or AUX copies is denied. It is almost as if the MA services were stopped or the network connection had been disrupted.

    Our monitoring software reports no network disruption (i.e. it was pingable for the duration of the outage).  So we must assume our NICs were active and functional.  The NIC reports it has been up the same amount of time as the server (38+ days).

    Let's consider that during this outage, we see repeated messages in our Windows Event log:  unable to obtain file lock on a .LOG file in the CommVault install directory.

    Speculation - If CommVault is unable to obtain a lock for a .LOG file, the CV services that handle network connectivity go into a failed state.  This prohibits the MA from responding to any network requests for CommVault.  The result is Loss of Control and/or Library Offline error messages.

    I'm interested in some feedback regarding these assumptions.

    Final note: all local disks in the Media Agent (OS, Index Cache, and DDB disks) are SSDs.
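    To line up the repeated lock-timeout messages with the period of flat I/O on the SAN, it can help to collapse the individual events into episodes with a start and end time. A Python sketch; how the event timestamps are exported (for example, by saving the filtered Application log from Event Viewer) and the 5-minute gap that separates episodes are assumptions to adjust, and the sample times are hypothetical:

```python
from datetime import datetime, timedelta

def outage_windows(timestamps, max_gap=timedelta(minutes=5)):
    """Group event timestamps into (start, end, count) windows.

    Events no more than max_gap apart are treated as one continuous episode.
    """
    windows = []
    for ts in sorted(timestamps):
        if windows and ts - windows[-1][1] <= max_gap:
            start, _, count = windows[-1]
            windows[-1] = (start, ts, count + 1)
        else:
            windows.append((ts, ts, 1))
    return windows

if __name__ == "__main__":
    # Hypothetical event times, e.g. copied from a filtered Event Viewer export.
    events = [datetime(2019, 10, 31, 22, 10), datetime(2019, 10, 31, 22, 13),
              datetime(2019, 10, 31, 22, 31), datetime(2019, 10, 31, 22, 34)]
    for start, end, count in outage_windows(events):
        print(f"{start:%Y-%m-%d %H:%M} to {end:%H:%M}: {count} events")
```

    If the episodes match the SAN's no-I/O stretch to the minute, that points to a shared cause rather than the logger itself.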

  • Re: Media mount and usage errors - going on months
    Posted: 11-06-2019, 6:12 PM

    Generally, if we see the [A mount path already exists at the given location] error, we would initially triage by marking the active media as full (Link: https://documentation.commvault.com/commvault/v11/article?p=13958.htm#o13959). Quite possibly an active volume folder is being referenced that we assume should not exist, yet it does, for reasons that are unknown in this case without further context. Give this a go for the applicable storage policy and, if possible, report back your findings.

  • Re: Media mount and usage errors - going on months
    Posted: 11-14-2019, 10:20 PM

    DM_MP:

    Having a similar issue with our Azure-based VM (SP15, HP39): identical error messages in the Windows event logs about "Timed out acquiring mutex". The VM generally goes unresponsive for about 45 minutes at a time for us. Other behaviors:

    • Attempting an RDP logon during this period hangs at the login screen (loading profile or similar)
    • If I launch Windows Resource Monitor and then try to close it, the process doesn't close
    • At the end of the 45 minutes, everything waiting to close closes, and the VM goes back to normal
  • Re: Media mount and usage errors - going on months
    Posted: 11-19-2019, 10:57 AM

    Hi All,

    I've increased the mount path timeouts from the default 10 minutes to 45 minutes on all disk libraries.

    That seems to have resolved the issue. I will wait another couple of days, or probably a week, to see the consequences or any side effects of the change.

    Let me observe the behaviour changes first and then I'll put an update here.
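    For anyone weighing the same change: a timeout like this behaves as a deadline on waiting for the mount path. If a storage stall lasts longer than the deadline (stalls of 20-30 minutes and of about 45 minutes are reported in this thread), the operation fails; with a longer deadline it can wait the stall out. A Python sketch of that trade-off using a simulated clock; it is a generic wait-with-deadline loop with a hypothetical 30-second poll interval and makes no claim about how Commvault implements the mount path timeout internally:

```python
import time
from typing import Callable

def wait_for(check: Callable[[], bool], timeout_s: float, poll_s: float,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() until it returns True; give up once timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)

class FakeClock:
    """Simulated clock so the scenarios below run instantly."""
    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds

def survives_stall(stall_min: float, timeout_min: float, poll_s: float = 30.0) -> bool:
    """Does a wait with timeout_min outlast storage that is unavailable for stall_min?"""
    clock = FakeClock()
    back_online_at = stall_min * 60
    return wait_for(lambda: clock.time() >= back_online_at,
                    timeout_s=timeout_min * 60, poll_s=poll_s,
                    clock=clock.time, sleep=clock.sleep)

print(survives_stall(stall_min=30, timeout_min=10))  # False
print(survives_stall(stall_min=30, timeout_min=45))  # True
```

    The cost of the longer deadline is that a mount path that has genuinely failed takes longer to be reported, so jobs may sit waiting instead of failing fast.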

     

    BR

    Pemil

  • Re: Media mount and usage errors - going on months
    Posted: 11-19-2019, 11:06 AM

    Thank you for that update from your environment. I'm curious to know how frequently you were experiencing the problem.

    So far, we've only had it happen 2 or 3 times, each at least 3 weeks apart. I'm just wondering whether the issue is more severe for you.

The content of the forums, threads and posts reflects the thoughts and opinions of each author, and does not represent the thoughts, opinions, plans or strategies of Commvault Systems, Inc. ("Commvault") and Commvault undertakes no obligation to update, correct or modify any statements made in this forum. Any and all third party links, statements, comments, or feedback posted to, or otherwise provided by this forum, thread or post are not affiliated with, nor endorsed by, Commvault.
Copyright © 2019 Commvault. All Rights Reserved.