I spent quite some time with a customer last year looking at this exact problem (or so it looks...)
Long story short, it appears to be a limitation of the software in the way it threads data to tape across the aux copies. In our environment, we had 10 LTO4 drives and never really saw throughput increase above 600GB/h across all jobs.
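To put the 600GB/h figure in perspective, here is a rough back-of-envelope sketch. It assumes decimal units (1GB = 1000MB) and that all 10 drives were writing at the same time, neither of which is stated above. The 120MB/s figure is the LTO-4 native (uncompressed) rate, and the function name is just illustrative.

```python
# Rough per-drive rate from an aggregate throughput figure.
# Assumptions: decimal units, and all drives writing concurrently.
LTO4_NATIVE_MB_S = 120  # LTO-4 native (uncompressed) transfer rate per drive


def per_drive_mb_s(total_gb_per_hour, drives):
    """Convert aggregate GB/h into average MB/s per drive."""
    return total_gb_per_hour * 1000 / 3600 / drives


rate = per_drive_mb_s(600, 10)
print(round(rate, 1))                           # 16.7 (MB/s per drive)
print(round(100 * rate / LTO4_NATIVE_MB_S))     # 14 (% of native rate)
```

Under those assumptions each drive averages well below its native rate, which fits the idea that the drives themselves are not the bottleneck.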
One way to test whether this is the same issue is to run an aux copy across all 8 drives from only one media agent. When we did that, our throughput went up to 1TB/h+.
If you see the throughput go up, then it's likely the same issue.
Another behaviour we were seeing was that if you had a number of streams set for the aux copy (say 10), all 10 would have to finish before a new set of 10 would start. This meant that when a set got down to its last 1 or 2 streams, the other drives sat idle and the overall throughput would tank.
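The effect of that "wait for the whole set" behaviour can be illustrated with a toy model. This is only a sketch of the scheduling pattern described above, not how the product is implemented: the job durations are made-up numbers, and the "rolling" variant (a free stream picks up the next job straight away) is just a hypothetical comparison point.

```python
import heapq


def batch_makespan(durations, streams):
    """Total time when every set of `streams` jobs must finish before the next set starts."""
    total = 0.0
    for i in range(0, len(durations), streams):
        # The whole set takes as long as its slowest job.
        total += max(durations[i:i + streams])
    return total


def rolling_makespan(durations, streams):
    """Total time when a stream starts the next job as soon as it becomes free."""
    free_at = [0.0] * streams  # time at which each stream becomes free
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available stream
        heapq.heappush(free_at, start + d)
    return max(free_at)


# Hypothetical job lengths in hours: two sets of nine short jobs and one long one.
durations = [1, 1, 1, 1, 1, 1, 1, 1, 1, 5] * 2

print(batch_makespan(durations, 10))    # 10.0
print(rolling_makespan(durations, 10))  # 7.0
```

In the batch case nine streams sit idle for four hours in each set while the long job finishes, which is the same shape as the throughput drop described above.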
I believe stream randomization is meant to resolve this, but it didn't make any difference for us. We are on CV 8 SP5 btw.