OverlapSphere vs OverlapSphereNonAlloc

Both functions find the colliders that touch or lie inside a sphere of a given radius.

Physics.OverlapSphere allocates and returns a new array each time it is called.

Physics.OverlapSphereNonAlloc takes a buffer array as an argument and fills it when called. Because the same array is reused, it produces less garbage and should therefore perform better.
The downside is that OverlapSphereNonAlloc has two important quirks you need to be aware of:

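  1. If the supplied buffer array is too small to hold every collider in range, it stores only as many as fit. Which ones are kept is effectively arbitrary (the documentation does not specify the criteria).
  2. When objects leave the radius, their entries are not removed from the array, so stale entries stay in the buffer until you clear them manually. The method's return value tells you how many entries at the start of the buffer belong to the current call.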

Figure: Physics.OverlapSphere
Figure: Physics.OverlapSphereNonAlloc

This is the script I used for the test above:

using UnityEngine;

public class RangeChecker : MonoBehaviour
{
    [SerializeField] private bool isUsingNonAlloc;
    [SerializeField] private Collider[] collidersInRange = new Collider[20];
    private float radius = 10f;

    void Update()
    {
        if (isUsingNonAlloc)
        {
            // Fills the existing array; the return value (ignored here) is the number of hits.
            Physics.OverlapSphereNonAlloc(transform.position, radius, collidersInRange);
        }
        else
        {
            // Allocates and returns a new array on every call.
            collidersInRange = Physics.OverlapSphere(transform.position, radius);
        }
    }
}
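
Both quirks listed above can be worked around with the value that OverlapSphereNonAlloc returns, which is the number of colliders written into the buffer. The following is only a minimal sketch, not part of the test: the class name BufferedRangeChecker, the hitCount field, and the policy of doubling the buffer when it comes back full are my own assumptions. It also assumes, as observed in the test, that entries past the returned count are left untouched.

```csharp
using System;
using UnityEngine;

// Hypothetical variant of RangeChecker; names and the grow-on-full policy are illustrative.
public class BufferedRangeChecker : MonoBehaviour
{
    [SerializeField] private float radius = 10f;

    private Collider[] buffer = new Collider[20];
    private int hitCount; // number of valid entries in buffer after the last query

    void Update()
    {
        int previousCount = hitCount;
        hitCount = Physics.OverlapSphereNonAlloc(transform.position, radius, buffer);

        // Quirk 2: entries from hitCount onward are leftovers from earlier frames, so null them out.
        if (hitCount < previousCount)
            Array.Clear(buffer, hitCount, previousCount - hitCount);

        // Quirk 1: a completely filled buffer may mean that some colliders did not fit.
        // Grow it so the next query has room (this allocates, but only occasionally).
        if (hitCount == buffer.Length)
            Array.Resize(ref buffer, buffer.Length * 2);
    }
}
```

Any code that reads the buffer should loop up to hitCount rather than buffer.Length. If nothing ever inspects the entries past hitCount, the Array.Clear step can be dropped.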

Performance comparison

To make the performance comparison more realistic, I adjusted the script above so that it calls GetComponent on every collider found and changes a variable on the returned script.

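void Update()
{
    // Number of valid entries in collidersInRange for this frame.
    int objInRangeCnt = 0;
    if (isUsingNonAlloc)
    {
        // The return value is the number of colliders written into the buffer.
        objInRangeCnt = Physics.OverlapSphereNonAlloc(transform.position, radius, collidersInRange);
    }
    else
    {
        collidersInRange = Physics.OverlapSphere(transform.position, radius);
        objInRangeCnt = collidersInRange.Length;
    }

    // Assumes every collider in range has an EnemyScript;
    // otherwise GetComponent returns null and this line throws.
    for (int i = 0; i < objInRangeCnt; i++)
    {
        collidersInRange[i].GetComponent<EnemyScript>().hp -= 1;
    }
}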

I created a build and tested it on my computer (AMD Ryzen 5 3600, Radeon RX 6650 XT, 32 GB RAM).
These are the frame rates I measured:

I also compared profiler captures of a development build running 4,096 RangeCheckers. The result: the impact of the garbage collector is insignificant.

If you compare the two profiler captures with the Profile Analyzer, you can also see that the PlayerLoop even takes slightly longer with the OverlapSphereNonAlloc approach (probably just a measurement inaccuracy).
GetComponent takes roughly twice as long (23 ms) as OverlapSphere (12.13 ms) or OverlapSphereNonAlloc (10.45 ms).

Summary

In practical situations, the performance gains from using OverlapSphereNonAlloc instead of OverlapSphere seem to be very limited, even with an enormous number of overlap checks per frame.