1 Million Particles: GPU Swarm Simulation

The Question: How many particles can we simulate in real-time?

The Answer: Way more than we thought.

The Result: Slightly concerning but mostly awesome.

What We Built

A WebGL particle system that simulates flocking behavior for 1 million individual particles. Each particle:

  • Follows 3 simple rules (more on that below)
  • Runs entirely on your GPU
  • Updates 60 times per second
  • Looks like a murmuration of starlings

Try the demo ← Click to see it live

The Three Rules of Flocking

These come from Craig Reynolds' "Boids" model, which he developed in 1986 and published at SIGGRAPH in 1987. Three rules are enough to produce emergent swarm behavior:

Rule 1: Separation

"Don't crash into your neighbors"

Each bird steers away from birds that get too close.

Rule 2: Alignment

"Fly in the same direction as nearby birds"

Match velocity with your neighbors.

Rule 3: Cohesion

"Stay with the group"

Steer toward the average position of nearby birds.

That's it. Three rules = murmuration.
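
For readers who prefer code to prose, here is a minimal CPU-side sketch of the three rules for a single bird, in TypeScript. The names (`Boid`, `steerFor`, the small vector helpers) are illustrative assumptions and not the demo's actual code, and expressing alignment and cohesion as offsets from the bird's own velocity and position is one common choice among several.

```typescript
// CPU reference sketch of the three flocking rules for one bird.
// All names here are hypothetical; the real demo runs on the GPU.
type Vec3 = [number, number, number];

interface Boid {
  position: Vec3;
  velocity: Vec3;
}

const vAdd = (a: Vec3, b: Vec3): Vec3 => [a[0] + b[0], a[1] + b[1], a[2] + b[2]];
const vSub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const vScale = (a: Vec3, s: number): Vec3 => [a[0] * s, a[1] * s, a[2] * s];
const vNorm = (a: Vec3): number => Math.sqrt(a[0] * a[0] + a[1] * a[1] + a[2] * a[2]);

// Returns the three steering vectors for `me`, given the flock and a neighbor radius.
function steerFor(me: Boid, flock: Boid[], radius: number) {
  let separation: Vec3 = [0, 0, 0];
  let velocitySum: Vec3 = [0, 0, 0];
  let positionSum: Vec3 = [0, 0, 0];
  let count = 0;

  for (const other of flock) {
    const offset = vSub(other.position, me.position);
    const dist = vNorm(offset);
    if (dist === 0 || dist >= radius) continue; // skip self and far-away birds

    separation = vSub(separation, vScale(offset, 1 / dist)); // Rule 1: push away
    velocitySum = vAdd(velocitySum, other.velocity); // Rule 2: collect headings
    positionSum = vAdd(positionSum, other.position); // Rule 3: collect positions
    count++;
  }

  if (count === 0) {
    const zero: Vec3 = [0, 0, 0];
    return { separation, alignment: zero, cohesion: zero, count };
  }
  return {
    separation,
    alignment: vSub(vScale(velocitySum, 1 / count), me.velocity), // toward average heading
    cohesion: vSub(vScale(positionSum, 1 / count), me.position), // toward local center
    count,
  };
}
```

The three vectors are then blended with weights of your choosing and added to the bird's velocity each frame.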

Why The GPU?

CPU approach:

for each particle (1 million):
    for each neighbor (average 20):
        calculate forces
    update position

That's 20 million force calculations per frame. At 60 FPS, that's 1.2 billion per second, and that's before the cost of finding each particle's neighbors.

Your CPU would melt.

GPU approach: Every particle gets its own thread. The GPU runs thousands of them at once and works through all 1 million every frame.

60 FPS? Easy.
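
The arithmetic behind the CPU estimate above is easy to check, and it is worth seeing what happens when the neighbors are not known in advance. This is a back-of-the-envelope sketch; the function and constant names are made up for illustration, and "interaction" just means one particle-to-neighbor force evaluation.

```typescript
// Work estimate: particles x neighbors x frames per second.
function interactionsPerSecond(particles: number, neighborsPerParticle: number, fps: number): number {
  return particles * neighborsPerParticle * fps;
}

const PARTICLE_TOTAL = 1e6;

// The estimate from the text: about 20 neighbors each, already known.
const withKnownNeighbors = interactionsPerSecond(PARTICLE_TOTAL, 20, 60); // 1.2e9

// Without any neighbor structure, every particle tests every other particle.
const allPairs = interactionsPerSecond(PARTICLE_TOTAL, PARTICLE_TOTAL, 60); // 6e13

// How many times more work the all-pairs search does.
const wasteFactor = allPairs / withKnownNeighbors; // 50000
```

The 1.2 billion figure is the friendly case. A naive search that compares every pair does 50,000 times more work, which is why finding neighbors cheaply matters as much as running in parallel.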

The Code (Simplified)

Compute Shader (Runs on GPU)

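// Each particle runs this code in parallel (one invocation per particle).
// myPosition and velocity hold this particle's own state.
void main() {
    vec3 separation = vec3(0.0);
    vec3 alignment = vec3(0.0);
    vec3 cohesion = vec3(0.0);
    int neighborCount = 0;

    // Simplified brute-force search: test every particle, keep the close ones
    for (int i = 0; i < PARTICLE_COUNT; i++) {
        float dist = distance(myPosition, particles[i].position);

        // dist > 0.0 skips the particle itself
        if (dist < NEIGHBOR_RADIUS && dist > 0.0) {
            // Rule 1: Separation (unit vector pointing away from the neighbor)
            separation -= (particles[i].position - myPosition) / dist;

            // Rule 2: Alignment (sum of neighbor velocities, averaged below)
            alignment += particles[i].velocity;

            // Rule 3: Cohesion (sum of neighbor positions, averaged below)
            cohesion += particles[i].position;

            neighborCount++;
        }
    }

    // Turn the sums into steering vectors
    if (neighborCount > 0) {
        // Explicit cast: GLSL ES has no implicit int-to-float conversion
        float n = float(neighborCount);
        // Steer toward the average heading rather than adding it outright,
        // otherwise velocity grows without bound
        alignment = alignment / n - velocity;
        // Steer toward the local center of mass
        cohesion = cohesion / n - myPosition;
    }

    // Update velocity and position
    velocity += separation * 0.5 + alignment * 0.1 + cohesion * 0.1;
    position += velocity * deltaTime;
}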

Every particle runs this simultaneously. Millions of calculations in parallel.

Performance Numbers

Hardware: RTX 3070 (Mid-range gaming GPU)

Particles    FPS      GPU Usage
10,000       60       15%
100,000      60       45%
500,000      60       80%
1,000,000    58-60    95%
2,000,000    30-35    100%

Sweet spot: 1M particles at a near-stable 58-60 FPS.
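
One way to read the table above is as throughput: particles multiplied by frames per second. The sketch below does that arithmetic using the table's rows. Where the table gives an FPS range, the lower bound is used, and the names are illustrative only.

```typescript
// Rows copied from the performance table; FPS ranges use their lower bound.
interface Measurement {
  particles: number;
  fps: number;
}

const measurements: Measurement[] = [
  { particles: 10000, fps: 60 },
  { particles: 100000, fps: 60 },
  { particles: 500000, fps: 60 },
  { particles: 1000000, fps: 58 },
  { particles: 2000000, fps: 30 },
];

// Particle updates completed per second for one row.
function updatesPerSecond(m: Measurement): number {
  return m.particles * m.fps;
}

// Time budget for a single frame, in milliseconds.
function frameTimeMs(fps: number): number {
  return 1000 / fps;
}

// One throughput figure per table row, in the same order.
const throughput = measurements.map(updatesPerSecond);
```

Throughput comes out at 0.6M, 6M, 30M, 58M, and 60M updates per second. It levels off near 60 million once GPU usage hits 95-100%, so doubling the particle count from 1M to 2M roughly halves the frame rate. The smaller runs are presumably held at 60 FPS by the display refresh rate rather than by the GPU.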

Optimizations We Made

1. Spatial Hashing

Don't check all 1M particles for each particle. Divide space into grid cells.

Only check particles in nearby cells.

Speed improvement: 100x faster than checking every particle
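
To make the grid idea concrete, here is a small CPU-side sketch of a uniform-grid spatial hash in TypeScript. It is not the demo's GPU implementation, which would have to keep the grid in GPU memory. The function names and the string cell key are assumptions chosen for readability. The one rule that matters is that the cell size must be at least the neighbor radius, so every neighbor is guaranteed to sit in one of the 27 surrounding cells.

```typescript
// Uniform-grid spatial hash (CPU reference; all names are hypothetical).
type Point3 = [number, number, number];
type CellGrid = { [cellKey: string]: number[] };

// Integer coordinates of the grid cell containing p.
function cellOf(p: Point3, cellSize: number): Point3 {
  return [Math.floor(p[0] / cellSize), Math.floor(p[1] / cellSize), Math.floor(p[2] / cellSize)];
}

const keyOf = (c: Point3): string => c[0] + "," + c[1] + "," + c[2];

// Bucket every particle index by the cell that contains it.
function buildGrid(points: Point3[], cellSize: number): CellGrid {
  const grid: CellGrid = {};
  for (let i = 0; i < points.length; i++) {
    const key = keyOf(cellOf(points[i], cellSize));
    if (grid[key] === undefined) grid[key] = [];
    grid[key].push(i);
  }
  return grid;
}

// Indices of all points within `radius` of points[i].
// Only the 3x3x3 block of cells around the particle is scanned,
// which is valid as long as cellSize >= radius.
function neighborsOf(i: number, points: Point3[], grid: CellGrid, cellSize: number, radius: number): number[] {
  const home = cellOf(points[i], cellSize);
  const found: number[] = [];
  for (let dx = -1; dx <= 1; dx++) {
    for (let dy = -1; dy <= 1; dy++) {
      for (let dz = -1; dz <= 1; dz++) {
        const bucket = grid[keyOf([home[0] + dx, home[1] + dy, home[2] + dz])];
        if (bucket === undefined) continue; // empty cell
        for (const j of bucket) {
          if (j === i) continue; // skip the particle itself
          const ex = points[j][0] - points[i][0];
          const ey = points[j][1] - points[i][1];
          const ez = points[j][2] - points[i][2];
          if (ex * ex + ey * ey + ez * ez < radius * radius) found.push(j);
        }
      }
    }
  }
  return found;
}
```

A typical frame would call `buildGrid` once and then `neighborsOf` for each particle, so the cost per particle depends on local density instead of the total particle count.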

2. Instanced Rendering

Don't draw 1M particles individually. Draw one particle 1M times with GPU instancing.

Speed improvement: 50x faster rendering than one draw call per particle
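
In WebGL 2 terms, instancing comes down to two calls: `vertexAttribDivisor` marks an attribute as per-instance, and `drawArraysInstanced` issues the single draw. The sketch below shows only those two calls. It assumes the buffers and attribute pointers are already set up, and `drawParticles`, `offsetAttrib`, and the `InstancingContext` interface are names invented for this example. The interface lists just the part of `WebGL2RenderingContext` the sketch touches, so a real context can be passed in directly.

```typescript
// Subset of WebGL2RenderingContext used below; a real context satisfies it.
interface InstancingContext {
  readonly TRIANGLES: number;
  vertexAttribDivisor(index: number, divisor: number): void;
  drawArraysInstanced(mode: number, first: number, count: number, instanceCount: number): void;
}

// Draw one small mesh `instanceCount` times in a single call.
// offsetAttrib is the (hypothetical) attribute location that holds
// each particle's position.
function drawParticles(
  gl: InstancingContext,
  offsetAttrib: number,
  verticesPerParticle: number,
  instanceCount: number
): void {
  // Advance the offset attribute once per instance, not once per vertex.
  gl.vertexAttribDivisor(offsetAttrib, 1);
  // One draw call covers every particle.
  gl.drawArraysInstanced(gl.TRIANGLES, 0, verticesPerParticle, instanceCount);
}
```

With a real context this would be called as `drawParticles(gl, offsetLocation, 3, 1000000)` for a million single-triangle particles, where `offsetLocation` is whatever location your shader assigns to the per-particle attribute.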

3. Compute Shaders

Physics calculations run on GPU, not CPU.

Speed improvement: 1000x faster than CPU

What We Learned

1. GPUs are magic. The same chip that renders Cyberpunk 2077 can simulate complex physics.

2. Simple rules → complex behavior. Three rules create stunning emergent patterns.

3. Performance has limits. 2M particles dropped us to 30-35 FPS. There's always a ceiling.

4. WebGL/WebGPU is underrated. Browsers can do things that used to require native apps.

Try It Yourself

Demo: Live simulation

Source code: GitHub

Controls:

  • Click + drag to rotate
  • Scroll to zoom
  • Space to pause
  • R to randomize

The Unnecessary Part

Did we need 1 million particles? No.

10,000 looks just as good.

But we wanted to know: What's the limit?

Turns out it's around 1M at 60 FPS on consumer hardware, or 2M if you can live with 30.

Science: Answering questions nobody asked.


Experiments: Check out our Labs page for more mad science

Next Mad Science: Cloth Physics in Real-Time (Aug 3)