1 Million Particles: GPU Swarm Simulation
The Question: How many particles can we simulate in real-time?
The Answer: Way more than we thought.
The Result: Slightly concerning but mostly awesome.
What We Built
A WebGL particle system that simulates flocking behavior for 1 million individual particles. Each particle:
- Follows 3 simple rules (more on that below)
- Runs entirely on your GPU
- Updates 60 times per second
- Looks like a murmuration of starlings
Try the demo ← Click to see it live
The Three Rules of Flocking
This comes from Craig Reynolds' "Boids" algorithm (1986). Three rules create emergent swarm behavior:
Rule 1: Separation
"Don't crash into your neighbors"
Each bird steers away from any bird that gets too close.
Rule 2: Alignment
"Fly in the same direction as nearby birds"
Match velocity with your neighbors.
Rule 3: Cohesion
"Stay with the group"
Steer toward the average position of nearby birds.
That's it. Three rules = murmuration.
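To make the three rules concrete, here is a minimal CPU-side sketch of the steering calculation for a single bird. It is not the project's code: the `Boid` type, the helper names, and the `steer` function are invented for illustration, and the weights (0.5, 0.1, 0.1) are simply borrowed from the simplified shader later in this post.

```typescript
// Minimal CPU sketch of the three flocking rules for one boid.
// All names here are illustrative assumptions, not the project's API.
type Vec3 = [number, number, number];

interface Boid {
  position: Vec3;
  velocity: Vec3;
}

const vadd = (a: Vec3, b: Vec3): Vec3 => [a[0] + b[0], a[1] + b[1], a[2] + b[2]];
const vsub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const vscale = (a: Vec3, s: number): Vec3 => [a[0] * s, a[1] * s, a[2] * s];
const vlen = (a: Vec3): number => Math.hypot(a[0], a[1], a[2]);

// Returns the steering vector for `self`, given the boids already known
// to be inside its neighbor radius.
function steer(self: Boid, neighbors: Boid[]): Vec3 {
  if (neighbors.length === 0) return [0, 0, 0];

  let separation: Vec3 = [0, 0, 0];
  let velocitySum: Vec3 = [0, 0, 0];
  let positionSum: Vec3 = [0, 0, 0];

  for (const other of neighbors) {
    const away = vsub(self.position, other.position);
    const dist = vlen(away);
    // Rule 1: add a unit vector pointing away from each neighbor.
    if (dist > 0) separation = vadd(separation, vscale(away, 1 / dist));
    velocitySum = vadd(velocitySum, other.velocity);
    positionSum = vadd(positionSum, other.position);
  }

  const n = neighbors.length;
  // Rule 2: steer toward the neighbors' average velocity.
  const alignment = vsub(vscale(velocitySum, 1 / n), self.velocity);
  // Rule 3: steer toward the neighbors' average position.
  const cohesion = vsub(vscale(positionSum, 1 / n), self.position);

  return vadd(
    vadd(vscale(separation, 0.5), vscale(alignment, 0.1)),
    vscale(cohesion, 0.1),
  );
}
```

The caller would add the returned vector to the bird's velocity each frame. Finding the neighbors in the first place is the expensive part, which is what the rest of this post is about.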
Why The GPU?
CPU approach:
    for each particle (1 million):
        for each neighbor (average 20):
            calculate forces
        update position
That's 20 million force calculations per frame. At 60 FPS, that comes to 1.2 billion calculations per second.
Your CPU would melt.
GPU approach: One thread per particle. 1 million threads, spread across thousands of GPU cores running in parallel.
60 FPS? Easy.
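The 20-million figure assumes each particle already knows which 20 or so particles are its neighbors. A quick sketch of the arithmetic (the helper name is made up for illustration) shows how much worse things get if every particle has to scan every other particle to find them:

```typescript
// Back-of-the-envelope workload estimate. Illustrative helper only.
function interactionsPerSecond(
  particles: number,
  candidatesPerParticle: number,
  fps: number,
): number {
  return particles * candidatesPerParticle * fps;
}

// Each particle only visits its ~20 known neighbors.
const withKnownNeighbors = interactionsPerSecond(1_000_000, 20, 60); // 1.2e9

// Each particle scans all 1M particles to find its neighbors.
const bruteForceScan = interactionsPerSecond(1_000_000, 1_000_000, 60); // 6e13

console.log({ withKnownNeighbors, bruteForceScan });
```

That gap of 50,000x between the two numbers is why the neighbor search matters as much as raw parallelism.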
The Code (Simplified)
Compute Shader (Runs on GPU)
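// Each particle runs this code in parallel.
// myPosition, position, velocity, deltaTime, and particles[] are assumed to be
// declared elsewhere (inputs and buffers omitted for brevity).
void main() {
    vec3 separation = vec3(0.0);
    vec3 alignment = vec3(0.0);
    vec3 cohesion = vec3(0.0);
    int neighborCount = 0;

    // Check nearby particles. This simplified version scans every particle;
    // dist > 0.0 skips the particle itself.
    for (int i = 0; i < PARTICLE_COUNT; i++) {
        float dist = distance(myPosition, particles[i].position);
        if (dist < NEIGHBOR_RADIUS && dist > 0.0) {
            // Rule 1: Separation (unit vector pointing away from the neighbor)
            separation -= (particles[i].position - myPosition) / dist;
            // Rule 2: Alignment (accumulate neighbor velocities)
            alignment += particles[i].velocity;
            // Rule 3: Cohesion (accumulate neighbor positions)
            cohesion += particles[i].position;
            neighborCount++;
        }
    }

    // Average the sums and turn them into steering vectors
    if (neighborCount > 0) {
        float n = float(neighborCount); // GLSL ES has no implicit int-to-float conversion
        alignment = (alignment / n) - velocity; // steer toward the average velocity
        cohesion = (cohesion / n) - myPosition; // steer toward the average position
    }

    // Update velocity and position
    velocity += separation * 0.5 + alignment * 0.1 + cohesion * 0.1;
    position += velocity * deltaTime;
}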
Every particle runs this simultaneously. Millions of calculations in parallel.
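One consequence of "simultaneously" is that every particle should read its neighbors' state from the previous frame, not a half-updated mix. A common way to guarantee that is to keep two buffers and swap them each frame (often called ping-pong buffering). Whether the demo does exactly this is an assumption on our part; the toy 1D rule below (each value moves halfway toward its left neighbor) is invented purely to show the pattern.

```typescript
// Ping-pong (double-buffer) update sketch. The update rule is a toy:
// each particle moves halfway toward its left neighbor, wrapping around.
function step(src: Float32Array, dst: Float32Array): void {
  const n = src.length;
  for (let i = 0; i < n; i++) {
    const left = src[(i + n - 1) % n]; // always last frame's value
    dst[i] = 0.5 * (src[i] + left);    // writes never touch src
  }
}

let current = new Float32Array([0, 4, 8]);
let next = new Float32Array(current.length);

step(current, next);
[current, next] = [next, current]; // swap roles for the following frame
// current is now [4, 2, 6], regardless of the order the loop ran in
```

Updating in place would give a different answer depending on which particle happened to go first, which on a GPU is not something you control.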
Performance Numbers
Hardware: RTX 3070 (Mid-range gaming GPU)
| Particles | FPS | GPU Usage |
|---|---|---|
| 10,000 | 60 | 15% |
| 100,000 | 60 | 45% |
| 500,000 | 60 | 80% |
| 1,000,000 | 58-60 | 95% |
| 2,000,000 | 30-35 | 100% |
Sweet spot: 1M particles at a near-stable 58-60 FPS.
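A rough way to read the last two rows of the table is as throughput: particles multiplied by frames per second. The sketch below uses round figures from the table (60 FPS at 1M, the low end of 30 FPS at 2M); the helper names are ours.

```typescript
// Throughput and frame-budget arithmetic for the table above (illustrative helpers).
function updatesPerSecond(particles: number, fps: number): number {
  return particles * fps;
}

function frameBudgetMs(fps: number): number {
  return 1000 / fps;
}

const at1M = updatesPerSecond(1_000_000, 60); // 60,000,000 particle updates per second
const at2M = updatesPerSecond(2_000_000, 30); // 60,000,000 particle updates per second
const budget60 = frameBudgetMs(60);           // about 16.7 ms per frame

console.log({ at1M, at2M, budget60 });
```

Both rows land at roughly 60 million particle updates per second, which is consistent with the GPU being saturated around 1M: past that point, doubling the particles roughly halves the frame rate.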
Optimizations We Made
1. Spatial Hashing
Don't check all 1M particles for each particle. Divide space into grid cells.
Only check particles in nearby cells.
Speed improvement: 100x faster
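Here is a minimal CPU-side sketch of the grid idea. It is not the demo's GPU implementation: the `SpatialGrid` class, the string cell keys, and the method names are assumptions chosen for readability. The one real constraint is that the cell size should be at least the neighbor radius, so that every true neighbor sits in the 3x3x3 block of cells around a particle.

```typescript
// Uniform-grid spatial hash (CPU sketch). Names and key scheme are assumptions.
type Point = [number, number, number];

class SpatialGrid {
  private cellSize: number;
  private cells = new Map<string, number[]>();

  constructor(cellSize: number) {
    this.cellSize = cellSize;
  }

  // Integer cell coordinates containing point p.
  private cellOf(p: Point): [number, number, number] {
    return [
      Math.floor(p[0] / this.cellSize),
      Math.floor(p[1] / this.cellSize),
      Math.floor(p[2] / this.cellSize),
    ];
  }

  // Record that particle `index` sits at position p.
  insert(index: number, p: Point): void {
    const key = this.cellOf(p).join(",");
    const bucket = this.cells.get(key);
    if (bucket) bucket.push(index);
    else this.cells.set(key, [index]);
  }

  // Indices of all particles in the 3x3x3 block of cells around p.
  // Callers still apply the exact distance test against the neighbor radius.
  candidates(p: Point): number[] {
    const [cx, cy, cz] = this.cellOf(p);
    const result: number[] = [];
    for (let dx = -1; dx <= 1; dx++) {
      for (let dy = -1; dy <= 1; dy++) {
        for (let dz = -1; dz <= 1; dz++) {
          const bucket = this.cells.get(`${cx + dx},${cy + dy},${cz + dz}`);
          if (bucket) result.push(...bucket);
        }
      }
    }
    return result;
  }
}

const grid = new SpatialGrid(1.0);
grid.insert(0, [0.5, 0.5, 0.5]);
grid.insert(1, [1.2, 0.5, 0.5]);
grid.insert(2, [5.0, 5.0, 5.0]); // far away, never returned for particle 0
const nearby = grid.candidates([0.5, 0.5, 0.5]); // contains 0 and 1, not 2
```

Each particle now inspects 27 cells instead of the whole flock, so the cost per particle depends on local density rather than on the total particle count.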
2. Instanced Rendering
Don't draw 1M particles individually. Upload one particle's geometry and draw it 1M times in a single call with GPU instancing.
Speed improvement: 50x faster rendering
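As a sketch of what instancing looks like with WebGL2, the function below draws one quad per particle in a single call. Several things here are assumptions: the `GL` interface lists only the WebGL2 methods the sketch calls (so it type-checks without DOM typings; in a browser you would pass a real `WebGL2RenderingContext`), a shader program is assumed to be bound already, and attribute locations 0 (quad corner) and 1 (particle offset) are made up. In a GPU-driven simulation the offsets would normally already live on the GPU instead of being uploaded like this.

```typescript
// Instanced drawing sketch for WebGL2. `GL` is a hand-written subset of
// WebGL2RenderingContext; locations 0 and 1 are assumed attribute slots.
interface GL {
  readonly ARRAY_BUFFER: number;
  readonly STATIC_DRAW: number;
  readonly DYNAMIC_DRAW: number;
  readonly FLOAT: number;
  readonly TRIANGLE_STRIP: number;
  createBuffer(): unknown;
  bindBuffer(target: number, buffer: unknown): void;
  bufferData(target: number, data: Float32Array, usage: number): void;
  enableVertexAttribArray(index: number): void;
  vertexAttribPointer(
    index: number, size: number, type: number,
    normalized: boolean, stride: number, offset: number,
  ): void;
  vertexAttribDivisor(index: number, divisor: number): void;
  drawArraysInstanced(mode: number, first: number, count: number, instanceCount: number): void;
}

// `offsets` holds one xyz triple per particle.
// Buffers are created inline for brevity; real code would create them once.
function drawParticles(gl: GL, offsets: Float32Array): void {
  // Per-vertex data: one small quad (as a triangle strip) shared by every instance.
  const quad = new Float32Array([-1, -1, 1, -1, -1, 1, 1, 1]);
  gl.bindBuffer(gl.ARRAY_BUFFER, gl.createBuffer());
  gl.bufferData(gl.ARRAY_BUFFER, quad, gl.STATIC_DRAW);
  gl.enableVertexAttribArray(0);
  gl.vertexAttribPointer(0, 2, gl.FLOAT, false, 0, 0);

  // Per-instance data: one position offset per particle.
  gl.bindBuffer(gl.ARRAY_BUFFER, gl.createBuffer());
  gl.bufferData(gl.ARRAY_BUFFER, offsets, gl.DYNAMIC_DRAW);
  gl.enableVertexAttribArray(1);
  gl.vertexAttribPointer(1, 3, gl.FLOAT, false, 0, 0);
  gl.vertexAttribDivisor(1, 1); // advance once per instance, not once per vertex

  // One draw call: 4 vertices, repeated for every particle.
  gl.drawArraysInstanced(gl.TRIANGLE_STRIP, 0, 4, offsets.length / 3);
}
```

The key line is `vertexAttribDivisor(1, 1)`: it tells the GPU to step through the offsets once per particle while reusing the same four quad vertices for all of them.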
3. Compute Shaders
Physics calculations run on GPU, not CPU.
Speed improvement: 1000x faster than CPU
What We Learned
1. GPUs are magic. The same chip that renders Cyberpunk 2077 can simulate complex physics.
2. Simple rules → complex behavior. Three simple rules create stunning emergent patterns.
3. Performance has limits. 2M particles dropped to 30-35 FPS. There's always a ceiling.
4. WebGL/WebGPU is underrated. Browsers can do things that used to require native apps.
Try It Yourself
Demo: Live simulation
Source code: GitHub
Controls:
- Click + drag to rotate
- Scroll to zoom
- Space to pause
- R to randomize
The Unnecessary Part
Did we need 1 million particles? No.
10,000 looks just as good.
But we wanted to know: What's the limit?
Turns out it's around 1M at 60 FPS, and around 2M if you settle for 30 FPS, on consumer hardware like our RTX 3070.
Science: Answering questions nobody asked.
Experiments: Check out our Labs page for more mad science
Next Mad Science: Cloth Physics in Real-Time (Aug 3)