Vertex shaders execute once per vertex; fragment shaders execute once per fragment (roughly once per covered pixel, plus overdraw). The disparity can be massive: a 1000-vertex model covering a full 1920x1080 screen means ~1000 vertex shader invocations versus ~2 million fragment shader invocations (a ~2000x difference). Rule of thumb: move work to the vertex shader whenever its result can be interpolated acceptably, since it runs far fewer times. Fragment shaders are the bottleneck in fill-rate-intensive scenes (large screen coverage, complex per-pixel effects). Vertex-bound scenarios: extremely high polygon counts (>1M triangles), complex vertex transformations. Fragment-bound scenarios (the most common): high resolution, expensive per-pixel calculations (lighting, shadows), many texture samples. Measurement: use WebGL Inspector or Spector.js to capture and inspect frames. Indicators: the scene is fragment-bound if reducing resolution improves FPS significantly, and vertex-bound if reducing geometry count helps more. Mobile performance: on many mobile GPUs (for example ARM Mali), mediump arithmetic in fragment shaders can run up to ~2x faster than highp, so use the lowest precision that still looks correct. Optimization strategy: move calculations to the vertex shader and interpolate the results to the fragment shader via varyings.
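The ratio above is simple arithmetic. A small sketch that makes its hidden assumptions explicit: `coverage` (fraction of the screen the model covers) and `overdraw` (fragments shaded per covered pixel) are illustrative parameters, and vertex-cache reuse and clipping are ignored:

```javascript
// Estimate fragment vs vertex shader invocations for one frame.
// coverage: fraction of the screen covered (0..1); overdraw: average number
// of fragments shaded per covered pixel. Both are illustrative inputs.
function invocationEstimate(vertexCount, width, height, coverage = 1, overdraw = 1) {
  const fragments = width * height * coverage * overdraw;
  return { vertices: vertexCount, fragments, ratio: fragments / vertexCount };
}

const est = invocationEstimate(1000, 1920, 1080);
console.log(est.fragments, Math.round(est.ratio)); // 2073600 2074
```

Lower coverage shrinks the gap and overdraw widens it, which is why the same model can be vertex-bound when small on screen and fragment-bound when it fills the view.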
WebGL Advanced FAQ & Answers
22 expert WebGL Advanced answers researched from official documentation. Every answer cites authoritative sources you can verify.
Vertex shader (do here): (1) Lighting calculations - per-vertex diffuse lighting (Gouraud shading), transforming normals and the tangent-space basis. (2) Texture coordinate transformations - UV scrolling, projection mapping. (3) Position transformations - model-view-projection matrices, skinning animations. (4) Heavy math - complex calculations whose results are passed on as varyings. Fragment shader (keep minimal): (1) Texture sampling - diffuse, normal, specular maps. (2) Per-pixel lighting - specular highlights that depend on the view direction. (3) Final color blending - alpha compositing, color correction. (4) Simple effects - fog, distance-based fading. Trade-off: vertex-shader results are interpolated across each triangle, so they look worse than per-pixel results, especially on low-poly models. A common compromise on constrained hardware: vertex shader for diffuse/ambient lighting, fragment shader for specular and detail maps; desktop renderers usually compute all lighting per pixel. When fragment-bound: reduce resolution, cut shader ALU operations, combine texture lookups. When vertex-bound: implement LOD (Level of Detail), reduce geometry complexity, use instancing for repeated objects. Profile first - don't optimize blindly.
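The interpolation trade-off can be made concrete with a worked example in plain JavaScript (the math only, not shader code). The vectors are illustrative: two vertex normals tilted 45 degrees to either side of a light shining straight down the z axis, evaluated at the midpoint of the edge:

```javascript
const normalize = ([x, y, z]) => {
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
};
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const lerp3 = (a, b, t) => a.map((v, i) => v + (b[i] - v) * t);
const lambert = (n, l) => Math.max(dot(n, l), 0);

const L = [0, 0, 1];              // direction toward the light
const n0 = normalize([-1, 0, 1]); // vertex normals tilted +/-45 degrees
const n1 = normalize([1, 0, 1]);

// Gouraud: light each vertex, then interpolate the resulting intensity.
const gouraud = (lambert(n0, L) + lambert(n1, L)) / 2;
// Per-pixel: interpolate the normal, renormalize, then light the fragment.
const perPixel = lambert(normalize(lerp3(n0, n1, 0.5)), L);

console.log(gouraud.toFixed(3), perPixel.toFixed(3)); // 0.707 1.000
```

Gouraud shading reports 0.707 where per-pixel lighting reaches 1.0: the brightest point falls between the vertices and is lost. Denser meshes shrink the gap, which is why the artifact is most visible on low-poly models.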
PBR core components: a Cook-Torrance microfacet BRDF (Bidirectional Reflectance Distribution Function) replaces Phong/Lambert specular, combined with a metallic-roughness workflow. Essential calculations, using the common remapping a = roughness * roughness (so a² = roughness^4; ^ denotes exponentiation here, not GLSL syntax): (1) Fresnel (Schlick approximation): vec3 F = F0 + (1.0 - F0) * pow(1.0 - max(dot(H, V), 0.0), 5.0), where F0 is the base reflectivity (0.04 for dielectrics, the albedo for metals). (2) Normal Distribution Function (GGX/Trowbridge-Reitz): NDF = a² / (PI * (NdotH² * (a² - 1.0) + 1.0)²) - determines the distribution of microfacet orientations. (3) Geometry function (Smith's method): G = G1(NdotV) * G1(NdotL) with G1(NdotX) = (2.0 * NdotX) / (NdotX + sqrt(a² + (1.0 - a²) * NdotX²)) - models microfacet self-shadowing and masking; using the same a² as the NDF keeps the two terms consistent. The specular BRDF is (NDF * G * F) / (4.0 * NdotV * NdotL). Image-Based Lighting: prefiltered environment maps (diffuse irradiance cubemap, specular prefiltered mip chain, BRDF LUT texture). Performance: cost depends on resolution and light count; figures such as ~2-3ms per frame for complex scenes are rough estimates. Use shader LOD - a simplified BRDF for distant objects.
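A CPU-side sketch of the specular term above, useful for checking shader output against known values. It evaluates one color channel for one light and applies the a = roughness² remapping in both the NDF and G1; the function names are illustrative, and the small clamp in the denominator is an added safeguard, not part of the formula:

```javascript
// Schlick Fresnel for one color channel.
function fresnelSchlick(hDotV, f0) {
  return f0 + (1 - f0) * Math.pow(1 - Math.max(hDotV, 0), 5);
}

// GGX / Trowbridge-Reitz NDF with a = roughness^2.
function distributionGGX(nDotH, roughness) {
  const a = roughness * roughness;
  const a2 = a * a;
  const d = nDotH * nDotH * (a2 - 1) + 1;
  return a2 / (Math.PI * d * d);
}

// Smith G1 for GGX, using the same a^2 as the NDF.
function smithG1(nDotX, roughness) {
  const a = roughness * roughness;
  const a2 = a * a;
  return (2 * nDotX) / (nDotX + Math.sqrt(a2 + (1 - a2) * nDotX * nDotX));
}

function cookTorranceSpecular({ nDotL, nDotV, nDotH, hDotV, roughness, f0 }) {
  const D = distributionGGX(nDotH, roughness);
  const G = smithG1(nDotV, roughness) * smithG1(nDotL, roughness);
  const F = fresnelSchlick(hDotV, f0);
  // Clamp avoids division by zero at grazing angles.
  return (D * G * F) / Math.max(4 * nDotV * nDotL, 1e-4);
}
```

With head-on viewing (all dot products equal to 1), roughness 1 and F0 = 0.04, the terms reduce to NDF = 1/π, G = 1 and F = 0.04, so the result is 0.04 / (4π) ≈ 0.0032. In practice roughness is usually clamped to a small minimum, since a = 0 makes the NDF degenerate at NdotH = 1.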
Each draw call incurs CPU overhead: state changes (blend mode, depth test), uniform updates, buffer bindings, and the API call itself (WebGL also validates every call). Bottleneck: with >1000 draw calls, CPU time becomes significant and the GPU may idle waiting for commands. Per-call cost varies widely by browser, driver, and the state changes involved; as an illustration, at ~15 µs per call, 1000 calls already consume ~15ms of the 16.67ms budget for 60fps. Material/shader changes are the most expensive because they trigger pipeline state changes; texture binding changes are also costly. Target (rules of thumb): <500 draw calls per frame for consistent 60fps on desktop, <200 on mobile. Measurement: use the browser DevTools Performance panel or WebGL Inspector and look for long JavaScript execution during rendering. WebGL has no built-in draw-call counter (there is no gl.DRAW_CALLS parameter); count calls yourself by wrapping the draw functions, or read renderer.info.render.calls in Three.js. Rough threshold: beyond ~1500 draw calls, the CPU usually becomes the primary bottleneck regardless of GPU power. The solution requires batching, instancing, or architectural changes; Three.js provides InstancedMesh and BatchedMesh for this. Note: WebGPU is designed to reduce draw call overhead with command buffers and pipeline objects that bundle state.
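Since WebGL offers no built-in draw-call counter, one option is to wrap the draw methods on the context. This is a sketch; the method list covers WebGL 2.0 core draw calls, and instanced calls made through the WebGL 1.0 ANGLE_instanced_arrays extension object would need the same wrapping applied to that object:

```javascript
// Draw entry points on a WebGL 2.0 context (names absent on `gl` are skipped).
const DRAW_METHODS = [
  "drawArrays", "drawElements", "drawArraysInstanced",
  "drawElementsInstanced", "drawRangeElements",
];

// Replace each draw method with a counting wrapper; returns a live stats object.
function instrumentDrawCalls(gl) {
  const stats = { calls: 0 };
  for (const name of DRAW_METHODS) {
    const original = gl[name];
    if (typeof original !== "function") continue;
    gl[name] = function (...args) {
      stats.calls++;
      return original.apply(gl, args); // forward to the real implementation
    };
  }
  return stats;
}
```

Call `instrumentDrawCalls(gl)` once after creating the context, then read and reset `stats.calls` at the end of each frame.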
Optimization techniques: (1) Instancing - gl.drawArraysInstanced() / gl.drawElementsInstanced() (WebGL 2.0, or the ANGLE_instanced_arrays extension in WebGL 1.0) draws many copies with a single call. Pass per-instance data (instance matrices, colors) as attributes marked with gl.vertexAttribDivisor(location, 1). N repeated objects (trees, particles, crowds) collapse into one draw call. (2) Geometry batching - merge static objects that share a material into a single vertex buffer and render the whole batch in one call. Downsides: more vertices per draw, more memory, and coarser frustum culling. Best for: static backgrounds, level geometry. (3) Texture atlases - combine multiple textures into one large texture and use per-object UV offsets, eliminating texture binding changes. (4) Material sorting - group draw calls by shader/material to minimize state changes. Render opaque objects front-to-back (early-Z rejection) and transparent objects back-to-front. (5) Uniform Buffer Objects (UBO) - WebGL 2.0 only; batch uniform updates. When to batch: static objects, shared materials, <10K triangles per batch. When to instance: many identical objects, dynamic transforms. Trade-off: batching increases memory use and loses culling precision; instancing requires WebGL 2.0 (or the extension) and shaders that read per-instance attributes. Modern frameworks such as Three.js provide InstancedMesh and BatchedMesh for these patterns.
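Material sorting (technique 4) can be sketched as a comparator over a frame's draw list. The item fields (`transparent`, `programId`, `materialId`, `depth`) are hypothetical; `depth` stands for view-space distance from the camera:

```javascript
// Order draws: opaque before transparent; opaque grouped by program, then
// material, then front-to-back; transparent strictly back-to-front.
function sortDrawList(items) {
  return [...items].sort((a, b) => {
    if (a.transparent !== b.transparent) return a.transparent ? 1 : -1;
    if (a.transparent) return b.depth - a.depth;                    // far to near
    if (a.programId !== b.programId) return a.programId - b.programId;
    if (a.materialId !== b.materialId) return a.materialId - b.materialId;
    return a.depth - b.depth;                                       // near to far (early-Z)
  });
}
```

Grouping by program before depth gives up strict front-to-back order among opaque objects in exchange for fewer state changes; many engines pack these fields into a single integer sort key for speed.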
GPU occlusion culling uses occlusion queries to test object visibility before rendering full geometry. Process: (1) Create a query: const query = gl.createQuery(). (2) Render bounding volumes (boxes, spheres) with depth testing enabled but color and depth writes disabled: gl.colorMask(false, false, false, false); gl.depthMask(false). (3) Wrap the draw in a query: gl.beginQuery(gl.ANY_SAMPLES_PASSED, query); drawBoundingBox(); gl.endQuery(gl.ANY_SAMPLES_PASSED). (4) On a later frame, check the result (asynchronous): if (gl.getQueryParameter(query, gl.QUERY_RESULT_AVAILABLE)) { const visible = gl.getQueryParameter(query, gl.QUERY_RESULT); if (visible > 0) { renderObject(); } }. Async nature: WebGL 2.0 never makes a result available before control returns to the browser's event loop, so results typically arrive 1-2 frames later. Use predictive culling (assume visible if moving) or last frame's results. Bounding volume: simplified geometry (OBB, AABB, sphere) is much cheaper to render than the full object. Limitations: each query has overhead (~0.1ms is a common rough estimate), so limit them to roughly 100-200 per frame. Works best with hierarchical occlusion culling (test parent volumes first). Combine with frustum culling for best results. WebGL 2.0 required - WebGL 1.0 has no occlusion query extension.
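Steps (1)-(4) can be wrapped in a small per-object manager that never blocks on a result. This is a sketch assuming a WebGL 2.0 context, depth testing already enabled, and occluders already drawn into the depth buffer; `drawBounds` is a hypothetical callback that draws an object's bounding box. Objects without a result yet are treated as visible, matching the "assume visible" policy above:

```javascript
class OcclusionCuller {
  constructor(gl) {
    this.gl = gl;
    this.entries = new Map(); // object -> { query, pending, visible }
  }

  // Issue one query per object; call after the main occluders are in the depth buffer.
  issueQueries(objects, drawBounds) {
    const gl = this.gl;
    gl.colorMask(false, false, false, false);
    gl.depthMask(false); // depth-test the boxes without writing depth
    for (const obj of objects) {
      let entry = this.entries.get(obj);
      if (!entry) {
        entry = { query: gl.createQuery(), pending: false, visible: true };
        this.entries.set(obj, entry);
      }
      if (entry.pending) continue; // previous result not read yet
      gl.beginQuery(gl.ANY_SAMPLES_PASSED, entry.query);
      drawBounds(obj);
      gl.endQuery(gl.ANY_SAMPLES_PASSED);
      entry.pending = true;
    }
    gl.colorMask(true, true, true, true);
    gl.depthMask(true);
  }

  // Non-blocking: returns the most recent known visibility.
  isVisible(obj) {
    const gl = this.gl;
    const entry = this.entries.get(obj);
    if (!entry) return true; // never tested: assume visible
    if (entry.pending && gl.getQueryParameter(entry.query, gl.QUERY_RESULT_AVAILABLE)) {
      entry.visible = gl.getQueryParameter(entry.query, gl.QUERY_RESULT) > 0;
      entry.pending = false;
    }
    return entry.visible;
  }
}
```

Culled objects keep getting their bounding boxes queried each frame, which is how they are detected again once they come back into view.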
Performance gains (illustrative ranges; actual results depend heavily on scene layout): 30-50% fewer draw calls in complex scenes with significant occlusion (dense urban environments, indoor scenes, forests), with bigger gains in overdraw-heavy scenes where objects block each other. Example scenarios: (1) City scene with buildings - 40% less rendered geometry when walking between buildings. (2) Indoor environment - 50-60% culled when rooms and corridors block each other. (3) Dense forest - 30-35% culled by tree occlusion. Best case: architectural visualizations and CAD applications with large occluded volumes, which can reach 70% culling. Worst case: open outdoor scenes with a distant camera (skybox, terrain), where the benefit is minimal and query overhead outweighs the gains. Occlusion culling is most effective when: (1) the scene has a natural occlusion hierarchy, (2) objects are large enough that query overhead is below their rendering cost, (3) camera movement is predictable (reducing popping and wasted draws caused by the 1-frame result delay). Cost-benefit: query overhead grows linearly with object count, while the benefit grows with scene complexity. Break-even is around ~100 objects; below that, frustum culling alone is usually sufficient. Modern approach: hierarchical culling with a BVH (Bounding Volume Hierarchy) reduces the query count by testing groups first.
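The hierarchical idea can be sketched as a tree walk that skips an entire subtree as soon as its bounding volume is found occluded. The node shape (`children` on groups, none on leaves) and the `isOccluded` predicate are hypothetical; in a real renderer the predicate would be the last available query result for that node's bounds:

```javascript
// Collect visible leaves, counting how many visibility tests were needed.
function collectVisible(node, isOccluded, out = [], stats = { tests: 0 }) {
  stats.tests++;
  if (isOccluded(node)) return { out, stats }; // skip the whole subtree
  if (node.children) {
    for (const child of node.children) collectVisible(child, isOccluded, out, stats);
  } else {
    out.push(node); // leaf: an actual renderable object
  }
  return { out, stats };
}
```

For a root with two groups, where group A holds 3 objects and occluded group B holds 10, the walk performs 6 tests (root, A, A's 3 leaves, B) instead of the 13 a flat per-object scheme would need.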
Shader compilation can block the main thread and cause stuttering. Best practices: (1) Yield between compiles - during a loading screen, yield to the browser with setTimeout or requestIdleCallback between shader compiles: compileShader(source); await new Promise(r => setTimeout(r, 0));. This spreads the cost across frames but does not make compilation itself asynchronous. (2) KHR_parallel_shader_compile extension - lets the driver compile in the background (Chrome and Edge support it; always check the getExtension() result). Compile and link as usual, then poll the program's completion status instead of blocking on LINK_STATUS: const ext = gl.getExtension('KHR_parallel_shader_compile'); gl.linkProgram(program); function checkReady() { if (gl.getProgramParameter(program, ext.COMPLETION_STATUS_KHR)) { /* now safe to query LINK_STATUS */ } else { requestAnimationFrame(checkReady); } }. (3) Program caching - cache compiled programs keyed by their shader sources (or a hash of them): const key = hash(vsSource + fsSource); if (cache[key]) return cache[key];. (4) Warm-up renders - draw once off-screen on the first frame to trigger deferred driver compilation: gl.viewport(0, 0, 1, 1); draw(); gl.viewport(0, 0, width, height);. (5) Precompile common shaders - compile during app initialization, not on demand. Gotcha: drivers may defer compilation until the first draw call; warm-up prevents stuttering at runtime. Three.js caches programs automatically.
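Practices (2) and (3) combine naturally: cache one promise per source pair, and resolve it once the driver reports completion. A sketch assuming a WebGL context plus the (possibly null) result of gl.getExtension('KHR_parallel_shader_compile'); the helper names are illustrative, and per-shader error checking is left out for brevity:

```javascript
// Compile both stages and start linking; with KHR_parallel_shader_compile
// these calls return quickly while the driver works in the background.
function startProgram(gl, vsSource, fsSource) {
  const program = gl.createProgram();
  for (const [type, source] of [[gl.VERTEX_SHADER, vsSource], [gl.FRAGMENT_SHADER, fsSource]]) {
    const shader = gl.createShader(type);
    gl.shaderSource(shader, source);
    gl.compileShader(shader);
    gl.attachShader(program, shader);
  }
  gl.linkProgram(program);
  return program;
}

// Resolve once linking is done, polling COMPLETION_STATUS_KHR when available.
function whenProgramReady(gl, ext, program, schedule) {
  return new Promise((resolve, reject) => {
    const check = () => {
      if (ext && !gl.getProgramParameter(program, ext.COMPLETION_STATUS_KHR)) {
        schedule(check); // still compiling: look again later
        return;
      }
      // Without the extension this query blocks until linking finishes.
      if (gl.getProgramParameter(program, gl.LINK_STATUS)) resolve(program);
      else reject(new Error(gl.getProgramInfoLog(program)));
    };
    check();
  });
}

const programCache = new Map(); // source pair -> Promise<WebGLProgram>

function getProgram(gl, ext, vsSource, fsSource, schedule = (cb) => requestAnimationFrame(cb)) {
  const key = vsSource + "\0" + fsSource; // the sources themselves serve as the key
  if (!programCache.has(key)) {
    const ready = whenProgramReady(gl, ext, startProgram(gl, vsSource, fsSource), schedule);
    ready.catch(() => programCache.delete(key)); // allow a retry after a failed link
    programCache.set(key, ready);
  }
  return programCache.get(key);
}
```

Because the cache stores promises rather than programs, two materials requesting the same shaders during loading share a single compile. Without the extension, the first LINK_STATUS query simply blocks until linking finishes, as before.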
Deferred rendering separates geometry from lighting using Multiple Render Targets (MRT). Requires WebGL 2.0 (gl.drawBuffers()) or WebGL 1.0 with the WEBGL_draw_buffers extension; floating-point G-buffer targets such as RGBA16F additionally need the EXT_color_buffer_float extension in WebGL 2.0. Two passes: (1) Geometry pass - render the scene into a G-buffer (multiple textures): position, normal, albedo/diffuse, specular/roughness. Fragment shader outputs: layout(location = 0) out vec4 gPosition; layout(location = 1) out vec4 gNormal; layout(location = 2) out vec4 gAlbedoSpec;. Setup: gl.drawBuffers([gl.COLOR_ATTACHMENT0, gl.COLOR_ATTACHMENT1, gl.COLOR_ATTACHMENT2]);. (2) Lighting pass - render a full-screen quad, sample the G-buffer textures, and compute lighting only for visible pixels from the stored position, normal, and albedo. Architecture: the geometry pass writes scene information to textures; the lighting pass reads them and outputs the final lit image. Benefits: lighting cost is roughly O(pixels × lights), versus O(shaded fragments × lights) for forward rendering, where overdraw multiplies the work. Scales well to 50+ lights. Enables: advanced post-processing (SSAO, SSR), efficient multi-light scenes, screen-space effects. WebGL 2.0 guarantees at least 4 draw buffers, and many drivers expose 8 (query gl.MAX_DRAW_BUFFERS).
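A sketch of the G-buffer setup described above, assuming a WebGL2RenderingContext. The three attachments match the gPosition/gNormal/gAlbedoSpec outputs; the formats (RGBA16F, RGBA16F, RGBA8), the depth renderbuffer, and the function name are illustrative choices:

```javascript
function createGBuffer(gl, width, height) {
  // RGBA16F color targets are only renderable with this extension.
  if (!gl.getExtension("EXT_color_buffer_float")) {
    throw new Error("EXT_color_buffer_float unavailable: cannot render to RGBA16F");
  }
  const fbo = gl.createFramebuffer();
  gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);

  // location 0: position, 1: normal, 2: albedo + specular
  const formats = [gl.RGBA16F, gl.RGBA16F, gl.RGBA8];
  const textures = formats.map((format, i) => {
    const tex = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, tex);
    gl.texStorage2D(gl.TEXTURE_2D, 1, format, width, height);
    // Read 1:1 in the lighting pass: no mipmaps, nearest filtering.
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
    gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0 + i, gl.TEXTURE_2D, tex, 0);
    return tex;
  });

  const depth = gl.createRenderbuffer();
  gl.bindRenderbuffer(gl.RENDERBUFFER, depth);
  gl.renderbufferStorage(gl.RENDERBUFFER, gl.DEPTH_COMPONENT24, width, height);
  gl.framebufferRenderbuffer(gl.FRAMEBUFFER, gl.DEPTH_ATTACHMENT, gl.RENDERBUFFER, depth);

  // Route fragment outputs 0..2 to the three color attachments.
  gl.drawBuffers(textures.map((_, i) => gl.COLOR_ATTACHMENT0 + i));

  if (gl.checkFramebufferStatus(gl.FRAMEBUFFER) !== gl.FRAMEBUFFER_COMPLETE) {
    throw new Error("G-buffer framebuffer incomplete");
  }
  gl.bindFramebuffer(gl.FRAMEBUFFER, null);
  return { fbo, textures, depth };
}
```

During the geometry pass, bind `fbo` and draw the scene; in the lighting pass, bind the returned textures to texture units and sample them from the full-screen quad shader.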
Deferred pros: (1) Lighting scales with screen pixels rather than geometry, so 50+ lights stay affordable. (2) Each pixel is lit once (forward rendering may shade the same pixel several times due to overdraw). (3) Enables advanced effects - SSAO, SSR, and screen-space decals work naturally. Deferred cons: (1) Large memory footprint - 4-5 full-resolution textures (Position: RGBA16F, Normal: RGBA16F, Albedo: RGBA8, Specular: RGBA8; RGB16F is generally not color-renderable in WebGL 2.0) take 24 bytes per pixel, about 50 MB at 1080p before the depth buffer. (2) No practical hardware MSAA - use post-process anti-aliasing (FXAA, TAA, SMAA). (3) Transparency problem - transparent objects cannot be stored in the G-buffer and need a separate forward pass. (4) One material model per pixel - the G-buffer stores a single set of material parameters, so materials cannot be blended. (5) Bandwidth intensive - the lighting pass reads 4+ textures per pixel. Forward pros: native MSAA, transparency support, less memory, simple implementation. Forward cons: poor scaling with many lights; overdraw wastes computation. Best for deferred: indoor scenes with many small lights (100+ lights achievable). Avoid deferred for: outdoor scenes with few lights, heavy transparency, mobile (bandwidth constraints). Modern alternative: clustered forward rendering - a hybrid approach with better transparency support.
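The memory figure follows from bytes per pixel times pixel count. A sketch of that arithmetic; the texel sizes assume uncompressed storage without driver padding, which real drivers do not guarantee:

```javascript
// Bytes per texel for the formats discussed above (uncompressed, unpadded).
const BYTES_PER_TEXEL = { RGBA16F: 8, RGBA8: 4, RGBA32F: 16, DEPTH24_STENCIL8: 4 };

function gbufferBytes(width, height, formats) {
  const perPixel = formats.reduce((sum, f) => sum + BYTES_PER_TEXEL[f], 0);
  return { perPixel, total: width * height * perPixel };
}

const layout = ["RGBA16F", "RGBA16F", "RGBA8", "RGBA8"]; // position, normal, albedo, specular
const { perPixel, total } = gbufferBytes(1920, 1080, layout);
console.log(perPixel, (total / 1e6).toFixed(1) + " MB"); // 24 49.8 MB
```

Adding a 4-byte depth/stencil buffer brings the total to 28 bytes per pixel, about 58 MB at 1080p.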
Noise types: (1) Perlin noise - smooth, organic patterns from gradient-based interpolation; a classic for terrain and clouds. (2) Simplex noise - Ken Perlin's later algorithm, with fewer directional artifacts and lower cost than Perlin noise, especially in 3D and 4D (speedups around ~25% are sometimes quoted). Preferred for modern implementations. (3) Worley (cellular) noise - based on distances to random feature points (Voronoi cells); creates cellular patterns. Great for stone, water foam, organic textures. (4) Value noise - the simplest: interpolates random values at grid points. Faster, but shows blocky, grid-aligned artifacts. Use cases: terrain heightmaps (Perlin/Simplex FBM - Fractal Brownian Motion), water surfaces (multiple noise octaves), clouds (multi-octave Perlin), stone/marble (Worley), fire/smoke (animated noise). Fractal Brownian Motion (FBM): layer multiple octaves at different frequencies: const int OCTAVES = 5; float fbm(vec2 p) { float value = 0.0; float amplitude = 0.5; for (int i = 0; i < OCTAVES; i++) { value += amplitude * noise(p); p *= 2.0; amplitude *= 0.5; } return value; }. Each octave adds detail by doubling the frequency and halving the amplitude; the constant loop bound keeps the loop valid in GLSL ES 1.00 as well. Libraries: webgl-noise (Stefan Gustavson) and lygia (Patricio Gonzalez Vivo) provide optimized implementations.
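To show the FBM structure end to end, here is a CPU sketch in JavaScript using value noise, the simplest of the four types. The integer hash is an arbitrary choice (any well-mixed hash works) and the function names are illustrative; the shader libraries mentioned above provide GPU implementations:

```javascript
// Hash an integer lattice point to a pseudo-random value in [0, 1).
function hash2(ix, iy) {
  let h = Math.imul(ix, 374761393) + Math.imul(iy, 668265263);
  h = Math.imul(h ^ (h >>> 13), 1274126177);
  h ^= h >>> 16;
  return (h >>> 0) / 4294967296;
}

// 2D value noise: bilinear blend of the four lattice values with a smoothstep fade.
function valueNoise(x, y) {
  const ix = Math.floor(x), iy = Math.floor(y);
  const fx = x - ix, fy = y - iy;
  const ux = fx * fx * (3 - 2 * fx), uy = fy * fy * (3 - 2 * fy);
  const a = hash2(ix, iy), b = hash2(ix + 1, iy);
  const c = hash2(ix, iy + 1), d = hash2(ix + 1, iy + 1);
  return a + (b - a) * ux + (c - a) * uy + (a - b - c + d) * ux * uy;
}

// FBM: each octave doubles the frequency and halves the amplitude.
function fbm(x, y, octaves = 5) {
  let value = 0, amplitude = 0.5;
  for (let i = 0; i < octaves; i++) {
    value += amplitude * valueNoise(x, y);
    x *= 2;
    y *= 2;
    amplitude *= 0.5;
  }
  return value;
}
```

With 5 octaves the amplitudes sum to 0.96875, so the output stays in [0, 1) when the base noise does.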
Performance at 1080p (fragment shader, per frame; rough, hardware-dependent figures): (1) Value noise: ~0.5ms - fastest, simplest, lower quality. (2) Simplex noise: ~0.9ms - best quality-to-performance ratio. (3) Perlin noise: ~1.2ms - classic, slightly slower than Simplex. (4) Worley noise: ~2.5ms - slowest, due to its distance calculations. 3D noise: ~3x slower than 2D (more lookups and interpolation); 4D (animated 3D): ~4-5x slower than 2D. Optimization strategies: (1) Texture-based noise - precompute noise into a texture and sample it instead of computing it; ~10x faster for static or repeated patterns, trading memory for speed. (2) Reduce octaves - each FBM octave adds one more noise evaluation, so cost grows linearly with the octave count. Use 3-4, not 8-10. (3) Level of Detail (LOD) - reduce octaves for distant objects: int octaves = max(1, 6 - int(distance * 0.5)); (a loop bound that varies at runtime requires GLSL ES 3.00). (4) Lower dimensions - use 2D noise with scrolling UVs instead of 3D where possible. Shader optimization: avoid dynamic loops where possible and unroll octaves manually: value += 0.5 * noise(p); p *= 2.0; value += 0.25 * noise(p); instead of a loop. Best practice: bake static patterns to textures offline (Substance Designer, Houdini) and compute dynamic effects in the shader. Some Three.js terrain examples use texture-based noise.
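The LOD formula in strategy (3) is easy to sanity-check outside the shader. A sketch mirroring it in JavaScript (`Math.trunc` matches GLSL's int() truncation toward zero); since each octave adds one noise evaluation, the octave count doubles as the relative cost:

```javascript
// Mirrors: int octaves = max(1, 6 - int(distance * 0.5));
function octavesForDistance(distance, maxOctaves = 6) {
  return Math.max(1, maxOctaves - Math.trunc(distance * 0.5));
}

console.log([0, 4, 9, 20].map((d) => octavesForDistance(d)).join(", ")); // 6, 4, 2, 1
```

At distance 9 the shader runs 2 noise evaluations per pixel instead of 6, roughly a third of the full-detail cost.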
Transform Feedback captures vertex shader outputs into buffers without rasterization, enabling GPU-side physics simulation. Particle system pattern: (1) Create two buffers (ping-pong): a read buffer (current state) and a write buffer (next state). (2) The vertex shader updates particle state: integrate velocity, apply forces (gravity, wind), age particles, respawn dead ones. (3) Transform Feedback captures the outputs, with rasterization disabled during the update pass: gl.enable(gl.RASTERIZER_DISCARD); gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, tf); gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, writeBuffer); gl.beginTransformFeedback(gl.POINTS); gl.drawArrays(gl.POINTS, 0, particleCount); gl.endTransformFeedback(); gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, null); gl.disable(gl.RASTERIZER_DISCARD);. (4) Swap buffers: the write buffer becomes the read buffer for the next frame. (5) Render particles from the current buffer. Vertex shader (#version 300 es): out vec3 v_position; out vec3 v_velocity; out float v_life; - these outputs are captured. Declare them before linking: gl.transformFeedbackVaryings(program, ['v_position', 'v_velocity', 'v_life'], gl.INTERLEAVED_ATTRIBS); gl.linkProgram(program);. INTERLEAVED_ATTRIBS writes all three outputs into the single buffer bound at index 0; gl.SEPARATE_ATTRIBS would instead need one buffer bound per varying (indices 0, 1, 2). Benefits: all computation stays on the GPU, no CPU-GPU transfer, massively parallel (millions of particles), deterministic simulation. Limitations: no particle-particle interaction (each particle is updated independently). For collisions, use compute shaders in WebGPU.
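The per-frame update can be packaged as one function. A sketch assuming a WebGL2RenderingContext, a program linked with the three varyings in INTERLEAVED_ATTRIBS mode, and a hypothetical `state` object holding two buffers, one vertex array per buffer (each reading particle attributes from its buffer), and a transform feedback object:

```javascript
function stepParticles(gl, state, program, particleCount) {
  const read = state.current;
  const write = 1 - read;
  gl.useProgram(program);
  gl.bindVertexArray(state.vaos[read]);    // inputs: current particle state
  gl.enable(gl.RASTERIZER_DISCARD);        // update pass only, no fragments
  gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, state.tf);
  gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, state.buffers[write]);
  gl.beginTransformFeedback(gl.POINTS);
  gl.drawArrays(gl.POINTS, 0, particleCount);
  gl.endTransformFeedback();
  // Unbind so the written buffer can be used as a vertex source afterwards.
  gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, null);
  gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, null);
  gl.disable(gl.RASTERIZER_DISCARD);
  gl.bindVertexArray(null);
  state.current = write;                   // swap roles for the next frame
  return state.buffers[write];             // draw particles from this buffer
}
```

Each call reads from one buffer and writes the other, then flips `state.current`, so the render pass always draws from the buffer just written.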
Performance gains: often cited as 10-100x faster than CPU-based particles. CPU: 10K-50K particles at 60fps (JavaScript loop overhead, per-frame data upload). GPU Transform Feedback: 1M-10M particles at 60fps (parallel processing, no CPU-GPU transfer). Reported ballpark figures (they vary with shader complexity, particle size, and browser): MacBook Pro M1: ~5M particles at 60fps with basic physics (gravity, velocity integration). High-end desktop GPU (RTX 3080): ~15M particles. Mobile (Snapdragon 888): ~500K particles at 60fps. Scalability: the per-particle cost stays constant as the count grows, and the work is spread across the GPU's many parallel cores; on the CPU the same linear work runs serially in a JavaScript loop. Memory efficiency: no per-frame copies between CPU and GPU; both ping-pong buffers live entirely on the GPU. Bottlenecks shift: from CPU JavaScript execution to GPU vertex processing and rasterization. Real-world use cases: 100K+ particle effects (explosions, fire, rain) with physics simulation; GPU particles are standard in WebGL games and visualizations. Comparison: CPU 10K particles = ~8-10ms. GPU Transform Feedback 10K particles = ~0.5ms. GPU 1M particles = ~3-5ms (100x more particles in roughly half the time of the CPU's 10K, or less). Three.js GPUComputationRenderer provides a similar ping-pong pattern, but it uses render-to-texture fragment passes rather than Transform Feedback.
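For comparison, the CPU baseline is a plain per-particle loop like the one below, which also serves as a reference for validating the GPU shader's output. The interleaved layout (7 floats: position, velocity, life), the integration scheme (semi-implicit Euler), and the respawn values are illustrative assumptions:

```javascript
const STRIDE = 7; // [px, py, pz, vx, vy, vz, life] per particle

// One simulation step from src into dst (both Float32Array, same length).
function updateParticlesCPU(src, dst, dt, gravity = -9.81) {
  for (let i = 0; i < src.length; i += STRIDE) {
    const life = src[i + 6] - dt;
    if (life <= 0) {
      // Respawn at the origin with an upward velocity (hypothetical emitter).
      dst.set([0, 0, 0, 0, 5, 0, 2], i);
      continue;
    }
    // Semi-implicit Euler: update velocity first, then position with it.
    const vx = src[i + 3], vy = src[i + 4] + gravity * dt, vz = src[i + 5];
    dst[i] = src[i] + vx * dt;
    dst[i + 1] = src[i + 1] + vy * dt;
    dst[i + 2] = src[i + 2] + vz * dt;
    dst[i + 3] = vx;
    dst[i + 4] = vy;
    dst[i + 5] = vz;
    dst[i + 6] = life;
  }
}
```

Ping-pong works the same way on the CPU: after each step, swap the two arrays (`[src, dst] = [dst, src]`).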
UBOs group multiple uniforms into buffer objects for efficient batch updates. Benefits over individual uniforms: (1) Update many uniforms in one API call - gl.bufferSubData() updates the entire block instead of many gl.uniform*() calls. (2) Share uniform data across shader programs - bind the same UBO to multiple programs. (3) Faster performance - often cited as a 5-10x speedup for >20 uniforms. (4) Memory layout control via std140 layout rules. Setup GLSL (in a #version 300 es shader): layout(std140) uniform SceneData { mat4 viewMatrix; mat4 projMatrix; vec3 cameraPos; float time; };. Create the buffer: const ubo = gl.createBuffer(); gl.bindBuffer(gl.UNIFORM_BUFFER, ubo); const data = new Float32Array([...viewMatrix, ...projMatrix, ...cameraPos, time]); gl.bufferData(gl.UNIFORM_BUFFER, data, gl.DYNAMIC_DRAW); (36 floats, which happens to match the std140 layout of this particular block). Bind to the program: const blockIndex = gl.getUniformBlockIndex(program, 'SceneData'); gl.uniformBlockBinding(program, blockIndex, 0); gl.bindBufferBase(gl.UNIFORM_BUFFER, 0, ubo);. Binding point: acts as a shared slot - multiple programs reference the same binding point, which references one UBO. Change the UBO and all programs see the updated data. std140 layout: alignment rules - a vec3 is aligned to a 16-byte (vec4) boundary but occupies only 12 bytes, so a following float fills the gap (as time does after cameraPos), while a following vec3 or vec4 leaves 4 bytes of padding. Array elements are each padded to a 16-byte stride. Plan the layout carefully.
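The std140 offsets for SceneData can be computed rather than guessed. A sketch covering scalar, vector, and mat4 members only (arrays and structs have extra rules, such as the 16-byte element stride noted above); the helper names are illustrative:

```javascript
// std140 base alignment and size in bytes for a few common member types.
const STD140 = {
  float: { align: 4, size: 4 },
  vec2: { align: 8, size: 8 },
  vec3: { align: 16, size: 12 },
  vec4: { align: 16, size: 16 },
  mat4: { align: 16, size: 64 }, // four vec4 columns
};

function std140Layout(members) {
  let offset = 0;
  const offsets = {};
  for (const [name, type] of members) {
    const { align, size } = STD140[type];
    offset = Math.ceil(offset / align) * align; // round up to the member's alignment
    offsets[name] = offset;
    offset += size;
  }
  return { offsets, size: Math.ceil(offset / 16) * 16 }; // pad block to 16 bytes
}

const SCENE_DATA = [
  ["viewMatrix", "mat4"],
  ["projMatrix", "mat4"],
  ["cameraPos", "vec3"],
  ["time", "float"],
];

function packSceneData(view, proj, cameraPos, time) {
  const { offsets, size } = std140Layout(SCENE_DATA);
  const data = new Float32Array(size / 4);
  data.set(view, offsets.viewMatrix / 4);
  data.set(proj, offsets.projMatrix / 4);
  data.set(cameraPos, offsets.cameraPos / 4);
  data[offsets.time / 4] = time;
  return data; // upload with gl.bufferSubData(gl.UNIFORM_BUFFER, 0, data)
}
```

For SceneData this yields cameraPos at byte 128 and time at byte 140 (the float fills the vec3's spare 4 bytes), for a 144-byte block; two consecutive vec3 members would instead land at bytes 0 and 16.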
Performance gains (illustrative timings): (1) Batch updates - updating 100 uniforms with one gl.bufferSubData() call (~0.5ms) versus 100 individual gl.uniform*() calls (~5ms) is about a 10x speedup. (2) Reduced API overhead - fewer JavaScript-to-WebGL calls. Each gl.uniform*() call has its own overhead; a UBO amortizes it across all uniforms in the block. (3) Shader switching - after switching programs, binding a UBO (~0.1ms) is faster than re-setting individual uniforms (~1-2ms for 50+ uniforms). (4) Shared data - one UBO shared across 10 programs needs 1 update instead of 10, which is critical for scenes with many materials. Use cases: (1) Per-frame data - view/projection matrices, camera position, time, environment settings; updated once per frame and used by all shaders. (2) Material properties - 20+ PBR parameters (albedo, metallic, roughness, etc.). (3) Lighting data - an array of 100 lights in one UBO. Limitations: the maximum block size is driver-dependent (WebGL 2.0 guarantees at least 16KB, many drivers allow 64KB; query gl.MAX_UNIFORM_BLOCK_SIZE). std140 padding can waste memory. Rule of thumb: with >50 uniforms or >20 shader switches per frame, UBOs become essential; below that, their overhead is similar to individual uniforms. Three.js exposes UBOs through UniformsGroup (WebGL 2.0).
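For use case (3), ordering struct members so each vec3 is followed by a float avoids std140 padding entirely. A sketch of packing such a light array; the GLSL declaration shown in the comments, MAX_LIGHTS, and the helper names are assumptions for illustration:

```javascript
// Assumed GLSL side (MAX_LIGHTS is a hypothetical compile-time constant):
//   struct Light { vec3 position; float intensity; vec3 color; float radius; };
//   layout(std140) uniform Lights { int lightCount; Light lights[MAX_LIGHTS]; };
// Each Light is exactly 32 bytes in std140, and the array starts at byte 16
// because structs are 16-byte aligned.
const HEADER_BYTES = 16;
const LIGHT_BYTES = 32;

function maxLightsFor(maxBlockSize) {
  return Math.floor((maxBlockSize - HEADER_BYTES) / LIGHT_BYTES);
}

function packLights(lights, maxLights) {
  if (lights.length > maxLights) throw new Error("too many lights for the block");
  const buffer = new ArrayBuffer(HEADER_BYTES + maxLights * LIGHT_BYTES);
  new Int32Array(buffer, 0, 1)[0] = lights.length; // int lightCount
  const floats = new Float32Array(buffer, HEADER_BYTES);
  lights.forEach((light, i) => {
    floats.set(
      [...light.position, light.intensity, ...light.color, light.radius],
      i * (LIGHT_BYTES / 4),
    );
  });
  return buffer; // upload with gl.bufferSubData(gl.UNIFORM_BUFFER, 0, buffer)
}
```

With the WebGL 2.0 minimum block size of 16384 bytes, `maxLightsFor(16384)` gives 511 lights; query gl.MAX_UNIFORM_BLOCK_SIZE for the actual limit.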
SSAO approximates ambient occlusion in screen space from the depth (and normal) buffers, without needing any other knowledge of scene geometry. Algorithm: (1) G-buffer pass - render scene depth (or view-space positions) and normals to textures. (2) SSAO pass - for each pixel: generate sample positions in a hemisphere around the surface normal, project them to screen space, read the stored depth at the projected positions, count the samples that lie behind the stored surface (occluded), and compute an occlusion factor. (3) Blur pass - a bilateral/depth-aware blur smooths the noise. (4) Composite - multiply the ambient lighting (or scene color) by the occlusion factor, darkening crevices. Shader structure: generate a random sample kernel (32-64 samples in a hemisphere) and use a tiled 4x4 noise texture to rotate it per pixel. Key calculation (in view space; viewPosTex is assumed to hold view-space positions, since raw depth-buffer values are non-linear and cannot be compared with samplePos.z directly): float occlusion = 0.0; for (int i = 0; i < samples; i++) { vec3 samplePos = fragPos + TBN * hemisphere[i] * radius; vec4 offset = projection * vec4(samplePos, 1.0); offset.xy /= offset.w; offset.xy = offset.xy * 0.5 + 0.5; float sampleDepth = texture(viewPosTex, offset.xy).z; occlusion += (sampleDepth >= samplePos.z + bias ? 1.0 : 0.0); } occlusion = 1.0 - occlusion / float(samples);. The last statement turns the occluded-sample count into an ambient factor (1.0 = unoccluded). Parameters: radius (occlusion distance), bias (avoids self-occlusion), sample count (quality versus performance).
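The sample kernel and the 4x4 rotation noise are generated once on the CPU and uploaded as a uniform array and a small texture. A sketch in JavaScript; the seeded PRNG (mulberry32) and the lerp(0.1, 1.0, s²) scale that pulls samples toward the origin are common choices assumed here, not requirements:

```javascript
// Small seeded PRNG (mulberry32) so the kernel is reproducible between runs.
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = seed;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hemisphere kernel along +z (tangent space), denser near the origin so
// nearby occluders contribute more.
function ssaoKernel(count, rand = mulberry32(1234)) {
  const kernel = [];
  for (let i = 0; i < count; i++) {
    const x = rand() * 2 - 1, y = rand() * 2 - 1, z = rand(); // z >= 0: hemisphere
    const len = Math.hypot(x, y, z) || 1;
    const s = i / count;
    const scale = 0.1 + 0.9 * s * s;  // lerp(0.1, 1.0, s^2)
    const r = (rand() / len) * scale; // random length, at most 1
    kernel.push([x * r, y * r, z * r]);
  }
  return kernel;
}

// 4x4 tiling rotation noise: random vectors in the tangent plane (z = 0).
function ssaoNoise(rand = mulberry32(5678)) {
  return Array.from({ length: 16 }, () => [rand() * 2 - 1, rand() * 2 - 1, 0]);
}
```

The kernel can be uploaded with gl.uniform3fv(kernelLocation, kernel.flat()), where kernelLocation is the (hypothetical) uniform location of the hemisphere array; the noise vectors go into a 4x4 texture with gl.REPEAT wrapping so it tiles across the screen.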
SSAO optimizations: (1) Half-resolution rendering - compute SSAO at 50% resolution (a quarter of the pixels) and upsample bilaterally to full resolution; ~4x faster with minimal quality loss. (2) Reduce samples - 64 samples (high quality, ~8ms) → 32 (good quality, ~4ms) → 16 (acceptable quality, ~2ms) at 1080p. (3) Depth-aware blur - use depth discontinuities to preserve edges while blurring, preventing bleeding across objects. (4) Interleaved gradient noise - replace the 4x4 noise texture with an analytical noise function, saving a texture fetch. (5) Temporal filtering - accumulate samples across frames for higher quality without per-frame cost; requires motion vectors for moving objects. Performance (rough figures): naive SSAO (64 samples, full resolution, 5x5 blur): ~5-8ms at 1080p; optimized SSAO (32 samples, half resolution, depth-aware blur): ~1-2ms at 1080p. Advanced alternatives: (1) HBAO+ (Horizon-Based AO) - better quality at similar cost; it marches along screen-space directions to find horizon angles instead of testing random hemisphere samples. (2) GTAO (Ground Truth AO) - more accurate, slightly more expensive. Three.js provides SAOPass (Scalable Ambient Occlusion) and SSAOPass, which include some of these optimizations. Mobile: half resolution is essential, with 16 samples maximum.
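Optimization (4) replaces the noise texture with a few arithmetic operations. A sketch of interleaved gradient noise in JavaScript, with the GLSL one-liner in a comment; the constants are the commonly used ones, and turning the value into a kernel rotation angle is one illustrative use:

```javascript
// GLSL equivalent:
//   fract(52.9829189 * fract(dot(gl_FragCoord.xy, vec2(0.06711056, 0.00583715))))
const fract = (v) => v - Math.floor(v);

// Per-pixel value in [0, 1) from pixel coordinates, with no texture fetch.
function interleavedGradientNoise(px, py) {
  return fract(52.9829189 * fract(0.06711056 * px + 0.00583715 * py));
}

// Turn the value into a per-pixel 2D rotation (cos, sin) for the SSAO kernel.
function kernelRotation(px, py) {
  const angle = 2 * Math.PI * interleavedGradientNoise(px, py);
  return [Math.cos(angle), Math.sin(angle)];
}
```

Pass pixel centers (for example x + 0.5, y + 0.5), which is what gl_FragCoord.xy holds in the shader.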
Architecture: WebGL wraps OpenGL ES (a legacy API); WebGPU is based on modern APIs (Vulkan, Metal, Direct3D 12). Key differences: (1) Compute shaders - WebGPU has full compute shader support (general-purpose GPU); WebGL is limited to Transform Feedback (vertex-only processing) and render-to-texture tricks in fragment shaders. (2) API design - WebGL is stateful (an implicit state machine); WebGPU uses explicit command buffers and pipelines. (3) Performance overhead - WebGPU has lower CPU overhead for complex scenes (20-40% is a commonly quoted range) and was designed with multi-threading in mind. (4) Shader language - WebGL uses GLSL; WebGPU uses WGSL (WebGPU Shading Language). (5) Resource management - WebGPU makes resource usage explicit (usage flags, bind groups, buffer mapping), but unlike Vulkan or Direct3D 12 it inserts barriers and synchronization automatically; WebGL handles all of this implicitly. Browser support (2025): WebGPU ships in Chrome 113+, Edge 113+, Safari 26+ (announced June 2025), and Firefox 141+ (July 2025, Windows first). Desktop browser coverage is roughly 90%, with growing mobile support. WebGL has 98%+ compatibility (including old browsers and mobile). Complexity: WebGL is simpler, with fewer concepts; WebGPU is verbose but powerful, with a steeper learning curve. Migration: non-trivial - different paradigms and a required shader rewrite. Three.js offers WebGPURenderer, which can fall back to WebGL 2, easing the transition.
Use WebGL for: (1) Universal browser support - production apps that must work on older browsers, mobile devices, and enterprise environments. (2) Existing codebases - migration costs are rarely justified unless you hit concrete performance walls. (3) Simpler 3D scenes - if WebGL meets your performance needs, there is no reason to add complexity. (4) Quick prototypes - faster to develop, well documented, mature ecosystem. (5) Libraries/frameworks - Three.js, Babylon.js, and PlayCanvas have mature WebGL support. Use WebGPU for: (1) Compute-heavy tasks - ML inference, physics simulation, particle systems (>1M particles), procedural generation; compute shaders are often cited as 5-10x faster than Transform Feedback for such work. (2) Performance-critical apps - lower CPU overhead (20-40% is commonly quoted) allows more complex scenes at the same frame rate. (3) Next-gen graphics - compute-based ray tracing (WebGPU has no hardware ray tracing API), advanced post-processing, high-fidelity rendering. (4) 6-12 month development timelines - by deployment, WebGPU adoption will be higher. (5) Multi-threaded rendering - WebGPU was designed with parallel command recording in mind, although browser support for using one device from multiple workers is still limited. Migration path: projects starting in 2025 or later should evaluate WebGPU as the primary target with a WebGL fallback. Three.js makes this straightforward (same scene graph, swap the renderer). WebGL will likely coexist with WebGPU for 5+ years (as OpenGL has with Vulkan).
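The "WebGPU first, WebGL fallback" path can start with feature detection. A sketch; `pickBackend` is a hypothetical helper, and a real app would still need to configure the WebGPU canvas context (context.configure) and handle device loss:

```javascript
// Prefer WebGPU, then WebGL 2, then WebGL 1.
async function pickBackend(nav, canvas) {
  if (nav.gpu) {
    const adapter = await nav.gpu.requestAdapter(); // may resolve to null
    if (adapter) {
      const device = await adapter.requestDevice();
      return { kind: "webgpu", device, context: canvas.getContext("webgpu") };
    }
  }
  const gl2 = canvas.getContext("webgl2");
  if (gl2) return { kind: "webgl2", gl: gl2 };
  const gl1 = canvas.getContext("webgl");
  if (gl1) return { kind: "webgl", gl: gl1 };
  throw new Error("Neither WebGPU nor WebGL is available");
}
```

In a page this would be called as pickBackend(navigator, document.querySelector("canvas")). A canvas can hold only one kind of context, so the sketch requests the "webgpu" context only after a device has been obtained, leaving the WebGL fallback usable on the same canvas.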
Shadow mapping uses depth rendered from the light's point of view to determine shadows. Two passes: (1) Shadow map pass - render the scene from the light into a depth framebuffer (WebGL 1.0 needs the WEBGL_depth_texture extension for depth textures): gl.bindFramebuffer(gl.FRAMEBUFFER, shadowFBO); gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.DEPTH_ATTACHMENT, gl.TEXTURE_2D, depthTexture, 0); gl.viewport(0, 0, shadowMapSize, shadowMapSize); renderScene(lightViewMatrix, lightProjectionMatrix);. The depth texture stores the depth of the nearest surface as seen from the light. (2) Rendering pass - render the scene normally; for each fragment, transform its position to light clip space: vec4 lightSpacePos = lightProjectionMatrix * lightViewMatrix * worldPos; perspective divide: vec3 ndc = lightSpacePos.xyz / lightSpacePos.w; map NDC from [-1, 1] to [0, 1], including depth: vec3 shadowCoord = ndc * 0.5 + 0.5; sample the shadow map: float shadowDepth = texture(shadowMap, shadowCoord.xy).r; compare depths with a bias: float shadow = (shadowCoord.z - bias) > shadowDepth ? 0.0 : 1.0. The z remap matters because the depth texture stores values in [0, 1], not NDC. Bias prevents shadow acne (self-shadowing): too little gives acne, too much gives Peter Panning (shadows detached from objects). Use a slope-scaled bias: bias = max(0.05 * (1.0 - dot(normal, lightDir)), 0.005);. Tune the constants to depth precision and scene scale; values around 0.001-0.01 are typical starting points.
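The per-fragment test is easier to debug when it can be checked on the CPU. A sketch mirroring the shader math in JavaScript with column-major matrices (as WebGL expects); `sampleShadowMap` is a hypothetical callback standing in for the texture lookup, and treating points outside the light frustum as lit is an assumed policy:

```javascript
// Multiply a column-major 4x4 matrix by the point (x, y, z, 1).
function transformPoint(m, [x, y, z]) {
  return [
    m[0] * x + m[4] * y + m[8] * z + m[12],
    m[1] * x + m[5] * y + m[9] * z + m[13],
    m[2] * x + m[6] * y + m[10] * z + m[14],
    m[3] * x + m[7] * y + m[11] * z + m[15],
  ];
}

// Slope-scaled bias from the text: larger when the surface faces away from the light.
function slopeBias(nDotL) {
  return Math.max(0.05 * (1 - nDotL), 0.005);
}

// 1 = lit, 0 = shadowed. lightViewProj = lightProjectionMatrix * lightViewMatrix.
function shadowFactor(lightViewProj, worldPos, nDotL, sampleShadowMap) {
  const [cx, cy, cz, cw] = transformPoint(lightViewProj, worldPos);
  const ndc = [cx / cw, cy / cw, cz / cw];             // perspective divide
  const [u, v, depth] = ndc.map((c) => c * 0.5 + 0.5); // [-1, 1] -> [0, 1], including z
  if (u < 0 || u > 1 || v < 0 || v > 1 || depth > 1) return 1; // outside the light frustum
  return depth - slopeBias(nDotL) > sampleShadowMap(u, v) ? 0 : 1;
}
```

With an identity matrix, the point (0, 0, 0.5) maps to shadow-map depth 0.75, so it is shadowed whenever the stored depth at the center is below 0.745.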
Shadow artifacts and solutions: (1) Shadow acne (surface self-shadowing) - add a depth bias: float shadow = (fragDepth - bias) > shadowDepth ? 0.0 : 1.0, where fragDepth is the fragment's light-space depth in [0, 1]. Use an adaptive bias based on surface slope: bias = max(0.05 * (1.0 - dot(N, L)), 0.005). Alternative: offset positions slightly along the surface normal (normal-offset bias). (2) Peter Panning (shadows detach from objects) - the bias is too high. Reduce it and cull front faces during the shadow pass: gl.enable(gl.CULL_FACE); gl.cullFace(gl.FRONT); so only back faces are written to the shadow map. (3) Aliased edges (pixelated shadows) - increase shadow map resolution (2048×2048 → 4096×4096) and use Percentage Closer Filtering (PCF): sample a 3×3 or 5×5 grid around the shadow coordinate and average the results: float shadow = 0.0; for (int x = -1; x <= 1; x++) { for (int y = -1; y <= 1; y++) { vec2 offset = vec2(x, y) / shadowMapSize; float depth = texture(shadowMap, shadowCoord.xy + offset).r; shadow += fragDepth - bias > depth ? 0.0 : 1.0; } } shadow /= 9.0; (shadowMapSize is a float uniform holding the map's width in texels). (4) Low resolution over large areas - use cascaded shadow maps (CSM). (5) Soft shadows - use Variance Shadow Maps (VSM) or PCF with a larger kernel. Performance: PCF 5×5 takes 25 shadow map samples per fragment (roughly ~1-2ms overhead). Three.js exposes these controls as light.shadow.bias and light.shadow.radius.
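PCF generalizes the single comparison to an N×N neighborhood. A CPU sketch mirroring the GLSL loop, with `sampleShadowMap` again a hypothetical lookup callback and `radius` selecting 3×3 (radius 1) or 5×5 (radius 2):

```javascript
// Fraction of lit taps in a (2*radius+1)^2 neighborhood: 0 = fully shadowed, 1 = fully lit.
function pcfShadow(u, v, fragDepth, bias, shadowMapSize, sampleShadowMap, radius = 1) {
  let lit = 0, taps = 0;
  for (let x = -radius; x <= radius; x++) {
    for (let y = -radius; y <= radius; y++) {
      // Offset by whole texels, like vec2(x, y) / shadowMapSize in the shader.
      const depth = sampleShadowMap(u + x / shadowMapSize, v + y / shadowMapSize);
      lit += fragDepth - bias > depth ? 0 : 1;
      taps++;
    }
  }
  return lit / taps;
}
```

At a shadow edge the result lands between 0 and 1 (for example 6/9 when one column of a 3×3 kernel is occluded), which is what softens the pixelated boundary.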