The Basics of 3D in the Browser
Learn the fundamentals of browser rendering.
Introduction
3D rendering engines are abstractions around rendering APIs that simplify the process of drawing meshes on the screen and simulating how those meshes are lit. On the web, rendering engines are composed of:
A rendering pipeline built on top of a browser rendering API, which updates every frame.
A developer-facing API for initializing and updating meshes, materials, cameras, and lights, written in a browser-supported programming language.
Abstractions for connecting the output of the rendering pipeline to the browser window.
Abstractions for updating the scene in response to user inputs received from browser APIs.

🎨 Core Browser Technologies: WebGL and the HTML Canvas
The primary browser graphics API is WebGL, which is based on the native OpenGL ES graphics API. In recent years, browsers have also begun to support WebGPU, a more modern graphics API that maps closely onto powerful native APIs like Vulkan, Metal, and DirectX 12. The purpose of a graphics API is to provide an interface between the CPU, which constructs an abstract representation of a scene, and the GPU, which renders that abstract representation to an image.
The GPU outputs the rendered image as a list of color values into a special memory allocation known as the framebuffer. To display the image to the user, the browser provides an HTML element called the canvas. The canvas occupies a rectangular area of a webpage and displays the output from the GPU. In practice, the developer constructs a canvas and then accesses the appropriate graphics API (WebGL or WebGPU) through a reference to that canvas.
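As a minimal sketch, obtaining a rendering context from a canvas might look like the following; the canvas id "view" is an assumption, and error handling is kept to a bare minimum:

// Grab the canvas element and request a WebGL context from it.
const canvas = document.getElementById("view"); // assumed canvas id
// Prefer WebGL2 and fall back to WebGL1 where it is unavailable.
const gl = canvas.getContext("webgl2") || canvas.getContext("webgl");
if (!gl) {
  throw new Error("WebGL is not supported in this browser");
}
// All subsequent rendering commands are issued through this context object,
// and their output appears in the canvas's rectangle on the page.
gl.clearColor(0.1, 0.1, 0.1, 1.0);
gl.clear(gl.COLOR_BUFFER_BIT);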
🖥️ Programming Language Support in Web Rendering Engines
There are two main approaches to building rendering engines for the web:
Compiling an existing engine written in a low-level language (e.g., C, C++, or Rust) to WebAssembly. This approach is taken by engines like Unity, Godot, and Bevy.
Writing the engine entirely in JavaScript. This approach is used by engines like PlayCanvas, Three.js, and Babylon.js.
Each approach has certain tradeoffs:
Browser Compatibility
WebAssembly: Browser APIs are generally not available to the WebAssembly runtime; to interface with the browser, WebAssembly code must call a JavaScript function that then calls the corresponding browser API. Sending commands and data over this reflection layer can be expensive, negating some of the benefits of the theoretically faster low-level WebAssembly code (see the sketch after this comparison).
JavaScript: JavaScript code can call all browser APIs directly.
Performance Ceiling
WebAssembly: WebAssembly runs at near-native speeds, enabling a very high performance ceiling.
JavaScript: While modern JavaScript engines like V8 and JavaScriptCore are capable of running graphics applications, low-level languages like C++ and Rust have an edge in raw performance. In particular, JavaScript is single-threaded, uses automatic memory management, and cannot leverage Single Instruction, Multiple Data (SIMD) parallelization in the browser, putting a theoretical ceiling on performance. Some multi-threading capability is available in the browser via the Web Worker API.
Debugging
WebAssembly: WebAssembly debugging support is lacking across major browsers, so developers often need to rely on debugging tools provided by the engine, making it hard to fix web-specific bugs.
JavaScript: Profiling and debugging tools are built into the browser, making it relatively easy to identify the root cause of performance problems and bugs.
Deployment
WebAssembly: WebAssembly requires an additional compilation step in the deployment process, and the entire engine must be compiled and shipped as part of the build.
JavaScript: JavaScript applications can be deployed directly. Developers can also reduce bundle sizes by stripping unused code and minifying the JavaScript code.
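To make the reflection layer concrete, here is a minimal, hypothetical sketch of the JavaScript glue a WebAssembly engine needs in order to reach WebGL. The module name, import and export names, and the engine.wasm file are assumptions; in practice, toolchains like Emscripten or wasm-bindgen generate this glue automatically.

// Minimal sketch of the JavaScript "glue" between a WebAssembly engine and WebGL.
const canvas = document.getElementById("view"); // assumed canvas element
const gl = canvas.getContext("webgl2");

const imports = {
  env: {
    // The WebAssembly module cannot call WebGL itself; it calls this imported
    // JavaScript function, which forwards the command to the browser API.
    js_draw_triangles: (first, count) => gl.drawArrays(gl.TRIANGLES, first, count),
  },
};

// Download, compile, and instantiate the engine, wiring in the glue functions.
WebAssembly.instantiateStreaming(fetch("engine.wasm"), imports).then(({ instance }) => {
  instance.exports.run_frame(); // hypothetical export that renders one frame
});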
The choice of engine is not as clear-cut as performance vs. usability. Given the complex nature of rendering engines, it is very difficult to build comprehensive performance benchmarks proving one engine is faster than another. It is more important to choose an engine based on the type of experience you want to build and your strengths as a developer.
🏗️ Building a Rendering Pipeline with WebGL
Graphics APIs like WebGL and WebGPU are used to construct a rendering pipeline, a program that defines the steps the underlying native graphics API (OpenGL, Vulkan, Metal, DirectX) must take to render an object to the framebuffer.
Below, we describe how WebGL and the canvas interact with the GPU to draw a single triangle onto the screen; a complete, runnable code sample is linked at the end of this section.
Setting Up WebGL State In JavaScript
Before anything happens on the GPU, the developer must set up the pipeline in JavaScript. This consists of:
Setting up shaders, small programs that run in parallel on the GPU to process each vertex and pixel we want to render.
Allocating vertex buffers, which contain data for each vertex, a point in 3D space that makes up a mesh.
Describing the layout of the vertex buffers, since a buffer can contain position data, color data, and more, laid out in any fashion.
After this state is set up, the code submits the frame to the GPU, where the rendering pipeline begins.
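As a rough sketch of this setup, assuming a canvas with id "view" and omitting error handling, the JavaScript for drawing a single solid-colored triangle might look like this:

const canvas = document.getElementById("view"); // assumed canvas element
const gl = canvas.getContext("webgl2");

// 1. Set up shaders: compile a vertex and a fragment shader, then link them
//    into a single GPU program.
const vertexSource = `#version 300 es
in vec2 position;
void main() { gl_Position = vec4(position, 0.0, 1.0); }`;
const fragmentSource = `#version 300 es
precision mediump float;
out vec4 outColor;
void main() { outColor = vec4(1.0, 0.5, 0.0, 1.0); }`;

function compile(type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  return shader;
}
const program = gl.createProgram();
gl.attachShader(program, compile(gl.VERTEX_SHADER, vertexSource));
gl.attachShader(program, compile(gl.FRAGMENT_SHADER, fragmentSource));
gl.linkProgram(program);

// 2. Allocate a vertex buffer holding one (x, y) position per vertex.
const vertices = new Float32Array([0.0, 0.5, -0.5, -0.5, 0.5, -0.5]);
const buffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
gl.bufferData(gl.ARRAY_BUFFER, vertices, gl.STATIC_DRAW);

// 3. Describe the buffer layout: each vertex is two 32-bit floats, read as
//    the shader's "position" attribute.
const positionLoc = gl.getAttribLocation(program, "position");
gl.enableVertexAttribArray(positionLoc);
gl.vertexAttribPointer(positionLoc, 2, gl.FLOAT, false, 0, 0);

// 4. Submit the work to the GPU: the draw call kicks off vertex processing,
//    rasterization, and fragment shading for these three vertices.
gl.useProgram(program);
gl.drawArrays(gl.TRIANGLES, 0, 3);

The numbered comments map onto the setup steps above, followed by the final frame submission.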
Vertex Processing
3D scenes consist of many vertices in 3D space, which make up lines and triangles, which in turn combine into more complex shapes. In the first step of the rendering pipeline, each vertex in an array of vertices is processed by a "vertex shader", a small program that determines the position and color of a particular vertex given some input properties. The processed vertices are then assembled into primitives (points, lines, and triangles), which are handed to the next stage.
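As an illustration (extending the triangle sketch above with per-vertex color and a camera transform), a vertex shader embedded in JavaScript as a GLSL string might look like this; the attribute and uniform names are arbitrary:

// Hypothetical vertex shader: runs once per vertex, in parallel on the GPU.
const vertexSource = `#version 300 es
in vec3 position;                  // per-vertex input read from the vertex buffer
in vec3 color;                     // per-vertex color, also from the buffer
uniform mat4 modelViewProjection;  // one value shared by every vertex in the draw call
out vec3 vColor;                   // output passed along to rasterization

void main() {
  vColor = color;
  // Project the 3D position into the 2D space of the screen.
  gl_Position = modelViewProjection * vec4(position, 1.0);
}`;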
Rasterization
The processed primitives are then rasterized, or converted into a sequence of fragments. Whereas primitives represent a shape in 3D space, fragments represent the projection of 3D shapes into 2D space. Imagine taking a picture on a digital camera; the 3D scene you are capturing is recorded by a grid of sensors, which record light information from 3D space. A fragment loosely corresponds to one of these sensors; it consists of 2D position data for a point and data interpolated from the vertices that contribute to that point. This fragment data is then passed to a "fragment shader", another small GPU program that computes each fragment's final color before it is written to the framebuffer.
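Continuing the illustration above, a matching fragment shader might look like this; it receives the vertex shader's vColor output already interpolated across the triangle's surface:

// Hypothetical fragment shader: runs once per fragment produced by rasterization.
const fragmentSource = `#version 300 es
precision mediump float;

in vec3 vColor;     // interpolated from the three vertices covering this fragment
out vec4 outColor;  // the color this fragment contributes to the framebuffer

void main() {
  outColor = vec4(vColor, 1.0);
}`;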
Canvas Output
Finally, the framebuffer is drawn onto the canvas in the browser window. View a complete example of the rendering pipeline in this codepen. You can look at the exact JavaScript and HTML used to render a triangle to the screen.
📸 Engine Abstractions: Meshes, Materials, Cameras, and Lights
Rendering API code is often regarded as verbose and unapproachable, so rendering engines build a set of abstractions on top of it to make things easier:
At the highest level of abstraction is the scene, a tree-like data structure that defines the hierarchy of elements that are rendered to the screen.
The scene is composed of meshes, collections of triangles that form a shape and have some position, scale, and rotation in the world.
Each mesh has a material, which defines how the mesh responds to lights in the scene. A material may render a solid color, apply a texture, or imitate real-life material properties like wood or skin.
Lights illuminate the meshes in the scene; each light has a color and an intensity, and, depending on its type, a position and a direction.
A camera renders the scene from a particular perspective, which is then output to the browser window via the canvas.
We might construct a scene like this:
// Example of initializing a scene and starting the render loop
function runApp() {
  const scene = engine.createScene();

  const camera = scene.createCamera();
  camera.position = { x: 0, y: 4, z: -15 };
  camera.lookAt({ x: 0, y: 0, z: 0 });

  const light = scene.createAreaLight();
  light.color = { r: 1, g: 1, b: 1 };

  const basicMaterial = scene.createStandardMaterial();
  basicMaterial.color = { r: 1, g: 0, b: 0 };

  const box = scene.createBox();
  box.material = basicMaterial;

  const boxInstance1 = box.createInstance();
  boxInstance1.position = { x: 1, y: 1, z: 0 };
  boxInstance1.scale = { x: 2, y: 1, z: 1 };

  const boxInstance2 = box.createInstance();
  boxInstance2.position = { x: -1, y: -1, z: 0 };

  // requestAnimationFrame is a browser API that fires a callback before the next browser frame
  function frame() {
    // Re-render the scene every frame, then schedule the next frame
    render(scene);
    requestAnimationFrame(frame);
  }
  requestAnimationFrame(frame);
}
In an engine like Babylon.js, we can create a similar scene in a playground and see the rendered result directly in the browser.

Behind the scenes, the engine will compose a rendering pipeline from these instances, lights, and camera in the render function. In most engines, the render function will look something like this:
// Heavily-abstracted render loop. This runs every frame.
function render(scene) {
  for (let material of scene.materials) {
    // Initialize shaders using gl.createShader(), gl.compileShader(), etc.
    InitializeShadersAndTextures(material, scene.camera, scene.lights);
    for (let mesh of scene.meshes.filter((mesh) => mesh.material === material)) {
      // Create a WebGL buffer for the mesh vertices using gl.createBuffer()
      InitializeMeshBuffers(mesh);
      for (let instance of mesh.instances.values()) {
        // For each instance of the mesh, set the appropriate color and position
        SetInstanceValues(instance);
        // Render the instance to the screen using gl.drawArrays()
        Draw();
      }
    }
  }
}
👩‍💻 Updating the Scene in Response to User Inputs in the Browser
Generally, users need to interact with the program in some way. The browser exposes a few APIs that provide controlled access to mouse, keyboard, touch, and gamepad events; a minimal sketch of wiring these events into the render loop follows the list below.
Keyboard/Mouse (Mobile, Desktop)
KeyboardEvent API: Responds to keystrokes
MouseEvent API: Responds to mouse clicks and mouse movements
Gamepad (Mobile, Desktop)
Gamepad API: Detects when controllers are connected and determines the controller layout
XR Headset and Inputs (XR)
WebXR API: Provides abstractions for handling headset movement and hand/gamepad inputs
Pointer (Mobile, Desktop, XR)
Pointer Events API: Treats mouse, touch, and pen events as generic "pointer" events, which simplifies responding to basic click and hover events
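Regardless of which engine is used, input handling usually amounts to recording event state in listeners and reading that state inside the render loop. The sketch below assumes the scene, camera, and render function from the earlier example; the camera properties and movement constants are arbitrary:

// Minimal sketch of wiring browser input events into a render loop.
const pressedKeys = new Set();

// Record which keys are currently held down.
window.addEventListener("keydown", (event) => pressedKeys.add(event.code));
window.addEventListener("keyup", (event) => pressedKeys.delete(event.code));

// Pointer events cover mouse, touch, and pen input with a single handler.
window.addEventListener("pointermove", (event) => {
  camera.rotation.y += event.movementX * 0.002; // hypothetical camera property
});

function frame() {
  // Apply the current input state before rendering the next frame.
  if (pressedKeys.has("KeyW")) camera.position.z += 0.1;
  if (pressedKeys.has("KeyS")) camera.position.z -= 0.1;
  render(scene);
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);

Most engines wrap these raw browser events in their own input or observer abstractions, but under the hood they rely on the same listeners.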