The Basics of 3D in the Browser

Learn the fundamentals of browser rendering.


Introduction

A 3D rendering engine is an abstraction around a rendering API that simplifies the process of drawing meshes on the screen and simulating how light interacts with them. On the web, rendering engines are composed of:

  • A rendering pipeline built on top of a browser rendering API, which updates every frame.

  • A developer-facing API for initializing and updating meshes, materials, cameras, and lights, written in a browser-supported programming language.

  • Abstractions for connecting the output of the rendering pipeline to the browser window.

  • Abstractions for updating the scene in response to user inputs received from browser APIs.

Rendering engine design: User Input -> State Updates -> Rendering -> Output
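
For illustration, here is a minimal sketch of that flow in JavaScript; the updateState and renderScene functions are hypothetical placeholders, not part of any specific engine.

// Minimal sketch of the input -> state updates -> rendering -> output flow.
const state = { pointerX: 0, pointerY: 0 };

// Hypothetical placeholders for engine- or application-specific logic.
function updateState(state) { /* advance animations, physics, etc. */ }
function renderScene(state) { /* issue graphics API calls; output lands in the canvas */ }

// 1. User input: browser events update application state.
window.addEventListener("pointermove", (event) => {
  state.pointerX = event.clientX;
  state.pointerY = event.clientY;
});

// 2-4. State updates, rendering, and output run once per browser frame.
function frame() {
  updateState(state);
  renderScene(state);
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);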

🎨 Core Browser Technologies: WebGL and HTMLCanvas

Modern web browsers, such as Safari, Chrome, Firefox, and Edge, implement a set of standardized Graphics APIs and HTML components that collectively support rendering 3D applications in a web page.

  • The primary browser graphics API is WebGL, which is based on the native OpenGL ES graphics API. In recent years, browsers have also begun to support WebGPU, a more modern graphics API that integrates tightly with powerful native APIs like Vulkan, Metal, and DirectX 12. The purpose of a graphics API is to provide an interface between the CPU, which constructs an abstract representation of a scene, and the GPU, which renders that abstract scene representation to an image.

  • The GPU outputs the rendered image as a list of color values into a special memory allocation known as the framebuffer. In order to display the image to the user, the browser provides an HTML component called the canvas. This canvas component occupies a rectangular area of a webpage, and displays the output from the GPU. In practice, the developer constructs a canvas, and then accesses the appropriate graphics API (WebGL or WebGPU) through a reference to the canvas.
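
For example, here is a minimal sketch of accessing WebGL through a canvas reference and clearing the framebuffer; the canvas id matches the canvas example shown later on this page.

// Look up the canvas element and request a WebGL context from it.
const canvas = document.getElementById("myCanvas");
const gl = canvas.getContext("webgl");

if (!gl) {
  console.error("WebGL is not supported in this browser.");
} else {
  // CPU-side commands describe the work; the GPU executes them and writes
  // the resulting image into the framebuffer displayed by the canvas.
  gl.clearColor(0.1, 0.1, 0.1, 1.0);
  gl.clear(gl.COLOR_BUFFER_BIT);
}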

🖥️ Programming Language Support in Web Rendering Engines

The only officially supported browser language is JavaScript. To execute non-JavaScript code in the browser, it must first be compiled to a binary format known as WebAssembly, which a virtual machine in the browser can then execute directly.
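
As a rough sketch, loading a compiled module from JavaScript looks something like this; the engine.wasm file name is hypothetical.

// Fetch and instantiate a WebAssembly module; a real engine build would also
// pass an import object exposing the JavaScript glue functions it needs.
WebAssembly.instantiateStreaming(fetch("engine.wasm"), {})
  .then(({ instance }) => {
    // Functions compiled from C/C++/Rust are now callable from JavaScript.
    console.log(Object.keys(instance.exports));
  });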

There are two main approaches to building rendering engines for the web:

  • Compiling an existing engine written in a low-level language (e.g. C, C++, Rust) to WebAssembly. This approach is taken by engines like Unity, Godot, and Bevy.

  • Writing the engine entirely in JavaScript. This approach is used by engines like PlayCanvas, Three.js, and Babylon.js.

Each approach has certain tradeoffs:

Browser Compatibility

  • WebAssembly: Browser APIs are generally not available to the WebAssembly runtime; to interface with the browser, WebAssembly code must call a JavaScript function that then calls the corresponding browser API. Sending commands and data across this interop layer can be expensive, negating some of the benefits of the theoretically faster low-level WebAssembly code.

  • JavaScript: JavaScript can call all browser APIs directly.

Performance Ceiling

  • WebAssembly: WebAssembly runs at near-native speed, enabling a very high performance ceiling.

  • JavaScript: While modern JavaScript engines like V8 and JavaScriptCore are capable of running graphics applications, low-level languages like C++ and Rust have an edge in raw performance. In particular, JavaScript is single-threaded, uses automatic memory management, and cannot leverage Single Instruction, Multiple Data (SIMD) parallelization in the browser, putting a theoretical ceiling on its performance. Some multi-threading capability is available via the Web Worker API.

Debugging

  • WebAssembly: WebAssembly debugging support is lacking across major browsers. Developers often need to rely on debugging tools provided by the engine, making it hard to fix web-specific bugs.

  • JavaScript: Profiling and debugging tools are built into the browser, making it relatively easy to identify the root cause of performance problems and bugs.

Deployment

  • WebAssembly: WebAssembly requires an additional compilation step in the deployment process, and the entire engine must be compiled and shipped with the build.

  • JavaScript: JavaScript applications can be deployed directly. Developers can also reduce bundle sizes by stripping unused code and minifying the JavaScript.
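
To illustrate the multi-threading note above, here is a minimal sketch of offloading work to a Web Worker; the physics-worker.js file name and the message shape are hypothetical.

// main.js: spawn a worker thread and hand it a chunk of work.
const worker = new Worker("physics-worker.js");
worker.postMessage({ positions: new Float32Array(3000) });

// The worker posts its result back asynchronously; apply it on the next frame.
worker.onmessage = (event) => {
  console.log("worker finished", event.data);
};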

The choice of engine is not as clear-cut as performance vs. usability. Given the complex nature of rendering engines, it is very difficult to build comprehensive performance benchmarks proving one engine is faster than another. It is more important to choose an engine based on the type of experience you want to build and your strengths as a developer.

🏗️ Building a Rendering Pipeline with WebGL

Graphics APIs like WebGL and WebGPU are used to construct a rendering pipeline, a program that defines the steps that the underlying native Graphics API (OpenGL, Vulkan, Metal, DirectX) must take to render an object to the framebuffer.

Here is a code sample of a rendering pipeline implemented in WebGL that draws a single triangle to the screen. Below, we describe how WebGL and the canvas interact with the GPU to draw the triangle.

1. Creating the Canvas

The output of the rendering pipeline is rendered directly into the browser via the Canvas API:

<canvas id="myCanvas" width="400" height="400">
    The Rendering Canvas (Alt Text)
</canvas>

2. Setting Up WebGL State in JavaScript

Before anything happens on the GPU, the developer must set up the pipeline in JavaScript. This consists of:

  • Setting up shaders, small functions that run on the GPU in parallel to process each triangle and pixel we want to render.

  • Allocating vertex buffers, which contain data for each vertex, a point in 3D space that makes up a mesh.

  • Describing the layout of the vertex buffers, since a vertex buffer can contain position data, color data, and other attributes, laid out in any fashion.

After this state is set up, the code submits the frame to the GPU, where the rendering pipeline begins.
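
A minimal sketch of this setup, assuming the gl context from the earlier canvas example and the shader source strings shown in the vertex and fragment processing steps below:

// CPU-side pipeline setup: compile shaders, allocate a vertex buffer, and
// describe its layout. The shader sources are plain JavaScript strings.
function setUpPipeline(gl, vertexShaderSource, fragmentShaderSource) {
  // 1. Set up shaders: compile and link the two small GPU programs.
  const vertexShader = gl.createShader(gl.VERTEX_SHADER);
  gl.shaderSource(vertexShader, vertexShaderSource);
  gl.compileShader(vertexShader);

  const fragmentShader = gl.createShader(gl.FRAGMENT_SHADER);
  gl.shaderSource(fragmentShader, fragmentShaderSource);
  gl.compileShader(fragmentShader);

  const program = gl.createProgram();
  gl.attachShader(program, vertexShader);
  gl.attachShader(program, fragmentShader);
  gl.linkProgram(program);
  gl.useProgram(program);

  // 2. Allocate a vertex buffer: three 2D positions forming one triangle.
  const positions = new Float32Array([0.0, 0.5, -0.5, -0.5, 0.5, -0.5]);
  const buffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
  gl.bufferData(gl.ARRAY_BUFFER, positions, gl.STATIC_DRAW);

  // 3. Describe the layout: each vertex is two 32-bit floats, tightly packed.
  const positionLocation = gl.getAttribLocation(program, "a_position");
  gl.enableVertexAttribArray(positionLocation);
  gl.vertexAttribPointer(positionLocation, 2, gl.FLOAT, false, 0, 0);
}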

3. Vertex Processing

3D scenes consist of many vertices in 3D space, which make up lines and triangles, which in turn compose into more complex shapes. In the first step of the rendering pipeline, each vertex in an array of vertices is processed by a "vertex shader", a small program that determines the position and color of a particular vertex given some input properties.
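
A minimal sketch of a vertex shader, written in GLSL and stored as a JavaScript string so it can be passed to gl.shaderSource() during setup:

// The simplest possible vertex shader: it forwards each vertex's 2D position.
// A real engine would also apply model, view, and projection transforms here.
const vertexShaderSource = `
  attribute vec2 a_position;

  void main() {
    // gl_Position is the vertex's final position in clip space.
    gl_Position = vec4(a_position, 0.0, 1.0);
  }
`;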

4. Primitive Assembly and Clipping

Each vertex may be part of one or more primitives (point, line, or triangle) in the scene. At this stage, the pipeline generates an array of primitives from the vertices and clips all primitives that extend outside of the camera view.
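
In WebGL, the primitive type is chosen when the draw call is submitted; here is a minimal sketch, assuming the three-vertex buffer from the setup sketch above.

// Assemble the 3 vertices in the bound buffer into a single triangle.
gl.drawArrays(gl.TRIANGLES, 0, 3);
// gl.LINES would instead pair vertices into line segments,
// and gl.POINTS would treat each vertex as a standalone point.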

5. Rasterization

The processed primitives are then rasterized, or converted into a sequence of fragments. Whereas primitives represent a shape in 3D space, fragments represent the projection of 3D shapes into 2D space. Imagine taking a picture on a digital camera; the 3D scene you are capturing is recorded by a series of sensors, which record light information from 3D space. A fragment loosely corresponds to one of these sensors; it consists of 2D position data for a point and data interpolated from the vertices that contribute to that point. This fragment data is output to the next stage.

6. Fragment Processing

Each fragment is processed by a fragment shader, which determines the final color written to the corresponding pixel in the framebuffer.
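
A minimal sketch of a fragment shader, again stored as a JavaScript string for gl.shaderSource():

// Color every fragment of the triangle solid red. A real engine would sample
// textures and apply lighting calculations here instead.
const fragmentShaderSource = `
  precision mediump float;

  void main() {
    // gl_FragColor is the RGBA color written to the framebuffer.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
`;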

7. Canvas Output

Finally, the framebuffer is drawn onto the canvas in the browser window. View a complete example of the rendering pipeline in this codepen. You can look at the exact JavaScript and HTML used to render a triangle to the screen.
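
Putting the earlier sketches together, one frame of this minimal pipeline looks something like the following; setUpPipeline and the shader source strings are the hypothetical helpers defined in the previous steps.

// Configure shaders, buffers, and vertex layout, then draw. WebGL has no
// explicit "present" call: once the frame's work finishes, the browser
// composites the default framebuffer into the canvas on the page.
setUpPipeline(gl, vertexShaderSource, fragmentShaderSource);
gl.clearColor(0.0, 0.0, 0.0, 1.0);
gl.clear(gl.COLOR_BUFFER_BIT);
gl.drawArrays(gl.TRIANGLES, 0, 3);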

📸 Engine Abstractions: Meshes, Materials, Cameras, and Lights

Rendering API code is often regarded as verbose and unapproachable, so some abstractions are built on top to make things easier:

  • At the highest level of abstraction is the scene, a tree-like data structure that defines the hierarchy of elements that are rendered to the screen.

  • The scene is composed of meshes, collections of triangles that form a shape and have a position, scale, and rotation in the world.

  • Each mesh has a material, which defines how the mesh responds to lights in the scene. A material may render a solid color, apply a texture, or imitate real-life material properties like wood or skin.

  • A camera renders the scene from a particular perspective, which is then output to the browser window via the Canvas API.

We might construct a scene like this:

// Example of initializing a scene and starting the render loop
function runApp() {
  const scene = engine.createScene();

  const camera = scene.createCamera();
  camera.position = { x: 0, y: 4, z: -15 };
  camera.lookAt({ x: 0, y: 0, z: 0 });

  const light = scene.createAreaLight();
  light.color = { r: 1, g: 1, b: 1 };

  const basicMaterial = scene.createStandardMaterial();
  basicMaterial.color = { r: 1, g: 0, b: 0 };

  const box = scene.createBox();
  box.material = basicMaterial;

  const boxInstance1 = box.createInstance();
  boxInstance1.position = { x: 1, y: 1, z: 0 };
  boxInstance1.scale = { x: 2, y: 1, z: 0 };

  const boxInstance2 = box.createInstance();
  boxInstance2.position = { x: -1, y: -1, z: 0 };

  // requestAnimationFrame is a browser API that fires a callback before the
  // next browser frame; the callback re-registers itself to keep the loop running
  function frame() {
    // Re-render the scene every frame
    render(scene);
    requestAnimationFrame(frame);
  }
  requestAnimationFrame(frame);
}

In an engine like Babylon.js, we can create a similar scene in a playground and produce an equivalent result.

Behind the scenes, the engine will compose a rendering pipeline from these instances, lights, and camera in the render function. In most engines, the render function will look something like this:

// Heavily-abstracted render loop. This runs every frame.
function render(scene) {
  for (let material of scene.materials) {
    // Initialize Shaders using gl.createShader(), gl.compileShader(), etc.
    InitializeShadersAndTextures(material, scene.camera, scene.lights);

    for (let mesh of scene.meshes.filter(
      (mesh) => mesh.material === material
    )) {
      // Create a WebGL buffer for the mesh vertices using gl.createBuffer()
      InitializeMeshBuffers(mesh);

      for (let instance of mesh.instances.values()) {
        // For each instance of the mesh, set the appropriate color and position
        SetInstanceValues(instance);

        // Render the instance to the screen using gl.drawArrays()
        Draw();
      }
    }
  }
}

👩‍💻 Updating the Scene in Response to User Inputs in the Browser

Generally, users need to interact with the program in some way. The browser exposes several APIs that provide controlled access to mouse, keyboard, touch, gamepad, and XR input events:

Touch

  • API: Touch Events API

  • Platform support: Mobile; Chrome and Edge on desktop

Keyboard/Mouse

  • API: Keyboard and mouse events (e.g. keydown, mousemove, click)

  • Platform support: Mobile, Desktop

Gamepad

  • API: Gamepad API, which detects when controllers are connected and determines the controller layout

  • Platform support: Mobile, Desktop

XR Headset and Inputs

  • API: WebXR API, which provides abstractions for handling headset movement and hand/gamepad inputs

  • Platform support: XR

Pointer

  • API: Pointer Events API, which treats mouse, touch, and pen events as generic "pointer" events, simplifying responses to basic click and hover events

  • Platform support: Mobile, Desktop, XR
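
Here is a minimal sketch of wiring a few of these APIs into application state; it assumes the canvas variable from the earlier WebGL example, and the inputState shape is hypothetical.

// Collect input into a plain state object that the render loop can read.
const inputState = { pointerDown: false, keys: new Set() };

// Pointer Events cover mouse, touch, and pen with a single set of listeners.
canvas.addEventListener("pointerdown", () => { inputState.pointerDown = true; });
canvas.addEventListener("pointerup", () => { inputState.pointerDown = false; });

// Keyboard events arrive on the window (or any focused element).
window.addEventListener("keydown", (event) => inputState.keys.add(event.code));
window.addEventListener("keyup", (event) => inputState.keys.delete(event.code));

// Gamepads are polled each frame rather than delivered as events.
function pollGamepads() {
  for (const pad of navigator.getGamepads()) {
    if (pad) {
      // e.g. read pad.axes and pad.buttons to drive camera or character movement
    }
  }
}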

📚 More Learning Resources

General WebGL and WebGPU tutorials:
