Memory Management

Introduction

Low-level languages like C have manual memory management primitives such as malloc() and free(). In contrast, JavaScript allocates memory when values (objects, strings, etc.) are created and "automatically" frees it when they are no longer used. The latter process is called garbage collection. The word "automatically" is a source of confusion: it gives JavaScript developers (and developers in other high-level languages) the impression that they can ignore memory management. This is a mistake.

Memory life cycle

Regardless of the programming language, the memory life cycle is almost always the same:

  1. Allocate the memory you need
  2. Use the allocated memory (read, write)
  3. Release the allocated memory when it is not needed anymore

The second part is explicit in all languages. The first and last parts are explicit in low-level languages, but are mostly implicit in high-level languages like JavaScript.
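As a rough illustration, the three phases might look like this in JavaScript. The variable names are purely illustrative; only the second phase is visible in the code, and the comments mark where the engine allocates and where memory merely becomes eligible for release.

```javascript
// 1. Allocation: implicit, happens when the array is created.
let buffer = [1, 2, 3];

// 2. Use: explicit reads and writes.
buffer[0] = 10;                                   // write
const total = buffer[0] + buffer[1] + buffer[2];  // read

// 3. Release: implicit. Dropping the last reference only makes the
//    array eligible for collection; when it is actually freed is up
//    to the engine.
buffer = null;
```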

Allocation in JavaScript

Value initialization

To spare the programmer from handling allocation explicitly, JavaScript allocates memory when values are declared.

var n = 123; // allocates memory for a number
var s = 'azerty'; // allocates memory for a string 
var o = {
  a: 1,
  b: null
}; // allocates memory for an object and the values it contains
// As with objects, this allocates memory for the array
// and the values it contains
var a = [1, null, 'abra'];
function f(a) {
  return a + 2;
} // allocates a function (which is a callable object)
// function expressions also allocate an object
someElement.addEventListener('click', function() {
  someElement.style.backgroundColor = 'blue';
}, false);

Allocation via function calls

Some function calls result in object allocation.

var d = new Date(); // allocates a Date object
var e = document.createElement('div'); // allocates a DOM element

Some methods allocate new values or objects:

var s = 'azerty';
var s2 = s.substring(0, 3); // s2 is a new string
// Since strings are immutable values,
// JavaScript may decide not to allocate memory
// and just store the [0, 3] range instead.
var a = ['ouais ouais', 'nan nan'];
var a2 = ['generation', 'nan nan'];
var a3 = a.concat(a2);
// a3 is a new array with 4 elements:
// the concatenation of the elements of a and a2

Using values

Using values basically means reading from and writing to allocated memory. This happens when reading or writing the value of a variable or an object property, or when passing an argument to a function.

Release when the memory is not needed anymore

Most memory management issues arise in this phase. The hardest task is determining when allocated memory is no longer needed. It often requires the developer to determine where in the program a given piece of memory is no longer needed and to free it.

High-level languages embed a piece of software called a "garbage collector", whose job is to track memory allocation and use in order to detect when a piece of allocated memory is no longer needed, in which case it automatically frees it. This process is an approximation, since the general problem of knowing whether some piece of memory is needed is undecidable (it cannot be solved by an algorithm).

Garbage collection

As stated above, the general problem of automatically finding whether some memory "is not needed anymore" is undecidable. As a consequence, garbage collectors implement a restricted solution to the general problem. This section explains the notions needed to understand the main garbage collection algorithms and their limitations.

References

The main notion garbage collection algorithms rely on is that of a reference. Within the context of memory management, an object is said to reference another object if the former has access to the latter (either implicitly or explicitly). For instance, a JavaScript object has a reference to its prototype (implicit reference) and to its property values (explicit reference).

In this context, the notion of "object" is extended to something broader than regular JavaScript objects and also contains function scopes (or the global lexical scope).
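The kinds of reference described above can be sketched as follows. All names here (`proto`, `child`, `makeCounter`) are made up for illustration; the snippet only shows where the references exist, not how an engine stores them.

```javascript
const proto = { greet() { return 'hi'; } };

// Implicit reference: child -> proto (through the prototype chain).
const child = Object.create(proto);

// Explicit reference: child -> { size: 1 } (through a property value).
child.data = { size: 1 };

// A function scope also counts as an "object" in this broader sense:
// the returned closure references the scope that holds `count`.
function makeCounter() {
  let count = 0;
  return function () {
    count += 1;
    return count;
  };
}

const next = makeCounter();
const first = next();
const second = next();
```

As long as `next` is reachable, the scope holding `count` is referenced too, which is why `count` keeps its value between calls.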

Reference-counting garbage collection

This is the most naive garbage collection algorithm. It reduces the definition of "an object is not needed anymore" to "no other object references this object". An object is considered garbage-collectable if there are zero references pointing to it.

Example

var o = { 
  a: {
    b: 2
  }
}; 
// Two objects are created. One is referenced by the other as one of its properties.
// The other is referenced by virtue of being assigned to the 'o' variable.
// Obviously, neither can be garbage-collected.
var o2 = o; // the 'o2' variable is the second thing that
            // has a reference to the object
o = 1;      // now the object that was originally in 'o' has a single reference,
            // held by the 'o2' variable
var oa = o2.a; // reference to the 'a' property of the object.
               // This inner object now has 2 references: one as a property,
               // the other as the 'oa' variable
o2 = 'yo'; // The object that was originally in 'o' now has zero
           // references to it. It can be garbage-collected.
           // However, what was its 'a' property is still referenced by
           // the 'oa' variable, so it cannot be freed
oa = null; // what was the 'a' property of the object originally in 'o'
           // now has zero references to it. It can be garbage-collected.
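JavaScript engines do not expose reference counts, so the bookkeeping in the example above cannot be observed directly. The following toy model replays the same steps with explicit counters. Every name in it (`makeCell`, `addRef`, `release`, `link`, `freed`) is hypothetical, and it is a sketch of the idea rather than a description of how any engine works.

```javascript
// Toy model: each "cell" records how many references point at it.
const freed = [];

function makeCell(name) {
  return { name, refCount: 0, children: [] };
}

function addRef(cell) {
  cell.refCount += 1;
  return cell;
}

function release(cell) {
  cell.refCount -= 1;
  if (cell.refCount === 0) {
    freed.push(cell.name);
    // Freeing a cell also drops the references it was holding.
    const children = cell.children;
    cell.children = [];
    children.forEach((child) => release(child));
  }
}

function link(parent, child) {
  parent.children.push(addRef(child));
}

// Replay the example: an outer object whose 'a' property holds an inner one.
const outer = makeCell('outer');
const inner = makeCell('inner');
addRef(outer);       // var o = { a: { b: 2 } }
link(outer, inner);  // the 'a' property references the inner object
addRef(outer);       // var o2 = o
release(outer);      // o = 1
addRef(inner);       // var oa = o2.a
release(outer);      // o2 = 'yo': outer reaches zero and is freed
const freedAfterOuter = freed.slice();
release(inner);      // oa = null: inner reaches zero and is freed
```

In this model the outer cell is freed first, while the inner one survives until its last reference (`oa` in the original example) is released.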

Limitation: cycles

There is a limitation when it comes to cycles. In the following example two objects are created and reference one another, thus creating a cycle. They will go out of scope after the function call, so they are effectively useless and could be freed. However, the reference-counting algorithm considers that since each of the two objects is referenced at least once, neither can be garbage-collected.

function f() {
  var o = {};
  var o2 = {};
  o.a = o2; // o references o2
  o2.a = o; // o2 references o
  return 'azerty';
}
f();
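To see why the counts never reach zero, the bookkeeping for the call above can be replayed by hand. The `refCount` fields below are a made-up model of what a reference-counting collector would track; they are not something JavaScript exposes.

```javascript
// Hypothetical counters standing in for the collector's bookkeeping.
const o = { refCount: 0 };
const o2 = { refCount: 0 };

// Entering f(): each local variable holds one reference.
o.refCount += 1;   // var o = {}
o2.refCount += 1;  // var o2 = {}

// The cycle adds one more reference to each object.
o2.refCount += 1;  // o.a = o2
o.refCount += 1;   // o2.a = o

// Leaving f(): the local variables disappear.
o.refCount -= 1;
o2.refCount -= 1;

// Both counts are stuck at 1, so a pure reference counter frees neither.
const collectable = [o, o2].filter((cell) => cell.refCount === 0);
```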

Real-life example

Internet Explorer 6 and 7 are known to have reference-counting garbage collectors for DOM objects. In these browsers, cycles are a common cause of memory leaks:

var div;
window.onload = function() {
  div = document.getElementById('myDivElement');
  div.circularReference = div;
  div.lotsOfData = new Array(10000).join('*');
};

In the above example, the DOM element "myDivElement" has a circular reference to itself in the "circularReference" property. If the property is not explicitly removed or nulled, a reference-counting garbage collector will always see at least one remaining reference and will keep the DOM element in memory even after it has been removed from the DOM tree. If the DOM element holds a lot of data (illustrated in the above example with the "lotsOfData" property), the memory consumed by this data will never be released.
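Under such a collector, the usual workaround is to break the cycle by hand before dropping the element. The sketch below uses a plain object as a stand-in for the DOM element so that it runs outside a browser; `releaseElement` is a hypothetical helper, not a DOM API.

```javascript
// Plain object standing in for the DOM element from the example above.
let div = { id: 'myDivElement' };
div.circularReference = div;
div.lotsOfData = new Array(10000).join('*');

// Hypothetical cleanup helper: null out the properties that keep the
// element (and its data) referenced.
function releaseElement(element) {
  element.circularReference = null;
  element.lotsOfData = null;
}

releaseElement(div);
const cycleBroken = div.circularReference === null;

// Drop the last remaining reference to the element.
div = null;
```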

Mark-and-sweep algorithm

This algorithm reduces the definition of "an object is not needed anymore" to "an object is unreachable".

This algorithm assumes knowledge of a set of objects called roots (in JavaScript, the root is the global object). Periodically, the garbage collector starts from these roots, finds all objects that are referenced from them, then all objects referenced from those, and so on. Starting from the roots, the garbage collector thus finds all reachable objects and collects all non-reachable objects.

This algorithm is better than the previous one, since "an object has zero references" implies that the object is unreachable. The converse is not true, as we have seen with cycles.
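The traversal described above can be sketched over a toy object graph. This is a simplified model with made-up names (`makeNode`, `mark`, `sweep`, `heap`); real engines are far more elaborate, but the reachability logic is the same in spirit.

```javascript
// Toy heap: every object is a node with outgoing references.
function makeNode(name) {
  return { name, refs: [], marked: false };
}

// Mark phase: flag everything reachable from the roots.
function mark(roots) {
  const stack = [...roots];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node.marked) continue; // already visited, so cycles terminate
    node.marked = true;
    stack.push(...node.refs);
  }
}

// Sweep phase: keep marked nodes, report the rest as garbage.
function sweep(heap) {
  const live = heap.filter((node) => node.marked);
  const garbage = heap.filter((node) => !node.marked);
  live.forEach((node) => { node.marked = false; }); // reset for the next run
  return { live, garbage };
}

const globalObject = makeNode('global');
const kept = makeNode('kept');
const o = makeNode('o');
const o2 = makeNode('o2');
globalObject.refs.push(kept);
o.refs.push(o2);  // o and o2 reference each other...
o2.refs.push(o);  // ...but nothing reachable points at them

const heap = [globalObject, kept, o, o2];
mark([globalObject]);
const result = sweep(heap);
const liveNames = result.live.map((node) => node.name);
const garbageNames = result.garbage.map((node) => node.name);
```

Here `o` and `o2` form a cycle, yet both end up in `garbageNames` because neither is reachable from the root. A reference counter would have kept both alive.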

As of 2012, all modern browsers ship a mark-and-sweep garbage collector. All improvements made in the field of JavaScript garbage collection over the last few years (generational, incremental, concurrent, and parallel garbage collection) are implementation improvements of this algorithm. They change neither the algorithm itself nor its reduction of the definition of when "an object is not needed anymore".

Cycles are not a problem anymore

In the first example above, after the function call returns, the two objects are no longer referenced by anything reachable from the global object. Consequently, the garbage collector will find them unreachable.

The same goes for the second example. Once the div is made unreachable from the roots, it can be garbage-collected despite its circular reference to itself.

Limitation: objects need to be made explicitly unreachable

Although this is marked as a limitation, it is rarely encountered in practice, which is why developers usually do not think much about garbage collection.
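One situation where the limitation does show up is a long-lived collection that keeps entries reachable after the program has stopped needing them. The cache below is a hypothetical example (`cache` and `loadReport` are invented names): the collector cannot tell that an entry is obsolete, so the code has to remove it explicitly.

```javascript
// Hypothetical module-level cache: reachable from the roots for the
// lifetime of the program.
const cache = new Map();

function loadReport(id) {
  if (!cache.has(id)) {
    cache.set(id, { id, rows: new Array(1000).fill(0) });
  }
  return cache.get(id);
}

loadReport('q1');
loadReport('q2');
const sizeBefore = cache.size;

// The program no longer needs 'q1', but the entry stays reachable
// (and therefore alive) until it is removed explicitly.
cache.delete('q1');
const sizeAfter = cache.size;
```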

See also