When working with big datasets, RAM conservation is critically
important. However, it is not always enough to just monitor the size of
the objects created. So-called “copy-on-modify” behavior, characteristic
of R, means that some expressions or functions may require an
unexpectedly large amount of RAM overhead. For example, replacing a
single value in a matrix (e.g., with ‘[<-’) duplicates that matrix in
the backend, making this task require twice as much RAM as that used by
the matrix itself. The peakRAM
package makes it easy to
monitor the total and peak RAM used so that developers can quickly
identify and eliminate RAM hungry code. You can get started with
peakRAM
by installing this package directly from CRAN.
The peakRAM
package, inspired by the very elegant
microbenchmark
package, offers an easy way to monitor the
total and peak RAM used by any number of R expressions or
functions, including anonymous functions. Simply call
peakRAM
with any number of comma-separated expressions or
functions provided as arguments. This function will execute each
argument piecewise, recording the amount of RAM allocated as a result of
that call (i.e., “Total RAM Used”) as well as the maximum amount of RAM
allocated at any point during that call (i.e., “Peak RAM Used”). Note
that throughout this package, all RAM use is measured in mebibytes
(MiB).
## Function_Call Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB
## 1 function()1:1e+07 0.000 0.0 0.0
## 2 1:1e+07 0.000 0.0 0.0
## 3 1:1e+07+1:1e+07 0.077 38.1 114.4
## 4 1:1e+07*2 0.121 76.3 114.4
What happened here? Well, we see that initializing the vector
1:1e7
requires ~38 MiB of RAM, whether done through an
anonymous function or not. Also, as we might expect, we see that adding
1:1e7
to 1:1e7
requires ~72 MiB of RAM, even
though the result only occupies ~38 MiB of RAM, because the vector
1:1e7
is initialized twice.
When RAM is valuable and the object is large, we want to avoid this
kind of overhead. To achieve this, we might try instead to double the
vector 1:1e7
, avoiding addition altogether. But wait, this
uses even more RAM. Why? Well, multiplying by 2
in
this case first copies the integer vector to a double
vector, then multiplies the double vector by 2
.
For an instant, the original integer vector and new
double vector exists simultaneously, occupying ~38 MiB plus ~72
MiB of RAM.
Alas, R is a most precarious lover. To conserve the maximum
amount of RAM, we need an approach that makes no needless copies. In
this case, we just need to force 2
to exist as an
integer.
## Function_Call Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB
## 1 1:1e+07*2:2 0.033 38.1 76.3
Now, we have a solution that we can scale confidently, knowing for sure that we will not unwittingly exceed memory capacity through superfluous RAM overhead.