Skip to content Skip to sidebar Skip to footer

Can Rpy2 Code Be Run In Parallel?

I have some Python code that passes a data frame to R via rpy2, whereupon R processes it and I pull the resulting data.frame back to R as a PANDAS data frame via com.load_data. Th

Solution 1:

rpy works by running a Python process and an R process in parallel, and exchange information between them. It does not take into account that R calls are called in parallel using multiprocess. So in practice, each of the python processes connects to the same R process. This probably causes the issues you see.

One way to circumvent this issue is to implement the parallel processing in R, and not in Python. You then send everything at once to R, this will process it in parallel, and the result will be sent back to Python.


Solution 2:

The following (python3) code suggests that, at least if a multiprocessing.Pool is used, separate R process are being spawned for each worker process (@lgautier is this right?)

import os
import multiprocessing
import time
num_processes = 3
import rpy2.robjects as robjects

def test_r_process(pause):
    n_called = robjects.r("times.called <- times.called + 1")[0]
    r_pid = robjects.r("Sys.getpid()")[0]
    print("R process for worker {} is {}. Pausing for {} seconds.".format(
        os.getpid(), r_pid, pause))
    time.sleep(pause)
    return(r_pid, n_called)


pause_secs = [2,4,3,6,7,2,3,5,1,2,3,3]
results = {}
robjects.r("times.called <- 0")
with multiprocessing.Pool(processes=num_processes) as pool:
    for proc, n_called in pool.imap_unordered(test_r_process, pause_secs):
        results[proc]=max(n_called, results.get(proc) or 0)
print("The test function should have been called {} times".format(len(pause_secs)))
for pid,called in results.items():
    print("R process {} was called {} times".format(pid,called))

On my OS X laptop results in something like

R process for worker 22535 is 22535. Pausing for 3 seconds.
R process for worker 22533 is 22533. Pausing for 2 seconds.
R process for worker 22533 is 22533. Pausing for 6 seconds.
R process for worker 22535 is 22535. Pausing for 7 seconds.
R process for worker 22534 is 22534. Pausing for 2 seconds.
R process for worker 22534 is 22534. Pausing for 3 seconds.
R process for worker 22533 is 22533. Pausing for 5 seconds.
R process for worker 22534 is 22534. Pausing for 1 seconds.
R process for worker 22535 is 22535. Pausing for 2 seconds.
R process for worker 22534 is 22534. Pausing for 3 seconds.
R process for worker 22535 is 22535. Pausing for 3 seconds.
The test function should have been called 12 times
R process 22533 was called 3.0 times
R process 22534 was called 5.0 times
R process 22535 was called 4.0 times

Post a Comment for "Can Rpy2 Code Be Run In Parallel?"