Title:
|
pR: AUTOMATIC PARALLELIZATION OF DATA PARALLEL STATISTICAL COMPUTING CODES FOR R IN HYBRID MULTI-NODE AND MULTI-CORE ENVIRONMENTS |
Author(s):
|
Paul Breimyer, Guruprasad Kora, William Hendrix, Neil Shah, Nagiza F. Samatova |
ISBN:
|
978-972-8924-97-3 |
Editors:
|
Hans Weghorn and Pedro Isaías |
Year:
|
2009 |
Edition:
|
V II, 2 |
Keywords:
|
Statistical Computing, Automatic Parallelization, Data-Parallel |
Type:
|
Short Paper |
First Page:
|
22 |
Last Page:
|
27 |
Language:
|
English |
Paper Abstract:
|
The increasing size and complexity of modern scientific data sets challenge the capabilities of traditional statistical
computing. High-Performance Statistical Parallel Computing is a promising strategy to address these challenges,
especially as multi-core parallel computing architectures become increasingly prevalent. However, parallel statistical
computing introduces implementation complexities and, therefore, an automatic parallelization approach would be ideal.
Data-parallel statistical computations that aim to evaluate the same function on different subsets of data represent natural
candidates for automatic parallelization due to their inherent inter-process independence.
In this paper, we extend the pR middleware for the R open-source statistical environment to support automatic
parallelization of data-parallel tasks in multi-node, multi-core, and hybrid environments. pR requires few or no changes
to existing serial codes and yielded over 50% end-to-end execution time improvements in our tests, compared to the
commonly used snow R package. |