All the algorithms described in this paper have been implemented in the SUIF compiler system[33]. To evaluate the effectiveness of our proposed algorithm, we compiled a set of programs with our compiler, ran the generated code on the Stanford DASH multiprocessor[24], and compared the results with those obtained without our techniques.