References
- 1
-
A. Agarwal, D. Chaiken, G. D'Souza, K. Johnson, D. Kranz, et al.
The MIT Alewife machine: A large-scale distributed memory
multiprocessor.
In Scalable Shared Memory Multiprocessors. Kluwer Academic
Publishers, 1991.
- 2
-
A. Agarwal, D. Kranz, and V. Natarajan.
Automatic partitioning of parallel loops for cache-coherent
multiprocessors.
In Proceedings of the 1993 International Conference on Parallel
Processing, St. Charles, IL, August 1993.
- 3
-
A. V. Aho, R. Sethi, and J. D. Ullman.
Compilers: Principles, Techniques, and Tools.
Addison-Wesley, Reading, MA, 1986.
- 4
-
J. M. Anderson and M. S. Lam.
Global optimizations for parallelism and locality on scalable
parallel machines.
In Proceedings of the SIGPLAN '93 Conference on Programming
Language Design and Implementation, pages 112--125, Albuquerque, NM, June
1993.
- 5
-
B. Appelbe and B. Lakshmanan.
Optimizing parallel programs using affinity regions.
In Proceedings of the 1993 International Conference on Parallel
Processing, pages 246--249, St. Charles, IL, August 1993.
- 6
-
U. Banerjee, R. Eigenmann, A. Nicolau, and D. Padua.
Automatic program parallelization.
Proceedings of the IEEE, 81(2):211--243, February 1993.
- 7
-
B. Bixby, K. Kennedy, and U. Kremer.
Automatic data layout using 0-1 integer programming.
In Proceedings of the International Conference on Parallel
Architectures and Compilation Techniques (PACT), pages 111--122, Montreal,
Canada, August 1994.
- 8
-
W. J. Bolosky and M. L. Scott.
False sharing and its effect on shared memory performance.
In Proceedings of the USENIX Symposium on Experiences with
Distributed and Multiprocessor Systems (SEDMS IV), pages 57--71, San Diego,
CA, September 1993.
- 9
-
S. Carr, K. S. McKinley, and C.-W. Tseng.
Compiler optimizations for improving data locality.
In Proceedings of the Sixth International Conference on
Architectural Support for Programming Languages and Operating Systems
(ASPLOS-VI), pages 252--262, San Jose, CA, October 1994.
- 10
-
M. Cierniak and W. Li.
Unifying data and control transformations for distributed shared
memory machines.
Technical Report TR-542, Department of Computer Science, University
of Rochester, November 1994.
- 11
-
S. J. Eggers and T. E. Jeremiassen.
Eliminating false sharing.
In Proceedings of the 1991 International Conference on Parallel
Processing, pages 377--381, St. Charles, IL, August 1991.
- 12
-
S. J. Eggers and R. H. Katz.
The effect of sharing on the cache and bus performance of parallel
programs.
In Proceedings of the Third International Conference on
Architectural Support for Programming Languages and Operating Systems
(ASPLOS-III), pages 257--270, Boston, MA, April 1989.
- 13
-
D. Gannon, W. Jalby, and K. Gallivan.
Strategies for cache and local memory management by global program
transformation.
Journal of Parallel and Distributed Computing, 5(5):587--616,
October 1988.
- 14
-
R. L. Graham, D. E. Knuth, and O. Patashnik.
Concrete Mathematics.
Addison-Wesley, Reading, MA, 1989.
- 15
-
M. Gupta and P. Banerjee.
Demonstration of automatic data partitioning techniques for
parallelizing compilers on multicomputers.
IEEE Transactions on Parallel and Distributed Systems,
3(2):179--193, March 1992.
- 16
-
J. L. Hennessy and D. A. Patterson.
Computer Architecture: A Quantitative Approach.
Morgan Kaufmann Publishers, San Mateo, CA, 1990.
- 17
-
High Performance Fortran Forum.
High Performance Fortran language specification.
Scientific Programming, 2(1-2):1--170, 1993.
- 18
-
S. Hiranandani, K. Kennedy, and C.-W. Tseng.
Compiling Fortran D for MIMD distributed-memory machines.
Communications of the ACM, 35(8):66--80, August 1992.
- 19
-
T. E. Jeremiassen and S. J. Eggers.
Reducing false sharing on shared memory multiprocessors through
compile time data transformations.
Technical Report UW-CSE-94-09-05, Department of Computer Science and
Engineering, University of Washington, September 1994.
- 20
-
Y. Ju and H. Dietz.
Reduction of cache coherence overhead by compiler data layout and
loop transformation.
In U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, editors,
Languages and Compilers for Parallel Computing, Fourth International
Workshop, pages 344--358, Santa Clara, CA, August 1991. Springer-Verlag.
- 21
-
Kendall Square Research, Waltham, MA.
KSR1 Principles of Operation, revision 6.0 edition, October
1992.
- 22
-
Kuck & Associates, Inc.
KAP User's Guide.
Champaign, IL 61820, 1988.
- 23
-
M. S. Lam, E. E. Rothberg, and M. E. Wolf.
The cache performance and optimizations of blocked algorithms.
In Proceedings of the Fourth International Conference on
Architectural Support for Programming Languages and Operating Systems
(ASPLOS-IV), pages 63--74, Santa Clara, CA, April 1991.
- 24
-
D. Lenoski, J. Laudon, T. Joe, D. Nakahira, L. Stevens, A. Gupta, and
J. Hennessy.
The DASH prototype: Implementation and performance.
In Proceedings of the 19th International Symposium on Computer
Architecture, pages 92--105, Gold Coast, Australia, May 1992.
- 25
-
J. Li and M. Chen.
The data alignment phase in compiling programs for distributed-memory
machines.
Journal of Parallel and Distributed Computing, 13(2):213--221,
October 1991.
- 26
-
T. J. Sheffler, R. Schreiber, J. R. Gilbert, and S. Chatterjee.
Aligning parallel arrays to reduce communication.
In Frontiers '95: The 5th Symposium on the Frontiers of
Massively Parallel Computation, pages 324--331, McLean, VA, February 1995.
- 27
-
J. P. Singh, W.-D. Weber, and A. Gupta.
SPLASH: Stanford parallel applications for shared-memory.
Computer Architecture News, 20(1):5--44, March 1992.
- 28
-
J. P. Singh, T. Joe, A. Gupta, and J. L. Hennessy.
An empirical comparison of the Kendall Square Research KSR-1 and
Stanford DASH multiprocessors.
In Proceedings of Supercomputing '93, pages 214--225, Portland,
OR, November 1993.
- 29
-
O. Temam, E. D. Granston, and W. Jalby.
To copy or not to copy: A compile-time technique for assessing when
data copying should be used to eliminate cache conflicts.
In Proceedings of Supercomputing '93, pages 410--419, Portland,
OR, November 1993.
- 30
-
J. Torrellas, M. S. Lam, and J. L. Hennessy.
Shared data placement optimizations to reduce multiprocessor cache
miss rates.
In Proceedings of the 1990 International Conference on Parallel
Processing, pages 266--270, St. Charles, IL, August 1990.
- 31
-
E. Torrie, C.-W. Tseng, M. Martonosi, and M. W. Hall.
Evaluating the impact of advanced memory systems on
compiler-parallelized codes.
In Proceedings of the International Conference on Parallel
Architectures and Compilation Techniques (PACT), June 1995.
- 32
-
C.-W. Tseng.
Compiler optimizations for eliminating barrier synchronization.
In Proceedings of the Fifth ACM SIGPLAN Symposium on
Principles and Practice of Parallel Programming, July 1995.
- 33
-
R. P. Wilson, R. S. French, C. S. Wilson, S. P. Amarasinghe, J. M. Anderson,
S. W. K. Tjiang, S.-W. Liao, C.-W. Tseng, M. W. Hall, M. S. Lam, and J. L.
Hennessy.
SUIF: An infrastructure for research on parallelizing and
optimizing compilers.
ACM SIGPLAN Notices, 29(12):31--37, December 1994.
- 34
-
M. E. Wolf and M. S. Lam.
A data locality optimizing algorithm.
In Proceedings of the SIGPLAN '91 Conference on Programming
Language Design and Implementation, pages 30--44, Toronto, Canada, June
1991.
- 35
-
M. E. Wolf and M. S. Lam.
A loop transformation theory and an algorithm to maximize
parallelism.
IEEE Transactions on Parallel and Distributed Systems,
2(4):452--471, October 1991.
Saman Amarasinghe
Fri Apr 7 11:22:17 PDT 1995