Abstract
This paper focuses on the Cyclops64 computer architecture and presents an analytical model and performance simulation results for preloading and loop unrolling approaches that optimize the performance of the SVD (Singular Value Decomposition) benchmark. A performance model for dissecting the total execution cycles is presented. Data preloading, using either “memcpy” or hand-optimized inline assembly code, and loop unrolling are implemented and compared in terms of the total number of memory access cycles. The key idea is to preload data from off-chip to on-chip memory and store the data back after the computation. These approaches can reduce the total memory access cycles and can thus improve the benchmark performance significantly.
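The preload, compute, and store-back pattern described in the abstract can be sketched in portable C. This is only an illustration under stated assumptions, not the paper's implementation: the function name `scale_preloaded`, the tile size `TILE`, and the use of a stack array to stand in for on-chip memory are all hypothetical, and the element-wise scaling is a placeholder for the actual SVD kernel. The inner loop is unrolled by a factor of four to show the second optimization.

```c
#include <stddef.h>
#include <string.h>

/* Assumed capacity of the on-chip staging buffer, in doubles (hypothetical). */
#define TILE 256

/* Scale n doubles in place, staging each tile through a local buffer.
 * The local array stands in for on-chip memory; on real hardware the
 * buffer would have to be placed there explicitly. */
void scale_preloaded(double *offchip, size_t n, double factor)
{
    double onchip[TILE];

    for (size_t base = 0; base < n; base += TILE) {
        size_t len = (n - base < TILE) ? (n - base) : TILE;

        /* Preload: one bulk copy instead of many scattered off-chip reads. */
        memcpy(onchip, offchip + base, len * sizeof(double));

        /* Compute on the local copy, unrolled by four. */
        size_t i = 0;
        for (; i + 4 <= len; i += 4) {
            onchip[i]     *= factor;
            onchip[i + 1] *= factor;
            onchip[i + 2] *= factor;
            onchip[i + 3] *= factor;
        }
        /* Remainder loop for tile lengths that are not a multiple of four. */
        for (; i < len; i++) {
            onchip[i] *= factor;
        }

        /* Store back: one bulk copy of the results to off-chip memory. */
        memcpy(offchip + base, onchip, len * sizeof(double));
    }
}
```

Whether the bulk copies pay off depends on how often each element is reused while it sits in the local buffer; a kernel that touches each element only once, as this placeholder does, shows the structure of the technique rather than its benefit.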
Copyright information
© 2005 IFIP International Federation for Information Processing
About this paper
Cite this paper
Niu, Y., Hu, Z., Barner, K., Gao, G.R. (2005). Performance Modelling and Optimization of Memory Access on Cellular Computer Architecture Cyclops64. In: Jin, H., Reed, D., Jiang, W. (eds) Network and Parallel Computing. NPC 2005. Lecture Notes in Computer Science, vol 3779. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11577188_18
DOI: https://doi.org/10.1007/11577188_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29810-6
Online ISBN: 978-3-540-32246-7
eBook Packages: Computer Science (R0)