American Journal of Computer Architecture
2012; 1(3): 37-50
doi: 10.5923/j.ajca.20120103.01
1Department of Computer Science, University of Houston, TX 77901, Victoria
2Paradigm 5756 S. Rice Avenue, TX 77081, Houston
Correspondence to: Qi Zhu , Department of Computer Science, University of Houston, TX 77901, Victoria.
Email: |
Copyright © 2012 Scientific & Academic Publishing. All Rights Reserved.
Computer memory is central to the operation of a modern computer system; it stores data or program instructions on a temporary or permanent basis for use in a computer. However, there is an increasing gap between the speed of memory and the speed of microprocessors. In this paper, various memory management and optimization techniques are reviewed to reduce the gap, including the hardware designs of the memory organization such as memory hierarchical structure and cache design; the memory management techniques varying from replacement algorithms to optimization techniques; and virtual memory strategies from a primitive bare-machine approach to paging and segmentation strategies.
Keywords: Memory Hierarchy, Memory Organization and Management, Memory Optimization
Cite this paper: Qi Zhu , Ying Qiao , "A Survey on Computer System Memory Management and Optimization Techniques", American Journal of Computer Architecture, Vol. 1 No. 3, 2012, pp. 37-50. doi: 10.5923/j.ajca.20120103.01.
Figure 1. Working-set model |
Figure 2. Segmentation Hardware |
Figure 3. Paging Hardware with TLB |
[1] | Baer, J.L. and Chen, T.F. (1991) An Effective On-chip Preloading Scheme to Reduce Data Access Penalty, Proceeding Supercomputing 91, Albuquerqu, NM, November, 176-186. |
[2] | Bansal, S. and Modha, D.S. (2004) CAR: Clock with Adaptive Replacement, Proceeding of USENIX Conference on File and Storage Technologies, 187-200, San Francisco, CA. |
[3] | Belady, L.A. (1966) A Study of Replacement Algorithms for Virtual Storage Computers, IBM Systems, 25(5), 491-503. |
[4] | Belady, L.A. (1969) Dynamic Space Sharing in computer System, Communications of the ACM, 12(5), 282-288. |
[5] | Belayneh, S. and Kaeli, D. (1996) A Discussion on Non-blocking/lockup-free Caches, ACM SIGARCH Computer Architecture News, 24(3), 18-25. |
[6] | Bennett, B.T. and Franaczek, P.A. (1976) Cache Memory with Prefetching of data by Priority, IBM Tech. Disclosure Bull, 18(12), 4231-4232. |
[7] | Bennett, B.T., Pomerene, J.H., Puzak, T.R. and Rechtschaffen (1982) R.N. Prefetching in a Multilevel Memory Hierarchy, IBM Tech. Disclosure Bull, 25(1), 88. |
[8] | Bergey, A.L. (1978) Increased Computer Throughput by Conditioned Memory Data Prefetching, IBM Tech. Disclosure Bull, 20(10), 4103. |
[9] | Bhandarkar, D.; Ding, J. (1997) Performance Characterization of the Pentium Pro Processor, Proceeding of High-Performance Computer Architecture, 1-5 Feb, 288-297, San Antonio, Texas, USA. |
[10] | Black, B., Rychlik, B., and Shen, J.P. (1999) The Block-based Trace Cache, Proceeding of the 26th Annual International Symposium on Computer Architecture, 196-207. |
[11] | Burger, D., Goodman, J.R. and Sohi, G.S. (1997) Memory Systems. The Computer Science and Engineering Handbook, 47-461. |
[12] | Cao, P., Felten, E.W., Karlin, A.R. and Li, K. (1995) A Study of Integrated Prefetching and Caching Strategies, Proc. of ACM SIGMETRICS, 188-197. |
[13] | Carr, W.R., Hennessy, J.L. (1981) WSClock -A Simple and Effective Algorithm for Virtual Memory Management, Proc. of the ACM SOSP, 87-95. |
[14] | Carr, W.R. (1984) Virtual Memory Management. UMI Research Press. |
[15] | Chang, D.C., et al (1995) Microarchitecture of HaL’s Memory Management Unit, Compcon Digest of Papers, 272-279. |
[16] | Chan, K.K., et al. (1996) Design of the HP PA 7200 CPU, Hewlett-Packard Journal, 47(1), 25-33. |
[17] | Chen, W.Y, Mahlke, S.A, Chang, P.P., Hwu, W.W. (1991) Data Access Microarchitectures for Superscalar Processor with Compiler-Assisted Data Prefetching, Proceeding 24th International Symposium on Microcomputing, Albuquerque, NM, November, 69-73. |
[18] | Chen, T.; Baer, J. (1992) Reducing Memory Latency via Non-blocking and Prefetching Caches, Proceeding of ACM SIGPLAN Notice, 27(9), 51-61. |
[19] | Chen, T.; Baer, J. (1994) A Performance Study of Software and Hardware Data Prefetching Schemes, Proceeding of the 21st Annual International Symposium on Computer Architecture, Chicago, IL, April, 223-232. |
[20] | Cho, S. (2007) I-Cache Multi Banking and Vertical Interleaving, Proceeding of ACM Great Lakes Symposium on VLSI, March 11-13, 14-19, Stresa-Lago Maggiore, Italy. |
[21] | Chu, W.W. and Opderbeck, H. (1974) Performance of Replacement Algorithms with Different Page Sizes, Computer, 7(11), 14-21. |
[22] | Conti, C.J. (1969) Concepts for Buffer Storage, IEEE Computer Group News, 2(8), 9-13. |
[23] | Dahlgren, F. and P. Stenstrom (1995) Effectiveness of Hardware-based Stride and Sequential Prefetching in Shared-memory Multiprocessors,Proc. First IEEE Sympo- sium on High- Performance Computer Architecture, Raleigh, NC, Jan. 68-77. |
[24] | Daley, R., Dennis, J.B. (1968) Virtual Memory Processes, and sharing in Multics, Communication of the ACM, 11(5), 306-312. |
[25] | Dennis, J.B. (1965) Segmentation and the Design of Multi-programmed Computer System, Journal of ACM, 12(4), 589-602. |
[26] | Denning, P.J. and Van Horn, E.C. (1966) Programming Semantics for Multi-programmed Computations, Communications of the ACM, 9(3), 143-155. |
[27] | Denning, P.J. (1968) The Working Set Model for Program Behavior, Communications of the ACM, 11(5), 323-333. |
[28] | Denning, P.J. (1972) On Modeling Program Behavior, Proc Spring Joint Computer Conference, 40, 937-944. |
[29] | Denning, P.J. (1980) Working Sets Past and Present, IEEE Transactions on Software Engineering, SE6(1), 64-84. |
[30] | Denning, P.J. (2005) The Locality Principle, Communications of the ACM, 48(7), 19- 24. |
[31] | Doran, R.W. (1976) Virtual Memory, Computer, 9(10), 27-37. |
[32] | Farkas, K.I.; Jouppi, N.P.; Chow, P. (1994) How Useful Are Non-Blocking Loads, Stream Buffers, and Speculative Execution in Multiple issue Processors? Western Research Laboratory Research Report 94/8. |
[33] | J. Feldman, Computer Architecture, A Designer’s Text Based on a Generic RISC Architecture, 1st edition, McGraw-Hill, 1994. |
[34] | Foss, R.C. (2008) DRAM - A Personal View, IEEE Solid-State Circuits Newsletter, 13(1), 50-56. |
[35] | Fu, J.W.C, Patel, J.H., and Janssens, B.L. (1992) Stride Directed Prefetching in Scalar Processors, Proceeding of 25th International Symposium on Microarchitecture, Portland, OR, December, 102-110. |
[36] | Fukunaga, K. and Kasai, T. (1977) The Efficient Use of Buffer Storage, Proc. ACM 1977 Annual Conference, 399-403. |
[37] | Galazin, A.B., Stupachenko, E.V., and Shlykov, S.L. (2008) A Software Instruction Prefetching Method in Architectures with Static Scheduling, Programming and Computer Software, 34(1), 49-53. |
[38] | Gelenbe, E. (1971) The Two-Thirds Rule for Dynamic Storage Allocation Under Equilibrium, Information Processing Letters, 1(2), 59-60. |
[39] | Gelenbe, E. (1973) The Distribution of a Program in Primary and Fast Buffer Stor- age, Communications of the ACM, 16(7), 431-434. |
[40] | Gelenbe, E. (1973) Minimizing Wasted Space in Partitioned Segmentation, Communications of the ACM, 16(6), 343-349. |
[41] | Gelenbe, E., Tiberio, P. and Boekhorst, J.C. (1973) Page Size in Demand-paging Systems, Acta Informatica, 3, 1-24. |
[42] | Gelenbe, E. (1973) A Unified Approach to the Evaluation of a Class of Replacement Algorithms, IEEE Transactions on Computers, 22(6), 611-618. |
[43] | Gelenbe, E. and Zhu, Q. (2001) Adaptive Control of Prefetching, Performance Evaluation, 46, 177-192. |
[44] | Ghasemzadeh, H., Mazrouee,S., Moghaddam, H.G., Shojaei, H. and Kakoee, M.R. (2006) Hardware Implementation of Stack-Based Replacement Algorithms, World Academy of Science, Engineering and Technology, 16, 135-139. |
[45] | Glass, G. and Cao, P. (1997) Adaptive page replacement based on memory reference behavior, Proc of the 1997 ACM SIGMETRICS, 25(1), 115-126. |
[46] | Gibson, D.H. (1967) Consideration in Block-oriented Systems Design, Proc Spring Jt Computer Conf, 30, Thompson Books, 75-80. |
[47] | Gupta, R.K. and Franklin, M.a. (1978) Working Set and Page Fault Frequency Replacement Algorithms: A Performance Comparison, IEEE Transactions on Comput- ers, C-27, 706-712. |
[48] | Handy, J (1997) The Cache Memory Book, Morgan Kaufmann Publishers. |
[49] | Hanlon, A.G. (1966) Content-Addressable and Associative Memory System - A Survey, IEEE Transactions on Electronic Computers, 15(4), 509-521. |
[50] | Hennessy, J.L. and Patterson, D.A. (2007) Computer architecture: a quantitative approach, Morgan Kaufmann Publishers Inc., San Francisco, CA, 4th Edition. |
[51] | ILiffe, J.K. (1968) Basic Machine Principles. American Elsevier, New York. |
[52] | Jacob, B. and Mudge, t. (1998) Virtual Memory: Issues of Implementation, IEEE Computer, 31(6), 33-43. |
[53] | Jayarekha, P.; Nair, T.R (2009) Proceeding of InterJRI Computer Science and Net- working, 1(1), Dec, 24-30. |
[54] | Johnston, J.B. (1969) The Structure of Multiple Activity Algorithms, Proc. Third Annual Princeton Conf. 80-82. |
[55] | Jones, F. et al., (1992) A New Era of Fast Dynamic RAMs, IEEE Spectrum, 43-49. |
[56] | Kaplan, K.R. and Winder, R.O. (1973) Cache-based Computer Systems, IEEE Computer, 6(3), 30-36. |
[57] | Kane, G. and Heinrich J. (1992) MIPS RISC Architecture, Prentice Hall. |
[58] | Karedla, R., Love, J.S. and Wherry, B. (1994) Caching Strategies to Improve Disk System Performance, IEEE Computer, 27(2), 38-46. |
[59] | Karlsson, M., Dahlgren, F., and Stenstrom, P. (2000) A Prefetching Technique for Irregular Accesses to Linked Data Structures, Proceeding of the 6th International conference on High Performance Computer Architecture, Toulouse, France, January, 206-217. |
[60] | Lam, M.; Rothberg, E.; Wolf, M.E. (1991) The Cache Performance and Optimization of Blocked Algorithms, Proceeding of the Fourth International Conference on ASPLOSIV, Santa Clara, Apr, 63-74. |
[61] | Linduist, A.B., Seeder, R. R. and Comeau,L.W. (1966) A Time-Sharing System Using an Associative Memory, Proceeding of the IEEE, 54, 1774-1779. |
[62] | Lin, Y.S. and Mattson, R.L. (1972) Cost-Performance Evaluation of Memory Hierarchies, IEEE Transactions Magazine, MAG-8(3), 390-392. |
[63] | Luk, C.K and Mowry, T.C (1996) Compiler-based Prefetching for Recursive Data Structures, Proceeding of 7th Conference on Architectural Support for Programming Languages and Operating Systems, Cambridge, MA, October, 222-233. |
[64] | Mackenzie, F.B. (1965) Automated Secondary Storage Management, Datamation, 11, 24-28. |
[65] | Mahapatra, N.R., and Venkatrao, B. (1999) The processor-memory bottleneck: problems and solutions, Crossroads, 5(3). |
[66] | Mahajan, A.R. and Ali, M.S. (2007) Optimization of Memory System in Real-time Embedded Systems, Proceeding of the 11th WSEAS International Conference on Computers. 13-19. |
[67] | Marshall, W.T. and Nute, C.T. (1979) Analytic modeling of working set like replacement algorithms, Proc. of ACM SIGMETRICS, 65-72. |
[68] | Megiddo, N.; and Modha, D.S. (2003) ARC:A Self-tuning, Low Overhead Replacement Cache, Proceeding of 2nd USENIX conference on File and Storage Technologies, 115-130, San Franciso, CA. |
[69] | Mowry, T. (1991) Tolerating Latency throughSoftware-controller Prefetching in Shared-memory Multiprocessors, Journal of Parallel and Distributed Computing, 12(2), 87-106. |
[70] | Ng, Ray (1992) Fast Computer Memories, IEEE Spectrum, 36-39. |
[71] | Olukotun, K., Mudge, T., and Brown, R. (1997) Multilevel Optimization of Piplelined Caches. IEEE Transactions on Computers, 46, 10, 1093-1102. |
[72] | Organick, E.I. (1972) The Multics System: An Examination of Its Structure, MIT Press. |
[73] | Ou, L., He, X.B., Kosa, M.J., and Scott, S.L. (2005) A Unified Multiple-Level Cache for High Performance Storage Systems, mascots, 13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, 143-152. |
[74] | Panda, P.R. et al (2001) Data and Memory Optimization Techniques for Embedded System, ACM Transactions on Design Automation of Electronic Systems, 6(2), 149-206. |
[75] | Perkins, D.R. (1980) The Design and Management of Predictive Caches, PH.D dissertation, Univ of Calif., San Diego. |
[76] | Peters, M. (2000) Enhanced Memory Systems, Personal Communications, September. |
[77] | Prieve, B.G. and Fabry, R.S. (1976) VMIN - An Optimal Variable Space Page Re- placement Algorithm, Communications of the ACM, 19(5), 295-297. |
[78] | Ramirez, A., Larriba-Pey, J.L., and Valero, M. (2005) Software Trace Cache, IEEE Transactions on Computer, 54(1), 22-35. |
[79] | Randell, B. and Kuehner, C.J. (1968) Dynamic Storage Allocation Systems, Communications of the ACM, 297-305. |
[80] | Rivers, J.A.; Tyson, G.S.; Dividson, E.S.; Austin, T.M. (1997) On High-Bandwidth Data Cache Design for Multi-Issue Processors, Proceeding of International Sympo- sium of Microarchitecture, Dec, 46-56. |
[81] | Rotenerg, E., Bennett, S., and Smith, J.E. (1999) A Trace Cache Microarchitecture and Evaluation, IEEE Transactions on Computers, 48(2), 111-120. |
[82] | Roth, A. and Sohi, G., (1999) Effective Jump-pointer Prefetching for Linked Data Structures. Proceeding of the 26th International Symposium on Computer Architecture, Atlanta, GA, May, 111-121. |
[83] | Pas, R. (2002) Memory Hierarchy in Cache-Based Systems, Sun Microsystems. |
[84] | Sadeh, E. (1975) An Analysis of the Performance of the Page Fault Frequency (PFF) Replacement Algorithm, Proc of the fifth ACM SOSP, 6-13. |
[85] | Samples, A.D.; Hilfinger, P.N. (1988) Code Reorganization for Instruction Caches, University of California at Berkeley, Berkeley, CA. |
[86] | Serhan, S.I. and Abdel-Haq, H.M. (2007) Improving Cache Memory Utilization, World Academy of Science and Technology, 26, 299-304. |
[87] | Shemer, J.E. and Shippey, B. (1966) Statistical Analysis of Paged and Segmented Computer Systems. IEEE Trans. EC-15, 6, 855-863. |
[88] | Shiell, J. (1986) Virtual memory, virtual Machines, Byte, 11(11), 110-121. |
[89] | Shyu, I. (1995) Virtual Address Translation for Wide-Address Architectures, ACM SIGOPS Operating Systems Review, 29(4), 37-46. |
[90] | Smaragdakis, Y., Kaplan, S. and Wilson, P. (2003) The EELRU Adaptive Replacement Algorithm, Performance Evaluation, 53(2), 93-123. |
[91] | Smith, A.J. (1978) A Comparative Study of Set Associative Memory Mapping Algoirthms and Their Use for Cache and Main Memory, IEEE Trans. Softw. Eng, 4(2), 121-130, |
[92] | Smith, A.J. (1978) Sequential Program Prefetching in Memory Hierarchies, IEEE Computer, 11(12), 7-21. |
[93] | Smith, A.J. (1978) Bibliography on Paging And Related Topics, Operating Systems Review, 12(4), 39-56. |
[94] | Smith, A.J. (1982) Cache Memories, Computing Surveys, 14(3), 473-530. |
[95] | Smith, A.J. (1987) Line (block) Size Choice for CPU Cache Memories, IEEE Trans- actions on Computers, 36(9), 1063-1076. |
[96] | Sondag, T. and Rajan, H. (2009) A Theory of Reads and Writes for Multi-level Caches. Technical Report 09-20a, Computer Science, Iowa State University. |
[97] | Talluri, M., Hill, M.D. and Khalidi, Y.A. (1995) A New Page Table for 64-bit Address Spaces, Proc of SIGOPS, 184-200. |
[98] | Thakkar, S.S. and Knowles, A.E. (1986) Ahigh-performance memory management scheme, IEEE Computer, 19(5), 8-22. |
[99] | Tuck, N; Tullsen, D.M. (2003) Initial Observations of the Simultaneous Multithreading Pentium 4 Processor, Proceeding of Parallel Architectures and Compilation Tech- niques, Oct, 26-34. |
[100] | A.M Turing (1937) On Computable Numbers, with an Application to the Entscheidungs problem, Proceedings of the London Mathematical Society, 42(2), 230-265. |
[101] | Turner, R. and Levy, H. (1981) Segmented FIFO Page Replacement, Proc. of ACM SIGMETRICS on Measurement and Modeling of Computer System, 48-51. |
[102] | Wolf, M.E., Lam, M. (1991) A Data Locality Optimizing Algorithm. Proceeding of the SIGPLAN conference on Programming Language Design and Implementation, Toronto, 26(6), 30-44. |
[103] | Wolf, W, Kandemir, M. (2003) Memory System Optimization of Embedded Soft- ware. Proceeding of IEEE, Jan, 91(1), 165-182. |
[104] | Yeager, K.C.(1996) The MIPS R10000 Superscaar Microprocessor, IEEE Micro, Apr., 16(2), 28-40, 1996. |
[105] | Yi, Q. and Kennedy, K. (2004) Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion, International Journal of High Performance Computing Applications, 18(2), 237-253. |
[106] | Zhu, Q., Gelenbe, E. and Qiao, Y (2008) Adaptive Prefetching Algorithm in Disk Controllers, Performance Evaluation, 65(5), 382-395. |