JabRef References output

Author	Title	Year	Journal/Proceedings	Reftype	DOI/PDF
Ahluwalia, K. S.	Scalability design patterns [Abstract] [BibTeX]	2007	Proceedings of the 14th Conference on Pattern Languages of Programs (pp. 2:1-2:8). PLOP '07. ACM.	inproceedings	DOI PDF
Abstract: This paper presents a pattern language that can be used to make a system highly scalable. This pattern language applies to software systems which need to scale. The pattern language addresses this problem by introducing patterns those touch upon introduction of parallelism to even optimization of algorithms and hardware.
BibTeX: @inproceedings{Ahluwalia2007, author = {Ahluwalia, Kanwardeep Singh}, title = {Scalability design patterns}, booktitle = {Proceedings of the 14th Conference on Pattern Languages of Programs}, publisher = {ACM}, year = {2007}, pages = {2:1--2:8}, url = {http://doi.acm.org/10.1145/1772070.1772073}, doi = {10.1145/1772070.1772073} }
Amdahl, G. M.	Validity of the single processor approach to achieving large scale computing capabilities [Abstract] [BibTeX]	1967	Proceedings of the April 18-20, 1967, spring joint computer conference (pp. 483-485). AFIPS '67 (Spring). ACM.	inproceedings	DOI PDF
Abstract: For over a decade prophets have voiced the contention that the organization of a single computer has reached its limits and that truly significant advances can be made only by interconnection of a multiplicity of computers in such a manner as to permit cooperative solution. Variously the proper direction has been pointed out as general purpose computers with a generalized interconnection of memories, or as specialized computers with geometrically related memory interconnections and controlled by one or more instruction streams.
BibTeX: @inproceedings{Amdahl1967, author = {Amdahl, Gene M.}, title = {Validity of the single processor approach to achieving large scale computing capabilities}, booktitle = {Proceedings of the April 18-20, 1967, spring joint computer conference}, publisher = {ACM}, year = {1967}, pages = {483--485}, url = {http://doi.acm.org/10.1145/1465482.1465560}, doi = {10.1145/1465482.1465560} }
Asanovic, K., Bodik, R., Catanzaro, B. C., Gebis, J. J., Husbands, P., Keutzer, K., Patterson, D. A., Plishker, W. L., Shalf, J., Williams, S. W. & Yelick, K. A.	The Landscape of Parallel Computing Research: A View from Berkeley [Abstract] [BibTeX]	2006	(UCB/EECS-2006-183)	techreport	PDF
Abstract: The recent switch to parallel microprocessors is a milestone in the history of computing. Industry has laid out a roadmap for multicore designs that preserves the programming paradigm of the past via binary compatibility and cache coherence. Conventional wisdom is now to double the number of cores on a chip with each silicon generation. A multidisciplinary group of Berkeley researchers met nearly two years to discuss this change. Our view is that this evolutionary approach to parallel hardware and software may work from 2 or 8 processor systems, but is likely to face diminishing returns as 16 and 32 processor systems are realized, just as returns fell with greater instruction-level parallelism. We believe that much can be learned by examining the success of parallelism at the extremes of the computing spectrum, namely embedded computing and high performance computing. This led us to frame the parallel landscape with seven questions, and to recommend the following: The overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems. The target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar. Instead of traditional benchmarks, use 13 "Dwarfs" to design and evaluate parallel programming models and architectures. (A dwarf is an algorithmic method that captures a pattern of computation and communication.) "Autotuners" should play a larger role than conventional compilers in translating parallel programs. To maximize programmer productivity, future programming models must be more human-centric than the conventional focus on hardware or applications. To be successful, programming models should be independent of the number of processors. To maximize application efficiency, programming models should support a wide range of data types and successful models of parallelism: task-level parallelism, word-level parallelism, and bit-level parallelism. Architects should not include features that significantly affect performance or energy if programmers cannot accurately measure their impact via performance counters and energy counters. Traditional operating systems will be deconstructed and operating system functionality will be orchestrated using libraries and virtual machines. To explore the design space rapidly, use system emulators based on Field Programmable Gate Arrays (FPGAs) that are highly scalable and low cost. Since real world applications are naturally parallel and hardware is naturally parallel, what we need is a programming model, system software, and a supporting architecture that are naturally parallel. Researchers have the rare opportunity to re-invent these cornerstones of computing, provided they simplify the efficient programming of highly parallel systems.
BibTeX: @techreport{Asanovic2006, author = {Asanovic, Krste and Bodik, Ras and Catanzaro, Bryan Christopher and Gebis, Joseph James and Husbands, Parry and Keutzer, Kurt and Patterson, David A. and Plishker, William Lester and Shalf, John and Williams, Samuel Webb and Yelick, Katherine A.}, title = {The Landscape of Parallel Computing Research: A View from Berkeley}, year = {2006}, number = {UCB/EECS-2006-183}, url = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html} }
Asanovic, K., Bodik, R., Demmel, J., Keaveny, T., Keutzer, K., Kubiatowicz, J., Morgan, N., Patterson, D., Sen, K., Wawrzynek, J., Wessel, D. & Yelick, K.	A view of the parallel computing landscape [Abstract] [BibTeX]	2009	Commun. ACM Vol. 52 (pp. 56-67). ACM.	article	DOI PDF
Abstract: Writing programs that scale with increasing numbers of cores should be as easy as writing programs for sequential computers.
BibTeX: @article{Asanovic2009, author = {Asanovic, Krste and Bodik, Rastislav and Demmel, James and Keaveny, Tony and Keutzer, Kurt and Kubiatowicz, John and Morgan, Nelson and Patterson, David and Sen, Koushik and Wawrzynek, John and Wessel, David and Yelick, Katherine}, title = {A view of the parallel computing landscape}, journal = {Commun. ACM}, publisher = {ACM}, year = {2009}, volume = {52}, pages = {56--67}, url = {http://doi.acm.org/10.1145/1562764.1562783}, doi = {10.1145/1562764.1562783} }
Borkar, S.	Thousand core chips: a technology perspective [Abstract] [BibTeX]	2007	Proceedings of the 44th annual Design Automation Conference (pp. 746-749). DAC '07. ACM.	inproceedings	DOI PDF
Abstract: This paper presents the many-core architecture, with hundreds to thousands of small cores, to deliver unprecedented compute performance in an affordable power envelope. We discuss fine grain power management, memory bandwidth, on die networks, and system resiliency for the many-core system.
BibTeX: @inproceedings{Borkar2007, author = {Borkar, Shekhar}, title = {Thousand core chips: a technology perspective}, booktitle = {Proceedings of the 44th annual Design Automation Conference}, publisher = {ACM}, year = {2007}, pages = {746--749}, url = {http://doi.acm.org/10.1145/1278480.1278667}, doi = {10.1145/1278480.1278667} }
Borkar, S.	Design challenges of technology scaling [Abstract] [BibTeX]	1999	Micro, IEEE Vol. 19 (4) (pp. 23-29).	article	DOI PDF
Abstract: Scaling advanced CMOS technology to the next generation improves performance, increases transistor density, and reduces power consumption. Technology scaling typically has three main goals: 1) reduce gate delay by 30%, resulting in an increase in operating frequency of about 43%; 2) double transistor density; and 3) reduce energy per transition by about 65%, saving 50% of power (at a 43% increase in frequency). These are not ad hoc goals; rather, they follow scaling theory. This article looks closely at past trends in technology scaling and how well microprocessor technology and products have met these goals. It also projects the challenges that lie ahead if these trends continue. This analysis uses data from various Intel microprocessors; however, this study is equally applicable to other types of logic designs. Is process technology meeting the goals predicted by scaling theory? An analysis of microprocessor performance, transistor density, and power trends through successive technology generations helps identify potential limiters of scaling, performance, and integration
BibTeX: @article{Borkar1999, author = {Borkar, S.}, title = {Design challenges of technology scaling}, journal = {Micro, IEEE}, year = {1999}, volume = {19}, number = {4}, pages = {23--29}, doi = {10.1109/40.782564} }
Borkar, S.	Technology trends and design challenges for microprocessor design [BibTeX]	1998	Proc. 24th European Solid-State Circuits Conf. ESSCIRC '98 (pp. 7-8).	inproceedings	DOI
BibTeX: @inproceedings{Borkar1998a, author = {Borkar, S.}, title = {Technology trends and design challenges for microprocessor design}, booktitle = {Proc. 24th European Solid-State Circuits Conf. ESSCIRC '98}, year = {1998}, pages = {7--8}, doi = {10.1109/ESSCIR.1998.186199} }
Borkar, S. & Chien, A. A.	The future of microprocessors [BibTeX]	2011	Commun. ACM Vol. 54 (pp. 67-77). ACM.	article	DOI PDF
BibTeX: @article{Borkar2011, author = {Borkar, Shekhar and Chien, Andrew A.}, title = {The future of microprocessors}, journal = {Commun. ACM}, publisher = {ACM}, year = {2011}, volume = {54}, pages = {67--77}, url = {http://doi.acm.org/10.1145/1941487.1941507}, doi = {10.1145/1941487.1941507} }
Chen, Y., Sun, X.-H. & Wu, M.	Algorithm-system scalability of heterogeneous computing [Abstract] [BibTeX]	2008	J. Parallel Distrib. Comput. Vol. 68 (pp. 1403-1412). Academic Press, Inc.	article	DOI PDF
Abstract: Scalability is a key factor of the design of distributed systems and parallel algorithms and machines. However, conventional scalabilities are designed for homogeneous parallel processing. There is no suitable and commonly accepted definition of scalability metric for heterogeneous systems. Isospeed scalability is a well-defined metric for homogeneous computing. This study extends the isospeed scalability metric to general heterogeneous computing systems. The proposed isospeed-efficiency model is suitable for both homogeneous and heterogeneous computing. Through theoretical analyses, we derive methodologies of scalability measurement and prediction for heterogeneous systems. Experimental results have verified the analytical results and confirmed that the proposed isospeed-efficiency scalability works well in both homogeneous and heterogeneous environments.
BibTeX: @article{Chen2008, author = {Chen, Yong and Sun, Xian-He and Wu, Ming}, title = {Algorithm-system scalability of heterogeneous computing}, journal = {J. Parallel Distrib. Comput.}, publisher = {Academic Press, Inc.}, year = {2008}, volume = {68}, pages = {1403--1412}, url = {http://portal.acm.org/citation.cfm?id=1435004.1435136}, doi = {10.1016/j.jpdc.2008.06.007} }
Childs, H., Pugmire, D., Ahern, S., Whitlock, B., Howison, M., Prabhat, Weber, G. H. & Bethel, E. W.	Extreme Scaling of Production Visualization Software on Diverse Architectures [Abstract] [BibTeX]	2010	IEEE Comput. Graph. Appl. Vol. 30 (pp. 22-31). IEEE Computer Society Press.	article	DOI PDF
Abstract: This article presents the results of experiments studying how the pure-parallelism paradigm scales to massive data sets, including 16,000 or more cores on trillion-cell meshes, the largest data sets published to date in the visualization literature. The findings on scaling characteristics and bottlenecks contribute to understanding how pure parallelism will perform in the future.
BibTeX: @article{Childs2010, author = {Childs, Hank and Pugmire, David and Ahern, Sean and Whitlock, Brad and Howison, Mark and {Prabhat} and Weber, Gunther H. and Bethel, E. Wes}, title = {Extreme Scaling of Production Visualization Software on Diverse Architectures}, journal = {IEEE Comput. Graph. Appl.}, publisher = {IEEE Computer Society Press}, year = {2010}, volume = {30}, pages = {22--31}, url = {http://dx.doi.org/10.1109/MCG.2010.51}, doi = {10.1109/MCG.2010.51} }
Cho, S. & Melhem, R. G.	On the Interplay of Parallelization, Program Performance, and Energy Consumption [Abstract] [BibTeX]	2010	IEEE Transactions on Parallel and Distributed Systems Vol. 21 (pp. 342-353). IEEE Computer Society.	article	DOI PDF
Abstract: This paper derives simple, yet fundamental formulas to describe the interplay between parallelism of an application, program performance, and energy consumption. Given the ratio of serial and parallel portions in an application and the number of processors, we derive optimal frequencies allocated to the serial and parallel regions in an application to either minimize the total energy consumption or minimize the energy-delay product. The impact of static power is revealed by considering the ratio between static and dynamic power and quantifying the advantages of adding to the architecture capability to turn off individual processors and save static energy. We further determine the conditions under which one can obtain both energy and speed improvement, as well as the amount of improvement. While the formulas we obtain use simplifying assumptions, they provide valuable theoretical insights into energy-aware processor resource management. Our results form a basis for several interesting research directions in the area of energy-aware multicore processor architectures
BibTeX: @article{Cho2010, author = {Sangyeun Cho and Rami G. Melhem}, title = {On the Interplay of Parallelization, Program Performance, and Energy Consumption}, journal = {IEEE Transactions on Parallel and Distributed Systems}, publisher = {IEEE Computer Society}, year = {2010}, volume = {21}, pages = {342--353}, doi = {10.1109/TPDS.2009.41} }
Diefendorff, K.	Power4 focuses on Memory Bandwidth [BibTeX]	1999	Microprocessor Report Vol. 13 (pp. 1-8).	article	PDF
BibTeX: @article{Diefendorff1999, author = {K. Diefendorff}, title = {Power4 focuses on Memory Bandwidth}, journal = {Microprocessor Report}, year = {1999}, volume = {13}, pages = {1--8} }
Eyerman, S. & Eeckhout, L.	Modeling critical sections in Amdahl's law and its implications for multicore design [Abstract] [BibTeX]	2010	SIGARCH Comput. Archit. News Vol. 38 (pp. 362-370). ACM.	article	DOI PDF
Abstract: This paper presents a fundamental law for parallel performance: it shows that parallel performance is not only limited by sequential code (as suggested by Amdahl's law) but is also fundamentally limited by synchronization through critical sections. Extending Amdahl's software model to include critical sections, we derive the surprising result that the impact of critical sections on parallel performance can be modeled as a completely sequential part and a completely parallel part. The sequential part is determined by the probability for entering a critical section and the contention probability (i.e., multiple threads wanting to enter the same critical section). This fundamental result reveals at least three important insights for multicore design. (i) Asymmetric multicore processors deliver less performance benefits relative to symmetric processors than suggested by Amdahl's law, and in some cases even worse performance. (ii) Amdahl's law suggests many tiny cores for optimum performance in asymmetric processors, however, we find that fewer but larger small cores can yield substantially better performance. (iii) Executing critical sections on the big core can yield substantial speedups, however, performance is sensitive to the accuracy of the critical section contention predictor.
BibTeX: @article{Eyerman2010, author = {Eyerman, Stijn and Eeckhout, Lieven}, title = {Modeling critical sections in Amdahl's law and its implications for multicore design}, journal = {SIGARCH Comput. Archit. News}, publisher = {ACM}, year = {2010}, volume = {38}, pages = {362--370}, url = {http://doi.acm.org/10.1145/1816038.1816011}, doi = {10.1145/1816038.1816011} }
Gray, J.	What next?: A dozen information-technology research goals [BibTeX]	2003	J. ACM Vol. 50 (pp. 41-57). ACM.	article	DOI PDF
BibTeX: @article{Gray2003, author = {Gray, Jim}, title = {What next?: A dozen information-technology research goals}, journal = {J. ACM}, publisher = {ACM}, year = {2003}, volume = {50}, pages = {41--57}, url = {http://doi.acm.org/10.1145/602382.602401}, doi = {10.1145/602382.602401} }
Gunther, N. J.	A General Theory of Computational Scalability Based on Rational Functions [Abstract] [BibTeX]	2008		techreport	PDF
Abstract: The universal scalability law of computational capacity is a rational function C_p = P(p)/Q(p) with P(p) a linear polynomial and Q(p) a second-degree polynomial in the number of physical processors p, that has been long used for statistical modeling and prediction of computer system performance. We prove that C_p is equivalent to the synchronous throughput bound for a machine-repairman with state-dependent service rate. Simpler rational functions, such as Amdahl's law and Gustafson speedup, are corollaries of this queue-theoretic bound. C_p is further shown to be both necessary and sufficient for modeling all practical characteristics of computational scalability
BibTeX: @techreport{Gunther2008, author = {Neil J. Gunther}, title = {A General Theory of Computational Scalability Based on Rational Functions}, year = {2008}, url = {http://www.arxiv.org/pdf/0808.1431} }
Gustafson, J. L.	Reevaluating Amdahl's law [BibTeX]	1988	Commun. ACM Vol. 31 (pp. 532-533). ACM.	article	DOI PDF
BibTeX: @article{Gustafson1988, author = {Gustafson, John L.}, title = {Reevaluating Amdahl's law}, journal = {Commun. ACM}, publisher = {ACM}, year = {1988}, volume = {31}, pages = {532--533}, url = {http://doi.acm.org/10.1145/42411.42415}, doi = {10.1145/42411.42415} }
Hill, M. D.	Amdahl's Law in the multicore era [Abstract] [BibTeX]	2008	Proc. IEEE 14th Int. Symp. High Performance Computer Architecture HPCA 2008	inproceedings	DOI
Abstract: Summary form only given. In this paper, we apply Amdahl's law to several multicore chips variants: symmetric cores, asymmetric cores and dynamic techniques that allow cores to work together on sequential execution. Starting with Amdahl's simple software model, we add a simple hardware model based on fixed chip resources.
BibTeX: @inproceedings{Hill2008b, author = {Hill, M. D.}, title = {Amdahl's Law in the multicore era}, booktitle = {Proc. IEEE 14th Int. Symp. High Performance Computer Architecture HPCA 2008}, year = {2008}, doi = {10.1109/HPCA.2008.4658638} }
Hill, M. D. & Marty, M. R.	Amdahl's Law in the Multicore Era [Abstract] [BibTeX]	2008	Computer Vol. 41 (pp. 33-38). IEEE Computer Society Press.	article	DOI PDF
Abstract: Augmenting Amdahl's law with a corollary for multicore hardware makes it relevant to future generations of chips with multiple processor cores. Obtaining optimal multicore performance will require further research in both extracting more parallelism and making sequential cores faster.
BibTeX: @article{Hill2008, author = {Hill, Mark D. and Marty, Michael R.}, title = {Amdahl's Law in the Multicore Era}, journal = {Computer}, publisher = {IEEE Computer Society Press}, year = {2008}, volume = {41}, pages = {33--38}, url = {http://portal.acm.org/citation.cfm?id=1449375.1449387}, doi = {10.1109/MC.2008.209} }
Hill, M. D. & Marty, M. R.	Amdahl's Law in the Multicore Era [Abstract] [BibTeX]	2008	Computer Vol. 41 (7) (pp. 33-38).	article	DOI PDF
Abstract: Augmenting Amdahl's law with a corollary for multicore hardware makes it relevant to future generations of chips with multiple processor cores. Obtaining optimal multicore performance will require further research in both extracting more parallelism and making sequential cores faster.
BibTeX: @article{Hill2008a, author = {Hill, M. D. and Marty, M. R. }, title = {Amdahl's Law in the Multicore Era}, journal = {Computer}, year = {2008}, volume = {41}, number = {7}, pages = {33--38}, doi = {10.1109/MC.2008.209} }
Karbowski, A.	Amdahl's and Gustafson-Barsis laws revisited [Abstract] [BibTeX]	2008	The Computing Research Repository (CoRR) Vol. abs/0809.1177. Pre-print in ArXiv.org	article	PDF
Abstract: The paper presents a simple derivation of the Gustafson-Barsis law from the Amdahl's law. In the computer literature these two laws describing the speedup limits of parallel applications are derived separately. It is shown, that treating the time of the execution of the sequential part of the application as a constant, in few lines the Gustafson-Barsis law can be obtained from the Amdahl's law and that the popular claim, that Gustafson-Barsis law overthrows Amdahl's law is a mistake.
BibTeX: @article{Karbowski2008, author = {Andrzej Karbowski}, title = {Amdahl's and Gustafson-Barsis laws revisited}, journal = {The Computing Research Repository (CoRR)}, year = {2008}, volume = {abs/0809.1177}, note = {Pre-print in ArXiv.org}, url = {http://arxiv.org/abs/0809.1177v1} }
Krishnaprasad, S.	Uses and abuses of Amdahl's law [Abstract] [BibTeX]	2001	J. Comput. Small Coll. Vol. 17 (pp. 288-293). Consortium for Computing Sciences in Colleges.	article	PDF
Abstract: Amdahl's law has been widely used by designers and researchers to get a rough estimate of performance improvement when alternate designs and implementations are attempted. It gives a simple relationship between the nature of performance improvement and the problem characteristics. The negative way the original law was stated [Amd67] contributed to a good deal of pessimism about the nature of parallel processing. But, after observing remarkable speedups in some large-scale applications, researchers in parallel processing started wrongfully suspecting the validity and usefulness of Amdahl's law. In this paper we present the many uses of Amdahl's law as well as some of its abuses.
BibTeX: @article{Krishnaprasad2001, author = {Krishnaprasad, S.}, title = {Uses and abuses of Amdahl's law}, journal = {J. Comput. Small Coll.}, publisher = {Consortium for Computing Sciences in Colleges}, year = {2001}, volume = {17}, pages = {288--293}, url = {http://portal.acm.org/citation.cfm?id=775339.775386} }
Li, K.-C., Hsu, C.-H., Yang, L. T., Dongarra, J. & Zima, H.	Handbook of Research on Scalable Computing Technologies 2-Volumes [BibTeX]	2009	Information Science Reference - Imprint of: IGI Publishing.	book	PDF
BibTeX: @book{HandbookScalableComputing, author = {Li, Kuan-Ching and Hsu, Ching-Hsien and Yang, Laurence Tianruo and Dongarra, Jack and Zima, Hans}, title = {Handbook of Research on Scalable Computing Technologies 2-Volumes}, publisher = {Information Science Reference - Imprint of: IGI Publishing}, year = {2009} }
Marr, D. T., Binns, F., Hill, D. L., Hinton, G., Koufaty, D. A., Miller, J. A. & Upton, M.	Hyper-Threading Technology Architecture and Microarchitecture [Abstract] [BibTeX]	2002	Intel Technology Journal Vol. 6 (1) (pp. 4-16).	article	PDF
Abstract: Intel’s Hyper-Threading Technology brings the concept of simultaneous multi-threading to the Intel Architecture. Hyper-Threading Technology makes a single physical processor appear as two logical processors; the physical execution resources are shared and the architecture state is duplicated for the two logical processors. From a software or architecture perspective, this means operating systems and user programs can schedule processes or threads to logical processors as they would on multiple physical processors. From a microarchitecture perspective, this means that instructions from both logical processors will persist and execute simultaneously on shared execution resources. This paper describes the Hyper-Threading Technology architecture, and discusses the microarchitecture details of Intel's first implementation on the Intel Xeon processor family. Hyper-Threading Technology is an important addition to Intel’s enterprise product line and will be integrated into a wide variety of products.
BibTeX: @article{Marr2002, author = {Deborah T. Marr and Frank Binns and David L. Hill and Glenn Hinton and David A. Koufaty and J. Alan Miller and Michael Upton}, title = {Hyper-Threading Technology Architecture and Microarchitecture}, journal = {Intel Technology Journal}, year = {2002}, volume = {6}, number = {1}, pages = {4--16}, url = {http://download.intel.com/technology/itj/2002/volume06issue01/art01_hyper/vol6iss1_art01.pdf} }
McDougall, R.	Extreme Software Scaling [Abstract] [BibTeX]	2005	Queue Vol. 3 (pp. 36-46). ACM.	article	DOI PDF
Abstract: The advent of SMP (symmetric multiprocessing) added a new degree of scalability to computer systems. Rather than deriving additional performance from an incrementally faster microprocessor, an SMP system leverages multiple processors to obtain large gains in total system performance. Parallelism in software allows multiple jobs to execute concurrently on the system, increasing system throughput accordingly. Given sufficient software parallelism, these systems have proved to scale to several hundred processors.
BibTeX: @article{McDougall2005, author = {McDougall, Richard}, title = {Extreme Software Scaling}, journal = {Queue}, publisher = {ACM}, year = {2005}, volume = {3}, pages = {36--46}, url = {http://doi.acm.org/10.1145/1095408.1095419}, doi = {10.1145/1095408.1095419} }
Millsap, C.	Thinking Clearly about Performance [Abstract] [BibTeX]	2010	Queue Vol. 8 (pp. 10:10-10:20). ACM.	article	DOI PDF
Abstract: Improving the performance of complex software is difficult, but understanding some fundamental principles can make it easier.
BibTeX: @article{Millsap2010, author = {Millsap, Cary}, title = {Thinking Clearly about Performance}, journal = {Queue}, publisher = {ACM}, year = {2010}, volume = {8}, pages = {10:10--10:20}, url = {http://doi.acm.org/10.1145/1854039.1854041}, doi = {10.1145/1854039.1854041} }
Morad, T. Y., Weiser, U. C., Kolodny, A., Valero, M. & Ayguade, E.	Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors [Abstract] [BibTeX]	2006	Computer Architecture Letters Vol. 5 (1) (pp. 14-17).	article	DOI PDF
Abstract: This paper evaluates asymmetric cluster chip multiprocessor (ACCMP) architectures as a mechanism to achieve the highest performance for a given power budget. ACCMPs execute serial phases of multithreaded programs on large high-performance cores whereas parallel phases are executed on a mix of large and many small simple cores. Theoretical analysis reveals a performance upper bound for symmetric multiprocessors, which is surpassed by asymmetric configurations at certain power ranges. Our emulations show that asymmetric multiprocessors can reduce power consumption by more than two thirds with similar performance compared to symmetric multiprocessors
BibTeX: @article{Morad2006, author = {Morad, T. Y. and Weiser, U. C. and Kolodny, A. and Valero, M. and Ayguade, E.}, title = {Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors}, journal = {Computer Architecture Letters}, year = {2006}, volume = {5}, number = {1}, pages = {14--17}, doi = {10.1109/L-CA.2006.6} }
Nicol, D. M.	Scalability, locality, partitioning and synchronization PDES [BibTeX]	1998	Proceedings of the twelfth workshop on Parallel and distributed simulation (pp. 5-11). PADS '98. IEEE Computer Society.	inproceedings	DOI PDF
BibTeX: @inproceedings{Nicol1998, author = {Nicol, David M.}, title = {Scalability, locality, partitioning and synchronization PDES}, booktitle = {Proceedings of the twelfth workshop on Parallel and distributed simulation}, publisher = {IEEE Computer Society}, year = {1998}, pages = {5--11}, url = {http://dx.doi.org/10.1145/278008.278010}, doi = {10.1145/278008.278010} }
Nicol, D. M.	Scalability, locality, partitioning and synchronization PDES [BibTeX]	1998	SIGSIM Simul. Dig. Vol. 28 (pp. 5-11). ACM.	article	DOI
BibTeX: @article{Nicol1998Journal, author = {Nicol, David M.}, title = {Scalability, locality, partitioning and synchronization PDES}, journal = {SIGSIM Simul. Dig.}, publisher = {ACM}, year = {1998}, volume = {28}, pages = {5--11}, url = {http://doi.acm.org/10.1145/278009.278010}, doi = {10.1145/278009.278010} }
Patterson, D.	The parallel computing landscape: a Berkeley view [BibTeX]	2007	Proc. ACM/IEEE Int Low Power Electronics and Design (ISLPED) Symp	inproceedings	DOI PDF
BibTeX: @inproceedings{Patterson2007, author = {Patterson, David}, title = {The parallel computing landscape: a Berkeley view}, booktitle = {Proc. ACM/IEEE Int Low Power Electronics and Design (ISLPED) Symp}, year = {2007}, doi = {10.1145/1283780.1283829} }
Paul, J. M. & Meyer, B. H.	Amdahl's law revisited for single chip systems [Abstract] [BibTeX]	2007	Int. J. Parallel Program. Vol. 35 (pp. 101-123). Kluwer Academic Publishers.	article	DOI PDF
Abstract: Amdahl’s Law is based upon two assumptions – that of boundlessness and homogeneity – and so it can fail when applied to single chip heterogeneous multiprocessor designs, and even microarchitecture. We show that a performance increase in one part of the system can negatively impact the overall performance of the system, in direct contradiction to the way Amdahl’s Law is instructed. Fundamental assumptions that are consistent with Amdahl’s Law are a heavily ingrained part of our computing design culture, for research as well as design. This paper points in a new direction. We motivate that emphasis should be made on holistic, system level views instead of divide and conquer approaches. This, in turn, has relevance to the potential impacts of custom processors, system-level scheduling strategies and the way systems are partitioned. We realize that Amdahl’s Law is one of the few, fundamental laws of computing. However, its very power is in its simplicity, and if that simplicity is carried over to future systems, we believe that it will impede the potential of future computing system
BibTeX: @article{Paul2007, author = {Paul, JoAnn M. and Meyer, Brett H.}, title = {Amdahl's law revisited for single chip systems}, journal = {Int. J. Parallel Program.}, publisher = {Kluwer Academic Publishers}, year = {2007}, volume = {35}, pages = {101--123}, url = {http://dx.doi.org/10.1007/s10766-006-0028-8}, doi = {10.1007/s10766-006-0028-8} }
Pollack, F. J.	New microarchitecture challenges in the coming generations of CMOS process technologies (keynote address)(abstract only) [Abstract] [BibTeX]	1999	Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture (pp. 2-). MICRO 32. IEEE Computer Society.	inproceedings	PDF
Abstract: Over the last 15 years, CMOS scaling simplified the task of the microprocessor architect. With each new process technology, frequency increased by ~50%, and transistor density increase by 100 percent. Also, the improvements in manufacturing technology (larger wafers and higher yields) allowed for increasing die sizes without increasing cost. Projections of die sizes of 1 square inch or higher were common. However, the end of these easy times is in sight, and several new challenges are facing the architect. Die size is no longer going to be limited by equipment or manufacturing cost, but rather by power. To date the approach has been to lower voltage with each process generation. But as voltage is lowered, leakage current and energy increase, contributing to higher power. And the problems extend beyond power dissipation to power delivery/distribution and increasing power density. This talk will first look at the historical trends of CMOS process technology in the context of past microprocessors. It will then look at the implications of continued CMOS scaling, as described above, and the new challenges they pose. Microarchitecture techniques that have exacerbated the power problem will also be covered. Finally, the talk will describe some of the microarchitecture directions that may lead to more power-efficient and cost-efficient microprocessors.
BibTeX: @inproceedings{Pollack1999, author = {Pollack, Fred J.}, title = {New microarchitecture challenges in the coming generations of CMOS process technologies (keynote address)(abstract only)}, booktitle = {Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture}, publisher = {IEEE Computer Society}, year = {1999}, pages = {2--}, url = {http://portal.acm.org/citation.cfm?id=320080.320082} }
Ronen, R., Mendelson, A., Lai, K., Lu, S.-L., Pollack, F. & Shen, J. P.	Coming challenges in microarchitecture and architecture [Abstract] [BibTeX]	2001	Proceedings of the IEEE Vol. 89 (3) (pp. 325-340).	article	DOI PDF
Abstract: In the past several decades, the world of computers and especially that of microprocessors has witnessed phenomenal advances. Computers have exhibited ever-increasing performance and decreasing costs, making them more affordable and in turn, accelerating additional software and hardware development that fueled this process even more. The technology that enabled this exponential growth is a combination of advancements in process technology, microarchitecture, architecture, and design and development tools. While the pace of this progress has been quite impressive over the last two decades, it has become harder and harder to keep up this pace. New process technology requires more expensive megafabs and new performance levels require larger die, higher power consumption, and enormous design and validation effort. Furthermore, as CMOS technology continues to advance, microprocessor design is exposed to a new set of challenges. In the near future, microarchitecture has to consider and explicitly manage the limits of semiconductor technology, such as wire delays, power dissipation, and soft errors. In this paper we describe the role of microarchitecture in the computer world, present the challenges ahead of us, and highlight areas where microarchitecture can help address these challenges.
BibTeX: @article{Ronen2001, author = {Ronen, R. and Mendelson, A. and Lai, K. and Lu, Shih-Lien and Pollack, F. and Shen, J. P.}, title = {Coming challenges in microarchitecture and architecture}, journal = {Proceedings of the IEEE}, year = {2001}, volume = {89}, number = {3}, pages = {325--340}, doi = {10.1109/5.915377} }
Shen, H. & Pétrot, F.	Using Amdahl’s Law for Performance Analysis of Many-Core SoC Architectures Based on Functionally Asymmetric Processors [Abstract] [BibTeX]	2011	(6566) Proc. of Architecture of Computing Systems - ARCS 2011 (pp. 38-49). Lecture Notes in Computer Science. Springer-Verlag.	inproceedings	PDF
Abstract: Amdahl's law is a fundamental tool for understanding the evolution of performance as a function of parallelism. Following a recent trend on the timing and power analysis of general purpose many-core chips using this law, we carry out an analysis aiming at many-core SoCs integrating processors sharing the same core instruction set but each potentially having additional extensions. For SoCs targeting well defined classes of applications, higher performances can be achieved by adding application specific extensions either through the addition of instructions in the core instruction set or through coprocessors leading to architectures with functionally asymmetric processors. This kind of architecture is becoming technically viable and advocated by several groups, but the theoretical study of their properties is yet to be performed: this is precisely our goal in this paper. We use Amdahl's law to prove the performance advantage of using extensions for many-core SoCs and show that the many-core architecture based on functionally asymmetric processors can achieve the same performance as the symmetric one but at a lower cost.
BibTeX: @inproceedings{Shen2011, author = {Hao Shen and Frédéric Pétrot}, title = {Using Amdahl’s Law for Performance Analysis of Many-Core SoC Architectures Based on Functionally Asymmetric Processors}, booktitle = {Proc. of Architecture of Computing Systems - ARCS 2011}, publisher = {Springer-Verlag}, year = {2011}, number = {6566}, pages = {38--49} }
Sun, X.-H. & Chen, Y.	Reevaluating Amdahl's law in the multicore era [BibTeX]	2010	J. Parallel Distrib. Comput. Vol. 70 (pp. 183-188). Academic Press, Inc.	article	DOI PDF
BibTeX: @article{Sun2010, author = {Sun, Xian-He and Chen, Yong}, title = {Reevaluating Amdahl's law in the multicore era}, journal = {J. Parallel Distrib. Comput.}, publisher = {Academic Press, Inc.}, year = {2010}, volume = {70}, pages = {183--188}, url = {http://dx.doi.org/10.1016/j.jpdc.2009.05.002}, doi = {http://dx.doi.org/10.1016/j.jpdc.2009.05.002} }
Sun, X.-H. & Ni, L. M.	Scalable problems and memory-bounded speedup [BibTeX]	1993	J. Parallel Distrib. Comput. Vol. 19 (pp. 27-37). Academic Press, Inc.	article	DOI PDF
BibTeX: @article{Sun1993, author = {Sun, Xian-He and Ni, Lionel M.}, title = {Scalable problems and memory-bounded speedup}, journal = {J. Parallel Distrib. Comput.}, publisher = {Academic Press, Inc.}, year = {1993}, volume = {19}, pages = {27--37}, url = {http://portal.acm.org/citation.cfm?id=163567.163571}, doi = {10.1006/jpdc.1993.1087} }
Sun, X.-H. & Ni, L. M.	Another view on parallel speedup [BibTeX]	1990	Proceedings of the 1990 ACM/IEEE conference on Supercomputing (pp. 324-333). Supercomputing '90. IEEE Computer Society Press.	inproceedings	PDF
BibTeX: @inproceedings{Sun1990, author = {Sun, Xian-He and Ni, Lionel M.}, title = {Another view on parallel speedup}, booktitle = {Proceedings of the 1990 ACM/IEEE conference on Supercomputing}, publisher = {IEEE Computer Society Press}, year = {1990}, pages = {324--333}, url = {http://portal.acm.org/citation.cfm?id=110382.110450} }
Theys, M. D., Ali, S., Siegel, H. J., Chandy, M., Hwang, K., Kennedy, K., Sha, L., Shin, K. G., Snir, M., Snyder, L. & Sterling, T.	What Are the Top Ten Most Influential Parallel and Distributed Processing Concepts of the Past Millenium? [Abstract] [BibTeX]	2001	Journal of Parallel and Distributed Computing Vol. 61 (12) (pp. 1827-1841).	article	DOI PDF
Abstract: This is a report on a panel titled "What are the top ten most influential parallel and distributed processing concepts of the last millennium?" that was held at the IEEE Computer Society sponsored "14th International Parallel and Distributed Processing Symposium (IPDPS 2000)." The panelists were chosen to represent a variety of perspectives and technical areas. After the panelists had presented their choices for the top ten, an open discussion was held among the audience and panelists. At the end of the discussion, a ballot was distributed for the audience to vote on the top ten concepts (in arbitrary order). The voting identified the following ten most influential parallel and distributed processing concepts of the last millennium: (1) Amdahl's law and scalability, (2) Arpanet and Internet, (3) pipelining, (4) divide and conquer approach, (5) multiprogramming, (6) synchronization (including semaphores), (7) load balancing, (8) message passing and packet switching, (9) cluster computing, and (10) multithreaded (lightweight) program execution.
BibTeX: @article{Theys2001, author = {Mitchell D. Theys and Shoukat Ali and Howard Jay Siegel and Mani Chandy and Kai Hwang and Ken Kennedy and Lui Sha and Kang G. Shin and Marc Snir and Larry Snyder and Thomas Sterling}, title = {What Are the Top Ten Most Influential Parallel and Distributed Processing Concepts of the Past Millenium?}, journal = {Journal of Parallel and Distributed Computing}, year = {2001}, volume = {61}, number = {12}, pages = {1827--1841}, url = {http://www.sciencedirect.com/science/article/B6WKJ-457CHPT-J/2/e559b73a969804215c6e4478ef5140fc}, doi = {10.1006/jpdc.2001.1767} }
Woo, D. H. & Lee, H.-H. S.	Extending Amdahl's Law for Energy-Efficient Computing in the Many-Core Era [Abstract] [BibTeX]	2008	Computer Vol. 41 (12) (pp. 24-31).	article	DOI PDF
Abstract: An updated take on Amdahl's analytical model uses modern design constraints to analyze many-core design alternatives. The revised models provide computer architects with a better understanding of many-core design types, enabling them to make more informed tradeoffs.
BibTeX: @article{Woo2008, author = {Dong Hyuk Woo and Lee, H.-H. S.}, title = {Extending Amdahl's Law for Energy-Efficient Computing in the Many-Core Era}, journal = {Computer}, year = {2008}, volume = {41}, number = {12}, pages = {24--31}, doi = {10.1109/MC.2008.494} }
Wulf, W. A. & McKee, S. A.	Hitting the Memory Wall: Implications of the Obvious [BibTeX]	1995	Computer Architecture News Vol. 23 (pp. 20-24).	article	PDF
BibTeX: @article{Wulf1995, author = {Wm. A. Wulf and Sally A. McKee}, title = {Hitting the Memory Wall: Implications of the Obvious}, journal = {Computer Architecture News}, year = {1995}, volume = {23}, pages = {20--24} }
Yao, E., Bao, Y., Tan, G. & Chen, M.	Extending Amdahl's law in the multicore era [BibTeX]	2009	SIGMETRICS Perform. Eval. Rev. Vol. 37 (pp. 24-26). ACM.	article	DOI PDF
BibTeX: @article{Yao2009, author = {Yao, Erlin and Bao, Yungang and Tan, Guangming and Chen, Mingyu}, title = {Extending Amdahl's law in the multicore era}, journal = {SIGMETRICS Perform. Eval. Rev.}, publisher = {ACM}, year = {2009}, volume = {37}, pages = {24--26}, url = {http://doi.acm.org/10.1145/1639562.1639571}, doi = {http://doi.acm.org/10.1145/1639562.1639571} }