
High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI

By Joseph D. Sloan
Publisher : O'Reilly
Pub Date : November 2004
ISBN : 0-596-00570-9
Pages : 360

This new guide covers everything you need to plan, build, and deploy a high-performance Linux cluster. You'll learn about planning, hardware choices, bulk installation of Linux on multiple systems, and other basic considerations. Learn about the major free software projects and how to choose those that are most helpful to new cluster administrators and programmers. Guidelines for debugging, profiling, performance tuning, and managing jobs from multiple users round out this immensely useful book.

Copyright © 2005 O'Reilly Media, Inc. All rights reserved.

Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles. For more information, contact our corporate/institutional sales department: (800) 998-9938 or

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. The Linux series designations, High Performance Linux Clusters with OSCAR, Rocks, openMosix, and MPI, images of the American West, and related trade dress are trademarks of O'Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

Preface

Clusters built from open source software, particularly those based on the GNU/Linux operating system, are increasingly popular. Their success is not hard to explain: they can cheaply solve an ever-widening range of number-crunching problems. A wealth of open source or free software has emerged to make it easy to set up, administer, and program these clusters. Each individual package is accompanied by documentation, sometimes very rich and thorough. But knowing where to start and how to get the different pieces working together proves daunting for many programmers and administrators.

This book is an overview of the issues that new cluster administrators have to deal with in making clusters meet their needs, ranging from the initial hardware and software choices through long-term considerations such as performance.

This book is not a substitute for the documentation that accompanies the software that it describes. You should download and read the documentation for the software. Most of the documentation available online is quite good; some is truly excellent.

In writing this book, I have evaluated a large number of programs and selected for inclusion the software I believe is the most useful for someone new to clustering. While writing descriptions of that software, I culled through thousands of pages of documentation to fashion a manageable introduction. This book brings together the information you'll need to get started. After reading it, you should have a clear idea of what is possible, what is available, and where to go to get it. While this book doesn't stand alone, it should reduce the amount of work you'll need to do. I have tried to write the sort of book I would have wanted when I got started with clusters.

The software described in this book is freely available, open source software. All of the software is available for use with Linux; however, much of it should work nicely on other platforms as well. All of the software has been installed and tested as described in this book. However, the behavior or suitability of the software described in this book cannot be guaranteed. While the material in this book is presented in good faith, neither the author nor O'Reilly Media, Inc. makes any explicit or implied warranty as to the behavior or suitability of this software. We strongly urge you to evaluate the software and information provided in this book as appropriate for your own circumstances.

One of the more important developments in the short life of high performance clusters has been the creation of cluster installation kits such as OSCAR and Rocks. With software packages like these, it is possible to install everything you need and very quickly have a fully functional cluster. For this reason, OSCAR and Rocks play a central role in this book.

OSCAR and Rocks are composed of a number of different independent packages, as well as customizations available only with each kit. A fully functional cluster will have a number of software packages each addressing a different need, such as programming, management, and scheduling. OSCAR and Rocks use a best-in-category approach, selecting the best available software for each type of cluster-related task. In addition to the core software, other compatible packages are available as well. Consequently, you will often have several products to choose from for any given need.

Most of the software included in OSCAR or Rocks is significant in its own right. Such software is often nontrivial to install and takes time to learn to use to its full potential. While both OSCAR and Rocks automate the installation process, there is still a lot to learn to effectively use either kit. Installing OSCAR or Rocks is only the beginning.

After some basic background information, this book describes the installation of OSCAR and then Rocks. The remainder of the book describes in greater detail much of the software found in these packages. In each case, I describe the installation, configuration, and use of the software apart from OSCAR or Rocks. This should provide the reader with the information he will need to customize the software or even build a custom cluster bypassing OSCAR or Rocks completely, if desired.

I have also included a chapter on openMosix in this book, which may seem an odd choice to some. But there are several compelling reasons for including this information. First, not everyone needs a world-class high-performance cluster. If you have several machines and would like to use them together, but don't want the headaches that can come with a full cluster, openMosix is worth investigating. Second, openMosix is a nice addition to some more traditional clusters. Including openMosix also provides an opportunity to review how to recompile the Linux kernel, and it supplies an alternative kernel that can be used to demonstrate OSCAR's kernel_picker. Finally, I think openMosix is a really nice piece of software. In a sense, it represents the future, or at least one possible future, for clusters.

I have described in detail (too much, some might say) exactly how I have installed the software. Unquestionably, by the time you read this, some of the information will be dated. I have decided not to follow the practice, common among authors in such situations, of offering just vague generalities. I feel that readers benefit from seeing the specific sorts of problems that appear in specific installations and how to think about their solutions.

Audience

This book is an introduction to building high-performance clusters. It is written for the biologist, chemist, or physicist who has just acquired two dozen recycled computers and is wondering how she might combine them to perform that calculation that has always taken too long to complete on her desktop machine. It is written for the computer science student who needs help getting started building his first cluster. It is not meant to be an exhaustive treatment of clusters, but rather attempts to introduce the basics needed to build and begin using a cluster.

In writing this book, I have assumed that the reader is familiar with the basics of setting up and administering a Linux system. At a number of places in this book, I provide a very quick overview of some of the issues. These sections are meant as a review, not an exhaustive introduction. If you need help in this area, several excellent books are available and are listed in the Appendix of this book.

When introducing a topic as extensive as clusters, it is impossible to discuss every relevant topic in detail without losing focus and producing an unmanageable book. Thus, I have had to make a number of hard decisions about what to include. There are many topics that, while of no interest to most readers, are nonetheless important to some. When faced with such topics, I have tried to briefly describe alternatives and provide pointers to additional material. For example, while computational grids are outside the scope of this book, I have tried to provide pointers for those of you who wish to know more about grids.

For the chapters dealing with programming, I have assumed a basic knowledge of C. For high-performance computing, FORTRAN and C are still the most common choices. For Linux-based systems, C seemed a more reasonable choice.

I have limited the programming examples to MPI since I believe this is the most appropriate parallel library for beginners. I have made a particular effort to keep the programming examples as simple as possible. There are a number of excellent books on MPI programming. Unfortunately, the available books on MPI all tend to use fairly complex problems as examples. Consequently, it is all too easy to get lost in the details of an example and miss the point. While you may become annoyed with my simplistic examples, I hope that you won't miss the point. You can always turn to these other books for more complex, real-world examples.

With any introductory book, there are things that must be omitted to keep the book manageable. This problem is further compounded by the time constraints of publication. I did not include a chapter on diskless systems because I believe the complexities introduced by using diskless systems are best avoided by people new to clusters. Because covering computational grids would have considerably lengthened this book, they are not included. There simply wasn't time or space to cover some very worthwhile software, most notably PVM and Condor. These were hard decisions.

Organization

This book is composed of 17 chapters, divided into four parts. The first part addresses background material; the second part deals with getting a cluster running quickly; the third part goes into more depth describing how a custom cluster can be built; and the fourth part introduces cluster programming.

Depending on your background and goals, different parts of this book are likely to be of interest. I have tried to provide information here and at the beginning of each section that should help you in selecting those parts of greatest interest. You should not need to read the entire book for it to be useful.

Part I, An Introduction to Clusters

Chapter 3. Cluster Hardware

It is tempting to let the hardware dictate the architecture of your cluster. However, unless you are just playing around, you should let the potential uses of the cluster dictate its architecture. This in turn will determine, in large part, the hardware you use. At least, that is how it works in ideal, parallel universes.
