
Clusters


Copyright (c) 2003 - 2008, DSRLab, LLC

All Rights Reserved

"Open source software in support of distributed systems research."


Clusters:

Cluster computers are making the transition from the laboratory into the realm of everyday computing. Financial institutions use them to serve their interactive customers on a worldwide scale, computational science laboratories employ clusters to perform simulations that were impractical just a decade ago, and Internet Service Providers are building web farms to handle the huge number of transactions they receive.

WHAT IS A CLUSTER?

Cluster is short for cluster computer, which is technically a loosely coupled multicomputer. The term distinguishes it from a tightly coupled multicomputer, better known as a multiprocessor, in which several central processing units (CPUs) share a single, central memory space. So much for vocabulary...

A multiprocessor has two important qualities: it is very fast, and it is very expensive. It is very fast because its several CPUs can each execute a task's instructions independently of the others (true parallel execution). An ordinary computer, like a typical PC, has just one CPU and can at best offer multitasking, where one task executes while the others wait. It is very expensive because it is hard to design a complex system in which many CPUs interact with the same memory, and because the demand for such equipment is high: workloads made up of many simultaneous tasks have a high degree of parallelism and run much faster on a multiprocessor.

It is important to note here that the benefit from a computer's parallel architecture is limited by the degree of parallelism in the software application. Applications without a large number of operations that can execute independently of one another are said to have a low degree of parallelism, and will not benefit from the use of a multiprocessor. In fact, an inherently sequential process may run even more slowly on an expensive multiprocessor, since coordinating the CPUs adds overhead!
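The point above can be put in numbers with Amdahl's law, a standard model that this page does not name but that expresses the same idea. The sketch below is only an illustration: the function name and the sample fractions are made up for the example, and the model ignores coordination overhead, so it shows the best case rather than the slowdown a sequential program may actually suffer.

```python
def amdahl_speedup(parallel_fraction: float, cpus: int) -> float:
    """Best-case speedup on `cpus` processors when `parallel_fraction`
    of the single-CPU run time can be spread across them (Amdahl's law).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    # The serial part always runs on one CPU; only the rest is divided up.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


if __name__ == "__main__":
    # Hypothetical applications with increasing degrees of parallelism.
    for fraction in (0.0, 0.5, 0.9, 1.0):
        speedup = amdahl_speedup(fraction, 8)
        print(f"parallel fraction {fraction:.1f}: {speedup:.2f}x on 8 CPUs")
```

Under this model an application that is only half parallel gains less than a factor of two from eight CPUs, and a fully sequential one gains nothing at all.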

So what is a cluster, exactly?

Back in the mid-1980s, a graduate student at Yale named Kai Li wanted the speed that a multiprocessor gets from parallelism, but without the cost. He investigated the possibility that a network of ordinary computers (a new concept at the time) could combine their individual memory areas into one large virtual memory space. (Virtual memory was a well-developed concept back then, but shared memory across many independent computers was quite a novel idea indeed!) What he built was an inexpensive alternative to the multiprocessor. It ran multiple tasks much faster than a single computer, but not as fast as a multiprocessor. And it had hardware fault tolerance, because a failed node computer could simply be replaced!
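To make the idea of one large memory space spread over several computers more concrete, here is a toy address translation. It is not a description of Kai Li's system: the fixed layout, the page size, and every name in it (`PAGE_SIZE`, `Location`, `locate`) are assumptions chosen for illustration.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # bytes per page; an assumed value for illustration


@dataclass(frozen=True)
class Location:
    node: int    # which computer holds the page
    page: int    # page index within that node's local memory
    offset: int  # byte offset within the page


def locate(address: int, pages_per_node: int, node_count: int) -> Location:
    """Translate an address in the shared space to a node-local location.

    Assumes a fixed layout: node 0 holds the first `pages_per_node` pages
    of the shared space, node 1 holds the next block, and so on.
    """
    if address < 0:
        raise ValueError("address must not be negative")
    global_page, offset = divmod(address, PAGE_SIZE)
    node, local_page = divmod(global_page, pages_per_node)
    if node >= node_count:
        raise ValueError("address lies outside the shared space")
    return Location(node, local_page, offset)
```

A real shared virtual memory system must also move or copy pages between nodes and keep the copies consistent, which this sketch leaves out entirely.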

If you are new to the concept of a computer architecture, have a look at our historical introduction, Computer Architecture 101. (24k pdf)

PC CLUSTER:

A 5-PC Cluster

If an expensive blade server is beyond your budget, you can build a Linux-based cluster from a handful of ordinary PCs. It helps if they are homogeneous (all nearly identical) and on a reasonably high-speed network (100 Mbps). Above is a picture of our test bed, built from five identical PCs. Each PC can run either the latest 2.6 kernel or an older 2.4 version of Linux. They are on an isolated 100 Mbps switched network, along with a sixth PC that simulates an external pool of clients and serves as a performance monitor.

That's the hardware part of it, anyway. Some operating system files have to be configured so that each node computer knows about each of the others. And finally, you will need some sort of parallel software application that can benefit from this kind of computer architecture.
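This page does not say which operating system files are meant. One common place where a Linux machine learns the names and addresses of its peers is /etc/hosts, so the sketch below generates entries in that file's format for a small cluster. The node names, the subnet, and the helper `hosts_entries` are all hypothetical, and the output is meant to be reviewed by hand, not written to a system file automatically.

```python
from typing import List


def hosts_entries(node_count: int, prefix: str = "node",
                  subnet: str = "192.168.1.", first_host: int = 101) -> List[str]:
    """Build /etc/hosts-style lines ("address<TAB>name") for each node.

    The naming scheme and the address range are assumptions for this
    example; substitute the values used on your own isolated network.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if first_host < 1 or first_host + node_count - 1 > 254:
        raise ValueError("host addresses must stay within 1..254")
    return [f"{subnet}{first_host + i}\t{prefix}{i + 1}"
            for i in range(node_count)]


if __name__ == "__main__":
    # Five nodes, matching the size of the test bed described above.
    print("\n".join(hosts_entries(5)))
```

Every node would carry the same set of lines, so that each computer can reach each of the others by name.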

How do you configure those files, and where do you get that kind of software? Have a look at the book described below. Once your cluster is operational, try some of the free software we offer here and on our Software page. It's all open source, which will hopefully encourage you to experiment!


BOOKS:

Linux Cluster Architecture - This book leads the reader through the design and development of a Linux cluster using PCs and networking hardware. It shows you the C calls and OS functions you need to develop networking software, and how to measure and tune the system's performance. Sams Publishing, 2002. You may also download a PDF of the LUG presentation slides from our Literature page.

Linux Cluster Architecture - Chinese Edition, China Machine Press, 2003.