Cluster is short for cluster computer. Technically, a cluster is a
loosely coupled multicomputer, as distinct from a tightly coupled
multicomputer, better known as a multiprocessor, in which several
central processing units (CPUs) share a single, central memory space. So
much for vocabulary...
A multiprocessor has two important qualities: it is very fast, and it is
very expensive. It is very fast because the several CPUs inside it can each
execute a task's instructions independently of the others (true multiprocessing).
An ordinary computer, like a typical PC, has just one CPU and can at best offer
multitasking, where each task takes its turn on the CPU while the others wait. And it
is very expensive because it's hard to design a complex system where many CPUs can
interact with the same memory, and because the demand for such equipment is high:
applications with many simultaneous tasks have a high degree of parallelism,
and run much faster on a multiprocessor.
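To put rough numbers on that difference, here is a small sketch in Python. It is only a model, not a measurement: the function name elapsed_time and the task durations are made up for illustration, the tasks are assumed to be completely independent, and the cost of switching between tasks is ignored.

```python
import heapq

def elapsed_time(task_seconds, cpus):
    """Wall-clock time to finish independent tasks on `cpus` identical CPUs."""
    if cpus < 1:
        raise ValueError("need at least one CPU")
    # Each heap entry is the time at which one CPU becomes free.
    free_at = [0.0] * cpus
    heapq.heapify(free_at)
    # Hand out the longest tasks first, always to the CPU that frees up earliest.
    for seconds in sorted(task_seconds, reverse=True):
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + seconds)
    return max(free_at)

tasks = [4.0, 3.0, 3.0, 2.0]       # hypothetical task lengths, in seconds
print(elapsed_time(tasks, 1))      # one CPU: the tasks take turns
print(elapsed_time(tasks, 4))      # four CPUs: every task gets its own CPU
```

With one CPU the elapsed time is the sum of all the task times; with one CPU per task it shrinks to the length of the longest task.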
It is important to note here that the benefit from a computer's parallel
architecture depends on the degree of parallelism in the software application.
Applications without a large number of operations that can execute independently of
one another are said to have a low degree of parallelism, and will benefit little, if
at all, from the use of a multiprocessor. In fact, an inherently sequential process may
run even more slowly on an expensive multiprocessor!
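One common way to estimate this effect is Amdahl's law, which is not part of the discussion above but fits it well. The sketch below assumes a fraction of the work (parallel_fraction, a name chosen here for illustration) can be spread evenly over the CPUs while the rest must run on one CPU. It ignores coordination overhead, which is what can push a sequential job below a speedup of 1.0 in practice.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Ideal speedup on `cpus` CPUs according to Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cpus < 1:
        raise ValueError("need at least one CPU")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes just as long as before; only the parallel part shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / cpus)

# Hypothetical applications on an 8-CPU machine.
for fraction in (0.95, 0.50, 0.10, 0.0):
    print(fraction, round(amdahl_speedup(fraction, 8), 2))
```

Even in this idealized model, an application that is only 10 percent parallel gains almost nothing from eight CPUs, and a fully sequential one gains nothing at all.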
So what is a cluster, exactly?
Back in the mid-1980s a graduate student at Yale, Kai Li by name, wanted the
speed that a multiprocessor gets from parallelism, but without the cost. He investigated
the possibility that a network of ordinary computers (a new concept at the time)
could pool their individual memory areas into one large virtual memory
space. (Virtual memory was a well-developed concept back then, but shared memory
across many independent computers was quite a novel idea indeed!) What he built
was an inexpensive alternative to the multiprocessor. It ran multiple tasks
much faster than a single computer, but not as fast as a multiprocessor. And it
had hardware fault tolerance, because a failed node computer could simply
be replaced!
If you are new to the concept of a computer architecture, have a look at
our historical introduction, Computer Architecture 101 (24k PDF).
If an expensive blade server is beyond your budget, you can build a Linux-based
cluster from a handful of ordinary PCs. It helps if they are homogeneous (all nearly
identical), and on a reasonably high-speed network (100 Mbps). Above is a picture of our
test bed, built from five identical PCs. Each PC can run either the latest 2.6 kernel or
an older 2.4 version of Linux. They are on an isolated 100 Mbps switched network, along
with a sixth PC that simulates an external pool of clients and acts as a performance monitor.
That's the hardware part of it, anyway. Some operating system files have to be configured
so that each node computer knows about each of the others. And finally, you will
need some sort of parallel software application that can benefit from this kind
of computer architecture.
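As a rough illustration of what "knowing about each of the others" can mean, the sketch below generates name-to-address lines of the kind found in a hosts file such as /etc/hosts. Everything specific in it is an assumption made for the example: the text above does not say which files are involved, and the node names, the starting address, and the function hosts_entries are hypothetical.

```python
import ipaddress

def hosts_entries(first_address, names):
    """Return one 'address<TAB>name' line per node, with consecutive addresses."""
    start = ipaddress.IPv4Address(first_address)
    # Adding an integer to an IPv4Address yields the address that many steps later.
    return [f"{start + offset}\t{name}" for offset, name in enumerate(names)]

# Hypothetical five-node cluster on a private network.
node_names = [f"node{number}" for number in range(1, 6)]
for line in hosts_entries("192.168.1.101", node_names):
    print(line)
```

The same list would be placed on every node, so that each one can reach the others by name.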
How do you configure those files, and where do you get that kind of software? Have a look
at the book described below. And when your cluster is operational, have a look at some of
the free software we offer here and on our Software page.
And it's all open source, which will hopefully encourage you to experiment!
Linux Cluster Architecture - this book leads the reader
through the design and development of a Linux cluster using PCs and networking hardware.
It shows you the C calls and OS functions you need to develop networking software, and how
to measure and tune the system's performance. Sams Publishing, 2002. You may also download
a PDF of the LUG presentation slides from our Literature page.
Linux Cluster Architecture - Chinese Edition, China Machine Press, 2003.