Individual Report

DISTRIBUTED SYSTEM
An Assignment for Operating System module,
in the University of Central England

By:  Harry Sufehmi, student-id #98799231




Introduction
------------

A distributed system is defined by three characteristics
[Mullender, 2]:

1.   It consists of multiple computers
2.   These computers are interconnected
3.   They have a shared state


Distributed.net (http://distributed.net) was founded by Adam
L. Beberg, a graduate of the Illinois Institute of Technology
who focused on operating systems. Distributed.net is a non-
profit academic organisation committed to serving as a
gathering point for topics relating to distributed computing,
the process by which countless computers work together toward
solving a particular problem. It is through the application
of this concept that distributed.net has been able to develop
and refine these techniques, improving on the range, scope,
and variety of tasks which are suitable for this technology.

Most distributed systems are not very large. NFS, for
example, sacrifices scalability in return for more
transparent access. Because its server and clients are so
closely linked, almost every action requires communication
between them, despite the caching schemes implemented.
Distributed.net, on the other hand, seems to follow these
principles faithfully [Mullender, 375]:


1.   Clients have the cycles to burn:
Most of the work is done by the clients. This greatly
enhances the scalability of the system, since adding clients
won't place a heavy burden on the server's resources.

2.   Minimise system-wide knowledge and change:
Distributed.net doesn't make all the clients aware of the
details of the system status - only the most important ones.
For example, when the job was switched from RC5-64 cracking
to DES-II cracking, each client recognised the tiny signal
enveloped in the packet stream when it connected, and
automatically switched jobs.

3.   Trust the fewest possible entities:
The clients don't bother with other clients; they are
concerned only with the keyserver they connect to. Therefore
an Internet connection (an unsecured link) is sufficient,
since the data is simply encrypted at both ends.

4.   Cache whenever possible:
The amount of cache is configurable. For a slow computer, or
one connected to the Internet 24x7, a low setting is
suggested, so that when the key is cracked the client can
quickly notify the keyserver. In contrast, a computer with a
non-24x7 or unreliable connection, or a very fast computer,
is advised to build up a large cache; that way, losing
contact with the keyserver for long periods won't disrupt the
continuity of the client's work. For the fast computer, a
large cache also keeps traffic low, reducing the possibility
of overwhelming the keyserver.

5.   Batch whenever possible:
The only connections made between the keyserver and a client
occur when the client downloads new blocks to be cracked, or
when it submits the blocks that have been cracked. No
constant communication is required.
Moreover, a proxy server can sometimes be inserted between
the clients and the keyserver. The proxy serves many clients,
and only when it reaches its threshold does it download or
upload blocks to the keyserver, further reducing the number
of connections to and from the keyserver.
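The caching and batching principles above can be sketched as a
small client loop. This is only an illustration in Python; the
names (Client, FakeKeyserver, crack) are my own inventions and
do not reflect distributed.net's actual protocol:

```python
from collections import deque

def crack(block):
    # Stand-in for exhaustively testing every key in a block.
    return ("result", block)

class FakeKeyserver:
    """Hypothetical stand-in for the real keyserver."""
    def __init__(self):
        self.next_block = 0
        self.received = []

    def fetch_block(self):
        self.next_block += 1
        return self.next_block

    def submit(self, results):
        self.received.extend(results)

class Client:
    """Caches blocks of work and batches its uploads (principles 4 and 5)."""
    def __init__(self, keyserver, cache_size=10):
        self.keyserver = keyserver
        self.cache_size = cache_size   # configurable, as principle 4 suggests
        self.todo = deque()
        self.done = []

    def refill(self):
        # One connection downloads a whole cache of blocks at once.
        while len(self.todo) < self.cache_size:
            self.todo.append(self.keyserver.fetch_block())

    def work(self):
        if not self.todo:
            self.refill()
        block = self.todo.popleft()
        self.done.append(crack(block))          # all CPU work is client-side
        if len(self.done) >= self.cache_size:
            self.keyserver.submit(self.done)    # one batched upload
            self.done = []
```

In this sketch the client contacts the keyserver only once per
cache_size blocks in each direction, which is the behaviour the
two principles describe.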


The result of following these principles is a system that has
reached a new level of scalability. Volunteer participation
in distributed.net is estimated at over 20,000 individuals
from nearly every nation and region. With the combined
resources of as many as 100,000 computers, distributed.net
easily represents the largest academic collaborative
computing effort ever undertaken.



Discussion
----------

Tanenbaum in his book admits that it's quite difficult to
pinpoint the definition of "operating system" [Tanenbaum, 3].
He accepts several definitions, including:

1.   Extended / virtual machine
2.   Resource manager

Under these definitions, it's very easy to see that the
distributed.net software falls into the category of an
operating system, instead of just an application, as
described below:

1.   Virtual machine:
The distributed.net project connects many computers around
the world and sums up all of their computing power by making
them all process a similar task. In this way they can be
thought of as a single huge machine, although in reality they
are scattered all around the earth.

2.   Resource manager:
The distributed.net software also has the capability to
manage (albeit in a somewhat limited way) and utilise the
resources of the clients, including memory, disk space,
network, and especially the processor.

In short, the definition of "operating system" has now begun
to truly mean "that which enables a system to operate",
instead of just "that which enables a type of computer
hardware to operate".

Due to constraints of space, I'll give just a quick summary
of the following points.

The distributed.net software is available for almost all
major operating systems; examples are Windows 3.1, Windows
95, Windows NT, Linux, Sun Solaris, AIX, BeOS, IBM OS/390,
Rhapsody, VMS, Digital Unix, Amiga, OS/2, et cetera. The
client is designed to be small and efficient, and can be
completely hidden from the user's view. In its default
operating mode, "very nice", the client consumes 100% of the
CPU cycles when the computer is not in use, but instantly
releases the CPU at the first sign of user activity, so the
user does not notice any difference at all.

To deter viruses, rc5.distributed.net is used as the only
download site. Also, all of the programmers are trusted, and
they all work from a common code base.

For security, the packets are sent in encrypted form, using a
simple algorithm. In the future, the v3 client (v2 is
currently in use) will use an enhanced encryption key. They
are so confident in it that they even promise to release the
source code, which is a good thing, especially for the
academic community.

The current project of distributed.net is to crack RSA's RC5
64-bit crypto key. This is a massive task to accomplish: at
maximum it will require 1.845 x 10^19 keys to be checked.

Past projects that have been completed successfully include
cracking RSA's 56-bit encryption (requiring at most 72
quadrillion keys to be checked) and the DES II-1 encryption
(in only 40 days).
All this also signals to governments that current standard
encryption (DES, for example, is the US government's
standard) is no longer enough to secure data: even
loosely-organised volunteers can crack it, let alone a
determined foe with massive resources.
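The keyspace figures quoted above follow directly from the key
lengths, as a quick check shows:

```python
# A 64-bit key gives 2**64 possible keys -- about 1.845 x 10^19.
rc5_64_keyspace = 2 ** 64
print(rc5_64_keyspace)   # 18446744073709551616

# A 56-bit key gives 2**56 keys -- roughly 72 quadrillion.
rc5_56_keyspace = 2 ** 56
print(rc5_56_keyspace)   # 72057594037927936
```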

Future projects include, but are not limited to, finding
optimal Golomb rulers, searching for large prime numbers, et
cetera.



Conclusions
-----------

It's breathtaking to imagine bonding together this many
computers, with this many operating systems, on so many
hardware platforms, from so many nations, in so many time
zones, and putting them all to work on a common computing
task. Yet it can be done, has been done, and is still
expanding...

As a crude comparison, by 24 February 1998 the total
computing power of distributed.net's volunteers was
equivalent to 15,316 Sun Ultra 1 workstations.

Of course, this figure is no longer correct at present, since
the member base is growing very quickly, on a daily basis.

I think the ultimate goal to be reached is a transparent
distributed system. Imagine a user in a laboratory about to
run a task that would normally place a very heavy load on the
processor, such as a complex simulation. All he needs to do
is run the distributed server software and mark the
simulation program's pid with a special flag.
The distributed server program would notice this and
automatically spread the task out to other workstations in
the lab that run the distributed client software, possibly on
a different OS but still on the same hardware platform. The
client software would then execute the task in a separate
virtual space, taking up whatever idle CPU time is available.

The conditions that must be met to realise this are:

1. The software must be coded to utilise a multiprocessor
   system according to the OS's standard, or
2. have its most processor-intensive routines spread across
   several threads that can be recognised by the distributed
   server software; and
3. the distributed server software must have hooks into the
   OS's kernel.


Only then could the distributed server software grab all of
the threads and run them across the network instead.
Requirements 1 and 2 would probably be considered
inconvenient, yet they are unavoidable. This is because code
execution at the processor level is sequential, so there is
no way to spread it out automatically, even if we try to
accomplish this in the kernel itself. It can only be done
when the task division happens at the application software
level.
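To illustrate why the division must happen at the application
level, here is a minimal Python sketch of the idea: the
application itself splits its workload into independent
blocks, which a pool of workers can then process in any order.
The thread pool below merely stands in for networked
workstations; all names here are my own, not part of any
existing distributed server software.

```python
from concurrent.futures import ThreadPoolExecutor

def split_keyspace(total_keys, block_size):
    """The application divides its own work into independent (start, size) blocks."""
    return [(start, min(block_size, total_keys - start))
            for start in range(0, total_keys, block_size)]

def check_block(block):
    # Stand-in for a processor-intensive routine; each block is
    # self-contained, so a distributed server could hand it to any
    # idle workstation.
    start, size = block
    return sum(1 for _ in range(start, start + size))  # keys examined

blocks = split_keyspace(1000, 128)
with ThreadPoolExecutor(max_workers=4) as pool:
    checked = sum(pool.map(check_block, blocks))
# Every key is examined exactly once, regardless of which worker
# ran which block.
```

Because the kernel cannot infer these block boundaries from a
sequential instruction stream, the split has to be expressed by
the application itself, which is exactly what requirements 1
and 2 demand.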

This way, there is no need to buy many expensive servers for
the task and set them up in a clustering configuration under
some high-end Unix OS. All we need is some clone PCs
connected via a network; the distributed server software
would then automatically spread out the tasks set by the
user(s) to the PCs, effectively utilising every idle
processor cycle that would otherwise be wasted.

(update: MOSIX is a project towards this goal, and I didn't even
 know of their existence when first writing this report!
 Come visit their website at http://www.mosix.cs.huji.ac.il)

For the time being, though, distributed.net's RC5-64 project
is already a breakthrough in academic research on distributed
computing. I highly advise the University of Central England
to support this cause, especially to fully exploit the idle
processing time of the Sun workstations in the labs. I shall
be glad to assist in any way I can.


Reference List

1. Mullender, Sape (Ed.) (1993), Distributed Systems,
   New York, Addison-Wesley.
2. Hubbard, Charles (1998), RC5-64: Project Bovine FAQ,
   http://www.distributed.net/FAQ/rc564faq.htm,
   Distributed Computing Technologies, Inc.
3. Tanenbaum, Andrew S. (1987),
   Operating Systems: Design and Implementation,
   New Jersey, Prentice-Hall.
4. McNett, David (1998), Press Information,
   http://www.distributed.net/pressroom/presskit.htm,
   Distributed Computing Technologies, Inc.
5. Materials from an interview with Adam L. Beberg
   ([email protected]), founder of Distributed.Net; used
   with his written permission, including for academic
   purposes and public viewing on the Internet.

_____________________________________________________________
Last modified: 27 February 1999