<!doctype linuxdoc system>
<article>
<title>The NetWinder Rescue-HOWTO
<author>Ralph Siemsen, <tt>ralphs@netwinder.org</tt>
<date>$Revision: 1.5 $, $Date: 2000/11/19 17:15:26 $

<abstract> 
This document explains how to restore a NetWinder using the `Rescue
Filesystem' that is now being shipped on all units.  This is expected to
replace, for most cases, the other methods of rescue (which are described in
the <url url="Disk-Update-HOWTO.html">).

This document also contains a section on troubleshooting common
hardware and software problems on the NetWinder.
</abstract>

<toc>

<sect>Introduction<p>

Recovering a NetWinder to "factory" conditions has traditionally been a
difficult task, since it required a second computer, configured to act as a
boot server for the NetWinder.  Although this is a flexible solution for
technical users, it can be quite difficult for novices to get it to work. 
Thus, the OfficeServer includes a `Rescue' partition which eliminates the
need for a boot server.  This document explains how to use the rescue
partition to recover a NetWinder to factory condition.<p>

This document therefore is aimed primarily at users of the OfficeServer
product, though owners of the Developer model can also benefit.  The rescue
partition software began shipping on all models in October 1999. 
For machines shipped prior to that date, the rescue software may be
retro-fitted (see chapter 3).<p>

<sect1>Other sources of information<p>

A lot of NetWinder-specific information can be found in my home page at
<url url="http://www.netwinder.org/~ralphs/"> including a number of other
HOWTO's on disk images, kernel and firmware installation and usage.<p>

There is also a wealth of information in the general Linux HOWTO's, most of
which apply directly to the NetWinder as well.  They can be found in many
places on the net, including <url
url="http://www.linux.org/help/howto.html">.  I'd particularly recommend the
Ethernet-HOWTO, the NET-3-HOWTO, and (if networking is all new to you), the
Networking-HOWTO.  Actually all of them :)<p>

<sect>Using the rescue partition<p>

This chapter explains the various ways that the rescue partition can be used
to recover a NetWinder back to its factory state.  Keep in mind that this is
normally a `last resort' measure for fixing your system; often you can more
easily repair the damage in other ways.  The rescue partition can be used as
an emergency boot device, which allows you to go and fix stuff on the main
partition.<p>

<sect1>Overview<p>

All recent NetWinder machines include a small (10 MB) rescue partition, that
contains enough software to reformat the NetWinder's internal hard disk, and
then to reinstall the normal software load.  Naturally, an image of all the
files on the hard disk is also necessary; for OfficeServer this image is
included on the CDROM in the <tt>rescue</tt> folder.<p>

In the most common scenario, the OfficeServer CDROM is placed into a PC that
has network access, so that the NetWinder can retrieve the data from the
CDROM via the network.  It is necessary for the PC to have enabled file
sharing, and a `shared folder' for the CDROM has to be created.<p>

The NetWinder rescue partition is then booted, and networking is configured
so that the PC can be reached.  There are a series of scripts to guide you
through the process of formatting and then mounting the NetWinder hard disk. 
Then, the drive image is retrieved from the CDROM on the PC and installed on
the NetWinder hard disk.  (There is also the option to fetch the drive image
via FTP).<p>

The following sections describe the process in greater detail.<p>

<sect1>Booting the rescue partition<p>

When you need to use the NetWinder's rescue partition, here are the steps to
access it.  You'll need to connect a keyboard and monitor to the NetWinder
to carry out this rescue process.<p>

<enum>
<item> Turn on your NetWinder (or reboot it, if it was running)
<item> Stop the autoboot sequence by pressing a key as the NetWinder boots
(e.g. when it says `Press any key to abort autoboot')
<item> Type the following commands at the firmware prompt
<verb>
	setenv kerndev /dev/hda4
	setenv rootdev /dev/hda4
	boot
</verb>
</enum>

That will do it: the NetWinder will now boot from the rescue partition.  In
a short time, a shell prompt will appear, along with a message telling you to
run <tt>netconfig</tt> to configure the network.<p>

<sect1>Configuring networking<p>

The <tt>netconfig</tt> script will allow you to set up a network interface. 
It will ask a number of questions about your network, such as the IP address
and netmask to be used.  Some options, like DNS servers and gateways, are
not required if your rescue computer is on the same subnet.<p>

The <tt>netconfig</tt> script will ask you which interface to use. 
Normally, the OfficeServer uses <tt>eth1</tt> (the 10/100-base-T port) for
its internal gateway.  So that is generally the one you would select.  Then
give an IP address and a netmask.  The script will try to compute the
broadcast address for you.<p>
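
If you want to double-check the value it suggests, the broadcast address is
simply the IP address with all of the host bits (the zero bits of the
netmask) turned on.  Here is a rough sketch of that arithmetic as a shell
function.  Note that <tt>broadcast</tt> is a name made up for illustration
(it is not one of the rescue scripts), and that it assumes a shell with
<tt>$(( ))</tt> arithmetic.  The rescue partition's shell may not have that,
so try it on another machine:<p>

<verb>
	# broadcast IP NETMASK  (hypothetical helper, both in dotted-quad form)
	broadcast() {
		oldifs=$IFS
		IFS=.
		set -- $1 $2		# split both arguments into 8 numbers
		IFS=$oldifs
		# OR each octet of the address with the inverted netmask octet
		echo "$(( $1 | (255 - $5) )).$(( $2 | (255 - $6) )).$(( $3 | (255 - $7) )).$(( $4 | (255 - $8) ))"
	}

	broadcast 192.168.1.10 255.255.255.0	# prints 192.168.1.255
</verb>

The address and netmask shown are only examples; use the values for your own
network.<p>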

If you normally operate using DHCP, you'll have to `guess' a free IP address
to be used during this rescue boot.  Go to some other computer on your
network, check out what its IP address is, and then add one or two to the
number.  You can use <tt>ping</tt> or other tools to verify that the address
is free for use.  Then enter the free address into the NetWinder's script.<p>

It is a good idea to test the network connection once it's been configured. 
From the NetWinder you can try to <tt>ping</tt> another machine on your
network.  DNS name resolution might not work, but numeric IP's should. Note
that the rescue partition shell does not support job control, which means
you cannot abort a <tt>ping</tt> with <tt>CTRL-C</tt>.  Instead, you have to
use <tt>ping -c 5 aa.bb.cc.dd</tt> which tells ping to only try 5 times.<p>

<sect1>Now what?<p>

At this point, there are five possible options for re-imaging the
NetWinder's hard disk.  Three of them are quite common:<p>

<enum>
<item><tt>mountsmb</tt> is used if the rescue image is going to be loaded
from a Windows 95/98/NT computer on your network,
<item><tt>mountnfs</tt> is used if the rescue image is going to be loaded
via NFS from a unix system on your network, and
<item><tt>ftprescue</tt> is used if the rescue image will be downloaded by
FTP from an FTP server.
</enum>

In some cases, instead of connecting from the NetWinder to a rescue server,
you'll want to turn the NetWinder into a server so that other computers can
connect to it.  If this seems like the same thing to you, then don't worry
about it, and ignore the following options:<p>

<enum>
<item><tt>nfsserver</tt> turns your NetWinder into an NFS server, with the
root filesystem exported to the whole network,
<item><tt>smbserver</tt> similarly turns the NetWinder into a Samba server,
so that other (Windows) clients can connect to it.
</enum>

These options are described further in the following sections.  Two more
helpful scripts are also used: <tt>wipefs</tt>, which erases the
hard disk, and <tt>mountfs</tt>, which mounts the partitions in preparation
for the untarring of the disk image.<p>

<sect1>Using <tt>mountsmb</tt><p>

This is the option that most people will use.  It requires that you have a
computer running Windows on your network.  You place the OfficeServer CD-ROM
into this machine and allow the CD to be shared across the network.  Click
on `My Computer', then right-click on the CD-ROM icon.  A menu will appear,
select `Properties' and then click on the `Sharing' tab.  Turn on sharing
and give it a name, for example, `CDROM'.<p>

On the NetWinder, you should now run the <tt>mountsmb</tt> script.  It will
ask for the name of the Windows computer (if you don't know what it is, then
go to the Windows machine, right-click on `Network Neighborhood' and then
click the `Identification' tab).  Next, you'll be prompted for the name of
the share (`CDROM' in the example above).  Finally, you should enter the
username (which matches the name you used to log into Windows).  The
NetWinder will then try to establish the connection to the Windows
machine.<p>

If the connection fails, you'll have to check your settings carefully and
try again.  Make sure the network cables are plugged in and that you can
<tt>ping</tt> the Windows computer from your NetWinder, and vice-versa.  Try
entering the computer name and share name in uppercase, as some Windows
systems seem to want it that way.  If your DNS server is dodgy or
nonexistent, then you'll need to use the IP address of the Windows machine
in place of its name.<p>
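
One way to narrow things down, assuming you have another Linux machine handy
with the Samba client tools installed, is to ask the Windows machine for its
list of shares.  The computer name <tt>MYPC</tt> and the username
<tt>fred</tt> below are made-up examples; substitute your own:<p>

<verb>
	smbclient -L MYPC -U fred
</verb>

If the share you created (`CDROM' in the example above) does not show up in
the list, then the problem is most likely on the Windows side rather than on
the NetWinder.<p>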

Once the mount is successful, then the contents of the CDROM should be
visible on the NetWinder.  To verify, type <tt>ls -l /mnt/rescue</tt>.  You
should see a directory called `recovery' (or `Recovery') and inside that
directory, the OfficeServer disk image.  You can now skip down to the <ref
id="Actual installation"> section to complete the process.<p>

<sect1>Using <tt>mountnfs</tt><p>

If you have other computers on your network that run a Linux or some other
UNIX-like operating system, then this option is the one to use.  Place the
CDROM into the drive and then do whatever is necessary to mount and share
the CD to the network.  For Linux, this would mean mounting the disk
(<tt>mount /dev/cdrom /mnt/cdrom</tt>) and then editing the
<tt>/etc/exports</tt> file to allow the <tt>/mnt/cdrom</tt> directory to be
shared.  And then the NFS service would need to be restarted.<p>
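
As a rough sketch of what that might look like, assuming your network is
192.168.1.x (an example only, adjust to suit) and that your server has the
usual <tt>exportfs</tt> tool, something like the following should do.  Some
distributions will want the NFS init script restarted instead:<p>

<verb>
	# line to add to /etc/exports (read-only export to the local subnet):
	/mnt/cdrom   192.168.1.0/255.255.255.0(ro)

	# then, as root, re-export everything listed in /etc/exports:
	exportfs -ra
</verb>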

On the NetWinder, the <tt>mountnfs</tt> script will prompt you for the IP
address (or name) of the rescue server, and the name of the share (e.g.
<tt>/mnt/cdrom</tt>).  It will then try to mount the volume so that it can
be accessed on the NetWinder as <tt>/mnt/rescue</tt>.<p>

If the mount fails, check the network cables, IP addresses, and the settings
on your server.  Try mounting the server from elsewhere on your network, to
see if it is correctly configured.  Often you have to restart both NFS and
portmap services on the server.  Try ping tests to verify that the NetWinder
can talk to the server.<p>
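
If you have a second Linux or UNIX machine available, the following commands
can help to check the server from the outside.  This is just a sketch: the
address 192.168.1.5 stands in for your own server, and it assumes the
standard <tt>showmount</tt> and <tt>rpcinfo</tt> tools are installed on the
machine you run them from:<p>

<verb>
	showmount -e 192.168.1.5	# list the directories the server exports
	rpcinfo -p 192.168.1.5		# check that portmapper, mountd and nfs are registered
</verb>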

Once the mount is successful, then the contents of the CDROM should be
visible on the NetWinder.  To verify, type <tt>ls -l /mnt/rescue</tt>.  You
should see a directory called `recovery' (or `Recovery') and inside that
directory, the OfficeServer disk image.  You can now skip down to the <ref
id="Actual installation"> section to complete the process.<p>

<sect1>Using <tt>ftprescue</tt><p>

To be written.<p>

<sect1>Using <tt>nfsserver</tt><p>

To be written.<p>

<sect1>Using <tt>smbserver</tt><p>

To be written.<p>

<sect1>Actual installation<p>

<label id="Actual installation">At this point, the new disk image you want
to install should be mounted under <tt>/mnt/rescue</tt> somewhere, and you
should know the exact path and filename.  Since the CDROM's have the old DOS
limitations on filenames, you may find that the image is called something
strange, like <tt>os-1_0_2~.gz</tt> when really it should be something more
meaningful like <tt>os-1.0-2.tar.gz</tt>.  In the following examples, just
substitute the actual filename for the examples listed.<p>

You can now proceed to erase the <tt>hda1</tt> and <tt>hda3</tt> partitions
and then to transfer, via the network, the new disk image on to the empty
partitions.  Two scripts are provided to facilitate this process:
<tt>wipefs</tt> is used to clear the two disk partitions, and
<tt>mountfs</tt> sets the partitions up so they can be accessed from
<tt>/mnt/hdroot</tt>.<p>

<em>Note:</em> there is a bit of a bug in the early versions of the rescue
system.  If you type <tt>cat /proc/version</tt> and it reports linux version
2.2.9-3, then you will likely have trouble with formatting the two
partitions.  The format command (<tt>mke2fs</tt>) will fail randomly with
a `memory violation' error.  If this happens to you, your options are to
replace the kernel with a newer version (2.2.12), or to repeat the command
until it succeeds, or to use <tt>rm -rf</tt> to delete all the files instead
of <tt>mke2fs</tt>.<p>
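
If you choose to simply repeat the command, a small loop can save some
typing.  This is only a sketch: <tt>retry</tt> is a made-up helper and not
part of the rescue system, it assumes that the shell you are using supports
functions and <tt>until</tt>, and the usage line is a plain format, which may
not match the exact options that <tt>wipefs</tt> passes to
<tt>mke2fs</tt>:<p>

<verb>
	# retry COMMAND...  (hypothetical helper)
	# run the given command until it succeeds, giving up after 10 failures
	retry() {
		n=0
		until "$@"; do
			n=$((n + 1))
			if [ "$n" -ge 10 ]; then
				echo "giving up after $n attempts"
				return 1
			fi
			echo "attempt $n failed, retrying"
		done
	}

	# example usage:
	#   retry mke2fs /dev/hda1
	#   retry mke2fs /dev/hda3
</verb>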

After you've used <tt>wipefs</tt> and <tt>mountfs</tt>, the new disk image
can be installed directly.  Just to keep you on your toes, we did not
include a script for doing this.  You have to type the commands yourself:<p>

<verb>
	cd /mnt/hdroot
	tar zxvpf /mnt/rescue/recovery/os-1.0-2.tar.gz
</verb>

Adjust the pathname on the <tt>tar</tt> command as necessary to reflect the
actual path and filename where the new image is located.  It is critical to
use the `p' option so that permissions will be set correctly on the files. 
The `v' option can be omitted if you don't want to see the names of the
files scrolling by.<p>

It should take about 15 minutes to copy all the data across.  Once it's
done, you should wait a little longer (30 seconds or so) to let the data be
flushed to disk.  Then, type <tt>exit</tt>, and wait until the message appears
that it's safe to shut down.  Then press the reset button to reboot.  At this
point, the new image will be loaded and hopefully all will be well.<p>
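
Assuming the <tt>sync</tt> command is present on your rescue partition (it
may not be, in which case just wait as described above), you can also run it
once <tt>tar</tt> has finished.  It asks the kernel to write out any data
that is still buffered in memory:<p>

<verb>
	sync
</verb>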

<sect>Rescue partition installation<p>

This chapter explains how to install and use the `rescue partition'
software package.  NetWinder OfficeServer and DM models shipped after
October 1999 include this software package by default; older systems need to
be retrofitted (or sent back for upgrade) in order to make use of the new
package.<p>

<sect1>Do I already have it?<p>

If you've received your machine after October 1999, then you should already
have the rescue package installed on your system.  To be sure, there are two
things to check.  As <tt>root</tt>, run the command <tt>fdisk -l
/dev/hda</tt>.  This will list the current partition table, which should
look something like this:<p>

<verb>
	   Device Boot    Start      End   Blocks   Id  System
	/dev/hda1             1     3895  1963048+  83  Linux native
	/dev/hda2          3896     4026    66024   82  Linux swap
	/dev/hda3          4027     7921  1963080   83  Linux native
	/dev/hda4          7922     7944    11592   83  Linux native
</verb>

The rescue partition is <tt>/dev/hda4</tt>, and it's just a bit over 11 Megs
in size.  This is a pretty sure sign that you have the image, or at least,
you have the space for the rescue image.  To verify that the data is
actually there, you need to mount the partition (temporarily):<p>

<verb>
	mount /dev/hda4 /mnt
	cd /mnt
	ls
</verb>

If the <tt>mount</tt> command fails with `You must specify the filesystem
type' then <tt>/dev/hda4</tt> probably is not formatted and therefore does
not contain the rescue image.  Otherwise, you should see a fairly standard
directory structure listed:

<verb>
	bin   dev  lib         mnt   sbin  usr
	boot  etc  lost+found  proc  tmp   var
</verb>

If you see these directories, then you're all set.  Note that from time to
time, the rescue package will be updated, so it's a good idea to
periodically install a newer version anyways.  There currently isn't a way
to find out which version of the rescue package you have installed, but in
the future, we'll include a <tt>README</tt> file in the root directory
(shown above) that will tell you which version you are looking at.<p>

<sect1>Installing the image<p>

The following steps explain how to install the rescue image onto your system
(or how to upgrade to a newer rescue image; it's the same procedure).  I'm
assuming that you do actually have a <tt>/dev/hda4</tt> partition of at
least 10 Megs.  See below for advice if you do not have this partition.<p>

To install or update the rescue image on <tt>/dev/hda4</tt>, follow these
steps:<p>

<enum>
<item> Download the latest rescue image by anonymous FTP from
  <url url="ftp://ftp.netwinder.org/pub/netwinder/images/">.  The filename
  is <tt>rescue.tar.gz</tt> or there may be a newer version.
<item> Log in as <tt>root</tt> or use the <tt>su -</tt> command to become
root.
<item> If you had previously mounted the partition, unmount it with the
command <tt>umount /dev/hda4</tt>.
<item> Format the hda4 partition, then mount it on
<tt>/mnt</tt>:
<verb>
	mke2fs  /dev/hda4
	mount /dev/hda4 /mnt
</verb>
<item> Change directory to the mount point, and untar the rescue image.
<verb>
	cd /mnt
	tar zxvpf /root/rescue.tar.gz
</verb>
</enum>

You will of course need to adjust the pathname on the <tt>tar</tt> command
to reflect the location where you downloaded the rescue image.
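
Once <tt>tar</tt> has finished, it is tidy to leave the directory and unmount
the partition again, so that the rescue image is not left mounted on
<tt>/mnt</tt>.  A minimal sketch, assuming you mounted it on <tt>/mnt</tt> as
in the steps above:<p>

<verb>
	cd /
	umount /mnt
</verb>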

<sect1>If you don't have <tt>/dev/hda4</tt><p>

If you have an older system where the disk is already fully allocated to
partitions 1 through 3, then it's a bit difficult to install the rescue
system.  I would recommend using one of the other rescue methods, which are
described in the <url url="Disk-Update-HOWTO.html">.  Instead of installing
the full disk image, though, you can repartition the drive and install the
rescue package only.  Then the rescue package can be used to reinstall
everything else.<p>

Another option is to try and merge two partitions together.  If there is
enough space free, then you can copy e.g. <tt>/dev/hda3</tt> over to
<tt>/dev/hda1</tt>, and then can safely split 10MB or so off from 
<tt>/dev/hda3</tt> to be used as the rescue partition.  Sadly, there is no
way to resize an <em>ext2</em> partition without erasing the data on it. 
(There is <em>fips</em>, but that only works for DOS partitions).<p>

Supposing you want to try this, then the first thing to do would be to run
<tt>df</tt> to check how much disk space is available.  It should look
roughly like so:<p>

<verb>
	Filesystem   1k-blocks      Used Available Use% Mounted on
	/dev/hda1      1477028    301819   1098880  22% /
	/dev/hda3      1521792   1151033    292110  80% /usr
</verb>

In this case, there are about 1.15 Gig on <tt>hda3</tt> and only 1.09 Gig of
space remaining on <tt>hda1</tt>, so it won't fit on <tt>hda1</tt>.  It
could be copied the other way (making <tt>hda3</tt> the root filesystem) but
in that case you'd need to carefully adjust <tt>/etc/fstab</tt> to reflect
the fact that the root filesystem is then on <tt>/dev/hda3</tt>, and
remember to delete <tt>/etc/mtab</tt> before shutting down.<p>

To copy the data between the partitions, you would use the following series
of commands.  Note that in my case, <tt>/dev/hda3</tt> was mounted as
<tt>/usr</tt> (as indicated in the output from <tt>df</tt> above).  On the
older systems, it was mounted on <tt>/home</tt> instead.  If that is the
case for you, then substitute <tt>home</tt> for <tt>usr</tt> below.<p>

<verb>
	umount /dev/hda3
	mount /dev/hda3 /mnt
	cp -avx /mnt/. /usr
	umount /dev/hda3
</verb>

Now you have to edit <tt>/etc/fstab</tt> and comment out (with the <tt>#</tt>
character) the line that begins with <tt>/dev/hda3</tt> (You don't have to
do this if you plan to move everything right back again, after having
re-partitioned.  Just don't reboot in the meantime).<p>
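
For illustration, a commented-out entry would look something like the line
below.  The fields after the mount point are typical values only, and may
well differ on your system; leave them as you find them and just add the
<tt>#</tt> at the front:<p>

<verb>
	# /dev/hda3    /usr    ext2    defaults    1 2
</verb>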

You can then safely split <tt>/dev/hda3</tt> into two smaller pieces, using
<tt>fdisk /dev/hda</tt>.  First delete the entry for partition 3, then
create a new primary partition 3.  When prompted for the size, put in 10 MB
less than you have left.  You can either do the math (total cylinders
divided by the total drive size times 10 MB) or just fiddle by trial and
error.<p>
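
As a worked example of the math, take the <tt>fdisk -l</tt> listing shown
earlier in this chapter, where <tt>/dev/hda3</tt> spans cylinders 4027 to
7921 (3895 cylinders) and holds 1963080 blocks of 1k each.  Your own drive
will have different numbers, so treat this only as a sketch of the
calculation (it assumes a shell with <tt>$(( ))</tt> arithmetic):<p>

<verb>
	blocks_per_cyl=$(( 1963080 / 3895 ))	# 504 blocks (1k each) per cylinder
	# 10 MB is 10240 blocks; round up to a whole number of cylinders
	cyls=$(( (10240 + blocks_per_cyl - 1) / blocks_per_cyl ))
	echo "$cyls cylinders"			# prints 21 cylinders
</verb>

So on a drive with this geometry, the new partition 3 would end about 21
cylinders before the end of the old one.<p>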

Then create a 4th primary partition with the remaining 10 MB of space.  Save
the partition table, and format both partitions.  You might also want to
copy the data back over from <tt>/dev/hda1</tt>:<p>

<verb>
	mke2fs /dev/hda3
	mke2fs /dev/hda4
	# Now copy /usr back from /dev/hda1 if desired:
	mount /dev/hda3 /mnt
	cp -avx /usr/. /mnt
	umount /mnt
	rm -rf /usr/.		# Careful with this !!
	mount /dev/hda3 /usr
</verb>

Don't forget to restore the <tt>/etc/fstab</tt> file if you changed it. 
Then you can install the rescue image onto <tt>/dev/hda4</tt> as described
above.<p>

<sect>Troubleshooting<p>

<sect1>Normal boot procedure and terminology<p>

This section describes the stages that the NetWinder goes through when
booting up, from the moment the power is applied until the login prompt
appears.  It also covers the common things that can go wrong.<p>

<sect2>NeTTrom / BIOS<p>

When power is first applied, the first block of flash memory (64k) gets
mapped in and executed.  The first visible action is a quick probe of video
ram, to determine how much memory there is.  The screen is then cleared and
the firmware version number and build date are displayed.  Any logos that
might be found are also rendered, along with the NetWinder logo animation
if it is enabled.  Meanwhile, the remainder of flash memory is read into RAM
and the code therein is decompressed.  There is a red progress meter shown
at the bottom of the screen during this time.  When the decompression is
completed successfully, the screen fades to black, then the decompressed
code is executed.<p>

If the progress meter stops, then flash memory has been corrupted (or bad
data was written to it).  The only way to boot the NetWinder in this case is
to hook up a serial terminal and to download a kernel via the serial
port.  For more details, see section 3.7 of the <url
url="Firmware-HOWTO.html">.<p>

<sect2>Minikernel<p>

The system now boots into a small linux kernel.  The screen clears and
reverts back to text mode.  In older versions, the full boot-up messages
were displayed as the minikernel boots.  In recent versions, only selected
messages are shown to describe the hardware found.  This kernel has the
ability to mount a root filesystem in a variety of ways, as well as to fetch
the main kernel in a variety of ways.  There is a `firmware control menu'
available here.<p>

Normally, the minikernel loads a real kernel from the hard disk.  The
parameters <tt>kerndev</tt> and <tt>kernfile</tt> specify the actual file in
this case (default values are <tt>/dev/hda1</tt> and <tt>/boot/vmlinux</tt>
respectively).<p>
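
For example, if you have changed these values (say, to boot the rescue
partition) and want to go back to the defaults quoted above, you could type
something like this at the firmware prompt.  The <tt>rootdev</tt> value here
is an assumption, based on the <tt>df</tt> output shown earlier where
<tt>/dev/hda1</tt> is mounted as the root filesystem; check it against your
own setup:<p>

<verb>
	setenv kerndev /dev/hda1
	setenv kernfile /boot/vmlinux
	setenv rootdev /dev/hda1
	boot
</verb>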

If an invalid kernel filename is given, the firmware will stop with an error
message.  The root filesystem however is a different matter: since it is not
mounted until the kernel boots, the firmware cannot report if an invalid
value is specified.  So you won't find out until later, when the kernel says
<tt>VFS: Unable to mount root fs</tt> and proceeds to try booting from the
non-existent floppy disk.<p>

<sect2>Second stage NeTTrom<p>

After loading the main kernel into RAM, a reset is performed.  Execution
once again starts in the first block of flash code.  However, this time it
notices that its the second boot.  Quickly, the RAM refresh is turned on and
we jump directly to the main kernel.<p>

If the main kernel is not bootable, the screen will stay dark at this point. 
This can also be caused by having inappropriate args passed from firmware
to the main kernel (in particular, the amount of RAM on the system).  Using
old firmware with a new kernel will generally trigger this condition. 
Please see <url url="http://netwinder.org/~ralphs/compat.html"> for details
on this.<p>

<sect2>Main kernel<p>

The main kernel, generally loaded from disk, then goes through its normal
boot sequence.  Hardware is probed, devices are reported, and eventually
the root filesystem gets mounted.  This could fail, particularly if an NFS
root is being used, for a variety of reasons.<p>

Once the root filesystem is mounted, the kernel tries to start the
<tt>init</tt> program, which will then run through the SysV-style init
process.  It reads <tt>/etc/inittab</tt>, which in turn tells it to run
<tt>/etc/rc.d/rc.sysinit</tt> and then all of the <tt>/etc/rcN.d/S*</tt>
scripts (where N is the current runlevel, as defined in <tt>inittab</tt>). 
Finally, <tt>getty</tt>'s are launched on the various virtual consoles.<p>
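
To make that a bit more concrete, here is roughly what the relevant lines of
<tt>/etc/inittab</tt> look like on a Red Hat style system.  This is an
illustrative excerpt and was not copied from a NetWinder, so the runlevel,
paths and <tt>getty</tt> program on your machine may differ:<p>

<verb>
	id:3:initdefault:
	si::sysinit:/etc/rc.d/rc.sysinit
	l3:3:wait:/etc/rc.d/rc 3
	1:2345:respawn:/sbin/mingetty tty1
</verb>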

<sect>Misc<p>

<sect1>Author<p>

The author and maintainer of the NetWinder Rescue-HOWTO is Ralph
Siemsen (ralphs@netwinder.org).  Please send me any comments, additions,
corrections so that they can be included in the next release.  The latest
version of this document can be obtained from <url
url="http://www.netwinder.org/~ralphs/howto/Rescue-HOWTO.html">.<p>

<sect1>To-do<p>

The `sgml2info' version of this document doesn't show the examples properly
- for some reason the linefeeds are removed.  Why is this and how do I fix
it?<p>

<sect1>History<p>

Sep 21, 1999 (version 1.0): First public release of this document.<p>

Nov 09, 1999 (version 1.1): Reorganization, and significant rewrite.<p>

<sect1>Contributors<p>

Phil Petruzzo (philpe@rebel.com) contributed the section on how to install
and use the rescue partition.<p>

Douglas Paul (douglasp@netwinder.org) put together the rescue partition
software.<p>

<sect1>Legal stuff<p>

This document is copyright (c) Ralph Siemsen, 1999.<p>

Permission is granted to make and distribute copies of this manual
provided the copyright notice and this permission notice are preserved
on all copies.<p>

There is no warranty whatsoever.<p>

</article>
