Ralph's Status Page
Mon Jan 04
- Committed sound configuration modifications for the 2.0.35 kernel. The
sound driver can now be configured with "make menuconfig" like the rest of
the kernel.
Tue Jan 5
- CVS setup and introduction for new co-ops
Wed Jan 6
- Modified 2.0.35 kernel to use the standard parameter-passing structure and
tested it out against 2.0.6 firmware. Changes not yet committed, pending
further testing.
Thu Jan 7
- New glibc integration work - several programs fail and need to be
rebuilt because of changes in data structure sizes. Identified a bunch of
remaining a.out binaries (they were in hiding, not reporting as MAGIC).
- Caught up server backups from the Christmas break finally.
Fri Jan 8
- Fighting fires with the NFS image and installation of components for
build #13. Samba p10 seems to conflict with files from rhs-printfilters,
and there are issues with glibc and the /dev entries to be resolved.
- Bought a Lego Mindstorms kit (thanks for twisting my arm San!) so
there's no chance I'll get any real work done this weekend...
Mon Jan 11
- Discovered 2.0.6 firmware problem: amount of RAM is no longer correctly
passed to the main kernel, now that param passing structure is being used.
The kernel falls back on the default of 16M (or 8M, whichever). Woody
thinks it's related to the bank detection code, and may have been broken for
a long time, but never noticed since "mem=..." was always passed on the
command line.
Tues Jan 12
- Made public release of DM build #12, updated documentation on the web site
about this build
- Made internal pre-release of build #13, for QA testing and our own
evaluation.
Wed Jan 13
- Testing the development environment on build #13, discovered problems
with gcc's "fixed" headers, an optimization bug, and missing docs in the
kernel. To be fixed along with other problems as they might show up.
Target of Friday for the final build.
- Got frustrated with BugBase's weirdness, so I'm installing Bugzilla on
gothics for testing and evaluation purposes.
Thu Jan 14
- Got Bugzilla up and running, re-initialized the default database with
something closer to our actual usage needs, tried to switch all the default
email addresses away from @netscape.com
- Tracking down a customer bug with a NetWinder that is unable to boot
kernels >981023.
Fri Jan 15
- Customer bug appears to be defective hardware (timer #4).
- Working on CVS commit log reporting tool, to show what has been changed
in the repository.
Mon Jan 18
- Managed to test some scripts for CVS reporting, but spent most of the day
resolving a problem with the sales demo image, helping the video group track
problems they encountered with build #12, and scheduling for the meeting
about the build system for tomorrow.
Tues Jan 19
- Got the charting and reporting modules in Bugzilla to work, also got
email notification working. Added some categories for Zaphod.
Wed Jan 20
- Preparing next kernel for release (this is the kernel that will go onto
the official build #13).
Thu Jan 21
- Released 980121 kernel, and final work on build #13. Need to write a
page where kernel changes can be announced.
- Testing build #13, c++ support is still not working. Will have to
rebuild one more time.
- Updated as many of the RPM's as possible on netwinder.org.
Fri Jan 22
- Got updates to make C++ work from Scott, and did final version of build
#13.
Mon Jan 25 --- Fri Feb 5
- On holiday... in the sun... (does this count as an accomplishment?)
Mon Feb 8
- Well Mexico was nice... and I came back to find out that the merger is
still going on. Very little work is actually going on (but Quake seems to
be quite popular)...
Tue Feb 9
- Today was spent building a big Beowulf cluster. See the photos I took. At least morale is picking up :)
Wed Feb 10
- Reverse-ported the 2.0.35 kernel to work on the Rev3 boards that make up
the Beowulf cluster, and added in the patches from the Extreme Linux
package. Mark ported PVM and some other things.
Thu Feb 11
- Make-or-break day :(
- See what PVM can do for us..
Fri Feb 12
- Picking up SCSI work from PatB
- Found a build problem in the kernel - to be fixed.
Reports will resume when everything settles down again.
Week of March 1st, 1999
- Main project is working on DM build #14, which is a top-down
build with everything in RPM format. This will get rid of all the a.out
baggage and fix the utmp/wtmp problems (so that commands like "who" work
again). The base for this work is the SRPMS from rawhide.
- Built a disk image "top-down" using a modified "create" script and got
it to the point where you can log in. PAM and shadow password support are
now functioning.
- Added devel tools (egcs, cpp, and such) so that RPM building can take
place on this image. Attempting to rebuild everything under itself (does
that make sense?) but there are some trouble spots.
- Tried out the 2.2.2 kernel on the preliminary build #14 image. Seems to
work quite well - stable except for "klogd" which dumps core. Also the
parameters from the firmware aren't passing reliably - so it defaults to 32M
ram for example. The /dev entries are hopefully up-to-date now.
- Spent some time building rescue flash images. Not quite sure why, but
the nettrom-2.0.4a+rescue image from the ftp site doesn't contain any of the
nfs support - so it's not terribly useful IMHO. I've built another one
(still based on Mike Montour's 0.1 rescue filesystem) which contains the
startnfs stuff. Added it to the pub/ccc/firmware directory.
Week of March 8th, 1999
- Work on DM build #14 continues... there's a problem with detection of the
bool type when compiling the ncurses/curses.h header file, and the init
scripts for unknown reasons try to set a ulimit even on non-local sessions
such as telnet.
- Began exploratory work for Zaphod root disk - aiming to fit everything
into a 4MB flash filesystem. So far, a stripped glibc-2.1, shell, and
skeleton filesystem fit into 600kB compressed.
Week of March 15th, 1999
- Mainly worked on the Zaphod disk image, top-down using RPMS, since it
was now decided to include a hard disk in the head unit.
- Spent time figuring out the exact boot process to be used by the master
and slave units on Zaphod.
- Updated the hwclock program from more recent sources and built
an RPM for it. Now a disk image can be built and booted cleanly (except for
the lack of networking setup).
Week of March 22nd, 1999
- More disk image building for Zaphod...
- Fixed kernel makefiles to create symlink for
arch/arm/lib/inflate.c (finally!)
- Experimenting with booting via a TFTP'd root disk image - and it doesn't
seem to work. Supposed to execute linuxrc script, like it would
when booting a rootdisk in flash, but the execve fails.
- Updated scripts for netwinder web page indexing.
Week of March 29th, 1999
- Finally solved the tftp-boot-with-attached-root-filesystem problem. Now
can boot diskless with a sizeable root filesystem transferred in via tftp.
However there seem to be problems with the tftp daemon on the NetWinder - to
be investigated.
- Created an initrd image with networking support and the XF86_SVGA server.
It can be started successfully but requires an external font server.
Week of April 4th, 1999
- More experimenting with the initrd disk images. It doesn't work under
2.2 kernel series, reason to be determined.
- Started updating the firmware/kernel/disk images pages on the web site
(new stuff not uploaded yet though).
- Determined that the 2.2 kernels aren't ELF, and that's why things like
initrd don't work. The support for this was put into the load_elf()
routine...
Week of April 11th, 1999
- Tried out the 2.2.5 kernels and set up a CVS repository for them. Need
to find a way to conveniently handle RMK and PHILB patches, that are always
applied against the baseline kernel rather than the previous patch. It's a
pain having to "unpatch" each time.
- Tested Woody's firmware modifications to support proper initialization
of the kernel param_struct. Memory and root disk are now passed correctly,
video options are passed but ignored by the cyber2000fb driver.
- Wrote a script to apply patches to cvs tree and handle all the messy
details of adding and removing files. Still not perfect though.
Week of April 19th, 1999
- Verified that serial downloading of a kernel still works - for when the
flash writing process fails. Requires a non-ELF kernel (ie. the start
address must be known, $c000) and a terminal program that can blindly
transfer binary files (Teraterm works). To make this process more stable,
future firmware should do xmodem protocol, or use S-record format so that
transmission errors can be detected.
- Repeating the checkin of the 2.2.5 kernel into CVS. Some files didn't
get patched properly on the first try, so I'll repeat and see if I can spot
what went wrong. -- Update: it was user error, of course. Fixed the problem
and will update the external CVS as soon as the server is back online.
- Spent a day fighting with the raid array on the engineering fileserver.
If it isn't umounted cleanly, there are massive troubles in getting it back
up again. Under 2.2.5, the code for "cleaning" the array is in the
kernel md_thread and no longer handled by ckraid. However, the code isn't
enabled by default; one must edit include/linux/md.h and define the symbol
SUPPORT_RECONSTRUCTION.
- Checked in 2.2.6 kernel and Russell King's patch for it. Noticed that
there are about a half dozen files missing on the original linux-2.2.5
import - WHY???
- Working on Zaphod boot disk using read-only NFS mounted root. The /dev,
/etc and /var directories are relocated into ramdisks, and there are some
modifications to the rc.sysinit scripts.
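A note on the S-record idea for serial downloads mentioned above: each record carries a one-byte checksum, so a corrupted line can be detected and resent. This is only a minimal Python sketch of building and checking an S1 (16-bit address) record. The helper names make_s1 and check_s1 are made up for illustration, and nothing here reflects what the firmware actually does.

```python
def srec_checksum(body: bytes) -> int:
    """One's complement of the low byte of the sum of count, address and data."""
    return ~sum(body) & 0xFF


def make_s1(address: int, data: bytes) -> str:
    """Build one S1 record (hypothetical helper, 16-bit address only)."""
    count = len(data) + 3  # 2 address bytes + data + 1 checksum byte
    body = bytes([count]) + address.to_bytes(2, "big") + data
    return "S1" + (body + bytes([srec_checksum(body)])).hex().upper()


def check_s1(record: str) -> bool:
    """Return True if the record's count and checksum are consistent."""
    raw = bytes.fromhex(record[2:])
    # A valid record sums (checksum included) to 0xFF in its low byte.
    return (record.startswith("S1")
            and raw[0] == len(raw) - 1
            and sum(raw) & 0xFF == 0xFF)


record = make_s1(0x7AF0, bytes.fromhex("0A0A0D") + bytes(13))
corrupted = record.replace("0D", "0C", 1)  # simulate a transmission error
```

Here check_s1(record) succeeds while check_s1(corrupted) fails, which is the property a blind binary transfer lacks.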
Week of April 26th, 1999
- Ported vnc-3.3.2r3 and it seems to work fine. Required only a minor patch
to imake; submitted to vnc maintainers.
- Set up armlinux cvs and a mailing list to log commit messages on this
tree. Majordomo can be a pain...
- Prepared 2.0.35 kernel to go with build #14 and the new firmware.
After a long period of silence, I'm starting up this habit again...
Week of June 28th, 1999
- Kernel work: the 2.3.6 tree won't build due to a missing symbol
(PARPORT_MODE_TRISTATE), not sure if this symbol is coming or going. Little
progress on the automatic build system.
- OfficeServer: packages built for netatalk and imap (to fix broken pop3
daemon); created preinstallation script for build #15, then scaled it back
to build #14. Cleared out half a dozen bugs from Bugzilla.
- Firmware: built 2.1.1 firmware package, this time with the tftp header
file corrected. Boots fine, needs thorough testing, and version tag needs
to be applied in cvs.
Week of August 9, 1999
- The OfficeServer is into beta testing phase, so I've got time to work on
the real fun stuff again ;)
- Completed the first incarnation of the "CVS Log Beautifier" that has
been on the back burner for some time. You feed it the output from "cvs
log" or "rlog" and it outputs nicely formatted HTML of the changes between
each tagged revision, along with hyperlinks into the CVSweb tree.
Limitations right now include the inability to deal with multiple branches,
and it assumes you use a consistent naming strategy for your tags. If
anyone is interested in this, let me know.
- Tried out parallel port devices again under 2.2 kernels. The "old" ZIP
drive works fine (requires the following kernel modules: parport,
parport_pc, scsi_mod, ppa as well as whatever filesystem support you
might need). New ZIP drives (with AutoDetect on the cable) work well with
the imm driver. The Imation LS-120 drive isn't detected (didn't
this happen once before?). Tested an Avatar Shark drive (very cool, 250MB
in the palm of your hand) and it worked perfectly.
- Looking into what is needed to make a minimal firmware, so as to
maximize the space available for an initial ram disk in the 1MB flash.
Built nano-X (http://www.linuxhacker.org/pub/nanogui/NanoGUI-FAQ.txt)
which looks promising...
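For anyone curious about the "CVS Log Beautifier" idea above, here is a rough Python sketch of the general approach: read "cvs log"/"rlog" style output, pick up the tags, and group commit messages by the tag they fall under. It is not the actual tool. The sample log, the tag names BUILD_13/BUILD_14 and all function names are invented for illustration, it handles a single file on the trunk only (the same limitation noted above), and it leaves out the CVSweb hyperlinks.

```python
import re
from html import escape

SAMPLE_LOG = """\
RCS file: /cvs/demo/hello.c,v
Working file: hello.c
head: 1.3
symbolic names:
\tBUILD_14: 1.3
\tBUILD_13: 1.1
total revisions: 3;\tselected revisions: 3
description:
----------------------------
revision 1.3
date: 1999/08/09 10:00:00;  author: ralphs;  state: Exp;  lines: +2 -1
Fix typo in greeting.
----------------------------
revision 1.2
date: 1999/07/01 09:00:00;  author: ralphs;  state: Exp;  lines: +5 -0
Add command-line option.
----------------------------
revision 1.1
date: 1999/06/01 08:00:00;  author: ralphs;  state: Exp;
Initial revision
=============================================================================
"""


def parse_log(text):
    """Return (tags, revisions) for a single-file, trunk-only log."""
    tags, revisions, in_tags = {}, [], False
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith("symbolic names:"):
            in_tags = True
        elif in_tags and line.startswith("\t"):
            name, rev = line.strip().split(": ")
            tags[name] = rev
        else:
            in_tags = False
            m = re.match(r"revision (\d+(?:\.\d+)+)$", line)
            if m:
                i += 2  # skip the "revision" line and the "date:" line
                msg = []
                while i < len(lines) and not lines[i].startswith(("----------", "==========")):
                    msg.append(lines[i])
                    i += 1
                revisions.append((m.group(1), " ".join(msg)))
                continue
        i += 1
    return tags, revisions


def rev_key(rev):
    """Turn '1.12' into (1, 12) so revisions compare numerically."""
    return tuple(int(p) for p in rev.split("."))


def changes_by_tag(tags, revisions):
    """Map each tag to the messages committed since the previous tag."""
    result, prev = {}, (0,)
    for name, rev in sorted(tags.items(), key=lambda kv: rev_key(kv[1])):
        top = rev_key(rev)
        result[name] = [msg for r, msg in revisions if prev < rev_key(r) <= top]
        prev = top
    return result


def to_html(grouped):
    """Render the grouped messages as a very plain HTML fragment."""
    parts = []
    for name, msgs in grouped.items():
        parts.append("<h2>%s</h2>" % escape(name))
        parts.append("<ul>")
        parts.extend("<li>%s</li>" % escape(m) for m in msgs)
        parts.append("</ul>")
    return "\n".join(parts)
```

Calling to_html(changes_by_tag(*parse_log(SAMPLE_LOG))) gives one heading per tag with the commit messages listed underneath.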
Week of August 16, 1999
- Investigated a nasty bug with the tulip driver under kernel 2.2.9-1.
Under heavy traffic on the eth1 interface, the system appears to freeze
completely and never recovers. Scoping the PCI bus reveals that the system
is in fact alive, but the tulip is monopolizing the bus. It requests and is
granted the bus, then holds on to it for extended lengths of time
(milliseconds), although there does not appear to be any transfer activity
at the time. A timer interrupt is also acknowledged so the CPU would seem
to be running as well. Under investigation.
- Further investigation indicates that the problem is likely to do with
the initialization of the 553, and not the tulip. The PCI bus is sitting in
a state where the arbiter *could* remove the grant and pass it to another
device, but it doesn't. So we'll have to check for changes in the init code
between 2.0.35 and 2.2.x. A project for tomorrow ;)
- Wrong again... the 553 seems to be parking the bus correctly, it isn't
getting any other requests.
Week of August 23, 1999
- Well, my hopes of returning to work to find the tulip troubles
magically fixed have not materialized. So it's back to more testing and
reading up on how it's supposed to all work :)
- Solved! The troublemaker has finally been found. There were a couple
of endless loops in the tulip driver, apparently the result of somebody
doing debugging on it sometime earlier. Removing the loops cures the
problem... how wonderful.
- Final officeserver builds completed
Week of Sep 6, 1999
- The week started off well with Monday being a Civic Holiday.
- Looking into the red-background-upon-bootup problem that only occurs
with gcc > 2.8.1
Week of Sep 12, 1999
- Verified Woody's kernel patch to fix the bootup background color
problem. We were seeing the color of ISA memory...
Week of Sep 20, 1999
- Finished off OfficeServer update script for 1.0-build #3, Apache
update
- Investigated flash problem on unit supplied by Dave C. It has the
interesting effect that the flash _reads_ back differently under 2.0 versus
2.2 kernels. Writing it under 2.0 leads to corruption (no boot) whereas
writing it under 2.2 works okay.
Week of Sep 27, 1999
- Investigating the problem with portmap under 2.2 kernels.
- Further investigating the FIN_WAIT_1 problem under Apache and 2.2.9.
Versions 1.3.[469] of Apache were tested and all have the problem under
2.2.9. Apache 1.3.9 works fine under 2.0.35 (981211), but it is slow.
Enabling the lingering close option when building Apache seems to cause the
connections to remain in ESTABLISHED state rather than FIN_WAIT_1, but the
main problem remains.
- Tried out Scott's gdb (kernel patch required). Worked with Scott's
kernel binary, but not with 2.2.12-rmk1 suitably patched. There are some
differences in arch/arm/kernel/ptrace.h, most notably the macro
user_pc_pointer does not exist in -rmk1, which might explain things. Needs
further investigation.
- Revived the cluster. Running RC5 clients on six nodes, the other 4
won't boot the current 2.0.35 kernel (presumably because they are Rev3
boards). Still, this should give 6*350 = 2100 Kkeys/s.
- Continued investigation of the Apache problem. It only occurs on ARM
(not on x86) and with RMK's 2.3.18 kernel binary the system also behaves as
would be expected, ie. the FIN_WAIT1's quickly evaporate and leave behind
TIME_WAIT sockets.
Week of Oct 4th, 1999
- Continued investigation of the FIN_WAIT1 problem. Systematically tried
out kernels 2.2.9-3, 2.2.12, 2.2.12-rmk3, 2.3.18, 2.3.18-rmk1, built with
gcc 2.8.1 and 2.95.1. All cases failed eventually. Russell's binary of
2.3.18-rmk1 works, but I cannot reproduce it even with his .config
file.
- Connections stuck in FIN_WAIT1 appear to count-down and then restart
with a longer timeout (to a max of 120 sec) as they should. However no FIN
is observed by a third party listening on the pipe when the timer
expires.
- Seems the problem is happening long before the FIN_WAIT1. While
ESTABLISHED, the connection sits idle for quite some time with data in the
send queue but not going anywhere. Eventually it drops into FIN_WAIT1 state,
still with data to send.
- Using a network sniffer reveals that the packet that is being
retransmitted has an incorrect TCP checksum (consistently). The
checksum.o routine seems to be innocent. TCP header checksums are
inconsistent though. When the multiple inline functions are broken up
somewhat, more consistent results are obtained.
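For reference, the checksum being discussed is the standard 16-bit one's-complement Internet checksum (RFC 1071). Below is a small Python sketch of the arithmetic, handy for checking values seen on a sniffer by hand. It is not the kernel's routine, the function names are made up, and a real TCP checksum also covers a pseudo-header (addresses, protocol, length) that is left out here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def is_intact(data_with_checksum: bytes) -> bool:
    """True if the data, checksum included, sums to zero."""
    return internet_checksum(data_with_checksum) == 0


payload = bytes.fromhex("0001f203f4f5f6f7")  # example words from RFC 1071
csum = internet_checksum(payload)
segment = payload + csum.to_bytes(2, "big")
damaged = bytes([segment[0] ^ 0x01]) + segment[1:]  # flip a single bit
```

For the RFC's example words the checksum comes out as 0x220d. is_intact(segment) holds, while is_intact(damaged) does not, which is how a receiver would reject a retransmitted packet with a bad checksum.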
Week of Oct 12th, 1999
- Further investigation of the checksum corruption problem. Sean found
that the checksum of the data area, which normally never gets recomputed, is
wrong. Adding an explicit recalculation in the retransmit loop fixes the
problem. Patch prepared for the 2.2.12 kernel. Of course it would still be
nice to find out why it's wrong in the first place.
Week of Oct 18th, 1999
- Final preparations for 2.2.12 kernel release. The real origin of the
bad checksum still remains elusive. PhilB's patches to checksum.h
look right, but do not cure the problem.
- Working on updating the HOWTO's and documentation available under ~ralphs.
- Kernel 2.2.12 near-final released to
/users/r/ralphs/kernel/
Week of Nov 1st, 1999
- Prepared a SCSI-capable disk image for the OfficeServer. Using kernel
2.2.12-19991021 and firmware 2.1.16, since this also catches the "Buggy cpu"
messages properly. This is meant to be a minimal change to support scsi, so
the other officeserver UI changes are not included.
- Created a NetWinder-specific kudzu package, to handle the
details of customizing /etc/conf.modules and /etc/inittab
for the scsi and/or rackmount. In case of SCSI, the yellowfin driver
replaces tulip, and symbios gets loaded for disk access. In the case of
rackmount, a serial console at 9600-8N1 is launched on the front panel port.
- Tested SCSI booting also. We won't support this initially (only scsi as
a secondary filesystem). Requires experimental firmware based on the
2.0.35 minikernel (not the usual 2.0.31) and a kernel with SCSI built-in, of
course. Seems to work quite well actually, but the minikernel stability
has yet to be properly tested.
Week of Nov 8th, 1999
- Completed the most important sections of the Rescue-HOWTO.
- Took a stab at building Mozilla M10 on DM15-beta5. First attempt
resulted in a binary that complained of an error in the dynamic linker.
Suspect the shared libraries weren't built with the -fPIC option;
trying again with configure --enable-pic-dso-cflags.
- Looked into tftp booting with a piggyback compressed filesystem under
2.2 for Scott. Kernel seems to recognize the ramdisk is there, but doesn't
execute the linuxrc script. The culprit turned out to be a missing
/dev/console in the ramdisk. Without it, the kernel silently falls
through to booting off the "old" root.
Week of Nov 15th, 1999
- Initrd booting problems turned out to be caused by the fact that the
gzip "magic markers" appear several times in the image, and therefore the
minikernel passes the wrong initrd_start value to the kernel. We'll fix
this by putting in a proper structure to describe the segments and how big
they are (will require firmware updates of course).
- The mystery tcp checksum failure has been solved, it seems that
interrupts were trashing the psr on exit from a fault handler. New kernel
with the fix has been released.
- Similarly, missing cache flushes in the tulip driver were causing it to
lock up sporadically under high load. These have been fixed in the released
kernel (1999-11-18).
- Rebuilt the officeserver image repeatedly to reflect the kernel and
firmware changes.
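To illustrate the initrd problem described above (gzip "magic markers" showing up more than once), here is a Python sketch. It assumes a made-up image layout: a fake "kernel" blob that happens to contain the gzip magic bytes, followed by a real gzip-compressed ramdisk. The (offset, length) table at the end is only a guess at what a "proper structure" could look like, not the firmware's actual format.

```python
import gzip
import struct

GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip ID bytes plus the "deflate" method byte


def find_gzip_markers(image: bytes):
    """Return every offset at which the gzip magic bytes occur."""
    offsets = []
    pos = image.find(GZIP_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = image.find(GZIP_MAGIC, pos + 1)
    return offsets


# Hypothetical image: the "kernel" contains the magic bytes by coincidence.
fake_kernel = b"\x00" * 16 + GZIP_MAGIC + b"\xff" * 16
ramdisk = gzip.compress(b"root filesystem contents")
image = fake_kernel + ramdisk

markers = find_gzip_markers(image)
naive_start = markers[0]        # first match, which lies inside the kernel
true_start = len(fake_kernel)   # where the ramdisk really begins

# Alternative: describe the segment explicitly instead of scanning for it.
# Here the table is one (offset, length) pair of little-endian 32-bit ints.
toc = struct.pack("<II", true_start, len(ramdisk))
start, size = struct.unpack("<II", toc)
recovered = gzip.decompress(image[start:start + size])
```

Scanning for the first marker picks an offset inside the kernel, whereas the explicit table leads straight to the ramdisk regardless of what bytes the other segments contain.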
Week of Nov 22nd, 1999
- Added DMA and sound checking to the kudzu package (rackmounts normally
don't ship with sound chip). Turning on the DMA is safe on RevG of the IDE
controller chip, which is going on all boards now...
- Rolled out the first auto-update for OfficeServer, it will bring images
up to os-1.1-3 (kernel 11/18 and nettrom 2.1.16b). Cross your fingers
;)
- Looked into initrd booting some more, it seems that (for a change) the
kernels I build work and the ones Woody makes do not. Trying to sort out
why this is so.
- Fixed a glitch in the auto-update code... it wasn't comparing package
versions correctly, and therefore it didn't install the new kernel. Also
moved the backup dir for config files to "update" to match MatthewK's
directories for the restore system functions. Still need to make
fixconfig.cgi work in the other directories.
- Got specweb99 test set up - seems to behave similarly to reported
(problems on class3) - hopefully will shed some light on the netbench
results too. Will look at it more on Monday.
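On the version comparison glitch mentioned above: the exact bug isn't recorded here, but one common way such checks go wrong is comparing version strings as plain text. The Python sketch below shows that failure mode and a numeric comparison that avoids it. The version strings are examples only, parse_version is a made-up helper, and this is far simpler than what RPM itself does.

```python
import re


def parse_version(version: str):
    """Split a version like '1.1-10' into a tuple of integers (1, 1, 10)."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def is_newer(candidate: str, installed: str) -> bool:
    """True if candidate is a later version than installed."""
    return parse_version(candidate) > parse_version(installed)


# Plain string comparison orders "2.2.9" after "2.2.12", because the
# character "9" sorts after "1".
string_says_newer = "2.2.12" > "2.2.9"
numeric_says_newer = is_newer("2.2.12", "2.2.9")
```

With string comparison an update from 2.2.9 to 2.2.12 looks like a downgrade and would be skipped; the numeric comparison gets it right.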
Week of Nov 29th, 1999
- As the NetApp filled up over the weekend, spent some time sorting out
where it all went. Noticed many duplicate packages scattered about in
people's home directories; we'll need to implement a better way of handling
them in the near future.
- Looked into running "legacy apps" on DM-15. Older binaries can usually
be made to run by creating links for /lib/ld.so, and sometimes,
/usr/lib/libc.so.4.6.27. Either point at the new libraries, or, if
that doesn't work, copy the old ones from a DM-13 image. This only works
because these files got renamed for the new glibc.
- Discovered that the nfs daemon is pretty stupid when reading
/etc/exports. The netmask has to be spelled out the old fashioned way
(e.g. 255.255.255.0); using a CIDR-style prefix length such as /24 doesn't
work as expected - it seems to create a tiny subnet, so the first 8 machines
are okay and the rest don't work. Should really be fixed in a newer
version. Also, pre-DM15 versions of mountd would core dump if they could
not reverse-lookup the host attempting to connect.
- Traced initrd problems with Scott, looks like it was a missing
/dev/console entry. Patched the kernel to print a warning in the
future, if console cannot be opened. Still need to verify that Woody's new
firmware with the table-of-contents behaves the same way.
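As a workaround for the /etc/exports netmask issue above, the prefix length can be converted to the spelled-out form before writing the file. Here is a small Python sketch using the standard ipaddress module. The export path, network and options are placeholders, and exports_entry is a made-up helper.

```python
import ipaddress


def dotted_netmask(prefix_len: int) -> str:
    """Convert a prefix length such as 24 into '255.255.255.0'."""
    return str(ipaddress.ip_network("0.0.0.0/%d" % prefix_len).netmask)


def exports_entry(path: str, cidr: str, options: str = "ro") -> str:
    """Format an exports line with the netmask written out in full."""
    net = ipaddress.ip_network(cidr)
    return "%s %s(%s)" % (path, net.with_netmask, options)


line = exports_entry("/export", "192.168.1.0/24")
```

The resulting line uses 192.168.1.0/255.255.255.0 rather than 192.168.1.0/24, which is the form the older nfs daemon reads correctly.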
Week of Dec 6th, 1999
- OfficeServer-1.2-2 created. Lots of bugs to be fixed.
- Lots of days missed here... oops!
Week of Dec 20th, 1999
- Working on new devel build of officeserver, with various security
updates and other fixes.
Ralph Siemsen / ralphs@netwinder.org