SOLARIS/PPC
===========

A Brief History of Solaris (SunOS) Ports
----------------------------------------

In the early 1980s, SunOS was based on Berkeley BSD Unix and ran on the Motorola 68020. At that time, SunOS ran on just the one Instruction Set Architecture (ISA).

In the late 1980s, SunOS was ported to Sun's SPARC architecture. Support for the Motorola 68020 was discontinued; there were no serious plans to support both ISAs.

About 1992, Sun released Solaris 2.1 for the Intel IA-32 architecture (AKA x86). This was not a replacement for SPARC; the intent was to continue providing Solaris for both ISAs, built from a common source code base, for some time to come.

I assume many people in the industry know about this part of the history. What is perhaps less well known is that there were ports of Solaris to other ISAs.

Around 1996, a port of Solaris 2.5.1 was released which was targeted to SPARC, Intel IA-32, and PowerPC. Solaris/PPC was not just some prototype in the labs. It was a real released and supported product, including QA; passing stress tests; all parts of Solaris besides just the kernel, libraries and commands; tech support; packaging; installation; Desktop (CDE); the works. But, due to commercial realities, mainly the breakup of the Motorola/IBM/Apple troika, Solaris on PowerPC had a very brief life as a real Sun product. The commercial failure of Solaris/PPC was not due to any technical shortcomings of the port.

Starting in late 1997, Sun did a port of Solaris to Intel's IA-64 architecture. This never even made it as a Sun product, for various commercial and political reasons. But, again, this was not an engineering failure. Sun's Solaris/IA64 porting team met the milestones of single-user prompt and multi-user prompt pretty much as soon as hardware became available from Intel. Solaris/IA64 even passed some pretty rigorous stress tests -- the kind that QA would use to test a product.
But, not all of Solaris was ported, just the OS and Network consolidation (ON), not CDE and all the rest that goes into the Solaris product. Only the reference platform was supported. The device configuration for that platform was pretty much hard-coded; there was no support for discovering what devices are on the system and where they are.

In late 2005, Sun Labs started a project to reanimate the port of Solaris/PPC. Sun Labs has many internal uses for embedded systems in general, and many of those systems were implemented using PowerPC hardware. None of those systems ran Solaris. There were several reasons for implementing Solaris/PPC for some of Sun Labs' own internal projects. So, it was not necessary to make it a Sun product, with all that entails, in order to justify the porting effort.

In addition to Sun Labs' own internal uses, several other companies expressed interest in Solaris/PPC, if the port were completed and upgraded to something close to product quality.

Why Sun Labs?
-------------

You might wonder why Sun Labs would be interested in Solaris if their use of PowerPC hardware is for embedded systems. There are two reasons.

First, Solaris is generally considered to be a big OS, too big for many embedded systems applications, but it does not have to be big. Sun has had some success with trimming Solaris down, and more can be done in that area.

Second, Sun Labs has some embedded systems applications that are just fine for a larger system. "Embedded systems" means different things to different people. Many would associate "embedded" with small memory footprint, low power consumption, high reliability, rapid boot, real-time computing, etc. You could knock out one or two of those constraints and some people would say that it does not qualify as "embedded", but it would be fine for the purposes of others. The thing that most universally divides embedded and not-embedded is the "model of use", not size, not power, not real-time.
An embedded system is one that is turnkey, does a single job right out of the box, and has all of its software and its configuration managed monolithically.

For example, a rather large machine dedicated to high-performance scientific computing can be an embedded application. It can boot up with a monolithic "blob" of software and configuration data which gets updated remotely in much the same way as flash-based firmware gets updated. It can boot up quickly and just know how to join in as one member of a pool (perhaps a cluster) of a thousand machines, and be given work to do. According to the "model of use" definition, such a machine is an embedded system. It may or may not be doing real-time computing. It may or may not be small.

How Much Work Is It To Redo A Port Of Solaris?
----------------------------------------------

You might think that the new port of Solaris/PPC would be easy. It had already been done, so we could just dust off the old Solaris 2.5.1 PPC workspace and start working, right? Well, not exactly.

Two big things have changed in the intervening 10-12 years: 1) PowerPC hardware, and 2) Solaris. The new port needed to work on at least one modern target platform, and none of them are quite like the reference platform used by Motorola, IBM, and Apple in 1995 - 1996. As for Solaris, much has changed between Solaris 2.5.1 and Solaris Nevada (AKA ONNV, AKA Solaris 11). Not just the implementation has changed, but also the kernel interfaces that must be matched by any new PowerPC code that is platform-specific or ISA-specific.

Having the Solaris/PPC 2.5.1 source code came in handy. It was used as an extraordinarily detailed design document to guide our Solaris/PPC 2.11 work, but surprisingly little code just worked, as is, without some form of update.

Solaris/PPC Is Open Source
--------------------------

One thing that was new about Solaris/PPC 2.11 is that it was intended to become an OpenSolaris community project from the very beginning.
Pretty much anybody could view the code, even pretty early on in the project, when we barely had inetboot running. We knew we could not expect many people to contribute early on, when the slogging is so difficult. So, the early work would be done by a very small porting team (embarrassingly small). But, the attitude was pretty much "If we build it, they will come."

No other port of Solaris was done as open source while it was in progress. All earlier ports of Solaris were "sanitized" and released as OpenSolaris after the port was already a fully functioning product. Solaris/PPC was visible very early on, warts and all. I think that intimidated many people who might have wanted to contribute, but only at a later stage of development.

The fact that Solaris/PPC was open source from the beginning influenced some of the details of the port. Solaris/PPC used Subversion (svn) to manage its source code. At that time, Sun was still using Teamware internally, and had not converted to Mercurial. Now, Solaris/PPC is behind, because we have yet to convert from Subversion to Mercurial.

Other choices were influenced by the consideration that Solaris/PPC is an OpenSolaris community project. That was partly responsible for the choice of reference platform, although that choice did not really go over very well with many members of the community. But, no single choice of target hardware would have pleased even a majority, let alone everybody.

Solaris/PPC ONNV Reference Platform
-----------------------------------

For the first target platform, we chose Genesi's ODW board, which is based on a PowerPC processor made by Freescale Semiconductor. It was chosen for a variety of reasons. I cannot go into all the reasons here. Some considerations were:

1) It is inexpensive, about $100.00. The thinking is that members of the community who might want to contribute would be really put off by even moderately expensive hardware.

2) It had an I/O and interrupt architecture similar to that found on typical Intel IA-32 boxes. So, we could port things like Ethernet device drivers directly from available source code that is known to work on x86 platforms. PowerPC is a large family of processors, many of which are made for the embedded systems market, and so they are inclined to integrate more system components into a single chip, including memory controller, I/O controller, interrupt management, etc. In order to implement these system-on-chip platforms, the various manufacturers were more free to use private, proprietary high-speed buses rather than any sort of industry-standard bus. So, the PowerPC embedded systems market is even more of a free-for-all than the x86 market, if such a thing is even imaginable.

3) The Genesi ODW board used OpenFirmware (IEEE 1275). So, we were thinking we could just go ahead and use that, because Solaris/PPC 2.5.1 was written for the CHRP platform, which used OpenFirmware. CHRP = Common Hardware Reference Platform.

4) Genesi expressed interest in Solaris/PPC, and were very helpful and cooperative.

Just Resting
------------

Sun Labs discontinued development of Solaris/PPC at the end of February 2008. There were many reasons that I know of. There were budget cuts, and Solaris/PPC was supposed to be OpenSolaris, anyway. There was new management that had a different attitude about whether this kind of work belongs at Sun Labs. It was originally intended to help support research projects within Sun Labs, but nobody ever said that a port of Solaris itself qualified as a proper part of Sun Labs' research portfolio. So, it was not a Sun product and it was not research, and times are tough.

That left Solaris/PPC pretty much without a home within Sun. If it does not make it as an OpenSolaris contributor-supported project, then it dies. Its fate is yet to be determined.

The Sun Labs Team
-----------------

For a span of about two years, Solaris/PPC was developed by an average of about 2.5 engineers and 1 manager, depending on how you want to count. Brian Horn and I were the two engineers who worked on it the most. Tom Riddle was our manager and also worked as a software engineer. For a while, early on, Josh Uziel did some system administration and other work to support the team. We had an intern for a short while who worked on inetboot.

Sun Labs Hardware
-----------------

The setup at Sun Labs included:

  + a total of 4 ODW boards;
  + two Solaris development machines;
  + a Linux box that was the server for one of the target machines;
  + a power strip that could be controlled via a telnet connection, so that I could work remotely;
  + a JTAG hardware debugger attached to one target machine.

Only two target machines were used most of the time, because they were connected to the machines that acted as both development machines and boot servers.

In addition to controlling power on all the machines, one of the ports on the remote power strip was attached to a custom-built reset button. The button consists of a relay and two lead wires, with a styrofoam cup for the case, all secured with duct tape. The normal setting for power to all machines was "on", but the sense of things for the reset button was inverted, depending on how you look at it. Its normal position is "off", and in order to press the button, you turn power "on", then quickly turn power back "off", releasing the button.

The hardware setup is still there. Tom Riddle managed to get some of these machines moved off Sun's network so that they face the outside world instead. So, the Lab setup is now accessible, via ssh only.

A Few Lessons Learned
---------------------

Lesson: The more I see of the amount of trouble caused by proprietary, closed-source BIOS code or firmware, the more I join the ranks of people who are part of a big backlash against it.
I believe that many make-or-buy decisions are made by MBAs, not by engineers. The true costs of "leveraging" closed-source firmware do not show up in the things measured by most MBAs. Linux on PowerPC just bypasses all vendors' firmware. Now, I know why. This is not a lesson specific to Genesi, the ODW board, or PowerPC. It applies to things like BIOS code on an x86 box just as well, perhaps more so. It is OK if we live in a perfect world, where men are men and standards are standards, and software has no defects. But, as soon as anything goes wrong, you start to realize just how screwed you are if the BIOS was written in some far away land -- far away geographically, temporally, culturally, and by virtue of layers of bureaucracy. Don't get me started.

Lesson: The Solaris kernel no longer really works on a truly 32-bit machine.

    "There is only one mistake that can be made in a computer design that is difficult to recover from: not providing enough address bits for memory addressing and memory management. The PDP-11 followed the unbroken tradition of nearly every known computer."
    -- Gordon Bell

The implication is that just about every other aspect of the design of an ISA can be extended, microcoded, emulated, or otherwise compensated for, but overcoming limited address bits means that, when the time comes, a new ISA is needed.

I was reminded of this quote when I began to realize just how much the Solaris kernel had evolved to depend on all target machines supporting some form of 64-bit primitives for synchronization, even on mostly 32-bit ISAs. Over time, Sun dropped support for 32-bit kernels on SPARC. On x86, it is becoming more and more common for the kernel to be 64-bit, even if there is still support for 32-bit applications. Even the IA-32 ISA has a 64-bit compare-and-exchange instruction, so it is a "mostly 32-bit" ISA. But, the 32-bit PowerPC ISA is truly 32-bit only.
There are no 64-bit memory accesses, and there are no 64-bit synchronization primitives, not for the 32-bit ISA. Even getting a 64-bit register such as timebase (a 64-bit running counter of clock ticks) involves two 32-bit accesses, non-atomic, and code to watch out for a carry from the lower half to the upper half that might have happened after reading the upper half and before reading the lower half. The Solaris kernel is full of fundamentally 64-bit atomic operations.

Pretty much all other problems are just a matter of programming. There is just a lot of it. But, this is the one truly fundamental problem. It can only be worked around, and the workaround is nasty. There is no nice, satisfactory solution short of reforming huge amounts of Solaris kernel code. That is not going to happen any time soon.

Lesson: kmdb is a nice debugger, but for early bringup Solaris should retain a much simpler debugger. It could be the older kadb, but it might be better to have an even simpler debugger. A debugger is a major piece of work and, by its very nature, it is ISA-specific. It is not good to wait until kmdb is ported before you have any kernel debugger at all. For porting to a new ISA, an incremental approach can be better than all-or-nothing.

-- Guy Shaw
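
Appendix: A Sketch Of The Timebase Read
---------------------------------------

The two-access timebase read described in the 32-bit lesson above can be illustrated with a small sketch in C. This is not code from the Solaris/PPC sources. On real 32-bit PowerPC hardware the two halves would be read with the mftbu and mftb instructions; here the hypothetical helpers sim_read_upper() and sim_read_lower() stand in for them, backed by a simulated counter that advances one tick per read, so that the carry case can be exercised on any machine. All names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 64-bit timebase register. */
static uint64_t sim_timebase;

/* Simulated read of the upper 32 bits; one tick passes after the read. */
static uint32_t sim_read_upper(void)
{
    uint32_t v = (uint32_t)(sim_timebase >> 32);
    sim_timebase++;
    return v;
}

/* Simulated read of the lower 32 bits; one tick passes after the read. */
static uint32_t sim_read_lower(void)
{
    uint32_t v = (uint32_t)sim_timebase;
    sim_timebase++;
    return v;
}

/* Naive read: upper half, then lower half, with no check for a carry. */
uint64_t read_timebase_naive(void)
{
    uint32_t hi = sim_read_upper();
    uint32_t lo = sim_read_lower();
    return ((uint64_t)hi << 32) | lo;
}

/*
 * Careful read: read upper, lower, then upper again.  If the two upper
 * reads differ, a carry out of the lower half happened in between, so
 * the pair is inconsistent and the whole read is retried.
 */
uint64_t read_timebase(void)
{
    uint32_t hi, lo, hi_again;

    do {
        hi = sim_read_upper();
        lo = sim_read_lower();
        hi_again = sim_read_upper();
    } while (hi != hi_again);

    return ((uint64_t)hi << 32) | lo;
}

/* Usage example: start one tick before the lower half wraps around. */
void timebase_demo(void)
{
    sim_timebase = 0x00000001FFFFFFFFULL;
    /* The careful read must never appear to go backward in time. */
    assert(read_timebase() > 0x00000001FFFFFFFFULL);
}
```

Starting the simulated counter at 0x00000001FFFFFFFF, the naive read pairs the old upper half with the new lower half and so appears to jump backward, while the retry loop notices that the upper half changed and reads again. Note that this only yields a consistent snapshot of a counter; it does nothing for the 64-bit atomic updates that the kernel depends on, which is the harder problem.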