Controversy rages over Sun server glitch

Users report that Sun Microsystems has fully resolved problems caused by a defective memory component on its UltraSparc II servers.

The troublesome glitch, first reported by users in 1998, was acknowledged publicly by Sun in August 2000 and created havoc for users as the company tried for months to find fixes.

Sun's handling of the issue - and chief executive Scott McNealy's claim last week that defective IBM memory components were to blame for Sun's woes - has raised the hackles of some users.

"In my opinion, Sun is fully to blame," said one affected user, a manager at a large systems integrator who asked to remain anonymous.

"It doesn't really matter who supplied the chips. Shouldn't the company who builds the server take responsibility for issues with it?" he added.

The defect was in an external memory cache on Sun's UltraSparc II microprocessors. Under certain conditions, the problem triggered system failures and frequent reboots at dozens of customer locations worldwide.

McNealy said the problems stemmed from defective IBM static RAM (SRAM) chips that Sun used in its servers. "They were the biggest source of the problem for us," he said, stressing that Sun no longer buys IBM SRAM. "We designed IBM out of that and put error checking and correcting memory [ECC] across the entire cache architecture."

William O'Leary, director of communications at IBM's microelectronics division, did not respond to McNealy's comments. However, he denied that Sun no longer uses SRAM from IBM and insisted Sun continues to be a "major and important" customer of IBM's high-performance SRAM technology.

Some Sun users are still unconvinced. "Sun is responsible," said a project manager at a large European bank that was affected by the glitch.

"[Sun's] architecture was fundamentally flawed because there was no ECC checking on the cache memory," he said. "This is something you get in even the lowliest Intel processor that costs a few dollars."

The bank's problems were resolved after a series of fixes that included moving servers, installing kernel patches and swapping out processors.

"I could see how Sun would have problems finding the error," conceded another user at a large consulting firm, whose clients include several major airlines. The company had to battle for more than a year to resolve its problems.

"I'm still not happy with how Sun handled this particular problem, but Sun has been a good reliable vendor for our company since," he added.
