Today’s talk in Portland State’s CS colloquium series featured Intel’s Tim Mattson on parallel programming and Intel’s research chips.
I’d followed the Intel press releases on their Terascale chip with a lot of interest. Turns out it was meant strictly as a research chip to test some hardware. Maybe five people ever wrote software for it, and even that was seemingly an afterthought so marketing would have something more to say about the chip.
Over just the last two months Intel Research has been making some press with their SCC chip (“Single Chip Cloud” computer…what a marketing name!). This one they’re aiming to actually get into the hands of researchers. They’ve got a bare metal mode, a full Linux kernel per core, and Microsoft’s announced something or other too. What makes it particularly interesting is that it is set up to leverage message passing and does not provide cache coherence. It should spawn some interesting academic research in the coming year or two.
Mattson’s definitely of the mind that the way to deal with some of the central issues with scaling is to stop trying to have cache coherence. He makes pretty straightforward arguments. The parallels with distributed computing, going back a couple of decades even, are interesting, both in basic programming and in reliability assumptions.
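To make the contrast concrete for myself, here’s a toy sketch of the message-passing style. This is my own illustration, not anything shown in the talk and not the SCC’s actual programming interface; the names `worker` and `parallel_sum` are made up. Python threads do share an address space, so this only imitates the discipline: each worker keeps private state and talks to the rest of the program only through queues.

```python
import queue
import threading


def worker(inbox, outbox):
    """Sum every chunk received on inbox, then report the total on outbox."""
    total = 0  # private to this worker; nothing else reads or writes it
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel message: no more work is coming
            break
        total += sum(msg)
    outbox.put(total)


def parallel_sum(data, n_workers=4):
    """Sum data by mailing slices to workers (hypothetical helper)."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inbox, results))
        for inbox in inboxes
    ]
    for t in threads:
        t.start()
    for i, inbox in enumerate(inboxes):
        # Each worker gets its own copy of a slice, as if it were sent
        # over a network link rather than read from shared memory.
        inbox.put(list(data[i::n_workers]))
        inbox.put(None)
    for t in threads:
        t.join()
    # Combine the partial sums that came back as messages.
    return sum(results.get() for _ in range(n_workers))
```

Calling `parallel_sum(list(range(100)))` adds up the partial sums from four workers. Since no worker ever touches another worker’s data, nothing here depends on the hardware keeping caches coherent, which is roughly the trade being argued for.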
One point that struck me: he said that not many programmers are used to thinking in distributed terms and that most prefer a shared memory model. Most of HPC is probably looking at cache-coherent single system images at the scale of dozens of cores, which is the scale of these research chips. Mattson comes from a chemistry background, and a number of people in the audience were certainly HPC types. But maybe the marketing name actually has a bit of foundation in looking at the web space instead of HPC. If you look at today’s really popular web applications, they’re backed by a distributed software model using low-end commodity systems that are assumed to be failure-prone. So there’s a whole generation of programmers who take it for granted that if you scale up much, you need to take the time to architect in a distributed way. They don’t just scale by adding parallelism and assuming they’ve got a giant machine with a single address space. Probably most of them don’t even program at a level where they know or care what an address space is!
The other thing I took away is that I should probably be paying a little attention to OpenCL.