Japanese researchers build 512-core math coprocessor
While we're just getting used to dual-cores and have our eyes on those upcoming quad-core chips, Japanese computer scientists at the University of Tokyo have built a 500MHz, 512-core math coprocessor chip that can perform up to 512 billion floating-point operations per second. The Grape DR chip is designed to fit on a PCI-X card and act as a secondary chip for the main CPU. The project, which has been ongoing since 1989, expects to reach two petaflops (that's two quadrillion, or 2,000,000,000,000,000, floating-point operations per second) sometime around 2008. No doubt Intel, which is planning an 80-core processor by 2011, is watching this research very closely.
[Via Channel Register]
Reader Comments (Page 1 of 1)
John Stracke @ Nov 7th 2006 9:33AM
Anita writes: "How useful is this level of speed if the data has to travel from the main processor to the PCI-X attached coprocessor?"
Pretty useful, actually. It's perfectly suited to batch applications, where you submit a block of data to the coprocessor, let it run, and eventually get back the result, which describes most really high-end math computation. (Consider SETI@home, and replace "PCI" with "Internet".)
If the coprocessor had to go over the PCI bus to get to RAM, then, yeah, that'd be a show-stopper; but, if you look at the diagrams, you'll see that there's a chunk of memory on the chip.
Mike @ Nov 6th 2006 10:01PM
Talk about speed!
Mike @ Nov 6th 2006 10:02PM
and whats the pice???
Mike @ Nov 6th 2006 10:02PM
and whats the PRICE???
Alcaron @ Nov 6th 2006 11:25PM
"No doubt that Intel, which is planning on an 80-core processor by 2011, is watching this research very very closely."
I know it's a dumb niggle, but why is it that every time something seems kind of in the same general ballpark as a previous story on Engadget, they link the two together like it really applies?
In reality, given what these guys are actually doing, it is not likely to apply too terribly much to Intel's plans... the fact that they use PCI-e should be a dead giveaway of that (or do you really think a memory interface between cores and the rest of the system would really work with PCI-e?).
In reality it is worlds apart from what Intel plans on doing.
kyle allen @ Nov 7th 2006 12:46AM
i probably did the math wrong, but isnt 2 petaflops 2000 ghz?
Temple @ Nov 7th 2006 1:40AM
>>How useful is this level of speed if the data has to travel from the main processor to the PCI-X attached coprocessor? Unless the multicore coprocessor has a TON of onboard cache, the speed is a relatively useless measure of performance since the bottleneck will be the PCI bus, not the processor speed.
This isn't a GPU, so it won't need bandwidth in the same sense as a graphics card, where most of the bandwidth and memory is taken by textures etc. Also, the diagram clearly shows that it will use several PCI-X slots.
>>The Playstation 3 and XBOX already have performance above the 1 Teraflop level
This is 2 petaflops, and a petaflop is 1,000 times a teraflop. Also, the PS3 GPU does 2.18 teraflops and the XB360 around 1 teraflop, but the XB360 three-core CPU can only do around 115.2 gigaflops, and the PS3 Cell around 204 gigaflops. Those are all measured in single-precision floating point; the petaflop figure from the University of Tokyo is most likely aiming for double precision.
>>i probably did the math wrong, but isnt 2 petaflops 2000 ghz?
Clock speed != performance
ahmed yeahia @ Nov 7th 2006 2:54AM
at this rate .. Star Trek days are not far away
Vanillacide @ Nov 7th 2006 4:21AM
If only it worked with Excel.
panoz @ Nov 7th 2006 4:26AM
temple++
you don't need a ton of memory to store mathematical data to be computed.. you need a ton of processing power, and a highly customized and optimized combination of hardware and software. other than that, all you need to do is load the program data, and math data onto the local (on card) memory, instruct it to begin computing, and play tetris really fast until it's done..
>>i probably did the math wrong, but isnt 2 petaflops 2000 ghz?
well you could say that a system that can do 1 flop / cycle should be 2,000,000 GHz, or a system that can do 4,000,000 flop / cycle should be 500MHz (as in the Grape DR :D). ain't math grate :D :D
panoz @ Nov 7th 2006 4:30AM
OMG not grate as in http://en.wikipedia.org/wiki/Grate but great as in alexander the GREAT..
sorryyyyyyy :(
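To make the "clock speed != performance" point concrete: peak throughput is the clock rate multiplied by the floating-point operations completed per cycle, so the same flops target implies very different clocks depending on how much work each cycle does. Below is a minimal Python sketch of that arithmetic, using the 2-petaflop target and the 500MHz / 512-gigaflop chip figures from the article. The function names are made up for illustration, and this models peak numbers only, not sustained performance.

```python
# Peak flops = clock rate (Hz) * floating-point operations per cycle.
# Function names are hypothetical; the inputs are the figures quoted in
# the article and in the comments above.

def required_clock_hz(target_flops, flops_per_cycle):
    """Clock rate needed to hit target_flops at a given flops-per-cycle."""
    return target_flops / flops_per_cycle

def flops_per_cycle(peak_flops, clock_hz):
    """Operations per cycle implied by a peak rate and a clock speed."""
    return peak_flops / clock_hz

TWO_PETAFLOPS = 2 * 10**15

# One operation per cycle: the clock would have to be 2,000,000 GHz.
print(required_clock_hz(TWO_PETAFLOPS, 1) / 1e9, "GHz")

# Four million operations per cycle: 500 MHz is enough.
print(required_clock_hz(TWO_PETAFLOPS, 4_000_000) / 1e6, "MHz")

# A single chip quoted at 512 gigaflops and 500 MHz implies this many
# operations per cycle across the whole chip.
print(flops_per_cycle(512 * 10**9, 500 * 10**6), "flops per cycle per chip")
```

The per-chip figure is simply the article's two numbers divided; how that work is split across cores is not stated in the article.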
Samuel Bayliss @ Nov 7th 2006 4:43AM
This isn't particularly new. Check out Picochip (www.picochip.com) and Aspex Semiconductor (www.aspex-semi.com), two well-funded UK startups that are pursuing high throughput through parallel processing on many cores.
Rob @ Nov 7th 2006 9:02AM
Cheaper than Intel developing it... us grad students work long hours, for next to nothing. But, hey, at least we are making stuff that may be the next cool toy.
Squirrel @ Nov 7th 2006 9:27AM
Will it play DooM?
John Stracke @ Nov 7th 2006 9:42AM
The Engadget article has some inaccuracies here.
First: there are 1024 cores ("processor elements") here, not 512.
Second: it doesn't say anything about PCI-X; it just says "PCI". I expect they'd build it on PCI Express (which is *not* the same as PCI-X, which is just a faster version of old-style PCI).
Finally, it's not "a secondary chip"; it's designed as part of a massive computational cluster with two nodes. Every node has 64 PCs, every PC has two cards, every card has 8 processors, and every processor has 1024 cores. Cores, processors, cards, and nodes, how many were going to St. Ode's?
(Answer: two megacores.)
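For anyone who wants to check the riddle, here is a small Python sketch that multiplies out the hierarchy exactly as described in the comment above. The dictionary labels are only illustrative, and the counts are the commenter's figures rather than anything verified against the project's own documentation.

```python
from math import prod

# Counts at each level of the cluster, as given in the comment above.
LEVELS = {
    "nodes": 2,
    "PCs per node": 64,
    "cards per PC": 2,
    "processors per card": 8,
    "cores per processor": 1024,
}

total_cores = prod(LEVELS.values())

# Processors (chips) are every level except the last one.
total_processors = total_cores // LEVELS["cores per processor"]

print(total_processors)      # number of processor chips in the cluster
print(total_cores)           # total cores
print(total_cores / 2**20)   # the same total, in binary "mega" units
```

The total is 2,097,152 cores, which is exactly 2 x 2^20, hence "two megacores" in the binary sense of "mega".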