AMD should take on both Atom and Larrabee (Intel's next-gen hybrid CPU / GPU, for the Nehalem generation) with one chip. This chip would have 4 or 8 cores that share a cache.
AMD needs to choose a core that they already have running, so that this project can get to market fast. The Geodes (AMD Geode LX900 @ 600 MHz) in the UMPCs and OLPCs are not that bad, and consume 2-5 watts -- including graphics and chipset. Only a watt or two of that is the core. AMD was also faster than Intel in the 486 and 586 generations.
Good:
If AMD put 4 of these Geode LX cores in a chip and gave them a shared L2 and graphics-oriented SIMD instructions, they could match the Atom & 945gs pair. The low-cost Intel Atom with the 945-series chipset draws 14 watts (11+ watts of that is the chipset). We will only know whether AMD's chip is a good one when we know its full platform power draw, meaning the total including the chipset (with IDE, sata, pcix, usb, fan control, etc.).
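To make the power comparison concrete, here is a rough back-of-the-envelope sketch using only the figures quoted above. It assumes core power scales linearly with core count and ignores whatever the shared L2 and SIMD units would add, so treat the results as a naive estimate, not a measurement. The function names are made up for illustration.

```python
# Figures quoted in the post, in watts. Everything derived below is a naive estimate.
ATOM_PLATFORM_W = 14.0      # Atom plus 945-series chipset
ATOM_CHIPSET_W = 11.0       # quoted as "11+", so this is a lower bound
GEODE_CORE_W = (1.0, 2.0)   # "a watt or two" per Geode LX core


def quad_geode_core_power(cores=4, per_core=GEODE_CORE_W):
    """Assumes linear scaling with core count (ignores shared L2 and SIMD)."""
    low, high = per_core
    return cores * low, cores * high


def chipset_budget(platform_target=ATOM_PLATFORM_W, cores=4):
    """Watts left for chipset, graphics, etc. if the platform must not exceed the target."""
    low, high = quad_geode_core_power(cores)
    return platform_target - high, platform_target - low


# Upper bound on the Atom CPU itself, since the chipset takes at least 11 W.
atom_cpu_w = ATOM_PLATFORM_W - ATOM_CHIPSET_W
```

Under those assumptions, four Geode cores land somewhere in the 4-8 watt range, which leaves roughly 6-10 watts for the rest of the platform before it draws more than the Atom pair.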
Better:
Or go back further and use an in-order core like the AMD586. Add a vector unit that has SSE and new graphics instructions (remove the 80-bit floating point unit, and trap and emulate its instructions instead). If 8 of these cores were to share a memory controller, an L2 cache, and a branch prediction unit, the product would be really flexible.
When the chip is addressed as one full x86 core, the memory controller and branch prediction unit could be designed to use speculative execution so that the chip appears to be a single superscalar core (an Atom competitor). When the processor hits a branch instruction, it issues the "taken" instruction stream to one core and the "not taken" instruction stream to the next core.
The branch predictor/L2 cache/memory controller would have to keep track of the writes made by each core, and which instruction path that core was on. When the real path of the program is known, the branch predictor/L2 cache/memory controller would discard the writes of the cores that went down the wrong path, and commit the writes of the core that was on the correct path. With 8 cores you can be working on 8 states at once, which is 3 binary branches deep (2^3 = 8). I have no idea how complicated this is, but speculative execution is not a new idea.
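Here is a toy sketch of that commit/discard bookkeeping, just to show the idea in software. It is not how any real AMD part works: the class and function names are invented, each "core" is only a buffer of pending writes, and all cores are assumed to resolve at the same moment.

```python
from dataclasses import dataclass, field


@dataclass
class SpeculativeCore:
    """One simple core running one possible path, with its writes buffered."""
    path: tuple                               # assumed branch outcomes, e.g. (True, False)
    writes: dict = field(default_factory=dict)

    def store(self, addr, value):
        self.writes[addr] = value             # buffered, not yet visible in memory


def fork_paths(depth):
    """Enumerate every taken/not-taken combination for `depth` unresolved branches."""
    paths = [()]
    for _ in range(depth):
        paths = [p + (outcome,) for p in paths for outcome in (True, False)]
    return paths


def resolve(memory, cores, actual_outcomes):
    """Commit the writes of the core on the real path; the other buffers are dropped."""
    for core in cores:
        if core.path == tuple(actual_outcomes):
            memory.update(core.writes)
            return core
    raise ValueError("no core was running the resolved path")


memory = {"x": 0}
cores = [SpeculativeCore(path) for path in fork_paths(3)]   # 3 branches deep -> 8 cores
for i, core in enumerate(cores):
    core.store("x", i)                        # each path computes a different value
winner = resolve(memory, cores, [True, False, True])
```

Only the winning core's buffer ever reaches `memory`. The cost is visible too: every extra unresolved branch doubles the number of cores needed, so 8 cores cover at most 3 branches at a time.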
The same chip addressed as 8 cores is a Larrabee competitor. In this mode the L2 acts as a normal shared L2. The memory controller also acts like a normal shared memory controller. The branch predictor may not be used at all, or may just be prefetching the instructions and data down the most likely path for each CPU.
Note:
Re-thinking the 486 and 586 cores for mobile devices is already the subject of published work:
http://www.cs.washington.edu/research/smt/memoryLogix.pdf