Drinking from the firehose: How the Mill CPU decodes 30+ instructions per cycle
The military maxim "amateurs study tactics, professionals study logistics" applies to CPU architecture as well as to armies. Less than 10% of the area and power budget of modern high-end cores is devoted to real work by the functional units such as adders; the other 90% marshalls instructions and data for those units and figures out what to do next.
A large fraction of this logistic overhead comes from instruction fetch and decode. Instruction encoding has subtle and far reaching effects on performance and efficiency throughout a core; for example, the intractable encoding used by x86 instructions is why the x86 will never provide the performance/power of other architectures having friendlier encoding.
Some 80% of executed operations are in loops. A software-pipelined loop has instruction-level parallelism (ILP) bounded only by the number of functional units available and the ability to feed them. The limiting factor is often decode; few modern cores can decode more than four instructions per cycle, and none more than 10. The Mill is a new general-purpose CPU architecture that breaks this barrier; high-end Mill family members can fetch, decode, issue and execute over 30 instructions per cycle.
This talk explains the fetch and decode part of the Mill architecture.
Speaker: Ivan Godard
Wednesday, 05/29/13
Contact:
Website: Click to VisitCost:
FreeSave this Event:
iCalendarGoogle Calendar
Yahoo! Calendar
Windows Live Calendar
