Drinking from the firehose: How the Mill CPU decodes 30+ instructions per cycle

The military maxim "amateurs study tactics, professionals study logistics" applies to CPU architecture as well as to armies. Less than 10% of the area and power budget of modern high-end cores is devoted to real work by the functional units such as adders; the other 90% marshalls instructions and data for those units and figures out what to do next.

A large fraction of this logistic overhead comes from instruction fetch and decode. Instruction encoding has subtle and far reaching effects on performance and efficiency throughout a core; for example, the intractable encoding used by x86 instructions is why the x86 will never provide the performance/power of other architectures having friendlier encoding.

Some 80% of executed operations are in loops. A software-pipelined loop has instruction-level parallelism (ILP) bounded only by the number of functional units available and the ability to feed them. The limiting factor is often decode; few modern cores can decode more than four instructions per cycle, and none more than 10. The Mill is a new general-purpose CPU architecture that breaks this barrier; high-end Mill family members can fetch, decode, issue and execute over 30 instructions per cycle.

This talk explains the fetch and decode part of the Mill architecture.

Speaker: Ivan Godard

Wednesday, 05/29/13

04:15 PM

Contact:

Website: Click to Visit

Cost:

Free

Save this Event:

iCalendar
Google Calendar
Yahoo! Calendar
Windows Live Calendar

Skilling Auditorium

Stanford University
494 Lomita Mall
Stanford, CA 94305