John Valentine
OC: i5-4670K, ASRock Z87 Extreme4, Part 2

Getting more from Haswell, using power-limited profiles

We assessed the heat dissipation of our cooling solution (115 Watts constant), and used it to allow the processor to reach high clock speeds when the workload had modest heat output. If the load becomes too heavy, the processor's clocks are reduced to keep within the 115 Watt limit. Although this was successful, the firmware options are not flexible enough to make the most of every workload presented to the system, and each profile carries some compromises.

Overclock only with care!

We do not advocate any activity that would damage your equipment, and will not take responsibility for any consequences caused by following this article.

We take a responsible approach to overclocking, weighing performance gain and efficiency against the effort and risk invested in such a configuration.

Background

Profiles evolved

We've been running this processor using different profiles, as conditions have allowed:

  1. At stock speeds, on a stock fan, using default settings. Under heavy loads, the processor throttled itself to avoid its thermal ceiling, so we looked into clocking it intelligently. The motherboard was found to be driving the processor incorrectly.
  2. With the motherboard configured, a stable 3.9GHz all-core clock was achieved.
  3. The stock cooler was replaced with a larger air-cooling solution. A stable 4.2 GHz was configured manually, using only 3% more power per clock than at the stock clock.
  4. The motherboard was updated with new firmware. Its preset "4.2 GHz" profile then worked, with only a little tweaking.
  5. Instead of tuning just the clocks and voltage, we now throttle the clock, based on the power delivered to the processor. This achieves three new profiles within our power envelope of 115W.
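
The idea behind step 5 can be sketched as a toy model: pick the highest core multiplier whose estimated power still fits within the limit. This is only an illustration, not the firmware's actual algorithm. The linear power model and the 90 W and 130 W load figures are made-up assumptions, and the function names are hypothetical; only the 115 W limit, the 44x ceiling and the 0.8 GHz (8x) idle clock come from this article.

```python
BCLK_MHZ = 100  # Haswell base clock; multiplier x 100 MHz gives the core clock


def linear_power_model(watts_at_max, max_mult):
    """Hypothetical model: package power scales linearly with the multiplier."""
    def estimate(mult):
        return watts_at_max * mult / max_mult
    return estimate


def throttled_multiplier(estimate_watts, max_mult, limit_watts, min_mult=8):
    """Return the highest multiplier whose estimated power fits the limit."""
    for mult in range(max_mult, min_mult - 1, -1):
        if estimate_watts(mult) <= limit_watts:
            return mult
    return min_mult  # even the lowest multiplier exceeds the limit


# Made-up loads: one that fits within 115 W at 44x, and one that does not.
light = throttled_multiplier(linear_power_model(90.0, 44), 44, 115.0)
heavy = throttled_multiplier(linear_power_model(130.0, 44), 44, 115.0)

print(light * BCLK_MHZ / 1000, "GHz under the light (made-up) load")
print(heavy * BCLK_MHZ / 1000, "GHz under the heavy (made-up) load")
```

In this sketch the light load keeps the full multiplier, while the heavy load is stepped down until the estimate drops under the limit. Real power does not scale linearly with the clock, so treat the numbers as illustrative only.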

This article explains the configuration of our current setting: power-limited clocking.

Relevant hardware

| Component | Details |
|---|---|
| Motherboard | ASRock Z87 Extreme4 |
| Processor | Intel i5-4670K 3.4GHz |
| Cooling | Scythe Mugen 4 PCGH Edition (air cooling, push-pull slow fans) |
| RAM | DDR3 1600 9-9-9-24 |

'Power Limit' Profiles

| Profile | 4.2 GHz | 4.3 GHz | 4.4 GHz |
|---|---|---|---|
| Core multiplier | 42x max, all | 43x max, all | 44x max, all |
| Cache multiplier | 32x max | 43x max | 33x max |
| VCore | 1.000 V −0.025 V | [1] [3] | 1.000 V +0.060 V [4] |
| VCache | 1.160 V +0.010 V | [1] | 1.160 V +0.010 V |
| Load Line Calib. | Level 4 | Level 4 | Level 3 |
| CPU Input | 1.88 V | 1.90 V [1] | 1.90 V |
| Sys Agent | +0.1 V | +0.1 V | +0.1 V |
| Power limit: short | 125 W, 1s; not reached | 125 W, 1s; reached? | 125 W, 1s; reached |
| Power limit: long | 115 W; not reached | 115 W; reached | 115 W; reached |
| Idle test | 0.675V, 0.8GHz, 25–30°C | 0.738V, 0.8GHz, 28–33°C | 0.745V, 0.8GHz, 31–38°C |
| POV-Ray test | 1.12V, 4.2 GHz, 63.7 secs, 60°C | 1.189V, 4.3 GHz, 62.7 secs, 66°C | 1.215V, 4.4 GHz, 60.2 secs, 66°C |
| Burn-in test | 1.232V, 4.2–4.2 GHz, 78–84°C, 108 GFLOPS | 1.280V, 3.9–4.3 GHz, 80–86°C, 103 GFLOPS | 1.306V, 3.7–4.4 GHz, 82–86°C, 100 GFLOPS |

Notes

  1. Default for the firmware's "4.2" profile.
  2. Ambient: 18°C.
  3. VCore had no effect; VCore Offset was the only effective parameter. We speculate that when the firmware's "4.2" profile is still active, the parameters have a slightly different meaning.
  4. Not finely tuned; might work in the range +0.02V to +0.06V.

Conclusions

We felt a little limited by the cooling, and the power throttling at 115W became necessary, particularly above 4.2 GHz, to keep core temperatures from exceeding 85°C.
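
As a rough illustration of why the limit bites hardest above 4.2 GHz, the sketch below applies the common rule of thumb that dynamic power scales with voltage squared times frequency. That rule is an assumption brought in here, not something we measured: it ignores leakage and temperature, and the burn-in voltages from the table were read while the faster profiles were already throttling. The function name is hypothetical.

```python
def relative_dynamic_power(volts, ghz, base_volts, base_ghz):
    """Estimate power relative to a baseline, assuming P is proportional to V^2 * f."""
    return (volts / base_volts) ** 2 * (ghz / base_ghz)


# Burn-in VCore readings and target clocks, taken from the profile table.
BASE_VOLTS, BASE_GHZ = 1.232, 4.2

ratio_43 = relative_dynamic_power(1.280, 4.3, BASE_VOLTS, BASE_GHZ)
ratio_44 = relative_dynamic_power(1.306, 4.4, BASE_VOLTS, BASE_GHZ)

print(f"4.3 GHz profile, unthrottled estimate: {ratio_43:.2f}x the 4.2 GHz profile")
print(f"4.4 GHz profile, unthrottled estimate: {ratio_44:.2f}x the 4.2 GHz profile")
```

Under this assumption, a few percent more clock costs noticeably more than a few percent in power, which is consistent with the faster profiles reaching the 115 W limit while the 4.2 GHz profile does not.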

Performance-wise, the 4.4 GHz profile was 6% faster in POV-Ray than the 4.2 GHz profile. We'd only consider using it if we could keep the processor busy 24/7, with no regard for energy costs. For 'special instruction' work, like the burn-in test, performance at 4.2 GHz was 8% better than at 4.4 GHz. The voltage offset required to keep the system stable at 4.4 GHz also raises power consumption, which exceeded our 115 Watt limit, so throttling reduced the core multipliers to as low as 37x, a loss of 16% of their clock frequency at worst. Incidentally, the 'large' burn-in test at 4.2 GHz yielded 113.2 GFLOPS, exceeding our previous fixed-VCore 4.4 GHz result.
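
For anyone wanting to check the percentages, they follow directly from the figures in the profile table. The short sketch below reproduces the arithmetic; the helper name is ours and purely illustrative.

```python
def percent_gain(better, worse):
    """Percentage by which `better` exceeds `worse`."""
    return (better / worse - 1) * 100


# POV-Ray render times in seconds: lower is better, so speed is the inverse ratio.
povray_speedup = percent_gain(63.7, 60.2)   # 4.4 GHz versus 4.2 GHz

# Burn-in throughput in GFLOPS: higher is better.
burnin_gain = percent_gain(108, 100)        # 4.2 GHz versus 4.4 GHz

# Worst-case throttling on the 4.4 GHz profile: 44x down to 37x.
clock_loss = (1 - 37 / 44) * 100

print(f"POV-Ray speed-up at 4.4 GHz: {povray_speedup:.1f}%")
print(f"Burn-in advantage at 4.2 GHz: {burnin_gain:.1f}%")
print(f"Worst-case clock loss at 4.4 GHz: {clock_loss:.1f}%")
```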

We found that the cache multiplier gave no significant performance increase between 32x and 44x. In fact, higher cache speeds contributed significantly to heat output, and we found that lower cache speeds and voltages allowed more flexibility elsewhere (e.g. allowing 4.4 GHz core clocks at modest VCore offset).

Further work

Some of the supporting voltages in the 4.2 GHz profile could be lowered a bit more. We'll benchmark that result, to see if performance suffers. Perhaps unsurprisingly, the lowered voltages almost converged with those of the original "profile 1b". Further, this round of tweaks might have increased the stability margins, allowing efficiencies or gains to be made elsewhere.

The 4.3 GHz profile would benefit from a lower cache clock and voltage. We expect this would increase the throttled clock speed under high load, to between 4.0 GHz and 4.1 GHz.

If our results are below par for this cooling solution, then we'll look at re-seating it.