Cooling an Aero15x

Too Hot To Handle

Warning: Mucking about with embedded controllers may not always be a great idea; you run the risk of permanently damaging hardware. A month or two after installing this on my machine, the bearings on both fans failed within a week of each other. Admittedly, the laptop was two years old, and it was an easy fix, but be aware of unintended consequences…

A few years ago I bought a nice laptop (Gigabyte Aero 15x), the idea being that I would have a decent laptop for gaming should I need it. Unfortunately for my wallet, I play fewer and fewer games, so the expense was a bit of a waste. I was able to justify it, somewhat retrospectively, as a way to play around with CUDA (Nvidia's general purpose computing language for their graphics cards).

I run Linux full-time on the laptop (first Fedora, now Ubuntu), and it quickly became apparent that the operating system really didn't care about what was going on with the cooling system and its fans. As such, the laptop would be fairly quiet up until the point where the onboard fan controller decided it had had enough of whatever I was subjecting the poor thing to, at which point it would make enough noise to put a Boeing 747 (R.I.P.) to shame.

To be fair, it wasn't the noise I was concerned about; it was the heat. Checking the sensors showed the laptop quietly idling at 60 degrees, which was a bit of a shock.

I understand that higher temperatures are to be expected these days (though I don't actually know what's reasonable), but it was a little unsettling having the laptop I do all of my work on feel like it was going to burn my fingers off whenever it did anything of substance. I did try a re-paste, but that had little to no effect.

Having had enough, I did some googling and found “p37-ec”, which demonstrated how to talk to the embedded controller in various models of Gigabyte laptop. It turns out that the controller provides various temperature readings and has an option for enabling and setting custom fan speeds. What I ended up writing was a small fan controller daemon, aero15x-fand.

The source code is available on GitHub.

The Algorithm

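The main application is straightforward: check that it is running on a supported laptop, enable custom fan speeds, then loop, reading and filtering the CPU temperature and changing the fan speed whenever the temperature crosses a threshold. When it is told to stop, it disables the custom fan speeds before exiting.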
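As a rough sketch of that shape, the loop below takes hypothetical hooks so it can be exercised without hardware. `fan_hooks`, `run` and the member names are illustrative and are not taken from aero15x-fand. In this sketch the loop writes to the controller only when the desired speed changes, and it always disables custom speeds on the way out.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical hooks standing in for the real pieces (EC access, filter,
// change detector). Names and signatures are assumptions for this sketch.
struct fan_hooks {
    std::function<void()> enable_custom_speeds;
    std::function<int()> read_temp;          // raw CPU temperature
    std::function<int(int)> filter;          // e.g. a median filter
    std::function<int(int)> speed_for;       // change detection: temp -> fan speed
    std::function<void(int)> set_speed;      // write the speed to the EC
    std::function<void()> disable_custom_speeds;
};

// Runs until should_stop() returns true (e.g. after a signal), then always
// disables the custom fan speeds before returning.
void run(const fan_hooks& hooks, const std::function<bool()>& should_stop,
         std::chrono::milliseconds interval) {
    hooks.enable_custom_speeds();
    int current = -1;  // nothing written yet
    while (!should_stop()) {
        const int temp = hooks.filter(hooks.read_temp());
        const int speed = hooks.speed_for(temp);
        if (speed != current) {  // only touch the EC when the speed changes
            hooks.set_speed(speed);
            current = speed;
        }
        std::this_thread::sleep_for(interval);
    }
    hooks.disable_custom_speeds();
}
```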

Safely, Safely

p37-ec reads a file, exposed by a kernel module (ec_sys), that represents the laptop's on-board embedded controller. Through this file you can read and write the controller's registers, which govern things like the fans. However, opening up random files tied to hardware is a dangerous game; it is very possible to flip a bit and cause actual damage to the machine.

Consequently, we need to make sure that the laptop is the correct one.

Luckily, there are two files we can read to retrieve this information: one gives the vendor, which must be GIGABYTE, and the other the product, which must be P65Q.
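Here is a minimal sketch of the check. It assumes the vendor and product come from the standard Linux DMI files `/sys/class/dmi/id/sys_vendor` and `/sys/class/dmi/id/product_name`. The post does not name the two files, so these paths are an assumption. sysfs values end in a newline, which is why the values are trimmed.

```cpp
#include <fstream>
#include <string>

// Returns the first line of a file with trailing whitespace removed, or an
// empty string if the file cannot be read.
std::string read_trimmed(const std::string& path) {
    std::ifstream in(path);
    std::string line;
    std::getline(in, line);
    const auto end = line.find_last_not_of(" \t\r\n");
    return end == std::string::npos ? std::string{} : line.substr(0, end + 1);
}

// Assumption: these DMI paths are standard on Linux, but the original post
// does not say which two files it reads.
bool is_aero15x(const std::string& vendor_path = "/sys/class/dmi/id/sys_vendor",
                const std::string& product_path = "/sys/class/dmi/id/product_name") {
    return read_trimmed(vendor_path) == "GIGABYTE" &&
           read_trimmed(product_path) == "P65Q";
}
```

Refusing to run on anything else is deliberately strict: a missing or unreadable file simply fails the check.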

Logging

The logging implementation is a little more complicated than it needs to be. I wanted to call the log functions as if they were static functions rather than methods on an object, and also to support swapping out different log writers. For example:

log::make_and_set_logger<journald_logger>();
log::set_level(log::level::debug);

if (!is_aero15x()) {
    log::fatal("This product is not compatible, a Gigabyte Aero15x (P65Q) is required.");
    std::exit(1);
}

This involved using pointers to static objects, following the initialisation on first use idiom. I picked up the technique from libstdc++ (GCC) and libc++ (Clang), where it is used to manage the default memory resources, but I will save discussing that for another time.

Since the main application is expected to run as a systemd unit, we want its output to go to journald. While systemd has its detractors, it does make logging easy: journald optionally accepts a message level as a prefix on each log line, and that's it. We don't have to worry about log files and rotation unless we want to. The important thing is that the application itself doesn't have to do anything special. The code above produces the following output:

<7>Log level now debug
<7>Vendor is GIGABYTE, product is P65Q
<6>Aero 15x fan controller starting
<6>Changing speed to 195, for temp 70'c, next up 80, next down, 61

Where the levels 6 and 7 are for info and debug, respectively.
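The following is a simplified sketch of the idea, not the author's implementation. A function-local static holds the current writer, so it is constructed on first use, and `make_and_set_logger` swaps it out. The `log_sketch` namespace, `writer`, `journald_writer` and `emit` are hypothetical names. The namespace is not called `log` here, to avoid clashing with `log` from the C maths library. The numeric levels are the standard syslog priorities that journald recognises in a `<N>` prefix.

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <utility>

namespace log_sketch {

// syslog priorities, as understood by journald in a "<N>" line prefix.
enum class level { error = 3, warning = 4, info = 6, debug = 7 };

// Swappable output (hypothetical interface).
struct writer {
    virtual ~writer() = default;
    virtual void write(level lvl, const std::string& msg) = 0;
};

// Prefixes each line with "<N>" so journald can pick up the level.
struct journald_writer : writer {
    explicit journald_writer(std::ostream& out = std::cout) : out_(out) {}
    void write(level lvl, const std::string& msg) override {
        out_ << '<' << static_cast<int>(lvl) << '>' << msg << '\n';
    }
private:
    std::ostream& out_;
};

struct state {
    std::unique_ptr<writer> w = std::make_unique<journald_writer>();
    level threshold = level::info;
};

// Initialisation on first use: constructed the first time any log call
// needs it, avoiding static initialisation order problems.
inline state& global() {
    static state s;
    return s;
}

template <typename W, typename... Args>
void make_and_set_logger(Args&&... args) {
    global().w = std::make_unique<W>(std::forward<Args>(args)...);
}

inline void set_level(level lvl) { global().threshold = lvl; }

inline void emit(level lvl, const std::string& msg) {
    state& s = global();
    // Lower numbers are more severe; drop anything less severe than the threshold.
    if (static_cast<int>(lvl) <= static_cast<int>(s.threshold)) s.w->write(lvl, msg);
}

inline void error(const std::string& m) { emit(level::error, m); }
inline void info(const std::string& m) { emit(level::info, m); }
inline void debug(const std::string& m) { emit(level::debug, m); }

}  // namespace log_sketch
```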

Reading and Writing to the Embedded Controller

This is achieved by a simple class that opens the register file and provides methods that abstract reading and writing the registers. There's nothing particularly special about it: for a given function, it seeks to the register's location in the file and either reads or writes data.

The register addresses in the p37-ec repository don't match the actual ones for the Aero 15x. Luckily, Notebook Fan Control has the correct addresses.
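Here is a minimal sketch of such a class. It assumes ec_sys exposes the controller at `/sys/kernel/debug/ec/ec0/io`, one byte per register. The path is a constructor argument so the class can be tried against an ordinary file. The actual Aero 15x register offsets (from Notebook Fan Control) are not reproduced here; `ec_registers` is a hypothetical name.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>

// One byte per register: register N lives at offset N in the file.
// Assumed real path: /sys/kernel/debug/ec/ec0/io (requires root).
class ec_registers {
public:
    explicit ec_registers(const std::string& path)
        : file_(path, std::ios::in | std::ios::out | std::ios::binary) {
        if (!file_) throw std::runtime_error("cannot open " + path);
    }

    std::uint8_t read(std::uint8_t reg) {
        file_.seekg(reg, std::ios::beg);  // seek to the register's offset
        char byte = 0;
        if (!file_.get(byte)) throw std::runtime_error("read failed");
        return static_cast<std::uint8_t>(byte);
    }

    void write(std::uint8_t reg, std::uint8_t value) {
        file_.seekp(reg, std::ios::beg);
        file_.put(static_cast<char>(value));
        file_.flush();  // push the byte through to the controller immediately
        if (!file_) throw std::runtime_error("write failed");
    }

private:
    std::fstream file_;
};
```

On the real file, writes only succeed when ec_sys has been loaded with `write_support=1`.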

Change Detection

The initial state is max; from there it will settle into the correct state. It is better to take a cautious approach when starting: if the system is already melting down, it makes little sense to wait before the fans are set high. It's also simpler than scanning through all the states to find the best one to start with. (Although, arguably, a meltdown could still happen if the fans were at the lowest state and you threw the laptop into the sun, or started Chrome.) The state values are hardcoded, mainly because I thought it would be dangerous to let a user supply arbitrary values.

The states aren't particularly clever; I currently have them hardcoded as:

State    Up (°C)  Down (°C)  Fan speed
max      255      75         229
high     80       65         195
medium   70       55         160
low      60       45         125
min      50       -255       90

For a given state, if the temperature is greater than or equal to the “up” temperature, it will jump to the next highest state, and likewise to the next lowest state for the “down” temperature. Down temperatures are lower than up temperatures, so once we step down we are a decent distance away from going back up, and therefore at little risk of immediately changing back again. The max state has nowhere higher to go and the min state nowhere lower, so their “up” and “down” values respectively are ignored (hence the placeholder values of 255 and -255).

The down threshold is also quite a bit lower than the next lowest state's up threshold. This prevents flip-flopping, where the temperature hovers around one of the thresholds and the detector keeps switching between two states.
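The table and rules above translate into a small state machine. This sketch copies the values from the table and moves at most one state per reading. It assumes “down” means less than or equal to the threshold, since the post doesn't say whether that comparison is strict. `fan_state`, `fan_states` and `change_detector` are illustrative names.

```cpp
#include <array>
#include <cstddef>

struct fan_state {
    const char* name;
    int up;     // step up when temp >= up
    int down;   // step down when temp <= down (assumed, could be strict)
    int speed;  // value written to the fan speed register
};

// Values from the table, ordered from coolest to hottest. 255 and -255 are
// placeholders: max never steps up, min never steps down.
constexpr std::array<fan_state, 5> fan_states{{
    {"min", 50, -255, 90},
    {"low", 60, 45, 125},
    {"medium", 70, 55, 160},
    {"high", 80, 65, 195},
    {"max", 255, 75, 229},
}};

class change_detector {
public:
    // Start cautiously at max and let the temperature bring it down.
    change_detector() : index_(fan_states.size() - 1) {}

    // Moves at most one state per reading; returns true if the state changed.
    bool update(int temp) {
        const fan_state& s = fan_states[index_];
        if (temp >= s.up && index_ + 1 < fan_states.size()) {
            ++index_;
            return true;
        }
        if (temp <= s.down && index_ > 0) {
            --index_;
            return true;
        }
        return false;
    }

    const fan_state& current() const { return fan_states[index_]; }

private:
    std::size_t index_;
};
```

Because each state's down threshold sits below the next lower state's up threshold, a reading that triggers a step down never immediately satisfies the step-up condition of the new state.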

Signal Handlers

The same handler, which simply ends the loop, is bound to SIGINT, SIGTERM, and SIGHUP. While I was familiar with the first two, SIGHUP was new to me: it is received when whatever terminal runs the app “hangs up”, typically because it has been closed. Without the handler, the program would be terminated without disabling the custom speeds, a potentially dangerous situation if the speeds were set low and the laptop started heating up.
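Here is a minimal POSIX sketch of binding one handler to all three signals with `sigaction`. The handler only clears a `volatile std::sig_atomic_t` flag, one of the few things a handler can safely do. The main loop would poll the flag and disable custom speeds once it sees it cleared. `keep_running`, `request_stop` and `install_stop_handlers` are hypothetical names.

```cpp
#include <csignal>
#include <initializer_list>
#include <signal.h>  // POSIX sigaction, sigemptyset

// Polled by the main loop: while (keep_running) { ... }
volatile std::sig_atomic_t keep_running = 1;

extern "C" void request_stop(int) { keep_running = 0; }

// Binds the same handler to SIGINT, SIGTERM and SIGHUP; false on failure.
bool install_stop_handlers() {
    struct sigaction sa{};
    sa.sa_handler = request_stop;
    sigemptyset(&sa.sa_mask);
    for (int sig : {SIGINT, SIGTERM, SIGHUP}) {
        if (sigaction(sig, &sa, nullptr) != 0) return false;
    }
    return true;
}
```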

Filtering

While monitoring the CPU temperatures using command line tools, I noticed what I suspected were erroneous transient readings: the temperature would occasionally jump up 20 degrees for one second and drop back the next. Median filters are well known for removing outliers, and a window of three was chosen so that two consecutive high readings are guaranteed to cause a change; a balance between preserving quietness and responding quickly to dramatic changes.
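Here is a sketch of a median-of-three filter along those lines; `median3` is a hypothetical name. Until three readings have arrived, the window is filled with the first reading, so the output is never computed from empty slots.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Median of the last three readings: a single spike is discarded, but two
// consecutive high readings become the median and get through.
class median3 {
public:
    int push(int reading) {
        if (!primed_) {
            window_.fill(reading);  // no bogus median before three samples exist
            primed_ = true;
        } else {
            window_[next_] = reading;  // overwrite the oldest reading
            next_ = (next_ + 1) % window_.size();
        }
        std::array<int, 3> sorted = window_;
        std::sort(sorted.begin(), sorted.end());
        return sorted[1];
    }

private:
    std::array<int, 3> window_{};
    std::size_t next_ = 0;
    bool primed_ = false;
};
```

With readings 60, 80, 60, 60, 75, 76 it yields 60, 60, 60, 60, 60, 75. The lone spike to 80 and the first 75 are absorbed, while the second consecutive high reading gets through.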

systemd

The systemd unit file describes how the service is started and stopped, and any preconditions that need to be met. The entirety of the file follows:

[Unit]
Description=Aero 15x Aggressive Fan Controller

[Service]
ExecStartPre=-/sbin/modprobe -r ec_sys
ExecStartPre=/sbin/modprobe ec_sys write_support=1
ExecStart=/usr/bin/aero15x-fand
ExecStop=/bin/kill -INT ${MAINPID}
KillSignal=SIGINT

[Install]
WantedBy=multi-user.target

This service requires the ec_sys kernel module to be loaded. modprobe is called twice to make sure the module is inserted with the correct parameters. The first call removes the module, and is prefixed with a hyphen so that the error produced when the module isn't loaded is ignored. The second call then inserts it with write_support=1.
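Since the daemon depends on write support being enabled, it could also check this at startup. This is a hypothetical addition, not part of the service above. It assumes the parameter is readable at `/sys/module/ec_sys/parameters/write_support`, where a boolean module parameter reads as `Y` or `N`.

```cpp
#include <fstream>
#include <string>

// Assumed sysfs location of the ec_sys write_support parameter; a bool module
// parameter reads back as "Y" or "N". Returns false if it can't be read.
bool ec_write_support_enabled(
        const std::string& path = "/sys/module/ec_sys/parameters/write_support") {
    std::ifstream in(path);
    std::string value;
    return static_cast<bool>(in >> value) && value == "Y";
}
```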

Results

So this will be pretty brief: with the fan service running, the laptop will idle at around 45 degrees, and web browsing and YouTube will bring it up to around … degrees. While it is noisier, it is much cooler to the touch, and far more comfortable to work with.