Warning: Mucking about with embedded controllers may not always be a great idea; you run the risk of permanently damaging hardware. A month or two after installing this on my machine, the bearings on both fans failed within a week of each other. Admittedly, the laptop was two years old, and it was an easy fix, but be aware of unintended consequences…
A few years ago I bought a nice laptop (Gigabyte Aero 15x), the idea being that I would have a decent laptop for gaming should I need it. Unfortunately for my wallet, I play fewer and fewer games, so the expense was a bit of a waste. I was able to somewhat retrospectively justify having it to play around with CUDA (Nvidia’s general-purpose computing platform for their graphics cards).
I run Linux full-time on the laptop (first Fedora, now Ubuntu), and it quickly became apparent that the operating system really didn’t care about what was going on with the cooling system and its fans. As such, the laptop would be fairly quiet up until the point where the onboard fan controller decided it had had enough of whatever I was subjecting the poor thing to, subsequently making enough noise to put a Boeing 747 (R.I.P) to shame.
To be fair, it wasn’t the noise I was concerned about, it was the heat. Reviewing the sensors showed the laptop would be quietly idling at 60 degrees, which was a bit of a shock.
I understand that higher temperatures are to be expected these days (though I don’t actually know what’s reasonable), but it was a little unsettling having the laptop I do all of my work on feel like it was going to burn my fingers off when it did anything of substance. I did try a re-paste, but that had little to no effect.
Having had enough, I did some googling and found “p37-ec”, which demonstrated how you could talk to the embedded controller in various models of Gigabyte laptops. It turns out that it can provide various temperature readings and has an option for enabling and setting custom fan speeds. What I ended up writing was:
- Application that sets the fans to max speed;
- Application for disabling custom fan speeds;
- Application that continuously reads temperatures and sets fan speeds accordingly;
- And finally, a systemd unit that wraps the main app.
The source code is available at GitHub
The Algorithm
The main application is straightforward:
- Detects if it’s being run on the correct platform;
- Checks whether the app is being run with super user privileges;
- Sets up logging for journald;
- Registers a bunch of signal handlers for clean shutdown;
- Runs the main loop:
  - Reads the CPU temperature;
  - Median filters it with a window of 3;
  - Detects whether the fan speeds need changing;
  - If so, changes them;
- On exit, disables custom fan speeds.
Safely, Safely
p37-ec reads a file that is exposed by a kernel module (ec_sys) that represents
the laptop’s on-board embedded controller. With this file, you can manipulate
various registers to do various things. However, opening up random files tied to
hardware is a dangerous game; it is very possible for you to flip a bit and
cause actual damage to the machine.
Consequently, we need to make sure that the laptop is the correct one.
Luckily, there are two files we can read to retrieve this information. The files and what we are looking for follow:
- /sys/devices/virtual/dmi/id/sys_vendor: “GIGABYTE”
- /sys/devices/virtual/dmi/id/product_name: “P65Q”
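As a rough sketch of that check (not the code from the repository), the two files can be read and compared against the expected strings. The helper name read_first_line and the idea of passing the paths as parameters are assumptions made here so the function can be tried against ordinary files; a file that cannot be opened is treated as a mismatch.

```cpp
#include <fstream>
#include <string>

// Read the first line of a text file; returns an empty string if the file
// cannot be opened. (Hypothetical helper, not from the original project.)
std::string read_first_line(const std::string& path) {
    std::ifstream in(path);
    std::string line;
    std::getline(in, line);
    // Drop any trailing whitespace left over after the newline is removed.
    while (!line.empty() &&
           (line.back() == ' ' || line.back() == '\t' || line.back() == '\r')) {
        line.pop_back();
    }
    return line;
}

// True only when both DMI strings match exactly. The default paths are the
// ones listed above; the parameters exist purely for testing (assumption).
bool is_aero15x(
    const std::string& vendor_path = "/sys/devices/virtual/dmi/id/sys_vendor",
    const std::string& product_path = "/sys/devices/virtual/dmi/id/product_name") {
    return read_first_line(vendor_path) == "GIGABYTE" &&
           read_first_line(product_path) == "P65Q";
}
```

Failing closed matters here: if either file is missing or unreadable, the safest answer is "not the right laptop".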
Logging
The logging implementation is a little more complicated than it needs to be; I wanted to use the log functions as if they were static, rather than operate on an object, as well as support swapping out different log writers, e.g.:
log::make_and_set_logger<journald_logger>();
log::set_level(log::level::debug);
if (!is_aero15x()) {
    log::fatal("This product is not compatible, a Gigabyte Aero15x (P65Q) is required.");
    std::exit(1);
}
This involved use of pointers to static objects, using the initialisation-on-first-use idiom. I found the technique in libstdc++ (GCC) and libc++ (Clang), where it is used to manage default memory resources, but I will save discussing this for another time.
Since we are expecting the main application to be run as a systemd unit, we want the output to be sent to journald. While systemd has its detractors, one of the things it makes easy is logging. journald optionally expects log lines to be prefixed with a message level, and that’s it. We don’t have to worry about log files and rotations, unless we specify otherwise. The important thing is that there is nothing really special we have to do in the application itself. The above code will produce the following output:
<7>Log level now debug
<7>Vendor is GIGABYTE, product is P65Q
<6>Aero 15x fan controller starting
<6>Changing speed to 195, for temp 70'c, next up 80, next down, 61
Where the levels 6 and 7 are for info and debug, respectively.
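To illustrate how little is needed on the application side, here is a minimal sketch of a writer that produces lines in that format. It is not the logger from the repository; the names level, format_line and write_line are made up for this example, and the numeric values are the standard syslog severities.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Syslog severities that journald understands as a "<N>" prefix.
// (Hypothetical enum for this sketch; values match <syslog.h>.)
enum class level { error = 3, warning = 4, info = 6, debug = 7 };

// Build one journald-friendly line: "<N>" followed by the message text.
std::string format_line(level lvl, const std::string& message) {
    std::ostringstream out;
    out << '<' << static_cast<int>(lvl) << '>' << message;
    return out.str();
}

// Write the line to stderr; when run as a systemd service, output on
// stdout/stderr is captured by journald and the prefix sets the priority.
void write_line(level lvl, const std::string& message) {
    std::cerr << format_line(lvl, message) << '\n';
}
```

Calling write_line(level::info, "Aero 15x fan controller starting") would emit the third line of the output shown above.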
Reading and Writing to the Embedded Controller
This is achieved by a simple class that opens the register file and has methods abstracting reads and writes of the registers. There’s nothing particularly special about it: for a given function, it seeks to the location in the register file and either reads or writes data.
The register addresses in the p37-ec repository don’t match the ones the Aero 15x actually uses. Luckily, Notebook Fan Control has the correct addresses.
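A minimal sketch of such a class follows, assuming one byte per register and a file that supports seeking. The class name ec_registers is made up for this example. With ec_sys loaded the file normally lives at /sys/kernel/debug/ec/ec0/io, but that path is an assumption on my part rather than something taken from the project, so the constructor takes the path as a parameter. No real register addresses are included.

```cpp
#include <cstdint>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>

// Byte-wide register access through a seekable file (hypothetical class).
class ec_registers {
public:
    // The file must already exist and be readable and writable.
    explicit ec_registers(const std::string& path)
        : file_(path, std::ios::in | std::ios::out | std::ios::binary) {
        if (!file_) throw std::runtime_error("cannot open " + path);
    }

    // Seek to the register's offset and read one byte.
    std::uint8_t read(std::uint8_t address) {
        file_.clear();
        file_.seekg(static_cast<std::streamoff>(address));
        char value = 0;
        if (!file_.read(&value, 1)) throw std::runtime_error("EC read failed");
        return static_cast<std::uint8_t>(value);
    }

    // Seek to the register's offset and write one byte, flushing immediately
    // so the value reaches the controller rather than sitting in a buffer.
    void write(std::uint8_t address, std::uint8_t value) {
        file_.clear();
        file_.seekp(static_cast<std::streamoff>(address));
        const char byte = static_cast<char>(value);
        file_.write(&byte, 1);
        file_.flush();
        if (!file_) throw std::runtime_error("EC write failed");
    }

private:
    std::fstream file_;
};
```

Throwing on any failed read or write keeps a half-applied change from going unnoticed, which seems the right default when the file is tied to hardware.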
Change Detection
The initial state is max; this will eventually settle to the correct state. It is better to take a cautious approach when starting; if the system is already melting down, it makes little sense to have to wait before the fans are set high. Also, it’s simpler than scanning through all the states to find the best one to start with. (Although arguably this could still happen if the fans were at the lowest state and you threw it into the sun, or started Chrome.) The values of the states have been hardcoded, mainly because letting a user supply arbitrary values seemed too risky.
The states aren’t particularly clever; I currently have them hard coded as:
State | Up (°C) | Down (°C) | Fan Speed |
---|---|---|---|
max | 255 | 75 | 229 |
high | 80 | 65 | 195 |
medium | 70 | 55 | 160 |
low | 60 | 45 | 125 |
min | 50 | -255 | 90 |
For a given state, if the temperature is greater than or equal to the “up” temperature, it will jump to the next higher state; if it falls to the “down” temperature, it will jump to the next lower one. Down temperatures are lower than the up temps so that we know we are a decent distance away from going back up and therefore at little risk of immediately changing back up again. For the max and min states, the up and down values respectively are ignored.
The down threshold is quite a bit lower than the next lowest state’s up threshold. This is to prevent flip-flopping, where the temperature hovers around one of the thresholds and the detector keeps switching between two states.
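The transition rule can be sketched as a small lookup over the table above. This is an illustration rather than the project's code: the names fan_state and next_state are made up, and treating "down" as less than or equal to the threshold is an assumption, since only the "up" comparison is spelled out.

```cpp
#include <array>
#include <cstddef>

struct fan_state {
    const char* name;
    int up;     // step up when the temperature is at or above this
    int down;   // step down when the temperature is at or below this (assumed <=)
    int speed;  // fan speed value for this state
};

// Ordered from coolest to hottest; values copied from the table.
const std::array<fan_state, 5> states{{
    {"min",    50, -255,  90},
    {"low",    60,   45, 125},
    {"medium", 70,   55, 160},
    {"high",   80,   65, 195},
    {"max",   255,   75, 229},
}};

// Return the index of the state to use after one temperature reading.
// The state moves by at most one step per reading, and the bounds checks
// are what make "up" irrelevant for max and "down" irrelevant for min.
std::size_t next_state(std::size_t current, int temperature) {
    const fan_state& s = states[current];
    if (current + 1 < states.size() && temperature >= s.up) return current + 1;
    if (current > 0 && temperature <= s.down) return current - 1;
    return current;
}
```

For example, starting in high (index 3), a reading of 70 leaves the state unchanged because it sits between the down threshold of 65 and the up threshold of 80; that gap is the hysteresis that stops the flip-flopping.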
Signal Handlers
The same handler is bound to SIGINT, SIGTERM, and SIGHUP, and it simply tells the main loop to terminate. While I was familiar with the former two, SIGHUP was new to me. This is received when whatever terminal runs the app “hangs up”, typically by it being closed. Without the handler, the program would be terminated without disabling the custom speeds, which is a potentially dangerous situation if the speeds were set low and the laptop started heating up.
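One common way to do this, sketched here with std::signal and a flag that the main loop polls, is shown below. Whether the project uses std::signal or sigaction is not stated, so treat this as an assumption; the names stop_requested, request_stop and install_stop_handlers are made up for the example.

```cpp
#include <csignal>

// Set by the handler and polled by the main loop. volatile sig_atomic_t is
// the type that is safe to write from inside a signal handler.
volatile std::sig_atomic_t stop_requested = 0;

// The handler does nothing except record that a stop was asked for.
extern "C" void request_stop(int) { stop_requested = 1; }

// Bind the same handler to all three signals.
void install_stop_handlers() {
    std::signal(SIGINT, request_stop);
    std::signal(SIGTERM, request_stop);
    std::signal(SIGHUP, request_stop);
}
```

The main loop then runs while stop_requested is zero, and the code after the loop disables the custom fan speeds. Keeping the handler down to a single assignment avoids calling anything that is unsafe in signal context.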
Filtering
While monitoring the CPU temperatures using command line tools, I noticed what I suspected to be erroneous transient readings. It would occasionally jump up 20 degrees one second, and drop a second later. Median filters are well known for removing outliers, and a window of three was chosen so that if two consecutive high temperatures were recorded, a change was guaranteed; a balance between preserving quietness and responding quickly to dramatic changes.
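A window-of-three median filter is small enough to sketch in full. This is an illustration, not the repository's implementation: the class name median3 is made up, and seeding the window with the first reading is an assumption.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Median filter over the last three readings (hypothetical class).
class median3 {
public:
    // Fill the window with an initial reading so the output is defined
    // from the very first call (assumption for this sketch).
    explicit median3(int initial) : window_{{initial, initial, initial}} {}

    // Store the new reading over the oldest one, then return the middle
    // value of the three most recent readings.
    int push(int reading) {
        window_[next_] = reading;
        next_ = (next_ + 1) % window_.size();
        std::array<int, 3> sorted = window_;
        std::sort(sorted.begin(), sorted.end());
        return sorted[1];
    }

private:
    std::array<int, 3> window_;
    std::size_t next_ = 0;
};
```

With a steady 50 degrees, a single reading of 70 still yields 50, so the spike is dropped; a second consecutive 70 makes the median 70, so a genuine rise is delayed by only one sample.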
systemd
The systemd unit file is responsible for describing how the service is started and stopped, and any preconditions that need to be met. The entirety of the file follows:
[Unit]
Description=Aero 15x Agressive Fan Controller
[Service]
ExecStartPre=-/sbin/modprobe -r ec_sys
ExecStartPre=/sbin/modprobe ec_sys write_support=1
ExecStart=/usr/bin/aero15x-fand
ExecStop=/bin/kill -INT ${MAINPID}
KillSignal=SIGINT
[Install]
WantedBy=multi-user.target
This service requires that the ec_sys kernel module is loaded. modprobe is called twice because we need to make sure the module is inserted with the correct parameters: the first call removes the module if it is already loaded, and the second inserts it with write support enabled. The first call is prefixed with a hyphen so that if the module is not loaded, the error that’s produced is ignored.
Results
So this will be pretty brief; with the fan service running, the laptop will idle at around 45 degrees, and web browsing and YouTube will push it somewhat higher. While it is noisier, it is much cooler to the touch, and far more comfortable to work with.