Selecting 32-bit Cortex-M MCUs

Picking a 32-bit Cortex-M part is an odd exercise, since the core at the center of it is the same Arm design whatever name is printed on the package. The work sits in everything a vendor wrapped around that licensed core, and in the decision about whose silicon and whose tools a team wants to live with for the life of the product.

One core, a long list of names

Arm does not make chips. It licenses the Cortex-M core to the companies that do, so the same M4 or M0+ shows up inside parts from ST, NXP, Renesas, Microchip, Nordic, and a dozen smaller names. What rides along with the core is a shared instruction set and one debug interface over SWD. A developer who has worked one Cortex-M part is well on the way to working any of them, which is a large part of why the architecture spread the way it did.

The core is where the common ground ends. The clock tree that sets a baud rate is laid out differently on each part, the peripheral that drives a motor answers to its own set of registers, and the low-power modes are entered through a sequence the datasheet spells out one part at a time. None of it carries across without a rewrite, and that is the quiet reason a vendor choice tends to outlast several product generations once it has been made.

This is why core portability is a half-truth to keep in mind. The compiler output and the debug flow move from one vendor to another without much fuss. The driver layer that talks to the peripherals, often the larger half of the firmware by line count, does not move at all, and a team that treats the move as free finds that out on the bench.
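One way to keep the half that does not move contained is to put it behind a narrow interface, so that only the code under the interface is rewritten on a vendor change. The sketch below is a minimal illustration in C and not any vendor's API: uart_ops, log_line, and the mock_* backend are hypothetical names invented for this example, and the mock stands in for a real register-level driver so the code builds on a host.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical narrow interface: the only thing application code sees.
 * Each vendor port would supply its own implementation of these calls. */
typedef struct {
    int (*init)(unsigned long baud);            /* returns 0 on success */
    int (*write)(const char *buf, size_t len);  /* returns bytes written, or -1 */
} uart_ops;

/* Portable application code: depends on uart_ops, never on a register. */
int log_line(const uart_ops *uart, const char *msg)
{
    size_t len;

    if (uart == NULL || uart->write == NULL || msg == NULL) {
        return -1;
    }
    len = strlen(msg);
    if (uart->write(msg, len) != (int)len) {
        return -1;
    }
    return uart->write("\r\n", 2) == 2 ? 0 : -1;
}

/* Host-side mock backend (an assumption for this sketch). A real port
 * would program the vendor's clock tree and UART registers here instead. */
static char mock_buf[128];
static size_t mock_len;

static int mock_init(unsigned long baud)
{
    (void)baud;          /* a real driver would derive divider settings */
    mock_len = 0;
    mock_buf[0] = '\0';
    return 0;
}

static int mock_write(const char *buf, size_t len)
{
    if (mock_len + len >= sizeof mock_buf) {
        return -1;       /* would overflow the capture buffer */
    }
    memcpy(mock_buf + mock_len, buf, len);
    mock_len += len;
    mock_buf[mock_len] = '\0';
    return (int)len;
}

const uart_ops mock_uart = { mock_init, mock_write };

/* Lets a host test inspect what the application code sent. */
const char *mock_output(void)
{
    return mock_buf;
}
```

Everything above the uart_ops boundary would carry to another vendor's part unchanged. Everything below it is the rewrite, and the interface does nothing to shrink that work; it only marks where it starts.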

Selecting a Cortex-M part comes apart into three questions a team answers in some order. Which core, from the small M0+ to the heavy M7, the design needs. Which vendor’s catalog and ecosystem it commits to. And which exact part, in which package and memory size, the board takes.

The order is not fixed, and the order is where teams go wrong. A design with a radio to fit or a security box to tick settles the vendor first and lets the core fall out of that. A product being carried forward keeps the vendor it already knows and hunts for the nearest part. The trap is settling the core spec first, on a benchmark, then learning that the vendor who has that core lacks the peripheral or the supply the design also needs. A part picked for raw speed that turns out to carry one CAN controller where the design needed two, or that can be had in only a fraction of the volume the project will order, sends the whole search back to the start with weeks already gone.
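The ordering trap can be shown with a toy shortlist. The sketch below is illustrative only: the candidate and requirements structs, the part labels, and every number in the table are made up for the example, and clock speed stands in for whatever benchmark figure a team might be tempted to rank on.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *name;              /* hypothetical label, not a real part number */
    unsigned clock_mhz;            /* headline speed figure */
    unsigned can_controllers;      /* peripheral the design depends on */
    unsigned long units_per_year;  /* volume the supply chain can commit to */
} candidate;

typedef struct {
    unsigned min_can_controllers;
    unsigned long min_units_per_year;
} requirements;

static bool meets(const candidate *c, const requirements *r)
{
    return c->can_controllers >= r->min_can_controllers &&
           c->units_per_year >= r->min_units_per_year;
}

/* The trap: rank on speed alone. Returns the index of the fastest part,
 * or -1 for an empty list. */
int pick_fastest(const candidate *list, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (best < 0 || list[i].clock_mhz > list[(size_t)best].clock_mhz) {
            best = (int)i;
        }
    }
    return best;
}

/* The safer order: drop parts that miss a hard requirement, then rank
 * the survivors on speed. Returns -1 if nothing qualifies. */
int pick_fit(const candidate *list, size_t n, const requirements *r)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!meets(&list[i], r)) {
            continue;
        }
        if (best < 0 || list[i].clock_mhz > list[(size_t)best].clock_mhz) {
            best = (int)i;
        }
    }
    return best;
}

/* Made-up shortlist and needs, for illustration only. */
const candidate shortlist[] = {
    { "part_a", 480, 1,  50000UL },   /* fastest, but one CAN and thin supply */
    { "part_b", 170, 2, 500000UL },
    { "part_c",  64, 2,  20000UL },   /* enough CAN, not enough supply */
};
const requirements needs = { 2, 100000UL };
```

With this table, pick_fastest returns part_a and pick_fit returns part_b. The point is the order of the two steps, not the ranking rule, which a real team would replace with cost, power, or whatever matters once the hard requirements are met.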

The tools that travel with the core

A second thing comes with the Cortex-M core, and it shapes the choice as much as the silicon. The debug port is the same two-wire SWD on every part, so one probe and one debugger reach a chip from any vendor. The CMSIS headers give the core’s registers the same names everywhere. A GCC, IAR, or Keil toolchain targets all of them, and at the level of the core, retargeting a project comes down to little more than swapping the device header, the startup file, and the linker script.

That common floor is real money saved. A team’s years in a debugger, its build scripts, and its habits around the toolchain survive a vendor change in a way the peripheral code never does. An RTOS that speaks the CMSIS-RTOS interface ports the same way, sitting above the core rather than the chip.

Where the vendors split apart is the layer above that floor. Each ships its own configuration tool and its own peripheral library, ST with CubeMX and the HAL, NXP with MCUXpresso, Renesas with e2 studio, and each has its own notion of how a clock tree or a DMA channel is set up in code. That layer is large, the quality of it ranges from polished to barely tested, and it is where the bring-up weeks go. A vendor whose library is mature and whose example projects compile out of the box saves more calendar time than a faster core ever returns, and that is hard to read off a datasheet before an evaluation board is on the bench in front of you.

The debug probe is a small decision that rides along and is easy to forget. A J-Link or an ST-Link, the SWD header on the board, and the trace pins a hard real-time problem will want later are all settled at layout, when a crowded board has the least room to spare for any of them.

Starting at the STM32

An STM32F103 board: ST’s Cortex-M3 in an LQFP package, the part a lot of Cortex-M projects open with. (Photo: Wikimedia Commons.)

ST’s STM32 line is where a great many Cortex-M designs begin, and for defensible reasons. The portfolio runs from the cents-each G0 up through the F4 workhorses to the H7 performance parts, all on one toolchain and one set of habits. CubeMX generates the clock and pin setup, the HAL gives a running start on the peripherals, and the community has hit the bugs before you reach them.

The selection inside the STM32 range is a task of its own, since the series names hide real differences in core and in analog from one line to the next. A G0 and an H7 share almost nothing but the maker. That choice gets made part number by part number, working through the STM32 lines and the pin-compatible alternatives from GigaDevice or Artery that can cover a shortage, against the memory and the peripheral mix a design will lean on and against what the part costs at the volume the product ships.

The easy start is also the lock-in, and that is the thing to see clearly before a project commits. Adopting the STM32 means adopting CubeMX and the HAL with it, and that abstraction layer is both the running start and a weight the project carries from then on, generating code that works but reads badly and hides the registers it sits on. The community and the board support are a genuine asset, the kind that turns a two-week bring-up into a two-day one. What the choice quietly buys is ST’s peripheral model: the firmware grows a driver layer shaped around ST’s way of doing a timer or a DMA channel, and that layer is the part that does not survive a move to another vendor’s Cortex-M, even when the core would.

The bill for it arrived in full during the 2021 shortage, when STM32 lead times stretched past a year and parts on the shelf sold at broker prices. Teams that came through had either qualified a register-compatible part from GigaDevice or Artery ahead of time, close enough to drop in but different enough in analog trim and clock behavior to need a real re-test, or they had an independent distributor holding stock when the catalog houses could not ship. The ones who had done neither waited it out or paid the broker price. The lesson stuck, and lining up a second source for a Cortex-M part is now part of the design review on anything meant to ship in real volume.

The pull toward ST is real and usually the right call. It earns a second look only when a design carries a need ST does not serve cleanly.

When another vendor wins

A Nordic nRF52832, a Cortex-M4 with a Bluetooth radio on the die, soldered into a bike computer. The same core ships well beyond ST. (Photo: Wikimedia Commons.)

A few needs send a design past ST’s catalog. A radio on the die is the common one: Nordic’s nRF52 puts a Cortex-M4 next to a Bluetooth Low Energy radio, which is half of what a small connected product is built from. Ultra-low-power budgets and certain automotive or motor-control peripherals pull the same way, toward a vendor that built the part around the need. A metering design that has to run for years on a coin cell, or a motor drive that wants the timers and the fault inputs already wired for it, ends up where the silicon was shaped for that job and not where the catalog is widest. The whole field of Cortex-M families beyond the STM32 is where that search goes once a design names something ST does not lead on.

Leaving ST means leaving the ecosystem with it. The new vendor’s HAL and its thinner community are the price of the part that fit, paid back in bring-up time and in the bugs nobody has documented yet. The gain has to clear that cost, which a radio or a standby-current figure often does, and a benchmark edge of a few percent does not.

Where the core spec fits in

The core spec is the axis to settle late, after the vendor and the supply are known, because it is the easiest of the three to change on paper and the one teams over-think. A large share of general designs land on an M0+ for cost or an M4 for headroom, and the steps between the M0+, M3, M4, M7, and M33 map onto real needs rather than a number to win on.

An M4 or an M7 brings DSP instructions and, on most parts, a floating-point unit, both of which motor control and audio work lean on and a plain control loop never touches. An M7 adds cache and clock speed, with a determinism cost that a tightly timed loop has to plan around. An M33 adds the TrustZone boundary a security certification asks for. Below those headline features sit the details that settle a real design, like interrupt latency and the memory protection unit, which matter more to a control system than another handful of megahertz. Sorting out which Cortex-M core fits, and which selection criteria apply to a given design, is a study of its own.

Over-speccing the core is a quiet tax: an M7 costs more and draws more than the M0+ a simple job needed, on every unit shipped for the life of the product. Speccing it too low is the louder mistake, the one that forces a redesign when the firmware outgrows the part a year in. The core gets picked between those two errors, and the room to get it right is wide once the vendor and the part around it are settled. A loop running at a tenth of the headroom an M4 already offers has no use for an M7, and saying so at the first review saves the cost of finding it out at volume.
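The headroom check is simple arithmetic, and a rough version can be written down before the first review. The sketch below is a back-of-envelope model under stated assumptions: the function names are hypothetical, the cycle count would have to come from a measurement or an estimate on real hardware, and the model ignores interrupts, wait states, and everything else the firmware does outside the loop.

```c
#include <stdbool.h>

/* Share of the core a periodic loop consumes, in percent.
 *   cycles_per_iteration: measured or estimated cost of one pass
 *   loop_hz:              how often the loop runs
 *   core_hz:              core clock frequency
 * Returns a negative value if core_hz is not positive. */
double loop_load_percent(double cycles_per_iteration,
                         double loop_hz,
                         double core_hz)
{
    if (core_hz <= 0.0) {
        return -1.0;
    }
    return 100.0 * cycles_per_iteration * loop_hz / core_hz;
}

/* True when the load exceeds the budget the team has set for itself.
 * The budget is a project decision, not a property of the core. */
bool needs_faster_core(double load_percent, double budget_percent)
{
    return load_percent > budget_percent;
}
```

As an illustration with made-up numbers, a loop costing 800 cycles per pass, run at 10 kHz on a core clocked at 80 MHz, works out to 100 × 800 × 10,000 / 80,000,000 = 10 percent. Against a budget of, say, 70 percent, that design has no case for a faster core.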

What a team is choosing

The core is a commodity. The vendor, the tools, and the supply behind it are what a team buys.