python vs c++ vs rust — a personal adventure

As I’ve written about many times before, I write software for embedded devices, which are now known as internet-of-things devices, or IoT devices.

Embedded development goes back as far as regular computing, at least to when the first mini-computers were released, because mini-computers were small enough and inexpensive enough (relative to mainframes) to be purposed towards specific tasks and then left alone to run unattended. As the computers continued to shrink in size and grow in capability, embedded system design leveraged those advances, including when Ethernet first showed up on systems. Why networked embedded systems were suddenly rebranded as the Internet of Things makes no sense to me, considering that they began to appear in the early 1980s, when DEC (Digital Equipment Corporation) started shipping VAXen with DECnet Phase IV. While VAXen were rather large, there were microVAXes and smaller PDP-11/44s in particular that wound up in unattended control systems.

The real advances began with VMEbus (1981) and other “big board” systems. As for me, the really interesting and affordable evolution began with PC/104 in the early 1990s. Boards got a lot smaller and more capable, while older designs kept getting cheaper. That evolution has continued relentlessly until today, when I can go onto Amazon or Adafruit and purchase single-board computers about the size of a pack of chewing gum, with embedded 32-bit SoCs, for anywhere from US$4 up to around US$25. Today I work with Raspberry Pi Picos as well as a number of Espressif ESP32-S3 and ESP32-C3 developer boards.

What’s key about these little boards is how they communicate with personal computers. They’ve standardized on USB as the method for communicating with a host computer. All of my boards show up on my systems (macOS, Linux, and Windows) as serial terminal devices. For example, on Linux one of the devices might show up as /dev/ttyACM0, another as /dev/ttyUSB0, and so forth. I need to know the device name so I can open those devices and work with them, whether I’m using something like screen, or programming tools such as Thonny or the Arduino IDE.

Life is simple when you only have one device plugged in. It’s really obvious what the device is. Life gets more complicated when you have more than one. At one time I had eight plugged in and running, and I can assure you that keeping them all straight became a challenge. To start, I wrote a one-line shell alias to list any and all connected boards:

alias devices='ll /dev/tty[AU]*'

When I typed devices at a shell prompt, I’d get the following:

crw-rw---- 1 root dialout 166, 0 Dec 31 17:27 /dev/ttyACM0
crw-rw---- 1 root dialout 188, 0 Dec 21 21:00 /dev/ttyUSB0
crw-rw---- 1 root dialout 188, 1 Dec 23 11:37 /dev/ttyUSB1
crw-rw---- 1 root dialout 188, 2 Dec 21 21:00 /dev/ttyUSB2
crw-rw---- 1 root dialout 188, 3 Dec 25 20:06 /dev/ttyUSB3

Looks good, except for one small problem. What is actually attached to each port? I know through experience that the ACM0 port is connected to a Raspberry Pi Pico W. The other four are Espressif ESP32 developer boards. The Pico is running CircuitPython 8 beta 6. The ESP32 developer boards are running diverse software loads, only one of which is MicroPython, while the other three are C++ applications I’ve compiled with the ESP-IDF and flashed onto the various boards. So I decided to document which boards were on which port in a small spreadsheet. That worked well until one day, when I’d unplugged one of the boards to add some components to it, Duke Energy decided to drop power to the house. When power returned and I plugged the developer board back in, all the Espressif ports had been scrambled. What used to be on USB0 was on USB2, USB1 had dropped down to USB0, and so on. So I went looking deeper into the Linux kernel and how it manages serial devices, and learned that all those USB devices are also listed under /dev/serial/by-id. Here’s what a full directory listing looks like:

lrwxrwxrwx 1 root root 13 Dec 31 17:27 usb-Raspberry_Pi_Pico_W_E6614C311B633831-if00 -> ../../ttyACM0
lrwxrwxrwx 1 root root 13 Dec 21 21:00 usb-Silicon_Labs_CP2102N_USB_to_UART_Bridge_Controller_7eab1642dbfaeb1198773ca4c6d924ec-if00-port0 -> ../../ttyUSB2
lrwxrwxrwx 1 root root 13 Dec 21 21:00 usb-Silicon_Labs_CP2102N_USB_to_UART_Bridge_Controller_8698c1b85519ec11982039bff95a09b1-if00-port0 -> ../../ttyUSB3
lrwxrwxrwx 1 root root 13 Dec 23 11:37 usb-Silicon_Labs_CP2102N_USB_to_UART_Bridge_Controller_d0ed30374c18ec1199b543103803ea95-if00-port0 -> ../../ttyUSB1
lrwxrwxrwx 1 root root 13 Dec 21 21:00 usb-Silicon_Labs_CP2102N_USB_to_UART_Bridge_Controller_e08d862b0867ec118f12a17089640db2-if00-port0 -> ../../ttyUSB0

Note that those are symlinks to the devices. So now I had an idea for a utility to write.

I’ve spent the last few weeks tinkering with three different languages to create a small utility that reads the entries in /dev/serial/by-id and processes them as strings so they can be listed in a cleaner way. I also wanted to capture that information in some simple data structures, especially the long hexadecimal ID, which does not change. The terminal device names may change, but the hex IDs remain the same.

Here’s the first utility I wrote in Python. Took me about an hour of tinkering (where I was interrupted by Life a few times; remember I’m retired and this is supposed to be a hobby).

#!/usr/bin/env python3

import os, re

DEVICE_PATH = '/dev/serial/by-id/'
devices = {}

for device in os.listdir(DEVICE_PATH):
    # Resolve the symlink to get the tty name, e.g. ttyUSB0.
    link_name = os.path.basename(os.readlink(DEVICE_PATH + device))
    # The hex ID sits between an underscore and a dash.
    re_result = re.search(r'_[0-9A-Fa-f]+-', device)
    # Everything before the hex ID is the interface name.
    interface_name = device.split(re_result[0])[0].replace('usb-','').replace('_',' ')
    hex_str = re_result[0].lstrip('_').rstrip('-')
    item = [interface_name, hex_str]
    devices[link_name] = item

# Print the entries ordered by tty name.
for device in sorted(devices.keys()):
    values = devices[device]
    print("{}, {}, {}".format(device, values[0], values[1]))

And here’s the output it produces when it’s run on my system.

ttyACM0, Raspberry Pi Pico W, E6614C311B633831
ttyUSB0, Silicon Labs CP2102N USB to UART Bridge Controller, e08d862b0867ec118f12a17089640db2
ttyUSB1, Silicon Labs CP2102N USB to UART Bridge Controller, d0ed30374c18ec1199b543103803ea95
ttyUSB2, Silicon Labs CP2102N USB to UART Bridge Controller, 7eab1642dbfaeb1198773ca4c6d924ec
ttyUSB3, Silicon Labs CP2102N USB to UART Bridge Controller, 8698c1b85519ec11982039bff95a09b1

You can see how I reordered the data as well as cleaned up the device names by replacing all the underscores with white space. I’m not about to present this little Python utility as perfect idiomatic Python. I’m not interested in being called out over how this or that isn’t truly Pythonic. Python is, for me, a very easy prototyping language, a tool to make tools.
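Since the hex IDs are the part that never changes, one possible next step is to key a label table on the hex ID instead of the port name, so that a power cut reshuffling the ttyUSB numbers no longer invalidates my notes. What follows is only a minimal sketch of that idea, not part of the utility above. The names BOARD_LABELS, parse_entry and devices_by_hex_id are hypothetical, and the single label entry is an illustrative stand-in for the spreadsheet.

```python
import os
import re

BY_ID_DIR = "/dev/serial/by-id/"          # same directory the utility above reads
HEX_ID = re.compile(r"_([0-9A-Fa-f]+)-")  # same pattern, with the ID captured

# Hypothetical label table keyed by the stable hex ID (stand-in for a spreadsheet).
BOARD_LABELS = {
    "E6614C311B633831": "Pico W, CircuitPython",
}


def parse_entry(name, target):
    """Split one by-id entry into (hex_id, interface_name, tty), or None."""
    match = HEX_ID.search(name)
    if match is None:
        return None
    interface = name[:match.start()]
    if interface.startswith("usb-"):
        interface = interface[len("usb-"):]
    interface = interface.replace("_", " ")
    return match.group(1), interface, os.path.basename(target)


def devices_by_hex_id(directory=BY_ID_DIR):
    """Return {hex_id: (interface_name, tty)} for every symlink in directory."""
    found = {}
    if not os.path.isdir(directory):
        return found  # no serial devices attached, or not a Linux system
    for name in os.listdir(directory):
        full = os.path.join(directory, name)
        if not os.path.islink(full):
            continue
        parsed = parse_entry(name, os.readlink(full))
        if parsed is not None:
            hex_id, interface, tty = parsed
            found[hex_id] = (interface, tty)
    return found


if __name__ == "__main__":
    # List by current tty name, but look the label up by hex ID.
    entries = sorted(devices_by_hex_id().items(), key=lambda kv: kv[1][1])
    for hex_id, (interface, tty) in entries:
        label = BOARD_LABELS.get(hex_id, "unlabeled")
        print(f"{tty}, {label}, {interface}, {hex_id}")
```

The parsing is kept in its own function so it can be checked against sample names without any boards plugged in.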

Having created a tool to give me the gist of what I wanted, I could have gone further in Python and added more features. Instead, I decided to implement the same functionality in C++ and Rust. Let it be known that I was going to implement it first in Rust and then C++, but I ran into so many problems trying to research Rust’s “simple” way to manipulate strings that I gave up in frustration and turned to C++. Here’s the C++ equivalent.

#include <string>
#include <array>
#include <map>
#include <algorithm>
#include <iostream>
#include <filesystem>
#include <regex>

using std::string;
using std::replace;
using std::cout;
using std::filesystem::directory_iterator;
using std::filesystem::path;
using std::filesystem::read_symlink;
using std::regex;
using std::regex_search;
using std::smatch;
using std::map;
using std::array;
using std::pair;

void split(const string &subj, const regex &rgx, array<string, 2> &vars) {
    smatch match;
    regex_search(subj, match, rgx);
    vars[0] = subj.substr(0, match.position(0));
    replace(vars[0].begin(), vars[0].end(), '_', ' ');
    vars[1] = match.str(0).erase(0,1);
    vars[1].erase(vars[1].end()-1);
}

int main() {
    const string DEVICE_PATH{"/dev/serial/by-id/"};
    map<string, array<string,2>> devices;
    regex hregex{"_[0-9A-Fa-f]+-"};

    for (const auto &entry : directory_iterator(DEVICE_PATH)) {
        if (entry.is_symlink()) {
            string slink = read_symlink(entry).stem().generic_string();
            string sname = entry.path().stem().generic_string().erase(0,4);
            array<string, 2> results;
            split(sname, hregex, results);
            devices[slink] = results;
        }
    }

    for (const auto &[device, values] : devices) {
        cout << device << ", " << values[0] << ", " << values[1] << "\n";
    }

    return 0;
}

Lots of C++ code. Considerably more lines of C++ than Python. If you’re going to be shocked by how much longer it is, don’t be. To start with, there are 12 lines of bookkeeping code (lines 9 to 20) that I prefer to use instead of the blanket using namespace std that I’ve seen at the beginning of too much C++ source code. So I list every name I use in its own explicit using declaration. I have never cared for C++’s namespace notation, so explicit listing is a compromise for me. When run, it produces the same results as the Python version.

One C++ map container feature I appreciate is that std::map keeps its entries sorted by key. When I read them out I don’t have to explicitly pull out the keys, sort them, and then read the values from the map using the sorted keys.

Now we come to the Rust version. Let’s list it out for all to see.

use std::io;
use std::fs;
use regex::Regex;
use std::collections::HashMap;

fn main() -> io::Result<()> {
    const DEVICE_PATH: &str = "/dev/serial/by-id/";
    let hregex = Regex::new("_[0-9A-Fa-f]+-").unwrap();
    let trimchars: &[_] = &['_', '-', '.', '/'];
    let mut devices: HashMap<String, [String; 2]> = HashMap::new();

    for entry in fs::read_dir(DEVICE_PATH)? {
        // Split the extraction of file_name from path.
        // Combine the two, as you might naturally think to do in
        // another language, and you'll generate an error saying that
        // entry.path() creates a temporary that is freed while still
        // in use, meaning down the chain after the call to path().
        // This is a bug, not a feature.
        //
        let path = entry?.path();
        let device = path.read_link().unwrap();
        let device = device.to_str().unwrap().trim_matches(trimchars);

        // To give credit where credit is due:
        // Question: How do I print an OsStr without the quotes?
        // https://stackoverflow.com/questions/70266860/how-do-i-print-an-osstr-without-the-quotes
        // Who would have thought this was the answer to how
        // to extract the string from the file name?
        //
        let filename = path.file_name().unwrap().to_str().unwrap();
        let filename = filename.trim_start_matches("usb-");
        let caps = hregex.captures(filename).unwrap();
        let mut hexval = caps.get(0).unwrap().as_str();
        hexval = hexval.trim_matches(trimchars);
        let mat = hregex.find(filename).unwrap();
        let filename = filename.split_at(mat.start()).0;
        let filename = filename.replace("_", " ");
        devices.insert(device.to_owned(),
            [filename.to_owned(), hexval.to_owned()]);
    }

    let mut sorted: Vec<_> = devices.iter().collect();
    sorted.sort_by_key(|a| a.0);

    for (device, values) in sorted.iter() {
        println!("{}, {}, {}", device, values[0], values[1]);
    }

    Ok(())
}

The shine has gone off Rust (no pun intended), at least for me. I got caught up in the hype, but this little exercise (and it is a little exercise) has pretty much sandblasted that away. It took me about an hour to write the Python version of this utility, and then a day for the C++ version. It took me nearly a week to learn everything I needed to write it in Rust, even though the number of source lines is about halfway between the Python version and the C++ version. Why?

  1. Rust is constantly evolving, which isn’t necessarily a bad thing. It becomes a problem when the documentation and examples don’t keep up with that evolution. For example, see the declaration of the HashMap on line 10. The documentation says that all I have to do is declare just the HashMap, and that the compiler will determine the necessary types when I perform an insert into it. Nope. I had to explicitly declare it with the key and value types to get it to compile. This is just one of many little documentation gotchas I kept running into.
  2. I don’t know who came up with the design, but having to get the string value of a path’s file name by using “.unwrap().to_str().unwrap()” is all sorts of wrong. I left a comment right above that section of code, along with a link to where I found a solution for it. Other languages manage to offer a simple convenience function, usually called to_string() or toString(), that will return the string representation of the class or structure. But not in Rust, it would appear.
  3. Function chaining is a powerful feature, supported by all languages I know of. Unfortunately it doesn’t work all the time for Rust. The Rust designer(s) seem obsessed with “ownership” above all else, which seems to break the expected feature of function chaining. Start reading my comment from line 13 on down to the code. This fits in with (2) above, it would seem.
  4. The odd difference between tuple indexing (.#) vs regular indexing ([#]). I used both, and I still don’t know why I couldn’t use just one indexing form throughout.

I will say that I appreciate some of the string manipulation functions, such as trim_matches(…) … once you get a string out to work with. Sorting a map’s keys and then reading out the (key, value) pairs is a bit odd (line 42).

At this point I think I’ll stick with Python and C++ (using the C++17 and C++20 standards), and let Rust continue to evolve for a while. Right now, as far as I’m concerned, Rust looks to be little more than C++ poorly implemented.