Linux Magazine - January 2024 USA
FREE DVD
on a Raspberry Pi
Scientific Computing with a Bitcoin mining rig
DEUCE COUPS
Dear Reader,

What a busy weekend in tech news. On Friday, we heard that OpenAI, creators of ChatGPT, had fired CEO Sam Altman, and by Monday, he had already found a new job at Microsoft, along with cofounder Greg Brockman. More than 700 OpenAI employees signed a letter saying they would quit – and quite possibly jump to Microsoft – if the OpenAI board didn't hire Altman back and resign. Microsoft said Altman and Brockman would lead Microsoft's new advanced AI research team. OpenAI, on the other hand, went into free fall, announcing an interim CEO whose tenure lasted for two days before another CEO was named.

Wall Street was very happy for Microsoft, driving the share price to a record high. Meanwhile, OpenAI was roundly condemned – both for firing Altman and for the way they did it. The word on the street was that Microsoft pulled off a "coup" by snagging Altman, Brockman, and whoever else they could pull over. Altman and others also referred to his ousting by the OpenAI board as a "coup," with a very different spin on the term. Two coups in four days is a lot – even at the frenetic pace of IT.

From a business viewpoint, Microsoft was simply capitalizing on an opportunity – and acting to protect their investment, because they had acquired a large stake in OpenAI earlier this year and couldn't afford to watch the company self-destruct. But it is worth pointing out that this really isn't all from a business viewpoint. OpenAI is actually ruled by a nonprofit board controlling a for-profit subsidiary. The question of what is better for OpenAI's business interests, which seems to be the fat that everyone is chewing on, might not be the best context for understanding these events.

Altman's disagreement with the board appears to have been about the pace of development and the safety of the tools the company has developed. OpenAI's vision is supposed to be to develop AI "for the benefit of humanity," which is very admirable, but it leaves lots of room for interpretation. Altman, in particular, has occupied an ambiguous space in the press, at once warning about the dangers of AI and simultaneously pledging to press ahead with development. No doubt he felt confident that he was laying down sufficient guardrails along the way, but that is something to communicate with your board about, and it sounds like he wasn't communicating to their satisfaction. Should the board have trusted him and let him forge ahead, knowing that the company was on a roll and potentially on the verge of further innovations? If they were a garden-variety corporate board, possibly yes, but as a board member of a nonprofit, you are really supposed to have more on your mind than power and money. You're supposed to know when to say "no," even if it annoys everyone and stirs up some turmoil.

Of course, that is the charitable view of the board's action. A darker (and equally speculative) view is that nonprofit boards can sometimes be highly dysfunctional, with a lot of their own internal power games and politics, and maybe the intrepid Altman was simply unable to steer around a raging Charybdis of groupthink.

The whole story hung in a state of uncertainty for two days; then lightning struck again: OpenAI hired Altman back. Was this a third coup, or the undoing of a previous coup? Microsoft gave the new plan its full support. OpenAI ditched three of the four board members who voted for Altman's ouster (including the only two women), and the new board has pledged a full investigation into what happened. We might need to wait for that report to know all the details of the internal struggle that led to this unexpected whiplash festival, but one thing seems clear: Altman and the full-steam-ahead faction are the winners, and the proceed-with-caution faction is out in the cold. Ousted board member Helen Toner, for instance, recently co-authored a paper that warned of a possible "race to the bottom" in the AI industry, "in which multiple players feel pressure to neglect safety and security challenges in order to remain competitive" [1]. Some are now saying that paper helped to stir up the skirmish in the first place.

Why did Microsoft let Altman go back? It isn't like them to surrender the spoils of victory. Keep in mind that the competition is heating up. Amazon just announced its Olympus AI initiative, and Google, Meta, and several other tech giants are all working on their own AI projects. Microsoft is already committed to building OpenAI's technology into its own products, and they might have realized that, by the time the exiles settle into their new workspace and get down to training models and producing real software, their head start might already be gone.

OpenAI has regained its footing as a business, but as a nonprofit devoted to serving humanity, it appears to have fallen off its pedestal, or at least dropped down to a lower pedestal. I fear the biggest loser in all this might be the optimistic OpenAI vision of a nonprofit innovator taking a principled stand for methodical and safe development of these revolutionary tools. Note to governments: Now might be a good time to provide some meaningful restraints for the AI industry – don't expect them to police themselves.

Joe Casad, Editor in Chief

Info
[1] "Decoding Intentions: Artificial Intelligence and Costly Signals," by Andrew Imbrie, Owen J. Daniels, and Helen Toner: https://cset.georgetown.edu/wp-content/uploads/CSET-Decoding-Intentions.pdf
ON THE COVER
26 R for Science
This easy-to-learn language comes with powerful tools for data analysis.
38 Acoustic Keyloggers
Sneaky tools that gather information from the sound of typing.
54 PyScript
Versatile solution for putting Python in a browser.
65 Teaming NICs
Bundle your network adapters to speed up remote access.
69 RPi Flight Simulator Interface
Explore the I2C interface with this high-flying maker project.
90 Waydroid
Access Android apps from your Linux desktop.
NEWS
8 News
• AlmaLinux Will No Longer Be "Just Another RHEL Clone"
• OpenELA Releases Enterprise Linux Source Code
• StripedFly Malware Hiding in Plain Sight as a Cryptocurrency Miner
• Experimental Wayland Support Planned for Linux Mint 21.3
• KDE Plasma 6 Sets Release Date
• Gnome Developers in Discussion to End Support for X.Org
12 Kernel News
• Avoiding Bloat in the Kernel That Does Everything
• Particularly Odd Occurrences of Stardust

REVIEWS
32 Distro Walk – Immutable Distros
Immutable distributions offer a layer of added security. Bruce explains how immutable systems work and discusses their benefits and drawbacks.

IN-DEPTH
36 AlmaLinux
Recent policy changes at Red Hat have upturned the RHEL clone community. AlmaLinux charts a new path by shifting to binary compatibility and away from being a downstream RHEL build.
38 Acoustic Keyloggers
Is someone listening in on your typing? Learn more about how acoustic keyloggers work.
46 Command Line – neofetch
Display information about your hardware, operating system, and desktop in visually appealing output.
65 Teaming NICs
Combining your network adapters can speed up network performance – but a little more testing could lead to better choices.
81 Compressing Files with RAR
The non-free RAR compression tool offers some benefits you won't find with ZIP and TAR.
84 FOSSPicks
This month Graham looks at osci-render, Spacedrive, internetarchive, LibrePCB 1.0.0, and more!

MakerSpace
09
• OpenELA Releases Enterprise Linux Source Code
• StripedFly Malware Hiding in Plain Sight as a Cryptocurrency Miner
• More Online
10
• Experimental Wayland Support Planned for Linux Mint 21.3
• Window Maker Live 0.96.0-0 Released
• KDE Plasma 6 Sets Release Date
11
• Fedora Project and Slimbook Release the New Fedora Slimbook
• Gnome Developers in Discussion to End Support for X.Org

As my favorite band, Rush, once said in "Circumstances," "plus ça change, plus c'est la même chose." In other words, the more that things change, the more they stay the same.

But this time around, AlmaLinux isn't happy with staying the same… especially with regards to remaining in lockstep with Red Hat Enterprise Linux (RHEL). With the upcoming release of AlmaLinux 9.3, those who have become fans of the distribution should expect change. This new release will not rely on RHEL source code. Instead, AlmaLinux 9.3 is built from the CentOS Stream repositories, which is upstream from RHEL.

What does this mean for users? AlmaLinux 9.3 will most likely not change all that much. The distribution will continue supporting the x86_64, aarch64, ppc64le, and s390x architectures, and will likely no longer release just days after RHEL.

According to benny Vasquez (https://almalinux.org/blog/future-of-almalinux/), AlmaLinux OS Foundation Chair, "For a typical user, this will mean very little change in your use of AlmaLinux. Red Hat-compatible applications will still be able to run on AlmaLinux OS, and your installs of AlmaLinux will continue to receive timely security updates."

"The most remarkable potential impact of the change is that we will no longer be held to the line of 'bug-for-bug compatibility' with Red Hat, and that means that we can now accept bug fixes outside of Red Hat's release cycle," Vasquez continues. "While that means some AlmaLinux OS users may encounter bugs that are not in Red Hat, we may also accept patches for bugs that have not yet been accepted upstream or shipped downstream."

AlmaLinux 9.3 is now available to download (https://almalinux.org/get-almalinux/).
OpenELA Releases Enterprise Linux Source Code

OpenELA was formed by CIQ (the company behind Rocky Linux), Oracle, and SUSE with a singular purpose: "... to encourage the development of distributions compatible with Red Hat Enterprise Linux (RHEL) by providing open and free enterprise Linux source code." And the initial release of the OpenELA source code is now available (https://github.com/openela-main).

But why is this happening? According to CIQ (https://ciq.com/blog/ciq-oracle-and-suse-launch-openela/), "The decision to establish OpenELA wasn't made in isolation. It was a direct answer to the challenges posed by Red Hat's recent policy shifts. At CIQ, we've always believed in the power of collaboration and open access."

The site continues, "By teaming up with Oracle and SUSE, we'll be able to provide the community with the tools, resources, and most importantly, the source code they need through OpenELA. With OpenELA, both upstream and downstream communities can fully leverage the potential of open source, from independent upstream projects through the delivery of compatible and standards-based Enterprise Linux derivatives."

The code (found at the OpenELA GitHub page linked above) contains all of the basic packages for building an Enterprise Linux OS. Keep in mind, however, that the code is still very much a work in progress, and some of the code has yet to be made public (due to OpenELA's continued removal of all Red Hat branding/trademarks).

At the moment, both Oracle and SUSE are planning on releasing their enterprise distributions based on OpenELA, and the Rocky Linux Software Foundation is considering the same.

ADMIN HPC
http://www.admin-magazine.com/HPC/
Managing Storage with LVM • Jeff Layton
Managing Linux storage servers with the Linux Logical Volume Manager.

ADMIN Online
http://www.admin-magazine.com/
Cost Management for Cloud Services • Holger Reibold
Cost management for clouds, containers, and hybrid environments tends to be neglected for reasons of complexity. The open source Koku software shows some useful approaches to this problem, although the current version still has some weaknesses.
Help Desk with FreeScout • Holger Reibold
The free version of FreeScout offers all the features of a powerful and flexible help desk environment and can be adapted to your requirements with commercial add-ons.
How to Query Sensors for Helpful Metrics • Andreas Stolzenberger
Discover the sensors that already exist on your systems, learn how to query their information, and add them to your metrics dashboard.

StripedFly Malware Hiding in Plain Sight as a Cryptocurrency Miner
Attention Linux users: A malicious framework has been active for five years and has been incorrectly classified as a Monero cryptocurrency miner.

StripedFly uses sophisticated Tor-based methods to keep the malware hidden and worm-like capabilities to spread its payload from Linux machine to Linux machine (or Linux to Windows and vice versa).

No one is certain if StripedFly is being used for monetary purposes or straight-up cybersecurity attacks (for information gathering). What is clear is that it's an advanced persistent threat (APT) type of malware.

The earliest known version of StripedFly was identified in April 2016 and, since then, it has infected more than a million systems. The StripedFly payload features a customized Tor network client that obfuscates communication to a C2 (command and control) server, as well as the ability to disable SMBv1 and spread to other hosts via SSH and EternalBlue.

When StripedFly infects a Linux system, it is named sd-pam and uses both systemd services and a special .desktop file to keep it persistent. It also modifies various Linux startup files such as /etc/rc*, .profile, .bashrc, and inittab.

You can read Kaspersky's in-depth analysis of StripedFly at https://securelist.com/stripedfly-perennially-flying-under-the-radar/110903/. At the moment, patches to mitigate StripedFly have yet to be released for Linux, but you can be certain your distribution of choice will release the fix as soon as it is available.

In the meantime, do everything you can to avoid phishing and known malicious websites, keep your systems up to date, and use a password manager.
In addition, you'll find support for automatic bug reporting in DrKonqi, an autostart KCM that shows details about entries, no more chunky page footers in System Settings, a completely reorganized System Settings sidebar, smoother mouse wheel scrolling in apps based on QtQuick, and a floating panel by default.

The biggest change, however, is that Wayland will be the default graphics stack (over X.Org). One nice touch is that distributions can now customize the first page in the Welcome Center.

Of course, there will also be the usual bug fixes and security updates.

There will also be a new task switcher for KDE Plasma, making it much easier for users to multitask.

You can read all about the upcoming changes to KDE Plasma in Nate Graham's official blog (https://pointieststick.com/2023/05/11/plasma-6-better-defaults/).
at that point you have an argument for the code.

"As it is, right now I look at that code and I see extra BS that we'll carry around forever that helps *zero* users, and I find it very questionable whether it would help you.

"And if you really think that people need to know what the events exist in eventfs, then dammit, make 'readdir()' see them. Not some stupid specialty debug interface. That's what filesystems *have* readdir for."

But Linus replied to himself a couple of hours later, with a slightly different take. He said:

"Alternatively, if you have noticed that it's just a pain to not be able to see the data, instead of introducing this completely separate and illogical debug interface, just say 'ok, it was a mistake, let's go back to just keeping things in dentries since we can _see_ those'.

"Put another way: this is all self-inflicted damage, and you seem to argue for this debug interface purely on 'I can't see what's going on any more, the old model was really nice because you could *see* the events'.

"To me, if that's really a major issue, that just says 'ok, this eventfs abstraction was mis-designed, and hid data that the main developer actually wants'.

"We don't add new debug interfaces just because you screwed up the design. Fix it."

Steven remarked with a wry smile, "The entire tracing infrastructure was created because of the 'I can't see what's going on' ;-) Not everyone is as skilled with printk as you."

He also explained the historical reasoning behind the current design, saying, "The old eventfs model was too costly because of the memory footprint, which was the entire objective of this code patch. The BPF [Berkeley Packet Filter] folks told me they looked into use a tracing instance but said it added too much memory overhead to do so. That's when I noticed that the copy of the entire events directory that an instance has was the culprit, and needed to be fixed."

So Steven felt the "design" Linus had complained about was correct and didn't need to be "fixed." But he added, "I get your point. I will agree that this interface will likely be mostly useful for the first year or two after the new code is added. But after a few years, we could delete it too." And in a subsequent email, he also said, "I'll keep the code around locally, and if vfs ever changes and breaks this code where this file helps in solving it, I'll then do another pull request to put this file upstream ;-)."

And the thread ended there.

This was a short debate and probably fairly low-cost, because it didn't represent a huge amount of effort on Steven's part – he simply packaged up some debugging code that had recently proven useful. So the rejection from Linus didn't cost Steven very much. But it's very interesting to me personally to see the way Linus balances the needs of developers against the needs of the rest of us. The Linux kernel project is completely dependent on the contributions of developers like Steven, while the rest of us – aside from possibly submitting a bug report once in a while – are simply the beneficiaries. But as far as Linus is concerned, Steven's bit of debugging code, benefitting only developers, had no place in the kernel, even as a relatively temporary aid until the feature it helped had stabilized. It's a fascinating balancing act on Linus's part, intended to keep the Linux kernel – an operating system that supports virtually every piece of computer hardware on the planet – from becoming bloated with extra code that might make it more difficult to maintain.

Particularly Odd Occurrences of Stardust

Recently, a Spectre variant 1 (V1) vulnerability may or may not have appeared in the Linux kernel. Spectre V1 is a bizarre vulnerability that takes advantage of CPU optimizations that make a reasonable guess at the result of conditionals, so the CPU can begin to execute code along the path that's most likely to be taken after the conditional is performed. If it guesses right, it keeps those calculations; otherwise, it abandons them and starts again along the proper path. And because its guess is generally pretty good, the CPU tends to save time this way and increases overall performance.

The problem is that for those wrong guesses, the unneeded calculations aren't really abandoned at all – they still leave traces of data behind them (e.g., data such as passwords), which malicious programs can read and use.

When Spectre V1 was discovered, the Linux developers patched the kernel to prevent those data traces from lingering or being created in the first place. However, to maintain security, it's important that new kernel features and other patches avoid re-exposing those things.

Luis Gerhorst recently identified a patch that had previously gone into the Linux kernel as potentially re-exposing the Spectre V1 vulnerability under certain circumstances. The patch had allowed the kernel to compare the pointers used to access packets of data sent across a network – and specifically to allow the size of the data packets to be variable. According to Luis, it was the variability of the packet size that let Spectre V1 rear its head again.

If the packets had a fixed size, then the kernel could simply check the bounds. But with the variable packet size, hostile code could load more data beyond the packet itself, which would then be exposed when the kernel ran its comparison and the CPU optimized that conditional.

But it's not as clear as all that! Alexei Starovoitov looked over Luis's argument and concluded that, in fact, there was no way for an attacker to actually get access to useful data in this particular situation. The attacker, Alexei said, could indeed expose sensitive data. However, because they would not have control over the various pointers involved, they would not be able to actually read that data in such a way as to know what data they were reading. Exposing the data was not enough! As Alexei put it, "the attack cannot be repeated for the same location. The attacker can read one bit 8 times in a row and all of them will be from different locations in the memory. Same as reading 8 random bits from 8 random locations. Hence I don't think this revert is necessary. I don't believe you can craft an actual exploit."

Daniel Borkmann agreed with Alexei, but he felt there could be additional security vulnerabilities to take into account. Specifically, beyond the end of a given networking data packet, the kernel stored a data structure that contained memory addresses used by the kernel. And although Daniel agreed with Alexei that the attacker would not be able to access the data that worried Luis, he felt that the attacker would indeed have access to parts of those memory addresses.

The reason you want to keep Linux kernel memory addresses out of the hands of an attacker is that the addresses let the attacker make guesses about the overall layout of the kernel in system memory. The kernel relies on Kernel Address Space Layout Randomization (KASLR) to prevent such access for this reason. This feature loads the kernel into a random place in system memory, specifically to prevent attackers from knowing where a given part of the system is located, in order to target that part for an attack. Daniel's point was that by exposing even a portion of those kernel addresses, the kernel would allow the attacker to mitigate the effect of KASLR protections. So the vulnerability wouldn't give the attacker direct access to sensitive kernel data like passwords, but it would help the attacker identify other potential exploits that they might attempt.

Alexei, however, was still not convinced. He felt that the attacker would still not be able to identify the data they were accessing. Just as the attacker couldn't access passwords, the attacker would not be able to access those kernel addresses.

However, Luis was not convinced by Alexei being unconvinced. He felt that he had identified aspects of Spectre V1's basic vulnerability – things that could indeed be exploited. Also, in terms of Alexei's response to Daniel's specific case, Luis countered, "It is true that this is not easily possible using the method most exploits use, at least to my knowledge (i.e., accessing the same address from another core). However, it is still possible to evict the cacheline with skb->data/data_end from the cache in between the loads. […] For a CPU with 64KiB of per-core L1 cache all 64-byte cachelines can be evicted by iterating over a 64KiB array using 64-byte increments, that's only 1k iterations."

Luis also posted some actual assembly code that he felt would leak data in this case.

Alexei acknowledged that "I have to agree that the above approach sounds plausible in theory and I've never seen anyone propose to mispredict a branch this way." But he added that this probably "means that no known speculation attack was crafted. I suspect that's a strong sign that the above approach is indeed a theory and it doesn't work in practice."

Alexei concluded sternly, "So I will insist on seeing a full working exploit before doing anything else here. It's good to discuss this cpu speculation concerns, but we have to stay practical. Even removing bpf from the picture there is so much code in the network core that checks packet boundaries. One can find plenty of cases of 'read past skb->end' under speculation and I'm arguing none of them are exploitable."

Luis posted code that leaked some otherwise inaccessible data via Spectre V1. But he also acknowledged to Alexei, "However, you are right in that there does not appear to be anything extremely useful behind skb->data_end, destructor_arg is NULL in my setup but I have also not found any code putting a static pointer there. Therefore if it stays like this and we are sure the allocator introduces sufficient randomness to make OOB reads useless, the original patches can stay. If you decide to do this I will be happy to craft a patch that documents that the respective structs should be considered 'public' under Spectre v1 to make sure nobody puts anything sensitive there."

The discussion ended there.

It's still unclear whether an actual useful exploit for either Luis's or Daniel's cases exists. But it's also true that Alexei's approach to this problem seems to follow Linus Torvalds's general principle that security fixes must address actual exploits, rather than people simply implementing speculative protections that might not actually be needed.

Security is an inherently nightmarish topic in software development, in which strange dreamscapes continually seem to turn the simplest truths on their heads. Whatever the most obvious assumption might be, it also might be exactly where a sudden vulnerability will be revealed. Many strange features and constraints in the Linux kernel boil down to the need to avoid particular vulnerabilities. And the answer to many of the oddest questions in kernel development is often, simply, security.
Second Chance
Lots of retired Bitcoin mining computers are showing up on the second-hand
market for cheap. Could these once-impressive machines have a second life
in scientific computing or machine learning?
Despite the steady increase in computing power from one generation to the next, computers are rarely fast enough for their users. Over the years, programmers and PC vendors have found ways to speed them up. If you know exactly how a computer will be used, you can design it to maximize performance and minimize cost.

Crypto rigs are created with only one task in mind: to perform the arcane mathematical computations associated with crypto mining. The crypto gold rush has led to a rapid evolution of the technology – a mining unit that was competitive a few years ago might already be obsolete. For instance, a few years ago, mining rigs made extensive use of Graphics Processing Units (GPUs); in more recent years, Field Programmable Gate Arrays (FPGAs) and then Application-Specific Integrated Circuits (ASICs) have replaced graphics cards. Crypto mining has also experienced a bit of a downturn recently due to environmental fears and instability of the larger economy.

As a result of these and other factors, mining rigs are increasingly ending up on the second-hand market, where you can buy them relatively cheaply even if you are not a professional user. Could one of these rigs serve another role?

Mining rigs make extensive use of GPUs, and GPUs are well suited to scientific computing and machine learning. Several GPUs in a single computer will boost the potential performance many times over for a computation-intensive activity, such as solving a large mathematical problem.

We decided to buy a used crypto mining rig and see how it compares to a higher-end computation-focused commercial system. This article summarizes our findings. First, however, we'll provide a little background on what you do (and don't) get when you invest in a used mining rig.

PCIe Versions
Most used mining rigs power regular graphics cards via Peripheral Component Interconnect Express (PCIe) [1]. If the board is large enough, the rigs can be plugged in right next to each other. There are also variants where the motherboard does not offer the slots directly but outsources them to a PCIe backplane. Depending on the version of the motherboard, up to 18 cards can be addressed. They then no longer fit into a case but are connected via extension cables (risers) – either as a 1:1 extension of the slot or via an x1 plug-in card that simply transmits the PCIe signal via an inexpensive USB 3 cable.

The PCIe bus, which has been around since 2003, can play host to a number of components, from the WLAN board to the graphics card. The speed of the PCIe bus has doubled with each new version of the standard; the current version is 4.0. If you take a look at a motherboard, it is clear that the slots have different widths, which means that different numbers of PCIe channels can connect to the card – from x1 (one channel) up to x16 with 16 times more throughput. (You can also install an x1 card in an x16 slot and vice versa.) The slots are compatible with each other up to PCIe 4.0; in other words, systems designed for different versions can communicate with each other via the standard of the lower version.

Power Supply
The power supply plays an important role in systems that need to run continuously. The requirements are very high due to the possibility that several graphics cards could experience peaks simultaneously (after all, the tasks run in parallel). In just a few months, you might discover that the electricity bill exceeds the initial cost of the rig.

Mining rigs often use second-hand server power supplies to reduce costs. A server power supply is powerful and very energy efficient: Most achieve the 80 Plus Platinum efficiency rating (more than 94 percent efficiency at 50 percent load) and are often unbeatably cheap to run. However, this kind of power supply only gives you 12V and is therefore not suitable for the ATX-based motherboards found on many common PCs [2] without changes. It is easy to understand why the small PicoPSU power converter board [3] has become popular, because it also supports other voltages. Replicas of the PicoPSU are also available from various Chinese manufacturers. These boards are very popular, especially for home theater PCs or similar devices.

PicoPSUs and their replicas come with some pitfalls that you need to watch out for. They mainly provide power on the 12V rail, which they simply loop through from the power supply. If the consumer requires other voltages, such as 3.3V or 5V (say, for SSDs), the power supply could fall short. A look at the datasheet reveals a current of 6 amps – not really much, considering that a PCIe card is allowed to draw 3 amps from the 3.3V
usually have no more than 4GB RAM. Better graphics cards offer the possibility to interconnect – NVIDIA calls this Scalable Link Interface (SLI) or NVLink; CrossFireX is the AMD equivalent. This interconnect feature allows multiple cards to act as a single large board, reducing communication on the PCIe bus.

Cost optimization is also reflected in the case (Figure 1): The test rig case is not much more than a galvanized steel box with a few cutouts for fans (Figure 2). Preparations for cable routing, for example, were not needed because everything was plugged into a backplane. If you are thinking about a potential hardware conversion, you should get used to the idea of using a drill, pliers, and a little creativity to work around the limitations of the case.

The processors themselves provide the PCIe channels. The motherboard distributes these channels to the slots – either directly or via a PCI switch in professional systems. This design means that only a limited number of PCIe channels are available. A single card can access the full x16 bandwidth of the PCIe channels. However, if there is another card in the slot next to it, each card receives a maximum of x8, and this can drop to x1 as you add more cards.

Motherboard descriptions often prove to be anything from superficial to misleading when they refer to the physical width of the slot instead of the number of channels feeding the card. PCI switches are also available on server boards, but the total number of available PCIe channels is higher due to the use of two processors. Currently, AMD's PCIe 4.0 standard offers a technical advantage on both desktops and servers, with twice the transfer speed per channel and a higher number of channels provided by the processor.

The Test Candidate
We purchased a mining rig with a backplane and separate motherboard at auction for EUR750. The system did not work reliably at first. The power supply worked, but it was too loud and smelled unhealthy. The eight installed NVIDIA P106-090 mining cards from 2018 with PCIe 1.1 x4 were OK. We treated them to a new case, memory, motherboard, processor and, to be on the safe side, a new power supply for another EUR350.

We wanted to compare the performance of this used mining rig with a high-end professional system. The professional hardware we chose for comparison was a 2020 system with eight NVIDIA A100 cards and PCIe 4.0 x16. The cost for this professional system was more than EUR75,000 – 100 times more expensive than the mining rig we bought at auction.

GPU-focused systems are optimized for computation-intensive operations, so we wanted to stay with that basic scenario in our tests. We tested two different use cases:
Figure 2: The case is little more than a galvanized steel box with some fan cutouts and everything plugged into a backplane.
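The lane arithmetic described above translates directly into bandwidth. As a rough illustration (the transfer rates and encoding factors below come from the published PCIe specifications, not from this article), a short Python sketch:

```python
# Rough per-direction PCIe throughput: transfer rate in GT/s times
# encoding efficiency, divided by 8 bits per byte, times lane count.
# PCIe 1.x/2.x use 8b/10b encoding; PCIe 3.0 and later use 128b/130b.
GEN_SPECS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
}

def pcie_gbps(gen: int, lanes: int) -> float:
    """Approximate usable bandwidth in GB/s for one direction."""
    gt_per_s, efficiency = GEN_SPECS[gen]
    return gt_per_s * efficiency / 8 * lanes

full = pcie_gbps(3, 16)    # a Gen3 card on a full x16 link, about 15.75 GB/s
starved = pcie_gbps(3, 1)  # the same card squeezed down to x1
```

The x16-to-x1 drop costs roughly 94 percent of the link bandwidth, which is why lane allocation matters far more for compute workloads than for mining.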
There were no big surprises in the test. Although the training phase took longer than classifying with the taught model, the relative times of the different hardware configurations matched. We also tried neural networks of different sizes; this had an effect on the maximum batch size, but the relative speeds of the systems to each other stayed roughly equal. This is why we are only showing the figures for training with the smallest model in Figure 3.

Conclusions
For data-parallel requirements like BOINC, the eight cards in the mining rig roughly match the performance of a single pro card, but cost less, even taking into account the higher power consumption. For machine learning, however, a good, modern graphics card with plenty of memory is preferable. Thanks to their support for the 16-bit floating-point numbers frequently used in machine learning, as opposed to integer operations with 8- or even only 4-bit width, the newer cards extend their lead.
Cloud services are an interesting footnote for this study. Although not every hosting service provider offers special GPU computers yet, you can already find offers with 8 and 16 cards. Prices vary depending on the number and type of GPUs selected. In some scenarios, you might come up with a configuration where the mining rig serves as a local installation that is useful for preparing projects to run on faster systems in the cloud, as long as you are allowed to store the data in the cloud and the latencies for data transfer are compatible with the project goals.
Acknowledgments
The authors would like to express their special thanks to the HPC specialist MEGWARE GmbH [9]. The company provided access to its test computers and installed the P106-090 from the test rig into one of its systems for direct comparison.
Info
[1] PCIe: https://en.wikipedia.org/wiki/PCI_Express
[2] ATX: https://en.wikipedia.org/wiki/ATX
[3] PicoPSU: https://www.onlogic.com/technology/glossary/picopsu/
[4] BOINC: https://boinc.berkeley.edu/
[5] PyTorch: https://pytorch.org/
[6] Einstein@Home: https://einsteinathome.org
[7] PrimeGrid: https://www.primegrid.com/
[8] Folding@home: https://foldingathome.org
[9] MEGWARE GmbH: https://www.megware.com/en/

Figure 3: The benchmark results for the machine learning test for image classification.
Method in the Madness
Data science is all about gaining insights from mountains of data.
We tour some important tools for the trade. By Tom Alby
Data is the new oil, and data science is the new refinery. Increasing volumes of data are being collected by websites, retail chains, and heavy industry, and that data is available to data scientists. Their task is to gain new insights from this data while automating processes and helping people make decisions [1]. The details for how they coax real, usable knowledge from these mountains of data can vary greatly depending on the business and the nature of the information. But many of the mathematical tools they use are quite independent of the data type. This article introduces you to some of the methods data scientists use to squeeze insights from a sea of numbers.

More than Just Modeling
The term data scientist evokes associations with math nerds, but data science consists of far more than building and optimizing models. First and foremost, it involves understanding a problem and its context.
For example, imagine a bank wants to use an algorithm to predict the probability that a borrower will be able to repay a loan. A data scientist will first want to understand how lending has worked so far and what data has been collected in this field – as well as whether that data is actually available – with a view to data protection requirements. In addition, data scientists need to be able to communicate their findings. Storytelling is more useful than presenting infinite rows of numbers, because the audience is likely to be made up of non-mathematicians. The need to clearly explain the findings frequently presents a challenge for less extroverted data scientists.

Preparing the Data
What sounds simple in theory often requires time-consuming data cleaning and transformation. Data is not always available in the way you need it. For example, many algorithms require numerical data to be extracted from non-numerical data.
To separate the data, the data scientist forms categories that can be divided using either numerical distances or dummy variables, where each occurrence of a characteristic (such as male, female, and nonbinary) becomes a separate variable. As a rule, one variable can be omitted. For example, in this data set, someone can only be male if they are neither female nor nonbinary. However, erroneous user input often results in data points that could bump an algorithm off track. These data points need to be identified and cleaned up.
The data scientist also looks for variables that are genuinely relevant to the model. This is where the information gathered during the understanding phase comes into play. In an exploratory data analysis, often in a Jupyter Notebook or similar, the data scientist generates and documents the findings in order to share them with colleagues (or at least ensure that the findings are repeatable).

Choosing a Suitable Model
First and foremost, the choice of algorithm depends on the task. If data capable of training an algorithm is available, data scientists refer to this scenario as supervised learning. For instance, if you have access to historical data on loan defaults, you could use it to predict whether future borrowers will repay their loans. The variable used for training is often referred to as the target variable – in this example, this is simply whether or not a loan has been repaid. Other examples would be classifications, such as whether or not a birthmark is indicative of skin cancer, or whether a customer is a fraudster.

Unsupervised Learning
If data exists but does not contain target variables, then it is often a matter of finding a pattern in the data, for example, to classify customers into segments. This type of machine learning is known as unsupervised learning. One of the most popular algorithms in unsupervised learning, judging from the number of tutorials on the subject, is k-means. The k-means algorithm clusters the data (i.e., it breaks the data down into segments). Roughly described, this method first locates centroids at the data points and then calculates the distances from the data points to these centroids.
The data points closest to each of the centroids give you the first clusters. You then compute the actual centers of these clusters. The result is the new distances of the individual data points to the center points. Based on this, the clusters re-sort. This process is repeated until the centers stop changing.
Figure 1 shows this approach. The number of segments is determined by the value k, which must be specified. This raises the question of the appropriate number; the answer is provided by the elbow test. The elbow test involves running k-means with different cluster sizes and showing how much variance there is within clusters for the different values. Visualizing these variances typically creates a dent in the curve – the elbow, where you can read off the optimal value for k.

Figure 1: Visualization of a k-means clustering. First calculate the red center values for the black data points. Then, if necessary, redistribute the points to the resulting clusters as a function of the distance to the respective center.

Association Rules
Association rules, as used by stores to offer similar products, are another popular example of unsupervised learning. "Customers who purchased X often also look at Y" would be a typical application of association rules. Working with association rules usually involves looking at items (e.g., a product in a store) in the context of transactions, which can also be understood as shopping carts or cash register receipts. The Apriori algorithm is a popular approach because it requires less computation. Apriori ignores rare items and also the transactions in which they appear, which means that it has a far smaller data volume to work through.
Rules with different characteristic values are created from the remaining transactions, as a function of the parameters: Support shows how often a shopping cart occurs in comparison to all shopping carts (other items can also exist in the shopping cart). Confidence tells you how often an item appears when another defined item is present. Lift indicates how much more frequently a combination occurs than the independent items. Rules that have a high lift and at the same time appear frequently enough to be seen by users are of interest.

Supervised Learning
One of the simplest machine learning models is linear regression. Linear regression has been around since the 19th century, and it is a little like the "Hello World" of machine learning. Figure 2 shows the occurrences and prices of used SLR cameras for a specific camera model. The more occurrences, the less a used camera is likely to cost, as the data points already indicate. But how can you determine a fair price?
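One way to answer that question is ordinary least squares, the method behind fitting a regression line. The following Python sketch is an illustration only – the camera data shown in Figure 2 is not reproduced here, so the (occurrences, price) pairs below are invented:

```python
# Ordinary least squares for a single predictor: find the slope and
# intercept that minimize the squared vertical distance to the line.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Hypothetical data: the more listings on the market, the lower the price.
occurrences = [2, 4, 6, 8, 10]
prices = [400, 350, 300, 250, 200]
slope, intercept = fit_line(occurrences, prices)
fair_price = slope * 5 + intercept  # predicted "fair" price at 5 occurrences
```

With these made-up numbers the line slopes downward by 25 per additional listing, so the model's fair price for a camera with five current listings sits between the observed neighbors.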
Performance Measurement
With classification methods, in particular, you need a metric to judge how well a model performs. To evaluate this, however, you need to know more than how often a model returns correct predictions.
If a model says no to every credit decision, it could correctly predict all credit defaults (true positives). The true positive rate is also referred to as the sensitivity. Unfortunately, the bank would then lose its business model, since the model would also prevent the good transactions (false positives). If, on the other hand, it allowed all applications, it would allow all incorrect decisions (false negatives) in addition to the correct decisions (true negatives – also known as the specificity).
Ideally, a model will minimize both false positives and false negatives: At both extremes, the bank goes broke – either because it no longer does any business at all or because too many loan defaults occur and can no longer be compensated for by loan income. The four values of false and true positives and negatives map to a confusion matrix. The confusion matrix reveals the number of cases an algorithm generates in each category of positives and negatives. This information in turn provides a good overview of the performance details, although a comparison with other model variants is difficult because the performance is not available as a key figure.
One way to acquire a key figure metric is to use ROC AUC (Receiver Operating Characteristics Area Under the Curve). The underlying approach of ROC AUC involves plotting the data points on two axes – one for sensitivity and the other for specificity. The area under the resulting curve is then used as the key figure (Figure 4). If the value is near 0.5, the results are as good as pure coincidence, and below 0.5 the results are worse than random decisions.
The Precision Recall Curve offers another option. The term precision, in this case, is the ratio of the true positives to the sum of the true and false positives; the recall value is the same as the sensitivity.
The statements made by all of these key performance indicators (KPIs) have their limitations, though, if you want to know how a model will behave in the real world. For example, it is often useful to run a model against the previous model (or manual processes, if applicable) in a split test. To stay with the bank example: Did the model result in fewer loan defaults? On top of this, you also have to develop and maintain the model, which incurs costs. Does this overhead pay off?

Figure 4: Visualization of a ROC AUC curve: The area under the curve is an indicator of quality.

Another issue that many tutorials ignore: Although a model might work well, it might possibly discriminate against some of the actual people that the data points represent. For example, the inventor of Ruby on Rails, David Heinemeier Hansson, had this experience [2] when the limit his wife was given for her Apple Card credit card was 20 times lower than his own limit. Oddly enough, Mrs. Hansson had a better credit score than her husband and was taxed jointly with him. This suggests that gender alone was the reason for giving her a lower limit.
In addition to just measuring the performance of an algorithm, it is also important to test whether an algorithm discriminates. One way to test for discrimination is to enter exactly the same data in a credit application, except for the gender or some other variable you are testing.

Conclusion
Data science is a vast topic that is constantly evolving as computers grow more powerful and new techniques emerge. This article outlined some popular techniques that data scientists use when they delve into data to find answers for their questions.

Info
[1] Tom Alby, Data Science in Practice (Chapman & Hall, 2023): https://www.routledge.com/Data-Science-in-Practice/Alby/p/book/9781032505268
[2] Tweet on Apple Card: https://twitter.com/dhh/status/1192540900393705474

Author
Tom Alby is the author of several books, a lecturer on everything data related at several universities, and has worked at companies such as Bertelsmann, Google, and bbdo. Today, he is Chief Digital Transformation Officer with Allianz Trade.
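As a practical footnote to the metrics discussed in this article, the confusion-matrix counts and a ROC AUC value can be computed from raw predictions in a few lines. This Python sketch uses invented data; the rank-based AUC shortcut it uses is equivalent to the area under the ROC curve when there are no tied scores:

```python
# Confusion-matrix counts and ROC AUC from raw binary predictions.
# Labels: 1 = loan repaid, 0 = default (invented data, not from the article).

def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    This rank formulation equals the area under the ROC curve
    when no two scores are tied.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(p > n for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
tp, tn, fp, fn = confusion(y_true, y_pred)
sensitivity = tp / (tp + fn)   # recall / true positive rate
precision = tp / (tp + fp)     # as used by the precision-recall curve
auc = roc_auc(y_true, [0.9, 0.7, 0.4, 0.6, 0.2])
```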
Number Game
The R programming language is a universal tool for data
analysis and machine learning. By Rene Brunner
The R language is one of the best solutions for statistical data analysis. R is ideal for tasks such as data science and machine learning. R, which was created by Ross Ihaka and Robert Gentleman at the University of Auckland in 1991, is a GNU project that is similar to the S language, which was developed in the 1970s at Bell Labs.
R is an interpreted language. Input is either executed directly in the command-line interface or collected in scripts. The R language is open source and completely free. R, which runs on Linux, Windows, and macOS, has a large and active community that is constantly creating new, customized modules.
R was developed for statistics, and it comes with fast algorithms that let users analyze large datasets. There is a free and very well-integrated development environment named RStudio, as well as an excellent help system that is available in many languages.
The R language works with a library system, which makes it easy to install extensions as prebuilt packages. It is also very easy to integrate R with other well-known software tools, for example Tableau, SQL, and MS Excel. All of the libraries are available from a worldwide repository, the Comprehensive R Archive Network (CRAN) [1]. The repository contains over 10,000 packages for R, as well as important updates and the R source code.
The R language includes a variety of functions for managing data, creating and customizing data structures and types, and other tasks. R also comes with analysis functions, descriptive statistics, mathematical set and matrix operations, and higher-order functions, such as those of the MapReduce family. In addition, R supports object-oriented programming with classes, methods, inheritance, and polymorphism.

Installing R
You can download R from the CRAN website. The CRAN site also has installation instructions for various Linux distributions. It is a good idea to also use an IDE. In this article, I will use RStudio, which is the most popular IDE for R.
RStudio is available in two formats [2]. RStudio Desktop is a normal desktop application, and RStudio Server runs as a remote web server that gives users access to RStudio via a web browser. I used RStudio Desktop for the examples in this article.
When you launch RStudio Desktop after the install, you are taken to a four-panel view (Figure 1). On the left is an editor, where you can create an R script, and a console that lets you enter queries and display the output directly. Top right, the IDE shows you the environment variables and the history of executed commands. The visualizations (plots) are output at the bottom right. This is also where you can add packages and access the extensive help feature.

First Commands
When you type a command at the command prompt and press Enter, RStudio immediately executes that command and displays the results. Next to the first result, the IDE outputs [1]; this stands for the first value in your result. Some
commands return more than one value, and the results can fill several lines.
To get started, it is a good idea to take a look at R's data types and data structures. More advanced applications build on this knowledge; if you skip over it, you might be frustrated later. Plan some time for the learning curve. The basic data types in R are summarized in Table 1. Table 2 summarizes some R data structures.
To create an initial graph, you first need to define two vectors x and y, as shown in the first two lines of Listing 1. The c stands for concatenate, but you could also think of it as collect or combine. You then pass the variables x and y to the plot() function (last line of Listing 1); the col parameter defines the color of the points in the output. Figure 2 shows the results.

Table 1: Data Types in R
Type | Designation | Examples
Logical values | LOGICAL | TRUE and FALSE
Integers | INTEGER | 1, 100, 101
Floating-point numbers | NUMERIC | 5.1, 100.1
Strings | CHARACTER | "a", "abc", "house"

Table 2: Data Structures in R
Name | Description
Vector | The basic data structure in R. A vector consists of a certain number of components of the same data type.
List | A list contains elements of different types, such as numbers, strings, vectors, matrices, or functions.
Matrix | Matrices do not form a separate object class in R but consist of a vector with added dimensions. The elements are arranged in a two-dimensional layout and have rows and columns.
Data frame | One of the most important data structures in R. This is a table in which each column contains values of a variable and each row contains a set of values from each column.
Array | An array stores data in more than two dimensions. An array with the dimensions (2, 3, 4) creates four rectangular matrices, each with two rows and three columns.

Installing Packages
Each R package is hosted on CRAN, where R itself is also available. But you do not need to visit the website to download an R package. Instead, you can install packages directly at the R command line. The first thing you will want to do is fetch a library for visualizations. To do this, call the install.packages("ggplot2") command in the command prompt console. The installation requires a working C compiler.
Setting up a package does not make its features available in R yet – it just puts them on your storage medium. To use the package, you need to call it in the R session with the library("ggplot2") command. After restarting R, the library is no longer active; you might need to re-enable it. Newcomers tend to overlook this step, which often leads to time-consuming troubleshooting.

RStudio Scripts
A script is a plain text file in which you store the R code. You can open a script file in RStudio via the File menu.
RStudio has many built-in features that make working with scripts easier. First, you can run a line of code automatically in a script by clicking the Run button or pressing Ctrl+Enter. R then executes the line of code in which the cursor is located. If you highlight a complete section, R will execute all the highlighted code. Alternatively, you run the entire script by clicking the Source button.
Data Analysis
A typical process in data analysis involves a series of phases. The primary step in any data science project is to gather the right data from various internal and external sources. In practice, this step is often underestimated – in which case problems arise with data protection, security, or technical access to interfaces.
Data cleaning or data preparation is a critical step in data analysis. The data …

Listing 1 (fragment):
y <- c(1, 2, 2, 4, 6)
plot(x,y,col="red")

Figure 1: The main window of the RStudio IDE is divided into panels.

data(mtcars
R-squared value, a statistical measure of how close the data points are to the regression line.
Histograms visualize the distribution of a single variable. A histogram shows how often a certain measured value occurs or how many measured values fall within a certain interval. The qplot command automatically creates a histogram if you only pass in one vector to plot. qplot(x) creates a simple histogram from x <- c(1, 2, 2, 3, 3, 4, 4, 4).
The box plot, also known as a whisker diagram, is another type of chart. A box plot is a standardized method of displaying the distribution of data based on a five-value summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. In addition, a box plot highlights outliers and reveals whether the data points are symmetrical and how closely they cluster.
In R you can generate a box plot, for example, with qplot(). The best way to generate a box plot is with the sample data from mtcars. To use the cyl column as a category, factor() first needs to convert the values from numeric variables to categorical variables (Listing 4).
Thanks to the special display form that the geom="violin" parameter sets here, you can see at first glance that, for example, the vast majority of eight-cylinder engines can travel around 15 miles on a gallon of fuel, whereas the more frugal four-cylinder engines manage between 20 and 35 miles with the same amount (Figure 4).

Data Cleanup
Data cleanup examples are difficult to generalize, because the actions you need to take heavily depend on the individual dataset. But there are a number of fairly common actions. For example, you might need to rename cryptically labeled columns. The recommended approach is to first standardize the designations. Then change the column names with the colnames() command, passing in the index of the column whose name you want to change in square brackets. The index of a particular column can also be found automatically (Listing 5, first line). If you do not want to overwrite the column caption of the original mtcars dataset, first copy the data to a new data frame with df <- mtcars.
If the records have empty fields, this can lead to errors. That's why it is a good idea to resolve this potential worry at the start of the cleanup. Depending on how often empty fields occur, you can either fill them with estimated values (imputation) or delete them. The command from the second line of Listing 5 removes all lines that contain at least one zero (also NaN or NA).
Records also often contain duplicates. If a duplicate is the result of a technical error in data retrieval or in the source system, you should first try to correct this error. R provides an easy way to clean up the dataset and assign the results to a new, clean data frame with the unique() command (Listing 5, last line).

Predictive Modeling
In reality, there are a variety of prediction models with a wide range of parameters that provide better or worse results depending on the requirements and data. For an example, I'll use a dataset for irises (the flowers) – one of the best-known datasets for machine learning examples.
As an algorithm, I use a decision tree to predict the iris species – given certain properties, for example, the length (Petal.Length) and width (Petal.Width) of the petals. To do this, I first need to load the data, which already exists in an R library (Listing 6, line 1).
The next thing to do is to split the data into training and test data. The training data is used to train the model, whereas the test data checks the predictions and evaluates how well the model works. You would typically use about 70 percent of the data for training and the remaining 30 percent for testing. To do this, first determine the length of the record using the nrow() function and multiply the number by 0.7 (Listing 6, lines 2 and 3). Then randomly select an appropriate amount of data (line 5).
I have set a seed of 101 for the random value selection in the example (line 4). If you set the same value for the seed, you will see identical random values. Following this, split the data into iris_train for training and iris_test for validation (lines 6 and 7).
Listing 6 (fragment):
04 > set.seed(101)
09 > install.packages("rpart.plot")
10 > library(rpart)
11 > library(rpart.plot)
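For readers more comfortable outside R, the same 70/30 split procedure can be sketched in plain Python. Note that Python's random generator seeded with 101 will not reproduce R's set.seed(101)/sample() results – the two generators are unrelated; this only mirrors the idea:

```python
# A reproducible 70/30 train/test split, sketched in plain Python.
import random

def split_indices(n_rows, train_share=0.7, seed=101):
    rng = random.Random(seed)  # fixed seed => the same split on every run
    train = sorted(rng.sample(range(n_rows), int(n_rows * train_share)))
    train_set = set(train)
    test = [i for i in range(n_rows) if i not in train_set]
    return train, test

train_idx, test_idx = split_indices(150)  # the iris dataset has 150 rows
# 105 training rows and 45 test rows, with no overlap between the two
```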
Listing 7: Accuracy Estimation
01 > iris_pred <- predict(object = iris_model, newdata = iris_test, type = "class")
02 > install.packages("caret")
03 > library(caret)
04 > confusionMatrix(data = iris_pred, reference = iris_test$Species)

Listing 8: Data Import
> df <- read.table("meine_datei.csv", header = FALSE, sep = ",")
> my_daten <- read_excel("my_excel-file.xlsx")

After splitting the data, you can train and evaluate the decision tree model. To do this, you need the rpart library; rpart.plot visualizes the decision tree (lines 8 to 11). Next, generate the decision tree based on the training data. When doing so, pass in the Species column in order to predict which iris species you are looking at (line 12).
One advantage of the decision tree is that it is relatively easy to see which parameters the model refers to. rpart.plot lets you visualize and read the parameters (line 13). Figure 5 shows that the iris species is setosa if the Petal.Length is less than 2.5. If the Petal.Length exceeds 2.5 and the Petal.Width is less than 1.7, then the species is probably versicolor. Otherwise, the virginica species is the most likely.
You can now also imagine how this algorithm could be applied to other areas. For example, you could use environmental climate data (humidity, temperature, etc.) as the input, combine it with information on the type and number of defects in a machine, and use the decision tree to determine the conditions under which the machine is likely to fail.
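The decision rules that the text reads off Figure 5 can also be written out directly. This Python sketch is not the rpart model itself, just the two thresholds expressed as plain code:

```python
# The decision rules from the fitted tree, hand-coded for illustration.

def classify_iris(petal_length: float, petal_width: float) -> str:
    if petal_length < 2.5:   # short petals: setosa
        return "setosa"
    if petal_width < 1.7:    # longer petals, but narrow: versicolor
        return "versicolor"
    return "virginica"       # longer and wide petals

# classify_iris(1.4, 0.2) -> "setosa"
# classify_iris(4.5, 1.3) -> "versicolor"
# classify_iris(5.8, 2.2) -> "virginica"
```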
The next step in the analysis process is to find out how accurate the results are. To do this, you need to feed the model data that it hasn't seen before. The previously created test data is used for this purpose. Then use predict() to generate predictions based on the test data using the iris_model model (Listing 7, line 1).
There are a variety of metrics for determining the quality of the model. The best known of these metrics is the confusion matrix. To compute a confusion matrix, first install the caret library (lines 2 and 3), which will give you enough time for an extensive coffee break even on a fast computer. Then evaluate the iris_pred data (line 4).
The statistics show that the model operates with an accuracy of 93 percent. The next step would probably be to optimize the algorithm or find a different algorithm that offers greater accuracy.

Importing Data
If you want to analyze your own data now, you just need to import the data into R to get started. R lets you import data from different sources.
To import data from a CSV file, first pass the file name (including the path if needed) to the read.table() function and optionally specify whether the file contains column names. You can also specify the separator character for the fields in the lines (Listing 8, first line).
If the data takes the form of an Excel spreadsheet, you can also import it directly. To do this, install the readxl library and use read_excel() (second line) to import the data.

Conclusions
The R language is a powerful tool for analyzing and visualizing scientific data. This article took a look at how to install R, RStudio, and the various R libraries. I also described the various data structures in R and introduced some advanced analysis methods. Now you can jump in and start using R for your own scientific data analyses.
Info
[1] CRAN: https://cran.r-project.org
Author
Rene Brunner is the founder of Datamics, a consulting company for Data Science Engineering, and Chair of the Digital Technologies and Coding study program at the Macromedia University. With his online courses on Udemy and his "Data Science mit Milch und Zucker" podcast, he hopes to make data science and machine learning accessible to everyone.

Figure 5: Visualizing the decision tree model with the iris data.
Steadfast
Immutable distributions offer a layer of added security.
Bruce explains how immutable systems work and
discusses their benefits and drawbacks. By Bruce Byfield
The concept of immutable objects – objects that can be replaced but not edited – is not new to Linux. Object-oriented programming languages such as Rust, Erlang, Scala, Haskell, and Clojure have immutable objects, and many programming languages allow immutable variables. Similarly, the chattr command has an immutable attribute for directories and files.

In recent years, immutable systems have emerged, originally for the cloud or embedded devices, but now for servers and desktop environments as well. Some of these distros are new, and many are based on major distributions such as Debian, openSUSE, and Ubuntu. All are seen as adding another layer of security, and most use containers and universal packages, bringing these technologies to the average user for everyday use (see Table 1).

Table 1: Selected Immutable Distros
blendOS: An Arch Linux-based distro suitable for beginners that runs packages from multiple distros on the same desktop
Bottle Rocket: A distro for use with Amazon Web Services
carbonOS: A Gnome-based distro that includes system updates
CoreOS: A distro used by Red Hat Enterprise Linux (RHEL)
Fedora Silverblue: A variant of Fedora Workstation that is perhaps the most popular immutable distro
Fedora Kinoite: A Plasma-based variant of Fedora Workstation
Fedora Sericea: A variant of Fedora Workstation that uses the Sway window manager
Fedora CoreOS: A distro designed for clusters (but operable as standalone) and optimized for Kubernetes
Flatcar Container Linux: A minimal distro that includes only container tools and no package manager
RancherOS: A light, minimal system with immutability provided by read-only permissions
NixOS: An immutable system, plus rollbacks, system cloning, 80k packages, preinstall package testing, and multiple versions of packages
Guix: Similar to NixOS, but aimed at advanced users
Talos Linux: A distro designed for the cloud and use with Kubernetes with a minimal installation
Endless OS: A Debian-based distro aimed at new users that works offline
Nitrux: A Debian and Plasma-based distro
openSUSE MicroOS: A server-oriented distro with transactional updates via Btrfs
Vanilla OS: A Debian-based distro with emphasis on desktop and user experience
Ubuntu Core: In development since 2014, a well-documented distro specifically designed for embedded devices
Discontinued: k3os, a minimal distro for running Kubernetes clusters

Author
Bruce Byfield is a computer journalist and a freelance writer and editor specializing in free and open source software. In addition to his writing projects, he also teaches live and e-learning courses. In his spare time, Bruce writes about Northwest Coast art (http://brucebyfield.wordpress.com). He is also co-founder of Prentice Pieces, a blog about writing and fantasy at https://prenticepieces.com/.
Friendly Fork
Recent policy changes at Red Hat have upturned the RHEL clone community. AlmaLinux charts a new
path by shifting to binary compatibility and away from being a downstream RHEL build. By Amy Pettle
When Red Hat discontinued CentOS and replaced it with CentOS Stream in late 2020, AlmaLinux stepped forward to build a community downstream version of Red Hat Enterprise Linux (RHEL). In a desire to fill this void in the Enterprise Linux ecosystem, CloudLinux collaborated with the community to develop AlmaLinux OS as a downstream build of RHEL. After the first stable release in March 2021, CloudLinux turned governance of AlmaLinux OS over to the nonprofit AlmaLinux OS Foundation. From there, AlmaLinux chugged along for over two years providing the Enterprise Linux community with a forever-free Linux distro while offering long-term stability and a production-grade platform.

That all changed in June 2023 when Red Hat announced that RHEL-related source code would be restricted to Red Hat’s customer portal. CentOS Stream, an upstream version of RHEL that contains experimental packages, would now be the sole repository for public RHEL-related source code releases. Because Red Hat’s subscription agreement prohibits customers from redistributing code, this move appeared to put an end to downstream builds like AlmaLinux as well as other RHEL clones like Rocky Linux. Rather than give up, AlmaLinux is forging a different path forward.

AlmaLinux plans to maintain application binary interface (ABI) compatibility to continue to provide the community with a forever-free Enterprise Linux solution. (See the “New Path Forward” box for our interview with benny Vasquez, AlmaLinux OS Foundation Chair, to learn why they chose this route.)

1:1 vs. ABI Compatibility
In 1:1 compatibility, a clone distribution provides an exact copy of RHEL’s functionality, behavior, and binary compatibility, including bug-to-bug compatibility. It is an exact replica of RHEL minus RHEL’s branding and trademarks.

With ABI compatibility, AlmaLinux guarantees that all apps developed for RHEL or its clones will run on AlmaLinux without any modifications or extra work on the part of the user. AlmaLinux will not be an exact copy, but it will include kernel and application compatibility. This also means that AlmaLinux will not guarantee bug-to-bug compatibility. While some users might find bugs not found in RHEL, AlmaLinux also has the opportunity to include bug fixes not yet addressed by Red Hat, as well as possibly offer new features not available in RHEL.

Previously, AlmaLinux rebuilt the published RHEL sources into new RPM packages and then published the code in the AlmaLinux repositories. Instead of updates and patches coming from a single repository, AlmaLinux now must gather them from multiple sources and then compare, test, and build the new release from these sources. To achieve ABI compatibility, AlmaLinux will use CentOS Stream (the upstream version of RHEL still available to the public) and then get additional code from Red Hat Universal Base Images (UBIs) and upstream Linux code.

In a recent talk at All Things Open [1], Vasquez noted that 99 percent of the packages would match RHEL source code. Of this 99 percent, 75 percent will be built from CentOS Stream or UBI images, while approximately 24 percent will require manual patching. The remaining one percent that differs from RHEL lies in the kernel patches. These kernel updates pose the biggest challenge because AlmaLinux can no longer pull these updates from Red Hat without violating licensing agreements. Moving forward, AlmaLinux plans to pull kernel updates from various other sources, and, if all else fails, the Oracle releases (which are also based on RHEL). On the upside, AlmaLinux can now
Keyboard
Eavesdropping
Is someone listening in on your typing? Learn more about how acoustic keyloggers work. By Chris Binnie

With all the discussion about the application of artificial intelligence (AI) in cybersecurity, we are reminded that criminals are paying close attention to AI's advances. New functionality identified by British researchers [1] involves training a deep learning model to listen in on the acoustic sounds made by keyboards when a user is typing. The model then records the audio from the typing and determines what was typed. Applications include recording users logging in to sensitive online accounts or entering payment details.

However, this type of attack does not require AI to do damage. Keylogging tools already exist that can listen in on your typing. While it might sound paranoid, you might be surprised how advanced such tools have become, even without machine learning (ML) removing much of the required “training” time for an acoustic keylogger to fully recognize keyboard sounds. To get you up to speed on keylogging, I will explain how keylogging works and look at some of the tools currently available on Linux.

What’s All the Fuss?
Popularized in movies, the logging of keystrokes often involves malware being installed on a target machine with a USB drive. Once installed, anything typed on the keyboard attached to the infected computer is saved and forwarded to the attacker, giving them access to passwords, credit card numbers, bank account information, and more. Of course, today the malware payload can be just as easily delivered by unwelcome JavaScript unsuspectingly executed by the browser when you visit an infected web page.

There are some legitimate (though contentious) uses of this technology. For example, parents might monitor their child’s tablet usage or a corporate employer might keep tabs on an employee’s computer usage.

A recent article on the Bleeping Computer website [1] regarding the British deep learning acoustic attack study makes two fascinating points. Firstly, the study outlines the baseline where a training algorithm receives enough training data to recognize the sound of each keystroke. Bleeping Computer noted: “The researchers gathered training data by pressing 36 keys on a modern MacBook Pro 25 times each and recording the sound produced by each press.” Devices such as phones, or anything with a reasonable quality microphone (also infected by malware, most likely), are used for the recording. The study also used videoconferencing software (specifically Zoom) to record keystrokes when attendees logged into various accounts during the meeting. Secondly, when presented with the above training data, the AI’s success rate was incredible. Overall the success rate was a staggering 95 percent. Zoom calls achieved a 93 percent success rate and Skype managed 91.7 percent accuracy, according to the Bleeping Computer article.

Keyloggers can be deployed in many different ways. For instance, Endpoint Detection and Response (EDR) technology was found to have missed the presence of BlackMamba keylogging malware. According to an article in Dark Reading [2], such an attack “demonstrates how AI can allow the malware to dynamically modify benign code at runtime without any command-and-control (C2) infrastructure, allowing it to slip past current automated security systems that are attuned to look out for this type of behavior to detect attacks.” The Dark Reading article concludes that without extensive research combined with effort from the security industry, solutions will struggle to keep us secure.

Now that your fight-or-flight senses are tingling suitably, I will show you some tools in action.

Can’t Hear You
Before looking at acoustic keylogging tools, I’ll cover a non-acoustic keylogger, logkeys [3], to show how older tools work as well as some of the jigsaw pieces involved with keyloggers. The logkeys tool
records all common character and function key presses as well as recognizing special keys. After cloning the logkeys repo [3], generate the build files with the following script and enter the build directory:

$ ./autogen.sh
$ cd build

Then, you need to check that the build environment is sound before compiling the logkeys software with the following command (note the two dots):

$ ../configure

You can then complete the build as shown in Listing 2. Happily, the logkeys help file output appears as hoped.

Uninstall logkeys
Since logkeys is a surveillance tool, you need a way to reliably uninstall it. You can do this from the build repo directory (which is /root/logkeys/build in my case as I’m cloning the repo into the /root directory) using the command in Listing 1.

Now I will run a test to see if I can get logkeys to work using instructions from the documentation [4]. For this test, I will need two terminals. In the first terminal, I will move into the /tmp directory to keep the root user’s home directory tidy and then create an empty logfile with the following commands:

$ cd /tmp
$ echo "" > watching_you.log

Next I will start logging output with:

$ logkeys --start --output watching_you.log

I need to stop logkeys in order to change the keyboard mapping. During testing, I used the pkill command to stop logkeys while it was running; there’s almost certainly a more graceful way of stopping the daemon, however. For those not familiar with pkill, it’s a simple route to take instead of using the kill command. Be very careful how you use it as the root user. It essentially saves time spent looking up a process’s PID to terminate it. Its purpose is to match the human-readable name of a process before stopping it ungracefully. For more information on pkill, run man pkill.
Sample logfile output:

[...]
2023-08-13 12:58:13+0100 > <Enter><Up><LShft>Xu <BckSp>v yx [ ]q?ux? eqwcyx[g yx?u v <LShft>Yjgg cuq <BckSp><BckSp>#q yxeu esq <LShft>"neci<LShft>" ?ywq?euwr [x? e[yg esq gua aygqv<BckSp><BckSp><BckSp><BckSp><BckSp>?ygq [] ]u<LShft>H
[...]
$ cd acoustic-keylogger/

Now, you are ready to run the Docker Compose build command, whose output [...]

Figure 3: Keystroke sounds generated by a MacBook Pro 2016 (source: https://github.com/shoyo/acoustic-keylogger).

$ apt install cmake -y
$ cd kbd-audio
$ git submodule update --init
$ mkdir build && cd build

[ 2%] Building CXX object CMakeFiles/Core.dir/common.cpp.o
[ 4%] Building CXX object CMakeFiles/Core.dir/audio-logger.cpp.o
[ 6%] Linking CXX static library libCore.a

$ ./record-full output.kbd
$ ./play-full output.kbd
Figure 9: The results of a keyboard vulnerability test.

significant benefits of this iteration of the tool.) Instead of using training data, keytap2 references statistical information in relation to the n-gram frequencies involved. An n-gram is a series of adjacent letters [13]. For a treatise on how keytap2 works, see [14]. You can test out keytap2 in Gerganov’s Capture The Flag (CTF) competition [15], where successful users enter a Hall of Fame. A keytap2 online demo [16] offers helpful instructions to get you up and running after clicking the Init button.
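keytap2's full statistical model is far more involved, but the n-gram idea itself is easy to demonstrate. The following Python sketch (mine, not code from the kbd-audio repo) counts how often each pair of adjacent letters occurs in a text; frequency tables like this, computed over large corpora, are what let a tool rank candidate decodings of unknown keystrokes:

```python
from collections import Counter

def ngram_frequencies(text, n=2):
    """Count how often each n-letter sequence (n-gram) occurs in text."""
    letters = "".join(ch for ch in text.lower() if ch.isalpha())
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))

freqs = ngram_frequencies("the theory of the thing")
print(freqs.most_common(2))  # [('th', 4), ('he', 3)] – common English bigrams
```

Even on this tiny sample, the bigrams "th" and "he" dominate, just as they do in English at large.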
Three and Magic Numbers
The final version in the kbd-audio repo is keytap3, which improves on the algorithm and provides better n-gram statistics. In addition, keytap3 no longer requires manual intervention during text recovery – it is fully automated.

Conclusions
I have demonstrated a number of keylogging tools ranging from those that capture key presses to those that record typing audio. Even in their current iterations, these tools should give you pause. For some tips on protecting yourself from keyloggers, I recommend checking out this cursory discussion on the topic [20]. As AI advances over the next few years, keylogging tools will likely evolve. Until then, you might consider how many devices in your home have a microphone and perhaps reduce them in number. You might also want to sign out of your online accounts during video calls.

Author
Chris Binnie is a Cloud Native Security consultant and co-author of the book Cloud Native Security: https://www.amazon.com/Cloud-Native-Security-Chris-Binnie/dp/1119782236.
Info
[1] “New acoustic attack steals data from keystrokes with 95% accuracy” by Bill Toulas, Bleeping Computer, August 5, 2023: https://www.bleepingcomputer.com/news/security/new-acoustic-attack-steals-data-from-keystrokes-with-95-percent-accuracy
[2] “AI-Powered ‘BlackMamba’ Keylogging Attack Evades Modern EDR Security” by Elizabeth Montalbano, Dark Reading, March 8, 2023: https://www.darkreading.com/endpoint/ai-blackmamba-keylogging-edr-security
[3] logkeys: https://github.com/kernc/logkeys
[4] logkeys documentation: https://github.com/kernc/logkeys/blob/master/docs/Documentation.md
[5] lkl: https://sourceforge.net/projects/lkl
[6] uberkey: https://linux.die.net/man/8/uberkey
[7] Side-channel attack: https://en.wikipedia.org/wiki/Side-channel_attack
[8] acoustic-keylogger: https://github.com/shoyo/acoustic-keylogger
[9] Jupyter: https://jupyter.org
[10] kbd-audio: https://github.com/ggerganov/kbd-audio
[11] kbd-audio demo: https://keytap.ggerganov.com
[12] keytap demo: https://www.youtube.com/watch?v=2OjzI9m7W10
[13] n-gram: https://en.wikipedia.org/wiki/N-gram
[14] n-gram frequencies: https://github.com/ggerganov/kbd-audio/discussions/31
[15] CTF challenge: https://ggerganov.github.io/keytap-challenge
[16] keytap2 demo: https://keytap2.ggerganov.com
[17] keytap3 demo: https://youtu.be/5aphvxpSt3o
[18] keytap3 GUI: https://keytap3-gui.ggerganov.com
[19] keytap3 test: https://keytap3.ggerganov.com
[20] Prevention tips: https://security.stackexchange.com/questions/119730/targeted-acoustic-keylogging-attack-prevention
A command-line system information tool

System in a Nutshell
Neofetch displays system information about your hardware, operating system, and desktop settings in visually appealing output perfect for system screenshots. By Bruce Byfield

Linux has never lacked applications that display system information, but perhaps the most comprehensive tool is neofetch [1], a Bash script that displays the current information about hardware, operating systems, and desktop settings. The information is presented by default in a somewhat haphazard order, which can be compensated for by a high degree of customization. Little wonder, then, that in recent years neofetch has found its way into most distributions. Not only is it a useful summary of system information, supporting a wide array of hardware and software, but, as its GitHub page notes, its visually appealing output is also useful in screenshots of your system.

For many, the output of the bare command may be enough (Figure 1). On the left of Figure 1 is an ASCII rendition of the installed distribution’s logo. On the right are 15 system statistics. Which statistics are shown, the details of each statistic, and the general layout are all customizable either from the command line or from .config/neofetch/config.conf in the user’s home directory (Figure 2). At the bottom, a line of colored blocks does nothing except to mark the end of the display.

Figure 1: Neofetch’s default output: In addition to a wide range of system information, it includes an ASCII rendering of the distribution logo.

Figure 2: Neofetch creates a configuration file for [...]
Data processor

GNU datamash [1] is a command-line program capable of analyzing, summarizing, or transforming in various ways tables of numbers, with or without text, stored inside plaintext files. For these kinds of tasks, datamash is often a faster, more productive alternative to tools like AWK, sed, or any scripting language. Just like those other tools, datamash is a good team player, in the traditional Unix and Linux sense: You can use datamash interactively at the prompt, automatically in shell scripts, and even directly attach it to other programs (including itself!) via Unix pipes. Besides, in almost all the cases I have seen or can imagine, datamash does what you need with less typing, possibly a lot less. Last but not least, datamash lets you easily perform basic quality checks on raw data. I'll show you how to do all this from scratch, starting with the basic options and ways of working with datamash and then moving to more complicated examples.

Practice with Sample Files
Datamash does not offer many sample files for learning and testing its many features. At the time of writing, the datamash package only includes four sample files. On Linux, depending on your distribution, you can find them in /usr/share, /usr/local/share, or /usr/share/doc/. If these aren’t enough, you can generate as many sample files as desired with simple scripts such as the one in Listing 1, which is a snippet of code from another project that I quickly adapted for this article.
Listing 1: Generating datamash Test Files

01 #! /usr/bin/perl
02
03 use strict;
04
05 my $LINES = 500;
06 my $COLS = 4;
07 my $CNT = $COLS*$LINES;
08
09 my $I = 0;
10
11 while ($I <$CNT) {
12
13   my $NL = $I % $COLS;
14   printf "%4.4s%s", int rand(1001), (($COLS -1) == $NL) ? "\n" : "\t"; $I++;
15 }

Listing 1 creates a table of random integers, with the number of lines and columns defined in lines 5 and 6. As is, Listing 1 will generate 2,000 random integers between 0 and 1,000 and print them separated by tabs (line 14). More precisely, the counter $I initialized in line 9 is incremented each time a number is added, at the very end of line 14. Each time, the counter’s current value is also divided modulo the desired number of columns, with the remainder assigned to the $NL variable in line 13. With four columns, this means that $NL will cycle between the values (0,1,2,3) until the program ends, making the comparison in line 14, (($COLS -1) == $NL), true only once every four iterations of the loop.
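If Perl is not your thing, the same row/column logic is a few lines of Python. This is my equivalent sketch, not code from the article's project:

```python
import random

def random_table(lines=500, cols=4):
    """Return lines x cols random integers in 0-1000, as tab-separated rows."""
    return "\n".join(
        "\t".join(str(random.randint(0, 1000)) for _ in range(cols))
        for _ in range(lines)
    )

print(random_table(2, 4))  # two tab-separated rows of four integers each
```

Redirect the output to a file and you have a test table for any of the datamash examples that follow.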
When that happens, the code will print a new line instead of a tab (i.e., start a new row of the table instead of adding another column). You may modify the code in Listing 1 as desired, including generating text instead of numbers, by adding arrays of strings and then using the counter, or another random number, as an index to load elements of those arrays.

The datamash Way
Datamash processes data organized in columns and rows (i.e., lines of text) by calling functions that perform “operations” on every element (field) of the column or columns they are told to use. The datamash documentation divides the available operations into six categories. The simplest one is called “line-filtering.” In datamash, per-line operations are those that, for every row of data, output one new value for every field of that line whose column was selected when datamash was launched. Per-line operations can do both string and number processing. You may, for example, call functions such as dirname, basename, and barename to get the corresponding parts of a file path or getnum to extract components such as 753.4 from strings such as somenum753.4. If the data is numeric, you may among other things ask datamash to calculate several types of checksum, encode them, or round them.

Finally, datamash has a “primary” category of five very important meta-operations, which must be listed first when used. Of these, the one you will likely use more often is called groupby (-g for short). I will explain it in a moment, leaving the others for last, after introducing some other basic concepts and command-line options of datamash.

How are columns recognized? By default, datamash assumes they are separated by single tabs. If they are delimited by other white spaces, or combinations of them, you must say so with the --whitespace or -W options. In that case, leading white spaces are ignored. Any other column delimiter, for example, a slash, must be declared with -t / or --field-separator=/.

Another thing you need to know about datamash is inside this short file of space-separated floating numbers:

#> cat floating.csv
34.2 35.3
14.9 -3.3
#> datamash -W sum 1 min 2 < floating.csv
datamash: invalid numeric value in line 1 field 1: '34.2'
#> cat floating.csv | tr '.' ',' | datamash -W sum 1 min 2
49,1 -3,3

Datamash follows the locale's numeric format: In a locale that uses the decimal comma, values written with a decimal point are rejected until they are converted.

The -C or --skip-comments option makes datamash ignore lines that start with hashes or semicolons. Comments in other formats (e.g., lines starting with two slashes such as in the C language) may be hidden from datamash by prefixing them with a hash with the sed command:

#> cat file-with-c-style-comments.csv | sed -e 's|^/?|# /?|' | datamash ...
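To picture what a per-line operation such as getnum does, here is a rough Python approximation (my sketch of the idea; datamash's own parsing rules may differ in detail):

```python
import re

def getnum(field):
    """Extract the first number embedded in a string, getnum-style."""
    match = re.search(r"-?\d+(?:\.\d+)?", field)
    return float(match.group()) if match else 0.0

print(getnum("somenum753.4"))  # 753.4
```

Applied to one selected column, a per-line operation like this produces one output value for every input row, which is exactly what distinguishes it from the grouping operations described next.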
All the “grouping” operations instead return just one value for every column they are told to process or for each part (more on this soon) of the same column. For example, a command like

#> datamash max 3 min 1 mean 5 < somefile.csv
4300 23 304,3

would make datamash print the maximum (4300), minimum (23), and mean (304,3) values of the third, first, and fifth columns of the file somefile.csv. The numerical and statistical grouping operations include both self-explaining functions such as sum, min, max, or mean, and many obscure (to me) statistical ones. There are also operations such as countunique that count the number of unique values in a column. To learn about all the possible grouping operations, please consult the man page or the online documentation on the website.

Operations also accept lists or ranges of columns:

#> cat sample-file.csv | datamash max 7,2,5
#> cat sample-file.csv | datamash max 3-7

The first command prints the maximum values of columns 7, 2, and 5 in that order, while the second returns the five maximums of columns 3 to 7. Please notice that if you want an operation done on all the columns of a file you must explicitly declare the whole range. If a file has 23 columns, for example, and you need to know the maximum value in each of them, you should enter:

#> datamash max 1-23 < file-with-23-columns.csv

Some operations have additional syntax because they either require a parameter, or must combine different columns to produce one result:

#> datamash perc:40 5 < input-file.csv
#> datamash pcov 4:6 < input-file.csv

Here, datamash’s first call returns the 40th percentile of the values in column 5, while the second returns the covariance (i.e., joint variability) of the values in columns 4 and 6.

The output delimiter defaults to the one used for the input data – a tab or whatever was declared with the -t or -W switches. If you want a different column delimiter, however, you can set it with:

#> echo "2,4 3,7 112,88" | datamash -W ceil 1-3 '--output-delimiter=|'
3|4|113

Sometimes you must sort or rearrange columns of data before performing any of the operations described so far. Consider the file in Listing 2, which lists the number of users of several operating systems in different places. With datamash, you can find the total number of users of each operating system as follows:

#> cat users.tsv | datamash -s -g 2 sum 1
freebsd 2981
linux 222743
unix 29437
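To see exactly what `-g 2 sum 1` is doing, here is the same grouping spelled out in Python. This is a sketch with made-up rows (not the article's Listing 2 data), and the column indexes are zero-based as usual in Python:

```python
from collections import defaultdict

def groupby_sum(rows, group_col=1, sum_col=0):
    """Mimic `datamash -s -g 2 sum 1`: group rows on one column, sum another."""
    totals = defaultdict(int)
    for row in rows:
        fields = row.split("\t")
        totals[fields[group_col]] += int(fields[sum_col])
    return sorted(totals.items())  # -s: output sorted by the grouping key

rows = ["2981\tfreebsd", "1200\tlinux", "221543\tlinux", "29437\tunix"]
for name, total in groupby_sum(rows):
    print(f"{name}\t{total}")
```

Note that the sketch sorts its own output; datamash instead expects presorted input (or the -s switch) for grouping to work, a point that matters again in the blog example below.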
Header lines with column labels greatly increase the readability of both the input files as well as datamash’s output. If the first line of a file contains labels for its columns, as in this sample file from the datamash documentation

#> cat /usr/share/doc/datamash/examples/scores_h.txt
Name Major Score
Shawn Arts 65
Marques Arts 58

then datamash will recognize and accept those labels as column names, if given the --header-in option. You may issue commands such as

#> cat scores_h.txt | datamash --header-in min Score
14

to find that the minimum score in the whole file is 14. To add headers to the output, use --header-out. Coupled with --header-in, that option will use the same headers present in the input file; otherwise it will print the operations corresponding to each column.

There is a trap here, however. Consider a case where the question asked to datamash seems to involve a single “City” column: While there is only one Rome in Italy, there are more than a dozen places named Rome just in the United States. Imagine that someone recorded every time a certain event, be it the birth of quadruplets or a visit from the US president, took place in those US locations. I can use groupby to ask datamash to tell me how many of these events have happened in each of those places, including when the first one happened, as shown in Listing 3. Listing 3 gives the desired answer thanks to the only substantial difference between this invocation of datamash and the previous one: This time, I told datamash to group and process as one key the combination of two columns.

Another thing to learn from the last two examples is that the groupby operation always prints first the column, or combination of columns, that it used as keys. What if you needed to have those columns in some other position? The answer, as I will show shortly, is to pass the output of datamash to some other tool, such as AWK, sed, or even a second invocation of datamash!

I regularly check some statistics about the posts of my three blogs. The wc command prints counts of lines, words, and characters, in the following order:

#> wc testfile.md
33 407 3608 testfile.md

Explanation of Listing 4
Listing 4 shows the several steps I took to compose the datamash-based command that would do just what I needed. To understand it, please note that I prefixed the shell prompts with numbers in capital letters to make the explanation easier to follow. I also cut the output of each command to just a few hand-picked lines, for readability and brevity.

Listing 4: Print Summary Statistics of Three Blogs
ONE #> find . -type f -name "*.md" | xargs wc
  32 284 2359 ./stop/google-is-microsoft-2.0.md
  68 532 3579 ./stop/spying-is-over.md
  253 4151 27074 ./freesw/nextcloud-16-review.md
  48 411 3184 ./stop/ready-facebook-one.md
  ...
TWO #> find . -type f -name "*.md" | xargs wc \
  | sort -t / -k 2
  68 532 3579 ./stop/spying-is-over.md
  48 411 3184 ./stop/ready-facebook-one.md
  ...
THREE #> find . -type f -name "*.md" | xargs wc \
  | sort -t / -k 2 \
  | tr / " "
  202 1245 8267 . freesw ignore-threads-in-mailing-lists.md
  196 1978 14890 . freesw odf-slideshows-from-plain-text-files.md
  ...
FOUR #> find . -type f -name "*.md" | xargs wc \
  | sort -t / -k 2 \
  | tr / " " \
  | datamash -W groupby 5 mean 1 mean 2 mean 3 min 2 max 2
  ...

ONE: This finds all the Markdown files in the root directory of my blogs and, through the xargs command, passes them all to wc. The output has all the data I need, but it is not sorted by blog name (the freesw entry should be first, not third!). This is the way the find command and Linux filesystems work, but datamash can only group rows presorted by the grouping key. As far as I understand, the sorting that would be needed here is beyond datamash’s capabilities – no problem though.

TWO: I piped the output of the initial command to the sort utility, telling it to sort on the second field (-k 2), with / as field separator. This sorted the posts by blog, as needed, so on to the next problem.

THREE: The find command prints the whole path to a file, but the only part I need datamash to see is the blog name (i.e., freesw, stop, or tips). This is a problem because that part is delimited by slashes, not spaces like the previous columns. Because datamash does not support multiple field delimiters, I converted the slashes to spaces with the tr command. Now all the columns have the same delimiter, and the blog names are always in the fifth column. This is something datamash can handle!

FOUR: I can finally add datamash to the pipe, first setting the column separator to whitespaces (-W), and then asking to group on column 5 (the blog name), in order to first print the mean values of line, word, and character numbers of all the posts of each blog, followed by their minimum and maximum number of words. At this point, the only thing left is to get rid of the decimal digits. What I actually got (even if I left only the first digits in Listing 4) were numbers like 937,91039, which are just confusing. For my purposes, truncating all those numbers to integers would be more than adequate. The problem is, how can I do it if, as explained above, I cannot give the -R option a null value?

FIVE: Here is the solution: Pipe the output of datamash to datamash, telling it to truncate all the numeric fields, which are those in the columns with indexes between 2 and 6!

Quality Control
Remember I said that datamash has not just groupby, but a whole category of “primary operations”? Time to talk about the other four, which add to datamash two different capabilities that I like a lot, the first being a sort of quality control. The check operation generates an error message if the rows of the current file do not have exactly the same number of arguments (Listing 5).

Listing 5: check Operation Error Message
$ cat bad.csv
A 1 ww
B 2 xx
C 3
D 4 zz
$ datamash check < bad.csv
datamash: check failed: line 3 has 2 fields (previous line had 3)
fail

Inside a shell script, you may automate the check and generate more synthetic error messages as follows

datamash check < bad.csv || die "this file has an invalid structure"

because (without going into details) the command after the || operator will only be executed if the datamash check fails. The control can be even more precise, because check accepts two optional arguments (lines and columns) and will fail unless the target file has exactly that number of lines and columns.

Transformations
The last major type of operation that datamash can perform is what I would call the data or table “transformations” provided by the primary functions called reverse, transpose, and crosstab. The first one reverses, unsurprisingly, the positions of all columns (Listing 6). Combined with the cut command, which extracts whatever combination of columns you want, datamash’s reverse operation makes it very easy to rearrange columns in a text file any way you desire. Compared to reverse, transpose somehow does a mirror operation, because it swaps rows with columns (Listing 7). It is possible to reverse or transpose files even if their lines do not have all the same number of columns by adding the --no-strict option. In those cases, you may even fill the empty fields with a string of your choice using --filler="FILLER STRING HERE".
The crosstab operation, which exposes the relationships between two columns, is the datamash version of pivot tables. At first sight, crosstab may seem to be just another way to group multiple columns, because it can count how many rows have the same values in a given pair of columns, as shown in Listing 8.

Listing 8: crosstab Example

$ cat input.txt
a x 3
a y 7
b x 21
a x 40

$ datamash -s crosstab 1,2 < input.txt
    x  y
a   2  1
b   1  N/A

In Listing 8, datamash indeed tells that a and x appear side-by-side two times in the input file. If this were the whole story, crosstab would be just another version of grouping that displays its findings with a matrix instead of a list.

The added value of crosstab is that it can show, using the same format, the result of many other grouping operations, not just the number of times each pair appears. This is evident in these two examples from the datamash manual (Listing 9), where crosstab is used to show first the sums and then the unique values from the third column, for any combination of values from the first two.

Listing 9: crosstab Shows Sums and Values

#> datamash -s crosstab 1,2 sum 3 < input.txt
    x   y
a   43  7
b   21  N/A

#> datamash -s crosstab 1,2 unique 3 < input.txt
    x     y
a   3,40  7
b   21    N/A

Conclusion

Datamash is one of those little-known open source gems that may be a huge time saver for more than a few users. If you have tabular data of any type, try it, alone or as a lightweight but still powerful sidekick of VisiData [2]. You won't regret it!

Info

[1] GNU datamash: www.gnu.org/software/datamash/
[2] "A Command-Line Data Visualization Tool" by Marco Fioretti, Linux Magazine, issue 277, December 2023, pp. 40-45

Author

Marco Fioretti (https://mfioretti.substack.com) is a freelance author, trainer, and researcher based in Rome, Italy, who has been working with free/open source software since 1995, and on open digital standards since 2005. Marco also is a board member of the Free Knowledge Institute (http://freeknowledge.eu).
IN-DEPTH
PyScript
Snake Charmer
PyScript lets you use your favorite Python libraries on client-side web pages. By Pete Metcalfe

While there are some great Python web server frameworks such as Flask, Django, and Bottle, using Python on the server side adds complexity for web developers. To use Python on the web, you also need to support JavaScript on client-side web pages. To address this problem, some Python-to-JavaScript translators, such as JavaScripthon, Js2Py, and Transcrypt, have been developed.

The Brython (which stands for Browser Python) project [1] took the first big step in offering Python as an alternative to JavaScript by offering a Python interpreter written in JavaScript. Brython is a great solution for Python enthusiasts, because it's fast and easy to […]

In this article, I will introduce PyScript with some typical high school or university engineering examples. I will also summarize some of the strengths and weaknesses that I've found while working with PyScript.

Getting Started

PyScript doesn't require any special software on either the server or client; all […]
Photo by Godwin Angeline Benjo on Unsplash
Inspect option and then click on the Console heading. Figure 3 shows a very typical error: a print() function missing a closing […]

[…] print the present time into a PyScript terminal section (Figure 4).

A more Pythonic approach to calling a PyScript function is available with the @when API. The syntax for this is:

<py-script>
[…]

tags (lines 20, 26, and 32). The syntax for the display() function is:

display(*values, target="tag-id", append=True)

The *values can be a Python variable or an object like a Matplotlib figure. The @when function (lines 22 and 28) connects the Back and Forward button clicks to the functions back_year() and forward_year().

PyScript with JavaScript Libraries

In many cases you'll want to use JavaScript libraries along with PyScript. For example, you might want to include JavaScript prompts or alert messages for your page. To access a JavaScript library, add the line:

from js import some_library

Listing 3 shows the code to import the alert and prompt libraries, then prompts the user for their name, and finally displays an alert message with the entered name (Figure 6).

Reading and Plotting a Local CSV File

For a final, more challenging example, I'll use PyScript to read a local CSV file into a pandas dataframe and then use Matplotlib to plot a bar chart (Figure 7). For security reasons, web browsers cannot access local files without the user's authorization. To allow PyScript to access a local file, you need to do three key things. To start, you need to configure a page with an <input type="file"> tag. To call a file-picker dialog with a CSV filter, enter:

<input type="file" id="myfile" name="myfile" accept=".csv">

Listing 1: Button Click Action

<!DOCTYPE html>
<html lang="en">
<head>
  <title>Current Time</title>
  <link rel="stylesheet" href="https://pyscript.net/latest/pyscript.css" />
  <script defer src="https://pyscript.net/latest/pyscript.js"></script>
</head>
<body>
  <h1>Py-click to call a Pyscript Function</h1>
  <!-- add py-click into the button tag -->
  <button py-click="current_time()" id="get-time" class="py-button">Get current time</button>
  <py-script>
import datetime
def current_time():
    print(datetime.datetime.now())
  </py-script>
</body>
</html>

Figure 4: Button click to call a PyScript function.

Listing 3 (fragment; only a few lines survive in this copy)

15 <py-script>
   alert(f"Hi:, {name}!")
   </py-script>
36 </html>
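The @when snippet above is cut off in this copy. A minimal, self-contained sketch of what such a page can look like, based on the 2023 PyScript releases (the button id and handler name are mine, not from the article's listings):

```html
<!-- Sketch only: assumes the "latest" PyScript release used elsewhere
     in this article; the @when decorator replaces the py-click attribute -->
<!DOCTYPE html>
<html lang="en">
<head>
  <link rel="stylesheet" href="https://pyscript.net/latest/pyscript.css" />
  <script defer src="https://pyscript.net/latest/pyscript.js"></script>
</head>
<body>
  <button id="get-time" class="py-button">Get current time</button>
  <py-script>
import datetime
from pyscript import when

@when("click", selector="#get-time")
def current_time(event):
    # Runs whenever the button is clicked
    print(datetime.datetime.now())
  </py-script>
</body>
</html>
```

Compared with py-click, the Python code selects the element itself, so the HTML button needs no extra attribute.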
Next, you must define an event listener to catch a change in the <input> file. For this step, two libraries need to be imported, and an event listener needs to be configured as shown in Listing 4:

# Set the listener to look for a file name change
e = document.getElementById("myfile")
add_event_listener(e, "change", process_file)

This allows the data to be passed into a pandas dataframe (lines 36 and 37). Line 38 outputs the dataframe to a py-terminal element:

print("DataFrame of:", f.name, "\n", df)

Line 47 sends the Matplotlib figure to the page's <div id="lineplot"> element: […]

Finally, you need to import the JavaScript […]
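The body of the process_file handler itself (roughly lines 27-38 of the listing) is missing from this copy. A hedged sketch of such a handler, using the File object's text() method rather than the article's FileReader import, might look like this:

```html
<py-script>
# Sketch only: assumes pd, StringIO, and the <input id="myfile"> element
# from the surviving lines; the article's actual handler may differ.
async def process_file(event):
    for f in event.target.files.to_py():
        data = await f.text()              # read the chosen file
        df = pd.read_csv(StringIO(data))   # pass it into pandas
        print("DataFrame of:", f.name, "\n", df)
</py-script>
```

The handler is declared async so the file read can be awaited without blocking the page.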
[…] include adding options for sorting, grouping, and customized plots. It's important to note that PyScript can also be used to save files to a local machine.

[…] nice that these PyScript pages don't require Python on the client machine.

While working with PyScript, I found two issues. The call-up is very slow (especially compared to Brython pages). In addition, I often got tripped up with Python indentation when I was cutting and pasting code. Overall, however, I was very impressed with PyScript, and I look forward to seeing where the project goes.
Listing (fragment): Input Local CSV File and Create a Bar Chart

11 </py-config>
15 <h1>Pyscript: Input Local CSV File and Create a Bar Chart</h1>
16 <label for="myfile">Select a CSV file to graph:</label>
17 <input type="file" id="myfile" name="myfile" accept=".csv"><br>
18
21 <py-script output="print_output">
22 import pandas as pd
23 import matplotlib.pyplot as plt
24 from io import StringIO
25 import asyncio
26 from js import document, FileReader
28
41 fig, ax = plt.subplots(figsize=(16,4))
42 plt.bar(df.iloc[:,0], df.iloc[:,1])
43 plt.title(f.name)
44 plt.ylabel(df.columns[1])
45 plt.xlabel(df.columns[0])
46 # Write Mathplot figure to div tag
49 # Set the listener to look for a file name change
50 e = document.getElementById("myfile")
51 add_event_listener(e, "change", process_file)
52
53 </py-script>
55 </html>

(The remaining lines of the listing are missing in this copy.)
Scales, Well?

Mike Schilli steps on the scale every week and records his weight fluctuations as a time series. To help monitor his progress, he writes a CGI script in Go that stores the data and draws visually appealing charts. By Mike Schilli

Capturing datapoints, adding them to a time series, and showing values over time graphically is usually the domain of tools like Prometheus. The tool retrieves the status of monitored systems at regular intervals and stores the data as a time series. If outliers occur, the messenger of the gods alerts its human to the fact. Viewing tools such as Grafana display the collected time series in dashboards spread over the last week or year as graphs, if so desired, so that even senior managers can see at a glance what's going on in the trenches.

However, my el cheapo web host won't let me install arbitrary software […] measured values via HTTPS like an API, formats the time series generated from them into an attractive chart, and sends the results back to the browser in PNG format? Let's find out.

Figure 1 shows the graph of a time series that outputs my weight in kilograms over the past few years (possibly embellished for this article) as a chart in the browser after pointing it to the URL on the server. The same CGI script also accepts new incoming data. For example, if my scale shows 82.5 kilograms one day, calling

curl '.../cgi/minipro?add=82.5&apikey=<Key>'

[…]

The CGI protocol is bona fide dinosaur technology from the heady '90s of the last century. At the time, the first dynamic websites came into fashion after users, having acquired a taste for more than static HTML, began to crave customized content.

It's a time I remember very well: I was working at AOL back then, tasked with freshening up AOL's website in San Mateo, California, as a freshly imported engineer from Germany. At the time, we did everything live on a single server without any form of safety net. A CGI script at the top of the portal page displayed the current date. However, this caused the (only!) server to collapse under the load of what was quite a considerable number of users, because of the need to launch a Perl interpreter for every call. I brought the machine back to life with a compiled C program that did the same job but started faster. Later on, persistent environments such as mod_perl came along and made things a thousand times faster.

[…] requests per day, this design is justifiable. In a scripting language such as Python, such a mini project would be implemented in next to no time. But I like the challenge of bundling adding values and displaying the chart into one single static Go binary that has no dependencies. Refreshing various Python libraries every so often by hand with pip3 seems like too much trouble.

Author

Mike Schilli works as a software engineer in the San Francisco Bay Area, California. Each month in his column, which has been running since 1997, he researches practical applications of various programming languages. If you email him at [email protected] he will gladly answer any questions.
Static Forever

Once compiled – even if cross-compiled on another platform – a statically linked Go program will run until the end of time. Even if the web host were to upgrade the Linux distro to a new version with libraries suddenly disappearing as a result, the all-inclusive Go binary will still soldier on.

Figure 1: The author's weight fluctuations over the years.

Compiling and linking the Go code from Listing 1 creates a binary; simply copy this into the web server's cgi/ directory and make it executable. If the web server is configured to call the cgi-test program in case of an incoming request to cgi/cgi-test, it will return the script's output to the requesting web client's browser. Figure 2 shows the results from the point of view of the user submitting the request in Firefox.

Getting Started with CGI

If a web server determines that it needs to respond to a request with an external CGI script based on its configuration, it sets the REQUEST_URI environment variable to the URL of the request, among other things, and calls the associated program or script. The script then retrieves the information required to process the request from its environment variables. In case of a GET request, for example, you only need the URL in REQUEST_URI; its path also includes all the CGI form parameters if present. As a response to the inquiring browser, the script then simply uses print() to write the answer to stdout. The web server picks up the text stream and sends it back to the requesting client.

Listing 1 shows a minimal CGI program in Go. It uses the standard net/http/cgi library, whose Serve() function […]

Listing 1: cgi-test.go (fragment; lines 10, 14, and 15 are missing in this copy)

01 package main
02
03 import (
04   "fmt"
05   "net/http"
06   "net/http/cgi"
07 )
08
09 func main() {
11   qp := r.URL.Query()
12   fmt.Fprintf(w, "Hello\n")
13
16   }
17 }
18
19   cgi.Serve(http.HandlerFunc(handler))
20 }

So far, so good – but how do you actually compile Listing 1? After all, the idea is to create a binary that runs on the web host's Linux distro, which may be incompatible with the build environment because it might be missing some shared libraries present on the web server. Go binaries typically only need an acceptable version of the host system's libc. What to do? Docker to the rescue! My web host uses Ubuntu 18.04, which means that the Dockerfile in Listing 2 sets up a compatible environment with this base image on my build host.

Figure 2: The Go program in Listing 1 as a CGI script.
However, Ubuntu's golang package version is almost always woefully out of date; of course, it's not even remotely usable on the fairly ancient Ubuntu distro running on the web hoster's box. But the Dockerfile can easily work around this; line 7 fetches a tarball with a very recent Go 1.21 release off the web and drops its contents into the root directory of the build environment. Add to that some tools like Git (Go uses Git to fetch GitHub packages) and make for the build, and, presto, you have yourself a Frankenstein distro ready to build a binary for the web host's environment.

Listing 2: Dockerfile (fragment; only lines 1, 2, and 8 survive in this copy)

01 FROM ubuntu:18.04
02 ENV DEBIAN_FRONTEND noninteractive
08 RUN tar -C /usr/local -xzf go1.21.0.linux-amd64.tar.gz

Well Prepared

To compile Go sources, the Go compiler often needs to pull the source code of included packages and compile it before linking the final binary. A Docker image without those dependencies installed will dawdle around in the preparation phase for minutes at a time during each build run. It will repeat the process time and time again for every single minor change to the source code. To speed up this phase, line 11 in Listing 2 copies the Go sources for this project into the Docker image, and go mod tidy in line 12 precompiles everything. When a container based on this image is then launched later, Go only needs to compile the sources locally and link everything together. This literally takes just a few seconds. That's what I call putting the fun back into developing and troubleshooting!

The Makefile in Listing 3 assembles the image under the docker target (starting in line 9) and assigns it the cgi-test tag when you run make docker. To compile the source code, you need to call the remote target (starting in line 5) later. This will start a container with docker run and mount the /build directory inside onto the current directory on the host. This means that the generated binary within the container will be easily accessible from outside later.

Listing 3: Makefile (fragment)

06 docker run -v `pwd`:/build -it $(DOCKER_TAG) \
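Only three lines of Listing 2 survive in this copy. Following the description in the text (Go 1.21 tarball fetched and unpacked, Git and make installed, sources copied, go mod tidy run), a reconstruction might look roughly like this; the exact package list and download URL are my guesses:

```dockerfile
# Sketch reconstructing Listing 2; only the FROM, ENV, and tar lines
# are original, everything else follows the article's prose description.
FROM ubuntu:18.04
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && \
    apt-get install -y git make wget ca-certificates
WORKDIR /build
RUN wget -q https://go.dev/dl/go1.21.0.linux-amd64.tar.gz
RUN tar -C /usr/local -xzf go1.21.0.linux-amd64.tar.gz
ENV PATH="${PATH}:/usr/local/go/bin"
COPY . /build
RUN go mod tidy
```

The COPY and go mod tidy steps are what bake the dependency downloads into the image, so later container runs only compile the local sources.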
Listing 4: minipro.go

01 package main
02
03 import (
04   "fmt"
05   "net/http"
06   "net/http/cgi"
07   "regexp"
08 )
09
10 const CSVFile = "weight.csv"
11 const APIKeyRef = "3669d95841f6d20ff6a5067a2f2919db4fca6e82"
12
13 func main() {
14   handler := func(w http.ResponseWriter, r *http.Request) {
15     qp := r.URL.Query()
16     params := map[string]string{}
17     for key, val := range qp {
18       if len(val) > 0 {
19         params[key] = val[0]
20       }
21     }
22
23     apiKey := params["apikey"]
24     if apiKey != APIKeyRef {
25       fmt.Fprintf(w, "AUTH FAIL\n")
26       return
27     }
28
29     if len(params["chart"]) != 0 {
30       points, err := readFromCSV()
31       if err != nil {
32         panic(err)
33       }
34       chart := mkChart(points)
35       w.Write(chart)
36     } else if len(params["add"]) != 0 {
37       sane, _ := regexp.MatchString(`^[.\d]+$`, params["add"])
38       if !sane {
39         fmt.Fprintf(w, "Invalid\n")
40         return
41       }
42
43       err := addToCSV(params["add"])
44       if err == nil {
45         fmt.Fprintf(w, "OK\n")
46       } else {
47         fmt.Fprintf(w, "NOT OK (%s)\n", err)
48       }
49     }
50   }
51   cgi.Serve(http.HandlerFunc(handler))
52 }
The actual build process is started by the shell command in line 7, which calls go build. If this works without error, a secure shell via scp finds the final binary in the current directory (but outside the container) and copies it onto the target host. Line 4 uses REMOTE_PATH to specify its address.

No Messing Around

But that's enough messing around with our test balloon. The actual CGI program that generates new values for the time series and later displays them graphically goes by the name of minipro and can be found in Listing 4. It uses the add form parameter to accept new weight measurements from the user via the CGI interface and stores these measurements in the weight.csv CSV file on the server with the timestamp for the current time. This is done by the addToCSV() function starting in line 43.

In order to block Internet randos from banging on the interface, the CGI program requires an API key; this string is hard-coded in line 11. The requesting API user attaches the secret to the request as the CGI apikey parameter. The program on the server will only continue processing the request if the key matches the hard-coded value; otherwise, it will stop at line 25.

Because CGI parameters cannot be trusted in general, it makes sense to check their validity with regular expressions. This is why line 37 sniffs out the add parameter to see if the string really looks like a floating-point number (i.e., if it exclusively consists of digits and periods). If so, the sane variable is set to true; if not, line 40 terminates the request and returns an error message.

Nicely Done

To see a chart of the time series of values fed in so far, you just set the CGI chart parameter in the request to an arbitrary value. In response, the section starting in line 29 of Listing 4 uses mkChart() to create a new chart in PNG format (see Listing 6) and calls w.Write() to return the chart's binary data to the requesting browser in line 35. Fortunately, the net/http/cgi library is smart enough to set the introductory HTTP header to Content-Type: image/png when it examines the first few bytes of the stream and finds sequences there that point to a PNG image.

Listing 5 takes care of managing the CSV file. Its content consists of the floating-point values of the weight measurements, each of which is accompanied by a timestamp in epoch format after a comma in each line. Figure 3 shows some of the stored data in the file.

Figure 3: The weight measurements as floating-point values with timestamps.

Guaranteed Write

In Listing 5, the addToCSV() function starting in line 10 has the task of accepting new measurements. It opens the CSV file in O_APPEND mode; this means that the fmt.Fprintf() write function in line 18 will always append new values, with
a current timestamp attached, to the end of the file.

Listing 5: csv.go (fragment; most lines are missing in this copy)

01 package main
02
19     return 0, err
20 }
21
38 }

This approach has a neat side effect. It ensures that, on POSIX-compatible Unix systems, lines no longer than PIPE_BUF (usually 4,096 bytes under Linux) are always written in full, without another process possibly interfering and ruining the line. In the present case, this is not critically important, because there will be hardly any requests anyway, but on a hard working web server where you cannot guarantee atomicity by default, the file would quickly become corrupt, unless you explicitly set a lock.

Conversely, readFromCSV() starting in line 22 reads the lines from the CSV file, and the standard encoding/csv Go library package takes apart the comma-separated entries. At the end, the function returns a two-dimensional array slice of strings with two entries per line, for the value and timestamp.

Spruce It Up with Graphics

The mkChart() function starting in line 10 of Listing 6 fields this matrix of datapoints and generates a graph like the one shown in Figure 1 from the data. The task of converting the timestamps from the Unix format to an easily readable format for the x-axis is handled automatically by the go-chart package from GitHub. Line 5 in Listing 6 fetches the package.

Line 32 creates a structure of the type chart.TimeSeries from the datapoints in the xVals (timestamps) and yVals (weight measurements) array slices. Then, the chart.Chart structure from line 42 illustrates the structure in a chart. The Render() function in line 49 creates the binary data of a PNG file, containing the diagram, both axes, and their legends from this.

To do so, line 48 creates a new write buffer in the variable w. The chart's Render() function writes to the buffer, and Bytes() in line 50 returns its raw bytes to the caller of the function (i.e., the main program) and ultimately the inquiring user's browser.

To assemble the three source files into a static binary, the Makefile in Listing 7 creates a new image with the minipro tag under the docker target using the same Dockerfile I used earlier. Once this is done, make remote first starts the container, mounts its working directory to hold the finished binary later, and then starts the build and link process with go build.

If this works without errors, the secure shell scp copies the binary to the web host's CGI directory, as set in REMOTE_PATH. From there, a browser or curl script can then call its functions via the web server, using add to add new datapoints and then chart to graphically enhance and visualize the existing dataset.

Listing 7: Makefile.build

DOCKER_TAG=minipro
SRCS=minipro.go chart.go csv.go
BIN=minipro
REMOTE_PATH=some.hoster.com/dir/cgi

remote: $(SRCS)
	docker run -v `pwd`:/build -it $(DOCKER_TAG) \
	bash -c "go build $(SRCS)" && \
	scp $(BIN) $(REMOTE_PATH)

docker:
	docker build -t $(DOCKER_TAG) .
Listing 6: chart.go (lines 24-26 and 49-51 are missing in this copy)

01 package main
02
03 import (
04   "bytes"
05   "github.com/wcharczuk/go-chart/v2"
06   "strconv"
07   "time"
08 )
09
10 func mkChart(points [][]string) []byte {
11   xVals := []time.Time{}
12   yVals := []float64{}
13   header := true
14
15   for _, point := range points {
16     if header {
17       header = false
18       continue
19     }
20     val, err := strconv.ParseFloat(point[0], 64)
21     if err != nil {
22       panic(err)
23     }
27     }
28     xVals = append(xVals, time.Unix(added, 0))
29     yVals = append(yVals, val)
30   }
31
32   mainSeries := chart.TimeSeries{
33     Name: "data",
34     Style: chart.Style{
35       StrokeColor: chart.ColorBlue,
36       FillColor:   chart.ColorBlue.WithAlpha(100),
37     },
38     XValues: xVals,
39     YValues: yVals,
40   }
41
42   graph := chart.Chart{
43     Width:  1280,
44     Height: 720,
45     Series: []chart.Series{mainSeries},
46   }
47
48   w := bytes.NewBuffer([]byte{})
Together

Combining your network adapters can speed up network performance – but a little more testing could lead to better choices. By Adam Dix

I recently bought a used HP Z840 workstation to use as a server for a Proxmox [1] virtualization environment. The first virtual machine (VM) I added was an Ubuntu Server 22.04 LTS instance with nothing on it but the Cockpit [2] management tool and the WireGuard [3] VPN solution. I planned to use WireGuard to connect to my home network from anywhere, so that I can back up and retrieve files as needed and manage the other devices in my home lab. WireGuard also gives me the ability to use those sketchy WiFi networks that you find at cafes and in malls with less worry about someone snooping on my traffic.

The Z840 has a total of seven network interface cards (NICs) installed: two on the motherboard and five more on two separate add-in cards. My second server with a backup WireGuard instance has 4 gigabit NICs in total. Figure 1 is a screenshot from NetBox that shows how everything is connected to my two switches and the ISP-supplied router for as much redundancy as I can get from a single home network connection.
Photo by Andrew Moca on Unsplash

Figure 1: Topology of my home network.

The Problem

On my B250m-based server, I had previously used one connection directly to the ISP's router and the other three to the single no-name switch, which is connected to the ISP router from one of its ports. All four of these connections are bonded with the balance-alb mode, as you can see in the netplan config file (Listing 1).

For those who are not familiar with the term, bonding (or teaming) is using multiple NIC interfaces to create one connection. The config file in Listing 1 is all that is needed to create a bond in Ubuntu. Since 2018 in version 18.04, Canonical has included netplan
as the standard utility for configuring networks. Netplan is included in both server and desktop versions, and the nice thing about it is that it only requires editing a single YAML file for your entire configuration. Netplan was designed to be human-readable and easy to use, so (as shown in Listing 1) it makes sense when you look at it and can be directly modified and applied while running.

To change your network configuration, go to /etc/netplan, where you will see any YAML config file for your system. If you are running a typical Ubuntu Server 22.04 install, it will likely be named 00-installer-config.yaml. To change your config, you just need to edit this file using nano (Ubuntu Server) or gnome-text-editor (Ubuntu Desktop), save it, and run sudo netplan apply to apply the changes. If there are errors in your config, netplan will notify you upon running the apply command. Note that you will need to use spaces in this file (not tabs), and you will need to be consistent with the spacing.

In Listing 1, you can see that I have four NICs and all of them are set to false for DHCP4 and DHCP6. This ensures that the bond gets the IP address, not an individual NIC. Under the bonds section, I have made one interface called bond0 using all four NICs. I used a static IP address, and so I kept DHCP set to false for the bond also. Since I configured a static IP address, I also need to define the default gateway under the routes section, and I always define DNS servers as a personal preference, though that part wouldn't be required for this config. The last section is where you define what type of bonding you would like to use, and I always choose to go with balance-alb or adaptive load balancing for transmit and receive, as it fits the homelab use case in my experience very well. See the box entitled "Bonding" for a summary of the available bonding options.

The best schema for bonding in your case might not be the best for me. With that in mind, I would recommend researching your particular use case to see what others have done. For most homelab use where utilization isn't constantly maxed out, I believe you will typically find that balance-alb is the best option.
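Listing 1 itself is not reproduced in this copy. A netplan bond of the kind described – four NICs with DHCP off, a static address on bond0, and balance-alb – might look roughly like this; the interface names, addresses, and gateway are placeholders of mine, not the author's actual values:

```yaml
# /etc/netplan/00-installer-config.yaml (sketch; names and addresses
# are placeholders, not the author's actual configuration)
network:
  version: 2
  ethernets:
    enp1s0: {dhcp4: false, dhcp6: false}
    enp2s0: {dhcp4: false, dhcp6: false}
    enp3s0: {dhcp4: false, dhcp6: false}
    enp4s0: {dhcp4: false, dhcp6: false}
  bonds:
    bond0:
      interfaces: [enp1s0, enp2s0, enp3s0, enp4s0]
      dhcp4: false
      dhcp6: false
      addresses: [192.168.1.10/24]
      routes:
        - to: default
          via: 192.168.1.1
      nameservers:
        addresses: [192.168.1.1]
      parameters:
        mode: balance-alb
```

After editing, sudo netplan apply activates the bond; the physical interfaces keep no addresses of their own.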
[…] were managing anyway. Furthermore, I am either using WireGuard, in which case I am locally connected and the speed from my VPN connection to the VM is local, or else I am using Home Assistant or Paperless from its web interface without having WireGuard running, in which case I don't really care if the VPN is going quickly at that moment or not. If I am at the cafe on my VPN and looking at my camera through Home Assistant, which is probably the worst case scenario for me, then there are enough hops that any speed loss from sharing a bond is negated by the latency of that many hops anyway. With all of this in mind, my best bet was to put as many NICs together as possible in balance-alb mode.

Lastly I would say to homelabbers, you've got to test to find out. With testing, I quickly realized I was leaving performance on the table for no good reason. If I were running services that had lots of traffic or perhaps with a half dozen people using my Plex media server, then reserving a single dedicated NIC for the VPN server would have been beneficial, but for the workload my servers are running, bonding all of the connections gives the best results.

Good luck with your homelab, and definitely check out the tteck GitHub page [4] for more on Proxmox helper scripts.

Info

[1] Proxmox: https://www.proxmox.com/en/
[2] Cockpit: https://cockpit-project.org/
[3] WireGuard: https://www.wireguard.com/
[4] tteck Proxmox GitHub page: https://github.com/tteck/Proxmox

Author

Adam Dix is a mechanical engineer and Linux enthusiast posing as an English teacher after playing around a bit in sales and marketing. You can check out some of his Linux work at the EdUBudgie Linux website (https://www.edubudgie.com).
MakerSpace: RPi Flight Simulator Interface

I2C flight simulator interface on a Raspberry Pi

Flying High

A Raspberry Pi running Linux with a custom I2C card and a small power supply provides an interface for a real-time flight simulator. By Dave Allerton

In a flight simulation, the equations must be solved at a sufficiently fast rate that the motion (or dynamics) of the simulated aircraft appears to be smooth and continuous, with no delays or abrupt changes resulting from the computations [1]. Typically, the real-time software in a flight simulator updates at least 50 times per second. In other words, all the computations must be completed within 20ms, including the inputs from controls, levers, knobs, selectors, and switches, which must be sampled within the 20ms frame.

Data acquisition of analog and digital inputs is potentially slow. In the case of analog inputs, the signals are sampled, converted, and read into a computer as digital values, and a flight simulator might have several hundred inputs. To illustrate the problem, in a flight simulator that acquires data from 32 analog inputs at 50Hz, the overall sampling rate is 1,600 samples per second. Furthermore, the data must be sampled with sufficient resolution (or accuracy), typically 12-16 bits, and any latency resulting from data acquisition by the simulator modules must be minimized. To avoid any delays caused by simulator modules waiting to capture data, a dedicated I/O system can acquire the data and transfer it to the simulator modules over a local network.

Lead Image © innovari, fotolia.com
Requirements

A real-time research flight simulator [2] currently installed at Cranfield University (Cranfield, UK) runs on a local network of eight PCs, with the simulation functions partitioned as shown in Figure 1. The I/O system provides an interface between the simulator and the software modules that comprise: the modeling of the aircraft aerodynamics and the engine dynamics, aircraft systems, flight displays, navigation, avionics, an instructor station, control loading, sound generation, flight data recording, three image generators for a visual system, and an optional connection to Matlab. Data is transmitted over the network as broadcast Ethernet UDP packets.

Previously, the I/O system was based on a PC with a set of industrial I/O cards to acquire digital and analog inputs and generate digital and analog outputs. However, the interface cards and the PC used in this I/O system were obsolete, and the Raspberry Pi (RPi) offered a potential replacement. The RPi has sufficient performance to compute the I/O functions in real time, and much of the existing C code could be reused to run under the RPi's Linux operating system. The RPi Ethernet port provides a UDP connection to the simulator computers.

The overall structure of the I/O system is shown in Figure 2. The simulator outputs are connected to an existing breakout card, which provides interconnections to the simulator and signal conditioning. The analog multiplexer selects one of 32 inputs, where the channel number (0-31) is given by a 5-bit input. The digital multiplexer selects one of four groups of 8 bits, where the channel number (0-3) is given by a 2-bit input. The selected analog channel is sampled by an analog-to-digital (A/D) chip, and the digital inputs are read into an 8-bit parallel buffer. The four analog outputs drive an electrical control loading system, which provides an artificial feel for the control column and rudder pedals. The breakout card and the I/O interface are connected by a 50-way ribbon cable.

Figure 2: Interface system.

The primary requirement was to provide an I/O interface compatible with the RPi, capable of sampling 32 analog inputs and 32 digital inputs at 50Hz and generating four analog outputs and 24 digital outputs, also at 50Hz, where the resolution of the A/D conversion for the flight simulator is 12 bits. Because no commercial I/O cards for the RPi met this specification in terms of the number of channels, resolution, and sampling rate, a custom solution was developed.

The 40 GPIO lines of the RPi include support for I2C transfers. The I2C protocol, originally developed by Philips [3], is an interesting approach to interfacing, requiring only two lines to transfer data between devices connected to an I2C bus: a serial data line (SDA) and a serial clock line (SCL). For the RPi, SDA and SCL are included in the GPIO pinout. I2C chip pinouts provide SDA and SCL, a reference voltage, ground, and control pins. Additionally, some I2C chips include pins to define the device address. The I2C protocol offers two advantages: First, the connection to an RPi only requires a few lines; second, a wide range of integrated circuits (ICs) is available for the majority of I/O functions, typically costing less than $10.

One further attraction of an I2C interface is the simplicity of programming. Most transfers only require output of the device address to select a specific register of an I2C chip and then transfer of data to or from an external device. I2C chips are compliant with the I2C data transfer protocol, so a designer only needs to ensure that the RPi activates the SDA and SCL pins in accordance with the protocol, which is provided in software by an I2C driver. Several I2C libraries are available for the main programming languages, including i2c-tools and wiringpi, simplifying the development of application software for I2C devices. The i2c-dev library is integrated with libc for the RPi and, for programming in C, includes the appropriate header files i2c.h and i2c-dev.h.

A number of manufacturers support I2C for analog and digital data transfers.
The devices used are MCP23008 ICs for digital input, digital output, and multiplexer control (40 bits); an MCP3221 for analog input; and an MCP4728 for analog output. One of the MCP23008 ICs drives eight outputs for an LED display and diagnostics panel, an LM7805 voltage regulator provides a stable 5V reference for the A/D chip, and a Texas Instruments PCA9306 voltage level translator converts the I2C signals (SDA and SCL) between the RPi and the external inputs and outputs operating at 5V.

The I2C device addresses are defined in the application software:

  #include <linux/i2c.h>
  #include <linux/i2c-dev.h>

  #define DIGITAL_OUTPUT1_ADR 0x20
  #define DIGITAL_OUTPUT2_ADR 0x21
  #define DIGITAL_INPUT_ADDR  0x22
  #define MUX_ADR             0x23
  #define LEDS_ADR            0x24
  #define ADC_ADR             0x4d
  #define DAC_ADR             0x60

Two C structures are defined to access the I2C devices, where the fields of the structures are defined in the header file i2c-dev.h:

  struct i2c_rdwr_ioctl_data packets;
  struct i2c_msg messages[1];

An analog channel is selected via the multiplexer and then read from the A/D chip:

  buf[0] = 0;
  buf[1] = 0;  /* set for 8 outputs */

  messages[0].addr  = MUX_ADR;
  messages[0].flags = 0;
  messages[0].len   = 2;
  messages[0].buf   = buf;
  packets.msgs  = messages;
  packets.nmsgs = 1;
  if (ioctl(i2c, I2C_RDWR, &packets) < 0)
    I2Cerror("unable to set the analogue MUX dir reg");

  messages[0].addr  = ADC_ADR;
  messages[0].flags = I2C_M_RD;
  messages[0].len   = 2;
  messages[0].buf   = inbuf;
  packets.msgs  = messages;
  packets.nmsgs = 1;
  if (ioctl(i2c, I2C_RDWR, &packets) < 0)
    I2Cerror("unable to read ADC ch=%d\n", chn);

  AnalogueData[chn] = (((unsigned int) inbuf[0] & 0xf) << 8) + (unsigned int) inbuf[1];

The read addresses the A/D chip with an address ADC_ADR; this ioctl call is repeated for all the devices in use.
The sampled values are stored in the array AnalogueData[] of 32-bit unsigned integers. With the I2C configured for a baud rate of 400Kbits/s, an RPi 3 Model B samples 32 analog inputs in 8.4ms, which is less than half the 20ms frame time.

For the flight simulator, after initialization, the I/O system repeatedly executes a loop that comprises broadcasting a UDP packet containing the sampled data, reading 32 analog inputs, reading 32 digital inputs, writing four analog outputs, writing four digital outputs, responding to UDP packets from the simulator PCs, and updating a small LED display. The interface is scalable and includes expansion for additional digital inputs and outputs. Additionally, the RPi interface provides a timing reference for the simulator, ensuring accurate maintenance of the frame rate.

Board Design

The schematic is shown in Figure 4. The PCB was produced as a four-layer board (120mm x 95mm) with Eagle CAD software (Figure 5). The design illustrates the simplicity of I2C interfacing for the data acquisition application.

Figure 5: I/O system PCB layout.

Observations

I2C is a mature and stable protocol supported by a wide range of integrated circuits in both DIL and surface-mount formats, mostly costing less than $10. The RPi provides an interface for I2C devices, requiring only two lines plus power and ground, so that construction of an interface with breadboard, wire-wrap, or PCB is straightforward. For the flight simulator application, I2C fully meets the requirements in terms of sampling rates, resolution, and data throughput. With the GNU GCC tool chain, programming of the I2C devices was straightforward and required only a few lines of code to access each device.

The RPi provides a dedicated headless I/O system, loading and running automatically after power-up and with diagnostic information on the system status provided by a small LED panel. The interface provides raw I/O data for the simulator modules, enabling any scaling or conversion to be applied in the modules.

A Raspberry Pi running under Linux with an I2C interface and a small power supply replaced a PC with two large industrial I/O boards, reducing both the footprint and the cost of the I/O system for a real-time flight simulator. Much of the existing I/O software was reused, and no changes were required to the simulator software.

Info
[1] Allerton, D. J. Principles of Flight Simulation. John Wiley and Sons, 2009
[2] Allerton, D. J. Flight Simulation Software: Design, Development and Testing. John Wiley and Sons, 2023
[3] I2C-Bus Specification and User Manual, Rev. 7.0. NXP Semiconductors document UM10204, 2012: https://www.nxp.com/docs/en/user-guide/UM10204.pdf
MakerSpace
BCPL for the Raspberry Pi
Before C
The venerable BCPL procedural structured programming
language is fast to compile, is reliable and efficient, offers a
wide range of software libraries and system functions, and
is available on several platforms, including the Raspberry Pi.
By Dave Allerton
In the 1960s, the main high-level programming languages were Fortran, Basic, Algol 60, and COBOL. To optimize code or to provide low-level operations, assembler programming offered the only means to access registers and execute specific machine instructions. BCPL, which was used as a teaching language in many universities, provided a language with a rich syntax, addressed the scoping limitations of the other languages, and had low-level operations such as bit manipulation and computation of variable addresses.

Where BCPL differs from the other languages is that it is typeless; all variables are considered to be a word, typically 16 or 32 bits. Programmers can access individual bits and bytes of a word, perform both arithmetic and logical operations on words, compute the address of a word, or use a word as a pointer to another word. One further novel aspect of BCPL is that the compiler is small and written in BCPL, producing intermediate code for a virtual machine and simplifying the development of the compiler for a wide range of computers. BCPL was used on mainframe computers and minicomputers in the 1970s and microprocessors in the 1980s.

The early developers of Unix were influenced by, and many aspects of C were adopted directly from, BCPL. Although BCPL also supported characters and bytes, the lack of richer types was addressed in C, which became the programming language of choice for Unix (and subsequently Linux), leaving BCPL mostly for academic applications. Several groups developed compilers, operating systems, software utilities, commercial packages, and even flight simulation software in BCPL, but for the most part, BCPL has been forgotten.

The demise of BCPL in both academia and industry is disappointing, particularly because it is a powerful teaching language, introducing students to algorithms, software design, and compiler design. Later, languages such as Pascal and Modula-2 became popular languages to introduce concepts in computer science but have been superseded by Java, Python, and C++. Whereas the learning curve for BCPL is small, enabling students to become productive in a short time, the complexity of languages such as C++ can be a barrier to students learning their first programming language.

The BCPL Language

The example in Listing 1 of a small BCPL program computes factorial values from 1! to 5!. Because C was developed from
BCPL, the syntax of both languages is similar. The include directive in C is a GET directive in BCPL, the assignment operator = in C is := in BCPL, and the fences (curly brackets) { and } are identical. In C the address of a variable a is denoted by &a, whereas in BCPL it is given by @a. Indirection, or the use of pointers, is given by *a in C or !a in BCPL. Arrays are organized so that a!b in BCPL corresponds to a[b] in C.

Listing 1: 1! to 5! in BCPL

  GET "libhdr"

  LET start() = VALOF
  {
    FOR i = 1 TO 5 DO
      writef("fact(%n) = %i4*n", i, fact(i))
    RESULTIS 0
  }

  AND fact(n) = n=0 -> 1, n*fact(n-1)

The GET directive includes the common procedures and definitions needed in the compilation of a program. The procedure start is similar to main in C, where the VALOF keyword denotes that start is a function with the result returned by the RESULTIS keyword. The variable i, a local variable of the procedure start, is implicitly defined at the start of the FOR loop, which is executed five times. The writef function is similar to printf in C. The recursive function fact tests whether n is zero and returns either 1 or n*(n-1)!, where the parameter n is a local variable of the procedure fact.

In BCPL, a variable is defined as a word that can represent an integer, a bit pattern, a character, a pointer to a string of characters, a floating-point number, or an address. A programmer can apply arithmetic operators, logical operators, shift operators, an address operator, or indirection to a variable – the compiler assumes that the programmer knows what they are doing and, subject to syntactic and semantic compilation checks, places very few constraints on programming constructions. Arguably, C and BCPL fall into the category of languages that provide almost unlimited power for a programmer with very few checks on their intention.

Both C and BCPL allow sections of a program to be compiled separately (e.g., to provide a library of functions). Global variables and procedures in BCPL, which are similar to external variables and functions in C, can be accessed by all sections of a program, whereas static variables are only accessible from the section in which they are declared. The other category of variables is local or dynamic variables, which are declared and used in the same way as in C. When a local variable is declared, space is allocated on a stack, which grows and shrinks dynamically, typically on entry to and exit from a procedure, respectively, enabling procedures to be called recursively.

Portability

BCPL was developed by Martin Richards in the Computer Laboratory at the University of Cambridge. His more recent Cintcode implementation is extensive and provides numerous examples of coding, mathematical algorithms, and even operating system functions. The advantages of this implementation are considerable: It is fast to compile, is reliable and efficient, and offers a wide range of software libraries and system functions. It is also available on several platforms, including the PC and the Raspberry Pi. The only drawback is the loss of speed from interpreting the compiled code.

I refer you to Martin Richards's textbook [1] and his website [2], which includes a version of Cintcode that is straightforward to download and implement on an RPi. Also, a guide directed at young people programming a Raspberry Pi [3] provides an extensive description of BCPL and the Cintcode implementation and numerous examples of BCPL programs.

For the programmer intending to write applications in BCPL that exploit the processing power of the ARM cores of a Raspberry Pi, a BCPL compiler generating ARM instructions directly is likely to produce code that runs considerably faster than interpreted code. For other users less concerned with processing speed, the tools and support provided by the Cintcode implementation of BCPL offer a stable and reliable platform.

BCPL for the Raspberry Pi

The arrival of the Raspberry Pi with its ARM cores, network connection, sound and video outputs, USB ports, and I/O interface running under the Linux operating system has encouraged the development of a range of programming languages for this platform. A code generator for BCPL that I developed compiles BCPL directly to ARM machine code, which can be linked with the standard Linux gcc toolset. The compiler (7,000 lines) compiles itself in less than 0.2 seconds on a Raspberry Pi 4B.

This 32-bit implementation of BCPL compiles a BCPL program prog.b to prog.o, where prog.o is a Linux object module linked with two libraries – blib.o and alib.o – by the gcc linker to produce an executable ELF module, prog. The library blib.b is written in BCPL and contains the common BCPL library functions. A small library alib.s is written in Linux assembler and contains low-level functions to access the Linux runtime environment.

Although the gcc linker builds the executable program, the object code produced by the compiler contains only blocks of position-independent code, requiring no relocation. At runtime, alib initializes the BCPL environment, setting up the workspace for the stack and global and static variables. Strictly, gcc is only used to generate a Linux-compatible module that can be loaded, whereas the linking of a BCPL program and libraries is performed by alib.

Notes for Developers

The compiler uses registers r0 to r9 for arithmetic operations, logic operations, and procedure calls. The code generator attempts to optimize the code by keeping variables in registers, minimizing the number of memory accesses. Register rg points to the global vector, and register rp is the BCPL stack pointer or frame pointer. Procedure linkage, procedure arguments, and local variables are allocated space in the current frame. Stack space is claimed on entry to a procedure and released on return from a procedure. The link register lr holds the return address on entry to a procedure and can also be used as a temporary register within a procedure. The system stack pointer sp is not used by the BCPL compiler, so it can be used to push and pop temporary variables. The compiler
uses the BCPL stack for procedure linkage and the storage of local variables. It should be noted that the ARM core is a pipelined processor, and a reference to pc during an instruction implies the address of the current instruction+8 for most instructions. The program counter pc is used in the code generation of relative addresses used for procedure calls and branches and also in switchon expressions in BCPL.

Although Linux libraries are not explicitly linked, the libc library is available to BCPL programs. Fortunately, calls into C code can use the ARM branch-link-and-exchange instruction (blx).

However, C and BCPL have two important differences: (1) BCPL strings are defined by the string size in the first byte followed by the 8-bit characters of the string, whereas strings in C are arrays of 8-bit characters terminated with a zero byte. BCPL strings must be converted to C strings if calling C. (2) Addresses of variables, vectors, and strings in BCPL are word addresses, whereas they are machine addresses in C. Passing an address from BCPL to C requires a logical left shift of two places, and passing an address from C to BCPL requires a logical right shift of two places. Care is needed with strings in C because they are not necessarily aligned on 32-bit word boundaries.

In both C and BCPL, the registers r0-r9 are not preserved across procedure calls. Additionally, the BCPL registers rp, rg, and lr must be preserved:

  push {rg, rp, lr}
  pop {rg, rp, lr}

Table 1: BCPL Registers

  Register  Name  Function
  0         r0    Data register 0
  1         r1    Data register 1
  2         r2    Data register 2
  3         r3    Data register 3
  4         r4    Data register 4
  5         r5    Data register 5
  6         r6    Data register 6
  7         r7    Data register 7
  8         r8    Data register 8
  9         r9    Data register 9
  10        rg    Global vector
  11        rp    BCPL stack
  12        ip    Unused
  13        lr    Link register

The code produced by the code generator for the factorial example is shown in Listing 2 with comments to explain specific instructions. Note that register r0 is reloaded at location 0x38 because it is reached by code from locations 0x34 and 0x74; consequently, the content of register r0 is not assured. Additionally, the reference to the string "fact(%n) = %i4*n" is resolved at locations 0x4c-0x54, where a pc-relative offset is converted to a BCPL word address.

Listing 2: Generated ARM Code

  24: e8a4c800  stmia r4!,{fp,lr,pc}   Standard procedure entry
  28: e884000f  stm   r4,{r0,r1,r2,r3}
  2c: e244b00c  sub   fp,r4,#12
  30: e3a00001  mov   r0,#1            Initial value i
  34: e58b000c  str   r0,[fp,#12]      Save i
  38: e59b000c  ldr   r0,[fp,#12]      Load i
  3c: e28b4024  add   r4,fp,#36        Set new stack frame
  40: eb000017  bl    0xa4             Call f(i)
  44: e1a02000  mov   r2,r0            Arg 3 = f(i)
  48: e59b100c  ldr   r1,[fp,#12]      Arg 2 = i
  4c: e59fe03c  ldr   lr,[pc,#60]      Arg 1 = "fact(%n) = %i4*n"
  50: e08f000e  add   r0,pc,lr         pc offset
  54: e1a00120  lsr   r0,r0,#2         BCPL address
  58: e28b4010  add   r4,fp,#16        Set new stack frame
  5c: e59ae178  ldr   lr,[sl,#376]     Global writef
  60: e12fff3e  blx   lr               Call writef()
  64: e59b000c  ldr   r0,[fp,#12]      Load i
  68: e2800001  add   r0,r0,#1         Increment by 1
  6c: e58b000c  str   r0,[fp,#12]      Store i
  70: e3500005  cmp   r0,#5            Check end of for-loop
  74: daffffef  ble   0x38             Continue for-loop
  78: e3a00000  mov   r0,#0            Return 0
  7c: e89b8800  ldm   fp,{fp,pc}       Standard procedure return
  80: 6361660f  data                   String "fact(%n) = %i4*n"
  84: 6e252874  data
The files are provided in bcpl-distribution-rpi2 [4]. In a terminal shell, enter the commands

  >unzip bcpl-distribution.zip
  >as leader.s -o leader.o
  >as alib.s -o alib.o
  >gcc leader.o bcpl.o blib.o alib.o -o bcpl

to build and test the compiler (> denotes the Linux prompt).

For a first compiler test, compile and run the program fact.b, which prints the factorial numbers from 1! to 5!:

  >./bcpl fact.b -o fact
  >./fact

Further confidence tests rebuild the BCPL compiler bcpl.b with the BCPL compiler and build the library blib.b:

  >./bcpl bcpl.b -o bcpl
  >./bcpl -c blib.b

The BCPL library files and the compiler can then be copied to the appropriate Linux shared directories:

  >sudo mkdir /usr/include/BCPL
  >sudo cp libhdr.h /usr/include/BCPL/
  >sudo cp bcpl /usr/bin/
  >sudo cp leader.o /usr/lib/
  >sudo cp blib.o /usr/lib/
  >sudo cp alib.o /usr/lib/

The remaining BCPL programs can now be compiled and run with the command bcpl rather than ./bcpl. The compiler searches for library files in the working directory before searching the directories /usr/include/BCPL and /usr/lib.

Nostalgia

The influence of BCPL on the development of C and its later variants cannot be overstated. The availability of BCPL for the Raspberry Pi allows old computer science students to dust off copies of their BCPL programs, which should run directly on the Raspberry Pi. BCPL was used extensively in many UK university computer science departments. The portable multi-tasking operating system Tripos was written entirely in BCPL in the Computer Laboratory at the University of Cambridge and used in early versions of the Commodore Amiga, in the automotive industry, and in financial applications. The logic simulator HILO-2 (the forerunner of Verilog) was developed in BCPL. Numerous utilities, including the early word processor roff, were written in BCPL. Before the availability of floating-point hardware, I adapted BCPL compilers for the Motorola 6809 and 68000 processors to use scaled fixed-point arithmetic in real-time flight simulation.

Info
[1] Richards, Martin. BCPL: The Language and its Compiler, revised ed. Cambridge Univ Press, 2009: https://www.amazon.com/BCPL-Language-Compiler-Martin-Richards/dp/0521286816/ref=sr_1_1
[2] Martin Richards: https://www.cl.cam.ac.uk/~mr10/
[3] Richards, M. Young Persons Guide to BCPL Programming on the Raspberry Pi, Part 1. Cambridge (UK): Computer Laboratory, University of Cambridge, revised 23 Oct 2018: https://www.cl.cam.ac.uk/~mr10/bcpl4raspi.pdf
[4] Code for this article: https://linuxnewmedia.thegood.cloud/s/9nFQcFb2p8oRMEJ
MADDOG'S DOGHOUSE

Jon "maddog" Hall is an author, educator, computer scientist, and free software pioneer who has been a passionate advocate for Linux since 1994 when he first met Linus Torvalds and facilitated the port of Linux to a 64-bit system. He serves as president of Linux International®.

Not just the tech

This month I want to write about what makes free software fun for me. BY JON "MADDOG" HALL

Writing software has always been fun for me. It is like a ... It was about a decade ago when I was at CeBIT, at that
Hear Me RAR
The non-free RAR compression tool offers some benefits you
won’t find with ZIP and TAR. BY ALI IMRAN NAGORI
Archiving files is like preserving your digital life. On Ubuntu and Debian-based distributions, you can install the RAR and UnRAR command-line utilities. Furthermore, if you want to make sure you're getting the latest upgrades and maintaining compatibility with proprietary RAR archives, it's best to stick with the official RAR and UnRAR applications. To install these applications, you can use your distribution's package manager.

The rar command provides a set of subcommands, including:

  a    Add files to archive
  c    Add archive comment
  ch   Change archive parameters
  cw   Write archive comment to file
  d    Delete files from archive
  e    Extract files without archived paths
  ...
All right, that's enough of the technical jargon. Let's put RAR into action and see what it can actually do. Take some simple text files, say file1.txt, server.logs, and users.csv, and simply use the rar command with the subcommand a. Next, put the name of the archive you want to create and the files you want to include (Figure 1). For example:

  $ rar a backup.rar file1.txt server.logs users.csv

This will create a neat RAR archive named backup.rar containing file1.txt, server.logs, and users.csv. Interestingly, the -r recursive option lets you add directories whether they include files or not (Figure 2).
Let's Go Extracting

Let's now do some extraction jobs. Extracting RAR files is pretty much the same as creating one. However, there is no vendor lock on the programs that extract RAR files. You can choose from multiple options such as WinZip, WinRAR, 7-Zip, etc. For the time being, let's go with the traditional UnRAR program.

Figure 3: Creating a password-protected RAR archive.

Figure 4: Creating a split archive with RAR.

First things first, you can extract the archive to the same directory it is located in. This will not keep the original directory layout intact (Figure 5). The directory structure will be lost, and all items will be put into the single directory you're in. To accomplish this task, you need to use the e subcommand with rar. Here's how it's used:

  $ unrar e my_secure_archive.rar

Besides copying the files, it extracts subdirectories without actually recreating them. Sometimes, it might hurt you if you can't get the original layout. But no worries, there is a way out to keep the full directory path (Figure 6). Just hit up the option x. It will do the trick for you:

  $ unrar x my_secure_archive.rar

Pretty cool, right? These files get extracted right into your current directory, maintaining their original tree structure intact.

What about unpacking an archive to a preset directory? For this, option -o is at your disposal:

  $ unrar e my_secure_archive.rar -o <some_directory_path>

Extracting Password-Protected RAR Archives

If a RAR file is locked down with a password, you have to make sure to drop that fancy password when you're opening it. The -p option comes in handy here. See Figure 7.

The licensing benefits of WinRAR include:

• You can use WinRAR with any language version.
• One key grants you the liberty to activate RAR on multiple devices, provided it's for noncommercial use.
• You get professional support right from the support staff.

Conclusion

While free options are available, RAR's ease of use and feature set make it a solid choice if you're willing to invest in a license. In conclusion, working with RAR files in Linux is straightforward once you have the RAR and UnRAR utilities installed. Whether you're creating simple archives, adding password protection, or splitting files, RAR offers a range of features that can be valuable for managing your data. Just keep in mind the proprietary licensing when considering its use.

The Author

Ali Imran Nagori is a technical writer and Linux enthusiast who loves to write about Linux system administration and related technologies. He blogs at tecofers.com. You can connect with him on LinkedIn.

Info
[1] RAR for Linux and Mac: https://www.win-rar.com/rar-linux-mac.html?&L=0
[2] RAR manpage: https://manpages.ubuntu.com/manpages/en/man1/rar.1.html
[3] Multi-volume RAR archive: https://www.win-rar.com/split-files-archive.html?&L=0
[4] RAR license: https://www.win-rar.com/winrarlicense.html?&L=0
File manager
Spacedrive
We've looked at many ... After first creating a library –
internetarchive

Rather than being an archive ... the 808 billion pages archived so far in ... of files. To help with this, the Internet Archive publishes its own set of open source command-line tools, internetarchive, installable either through Python's pip or as a directly executable binary. This binary interacts with the Internet Archive's own API, and you can use it to perform
Circuit designer
LibrePCB 1.0.0
Linux and open source excel ... a schematic editor and a board editor. As with
Filesystem navigator
nav
After you've learned the ... several of the same arguments.
File synchronization

Celeste

Whether it's local, LAN, or server-based, storage is now cheaper than ever. But we're also generating more data than ever, and the two seem to cancel each other out. It's tempting to stick with the default media backup services offered by Amazon, Google, and Apple, but that means putting your trust and privacy in their hands. Unless you're a sys admin, there isn't an easy solution to manage this locally. One of the best tools for backup, for example, is rclone. This is a command-line tool that can synchronize one location to another, with support for dozens of different storage locations, from Amazon to WebDAV, with local files, the Internet Archive, SFTP, and Nextcloud in between. But the best thing about rclone is that it's been around long enough to be trusted. If only it wasn't a command-line tool.

Celeste is the answer. It's a beautiful, minimal graphical application that's been developed to synchronize a local location to a remote location and back. The GUI lists servers on the left and files and directories on the right, with a status icon for each location to show which are being updated. It handles the complexity of excluding specific files and dealing with conflicts when something changes. It can do this while connecting to several cloud providers at the same time. The cloud provider list isn't currently as comprehensive as rclone's, but it still includes Dropbox, Google Drive, Nextcloud, Proton Drive, and WebDAV. This power and capability comes from using rclone as the back end, which is a good thing. It means that while Celeste itself remains under heavy development and is still considered an alpha release, its file synchronization and backup can be trusted, at least for collections you're happy to clone to more than one other location.

Celeste has been written in Rust and is proud of how fast it runs, regardless of the desktop environment.

Project Website
https://github.com/hwittenborn/celeste
File encryption
Cryptomator
Making sure your files ... a codebase that's been indepen-
Music workstation
Ardour 8
It's fantastic being able to ... values for each note
Strategy game
Zatikon
Zatikon promises to be ... move, and range attributes,
Swapping Places
Waydroid brings Android apps to the Linux desktop in a simple and effective way.
BY HARALD JELE mulators can be used to run applications In the first step, if not already present, you need
need to restart the Waydroid session (Listing 2, databases/gservices.db "select * from main where name = \"android_
id\";"
lines 3 and 4).
02 [...]
Figure 4: In multi-window mode, Waydroid displays all the Android apps in portrait mode.
QQQ
LINUX
NEWSSTAND
Order online:
https://bit.ly/Linux-Magazine-catalog
Linux Magazine is your guide to the world of Linux. Monthly issues are packed with advanced technical
articles and tutorials you won't find anywhere else. Explore our full catalog of back issues for specific
topics or to complete your collection.
#277/December 2023
Low-Code Tools
Experienced programmers are hard to find. Wouldn’t it be nice if subject matter experts and
occasional coders could create their own applications? The low-code revolution is all about
lowering the bar for programming knowledge. This month we show you some tools that let you
assemble an application using easy graphical building blocks.
On the DVD: MX Linux MX-23_x64 and Kali Linux 2023.3
#276/November 2023
ChatGPT on Linux
Everybody’s talking about ChatGPT, and ChatGPT is talking about everything. Sure you can
access the glib and versatile AI chatbot from a web interface, but think of the possibilities if you
tune in from the Linux command line.
On the DVD: Rocky Linux 9.2 and Debian 12.1
#275/October 2023
Think like an Intruder
The worst case scenario is when the attackers know more than you do about your network. If you
want to stay safe, learn the ways of the enemy. This month we give you a glimpse into the mind
of the attacker, with a close look at privilege escalation, reverse shells, and other intrusion
techniques.
On the DVD: AlmaLinux 8.2 and blendOS
#274/September 2023
The Best of Small Distros
Nowadays, all the attention is on big, enterprise distributions supported by professional
developers at big, enterprise corporations, but small distros are still a thing. If you’re shopping
for a Linux to run on old hardware, if you just want a simpler system that is more responsive
and less cluttered, or if you’re looking for a special Linux tailored for a special purpose, you’re
sure to find inspiration in our look at small and specialty Linux systems.
On the DVD: 10 Small Distro ISOs and 4 Small Distro Virtual Appliances
#273/August 2023
Podcasting
On the Internet, you don’t have to wait for permission to speak to the world. Podcasting lets you
connect with your audience no matter where they are. Whether you're in it to build community,
raise awareness about your skills, or just have some fun, the tools of the Linux environment
make it easy to take your first steps.
On the DVD: Linux Mint 21.1 Cinnamon and openSUSE Leap 15.5
#272/July 2023
Open Data
As long as governments have kept data, there have been people who have wanted to see it and
people who have wanted to control it. A new generation of tools, policies, and advocates seeks
to keep the data free, available, and in accessible formats. This month we bring you snapshots
from the quest for open data.
On the DVD: Xubuntu 23.04 Desktop and Fedora 38 Workstation
FEATURED EVENTS
Users, developers, and vendors meet at Linux events around the world.
We at Linux Magazine are proud to sponsor the Featured Events shown here.
For other events near you, check our extensive events calendar online at
https://www.linux-magazine.com/events.
If you know of another Linux event you would like us to add to our calendar,
please send a message with all the details to [email protected].
Events
FOSDEM Feb 3-4 Brussels, Belgium https://fosdem.org/
Contact Info

Editor in Chief
Joe Casad, [email protected]
Copy Editors
Amy Pettle, Aubrey Vaughn
News Editors
Jack Wallen, Amber Ankerholz
Editor Emerita Nomadica
Rita L Sooby
Managing Editor
Lori White
Localization & Translation
Ian Travis
Layout
Dena Friesen, Lori White
Cover Design
Dena Friesen
Cover Images
© Rewat Phungsamrong, 123RF.com and Lexey111, fotolia.com
Advertising
Brian Osborn, [email protected]
phone +49 8093 7679420
Marketing Communications
Gwen Clark, [email protected]
Linux New Media USA, LLC
4840 Bob Billings Parkway, Ste 104
Lawrence, KS 66049 USA
Publisher
Brian Osborn
Customer Service / Subscription
For USA and Canada:
Email: [email protected]
Phone: 1-866-247-2802
(Toll Free from the US and Canada)
For all other countries:
Email: [email protected]
www.linux-magazine.com

While every care has been taken in the content of the magazine, the publishers cannot be held responsible for the accuracy of the information contained within it or any consequences arising from the use of it. The use of the disc provided with the magazine or any material provided on it is at your own risk.

Copyright and Trademarks © 2023 Linux New Media USA, LLC. No material may be reproduced in any form whatsoever in whole or in part without the written permission of the publishers. It is assumed that all correspondence sent, for example, letters, email, faxes, photographs, articles, drawings, are supplied for publication or license to third parties on a non-exclusive worldwide basis by Linux New Media USA, LLC, unless otherwise stated in writing.

Linux is a trademark of Linus Torvalds.

All brand or product names are trademarks of their respective owners. Contact us if we haven’t credited your copyright; we will always correct any oversight.

Printed in Nuremberg, Germany by Kolibri Druck.

Distributed by Seymour Distribution Ltd, United Kingdom.

Represented in Europe and other territories by: Sparkhaus Media GmbH, Bialasstr. 1a, 85625 Glonn, Germany.

Linux Magazine (Print ISSN: 1471-5678, Online ISSN: 2833-3950, USPS No: 347-942) is published monthly by Linux New Media USA, LLC, and distributed in the USA by Asendia USA, 701 Ashland Ave, Folcroft PA. Application to Mail at Periodicals Postage Prices is pending at Philadelphia, PA and additional mailing offices. POSTMASTER: send address changes to Linux Magazine, 4840 Bob Billings Parkway, Ste 104, Lawrence, KS 66049, USA.

WRITE FOR US

Linux Magazine is looking for authors to write articles on Linux and the tools of the Linux environment. We like articles on useful solutions that solve practical problems. The topic could be a desktop tool, a command-line utility, a network monitoring application, a homegrown script, or anything else with the potential to save a Linux user trouble and time.

Our goal is to tell our readers stories they haven’t already heard, so we’re especially interested in original fixes and hacks, new tools, and useful applications that our readers might not know about. We also love articles on advanced uses for tools our readers do know about – stories that take a traditional application and put it to work in a novel or creative way.

We are currently seeking articles on the following topics for upcoming cover themes:
• Open hardware
• Linux boot tricks
• Best browser extensions

Let us know if you have ideas for articles on these themes, but keep in mind that our interests extend through the full range of Linux technical topics, including:
• Security
• Advanced Linux tuning and configuration
• Internet of Things
• Networking
• Scripting
• Artificial intelligence
• Open protocols and open standards

If you have a worthy topic that isn’t on this list, try us out – we might be interested!

Please don’t send us articles about products made by a company you work for, unless it is an open source tool that is freely available to everyone. Don’t send us webzine-style “Top 10 Tips” articles or other superficial treatments that leave all the work to the reader. We like complete solutions, with examples and lots of details. Go deep, not wide.

Describe your idea in 1-2 paragraphs and send it to: [email protected]. Please indicate in the subject line that your message is an article proposal.

Authors

Tom Alby 22
Dave Allerton 69, 74
Chris Binnie 38
Zack Brown 12
Rene Brunner 26
Bruce Byfield 6, 32, 46
Joe Casad 3
Mark Crutch 79
Adam Dix 65
Christian Dreihsig 16
Marco Fioretti 48
Jon “maddog” Hall 80
Sebastian Hilgenhof 16
Dr. Harald Jele 90
Vincent Mealing 79
Pete Metcalfe 54
Steffen Möller 16
Graham Morrison 84
Ali Imran Nagori 81
Amy Pettle 36
Mike Schilli 60
Jack Wallen 8
Malte Willert 16
Intrusion
Detection
If intruders were on your network, would you
know it? Next month we show you how to
build an intrusion detection appliance using
a Raspberry Pi and the Suricata IDS tool.
Preview Newsletter
The Linux Magazine Preview is a monthly email
newsletter that gives you a sneak peek at the next
issue, including links to articles posted online.