I learned to program on a calculator
In middle school, we were required to buy TI-83 calculators for math class. Somewhere along the way, I discovered that they could be programmed.
The language was TI-BASIC, an adaptation of BASIC that even 25 years ago was considered a fossil. You had a small screen, a keypad that fought you at every step, and a very constrained model of computation. There was no filesystem. No processes. No operating system in any meaningful sense.
There was memory, though, all 24KB of it!
Specifically, there was a predefined set of named locations: A through Z, plus θ. If you wanted to store data, you put it in one of those using the strangely esoteric STO button (short for “store”), which represented itself with a → symbol.
The Disp function was my favorite. You could calculate something, call Disp, see the result, and the program would exit with a resounding Done. This was especially useful for having endless scrolling walls of “Charlie is cool” on tap, a prized asset in 7th grade Algebra!
That was my first mental model of data. Data lived somewhere simple. You touched it. You saw it. And touching it was the point; it was like Play-Doh. Nothing about that environment suggested that how you accessed data mattered. A read was a read. A write was a write. The rest was just syntax.
That model stuck with me longer than I realized.
Touching the data felt like the work
As I moved into more serious programming, the surface area grew, but the intuition stayed the same. The computer read bytes, parsed them, transformed them, then wrote results somewhere else. If something was slow, I assumed I was doing too much work with the data. Perhaps the algorithms were inefficient, or maybe there were extra copies I could optimize away if I were more deliberate. Performance problems felt like local problems.
Then I learned about pointers. The first time you really get references, it feels like you’ve discovered a trap door under the language. Before that, everything is values. You pass things around, they get copied, you mutate something and it changes here but not there. It’s all concrete, you can almost see it. Pointers break that in the best possible way.
Suddenly you can hold a thing that points to another thing. You can have two names for the same underlying data. You can pass something “by reference” and mutate it from somewhere else. You can build linked structures and avoid copies. You can make APIs that feel fast and elegant. It was the first time I felt like I had challenged the calculator model and won.
Oh. The data doesn’t have to move. The name can move.
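In Go terms, that realization looks something like this minimal sketch: a value assignment copies, while a pointer gives two names to the same underlying data.

```go
package main

import "fmt"

func main() {
	// Values: assignment copies. Mutating the copy leaves the original alone.
	a := [3]int{1, 2, 3}
	b := a
	b[0] = 99
	fmt.Println(a[0]) // 1: b is a separate copy

	// References: p is a second name for the same underlying data.
	p := &a
	p[0] = 99         // indexing through the pointer mutates a itself
	fmt.Println(a[0]) // 99
}
```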
And for a while, that felt like the whole secret.
I remember having that very specific kind of early confidence: not the bootcamp kind where you’re excited to learn, but the kind where you think you’ve finally uncovered the real rules, the hidden layer. The part that separates people who can code from people who understand what’s happening. The proverbial backroom in the noir film, populated with smoky silhouettes around a poker table, the secret club.
And it did open a door, it just wasn’t the last door. Pointers taught me that “touching the data” wasn’t always necessary, but they also quietly reinforced the next assumption: that the important boundary was inside the language.
A read is not just a read
In the calculator world, a read was a read. There was no operating system. No protection boundaries. No distinction between where data lived and who was allowed to touch it.
Pointers were my first hint that the location of data mattered.
But on a real machine, location isn’t just a matter of heap vs stack or “did this allocate.” It’s a matter of privilege boundaries, address spaces, and what the kernel has to do to let you see anything at all.
- A read from a socket is not the same thing as a read from a file.
- A read that copies data into your address space is not the same thing as a read that hands ownership elsewhere.
- A read that wakes the kernel, switches privilege levels, and disrupts the CPU’s execution context is not interchangeable with one that doesn’t.
They all look like “reading bytes” in code, but they are not equivalent events in the system.
Once you see that, it becomes hard to unsee. And it forces a much less comfortable question: What if the slow part isn’t what I’m doing with the data… but the fact that I insist on touching it at all?
Pointers taught me that data didn’t always have to move. The lesson that was less clear to me at the time was that seeing data has a cost even when it stays where it is. That distinction didn’t click for a long time, mostly because nothing in everyday programming forces you to confront it directly.
The machine does not experience it that way.
On a real system, the act of reading data often begins with an interruption, not a metaphorical one, a literal one. When user code asks the operating system to do something on its behalf, the CPU has to stop what it is doing, save its current state, and cross a protection boundary. Registers are written out, execution context is swapped, and privilege levels change. The instruction pipeline is disrupted so completely that the processor has to rebuild its momentum from scratch. All of this happens before a single byte of your payload is touched.
If the data lives somewhere the kernel controls, like a socket buffer, the page cache, a device, the CPU cannot simply reach out and grab it. It has to ask. The hidden part is that asking is expensive in ways that don’t show up as obvious bugs or incorrect results. That is the cost of crossing the boundary.
The processor loses its bearings.
There is a second cost that shows up once the data actually flows through your program. Modern CPUs rely heavily on locality; they keep your program’s hot instructions and working data close at hand, in small, extremely fast caches. Those caches assume that execution stays relatively stable, that the same code paths and memory regions will be touched repeatedly. When large volumes of payload data stream through user space, they don’t politely wait their turn. They evict.
The loops, counters, and state your program depends on get pushed out to make room for data you may only touch once. Memory that was quick to reach suddenly requires extra work. The system still works, but it does so with more hesitation.
None of this means you should avoid system calls; that would be absurd. It simply means that awareness itself carries a tax. Every time your program insists on being involved in the movement of data, it asks the machine to stop, switch gears, and pay that tax again. That tax is not just time; it is contention on one of the most constrained resources in the system.
Zero-copy as restraint
The first time I encountered zero-copy techniques, they were presented as optimizations with promises of faster networking, higher throughput and fewer allocations. All of that is true, but it undersells what is actually happening. Zero-copy is not about cleverness. It is about restraint.
Instead of pulling data into your own address space so you can look at it, you tell the operating system what should be connected to what, and then you step aside. Data moves without ever becoming yours. This is a fundamentally different posture. You are no longer transforming data. You are delegating its movement.
A baseline: the obvious loop
Most of us start with some version of this. It’s not wrong. It’s just involved.
// io.Copy is the canonical version of this.
// The structure matters: read into user memory, then write out.
func proxyCopy(dst io.Writer, src io.Reader) error {
    buf := make([]byte, 32*1024)
    for {
        n, rerr := src.Read(buf)
        if n > 0 {
            if _, werr := dst.Write(buf[:n]); werr != nil {
                return werr
            }
        }
        if rerr != nil {
            if errors.Is(rerr, io.EOF) {
                return nil
            }
            return rerr
        }
    }
}
That buffer lives in your address space. Every chunk of data crosses into user space, touches your caches, and becomes your problem. Sometimes that’s exactly what you want because you need to inspect, redact, compress, encrypt, validate, or log.
And sometimes… you don’t.
sendfile: file → socket without copying into user space
If your source is a file and your sink is a socket, Linux can do the handoff directly:
// sendFile delegates the copy from file -> socket to the kernel.
// It avoids pulling the payload into user space.
func sendFile(outFd int, inFile *os.File) (int64, error) {
    // Note: sendfile arguments are (out, in, offset, count).
    // Using nil offset means the file offset is advanced.
    var sent int64
    for {
        n, err := unix.Sendfile(outFd, int(inFile.Fd()), nil, 1<<20) // 1MiB chunks
        if n > 0 {
            sent += int64(n)
        }
        if err == nil {
            // EOF on the input file shows up as n == 0 with err == nil.
            if n == 0 {
                return sent, nil
            }
            continue
        }
        if errors.Is(err, unix.EINTR) {
            continue
        }
        if errors.Is(err, unix.EAGAIN) {
            // Non-blocking socket: caller should wait for writable and retry.
            return sent, err
        }
        return sent, err
    }
}
This is one of those moments where code looks almost boring, and that’s the point. You’re not “processing” bytes anymore. You’re specifying a path.
splice: plumbing between file descriptors
splice is more general. It uses a pipe as an in-kernel handoff point. That pipe is not a byte buffer in the usual sense; it is a circular buffer of page references.
The kernel is not copying your payload into the pipe. It is just keeping track of where that data already lives in memory. The pipe becomes a queue of references to those memory chunks, so the data doesn’t move, only the instructions for how to access it do.
The key detail is that the pipe is not a “user buffer.” It’s a kernel mechanism for moving references to pages around. Very similar to the Go adage: “Do not communicate by sharing memory; instead, share memory by communicating.”
// spliceLoop moves data from srcFd -> pipe -> dstFd without copying into user space.
// This is Linux-specific and intentionally low-level.
func spliceLoop(dstFd, srcFd int) error {
    // Create an in-kernel pipe.
    var p [2]int
    if err := unix.Pipe2(p[:], unix.O_CLOEXEC); err != nil {
        return err
    }
    defer unix.Close(p[0])
    defer unix.Close(p[1])

    const chunk = 1 << 20 // 1MiB
    for {
        // 1) splice src -> pipe
        // SPLICE_F_MOVE: a hint to the kernel that we want to move pages, not copy them.
        n, err := unix.Splice(srcFd, nil, p[1], nil, chunk, unix.SPLICE_F_MOVE)
        if n > 0 {
            // 2) splice pipe -> dst
            left := n
            for left > 0 {
                m, werr := unix.Splice(p[0], nil, dstFd, nil, int(left), unix.SPLICE_F_MOVE)
                if m > 0 {
                    left -= m
                    continue
                }
                if werr != nil {
                    if errors.Is(werr, unix.EINTR) {
                        continue
                    }
                    return werr
                }
            }
        }
        if err == nil {
            // EOF: splice reports end of input as n == 0 with err == nil.
            if n == 0 {
                return nil
            }
            continue
        }
        if errors.Is(err, unix.EINTR) {
            continue
        }
        if errors.Is(err, unix.EAGAIN) {
            // Caller should poll and retry.
            return err
        }
        return err
    }
}
But the real cost shows up elsewhere: when you choose not to look at the data, you give something up.
The price of blindness
Delegation comes with blindness. Once data moves through the system without entering your address space, you cannot see it. You cannot log it. You cannot redact it. You cannot enforce policies that depend on inspecting payloads.
In regulated or security-sensitive environments, this matters immediately. Masking, auditing, validation: all of these require awareness. You cannot partially inspect a stream you never touch. This is why zero-copy is not a universal best practice. It is incompatible with entire classes of logic. In essence, the implementation decision is not “fast or slow,” it is “control or momentum.”
In practice, most systems end up with a mix. Some paths are designed for inspection and correctness. Others are designed for throughput and minimal interference. Knowing which is which is an architectural decision, not an optimization.
Doing the same thing to your own heap
At some point, the same idea shows up closer to home. Even when data lives entirely in user space, touching it repeatedly has a cost. Allocation creates work for the garbage collector, poor locality causes cores to fight over cache lines, and memory that could have stayed warm gets churned out and pulled back in.
Pooling memory changes the relationship. Instead of constantly allocating and discarding buffers, you keep a small working set alive. Memory stays mapped, cache lines stay relevant, the processor sees familiar patterns instead of a stream of novelties.
A practical pool: keeping a hot set of buffers
var bufPool = sync.Pool{
    New: func() any {
        // Pick a size that matches your hot path.
        b := make([]byte, 32*1024)
        return &b
    },
}

func pooledCopy(dst io.Writer, src io.Reader) error {
    bp := bufPool.Get().(*[]byte)
    buf := *bp
    defer func() {
        // Optional: zero out if buffers can hold sensitive data.
        // for i := range buf { buf[i] = 0 }
        bufPool.Put(bp)
    }()
    for {
        n, rerr := src.Read(buf)
        if n > 0 {
            if _, werr := dst.Write(buf[:n]); werr != nil {
                return werr
            }
        }
        if rerr != nil {
            if errors.Is(rerr, io.EOF) {
                return nil
            }
            return rerr
        }
    }
}
This is not glamorous, and it’s not magic (even if it feels like it). It’s just a way of saying: these allocations are not part of my product.
The runtime is free to drop pooled objects under memory pressure (sync.Pool is a cache, not a contract), but under steady throughput it tends to keep a warm working set around. That warm working set is the point.

Under the hood, Go goes to surprising lengths to make this safe and fast. Pools are sharded so that cores mostly interact with their own local state. Just as importantly, the runtime uses careful cache alignment and padding so that different processors are not fighting over the same 64-byte cache line. This avoids false sharing, a pathological case where two cores modify unrelated variables that happen to live on the same cache line, forcing constant invalidation traffic even though the data itself is independent. The effect is the same as zero-copy, just applied inward instead of outward.
Philosophically, you are not trying to be clever with memory; you are trying to avoid disturbing it unnecessarily.
When delegation collides with lifecycle
There is a moment where this approach stops feeling elegant and starts feeling dangerous. Delegated work continues even when your code is not running. If a goroutine blocks while the kernel moves data on its behalf, your program may be alive without being aware. Signals can arrive. Shutdown can begin. From the outside, the process looks healthy. Inside, nothing is observing intent. This is where lifecycle discipline matters.
If you want to shut down cleanly, you need a way to periodically regain control: time bounds, deadlines, and checkpoints where execution returns to user space and intent can be evaluated.
The problem: blocked in the kernel
In Go, graceful shutdown is often modeled as a context.Context that gets cancelled on SIGTERM. That works beautifully as long as your goroutines are actually running.
ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
defer stop()

// Somewhere else:
select {
case <-ctx.Done():
    // begin shutdown
}
But if your hot path is sitting inside a syscall like read, splice, or sendfile (anything that can block), you don’t get to run that select until the kernel returns. So the question becomes: how do you force the kernel to return often enough that “intent” can be observed?
Deadlines: turning a blind loop into periodic checkpoints
For network connections, Go gives you a very practical lever: deadlines. A deadline is a time budget you hand to the kernel. If the call would block longer than that budget, it returns control to you. That means your program periodically wakes up, checks whether shutdown has been requested, and either continues or exits.
// readLoopWithDeadline periodically returns to user space so ctx cancellation is observable.
func readLoopWithDeadline(ctx context.Context, c net.Conn) error {
    // Reuse one buffer across iterations instead of allocating per read.
    buf := make([]byte, 4096)
    for {
        // Force a return every few seconds even if no data arrives.
        _ = c.SetReadDeadline(time.Now().Add(2 * time.Second))
        _, err := c.Read(buf)
        if err != nil {
            // Timeout is expected: it is our checkpoint.
            var ne net.Error
            if errors.As(err, &ne) && ne.Timeout() {
                select {
                case <-ctx.Done():
                    return ctx.Err()
                default:
                    continue
                }
            }
            return err
        }
        // ... process bytes or forward them ...
    }
}
There’s a larger lesson here: deadlines are a control plane. They are how you reassert intent in a system that is otherwise moving forward without you.
Draining and exit discipline
If you use a pipe-based zero-copy path (like splice), shutdown has one more sharp edge. Data can be “in flight” inside the kernel, sitting in a pipe buffer that your code never sees, right when you decide to exit.
A clean shutdown needs a final discipline:
- stop producers
- stop accepting new work
- drain what’s already in the plumbing
- then exit
It’s the same principle as graceful HTTP shutdown. You don’t stop the world in the middle of a request. Zero-copy just makes it easier to forget that requests can exist in places you don’t have visibility. Delegation does not absolve responsibility. It shifts where responsibility must be reasserted.
Closing perspective
Learning not to touch data was not about becoming faster, it was about learning when involvement adds value, and when it only adds friction. Early on, touching the data was the work. Then references taught me that names could move instead of bytes. Later still, systems work taught me that sometimes even awareness is too expensive. The fastest systems I’ve worked on were not the ones with the most insight, instead they were the ones that knew when insight was unnecessary. That kind of restraint is not obvious, and it is not taught early. It comes from watching systems behave under load, under shutdown, and under failure, and noticing that the quietest paths are often the most reliable.
The fastest way to process data is to never look at it.
The hard part is knowing when you still need to.