command center
The following are recent articles syndicated from command center

LJ.Rossia.org makes no claim to the content supplied through this journal account. Articles are retrieved via a public feed supplied by the site for this purpose.

Thursday, January 24th, 2019
5:27 am
Notes from a 1984 trip to Xerox PARC
Someone at work asked about this trip report I wrote long ago. I had not realized it was ever seen outside Bell Labs Research, but I now know it was. It's a bit of a time capsule now, and I was able to resurrect it. Importing it into Blogger was a bit of an ordeal that informs some of the points raised here, but other than that I make no comment on the contents beyond apologizing for the tone.


This Dorado is MINE, ALL MINE!

Rob Pike
April 29, 1984

A collection of impressions after doing a week’s work (rather than demo reception) at PARC. Enough is known about the good stuff at PARC that I will concentrate on the bad stuff. The following may therefore leave too negative an impression, and I apologize. Nonetheless...

Dorados work, and although most people at PARC now have their personal machine, it is hard to find a spare one for a visitor (e.g. me), and the allocation method is interesting. There are rooms full of Dorados in racks, and each office has a terminal in it. To connect a machine to a terminal, you make a connection at the patch panel. There are just enough Dorados to go around, which makes it hard (but not impossible) to find a spare when one is needed. What seemed to be in shorter supply was the terminals. There is a terminal in every office, but nothing like a terminal room. The presence of a visitor made it necessary to go to the patch panel a couple of times a day (with muffled annoyance) to connect the temporarily vacant office to the vacant machine. It really seems to me that it would make more sense and involve less mechanism to get a couple more machines and terminals, put them in a public place, and have the plug board there for emergencies like machine failure. Personal or not, it’s too big a deal to get a machine.

The idea of personal computers is an abreaction to time-sharing, with the unfortunate consequence that personal computers (to be specific, D machines) look like personal time-sharing computers, albeit without time-sharing software, an issue I’ll return to. The major issue is disks. Beside the plug board is a white board divided into a grid of time-of-day cross machine-name. The machines have names because they have disks; the reason you need the name is to get to a file you left there last time. Unfortunately, Dorados need disks to run, because they must behave like Altos to run the crufty Alto software that performs vital services such as file transfer. The machine bootstraps by reading disk, not Ethernet. If this were not the case, machine allocation could be done much more simply: you sit down at the (slightly smarter) terminal you want to use, it finds a free Dorado, you log in to that and (if you care) connect, with the Ethernet, to the disk you want.

The machines run one program at a time, and that seems fundamental. When I had finished preparing a figure with Draw, I had to exit Draw (‘‘Do you really want to quit? [Yes or No]’’ No. (I should write the file.) ‘‘Name of file to write?’’ foo.draw ‘‘Overwrite the foo.draw that’s already there? [Yes or No]’’ Yes (dammit!). ‘‘Do you really want to quit? [Yes or No]’’ YES!!!!!!) to run the printing program. The printing program then grabs the Dorado (running as an Alto) and holds it until the file is printed, which can take quite a while if someone’s using the printer already. Once that’s accomplished, Draw can be started again, but it won’t even accept a file name argument to start with. All the old state is lost.

As another example, if there is a file on your local disk that I want to copy to my local disk (the standard way to look at a remote file, it seems), I must find you in your office and ask you to stop whatever you are doing, save the state of your subsystem (Smalltalk, Cedar, etc.) and return to the Alto operating system. Then, you run ftp on your machine, then go get a cup of coffee while I go to my machine and pull the file across. On a related note, I spent an hour or so one morning looking for a working version of a program, which eventually turned up on someone’s Alto. Copies of the file didn’t work on my machine, though, so whenever we needed to run it we had to find the person and so on... Given the level of interaction of the subsystems, the way PARC deals with files and disks is striking.

Files are almost outside the domain of understanding at PARC. In fact, the whole world outside your machine, Ethernet notwithstanding, is a forbidding world. I really got the feeling that their systems are so big because they must provide everything, and they must provide everything because nobody out there provides anything, or at least it's so hard to get to anyone helpful it's not worth it. Smalltalk files are awkward, but you can just (essentially) map the information into your address space and hold it there, whereupon it becomes remarkably easy to use. These subsystems are all like isolated controlled environments, little heated glass domes connected by long, cold crawlways, waiting out the long antarctic winter. And each dome is a model Earth.



Using Smalltalk for a few days was interesting. The system is interactive to a fault; for example, windows pop up demanding confirmation of action and flash incessantly if you try to avoid them. But overall, the feel of the system is remarkable. It is hard to describe how programs (if that is the right word) form. The method-at-a-time discipline feels more like sculpting than authoring. The window-based browsers are central and indispensable, since all that happens is massaging and creating data structures. A conventional text editor wouldn’t fit in. I was surprised, though, at how often the window methodology seemed to break down. When an incorrect method is ‘‘accepted,’’ messages from the compiler are folded into the text as selected text, often at the wrong place. I found this behavior irritating, and would prefer a separate window to hold the messages. The message in the code seemed to make it harder to find the problem by cluttering the text.

A more interesting example occurred when I was half-way through writing a method and found I needed information about another method. The browser I was in was primed with the information I needed, but I couldn’t get at it, because the browser itself does not have windows. I could not leave the method I was working on because it was syntactically incorrect, which meant I had to ‘‘spawn’’ a new browser. The new browser remembered much of the state of the original, but it felt awkward to step away from a window that had beneath it all the information I needed. A browser can only look at one method at a time, which is annoying when writing programs, but if it had windows itself, this restriction could disappear.

A similar problem is that, when writing a method, if a new method is introduced (that is, invented while writing the caller), you must keep the missing method name in your head; the window that tells you about it when you accept the method disappears when you say it’s O.K. The complexity of managing the browser puts an undue strain on the programmer; I had an urge to make notes to myself on a scrap of paper.

I missed grep. It's not strictly necessary, because the browser provides much of the same functionality, but in a different way. To look for something, you look up its method; if you can't remember the name, you may have trouble. Experienced users clearly had no difficulty finding things, but I felt lost without automatic tools for searching.

Believe it or not, the response sometimes felt slow. I had to slow down frequently to let the event-fielding software catch up with me. For example, when making a rectangle for a new window, I could lose as much as two inches off the upper left corner because of the delay between pushing a button and having it detected. I therefore compensated, watching the cursor subconsciously to see that it was with me. The obvious analogy is locking the keyboard.

The language is strange but powerful; I won’t attempt a full critique here, but just comment on some peculiarities. The way defaults work in methods (isomorphic to procedures) is interesting, but necessary because the generality of the design leads to argument-intensive methods. There are too many big words, which stems partly from the lack of a grep (making it necessary to have overly descriptive names) and partly from a desire to have the language look a little like English. For example,

    2 raisedToInteger: 3

is perfectly readable, but chatty at best. Because all messages must be sent to a left operand, unary minus didn’t exist. I complained and it appeared on Wednesday. The type structure is somewhat polymorphic, but breaks down. For example, rectangles cannot have both scalars and points added to them, because the message is resolved by the left operand of the message, not the left and right.

One philosophical point is that each window is of a type: a browser, a debugger, a workspace, a bit editor. Each window is created by a special action that determines its type. This is in contrast to our style of windows, which creates a general space in which anything may be run. With our windows, it is easy to get to the outside world; the notion of Unix appears only when a Unix thing happens in the window. By contrast, Smalltalk windows are always related to Smalltalk. This is hard to explain (it is, as I said, philosophical), but a trite explanation might be that you look outwards through our windows but inwards through theirs.

A few years ago, PARC probably had most of the good ideas. But I don't think they ran far enough with them, and they didn't take in enough new ones. The Smalltalk group is astonishingly insular, almost childlike, but is just now opening up, looking at other systems with open-eyed curiosity and fascination. The people there are absorbing much, and I think the next generation of Smalltalk will be much more modern, something I would always find comfortable. However, it will probably be another model Earth.


Saturday, February 24th, 2018
11:36 pm
CERN's iPod-like control devices, from 1973
A recent dig through some old papers uncovered this CERN memo from 1973 describing controls for the Super Proton Synchrotron, which was being built at the time. I visited the control room some years later and saw the controls in action, installed on a room-hugging curved console reminiscent of a NASA mission control room. I was so impressed by the devices I asked for a copy of the documentation, written by (one assumes) their designers, F. Beck and B. Stumpe.

These are like the ur-controls for the iPod and (later) iPhone, but anticipate the music player by almost three decades. In fact, the CERN knob is better than the click wheel: It is programmable to be smooth, indexed, or with variable turning resistance and spring return. It was inspirational to feel how it responded when turned in the various modes.

Apple is very good at commercializing ideas, but big research institutions such as CERN, erstwhile Xerox PARC and Bell Labs excel at creating the ideas themselves.

This memo was from a different time, in more ways than one.

Thursday, December 7th, 2017
1:49 am
Error handling in Upspin
The Upspin project uses a custom package, upspin.io/errors, to represent error conditions that arise inside the system. These errors satisfy the standard Go error interface, but are implemented using a custom type, upspin.io/errors.Error, that has properties that have proven valuable to the project.

Here we will demonstrate how the package works and how it is used. The story holds lessons for the larger discussion of error handling in Go.

Motivations

A few months into the project, it became clear we needed a consistent approach to error construction, presentation, and handling throughout the code. We decided to implement a custom errors package, and rolled one out in an afternoon. The details have changed a bit but the basic ideas behind the package have endured. These were:

  • To make it easy to build informative error messages.
  • To make errors easy to understand for users.
  • To make errors helpful as diagnostics for programmers.

As we developed experience with the package, some other motivations emerged. We'll talk about these below.

A tour of the package

The upspin.io/errors package is imported with the package name "errors", and so inside Upspin it takes over the role of Go's standard "errors" package.

We noticed that the elements that go into an error message in Upspin are all of different types: user names, path names, the kind of error (I/O, permission, etc.) and so on. This provided the starting point for the package, which would build on these different types to construct, represent, and report the errors that arise.

The center of the package is the Error type, the concrete representation of an Upspin error. It has several fields, any of which may be left unset:

  type Error struct {
      Path upspin.PathName
      User upspin.UserName
      Op  Op
      Kind Kind
      Err error
  }

The Path and User fields denote the path and user affected by the operation. Note that these are both strings, but have distinct types in Upspin to clarify their usage and to allow the type system to catch certain classes of programming errors.

The Op field denotes the operation being performed. It is another string type and typically holds the name of the method or server function reporting the error: "client.Lookup", "dir/server.Glob", and so on.

The Kind field classifies the error as one of a set of standard conditions (Permission, IO, NotExist, and so on). It makes it easy to see a concise description of what sort of error occurred, but also provides a hook for interfacing to other systems. For instance, upspinfs uses the Kind field as the key to translation from Upspin errors to Unix error constants such as EPERM and EIO.

The last field, Err, may hold another error value. Often this is an error from another system, such as a file system error from the os package or a network error from the net package. It may also be another upspin.io/errors.Error value, creating a kind of error trace that we will discuss later.
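
As a rough sketch of the upspinfs-style hook mentioned above (this is not upspinfs's actual code; the Kind constants come from the errors package, while the table and the errno fallback are illustrative):

  var errnoForKind = map[errors.Kind]syscall.Errno{
      errors.Permission: syscall.EPERM,
      errors.IO:         syscall.EIO,
      errors.NotExist:   syscall.ENOENT,
  }

  func errno(err error) syscall.Errno {
      if e, ok := err.(*errors.Error); ok {
          if no, ok := errnoForKind[e.Kind]; ok {
              return no
          }
      }
      return syscall.EIO // catch-all chosen for this sketch
  }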

Constructing an Error

To facilitate error construction, the package provides a function named E, which is short and easy to type.

  func E(args ...interface{}) error

As the doc comment for the function says, E builds an error value from its arguments. The type of each argument determines its meaning. The idea is to look at the types of the arguments and assign each argument to the field of the corresponding type in the constructed Error struct. There is an obvious correspondence: a PathName goes to Error.Path, a UserName to Error.User, and so on.
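
Here is a sketch of how that dispatch might look inside E. It is simplified (the real constructor does more work, such as diagnosing unknown argument types), and Str, which wraps a plain string as an error, is assumed here as the mechanism behind the string promotion described below:

  func E(args ...interface{}) error {
      e := &Error{}
      for _, arg := range args {
          switch arg := arg.(type) {
          case upspin.PathName:
              e.Path = arg
          case upspin.UserName:
              e.User = arg
          case Op:
              e.Op = arg
          case Kind:
              e.Kind = arg
          case string:
              e.Err = Str(arg) // promote a literal string, as described below
          case error:
              e.Err = arg
          }
      }
      return e
  }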

Let's look at an example. In typical use, calls to errors.E will arise multiple times within a method, so we define a constant, conventionally called op, that will be passed to all E calls within the method:

  func (s *Server) Delete(ref upspin.Reference) error {
    const op errors.Op = "server.Delete"
     ...

Then throughout the method we use the constant to prefix each call (although the actual ordering of arguments is irrelevant, by convention op goes first):

  if err := authorize(user); err != nil {
    return errors.E(op, user, errors.Permission, err)
  }

The Error method of the resulting Error value formats this neatly:

  server.Delete: user ann@example.com: permission denied: user not authorized

If the errors nest to multiple levels, redundant fields are suppressed and the nesting is formatted with indentation:

  client.Lookup: ann@example.com/file: item does not exist:
          dir/remote("upspin.example.net:443").Lookup:
          dir/server.Lookup

Notice that there are multiple operations mentioned in this error message (client.Lookup, dir/remote, dir/server). We'll discuss this multiplicity in a later section.

As another example, sometimes the error is special and is most clearly described at the call site by a plain string. To make this work in the obvious way, the constructor promotes arguments of literal type string to a Go error type through a mechanism similar to the standard Go function errors.New. Thus one can write:

   errors.E(op, "unexpected failure")

or

   errors.E(op, fmt.Sprintf("could not succeed after %d tries", nTries))

and have the string be assigned to the Err field of the resulting Error. This is a natural and easy way to build special-case errors.

Errors across the wire

Upspin is a distributed system and so it is critical that communications between Upspin servers preserve the structure of errors. To accomplish this we made Upspin's RPCs aware of these error types, using the errors package's MarshalError and UnmarshalError functions to transcode errors across a network connection. These functions make sure that a client will see all the details that the server provided when it constructed the error.
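
In outline, and assuming signatures matching the function names, the round trip looks like this:

  // Server side: flatten the structured error into bytes for the RPC response.
  buf := errors.MarshalError(err)

  // Client side: rebuild the error, nested structure and all.
  err := errors.UnmarshalError(buf)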

Consider this error report:

  client.Lookup: ann@example.com/test/file: item does not exist:
         dir/remote("dir.example.com:443").Lookup:
         dir/server.Lookup:
         store/remote("store.example.com:443").Get:
         fetching https://storage.googleapis.com/bucket/C1AF...: 404 Not Found

This is represented by four nested errors.E values.

Reading from the bottom up, the innermost is from the package upspin.io/store/remote (responsible for talking to remote storage servers). The error indicates that there was a problem fetching an object from storage. That error is constructed with something like this, wrapping an underlying error from the cloud storage provider:

  const op errors.Op = `store/remote("store.example.com:443").Get`
  var resp *http.Response
  ...
  return errors.E(op, errors.Errorf("fetching %s: %s", url, resp.Status))

The next error is from the directory server (package upspin.io/dir/server, our directory server reference implementation), which indicates that the directory server was trying to perform a Lookup when the error occurred. That error is constructed like this:

  const op errors.Op = "dir/server.Lookup"
  ...
  return errors.E(op, pathName, errors.NotExist, err)

This is the first layer at which a Kind (errors.NotExist) is added.

The Lookup error value is passed across the network (marshaled and unmarshaled along the way), and then the upspin.io/dir/remote package (responsible for talking to remote directory servers) wraps it with its own call to errors.E:

  const op errors.Op = "dir/remote.Lookup"
  ...
  return errors.E(op, pathName, err)

There is no Kind set in this call, so the inner Kind (errors.NotExist) is lifted up during the construction of this Error struct.
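
A sketch of that lifting, assuming the zero Kind is a distinguished "unset" value (call it Other):

  // Inside E, once the new Error's fields are set from the arguments:
  if prev, ok := e.Err.(*Error); ok {
      if e.Kind == Other { // this layer adds no Kind of its own
          e.Kind = prev.Kind
          prev.Kind = Other // don't report the Kind twice
      }
  }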

Finally, the upspin.io/client package wraps the error once more:

  const op errors.Op = "client.Lookup"
  ...
  return errors.E(op, pathName, err)

Preserving the structure of the server's error permits the client to know programmatically that this is a "not exist" error and that the item in question is "ann@example.com/test/file". The error's Error method can take advantage of this structure to suppress redundant fields. If the server error were merely an opaque string we would see the path name multiple times in the output.

The critical details (the PathName and Kind) are pulled to the top of the error so they are more prominent in the display. The hope is that when seen by a user the first line of the error is usually all that's needed; the details below that are more useful when further diagnosis is required.

Stepping back and looking at the error display as a unit, we can trace the path the error took from its creation back through various network-connected components to the client. The full picture might help the user but is sure to help the system implementer if the problem is unexpected or unusual.

Users and implementers

There is a tension between making errors helpful and concise for the end user versus making them expansive and analytic for the implementer. Too often the implementer wins and the errors are overly verbose, to the point of including stack traces or other overwhelming detail.

Upspin's errors are an attempt to serve both the users and the implementers. The reported errors are reasonably concise, concentrating on information the user should find helpful. But they also contain internal details, such as method names, that an implementer might find diagnostic, but not in a way that overwhelms the user. In practice we find that the tradeoff has worked well.

In contrast, a stack trace-like error is worse in both respects. The user does not have the context to understand the stack trace, and an implementer shown a stack trace is denied the information that could be presented if the server-side error was passed to the client. This is why Upspin error nesting behaves as an operational trace, showing the path through the elements of the system, rather than as an execution trace, showing the path through the code. The distinction is vital.

For those cases where stack traces would be helpful, we allow the errors package to be built with the "debug" tag, which enables them. This works fine, but it's worth noting that we have almost never used this feature. Instead the default behavior of the package serves well enough that the overhead and ugliness of stack traces are obviated.

Matching errors

An unexpected benefit of Upspin's custom error handling was the ease with which we could write error-dependent tests, as well as write error-sensitive code outside of tests. Two functions in the errors package enable these uses.

The first is a function, called errors.Is, that returns a boolean reporting whether the argument is of type *errors.Error and, if so, that its Kind field has the specified value.

  func Is(kind Kind, err error) bool

This function makes it straightforward for code to change behavior depending on the error condition, such as in the face of a permission error as opposed to a network error:

  if errors.Is(errors.Permission, err) { ... }

The other function, Match, is useful in tests. It was created after we had been using the errors package for a while and found too many of our tests were sensitive to irrelevant details of the errors. For instance, a test might only need to check that there was a permission error opening a particular file, but was sensitive to the exact formatting of the error message.

After fixing a number of brittle tests like this, we responded by writing a function to report whether the received error, err, matches an error template:

  func Match(template, err error) bool

The function checks whether the error is of type *errors.Error, and if so, whether the fields within equal those within the template. The key is that it checks only those fields that are non-zero in the template, ignoring the rest.
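
A simplified sketch of that comparison (the real Match treats nested Err values more carefully; Other again stands for the unset Kind):

  func Match(template, err error) bool {
      t, ok := template.(*Error)
      if !ok {
          return false
      }
      e, ok := err.(*Error)
      if !ok {
          return false
      }
      if t.Path != "" && t.Path != e.Path {
          return false
      }
      if t.User != "" && t.User != e.User {
          return false
      }
      if t.Op != "" && t.Op != e.Op {
          return false
      }
      if t.Kind != Other && t.Kind != e.Kind {
          return false
      }
      if t.Err != nil {
          if _, ok := t.Err.(*Error); ok {
              return Match(t.Err, e.Err)
          }
          if e.Err == nil || t.Err.Error() != e.Err.Error() {
              return false
          }
      }
      return true
  }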

For our example described above, one can write:

  if errors.Match(errors.E(errors.Permission, pathName), err) { … }

and be unaffected by whatever other properties the error has. We use Match countless times throughout our tests; it has been a boon.

Lessons

There is a lot of discussion in the Go community about how to handle errors and it's important to realize that there is no single answer. No one package or approach can do what's needed for every program. As was pointed out elsewhere, errors are just values and can be programmed in different ways to suit different situations.

The Upspin errors package has worked out well for us. We do not advocate that it is the right answer for another system, or even that the approach is right for anyone else. But the package worked well within Upspin and taught us some general lessons worth recording.

The Upspin errors package is modest in size and scope. The original implementation was built in a few hours and the basic design has endured, with a few refinements, since then. A custom error package for another project should be just as easy to create. The specific needs of any given environment should be easy to apply. Don't be afraid to try; just think a bit first and be willing to experiment. What's out there now can surely be improved upon when the details of your own project are considered.

We made sure the error constructor was both easy to use and easy to read. If it were not, programmers would resist using it.

The behavior of the errors package is built in part upon the types intrinsic to the underlying system. This is a small but important point: No general errors package could do what ours does. It truly is a custom package.

Moreover, the use of types to discriminate arguments allowed error construction to be idiomatic and fluid. This was made possible by a combination of the existing types in the system (PathName, UserName) and new ones created for the purpose (Op, Kind). Helper types made error construction clean, safe, and easy. It took a little more work—we had to create the types and use them everywhere, such as through the "const op" idiom—but the payoff was worthwhile.

Finally, we would like to stress the lack of stack traces as part of the error model in Upspin. Instead, the errors package reports the sequence of events, often across the network, that resulted in a problem being delivered to the client. Carefully constructed errors that thread through the operations in the system can be more concise, more descriptive, and more helpful than a simple stack trace.

Errors are for users, not just for programmers.

by Rob Pike and Andrew Gerrand

Thursday, October 26th, 2017
8:00 pm
The Upspin manifesto: On the ownership and sharing of data
Here follows the original "manifesto" from late 2014 proposing the idea for what became Upspin. The text has been lightly edited to remove a couple of references to Google-internal systems, with no loss of meaning.

I'd like to thank Eduardo Pinheiro, Eric Grosse, Dave Presotto and Andrew Gerrand for helping me turn this into a working system, in retrospect remarkably close to the original vision.

Augie
Augie image Copyright © 2017 Renee French


Manifesto



Outside our laptops, most of us today have no shared file system at work. (There was a time when we did, but it's long gone.) The world took away our /home folders and moved us to databases, which are not file systems. Thus I can no longer (at least not without clumsy hackery) make my true home directory be where my files are. Instead, I am expected to work on some local machine using some web app talking to some database or other external repository to do my actual work. This is mobile phone user interfaces brought to engineering workstations, which has its advantages but also some deep flaws. Most important is the philosophy it represents.

You don't own your data any more. One argument is that companies own it, but from a strict user perspective, your "apps" own it. Each item you use in the modern mobile world is coupled to the program that manipulates it. Your Twitter, Facebook, or Google+ program is the only way to access the corresponding feed. Sometimes the boundary is softened within a company—photos in Google+ are available in other Google products—but that is the exception that proves the rule. You don't control your data, the programs do.

Yet there are many reasons to want to access data from multiple programs. That is, almost by definition, the Unix model. Unix's model is largely irrelevant today, but there are still legitimate ways to think about data that are made much too hard by today's way of working. It's not necessarily impossible to share data among programs (although it's often very difficult), but it's never natural. There are workarounds like plugin architectures and documented workflows but they just demonstrate that the fundamental idea—sharing data among programs—is not the way we work any more.

This is backwards. It's a reversal of the old way we used to work, with a file system that all programs could access equally. The very notion of "download" and "upload" is plain stupid. Data should simply be available to any program with authenticated access rights. And of course for any person with those rights. Sharing between programs and people can be, technologically at least, the same problem, and a solved one.

This document proposes a modern way to achieve the good properties of the old world: consistent access, understandable data flow, and easy sharing without workarounds. To do this, we go back to the old notion of a file system and make it uniform and global. The result should be a data space in which all people can use all their programs, sharing and collaborating at will in a consistent, intuitive way.

Not downloading, uploading, mailing, tarring, gzipping, plugging in and copying around. Just using. Conceptually: If I want to post a picture on Twitter, I just name the file that holds it. If I want to edit a picture on Twitter using Photoshop, I use the File>Open menu in Photoshop and name the file stored on Twitter, which is easy to discover or even know a priori. (There are security and access questions here, and we'll come back to those.)

Working in a file system.

I want my home directory to be where all my data is. Not just my local files, not just my source code, not just my photos, not just my mail. All my data. My "phone" should be able to access the same data as my laptop, which should be able to access the same data as the servers. (Ignore access control for the moment.) $HOME should be my home for everything: work, play, life; toy, phone, work, cluster.

This was how things worked in the old single-machine days but we lost sight of that when networking became universally available. There were network file systems and some research systems used them to provide basically this model, but the arrival of consumer devices, portable computing, and then smartphones eroded the model until every device is its own fiefdom and every program manages its own data through networking. We have a network connecting devices instead of a network composed of devices.

The knowledge of how to achieve the old way still exists, and networks are fast and ubiquitous enough to restore the model. From a human point of view, the data is all we care about: my pictures, my mail, my documents. Put those into a globally addressable file system and I can see all my data with equal facility from any device I control. And then, when I want to share with another person, I can name the file (or files or directory) that holds the information I want to share, grant access, and the other person can access it.

The essence here is that the data (if it's in a single file) has one name that is globally usable to anyone who knows the name and has the permission to evaluate it. Those names might be long and clumsy, but simple name space techniques can make the data work smoothly using local aliasing so that I live in "my" world, you live in your world (also called "my" world from your machines), and the longer, globally unique names only arise when we share, which can be done with a trivial, transparent, easy to use file-system interface.

Note that the goal here is not a new file system to use alongside the existing world. Its purpose is to be the only file system to use. Obviously there will be underlying storage systems, but from the user's point of view all access is through this system. I edit a source file, compile it, find a problem, point a friend at it; she accesses the same file, not a copy, to help me understand it. (If she wants a copy, there's always cp!).

This is not a simple thing to do, but I believe it is possible. Here is how I see it being assembled. This discussion will be idealized and skate over a lot of hard stuff. That's OK; this is a vision, not a design document.

Everyone has a name.

Each person is identified by a name. To make things simple here, let's just use an e-mail address. There may be a better idea, but this is sufficient for discussion. It is not a problem to have multiple names (addresses) in this model, since the sharing and access rights will treat the two names as distinct users with whatever sharing rights they choose to use.

Everyone has stable storage in the network.

Each person needs a way to make data accessible to the network, so the storage must live in the network. The easiest way to think of this is like the old network file systems, with per-user storage in the server farm. At a high level, it doesn't matter what that storage is or how it is structured, as long as it can be used to provide the storage layer for a file-system-like API.

Everyone's storage server has a name identified by the user's name.

The storage in the server farm is identified by the user's name.

Everyone has local storage, but it's just a cache.

It's too expensive to send all file access to the servers, so the local device, whatever it is—laptop, desktop, phone, watch—caches what it needs and goes to the server as needed. Cache protocols are an important part of the implementation; for the point of this discussion, let's just say they can be designed to work well. That is a critical piece and I have ideas, but put that point aside for now.

The server always knows what devices have cached copies of the files on local storage. 

The cache always knows what the associated server is for each directory and file in its cache and maintains consistency within reasonable time boundaries.

The cache implements the API of a full file system. The user lives in this file system for all the user's own files. As the user moves between devices, caching protocols keep things working.

Everyone's cache can talk to multiple servers.

A user may have multiple servers, perhaps from multiple providers. The same cache and therefore same file system API refers to them all equivalently. Similarly, if a user accesses a different user's files, the exact same protocol is used, and the result is cached in the same cache the same way. This is federation as architecture.

Every file has a globally unique name.

Every file is named by this triple: (host address, user name, file path). Access rights aside, any user can address any other user's file by evaluating the triple. The real access method will be nicer in practice, of course, but this is the gist.
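
Rendered as a Go type, purely for illustration (these field names are hypothetical):

  // A hypothetical shape for the global name triple.
  type FileName struct {
      Host string // address of the user's storage server
      User string // the user's name, e.g. "ann@example.com"
      Path string // path within that user's tree
  }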

Every file has a potentially unique ACL.

Although the user interface for access control needs to be very easy, the effect is that each file or directory has an access control list (ACL) that mediates all access to its contents. It will need to be very fine-grained with respect to each of users, files, and rights.

Every user has a local name space.

The cache/file-system layer contains functionality to bind things, typically directories, identified by such triples into locally nicer-to-use names. An obvious way to think about this is like an NFS mount point for /home, where the remote binding attaches to /home/XXX the component or components in the network that the local user wishes to identify by XXX. For example, Joe might establish /home/jane as a place to see all the (accessible to Joe) pieces of Jane's world. But it can be much finer-grained than that, to the level of pulling in a single file.

The NFS analogy only goes so far. First, the binding is a lazily-evaluated, multi-valued recipe, not a Unix-like mount. Also, the binding may itself be programmatic, so that there is an element of auto-discovery. Perhaps most important, one can ask any file in the cached local system what its triple is and get its unique name, so when a user wishes to share an item, the triple can be exposed and the remote user can use her locally-defined recipe to construct the renaming to make the item locally accessible. This is not as mysterious or as confusing in practice as it sounds; Plan 9 pretty much worked like this, although not as dynamically.

Everyone's data becomes addressable.

Twitter gives you (or anyone you permit) access to your Twitter data by implementing the API, just as the regular, more file-like servers do. The same story applies to any entity that has data it wants to make usable. At some scaling point, it becomes wrong not to play.

Everyone's data is secure.

It remains to be figured out how to do that, I admit, but with a simple, coherent data model that should be achievable.

Is this a product?

The protocols and some of the pieces, particularly what runs on local devices, should certainly be open source, as should a reference server implementation. Companies should be free to provide proprietary implementations to access their data, and should also be free to charge for hosting. A cloud provider could charge hosting fees for the service, perhaps with some free or very cheap tier that would satisfy the common user. There's money in this space.

What is this again?

What Google Drive should be. What Dropbox should be. What file systems can be. The way we unify our data access across companies, services, programs, and people. The way I want to live and work.

Never again should someone need to manually copy/upload/download/plugin/workflow/transfer data from one machine to another. 
Thursday, September 21st, 2017
10:02 pm
Go: Ten years and climbing


Drawing Copyright ©2017 Renee French


This week marks the 10th anniversary of the creation of Go.


The initial discussion was on the afternoon of Thursday, the 20th of September, 2007. That led to an organized meeting between Robert Griesemer, Rob Pike, and Ken Thompson at 2PM the next day in the conference room called Yaounde in Building 43 on Google's Mountain View campus. The name for the language arose on the 25th, several messages into the first mail thread about the design:

Subject: Re: prog lang discussion
From: Rob 'Commander' Pike
Date: Tue, Sep 25, 2007 at 3:12 PM
To: Robert Griesemer, Ken Thompson

i had a couple of thoughts on the drive home.

1. name

'go'. you can invent reasons for this name but it has nice properties. it's short, easy to type. tools: goc, gol, goa. if there's an interactive debugger/interpreter it could just be called 'go'. the suffix is .go

...

(It's worth stating that the language is called Go; "golang" comes from the web site address (go.com was already a Disney web site) but is not the proper name of the language.)


The Go project counts its birthday as November 10, 2009, the day it launched as open source, originally on code.google.com before migrating to GitHub a few years later. But for now let's date the language from its conception, two years earlier, which allows us to reach further back, take a longer view, and witness some of the earlier events in its history.


The first big surprise in Go's development was the receipt of this mail message:

Subject: A gcc frontend for Go
From: Ian Lance Taylor
Date: Sat, Jun 7, 2008 at 7:06 PM
To: Robert Griesemer, Rob Pike, Ken Thompson

One of my office-mates pointed me at http://.../go_lang.html . It seems like an interesting language, and I threw together a gcc frontend for it. It's missing a lot of features, of course, but it does compile the prime sieve code on the web page.

The shocking yet delightful arrival of an ally (Ian) and a second compiler (gccgo) was not only encouraging, it was enabling. Having a second implementation of the language was vital to the process of locking down the specification and libraries, helping guarantee the high portability that is part of Go's promise.


Even though his office was not far away, none of us had even met Ian before that mail, but he has been a central player in the design and implementation of the language and its tools ever since.


Russ Cox joined the nascent Go team in 2008 as well, bringing his own bag of tricks. Russ discovered—that's the right word—that the generality of Go's methods meant that a function could have methods, leading to the http.HandlerFunc idea, which was an unexpected result for all of us. Russ promoted more general ideas too, like the io.Reader and io.Writer interfaces, which informed the structure of all the I/O libraries.


Jini Kim, who was our product manager for the launch, recruited the security expert Adam Langley to help us get Go out the door. Adam did a lot of things for us that are not widely known, including creating the original golang.org web page and the build dashboard, but of course his biggest contribution was in the cryptographic libraries. At first, they seemed disproportionate in both size and complexity, at least to some of us, but they enabled so much important networking and security software later that they became a crucial part of the Go story. Network infrastructure companies like Cloudflare lean heavily on Adam's work in Go, and the internet is better for it. So is Go, and we thank him.


In fact a number of companies started to play with Go early on, particularly startups. Some of those became powerhouses of cloud computing. One such startup, now called Docker, used Go and catalyzed the container industry for computing, which then led to other efforts such as Kubernetes. Today it's fair to say that Go is the language of containers, another completely unexpected result.


Go's role in cloud computing is even bigger, though. In March of 2014 Donnie Berkholz, writing for RedMonk, claimed that Go was "the emerging language of cloud infrastructure". Around the same time, Derek Collison of Apcera stated that Go was already the language of the cloud. That might not have been quite true then, but as the word "emerging" used by Berkholz implied, it was becoming true.


Today, Go is the language of the cloud, and to think that a language only ten years old has come to dominate such a large and growing industry is the kind of success one can only dream of. And if you think "dominate" is too strong a word, take a look at the internet inside China. For a while, the huge usage of Go in China signaled to us by the Google trends graph seemed some sort of mistake, but as anyone who has been to the Go conferences in China can attest, the measurements are real. Go is huge in China.


In short, ten years of travel with the language have brought us past many milestones. The most astonishing is at our current position: a conservative estimate suggests there are at least half a million Go programmers. When the mail message naming Go was sent, the idea of there being half a million gophers would have sounded preposterous. Yet here we are, and the number continues to grow.


Speaking of gophers, it's been fun to watch how Renee French's idea for a mascot, the Go gopher, became not only a much loved creation but also a symbol for Go programmers everywhere. Many of the biggest Go conferences are called GopherCons as they gather together gophers from all over the world.


Gopher conferences are taking off. The first one was only three years ago, yet today there are many, all around the world, plus countless smaller local "meetups". On any given day, there is more likely than not a group of gophers meeting somewhere in the world to share ideas.


Looking back over ten years of Go design and development, it is astounding to reflect on the growth of the Go community. The number of conferences and meetups, the long and ever-increasing list of contributors to the Go project, the profusion of open source repositories hosting Go code, the number of companies using Go, some exclusively: these are all astonishing to contemplate.


For the three of us, Robert, Rob, and Ken, who just wanted to make our programming lives easier, it's incredibly gratifying to witness what our work has started.

What will the next ten years bring?

- Rob Pike, with Robert Griesemer and Ken Thompson
Saturday, February 25th, 2017
5:52 pm
The power of role models
I spent a few days a while back in a board meeting for a national astronomy organization and noticed a property of the population in that room: Out of about 40 people, about a third were women. And these were powerful women, too: professors, observatory directors and the like. Nor were they wallflowers. Their contributions to the meeting exceeded their proportion.

In my long career, I had never before been in a room like that, and the difference in tone, conversation, respect, and professionalism was unlike any I have experienced. I can't prove it was the presence of women that made the difference - it could just be that astronomers are better people all around, a possibility I cannot really refute - but it seemed to me that the difference stemmed from the demographics.

The meeting was one-third women, but of course in private conversation, when pressed, the women I spoke to complained that things weren't equal yet. We all have our reference points.

But let's back up for a moment and think about the main point: In a room responsible for overseeing the budget and operation of major astronomical observatories, including things like the Hubble telescope, women played a major role. The contrast with computing is stark.

It really got me thinking. At dinner I asked some of the women to speak to me about this, how astronomy became so (relatively) egalitarian. And one topic became clear: role models. Astronomy has a long history of women active in the field, going all the way back to Caroline Herschel in the early 19th century. Women have made huge contributions to the field. Dava Sobel just wrote a book about the women who laid the foundations for the discovery of the expansion of the universe. Just a couple of weeks ago, papers ran obituaries of Vera Rubin, the remarkable observational astronomer who discovered the evidence for dark matter. I could mention Jocelyn Bell, whose discovery of pulsars got her advisor a Nobel (sic).

The most famous astronomer I met growing up was Helen Hogg, the (adopted) Canadian astronomer at David Dunlap Observatory outside Toronto, who also did a fair bit of what we now call outreach.

The women at the meeting spoke of this, a history of women contributing, of role models to look up to, of proof that women can make major contributions to the field.

What can computing learn from this?

It seems we're doing it wrong. The best way to improve the representation of women in the field is not to recruit them, important though that is, but to promote them. To create role models. To push them into positions of influence. Women leave computing in large numbers because they don't see a path up, or because the culture makes them unwelcome. More women excelling in the field, famous women, brilliant women, would be inspiring.

Men have the power to help fix those things, but they also should have the courage to cede the stage to women more often, to fight the stupid bias that keeps women from excelling in the field. It may take proactive behavior, like choosing a woman over a man when growing your team, just because, or promoting women more freely.

But as I see it, without something being done to promote female role models, the way things are going computing will still be backwards a hundred years from now.
Sunday, August 3rd, 2014
9:55 pm
Prints
Two long-buried caches of photographs came to light last year. One was a stack of cellulose nitrate negatives made on the Scott Antarctic expedition almost a hundred years ago. Over time, they became stuck together into a moldy brick, but it was possible to tease the negatives apart and see what they revealed. You can view the images at the web site of the New Zealand Antarctic Heritage Trust. The results show ragged edges and mold spots but, even beyond their historical importance, the photographs are evocative and in some cases very beautiful.

The other cache contained images not quite so old and of less general interest but of personal importance. My mother moved from the house she had occupied for decades into a smaller apartment and while preparing to move she found the proverbial shoe box of old pictures in a closet. Some of the images are from my youth, some from hers, and some even from her parents'. One of the photographs, from 1931, shows my paternal great-grandparents. I never met my paternal grandparents, let alone great-grandparents, so this photograph touches something almost primordial for me. And some of the photographs in the box were even older.

Due to the miracle of photography, we are able to see over a hundred years into the past. Of course this is not news; all of us have seen 19th century photographs by the pioneers of the medium. By the turn of the 20th century photography was so common that huge numbers of images, from the historical to the mundane, had been created. And sometimes we are lucky enough to chance upon forgotten images that open a window into a past that would otherwise fade from view.

But such windows are becoming rare. A hundred years from now, there will be far fewer photo caches to find. Although the transition to digital photography has made photos almost unimaginably commonplace—one estimate puts the number of shutter activations at a trillion images worldwide per year—very few of those images become artifacts that can be left in a shoe box.

We live in what has been named a Digital Dark Age. Because digital technology evolves so fast, we are rapidly losing the ability to understand yesterday's media. As file formats change, software becomes obsolete, and hardware becomes outmoded, old digital files become unreadable and unrecoverable.

There are many examples of lost information, but here is an illustrative story of disaster narrowly averted. Early development of the Unix operating system, which became the software foundation for the Internet, was done in the late 1960s and early 1970s on Digital Equipment Corporation computers. Backups were made on a magnetic medium called a DECtape. By the mid 1970s, DECtape was obsolete and by the 1980s there were no remaining DECtape drives that could read the old backups. The scientists in the original Unix lab had kept a box of old backups under the raised floor of the computer room, but the tapes had spontaneously become unreadable because the device to read them no longer existed in the lab or anywhere else as far as anyone knew. And even if it did, no computer that could run the device was still powered on. Fortunately, around 1990 Paul Vixie and Keith Bostic, working for a different company, stumbled across an old junked DECtape drive and managed to get it up and running again by resurrecting an old computer to connect it to. They contacted the Unix research group and offered one last chance to recover the data on the backup tapes before the computer and DECtape drive were finally decommissioned. Time and resources were limited, but some of the key archival pieces of early Unix development were recovered through this combination of charity and a great deal of luck. This story has a happy ending, but not all digital archives survive. Far from it.

The problem is that as technology advances, data needs to be curated. Files need to have their formats converted, and then transferred to new media. A backup disk in a box somewhere might be unreadable a few years from now. Its format may be obsolete, the software to read it might not run on current hardware, or the media might have physically decayed. NASA lost a large part of the data collected by the Viking Mars missions because the iron oxide fell off the tapes storing the data.

Backups are important but they too are temporary, subject to the same problems as the data they attempt to protect. Backup software can become obsolete and media can fail. The same affliction that damaged the Viking tapes also wiped out my personal backup archive; I lost the only copy of my computer work from the 1970s. (It's worth noting my negatives and prints from the period survived.)

It's not just tapes that go bad. Consider CDs and DVDs, media often used for backup. The disks, especially the writable kind used for backups, are very fragile, much more so than the mass-produced read-only kind used to store music and movies. Within a few years, especially in humid environments, the metal film can separate from the backing medium. Even if the backup medium survives, the formats used to store the backups might become obsolete. The software that reads the backups might not run on the next computer one buys. Today, CDs are already becoming relics; many computers do not even come with a CD or DVD drive. What were once the gold standard for backup are already looking old-fashioned just a few years on. They will be antiquated and obscure a century from now.

To summarize, digital information requires maintenance. It's not sufficient to make backups; the backups also need to be maintained, upgraded, transferred, and curated. Without conscientious care, the data of today will be lost forever in a few years. Even with care, it's possible through software or hardware changes to lose access forever. That shoebox of old backup CDs will be unreadable soon.

Which brings us back to those old photo caches. They held negatives and prints, physical objects that stored images. They needed no attention, no curating, no updating. They sat untended and forgotten for decades, but through all that time faithfully held their information, waiting for a future discoverer. As a result, we can all see what the Scott Antarctic expedition saw, and I can see what my great-grandparents looked like.

It is a sad irony that modern technology makes it unlikely that future generations will see the images made today.

Ask yourself whether your great-grandchildren will be able to see your photographs. If the images exist only as a digital image file, the answer is almost certainly, "No". If, however, there are physical prints, the odds improve. Those digital images need to be made real to endure. Without a print, a digital photograph has no future.

We live in a Digital Dark Age, but as individuals we can shine a little light. If you are one of the uncounted photographers who enjoy digital photography, keep in mind the fragility of data. When you have a digital image you care about, for whatever reason, artistic or sentimental, please make a print and put that print away. It will sit quietly in the dark, holding fast, never forgetting, ready to reveal itself to a grateful future generation.

Saturday, January 25th, 2014
12:35 am
Self-referential functions and the design of options
I've been trying on and off to find a nice way to deal with setting options in a Go package I am writing. Options on a type, that is. The package is intricate and there will probably end up being dozens of options. There are many ways to do this kind of thing, but I wanted one that felt nice to use, didn't require too much API (or at least not too much for the user to absorb), and could grow as needed without bloat.

I've tried most of the obvious ways: option structs, lots of methods, variant constructors, and more, and found them all unsatisfactory. After a bunch of trial versions over the past year or so, and a lot of conversations with other Gophers making suggestions, I've finally found one I like. You might like it too. Or you might not, but either way it does show an interesting use of self-referential functions.

I hope I have your attention now.

Let's start with a simple version. We'll refine it to get to the final version.

First, we define an option type. It is a function that takes one argument, the Foo we are operating on.

type option func(*Foo)

The idea is that an option is implemented as a function we call to set the state of that option. That may seem odd, but there's a method in the madness.

Given the option type, we next define an Option method on *Foo that applies the options it's passed by calling them as functions. That method is defined in the same package, say pkg, in which Foo is defined.

This is Go, so we can make the method variadic and set lots of options in a given call:

// Option sets the options specified.
func (f *Foo) Option(opts ...option) {
    for _, opt := range opts {
        opt(f)
    }
}

Now to provide an option, we define in pkg a function with the appropriate name and signature. Let's say we want to control verbosity by setting an integer value stored in a field of a Foo. We provide the verbosity option by writing a function with the obvious name and have it return an option, which means a closure; inside that closure we set the field:

// Verbosity sets Foo's verbosity level to v.
func Verbosity(v int) option {
    return func(f *Foo) {
        f.verbosity = v
    }
}

Why return a closure instead of just doing the setting? Because we don't want the user to have to write the closure and we want the Option method to be nice to use. (Plus there's more to come....)

In the client of the package, we can set this option on a Foo object by writing:

foo.Option(pkg.Verbosity(3))
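
For concreteness, here is that first version assembled into a single runnable file. This is my own sketch, not part of the design as presented; pkg and the client are collapsed into one package purely for illustration, so the client calls Verbosity rather than pkg.Verbosity.

package main

import "fmt"

type Foo struct {
    verbosity int
}

type option func(*Foo)

// Option sets the options specified.
func (f *Foo) Option(opts ...option) {
    for _, opt := range opts {
        opt(f)
    }
}

// Verbosity sets Foo's verbosity level to v.
func Verbosity(v int) option {
    return func(f *Foo) {
        f.verbosity = v
    }
}

func main() {
    foo := new(Foo)
    foo.Option(Verbosity(3))
    fmt.Println(foo.verbosity) // prints 3
}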

That's easy and probably good enough for most purposes, but for the package I'm writing, I want to be able to use the option mechanism to set temporary values, which means it would be nice if the Option method could return the previous state. That's easy: just save it in an empty interface value that is returned by the Option method and the underlying function type. That value flows through the code:

type option func(*Foo) interface{}

// Verbosity sets Foo's verbosity level to v.
func Verbosity(v int) option {
    return func(f *Foo) interface{} {
        previous := f.verbosity
        f.verbosity = v
        return previous
    }
}

// Option sets the options specified.
// It returns the previous value of the last argument.
func (f *Foo) Option(opts ...option) (previous interface{}) {
    for _, opt := range opts {
        previous = opt(f)
    }
    return previous
}

The client can use this the same as before, but if the client also wants to restore a previous value, all that's needed is to save the return value from the first call, and then restore it.

prevVerbosity := foo.Option(pkg.Verbosity(3))
foo.DoSomeDebugging()
foo.Option(pkg.Verbosity(prevVerbosity.(int)))

The type assertion in the restoring call to Option is clumsy. We can do better if we push a little harder on our design.

First, redefine an option to be a function that sets a value and returns another option to restore the previous value.

type option func(f *Foo) option

This self-referential function definition is reminiscent of a state machine. Here we're using it a little differently: it's a function that returns its inverse.

Then change the return type (and meaning) of the Option method of *Foo to option from interface{}:

// Option sets the options specified.
// It returns an option to restore the last arg's previous value.
func (f *Foo) Option(opts ...option) (previous option) {
    for _, opt := range opts {
        previous = opt(f)
    }
    return previous
}

The final piece is the implementation of the actual option functions. Their inner closure must now return an option, not an interface value, and that means it must return a closure to undo itself. But that's easy: it can just recur to prepare the closure to undo the original! It looks like this:

// Verbosity sets Foo's verbosity level to v.
func Verbosity(v int) option {
    return func(f *Foo) option {
        previous := f.verbosity
        f.verbosity = v
        return Verbosity(previous)
    }
}

Note the last line of the inner closure changed from
        return previous
to
        return Verbosity(previous)
Instead of just returning the old value, it now calls the surrounding function (Verbosity) to create the undo closure, and returns that closure.

Now from the client's view this is all very nice:

prevVerbosity := foo.Option(pkg.Verbosity(3))
foo.DoSomeDebugging()
foo.Option(prevVerbosity)

And finally we take it up one more level, using Go's defer mechanism to tidy it all up in the client:

func DoSomethingVerbosely(foo *Foo, verbosity int) {
    // Could combine the next two lines,
    // with some loss of readability.
    prev := foo.Option(pkg.Verbosity(verbosity))
    defer foo.Option(prev)
    // ... do some stuff with foo under high verbosity.
}

It's worth noting that since the "verbosity" returned is now a closure, not a verbosity value, the actual previous value is hidden. If you want that value you need a little more magic, but there's enough magic for now.
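
For the curious, here is one shape that extra magic could take. This is my own sketch, not part of the design above: a hypothetical constructor that reports the concrete previous value through a pointer while still returning the undo closure.

// VerbosityReporting is a hypothetical variant of Verbosity: it sets the
// level to v, stores the old level through prev, and still returns the
// undoing option.
func VerbosityReporting(v int, prev *int) option {
    return func(f *Foo) option {
        *prev = f.verbosity
        f.verbosity = v
        return Verbosity(*prev)
    }
}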

The implementation of all this may seem like overkill but it's actually just a few lines for each option, and has great generality. Most important, it's really nice to use from the point of view of the package's client. I'm finally happy with the design. I'm also happy at the way this uses Go's closures to achieve its goals with grace.

Thursday, May 13th, 2004
9:41 pm
Hello
World.
Tuesday, June 8th, 2004
7:02 am
news
Ronald Reagan is still dead.

I suppose it's obvious - legacy and all that - but really I don't see why someone's death is worth this much news. A mention, yes, but all this? Plus, in this case, it's someone who had disappeared from public view a decade ago. He already had things named after him in every state (that's the goal, anyway); all they need to do is add 'memorial' and it's done.

I will stop before getting political.
Sunday, April 9th, 2006
8:48 pm
A perfect moment
Yesterday afternoon was just the right sort of weather for sitting in the park, so towards sunset Renee and I headed down to the little park below us, which has a view across the water and a lovely, long fish pond with a stone bridge across it. The plan was to sit on a bench and watch the light fade, sunset itself being hidden by the hills behind.

Our plan stumbled when we discovered a wedding in progress on the half of the park with the best benches, and then there were sprinkles of rain. We sat on a stone bench by the bridge, sheltered by some trees overhead, and waited for the shower to pass. The violin players in the wedding didn't seem to mind much; they stayed out even as it sprinkled. But the shower soon passed and we moved to a bench at the other end of the park, away from the wedding but still with musical accompaniment. From there, we could look out across the grass and pond and towards the water. The park is small, the pond just a few paces in front of us.

There were five ibises wandering on the grass by the verge of the pond, poking their long beaks into the grass to find worms, plus a couple of gulls and pigeons lolling about.

Sunset started to get more colorful; the clouds started to go pink and the buildings in the distance, by the harbor, illuminated by the setting sun, glowed a rich orange.

Here, then, is the scene: in front of us, grass with ibises and other birds, then a babbling pond with koi, a few flowers and bushes, then the harbor with boats. Towards the right, a stone bridge over the pond, the wedding party, and glowing buildings. In the sky above, pink puffy clouds against a blue sky and above that, a quarter moon. Through all of this, the violins played gentle music.

The sunset strengthened, the colors deepened.

The ibises moved to the right and formed a little pack of five. From the left came a couple of folks with three beagles straining at their leashes. They pulled towards the ibises, who stared back at them. All froze. These two groups framed the grass, then above them, pond and bushes and boats and water and sky and pink clouds and then a rainbow popped out just below the moon, arcing like an ibis's beak from left, above the beagles, over to the right, towards the orange buildings and the wedding party, whose music continued to accompany this magic scene, a painting in real life, a Victorian tableau, a perfect moment.
Sunday, June 11th, 2006
8:17 pm
I can't find this on the web, so here follows a note I wrote in 1991
of an odd event in my computing career.

===

There's no such thing as bad publicity, the saying goes.
Unless, of course, you have no interest in being a public figure.
That's where Richard Stallman and I start to diverge.

A friend called early Saturday to report some good news; groggily,
I countered with a feeling of impending doom. I had to spend the
day preparing for a talk Monday afternoon at MIT. The talk itself
would be fine: I'd been invited by Butler Lampson to talk about Plan 9
at the Lab for Computer Science and took that as the kind of honor
I rise to. I'd give a hell of a talk, provided I was allowed to.
The problem, I said, is that MIT and I are connected by some history.

I explained about The League for Programming Freedom,
just the `League' to insiders, whose multiply ambiguous name
hearkens to an innocent age in software. Free Software is
like Free Love, a hippie pipe dream in which
computing is free from venality, commercial interests, even
capitalism. The founder of the League, Richard Stallman,
has been preaching the gospel of promiscuous programming
for years now and has won many converts, especially
in academia. Especially at MIT. Especially among the undergraduates.
Especially in the building on Technology Square in Cambridge that
the League shares, in part, with LCS.

Stallman's sermons, in print and in person, always include a
long harangue about patents on software. The citation in that
harangue is usually #4,555,775, AT&T's US patent on what is
colloquially called ``backing store,'' a technique for implementing
windows on a bitmap display. This patent is of particular interest
to Stallman because he claims to have used the idea, before the patent
was filed, while writing the window system for the
Lisp Machine at MIT. The patent is of interest to me because I am
listed as the inventor.

I don't wish to dive into the legal and technical intricacies of
the patent question here. I will just state that I know what I
invented and when I invented it, and I know what Stallman did and
when he did it, and I do not believe that Stallman had nearly as
good an idea as I did. And as for the propriety of software
patents, I signed a contract when I joined AT&T that said, in
effect, that I could work on whatever I fancied and would be
supported well in that endeavor; all AT&T asked in return was that
they be allowed to make money from my work. In Stallman's
world that is a Faustian deal, but then I always thought only
boring people went to Heaven.

When the patent was filed (October 7, 1982; issued
November 26, 1985), there were very few software patents and
the occasion was celebrated. I was congratulated warmly and
people were excited about the future of software patents.
Nowadays, however, the climate in universities at least is
very different, and Richard Stallman is almost single-handedly
responsible for the change. (The business community, on the other
hand, is still excited.)

A few years ago, AT&T began quietly pressing its case on a small
portfolio of computer graphics patents, with #4,555,775 being central.
Polite letters were written to a number of places inviting people
to draw up license agreements; polite letters were returned, and
legalities proceeded normally. (I don't know, and wouldn't say if
I did, what the state of those legalities is today.) One of the
letters went to the X consortium at MIT, where it was largely
ignored. A follow-up letter early in 1991 hit the electronic
bulletin boards, however, and I have been a public figure ever
since.

I hung up and began preparing for the worst. Stallman and I had
never met, and I felt sure he'd capitalize on my visit to his
building.

I was therefore not surprised early Monday afternoon, as we were eating
a takeout lunch in the fifth-floor lounge at LCS, when someone
said that Stallman was preparing to stage a protest at my talk.
Jerry Saltzer went to his office to get a copy of the announcement,
made on a local LCS electronic bulletin board:

---

Date: Sun, 17 Nov 91 20:26:37 -0500
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: bboard@lcs.mit.edu
Subject: Protest the AT&T backing store patent, Monday afternoon

You may have heard that AT&T has a patent on a simple technique called
"backing store" which consists of saving the hidden parts of a window
in off-screen memory. AT&T is using this patent to threaten to sue
all the users of X windows, including MIT. A few weeks ago, the X
consortium stated that these threats are "chilling to university
research".

Rob Pike, who obtained this patent for AT&T, is going to be visiting
Tech Square on Monday afternoon. If you don't like AT&T's
monopolistic threat, now's the time to express your opinion by joining
in a quiet protest against his visit.

Let's meet on the fourth floor of 545 Tech Square, near room 430,
around 2:15pm.

Please make a sign, even an el-cheapo sign, to identify yourself as
part of the protest. Make up a slogan on the subject of Pike, AT&T,
backing store, X windows, patents..., then write it with a magic
marker on a piece of copier paper.

(Pike has spoken publicly in favor of software patents and the backing
store patent in particular. He is thus not a reluctant participant in
AT&T's campaign of threats.)

---

This notice has some characteristic inaccuracies, of which the worst
is that AT&T has never threatened to sue anyone over the patent,
but I love it anyway. The penultimate paragraph typifies Stallman
as only Stallman himself can. I asked if Stallman really believed
that people needed instructions on how to make signs. I was told
that among MIT undergraduates a bizarre form of political correctness
had developed, putting Stallman in charge of a pack of eager
misguided nerds who in a healthier environment would probably
be protesting the killing of rats in biology class.

My hosts at LCS were mortified. Stallman had staged a protest
at a company (to complain about Lotus's ``look and feel'' lawsuit),
but had never picketed a technical talk. Moreover, I was the
guest of LCS, not the League, had been invited to give this talk,
and was planning to give a technical lecture, not a legal debate,
on a topic unrelated to patents. There was even a suggestion
about examining the MIT code to see what it said about freedom
of speech. But then someone pointed out that
Stallman, for all his eccentricities (don't get me started) was
polite, and if he wanted a `quiet' protest it would be quiet, and
I agreed.
A wag remarked that in the cloistered MIT world, participating
in a demonstration like this would be a broadening experience.

I was admonished not to talk to the protesters, not to answer
any questions about patents, and to let one of my hosts deflect
any verbal missile hurled at me. I replied that I had expected
as much and was prepared. For example, I was wearing a Bugs Bunny
T-shirt rather than a three-piece suit; I am a researcher more
than an AT&T Ambassador, to quote a giveaway pencil from a few
years back.

I needed to go to the lecture room early to prepare a laptop and
overhead projector for a brief demo during my talk. Because of
a scheduling conflict, the room was not a classroom but a sort
of playroom full of bean bags and not enough chairs.

I quickly identified the protesters: they were the ones in the back
with hand-written slogans on pieces of copier paper taped to their
chests. ``We don't back AT&T's Store of Patents,'' was perhaps
the most creative sign, except for a woman in a wheelchair with a
large placard reading, ``Patents Cripple Software.'' I looked
twice to verify that she needed the wheelchair. I tried to meet
Stallman's gaze but he would only steal surreptitious sideways
glances as he nervously paced back and forth in front of his
entourage.

My equipment set up easily, so I sat down in a reserved chair
in the front row to await the starting time and began chatting
to someone from LCS. Someone tapped my shoulder. I turned to
see a sign-encrusted protester. Physical contact. I braced myself.
He spoke.

- Excuse me, would you mind moving? I won't be able to see the screen.

- Don't worry, I'm giving the talk so I'll be moving all through it.

- Fine. Thanks.

He sat down. At that moment, I finally relaxed with the realization
that nothing ugly was going to happen.

And nothing did. The talk went very well; I was pleased with the
story I told, a technical explanation of how distributed applications
are built in Plan 9 using its namespace operations (patent applied for).
The protesters were surprised, I think, that my subject was interesting
to them. At one point they all applauded spontaneously when I described
a feature of the system. I also think they were surprised that the
inventor of #4,555,775 was funny, theatrical, and clever. At least
half the questions (all technical) during and after the talk were from
the protesters. Stallman said nothing.

Afterwards an LCS member said that, as a result of Stallman's ploy,
my audience was about twice what it would have been. The people
he dragged in came for political reasons but ended up learning
something.

Here's what the League's newsletter said about the event:

---

CAMBRIDGE, MA, November 18, 1991 -- Rob Pike, a software designer
from AT&T Bell Labs, expected to deliver an ordinary seminar on
his latest research project. Instead, he found a room filled
with programmers carrying signs to protest the consequences of
his previous project: the AT&T "backing store" patent which AT&T
has used to threaten all the members of the X Consortium,
including MIT itself.
Of the approximately 80 people present at the talk, about 50
carried protest signs. The protestors (sic) did not try to interfere
with the seminar. They simply raised their signs as Pike began
to speak. This accomplished the purpose of making their ire known.

---

I accomplished my purpose of delivering a ``seminar on [my] latest
research project.'' Other than being a good talk, it was also,
in the end, pretty ordinary.
Wednesday, April 2nd, 2008
6:32 am

When I travel I like to have a hamburger at a McDonald's restaurant. There are a number of these to be found around the world.
8:37 am
MacDonald's not McDonald's


Same general concept. Had a hamburger here too.
Saturday, August 28th, 2010
1:24 am
Know your science
Except for the TV show "The Big Bang Theory", popular culture gets science wrong. We all know that.

But there's a way it tends to get science wrong that upsets me more than most. That is when it misuses the tools of science by willfully ignoring what science actually means.

One common example is celebrity equations, wherein some mathematical-looking expression mixes two or more celebrities together, as in (I'm making this one up and I'm not a cultural critic, let alone a comic, so please bear with me): Lady Gaga = (2*Madonna + Carrot Top)/3. Mathematically savvy readers will recognize that I normalized that equation. If you don't know what that means, you shouldn't be writing celebrity equations, because mathematical equations mean something, they're not just symbols. Like musical comedy based on bad notes, bogus mathematical equations are not funny, just lazy.

Some years ago I even wrote a letter to Entertainment Weekly when they had a long article full of egregious celebrity equations. To their credit, they published the letter and even mended their ways for a while. I quote the letter here:

According to EW math, the more buzz or intelligence you have, the less likely you are to be on the It List. That may be true, but I bet you didn't mean that. Your equation is art-directed nonsense. EW seems to think the joke is that the equations look cute: If Einstein is funny, his square root is hilarious...
In short, mathematics may look funny if you don't understand it but that doesn't make it funny if you misuse it in ignorance.

Another sort of abuse is comedy periodic tables: the periodic table of the vegetables, the periodic table of the desserts, the periodic table of the presidents, and on and on. There are zillions of them. I believe the vegetables one was the first widely distributed example.

What's wrong with them? Again, they miss the point about the one true periodic table, Mendeleev's periodic table of the elements. In fact, to put things with no structure into a periodic table not only misses the point of the periodic table, it misses the profound idea that some things have periods.

Mendeleev's table, by recognizing the periodic structure of the elements, predicted not only properties of the elements, but the very existence of undiscovered elements. It was a breakthrough.

The periodic table is not some artistic layout of letters, it's science at its very best, one of the great results of the 19th century and the birth of modern chemistry. It doesn't honor science to take, say, typefaces and put them in a funny-looking grid. That just mocks the idea that science can predict the way the world works.

Science is not arbitrary. Making arbitrary cultural artifacts by abusing scientific ideas is not just wrong, it's offensive. It cheapens science.

Another area of abuse is quantum mechanics, and a common victim is Heisenberg's uncertainty principle. Despite what some ill-informed academics would have you believe, Heisenberg's principle is not some general statement about weird shit happening in the world, it is a fantastically precise scientific statement about the limits of measurement of two simultaneous physical properties: position and momentum. It's not a metaphor!

What's really sad is that many of the commonest misuses of the terminology of quantum mechanics come from other areas of science and technology. For instance, there is a term in computer engineering called a Heisenbug, which refers to faults that are unpredictable, most often for bugs that go away when you examine them. It's a cute name but it isn't even a correct reference. The quantum mechanical property of things changing when you observe them is not the Heisenberg uncertainty principle, it's the observer effect. These two ideas are often confused but they are not the same. They're not even closely related.

The observer effect in quantum mechanics describes how the act of measuring a quantum system forces the system to cough up a measurable quantity, which triggers a "wave function collapse". Heisenberg's uncertainty principle says that the minimum product of the error in simultaneous measurement of a particle's position and momentum is Planck's constant divided by 4π, or as we write it in physics, ℏ/2. (By the way, that's an extremely small value.)
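
In symbols, for the record: Δx · Δp ≥ ℏ/2 = h/4π, where Δx and Δp are the uncertainties in the particle's position and momentum.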

Not only are these very different ideas, neither of them has anything to do with computer bugs. The term Heisenbug is trendy but bogus and ignores some strange and beautiful ideas. It's no better informed than the square root of Einstein or the periodic table of the typefaces.

If you're going to use the terms of science to inform your world, please make a point to understand the science too. Your world will be richer for it.

Tuesday, August 23rd, 2011
2:36 am
Regular expressions in lexing and parsing
Comments extracted from a code review. I've been asked to disseminate them more widely.

I should say something about regular expressions in lexing and
parsing. Regular expressions are hard to write, hard to write well,
and can be expensive relative to other technologies. (Even when they
are implemented correctly in N*M time, they have significant
overheads, especially if they must capture the output.)

Lexers, on the other hand, are fairly easy to write correctly (if not
as compactly), and very easy to test. Consider finding alphanumeric
identifiers. It's not too hard to write the regexp (something like
"[a-ZA-Z_][a-ZA-Z_0-9]*"), but really not much harder to write as a
simple loop. The performance of the loop, though, will be much higher
and will involve much less code under the covers. A regular expression
library is a big thing. Using one to parse identifiers is like using a
Ferrari to go to the store for milk.

And when we want to adjust our lexer to admit other character types,
such as Unicode identifiers, and handle normalization, and so on, the
hand-written loop can cope easily but the regexp approach will break
down.
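
To illustrate (my own sketch, not from the code review), here is such a loop in Go, with the character tests factored out so that the choice of what counts as an identifier character lives in two small functions rather than in a regular expression. This version already admits Unicode identifiers via the unicode package:

package main

import (
    "fmt"
    "unicode"
)

// isIdentStart and isIdentPart isolate the character classes; narrowing
// them to ASCII or extending them further is a change here, not in the loop.
func isIdentStart(r rune) bool {
    return r == '_' || unicode.IsLetter(r)
}

func isIdentPart(r rune) bool {
    return isIdentStart(r) || unicode.IsDigit(r)
}

// scanIdent returns the identifier at the start of s, if any,
// and the remainder of the input.
func scanIdent(s string) (ident, rest string) {
    for i, r := range s {
        if i == 0 {
            if !isIdentStart(r) {
                return "", s
            }
            continue
        }
        if !isIdentPart(r) {
            return s[:i], s[i:]
        }
    }
    return s, ""
}

func main() {
    fmt.Println(scanIdent("identifier_42 rest")) // "identifier_42", " rest"
}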

A similar argument applies to parsing. Using regular expressions to
explore the parse state to find the way forward is expensive,
overkill, and error-prone. Standard lexing and parsing techniques are
so easy to write, so general, and so adaptable there's no reason to
use regular expressions. They also result in much faster, safer, and
more compact implementations.

Another way to look at it is that lexers and parsers are matching
statically-defined patterns, but regular expressions' strength is that
they provide a way to express patterns dynamically. They're great in
text editors and search tools, but when you know at compile time what
all the things are you're looking for, regular expressions bring far
more generality and flexibility than you need.

Finally, on the point about writing well. Regular expressions are, in
my experience, widely misunderstood and abused. When I do code reviews
involving regular expressions, I fix up a far higher fraction of the
regular expressions in the code than I do regular statements. This is
a sign of misuse: most programmers (no finger pointing here, just
observing a generality) simply don't know what they are or how to use
them correctly.

Encouraging regular expressions as a panacea for all text processing
problems is not only lazy and poor engineering, it also reinforces
their use by people who shouldn't be using them at all.

So don't write lexers and parsers with regular expressions as the
starting point. Your code will be faster, cleaner, and much easier to
understand and to maintain.
Sunday, September 18th, 2011
10:19 pm
User experience
[We open in a well-lit corporate conference room. A meeting has been running for a while. Lots has been accomplished but time is running out.]

[The door opens and a tall, tow-headed twenty-something guy in glasses walks in, carrying a Mac Air and a folder.]

Manager:
Oh, here he is. This is Richard. I asked him to join us today. Glad he could make it. He's got some great user experience ideas.

Richard:
Call me Dick.

Manager:
Dick's done a lot of seminal UX work for us.

Engineer:
Hey, aren't you the guy who's arguing we shouldn't have search in e-books?

Dick:
Absolutely. It's a lousy idea.

Engineer:
What?

Dick:
Books are the best UI ever created. They've been perfected over more than 500 years of development. We shouldn't mess with success.

Product manager:
Well, this is a new age. We should be allowed to ...

Dick:
Books have never had search. If we add search, we'll just confuse the user.

Product manager:
Oh, you're right. We don't want to do that.

Engineer:
But e-books aren't physical books. They're not words on paper. They're just bits, information.

Dick:
Our users don't know that.

Engineer:
Yes they do! They don't want simple books, they want the possibilities that electronic books can bring. Do you know about information theory? Have you even heard of Claude Shannon?

Dick:
Isn't he the chef at that new biodynamic tofu restaurant in North Beach?

Engineer:
Uhh, yeah, OK. But look, you're treating books as a metaphor for your user interface. That's as lame as using a trash can to throw away files and folders. We can do so much more!

Dick:
You misunderstand. Our goal is to make computers easier to use, not to make them more useful.

Product manager:
Wow, that's good.

Engineer:
Wow.

Manager:
Let's get back on track. Dick, you had some suggestions for us?

Dick:
Yeah. I was thinking about the work we did with the Notes iPhone app. Using a font that looked like a felt marker was a big help for users.

Engineer:
Seriously?

Dick:
Yes, it made users feel more comfortable about keeping notes on their phone. Having a font that looks like handwriting helps them forget there's a computer underneath.

Engineer:
I see....

Dick:
Yes, so... I was thinking for the Address Book app for Lion, we should change the look to be like a...

Manager:
Can you show us?

Dick:
Yeah, sure. I have a mock-up here.
[Opens laptop, turns it to face the room.]

Product manager:
An address book! That's fantastic. Look at the detail! Leather, seams at the corners, a visible spine. This is awesome!

Engineer:
It's just a book. It's a throwback. What are you doing? Why does it need to look like a physical address book?

Dick:
Because it is an address book!

Engineer:
No it's not, it's an app!

Dick:
It's a book.

Engineer:
You've made it one. This time it's not even a metaphor - it's literally a book. You're giving up on the possibility of doing more.

Dick:
As I said, users don't care about functionality. They want comfort and familiarity. An Address Book app that looks like an address book will be welcome. Soothing.

Engineer:
If they want a paper address book, they can buy one.

Dick:
Why would they do that if they have one on their desktop?

Engineer:
Can they at least change the appearance? Is there a setting somewhere?

Dick:
Oh, no. We know better than the user - otherwise why are we here? Settings are just confusing.

Engineer:
I ... I really don't understand what's going on.

Manager:
That's OK, you don't have to, but I'd like to give you the action item to build it. End of the quarter OK?

Engineer:
Uhhh, sure.

Manager:
Dick, do you have the requirements doc there?

Dick:
Right here.
[Pushes the folder across the desk.]

Engineer:
Can't you just mail it to me?

Dick:
It's right there.

Engineer:
I know, but... OK.

Manager:
That's a great start, Dick. What else do you have?

Dick:
Well, actually, maybe this is the time to announce that I'm moving on. Today is my last day here.

Manager, Product manager:
[Unison] Oh no!

Dick:
Yeah, sorry about that. I've had an amazing time here changing the world but it's time for me to seek new challenges.

Manager:
Do you have something in mind?

Dick:
Yes, I'm moving north. Microsoft has asked me to head a group there. They've got some amazing new ideas around paper clips.

FADE

Sunday, January 1st, 2012
2:15 am
Esmerelda's Imagination
An actress acquaintance of mine—let's call her Esmerelda—once said, "I can't imagine being anything except an actress." To which the retort was given, "You can't be much of an actress then, can you?"

I was reminded of this exchange when someone said to me about Go, "I can't imagine programming in a language that doesn't have generics." My retort, unspoken this time, was, "You can't be much of a programmer, then, can you?"

This is not an essay about generics (which are a fine thing and may arrive in Go one day, or may not) but about imagination, or at least what passes for imagination among computer programmers: complaint. A friend observed that the definitive modern pastime is to complain online. For the complainers it's fun; for the recipients of the complaint it can be dispiriting. As a recipient, I am pushing back—by complaining, of course.

Not so long ago, a programmer was someone who programs, but that seems to be the last thing programmers do nowadays. Today, the definition of a programmer is someone who complains unless the problem being solved has already been solved and whose solution can be expressed in a single line of code. (From the point of view of a language designer, this reduces to a corollary of language success: every program must be reducible to a single line of code or your language sucks. The lessons of APL have been lost.)

A different, more liberal definition might be that a programmer is someone who approaches every problem exactly the same way and complains about the tools if the approach is unsuccessful.

For the programmer population, the modern pastime demands that if one is required to program, or at least to think while programming, one blogs/tweets/rants instead. I have seen people write thousands of words of on-line vituperation because problem X requires a few more keystrokes than it otherwise might, missing the irony that had they spent those words on programming, they could have solved the problem many times over with the saved keystrokes. But, of course, that would be programming.

Two years ago Go went public. This year, Dart was announced. Both came from Google but from different teams with different goals; they have little in common. Yet I was struck by a property of the criticisms of Dart in the first few days: by doing a global substitution of "Go" for "Dart", many of the early complaints about Go would have fit right into the stream of Dart invective. It was unnecessary to try Go or Dart before commenting publicly on them; in fact, it was important not to (for one thing, trying them would require programming). The criticisms were loud and vociferous but irrelevant because they weren't about the languages at all. They were just a standard reaction to something new, empty of meaning, the result of a modern programmer's need to complain about everything different. Complaints are infinitely recyclable. ("I can't imagine programming in a language without XXX.") After all, they have a low quality standard: they need not be checked by a compiler.

A while after Go launched, the criticisms changed tenor somewhat. Some people had actually tried it, but there were still many complainers, including the one quoted above. The problem now was that imagination had failed: Go is a language for writing Go programs, not Java programs or Haskell programs or any other language's programs. You need to think a different way to write good Go programs. But that takes time and effort, more than most will invest. So the usual story is to translate one program from another language into Go and see how it turns out. But translation misses idiom. A first attempt to write, for example, some Java construct in Go will likely fail, while a different Go-specific approach might succeed and illuminate. After 10 years of Java programming and 10 minutes of Go programming, any comparison of the language's capabilities is unlikely to generate insight, yet here come the results, because that's a modern programmer's job.

It's not all bad, of course. Two years on, Go has lots of people who've spent the time to learn how it's meant to be used, and for many willing to invest such time the results have been worthwhile. It takes time and imagination and programming to learn how to use any language well, but it can be time well spent. The growing Go community has generated lots of great software and has given me hope, hope that there may still be actual programmers out there.

However, I still see far too much ill-informed commentary about Go on the web, so for my own protection I will start 2012 with a resolution:

I resolve to recognize that a complaint reveals more about the complainer than the complained-about. Authority is won not by rants but by experience and insight, which require practice and imagination. And maybe some programming.

Wednesday, April 4th, 2012
5:22 am
The byte order fallacy
Whenever I see code that asks what the native byte order is, it's almost certain the code is either wrong or misguided. And if the native byte order really does matter to the execution of the program, it's almost certain to be dealing with some external software that is either wrong or misguided. If your code contains #ifdef BIG_ENDIAN or the equivalent, you need to unlearn about byte order.

The byte order of the computer doesn't matter much at all except to compiler writers and the like, who fuss over allocation of bytes of memory mapped to register pieces. Chances are you're not a compiler writer, so the computer's byte order shouldn't matter to you one bit.

Notice the phrase "computer's byte order". What does matter is the byte order of a peripheral or encoded data stream, but--and this is the key point--the byte order of the computer doing the processing is irrelevant to the processing of the data itself. If the data stream encodes values with byte order B, then the algorithm to decode the value on a computer with byte order C should be about B, not about the relationship between B and C.

Let's say your data stream has a little-endian-encoded 32-bit integer. Here's how to extract it (assuming unsigned bytes):
i = (data[0]<<0) | (data[1]<<8) | (data[2]<<16) | (data[3]<<24);
If it's big-endian, here's how to extract it:
i = (data[3]<<0) | (data[2]<<8) | (data[1]<<16) | (data[0]<<24);
Both these snippets work on any machine, independent of the machine's byte order, independent of alignment issues, independent of just about anything. They are totally portable, given unsigned bytes and 32-bit integers.
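
In Go, to pick one type-safe language, the standard encoding/binary package captures the same discipline: the decoder names the byte order of the data, and the machine's order never enters into it. A minimal sketch:

package main

import (
    "encoding/binary"
    "fmt"
)

func main() {
    data := []byte{0x78, 0x56, 0x34, 0x12}
    // Each decoder states the byte order of the data stream;
    // neither asks anything about the machine it runs on.
    le := binary.LittleEndian.Uint32(data) // 0x12345678
    be := binary.BigEndian.Uint32(data)    // 0x78563412
    fmt.Printf("%#x %#x\n", le, be)
}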

What you might have expected to see for the little-endian case was something like
i = *((int*)data);
#ifdef BIG_ENDIAN
/* swap the bytes */
i = ((i&0xFF)<<24) | (((i>>8)&0xFF)<<16) | (((i>>16)&0xFF)<<8) | (((i>>24)&0xFF)<<0);
#endif
or something similar. I've seen code like that many times. Why not do it that way? Well, for starters:
  1. It's more code.
  2. It assumes integers are addressable at any byte offset; on some machines that's not true.
  3. It depends on integers being 32 bits long, or requires more #ifdefs to pick a 32-bit integer type.
  4. It may be a little faster on little-endian machines, but not much, and it's slower on big-endian machines.
  5. If you're using a little-endian machine when you write this, there's no way to test the big-endian code.
  6. It swaps the bytes, a sure sign of trouble (see below).

By contrast, my version of the code:
  1. Is shorter.
  2. Does not depend on alignment issues.
  3. Computes a 32-bit integer value regardless of the local size of integers.
  4. Is equally fast regardless of local endianness, and fast enough (especially on modern processors) anyway.
  5. Runs the same code on all computers: I can state with confidence that if it works on a little-endian machine it will work on a big-endian machine.
  6. Never "byte swaps".
In other words, it's simpler, cleaner, and utterly portable. There is no reason to ask about local byte order when about to interpret an externally provided byte stream.

I've seen programs that end up swapping bytes two, three, even four times as layers of software grapple over byte order. In fact, byte-swapping is the surest indicator the programmer doesn't understand how byte order works.

Why do people make the byte order mistake so often? I think it's because they've seen a lot of bad code that has convinced them byte order matters. "Here comes an encoded byte stream; time for an #ifdef." In fact, C may be part of the problem: in C it's easy to make byte order look like an issue. If instead you try to write byte-order-dependent code in a type-safe language, you'll find it's very hard. In a sense, byte order only bites you when you cheat.

There's plenty of software that demonstrates the byte order fallacy is really a fallacy. The entire Plan 9 system ran, without architecture-dependent #ifdefs of any kind, on dozens of computers of different makes, models, and byte orders. I promise you, your computer's byte order doesn't matter even at the level of the operating system.

And there's plenty of software that demonstrates how easily you can get it wrong. Here's one example. I don't know if it's still true, but some time back Adobe Photoshop screwed up byte order. Back then, Macs were big-endian and PCs, of course, were little-endian. If you wrote a Photoshop file on the Mac and read it back in, it worked. If you wrote it on a PC and tried to read it on a Mac, though, it wouldn't work unless back on the PC you checked a button that said you wanted the file to be readable on a Mac. (Why wouldn't you? Seriously, why wouldn't you?) Ironically, when you read a Mac-written file on a PC, it always worked, which demonstrates that someone at Adobe figured out something about byte order. But there would have been no problems transferring files between machines, and no need for a check box, if the people at Adobe wrote proper code to encode and decode their files, code that could have been identical between the platforms. I guarantee that to get this wrong took far more code than it would have taken to get it right.

Just last week I was reviewing some test code that was checking byte order, and after some discussion it turned out that there was a byte-order-dependency bug in the code being tested. As is often the case, the existence of byte-order-checking was evidence of the presence of a bug. Once the bug was fixed, the test no longer cared about byte order.

And neither should you, because byte order doesn't matter.

Monday, June 25th, 2012
9:35 pm
Less is exponentially more

Here is the text of the talk I gave at the Go SF meeting in June, 2012.

This is a personal talk. I do not speak for anyone else on the Go team here, although I want to acknowledge right up front that the team is what made and continues to make Go happen. I'd also like to thank the Go SF organizers for giving me the opportunity to talk to you.

I was asked a few weeks ago, "What was the biggest surprise you encountered rolling out Go?" I knew the answer instantly: Although we expected C++ programmers to see Go as an alternative, instead most Go programmers come from languages like Python and Ruby. Very few come from C++.

We—Ken, Robert and myself—were C++ programmers when we designed a new language to solve the problems that we thought needed to be solved for the kind of software we wrote. It seems almost paradoxical that other C++ programmers don't seem to care.

I'd like to talk today about what prompted us to create Go, and why the result should not have surprised us like this. I promise this will be more about Go than about C++, and that if you don't know C++ you'll be able to follow along.

The answer can be summarized like this: Do you think less is more, or less is less?

Here is a metaphor, in the form of a true story.  Bell Labs centers were originally assigned three-digit numbers: 111 for Physics Research, 127 for Computing Sciences Research, and so on. In the early 1980s a memo came around announcing that as our understanding of research had grown, it had become necessary to add another digit so we could better characterize our work. So our center became 1127. Ron Hardin joked, half-seriously, that if we really understood our world better, we could drop a digit and go down from 127 to just 27. Of course management didn't get the joke, nor were they expected to, but I think there's wisdom in it. Less can be more. The better you understand, the pithier you can be.

Keep that idea in mind.

Back around September 2007, I was doing some minor but central work on an enormous Google C++ program, one you've all interacted with, and my compilations were taking about 45 minutes on our huge distributed compile cluster. An announcement came around that there was going to be a talk presented by a couple of Google employees serving on the C++ standards committee. They were going to tell us what was coming in C++0x, as it was called at the time. (It's now known as C++11).

In the span of an hour at that talk we heard about something like 35 new features that were being planned. In fact there were many more, but only 35 were described in the talk. Some of the features were minor, of course, but the ones in the talk were at least significant enough to call out. Some were very subtle and hard to understand, like rvalue references, while others are especially C++-like, such as variadic templates, and some others are just crazy, like user-defined literals.

At this point I asked myself a question: Did the C++ committee really believe that what was wrong with C++ was that it didn't have enough features? Surely, in a variant of Ron Hardin's joke, it would be a greater achievement to simplify the language rather than to add to it. Of course, that's ridiculous, but keep the idea in mind.

Just a few months before that C++ talk I had given a talk myself, which you can see on YouTube, about a toy concurrent language I had built way back in the 1980s. That language was called Newsqueak and of course it is a precursor to Go.

I gave that talk because there were ideas in Newsqueak that I missed in my work at Google and I had been thinking about them again.  I was convinced they would make it easier to write server code and Google could really benefit from that.

I actually tried and failed to find a way to bring the ideas to C++. It was too difficult to couple the concurrent operations with C++'s control structures, and in turn that made it too hard to see the real advantages. Plus C++ just made it all seem too cumbersome, although I admit I was never truly facile in the language. So I abandoned the idea.

But the C++0x talk got me thinking again.  One thing that really bothered me—and I think Ken and Robert as well—was the new C++ memory model with atomic types. It just felt wrong to put such a microscopically-defined set of details into an already over-burdened type system. It also seemed short-sighted, since it's likely that hardware will change significantly in the next decade and it would be unwise to couple the language too tightly to today's hardware.

We returned to our offices after the talk. I started another compilation, turned my chair around to face Robert, and started asking pointed questions. Before the compilation was done, we'd roped Ken in and had decided to do something. We did not want to be writing in C++ forever, and we—me especially—wanted to have concurrency at my fingertips when writing Google code. We also wanted to address the problem of "programming in the large" head on, about which more later.

We wrote on the white board a bunch of stuff that we wanted, desiderata if you will. We thought big, ignoring detailed syntax and semantics and focusing on the big picture.

I still have a fascinating mail thread from that week. Here are a couple of excerpts:

Robert: Starting point: C, fix some obvious flaws, remove crud, add a few missing features.

Rob: name: 'go'.  you can invent reasons for this name but it has nice properties. it's short, easy to type. tools: goc, gol, goa.  if there's an interactive debugger/interpreter it could just be called 'go'.  the suffix is .go.

Robert: Empty interfaces: interface {}. These are implemented by all interfaces, and thus this could take the place of void*.

We didn't figure it all out right away. For instance, it took us over a year to figure out arrays and slices. But a significant amount of the flavor of the language emerged in that first couple of days.

Notice that Robert said C was the starting point, not C++. I'm not certain but I believe he meant C proper, especially because Ken was there. But it's also true that, in the end, we didn't really start from C. We built from scratch, borrowing only minor things like operators and brace brackets and a few common keywords. (And of course we also borrowed ideas from other languages we knew.) In any case, I see now that we reacted to C++ by going back down to basics, breaking it all down and starting over. We weren't trying to design a better C++, or even a better C. It was to be a better language overall for the kind of software we cared about.

In the end of course it came out quite different from either C or C++. More different even than many realize. I made a list of significant simplifications in Go over C and C++:

  • regular syntax (don't need a symbol table to parse)
  • garbage collection (only)
  • no header files
  • explicit dependencies
  • no circular dependencies
  • constants are just numbers
  • int and int32 are distinct types
  • letter case sets visibility
  • methods for any type (no classes)
  • no subtype inheritance (no subclasses)
  • package-level initialization and well-defined order of initialization
  • files compiled together in a package
  • package-level globals presented in any order
  • no arithmetic conversions (constants help)
  • interfaces are implicit (no "implements" declaration)
  • embedding (no promotion to superclass)
  • methods are declared as functions (no special location)
  • methods are just functions
  • interfaces are just methods (no data)
  • methods match by name only (not by type)
  • no constructors or destructors
  • postincrement and postdecrement are statements, not expressions
  • no preincrement or predecrement
  • assignment is not an expression
  • evaluation order defined in assignment, function call (no "sequence point")
  • no pointer arithmetic
  • memory is always zeroed
  • legal to take address of local variable
  • no "this" in methods
  • segmented stacks
  • no const or other type annotations
  • no templates
  • no exceptions
  • builtin string, slice, map
  • array bounds checking

And yet, with that long list of simplifications and missing pieces, Go is, I believe, more expressive than C or C++. Less can be more.

But you can't take out everything. You need building blocks such as an idea about how types behave, and syntax that works well in practice, and some ineffable thing that makes libraries interoperate well.

We also added some things that were not in C or C++, like slices and maps, composite literals, expressions at the top level of the file (which is a huge thing that mostly goes unremarked), reflection, garbage collection, and so on. Concurrency, too, naturally.

One thing that is conspicuously absent is of course a type hierarchy. Allow me to be rude about that for a minute.

Early in the rollout of Go I was told by someone that he could not imagine working in a language without generic types. As I have reported elsewhere, I found that an odd remark.

To be fair he was probably saying in his own way that he really liked what the STL does for him in C++. For the purpose of argument, though, let's take his claim at face value.

What it says is that he finds writing containers like lists of ints and maps of strings an unbearable burden. I find that an odd claim. I spend very little of my programming time struggling with those issues, even in languages without generic types.

But more important, what it says is that types are the way to lift that burden. Types. Not polymorphic functions or language primitives or helpers of other kinds, but types.

That's the detail that sticks with me.

Programmers who come to Go from C++ and Java miss the idea of programming with types, particularly inheritance and subclassing and all that. Perhaps I'm a philistine about types but I've never found that model particularly expressive.

My late friend Alain Fournier once told me that he considered the lowest form of academic work to be taxonomy. And you know what? Type hierarchies are just taxonomy. You need to decide what piece goes in what box, every type's parent, whether A inherits from B or B from A.  Is a sortable array an array that sorts or a sorter represented by an array? If you believe that types address all design issues you must make that decision.

I believe that's a preposterous way to think about programming. What matters isn't the ancestor relations between things but what they can do for you.

That, of course, is where interfaces come into Go. But they're part of a bigger picture, the true Go philosophy.

If C++ and Java are about type hierarchies and the taxonomy of types, Go is about composition.

Doug McIlroy, the eventual inventor of Unix pipes, wrote in 1964 (!):
We should have some ways of coupling programs like garden hose--screw in another segment when it becomes necessary to massage data in another way. This is the way of IO also.
That is the way of Go also. Go takes that idea and pushes it very far. It is a language of composition and coupling.

The obvious example is the way interfaces give us the composition of components. It doesn't matter what that thing is, if it implements method M I can just drop it in here.

Another important example is how concurrency gives us the composition of independently executing computations.

And there's even an unusual (and very simple) form of type composition: embedding.

These compositional techniques are what give Go its flavor, which is profoundly different from the flavor of C++ or Java programs.
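
To make those three kinds of composition concrete, here is a tiny sketch of my own; the names are invented for illustration:

package main

import (
    "fmt"
    "strings"
)

// Interface composition: anything with a Greet method drops in.
type Greeter interface {
    Greet() string
}

type English struct{}

func (English) Greet() string { return "hello" }

// Embedding: Loud wraps any existing Greeter and layers on behavior.
type Loud struct {
    Greeter
}

func (l Loud) Greet() string { return strings.ToUpper(l.Greeter.Greet()) }

func main() {
    // Concurrent composition: independently executing goroutines,
    // coupled by a channel. The two greetings arrive in either order.
    ch := make(chan string)
    for _, g := range []Greeter{English{}, Loud{English{}}} {
        g := g // capture the loop variable for the goroutine
        go func() { ch <- g.Greet() }()
    }
    fmt.Println(<-ch, <-ch)
}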

===========

There's an unrelated aspect of Go's design I'd like to touch upon: Go was designed to help write big programs, written and maintained by big teams.

There's this idea about "programming in the large" and somehow C++ and Java own that domain. I believe that's just a historical accident, or perhaps an industrial accident. But the widely held belief is that it has something to do with object-oriented design.

I don't buy that at all. Big software needs methodology to be sure, but not nearly as much as it needs strong dependency management and clean interface abstraction and superb documentation tools, none of which is served well by C++ (although Java does noticeably better).

We don't know yet, because not enough software has been written in Go, but I'm confident Go will turn out to be a superb language for programming in the large. Time will tell.

===========

Now, to come back to the surprising question that opened my talk:

Why does Go, a language designed from the ground up for what C++ is used for, not attract more C++ programmers?

Jokes aside, I think it's because Go and C++ are profoundly different philosophically.

C++ is about having it all there at your fingertips. I found this quote on a C++11 FAQ:
The range of abstractions that C++ can express elegantly, flexibly, and at zero costs compared to hand-crafted specialized code has greatly increased.
That way of thinking just isn't the way Go operates. Zero cost isn't a goal, at least not zero CPU cost. Go's claim is that minimizing programmer effort is a more important consideration.

Go isn't all-encompassing. You don't get everything built in. You don't have precise control of every nuance of execution. For instance, you don't have RAII. Instead you get a garbage collector. You don't even get a memory-freeing function.

What you're given is a set of powerful but easy to understand, easy to use building blocks from which you can assemble—compose—a solution to your problem. It might not end up quite as fast or as sophisticated or as ideologically motivated as the solution you'd write in some of those other languages, but it'll almost certainly be easier to write, easier to read, easier to understand, easier to maintain, and maybe safer.

To put it another way, oversimplifying of course:

Python and Ruby programmers come to Go because they don't have to surrender much expressiveness, but gain performance and get to play with concurrency.

C++ programmers don't come to Go because they have fought hard to gain exquisite control of their programming domain, and don't want to surrender any of it. To them, software isn't just about getting the job done, it's about doing it a certain way.

The issue, then, is that Go's success would contradict their world view.

And we should have realized that from the beginning. People who are excited about C++11's new features are not going to care about a language that has so much less.  Even if, in the end, it offers so much more.

Thank you.
