I’ve had some good laughs at work lately, and they stem from the idea that a particular set of technical symptoms means one thing when, as it turns out, it means something quite different.
I was given a task to investigate the cause of annoying email system outages that come and go, affect different sets of users at different times, and seem to occur in certain geographies more than others. That is how it was worded, and it came to the directory engineering team as an inquiry regarding authentication failures on the directory servers.
Since I am the “new kid” in the group, I spent time thinking about the problem as it was described while familiarizing myself with the mail system architecture. After verifying that the domain controllers (the directory servers that authenticate users to the mail system) were not the problem, I developed a multifaceted hypothesis about why some users might suddenly be prompted for authentication and have it refused. My hypothesis was based on things that can go wrong with the individual systems within the mail system during authentication. But I wasn’t satisfied with just guessing, so I set out to collect data from users experiencing the problem in order to validate it.
I asked the mail team to send me some of the data they had collected so I could have a look at it. I received data on about 50 different users, for only about 20 of whom there was enough data to analyze. In going through the attached copies of the workstations’ system logs, I found that all of the users were experiencing local issues. For example, the first specimen analyzed showed that the user was being locked out at the document management system for bad passwords. Another user had NAT enabled on her workstation; her system log was stuffed full of NAT errors, indicating a network connectivity issue. Under these circumstances, the mail client would indeed present a logon prompt, and any input provided by the user would be refused.
The humorous part is that this “problem” has been going on since at least August and has received attention from many sets of eyes, including support from the software developer, without resolution. It is humorous, at least, until one considers that the affected users went without a resolution all that time, since I am likely the first person to actually look at the data and identify the real problem. The delay is not without consequence: the company hires people to generate revenue, and they can’t do that while the tools they use for that purpose are in a nonfunctional state.
I apologize for being longwinded in presenting my anecdote. I went into detail because, after pondering this situation, I recalled that the environment in which I work has not changed much since I left it. I suspect that it isn’t much different anywhere else: a communicated idea taken at face value turns out to be something quite different when subjected to even cursory scrutiny.
In the past, some have suggested that I am smart and that is why I have been successful at what I do. But I do not believe that I have been gifted with extraordinary intelligence, just that I am more inquisitive and willing to look beyond surface appearances to get to the root of issues. One should, and in many cases must, question everything to be sure that what one is being told makes sense. But that rarely happens when certain ideas of importance are communicated, such as the various kinds of rhetoric floating around about why the economy is in the intensive care unit.
Nearly everyone has the ability to choose what to believe, but we have to be willing to make the effort to be as certain as possible that we aren’t just taking someone else’s word for it when they describe their ideas. In my opinion, the golden rule ought to be “never reason from a high-level idea found floating around the media”; only then should we move on to “never reason from a price change.”