The saga of Accumulo bug 4379

Not too long ago, I was setting up an installation of the Accumulo distributed database. At some point in the process, I ran Accumulo’s bin/ script, which printed out the following:

HADOOP_PREFIX not set cannot automatically configure LD_LIBRARY_PATH
Please remember to compile the native libraries using the bin/ script and to set the LD_LIBRARY_PATH variable in the /home/zk/accumulo/conf/ script if needed.

Taken together, these two messages seemed to mean that Accumulo couldn’t find its native library. (I was wrong, but we’ll get to that.) But I had compiled it just a few seconds previously, using the bin/ script that the second message was urging me to use. I wasn’t sure why I was seeing these messages, but since the script didn’t seem thrilled that the HADOOP_PREFIX environment variable was not set, I decided to start there.

I had previously installed Hadoop, but in doing so had not set HADOOP_PREFIX. I did, however, have HADOOP_HOME set, so, as an experiment, I tried setting HADOOP_PREFIX to the same thing as HADOOP_HOME and running bin/ again.

This time, the result was:

Native libraries could not be found for your sytem in: [path that HADOOP_PREFIX was pointing to]
Please remember to compile the native libraries using the bin/ script and to set the LD_LIBRARY_PATH variable in the /home/zk/accumulo/conf/ script if needed.

This was even more of a head-scratcher. Now, it seemed more like the script was complaining that I hadn’t compiled the Hadoop native library when I had set up Hadoop a few days prior (and, indeed, I had not compiled it; it was an optional part of Hadoop setup that I had skipped).

Was the “Please remember to compile the native libraries” message actually about the Hadoop native library and not about the Accumulo one? Confused, I asked a question on Server Fault. Someone came along and tried to answer my question, but it didn’t help.

Finally, I dug into the script that had been producing these messages. Reading the code, I realized that the “Please remember to compile the native libraries” message was just a courtesy reminder that everyone who was trying to do a native Accumulo configuration (that is, a configuration that expected the Accumulo native library to be present) would see, and that I shouldn’t be worried about it; it did not mean that the library did not exist. The “Native libraries could not be found for your sytem”[1] message, on the other hand, came from a different part of the script, and it meant that the Hadoop native library couldn’t be found. So, there were two messages, being printed one right after the other, each referring to “native libraries”, but each meaning something different by that phrase: the first message meant the Hadoop native libraries, and the second meant the Accumulo native libraries.
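The way the two messages end up side by side is easier to see in code. What follows is a hypothetical sketch, not Accumulo’s actual script: the function name `report_native_libs`, the `lib/native` directory layout, and the order of the checks are all assumptions made for illustration. Only the `HADOOP_PREFIX` variable and the message wording come from the output quoted above, and the second message is abbreviated.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the message flow; NOT the actual Accumulo script.
# Assumed for illustration: the function name, the lib/native layout,
# and the order of the checks.

report_native_libs() {
  local hadoop_prefix="$1"   # value of HADOOP_PREFIX, possibly empty

  if [ -z "$hadoop_prefix" ]; then
    # Nothing to search: the script cannot look for Hadoop's libraries at all.
    echo "HADOOP_PREFIX not set cannot automatically configure LD_LIBRARY_PATH"
  elif [ ! -d "$hadoop_prefix/lib/native" ]; then
    # This message is about the *Hadoop* native libraries ("sytem" is sic).
    echo "Native libraries could not be found for your sytem in: $hadoop_prefix"
  fi

  # Courtesy reminder about the *Accumulo* native libraries. It is printed
  # regardless of what happened above (message text abbreviated here).
  echo "Please remember to compile the native libraries [...]"
}
```

Under these assumptions, calling `report_native_libs "${HADOOP_PREFIX:-}"` always ends with the reminder, whether or not the Accumulo library has been built. That is why the reminder reads like a second complaint when it follows directly after the Hadoop message.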

I thought about filing an issue against Accumulo to suggest that the wording could be made less confusing. In fact, I already had an account, created back in May when I had started using ZooKeeper.[2] Once confronted with Accumulo’s long, tedious JIRA “Create Issue” form, though, I realized I wasn’t sure what to do. Did my issue count as a “Bug”, or was it an “Improvement”? Which versions of Accumulo were affected? (I knew which version I’d been trying to use, but should I just list that one, or did I need to check all the others before filing the bug?) Should I list the “Component” as “native”, “shell”, “scripts”, or something else? And I didn’t feel comfortable providing an estimate of how long the issue should take someone to fix.

Most of the form fields weren’t required. Still, having never contributed to the project before, I didn’t know what the project norms and customs were. I didn’t know if leaving most of the form blank might, for instance, lead to the issue languishing unattended, or if it would be considered rude or thoughtless. So I gave up on JIRA and just added a comment on Server Fault summarizing my most recent thinking about the bug.

A little while later, the person who’d attempted to answer my question — who, it turned out, was longtime Accumulo contributor Josh Elser — responded saying he now understood my complaint and had filed an issue himself. Hooray! I commented on the newly filed issue to provide some more clarification. About a month after that, Dave Marion committed a fix, and all was well. The messages now specify “Hadoop native libraries” and “Accumulo native libraries”, respectively.

I want to thank Josh for how he handled this bug. He could have given up on helping me after his original attempt at answering my question, but he stuck around until my complaint was clear to him. Then, he thanked me for pointing out the issue. In fact, Josh thanked me twice — once in a comment on Server Fault, and then again on the JIRA issue after I commented there. This made such an impression on me that I actually made a point of sending a link to the JIRA issue to several colleagues, saying things like, “Hey, check this out — the Accumulo people are great, and they’re going to fix my bug! This is how open source is supposed to work!” I turned into an Accumulo cheerleader for a day, just because one project contributor offered a sincere-sounding “Thanks!” to a stranger on the Internet.

“But wait!”, I asked myself as I began writing this post. “Why am I so happy about how all this ended up? Wouldn’t it have been better if I could have just filed the bug myself, instead of having to rely on a project insider to do it for me?”

Most of my open-source experience has been with projects that prioritize making it fast and easy for ordinary users, folks who aren’t project insiders and might not contribute to the project otherwise, to put stuff in the issue tracker. That’s the approach I’ve come to expect from most projects, and “Use the issue tracker!” is what I assume most projects will expect of me. That’s why I initially tried to use the Accumulo issue tracker once I knew I had a bug to file. But no Accumulo person had actually ever said “Use the issue tracker!” to me. That came from my own expectations, not theirs. Once Josh understood my complaint on Server Fault, he could have said, “Oh, yeah, that sounds like a problem — can you please file a bug?” That might have emboldened me to face JIRA again, or it might not have; instead of tossing that die, he just filed the bug himself.

My experience with this bug led me to question some of my assumptions about the superiority of the “Use the issue tracker!” approach. One of the assumptions I was making, for instance, was that there’s no good way to bring bugs to project insiders’ attention other than through the issue tracker. But that’s not necessarily the case: the Accumulo people, for instance, seem to be proactively keeping an eye on Q&A sites to see what problems users are having, and they’re responding to those problems. They’re not waiting around for confused users to show up on their issue tracker before they begin offering help.

So I’m interested in hearing about projects that have successfully adopted an “only insiders use the issue tracker” approach. For instance, a project might have a mailing list where users discuss bugs in an unstructured way, and project insiders distill those discussions into bug reports to be entered into the issue tracker. Where does this approach succeed, and where does it fail? How can projects that operate this way effectively communicate their expectations to non-insider users, especially those users who might be more accustomed to using issue trackers directly?

  1. “System” was misspelled. In fact, it still is! Maybe some Accumulo person will read this and fix it.

  2. I had created the account because I had wanted to do some minor cleanup of this page and thought that making an account might automatically give me edit access. It didn’t, which is understandable; as Karl Fogel notes in his book on producing open source software, even wikis that require account creation will attract spam if it’s easy for spammers to get an account. Still, I wish there were a push-button way to suggest minor edits to the page.