Hi! This is where I write whatever I think needs sharing.

Announcing AlanDeniseEricLauren, an implementation of the ADEL algorithm

Posted on August 23, 2016

I uploaded AlanDeniseEricLauren to Hackage today. Here's the README:

AlanDeniseEricLauren is an implementation of the ADEL algorithm for efficiently finding the minimal subset of an input set satisfying some arbitrary upward-closed property. "Upward-closed" means if the property is true of some set S it is true of all supersets of S. My implementation is trivially extended to maps (dictionaries).

This can be used for e.g. narrowing down bugs by finding the minimal subset of a complex failing test case that still exhibits the issue. In addition, I provide a method for finding the minimal set of changes between a known-good and known-bad example needed to trigger a bug. (Equivalently, a set where the property is false and a set where it's true.)
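To make "upward-closed" and "minimal" concrete, here's a naive sketch of the problem the library solves. This greedy, drop-one-element-at-a-time loop is my own illustration, not the ADEL algorithm and not the package's actual API; ADEL reaches the same kind of answer with far fewer evaluations of the property.

import qualified Data.Set as Set

-- Naive 1-minimization: repeatedly drop any single element whose removal
-- keeps the property true (assuming it holds for the input set). For an
-- upward-closed property, a set from which no single element can be dropped
-- is a genuinely minimal satisfying subset. Cost: O(n) property evaluations
-- per deletion, O(n^2) overall, which is what a smarter algorithm avoids.
naiveMinimize :: Ord a => (Set.Set a -> Bool) -> Set.Set a -> Set.Set a
naiveMinimize property = go
  where
    go s =
      case [ x | x <- Set.toList s, property (Set.delete x s) ] of
        (x : _) -> go (Set.delete x s)
        []      -> s

-- e.g. naiveMinimize stillTriggersBug failingTestCase, where stillTriggersBug
-- runs the test on a candidate subset of the original failing input.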

The ADEL algorithm is due to Philippe Laborie in his paper "An Optimal Iterative Algorithm for Extracting MUCs in a Black-box Constraint Network" published in ECAI 2014. doi:10.3233/978-1-61499-419-0-1051. The paper is available at http://ebooks.iospress.nl/publication/37115.

The project's homepage is https://github.com/enolan/AlanDeniseEricLauren. Bug reports and PRs can be submitted there.

As of August 2016, I am looking for work. If your company needs a good programmer, my email is echo@echonolan.net. My resume is available here.

Notes toward an empirical study of programming language effectiveness

Posted on May 19, 2016

I've decided to do an empirical study of the effects of programming language choice. As a PL enthusiast, I'd like to know what languages are actually helpful in practice and whether the ones I like are actually any good. The existing research is, well, not great. In this post, I'll describe my current thoughts. If you have feedback on the design, please leave a comment - I make strong statements below, but I'm sure I've gotten some things wrong. If you're interested in collaborating, get in touch too. This is big and complicated and I'd love to work with others on it.

A randomized controlled trial would be nice if I had infinite money and time

An observational study is the only way to go. Experiments either use trivial tasks and small n or become infeasibly expensive, even with institutional backing. Consider a typical software project: we're talking hundreds of person-hours over a period of at least months. Projects done in a lab over a few days aren't similar enough to be comparable. Readability, types, testing, documentation, etc. matter much less when it's all fresh in your mind and you don't have to explain it to anyone. Refactoring is unlikely to happen at all. Let's make up some numbers: a smallish project over three months with three developers working full time. We'll assume they're cheap and put them at $24/hr (the 25th percentile). $24/hr * 40 hrs/week * 4 weeks/month * 3 months * 3 developers is $34,560. Per case. For a sample size that gets you significance, we're talking hundreds of thousands to millions of dollars. Maybe more, since I expect the outcome standard deviation to be high.

Given the decision to do an observational study, i.e. to collect data on projects that already happened, we need to control for confounders. Confounding variables aside, collecting anything that influences outcomes will give us more accurate predictions and allow us to detect smaller effects. The easy/obvious ones are project age, start year, team size, topic area, commercial vs. noncommercial, and company age at project start. It'd be best if I could also measure team skill, but I don't have a test for that and even if I did I couldn't administer it. Experience level would be a decent proxy, but I also won't be able to measure that. I just have to hope the measurable stuff is an adequate proxy. There are probably some I'm missing.

How to judge outcomes

People make all sorts of claims about the advantages of particular programming languages. Java is good for large enterprise projects, Ruby lets you get something working fast, C lets you write fast programs, Haskell makes it harder to write bugs, etc, etc. In the interest of simplicity I've decided to skip anything more precise than user satisfaction. Everything else is difficult to measure consistently and only instrumental to the actual point of the software. I know it's weird that I'm advocating a subjective measure in the interest of consistency, but more specific things like bug rate, performance and feature count are subjective too and difficult to compare across projects and categories. What counts as a bug? Is this one bug or five? Is this rendering engine faster than this compiler? Is CSS3 support one feature or lots? Etc.

So we'll survey users. "Have you used a program whose name starts with $RANDOM_LETTER in the last week?" If they haven't, try again until they have. "What was it? How satisfied are you with the program?" The randomization is necessary: if we ask for the last program used, all the responses will be whatever led them to the survey (Twitter, email, SurveyMonkey, Firefox); if we pick a specific random program, many respondents won't have interacted with it, or we'll only be able to ask about very popular ones. Maybe there's a standard way to deal with this? Let me know.

It's possible respondents' opinions on the programming language(s) used affect their evaluations. I could collect their opinions as a control, but I'm not convinced it's a real problem, and it would make the survey take longer and consequently get fewer responses.

How to collect explanatory variables

We need to collect the languages used in the projects, what components they're used for, confounding variables and any other useful predictors. For open source software this is relatively easy - I can probably write a tool to gather language and component information and maybe even topic; age and contributor count are in source control. For commercial projects we may have to survey people who work there. I expect some programmers would be willing to volunteer: many of us are interested in the topic. Maybe I'm incorrectly assuming that people are like me though.

If gathering data on proprietary projects proves too difficult, we can exclude them for now although that will mean throwing out a lot of data.

Social factors may also be useful. Finding out whether Linus' verbal abuse hurts the project would be interesting. Sentiment analysis of mailing list and IRC archives would work. Checking whether a code of conduct is in place is also doable. This is pretty complicated, so I'll hold off until the easier, more obviously relevant stuff is done.

The model

This is the complicated bit, and the one where I'm most dissatisfied with my answers.

Every substantial project involves more than one language. This website involves Haskell and a tiny bit of JavaScript. GHC involves Haskell, C, Make, Python, shell, C-- and Perl. Every non-Node web application involves at least two.

The simplest solution is to only consider the "majority" language. I threw that one away. The point is to offer advice about what to choose, and in real projects you often have to choose more than one.

So we have the constraint that the model must handle multiple languages and that different projects may have different numbers of languages. Additionally, we'd like to account for which language is used in which component. Maybe it's bad to write your compiler in C, but good to write your RTS in C. Or Lua could be good for writing game logic but bad for renderers.

The variable number of features and the fact that they have no obvious numerical mapping put us into the realm of machine learning algorithms. In particular, I intend to use an SVM. The non-empty set of component-language pairs leads straightforwardly to a similarity metric: projects with the same components in the same languages are the most similar, projects with the same components in different languages are less similar, and projects with different components altogether are the least similar. I even found a paper on bag-of-tuples kernels. The other features have simple interpretations as dummies and reals.
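For concreteness, here's a rough sketch of the kind of similarity function I have in mind. This is my own illustration, not taken from the bag-of-tuples paper; the 1.0 and 0.5 weights are placeholders I'd have to tune, and a real kernel would need normalization so that big projects don't look similar to everything.

import qualified Data.Set as Set

-- A project is modeled as a set of (component, language) pairs,
-- e.g. {("compiler", "Haskell"), ("RTS", "C"), ("build system", "Make")}.
type Project = Set.Set (String, String)

-- Bag-of-tuples style similarity: sum a pair-level score over all pairs.
similarity :: Project -> Project -> Double
similarity a b = sum [ score p q | p <- Set.toList a, q <- Set.toList b ]
  where
    score (c1, l1) (c2, l2)
      | c1 == c2 && l1 == l2 = 1.0  -- same component, same language
      | c1 == c2             = 0.5  -- same component, different language
      | otherwise            = 0.0  -- unrelated components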

Using an SVM over a more typical statistical technique involves some substantial sacrifices. First is complexity: I'm designing something in a field I just started learning about, and there are plenty of opportunities to get it wrong. Second is interpretability: I won't be able to say things like "Using Go makes programs 30% better compared to Java on average". Such statements are pretty vacuous anyway, though; we're limited to specific hypotheticals. Third is statistical significance: SVMs don't give p-values or confidence intervals.

I think I have decent answers to the first and third problems. Posting things like this and using cross-validation will help keep me from fooling myself with a bad model. Bootstrapping can provide confidence intervals, though there are several methods to choose from. The second problem is, as I said, kind of dumb. However, for this to be useful it has to give advice. Coming up with some hypotheticals manually and generating others dynamically based on user input would be nice, but may be too computationally expensive if we're running 10,000 bootstrap samples.
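As a sketch of what I mean by bootstrapped confidence intervals: resample the projects with replacement, refit and re-evaluate the hypothetical each time, and take percentiles. The evaluateHypothetical named in the comment is a stand-in for the whole fit-and-predict pipeline, which doesn't exist yet.

import Data.List (sort)
import System.Random (randomRIO)

-- Resample a dataset with replacement, keeping the original size.
resample :: [a] -> IO [a]
resample xs = mapM pick xs
  where pick _ = (xs !!) <$> randomRIO (0, length xs - 1)

-- Percentile bootstrap: run the statistic on n resampled datasets and report
-- the 2.5th and 97.5th percentiles as an approximate 95% interval.
bootstrapCI :: Int -> ([a] -> Double) -> [a] -> IO (Double, Double)
bootstrapCI n statistic projects = do
  stats <- mapM (const (statistic <$> resample projects)) [1 .. n]
  let sorted = sort stats
      at q   = sorted !! min (n - 1) (floor (q * fromIntegral n))
  pure (at 0.025, at 0.975)

-- e.g. bootstrapCI 10000 evaluateHypothetical projects

Ten thousand resamples, each needing a full refit, is where the computational expense worry comes from.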

Conclusion

As I said in the introduction, I'd appreciate feedback. I'd like to be proven wrong as soon as possible. Leave a comment or get in touch, especially if you're interested in collaborating.

As an aside, I'm looking for programming work. If you or your company is looking for someone, particularly if the work involves programming language design and implementation, get in touch. It needs to be in Portland, remote, or cool enough that I'd want to move. My resume is here.

A Debugging Horror Story: Fixing a Tricky GHC Bug

Posted on May 10, 2016

I recently spent more than 90 hours debugging what ended up being a problem in GHC's base libraries. I'll explain the bug and the context in which I encountered it, then relate some lessons that would've let me solve it in less than ninety hours.

The bug itself

In December 2015, I decided to set up a CI build for Idris. I didn't know what I was getting into. Using Stack seemed like the easiest and best option. After wrestling with problems unrelated to this post for a while, I got it sort of working. Problem was, sometimes it wouldn't finish in under an hour and AppVeyor would abort the build. I futzed around for a while trying to get it to build faster, but eventually realized it was hanging while downloading the dependencies. Stack interleaves downloading and building, so this isn't obvious.

Skipping ahead a bit, it turns out the problem occurs on unreliable networks when a call to recv or send fails. I'm baffled as to how AppVeyor's virtual machines in a Google datacenter can have worse internet than my bedroom, but there it is. If you know, leave a comment, seriously.

On POSIX systems and on Windows' clone of the BSD socket API, send and recv return the number of bytes sent or received, or -1 if there's an error. There's an important difference though. POSIX's return type is ssize_t and Windows' is int. On 32 bit systems, those are both 32 bit signed integers, but on 64 bit ssize_t is 64 bits and int is still 32. When one of the calls returned -1, it was interpreted as 4,294,967,295 a.k.a. 0xFFFFFFFF. Since the result wasn't -1, the library thought the call succeeded and didn't throw an exception. The "length" gets converted back to an int (actually a Haskell CInt). In recv you have a buffer with -1 bytes in it. Printf debugging:

blockingReadRawBufferPtr
throwErrnoIfRetry GHC.IO.FD.fdRead
pred res == False
throwErrnoIfMinus1Retry' res = 4294967295
blockingReadRawBufferPtr res = -1
after: buf8096(0--1)

The "-1 bytes" are eventually memcpyd and it segfaults. In sends it thinks it sent -1 bytes and loops forever, hence the hang.

You can check out my patch for GHC here. It was merged on May 19, 2016.

Finding it

I wasted a lot of time. There are two broad lessons I learned. First, if you have two hypotheses and one is much easier to characterize and fix, investigate it first, even if you think it's less likely. Second, use the techniques that get you the most useful information for the least effort.

Not actually a race condition

An intermittent failure says "race condition" to me, so I spent a ton of time investigating that idea. I found the code that manages parallel downloads and added a bunch of debug logging (search for parMapM_). I cloned it into a separate project and ran it overnight trying to get it to hang. It didn't, obviously.

If the problem wasn't in parMapM_, it must be in GHC's RTS, right? (No.) I looked over the source for bad spinlocks and tried building Stack with the debugging RTS. The spinlocks are only used during GC and debug logging showed the hang didn't ever happen during a collection.

In retrospect, there are more sources of nondeterminism than concurrency. The outside world can and in this case did cause an intermittent failure. I spent lots of time investigating the presumed race and almost no time proving that a race was actually the problem. There are several knobs one can twiddle to try and show that the problem is a race. Stack has a setting for the number of parallel downloads. You can turn the threaded RTS on and off and set the number of OS threads it uses. I encountered a bit of a red herring here. The problem isn't in the threaded RTS but it only happens when the threaded RTS is in use.

Because of the way GHC deals with nonblocking IO on Windows, the code for reading and writing to file descriptors and sockets checks whether it's running in the threaded RTS. The code for the non threaded RTS doesn't exhibit the bug.

When I decided to check whether it was related to networking, I got positive results almost immediately. There's a tool called Clumsy that simulates a variety of network failures. Setting it to drop packets got the bug to reproduce much more consistently on my local machine. (Trying to debug something like this on a remote system you don't control is awful.) Setting it to corrupt them got it to crash around 50% of the time. I was very happy.

I got a working testcase using only the library Stack uses for HTTP - http-conduit - on the first try. I reported it to the maintainer, Michael Snoyman. He had a decent suggestion but was ultimately not helpful. That's no fault of his own: it's a Windows-only bug that I couldn't reproduce in isolation and that, at the time, I thought had to do with TLS.

High leverage and low leverage techniques

The most time-efficient tools were Clumsy and API Monitor. API Monitor lets you see every call to any Windows DLL. It clearly showed the failing call to recv followed by a memcpy with length -1 as well as repeated failing calls to send. It's like a sort of super-strace in that it intercepts calls before a syscall is even issued. This is important since a lot of things that are syscalls on Linux are implemented in DLLs on Windows.

I also used hpc-strobe, a very cool and underappreciated tool. At regular intervals, it records a program coverage report. The trick is that these reports include counts of the number of times each expression was entered, not just booleans. You can take two reports, subtract one from the other and find out what code was executed in between the times it recorded them.

This was supposed to tell me what code it was hanging in. Unfortunately there's a caveat I didn't realize at the time: library code isn't instrumented at all. Once it reported the hang happened while the socket handle was being evaluated, once while the hash of the download was being computed and once while parMapM_ was running. All of that was just coincidence: the hang wasn't in any of the code that hpc-strobe can see. I did a lot of printf debugging trying to find the hang in places it wasn't.

I still think it's really cool, but knowing its limits is important.

I did a lot of printf debugging in the course of tracking down the bug. It was indispensable, but also tedious and error prone. An interactive debugger would've been great, but doesn't really exist. GHCi's debugger is better than nothing, but irrelevant in this case - it's too different from how the program actually runs. In particular, it can't use the threaded RTS so the bug doesn't show up at all. You can load Haskell executables in GDB, but there aren't any symbols in Haskell code and no backtraces either. There is preliminary work to improve the situation but it's immature and as far as I can tell doesn't work at all on Windows.

Conclusion

To save yourself tons of debugging time: question your assumptions and use powerful tools but know their limits. Printf debugging is useful but should only be used after you've exhausted the more efficient options.

Finally, I'm looking for programming work. If you or your company is looking for someone, particularly if the work involves programming language design and implementation, get in touch. It needs to be in Portland, remote, or cool enough that I'd want to move. My resume is here.

One Week In

Posted on August 10, 2015

One week has passed since I resolved to work on four things one hour each, every day. I didn't live up to the plan, but I got a lot more done on those four things than in any previous week. I'll call it a win. Categorized results:

New Month's Resolutions

Posted on August 2, 2015

It's the beginning of August, and that's as good an excuse as any to make resolutions. I've been coasting for a long time, and I want to do more stuff that matters with my life. To wit, here are some things I plan to spend at least an hour per day on:

I plan to do progress updates here every Sunday.

Status Update: Game Theory Ahoy!

Posted on December 2, 2013

(For my last two (one, two) blog posts, I've been talking about prediction markets and their application to funding libre engineering and science.)

If we expect people to participate in prediction markets, fulfilling predictions to their personal financial benefit and society's overall benefit, we have to give them some assurance that they'll actually make money. I can't tell people what strategy to use when buying contracts, what price to pay, when to sell, or when and how much to invest in progress on the goal itself. I could build a prediction market relatively easily, but there is little point if all I achieve is a fancy casino.

I intend to prove (or disprove, and find a new obsession) that given enough initial potential benefit distributed amongst a group of individuals, adding the ability to trade that potential benefit causes the net end-state benefit of the group to increase. If I can figure out the optimal strategy, I can answer the questions of when to buy or sell for how much and when to contribute. If outcomes are improved when players mostly use optimal strategies, I can prove prediction markets a useful mechanism.

Let's describe the game formally:

I think that's enough. If I'm successful I'll move to progressively more realistic models.

Trivially, we can see that if Call < P, no rearrangement of the contracts will make it rational for anyone to complete the project. If we consider only one player, it reduces to the question of whether to invest in a private good. Once we have more than one player with some contracts, things become more difficult.

I'm going to leave the analysis there for now. I've got a nice big book on game theory to attack. When I come back, hopefully I'll feel better equipped to handle the task I've set for myself.

Recipe for an Open Source Economy

Posted on October 8, 2013

Last time, I said "our society should reward contributions to the common pool of knowledge in proportion to their value to society." That sounds pretty sketchy, right? Don't worry, I'm no communist. I'm looking for an answer that works when individuals act on their own initiative. Let's talk concretely.

Imagine a system that tracks all the groceries in your house, suggests meals to cook and helps make shopping lists. Cool, right? The sort of technological advance that I'd like to incentivize. Let's imagine Alan thinks it'd be cool too, and he values such a development at $300. Note we're talking only about the information - the ideas, the designs and software of the system. Alan will still have to buy the physical components to actually use it. Here's a proposition: what if he sells off his anticipated benefit?

The central idea here is that of a prediction contract - a financial instrument entitling the holder to money if a certain prediction comes true. The prediction we're interested in here is something like "there will be developed a grocery-tracking computer system with features x, y and z". Alan has decided such a system would make his life $300 easier, so he's happy to end up as much as $300 poorer in trade. For simplicity's sake I'll say prediction contracts always pay out at $1. So Alan sells 300 contracts for, say, $0.20 each, putting him up a total of $60. If nobody figures out the grocery-tracking problem, he keeps that $60. If the prediction comes true - if someone or some group successfully designs a system meeting the spec - Alan has to pay out the $300 to the contract holders. His net monetary loss is $240, but since the system is worth $300 to him, he gains $60 in net utility. The effect is that Alan no longer cares whether the prediction comes true.
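Laid out as a ledger, restating the numbers above:

Sell 300 contracts at $0.20 each:              +$60
If the prediction fails: nothing else happens, net +$60
If the prediction comes true:
    pay out 300 contracts at $1 each:          -$300
    value of the finished system to Alan:      +$300 (in utility, not cash)
                                               net +$60

Either way Alan comes out $60 ahead, which is why he stops caring about the outcome.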

Wikipedia has a decent article on prediction markets. The idea of prediction markets isn't mine, but the concept of anticipated benefit transfer and its application to incentivizing the production of public goods is an original contribution. Conventionally, prediction markets are used for betting on world events; prices on these markets reflect the market's believed probability of a given event happening.

At this point, you can rightly ask why Alan should make the sale in the first place. All it accomplishes for him is trading a potential windfall - a new invention - for a smaller guaranteed return. If the price he and his buyer agree on accurately reflects the probability of the project's successful completion - $0.50 per contract for a 50% probability and so on - the law of large numbers says he'd get the same result in expectation, without any of the tedium of selecting predictions, deciding on their value and selling the contracts.
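Spelling that argument out with the numbers from the example, assuming a 50% chance the project gets done:

Fair price at 50% probability:                  $0.50 per contract
Revenue from selling 300 contracts at $0.50:    +$150
Expected payout: 0.5 * 300 contracts * $1:      -$150
Expected net from the sale:                      $0

At a fair price the sale changes nothing in expectation; it only swaps an uncertain outcome for a certain one.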

That line of argument holds water, though I have an inkling there may be more going on here than meets the eye - more on that later. Setting aside whether it's rational to sell off anticipated benefit, Alan can use the prediction market as a means of cheaper and more efficient charity. Alan's sale of prediction contracts, combined with sales from others like him, may create sufficient financial incentive to get the goal completed. If not, he is much better off financially than if he had donated the money to the Stuff To Make Household Chores Easier Foundation. Furthermore, he doesn't have to figure out whether the foundation is capable of completing the project, or whether the project is possible at all. Price competition from multiple bidders should mean he ends up paying a price determined by the complexity of the project and the amount contributed in contracts by others, rather than a donation amount clouded by uninformed perceptions.

At this point I'll acknowledge the work these ideas are based upon. I was first introduced to them in a paper published in the online journal First Monday, "The Wall Street Performer Protocol: Using Software Completion Bonds to Fund Open Source Software Development" by Chris Rasch. Rasch's work is in turn based on the concept of social policy bonds, invented by Ronnie Horesh. Horesh and Rasch both talk about the concept in terms of "bonds"; I prefer the prediction market terminology. Note that Horesh and Rasch's bonds have no expiry date, while prediction contracts always do. There would be no way to profit by betting "no" if a seller were always liable to pay the face value of a contract.

Let's look at this from the perspective of the person buying these prediction contracts. Brenda has the technical and business skills necessary to build a system meeting the prediction's specifications. She buys up some contracts and begins work. Carl and Dani also buy some. Since they get paid if the project is completed satisfactorily regardless of who turns it in, all three of them work together in whatever way they think is most likely to get the system built. As the project gets closer and closer to completion, the market value of the contracts rises. Dani gets a job offer and has to stop working on the project. She sells her contracts to Carl, making a profit based on the progress to date. When the project is completed, Carl and Brenda get paid, repay their investors and move on to the next thing.

So there's the rosy-eyed promise of the idea - prediction markets can be used to efficiently pay for libre engineering in a way that incentivizes open collaboration among any number of participants and rewards partial progress. If people sold prediction contracts in a way that reflected their desires, we could make much faster progress as a society in the scientific, technological and humanitarian arenas.

That's enough for now, I think. Next time is the "Halp! I'm stuck!" post, where we delve into the not-so-rosy questions like "Does it make sense for anyone to sell prediction contracts in this manner?" and "How do you decide the value of a prediction contract?"

Making it the Future Sooner

Posted on August 13, 2013

I want a space elevator. I want to bind mortality. I want to understand the fundamental physics of the universe. I want to be able to speak to anyone, in any language. In short, I want the future, now. I want science and technology to advance quickly enough that I can live to experience the achievements I can't even dream of today. I think I can hack society to get there faster.

It is my goal to create an open source future - a society where innovations and knowledge are shared, where collaboration is the norm and progress is made by anyone with the skills and inclination. The system of the present day involves tragic waste of effort and talent. We spend millions of person-hours duplicating engineering effort; we take the best thing about information in the modern era - that it can be shared with anyone anywhere for free - and make it illegal; and we empower large corporations to control the future of technologies through patents.

To get from here to there we need a mechanism for funding libre technological and scientific development. GlaxoSmithKline will not continue paying for drug discovery efforts if it cannot expect to profit from the resulting patents. Nor is Intel likely to give up the patents and proprietary designs that allow it to dominate the x86 processor market.

Incentivization is the key - our society should reward contributions to the common pool of knowledge in proportion to their value to society. There is value in technology, and that value can be gauged in monetary terms, but effecting the movement of money from beneficiaries to creators in a way that incentivizes open collaboration is tricky. Nevertheless, I believe it is a solvable problem. I do not yet have the whole solution, but I have some exciting ideas to share and I hope you readers can help develop things further.

Next time: less why, more how

First

Posted on July 30, 2013

'Ello, world!
