Tuesday, June 4, 2013

collaborative online research


This post focuses on the possibilities of collaborative online research. It is largely my attempt to process the ideas put forth by a number of remarkably forward-thinking scientists. In particular, I believe that the way forward in collaborative online research lies in developing an intuitive, modern interface that eases collaboration and is seamlessly integrated into the more traditional paper-in-journal approach.


In January of 2009, Fields Medal-winning mathematician Tim Gowers posed the question, "Is massively collaborative mathematics possible?" In other words, is it possible for a large (\(n\gg 3\)) set of mathematicians to solve a problem that "does not naturally split up into a vast number of subtasks"? Gowers started the Polymath Project to answer this question, but before we examine its successes and failures, let's briefly discuss why massively collaborative research might be a good idea in the first place.

Research, regardless of the field, is largely a game of ideas. In mathematics and physics, especially, research consists of developing logical connections from one idea to the next in the hopes of reaching an interesting destination. The challenge of research, of course, is that the natural progression of ideas is not always obvious; that is, it is not always evident that the current line of reasoning will lead to any fruitful results. Even if the end result is already known, the path to the result might be unclear. This is where large-scale collaboration comes into play: if my progress on the research problem were fully detailed on some sort of public online forum, and my confusion clearly highlighted, it is quite possible that some other researcher would be able to lend his particular expertise. After all, what is unclear and mysterious to me might be completely second nature to another. It is precisely this variance among scientists that large-scale collaboration takes advantage of: a variance not only in factual knowledge, but also in the set of "little tricks" that each scientist carries as his conceptual Swiss army knife. Gowers summarizes, "In short, if a large group of mathematicians could connect their brains efficiently, they could perhaps solve problems very efficiently as well."1 Faith in this approach might further be corroborated by the success of the open-source movement, which can be seen as a close parallel to scientific research. The problems tackled by software developers are by no means "embarrassingly parallel," yet the progress in software has been accelerating like never before, thanks in part to the open-source community. Thus, in much the same way that the open-source movement has revolutionized software development, one might hope that collaborative science might do the same for research.

Connecting researchers on large scales through the internet is key, claims writer and physicist Michael Nielsen, who describes the next advancement in how science is done as a revolution in our "collective memory":
The adoption and growth of the scientific journal system has created a body of shared knowledge for our civilization, a collective long-term memory which is the basis for much of human progress. This system has changed surprisingly little in the last 300 years. The internet offers us the first major opportunity to improve this collective long-term memory, and to create a collective short-term working memory, a conversational commons for the rapid collaborative development of ideas.2
"Rapid collaborative development of ideas" through a connected network of scientists is no small matter; it would work wonders in accelerating  progress made in the fields of mathematics and physics. Imagine further if, in the long run, the scope of such collaborative research could be extended past mathematics and physics.3 If data and methodologies were made available online, extensive dialogue between scientists might accelerate progress on all sorts of research.

Gowers immediately set out to test the power of such collaboration, starting the Polymath Project. Technically speaking, the project was just another one of Gowers' blog posts: Gowers described the mathematical problem that he wished to solve along with some of his initial thoughts, and then opened the floor for others to participate by leaving comments. Take a few minutes to skim through the post and comments here and get a flavor for the contributions. Even if, like me, you can't follow the mathematics, the chain of conjectures, counterexamples, and the occasional 8am "I should go to sleep…" comments is fascinating. As Gowers and Nielsen later put it, "Who would have guessed that the working record of a mathematical project would read like a thriller?"4 Not only did the Polymath Project spawn interesting discussion, but its results also exceeded Gowers' expectations. As they describe it,
Over the next 37 days, 27 people contributed approximately 800 substantive comments, containing 170,000 words... Progress came far faster than anyone expected. On 10 March, Gowers announced that he was confident that the Polymath participants had found an elementary proof of the special case of DHJ, but also that, very surprisingly (in the light of experience with similar problems), the argument could be straightforwardly generalized to prove the full theorem.4
That the project might pass for a casual discussion between mathematically inclined strangers through a rather clunky online blog interface, yet actually be groundbreaking research in the field of combinatorics, is astonishing and speaks to the power and efficiency of collaborative science.5



Gowers later dissected the Polymath experience in order to assess the Project's success.6 In the typical Polymath style, there were hundreds of insightful 'metacomments'; many of these were optimistic and delighted by the progress made, while many were critical of certain aspects of the collaborative model. If we are to learn from the Polymath Project for the purposes of future such experiments, we should take note of what might be kept and what might be improved.

If you only take away one thing from the Polymath Project, I think it should be that collaborative mathematics is possible. And not only is it possible, but its capabilities are well beyond the initial expectations in pace and scope. Even in the finer details, the Project was an excellent pilot for collaborative research. For one, it was "genuinely collaborative" in that there were more than just 3 or 4 participants, which is especially remarkable considering how arcane the original problem was, even to your average mathematician. Additionally, many of Gowers' postulates about the power of "connected mathematicians" seemed to hold true throughout, as he recalled later,
The project has been genuinely collaborative, and has led, to a remarkable extent, to the kind of efficiency gains that I was hoping for... I found myself having thoughts that I would not have had without some chance remark of another contributor. I think it is mainly this that sped up the process so much.
Furthermore, almost all the posts seem to have added some sort of valuable content to the discussion, and spam seems to have been virtually non-existent. This, however, may have been more of an artifact of how specialized the discussion was than anything else.

At the risk of repeating myself too many times, I want to emphasize that from the perspective of content the collaboration absolutely flourished. On the other hand, if we step back and focus on the structure/interface of the collaboration, it becomes pretty obvious that there are numerous things that could be improved.

Right off the bat it should be clear that a blog is not the optimal interface through which to collaborate on mathematics. When working on a project, especially one with a large number of collaborators, it is critical to be able to see what's been done since you last contributed. In software, for example, this is typically done with tools such as diff. Unfortunately, there is no easy way to do this for the comments of a blog; indeed, it would be quite easy to lose a vital message in the tangled maze of discussion threads. A simple solution to this problem is not to thread at all, and instead to post comments linearly (i.e. with no or limited nesting of replies). This, however, can be quite restrictive, especially when the discussion has naturally split into multiple subtopics.
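To make the "what's new since I last looked" idea a bit more concrete, here is a minimal Python sketch that treats a discussion as a list of comment lines and uses the standard library's difflib to pick out the lines that were added between two snapshots. The function name new_comments, the snapshot format, and the sample comments are all made up for illustration; a real platform would more likely track comment IDs and timestamps.

```python
import difflib


def new_comments(old_snapshot, new_snapshot):
    """Return the lines present in new_snapshot but not in old_snapshot.

    Both arguments are lists of strings, one flattened comment per line
    (a hypothetical format chosen for this sketch). difflib.ndiff marks
    added lines with a "+ " prefix, which we strip before returning.
    """
    return [
        line[2:]
        for line in difflib.ndiff(old_snapshot, new_snapshot)
        if line.startswith("+ ")
    ]


# Hypothetical discussion: what I saw last time vs. what is there now.
last_seen = [
    "1 alice: statement of the problem",
    "2 bob: a first reduction",
]
current = [
    "1 alice: statement of the problem",
    "1.1 carol: does this cover the degenerate case?",
    "2 bob: a first reduction",
    "3 dave: a different approach",
]

for line in new_comments(last_seen, current):
    print(line)
```

Note that the nested reply (1.1) is picked up even though it sits in the middle of the thread, which is exactly the kind of comment that is easy to miss when scrolling through a blog.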

Related to the issue of threading and nested replies is the visual aspect of the discussion. Although the content itself should not change, the way it is displayed should be flexible and easy on the eyes. Take Reddit, for example: it provides a method for collapsing and expanding subsets of the discussion, which can be quite useful when the number of comments is too high. Speaking of easy on the eyes, the TeX rendering on WordPress isn't too pretty; using something like MathJax would be much more aesthetically pleasing (not to mention useful, as it allows access to the underlying TeX). Just as important as the presentation of the discussion is navigability. Often it is necessary to refer to certain comments made earlier, and some sort of comment hyperlinking and equation referencing, in addition to a smooth system of moving back and forth through the discussion, would certainly be useful.

These are but a few possible features that one might look for in a platform for collaborative mathematics; I'm sure anyone who has more experience doing research would be able to think of some more. Indeed, Gowers et al. acknowledged these usability problems as they were collaborating on the Polymath Project, and the collaborators made do as well as they could, constrained to the blog format. What is especially striking to me is that all of the technology that could trigger collaborative online mathematics is already deployed and heavily used.

Take MathOverflow, for instance. Designed as a question-and-answer forum for professional mathematicians (primarily graduate students and researchers), MathOverflow has flourished, and serves as the go-to place for little mathematical hurdles one might encounter. The interface is quite slick, what with user profiles, tags used to categorize questions, and the ability to upvote or downvote questions, answers, and comments. Unfortunately, MathOverflow is not a place designed for large-scale discussion -- indeed, very recently there was a post on Zhang's bound on primes that was more discussion-oriented than Q+A-oriented and was subsequently closed (though one commenter pointed out that reducing the bound might make an excellent polymath project!). Regardless, the gadgets that MathOverflow uses could be easily applied to a polymath-like platform: profiles might keep track of recent contributions to various research problems, upvotes might highlight particularly insightful contributions, and tags might lend some structure and ease in finding other "related" interesting projects.

But why stop there? We could draw inspiration from TeX, and add a certain amount of mobility into the discussion with hyperlinks, references, metadata-appropriate labels (conjecture, remark, etc.) and so on. Or perhaps we could take ideas of project "history" from the open source movement, and keep a record of the project's revision history (discussion, Mathematica code, images, etc). Much like an open source project, the discussion might have a "README" tab containing background material and general outlook, while other tabs might each contain a different approach to tackling the problem.
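For what it's worth, here is one way the TeX-style labels and references might be modeled, as a small Python sketch. Everything in it is hypothetical: the Contribution class, its fields, the cited_by helper, and the sample entries are invented for illustration and are not taken from any existing platform.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contribution:
    """One entry in a research discussion (hypothetical schema)."""
    label: str    # stable identifier, playing the role of a TeX \label
    kind: str     # metadata label such as "conjecture" or "remark"
    author: str
    body: str
    refs: List[str] = field(default_factory=list)  # labels this entry cites


def cited_by(contributions, label):
    """Return the labels of all contributions that reference `label`."""
    return [c.label for c in contributions if label in c.refs]


# A made-up three-entry discussion.
discussion = [
    Contribution("conj-1", "conjecture", "alice", "Perhaps the bound is linear."),
    Contribution("rem-1", "remark", "bob", "True for small cases.", refs=["conj-1"]),
    Contribution("cex-1", "counterexample", "carol", "Fails at n = 7.",
                 refs=["conj-1", "rem-1"]),
]

print(cited_by(discussion, "conj-1"))
```

Storing references as labels rather than as positions in the thread means that links keep working no matter how the discussion is displayed, collapsed, or reordered.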

But I digress. Perhaps I should summarize: I believe that the key to sparking a revolution in online collaborative research will lie in developing both an elegant and highly functional user interface (difficult) and a close but expanding community of active researchers (very difficult). In this sense, the problem of collaborative research is both technological and social in nature (truly a problem of our time, eh?).

From there... who knows? For one thing, recording the entire history of a research project, down to the last detail, would be something that has never been done before. It may even open the doors to meta-research of some level, i.e. studying which approaches tend to solve certain problems, which groups of people tend to work together better, and so on. Furthermore, it would be interesting to see how this research paradigm might interact with the usual paper-in-journal approach. How do we credit contributors? Should we simply replace the author field with a link to the research discussion? Can third parties (such as universities) expect to evaluate the skills of a researcher by examining his/her profile page? It's clear that such questions will inevitably be brought up, even in designing the research platform itself.

Regardless of the details, it seems to me that the key aspect in collaborative research is the sheer openness through which it would operate: by making public vast amounts of data, critical discussions, and deep scientific insights, we knock down barriers that we have only recently begun to notice, and allow scientific discourse to reach its full potential.


If you're interested in this endeavor, I highly recommend reading Nielsen's essay, The Future of Science. It's almost hard to believe that it was written 5 years ago, especially considering today's constantly evolving tech landscape.



1. Read Gowers' post here.
2. Read Nielsen's post here.
3. I say "in the long run" because fields involving vast amounts of data, technical equipment, etc. are more complicated to fit into an online collaborative environment. Of course, this is not to say that it can't be done, as there are already quite a few interesting and useful streams of data/research, as Nielsen discusses (see footnote 2).
4. Nature 461, 879-881 (15 October 2009), doi: 10.1038/461879a
5. The paper is available on arXiv.
6. See these posts.