Saturday, July 6, 2013

Berkeley REU and Commutative Algebra

I haven't been able to post for about the last month, unfortunately, primarily because I'm at Berkeley for a geometry/topology/rep. theory REU. The weather's great here, but I mostly sit in a room and do math all day, so I don't get to appreciate it much. My project is primarily concerned with wonderful compactifications of homogeneous spaces via representation theory, which is pretty neat (if abstract), but quite a bit of hard work, seeing as I know nothing about representation theory or wonderful compactifications... we have a blog up here on which we will sporadically post math.

I have a wonderful (pun intended) roommate -- you can find his blog here -- and in our free time we pretty much just do more math. Speaking of more math, I recently bought Reid's book on Commutative Algebra, and am now trying to work through it. In addition, I've resolved to learn a non-zero amount of algebraic topology, probably out of Hatcher, so I can take Khovanov's course next semester. To this end, especially since I probably won't be posting anything of much substance here for the next month or so, I will try to post some of my notes on algebra/algebraic topology. By notes I mean mostly: a) proofs that I want to work through to understand better and b) exercises.

I'll end this post here -- my next post will likely be some notes on modules and exact sequences, stuff that I probably should have learned a long time ago.

Tuesday, June 4, 2013

collaborative online research

This post focuses on the possibilities of collaborative online research. It is largely my attempt to process the ideas put forth by a number of remarkably forward-thinking scientists. In particular, I believe that the way forward in collaborative online research lies in developing an intuitive, modern interface that eases collaboration and is seamlessly integrated into the more traditional paper-in-journal approach.

In January of 2009, Fields Medal-winning mathematician Tim Gowers posed the question, "Is massively collaborative mathematics possible?" In other words, is it possible for a large (\(n\gg 3\)) set of mathematicians to solve a problem that "does not naturally split up into a vast number of subtasks"? Gowers started the Polymath Project to answer this question, but before we examine its successes and failures, let's briefly discuss why massively collaborative research might be a good idea in the first place.

Research, regardless of the field, is largely a game of ideas. In mathematics and physics, especially, research consists of developing logical connections from one idea to the next in the hopes of reaching an interesting destination. The challenge of research, of course, is that the natural progression of ideas is not always obvious; that is, it is not always evident that the current line of reasoning will lead to any fruitful results. Even if the end result is already known, the path to the result might be unclear. This is where large-scale collaboration comes into play: if my progress on the research problem were fully detailed on some sort of public online forum, and my confusion clearly highlighted, it is quite possible that some other researcher would be able to lend his particular expertise. After all, what is unclear and mysterious to me might be completely second nature to another. It is precisely this variance among scientists that large-scale collaboration takes advantage of: a variance not only in factual knowledge, but also in the set of "little tricks" that each scientist carries as his conceptual Swiss army knife. Gowers summarizes, "In short, if a large group of mathematicians could connect their brains efficiently, they could perhaps solve problems very efficiently as well."1 Faith in this approach might further be corroborated by the success of the open-source movement, which can be seen as a close parallel to scientific research. The problems tackled by software developers are by no means "embarrassingly parallel," yet the progress in software has been accelerating like never before, thanks in part to the open-source community. Thus, in much the same way that the open-source movement has revolutionized software development, one might hope that collaborative science might do the same for research.

Connecting researchers on large scales through the internet is key, writer/physicist Michael Nielsen claims, and describes the next advancement in how science is done as a revolution in our "collective memory":
The adoption and growth of the scientific journal system has created a body of shared knowledge for our civilization, a collective long-term memory which is the basis for much of human progress. This system has changed surprisingly little in the last 300 years. The internet offers us the first major opportunity to improve this collective long-term memory, and to create a collective short-term working memory, a conversational commons for the rapid collaborative development of ideas.2
"Rapid collaborative development of ideas" through a connected network of scientists is no small matter; it would work wonders in accelerating  progress made in the fields of mathematics and physics. Imagine further if, in the long run, the scope of such collaborative research could be extended past mathematics and physics.3 If data and methodologies were made available online, extensive dialogue between scientists might accelerate progress on all sorts of research.

Gowers immediately set out to test the power of such collaboration, starting the Polymath Project. Technically speaking, the project was just another one of Gowers' blog posts: Gowers describes the mathematical problem that he wishes to solve along with some of his initial thoughts, and then opens the floor for others to participate by leaving comments. Take a few minutes and skim through the post and comments here and get a flavor for the contributions. Even if, like me, you can't follow the mathematics, the chain of conjectures, counterexamples, and the occasional 8am "I should go to sleep…" comments is fascinating. As Gowers and Nielsen later put it, "Who would have guessed that the working record of a mathematical project would read like a thriller?"4 But not only did the Polymath Project spawn interesting discussion, the results exceeded Gowers' expectations. As they describe it,
Over the next 37 days, 27 people contributed approximately 800 substantive comments, containing 170,000 words... Progress came far faster than anyone expected. On 10 March, Gowers announced that he was confident that the Polymath participants had found an elementary proof of the special case of DHJ, but also that, very surprisingly (in the light of experience with similar problems), the argument could be straightforwardly generalized to prove the full theorem.4
That the project might pass for a casual discussion between mathematically-inclined strangers through a rather clunky online blog interface, yet actually be groundbreaking research in the field of combinatorics is astonishing and speaks to the power and efficiency of collaborative science.5

Gowers later dissected the Polymath experience in an attempt to determine how successful the Project had been.6 In the typical Polymath style, there were hundreds of insightful 'metacomments'; many of these were optimistic and delighted by the progress made, while many were critical of certain aspects of the collaborative model. If we are to learn from the Polymath Project for the purposes of future such experiments, we should take note of what might be kept and what might be improved.

If you only take away one thing from the Polymath Project, I think it should be that collaborative mathematics is possible. And not only is it possible, but its capabilities are well beyond the initial expectations in pace and scope. Even in the finer details, the Project was an excellent pilot for collaborative research. For one, it was "genuinely collaborative" in that there were more than just 3 or 4 participants, which is especially remarkable considering how arcane the original problem was, even to your average mathematician. Additionally, many of Gowers' postulates about the power of "connected mathematicians" seemed to hold true throughout, as he recalled later,
The project has been genuinely collaborative, and has led, to a remarkable extent, to the kind of efficiency gains that I was hoping for... I found myself having thoughts that I would not have had without some chance remark of another contributor. I think it is mainly this that sped up the process so much.
Furthermore, almost all the posts seem to have added some sort of valuable content to the discussion, and spam seems to have been virtually non-existent. This, however, may have been more of an artifact of how specialized the discussion was than anything else.

At the risk of repeating myself too many times, I want to emphasize that from the perspective of content the collaboration absolutely flourished. On the other hand, if we step back and focus on the structure/interface of the collaboration, it becomes pretty obvious that there are numerous things that could be improved.

Right off the bat it should be clear that a blog is not the optimal interface through which to collaborate on mathematics. When working on a project, especially one with a large number of collaborators, it is critical to be able to see what's been done since you last contributed. In software, for example, this is typically done via tools such as diff. Unfortunately, there is no easy way to do this for the comments of a blog; indeed, it would be quite easy to lose a vital message in the tangled maze of discussion threads. A simple solution to this problem is not to thread, but instead to post comments linearly (i.e. either no or limited nesting of replies). This, however, can be quite restrictive, especially when the discussion has naturally split into multiple subtopics.

Related to the issue of threading and nested replies is the visual aspect of the discussion. Although the content itself should not change, the way it is displayed should be flexible and easy on the eyes. Take Reddit, for example: it provides a method for collapsing and expanding subsets of the discussion, which can be quite useful when the number of comments is too high. Speaking of easy on the eyes, the TeX rendering on WordPress isn't too pretty - using something like MathJax would be much more aesthetically pleasing (not to mention useful, as it allows access to the underlying TeX). Just as important as the presentation of the discussion is navigability. Often it is necessary to refer to certain comments made earlier, and some sort of comment hyperlinking+equation referencing in addition to a smooth system of moving back and forth through the discussion would certainly be useful.

These are but a few possible features that one might look for in a platform for collaborative mathematics; I'm sure anyone who has more experience doing research would be able to think of some more. Indeed, Gowers et al. acknowledged these usability problems as they were collaborating on the Polymath Project, and the collaborators made do as well as they could, constrained to the blog format. What is especially striking to me is that all of the technology that could trigger collaborative online mathematics is already deployed and heavily used.

Take MathOverflow, for instance. Designed as a question-and-answer forum for professional mathematicians (primarily graduate students and researchers), MathOverflow has flourished, and serves as the go-to place for little mathematical hurdles one might encounter. The interface is quite slick, what with user profiles, tags used to categorize questions, and the ability to upvote or downvote questions, answers, and comments. Unfortunately, MathOverflow is not a place designed for large-scale discussion -- indeed, very recently there was a post on Zhang's bound on primes that was more discussion-oriented than Q+A-oriented and was subsequently closed (though one commenter pointed out that reducing the bound might make an excellent polymath project!). Regardless, the gadgets that MathOverflow uses could be easily applied to a polymath-like platform: profiles might keep track of recent contributions to various research problems, upvotes might highlight particularly insightful contributions, and tags might lend some structure and ease in finding other "related" interesting projects.

But why stop there? We could draw inspiration from TeX, and add a certain amount of mobility into the discussion with hyperlinks, references, metadata-appropriate labels (conjecture, remark, etc.) and so on. Or perhaps we could take ideas of project "history" from the open source movement, and keep a record of the project's revision history (discussion, Mathematica code, images, etc). Much like an open source project, the discussion might have a "README" tab containing background material and general outlook, while other tabs might each contain a different approach to tackling the problem.

But I digress. Perhaps I should summarize: I believe that the key to sparking a revolution in online collaborative research will lie in developing both an elegant and highly functional user interface (difficult) and a close but expanding community of active researchers (very difficult). In this sense, the problem of collaborative research is both technological and social in nature (truly a problem of our time, eh?).

From there... who knows? For one thing, recording the entire history of a research project, down to the last detail, will be something that's never been done before. It may even open the doors to meta-research of some level, i.e. studying which approaches tend to solve certain problems, which groups of people tend to work together better, and so on. Furthermore, it would be interesting to see how this research paradigm might interact with the usual paper-in-journal approach. How do we credit contributors? Should we simply replace the author field with a link to the research discussion? Can third-parties (such as universities) expect to evaluate the skills of a researcher by examining his/her profile page? etc. It's clear that such questions will inevitably be brought up, even in designing the research platform itself.

Regardless of the details, it seems to me that the key aspect in collaborative research is the sheer openness through which it would operate: by making public vast amounts of data, critical discussions, and deep scientific insights, we knock down barriers that we have only recently begun to notice, and allow scientific discourse to reach its full potential.

If you're interested in this endeavor, I highly recommend reading Nielsen's essay, The Future of Science. It's almost hard to believe that it was written 5 years ago, especially considering today's constantly evolving tech landscape.

1. Read Gowers' post here.
2. Read Nielsen's post here.
3. I say "in the long run" because fields involving vast amounts of data, technical equipment, etc. are more complicated to fit into an online collaborative environment. Of course, this is not to say that it can't be done, as there are already quite a few interesting and useful streams of data/research, as Nielsen discusses (see footnote 2).
4. Nature 461, 879-881 (15 October 2009), doi: 10.1038/461879a
5. The paper is available on arXiv.
6. See these posts.

Saturday, June 1, 2013

Summer break!

I apologize for completely deserting this blog for the last couple months. My semester was a bit too busy to get any writing done, but I seem to have learned quite a bit.

I took what turned out to be one of the best courses I've taken at Columbia so far: Professor Mu-Tao Wang's Intro. to Differentiable Manifolds course. We worked out of John Lee's Introduction to Smooth Manifolds (GTM218), which I thought to be a pretty good introductory text. Over the course of the semester we covered Chapters 1-6, 8, 10-16 (and bits of 9). The subject turned out to be much more interesting than I had initially assumed, especially since I took it in conjunction with Professor Peter Woit's course on quantum mechanics from the viewpoint of representation theory. Obviously I'm new to these ideas, but the fact that geometry and topology are so closely related to fundamental physics is pretty neat!

Still, I'm glad that the semester's over. I could definitely use a bit of a break. I'm free until mid-June, when I'll be flying over to Berkeley for a summer research program. Not quite sure what I'll be working on yet, but I'm sure I'll learn a lot. In the two weeks of totally-free-time that I have left, though, I think I'll try to start working through a couple of books on representation theory and Lie groups+algebras, commutative algebra, and Riemannian geometry. I've also really been itching to read Naber's two-part series on Topology, Geometry, and Gauge Fields, so hopefully I'll start digging into that soon. It may sound a little ambitious to try to learn five different things at once, but I think it might actually be better than concentrating on one topic for too long. It's all too easy to get stuck/frustrated on one little concept or exercise... perhaps it's better to be stuck on five different ones!

On a more serious note, since it's difficult to learn math without actually doing any, I will be typing up proofs to relevant propositions, worked exercises, and the like. (I'm already one chapter into Atiyah's Intro. to Commutative Algebra; I'll put up my notes at some point in the near future.) I'm currently debating whether I should take notes on this blog or instead just commit them to my GitHub notes repo. Perhaps I'll do both... somehow.

Well, regardless, I hope to be blogging a bit about math and physics over the next couple of months, hopefully at least once a week. Ciao.

Friday, January 4, 2013

the principle of least action

This is the first post in a series of posts on various topics in classical mechanics. The ideal reader has a working knowledge of single-variable calculus, partial derivatives, and basic introductory physics (force, acceleration, momentum, energy, etc). Note that throughout this series, time-derivatives are written as \(\dot{x}\equiv\frac{dx}{dt}\) and vectors are marked by boldface.

Newton's laws form the basis of classical mechanics, the field of physics concerned with the motion of macroscopic objects under the influence of certain forces. Newton was the first to mathematically express the relationship between the force on an object and its motion; his second law states that the force applied is proportional to the object's acceleration: $$\mathbf{F}(\mathbf{x}, t)=m\mathbf{a}=m\mathbf{\ddot{x}},$$
where \(\mathbf{x}\) is the object's position vector. Now, as physicists, our job is of course to find the position of the object at any moment in time. This is done by solving the above equations of motion for \(\mathbf{x}\); however, as the force may depend on position and time, the equation is a second-order differential equation in time, and for complicated systems is often impossible to solve exactly. As an example of such a force, consider the force on a block attached to a spring whose stiffness \(k\) is changing over time. In this case, by Hooke's law, the equations of motion are \(m\mathbf{\ddot{x}}=-k(t) \mathbf{x}\). The exact solution for \(\mathbf{x}\) depends on the precise form of \(k(t).\)
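To make the spring example a bit more concrete, here is a minimal numerical sketch in one dimension. It is only an illustration under stated assumptions: the function name `simulate_spring`, the velocity Verlet integrator, the step size, and the particular stiffness law \(k(t)=4+0.5t\) are all my own choices, not anything fixed by the physics above.

```python
import math

def simulate_spring(k, m=1.0, x0=1.0, v0=0.0, t_end=10.0, dt=1e-3):
    """Integrate m*x'' = -k(t)*x in one dimension with velocity Verlet.

    k is a function of time returning the (hypothetical) stiffness.
    Returns the position and velocity at t_end.
    """
    n_steps = int(round(t_end / dt))
    x, v, t = x0, v0, 0.0
    a = -k(t) * x / m                      # acceleration at the current step
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt    # position update
        t += dt
        a_new = -k(t) * x / m              # acceleration at the new position and time
        v += 0.5 * (a + a_new) * dt        # velocity update with averaged acceleration
        a = a_new
    return x, v

# Constant stiffness: the exact solution is x(t) = cos(sqrt(k/m) * t).
x_const, _ = simulate_spring(lambda t: 4.0, t_end=1.0)

# Assumed stiffening law k(t) = 4 + 0.5 t: no simple closed form to compare against.
x_var, _ = simulate_spring(lambda t: 4.0 + 0.5 * t, t_end=1.0)

print(x_const, math.cos(2.0), x_var)
```

The constant-stiffness run serves as a sanity check against the known cosine solution; the time-dependent run is the case where one typically has to settle for a numerical answer.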

Now it turns out that in the grand scheme of things, with the availability of supercomputers and excellent approximation algorithms, exact solubility of differential equations is not the physicist's biggest worry. The hard part is obtaining the equations of motion in the first place. Most systems thrown at you in Physics 101 succumb to analysis after simply drawing a free-body diagram and slogging through some trigonometry to resolve components of the forces. This is not always the case, though - many physical systems of interest are opaque to straightforward force analyses. I wouldn't fancy, for instance, considering the forces in a double pendulum system or the swinging Atwood machine. Furthermore, in much of modern physics, such as the realm of subatomic particles, the classical concept of Newtonian force is not well-defined. For these reasons (and many more that will become clearer later), it is often useful to work with more elegant formulations of classical mechanics; the one we will develop here is based on the principle of least action (also known as Hamilton's principle) and is known as Lagrangian mechanics. If you're interested, check out the further reading section at the end of the post.

An ode to coordinates

Figure 1: The simple pendulum
Before we state the principle of least action, however, it is important to discuss the role of coordinate systems in dealing with general physics systems. Newton's laws restrict us to the familiar Cartesian coordinate system where vectors are written \(\mathbf{v}=v_x\mathbf{\hat{x}} + v_y\mathbf{\hat{y}} + v_z\mathbf{\hat{z}}\). Cartesian coordinates, however easy to visualize, are not always the most natural choice for coordinates. Consider the simple pendulum, for example. In the picture to the left, it is clear the force of gravity does not generally point in the direction of motion of the bob. Thus we are forced\(^*\) to project the force onto our chosen axes. It now becomes evident that the force responsible for the pendulum's swinging motion is \(mg\sin\theta\) and that the force of tension must balance out \(mg\cos\theta\) if the pendulum is to have constant length. It should feel slightly artificial to you that in order to describe motion that is inherently arc-like (think polar coordinates), we are trying to use rectangular, Cartesian coordinates; instead of only thinking about the natural variable, \(\theta\), we are cluttering our minds with two artificial ones: \(x\) and \(y\).

While the difference between Cartesian and polar coordinates for the case of the simple pendulum is not particularly complicated (conversion is quite easy and we do it implicitly), from a more general perspective, Newton's second law is very rigid in its insistence on Cartesian coordinates. Ideally one would like to use coordinates that suit the system under consideration, the number of which agrees with the degrees of freedom of the system. Take for example, a bead constrained to a wire. The number of degrees of freedom of the bead is one, no matter how the wire twists and turns in three dimensions: the position of the bead can be parameterized by one variable (such as the wire's arc length) as it moves along the curve. To describe the bead's position, then, we need only one coordinate along with the shape of the wire (a "constraint"), not the three implicit in \(\mathbf{x}\). Similarly, the simple pendulum is a system with only one degree of freedom - its position along the circular arc to which it is constrained to move can be parameterized by a single coordinate \(\theta\). Thus we see a distinction between the dimensionality of the space in which the system is embedded (2 for the pendulum) and the dimensionality of the path that the system actually takes (1 for the circular arc that the pendulum sweeps out). The difference between these two dimensions is precisely the number of the constraints on the system - for the pendulum, there is one constraint that relates \(x\) and \(y\). This constraint is that the pendulum does not stretch; i.e. \(x^2+y^2=l^2.\)

Incidentally, the natural coordinates for a system are commonly known as generalized coordinates, and are written as \(\mathbf{q}=(q_1, q_2,\dots, q_d)\) where \(d\) is the number of degrees of freedom of the system. The wonderful thing about the principle of least action is, as we shall see, that we are not forced to use Cartesian coordinates. Instead we can simplify our lives by working with generalized coordinates. So instead of imagining the system moving in time through a Cartesian space, we will think of it as moving through a generalized configuration space. The advantages of using generalized coordinates are tremendous: we no longer have to worry about the constraints on the system because the natural coordinates always respect the constraints! In the case of the pendulum, if we describe the system by \(\theta\) we shouldn't have to make any reference to the fact that \(x^2+y^2=l^2.\) We will see this explicitly when we use the principle of least action to analyze the simple pendulum below.
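The claim that a generalized coordinate automatically respects the constraint is easy to check numerically. The little sketch below assumes a particular convention that the text does not fix: the pivot sits at the origin, the \(y\)-axis points up, and \(\theta\) is measured from the downward vertical. The helper name `bob_position` is hypothetical.

```python
import math

def bob_position(theta, length=1.0):
    """Cartesian position of the pendulum bob for a given angle.

    Assumed convention: pivot at the origin, y pointing up,
    theta measured from the downward vertical.
    """
    return length * math.sin(theta), -length * math.cos(theta)

length = 2.0
for theta in (0.0, 0.3, 1.2, -2.5):
    x, y = bob_position(theta, length)
    # The constraint x^2 + y^2 = l^2 holds for every theta, with no extra work.
    assert math.isclose(x * x + y * y, length * length)
    print(theta, x, y)
```

Whatever value of \(\theta\) we feed in, the resulting \((x, y)\) lies on the circle of radius \(l\), so the constraint never needs to be imposed separately.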

The principal matter

Let us now turn to the principle of least action. Don't worry if it seems too abstract at first. Bear with it - we will work through a few examples and, in fact, prove that it is equivalent to Newtonian mechanics by using it to derive Newton's second law!

The Principle of Least Action: Define the Lagrangian of a system to be the difference between its kinetic and potential energies, \[L(\mathbf{q},\mathbf{\dot{q}}, t)=T-U.\] Define also the action of any putative path \(\mathbf{q}(t)\) of the system through configuration space from \(\mathbf{q}(t_1)=\mathbf{q_1}\) to \(\mathbf{q}(t_2)=\mathbf{q_2}\) to be \[S[\mathbf{q}(t)]=\int_{t_1}^{t_2} L(\mathbf{q},\mathbf{\dot{q}}, t) \;dt.\] The path \(\mathbf{q}(t)\) actually taken by the system through configuration space is the path for which the action is stationary to first order. In other words, if a path and any very slightly different path, \(\mathbf{q}(t)+\delta \mathbf{q}(t)\), have the same action to first order in \(\delta \mathbf{q}\), this is the path that the system will take through configuration space.

Figure 2: This function has three stationary points: two minima and one maximum
Alright, so that's the principle. Let's dissect it - here are a few things to note. First of all, what exactly does it mean for something to be stationary? Well if you think back to single variable calculus you will remember that the extrema of a function \(f(x)\) can be found by setting \(f'(x)=0\). Graphically, a derivative of zero at a point \(x\) simply means that the function is approximately flat and horizontal at \(x\). In other words, if we were to evaluate \(f(x)\) and \(f(x+\delta x)\) with \(\delta x\) very small, we would get the same result both times, up to corrections of order \(\delta x^2\). And of course, that's what a derivative of zero means: the function is locally constant to first order. Thus we say that the function is stationary at \(x\).
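If you'd like to see stationarity in numbers, here is a small sketch. The function \(f(x)=x^2-2x\) is just an example I picked (it has a stationary point at \(x=1\)); the helper `change` is likewise hypothetical. The idea is to compare how \(f(x+\delta x)-f(x)\) shrinks at a stationary point versus at an ordinary point.

```python
def f(x):
    """Example function with a single stationary point (a minimum) at x = 1."""
    return x**2 - 2.0 * x

def change(x, dx):
    """How much f changes when we nudge x by dx."""
    return f(x + dx) - f(x)

for dx in (1e-1, 1e-2, 1e-3):
    # At x = 1 the change shrinks like dx**2; at x = 2 it shrinks only like dx.
    print(dx, change(1.0, dx), change(2.0, dx))
```

At the stationary point the change is second order in \(\delta x\), which is the precise sense in which the function is "flat" there; away from it, the change is first order.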

Figure 3: A few of the possible paths from \(\mathbf{q}(t_1)\) to \(\mathbf{q}(t_2)\)
The astute reader will note that the quantities we are dealing with here, namely actions of paths in configuration space, are not exactly functions. Instead, action is a functional: something that takes a function and returns a number. What, then, does it mean for the action to be stationary? Assume we know that the system is in the configuration \(\mathbf{q_1}\) at time \(t_1\) and in the configuration \(\mathbf{q_2}\) at time \(t_2\). First consider the set of all possible paths \(\mathbf{q}(t)\) that satisfy the given boundary conditions: \(\mathbf{q}(t_1)=\mathbf{q_1}\) and \(\mathbf{q}(t_2)=\mathbf{q_2}\). Next, assign each path an action using the action functional above. Finally, find the path for which all "neighboring" paths (paths "almost" equal to the chosen one) have the same action to first order (think by analogy with the case of a regular function \(f(x)\)). What the principle of least action says is that this is the path that the system actually takes. Now, you will notice that nowhere have we mentioned anything about minimization, though the naming of the principle is suggestive. Requiring a function(al) to be stationary essentially just requires locally constant behavior; this means the function(al) may be minimized, maximized, or neither (consider, for example, \(x^3\) at zero, which is a stationary inflection point; the higher-dimensional analogues are known as saddle points). It just so happens that for almost all physical phenomena, the action functional is minimized by the path that the system takes through configuration space! This is why the principle is named, a tad imprecisely, the principle of least action.

Note: you should be happy that the principle of least action says not a word about forces, and instead uses only energies. This is fortunate, as forces are inevitably more complicated than energies for most interesting systems. For one, forces are always vectors! Energy on the other hand, never has a direction, and so we never have to discuss its components, etc. In addition, energy is very flexibly written in arbitrary coordinate systems, unlike the unit vectors that forces are expressed in. This flexibility yields great benefits when the systems under consideration are not readily described as rectangular (Cartesian, if you will).

Out of the frying pan and into the fire?

If this is still a little abstract, take the example of projectile motion. Suppose we shoot a ball up into the air at \(t=t_0, x=x_0\) and it hits the ground some distance away at \(t=t_f, x=x_f\). Neglecting all forces but gravity, it is clear from experience that the kinetic energy of the ball decreases with height, while the gravitational potential energy increases. Consequently, the system's Lagrangian, which is the difference between kinetic and potential energy, is more negative for paths (satisfying our boundary conditions) that are on-the-whole higher than lower paths. The action for higher paths, then, is lower. The ball's path cannot be too high, however, for higher paths mean higher kinetic energy on the way up and on the way down, which would increase the action. Thus, the true minimum is found somewhere in between, and turns out to occur for - yep, you guessed it - parabolic paths.
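The hand-waving above can be checked with a rough numerical experiment. The sketch below makes several simplifying assumptions of my own: it tracks only the height \(y(t)\) of a ball thrown straight up and caught at the same height, it compares a one-parameter family of parabolic arcs (so it does not search over all paths), and it approximates the action with a simple midpoint sum. The names `action`, `arc`, and `scale` are hypothetical.

```python
def action(path, t_end, m=1.0, g=9.81, n=2000):
    """Approximate S = integral of (T - U) dt for a height profile y(t).

    Each time slice uses a finite-difference velocity and the average height.
    """
    dt = t_end / n
    total = 0.0
    for i in range(n):
        t0, t1 = i * dt, (i + 1) * dt
        y0, y1 = path(t0), path(t1)
        velocity = (y1 - y0) / dt
        height = 0.5 * (y0 + y1)
        total += (0.5 * m * velocity**2 - m * g * height) * dt
    return total

G, T_END = 9.81, 2.0

def arc(scale):
    """Parabolic arc with y(0) = y(T_END) = 0; scale = 1 is the Newtonian trajectory."""
    return lambda t: scale * 0.5 * G * t * (T_END - t)

# Lower arcs (scale < 1) and higher arcs (scale > 1) around the true one.
actions = {s: action(arc(s), T_END) for s in (0.5, 0.75, 1.0, 1.25, 1.5)}
best = min(actions, key=actions.get)
print(actions, best)
```

Among the arcs tried, the one with `scale = 1` has the smallest action: flatter arcs lose out on potential energy, while taller arcs pay too much in kinetic energy, just as argued above.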

Of course, the example with the ball is very schematic. It does, however, give a fair representation of the global approach that the principle of least action takes. Newton's laws, on the other hand, are very local in their scope. As a differential equation, Newton's second law describes how the system changes per infinitesimal time step. Only by integrating this differential equation over these infinitesimal time steps do we obtain how the system evolves over time. The principle of least action is much less narrow in its scope: it is cognizant of the initial and final conditions of the system and uses these, along with knowledge about how the energies of the system vary with its configuration (i.e. the Lagrangian), in order to compute how the system gets from \(\mathbf{q_1}\) to \(\mathbf{q_2}\).

"But wait!" you might exclaim. We abandoned the tedious task of playing with forces and Newton's second law, but have to instead comb through the actions of all the paths that get the system from \(\mathbf{q_1}\) to \(\mathbf{q_2}\)! There are an infinite number of such paths; surely there must be an easier way! Wouldn't it be great if we could have the best of both worlds? If only we could deal with energies and use the principle of least action to produce a differential equation equivalent to Newton's second law!

Euler and Lagrange to the rescue

Fortunately, there does exist a way to connect the global Lagrangian method to the local differential equations approach. Consider it fair warning, however, that the following derivation is not as mathematically rigorous as one might like. As the technical details have little pedagogical value in the context of our discussion, I shall simply point the reader to any standard work on the calculus of variations. What we wish to do is to formalize our ideas of "neighboring" paths and construct the equivalent of \(f'(x)=0\) for functionals so that we may find the paths of stationary action. Consider, without loss of generality, a system with one degree of freedom that obeys the boundary conditions \(q(t_1)=q_1\) and \(q(t_2)=q_2\), and is characterized by a Lagrangian \(L.\) The derivation for the \(d\)-dimensional case is obtained in exactly the same way and is left as an exercise for the (calculus-happy) reader. Let the actual path that our system takes through configuration space - the path we are trying to find - be denoted by \(q(t)\). How can we talk about paths that are only slightly different than \(q(t)\)? The trick is to define slightly varied paths as \[r(t)=q(t)+\epsilon \eta(t)\] where we will later take the limit \(\epsilon\rightarrow 0\). Here \(\eta(t)\) is an arbitrary function that respects \(\eta(t_1)=0\) and \(\eta(t_2)=0\). This constraint is placed so that all perturbed paths respect the same boundary conditions that the true path does. Incidentally, if we look back at Figure 3, we see that \(\delta q(t)=\epsilon \eta(t)\). In words, what we have just done is added an arbitrary change to our path and forced it to be an infinitesimal change by taking \(\epsilon\rightarrow 0.\) Now we need to compute the actions on all such paths.
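Before doing this analytically, it may help to see the variation \(r(t)=q(t)+\epsilon\eta(t)\) at work numerically. The sketch below is an illustration under assumptions of my own choosing: a unit-mass, unit-stiffness oscillator with \(L=\frac{1}{2}\dot{q}^2-\frac{1}{2}q^2\), the time interval \([0,1]\), the bump \(\eta(t)=\sin(\pi t)\), and a midpoint-sum approximation of the action. All function names are hypothetical.

```python
import math

def action(path, t1=0.0, t2=1.0, n=4000):
    """Midpoint-sum estimate of S for L = q'^2/2 - q^2/2 (assumed unit mass and stiffness)."""
    dt = (t2 - t1) / n
    total = 0.0
    for i in range(n):
        a, b = t1 + i * dt, t1 + (i + 1) * dt
        q_a, q_b = path(a), path(b)
        q_dot = (q_b - q_a) / dt
        q_mid = 0.5 * (q_a + q_b)
        total += 0.5 * (q_dot**2 - q_mid**2) * dt
    return total

def eta(t):
    """Perturbation shape; vanishes at t = 0 and t = 1 so the endpoints stay fixed."""
    return math.sin(math.pi * t)

def varied(path, eps):
    """The neighboring path r(t) = q(t) + eps * eta(t)."""
    return lambda t: path(t) + eps * eta(t)

def delta_s(path, eps):
    """Change in action when the path is nudged by eps * eta."""
    return action(varied(path, eps)) - action(path)

true_path = math.sin                       # solves q'' = -q
straight = lambda t: t * math.sin(1.0)     # same endpoints, but not a solution

for eps in (1e-2, 1e-3):
    print(eps, delta_s(true_path, eps), delta_s(straight, eps))
```

Halving \(\epsilon\) cuts the change in action by roughly a factor of four for the true path (second order in \(\epsilon\)) but only by about a factor of two for the straight line (first order), which is exactly the distinction between a stationary path and an ordinary one.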

In analogy to what we discussed about functions at stationary points, and according to the principle of least (stationary) action, we want \[S[r(t)]=S[q(t)]\] to first order in \(\epsilon\) as \(\epsilon\rightarrow 0\), i.e. we want the actions of all paths very close to the true path to be equal to the action of the true path. From this relationship, then, we may be able to extract useful information about \(q(t)\). The above equation gives us \[\int_{t_1}^{t_2} L(r,\dot{r}, t) \;dt=\int_{t_1}^{t_2} L(q,\dot{q}, t) \;dt.\] But since \(\epsilon\rightarrow 0\), we may perform a multivariate Taylor expansion of the Lagrangian of the perturbed path (in both \(r\) and \(\dot{r}\)) about \(r(t)=q(t)\), keeping only the first-order terms: \[\int_{t_1}^{t_2} \left(L(q,\dot{q}, t) + \frac{\partial L}{\partial r}\big|_{r=q}(r-q) + \frac{\partial L}{\partial \dot{r}}\big|_{r=q}(\dot{r}-\dot{q})\right) dt=\int_{t_1}^{t_2} L(q,\dot{q}, t) \;dt.\] Note that the first term on the left is common to both sides, and so we drop it from both sides. Also, by definition (see above), \(r-q=\epsilon \eta\), and, by taking a derivative, \(\dot{r}-\dot{q}=\epsilon \dot{\eta}\). Dividing through by the common factor of \(\epsilon\), we now have \[\int_{t_1}^{t_2}\left(\frac{\partial L}{\partial q}\eta + \frac{\partial L}{\partial \dot{q}}\dot{\eta}\right)dt=0.\] Integrating the second term by parts yields \[\int_{t_1}^{t_2}\left(\frac{\partial L}{\partial q}\eta - \frac{d}{dt}\frac{\partial L}{\partial \dot{q}}\eta\right)dt=\int_{t_1}^{t_2}\left(\frac{\partial L}{\partial q}-\frac{d}{dt}\frac{\partial L}{\partial \dot{q}}\right)\eta(t)\;dt=0.\] Can you see why the surface term from the integration by parts vanished? Hint: it's due to a condition we enforced on \(\eta(t)\). Now we have the integral of an expression that is multiplied by the function that produces all of our varied paths. Since \(\eta(t)\) is completely arbitrary and not necessarily zero, but the integral of the product vanishes for every such \(\eta(t)\), the other factor must be zero!
Draw yourself a few pictures of integrals and think about this for a bit if you don't see it right away. This reasoning, albeit in a more rigorous incarnation, is known as the fundamental lemma of the calculus of variations. Thus we arrive at the famous Euler-Lagrange equation, \[\frac{\partial L}{\partial q}-\frac{d}{dt}\frac{\partial L}{\partial \dot{q}}=0,\] which is, in general, a second-order differential equation for the true path \(q(t)\) of the system, as \(L\) is a function of \(q(t)\) and \(\dot{q}(t)\). In general, for a system with \(d\) degrees of freedom, we have a system of differential equations: \[\frac{\partial L}{\partial q_i}-\frac{d}{dt}\frac{\partial L}{\partial \dot{q_i}}=0,\] with \(i=1\dots d.\)
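
If you'd like to see the stationarity claim with actual numbers, here is a minimal numerical sketch. Everything specific in it is my own illustrative assumption rather than part of the derivation: a unit mass in uniform gravity with \(L=\frac{1}{2}m\dot{q}^2-mgq\), endpoints \(q(0)=q(T)=0\), the perturbation \(\eta(t)=\sin(\pi t/T)\), and a simple midpoint rule for the action integral. The path `q_true` solves the Euler-Lagrange equation \(\ddot{q}=-g\) for this Lagrangian.

```python
import math

# Illustrative choices (assumptions, not from the derivation above):
# a unit mass in uniform gravity, L = (1/2) m qdot^2 - m g q,
# travelling from q = 0 at t = 0 back to q = 0 at t = T.
m, g, T = 1.0, 9.81, 1.0
v0 = 0.5 * g * T  # launch speed that returns the particle to q = 0 at t = T


def lagrangian(q, qdot):
    return 0.5 * m * qdot**2 - m * g * q


def q_true(t):
    # Solves the Euler-Lagrange equation qddot = -g with q(0) = q(T) = 0.
    return v0 * t - 0.5 * g * t**2


def qdot_true(t):
    return v0 - g * t


def eta(t):
    # Perturbation shape; vanishes at both endpoints as required.
    return math.sin(math.pi * t / T)


def etadot(t):
    return (math.pi / T) * math.cos(math.pi * t / T)


def action(eps, n=2000):
    """Midpoint-rule estimate of S[q + eps * eta] on [0, T]."""
    h = T / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += lagrangian(q_true(t) + eps * eta(t),
                            qdot_true(t) + eps * etadot(t))
    return total * h


eps = 1e-3
S0 = action(0.0)
# Central difference in eps: the first-order change of the action.
first_variation = (action(eps) - action(-eps)) / (2 * eps)
# What is left over after the (vanishing) first-order term.
second_order = (action(eps) - S0) / eps**2

print("S[q]              =", S0)
print("dS/d(eps) at 0    =", first_variation)
print("(S(eps)-S0)/eps^2 =", second_order)
```

The first-order change should come out very close to zero, while the leftover scales like \(\epsilon^2\); for this particular Lagrangian and \(\eta\) the coefficient works out analytically to \(m\pi^2/(4T)\). Try swapping in a path that does not satisfy \(\ddot{q}=-g\) and watch the first-order term reappear.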

Back to reality

Okay, so we've (with no small amount of black magic) been able to reduce the principle of least action down to a differential equation. The question now is, of course: does it work? Let's do a few easy examples to make sure we agree with the answers we get from Newtonian mechanics.

The free particle

The free particle is exactly what it sounds like: a particle free to move in all 3 dimensions with no forces on it. This system has 3 degrees of freedom, and since it has no particular symmetry/constraints, we will choose our generalized coordinates to be the usual Cartesian coordinates. The first step is to write the particle's Lagrangian. Since there is no potential energy, we have \[L=T=\frac{1}{2}m \mathbf{v}^2=\frac{1}{2}m(\dot{x}^2 + \dot{y}^2 + \dot{z}^2).\] The Euler-Lagrange equations effectively impose the principle of least action on our system's path: \[\begin{align}\frac{\partial L}{\partial x}-\frac{d}{dt}\frac{\partial L}{\partial \dot{x}}=- \frac{1}{2}m\frac{d}{dt}\frac{\partial}{\partial \dot{x}}\dot{x}^2&=0 \\ \frac{\partial L}{\partial y}-\frac{d}{dt}\frac{\partial L}{\partial \dot{y}}=- \frac{1}{2}m\frac{d}{dt}\frac{\partial}{\partial \dot{y}}\dot{y}^2&=0 \\ \frac{\partial L}{\partial z}-\frac{d}{dt}\frac{\partial L}{\partial \dot{z}}=- \frac{1}{2}m\frac{d}{dt}\frac{\partial}{\partial \dot{z}}\dot{z}^2&=0 \end{align}\] Taking the \(\dot{q}_i\) derivatives and the time derivative yields (check it for yourself!), in vector notation, \[m\mathbf{\ddot{q}}=m\mathbf{a}=0,\] where \(\mathbf{q}\) is the position vector of the particle. This is exactly what we would have obtained from Newton's laws: that with no forces acting on the particle, the particle will have zero acceleration. Integrating this equation of motion, of course, yields motion with constant velocity.
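
For the skeptical, here is a rough numerical sanity check that constant-velocity motion really does have the least action among paths sharing its endpoints. It is only a sketch under my own assumptions: the path is chopped into straight segments, so the action becomes a sum of \(\frac{1}{2}m|\Delta\mathbf{x}|^2/\Delta t\) terms, and the endpoints, travel time, and the helper names `discrete_action`, `straight_path` and `wiggle` are all made up for illustration.

```python
import random


def discrete_action(points, dt, m=1.0):
    """Action of a piecewise-linear free-particle path.

    Each segment is traversed at constant velocity dx/dt, so it
    contributes (1/2) m |dx|^2 / dt to the action.
    """
    total = 0.0
    for a, b in zip(points, points[1:]):
        dist_sq = sum((bj - aj) ** 2 for aj, bj in zip(a, b))
        total += 0.5 * m * dist_sq / dt
    return total


def straight_path(start, end, n):
    """n + 1 equally spaced points from start to end (constant velocity)."""
    return [tuple(s + (e - s) * k / n for s, e in zip(start, end))
            for k in range(n + 1)]


def wiggle(points, scale, rng):
    """Randomly displace the interior points; the endpoints stay fixed."""
    inner = [tuple(c + rng.uniform(-scale, scale) for c in p)
             for p in points[1:-1]]
    return [points[0]] + inner + [points[-1]]


# Hypothetical endpoints and travel time, chosen only for illustration.
start, end = (0.0, 0.0, 0.0), (1.0, 2.0, 3.0)
total_time, n = 2.0, 50
dt = total_time / n

line = straight_path(start, end, n)
S_line = discrete_action(line, dt)

rng = random.Random(0)  # fixed seed so the run is repeatable
S_wiggled = [discrete_action(wiggle(line, 0.05, rng), dt) for _ in range(5)]

print("straight line:", S_line)
print("wiggled paths:", S_wiggled)
```

Every wiggled path should report a larger action than the straight line, whose action is just \(\frac{1}{2}m|\Delta\mathbf{x}|^2/T\) for the whole trip.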

Simple pendulum

Let's now analyze the pendulum via Lagrangian mechanics - it will be our first exercise in using generalized coordinates. There is only one degree of freedom, the angle to the vertical, \(\theta\). The only force that does work on the bob is gravity (the tension in the string is a constraint force that always acts perpendicular to the bob's motion, so it never enters the energies!), which has potential energy given by the usual \(mgh\) where \(h\) is the bob's height. There is a slight subtlety here in the fact that we can choose to measure the bob's height from wherever we want. Why? Well, suppose we chose to measure the height from the pendulum's point of suspension rather than from some other reference level. What we've effectively done is add a constant to the height. In saying that we can choose to zero the height wherever we want, we are claiming that this constant does not change the differential equation that the Euler-Lagrange equation gives us. Let's see if that's true. If the height is measured from the point of suspension, the height of the pendulum at any given angle is \(-l\cos\theta\) where \(l\) is the length of the pendulum. Draw a picture and check this! Since the Euler-Lagrange equations take derivatives with respect to \(\theta\) and \(\dot{\theta}\), the constant added (to make it \(c-l\cos\theta\)) would never even make it to the equation of motion.
Thus the Lagrangian can simply be written \[L=T-U=\frac{1}{2}mv^2+mgl\cos\theta.\] Since for rotational motion we (hopefully) remember that the tangential velocity \(v\) is related to the angular velocity as \(v=l\omega=l\dot{\theta}\), we now have: \[L=\frac{1}{2}ml^2\dot{\theta}^2+mgl\cos\theta.\] Inserting this into the Euler-Lagrange equation for \(\theta\), we find \[\begin{align}\frac{\partial}{\partial \theta}\left(\frac{1}{2}ml^2\dot{\theta}^2+mgl\cos\theta\right)-&\frac{d}{dt}\frac{\partial}{\partial \dot{\theta}}\left(\frac{1}{2}ml^2\dot{\theta}^2+mgl\cos\theta\right)=0 \\ -mgl\sin\theta-&\frac{d}{dt}\left(ml^2\dot{\theta}\right)=0 \\ ml^2\ddot{\theta}=&-mgl\sin\theta,\end{align}\] from which we obtain the usual equation of motion for the pendulum: \[\ddot{\theta}=-\frac{g}{l}\sin\theta.\]
Notice that we managed to derive the differential equations of motion for the pendulum without discussing forces! Additionally, we needed only concern ourselves with one variable - the degree of freedom, \(\theta\)! Even better, there was not a word about the tension in the string, something we don't particularly care about in computing the pendulum's motion; tackling the problem via Newtonian mechanics, on the other hand, would have required balancing a projection of gravity against the tension.
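
Since \(\ddot{\theta}=-\frac{g}{l}\sin\theta\) has no elementary closed-form solution, one way to get a feel for it is to integrate it numerically. The sketch below is one possible approach, not the only one: I've picked a classical fourth-order Runge-Kutta step, and the values \(g=9.81\), \(l=1\), \(m=1\), the release angles, and the function names `rk4_step`, `energy` and `swing` are all my own illustrative assumptions. The energy function is \(T+U\) with the same \(U=-mgl\cos\theta\) used in the Lagrangian above.

```python
import math


def rk4_step(theta, omega, dt, g, l):
    """One classical Runge-Kutta step for thetaddot = -(g/l) sin(theta)."""
    def accel(th):
        return -(g / l) * math.sin(th)

    k1t, k1w = omega, accel(theta)
    k2t, k2w = omega + 0.5 * dt * k1w, accel(theta + 0.5 * dt * k1t)
    k3t, k3w = omega + 0.5 * dt * k2w, accel(theta + 0.5 * dt * k2t)
    k4t, k4w = omega + dt * k3w, accel(theta + dt * k3t)
    theta += dt * (k1t + 2 * k2t + 2 * k3t + k4t) / 6
    omega += dt * (k1w + 2 * k2w + 2 * k3w + k4w) / 6
    return theta, omega


def energy(theta, omega, m, g, l):
    """Total energy T + U, with height measured from the suspension point."""
    return 0.5 * m * l**2 * omega**2 - m * g * l * math.cos(theta)


def swing(theta0, duration, n_steps=2000, g=9.81, l=1.0):
    """Release the bob from rest at theta0 and integrate for `duration`."""
    theta, omega = theta0, 0.0
    dt = duration / n_steps
    for _ in range(n_steps):
        theta, omega = rk4_step(theta, omega, dt, g, l)
    return theta, omega


# Illustrative parameter values (assumptions, not from the text).
g, l, m = 9.81, 1.0, 1.0
small_angle_period = 2 * math.pi * math.sqrt(l / g)

# A small swing should be back where it started after one small-angle period.
theta_small, omega_small = swing(0.01, small_angle_period)

# A large swing takes longer, so it has not quite returned yet.
theta_big, omega_big = swing(1.0, small_angle_period)
energy_drift = (energy(theta_big, omega_big, m, g, l)
                - energy(1.0, 0.0, m, g, l))

print("small swing after one period:", theta_small)
print("large swing after one period:", theta_big)
print("energy drift (large swing):  ", energy_drift)
```

The small swing should land essentially back at its release angle, which is the familiar small-angle period \(2\pi\sqrt{l/g}\) at work. The large swing lags behind because the true period grows with amplitude, and the energy drift gives a quick check that the integrator is behaving.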

Although this approach might appear more convoluted than straightforwardly applying Newton's laws, it is only because these two examples were very simple, and are easily analyzed using forces. But before we move on to more complicated examples, let's prove that the principle of least action is equivalent to Newton's second law.

Newton again

Take a single-particle system with some arbitrary, position-dependent potential energy \(U(\mathbf{x}).\) Without any further information, we may write the system's Lagrangian as \[L=T-U=\frac{1}{2}m\mathbf{\dot{x}}^2-U(\mathbf{x}).\] Directly inserting this into the Euler-Lagrange equations yields, for the \(x\) component (the \(\dot{y}\) and \(\dot{z}\) kinetic terms are omitted below since they vanish under both derivatives), \[\begin{align}\frac{\partial}{\partial x}\left(\frac{1}{2}m\dot{x}^2-U(\mathbf{x})\right)-&\frac{d}{dt}\frac{\partial}{\partial \dot{x}}\left(\frac{1}{2}m\dot{x}^2-U(\mathbf{x})\right)=0 \\ -\frac{\partial U}{\partial x}-&\frac{d}{dt}\left(m\dot{x}\right)=0 \\ m\ddot{x}=&-\frac{\partial U}{\partial x}\end{align}\] and as you should remember from Physics 101, the right-hand side of the last equation, by definition of potential energy, is simply the \(x\) component of the force! Consolidating all the components into vector form, we obtain Newton's second law, \[\mathbf{F}=m\mathbf{\ddot{x}},\] and we may now breathe a bit easier, what with good old Newton getting along with the principle.
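
As a quick check of this equivalence, the sketch below evaluates the left-hand side of the Euler-Lagrange equation by brute-force finite differences along a trial path. The specifics are my own assumptions for illustration: a one-dimensional spring potential \(U=\frac{1}{2}kx^2\) with \(m=1\), \(k=4\), the step sizes `h` and `d`, and the names `el_residual`, `newtonian` and `impostor`. For this Lagrangian the residual equals \(-(m\ddot{x}+\partial U/\partial x)\), so it vanishes exactly when Newton's second law holds.

```python
import math

# Illustrative system (an assumption): a mass on a spring, U = (1/2) k x^2.
m, k = 1.0, 4.0
omega = math.sqrt(k / m)


def lagrangian(x, v):
    return 0.5 * m * v**2 - 0.5 * k * x**2


def el_residual(path, t, h=1e-3, d=1e-5):
    """Finite-difference estimate of dL/dx - d/dt(dL/dv) along `path` at t."""
    def vel(s):
        return (path(s + h) - path(s - h)) / (2 * h)

    def dL_dx(s):
        x, v = path(s), vel(s)
        return (lagrangian(x + d, v) - lagrangian(x - d, v)) / (2 * d)

    def dL_dv(s):
        x, v = path(s), vel(s)
        return (lagrangian(x, v + d) - lagrangian(x, v - d)) / (2 * d)

    return dL_dx(t) - (dL_dv(t + h) - dL_dv(t - h)) / (2 * h)


def newtonian(t):
    # Solves m * xddot = -k * x, i.e. Newton's second law for the spring.
    return math.cos(omega * t)


def impostor(t):
    # Same shape but the wrong frequency, so it violates Newton's law.
    return math.cos(1.5 * omega * t)


res_true = el_residual(newtonian, 0.3)
res_wrong = el_residual(impostor, 0.3)

print("residual on the Newtonian path:", res_true)
print("residual on the impostor path: ", res_wrong)
```

The residual on the Newtonian path should be zero up to finite-difference error, while the impostor path leaves a residual of order one. Nothing in `el_residual` knows about forces; it only ever sees the Lagrangian.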


This has already become quite a long post, but before concluding, I'd like to provide two exercises which are easier to do using Lagrangian mechanics than through Newtonian mechanics.

1. You've no doubt encountered the Atwood machine (bottom left) at some point, which can be solved fairly easily by considering gravity and tension. The motion of the double Atwood machine (bottom right), however, is not as trivial, and is perhaps more easily solved using Lagrangian mechanics. The first exercise is to solve for the motion of the double Atwood machine, ignoring the mass of the pulleys.  Hint #1: How many degrees of freedom are there? The variables in the picture should give you a clue. Hint #2: Before writing out the Lagrangian, write down the positions of each mass in terms of the variables shown. Then use these positions to write down the kinetic and potential energies (measuring height with respect to the top). Remember, the Lagrangian is for the whole system, so the total energies are found by summing the energies for each mass! If you get stuck - and only then! - you might want to glance at this.

2. Since we've worked through the simple pendulum, it's time for you to tackle the double pendulum (shown below). Hint #1: How many degrees of freedom are there? It should be equal to the number of angles needed to describe the system at any point in time. Hint #2: The kinetic energy of the system is similar to that of the simple pendulum. The potential energy is a little trickier, and you will have to do a bit of trigonometry to determine the height of the lower mass. Hint #3: Don't actually do this problem. Well okay, do it, but it's okay to stop when the Euler-Lagrange-ing gets sufficiently tedious. This is mostly an exercise in being able to write down the Lagrangian of a complicated system. You should check your work here.


Hopefully this post gave you a glimpse of the power and elegance of the Lagrangian formulation of mechanics. Instead of fiddling around with Newton's second law, balancing forces and playing with trigonometry, we now have a much more mechanical\(^*\) technique for deriving the system's equations of motion: simply write down the Lagrangian and take a few partial derivatives. Not only is this process pretty foolproof, it is based on a very elegant and general variational principle that is easily extended to quantum mechanics and general relativity. Furthermore, there are many interesting aspects of mechanics that become more transparent through the Lagrangian approach; in fact, this is where Lagrangian mechanics truly shines! I hope to work through a few neat topics in the next few posts. I've always found one of the most fascinating aspects of Lagrangian mechanics to be how one can motivate conservation laws (such as conservation of energy and momentum) from simple symmetry principles (such as invariance under translations in time or space), so that's probably what I'll discuss next time.

Further reading

For a more complete treatment of the topics discussed here I strongly recommend Cornelius Lanczos' lucid The Variational Principles of Mechanics or by way of free online resources, John Baez's course notes or Susskind's Stanford lectures on YouTube. The more ambitious reader might choose to tackle Landau and Lifshitz's terse but elegant classic, Mechanics.

* My puns are always intended

Image credits: links provided, otherwise from Wikipedia