# Differentiating a Physics Simulator

Have you ever seen something happen that seems almost impossible to replicate? Like dropping something and having it land in just the right way, or shooting an arrow and having it land on its tip?

From a simulation standpoint, you might ask what initial conditions are required to produce that outcome. To solve this, you could attempt to search the whole space of possibilities. For instance, you could try every possible position and velocity of the arrow and select the ones that land it on its tip. However, this is plainly impractical: despite there being only 7 degrees of freedom (3 for rotation, 3 for position and one for initial velocity), the search space is already huge, and chances are the solutions make up only a tiny portion of it.
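To get a sense of scale, here's a back-of-the-envelope estimate (the sampling resolution is just an illustrative assumption, not a number from any real setup):

```python
# Illustrative estimate of a brute-force grid search over the arrow problem.
degrees_of_freedom = 7
samples_per_dof = 100  # assumed resolution; real-world tolerances would need far more

total_simulations = samples_per_dof ** degrees_of_freedom
print(total_simulations)  # 100000000000000, i.e. a hundred trillion simulations
```

Even at a coarse 100 samples per degree of freedom, you'd need on the order of 10^14 simulations, and almost all of them would miss.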

## Ok first off, why should we care?

The arrow problem is arguably classified under 'cool' but definitely not 'practical'. However, problems where the search space is huge and we are looking for only a small subset of it are ubiquitous. Of rising visibility, we have Machine Learning, most forms of which involve finding a set of magic parameters that achieve a given task. Modern state-of-the-art models can have billions of parameters: for GPT-3, OpenAI solved for 175 billion parameters whose very special values enable GPT-3 to model English pretty well. The search space is enormous, and OpenAI definitely did not naively brute-force it.

## Differentiable Problems

The main reason it is even possible to solve such problems is that some of them are *differentiable*. These problems can often be seen as optimization problems. In other words, the problem can be formalized as extremizing the value of some function f(**p**), where **p** is a point in the search space of parameters. E.g. GPT-3 has a metric that scores how well it is performing its task, and by changing the 175 billion parameters, the aim is to maximize or minimize that score.

For a problem to be *differentiable*, it simply means that it *makes sense* to find the gradient of f with respect to **p**: f is *differentiable* with respect to **p**. In other words, it makes sense to ask: how much does f(**p**) change given a small change in **p**? Most problems are not differentiable. For those, a small change in **p** can change f(**p**) abruptly or erratically, so there is no well-defined gradient to follow.
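A quick toy illustration of what "the gradient makes sense" means in practice (this is my own example, not from any framework): for a smooth f, a finite-difference estimate of df/d**p** settles to a stable value, and that value tells you which way to nudge **p**.

```python
# Toy example: estimating df/dp numerically with a central finite difference.
def f(p):
    return (p - 3.0) ** 2  # a smooth, differentiable function

def finite_difference(f, p, h=1e-6):
    # (f(p+h) - f(p-h)) / 2h approximates the gradient df/dp at p
    return (f(p + h) - f(p - h)) / (2 * h)

grad = finite_difference(f, p=1.0)
print(grad)  # close to the true derivative 2 * (1 - 3) = -4
```

The negative gradient here tells us that increasing p from 1.0 decreases f, which is exactly the information gradient descent exploits.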

If a problem is *differentiable*, this makes extremizing f(**p**) much easier. If we know how f(**p**) changes given a small change in **p**, we can perturb **p** in just the right way to increase or decrease the value of f(**p**), and we can do this again and again until f(**p**) stops changing. This idea is known as *Gradient Descent*, and it has been the primary method for solving huge *differentiable* optimization problems. Of course, given the surprising detail of reality, gradient descent in practice can get really complicated, and I'm skipping a lot of details here.
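The core loop really is that simple. Here's a minimal sketch on a one-dimensional toy function (my own example; real problems have millions of dimensions and fancier update rules):

```python
# Minimal gradient descent on f(p) = (p - 3)^2, whose minimum is at p = 3.
def grad_f(p):
    return 2.0 * (p - 3.0)  # analytic gradient of f

p = 0.0             # initial guess
learning_rate = 0.1
for _ in range(200):
    p -= learning_rate * grad_f(p)  # step against the gradient to decrease f

print(round(p, 4))  # converges to ~3.0
```

Each step moves **p** a small amount in the direction that decreases f, and the loop repeats until the updates become negligible.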

The need to solve such optimization problems has resulted in a lot of cool tech. Machine Learning frameworks such as TensorFlow, PyTorch and Keras provide an easy way to define an optimization problem and then solve it, computing the required gradients automatically.

## But how does this relate to the arrow problem?

It turns out some physics problems are differentiable! Some really cool people have created frameworks that make defining and solving these kinds of problems really easy; DiffTaichi is one example. I modified one of the sample codes to solve for the initial velocity field that mixes a fluid in just the right way to produce this:

It turns out that, with all the complexity of mixing fluids, the problem is still differentiable to some extent. It is modeled as f(**p**), where f scores how close the fluid is to the target image after 100 frames, and **p** is the initial velocity field of the fluid. The aim is then to maximize the value of f(**p**).
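To make the shape of this concrete, here is a drastically simplified stand-in (this is NOT the DiffTaichi code, just a one-dimensional toy of mine): a particle drifts for 100 frames, f scores how close it ends up to a target, and we optimize the initial velocity by pushing gradients back through every simulation step.

```python
# Toy stand-in for optimizing through a simulator (not the actual fluid code):
# a particle moves at constant velocity v for N frames; the loss scores how
# far its final position is from a target. Each step is differentiable, so
# d(loss)/dv can be propagated through all N steps via the chain rule.

N = 100          # number of simulated frames
dt = 0.01        # time step
target = 2.0     # desired final position

def simulate(v):
    x = 0.0
    for _ in range(N):
        x += v * dt  # one differentiable physics step
    return x

def grad_loss(v):
    # Chain rule through the simulator: d(final x)/dv = N * dt
    return 2.0 * (simulate(v) - target) * (N * dt)

v = 0.0
for _ in range(100):
    v -= 0.5 * grad_loss(v)  # gradient descent on the initial velocity

print(round(v, 3))  # ~2.0: the particle must cover 2.0 units in N*dt = 1s
```

The real fluid problem replaces the single scalar v with a whole velocity field and the trivial drift step with a fluid solver, but the structure of the optimization is the same.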

On my laptop this took a while to solve (about 2 hours), and I had to add some upscaling tricks to solve it properly at that resolution, because sometimes plain gradient descent just doesn't cut it.

Cool tricks aside, differentiating a physics simulator has many practical applications (I like to think of it as differentiating the world). Gradient descent has been used for many design optimizations, such as optimizing mechanical designs like plane wings and linkages. There are definitely way more applications, but these are the ones that came to mind.