# Wedding Seat Optimization using Simulated Annealing

Or what I do when I go back to Singapore for one friend’s wedding (congratulations Crystalbel and Kenneth!) and find that all but three of my friends are married / engaged.

Turns out they all have a similar problem: wedding dinner planning. In particular, seat planning is a nightmare. It doesn’t take much imagination for one to picture one’s favorite relative specifying seating arrangements to the most minute detail. (“eh my sister’s uncle’s brother cannot sit with my brother’s cousin because they don’t see each other eye to eye.”) You probably don’t want to seat ex-es together as well. Same goes for ex-colleagues who may be awkward with each other. Whatever the reason is, seating arrangement is tricky.

Thankfully, I’m not the first person to come across this problem. There’s a very nice paper here by Bellows and Peterson on this exact problem. This paper provided a neat framework to think about this problem, but aimed for the global optimal via mixed-integer linear optimization. I find that probably too overkill, and an approximate method that sufficient avoids the local minima trap should suffice. The paper also uses a proprietary solver which is a huge turn-off, and no one wants that for a wedding. Heh. Finally, I also wanted to learn some simulated annealing. So here’s a very short script that accomplishes that.

## Framing This as a Combinatorial Optimization Problem

Going back to the problem statement, we can think about this as an combinatorial optimization problem. Essentially, seating preferences can be expressed as a series of weights between two parties. For example, if Linan hates sitting with Spongebob, I can assign the event that they sit together a positive cost of 50. If Gabriel and Kristie are a couple, and they would very much want to sit together, then them sitting together would incur a negative cost of -50. If Spongebob don’t want to sit with Squidward but won’t die if they do end up sit together, I can assign them a slightly lower positive cost (e.g. 20).

Then, my overall goal is to minimize the cost of an arrangement. For example, if I had an arrangement that made Gabriel sit together with Kristie, Linan sit with Spongebob, and Spongebob sit with Squidward, the overall cost would be -50 + 50 + 20 = 20. However, if I moved Spongebob to a different table such that Gabriel, Kristie, Linan, and Squiward are at the same table, the cost for the arrangement becomes -50 and that would be a way more preferable arrangement compared to the original. We assume of course that unspecified pairs (e.g. Gabriel and Squidward, Kristie and Linan) have a cost of 0 i.e. they don’t care if they do sit together or not.

You can think of this as a graph problem with nodes being the dinner participants and edges being the costs.

So let’s set up the problem. Assume that we have 10 guests initially and they hate / love each other in the following manner. Assume that each table seats 5 people, so we have two tables. Admire how creative the guest names are and how I have to import a whole damn package for that.

['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']

Given relationships_edges, I want to convert them into a symmetric cost matrix so that matrix operations become easier (and faster! thanks numpy). To do this, I use the networkx.

I also normalize the costs by a factor of 100 so that it plays nice with the cost and acceptance probability later.

[[ 0.   -0.5   0.    0.5   0.    0.    0.    0.    0.    0.  ]
[-0.5   0.    0.    0.    0.    0.    0.    0.    0.    0.  ]
[ 0.    0.    0.   -0.5   0.    0.    0.    0.    0.    0.  ]
[ 0.5   0.   -0.5   0.    0.    0.    0.    0.    0.    0.  ]
[ 0.    0.    0.    0.    0.    0.25  0.    0.    0.    0.  ]
[ 0.    0.    0.    0.    0.25  0.   -0.5   0.    0.    0.  ]
[ 0.    0.    0.    0.    0.   -0.5   0.    0.    0.    0.  ]
[ 0.    0.    0.    0.    0.    0.    0.    0.   -0.5   0.  ]
[ 0.    0.    0.    0.    0.    0.    0.   -0.5   0.    0.  ]
[ 0.    0.    0.    0.    0.    0.    0.    0.    0.    0.  ]]
-0.5

Now we have a symmetric matrix of costs. An element at relationships_mat[i, j] describes the cost of putting guest i and guest j together in the same table. For example, guest_list = 'A' and guest_list = 'B'. The cost of putting A and B together at the same table is -50 (i.e. they love each other! yay!). relationships_mat[0, 1] gives us the normalized cost of -50 / 100 = -0.5 as expected.

Notice how networkx converted a nodelist of guest_list into positional integers for me. Isn’t that neat?

[[1 0 1 0 0 1 1 1 0 0]]

We can express a single “table” as a row in a matrix by marking guests present at the table with a 1 and those not present as 0 for the positional integer that represents them. For example, a table that contains 'A', 'C', 'F', 'G', 'H' can be expressed using the table_0_seats above.

[[-1.]]

We then observe the interesting (okay probably not interesting) property that $$SRS^T$$ where $$S$$ is the seats matrix, $$R$$ is the relationships cost matrix relationships_mat and $$S^T$$ is the transpose of the seats matrix gives us the cost of a particular arrangement (or more precisely, the total cost incurred per person. e.g. since F and G are sitting at the same table, the cost incurred by each of them is -0.5, so the total cost for the table is -1).

Using table_0_seats as an example, the result of doing the quadratic multiplication is:

$SRS^T = \sum^{guests}_{i=0} \sum^{guests}_{j=0} S_{1,i} \times R_{i,j} \times S^T_{j,1}$

Now this is just the formulation for a single table. We can easily imagine each row of $$S$$ as a single table. In that case, we get:

$SRS^T = C$

$C_{k, l} = \sum^{guests}_{i=0} \sum^{guests}_{j=0} S_{k,i} \times R_{i,j} \times S^T_{j,l}$

for all tables $$k$$ and $$l$$

This is what we do below:

[[-1.   -0.75]
[-0.75  0.  ]]
-1.0

Then, the diagonal of $$C$$ should give the cost of tables where $$k=l$$ i.e. our actual costs. The sum of the diagonal (or the trace) gives us the total cost of a given arrangement.

$Cost = \sum^{tables}_{m} C_{m,m}$

Using the example above, table_seats_a puts 'A', 'C', 'F', 'G', 'H' in one table and 'B', 'D', 'E', 'I', 'J' in another. F loves G, so each incurs a normalized negative cost of -0.5. So np.trace(table_costs_a) gives us -1.

[[-1.    0.25]
[ 0.25 -2.  ]]
-3.0

We can probably do better than that. So let’s try putting 'A', 'B', 'C', 'E', 'J' in one table and 'D', 'F', 'G', 'H', 'I' in the other. In this case, our first table gives us a cost of -1 because of A and B. The second table gives us -2 due to F G and H I. This leaves us with a total cost of np.trace(table_costs_b) = -3.

The advantage of defining our cost calculation in terms of a matrix quadratic operation is that numpy is fucking fast and tons of optimizations are built around those things. Plus it’s kind of elegant heh.

## Constraints, Requirements, and Strategy

Now we just need to minimize $$C$$ subject to some constraints:

• The number of people in each table should not change.
• As a corollary (and if you defined the starting point correctly), the total number of guests with seats should not change and hence everyone should be seated (otherwise one solution where people only hate each other and there’s no love is to keep removing people).
• Avoiding local minima is very important for obvious reasons.
• Finding the absolute best answer is not that important. Having an approximate answer’s probably good enough.

I chose simulated annealing because I’ve been wanting to learn that for quite a while. (thanks Quan!) It also gives me a really simple way to solve a pretty intractable problem (wedding dinners often contain 200 or more guests) while satisfying all the requirements I have above – especially the one on local minimum.

If you want a good introduction to simulated annealing, I suggest you read this: http://katrinaeg.com/simulated-annealing.html. It’s a really good tutorial, and she kicks ass at explaining simulated annealing. If you’re too lazy to read that, I stole the best part of the article that sums up simulated annealing nicely:

1. First, generate a random solution
2. Calculate its cost using some cost function you’ve defined
3. Generate a random neighboring solution
4. Calculate the new solution’s cost
5. Compare them:
• If c_new < c_old: move to the new solution
• If c_new > c_old: maybe move to the new solution. Be more eager to move initially, and less eager to move later
6. Repeat steps 3-5 above until an acceptable solution is found or you reach some maximum number of iterations.

Seriously though, go read the article. It’s awesomesauce.

## Simulated Annealing Functions

First we build our seat reshaping function:

reshape is a constant time operation that numpy provides that’s good for matrix heavy code. Helps you keep track of where you are and avoid cryptic bugs. I want my seat matrix $$S$$ to be of the shape (table_count, guests) so that the $$SRS^T$$ operation can be done easily.

Then we define the cost function. This should be nothing new. It is basically the trace of the cost matrix as mentioned earlier. It calculates the cost for a given seating arrangement using the relationship cost matrix.

The take_step function does step 3: generate a random neighboring solution. It does so by picking two random tables and swapping two random guests from the two tables. This has the very nice property of:

• Being random (no seriously this is important. If you’re proceeding in a definite direction ala gradient descent you’ll find yourself in a local minimum pretty quickly. You’ll also limit your space exploration tremendously if you’re not random.
• Never changing the number of guests per table. This affords us tons of flexibility, including specifying tables with different number of guests (a common problem for odd ballrooms)
• Similarly, not seating the same guest at two tables / having empty table etc.

Finally, prob_accept is the function that gives the probability of us moving from the current position to the neighboring position. If the neighboring position is indeed better (i.e. cost_new < cost_old) we move (i.e. probabilty is 1). Otherwise, we “may” move using the function np.exp((cost_old - cost_new) / temp) where temp is a decreasing function of the number of iterations. This means that our probability for taking a worse neighboring position over the current position

• Decreases the shittier the new position is. (i.e. our neighboring position is already going to be worse than our current position because otherwise, we would have returned probability 1. Given that it is worse than our current position, we are more willing to move to a position that is less shitty than a position that is more shitty.) This is expressed in cost_old - cost_new. Since cost_old < cost_new, (cost_old - cost_new) < 0. The more negative (cost_old - cost_new) is, the lower np.exp((cost_old - cost_new) / temp) is and hence the lower the probability.
• Decreases as we do more iteration (i.e. we become more conservative as we run more iterations). This is expressed in the temp denominator. Since the numerator (cost_old - cost_new) is negative, having an increasingly smaller temp would make (cost_old - cost_new) / temp more negative, hence the probability lower.

The four functions are reproduced below:

## Annealing

Now we get to the anneal function that runs the whole damn thing. We basically keep iterating until we reach a minimum temperature. The code should be pretty self explanatory and maps 1:1 to the pseudocode earlier. Essentially, it:

• Starts with an initial position
• While the current temperature is above the minimum temperature,
• For n_iter steps,
• Calculate a neighboring position (pos_new)
• Calculate the cost of the neighboring position (cost_new)
• Calculate the acceptance probability (p_accept) for the neighboring position
• “Maybe” move to the neighboring position
• Reduce current temperature by a given multiple (default value of alpha=0.9)

audit_trail lets me plot pretty graphs.

[[0 0 1 1 1 0 0 1 1 0]
[1 1 0 0 0 1 1 0 0 1]]
-4.0

And voila! We get our result. Turns out an optimal arrangement is to put 'C', 'D', 'E', 'H', 'I' in one table and 'A', 'B', 'F', 'G', 'J' in the other. The first table makes use of C D and H I. The second table makes use of A B and F G. That’s all the negative weights we have specified. Furthermore, it has avoided the positive weight combinations (A D and E F).

To see the journey we took to get here, we can use the audit_trail data.

<matplotlib.axes._subplots.AxesSubplot at 0x118a97748>   We see that

• cost_old decreases with random spikes initially due to us (adventurously!) exploring neighboring solutions.
• temp decreases (duh)
• p_accept decreases along with temp unless we find better solutions (in which case we set p_accept to 1)

This is fucking awesome :D

## Questions?

Why are all your friends getting married?

Because they’re on a mission to repopulate Singapore and reduce our dependency on foreign talents like me. They’re also probably more charming than me.

When’s yours?

Fuck off.

I noticed you used table_count = len(guest_list) // table_size. That means all tables have to be of the same size right? What if I want different table sizes?

Ah! Yes. Great question. There’s actually no such assumption. You just need to have a constant number of tables. We take neighboring steps by swapping two guests in two random tables. That means we don’t actually change the number of guests in each other. So even if in our example (using 10 guests) we have one table with 1 guest and the other with 9 guests, this would still work.

However, the number of tables would need to be constant. In other words, the table configuration (i.e. number of tables and number of people per table) need to be fixed. If you want to optimize over that, you’ll have to do multiple runs of this algorithm.

Why did you scale by 100 for the cost matrix and not use the cost matrix directly?

Because the way p_accept is defined (np.exp((cost_old - cost_new) / temp)) causes strange things to occur if cost_old is way too different from cost_new. You’d end up getting either 0 or 1 probabilities, which is not useful. Just observe the graph of $$f(x) = e^x$$ in the region of $$f(x) \leq 1$$.

Won’t performance suffer when you start having a 100 tables or thousands of guests?

Yes! Great question again. Given that our cost computation is $$SRS^T$$ where $$S$$ is of the shape (table_count, guest count) and $$R$$ is a symmetric matrix of the shape (guest_count, guest_count), our cost computation can get pretty intensive. Fortunately, we can exploit certain properties of the matrices, namely:

• $$S$$ is going to be super sparse. Each row of $$S$$ represents a table, and it contains 1s for the guests that are at the table and 0s for the other guests. That means a single row will be predominantly 0s. That means $$S$$ itself will be super sparse. scipy has an implementation of a matrix that optimizes for sparse performance. We can use that when things start to slow down.
• $$R$$ is a symmetric matrix (as with most undirected cost matrices). Again, scipy to the rescue – scipy has an implementation of a symmetric matrix that we can use.

Fortunately, numpy is already blazing fast and I haven’t ran into any performance problems for the larger tests I’ve ran (up to 100 guests). We can also farm out the large matrix operations to GPUs (especially if we modify the operations to be in place so that we don’t have to keep loading and unloading memory into the GPU’s RAM).

For now, YAGNI. No performance problems yet, and we know squeezing performance can be easy.

Are you going to make this into a website / command line app / app

Maybe. The fun’s all gone once the solution’s here though. I solved this three weeks ago and procrastinated writing this post for that long. Is this even worth making? I don’t know. Maybe more friends should get married.

Speaking of that, you should get married…

Fuck off.