Advent of Code is a series of small programming challenges, released daily throughout December in the run-up to Christmas. Part 1 of the challenge is given first. On its successful completion, Part 2 is revealed. The challenges are designed to be solved in any programming language. I will be using R.
There will no doubt be a wide variety of ways to solve these problems. I’m going to go with the first thing I think of that gets the right answer. In most cases, I expect that there will be more concise and efficient solutions. Most of the time I’m working in R, it’s within the tidyverse, so I imagine that framework will feature heavily below.
Each participant gets different input data, so my numerical solutions may be different from others. If you're not signed up for Advent of Code yourself, but want to follow along with my data, you can download it from the data links at the beginning of each day's section. The links in the day section headers take you to the challenge on the Advent of Code page.
The challenge is to find two numbers from a list that sum to 2020, then to report their product.
expand.grid() creates a data frame from all combinations of the supplied vectors. Since the vectors are the same, each pair is duplicated. In this case the two numbers in the list that sum to 2020 are 704 and 1316, and we have one row with 704 as Var1 and one with 704 as Var2. slice(1) takes the first occurrence of the pair.
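Here's the full pipeline:

```r
library(dplyr)

expenses <- readLines("data/AoC_day1.txt") %>% as.numeric()

expand.grid(expenses, expenses) %>% 
  mutate(sum = Var1 + Var2) %>% 
  filter(sum == 2020) %>% 
  mutate(prod = Var1 * Var2) %>% 
  slice(1) %>% 
  pull(prod)
```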
The follow-up challenge is the same but with three numbers. I went with essentially the same code but it’s notably slower. There are a lot of repeated calculations here: each triplet appears six times in the table.
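Same approach, with a third copy of the vector:

```r
expand.grid(expenses, expenses, expenses) %>% 
  mutate(sum = Var1 + Var2 + Var3) %>% 
  filter(sum == 2020) %>% 
  mutate(prod = Var1 * Var2 * Var3) %>% 
  slice(1) %>% 
  pull(prod)
```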
We need to find how many passwords are valid according to their policy. The policies and passwords are given as follows:
1-3 a: abcde
1-3 b: cdefg
2-9 c: ccccccccc
Each line gives the password policy and then the password. The password policy indicates the lowest and highest number of times a given letter must appear for the password to be valid. For example, 1-3 a means that the password must contain a at least 1 time and at most 3 times.
```r
library(dplyr)
library(tidyr)
library(stringr)
```
First load the libraries we’ll need. We then read in the data and use tidyr functions to separate out the parts of the policy and the password, making sure to convert the columns to numeric as appropriate:
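```r
passwords <- readr::read_tsv("data/AoC_day2.txt", col_names = FALSE) %>% 
  separate(X1, c("policy", "password"), sep = ":") %>% 
  separate(policy, c("count", "letter"), sep = " ") %>% 
  separate(count, c("min", "max")) %>% 
  mutate(min = as.integer(min), max = as.integer(max))
```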
Next, we use the stringr function str_count() to count how many times the given letter appears in the password, and conditional logic to check whether it is repeated within the specified number of times. Because TRUE has a numeric value of 1 and FALSE has a numeric value of 0, we can sum the resulting column to get a count of how many passwords are valid according to their policies.
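Putting that into a pipeline:

```r
passwords %>% 
  mutate(count = str_count(password, letter)) %>% 
  mutate(password_in_policy = if_else(count >= min & count <= max, TRUE, FALSE)) %>% 
  summarise(correct = sum(password_in_policy)) %>% 
  pull(correct)
```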
Now the policy is interpreted differently. Each policy actually describes two positions in the password, where 1 means the first character, 2 means the second character, and so on. Exactly one of these positions must contain the given letter. How many are valid now?
There were a couple of gotchas here. When I used separate() in the previous part, I had inadvertently left a leading whitespace in front of the password, something that was messing up my indexing with str_sub. Using str_trim() first cleared that up. Also, we need exactly one of the positions to match. | is an inclusive or. We need xor() for exclusive or instead.
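With those two fixes in place:

```r
passwords %>% 
  mutate(password = str_trim(password)) %>% 
  mutate(pos1_letter = str_sub(password, min, min),
         pos2_letter = str_sub(password, max, max)) %>% 
  mutate(match_one = xor(pos1_letter == letter, pos2_letter == letter)) %>% 
  summarise(correct = sum(match_one)) %>% 
  pull(correct)
```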
Having read the map into a character matrix, tree_mat, I now work my way across and down it, using the %% modulo operator to loop round where necessary. The -1 and +1 in the line ((y + right - 1) %% num_col) + 1 are a hack to get round the fact that, for num_col columns, the modulo runs from 0 to num_col - 1, but the column indexes for our matrix run from 1 to num_col.
```r
right <- 3
down <- 1

num_rows <- nrow(tree_mat)
num_col <- ncol(tree_mat)

# start counting trees encountered
trees <- 0

# start square
x <- 1
y <- 1

while (x <= num_rows) {
  # cat("row: ", x, "col: ", y, "\n")
  if (tree_mat[x,y] == "#") trees <- trees + 1
  x <- x + down
  y <- ((y + right - 1) %% num_col) + 1
}

trees
```
[1] 299
Part 2: Checking further slopes
We now need to check several other trajectories, and multiply together the number of trees we find, so we wrap the Part 1 code into a function.
```r
slope_check <- function(tree_mat, right, down) {
  num_rows <- nrow(tree_mat)
  num_col <- ncol(tree_mat)
  
  # start counting trees encountered
  trees <- 0
  
  # start square
  x <- 1
  y <- 1
  
  while (x <= num_rows) {
    if (tree_mat[x,y] == "#") trees <- trees + 1
    x <- x + down
    y <- ((y + right - 1) %% num_col) + 1
  }
  
  trees
}

prod(slope_check(tree_mat, 1, 1),
     slope_check(tree_mat, 3, 1),
     slope_check(tree_mat, 5, 1),
     slope_check(tree_mat, 7, 1),
     slope_check(tree_mat, 1, 2))
```
Using readr::read_tsv() off the bat removes the blank lines, making it impossible to identify the different passports, but reading in the data via readLines() then converting with as_tibble() preserves them, and then allows us to use tidyverse functions for the remaining tidying. cumsum() on a logical vector takes advantage of FALSE having a numeric value of zero and TRUE having a numeric value of one:
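```r
passports <- readLines("data/AoC_day4.txt") %>% 
  as_tibble() %>% 
  separate_rows(value, sep = " ") %>% 
  mutate(new_passport = value == "") %>% 
  mutate(ID = cumsum(new_passport) + 1) %>% 
  filter(!new_passport) %>% 
  select(-new_passport) %>% 
  separate(value, c("key", "value"), sep = ":") %>% 
  relocate(ID)
```

Our data is now in three columns, with ID, key and value, so we need to find the number of passports with all seven fields once cid is excluded:

```r
passports %>% 
  filter(key != "cid") %>% 
  count(ID) %>% 
  filter(n == 7) %>% 
  nrow()
```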
Then we create a check column, which is TRUE when the value for each key meets the required conditions. Those with 7 TRUEs are valid. Note that with case_when() we've left the check column as NA when none of the conditions are met, requiring na.rm = TRUE in the call to sum(). We can get round that by adding a final line to the case_when() stating TRUE ~ FALSE: TRUE here is a catch-all for all remaining rows not covered by the conditions above, which we then set to FALSE, but I find the line TRUE ~ FALSE unintuitive.
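For reference, complete_passports first narrows down to passports that at least have the right number of fields (ignoring cid) and extracts the number from the hgt column; the check itself then looks like this:

```r
complete_passports <- passports %>% 
  filter(key != "cid") %>% 
  add_count(ID) %>% 
  filter(n == 7) %>% 
  select(-n) %>% 
  mutate(hgt_value = case_when(
    key == "hgt" ~ readr::parse_number(value),
    TRUE ~ NA_real_)) %>% 
  ungroup()

complete_passports %>% 
  mutate(check = case_when(
    (key == "byr" & value >= 1920) & (key == "byr" & value <= 2002) ~ TRUE,
    (key == "iyr" & value >= 2010) & (key == "iyr" & value <= 2020) ~ TRUE,
    (key == "eyr" & value >= 2020) & (key == "eyr" & value <= 2030) ~ TRUE,
    key == "hgt" & str_detect(value, "cm") & hgt_value >= 150 & hgt_value <= 193 ~ TRUE,
    key == "hgt" & str_detect(value, "in") & hgt_value >= 59 & hgt_value <= 76 ~ TRUE,
    key == "hcl" & str_detect(value, "^#[a-f0-9]{6}$") ~ TRUE,
    key == "ecl" & value %in% c("amb", "blu", "brn", "gry", "grn", "hzl", "oth") ~ TRUE,
    key == "pid" & str_detect(value, "^[0-9]{9}$") ~ TRUE
  )) %>% 
  group_by(ID) %>% 
  summarise(check_all = sum(check, na.rm = TRUE)) %>% 
  filter(check_all == 7) %>% 
  nrow()
```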
The code below starts by setting each row number to 127 and each column number to 7, the maximum they can be, then, working along the string, lowers the maximum (or leaves it as is) one letter at a time:
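```r
boarding <- readr::read_tsv("data/AoC_day5.txt", col_names = FALSE) %>% 
  rename(binary = X1)

seat_IDs <- boarding %>% 
  mutate(row = 127) %>% 
  mutate(col = 7) %>% 
  mutate(row = if_else(str_sub(binary, 1, 1) == "F", row - 64, row)) %>% 
  mutate(row = if_else(str_sub(binary, 2, 2) == "F", row - 32, row)) %>% 
  mutate(row = if_else(str_sub(binary, 3, 3) == "F", row - 16, row)) %>% 
  mutate(row = if_else(str_sub(binary, 4, 4) == "F", row - 8, row)) %>% 
  mutate(row = if_else(str_sub(binary, 5, 5) == "F", row - 4, row)) %>% 
  mutate(row = if_else(str_sub(binary, 6, 6) == "F", row - 2, row)) %>% 
  mutate(row = if_else(str_sub(binary, 7, 7) == "F", row - 1, row)) %>% 
  mutate(col = if_else(str_sub(binary, 8, 8) == "L", col - 4, col)) %>% 
  mutate(col = if_else(str_sub(binary, 9, 9) == "L", col - 2, col)) %>% 
  mutate(col = if_else(str_sub(binary, 10, 10) == "L", col - 1, col)) %>% 
  mutate(ID = row * 8 + col)

seat_IDs %>% 
  summarise(max = max(ID)) %>% 
  pull(max)
```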
OK, I know I said in the introduction to this post that I would go with the first solution I think of that gets the right answer, and the above does work, but I'm deeply unhappy with the code. There's too much repetition, I don't like the use of subtraction when dividing by 2 feels more appropriate in a binary context, and it doesn't feel like I've taken full advantage of the mathematical structure of the problem. So, on further reflection, I realise that the way the ID is defined is essentially turning a binary number into a decimal, where we get the binary number as a string by replacing "B" and "R" with "1" and "L" and "F" with "0". Then, I just found, there is a base R function strtoi() that takes a string of digits in a given base and converts it to a base 10 integer, just what we need:
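```r
seat_IDs <- boarding %>% 
  mutate(binary = str_replace_all(binary, "L|F", "0")) %>% 
  mutate(binary = str_replace_all(binary, "B|R", "1")) %>% 
  mutate(ID = strtoi(binary, base = 2)) %>% 
  arrange(desc(ID))

seat_IDs %>% 
  slice(1) %>% 
  pull(ID)
```

That's better!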
We need to find the missing number, so we arrange the IDs in ascending order and look at the gap between each ID and the preceding one. In most cases, that should be one. Where we have a gap of 2, we must have skipped the integer below:
```r
seat_IDs %>% 
  arrange(ID) %>% 
  mutate(diff = lag(ID)) %>% 
  mutate(gap = ID - diff) %>% 
  filter(gap == 2) %>% 
  summarise(my_seat = ID - 1) %>% 
  pull(my_seat)
```
We need to find the number of unique letters within each group. We read in and separate the data using the tricks learnt for Day 4, and take advantage of the rowwise() feature in dplyr 1.0.0:
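```r
customs_groups <- readLines("data/AoC_day6.txt") %>% 
  as_tibble() %>% 
  mutate(new_group = value == "") %>% 
  mutate(group_ID = cumsum(new_group) + 1) %>% 
  filter(!new_group) %>% 
  select(-new_group) %>% 
  group_by(group_ID)

customs_groups %>% 
  summarise(qs = str_c(value, collapse = "")) %>% 
  ungroup() %>% 
  mutate(qss = str_split(qs, "")) %>% 
  rowwise() %>% 
  mutate(qsu = list(unique(qss))) %>% 
  mutate(count = length(qsu)) %>% 
  ungroup() %>% 
  summarise(total = sum(count)) %>% 
  pull(total)
```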
Now, instead of unique letters in a group, we need to find the number of letters which appear in the answers of everyone in the same group. I first note how many people are in each group, then tabulate the number of occurrences of each letter in the group, then count (by summing a logical vector) the letters whose number of occurrences matches the number of people in the group. Finally, we sum across all groups:
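```r
customs_groups %>% 
  add_count(group_ID, name = "num_in_group") %>% 
  group_by(group_ID, num_in_group) %>% 
  summarise(qs = str_c(value, collapse = "")) %>% 
  ungroup() %>% 
  mutate(qss = str_split(qs, "")) %>% 
  rowwise() %>% 
  mutate(letter_table = list(table(qss))) %>% 
  slice(1) %>% 
  mutate(in_common = sum(num_in_group == letter_table)) %>% 
  ungroup() %>% 
  summarise(total = sum(in_common)) %>% 
  pull(total)
```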
To find all bags that can eventually contain our shiny gold bag, we first find the bags that can contain it directly, using a tidied rules table with one container/contains pair per row. We then find the bags that can contain those bags and take the union of the two levels. We repeat, stopping when going up a level adds no further bags to the vector of bag colours already found. We then subtract 1, because we don't want to count the original shiny gold bag.
```r
# function to find all colours that contain a vector of other colours:
contains_colours <- function(colours) {
  rules %>% 
    filter(contains %in% colours) %>% 
    distinct(container) %>% 
    pull(container)
}

bags <- "shiny gold"
old_length <- length(bags)
new_length <- 0

# keep adding to the vector of bags, until no change
while(old_length != new_length) {
  old_length = length(bags)
  bags <- base::union(bags, contains_colours(bags)) %>% unique()
  new_length <- length(bags)
  # cat(old_length, ", ", new_length, "\n")
}

length(bags) - 1
```
[1] 274
Part 2: Number of bags
Now we need to discover the number of bags that a shiny gold bag must contain. I figured that lends itself to recursion, but struggled on the details. Hat tip to David Robinson for this solution. I’ve learnt a lot for myself by unpicking how it works.
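The recursive function multiplies the number of each directly-contained bag by one (for the bag itself) plus however many bags that bag in turn contains:

```r
count_all_contained <- function(colour) {
  relevant_rules <- rules %>% 
    filter(container %in% colour)
  
  sum(relevant_rules$number * (1 + map_dbl(relevant_rules$contains, count_all_contained)))
}

count_all_contained("shiny gold")
```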
Our programme gets stuck in an infinite loop. As well as keeping track of the accumulator, we need to keep track of where we’ve visited, and stop when we visit the same instruction twice. We use a data.frame() rather than a tibble() as the former is easier to index into.
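Reading in the instructions:

```r
instructions <- read.table("data/AoC_day8.txt", col.names = c("instruction", "value"))
```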
We start with a pretty straightforward loop, noting that at most it can run for one more than the number of instructions in the programme before it hits an instruction it's already visited. We update the row number to visit next and the accumulator as appropriate.
```r
instructions$visited <- 0

row <- 1
accumulator <- 0
num_rows <- nrow(instructions)

for (i in 1:(num_rows + 1)) {
  
  if (instructions[row, "visited"] != 0) break
  
  # +1 on number of times the row is visited
  instructions[row, "visited"] <- instructions[row, "visited"] + 1
  
  # case when the instruction is "acc"
  if (instructions[row, "instruction"] == "acc") {
    accumulator <- accumulator + instructions[row, "value"]
    row <- row + 1
  }
  
  # case when the instruction is "jmp"
  else if (instructions[row, "instruction"] == "jmp") {
    row <- row + instructions[row, "value"]
  }
  
  # case when the instruction is "nop"
  else if (instructions[row, "instruction"] == "nop") {
    row <- row + 1
  }
}

accumulator
```
[1] 1915
Part 2: Fixing the programme
To break the loop, one of the nop instructions in the programme should be a jmp or vice versa. The plan is to swap these out one by one and check if the programme completes. It’s not a sophisticated approach, but it works fast enough (about a second).
First we note that the broken instruction must be one that we visited in Part 1. Also, an instruction of jmp with a value of 0 will get us stuck in a one-line infinite loop, so we avoid that.
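We also rule out acc instructions, since only a nop or jmp can be the one that's swapped:

```r
rows_to_check <- instructions %>% 
  mutate(row_id = row_number()) %>% 
  filter(visited != 0) %>% 
  filter(instruction != "acc") %>% 
  filter(!(instruction == "nop" & value == 0)) %>% 
  pull(row_id)
```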
We have 93 instructions to check. We modify our code from Part 1 slightly, converting it into a function and returning a list with values completes and accumulator. completes is FALSE as soon as we visit a row twice and TRUE if the number of our next row to visit is greater than the number of rows in the programme.
```r
programme_completes <- function(instructions) {
  
  row <- 1L
  accumulator <- 0
  num_rows <- nrow(instructions)
  
  for (i in 1:(num_rows + 1)) {
    
    if (instructions[row, "visited"] != 0) {
      return(list(completes = FALSE, accumulator = accumulator))
    }
    
    # +1 on number of times the row is visited
    instructions[row, "visited"] <- instructions[row, "visited"] + 1
    
    # case when the instruction is "acc"
    if (instructions[row, "instruction"] == "acc") {
      accumulator <- accumulator + instructions[row, "value"]
      row <- row + 1
    }
    
    else if (instructions[row, "instruction"] == "jmp") {
      row <- row + instructions[row, "value"]
    }
    
    else if (instructions[row, "instruction"] == "nop") {
      row <- row + 1
    }
    
    if (row > num_rows) {
      return(list(completes = TRUE, accumulator = accumulator))
    }
  }
}
```
We now loop over the rows we’ve identified to check, breaking the loop as soon as we find a programme that completes. Finally, we extract the accumulator value from the successful programme.
```r
instructions$visited <- 0

for (row in rows_to_check) {
  
  # modify one row of the instructions,
  # copying data frame so we don't have to modify it back
  modified_instructions <- instructions
  
  if (instructions[row, 1] == "jmp") {
    modified_instructions[row, 1] <- "nop"
  } else {
    modified_instructions[row, 1] <- "jmp"
  }
  
  # check if the modified programme completes
  check_programme <- programme_completes(modified_instructions)
  
  if (check_programme$completes) break
}

check_programme$accumulator
```
We have to find the first number in the list which is not the sum of a pair of different numbers in the preceding 25 numbers.
```r
input <- as.double(readLines("data/AoC_day9.txt"))
```
There's a nice trick for finding the pair of numbers in a vector that sum to a target that was doing the rounds on twitter in response to the Day 1 challenge: intersect(input, 2020 - input). For this challenge, we expand on that idea, writing it as a check_sum function. Where there's more than one pair, it won't say which numbers pair together, and if the number that's half the target appears in the addends, it will only appear once in the output. However, for this challenge, we only need to know when there are no pairs that sum to the target, which will be the case when the length of the output of check_sum is 0.
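The function is a one-liner:

```r
check_sum <- function(target, addends) {
  intersect(addends, target - addends)
}
```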
Then, it’s simply a case of iterating over windows of length 25, checking whether the following number is the sum of a distinct pair in that window, and returning the first one that isn’t.
```r
find_invalid_num <- function(vec, win = 25) {
  for (i in (win + 1):length(vec)) {
    check <- check_sum(vec[i], vec[(i - win):(i - 1)])
    if (length(check) == 0) return(vec[i])
  }
}

find_invalid_num(input)
```
[1] 507622668
Part 2: Contiguous set
Find a contiguous set in the list that sums to the invalid number from Part 1, and add together the largest and smallest number in that range.
First, we note that after a certain point, all numbers in the input are larger than the target, so we don’t need to consider those. We reduce our input vector accordingly.
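That looks like this:

```r
target <- find_invalid_num(input)
input_reduced <- input[1:(max(which(input <= target)))]
```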
To find the contiguous set in the list that sums to the target, we make use of accumulate() from the purrr package. Let the input list be \(x = (x_1, x_2,..., x_n)\). Then accumulate(x, sum) returns \(a = (x_1, x_1 + x_2,..., \sum_{j=1}^n x_j)\). We check whether any element of this vector is equal to the target. If so we index into the input vector appropriately, sum the min and max in the range and we’re done. If not, we consider the sums of all windows starting with the second element of the input list, and so on.
```r
contiguous_sum <- function(input, target) {
  
  len <- length(input)
  
  for (i in 1:len) {
    a <- purrr::accumulate(input[i:len], sum)
    b <- a == target
    if (sum(b) == 1) {
      output_length <- which(b)
      contiguous_set <- input[i:(i + output_length - 1)]
      return(sum(range(contiguous_set)))
    }
  }
}

contiguous_sum(input_reduced, target)
```
[1] 76688505
I appreciate that there’s some redundant calculation in this method. The vectors of accumulated sums can contain numbers larger than the target (if writing our own loop, we could break as soon as the accumulated sum got too big). Also, in retrospect, we could have only run accumulate once, then in the second iteration of the loop, subtracted input[1] from the result, in the third iteration subtracted input[2] from that result, etc. However, the function as written is concise and easy to understand, and gets our answer in around a second, so that will do!
Instead of building up sequences of adapters, we see what we can remove from the full list.
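As a reminder from Part 1, adapter_diffs holds the differences between consecutive adapters once they are sorted, with 0 prepended and the maximum plus three appended:

```r
adapters <- readLines("data/AoC_day10.txt") %>% as.integer()

adapter_diffs <- c(adapters, 0, max(adapters) + 3) %>% 
  sort() %>% 
  diff()
```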
First, we check the diffs: are they just 1 and 3 or are there any 2s?
```r
table(adapter_diffs)
```

adapter_diffs
 1  3
74 41
We can’t remove an adapter if its difference with the previous adapter is 3, otherwise the difference between the adapters on either side of it will be too big.
What about diffs of 1? It depends how many ones there are around it. We can check this using the rle() (run length encoding) function:
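```r
runs <- rle(adapter_diffs)
```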
What is the distribution of lengths of sequences of 1s?
```r
runs_table <- table(runs$lengths)
runs_table
```

 1  2  3  4
13 14 10 11
We have at most four diffs of 1 in a row.
We need to check that if we remove an adapter, the new differences do not exceed 3. Example sequences really helped me figure out what’s going on here:
- If the diff sequence is …, 3, 1, 3, … (e.g. adapters 1, 4, 5, 8)
  - 1 option to keep as is
  - We cannot remove any adapters
  - **1 option in total**
- If the diff sequence is …, 3, 1, 1, 3, … (e.g. adapters 1, 4, 5, 6, 9)
  - 1 option to keep as is
  - 1 option to remove one adapter (e.g. the 5)
  - We cannot remove two adapters
  - **2 options in total**
- If the diff sequence is …, 3, 1, 1, 1, 3, … (e.g. adapters 1, 4, 5, 6, 7, 10)
  - 1 option to keep as is
  - 2 options to remove one adapter (e.g. the 5 or 6)
  - 1 option to remove two adapters (e.g. the 5 and 6)
  - We cannot remove three adapters
  - **4 options in total**
- If the diff sequence is …, 3, 1, 1, 1, 1, 3, … (e.g. adapters 1, 4, 5, 6, 7, 8, 11)
  - 1 option to keep as is
  - 3 options to remove one adapter (e.g. 5, 6, or 7)
  - 3 options to remove two adapters (e.g. any two of 5, 6, and 7)
  - We cannot remove three adapters
  - **7 options in total**
Finally, for each run of 1s we look up the number of options its length gives us for removing adapters, then take the product of those numbers across all runs:
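```r
runs_df <- tibble(lengths = runs$lengths, values = runs$values)

options <- tibble(lengths = c(1, 2, 3, 4), options = c(1, 2, 4, 7))

runs_df %>% 
  filter(values == 1) %>% 
  left_join(options, by = "lengths") %>% 
  summarise(prod_options = prod(options)) %>% 
  pull(prod_options) %>% 
  format(scientific = FALSE)
```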
My code for Day 11 runs a little slow (about 10 seconds for Part 1 and 80 seconds for Part 2), so for the sake of being able to rebuild this page quickly as I keep updating it while working through the challenges, I will demonstrate the code with the test input provided as an example.
```r
library(dplyr)
library(stringr)
library(tidyr)
```
First we read in the data and convert it to a matrix (using the datapasta package for the test input):
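```r
# layout <- readr::read_tsv("data/AoC_day11.txt", col_names = FALSE)

layout <- tibble::tribble(
  ~X1,
  "L.LL.LL.LL",
  "LLLLLLL.LL",
  "L.L.L..L..",
  "LLLL.LL.LL",
  "L.LL.LL.LL",
  "L.LLLLL.LL",
  "..L.L.....",
  "LLLLLLLLLL",
  "L.LLLLLL.L",
  "L.LLLLL.LL"
)
```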
```r
# get number of columns for matrix
num_col <- layout %>% 
  mutate(length = str_length(X1)) %>% 
  slice(1) %>% 
  pull(length)

# split layout into characters and turn to vector
layout_vec <- layout %>% 
  mutate(X1 = strsplit(X1, split = character(0), fixed = TRUE)) %>% 
  pull(X1) %>% 
  unlist()

# organise into matrix
initial_layout <- matrix(layout_vec, ncol = num_col, byrow = TRUE)
```
Next, we write a helper function that, given a matrix and row and column indices, returns a vector of the adjacent seats. We need to take care when indexing into the matrix, so we treat all corner and edge cases separately. Fiddly, but gets the job done.
```r
get_adj <- function(mat, i, j) {
  
  nr <- nrow(mat)
  nc <- ncol(mat)
  
  # corner cases
  if (i == 1 & j == 1) {adj <- c(mat[1,2], mat[2,1:2])}
  else if (i == 1 & j == nc) {adj <- c(mat[1,(nc-1)], mat[2,(nc-1):nc])}
  else if (i == nr & j == 1) {adj <- c(mat[nr,2], mat[nr-1,1:2])}
  else if (i == nr & j == nc) {adj <- c(mat[nr-1, (nc-1):nc], mat[nr, nc-1])}
  
  # edge cases
  else if (i == 1) {adj <- c(mat[1, c(j-1,j+1)], mat[2, (j-1):(j+1)])}
  else if (i == nr) {adj <- c(mat[nr, c(j-1,j+1)], mat[nr-1, (j-1):(j+1)])}
  else if (j == 1) {adj <- c(mat[c(i-1, i+1), 1], mat[(i-1):(i+1), 2])}
  else if (j == nc) {adj <- c(mat[c(i-1, i+1), nc], mat[(i-1):(i+1), nc-1])}
  
  # inside cases
  else {adj <- c(mat[i-1,(j-1):(j+1)], mat[i,c(j-1,j+1)], mat[i+1,(j-1):(j+1)])}
  
  adj
}
```
Once we have a vector of surrounding seats, we can apply the rules in the problem to determine whether a given seat needs to change state. The needs_changing helper function does that. It's overkill at this point to give options to specify the function for finding the vector of seats to check, and the maximum number of occupied seats people can tolerate around them, but (spoiler alert) I put in these options when working on the challenge in Part 2.
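Here's the function:

```r
needs_changing <- function(mat, i, j, get_surround = get_adj, max_occupied = 4) {
  
  surround <- get_surround(mat, i, j)
  n_occupied <- sum(surround == "#")
  
  if ((mat[i,j] == "L") & (n_occupied == 0)) return(TRUE)
  else if ((mat[i,j] == "#") & (n_occupied >= max_occupied)) {
    return(TRUE)
  }
  else return(FALSE)
}
```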
Since floor spaces don’t change, we only need to consider seats. We save the indices of the seats into a data frame, so we can vectorise over it using tidyverse functions. However, when we’ve determined the seats that need changing, using our needs_changing function, we need to convert those indices from a data.frame into a matrix, in order to index into the layout matrix appropriately and make the changes.
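Setting that up:

```r
seats <- which(initial_layout != ".", arr.ind = TRUE)

seats_df <- as.data.frame(seats) %>% 
  rename(i = row, j = col)
```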
```r
layout <- initial_layout
iters <- 0

# loop until there are no further changes
repeat {
  
  change <- 0
  
  seats_to_change <- seats_df %>% 
    rowwise() %>% 
    mutate(change_seat = needs_changing(layout, i, j))
  
  change <- sum(seats_to_change$change_seat)
  
  if (change == 0) break
  
  indices_to_change <- seats_to_change %>% 
    filter(change_seat) %>% 
    select(i, j) %>% 
    as.matrix()
  
  layout[indices_to_change] <- setdiff(c("L", "#"), layout[indices_to_change])
  
  iters <- iters + 1
}

part_1_iters <- iters

sum(layout == "#")
```
[1] 37
On the test set, this takes 5 iterations. On the full data set, my answer is 2316, and it took 107 iterations.
Part 2: Looking further
Now, people look to the first seat they can see in each direction, and will change from occupied to unoccupied if five or more of them are occupied.
The plan is to write a function that extracts full vectors from a given seat to the edge of the layout matrix in each of the eight directions, then finds the first seat in each of those directions, and finally collects those into a vector of the seats under consideration when determining if a change is needed. Then I can reuse the loop from Part 1, just changing the arguments in the calls to needs_changing.
Here’s a helper function to get the first seat in a vector looking in one direction:
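```r
get_first_seat_from_vec <- function(vec) {
  if (any(vec %in% c("#", "L"))) {
    return(vec[min(which(vec != "."))])
  }
  return(NA)
}
```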
Now, if I thought getting adjacent seats to a given seat in Part 1 was fiddly, it's nothing on getting a vector from a given seat to the edge of the matrix. There are many cases to consider to make sure we don't go out of bounds. In the diagonal directions, we first build a matrix of the indices we need, then subset into the layout matrix accordingly.
```r
# takes a layout matrix (elements ".", "#", "L")
# returns vector with first "L" or "#" encountered in each direction
get_first_seat <- function(mat, i, j) {
  
  nr <- nrow(mat)
  nc <- ncol(mat)
  
  # North
  if (i == 1) N <- NA
  if (i > 1) N <- mat[(i-1):1, j]
  
  # South
  if (i == nr) S <- NA
  if (i < nr) S <- mat[(i+1):nr, j]
  
  # East
  if (j == nc) E <- NA
  if (j < nc) E <- mat[i, (j+1):nc]
  
  # West
  if (j == 1) W <- NA
  if (j > 1) W <- mat[i, (j-1):1]
  
  # how far in each direction to edge of matrix
  to_N <- i - 1
  to_S <- nr - i
  to_E <- nc - j
  to_W <- j - 1
  
  # North-West
  NW_length <- min(to_N, to_W)
  if (i == 1 | j == 1) NW <- NA
  else {
    mat_index <- matrix(c((i-1):(i-NW_length), (j-1):(j-NW_length)), ncol = 2)
    NW <- mat[mat_index]
  }
  
  # North-East
  NE_length <- min(to_N, to_E)
  if (i == 1 | j == nc) NE <- NA
  else {
    mat_index <- matrix(c((i-1):(i-NE_length), (j+1):(j+NE_length)), ncol = 2)
    NE <- mat[mat_index]
  }
  
  # South-East
  SE_length <- min(to_S, to_E)
  if (i == nr | j == nc) SE <- NA
  else {
    mat_index <- matrix(c((i+1):(i+SE_length), (j+1):(j+SE_length)), ncol = 2)
    SE <- mat[mat_index]
  }
  
  # South-West
  SW_length <- min(to_S, to_W)
  if (i == nr | j == 1) SW <- NA
  else {
    mat_index <- matrix(c((i+1):(i+SW_length), (j-1):(j-SW_length)), ncol = 2)
    SW <- mat[mat_index]
  }
  
  # vectors from mat[i,j] to the edge in each direction
  all_vecs <- (list(N = N, S = S, E = E, W = W, NW = NW, NE = NE, SE = SE, SW = SW))
  
  # the first seat in each direction, collapsed to a vector
  first_seats <- purrr::map_chr(all_vecs, get_first_seat_from_vec)
  
  # remove NAs from list and return
  # (these occur either when starting on an edge,
  # or when there are no seats in a given direction)
  return(first_seats[!is.na(first_seats)])
}
```
```r
layout <- initial_layout
iters <- 0

# loop until there are no further changes
repeat {
  
  change <- 0
  
  seats_to_change <- seats_df %>% 
    rowwise() %>% 
    mutate(change_seat = needs_changing(layout, i, j, get_first_seat, 5))
  
  change <- sum(seats_to_change$change_seat)
  
  if (change == 0) break
  
  indices_to_change <- seats_to_change %>% 
    filter(change_seat) %>% 
    select(i, j) %>% 
    as.matrix()
  
  layout[indices_to_change] <- setdiff(c("L", "#"), layout[indices_to_change])
  
  iters <- iters + 1
}

part_2_iters <- iters

sum(layout == "#")
```
[1] 26
On the test set, this takes 6 iterations. On the full data set, my answer is 2128, and it took 87 iterations. Given this is fewer iterations than in Part 1, it must be my code for getting the first seat that’s slowing things down.
I am unsatisfied both by how many lines of code this has taken and by the time it takes to run. The introduction to Advent of Code says that each challenge has a solution that will complete in at most 15 seconds on ten-year-old hardware. So clearly there's a better way of doing this. Perhaps something to revisit in the future.
Next
I was late to the game, and that was as far as I managed to get in December 2020. I’m looking forward to taking on the challenge again in 2021!
---title: "Advent of Code 2020"description: | My attempts at Advent of Code, 2020author: - name: Ella Kaye ##url: httep://twitter.com/ellakayedate: 2020-12-09site-url: https://ellakaye.co.uk/image: advent-of-code-2020.jpgcategories: - R - Advent of Codetoc-depth: 4twitter: site: "@ellamkaye" creator: "@ellamkaye"draft: false---```{r}#| echo: false#| results: 'hide'long_slug <-"2020-12-09_advent-of-code-2020"# NOTE: after finishing post, run renv::snapshot() and copy the renv.lock file # from the project root into the post directoryrenv::use(lockfile ="renv.lock")```[Advent of Code](https://adventofcode.com) is a series of small programming challenges, released daily throughout December in the run-up to Christmas. Part 1 of the challenge is given first. On its successful completion, Part 2 is revealed. The challenges are designed to be solved in any programming language. I will be using R. There will no doubt be a wide variety of ways to solve these problems. I'm going to go with the first thing I think of that gets the right answer. In most cases, I expect that there will be more concise and efficient solutions. Most of the time I'm working in R, it's within the [tidyverse](https://www.tidyverse.org), so I imagine that framework will feature heavily below.Each participant gets different input data, so my numerical solutions may be different from others. If you're not signed up for Advent of Code yourself, but want to follow along with my data, you can download it at from the data links at the beginning of each day's section. The links in the day section headers take you to challenge on the Advent of Code page. ## Day 1: [Report Repair](https://adventofcode.com/2020/day/1)<p><a id='day1'></a></p>[My day 1 data](data/AoC_day1.txt)#### Part 1: Two numbersThe challenge is to find two numbers from a list that sum to 2020, then to report their product.`expand.grid()` creates a data frame from all combinations of the supplied vectors. Since the vectors are the same, each pair is duplicated. In this case the two numbers in the list that sum to 2020 are 704 and 1316, and we have one row with 704 as Var1 and one with 704 as Var2. `slice(1)` takes the first occurrence of the pair.```{r message = FALSE}library(dplyr)expenses <- readLines("data/AoC_day1.txt") %>% as.numeric()expand.grid(expenses, expenses) %>% mutate(sum = Var1 + Var2) %>% filter(sum == 2020) %>% mutate(prod = Var1 * Var2) %>% slice(1) %>% pull(prod)```#### Part 2: Three numbersThe follow-up challenge is the same but with three numbers. I went with essentially the same code but it's notably slower. There are a lot of repeated calculations here: each triplet appears six times in the table.```{r}expand.grid(expenses, expenses, expenses) %>%mutate(sum = Var1 + Var2 + Var3) %>%filter(sum ==2020) %>%mutate(prod = Var1 * Var2 * Var3) %>%slice(1) %>%pull(prod)```## Day 2: [Password Philosophy](https://adventofcode.com/2020/day/2)[My day 2 data](data/AoC_day2.txt)#### Part 1: Number of lettersWe need to find how many passwords are valid according to their policy. The policies and passwords are given as follows:```1-3 a: abcde1-3 b: cdefg2-9 c: ccccccccc```Each line gives the password policy and then the password. The password policy indicates the lowest and highest number of times a given letter must appear for the password to be valid. For example, `1-3 a` means that the password must contain `a` at least 1 time and at most 3 times.```{r}library(dplyr)library(tidyr)library(stringr)```First load the libraries we'll need. 
We then read in the data and use `tidyr` functions to separate out the parts of the policy and the password, making sure to convert the columns to numeric as appropriate:```{r, message = FALSE}passwords <- readr::read_tsv("data/AoC_day2.txt", col_names = FALSE) %>% separate(X1, c("policy", "password"), sep = ":") %>% separate(policy, c("count", "letter"), sep = " ") %>% separate(count, c("min", "max")) %>% mutate(min = as.integer(min), max = as.integer(max))```Next, we use the `stringr` function `str_count()` to count how many times the given letter appears in the password, and conditional logic to check whether it is repeated within the specified number of times. Because `TRUE` has a numeric value of 1 and `FALSE` has a numeric value of 0, we can sum the resulting column to get a count of how many passwords are valid according to their policies.```{r}passwords %>%mutate(count =str_count(password, letter)) %>%mutate(password_in_policy =if_else(count >= min & count <= max, TRUE, FALSE)) %>%summarise(correct =sum(password_in_policy)) %>%pull(correct)```#### Part 2: Position of lettersNow the policy is interpreted differently. Each policy actually describes two positions in the password, where 1 means the first character, 2 means the second character, and so on. Exactly one of these positions must contain the given letter. How many are valid now?There were a couple of *gotchas* here. When I used `separate()` in the previous part, I had inadvertently left a leading whitespace in front of the password, something that was messing up my indexing with `str_sub`. Using `str_trim()` first cleared that up. Also, we need *exactly one* of the positions to match. `|` is an inclusive or. We need `xor()` for exclusive or instead.```{r}passwords %>%mutate(password =str_trim(password)) %>%mutate(pos1_letter =str_sub(password, min, min),pos2_letter =str_sub(password, max, max)) %>%mutate(match_one =xor(pos1_letter == letter, pos2_letter == letter)) %>%summarise(correct =sum(match_one)) %>%pull(correct) ```## Day 3: [Toboggan Trajectory](https://adventofcode.com/2020/day/3)[My day 3 data](data/AoC_day3.txt)#### Part 1: Encountering treesStarting at the top left corner of the map, how many trees ("#") do we encounter, going at a trajectory of 3 right and 1 down?First, read in the data and save it into a matrix. My method here feels really hack-y. I'm sure there must be a better approach. ```{r, message = FALSE}library(dplyr)tree_map <- readr::read_tsv("data/AoC_day3.txt", col_names = FALSE)num_col <- tree_map %>% mutate(length = str_length(X1)) %>% slice(1) %>% pull(length)tree_vec <- tree_map %>% mutate(X1 = strsplit(X1, split = character(0), fixed = TRUE)) %>% pull(X1) %>% unlist()tree_mat <- matrix(tree_vec, ncol = num_col, byrow = TRUE)```Now work my way across and down the matrix, using the `%%` modulo operator to loop round where necessary. The `-1` and `+1` in the line `((y + right - 1) %% num_col) + 1` is a hack to get round the fact that, for `num_col` columns, the modulo runs from `0` to `num_col - 1`, but the column indexes for our matrix run from `1` to `num_col`. 
```{r, message = FALSE}right <- 3down <- 1num_rows <- nrow(tree_mat)num_col <- ncol(tree_mat)# start counting trees encounteredtrees <- 0# start squarex <- 1y <- 1while (x <= num_rows) { # cat("row: ", x, "col: ", y, "\n") if (tree_mat[x,y] == "#") trees <- trees + 1 x <- x + down y <- ((y + right - 1) %% num_col) + 1}trees```#### Part 2: Checking further slopesWe now need to check several other trajectories, and multiply together the number of trees we find, so we wrap the Part 1 code into a function.```{r, message = FALSE}slope_check <- function(tree_mat, right, down) { num_rows <- nrow(tree_mat) num_col <- ncol(tree_mat) # start counting trees encountered trees <- 0 # start square x <- 1 y <- 1 while (x <= num_rows) { if (tree_mat[x,y] == "#") trees <- trees + 1 x <- x + down y <- ((y + right - 1) %% num_col) + 1 } trees}prod(slope_check(tree_mat, 1, 1), slope_check(tree_mat, 3, 1), slope_check(tree_mat, 5, 1), slope_check(tree_mat, 7, 1), slope_check(tree_mat, 1, 2))```## Day 4: [Passport Processing](https://adventofcode.com/2020/day/4)<p><a id='day4'></a></p>[My day 4 data](data/AoC_day4.txt)#### Part 1: Complete passports```{r}library(dplyr)library(tidyr)```Using `readr::read_tsv()` off the bat removes the blank lines, making it impossible to identify the different passports, but reading in the data via `readLines()` then converting `as_tibble()` preserves them, and then allows us to use `tidyverse` functions for the remaining tidying. `cumsum()` on a logical vectors takes advantage of `FALSE` having a numeric value of zero and `TRUE` having a numeric value of one.```{r}passports <-readLines("data/AoC_day4.txt") %>%as_tibble() %>%separate_rows(value, sep =" ") %>%mutate(new_passport = value =="") %>%mutate(ID =cumsum(new_passport) +1) %>%filter(!new_passport) %>%select(-new_passport) %>%separate(value, c("key", "value"), sep =":") %>%relocate(ID)```Our data is now in three columns, with ID, key and value, so now we need to find the number of passports with all seven fields once `cid` is excluded:```{r}passports %>%filter(key !="cid") %>%count(ID) %>%filter(n ==7) %>%nrow()```#### Part 2: Valid passportsNow we need to add data validation checks:- byr (Birth Year) - four digits; at least 1920 and at most 2002.- iyr (Issue Year) - four digits; at least 2010 and at most 2020.- eyr (Expiration Year) - four digits; at least 2020 and at most 2030.- hgt (Height) - a number followed by either cm or in: - If cm, the number must be at least 150 and at most 193. - If in, the number must be at least 59 and at most 76.- hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f.- ecl (Eye Color) - exactly one of: amb blu brn gry grn hzl oth.- pid (Passport ID) - a nine-digit number, including leading zeroes.- cid (Country ID) - ignored, missing or not.Ignoring the `cid` field, we narrow down on passports that at least have the right number of fields, and extract the number from the `hgt` column:```{r, message = FALSE, warning = FALSE}complete_passports <- passports %>% filter(key != "cid") %>% add_count(ID) %>% filter(n == 7) %>% select(-n) %>% mutate(hgt_value = case_when( key == "hgt" ~ readr::parse_number(value), TRUE ~ NA_real_)) %>% ungroup()```Then we create a `check` column, which is `TRUE` when the value for each key meets the required conditions. Those with 7 `TRUE`s are valid. Note that with `case_when()` we've left the check column as `NA` when the condition is `FALSE`, requiring `na.rm = TRUE` in the call to `sum()`. 
We can get round that by adding a final line to the `case_when()` condition stating `TRUE ~ FALSE`. `TRUE` here is a catch-all for all remaining rows not covered by the conditions above, and then we set them to `FALSE`, but I find the line `TRUE ~ FALSE` unintuitive. ```{r, message = FALSE}complete_passports %>% mutate(check = case_when( (key == "byr" & value >= 1920) & (key == "byr" & value <= 2002) ~ TRUE, (key == "iyr" & value >= 2010) & (key == "iyr" & value <= 2020) ~ TRUE, (key == "eyr" & value >= 2020) & (key == "eyr" & value <= 2030) ~ TRUE, key == "hgt" & str_detect(value, "cm") & hgt_value >= 150 & hgt_value <= 193 ~ TRUE, key == "hgt" & str_detect(value, "in") & hgt_value >= 59 & hgt_value <= 76 ~ TRUE, key == "hcl" & str_detect(value, "^#[a-f0-9]{6}$") ~ TRUE, key == "ecl" & value %in% c("amb", "blu", "brn", "gry", "grn", "hzl", "oth") ~ TRUE, key == "pid" & str_detect(value, "^[0-9]{9}$") ~ TRUE )) %>% group_by(ID) %>% summarise(check_all = sum(check, na.rm = TRUE)) %>% filter(check_all == 7) %>% nrow()```## Day 5: [Binary Boarding](https://adventofcode.com/2020/day/5)[My day 5 data](data/AoC_day5.txt)#### Part 1: Finding all seat IDs```{r message = FALSE}library(dplyr)library(stringr)```The code below sets starts by setting each row number to 127 and each column number to 7, the maximum they can be, then, working along the string, lowering the maximum (or leaving it as is) one letter at a time:```{r, message = FALSE}boarding <- readr::read_tsv("data/AoC_day5.txt", col_names = FALSE) %>% rename(binary = X1)seat_IDs <- boarding %>% mutate(row = 127) %>% mutate(col = 7) %>% mutate(row = if_else(str_sub(binary, 1, 1) == "F", row - 64, row)) %>% mutate(row = if_else(str_sub(binary, 2, 2) == "F", row - 32, row)) %>% mutate(row = if_else(str_sub(binary, 3, 3) == "F", row - 16, row)) %>% mutate(row = if_else(str_sub(binary, 4, 4) == "F", row - 8, row)) %>% mutate(row = if_else(str_sub(binary, 5, 5) == "F", row - 4, row)) %>% mutate(row = if_else(str_sub(binary, 6, 6) == "F", row - 2, row)) %>% mutate(row = if_else(str_sub(binary, 7, 7) == "F", row - 1, row)) %>% mutate(col = if_else(str_sub(binary, 8, 8) == "L", col - 4, col)) %>% mutate(col = if_else(str_sub(binary, 9, 9) == "L", col - 2, col)) %>% mutate(col = if_else(str_sub(binary, 10, 10) == "L", col - 1, col)) %>% mutate(ID = row * 8 + col) seat_IDs %>% summarise(max = max(ID)) %>% pull(max)```OK, I know I said in the introduction to this post that I would go with the first solution I think of that gets the right answer, and the above does work, but I'm *deeply* unhappy with the code. There's too much repetition, I don't like the use of subtraction when diving by 2 feels more appropriate in a binary context, and it doesn't feel like I've taken full advantage of the mathematical structure of the problem. So, on further reflection, I realise that the way that ID is defined is essentially turning a binary number into a decimal, where we get the binary number as a string by replacing "B" and "R" by "1" and L" and "F" by "0". 
Then, I just found, there is a base R function `strtoi()` that takes a string of digits in a given base and converts it to a base 10 integer, just what we need:```{r}seat_IDs <- boarding %>%mutate(binary =str_replace_all(binary, "L|F", "0")) %>%mutate(binary =str_replace_all(binary, "B|R", "1")) %>%mutate(ID =strtoi(binary, base =2)) %>%arrange(desc(ID))seat_IDs %>%slice(1) %>%pull(ID)```That's better!#### Part 2: Finding my seat IDWe need to find the missing number, so we arrange the IDs in ascending order and look at the gap between each ID and the preceding one. In most cases, that should be one. Where we have a gap of 2, we must have skipped the integer below:```{r}seat_IDs %>%arrange(ID) %>%mutate(diff =lag(ID)) %>%mutate(gap = ID - diff) %>%filter(gap ==2) %>%summarise(my_seat = ID -1) %>%pull(my_seat)```## Day 6: [Custom Customs](https://adventofcode.com/2020/day/6)[My day 6 data](data/AoC_day6.txt)#### Part 1: Anyone answers```{r message = FALSE}library(dplyr)library(tidyr)library(stringr)```Within each group, we need to find the number of unique letters within each group. We read in and separate the data using the tricks learnt for <a href="#day4">Day 4</a>, and take advantage of the `rowwise()` feature in `dplyr 1.0.0`. ```{r message = FALSE}customs_groups <- readLines("data/AoC_day6.txt") %>% as_tibble() %>% mutate(new_group = value == "") %>% mutate(group_ID = cumsum(new_group) + 1) %>% filter(!new_group) %>% select(-new_group) %>% group_by(group_ID) customs_groups %>% summarise(qs = str_c(value, collapse = "")) %>% ungroup() %>% mutate(qss = str_split(qs, "")) %>% rowwise() %>% mutate(qsu = list(unique(qss))) %>% mutate(count = length(qsu)) %>% ungroup() %>% summarise(total = sum(count)) %>% pull(total)```#### Part 2: Everyone answersNow, instead of unique letters in a group, we need to find the number of letters which appear in all the answers for everyone in the same group. I first note how many people are in each group, then tabulate the number of occurrences of each letter in the group, then count (by summing a logical vector) the number of matches between occurrences of letter and the number in group. 
Finally, we sum across all groups.```{r message = FALSE}customs_groups %>% add_count(group_ID, name = "num_in_group") %>% group_by(group_ID, num_in_group) %>% summarise(qs = str_c(value, collapse = "")) %>% ungroup() %>% mutate(qss = str_split(qs, "")) %>% rowwise() %>% mutate(letter_table = list(table(qss))) %>% slice(1) %>% mutate(in_common = sum(num_in_group == letter_table)) %>% ungroup() %>% summarise(total = sum(in_common)) %>% pull(total)```## Day 7: [Handy Haverstocks](https://adventofcode.com/2020/day/7)[My day 7 data](data/AoC_day7.txt)#### Part 1: Number of colour bags```{r message = FALSE}library(tidyverse)```We have colour-coded bags that must contain a specific number of other colour-coded bags.```{r message = FALSE}bags <- read_tsv("data/AoC_day7.txt", col_names = FALSE)head(bags)```Our first task is to parse the natural language and split the rules into one container/contains pair per line:```{r}rules <- bags %>%mutate(rule =row_number()) %>%separate(X1, c("container", "contains"), sep =" bags contain ") %>%separate_rows(contains, sep =",") %>%mutate(contains =str_remove(contains, "\\.")) %>%mutate(contains =str_remove(contains, "bags|bag")) %>%#mutate(contains = str_replace(contains, "no other", "0 other")) %>%extract(contains, c('number', 'contains'), "(\\d+) (.+)") %>%filter(!is.na(number)) %>%mutate(contains =str_trim(contains)) %>%mutate(number =as.integer(number)) ```To find all bags that con eventually contain our `shiny gold` bag, we first find the bags that can contain it directly. We then find the bags that can contain those bags and take the union of the two levels. We repeat, stopping when going up a level adds no further bags to the vector of bag colours already found. We then subtract 1, because we don't want to count the original shiny gold bag.```{r}# function to find all colours that contain a vector of other colours:contains_colours <-function(colours) { rules %>%filter(contains %in% colours) %>%distinct(container) %>%pull(container)}bags <-"shiny gold"old_length <-length(bags)new_length <-0# keeping adding to the vector of bags, until no changewhile(old_length != new_length) { old_length =length(bags) bags <- base::union(bags, contains_colours(bags)) %>%unique() new_length <-length(bags)#cat(old_length, ", ", new_length, "\n")}length(bags) -1```#### Part 2: Number of bagsNow we need to discover the number of bags that a shiny gold bag must contain. I figured that lends itself to recursion, but struggled on the details. Hat tip to David Robinson for [this solution](https://twitter.com/drob/status/1336003816395845632). I've learnt a lot for myself by unpicking how it works.```{r}count_all_contained <-function(colour) { relevant_rules <- rules %>%filter(container %in% colour)sum(relevant_rules$number * (1+map_dbl(relevant_rules$contains, count_all_contained)))}count_all_contained("shiny gold")```## Day 8: [Handheld Halting](https://adventofcode.com/2020/day/8)[My day 8 data](data/AoC_day8.txt)#### Part 1: Infinite LoopOur programme gets stuck in an infinite loop. As well as keeping track of the accumulator, we need to keep track of where we've visited, and stop when we visit the same instruction twice. 
We use a `data.frame()` rather than a `tibble()` as the former is easier to index into.```{r, message = FALSE}instructions <- read.table("data/AoC_day8.txt", col.names = c("instruction", "value"))```We start with a pretty straight-forward loop, noting that at most it can run for one more than the number of instructions in the programme until it hits an instruction it's already visited. We update row number to visit next and the accumulator as appropriate.```{r}instructions$visited <-0row <-1accumulator <-0num_rows <-nrow(instructions)for (i in1:(num_rows+1)) {if (instructions[row, "visited"] !=0) break# +1 on number of times the row is visited instructions[row, "visited"] <- instructions[row, "visited"] +1# case when the instruction is "acc"if (instructions[row, "instruction"] =="acc") { accumulator <- accumulator + instructions[row, "value"] row <- row +1 }# case when the instruction is "jmp"elseif (instructions[row, "instruction"] =="jmp") { row <- row + instructions[row, "value"] }# case when the instruction is "nop"elseif (instructions[row, "instruction"] =="nop") { row <- row +1 }}accumulator```#### Part 2: Fixing the programmeTo break the loop, one of the `nop` instructions in the programme should be a `jmp` or vice versa. The plan is to swap these out one by one and check if the programme completes. It's not a sophisticated approach, but it works fast enough (about a second).First we note that the broken instruction must be one that we visited in Part 1. Also, an instruction of `jmp` with a value of 0 will get us stuck in a one-line infinite loop, so we avoid that.```{r message = FALSE}library(dplyr)rows_to_check <- instructions %>% mutate(row_id = row_number()) %>% filter(visited != 0) %>% filter(instruction != "acc") %>% filter(!(instruction == "nop" & value == 0)) %>% pull(row_id)```We have `r length(rows_to_check)` instruction to check. We modify our code from Part 1 slightly, converting it into a function and returning a list with values `completes` and `accumulator`. `completes` is `FALSE` as soon as we visit a row twice and `TRUE` if the number of our next row to visit is greater than the number of rows in the programme.```{r}programme_completes <-function(instructions) { row <-1L accumulator <-0 num_rows <-nrow(instructions)for (i in1:(num_rows+1)) {if (instructions[row, "visited"] !=0) {return(list(completes =FALSE, accumulator = accumulator)) }# +1 on number of times the row is visited instructions[row, "visited"] <- instructions[row, "visited"] +1# case when the instruction is "acc"if (instructions[row, "instruction"] =="acc") { accumulator <- accumulator + instructions[row, "value"] row <- row +1 }elseif (instructions[row, "instruction"] =="jmp") { row <- row + instructions[row, "value"] }elseif (instructions[row, "instruction"] =="nop") { row <- row +1 }if (row > num_rows) {return(list(completes =TRUE, accumulator = accumulator)) } }} ```We now loop over the rows we've identified to check, breaking the loop as soon as we find a programme that completes. 
Finally, we extract the accumulator value from the successful programme.```{r}instructions$visited <-0for (row in rows_to_check) {# modify one row of the instructions,# copying data frame so we don't have to modify it back modified_instructions <- instructionsifelse(instructions[row, 1] =="jmp", modified_instructions[row, 1] <-"nop", modified_instructions[row, 1] <-"jmp") # check if the modified programme completes check_programme <-programme_completes(modified_instructions)if (check_programme$completes) break}check_programme$accumulator```## Day 9: [Encoding Error](https://adventofcode.com/2020/day/9)[My day 9 data](data/AoC_day9.txt)#### Part 1: Weak LinkWe have to find the first number in the list which is *not* the sum of a pair of different numbers in the preceding 25 numbers.```{r}input <-as.double(readLines("data/AoC_day9.txt")) ```There's a nice trick for finding the pair of numbers in a vector that sum to a target that was doing the rounds on twitter in response to the <a href="#day1">Day 1</a> challenge: `intersect(input, 2020 - input)`. For this challenge, we expand on that idea, writing it as a `check_sum` function. Where there's more than one pair, it won't say which pair together, and if the number that's half the target appears in the addends, it will only appear once in the output. However, for this challenge, we only need to know when there are *no* pairs that sum to the target, which will be the case when the length of the output of `check_sum` is 0.```{r}check_sum <-function(target, addends) {intersect(addends, target-addends)}```Then, it's simply a case of iterating over windows of length 25, checking whether the following number is the sum of a distinct pair in that window, and returning the first one that isn't.```{r}find_invalid_num <-function(vec, win =25) {for (i in (win+1):length(vec)) { check <-check_sum(vec[i], vec[(i-win):(i-1)])if (length(check) ==0) return(vec[i]) }}find_invalid_num(input)```#### Part 2: Contiguous setFind a contiguous set in the list that sums to the invalid number from Part 1, and add together the largest and smallest number in that range.First, we note that after a certain point, all numbers in the input are larger than the target, so we don't need to consider those. We reduce our input vector accordingly.```{r}target <-find_invalid_num(input)input_reduced <- input[1:(max(which(input <= target)))]```To find the contiguous set in the list that sums to the target, we make use of `accumulate()` from the `purrr` package. Let the input list be $x = (x_1, x_2,..., x_n)$. Then `accumulate(x, sum)` returns $a = (x_1, x_1 + x_2,..., \sum_{j=1}^n x_j)$. We check whether any element of this vector is equal to the target. If so we index into the input vector appropriately, sum the min and max in the range and we're done. If not, we consider the sums of all windows starting with the second element of the input list, and so on.```{r}contiguous_sum <-function(input, target) { len <-length(input)for (i in1:len) { a <- purrr::accumulate(input[i:len], sum) b <- a == targetif (sum(b) ==1) { output_length <-which(b) contiguous_set <- input[i:(i + output_length -1)]return(sum(range(contiguous_set))) } }}contiguous_sum(input_reduced, target)```I appreciate that there's some redundant calculation in this method. The vectors of accumulated sums can contain numbers larger than the target (if writing our own loop, we could break as soon as the accumulated sum got too big). 
Also, in retrospect, we could have only run `accumulate` once, then in the second iteration of the loop, subtracted `input[1]` from the result, in the third iteration subtracted `input[2]` from that result, etc. However, the function as written is concise and easy to understand, and gets our answer in around a second, so that will do!## Day 10: [Adapter Array](https://adventofcode.com/2020/day/10)[My day 10 data](data/AoC_day10.txt)#### Part 1: Adapter DistributionThis is simply a case of ordering the adapters, prepending 0 and appending the the max in the list plus three, then finding the differences.```{r message = FALSE}library(dplyr)``````{r message = FALSE}adapters <- readLines("data/AoC_day10.txt") %>% as.integer()adapter_diffs <- c(adapters, 0, max(adapters) + 3) %>% sort() %>% diff()sum(adapter_diffs == 1) * sum(adapter_diffs == 3)```#### Part 2: Adapter combinationsInstead of building up sequences of adapters, we see what we can remove from the full list. First, we check the diffs: are they just 1 and 3 or are there any 2s?```{r}table(adapter_diffs)```We can't remove an adapter if its difference with the previous adapter is 3, otherwise the difference between the adapters on either side of it will be too big.What about diffs of 1? It depends how many ones there are around it. We can check this using the `rle()` (run length encoding) function```{r}runs <-rle(adapter_diffs)runs```What is the distribution of lengths of sequences of 1s?```{r}runs_table <-table(runs$lengths) runs_table```We have at most four diffs of 1 in a row.We need to check that if we remove an adapter, the new differences do not exceed 3. Example sequences really helped me figure out what's going on here:- If the diff sequence is ..., 3, 1, 3,... (e.g. adapters 1, 4, 5, 8) - 1 option to keep as is - We cannot remove any adapters - **1 option in total**- If the diff sequence is ..., 3, 1, 1, 3,... (e.g. adapters 1, 4, 5, 6, 9) - 1 option to keep as is - 1 option to remove one adapter (e.g. the 5) - we cannot remove two adapters - **2 options total**- If the diff sequence is ..., 3, 1, 1, 1, 3,... (e.g. adapters 1, 4, 5, 6, 7, 10) - 1 option to keep as is - 2 options to remove one adapter (e.g. the 5 or 6) - 1 options to remove two adapters (e.g. the 5 and 6) - We cannot remove three adapters - **4 options total**- If the diff sequence is ..., 3, 1, 1, 1, 1, 3,... (e.g. adapters 1, 4, 5, 6, 7, 8, 11) - 1 option to keep as is - 3 options to remove one adapter (e.g. 5, 6, or 7) - 3 options to remove two adapters (e.g. 
any two of 5, 6, and 7) - We cannot remove three adapters - **7 options total**Finally, we multiply each run length of difference of 1s with the number of options we have for removing adapters, then take the product of those products.```{r}runs_df <-tibble(lengths = runs$lengths, values = runs$values)options <-tibble(lengths =c(1,2,3,4), options =c(1,2,4,7))runs_df %>%filter(values ==1) %>%left_join(options, by ="lengths") %>%summarise(prod_options =prod(options)) %>%pull(prod_options) %>%format(scientific =FALSE) ```## Day 11: [Seating System](https://adventofcode.com/2020/day/11)[My day 11 data](data/AoC_day11.txt)#### Part 1: Changing layoutMy code for Day 11 runs a little slow (about 10 seconds for Part 1 and 80 seconds for Part 2), so for the sake of being able to rebuild this page quickly as I keep updating it working through the challenges, I will demonstrate this code with the test input provided as an example.```{r message = FALSE}library(dplyr)library(stringr)library(tidyr)```First we read in the data and convert it to a matrix (using the [`datapasta`](https://github.com/MilesMcBain/datapasta) package for the test input):```{r eval = TRUE, message = FALSE}# layout <- readr::read_tsv("data/AoC_day11.txt", col_names = FALSE)layout <- tibble::tribble( ~X1, "L.LL.LL.LL", "LLLLLLL.LL", "L.L.L..L..", "LLLL.LL.LL", "L.LL.LL.LL", "L.LLLLL.LL", "..L.L.....", "LLLLLLLLLL", "L.LLLLLL.L", "L.LLLLL.LL" )``````{r}# get number of columns for matrixnum_col <- layout %>%mutate(length =str_length(X1)) %>%slice(1) %>%pull(length)# split layout into characters and turn to vectorlayout_vec <- layout %>%mutate(X1 =strsplit(X1, split =character(0), fixed =TRUE)) %>%pull(X1) %>%unlist()# organise into matrixinitial_layout <-matrix(layout_vec, ncol = num_col, byrow =TRUE)```Next, we write a helper function that, given a matrix and row and column indices, returns a vector of the adjacent seats. We need to take care when indexing into the matrix, so we treat all corner and edge cases separately. Fiddly, but gets the job done.```{r}get_adj <-function(mat, i,j) { nr <-nrow(mat) nc <-ncol(mat)# corner casesif (i ==1& j ==1) {adj <-c(mat[1,2], mat[2,1:2])}elseif (i ==1& j == nc) {adj <-c(mat[1,(nc-1)], mat[2,(nc-1):nc])}elseif (i == nr & j ==1) {adj <-c(mat[nr,2], mat[nr-1,1:2])}elseif (i == nr & j == nc) {adj <-c(mat[nr-1, (nc-1):nc], mat[nr, nc-1])} # edge caseselseif (i ==1) {adj <-c(mat[1, c(j-1,j+1)], mat[2, (j-1):(j+1)])}elseif (i == nr) {adj <-c(mat[nr, c(j-1,j+1)], mat[nr-1, (j-1):(j+1)])}elseif (j ==1) {adj <-c(mat[c(i-1, i+1), 1], mat[(i-1):(i+1), 2])}elseif (j == nc) {adj <-c(mat[c(i-1, i+1), nc], mat[(i-1):(i+1), nc-1])}# inside caseselse {adj <-c(mat[i-1,(j-1):(j+1)], mat[i,c(j-1,j+1)], mat[i+1,(j-1):(j+1)])} adj}```Once we have a vector of surrounding seats, we can apply the rules in the problem to determine whether a given seat needs to change state. The `needs_changing` helper function does that. 
It's overkill at this point to give options to specify the function for finding the vector of seats to check, and the maximum number of occupied seats people can tolerate around them, but (spolier alert) I put in these options when working on the challenge in Part 2.```{r}needs_changing <-function(mat, i,j, get_surround = get_adj, max_occupied =4) { surround <-get_surround(mat, i,j) n_occupied <-sum(surround =="#")if ((mat[i,j] =="L") & (n_occupied ==0)) return(TRUE)elseif ((mat[i,j] =="#") & (n_occupied >= max_occupied)) {return(TRUE) }elsereturn(FALSE)}```Since floor spaces don't change, we only need to consider seats. We save the indices of the seats into a data frame, so we can vectorise over it using `tidyverse` functions. However, when we've determined the seats that need changing, using our `needs_changing` function, we need to convert those indices from a data.frame into a matrix, in order to index into the layout matrix appropriately and make the changes. ```{r}seats <-which(initial_layout !=".", arr.ind =TRUE)seats_df <-as.data.frame(seats) %>%rename(i = row, j = col)``````{r}layout <- initial_layoutiters <-0# loop until there are no further changesrepeat { change <-0 seats_to_change <- seats_df %>%rowwise() %>%mutate(change_seat =needs_changing(layout,i,j)) change <-sum(seats_to_change$change_seat)if (change ==0) break indices_to_change <- seats_to_change %>%filter(change_seat) %>%select(i,j) %>%as.matrix() layout[indices_to_change] <-setdiff(c("L", "#"), layout[indices_to_change]) iters <- iters +1}part_1_iters <- iterssum(layout=="#")```On the test set, this takes `r part_1_iters` iterations. On the full data set, my answer is 2316, and it took 107 iterations.#### Part 2: Looking furtherNow, people look to the first seat they can see in each direction, and will change from occupied to unoccupied if five or more of them are occupied.The plan is to write a function that extracts full vectors from a given seat to the edge of the layout matrix in each of the eight directions, then finds the first seat in each of those directions, and finally collects those into a vector of the seats under consideration when determining if a change is needed. Then I can reuse the loop from Part 1, just changing the arguments in the calls to `needs_changing`.Here's a helper function to get the first seat in a vector looking in one direction:```{r}get_first_seat_from_vec <-function(vec) {if (any(vec %in%c("#", "L"))) {return(vec[min(which(vec !="."))]) }return(NA)}```Now, if I thought getting adjacent seats to a given seat in Part 1 was fiddly, it's nothing on getting a vector from a given seat to the edge of the matrix. There are many cases to consider to make we we don't go out of bounds. 
In the diagonal directions, first we get a matrix of the indices of the matrix we need, then subset into the matrix accordingly.```{r}# takes a layout matrix (elements ".", "#", "L")# returns vector with first "L" or "#" encountered in each directionget_first_seat <-function(mat, i,j) { nr <-nrow(mat) nc <-ncol(mat)# Northif (i ==1) N <-NAif (i >1) N <- mat[(i-1):1,j]# Southif (i == nr) S <-NAif (i < nr) S <- mat[(i+1):nr,j]# Eastif (j == nc) E <-NAif (j < nc) E <- mat[i, (j+1):nc]# Westif (j ==1) W <-NAif (j >1) W <- mat[i, (j-1):1]# how far in each direction to edge of matrix to_N <- i -1 to_S <- nr - i to_E <- nc - j to_W <- j -1# North-West NW_length <-min(to_N, to_W)if (i ==1| j ==1) NW <-NAelse { mat_index <-matrix(c((i-1):(i-NW_length), (j-1):(j-NW_length)), ncol =2) NW <- mat[mat_index] }# North-East NE_length <-min(to_N, to_E)if (i ==1| j == nc) NE <-NAelse { mat_index <-matrix(c((i-1):(i-NE_length), (j+1):(j+NE_length)), ncol =2) NE <- mat[mat_index] }# South-East SE_length <-min(to_S, to_E)if (i == nr | j == nc) SE <-NAelse { mat_index <-matrix(c((i+1):(i+SE_length), (j+1):(j+SE_length)), ncol =2) SE <- mat[mat_index] }# South-West SW_length <-min(to_S, to_W)if (i == nr | j ==1) SW <-NAelse { mat_index <-matrix(c((i+1):(i+SW_length), (j-1):(j-SW_length)), ncol =2) SW <- mat[mat_index] }# vectors from mat[i,j] to the edge in each direction all_vecs <- (list(N = N, S = S, E = E, W = W, NW = NW, NE = NE, SE = SE, SW = SW))# the first seat in each direction, collapsed to a vector first_seats <- purrr::map_chr(all_vecs, get_first_seat_from_vec)# remove NAs from list and return# (these occur either when starting on an edge, # or when there are no seats in a given direction)return(first_seats[!is.na(first_seats)])}``````{r}layout <- initial_layoutiters <-0# loop until there are no further changesrepeat { change <-0 seats_to_change <- seats_df %>%rowwise() %>%mutate(change_seat =needs_changing(layout,i,j, get_first_seat, 5)) change <-sum(seats_to_change$change_seat)if (change ==0) break indices_to_change <- seats_to_change %>%filter(change_seat) %>%select(i,j) %>%as.matrix() layout[indices_to_change] <-setdiff(c("L", "#"), layout[indices_to_change]) iters <- iters +1}part_2_iters <- iterssum(layout=="#")```On the test set, this takes `r part_2_iters` iterations. On the full data set, my answer is 2128, and it took 87 iterations. Given this is fewer iterations than in Part 1, it must be my code for getting the first seat that's slowing things down. I am unsatisfied both by how many lines of code this has taken as well as the time taken to run. The introduction to Advent of Code says that each challenge has a solution that will complete in at most 15 seconds on ten year old hardware. So clearly there's a better way of doing this. Perhaps something to revisit in the future.## NextI was late to the game, and that was as far as I managed to get in December 2020. 
I'm looking forward to taking on the challenge again in 2021!<!--------------- appendices go here ----------------->```{r appendix}#| echo: falsesource("../../R/appendix.R")insert_appendix( repo_spec = "EllaKaye/ellakaye.co.uk", name = long_slug)```##### Session info {.appendix}<details><summary>Toggle</summary>```{r}#| echo: falselibrary(sessioninfo)# save the session info as an objectpkg_session <-session_info(pkgs ="attached")# get the quarto versionquarto_version <-system("quarto --version", intern =TRUE)# inject the quarto infopkg_session$platform$quarto <-paste(system("quarto --version", intern =TRUE), "@", quarto::quarto_path() )# print it outpkg_session```</details>