Skip to main content

Patrick Wang

Advent of Code 2020 Day 6

Part 1

In Day 6, we are told to look at answers to a 26-yes-or-no-question form. Each of these questions is represented by a letter from a to z. If we consider the following input:

abcx
abcy
abcz

We have a group of three answers to the form, and in total, six questions were answered with a “yes”: a, b, c. x, y, and z.

Groups of answers are separated by a white line. So for example, we have two groups of answers illustrated just below:

a

a
b
c

The objective of Part 1 is to compute the sum of questions answered with a “yes” in each group.

When we need unicity (as stated in this day’s example), a good intuition is to usually think of sets. Indeed, sets can assure us that each element in a set is present there only once. So, our problem now is to correctly parse the input data and to create sets for each group of answers. Here, two difficulties:

  • Some lines can have multiple letters which need to be splitted before being added to the set (indeed, abc is different from a, b, and c).
  • We need to make sure that white lines are handled correctly, i.e., they mark the end of a set and the beginning of another one.

And finally, we need to count the numbers of questions answered with a “yes”. This can be done with a simple count variable that is incremented by the length of each set we are creating. And so, our answer can look like this:

count = 0
for line in lines:
    letters = set()
    if not line:
        count += len(letters)
        letters.clear()
    else:
        for letter in line:
            letters.add(letter)
count += len(letters)   # We don't forget to add the length of the last group because
                        # there is no white line separation at the end of the file
print(count)

Part 2

Part 2 is very similar to Part 1, except that we now have to take into account only questions that were answered with a “yes” by everyone in the group. So, if we took the example below:

abc

a
b
c

ab
ac

a
a
a
a

b

The expected output would be 6 because:

  • There’s only one person in the first group, so “everyone” in the group answered 3 questions;
  • In group 2, there is no question answered in common by all the group members;
  • In group 3, only question a is in common between both group members;
  • In group 4, same as group 3;
  • In group 5, same as group 1.

The formulation of Part 2 has us rethink the way we represent the data. Indeed, we now have to more explicitely look at groups. One thing that we can observe is that because a question has to be answered “yes” by each member of a group, said question appears exactly that amount of times. And so, we can represent our data with a list of tuples, where each tuple would contain the group size and all the letters as a single string.

groups = []
group_size = 0
answers = ""
for line in lines:
    if not line:
        # We add the tuple to the group, then reinitialise all the values
        groups.append((group_size, answers))
        group_size = 0
        answers = ""
    else:
        group_size += 1
        answers += line
# Once again, we don't forget to add the last group
groups.append((group_size, answers))

Now that we have the groups, we can apply what was described above. One question remains though: How do we know which letters to look for? My proposition is to look at the first letter, count its number of occurrences, and then remove it with replace(). That should give something like this:

count = 0
for group_size, answers in groups:
    while answers:
        if answers.count(answers[0]) == group_size:
            count += 1
        # We remove all occurrences of answers[0] by replacing them with empty strings
        answers = answers.replace(answers[0], "")
print(count)

Concepts and difficulties

In this problem, finding a good representation for the input data is really important. Day 6’s problem was a nice opportunity to introduce sets, which we haven’t seen before. Once we have implemented sets, Part 1 becomes quite straightforward.

Part 2 was also a little bit more difficult. Since there are lots of different ways to represent the data, the “search” algorithm might be more or less complex to write. Actually, what I’ve presented here differs from what I did when solving the problem. And to be honest, the solution presented here is much nicer than the one I wrote when solving the problem.

comments powered by Disqus