
mailman3 encoding issues on unsubscription emails
Open, Medium, Public, BUG REPORT

Description

I've just unsubscribed a member from a Wikimedia mailing list whose account name (which contains his, afaik, never-publicly-disclosed real name, so I won't publish it here) has the í character. This resulted in an unsubscription email body as follows:

=?utf-8?q?[redacted]?= <$member_email> has been removed from $mailing_list.

I assume this is some sort of encoding bug on mailman3?

Thanks.

Event Timeline

akosiaris triaged this task as Medium priority. (Sep 9 2021, 1:09 PM)
akosiaris added subscribers: Ladsgroup, akosiaris.

Hi @MarcoAurelio, can you please clarify what the issue is? E.g. was something in the part that is marked as "[redacted]" badly encoded? I haven't yet seen an unsubscription email from mailman3, so I can't spot the issue.

Hello @akosiaris. The whole subscriber/account name is a random
sequence of letters and symbols, not only the "redacted" part. I've
forwarded the email I got to you. I hope that makes things more clear.
Best regards.

I've received the unredacted body of the message from @MarcoAurelio. It is typical Quoted-Printable, in the form of an RFC 2047 encoded-word. This isn't really garbled, but rather a way to represent 8-bit data in 7-bit form (historically). The character in question is U+00ED (https://www.compart.com/en/unicode/U+00ED), which UTF-8 encodes as the bytes 0xC3 0xAD. That is why the =C3=AD is there (along with the =?utf-8?q? prefix, which denotes the charset and the fact that we are in quoted-printable mode). Historically this has been super useful in the Subject line of emails, as many MTAs (Mail Transfer Agents, e.g. sendmail, postfix, exim) are configured to not be happy receiving non-ASCII subject lines (it has even been a telltale sign of spam in the past).
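To make the above concrete, here is a minimal sketch using Python's standard `email.header` module to decode such an encoded-word. The name "María", the address, and the list name are made up for illustration (the real ones are redacted in this task), and `decode_mime_words` is a hypothetical helper, not something from mailman3.

```python
from email.header import decode_header, make_header


def decode_mime_words(text: str) -> str:
    """Decode any RFC 2047 encoded-words (=?charset?q?...?=) found in text."""
    return str(make_header(decode_header(text)))


# "í" is U+00ED; its UTF-8 encoding is the two bytes 0xC3 0xAD,
# which the "q" encoding writes as =C3=AD.
utf8_bytes = "í".encode("utf-8")

# Hypothetical stand-in for the redacted notification line.
raw_line = "=?utf-8?q?Mar=C3=ADa?= <member@example.org> has been removed from example-list."

print(utf8_bytes.hex())             # hex form of the UTF-8 bytes for "í"
print(decode_mime_words(raw_line))  # the line with the display name decoded
```

A MUA does this decoding for header fields such as Subject or From, but not for text in the message body, which would explain why the raw encoded-word showed up unchanged.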

What is super interesting to me, though, is that this happened in the body and not the subject. In the body we can usually just have something like

MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: base64

That being said, https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/thread/BBIFI2IWQLXASBZ4AC3UBTXAIGMBHLF4/ suggests that mailman3 will put the following there

Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

which should also work just fine. @MarcoAurelio what is the user's MUA? This is something that should be supported since like forever in a typical MUA.
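As a rough illustration of the headers above, the sketch below builds a message with Python's standard `email.message.EmailMessage` and asks for a quoted-printable body. This is not mailman3's actual code path; the subject, name, address, and list name are assumptions made up for the example.

```python
from email.message import EmailMessage

# Hypothetical notification text; the display name is plain Unicode here,
# not an RFC 2047 encoded-word.
body_text = "María <member@example.org> has been removed from example-list.\n"

msg = EmailMessage()
msg["Subject"] = "Unsubscription notification"
# Declare the charset and transfer encoding for the body explicitly.
msg.set_content(body_text, charset="utf-8", cte="quoted-printable")

wire_form = msg.as_string()      # what goes over the wire (7-bit safe)
round_trip = msg.get_content()   # what a MUA would show after decoding

print(wire_form)
print(round_trip)
```

With the body declared this way, the í travels as =C3=AD on the wire and the receiving client decodes it based on the Content-Transfer-Encoding header, so no =?utf-8?q?...?= wrapper is needed inside the body text.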

There is also https://gitlab.com/mailman/mailman/-/issues/859, and the fix is at https://gitlab.com/mailman/mailman/-/merge_requests/822 (4 months ago as of this writing), which might be related. But the 'ignore' that is used there is probably gonna break this even worse.

@MarcoAurelio: Do you know exactly which email client the user is using?

Sorry for the late reply. My client is the Gmail web interface and his was
Yahoo. The mail with the weird encoding was, however, received from
lists.wikimedia.org on my Gmail account, upon unsubscribing him. Thanks.
--
M. A.