Regular Expressions for Removing Email Signatures

I’ve been scouring Google, searching for a regular expression that can help me filter out email signatures from email text. I guess email body processing is kind of niche for NLP, as I found a lot of regexes for parsing email addresses but nothing for parsing email signatures. So I made one and wanted to share it with anyone who might benefit from it.

This regex matches the email's sign-off and everything after it. When you end the email with “Cheers” or “Sincerely”, that phrase and everything following it will be matched.

(\w*\s)?([Bb]est|[Rr]egards|Have a|[Cc]heers|[Ss]incerely|[Tt]ake care|Looking forward|Fond|Kind|Yours)(\s*.*)

Paste this regex into this regex tester website and try it out on your email text: https://www.regextester.com/
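If you would rather check the match locally, here is a minimal sketch in Python. The sample message and the name SIGNOFF are made up for illustration, and the character classes are written as [Bb] rather than with a pipe inside the brackets, so that a stray "|" in the text cannot match.

```python
import re

# The sign-off pattern from this post, split over two lines for readability.
# SIGNOFF is just an illustrative name.
SIGNOFF = re.compile(
    r"(\w*\s)?([Bb]est|[Rr]egards|Have a|[Cc]heers|[Ss]incerely|[Tt]ake care"
    r"|Looking forward|Fond|Kind|Yours)(\s*.*)"
)

# Made-up email text on a single line (newlines already removed).
msg = "Hi Sam, the report is attached. Cheers, Alex Doe 555-0100"

match = SIGNOFF.search(msg)
print(repr(match.group(2)))  # the sign-off phrase that triggered the match
print(repr(match.group(0)))  # the whole span that would be removed
```

Here group(2) is the sign-off word itself and group(0) is everything from just before it to the end of the text.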

Then Python's regular expression module can be used to substitute the matched text with an empty string. Something like this, but don’t forget to import re:

import re
nosig = re.sub(r'(\w*\s)?([Bb]est|[Rr]egards|Have a|[Cc]heers|[Ss]incerely|[Tt]ake care|Looking forward|Fond|Kind|Yours)(\s*.*)', '', msg)

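Be sure to remove newline characters '\n' (by using the replace method or something else) before applying this regular expression. By default the "." in a Python regex does not match a newline, so the match stops at the end of the sign-off line and the rest of the signature is left in place. Alternatively, pass flags=re.DOTALL to re.sub so that "." matches newlines too.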
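To see the newline behaviour for yourself, here is a rough sketch that runs the same substitution with and without re.DOTALL on a multi-line message. The message and the variable names are invented for the example; the trailing rstrip() is my own addition to tidy leftover blank lines.

```python
import re

# Same sign-off pattern as above, kept as a plain string so it can be
# used with and without flags.
PATTERN = (
    r"(\w*\s)?([Bb]est|[Rr]egards|Have a|[Cc]heers|[Ss]incerely|[Tt]ake care"
    r"|Looking forward|Fond|Kind|Yours)(\s*.*)"
)

# Made-up email text that still contains its newlines.
msg = "Hi Sam,\nThe report is attached.\n\nCheers,\nAlex Doe\n555-0100"

# Default behaviour: "." stops at the newline, so only "Cheers," is removed
# and the name and phone number survive.
without_flag = re.sub(PATTERN, "", msg)

# With re.DOTALL, "." also matches newlines, so everything after the
# sign-off is removed without flattening the message first.
with_flag = re.sub(PATTERN, "", msg, flags=re.DOTALL).rstrip()

print(repr(without_flag))
print(repr(with_flag))
```

One thing to keep in mind with either approach: the pattern looks for the sign-off words anywhere in the text, so a body sentence containing something like "the best option" would also be cut from that point on.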

Feel free to contact me with any feedback on how to improve this regex. And if it was useful to you, I would love to hear about it!
