Python Email Retrieval Tutorial: Using the POP3 Protocol
1. Overview of Email Retrieval Protocols
In the world of email, sending relies on SMTP, and there are two main options for receiving:
- POP3 (Post Office Protocol version 3) - A lightweight protocol focusing on "download + local processing", simple and direct.
- IMAP (Internet Message Access Protocol) - An advanced protocol that supports "cloud synchronization + multi-device management" and is suitable for multi-device collaboration.
This tutorial focuses on POP3, the simpler of the two, and uses only Python's built-in modules to implement the full workflow of retrieving messages from a mailbox.
2. POP3 minimalist workflow
Without getting bogged down in protocol details, remember these six steps to understand the code logic:
- The client connects to the server: port 110 (plaintext, strongly discouraged) or port 995 (SSL-encrypted, required in production)
- Authentication (login)
- Get a list of IDs and sizes of all messages
- Download specified emails on demand
- (Optional) Mark downloaded messages for deletion on the server
- Disconnect explicitly
3. Python full process implementation
No third-party libraries are needed: the standard library's poplib and email modules are entirely sufficient.
3.1 Module introduction
First import all the tools you want to use:
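A minimal set of imports for the steps that follow might look like this. The exact selection is an assumption based on the functions sketched in this tutorial; everything here ships with the standard library.

```python
import os                                   # read credentials from environment variables
import poplib                               # POP3 / POP3 over SSL client
from email import message_from_bytes        # parse raw bytes into a Message object
from email.header import decode_header      # decode RFC 2047 encoded headers
from email.message import Message           # type of the parsed message
```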
3.2 Secure connection function
Most modern mail providers have disabled plaintext POP3, so it is convenient to wrap the connection logic in a reusable function that uses SSL by default:
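One possible shape for such a function is sketched below. The name connect_pop3, its parameters, and the 30-second timeout are assumptions made for illustration; the poplib calls themselves (POP3_SSL, POP3, user, pass_) are standard.

```python
import poplib


def connect_pop3(host, user, password, port=995, use_ssl=True, timeout=30):
    """Open a POP3 session, log in, and return the server object."""
    if use_ssl:
        # Encrypted connection, normally on port 995.
        server = poplib.POP3_SSL(host, port, timeout=timeout)
    else:
        # Plaintext connection, normally on port 110; avoid outside of testing.
        server = poplib.POP3(host, port, timeout=timeout)
    server.user(user)
    server.pass_(password)  # raises poplib.error_proto if the login is rejected
    return server
```

A call such as connect_pop3("pop.example.com", user, password) returns a logged-in session; the host name here is a placeholder.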
3.3 Get email metadata list
First, fetch the list of message IDs and sizes. You can use it to decide which messages to download and to avoid pulling very large messages in one go:
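A minimal sketch of this step, assuming a logged-in poplib session is passed in. The function name list_messages is hypothetical; server.list() is the standard poplib call and returns each entry as bytes in the form b"id size".

```python
def list_messages(server):
    """Return [(msg_id, size_in_bytes), ...] for every message in the mailbox."""
    _resp, lines, _octets = server.list()
    messages = []
    for line in lines:
        msg_id, size = line.decode("ascii").split()
        messages.append((int(msg_id), int(size)))
    return messages
```

With that list in hand, a size limit is a one-line filter, for example [m for m, size in list_messages(server) if size < 5 * 1024 * 1024].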
3.4 Download a single original email
After getting the ID, download the message's raw bytes directly so that the email module can parse them later:
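The download itself could be sketched as follows. fetch_raw_message is a hypothetical helper name; server.retr() is the standard poplib call, and it returns the message as a list of byte lines without line endings, so they are joined back with CRLF here.

```python
from email import message_from_bytes


def fetch_raw_message(server, msg_id):
    """Download one message and return its raw content as bytes."""
    _resp, lines, _octets = server.retr(msg_id)
    return b"\r\n".join(lines)


def fetch_message(server, msg_id):
    """Download one message and parse it into an email.message.Message."""
    return message_from_bytes(fetch_raw_message(server, msg_id))
```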
3.5 Core parsing tools
POP3 only transmits raw bytes; parsing the headers, body, and attachments depends entirely on the email module. Two key issues need to be solved here:
- Encoding confusion: Chinese email headers and bodies often use a mix of encodings such as GBK, GB2312, and UTF-8.
- Multipart emails: messages with attachments, or with both plain-text and HTML bodies, are nested structures.
3.5.1 General string decoding function
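Headers such as Subject and From may arrive as RFC 2047 encoded words (for example =?utf-8?b?...?=). The sketch below decodes them with email.header.decode_header and falls back to other encodings when the declared charset is missing or wrong. The function name decode_str and the fallback order (UTF-8, then GB18030, which is a superset of GBK) are assumptions for illustration.

```python
from email.header import decode_header


def decode_str(value, fallbacks=("utf-8", "gb18030")):
    """Decode a possibly RFC 2047 encoded header value into a plain str."""
    if not value:
        return ""
    pieces = []
    for text, charset in decode_header(value):
        if isinstance(text, str):
            # Header contained no encoded words; already text.
            pieces.append(text)
            continue
        # Try the declared charset first, then the fallbacks.
        for encoding in (charset, *fallbacks):
            if not encoding:
                continue
            try:
                pieces.append(text.decode(encoding))
                break
            except (LookupError, UnicodeDecodeError):
                continue
        else:
            # Nothing worked: keep what can be read, mark the rest.
            pieces.append(text.decode("utf-8", errors="replace"))
    return "".join(pieces)
```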
3.5.2 Guess the character encoding of the email part
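For body parts, the charset is normally declared in the Content-Type header. A minimal sketch, assuming UTF-8 as the default when nothing is declared: guess_charset and decode_text_part are hypothetical names, while get_content_charset() and get_payload(decode=True) are standard email.message.Message methods.

```python
from email import message_from_bytes


def guess_charset(part, default="utf-8"):
    """Return the charset declared for this part, or `default` if none is declared."""
    # get_content_charset() reads the charset parameter of Content-Type
    # and returns it lower-cased, or None when it is missing.
    return part.get_content_charset() or default


def decode_text_part(part, fallbacks=("utf-8", "gb18030")):
    """Decode a text part to str: declared charset first, then the fallbacks."""
    raw = part.get_payload(decode=True) or b""
    for encoding in (guess_charset(part), *fallbacks):
        try:
            return raw.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            continue
    return raw.decode("utf-8", errors="replace")


# Small self-check with a GBK-encoded body built in memory.
sample = message_from_bytes(
    b'Content-Type: text/plain; charset="GBK"\r\n\r\n' + "你好".encode("gbk")
)
print(guess_charset(sample), decode_text_part(sample))
```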
3.5.3 Recursively print email summary
When processing nested multipart emails, print only the key content (the first 200 characters of the text and the names of attachments) to avoid flooding the terminal:
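One way to write this is a recursive generator that yields one line per header or part, plus a thin wrapper that prints them. The names iter_summary_lines and print_summary are hypothetical. To keep the block self-contained, header and body decoding are simplified here: undecodable bytes are replaced rather than retried with other encodings.

```python
from email import message_from_bytes
from email.header import decode_header, make_header


def _header(msg, name):
    """Decode one header to str (simplified: no fallback for bad charsets)."""
    return str(make_header(decode_header(msg.get(name, ""))))


def iter_summary_lines(msg, depth=0):
    """Yield indented summary lines, recursing into multipart containers."""
    pad = "  " * depth
    if depth == 0:
        for name in ("From", "To", "Subject"):
            yield f"{name}: {_header(msg, name)}"
    if msg.is_multipart():
        for index, part in enumerate(msg.get_payload(), start=1):
            yield f"{pad}part {index} ({part.get_content_type()})"
            yield from iter_summary_lines(part, depth + 1)
        return
    filename = msg.get_filename()
    if filename:
        yield f"{pad}attachment: {filename}"
    elif msg.get_content_maintype() == "text":
        raw = msg.get_payload(decode=True) or b""
        charset = msg.get_content_charset() or "utf-8"
        try:
            text = raw.decode(charset, errors="replace")
        except LookupError:  # unknown charset name
            text = raw.decode("utf-8", errors="replace")
        yield f"{pad}text: {text[:200].strip()}"  # first 200 characters only
    else:
        yield f"{pad}other: {msg.get_content_type()}"


def print_summary(msg):
    """Print the summary of a parsed message, one line per header or part."""
    for line in iter_summary_lines(msg):
        print(line)
```

Separating line generation from printing is a design choice made here so the output can be inspected or logged as well as printed.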
4. Complete runnable example
Tie the previous functions together and wrap them in a main entry point:
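A compact end-to-end sketch is shown below. It is self-contained rather than reusing earlier helpers, and it prints only the subjects of the newest messages. The environment variable names POP3_HOST and EMAIL_USER, the function names, and the limit of three messages are assumptions; EMAIL_PWD matches the variable mentioned in the security notes.

```python
import os
import poplib
from email import message_from_bytes
from email.header import decode_header, make_header


def fetch_latest_subjects(server, limit=3):
    """Return the decoded subjects of the `limit` highest-numbered messages."""
    count, _mailbox_size = server.stat()
    subjects = []
    # Message numbers run from 1 to count; higher numbers are usually newer.
    for msg_id in range(count, max(count - limit, 0), -1):
        _resp, lines, _octets = server.retr(msg_id)
        msg = message_from_bytes(b"\r\n".join(lines))
        subjects.append(str(make_header(decode_header(msg.get("Subject", "")))))
    return subjects


def main():
    host = os.getenv("POP3_HOST")
    user = os.getenv("EMAIL_USER")
    password = os.getenv("EMAIL_PWD")
    if not (host and user and password):
        print("Set POP3_HOST, EMAIL_USER and EMAIL_PWD before running.")
        return
    server = poplib.POP3_SSL(host, 995, timeout=30)
    try:
        server.user(user)
        server.pass_(password)
        for subject in fetch_latest_subjects(server):
            print(subject)
    finally:
        server.quit()  # always end the session cleanly


if __name__ == "__main__":
    main()
```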
5. Pitfalls to avoid (security + practical notes)
5.1 Security first
- Never hardcode passwords: Use environment variables (os.getenv("EMAIL_PWD")) or a configuration file kept out of version control (for example, python-dotenv with a .env file listed in .gitignore).
- Mailboxes with two-factor verification must use an "application-specific password": QQ, 163, Gmail, and similar providers require you to generate one separately; the main account password cannot be used.
- SSL must be enabled: Port 995 is used by default, and the plaintext port 110 is now almost entirely blocked.
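As a small illustration of the first point, credentials can be read from the environment and rejected early when missing. This is a sketch: load_credentials and the EMAIL_USER variable name are assumptions, and EMAIL_PWD follows the name used above.

```python
import os


def load_credentials(env=None):
    """Read the mailbox user and password from environment variables."""
    env = os.environ if env is None else env
    user = env.get("EMAIL_USER")
    password = env.get("EMAIL_PWD")
    if not user or not password:
        # Fail early instead of attempting a login with empty values.
        raise RuntimeError("EMAIL_USER and EMAIL_PWD must both be set")
    return user, password
```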
5.2 Practical Notes
- Avoid frequent calls: Repeated connections within a short period of time will be restricted or even blocked by the email service provider. It is recommended to add appropriate intervals.
- Email ID is session-level: Each time you log in again, the email ID may change, so do not save it across sessions.
- Be careful when deleting emails: server.dele(mail_id) only marks a message for deletion; it is actually removed when server.quit() is called. If you change your mind before that, server.rset() clears all deletion marks.
- Don't panic about encoding issues: GBK is common in Chinese mailboxes, while UTF-8 is used by most others. The decoding code falls back through two layers, so most messages display correctly.
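The mark-then-commit behaviour of deletion can be sketched like this. delete_and_quit is a hypothetical helper; dele, rset, quit, and poplib.error_proto are standard poplib names. If any mark fails, all marks are cleared before the session ends, so nothing is deleted by accident.

```python
import poplib


def delete_and_quit(server, msg_ids):
    """Mark messages for deletion, then end the session to commit them."""
    try:
        for msg_id in msg_ids:
            server.dele(msg_id)  # only marks; nothing is removed yet
    except poplib.error_proto:
        server.rset()  # undo every deletion mark made in this session
        raise
    finally:
        server.quit()  # deletions that are still marked take effect here
```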
6. Directions for further work
If you want to further expand the functionality, you can try:
- Attachment download: Walk through the parts of a multipart email, find the parts marked Content-Disposition: attachment, and save them locally.
- Mail filtering: Fetch only the headers of each message first (using server.top(mail_id, 0)), filter by subject, sender, date, and other conditions, and only then download the body.
- Scheduled retrieval: Combine the code with the schedule library to create scheduled tasks that fetch mail automatically.
- Data persistence: Store the parsed emails in SQLite, MySQL, or Elasticsearch for further analysis.
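As an example of the mail-filtering idea, the sketch below fetches only the headers of each message with top(msg_id, 0) and returns the IDs whose subject contains a keyword. find_by_subject is a hypothetical name, and the header decoding is simplified (no fallback for wrongly declared charsets).

```python
from email import message_from_bytes
from email.header import decode_header, make_header


def find_by_subject(server, keyword):
    """Return the IDs of messages whose Subject contains `keyword` (case-insensitive)."""
    count, _mailbox_size = server.stat()
    matches = []
    for msg_id in range(1, count + 1):
        # top(msg_id, 0) returns the headers and zero lines of the body.
        _resp, lines, _octets = server.top(msg_id, 0)
        headers = message_from_bytes(b"\r\n".join(lines))
        subject = str(make_header(decode_header(headers.get("Subject", ""))))
        if keyword.lower() in subject.lower():
            matches.append(msg_id)
    return matches
```

The returned IDs are valid only for the current session, so the matching messages should be downloaded before disconnecting.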
7. Summary
This tutorial uses Python's built-in modules to implement the core of receiving email via POP3: establishing a secure connection, pulling metadata, downloading and parsing messages, and avoiding common pitfalls. The code is short and the logic is clear, which makes it a good first exercise in email automation.

