Python third-party module guide

"Don't reinvent the wheel" has practically become the motto of the Python community. Python spans fields such as data science, web development, and operations automation largely thanks to its large, active ecosystem of third-party modules. With the official package index and efficient package management tools, you can install high-quality open source components in seconds and focus on your business logic.

This article walks you through the world of Python modules step by step: the package index, installation tools, a quick overview of common modules, and tips for avoiding pitfalls. It is clearly structured, with ready-to-use examples, and is suitable for both beginners and more advanced learners.


PyPI - Official Python package index

PyPI, short for Python Package Index, is Python's official online software repository, nicknamed the "Cheese Shop" in the community. It currently hosts more than 500,000 open source projects, so there is a module for almost any functionality you can think of.

How to find the package you need

  1. Web browsing: Visit pypi.org, search by keyword, and view details such as the package's versions, license, dependencies, and sample code.
  2. Command-line search: The old pip search command has been officially disabled. You can install a lightweight search tool instead:
    pip install pypisearch
    pypisearch requests   # quick search for HTTP-related libraries
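
The same details shown on a project page can also be read programmatically. The sketch below assumes PyPI's public JSON endpoint at https://pypi.org/pypi/<name>/json; the helper names summarize and fetch_metadata are made up for illustration, and only a few fields are picked out.

```python
import json
from urllib.request import urlopen


def summarize(payload: dict) -> dict:
    """Pick a few fields out of a PyPI JSON API response."""
    info = payload["info"]
    return {
        "name": info["name"],
        "latest": info["version"],
        "license": info.get("license"),
        "summary": info.get("summary"),
    }


def fetch_metadata(package: str) -> dict:
    """Download the metadata for one package (requires network access)."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urlopen(url, timeout=10) as response:
        return json.load(response)
```

With network access, summarize(fetch_metadata("requests")) returns a small dictionary with the package name, latest version, license, and one-line summary.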

Mainstream installation and management methods

Choosing the right tool gets you twice the result with half the effort. The Python ecosystem has two mainstream installation tools: the general-purpose pip, and conda, which is optimized for data science.

General-purpose tool: pip

pip is Python's built-in package manager (bundled with Python 3.4 and above), so no additional installation is required. It is simple and powerful.

Quick reference for common commands

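# Install the latest version
pip install requests

# Install a specific version (recommended; avoids unexpected API changes)
pip install requests==2.31.0

# Install a compatible release (>=2.31.0 and ==2.31.*, so patch fixes only)
pip install requests~=2.31.0

# Upgrade an installed package
pip install --upgrade requests

# Uninstall a package
pip uninstall requests

# List all packages in the current environment
pip list

# Show detailed information about a package
pip show requests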
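
To make the difference between == and ~= concrete, here is a simplified sketch of the "compatible release" rule. It only handles plain numeric versions such as 2.31.0 (no pre-releases, epochs, or local versions), and the function names are illustrative; pip itself uses a full version-specifier implementation.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '2.31.0' into (2, 31, 0); only plain numeric versions are handled."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(candidate: str, base: str) -> bool:
    """Simplified check for 'candidate ~= base'."""
    cand, ref = parse(candidate), parse(base)
    if len(ref) < 2:
        raise ValueError("~= needs at least two version segments, e.g. 2.31")
    prefix = ref[:-1]  # every segment except the last must stay fixed
    # Pad with zeros so that 2.31 and 2.31.0 compare as equal.
    width = max(len(cand), len(ref))
    cand_padded = cand + (0,) * (width - len(cand))
    ref_padded = ref + (0,) * (width - len(ref))
    return cand_padded >= ref_padded and cand_padded[: len(prefix)] == prefix
```

Under this rule, ~=2.31.0 accepts 2.31.5 but rejects 2.32.0, while the shorter ~=2.31 accepts any 2.x release from 2.31 upward and rejects 3.0.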

Batch operations and security tips

  • Dependency file management: Record project dependencies in requirements.txt to make deployment across environments easier.
    pip freeze > requirements.txt   # export
    pip install -r requirements.txt # install
  • Security check: To guard against downloaded packages being tampered with, use hash-checking mode.
    pip install pip-tools
    pip-compile requirements.in --generate-hashes
    pip install --require-hashes -r requirements.txt
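
Conceptually, hash-checking mode compares the digest of each downloaded file against the value pinned in the requirements file. The following is a minimal sketch of that idea using only the standard library; sha256_of and matches_pin are hypothetical helper names, not part of pip.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path: Path, pinned: str) -> bool:
    """Check a file against a pin written as 'sha256:<hex digest>'."""
    algorithm, _, expected = pinned.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
    return sha256_of(path) == expected.lower()
```

If a single byte of the file changes, the digest no longer matches the pin and the check fails, which is the reason the installation is refused in hash-checking mode.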

For data science: Anaconda/Miniconda

If you work in data science or machine learning, conda can save you a lot of trouble with compilation and dependency configuration. Anaconda is the full distribution (~3GB), while Miniconda keeps only the core (~400MB); the latter is recommended.

Get started with Miniconda quickly

Download and install it from the official Miniconda website. During installation, remember to check "Add to PATH" (Windows) or configure your shell according to the instructions.

Core commands:

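# Create an environment (with a specific Python version)
conda create -n data_env python=3.10

# Activate the environment
conda activate data_env

# Install the data science stack (binary dependencies are resolved automatically)
conda install numpy pandas matplotlib scikit-learn

# Leave the environment
conda deactivate

# List all environments
conda env list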
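
From inside a script, you can check which conda environment is active. This small sketch relies on the CONDA_DEFAULT_ENV environment variable that conda activate sets; the function name is illustrative.

```python
import os
from typing import Mapping, Optional


def active_conda_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the active conda environment, or None outside conda."""
    # `conda activate data_env` exports CONDA_DEFAULT_ENV=data_env.
    return environ.get("CONDA_DEFAULT_ENV") or None
```

A script can call active_conda_env() at startup and warn if it is not running in the expected environment, for example data_env.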

Quick overview of frequently used third-party libraries

The following is a list of core modules that both beginners and advanced users should know, categorized by development scenarios.

📊 Data Science and Machine Learning

Most libraries in this category are implemented in C/C++/Fortran under the hood, which makes them very fast:

  • NumPy: Its multidimensional array object ndarray and vectorized operations are the cornerstone of data science.
  • Pandas: A tool for reading, writing, cleaning, grouping, and aggregating structured data, supporting multiple formats such as CSV/Excel/JSON.
  • Matplotlib: A highly flexible visualization library that can generate line charts, bar charts, scatter plots, and more.
  • Scikit-learn: An entry-level machine learning library that integrates classic algorithms such as classification, regression, clustering, and dimensionality reduction, with a unified and friendly API.

🌐 Web development and API interaction

  • Requests: HTTP request library, known for its "HTTP for Humans" tagline; the first choice for fetching web pages and calling APIs.
  • FastAPI: A modern high-performance framework that uses type hints to automatically generate interactive API documentation; its documentation describes its performance as on par with Node.js and Go.
  • Flask: Lightweight web framework, flexible and unopinionated, suitable for rapid development of small applications and APIs.
  • Django: A full-featured framework with a built-in ORM, admin interface, and user authentication, suitable for medium and large websites.

🤖 Automation and Tools

  • Beautiful Soup 4: HTML/XML parsing library, used together with Requests to quickly scrape information.
  • Pillow: Image processing library that supports scaling, cropping, filters, format conversion, etc.
  • tqdm: Adds a progress bar to a loop with one line of code, so you can see how far a long-running job has progressed.
  • PyAutoGUI: Simulates mouse and keyboard operations, suitable for automating repetitive GUI tasks.

Pitfalls to avoid and best practices

1. Be sure to use a virtual environment

**Never install project dependencies directly into the system Python!** Different projects may depend on different versions of the same module, which leads to painful conflicts. Use a virtual environment to keep each project isolated.

Classic way (built-in venv):

# Create the environment
python -m venv my_project_env

# Activate it
# Linux/Mac
source my_project_env/bin/activate
# Windows PowerShell
my_project_env\Scripts\Activate.ps1
# Windows CMD
my_project_env\Scripts\activate.bat

# Leave the environment
deactivate
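
To confirm from Python that a venv is actually active, compare sys.prefix with sys.base_prefix. This is a minimal sketch with illustrative function names; it applies to venv-style environments, not to conda environments, where the two values are normally identical.

```python
import sys


def in_virtual_environment() -> bool:
    """True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def describe_interpreter() -> str:
    """Describe which interpreter is running and whether it is isolated."""
    kind = "virtual environment" if in_virtual_environment() else "base installation"
    return f"{sys.executable} ({kind})"
```

Printing describe_interpreter() before installing anything is a cheap way to catch the mistake of running pip against the system Python.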

2. Record dependencies consistently

  • Quick export: pip freeze > requirements.txt
  • More modern solution: use pipenv or poetry, which lock exact versions and manage virtual environments automatically.
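
To see what a pinned export contains, the sketch below builds similar name==version lines from the standard library's importlib.metadata. It is only a rough approximation of pip freeze, which additionally handles editable installs and packages installed from URLs; the function name is illustrative.

```python
from importlib import metadata


def pinned_requirements() -> list[str]:
    """List installed distributions as 'name==version' lines, sorted by name."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)
```

Writing "\n".join(pinned_requirements()) to a file gives a snapshot of the current environment that can be compared against requirements.txt.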

3. Speed up downloads with a domestic mirror

For readers in mainland China, downloads from PyPI can be extremely slow. Configuring a domestic mirror is recommended:

# Use the Tsinghua mirror for a single install
pip install requests -i https://pypi.tuna.tsinghua.edu.cn/simple

# Configure it permanently
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
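
The permanent setting is stored in pip's INI-style configuration file (pip.conf on Linux/macOS, pip.ini on Windows). As a rough sketch, the snippet below uses configparser to render the section that the command above is expected to produce; render_pip_config is a hypothetical helper, and the exact file location depends on your platform.

```python
import configparser
import io


def render_pip_config(index_url: str) -> str:
    """Render a pip configuration section that sets the default package index."""
    config = configparser.ConfigParser()
    config["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


MIRROR = "https://pypi.tuna.tsinghua.edu.cn/simple"
print(render_pip_config(MIRROR))
```

Knowing what the file looks like makes it easy to review or remove the mirror later by editing the [global] section by hand.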

4. Version and security checks

  • Version constraints: Where possible, use == or ~= to specify a compatible range, and avoid open-ended constraints.
  • Vulnerability scan: Regularly check dependencies for known vulnerabilities.
    pip install pip-audit
    pip-audit

Making good use of third-party modules frees up your creativity and lets you devote more time to your core business. But remember: for small features that the standard library can handle, try not to introduce additional dependencies; this keeps your project lighter and easier to maintain. I hope this guide helps you get started efficiently and navigate the ocean of Python modules with ease.