access database

#Access database

Introduction to modern data storage and database technology

All kinds of data generated during the running of the program live in memory by default. Once the program exits and the device is restarted, these data will disappear completely. If you want to truly "retain" the data, you must rely on data persistence technology, and the database is the most versatile and powerful solution among them.

Let's start with the most basic file storage and step into the world of databases.


1. From simple to professional: the evolution of data persistence

1.1 Text file: a basic solution that everyone can get started with

The simplest persistence is to write the data directly as plain text. The most common is the CSV (Comma Separated Values) format:

name,score
Michael,99
Bob,85
Bart,59
Lisa,87

advantage

  • ✅ Completely human-readable, you can open and edit it with Notepad
  • ✅ Simple format, almost all programming languages ​​come with or provide third-party support
  • ✅ The threshold for export and import is extremely low, Excel can also be opened directly

shortcoming

  • ❌ There is no unified standard: sometimes semicolons are used, sometimes tabs are used for separation, and different encodings will result in garbled characters.
  • ❌ Poor query efficiency: If you want to count the "number of people with a score of 80 or above", you have to traverse the entire file
  • ❌ Does not support complex data: nested and associated structures cannot be saved at all

1.2 JSON Files: A Structured But Non-Professional Choice

JSON is the "universal language" of web development and is often used to store structured data:

[
  {"name":"Michael","score":99},
  {"name":"Bob","score":85},
  {"name":"Bart","score":59},
  {"name":"Lisa","score":87}
]

advantage

  • ✅ Supports more flexible structures such as nesting and key-value pairing
  • ✅ Web front-end / back-end seamless connection
  • ✅ The document is clear and readable

shortcoming

  • ❌ Large storage redundancy: all key names must be written repeatedly for each piece of data
  • ❌ Large data sets are slow: there is no index, and the query still requires full traversal
  • ❌ The risk of modification is high: if you accidentally delete a comma, the entire file will fail to parse.

1.3 Binary formats: Efficient but “black box” modern tools

If you only care about storage efficiency and parsing speed, and not about human readability, the binary format is a better choice:

  • Protocol Buffers (Protobuf): developed by Google, small in size, fast in parsing, and good in cross-language support
  • MessagePack: similar to JSON but with higher compression and faster speed
  • Avro: produced by Apache, supports dynamic data types, suitable for big data scenarios

2. Core and development of database system

When the amount of data becomes large, frequent queries are required, or it is necessary to ensure that "data must not be missed", the limitations of file storage will be completely exposed - this is when the database system comes on stage.

2.1 A brief history of database development (minimalist version)

  1. 1950s–1960s: File system stage, only sequential reading and writing
  2. 1960s–1970s: hierarchical/network database, the structure is too fixed
  3. 1970s: The birth of relational database completely changed the way of data management.
  4. 2000s: The NoSQL movement rises to respond to the unstructured, high-concurrency needs of the Web 2.0 era
  5. 2010s to present: NewSQL and cloud native database explode, integrating the advantages of SQL and NoSQL

2.2 Relational database: the most classic database type

The core of a relational database is tables, which can be related to each other through "keys" - a bit like different Excel worksheets, but much more powerful.

Let’s look at a simple school management system example:

Grade table (grade)Class table (class)
grade_idnameclass_id
1Grade 111
2Grade 212
3Grade 321

If you want to find out "all classes in first grade", you only need to write a SQL statement:

SELECT * FROM Class WHERE grade_id = 1;

Such concise query capabilities are the huge advantages brought by SQL (Structured Query Language).


3. How to choose a modern database?

There are many types of databases now, and we can roughly divide them into three categories according to "data model".

3.1 relational-database (SQL): the first choice for transaction security

TypeRepresentative databaseApplicable scenarios
BusinessOracle, SQL Server, DB2Scenarios with extremely high stability and functionality requirements such as banks and e-commerce core systems
Open sourceMySQL / MariaDBThe most mainstream web application database, widely used by WordPress and Taobao in the early days
Open sourcePostgreSQLThe most powerful open source relational library, supporting JSON, GIS, and window functions, suitable for complex business and data analysis
EmbeddedSQLiteZero configuration, single file, suitable for mobile applications, desktop tools, and small program prototypes

🧠 Summary: If you need strong data consistency (such as order deductions, account transfers), relational-database is still the first choice.


3.2 NoSQL database: the savior of unstructured/high concurrency

NoSQL is not "the negation of SQL", but Not Only SQL - it does not require the table structure to be fixed and can easily handle unstructured data and high concurrent access.

TypeRepresentativeCore FeaturesApplicable Scenarios
Document databaseMongoDB, CouchDBStore JSON-like documents with flexible structureSocial dynamics, e-commerce products, log system
Key-value storageRedis, DynamoDBAccess data like a dictionary, extremely fast reading and writingCache, session, ranking list, flash sales
Wide column storageCassandra, HBaseStrong horizontal scalability, supports massive dataInternet of Things, time series, user behavior analysis
Graph databaseNeo4j, ArangoDBStore nodes and relationships, good at complex associated queriesSocial networks, knowledge graphs, recommendation systems

🔥 Tips: NoSQL is not meant to replace SQL, but to choose the most appropriate tool based on business characteristics.


3.3 NewSQL database: Can you have your cake and eat it too?

NewSQL combines the transaction security and complex query of SQL with the horizontal expansion and high concurrency capabilities of NoSQL, and is particularly suitable for cloud-native and distributed scenarios:

  • TiDB: produced by domestic PingCAP, fully compatible with MySQL
  • CockroachDB: Popular foreign distributed SQL, supports cross-regional deployment
  • YugabyteDB: compatible with PostgreSQL, open source and free

4. How to connect to the database in Python?

Python is a commonly used language for data analysis and web development, and connecting to databases is also very simple. Listed below are several mainstream connection methods.

4.1 Native database driver

If you pursue ultimate efficiency or need to write complex SQL, you can use the native driver directly.

# SQLite:Python3 自带,无需安装
import sqlite3
conn = sqlite3.connect('school.db')  # 单文件模式
# conn = sqlite3.connect(':memory:')  # 内存模式(重启后数据丢失)

# MySQL / MariaDB:pip install pymysql
import pymysql
conn = pymysql.connect(
    host='localhost',
    user='root',
    password='your_password',
    db='school'
)

# PostgreSQL:pip install psycopg2-binary
import psycopg2
conn = psycopg2.connect(
    dbname='school',
    user='postgres',
    password='your_password',
    host='localhost'
)

# MongoDB:pip install pymongo
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client.school  # 选择数据库

4.2 ORM (Object Relational Mapping): Writing code more elegantly

ORM allows you to use Python classes and objects to operate the database without having to write a lot of SQL by hand:

  • SQLAlchemy: The most comprehensive Python SQL toolkit, supporting multiple databases
  • django ORM: built-in django framework, suitable for rapid web development
  • Peewee: lightweight ORM, concise code, suitable for small projects
  • MongoEngine: ODM (Object Document Mapping) for MongoDB

Suggestion: In the learning stage, you can start with native SQL, and then use ORM to improve development efficiency after understanding the underlying principles.


5. Learning suggestions for novices

A solid foundation is always the most important. It is recommended to follow the following steps step by step:

  1. Introductory stage: First learn SQL and relational-database (it is recommended to practice with SQLite or MySQL), and master table creation, addition, deletion, modification and query, and basic related queries
  2. Advanced stage: Learn database design (paradigm, index), Transaction and concurrency control
  3. On-demand expansion: Learn NoSQL and cloud-native databases according to project needs

💡 Future Direction: AI-driven vector databases and time series databases commonly used in the Internet of Things have been popular directions in recent years, but it is strongly recommended to lay a solid foundation first before exploring these new areas.