Jump to content

Manual:Schema changes

From mediawiki.org

This is a help page describing how to build schema change patches for MediaWiki core and its extensions for people who need to change the database layout as part of their development work.

Glossary

[edit]
  • Schema - Current database layout of MediaWiki.
  • Schema change - An atomic part of schema migration that is being added through a commit. For example "Adding table foo", "Dropping column bar from table baz" and so on.
  • Database management system (DBMS) - The underlying technology handling the MediaWiki database. The supported ones in MediaWiki core are: MySQL, SQLite and PostgreSQL. It can be more using extensions.
  • Data definition language (DDL) - Syntax that defines schema and schema changes (It can differ in different DBMSes). For example "ALTER TABLE", "DROP COLUMN". They are saved as ".sql" files.
  • Database Abstraction Layer (DBAL) - The bridge between DBMS-independent database schema and schema change definitions and the actual DDLs.

Overview

[edit]

Each schema change needs to handle two parts. First, new installations need to have the new schema instead of the old one and second, old installation must be able to upgrade to the new one. For the first part, we fix the schema DDL files (saved as tables.sql) and for the second part, we provide "ALTER TABLE" patches and wire them into updater logic.

We are at middle of a migration from having one dedicated DDL per DBMS to only one abstracted schema. Depending on the table, you might change several raw SQL files or only change one json file and generate the SQL files using a maintenance script.

Manual (deprecated)

[edit]

In this method which is used until 2020, when making a schema change:

  1. Change tables.sql in two different places (maintenance/tables.sql for MySQL and maintenance/postgres/tables.sql for PostgreSQL)
  2. Make a schema change DDL file as the upgrade path of current installations for MySQL and put the file in maintenance/archives/
    • If other DBMS types don't work with that patch, you need to make a dedicated patch for them. For example, SQLite does not have ALTER TABLE, meaning you need to make a temporary table, copy the data, drop the old table and rename the new table to the old name. Example
  3. Wire these DDL files (from step 2) into MysqlUpdater, SqliteUpdater, PostgresUpdater

Examples

[edit]

Automatically generated

[edit]
MediaWiki version:
1.35

You can find the abstract schema for all of MediaWiki core's tables in "maintenance/tables.json". This abstraction is using Doctrine DBAL library to generate DDL files. On your local working MediaWiki install, with the relevant extension installed, you should:

  1. Change the tables.json structure.
  2. Run maintenance script to generate the three DDL files:
    php maintenance/run.php generateSchemaSql.php --json maintenance/tables.json --sql maintenance/ --type=all
    
  3. Create an abstract schema change .json file (see below) and put it in maintenance/abstractSchemaChanges/ directory
  4. Build the schema patches using the maintenance script, for example:
    php maintenance/run.php generateSchemaChangeSql.php --json maintenance/abstractSchemaChanges/patch-logging-rename-indexes.json --sql maintenance/archives/ --type=all
    
  5. Add them to MysqlUpdater, SqliteUpdater, PostgresUpdater
  6. Do not forget to checkout your changes and automatically generated DDL files in git when making the patch.

Example patches

[edit]

Example abstract schema

[edit]
[
	{
		"name": "actor",
		"comment": "The \"actor\" table associates user names or IP addresses with integers for the benefit of other tables that need to refer to either logged-in or logged-out users. If something can only ever be done by logged-in users, it can refer to the user table directly.",
		"columns": [
			{
				"name": "actor_id",
				"comment": "Unique ID to identify each actor",
				"type": "bigint",
				"options": { "unsigned": true, "notnull": true, "autoincrement": true }
			},
			{
				"name": "actor_user",
				"comment": "Key to user.user_id, or NULL for anonymous edits",
				"type": "integer",
				"options": { "unsigned": true, "notnull": false }
			},
			{
				"name": "actor_name",
				"comment": "Text username or IP address",
				"type": "binary",
				"options": { "length": 255, "notnull": true }
			}
		],
		"indexes": [
			{ "name": "actor_user", "columns": [ "actor_user" ], "unique": true },
			{ "name": "actor_name", "columns": [ "actor_name" ], "unique": true }
		],
		"pk": [ "actor_id" ]
	}
]

Notes

[edit]
Common Doctrine DBAL types and their equivalent
Doctrine DBAL/Abstract schema MySQL SQLite PostgreSQL Version
bigint BIGINT BIGINT BIGSERIAL (if autoincrement)/BIGINT
binary VARBINARY BLOB TEXT
blob TINYBLOB/BLOB/MEDIUMBLOB/LONGBLOB (based on size) BLOB TEXT
datetimetz DATETIME DATETIME TIMESTAMPTZ (1.36+)
TIMESTAMP(0) WITH TIME ZONE (1.35)
float FLOAT/DOUBLE PRECISION (if "doublePrecision":true) (1.36+)
DOUBLE PRECISION (1.35)
DOUBLE PRECISION FLOAT/DOUBLE PRECISION (if "doublePrecision":true) (1.36+)
DOUBLE PRECISION (1.35)
integer INT INTEGER INT
smallint SMALLINT SMALLINT SMALLINT
string VARCHAR VARCHAR VARCHAR
text TINYTEXT/TEXT/MEDIUMTEXT/LONGTEXT (based on size) CLOB TEXT
mwenum ENUM TEXT TEXT
1.36
mwtimestamp BINARY(14)/VARBINARY(14) (if "allowInfinite":true) BLOB TIMESTAMPTZ
1.36
mwtinyint TINYINT SMALLINT SMALLINT
1.36


Abstract schema change

[edit]

For making a schema change, you will make a json file with snapshot of before and after abstract schemas for the table (one schema change per table please). Then you will run a maintenance script in a similar manner and it will diff between two tables and then automatically generates the schema change DDL files.

Example abstract schema change

[edit]
{
	"before": {
		"name": "actor",
		"columns": [
			{
				"name": "actor_id",
				"type": "bigint",
				"options": { "unsigned": true, "notnull": true, "autoincrement": true }
			},
			{
				"name": "actor_user",
				"type": "integer",
				"options": { "unsigned": true, "notnull": false }
			},
			{
				"name": "actor_name",
				"type": "binary",
				"options": { "length": 255, "notnull": true }
			}
		],
		"indexes": [
			{ "name": "actor_user", "columns": [ "actor_user" ], "unique": true },
			{ "name": "actor_name", "columns": [ "actor_name" ], "unique": true }
		],
		"pk": [ "actor_id" ]
	},
	"after": {
		"name": "actor",
		"columns": [
			{
				"name": "actor_id",
				"type": "bigint",
				"options": { "unsigned": true, "notnull": true, "autoincrement": true }
			},
			{
				"name": "actor_user",
				"type": "bigint",
				"options": { "unsigned": true, "notnull": false }
			},
			{
				"name": "actor_name",
				"type": "binary",
				"options": { "length": 255, "notnull": true }
			}
		],
		"indexes": [
			{ "name": "actor_user", "columns": [ "actor_user" ], "unique": true },
			{ "name": "actor_name", "columns": [ "actor_name" ], "unique": true }
		],
		"pk": [ "actor_id" ]
	}
}

The two tables are the same but type of "actor_user" has changed from "integer" to "bigint". The reason for diffing instead of abstracting the change itself is that SQLite does not have ALTER TABLE for most cases, meaning DBAL needs to know the schema to build a schema change DDL file using temporary tables.

Best practices in choosing the data type

[edit]
  • For timestamps, use mwtimestamp datatype (phab:T42626).
  • Instead of VARCHAR or CHAR, use VARBINARY or BINARY (otherwise you have to deal with encodings in databases)
  • Unless you actually need the negative value, use UNSIGNED for any type of integer value to double its capacity.
  • Using ENUM is highly discouraged (phab:T119173).