MySQL – PostgreSQL reactions

There have been some reactions about my MySQL – PostgreSQL article that I wrote recently.

I received a kind email from Magnus Hagandar, who is one of PostgreSQL’s developers (the PostgreSQL website states: “Has done major work on the Win32 port and wrote much of the Win32 Installer.”). He answered some of my unclear questions about Postgresql (and allowed me to post it here in my blog, so here it is):

Hi!

Stumbled across
http://db4free.blogspot.com/2005/12/mysql-postgresql.html, thought I
should answer two questions for you. You get it in mail since you rblog
doesn’t permit anon postings and I don’t have an account πŸ™‚

> When a transaction is started and you write a SQL statement that has a
> syntax error in it, PostgreSQL rolls back the transaction immediately,
> while MySQL shows an error message

Check the ON_ERROR_ROLLBACK parameter
(http://www.postgresql.org/docs/8.1/static/app-psql.html). This is for
psql – if you want it in your own custom app, just use a SAVEPOINT.
(this parameter will make psql use SAVEPOINTs automatically)

> MySQL has an ‘enum’ data type. I haven’t found a similar data type in
> PostgreSQL, however, ‘enum’s can be emulated with check constraints.

If you have many possible values in your enum, you really should go with
a separate table and a foreign key. That keeps it inline with the
relational model as well πŸ™‚

> I haven’t yet figured out where PostgreSQL stores this data and I
> think it’s only possible to use the SQL commands to set the permissions
> accordingly.

They’re stored in a system table, normally pg_class (but for a schema,
in pg_namespace etc). See
http://www.postgresql.org/docs/8.1/static/catalogs.html, and the
subpages.
You’re never supposed to hack these using anything other than
GRANT/REVOKE, but you can.

Hope it helps some πŸ™‚

//
Magnus

I also received a mail from Carsten Pederson (he’s the Certification Manager at MySQL AB) and corrected one of my statements. Extended inserts (those which allow you to insert more than one row at a time) are not an extension to MySQL, but SQL standard. It’s one of the few SQL standard features that PostgreSQL doesn’t support at the moment.

There are three more differences that I forgot in my last article, so here they are:

  • With MySQL it’s possible to insert new columns somewhere in the middle of an existing table, while PostgreSQL adds new columns always after all columns.
  • MySQL has a REPLACE command that doesn’t exist in PostgreSQL.
  • PostgreSQL has an additional “layer” called schema (so it’s database – schema – tables, unlike in MySQL it is database – tables).

Of course there exist a lot more, to count them all up would probably be enough to fill a book. But it’s interesting to see how two different products have very much in common, but still a lot of differences and it’s real fun to me to find out new things on both MySQL and PostgreSQL. The more I work with PostgreSQL, the more I like it, but the nice thing about it is that it doesn’t affect the way I like MySQL :-).

Breaking referential integrity

Is it possible to break the referential integrity when a foreign key constraint is set on an InnoDB table? The answer is an (unfortunate?) yes.

A little example demonstrates this:

mysql> CREATE TABLE t1 (
    ->   id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
    ->   ) ENGINE = InnoDB;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO t1 (id) VALUES
    ->   (1), (2), (3);
Query OK, 3 rows affected (0.02 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> CREATE TABLE t2 (
    ->   id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ->   id_t1 INT NOT NULL,
    ->   FOREIGN KEY (id_t1) REFERENCES t1(id))
    ->   ENGINE = InnoDB;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO t2 (id_t1) VALUES
    ->   (3), (2), (2), (3), (1);
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM t1; SELECT * FROM t2;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+
3 rows in set (0.00 sec)

+----+-------+
| id | id_t1 |
+----+-------+
|  5 |     1 |
|  2 |     2 |
|  3 |     2 |
|  1 |     3 |
|  4 |     3 |
+----+-------+
5 rows in set (0.00 sec)

mysql> INSERT INTO t2 (id_t1) VALUES (4);
ERROR 1452 (23000): Cannot add or update a 
child row: a foreign key constraint fails 
(`test/t2`, CONSTRAINT `t2_ibfk_1` FOREIGN KEY 
(`id_t1`) REFERENCES `t1` (`id`))

mysql> SET @@foreign_key_checks = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO t2 (id_t1) VALUES
    ->   (2), (1), (4), (2);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> SET @@foreign_key_checks = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM t1; SELECT * FROM t2;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+
3 rows in set (0.00 sec)

+----+-------+
| id | id_t1 |
+----+-------+
|  5 |     1 |
|  7 |     1 |
|  2 |     2 |
|  3 |     2 |
|  6 |     2 |
|  9 |     2 |
|  1 |     3 |
|  4 |     3 |
|  8 |     4 |
+----+-------+
9 rows in set (0.00 sec)

The record no 8 in table t2 has now an invalid value, due to the fact that we have set the foreign key verification temporarily to off (0).

It’s not a bad feature that it’s possible to temporarily turn @@foreign_key_checks off, because sometimes it’s necessary to change data in a way that referential integrity needs to be broken during the modification process. But most of the time, the data should be valid, when modification is finished. So what can be done?

Unfortunately, the only way is to check manually, if there are any invalid values:

mysql> SELECT t2.id
    -> FROM t2 LEFT JOIN t1 ON t2.id_t1 = t1.id
    -> WHERE t1.id IS NULL;
+----+
| id |
+----+
|  8 |
+----+
1 row in set (0.00 sec)

Could the command “SET @@foreign_key_checks = 1” verify automatically, whether all values are valid? Probably not, because all data from all tables with foreign key constraints would have to be verified for invalid data, because the command doesn’t know, which tables have been modified in the meantime. How else can this problem be solved?

PostgreSQL knows a clause that can be added to the FOREIGN KEY definition – it’s called INITIALLY DEFERRED. This clause allows foreign keys to be broken within a transaction, but they must be valid again, when the transaction commits. A nice feature, I think.

Another solution could be a table based command that includes this kind of verification. Maybe the CHECK TABLE [tablename] would be an appropriate command for checking foreign key integrity on InnoDB tables, too. The option EXTENDED can be appended to the command, but also this doesn’t work for broken integrity:

mysql> CHECK TABLE t2 EXTENDED;
+---------+-------+----------+----------+
| Table   | Op    | Msg_type | Msg_text |
+---------+-------+----------+----------+
| test.t2 | check | status   | OK       |
+---------+-------+----------+----------+
1 row in set (0.00 sec)

Maybe that’s something to be considered to add foreign key integrity checks to this (or maybe another) command?!

The first week on my new job

This Monday, I started my new job at a web & software developing company.

Most of the time, the first week on a new job is very tough – everything is unfamilar, you don’t know everybody and everything seems strange. I know that from the time when I started another job in 1996 in a Controlling departement (something very different, business stuff). There was a lot of frustration in those days, but not this time. I could describe my feelings best to simply call it “excited”.

However, it’s still tough, but not because of the job itself, but for the fact that the company resides about 60 kilometers from my hometown (that are 120 km/day, 600 km/week). Where I live, weather is often very bad in winter, roads are slippery, much wind, much snow and most of the time I have to drive in the dark (just the last few minutes before I arrive in the morning it’s lightening up). It’s sometimes quite dangerous and requires careful driving, but I hope that these problems get solved – partly by nature (it will soon light up earlier and get dark later and winter will hopefully end some day), partly maybe by an accommodation near the company.

The job is great, because it deals mostly with things that I’m familiar with, but still offers great opportunities to learn new things. On Thursday I received my first productive project – a web application written in PHP using Smarty (nothing new to me, since db4free.net is built up the same way). But one thing is a “first time” for me – it’s my first project with PostgreSQL instead of MySQL, as PostgreSQL is the primary RDBMS used in this company and MySQL only the secondary (but I hope that I can still do some work on MySQL, too ;-)).

Of course I would have liked to mainly work with MySQL as this gave me a chance to bring a lot of excellent skills into my job (although most of my MySQL skills can also be used for PostgreSQL), but every medal with a bad side does also have one with a good side – I will have to also learn PostgreSQL very well and will get a wider perspective of RDBMS in general. More about that in (a) separate article(s).

I hope that I will get the chance to use and learn other techniques and programming languages as well. Primarily I hope that I will be able to use Java and I would very much like it if I get the chance to learn C/C++ to achieve a professional level as well, as my current skills of C/C++ are very basic. But I’m confident that they won’t disallow me to use things that I’d like to use, so this will probably be a great chance to become a professional developer for many languages and techniques.

You will certainly read more about that!

MySQL – PostgreSQL

As I just told, my new job requires me to do a lot of work with PostgreSQL, which is relatively new to me, since I have done almost all of my database work with MySQL so far.

I also told that this will give me a wider perspective on RDBMS in general and I would like to use this opportunity to discuss about how features are implemented in MySQL and how in PostgreSQL.

Don’t worry, this will not end up in a MySQL vs. PostgreSQL fight. As I know MySQL much better than PostgreSQL, I do understand that this position doesn’t allow me to make representative judgements on which database system is better (most people would naturally prefer the one they know better). I also believe that both of them are great RDBMS and none of them deserves to be involved in a “A rulez, B suckz” struggle (I love MySQL and I try to fall in love with PostgreSQL, too). I would rather like to show, how certain features work with MySQL and how with PostgreSQL and how the differences affect me personally – which could mean that I miss a feature in one or the other database system, but that does never mean that this product sucks.

I would like to start with listing up some differences that I have discovered so far. If you find something that’s not true as a fact, please don’t hesitate to contact me (m.popp@db4free.net) or comment this article.

  • MySQL uses auto_increment to create numeric values that iterate for every inserted record, PostgreSQL uses sequences. It’s possible to use the serial data type to automate this in PostgreSQL so I think there are no big advantages or disadvantages for each particular implementation.
  • When a transaction is started and you write a SQL statement that has a syntax error in it, PostgreSQL rolls back the transaction immediately, while MySQL shows an error message, but still continues the transaction (it’s up to the user to do error handling and roll back the transaction, if desired). I think, both ways have their own advantages and disadvantages. It would be great if both database systems could create a setting to let the user decide how to handle syntax errors in transactions.
  • PostgreSQL allows check constraints. I have heard that check constraints will be implemented in MySQL 5.1 (can somebody confirm this?).
  • MySQL has an ‘enum’ data type. I haven’t found a similar data type in PostgreSQL, however, ‘enum’s can be emulated with check constraints.
  • Other than PostgreSQL, MySQL allows to insert more than one row in one INSERT statement. MySQL’s implementation is a very useful extension to SQL standard.
  • There are several storage engines in MySQL with several features. Starting with MySQL 5.1, storage engines will be pluggable. PostgreSQL doesn’t allow to choose between storage engines, there’s only one way to store the data, however, all the features of PostgreSQL are available within this data storage method. This is an advantage and disadvantage as well. You can tailor the requirements of a particular table better with MySQL, on the other hand, if you need a feature from two storage engines in one table (for example foreign keys and full text indexing), you have a problem. As MySQL implements more and more features into the available storage engines, this will be less of a problem in the future.
  • MySQL allows to set @@foreign_key_checks to 0 to disable foreign key checking. It’s sometimes necessary to change the structures of related tables and temporarily disable these checks. One thing that I miss in MySQL is that enabling @@foreign_key_checks again doesn’t verify referential integrity (please correct me, if I’m wrong or if that has changed). In PostgreSQL, you can add “INITIALLY DEFERRED” to the foreign key definition, so this allows that referential integrity may be violated inside a transaction, but the data must be correct when the transaction is being committed (I will possibly write more about this later).
  • PostgreSQL is (or has been) more restrictive checking the data that’s inserted. MySQL has introduced SQL mode settings to enable restrictive error checking. However, in older versions of MySQL (or if SQL mode is not set), MySQL allows obviously invalid data to be inserted.
  • PostgreSQL allows to write stored procedures, functions and triggers in different languages, however, these languages are not installed by default. MySQL offers only one language, but more languages will probably be available in later versions.
  • MySQL stores user permission data inside a MySQL database (called mysql). This makes it very simple for a system administrator to edit permissions or look up how they are set (besides the SQL commands GRANT, REVOKE, SHOW GRANTS etc.). I haven’t yet figured out where PostgreSQL stores this data and I think it’s only possible to use the SQL commands to set the permissions accordingly.
  • MySQL and PostgreSQL do of course have different client programs and it’s up to somebody’s preferences which ones to like more. But I found out that if you are logged in as a “normal” user (without administrator privileges and access to all databases) in PostgreSQL, you can always see all databases, including those to where no access is granted. In MySQL, a particular user can only see the databases that he has access to. Maybe there’s something in PostgreSQL that I don’t know yet to also make only those databases visible that I user has permissions to access (please tell me, if you know a solution).

This list can and will be continued and if you know some more important differences that I haven’t listed, or if you find errors or things that should be added, I would appreciate your comments.

Several Open Source database system producers (including MySQL and PostgreSQL, beside some others) have agreed to found the Open Source Database Consortium (a website is planned at http://www.osdbconsortium.org/). Find detailed information on what it’s all about in Kaj ArnΓΆ’s article about the foundation of the Consortium. I hope that this project makes it easier for people who work with more than one Open Source database product. It’s a great thing to keep track of as soon as the site becomes available and if there is a chance, I would be happy to contribute to this fascinating project in one way or another.

TX isolation levels (3) REPEATABLE READ, SERIALIZABLE

The third part of this series shows how transactions work when the REPEATABLE READ and SERIALIZABLE transaction levels are used. I show these two together, because they work very similar and can easily be shown in one example.

REPEATABLE READ is the default transaction level in MySQL and the one that’s most commonly used. We start with the same data as in the last two examples – a simple InnoDB table called tt with one integer field called d, which holds three rows with the values 1, 2 and 3. Again, we need two MySQL monitor client windows to perform following steps:

CLIENT 1
========

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from tt;
+------+
| d    |
+------+
|    1 |
|    2 |
|    3 |
+------+
3 rows in set (0.00 sec)


CLIENT 2
========

mysql> update tt set d=4 where d=2;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0


CLIENT 1
========

mysql> select * from tt;
+------+
| d    |
+------+
|    1 |
|    2 |
|    3 |
+------+
3 rows in set (0.00 sec)

After client 2 has modified a row, client 1, who’s in transaction mode, still sees the unchanged data. This makes it possible for client 1 to always receive consistent (the C of ACID) data, while performing a transaction.

This example showed how REPEATABLE READ works. So what’s the difference to SERIALIZABLE? Let’s do a rollback on client 1 and change the transaction isolation level to SERIALIZABLE and let client 2 try to change another row:

CLIENT 1
========

mysql> rollback;
Query OK, 0 rows affected (0.00 sec)

mysql> set session transaction isolation level serializable;
Query OK, 0 rows affected (0.00 sec)

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from tt;
+------+
| d    |
+------+
|    1 |
|    4 |
|    3 |
+------+
3 rows in set (0.02 sec)

CLIENT 2
========

mysql> update tt set d=2 where d=4;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

This time, client 2 cannot update the row. After a few seconds of waiting, the attempt to update a row ends with the error message “Lock wait timeout exceeded”.

To sum this up: both REPEATABLE READ and SERIALIZABLE don’t allow dirty reads, nonrepeatable reads and phantom reads to happen. While REPEATABLE READ still allows another client to modify data, while he performs a transaction, SERIALIZABLE strictly disallows it. Because of this, REPEATABLE READ is in most cases the best compromise between fulfilling the ACID (atomicy, consistency, isolation, durability) principles and still giving the best possible performence.

TX isolation levels (4) How to produce an artificial deadlock

The last article of this series shows, how to produce an artificial deadlock. We have to make a little modification to the table that we’ve used for the examples before – we need to add a primary key. Without primary key, the whole table will be locked, which makes it impossible to produce a deadlock (that’s also, why deadlocks can’t occur on MyISAM tables, because MyISAM uses table locking instead of row locking).

mysql> alter table tt add primary key(d);
Query OK, 3 rows affected (0.05 sec)
Records: 3  Duplicates: 0  Warnings: 0

Starting from a table tt with three rows 1, 2 and 3, do following steps:

CLIENT 1
========

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> update tt set d=4 where d=2;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

CLIENT 2
========

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> update tt set d=5 where d=1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

CLIENT 1
========

mysql> update tt set d=6 where d=1;
Query OK, 1 row affected (10.28 sec)
Rows matched: 1  Changed: 1  Warnings: 0

CLIENT 2
========

mysql> update tt set d=7 where d=2;
ERROR 1213 (40001): Deadlock found when trying to get lock; 
try restarting transaction

After client 2 tries to get a lock on a row that’s already locked from client 1, a deadlock occurs, because client 1 is already waiting to get a lock on a row that has been changed by client 2.