Category Archives: Archives

Blog posts from db4free.blogspot.com

Book by Karl Fogel – Producing Open Source Software

Giuseppe Maxia recently told me about a book that he could recommend – it’s called “Producing Open Source Software – How to run a successful free software project” by Karl Fogel (ISBN 0-596-00759-0).

I ordered it at Amazon.de (and was astonished that they could ship an English book within 24 hours – mostly I have to order English books from the US Amazon, where books are also shipped very quickly but it can easily take a month until the book is finally delivered) and started reading. I have only finished the first chapter so I can’t tell much about the details. But it still looks interesting: it tells about the philosophy of Open Source Software and why most Open Source projects fail (assumedly 90 – 95 %). The first chapter forecasts that it will show the pitfalls and what should be done to minimize the risks that an Open Source project will fail.

It sure isn’t a bad book for everyone who’s interested in Open Software, how it works and what the philosophy behind it is – and most likely, most people who like MySQL (or other Open Source Software) will also be interested about the topics of this book.

If you’re interested, check out the O’Reilly website at http://www.oreilly.com/catalog/producingoss/index.html.

How to deal with repeating date intervals?

I found an interesting post on a newsgroup which made me think about a nice solution for this problem:

You have a date/interval pair stored in a table, let’s say, the 29th January and an interval of 14 days and you want to check if a date is followed by that given interval. So this should be true for 29th January (I assume that the given date should be included), 13th February, 27th February, 13th March etc.

I wanted to find a solution which can not only deal with one pair, but with many pairs. You should be able to specify a date and verify which pairs are true for the specified date (if you’re a bit confused by my explaination – my example will make it easier to understand what I mean).

I have set up an example table with several date/interval pairs:

mysql> SELECT id, dt, intvl
    -> FROM date_repetitions;
+----+------------+-------+
| id | dt         | intvl |
+----+------------+-------+
|  1 | 2005-09-26 |    16 |
|  2 | 2005-09-27 |    12 |
|  3 | 2005-09-30 |    18 |
|  4 | 2005-10-02 |    18 |
|  5 | 2005-10-03 |    16 |
|  6 | 2005-10-08 |    17 |
|  7 | 2005-10-12 |    16 |
|  8 | 2005-10-14 |    20 |
|  9 | 2005-10-20 |    20 |
| 10 | 2005-10-24 |    15 |
| 11 | 2005-10-25 |     8 |
| 12 | 2005-10-26 |    15 |
| 13 | 2005-10-28 |    17 |
| 14 | 2005-11-04 |    17 |
| 15 | 2005-11-07 |    12 |
| 16 | 2005-11-10 |    16 |
| 17 | 2005-11-11 |    21 |
| 18 | 2005-11-15 |    21 |
| 19 | 2005-11-19 |    10 |
| 20 | 2005-11-21 |    18 |
| 21 | 2005-11-22 |    14 |
| 22 | 2005-11-25 |    20 |
| 23 | 2005-11-29 |    11 |
| 24 | 2005-11-30 |    13 |
| 25 | 2005-12-03 |    11 |
| 26 | 2005-12-04 |     9 |
| 27 | 2005-12-06 |    20 |
| 28 | 2005-12-08 |     8 |
| 29 | 2005-12-12 |    15 |
| 30 | 2005-12-14 |    13 |
| 31 | 2005-12-15 |    16 |
| 32 | 2005-12-17 |    19 |
| 33 | 2005-12-22 |    12 |
| 34 | 2005-12-24 |     9 |
| 35 | 2005-12-27 |    13 |
| 36 | 2005-12-28 |    19 |
| 37 | 2005-12-29 |     8 |
| 38 | 2006-01-01 |    15 |
| 39 | 2006-01-02 |    15 |
| 40 | 2006-01-03 |    10 |
| 41 | 2006-01-07 |    10 |
| 42 | 2006-01-08 |    14 |
| 43 | 2006-01-14 |    20 |
| 44 | 2006-01-15 |    20 |
| 45 | 2006-01-16 |    17 |
| 46 | 2006-01-23 |    20 |
| 47 | 2006-01-24 |    12 |
| 48 | 2006-01-25 |    21 |
| 49 | 2006-01-26 |    18 |
+----+------------+-------+
49 rows in set (0.00 sec)

All we need is a *very* little Stored Procedure:

DELIMITER //

DROP PROCEDURE IF EXISTS getRepeatedDates //
CREATE PROCEDURE getRepeatedDates (pDate DATE)
BEGIN
  SELECT id, dt, intvl FROM date_repetitions
    WHERE (to_days(pDate) - to_days(dt)) % intvl = 0
    AND to_days(pDate) >= to_days(dt);
END //

DELIMITER ;

So what does it do? It calculates an internal value of the dates (with to_days()) that starts counting at year 0 (although year zero actually never existed, because 1 B.C was followed by year 1 A.D. – that’s also, why the 3rd millenium was told to begin in 2001 and not in 2000, just as a little sidenote ;-)). So the 29th January 2006 would e.g. be transfered to 732705.

We subtract the to_days() value from the date in the database from the to_days() value from the date that’s to be checked. Than we devide the result by the interval value and get the remainder out of it. If the remainder is zero, the date meets the criteria.

The second condition in the WHERE clause just checks, if the date to be checked is greater than or equal to the date stored in the database.

So lets see, which records are OK for the 29th January 2006:

mysql> call getRepeatedDates('2006-01-29');
+----+------------+-------+
| id | dt         | intvl |
+----+------------+-------+
| 11 | 2005-10-25 |     8 |
| 16 | 2005-11-10 |    16 |
| 34 | 2005-12-24 |     9 |
+----+------------+-------+
3 rows in set (0.01 sec)

Query OK, 0 rows affected (0.01 sec)

Emulating check constraints with views?

After trying to emulate check constraints with a trigger yesterday, I have now tried out if it works to enforce data integrity with views, as Arjen Lentz described it in this article: http://arjen-lentz.livejournal.com/49881.html.

I created a little table and a view with a WHERE clause that defined the conditions – and I added the WITH CHECK OPTION clause:

mysql> CREATE TABLE checktest (
    ->   id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ->   val INT UNSIGNED NOT NULL);
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE VIEW v_checktest AS
    ->   SELECT id, val FROM checktest
    ->   WHERE val >= 1 AND val    WITH CHECK OPTION;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO v_checktest (val) VALUES (2);
Query OK, 1 row affected (0.02 sec)

mysql> INSERT INTO v_checktest (val) VALUES (4);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO v_checktest (val) VALUES (6);
ERROR 1369 (HY000): CHECK OPTION failed 'test.v_checktest'
mysql> SELECT * FROM v_checktest;
+----+-----+
| id | val |
+----+-----+
|  1 |   2 |
|  2 |   4 |
+----+-----+
2 rows in set (0.00 sec)

mysql> SELECT * FROM checktest;
+----+-----+
| id | val |
+----+-----+
|  1 |   2 |
|  2 |   4 |
+----+-----+
2 rows in set (0.00 sec)

So far, it works fine, but what happens, if I try to update the second row to value 6 (using the view, of course):

mysql> UPDATE v_checktest SET val=6 WHERE id=2;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> SELECT * FROM v_checktest;
+----+-----+
| id | val |
+----+-----+
|  1 |   2 |
+----+-----+
1 row in set (0.00 sec)

mysql> SELECT * FROM checktest;
+----+-----+
| id | val |
+----+-----+
|  1 |   2 |
|  2 |   6 |
+----+-----+
2 rows in set (0.00 sec)

That was unfortunately not, what I wanted. My question: is this a bug that should be reported?

I found out another interesting thing when I entered SHOW VIEW v_checktest (I formatted the output a bit to make it more readable):

mysql> SHOW CREATE VIEW v_checktest\G
*************************** 1. row ***************************
       View: v_checktest
Create View: CREATE ALGORITHM=UNDEFINED 
  DEFINER=`mpopp`@`localhost` 
  SQL SECURITY DEFINER VIEW `v_checktest` AS 
select `checktest`.`id` AS `id`,
  `checktest`.`val` AS `val` 
  from `checktest` 
  where ((`checktest`.`val` >= 1) 
    and (`checktest`.`val` <= 5)) 
  WITH CASCADED CHECK OPTION
1 row in set (0.00 sec)

Even though I didn’t specify CASCADED in the WITH CHECK OPTION clause, it’s now here in the view definition. Another bug?

By the way – what can you do, if you want to get the original view definition? I don’t think, there’s a way inside MySQL, but if you have access to the file sytem where MySQL runs at, you can open the .frm file of the view. For example, my MySQL installation resides at D:\mysql, the view is in the database ‘test’ and the view is called ‘v_checktest’. So the file is D:\mysql\data\test\v_checktest.frm.

Here’s the content of this file:

TYPE=VIEW
query=select `test`.`checktest`.`id` AS `id`,
  `test`.`checktest`.`val` AS `val` 
from `test`.`checktest` 
where ((`test`.`checktest`.`val` >= 1) 
  and (`test`.`checktest`.`val` = 1 AND val <= 5\n  
  WITH CHECK OPTION

Here at the end, you find the original definition, like I defined the view.

I hope that the original view definition will soon be available through MySQL clients, too. There is already a feature request for it and there seem to be a lot of people (me included) who really wait for it.

Creating log tables with triggers

Frank Mash did some testing with triggers and inspired me to try to create a log table based on triggers. This also answers his question “After playing earlier with triggers, I wanted to see whether one can update another table using MySQL triggers. I tried something like … but it didn’t work. Is it possible or am I shooting in the dark here?” with a clear yes!

First I created a data and a log table:

mysql> CREATE TABLE data_table (
    -> id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    -> d1 VARCHAR(200) NOT NULL UNIQUE,
    -> d2 DECIMAL(9,2) NOT NULL) ENGINE=InnoDB;
Query OK, 0 rows affected (0.02 sec)

mysql> CREATE TABLE log_table (
    -> id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    -> id_data INT UNSIGNED NOT NULL,
    -> old_d1 VARCHAR(200) NULL,
    -> new_d1 VARCHAR(200) NULL,
    -> old_d2 DECIMAL(9,2) NULL,
    -> new_d2 DECIMAL(9,2) NULL,
    -> kind_of_change ENUM('insert', 'update', 'delete') NOT NULL,
    -> ts TIMESTAMP NOT NULL) ENGINE=InnoDB;
Query OK, 0 rows affected (0.02 sec)

Then I needed three triggers to log each kind of modification: insert, update and delete. I used after triggers to make sure that there’s only a log entry, if the modification really worked:

mysql> DELIMITER //

mysql> DROP TRIGGER test.data_table_ai//
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TRIGGER test.data_table_ai
    ->   AFTER INSERT ON test.data_table
    ->   FOR EACH ROW
    -> BEGIN
    ->
    ->   INSERT INTO log_table
    ->     (id_data, old_d1, new_d1,
    ->      old_d2, new_d2,
    ->      kind_of_change, ts)
    ->     VALUES
    ->     (new.id, NULL, new.d1,
    ->      NULL, new.d2, 'insert', now());
    ->
    -> END //
Query OK, 0 rows affected (0.02 sec)


mysql> DROP TRIGGER test.data_table_au//
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TRIGGER test.data_table_au
    ->   AFTER UPDATE ON test.data_table
    ->   FOR EACH ROW
    -> BEGIN
    ->
    ->   INSERT INTO log_table
    ->     (id_data, old_d1, new_d1,
    ->      old_d2, new_d2,
    ->      kind_of_change, ts)
    ->     VALUES
    ->     (new.id, old.d1, new.d1,
    ->      old.d2, new.d2, 'update', now());
    ->
    -> END //
Query OK, 0 rows affected (0.00 sec)


mysql> DROP TRIGGER test.data_table_ad//
Query OK, 0 rows affected (0.02 sec)

mysql> CREATE TRIGGER test.data_table_ad
    ->   AFTER DELETE ON test.data_table
    ->   FOR EACH ROW
    -> BEGIN
    ->
    ->   INSERT INTO log_table
    ->     (id_data, old_d1, new_d1,
    ->      old_d2, new_d2,
    ->      kind_of_change, ts)
    ->     VALUES
    ->     (old.id, old.d1, NULL,
    ->      old.d2, NULL, 'delete', now());
    ->
    -> END //
Query OK, 0 rows affected (0.02 sec)

mysql> DELIMITER ;

Finally, I simply inserted a few records and updated and deleted some of them:

mysql> INSERT INTO data_table (d1, d2) VALUES ('Markus', 534.12);
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO data_table (d1, d2) VALUES ('Frank', 210.33);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO data_table (d1, d2) VALUES ('Charles', 1047.11);
Query OK, 1 row affected (0.00 sec)

mysql> UPDATE data_table SET d2=492.11 WHERE d1='Markus';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> DELETE FROM data_table WHERE d1='Charles';
Query OK, 1 row affected (0.00 sec)

So here’s what the tables look like:

mysql> SELECT * FROM data_table;
+----+--------+--------+
| id | d1     | d2     |
+----+--------+--------+
|  1 | Markus | 492.11 |
|  2 | Frank  | 210.33 |
+----+--------+--------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM log_table\G
*************************** 1. row ***************************
            id: 1
       id_data: 1
        old_d1: NULL
        new_d1: Markus
        old_d2: NULL
        new_d2: 534.12
kind_of_change: insert
            ts: 2006-01-25 15:47:43
*************************** 2. row ***************************
            id: 2
       id_data: 2
        old_d1: NULL
        new_d1: Frank
        old_d2: NULL
        new_d2: 210.33
kind_of_change: insert
            ts: 2006-01-25 15:48:10
*************************** 3. row ***************************
            id: 3
       id_data: 3
        old_d1: NULL
        new_d1: Charles
        old_d2: NULL
        new_d2: 1047.11
kind_of_change: insert
            ts: 2006-01-25 15:48:27
*************************** 4. row ***************************
            id: 4
       id_data: 1
        old_d1: Markus
        new_d1: Markus
        old_d2: 534.12
        new_d2: 492.11
kind_of_change: update
            ts: 2006-01-25 15:48:49
*************************** 5. row ***************************
            id: 5
       id_data: 3
        old_d1: Charles
        new_d1: NULL
        old_d2: 1047.11
        new_d2: NULL
kind_of_change: delete
            ts: 2006-01-25 15:49:06
5 rows in set (0.00 sec)

Very useful, isn’t it ;)?

Emulating check constraints

Check constraints are one of the features that MySQL’s still missing (unfortunately), so I tried if there’s a workaround using triggers.

My idea was quite simple. I created one AFTER INSERT trigger and one AFTER UPDATE trigger. They should verify, if they meet the conditions and if they do not, the AFTER INSERT should delete the just inserted record again and the AFTER UPDATE trigger should update the values back to their previous values.

So I created this table:

mysql> CREATE TABLE check_test (
    -> id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    -> name VARCHAR(200) NOT NULL,
    -> email VARCHAR(200) NOT NULL,
    -> score INT NOT NULL) ENGINE=InnoDB;
Query OK, 0 rows affected (0.00 sec)

The email address should be verified with the regular expression ‘^[a-z0-9_\.-]+@[a-z0-9_-]+\.[a-z0-9_\.-]+$’ and the score should be between 0 and 100.

So I implemented the following two triggers:

mysql> DELIMITER //
mysql>
mysql> DROP TRIGGER check_test_ai//
Query OK, 0 rows affected (0.02 sec)

mysql>
mysql> CREATE TRIGGER check_test_ai
    ->   AFTER INSERT ON check_test
    ->   FOR EACH ROW
    -> BEGIN
    ->
    ->   IF new.email NOT REGEXP 
    ->     '^[a-z0-9_\.-]+@[a-z0-9_-]+\.[a-z0-9_\.-]+$'
    ->     OR new.score  100 THEN
    ->
    ->     DELETE FROM check_test WHERE id = new.id;
    ->   END IF;
    ->
    -> END//
Query OK, 0 rows affected (0.02 sec)

mysql> DROP TRIGGER check_test_au//
Query OK, 0 rows affected (0.00 sec)

mysql>
mysql> CREATE TRIGGER check_test_au
    ->   AFTER UPDATE ON check_test
    ->   FOR EACH ROW
    -> BEGIN
    ->
    ->   IF new.email NOT REGEXP 
    ->     '^[a-z0-9_\.-]+@[a-z0-9_-]+\.[a-z0-9_\.-]+$'
    ->     OR new.score  100 THEN
    ->
    ->     UPDATE check_test SET id = old.id, name = old.name, 
    ->         email = old.email, score = old.email
    ->       WHERE id = new.id;
    ->   END IF;
    ->
    -> END//
Query OK, 0 rows affected (0.00 sec)

mysql>
mysql> DELIMITER ;

Let’s see how they work and let’s first insert a valid record:

mysql> INSERT INTO check_test
    ->   (name, email, score)
    ->   VALUES
    ->   ('Person1', 'person1@email.com', 88);
Query OK, 1 row affected (0.00 sec)

… and now two invalid records – one with an invalid email address and one with an invalid score:

mysql> INSERT INTO check_test
    ->   (name, email, score)
    ->   VALUES
    ->   ('Person2', 'aninvalidemailaddress', 45);
ERROR 1442 (HY000): Can't update table 'check_test' in stored 
function/trigger because it is already used by statement 
which invoked this stored function/trigger.
mysql> INSERT INTO check_test
    ->   (name, email, score)
    ->   VALUES
    ->   ('Person3', 'person3@email.com', 122);
ERROR 1442 (HY000): Can't update table 'check_test' in stored 
function/trigger because it is already used by statement 
which invoked this stored function/trigger.

Whooops – we get an error message. It says that the table can’t be updated, because the table is already in use. So it seems, a trigger can’t refer to its own table. But lets see what the content of the table is:

mysql> SELECT * FROM check_test;
+----+---------+-------------------+-------+
| id | name    | email             | score |
+----+---------+-------------------+-------+
|  1 | Person1 | person1@email.com |    88 |
+----+---------+-------------------+-------+
1 row in set (0.00 sec)

However – the result is what I wanted. The trigger refused to insert the records with invalid values.

So let’s try updates (one valid and one invalid):

mysql> UPDATE check_test
    ->   SET email = 'anotheremail@email.com'
    ->   WHERE id = 1;
Query OK, 1 row affected (0.02 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> UPDATE check_test
    ->   SET score = 200
    ->   WHERE id = 1;
ERROR 1442 (HY000): Can't update table 'check_test' in stored 
function/trigger because it is already used by statement 
which invoked this stored function/trigger.

Generally, it’s the same result. What’s the content of the table now?

mysql> SELECT * FROM check_test;
+----+---------+------------------------+-------+
| id | name    | email                  | score |
+----+---------+------------------------+-------+
|  1 | Person1 | anotheremail@email.com |    88 |
+----+---------+------------------------+-------+
1 row in set (0.00 sec)

Again, the result is what we intended to do. MySQL refused to change the record to an invalid value, but did the correct update.

So it’s of course not the elegant way, because I initially planned to properly delete the invalid records again or update them back to their original values. Instead, MySQL now throws an error and interrupts the INSERT or UPDATE, so the values still remain on their original state (or are refused from being updated).

However, the much more elegant way would be proper check constraints, but I’m confident that there’s light at the end of the tunnel.

If somebody has more elegant solutions to this issue – please let me and all MySQL interested people know :-).

mysqldump’s tz-utc trap

I used to make backup dumps from my webhosting MySQL databases (I’ve got remote access) via the mysqldump command. All commands that I need (including mysqlcheck to optimize and analyze all tables) are grouped together in batch files, so it works very conveniently. One more word to mysqlcheck – you should optimize and analyze your tables from time to time to defragment the tables and renew the index statistics. That makes sure that performence doesn’t suffer if many records are deleted and inserted.

However, after making new backup files today, I realized that the date/time data were one hour ahead of the original data. I was wondering how this could happen, analyzed the dump file and soon found the answer: the tz-utc option.

This option was by default set to true, so these lines were included in the dump file:

/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;

… and at the end:

/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;

That means, the time zone is set to UTC, but as my system time zone is one hour ahead (as we have CET in Austria), MySQL added one hour to each time relevant piece of data, when I imported the dump file. By the way, /*!40103 … */ means that this command is only being executed when you run it with MySQL version 4.1.3 or higher.

The solution is quite easy, though. Just add –tz-utc=false to your mysqldump command:

mysqldump --tz-utc=false [...further options] > dump_file.sql

Now, these three lines will be omitted and your time information will remain unchanged.

Index information from information_schema

The information_schema tables offer all kinds of useful metadata. Also all information about indexes are included in them, although they are split up among several tables.

I’ve built up a query that shows all the information about indexes, here it is:

SELECT a.table_schema, a.table_name, a.constraint_name, 
       a.constraint_type,
       convert(group_concat(DISTINCT b.column_name
       ORDER BY b.ordinal_position SEPARATOR ', '), char) 
       as column_list,
       b.referenced_table_name, b.referenced_column_name
FROM information_schema.table_constraints a
INNER JOIN information_schema.key_column_usage b
ON a.constraint_name = b.constraint_name AND
   a.table_schema = b.table_schema AND
   a.table_name = b.table_name
GROUP BY a.table_schema, a.table_name, a.constraint_name, 
         a.constraint_type, b.referenced_table_name, 
         b.referenced_column_name
UNION
SELECT table_schema, table_name, index_name as constraint_name,
       if(index_type='FULLTEXT', 'FULLTEXT', 'NON UNIQUE') 
       as constraint_type,
       convert(group_concat(column_name 
       ORDER BY seq_in_index separator ', '), char) as column_list,
       null as referenced_table_name, null as referenced_column_name
FROM information_schema.statistics
WHERE non_unique = 1
GROUP BY table_schema, table_name, constraint_name, constraint_type,
         referenced_table_name, referenced_column_name
ORDER BY table_schema, table_name, constraint_name

Actually, at the moment I need this piece of information rather for PostgreSQL than for MySQL (for automatically renaming the index names according to coding standards), but I would have hated to do this without knowing how it works in MySQL. My PostgreSQL equivalent still misses the information for non unique indexes – the rest already works great. So here’s the PostgreSQL equivalent up to what I already have:

SELECT a.table_catalog, a.table_schema, a.table_name, 
       a.constraint_name, a.constraint_type, 
       array_to_string(
         array(
           SELECT column_name::varchar
           FROM information_schema.key_column_usage
           WHERE constraint_name = a.constraint_name
           ORDER BY ordinal_position
           ),
         ', '
         ) as column_list,
       c.table_name, c.column_name
FROM information_schema.table_constraints a 
INNER JOIN information_schema.key_column_usage b
ON a.constraint_name = b.constraint_name
LEFT JOIN information_schema.constraint_column_usage c 
ON a.constraint_name = c.constraint_name AND 
   a.constraint_type = 'FOREIGN KEY'
GROUP BY a.table_catalog, a.table_schema, a.table_name, 
         a.constraint_name, a.constraint_type, 
         c.table_name, c.column_name
ORDER BY a.table_catalog, a.table_schema, a.table_name, 
         a.constraint_name

Unfortunately, the statistics table doesn’t exist in PostgreSQL’s information_schema (is it an extension of MySQL or missing in PostgreSQL?), so I have to find another source for the non-unique index data. I’ll add the final query as comment to this article as soon as I’ve got it.

Hail to GROUP_CONCAT!

MySQL’s GROUP_CONCAT command is a phantastic thing, it can simplify big efforts to a small piece of work that would not even be worth mentioning – if the way that the problem is being solved wouldn’t be so nice ;).

The reason why I’m writing this is because I have to work on such a task with PostgreSQL, and I already know that it will be quite a tough piece of work, which could be solved with MySQL within seconds. So to everybody, who doesn’t still know the power of the GROUP_CONCAT aggrigate function, here’s a little example, which I think is self-explaining:

mysql> CREATE TABLE group_concat_test (
    ->   id INT UNSIGNED NOT NULL auto_increment PRIMARY KEY,
    ->   _key INT UNSIGNED NOT NULL,
    ->   _value CHAR(1) NOT NULL);
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO group_concat_test
    ->    (_key, _value) VALUES
    ->    (1, 'b'),
    ->    (1, 'c'),
    ->    (2, 'a'),
    ->    (2, 'd'),
    ->    (2, 'c'),
    ->    (3, 'b'),
    ->    (3, 'e');
Query OK, 7 rows affected (0.01 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql> SELECT _key,
    ->        COUNT(*) as cnt,
    ->        GROUP_CONCAT(_value ORDER BY _value SEPARATOR ', ')
    ->          as value_list
    ->   FROM group_concat_test
    ->   GROUP BY _key
    ->   ORDER BY _key;
+------+-----+------------+
| _key | cnt | value_list |
+------+-----+------------+
|    1 |   2 | b, c       |
|    2 |   3 | a, c, d    |
|    3 |   2 | b, e       |
+------+-----+------------+
3 rows in set (0.00 sec)

mysql>

The GROUP_CONCAT function is available since MySQL 4.1, so for everyone still running an older version, this would be one more (of many) reason to update.

MySQL – PostgreSQL reactions

There have been some reactions about my MySQL – PostgreSQL article that I wrote recently.

I received a kind email from Magnus Hagandar, who is one of PostgreSQL’s developers (the PostgreSQL website states: “Has done major work on the Win32 port and wrote much of the Win32 Installer.”). He answered some of my unclear questions about Postgresql (and allowed me to post it here in my blog, so here it is):

Hi!

Stumbled across
http://db4free.blogspot.com/2005/12/mysql-postgresql.html, thought I
should answer two questions for you. You get it in mail since you rblog
doesn’t permit anon postings and I don’t have an account 🙂

> When a transaction is started and you write a SQL statement that has a
> syntax error in it, PostgreSQL rolls back the transaction immediately,
> while MySQL shows an error message

Check the ON_ERROR_ROLLBACK parameter
(http://www.postgresql.org/docs/8.1/static/app-psql.html). This is for
psql – if you want it in your own custom app, just use a SAVEPOINT.
(this parameter will make psql use SAVEPOINTs automatically)

> MySQL has an ‘enum’ data type. I haven’t found a similar data type in
> PostgreSQL, however, ‘enum’s can be emulated with check constraints.

If you have many possible values in your enum, you really should go with
a separate table and a foreign key. That keeps it inline with the
relational model as well 🙂

> I haven’t yet figured out where PostgreSQL stores this data and I
> think it’s only possible to use the SQL commands to set the permissions
> accordingly.

They’re stored in a system table, normally pg_class (but for a schema,
in pg_namespace etc). See
http://www.postgresql.org/docs/8.1/static/catalogs.html, and the
subpages.
You’re never supposed to hack these using anything other than
GRANT/REVOKE, but you can.

Hope it helps some 🙂

//
Magnus

I also received a mail from Carsten Pederson (he’s the Certification Manager at MySQL AB) and corrected one of my statements. Extended inserts (those which allow you to insert more than one row at a time) are not an extension to MySQL, but SQL standard. It’s one of the few SQL standard features that PostgreSQL doesn’t support at the moment.

There are three more differences that I forgot in my last article, so here they are:

  • With MySQL it’s possible to insert new columns somewhere in the middle of an existing table, while PostgreSQL adds new columns always after all columns.
  • MySQL has a REPLACE command that doesn’t exist in PostgreSQL.
  • PostgreSQL has an additional “layer” called schema (so it’s database – schema – tables, unlike in MySQL it is database – tables).

Of course there exist a lot more, to count them all up would probably be enough to fill a book. But it’s interesting to see how two different products have very much in common, but still a lot of differences and it’s real fun to me to find out new things on both MySQL and PostgreSQL. The more I work with PostgreSQL, the more I like it, but the nice thing about it is that it doesn’t affect the way I like MySQL :-).

Breaking referential integrity

Is it possible to break the referential integrity when a foreign key constraint is set on an InnoDB table? The answer is an (unfortunate?) yes.

A little example demonstrates this:

mysql> CREATE TABLE t1 (
    ->   id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
    ->   ) ENGINE = InnoDB;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO t1 (id) VALUES
    ->   (1), (2), (3);
Query OK, 3 rows affected (0.02 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> CREATE TABLE t2 (
    ->   id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ->   id_t1 INT NOT NULL,
    ->   FOREIGN KEY (id_t1) REFERENCES t1(id))
    ->   ENGINE = InnoDB;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO t2 (id_t1) VALUES
    ->   (3), (2), (2), (3), (1);
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM t1; SELECT * FROM t2;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+
3 rows in set (0.00 sec)

+----+-------+
| id | id_t1 |
+----+-------+
|  5 |     1 |
|  2 |     2 |
|  3 |     2 |
|  1 |     3 |
|  4 |     3 |
+----+-------+
5 rows in set (0.00 sec)

mysql> INSERT INTO t2 (id_t1) VALUES (4);
ERROR 1452 (23000): Cannot add or update a 
child row: a foreign key constraint fails 
(`test/t2`, CONSTRAINT `t2_ibfk_1` FOREIGN KEY 
(`id_t1`) REFERENCES `t1` (`id`))

mysql> SET @@foreign_key_checks = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO t2 (id_t1) VALUES
    ->   (2), (1), (4), (2);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> SET @@foreign_key_checks = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM t1; SELECT * FROM t2;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+
3 rows in set (0.00 sec)

+----+-------+
| id | id_t1 |
+----+-------+
|  5 |     1 |
|  7 |     1 |
|  2 |     2 |
|  3 |     2 |
|  6 |     2 |
|  9 |     2 |
|  1 |     3 |
|  4 |     3 |
|  8 |     4 |
+----+-------+
9 rows in set (0.00 sec)

The record no 8 in table t2 has now an invalid value, due to the fact that we have set the foreign key verification temporarily to off (0).

It’s not a bad feature that it’s possible to temporarily turn @@foreign_key_checks off, because sometimes it’s necessary to change data in a way that referential integrity needs to be broken during the modification process. But most of the time, the data should be valid, when modification is finished. So what can be done?

Unfortunately, the only way is to check manually, if there are any invalid values:

mysql> SELECT t2.id
    -> FROM t2 LEFT JOIN t1 ON t2.id_t1 = t1.id
    -> WHERE t1.id IS NULL;
+----+
| id |
+----+
|  8 |
+----+
1 row in set (0.00 sec)

Could the command “SET @@foreign_key_checks = 1” verify automatically, whether all values are valid? Probably not, because all data from all tables with foreign key constraints would have to be verified for invalid data, because the command doesn’t know, which tables have been modified in the meantime. How else can this problem be solved?

PostgreSQL knows a clause that can be added to the FOREIGN KEY definition – it’s called INITIALLY DEFERRED. This clause allows foreign keys to be broken within a transaction, but they must be valid again, when the transaction commits. A nice feature, I think.

Another solution could be a table based command that includes this kind of verification. Maybe the CHECK TABLE [tablename] would be an appropriate command for checking foreign key integrity on InnoDB tables, too. The option EXTENDED can be appended to the command, but also this doesn’t work for broken integrity:

mysql> CHECK TABLE t2 EXTENDED;
+---------+-------+----------+----------+
| Table   | Op    | Msg_type | Msg_text |
+---------+-------+----------+----------+
| test.t2 | check | status   | OK       |
+---------+-------+----------+----------+
1 row in set (0.00 sec)

Maybe that’s something to be considered to add foreign key integrity checks to this (or maybe another) command?!