Low-Cost Unix Database Differences

1999-08-15

Msql:

Pros:

Basic SQL database.
Easy to learn for beginners.
Well-known in the Unix community.
Includes w3-lite scripting language for web-based applications using Msql as a datastore.
Easy to understand C-based API.
Regular expressions supported.

Cons:

No foreign key support.
No trigger support.
No subselects.
No views.
No transactions.
Limited sequence support ( more of a rowid than a sequence, actually )
No count(*) functionality; must bring back every row of the table to get an accurate count.
Slow when compared to MySQL and PostgreSQL.
US$200; overpriced and underpowered when compared to the MySQL and PostgreSQL alternatives.
Limited SQL functionality
Not geared towards large datasets; relatively bad join performance.
Few datatypes available
Does not handle date manipulation inside of the SQL engine.

MySQL:

Pros:

Extremely fast.
Regular expression support.
Quite a few datatypes available.
Support staff very knowledgable and helpful.
Supports identity/auto-increment columns, similar to sequences.
C-based API very similar to Msql; aids in porting Msql apps to MySQL. A small shell script is included that does the basic port ( via sed )
CHAR/VARCHAR/TEXT columns handled canse insensetively.
Low-cost, often free, for most users. Commercial support available.
BLOB support stores the BLOBs in the table.
Supports user-defined functions in C and derivatives.

Cons:

No support for transactions ( begin transaction, rollback, commit ). One must explicitly lock a table and unlock it when finished.
No foreign key support.
No trigger support.
No subselects.
No views.
No sequence support. No, auto_increment columns are not sequences.
Auto_increment columns will give the next available number upon insert, possibly messing up what little referential integrity constraints your application attempts to enforce.
Only one automatically-updated column per table; one cannot have a timestamp and auto_increment column in the same table and have both be updated.
User-defined functions, even the most basic ones, must be in C and derivatives.
No on-line recovery; one must bring down the database server and run the 'isamchk' utility on their datafiles.
Hacked-up SQL functions such as REPLACE INTO.
Indexes must be created at table-creation time. To add a new index/key, the table (and its data) must be backed up and reimported. Work around: create a new table, insert into new_table select * from old_table, drop old_table, alter table new_table rename old_table.
Does not handle dates correctly. One can insert '1999-02-31 01:01:01' into a datetime column, mysql accepts it.
Does not handle date manipulation properly; select '1999-08-14 11:32:00' - 7 == '1992'

PostgreSQL:

Pros:

Object-relational database.
Free for all uses. Commercial support available.
Transactions supported ( begin transaction, rollback, commit )
Sequences supported.
Triggers supported.
Subselects supported.
Views supported.
User-defined functions can be in C and derivatives, PL/TCL, PL/PgSQL, or SQL.
Regular expressions supported.
Foreign keys supported via an add-on module called "refint".
Multiversion concurrency control ( MVCC ).
Handles dates correctly; one CANNOT insert '1999-02-31 01:01:01' into a datetime column.
Handles date manipulation correctly through the use of timespan datatypes: select DATE('1999-08-14'::date - '7 days'::timespan) == 1999-08-07
Supports the use of querying on 'inherited' datatypes; one can query on only the date portion of a datetime column.
Supports online recovery via VACUUM.
Supports the creation of indexes after table creation.
Inheritance support; create table a ()... create table b inherits from a.
Support for large objects/BLOBs.
Without a doubt the most advanced open source database system available.

Cons:

Due to foreign key support not being fully 'inside' of PostgreSQL, it does not support the REFERENCES keyword. One must manually create two triggers ( one on the parent, one on the child ) to get foreign key support.
A bit slower for INSERTS/UPDATES when compared to MySQL. Then again, PostgreSQL has to check constraints and triggers and such prior to doing the insert/update, which slows it down.
On large tables, VACUUM can take a while to return.
SQL syntax gets a little odd for certain columns, as you'll notice above: SELECT ... '1999-08-14'::date - '7 days'::timespan. When comparing columns/values of different data types, one column must be 'translated' into another column via this ( or similar ) syntax.
On-line backups done via pg_dump don't seem to properly dump views; views come out as a table, not as a query on 1+ tables.
Basic installation does a sync after every insert/update, slowing down the system in general. While this can be disabled via an argument, new users may not realize this and blame poor performance on Postgres.
Large-object ( BLOB ) support is space consuming. One must first pull the large object out of the database and into the filesystem. This is, imho, really no better than storing the path to the file in question; you'd skip the retrieval from the database portion of the query and wouldn't have to unlink() the exported file when finished.

Recommendations

For most database requirements, one can safely ignore Msql. Msql, while nice for academic, at-home use, should not be considered for any sort of important data.

Overall, PostgreSQL is my first choice for a free/low-cost database solution for Linux or the other free Unices. MySQL may be faster, but due to the its date handling, no referential integrity, etc. shortcomings, I can no longer picture using it for any sort of important database. That being said, I did write a billing system for a popular local ISP that used MySQL as its datasource. However, I have since rewritten the system using Postgres. Due to Postgres' support for triggers, views, and programming languages/stored procedures, I was able to drop a good 600 lines from the original code. In addition, I was able to spot and fix no more than 7 ( out of 2,500 ) referential integrity problems that were in the MySQL-driven database.