Tuesday 15 December 2009

CakePHP: A behavior for acessing non UTF Oracle databases

The origins of our company's Oracle database date back at the beginning of the decade. At that time we had Oracle version 9i running on SuSE Linux 8.2 and the expected thing to do back then was to create a database using the EL8ISO8859P7 character set. After eight years we are still using Oracle. Now the database is 11g and the database server is OEL 5.4. Basic data structures however, are still the same as they were back in 2002.

During the evaluation of CakePHP as our next development environment we very soon run into the problem of trying to insert and retrieve Unicode data from a non-unicode database. Since the encoding key of $defualt array member of the DATABASE_CONFIG class (stored in app/config/database.php file) has no effect when connecting to Oracle databases, we ended up creating an additional translation layer, that would convert data to and from Unicode when reading from and writing to Oracle.

CakePHP's way of doing this kind of staff is to create behaviors. Ours is called CharsetConverter, so by CakePHP's standards it is implemented in a class named CharsetConverterBehavior that is stored in a file named charset_converter.php which is located in the APP/models/behaviors directory.

The approach here uses the mb_convert_encoding function provided with the php-mbstring package. The code implementation is the following

<?php
/**
 * A simple behavior that allows cake PHP to use single byte Oracle
 * and possibly other vendor -- databases
 * Tested with Oracle 11gR1
 *
 * @version 0.1
 * @author Thanassis Bakalidis
 */
class CharsetConverterBehavior extends ModelBehavior {
    // we have an Oracle database that dates back to 2002 so
    const DEFAULT_DB_LOCALE = 'ISO-8859-7';
    const DEFAULT_PAGE_LOCALE = 'UTF-8';

    const READING_FROM_DB = TRUE;
    const WRITING_TO_DB = FALSE;

    var $databaseLocale;
    var $webpageLocale;
    var $Model;

    function setup(&$model, $settings=array())
    {
        $this->Model = $model;

        $this->databaseLocale = isset($settings['databaseLocale']) ?
                                    $settings['databaseLocale'] :
                                    self::DEFAULT_DB_LOCALE;
        $this->webpageLocale = isset($settings['webpageLocale']) ?
                                    $settings['webpageLocale'] :
                                    self::DEFAULT_PAGE_LOCALE;
    }

    /**
     * Change the query where clause to the datbase native character set.
     */
    function beforeFind( &$queryData, $queryParams)
    {
        if (!isset( $queryParams['conditions']))
            return $queryParams;

        $queryParams['conditions'] = $this->recodeRecordArray(
                                                        $queryParams['conditions'],
                                                        self::WRITING_TO_DB);
        return $queryParams;
    }

    /**
     * Convert fetched data from single byte to utf-8
     */
    function afterFind(&$model, $results, $primary)
    {
        return $this->recodeRecordArray($results, self::READING_FROM_DB);
    }

    /**
     * Convert data to be saved into the database speciffic locale
     */
    function beforeSave()
    {
        $this->Model->data = $this->recodeRecordArray( $this->Model->data,
                                                       self::WRITING_TO_DB);
        return true;
    }

    /**
     * Recursively traverse and convert the encoding of the array passed
     * as parameter.
     */
    function recodeRecordArray(&$recordArray, $loading = TRUE)
    {
        foreach( $recordArray as $key => $value)
            if (is_array($value))
                $recordArray[$key] = $this->recodeRecordArray($value, $loading);
            else {
                if (is_numeric($value))
                    continue;
                $recordArray[$key] = $loading ?
                                    mb_convert_encoding(
                                                $value,
                                                $this->webpageLocale,
                                                $this->databaseLocale)
                                    :
                                    mb_convert_encoding(
                                                $value,
                                                $this->databaseLocale,
                                                $this->webpageLocale);
            }
        return $recordArray;
    }
}
?>

Once we have this in place, using the new behavior in one of our models is as simple as setting the correct value of the $actAs variable. Here is a simple example of a model using the Character set convention and validation.

<?php
    class Task extends AppModel {
        var $name = 'Task';
        var $actsAs = array(
                    'CharsetConverter' => array(
                                            'databaseLocale' => 'ISO-8859-7',
                                            'webpageLocale' => 'UTF-8'
                                        )
                    );
        var $validate = array(
                    'title' => array(
                                'rule' => 'notEmpty',
                                'message' => 'Task title cannot be left blank'
                    )
                );
    }
?>

Almost all applications contain more than one model. Perhaps the best place to put the $actAs definition would be the AppModel class defined in the file app_model.php in the root of your app directory

I also understand that writing a behavior to accomplish the job of the database driver is not the best solution. Since I have nothing better for the moment, I guess I will have to start every new CakePHP project by first changing my app_model.php file.

No comments :