Tuesday, 21 October 2008

Erlang Mnesia - Take 2

After my first attempt of using Erlang Mnesia, I decided to apply what I have learned so far to refactor and improve my Erlang Mnesia implementation of the Address database and Data Access Layer (DAL).

Redesigning The Database

This time I de-normalised part of the database design -
  1. to duplicate the hierarchy_order field from address_field_type to address_field table
  2. removed the record id fields from the 3 entities:
    • address_field_type is a reference data table with small amount of data. I use the locale_country + hierarchy_order as a combined key implicitly
    • address and address_field both have the location_code field which is unique for each record, so I used location_code as their primary key
The address.hrl file:
-record(address_field_type, {
 locale_country,
 name,
 default_value,
 hierarchy_order,
 display_order,
 validation_rules,
 suffix} ).
 
-record(address_field, {
 location_code,
 value,
 locale_country,  %% denormailsed from address_field_type
 hierarchy_order  %% denormalised from address_field_type
 }).
 
-record(address, {
 location_code,
 status
 }).
 
-record(address_addressField, {
 address_locationCode,
 address_field_locationCode
 }).
Creating the database:
create_db() ->
 mnesia:create_table(address_field_type,
  [
   {type, bag}, %% use bag because no single-filed primary key now
   {ram_copies,[nonode@nohost]},
   {index, [hierarchy_order]},
   {attributes, record_info(fields, address_field_type)}
  ]),
 mnesia:create_table(address_field,
  [
   {ram_copies,[nonode@nohost]},
   {attributes, record_info(fields, address_field)}
  ]),
 mnesia:create_table(address,
  [
   {ram_copies,[nonode@nohost]},
   {attributes, record_info(fields, address)}
  ]),
 mnesia:create_table(address_addressField,
  [
   {type, bag},
   {ram_copies,[nonode@nohost]},
   {index, [address_field_locationCode]},
   {attributes, record_info(fields, address_addressField)}
  ]).

The above changes do reduce the functionality of the Address application very slightly but it's a worthy trade-off as it simplifies the design - now I don't need the oid table/record any more along with all the functions and code revolving the oids.

Also, by de-normialising, i.e. duplicating the hierarchy_order field into address_field table, the sorting of the address_field list is made easy and efficient (without having to look up the database in every iteration). This is demonstrated in the code below.

Simplified Sorting

The sorting of address_field list has reduced from 3 functions previously to the use of lists:keysort/2 due to the duplication of the hierarchy_order field from address_field_type into address_field in the data/entity model.
sort_afs(Afs) when is_list(Afs) ->
 %% 5th field of the address_field tuple is hierarchy_order
 lists:keysort(5, Afs).

Simplified Data Insertion

By skipping any record id handling, the code for insertion of records is considerably simplified. This is most evident from the insert_address/2 function (compared to the previous version):
insert_aft(Aft) when is_record(Aft, address_field_type) ->
 Fun = fun() ->
  mnesia:write(Aft)
 end,
 mnesia:transaction(Fun),
 Aft.
 
insert_address(A, Afs) when is_record(A, address) and is_list(Afs) ->
 {NewA, NewAfs} = address:generate_location_code(A, Afs),
 Fun = fun() ->  
  % create the new address record
  mnesia:write(NewA),
  
  % now insert/update into address_field 
  % and insert into address_addressField table
  lists:foreach( fun(Af) ->
    mnesia:write(Af),
    A_Af=#address_addressField{ 
     address_locationCode=NewA#address.location_code,
     address_field_locationCode=Af#address_field.location_code
    },
    mnesia:write(A_Af)
   end,
   NewAfs),
  {NewA, NewAfs}
 end,
 mnesia:transaction(Fun).

Simplified Data Queries

By using dirty_read and dirty_index_read functions unnecessary transactions are avoided.
find_afts(LocaleCountry) when is_list(LocaleCountry) ->
 mnesia:dirty_read({address_field_type, LocaleCountry}).

find_address(Id) when is_integer(Id) ->
 [A] = mnesia:dirty_read({address, Id}),
 A.
 
find_address_field(Id) when is_integer(Id) ->
 [Af] = mnesia:dirty_read({address_field, Id}),
 Af.
 
find_address_field_codes(A_code) ->
 % the read will return a list of tuples, we want the 3rd field (the AF_locationCode) 
 % of each tuple and put them in a list.
 [ element(3, A) || A <- mnesia:dirty_read({address_addressField, A_code}),
  true ].
 
find_address_codes(Af_code) ->
 F = fun() ->
  mnesia:select(address_addressField, [{
   #address_addressField{ address_locationCode='$1',
    address_field_locationCode=Af_code, 
    _='_' },
    [], ['$1'] }] 
  )
 end,
 {atomic, Results}=mnesia:transaction(F),
 Results.
 
find_address_codes2(Af_code) ->
 F = fun() ->
  Q = qlc:q([Aaf#address_addressField.address_locationCode
   || Aaf <- mnesia:table(address_addressField),
   Aaf#address_addressField.address_field_locationCode==Af_code]),
  qlc:e(Q)
 end,
 {atomic, Results}=mnesia:transaction(F),
 Results.
 
find_address_codes3(Af_code) ->
 Aafs = mnesia:dirty_index_read(address_addressField, Af_code, 
  #address_addressField.address_field_locationCode),
 % the second element of the tuple is the address_locationCode
 [ element(2, Aaf) || Aaf <- Aafs, true ].
In the above code, find_address_codes/1, find_address_codes2/1, find_address_codes3/1 do the same thing but are implemented using different approaches. The dirty read one is the simplest.

Conclusion

Just by thinking in Erlang, the resulting database design has been simplified - less table and fields; the source lines of code (SLOC) has been reduced by 30%; most importantly, the programming logic is now simpler and easier to read/maintain.

However, this does not mean that Erlang and Mnesia are the best way to implement this kind of database applications (where normalisation is important for the business logic).

2 comments:

Gleber said...

Mnesia is for sure not the best db engine for such applications, but we know that's not the goal of your experiments :)

That's a good idea to denormalize data in a way that lists:keysort/2 may be used!

Why do you add "true" to the end of the list comprehension?.. "Guard" part of the list comprehension is optional afaik, hence I don't see the reason to include it.

> address_field_type is a reference data table with small amount of data.
> I use the locale_country + hierarchy_order as a combined key
> implicitly

I'm not entirely sure what you mean, hence I'm sorry if I missed something :)
Afaik Mnesia does not support combined primary key if defined that way. For primary key to be a composite value you should use a tuple {locale_country, hierarchy_order}. Additionally mnesia does not use indexes automagically for every read (except for, sometimes, qlc, match_object/3 and [dirty_]index_read/1) hence it looks like some indexes are of no use. Sorry if I missed any bit of your code.

Romen said...

Hi gleber

Yes, you are right. There is no need for the true in list comprehensions.

I did not use the two fields as combined key in any database operations. Instead, I de-normalised them into the address_field so that I can always find the corresponding address_field_type if I want to (which is not implemented in the example).

Thanks again for the great advice.

cheers
romen