Saturday, 18 October 2008

Erlang Mnesia

Carrying out my Erlang learning plan, I embarked on the porting exercise of my Address DAO package into Erlang Mnesia. In fact, it is more than porting of the data access functions. It is also a porting of the Address database itself from RDBMS into Mnesia.

Creating the Mnesia Schema

The Entity Relationship Diagram (ERD) of the database schema is shown below.

To create the tables in Mnesia, I defined the following records, each corresponding to a table.

-record(address_field_type, {id,
 name,
 locale_country,
 default_value,
 hierarchy_order,
 display_order,
 validation_rules,
 suffix} ).
 
-record(address_field, {id,
 value,
 address_field_type_id,
 location_code}).
 
-record(address, {id,
 status,
 location_code}).
 
-record(address_addressField, {
 address_id,
 address_field_id
 }).
-record(oids, {
 name,
 id
 }).

Note that there is an extra record oids. This record/table is required because Mnesia does not support auto-incrementing fields usually used as primary key generation. Therefore this extra table is used to store the current key value - similar to Oracle's sequence.

Creating the tables are shown below. The oids table is also populated with seed values.

create_db() ->
 mnesia:create_table(address_field_type,
  [
   {ram_copies,[nonode@nohost]},
   {attributes, record_info(fields, address_field_type)}
  ]),
 mnesia:create_table(address_field,
  [
   {ram_copies,[nonode@nohost]},
   {index, [location_code]},
   {attributes, record_info(fields, address_field)}
  ]),
 mnesia:create_table(address,
  [
   {ram_copies,[nonode@nohost]},
   {index, [location_code]},
   {attributes, record_info(fields, address)}
  ]),
 mnesia:create_table(address_addressField,
  [
   {type, bag},
   {ram_copies,[nonode@nohost]},
   {index, [address_field_id]},
   {attributes, record_info(fields, address_addressField)}
  ]),
 mnesia:create_table(oids,
  [
   {attributes, record_info(fields, oids)}
  ]),
 Fun = fun() ->
  mnesia:write(#oids{ name=address, id=0 }),
  mnesia:write(#oids{ name=address_field_type, id=0 }),
  mnesia:write(#oids{ name=address_field, id=0})
 end,
 mnesia:transaction(Fun).

Simple CRUD

The address_field_type is a table for storing reference data, which does not rely on other tables. So the C and R of the CRUD is very simple. From address_db.erl file:
%% returns {atomic, Oid} or {aborted, Reason}
generate_oid(TableName) when is_atom(TableName) ->
 Fun = fun() ->
  [Oid] = mnesia:read(oids, TableName, write),
  %% because Erlang only supports single assignment
  %% I have to create new variables every time the value changes.
  NewId=Oid#oids.id+1,
  New = Oid#oids{ id=NewId },
  mnesia:write(New),
  NewId
 end,
 mnesia:transaction(Fun).
  
insert_aft(Aft) when is_record(Aft, address_field_type) ->
 Fun = fun() ->
  {atomic, Id}=generate_oid(address_field_type),
  New = Aft#address_field_type{ id=Id },
  mnesia:write(New),
  New
 end,
 mnesia:transaction(Fun).

find_aft(Id) when is_integer(Id) ->
 {atomic, [Aft]} = mnesia:transaction(fun()-> mnesia:read({address_field_type, Id}) end),
 Aft.
There are a few things worth noting:
  1. Although the mnesia:read and mnesia:select functions can be read-only, they still need to be wrapped in a function and called by mnesia:transaction/1. This does not make sense to me.
  2. The mnesia:transaction(Fun) returns {aborted, Reason} or {atomic, Result} where Result is whatever the return result of the Fun is. Therefore, the insert_aft/1 above returns New - the new address_field_type record with the id field populated.
  3. Erlang is single assignment language - i.e. a variable can only be assigned value once. Therefore, if I want to assign the Aft#address_field_type.id, I have to create a whole new address_field_type record as in the insert_aft/1 above. I have to say, I am not a fan of this single assignment - it defeats the half of the purpose of having variables.

A Slightly More Complicated CRUD

The address entity relies on address_field, which in turn depends on address_field_type. Also, if address records share the same address_field values, then the address_field record should be shared and not be duplicated in the database. Therefore, the management of the address records are more complicated than those of address_field_type.

The creation of address records is shown below.

insert_address(A, Afs) when is_record(A, address) and is_list(Afs) ->
 {A2, Afs2} = address:generate_location_code(A, Afs),
 Fun = fun() ->
  % populate the address_field records' id field
  NewAfs=lists:foldl( fun(Af, NewAfs) ->
    % check if the same Af is already in database
    Result = mnesia:select(address_field, [{
     #address_field{ id='$1',
      location_code=Af#address_field.location_code, 
      _='_' },
     [], ['$1'] }] 
    ),
    if length(Result)==0 -> % Af not in DB, so insert it
     {atomic, Id}=generate_oid(address_field);
    true -> % Af already exists in database
     Id=lists:nth(1, Result)
    end,
    NewAf = Af#address_field{ id=Id },
    lists:append(NewAfs, [NewAf])
   end,
   [],
   Afs2),
  
  % create the new address record
  {atomic, AddressId} = generate_oid(address),
  NewA = A2#address{ id = AddressId },
  mnesia:write(NewA),
  
  % now insert/update into address_field 
  % and insert into address_addressField table
  lists:foreach( fun(Af) ->
    mnesia:write(Af),
    A_Af=#address_addressField{ 
     address_id=AddressId,
     address_field_id=Af#address_field.id
    },
    mnesia:write(A_Af)
   end,
   NewAfs),
  {NewA, NewAfs}
 end,
 mnesia:transaction(Fun).
From the address.erl file:
%% applying quick sort to AddressField list.
sort_afs([]) -> [];
sort_afs([Pivot|Tail]) when is_record(Pivot, address_field) ->
 sort_afs([Af || Af <- Tail, 
  compare_aft(address_db:find_aft(Af#address_field.address_field_type_id),
   address_db:find_aft(Pivot#address_field.address_field_type_id)) < 0])
  ++ [Pivot] ++
  sort_afs([Af || Af <- Tail, 
  compare_aft(address_db:find_aft(Af#address_field.address_field_type_id),
   address_db:find_aft(Pivot#address_field.address_field_type_id)) >= 0]).

compare_aft(Aft1, Aft2) when 
 is_record(Aft1, address_field_type) and is_record(Aft2, address_field_type) ->
 Aft1#address_field_type.hierarchy_order - Aft2#address_field_type.hierarchy_order.

generate_location_code(A, AddressFields) when is_record(A, address) 
 and is_list(AddressFields) and (length(AddressFields)>0) ->
 Afs=generate_location_code(AddressFields),
 FirstAf=lists:nth(1, Afs),
 { 
  A#address{ location_code=FirstAf#address_field.location_code },
  Afs
 }.

generate_location_code(Afs) ->
 ReverseSortedAfs=lists:reverse(sort_afs(Afs)),
 generate_location_code(ReverseSortedAfs, [], []).
 
generate_location_code([Head|Tail], String, NewAfs) when is_record(Head, address_field) ->
 StringValue = Head#address_field.value ++ String,
 Code=erlang:phash2(StringValue),
 generate_location_code(Tail, 
  StringValue,
  [ Head#address_field{ location_code=Code } ] ++ NewAfs);  
generate_location_code([], _String, NewAfs) ->
 NewAfs.   
The following functions are queries from the address_db.erl file:
find_address(Id) when is_integer(Id) ->
 {atomic, [A]} = mnesia:transaction(fun()-> mnesia:read({address, Id}) end),
 A.
 
find_address_field(Id) when is_integer(Id) ->
 {atomic, [Af]} = mnesia:transaction(fun()-> mnesia:read({address_field, Id}) end),
 Af.
 
find_address_field_ids(A_id) ->
 mnesia:transaction(fun()-> mnesia:read({address_addressField, A_id}) end).
 
find_address_ids(Af_id) ->
 F = fun() ->
  mnesia:select(address_addressField, [{
   #address_addressField{ address_id='$1',
    address_field_id=Af_id, 
    _='_' },
    [], ['$1'] }] 
  )
 end,
 mnesia:transaction(F).
 
find_address_ids2(Af_id) ->
 F = fun() ->
  Q = qlc:q([Aaf#address_addressField.address_id 
   || Aaf <- mnesia:table(address_addressField),
   Aaf#address_addressField.address_field_id==Af_id]),
  qlc:e(Q)
 end,
 mnesia:transaction(F).
Things that I want to highlight from the above are:
  1. Unlike Java or C#, Erlang does not have a sortable list out-of-the-box. Instead, I had to write functions that sort my list of address_field records - these are: sort_afs, compare_aft
  2. When searching using the primary key field, mnesia:read is used; when searching using other fields, either use mnesia:select or QLC. In the above code, the functions find_address_ids/1 and find_address_ids2/1 do the same thing. However, the result (a list) given by find_address_ids2/1 are reversed.
  3. If a record with the same primary key already exists in the table, the mnesia:write either creates (if table type if set) or updates (if table type is bag) the record in the table.
  4. Again, due to the single assignment rule, the lists:append(List, AnotherList) does not change the List; instead, the new list is the return value of lists:append/2. This is just counter-intuitive.

Related Posts:

7 comments:

Gleber said...

Hi there.

Here are my notes on your notes ;)

1. There is function to sort lists in stdlib (i.e. out of the box). It is in module lists and called keysort (lists:keysort/2). I haven't jumped deep into your code, but are you sure lists:keysort/2 does not fit you?

2. Read about mnesia:index_read/3

4. Using lists:append/2 in this way is very inefficient:

lists:append(NewAfs, [NewAf])

instead build a list with:

NewAfs2 = [ NewAf | NewAfs ]

and then lists:reverse/1 it.


X = [ A | Y ] appends element A to the front of Y and assigns it to X and it is very efficient in Erlang.

Intuition is very relative thing. Erlang is functional language with immutable variables - intuition is quite different here in comparison to imperative languages ;) That's the reason why learning Erlang is a very good idea! It makes you look at some thing from completely different angle.

Other notes:

1. Are you sure you need address_addressField table. Why not change record 'address' and add field 'fields', which is a list of fields?

-record(address,
{id,
status,
location_code,
fields = []}).

Mnesia stores in fact ANY term, hence there is no problem with storing a list of records in there.

2. Transfering normalized database from RDBM to mnesia is not a good idea. To work efficiently with mnesia it is usually a good idea to denormalize the database.

3. Having numeric ID is absolutely not mandatory in mnesia. If theres a entry of type address_field_type with name "country", why not use field 'name' as PK and have meaningful values in referencing fields (i.e. atom 'country' instead of 0, atom 'province' instead of 1, etc.) ?

4. Sometimes using mnesia:transaction for every operation is just a overkill. Read about mnesia:dirty_* in manual

Best regards,
Gleb Peregud

Gleber said...

"I have to say, I am not a fan of this single assignment - it defeats the half of the purpose of having variables."

But it is, afaik, the only way to ensure almost-lock-free concurrency. It has some very deep implications and as a result advantages.

Romen said...

Thanks gleber for the comments. One of the reasons of my blog is to attract expert opinions like this.

As of single assignment, I thought about it this morning and realised that it must be for the purpose of concurrent programming. So I logged in to add my thought, then I saw you point it out already :)

The list:keysort/2 does not work in this case as the key is not in the record, but has to be looked up from the database. I did think about adding that field into the record but decided not to so that the records match the tables structure.

I will follow your advice and give them a try.

cheers
romen

Romen said...

Also, about the "other notes..." on normalised database:

Indeed I used the address_addressField table to normalise the data. I did model the address record to contain a list of address_fields in my previous exercise when trying out Erlang ODBC. Here I wanted to try something different and I kept the record structure faithful to the table to make the database operations easy.

I agree that Mnesia is intended for non-normalised database as opposed to your average 3rd-party RDBMS. I work in the telco industry and do understand the data characteristics of these applications (HLRs, switches, etc. where MSISDN or similar is used for query key so there is no need for other object/record IDs).

Through these exercises I try to understand the scope of the Erlang framework(s) and what are the best use cases for them.

cheers
romen

Gleber said...

I see, that's nice that my comments are helpful. Sorry if some of them are plain truisms for you :)

Keep learning Erlang - it is awesome!

Anonymous said...

Hi,

As far, as sorting is concern - there is another, more general purpose function:

sort(Fun, List1) -> List2

Anonymous said...

There is a built in way to do this with NewOid = mnesia:dirty_update_counter(oids,table_name,increment value)

This will update your oids table and return the next id.

however you oids table should look like this
-record( oids, {type, id} ).

http://dudefrommangalore.blogspot.com/2009/03/auto-increment-in-mnesia-database.html