Friday 27 June 2008

SOA Pitfalls

The Top 10 SOA Pitfalls series is a really interesting read. It reminds me of my first-hand experience of some of those pitfalls.

I was in charge of the solutions for a high-profile project in a large Telco environment. In this particular environment, there were myriad small systems which did not talk to each other. The reason for this proliferation was that each department head could approve projects below a certain budget. So each time a department or group had some business requirements, the tendency was to build a chewing-gum-and-duct-tape solution, as long as it stayed under the department head's budget limit. After years of this practice, users were left with many systems that each offered little functionality, and IT architects were left with an overly complex enterprise systems environment.

The client had come to their senses and decided to change this by introducing a COTS product. Part of our project's responsibility was to replace those small systems with the COTS product. After an initial study, I found that we had to replace 7 small systems and integrate with 17 others. An environment like this calls for a SOA platform. Fortunately the client had a corporate license for SeeBeyond (now called JCAPS after Sun bought it). JCAPS had been used in several other high-profile projects ($10Ms, 1-3 years implementation time). To my astonishment, each of those projects had its own instance of JCAPS running, and these instances were not talking to each other. So it looked like our project was going to introduce yet another JCAPS silo! Using EAI this way defeats its purpose. After all, the 'E' in EAI stands for 'Enterprise' - it is supposed to be enterprise-wide. After some hard fighting, I am glad to say that the project had a happy ending.

Dynamic Query

I have a prototype application which searches a database for Address records based on keywords. The keywords are entered in the GUI by the user, and their number is not known beforehand, so a dynamic query is used. The corresponding entity model is shown in my other blog. The equivalent SQL statement looks something like this:

select a.*
from Address as a, AddressField as af, Address_AddressField as aaf
where a.ADDRESSID=aaf.ADDRESSID
and aaf.ADDRESSFIELDID=af.ADDRESSFIELDID
and (upper(af.VALUE) like 'BOMBAY'
or upper(af.VALUE) like 'MUMBAI'
or ... /* more parameters */ )

(Note that in a production system, it is better to have an additional column in the database table to store the upper-cased values. That way, they can be indexed to improve query performance.)
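For comparison, here is a minimal sketch of how the same dynamic query could be built by hand for plain JDBC, with no framework involved. The class and method names (AddressSearchSql, buildSql, bindValues) are made up for illustration and are not part of the prototype. The sketch generates one placeholder per keyword and upper-cases the values to be bound.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not from the prototype): builds the keyword search
// as a parameterised SQL string suitable for a JDBC PreparedStatement.
final class AddressSearchSql {

    private static final String BASE =
        "select a.* from Address as a, AddressField as af, Address_AddressField as aaf"
        + " where a.ADDRESSID=aaf.ADDRESSID"
        + " and aaf.ADDRESSFIELDID=af.ADDRESSFIELDID";

    // Emits one "upper(af.VALUE) like ?" per keyword, joined with "or".
    static String buildSql(int keywordCount) {
        if (keywordCount <= 0) {
            throw new IllegalArgumentException("at least one keyword is required");
        }
        String clause = String.join(" or ",
            Collections.nCopies(keywordCount, "upper(af.VALUE) like ?"));
        return BASE + " and (" + clause + ")";
    }

    // Values to bind, in order; upper-cased to match upper(af.VALUE).
    static List<String> bindValues(List<String> keywords) {
        List<String> values = new ArrayList<>();
        for (String keyword : keywords) {
            values.add(keyword.toUpperCase(Locale.ROOT));
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("Bombay", "Mumbai");
        System.out.println(buildSql(keywords.size()));
        System.out.println(bindValues(keywords));
    }
}
```

Binding the values as parameters, rather than concatenating them into the string, keeps user input out of the SQL text itself.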

Here, I will show how various ORM frameworks handle this type of dynamic query.

iBATIS SQL Map

iBATIS SQL Map is a mapping framework between SQL statement inputs/outputs and the object model. Therefore, SQL statements are used in SQL Map directly. The SQL Map query is quite straightforward and uses the same SQL statement:

 select a.*
 from Address as a, AddressField as af, Address_AddressField as aaf
 where a.ADDRESSID=aaf.ADDRESSID
 and aaf.ADDRESSFIELDID=af.ADDRESSFIELDID
 <iterate prepend="and" open="(" close=")" conjunction="or">
  upper(VALUE) like #[]#
 </iterate>

iBATIS has an <iterate> tag which says: for every element in the input parameter (a collection, in this case a 'list'), format it into the SQL statement using the tag's attributes - prepend, open, close, conjunction, etc.
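To make the roles of those attributes concrete, here is a small Java sketch that mimics the text an <iterate> element would contribute to the statement. It is an illustration only, not iBATIS code: IterateSketch and expand are made-up names, the attribute values (and, (, ), or) are assumed so as to match the SQL statement shown earlier, and returning an empty string for an empty list is a choice of this sketch rather than documented iBATIS behaviour.

```java
// Illustration only: mimics the SQL text contributed by an <iterate> element.
// This is not part of iBATIS; the names are hypothetical.
final class IterateSketch {

    // prepend     - placed before the whole generated fragment
    // open/close  - wrap the generated fragment
    // conjunction - placed between the per-element copies of the body
    // body        - repeated once per element; #[]# becomes a ? placeholder
    static String expand(String prepend, String open, String close,
                         String conjunction, String body, int size) {
        if (size == 0) {
            return ""; // assumption of this sketch: nothing is emitted
        }
        StringBuilder sql = new StringBuilder();
        sql.append(' ').append(prepend).append(' ').append(open);
        for (int i = 0; i < size; i++) {
            if (i > 0) {
                sql.append(' ').append(conjunction).append(' ');
            }
            sql.append(body.replace("#[]#", "?"));
        }
        return sql.append(close).toString();
    }

    public static void main(String[] args) {
        System.out.println(
            expand("and", "(", ")", "or", "upper(VALUE) like #[]#", 2));
    }
}
```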

The corresponding Java code calling this query:

public class SQLMapAddressDao extends SqlMapDaoTemplate
implements IAddressDao {
...
 private Address[] getByFieldValues(List<AddressField> afs)
 throws DataAccessException {
  try {
   List list=queryForList("Address.getByFieldValues", afs);
   if(list==null || list.size()==0)
    return null;
   else
    return (Address[]) list.toArray(new Address[list.size()]);
  } catch (Exception e) {
   throw new DataAccessException(e);
  }
 }
...
}

iBATIS also supports .NET so I can execute the same query in .NET. The following C# code does the same thing as the Java snippet above.

public Address[] GetByFieldValues(IList afs) {
    IList list = ExecuteQueryForList("Address.getByFieldValues", afs);
    if (list == null || list.Count == 0)
        return null;
    else {
        return (Address[])
            ((ArrayList)list).ToArray(typeof(Address));
    }
}

Hibernate

Every ORM framework has its own object query language (OQL). Hibernate uses HQL. Building the HQL statement programmatically is simply a matter of building a string, as follows:

public Address[] getByFieldValues(AddressField[] afs)
 throws DataAccessException{
  Address[] result=null;
  if(afs==null || afs.length==0)
   return null;
  // this query does not work with mySQL 4.0.x.
  String select="select addr"
   + " from com.laws.Address.Address as addr,"
   + "      com.laws.Address.AddressField as af";
  StringBuffer where=new StringBuffer(" where (upper(af.value) like :value_0");
  for(int i=1; i<afs.length; i++) {
   where.append(" or upper(af.value) like :value_"+i);
  }
  where.append(") and addr in elements(af.addresses)");

  String query=select+where.toString();

  try {
   Session sess = HibernateSessionFactory.currentSession();
   Query q = sess.createQuery(query);
   for(int i=0; i<afs.length; i++) {
    q.setParameter("value_"+i, "%"+afs[i].getValue().toUpperCase()+"%");
   }
   List list = q.list();
   if (list != null && list.size() > 0) {
    result = (Address[]) list.toArray(new Address[list.size()]);
   }

   HibernateSessionFactory.closeSession();
  } catch (HibernateException he) {
   HibernateSessionFactory.handleException(he);
  }
  return result;
 }

JPA

JPA has been included as part of the EJB 3.0 specification to replace the old EJB Container Managed Persistence (CMP) approach. In JPA, the same query is built programmatically in JPQL:

public Address[] getByFieldValues(AddressField[] afs)
   throws DataAccessException {
  if(afs==null || afs.length==0)
   return null;

  String select="select a"
   + " from Address addr,"
   + "      AddressField af, in(af.addresses) a";
  // positional parameters (?1, ?2, ...) so that they match setParameter(int, Object) below
  StringBuffer where=new StringBuffer(" where (upper(af.value) like ?1");
  for(int i=1; i<afs.length; i++) {
   where.append(" or upper(af.value) like ?"+(i+1));
  }
  where.append(") ");
  String queryString=select+where.toString();

  Query query=EntityManagerHelper.createQuery(queryString);
  // the JPA parameter index is 1-based.
  for(int i=0; i<afs.length; i++){
   // upper-case and wrap in wildcards, as in the Hibernate version
   query.setParameter(i+1, "%"+afs[i].getValue().toUpperCase()+"%");
  }
  List list=query.getResultList();
  if(list==null || list.size()==0)
   return null;
  else
   return (Address[]) list.toArray(new Address[list.size()]);
 }

The code is almost identical to Hibernate except for the JPQL vs. HQL syntax differences. This is not surprising considering JPA shares its roots with Hibernate.

ADO.NET Entity Framework

EF 1.0 Beta 3 does not support dynamic query (using LINQ to SQL). To get string-based dynamic queries, I had to download a sample from MSDN, extract the Dynamic.cs file from the zip and place it in my project. That made the System.Linq.Dynamic namespace available to me.

public Address.Domain.Address[] GetByFieldValues(Address.Domain.AddressField[] afs) {
            if (afs == null || afs.Length == 0)
                return null;
            var aQuery= addressContext.AddressField.Where("1=0").Select(a=>a);
            foreach (Address.Domain.AddressField af in afs) {
                if (af.GetValue().Length > 0) {
                    var aQuery2=addressContext.AddressField
                        .Where("value.ToUpper().Contains(@0)", af.GetValue().ToUpper())
                        .Select(a => a);
                    aQuery=aQuery.Union(aQuery2);
                }
            }
    
            Dictionary<EntityKey, AddressModel.Address> dict=new Dictionary<EntityKey,AddressModel.Address>();
            foreach (AddressModel.AddressField afEntity in aQuery) {
                afEntity.Address.Load();
                afEntity.AddressFieldTypeReference.Load();
                foreach (AddressModel.Address aEntity in afEntity.Address) {
                    if(!dict.ContainsKey(aEntity.EntityKey))
                        dict.Add(aEntity.EntityKey, aEntity);
                }
            }
            return AddressEFHelper.AddressArrayEntityToDomain(dict.Values.ToArray());
        }

This implementation is very clumsy compared to the previous frameworks:

  1. I could not build a single query string. I think it is due to the escaped double-quotes (\") in the string - somehow, it gives me an error at runtime.
  2. I could not get the Address entities from the first iteration (which gives me AddressFields); I had to get the Addresses from the selected AddressFields in a second iteration (the nested foreach loops).

Personally, I like iBATIS best because it gives me the flexibility of writing my own SQL statements. The SQL statements can be reviewed by the DBA and provide input for database optimisation (as we all know, the database is usually the weakest link in terms of overall system performance).

Friday 20 June 2008

Applemania

As I walked past the Telstra TLife shop in Sydney CBD today, it was even more deserted than usual. Then the bright lights and 3 storeys of ceiling-to-floor glass walls of the building right across the street grabbed my attention, and it was fully packed! I had walked past the building many times before but never noticed it until then. Suddenly I realised that it was the new Apple Store on George Street, Sydney, which had opened at 5pm yesterday. They had renovated the building and now it stands out among its neighbouring 100-year-old sandstone buildings.

I am not an Apple user (that's right, not even an iPod) but I am a visual person - I am drawn to anything that looks good. Apple products certainly fit the bill. Usually the first thing I notice about a Mac is its high quality screen. At Uni, I used to have arguments with my die-hard Apple devotee friend about Macs being over-priced, under-featured toys compared to PCs. But we always agreed that Macs had good monitors.

On the 2nd floor of the Apple Store, I briefly attended a workshop on Apple iWeb. As I was late, I did not catch the whole story, but in the concluding note the presenter said that the product (iWeb v2.x) was designed for small/family/fun use rather than large-scale business eCommerce sites. I guess that is true of all Apple products, as Apple really projects an image of making technology fun.

Thursday 19 June 2008

WPF: StackPanel vs. Grid

As a newcomer to the Microsoft Windows Presentation Foundation (WPF), I was perplexed by my ListView. I put a ListView inside a StackPanel, but the ListView did not attempt to occupy the full space of the StackPanel even though its Width and Height were set to "Auto" - it simply grew to fit its contents. When I reduced the size of the window, the ListView did not resize with it either - it just got clipped. The same thing happens if I put a ScrollViewer into the StackPanel.

Then I found out from some forum (I cannot remember the link now) that when a widget (OK, control, in Microsoft speak) is put in a StackPanel, it assumes unlimited canvas space and therefore does not try to resize. To achieve the effect that I wanted (imagine the ListView of Windows File Explorer, which occupies the whole space of the window and resizes with the window), I had to replace the StackPanel with a Grid. This way, the ListView auto-resizes with its container and shows the scroll bars accordingly. See screenshot and XAML code snippet below (and yes, I have been using code-behind to populate the ListView's GridView column headers programmatically, reading from the database).

    
        
        
    
    
    
            
                
                    
                
            
    

Although my problem is solved, I cannot help thinking that the behaviour of StackPanel is rather inconsistent - e.g. the GroupBox stretches and resizes with the StackPanel (at least it can anchor itself to the left and right sides), but ListView (or ScrollViewer) cannot.

Tuesday 17 June 2008

Data Logic and Business Logic

The 3-tier or n-tier design has been widely adopted for very good reasons: separation of concerns; divide and conquer...

One doctrine of the 3-tier design is to treat the data tier as data storage and not to implement any business logic inside the database. However, this does not mean that the database should not have any logic. On the contrary, any data-specific logic should be implemented in the database, not in the business tier.

I recently read an interesting post on using Hibernate Interceptor classes to access the table in the desired schema. The end result of this implementation is to hide the fact that the application is dealing with tables in multiple database schemas so that SQL queries don't have to be littered with 'schemaName.tableName' everywhere. Of course, a simpler solution to this problem is to create a public Synonym in the database.

In my opinion, this 'table routing' logic should be implemented in the database server as it is database design/deployment specific. On the other hand, Hibernate is an ORM framework and logically belongs to the data tier in the 3-tier architecture, so it is not wrong to do it in there, especially if the database server does not support synonyms.

Another real-life example of data logic is in high-volume tables - e.g. in Telco billing applications, there are millions of call event records a day, so it is common practice to partition the call event table. In the earlier days, when database servers did not support table partitioning (e.g. SQL Server 2000), DBAs segmented tables manually (e.g. by creating multiple tables: CallEvent_20080201, CallEvent_20080202, etc.). Obviously the application should not implement the logic of which table to use when executing a query (e.g. get me all call records for today), but should let the database handle it - a common practice is to create a database view to consolidate/union the manually partitioned tables. An advantage of implementing it in the database is that when you upgrade or migrate to another database server which supports table partitioning (e.g. SQL Server 2005+, Oracle 8+), your code need not change (OK, if you are using ORM, the mapping file may need some modification).
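As a rough sketch of the consolidating view mentioned above, the following Java snippet generates the DDL a DBA might run over the manually segmented tables. The view name CallEvent, the class name CallEventViewSketch and the use of "select *" with "union all" are assumptions made for illustration; a real view would list its columns explicitly and depends on the database server in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical DBA-side helper: builds a view that unions the manually
// partitioned tables, so the application only ever queries the view.
final class CallEventViewSketch {

    static String createViewSql(String viewName, List<String> partitionTables) {
        if (partitionTables.isEmpty()) {
            throw new IllegalArgumentException("at least one table is required");
        }
        List<String> selects = new ArrayList<>();
        for (String table : partitionTables) {
            selects.add("select * from " + table);
        }
        return "create view " + viewName + " as\n"
            + String.join("\nunion all\n", selects);
    }

    public static void main(String[] args) {
        System.out.println(createViewSql("CallEvent",
            Arrays.asList("CallEvent_20080201", "CallEvent_20080202")));
    }
}
```

The application then queries the view only; which physical table holds a given day's records stays a database concern.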

To make a better design decision, the system designer/architect and the data designer/architect must jointly design the system in a collaborative manner.

Sunday 15 June 2008

GWT - RIA of Choice ?

I have been evaluating RIA frameworks (incl. Echo 2, GWT, JavaFX and Silverlight) for building business applications for the past year. So far GWT has come out as the clear winner.

GWT is especially appealing to Java developers - you develop your application in Java: server side, client side GUI, the lot. The GUI development is much like Swing, except that you are dealing with GWT APIs, but the programming experience is pretty much the same. AJAX is also supported by simply writing Java callback code without touching any XML or Javascript. This Java code is then translated into high-quality, high-performance Javascript by GWT; once deployed, the JS code is downloaded to the browser and executed there - freeing the server side from handling presentation responsibilities. (Echo 3 also promises to take a similar approach.) When I first learnt and programmed in GWT, it was such a refreshing experience - it was a joy to use.
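The callback style looks roughly like the sketch below. To keep it compilable without the GWT SDK, it declares a local AsyncCallback interface with the same two methods as GWT's com.google.gwt.user.client.rpc.AsyncCallback; the service interface AddressSearchAsync, its countMatches method and the fake implementation are all hypothetical stand-ins. In a real GWT application the generated RPC proxy would make the call to the server.

```java
// Local stand-in with the same shape as GWT's AsyncCallback, declared here
// only so that the sketch compiles without the GWT SDK.
interface AsyncCallback<T> {
    void onFailure(Throwable caught);
    void onSuccess(T result);
}

// Hypothetical asynchronous service interface.
interface AddressSearchAsync {
    void countMatches(String keyword, AsyncCallback<Integer> callback);
}

final class GwtCallbackSketch {

    // Fake service that answers immediately; a real GWT RPC proxy would
    // invoke the callback later, when the server response arrives.
    static final AddressSearchAsync FAKE_SERVICE = (keyword, callback) -> {
        if (keyword == null || keyword.isEmpty()) {
            callback.onFailure(new IllegalArgumentException("keyword required"));
        } else {
            callback.onSuccess(keyword.length());
        }
    };

    // Client-side code: all the "AJAX" logic is an ordinary Java callback.
    static String describe(String keyword) {
        final StringBuilder label = new StringBuilder();
        FAKE_SERVICE.countMatches(keyword, new AsyncCallback<Integer>() {
            @Override
            public void onFailure(Throwable caught) {
                label.append("error: ").append(caught.getMessage());
            }

            @Override
            public void onSuccess(Integer result) {
                label.append("matches: ").append(result);
            }
        });
        return label.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("Mumbai"));
        System.out.println(describe(""));
    }
}
```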

The GWT team focuses on the foundation of the framework and does not bother with making state-of-the-art cool widgets. That responsibility has been given to third-parties. Sanjiv Jivan has created the wonderful GWT-Ext project which takes on this responsibility. Some people claimed that between GWT and Flex, they chose Flex because of its better look and feel. I guess they did not include GWT-Ext in their picture.

GWT-Ext provides a set of cool and useful widgets with complete documentation, demo and sample applications. It is a tremendous amount of work for an individual to take on. The result is simply superb.

One major issue with GWT-Ext is that it is a GWT wrapper around the Ext-JS library, which used to be licensed under the LGPL and is now GPL (since version 2.1), with a commercial license also available. Apparently this change of licensing policy has created quite a stir in the community. The effect of this change is also evident in the GWT-Ext distribution - in earlier versions (e.g. v0.9.2) Ext-JS was bundled with the GWT-Ext download, but in newer ones (e.g. v2.0.3) you have to download and install Ext-JS yourself.

Furthermore, the maker of Ext-JS, Ext, has also created a similar product called Ext-GWT (not to be confused with GWT-Ext), which competes directly with GWT-Ext. The technical difference between GWT-Ext and Ext-GWT can be found here.

As a commercial application developer, if you want to stick with GWT-Ext, then there are the following choices:

  1. use older version of Ext-JS with new version of GWT-Ext for free
  2. pay for latest version of Ext-JS but use GWT-Ext for free
  3. hope/wait for someone to fork Ext-JS and maintain a LGPL branch

I can't wait to work with GWT-Ext 2.0.3 which has full Drag-and-Drop (DnD) support in its widgets. Note that by DnD I don't mean just repositioning a widget on screen (which many RIA frameworks can do), but the kind of DnD capability that you would expect in Swing or WinForms, e.g. to drag a row in a grid and drop it into a text area so that the text area can be populated with the data from the corresponding business object being DnD'ed.

I hope Sanjiv stays true to the open source ideology, continues with the good work of GWT-Ext and doesn't change the licensing policy.

Wednesday 11 June 2008

Highlight.js vs. dp.SyntaxHighlighter

I need to include source code snippets in my posts from time to time. As a new user of BlogSpot, I want to find a good code formatter/beautifier which can work in BlogSpot for free.

I came across Highlight.js and SyntaxHighlighter.

I first tried Highlight.js. It tries to guess what language the code is in and then applies the highlighting accordingly. Sometimes the guess is wrong and you are left with some weird highlights. It sort of works, but the result is not so beautiful, considering it is a code beautifier. A Highlight.js output screenshot is shown below.

I then tried Google's SyntaxHighlighter. It does not try to guess the code language. Instead, the user specifies the language using the class attribute of the <pre> tag. It produces far superior results, with line numbers and alternating line background colours. A sample output is shown below.

public int CountAddressReferences(Address.Domain.AddressField af)  {
    int id=(int) af.GetAddressFieldId();
    AddressModel.AddressField entity = addressContext.AddressField
        .Where(t => t.addressFieldId == id).First();
    entity.Address.Load();
    Console.WriteLine("before count=" + entity.Address.Count);
    addressContext.Detach(entity); // extra line
    addressContext.Attach(entity); // extra line
    entity.Address.Load(); // extra line
    Console.WriteLine("after count=" + entity.Address.Count);
    return entity.Address.Count;
}

The choice is clear: I am sticking with Google.

Tuesday 10 June 2008

Entity Framework Bug

In a previous post I blogged about Entity Framework (EF, v1.0 Beta 3) flaws, one of which was about the inconsistency between the entity context and the actual data in the database tables. Here I demonstrate it.

I have two tables: AddressField and Address with a many-to-many relationship as illustrated below.

In the database, the many-to-many relationship is represented as the address_addressfield table. EF correctly generated the entity model, which is shown below. In the entity model, Address has a collection of AddressField and vice versa. To avoid circular loops, lazy loading is used. So far, so good.

Suppose I have 3 addresses a1, a2 and a3, all of which are located in the same country, i.e. all 3 contain the same AddressField. Conversely, the AddressField representing the country should have 3 Addresses in its collection.

If I delete any Address, the corresponding database records in the Address and Address_AddressField tables should get removed. I would expect the same thing to happen in the entity context - i.e. the deleted address should be removed from the AddressField's Address collection.

The following code snippet from my unit test illustrates this:

addressDao.Delete(a1);
Assert.AreEqual(2, addressDao.GetAll().Length);
addressDao.Delete(a2);
addressDao.Delete(a3);
Assert.IsNull(addressDao.GetAll());
Assert.AreEqual(0, 
  afDao.CountAddressReferences(afCountry)); // test fails here

The CountAddressReferences() method implementation is shown below:

public int CountAddressReferences(Address.Domain.AddressField af) {
    int id=(int) af.GetAddressFieldId();
    AddressModel.AddressField entity = addressContext.AddressField
        .Where(t => t.addressFieldId == id).First();
    entity.Address.Load();
    return entity.Address.Count;
}

When I inspected the database tables, everything was OK - the Address table was empty and so was the Address_AddressField table. So the problem is in the entity context - it is not refreshed/updated following the association between Address and AddressField. To get around this problem, I had to detach and reattach the AddressField as shown below (lines 7 to 9, marked 'extra line'):

public int CountAddressReferences(Address.Domain.AddressField af) {
    int id=(int) af.GetAddressFieldId();
    AddressModel.AddressField entity = addressContext.AddressField
        .Where(t => t.addressFieldId == id).First();
    entity.Address.Load();
    Console.WriteLine("before count=" + entity.Address.Count);
    addressContext.Detach(entity);  // extra line
    addressContext.Attach(entity);  // extra line
    entity.Address.Load();          // extra line
    Console.WriteLine("after count=" + entity.Address.Count);
    return entity.Address.Count;
}

The console output was:

before count=3
after count=0

We Are What We Build

I have recently read a book called Lessons in Grid Computing: The System Is a Mirror. The central theme of the book is that the systems we build mirror us - what we like, how we do things, etc. I am sure most of us have experienced this throughout our lives. For example, by looking at the car (including the interior decorations) you can sort of guess the owner's personality and interests - You Are What You Drive/Eat/Build...

An important corollary of the mirror theory is that if you want to develop and improve your system effectively, you must structure your organisation to mirror the system that you are building. For example, if your software system consists of closely collaborating modules, then the teams managing those modules must also communicate closely, or perhaps the modules should be built by the same team. On the other hand, if the system consists of autonomous modules that can live independently and communicate through a broker, then the organisation structure should also reflect these characteristics - the team responsible for the broker must keep all the other teams informed about design decisions and collect feedback from them.

It is sad but true that many ISVs do not orient themselves to reflect what they build (hence the market they are serving). I have witnessed companies (large and small) that do not develop their products in a collaborative manner when it's needed. Instead, strategic decisions about the product architecture were made by individual teams without any cross-team reviews, and even worse, such decisions were made by one individual in some companies.

The system is a mirror, and to correct inadequacies we see in the reflection, we should not simply cover the mirror, or replace it with another.

Sunday 8 June 2008

Entity Framework 1.0 Beta 3

The much anticipated (by me at least) Microsoft ADO.NET Entity Framework has turned out to be a big disappointment.

Entity Framework (EF) is an improvement on the old ADO.NET and is supposed to be Microsoft's answer to third-party Object-Relational Mapping (ORM) products, such as Hibernate. However, it totally missed the plot of why people want to have ORM and how these tools are used.

The purpose of ORM is to shield the business logic from the database details, so that the developer does not have to worry about the implementation details of how the data are stored on disk, how the foreign key relationships are traversed, etc. Developers only need to deal with the domain object model, and the domain objects should be modelled as Plain Old .Net Objects (PONO, borrowing from the POJO acronym). This is evident in the tried and true ORM frameworks that originated in the Java world: Hibernate, JPA, etc. (and to a certain extent iBATIS SQL Map).

The EF way of doing this is to have an Entity Model which can be generated from the database schema. The Entity Model classes inherit from System.Data.Objects.DataClasses.EntityObject, which means that these entity objects are not PONO. So if you want to have a loosely coupled system (between data tier and business tier), then you will need to have a mapping layer between your PONO and these entity objects. This is unnecessary additional work which should have been taken care of by the ORM framework itself.
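The mapping layer I am talking about looks roughly like the sketch below. It is written in Java for consistency with the other examples on this blog; the same shape applies in C#. All the names (TrackedEntity, AddressEntity, Address, AddressMapper) and fields are made up for illustration: TrackedEntity merely stands in for a framework base class such as EntityObject, and is not an EF type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for a framework base class that carries life-cycle state.
class TrackedEntity {
    boolean attached = true;
}

// Stand-in for a generated entity class tied to the framework.
final class AddressEntity extends TrackedEntity {
    final int addressId;
    final String label;

    AddressEntity(int addressId, String label) {
        this.addressId = addressId;
        this.label = label;
    }
}

// Plain domain object: no framework base class, no life-cycle state.
final class Address {
    private final int id;
    private final String label;

    Address(int id, String label) {
        this.id = id;
        this.label = label;
    }

    int getId() { return id; }
    String getLabel() { return label; }
}

// The extra mapping layer: copies data out of entities into plain objects
// so that only the data tier ever sees the framework types.
final class AddressMapper {

    static Address toDomain(AddressEntity entity) {
        return new Address(entity.addressId, entity.label);
    }

    static List<Address> toDomain(List<AddressEntity> entities) {
        List<Address> result = new ArrayList<>();
        for (AddressEntity entity : entities) {
            result.add(toDomain(entity));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Address> addresses = toDomain(Arrays.asList(
            new AddressEntity(1, "Sydney"), new AddressEntity(2, "Mumbai")));
        System.out.println(addresses.get(1).getLabel());
    }
}
```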

Another problem with dealing directly with these entity objects is that the developer will have to be aware of the fact that there is a whole new caching layer, called the entities context, which caches the database records in memory in the form of these entity objects. Very often, the developer will have to worry about the synchronisation between the entities context and the actual database (by calling the Attach(), Detach() and Load() methods explicitly on the entity model objects), especially when many-to-many relationship associations need to be followed on Delete or Update operations. See my other post for an example of this. Microsoft has also been evangelising the practice of putting these entity objects directly into GUI widgets as their data source. This again creates tight coupling between the data tier and the presentation tier. All these extra data and methods about the entity object life-cycle should be visible only in the data tier (or DAL, as Microsoft calls it) and nowhere else. That is why PONO should be used throughout the business and presentation tiers, rather than these bloated entity objects. It is disappointing to see that after witnessing all these great ORM examples in the Java world, Microsoft still could not make a decent ORM framework.