Don’t Trust Fake IDs

After so many years I thought that this was a dead topic but recently I saw at least two reasonably experienced developers having trouble with object identity and thought that it would be good to write about it.

Most languages define some kind of equality operator on objects, in Java, for example, that is the equals method. In Effective Java, Joshua Bloch says that the easiest thing to do is not to implement an equals method at all, but you should implement one when “the notion of logical equality differs from mere object identity” and your superclass doesn’t have already implemented a suitable equality comparisson.

I think most people understand this when it comes to numbers, dates, strings and other Value Objects but I’ve seemed people having trouble understanding what object identity means for Entities.

Entities are objects whose identity is defined by rules in the domain we are modelling. In his Domain Driven Design book, Eric Evans defines objects following this pattern as objects that change state over time but keep their identity. As the object’s identity comes from the business domain reference equality is too fragile to properly compare objects.

Problems with fragile object identity are fairly well-documented in the Object-Orientation literature. In 1986, Setrag N. Khoshafian and George P. Copeland wrote a paper titled Object Identity. In this document they present a list of known problems with fragile identity, some of these are:

  • Multiple references to the object: It is very common to have multiple instances of a given object at a given time and it should be possible to infer that those multiple instances are actually the same object.
  • Uniqueness Not Respected: Objects are abstractions and as such they don’t model every single detail present in the modelled concept. If we don’t define identity correctly we may end up with a model in which different objects are seem as equal.
  • Data and Identity Confusion: The identity of an object should not be allowed to change but, as Evans says, data will change over time. There should be a clear separation of what is data and what is identity.

People use fragile things for object identity for multiple reasons, let’s address two of those.

I refuse to write test-only code!

One of the problems pointed out by those using reference equality instead of object equality is that equals() is commonly not called by anything but tests and writing production code that is used only in tests is a bad smell.

Let’s look at a very simple class and its tests:

class Point {
	private int y, x;

	public Point(int x, int y) {
		this.y = y;
		this.x = x;
	}

	public int getY() {
		return y;
	}

	public int getX() {
		return x;
	}
}

class Square {
	private Point letfTop, leftBottom, rightTop, rightBottom;

	public Square(Point a, Point b, Point c, Point d) {
		this.letfTop = a;
		this.leftBottom = b;
		this.rightTop = c;
		this.rightBottom = d;
	}

	public Point getRightTop() {
		return rightTop;
	}

	public Point getRightBottom() {
		return rightBottom;
	}

	public Point getLeftTop() {
		return letfTop;
	}

	public Point getLeftBottom() {
		return leftBottom;
	}
}

public class SquareTest {
	@Test
	public void ShouldHavePointsInTheRightPlace() {
		Point first = new Point(1, 1);
		Point second = new Point(2, 2);
		Point third = new Point(3, 3);
		Point fourth = new Point(4, 4);
		Square square = new Square(first, second, third, fourth);

		// how do we assert that the right points are in place?
	}
}

Althought the value of the test above is questionable, it exposes the problem: How do we make sure that the points are in the right place? As we said before we could implement an equals() method in the Point class in order to be able to compare X and Y, and then do something like:

assertEquals(first, square.getLeftTop());
assertEquals(second, square.getLeftBottom());
assertEquals(third, square.getRightTop());
assertEquals(fourth, square.getRightBottom());

That looks OK but we just wrote a piece of production code (Point.equals()) that are used only by tests. I personally don’t think of this as a big deal -for me equals, hashcode and toString are part of the language protocol- but I can understand why people don’t like to do that.

So maybe we can just use the == operator as we wouldn’t have to write any extra code for that. Using that operator our assertions wold become something like:

		assertTrue(first == square.getLeftTop());
		assertTrue(second == square.getLeftBottom());
		assertTrue(third == square.getRightTop());
		assertTrue(fourth == square.getRightBottom());

That would work but the problem is that your tests are relying on the wrong thing. Think about that for a second: in the test above do you really care that the same instance is in the top left corner? Suppose we have new requirements and for some reason we can’t store the data as Points. The class contract didn’t change at all, the fact that we can’t use points internally is just an implementation details inside Square:

class Square {

	class SimplePoint {
		public int x, y;

		public SimplePoint(Point p) {
			this.x = p.getX();
			this.y = p.getY();
		}

		public Point toPoint() {
			return new Point(x, y);
		}
	}

	SimplePoint letfTop, leftBottom, rightTop, rightBottom;

	public Square(Point a, Point b, Point c, Point d) {
		this.letfTop = new SimplePoint(a);
		this.leftBottom = new SimplePoint(b);
		this.rightTop = new SimplePoint(c);
		this.rightBottom = new SimplePoint(d);
	}

	public Point getRightTop() {
		return rightTop.toPoint();
	}

	public Point getRightBottom() {
		return rightBottom.toPoint();
	}

	public Point getLeftTop() {
		return letfTop.toPoint();
	}

	public Point getLeftBottom() {
		return leftBottom.toPoint();
	}
}

Now our assertions based on the == operator fail and if you are testing the behavior of the class –as opposed to its implementation- this shouldn’t happen.

Tests should follow the Single Responsibility Principle and have only one reason to change. In that case the only reason for that test to change should be if the points are not in the places we expect them to be. It doesn’t matter if they are stored internally as plain Points, if they are stored as integers over a matrix or even serialized to disk: the test should not fail if the storage strategy has to be modified.

This is the kind of assertion that creates extremely fragile test suites. A fragile test suite will require constant maintenance and will break with no clear reason. Tests start to cause more problems than help development and it is very common for the team to think about either testing less or dropping tests entirely.

As we saw before, the domain defines what makes on object equivalent to another. In the case above the Point class, it is a Value Object and what that means is that identity for that object is given by the value of its attributes, in this specific case that would be the integers Y and X.

If you don’t want to add an equals() method to your class at least make sure you rely on the real identity of the object and not in some accident:

public class PointAssertions {
	public static void assertPointsEquals(Point one, Point another) {
		assertEquals("Expected [" + another + "] to have X equals to ["+ one.getX() + "]", one.getX(), another.getX());
		assertEquals("Expected [" + another + "] to have Y equals to ["+ one.getY() + "]", one.getY(), another.getY());
	}
}

And use that in your tests:

assertPointsEquals(first, square.getLeftTop());
assertPointsEquals(second, square.getLeftBottom());
assertPointsEquals(third, square.getRightTop());
assertPointsEquals(fourth, square.getRightBottom());

So although you had to write some more code you didn’t change production code just to make your tests pass. Also you are probably going to reuse this type of code in other tests that deal with points.

It’s a kind of magic!

Another cause for not correctly defining object identity is when people are using things like Hibernate or NHibernate that, in some cases, return the same instance over multiple calls. One quick example would be:

public class User
    {
        public string Login { get; set; }
        public string Name { get; set; }
        //business logic should be here somewhere
    }
    public class Group
    {
        public User Admin { get; set; }
        public bool CanBeChangedBy(User u)
        {
            return u == Admin;
        }
    }
    public class UserRepository : HibernateRepository
    {
        public User FindByLogin(string login)
        {
            return session.Load<User> (login);
        }
    }

    public class GroupRepository : HibernateRepository
    {
        public Group FindById(string groupId)
        {
            return session.Load<Group> (groupId);
        }
    }

    public class GroupsController
    {

//Some other stuff

        public void Change(string groupId, string login)
        {
            var userTryingToChange = userRepository.FindByLogin(login);
            var groupTobeChanged = groupRepository.FindById(groupId);
            if (groupTobeChanged.CanBeChangedBy(userTryingToChange))
            {
                //do something that lets the guy change the group
            }
            else
            {
                // show error message or something like that
            }
        }
    }

In this case NHibernate happens to return the same object for every call. This code is extremely brittle, though. If you look at NHibernate’s documentation (or Hibernate’s) you will see that multiple queries using the same Session object are guaranteed to return the same instance. The problem is that if you use more than one Session your objects will be different.

That may sound like a small problem given that most web applications (and most applications are web applications) use only one Session per request. But the real problem hides in the fact that now your Business Layer depends on an implementation detail of the Persistence Layer.

Changes in the implementation of the Persistence Layer will directly affect your Business Layer and this is the kind of accidental complexity that creates buggy and unreliable software.

And it gets worse. The original Hibernate’s documentation adds some more information in a section named “Considering object identity”:

11.1.3. Considering object identity
An application can concurrently access the same persistent state in two different Sessions. However, an instance of a persistent class is never shared between two Session instances. It is for this reason that there are two different notions of identity:

Database Identity
foo.getId().equals( bar.getId() )

JVM Identity
foo==bar

For objects attached to a particular Session (i.e., in the scope of a Session), the two notions are equivalent and JVM identity for database identity is guaranteed by Hibernate. […]
However, an application that uses == outside of a Session might produce unexpected results. This might occur even in some unexpected places. For example, if you put two detached instances into the same Set, both might have the same database identity (i.e., they represent the same row). JVM identity, however, is by definition not guaranteed for instances in a detached state. The developer has to override the equals() and hashCode() methods in persistent classes and implement their own notion of object equality.

Now what that means is that I cannot detach my object from Hibernate to, for example, store it in a HTTP session. Even if you don’t use HTTP sessions, that means you can’t cache objects as the cached copy will be a different instance than those returned by a Session.

Suddenly your architecture is extremely constrained just because someone decided that implementing a proper object comparisson method was too much effort.

Before anyone writes a comment with YAGNI in it I will say once more that the simplest thing is different from the hackiest thing. Relying on how your Persistence Layer happens to be implemented is not being simple, it is coupling your components in a way that even small changes requrie hard work.

No half-baked IDs

In the first example we saw a situation where implementing proper object equality wasn’t necessary for the production code. In that case you just have to make sure that your tests are using something reliable when comparing objects.

In the second case we have a need for object identity and using weak identity may save some lines of code but ultimately creates a crittle and inflexible architecture.

Those are just a couple of examples of problems you may find. The most important thing I can say is that every time you need to compare objects you should make sure to use strong identity. You can write the comparisson operations yourself or use a library or anything, just make sure you follow the contract that your language imposes on objects.

Note: I didn’t say or implement anything regarding Hashcode methods here for the sake of simplicity. Anyway I suggest that you follow the recommendations for your language.

5 Responses to “Don’t Trust Fake IDs”


  1. 1 Martin Oct 13th, 2009 at 9:21 am

    [quote] But the real problem hides in the fact that now your Business Layer depends on an implementation detail of the Persistence Layer.[/quote]

    Is this really an issue?

    People rely on the guarantees made by underlying infrastructure all the time. A hash map guarantees the objects with different codes will not collide and overwrite. We use that, we’re happy, we move on.

    At some point you need to be pragmatic about which assumptions you accept and which you account for. If you start accounting for everything, you end up in a world of pain and complexity that’s worse than if you just lived with the assumption…

    I appreciate the point you’re trying to make, but don’t necessarily accept the scenario you’re using to make it… if that makes sense?

  2. 2 Marcos Eliziario Oct 13th, 2009 at 2:36 pm

    Martin,

    The less facts we hold as axiomatic, the better we are on anticipating
    changes, and building useful models.

    I don’t think the analogy you’ve made is really relevant to this point. After all a Hash Map is a well defined entity. Not allowing duplicate keys is an essential, defining property of this object, as elemental as knowing that there’s no such a thing as the smallest real number > 0.

    In other side, Shoes was refering to a behaviour that could be hardly defined as essential, it’s an implementation defined behaviour. Having a dependency on this fact is going to make your code fragile as the author has clearly pointed out.

  3. 3 Marcelo Schmidt Oct 15th, 2009 at 11:36 pm

    Martin,
    I guess Shoes is not referring to persistence layer actually, but referring to persistence *policy* instead. And I must say that I agree with Shoes. Imagine you’re using “==” operators to compare objects retrieved from database, when your persistence policy (or technology) changes, the assumption that 2 object with equals db ids can be compared with “==” operator might not be applicable anymore, right? What he’s saying is that your identity strategy must be independent of ANY other layer or policy.

    Shoes,
    great article. I like to read your posts ’cause you supply a lot of references to my own learning and thinking.

    Best regards,

  4. 4 Daniel Yankowsky Nov 5th, 2009 at 3:51 pm

    Martin,

    You raise a good discussion point. However, I agree with the other commenters. Bob Martin, in his Dependency Inversion Principle, says:

    A. High level modules should not depend upon low level modules. Both should depend upon abstractions.
    B. Abstractions should not depend upon details. Details should depend upon abstractions.

    The scenario described by Shoes is a violation of B. (Much discussion of the OCP deals specifically with abstract interfaces and implementations, but I think it is appropriate to whole systems as well). It is desirable to write your business logic with as little knowledge of the underlying details as possible. I’m not saying that anybody can perfectly do this, only that it is the aim we should all have.

    Often, the constraints of a system are clearly expressed. For example, you know that a hash map has unique keys, because the documentation says so. But often, the unknowns are not expressed. When you insert a new item, will hashCode be called on all existing keys? Will it be called on only some keys? How may times will it be called per key? Do you care? These details are unspecified.

    Certainly, we depend on the fact that HashMap uses unique keys. We probably wanted that behavior. We depend on the fact that Hibernate performs database access on our behalf. Again, we probably wanted that behavior. Should we depend on the fact that Hibernate’s concept of identity depends on the current Hibernate Session? This is not behavior that I asked for, so it’s better to pretend that was not guaranteed. If we assume nothing about the returned identity, then our code will be better able to adapt to future change (which is Shoes’ point).

    I’m not trying to pick on you. Again, I think you raise a good point. It is a trap to over analyze and perfect the abstract architecture of a system. But the opposite, to put off as much architecture as we possibly can, just makes tomorrow’s work more difficult.

  5. 5 Daniel Yankowsky Nov 6th, 2009 at 10:08 am

    And in my comment, when I said “OCP”, I meant “DIP”. Don’t ask what I was thinking.

Leave a Reply








Creative Commons License

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.