Archive for the 'Uncategorized' Category

Why use inheritance?

DISCLAIMER: if you are actually hoping that this post will help you answer this question, please stop. Go to some authoritative source to find the official answer to this question. I don’t quite agree with it, but who am I?

The other day I was preparing to interview an interesting candidate. This candidate had been through other interviews before, so I was reading through the notes from them when one note shocked me. Before I quote it, I should point out that the two protagonists of this exchange were “experienced” Java programmers (2+ years of experience):

Interviewer: Why use inheritance?
Candidate: mumble… mumble… Because it reduces repetition between classes… mumble… mumble
Interviewer: Good! Let’s move on…

And he wasn’t kidding about the “good”. This is not an exact transcript of the interview, or even an exact transcript of the approximate transcript (the interviewer’s notes). But it really worried me that in this day and age there are still people with more than a few years of programming experience who think that class inheritance is just a mechanism to reduce code repetition. Especially when we are talking about a language that does not support multiple inheritance (I’m not claiming that this view is right when you have multiple inheritance, but it’s even more wrong when you don’t).

It’s one of my pet peeves when people create very deep class hierarchies just so they can hang their “utility” methods throughout their classes without explicit references to a utility class and without static imports (which I think are sometimes extremely confusing). It annoys me because I have seen people shoot themselves in the foot so many times: suddenly they need real inheritance from a class they don’t control, and now they are stuck because Java does not allow multiple inheritance.

I tend to be very strict about when I use inheritance. If I have any doubt that I can say that A is-a B, I just end up using containment instead of inheritance, even though inheritance would save a few characters here and there and even when it’s very unlikely that A will diverge enough to require another parent. Yes, it’s more verbose, but it will keep your code sane in the future. And navigating through contained classes is generally easier than navigating inherited methods: if you don’t have CTRL-click in the IDE to help you figure out where the implementation of the method you are calling lives, it takes much more time to find it. Not only is it time-consuming, but you could also get it wrong by missing an override somewhere in the class hierarchy.
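To make the trade-off concrete, here is a minimal Java sketch of containment used in place of a “utility” base class. All class names (`LegacyDocument`, `TextUtils`, `Report`) are hypothetical; `LegacyDocument` simply stands in for a class you don’t control and later need to extend.

```java
// Hypothetical names throughout; a sketch of containment, not a prescribed design.
class ContainmentDemo {
    public static void main(String[] args) {
        Report report = new Report("  quarterly   numbers ", new TextUtils());
        System.out.println(report.heading());
    }
}

// Stand-in for a class we do not control but must inherit from.
class LegacyDocument {
    String render(String body) {
        return "<doc>" + body + "</doc>";
    }
}

// Utility code lives in its own class instead of in a shared base class.
class TextUtils {
    String normalize(String raw) {
        // Trim the ends and collapse runs of whitespace into single spaces.
        return raw.trim().replaceAll("\\s+", " ");
    }
}

// Report is-a LegacyDocument (real inheritance) and has-a TextUtils (containment).
// Had Report extended a utility base class, its single parent slot would be taken.
class Report extends LegacyDocument {
    private final String title;
    private final TextUtils text;

    Report(String title, TextUtils text) {
        this.title = title;
        this.text = text;
    }

    String heading() {
        // The explicit reference shows exactly where normalize() is implemented.
        return render(text.normalize(title));
    }
}
```

The extra `text.` prefix is the verbosity cost; in exchange, the one `extends` slot stays free for the relationship that really is an is-a.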

So, why use inheritance? Polymorphism is one of the answers, although not all OO languages support it. The other answer is to improve understandability of your classes. If two classes have part of their state stored in a common parent, it should mean that that part of the state is semantically compatible (and I can write a long critique about the fact that sometimes this is not true and it drives me crazy). Containment does not generally provide this same compatibility, because things can be contained in different contexts. Thus it becomes harder for people that are analyzing the code to really tell how to interact with a set of classes.
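As a small illustration of both answers, here is a Java sketch where inheritance is used for a genuine is-a relationship. The names (`Shape`, `Circle`, `Square`) are made up for the example; the point is only that the call site depends on the parent type, and that the state kept in the parent means the same thing in every subclass.

```java
import java.util.List;

class PolymorphismDemo {
    public static void main(String[] args) {
        List<Shape> shapes = List.of(new Circle(1.0), new Square(2.0));
        for (Shape s : shapes) {
            // The caller only knows about Shape; the subclass decides how area() works.
            System.out.println(s.label() + " area=" + s.area());
        }
    }
}

abstract class Shape {
    // State held in the parent has the same meaning for every subclass.
    private final String label;

    Shape(String label) {
        this.label = label;
    }

    String label() {
        return label;
    }

    abstract double area();
}

class Circle extends Shape {
    private final double radius;

    Circle(double radius) {
        super("circle");
        this.radius = radius;
    }

    @Override
    double area() {
        return Math.PI * radius * radius;
    }
}

class Square extends Shape {
    private final double side;

    Square(double side) {
        super("square");
        this.side = side;
    }

    @Override
    double area() {
        return side * side;
    }
}
```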

Challenged by real-world ontologies – recipes

One of my apparently never-ending projects that I’ve spent a lot of time thinking about lately is how to build a system to represent recipes (as in cooking recipes). At first it seemed like a problem that wasn’t that hard to solve. No deep hierarchies, no complex multiple parents and constraints. However, this was an illusion. As I started with the modeling, using OWL and Protege, I quickly realized that I was hitting the old challenge of all the ontology-based systems I’ve worked on in the past: what is a type and what is an instance?

When you look at a problem in an abstract sense, it’s not too hard to draw a line between what you will treat as a class and what will be instances of that class. Take, for example, the classic Noy and McGuinness Wine Ontology. There they work with varietals and characteristics, and those are all classes. Then the actual wine bottle is an instance. Easy enough? Well, until you try to get to more interesting things like properties.

Let’s say that you want to represent a winery and provide information about the wines it produces. So far it seems easy: a winery is an instance and it has an object property that connects it to the instances of the wines it produces. But now let’s say that you want to provide information about the type of wine it produces. A type, so now you are really talking about a relationship between the winery and a class of wines. This means that if there is a winery with only one wine, that wine also has to be a class of wines. So far, almost good. You can say that the range is a class, but I’m not aware of a way of saying that the range is a class that is a subclass of Wine. The only solution is to go back to calling it an instance and use the single class instance pattern. But then how can you relate the wine to their specialty in the ontology?

That’s just the beginning of it. Now let’s say that you want to represent wine pairings. Some types of wine go well with some types of food. That relationship actually goes from an instance of a type of wine to an instance of a type of food. Now you again have two elements for one single concept. And this will keep happening, and the cost for the person who is trying to maintain it is quite large.

Now for my specific case, representing recipes. The problem with recipes is that most of them are actually defined as a class of recipes. Relating back to wines, a recipe can call for a dry white wine. There are many dry white wines out there, and each choice will actually generate slightly different results in the recipe. The only “complete” way I was able to represent something like this is to define it as a class and then have each ingredient be a type of object property. That makes the number of types of elements explode exponentially, which in turn makes it pretty much impossible to maintain manually. If it’s not manually maintainable, I’m not sure it’s that useful.

So, how to solve this issue? That’s a very good question. I’m not sure I have the answer for it right now, so I’ll defer it to a future post. I’ve started looking at rule-based constraints, instead of class-based, and they do help some, but not enough. There is still a need to increase the fluidity of classes and instances, but it’s complicated to define “recursive” metadata that is applicable to itself and still keep inference bounded. What is the use of creating a language and not being able to actually benefit from its semantics?

Anyway, I probably should go back to reading research on the subject. I think I’m missing something important somewhere.

Just because I mentioned OpenCalais… Now a Twitter mashup

I can’t say I’m too impressed by this, but it might be because I don’t know much about Chicago politics. But mostly because I just mentioned OpenCalais, I felt that I had to post this too:

The buzz on Twitter on the Illinois Special Elections in the Fifth Congressional District

It’s probably a simple search on Twitter that subsets entries that could be related to the candidates, which they then run through OpenCalais to see if any are actually identified as the politician. Then they tally the results. Simple, but a step above the “google solution” of just plotting a keyword search timeline. I hope more people realize that search is great, but it’s very weak without human “manual filtering”. So building any automated trends based on keywords alone is bound to give you misleading results.
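My guess at that pipeline can be sketched in Java as three steps: keyword subset, entity check, tally. This is only an illustration of the idea, not how the mashup is actually built. The `EntityResolver` interface is a hypothetical stand-in for an entity extraction service and is not the OpenCalais API; the candidate names and posts are invented.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

class BuzzTallyDemo {
    public static void main(String[] args) {
        List<String> candidates = List.of("Jane Roe", "John Doe");
        // Toy resolver: confirms a candidate only when the full name appears.
        EntityResolver resolver = text -> candidates.stream()
                .filter(text::contains)
                .findFirst();
        List<String> posts = List.of(
                "Jane Roe spoke downtown today",
                "my dog is named roe",
                "Voting for John Doe tomorrow",
                "Jane Roe again on the radio");
        System.out.println(BuzzTally.tally(posts, List.of("roe", "doe"), resolver));
    }
}

/** Hypothetical stand-in for an entity extraction service; not a real API. */
interface EntityResolver {
    Optional<String> resolvePolitician(String text);
}

class BuzzTally {
    static Map<String, Integer> tally(List<String> posts,
                                      List<String> keywords,
                                      EntityResolver resolver) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String post : posts) {
            String lower = post.toLowerCase(Locale.ROOT);
            // Step 1: cheap keyword search to subset the stream.
            boolean keywordHit = keywords.stream().anyMatch(lower::contains);
            if (!keywordHit) {
                continue;
            }
            // Step 2: keep the post only if the resolver identifies a politician.
            // Step 3: tally one mention for the identified name.
            resolver.resolvePolitician(post)
                    .ifPresent(name -> counts.merge(name, 1, Integer::sum));
        }
        return counts;
    }
}
```

The post about the dog matches the keyword “roe” but is dropped by the resolver, which is exactly the kind of noise a plain keyword timeline would count.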

Friendfeeding

I decided to add my FriendFeed RSS to the right side of this blog.

I’ve always been a fan of the idea behind FriendFeed as an aggregator and distributor of information about what I’m doing without trying to be the owner of all the data. It’s different from Twitter simply because it doesn’t expect you to integrate with it; it does the opposite and integrates with the sites you already use. Yes, business-wise it’s certainly worse, because now you have to handle multiple formats for the sites generating events, but it’s still a better architecture in my opinion.

At this point in time people should have realized that you can’t do everything right. So maybe your real goal is to really not make anything right, just point to the direction where people are doing things right. The only thing that is left is for people to be able to create their own feeds and add them to FriendFeed. Something like using Yahoo Pipes to generate the specific feed you want. But you’ll need to figure out how to write the feedback piece too. Yahoo Pipes is good for filtering and summarizing, but not for acting on the filtered data.

But back to the reason why I think people should be able to create their own feeds: because FriendFeed can’t afford to keep running after new formats and building new icons all the time. I would hate to see FriendFeed gone while Twitter is still there, running and making no money.

More interesting Semantic Web stuff

When I first heard about Headup it seemed like a nice idea. Now that I’ve seen more details, I think it really might be a good product. I haven’t really tried it yet, because it seems to only work on Windows right now and uses MS Silverlight. But if they get past this silly requirement, it could be big.

If you don’t know what it is, check out their website and look at their videos. In general, it identifies entities in any page you are reading (well, at least on the ones that they show in their demos – Facebook, FriendFeed, YouTube) and allows you to dig through those entities, getting specific information that is relevant to each entity. For example, if you click on a band, you will be able to see upcoming concerts for the band. If you click on a song, you can play it. If you click on a person, it will show that person’s profile on multiple websites and their activities around the web.

I’m not sure how good it is at actually tracking everybody, but it’s certainly a neat concept. Let’s wait and see where it will take us.

More into “free” data sources – Swivel

I think I’ve posted about Swivel before on a past blog, but I found myself digging through it again. And I have to say that I found myself once again disappointed by it. It might contain good data, but in general it’s a letdown, mostly because on most searches for data the only thing I can find is noise. Either it’s data without enough information for you to understand it, like this one (which is the second hit when you search for “Seattle”):

Top 10 Increases in Total Crime

Or it’s just something that is probably better classified as “private”:

Elite Activity Membership Growth : Note, before you become trigger happy and open this link, let me explain what it is about. Elite Activity apparently is some sort of religion, and this graph shows their membership growth for March and May 2008 (not even consecutive months) to be somehow flat. That’s all it has! And how did I find it? It was one of the 4 most viewed data sets today.

So, there are some limitations with the site. But let’s try to look beyond them at what is really missing. Here are some suggestions:

  1. Allow adding graph metadata and filtering by it: let’s say that I want to get recent data for Seattle. I should be able to specify that I want city data, that the city name is Seattle and that the data should contain the year 2007 or 2008.
  2. Provide the ability to clean up duplicate information.
  3. Somehow cache the source for the data. In many places I tried to click on the source link to understand the data, but I received a 404 or even a DNS error for the site. If they want to allow people to get data from different sites and use those sites to authenticate the data, they should make sure that the sites still contain the data.
  4. Provide the ability to easily merge and reconcile apparently redundant data from multiple graphs. Sometimes there are spikes of “fun” data that appear out there, and people probably flock to create their own visualizations of this data. After the fun is gone, because the data is already a couple of years old, it would be good to be able to clean it up and combine those visualizations of the same data.
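The first suggestion, filtering by metadata, could look roughly like the Java sketch below. Everything in it is hypothetical: the `Dataset` fields, the `DatasetFilter.find` method and the sample entries are invented for illustration, and nothing here reflects an API that Swivel actually offers.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class MetadataFilterDemo {
    public static void main(String[] args) {
        List<Dataset> all = List.of(
                new Dataset("Crime totals", "city", "Seattle", Set.of(2007, 2008)),
                new Dataset("Old rainfall", "city", "Seattle", Set.of(1999)),
                new Dataset("State budget", "state", "Washington", Set.of(2008)));
        // City data, city name Seattle, containing the year 2007 or 2008.
        for (Dataset d : DatasetFilter.find(all, "city", "Seattle", Set.of(2007, 2008))) {
            System.out.println(d.title);
        }
    }
}

// Hypothetical metadata attached to each data set.
class Dataset {
    final String title;
    final String scope;        // e.g. "city", "state", "country"
    final String place;        // e.g. "Seattle"
    final Set<Integer> years;  // years covered by the data

    Dataset(String title, String scope, String place, Set<Integer> years) {
        this.title = title;
        this.scope = scope;
        this.place = place;
        this.years = years;
    }
}

class DatasetFilter {
    static List<Dataset> find(List<Dataset> all, String scope, String place,
                              Set<Integer> anyOfYears) {
        return all.stream()
                .filter(d -> d.scope.equals(scope))
                .filter(d -> d.place.equalsIgnoreCase(place))
                // Keep the data set if it covers at least one requested year.
                .filter(d -> d.years.stream().anyMatch(anyOfYears::contains))
                .collect(Collectors.toList());
    }
}
```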

This is just a short list of things that would improve the site. But one thing that really bugs me about it (mostly because of my current open data mindset) is that they are nice enough to allow you to import data from multiple places, but the only “API” they seem to have is a way to dump it to Excel. What if I want to cross-relate this data with other things on my web service? I’m out of luck.

Current projects in mind

So I think I’ve settled on project plans for now. I’m going back to my early data aggregation project combining information about things from multiple open database sources around, with information about what people think is important and how things relate from discussion sources like Twine. The part that I’m not yet settled on is whether

  1. I’ll dig up my early research project and deal with stock market movement annotation (not really prediction, just looking at the past and being able to link specific behavior with something in the news), or
  2. I’ll do what I’ve done the most in the last couple of years and deal with product information gathering and structuring.

If I know myself, I’ll probably go with #2, because I can then apply what I find back to my current work. But #1 might be “easier” (in terms of the number of required data sources and the availability of those data sources). I’ll see, and I’ll post a more detailed plan of what I intend to do here once I have time for it.

