Sometimes you go around the blogosphere and you see a post like this one:
RDF/Linked Data Standards Not Good Enough for Intelligent Agents? Or Is It the Opposite?
And I have to say that posts like this make me worried. Why? Because they attract people with experience building semantic web-like applications, and they all say (note, this is my interpretation of it, and not really what they said): well, I’m not sure we know if RDF is enough. Also, we don’t want to really talk about why it’s not enough and what is needed for it to be enough, because we know that if we start saying something, somebody will come around and show how this can be done in RDF and we will feel silly for claiming we couldn’t use it.
My experience? Well, I have some good experience modeling-wise, but not so good experience performance-wise. But, like them, I’m afraid that my not-so-good experience was more because I didn’t have enough time to fully understand what was going on than actually a problem with the framework. Let me get to some details of my not-so-good experience:
I was building summary data and then a reporting mechanism that provided cross-cut views on this summary data (effectively summarizing the summary). The tricky part is that some of it was hierarchically related, i.e., there were summaries that could be categorized in a hierarchical fashion (all things for consumer electronics products, or TVs, or HDTVs), and I wanted the report configuration to be aware of that hierarchy. After looking around, I decided to use RDFS to represent the data and the hierarchy, and SPARQL to represent the filter that would select the things I wanted.
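To make the idea concrete, here is a minimal sketch of that kind of hierarchy-aware filter. It is plain Python, not the Jena/SPARQL code from the actual system, and everything in it is an assumption for illustration: the category names, the numbers, and the helper names `descendants` and `total_for` are all hypothetical. The (child, parent) pairs stand in for what would be `rdfs:subClassOf` triples.

```python
from collections import defaultdict

# Hypothetical (child, parent) pairs standing in for rdfs:subClassOf triples.
SUBCLASS_OF = [
    ("HDTV", "TV"),
    ("TV", "ConsumerElectronics"),
    ("Camera", "ConsumerElectronics"),
]

# Hypothetical summary rows: (category, value).
SUMMARIES = [("HDTV", 5), ("TV", 3), ("Camera", 2), ("Book", 7)]


def descendants(root, edges):
    """Return root plus every category below it in the hierarchy."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    seen, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:  # guards against cycles and repeats
                seen.add(child)
                stack.append(child)
    return seen


def total_for(root, edges, summaries):
    """Sum the summary values whose category falls under root."""
    wanted = descendants(root, edges)
    return sum(value for category, value in summaries if category in wanted)
```

Asking for "TV" here picks up both the TV and the HDTV rows, which is the behavior I wanted the report configuration to get for free from the hierarchy.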
In general it worked great! Very few lines of code were needed to get it working, and pretty much no complicated business logic was added anywhere. However, when the real data came in, things weren’t as “pretty” as I had hoped. Now for some numbers: the data had about 50K triples, and the category hierarchy added another 10K triples (it’s a pretty big hierarchy, but quite a small dataset in general). Everything was being calculated on a stateless server with 512 MB in the JVM. Using Jena for the RDF serialization, deserialization, representation, and querying, the result was that it worked well for simple reports (without much hierarchical aggregation), generating them in about 10s. But more complicated reports started taking 30-60s. And not only that: if multiple reports were generated at the same time (it’s all a web interface, so it’s like opening multiple tabs, one per report), the server would lock up while doing the SPARQL querying and never return.
So what is the current state? Unfortunately, as I mentioned, I haven’t had time to dig much further into it, so I’m not sure what is going on. I’m sure there are some things I can do to improve it, but it’s really sad that such apparently simple technology generates such bad results out of the box. It’s not that Jena is a new project. SPARQL support (actually ARQ, which is the query engine used by Jena) might not be as old as the full framework, but it’s a mature project.
Anyway, one day I’ll get back to that system, figure out what was going on, and post about it. Until then, I have to make sure that clients do not open multiple tabs when looking at their reports.
I am the author of the blog you quote. I’m sorry it worries you, but from all my industry discussions I can tell you that there are a lot of people doubting the usefulness of RDF, and in my opinion talking about the issue as I do is a much better way to tackle the problem than pretending that RDF is the answer to everything in the semantic web and works perfectly.
You mention that they might be led to think “well I’m not sure we know if RDF is enough.” The reality is that RDF is NOT enough, and it was never built to be.
You also add: “We don’t want to really talk about why it’s not enough and what is needed for it to be enough, because we know that if we start saying something somebody will come around and show how this can be done in RDF and we will feel silly for claiming we couldn’t use it.”
Nothing could be further from the truth. If you look at my other posts, I have been asking again and again for a discussion on this exact question. In fact, this is also the central topic of my last post.
Although I understand your concerns about RDF adoption, I would say that this has to do with the adoption hurdle for the technology itself, and recommend that you be more careful when flagging blogs. All this said, I hope you will take the time to comment about your RDF experience on my blog.