Evaluation of Research Paper: Watson, more than a semantic web search engine


This post is a recent piece of work we did - a report and evaluation on a semantic web search research paper.

Abstract

In this report we present our evaluation of the paper ‘Watson, more than a Semantic Web search engine’. We first introduce the Semantic Web and then discuss our criteria for evaluating the paper. These range from how original the paper is in its particular field to the amount of background research the authors did before publishing it. Each criterion is scored during our evaluation, and for each one we explain the reasons for our choice.

Introduction

The Semantic Web was designed by the W3C as a new ‘Web of Data’ that would allow computers to perform more useful tasks and support trusted interactions over a network. Technologies introduced as part of the ‘Web of Data’ enable people to create data stores on the web, build vocabularies and write rules for handling data. Technologies such as RDF, SPARQL and OWL help to empower this data. (W3C, 2013)

Semantic web search engines are seen as the future of web search. While many current search engines match search strings against keywords within a website, semantic web search engines match search strings against the semantics of websites. This should mean more accurate responses to search terms, more accurately targeted ads and fewer spam results.

Semantic web search engines work by using a crawler to search for semantic documents and ontologies on the web. Semantic documents can be identified by their metadata, specifically their basic, document and RDF metadata.

After a document has been identified, it is indexed with an identification number so that the semantic search engine can return to it quickly, which speeds up search times. In the paper, the authors argue that Watson is the best semantic web search engine.
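The identify-then-index flow described in this introduction can be sketched roughly as follows. This is a minimal Python illustration under our own assumptions, not Watson's implementation: the function names, the marker-based check for RDF content and the sample documents are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical markers hinting that a document carries RDF/OWL content.
SEMANTIC_MARKERS = ("<rdf:RDF", "<owl:Ontology", "@prefix")


def looks_semantic(text):
    """Crude stand-in for the metadata checks a real crawler would run."""
    return any(marker in text for marker in SEMANTIC_MARKERS)


def build_index(documents):
    """Assign each semantic document an ID and build a keyword index.

    `documents` maps a URL to the raw text fetched from it.
    Returns (doc_ids, index): doc_ids maps ID -> URL, and index maps a
    lower-cased term to the set of document IDs that contain it.
    """
    doc_ids = {}
    index = defaultdict(set)
    next_id = 1
    for url, text in documents.items():
        if not looks_semantic(text):
            continue  # ordinary web page: skipped, so it gets no ID
        doc_ids[next_id] = url
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(next_id)
        next_id += 1
    return doc_ids, index


def lookup(index, doc_ids, term):
    """Return the URLs of indexed documents containing `term`."""
    return sorted(doc_ids[i] for i in index.get(term.lower(), ()))


# Made-up sample data for illustration only.
SAMPLE_DOCS = {
    "http://example.org/pets.rdf": "<rdf:RDF> Cat Dog Animal </rdf:RDF>",
    "http://example.org/blog.html": "<html> my cat blog </html>",
    "http://example.org/zoo.owl": "<owl:Ontology> Lion Animal </owl:Ontology>",
}

doc_ids, index = build_index(SAMPLE_DOCS)
print(lookup(index, doc_ids, "animal"))
```

Because each document's ID is stored against every term it contains, a lookup only has to read one index entry rather than rescan the documents, which is the speed-up the identification number is meant to provide.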


Explanation

While no objectives or research questions are explicitly stated anywhere in the paper, its overarching aim appears to be to inform the reader about the Watson semantic search engine and its benefits compared to other semantic search engines.

The paper begins by introducing the Watson system. It explains that Watson is a semantic web search engine and claims that it is better than other semantic web search engines because it utilises more advanced technologies. It states that Watson performs three main activities: (1) it collects available semantic content on the web, (2) it analyses that content to extract useful metadata and indexes, and (3) it implements efficient query facilities to access the data. While these tasks are common to any web search engine, their implementation is quite different when dealing with semantic content rather than ordinary web pages.

The next part of the paper highlights that, even though the first goal of Watson is to support semantic applications, it’s also highly important that it provides access to ontologies for humans with different levels of expertise. Watson therefore provides different methods of use, from simple keyword search to complex structured queries. The keyword search is similar to that of other web search engines, and the paper explains it in more detail. It also gives detailed explanations of searching for ontologies and semantic documents, searching within these documents, retrieving metadata about an ontology, SPARQL querying, retrieving metrics on ontologies and entities, and exploring the content of ontologies.
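To illustrate the difference between a keyword search and a structured query, here is a small Python sketch. It is our own illustration and does not use Watson's API or a real SPARQL engine: the triples, the `ex:` names and both functions are hypothetical, and the pattern matcher only imitates a single SPARQL-style triple pattern.

```python
# Hypothetical mini knowledge base of (subject, predicate, object) triples.
TRIPLES = [
    ("ex:Cat", "rdfs:subClassOf", "ex:Animal"),
    ("ex:Dog", "rdfs:subClassOf", "ex:Animal"),
    ("ex:Animal", "rdfs:label", "animal"),
    ("ex:Cat", "rdfs:label", "cat"),
]


def keyword_search(triples, keyword):
    """Return every triple in which any term contains the keyword."""
    kw = keyword.lower()
    return [t for t in triples if any(kw in term.lower() for term in t)]


def pattern_query(triples, subject=None, predicate=None, obj=None):
    """Return triples matching a pattern; None acts as a wildcard.

    This mimics one triple pattern such as
    `?s rdfs:subClassOf ex:Animal`, without being a SPARQL engine.
    """
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Keyword search returns anything mentioning "animal", including the label triple.
print(keyword_search(TRIPLES, "animal"))

# The structured query returns only the subclasses of ex:Animal.
print(pattern_query(TRIPLES, predicate="rdfs:subClassOf", obj="ex:Animal"))
```

The keyword search matches three triples, while the structured query narrows the result to the two subclass statements, which is the kind of precision structured queries are meant to add for more expert users.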

Another big claim made by the authors is that Watson is not only a service that supports the development of semantic web applications but can also be used as a research platform. One example they give is using it to show how formalised knowledge and data are produced, shared and consumed online. The paper also gives examples of more recent research conducted using Watson, for example to detect and study various implicit relationships between ontologies and semantic documents on the web, and it explains this research in more detail. The paper further claims that many other aspects of online ontologies can be studied, such as how ontologies evolve online, or used to test new techniques and approaches applicable to ontologies. It adds that Watson is used to supply ontologies in which anti-patterns can be found.

The paper explains that there are many systems similar to Watson but claims that Watson differs from them in many ways, the main one being that Watson is the only system to provide the level of services that applications need to exploit semantic web data. The paper then gives a brief summary of eight different semantic search engines and explains how Watson differs from them and what its advantages are. One example it gives is that one of the most popular semantic search engines, Sindice, indexes a large amount of semantic data but only provides a simple lookup service, which means the documents still have to be downloaded locally to be exploited. The paper claims that in many cases this is not feasible. The next example is that Swoogle, while closer to the Watson system, still does not offer some of the advanced search functions such as SPARQL querying. The paper also explains that the Falcons semantic search engine has focused more on its user interface and that the other systems focus on a restricted set of functionalities. The paper concludes this section by discussing how open semantic search engines are. It claims that Watson is the only one to provide unlimited access to its functionalities, because others such as Sindice, Swoogle and Falcons restrict the number of queries executable in a day or the number of results for a given query.

The final part of the paper covers the future plans for Watson. It states that while Watson is now a mature system, it’s still being developed: the index of semantic documents is ever growing, new specialised indexes are being added, and users will soon be able to contribute ontologies to the Watson collection in real time. The paper also mentions that new functionalities are being considered, for example refinements that come from integration with social media, such as a Facebook-style ‘like’ button for ontologies.

Current literature, while not explicitly discussed, is frequently referenced throughout the paper, and it’s clear that the work builds on past research and extensive knowledge of the field.

In conclusion, the paper presents a complete, up-to-date overview of the Watson system as well as the applications made possible by its functionalities. However, some of the paper feels quite biased in the way it’s written, which takes away from the objective view a paper should have.


Evaluation

On the whole this paper didn’t seem to be particularly well written, due to its lack of internal validity, a lack of variety in research methods and relatively poor analysis of the research that was present. Although the paper redeems itself by scoring well in other areas of our mark scheme, it remains a poorly written article whose ill-fitting vocabulary makes it a slow read. The paper begins by explaining its subject and notably mentions how good the web search engine is, but it provides no evidence to back this up, just a small amount of biased information about how the engine functions. After looking into the authors of this paper, it’s clear why the paper appears to be so biased: both authors are heavily involved in the production of the Watson semantic web search engine. The search engine may well be better than other web search engines, but we’re unable to reach this conclusion due to the lack of research. This could be the result of “selective research” intended to make Watson appear superior. In this evaluation we’ll discuss how effective the paper is by analysing its different aspects in detail.

The title, “Watson, more than a Semantic Web search engine”, cannot be ignored. The name Watson doesn’t seem to come from any of the authors’ names; it could, however, come from IBM’s Watson computer, an artificially intelligent computer that’s able to answer questions posed in natural language. This might suggest that the creators believe the web search engine is comparable to this computer in terms of its significance, which suggests a certain arrogance. In the introduction to the paper, the authors’ technological knowledge is evident from the use of jargon that most readers would be unfamiliar with. Words like “crawler” and “ontologies” make the paper difficult to read without looking up what they mean. The authors mention that other semantic web search engines exist but give the impression that they’re not as advanced or significant as Watson. Some practical applications of the search engine are given, implying that the systems using Watson rely on the search engine for their success. It’s examples like this, on only the first page, that really highlight why we have only been able to give the paper 3/10 for internal validity. Claims are made throughout the paper without any evidence given, so the reader cannot judge for themselves which web search engine is superior based on research or facts. Data may have been taken out of context or used selectively to support a particular opinion.

Where the paper lacks internal validity, we believe it scores perfectly in relatedness. The subject area of semantic web searching is certainly relevant, given that Google is one of the world’s most valuable companies and runs the world’s most visited website. Watson sits within this popular group of search engines, and the paper describes the components of ontologies that are crucial to semantic web search engines. The subject matter of semantic web search engines remains relevant throughout the paper, as other search engines are compared and briefly analysed, and the paper doesn’t stray from its subject at all.

In terms of originality the paper doesn’t do particularly well, as the search engine fails to distinguish itself from other semantic web search engines due to a lack of innovation and a saturated market. Watson succeeds in combining existing concepts from other semantic web search engines, but we failed to see how Watson was much different from the others available, despite the constant comparisons throughout the paper. There is really no way to conclude that Watson is in any way original.

Despite some bias in the paper, it’s still a good starting point for researching the subject area of semantic web search and Watson, and it could perhaps be used by academics. The authors go into some detail about the anatomy of a semantic web search engine, which is thoroughly explained and supported by a diagram as a visual aid. Further jargon is used in this section, which makes it particularly applicable to academics; it demonstrates a clear knowledge of semantic web search engines and makes this knowledge digestible for that audience.


Another criterion on which we believe the paper did well is ‘Thorough background investigation’. The paper draws on a variety of different sources, including conferences, journals and workshop publications. Having different kinds of reference helps to reinforce the legitimacy of the references, and of the opinions and facts drawn from them, as there are multiple different sources of evidence.

The researcher has clearly conducted some thorough background research and used a large number of sources, not just readily available ones. With such a large variety of sources, it feels as though there is certainly evidence to support some of the points made, particularly when the other semantic web search engines are mentioned. The only drawback is the internal validity again: when the different search engines that rival Watson are reviewed, there are few or no references to support some of the statements made, and there is even less evidence to support the writers’ opinion that Watson is somewhat better than the other semantic search engines. We also looked into the extent of research methods and found that the only research used came from workshop articles, journals and conferences. We believe that, in order to make such claims about Watson, surveys and questionnaires could have been used to reinforce them. This lack of research methods just doesn’t give enough support to the claims made in the paper, so it doesn’t do very well here in our mark scheme, and we decided to give this element of the paper 2/10.

Thorough background research is evident throughout the paper from the authors’ extensive knowledge of the Watson system and of the systems that compare to it. However, the paper fails to draw any logical conclusions from the facts that were collected, and the lack of support for its statements leaves the paper feeling weak.


Conclusion

This report has explained and evaluated the paper “Watson, more than a Semantic Web search engine”. It has covered our criteria for evaluating the paper and the score we have given it for each criterion. It has also given a brief insight into what the Semantic Web is and how it can be used.

The aim of this report was to show our findings from evaluating the paper on Watson. We have found that the paper has some strengths but also many weaknesses. Its strengths include being closely related to its subject area and being a good paper for researchers wanting to perform research in this area. On the other hand, the paper really suffers because of its lack of internal validity: it makes many claims but has no evidence to back them up. This is only one of the many pitfalls in this paper.

Notwithstanding these limitations, the Watson paper does offer an insight into semantic web engines and we have taken this into consideration when performing our evaluation.


References


W3C. (2013). Semantic Web. Available: http://www.w3.org/standards/semanticweb/. Last accessed 12th March 2014.

Swoogle. (2004-2007). How to Search Semantic Web Documents/Ontologies. Available: http://swoogle.umbc.edu/index.php?option=com_swoogle_manual&manual=search_swd. Last accessed 12th March 2014.

G. Madhu, A. Govardhan, T. V. Rajinikanth. (2011). Intelligent Semantic Web Search Engines: A Brief Survey. International Journal of Web & Semantic Technology. 2.

Mathieu d'Aquin. (2014). Semantic Web. Available: http://people.kmi.open.ac.uk/mathieu/. Last accessed 19th March 2014.

Our Assessment

Part 1 - The Content of the Paper

Originality (6 marks)
How much of a new concept this is
0-2 marks - not an original concept
3-4 marks - Fairly original
5-6 marks - New concept, very original

We have decided to give it 2/6 for originality because it’s very similar to existing semantic web search engines, with its unique selling point being that it combines many of the features of those engines. We were unimpressed by the lack of new concepts presented.

Relatedness (6 marks)
Is the topic relevant to the subject area it’s published under?
0-2 marks - not relevant to anything
3-4 marks - partially relevant but not to the category
5-6 marks - very relevant

We have decided to give the paper 6/6 for this criterion as the content of this paper is all relevant to semantic web search engines. The paper also describes the components of ontologies that are a key part of semantic web search engines.

Applicability (6 marks)
Will this research be of use to academics in this subject area?
0-2 marks - Of no use
3-4 marks - Of some use
5-6 marks - Very useful

We have decided to go with 4/6 because there is a strong emphasis on how Watson can be used as a research tool. The paper shows how formalised data can be produced, consumed and shared online. It also describes more recent research that has used Watson to detect and study various implicit relationships between ontologies and semantic documents.

Sources and Research
Thorough background investigation (10 marks)
Did the researcher use all sources of information, or just sources that were readily available?
0-3 marks - used little or no sources of information
4-7 marks - used only sources that were easy to access/ readily available
8-10 marks - used many sources of information including exclusive resources.

We have decided to give the paper 8/10 for background research because it uses a variety of different sources, including conferences, journals and workshop publications. This shows that the researcher has clearly conducted thorough background research and used a wide range of sources, not just ones that were readily available.

Internal Validity (10 marks)
Are the information sources and the methods of research used correctly? This includes facts being correctly presented and quotes used in context.
0-3 marks - no information sources are correctly used or they have been manipulated
4-7 marks - some information sources have been used and are partially well presented
8-10 marks - good, reliable and relevant information sources were used and correctly referenced.

We have decided to give the paper 3/10 for internal validity because the authors never used solid, proven facts to back up what was being said in the paper. They did, however, use their own knowledge to verify the validity of what they were saying.

External Validity (10 marks)
Has the author checked the references to ensure they are accurate?
0-3 marks - not at all, and they’re not accurate
4-7 marks - relevant but not accurate
8-10 marks - fully checked and accurate

We give the paper 8/10 for external validity. We have read some of the papers that were used as sources and can confirm that they are accurate and have been used correctly.

Extent of research Methods (10 marks)
How much has the author made use of research methods, such as experiments or surveys?
0-3 marks - has not used any research methods
4-7 marks - has used some research methods, but mainly just other people’s work
8-10 marks - has used a range of research methods to back up findings, including going beyond what other people have done

We give the paper 2/10 because the authors have not used any research methods other than drawing on workshop articles, journals and conferences. To improve this mark, the authors should have made greater use of other research methods, such as questionnaires and surveys, which could have been used to back up the claims made about Watson.

Part 2: Complexity

Extent of Analysis of Research (25 marks)
0-2 marks - The researcher just recites facts from one source, and nothing more
3-10 marks - The researcher collects facts from a number of sources, and organises them, but does not go any further into analysis than this
11-18 marks - Researcher attempts to draw logical conclusion from the facts collected from numerous sources and ordered
19-25 marks - Researcher uses the relevant data collected to back up the conclusion they have come to from their question, and data from many sources correctly ordered and analysed

We are giving the paper 10/25 for extent of analysis of research because, although the authors of this paper have clearly done some extensive reading around this topic and are very knowledgeable about the Watson system, they have not attempted to draw any logical conclusions from the facts collected.
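The marks above are not combined into an overall total in this assessment. As a rough sketch, assuming the criteria are simply summed with no extra weighting (an assumption made purely for illustration), the tally could be computed in Python as follows. The marks and maximums are copied from the criteria above; the `total` helper is hypothetical.

```python
# (criterion, marks awarded, marks available), copied from the assessment.
SCORES = [
    ("Originality", 2, 6),
    ("Relatedness", 6, 6),
    ("Applicability", 4, 6),
    ("Thorough background investigation", 8, 10),
    ("Internal validity", 3, 10),
    ("External validity", 8, 10),
    ("Extent of research methods", 2, 10),
    ("Extent of analysis of research", 10, 25),
]


def total(scores):
    """Sum awarded and available marks, assuming no weighting."""
    awarded = sum(a for _, a, _ in scores)
    available = sum(m for _, _, m in scores)
    return awarded, available


awarded, available = total(SCORES)
print(f"{awarded}/{available} ({awarded / available:.1%})")  # prints 43/83 (51.8%)
```

Under that assumption the paper scores 43 out of a possible 83 marks, with the 25-mark analysis criterion carrying the most influence on the result.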