How Can the Semantic Web Improve the Availability of Information?

After reading through research papers on the semantic web, I've written a paper-style piece on how it can improve the availability of information.

Abstract

The information and data held in large-scale databases and repositories can be shared globally via the World Wide Web. To find and retrieve this data, we need a piece of software that is able to search the content of these databases and return results. This is known generically as a search engine, and there are many available for different purposes. However, retrieving meaningful information is not easy; to overcome this, semantic web search engines are being developed. In this paper I present how both the semantic web and semantic search engines can transform the way information is retrieved and presented to the user.

Introduction

The semantic web is an extension of the World Wide Web that carries not just information but the meaning of that information. The new standards are intended to ensure that data is structured so that it can be dynamically found and accessed. The W3C standard “aims to provide a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.”[1] This will enable the data on the web to be processed by both machines and humans.

A semantic web search engine is different from the semantic web itself: it is a piece of software that aims to precisely locate and gather information published on the semantic web. It is almost intuitive in the way it responds to users’ queries. Rather than just searching web pages for occurrences of the words in the query, it comes close to understanding the context in which the user is searching; it also learns, and is able to apply additional information to the user’s query. Semantic web search engines are therefore able to answer users’ queries efficiently and effectively using intelligent algorithms. They do, however, rely on data being stored semantically and following conventional standards.

Related Work

§ Intelligent Semantic Web Search Engines: A Brief Survey  http://airccse.org/journal/ijwest/papers/0111ijwest03.pdf
    This paper gives an introduction to semantic web technologies, and then outlines several semantic web models and search engines, making it clear that different ones are used for different working environments. The paper is well written and easy to read, yet also in-depth and informative.
    This paper was published in January 2011 – over 3 years ago. Due to the rapid increase in processing power, and hence in the capabilities of computers, technology papers usually become outdated quite quickly. However, the semantic web is not a brand new concept; Tim Berners-Lee wrote a paper on it in 2001. The content of this paper is therefore still current, with the slight exception that a few newer semantic search engines, such as Kakia, are missing from its list.
    The paper was published in the International journal of Web & Semantic Technology, a quarterly peer-reviewed journal. Its international scope suggests it is a well-respected publication, and peer review means other academics in the field accepted the paper.
    It has had 13 citations, so it is a well-cited paper.
    Two of the authors hold doctorates, which suggests expertise in the semantic web.

§ Solving semantic ambiguity to improve semantic web based ontology matching http://oro.open.ac.uk/23564/1/gracia_om07.pdf
    This paper discusses one of the main problems with the semantic web, which is ambiguity. It puts forward several factors that will help reduce this.
    It was published in 2007, and some parts of it are outdated. For example, Google Knowledge Graph developers have come up with methods of reducing ambiguity that work in practice, yet they are not mentioned in this paper.
    It was published by the Open University, and does not appear to have gone through a formal published review process, which counts against its reliability.
    Also, the authors were not mentioned or were kept anonymous. Again, this does not suggest reliability, as they are probably not well-respected senior academics in the field.

Analysis

The Difference between Current Web and the Semantic Web Search Engines

It is very difficult for machines to understand the search string a user enters if the information being searched lacks structure. A current web search engine has two things to work with: the string that the user entered, and a huge mass of unorganised records. All it can do is find occurrences of each of the user’s keywords in documents across the web, and return the results with the most occurrences. Many search engines also rank results by other factors, such as the number of other web pages that link to a page, but the search engine itself has no understanding of what the user is searching for, or of what results it is returning.
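To make the keyword approach concrete, here is a minimal sketch in Python. It is not how any real search engine is implemented; the page names and text are invented for illustration, and ranking is by raw word counts only.

```python
from collections import Counter

# Hypothetical mini-corpus: page names and text are invented for illustration.
PAGES = {
    "jaguar-cars": "jaguar cars are luxury cars built in england",
    "jaguar-cat": "the jaguar is a big cat found in the americas",
    "big-cats": "lions tigers and leopards are big cats",
}


def keyword_search(query, pages):
    """Rank pages by how often the query's words occur in their text."""
    terms = query.lower().split()
    scores = {}
    for name, text in pages.items():
        counts = Counter(text.lower().split())
        score = sum(counts[term] for term in terms)
        if score > 0:
            scores[name] = score
    # Highest score first; ties are broken alphabetically by page name.
    return sorted(scores, key=lambda name: (-scores[name], name))


results = keyword_search("jaguar cat", PAGES)
print(results)
```

The page about cars is still returned for "jaguar cat" simply because it contains the word "jaguar", while the page about big cats is missed because it says "cats" rather than "cat". The engine is matching strings, not meaning.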

However, a semantic web search engine, which relies on its data being organised semantically, interprets the user’s input string as an object, or an object with a set of conditions. It does this by looking at occurrences of this object in a semantic database and seeing what relationships it has to other objects. It can therefore return results that are relevant not to the string the user entered, but to the object they referred to.
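As a rough illustration of "an object with a set of conditions", the sketch below stores facts as (subject, predicate, object) triples, which is the general shape of RDF data. The object names, predicates and the `find_objects` and `describe` helpers are all hypothetical, made up for this example rather than taken from any real semantic search engine.

```python
# Hypothetical facts stored as (subject, predicate, object) triples.
TRIPLES = [
    ("Jaguar_(animal)", "label", "jaguar"),
    ("Jaguar_(animal)", "is_a", "Big_cat"),
    ("Jaguar_(animal)", "lives_in", "South_America"),
    ("Jaguar_(company)", "label", "jaguar"),
    ("Jaguar_(company)", "is_a", "Car_maker"),
    ("Jaguar_(company)", "based_in", "England"),
]


def find_objects(label, conditions=None):
    """Return objects with this label whose relationships meet every condition."""
    conditions = conditions or {}
    candidates = sorted({s for s, p, o in TRIPLES if p == "label" and o == label})
    matches = []
    for subject in candidates:
        facts = {(p, o) for s, p, o in TRIPLES if s == subject}
        if all((p, o) in facts for p, o in conditions.items()):
            matches.append(subject)
    return matches


def describe(subject):
    """List the relationships an object has to other objects."""
    return sorted((p, o) for s, p, o in TRIPLES if s == subject and p != "label")


print(find_objects("jaguar"))
print(find_objects("jaguar", {"is_a": "Big_cat"}))
print(describe("Jaguar_(animal)"))
```

Without conditions the string "jaguar" is ambiguous between two objects; adding the condition narrows it to the one the user means, and the results come from that object's relationships rather than from matching text.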

Limitations of the Current Web

There is a lack of formal structure applied to representing the data, so semantic search principles simply cannot be applied to this information with any level of efficiency. There is also massive ambiguity in the information, which leads to little interconnection between pieces of it. There is currently no way to manage an enormous number of users or to ensure trust at all levels, and there are very few automatic transfer methods in place. Finally, due to the lack of a universal format, there is no way that machines could understand the provided information or make use of it.

Instant responses to question queries

This is one of the areas Google have been working on recently. If you type in a query such as “What will the weather be on Sunday?”, it will be able to return a single figure, or in this case a weather summary, above all the search results. It will likely get your approximate location from your IP address, so the result will be relevant to where you are. It will also have more options below Sunday’s weather forecast, such as Saturday’s or Monday’s forecast. The search results below will of course be related, so there will be an entry for the Met Office website. The adverts will also be relevant, not to the search for weather forecasts but to the answer to the user’s question; for example, if it is going to be sunny, there may be a Tesco advert for BBQs. [3]

Searching “things” not “strings”

An example of a recent development in the semantic web is Google’s Knowledge Graph. This involves attempting to assign a unique ID to every real-world object mentioned on the internet, and then linking up relationships between objects to determine what is relevant to each. Therefore, when the user searches for something, it won’t be their query string that is searched for; it will be instances of that object and the relationships it has with other objects. This means that the results returned will be much more accurate and intelligent, and should put what the user was looking for on the first page of results.

Potential Issues with Semantic Web Search Engines

While the majority of the web is still not semantic, there is likely to be low precision and high recall for some search strings; this can be caused by relying on resources that are not semantic. It can also sometimes be hard, or impossible, to identify the intentions of the user and hence return results that fit these intentions: their query could be ambiguous, or the information in the semantic web could relate to other entries that don’t fit the user’s query. It is also possible that individual user patterns will be extrapolated to all users, even though users in different categories will probably be searching for different information even when the search string appears the same.
Finally, there are always going to be inaccurate queries: users don’t always structure their queries in the most effective way, and often are not even sure themselves what they are looking for.

Work needs to be done to solve the issue of semantic ambiguity in order to improve semantic web based ontology matching. This is currently the biggest problem in the new paradigm of ontology matching put forward in 2008 by W3C. The problem is caused by ambiguity during the anchoring process of ontologies. [2]

Context

Potential Legal Issues

The current legal framework which governs how data can be processed was mainly written before the internet became as big as it is now. This becomes even more important as development of the semantic web accelerates, and more data flows to and from servers and terminals.

Another issue is privacy. For a semantic web search engine to learn, it has to collect information from the user; this may mean storing what they are searching for, recording their behaviour or even storing personal details in order to return the most relevant search results.

There is also the issue of reliability of the results: if someone has published an ontology with unreliable or outdated information, then the results returned to the user may not be reliable.

Potential Professional Issues

Having information which is much more accessible and reliable will increase efficiency in many areas, but this may lead to there being fewer jobs. For example, search engine optimisation will be a thing of the past, replaced by a much simpler process of just ensuring that your semantic document follows conventions.

Potential Ethical Issues

It may be hard to monitor and ensure that no inappropriate results are returned in a search if they have been linked to the object the user was searching for.

Potential Environmental Issues

Google’s Knowledge Graph, which is an early-stage semantic web search engine, already has over 2 billion relationships between objects, and that is just enough for testing their algorithms. Unless further advances in processing are made, semantic web searching is going to have a serious effect on the environment if it becomes popular. As it happens, Google are currently investing in quantum computing in order to try to cope with this increased demand for processing power.

Conclusion

The further development of the semantic web and semantic web search is very exciting, as it is making information so much more accessible, useful and relevant, although there is still a lot of development to do, as nearly all of the World Wide Web is currently unstructured.

References

[1] – The W3C Standard
[2] – Solving semantic ambiguity to improve semantic web based ontology matching
[3] – Where will the semantic Web take us with regard to improved access to Web-based information? www.sajim.co.za/index.php/SAJIM/article/download/207/203
[4] – Algorithmic procedure for finding semantically related journals - http://www.garfield.library.upenn.edu/papers/pudovkinsemanticallyrelatedjournals2002.html
[5] – International Journals for Semantic Web Technology - http://www.airccse.org/journal/ijwest/ijwest.html
[6] – A Semantic-Web based Framework for Developing Applications to Improve Accessibility in the WWW – http://dl.acm.org/citation.cfm?id=1133238
[7] – How Can the Semantic Web Improve the Acquisition and Sharing of Knowledge?


My Minesweeper Android app on Google Play

I've just launched my first Android app onto the market. It won't be the next Flappy Bird, but it's a working game: a remake of the classic Minesweeper. It's a pretty simple application and is available for free (see link below). I have also released all the source code, which can be viewed on GitHub (also see below).

The aim of creating this app was mainly just to improve my Android/Java programming skills, and hopefully to provide some light entertainment to people playing the game too.

Really enjoyed making the app, and I hope some of you will enjoy playing it. Got to say, the hardest part was testing the win() function, as I'm not very good at playing minesweeper!

If you enjoyed the game, a positive rating for the app on Google Play would be very much appreciated :)

Links

Evaluation of Research Paper: Watson, more than a semantic web search engine


This post is a recent piece of work we did - a report and evaluation on a semantic web search research paper.

Abstract

In this report we present our evaluation of the paper ‘Watson, more than a Semantic Web search engine’. We first introduce the semantic web and then discuss our criteria for evaluating the paper; these range from how original the paper is in its particular field to the amount of background research the authors did before publishing it. Each criterion is scored during our evaluation of the paper, and for each one we explain the reasons for our choice.

Introduction

The Semantic Web was designed by W3C as a new ‘Web of Data’, which would allow computers to perform more useful tasks and support trusted interactions over a network. Technologies introduced as part of the ‘Web of Data’ include those enabling people to create data stores on the web, add rules for handling data and build vocabularies. Technologies such as RDF, SPARQL and OWL help to empower the data. (W3C, 2013)

Semantic web engines are seen as the future of web search. While many current search engines match search strings against keywords within a website, semantic web search engines match search strings against the semantics of websites. This means more accurate responses to search terms, more accurately targeted ads and fewer spam search results.

Semantic web engines work by using a crawler to search for semantic documents and ontologies on the web. Semantic documents can be identified by their metadata, specifically their basic, document and RDF metadata.

After a document has been identified, it is indexed with an identification number so that the semantic search engine can return to it quickly, which speeds up search times. In the paper, the authors argue that Watson is the best semantic web search engine.
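The identify-then-index step can be sketched as follows. This is only an illustration and not Watson's actual implementation: it assumes a document counts as semantic when it is served with an RDF media type, and the URLs, terms and the `build_index` helper are all made up.

```python
from itertools import count

# Assumption: a document is treated as "semantic" if served with an RDF media type.
SEMANTIC_TYPES = {"application/rdf+xml", "text/turtle"}

# Hypothetical crawl results: URLs and terms are invented for illustration.
CRAWLED = [
    {"url": "http://example.org/animals.rdf",
     "content_type": "application/rdf+xml",
     "terms": ["jaguar", "cat"]},
    {"url": "http://example.org/about.html",
     "content_type": "text/html",
     "terms": ["jaguar"]},
    {"url": "http://example.org/cars.ttl",
     "content_type": "text/turtle",
     "terms": ["jaguar", "car"]},
]


def build_index(documents):
    """Give each semantic document an ID and map each term to the IDs containing it."""
    ids = count(1)
    doc_ids = {}
    inverted = {}
    for doc in documents:
        if doc["content_type"] not in SEMANTIC_TYPES:
            continue  # ordinary web pages are skipped
        doc_id = next(ids)
        doc_ids[doc_id] = doc["url"]
        for term in doc["terms"]:
            inverted.setdefault(term, set()).add(doc_id)
    return doc_ids, inverted


doc_ids, inverted = build_index(CRAWLED)
print(sorted(inverted["jaguar"]))
```

Because each term maps straight to a set of document IDs, a search only needs a dictionary lookup rather than a scan of every document, which is where the speed-up comes from.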


Explanation

While no objectives or research questions are explicitly mentioned anywhere in the paper, the overarching theme of the paper appears to be to inform the reader on the Watson semantic search engine and the benefits of Watson compared to other semantic search engines.

The paper begins by introducing the Watson system; it explains that Watson is a semantic web search engine and claims it to be better than other semantic web search engines because it utilises more advanced technologies. It states that Watson performs three main activities: (1) it collects available semantic content on the web, (2) it analyses that content to extract useful metadata and indexes, and (3) it implements efficient query facilities to access the data. While these tasks are generally performed by any web search engine, their implementation is quite different when dealing with semantic content rather than ordinary web pages.

The next part of the paper goes on to highlight that, even though the first goal of Watson is to support semantic applications, it’s also highly important that it provides access to ontologies for humans with different levels of expertise. Watson therefore provides different methods of use, from simple keyword search to complex structured queries. The keyword search is similar to that of other web search engines, and the paper explains this in more detail. It also gives detailed explanations of searching for ontologies and semantic documents, searching within these documents, retrieving metadata about an ontology, SPARQL querying, retrieving metrics on ontologies and entities, and exploring the content of ontologies.

Another big claim made by the authors is that Watson is not only a service that supports the development of semantic web applications but can also be used as a research platform. One example they give is using it to show how formalised knowledge and data are produced, shared and consumed online. The paper also gives examples of more recent research work that has been conducted using Watson, for example to detect and study various implicit relationships between ontologies and semantic documents on the web, and it goes on to give a more detailed explanation of this research. The paper also claims that many other aspects of online ontologies can be considered for study, such as how ontologies evolve online, or testing new techniques and approaches applicable to ontologies, and it describes how Watson is used to find ontologies in which anti-patterns occur.

The paper explains that there are many systems similar to Watson but claims that Watson differs from them in many ways, the main one being that Watson is the only one to provide the necessary level of services for applications to exploit semantic web data. The paper then gives a brief summary of eight different semantic search engines and explains how Watson differs from them and what its advantages are. One example it gives is that one of the most popular semantic search engines (Sindice) indexes a large amount of semantic data but only provides a simple lookup service, which means you still need to download the documents locally to exploit them; the paper claims this is in many cases not feasible. The next example is that Swoogle, while closer to the Watson system, still does not offer some of the advanced search functions such as SPARQL querying. The paper also explains that the Falcons semantic search engine has focused more on user interface, and that the other systems focus on a restricted set of functionalities. The paper concludes this section by considering how open semantic search engines are, and claims that Watson is the only one to provide unlimited access to its functionalities, because others such as Sindice, Swoogle and Falcons restrict the number of queries executable in a day or the number of results for a given query.

The final part of the paper covers the future plans for Watson. It states that while Watson is now a mature system, it’s still being developed: this includes an ever-growing index of semantic documents and new specialised indexes, and users will soon be able to contribute ontologies to the Watson collection in real time. The paper also mentions that new functionalities are being considered, for example refinements based on integration with social media, such as a Facebook-style like button for ontologies.

Current literature, while not explicitly discussed, is frequently referenced throughout the paper, and it’s clear that the work builds on past research and extensive knowledge in the field.

In conclusion, the paper presents a complete, up-to-date overview of the Watson system as well as the applications made possible by its functionalities, although some of the paper feels quite biased in the way that it’s written, which takes away from the objective view papers should have.


Evaluation

On the whole this paper didn’t seem to be particularly well written, due to the lack of internal validity, a lack of variety in research methods and relatively poor analysis of the research that was present. Although the paper redeems itself by scoring well in other areas of our mark scheme, it remains a poorly written article, with oddly fitting vocabulary that makes it a slow read. The paper begins by explaining its subject, notably mentioning how good the search engine is but providing no evidence to back this up, just a small amount of biased information about how the engine functions. After looking into the authors of this paper, it’s clear why it appears to be so biased: the authors are both heavily involved in the production of the Watson semantic web search engine. The search engine may well be better than other web search engines, but we’re unable to reach this conclusion due to the lack of research; this could be the result of “selective research” intended to make Watson appear superior. In this evaluation we’ll discuss how effective the paper is by analysing its different aspects in detail.

The title, “Watson, more than a Semantic Web search engine”, cannot be ignored. The name Watson doesn’t seem to come from any of the authors’ names; it could, however, come from IBM’s Watson computer, an artificially intelligent computer that’s able to answer questions posed in natural language. This might suggest that the creators believe the search engine is comparable to this computer in terms of its significance, which suggests a certain arrogance. In the introduction to the paper, the authors’ technological knowledge is evident from the use of jargon that most readers would be unfamiliar with, such as “crawler” and “ontologies”, making the paper difficult to read without looking up what these words mean. The authors mention that other semantic web search engines exist, but give the impression that they’re not as advanced or significant as Watson. Some practical applications of the search engine are given, implying that the systems using Watson rely on the search engine for success. It’s examples like this, on only the first page, that really highlight why we have only been able to give the paper 3/10 for internal validity. Claims are made throughout the paper without any evidence, so the reader cannot judge for themselves which search engine is superior based on research or facts; data may have been taken out of context or used selectively to support an opinion.

Where the paper lacks internal validity, we believe it scores perfectly in relatedness. The subject area of semantic web searching is certainly relevant, with Google being one of the world’s most valuable companies and the world’s most visited website. Watson belongs to this popular group of search engines, and the paper describes the components of ontologies that are crucial to semantic web search engines. The subject matter stays relevant throughout: other search engines are compared and briefly analysed, and the paper doesn’t stray at all from its subject.

In terms of originality the paper doesn’t do particularly well, as the search engine fails to distinguish itself from other semantic web search engines due to a lack of innovation and a saturated market. Watson succeeds in combining existing concepts from other semantic web search engines, but we failed to see how it was much different from the others available. Despite constant comparisons throughout the paper, there is really no basis for concluding that Watson is in any way original.

Despite some bias in the paper, it’s still a good candidate for researching the subject area of semantic web search and Watson, and could perhaps be used by academics. The authors go into some detail about the anatomy of a semantic web search engine, which is thoroughly explained with the aid of a diagram. Further jargon is used in this section, which makes it particularly applicable to academics; it demonstrates a clear knowledge of semantic web search engines and makes this knowledge digestible.


Another aspect we believed the paper did well on was ‘Thorough background investigation’. The paper gives a variety of different sources, including conferences, journals and workshop publications. Having different reference mediums helps to reinforce the legitimacy of the references, and of the opinions and facts drawn from them, as there are multiple different sources as evidence.

The researchers have clearly conducted some thorough background research and used a large number of sources, not just readily available ones. With such a large variety of sources, there is certainly evidence to support some of the points made, particularly when mentioning the other semantic web search engines. The only drawback is internal validity again: when reviewing the different search engines that rival Watson, there are few to no references supporting some of the statements made, and there is even less evidence to support the writers’ opinion that Watson is better than the other semantic search engines. We also looked into the extent of research methods, and found that the only research used was workshop articles, journals and conferences. We believe that, to support such claims about Watson, surveys and questionnaires could have been used. This lack of research methods doesn’t give enough support to the claims made in the paper, so it doesn’t do very well here in our mark scheme; we decided to give this element of the paper 2/10.

Thorough background research is evident throughout the paper from the extensive knowledge of the Watson system and of the other systems that compare to it. However, the paper fails to draw any logical conclusions from the facts that were collected, and the lack of support for its statements leaves it feeling weak.


Conclusion

This paper has explained and evaluated the paper “Watson, more than a Semantic Web search engine”. It has covered our criteria for evaluating the paper and the scores we have given it for each criterion. It has also given a brief insight into what the semantic web is and how it can be used.

The aim of this paper was to show our findings when evaluating the paper on Watson. We have found that the paper has some strengths but also many weaknesses. These strengths include being very related to its subject area and being a good paper for researchers wanting to perform research in this area. On the other hand, the paper really suffers because of its lack of internal validity: it makes many claims but has no evidence to back them up. This is only one of the many pitfalls in this paper.

Notwithstanding these limitations, the Watson paper does offer an insight into semantic web engines and we have taken this into consideration when performing our evaluation.


References


W3C. (2013). Semantic Web. Available: http://www.w3.org/standards/semanticweb/. Last accessed 12th March 2014.

Swoogle. (2004-2007). How to Search Semantic Web Documents/Ontologies. Available: http://swoogle.umbc.edu/index.php?option=com_swoogle_manual&manual=search_swd. Last accessed 12th March 2014.

G Madhu, A Govardhan, T. V. Rajinikanth. (2011). Intelligent Semantic Web Search Engines: A Brief Survey. International journal of Web & Semantic Technology. 2.

Mathieu d'Aquin. (2014). Semantic Web. Available: http://people.kmi.open.ac.uk/mathieu/. Last accessed 19th March 2014.

Our Assessment

Part 1 - The Content of the Paper

Originality (6 marks)
How much of a new concept this is
0-2 marks - not an original concept
3-4 marks - Fairly original
5-6 marks - New concept, very original

We have decided to give it 2/6 for originality because it’s very similar to existing semantic web search engines, with its unique selling point being that it combines many of the features of existing semantic web search engines. We were unimpressed with the lack of new concepts that were presented.

Relatedness (6 marks)
Is the topic relevant to the subject area it’s published under
0-2 marks - not relevant to anything
3-4 marks - partially relevant but not to the category
5-6 marks - very relevant

We have decided to give the paper 6/6 for this criterion as the content of this paper is all relevant to semantic web search engines. The paper also describes the components of ontologies that are a key part of semantic web search engines.

Applicability (6 marks)
Will this research be of use to academics in this subject area?
0-2 marks - Of no use
3-4 marks - Of some use
5-6 marks - Very useful

We have decided to go with 4/6 because there is a strong emphasis on how Watson can be used as a research tool. The paper shows how formalised data can be produced, consumed and shared online. More recent research can be used to study and detect various implicit relationships between ontologies and semantic documents.

Sources and Research
Thorough background investigation (10 marks)
Did the researcher use all sources of information, or just sources that were readily available?
0-3 marks - used few or no sources of information
4-7 marks - used only sources that were easy to access/ readily available
8-10 marks - used many sources of information including exclusive resources.

We have decided to give the paper 8/10 for background research, the reasons for this are that the paper uses a variety of different sources, including conferences, journals and workshop publications. This shows that the researcher has clearly conducted thorough background research and used a wide range of sources, not just ones that were readily available.

Internal Validity (10 marks)
Are the information sources and the methods of research used correctly? This includes facts being correctly presented and quotes being used in context.
0-3 marks - no information sources are correctly used or they have been manipulated
4-7 marks - some information sources have been used and are partially well presented
8-10 marks - good, reliable and relevant information sources were used and correctly referenced.

We have decided to give the paper 3/10 for internal validity because the author never used solid, proven facts to back up what was being said in the paper. They did, however, use their own knowledge to verify the validity of what they were saying.

External Validity (10 marks)
Has the author checked the references to ensure they are accurate?
0-3 marks - not at all, and they’re not accurate
4-7 marks - relevant but not accurate
8-10 marks - fully checked and accurate

We give the paper 8/10 for external validity. We have read some of the papers that have been used as sources and can confirm that they are accurate and have been used correctly.

Extent of research Methods (10 marks)
How much has the author made use of research methods, such as experiments or surveys?
0-3 marks - has not used any research methods
4-7 marks - has used some research methods, but mainly just other people’s work
8-10 marks - has used a range of research methods to back up findings, including going beyond what other people have done

We give the paper 2/10 because the author has not used any research methods other than workshop articles, journals and conferences. To improve this mark, the author should have made greater use of other research methods, such as questionnaires and surveys, which could have been used to back up the claims the paper makes about Watson.

Part 2: Complexity

Extent of Analysis of Research (25 marks)
0-2 marks - The researcher just recites facts from one source, and nothing more
3-10 marks - The researcher collects facts from a number of sources, and organises them, but does not go any further into analysis than this
11-18 marks - Researcher attempts to draw logical conclusion from the facts collected from numerous sources and ordered
19-25 marks - Researcher uses the relevant data collected to back up the conclusion they have come to from their question, and data from many sources correctly ordered and analysed

We are giving the paper 10/25 for extent of analysis of research because although the author of this paper has clearly done some extensive reading around this topic and is very knowledgeable of the Watson system, they have not attempted to draw any logical conclusions from facts collected.
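For reference, the marks awarded under each criterion above can be added up with a few lines of Python. The overall total and percentage are not something we reported at the time; they are simply the sum of the scores already listed in this assessment.

```python
# (awarded, available) marks for each criterion, as listed in the assessment above.
SCORES = {
    "Originality": (2, 6),
    "Relatedness": (6, 6),
    "Applicability": (4, 6),
    "Thorough background investigation": (8, 10),
    "Internal validity": (3, 10),
    "External validity": (8, 10),
    "Extent of research methods": (2, 10),
    "Extent of analysis of research": (10, 25),
}

awarded = sum(got for got, _ in SCORES.values())
available = sum(out_of for _, out_of in SCORES.values())
percentage = round(100 * awarded / available, 1)

print(f"{awarded}/{available} ({percentage}%)")
```

This works out at 43 marks out of a possible 83, or roughly 52%.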