How Can the Semantic Web Improve the Availability of Information?

After reading through research papers on the semantic web, I've written a paper-style piece on how it can improve the availability of information.

Abstract

The information and data held in large-scale databases and repositories can be shared globally via the World Wide Web. To find and retrieve this data, we need a piece of software that is able to search the content of these databases and return results. This is known generically as a search engine, and there are many available for different purposes. However, retrieving meaningful information is not easy, and semantic web search engines are being developed to overcome this. In this paper I present how both the semantic web and semantic search engines can transform the way information is retrieved and presented to the user.

Introduction

The semantic web is an extension of the World Wide Web that captures not just information but the meaning of that information. Its standards aim to ensure that data is structured in such a way that it can be dynamically found and accessed. The W3C standard “aims to provide a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.”[1] This will enable the data on the web to be processed by both machines and humans.

A semantic web search engine is different from the semantic web itself: it is a piece of software that aims to precisely locate and gather information which is published on the semantic web. It is almost intuitive in the way it responds to users’ queries. Rather than just searching web pages for occurrences of the words in the query, it comes close to understanding the context in which the user is searching. It also learns, and is able to apply additional information to the user’s query. Semantic web search engines are therefore able to answer users’ queries efficiently and effectively using intelligent algorithms. They do, however, rely on data being stored semantically and following conventional standards.

Related Work

§ Intelligent Semantic Web Search Engines: A Brief Survey  http://airccse.org/journal/ijwest/papers/0111ijwest03.pdf
    This paper gives an introduction to semantic web technologies, and then outlines several semantic web models and search engines, making it clear that different ones are used for different working environments. The paper is well written and easy to read, yet also in-depth and informative.
    This paper was published in January 2011, over 3 years ago. Due to the rapid increase in processing power, and hence the capabilities of computers, technology papers usually become outdated quite quickly. However, the semantic web is not a brand new concept: Tim Berners-Lee wrote a paper on it in 2001. The content of this paper is therefore still current, with the slight exception that a few newer semantic search engines, such as Kakia, are missing from the list.
    The paper was published in the International Journal of Web & Semantic Technology, a quarterly peer-reviewed journal. As it is international it is a well-respected body, and as it is peer-reviewed, other academics in the field have accepted the paper.
    It has had 13 citations, so it is a well-referenced paper.
    Two of the authors hold doctorates in their field, so they are experts on the semantic web.

§ Solving semantic ambiguity to improve semantic web based ontology matching http://oro.open.ac.uk/23564/1/gracia_om07.pdf
    This paper discusses one of the main problems with the semantic web, which is ambiguity, and puts forward several factors that will help reduce it.
    It was published in 2007, and some parts of it are outdated. For example, the developers of Google’s Knowledge Graph have come up with methods of reducing ambiguity that work in practice, yet these are not mentioned in this paper.
    It was published by the Open University, and does not appear to have gone through a formal published review process, which may count against its reliability.
    Also, the authors were not mentioned or were kept anonymous. Again, this does not suggest reliability, as they are probably not well-respected senior academics in the field.

Analysis

The Difference between Current Web Search Engines and Semantic Web Search Engines

It is very difficult for machines to understand the search string that a user enters if the information being searched lacks structure. A current web search engine has two elements to work with: the string that the user entered, and a huge mass of unorganised records. All it can do is find occurrences of each of the user’s keywords in documents across the web, and return the results with the most occurrences. Many search engines also rank results by other factors, such as the number of other web pages that link to that page, but the search engine itself has no understanding of what the user is searching for, or of what results it is returning.
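
As a rough illustration of this kind of keyword matching, here is a minimal Python sketch. The function name `keyword_search` and the three sample documents are made up for the example, and a real search engine is of course far more sophisticated; the point is only that the ranking comes from counting words, with no idea of what “jaguar” means.

```python
from collections import Counter

def keyword_search(query, documents):
    """Rank documents by how often the query's words occur in them."""
    keywords = query.lower().split()
    scores = {}
    for name, text in documents.items():
        counts = Counter(text.lower().split())
        score = sum(counts[word] for word in keywords)
        if score > 0:
            scores[name] = score
    # Highest number of keyword occurrences first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical documents, invented purely for illustration.
documents = {
    "car_review": "the jaguar is a fast car and the jaguar handles well",
    "zoo_guide": "the jaguar is a big cat found in south america",
    "recipes": "a quick guide to baking bread",
}

results = keyword_search("jaguar", documents)
print(results)
```

The car review is ranked above the zoo guide simply because the word appears in it twice, whichever kind of jaguar the user actually had in mind.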

However, a semantic web search engine, which relies on its data being organised semantically, interprets the user’s input string as an object, or as an object with a set of conditions. It does this by looking at occurrences of this object in a semantic database and seeing what relationships it has to other objects. It can therefore return results that are relevant not to the string the user entered, but to the object they referred to.
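
To make the idea of objects and relationships more concrete, here is a small Python sketch that stores facts as (subject, predicate, object) triples, which is the general shape used by semantic web data models such as RDF. The facts, the predicate names and the helper functions `related` and `find` are all assumptions made up for this example rather than part of any standard or real search engine.

```python
# Each fact is a (subject, predicate, object) triple.
# All of the facts and names below are invented for illustration.
TRIPLES = [
    ("Jaguar_(animal)", "is_a", "Big cat"),
    ("Jaguar_(animal)", "lives_in", "South America"),
    ("Jaguar_(car_maker)", "is_a", "Car manufacturer"),
    ("Jaguar_(car_maker)", "based_in", "United Kingdom"),
    ("Lion", "is_a", "Big cat"),
]

def related(subject, triples):
    """Return every (predicate, object) pair recorded for a subject."""
    return [(p, o) for s, p, o in triples if s == subject]

def find(predicate, obj, triples):
    """Return every subject that has the given relationship to an object."""
    return [s for s, p, o in triples if p == predicate and o == obj]

big_cats = find("is_a", "Big cat", TRIPLES)
jaguar_facts = related("Jaguar_(animal)", TRIPLES)
print(big_cats)
print(jaguar_facts)
```

Because the query is expressed in terms of an object and a relationship, the lion is returned as a big cat even though the word “jaguar” never appears next to it.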

Limitations of the Current Web

There is a lack of formal structure applied to representing the data, so semantic search principles simply cannot be applied to this information with any level of efficiency. There is also a massive amount of ambiguity in the information, which leads to little interconnection between pieces of it. There is currently no way to manage an enormous number of users or to ensure trust at all levels, and there are very few automatic transfer methods in place. Finally, due to the lack of a universal format, there is no way for machines to understand the provided information or make use of it.

Instant Responses to Question Queries

This is one of the areas Google have been working on recently. If you type in a query such as “What will the weather be on Sunday?”, it will be able to return a single figure, or in this case a weather summary, above all the search results. It will likely get your approximate location from your IP address, so the result will be accurate for where you are. It will also have more options below Sunday’s weather forecast, such as Saturday’s or Monday’s forecast. The search results below will of course be related, so there will be an entry for the MetOffice website. The adverts, too, will be relevant not to the search for weather forecasts but to the answer to the user’s question: for example, if it is going to be sunny, there may be a Tesco advert for BBQs. [3]

Searching “things” not “strings”

An example of a recent development in the semantic web is Google’s Knowledge Graph. This involves attempting to assign a unique ID to every real-world object mentioned on the internet, and then linking up relationships between objects to determine what is relevant to each. Therefore, when the user searches for something, it won’t be their query string that is searched for; it is instances of that object and the relationships it has with other objects. This means that the results returned will be much more accurate and intelligent, and aim to have what the user was looking for on the first page of results.
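
The step from a “string” to a “thing” can be sketched in Python as below. This is not how Google’s system works internally, which I have no details of; it is only a toy illustration in which the entity IDs, the related terms and the `resolve` function are all hypothetical. Two objects share the label “Jaguar”, and the other words in the query are used to decide which ID the user most likely means.

```python
# Hypothetical entity table: every ID, label and related term is made up.
ENTITIES = {
    "E1": {"label": "Jaguar", "type": "animal",
           "related": {"cat", "jungle", "prey"}},
    "E2": {"label": "Jaguar", "type": "car maker",
           "related": {"car", "engine", "dealer"}},
}

def resolve(query, entities):
    """Pick the ID of the entity whose label is in the query and whose
    related terms overlap most with the rest of the query."""
    words = set(query.lower().split())
    best_id, best_overlap = None, -1
    for entity_id, entity in entities.items():
        if entity["label"].lower() not in words:
            continue  # the query does not mention this entity at all
        overlap = len(words & entity["related"])
        if overlap > best_overlap:
            best_id, best_overlap = entity_id, overlap
    return best_id

car_id = resolve("jaguar engine size", ENTITIES)
animal_id = resolve("what prey does a jaguar hunt", ENTITIES)
print(car_id, animal_id)
```

Once the query has been resolved to an ID, the search can follow that object’s relationships instead of matching the original text.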

Potential Issues with Semantic Web Search Engines

While the majority of the web is still not semantic, there is likely to be low precision and high recall for some search strings, caused by relying on resources that are not semantic. It can also be hard, or even impossible, to identify the intentions of the user and hence return results that fit these intentions: their query could be ambiguous, or the information in the semantic web could relate to other entries that don’t fit with the user’s query. It is also possible for individual user patterns to be extrapolated to all users globally, even though users in different categories will probably be searching for different information even if the search string appears the same.
Finally, there are always going to be inaccurate queries. Users don’t always structure their queries in the most effective way, and often are not even sure themselves what they are looking for.
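
For readers unfamiliar with the terms, precision is the fraction of returned results that are relevant, and recall is the fraction of relevant results that were returned. The short Python sketch below works through a made-up case of low precision and high recall; the document names and the `precision_recall` function are assumptions for illustration only.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall for one set of search results."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 8 documents come back, but only 2 are relevant.
returned = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
relevant = ["d1", "d2"]

precision, recall = precision_recall(returned, relevant)
print(precision, recall)
```

Here every relevant document was found (recall of 1.0), but three quarters of what the user has to read through is noise (precision of 0.25).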

Work needs to be done to solve the issue of semantic ambiguity in order to improve semantic web based ontology matching. This is currently the biggest problem in the new paradigm of ontology matching put forward in 2008 by W3C. The problems are caused by ambiguity during the anchoring process of ontologies. [2]

Context

Potential Legal Issues

The current legal framework which governs how data can be processed was mainly written before the internet became as big as it is now. This becomes even more important as development of the semantic web increases and more data flows to and from servers and terminals.

Another issue is privacy. For a semantic web search engine to learn, it has to collect information from the user; this may mean storing what they are searching for, recording their behaviour or even storing personal details in order to return the most relevant search results.

There is also the issue of the reliability of the results. If someone has published an ontology with unreliable or outdated information, then the results returned to the user may not be reliable.

Potential Professional Issues

Having information which is much more accessible and reliable will increase efficiency in many areas, but this may lead to there being fewer jobs. For example, search engine optimisation will be a thing of the past, replaced by a much simpler process of just ensuring that your semantic document follows conventions.

Potential Ethical Issues

It may be hard to monitor and ensure that no inappropriate results are returned in a search if they have been linked to the object the user was searching for.

Potential Environmental Issues

Google’s Knowledge Graph, which is an early-stage semantic web search engine, already has over 2 billion relationships between objects, and that is just enough for testing their algorithms. Unless further advances in processing are made, semantic web searching is going to have a serious effect on the environment if it becomes popular. As it happens, Google are currently investing in quantum computing in order to try to cope with this increased demand for processing power.

Conclusion

The further development of the semantic web and semantic web search is very exciting, as it is making information so much more accessible, useful and relevant. There is, however, still a lot of development to do, as nearly all of the World Wide Web is currently unstructured.

References

[1] – The W3C Standard
[2] – Solving semantic ambiguity to improve semantic web based ontology matching
[3] – Where will the semantic Web take us with regard to improved access to Web-based information? – www.sajim.co.za/index.php/SAJIM/article/download/207/203
[4] – Algorithmic procedure for finding semantically related journals – http://www.garfield.library.upenn.edu/papers/pudovkinsemanticallyrelatedjournals2002.html
[5] – International Journals for Semantic Web Technology – http://www.airccse.org/journal/ijwest/ijwest.html
[6] – A Semantic-Web based Framework for Developing Applications to Improve Accessibility in the WWW – http://dl.acm.org/citation.cfm?id=1133238
[7] – How Can the Semantic Web Improve the Acquisition and Sharing of Knowledge?