How Can the Semantic Web Improve the Availability of Information?
After reading through research papers on the semantic web, I've written a paper-style piece on how it can improve the availability of information.
Abstract
The information and data held in large-scale databases and repositories
can be shared globally via the World Wide Web. To find and retrieve this
data, we need software that can search the content of these databases and
return results; this is known generically as a search engine, and there are
many available for different purposes. However, retrieving meaningful
information is not easy, and semantic web search engines are being developed
to overcome this. In this paper I present how both the semantic web and
semantic search engines can transform the way information is retrieved and
presented to the user.
Introduction
The semantic web is an extension of the World Wide Web that carries not
just information but also the meaning of that information. Its standards aim
to ensure that data is structured in a way that allows it to be found and
accessed dynamically. The W3C standard “aims to provide a common framework
that allows data to be shared and reused across application, enterprise, and
community boundaries.”[1] This will enable data on the web to be processed by
machines as well as by humans.
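The W3C's common framework for this is RDF (the Resource Description Framework), which models data as statements of subject, predicate and object. The sketch below is a minimal illustration in plain Python rather than a real RDF library; the `http://example.org/` namespace, the `Triple` type and the `match` helper are hypothetical names chosen for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace; real RDF data would use published vocabularies.
EX = "http://example.org/"

triples = [
    Triple(EX + "TimBernersLee", EX + "invented", EX + "WorldWideWeb"),
    Triple(EX + "TimBernersLee", EX + "directorOf", EX + "W3C"),
    Triple(EX + "W3C", EX + "publishes", EX + "RDF"),
]


def match(store, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]


# Everything the data says about one resource, whichever source stated it.
for t in match(triples, s=EX + "TimBernersLee"):
    print(t.predicate, "->", t.obj)
```

Because every statement has the same shape, a program can answer a question such as "who publishes RDF?" with `match(triples, p=EX + "publishes", o=EX + "RDF")` without having to parse any prose.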
A semantic web search engine is different from the semantic web itself: it
is a piece of software that aims to precisely locate and gather information
published on the semantic web. It responds to users' queries almost
intuitively. Rather than just searching web pages for occurrences of the words
in the user's query, it comes close to understanding the context of the
search, and it also learns and can apply additional information to the query.
Semantic web search engines are therefore able to answer users' queries
efficiently and effectively using intelligent algorithms. However, they rely on
data being stored semantically and following common standards.
Related Work
§ Intelligent Semantic Web Search Engines: A
Brief Survey http://airccse.org/journal/ijwest/papers/0111ijwest03.pdf
This paper gives an introduction to semantic web technologies and then
outlines several semantic web models and search engines, making it clear that
different ones suit different working environments. The paper is well written
and easy to read, yet also in-depth and informative.
This paper was published in January 2011, over three years ago. Because
processing power, and with it the capability of computers, increases so
rapidly, technology papers usually become outdated quite quickly. However, the
semantic web is not a brand new concept: Tim Berners-Lee wrote a paper on it
in 2001. The content of this paper is therefore still current, with the slight
exception that a few newer semantic search engines, such as Kakia, are missing
from its list.
The paper was published in the International Journal of Web & Semantic
Technology, a quarterly peer-reviewed journal. As an international journal it
is a well-respected body, and because it is peer-reviewed, other academics in
the field have reviewed and accepted the paper.
It has been cited 13 times, so it is a well-referenced paper.
Two of the authors hold doctorates in the field, so they can be regarded as
experts on the semantic web.
§ Solving semantic ambiguity to improve
semantic web based ontology matching http://oro.open.ac.uk/23564/1/gracia_om07.pdf
This paper discusses one of the main problems with the semantic web,
ambiguity, and puts forward several factors that will help reduce it.
It was published in 2007, and some parts of it are now outdated. For
example, the developers of Google's Knowledge Graph have since come up with
methods of reducing ambiguity that work in practice, and these are not covered
in this paper.
It was published by the Open University and does not appear to have gone
through a formal peer-review process, which counts against its reliability.
The authors are also not named, which again counts against its reliability,
since there is no way to tell whether they are well-respected senior academics
in the field.
Analysis
The Difference between Current Web Search Engines and Semantic Web Search Engines
It is very difficult for machines to understand the search string a user
enters if the information being searched lacks structure. A current web search
engine works with two elements: the string the user entered and a huge mass of
unorganised records. All it can do, therefore, is find occurrences of each of
the user's keywords in documents across the web and return the results with
the most occurrences of the search string. Many search engines also rank
results by other factors, such as the number of other web pages that link to a
page, but the search engine itself has no understanding of what the user is
searching for or of what results it is returning.
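A minimal sketch of the string-matching approach described above, assuming a small in-memory collection of documents and ignoring link-based ranking signals. The function and document names are hypothetical, and real engines use far more elaborate scoring; the point is only that the score depends on words, not meaning.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_rank(query: str, documents: dict[str, str]) -> list[tuple[str, int]]:
    """Score each document by how often the query's words occur in it."""
    terms = set(tokenize(query))
    scores = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score > 0:
            scores.append((doc_id, score))
    # Highest count first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


docs = {
    "car-review": "The Jaguar XE is a jaguar saloon car. Jaguar cars are British.",
    "wildlife": "The jaguar is a big cat native to the Americas.",
    "zoo": "Visit the zoo to see lions and tigers.",
}

print(keyword_rank("jaguar speed", docs))
```

The car review comes first simply because it repeats the word "jaguar" most often; the engine has no idea whether the user meant the car or the animal.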
A semantic web search engine, by contrast, relies on its data being
organised semantically and interprets the user's input string as an object, or
as an object with a set of conditions. It does this by looking up occurrences
of that object in a semantic database and seeing what relationships it has to
other objects. It can therefore return results that are relevant not merely to
the string the user entered but to the object they referred to.
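The object-based lookup can be sketched over subject, predicate, object statements of the kind an RDF store holds. This assumes the query has already been resolved to an entity ID, which is the difficult step and is not shown; the IDs such as `e:jaguar_cars`, the predicate names and both helper functions are hypothetical.

```python
# Facts keyed by hypothetical entity IDs rather than by words.
TRIPLES = [
    ("e:jaguar_animal", "type", "Species"),
    ("e:jaguar_animal", "habitat", "e:amazon"),
    ("e:jaguar_cars", "type", "CarMaker"),
    ("e:jaguar_xe", "manufacturer", "e:jaguar_cars"),
    ("e:jaguar_xe", "type", "CarModel"),
    ("e:jaguar_ftype", "manufacturer", "e:jaguar_cars"),
    ("e:jaguar_ftype", "type", "CarModel"),
]


def related(entity, predicate):
    """Subjects linked to `entity` through `predicate` (incoming relationships)."""
    return [s for s, p, o in TRIPLES if p == predicate and o == entity]


def query(entity, predicate, conditions):
    """Related subjects that also satisfy every (predicate, value) condition."""
    facts = set(TRIPLES)
    return [
        s for s in related(entity, predicate)
        if all((s, cp, cv) in facts for cp, cv in conditions)
    ]


# "Models made by Jaguar the car maker": object e:jaguar_cars plus one condition.
print(query("e:jaguar_cars", "manufacturer", [("type", "CarModel")]))
```

Nothing about the animal can appear in these results, because the query follows the car maker's relationships rather than every page containing the word "jaguar".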
Limitations of the Current Web
The data on the current web lacks a formal structure, so semantic search
principles simply cannot be applied to it with any level of efficiency.
Information is also highly ambiguous, which leaves it poorly interconnected.
There is currently no way to manage an enormous number of users or to ensure
trust at all levels, and very few automatic transfer methods are in place.
Finally, without a universal format, machines cannot understand the
information provided or make use of it.
Instant responses to question queries
This is one of the areas Google has been working on recently. If you type in
a query such as “What will the weather be on Sunday?”, it returns a single
figure, or in this case a weather summary, above all the search results. It
will probably work out your approximate location from your IP address, so the
forecast is relevant to where you are. It also offers further options below
Sunday's forecast, such as Saturday's or Monday's. The search results below are
of course related, so there will be an entry for the Met Office website. The
adverts may also be relevant not to the search for weather forecasts as such
but to the answer to the user's question: for example, if it is going to be
sunny, there may be a Tesco advert for BBQs. [3]
Searching “things” not “strings”
An example of a recent development towards the semantic web is Google's
Knowledge Graph. This attempts to assign a unique ID to every real-world object
mentioned on the internet and then link up the relationships between objects to
determine what is relevant to each. When the user searches for something, it is
therefore not their query string that is searched for but instances of that
object and the relationships it has with other objects. This should make the
returned results more accurate and intelligent, with the aim of putting what the
user was looking for on the first page of results.
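A toy illustration of searching for things rather than strings, assuming a graph in which each entity has a unique ID and a few words describing what it is linked to. The IDs, the word lists and the overlap heuristic in `resolve` are all hypothetical; this is not how Google's Knowledge Graph actually resolves queries.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    id: str      # hypothetical unique ID, one per real-world thing
    label: str   # the name people type, which several things may share
    related: set[str] = field(default_factory=set)  # words for linked entities


GRAPH = [
    Entity("kg:001", "Jaguar", {"cat", "rainforest", "predator"}),
    Entity("kg:002", "Jaguar", {"car", "saloon", "engine"}),
]


def candidates(label: str) -> list[Entity]:
    """Every entity whose name matches the string; one string may name several things."""
    return [e for e in GRAPH if e.label.lower() == label.lower()]


def resolve(term: str, context: str) -> Optional[Entity]:
    """Choose the candidate whose related words overlap most with the query's words.

    A deliberately simple heuristic for illustration; on a tie, max() keeps
    the first candidate, so genuine ambiguity is not really resolved.
    """
    matches = candidates(term)
    if not matches:
        return None
    words = set(context.lower().split())
    return max(matches, key=lambda e: len(e.related & words))


print(resolve("jaguar", "top speed of the jaguar saloon car").id)            # kg:002
print(resolve("jaguar", "where does the jaguar hunt in the rainforest").id)  # kg:001
```

The same string "jaguar" leads to two different IDs; once the query is tied to one of them, the search can follow that entity's relationships instead of matching text.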
Potential Issues with Semantic Web Search Engines
While the majority of the web is still not semantic, some search strings are
likely to produce results with low precision and high recall, because the engine
has to rely on resources that are not semantic. It can also be hard, or even
impossible, to identify the user's intentions and return results that fit them:
their query could be ambiguous, or the information in the semantic web could
relate to other entries that do not fit the user's query. There is also a risk
that individual user patterns are extrapolated to all users, even though users
in different categories will probably be searching for different information
when their search strings appear the same.
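Precision and recall here are the standard information-retrieval measures. With R the set of documents that are actually relevant to a query and A the set of documents the engine returns:

```latex
\text{precision} = \frac{|R \cap A|}{|A|}, \qquad
\text{recall} = \frac{|R \cap A|}{|R|}
```

For a hypothetical query that returns 100 results, 20 of them relevant, out of 25 relevant documents in total, precision is 20/100 = 0.2 while recall is 20/25 = 0.8: most of what the user wanted was found, but it is buried among many irrelevant results.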
Finally, there will always be inaccurate queries: users do not always
structure their queries in the most effective way, and often are not even sure
themselves what they are looking for.
Work needs to be done to solve the issue of semantic ambiguity in order to
improve semantic web based ontology matching. Ambiguity is currently the biggest
problem facing this new paradigm of ontology matching, and it arises during the
anchoring process of ontologies. [2]
Context
Potential Legal Issues
The current legal framework governing how data can be processed was mainly
written before the internet became as big as it is now. This matters more and
more as development of the semantic web accelerates and more data flows to and
from servers and terminals.
Another issue is privacy: for a semantic web search engine to learn, it has
to collect information from the user. This may mean storing what they search
for, recording their behaviour, or even storing personal details in order to
return the most relevant search results.
There is also the issue of the reliability of results: if someone has
published an ontology containing unreliable or outdated information, the
results returned to the user may not be reliable either.
Potential Professional Issues
Having information that is much more accessible and reliable will increase
efficiency in many areas, but this may lead to fewer jobs. For example, search
engine optimisation could become a thing of the past, replaced by the much
simpler process of ensuring that your semantic document follows conventions.
Potential Ethical Issues
It may be hard to monitor searches and ensure that no inappropriate results
are returned when such content has been linked to the object the user was
searching for.
Potential Environmental Issues
Google's Knowledge Graph, an early-stage semantic web search technology,
already holds over 2 billion relationships between objects, and even that is
only enough for testing their algorithms. Unless further advances in processing
are made, semantic web searching could have a serious effect on the environment
if it becomes popular, because of the processing power, and therefore energy, it
demands. Google is currently investing in quantum computing to try to cope with
this increased demand for processing power.
Conclusion
The further development of the semantic web and semantic web search is very
exciting, as it promises to make information far more accessible, useful and
relevant. There is, however, still a lot of development to do, as nearly all of
the World Wide Web is currently unstructured.
References
[1] The W3C Standard
[2] Solving semantic ambiguity to improve semantic web based ontology matching. http://oro.open.ac.uk/23564/1/gracia_om07.pdf
[3] Where will the semantic Web take us with regard to improved access to Web-based information? www.sajim.co.za/index.php/SAJIM/article/download/207/203
[4] Algorithmic procedure for finding semantically related journals. http://www.garfield.library.upenn.edu/papers/pudovkinsemanticallyrelatedjournals2002.html
[5] International Journal of Web & Semantic Technology. http://www.airccse.org/journal/ijwest/ijwest.html
[6] A Semantic-Web based Framework for Developing Applications to Improve Accessibility in the WWW. http://dl.acm.org/citation.cfm?id=1133238
[7] How Can the Semantic Web Improve the Acquisition and Sharing of Knowledge?
[8] Google Knowledge Graph. http://www.google.co.uk/insidesearch/features/search/knowledge.html