<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<title>Apache SDAP - Science Data Analytics Platform</title>
<link rel="shortcut icon" href="/favicon.ico" />
<link rel="icon" type="image/png" href="/favicon.png" />
<link rel="stylesheet" href="/css/bootstrap.min.css" />
<link rel="stylesheet" href="/css/style.css" />
</head>
<body>
<div class="container">
<div class="logos">
<a href="/">
<img src="images/sdap_logo.png" class="pull-left" />
</a>
</div>
<!-- navigation bar -->
<nav class="navbar navbar-default">
<div class="container-fluid">
<div class="navbar-header">
<a class="navbar-brand" href="/">SDAP</a>
</div>
<div class="navbar-right">
<ul class="nav navbar-nav">
<li class="dropdown toggle">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">About SDAP <span class="caret"></span></a>
<ul class="dropdown-menu">
<li><a href="/docs">Docs</a></li>
<li><a href="/publications">Publications</a></li>
<li><a href="/projects">Projects that use SDAP</a></li>
<li><a href="/events">Community Events</a></li>
</ul>
</li>
<li><a href="/downloads">Downloads</a></li>
<li><a href="/blog">Blog</a></li>
<li><a href="/team">Team & Community</a></li>
<!-- <li><a href="/resources">Resources</a></li>-->
<li class="dropdown toggle">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Apache <span class="caret"></span></a>
<ul class="dropdown-menu">
<li><a href="http://www.apache.org/">Apache Software Foundation</a></li>
<li><a href="http://www.apache.org/licenses/">License</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship">Sponsorship</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
<li><a href="http://www.apache.org/events/current-event/">Events</a></li>
<li><a href="http://www.apache.org/security/">Security</a></li>
</ul>
</li>
</ul>
</div>
</div>
</nav>
<h1>An introduction to MUDROD vocabulary similarity calculation algorithm</h1>
<p>Posted <b>2018-04-23</b> by <b>Lewis John McGibbney</b></p>
<p>Big geospatial data have been produced, archived and made available online, but finding the right data for scientific research and decision-support applications remains a significant challenge. A long-standing problem in data discovery is how to locate, assimilate and utilize the semantic context of a given query. Most past research in the geospatial domain has attempted to solve this problem through one of two approaches: 1) manually building a domain-specific ontology; 2) automatically discovering semantic relationships from dataset metadata using machine learning techniques. The former captures rich expert knowledge but is static, costly, and labour intensive, while the latter is automatic but prone to noise.</p>
<p>An emerging trend in information science is to take advantage of large-scale user search history, which is dynamic but contains noise generated by users and web crawlers. To leverage the benefits of all three approaches while avoiding their weaknesses, a novel approach is proposed in this article to: 1) discover semantic relationships between vocabulary terms from user clickstream data; 2) refine the methods for calculating similarity from an existing ontology; and 3) integrate the results of ontology, metadata, user search history and clickstream analysis to better determine semantic relationships.</p>
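<p>As a rough illustration of the clickstream idea, the sketch below compares queries by the datasets users clicked after issuing them: each query becomes a vector of click counts per dataset, and other queries are ranked by cosine similarity to it. This is a simplified, hypothetical example, not MUDROD's implementation. The click data, dataset names and function names (<code>build_click_vectors</code>, <code>cosine</code>, <code>most_similar</code>) are assumptions made for illustration, and the actual system may weight or transform the query-dataset data differently.</p>
<pre><code class="language-python">import math
from collections import defaultdict

# Hypothetical clickstream: (query, clicked dataset) pairs, as might be
# extracted from pre-processed web logs. All names here are illustrative.
clicks = [
    ("sea surface temperature", "dataset_A"),
    ("sea surface temperature", "dataset_B"),
    ("sst", "dataset_A"),
    ("sst", "dataset_B"),
    ("sst", "dataset_C"),
    ("ocean wind", "dataset_C"),
]


def build_click_vectors(pairs):
    """Map each query to a sparse vector of click counts per dataset."""
    vectors = defaultdict(lambda: defaultdict(int))
    for query, dataset in pairs:
        vectors[query][dataset] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(count * v.get(key, 0) for key, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(term, vectors, top_n=3):
    """Rank the other queries by clickstream similarity to `term`."""
    # .get avoids inserting an unknown term into the defaultdict.
    target = vectors.get(term, {})
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != term]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]


vectors = build_click_vectors(clicks)
# "sst" ranks first (about 0.816); "ocean wind" shares no clicks and scores 0.0.
print(most_similar("sea surface temperature", vectors))</code></pre>
<p>Because "sst" and "sea surface temperature" lead users to the same datasets, they score highly even though they share no words, which is the kind of relationship a plain keyword match would miss.</p>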
<center>
<img src="/images/vocabulary.png" />
Figure 1. System workflow and architecture
</center>
<p>The system starts by pre-processing raw web logs, metadata, and the ontology (Figure 1). After the pre-processing step, search history and clickstream data are extracted from the raw logs, selected properties are extracted from the metadata, and ocean-related triples are extracted from the SWEET ontology. These four types of processed data are then fed into their corresponding processors, as discussed in the last section. Once all the processors have finished, the results of the different methods are integrated to produce a final list of the most related terms.</p>
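<p>The final integration step can be pictured as merging the per-method scores into a single ranking. The sketch below assumes each processor returns similarity scores already normalized to [0, 1] and combines them with a weighted sum. The method names, weights, terms and scores are hypothetical, and the actual MUDROD integration may use a different scheme.</p>
<pre><code class="language-python">from collections import defaultdict

# Hypothetical per-method similarity scores for the query "sea surface
# temperature", each assumed to be normalized to [0, 1] already.
method_scores = {
    "search_history": {"sst": 0.9, "ocean temperature": 0.6},
    "clickstream": {"sst": 0.8, "ghrsst": 0.7},
    "metadata": {"ocean temperature": 0.5, "ghrsst": 0.4},
    "ontology": {"ocean temperature": 0.9},
}
# Illustrative weights; here the metadata processor is trusted less.
weights = {"search_history": 1.0, "clickstream": 1.0, "metadata": 0.5, "ontology": 1.0}


def integrate(method_scores, weights, top_n=5):
    """Combine per-method scores with a weighted sum and keep the top terms."""
    combined = defaultdict(float)
    for method, scores in method_scores.items():
        weight = weights.get(method, 0.0)  # unknown methods contribute nothing
        for term, score in scores.items():
            combined[term] += weight * score
    ranked = sorted(combined.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


for term, score in integrate(method_scores, weights):
    print(f"{term}: {score:.2f}")</code></pre>
<p>A weighted sum lets terms supported by several independent sources rise above terms suggested by only one, possibly noisy, source.</p>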
<div>
<b>Previous:</b> <a href="/weekly/update/2018/04/23/recommendation-algorithms.html">An introduction to MUDROD recommendation algorithm</a>
</div>
<div>
<b>Next:</b> <a href="/release/2023/01/20/v1.0.0-release.html">V1.0.0 Release</a>
</div>
<!-- footer -->
<nav class="navbar navbar-default">
<div class="navbar-header">
<a class="navbar-brand" href="">SDAP</a>
</div>
<div class="navbar-text pull-right">© 2017-2025 The Apache Software Foundation. Licensed under <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License 2.0</a>. <a href="https://privacy.apache.org/policies/privacy-policy-public.html">Privacy Policy</a><br/>
Apache SDAP, SDAP, Apache, the Apache feather logo, and the Apache SDAP project logo are trademarks of The Apache Software Foundation.</div>
</nav>
<script src="/js/jquery.min.js"></script>
<script src="/js/bootstrap.min.js"></script>
</div>
</body>
</html>