A prototype of Visualizing Information Space In Ontological Networks (VISION) will be designed and implemented for displaying the dynamic information landscape (2D and 3D maps) and showing the spread of concepts, ideas, and news.
The overall system framework is illustrated in Figure 1. The initial search will use pre-defined keywords in the three selected topics (e.g., natural disasters, continuous threats to human beings, and radical social movements) provided by domain experts to search publicly accessible websites through popular search engines such as Google Search, Microsoft Bing, or Yahoo.
The search results will be converted into a [Raw Text Database], which includes all search results (rankings, titles, partial contents, and URLs). The system will use the URLs, gazetteers, and geo-locating methods to convert the raw texts into [Spatial Web Information Databases], which include both a geospatial location (latitude and longitude) and semantic content (keywords) for each record.
By using the WHOIS protocol (which returns the registration record, including the registration address, for a Web domain name or IP address; http://www.whois.net/) and online gazetteer services, including the Alexandria Digital Library Gazetteer server and the GeoNames server, we can convert both the key place names mentioned in a Web page and Web addresses (URLs) into real places (with latitudes and longitudes).
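The URL-to-location step described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `GAZETTEER` and `HOST_REGISTRATION` tables, their approximate coordinates, and all function names are hypothetical stand-ins for a gazetteer service (such as GeoNames) and for WHOIS lookups, which a real system would query over the network.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for a gazetteer service: place name -> (lat, lon).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "san diego": (32.7157, -117.1611),
    "santa barbara": (34.4208, -119.6982),
}

# Hypothetical stand-in for WHOIS results: host name -> registered city.
HOST_REGISTRATION = {
    "www.example.org": "San Diego",
}


def host_of(url):
    """Return the lower-cased host name of a URL, or None if it has none."""
    return urlparse(url).hostname


def geocode_place(name):
    """Look up a place name in the gazetteer; return (lat, lon) or None."""
    return GAZETTEER.get(name.strip().lower())


def locate_url(url):
    """Map a URL to coordinates via its host's registration address."""
    place = HOST_REGISTRATION.get(host_of(url))
    return geocode_place(place) if place else None
```

Keeping the host lookup and the place-name lookup as separate functions mirrors the two sources of location named in the text: the server registration address and the place names mentioned on the page itself.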
Domain experts will also review these texts and use various tools from computational linguistics, GIS, and gazetteers to identify new keywords, key phrases, and related place names in the spatial web information databases. The databases (created in MS SQL Server) will be converted to [Visualization maps] showing the dynamic information landscape of specific ideas or concepts.
GIS, calculation of network connectivity, and space-time analysis will be used to understand the dynamic change of these concepts and events over space and time. Computational linguistics experts will establish frequencies of occurrences of “key terms,” separately and in clusters.
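As a rough sketch of what counting "key terms" separately and in clusters could look like, the following Python example counts individual term occurrences and the number of texts in which pairs of terms co-occur. Treating a "cluster" as a co-occurring pair within one text is an assumption made for illustration; the function names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def _pattern(term):
    """Whole-word, case-insensitive pattern for one key term."""
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def term_frequencies(texts, key_terms):
    """Count how many times each key term occurs across all texts."""
    counts = Counter()
    for term in key_terms:
        pattern = _pattern(term)
        counts[term] = sum(len(pattern.findall(text)) for text in texts)
    return counts


def cluster_frequencies(texts, key_terms):
    """Count the texts in which each pair of key terms appears together."""
    patterns = {term: _pattern(term) for term in key_terms}
    pairs = Counter()
    for text in texts:
        present = sorted(t for t, p in patterns.items() if p.search(text))
        pairs.update(combinations(present, 2))
    return pairs
```

For example, calling both functions on a list of page snippets with the terms `"earthquake"` and `"flood"` yields per-term totals and a count for the pair `("earthquake", "flood")`.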
Multiple [Semantic Knowledge bases] (ontologies) related to ideas, concepts, and special topics will be created and revised based on the visualization maps and space-time analysis. The revised ontology terms and phrases will be used for the next round of Web queries.
New web pages and websites discovered through these advanced keyword clusters will generate new records in the [Spatial Web Information Databases].
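The iterative cycle described above (query, collect records, revise terms, query again) can be sketched as a simple loop. This is only an illustration under stated assumptions: `search` and `extract_terms` are hypothetical callables supplied by the caller, standing in for a search engine and for the expert-driven ontology revision, respectively.

```python
def run_query_rounds(seed_keywords, search, extract_terms, rounds=2):
    """Repeat search and term extraction, feeding new terms into the next round.

    search(keyword) must return an iterable of (url, text) pairs.
    extract_terms(text) must return an iterable of candidate keywords.
    """
    keywords = set(seed_keywords)
    seen_urls = set()
    records = []
    for _ in range(rounds):
        new_terms = set()
        for keyword in sorted(keywords):
            for url, text in search(keyword):
                if url in seen_urls:
                    continue  # keep only newly discovered pages
                seen_urls.add(url)
                records.append({"url": url, "text": text, "keyword": keyword})
                new_terms.update(extract_terms(text))
        if new_terms <= keywords:
            break  # no new terms were found, so another round adds nothing
        keywords |= new_terms
    return keywords, records
```

Because the search engine is passed in as a function, the loop itself does not depend on any particular engine.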
Table 1. The major computational tools adopted in VISION.
|Tool|Function|Source|
|---|---|---|
|Protégé|Ontology creation tool|Free (open source)|
|ArcGIS|GIS and mapping software|Department of Geography, SDSU|
|OWL (Web Ontology Language)|Ontology language|Free (World Wide Web Consortium)|
|CiteMapper|Citation matrix system|Department of Linguistics, SDSU|
|ThesaurusBuilder|Thesaurus creation|Department of Linguistics, SDSU|
|ADL Gazetteer Server|Gazetteer server|UC Santa Barbara (free web services)|
|GeoNames|Geolocation of place names|Marc Wick (free online services)|
|SAS|Statistical analysis|Department of Geography, SDSU|
|Fragstats|Calculation of spatial indices|Department of Geography, SDSU|
|WHOIS protocol|Converts URLs and IP addresses into server registration addresses|American Registry for Internet Numbers (ARIN) (free)|
The visualization maps and network analysis will constitute data for further quantitative analysis to enrich and refine the search algorithm and to learn more about the nature and specificity of ideas and their characteristic textual architectures.
Several key tools, web services, and technologies will be used in the prototype development of SWARMS. These tools are listed in Table 1. One advantage of this SWARMS framework is its language- and search-engine-independent architecture.
This framework can be used to query keywords in multiple languages (such as Chinese, Arabic, or Japanese) and with multiple Web search engines. However, multilingual semantic analysis and translation work would require much larger resources and financial support, which are beyond the scope of this proposed project.
The design of the SWARMS prototype will also focus on the appropriate spatial scales for mapping the locations of web pages and websites.
We will analyze the potential errors (uncertainty) in the geolocation process and the accuracy of web server registration addresses.
For example, a personal weblog might be published on a commercial website, in which case the weblog URL will be linked only to the commercial web server rather than to the individual blogger's location.
In such cases, we might test maps at different scales (from country level to city level to street level) and examine the relationship between spatial scale and the keywords.
These spatial scale questions will be the focus of the second-year improvement of SWARMS.
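One way to explore the relationship between spatial scale and keywords is to aggregate the same records at country, city, and street level and compare the counts. The sketch below assumes a hypothetical record layout in which each record carries a keyword and a `place` tuple ordered from coarse to fine (country, city, street); neither the layout nor the function name comes from the proposal.

```python
from collections import Counter

# Number of leading components of the place tuple used at each scale.
SCALE_DEPTH = {"country": 1, "city": 2, "street": 3}


def count_by_scale(records, scale):
    """Count keyword occurrences per spatial unit at the chosen scale."""
    depth = SCALE_DEPTH[scale]
    counts = Counter()
    for record in records:
        unit = tuple(record["place"][:depth])  # e.g. ("US", "San Diego")
        counts[(unit, record["keyword"])] += 1
    return counts
```

Comparing the output for `"country"` and `"city"` on the same records shows how a keyword that looks concentrated at the coarse scale may be spread across several units at a finer one.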