published by Michael Hausenblas on Thu, 2011-09-08 09:24
Based on the LATC 24/7 Platform interfaces and their interplay, this report identifies the main performance bottlenecks of the LATC 24/7 Platform and provides eight preventive actions for addressing them. We then give a detailed account of how we analysed the 24/7 Platform and identified the bottlenecks.
published by Michael Hausenblas on Thu, 2011-09-08 09:22
Explores different models of distributed computing that are or could be used for the LATC 24/7 Platform. Altogether, the Linking Engine from WP1T3 and the Quality Module from WP1T4 are expected to process billions of triples in order to create, evaluate and maintain millions of RDF links between the data sources. This report provides an overview of relevant techniques and reports on the one currently deployed.
published by Michael Hausenblas on Thu, 2011-09-08 09:18
Describes how quality assurance (QA) works in the LATC 24/7 Platform. The methods presented in the report mainly involve detecting and assessing the quality of RDF links. We distinguish between internal quality assurance, which happens within the LATC platform, and external quality assurance, which involves crawling the Web of Data and computing a number of metrics to assess its quality.
published by Michael Hausenblas on Thu, 2011-09-08 09:15
Describes the LATC Linking Engine and its setup as part of the LATC 24/7 Platform. It first introduces Silk, the Linking Engine for LATC, and then presents its integration into the 24/7 Platform via the LATC Runtime.
published by Michael Hausenblas on Thu, 2011-09-08 09:12
Describes the requirements of the Dataset Inventory (DSI) and how its interface seeks to meet them; it discusses the data sources combined in the Metadata Store (MDS) and how the DSI is initially implemented.
published by Michael Hausenblas on Thu, 2011-09-08 08:51
Describes the LATC 24/7 Platform design and design goals. It defines the Platform scope as well as the target users. The 24/7 Platform components are introduced, the interfaces between the components are defined and the workflow to generate links using the 24/7 Platform is described. See also the GitHub repository for the current state ...
published by Michael Hausenblas on Thu, 2011-09-08 08:47
Describes the capabilities and deployment of the Data Crawler and Indexer, including data acquisition, supported formats, and synchronization of datasets.