NowOnWeb can be defined as a NewsIR system that deals with on-line news sources. It provides an effective and efficient way to present news articles on a specific topic to the user in a convenient form.
The system follows a component-based design and comprises a crawler to obtain the web pages, an indexer that maintains an incremental index with a temporal window, a news recognition and extraction module that allows sources to be added dynamically, a news grouping component that uses novelty and redundancy detection approaches, and a summariser, among others.
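The grouping step can be illustrated with a minimal sketch. It is not NowOnWeb's actual method: it assumes redundancy is measured as Jaccard overlap between word sets, and the names `tokens`, `jaccard`, `group_articles` and the `threshold` value are hypothetical choices made for illustration only.

```python
import re


def tokens(text):
    """Lowercase word set used as a crude content signature (assumption)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (equal)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_articles(articles, threshold=0.5):
    """Put each article in the first group whose seed article it resembles.

    An article that is not similar enough to any existing seed is treated
    as novel and opens a new group.
    """
    groups = []  # each group: {"seed": token set, "items": [article, ...]}
    for article in articles:
        signature = tokens(article)
        for group in groups:
            if jaccard(signature, group["seed"]) >= threshold:
                group["items"].append(article)  # redundant with this group
                break
        else:
            groups.append({"seed": signature, "items": [article]})  # novel
    return [group["items"] for group in groups]
```

Comparing each article only against the first article of a group keeps the sketch short; a production system would more plausibly compare against a group centroid or use weighted term vectors.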
As a commercial product, NowOnWeb offers:
A web service that delivers news matching the user's information needs.
An application that is flexible and adaptable to the needs of users and their areas.
An efficient and scalable product with effective results.
Component-based software whose modules can be reused in different applications.
A product well suited to the press departments of companies and institutions, which can also be adapted to other fields such as technology surveillance or vertical search.
Lexisla
Lexisla can be defined as a LegalIR system that crawls, segments and indexes official legal bulletins. It is currently under development with the support of Fundación Calidade.
The system is composed of different modules, such as a specific crawler, a document analyser component, an indexer and an advanced search interface.
As a commercial product, Lexisla offers:
Flexible multisource segmentation algorithm
Efficient document analysis and processing
Automatic content updating
Advanced local browsing over the bulletins
Advanced search capabilities
The Coruña Corpus Tool
The Coruña Corpus Tool (CCT) is a development carried out by the IRLab in collaboration with the English Department. The application arose from the Muste Group's need for a system to manage and exploit its linguistic corpus.
The objective is to help linguists extract and condense valuable information for their research. However, the application was not designed to be tied to the Coruña Corpus: it supports any XML-formatted corpus and could therefore be widely used.
As a commercial product, the CCT offers:
Linguistic corpus management, covering not only document text but also author information and styled document rendering.
Processing and validation of TEI-encoded documents, with support for non-standard characters. Format errors are reported so that linguists can correct them.
Basic single-term search within a document and across the collection.
Concordance generation (keyword in context) for all occurrences of a term, with their location in the document.
Prefix, suffix and regular-expression search, which is very useful for linguistic work.
Phrase search with a specified term distance, for finding linguistic structures.
Generation of type and token lists at document and collection level, to allow statistical study of term occurrences.
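The concordance and type/token features listed above can be illustrated with a minimal sketch. It is not the CCT's implementation: the function names `tokenize`, `kwic` and `type_token_counts` are hypothetical, and tokenisation is assumed to be a simple regular expression over plain text rather than anything TEI-aware.

```python
import re
from collections import Counter


def tokenize(text):
    """Split plain text into lowercase word tokens (simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def kwic(text, term, width=3):
    """Keyword-in-context rows for every occurrence of `term`.

    Each row is (token position, left context, keyword, right context),
    with up to `width` tokens of context on each side.
    """
    words = tokenize(text)
    term = term.lower()
    rows = []
    for i, word in enumerate(words):
        if word == term:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            rows.append((i, left, word, right))
    return rows


def type_token_counts(text):
    """Return (number of tokens, number of types, frequency list)."""
    words = tokenize(text)
    frequencies = Counter(words)
    return len(words), len(frequencies), frequencies.most_common()
```

For example, `kwic("The sea is calm and the sea is wide", "sea", width=2)` yields one row per occurrence of "sea", each with its token position and up to two words of context on either side. Collection-level lists could be obtained by running the same counts over the concatenated documents.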
Copyright 2006-2010 IRLab. Departamento de Computación Universidad de A Coruña