This tool is designed to detect and analyze near duplicates in software documentation. Two text fragments are considered near duplicates if they share common information expressed identically in terms of syntax (i.e., using the same text) while still differing in a number of places, provided these differences do not make up the bulk of either fragment.
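To make the definition concrete, here is a minimal sketch. It assumes that fragments are compared word by word and that "differences do not make up the bulk" means identical text covers more than half of each fragment. The function `is_near_duplicate` and its `min_shared` threshold are hypothetical illustrations, not the tool's actual method. The identical word runs are found with Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def is_near_duplicate(a: str, b: str, min_shared: float = 0.5) -> bool:
    """Hypothetical near-duplicate test (an assumption, not the tool's algorithm).

    Fragments count as near duplicates when identical word runs cover more
    than `min_shared` of EACH fragment, yet the fragments are not identical.
    """
    words_a, words_b = a.split(), b.split()
    if not words_a or not words_b:
        return False  # empty fragments carry no shared information
    if words_a == words_b:
        return False  # exact copies: no differences at all, so not "near"
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    # Sum the lengths of all identical word runs (the final dummy block has size 0).
    shared = sum(block.size for block in matcher.get_matching_blocks())
    return (shared / len(words_a) > min_shared
            and shared / len(words_b) > min_shared)
```

For example, "Press the Start button to launch the application." and "Press the Start button to launch the editor." share 7 of their 8 words and qualify, while a sentence about configuring a network proxy shares only the word "the" with either of them and does not.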
Our tool works in two modes: automatic and interactive.
The automatic mode provides an express estimate of the number of near duplicates in a document. However, because it is fully automatic, it cannot identify semantically meaningful duplicates: meaningless phrases that happen to be syntactically identical are often merged, while significant duplicates are not extracted in full. The mode has other limitations as well. Nevertheless, it gives an adequate general idea of the "density" of duplicates in a document. After that, we recommend switching to the interactive mode to obtain accurate information and to use the near duplicates for documentation reuse.
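The text does not say how the automatic mode computes its estimate. As one illustration of what an express "density" figure could look like, the following sketch assumes density is the share of words covered by at least one word n-gram that occurs more than once in the document. The function `duplicate_density` and the n-gram length `n` are hypothetical and are not taken from the tool.

```python
from collections import Counter


def duplicate_density(text: str, n: int = 5) -> float:
    """Hypothetical express estimate (an assumption, not the tool's method).

    Returns the fraction of words covered by some word n-gram that
    appears more than once in `text`.
    """
    words = text.split()
    if len(words) < n:
        return 0.0  # too short to contain any n-gram
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(grams)
    covered = [False] * len(words)
    for start, gram in enumerate(grams):
        if counts[gram] > 1:  # this n-gram is repeated somewhere else
            for pos in range(start, start + n):
                covered[pos] = True
    return sum(covered) / len(words)
```

Because such a measure counts any repeated run of words regardless of meaning, it shares the weakness described above: syntactically identical but meaningless phrases inflate the figure, so it can only serve as a rough indicator.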
The tool’s source code is available here.