An Empirical Evaluation of Set Similarity Join Techniques - Compilation Instructions

W. Mann, N. Augsten, P. Bouros. An Empirical Evaluation of Set Similarity Join Techniques. In The Proceedings of the VLDB Endowment (PVLDB 2016).


Pre-Requisites

Compiling

These instructions have been tested on Debian jessie (Debian 8). They should work on most UNIX-like systems.

Execution

The set_sim_join binary by default expects fully preprocessed input, i.e.:

If your input does not fulfill these conditions, you can request preprocessing with --whitespace (every consecutive sequence of non-whitespace characters becomes a token) or --qgram N (to build q-grams). Again, each line holds one set.
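The two tokenization modes can be illustrated with a short Python sketch. This is not the code used by set_sim_join; the function names whitespace_tokens and qgrams are hypothetical, and the sketch assumes that N is the q-gram length, that no padding characters are added, and that duplicate tokens are left as they are. The binary may treat these details differently.

```python
def whitespace_tokens(line):
    """Split one input line into tokens at runs of whitespace.

    Hypothetical helper mirroring the description of --whitespace:
    every consecutive sequence of non-whitespace characters is a token.
    """
    # str.split() without arguments splits on any run of whitespace
    # and drops leading/trailing whitespace, including the newline.
    return line.split()


def qgrams(line, q):
    """Return all overlapping substrings of length q of one input line.

    Hypothetical helper sketching --qgram N, assuming N is the q-gram
    length and no padding is applied (an assumption, not confirmed).
    """
    if q <= 0:
        raise ValueError("q must be positive")
    text = line.rstrip("\n")
    # A line shorter than q yields an empty list.
    return [text[i:i + q] for i in range(len(text) - q + 1)]


if __name__ == "__main__":
    # One set per line: each line is tokenized independently.
    for line in ["set similarity join\n", "abcd\n"]:
        print(whitespace_tokens(line), qgrams(line, 2))
```

Each line is processed on its own, which matches the one-set-per-line input layout described above.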

Contact

Willi Mann (wmann AT cosy.sbg.ac.at)