Working on a feature for my media monitoring program to identify similar articles.

I am working under the assumption that:

1. LSH (locality sensitive hashing) will allow me to identify potentially similar articles
2. Then I would use some sort of text similarity algorithm to compare just the potential matches
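
A rough sketch of the two-step idea above, to make it concrete. It assumes MinHash over word shingles with banding for the LSH step and Jaccard similarity for the comparison step; both are just one possible choice. It is plain Python and does not use TarsosLSH. The function names, shingle size, number of hashes and bands, and the threshold are all illustrative assumptions, not tuned values.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles in a text (assumed tokenization: whitespace)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, shingle):
    # Deterministic 64-bit hash; the seed simulates a family of hash functions.
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def minhash_signature(shingle_set, num_hashes=32):
    """MinHash signature: the minimum hash value per seeded hash function."""
    return [min(_hash(seed, s) for s in shingle_set) for seed in range(num_hashes)]


def candidate_pairs(signatures, bands=16):
    """Step 1 (LSH): documents sharing any identical band become candidates."""
    if not signatures:
        return set()
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def jaccard(a, b):
    """Step 2: exact Jaccard similarity between two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similar_articles(articles, threshold=0.5):
    """Run LSH to get candidates, then compare only those candidates."""
    sets = {doc_id: shingles(text) for doc_id, text in articles.items()}
    sigs = {doc_id: minhash_signature(s) for doc_id, s in sets.items() if s}
    results = []
    for a, b in candidate_pairs(sigs):
        score = jaccard(sets[a], sets[b])
        if score >= threshold:
            results.append((a, b, score))
    return sorted(results)
```

Calling `similar_articles({"a": text_a, "b": text_b, ...})` returns `(id, id, score)` tuples for the pairs that survive both steps. The point of the split is that the expensive comparison only runs on pairs that landed in the same bucket, rather than on every pair of articles.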

Links

Tarsos LSH – https://github.com/JorenSix/TarsosLSH
