A new semantic similarity join method using diffusion maps and long string table attributes