https://blog.gepuro.net/posts/find_similarity_between_pages_by_using_access_log_in_shell