Finds texts that are nearly identical, such as reposts, copypasta, and minor rewordings. Good for deduplication. You set a similarity threshold; only texts whose similarity to each other exceeds it are grouped together.
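As a rough illustration of threshold-based grouping, here is a minimal sketch. It assumes similarity is measured with the character-based ratio from Python's standard `difflib` module and that groups are formed transitively (if A matches B and B matches C, all three end up together). Neither assumption comes from the tool itself, and the function name `group_near_duplicates` is hypothetical.

```python
from difflib import SequenceMatcher


def group_near_duplicates(texts, threshold=0.8):
    """Group texts whose pairwise similarity ratio is above `threshold`.

    Similarity is difflib's ratio in [0, 1] on lowercased text. This metric
    is an assumption for illustration, not necessarily what the tool uses.
    """
    parent = list(range(len(texts)))  # union-find forest over text indices

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once; merge the two groups when the pair is similar enough.
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            ratio = SequenceMatcher(None, texts[i].lower(), texts[j].lower()).ratio()
            if ratio > threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, text in enumerate(texts):
        groups.setdefault(find(i), []).append(text)
    # Texts with no near-duplicate stay ungrouped, so drop single-member groups.
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    sample = [
        "Free gift cards, click here now!",
        "Free gift cards, click here now!!",
        "free gift cards - click here now",
        "The meeting moved to Thursday.",
    ]
    for group in group_near_duplicates(sample, threshold=0.8):
        print(group)
```

Raising the threshold makes grouping stricter: in this sketch, a threshold of 0.99 leaves even the two almost identical sample texts ungrouped. The pairwise comparison is quadratic in the number of texts, so it only suits small inputs.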
Groups texts by topic and meaning, even if the wording differs. Good for discovering themes in a large dataset. You choose how many clusters to create, or leave it blank to have the number chosen automatically.
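To show what choosing the number of clusters means in practice, here is a minimal sketch. It assumes each text has already been converted to a numeric vector by some embedding model (which one is not specified here) and that grouping is done with plain k-means. Both are assumptions for illustration. The function name `kmeans` and the toy 2-D vectors are hypothetical, and the automatic choice of cluster count is not shown.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Assign each vector to one of k clusters and return the list of labels.

    The vectors stand in for text embeddings; how the tool actually builds
    and clusters them is not specified, so this is only an illustration.
    """
    # Deterministic farthest-point initialisation: start from the first
    # vector, then keep adding the vector farthest from the chosen centroids.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


if __name__ == "__main__":
    # Toy stand-ins for embeddings of four texts covering two topics.
    toy_vectors = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
    print(kmeans(toy_vectors, k=2))
```

Texts with different wording can still land in the same cluster as long as their vectors are close, which is why the quality of the grouping depends mostly on the embedding step that this sketch leaves out.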