The second type of supported operators measures the similarity between two strings. Each operator takes two strings and returns a similarity value in the interval [0,1]. Some operators compute an [edit distance](https://en.wikipedia.org/wiki/Edit_distance) between the two strings; others first tokenize the strings into words or break them into [q-grams](https://en.wikipedia.org/wiki/N-gram), and then compare the resulting token (multi-)sets. The supported similarity operators are:

blockDistance

computes the similarity based on the L1-distance between the token sets of the input strings

cosineSimilarity

computes the Cosine Similarity between the token sets of the input strings

damerauLevenshtein

computes the similarity based on the Damerau–Levenshtein Edit Distance

dice

computes the Dice Coefficient between the token sets of the input strings

euclideanDistance

computes the similarity based on the L2-distance between the token sets of the input strings

generalizedJaccard

computes the Generalised Jaccard Similarity between the token sets of the input strings

identity

returns 1 if the two strings are the same and 0 otherwise

jaccard

computes the Jaccard Similarity between the token sets of the input strings

jaro

computes the Jaro Similarity between the input strings

jaroWinkler

computes the Jaro-Winkler Similarity between the input strings

jaroWinklerSoundex

computes the Jaro-Winkler Similarity between the Soundex encodings of the input strings

leadingSubstringSimilarity

computes the common prefix similarity on the list of tokens of the two strings

levenshtein

computes the similarity based on the Levenshtein Edit Distance between the input strings

longestCommonSubsequence

computes the similarity based on the length of the Longest Common Subsequence of the input strings

longestCommonSubstring

computes the similarity based on the length of the Longest Common Substring of the input strings

mongeElkan

computes the Monge-Elkan similarity between the token sets of the two strings by lifting the Smith-Waterman-Gotoh similarity to sets

mongeElkanMax

computes the Monge-Elkan similarity between the token sets of the two strings by lifting the substring similarity to sets

needlemanWunch

computes the Needleman–Wunsch similarity between the input strings

overlapCoefficient

computes the Overlap Coefficient between the token sets of the input strings

qGramsDistance

computes the similarity based on the L1-distance between the sets of tri-grams in the input strings

simonWhite

computes the Simon-White coefficient (the multi-set version of the Dice Coefficient) between the multisets of bi-grams of the input strings

smithWaterman

computes the Smith-Waterman Similarity between the input strings

smithWatermanGotoh

computes the Gotoh version of the Smith-Waterman Similarity between the input strings

substring

returns 1 if one of the strings is a substring of the other, and 0 otherwise
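The difference between word tokens and q-grams mentioned above can be sketched in a few lines of Python. This is only an illustration: the helper names `word_tokens` and `q_grams` are hypothetical, and the library's actual tokenizers (for example whether q-grams are padded) may behave differently.

```python
from collections import Counter


def word_tokens(text: str) -> list[str]:
    """Split a string into whitespace-separated word tokens (assumed tokenizer)."""
    return text.split()


def q_grams(text: str, q: int = 2) -> list[str]:
    """Return all overlapping substrings of length q, without padding."""
    return [text[i:i + q] for i in range(len(text) - q + 1)]


print(word_tokens("hello world"))  # ['hello', 'world']
print(q_grams("hello", 2))         # ['he', 'el', 'll', 'lo']

# A multiset keeps track of how often each token occurs:
# in "banana" the bi-grams 'an' and 'na' both occur twice.
print(Counter(q_grams("banana", 2)))
```

Set-based operators ignore the repetition counts, while multiset-based operators such as `simonWhite` take them into account.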

`caverphone1`

Computes the Caverphone (version 1) phonetic encoding of a string.

caverphone1(Text)

Where:

Text is the string to be encoded.

Example

@library("sim:", "simmetrics").
input("Marcus").
result(X) :- input(Y), X = sim:caverphone1(Y).
@output("result").

Expected results

result("MKS111")

`caverphone2`

Computes the Caverphone (version 2) phonetic encoding of a string.

caverphone2(Text)

Where:

Text is the string to be encoded.

Example

@library("sim:", "simmetrics").
input("Markus").
result(X) :- input(Y), X = sim:caverphone2(Y).
@output("result").

Expected results

result("MKS111")

`colognePhonetic`

Computes the Cologne phonetic (Kölner Phonetik) encoding of a string.

colognePhonetic(Text)

Where:

Text is the string to be encoded.

Example

@library("sim:", "simmetrics").
input("Mayer").
result(X) :- input(Y), X = sim:colognePhonetic(Y).
@output("result").

Expected results

result("67")

`daitchMokotoffSoundex`

Computes the Daitch-Mokotoff Soundex encoding of a string.

daitchMokotoffSoundex(Text)

Where:

Text is the string to be encoded.

Example

@library("sim:", "simmetrics").
input("Iozefovich").
result(X) :- input(Y), X = sim:daitchMokotoffSoundex(Y).
@output("result").

Expected results

result("147740")

`doubleMetaphone`

Computes the Double Metaphone encoding of a string.

doubleMetaphone(Text)

Where:

Text is the string to be encoded.

Example

@library("sim:", "simmetrics").
input("architect").
result(X) :- input(Y), X = sim:doubleMetaphone(Y).
@output("result").

Expected results

result("ARKT")

`matchRatingApproach`

Computes the Match Rating Approach encoding of a string.

matchRatingApproach(Text)

Where:

Text is the string to be encoded.

Example

@library("sim:", "simmetrics").
input("Smith").
result(X) :- input(Y), X = sim:matchRatingApproach(Y).
@output("result").

Expected results

result("SMT")

`metaphone`

Computes the Metaphone encoding of a string.

metaphone(Text)

Where:

Text is the string to be encoded.

Example

@library("sim:", "simmetrics").
input("Melbert").
result(X) :- input(Y), X = sim:metaphone(Y).
@output("result").

Expected results

result("MLBR")

`nysiis`

Computes the NYSIIS (New York State Identification and Intelligence System) phonetic encoding of a string.

nysiis(Text)

Where:

Text is the string to be encoded.

Example

@library("sim:", "simmetrics").
input("Webberley").
result(X) :- input(Y), X = sim:nysiis(Y).
@output("result").

Expected results

result("WABARL")

`removeDiacritics`

Removes diacritics from a string.

removeDiacritics(Text)

Where:

Text is the string from which to remove diacritics.

Example

@library("sim:", "simmetrics").
input("Cañon City").
result(X) :- input(Y), X = sim:removeDiacritics(Y).
@output("result").

Expected results

result("Canon City")
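One common way to strip diacritics is to decompose each character into its base letter plus combining marks and then drop the marks. The Python sketch below shows that technique; it is an assumption about how such a function can work, not a description of the library's implementation, and `remove_diacritics` is a hypothetical helper name.

```python
import unicodedata


def remove_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks such as U+0303 (tilde).
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(remove_diacritics("Cañon City"))  # Canon City
```

Letters that have no canonical decomposition (for example "ø" or "ß") pass through this sketch unchanged.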

`removeNonWord`

Removes non-word characters from a string.

removeNonWord(Text)

Where:

Text is the string from which to remove non-word characters.

Example

@library("sim:", "simmetrics").
input("hello, world!").
result(X) :- input(Y), X = sim:removeNonWord(Y).
@output("result").

Expected results

result("helloworld")
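The result above is consistent with treating "non-word characters" as the regular-expression class `\W`, which also covers whitespace. A minimal Python sketch under that assumption (the helper name `remove_non_word` is hypothetical, and the library may define the character class differently):

```python
import re


def remove_non_word(text: str) -> str:
    """Delete every character outside [A-Za-z0-9_] and other Unicode word characters."""
    return re.sub(r"\W+", "", text)


print(remove_non_word("hello, world!"))  # helloworld
```

Note that under this assumption underscores and digits are kept, because `\w` includes them.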

`soundex`

Computes the Soundex encoding of a string.

soundex(Text)

Where:

Text is the string to be encoded.

Example

@library("sim:", "simmetrics").
input("Perotti").
result(X) :- input(Y), X = sim:soundex(Y).
@output("result").

Expected results

result("P630")
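To show where a code such as `P630` comes from, here is a minimal Python sketch of the classic American Soundex rules: keep the first letter, map the remaining consonants to digits, collapse adjacent duplicates, and pad to four characters. It is an illustration only; the library's implementation may handle edge cases (non-ASCII letters, empty input) differently.

```python
# Consonant groups of American Soundex; vowels and H, W, Y have no code.
_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
           ("L", "4"), ("MN", "5"), ("R", "6"))
_CODES = {letter: digit for letters, digit in _GROUPS for letter in letters}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code of a name."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    encoded = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        # H and W do not separate two letters that share a code; vowels do.
        if ch not in "HW":
            prev = code
    return (encoded + "000")[:4]


print(soundex("Perotti"))  # P630
```

In "Perotti" the letter R maps to 6 and the double T maps to a single 3; the code is then padded with a zero.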

`toLowerCase`

Transforms a string into lower case.

toLowerCase(Text)

Where:

Text is the string to be transformed.

Example

@library("sim:", "simmetrics").
input("HELLO WORLD").
result(X) :- input(Y), X = sim:toLowerCase(Y).
@output("result").

Expected results

result("hello world")

`toUpperCase`

Transforms a string into upper case.

toUpperCase(Text)

Where:

Text is the string to be transformed.

Example

@library("sim:", "simmetrics").
input("hello world").
result(X) :- input(Y), X = sim:toUpperCase(Y).
@output("result").

Expected results

result("HELLO WORLD")

`blockDistance`

Computes the similarity based on the L1-distance between the token sets of the input strings.

blockDistance(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello world", "hello").
result(X) :- strings(Y1, Y2), X = sim:blockDistance(Y1, Y2).
@output("result").

Expected results

result(0.5)

`cosineSimilarity`

Computes the Cosine Similarity between the token sets of the input strings.

cosineSimilarity(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello world", "hello").
result(X) :- strings(Y1, Y2), X = sim:cosineSimilarity(Y1, Y2).
@output("result").

Expected results

result(0.707)
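The value 0.707 can be reproduced by viewing each string as a vector of token counts and taking the cosine of the angle between the two vectors. The Python sketch below assumes whitespace tokenization; `cosine_similarity` is a hypothetical helper and not the library's code.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the token-count vectors of two strings."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


# One shared token, vector lengths sqrt(2) and 1: 1 / sqrt(2)
print(round(cosine_similarity("hello world", "hello"), 3))  # 0.707
```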

`damerauLevenshtein`

Computes the similarity based on the Damerau–Levenshtein Edit Distance between the input strings.

damerauLevenshtein(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello", "hallo").
result(X) :- strings(Y1, Y2), X = sim:damerauLevenshtein(Y1, Y2).
@output("result").

Expected results

result(0.8)
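What distinguishes this measure from plain Levenshtein is that swapping two adjacent characters counts as a single edit. The Python sketch below uses the restricted variant (optimal string alignment) and normalizes by the length of the longer string; both choices are assumptions made for illustration, and the library may use the unrestricted algorithm or a different normalization.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insert, delete, substitute, adjacent swap."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def damerau_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - osa_distance(a, b) / longest


print(osa_distance("hello", "hallo"))  # 1 (one substitution)
print(osa_distance("hello", "hlelo"))  # 1 (one transposition)
```

With this normalization, one edit in a five-character string gives 1 - 1/5 = 0.8, in line with the expected result above.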

`dice`

Computes the Dice Coefficient between the token sets of the input strings.

dice(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello", "hallo").
result(X) :- strings(Y1, Y2), X = sim:dice(Y1, Y2).
@output("result").

Expected results

result(0.667)

`euclideanDistance`

Computes the similarity based on the L2-distance between the token sets of the input strings.

euclideanDistance(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello world", "hello").
result(X) :- strings(Y1, Y2), X = sim:euclideanDistance(Y1, Y2).
@output("result").

Expected results

result(0.5)

`generalizedJaccard`

Computes the Generalised Jaccard Similarity between the token sets of the input strings.

generalizedJaccard(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello", "hallo").
result(X) :- strings(Y1, Y2), X = sim:generalizedJaccard(Y1, Y2).
@output("result").

Expected results

result(0.75)

`identity`

Returns 1 if the two strings are the same and 0 otherwise.

identity(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello", "hello").
result(X) :- strings(Y1, Y2), X = sim:identity(Y1, Y2).
@output("result").

Expected results

result(1)

`jaccard`

Computes the Jaccard Similarity between the token sets of the input strings.

jaccard(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello", "hallo").
result(X) :- strings(Y1, Y2), X = sim:jaccard(Y1, Y2).
@output("result").

Expected results

result(0.6)

`jaro`

Computes the Jaro Similarity between the input strings.

jaro(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello", "hallo").
result(X) :- strings(Y1, Y2), X = sim:jaro(Y1, Y2).
@output("result").

Expected results

result(0.84)

`jaroWinkler`

Computes the Jaro-Winkler Similarity between the input strings.

jaroWinkler(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello", "hallo").
result(X) :- strings(Y1, Y2), X = sim:jaroWinkler(Y1, Y2).
@output("result").

Expected results

result(0.87)

`jaroWinklerSoundex`

Computes the Jaro-Winkler Similarity between the Soundex encodings of the input strings.

jaroWinklerSoundex(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("Perotti", "Pirot").
result(X) :- strings(Y1, Y2), X = sim:jaroWinklerSoundex(Y1, Y2).
@output("result").

Expected results

result(0.89)

`leadingSubstringSimilarity`

Computes the common prefix similarity on the lists of tokens of the two strings. Returns |C|/max{|L1|, |L2|}, where Li is the list of tokens in the i-th input string (i = 1, 2) and C is the longest common prefix of the lists L1 and L2.

leadingSubstringSimilarity(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello world", "hello there").
result(X) :- strings(Y1, Y2), X = sim:leadingSubstringSimilarity(Y1, Y2).
@output("result").

Expected results

result(0.5)
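The formula |C|/max{|L1|, |L2|} can be written out directly. The Python sketch below assumes whitespace tokenization and returns 1.0 for two empty inputs; both are assumptions, and `leading_substring_similarity` is a hypothetical helper name.

```python
def leading_substring_similarity(a: str, b: str) -> float:
    """|C| / max(|L1|, |L2|), where C is the longest common prefix of the token lists."""
    l1, l2 = a.split(), b.split()
    longest = max(len(l1), len(l2))
    if longest == 0:
        return 1.0  # assumption: two empty token lists count as identical
    common = 0
    for x, y in zip(l1, l2):
        if x != y:
            break
        common += 1
    return common / longest


# L1 = [hello, world], L2 = [hello, there], C = [hello]
print(leading_substring_similarity("hello world", "hello there"))  # 0.5
```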

`levenshtein`

Computes the similarity based on the Levenshtein Edit Distance between the input strings.

levenshtein(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello", "hallo").
result(X) :- strings(Y1, Y2), X = sim:levenshtein(Y1, Y2).
@output("result").

Expected results

result(0.8)
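A common way to turn the Levenshtein distance into a similarity in [0,1] is to divide it by the length of the longer string and subtract the result from 1. The Python sketch below uses that normalization as an assumption; it matches the expected result above, but the library's exact formula is not stated in this section.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein_distance(a, b) / longest


print(levenshtein_distance("hello", "hallo"))              # 1
print(round(levenshtein_similarity("hello", "hallo"), 3))  # 0.8
```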

`longestCommonSubsequence`

Computes the similarity based on the length of the Longest Common Subsequence of the input strings.

longestCommonSubsequence(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello world", "hallo there").
result(X) :- strings(Y1, Y2), X = sim:longestCommonSubsequence(Y1, Y2).
@output("result").

Expected results

result(0.6)

`longestCommonSubstring`

Computes the similarity based on the length of the Longest Common Substring of the input strings.

longestCommonSubstring(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello world", "hallo there").
result(X) :- strings(Y1, Y2), X = sim:longestCommonSubstring(Y1, Y2).
@output("result").

Expected results

result(0.4)

`mongeElkan`

Computes the Monge-Elkan similarity between the token sets of the two strings by lifting the Smith-Waterman-Gotoh similarity to sets.

mongeElkan(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello", "helo").
result(X) :- strings(Y1, Y2), X = sim:mongeElkan(Y1, Y2).
@output("result").

Expected results

result(0.857)

`mongeElkanMax`

Computes the Monge-Elkan similarity between the token sets of the two strings by lifting the substring similarity to sets.

mongeElkanMax(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello", "helo").
result(X) :- strings(Y1, Y2), X = sim:mongeElkanMax(Y1, Y2).
@output("result").

Expected results

result(0.857)

`needlemanWunch`

Computes the Needleman–Wunsch similarity between the input strings.

needlemanWunch(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello", "helo").
result(X) :- strings(Y1, Y2), X = sim:needlemanWunch(Y1, Y2).
@output("result").

Expected results

result(0.857)

`overlapCoefficient`

Computes the Overlap Coefficient between the token sets of the input strings.

overlapCoefficient(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello world", "hello").
result(X) :- strings(Y1, Y2), X = sim:overlapCoefficient(Y1, Y2).
@output("result").

Expected results

result(1.0)
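The Overlap Coefficient divides the size of the intersection of two sets by the size of the smaller set, so it reaches 1 whenever one token set is contained in the other. A minimal Python sketch, assuming whitespace tokenization and a result of 0.0 when either input has no tokens (`overlap_coefficient` is a hypothetical helper):

```python
def overlap_coefficient(a: str, b: str) -> float:
    """|A ∩ B| / min(|A|, |B|) over the sets of word tokens."""
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0  # assumption: undefined case mapped to 0
    return len(sa & sb) / min(len(sa), len(sb))


# {hello} is fully contained in {hello, world}
print(overlap_coefficient("hello world", "hello"))  # 1.0
```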

`qGramsDistance`

Computes the similarity based on the L1-distance between the sets of tri-grams in the input strings.

qGramsDistance(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello", "hallo").
result(X) :- strings(Y1, Y2), X = sim:qGramsDistance(Y1, Y2).
@output("result").

Expected results

result(0.8)

`simonWhite`

Computes the Simon-White coefficient (the multi-set version of the Dice Coefficient) between the multisets of bi-grams of the input strings.

simonWhite(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello", "helo").
result(X) :- strings(Y1, Y2), X = sim:simonWhite(Y1, Y2).
@output("result").

Expected results

result(0.8)

`smithWaterman`

Computes the Smith-Waterman Similarity between the input strings.

smithWaterman(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello", "helo").
result(X) :- strings(Y1, Y2), X = sim:smithWaterman(Y1, Y2).
@output("result").

Expected results

result(0.9)

`smithWatermanGotoh`

Computes the Gotoh version of the Smith-Waterman Similarity between the input strings.

smithWatermanGotoh(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello", "helo").
result(X) :- strings(Y1, Y2), X = sim:smithWatermanGotoh(Y1, Y2).
@output("result").

Expected results

result(0.9)

`substring`

Returns 1 if one of the strings is a substring of the other, and 0 otherwise.

substring(Text1, Text2)

Where:

Text1 is the first string to be compared.
Text2 is the second string to be compared.

Example

@library("sim:", "simmetrics").
strings("hello", "hello world").
result(X) :- strings(Y1, Y2), X = sim:substring(Y1, Y2).
@output("result").

Expected results

result(1)