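This is an algorithm question, so the language doesn't matter. My approach is to generate all common subsequences between the two strings and then keep the ones whose length meets the threshold.
Python code (Java shouldn't be much harder):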
def common_subsequences(a, b, threshold):
    # tokenize the two strings (keeping order)
    tokens_a = a.split()
    tokens_b = b.split()
    # store all common subsequences
    common = set()
    # for each token in a
    for i, token in enumerate(tokens_a):
        # if it also appears in b
        # then this should be a starting point for a common subsequence
        if token in tokens_b:
            # get the first occurrence of token in b
            # and start from there
            j = tokens_b.index(token)
            k = i
            temp = token
            # since we need all subsequences, we get length-1 subsequences too
            common.add(temp)
            # while we still have tokens in common
            while j < len(tokens_b) and k < len(tokens_a):
                if j + 1 < len(tokens_b) and k + 1 < len(tokens_a) and tokens_b[j+1] == tokens_a[k+1]:
                    temp += " " + tokens_b[j+1]
                    j += 1
                    k += 1
                    # adding (new) common subsequences
                    common.add(temp)
                # or else we break
                else:
                    break
    # we only keep the ones having length >= threshold
    return [s for s in common if len(s.split()) >= threshold]

a = "Julie loves me more than Linda loves me"
b = "Jane likes me more than Julie loves me"
print(common_subsequences(a, b, 2))
All common_subsequences:
set(['me', 'more than', 'Julie', 'Julie loves', 'Julie loves me', 'me more', 'loves', 'more', 'than', 'me more than', 'loves me'])
common_subsequences >= threshold:
['more than', 'Julie loves', 'Julie loves me', 'me more', 'me more than', 'loves me']
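One thing to keep in mind: the code above only starts matching from the first occurrence of each token in b (that is what tokens_b.index returns), and what it collects are runs of consecutive tokens. If a token repeats in b, a run that begins at a later occurrence can be missed. Below is a minimal sketch of a variant that tries every start position in both strings. The name common_runs is my own and not part of the code above, and the nested loops are not tuned for long inputs.

```python
def common_runs(a, b, threshold):
    """Return, sorted, every run of consecutive tokens shared by a and b
    that is at least `threshold` tokens long (hypothetical helper)."""
    tokens_a = a.split()
    tokens_b = b.split()
    common = set()
    # try every pair of start positions instead of only the first match in b
    for i in range(len(tokens_a)):
        for j in range(len(tokens_b)):
            length = 0
            # extend the run from (i, j) for as long as the tokens agree
            while (i + length < len(tokens_a)
                   and j + length < len(tokens_b)
                   and tokens_a[i + length] == tokens_b[j + length]):
                length += 1
                # every prefix of the run that is long enough is a result
                if length >= threshold:
                    common.add(" ".join(tokens_a[i:i + length]))
    return sorted(common)


# "me" first appears in b before "more", so a first-occurrence search
# never reaches the later "me loves"; scanning all positions does.
print(common_runs("x me loves", "me more me loves", 2))
```

On the Julie/Jane example this should give the same six results as the original function, just sorted.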
@downvoter, please put your reason here.