PHP script to find and recognize words in domain names


I'm looking for PHP code or a script that can recognize the words in a domain name. For example, a user queries the domain name snapnames.com:

- the script displays SnapNames.com (it recognized two words in this domain: Snap Names)

I hope someone can help.

Thanks

Source

2012-01-23 Mai Reynolds

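This is not a programming question. – zerkms

I'm curious what your script would do with the domain expertsexchange.com. – sarnold

English is a very ambiguous, context-dependent language. No computer can perfectly replace the skills of a person fluent in English. –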

As sarnold said, I'm afraid there is no perfect answer: a domain like "expertsexchange.com" can be read as both "Expert Sex Change.com" and "Experts Exchange.com".
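To make the ambiguity concrete, here is a minimal sketch that lists every way a string can be split into dictionary words. The function name `allSegmentations` and the five-word list are assumptions made for illustration only; a real word list would be far larger and would produce even more candidate readings.

```php
<?php
// Tiny illustrative word list (an assumption, not a real dictionary).
// Words are stored as array keys so isset() can be used for lookups.
$words = array_fill_keys(['expert', 'experts', 'sex', 'change', 'exchange'], true);

/**
 * Return every way to split $text into words found in $words.
 * Each result is an ordered array of words.
 */
function allSegmentations(string $text, array $words): array
{
    if ($text === '') {
        return [[]]; // exactly one way to split nothing: no words
    }
    $results = [];
    for ($len = 1; $len <= strlen($text); $len++) {
        $head = substr($text, 0, $len);
        if (!isset($words[$head])) {
            continue;
        }
        // Combine this first word with every split of the remainder.
        foreach (allSegmentations(substr($text, $len), $words) as $rest) {
            $results[] = array_merge([$head], $rest);
        }
    }
    return $results;
}

foreach (allSegmentations('expertsexchange', $words) as $split) {
    echo implode(' ', array_map('ucfirst', $split)), ".com\n";
}
// Prints:
// Expert Sex Change.com
// Experts Exchange.com
```

Both results are valid as far as the dictionary is concerned, so the script has no way of knowing which one the domain owner meant.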

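On top of that, a feature like this is rather heavy on memory and processing power. You would need a huge file to recognize every word, and so on. It would help to know why you need this, so we can try to find another solution.

If you run a service that displays information about websites, displaying "Snapnames.com" is perfectly fine. It doesn't need to be capitalized.

However, if you are hell-bent on doing this even though it won't be 100% accurate and will be rather intense on the server...

First you need to find a way to check whether a string is a word. That is an entirely separate question, and a perfectly reasonable one, so you should ask it separately. See whether you can find a dictionary library for PHP.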
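As a rough idea of what the "is this string a word" check could look like without a library, here is a sketch that loads a plain-text word list (one word per line) into an array and tests membership with `isset()`. The helper names `loadDictionary` and `isWord` and the word-list file are assumptions, not an existing library; the demo writes a throwaway file only so that the example runs on its own.

```php
<?php
/**
 * Load a plain-text word list (one word per line) into a lookup table.
 * Words become lowercased array keys, so membership tests are cheap.
 */
function loadDictionary(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read word list: $path");
    }
    return array_fill_keys(array_map('strtolower', $lines), true);
}

/** Case-insensitive check against a table built by loadDictionary(). */
function isWord(string $candidate, array $dictionary): bool
{
    return isset($dictionary[strtolower($candidate)]);
}

// Demo with a throwaway word list; in practice $path would point at a real one.
$path = tempnam(sys_get_temp_dir(), 'words');
file_put_contents($path, "snap\nnames\nexperts\nexchange\n");
$dictionary = loadDictionary($path);
unlink($path);

var_dump(isWord('Snap', $dictionary));      // bool(true)
var_dump(isWord('snapnames', $dictionary)); // bool(false)
```

Keeping the whole list in memory is the "huge file" cost mentioned above: every word is held as an array key for the lifetime of the script.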

Basically, you loop backwards through the string, dropping one letter at a time, until what is left is a word; you remove that word from the string, then repeat on the remainder. For example:

For expertsexchange.com, you would check it like so:

{} "expertsexchange" "expertsexchange" <-- not a word
{} "expertsexchange" "expertsexchang" <-- not a word
{} "expertsexchange" "expertsexchan" <-- not a word
{} "expertsexchange" "expertsexcha" <-- not a word
{} "expertsexchange" "expertsexch" <-- not a word
{} "expertsexchange" "expertsexc" <-- not a word
{} "expertsexchange" "expertsex" <-- not a word
{} "expertsexchange" "expertse" <-- not a word
{} "expertsexchange" "experts" <-- WORD! Add it to our list of words
{"experts"} "exchange" "exchange" <-- WORD! Add it to our list of words
{"experts", "exchange"} "" "" <-- No more letters to check, we have found all of our words.

The first {} is the list of words found so far. The first "..." is all the remaining characters you have yet to check. The last "..." is the current subset of characters being checked.

Let's look at another example, hellotherewittlekitty. It contains a "word" ("wittle") that a dictionary will not recognize.

{} "hellotherewittlekitty" "hellotherewittlekitty" <-- not a word 
{} "hellotherewittlekitty" "hellotherewittlekitt" <-- not a word 
{} "hellotherewittlekitty" "hellotherewittlekit" <-- not a word 
{} "hellotherewittlekitty" "hellotherewittleki" <-- not a word 
{} "hellotherewittlekitty" "hellotherewittlek" <-- not a word 
{} "hellotherewittlekitty" "hellotherewittle" <-- not a word 
{} "hellotherewittlekitty" "hellotherewittl" <-- not a word 
{} "hellotherewittlekitty" "hellotherewitt" <-- not a word 
{} "hellotherewittlekitty" "hellotherewit" <-- not a word 
{} "hellotherewittlekitty" "hellotherewi" <-- not a word 
{} "hellotherewittlekitty" "hellotherew" <-- not a word 
{} "hellotherewittlekitty" "hellothere" <-- not a word 
{} "hellotherewittlekitty" "hellother" <-- not a word 
{} "hellotherewittlekitty" "hellothe" <-- not a word 
{} "hellotherewittlekitty" "helloth" <-- not a word 
{} "hellotherewittlekitty" "hellot" <-- not a word 
{} "hellotherewittlekitty" "hello" <-- WORD! add it to list, and remove from main string
{"hello"} "therewittlekitty" "therewittlekitty" <-- not a word 
{"hello"} "therewittlekitty" "therewittlekitt" <-- not a word 
{"hello"} "therewittlekitty" "therewittlekit" <-- not a word 
{"hello"} "therewittlekitty" "therewittleki" <-- not a word 
{"hello"} "therewittlekitty" "therewittlek" <-- not a word 
{"hello"} "therewittlekitty" "therewittle" <-- not a word 
{"hello"} "therewittlekitty" "therewittl" <-- not a word 
{"hello"} "therewittlekitty" "therewitt" <-- not a word 
{"hello"} "therewittlekitty" "therewit" <-- not a word 
{"hello"} "therewittlekitty" "therewi" <-- not a word
{"hello"} "therewittlekitty" "therew" <-- not a word
{"hello"} "therewittlekitty" "there" <-- WORD! add it to list, and remove from main string 
{"hello", "there"} "wittlekitty" "wittlekitty" <-- not a word 
{"hello", "there"} "wittlekitty" "wittlekitt" <-- not a word 
{"hello", "there"} "wittlekitty" "wittlekit" <-- not a word 
{"hello", "there"} "wittlekitty" "wittleki" <-- not a word 
{"hello", "there"} "wittlekitty" "wittlek" <-- not a word 
{"hello", "there"} "wittlekitty" "wittle" <-- not a word (even though humans read it as one) 
{"hello", "there"} "wittlekitty" "wittl" <-- not a word 
{"hello", "there"} "wittlekitty" "witt" <-- WORD! add to list and remove from string
{"hello", "there", "witt"} "lekitty" "lekitty" <-- not a word 
{"hello", "there", "witt"} "lekitty" "lekitt" <-- not a word 
{"hello", "there", "witt"} "lekitty" "lekit" <-- not a word 
{"hello", "there", "witt"} "lekitty" "leki" <-- WORD! (biology, wikipedia) 
{"hello", "there", "witt", "leki"} "tty" "tty" <-- not a word 
{"hello", "there", "witt", "leki"} "tty" "tt" <-- not a word 
{"hello", "there", "witt", "leki"} "tty" "t" <-- not a word 
{"hello", "there", "witt", "leki"} "tty" "" <-- No more letters, add it to the list! 
{"hello", "there", "witt", "leki", "tty"} "" ""

So hellotherewittlekitty would come out as HelloThereWittLekiTty, which is worse than just leaving it all lowercase. Unfortunately, that is how this algorithm handles it.
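The traces above can be sketched in PHP roughly as follows. This is only an illustration of the greedy "longest prefix first" idea, not a recommended implementation; the names `splitGreedy` and `displayName` are made up, and the dictionary is a hypothetical list containing just the words the traces treat as known.

```php
<?php
/**
 * Greedy split: take the longest prefix that is a word, remove it, repeat.
 * If no prefix is a word, the remaining letters are kept as one final piece.
 */
function splitGreedy(string $text, array $dictionary): array
{
    $found = [];
    while ($text !== '') {
        $match = $text; // fallback when no prefix is a word
        for ($len = strlen($text); $len >= 1; $len--) {
            $candidate = substr($text, 0, $len);
            if (isset($dictionary[$candidate])) {
                $match = $candidate;
                break;
            }
        }
        $found[] = $match;
        $text = substr($text, strlen($match));
    }
    return $found;
}

/** Capitalize each piece of the part before the first dot; keep the rest as is. */
function displayName(string $domain, array $dictionary): string
{
    [$label, $tld] = array_pad(explode('.', strtolower($domain), 2), 2, '');
    $name = implode('', array_map('ucfirst', splitGreedy($label, $dictionary)));
    return $tld === '' ? $name : "$name.$tld";
}

// Hypothetical dictionary: only the words the traces above treat as known.
$dictionary = array_fill_keys(
    ['snap', 'names', 'experts', 'exchange', 'hello', 'there', 'witt', 'leki'],
    true
);

echo displayName('snapnames.com', $dictionary), "\n";             // SnapNames.com
echo displayName('expertsexchange.com', $dictionary), "\n";       // ExpertsExchange.com
echo displayName('hellotherewittlekitty.com', $dictionary), "\n"; // HelloThereWittLekiTty.com
```

Note that the loop calls the dictionary once per candidate prefix, so a long name with no recognizable words costs one lookup for every letter before giving up.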

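There are algorithms that are far more CPU-intensive and need far more data, and they would give you more accuracy. But overall, it isn't worth all that work to get 30% accuracy, especially since the words come out mangled whenever the algorithm fails. In other words, adding this could make 60% of the websites look broken.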
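The answer doesn't name any of these heavier algorithms. As one possible direction, and purely as an assumption on my part, you could score every split instead of committing to the first match and keep the cheapest one. In the sketch below each dictionary word costs 1 and each letter not covered by a word costs 3; both numbers, the function name `splitBest`, and the word list are arbitrary choices for illustration.

```php
<?php
/**
 * Return the lowest-cost split of $text using dynamic programming.
 * Each dictionary word costs 1; each uncovered letter costs $unknownPenalty
 * and is returned as its own one-letter piece.
 */
function splitBest(string $text, array $dictionary, int $unknownPenalty = 3): array
{
    $n = strlen($text);
    // $best[$i] = [cost, pieces] for the first $i characters of $text.
    $best = [0 => [0, []]];
    for ($end = 1; $end <= $n; $end++) {
        // Option 1: treat the last character as an unknown letter.
        [$cost, $pieces] = $best[$end - 1];
        $best[$end] = [$cost + $unknownPenalty, array_merge($pieces, [$text[$end - 1]])];
        // Option 2: end a dictionary word at this position.
        for ($start = 0; $start < $end; $start++) {
            $word = substr($text, $start, $end - $start);
            if (!isset($dictionary[$word])) {
                continue;
            }
            $candidate = $best[$start][0] + 1;
            if ($candidate < $best[$end][0]) {
                $best[$end] = [$candidate, array_merge($best[$start][1], [$word])];
            }
        }
    }
    return $best[$n][1];
}

// Hypothetical word list containing both readings of the example domain.
$dictionary = array_fill_keys(['expert', 'experts', 'sex', 'change', 'exchange'], true);

echo implode(' ', splitBest('expertsexchange', $dictionary)), "\n"; // experts exchange
```

This does more work than the greedy loop (it tries every start position for every end position) and it still only encodes a preference for fewer words; it cannot know which reading the domain owner intended.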

Source

2012-01-23 01:49:02
