2014-04-21 4 views

PHP simple_html_dom: parsing links from paginated pages

I modified the script below to get all the links from the page set in $url in the code.

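It seems to work to some extent: it fetches every page, but it does not parse every page. It parses only the first page and repeats those results for all the others.

Can someone tell me what I am doing wrong here? I have already spent more than a day trying everything. I have also included the results I am getting.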

The code is below; it starts from hxxp://singersroom.com/subcontent/rnb-news/:

<?php 
include('simple_html_dom.php'); 
$base = "http://singersroom.com"; 
$url = "http://singersroom.com/subcontent/rnb-news/"; 

// Start from the main page 
$nextLink = $url; 

// Loop on each next link as long as it exists
while ($nextLink) { 
    echo "<hr>nextLink: $nextLink<br>"; 
    //Create a DOM object 
    $html = new simple_html_dom(); 
    // Load HTML from a url 
    $html->load_file($nextLink); 
    $posts = $html->find('h3[class=prl-article-title]'); 
    foreach($posts as $post) { 
        // Get the link from the heading's first child
        $articles = $post->children(0)->href;
        echo $base . $articles . '<br>';
    } 
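    // Extract the next link from the last child of the pagination block; NULL ends the loop
    // Earlier attempts:
    //$nextLink = (($temp = $html->find('div[class=pagination]', 0)->last_child()) ? $temp->href : NULL);
    //$nextLink = (($temp = $html->find('div.pagination a[class="Next >>"]', 0)) ? "http://singersroom.com/subcontent/rnb-news/".$temp->href : NULL);
    $pagination = $html->find('div[class=pagination]', 0);
    $temp = $pagination ? $pagination->last_child() : null;
    // Guard against a missing pagination block or a last child without an href,
    // which would otherwise cause a fatal error or restart the loop from page 1
    $nextLink = ($temp && $temp->href) ? $url . $temp->href : null;

    //echo $temp;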
    // Clear DOM object 
    $html->clear(); 
    unset($html); 
} 

?> 
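The extraction step in the script above can be tried in isolation, without any network access, to check what the two selectors actually match. The following is only a minimal sketch: it swaps simple_html_dom for PHP's built-in DOMDocument and DOMXPath, and the sample markup, the `extractPage` function, and the `$listingUrl` variable are hypothetical, modeled only on the selectors used in the question rather than on the real page.

```php
<?php
// Hypothetical markup, modeled only on the selectors used in the question.
$sample = <<<HTML
<html><body>
<h3 class="prl-article-title"><a href="/content/a/">A</a></h3>
<h3 class="prl-article-title"><a href="/content/b/">B</a></h3>
<div class="pagination"><a href="?page=1">1</a><a href="?page=2">Next</a></div>
</body></html>
HTML;

/**
 * Returns the article hrefs and the href of the last pagination link (or null).
 */
function extractPage(string $htmlText): array
{
    $doc = new DOMDocument();
    // Suppress parser warnings that imperfect real-world HTML tends to trigger
    @$doc->loadHTML($htmlText);
    $xpath = new DOMXPath($doc);

    $articles = [];
    foreach ($xpath->query('//h3[@class="prl-article-title"]/a[@href]') as $a) {
        $articles[] = $a->getAttribute('href');
    }

    // Last <a> with an href directly inside the pagination block, if any
    $last = $xpath->query('//div[@class="pagination"]/a[@href][last()]')->item(0);
    $next = $last ? $last->getAttribute('href') : null;

    return ['articles' => $articles, 'next' => $next];
}

$listingUrl = 'http://singersroom.com/subcontent/rnb-news/';
$page = extractPage($sample);
// Join the relative query string onto the listing URL, as the question's script does
$nextUrl = $page['next'] !== null ? $listingUrl . $page['next'] : null;

print_r($page);
echo $nextUrl, "\n";
```

Running the same function on the saved HTML of two different listing pages would show whether the article selector returns different links per page. Note that this sketch, like the original script, assumes the last pagination link always points forward, which may not hold on the final page.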

This is the result I am getting:

nextLink: hxxp://singersroom.com/subcontent/rnb-news/
hxxp://singersroom.com/content/2014-04-18/Prince-Collabs-with-Warner-Bros-for-New-Music-Purple-Rain-Anniversary-Album/
hxxp://singersroom.com/content/2014-04-17/Tamar-Braxton-Adds-Tour-Dates-Thanks-Fans-For-Support/
hxxp://singersroom.com/content/2014-04-14/Tamar-Braxton-Readies-New-Album-Inks-Third-Season-of-Tamar-Vince/
hxxp://singersroom.com/content/2014-04-14/Jennifer-Hudson-Walk-It-Out-Ft-Timbaland/
hxxp://singersroom.com/content/2014-04-15/Kindred-The-Family-Soul-Everybodys-Hustlin/
hxxp://singersroom.com/content/2014-04-15/Lyrica-Anderson-Freakin-ft-Wiz-Khalifa/
hxxp://singersroom.com/content/2014-04-07/Dont-Worry-About-Them-10-Baby-Mothers-That-Are-Doing-Just-Fine/
hxxp://singersroom.com/content/2014-03-27/Top-Ten-Best-Soundtracks-From-The-90s/
hxxp://singersroom.com/content/2014-04-16/The-Forbes-Five-2014s-Wealthiest-Artists-in-Hip-Hop/

nextLink: hxxp://singersroom.com/subcontent/rnb-news/?page=2
hxxp://singersroom.com/content/2014-04-18/Prince-Collabs-with-Warner-Bros-for-New-Music-Purple-Rain-Anniversary-Album/
hxxp://singersroom.com/content/2014-04-17/Tamar-Braxton-Adds-Tour-Dates-Thanks-Fans-For-Support/
hxxp://singersroom.com/content/2014-04-14/Tamar-Braxton-Readies-New-Album-Inks-Third-Season-of-Tamar-Vince/
hxxp://singersroom.com/content/2014-04-14/Jennifer-Hudson-Walk-It-Out-Ft-Timbaland/
hxxp://singersroom.com/content/2014-04-15/Kindred-The-Family-Soul-Everybodys-Hustlin/
hxxp://singersroom.com/content/2014-04-15/Lyrica-Anderson-Freakin-ft-Wiz-Khalifa/
hxxp://singersroom.com/content/2014-04-07/Dont-Worry-About-Them-10-Baby-Mothers-That-Are-Doing-Just-Fine/
hxxp://singersroom.com/content/2014-03-27/Top-Ten-Best-Soundtracks-From-The-90s/
hxxp://singersroom.com/content/2014-04-16/The-Forbes-Five-2014s-Wealthiest-Artists-in-Hip-Hop/

. . .

nextLink: hxxp://singersroom.com/subcontent/rnb-news/?page=96
hxxp://singersroom.com/content/2014-04-18/Prince-Collabs-with-Warner-Bros-for-New-Music-Purple-Rain-Anniversary-Album/
hxxp://singersroom.com/content/2014-04-17/Tamar-Braxton-Adds-Tour-Dates-Thanks-Fans-For-Support/
hxxp://singersroom.com/content/2014-04-14/Tamar-Braxton-Readies-New-Album-Inks-Third-Season-of-Tamar-Vince/
hxxp://singersroom.com/content/2014-04-14/Jennifer-Hudson-Walk-It-Out-Ft-Timbaland/
hxxp://singersroom.com/content/2014-04-15/Kindred-The-Family-Soul-Everybodys-Hustlin/
hxxp://singersroom.com/content/2014-04-15/Lyrica-Anderson-Freakin-ft-Wiz-Khalifa/
hxxp://singersroom.com/content/2014-04-07/Dont-Worry-About-Them-10-Baby-Mothers-That-Are-Doing-Just-Fine/
hxxp://singersroom.com/content/2014-03-27/Top-Ten-Best-Soundtracks-From-The-90s/
hxxp://singersroom.com/content/2014-04-16/The-Forbes-Five-2014s-Wealthiest-Artists-in-Hip-Hop/
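A repeat like the one in this output could have more than one cause, for example a selector that matches a block shared by every page, or a server that returns the same list whatever the page parameter is; the thread does not establish which. One way to confirm the symptom programmatically is to fingerprint the set of links extracted from each page and flag pages whose set was already seen. The sketch below is illustrative only: `linksFingerprint`, `findRepeatedPages`, and the sample data are hypothetical names and values, not part of the original script.

```php
<?php
/**
 * Builds an order-independent fingerprint for one page's extracted links.
 */
function linksFingerprint(array $links): string
{
    sort($links); // sorts a local copy, since PHP passes arrays by value
    return md5(implode("\n", $links));
}

/**
 * Returns the keys of pages whose link set already appeared on an earlier page.
 */
function findRepeatedPages(array $linksByPage): array
{
    $seen = [];
    $repeated = [];
    foreach ($linksByPage as $pageUrl => $links) {
        $fp = linksFingerprint($links);
        if (isset($seen[$fp])) {
            $repeated[] = $pageUrl;
        } else {
            $seen[$fp] = $pageUrl;
        }
    }
    return $repeated;
}

// Hypothetical data shaped like the output above: page 2 repeats page 1.
$linksByPage = [
    '/subcontent/rnb-news/'        => ['/content/a/', '/content/b/'],
    '/subcontent/rnb-news/?page=2' => ['/content/b/', '/content/a/'],
    '/subcontent/rnb-news/?page=3' => ['/content/c/'],
];

$repeatedPages = findRepeatedPages($linksByPage);
print_r($repeatedPages);
```

In a crawl loop, the same check could be used to stop early and report the first repeated page instead of fetching every remaining page.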

Answers


All of your links use hxxp, which means they are not valid links. If you replace hxxp with http in the URLs, you should be able to move on to the next step.
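For anyone copying the defanged URLs from the posted output, the substitution described in this answer can be sketched as follows. The `refang` helper is a hypothetical name used for illustration, and the sketch only restores the scheme; it is not presented as a fix for the repeated results.

```php
<?php
/**
 * Rewrites a defanged "hxxp" or "hxxps" scheme back to "http" or "https".
 * URLs that do not start with such a scheme are returned unchanged.
 */
function refang(string $url): string
{
    return preg_replace('/^hxxp(s?):\/\//i', 'http$1://', $url);
}

echo refang('hxxp://singersroom.com/subcontent/rnb-news/?page=2'), "\n";
```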


No, I changed them to hxxp because Stack Overflow does not let me post more than 2 links. In the code they are http://. – Spykey