Getting the price from Amazon with XPath

I'm trying to get the price on the following pages with the XPath expression

'//span[@id="priceblock_ourprice"]'

but the result is an empty variable:

http://www.amazon.com/Jessica-Simpson-Womens-Double-Breasted/dp/B00K65ZMCA/ref=sr_1_4_mc/185-0705108-6790969?s=apparel&ie=UTF8&qid=1413083859&sr=1-4
http://www.amazon.com/SanDisk-Cruzer-Frustration-Free-Packaging--SDCZ36-032G-AFFP/dp/B007JR532M/ref=sr_1_1?s=pc&ie=UTF8&qid=1413084653&sr=1-1&keywords=usb

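I found that

'//b[@class="priceLarge"]'

is an expression that works, but I don't even know why, because I can't find such a tag in the page source.

The interesting part is that it is the same with other Amazon pages, like this one: I can't find such a tag there either. ... So why does it work? And how do I get the price on the first page? Thanks.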


2014-10-12 Emilios1995

Probably the page has a different layout in your browser than in your PHP. – pguardiario

The first path expression is correct and will yield the price, that is, if it is applied correctly to the data. Please show the PHP code (both expressions). –

When you scrape with PHP, you cannot take what you see in the browser's source for granted.

Instead, you first need to fetch the content with PHP and look at the source of that:

$url = 'http://www.amazon.com/ ... '; 
$buffer = file_get_contents($url);
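To look at the HTML exactly as PHP receives it, one option is to write the buffer to a file and open that in an editor. This is a minimal sketch, assuming a hypothetical helper name fetch_source() and a target path of your choosing. The optional User-Agent header is also an assumption: pguardiario's comment suggests PHP may get a different layout than the browser, and request headers are one thing that may influence that. The answer itself does not prescribe it.

```php
<?php
// Sketch: fetch a page the way the scraper sees it and keep a copy on disk
// so the raw HTML can be inspected in an editor. fetch_source(), $saveTo and
// the optional User-Agent header are assumptions for illustration only.
function fetch_source(string $url, string $saveTo, ?string $userAgent = null): string
{
    $options = [];
    if ($userAgent !== null) {
        // Only the http:// and https:// wrappers read this; local files ignore it.
        $options['http'] = ['header' => "User-Agent: {$userAgent}\r\n"];
    }
    $context = stream_context_create($options);

    $buffer = file_get_contents($url, false, $context);
    if ($buffer === false) {
        throw new RuntimeException("Could not fetch {$url}");
    }

    // Store exactly what PHP received, not what the browser rendered.
    file_put_contents($saveTo, $buffer);

    return $buffer;
}
```

Usage would be along the lines of $buffer = fetch_source($url, __DIR__ . '/page.html'); and then searching page.html for priceLarge or priceblock_ourprice.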

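The variable $buffer then contains the HTML to be scraped. Doing that with your example links reveals that the first and the second address both contain a .priceLarge element, and these are probably what you are looking for:

<span class="priceLarge">$168.00</span> 
<b class="priceLarge">$14.99</b>

Once you have found where the data you are looking for is, you can create the DOMDocument: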

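$doc = new DOMDocument();
$doc->recover = true;                        // ask libxml to recover from malformed markup
$saved = libxml_use_internal_errors(true);   // collect parse errors instead of emitting warnings
$doc->loadHTML($buffer);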

You may also be interested in the parsing errors:

/** @var array|LibXMLError[] $errors */ 
$errors = libxml_get_errors(); 
foreach ($errors as $error) { 
    printf(
     "%s: (%d) [%' 3d] #%05d:%' -4d %s\n", get_class($error), $error->level, $error->code, $error->line, 
     $error->column, rtrim($error->message) 
    ); 
} 
libxml_use_internal_errors($saved);

DOMDocument tells you where problems occurred, for example duplicate ID values.
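The loading and error-reporting steps can be wrapped into one function. This is a minimal sketch with a hypothetical name, load_html_with_errors(), that returns the document and the messages as plain strings. Unlike the snippet in the answer, it also calls libxml_clear_errors(), so messages from an earlier parse don't show up in the next one.

```php
<?php
// Sketch: parse HTML and return the document plus the libxml messages as
// strings. load_html_with_errors() and its [doc, messages] return shape are
// assumptions for this example. Note: loadHTML() rejects an empty string.
function load_html_with_errors(string $html): array
{
    $doc = new DOMDocument();
    $doc->recover = true;

    // Buffer parser errors internally instead of raising PHP warnings.
    $saved = libxml_use_internal_errors(true);
    libxml_clear_errors(); // drop messages left over from earlier parses
    $doc->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    // Empty the buffer again and restore the caller's setting.
    libxml_clear_errors();
    libxml_use_internal_errors($saved);

    return [$doc, $messages];
}
```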

After loading the buffer into the DOMDocument, you can create a DOMXPath:

$xp = new DOMXPath($doc);

You then use it to get the actual values out of the document.

For example, the HTML of both example addresses has shown that the information you are looking for is in #priceBlock, which contains both .listprice and .priceLarge:

$priceBlock = $doc->getElementById('priceBlock');
printf(
    "List Price: %s\nPrice: %s\n",
    $xp->evaluate('string(.//*[@class="listprice"])', $priceBlock),
    $xp->evaluate('string(.//*[@class="priceLarge"])', $priceBlock)
);

This leads to the following output:

List Price: $48.99
Price: $14.99
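To see the difference the question asks about, both expressions can be run against a small hand-written stand-in for the markup. It is reduced from the #priceBlock structure this answer shows and is not the live Amazon page. In this markup there is no element with the id priceblock_ourprice, so the first expression yields an empty string while the class-based one finds the <b> element.

```php
<?php
// Sketch: compare the two expressions from the question on a reduced,
// hand-written copy of the price block (an assumption, not the live page).
$html = <<<'HTML'
<div id="priceBlock" class="buying">
  <table class="product">
    <tr><td class="priceBlockLabel">List Price:</td>
        <td><span id="listPriceValue" class="listprice">$48.99</span></td></tr>
    <tr id="actualPriceRow"><td id="actualPriceLabel" class="priceBlockLabelPrice">Price:</td>
        <td id="actualPriceContent"><span id="actualPriceValue"><b class="priceLarge">$14.99</b></span></td></tr>
  </table>
</div>
HTML;

$doc = new DOMDocument();
$saved = libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($saved);

$xp = new DOMXPath($doc);

// Expression from the question: this markup has no such id, so "".
$byId = $xp->evaluate('string(//span[@id="priceblock_ourprice"])');
// Expression that works here: the text of <b class="priceLarge">.
$byClass = $xp->evaluate('string(//b[@class="priceLarge"])');

var_dump($byId, $byClass); // string(0) "" and string(6) "$14.99"
```

The same check against the saved $buffer tells you quickly which of the two layouts PHP actually received.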

Assigning the parent section element to a variable, like $priceBlock, not only lets you use relative paths in XPath, it can also help with debugging in case you are missing some of the more detailed information:
echo $doc->saveHTML($priceBlock);

This, for example, outputs the whole <div> containing all the price information.

If you set up some helper classes of your own, you can get other useful information out of the document for scraping it, for example all tag/class combinations inside this price block:

// you can find StringCollector at the end of the answer
$tagsWithClass = new StringCollector();
foreach ($xp->evaluate('.//*/@class', $priceBlock) as $class) {
    $tagsWithClass->add(sprintf("%s.%s", $class->parentNode->tagName, $class->value));
}
echo $tagsWithClass;

This outputs a list of the strings collected here (tag names with their class attribute values) and how often each occurs:

table.product (1)
td.priceBlockLabel (3)
span.listprice (1)
td.priceBlockLabelPrice (1)
b.priceLarge (1)
tr.youSavePriceRow (1)
td.price (1)

As you can see, this is for the first example URL, because .priceLarge is on a <b> element there.
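All expressions in this answer compare @class with =, which only matches when the attribute value is exactly that string. If an element carries more than one class (the markup below is hypothetical), the usual XPath 1.0 idiom is a token test built from concat() and normalize-space(). Class names are also case-sensitive, so priceLarge and pricelarge are different.

```php
<?php
// Sketch: exact @class comparison vs. a class-token test, on hypothetical
// markup where the price element has a second class.
$doc = new DOMDocument();
$doc->loadHTML('<div><b class="priceLarge sale">$14.99</b></div>');
$xp = new DOMXPath($doc);

// Exact comparison: no match, the attribute value is "priceLarge sale".
$exact = $xp->evaluate('count(//b[@class="priceLarge"])');

// Token comparison: pad the normalized class list with spaces and look for
// " priceLarge ", so a class such as "priceLargeX" would not match.
$token = $xp->evaluate(
    'count(//b[contains(concat(" ", normalize-space(@class), " "), " priceLarge ")])'
);

printf("exact: %d, token: %d\n", $exact, $token);
```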

You can do even more with a relatively simple helper, like displaying the whole HTML structure as a tree:

DomTree::dump($priceBlock);

It gives you the following output, which can be easier to consume than just DOMDocument::saveHTML($node):

`<div id="priceBlock" class="buying">
  +"\n\n "
  `<table class="product">
    +<tr>
    | +<td class="priceBlockLabel">
    | | `"List Price:"
    | +"\n "
    | +<td>
    | | `<span id="listPriceValue" class="listprice">
    | |   `"$48.99"
    | `"\n "
    +<tr id="actualPriceRow">
    | +<td id="actualPriceLabel" class="priceBlockLabelPrice">
    | | `"Price:"
    | +"\n "
    | +<td id="actualPriceContent">
    | | +<span id="actualPriceValue">
    | | | `<b class="priceLarge">
    | | |   `"$14.99"
    | | +"\n "
    | | `<span id="actualPriceExtraMessaging">
    | |   +"\n \n\n\n "
    | |   +<span>
    | |   | `"\n \n "
    | |   +"\n \n\n\n\n\n\n\n\n\n\n \n\n\n\n\n\n \n\n\n\n\n& "
    | |   +<b>
    | |   | `"FREE Shipping"
    | |   +" on orders over $35.\n\n\n\n"
    | |   +<a href="/gp/help/customer/display.html/ref=mk_sss_dp_1/191-4381493-1931545?ie=UTF8&no...">
    | |   | `"Details"
    | |   `"\n\n\n\n\n\n\n\n\n \n\n \n \n\n\n\n\n\n \n"
    | `"\n"
    +<tr id="dealPriceRow">
    | +<td id="dealPriceLabel" class="priceBlockLabel">
    | | `"Deal Price: "
    | +"\n "
    | +<td id="dealPriceContent">
    | | +"\n "
    | | +<span id="dealPriceValue">
    | | +"\n "
    | | +<span id="dealPriceExtraMessaging">
    | | `"\n "
    | `"\n"
    +<script>
    | `[XML_CDATA_SECTION_NODE (4)]
    +<tr id="youSaveRow" class="youSavePriceRow">
    | +<td id="youSaveLabel" class="priceBlockLabel">
    | | `"You Save:"
    | +"\n "
    | +<td id="youSaveContent" class="price">
    | | +<span id="youSaveValue">
    | | | `"$34.00\n (69%)"
    | | `"\n "
    | `"\n "
    `<tr>
      +<td>
      `<td>
        `<span>
          `"o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o..."

You can find it referenced in an answer to Debug a DOMDocument Object in PHP and another one. The code is available on GitHub as a gist.
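The DomTree helper itself lives in the gist. As a rough idea of how such a dump can be produced, here is a minimal recursive sketch. It is not the gist's code: the function names dump_tree() and attributes_of() are hypothetical, and the output only imitates the DomTree format ("+" for a child that has following siblings, "`" for the last child, "| " to continue a branch).

```php
<?php
// Hypothetical helper: render an element's attributes as ` name="value"` pairs.
function attributes_of(DOMElement $element): string
{
    $parts = '';
    foreach ($element->attributes as $attr) {
        $parts .= sprintf(' %s="%s"', $attr->name, $attr->value);
    }
    return $parts;
}

// Hypothetical helper: print a node and its descendants as an ASCII tree.
function dump_tree(DOMNode $node, string $prefix = '', bool $isLast = true): string
{
    if ($node instanceof DOMElement) {
        $label = '<' . $node->tagName . attributes_of($node) . '>';
    } elseif ($node instanceof DOMText) {
        // JSON-encode text (and CDATA) so whitespace such as "\n" stays visible.
        $label = json_encode(
            $node->data,
            JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_INVALID_UTF8_SUBSTITUTE
        );
    } else {
        // Comments, processing instructions, ...: show the node class only.
        $label = '[' . get_class($node) . ']';
    }

    $out = $prefix . ($isLast ? '`' : '+') . $label . "\n";

    if ($node->hasChildNodes()) {
        // Under a last child the branch is finished, so indent with spaces.
        $childPrefix = $prefix . ($isLast ? '  ' : '| ');
        $count = $node->childNodes->length;
        for ($i = 0; $i < $count; $i++) {
            $out .= dump_tree($node->childNodes->item($i), $childPrefix, $i === $count - 1);
        }
    }

    return $out;
}
```

With the document from the answer loaded, echo dump_tree($priceBlock); prints a tree in that style.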

StringCollector helper class

/**
 * Class StringCollector
 *
 * Collect strings and count them
 */
class StringCollector implements IteratorAggregate
{
    private $array = []; // initialised so iterating an empty collector works

    public function add($string)
    {
        $entry = & $this->array[$string];
        $entry++;
    }

    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->array);
    }

    public function __toString(): string
    {
        $buffer = '';
        foreach ($this as $string => $count) {
            $buffer .= sprintf("%s (%d)\n", $string, $count);
        }
        return $buffer;
    }
}
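If you don't need the IteratorAggregate wrapper, the same tag.class tally can be built with a plain array and array_count_values(). This is a sketch under that assumption: the function name count_tag_classes() is hypothetical, and it reaches the element through DOMAttr::ownerElement rather than parentNode.

```php
<?php
// Sketch: tally "tag.class" strings below a context node with a plain array.
// count_tag_classes() is a hypothetical name; .//*/@class selects the class
// attributes of all descendants of $context (not of $context itself).
function count_tag_classes(DOMXPath $xp, DOMNode $context): array
{
    $pairs = [];
    foreach ($xp->evaluate('.//*/@class', $context) as $attr) {
        // DOMAttr::ownerElement is the element that carries the attribute.
        $pairs[] = $attr->ownerElement->tagName . '.' . $attr->value;
    }
    // Maps each distinct string to the number of times it occurred.
    return array_count_values($pairs);
}
```

Printing the result in the same "name (count)" format is then a foreach with sprintf("%s (%d)\n", $name, $count).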


2014-10-12 09:47:00 hakre

Fantastically detailed answer, thanks! DomTree is a very useful helper class! –
