2013-04-16

I am trying to parse an RSS feed from a URL using feedparser in Python. I cannot parse the feed, even though opening the link (http://www.shop.inonit.in/RSSFeedDetails.aspx?PID=801) in a browser shows the full list of items.

>>> import feedparser 
>>> d = feedparser.parse('http://www.shop.inonit.in/RSSFeedDetails.aspx?PID=801') 
>>> d 
{'feed': {'summary': u'<span><h1>Server Error in \'/mobile\' Application.<hr color="silver" size="1" width="100%" /></h1>\n\n    
<h2> <i>Attempted to divide by zero.</i> </h2></span>\n\n   <font face="Arial, Helvetica, Geneva, SunSans-Regular, sans-serif ">\n\n   <b> Description: </b>An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.\n\n   <br /><br />\n\n   <b> Exception Details: </b>System.DivideByZeroException: Attempted to divide by zero.<br /><br />\n\n    
<b>Source Error:</b> <br /><br />\n\n   <table bgcolor="#ffffcc" width="100%">\n    <tr>\n     <td>\n      <code>\n\nAn unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.</code>\n\n     </td>\n    </tr>\n   </table>\n\n   <br />\n\n   <b>Stack Trace:</b> <br /><br />\n\n   <table bgcolor="#ffffcc" width="100%">\n    <tr>\n     <td>\n      <code><pre>\n\n[DivideByZeroException: Attempted to divide by zero.]\n System.Decimal.FCallDivide(Decimal&amp; d1, Decimal&amp; d2) +0\n System.Decimal.Divide(Decimal d1, Decimal d2) +17\n Martjack.CMS.PageControlsModelComp.GetPluginDataEnt(PageControlEnt objPageControlEnt, MerchantENT MerchantEnt, PageControlModel&amp; objPageControlModel, ProductEnt_RE ProductEnt, String MobileVersion) +2324\n 
Martjack.CMS.PageControlsModelComp.GetPageControlOutputData(PageModel pagemodel, PageControlEnt objPageControlEnt, MerchantENT MerchantEnt, String seocid, String combiType, String MobileVersion, ProductEnt_RE ProductEnt, String siteurl) +694\n Martjack.CMS.PageControlsModelComp.GetPageControlModels(PageModel Pagemodel, MerchantENT MerchantEnt, String seocid, String combiType, String MobileVersion, DNDPageControlViewCollection objDNDPageControlViewCollection, Boolean isdndrequest, Int64 pgcontrolid, String siteurl) +919\n Martjack.CMS.PageModelComp.GetPageModel(MerchantENT MerchantEnt, Int32 predefinedPageId, Boolean isPredefined, ChannelType channel, String seocid, String Bid, String combiType, String MobileVersion, Boolean isDndRequest, 
DNDPageControlViewCollection ObjDNDPageControlViewCollection, Boolean ControlsInfo, Int64 pgcontrolid) +1717\n MartJack.Facade.CMSFacade.GetPageModel(MerchantENT MerchantEnt, Int32 PageId, Boolean isPredefined, ChannelType channel, String seocid, String bid, String combitype, String mobileversion, Boolean isDndRequest, DNDPageControlViewCollection ObjDNDPageControlViewCollection, Boolean ControlsInfo, Int64 pgcontrolid) +119\n MobileECommerce.MobileECommerce.ProductsController.GetPageModelByRequest(String seoid, String bid) +227\n MobileECommerce.MobileECommerce.ProductsController.Index(String id, String seobrand, String category, String categoryparent) +54\n lambda_method(Closure , ControllerBase , Object[]) +272\n 
System.Web.Mvc.ActionMethodDispatcher.Execute(ControllerBase controller, Object[] parameters) +17\n System.Web.Mvc.ReflectedActionDescriptor.Execute(ControllerContext controllerContext, IDictionary`2 parameters) +212\n System.Web.Mvc.ControllerActionInvoker.InvokeActionMethod(ControllerContext controllerContext, ActionDescriptor actionDescriptor, IDictionary`2 parameters) +239\n System.Web.Mvc.&lt;&gt;c__DisplayClass15.&lt;InvokeActionMethodWithFilters&gt;b__12() +56\n System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodFilter(IActionFilter filter, ActionExecutingContext preContext, Func`1 continuation) +282\n System.Web.Mvc.&lt;&gt;c__DisplayClass17.&lt;InvokeActionMethodWithFilters&gt;b__14() +20\n System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodWithFilters(ControllerContext controllerContext, IList`1 filters, ActionDescriptor actionDescriptor, IDictionary`2 parameters) +201\n System.Web.Mvc.ControllerActionInvoker.InvokeAction(ControllerContext controllerContext, String actionName) +351\n System.Web.Mvc.Controller.ExecuteCore() +99\n System.Web.Mvc.ControllerBase.Execute(RequestContext requestContext) +94\n System.Web.Mvc.ControllerBase.System.Web.Mvc.IController.Execute(RequestContext requestContext) +10\n 
System.Web.Mvc.&lt;&gt;c__DisplayClassb.&lt;BeginProcessRequest&gt;b__5() +43\n System.Web.Mvc.Async.&lt;&gt;c__DisplayClass1.&lt;MakeVoidDelegate&gt;b__0() +21\n System.Web.Mvc.Async.&lt;&gt;c__DisplayClass8`1.&lt;BeginSynchronous&gt;b__7(IAsyncResult _) +12\n System.Web.Mvc.Async.WrappedAsyncResult`1.End() +53\n System.Web.Mvc.Async.AsyncResultWrapper.End(IAsyncResult asyncResult, Object tag) +28\n System.Web.Mvc.Async.AsyncResultWrapper.End(IAsyncResult asyncResult, Object tag) +15\n System.Web.Mvc.&lt;&gt;c__DisplayClasse.&lt;EndProcessRequest&gt;b__d() +34\n System.Web.Mvc.SecurityUtil.&lt;GetCallInAppTrustThunk&gt;b__0(Action f) +7\n System.Web.Mvc.SecurityUtil.ProcessInApplicationTrust(Action action) +23\n System.Web.Mvc.MvcHandler.EndProcessRequest(IAsyncResult asyncResult) +68\n 
System.Web.Mvc.MvcHandler.System.Web.IHttpAsyncHandler.EndProcessRequest(IAsyncResult result) +9\n System.Web.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() +714\n System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean&amp; completedSynchronously) +240\n</pre></code>\n\n     </td>\n    </tr>\n   </table>\n\n   <br />\n\n    
<hr color="silver" size="1" width="100%" />\n\n   <b>Version Information:</b>\xa0Microsoft .NET Framework Version:4.0.30319; ASP.NET Version:4.0.30319.272\n\n   </font>'}, 'status': 302, 'version': u'', 'encoding': u'utf-8', 'bozo': 1, 'headers': {'content-length': '11348', 'x-powered-by': 'ASP.NET', 'set-cookie': 'SERVERID=HAS14; path=/', 'originserver': 'HAS14', 'server': 'Microsoft-IIS/7.5', 'connection': 'close', 'cache-control': 'private', 'date': 'Tue, 16 Apr 2013 08:03:59 GMT', 'content-type': 'text/html; charset=utf-8', 'x-aspnet-version': '4.0.30319'}, 'href': 
u'http://www.shop.inonit.in/mobile/Products//NA/NA/0', 'namespaces': {}, 'entries': [], 'bozo_exception': SAXParseException('not well-formed (invalid token)',)} 

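I get nothing in the output! Perhaps the request is being redirected to another page that does not exist. (I also tried to crawl the individual pages of this website, but could not, because I was redirected to a URL that does not exist.)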

Any help with the above would be appreciated. Thanks!

What do you mean by "nothing in the output"? >>> len(d['feed']['summary']) returns 5601, and the summary contains a nice "Attempted to divide by zero" message.

Oh, sorry. By "nothing" I meant nothing relevant, such as the feed elements (title, price, etc.). I cannot read the feed, yet when I open the link all the data is displayed.

Answers

Are you using a proxy? If so, do it like this:

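import urllib2, feedparser  # Python 2; urllib2 became urllib.request in Python 3

# "proxy:port" is a placeholder for your proxy's host and port
proxy = urllib2.ProxyHandler({"http": "proxy:port"})
d = feedparser.parse('http://www.shop.inonit.in/RSSFeedDetails.aspx?PID=801', handlers=[proxy])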
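
Separately from the proxy question, it may help to check whether the response is a feed at all before looking for entries. The following is a minimal sketch, not part of the original answer: the helper name diagnose is hypothetical, and it only reads keys that appear in the output shown in the question (bozo, bozo_exception, status, href, headers, entries). To keep it runnable without network access, it is exercised on a plain dict copied from that output; a real feedparser result should support the same .get() lookups.

```python
def diagnose(result):
    """Return reasons why a feedparser-style result is probably not a usable feed."""
    problems = []
    # feedparser sets bozo when the document could not be parsed cleanly.
    if result.get('bozo'):
        problems.append('bozo flag set: %r' % (result.get('bozo_exception'),))
    # An RSS feed should not come back as an HTML page.
    content_type = result.get('headers', {}).get('content-type', '')
    if content_type.startswith('text/html'):
        problems.append('server returned HTML, not XML: %s' % content_type)
    # A redirect status means href is where the request ended up.
    if result.get('status') in (301, 302, 303, 307, 308):
        problems.append('redirected to %s' % result.get('href'))
    if not result.get('entries'):
        problems.append('no entries parsed')
    return problems


# Values copied from the output shown in the question.
sample = {
    'bozo': 1,
    'bozo_exception': 'not well-formed (invalid token)',
    'status': 302,
    'href': 'http://www.shop.inonit.in/mobile/Products//NA/NA/0',
    'headers': {'content-type': 'text/html; charset=utf-8'},
    'entries': [],
}

for problem in diagnose(sample):
    print(problem)
```

On the question's data all four checks fire, which is consistent with the server redirecting the request to its /mobile application and returning an HTML error page instead of the feed.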