Description
Hi
I'm looking to build a script that extracts whatever data it can from any given URL, microdata first, then content. Your parser seems perfect for that, but I've found a case where it emits a warning.
I'm passing it the following URL:
http://www.currys.co.uk/gbuk/computing/laptops/laptops/lenovo-yoga-510-14-2-in-1-black-10146249-pdt.html
And I'm getting the following warning:
Warning: get_class() expects parameter 1 to be object, array given in C:\Users\danm\Documents\Websites\page-scraper-analyser\vendor\jkphl\micrometa\src\Jkphl\Micrometa\Parser\JsonLD.php on line 217
Unknown JSON-LD item: {"items":[{"id":"_:b0","types":["http:\/\/schema.org\/BreadcrumbList"]
Is it finding microdata but attempting to parse it as JSON-LD?
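For reference, here is a minimal sketch of what the warning itself means. It does not reproduce whatever JsonLD.php does on line 217, which I haven't traced. `describeType()` is a hypothetical helper of mine, not part of micrometa, and the JSON string is a cut-down stand-in for the item in the message above. The point is only that `get_class()` accepts objects, so a value decoded into an array needs a guard first:

```php
<?php
// Hypothetical helper, not part of micrometa: report a value's type
// without passing a non-object to get_class().
function describeType($value): string
{
    // get_class() only accepts objects; use gettype() for arrays and scalars.
    return is_object($value) ? get_class($value) : gettype($value);
}

// Assumption: a cut-down stand-in for the item shown in the warning.
// Passing true as the second argument makes json_decode() return arrays.
$decoded = json_decode('{"items":[{"id":"_:b0"}]}', true);

echo describeType($decoded), PHP_EOL;       // array
echo describeType(new stdClass()), PHP_EOL; // stdClass
```

So the warning suggests that an array reached a place where an object was expected, which would fit with the item being in a different shape than the JSON-LD parser anticipates.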
I've also noticed cases where no data is obtained even though microdata is used on the page. Does that indicate a poor implementation on their end?
Thanks in advance
EDIT
Here's a list of URLs with data that either isn't being returned, or is buggy:
- http://www.argos.co.uk/product/6707596
  As you can see from the source code, there is a product type, but when I attempt to parse the url, no data for the product is retrieved.
- http://www.johnlewis.com/apple-ipad-pro-a9x-ios-9-7-wi-fi-128gb/p2609387?colour=Silver
  The product rating on this page is there, but it isn't returned.
  Also, the product availability returns "http://schema.org/InStock" rather than the value/content.
- http://allrecipes.com/recipe/219164/the-best-parmesan-chicken-bake/?internalSource=popular&referringContentType=home%20page&clickId=cardslot%2010
  The prep, cook and total time data on this recipe page isn't parsed.
- http://mashable.com/2017/02/14/spiderman-attacks-runner/?utm_cid=hp-h-1#sYDxayVyimqj
  The json data is returned but the property types are set as the key as opposed to the name, so the properties are inaccessible via the usual method.
I appreciate that some of these may be down to the implementation of the microdata on the pages themselves.
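In the meantime, this is roughly how I could work around the availability value and the IRI-keyed properties on my side. It is only a sketch: `localName()` and `findProperty()` are hypothetical helpers of mine, not micrometa functions, and the `$properties` array is an assumed shape (property IRI mapped to a list of values), not the parser's actual output format. If the page marks availability up with a `link` element, the full URL may well be the correct microdata value, in which case trimming it is purely a convenience:

```php
<?php
// Hypothetical helpers, not part of micrometa.

// Return the part of an IRI after the last "/" or "#".
function localName(string $iri): string
{
    $parts = preg_split('~[/#]~', $iri);
    return end($parts);
}

// Look a property up by its short name when the keys are full IRIs.
// Returns the stored values, or null if no key matches.
function findProperty(array $properties, string $name)
{
    foreach ($properties as $key => $values) {
        if (localName((string) $key) === $name) {
            return $values;
        }
    }
    return null;
}

// Assumed shape for illustration only: property IRI => list of values.
$properties = [
    'http://schema.org/headline'     => ['Example headline'],
    'http://schema.org/availability' => ['http://schema.org/InStock'],
];

echo localName($properties['http://schema.org/availability'][0]), PHP_EOL; // InStock
print_r(findProperty($properties, 'headline'));
```

Matching on the local name ignores the vocabulary, so two properties with the same name from different vocabularies would collide. That is acceptable for my use, but probably not as a general fix.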