
Image credit: AP Photo/Mark Lennihan

Google pushes for an official web crawler standard

It's open-sourcing the tool it uses to scan robots.txt files.


One of the cornerstones of Google's business (and really, the web at large) is the robots.txt file that sites use to exclude some of their content from the search engine's web crawler, Googlebot. It cuts down on pointless crawling and sometimes keeps sensitive info under wraps. Google thinks the underlying technology can improve, though, so it's shedding some of its secrecy. The company is open-sourcing the C++ parser its crawler uses to interpret robots.txt files, in a bid to foster a true standard for web crawling. Ideally, this will take much of the mystery out of deciphering robots.txt files and lead to a more common format.
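For readers unfamiliar with the format, here is a minimal sketch of how a crawler consults robots.txt rules before fetching a page. It uses Python's standard-library `urllib.robotparser`, not Google's open-sourced C++ parser, and the rules and `example.com` URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: Googlebot may crawl everything except /private/,
# while every other crawler is shut out of the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text instead of fetching them

print(parser.can_fetch("Googlebot", "https://example.com/public/page.html"))     # True
print(parser.can_fetch("Googlebot", "https://example.com/private/data.html"))    # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/public/page.html"))  # False
```

A well-behaved crawler runs a check like this for every URL and simply skips the ones that come back `False`; robots.txt is a request, not an access control mechanism.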

While the Robots Exclusion Protocol has been around for a quarter of a century, it was only ever an unofficial standard, and that has led different teams to interpret the format in different ways. One crawler might handle an edge case one way, while another handles it differently. Google's initiative, which includes submitting its approach to the Internet Engineering Task Force, would "better define" how crawlers are supposed to handle robots.txt and create fewer rude surprises.
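A concrete example of such an edge case is what happens when an `Allow` and a `Disallow` rule both match the same URL. Python's `urllib.robotparser` applies the first matching rule in file order, while Google's parser uses the most specific (longest) matching rule. The sketch below compares the two; `longest_match_allowed` is a hypothetical, simplified helper written for this illustration and ignores the `*` and `$` wildcards a real parser supports.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /
Allow: /public/
"""

# The same two rules, pre-parsed by hand for the hypothetical helper below.
RULES = [("disallow", "/"), ("allow", "/public/")]

def longest_match_allowed(rules, path):
    """Simplified longest-match evaluation (plain prefixes, no wildcards)."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, prefix in rules:
        if path.startswith(prefix):
            is_allow = kind == "allow"
            # A longer prefix wins; on a tie, the allow rule wins.
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, allowed = len(prefix), is_allow
    return allowed

stdlib = RobotFileParser()
stdlib.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/public/page.html"
print("first match (urllib.robotparser):", stdlib.can_fetch("AnyBot", url))  # False
print("longest match:", longest_match_allowed(RULES, "/public/page.html"))   # True
```

The same file yields opposite answers for the same page, which is exactly the kind of disagreement a formal specification is meant to settle.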

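The draft isn't fully available, but it would extend the protocol beyond websites to any URI-based transfer protocol (not just HTTP), require crawlers to parse at least the first 500 KiB of a robots.txt file, set a maximum cache time of one day and give sites a break if server problems make their robots.txt file unreachable.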
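Two of the draft's rules, the 500 KiB parsing floor and the 24-hour cache limit, are easy to express in code. Below is a minimal sketch of a hypothetical per-host robots.txt cache (`RobotsCache`, `store` and `lookup` are illustrative names, not part of any real crawler) that parses only the first 500 KiB of a file, the minimum the draft requires crawlers to support, and treats cached copies older than 24 hours as stale. It reuses Python's `urllib.robotparser` for rule matching and leaves out the server-error handling entirely.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.robotparser import RobotFileParser

PARSE_LIMIT_BYTES = 500 * 1024    # draft: parse at least the first 500 KiB
MAX_CACHE_SECONDS = 24 * 60 * 60  # draft: cache robots.txt for at most one day

@dataclass
class CachedRobots:
    parser: RobotFileParser
    fetched_at: float  # timestamp (seconds) when the file was fetched

class RobotsCache:
    """Hypothetical per-host cache of parsed robots.txt files."""

    def __init__(self) -> None:
        self._entries: Dict[str, CachedRobots] = {}

    def store(self, host: str, body: bytes, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        # Keep only the first PARSE_LIMIT_BYTES; a multi-byte character cut at
        # the boundary is dropped by errors="ignore".
        text = body[:PARSE_LIMIT_BYTES].decode("utf-8", errors="ignore")
        parser = RobotFileParser()
        parser.parse(text.splitlines())
        self._entries[host] = CachedRobots(parser, now)

    def lookup(self, host: str, now: Optional[float] = None) -> Optional[RobotFileParser]:
        now = time.time() if now is None else now
        cached = self._entries.get(host)
        if cached is None or now - cached.fetched_at > MAX_CACHE_SECONDS:
            return None  # missing or older than one day: the caller must refetch
        return cached.parser
```

Passing `now` explicitly makes the expiry logic easy to test without waiting a day; in production the default `time.time()` is used.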

There's no guarantee this will become a standard, at least as-is. If it does, though, it could help web visitors as much as it does creators. You might see more consistent web search results that respect sites' wishes. If nothing else, this shows that Google isn't completely averse to opening important assets if it thinks they'll advance both its technology and the industry at large.
