Scraper API provides a proxy service designed for web scraping. With over 20 million residential IPs across 12 countries, plus software that handles JavaScript rendering and solves CAPTCHAs, you can quickly complete large scraping jobs without ever having to worry about being blocked. We can use the power of Scraper API to extract metatags such as the title, description, keywords, and Open Graph image links from any website without dealing with IP blocks and CAPTCHAs; Scraper API handles all of that beautifully.
Implementation is extremely simple, and Scraper API offers unlimited bandwidth. Proxies are rotated automatically, but users can choose to maintain sessions if required. All you need to do is call the API with the URL you want to scrape, and it will return the raw HTML. With Scraper API, you just focus on parsing the data, and they handle the rest. Once the raw HTML is retrieved, we will use the metascraper library to extract metatags from any website using Open Graph, JSON+LD, regular HTML metatags, and a series of fallbacks.
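On the sessions point: to have several requests routed through the same proxy, Scraper API documents a session_number query parameter (worth double-checking against the current docs for your plan). Here is a minimal Node.js sketch of that idea, assuming Node 18+ for the built-in fetch; the API key and session id are placeholders:
/* Node.js, sketch only */
;(async () => {
  // Same endpoint as a normal request, plus a session_number value; repeated
  // calls that reuse the same number are routed through the same proxy.
  const apiKey = "XYZ"; // placeholder API key
  const target = encodeURIComponent("https://metascraper.js.org");
  const url = `https://api.scraperapi.com?api_key=${apiKey}&url=${target}&session_number=123`;
  const html = await (await fetch(url)).text(); // raw HTML of the target page
  console.log(html.slice(0, 200)); // preview the first 200 characters
})();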
We will extract the metatags in two steps:
1. Fetch the raw HTML of the target page through Scraper API.
2. Parse the metatags out of that HTML with metascraper.
When you sign up for Scraper API, you are given an API key. All you need to do is call the API with your key and the URL that you want to scrape, and you will receive the raw HTML of the page in return. It's as simple as:
curl "https://api.scraperapi.com?api_key=XYZ&url=https://metascraper.js.org"
On the back end, when Scraper API receives your request, their service accesses the URL via one of their proxy servers, gets the data, and then sends it back to you.
Scraper API exposes a single API endpoint: simply send a GET request to http://api.scraperapi.com with two query string parameters, api_key (which contains your API key) and url (which contains the URL you would like to scrape).
/* Node.js (using the scraperapi-sdk package) */
const scraperapiClient = require("scraperapi-sdk")("XYZ");
;(async () => {
  // Scraper API fetches the page through its proxy pool and returns the raw HTML.
  const response = await scraperapiClient.get("http://www.bloomberg.com/news/articles/2016-05-24/as-zenefits-stumbles-gusto-goes-head-on-by-selling-insurance");
  console.log(response);
})();
For the https://metascraper.js.org request shown earlier with curl, the raw HTML that comes back looks like this:
<!DOCTYPE html>
<html lang="en">
<head>
<!-- Basic -->
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<!-- Search Engine -->
<meta name="description" content="easily scrape metadata from an article on the web.">
<meta name="image" content="https://metascraper.js.org/static/logo-banner.png">
<link rel="canonical" href="https://metascraper.js.org" />
<title>metascraper, easily scrape metadata from an article on the web.</title>
<meta name="viewport"
content="width=device-width, user-scalable=no, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0">
<!-- Schema.org for Google -->
<meta itemprop="name" content="metascraper, easily scrape metadata from an article on the web.">
<meta itemprop="description" content="easily scrape metadata from an article on the web.">
<meta itemprop="image" content="https://metascraper.js.org/static/logo-banner.png">
<!-- Twitter -->
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="metascraper, easily scrape metadata from an article on the web.">
<meta name="twitter:description" content="easily scrape metadata from an article on the web.">
<meta name="twitter:image" content="https://metascraper.js.org/static/logo-banner.png">
<meta name="twitter:label1" value="Installation" />
<meta name="twitter:data1" value="npm install metascraper --save" />
<!-- Open Graph general (Facebook, Pinterest & Google+) -->
<meta property="og:title" content="metascraper, easily scrape metadata from an article on the web.">
<meta property="og:description" content="easily scrape metadata from an article on the web.">
<meta property="og:image" content="https://metascraper.js.org/static/logo-banner.png">
<meta property="og:logo" content="https://metascraper.js.org/static/logo.png">
<meta property="og:url" content="https://metascraper.js.org">
<meta property="og:type" content="website">
<!-- Favicon -->
<link rel="icon" type="image/png" href="/static/favicon-32x32.png" sizes="32x32" />
<link rel="icon" type="image/png" href="/static/favicon-16x16.png" sizes="16x16" />
<link rel="shortcut icon" href="/static/favicon.ico">
<!-- Stylesheet -->
<link href="https://fonts.googleapis.com/css?family=Bitter|Source+Sans+Pro" rel="stylesheet">
<link rel="stylesheet" href="/static/style.min.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/codecopy/umd/codecopy.min.css">
</head>
<body>
<div id="app"></div>
</body>
<script src="/static/main.min.js"></script>
<script src="//unpkg.com/docsify/lib/docsify.min.js"></script>
<script src="//unpkg.com/docsify/lib/plugins/ga.min.js"></script>
<script src="//unpkg.com/docsify/lib/plugins/external-script.min.js"></script>
<script src="//unpkg.com/prismjs/components/prism-bash.min.js"></script>
<script src="//unpkg.com/prismjs/components/prism-jsx.min.js"></script>
<script src="//cdn.jsdelivr.net/npm/codecopy/umd/codecopy.min.js"></script>
</html>
metascraper is a library for easily scraping metadata from an article on the web, using Open Graph metadata, regular HTML metadata, and a series of fallbacks. Install it along with the rule packages you need:
npm install metascraper metascraper-author metascraper-date metascraper-description metascraper-image metascraper-logo metascraper-clearbit metascraper-publisher metascraper-title metascraper-url --save
const metascraper = require('metascraper')([
require('metascraper-author')(),
require('metascraper-date')(),
require('metascraper-description')(),
require('metascraper-image')(),
require('metascraper-logo')(),
require('metascraper-clearbit')(),
require('metascraper-publisher')(),
require('metascraper-title')(),
require('metascraper-url')()
])
const targetUrl = "http://www.bloomberg.com/news/articles/2016-05-24/as-zenefits-stumbles-gusto-goes-head-on-by-selling-insurance";
const scraperapiClient = require("scraperapi-sdk")("XYZ");
;(async () => {
  // Fetch the raw HTML through Scraper API, then hand it to metascraper.
  const html = await scraperapiClient.get(targetUrl);
  const metadata = await metascraper({ html, url: targetUrl });
  console.log(metadata);
})();
{
"author": "Ellen Huet",
"date": "2016-05-24T18:00:03.894Z",
"description": "The HR startups go to war.",
"image": "https://assets.bwbx.io/images/users/iqjWHBFdfxIU/ioh_yWEn8gHo/v1/-1x-1.jpg",
"publisher": "Bloomberg.com",
"title": "As Zenefits Stumbles, Gusto Goes Head-On by Selling Insurance",
"url": "http://www.bloomberg.com/news/articles/2016-05-24/as-zenefits-stumbles-gusto-goes-head-on-by-selling-insurance"
}
You have now successfully scraped a website and extracted its metatags.
Scraper API and metascraper make this workflow remarkably easy: you can extract metadata from virtually any website without hiccups. I used this exact process to extract metatags from Hacker News articles.
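To give a feel for that kind of batch job, here is one way it could look: a minimal sketch that runs the same Scraper API + metascraper pipeline over a hard-coded list of article URLs. The URLs and the API key are placeholders, and the simple sequential loop is just an illustration:
/* Node.js, sketch only */
const metascraper = require("metascraper")([
  require("metascraper-title")(),
  require("metascraper-description")(),
  require("metascraper-image")(),
  require("metascraper-url")()
]);
const scraperapiClient = require("scraperapi-sdk")("XYZ"); // placeholder API key
// Hypothetical list of article URLs, e.g. collected from the Hacker News front page.
const articleUrls = [
  "https://example.com/article-1",
  "https://example.com/article-2"
];
;(async () => {
  for (const url of articleUrls) {
    const html = await scraperapiClient.get(url); // raw HTML via Scraper API
    const metadata = await metascraper({ html, url });
    console.log(url, "->", metadata.title);
  }
})();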
When you log into your Scraper API account, you will be presented with a dashboard that will show you how many requests you have used, how many requests you have left for the month, and the number of failed requests (which do not count towards your request limit).
If you would like to monitor your account usage and limits programmatically (how many concurrent requests you’re using, how many requests you’ve made, etc.) you may use the /account endpoint, which returns JSON.
curl "https://api.scraperapi.com/account?api_key=XYZ"
As an aside, if you need your requests to come from a specific country, use the country_code parameter (e.g. country_code=us for the United States):
curl "https://api.scraperapi.com?api_key=XYZ&url=https://metascraper.js.org&country_code=us"
Back to account monitoring: the /account endpoint returns a JSON payload like this:
{
"concurrentRequests": 553,
"requestCount": 6655888,
"failedRequestCount": 1118,
"requestLimit": 10000000,
"concurrencyLimit": 1000
}
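If you would rather automate that check, here is a minimal Node.js sketch that reads these fields and warns when usage approaches the plan limit. It assumes Node 18+ for the built-in fetch, and the API key and the 90% threshold are placeholders:
/* Node.js, sketch only */
;(async () => {
  const res = await fetch("https://api.scraperapi.com/account?api_key=XYZ"); // placeholder key
  const account = await res.json();
  const used = account.requestCount / account.requestLimit; // fraction of the monthly limit used
  console.log(`Requests used: ${account.requestCount} of ${account.requestLimit}`);
  console.log(`Failed requests (not counted against the limit): ${account.failedRequestCount}`);
  if (used > 0.9) {
    console.warn("More than 90% of the monthly request limit has been used.");
  }
})();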
Scraper API is the best proxy API service for web scraping on the market today: it is loaded with features and affordably priced. It is easy to integrate and works for scraping projects of any size. If you have a serious scraping project, Scraper API is worth looking into, and even a casual user may benefit from the free plan.