Scraper API provides a proxy service designed for web scraping. With over 20 million residential IPs across 12 countries, plus software that handles JavaScript rendering and CAPTCHA solving, you can complete large scraping jobs quickly without worrying about being blocked.
Implementation is extremely simple, and they offer unlimited bandwidth. Proxies are rotated automatically, but users can choose to maintain sessions if required. All you need to do is call the API with the URL you want to scrape, and it will return the raw HTML. With Scraper API, you focus on parsing the data, and they handle the rest.
According to the company, the service handles over 5 billion API requests per month for more than 1,500 businesses and developers around the world.
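To make the request shape concrete, here is a minimal sketch of how the API endpoint is typically called: a plain HTTP GET to api.scraperapi.com with your key and the target URL as query parameters. The extra options shown (render for JavaScript rendering, session_number for sticky sessions) are assumptions based on Scraper API's query-string interface; check the current documentation before relying on them.

```javascript
// Build a Scraper API request URL using Node's standard URLSearchParams.
// NOTE: the "render" and "session_number" option names below are assumptions
// about Scraper API's query-string options -- verify them against the docs.
function buildScraperUrl(apiKey, targetUrl, options = {}) {
  const params = new URLSearchParams({
    api_key: apiKey,
    url: targetUrl, // URLSearchParams percent-encodes the target URL for us
    ...options,     // e.g. { render: "true", session_number: "123" }
  });
  return `http://api.scraperapi.com/?${params.toString()}`;
}

// Example: request JavaScript rendering and a sticky session
const requestUrl = buildScraperUrl(
  "YOUR_SCRAPERAPI_API_KEY",
  "https://www.example.com",
  { render: "true", session_number: "123" }
);
console.log(requestUrl);
```

Building the URL with URLSearchParams instead of string concatenation means every parameter is encoded consistently, which matters when the target URL itself contains query strings.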
To get started, sign up for an account at Scraper API and obtain an API key. The API key will be used to make requests to the Scraper API service.
To make HTTP requests and handle the scraping logic, we'll use the axios package. Install it by running the following command:
npm install axios
Let's write a simple JavaScript function to scrape a website using Scraper API. We'll use the axios package to make HTTP requests and pass the Scraper API key as a parameter in the request.
const axios = require("axios");

async function scrapeWebsite(url) {
  const apiKey = "YOUR_SCRAPERAPI_API_KEY";
  try {
    const response = await axios.get(
      `http://api.scraperapi.com?api_key=${apiKey}&url=${encodeURIComponent(url)}`
    );
    return response.data;
  } catch (error) {
    console.error("Error scraping website:", error);
    throw error;
  }
}

// Example usage
const websiteUrl = "https://www.example.com";
scrapeWebsite(websiteUrl)
  .then((html) => {
    console.log("Scraped HTML:", html);
    // Process the scraped HTML as needed
  })
  .catch((error) => {
    console.error("Error scraping website:", error);
  });
Replace 'YOUR_SCRAPERAPI_API_KEY' with your actual Scraper API key. The scrapeWebsite function takes a URL as input, constructs the Scraper API request URL, and makes the HTTP request using axios.get. The response data contains the scraped HTML of the website.
Once you have the scraped data, you can process it as needed. This may involve parsing HTML, extracting specific information, or performing further analysis. You can use libraries such as cheerio or puppeteer to parse HTML and extract data from the scraped content.
Here’s the Cheerio example:
const axios = require("axios");
const cheerio = require("cheerio");

async function scrapeWebsite(url) {
  const apiKey = "YOUR_SCRAPERAPI_API_KEY";
  try {
    const response = await axios.get(
      `http://api.scraperapi.com?api_key=${apiKey}&url=${encodeURIComponent(url)}`
    );
    const html = response.data;
    // Load the HTML content into Cheerio
    const $ = cheerio.load(html);
    // Use Cheerio selectors to extract specific information
    const pageTitle = $("title").text();
    const headings = $("h1")
      .map((index, element) => $(element).text())
      .get();
    // Return the scraped data
    return {
      pageTitle,
      headings,
    };
  } catch (error) {
    console.error("Error scraping website:", error);
    throw error;
  }
}

// Example usage
const websiteUrl = "https://www.example.com";
scrapeWebsite(websiteUrl)
  .then((data) => {
    console.log("Scraped data:", data);
    // Process the scraped data as needed
  })
  .catch((error) => {
    console.error("Error scraping website:", error);
  });
In this example, we're using the axios package to make an HTTP request to the specified URL and retrieve the HTML content of the website. We then load the HTML content into Cheerio using cheerio.load(html).
With Cheerio, we can use CSS-style selectors to target specific elements in the HTML. In this example, we extract the page title by selecting the <title> element with $('title').text(), and we extract all the <h1> headings using $('h1').map((index, element) => $(element).text()).get(). You can customize the selectors based on the specific information you want to extract from the website; Cheerio provides a powerful and flexible way to traverse and manipulate the HTML content. Remember to install the required dependencies by running npm install axios cheerio in your project directory before executing the code.
Using Scraper API simplifies the process of scraping websites by handling various challenges such as proxies and captchas. By following the steps outlined in this tutorial, you can easily set up and use Scraper API to scrape websites and extract valuable data for your projects.
Make sure to review and comply with the terms of service and guidelines of both Scraper API and the website you are scraping. Respect website scraping policies, and only scrape websites that allow it or for which you have obtained proper permission.