Web Scraping in JavaScript | How To Scrape A Website Using Scraper API in 2024

5 min read
Last updated: Oct 13, 2024


Scraper API provides a proxy service designed for web scraping. With over 20 million residential IPs across 12 countries, plus software that handles JavaScript rendering and CAPTCHA solving, you can complete large scraping jobs quickly without ever having to worry about being blocked.

Implementation is extremely simple, and they offer unlimited bandwidth. Proxies are automatically rotated, but users can choose to maintain sessions if required. All you need to do is call the API with the URL that you want to scrape, and it will return the raw HTML. With Scraper API, you just focus on parsing the data, and they’ll handle the rest.
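
For example, the call pattern is just a GET request with your key and the target URL in the query string. The optional render and session_number parameters shown below are assumptions based on Scraper API's query-string options, so double-check the exact names in their documentation:

const axios = require("axios");

// A minimal sketch of the basic call pattern. The commented-out render and
// session_number parameters are optional extras; confirm the exact parameter
// names in the Scraper API documentation before relying on them.
async function fetchRawHtml(targetUrl) {
  const params = new URLSearchParams({
    api_key: "YOUR_SCRAPERAPI_API_KEY",
    url: targetUrl,
    // render: "true",         // ask Scraper API to render JavaScript
    // session_number: "123",  // reuse the same proxy across requests
  });

  const response = await axios.get(`http://api.scraperapi.com?${params.toString()}`);
  return response.data; // raw HTML of the target page
}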

According to their own numbers, they handle over 5 billion API requests per month for more than 1,500 businesses and developers around the world.

Having built many web scrapers, we repeatedly went through the tiresome process of finding proxies, setting up headless browsers, and handling CAPTCHAs. That's why we decided to start Scraper API: it handles all of this for you, so you can scrape any page with a simple API call!

— Scraper API Story

Step 1: Set Up Scraper API Account

To get started, sign up for an account at Scraper API and obtain an API key. The API key will be used to make requests to the Scraper API service.
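
A small tip before moving on: rather than hardcoding the key in your scripts, you may prefer to read it from an environment variable. The variable name SCRAPERAPI_KEY below is just an example, not something Scraper API mandates:

// Keep the API key out of your source code by reading it from an environment
// variable. SCRAPERAPI_KEY is an example name chosen for this tutorial.
const apiKey = process.env.SCRAPERAPI_KEY;

if (!apiKey) {
  throw new Error("Set the SCRAPERAPI_KEY environment variable before running the scraper.");
}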


Step 2: Install Required Packages

To make HTTP requests and parse the returned HTML, we’ll use the axios and cheerio packages. Install them by running the following command:

npm install axios cheerio

Step 3: Write the Scraping Code

Let’s write a simple JavaScript function to scrape a website using Scraper API. We’ll use the axios package to make the HTTP request and pass our Scraper API key as a query parameter.

const axios = require("axios");
const cheerio = require("cheerio");

async function scrapeWebsite(url) {
  const apiKey = "YOUR_SCRAPERAPI_API_KEY";

  try {
    const response = await axios.get(
      `http://api.scraperapi.com?api_key=${apiKey}&url=${encodeURIComponent(
        url
      )}`
    );
    return response.data;
  } catch (error) {
    console.error("Error scraping website:", error);
    throw error;
  }
}

// Example usage
const websiteUrl = "https://www.example.com";
scrapeWebsite(websiteUrl)
  .then((html) => {
    console.log("Scraped data:", html);
    // Process the scraped data as needed
    const $ = cheerio.load(html);

    // Use Cheerio selectors to extract specific information
    const pageTitle = $("title").text();
    const headings = $("h1")
      .map((index, element) => $(element).text())
      .get();
  })
  .catch((error) => {
    console.error("Error scraping website:", error);
  });

Replace 'YOUR_SCRAPERAPI_API_KEY' with your actual Scraper API key. The scrapeWebsite function takes a URL as input, constructs the Scraper API request URL, and makes the HTTP request using axios.get. The response data contains the raw HTML of the scraped page.
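
Requests routed through a rotating proxy pool can occasionally time out or fail, so you may want to wrap the call in a small retry loop. The sketch below is one possible approach; the retry count and 60-second timeout are arbitrary illustrative values, not Scraper API requirements:

const axios = require("axios");

// A simple retry wrapper around the Scraper API call.
// maxRetries and the 60-second timeout are illustrative defaults.
async function scrapeWithRetry(url, apiKey, maxRetries = 3) {
  const requestUrl = `http://api.scraperapi.com?api_key=${apiKey}&url=${encodeURIComponent(url)}`;

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await axios.get(requestUrl, { timeout: 60000 });
      return response.data;
    } catch (error) {
      console.error(`Attempt ${attempt} failed:`, error.message);
      if (attempt === maxRetries) throw error;
    }
  }
}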

Step 4: Process the Scraped Data

Once you have the scraped data, you can process it as needed. This may involve parsing HTML, extracting specific information, or performing further analysis. You can use libraries such as cheerio or puppeteer to parse HTML and extract data from the scraped content.

Here’s the Cheerio example:

const axios = require("axios");
const cheerio = require("cheerio");

async function scrapeWebsite(url) {
  const apiKey = "YOUR_SCRAPERAPI_API_KEY";

  try {
    const response = await axios.get(
      `http://api.scraperapi.com?api_key=${apiKey}&url=${encodeURIComponent(
        url
      )}`
    );
    const html = response.data;

    // Load the HTML content into Cheerio
    const $ = cheerio.load(html);

    // Use Cheerio selectors to extract specific information
    const pageTitle = $("title").text();
    const headings = $("h1")
      .map((index, element) => $(element).text())
      .get();

    // Return the scraped data
    return {
      pageTitle,
      headings,
    };
  } catch (error) {
    console.error("Error scraping website:", error);
    throw error;
  }
}

// Example usage
const websiteUrl = "https://www.example.com";
scrapeWebsite(websiteUrl)
  .then((data) => {
    console.log("Scraped data:", data);
    // Process the scraped data as needed
  })
  .catch((error) => {
    console.error("Error scraping website:", error);
  });

In this example, we’re using the axios package to request the specified URL through Scraper API and retrieve the HTML content of the website. We then load the HTML content into Cheerio using cheerio.load(html).

With Cheerio, we can use CSS-style selectors to target specific elements in the HTML. In this example, we extract the page title by selecting the <title> element with $('title').text(), and we extract all the <h1> headings using $('h1').map((index, element) => $(element).text()).get(). You can customize the selectors based on the specific information you want to extract from the website.

Cheerio provides a powerful and flexible way to traverse and manipulate the HTML content. Remember to install the required dependencies by running npm install axios cheerio in your project directory before executing the code.
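
As another illustration, here is a small standalone sketch that pulls every link’s text and href out of a snippet of HTML; the markup is invented purely for the example:

const cheerio = require("cheerio");

// Standalone example: extract each link's text and href with Cheerio.
// The HTML snippet below is made up for illustration.
const html = `<ul><li><a href="/docs">Docs</a></li><li><a href="/blog">Blog</a></li></ul>`;
const $ = cheerio.load(html);

const links = $("a")
  .map((index, element) => ({
    text: $(element).text().trim(),
    href: $(element).attr("href"),
  }))
  .get();

console.log(links); // [{ text: "Docs", href: "/docs" }, { text: "Blog", href: "/blog" }]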

Conclusion

Using Scraper API simplifies the process of scraping websites by handling challenges such as proxies and CAPTCHAs. By following the steps outlined in this tutorial, you can easily set up and use Scraper API to scrape websites and extract valuable data for your projects.


Note:

Make sure to review and comply with the terms of service and guidelines of both Scraper API and the website you are scraping. Respect website scraping policies and only scrape websites that allow it or have obtained proper permissions.
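
One lightweight, optional way to review a site’s stated crawling rules is to fetch its robots.txt before scraping. The sketch below only prints the file; interpreting and honoring the rules is still up to you:

const axios = require("axios");

// Fetch and print a site's robots.txt so you can review its crawling rules
// before scraping. This does not parse or enforce the rules.
async function printRobotsTxt(siteUrl) {
  const robotsUrl = new URL("/robots.txt", siteUrl).toString();
  try {
    const response = await axios.get(robotsUrl);
    console.log(response.data);
  } catch (error) {
    console.log(`Could not fetch ${robotsUrl}:`, error.message);
  }
}

printRobotsTxt("https://www.example.com");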
