
Lars Vogel, (c) 2015, 2016 vogella GmbH Version 0.2, 06.07.2016

This tutorial explains how to use jsoup as an HTML parser.

1. jsoup

1.1. What is jsoup?

jsoup is a Java library for working with real-world HTML. It provides a convenient API for extracting and manipulating data, combining DOM traversal, CSS selectors and jQuery-like methods.
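Because jsoup can also parse HTML held in a String, the selector and manipulation style described above can be tried without any network access. The following is a minimal sketch, not part of the original tutorial: the class name `ParseStringExample`, the helper methods and the sample HTML are hypothetical and made up for illustration. It relies only on `Jsoup.parse(String)`, `Document.select`, `Element.text()` and `Elements.attr(key, value)`.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical example class: parses an in-memory HTML string, extracts data and modifies the DOM
public class ParseStringExample {

    // Returns the text of every <li> with CSS class "item" that is a direct child of a <ul>
    static List<String> extractItems(Document doc) {
        List<String> result = new ArrayList<>();
        for (Element li : doc.select("ul > li.item")) {
            result.add(li.text());
        }
        return result;
    }

    // Sets rel="nofollow" on every link that has an href attribute; modifies doc in place
    static void addNofollow(Document doc) {
        doc.select("a[href]").attr("rel", "nofollow");
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Demo</title></head><body>"
                + "<ul><li class='item'>First</li><li>Skipped</li><li class='item'>Second</li></ul>"
                + "<a href='/docs'>Docs</a>"
                + "</body></html>";

        // Jsoup.parse builds a Document from a String; nothing is downloaded
        Document doc = Jsoup.parse(html);

        System.out.println("Title: " + doc.title());       // Title: Demo
        System.out.println("Items: " + extractItems(doc)); // Items: [First, Second]

        addNofollow(doc);
        // print the modified link element as HTML
        System.out.println(doc.select("a").first().outerHtml());
    }
}
```

`select` accepts CSS-style selectors such as `ul > li.item` or `a[href]`. Setting an attribute on the returned `Elements` collection applies it to every matched element, so the parsed document itself is changed.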

1.2. Using jsoup

To use jsoup in a Maven build, add the following dependency to your pom.xml file.

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.9.1</version>
</dependency>

To use jsoup in your Gradle build, add the following dependency to your build.gradle file.

implementation 'org.jsoup:jsoup:1.9.1'

1.3. Example

The following code demonstrates how to download a web page, print its title and list its links.

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ParseLinksExample {

    public static void main(String[] args) {
        try {
            // download and parse the page
            Document doc = Jsoup.connect("http://www.vogella.com").get();

            // get the title of the page
            String title = doc.title();
            System.out.println("Title: " + title);

            // select all <a> elements that have an href attribute
            Elements links = doc.select("a[href]");
            for (Element link : links) {
                // print the value of the href attribute and the link text
                System.out.println("\nLink : " + link.attr("href"));
                System.out.println("Text : " + link.text());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
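The `href` values printed by the link example are returned exactly as written in the page, so relative links such as `/tutorials/` stay relative. jsoup can resolve them against a base URI through the `abs:` attribute prefix (or `Element.absUrl`). The sketch below assumes the HTML is already available as a String and passes the base URI explicitly to `Jsoup.parse(html, baseUri)`; the class name `AbsoluteLinksExample`, the helper method and the sample links are hypothetical. For a document fetched with `Jsoup.connect(...).get()`, jsoup normally uses the fetched URL as the base URI, so `link.attr("abs:href")` should work there as well.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical example class: resolves relative links without any network access
public class AbsoluteLinksExample {

    // Returns the absolute URL of every <a href> in html, resolving relative links against baseUri
    static List<String> absoluteLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            // the "abs:" prefix makes jsoup resolve the attribute value against the base URI
            urls.add(link.attr("abs:href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<a href='/tutorials/'>Tutorials</a>"
                + "<a href='https://www.example.com/page'>External</a>";
        for (String url : absoluteLinks(html, "http://www.vogella.com/")) {
            System.out.println(url);
        }
    }
}
```

Links that are already absolute are returned unchanged, so the same helper can be applied to every link on a page.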

Copyright © 2012-2018 vogella GmbH. Free use of the software examples is granted under the terms of the Eclipse Public License 2.0. This tutorial is published under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Germany license.