jsoup: Java HTML Parser
Sect Master, Level 20
Post #1, posted 2025-8-12 20:59

jsoup is a Java library that simplifies working with real-world HTML and XML. It offers an easy-to-use API for URL fetching, data parsing, extraction, and manipulation using DOM API methods, CSS selectors, and XPath selectors.

jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers.


Scrape and parse HTML from a URL, file, or string.
Find and extract data using DOM traversal or CSS selectors.
Manipulate HTML elements, attributes, and text.
Clean user-submitted content against a safelist to prevent XSS attacks.
Output tidy HTML.
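
A rough sketch of the features listed above, assuming a jsoup version that ships org.jsoup.safety.Safelist (older releases call the same class Whitelist). The class name JsoupFeatureTour, the sample markup, and the example.com base URI are made up for illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Safelist;

// Hypothetical demo class; not part of jsoup.
class JsoupFeatureTour {
    /** Parses an HTML string and returns the text of the first element matching a CSS selector. */
    static String textOf(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element match = doc.selectFirst(cssQuery);
        return match == null ? "" : match.text();
    }

    /** Removes everything that jsoup's basic safelist does not allow, such as script tags. */
    static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        String html = "<div><p class=\"lead\">Hello <b>jsoup</b></p><a href=\"/docs\">Docs</a></div>";

        // Find and extract data with a CSS selector.
        System.out.println(textOf(html, "p.lead"));

        // Parse with a base URI (an assumed example value) so relative links can be resolved.
        Document doc = Jsoup.parse(html, "https://example.com/");
        Element link = doc.selectFirst("a[href]");
        if (link != null) {
            // Manipulate an attribute and the element text.
            link.attr("rel", "nofollow");
            link.text("Documentation");
            System.out.println(link.absUrl("href"));
        }

        // Output the (possibly modified) body as HTML.
        System.out.println(doc.body().html());

        // Clean user-submitted content against a safelist.
        System.out.println(sanitize("<p>Hi<script>alert(1)</script></p>"));
    }
}
```

Fetching from a URL works the same way once Jsoup.connect(url).get() has returned a Document.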

jsoup is designed to deal with all varieties of HTML found in the wild. From pristine and validating markup to invalid tag soup, jsoup will create a sensible parse tree.

https://jsoup.org/

Sect Master, Level 20
Post #2, posted 2025-8-12 21:03

Which versions of Java are compatible with jsoup?
Java Version Requirements
Jsoup is a popular Java library for parsing and manipulating HTML documents. The Java version requirements have evolved over time as jsoup has been updated to take advantage of newer Java features.

 

Current jsoup Versions (1.15.0+)
Modern jsoup versions require Java 8 or higher:

- Minimum: Java 8 (Java 1.8), required

- Recommended: Java 11 or Java 17 (LTS versions)

- Supported: Java 8, 11, 17, 21, and newer versions
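
If you want to check the Java 8 minimum quoted above at startup, something like the following might do. It is only a sketch: the class and method names are invented here, and it relies on the standard java.specification.version system property, which reads "1.6" or "1.8" on old runtimes and "11", "17", and so on from Java 9 onward.

```java
// Hypothetical helper; the names are illustrative and not part of jsoup.
class JavaVersionCheck {
    /** Converts a specification version such as "1.8", "11" or "21" to its major number. */
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Before Java 9 the scheme was "1.x", so the major number is the second component.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    /** True when the given runtime satisfies the Java 8 minimum quoted for current jsoup versions. */
    static boolean meetsJava8Minimum(String specVersion) {
        return majorVersion(specVersion) >= 8;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("Java " + spec + " meets the Java 8 minimum: " + meetsJava8Minimum(spec));
    }
}
```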

 

Legacy jsoup Versions
For older Java environments:

- jsoup 1.11.x through 1.14.x: Compatible with Java 7+

- jsoup 1.10.x and earlier: Compatible with Java 5+

https://webscraping.ai/faq/jsoup/which-versions-of-java-are-compatible-with-jsoup

 
Sect Master, Level 20
Post #3, posted 2025-8-12 21:12

jsoup 1.14.3

jsoup 1.14.3 is out now, adding native XPath selector support, improved <template> support, and also includes a bunch of bug fixes, improvements, and performance enhancements.

See the release announcement for the full changelog.

https://repo1.maven.org/maven2/org/jsoup/jsoup/1.14.3/
https://repo1.maven.org/maven2/org/jsoup/jsoup/1.14.3/jsoup-1.14.3.jar
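
A small sketch of the native XPath support mentioned in this release, assuming jsoup 1.14.3 or later on the classpath. The class name and the sample markup are invented for illustration; selectXpath is the jsoup method that takes an XPath expression and returns the matching elements.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// Hypothetical demo class; not part of jsoup.
class XPathSelectSketch {
    /** Returns the text of every anchor that carries an href attribute. */
    static List<String> linkTexts(String html) {
        Document doc = Jsoup.parse(html);
        // XPath equivalent of the CSS selector a[href].
        Elements links = doc.selectXpath("//a[@href]");
        return links.eachText();
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/x\">One</a> <a>Two</a> <a href=\"/y\">Three</a></p>";
        for (String text : linkTexts(html)) {
            System.out.println(text);
        }
    }
}
```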

 
Sect Master, Level 20
Post #4, posted 2025-8-12 21:32

In my testing, jsoup-1.10.3, released in June 2017, is the last version that supports JDK 1.6.

jsoup-1.10.3.jar 2017-06-11 19:15 355356

https://repo1.maven.org/maven2/org/jsoup/jsoup/1.10.3/jsoup-1.10.3.jar

 
巨大八爪鱼

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Test {
    public static void main(String[] args) throws IOException {
        // Fetch the page over HTTP and parse it into a jsoup Document.
        Document document = Jsoup.connect("http://cn.bing.com/").get();
        // Print the text of the page's <title> element.
        System.out.println("Title: " + document.title());
    }
}

  2025-8-12 21:32
巨大八爪鱼: This version of jsoup does not seem to support XPath; the javax.xml.xpath.XPath API bundled with Java 6 can be used instead.
  2025-8-12 22:42
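
A minimal sketch of the javax.xml.xpath approach suggested in the comment above, using only JDK classes and avoiding Java 7+ syntax. The class name, method name, and sample markup are invented for illustration. It assumes the input is well-formed XML; real-world tag-soup HTML would first have to be converted to a W3C DOM. jsoup's org.jsoup.helper.W3CDom class looks like a candidate for that step, but whether it is available in 1.10.3 has not been verified here.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class built on the JDK's own XML and XPath APIs.
class XPathFallbackSketch {
    /** Evaluates an XPath expression against well-formed markup and returns the text of each match. */
    static String[] selectText(String xml, String expression) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document dom = builder.parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, dom, XPathConstants.NODESET);

            String[] texts = new String[nodes.getLength()];
            for (int i = 0; i < nodes.getLength(); i++) {
                texts[i] = nodes.item(i).getTextContent();
            }
            return texts;
        } catch (Exception e) {
            // Parsing and XPath errors are wrapped so callers need no checked-exception handling.
            throw new IllegalArgumentException("Could not evaluate XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<html><body><div id=\"id_s\">"
                + "<a href=\"/a\">First</a><a href=\"/b\">Second</a>"
                + "</div></body></html>";
        String[] links = selectText(xml, "//div[@id='id_s']/a");
        for (int i = 0; i < links.length; i++) {
            System.out.println(links[i]);
        }
    }
}
```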
Sect Master, Level 20
Post #5, posted 2025-8-12 21:36

Use DOM methods to navigate a document:

https://jsoup.org/cookbook/extracting-data/dom-navigation

 

Element a = document.getElementById("id_s"); // null if no element has this id
if (a != null) {
    System.out.println(a.html()); // inner HTML of the matched element
}
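
For anyone who wants to try the DOM methods without fetching a page, here is a self-contained sketch. The class name and the sample markup are made up; the id "id_s" is simply reused from the snippet above. getElementById returns null when nothing matches, so the sketch guards against that.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class; not part of jsoup.
class DomNavigationSketch {
    /** Returns the inner HTML of the element with the given id, or null if there is none. */
    static String innerHtmlById(String html, String id) {
        Document document = Jsoup.parse(html);
        Element element = document.getElementById(id);
        return element == null ? null : element.html();
    }

    public static void main(String[] args) {
        String html = "<div id=\"id_s\"><a href=\"/a\">First</a> <a href=\"/b\">Second</a></div>";
        Document document = Jsoup.parse(html);

        Element container = document.getElementById("id_s");
        if (container != null) {
            // Walk the anchors below the container and print each href with its text.
            for (Element link : container.getElementsByTag("a")) {
                System.out.println(link.attr("href") + " -> " + link.text());
            }
        }

        // A missing id yields null rather than an exception.
        System.out.println(innerHtmlById(html, "no_such_id"));
    }
}
```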

 

 

 

Thread info

Views: 26 Replies: 6
Comments: ?
Author: 巨大八爪鱼
Last reply: 巨大八爪鱼
Last reply time: 2025-8-12 22:42
 
©2010-2025 Purasbar Ver2.0
Unless otherwise stated, this site is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported license.