(Python) XPath Grammar

XPath is short for "XML Path".

It is a query language for navigating an XML document and selecting parts of it.
The W3C published it in 1999, and it can be used from languages such as Java, Python, and C#.

Unfortunately, BeautifulSoup does not support XPath.
XPath is used in much the same way as a CSS selector (like my#idname).

It is built around four concepts:


  • Root node vs. non-root node
- /div selects the div node only if it is the root node of the document.
- //div selects every div node in the document, at any depth.
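The difference can be sketched with Python's standard-library xml.etree.ElementTree. This is only a rough illustration: ElementTree supports a limited subset of XPath and has no absolute paths, so the "/div" case is emulated by checking the root tag, and "//div" is written as ".//div". The sample document is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; its root element is <html>, not <div>.
doc = ET.fromstring(
    "<html><body><div id='a'><div id='b'/></div><p/></body></html>"
)

# '/div' matches only when the root element itself is a <div>.
# ElementTree has no absolute paths, so we compare the root tag instead.
root_is_div = doc.tag == "div"

# '//div' matches every <div> at any depth; ElementTree spells it './/div'.
all_div_ids = [d.get("id") for d in doc.findall(".//div")]

print(root_is_div)   # False
print(all_div_ids)   # ['a', 'b']
```

A full XPath 1.0 engine (lxml is one option) would accept "/div" and "//div" exactly as written.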

  • Selecting attributes
- //@href selects every href attribute node in the document.
ex) //a[@href='http://google.com'] selects every a node in the document that links to "http://google.com".
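As a minimal sketch, the attribute examples look like this with the standard-library xml.etree.ElementTree. ElementTree cannot return attribute nodes, so "//@href" is approximated by selecting every element that carries href and reading its value. The attribute-value predicate is supported directly. The sample links are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with three <a> elements.
doc = ET.fromstring(
    "<body>"
    "<a href='http://google.com'>Google</a>"
    "<a href='http://example.com'>Example</a>"
    "<a name='top'>no link</a>"
    "</body>"
)

# Approximation of '//@href': pick elements that have href, then read it.
hrefs = [el.get("href") for el in doc.findall(".//*[@href]")]

# Same predicate as //a[@href='http://google.com'], written relative to doc.
google_links = doc.findall(".//a[@href='http://google.com']")

print(hrefs)                            # ['http://google.com', 'http://example.com']
print([a.text for a in google_links])   # ['Google']
```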

  • Selecting nodes by position
- (//a)[3] selects the third link in the document.
- (//table)[last()] selects the last table in the document.
- (//a)[position()<3] selects the first and second links in the document.
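The parentheses matter here: (//a)[3] counts links across the whole document, while //a[3] would count among siblings under each parent. The standard-library xml.etree.ElementTree does not support the parenthesized form, so the sketch below emulates it with ordinary list indexing on the document-order result. Treat it as an illustration of the semantics, not of the XPath syntax itself. The sample document is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: three links spread over two paragraphs, two tables.
doc = ET.fromstring(
    "<body>"
    "<p><a>one</a><a>two</a></p>"
    "<p><a>three</a></p>"
    "<table id='t1'/><table id='t2'/>"
    "</body>"
)

links = doc.findall(".//a")        # all links, in document order
tables = doc.findall(".//table")   # all tables, in document order

third_link = links[2]     # (//a)[3]: XPath positions start at 1
last_table = tables[-1]   # (//table)[last()]
first_two = links[:2]     # (//a)[position()<3]

# Without parentheses the position is counted per parent element,
# and no <p> here has a third <a>, so nothing matches.
third_per_parent = doc.findall(".//a[3]")

print(third_link.text)               # three
print(last_table.get("id"))          # t2
print([a.text for a in first_two])   # ['one', 'two']
print(third_per_parent)              # []
```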

  • The asterisk (*) is a wildcard that matches any node of the kind being selected, which makes it useful in many situations.
- //table/tr/* selects every child element of every tr that sits directly under a table.
- //div[@*] selects every div that has at least one attribute.
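A small sketch of the wildcard, again using the standard-library xml.etree.ElementTree and an invented document. The element wildcard works as is (written relative to the root), but ElementTree has no "@*" predicate, so that case is emulated by checking whether the attribute dictionary is non-empty.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one table row with two cells, and two divs.
doc = ET.fromstring(
    "<body>"
    "<table><tr><th>h</th><td>1</td></tr></table>"
    "<div class='x'>has attr</div>"
    "<div>plain</div>"
    "</body>"
)

# //table/tr/* : every child element of each <tr> directly under a <table>.
cell_tags = [el.tag for el in doc.findall(".//table/tr/*")]

# //div[@*] : every <div> with at least one attribute (emulated in Python).
divs_with_attrs = [d.text for d in doc.iter("div") if d.attrib]

print(cell_tags)         # ['th', 'td']
print(divs_with_attrs)   # ['has attr']
```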

