Data Extraction / Screen Scraping Using XPath

May 16th, 2007

Google AJAXSLT is an implementation of XSL-T in JavaScript, intended for use in fat web pages, which are nowadays referred to as AJAX applications. Because XSL-T uses XPath, it is also an implementation of XPath that can be used independently of XSL-T.

Selenium Core uses AJAXSLT's XPath functions to locate elements in plain HTML pages. Selenium IDE can generate XPath expressions in much the same way as Solvent.

Here is a sample that extracts data from a web page using XPath with AJAXSLT.

<html>
<head>
<title>XPath-Test</title>
<script language="JavaScript" type="text/javascript" src="xpath/misc.js"></script>
<script language="JavaScript" type="text/javascript" src="xpath/dom.js"></script>
<script language="JavaScript" type="text/javascript" src="xpath/xpath.js"></script>
<script language="JavaScript" type="text/javascript">
// Parse and evaluate an XPath expression with AJAXSLT.
// Returns the first node of the result, or null if there is no result value.
function findElementUsingFullXPath(xpath, inDocument) {
  var context = new ExprContext(inDocument);
  var xpathObj = xpathParse(xpath);
  var xpathResult = xpathObj.evaluate(context);
  if (xpathResult && xpathResult.value) {
    return xpathResult.value[0];
  }
  return null;
}

// Runs on page load: find the first element whose text is exactly "b".
function start() {
  alert(findElementUsingFullXPath("//*[.='b']", window.document).innerHTML);
}
</script>
</head>
<body onload="start()">
<div><p>aaaaa</p>test<div>bb</div><a>b</a></div>
</body>
</html>
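To see why the expression //*[.='b'] picks out the <a> element rather than the <div>bb</div> or the outer <div>, it helps to recall that XPath compares the string-value of each element, which is the concatenation of all its descendant text. The sketch below is not AJAXSLT code: it models only the body of the sample page with a hypothetical `el` helper and hand-rolled `stringValue` and `findElementsByStringValue` functions, purely to illustrate that comparison.

```javascript
// Minimal stand-in for a DOM tree (hypothetical, not AJAXSLT's types):
// elements are { tag, children }, text nodes are plain strings.
function el(tag, ...children) {
  return { tag, children };
}

// XPath string-value: a text node's own text, or for an element the
// concatenation of all descendant text nodes in document order.
function stringValue(node) {
  if (typeof node === "string") return node;
  return node.children.map(stringValue).join("");
}

// Emulates //*[.='text']: visit every element in document order and
// collect those whose string-value equals `text`.
function findElementsByStringValue(root, text) {
  const matches = [];
  (function visit(node) {
    if (typeof node === "string") return;
    if (stringValue(node) === text) matches.push(node);
    node.children.forEach(visit);
  })(root);
  return matches;
}

// The body of the sample page.
const body = el("body",
  el("div", el("p", "aaaaa"), "test", el("div", "bb"), el("a", "b")));

const found = findElementsByStringValue(body, "b");
console.log(found.map((n) => n.tag)); // [ 'a' ]
```

The outer <div> has the string-value "aaaaatestbbb" and the inner <div> has "bb", so only <a> matches, and the sample's alert shows its innerHTML, "b".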

 

Entry Filed under: Programming
