Archive for May 16th, 2007

Data Extraction / Screen Scraping Using XPath

Google AJAXSLT is an implementation of XSL-T in JavaScript, intended for use in fat web pages, which are nowadays referred to as AJAX applications. Because XSL-T uses XPath, AJAXSLT also includes an XPath implementation that can be used independently of XSL-T.

Selenium Core uses AJAXSLT's XPath implementation to locate elements in plain HTML pages. Selenium IDE can generate XPath expressions in much the same way as Solvent.

Here is a sample that extracts data from a web page with XPath, using AJAXSLT.

<html>
<head>
<title>XPath-Test</title>
<script language="JavaScript" type="text/javascript" src="xpath/misc.js"></script>
<script language="JavaScript" type="text/javascript" src="xpath/dom.js"></script>
<script language="JavaScript" type="text/javascript" src="xpath/xpath.js"></script>
<script language="JavaScript" type="text/javascript">
// Evaluate an XPath expression against a document with AJAXSLT and
// return the first matching node, or null if nothing matches.
function findElementUsingFullXPath(xpath, inDocument) {
  var context = new ExprContext(inDocument);
  var xpathObj = xpathParse(xpath);
  var xpathResult = xpathObj.evaluate(context);
  if (xpathResult && xpathResult.value && xpathResult.value.length > 0) {
    return xpathResult.value[0];
  }
  return null;
}

function start() {
  // Select the first element whose string value is exactly 'b';
  // in this page that is <a>b</a>.
  var element = findElementUsingFullXPath("//*[.='b']", window.document);
  alert(element ? element.innerHTML : "No match");
}
</script>
</head>
<body onload="start()">
<div><p>aaaaa</p>test<div>bb</div><a>b</a></div>
</body>
</html>
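
The sample above returns only the first match. For scraping, it is often handy to collect the text of every matching node instead. The following is a minimal sketch, not part of AJAXSLT: extractTexts and findAllTexts are hypothetical helper names, and the code assumes the result object exposes its nodes as an array in a value property, as the sample above does. The XPath evaluator is passed in as a function, so the helpers do not depend on any particular engine.

```javascript
// Hypothetical helper: collect the trimmed text of every node in an
// AJAXSLT-style node-set result (an object whose `value` is an array of nodes).
function extractTexts(xpathResult) {
  if (!xpathResult || !Array.isArray(xpathResult.value)) {
    return [];
  }
  return xpathResult.value.map(function (node) {
    // Prefer textContent; fall back to innerHTML, as used in the sample above.
    var text = node.textContent != null ? node.textContent : node.innerHTML;
    return String(text == null ? '' : text).trim();
  });
}

// Hypothetical wrapper: `evaluate` is any function (xpath, document) -> result.
function findAllTexts(xpath, inDocument, evaluate) {
  return extractTexts(evaluate(xpath, inDocument));
}

// Possible use in a page that has loaded the AJAXSLT scripts (assumption):
//
//   var texts = findAllTexts("//div/*", window.document, function (xpath, doc) {
//     return xpathParse(xpath).evaluate(new ExprContext(doc));
//   });
```

Passing the evaluator in keeps the extraction logic separate from the XPath engine, so it can be exercised with plain objects outside a browser.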

 


