Data Extraction / Screen Scraping Using XPath
Google AJAXSLT is an implementation of XSL-T in JavaScript, intended for use in fat web pages, which are nowadays referred to as AJAX applications. Because XSL-T uses XPath, it is also an implementation of XPath that can be used independently of XSL-T.
Selenium Core uses AJAXSLT's XPath functions to locate elements in plain HTML. Selenium IDE can generate XPath expressions in much the same way as Solvent.
Here is a sample that extracts data from a web page with XPath, using AJAXSLT.
<html><head>
<title>XPath-Test</title>
<!-- AJAXSLT scripts -->
<script language="JavaScript" type="text/javascript" src="xpath/misc.js"></script>
<script language="JavaScript" type="text/javascript" src="xpath/dom.js"></script>
<script language="JavaScript" type="text/javascript" src="xpath/xpath.js"></script>
<script language="JavaScript" type="text/javascript">
// Evaluate an XPath expression against a document and return the first
// matching node, or null if nothing matches.
function findElementUsingFullXPath(xpath, inDocument) {
  var context = new ExprContext(inDocument);
  var xpathObj = xpathParse(xpath);
  var xpathResult = xpathObj.evaluate(context);
  if (xpathResult && xpathResult.value && xpathResult.value.length > 0) {
    return xpathResult.value[0];
  }
  return null;
}
function start() {
  // Select the element whose text content is exactly 'b' (the <a> below).
  var element = findElementUsingFullXPath("//*[.='b']", window.document);
  alert(element ? element.innerHTML : "No match");
}
</script>
</head>
<body onload="start()">
<div><p>aaaaa</p>test<div>bb</div><a>b</a></div>
</body>
</html>
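To see why the expression //*[.='b'] picks the <a> element and not <div>bb</div> or the outer <div>, it helps to remember that "." in XPath compares the string-value of an element, which is all of its descendant text joined together. The following is only a rough sketch of that idea in plain JavaScript, not AJAXSLT's actual code: the `body` object is a hypothetical stand-in for the sample page's DOM, and `stringValue` and `findByStringValue` are names made up for illustration.

```javascript
// Hypothetical miniature of the sample page's <body>;
// plain objects stand in for DOM elements, strings for text nodes.
const body = {
  tag: "body",
  children: [
    {
      tag: "div",
      children: [
        { tag: "p", children: ["aaaaa"] },
        "test",
        { tag: "div", children: ["bb"] },
        { tag: "a", children: ["b"] },
      ],
    },
  ],
};

// XPath string-value of a node: all descendant text, concatenated in document order.
function stringValue(node) {
  if (typeof node === "string") return node;
  return node.children.map(stringValue).join("");
}

// Rough equivalent of //*[.='text']: every element, in document order,
// whose string-value is exactly `text`.
function findByStringValue(root, text) {
  const matches = [];
  (function visit(node) {
    if (typeof node === "string") return; // text nodes are not elements
    if (stringValue(node) === text) matches.push(node);
    node.children.forEach(visit);
  })(root);
  return matches;
}

const matches = findByStringValue(body, "b");
console.log(matches.map((n) => n.tag));
```

The outer <div> has the string-value "aaaaatestbbb" and the inner <div> has "bb", so neither equals "b"; only the <a> element does, which is why the sample's alert shows its innerHTML.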
May 16th, 2007