Member Avatar for Szabi Zsoldos

Hi guys,

I want to create a crawler to extract some information from a page.
The problem is that the page is written with the Java Wicket framework, and I don't know how to scrape information from it because I don't know how to submit the POST parameters.

Is this possible? :)
Thank you.

Member Avatar for pritaeas

You can POST to a page with cURL.

Member Avatar for Szabi Zsoldos

You can POST to a page with cURL.

pritaeas, thank you, but how do I set the POST parameters for this link, for example?

The input field name is wmcCif:cif

https://portal.onrc.ro/ONRCPortalWeb/appmanager/myONRC/wicket/?wicket:interface=:9:1:::

These are the POST parameters as shown in Firebug.

Parameters (application/x-www-form-urlencoded)
cautare       x
criteriu      filtru.cif
wmcCif:cif    1757980

Source
cautare=x&criteriu=filtru.cif&wmcCif%3Acif=1757980
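
For reference, the Source line from Firebug is just the three parameters URL-encoded, and PHP's http_build_query should produce the same string. A minimal sketch, assuming the field names and values are exactly the ones captured above ($fields and $body are illustrative names only):

```php
<?php
// Field names and values copied from the Firebug capture.
// $fields and $body are illustrative names, not something the site defines.
$fields = array(
    'cautare'    => 'x',
    'criteriu'   => 'filtru.cif',
    'wmcCif:cif' => '1757980',
);

// The separator is passed explicitly so the result does not depend on php.ini.
// The colon in "wmcCif:cif" is percent-encoded as %3A.
$body = http_build_query($fields, '', '&');

echo $body, "\n";
```

Passing the encoded string (rather than the array) to CURLOPT_POSTFIELDS keeps the request as application/x-www-form-urlencoded; with an array, cURL sends multipart/form-data instead.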

And this is the form from the page

<form id="idTestWicket03__1__2" method="post" action="https://portal.onrc.ro:443/ONRCPortalWeb/appmanager/myONRC/wicket?_nfpb=true&_windowLabel=TestWicket03_1&_urlType=action&wlpTestWicket03_1__wu=%2FONRCPortalWeb%2Fappmanager%2FmyONRC%2Fwicket%2F%3Fwicket%3Ainterface%3D%3A4%3AformCautare%3A1%3AIFormSubmitListener%3A%3A">
  <div style="width:0px;height:0px;position:absolute;left:-100px;top:-100px;overflow:hidden">
    <input type="hidden" name="idTestWicket03__1__2_hf_0" id="idTestWicket03__1__2_hf_0" />
  </div>
  <table cellpadding="5">
    <tr>
      <td width="150"> Cautare dupa: </td>
      <td>
        <select class="select_357" onchange="document.getElementById('idTestWicket03__1__2_hf_0').value='/ONRCPortalWeb/appmanager/myONRC/wicket/?wicket:interface=:4:formCautare:criteriu:1:IOnChangeListener::';document.getElementById('idTestWicket03__1__2').submit();" name="criteriu">
          <option value="filtru.buletin">Nr. de buletin</option>
          <option value="filtru.persoana">Persoana publicata în BPI</option>
          <option selected="selected" value="filtru.cif">CIF</option>
          <option value="filtru.reg">Nr. de ordine în Registru</option>
          <option value="filtru.dosar">Nr. dosar</option>
          <option value="filtru.interval">Interval de publicare</option>
        </select>
      </td>
    </tr>
  </table>
  <div>
    <table cellpadding="5">
      <tr>
        <td width="150"> CIF: </td>
        <td> <input class="input_142" type="text" value="1757980" name="wmcCif:cif"/> </td>
      </tr>
    </table>
  </div>
  <div>
    <button class="submit" type="submit" onclick="var e=document.getElementById('idTestWicket03__1__2_hf_0'); e.name='cautare'; e.value='x';var f=document.getElementById('idTestWicket03__1__2');var ff=f;if (ff.onsubmit != undefined) { if (ff.onsubmit()==false) return false; }f.submit();e.value='';e.name='';return false;"><span>Cauta</span></button>
  </div>
</form>

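
Two details in this form seem relevant for a cURL request. The form's action URL carries its own wicket:interface value (:4:formCautare:1:IFormSubmitListener::), which differs from the one in the link above and looks like it is generated per page view, so it is probably safer to read it from the freshly fetched HTML than to hard-code it. Also, the button's onclick renames the hidden field to cautare and sets its value to x just before submitting, which is where the cautare=x parameter comes from. A rough sketch of reading both from the page, assuming PHP's DOM extension is available; extractSearchForm and the trimmed $sample markup are hypothetical and only for illustration:

```php
<?php
// Hypothetical helper: pull the action URL and the hidden field's name out of
// the first form in the given HTML so they can be reused in a cURL POST.
function extractSearchForm($html)
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $form = $doc->getElementsByTagName('form')->item(0);
    if ($form === null) {
        return null; // no form in the response, e.g. a login page came back
    }

    $hidden = null;
    foreach ($form->getElementsByTagName('input') as $input) {
        if ($input->getAttribute('type') === 'hidden') {
            $hidden = $input->getAttribute('name');
            break;
        }
    }

    return array(
        'action' => $form->getAttribute('action'),
        'hidden' => $hidden,
    );
}

// Trimmed-down copy of the form, used only to demonstrate the helper.
$sample = '<form id="idTestWicket03__1__2" method="post" '
        . 'action="https://portal.onrc.ro:443/ONRCPortalWeb/appmanager/myONRC/wicket?_nfpb=true&amp;_windowLabel=TestWicket03_1">'
        . '<input type="hidden" name="idTestWicket03__1__2_hf_0" id="idTestWicket03__1__2_hf_0" />'
        . '<input type="text" value="1757980" name="wmcCif:cif"/>'
        . '</form>';

$info = extractSearchForm($sample);
echo $info['action'], "\n";
echo $info['hidden'], "\n";
```

The POST would then go to $info['action'] with the fields cautare, criteriu and wmcCif:cif, in the same session that fetched the page.
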
Member Avatar for pritaeas

First user comment on the page I linked.

Member Avatar for Szabi Zsoldos

The problem is way harder than this. I'm familiar with cURL, but this one is giving me a hard time :(
I am logging in successfully, but when it comes to extracting certain data, it is not working.

Member Avatar for pritaeas

it is not working

Can you be more specific?

Member Avatar for Szabi Zsoldos

I've tried different methods to log in. I succeeded with DOM but not with cURL for this particular page.

function Login($data = "")
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "https://portal.onrc.ro/ONRCPortalWeb/appmanager/myONRC/public?_nfpb=true&_pageLabel=login");
    curl_setopt($ch, CURLOPT_HEADER, FALSE);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
        "j_username" => $data['j_username'],
        "j_password" => $data['j_password']
    )));
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $server_output = curl_exec($ch);
    return $server_output;
}

I don't know, maybe it relies on some AJAX calls that are encrypted...
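
One thing worth ruling out first: the Login() function sets no cookie options, so each later cURL request would start without the session cookie from the login, and a Wicket page would likely treat it as a new, logged-out visitor. That is only a guess at the cause, not a confirmed diagnosis. A minimal sketch of sharing one cookie file across requests; sessionCurlOptions, sessionRequest, the temporary file prefix and the placeholder credentials are all hypothetical:

```php
<?php
// Hypothetical helper: build cURL options that share one cookie file, so the
// session cookie set at login is sent again on every later request.
function sessionCurlOptions($url, $cookieFile, $postFields = null)
{
    $options = array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are saved to this file
        CURLOPT_COOKIEFILE     => $cookieFile, // and loaded from it on each request
    );
    if ($postFields !== null) {
        $options[CURLOPT_POST] = true;
        // Encode as a string so the body is application/x-www-form-urlencoded.
        $options[CURLOPT_POSTFIELDS] = http_build_query($postFields, '', '&');
    }
    return $options;
}

// Hypothetical helper: perform one request using the shared cookie file.
function sessionRequest($url, $cookieFile, $postFields = null)
{
    $ch = curl_init();
    curl_setopt_array($ch, sessionCurlOptions($url, $cookieFile, $postFields));
    $html = curl_exec($ch);
    unset($ch); // releasing the handle lets cURL write the cookies to $cookieFile
    return $html;
}

// Build (but do not send) the login request to show what would be posted.
// 'user' and 'secret' are placeholders, not real credentials.
$cookieFile = tempnam(sys_get_temp_dir(), 'onrc');
$loginUrl = 'https://portal.onrc.ro/ONRCPortalWeb/appmanager/myONRC/public?_nfpb=true&_pageLabel=login';
$options = sessionCurlOptions($loginUrl, $cookieFile, array(
    'j_username' => 'user',
    'j_password' => 'secret',
));
echo $options[CURLOPT_POSTFIELDS], "\n";
```

With this in place, the flow would be: call sessionRequest() for the login, call it again to fetch the search page, then call it a third time to POST the search fields, always passing the same $cookieFile.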
