While working on my data mining project again the other day, I was tasked with downloading some information from a web-based database so that I could extract some data from it.
I was using curl and netcat to grab the information from the website, but I didn't have a working login script, because the database was secured by an SSO system.
I didn't want to spend time figuring out how this SSO system worked just yet. Since I had a direct login to the system, I knew I could use Firefox LiveHeaders to grab my SSO cookie and then have curl send it as an additional header to the web-based service, basically pretending that curl had already logged in even though it had not.
Here's where the -H option comes in handy, to send an extra header when getting a web page:
-H "Cookie: SSO=1%7Cjohn%7C%2F7C20081016165339%7Cpassword%7CLDAP%7is _secretC10000007C%7Cdsa%7CE%2BR"
Bingo. I was in.
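For reference, the whole command might look something like the sketch below. The URL and the cookie value are placeholders I made up for illustration, not the real ones, and the script only prints the command it would run, so nothing gets sent by accident.

```shell
#!/bin/sh
# Placeholders (assumptions): paste the cookie captured in the browser
# and point url at the real service.
sso_cookie='SSO=PASTE-VALUE-FROM-BROWSER'
url='https://db.example.com/export'

# -H adds one extra request header. The whole "Name: value" pair must
# reach curl as a single quoted argument.
cookie_header="Cookie: ${sso_cookie}"

# Dry run: print the command instead of sending the request.
printf 'curl --silent -H "%s" "%s"\n' "$cookie_header" "$url"

# With real values in place, the actual request would be:
#   curl --silent -H "$cookie_header" "$url"
```

curl also has a dedicated -b (--cookie) option that should produce the same Cookie header, as in curl -b "$sso_cookie" "$url". I used -H here because I was already copying raw headers out of the browser.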
This particular web database also accepts only Firefox and Internet Explorer as valid web clients, so I needed to tell the remote Web server that I was a different browser.
Here I used the -H option again and told the Web server that curl was actually Firefox, using the following syntax:
-H "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3"
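As a side note, curl has a dedicated -A (--user-agent) option that sets the same header. The sketch below prints both spellings next to each other. The URL is a made-up placeholder and the user agent is a Firefox 3.0.3 string; it is a dry run that only prints the commands.

```shell
#!/bin/sh
# Placeholder URL (assumption); swap in the real service.
url='https://db.example.com/export'
# A Firefox 3.0.3 user agent string.
ua='Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3'

# Two spellings of the same request: a raw header via -H,
# or the dedicated -A option.
with_h="curl -H \"User-Agent: ${ua}\" \"${url}\""
with_a="curl -A \"${ua}\" \"${url}\""

# Dry run: print both commands rather than sending them.
printf '%s\n%s\n' "$with_h" "$with_a"
```

Either form should be fine. -A is just shorter to type when the user agent is the only header being overridden.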
These options are great, but when I tried to get the data, my whole terminal filled up with garbled output.
After thinking about this for a minute, I realized that the server was doing exactly what I had told it to do: send me compressed data.
I had sent this HTTP header along with the request:
Accept-Encoding: gzip,deflate
Oops.
The Web server was expecting me to decompress the data myself. Curl has a --compressed option which does two things: 1) it requests a compressed response and 2) it returns the uncompressed document. Wow. Perfect. Curl had done the right thing again!
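To see why the terminal got garbled, the effect can be reproduced locally without touching the network. In the sketch below, gzip stands in for the server and the sample body text is made up. It compresses a short body, peeks at the first bytes, and then decompresses it again, which is the step --compressed takes care of.

```shell
#!/bin/sh
# Local stand-in for the server: gzip a short body the way a server
# honoring "Accept-Encoding: gzip" would before sending it.
body_file=$(mktemp)
printf 'row1,row2,row3\n' | gzip -c > "$body_file"

# A gzip stream starts with the bytes 1f 8b. Binary data like this,
# printed raw, is what makes a mess of the terminal.
magic=$(od -An -tx1 -N2 "$body_file")
echo "first bytes:$magic"

# Decompressing restores the original text.
decoded=$(gzip -dc "$body_file")
echo "decoded: $decoded"

rm -f "$body_file"
```

Against the real service, the simplest fix is to drop the hand-written Accept-Encoding header and pass --compressed instead, so that curl both asks for compression and undoes it. If the header is sent by hand anyway, piping the output through gzip -dc should also work, but only when the server actually answers with gzip rather than deflate.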
After that I collected my data and life was good.


