I finally got the captcha issue worked out in my new All In Scraper program. I figured it out myself and did not need to use the freelancer service after all. I was passing the cookie with my page requests, but I was not sending it when I downloaded the captcha image. That was the whole problem: once I started passing the cookie on the image request too, all was well. It was a dumb mistake. I had the image URL, so I just pointed the picture box at it and never thought about needing the cookie on that request.
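The fix above comes down to using one cookie store for every request in a session, including the image request. Here is a minimal sketch in Python. The post doesn't say what language All In Scraper is written in, so this is an illustrative stand-in, not the program's actual code. The helper names `make_session` and `fetch_captcha_image` are hypothetical. The idea is that the captcha image is fetched through the same cookie-aware opener as the page, so the session cookie set by the page is sent along with the image request.

```python
import http.cookiejar
import urllib.request


def make_session():
    """Build an opener (hypothetical helper) that stores cookies from
    responses and sends them back on every later request it makes."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_captcha_image(opener, page_url, image_url):
    """Fetch the page first (the server sets its session cookie here),
    then fetch the captcha image through the SAME opener so that cookie
    is sent with the image request. Returns the raw image bytes."""
    with opener.open(page_url) as resp:
        resp.read()  # page body is not needed here; the jar captures the cookie
    with opener.open(image_url) as resp:
        return resp.read()  # bytes to load into the picture box
```

Usage would look like `opener, jar = make_session()` followed by `fetch_captcha_image(opener, page_url, image_url)`, with the returned bytes loaded into the picture box. Pointing the picture box straight at the image URL skips the shared cookie store, which is exactly the mistake described above.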
So the program is working pretty well now, and I have just a few minor issues to work out before I start thinking about releasing it. It is set up to use proxies: after you import your proxy list, you can check the response time of each proxy, and the program will use the fastest ones first and remove dead proxies from the list. That part is complete and working. I still need to add code that lets the user limit the number of requests per proxy. With allintitle requests, Google will usually issue a temporary ban after 15 or 20 requests, so I want to go down the list and make around 10 requests per proxy at roughly 2-second intervals to hopefully avoid a ban.
There are 2 types of bans that Google issues. The first is just a captcha, and we can now get through that. If you keep at it, though, they issue a ban that does not offer a captcha, and that proxy is then dead for this purpose until they lift the ban, usually a few hours or days later.
All the proxy and keyword importing is complete. I still have to set up the export options, but that is a piece of cake since I have done it several times before.
I might take a look at the DeCaptcher API to see how easy it would be to build that capability into the program. It would allow completely automated use of the program, which is what I am ultimately going for myself.
Other than that, I only have to figure out how I want to handle software registration to try to limit piracy.
I am happy with how this is coming along now. The captcha part was frustrating, and I must say it was no walk in the park: it took a lot of tracking and research to crack it.
Day 344 Year 2010 Stats
Friday, December 10, 2010