How to proxy urllib2 and more in Python

Snow in April
mox1
Friday, Feb 7th 2014

The ability to proxy a web connection has been around since the early 1990s. Most modern browsers let you add a proxy with a few clicks of the mouse, and lots of Linux command line utilities (including "curl") can proxy a connection too. Unfortunately, if you would like to proxy a Python script (for example, using Python to fetch a webpage over Tor), you're going to run into some trouble. Luckily, I am here to help.

The other day I was looking to combine Python's urllib2 module with a SOCKS proxy (I wanted to retrieve a webpage through my PIA account). If you google various forms of "python socks proxy connection" you will get lots of results. After a bit of reading you will quickly realize it is a bit of a mess. Some will recommend pycurl (which won't work with urllib2), while others will point you to a Google Code repository (don't use this, it's outdated). You may even end up finding a Python module called "requesocks", which is a modified version of requests. I tried all of these with urllib2 (and requests)... and they all failed in some way, shape or form.

First, I tried SocksiPy. There are at least three versions of this module floating around (SourceForge, Google Code and pip). It was originally written for Python 2.3, and it and its various forks are still available. All are outdated. The instructions recommend downloading the module (a file called socks.py) and placing it in your project's source folder; a few lines of Python later, urllib2 should be proxified. After setting all of this up I had a working solution... that mysteriously failed every third to fifth try, producing a very unhelpful debug message with something about "error, try again".

So my hunt continued. I stumbled upon requesocks, a Python module (available via pip) that adds SOCKS proxy support to the popular requests module. After modifying my code to use requests I started it up... and boom: "connection not allowed by ruleset". I'm still not sure why this happened, but I think it had something to do with the fact that my SOCKS proxy server required authentication. I should have known this was going to be a problem when the requesocks documentation made no mention of how to pass a username and password to the proxy.

Feeling slightly defeated, I took a step back and did a lot more googling. SOCKS support appears to be a commonly requested feature. The requests people added proxy support but NOT SOCKS proxy support. Eventually I stumbled on PySocks, which is "A SocksiPy fork, used in urllib3. Contains many improvements to the original". To download and start using PySocks, simply:

 pip install PySocks

Once that is done, add the Python code below:


import socket
import socks  # provided by the PySocks package
import urllib2

PORT = 1080

# No authentication:
# socks.set_default_proxy(socks.SOCKS5, "localhost")
# With authentication (rdns=True lets the proxy resolve hostnames):
socks.set_default_proxy(socks.SOCKS5, "my.proxyserver.com", PORT,
                        rdns=True, username="proxy_user", password="proxy_pass")

# Assign the class itself; do not call it (no parentheses).
socket.socket = socks.socksocket

urllib2.urlopen("http://...")  # All new connections now pass through the SOCKS proxy
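
One thing to keep in mind about the snippet above: assigning to socket.socket is a process-wide change, so every library that opens a connection afterwards goes through the proxy too. If you only want that for a few calls, one option is to swap the class in temporarily and put the original back afterwards. The following is a minimal sketch of that idea, not part of PySocks; the helper name patched_socket is hypothetical, and it uses only the standard library.

```python
import contextlib
import socket


@contextlib.contextmanager
def patched_socket(socket_class):
    """Temporarily replace socket.socket with socket_class (hypothetical helper)."""
    original = socket.socket
    socket.socket = socket_class
    try:
        yield
    finally:
        # Always restore the original class, even if the body raised.
        socket.socket = original


# Intended use with PySocks, after socks.set_default_proxy(...) has been called:
#
#     with patched_socket(socks.socksocket):
#         urllib2.urlopen("http://...")   # goes through the proxy
#     urllib2.urlopen("http://...")       # direct connection again
```

The swap is still global while the with block is running, so other threads that open connections during that window would be proxied as well. Code that ran "from socket import socket" before the swap keeps its own reference and is not affected.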

That's pretty much it! Simple, right? I recommend you make a urllib2.urlopen() call to one of the various webpages that display your IP to ensure it is working correctly. Hopefully this post will save you a few hours of googling and frustration.
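
For the IP check, a rough sketch could look like the one below. It assumes you have picked a page that returns nothing but your IP address as plain text; the function names fetch_ip and proxy_in_use are my own, and ECHO_URL in the usage comments is a placeholder, not a real service. The import falls back to urllib.request so the same code also runs on Python 3.

```python
try:
    from urllib2 import urlopen  # Python 2, as used in this post
except ImportError:
    from urllib.request import urlopen  # Python 3 equivalent


def fetch_ip(url, opener=urlopen):
    """Return the body of an IP-echo page as stripped text."""
    response = opener(url)
    try:
        return response.read().decode("ascii", "replace").strip()
    finally:
        response.close()


def proxy_in_use(direct_ip, proxied_ip):
    """True when the proxied request reported a different, non-empty address."""
    return bool(proxied_ip) and direct_ip != proxied_ip


# Intended use (ECHO_URL is a placeholder for the page you chose):
#
#     direct_ip = fetch_ip(ECHO_URL)        # before replacing socket.socket
#     socket.socket = socks.socksocket
#     proxied_ip = fetch_ip(ECHO_URL)       # after replacing socket.socket
#     print(proxy_in_use(direct_ip, proxied_ip))
```

If the two addresses match, the request most likely did not go through the proxy, and it is worth checking that set_default_proxy was called before the socket class was replaced.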