What is a TCP Socket?
A socket is one endpoint of a two-way communication link between two programs running on the network. A socket is bound to a port number so that the TCP layer can identify the application that data is destined to be sent to.
Python has built-in support for TCP sockets through its standard socket module.
Read more about sockets in Python.
Creating a Socket
import socket
# create an INET, STREAMing socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# now connect to the web server on port 80 - the normal http port
s.connect(("domain_name", 80))
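To see both endpoints of a connection without needing a real web server, the sketch below opens a listening socket and a connecting socket on the same machine. The loopback address 127.0.0.1 and the use of port 0 (which asks the operating system for any free port) are assumptions made for this demo, not part of the original example.

```python
import socket

# Listening endpoint: bind to the loopback address and let the OS pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server_port = server.getsockname()[1]

# Connecting endpoint: the OS assigns it a local port automatically.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", server_port))
conn, client_addr = server.accept()

# Each end of the link is identified by an (address, port) pair.
server_endpoint = conn.getsockname()
client_endpoint = client.getsockname()
print("server endpoint:", server_endpoint)
print("client endpoint:", client_endpoint)

# Bytes written on one endpoint arrive on the other.
client.sendall(b"hello")
received = conn.recv(1024)
print(received)

for sock in (conn, client, server):
    sock.close()
```

The port numbers printed will differ from run to run, because both are chosen by the operating system here.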
About Hypertext Transfer Protocol (HTTP)
The Hypertext Transfer Protocol is an application protocol for distributed, collaborative, hypermedia information systems. It was invented to retrieve web pages (HTML, images, documents, etc.) and is the foundation of data communication for the WWW, where hypertext documents include hyperlinks to other resources that the user can easily access, for example by a mouse click or by tapping the screen in a web browser.
HTTP is the dominant application-layer protocol on the internet.
HTTP sets the rules that allow browsers to retrieve documents from the web!
Which HTTP header tells the browser the kind of document that is being returned?
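The header in question is Content-Type. As a rough illustration, the sketch below pulls it out of a raw HTTP response. The response bytes are hand-written sample data and parse_headers is a hypothetical helper written for this example; real responses carry more headers, and a production parser would handle more edge cases.

```python
# A hand-written sample response; real servers send more headers than this.
raw_response = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Length: 20\r\n"
    b"\r\n"
    b"<h1>Hello, web!</h1>"
)

def parse_headers(response: bytes) -> dict:
    """Return the response headers as a dict with lower-cased names."""
    # A blank line (\r\n\r\n) separates the headers from the body.
    head, _, _body = response.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:  # lines[0] is the status line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers

headers = parse_headers(raw_response)
print(headers["content-type"])
```

Header names are lower-cased here because HTTP treats them as case-insensitive.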
Getting Data from the server
Every time you click on an anchor tag with an href, the browser makes a connection to the web server, on port 80 by default for HTTP, and issues a GET request for the content of the page at the specified URL.
The browser sends this GET request over the socket it has opened to the server.
The server analyzes the request, finds the requested document, and produces a response on that same socket, returning the HTML document to the browser, which formats and displays the document to you.
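The request and response cycle described above can be tried out locally. In this sketch a background thread plays the part of the web server and the main thread plays the part of the browser. The canned response, the path /hello.txt, the loopback address and the helper name serve_once are all assumptions made for illustration; a real server would parse the request and look up the document.

```python
import socket
import threading

# Canned reply of the hypothetical one-shot server below.
RESPONSE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"Hello from the server\n"
)

def serve_once(listener, seen):
    """Accept one connection, record the request, reply on the same socket."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:  # a blank line ends the request headers
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        seen.append(request)
        conn.sendall(RESPONSE)

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
listener.listen(1)
port = listener.getsockname()[1]

seen = []
worker = threading.Thread(target=serve_once, args=(listener, seen))
worker.start()

# The "browser" side: connect, send a GET request, read until the server closes.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"GET /hello.txt HTTP/1.0\r\nHost: localhost\r\n\r\n")
    reply = b""
    while True:
        chunk = client.recv(512)
        if not chunk:
            break
        reply += chunk

worker.join()
listener.close()

request_line = seen[0].split(b"\r\n")[0].decode()
body = reply.split(b"\r\n\r\n", 1)[1].decode()
print(request_line)
print(body, end="")
```

With HTTP/1.0 the server closes the connection after the response, which is why the client can simply read until recv() returns no data.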
Python Strings to Bytes
In Python 3 all strings are Unicode! A str is a sequence of Unicode code points, not bytes in UTF-8, UTF-16 or UTF-32!
Read more here about the differences between Python 2 and 3.
UTF-8 is the best practice for encoding data moving between systems!
ASCII stands for American Standard Code for Information Interchange.
Python’s encode() and decode() methods are used to encode and decode the input string, using a given encoding.
The string encode() method returns an encoded version of the given string as a bytes object.
By default, the encode() method doesn’t require any parameters.
With no arguments it returns the UTF-8 encoded version of the string. In case of failure, it raises a UnicodeEncodeError exception.
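A short sketch of encode() in action follows. The sample string is made up for this example and is written with \u escapes so that the snippet does not depend on how the source file itself is saved.

```python
text = "caf\u00e9 \u2615"   # "cafe" with an accented e, a space, and a coffee-cup symbol

utf8_bytes = text.encode()            # no argument: UTF-8 is the default
utf16_bytes = text.encode("utf-16")   # an explicit encoding can be passed instead

print(type(utf8_bytes).__name__)      # the result is a bytes object, not a str
print(len(text), len(utf8_bytes))     # 6 characters become 9 bytes in UTF-8

# ASCII cannot represent the accented e or the symbol, so this call fails.
try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    failure = type(err).__name__
    print(failure)

# An error handler trades the exception for a lossy result.
lossy = text.encode("ascii", errors="replace")
print(lossy)
```

The character count and the byte count differ because UTF-8 uses more than one byte for characters outside the ASCII range.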
When we talk to an external resource such as a network socket we send bytes, so we need to encode Python 3 strings into a given character encoding!
encode() converts from Python's internal Unicode representation to UTF-8 bytes. We need this because a socket can only send bytes, and UTF-8 is the encoding we want to send.
When we read data from an external resource, we must decode it based on the character set (it could be UTF-8, UTF-16 or ASCII)!
decode() converts the bytes to Python's internal Unicode string format.
You can choose the character set, but by default decode() assumes UTF-8. ASCII data decodes correctly as well, because ASCII is a subset of UTF-8.
So if it’s old data, you’re probably getting ASCII; if it’s newer data, you’re probably getting UTF-8. Either way, the default handles it.
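The following sketch shows decode() on both kinds of data, and what happens when the wrong character set is chosen. The byte strings are made-up samples for illustration.

```python
ascii_bytes = b"plain old ASCII"
utf8_bytes = "na\u00efve".encode("utf-8")   # 5 characters, one of them non-ASCII

# decode() with no argument uses UTF-8; ASCII bytes are valid UTF-8 too.
print(ascii_bytes.decode())
round_trip = utf8_bytes.decode()
print(len(round_trip))                      # back to 5 characters

# Decoding non-ASCII bytes as ASCII fails outright.
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as err:
    failure = type(err).__name__
    print(failure)

# Decoding with another wrong character set may "succeed" but give garbage:
# the two bytes of the accented letter turn into two separate characters.
mojibake = utf8_bytes.decode("latin-1")
print(len(mojibake))
```

The last case is the more dangerous one in practice, since no exception signals that the text is wrong.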
Example Socket Programming Exercises
Request a file from a server and read data from it
import socket

# create an INET, STREAMing socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# now connect to the web server on port 80 - the normal http port
s.connect(("domain_name", 80))
# encode converts from unicode internally to UTF-8. We need this because sockets send bytes
# HTTP lines end with \r\n, and a blank line ends the request
cmd = 'GET http://domain_name/file_name.txt HTTP/1.0\r\n\r\n'.encode()
s.sendall(cmd)

while True:
    data = s.recv(512)
    if len(data) < 1:
        break
    # getting in UTF-8 encoded data, most likely. decode() converts it to the internal format Unicode
    print(data.decode(), end='')
s.close()
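One caveat when decoding each recv() chunk on its own: a multi-byte UTF-8 character can be split across two chunks, and decode() then raises an error on the incomplete bytes. A possible workaround, sketched below with simulated chunks instead of a live socket, is the incremental decoder from the standard codecs module. The payload and the split position are assumptions chosen to trigger the problem.

```python
import codecs

# Simulated recv() chunks: the 3-byte UTF-8 sequence for "\u20ac" (the euro sign)
# is split across two chunks, as could happen with a small buffer size.
payload = "price: 5 \u20ac\n".encode("utf-8")
chunks = [payload[:10], payload[10:]]

decoder = codecs.getincrementaldecoder("utf-8")()
parts = []
for chunk in chunks:
    # Incomplete trailing bytes are buffered until the next chunk arrives.
    parts.append(decoder.decode(chunk))
parts.append(decoder.decode(b"", final=True))  # flush; raises if bytes are left over
text = "".join(parts)

# Calling decode() on the first chunk alone fails instead.
try:
    chunks[0].decode()
    split_ok = True
except UnicodeDecodeError:
    split_ok = False
print(split_ok)
```

Another option is to collect all chunks into one bytes object and decode once at the end, at the cost of holding the whole response in memory.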
Using urllib in Python
Using urllib we get the data off of a web page and treat it like a file!
urllib.request.urlopen('web_page') opens the page and returns a file handle.
It’s much like the normal file handle you would get if you opened a file, except that each line arrives as bytes and has to be decoded.
import urllib.request, urllib.parse, urllib.error

fh = urllib.request.urlopen('web_page')
for line in fh:
    print(line.decode().strip())
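To try the same loop without network access, urlopen() also accepts file: URLs. The sketch below writes a small temporary file and reads it back through urllib. The file name page.txt and its contents are stand-ins invented for this demo; with a real page you would pass an http:// or https:// URL instead.

```python
import pathlib
import tempfile
import urllib.request

# Create a small local file to stand in for a web page (an assumption for this demo).
with tempfile.TemporaryDirectory() as tmp:
    page = pathlib.Path(tmp) / "page.txt"
    page.write_bytes("first line\nsecond line\n".encode("utf-8"))

    lines = []
    # urlopen() returns a file-like object; the with block closes it when done.
    with urllib.request.urlopen(page.as_uri()) as fh:
        for raw_line in fh:  # each line arrives as bytes
            lines.append(raw_line.decode().strip())

print(lines)
```

Using a with block is a small design choice that makes sure the handle is closed even if decoding a line raises an exception.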