If you need to extract information from the Web, the first thing your application needs to do is access a Web page. After a little research, you will find that urllib2 is the appropriate module, and the method you need is urlopen. After establishing the connection to a website, you will want to read the page. So, start Python and try the following code:

[Image: Extract Information from the Web.jpg]
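
The fetch-and-read step described above can be sketched roughly as follows. This is a minimal sketch, not necessarily the listing the article used: it assumes Python 3, where the urlopen function from urllib2 lives in urllib.request (on Python 2 the call would be urllib2.urlopen). The function name fetch_page is a hypothetical helper introduced only for illustration.

```python
# Python 3 sketch; on Python 2 the equivalent call is urllib2.urlopen(url).
from urllib.request import urlopen


def fetch_page(url):
    """Open the URL and return the raw page source as bytes.

    fetch_page is an illustrative name, not part of any library.
    """
    response = urlopen(url)
    try:
        # read() returns the whole body, the same text a browser
        # would show under "view page source".
        return response.read()
    finally:
        response.close()
```

Calling fetch_page with the address of any page you can reach returns its source as a byte string, which you can print or save to a file for the parsing step.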

The above code is equivalent to looking at the page source after going to that page in a browser. You need to parse the page source so that you can extract only what you need. The obvious option, htmllib, uses sgmllib. If you are not really interested in formatting the page or following links, then sgmllib is the easier option. It also has a test feature. So save a page that you are interested in, for example, one of the simplest pages on the Web. You can get started with understanding the structure and content of the page by trying the following:

On an x86-64 system, replace lib with lib64, and adjust the Python home directory if your version is not Python 2.5. You will see a lot of output!
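
To get a feel for that kind of tag-by-tag output without the sgmllib test feature, a small parser can record each event itself. The sketch below swaps in a different module: it assumes Python 3, where sgmllib is no longer available and html.parser.HTMLParser plays a similar role. The names StructureDumper and dump_structure are hypothetical, chosen only for this example.

```python
# Python 3 sketch: html.parser stands in for sgmllib, which Python 3 dropped.
from html.parser import HTMLParser


class StructureDumper(HTMLParser):
    """Record every start tag, end tag and text run in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only runs between tags
            self.events.append(("data", text))


def dump_structure(html_text):
    """Parse html_text (a str) and return the list of recorded events."""
    parser = StructureDumper()
    parser.feed(html_text)
    parser.close()
    return parser.events
```

Note that dump_structure expects text, so bytes read from a page would need decoding first, for example with .decode("utf-8", "replace"). Printing the returned events one per line shows where the content you want sits among the tags.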