Your first job is to find the tag and the data in which you are interested. Then find a suitable pattern so that you can select it using a program. In all probability, you will want information from a financial or sports site. But let us take a simple example. You love movies and would prefer to decide your evening plans after knowing the films on television. So, you can write an application to extract just the film name and the starting time from the Web page. Go to the URL of a channel's current schedule and save this page as WeeklyListing.html. The local page will help you understand the content and the fields you need: the name and the time of the show.
Use an HTML editor, like Quanta Plus or Bluefish, to examine WeeklyListing.html. Combine what you see on the page with the output of the test mode of sgmllib to identify what you need. Usually, the data you are interested in is in an HTML table, in td tags, possibly enclosed in a div tag. In this case, the div tag with id lista contains the schedule for the current day.
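To make the inspection step concrete, here is a minimal sketch that prints an indented outline of the opening tags in a page, with their id and class attributes. It is only a rough stand-in for a tag dump, not the sgmllib test mode itself: it assumes Python 3 and uses html.parser from the standard library, because sgmllib was removed in Python 3. The names TagOutline and outline are hypothetical.

```python
from html.parser import HTMLParser


class TagOutline(HTMLParser):
    """Record each opening tag, indented by its nesting depth."""

    # Void elements have no closing tag, so they must not change the depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Show only the attributes that help to locate a block: id and class.
        shown = " ".join('%s="%s"' % (name, value or "")
                         for name, value in attrs if name in ("id", "class"))
        label = tag + (" " + shown if shown else "")
        self.lines.append("  " * self.depth + "<" + label + ">")
        if tag not in self.VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in self.VOID:
            self.depth = max(0, self.depth - 1)


def outline(html):
    """Return the tag outline of an HTML string as a list of lines."""
    parser = TagOutline()
    parser.feed(html)
    parser.close()
    return parser.lines
```

Printing "\n".join(outline(text)) for the saved page should make a div such as the one with id lista easy to spot.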
You are now ready to write your code. The nice thing is that all your development can be done on the desktop and then moved to the device. You can do some testing by running the device image on the desktop using QEMU. Write the following code in film_schedule.py:
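A minimal sketch of this first step, under stated assumptions: it targets Python 3, where sgmllib no longer exists, so it uses html.parser.HTMLParser and its handle_starttag and handle_endtag methods rather than sgmllib's per-tag methods. The class name ScheduleBlockFinder and the SAMPLE markup are hypothetical; only the div id lista and the flag self.wanted come from the article.

```python
from html.parser import HTMLParser


class ScheduleBlockFinder(HTMLParser):
    """Track whether the parser is currently inside <div id="lista">."""

    def __init__(self):
        super().__init__()
        self.wanted = False      # True only while inside the desired div
        self.tags_inside = []    # opening tags seen inside that div

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, e.g. [("id", "lista")]
        if tag == "div" and ("id", "lista") in attrs:
            self.wanted = True
        elif self.wanted:
            self.tags_inside.append(tag)

    def handle_endtag(self, tag):
        # Simplification: assumes no other div is nested inside the block.
        if tag == "div" and self.wanted:
            self.wanted = False


# Hypothetical stand-in for the saved page, used only for this sketch.
SAMPLE = """<html><body>
<div id="menu"><p>Home</p></div>
<div id="lista"><table><tr><td>18:00</td><td>Some Film</td></tr></table></div>
</body></html>"""

finder = ScheduleBlockFinder()
finder.feed(SAMPLE)
finder.close()
print(finder.tags_inside)   # prints ['table', 'tr', 'td', 'td']
```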
The SGML parser initially calls the reset method. If there is a method start_tagname, the parser calls it at the start of a tag named tagname. The attributes of the tag are passed as a list of name and value pairs. You will need to look at other tags once you are inside the desired block, so use a flag, self.wanted. Set it to True once the desired div starts and reset it to False once the end of that tag is reached. While testing, you may feed the parser the saved HTML file. Later, you will fetch the actual Web page using urlopen. Now you can try this code as follows:
There is only one occurrence of the div in which you are interested. The film name and time are the data in td tags with class listcontent01. So, you will need to handle td tags, but only within the desired div. Each row can be identified by the tr tag. Further, you will need to capture the data with a method handle_data. So, your code in film_schedule.py should look like what's shown below:
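The extraction described above could be sketched as follows. This is an illustration under assumptions, not the article's listing: it uses Python 3's html.parser instead of sgmllib, the class name FilmScheduleParser and the SAMPLE markup are hypothetical, and the td class is passed as a parameter because its exact spelling on the real page is an assumption. A depth counter replaces the plain flag so that a nested div does not end the block early.

```python
from html.parser import HTMLParser


class FilmScheduleParser(HTMLParser):
    """Collect the text of matching td cells, one tuple per table row."""

    def __init__(self, div_id="lista", cell_class="listcontent01"):
        super().__init__()
        self.div_id = div_id
        self.cell_class = cell_class   # assumed class name of the cells
        self.div_depth = 0             # greater than 0 inside the wanted div
        self.in_cell = False
        self.cell_text = []
        self.row = []
        self.rows = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.div_depth:
                self.div_depth += 1            # nested div inside the block
            elif attrs.get("id") == self.div_id:
                self.div_depth = 1             # the wanted block starts
        elif not self.div_depth:
            return                             # outside the block: ignore
        elif tag == "tr":
            self.row = []
        elif tag == "td" and self.cell_class in (attrs.get("class") or "").split():
            self.in_cell = True
            self.cell_text = []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif not self.div_depth:
            return
        elif tag == "td" and self.in_cell:
            self.in_cell = False
            # Join the text fragments and collapse runs of whitespace.
            self.row.append(" ".join("".join(self.cell_text).split()))
        elif tag == "tr" and self.row:
            self.rows.append(tuple(self.row))
            self.row = []

    def handle_data(self, data):
        # Called with the text between tags; keep it only inside a wanted cell.
        if self.in_cell:
            self.cell_text.append(data)


# Hypothetical markup shaped like the structure the article describes.
SAMPLE = """
<div id="lista"><table>
<tr><td class="listcontent01">18:00</td><td class="listcontent01">Film One</td></tr>
<tr><td class="listcontent01">20:30</td><td class="listcontent01">Film <b>Two</b></td></tr>
</table></div>
<div id="other"><table><tr><td class="listcontent01">ignored</td></tr></table></div>
"""

parser = FilmScheduleParser()
parser.feed(SAMPLE)
parser.close()
for row in parser.rows:
    print(" | ".join(row))
```

Keeping the rows as tuples leaves the choice of output format, such as a plain list on a small screen, to the calling code.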
handle_data is the method that processes the data between the tags. Now, run the following code:
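As a rough illustration of running a parser against either the saved file or the live page, the sketch below loads HTML from a local path or an http(s) URL and passes it to a parser whose handle_data collects the text between tags. It assumes Python 3 (urllib.request and html.parser) and a UTF-8 page. The names load_html, TextCollector, and run are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def load_html(source, encoding="utf-8"):
    """Return the page text from a local file path or an http(s) URL."""
    if source.startswith(("http://", "https://")):
        with urlopen(source) as response:
            raw = response.read()
    else:
        with open(source, "rb") as handle:
            raw = handle.read()
    # The encoding is an assumption; undecodable bytes are replaced.
    return raw.decode(encoding, errors="replace")


class TextCollector(HTMLParser):
    """Collect every non-blank piece of text found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def run(source):
    """Parse the page at source and return the collected text chunks."""
    parser = TextCollector()
    parser.feed(load_html(source))
    parser.close()
    return parser.chunks
```

During testing, run("WeeklyListing.html") reads the saved copy; passing the channel's schedule URL instead fetches the live page with urlopen.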



