Thread: Extracting tag and data

  1. #1
    brendin44, Senior Member

    Extracting tag and data

    Your first job is to find the tag and the data in which you are interested. Then find a suitable pattern so that you can select it using a program. You are most likely to want information from a financial or sports site, but let us take a simple example. You love movies and would prefer to decide your evening plans after knowing which films are on television. So, you can write an application to extract just the film name and the starting time from the Web page. Go to the URL of a channel's current schedule, for example, and save this page as WeeklyListing.html. The local page will help you understand the content and the fields you need: the name and the time of the show.

    [Attached image: Extracting tag and data.jpg]

    Use an HTML editor, like Quanta Plus or Bluefish, to examine WeeklyListing.html. Combine what you see on the page with the output of the test mode of sgmllib to identify what you need. Usually, the data you are interested in sits in the td tags of an HTML table, possibly enclosed in a div tag. In this case, the div tag with id lista contains the schedule for the current day.
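
    As a rough aid for this step, the sketch below lists every start tag together with its attributes, which is approximately the kind of overview a parser's test output gives you. Treat it as a stand-in under stated assumptions: sgmllib exists only in Python 2, so the sketch uses html.parser.HTMLParser from the Python 3 standard library instead. The SAMPLE markup, the TagLister name, and the spelling of the td class are invented or guessed for illustration and are not taken from the real page.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the saved WeeklyListing.html; the real page differs.
SAMPLE = """
<div id="header"><h1>Channel</h1></div>
<div id="lista">
  <table>
    <tr><td class="listcontent01">20:30</td><td class="listcontent01">Example Film</td></tr>
  </table>
</div>
"""


class TagLister(HTMLParser):
    """Collect (tag, attributes) for every start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.tags.append((tag, dict(attrs)))


lister = TagLister()
lister.feed(SAMPLE)
lister.close()

# Print one line per start tag so the interesting div and td tags stand out.
for tag, attrs in lister.tags:
    print(tag, attrs)
```

    Scanning the printed list for an id or class that appears only around the schedule is usually enough to pick the pattern to match on.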

    You are now ready to write your code. The nice thing is that all your development can be done on the desktop and then moved to the device. You can do some testing by running the device image on the desktop using Qemu. Write the following code in film_schedule.py:
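
    The listing itself is not part of this post, so what follows is only a sketch of what a first version of film_schedule.py might look like, not the author's code. It assumes Python 3 and therefore uses html.parser.HTMLParser in place of sgmllib (which is Python 2 only); with HTMLParser the per-tag start_tagname methods become a single handle_starttag. The class name FilmScheduleParser, the seen list, and the SAMPLE markup are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for WeeklyListing.html.
SAMPLE = """
<div id="menu"><p>Navigation</p></div>
<div id="lista"><table><tr><td>20:30</td><td>Example Film</td></tr></table></div>
"""


class FilmScheduleParser(HTMLParser):
    """First pass: only track whether we are inside the div with id 'lista'."""

    def reset(self):
        # HTMLParser.__init__ calls reset(), so per-parse state can live here.
        super().reset()
        self.wanted = False  # True while inside the desired div
        self.seen = []       # tags met inside that div, to check the flag works

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("id") == "lista":
            self.wanted = True
        elif self.wanted:
            self.seen.append(tag)

    def handle_endtag(self, tag):
        # A plain flag assumes the wanted div contains no nested div.
        if tag == "div":
            self.wanted = False


parser = FilmScheduleParser()
parser.feed(SAMPLE)
parser.close()
print(parser.seen)
```

    If the real page nests further div tags inside the schedule block, a depth counter would be a safer choice than a boolean flag.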

    The SGML parser initially calls the reset method. If there is a method start_tagname, the parser calls it at the start of a tag named tagname. The attributes of the tag are passed as a list of name and value pairs. You will need to look at other tags once you are inside the desired block, so use a flag, self.wanted. Set it to true once the desired div starts and reset it to false once the end of that tag is reached. While testing, you may feed the parser the saved HTML file. Later, you will fetch the actual Web page using urlopen. Now you can try this code as follows:

    There is only one occurrence of the div in which you are interested. The film name and time are the data in td tags with class listcontent01. So, you will need to handle td tags, but only within the desired div. Each row can be identified by the tr tag. Further, you will need to capture the data with a method handle_data. So, your code in film_schedule.py should look like what's shown below:
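
    The listing referred to above is not included in the post. The following is a minimal sketch of the approach just described, under a few assumptions: it targets Python 3 and html.parser.HTMLParser rather than sgmllib, the class value is read as listcontent01, each row is assumed to hold the time followed by the film name, and the SAMPLE markup and attribute names such as rows and in_cell are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical markup; the real schedule page will differ in detail.
SAMPLE = """
<div id="menu"><table><tr><td class="listcontent01">Ignore me</td></tr></table></div>
<div id="lista"><table>
<tr><td class="listcontent01">18:00</td><td class="listcontent01">First Film</td></tr>
<tr><td class="listcontent01">20:30</td><td class="listcontent01">Second Film</td></tr>
</table></div>
"""


class FilmScheduleParser(HTMLParser):
    """Collect the text of td.listcontent01 cells, row by row, inside div#lista."""

    def reset(self):
        super().reset()
        self.wanted = False   # inside the desired div
        self.in_cell = False  # inside a td we care about
        self.row = []         # cells of the row being read
        self.rows = []        # finished rows, one tuple per tr

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("id") == "lista":
            self.wanted = True
        elif self.wanted and tag == "tr":
            self.row = []
        elif self.wanted and tag == "td" and attrs.get("class") == "listcontent01":
            self.in_cell = True
            self.row.append("")

    def handle_endtag(self, tag):
        if tag == "div":
            self.wanted = False
        elif tag == "td":
            self.in_cell = False
        elif tag == "tr" and self.wanted and self.row:
            self.rows.append(tuple(cell.strip() for cell in self.row))
            self.row = []

    def handle_data(self, data):
        # Text may arrive in several pieces, so append to the current cell.
        if self.in_cell:
            self.row[-1] += data


parser = FilmScheduleParser()
parser.feed(SAMPLE)
parser.close()
for row in parser.rows:
    print(" | ".join(row))
```

    Cells outside the lista div are skipped because in_cell is only ever set while wanted is true.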

    handle_data is the method that we will use to process the data between the tags. Now, run the following code:
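
    The code to run is not shown in the post. One possible way to drive a parser, sketched here for Python 3, is a small helper that reads either the saved file or a live URL and feeds the text to whatever parser object it is given. The helper names read_html and run_parser are hypothetical, and the UTF-8 fallback encoding is an assumption about the page.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def read_html(source: str) -> str:
    """Return the HTML text from a local file path or an http(s) URL."""
    if source.startswith(("http://", "https://")):
        with urlopen(source) as response:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    with open(source, encoding="utf-8", errors="replace") as handle:
        return handle.read()


def run_parser(parser: HTMLParser, source: str) -> HTMLParser:
    """Feed the whole document to the parser and return it for inspection."""
    parser.feed(read_html(source))
    parser.close()
    return parser
```

    While testing on the desktop, the call would look something like run_parser(your_parser, "WeeklyListing.html"); swapping the file name for the channel's schedule URL switches to the live page without touching the parser.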
    Last edited by brendin44; 12-19-2008 at 07:16 AM.
