Setting up Django with Scrapy

This guide covers using Django, the most popular Python web framework, together with Scrapy, the most popular Python scraping framework. Both frameworks are awesome, and each works very well standalone.

Before you continue reading, make sure you are already beyond the “Getting Started” stage ...


python/lxml parses XML files far faster than xml.dom.minidom

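For the first link, minidom takes ten to twenty seconds to parse. For the second, a machine with 8 GB of RAM ran out of memory, the hard disk thrashed, and the machine basically stopped responding.

==== After switching to lxml, parsing the second one takes only one second. ====

{{{#!highlight python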

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib2
import base64
import cStringIO
import zipfile
from lxml import etree

s = cStringIO.StringIO()

http://192.168.1.31:82/ydec/service/datareport/getOmOrgList.action?datatype=xml&hucite=test&starttime=2012-09-04%2014:06:21&endtime ...


Automatically extracting a page's main text with Python

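Uses the BeautifulSoup library to pull the p tags out of a page, gives each p a weighted score based on the commas it contains (“,” and ","), and removes the useless nodes. It works fine under BeautifulSoup 3 and has some problems under BeautifulSoup 4. [[attachment:readability.py]]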
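The comma-weighting idea can be illustrated with a minimal sketch. This is not the attached readability.py: it swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra packages, and the names `ParagraphCollector`, `comma_score`, `extract_main_text`, and the `min_score` threshold are all hypothetical choices made for illustration.

```python
from html.parser import HTMLParser

# Assumption: both the ASCII comma and the full-width comma (U+FF0C) count.
COMMAS = (",", "\uff0c")


class ParagraphCollector(HTMLParser):
    """Collect the text content of every <p> element in a document."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buf = None  # None while the parser is outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # HTML allows an unclosed <p>; a new <p> ends the previous one.
            self._flush()
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def _flush(self):
        if self._buf is not None:
            text = "".join(self._buf).strip()
            if text:
                self.paragraphs.append(text)
        self._buf = None

    def close(self):
        super().close()
        self._flush()


def comma_score(text):
    """Score a paragraph by how many commas it contains."""
    return sum(text.count(c) for c in COMMAS)


def extract_main_text(html, min_score=1):
    """Keep only the paragraphs whose comma score reaches min_score."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return [p for p in parser.paragraphs if comma_score(p) >= min_score]


sample_html = (
    "<div><p>Home | About | Contact</p>"
    "<p>Scrapy crawls pages, extracts data, and stores it.</p>"
    "<p>Copyright 2012</p></div>"
)
result = extract_main_text(sample_html)
```

The reasoning behind the heuristic is that running prose tends to contain commas, while navigation bars and footers rarely do, so paragraphs scoring below the threshold are treated as useless nodes and dropped.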
