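I recently needed to export my Weibo direct messages, so I wrote a quick-and-dirty script for it. It ran for about an hour and exported nearly 20,000 messages, and the result is decent. A quick Google search turned up no one who had posted a tool like this, so I'm offering mine here for what it's worth. The technical details are on my English blog. The test environment was Ubuntu 11.11. You need to install selenium with pip, then download the Chrome WebDriver binary from the Chromium page.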
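Before launching the browser, it may help to confirm that both pieces of the setup are in place. The snippet below is a minimal sketch, assuming Python 3 and assuming the WebDriver binary is named `chromedriver` and sits somewhere on `PATH`; the helper name `check_setup` is made up for illustration.

```python
import importlib.util
import shutil


def check_setup(driver_name="chromedriver"):
    """Report whether the selenium package and the driver binary can be found."""
    return {
        # True if `import selenium` would find an installed package
        "selenium": importlib.util.find_spec("selenium") is not None,
        # True if an executable with this name is on PATH (assumed location)
        "driver": shutil.which(driver_name) is not None,
    }


if __name__ == "__main__":
    for name, found in check_setup().items():
        print("%s: %s" % (name, "found" if found else "MISSING"))
```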
```python
# -*- coding: utf-8 -*-
# A sina weibo DM export tool

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait  # available since 2.4.0
import time
import codecs

f = codecs.open('weibo.txt', encoding='utf-8', mode='w+')

# Create a new instance of the Chrome driver
driver = webdriver.Chrome()
# Set 15 sec as default timeout (maximum waiting time if something can't be found)
driver.implicitly_wait(15)

# Go to the direct message history page (for DM with one user)
driver.get("http://weibo.com/message/history?uid=xxxxxxxxxx")

# Find loginname input box
loginnameInput = driver.find_element_by_id("loginname")
loginnameInput.send_keys("me@mydomain.com")
# Find password input box
passwdInput = driver.find_element_by_id("password")
passwdInput.send_keys("mypasswd")
# Find the submit button
submitButton = driver.find_element_by_id("login_submit_btn")
# Submit
submitButton.click()

n = 1
more = 1
while more:
    # Find message boxes
    messages = driver.find_elements_by_class_name("txt")
    # Find time tag boxes
    ts = driver.find_elements_by_css_selector("em.W_textb.date")
    # Write the last timestamp on the page, if there is one
    if ts:
        f.write(ts[-1].text + "\n")
    # Write the non-empty messages in reverse page order
    for msg in reversed(messages):
        if msg.text != '':
            f.write(msg.text + "\n")
    f.flush()

    # Look for the "previous page" button; stop when it is gone
    buttons = driver.find_elements_by_class_name("W_btn_a")
    more = 0
    prev_button = None
    for button in buttons:
        # Next page or previous page
        if button.text == u'上一页':  # button label, "previous page"
            prev_button = button
            more = 1
            break
    if more:
        n += 1
        print('Page %d' % n)
        prev_button.click()
        # Give the page time to load
        time.sleep(2)

f.close()
print('All Done!')
driver.quit()
```
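The paging loop in the script mixes two decisions with the browser calls: which lines to write for a page, and whether a "previous page" button is still present. As a rough sketch, those two decisions can be pulled into plain functions that only rely on each element having a `.text` attribute, which makes them easy to try without a browser. The names `page_lines`, `find_button_by_text`, and the stand-in `Element` type are hypothetical and not part of Selenium.

```python
# -*- coding: utf-8 -*-
from collections import namedtuple

# Hypothetical stand-in for a Selenium element; only .text is used here
Element = namedtuple("Element", "text")


def page_lines(timestamps, messages):
    """Lines for one page: the last timestamp, then non-empty messages reversed."""
    lines = []
    if timestamps:
        lines.append(timestamps[-1].text)
    lines.extend(msg.text for msg in reversed(messages) if msg.text != '')
    return lines


def find_button_by_text(buttons, label):
    """Return the first button whose visible text equals label, or None."""
    for button in buttons:
        if button.text == label:
            return button
    return None


if __name__ == "__main__":
    ts = [Element("10:00"), Element("10:05")]
    msgs = [Element("first"), Element(""), Element("second")]
    print(page_lines(ts, msgs))
    print(find_button_by_text([Element(u"下一页")], u"上一页"))
```

In the real script the same functions would receive the lists returned by the `find_elements_by_...` calls, and the loop would end once `find_button_by_text` returns `None`.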
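With a few tweaks this script could probably be turned into an automated tool, but I can't be bothered to tinker with it any further, so I'll just leave it here.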
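One possible first step toward that automation, sketched under assumptions: move the hardcoded uid, output file, and page delay into command-line options, and read the login from environment variables instead of the source file. The option names and the `WEIBO_USER` / `WEIBO_PASSWORD` variable names are invented for illustration; only the URL pattern and the defaults (`weibo.txt`, 2 seconds) come from the script.

```python
import argparse
import os

# URL pattern taken from the script; the uid is appended at the end
HISTORY_URL = "http://weibo.com/message/history?uid="


def parse_args(argv=None):
    """Parse the settings that the original script hardcodes."""
    parser = argparse.ArgumentParser(description="Export Weibo DMs with one user")
    parser.add_argument("--uid", required=True, help="uid of the other user")
    parser.add_argument("--output", default="weibo.txt", help="output text file")
    parser.add_argument("--delay", type=float, default=2.0,
                        help="seconds to wait after each page turn")
    return parser.parse_args(argv)


def load_credentials(environ=os.environ):
    """Read login name and password from (hypothetical) environment variables."""
    return environ.get("WEIBO_USER"), environ.get("WEIBO_PASSWORD")


def history_url(uid):
    """Build the DM history URL for one conversation partner."""
    return HISTORY_URL + str(uid)


if __name__ == "__main__":
    args = parse_args(["--uid", "1234567890"])  # example uid, not a real account
    print(history_url(args.uid), args.output, args.delay)
```

The values returned here would replace the literals passed to `driver.get`, `codecs.open`, `send_keys`, and `time.sleep` in the script.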