Scraping China Sports Lottery Data
Published: 2019-06-24


Super Lotto (大乐透)

Scraper 1

# Scrape the Super Lotto (大乐透) draw history
# http://www.lottery.gov.cn/api/lottery_kj_detail_new.jspx?_ltype=4&_term=19026
import csv
import re

import requests

agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
headers = {
    'User-Agent': agent,
}
proxy = {
    "http": "125.39.9.34:9000",
}
url = 'http://www.lottery.gov.cn/api/lottery_kj_detail_new.jspx'

start = int(input('Enter the first draw number: '))  # e.g. 18134
end = int(input('Enter the last draw number: '))  # e.g. 19029
lottery_li = []

for qihao in range(start, end + 1):
    data = {
        '_ltype': '4',
        '_term': qihao,
    }
    page_text = requests.post(url=url, headers=headers, data=data, proxies=proxy).text
    print(page_text)
    if page_text:
        # Parse the response: the winning numbers sit in the "codeNumber" array.
        # A raw string avoids invalid escape sequences in the pattern.
        lottery_data = re.findall(r'codeNumber":\[(.*?)\],"', page_text, re.M)
        if lottery_data:
            num_data = lottery_data[0].replace('"', '')
            # print(num_data)  # 10,12,15,17,19,02,03
            lottery_list = num_data.split(',')
            # print(lottery_list)  # ['10', '12', '15', '17', '19', '02', '03']
            lottery_list.insert(0, qihao)  # put the draw number in the first column
            lottery_li.append(lottery_list)

with open('lottery_data.csv', 'w', newline='') as csvf:
    spanwriter = csv.writer(csvf, dialect='excel')  # create the writer object
    # Write the header row, then one row per draw
    spanwriter.writerow(['qihao', 'red1', 'red2', 'red3', 'red4', 'red5', 'blue1', 'blue2'])
    spanwriter.writerows(lottery_li)
print('done.....................')
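The parsing step in Scraper 1 can be tried offline before sending any requests. The following is a minimal sketch, assuming the response contains a fragment shaped like the one the regular expression expects. `SAMPLE` is a made-up string and `parse_code_numbers` is a hypothetical helper name; neither comes from the original script, and the real payload may differ.

```python
import re

# Hypothetical response fragment (an assumption, not captured from the real API).
SAMPLE = '{"codeNumber":["10","12","15","17","19","02","03"],"other":"x"}'

# Same pattern as in the script, written as a raw string.
# It only matches when another field follows the array (the trailing ,").
CODE_NUMBER_RE = re.compile(r'codeNumber":\[(.*?)\],"')


def parse_code_numbers(page_text):
    """Return the drawn numbers as a list of strings, or [] if none are found."""
    match = CODE_NUMBER_RE.search(page_text)
    if not match:
        return []
    # Strip the quotes around each number, then split on commas.
    return match.group(1).replace('"', '').split(',')


print(parse_code_numbers(SAMPLE))
```

Returning an empty list for a missing match mirrors the `if lottery_data:` guard in the script, so draws without data are skipped rather than raising an error.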

Scraper 2 (all draw records)

# Scrape the Super Lotto (大乐透) draw history
# http://www.lottery.gov.cn/historykj/history.jspx?_ltype=dlt
import csv
import random
import time

import requests
from lxml import etree

agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
headers = {
    'User-Agent': agent,
}
proxies = [
    {"http": "125.39.9.34:9000"},
    {"http": "222.139.125.232:8060"},
]
proxy = random.choice(proxies)  # one proxy is picked for the whole run
params = {
    '_ltype': 'dlt',
}

page = int(input("end page no:"))
lottery_data = []

for page_no in range(1, page + 1):
    url = 'http://www.lottery.gov.cn/historykj/history_%s.jspx' % page_no
    page_text = requests.get(url=url, params=params, headers=headers, proxies=proxy).text
    time.sleep(1)  # pause between requests
    # print(page_text)
    tree = etree.HTML(page_text)
    # All rows on the page (20 per page)
    tr_list = tree.xpath('//div[@class="result"]/table/tbody/tr')

    for tr in tr_list:
        # td[1] is the draw number, td[2]-td[6] the red balls, td[7]-td[8] the blue balls
        lottery_one = []
        for col in range(1, 9):
            lottery_one += tr.xpath('./td[%s]//text()' % col)
        lottery_data.append(lottery_one)

# Write to CSV
with open('all_lottery.csv', 'w', newline='') as csvf:
    spanwriter = csv.writer(csvf, dialect='excel')  # create the writer object
    # Write the header row, then one row per draw
    spanwriter.writerow(['qihao', 'red1', 'red2', 'red3', 'red4', 'red5', 'blue1', 'blue2'])
    spanwriter.writerows(lottery_data)
print('done..................................')
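Scraper 2 calls `random.choice` once, so every page goes through the same proxy. One possible variation, sketched here as an assumption rather than as the author's method, is to choose a proxy per page. `page_url` and `request_plan` are hypothetical helper names; the sketch only builds the list of (URL, proxy) pairs and sends no requests.

```python
import random

# Proxy list copied from the script; these addresses may no longer be reachable.
PROXIES = [
    {"http": "125.39.9.34:9000"},
    {"http": "222.139.125.232:8060"},
]


def page_url(page_no):
    """Build the history page URL for a given page number."""
    return 'http://www.lottery.gov.cn/historykj/history_%s.jspx' % page_no


def request_plan(last_page, rng=random):
    """Return (url, proxy) pairs for pages 1..last_page, one proxy choice per page."""
    return [(page_url(n), rng.choice(PROXIES)) for n in range(1, last_page + 1)]


for url, proxy in request_plan(3):
    print(url, proxy)
```

Each pair could then be passed to `requests.get(url=url, proxies=proxy, ...)` inside the page loop in place of the single `proxy` variable.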

 

Reposted from: https://www.cnblogs.com/fmgao-technology/p/10552202.html
