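Recently my teacher gave me a small task: for every record in a txt file, download the song's lyrics and save them to a file named after the song ID. The file format looks like this: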
2566548|10003|Down And Out In Birmingham|Pirates Of The Mississippi|2502001|Pirates Of The Mississippi|http://musicdata.baidu.com/data2/pic/115488944/115488944.jpg|90984755,3013166,46207696,3899540,||3899540|欧美,英语|1
2566446|10003|I Take My Comfort In You|Pirates Of The Mississippi|2502001|Pirates Of The Mississippi|http://musicdata.baidu.com/data2/pic/115488944/115488944.jpg|50346556,3013160,46176406,3897682,||3897682|欧美,英语|1
2566376|10003|Rollin Home (Pirates)|Pirates Of The Mississippi|2502001|Pirates Of The Mississippi|http://musicdata.baidu.com/data2/pic/115488944/115488944.jpg|89640307,3133916,89640499,3734015,||3734015|欧美,英语|1
2566371|10003|Speak Of The Devil|Pirates Of The Mississippi|2502001|Pirates Of The Mississippi|http://musicdata.baidu.com/data2/pic/115488944/115488944.jpg|49740044,3013161,46175281,3896707,|http://music.baidu.com/data2/lrc/12489233/12489233.lrc|3896707|英语,欧美,关键音,电声乐器,略使用人声合唱|1
2566334|10003|Talkin Bout Love|Pirates Of The Mississippi|2502001|Pirates Of The Mississippi|http://musicdata.baidu.com/data2/pic/115488944/115488944.jpg|89637378,3013164,45176292,3898831,||3898831|欧美,英语|1
2566321|10003|Redneck Rock N Roll|Pirates Of The Mississippi|2502001|Pirates Of The Mississippi|http://musicdata.baidu.com/data2/pic/115488944/115488944.jpg|90828205,3013169,46175679,3897632,||3897632|欧美,英语|1
2566291|10003|Anything Goes|Pirates Of The Mississippi|2502001|Pirates Of The Mississippi|http://musicdata.baidu.com/data2/pic/115488944/115488944.jpg|90959150,3133919,90959245,3735356,|http://music.baidu.com/data2/lrc/12489216/12489216.lrc|3735356|欧美,英语,北美流行|1
2566278|10003|Honky Tonk Blues|Pirates Of The Mississippi|2502001|Pirates Of The Mississippi|http://musicdata.baidu.com/data2/pic/115488944/115488944.jpg|90996181,3133968,46175067,3734602,||3734602|欧美,英语|1
2566264|10003|Feed Jake|Pirates Of The Mississippi|2502001|Pirates Of The Mississippi|http://musicdata.baidu.com/data2/pic/115488944/115488944.jpg|90961516,3736359,46505620,12953128,||12953128|吉他,英语,欧美,原音,关键音|1
2566175|10003|Jolly Roger/Pirates Of The Misissippi|Pirates Of The Mississippi|2502001|Pirates Of The Mississippi|http://musicdata.baidu.com/data2/pic/115488944/115488944.jpg|90891846,3133971,90892078,3734253,||3734253|欧美,英语|1
2202028/70062199ac085441291f#6af6ba5f844fb7b92ea830c1e261d6fa|10003|Dream You (Dance Mix)|Pirates Of The Mississippi|2028188|The Best Of Pirates Of The Mis|http://c.hiphotos.baidu.com/ting/pic/item/6609c93d70cf3bc785905cf9d300baa1cd112a16.jpg|67575556,||67575556||1
2601446|10002|Alright|Pilot Speed|2505781|Into The West|http://musicdata.baidu.com/data2/pic/115530058/115530058.jpg|3146724,45388161,3717336,|http://music.baidu.com/data2/lrc/14892491/14892491.lrc|3717336|alternative pop,独立流行,search,独立摇滚,伤感|60
121415441|10002|Alright|Pilot Speed|121414726|Into The West|http://b.hiphotos.baidu.com/ting/pic/item/10dfa9ec8a1363273fb23bf7938fa0ec09fac793.jpg|122645521,122645530,122645337,||122645337||30
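At first glance the data looks well-formed and ready to process directly, but a closer look shows that in some records the "|" has turned into a "/". So the job is: first turn the slash that sometimes follows the leading ID into a vertical bar, then extract the song ID, and finally extract the song's lyrics URL.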
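The two extraction steps described above can be sketched as a small helper. This is only a minimal sketch, not the script used in this post: the name `parse_record` is hypothetical, and it assumes that the song ID is always the leading run of digits and that a lyrics URL, when present, always ends in `.lrc`.

```python
import re

# Assumption: the song ID is the leading run of digits, followed by "|" or "/".
ID_PATTERN = re.compile(r'^(\d+)[|/]')
# Assumption: the lyrics URL is an http URL without "|" in it that ends in ".lrc".
LRC_PATTERN = re.compile(r'http://[^|\s]+?\.lrc', re.I)


def parse_record(line):
    """Return (song_id, lrc_url); either value is None when it is not found."""
    id_match = ID_PATTERN.match(line)
    lrc_match = LRC_PATTERN.search(line)
    song_id = id_match.group(1) if id_match else None
    lrc_url = lrc_match.group() if lrc_match else None
    return song_id, lrc_url


# Two records copied from the sample data: the first has "/" after the ID
# and no lyrics URL, the second is a normal record with a lyrics URL.
broken = ("2202028/70062199ac085441291f#6af6ba5f844fb7b92ea830c1e261d6fa|10003|"
          "Dream You (Dance Mix)|Pirates Of The Mississippi|2028188|"
          "The Best Of Pirates Of The Mis|"
          "http://c.hiphotos.baidu.com/ting/pic/item/"
          "6609c93d70cf3bc785905cf9d300baa1cd112a16.jpg|67575556,||67575556||1")
with_lrc = ("2566291|10003|Anything Goes|Pirates Of The Mississippi|2502001|"
            "Pirates Of The Mississippi|"
            "http://musicdata.baidu.com/data2/pic/115488944/115488944.jpg|"
            "90959150,3133919,90959245,3735356,|"
            "http://music.baidu.com/data2/lrc/12489216/12489216.lrc|"
            "3735356|欧美,英语,北美流行|1")

print(parse_record(broken))
print(parse_record(with_lrc))
```

Matching the ID with `[|/]` avoids rewriting the line at all, so slashes inside URLs and song titles are left untouched.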
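After some fiddling, here is what I have working so far:
The logic is really simple, exactly as described above. The code follows; I'm still a beginner, so it only covers the basic functionality and is not optimized.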
    # -*- coding: utf-8 -*-
    import re

    # Lyrics URLs look like http://music.baidu.com/data2/lrc/12489233/12489233.lrc
    # (the dots are escaped so they only match a literal ".").
    LRC_PATTERN = re.compile(r'http://music\.baidu\.com/data2/lrc/.+?\.lrc', re.I)

    with open("E:\\song.txt") as song_file:
        for line in song_file:
            # Some records have "/" instead of "|" right after the ID, so turn
            # the first "/" into "|" and take the first field as the song ID.
            song_id = line.replace('/', '|', 1).split('|', 1)[0]
            print("Song ID: %s" % song_id)
            match = LRC_PATTERN.search(line)
            if match:
                print("Lyrics URL: %s" % match.group())
            else:
                print("LRC is null!!")
            print("<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<")
The output is in the format shown above.
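The script only prints the ID and the URL, while the task itself also calls for saving each lyrics file under its song ID. A rough sketch of that last step could look like the following. It is not part of the script above: `lrc_filename` and `save_lyrics` are hypothetical names, the output directory is an assumption, and it relies on Python 3's `urllib.request`. Whether the Baidu URLs in the data still resolve is something this sketch cannot guarantee.

```python
import os
import urllib.request


def lrc_filename(song_id, out_dir):
    """Build the target path <out_dir>/<song_id>.lrc (hypothetical layout)."""
    return os.path.join(out_dir, "%s.lrc" % song_id)


def save_lyrics(song_id, lrc_url, out_dir, timeout=10):
    """Download lrc_url and write the raw bytes to <out_dir>/<song_id>.lrc."""
    os.makedirs(out_dir, exist_ok=True)
    path = lrc_filename(song_id, out_dir)
    # Fetch the whole response first, so a failed download leaves no empty file.
    with urllib.request.urlopen(lrc_url, timeout=timeout) as response:
        data = response.read()
    # Write the bytes unchanged so the lyrics keep their original encoding.
    with open(path, "wb") as out_file:
        out_file.write(data)
    return path
```

A call would look like `save_lyrics("2566371", "http://music.baidu.com/data2/lrc/12489233/12489233.lrc", "lyrics")`; records without a lyrics URL would simply be skipped.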