
Scraping Douban Movie Top 250 Data with Python

Views: 58    Date: 2022-06-18 16:25:24

Before running the program, create a database named 'pachong' in MySQL.
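The database can also be created from Python instead of the MySQL client. The following is a minimal sketch, not part of the original program: the helper name `create_database` and the choice of the `utf8mb4` character set are assumptions made for illustration. It accepts any DB-API style connection, such as one returned by `pymysql.connect`.

```python
def create_database(conn, name="pachong"):
    """Create the database if it does not exist yet (hypothetical helper)."""
    # Identifiers cannot be sent as query parameters, so validate the name
    # before placing it into the statement.
    if not name.isidentifier():
        raise ValueError(f"unsafe database name: {name!r}")
    sql = f"CREATE DATABASE IF NOT EXISTS `{name}` DEFAULT CHARACTER SET utf8mb4"
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
    finally:
        cursor.close()
    return sql


# Example usage (needs a running MySQL server and the pymysql package):
# import pymysql
# conn = pymysql.connect(host='127.0.0.1', port=3306,
#                        user='root', password='******')
# create_database(conn)
# conn.close()
```

Using `IF NOT EXISTS` keeps the step safe to repeat, so the script can be run more than once without failing at this point.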

import re

import pymysql
import requests


# Fetch each page and store the movie names and poster URLs in MySQL
def resp(listURL):
    # Connect to the database
    conn = pymysql.connect(
        host='127.0.0.1',
        port=3306,
        user='root',
        password='******',  # replace with your own database password
        database='pachong',
        charset='utf8',
    )
    # Create a database cursor
    cursor = conn.cursor()
    # Douban tends to reject requests that lack a browser-like User-Agent
    headers = {'User-Agent': 'Mozilla/5.0'}
    # Regular expressions for the movie name and the poster image URL
    namePat = re.compile(r'alt="(.*?)" src=')
    imgPat = re.compile(r'src="(.*?)" class=')
    try:
        # Create table t_movieTOP250; the auto-increment id doubles as the rank
        cursor.execute(
            'create table if not exists t_movieTOP250('
            'id INT PRIMARY KEY auto_increment NOT NULL, '
            'movieName VARCHAR(20) NOT NULL, '
            'pictrue_address VARCHAR(100))'
        )
        # Scrape the data
        for urlPath in listURL:
            # Get the page source
            response = requests.get(urlPath, headers=headers)
            html = response.text
            # Match movie names and poster image URLs
            textList2 = namePat.findall(html)
            textList3 = imgPat.findall(html)
            # Store each (name, poster URL) pair with a parameterized query
            for name, img in zip(textList2, textList3):
                cursor.execute(
                    'insert into t_movieTOP250(movieName, pictrue_address) '
                    'VALUES (%s, %s)',
                    (name, img),
                )
        # Commit the result
        conn.commit()
        print('Results committed')
    except Exception as e:
        # Roll back on error
        conn.rollback()
        print('Rolled back:', e)
    finally:
        # Close the database connection
        conn.close()


# Build the URLs of all ten Top 250 pages
def page(url):
    urlList = []
    for i in range(10):
        num = str(25 * i)
        pagePat = r'?start=' + num + '&filter='
        urlList.append(url + pagePat)
    return urlList


if __name__ == '__main__':
    url = r'https://movie.douban.com/top250'
    listURL = page(url)
    resp(listURL)
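The extraction and pagination steps can be tried offline before touching the network or the database. The sketch below assumes the patterns are meant to match double-quoted `alt` and `src` attributes on the poster `<img>` tag. `SAMPLE_HTML` is a made-up fragment shaped like a list entry, not markup copied from the live site, and the names `extract` and `page_urls` are hypothetical.

```python
import re

# Hypothetical fragment shaped like two Top 250 entries (not real site data)
SAMPLE_HTML = '''
<div class="pic">
    <em class="">1</em>
    <a href="https://movie.douban.com/subject/0000000/">
        <img width="100" alt="Example Movie" src="https://example.com/poster1.jpg" class="">
    </a>
</div>
<div class="pic">
    <em class="">2</em>
    <a href="https://movie.douban.com/subject/0000001/">
        <img width="100" alt="Another Movie" src="https://example.com/poster2.jpg" class="">
    </a>
</div>
'''

# Non-greedy groups capture the attribute values only
NAME_PAT = re.compile(r'alt="(.*?)" src=')
IMG_PAT = re.compile(r'src="(.*?)" class=')


def extract(html):
    """Return (movie name, poster URL) pairs in page order."""
    names = NAME_PAT.findall(html)
    images = IMG_PAT.findall(html)
    return list(zip(names, images))


def page_urls(base, pages=10, per_page=25):
    """Build one URL per result page: start=0, 25, 50, ..."""
    return [f"{base}?start={per_page * i}&filter=" for i in range(pages)]


if __name__ == "__main__":
    for rank, (name, img) in enumerate(extract(SAMPLE_HTML), start=1):
        print(rank, name, img)
    print(page_urls("https://movie.douban.com/top250")[:2])
```

Pairing the two result lists with `zip` stops at the shorter list, which avoids an index error if one pattern matches fewer times than the other on a page.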

The result is shown in the figure below:
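One way to check the stored result without a screenshot is to read the rows back. The sketch below swaps in Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL, so it runs without a server. That substitution is an assumption for illustration only: with pymysql the placeholder is `%s` rather than `?`, and the column definition uses `auto_increment`. The function name `store_and_read` is hypothetical.

```python
import sqlite3


def store_and_read(rows):
    """Insert (name, poster URL) pairs, then return all rows ordered by id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE t_movieTOP250("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "movieName VARCHAR(20) NOT NULL, "
            "pictrue_address VARCHAR(100))"
        )
        # Parameterized inserts keep quotes in movie names from breaking the SQL
        conn.executemany(
            "INSERT INTO t_movieTOP250(movieName, pictrue_address) VALUES (?, ?)",
            rows,
        )
        conn.commit()
        return conn.execute(
            "SELECT id, movieName, pictrue_address "
            "FROM t_movieTOP250 ORDER BY id"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        ("Example Movie", "https://example.com/poster1.jpg"),
        ("It's a Movie", "https://example.com/poster2.jpg"),
    ]
    for row in store_and_read(sample):
        print(row)
```

Because the id column is generated in insertion order, it can serve as the rank as long as the pages are scraped from first to last.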

That is all I have to share. If you spot any shortcomings, please point them out so we can exchange ideas. Thanks!


