Python -- Web Scraping 01

2023-09-15


First import the urllib package, then use urlopen to open the URL you want to scrape:

import urllib.request

url = "http://www.baidu.com"
htmlobj = urllib.request.urlopen(url)  # send the request and get a response object
html = htmlobj.read()  # raw response body as bytes
html = html.decode("utf-8")  # decode the bytes into a str
print(html)

The fetched page source (truncated):

<html>
<head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><meta content="always" name="referrer"><meta name="theme-color" content="#2932e1"><link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" /><link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="百度搜索" /> <link rel="icon" sizes="any" mask href="//www.baidu.com/img/baidu.svg"><link rel="dns-prefetch" href="//s1.bdstatic.com"/><link rel="dns-prefetch" href="//t1.baidu.com"/><link rel="dns-prefetch" href="//t2.baidu.com"/><link rel="dns-prefetch" href="//t3.baidu.com"/>.................................................
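
The script above hard-codes UTF-8 and never closes the response. A slightly more defensive variant might look like the sketch below. The helper names `decode_body` and `fetch_html`, the 10-second timeout, and the fallback-to-UTF-8 behaviour are assumptions made for illustration, not part of the original post.

```python
import urllib.request


def decode_body(raw, charset=None, fallback="utf-8"):
    """Decode response bytes, preferring the charset declared by the server."""
    try:
        return raw.decode(charset or fallback)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or mismatched bytes: fall back to the default
        # encoding and replace anything that still cannot be decoded.
        return raw.decode(fallback, errors="replace")


def fetch_html(url, timeout=10):
    """Fetch a page and return its body as text (hypothetical helper)."""
    # The with-block closes the connection once the body has been read.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Charset from the Content-Type header, or None if not declared.
        charset = response.headers.get_content_charset()
    return decode_body(raw, charset)
```

Calling `print(fetch_html("http://www.baidu.com"))` should then behave like the original script, while also handling pages whose Content-Type header declares a different encoding such as GBK.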

Original link: https://hbdhgg.com/2/59442.html
