2024-07-12
Python's requests library is a powerful and easy-to-use HTTP library for sending HTTP requests and processing responses. It is one of the most popular HTTP libraries in Python and is widely used to extract data from web pages, crawl websites, and make API calls.
Using the requests library, you can easily send all the common HTTP requests, including GET, POST, PUT, DELETE, etc. You set the request headers, request body, and other parameters, send the request, and get back a response object. The library then provides convenient ways to work with that response, such as reading the body as text or bytes, decoding JSON, and inspecting headers and status codes (HTML parsing itself is usually handed off to a library such as BeautifulSoup, shown later).
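As a taste of the API, each HTTP verb has a matching top-level helper. The sketch below uses httpbin.org, a public echo/test service, as a stand-in for a real endpoint:
import requests

r1 = requests.get('https://httpbin.org/get', timeout=10)
r2 = requests.post('https://httpbin.org/post', data={'k': 'v'}, timeout=10)
r3 = requests.put('https://httpbin.org/put', data={'k': 'v'}, timeout=10)
r4 = requests.delete('https://httpbin.org/delete', timeout=10)
print(r1.status_code, r2.status_code, r3.status_code, r4.status_code)  # 200 200 200 200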
If requests is not installed in your local Python environment, you can install the module from a command prompt:
pip install requests
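To confirm the install succeeded, print the library version:
import requests
print(requests.__version__)   # e.g. 2.31.0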
To find the headers your browser sends, open any web page, press F12 to open the developer tools, press Ctrl+R to refresh, and double-click an item in the Name column of the Network tab.
In the request details you can see the User-Agent and Cookie headers.
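Those values can then be copied into a headers dictionary. In the sketch below, both strings and the URL are placeholders; use the values from your own browser:
import requests

# Placeholder values; replace with the User-Agent and Cookie copied from your browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Cookie': 'paste your cookie string here',
}
response = requests.get('https://example.com', headers=headers, timeout=10)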
The following are some common requests library functions and usage patterns:
Send a GET request:
response = requests.get(url)
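A runnable sketch, again using httpbin.org as a stand-in URL:
import requests

url = 'https://httpbin.org/get'   # example endpoint
response = requests.get(url, timeout=10)
print(response.status_code)       # 200 on success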
Send a POST request:
response = requests.post(url, data=payload)
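A minimal sketch posting form data to the httpbin.org echo service:
import requests

payload = {'username': 'alice', 'password': 'secret'}   # example form fields
response = requests.post('https://httpbin.org/post', data=payload, timeout=10)
print(response.json()['form'])    # httpbin echoes the submitted form back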
Set the request header:
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
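To confirm which headers were actually sent, httpbin.org/headers echoes them back (used here purely as a test service):
import requests

headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://httpbin.org/headers', headers=headers, timeout=10)
print(response.json()['headers']['User-Agent'])  # should print Mozilla/5.0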
Pass URL parameters:
params = {'key1': 'value1', 'key2': 'value2'}
response = requests.get(url, params=params)
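requests URL-encodes the parameters and appends them as a query string; response.url shows the final URL that was requested:
import requests

params = {'key1': 'value1', 'key2': 'value2'}
response = requests.get('https://httpbin.org/get', params=params, timeout=10)
print(response.url)  # https://httpbin.org/get?key1=value1&key2=value2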
Upload a file:
files = {'file': open('file.txt', 'rb')}
response = requests.post(url, files=files)
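In practice, open the file in a with block so the handle is closed after the upload; in this sketch file.txt is a placeholder path and httpbin.org a test endpoint:
import requests

with open('file.txt', 'rb') as f:   # placeholder file path
    files = {'file': f}
    response = requests.post('https://httpbin.org/post', files=files, timeout=30)
print(response.status_code)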
Get the response content:
print(response.text)
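Besides response.text (the body decoded to a string), the response object exposes the status code, raw bytes, and headers:
import requests

response = requests.get('https://httpbin.org/get', timeout=10)
print(response.status_code)              # numeric status code, e.g. 200
print(response.encoding)                 # encoding used to decode response.text
print(response.headers['Content-Type'])  # response headers behave like a dict
print(response.content[:80])             # raw body as bytes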
Parse a JSON response:
json_data = response.json()
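If the body is valid JSON, response.json() returns it as native Python dicts and lists; otherwise it raises an exception. A small sketch against httpbin's sample JSON endpoint:
import requests

response = requests.get('https://httpbin.org/json', timeout=10)  # returns sample JSON
data = response.json()        # parsed into dicts and lists
print(list(data.keys()))      # top-level keys of the JSON document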
Parse an HTML response:
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')
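BeautifulSoup comes from a separate package (pip install beautifulsoup4). A minimal sketch that pulls the title and links out of a page, with example.com as a placeholder URL:
import requests
from bs4 import BeautifulSoup

response = requests.get('https://example.com', timeout=10)
soup = BeautifulSoup(response.text, 'html.parser')
print(soup.title.string)        # text of the <title> tag
for a in soup.find_all('a'):    # every <a> tag in the page
    print(a.get('href'))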
Handling exceptions:
- try:
- response = requests.get(url)
- response.raise_for_status()
- except requests.HTTPError as e:
- print('HTTPError:', e)
- except requests.ConnectionError as e:
- print('ConnectionError:', e)
- except requests.Timeout as e:
- print('Timeout:', e)
- except requests.RequestException as e:
- print('RequestException:', e)
The above covers only a small part of the requests library. It also provides many advanced features and options, such as session management, authentication, and proxy settings, which make web scraping and API calls straightforward.
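A hedged sketch of those options; the credentials and proxy address below are placeholders:
import requests

sess = requests.Session()                          # reuses connections and keeps cookies
sess.auth = ('user', 'pass')                       # placeholder HTTP Basic credentials
sess.proxies = {'https': 'http://127.0.0.1:8080'}  # placeholder proxy address
sess.headers.update({'User-Agent': 'Mozilla/5.0'})
response = sess.get('https://httpbin.org/get', timeout=10)
print(response.status_code)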
A complete request routine, wrapped in a reusable function:
import requests


def get_html(url):
    '''
    Request a page inside a session.
    :param url: uniform resource locator, the address to request
    :return html: the source code of the page
    :return sess: the session that was created
    '''
    # Request headers; paste the User-Agent value copied from your browser
    headers = {'User-Agent': 'paste the copied value here'}
    # Create a Session and use its get() to request the page
    sess = requests.Session()
    response = sess.get(url=url, headers=headers)
    # Extract the text of the page
    html = response.text

    return html, sess
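A quick usage sketch (the URL is a placeholder):
html, sess = get_html('https://example.com')
print(html[:200])   # first 200 characters of the page source
sess.close()        # release the connection pool when finished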