2024-07-11
Get web page content:
HTTP request
Python's Requests library
Parsing web page content
HTML page structure
Beautiful Soup library for Python
Store or analyze data
Store in database
Use the data for AI analysis
Convert to chart display
A crawler that sends a large number of high-frequency requests to a server consumes a large share of its resources and slows down or blocks requests from other users.
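One common way to avoid this is to enforce a minimum delay between consecutive requests. The following is a minimal sketch, not a definitive implementation: the `Throttle` class, its `wait` method, and the `clock` and `sleep` parameters are hypothetical names introduced here for illustration.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        # clock and sleep are injectable so the class can be tested without waiting.
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first one

    def wait(self):
        """Sleep if the previous request was too recent; return the seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay
```

Calling `wait()` immediately before each request, for example with `Throttle(1.0)`, would keep the crawler to at most about one request per second.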
Check the website's robots.txt file to see which web page paths may be crawled.
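The robots.txt check can be sketched with Python's standard `urllib.robotparser` module. The rules in `ROBOTS_TXT`, the `my-crawler` user agent, and the `www.example.com` URLs are assumptions made up for this example; a real crawler would download the file from the target site instead of using an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def build_parser(robots_text):
    """Return a RobotFileParser loaded from robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask whether a given user agent may fetch a given URL.
allowed_article = parser.can_fetch("my-crawler", "https://www.example.com/articles/1")
allowed_private = parser.can_fetch("my-crawler", "https://www.example.com/private/data")
print(allowed_article, allowed_private)
```

To read a live file instead, `RobotFileParser` also provides `set_url()` followed by `read()`.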
HTTP is a request-response protocol between a client and a server.
Commonly used request methods:
GET: Retrieve data
POST: Submit data, typically to create a resource
POST /user/info HTTP/1.1 # request line (method, resource path, protocol version)
Host:www.example.com # request header
User-Agent:curl/7.77.0 # request header
Accept:*/* # request header
{"username":"呦呦呦", # request body
"email":"[email protected]"} # request body
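To make the three parts of a request concrete, the sketch below splits a raw HTTP/1.1 request into its request line, headers, and body. On the wire, each line ends with CRLF and a blank line separates the headers from the body. The `parse_http_request` function is a hypothetical helper, and the `Content-Type` header and the body values (`demo`, `user@example.com`) are placeholders assumed for this example rather than taken from the request above.

```python
import json

# Raw request text: lines end with CRLF; an empty line separates headers from body.
RAW_REQUEST = (
    "POST /user/info HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: curl/7.77.0\r\n"
    "Accept: */*\r\n"
    "Content-Type: application/json\r\n"  # assumed header, not in the original example
    "\r\n"
    '{"username": "demo", "email": "user@example.com"}'  # placeholder values
)


def parse_http_request(raw):
    """Split a raw HTTP request into method, path, version, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # The request line holds the method, resource path, and protocol version.
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return {
        "method": method,
        "path": path,
        "version": version,
        "headers": headers,
        "body": body,
    }


request = parse_http_request(RAW_REQUEST)
print(request["method"], request["path"], request["version"])
print(request["headers"]["Host"])
print(json.loads(request["body"])["username"])
```

This simplified parser ignores details such as repeated headers and chunked bodies; it is meant only to show how the parts of the message are laid out.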