Technology Sharing

Python crawler tutorial Part 6 - Using session to make requests

2024-07-12

한어Русский языкEnglishFrançaisIndonesianSanskrit日本語DeutschPortuguêsΕλληνικάespañolItalianoSuomalainenLatina

Why use sessions?

We have previously introduced how to use reqesuts to initiate a request. Today, we will introduce how to use session to initiate a request. Session is simply a session mechanism. After we log in in the browser, we do not need to log in again to request service data, because the cookie has saved your session status. Each request will automatically carry cookie parameters. If you use reqeusts.request, you must manually carry cookie parameters each time. The reqeuest.Session() session object allows you to maintain certain parameters across requests. It will also maintain cookies between all requests issued by the same Session instance, so you do not need to manually handle the cookie status each time.

Reference Documents:
Official Documentation

how to use

The usage of session is actually similar to the request method, and it also supports methods such as session.get(), session.post(), session.request(), etc.

s = requests.Session()

s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('https://httpbin.org/cookies')

print(r.text)
# '{"cookies": {"sessioncookie": "123456789"}}'
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7

Best Practices for Cookie Reuse

When processing some websites, the login verification permissions required can be requested through sessions. The cookies obtained after login can be saved. In this way, each time you need to log in later, you can directly use the saved cookies to construct a session and then initiate a request, which can avoid repeated logins. Suitable for concurrent crawling of data on multiple machines.

Cookie reuse practice:

import json
import traceback

import requests.utils

from executor.page_executor import PageExecutor
from file_path import get_absolute_path


cookie_path = get_absolute_path('data/cookie.txt')
request_session: requests.Session = None

def __load_cookie():
    '''
    加载本地cookie,如果存在加载,如果不存在就返回空
    :param session:
    :return:
    '''
    try:
        with open(cookie_path, "r") as f:
            load_cookie = json.load(f)
            return requests.utils.cookiejar_from_dict(load_cookie)
    except Exception as e:
        traceback.print_exc()
        return None

def get_session():
    global request_session
    if request_session is not None:
        return request_session
    else:
        request_session = requests.Session()
        exist_cookies = __load_cookie()
        if exist_cookies is not None:
            request_session.cookies.update(exist_cookies)

        return request_session

def save_cookie():
    # 登录成功, session里的cookie是最全的,response返回的cookie不全
    cookiejar = requests.utils.dict_from_cookiejar(request_session.cookies)
    with open(cookie_path, "w") as f:
        json.dump(cookiejar, f, indent=True)
    whv_logger.info('cookies saved to ./data/cookie.txt')


def update_cookie():
    '''
    为什么需要一个新的session
    # 走到这一步,说明session已经过期,重新获取session,需要重新处理下session
    # 1. 但是因为携带有旧的session,导致携带旧的__RequestVerificationToken和新的__RequestVerificationToken一起请求,登录失败
    # 2. 所以需要重新处理下session,主要是处理__RequestVerificationToken
    :return:
    '''
    error_cookie_jar = requests.utils.dict_from_cookiejar(request_session.cookies)

    new_cookie_jar = {'__RequestVerificationToken': error_cookie_jar['__RequestVerificationToken']}
    new_cookie = requests.utils.cookiejar_from_dict(new_cookie_jar)

    # 清空旧的cookie
    request_session.cookies.clear_session_cookies()
    # 填充新的cookie
    request_session.cookies.update(new_cookie)
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54
  • 55
  • 56
  • 57
  • 58
  • 59
  • 60
  • 61
  • 62
  • 63