Rei

Notice: As of 2022.08.31, this project is no longer maintained. The source code is on GitHub: ZhihuScraper

License

This project uses the GPLv3 license.

Explanation of This Project

This project is written in Python 3.10+, runs on Windows by default, and uses the Edge browser. For other systems and browsers, see the later "Instructions for use".

This project crawls the content of a public Zhihu collection folder (answers, articles, pins, and videos) and saves each item as an HTML file. The save location defaults to C:\__assets__\ followed by the name of the collection folder. Each file is named "Author: Title"; a pin uses its first 13 characters as the title. Pictures and videos are saved as links and need an Internet connection to view, so the content saved locally is actually only text.

The reason for crawling public collection folders is that private collection folders need to be logged in to Zhihu before they can be acce...
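The naming rule described above (default folder, "Author: Title" file names, 13-character pin titles) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the function names are hypothetical, and replacing characters that Windows forbids in file names (such as the colon) with an underscore is an assumption made here so the path is valid.

```python
import re
from pathlib import PureWindowsPath

# Default save location and pin title length, as stated in the description.
ASSETS_ROOT = PureWindowsPath(r"C:\__assets__")
PIN_TITLE_LENGTH = 13


def pin_title(pin_text: str) -> str:
    """Use the first 13 characters of a pin as its title."""
    return pin_text[:PIN_TITLE_LENGTH]


def display_name(author: str, title: str) -> str:
    """Build the 'Author: Title' name described in the post."""
    return f"{author}: {title}"


def output_path(collection: str, author: str, title: str) -> PureWindowsPath:
    """Hypothetical helper: compose the full path of the saved HTML file.

    Assumption: characters that Windows does not allow in file names
    are replaced with '_'; the real project may handle this differently.
    """
    safe_name = re.sub(r'[<>:"/\\|?*]', "_", display_name(author, title))
    return ASSETS_ROOT / collection / f"{safe_name}.html"


if __name__ == "__main__":
    print(output_path("Favorites", "Alice", "Hello"))
    print(pin_title("A short pin without any title at all"))
```

PureWindowsPath is used so that the sketch composes Windows-style paths on any operating system without touching the file system.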

Posts

Zhihu Scraper: Scraping the text content in Zhihu's public collection folder (Videos and pictures are saved in the form of links)