Reading Word files and drawing word clouds with Python

2024-07-12

1. Install the necessary libraries

pip install python-docx wordcloud matplotlib
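
To confirm the installation worked, a quick check along the following lines may help. Note that the python-docx package is imported under the name docx, so the pip name and the import name differ. The helper name missing_modules is made up for this sketch.

```python
import importlib.util

# Import names of the three libraries (python-docx is imported as "docx").
REQUIRED_MODULES = ["docx", "wordcloud", "matplotlib"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

If anything is reported as not installed, rerun the pip command above in the same Python environment that runs the script.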

2. Complete code

import docx
from wordcloud import WordCloud
import matplotlib.pyplot as plt

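# Read the text content of a Word file
def read_word_file(file_path):
    doc = docx.Document(file_path)
    full_text = []
    for para in doc.paragraphs:
        full_text.append(para.text)
    # Join paragraphs with a newline so words at paragraph edges stay separate
    return '\n'.join(full_text)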

# Generate the word cloud
def generate_wordcloud(text):
    wordcloud = WordCloud(width=800, height=400, background_color='white').generate(text)

    # Display the word cloud
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis('off')
    plt.show()

# Main function
def main():
    file_path = 'your_word_file.docx'  # Replace with the path to your Word file
    text = read_word_file(file_path)
    generate_wordcloud(text)

if __name__ == "__main__":
    main()


3. Fix garbled Chinese characters

Note:
If Chinese characters appear garbled in the word cloud, specify a font that supports Chinese through the font_path parameter:

wordcloud = WordCloud(width=800, height=400, background_color='white', font_path='simhei.ttf').generate(text)
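
The font_path value must point to a font file that actually exists on your machine. As a rough sketch, the script can look through a few candidate locations before building the word cloud. The helper name first_existing_font and the Windows path below are assumptions for illustration, not part of the original code, and should be adjusted to your system.

```python
import os

# Example locations only; adjust them for your own system.
CANDIDATE_FONTS = [
    "simhei.ttf",                   # next to the script, as in the line above
    "C:/Windows/Fonts/simhei.ttf",  # assumed Windows location of SimHei
]


def first_existing_font(candidates):
    """Return the first path in `candidates` that is an existing file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    font_path = first_existing_font(CANDIDATE_FONTS)
    if font_path is None:
        print("No candidate font found; set font_path manually.")
    else:
        print("Use font_path =", repr(font_path))
```

The returned path can then be passed as font_path to WordCloud in place of the hard-coded 'simhei.ttf'.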

The result after the change:
[image: word cloud generated after adding the font]

4. Detailed explanation

Install the libraries:

  • python-docx: used to read Word files.
  • wordcloud: used to generate word cloud diagrams.
  • matplotlib: used to display word cloud graphs.

Read the contents of a Word file:

  • Use python-docx's Document class to read Word files.
  • Iterate over the paragraphs in the document, appending the text of each paragraph to a list.
  • Concatenate the text of all paragraphs into a single string.
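
The three steps above can be sketched a little more defensively as follows. This version skips empty paragraphs and checks that the file exists before opening it. The helper name paragraphs_to_text is made up for this example. Also keep in mind that doc.paragraphs covers only body paragraphs, so text inside tables is not included.

```python
from pathlib import Path


def paragraphs_to_text(paragraphs):
    """Join the non-empty paragraph texts with newlines."""
    texts = (para.text.strip() for para in paragraphs)
    return "\n".join(text for text in texts if text)


def read_word_file(file_path):
    """Return the paragraph text of a .docx file as one string."""
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"Word file not found: {path}")
    # python-docx is imported lazily so paragraphs_to_text can be reused
    # (and tried out) without the library installed.
    import docx
    return paragraphs_to_text(docx.Document(str(path)).paragraphs)
```

Calling read_word_file('your_word_file.docx') returns the same kind of string as the original function, minus blank lines.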

Generate a word cloud:

  • Use the WordCloud class from wordcloud to create the word cloud.
  • Set the width, height and background color of the word cloud.
  • Call the generate method to build the word cloud from the text.
  • Use matplotlib to display the word cloud.
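
Roughly speaking, generate counts how often each word occurs and draws frequent words larger. To make that step visible, the sketch below counts the words itself and passes the counts to generate_from_frequencies. The tokenizer here is a deliberately simple assumption; WordCloud's own text processing does more, such as dropping stop words.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase word occurrences using a simple regex tokenizer."""
    words = re.findall(r"\w+", text.lower())
    return Counter(words)


def wordcloud_from_frequencies(frequencies):
    """Build a WordCloud object from a word -> count mapping."""
    # Imported lazily so word_frequencies works without wordcloud installed.
    from wordcloud import WordCloud
    cloud = WordCloud(width=800, height=400, background_color="white")
    return cloud.generate_from_frequencies(frequencies)


if __name__ == "__main__":
    sample = "Python reads Word files. Python draws word clouds."
    print(word_frequencies(sample).most_common(3))
```

The object returned by wordcloud_from_frequencies can be passed to plt.imshow in the same way as in the complete code.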

Notes

  • Make sure your Word file path is correct.
  • You can adjust the parameters of the word cloud chart, such as color, font, etc., as needed.
  • If there are many common words or stop words in your text, you can exclude them using the stopwords parameter of WordCloud.
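
As a minimal sketch of the stopwords parameter: passing your own set replaces the library's built-in English list rather than extending it, so the example merges the two. The function names and the sample words are made up for illustration.

```python
def merge_stopwords(base, extra):
    """Return a lowercase set containing every word from both collections."""
    return {word.lower() for word in base} | {word.lower() for word in extra}


def generate_filtered_wordcloud(text, extra_stopwords=()):
    """Generate a word cloud that ignores the default and the extra stop words."""
    # Imported lazily so merge_stopwords can be tried without wordcloud installed.
    from wordcloud import WordCloud, STOPWORDS
    stopwords = merge_stopwords(STOPWORDS, extra_stopwords)
    return WordCloud(width=800, height=400, background_color="white",
                     stopwords=stopwords).generate(text)
```

For example, generate_filtered_wordcloud(text, extra_stopwords=["said", "also"]) would leave those two words out of the picture in addition to the defaults.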

By following the above steps, you can easily read Word files and generate beautiful word cloud charts.