A Practical Case of Python Office Automation

Hello everyone, I’m Pippi.

I. Introduction

A few days ago, in the Python Diamond exchange group, [Hxy let me fat] asked a question about office automation with Python. A screenshot of the question is shown below:

The desired result is as follows:

Strictly speaking, this is less a problem than a real-world requirement.

II. Implementation

Here, [Jason] suggested a feasible approach, as follows:

Later, [Mr. Yu Liang] provided concrete code, as shown below:

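import re

import pandas as pd
from docx import Document

# Load the Word document and collect its paragraphs
document = Document("Judgment (Bracket Processing) (1).docx")
all_paragraphs = document.paragraphs

# Keep only the paragraphs that contain an answer mark
data = [paragraph.text for paragraph in all_paragraphs if '√' in paragraph.text or '×' in paragraph.text]

# Merge them into one string and extract every mark in order of appearance
data = ''.join(data)
res = re.findall(r'[√×]', data)

# Prefix each mark with its question number, starting from 1
res = [f'{k + 1}.{v}' for k, v in enumerate(res)]

# Write the answers to Excel without an index column or header row
df = pd.DataFrame(res)
df.to_excel('test9-13.xlsx', index=False, header=False)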

Really impressive!

Running the code produces the expected result, as shown in the following figure:
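To see what the numbering step of the script above does without a Word file at hand, here is a minimal sketch that runs the same filter, extract and enumerate logic on a few in-memory strings. The sample paragraphs are hypothetical stand-ins for `document.paragraphs`; they are not taken from the original document.

```python
import re

# Hypothetical paragraph texts standing in for document.paragraphs.
paragraphs = [
    "I. True or False",
    "1. Python is an interpreted language. (√)",
    "2. A tuple can be modified in place. (×)",
    "3. A list keeps insertion order. (√)",
]

# Keep only the paragraphs that contain an answer mark.
marked = [p for p in paragraphs if "√" in p or "×" in p]

# Join them and pull out every mark in order of appearance.
marks = re.findall(r"[√×]", "".join(marked))

# Number the marks by position, starting from 1.
numbered = [f"{i}.{m}" for i, m in enumerate(marks, start=1)]
print(numbered)  # ['1.√', '2.×', '3.√']
```

Because the numbers come from the position of each mark rather than from the document itself, this approach presumably relies on every question carrying exactly one mark and on no question being skipped.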

Later, building on this code, [Eating Hawthorn Slices Crazy] came up with a simplified version. The code is as follows:

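import re

import pandas as pd
from docx import Document

document = Document(r"Judgment (Bracket Processing)(1).docx")

# Read the raw XML of the main document part as text
text = document.part.blob.decode('utf-8')

# Strip every XML tag, leaving only the text content
text = re.sub(r'<.*?>', '', text)

# Remove whitespace that follows a period, so "1. √" becomes "1.√"
text = re.sub(r'\.\s+', '.', text)

# Extract each question number together with its answer mark
df = pd.DataFrame(re.findall(r'\d+\.[√×]', text))
df.to_excel('result.xlsx', header=False, index=False)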

This is truly masterful technique.

Running this code also fully meets the requirement.
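The tag-stripping idea can be tried on a small string. The sketch below assumes a heavily simplified, hypothetical fragment of the XML stored inside a .docx file, in which a question number and its mark sit in separate runs; a real document part contains far more markup.

```python
import re

# Hypothetical, simplified WordprocessingML fragment (not from the real file).
xml = (
    '<w:p><w:r><w:t>1.</w:t></w:r>'
    '<w:r><w:t xml:space="preserve"> √</w:t></w:r></w:p>'
    '<w:p><w:r><w:t>2.</w:t></w:r><w:r><w:t>×</w:t></w:r></w:p>'
)

text = re.sub(r"<.*?>", "", xml)    # drop every tag, keep only the text
text = re.sub(r"\.\s+", ".", text)  # close the gap after a period
answers = re.findall(r"\d+\.[√×]", text)
print(answers)  # ['1.√', '2.×']
```

Stripping the tags joins text that Word has split across several runs, which is one plausible reason this version needs no paragraph filtering.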

Later, [Mr. Yu Liang] shared another version, which is also very good, as shown below:

import re

import pandas as pd
from docx import Document

document = Document("Judgment (Bracket Processing) (1).docx")
all_paragraphs = document.paragraphs

data = [paragraph.text for paragraph in all_paragraphs if '√' in paragraph.text or '×' in paragraph.text]

# Combine into one long string, then use replace to remove all spaces
data = ''.join(data).replace(' ', '')

# Use a regular expression to extract all answers together with their question numbers
res = re.findall(r'\d+\.[√×]', data)

df = pd.DataFrame(res)
df.to_excel('test9-13.xlsx', index=False, header=False)

It's amazing! Removing all the extra spaces first prevents answers that contain spaces from failing to match the regular expression r'\d+\.[√×]', so the extraction is done in one step, and the list comprehension that rebuilt the numbered answers is no longer needed.
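The effect of removing spaces before matching can be checked on a single line. This is a small sketch with a hypothetical answer line with irregular spacing; the last call is an assumed variant that uses `re.sub(r"\s+", "", ...)` so that tabs and full-width spaces are removed as well, which the original code does not do.

```python
import re

pattern = r"\d+\.[√×]"

# Hypothetical answer line with irregular spacing.
raw = "1. √  2.×   3 . ×"

before = re.findall(pattern, raw)
after = re.findall(pattern, raw.replace(" ", ""))
print(before)  # ['2.×']
print(after)   # ['1.√', '2.×', '3.×']

# Assumed variant: strip every kind of whitespace, including a
# full-width space (U+3000), which str.replace(' ', '') leaves in place.
wide = "4.\u3000√"
print(re.findall(pattern, re.sub(r"\s+", "", wide)))  # ['4.√']
```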

Do you think this is the end?

Later, [Ning] solved it with the openpyxl library. The code is as follows:

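import re

import docx
import openpyxl


def str_work(string: str):
    # Remove spaces, newlines and question numbers, then split on periods,
    # dropping the empty pieces so that only the answer marks remain
    return [*filter(None, re.split(r'\.', re.sub(r'\d+', '', string.replace(' ', '').replace('\n', ''))))]


# Create the workbook and write the header row
wb = openpyxl.Workbook()
ws = wb.active
ws.append(['question', 'answer'])

doc = docx.Document(r'C:\Users\Administrator\Desktop\judgment (bracket processing).docx')

# Join the paragraph texts, skipping the first three paragraphs
doc_text = '\n'.join(i.text for i in doc.paragraphs[3:])

# Split at the section heading: the part before it supplies the question
# lines, the part after it supplies the answers
doc_list = doc_text.split('\nOne, True or False')

# Keep the non-empty question lines
title_row = [i.strip() for i in doc_list[0].split('\n') if i.strip().split('、') != ['']]
answer_row = str_work(doc_list[1])

# Write one row per question/answer pair
for i in zip(title_row, answer_row):
    ws.append(list(i))

wb.save('1.xlsx')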

The result obtained after running is shown in the following figure:
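The string handling in `str_work` is dense, so here is a step-by-step sketch of the same idea on sample data. The function name `split_answers`, the sample answer block and the question texts are all hypothetical, and the Excel writing is left out so that the example runs with the standard library alone.

```python
import re


def split_answers(block: str) -> list:
    """Strip whitespace and question numbers, then split on periods."""
    compact = block.replace(" ", "").replace("\n", "")
    without_numbers = re.sub(r"\d+", "", compact)
    # Splitting ".√.×.√" on "." leaves an empty first piece; drop it.
    return [part for part in re.split(r"\.", without_numbers) if part]


# Hypothetical answer block and question lines.
sample = "1.√ 2.×\n3.√"
questions = ["1、First statement", "2、Second statement", "3、Third statement"]

answers = split_answers(sample)
rows = list(zip(questions, answers))
print(answers)  # ['√', '×', '√']
print(rows[1])  # ('2、Second statement', '×')
```

Note that `zip` stops at the shorter sequence, so a missing answer would silently shorten the output instead of raising an error.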

III. Summary

This article reviewed a Python office automation problem: extracting the √/× answers to true-or-false questions from a Word document into an Excel file. It walked through the analysis and several code implementations that helped the group member solve the problem.

Finally, thanks to [Hxy let me fat] for asking the question; to [Jason], [Mr. Yu Liang], [Eating Hawthorn Slices Crazy], and [Ning] for their ideas and code; and to [dcpeng], [Postpartum Repair], [such creatures], [Yu Kefu], and others who took part in the discussion.
