Inventory a practical case of Python automation office
Hello everyone, I’m Pippi.
I. Introduction
A few days ago in the Python diamond exchange group [Hxy let me fat] asked a question about Python automation office, the screenshot of the question is as follows:
The desired effect is as follows:
To be precise, this is not a problem, but a real need.
Second, the realization process
Here [Jason] gave a feasible idea, as follows:
Later, [Mr. Yu Liang] gave a specific code, as shown below:
import re
from docx import Document
import pandas as pd
document = Document(“Judgment (Bracket Processing) (1).docx”)
all_paragraphs = document.paragraphs
data = [paragraph.text for paragraph in all_paragraphs if ‘√’ in paragraph.text or ‘×’ in paragraph.text]
data = “.join(data)
res = re.findall(’[√×]‘, data, re.S)
res = [f’{k + 1}.{v}’ for k, v in enumerate(res)]
df = pd.DataFrame(res)
df.to_excel(‘test9-13.xlsx’, index=False, header=None)
Really too strong!
After the code is run, the expected result can be obtained, as shown in the following figure:
Later, based on this code, [Eating Hawthorn Slices Crazy] came up with a simplified version, the code is as follows:
import re
from docx import Document
import pandas as pd
document = Document(r”Judgment (Bracket Processing)(1).docx”)
text = document.part.blob.decode(‘utf-8’)
text = re.sub(r’<.*?>’, “, text)
text = re.sub(r’.\s+‘, r’.‘, text)
df = pd.DataFrame(re.findall(r’\d+.[√×]‘, text))
df.to_excel(‘result.xlsx’, header=None, index=False)
This technology is really home, superb.
After the code runs, this requirement can also be fully realized.
Later, [Mr. Yu Liang] also gave a code, which is also very good, as shown below:
data = [paragraph.text for paragraph in all_paragraphs if ‘√’ in paragraph.text or ‘×’ in paragraph.text]
Combine into one long string, then replace to remove all spaces
data = “.join(data).replace(’ ‘, “)
Use re regular expression to extract all answers with question numbers
res = re.findall(r’\d+.[√×]‘, data, re.S)
df = pd.DataFrame(res)
df.to_excel(‘test9-13.xlsx’, index=False, header=None)
It’s amazing! Replacing and deleting all the extra spaces can prevent the answer from containing spaces and cannot be matched by the regular r’\d+.[√×]‘, so it is done in one step. Stop using list comprehensions to construct answers.
Do you think this is the end?
Later, [Ning] used the openpyxl library to get it done. The code is shown in the following figure:
import re
import docx
import openpyxl
def str_work(string:str):
return [*filter(None,re.split(’.’,re.sub(’\d+‘,“,string.replace(’ ‘, “).replace(’\n’, “))))]
wb = openpyxl.Workbook()
ws = wb.active
ws.append([‘question’,‘answer’])
doc = docx.Document(r’C:\Users\Administrator\Desktop\judgment (bracket processing).docx’)
doc_text = ‘\n’.join(( i.text for i in doc.paragraphs[3:]))
doc_list = doc_text.split(’\nOne, True or False’)
title_row = [i.strip() for i in doc_list[0].split(’\n’) if i.strip().split(‘、’)!=[“]]
answer_row = [i for i in str_work(doc_list[1])]
for i in zip(title_row,answer_row):
ws.append(list(i))
wb.save(‘1.xlsx’)
The result obtained after running is shown in the following figure:
- Summary ==========
Hello everyone, I’m Pippi. This article mainly takes stock of a problem of Python automation office. In response to this problem, the article gives specific analysis and code implementation to help fans solve the problem smoothly.
Finally, I would like to thank the fans [Hxy let me fat] for their questions, [Jason], [Ms. Yu Liang], [Eat Hawthorn Slices Crazy], and [Classmate Ning] for their ideas and code analysis, and thanks to [dcpeng], [Postpartum Repair] , [such creatures], [Yu Kefu] and others participated in the study and exchange.
Latest Programming News and Information | GeekBar