DOCX - Word Documents
Create, edit, and analyze Word documents with tracked changes, comments, and formatting using python-docx and other libraries.
npx degit LangbaseInc/agent-skills/document-skills/docx my-docx-skill
Programmatically work with Microsoft Word documents (.docx) for automation, analysis, and manipulation.
Document Creation
- Create new documents from scratch
- Use templates
- Add headers and footers
- Set page layout and margins
Content Manipulation
- Add/edit paragraphs
- Insert tables
- Add images
- Create lists (bulleted/numbered)
- Insert hyperlinks
Formatting
- Text formatting (bold, italic, underline)
- Font family and size
- Text color and highlighting
- Paragraph alignment
- Line spacing and indentation
- Styles and themes
Advanced Features
- Track changes
- Add comments
- Insert footnotes/endnotes
- Table of contents
- Page breaks and sections
- Headers and footers
Python
from docx import Document
from docx.shared import Inches, Pt, RGBColor
# Create new document
doc = Document()
# Add heading
doc.add_heading('Document Title', 0)
# Add paragraph
p = doc.add_paragraph('This is a paragraph.')
p.add_run(' This is bold.').bold = True
# Add table
table = doc.add_table(rows=2, cols=2)
table.cell(0, 0).text = 'Header 1'
# Save
doc.save('document.docx')
Node.js
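const fs = require('fs');
const { Document, Packer, Paragraph, TextRun } = require('docx');

const doc = new Document({
  sections: [{
    properties: {},
    children: [
      new Paragraph({
        children: [
          new TextRun("Hello World"),
          new TextRun({
            text: "Bold Text",
            bold: true,
          }),
        ],
      }),
    ],
  }],
});

// Serialize the document and write it to disk
Packer.toBuffer(doc).then((buffer) => {
  fs.writeFileSync('document.docx', buffer);
});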
Document Generation
- Automated report creation
- Contract generation
- Invoice creation
- Certificate generation
- Letter templates
Document Analysis
- Extract text content
- Count words/pages
- Analyze structure
- Extract metadata
- Find and replace
Document Transformation
- Convert to other formats
- Merge multiple documents
- Split documents
- Update templates
- Batch processing
# Font formatting
paragraph = doc.add_paragraph()
run = paragraph.add_run('Formatted text')
run.font.name = 'Arial'
run.font.size = Pt(12)
run.font.bold = True
run.font.italic = True
run.font.underline = True
run.font.color.rgb = RGBColor(255, 0, 0)
# Create table
table = doc.add_table(rows=3, cols=3)
table.style = 'Light Grid Accent 1'
# Populate cells
for row in table.rows:
    for cell in row.cells:
        cell.text = 'Data'
# Access specific cell
cell = table.cell(0, 0)
cell.text = 'Header'
# Add image
doc.add_picture('image.png', width=Inches(2.0))
# Add image to specific location
paragraph = doc.add_paragraph()
run = paragraph.add_run()
run.add_picture('image.png', width=Inches(1.0))
# Apply built-in style
doc.add_paragraph('Text', style='Heading 1')
doc.add_paragraph('Text', style='List Bullet')
# Custom style
from docx.enum.style import WD_STYLE_TYPE
styles = doc.styles
style = styles.add_style('CustomStyle', WD_STYLE_TYPE.PARAGRAPH)
style.font.name = 'Calibri'
style.font.size = Pt(14)
# Open existing document
doc = Document('existing.docx')
# Read all paragraphs
for paragraph in doc.paragraphs:
    print(paragraph.text)
# Read all tables
for table in doc.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text)
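# Add comment (low-level XML sketch: builds the <w:comment> element only)
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
def add_comment(paragraph, text, author):
    # Create comment element; attribute names must be namespace-qualified
    comment = OxmlElement('w:comment')
    comment.set(qn('w:author'), author)
    comment.set(qn('w:initials'), author[:1])
    # Add comment text as a paragraph containing a single run
    p = OxmlElement('w:p')
    r = OxmlElement('w:r')
    t = OxmlElement('w:t')
    t.text = text
    r.append(t)
    p.append(r)
    comment.append(p)
    # The element still needs a w:id, a place in the comments part,
    # and a reference from `paragraph` before Word will display it
    return comment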
Best Practices
- Use styles instead of direct formatting
- Validate content before inserting
- Handle exceptions properly
- Save documents after editing (python-docx loads the file into memory, so there is no handle to close)
- Test with various Word versions
- Preserve original formatting when editing
- Use templates for consistency
- Optimize for large documents
Merge Documents
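from copy import deepcopy
from docx import Document
from docx.oxml.ns import qn
def merge_docs(files):
    merged = Document()
    # The default template's body ends with <w:sectPr>, which must stay last
    sect_pr = merged.element.body.find(qn('w:sectPr'))
    for file in files:
        doc = Document(file)
        for element in doc.element.body:
            if element.tag == qn('w:sectPr'):
                continue  # keep only the merged document's page settings
            # Copy rather than move, so the source body is not mutated mid-iteration
            sect_pr.addprevious(deepcopy(element))
    # Images, styles, and numbering defined only in a source file are not carried over
    return merged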
Find and Replace
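def replace_text(doc, old, new):
    for paragraph in doc.paragraphs:
        if old in paragraph.text:
            # Assigning paragraph.text collapses the paragraph into a single run,
            # so character formatting inside it is lost
            paragraph.text = paragraph.text.replace(old, new)
    return doc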
Supported Formats
- DOCX (native)
- PDF (via conversion)
- HTML (via conversion)
- Plain text extraction