DOCX - Word Documents

Create, edit, and analyze Word documents with tracked changes, comments, and formatting using python-docx and other libraries.

Download this Skill

npx degit LangbaseInc/agent-skills/document-skills/docx my-docx-skill

Overview

Programmatically work with Microsoft Word documents (.docx) for automation, analysis, and manipulation.

Key Capabilities

Document Creation

Create new documents from scratch
Use templates
Add headers and footers
Set page layout and margins

Content Manipulation

Add/edit paragraphs
Insert tables
Add images
Create lists (bulleted/numbered)
Insert hyperlinks

Formatting

Text formatting (bold, italic, underline)
Font family and size
Text color and highlighting
Paragraph alignment
Line spacing and indentation
Styles and themes

Advanced Features

Track changes
Add comments
Insert footnotes/endnotes
Table of contents
Page breaks and sections
Headers and footers

Common Libraries

Python

from docx import Document
from docx.shared import Inches, Pt, RGBColor

# Create new document
doc = Document()

# Add heading
doc.add_heading('Document Title', 0)

# Add paragraph
p = doc.add_paragraph('This is a paragraph.')
p.add_run(' This is bold.').bold = True

# Add table
table = doc.add_table(rows=2, cols=2)
table.cell(0, 0).text = 'Header 1'

# Save
doc.save('document.docx')

Node.js

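const fs = require('fs');
const { Document, Packer, Paragraph, TextRun } = require('docx');

// Build a document with one section containing a single paragraph
const doc = new Document({
  sections: [{
    properties: {},
    children: [
      new Paragraph({
        children: [
          new TextRun("Hello World"),
          new TextRun({
            text: " Bold Text",
            bold: true,
          }),
        ],
      }),
    ],
  }],
});

// Serialize the document to a buffer and write it to disk
Packer.toBuffer(doc).then((buffer) => {
  fs.writeFileSync('document.docx', buffer);
});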

Use Cases

Document Generation

Automated report creation
Contract generation
Invoice creation
Certificate generation
Letter templates

Document Analysis

Extract text content
Count words/pages
Analyze structure
Extract metadata
Find and replace

Document Transformation

Convert to other formats
Merge multiple documents
Split documents
Update templates
Batch processing

Text Formatting

# Font formatting
paragraph = doc.add_paragraph()
run = paragraph.add_run('Formatted text')
run.font.name = 'Arial'
run.font.size = Pt(12)
run.font.bold = True
run.font.italic = True
run.font.underline = True
run.font.color.rgb = RGBColor(255, 0, 0)

Tables

# Create table
table = doc.add_table(rows=3, cols=3)
table.style = 'Light Grid Accent 1'

# Populate cells
for row in table.rows:
    for cell in row.cells:
        cell.text = 'Data'

# Access specific cell
cell = table.cell(0, 0)
cell.text = 'Header'

Images

# Add image
doc.add_picture('image.png', width=Inches(2.0))

# Add image to specific location
paragraph = doc.add_paragraph()
run = paragraph.add_run()
run.add_picture('image.png', width=Inches(1.0))

Styles

# Apply built-in style
doc.add_paragraph('Text', style='Heading 1')
doc.add_paragraph('Text', style='List Bullet')

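# Custom style
from docx.enum.style import WD_STYLE_TYPE
styles = doc.styles
style = styles.add_style('CustomStyle', WD_STYLE_TYPE.PARAGRAPH)
style.font.name = 'Calibri'
style.font.size = Pt(14)

# Apply the custom style by name
doc.add_paragraph('Text in the custom style', style='CustomStyle')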

Reading Documents

# Open existing document
doc = Document('existing.docx')

# Read all paragraphs
for paragraph in doc.paragraphs:
    print(paragraph.text)

# Read all tables
for table in doc.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text)

Track Changes & Comments

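# Build a comment element with python-docx's low-level XML helpers
from docx.oxml import OxmlElement
from docx.oxml.ns import qn

def build_comment(text, author, comment_id=0):
    # Create the comment element; attribute names must be namespace-qualified via qn()
    comment = OxmlElement('w:comment')
    comment.set(qn('w:id'), str(comment_id))
    comment.set(qn('w:author'), author)
    comment.set(qn('w:initials'), author[:1])
    # Add the comment text as a paragraph holding a single run
    p = OxmlElement('w:p')
    r = OxmlElement('w:r')
    t = OxmlElement('w:t')
    t.text = text
    r.append(t)
    p.append(r)
    comment.append(p)
    # Still required: store the element in the comments part and anchor it in the body
    return comment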

Best Practices

Use styles instead of direct formatting
Validate content before inserting
Handle exceptions properly
Save documents after editing, and close any file handles you opened yourself
Test with various Word versions
Preserve original formatting when editing
Use templates for consistency
Optimize for large documents

Common Operations

Merge Documents

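from copy import deepcopy
from docx import Document
from docx.oxml.ns import qn

def merge_docs(files):
    merged = Document()
    # The default template ends with section properties, which must stay last
    sect_pr = merged.element.body.find(qn('w:sectPr'))
    for file in files:
        doc = Document(file)
        for element in doc.element.body:
            if element.tag == qn('w:sectPr'):
                continue  # skip the source's own section properties
            # Copy rather than move, so the source body is not mutated mid-iteration
            sect_pr.addprevious(deepcopy(element))
    # Images, styles, and numbering defined only in the sources are not carried over
    return merged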

Find and Replace

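def replace_text(doc, old, new):
    # Covers body paragraphs only; text in tables, headers, and footers is not touched
    for paragraph in doc.paragraphs:
        if old in paragraph.text:
            # Assigning paragraph.text rebuilds the paragraph as a single run,
            # so run-level formatting (bold, italic, color) in it is lost
            paragraph.text = paragraph.text.replace(old, new)
    return doc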

Output Formats

DOCX (native)
PDF (via conversion)
HTML (via conversion)
Plain text extraction
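
python-docx does not convert to PDF or HTML itself, so conversion relies on an external tool. The sketch below assumes LibreOffice is installed and its soffice executable is on PATH; that tool choice is an assumption of this example, and both function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

def build_convert_command(docx_path, out_dir, fmt='pdf', soffice='soffice'):
    """Return the LibreOffice command line for a headless conversion."""
    return [soffice, '--headless', '--convert-to', fmt,
            '--outdir', str(out_dir), str(docx_path)]

def convert(docx_path, out_dir, fmt='pdf'):
    """Run the conversion and return the path LibreOffice is expected to write."""
    if shutil.which('soffice') is None:
        raise RuntimeError('soffice not found on PATH; is LibreOffice installed?')
    subprocess.run(build_convert_command(docx_path, out_dir, fmt), check=True)
    return Path(out_dir) / f'{Path(docx_path).stem}.{fmt}'

# Inspect the command without running it
cmd = build_convert_command('report.docx', 'out')
```

Calling convert('report.docx', 'out') would run the command; passing fmt='html' requests HTML output instead.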
