DOCX - Word Documents

Create, edit, and analyze Word documents with tracked changes, comments, and formatting using python-docx and other libraries.


npx degit LangbaseInc/agent-skills/document-skills/docx my-docx-skill

Programmatically work with Microsoft Word documents (.docx) for automation, analysis, and manipulation.


Document Creation

  • Create new documents from scratch
  • Use templates
  • Add headers and footers
  • Set page layout and margins

Content Manipulation

  • Add/edit paragraphs
  • Insert tables
  • Add images
  • Create lists (bulleted/numbered)
  • Insert hyperlinks

Formatting

  • Text formatting (bold, italic, underline)
  • Font family and size
  • Text color and highlighting
  • Paragraph alignment
  • Line spacing and indentation
  • Styles and themes

Advanced Features

  • Track changes
  • Add comments
  • Insert footnotes/endnotes
  • Table of contents
  • Page breaks and sections
  • Headers and footers
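
For tracked changes, one possible approach that needs only the standard library is to read the revision markup directly: a .docx file is a ZIP archive, and tracked insertions and deletions appear as w:ins and w:del elements in word/document.xml. The function name list_tracked_changes is hypothetical, and the sketch only reads changes; it does not accept or reject them.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_tracked_changes(path):
    """Return (kind, author, text) tuples for tracked insertions and deletions."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    changes = []
    for element in root.iter():
        if element.tag == f"{{{W}}}ins":
            kind, text_tag = "insert", f"{{{W}}}t"
        elif element.tag == f"{{{W}}}del":
            kind, text_tag = "delete", f"{{{W}}}delText"
        else:
            continue
        # Inserted text is stored in w:t, deleted text in w:delText.
        text = "".join(t.text or "" for t in element.iter(text_tag))
        author = element.get(f"{{{W}}}author", "")
        changes.append((kind, author, text))
    return changes
```

Revision marks on paragraph marks or table rows also use w:ins and w:del but carry no text, so they show up here as entries with an empty string.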

Python

    from docx import Document
    from docx.shared import Inches, Pt, RGBColor

    # Create new document
    doc = Document()

    # Add heading
    doc.add_heading('Document Title', 0)

    # Add paragraph
    p = doc.add_paragraph('This is a paragraph.')
    p.add_run(' This is bold.').bold = True

    # Add table
    table = doc.add_table(rows=2, cols=2)
    table.cell(0, 0).text = 'Header 1'

    # Save
    doc.save('document.docx')

Node.js

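    const fs = require('fs');
    const docx = require('docx');
    const { Document, Packer, Paragraph, TextRun } = docx;

    const doc = new Document({
      sections: [{
        properties: {},
        children: [
          new Paragraph({
            children: [
              new TextRun("Hello World"),
              new TextRun({
                text: "Bold Text",
                bold: true,
              }),
            ],
          }),
        ],
      }],
    });

    // Serialize the document and write it to disk
    Packer.toBuffer(doc).then((buffer) => {
      fs.writeFileSync('document.docx', buffer);
    });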

Document Generation

  • Automated report creation
  • Contract generation
  • Invoice creation
  • Certificate generation
  • Letter templates

Document Analysis

  • Extract text content
  • Count words/pages
  • Analyze structure
  • Extract metadata
  • Find and replace
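
Text extraction, a rough word count, and basic metadata can be read without any third-party library, because the body text sits in word/document.xml and the title and author in docProps/core.xml inside the ZIP archive. The function name analyze_docx and the shape of its return value are assumptions made for this sketch.

```python
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def analyze_docx(path):
    """Return paragraph texts, a rough word count, and core metadata."""
    with zipfile.ZipFile(path) as archive:
        body = ET.fromstring(archive.read("word/document.xml"))
        core = None
        if "docProps/core.xml" in archive.namelist():
            core = ET.fromstring(archive.read("docProps/core.xml"))

    # Join the text runs of every paragraph, including those inside tables.
    paragraphs = [
        "".join(t.text or "" for t in p.iterfind(".//w:t", NS))
        for p in body.iterfind(".//w:p", NS)
    ]
    word_count = sum(len(text.split()) for text in paragraphs)

    metadata = {}
    if core is not None:
        metadata["title"] = core.findtext("dc:title", default="", namespaces=NS)
        metadata["author"] = core.findtext("dc:creator", default="", namespaces=NS)

    return {"paragraphs": paragraphs, "word_count": word_count, "metadata": metadata}
```

Page count is not computed here, since pagination depends on how Word lays the document out. The whitespace-based word count will also differ slightly from the figure Word reports.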

Document Transformation

  • Convert to other formats
  • Merge multiple documents
  • Split documents
  • Update templates
  • Batch processing
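
python-docx does not render PDF or HTML itself, so conversion is usually delegated to an external tool. The sketch below assumes LibreOffice is installed and its soffice binary is on the PATH, which this skill does not state; the function names build_convert_command and convert_folder are hypothetical.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, target_format="pdf"):
    """Build a LibreOffice headless command that converts one file."""
    return [
        "soffice", "--headless",
        "--convert-to", target_format,
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_folder(in_dir, out_dir, target_format="pdf"):
    """Convert every .docx in in_dir and return the files that failed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    failed = []
    for source in sorted(Path(in_dir).glob("*.docx")):
        # Word writes "~$name.docx" lock files next to open documents.
        if source.name.startswith("~$"):
            continue
        result = subprocess.run(
            build_convert_command(source, out_dir, target_format),
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            failed.append(source)
    return failed
```

The exit code is only a rough signal of success, so checking that the expected output file exists is a sensible extra step. If soffice is missing, subprocess.run raises FileNotFoundError.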

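    from docx import Document
    from docx.shared import Pt, RGBColor

    doc = Document()
    paragraph = doc.add_paragraph()

    # Font formatting
    run = paragraph.add_run('Formatted text')
    run.font.name = 'Arial'
    run.font.size = Pt(12)
    run.font.bold = True
    run.font.italic = True
    run.font.underline = True
    run.font.color.rgb = RGBColor(255, 0, 0)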

    from docx import Document

    doc = Document()

    # Create table
    table = doc.add_table(rows=3, cols=3)
    table.style = 'Light Grid Accent 1'

    # Populate cells
    for row in table.rows:
        for cell in row.cells:
            cell.text = 'Data'

    # Access specific cell
    cell = table.cell(0, 0)
    cell.text = 'Header'

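    from docx import Document
    from docx.shared import Inches

    doc = Document()

    # Add image in its own paragraph (image.png must exist on disk)
    doc.add_picture('image.png', width=Inches(2.0))

    # Add image inline, inside a specific paragraph
    paragraph = doc.add_paragraph()
    run = paragraph.add_run()
    run.add_picture('image.png', width=Inches(1.0))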

    from docx import Document
    from docx.enum.style import WD_STYLE_TYPE
    from docx.shared import Pt

    doc = Document()

    # Apply built-in style
    doc.add_paragraph('Text', style='Heading 1')
    doc.add_paragraph('Text', style='List Bullet')

    # Custom style
    styles = doc.styles
    style = styles.add_style('CustomStyle', WD_STYLE_TYPE.PARAGRAPH)
    style.font.name = 'Calibri'
    style.font.size = Pt(14)

    from docx import Document

    # Open existing document
    doc = Document('existing.docx')

    # Read all paragraphs
    for paragraph in doc.paragraphs:
        print(paragraph.text)

    # Read all tables
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                print(cell.text)

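    # Add comment (low-level XML; this only builds the comment element)
    from docx.oxml import OxmlElement
    from docx.oxml.ns import qn

    def add_comment(paragraph, text, author, comment_id=0):
        # Create comment element; attribute names need qn() to resolve the w: prefix
        comment = OxmlElement('w:comment')
        comment.set(qn('w:id'), str(comment_id))
        comment.set(qn('w:author'), author)
        comment.set(qn('w:initials'), author[:1])

        # Add comment text
        p = OxmlElement('w:p')
        r = OxmlElement('w:r')
        t = OxmlElement('w:t')
        t.text = text
        r.append(t)
        p.append(r)
        comment.append(p)

        # Not shown: storing the element in word/comments.xml and anchoring it to `paragraph`
        return comment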

Best Practices

  • Use styles instead of direct formatting
  • Validate content before inserting
  • Handle exceptions properly
  • Save documents after editing (python-docx keeps changes in memory until doc.save() is called)
  • Test with various Word versions
  • Preserve original formatting when editing
  • Use templates for consistency
  • Optimize for large documents
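
One concrete case of validating content before inserting it: text that contains NULL bytes or other control characters cannot be stored in the document XML, and python-docx raises a ValueError when asked to write it. A minimal standard-library sketch, with the hypothetical helper name sanitize_text, strips those characters first.

```python
import re

# C0 control characters that XML 1.0 forbids (tab, newline and carriage
# return are allowed and therefore kept).
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def sanitize_text(value):
    """Coerce value to str and drop control characters XML cannot store."""
    text = "" if value is None else str(value)
    return _ILLEGAL_XML_CHARS.sub("", text)
```

Calling doc.add_paragraph(sanitize_text(raw_value)) then avoids the error for data that comes from databases, logs, or scraped pages; raw_value stands for whatever input is being inserted.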

Merge Documents

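    from copy import deepcopy

    from docx import Document
    from docx.oxml.ns import qn

    def merge_docs(files):
        merged = Document()
        body = merged.element.body
        # New content must sit before the body's closing section properties
        sect_pr = body.find(qn('w:sectPr'))
        for file in files:
            doc = Document(file)
            for element in doc.element.body:
                # Skip each source document's own section properties
                if element.tag == qn('w:sectPr'):
                    continue
                # Copy rather than move, so the source is not mutated mid-iteration
                sect_pr.addprevious(deepcopy(element))
        # Only body XML is copied; images, styles and numbering are not carried over
        return merged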

Find and Replace

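    def replace_text(doc, old, new):
        for paragraph in doc.paragraphs:
            if old in paragraph.text:
                # Assigning paragraph.text rebuilds the paragraph as a single
                # run, so character formatting inside it is lost
                paragraph.text = paragraph.text.replace(old, new)
        return doc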

Supported Formats

  • DOCX (native)
  • PDF (via conversion)
  • HTML (via conversion)
  • Plain text extraction