html2md is a Python script that converts HTML (complete or fragments) into Markdown.
html2md was inspired by Aaron Swartz's html2text and adds support for missing elements that are common in HTML pages, without compromising the Markdown format.
html2md.py [-h] [-a] [-f] [--fenced_code {github,php}] [-e ENCODING]
           [infile]

Transform HTML file to Markdown

positional arguments:
  infile

optional arguments:
  -h, --help            show this help message and exit
  -a, --attrs           Enable element attributes in the output (custom
                        Markdown extension)
  -f, --footnotes       Enabled footnote processing (custom Markdown
                        extension)
  --fenced_code {github,php}, --fencedcode {github,php}, --fenced {github,php}
                        Enabled fenced code output
  -e ENCODING, --encoding ENCODING
                        Provide an encoding for reading the input
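If you want to drive the command line from another Python program, the flags listed above can be assembled into an argument list. The sketch below is only an illustration: `build_command` and `convert_file` are hypothetical helpers, not part of html2md, and it assumes that `html2md.py` is reachable at the given path and writes the resulting Markdown to standard output as UTF-8.

```python
import subprocess
import sys


def build_command(infile, attrs=False, footnotes=False, fenced_code=None,
                  encoding=None, script="html2md.py"):
    """Assemble an argument list for the html2md.py command line.

    Hypothetical helper: only the flags come from the help text above.
    """
    cmd = [sys.executable, script]
    if attrs:
        cmd.append("-a")
    if footnotes:
        cmd.append("-f")
    if fenced_code is not None:
        # The command line only accepts these two values.
        if fenced_code not in ("github", "php"):
            raise ValueError("fenced_code must be 'github' or 'php'")
        cmd.extend(["--fenced_code", fenced_code])
    if encoding:
        cmd.extend(["-e", encoding])
    cmd.append(infile)
    return cmd


def convert_file(infile, **kwargs):
    """Run the script and return whatever it prints.

    Assumption: the Markdown is written to stdout and is UTF-8 encoded.
    """
    output = subprocess.check_output(build_command(infile, **kwargs))
    return output.decode("utf-8")
```

For example, `convert_file("page.html", footnotes=True, fenced_code="github")` would run the script with `-f --fenced_code github page.html`.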
Using it from your code:
import html2md
print(html2md.html2md("<p>Getting rid of HTML with html2md. Yey!</p>"))
You can pass in different options:
footnotes: True|False (default: False); convert footnotes
fenced_code: default|github|php (default: default); convert code snippets into fenced code
attrs: convert HTML attributes. This is a custom extension and should not be used.
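As a rough sketch of how these options might be used together, the snippet below checks the values before handing them to the library. It makes two assumptions that this README does not confirm: that the options are passed to `html2md.html2md` as keyword arguments, and that `attrs` is a boolean switch. `build_options` and `convert` are hypothetical helpers written for this example.

```python
def build_options(footnotes=False, fenced_code="default", attrs=False):
    """Return a dict holding only the options that differ from the defaults."""
    if fenced_code not in ("default", "github", "php"):
        raise ValueError("fenced_code must be 'default', 'github' or 'php'")
    options = {}
    if footnotes:
        options["footnotes"] = True
    if fenced_code != "default":
        options["fenced_code"] = fenced_code
    if attrs:
        # Custom extension; the option list above advises against using it.
        options["attrs"] = True
    return options


def convert(html, **kwargs):
    """Validate the options, then call the library."""
    options = build_options(**kwargs)
    # Imported here so build_options can be used without the library installed.
    import html2md
    # Assumption: html2md.html2md accepts the options as keyword arguments.
    return html2md.html2md(html, **options)
```

A call such as `convert("<pre><code>x = 1</code></pre>", fenced_code="github")` would then request GitHub-style fenced code, provided the keyword-argument assumption holds.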
Short version: OK for open source projects. OK for commercial projects with my signed agreement only.
Long version: see the License file in the project.