Show HN: Defuddle, an HTML-to-Markdown alternative to Readability
Extract the main content from web pages.
Defuddle cleans up web pages by removing clutter such as comments, sidebars, headers, footers, and other non-essential elements, leaving only the primary content. It also attempts to standardize HTML elements so that later steps, such as conversion to Markdown, receive consistent input. In code blocks, line numbers and syntax highlighting are removed if present, but the language is retained and added as a data attribute and class.
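The code-block cleanup described above can be pictured with a rough sketch. This is not Defuddle's implementation or API: `normalizeCodeBlock`, the `line-number` class, and the `data-lang` / `language-*` output names are all assumptions chosen for illustration, and a real tool would work on a parsed DOM rather than on regular expressions.

```typescript
// Hypothetical helper, NOT part of Defuddle: illustrates stripping line
// numbers and highlighting markup from a code block while keeping the language.
interface NormalizedCode {
  language: string | null;
  html: string;
}

function normalizeCodeBlock(rawHtml: string): NormalizedCode {
  // Look for a language hint such as class="language-js" or class="lang-js".
  const langMatch = rawHtml.match(/\b(?:language|lang)-([A-Za-z0-9+#]+)/);
  const language = langMatch ? langMatch[1].toLowerCase() : null;

  // Drop spans that only carry line numbers (assumed class name: "line-number").
  const withoutNumbers = rawHtml.replace(
    /<span[^>]*class="[^"]*\bline-number\b[^"]*"[^>]*>.*?<\/span>/g,
    ""
  );

  // Strip the remaining tags (highlighting spans, wrappers), keeping the text.
  const text = withoutNumbers.replace(/<[^>]+>/g, "");

  // Re-attach the language as a data attribute and a class (assumed names).
  const attrs = language
    ? ` data-lang="${language}" class="language-${language}"`
    : "";
  return { language, html: `<pre><code${attrs}>${text}</code></pre>` };
}
```

Given a highlighted block with numbered lines, the sketch returns a plain `<pre><code>` element that still records the language, which is the kind of consistent input a Markdown converter can turn into a fenced code block.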