magnet-html-parser

Extracts key document metadata from an HTML string.

const parser = require('magnet-html-parser');
parser.parse(html, url).then(result => ...);

Results

The parser will attempt to return the following metadata:

title - A title of the page
description - A description of the page
image - An image that best represents this page
icon - The site's icon
siteName - The name of the site which this page belongs
themeColor - A theme color of the site

What metadata is returned completely depends on how the page is marked-up. We cannot guarantee any results.

Hash Fragments

If the given URL has a hash fragment extension, magnet-html-parser will attempt to fetch an element with the given ID and return a title, description and image from that scope.

title - The textContent of the first heading (<h1>,<h2>,<h3>, ...) descendant of the fragment element
description - The textContent of the first <p> descendant of the fragment element
image - The src of the first <img> descendant of the fragment element.
scoped - Will be true when content was found within the given fragment scope.

Adaptors

An adaptor is a custom parser that performs some specific metadata extraction for a matched URL pattern. The library comes with some built in adaptors to cope with some common sites, but you are able to add more.

const parser = require('magnet-html-parser');
const adaptor = parser.adaptor;

parser.use(adaptor.extend({
  pattern: /example.com/,

  title() {
    var mainHeading = $('h1').first();
    return mainHeading.text();
  },

  description() {
    var firstParagraph = $('p').first();
    return firstParagraph.text();
  },

  image($, data) {
    return $('.main-image').attr('src');
  }

  ...
}));

As you can 'extending' the base adaptor, you only need to define the properties you wish to customise, all other properties will be parsed using the default logic.

Name		Name	Last commit message	Last commit date
Latest commit History 57 Commits
lib		lib
test		test
.eslintrc		.eslintrc
.gitignore		.gitignore
.travis.yml		.travis.yml
README.md		README.md
config.json		config.json
index.js		index.js
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

magnet-html-parser

Results

Hash Fragments

Adaptors

About

Uh oh!

Releases

Packages

Contributors 3

Uh oh!

Languages

mozilla-magnet/magnet-html-parser

Folders and files

Latest commit

History

Repository files navigation

magnet-html-parser

Results

Hash Fragments

Adaptors

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages