Christian Heilmann

A code snippet to scrape all headings and their target URLs from a Markdown-generated page

Friday, February 19th, 2021 at 7:54 pm

When you use Markdown to write your documentation, most static page generators will generate IDs for each of the headings in the document, so you can link directly to them.

Markdown: 
## Gerbils and other rodents
HTML:
<h2 id="gerbils-and-other-rodents">Gerbils and other rodents</h2>

If you published the page at example.com, you can then link directly to that heading with https://example.com#gerbils-and-other-rodents.
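How the ID is derived from the heading text differs from generator to generator. As a rough sketch only (this is not the algorithm of any particular generator, and `slugify` is a made-up name), the basic idea looks something like this:

```javascript
// Hypothetical slug function; real generators each have their own rules
// for punctuation, non-ASCII characters and duplicate headings.
function slugify(headingText) {
  return headingText
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9\s-]/g, '') // drop punctuation and other symbols
    .replace(/\s+/g, '-');        // whitespace runs become single hyphens
}

console.log(slugify('Gerbils and other rodents'));
// gerbils-and-other-rodents
```

Because the rules vary, it is safer to read the IDs from the generated HTML than to guess them from the Markdown source.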

The other day I was asked to create a list of all the links in the What’s new in DevTools 89 document, which is generated from Markdown. Each entry should be the text of the heading followed by the full URL to get to that part of the document. This was to batch-generate some short URLs from them.

I am pretty sure there are a lot of clever ways to do that by scraping, but as I like my browser environment, I just used the Console. Here’s the script that you can paste into the Console:

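// $$() and copy() are Console utilities, so this only works in the DevTools Console.
// Strip any fragment already in the address so we don't end up with two # parts.
const base = document.location.href.split('#')[0];
let out = '';
// Every heading that has an id: one line for its text, one for its URL.
$$(':is(h1,h2,h3,h4,h5,h6)[id]').forEach(elm => {
  out += `${elm.innerText}\n${base}#${elm.id}\n`;
});
// Put the result on the clipboard.
copy(out);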

You can see it in action in the following GIF:

Running the script in the console to get all the heading information

The next step was to store this as a Snippet in DevTools, so next time I only need to run it.
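The Console script relies on `$$()` and `copy()`, which only exist inside the DevTools Console. If you want the same idea as a plain function, for example to reuse it in a page script, here is a minimal sketch. The name `headingLinks` and its parameters are my own invention for illustration, and it assumes each heading object has an `innerText` and an `id`:

```javascript
// Hypothetical helper: builds "text\nurl\n" pairs from heading-like objects.
// `headings` is any iterable of objects with `innerText` and `id`,
// `pageUrl` is the address of the page they live on.
function headingLinks(headings, pageUrl) {
  let out = '';
  for (const elm of headings) {
    const url = new URL(pageUrl);
    url.hash = elm.id; // replaces any fragment already present in pageUrl
    out += `${elm.innerText.trim()}\n${url.href}\n`;
  }
  return out;
}

// In a page, outside the Console, this could be called as:
// headingLinks(
//   document.querySelectorAll(':is(h1,h2,h3,h4,h5,h6)[id]'),
//   document.location.href
// );
```

Using `URL` rather than string concatenation means a page address that already ends in a fragment still gives you one clean `#id` per heading.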
