HOW TO convert HTML content to plain text - with Excel!

There may be times when you need to extract just the text from a blob of HTML copied from the page source, because the content couldn't be copied directly or the text on the web page was hidden. Recently, I wanted to get the subtitles of a YouTube video, but they weren't easy to copy from the transcript. I also couldn't locate the timedtext file that contains the subtitles, so I had to point at the Transcript block using Developer Tools (F12 keyboard shortcut) and copy its HTML.

Here's the trick I tried:

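Now that I had the text in HTML format, I pasted it into Excel, pressed Ctrl+H to open the Replace dialog box, typed <*> in the Find What textbox, left the Replace With textbox blank, and hit the Replace All button. In Excel's Find What box, * is a wildcard that matches any run of characters, so <*> matches everything from an opening < to a closing >.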

That removed all the tags along with their attributes and left just the text.
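
If you'd rather script the same idea than use Excel, here is a minimal Python sketch of the equivalent find-and-replace. It is not what I used above; the function name strip_tags and the sample HTML are made up for illustration, and the pattern is a simple regular expression rather than a real HTML parser, so treat it as a rough equivalent of the <*> trick and not a robust converter.

```python
import re
from html import unescape

# Rough equivalent of Excel's <*> wildcard: a "<", then any characters
# that are not ">", then the closing ">".
TAG_PATTERN = re.compile(r"<[^>]*>")


def strip_tags(html_text: str) -> str:
    """Remove anything that looks like an HTML tag and tidy the whitespace.

    Assumption: tags are replaced with a space (the Excel steps replace
    them with nothing) so that text from adjacent elements does not run
    together. Entities such as &amp; are decoded afterwards.
    """
    without_tags = TAG_PATTERN.sub(" ", html_text)
    decoded = unescape(without_tags)
    # Collapse runs of whitespace left behind by the removed tags.
    return " ".join(decoded.split())


# Hypothetical snippet, standing in for HTML copied from Developer Tools.
sample = '<div class="segment"><span>0:01</span> Hello &amp; welcome</div>'
print(strip_tags(sample))  # prints: 0:01 Hello & welcome
```

Like the Excel trick, this will trip over a literal > inside an attribute value and will keep the contents of script or style blocks, so it suits small fragments such as a transcript block more than whole pages.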
