HOW TO get movie scripts and lyrics of songs - the hard way, from subtitles
The easy way to get movie scripts and song lyrics is to take them straight from the internet, from websites dedicated to offering them.
If a transcript is not available because the movie is new or does not have a large audience, or if the English (or other desired language) subtitles of a foreign-language movie aren't up to the mark, here is a way to get it yourself.
A good number of streaming sites now provide better subtitles than plain auto-generated captions (especially for foreign-language movies) to differentiate their premium offerings.
Using the Network tab of the browser's Developer Tools, get the subtitle file while the movie is playing.
While there are a variety of subtitle file formats such as srt, stl, scc, ass, ssa, xml, ttml, qt, txt, vtt, dfxp, smi, csv, sub, rt, sbv, the ones streaming sites most commonly use have vtt or ttml in the file extension.
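When the Network tab lists many requests, a small check like the one below can help spot the likely subtitle file. It is only a sketch: the function name `subtitle_format` and the example URLs are made up for illustration, and it simply tests whether the extension of the requested file contains vtt or ttml.

```python
from typing import Optional
from urllib.parse import urlparse


def subtitle_format(url: str) -> Optional[str]:
    """Return 'vtt' or 'ttml' if the file extension in the URL contains it, else None."""
    # Look only at the path, so query strings such as ?token=... are ignored.
    path = urlparse(url).path
    filename = path.rsplit("/", 1)[-1]
    if "." not in filename:
        return None
    extension = filename.rsplit(".", 1)[-1].lower()
    for fmt in ("vtt", "ttml"):
        if fmt in extension:
            return fmt
    return None


if __name__ == "__main__":
    # Hypothetical request URLs copied from the Network tab.
    for url in (
        "https://example.com/subs/movie_en.vtt?token=abc",
        "https://example.com/subs/movie_en.ttml2",
        "https://example.com/video/segment_001.mp4",
    ):
        print(url, "->", subtitle_format(url))
```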
Once you get hold of the subtitle file, you'll notice that it has a lot of tags and timestamps, which tie each line of dialogue to the right moment while the movie is playing. By removing the tags with Excel (!) as explained below, you get a plain-text version that is easier to follow.
To get rid of the tags and the timestamps within them, open Microsoft Excel and copy and paste the subtitles into one cell. Press Ctrl+H to open the Find and Replace dialog box.
On the Replace tab, type <*> in the Find what box, leave the Replace with box blank, and click Replace All. The asterisk is a wildcard that matches any run of characters, so the search expression removes every tag, including anything between its angle brackets, from the original text.
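For anyone who would rather not use Excel, the same idea can be sketched in Python with a regular expression that plays the role of <*>. This is a minimal sketch and not part of the Excel method: the name `strip_tags` and the sample lines are made up, and the sample assumes ttml-style markup where the timestamps sit inside the tags.

```python
import re
from html import unescape

# Rough equivalent of Excel's <*> pattern: a "<", anything that is not ">", then ">".
TAG = re.compile(r"<[^>]*>")


def strip_tags(text: str) -> str:
    """Remove every <...> tag and return the remaining non-empty lines."""
    without_tags = TAG.sub("", text)
    # Decode entities such as &amp; and trim the whitespace left behind by the tags.
    lines = [unescape(line).strip() for line in without_tags.splitlines()]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    # Made-up sample in the style of a ttml subtitle file.
    sample = (
        '<p begin="00:00:01.000" end="00:00:03.500">'
        'Hello, <span tts:fontStyle="italic">world</span>!</p>\n'
        '<p begin="00:00:04.000" end="00:00:06.000">Tom &amp; Jerry</p>'
    )
    print(strip_tags(sample))
```

In vtt files the cue timing lines (the ones containing -->) are not wrapped in angle brackets, so a pattern like this leaves them in place and they would need to be removed separately.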
It is customary for Bollywood movies to have songs. Once the text has been cleaned with the above technique, you can search it for keywords from a song and extract just the song's lyrics, in English or any other language for which subtitles are provided.
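The keyword search can also be sketched in a few lines of Python. This is an illustration under assumptions: the name `find_lyrics`, its `before` and `after` parameters, and the sample transcript are all hypothetical, and how many lines to keep around each hit depends on the song.

```python
from typing import List


def find_lyrics(lines: List[str], keyword: str, before: int = 0, after: int = 3) -> List[str]:
    """Return the lines around every case-insensitive match of keyword."""
    key = keyword.lower()
    keep = set()
    for i, line in enumerate(lines):
        if key in line.lower():
            # Keep a window of lines around the hit, clipped to the transcript bounds.
            start = max(0, i - before)
            stop = min(len(lines), i + after + 1)
            keep.update(range(start, stop))
    # Sorting the kept indexes preserves the original order and drops duplicates.
    return [lines[i] for i in sorted(keep)]


if __name__ == "__main__":
    # Made-up cleaned transcript; the bracketed lines are placeholders, not real lyrics.
    transcript = [
        "Where are you going?",
        "To the fair.",
        "Dama dam mast qalandar",
        "(second line of the song)",
        "(third line of the song)",
        "Let's go home.",
    ]
    for line in find_lyrics(transcript, "qalandar", before=0, after=2):
        print(line)
```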
English lyrics of the Saraiki language Sufi song Dama Dam Mast Qalandar extracted using the above mentioned technique