This morning, on day 8 of the 2023 Israel–Hamas War, my amazing girlfriend is mediating a dialog/debate between a Muslim and a Jew about the Israel–Palestine conflict.

Last night, she was studying in preparation for the discussion, and we ended up talking a bit about the history of the conflict. I mentioned that a few years ago I had listened to the “Fear and Loathing in the New Jerusalem” series by the MartyrMade Podcast. That series is a marvelous 23-hour historical deep dive into the events leading up to the establishment of the state of Israel in 1948. It is an INTENSE and DENSE series, with countless names, dates, quotations, and citations.

That being said, it’s also a series I would highly recommend to almost anyone, especially if you want a better foundation for understanding the ever-developing situation in that region of the world.

In our discussion, I mentioned that one of the things that shocked me when listening to that podcast was that, in the late 1800s, Jews and Muslims weren’t really in conflict with each other in the ways we see today. A popular Western understanding suggests that Jews and Arabs have basically been in conflict since the time of Abraham (as Jews and Arabs traditionally trace their lineage to Abraham’s sons Isaac and Ishmael, respectively). In contrast, the podcast host explained that in the late 1800s many European Jews thought that moving into the mostly Muslim territory of the Levant would be far safer than, and much preferable to, staying in Christian-majority Europe, where pogroms were regularly resulting in countless murders of members of the Jewish community.

My girlfriend agreed that this was a fascinating fact to include, and asked me if I had any sources for it.

I gave a grimaced shrug 😬🤷🏻‍♂️ and I said, “ummm… there’s some great quotes about it somewhere in this 23 hours of podcast 😅😅😅 ”

She replied, “Well, if you can find a specific, quotable source, I would love to make use of it in our discussion tomorrow!”

Welp, gotta do what we gotta do for love. 🥰 Let’s see if we can’t find one. 😁

1. Setup 🏗️

We’re gunna get pretty nerdy here. 😎 Lots of Linux commands. If you want to do this too, it’ll require at least some level of comfort with the command line. I did this on Arch Linux, but it should work fine on any flavor of Linux, or on macOS. There’s no reason it shouldn’t work on Windows, too, but you’d need to rewrite the ZSH code I used in PowerShell.

The tool I used to do this was OpenAI’s speech-to-text engine, Whisper. Specifically, I used the C++ rewrite called whisper.cpp, which is much faster than the original Python implementation.

This tool is awesome. It can transcribe audio from dozens of different languages, automatically timestamp and/or create subtitle files, and it can even auto-translate the transcriptions it makes into English. All free. All right on your computer (doesn’t need to go through an online server; it can be run fully offline).

Thankfully, I already had the tool set up and installed for my work, as projects involving multilingual multimedia come up quite regularly for me.

I won’t go through the installation instructions, as the GitHub repo explains it well enough on its own. I will just say that, for my workflow, ffmpeg also needs to be installed.

I wrote a few ZSH functions (my command-line “aliases”) so I can use Whisper efficiently when I need it. One of them I created specifically for transcribing English very quickly, which is what I used for this project.

Here’s the ZSH function in question.

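whisp-tiny(){
    for file in "$@"; do  # loop through all the files provided

        # Convert to a 16 kHz, mono, 16-bit PCM .wav file (the input format whisper.cpp expects)
        ffmpeg -i "$file" -ar 16k -ac 1 -acodec pcm_s16le -f wav "${file%.*}.wav"

        # Run whisper.cpp on the new .wav file, writing .txt and .vtt outputs
        # (-t: threads, -otxt/-ovtt: output formats, -pc: colored console output;
        # these are on/off switches, so they take no "true" argument)
        # tiny.en is an English-only model, so the language is set to English
        "$HOME/.local/src/whisper.cpp/main" -t 18 \
            -otxt -ovtt \
            -pc -l en \
            -m "$HOME/.local/src/whisper.cpp/models/ggml-tiny.en.bin" \
            -f "${file%.*}.wav"
    done
}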

A few further nerdy details…

This code assumes you’ve stored the whisper.cpp repository in the $HOME/.local/src/ directory. If you’ve stored it somewhere else, update the paths accordingly. Also note that this requires having downloaded the tiny.en Whisper model with the make tiny.en command, as explained in the docs. I have other commands for other models if I need higher accuracy in languages other than English (or if I want to auto-translate into English from a language I don’t yet know well), but for speed-running English, this is the setup I use.
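Before kicking off a long batch run, it can be worth confirming those assumptions actually hold. Here’s a minimal sketch of a hypothetical helper (not part of whisper.cpp, and not something the function above needs) that checks for ffmpeg on the PATH and for the whisper.cpp binary and tiny.en model at the paths whisp-tiny uses. The WHISPER_DIR override variable is my own assumption, added so the location is easy to change; newer whisper.cpp builds may put the binary somewhere other than main, so adjust as needed.

```shell
# Hypothetical helper: check the dependencies whisp-tiny relies on.
# WHISPER_DIR is an assumed override; it defaults to the path used in whisp-tiny.
whisp_check() {
    local dir status
    dir=${WHISPER_DIR:-$HOME/.local/src/whisper.cpp}
    status=0

    # ffmpeg must be on the PATH for the .wav conversion step
    if ! command -v ffmpeg >/dev/null 2>&1; then
        echo "missing: ffmpeg" >&2
        status=1
    fi

    # The whisper.cpp binary must exist and be executable
    if [ ! -x "$dir/main" ]; then
        echo "missing: $dir/main" >&2
        status=1
    fi

    # The English-only tiny model must have been downloaded
    if [ ! -f "$dir/models/ggml-tiny.en.bin" ]; then
        echo "missing: $dir/models/ggml-tiny.en.bin" >&2
        status=1
    fi

    return "$status"
}
```

Usage would be something like whisp_check && whisp-tiny *.mp3, so the batch only starts if everything is in place.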

You may also need to change the -t flag if you have fewer (or more) than 18 threads on your computer.
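If you’d rather not hard-code the thread count, a sketch like the one below could pick it automatically. The function name is hypothetical, and the fallback of 4 is an arbitrary assumption for when no detection method works. It tries nproc (GNU coreutils, typical on Linux), then getconf, then sysctl (macOS).

```shell
# Hypothetical helper: print the number of logical CPUs for whisper.cpp's -t flag.
# Tries nproc (GNU coreutils), then getconf, then sysctl (macOS/BSD);
# falls back to 4, an arbitrary assumption, if none of them work.
whisp_threads() {
    local n
    n=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null)
    case $n in
        ''|*[!0-9]*) n=4 ;;   # empty or non-numeric output: use the fallback
    esac
    echo "$n"
}
```

In whisp-tiny you could then write -t "$(whisp_threads)" instead of -t 18. More threads isn’t always faster, so treat the detected value as a starting point rather than the optimum.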

When run, this command will generate 3 additional files from each source file. It takes whatever media you give it (any audio or video file that ffmpeg supports) and creates a .wav file of the audio in the format whisper.cpp requires for input (16 kHz, mono, 16-bit PCM). It then runs Whisper on that .wav file and generates two text files: a .txt file and a .vtt file. The .txt file is just the text of the transcription. The .vtt file is a subtitle file, which includes timestamps for each segment of the transcript, showing where different phrases and sentences are said. 😲

That will be useful for our project!!

2. The Search 🔍

As I mentioned, I already had these tools set up on my computer when I started this search. So, I downloaded the podcast episodes to a directory on my computer and ran whisp-tiny *.mp3 in that directory to run Whisper on all the mp3 files. Then, I went and did something else for a few minutes, as it takes some time to chug through all that audio. ☕😊

I came back a few minutes later and decided to start searching the resulting text, even though not all the files had been processed yet. On my computer, the command above transcribes roughly 30 seconds of audio per second of processing (about 1 hour of audio for every 2 minutes of processing). So, it took about 5 minutes to process the first episode, which is about two and a half hours long. Processing all 23 hours of the podcast would have taken about 45 minutes, but thankfully I didn’t need to let it run that long.
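The arithmetic is easy to check: 23 hours is 82,800 seconds, and at 30× real time that’s 2,760 seconds, or 46 minutes; a 2.5-hour episode is 9,000 seconds, or 300 seconds (5 minutes) of processing. Here’s a minimal sketch of two hypothetical helpers that do this estimate for a batch of files. It assumes ffprobe (which ships with ffmpeg) is available, and the default speed factor of 30 is just the rate described above; yours will differ.

```shell
# Hypothetical helpers for estimating how long a batch run will take.

# Sum the durations (in whole seconds) of the given media files using ffprobe.
total_audio_seconds() {
    local f
    for f in "$@"; do
        ffprobe -v error -show_entries format=duration \
            -of default=noprint_wrappers=1:nokey=1 "$f"
    done | awk '{ s += $1 } END { printf "%d\n", s }'
}

# Estimate processing minutes (rounded up) for AUDIO_SECONDS at SPEED times real time.
# SPEED defaults to 30, the roughly 30x rate described in this post.
eta_minutes() {
    local audio_seconds speed
    audio_seconds=$1
    speed=${2:-30}
    echo $(( (audio_seconds + speed * 60 - 1) / (speed * 60) ))
}
```

For example, eta_minutes "$(total_audio_seconds *.mp3)" would print the estimated minutes for every mp3 in the current directory; eta_minutes 82800 prints 46.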

Once some files were ready, I knew I wanted to search the .vtt files my script created, since they have the timestamp data along with the words. I could use that to find where in the podcast the topic was discussed.
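When searching while the batch is still running, it helps to know which episodes are done. A minimal sketch of a hypothetical helper is below. It assumes the outputs are named like Fear_and_Loathing_ep_1.wav.vtt (as they are here) and that whisper.cpp writes the .vtt only once a file has finished transcribing, which I believe is the case but haven’t verified for every version.

```shell
# Hypothetical helper: list the source files whose transcript is not finished yet.
# Assumes outputs are named "<name>.wav.vtt" and that whisper.cpp writes the .vtt
# only after it finishes transcribing that file.
whisp_pending() {
    local f
    for f in "$@"; do
        [ -f "${f%.*}.wav.vtt" ] || printf '%s\n' "$f"
    done
}
```

Running whisp_pending *.mp3 prints only the episodes still waiting, so everything else is safe to grep.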

I remembered that one of the quotes included the name “Ishmael”. So, I used the text search tool grep to recursively search all the files in the directory by running grep -R Ishmael. That printed the following:

$ grep -R Ishmael
Fear_and_Loathing_ep_1.wav.vtt: brother Ishmael in his time
Fear_and_Loathing_ep_1.wav.vtt: descendants of Ishmael, notice that the Arabs and the
Fear_and_Loathing_ep_1.wav.vtt: like savages, our brother Ishmael
...

Bingo! The discussion was in episode 1!

I opened Fear_and_Loathing_ep_1.wav.vtt in my text editor, and searched for “Ishmael” again, and found that that line is said at 00:29:33.040. I opened the mp3 to that timestamp, skipped back a bit, and started listening. 🎧
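That “skip back a bit” step can also be done from the command line. Below is a sketch of a hypothetical helper that turns a .vtt timestamp into whole seconds, minus a few seconds of lead-in. Using mpv as the player is my own assumption; any player that can start at a given offset works the same way.

```shell
# Hypothetical helper: convert a WebVTT timestamp (HH:MM:SS.mmm or MM:SS.mmm)
# to whole seconds, minus an optional number of seconds to "skip back".
vtt_start_seconds() {
    awk -v t="$1" -v back="${2:-0}" 'BEGIN {
        n = split(t, p, ":")
        secs = p[n] + 60 * p[n - 1]            # seconds and minutes
        if (n == 3) secs += 3600 * p[1]        # hours, when present
        secs -= back
        if (secs < 0) secs = 0                 # never seek before the start
        printf "%d\n", secs
    }'
}
```

For this search, vtt_start_seconds 00:29:33.040 15 prints 1758, so something like mpv --start="$(vtt_start_seconds 00:29:33.040 15)" Fear_and_Loathing_ep_1.mp3 would start playback 15 seconds before the line.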

Aside: A more complicated method with slightly fewer steps…

A slightly quicker way to see the timestamps for every place “Ishmael” is mentioned throughout the podcast would have been to run:

grep -B 1 -H "Ishmael" *.vtt

This would have printed the filename (-H), the matching text, and its timestamp, all in one command. (The -B 1 flag makes grep also show the line before each match, and in these .vtt files the timestamp is always on the line just above its text.) I can never think of those flags quickly enough in the moment, though, so doing a couple of searches as described above was a bit faster for me. But if you need every instance of a word or phrase in a big search like this, this command could be helpful.

The output from that command:

$ grep -B 1 -H "Ishmael" *.vtt

Fear_and_Loathing_ep_1.wav.vtt-00:29:33.040 --> 00:29:34.420
Fear_and_Loathing_ep_1.wav.vtt: brother Ishmael in his time
--
Fear_and_Loathing_ep_1.wav.vtt-00:29:47.300 --> 00:29:50.200
Fear_and_Loathing_ep_1.wav.vtt: descendants of Ishmael, notice that the Arabs and the
--
Fear_and_Loathing_ep_1.wav.vtt-00:30:00.910 --> 00:30:03.580
Fear_and_Loathing_ep_1.wav.vtt: like savages, our brother Ishmael
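One caveat with -B 1: it relies on each cue’s text fitting on a single line. That holds for these Whisper-generated files, but WebVTT allows multi-line cues, and there the line above a match may be text rather than a timestamp. Here’s a sketch of a hypothetical awk-based alternative that remembers the most recent timing line instead, printing one match per line. The function name and output layout are my own choices.

```shell
# Hypothetical helper: print "FILE [START] TEXT" for every cue line matching PATTERN.
# PATTERN is an awk (extended) regular expression; matching is case-sensitive.
vtt_grep() {
    local pattern
    pattern=$1
    shift
    awk -v pat="$pattern" '
        FNR == 1 { ts = "" }                 # new file: forget the previous cue
        / --> /  { ts = $1; next }           # cue timing line: remember the start time
        /^$/     { ts = ""; next }           # blank line: the current cue has ended
        ts != "" && $0 ~ pat {               # a text line inside a cue that matches
            line = $0
            sub(/^[ \t]+/, "", line)         # strip leading whitespace from the text
            print FILENAME " [" ts "] " line
        }
    ' "$@"
}
```

With this, vtt_grep Ishmael *.vtt would print lines like Fear_and_Loathing_ep_1.wav.vtt [00:29:33.040] brother Ishmael in his time.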

I then listened to the quotes and Googled a few of them, which quickly turned up two great sources on the topic, including this fascinating quote from Larry Collins and Dominique Lapierre’s book O Jerusalem!:

“With few exceptions, the Jewish people had dwelt in relative security among the Arabs over the centuries. The golden age of the diaspora had come in the Spain of the caliphs, and the Ottoman Turks had welcomed the Jews when the doors of much of Europe were closed to them. The ghastly chain of crimes perpetrated on the Jewish people culminating in the crematoriums of Germany had been inflicted on them by the Christian nations of Europe, not those of the Islamic East.”

3. Mission Accomplished! 🫶🏻

After I found those quotes and their sources, I sent them back to my girlfriend.

She responded, “Wow, these are great! I’m definitely going to translate one of these and use it tomorrow. How did you find them so quick?? You said there’s hours of content. It’s been like only 10 minutes!”

There are lots of big societal questions that accompany the AI revolution we’re in the middle of…

There are countless horrors that we hear about day after day as we watch nation go to war against nation…

But, for today at least, being able to impress my girlfriend by using one of those AI developments,

as she and others like her take steps to work towards peace in the face of those conflicts,

feels like a good thing.

Te amo! ❤️