
DeepFilterNet

The process below runs DeepFilterNet on a Telugu audio file, reports whether the audio is clear or noisy based on a noisiness score, and counts the filler words it contains.

Step 1: Log in to Hugging Face locally


!huggingface-cli login
Enter your Hugging Face access token when prompted to complete the login.
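
If you prefer to authenticate from Python rather than the CLI, the huggingface_hub package provides a login() helper; the token string below is a placeholder, not a real value.


from huggingface_hub import login

# Log in programmatically; replace the placeholder with your own access token
login(token="hf_xxxxxxxxxxxxxxxxxxxx")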

Step 2: Download the model


!git clone https://huggingface.co/spaces/hshr/DeepFilterNet2
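
Before going further, you can confirm the clone succeeded by checking that the checkpoint directory used in Step 5 is in place (the model_96.ckpt.best file referenced here also appears in the run log below).


import os

# List the cloned checkpoint directory; model_96.ckpt.best should appear here
print(os.listdir("/content/DeepFilterNet2/DeepFilterNet2/checkpoints"))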

Step 3: Install the requirements


!pip install -r /content/DeepFilterNet2/requirements.txt

Step 4: Install the required packages


!pip install torch torchaudio transformers
!pip install deepfilternet
!pip install googletrans==3.1.0a0
!pip install speechrecognition pydub
!apt-get install ffmpeg
!pip install transformers torchaudio librosa pydub
!pip install --upgrade googletrans==4.0.0-rc1 httpcore
!pip install torch torchaudio matplotlib SpeechRecognition googletrans==4.0.0-rc1 pydub
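
As a quick sanity check that the installs succeeded, you can exercise the imports used by the main script directly (a minimal sketch; it only prints versions):


import torch, torchaudio, speech_recognition, googletrans, pydub
from df.enhance import init_df, enhance

# If these imports succeed, the main script's dependencies are available
print(torch.__version__, torchaudio.__version__)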

Step 5: Run the main script


import torch
import torchaudio
import matplotlib.pyplot as plt
import speech_recognition as sr
from googletrans import Translator
from collections import Counter
from df.enhance import init_df, enhance
import time
from pydub import AudioSegment
from pydub.utils import make_chunks
import os

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the DeepFilterNet2 model and its state from the cloned repository
model, df, _ = init_df("/content/DeepFilterNet2/DeepFilterNet2",
                       config_allow_defaults=True)
model = model.to(device=device).eval()

def load_audio(file_path, sr=48000):
    # torchaudio returns the file's native sample rate; resample to the
    # model's 48 kHz rate if the file differs
    waveform, file_sr = torchaudio.load(file_path)
    if file_sr != sr:
        waveform = torchaudio.functional.resample(waveform, file_sr, sr)
    return waveform, sr

def measure_noisiness(audio_path, model, sr=48000, noise_threshold=0.5):
    waveform, sr = load_audio(audio_path, sr)

    # Downmix multi-channel audio to mono before enhancement
    if waveform.dim() > 1 and waveform.shape[0] > 1:
        waveform = waveform.mean(dim=0, keepdim=True)
    enhanced = enhance(model, df, waveform)

    # Spectrogram of the enhanced audio, with energy normalized to [0, 1]
    spectrogram = torchaudio.transforms.Spectrogram()(enhanced)
    spectral_energy = torch.mean(spectrogram.pow(2), dim=0)
    normalized_energy = spectral_energy / torch.max(spectral_energy)

    # Average the energy of the frames that exceed the noise threshold
    noisy_frames = normalized_energy[normalized_energy > noise_threshold]
    if noisy_frames.numel() == 0:
        return 0.0  # no frame exceeded the threshold
    average_noisy_energy = torch.mean(noisy_frames).item()

    return average_noisy_energy

def translate_text(text, src_lang, dest_lang, max_retries=3):
    translator = Translator()
    retries = 0
    while retries < max_retries:
        try:
            translated = translator.translate(text, src=src_lang, dest=dest_lang)
            return translated.text
        except Exception as e:
            print(f"Error translating '{text}': {e}")
            retries += 1
            time.sleep(1)  # wait before retrying
    return None

def segment_audio(audio_path, chunk_length_ms=60000):
    audio = AudioSegment.from_file(audio_path)
    chunks = make_chunks(audio, chunk_length_ms)
    return chunks

def recognize_audio_chunk(chunk, recognizer, language="te-IN", max_retries=3):
    retries = 0
    while retries < max_retries:
        try:
            with sr.AudioFile(chunk) as source:
                audio_data = recognizer.record(source)
                text = recognizer.recognize_google(audio_data, language=language)
                return text
        except sr.RequestError as e:
            print(f"Recognition error: {e}")
            retries += 1
            time.sleep(1)  # wait before retrying
        except sr.UnknownValueError:
            print("Speech Recognition could not understand audio")
            return None
    return None
def detect_fillers(audio_path, chunk_length_ms=60000):
    # Romanized Telugu filler words; each is translated into Telugu script
    # below so it can be matched against the recognized transcript
    english_fillers = ["alaaga", "inka", "ante", "alage", "antenti", "adi",
                       "abbo", "abba", "lla", "ela", "avunu", "leda", "mari",
                       "alāṇṭappuḍu", "poyindante", "ainappaṭikī", "edo",
                       "tappadu", "appuḍappuḍu", "sare"]

    telugu_fillers = []
    for filler in english_fillers:
        translated_filler = translate_text(filler, 'en', 'te')
        if translated_filler:
            telugu_fillers.append(translated_filler)
        else:
            telugu_fillers.append(filler)  # fall back to the original text

    recognizer = sr.Recognizer()
    audio_chunks = segment_audio(audio_path, chunk_length_ms)

    # Transcribe each chunk and collect the recognized text
    text_segments = []
    for i, chunk in enumerate(audio_chunks):
        chunk_name = f"/tmp/chunk{i}.wav"
        chunk.export(chunk_name, format="wav")
        text = recognize_audio_chunk(chunk_name, recognizer)
        if text:
            text_segments.append(text)
        os.remove(chunk_name)  # clean up the temporary chunk file

    full_text = " ".join(text_segments)

    print("Recognized Text (Telugu):", full_text)

    # Count occurrences of each translated filler in the transcript
    filler_count = Counter()
    for telugu_filler, english_filler in zip(telugu_fillers, english_fillers):
        count = full_text.count(telugu_filler)
        if count > 0:
            filler_count[english_filler] = count

    for english_filler, count in filler_count.items():
        print(f"Filler '{english_filler}' detected {count} times.")

    total_fillers = sum(filler_count.values())
    print(f"Total fillers detected: {total_fillers}")
audio_file = "/content/drive/MyDrive/input.wav"

# Measure noisy energy first
average_noisy_energy = measure_noisiness(audio_file, model)
print(f"Average Noisy Energy: {average_noisy_energy}")

if average_noisy_energy <= 0.75:
    print("Audio is clear.")
else:
    print("Audio is noisy.")

# Then detect fillers
detect_fillers(audio_file)
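
The script discards the enhanced waveform after scoring it. If you also want to keep the denoised audio, a minimal sketch along the same lines (the output path is an assumption):


waveform, _ = load_audio("/content/drive/MyDrive/input.wav")
if waveform.dim() > 1 and waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)
enhanced = enhance(model, df, waveform)

# Write the denoised audio at the model's 48 kHz rate (hypothetical output path)
torchaudio.save("/content/enhanced.wav", enhanced.cpu(), 48000)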

Output:

2024-07-15 07:00:58 | INFO | DF | Loading model settings of DeepFilterNet2
2024-07-15 07:00:58 | INFO | DF | Initializing model `deepfilternet2`
2024-07-15 07:00:59 | INFO | DF | Found checkpoint /content/DeepFilterNet2/DeepFilterNet2/checkpoints/model_96.ckpt.best with epoch 96
2024-07-15 07:00:59 | INFO | DF | Running on device cpu
2024-07-15 07:00:59 | INFO | DF | Model loaded
Average Noisy Energy: 0.6249315142631531
Audio is clear.
Error translating 'alage': the JSON object must be str, bytes or bytearray, not NoneType
Error translating 'antenti': the JSON object must be str, bytes or bytearray, not NoneType
Error translating 'abba': the JSON object must be str, bytes or bytearray, not NoneType
Error translating 'lla': the JSON object must be str, bytes or bytearray, not NoneType
Error translating 'sare': the JSON object must be str, bytes or bytearray, not NoneType
Recognized Text (Telugu): పని ప్రతి సినిమాకు ఒక ఎక్కడ ఉంది కదా సార్ ఊహలు గుసగుసలాడే కి ఓ జి జి ఎల్ అని పెట్టారు అంత మంచిది అనిపిస్తుంది అప్పుడప్పుడు మీ సినిమాలు ఎందుకు పెట్టారు ఈ ఫిలిం గా మన ఫిబ్రవరిలో అబ్బాయి అమ్మాయి కలిసి ఉన్నారు అందుకని ఎక్కువగా వాడుతూ ఉంటారు అమ్మాయి చేతిలో పెట్టి అవును అవును చేశారు నాకు అవన్నీ సెకండ్ ఇయర్ అండి నాకు అల్టిమేట్ మేటర్స్ ఇస్ అబ్బాయి అమ్మాయి మీద కూడా సెకండరీ ఇద్దరు మనుషులు అంతే అండ్ వాళ్ళిద్దరూ ఒక లిస్టు పడినప్పుడు వాట్ ఆర్ ద ప్రాబ్లమ్స్ ప్రాబ్లమ్స్ కూడా కాదు ఆ ఇష్టం ఎందువల్ల దానికి ఏమి అడ్డు పడింది దాని వల్ల ఎలా చేయించారు అనేది నాకు మెయిన్ బీర్ బాటిల్ ...................
Filler 'inka' detected 3 times.
Filler 'abba' detected 3 times.
Filler 'ela' detected 10 times.
Filler 'mari' detected 2 times.
Total fillers detected: 18
