I’m trying to define a function which:
- Reads in each 1min audio file from a directory
- Calculates features for each second of each 1min file, returning the numpy array feats of shape (60, 96, 64) for each file, where 60 denotes each second
- Takes the mean across all seconds in a 1min file to return the array features_from_one_file of shape (96, 64)
- Appends each of these mean arrays to the 3D array features_allfiles, so that each 1min file is represented as a dimension (correct term?) in features_allfiles. E.g. if five 1min files were used, this would have shape (5, 96, 64)
- I then aim to adapt this so that any files n minutes in length will have their feats arrays split by n, so that the average feats are returned on a per minute basis.
I’ve got stuck at step four and could use help with it. Any suggestions for step 5 are also welcome!
Here’s my code so far:
import os
import numpy as np
import vggish_input

def get_features(directory):
    audio_fs = os.listdir(directory)  # list of all files in directory
    features_allfiles = np.empty([0, 96, 64])
    for f in audio_fs:
        # find file:
        path = os.path.join(directory, f)
        # calculate features from audio file:
        feats = vggish_input.wavfile_to_examples(path)
        print(np.shape(feats))  # this returns (62, 96, 64) for a 1min file
        # get the mean of these 62 2D arrays
        features_from_one_file = np.mean(feats, axis=0)
        print(np.shape(features_from_one_file))  # this returns (96, 64)
        # append the mean of each file to features_allfiles, so that it has shape (n, 96, 64), where n = number of files
        ???
    return features_allfiles
>Solution :
You can use np.vstack but first you have to add a new dimension to features_from_one_file:
import numpy as np

features_allfiles = np.empty([0, 96, 64])
for i in range(5):
    # new features
    features_from_one_file = np.random.randn(96, 64)
    # vertical stack; [None, :] adds a new leading axis, giving shape (1, 96, 64)
    # you can also use features_from_one_file.reshape(1, 96, 64)
    features_allfiles = np.vstack([features_allfiles,
                                   features_from_one_file[None, :]])
print(features_allfiles.shape)
which outputs
(5, 96, 64)
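For step 5, one possible approach is to reshape each feats array into per-minute chunks before averaging, then join the per-file results once at the end instead of calling np.vstack inside the loop. The sketch below is only an illustration: per_minute_means and frames_per_minute are hypothetical names, the value of 60 frames per minute is an assumption (the question reports 62 frames for a 1min file, so you may need a different value), and random arrays stand in for the output of wavfile_to_examples.

```python
import numpy as np

def per_minute_means(feats, frames_per_minute=60):
    """Average a (num_frames, 96, 64) array in per-minute chunks.

    Hypothetical helper: frames_per_minute=60 is an assumption, and any
    trailing frames that do not fill a whole minute are dropped.
    """
    n_minutes = feats.shape[0] // frames_per_minute
    if n_minutes == 0:
        # Shorter than one minute: average whatever frames exist.
        return feats.mean(axis=0, keepdims=True)
    trimmed = feats[: n_minutes * frames_per_minute]
    # Shape becomes (n_minutes, frames_per_minute, 96, 64).
    chunks = trimmed.reshape(n_minutes, frames_per_minute, *feats.shape[1:])
    # Averaging over axis 1 leaves one (96, 64) array per minute.
    return chunks.mean(axis=1)

# Stand-in data: a roughly 1min file and a roughly 3min file.
rng = np.random.default_rng(0)
fake_files = [rng.standard_normal((62, 96, 64)),
              rng.standard_normal((185, 96, 64))]

# Collect per-file results in a list and concatenate once at the end.
per_file = [per_minute_means(f) for f in fake_files]
features_allfiles = np.concatenate(per_file, axis=0)
print(features_allfiles.shape)  # (4, 96, 64)
```

Collecting the arrays in a list and concatenating once avoids copying the growing array on every iteration, which is what repeated np.vstack calls do. Whether to drop or keep the leftover frames at the end of a file is a design choice that depends on your data.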