Audio Features Extraction With JavaScript and Essentia | by Joseph Nma | Nov, 2022

Here’s how you can use Essentia.js to find out how energetic your favorite songs are!

Photo by Stephanie Andrade on Unsplash

Have you ever wondered how Spotify recommends music to its users?

In October, I wrote a winning guide for the analytics Blogathon 25. In that article, I shared how to build a replica Spotify backend, including a recommendation system.

But today, we will focus on analyzing different aspects of music with Node.js. And, as in my Blogathon article, we'll be using a tool called Essentia.

Essentia is a library of tools for extracting information from audio. It includes a collection of analysis algorithms that can perform tasks such as tempo and key extraction, loudness detection, and more. This free and open-source library also includes a set of TensorFlow deep learning models for more advanced analysis such as genre and mood recognition.

Essentia comes from the Music Technology Group, the same team behind freesound.org.

Their core library is in C++, but they have bindings for both Python and JavaScript.

To see what features we can use for song analysis, we can look at some of the audio features available through Spotify's API.

Here are the ones I've chosen: danceability, duration, energy, key, mode, loudness, and tempo.

Now, on to the analysis!

In a terminal within your Node project, run the following command:

npm i essentia.js

All audio passed to Essentia's algorithms must first be decoded into an array and then converted to a vector. To handle the decoding, install audio-decode:

npm i audio-decode

This library can decode both MP3 and WAV files and return data for each audio channel. The decoded audio then requires conversion to a vector (a C++-style vector).

Now we can use the following to import the libraries and instantiate Essentia:

const { Essentia, EssentiaWASM } = require("essentia.js");
const fs = require("fs");
const decode = require("audio-decode");
const essentia = new Essentia(EssentiaWASM);

And here is the function that will decode the audio:

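const decodeAudio = async (filepath: string) => {
  // Read the file into memory and decode it into raw channel data
  const buffer = fs.readFileSync(filepath);
  const audio = await decode(buffer);
  // Convert the first channel into the vector type Essentia expects
  const audioVector = essentia.arrayToVector(audio._channelData[0]);
  return audioVector;
};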
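The function above keeps only the first channel. As a minimal sketch, assuming the decoder returns one Float32Array per channel (as the `_channelData` array suggests), a stereo file could instead be averaged down to mono before it is handed to `essentia.arrayToVector`. The helper name `downmixToMono` is hypothetical and is not part of Essentia.js or audio-decode.

```javascript
// Hypothetical helper: average all channels into one mono Float32Array.
// Assumes every channel has the same number of samples.
function downmixToMono(channels) {
  if (channels.length === 0) {
    throw new Error("expected at least one channel");
  }
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

// Example with two tiny fake channels.
const left = Float32Array.from([1, 0.5, 0]);
const right = Float32Array.from([0, 0.5, 1]);
console.log(downmixToMono([left, right])); // every sample is the mean of both channels
```

Averaging is only one possible choice; taking the first channel, as the article does, is simpler and often good enough for these features.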

To test Essentia's algorithms, we will use an audio file from Pixabay. Put it in your project folder as "audio.mp3".

(async () => {
  const path = "./audio.mp3";
  const data = await decodeAudio(path);

  // ...
})();

Danceability refers to how suitable the song would be for dancing. It is a mix of other factors such as beat strength and rhythm.

const computed = essentia.Danceability(data);
// { danceability: N }

const danceability = computed.danceability;
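Spotify reports danceability on a 0 to 1 scale, whereas Essentia's documentation describes normal danceability values as ranging from 0 to roughly 3. If the two need to be comparable, one rough option is to rescale and clamp. This is only a sketch: the helper `normalizeDanceability` and the divisor of 3 are assumptions, not something Essentia or Spotify prescribes.

```javascript
// Hypothetical rescaling: maps a raw score in roughly [0, 3] onto [0, 1].
// The default divisor 3 is an assumption based on the documented "0 to ~3" range.
function normalizeDanceability(raw, maxExpected = 3) {
  const scaled = raw / maxExpected;
  return Math.min(1, Math.max(0, scaled)); // clamp out-of-range values
}

console.log(normalizeDanceability(1.5)); // 0.5
```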

Duration is the length of a piece of music.

const computed = essentia.Duration(data);
// { duration: N }

const duration = computed.duration;
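Duration is simply the number of samples divided by the sample rate. The sketch below shows that arithmetic in plain JavaScript; `durationInSeconds` is a hypothetical helper, not an Essentia function. It also shows why the sample rate matters: Essentia's documentation lists a `sampleRate` parameter for `Duration` with a default of 44100 Hz, so audio decoded at a different rate would presumably need that value passed explicitly (worth checking against the version you install).

```javascript
// Duration in seconds = number of samples / samples per second.
function durationInSeconds(sampleCount, sampleRate) {
  if (sampleRate <= 0) {
    throw new Error("sampleRate must be positive");
  }
  return sampleCount / sampleRate;
}

// The same number of samples gives a different duration at a different rate.
console.log(durationInSeconds(441000, 44100)); // 10
console.log(durationInSeconds(441000, 48000)); // 9.1875
```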

Mathematically, the energy of a signal is the area under the curve of its squared amplitude. In musical terms, energy measures intensity and activity.

const computed = essentia.Energy(data);
// { energy: N }

const energy = computed.energy;
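For sampled audio, that area becomes a sum of squared samples. The following sketch computes it by hand so the idea is visible; `signalEnergy` is a hypothetical helper written for illustration and may differ in detail from what `essentia.Energy` does internally.

```javascript
// Energy of a discrete signal: the sum of its squared samples.
function signalEnergy(samples) {
  let total = 0;
  for (const sample of samples) {
    total += sample * sample;
  }
  return total;
}

// Two made-up signals with the same shape but different amplitude.
const quiet = [0.1, -0.1, 0.1, -0.1];
const loud = [0.5, -0.5, 0.5, -0.5];
console.log(signalEnergy(quiet) < signalEnergy(loud)); // true
```

Because the sum grows with the number of samples, a longer track has more total energy than a shorter one at the same level, which is worth keeping in mind when comparing songs.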

A musical key is a set of notes that form the basis of a song.

Mode refers to the type of scale – major or minor.

const computed = essentia.KeyExtractor(data);
// e.g. { key: "C", scale: "minor", strength: N }

const KEYS = ["C", "D", "E", "F", "G", "A", "B"];

const key = KEYS.indexOf(computed.key);
const mode = computed.scale === "major" ? 1 : 0;
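Spotify's API reports key as a pitch-class integer (0 = C, 1 = C♯/D♭, and so on up to 11 = B). With a seven-letter list, `indexOf` returns -1 for any key name that carries a sharp or flat, and the indexes do not line up with pitch classes. A minimal sketch of a fuller mapping follows, assuming the extractor may return names such as "F#" or "Bb". `PITCH_CLASSES` and `keyToPitchClass` are hypothetical names.

```javascript
// Pitch classes 0-11 with C = 0. Sharp and flat spellings share a number.
const PITCH_CLASSES = new Map([
  ["C", 0], ["C#", 1], ["Db", 1], ["D", 2], ["D#", 3], ["Eb", 3],
  ["E", 4], ["F", 5], ["F#", 6], ["Gb", 6], ["G", 7], ["G#", 8],
  ["Ab", 8], ["A", 9], ["A#", 10], ["Bb", 10], ["B", 11],
]);

// Hypothetical helper: returns -1 when the key name is not recognized.
function keyToPitchClass(keyName) {
  return PITCH_CLASSES.has(keyName) ? PITCH_CLASSES.get(keyName) : -1;
}

console.log(keyToPitchClass("A"));  // 9
console.log(keyToPitchClass("Bb")); // 10
```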

Loudness refers to how loud a song is in decibels (dB).

const computed = essentia.DynamicComplexity(data);
// { dynamicComplexity: N, loudness: N }

const loudness = computed.loudness;
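To make the decibel scale more concrete, here is a small sketch that converts a linear amplitude into dB relative to full scale (an amplitude of 1.0). It is not the algorithm `DynamicComplexity` uses; `rms` and `amplitudeToDb` are hypothetical helpers that only illustrate why loudness values for digital audio are typically negative.

```javascript
// Root-mean-square amplitude of a block of samples.
function rms(samples) {
  let sumSquares = 0;
  for (const s of samples) {
    sumSquares += s * s;
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Convert a linear amplitude to decibels relative to full scale (1.0).
function amplitudeToDb(amplitude) {
  return 20 * Math.log10(amplitude);
}

console.log(amplitudeToDb(1));   // 0 dB: full scale
console.log(amplitudeToDb(0.1)); // about -20 dB: one tenth of full scale
console.log(amplitudeToDb(rms([0.5, -0.5, 0.5, -0.5]))); // about -6 dB
```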

Tempo is the speed of a piece of music in beats per minute.

const computed = essentia.PercivalBpmEstimator(data);
// { bpm: N }

const tempo = computed.bpm;
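Beats per minute and the time between beats are two views of the same number. The sketch below shows the conversion in both directions, using made-up beat timestamps. It does not reproduce how `PercivalBpmEstimator` works; `beatIntervalSeconds` and `bpmFromBeatTimes` are hypothetical helpers.

```javascript
// Seconds between consecutive beats at a given tempo.
function beatIntervalSeconds(bpm) {
  return 60 / bpm;
}

// Estimate BPM from the average gap between beat timestamps (in seconds).
function bpmFromBeatTimes(beatTimes) {
  if (beatTimes.length < 2) {
    throw new Error("need at least two beats");
  }
  const span = beatTimes[beatTimes.length - 1] - beatTimes[0];
  const averageGap = span / (beatTimes.length - 1);
  return 60 / averageGap;
}

console.log(beatIntervalSeconds(120));           // 0.5
console.log(bpmFromBeatTimes([0, 0.5, 1, 1.5])); // 120
```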

And that's all the audio features analyzed. But now that we have all this information, what should we do with it? One suggestion would be to build a song recommendation system, which I've already explained here. Another idea would be to make it available through a REST API.

For this API, we will use Express to handle incoming requests and send responses. We can also use formidable to handle file uploads; see how to use it here.

So, when the client uploads the file, we will decode it, analyze it, and then return the audio features.

import express, { NextFunction } from "express";
import formidable from "formidable";
import fs from "fs";
import { IncomingMessage } from "http";
import { Essentia, EssentiaWASM } from "essentia.js";
import decode from "audio-decode";
import IncomingForm from "formidable/Formidable";

const app = express();
const port = 3000;

const essentia = new Essentia(EssentiaWASM);

const KEYS = ["C", "D", "E", "F", "G", "A", "B"];

app.use(express.json());
app.use(express.urlencoded({ extended: true }));

// Wrap formidable's callback-style parser in a promise
const parseForm = async (
  form: IncomingForm,
  req: IncomingMessage,
  next: NextFunction
): Promise<{ fields: formidable.Fields; files: formidable.Files }> => {
  return await new Promise((resolve) => {
    form.parse(
      req,
      function (
        err: Error,
        fields: formidable.Fields,
        files: formidable.Files
      ) {
        if (err) return next(err);
        resolve({ fields, files });
      }
    );
  });
};

// Decode an audio file and convert its first channel to an Essentia vector
const decodeAudio = async (filepath: string) => {
  const buffer = fs.readFileSync(filepath);
  const audio = await decode(buffer);
  const audioVector = essentia.arrayToVector(audio._channelData[0]);
  return audioVector;
};

app.post("/upload", async (req, res, next) => {
  const form = formidable();

  const { files } = await parseForm(form, req, next);

  // The file uploaded must have the field name "file"
  const file = files.file as any;

  const data = await decodeAudio(file.filepath);

  const danceability = essentia.Danceability(data).danceability;
  const duration = essentia.Duration(data).duration;
  const energy = essentia.Energy(data).energy;

  const computedKey = essentia.KeyExtractor(data);
  const key = KEYS.indexOf(computedKey.key);
  const mode = computedKey.scale === "major" ? 1 : 0;

  const loudness = essentia.DynamicComplexity(data).loudness;
  const tempo = essentia.PercivalBpmEstimator(data).bpm;

  res.status(200).json({
    danceability,
    duration,
    energy,
    key,
    mode,
    loudness,
    tempo,
  });
});

app.listen(port, () => {
  return console.log(`Express server listening at http://localhost:${port}`);
});
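To try the endpoint, a client has to send the audio as multipart form data under the field name "file". Here is a minimal client sketch, assuming Node 18 or newer (where `fetch`, `FormData`, and `Blob` are available globally) and the server running locally on port 3000. The helper names `buildUploadForm` and `uploadAudio` are hypothetical.

```javascript
// Hypothetical client helper: wraps raw audio bytes in a multipart form.
// The field name "file" matches what the /upload handler reads.
function buildUploadForm(audioBytes, filename) {
  const form = new FormData();
  const blob = new Blob([audioBytes], { type: "audio/mpeg" });
  form.append("file", blob, filename);
  return form;
}

// Sends the form to the server and returns the parsed JSON features.
// The default URL assumes the server from this article on port 3000.
async function uploadAudio(audioBytes, filename, url = "http://localhost:3000/upload") {
  const response = await fetch(url, {
    method: "POST",
    body: buildUploadForm(audioBytes, filename),
  });
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
  return response.json();
}
```

With the server running, something like `uploadAudio(fs.readFileSync("./audio.mp3"), "audio.mp3").then(console.log)` should print the feature object returned by the endpoint.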

And that's all! If you liked this article, stay tuned for more.

And if you have any other tips on how to use these audio features, leave a comment.

You can find the full code for this article on GitHub.

Goodbye for now.
