Moore’s Law Vs. the Reality of Quality Conference Calls

Evidence of Moore’s Law can be seen across many technological achievements. As you may have heard, Moore’s Law, put forth by Gordon E. Moore, co-founder of Intel, says:

“The number of transistors incorporated in a chip will approximately double every 24 months.”

Moore’s Law has also driven similar doubling trends elsewhere, such as in memory capacity and even the number of pixels in a digital camera.

Furthermore, Moore’s Law has made a wide array of products and services faster, cheaper, more reliable and of higher quality for the people who use them. Yet while this has been true of many products, audio conference quality has not improved in the same way. In fact, in some people’s eyes, audio conferencing has gotten much worse.

Complaints on the rise

Complaints about conference call quality are on the rise, and it’s not just because someone on a cell phone is introducing background noise. It’s often hard even to get into a conference: having to enter 10 digits and stumble through multiple prompts makes joining a call cumbersome and error-prone. But let’s address a whole other dimension of conference call quality: conversational speech.

Conversational speech has an ebb and flow to it where all parties are able to join in a dialogue without experiencing issues such as echo, static, or latency. When these issues occur regularly, they cause the discussion to start and stop in an awkward manner.

Pass the mic

Some of today’s conference calls sound like Nextel’s push-to-talk. Remember the commercials from 8 years ago? Only one mic is on at a time, and only one person is talking: “Hey Mike, that concrete load arrived.” Chirp. “Great, I’ll be right over.” Chirp.

The walkie-talkie approach worked great for Nextel customers who simply wanted to communicate one-to-one without the need for active dialogue. On the other hand, when you have a conference call with even just 4 or 5 people, let alone 50 or more, having only one mic on at a time is a real problem: we lose the ability to interrupt, to keep the dialogue conversational, and to make the experience collaborative.

With only one open mic at a time, a simple “uh-huh” or an inquisitive “Wait, can you expand on that?” is literally lost in the conversation. Conversation becomes simplex rather than duplex, standard definition rather than high definition, and that means we’re going backwards instead of forwards, the direction Moore’s Law promised.

Moore’s Law applied to improving conference call quality is conversation without limits; much like you’d have face-to-face in a room full of engaged people. This is where people are able to communicate openly because the conference is about the quality of their content, not about the shortcomings of the technology they’re using.

Snap, crackle, pop…

Another challenge for conversational speech is eliminating, or at least reducing, the effect of unwanted audio artifacts on group conversations. You’ll recognize these artifacts as static, popping, tapping, echo, or even latency that leaves speech out of sync; it’s like watching a movie where the audio runs a second or two behind the action. (The problem gets worse the more people you have on the conference.) Who wants to be on a conference call with those kinds of quality issues?

But there’s something else no one wants: enduring a conference call with herky-jerky conversation, where only one person can talk at a time even when others have something to contribute in the here and now. We want our conference calls to be just like our face-to-face meetings: manageable, yet interactive. We need our conferencing technology to do for us what we’d normally do in person: read facial expressions and decide when it’s safe to talk without being rude!

What could be…

Now imagine what could happen if the spirit of Moore’s Law were applied to improving the quality of your group conference calls. What would that look like? It would look like a conference call system engineered with more open mics and fewer artifacts, so the net result is a more conversational experience without all the drawbacks.

A high-quality conferencing system is absolutely achievable. Before we explore how to get there, let’s first look at some of the drivers and causes of poor-quality conference calls.

Quality-Focused vs. Cost-Focused Call Conferencing

Some may point fingers at the underlying bridging or transport technology and believe, for instance, that VOIP is the culprit. But VOIP itself is not the problem, just as TDM alone can’t be counted on to be the sole savior.

It’s not about implementing any one technology, but rather about how technologies are combined. Such an approach often means an implementation where quality matters more than cost.

In a conferencing service with the goal of more open mics and fewer artifacts to achieve conversational speech, the driver is quality, not cost. Providers who look to squeeze every last cent out of their cost infrastructure know there are tradeoffs. They can focus on removing artifacts like noise, static and echo, or they can invest in engineering better multi-talker algorithms that also limit the artifact effects, but rarely do they do both.

Conversely, providers who take a quality-focused rather than a cost-focused approach are willing to invest more:

  • They pay more to produce better noise cancellation and therefore little or no echo
  • They pay more to have higher-quality codecs so there’s less static
  • They pay more to have more powerful processing, which opens up more mics without latency or out-of-sync problems
  • They pay more to have more bandwidth, fewer hops and sufficient peering to create a high-performing underlying transport that’s better than the sum of its parts

They pay more and do more to find a solution so that conference call quality can be a premium experience for the professionals they serve, without a premium price.

Artifacts and open mics are controlled by two primary technologies:

  1. Transport
  2. Bridging

Transport is the physical route the voice information travels between each endpoint and the bridging service. In the old analog days, the transport was a dedicated physical circuit between each landline endpoint and the bridging server. These analog circuits were designed for “five nines” (99.999%) audio quality and reliability.

Now, however, transport is VOIP-based in many cases. Depending on the actual endpoint and the provider’s architecture, the packets can literally travel thousands of miles, traversing multiple routers, soft switches and internet service providers. Different codecs (coder/decoders that compress and decompress the digitized audio) may be used along the way. Results vary significantly, and the following artifacts are often introduced:

Artifact: typical causes
  • Delay: excessive hops, long stretches on the public internet, high-latency connections
  • Echo: codec incompatibility, low-quality noise cancellation software
  • Static, pops, fuzz: packet loss, poor interconnectivity, congestion, poorly prioritized traffic, poor QoS, insufficient peering
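To make the “Delay” row concrete, the sketch below adds up a one-way delay budget for a single call leg and checks it against the 150 ms one-way target that ITU-T Recommendation G.114 gives for most interactive voice. The hop delays, jitter-buffer size, codec frame size and bridge processing time are all hypothetical numbers chosen for illustration, not measurements of any real network, and the function names are made up for this example. The point is only that every extra hop eats into a budget that is already shared with fixed costs at the endpoints and the bridge.

```python
# Sketch: estimate one-way ("mouth-to-ear") delay for one VoIP call leg and
# compare it with the 150 ms one-way target from ITU-T G.114.
# All millisecond figures below are hypothetical, for illustration only.

G114_ONE_WAY_TARGET_MS = 150


def one_way_delay_ms(hop_delays_ms, jitter_buffer_ms, codec_frame_ms, bridge_ms):
    """Sum the per-hop network delay plus fixed endpoint and bridge delays."""
    network_ms = sum(hop_delays_ms)  # propagation + queuing, hop by hop
    return network_ms + jitter_buffer_ms + codec_frame_ms + bridge_ms


def within_target(total_ms, target_ms=G114_ONE_WAY_TARGET_MS):
    """True if the estimated one-way delay fits inside the target."""
    return total_ms <= target_ms


# Hypothetical routes: a short managed path vs. a long multi-hop internet path.
short_path = [10, 20]                 # two hops
long_path = [10, 25, 25, 25, 25, 25]  # six hops

for name, hops in (("short", short_path), ("long", long_path)):
    total = one_way_delay_ms(hops, jitter_buffer_ms=40, codec_frame_ms=20, bridge_ms=10)
    print(f"{name} path: {total} ms, within target: {within_target(total)}")
```

With these made-up inputs, the two-hop path comes to 100 ms and the six-hop path to 205 ms, well over the target, even though no single hop looks alarming on its own.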

VOIP by itself does not cause artifacts, but compared to the old analog days, there is much more variability. You have probably experienced this yourself using Skype compared to your office phone. Certainly, the design and implementation of the transport have a major impact on the number of artifacts, and a higher-quality transport network is more expensive to operate.

Bridging is the technology that actually ‘mixes’ everyone’s voice on the call and redistributes it back to everyone, allowing everyone to hear each other. With a two-person call, the ‘mixing’ is very basic and straightforward: mix it all and redistribute it all. With 10 people on a call, the mixing is much more complex, and that complexity can cause significant artifacts, just as transport can. For example, bridging can cause delays when the mixing takes too long to process.
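As a rough illustration of what ‘mixing’ means, here is a minimal sketch. It assumes each participant’s audio arrives as equal-length frames of 16-bit PCM samples, and it follows the common “mix-minus” arrangement in which each participant receives everyone else’s audio but not their own, so nobody hears themselves echoed back. The names `mix_minus` and `clip16` are hypothetical, and this is not Adigo’s bridge; a production mixer would also handle resampling and gain control, and would mix only the talkers it has selected.

```python
PCM16_MIN, PCM16_MAX = -32768, 32767  # range of 16-bit signed samples


def clip16(sample: int) -> int:
    """Clamp a mixed sample back into the 16-bit range to avoid wrap-around."""
    return max(PCM16_MIN, min(PCM16_MAX, sample))


def mix_minus(frames: dict[str, list[int]]) -> dict[str, list[int]]:
    """Return, for each participant, the mix of everyone else's audio.

    frames maps participant -> one frame of 16-bit PCM samples; all frames
    are assumed to have the same length.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    # Sum every participant once (Python ints don't overflow here).
    total = [0] * length
    for samples in frames.values():
        for i, sample in enumerate(samples):
            total[i] += sample
    # Each listener hears the total minus their own contribution, clipped.
    return {
        name: [clip16(total[i] - samples[i]) for i in range(length)]
        for name, samples in frames.items()
    }


frame = {"ann": [100, 200], "bob": [10, 20], "cai": [1, 2]}
print(mix_minus(frame))  # ann hears bob + cai, bob hears ann + cai, and so on
```

Summing everyone once and then subtracting each listener’s own samples keeps the work proportional to the number of lines, instead of re-mixing all the other lines separately for every listener.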

A major challenge for bridging is identifying which mics should be open. In a hypothetical call with 10 people, there is a little bit of noise coming from each line. If the conferencing bridge combined it all, it would just sound like static or loud white noise. This is especially true if your transport introduces a lot of artifacts. As the preceding section describes, a lower-grade transport architecture (read: less costly transport) results in more artifacts.
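The “loud white noise” effect follows from simple arithmetic: when uncorrelated noise sources are added together, their powers add. Assuming every open line contributes uncorrelated background noise of the same power P, mixing N lines gives a combined noise power of N·P, a rise of 10·log10(N) dB over a single line. The helper below (a hypothetical name, for illustration) computes that rise under exactly that equal-power assumption.

```python
import math


def noise_floor_rise_db(open_lines: int) -> float:
    """Rise in combined noise power, in dB, relative to a single line.

    Assumes every open line adds uncorrelated noise of equal power, so the
    powers simply add: N open lines give N times the noise power.
    """
    if open_lines < 1:
        raise ValueError("need at least one open line")
    return 10 * math.log10(open_lines)


for n in (1, 2, 5, 10):
    print(f"{n:>2} open lines: +{noise_floor_rise_db(n):.1f} dB")
```

Two open lines raise the noise floor by about 3 dB and ten open lines by 10 dB, which is why a bridge cannot simply leave every mic open.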

With a lower-grade transport architecture, the bridging technology must turn off more mics to eliminate the extraneous noise. In fact, some providers only allow one mic on at a time, because otherwise their transport introduces too many artifacts.
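One common way to decide which mics to open is to rank the lines by their current speech level and open only the loudest few that clear a noise threshold. The sketch below assumes per-frame levels in dBFS have already been measured for each line; `select_talkers`, `max_open` and `threshold_db` are hypothetical names, and this is a generic illustration rather than any provider’s actual talker selection. Setting `max_open=1` reproduces the one-mic-at-a-time behavior described above.

```python
def select_talkers(levels_db, max_open, threshold_db):
    """Return the set of participants whose mics should be open this frame.

    levels_db: dict mapping participant -> measured level in dBFS (higher is louder).
    max_open: most mics allowed open at once (1 = push-to-talk behavior).
    threshold_db: lines quieter than this are treated as background noise.
    """
    # Drop lines that carry only background noise.
    candidates = [(name, level) for name, level in levels_db.items() if level >= threshold_db]
    # Rank the remaining lines from loudest to quietest.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return {name for name, _ in candidates[:max_open]}


# Hypothetical levels for one frame of a five-person call.
frame_levels = {"ann": -20.0, "bob": -35.0, "cai": -28.0, "dev": -60.0, "eve": -55.0}

print(sorted(select_talkers(frame_levels, max_open=1, threshold_db=-50.0)))
print(sorted(select_talkers(frame_levels, max_open=3, threshold_db=-50.0)))
```

With `max_open=1`, only the loudest line (ann) is heard, so cai’s quick “uh-huh” and bob’s question are dropped; with `max_open=3`, all three are mixed, while the two lines carrying only background noise stay closed.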

The speed at which the bridge turns mics on and off is also critical. Higher-speed processing costs more and requires more expensive hardware. Mic switching that allows more than one open mic at a time without introducing delay is REALLY expensive, but it provides the highest-quality audio.
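Switching speed can be sketched too. A bridge built this way would make its open/close decision once per audio frame, so it cannot react faster than one frame, and a common refinement is a short “hangover” that holds a mic open after the talker pauses so brief gaps between words don’t chop the speech. The class below assumes 20 ms frames (a common VoIP packet interval) and a simple level threshold; `MicGate` and its parameters are hypothetical names for illustration, not any provider’s implementation.

```python
FRAME_MS = 20  # assumed frame length; one decision per frame


class MicGate:
    """Hypothetical per-line gate with a hangover (hold) period."""

    def __init__(self, threshold_db: float, hangover_frames: int):
        self.threshold_db = threshold_db
        self.hangover_frames = hangover_frames
        self.frames_left = 0  # remaining frames to stay open after speech stops

    def update(self, level_db: float) -> bool:
        """Feed one frame's level; return True if the mic is open for that frame."""
        if level_db >= self.threshold_db:
            # Speech detected: open immediately and restart the hangover timer.
            self.frames_left = self.hangover_frames
            return True
        if self.frames_left > 0:
            # Brief pause: keep the mic open until the hangover runs out.
            self.frames_left -= 1
            return True
        return False


gate = MicGate(threshold_db=-40.0, hangover_frames=2)  # 2 frames = 40 ms of hold
levels = [-30.0, -50.0, -50.0, -50.0, -30.0]
states = [gate.update(level) for level in levels]
for i, is_open in enumerate(states):
    print(f"t={i * FRAME_MS} ms open={is_open}")
```

A longer hangover gives smoother speech but keeps mics (and their noise) open longer; running this for every line, every 20 ms, alongside talker selection and mixing without adding delay is where the processing cost comes in.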

Moore’s Law and Something Called the Talker Selection Algorithm (TSA)

Earlier, we talked about engineering a better “multi-talker algorithm.” At Adigo, that’s our way of helping make Moore’s Law a reality in conference call quality: the ability to have multiple talkers (mics turned on) interacting in conversation without the distracting drawbacks of poor-quality audio.

We call it the Talker Selection Algorithm (TSA). All conference providers have bridges with some form of talker selection, and it controls how many mics are open at any given time. Think of it as the airport TSA. Airport TSA screens passengers for items that might be detrimental to airline passenger safety. If screeners clamp down too much, passengers lose valuables and their overall experience suffers. If they don’t clamp down enough, really bad things can happen.

Adigo implements a TSA approach with overall quality as its number one criterion. Where other providers really clamp down and allow only one open mic, Adigo’s TSA policy is to allow enough open mics at a time for natural, conversational speech while still reducing the negative effects of artifacts.

Which is better: a pass-the-mic, push-to-talk conference call, or a free-flowing, interactive conversation between multiple people with little or no echo, static, or noise, where everyone is in sync?

We think the choice is a no-brainer. You can have dynamically improved quality and multiple talkers, too. That’s why we focus on a quality-based approach, so professionals like you can have conference calls just as dynamic and productive as those held face-to-face.


If you’re looking for some Moore’s Law in the quality of your conference calls, look to Adigo. We’re pretty sure you’ll find superior conversational speech – something to celebrate. In fact, you might just find a doubling of the quality you’re used to.
