Human error caused 2022 Rogers outage, system 'deficiencies' made it worse: report

- Peter Zimonjic

The 2022 Rogers outage that left 12 million people without wireless and hardwired services was caused by human error and made worse by management and system "deficiencies," says an independent review conducted for Canada's telecommunications regulator.

The review report also says steps taken by Rogers since the outage are "satisfactory to improve the Rogers network resiliency and reliability, as well as to address the root cause of the July 2022 outage."

The 26-hour outage started in the early morning of July 8 and left individuals and businesses without access to their mobile, home phone, internet and 911 services.

The Canadian Radio-television and Telecommunications Commission (CRTC) commissioned Xona Partners in September 2023 to undertake the review and determine what caused the outage.

The engineering consultancy was also tasked with looking at whether the measures taken by Rogers since the outage are sufficient to prevent another incident.

Xona Partners' findings were contained in the executive summary of the review report, released this month. The CRTC says the full report contains sensitive information and will be released in redacted form at a later, unspecified date.

The report summary says that in the weeks leading up to the outage, Rogers was undergoing a seven-phase process to upgrade its network. The outage occurred during the sixth phase of the upgrade.

"The July 2022 outage is attributed to an error in configuring the distribution routers within the Rogers IP network," the report says.

Staff at Rogers caused the shutdown, the report says, by removing a control filter that directed information to its appropriate destination.

Without the filter in place, a flood of information was sent into Rogers' core network, overloading and crashing the system within minutes of the control filter being removed.

Algorithm designated network upgrade as 'low' risk

The report says Rogers' core network manages wireless and hard-wired data both internally, within the company, and externally, for outside customers and providers.

"With both the wireless and wireline networks sharing a common IP core network, the scope of the outage was extreme in that it resulted in a catastrophic loss of all services," the report says.

Having wireless and wireline services share the same network is a practice "common to many service providers," the report says, adding that companies find it an efficient way to "balance cost with performance."

Rogers has since announced that it will develop a new, separate network for its wireless systems while keeping hard-wired services on the old core network. The report says that work is ongoing.

The review says that because the first five stages of the network update took place without incident, "the risk assessment algorithm and service downgraded the risk level for the sixth phase" of the upgrade.

Designating risks in phase six as "low" meant Rogers' staff could avoid additional levels of scrutiny and approvals as the upgrade proceeded, even though doing so "contravenes industry norms," the report says.

Rogers says it has since installed a new risk assessment algorithm to address the issue.

The executive summary of Xona Partners' review also says the "network failure could have been prevented" if Rogers had "overload protection mechanisms" limiting how much information is funnelled into the core network.

The review recommends that all telecom networks in Canada implement overload protection mechanisms for their core networks.

Challenges restoring the network

A central issue frustrating Rogers' efforts to get its systems back up once they went down was the corporation's inability to communicate properly.

The report says that when the core network went down, remote employees were unable to access Rogers' systems or use the internet and could not get online by using other service providers.

"Rogers had to dispatch staff to remote sites to physically access the affected routers, which delayed network recovery efforts," the report says.

All incident response and crisis team members at Rogers have since been provided with backup, third-party access to the internet to "maintain communication capabilities during outages."

The review also says that Rogers staff could not access critical error logs detailing the root cause of the outage until 14 hours after the outage began, which "adversely impacted outage recovery efforts."

John Lawford, executive director of the Public Interest Advocacy Centre in Ottawa, has been pushing Rogers and the CRTC for more transparency on the outage.

He criticized the CRTC for taking two years to deliver a report on the outage, describing it as a "whitewash in the sense of both the CRTC and Rogers being very much let off the hook."

"The report makes a claim that Rogers has rectified the issue and there is insufficient evidence for me there to see that," Lawford said. "This is just one particular expert's viewpoint."

Rogers declined CBC News' request for an interview.

In a statement, a Rogers spokesperson said the company will "remain focused on delivering the most reliable network experience so Canadians can connect when and where they want."

Citing an August 2023 report from analytics firm umlaut, the spokesperson said Rogers was found to have the most reliable wireless network in Canada for the period surveyed.

"We completed a full review of our networks, strengthened our network resiliency, and implemented all the recommendations of this report. We will continue to invest to ensure Canadians enjoy the best networks in the world," the spokesperson said.

In a letter to Rogers, the CRTC said the company had "confirmed the implementation of all measures" recommended by Xona Partners.

A spokesperson for Minister of Innovation, Science and Industry François-Philippe Champagne told CBC News that Rogers has addressed all recommendations in the report and is continuing to invest in network resiliency.

