About Performance Issues 2021-05-04

What happened

  • A recently implemented feature in our AI engine launched a new job that made a very large number of parallel calls to our API, which degraded its performance for a while.

What we’ve done

What the impact was

  • The ASM was inaccessible for a few minutes.

What we are doing to help

  • We are making sure our API is not overloaded by bursts of expensive calls, and we are optimizing how our AI engine communicates with the ASM.
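One common way to keep a batch job from flooding an API with parallel calls is to cap how many calls can be in flight at once. The following is a minimal Python sketch of that idea, not the implementation our AI engine actually uses: `call_api`, `run_job`, and `MAX_CONCURRENT_CALLS` are hypothetical names, and the expensive API call is simulated with `asyncio.sleep`.

```python
from __future__ import annotations

import asyncio

# Hypothetical cap on concurrent API calls; a real value would depend on API capacity.
MAX_CONCURRENT_CALLS = 5


async def call_api(item: int, limiter: asyncio.Semaphore, stats: dict) -> int:
    """Hypothetical stand-in for one expensive API call (simulated with sleep)."""
    async with limiter:  # waits here if the cap is already reached
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        try:
            await asyncio.sleep(0.01)  # simulated network latency
            return item * 2
        finally:
            stats["in_flight"] -= 1


async def run_job(items: list[int], max_concurrent: int = MAX_CONCURRENT_CALLS) -> tuple[list[int], int]:
    """Issue one call per item while never exceeding max_concurrent in flight."""
    limiter = asyncio.Semaphore(max_concurrent)
    stats = {"in_flight": 0, "peak": 0}
    # gather keeps results in the same order as the input items
    results = await asyncio.gather(*(call_api(i, limiter, stats) for i in items))
    return list(results), stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_job(list(range(100))))
    print(f"{len(results)} calls completed, peak concurrency {peak}")
```

The job still issues every call, but the semaphore spreads them over time instead of sending them all at once, which trades a longer job runtime for a steadier load on the API.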

About Login Issues 2021-01-28


What happened

  • When a user signed in while already having an existing session, all cookies were deleted and an unauthorized error was propagated across all user sessions. The bug was introduced on 2021/01/27 at 14:31 (EST).

What we’ve done

What the impact was

  • Users who tried to log in while they already had an existing session received an unauthorized error.

What we are doing to help

  • We are checking all user sessions to confirm whether any users are affected.

About Vulnerability Loading Issue 2020-12-17

What happened

  • During a large cache migration, we did not anticipate how many new Redis connections would be opened, so our connection limit parameter was not updated.
  • As a result, some Redis connections could not reach the endpoint between 2020/12/17 13:30 (EST) and 2020/12/17 15:20 (EST).

What we’ve done

What the impact was

  • Vulnerabilities failed to load for some large projects.

What we are doing to help

  • We now make sure to adjust every relevant parameter of our Redis instance whenever we introduce a cache change.

About Integrates Outage on 2020-12-03

What happened

  • We are currently migrating our backend to Starlette.
  • On 2020/12/03 at 17:15 (EST), our team removed old configurations from Django and from the Kubernetes cluster. We did not notice that the main redirection (/) was still associated with Django.

What we’ve done

  • After receiving the downtime notification, our team implemented a temporary fix that moved the / redirection to the Kubernetes ingress. We restored service on 2020/12/03 at 18:05 (EST).
  • We approved the final solution on 2020/12/03 at 18:19 (EST).

What the impact was

  • Users were unable to log in from 2020/12/03 17:46 until 2020/12/03 18:05.

What we are doing to help

  • We are continuing to standardize our backend on Starlette.
  • We are continuing to debug the process and to test all functionality for unexpected path problems.

About Integrates Issue With Reports 2020-11-23

What happened

  • We are currently migrating our backend to Starlette.
  • On 2020/11/20 at 15:41 (EST), our team moved the package to the new backend. We did not expect this change to affect anything, but on 2020/11/23 at 11:11 (EST), we noticed that Executive Reports were not being generated because some paths did not match the new structure.

What we’ve done

  • After debugging, our team reproduced the issue on 2020/11/23 at 11:20 (COT).
  • We approved the solution on 2020/11/23 at 13:26 (COT).

What the impact was

  • Users were unable to generate reports in Integrates from 2020/11/20 until 2020/11/23 at 11:30 (EST).

What we are doing to help

  • We are continuing to standardize our backend on Starlette.
  • We are continuing to debug the process and to test all functionality for unexpected path problems.

About Integrates Issue With Email Links 2020-11-23

What happened

  • We are currently migrating our backend to Starlette.
  • We noticed that links in emails sent between 2020/11/20 and 2020/11/23 redirected users to an unauthorized view and closed their session. As part of the transition to Starlette, we use a new path prefixed with /new, so some functions are no longer available at the old path.

What we’ve done

  • After debugging, our team identified the issue on 2020/11/20 at 18:10 (COT) and started working on a fix.
  • We deployed the solution on 2020/11/23 at 09:32 (EST).

What the impact was

  • From 2020/11/20 until 2020/11/23 at 11:30 (EST), users could not use the links to Integrates sent in emails.

What we are doing to help

  • We are continuing to standardize our backend on Starlette.
  • We are continuing to debug the process and to test all functionality for unexpected path problems.

About Integrates Outage on 2020-10-09

What happened

  • We are currently standardizing our infrastructure with Nix.
  • On 2020/10/09 at 14:33 (COT), our team deployed a change that used Docker's experimental syntax. We expected it to affect only the base infrastructure, but due to the way Docker works internally, the caches for the Integrates containers were also lost.
  • At the same time, one of our sub-dependencies released an update that broke compatibility with our source code.

What we’ve done

  • After intensive debugging, our team reproduced the issue on 2020/10/10 at 10:33 (COT).
  • We implemented a temporary solution and restored service on 2020/10/10 at 10:40 (COT).
  • We committed a definitive fix at 11:00 (COT).
  • We committed a complementary solution at 11:32 (COT).

What the impact was

  • Users were unable to log in to Integrates or use the API from 2020/10/09 14:33 until 2020/10/10 10:40 (COT).

What we are doing to help

About Integrates Login Page Issue on 2020-10-09

What happened

  • We are currently working on a major backend migration in Integrates. A substantial part of this change concerns the login process and the Integrates URLs.
  • After introducing this change, we noticed some problems in the Integrates deployment process. To support this backend migration, we are deploying a second version of Integrates for testing purposes, and that second version is the one that failed.

What we’ve done

  • We noticed the issue on 2020/10/09 at 12:20 AM and committed the fix at 12:40 AM.

What the impact was

  • Users were unable to log in to Integrates for 20 minutes.

What we are doing to help

  • We are improving our production backend deployments.