Data Scientist's Self-Defeating Prophecy

Jonah, a biblical prophet, was tasked with delivering God's prophecy to the people of Nineveh, an ancient Mesopotamian city. He was sent to Nineveh to warn the city's inhabitants of the divine wrath to come as a result of their sins. After an eventful journey, Jonah reaches Nineveh and warns the citizens of the impending doom. The citizens of Nineveh repent and pray every day. Jonah waits outside the city for God to come and punish its citizens. However, God does not show up, because the citizens changed their ways for the better.

Because the prophecy was revealed early and people intervened, it did not come true. Such prophecies are called self-defeating prophecies.


In such cases, determining whether the prophecy would have come true is difficult, and the foreteller's credibility may be called into question.


Modern-day digital soothsayers, or data scientists, face this challenge routinely. We are frequently expected to predict an undesirable outcome and prevent it from occurring. In one of our projects, we were predicting server failures. We used an LSTM network to warn the IT team of the upcoming event. But none of us knew why a server would fail. (Model explainability and the research around it were alien to us at the time.) As a result, we decided to reboot the servers that were at risk and bring them back online. Here it was difficult to tell if the solution was good or bad. We had some believers and some non-believers, and there were valid arguments on both sides.
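For readers curious what such a setup might look like, here is a minimal sketch of an LSTM-based failure predictor. The window length, feature count, and alert threshold are illustrative assumptions, not the configuration we actually used.

# Minimal sketch (illustrative only): an LSTM that flags servers at risk of
# failure from a window of telemetry readings. Shapes, features, and the
# threshold below are assumptions, not the project's actual configuration.
import tensorflow as tf

TIMESTEPS, N_FEATURES = 60, 8  # e.g. 60 readings of CPU, memory, disk I/O, ...

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(TIMESTEPS, N_FEATURES)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(failure within horizon)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])

# X: (n_servers, TIMESTEPS, N_FEATURES); y[i] = 1 if server i failed soon after
# model.fit(X_train, y_train, epochs=10, validation_split=0.2)
# at_risk = model.predict(X_live)[:, 0] > 0.5  # servers the IT team is warned about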

Here is how it went:

Believer: We had 12 failures per thousand servers before we deployed our predictive model. Now it is down to 10. We have paired t-test results that confirm the improvement is significant.

Non Believer: Paired tea test or coffee test, it does not tell us anything. The improvement, if there is any, is because IT upgraded a patch. It has nothing to do with the model and its predictions.

Believer: Even after the restart, some servers continue to fail. Restarting is not a viable option. Instead, we must investigate the cause of the predicted failure and take appropriate action.

Non Believer: But...

The arguments can go on unless we run controlled experiments to validate the claim. One sure way to prove it would be not to act and simply wait until the servers fail, but that would be of no value to the business. Another approach, of course, would be to measure and compare pre- and post-deployment metrics. While that may work at first in some cases, how would you account for concept drift over time?
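To make the pre/post comparison concrete, here is a minimal sketch of the paired t-test the believer refers to, using scipy. The failure rates below are illustrative numbers, not our actual measurements.

# Minimal sketch: a paired t-test on pre/post failure rates per server cluster.
# The numbers are illustrative, not the project's actual data.
import numpy as np
from scipy import stats

# Failures per 1,000 servers in each cluster, before and after deployment
before = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3])
after  = np.array([10.2, 10.6,  9.9, 10.4, 10.1, 10.3])

t_stat, p_value = stats.ttest_rel(before, after)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value says the drop is unlikely to be chance, but it cannot tell
# us whether the model or a concurrent patch caused it -- which is exactly
# the non-believer's objection.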

Canary Deployment to the rescue:

Instead of deploying a new release to all instances or users at once, it can be deployed first to a single instance or a small fraction of users, and then to the rest, either all at once or in phases, to catch any last-minute problems. It is similar to staging, except that it happens in production with live traffic, and it is called a canary deployment. A canary deployment allows us to create a controlled environment: we can measure the model's performance on a representative subset of servers and isolate its impact. Factors like patches now apply to both groups in the population.
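As a rough illustration, a canary split for this problem might look like the sketch below: a fraction of servers receives the model-driven intervention while the rest act as a control group, and the two failure rates are compared at the end of an observation window. The server names, counts, and remediation hook are all hypothetical.

# Minimal sketch of a canary-style split. Names and counts are illustrative.
import random
from scipy.stats import fisher_exact

random.seed(42)
servers = [f"srv-{i:04d}" for i in range(2000)]
canary = set(random.sample(servers, k=len(servers) // 10))  # ~10% canary group

def remediate(server_id):
    # Placeholder for the actual action, e.g. a controlled reboot via an API.
    print(f"rebooting {server_id}")

def handle_prediction(server_id, failure_probability, threshold=0.5):
    # Act on the model's warning only for servers in the canary group;
    # control-group servers are left alone for the comparison.
    if server_id in canary and failure_probability > threshold:
        remediate(server_id)

# After the observation window, compare failure counts between the groups.
# 2x2 table: rows = (canary, control), columns = (failed, did not fail).
table = [[3, 197], [21, 1779]]  # illustrative counts
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")
# Both groups see the same patches and workload, so a significant difference
# can be credited to the model-driven intervention rather than other factors.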

Have you encountered such problems? How have you solved them? Do let me know if you have other suggestions in the comments section.
