We make numerical calculations when analytical solutions are not available. For example, if we flip a fair coin a large number of times, then we know “analytically” that heads and tails will each come up roughly the same number of times (about 50% each). For complex systems, a numerical solution…
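
The coin-flip case can also be checked numerically. Below is a minimal sketch using only Python's standard library; the function name `simulate_coin_flips` and the fixed seed are illustrative choices, not something taken from the article itself.

```python
import random


def simulate_coin_flips(n_flips, seed=0):
    """Return the fraction of heads observed in n_flips simulated fair flips."""
    rng = random.Random(seed)  # seeded generator so the run is repeatable
    # Each draw is uniform on [0, 1); count a draw below 0.5 as heads.
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


if __name__ == "__main__":
    # The observed fraction should drift toward 0.5 as the number of flips grows.
    for n in (10, 1_000, 100_000):
        print(n, simulate_coin_flips(n))
```

For small `n_flips` the fraction can sit noticeably away from 0.5; the agreement with the analytical answer only emerges as the number of flips grows.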

Before pursuing a career in data science, I was an aerosol researcher (mainly computational research). Although I entered the new world of data science, I found that the concepts (such as random numbers, distributions, and modeling) and the best practices of scientific computing are the same across the two disciplines. I picked…

The use of distributed computing is nearly inevitable when the data size is large (for example, >10M rows in an ETL or ML modeling job). If you have access to a Spark environment through technologies such as JupyterHub or Databricks, then PySpark could be a good option when working with large…

As data scientists, we often help businesses by finding meaningful insights in the data. This could include predicting a valuable business indicator so that the decision-makers can make a particular decision. Well... that’s the theory, and sometimes it is true. What is not true is that the decision-makers take our…

As data scientists, we write a lot of code to work with data. However, we do not necessarily write the code for productizing (software-izing) the output, and we sometimes rely on our organization’s IT or engineering team. This scenario can be challenging because of different priorities within the company. On the…