We resort to numerical calculations when analytical solutions are not available. For example, if we flip a fair coin a large number of times, we know “analytically” that heads and tails will each come up a roughly equal number of times (about 50% each). For complex systems, a numerical solution might be needed. This article covers a few ways to generate random numbers in Python, for use in numerical solutions to differential equations or in Monte Carlo simulations for forecasting.
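As a rough illustration of the coin-flip example, here is a minimal sketch that estimates the fraction of heads numerically using Python's standard `random` module. The function name `fraction_of_heads` and the `seed` parameter are assumptions made for this illustration and are not taken from the article itself.

```python
import random


def fraction_of_heads(n_flips, seed=None):
    """Simulate n_flips tosses of a fair coin and return the fraction of heads."""
    rng = random.Random(seed)  # private generator, so a seed gives repeatable runs
    # rng.random() is uniform on [0, 1); values below 0.5 are counted as heads
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


if __name__ == "__main__":
    # The estimate should drift toward 0.5 as the number of flips grows
    for n in (10, 1_000, 100_000):
        print(n, fraction_of_heads(n, seed=42))
```

With only a handful of flips the estimate can be far from 0.5, and it should settle close to the analytical value as the number of flips increases.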

Random numbers are numbers drawn from a certain distribution. …

Before pursuing a career in data science, I was an aerosol researcher, working mainly on computational research. Although I entered the new world of data science, I found that the concepts (such as random numbers, distributions, and modeling) and the best practices of scientific computing are the same across the two disciplines. I picked up Python because it is open source (free) and widely used in the data science industry. So I thought of compiling a set of Python code that might give future aerosol researchers a head start.

Below is code for computing the particle diffusion coefficient as a function of particle diameter…

Distributed computing is nearly inevitable when the data is large (for example, more than 10M rows in an ETL or ML modeling workflow). If you have access to a Spark environment through technologies such as JupyterHub or Databricks, then PySpark could be a good option for working with large datasets. The challenge is that many data scientists are comfortable with Pandas but may not be fully familiar with PySpark dataframes (as of roughly 2019). It can therefore become a learning journey for them, often involving converting Pandas code to PySpark.

There are many differences between PySpark and Pandas and…

As data scientists, we often help businesses by finding meaningful insights in the data. This could include predicting a valuable business indicator so that decision-makers can make a particular decision. Well, that’s the theory, and sometimes it is true. What is not true is that decision-makers take our data-driven recommendations in the form we try to present them. They ask a lot of other questions, probably to develop a complete understanding of our recommendations or perhaps to build more trust in our analysis. …

As data scientists, we write a lot of code to work with data. However, we do not necessarily write code for productizing (software-izing) the output, and we sometimes rely on our organization’s IT or engineering team for that. This can be challenging because of competing priorities within the company. On the other hand, there are many generic (and apparently promising) tools on the market. But these products are often expensive or have a steep learning curve, and they need a long series of approvals from company executives. We then try to deliver results in a Jupyter notebook, a PPT, or some other visualization tool, which often…

Data Science and AI Advisor