In this project, we are trying to estimate the volume of sales based on how much we spend on various social media ads.

Business Questions:

1) How much advertising budget should we allocate to FB, Google and IG?

2) How much in sales will we get if we don't spend on social media ads?

3) Which areas (large or small; urban, rural or suburban) should we focus our ads on? What is the expected revenue?

Data Understanding

First, let's import the dataset and do some basic EDA.

Plotting sales against Google, FB and IG spend lets us check whether each channel has a linear relationship with sales.
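As a numeric companion to the scatter plots, the sketch below fits a line of best fit and computes the Pearson correlation for one channel at a time. The data is synthetic and the variable names are hypothetical stand-ins, since the real dataset is not reproduced here. A correlation close to 0 corresponds to the "spread out" appearance that a poorly fitting line produces.

```python
import numpy as np

# Synthetic stand-in data (assumption): not the project's dataset.
rng = np.random.default_rng(0)
n = 200
spend = rng.uniform(0, 100, n)

# One sales series that depends linearly on spend, one that does not.
sales_linear = 50 + 2.0 * spend + rng.normal(0, 5, n)
sales_unrelated = rng.normal(150, 60, n)


def linear_fit_summary(x, y):
    """Return (slope, intercept, Pearson r) for the least-squares line of y on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r


slope_lin, intercept_lin, r_lin = linear_fit_summary(spend, sales_linear)
slope_unrel, intercept_unrel, r_unrel = linear_fit_summary(spend, sales_unrelated)

print(f"linear channel:    slope={slope_lin:.2f}, r={r_lin:.2f}")
print(f"unrelated channel: slope={slope_unrel:.2f}, r={r_unrel:.2f}")
```

On the real data, the same function could be called once per channel column to back up what the charts suggest visually.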

From the plots, IG does not show a linear relationship with sales: its points are more spread out and its line of best fit describes them poorly. We should therefore allocate less of the ad budget to IG, tentatively around 10k.

Google and FB, on the other hand, do show a linear relationship with sales, so we can allocate the remaining 90k between them. Which of the two should get more? To decide, we first need to look at their coefficients in more detail.

Modelling

In this section, we construct a linear regression model to estimate the volume of sales under different budgeting configurations. We use sales as the target variable and fit a supervised model on the other columns.
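A minimal sketch of such a fit is shown below, using ordinary least squares via NumPy rather than whichever library the notebook's code cells use. The data is synthetic, and the column order (FB, Google, IG) and the "true" coefficients are assumptions chosen only to make the example checkable.

```python
import numpy as np

# Synthetic stand-in data (assumption): three spend columns in the order fb, google, ig.
rng = np.random.default_rng(1)
n = 300
X = rng.uniform(0, 50, size=(n, 3))
true_intercept = 20.0
true_coefs = np.array([3.0, 2.0, 0.0])
y = true_intercept + X @ true_coefs + rng.normal(0, 1.0, n)


def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    # Prepend a column of ones so the first fitted weight is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]


intercept, coefs = fit_linear_regression(X, y)
print("intercept:", round(float(intercept), 2))
print("coefficients (fb, google, ig):", np.round(coefs, 2))
```

Because every feature here is a spend amount in the same unit, the fitted coefficients can be compared directly as "extra sales per extra unit of spend".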

Now we can evaluate the accuracy of the model.
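One common way to do this is to hold out part of the data and compute R² and RMSE on it. The sketch below assumes that approach; the text does not say which metrics or split the notebook uses, and the data is again synthetic.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic stand-in data (assumption): two spend columns and a noisy linear target.
rng = np.random.default_rng(2)
X = rng.uniform(0, 50, size=(200, 2))
y = 10 + X @ np.array([1.5, 0.5]) + rng.normal(0, 2.0, 200)

# Fit on the first 150 rows and evaluate on the remaining 50.
split = 150
design = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(design[:split], y[:split], rcond=None)
y_pred = design[split:] @ beta

test_r2 = r2_score(y[split:], y_pred)
test_rmse = rmse(y[split:], y_pred)
print(f"held-out R^2: {test_r2:.3f}, RMSE: {test_rmse:.2f}")
```

Rows in real data are often ordered, so a shuffled split is usually safer than taking the last rows as done here for brevity.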

In the cell above, we calculate the intercept and the coefficient for each variable in our dataset. Surprisingly, the coefficient for IG is very slightly negative. While this may not be enough to justify abandoning IG entirely, the company should definitely cut down on its spending there. Furthermore, since FB has a higher coefficient than Google, we can allocate slightly more to it, say 50k to FB and 40k to Google.

Additionally, to answer the question of how much sales the company generates without any ads, we can look at the intercept, which in this model is the estimated sales when all ad spend is zero.
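The role of the intercept can be read off the fitted equation. As a sketch, assuming the model contains only the three spend variables (the symbols below are notation introduced here, not taken from the notebook):

$$
\widehat{\text{sales}} = \beta_0 + \beta_{\text{FB}}\, x_{\text{FB}} + \beta_{\text{Google}}\, x_{\text{Google}} + \beta_{\text{IG}}\, x_{\text{IG}}
$$

Setting $x_{\text{FB}} = x_{\text{Google}} = x_{\text{IG}} = 0$ leaves $\widehat{\text{sales}} = \beta_0$. If the dataset contains few or no rows with zero spend, this value is an extrapolation beyond the observed range and should be read with caution.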

Next, we can deploy the results of the model prediction and present our findings.

Now we need to clean the data so that we can apply the LR model correctly later.

First, we need to one-hot encode several categorical columns.
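To illustrate what one-hot encoding does, here is a small pure-Python sketch. The column name `area` and its values are hypothetical, loosely based on the urban, rural and suburban categories mentioned in the business questions. In a pandas workflow, `pandas.get_dummies` would typically do the same job.

```python
def one_hot_encode(rows, column, drop_first=True):
    """Replace a categorical column with 0/1 indicator columns.

    With drop_first=True the first category (in sorted order) becomes the
    baseline and gets no column of its own, which avoids perfect
    collinearity with the regression intercept.
    """
    categories = sorted({row[column] for row in rows})
    kept = categories[1:] if drop_first else categories
    encoded = []
    for row in rows:
        # Copy every column except the one being encoded.
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in kept:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


# Hypothetical rows (assumption): one spend column and one categorical column.
rows = [
    {"fb": 10.0, "area": "urban"},
    {"fb": 5.0, "area": "rural"},
    {"fb": 7.5, "area": "suburban"},
]
encoded = one_hot_encode(rows, "area")
for row in encoded:
    print(row)
```

With a baseline category dropped, each remaining indicator's coefficient is interpreted as the difference in expected sales relative to that baseline.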

After cleaning the data as above, we run the LR model again with more variables included and evaluate the results.

Now we can test our model on different budgeting configurations and present the findings.
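Comparing configurations amounts to plugging each budget split into the fitted equation. The sketch below shows the mechanics only: the intercept and coefficients are placeholders invented for illustration, not the values fitted in this project, and spend is assumed to be expressed in thousands.

```python
def predict_sales(intercept, coefs, budget):
    """Linear prediction: intercept plus coefficient * spend for each channel."""
    return intercept + sum(coefs[channel] * spend for channel, spend in budget.items())


# Placeholder parameters (assumption): NOT the fitted values from this project.
intercept = 100.0
coefs = {"fb": 2.0, "google": 1.5, "ig": -0.1}

# Candidate splits of a 100k budget, in thousands.
configs = {
    "fb_heavy": {"fb": 50, "google": 40, "ig": 10},
    "google_heavy": {"fb": 40, "google": 50, "ig": 10},
    "no_ads": {"fb": 0, "google": 0, "ig": 0},
}

predictions = {name: predict_sales(intercept, coefs, b) for name, b in configs.items()}
best = max(predictions, key=predictions.get)

for name, value in predictions.items():
    print(f"{name}: {value:.1f}")
print("highest predicted sales:", best)
```

Swapping in the real intercept and coefficients would give the predicted sales for each split, with the `no_ads` row reproducing the intercept.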