JENNIFER KIM
About Dataset:

Shop Customer Data is a detailed analysis of an imaginary shop's ideal customers. It helps a business better understand its customers. The shop owner collects customer information through membership cards.

The dataset consists of 2000 records and 8 columns (Customer ID, which is removed during data cleaning, plus the seven below):

  • Gender

  • Age

  • Annual Income K - in thousands of dollars

  • Spending Score - Score assigned by the shop, based on customer behavior and spending nature

  • Profession

  • Work Experience - in years

  • Family Size

    1. How are average Spending Scores distributed across professions, by gender?

    2. 3D visualization of the data distribution

  • The basic model.

    The prediction target is Spending Score.

    Which variables are significant predictors of Spending Score?

    How does the predicted Spending Score change when a customer’s age increases by one year?

  • This model is suited to a proportional response variable.

    So Spending Score was rescaled as Spending.Score * 0.01.

    How do the results differ from the first model?

    Which variables have a meaningful impact on Spending Score at the 5%, 10%, and 20% significance levels (95%, 90%, and 80% confidence)?
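The rescaling step for the second model can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and not taken from the original analysis. It also assumes the usual beta regression requirement that the response lies strictly between 0 and 1, so a raw score of exactly 0 or 100 would need separate handling.

```python
def to_proportion(score: float) -> float:
    """Rescale a raw Spending Score to the proportion scale (score * 0.01)."""
    return score * 0.01


def to_score(proportion: float) -> float:
    """Map a proportion-scale value back to the original Spending Score scale."""
    return proportion * 100.0


def is_valid_beta_response(proportion: float) -> bool:
    """Beta regression assumes a response strictly inside the open interval (0, 1)."""
    return 0.0 < proportion < 1.0


if __name__ == "__main__":
    for raw in (0, 37, 100):
        p = to_proportion(raw)
        print(raw, p, is_valid_beta_response(p))
```

A prediction on the proportion scale, such as the one reported for the second model, can be mapped back with `to_score`.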

data cleaning


  1. Removed records with Age less than 17. (Some customers were listed as 4 or 10 years old; these cases were excluded.)

  2. Filled empty Profession cells with Unknown.

  3. Converted Annual Income to Annual Income K (thousands of dollars).

  4. Deleted the Customer ID variable, so the prediction process was performed fully anonymously.
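The four cleaning steps above can be sketched in plain Python. This is a minimal illustration, not the code used in the original analysis: the column names (`CustomerID`, `Annual Income`, and so on) and the three sample rows are assumptions made up for the example.

```python
def clean_records(records):
    """Apply the four cleaning steps to a list of dict rows."""
    cleaned = []
    for row in records:
        # Step 1: exclude customers younger than 17.
        if row["Age"] < 17:
            continue
        # Step 4: drop the customer identifier (the raw income is re-added in K below).
        new_row = {
            key: value
            for key, value in row.items()
            if key not in ("CustomerID", "Annual Income")
        }
        # Step 2: replace an empty or missing profession with "Unknown".
        new_row["Profession"] = row.get("Profession") or "Unknown"
        # Step 3: express annual income in thousands of dollars.
        new_row["Annual Income K"] = row["Annual Income"] / 1000
        cleaned.append(new_row)
    return cleaned


# Made-up rows for illustration only.
sample = [
    {"CustomerID": 1, "Gender": "Male", "Age": 19, "Annual Income": 15000, "Profession": "Healthcare"},
    {"CustomerID": 2, "Gender": "Female", "Age": 4, "Annual Income": 35000, "Profession": "Artist"},
    {"CustomerID": 3, "Gender": "Female", "Age": 30, "Annual Income": 86000, "Profession": ""},
]

cleaned = clean_records(sample)
for row in cleaned:
    print(row)
```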

Normality check

I first assumed that Spending Score is not normally distributed.

However, the Shapiro test did not reject its null hypothesis of normality, so I dropped that assumption and treat Spending Score as approximately normally distributed.

The distribution of Spending Score, shown on the right, also looks roughly normal: it is unimodal, with a single bump in the middle.

Distribution of Average Spending Score for Profession by gender

3D Scatter Plotting

Model 1: generalized linear regression model

Interpretation & Results

At the 22% significance level (78% confidence), which is far looser than conventional thresholds, the coefficient 0.01589 is the estimated difference in Spending Score between customers whose profession is Artist and those whose profession is Unknown, the reference group. Holding the other variables fixed, Artists have a predicted Spending Score 0.01589 units higher than the Unknown group. The effect is so small that it is hard to distinguish from noise.

In contrast, the coefficient 6.64289 from the same linear model is the estimated mean difference in Spending Score between customers who work in Entertainment and the Unknown reference group. Compared with the 0.01589 estimate, this is a much larger effect: holding the other variables fixed, Entertainment customers are predicted to score about 6.64 units higher, which suggests a tendency toward higher discretionary spending in that profession.

At the 30% significance level, which tolerates an unusually high Type I error rate, Annual Income becomes nominally significant. Its coefficient, 0.01589, is the expected increase in Spending Score for each additional one thousand dollars of annual income, holding the other variables fixed. The effect is very small, so a very large sample would be needed to detect it reliably.
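To get a feel for these magnitudes, the linear coefficients can be applied directly. The sketch below uses only the two coefficients reported above; the function names are hypothetical, and the comparison assumes all other variables stay fixed.

```python
# Coefficients as reported for Model 1.
INCOME_COEF = 0.01589          # Spending Score units per $1,000 of annual income
ENTERTAINMENT_COEF = 6.64289   # Entertainment vs. the Unknown reference group


def income_effect(delta_dollars: float) -> float:
    """Predicted change in Spending Score for a change in annual income (in dollars)."""
    return INCOME_COEF * (delta_dollars / 1000.0)


def income_needed_for(score_gain: float) -> float:
    """Income increase (in dollars) that the model equates to a given score gain."""
    return score_gain / INCOME_COEF * 1000.0


if __name__ == "__main__":
    # Effect of a $10,000 raise on the predicted score.
    print(income_effect(10_000))
    # Income increase that would match the Entertainment coefficient.
    print(income_needed_for(ENTERTAINMENT_COEF))
```

The second call shows how much additional income the model treats as equivalent to the Entertainment effect, which makes the gap between the two coefficients concrete.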

Prediction Testing Profile

Gender = Female

Age = 20

Annual Income = 58,000

Profession = Doctor

Work Experience = 0 years

Family Size = 1 (Single)

Prediction model 2: Beta regression

Prediction result from the regression model

The estimated spending score for the customer profile shown above is

0.4978756 on the proportion scale (49.78756 on the original Spending Score scale)

Interpretation

At the 20% significance level, the second prediction model finds three variables with a significant impact on the spending score: Annual Income, the Artist profession, and the Entertainment profession.

For Annual Income (in thousands of dollars), a one-thousand-dollar increase multiplies the estimated average spending score by exp(0.0007589) = 1.00076, an increase of (exp(0.0007589)-1)*100% = 0.076%, holding the other variables fixed.

For Artist customers, the estimated average spending score is exp(0.2937506)*100% = 134.14% of that for customers whose profession is Unknown.

For customers who work in Entertainment, the estimated average spending score is exp(0.2681346)*100% = 130.752% of that for customers whose profession is Unknown.
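The exponentiated coefficients can be checked with a few lines of Python. This sketch follows the text in reading exp(coefficient) as a multiplicative ratio on the estimated average spending score; that reading is an assumption, since the link function is not stated, and under a logit link the ratio would apply to the odds of the mean rather than the mean itself. The dictionary keys and function names are hypothetical.

```python
import math

# Coefficients as reported for the beta regression model.
COEFS = {
    "Annual Income (per $1,000)": 0.0007589,
    "Artist (vs. Unknown)": 0.2937506,
    "Entertainment (vs. Unknown)": 0.2681346,
}


def ratio_percent(coef: float) -> float:
    """exp(coef) as a percentage of the baseline (100 means no change)."""
    return math.exp(coef) * 100.0


def percent_change(coef: float) -> float:
    """Percentage change relative to the baseline: (exp(coef) - 1) * 100."""
    return (math.exp(coef) - 1.0) * 100.0


if __name__ == "__main__":
    for name, coef in COEFS.items():
        print(f"{name}: {ratio_percent(coef):.3f}% of baseline, "
              f"{percent_change(coef):+.3f}% change")
```

Note that the subtraction of 1 happens after exponentiating; `math.exp(coef - 1)` would compute a different quantity.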

conclusion


The analysis of Spending Score using both a generalized linear regression model and a beta regression model provides valuable insights into how different factors, such as profession and annual income, influence customer spending behavior.

In the generalized linear regression model, profession has an observable effect on Spending Score at a 22% significance level. Specifically, Artists have a slightly higher estimated Spending Score (by 0.01589 units) than customers with an Unknown profession, though the effect is small. In contrast, customers working in the Entertainment industry show a notably larger increase, with an estimated difference of 6.64289 units compared to those whose profession is Unknown.

At a 30% significance level, Annual Income is also a significant predictor. A $1,000 increase in annual income is associated with an estimated 0.01589-unit increase in Spending Score, indicating a weak but positive relationship between income and spending behavior.

The beta regression model, at a 20% significance level, further confirms the importance of Annual Income, the Artist profession, and the Entertainment profession in predicting Spending Score. A $1,000 increase in income raises the estimated Spending Score by about 0.076%. Moreover, Artist customers have an estimated Spending Score that is 134.14% of that for those with an Unknown profession, while Entertainment customers have an estimated Spending Score that is 130.75% of that for the Unknown group.

Thank you