lab 9 - Monte Carlo Methods

BSMM 8740 Fall 2025

Author

Add your name here

Introduction

In today’s lab, you’ll practice sampling from distribuions and working with Markov chains.

Getting started

  • To complete the lab, log on to your github account and then go to the class GitHub organization and find the 2025-lab-9-[your github username] repository .

    Create an R project using your 2025-lab-9-[your github username] repository (remember to create a PAT, etc.) and add your answers by editing the 2025-lab-9.qmd file in your repository.

  • When you are done, be sure to: save your document, stage, commit and push your work.

Important

To access Github from the lab, you will need to make sure you are logged in as follows:

  • username: .\daladmin
  • password: Business507!

Remember to (create a PAT and set your git credentials)

  • create your PAT using usethis::create_github_token() ,
  • store your PAT with gitcreds::gitcreds_set() ,
  • set your username and email with
    • usethis::use_git_config( user.name = ___, user.email = ___)

Packages

Exercise 1: Markov Chains

Here is a four-state Markov chain that could model customer loyalty for a subscription-based service, with one month between steps in the chain.

States:

  • State A (New Customer): The customer has just signed up.
  • State B (Engaged Customer): The customer is actively using the service and seems satisfied.
  • State C (At-Risk Customer): The customer is showing signs of disengagement (e.g., reduced usage or negative feedback).
  • State D (Churned Customer): The customer has canceled their subscription.

Transition Probabilities:

  • From State A (New Customer), there’s a high chance the customer either becomes engaged (State B) or starts showing signs of disengagement (State C).
  • From State B (Engaged Customer), there’s a probability of remaining engaged or transitioning to at-risk (State C), and a smaller probability of churning (State D).
  • From State C (At-Risk Customer), the customer may either re-engage (return to State B) or churn (State D).
  • From State D (Churned Customer), it’s possible the company might re-acquire the customer through marketing efforts, which would move them back to State A.

This type of Markov model can help businesses predict customer behavior, optimize marketing efforts, and focus on retention strategies.

What is the probability that a customer that has just signed up is still a customer after 6 months?

YOUR ANSWER:
# the transition matrix is
P <- 
  matrix(
    c(0, 0.6, 0.4, 0,
      0, 0.75, 0.25, 0,
      0, 0.5, 0, 0.5,
      0.3, 0, 0, 0.7
      )
    , nrow =4, byrow = TRUE
  )

# use %^% from the expm package to compute the k-th power of a matrix (k = 6 months)

# sum the probabilities of the non-churned customer states after 6 steps

The probability that a customer that has just signed up is still a customer after 6 months is ___%

Exercise 2: Markov Chains

A simpler customer churn model for each monthly period is as follows:

  • a current subscriber cancels their subscription with probability 0.2
  • a current non-subscriber starts their subscription with probability with probability 0.06

write the state transition matrix 𝖯i,j\mathsf{P}_{i,j}, and compute the stationary distribution π\pi for this Markov Chain, confirming that π𝖯=π\pi\mathsf{P}=\pi and that the sum of the elements of π\pi equals 1.01.0.

What percent of customers remain once the chain has reached the steady state?

YOUR ANSWER:
# replace the placeholders '_' with the state transition probabilities
P <- 
  matrix(
    c(_, _,
      _, _
      )
    , nrow =2, byrow = TRUE
  )
# compute transpose(I-P) and add a row of 1's to the bottom 
# call the resulting matrix A
A <- 

# create a vector called b with the # of elements equal to the number of rows of A
# with elements all zero but the last one
b <- 
  
# compute pi by solving (A x pi) = b using qr.solve
pi <- 

# confirm (pi x P) = pi and 
  
# confirm pi[1] + pi[2] == 1

In the steady state, the probability of being a current customer is ___%

Exercise 3: Acceptance probability

We want to sample from the Poisson distribution (X=x)λxeλ/x!\mathbb{P}(X=x)\sim \lambda^xe^{-\lambda}/x! using a Metropolis Hastings algorithm.

For the proposal we toss a fair coin and add or subtract 1 from xx to obtain yy as follows:

q(y|x)={12x1,y=x±11x=0,y=10otherwise q(y|x)=\begin{cases} \frac{1}{2} & x\ge1,\,y=x\pm1\\ 1 & x=0,\,y=1\\ 0 & \mathrm{otherwise} \end{cases} show that the acceptance probability is

α(y|x)={min(1,λx+1)x1,y=x+1min(1,xλ)x2,y=x1 \alpha(y|x)=\begin{cases} \min\left(1,\frac{\lambda}{x+1}\right) & x\ge1,\,y=x+1\\ \min\left(1,\frac{x}{\lambda}\right) & x\ge2,\,y=x-1 \end{cases}

and α(1|0)=min(1,λ/2)\alpha(1|0)=\min(1,\lambda/2), α(0|1)=min(1,2/λ)\alpha(0|1)=\min(1,2/\lambda)

YOUR ANSWER:

Exercise 4: Samples from Poisson pmf

Given the following function for the acceptance probability

alpha <- function(y,x, lambda){
  if(x >= 1 & y == x+1){
    min(1,lambda/(x+1))
  }else if(x >= 2 & y == x-1){
    min(1,x/lambda)
  }else if(x == 0 & y == 1){
   min(1,lambda/2)
  }else{
    min(1,2/lambda)
  }
}
  1. Write a MH algorithm to draw 2000 samples from a from a Poisson pmf with λ=20\lambda = 20 starting from x0=1x_0=1.

  2. Compare the sample quantiles at probabilities c(0.1,.25,0.5, 0.75, 0.9) with the theoretical quantiles for the Poisson distribution (using the qpois function)

YOUR ANSWER:
# MH algorithm drawing samples from a Poisson(20) pmf


# comparison of quantiles between the samples and the theoretical.

Exercise 5: A Loan Portfolio

Our client is a bank with both asset and liability products in retail bank industry. Most of the bank’s assets are loans, and these loans generate the majority of the total revenue earned by the bank. Hence, it is essential for the bank to understand the proportion of loans that have a high propensity to be paid in full and those which will finally become Bad loans.

All the loans that have been issued by the bank are classified into one of four categories/states :

  1. Good Loans : These are the loans which are in progress but are given to low risk customers. We expect most of these loans will be paid up in full with time.
  2. Risky loans : These are also the loans which are in progress but are given to medium or high risk customers. We expect a good number of these customers will default.
  3. Bad loans : The customer to whom these loans were given have already defaulted.
  4. Paid up loans : These loans have already been paid in full.

Your research has suggested the following state transition matrix for the bank loans

# the 1-year state transition matrix for loans is:
P <- 
  matrix(
    c(0.7, 0.05, 0.03, 0.22,
      0.05, 0.55, 0.35, 0.05,
      0, 0, 1, 0, 
      0, 0, 0, 1
      )
    , nrow =4, byrow = TRUE
  )

Answer the following questions, given that the bank’s records indicate 60% of the loans on the books are ‘good loans’ and 40% are ‘risky loans’.

YOUR ANSWER:
# describe the current loan portfolio by state at the end of one year and two years.  


# describe the current loan portfolio by state at the end of two years.
# What percentage of good loans are paid in full after 20 years

$$___? percent of good loans are paid in full after 20 years

# What percentage of good loans are paid in full after 20 years

___? percent of risk loans are paid in full after 20 years

Exercise 6: A New Covariance Calculation

Given a1Tt=1Tat\bar{a}\equiv\frac{1}{T}\sum_{t=1}^T a_t (the mean) and sequences {at}t=1T\left\{a_t\right\}_{t=1}^T and {bt}t=1T\left\{b_t\right\}_{t=1}^T, we have the following calculation for the population covariance:

1Tt=1T(ata)(btb) \frac{1}{T}\sum_{t=1}^T(a_t-\bar{a})(b_t-\bar{b})

Statisticians have long interpreted this as a measure of how much the measurements ata_t and btb_t co-vary. If at=bta_t=b_t then this formula calculates the population variance of ata_t, a measure of the dispersion or spread of the values of ata_t.

Some data scientists have suggested that this formula does not measure variation at all, since self-evidently there are no differences in the formula that would capture the variation. Instead, data scientists have suggested that the following calculation should be use to estimate co-variation. With Δkatat+kat\Delta_k a_t \equiv a_{t+k}-a_t (the kthk^{th} difference) the co-variation is calculated as:

1T2k=1T1t=1TkΔkatΔkbt\frac{1}{T^2}\sum_{k=1}^{T-1}\sum_{t=1}^{T-k} \Delta_k a_t \Delta_k b_t

Use samples of 10, 20, 50, 100, 200, 500, 1000, 2000, 5000 from a𝒩(1,1)a\sim\mathcal{N}(1,1) and ba+𝒩(4,2)b\sim a+\mathcal{N}(4,2), and calculate the covariance both ways, putting the result into a table.

How do the calculations compare?

Make sure to use the calculations above, not the base R function.

YOUR ANSWER:
# NOTE: you may be able to use some version of the following expression as part of your solution:
# (dplyr::lead(a,k) - a)
set.seed(8740)

You’re done and ready to submit your work! Save, stage, commit, and push all remaining changes. You can use the commit message “Done with Lab 9!” , and make sure you have committed and pushed all changed files to GitHub (your Git pane in RStudio should be empty) and that all documents are updated in your repo on GitHub.

Submission

I will pull (copy) everyone’s repository submissions at 5:00pm on the Sunday following class, and I will work only with these copies, so anything submitted after 5:00pm will not be graded. (don’t forget to commit and then push your work!)

Grading

Total points available: 30 points.

Component Points
Ex 1 - 5 30