KMT-TPP Coop polling

I am taking a course in sampling theory and design this semester.

I believe data collection, annotation, and processing are crucial and their importance is sometimes underestimated.

I am still a beginner at sampling. One example is analyzing Bridge by generating random boards with redeal: I am never sure which specifications would make the analysis more trustworthy or accurate for the specific scenario I am considering, and I have always wondered to what extent the sample size matters relative to the number of theoretically possible combinations.
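As a rough way to think about that last question, here is a minimal sketch. It assumes the deals are drawn by simple random sampling and that the quantity of interest is a proportion (for example, how often a contract makes). The helper name `standard_error` and the value `p = 0.5` are illustrative choices of mine, not part of redeal. Under those assumptions, the standard error is driven almost entirely by the sample size, because the finite population correction is essentially 1 when the population is as large as the number of possible Bridge deals (on the order of 5 × 10^28).

```python
import math

# Number of distinct Bridge deals: 52! / (13!)^4
TOTAL_DEALS = math.factorial(52) // math.factorial(13) ** 4

def standard_error(p, n, population=None):
    """Standard error of an estimated proportion p from n sampled deals.

    If `population` is given, apply the finite population correction
    for sampling without replacement.
    """
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return se

for n in (100, 1_000, 10_000, 100_000):
    plain = standard_error(0.5, n)
    corrected = standard_error(0.5, n, TOTAL_DEALS)
    print(f"n={n:>7}: SE={plain:.5f}, with correction={corrected:.5f}")
```

The two columns should agree to the printed precision, which suggests that the fraction of all possible deals being sampled matters far less than the absolute number of deals generated. Whether the generated deals match the scenario being analyzed is a separate question that this sketch does not address.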

The other topic that intrigues me is political polling. Take the potential cooperation between the KMT and the TPP in the most recent Taiwan election: there was huge disagreement over the polling statistics and their margins of error. I am posting the code for my simulation here; a more theoretical discussion may follow later.

import numpy as np

np.random.seed(111)

def estimate_CI(supportKeHou, supportLaiXiao1, supportHouKe, supportLaiXiao2, 
                simulation_count=10000, sample_size=1000):
    '''
    Simulation of LanBaiHe (the "blue-white" KMT-TPP cooperation).
    supportKeHou: float in [0, 1], support rate for KeHou against LaiXiao
    supportHouKe: float in [0, 1], support rate for HouKe against LaiXiao
    supportLaiXiao1 & supportLaiXiao2: float in [0, 1], support rate for LaiXiao against KeHou (1) or HouKe (2)
    simulation_count: positive int, number of simulations to run
    sample_size: positive int, number of votes cast in the poll
    Prints the mean, standard deviation, and 95% CI of the difference between the two leads.
    '''
    
    # Randomly draw sample_size elements from [1, -1, 0], using the respective support rates as probabilities,
    # where 1 denotes LanBaiHe, -1 denotes LaiXiao, and 0 denotes voting for neither of them.
    # The sum of the draws divided by sample_size is the support rate lead of LanBaiHe over LaiXiao.
    # Repeat this random draw simulation_count times.
    result1 = np.array([sum(np.random.choice([1, -1, 0], sample_size, 
                                             p=[supportKeHou, supportLaiXiao1, 1- supportKeHou - supportLaiXiao1])) / sample_size 
                        for _ in range(simulation_count)])
    result2 = np.array([sum(np.random.choice([1, -1, 0], sample_size, 
                                             p=[supportHouKe, supportLaiXiao2, 1- supportHouKe - supportLaiXiao2])) / sample_size 
                        for _ in range(simulation_count)])
    # Get the difference between these two leads
    # Calculate its mean and standard deviation
    dif_mean = np.mean(result1 - result2)
    dif_std = np.std(result1 - result2)
    print(f"KeHou lead over LaiXiao minus HouKe lead over LaiXiao: {dif_mean}\nStandard deviation: {dif_std}\n95% CI: [{dif_mean - 1.96 * dif_std}, {dif_mean + 1.96 * dif_std}]")

# poll_1
print('Poll #1')
estimate_CI(0.483, 0.392, 0.461, 0.416, sample_size=2046)

# poll_4
print('Poll #4')
estimate_CI(0.4601, 0.3222, 0.4082, 0.3586, sample_size=1112)
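
As a cross-check on the simulation, the same quantities can be approximated in closed form. This is a minimal sketch under the same assumptions the simulation makes: each matchup is an independent multinomial sample of size `sample_size`, so the variance of a single lead is (p_a + p_b - (p_a - p_b)^2) / n and the variances of the two leads add. The function name `analytic_ci` and its parameter names are hypothetical; they are not part of the code above.

```python
import math

def analytic_ci(support_a1, support_b1, support_a2, support_b2,
                sample_size, z=1.96):
    """Closed-form mean, std, and CI for (lead in matchup 1) - (lead in matchup 2).

    Assumes the two matchups are independent samples of equal size,
    mirroring the simulation rather than any particular real poll design.
    """
    lead1 = support_a1 - support_b1
    lead2 = support_a2 - support_b2
    # Variance of a lead estimated from one multinomial sample
    var1 = (support_a1 + support_b1 - lead1 ** 2) / sample_size
    var2 = (support_a2 + support_b2 - lead2 ** 2) / sample_size
    mean = lead1 - lead2
    std = math.sqrt(var1 + var2)
    return mean, std, (mean - z * std, mean + z * std)

mean, std, ci = analytic_ci(0.483, 0.392, 0.461, 0.416, sample_size=2046)
print(f"Poll #1 (analytic): mean={mean:.4f}, std={std:.4f}, "
      f"95% CI=[{ci[0]:.4f}, {ci[1]:.4f}]")
```

If the simulated standard deviation is far from the analytic one, that would point to a bug in one of the two. Note that if a real poll asks both matchup questions of the same respondents, the two leads are correlated and the independence assumption here (and in the simulation) may overstate the variance.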

https://fredfreddo.github.io/2024/01/28/KMT-TPP-Coop-polling/

Author

Fredfreddo

Posted on

January 28, 2024
