KMT-TPP Coop polling

I am taking a course in sampling theory and design this semester.

I believe data collection, annotation, and processing are crucial and their importance is sometimes underestimated.

I am still a beginner at sampling. One example is analysing Bridge by generating random boards with redeal: I am never sure which specifications would make the analysis more trustworthy or accurate for the specific scenario I am considering, and I have always wondered to what extent the sample size matters relative to the number of theoretically possible deals.
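As a rough illustration of the sample-size question, here is a minimal sketch of my own (not taken from redeal). It assumes that each generated deal is an independent uniform draw and that the event of interest has a hypothetical probability of 0.5. Under those assumptions, the standard error of an estimated frequency depends on the number of deals sampled, while the finite population correction that would account for the total number of possible deals is indistinguishable from 1.

```python
import math

# Number of distinct bridge deals: 52 cards split into four ordered hands of 13.
TOTAL_DEALS = math.factorial(52) // math.factorial(13) ** 4


def standard_error(p, n):
    """Standard error of a proportion estimated from n independent random deals."""
    return math.sqrt(p * (1 - p) / n)


def finite_population_correction(n, population=TOTAL_DEALS):
    """Factor applied to the standard error when sampling without replacement."""
    return math.sqrt((population - n) / (population - 1))


print(f"Possible deals: {TOTAL_DEALS:.3e}")
for n in (1_000, 10_000, 100_000):
    # p = 0.5 is a hypothetical event probability (the worst case for the SE).
    se = standard_error(0.5, n)
    fpc = finite_population_correction(n)
    print(f"n={n:>7}: SE={se:.4f}, FPC={fpc:.12f}")
```

In this sketch, going from 1,000 to 100,000 deals shrinks the standard error by a factor of 10, whereas the size of the space of possible deals plays no practical role.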

The other topic that intrigues me is political polling. Take the proposed cooperation between the KMT and the TPP in the most recent Taiwan election: there was huge disagreement over the polling statistics and their margins of error. I am posting the code for my simulation here; a more theoretical discussion may follow later.

import numpy as np

np.random.seed(111)

def estimate_CI(supportKeHou, supportLaiXiao1, supportHouKe, supportLaiXiao2,
                simulation_count=10000, sample_size=1000):
    '''
    Simulation of LanBaiHe
    supportKeHou: float in [0,1], support rate for KeHou against LaiXiao
    supportHouKe: float in [0,1], support rate for HouKe against LaiXiao
    supportLaiXiao1 & supportLaiXiao2: float in [0,1], support rate for LaiXiao against KeHou (1) or HouKe (2)
    simulation_count: positive int, number of simulations to run
    sample_size: positive int, number of votes cast in the poll
    '''

    # Randomly choose sample_size elements from [1, -1, 0] with the respective support rates as probabilities,
    # where 1 denotes LanBaiHe, -1 denotes LaiXiao, and 0 denotes voting for neither of them.
    # The sum of the choices divided by sample_size is the support rate difference of LanBaiHe over LaiXiao.
    # Repeat this random choice simulation_count times.
    result1 = np.array([np.random.choice([1, -1, 0], sample_size,
                                         p=[supportKeHou, supportLaiXiao1, 1 - supportKeHou - supportLaiXiao1]).sum() / sample_size
                        for _ in range(simulation_count)])
    result2 = np.array([np.random.choice([1, -1, 0], sample_size,
                                         p=[supportHouKe, supportLaiXiao2, 1 - supportHouKe - supportLaiXiao2]).sum() / sample_size
                        for _ in range(simulation_count)])
    # Take the difference of these two differences,
    # then calculate its mean and standard deviation.
    dif_mean = np.mean(result1 - result2)
    dif_std = np.std(result1 - result2)
    print(f"KeHou lead over LaiXiao - HouKe lead over LaiXiao: {dif_mean}\nStandard deviation: {dif_std}\n95% CI: [{dif_mean - 1.96 * dif_std}, {dif_mean + 1.96 * dif_std}]")

# poll_1
print('Poll #1')
estimate_CI(0.483, 0.392, 0.461, 0.416, sample_size=2046)

# poll_4
print('Poll #4')
estimate_CI(0.4601, 0.3222, 0.4082, 0.3586, sample_size=1112)
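As a cross-check on the simulation, the same quantities can be computed in closed form. This is a minimal sketch under the same assumption the simulation makes, namely that the two match-ups are independent samples of equal size; the helper names `margin_variance` and `analytic_ci` are my own. For one vote coded +1, -1 or 0 with probabilities p, q and 1 - p - q, the mean is p - q and the variance is p + q - (p - q)^2, so the variance of the polled lead is that value divided by the sample size, and the variances of the two independent leads add.

```python
import math


def margin_variance(p_for, p_against, n):
    """Variance of the mean of n votes coded +1 (for), -1 (against), 0 (neither)."""
    mean = p_for - p_against           # E[X]
    second_moment = p_for + p_against  # E[X^2], since (+1)^2 = (-1)^2 = 1
    return (second_moment - mean ** 2) / n


def analytic_ci(p1, q1, p2, q2, n, z=1.96):
    """Difference of two leads, its standard deviation, and a normal-approximation CI.

    Assumes the two match-ups are independent samples of size n each,
    mirroring the simulation (an assumption, not a property of the real polls).
    """
    diff = (p1 - q1) - (p2 - q2)
    std = math.sqrt(margin_variance(p1, q1, n) + margin_variance(p2, q2, n))
    return diff, std, (diff - z * std, diff + z * std)


for name, args in [("Poll #1", (0.483, 0.392, 0.461, 0.416, 2046)),
                   ("Poll #4", (0.4601, 0.3222, 0.4082, 0.3586, 1112))]:
    diff, std, (low, high) = analytic_ci(*args)
    print(f"{name}: diff={diff:.4f}, std={std:.4f}, 95% CI=[{low:.4f}, {high:.4f}]")
```

If the simulation is behaving as intended, its mean and standard deviation should land close to these values, with the gap shrinking as simulation_count grows.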

https://fredfreddo.github.io/2024/01/28/KMT-TPP-Coop-polling/
Author: Fredfreddo
Posted on: January 28, 2024