Posts by David Hood
-
A basic PCA of the entire data is essentially useless. PC1 puts National way off by itself with everyone else clumped, PC2 puts Labour off by itself, and PC3 puts the Greens off. My working theory is that, as National got the most votes/representation in the survey, it has the most distinctive vocabulary for identifying it as different, but this is a function of sample size (and vocabulary richness, I suppose) rather than inherent differences.
I am going to play around with it, experimenting with and inventing approaches, as my hunch is there is something interesting in here.
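For anyone wanting to see what "PC1 puts one group off by itself" means mechanically, here is a minimal Python sketch (not the R code used for the analysis). It assumes a small, made-up matrix of term proportions with one row per voter group, and finds each row's score on the first principal component by power iteration. The function name and the toy numbers are illustrative only, and the toy data says nothing about whether the sample-size theory is right.

```python
import math

def first_pc_scores(matrix, iterations=200):
    """Score of each row (voter group) on the first principal component.

    matrix: list of rows; each row holds term proportions for one group.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Centre each column (term) on its mean across rows (groups).
    col_means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    centred = [[row[j] - col_means[j] for j in range(n_cols)] for row in matrix]
    # Gram matrix of the centred rows (groups x groups). Its leading
    # eigenvector, scaled by the square root of its eigenvalue, gives
    # the PC1 scores of the rows.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centred]
            for r1 in centred]
    # Power iteration; the start vector is arbitrary but must not be constant,
    # because a constant vector is in the null space of the centred data.
    vec = [float(i + 1) for i in range(n_rows)]
    for _ in range(iterations):
        new = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0:
            return [0.0] * n_rows  # no variance at all in the data
        vec = [x / norm for x in new]
    # Rayleigh quotient gives the eigenvalue for the converged vector.
    eigenvalue = sum(v * sum(g * w for g, w in zip(row, vec))
                     for v, row in zip(vec, gram))
    return [v * math.sqrt(max(eigenvalue, 0.0)) for v in vec]

# Made-up proportions: the first group leans heavily on the first term.
toy = [
    [0.6, 0.1, 0.3],
    [0.2, 0.4, 0.4],
    [0.2, 0.5, 0.3],
]
scores = first_pc_scores(toy)
```

In the toy data the first row ends up with the largest score in absolute size and the opposite sign to the other two, which is the "one group off by itself, everyone else clumped" pattern. The overall sign of a principal component is arbitrary.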
-
So that is the change from a to b as a proportion of the combined proportions? Similar I suspect, but I used an aggregate figure for b to give a common baseline to compare multiple entries, and didn't divide. I imagine the not dividing has the effect that big differences between big proportions matter more than little differences between little proportions, but I am comfortable with that in the story this data is telling.
I thought this evening I might run a principal components analysis on the term frequencies and see what kinds of party clustering it shows up. I think the left/right axis Ben found in the previous election results from the survey generally looks like it is mirrored in the economy/poverty axis (which are inverses of each other except in the middle).
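A small worked comparison of the two measures, as a Python sketch with made-up numbers. The divided form `(a - b) / (a + b)` is only my reading of "change as a proportion of the combined proportions" and is an assumption; the function names are illustrative.

```python
def plain_difference(a, baseline):
    """Undivided difference from a baseline proportion."""
    return a - baseline

def relative_difference(a, b):
    """Assumed form of the divided measure: difference over combined size."""
    return (a - b) / (a + b)

# Hypothetical proportions, not survey figures.
big = (0.30, 0.20)    # a commonly used term
small = (0.03, 0.01)  # a rarely used term

big_plain, small_plain = plain_difference(*big), plain_difference(*small)
big_rel, small_rel = relative_difference(*big), relative_difference(*small)
```

Without dividing, the common term shows the larger gap (about 0.10 against 0.02). With dividing, the rare term does (about 0.5 against 0.2). That is the trade-off: the undivided version favours terms that many people actually used.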
-
Speaker: What we think and how we vote, in reply to
So you would get a -economy for a group if, for example, economy was not one of the terms mentioned at all for that group, putting it well below the mean?
Correct, assuming that the mean was not just above zero (one group using it and all the rest not doing so would be a case like moral for the Conservatives, where the mean is near zero so it shows up for no-one else).
You could look at it as inspired by ANOVA; I would probably view it as a cross between a text analysis term document matrix and outlier detection, but I will (idiosyncratically, as you say) draw on what musical instruments inspire when writing a data analysis song.
-
Speaker: What we think and how we vote, in reply to
Following the term economy through:
I get the number of mentions of every term for every party. This gives the raw number of mentions of every term, for all of the parties tested.
Then I generate the percentages for each term for each party, dividing the raw number by the total number of terms for each party (which normalises for the number of terms), effectively getting the proportion of each term among the party's terms.
Then I calculate the mean proportion for each term (this is not usage of people, this is the mean of occurrences between party voter groups).
Then I subtract the mean from each term for each party, to get how far from "typical" each party's frequency of the term is.
Then I rank all the terms in order from most atypical to least atypical (closest to the mean) and take the 100 most atypical.
Then I reorganise them into a list for each party.
Because economy is a very polarised term (parties' voters either used it a lot or not at all), there was no party that was near the mean in its usage, so it showed up in all the parties. Similarly poverty, but the usage was in the opposite direction. + for unusually high usage, - for unusually low usage. So while you are correct in the sign, it is based on being the largest differences among all terms in the data.
Then we've got terms like moral for Conservatives or environment for Greens, where the term was used enough to put it in the overall 100 most extreme terms, but other parties' usage of the terms was closer to the mean so did not make the shortlist for anyone else.
I've put the R code up that I used if it helps, but it wasn't written for an audience; it was just something I whipped up quickly and may be a bit dense in places.
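The steps above can also be sketched compactly in Python. This is not the R code mentioned, just a minimal restatement under some assumptions: the input is a dict mapping each party to a non-empty list of already-cleaned terms, and the names `atypical_terms` and `top_n` are made up for the illustration.

```python
from collections import Counter

def atypical_terms(terms_by_party, top_n=100):
    """terms_by_party: party -> list of terms (assumed non-empty per party)."""
    parties = sorted(terms_by_party)
    # Raw number of mentions of every term for every party.
    counts = {p: Counter(terms_by_party[p]) for p in parties}
    vocab = sorted(set().union(*counts.values()))
    # Proportion of each term among each party's terms.
    props = {
        p: {t: counts[p][t] / sum(counts[p].values()) for t in vocab}
        for p in parties
    }
    # Mean proportion per term across party groups (not across people).
    mean = {t: sum(props[p][t] for p in parties) / len(parties) for t in vocab}
    # Signed distance from "typical" for every (party, term) pair.
    devs = [(props[p][t] - mean[t], p, t) for p in parties for t in vocab]
    # Most atypical first, judged by absolute distance from the mean.
    devs.sort(key=lambda d: abs(d[0]), reverse=True)
    # Reorganise the top_n pairs into a signed list for each party.
    result = {p: [] for p in parties}
    for dev, party, term in devs[:top_n]:
        result[party].append(("+" if dev > 0 else "-") + term)
    return result

# Toy input with two hypothetical groups.
example = {
    "A": ["economy", "economy", "economy", "tax"],
    "B": ["poverty", "poverty", "poverty", "tax"],
}
result = atypical_terms(example, top_n=4)
```

In the toy input, tax is used equally by both groups so it sits on the mean and drops out, while economy and poverty show up for both groups with opposite signs. A term a group never uses still gets a proportion of zero, which is how it can appear with a minus sign.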
-
I did a better way of looking at it: rather than absolute amounts, this is how often voter groups are (+) or are not (-) using terms relative to other voter groups' use of those terms. The more terms beside each party, the more that party had unusual term frequencies (I took the most unusual 100 and aggregated them by party). For each party the terms are in decreasing order of unusualness.
Conservative:-poverty, +economy, -health, +government, +land, -child, -jobs, +justice, +moral, +leaders, +law, +management, +don't_know, +financial, +cost, +issues, -economic
Green:+poverty, +environment, -economy, +rich, +between, +inequality, +child, +poor, +gap, -selling, +climate, +change, -employment, -health
Internet_Mana_Party:-economy, +schools, +feeding, +wages, +children, +poverty, +sales, +marijuana, +support, +tppa, +assets, +low, -education, +issue, +important, +income, +child, -health, -land, -dirty_politics, +housing
Labour:-economy, +poverty, +between, +rich, +health, +poor, +gap, +inequality, +wage, +assets
Māori_Party:+health, +education, +economy, +employment, +selling, +whanau, +economic, +settlements, +economics, -issues, +growing, +leadership, +cost, +poverty, -rich, +lack, -government, +families
National:+economy, -poverty, +stability, +economic, -employment, -selling, +stable, +keeping, -poor, -assets
Did_Not_Vote:-economy, -poverty, +follow
NZ_First:+selling, -poverty, +people, +immigration, -economy, +overseas, +land
Focusing on things common to all groups, there is a pretty clear division that comes out of the data between:
(poverty discussed, economy not discussed) parties, and
(economy discussed, poverty not discussed) parties
With the two parties that thought of themselves in the middle (NZ First and the Māori Party) forming their own opposing groups:
(poverty discussed, economy discussed) Māori Party and
(economy not discussed, poverty not discussed) NZ First and Did Not Vote
Many of the parties can then be seen to have their specialist themes by their supporters, as well as common ground.
-
I should really link up Dirty Politics and Don't Know to be single terms. Don't was on my list of excluded terms, hence Conservatives were in the know, rather than don't know.
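One way to do that linking, as a rough Python sketch rather than the approach actually used: join the known phrases into single tokens before the stop words are removed. The phrase table, the stop word set and the function name are all illustrative assumptions.

```python
import re

# Phrases to treat as single terms (assumed list, extend as needed).
PHRASES = {"dirty politics": "dirty_politics", "don't know": "don't_know"}
# Illustrative stop words; "don't" is included to mirror the problem described.
STOP_WORDS = {"a", "the", "not", "don't"}

def tokenize(text):
    text = text.lower()
    # Join phrases first, so "don't" is not stripped out of "don't know".
    for phrase, token in PHRASES.items():
        text = text.replace(phrase, token)
    tokens = re.findall(r"[a-z_']+", text)
    return [t for t in tokens if t not in STOP_WORDS]

linked = tokenize("Don't know, the dirty politics")
```

The order matters: if stop words were removed first, only "know" would be left. This sketch only handles straight apostrophes, so curly ones in the free text would need normalising beforehand.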
-
I haven't done any stemming (removing suffixes to bring terms together), but here are the 10 commonest terms written by NZES respondents in the most serious issue free-text question (removing common stop words like a, the, not, etc.), for words that come up at least three times, assuming there are 10 terms that crop up at least three times. As I did nothing about sorting out ties, the tail end of each list may be a bit arbitrary, as they could be tied with things that would have made the number more than 10.
Conservative: economy, government, know, housing, issues, land, no, cost, education, employment
Green: poverty, environment, child, education, rich, between, poor, inequality, gap, economy
Internet_Mana_Party: poverty, housing, child, no, children, feeding, schools
Labour: poverty, housing, health, nz, education, between, rich, child, assets, gap
Māori_Party: economy, health, poverty, employment, selling, education, economic, nz, our, education
National: economy, economic, stability, nz, housing, poverty, stable, dirty, education, politics
NZ_First: selling, economy, nz, housing, people, land, employment, overseas, tax, poverty
Didn’t Vote: poverty, economy, nz, dirty, politics, housing, health, jobs, know, child
ACT and United Future had insufficient representation to have any words occur 3 times or more.
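For the curious, the counting behind lists like these can be sketched in a few lines of Python. This is an illustration, not the code that produced the lists: it assumes one party's free-text answers arrive as a list of strings, uses a tiny stand-in stop word set, and splits on whitespace only.

```python
from collections import Counter

STOP_WORDS = {"a", "the", "not", "and"}  # illustrative, far from complete

def top_terms(responses, n=10, min_count=3):
    """Commonest terms across one party's responses, with a minimum count."""
    counts = Counter(
        word
        for response in responses
        for word in response.lower().split()
        if word not in STOP_WORDS
    )
    # Keep only words that occur at least min_count times, commonest first.
    frequent = [(w, c) for w, c in counts.most_common() if c >= min_count]
    # Ties at the cut-off are not resolved in any principled way.
    return [w for w, _ in frequent[:n]]

# Made-up responses for a single hypothetical voter group.
responses = [
    "the economy", "economy and housing", "housing",
    "economy", "housing not poverty", "poverty",
]
common = top_terms(responses)
```

With the made-up responses only economy and housing reach three mentions, so the list has two entries rather than ten. A group with too few responses returns an empty list, as happened with ACT and United Future.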
-
There is both a series of questions rating the importance of standard issues from the last election, and a free-text "what is the most important issue to you" kind of question where people could write whatever they feel (which has also been coded into general and specific subject categories).
Because I'm curious, this evening I'll take the free text and figure out the commonest words written by party voted for.
-
Speaker: What we think and how we vote, in reply to
Ben, you might get a sense I have my doubts that the Left-Right umbrella terms have any more use than "I am voting National so I must be on the right but I feel like they are a bit more extreme than me" rather than as a determinant. It is just, as you said, there are a lot of questions that place things in that context.
Similar survey, with some potential of linking results over time (he said, nodding towards the future): a mix of general political questions, demographics, and questions about issues around election time.