INFO 370 - Lecture 17 - November 29, 2004

Notes by: Fortier, Collett, Burrell (Finally!!), Egaas, Podhola, Yaptinchay, Horm, Prins

Quiz 3
> Next Wednesday
> Not online!! (Boo!)
> Study
  * Notes from experiments
  * Qualitative data analysis slides
  * Corresponding book chapters

Idiographic - pertains to one person / a small group of people
Nomothetic - applies to large groups of people / generalizable

Case-oriented analysis: variables are not defined until you begin the analysis.
> Most closely associated with qualitative research.
Variable-oriented analysis: variables are defined before the analysis begins.
> Most closely associated with quantitative research.

Qualitative Analysis
> Memo - notes to yourself about the coding
  - Types of notes:
    * Code Note - observation about the subject, clarification of terminology, definition of variables
    * Operational Note - about the subject, surroundings, random notes about the setting, context, random observations
    * Theoretical Note - about the subject of study (pertains to the topic at hand)
> Coding - translates qualitative data (interviews, recordings, written records, and so on) into variables.

Concept Mapping - finding trends in your notes; a graphical presentation of theoretical notes

Threats to internal validity:
> Selection Bias - skewed results when groups differ (counter with stratified random selection).
> Endogenous Change - subjects change as a result of the study, e.g. learning (counter by randomizing the questions asked).
> Contamination - when different groups come in contact with one another (need to separate the groups).
> Treatment Misidentification - a variable not accounted for (e.g. users perform better because they are given more positive feedback).
> Exogenous Change - subjects change as a result of an external change (earthquake, presidential elections, roommates playing WoW all night long, war, famine, pestilence, death, loss of job, Aaron Smith sucking up)

Observation Techniques:
> Covert Observer - secret observer (an observer who does not inform subjects that he is observing). Can occur with both participatory and non-participatory observation.
> Participatory Observer - an observer who participates with the test subject (making suggestions, comments, etc.)
> Non-participatory Observation
> Obtrusive Observation - users know they are being observed.
> Unobtrusive Observation - antonym of above (a.k.a. covert observer)

//end quiz notes============================================================

GOAL: give us the vocab we need to talk about the results of our studies

Data Analysis
ANOVA:      $F = \frac{MS_{between}}{MS_{within}}$ (mean square between groups / mean square within groups)
Regression: $F = \frac{MS_{model}}{MS_{error}}$ (mean square of the regression model / mean square error, a.k.a. residual)

Begin Lecture Slides --

Frequency Distribution
[insert crappy numbers table here]
> Univariate analysis <-- ?
> Required information for frequency tables
  - table number and title
  - labels for the categories of the variables
  - column headings
  - the number of missing cases

Histograms
> Histograms show the spatial distribution of the numbers; they show actual values
> Bar charts are often used for nominal values.
> Ordinal variables
> Tail is to the right = positive skew!
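A minimal Python sketch of the frequency-table checklist above, under stated assumptions: the data are hypothetical survey answers, missing cases are coded as `None`, and the helper names `frequency_table` and `print_table` are made up for illustration. It reports a title, category labels, column headings, counts, percent of valid cases, and the number of missing cases.

```python
from collections import Counter

def frequency_table(values, title="Table 1. Frequency distribution"):
    """Count each category and its percent of valid (non-missing) cases.

    Assumption for this sketch: missing cases are coded as None and are
    reported separately rather than counted as a category.
    """
    valid = [v for v in values if v is not None]
    missing = len(values) - len(valid)
    counts = Counter(valid)
    rows = []
    for category in sorted(counts):
        pct = 100.0 * counts[category] / len(valid)
        rows.append((category, counts[category], round(pct, 1)))
    return title, rows, missing

def print_table(title, rows, missing):
    # Title, column headings, one row per category, then the missing cases
    print(title)
    print(f"{'Category':<12}{'Count':>6}{'Percent':>9}")
    for category, count, pct in rows:
        print(f"{str(category):<12}{count:>6}{pct:>9.1f}")
    print(f"Missing cases: {missing}")

if __name__ == "__main__":
    # Hypothetical answers (hours online per day); None = no answer given
    answers = [1, 2, 2, 3, 3, 3, 4, None, 2, 3]
    print_table(*frequency_table(answers))
```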
Describing Frequency Distributions (this is what we notice)
> Shape
  - Normal distribution (bell curve)
  - Negative skew
    * tail toward lower scores
  - Positive skew
    * tail toward higher scores
> Dispersion
  - the larger the standard deviation, the wider the dispersion
> Central tendency

Central Tendency
> Mean (arithmetic mean)(X̄)
  - Sum of all the observations / n
> Median
  - Value that divides the distribution so that an equal number of values are above the median and an equal number below.
  - Best used when the data are skewed
> Mode
  - Value with the greatest frequency
  - You'd choose the mode over the mean when you're using nominal variables

Mode
> Best for nominal variables
> Problems
  - the most common value may not be typical
  - there may be more than one mode
  - unstable
  - can be manipulated
> Dispersion
  - variation ratio (v)
    * % of people not in the modal category

Median
> Preferred for ordinal variables
> Preferred for skewed distributions
> When the median falls between two distinct numbers, split the difference (e.g., 2 and 3 give 2.5)

Median - Dispersion
> The nth percentile of a set of numbers is a value such that n percent of the numbers fall below it and the rest fall above.
  - The median is the 50th percentile
  - The lower quartile is the 25th percentile
  - The upper quartile is the 75th percentile
> Five-number summary
  - Median, quartiles and extremes (minimum, lower quartile, median, upper quartile, maximum)

Mean
> Uses the actual numerical values of the observations
> Most common measure of center
> Makes sense only for interval or ratio data
> Frequently computed for ordinal variables as well
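The central-tendency and dispersion measures above can be sketched in plain Python. The helper names and the sample data are hypothetical. Quartiles here are taken as medians of the lower and upper halves, which is only one of several conventions; SPSS and other packages may report slightly different quartiles.

```python
from collections import Counter

def mean(xs):
    # Sum all the observations / n
    return sum(xs) / len(xs)

def median(xs):
    # Middle value of the sorted data; with an even count, split the
    # difference between the two middle values (e.g. 2 and 3 -> 2.5)
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def modes(xs):
    # Value(s) with the greatest frequency; there may be more than one
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def variation_ratio(xs):
    # Proportion of cases NOT in the modal category (multiply by 100 for %)
    counts = Counter(xs)
    return 1 - max(counts.values()) / len(xs)

def five_number_summary(xs):
    # Minimum, lower quartile, median, upper quartile, maximum.
    # Assumed convention: quartiles = medians of the lower/upper halves,
    # excluding the middle value when n is odd.
    s = sorted(xs)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two observations")
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return (s[0], median(lower), median(s), median(upper), s[-1])
```

With the hypothetical data `[1, 2, 2, 3, 4, 7, 9]`, the mean (4.0) sits above the median (3), which fits the long right tail (positive skew) of that data, and the five-number summary is `(1, 2, 3, 7, 9)`.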
Example Data

Variance (s²)
> Calculate the mean for the variable
> Take each observation and subtract the mean from it (observed minus expected)
> Square the result from the above
> Add (sum) all the individual results (called the sum of squares)
> Divide by n (many texts divide by n - 1 when estimating the variance from a sample)
> ∑((x - avg(x))²)/num_observations = s² (Variance)
> Chi-squared is computed in a related way (squared differences between observed and expected counts); a large chi-squared value (small p-value) is evidence of an association

Analysis of variance (ANOVA)
> Compares the center and dispersion of the dependent variable (variety of Internet usage) across groups of the independent variable (high and low education). If there is a difference between the groups, it can be attributed to level of education.
> Uses F instead of chi-squared
> Looks at the mean square within the groups, then the difference between the groups
  - F = mean square between the groups / mean square within the groups
> You use this when the independent variable is nominal (categorical) and the dependent variable is interval or ratio

Association
> Association in bivariate data means that certain values of one variable tend to occur more often with some values of the second variable than with other values of that variable (Moore p.242)

Cross-Tabulation Tables (great for ordinal variables)
> Designate the X variable and the Y variable
> Place the values of X across the table
> Draw a column for each X value
> Place the values of Y down the table
> Draw a row for each Y value
> Insert frequencies into each CELL
> Compute totals (MARGINALS) for each column and row
  * Independent variable goes in the columns, dependent variable in the rows *

Describing association (great for ratio variables)
> Scattergram - a graph that can be used to show how two variables are related to one another
> Straight line - perfect association
> Not a straight line -> not a strong association
> The closer the points get to a straight line, the stronger the association
> Describe in terms of nature, strength, and direction
  - positive slope = positive association, and vice versa
  - linear or curvilinear = peaks and valleys?
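A sketch of the variance steps and the ANOVA F ratio in Python, assuming hypothetical "variety of Internet usage" scores for a low- and a high-education group. `variance` divides by n as in the notes unless `sample=True`; `one_way_anova_f` is a hypothetical helper that computes the F statistic only, not a p-value.

```python
def mean(xs):
    return sum(xs) / len(xs)

def variance(xs, sample=False):
    # Steps from the notes: mean, subtract it, square, sum (sum of squares),
    # then divide by n (or by n - 1 if sample=True)
    m = mean(xs)
    ss = sum((x - m) ** 2 for x in xs)
    return ss / (len(xs) - 1 if sample else len(xs))

def one_way_anova_f(groups):
    """F = mean square between groups / mean square within groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations
    # Between groups: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: how far each observation is from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)   # degrees of freedom: k - 1
    ms_within = ss_within / (n - k)     # degrees of freedom: n - k
    return ms_between / ms_within

if __name__ == "__main__":
    low_edu = [2, 3, 4]    # hypothetical usage scores
    high_edu = [5, 6, 7]
    print(one_way_anova_f([low_edu, high_edu]))
```

For these two groups the mean square between is 13.5 and the mean square within is 1.0, so F = 13.5. If a p-value is needed too, `scipy.stats.f_oneway` returns the F statistic together with its p-value.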
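The cross-tabulation recipe can be sketched in Python as follows (hypothetical helper names and data): X (independent) values become columns, Y (dependent) values become rows, each cell holds a frequency, and the marginals are the row and column totals.

```python
from collections import Counter

def crosstab(x_values, y_values):
    """Cross-tabulate paired observations of X (columns) and Y (rows)."""
    cells = Counter(zip(x_values, y_values))   # frequency of each (x, y) pair
    cols = sorted(set(x_values))
    rows = sorted(set(y_values))
    table = {y: {x: cells[(x, y)] for x in cols} for y in rows}
    # Marginals: totals for each row and each column
    row_totals = {y: sum(table[y].values()) for y in rows}
    col_totals = {x: sum(table[y][x] for y in rows) for x in cols}
    return cols, rows, table, row_totals, col_totals

def print_crosstab(cols, rows, table, row_totals, col_totals):
    header = "Y \\ X".ljust(10) + "".join(str(x).rjust(8) for x in cols)
    print(header + "Total".rjust(8))
    for y in rows:
        cells = "".join(str(table[y][x]).rjust(8) for x in cols)
        print(str(y).ljust(10) + cells + str(row_totals[y]).rjust(8))
    totals = "".join(str(col_totals[x]).rjust(8) for x in cols)
    print("Total".ljust(10) + totals + str(sum(col_totals.values())).rjust(8))

if __name__ == "__main__":
    education = ["low", "low", "low", "high", "high", "high", "high"]  # X
    usage = ["low", "low", "high", "high", "high", "low", "high"]      # Y
    print_crosstab(*crosstab(education, usage))
```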
Description of Scattergrams
> Strength of relationship
  - Strong
  - Moderate
  - Low
> Linearity of relationship
  - Linear
  - Curvilinear (peaks and valleys, like a roller coaster)
> Direction
  - Positive
  - Negative

Correlation coefficient
> Nominal - Phi (SPSS Cross-tabs)
> Ordinal (linear) - Gamma (SPSS Cross-tabs)
> Ratio - Pearson's r

Correlation: Pearson's r (SPSS Correlate, Bivariate)
> Interval and/or ratio variables
> Pearson product-moment coefficient (r)
  - Two normally distributed variables
  - Assumes a linear relationship
  - Can be any number from -1 to +1
  - Sign (+ or -) shows direction
  - Number (absolute value) shows strength
  - Linearity cannot be determined from the coefficient
    * e.g., r = .8913

Regression
> Plot a line, estimate "fit", predict the value of one variable from the value of another.
> Looks at the error between the predicted value (from the model) and the actual value
  - The smaller the error, the stronger the association.

Regression equation
> Y = a + bX
  - Y : dependent variable
  - a : Y intercept
  - b : slope
  - X : independent variable
> Substitute an X value into the equation and calculate the value of Y

When we do ANOVA, the null hypothesis is that all group means are equal. If we reject that, then there is an association.
When we do regression, the null hypothesis is that the slope is zero. The further the slope is from zero (relative to its standard error), the stronger the evidence of an association.

Multicollinearity - when multiple independent variables are measuring the same thing.
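Pearson's r and the regression equation Y = a + bX can be sketched in Python with ordinary least squares, a common way to fit the line that is assumed here (the lecture did not name the fitting method). The function names are hypothetical, `fit_line` assumes at least two distinct X values (otherwise the slope is undefined), and no significance test is computed.

```python
import math

def pearson_r(xs, ys):
    # r = sum((x - mx)(y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    # Ordinary least squares estimates of a (intercept) and b (slope)
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b

def predict(a, b, x):
    # Substitute an X value into Y = a + bX
    return a + b * x

def sum_squared_error(a, b, xs, ys):
    # Error between predicted and actual values; smaller = tighter fit
    return sum((y - predict(a, b, x)) ** 2 for x, y in zip(xs, ys))
```

For the hypothetical points X = 1..5 and Y = 2, 4, 6, 8, 10, r is 1.0, the fitted line is Y = 0 + 2X, and predicting at X = 6 gives 12.0; reversing Y gives r = -1.0, the same strength with the opposite direction.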