<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-6072580433200779500</id><updated>2012-02-02T15:08:49.478-05:00</updated><category term='delphi research question dissertation thesis statistics statistical consultant consulting'/><category term='tests'/><category term='dissertation editor proofread consultant'/><category term='surveys'/><category term='regression research question dissertation thesis statistics statistical consultant consulting'/><category term='heteroscedasticity variance residual error'/><category term='research question dissertation thesis statistics statistical consultant consulting'/><category term='dissertation thesis statistics statistical consultant consulting'/><category term='statistics consulting binomial Bernoulli probability'/><category term='statistics'/><category term='validity'/><category term='hypothesis null alternative testable'/><category term='statistics consultant Fisher Exact Test'/><title type='text'>Statistics Consulting Blog</title><subtitle type='html'>This blog was created to discuss various issues related to both statistics and consulting.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://statconsultant.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://statconsultant.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>StatHelper</name><uri>http://www.blogger.com/profile/09382106274285007286</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='31' src='http://2.bp.blogspot.com/-OxTzG7Gvxag/TWaf4WE4l4I/AAAAAAAAACM/AnCmPF7lFu0/s220/Jeff%2Bpic.JPG'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>17</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6072580433200779500.post-8776634638512185255</id><published>2011-02-24T12:36:00.005-05:00</published><updated>2011-05-10T16:56:00.990-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='regression research question dissertation thesis statistics statistical consultant consulting'/><title type='text'>Building the Research House</title><content type='html'>&lt;span style="font-size:130%;"&gt;&lt;br /&gt;Imagine that you have been given a huge toychest filled with toy parts and pieces. Your goal is to build a toy house from this mix of pieces. As you will see, this is not all that different from certain aspects of the research process.&lt;br /&gt;&lt;br /&gt;In much of research, our goal is to predict or explain one variable from a set of other variables using the multiple regression technique. The variable that you predict or explain is usually called the dependent variable (or response variable). The variables that do the predicting or explaining are usually called the independent variables (or predictor variables). So, in our example, the house is the dependent variable, and all of those pieces are the independent variables.&lt;br /&gt;&lt;br /&gt;We may know ahead of time that certain pieces fit well with our conception of a house. For example, if our goal is a log cabin, any toy piece made of wood might get automatically included in the mix. What about a toy necklace? Well, the beads might be used for decorations. But is it necessary to include them? 
On the other hand, why not just include every toy piece?&lt;br /&gt;&lt;br /&gt;In multiple regression, we are faced with similar decisions to make. Suppose we are trying to predict student math test scores. An obvious predictor would be exposure to training (i.e., attending a math course with content related to that math test), which can be measured by attendance. We might also include time spent studying in hours. We could go a step further and include variables such as presence of a tutor/mentor and family socioeconomic status. Technically, we could go even further...adding every toy piece from the chest...because, in reality, almost everything is related to everything in some way, shape, or form. So, how do we handle this?&lt;br /&gt;&lt;br /&gt;1. Keep the model as simple as possible. Do not include extraneous variables just for the sake of including them. Have a strong rationale for including every variable that you include in the model.&lt;br /&gt;&lt;br /&gt;2. Keep the model as complex as necessary. In contrast to the previous advice, do not omit variables simply because you decided to keep it simple. If you liked having three predictors, but ignoring a fourth predictor would lead to an incomplete "house", by all means...include the fourth predictor. What you want is a set of variables that represent the majority of what you are predicting; variables that have negligible impact can be omitted from the research.&lt;br /&gt;&lt;br /&gt;3. Ensure that your proxy variables resemble reality as much as possible. For example, in educational research, we often measure socioeconomic status using the percentage of students on free and/or reduced lunch programs as a proxy. This is currently argued to be the best proxy available.&lt;br /&gt;&lt;br /&gt;4. Base all decisions on strong theory. Attempt to find literature to justify your choice to include or ignore variables.&lt;br /&gt;&lt;br /&gt;5. 
Document anything that might help future researchers, and include this in your discussion. Did you ignore a particular variable because of a lack of theory? Future researchers could develop a theory. Did you ignore a particular variable because of the lack of a decent proxy or available data? Future researchers might tackle that problem (or not be dealing with it).&lt;br /&gt;&lt;br /&gt;6. Acknowledge that human behavior includes inherent randomness that will not be explained by any particular set of variables.&lt;br /&gt;&lt;br /&gt;So, returning to the toychest, what we want is basic but requires much planning. We want a house that uses the minimum number of toy parts. However, we want as many toy parts as necessary to ensure that we have built something resembling a house. We connect our choices to preconceived notions such as wanting a particular style of house. We tell others that we didn't include certain pieces, and we tell them why since they might want to attempt this in the future. Finally, we accept that our final product will not be a perfect toy house; however, we made every attempt to create as perfect a one as possible.&lt;br /&gt;&lt;br /&gt;For consultations, please send an e-mail to&lt;br /&gt;&lt;a href="mailto:jeffmiller.research@gmail.com?subject=" consulting="" request=""&gt;Dr. 
Miller&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6072580433200779500-8776634638512185255?l=statconsultant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://statconsultant.blogspot.com/feeds/8776634638512185255/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6072580433200779500&amp;postID=8776634638512185255' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/8776634638512185255'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/8776634638512185255'/><link rel='alternate' type='text/html' href='http://statconsultant.blogspot.com/2011/02/building-research-house.html' title='Building the Research House'/><author><name>StatHelper</name><uri>http://www.blogger.com/profile/09382106274285007286</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='31' src='http://2.bp.blogspot.com/-OxTzG7Gvxag/TWaf4WE4l4I/AAAAAAAAACM/AnCmPF7lFu0/s220/Jeff%2Bpic.JPG'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6072580433200779500.post-9123027481411323253</id><published>2011-02-21T13:04:00.004-05:00</published><updated>2011-05-10T16:56:25.792-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='regression research question dissertation thesis statistics statistical consultant consulting'/><title type='text'>The Biggest Loser (Regression Style)</title><content type='html'>&lt;span style="font-size:130%;"&gt;&lt;br /&gt;Our goal with many research questions is to determine which variable is the biggest predictor of something. 
For example, we may want to predict endorsement for a public policy from variables such as income, gender, and party loyalty. Which predictor is strongest? Alternatively, which is the biggest loser? Is it the respondent's income? Is it gender? Or, is it some unobservable such as the degree to which they are loyal to a particular party?&lt;br /&gt;&lt;br /&gt;The statistical technique we use here, multiple regression, is fairly elementary. However, the interpretation of results is often erroneous. This is because we often want to compare the regression coefficients in the output to decide which predictor is strongest.&lt;br /&gt;&lt;br /&gt;Why might this be erroneous? Consider a simple example of comparing height and temperature. Is a height of 40 inches greater than a temperature of 24 degrees? We can't even answer that question. It's a comparison of apples and oranges. Technically, they are on different scales.&lt;br /&gt;&lt;br /&gt;In statistics, we typically standardize variables to put them on a level playing field so as to more validly make comparisons. In the original example, we shouldn't compare something like income in dollars to party loyalty, which might be on a 10-point agreement scale. So, we would want to look at the standardized regression coefficients, not the unstandardized regression coefficients. (In common language, these standardized regression coefficients are sometimes loosely called "the betas"; however, we should seek clarification because, technically, the unstandardized regression coefficients use the Greek beta symbol.)&lt;br /&gt;&lt;br /&gt;The downside of standardized regression coefficients is that they are more difficult to interpret. Since the standardization puts the variables on standard deviation scales, we wind up interpreting in terms of changes in standard deviations. 
So, we would still want to look at unstandardized regression coefficients when wanting to explain how much the response variable changes on average given a one-unit change in the predictor variable. The standardized regression coefficients are the better choice when wanting to determine which variable is the strongest predictor of the response variable...the biggest winner...or the biggest loser.&lt;br /&gt;For consultations, please send an e-mail to&lt;br /&gt;&lt;a href="mailto:jeffmiller.research@gmail.com?subject=" consulting="" request=""&gt;Dr. Miller&lt;/a&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6072580433200779500-9123027481411323253?l=statconsultant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://statconsultant.blogspot.com/feeds/9123027481411323253/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6072580433200779500&amp;postID=9123027481411323253' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/9123027481411323253'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/9123027481411323253'/><link rel='alternate' type='text/html' href='http://statconsultant.blogspot.com/2011/02/biggest-loser-regression-style.html' title='The Biggest Loser (Regression Style)'/><author><name>StatHelper</name><uri>http://www.blogger.com/profile/09382106274285007286</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='31' 
src='http://2.bp.blogspot.com/-OxTzG7Gvxag/TWaf4WE4l4I/AAAAAAAAACM/AnCmPF7lFu0/s220/Jeff%2Bpic.JPG'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6072580433200779500.post-7391167902768810920</id><published>2011-02-18T17:44:00.008-05:00</published><updated>2011-05-10T16:56:50.079-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='delphi research question dissertation thesis statistics statistical consultant consulting'/><title type='text'>The Delphi Technique</title><content type='html'>&lt;span style="font-size:130%;"&gt;&lt;br /&gt;One of my first statistics consultations was for a graduate student using the Delphi Technique. I had never heard of it but was assured that the statistical part of it would not be difficult. Part of me was fascinated by learning a new technique; the other part was hoping that I might gain some prophetic insights as if making a pilgrimage to the ancient Oracle of Delphi in Greece.&lt;br /&gt;&lt;br /&gt;What is the Delphi Technique? At first glance, it seems like a qualitative technique. You sit down with a group of experts to discuss something. That's a qualitative focus group method, right? Ah...but we take things a step further.&lt;br /&gt;&lt;br /&gt;Suppose that I want to know how arts educators feel about standardized testing within their field. I might assume they would be against it since such testing in other areas is perceived to distract from their area. On the other hand, they might be for it since it would heighten validity for their field by being included alongside other subjects such as math and writing.&lt;br /&gt;&lt;br /&gt;I start the technique by having experts reply to open-ended questions such as, "How do you feel about including music on your state's standardized annual assessment? Describe in detail." Now, that is definitely qualitative! 
But, then I take the responses and use them to develop a survey with response options that are quantitative (e.g., a Likert scale). Alternatively, I might identify shared themes and then construct the survey such that themes can be rank-ordered in terms of things like favorability and priority.&lt;br /&gt;&lt;br /&gt;Next, I move on to what we would call Round 2. The same experts respond to the survey developed from the open-ended questionnaire. Now, this is still looking like some mixed-methods twist on a focus group technique. What sets this method apart from others is Round 3.&lt;br /&gt;&lt;br /&gt;In Round 3, I actually share the results with the experts. In other words, Expert A gets to see the responses from Experts B, C, and D. They then take the survey again, thereby having the opportunity to adjust their previous responses. This step is usually repeated once more (i.e., Round 4); however, the process could continue through several iterations.&lt;br /&gt;&lt;br /&gt;Statistical analysis enters here when trying to determine an optimal level of agreement or merely wanting to display the results. I could use means and standard deviations for this. However, given the relatively small sample size used here, I might want to use the more sophisticated Score interval adapted by Randall Penfield at the University of Miami and myself for this process (&lt;a href="http://www.springerlink.com/content/d4q75278264v0g0r/"&gt;Miller &amp;amp; Penfield, 2005&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;The Delphi technique is an excellent technique for studies involving expert agreement and/or consensus building. It combines a traditional qualitative approach with the potential for rigorous quantitative analysis. 
Finally, it answers research questions that often have practical implications.&lt;br /&gt;&lt;br /&gt;The Delphi Technique was created by the &lt;a href="http://www.rand.org/topics/delphi-method.html"&gt;Rand Corporation&lt;/a&gt;.&lt;br /&gt;For consultations, please send an e-mail to&lt;br /&gt;&lt;a href="mailto:jeffmiller.research@gmail.com?subject=" consulting="" request=""&gt;Dr. Miller&lt;/a&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6072580433200779500-7391167902768810920?l=statconsultant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://statconsultant.blogspot.com/feeds/7391167902768810920/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6072580433200779500&amp;postID=7391167902768810920' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/7391167902768810920'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/7391167902768810920'/><link rel='alternate' type='text/html' href='http://statconsultant.blogspot.com/2011/02/delphi-technique.html' title='The Delphi Technique'/><author><name>StatHelper</name><uri>http://www.blogger.com/profile/09382106274285007286</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='31' src='http://2.bp.blogspot.com/-OxTzG7Gvxag/TWaf4WE4l4I/AAAAAAAAACM/AnCmPF7lFu0/s220/Jeff%2Bpic.JPG'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6072580433200779500.post-7784716271510541168</id><published>2011-02-17T17:40:00.005-05:00</published><updated>2011-05-10T16:57:34.073-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dissertation thesis statistics 
statistical consultant consulting'/><title type='text'>What "Causes" What??</title><content type='html'>I was thinking today about how loosely we use the term "cause" in research. We go into research wanting to demonstrate (or explore the possibility) that something &lt;span style="font-style: italic;"&gt;cause&lt;/span&gt;s&lt;span style="font-style: italic;"&gt; &lt;/span&gt;something. We may be interested in finding out if having loving parents as a child causes us to be giving adults. We may want to show that video games cause violent behavior...or maybe that being predisposed toward violence causes one to be interested in playing video games.&lt;br /&gt;&lt;br /&gt;The truth of the matter is that we cannot determine &lt;span style="font-style: italic;"&gt;cause&lt;/span&gt; from statistical results. We may get that wonderfully low "p-value" for statistical significance...but even that says little to nothing about causation. Our results may replicate previous findings; yet, that still is not sufficient for causation. It may make intuitive sense, and everyone might nod their heads in agreement...but 400 friends agreeing that eating eggs causes one to spend more time on the Internet does not make that true...even if the results of a statistical analysis were to suggest that :)&lt;br /&gt;&lt;br /&gt;Determining causation is actually a matter of philosophy and depends on things like the design of the experiment itself...not the statistical analysis. In fact, one could earn a doctoral degree in philosophy with a dissertation on nothing but the topic of causation. Depending on the source and theoretical framework, there are many sets of criteria one could choose from for defending causation. 
For the purposes of research results interpretation, we are better suited to use phrases such as&lt;br /&gt;&lt;br /&gt;"the results of the analysis &lt;span style="font-style: italic;"&gt;suggest&lt;/span&gt; that _____ &lt;span style="font-style: italic;"&gt;may cause &lt;/span&gt;_____"&lt;br /&gt;&lt;br /&gt;rather than&lt;br /&gt;&lt;br /&gt;"the results &lt;span style="font-style: italic;"&gt;prove &lt;/span&gt;that _____ &lt;span style="font-style: italic;"&gt;causes &lt;/span&gt;_____."&lt;br /&gt;&lt;br /&gt;For consultations, please send an e-mail to&lt;br /&gt;&lt;a href="mailto:jeffmiller.research@gmail.com?subject=" consulting="" request=""&gt;Dr. Miller&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6072580433200779500-7784716271510541168?l=statconsultant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://statconsultant.blogspot.com/feeds/7784716271510541168/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6072580433200779500&amp;postID=7784716271510541168' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/7784716271510541168'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/7784716271510541168'/><link rel='alternate' type='text/html' href='http://statconsultant.blogspot.com/2011/02/what-causes-what.html' title='What &quot;Causes&quot; What??'/><author><name>StatHelper</name><uri>http://www.blogger.com/profile/09382106274285007286</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='31' 
src='http://2.bp.blogspot.com/-OxTzG7Gvxag/TWaf4WE4l4I/AAAAAAAAACM/AnCmPF7lFu0/s220/Jeff%2Bpic.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6072580433200779500.post-1371707967376446941</id><published>2009-05-11T18:27:00.006-04:00</published><updated>2010-12-09T18:44:51.384-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dissertation editor proofread consultant'/><title type='text'>Dissertation Editing</title><content type='html'>&lt;span class="Apple-style-span"  style="font-size:large;"&gt;    &lt;/span&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;     Writing a dissertation is a painstaking process. It usually begins with ideas floating around in the mind until it comes time to materialize them on paper. A few sentences hopefully inspire a thorough literature review. The wise student is meticulous in organizing specific ideas from hundreds of articles such that they tie into a cohesive and well-organized chapter of the dissertation proposal.&lt;/span&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;     During this process, the information load usually exceeds cognitive capacity to synthesize a perfect draft. More often than not, the chapter is an amalgamation of statements, often disjointed, with paragraphs related to a single theme scattered throughout the manuscript. Revisions are in order. The goal here is to get that proposal accepted so that the student can move on to the actual research! 
So, it is vital that the literature review and the preceding introduction chapter be clear in flowing from the statement of the research problem to its justification and substantiation via previous research.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;     There are two stages that are also tricky in terms of editing once the research is complete. First, there are the nuances of APA-format required for Chapter 4. For example, most of us don't naturally think to write "was statistically significant, &lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;t&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;(38) = 2.19, &lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;p&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt; = 0.035" with full knowledge of what to italicize and where to place spaces. 
Second, somehow that results chapter and the following discussion chapter have to connect with the preceding chapters in such a way that the entire dissertation tells a complete and coherent story.&lt;br /&gt;&lt;br /&gt;Please send requests for more information to jeffmiller.research@gmail.com&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:large;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6072580433200779500-1371707967376446941?l=statconsultant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://statconsultant.blogspot.com/feeds/1371707967376446941/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6072580433200779500&amp;postID=1371707967376446941' title='62 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/1371707967376446941'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/1371707967376446941'/><link rel='alternate' type='text/html' href='http://statconsultant.blogspot.com/2009/05/dissertation-editing.html' title='Dissertation Editing'/><author><name>StatHelper</name><uri>http://www.blogger.com/profile/09382106274285007286</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='31' src='http://2.bp.blogspot.com/-OxTzG7Gvxag/TWaf4WE4l4I/AAAAAAAAACM/AnCmPF7lFu0/s220/Jeff%2Bpic.JPG'/></author><thr:total>62</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6072580433200779500.post-1521799650112674801</id><published>2009-03-24T22:02:00.005-04:00</published><updated>2009-05-11T18:44:02.836-04:00</updated><title 
type='text'></title><content type='html'>&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6072580433200779500-1521799650112674801?l=statconsultant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://statconsultant.blogspot.com/feeds/1521799650112674801/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6072580433200779500&amp;postID=1521799650112674801' title='16 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/1521799650112674801'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/1521799650112674801'/><link rel='alternate' type='text/html' href='http://statconsultant.blogspot.com/2009/03/call-for-reforms-in-statistics.html' title=''/><author><name>StatHelper</name><uri>http://www.blogger.com/profile/09382106274285007286</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='31' src='http://2.bp.blogspot.com/-OxTzG7Gvxag/TWaf4WE4l4I/AAAAAAAAACM/AnCmPF7lFu0/s220/Jeff%2Bpic.JPG'/></author><thr:total>16</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6072580433200779500.post-190089389799045002</id><published>2008-09-01T23:39:00.003-04:00</published><updated>2010-12-09T18:45:15.129-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='statistics consultant Fisher Exact Test'/><title type='text'>My Table Has Too Many Zeros</title><content type='html'>Hi everyone,&lt;br /&gt;&lt;br /&gt;Sometimes we are interested in looking at the association between two qualitative variables such as gender and political party. 
We ask, "Is there a significant association between gender and political party?", and we typically use a chi-square test of independence in the statistical analysis.&lt;br /&gt;&lt;br /&gt;We usually assume an expected count of at least 5 in each cell, but this doesn't always happen. SPSS will still provide output but include a note stating something like "50% of the cells have an expected count less than 5". What to do?&lt;br /&gt;&lt;br /&gt;Well, one solution is Fisher's Exact Test. This is a nonparametric test that utilizes the hypergeometric distribution.&lt;br /&gt;&lt;br /&gt;Some software programs state that it only works for a 2 X 2 table such as Gender (male and female) by Political Party (Democrat and Republican). This simply isn't true. It's just very computationally intensive for larger tables.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Please send requests for more information to jeffmiller.research@gmail.com&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Best regards,&lt;br /&gt;Jeff&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6072580433200779500-190089389799045002?l=statconsultant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://statconsultant.blogspot.com/feeds/190089389799045002/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6072580433200779500&amp;postID=190089389799045002' title='25 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/190089389799045002'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/190089389799045002'/><link rel='alternate' type='text/html' href='http://statconsultant.blogspot.com/2008/09/my-table-has-too-many-zeros.html' title='My Table Has Too Many 
Zeros'/><author><name>StatHelper</name><uri>http://www.blogger.com/profile/09382106274285007286</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='31' src='http://2.bp.blogspot.com/-OxTzG7Gvxag/TWaf4WE4l4I/AAAAAAAAACM/AnCmPF7lFu0/s220/Jeff%2Bpic.JPG'/></author><thr:total>25</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6072580433200779500.post-2580928464281935382</id><published>2008-06-27T09:34:00.003-04:00</published><updated>2010-12-09T18:45:37.180-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='research question dissertation thesis statistics statistical consultant consulting'/><title type='text'>Being Wrong "On Average"</title><content type='html'>One of the simplest concepts we learn in a statistics course is that of the "average". Most of us come into a statistics course already familiar with the concept of adding up a bunch of numbers and then dividing that total by the number of things being added. For example, if three people are of ages 10, 20, and 30 then the average is simply (10 + 20 + 30) / 3 = 20. We learn that, in statistics, this average is called the "mean". Sometimes, we learn the more complete word for it -- the "arithmetic mean".&lt;br /&gt;&lt;br /&gt;Unfortunately, we often report the wrong average. Believe it or not, there is actually more than one way to calculate the "mean", and the best one to use depends on what we are doing with the numbers. For example, suppose you are told that you will receive salary increases each year as follows.&lt;br /&gt;&lt;br /&gt;* Year 2 - 2.0%&lt;br /&gt;* Year 3 - 2.5%&lt;br /&gt;* Year 4 - 2.9%&lt;br /&gt;* Year 5 - 3.5%&lt;br /&gt;* Year 6 - 4.2%&lt;br /&gt;&lt;br /&gt;If your starting pay was $40,000 USD then, to get your Year 2 pay, you would multiply 40,000 X 1.02 = 40,800. To get the Year 3 pay, you would then multiply that 40,800 X 1.025 = 41,820. 
Notice that in each case we are "multiplying", not adding. Hence, to get the average percent pay increase, we should calculate a mean based on multiplying. The arithmetic mean would not be the most correct average to report.&lt;br /&gt;&lt;br /&gt;Fortunately, we can use the geometric mean to find the answer to this problem. First, we multiply, not add, all the numbers together. Let k = the number of items being multiplied. Now, just take the k-th root.&lt;br /&gt;&lt;br /&gt;Note that if you only multiplied 2 numbers, you would take the 2nd root, which is the square root. The 3rd root is often called the cube root. In our example, we are multiplying 5 numbers, so we take the fifth root.&lt;br /&gt;&lt;br /&gt;Here's how we calculate the answer in Excel.&lt;br /&gt;&lt;br /&gt;1.) Convert the percentages to the numbers we would use in multiplication.&lt;br /&gt;&lt;br /&gt;* Year 2 - 1.02&lt;br /&gt;* Year 3 - 1.025&lt;br /&gt;* Year 4 - 1.029&lt;br /&gt;* Year 5 - 1.035&lt;br /&gt;* Year 6 - 1.042&lt;br /&gt;&lt;br /&gt;2.) Enter these values in Excel. I put them in cells A1 through A5.&lt;br /&gt;&lt;br /&gt;3.) In cell A6, I multiply by typing =PRODUCT(A1:A5) which gives me a product of about 1.16&lt;br /&gt;&lt;br /&gt;4.) Take the 5th root by raising the product to the 1/5th power. In cell A7, I type =A6^(1/5) giving a solution of about 1.03&lt;br /&gt;&lt;br /&gt;An easier method in Excel would be to just type =GEOMEAN(A1:A5) and skip steps 3 and 4. It gives the same result.&lt;br /&gt;&lt;br /&gt;So, the average growth factor over these 5 raises is 1.03, which is an average pay increase of about 3%. Note that the arithmetic mean would also suggest an "arithmetic" average increase of 3% but this isn't always the case. Further, I rounded to two decimal places. If we were dealing with large numbers and more decimal places then the results might be more meaningfully different.&lt;br /&gt;&lt;br /&gt;Try your hand at it. 
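&lt;br /&gt;&lt;br /&gt;If you would rather check your work in code than in Excel, steps 1 through 4 above can be sketched in Python. This is only an illustration under a few assumptions: the variable names are invented, and it needs Python 3.8 or later for math.prod and statistics.geometric_mean, which play the roles of =PRODUCT and =GEOMEAN here.

```python
import math
import statistics

# Growth factors from step 1 (a 2.0% raise becomes 1.02, and so on).
factors = [1.02, 1.025, 1.029, 1.035, 1.042]

product = math.prod(factors)                 # step 3, like =PRODUCT(A1:A5)
geo_mean = product ** (1 / len(factors))     # step 4, like =A6^(1/5)
shortcut = statistics.geometric_mean(factors)  # like =GEOMEAN(A1:A5)
arith_mean = sum(factors) / len(factors)     # the usual average, for comparison

print(round(product, 2), round(geo_mean, 2), round(arith_mean, 2))
```

With the salary figures, the product rounds to 1.16 and both means round to 1.03, matching the numbers in the post. Swap in your own list of growth factors to try the exercise that follows.&lt;br /&gt;&lt;br /&gt;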
A stock has yielded the following changes over the past 5 quarters: 4%, 7%, 12%, 10%, and 11%. What is the average increase? How do your results differ when using the typical arithmetic mean versus the more appropriate geometric mean?&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Please send requests for more information to jeffmiller.research@gmail.com&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6072580433200779500-2580928464281935382?l=statconsultant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://statconsultant.blogspot.com/feeds/2580928464281935382/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6072580433200779500&amp;postID=2580928464281935382' title='20 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/2580928464281935382'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/2580928464281935382'/><link rel='alternate' type='text/html' href='http://statconsultant.blogspot.com/2008/06/being-wrong-on-average.html' title='Being Wrong &quot;On Average&quot;'/><author><name>StatHelper</name><uri>http://www.blogger.com/profile/09382106274285007286</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='31' src='http://2.bp.blogspot.com/-OxTzG7Gvxag/TWaf4WE4l4I/AAAAAAAAACM/AnCmPF7lFu0/s220/Jeff%2Bpic.JPG'/></author><thr:total>20</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6072580433200779500.post-3287608675192479096</id><published>2008-01-05T11:55:00.001-05:00</published><updated>2010-12-09T18:46:26.685-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='hypothesis null alternative 
testable'/><title type='text'>Stating Testable Hypotheses</title><content type='html'>Provided that one has a properly framed research question, the next step is to rephrase it in a manner that can generate an answer. The question is permitted to be vague; however, the hypotheses must be concrete.  We must bring the question down from the realm of ideas and settle it on solid earth.&lt;br /&gt;&lt;br /&gt;The hypothesis restates your "Hmmm, I wonder..." as "Hmmmm, I think..." So, in that sense, the hypothesis does remain a bit vague but only because we aren't making any judgments of knowing. It is the words that we choose in our hypothesis that must be concrete.&lt;br /&gt;&lt;br /&gt;Here is an example of the research question:&lt;br /&gt;&lt;br /&gt;Research Question: Are children more resilient than we think they are?&lt;br /&gt;&lt;br /&gt;See how vague that is?  What do we mean by resilient? How would we know if they are more resilient than 'we think' they are?&lt;br /&gt;&lt;br /&gt;Now, here's a hypothesis:&lt;br /&gt;&lt;br /&gt;Hypothesis: Children's scores on the Resilience Scale will negatively correlate with their parents' scores on the Perceptions of Child Resilience Scale.&lt;br /&gt;&lt;br /&gt;We have declared a way to measure resilience and a way to measure parents' perceptions of children's resilience via the two Scales (i.e., surveys).  We have also declared a way to compare the two via "negatively correlate".&lt;br /&gt;&lt;br /&gt;This hypothesis would suggest that lower scores for parents' perception are associated with higher scores for children's resilience. Parents with low belief in their children's resilience &lt;--&gt; Children with high resilience.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Note that this is only an example of how to state a hypothesis.  There are many other concerns not addressed here (e.g., what is meant by children?). 
Note also that this is only one of many ways that the hypothesis could have been stated.&lt;br /&gt;&lt;br /&gt;Strictly speaking, the hypothesis should be stated to the contrary as a "null hypothesis" symbolized H0:&lt;br /&gt;&lt;br /&gt;H0: Children's scores on the Resilience Scale will NOT negatively correlate with their parents' scores on the Perceptions of Child Resilience Scale.&lt;br /&gt;&lt;br /&gt;We provisionally treat the null as true. Our results then tell us whether there is enough evidence to reject it in favor of "our" hypothesis, often called the "alternative hypothesis" symbolized H1: or Ha:&lt;br /&gt;&lt;br /&gt;Given a stated testable hypothesis, we can then more efficiently gather data, conduct trials, analyze the data, and interpret results.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Please send requests for more information to jeffmiller.research@gmail.com&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6072580433200779500-3287608675192479096?l=statconsultant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://statconsultant.blogspot.com/feeds/3287608675192479096/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6072580433200779500&amp;postID=3287608675192479096' title='20 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/3287608675192479096'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/3287608675192479096'/><link rel='alternate' type='text/html' href='http://statconsultant.blogspot.com/2008/01/stating-testable-hypotheses.html' title='Stating Testable 
Hypotheses'/><author><name>StatHelper</name><uri>http://www.blogger.com/profile/09382106274285007286</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='31' src='http://2.bp.blogspot.com/-OxTzG7Gvxag/TWaf4WE4l4I/AAAAAAAAACM/AnCmPF7lFu0/s220/Jeff%2Bpic.JPG'/></author><thr:total>20</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6072580433200779500.post-3603173334020091315</id><published>2008-01-02T08:56:00.001-05:00</published><updated>2010-12-09T18:46:11.228-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='research question dissertation thesis statistics statistical consultant consulting'/><title type='text'>Framing Research Questions</title><content type='html'>Framing the research question is similar in difficulty to writing the introduction paragraph in a paper.  It forces you to engage in effortful and deep levels of cognitive processing.  It requires you to objectively state what it is that you are researching. &lt;br /&gt;&lt;br /&gt;Why is it so difficult to do this and do it well? &lt;br /&gt;&lt;br /&gt;It's a cognitive workout.  We humans are suited for automating simple tasks in order to pursue more complex goals.  For example, the mere act of getting out of bed involves hundreds of decisions that engage hundreds of muscles; yet, we take this for granted and just get out of bed.  Similarly, our research topic is probably very familiar to us.  We may think, "Well, my question is obvious to me.  Why detail it out in just a few words?"&lt;br /&gt;&lt;br /&gt;Failure to properly frame a research question seriously puts the cart before the horse.  It has been said that a thousand-mile journey begins with a single step. But in which direction? Which foot? Which journey?&lt;br /&gt;&lt;br /&gt;Thomas Carlyle stated that "a man without a goal is like a ship without a rudder".  
In the case of framing a research question, you may still have a rudder but you may have forgotten your map.  Or you may have a map that is upside-down.  Worse yet, you may have brought the wrong map.&lt;br /&gt;&lt;br /&gt;TRICKS AND TIPS&lt;br /&gt;1. Read other research papers and see how it was done.  Draw from experience.&lt;br /&gt;2. Explain your research to others in their language. Don't just say, "Well, it's a bit complicated."&lt;br /&gt;3. Write several questions.&lt;br /&gt;4. Don't worry when your question generates other questions.  This is a good thing!  Write those down and come back to them.&lt;br /&gt;5. Write operational definitions for the terms that might appear in your research questions.&lt;br /&gt;6. Make a concept map.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Please send requests for more information to jeffmiller.research@gmail.com&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6072580433200779500-3603173334020091315?l=statconsultant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://statconsultant.blogspot.com/feeds/3603173334020091315/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6072580433200779500&amp;postID=3603173334020091315' title='80 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/3603173334020091315'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/3603173334020091315'/><link rel='alternate' type='text/html' href='http://statconsultant.blogspot.com/2008/01/framing-research-questions.html' title='Framing Research 
Questions'/><author><name>StatHelper</name><uri>http://www.blogger.com/profile/09382106274285007286</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='31' src='http://2.bp.blogspot.com/-OxTzG7Gvxag/TWaf4WE4l4I/AAAAAAAAACM/AnCmPF7lFu0/s220/Jeff%2Bpic.JPG'/></author><thr:total>80</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6072580433200779500.post-1016434427564093136</id><published>2007-11-23T18:57:00.002-05:00</published><updated>2010-12-09T18:46:51.874-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='statistics consulting binomial Bernoulli probability'/><title type='text'>Demystifying the Binomial Distribution Formula</title><content type='html'>So, we're taking a stats class and reading the text.  We're learning about probability, and all is going well.  Of course flipping a coin has a heads probability of .50!  Of course the probability of rolling a 6 on a fair 6-sided die is 1/6!  Stats is supposed to be difficult?&lt;br /&gt;&lt;br /&gt;Then we find out that we were just learning about ONE flip of the coin and ONE roll of the die.  Dr. Quantoid tells us, "It is estimated that 20% of the fish in my lake are large-mouth bass.  What is the probability of catching a large-mouth bass?&lt;br /&gt;&lt;br /&gt;0.2, right? 
Right!&lt;br /&gt;&lt;br /&gt;But, what is the probability of catching 3 bass out of 5 fish caught?"&lt;br /&gt;&lt;br /&gt;Uh-oh....the neurons in this classroom have just started firing....and possibly, not down the correct paths!&lt;br /&gt;&lt;br /&gt;The professor throws up this formula on the board:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_ixPXfjEM108/R0d1P2m07bI/AAAAAAAAABA/t66NHe0lZsY/s1600-h/probform.JPG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 163px; height: 43px;" src="http://bp0.blogger.com/_ixPXfjEM108/R0d1P2m07bI/AAAAAAAAABA/t66NHe0lZsY/s320/probform.JPG" alt="" id="BLOGGER_PHOTO_ID_5136202815256194482" border="0" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;I will use "typing notation" to restate this as&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;Pr(Y=k) = (nCk) * (p^k) * ((1-p)^(n-k))&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Either way, the students' eyes go blurry and class is followed by a surge in optometrist appointments (possibly saving the day from a predicted sharp economic downturn).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;But, the formula REALLY ISN'T COMPLICATED.  It just looks that way. It's kinda' like those new-fangled cell phone PDA mp3 abc pdq xyz all-in-one devices.  Hey, if you can find "Play" or "Send", it's not all that bad, right?&lt;br /&gt;&lt;br /&gt;(Caution: I think the equation is blurry anyway but I can't tell...I'm not wearing my contacts. Please report any suspected true blurriness to the webmaster before visiting your optometrist.)&lt;br /&gt;&lt;br /&gt;Moving forward...let's look at the first part:&lt;br /&gt;&lt;br /&gt;Pr( Y=k) =&lt;br /&gt;&lt;br /&gt;This is just saying that the probability that Y=3 is equal to something.  What is Y=3?  Well, you only have two options for binary variables....0 and 1.  
A 0 is usually meant to denote the "failure", the "miss", the "tails", the "no" while the 1 is usually meant to denote the "success", the "hit", the "heads", the "yes".&lt;br /&gt;&lt;br /&gt;So, Pr(Y=3) is the probability of catching 3 bass....of having 3 successes....three hits....etc....is equal to something.&lt;br /&gt;&lt;br /&gt;That something looks like a messy room full of n's , k's , and p's.  (I thought I told you kids to clean...that...mess...up!)&lt;br /&gt;&lt;br /&gt;Ok, let's do it.&lt;br /&gt;&lt;br /&gt;First, we should befriend those letters. (If you don't believe that letters can be your friend then you definitely didn't watch enough Sesame Street as a kid!)&lt;br /&gt;&lt;br /&gt;n is just our sample size....you should be ok with this by now.  n=5&lt;br /&gt;&lt;br /&gt;k is the number of Y=1's of interest.  We want to know the probability of catching 3 bass out of 5.  So, k=3&lt;br /&gt;&lt;br /&gt;Finally, if we didn't know something about the probability of catching a bass then, yes, this problem would be VERY difficult to solve. But, we do know. (whew)&lt;br /&gt;&lt;br /&gt;This probability is 0.2. So, p=.2&lt;br /&gt;&lt;br /&gt;So, we plug in all these numbers and get the answer right?  Right.  But, that doesn't help you understand why the equation is so easy.&lt;br /&gt;&lt;br /&gt;Let's look at the second part of the equation...(p^k).&lt;br /&gt;&lt;br /&gt;This is asking you to raise the probability to the success power. So, we are raising .20 to the 3rd power.  Why? Well, we already should know that the joint probability of two independent events is simply their product.  
For example, the probability of flipping two heads is .5 * .5 = .25.&lt;br /&gt;&lt;br /&gt;So, what is the probability of buying 20 gallons of rotten milk when the probability is .05?&lt;br /&gt;&lt;br /&gt;Well, that's .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05.&lt;br /&gt;&lt;br /&gt;Now, why on Earth would we want to write all of that down when it's the same thing as saying .05 to the 20th power, which is .05^20? And, for that matter, why on Mars? Why on Jupiter? Why on any planet?&lt;br /&gt;&lt;br /&gt;(I will save other possibilities such as "Why in Heaven?" and "Why in Hell" for the metaphysics community.  It's just a bit too 'out there' for statisticians already dealing with issues of causality, etc...")&lt;br /&gt;&lt;br /&gt;Many students solve p^k and turn that in as the answer.  How callous, rude, and thoughtless! What about the other 2 fish? Y'know....those 2 non-bass?!?  Do you really think that just because we are interested in the probability of 3 bass out of 5 fish that we can just ignore the 2 that aren't bass?  If you ignore them, aren't you just talking about the probability of 3 bass out of 3 fish? (And discrimination of anything is SO pre-Y2K!)&lt;br /&gt;&lt;br /&gt;There is a take-home point here!  When we talk about the 0's, failures, misses, tails, non-bass, etc...we are not making a moral indictment on them.  They are still INFORMATION. They still must be included in our calculations (even if we are upset that we only caught 3 bass and have 5 mouths at home to feed and don't have a cat to kick and feel compelled to blame Flipper).&lt;br /&gt;&lt;br /&gt;So, this all ties into the third part of the equation...(1-p)^(n-k).  Well, if the p=.2 is the probability of catching a bass then (1-.2), which is .8, is the probability of catching a non-bass.  Like before, we wouldn't want to multiply .8 over and over and over by hand.  
If there are n=5 in the sample and there are k=3 bass, then we are now interested in 5-3 = 2 non-bass.  So, we just raise .8 to the 2nd power.&lt;br /&gt;&lt;br /&gt;Combining p^k and (1-p)^(n-k), all we are doing is asking for .2 * .2 * .2 * .8 * .8&lt;br /&gt;&lt;br /&gt;It's just the joint probability solved by multiplying the individual probabilities for both successes AND failures. Why didn't Dr. Quantoid just put THAT in the textbook?  Well, look back at the rotten milk example. Or think about this: What is the probability of 400 students out of 1,200 getting into graduate school when the probability for each student is .14?  Would you like to write that out?  Or would it be better to just say (.14^400) * ((1-.14)^(1200 - 400))?&lt;br /&gt;&lt;br /&gt;Finally, what about the first part of the equation? Y'know that oversized parenthesis with an n on top of a k?&lt;br /&gt;Well, we read that as "n choose k". So, it's 5 choose 3.  Huh?&lt;br /&gt;&lt;br /&gt;Of course we want to choose 3 from 5.  But, guess what?  There are a lot of ways to do this. Below are my fishing results for 3 different days.&lt;br /&gt;&lt;br /&gt;Day 1: Bass    Bass    Bass    Bluegill    Carp&lt;br /&gt;Day 2: Carp    Bass   Bass    Shark    Bass&lt;br /&gt;Day 3: Whale Dolphin    Bass Bass Bass&lt;br /&gt;&lt;br /&gt;The point here is that there are many ways to catch 3 bass out of 5 fish.  It could be the first 3 fish as in day 1.  It could be the last 3 fish as in day 3.&lt;br /&gt;&lt;br /&gt;I could write out all of the possibilities and do some math but it's a lot easier to know that "n choose k" will give me the answer right away.&lt;br /&gt;&lt;br /&gt;"n choose k" is equal to n! / (k! * (n-k)!)&lt;br /&gt;&lt;br /&gt;n! is (5*4*3*2*1)&lt;br /&gt;&lt;br /&gt;k! is (3*2*1)&lt;br /&gt;&lt;br /&gt;(n-k)! is (5-3)! = 2! 
= (2*1)&lt;br /&gt;&lt;br /&gt;So, the full solution is (5*4*3*2*1) / ((3*2*1) * (2*1)) = 120 / 12 = 10 ways.&lt;br /&gt;&lt;br /&gt;Putting all three parts together: 10 * (.2^3) * (.8^2) = 10 * .008 * .64 = .0512, so the probability of catching exactly 3 bass out of 5 fish is about 5%.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;(Note: there was actually another bass inside of the whale but that violates the binomial assumption of identical trials so we'll pretend that didn't happen.)&lt;br /&gt;&lt;br /&gt;(Note: After catching the whale, I almost caught a bass but the dolphin ate it....that violates the binomial assumption of independent trials so we'll pretend that didn't happen.)&lt;br /&gt;&lt;br /&gt;So, what have we learned?&lt;br /&gt;&lt;br /&gt;1. We are just multiplying the probabilities of "successes".&lt;br /&gt;2. We are also multiplying the probabilities of "failures".&lt;br /&gt;3. We are multiplying this by all the ways we can have those "successes".&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Please send requests for more information to jeffmiller.research@gmail.com&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6072580433200779500-1016434427564093136?l=statconsultant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://statconsultant.blogspot.com/feeds/1016434427564093136/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6072580433200779500&amp;postID=1016434427564093136' title='20 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/1016434427564093136'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/1016434427564093136'/><link rel='alternate' type='text/html' href='http://statconsultant.blogspot.com/2007/11/demystifying-binomial-distribution.html' title='Demystifying the Binomial Distribution 
Formula'/><author><name>StatHelper</name><uri>http://www.blogger.com/profile/09382106274285007286</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='31' src='http://2.bp.blogspot.com/-OxTzG7Gvxag/TWaf4WE4l4I/AAAAAAAAACM/AnCmPF7lFu0/s220/Jeff%2Bpic.JPG'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp0.blogger.com/_ixPXfjEM108/R0d1P2m07bI/AAAAAAAAABA/t66NHe0lZsY/s72-c/probform.JPG' height='72' width='72'/><thr:total>20</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6072580433200779500.post-3739647261972458506</id><published>2007-10-24T17:28:00.000-04:00</published><updated>2007-10-24T17:30:58.160-04:00</updated><title type='text'>Six (6) Common Causes of Autocorrelation</title><content type='html'>&lt;span style="font-size: 130%;"&gt;This is a handy list of some of the causes of autocorrelation as distilled from Gujarati, D. N. (2002), &lt;span style="font-style: italic;"&gt;&lt;a href="http://www.amazon.com/Basic-Econometrics-Damodar-N-Gujarati/dp/0072478527/ref=pd_bbs_2/102-7172347-1674508?ie=UTF8&amp;amp;s=books&amp;amp;qid=1193261205&amp;amp;sr=1-2"&gt;Basic Econometrics&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;1. Omitted Variable Misspecification&lt;br /&gt;2. Incorrect Functional Form Misspecification&lt;br /&gt;3. Cobweb Phenomenon (it takes the public time to adjust to a change in policy)&lt;br /&gt;4. Lags (yesterday impacts today)&lt;br /&gt;5. Manipulation of Data (creating subsets over time resulting in a systematic pattern)&lt;br /&gt;6. 
Data Transformations&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6072580433200779500-3739647261972458506?l=statconsultant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://statconsultant.blogspot.com/feeds/3739647261972458506/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6072580433200779500&amp;postID=3739647261972458506' title='18 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/3739647261972458506'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/3739647261972458506'/><link rel='alternate' type='text/html' href='http://statconsultant.blogspot.com/2007/10/six-6-common-causes-of-autocorrelation.html' title='Six (6) Common Causes of Autocorrelation'/><author><name>StatHelper</name><uri>http://www.blogger.com/profile/09382106274285007286</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='31' src='http://2.bp.blogspot.com/-OxTzG7Gvxag/TWaf4WE4l4I/AAAAAAAAACM/AnCmPF7lFu0/s220/Jeff%2Bpic.JPG'/></author><thr:total>18</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6072580433200779500.post-5442397225226530353</id><published>2007-10-24T17:22:00.000-04:00</published><updated>2007-10-24T17:27:52.109-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='heteroscedasticity variance residual error'/><title type='text'>Eight (8) Sources of Heteroscedasticity</title><content type='html'>&lt;span style="font-size:130%;"&gt;This is a handy list of some of the causes of heteroscedasticity (unequal conditional variances) as distilled from Gujarati, D. N. 
(2002), &lt;span style="font-style: italic;"&gt;&lt;a href="http://www.amazon.com/Basic-Econometrics-Damodar-N-Gujarati/dp/0072478527/ref=pd_bbs_2/102-7172347-1674508?ie=UTF8&amp;amp;s=books&amp;amp;qid=1193261205&amp;amp;sr=1-2"&gt;Basic Econometrics&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;1. Error Learning (variance declines as a result of practice)&lt;br /&gt;2. Discretionary Capital (variance increases in the relationship between income and spending)&lt;br /&gt;3. Improved Data Collection Procedures&lt;br /&gt;4. Outliers&lt;br /&gt;5. Misspecification (e.g., omitted variables)&lt;br /&gt;6. Skewness&lt;br /&gt;7. Incorrect Data Transformation&lt;br /&gt;8. Incorrect Functional Form for the Model&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6072580433200779500-5442397225226530353?l=statconsultant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://statconsultant.blogspot.com/feeds/5442397225226530353/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6072580433200779500&amp;postID=5442397225226530353' title='31 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/5442397225226530353'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/5442397225226530353'/><link rel='alternate' type='text/html' href='http://statconsultant.blogspot.com/2007/10/eight-8-sources-of-heteroscedasticity.html' title='Eight (8) Sources of Heteroscedasticity'/><author><name>StatHelper</name><uri>http://www.blogger.com/profile/09382106274285007286</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='31' 
src='http://2.bp.blogspot.com/-OxTzG7Gvxag/TWaf4WE4l4I/AAAAAAAAACM/AnCmPF7lFu0/s220/Jeff%2Bpic.JPG'/></author><thr:total>31</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6072580433200779500.post-4027801461206796265</id><published>2007-10-01T18:26:00.000-04:00</published><updated>2007-10-01T18:36:49.013-04:00</updated><title type='text'>Latent Growth Curve vs. Repeated Measures ANOVA</title><content type='html'>I read this article last night that I think did an excellent job of explaining the advantages of latent growth curve (LGC) modeling over using repeated measures analysis of variance (RM-ANOVA).  Below is my summary. My thoughts are consistent with theirs with the exception of Similarity 2 below (which they claim is peculiar to LGC).&lt;br /&gt;&lt;br /&gt;SIMILARITY 1: Both result in the same parameter estimates (IFF the stricter RM-ANOVA assumptions are met).&lt;br /&gt;&lt;br /&gt;SIMILARITY 2: Both permit a wide variety of alternative specifications such as modeling nonlinearity (ALTHOUGH this is much more straightforward in the LGCs).&lt;br /&gt;&lt;br /&gt;DIFFERENCE 1: LGC estimates are not attenuated since the measurement error has been teased out. This is actually the primary advantage of all structural models for latent variables.&lt;br /&gt;&lt;br /&gt;DIFFERENCE 2: LGC is not restricted to RM-ANOVA's compound symmetry or sphericity assumptions...any variance/covariance structure can be incorporated and tested.&lt;br /&gt;&lt;br /&gt;DIFFERENCE 3: LGC won't listwise delete your missing data.  It uses Full Information Maximum Likelihood (FIML) to incorporate all existing data.&lt;br /&gt;&lt;br /&gt;DIFFERENCE 4: LGC isn't restricted to equally-spaced time intervals.&lt;br /&gt;&lt;br /&gt;Article: Llabre, M. M., Spitzer, S., Siegel, S., Saab, P. G., &amp;amp; Schneiderman, N. (2004). 
Applying latent growth curve modeling to the investigation of individual differences in cardiovascular recovery from stress. &lt;span style="font-style: italic;"&gt;Psychosomatic Medicine, 66,&lt;/span&gt; 29-41.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6072580433200779500-4027801461206796265?l=statconsultant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://statconsultant.blogspot.com/feeds/4027801461206796265/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6072580433200779500&amp;postID=4027801461206796265' title='80 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/4027801461206796265'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/4027801461206796265'/><link rel='alternate' type='text/html' href='http://statconsultant.blogspot.com/2007/10/latent-growth-curve-vs-repeated.html' title='Latent Growth Curve vs. Repeated Measures ANOVA'/><author><name>StatHelper</name><uri>http://www.blogger.com/profile/09382106274285007286</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='31' src='http://2.bp.blogspot.com/-OxTzG7Gvxag/TWaf4WE4l4I/AAAAAAAAACM/AnCmPF7lFu0/s220/Jeff%2Bpic.JPG'/></author><thr:total>80</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6072580433200779500.post-7080043600153075663</id><published>2007-09-19T18:59:00.000-04:00</published><updated>2007-09-19T19:30:06.369-04:00</updated><title type='text'>Logistic Regression (in a nutshell)</title><content type='html'>I recently googled "logistic regression" and was surprised at the variety of pages out there.  I found ones with annotated output that didn't really explain the input.  
I found ones that explained logistic regression but bounced around from intuitive interpretations to throwing out statistical formulae. Even Wikipedia provides just enough to let beginners think that they understand and then just enough to thoroughly confuse them.&lt;br /&gt;&lt;br /&gt;Here is logistic regression in a nutshell:&lt;br /&gt;&lt;br /&gt;1.) You have a dependent (response) variable that is binary. That means there are only two outcomes. It's hit or miss. It's yes or no. It's on or off. It's male or female. And et cetera ad nauseam...&lt;br /&gt;&lt;br /&gt;2.) We want to predict the probability of a response...so we are talking about a 0 to 1 scale.&lt;br /&gt;&lt;br /&gt;3.) You shouldn't use linear regression for several reasons but the most common-sense reason is that linear regression can provide predictions NOT on a 0 to 1 scale. You could get a predicted probability of 2.4 or -8.3. That's just not possible (unless you are one of those mathematicians who likes to write proofs to debunk the status quo).&lt;br /&gt;&lt;br /&gt;4.) We need a way to link the probabilistic response variable to the continuous and/or categorical predictors AND keep things on this 0 to 1 scale.&lt;br /&gt;&lt;br /&gt;5.) Long story short....logistic regression winds up transforming the probabilities to odds and then taking the natural logarithm of these odds, which we now call logits. I'm not going to explain odds and logits. If you want to know more, then google "logistic regression", and you will get your fill.&lt;br /&gt;&lt;br /&gt;6.) Suppose your response variable is passing a test (by convention, 0=no and 1=yes). You have 1 predictor - number of days present in class over the past 30 days. Suppose the regression coefficient (often just called beta) in the output is .215.  You would then say that, on average, as class presence increases by 1 day, the natural logarithm of the odds of passing the test increases by .215.&lt;br /&gt;&lt;br /&gt;7.) 
You thought I was going to make this easy to understand, and now I am talking about natural logarithms of odds for my interpretation? No. For the interpretation, you can just talk about the odds. Most computer output will give you this number: it is the exponentiated coefficient, usually labeled the odds ratio. Suppose the odds ratio is 1.24. Then, you just say that, on average, as class presence increases by 1 day, the odds of passing the test are multiplied by 1.24. In other words, each additional day present makes the odds of passing 24% greater than they were without that day.&lt;br /&gt;&lt;br /&gt;8.) To validate our findings, normally, we test whether the regression coefficient is equal to zero in the population. In logistic regression, the corresponding value for the odds ratio is one (not zero), because a coefficient of zero exponentiates to one. We got an odds ratio of 1.24.  Can we trust this? Or should we go with one (which would mean that the odds of passing stay the same no matter how many days a student is present, and hence class presence makes no difference at all)?  Look at the p-value (significance). If it is less than .05 (by convention), you have enough evidence to reject the notion that the odds ratio is really one. 
You go ahead and support the 1.24 result.&lt;br /&gt;&lt;br /&gt;We could talk at greater length about logistic regression concerns such as model fit, classification tables, its analog of discriminant analysis, the logistic regression versions of R2,  etc...but that wouldn't be logistic regression in a nutshell.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6072580433200779500-7080043600153075663?l=statconsultant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://statconsultant.blogspot.com/feeds/7080043600153075663/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6072580433200779500&amp;postID=7080043600153075663' title='98 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/7080043600153075663'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/7080043600153075663'/><link rel='alternate' type='text/html' href='http://statconsultant.blogspot.com/2007/09/logistic-regression-in-nutshell.html' title='Logistic Regression (in a nutshell)'/><author><name>StatHelper</name><uri>http://www.blogger.com/profile/09382106274285007286</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='31' src='http://2.bp.blogspot.com/-OxTzG7Gvxag/TWaf4WE4l4I/AAAAAAAAACM/AnCmPF7lFu0/s220/Jeff%2Bpic.JPG'/></author><thr:total>98</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6072580433200779500.post-6193465493012858638</id><published>2007-09-17T18:53:00.000-04:00</published><updated>2007-09-17T19:58:34.871-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='validity'/><category 
scheme='http://www.blogger.com/atom/ns#' term='surveys'/><category scheme='http://www.blogger.com/atom/ns#' term='tests'/><title type='text'>The Validity of Validity</title><content type='html'>I have received numerous requests along the lines of "Are my results valid?" and "Can I say that my survey is valid?".  It is unfortunate that courses and texts continue to throw the validity word around as if it has some yes-no, right-wrong connotation.  Hopefully, this post will help dispel some of the myths.&lt;br /&gt;&lt;br /&gt;1.) Validity is a matter of degree.  Nothing is perfectly valid or completely invalid.  Even if something did appear to be perfectly valid, it would only take an application to a new population or the intervention of a historical event to debunk the claim. The best we can do is strive for the greatest validity possible.&lt;br /&gt;&lt;br /&gt;2.) Related to this, whatever is being validated must be continually re-validated.  Some graduate students go to great lengths to show that their survey has been validated in umpteen contexts and scenarios.  They reference factor analyses and instrument updates due to changes over time.  This is great; however, the purpose of this is not only to validate the use of the instrument in their research but also to use their results to add to the body of evidence for the validity of said instrument.&lt;br /&gt;&lt;br /&gt;Imagine that you have a survey purported to measure "Fear of Terrorist Activity".  First, dramatic events such as 9/11 would certainly require re-validation of that instrument. Second, constructs morph over time...the notion of terrorism has done so even between the 70's, 80's, 90's, and 00's.  Third, there is always the possibility that the construct will differ when applied to a different population. For example, shouldn't we analyze the validity of this instrument when applying it to an indigenous tribe far removed from current terrorist events?&lt;br /&gt;&lt;br /&gt;3.) 
We can talk about validity from either a micro or macro perspective. At the macro level, validity is based on the construct of interest. At the micro level, we can talk about specific "types" of validity including predictive, concurrent, criterion-related, face, content...and the list can go on much further than you would imagine.&lt;br /&gt;&lt;br /&gt;4.) There is some disagreement about this, but I have to agree with those who argue that validity has very little to do with a particular survey or test. We don't design a survey or test just for the fun of it. We design it because we are interested in measuring something such as self-esteem or intelligence. Once we design it, the survey or test itself does not stand up and announce results.  People have to take the survey or test. At the end of the day, we have numbers. So, we should really talk about the validity of those numbers. If we were interested in measuring your heart rate, validity would be based on the beats (the numbers).&lt;br /&gt;&lt;br /&gt;But wait a second....very few surveys or tests are created just for the fun of getting a number.  Ultimately, we DO SOMETHING with that number. Maybe we decide that Johnnie is smart enough for graduate school.  Maybe Donna should be on medication for a disability.  Maybe Richie should not be hired for this position.  But when we use a survey (or rather the resulting numbers) to make these decisions, there is the risk that we make an incorrect decision (for example, a Type I error). So, validity is really a matter of the degree to which we made an appropriate inference, conclusion, or decision from the number(s) that came from a test or survey designed to address a particular need.  Does this mean we should ignore the content of the survey or the face appearance of the test?  Of course not. 
However, these don't truly answer questions about the degree of validity...they are crucial aspects of the design and procedures stages, and they have a direct impact on our conclusions regarding the degree of validity.&lt;br /&gt;&lt;br /&gt;What can you draw from this?  Be careful in how you use the word. If you want to sound like an amateur, then say, "Hey, I ran a confirmatory factor analysis.  Guess what?  My survey is valid".  If you want to sound more professional, then say, "Hey, I ran a confirmatory factor analysis. My results suggest that the original solution fits pretty well for my population. The original version has been used in many contexts and has had few negative consequences over the years. Based on my results, I should see the same thing happen. I've added evidence for greater validity in arguing my research conclusions, and I've contributed to the body of validity evidence for this survey."&lt;br /&gt;&lt;br /&gt;Watch the video below to learn more about the impact of incorrect conclusions in statistical analyses.&lt;br /&gt;&lt;br /&gt;&lt;!--cut and paste--&gt;&lt;object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=8,0,0,0" id="VE_Player" align="middle" height="285" width="320"&gt;&lt;param name="movie" value="http://static.videoegg.com/ted/flash/loader.swf"&gt;&lt;param name="FlashVars" value="bgColor='FFFFFF'&amp;amp;file=http://static.videoegg.com/ted/movies/PETERDONNELLY_high.flv&amp;amp;autoPlay=false&amp;amp;fullscreenURL=http://static.videoegg.com/ted/flash/fullscreen.html&amp;amp;forcePlay=false&amp;amp;logo=&amp;amp;allowFullscreen=true"&gt;&lt;param name="quality" value="high"&gt;&lt;param name="allowScriptAccess" value="always"&gt;&lt;param name="bgcolor" value="#FFFFFF"&gt;&lt;param name="scale" value="noscale"&gt;&lt;param name="wmode" value="window"&gt;&lt;embed src="http://static.videoegg.com/ted/flash/loader.swf" 
flashvars="bgColor=FFFFFF&amp;amp;file=http://static.videoegg.com/ted/movies/PETERDONNELLY_high.flv&amp;amp;autoPlay=false&amp;amp;fullscreenURL=http://static.videoegg.com/ted/flash/fullscreen.html&amp;amp;forcePlay=false&amp;amp;logo=&amp;amp;allowFullscreen=true" quality="high" allowscriptaccess="always" bgcolor="#FFFFFF" scale="noscale" wmode="window" name="VE_Player" type="application/x-shockwave-flash" pluginspage="http://www.macromedia.com/go/getflashplayer" align="middle" height="285" width="320"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6072580433200779500-6193465493012858638?l=statconsultant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://statconsultant.blogspot.com/feeds/6193465493012858638/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6072580433200779500&amp;postID=6193465493012858638' title='15 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/6193465493012858638'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/6193465493012858638'/><link rel='alternate' type='text/html' href='http://statconsultant.blogspot.com/2007/09/validity-of-validity.html' title='The Validity of Validity'/><author><name>StatHelper</name><uri>http://www.blogger.com/profile/09382106274285007286</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='31' 
src='http://2.bp.blogspot.com/-OxTzG7Gvxag/TWaf4WE4l4I/AAAAAAAAACM/AnCmPF7lFu0/s220/Jeff%2Bpic.JPG'/></author><thr:total>15</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6072580433200779500.post-504758159375002257</id><published>2007-09-12T18:15:00.002-04:00</published><updated>2010-12-09T18:47:41.493-05:00</updated><title type='text'>I don't think we're in Kansas anymore, Toto!</title><content type='html'>****************************************************************************&lt;br /&gt;After meeting Glinda and the Munchkins, it's so incredibly clear.  Just follow the yellow brick road, right? If Dorothy just remembers those 5 magic words, she'll find her Emerald City and her free ride back to Kansas.  Of course, it wasn't that easy.  If it had been, then the book would have been only a few chapters long.  Instead, she decides to dance around with a scarecrow for a while, slap a lion, take a nap in a poppy field, and go on a scenic tour of an Oz Witch Project.&lt;br /&gt;&lt;br /&gt;Wow, this is starting to sound like the research process to me. We start with an idea. Then we meticulously determine our sample size and procedures.  At last, all of that data is in a file itching to be analyzed.  But at this point, my friend, we are most easily distracted from our yellow brick road.&lt;br /&gt;&lt;br /&gt;Here's one scenario. Suppose you are researching the influence of an advertising campaign on the number of products purchased in one month for a brand new product. You decide it would be smart to look at the frequency distributions for the variables. (Yes, this is always a good thing to do.) But, uh-oh, your dependent variable is seriously positively skewed!&lt;br /&gt;&lt;br /&gt;You talk to Fred, who says, "Well, you really shouldn't assume a normal distribution...maybe you can normalize the distribution by taking the natural logarithm". 
Two weeks later, after debating the ethics of changing your numbers and accepting the notion of "nonlinear transformations", you find out that you can't take the log anyway because you have a bunch of zeros.&lt;br /&gt;&lt;br /&gt;So then you decide to go back to the company and say that you can't analyze this data because a bunch of people didn't buy the new product at all.  And then you decide that doing that would probably not get you much future work for this business.&lt;br /&gt;&lt;br /&gt;Mary suggests adding a very, very small value such as .000000001 to all the numbers so that you can take the log of a now-transformed zero. But John then notes that you will now have a large number of ln(.000000001) values and that you should use a Hurdle or Zero-Inflated Poisson model.&lt;br /&gt;&lt;br /&gt;Time to take a nap in the poppy field.&lt;br /&gt;&lt;br /&gt;When you awaken, you realize that you only have 2 days left to meet the deadline. So, you decide to ignore that potential problem you wrestled with for 2 weeks. You move on to the predictor variables. You decide to run a factor analysis. Lo and behold, the factor structure is unexpected. Given the surprising results, you ask the client for a one-day extension. You run a cluster analysis on the sum of the scores making up the factors. You find two clusters. You decide to call them the "1=high on all factors" and "0=low on all factors". From there, you use this dichotomy in a latent class analysis. Finally, you take the 4 classes that emerge and regress them on........drum roll......the number of purchases made.  (In case this has gotten confusing, the dependent variable has now become the predictor variable.)&lt;br /&gt;&lt;br /&gt;Along comes the deadline. You send the client 137 or more pages of software printout and a bunch of graphs.&lt;br /&gt;&lt;br /&gt;They look at this and say, "Wow...you did a lot of work.  
Now how does this answer our research question?"&lt;br /&gt;&lt;br /&gt;You then realize that you aren't in the Emerald City at all....this is looking more like the Oz Witch Project!&lt;br /&gt;&lt;br /&gt;How can this be prevented? How can we sift through a myriad of analytic options and not lose track of the road before us?&lt;br /&gt;&lt;br /&gt;#1. Stick a big note on your desk that says, "Does this help me answer the research question, or does it distract me from it?"&lt;br /&gt;&lt;br /&gt;#2. In the header and footer of all documents, type "The Effect of X on Y" or "The Relationship Between X and Y".  Keep that research question clear and visible at all times.&lt;br /&gt;&lt;br /&gt;#3. Write a quick sentence or two at the end of each analysis as if it were for the final report, and be sure to tie it back to the original research questions.&lt;br /&gt;&lt;br /&gt;#4. Tell your client, research partners, and/or graduate committee members how easy it is to lose sight of the forest for the trees.  Awareness is a big part of the solution.&lt;br /&gt;&lt;br /&gt;#5. If you aren't a consultant, then find one.  This is part of the consultant's job.  
A good consultant will not only know this but will help bring you back to the yellow brick road (or help you build a new one if necessary).&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Please send requests for more information to jeffmiller.research@gmail.com&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6072580433200779500-504758159375002257?l=statconsultant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://statconsultant.blogspot.com/feeds/504758159375002257/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6072580433200779500&amp;postID=504758159375002257' title='15 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/504758159375002257'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6072580433200779500/posts/default/504758159375002257'/><link rel='alternate' type='text/html' href='http://statconsultant.blogspot.com/2007/09/i-dont-think-were-in-kansas-anymore.html' title='I don&apos;t think we&apos;re in Kansas anymore, Toto!'/><author><name>StatHelper</name><uri>http://www.blogger.com/profile/09382106274285007286</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='31' src='http://2.bp.blogspot.com/-OxTzG7Gvxag/TWaf4WE4l4I/AAAAAAAAACM/AnCmPF7lFu0/s220/Jeff%2Bpic.JPG'/></author><thr:total>15</thr:total></entry></feed>
