Monday, May 11, 2009

Dissertation Editing

    
     Writing a dissertation is a painstaking process. It usually begins with ideas floating around in the mind until it comes time to materialize them on paper. A few sentences hopefully inspire a thorough literature review. The wise student is meticulous in organizing specific ideas from hundreds of articles such that they tie into a cohesive and well-organized chapter of the dissertation proposal.

     During this process, the information load usually exceeds the cognitive capacity needed to synthesize a perfect draft. More often than not, the chapter is an amalgamation of statements, often disjointed, with paragraphs on a single theme scattered throughout the manuscript. Revisions are in order. The goal here is to get that proposal accepted so that the student can move on to the actual research! So, it is vital that the literature review and the preceding introduction chapter flow clearly from the statement of the research problem to its justification and substantiation via previous research.

     There are two stages that are also tricky in terms of editing once the research is complete. First, there are the nuances of APA format required for Chapter 4. For example, most of us don't naturally think to write "was statistically significant, t(38) = 2.19, p = .035" with full knowledge of what to italicize, where to place spaces, and when to drop the leading zero. Second, somehow that results chapter and the following discussion chapter have to connect with the preceding chapters in such a way that the entire dissertation tells a complete and coherent story.

AlphaPoint05 has dissertation editors focusing on spelling, grammar, style, mechanics, structure, and rhetoric to help your dissertation be of the highest writing caliber possible. All dissertation editors hold Ph.D. degrees in areas related to writing.

Tuesday, March 24, 2009

Monday, September 1, 2008

My Table Has Too Many Zeros

Hi everyone,

Sometimes we are interested in looking at the association between two qualitative variables such as gender and political party. We ask, "Is there a significant association between gender and political party?", and we typically use a chi-square test of independence in the statistical analysis.

We usually assume an expected count of at least 5 in each cell but this doesn't always happen. SPSS will still provide output but include a note stating something like "50% of the cells have an expected count less than 5". What to do?

Well, one solution is Fisher's Exact Test. This is a nonparametric test that utilizes the hypergeometric distribution.

Some software programs state that it only works for a 2 X 2 table such as Gender (male and female) by Political Party (Democrat and Republican). This simply isn't true. It's just very computationally intensive for larger tables.
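For the curious, here is a minimal sketch of the idea in Python for the simple 2 X 2 case. The function name fisher_exact_2x2 and the counts in the table are made up for illustration; they are not from any real survey. It sums the hypergeometric probabilities of every table that is at least as unlikely as the one observed, which is one common way to define the two-sided p-value.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # holding the row and column totals fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    smallest = max(0, col1 - row2)
    largest = min(row1, col1)
    # Add up every possible table that is no more likely than the observed one
    # (the tiny tolerance guards against floating-point ties)
    return sum(
        prob(x)
        for x in range(smallest, largest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

# Hypothetical counts: rows are gender, columns are political party
p_value = fisher_exact_2x2(3, 1, 1, 3)
print(round(p_value, 4))
```

With counts this small, every expected cell count is below 5, which is exactly the situation where the chi-square approximation gets shaky and an exact test earns its keep.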

Feel free to contact me at AlphaPoint05 if you are looking for a consultant experienced with this type of analysis.

Best regards,
Jeff

Friday, June 27, 2008

Being Wrong "On Average"

One of the simplest concepts we learn in a statistics course is that of the "average". Most of us come into a statistics course already familiar with the concept of adding up a bunch of numbers and then dividing that total by the number of things being added. For example, if three people are of ages 10, 20, and 30 then the average is simply (10 + 20 + 30) / 3 = 20. We learn that, in statistics, this average is called the "mean". Sometimes, we learn the more complete word for it -- the "arithmetic mean".

Unfortunately, we often report the wrong average. Believe it or not, there is actually more than one way to calculate the "mean", and the best one to use depends on what we are doing with the numbers. For example, suppose you are told that you will receive salary increases each year as follows.

* Year 2 - 2.0%
* Year 3 - 2.5%
* Year 4 - 2.9%
* Year 5 - 3.5%
* Year 6 - 4.2%

If your starting pay was $40,000 USD then, to get your Year 2 pay, you would multiply 40,000 X 1.02 = 40,800. To get the Year 3 pay, you would then multiply that 40,800 X 1.025 = 41,820. Notice that in each case we are "multiplying", not adding. Hence, to get the average percent pay increase, we should calculate a mean based on multiplying. The arithmetic mean would not be the most correct average to report.

Fortunately, we can use the geometric mean to find the answer to this problem. First, we multiply, not add, all the numbers together. Let k = the number of items being multiplied. Now, just take the k-th root.

Note that if you only multiplied 2 numbers, you would take the 2nd root, which is the square root. The 3rd root is often called the cube root. In our example, we are multiplying 5 numbers, so we take the fifth root.

Here's how we calculate the answer in Excel.

1.) Convert the percentages to the numbers we would use in multiplication.

* Year 2 - 1.02
* Year 3 - 1.025
* Year 4 - 1.029
* Year 5 - 1.035
* Year 6 - 1.042

2.) Enter these values in Excel. I put them in cells A1 through A5.

3.) In cell A6, I multiply by typing =PRODUCT(A1:A5) which gives me a multiplicative solution of 1.16

4.) Take the 5th root by raising the product to the 1/5th power. In cell A7, I type =A6^(1/5) giving a solution of 1.03

An easier method in Excel would be to just type =GEOMEAN(A1:A5) and skip steps 3 and 4. It gives the same solution.

So, the average pay increase over these 5 years is 1.03, or 3%. Note that the arithmetic mean would also suggest an "arithmetic" average increase of 3% but this isn't always the case. Further, I rounded to two decimal places. If we were dealing with large numbers and more decimal places then the results might be more meaningfully different.
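If you don't have Excel handy, the same steps can be sketched in a few lines of Python. The variable names are my own; the numbers are the growth factors from step 1. This assumes Python 3.8 or later for math.prod.

```python
from math import prod

# Growth factors for Years 2 through 6 (a 2.0% raise becomes 1.02, and so on)
factors = [1.02, 1.025, 1.029, 1.035, 1.042]

product = prod(factors)                       # same idea as =PRODUCT(A1:A5)
geo_mean = product ** (1 / len(factors))      # same idea as =A6^(1/5)
arith_mean = sum(factors) / len(factors)      # the usual "add and divide" mean

# Show more decimal places than the two used in the Excel walkthrough
print(f"product         = {product:.5f}")
print(f"geometric mean  = {geo_mean:.5f}")
print(f"arithmetic mean = {arith_mean:.5f}")
```

Carried out to five decimal places, the geometric mean comes in slightly below the arithmetic mean (roughly 1.03017 versus 1.03020), which is the small difference that rounding to two decimal places hides.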

Try your hand at it. A stock has yielded the following changes over the past 5 quarters: 4%, 7%, 12%, 10%, and 11%. What is the average increase? How do your results differ when using the typical arithmetic mean versus the more appropriate geometric mean?

Saturday, January 5, 2008

Stating Testable Hypotheses

Provided that one has a properly framed research question, the next step is to rephrase it in a manner that can generate an answer. The question is permitted to be vague; however, the hypotheses must be concrete. We must bring the question down from the realm of ideas and settle it on solid earth.

The hypothesis restates your "Hmmm, I wonder..." as "Hmmm, I think..." So, in that sense, the hypothesis does remain a bit vague but only because we aren't making any judgments of knowing. It is the words that we choose in our hypothesis that must be concrete.

Here is an example of the research question:

Research Question: Are children more resilient than we think they are?

See how vague that is? What do we mean by resilient? How would we know if they are more resilient than 'we think' they are?

Now, here's a hypothesis:

Hypothesis: Children's scores on the Resilience Scale will negatively correlate with their parents' scores on the Perceptions of Child Resilience Scale.

We have declared a way to measure resilience and a way to measure parents' perceptions of children's resilience via the two Scales (i.e., surveys). We have also declared a way to compare the two via "negatively correlate".

This hypothesis would suggest that lower scores for parents' perception are associated with higher scores for children's resilience. Parents with low belief in their children's resilience <--> Children with high resilience.
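To make "negatively correlate" concrete, here is a small sketch in Python that computes a Pearson correlation by hand. The scores below are invented purely for illustration (they are not real data from either Scale), and the function name pearson_r is my own.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

# Made-up scores for five parent-child pairs (illustration only)
child_resilience = [10, 12, 15, 18, 20]
parent_perception = [19, 17, 14, 12, 9]

r = pearson_r(child_resilience, parent_perception)
print(round(r, 2))
```

In this invented example the coefficient is strongly negative (close to -1): as the children's scores go up, the parents' scores go down, which is the pattern the hypothesis predicts.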


Note that this is only an example of how to state a hypothesis. There are many other concerns not addressed here (e.g., what is meant by children?). Note also that this is only one of many ways that the hypothesis could have been stated.

Strictly speaking, the hypothesis should be stated to the contrary as a "null hypothesis" symbolized H0:

H0: Children's scores on the Resilience Scale will NOT negatively correlate with their parents' scores on the Perceptions of Child Resilience Scale.

We assume the null is true until the data give us reason to reject it. Our results can then tell us how much we can trust "our" hypothesis, often called the "alternative hypothesis" and symbolized H1: or Ha:

Given a stated testable hypothesis, we can then more efficiently gather data, conduct trials, analyze the data, and interpret results.

Wednesday, January 2, 2008

Framing Research Questions

It's not so much what you say but how you say it.

Framing the research question is similar in difficulty to writing the introductory paragraph of a paper. It forces you to engage in effortful and deep levels of cognitive processing. It requires you to objectively state what it is that you are researching.

Why is it so difficult to do this and do it well?

It's a cognitive workout. We humans are suited for automating simple tasks in order to pursue more complex goals. For example, the mere act of getting out of bed involves hundreds of decisions that engage hundreds of muscles; yet, we take this for granted and just get out of bed. Similarly, our research topic is probably very familiar to us. We may think, "Well, my question is obvious to me. Why detail it out in just a few words?"

Failure to properly frame a research question seriously puts the cart before the horse. It has been said that a thousand-mile journey begins with a single step. But in which direction? Which foot? Which journey?

Thomas Carlyle stated that "a man without a goal is like a ship without a rudder". In the case of framing a research question, you may still have a rudder but you may have forgotten your map. Or you may have a map that is upside-down. Worse yet, you may have brought the wrong map.

TRICKS AND TIPS
1. Read other research papers and see how it was done. Draw from experience.
2. Explain your research to others in their language. Don't just say, "Well, it's a bit complicated."
3. Write several questions.
4. Don't worry when your question generates other questions. This is a good thing! Write those down and come back to them.
5. Write operational definitions for the terms that might appear in your research questions.
6. Make a concept map.

Friday, November 23, 2007

Demystifying the Binomial Distribution Formula

So, we're taking a stats class and reading the text. We're learning about probability, and all is going well. Of course flipping a coin has a heads probability of .50! Of course the probability of rolling a 6 on a fair 6-sided die is 1/6! Stats is supposed to be difficult?

Then we find out that we were just learning about ONE flip of the coin and ONE roll of the die. Dr. Quantoid tells us, "It is estimated that 20% of the fish in my lake are large-mouth bass. What is the probability of catching a large-mouth bass?"

0.2, right? Right!

"But, what is the probability of catching 3 bass out of 5 fish caught?"

Uh-oh....the neurons in this classroom have just started firing....and possibly, not down the correct paths!

The professor throws up this formula on the board:


I will use "typing notation" to restate this as

Pr(Y=k) = (nCk) * (p^k) * ((1-p)^(n-k))

Either way, the students' eyes go blurry and class is followed by a surge in optometrist appointments (possibly saving the day from a predicted sharp economic downturn).


But, the formula REALLY ISN'T COMPLICATED. It just looks that way. It's kinda' like those new-fangled cell phone PDA mp3 abc pdq xyz all-in-one devices. Hey, if you can find "Play" or "Send", it's not all that bad, right?

(Caution: I think the equation is blurry anyway but I can't tell...I'm not wearing my contacts. Please report any suspected true blurriness to the webmaster before visiting your optometrist.)

Moving forward...let's look at the first part:

Pr( Y=k) =

This is just saying that the probability that Y=k (here, Y=3) is equal to something. What is Y? Well, each fish you catch is a binary variable with only two options....0 and 1. A 0 is usually meant to denote the "failure", the "miss", the "tails", the "no" while the 1 is usually meant to denote the "success", the "hit", the "heads", the "yes". Y is simply the number of 1's.

So, Pr(Y=3) is the probability of catching 3 bass....of having 3 successes....three hits....etc....is equal to something.

That something looks like a messy room full of n's , k's , and p's. (I thought I told you kids to clean...that...mess...up!)

Ok, let's do it.

First, we should befriend those letters. (If you don't believe that letters can be your friend then you definitely didn't watch enough Sesame Street as a kid!)

n is just our sample size....you should be ok with this by now. n=5

k is the number of Y=1's of interest. We want to know the probability of catching 3 bass out of 5. So, k=3

Finally, if we didn't know something about the probability of catching a bass then, yes, this problem would be VERY difficult to solve. But, we do know. (whew)

This probability is 0.2. So, p=.2

So, we plug in all these numbers and get the answer right? Right. But, that doesn't help you understand why the equation is so easy.

Let's look at the second part of the equation...(p^k).

This is asking you to raise the probability of success to the power of the number of successes. So, we are raising .20 to the 3rd power. Why? Well, we already should know that the joint probability of two independent events is simply their product. For example, the probability of flipping two heads is .5 * .5 = .25.

So, what is the probability of buying 20 gallons of rotten milk when the probability that any one gallon is rotten is .05?

Well, that's .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05.

Now, why on Earth would we want to write all of that down when it's the same thing as saying .05 to the 20th power, which is .05^20? And, for that matter, why on Mars? Why on Jupiter? Why on any planet?

(I will save other possibilities such as "Why in Heaven?" and "Why in Hell?" for the metaphysics community. It's just a bit too 'out there' for statisticians already dealing with issues of causality, etc...)

Many students solve p^k and turn that in as the answer. How callous, rude, and thoughtless! What about the other 2 fish? Y'know....those 2 non-bass?!? Do you really think that just because we are interested in the probability of 3 bass out of 5 fish that we can just ignore the 2 that aren't bass? If you ignore them, aren't you just talking about the probability of 3 bass out of 3 fish? (And discrimination of anything is SO pre-Y2K!)

There is a take-home point here! When we talk about the 0's, failures, misses, tails, non-bass, etc...we are not making a moral indictment on them. They are still INFORMATION. They still must be included in our calculations (even if we are upset that we only caught 3 bass and have 5 mouths at home to feed and don't have a cat to kick and feel compelled to blame Flipper).

So, this all ties into the third part of the equation...(1-p)^(n-k). Well, if the p=.2 is the probability of catching a bass then (1-.2), which is .8, is the probability of catching a non-bass. Like before, we wouldn't want to multiply .8 over and over and over by hand. If there are n=5 in the sample and there are k=3 bass, then we are now interested in 5-3 = 2 non-bass. So, we just raise .8 to the 2nd power.

Combining p^k and (1-p)^(n-k), all we are doing is asking for .2 * .2 * .2 * .8 * .8

It's just the joint probability solved by multiplying the individual probabilities for both successes AND failures. Why didn't Dr. Quantoid just put THAT in the textbook? Well, look back at the rotten milk example. Or think about this: What is the probability of 400 students out of 1,200 getting into graduate school when the probability is .14? Would you like to write that out? Or would it be better to just say (.14^400) * ((1-.14)^(1200-400))?

Finally, what about the first part of the equation? Y'know that oversized parenthesis with an n on top of a k?
Well, we read that as "n choose k". So, it's 5 choose 3. Huh?

Of course we want to choose 3 from 5. But, guess what? There are a lot of ways to do this. Below are my fishing results for 3 different days.

Day 1: Bass Bass Bass Bluegill Carp
Day 2: Carp Bass Bass Shark Bass
Day 3: Whale Dolphin Bass Bass Bass

The point here is that there are many ways to catch 3 bass out of 5 fish. It could be the first 3 fish as in day 1. It could be the last 3 fish as in day 3.

I could write out all of the possibilities and do some math but it's a lot easier to know that "n choose k" will give me the answer right away.

"n choose k" is equal to n! / (k! * (n-k)!)

n! is (5*4*3*2*1)

k! is (3*2*1)

(n-k)! is (5-3)! = 2! = (2*1)

So, the full solution is (5*4*3*2*1) / ((3*2*1) * (2*1)) = 120 / 12 = 10
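If you'd rather see the possibilities than trust the factorials, here is a short Python sketch that does both. It lists every way the 3 bass could fall among the 5 fish caught (lumping all non-bass together as "Other", which is my own simplification) and checks the count against the formula. It assumes Python 3.8 or later for math.comb.

```python
from itertools import combinations
from math import comb, factorial

n, k = 5, 3  # 5 fish caught, 3 of them bass

# The formula: n! / (k! * (n-k)!)
by_formula = factorial(n) // (factorial(k) * factorial(n - k))

# Brute force: pick which 3 of the 5 catches are the bass
arrangements = []
for bass_positions in combinations(range(n), k):
    catch = ["Bass" if i in bass_positions else "Other" for i in range(n)]
    arrangements.append(" ".join(catch))

for line in arrangements:
    print(line)

# All three counts should agree
print(by_formula, len(arrangements), comb(n, k))
```

The first line printed is the Day 1 pattern (three bass, then two others) and the last is the Day 3 pattern (two others, then three bass).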


(Note: there was actually another bass inside of the whale but that violates the binomial assumption of identical trials so we'll pretend that didn't happen.)

(Note: After catching the whale, I almost caught a bass but the dolphin ate it....that violates the binomial assumption of independent trials so we'll pretend that didn't happen.)

So, what have we learned?

1. We are just multiplying the probabilities of "successes".
2. We are also multiplying the probabilities of "failures".
3. We are multiplying this by all the ways we can have those "successes".
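
To put the three pieces together for the bass problem (n=5, k=3, p=.2), here is a minimal Python sketch. The function name binomial_pmf is my own, and it assumes Python 3.8 or later for math.comb.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    ways = comb(n, k)              # "n choose k": all the ways to get k successes
    successes = p ** k             # probability of the k successes
    failures = (1 - p) ** (n - k)  # probability of the n-k failures
    return ways * successes * failures

prob_3_bass = binomial_pmf(3, 5, 0.2)
print(round(prob_3_bass, 4))  # 10 * .008 * .64 = .0512

# Sanity check: the probabilities for 0 through 5 bass should add up to 1
total = sum(binomial_pmf(j, 5, 0.2) for j in range(6))
print(round(total, 4))
```

So the chance of catching exactly 3 bass out of 5 fish is about 5%, a lot smaller than the 20% chance of any single fish being a bass.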