# Basic statistics pdf

The Prerequisites Checklist page on the Department of Statistics website lists a number of courses that require a foundation of basic statistical concepts as a prerequisite. All of the graduate courses in the Master of Applied Statistics program heavily rely on these concepts and procedures.

Therefore, it is imperative that, after you study and work through this lesson, you thoroughly understand all the material presented here. Students who do not possess a firm understanding of these basic concepts will struggle to participate successfully in any of the graduate-level courses above STAT. These review materials are intended to provide a review of key statistical concepts and procedures.

With regard to hypothesis testing, for instance, some of you may have learned only one approach: some the P-value approach, and some the critical value approach. It is important that you understand both approaches. If the P-value approach is new to you, you might have to spend a little more time on this lesson than if it is not.
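To make the relationship between the two approaches concrete, here is a minimal sketch of a one-sample, two-sided z-test decided both ways. The function name, the sample figures, and the assumption of a known population standard deviation are all hypothetical choices made for illustration; they do not come from the lesson itself.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """One-sample, two-sided z-test, decided by both approaches.

    Assumes the population standard deviation `sigma` is known.
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    # Standardize the sample mean into a test statistic.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Critical value approach: compare |z| with the cutoff for alpha.
    critical_value = std_normal.inv_cdf(1 - alpha / 2)
    # P-value approach: probability of a result at least this extreme.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return {
        "z": z,
        "critical_value": critical_value,
        "p_value": p_value,
        "reject_by_critical_value": abs(z) > critical_value,
        "reject_by_p_value": p_value < alpha,
    }


# Hypothetical data: 36 observations averaging 52, null mean 50, sigma 5.
result = two_sided_z_test(sample_mean=52.0, mu0=50.0, sigma=5.0, n=36)
print(result)
```

For the same significance level, the two decisions agree: the test statistic exceeds the critical value exactly when the P-value falls below alpha.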

Upon completion of this review of basic statistical concepts, you should be able to meet the learning objectives listed below. Students are strongly encouraged to take STAT, thoroughly review the materials that are covered in the sections above, or take additional coursework that focuses on these foundations.

If you have struggled with the concepts and methods that are presented here, you will indeed struggle in any of the graduate level courses included in the Master of Applied Statistics program above STAT that expect and build on this foundation.

Specifically, the lesson reviews: populations and parameters and how they differ from samples and statistics, confidence intervals and their interpretation, hypothesis testing procedures, including the critical value approach and the P-value approach, chi-square analysis, tests of proportion, and power analysis.

Understand the general idea of hypothesis testing -- especially how the basic procedure is similar to that followed for criminal trials conducted in the United States. Be able to distinguish between the two types of errors that can occur whenever a hypothesis test is conducted.

Understand the basic procedures for the critical value approach to hypothesis testing. Understand the basic procedures for the P-value approach to hypothesis testing. Note: These materials are NOT intended to be a complete treatment of the ideas and methods used in basic statistics. These materials and the accompanying self-assessment are intended simply as an 'early warning signal' for students.

N: "N" is usually used to indicate the number of subjects in a study.

Mean: the average result of a test, survey, or experiment. Example: the heights of five people are 5 feet 6 inches, 5 feet 7 inches, 5 feet 10 inches, 5 feet 8 inches, and 5 feet 8 inches.

Median (odd number of values): to find the median of the same five heights, line up the numbers from smallest to largest: 5 feet 6 inches, 5 feet 7 inches, 5 feet 8 inches, 5 feet 8 inches, 5 feet 10 inches. The median is the value in the middle.

Mode: to find the mode of the same five heights, put the numbers in order to make it easier to visualize: 5 feet 6 inches, 5 feet 7 inches, 5 feet 8 inches, 5 feet 8 inches, 5 feet 10 inches. The mode is the value that occurs most often.
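The three measures for the five heights above can be checked with a short sketch using Python's standard `statistics` module. Converting the heights to inches and the helper name `to_feet_inches` are choices made here for illustration only.

```python
from statistics import mean, median, mode

# The five heights from the example, converted to inches (1 foot = 12 inches).
heights_in = [5 * 12 + 6, 5 * 12 + 7, 5 * 12 + 10, 5 * 12 + 8, 5 * 12 + 8]


def to_feet_inches(total_inches):
    """Split a length in inches into (whole feet, remaining inches)."""
    feet, inches = divmod(total_inches, 12)
    return int(feet), inches


total_in = sum(heights_in)      # sum of all five heights
mean_in = mean(heights_in)      # sum divided by the number of values
median_in = median(heights_in)  # middle value after sorting
mode_in = mode(heights_in)      # most frequent value

print("sum:", total_in, "inches")
print("mean:", to_feet_inches(mean_in))
print("median:", to_feet_inches(median_in))
print("mode:", to_feet_inches(mode_in))
```

Working in a single unit (inches) avoids having to carry feet and inches separately through the arithmetic.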

Significance: the more statistical significance assigned to an observation, the less likely the observation occurred by chance. The p-value is the way in which significance is reported statistically. Note that, in general, p-values need to be fairly low.

Correlation: the degree to which two factors appear to be related. Correlation should not be confused with causation. Just because two factors are reported as being correlated, you cannot say that one factor causes the other.

For example, you might find a correlation between going to the library at least 40 times per semester and getting high scores on tests. However, you cannot say from these findings what about going to the library, or what about people who go to libraries often, is responsible for higher test scores.
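The degree to which two factors are related is commonly summarized by the Pearson correlation coefficient r, which ranges from -1 to 1. The sketch below computes it by hand; the library-visit and test-score numbers are invented purely for illustration, and choosing Pearson's r (rather than another correlation measure) is an assumption.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: library visits per semester and test scores.
visits = [5, 10, 20, 30, 40, 45]
scores = [62, 65, 70, 78, 85, 88]
r_library = pearson_r(visits, scores)

# A perfectly decreasing relationship gives r = -1.
r_negative = pearson_r([1, 2, 3], [6, 4, 2])

print(round(r_library, 3), r_negative)
```

A value of r close to 1 here only says the two columns move together; it says nothing about whether library visits cause the higher scores.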

Date last modified: August 8

The Three Ms

Mean: the average result of a test, survey, or experiment. Example: the heights of five people are 5 feet 6 inches, 5 feet 7 inches, 5 feet 10 inches, 5 feet 8 inches, and 5 feet 8 inches. The sum is 339 inches, so the mean (average) is 67.8 inches, or 5 feet 7.8 inches.

Median: the score that divides the results in half, that is, the middle value. Example with an odd number of values: for the same five heights, the median is 5 feet 8 inches (the number in the middle).

Mode: the most common result (the most frequent value) of a test, survey, or experiment. Example: for the same five heights, the mode is 5 feet 8 inches (it occurs the most, two times).

Significant Difference

Significance: the measure of whether the results of research were due to chance. Example: a study had one group of students (Group A) study using notes they took in class; the other group (Group B) studied using notes they took after class using a recording of the lecture. Students in Group A scored higher on a test than Group B.

Correlation

Correlation: the degree to which two factors appear to be related.

An r-value of -1 indicates an extreme negative correlation between two variables: as one variable's value tends to increase, the other variable's value tends to decrease. An r-value of 0 means there is no correlation at all between the elements being studied.

Statistics can be a powerful tool when performing the art of Data Science (DS). From a high-level view, statistics is the use of mathematics to perform technical analysis of data. A basic visualisation such as a bar chart might give you some high-level information, but with statistics we get to operate on the data in a much more information-driven and targeted way.

The math involved helps us form concrete conclusions about our data rather than just guesstimating. Using statistics, we can gain deeper and more fine-grained insights into how exactly our data is structured and, based on that structure, how we can optimally apply other data science techniques to get even more information.

Statistical features are probably the most used statistics concept in data science. A box plot offers a good illustration.

In a box plot, the line in the middle is the median value of the data. The median is used over the mean since it is more robust to outlier values.

The first quartile is essentially the 25th percentile; i.e., 25% of the points in the data fall below that value. The third quartile is the 75th percentile; i.e., 75% of the points in the data fall below that value. The min and max values represent the lower and upper ends of our data range. A box plot illustrates well what we can do with basic statistical features.
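The statistical features behind a box plot can be computed directly. The following is a minimal sketch using Python's standard `statistics` module; the data set is made up, and the choice of the "inclusive" quantile method is an assumption (other quartile conventions can give slightly different values).

```python
from statistics import mean, median, quantiles

# Hypothetical data with one large outlier (100).
data = [1, 3, 5, 7, 9, 11, 13, 15, 100]


def five_number_summary(values):
    """Min, first quartile, median, third quartile, and max."""
    ordered = sorted(values)
    # n=4 splits the data into quarters and returns the three cut points.
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": ordered[-1],
    }


summary = five_number_summary(data)
print(summary)

# The outlier pulls the mean upward but leaves the median unchanged.
print("mean:", mean(data), "median:", median(data))
```

Comparing the last two printed values shows why the median is the more robust choice for the center line of the box.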

All of that information comes from a few simple statistical features that are easy to calculate! Try these out whenever you need a quick yet informative view of your data.

We can define probability as the percent chance that some event will occur. In data science this is commonly quantified in the range of 0 to 1, where 0 means we are certain this will not occur and 1 means we are certain it will occur.

A probability distribution is then a function which represents the probabilities of all possible values in the experiment.
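As a small illustration of a probability distribution as a function over all possible values, the sketch below models a fair six-sided die. The die example and the helper name `probability` are assumptions introduced here; exact fractions are used so the probabilities sum to exactly 1.

```python
from fractions import Fraction

# A fair die: each of the six faces is equally likely.
die = {face: Fraction(1, 6) for face in range(1, 7)}


def probability(event, distribution):
    """Total probability of the outcomes for which `event` is true."""
    return sum(p for outcome, p in distribution.items() if event(outcome))


total = sum(die.values())                          # must equal 1
p_even = probability(lambda face: face % 2 == 0, die)
p_above_four = probability(lambda face: face > 4, die)
p_seven = probability(lambda face: face == 7, die)  # impossible event

print(total, p_even, p_above_four, p_seven)
```

Every probability lies between 0 (the impossible event) and 1 (the certain event), matching the range described above.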

There are many more distributions that you can dive deep into, but those three already give us a lot of value. We can quickly see and interpret our categorical variables with a Uniform Distribution. If we see a Gaussian Distribution, we know that there are many algorithms that by default will perform well specifically with Gaussian, so we should go for those.

The term Dimensionality Reduction is quite intuitive to understand. We have a dataset and we would like to reduce the number of dimensions it has.

In data science this is the number of feature variables.

Excel has a lot of statistical power if you know how to use it. Calculating percentages in Excel is as simple as it is anywhere else: just divide two numbers and multiply by 100. Type an equals sign, enter the equation after it, and hit Enter to run the calculation. You now have a decimal value.

You can also change the cell format the long way by right-clicking the cell, selecting Format Cells, choosing Percentage, and clicking OK.

Calculating the percentage increase is similar. First, subtract the original measurement from the new one. Then take the resulting value (the raw change) and divide it by the original measurement.
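The same arithmetic can be sketched outside a spreadsheet. The function names and the numbers below are hypothetical and are used only to show the two calculations described above.

```python
def percent_of(part, whole):
    """What percentage `part` is of `whole`."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    return part / whole * 100


def percent_change(original, new):
    """Percentage change from `original` to `new`."""
    if original == 0:
        raise ValueError("original must be non-zero")
    raw_change = new - original       # step 1: the raw change
    return raw_change / original * 100  # step 2: divide by the original


share = percent_of(45, 60)          # 45 out of 60
increase = percent_change(80, 100)  # a rise from 80 to 100
decrease = percent_change(100, 80)  # a fall from 100 to 80

print(share, increase, decrease)
```

Note that the increase and the decrease are not mirror images: the divisor is always the original measurement, so going from 80 to 100 and back again gives different percentages.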

That gives us a decimal change. You can also get all of this information in a single formula. Do a quick check of the result against the original numbers to confirm that it makes sense.

Excel's AVERAGE function calculates the mean (average) of a set of numbers. Just type in the name of the function, select the cells you want to apply it to, and hit Enter. In our example here, we have a series of measurements that we need the average of.

From Statistics For Dummies, 2nd Edition.

By Deborah J.

Being able to make the connections between those statistical techniques and formulas is perhaps even more important. It builds confidence when attacking statistical problems and solidifies your strategies for completing statistical projects.

After data has been collected, the first step in analyzing it is to crunch out some descriptive statistics to get a feeling for the data. For example: Where is the center of the data located? How spread out is the data? How correlated are the data from two variables? Each of the most common descriptive statistics has a formula and measures one such aspect of the data.

When designing a study, the sample size is an important consideration because the larger the sample size, the more data you have, and the more precise your results will be (assuming high-quality data). If you know the level of precision you want (that is, your desired margin of error), you can calculate the sample size needed to achieve it.
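One common way to do this calculation, when estimating a population mean, is to solve the margin-of-error formula for n, giving n = (z* × σ / E)² rounded up. The sketch below assumes that formula, a known (or well-guessed) population standard deviation, and invented input values; none of these specifics come from the text above.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, margin_of_error, confidence=0.95):
    """Smallest n whose margin of error is at most `margin_of_error`.

    Assumes the population standard deviation `sigma` is known.
    """
    if sigma <= 0 or margin_of_error <= 0:
        raise ValueError("sigma and margin_of_error must be positive")
    # z* is the standard normal value that leaves (1 - confidence) / 2
    # in each tail, e.g. about 1.96 for 95% confidence.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Always round up: rounding down would miss the target precision.
    return ceil((z_star * sigma / margin_of_error) ** 2)


# Hypothetical inputs: sigma = 15, desired margin of error = 2.
n_needed = sample_size_for_mean(sigma=15.0, margin_of_error=2.0)
# Halving the margin of error roughly quadruples the required sample.
n_tighter = sample_size_for_mean(sigma=15.0, margin_of_error=1.0)

print(n_needed, n_tighter)
```

The second call illustrates the cost of extra precision: because n grows with the square of 1/E, small reductions in the margin of error require much larger samples.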

In statistics, a confidence interval is an educated guess about some characteristic of the population. A confidence interval contains an initial estimate plus or minus a margin of error (the amount by which you expect your results to vary if a different sample were taken).

You use hypothesis tests to challenge whether some claim about a population is true (for example, a claim that 40 percent of Americans own a cellphone). To test a statistical hypothesis, you take a sample, collect data, form a statistic, standardize it to form a test statistic (so it can be interpreted on a standard scale), and decide whether the test statistic refutes the claim.
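Both ideas above can be sketched for the cellphone claim. The survey figures (450 owners out of 1,000 people) are invented for illustration, and the large-sample normal approximation for a proportion is an assumption; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Estimate plus or minus a margin of error for a proportion."""
    p_hat = successes / n
    z_star = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def proportion_z_test(successes, n, claimed_p, alpha=0.05):
    """Two-sided z-test of the claim that the proportion is `claimed_p`."""
    p_hat = successes / n
    # Standardize using the standard error implied by the claim.
    standard_error = sqrt(claimed_p * (1 - claimed_p) / n)
    z = (p_hat - claimed_p) / standard_error
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value, p_value < alpha


# Hypothetical survey: 450 of 1,000 respondents own a cellphone.
low, high = proportion_confidence_interval(450, 1000)
z, p_value, reject = proportion_z_test(450, 1000, claimed_p=0.40)

print(f"95% CI: ({low:.3f}, {high:.3f})")
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject claim: {reject}")
```

With these made-up numbers the interval lies entirely above 0.40 and the test rejects the claim, so the two procedures tell the same story.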

Statistical methods are mainly useful to ensure that your data are interpreted correctly. Standard deviation is the variability within a data set around the mean value.
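As a quick illustration of standard deviation as variability around the mean, here is a minimal sketch with a made-up data set. It shows both the population form (dividing by N) and the sample form (dividing by N - 1); which one applies depends on whether the data are the whole population or a sample from it.

```python
from statistics import mean, pstdev, stdev

# Hypothetical measurements.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

center = mean(scores)

# Squared distance of each value from the mean.
squared_deviations = [(x - center) ** 2 for x in scores]

# Population standard deviation: divide by N, then take the square root.
population_sd_by_hand = (sum(squared_deviations) / len(scores)) ** 0.5
population_sd = pstdev(scores)

# Sample standard deviation: divide by N - 1 instead.
sample_sd = stdev(scores)

print(center, population_sd_by_hand, population_sd, sample_sd)
```

The sample version is always a little larger than the population version for the same numbers, because it divides by a smaller count.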

Before starting the data analysis pipeline, you should know that it involves several main steps.

Step 1: Decide on Objectives. The first step of the data analysis pipeline is to decide on objectives. These objectives usually require significant data collection and analysis.

Step 2: What to Measure and How to Measure. Measurement generally refers to the assigning of numbers to indicate different values of variables.

Step 3: Data Collection. If the existing data is not sufficient, you have to collect new data. Even if you have existing data, it is very important to know how the data was collected. This helps you determine the limitations of the generalizability of results and conduct a proper analysis. The more data you have, the easier it is to find better correlations, build better models, and uncover more actionable insights. Data from more diverse sources makes this job especially easier.

Step 4: Data Cleaning. This is another crucial step in the data analysis pipeline: improving the quality of your existing data. Data scientists often correct spelling mistakes, handle missing values, and remove useless information. This is the most critical step because junk data may generate inappropriate results and mislead the business.

Step 5: Summarizing and Visualizing Data. Exploratory data analysis helps you understand the data better, because a picture is really worth a thousand words: many people understand pictures better than a lecture. Measures of variance indicate the distribution of the data around the center. Correlation refers to the degree to which two variables move in sync with one another.

Step 6: Data Modeling. Now build models that correlate the data with your business outcomes and make recommendations. This is where the unique expertise of data scientists becomes important to business success: correlating the data and building models that predict business outcomes.

Step 7: Optimize and Repeat. Data analysis is a repeatable process and sometimes leads to continuous improvements, both to the business and to the data value chain itself.

Now you know the steps involved in the data analysis pipeline. Before starting any statistical data analysis, we need to explore the data thoroughly.
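The cleaning and summarizing steps described above can be sketched on a tiny, invented data set. The records, the field names, and the spelling-correction table are all hypothetical; a real pipeline would need rules suited to its own data.

```python
from statistics import mean

# Hypothetical raw records with a missing value, stray whitespace,
# inconsistent capitalization, and a misspelled city name.
raw_records = [
    {"city": "Boston", "sales": 120.0},
    {"city": "boston ", "sales": None},
    {"city": "Chicago", "sales": 95.5},
    {"city": "Chicgo", "sales": 101.0},
    {"city": "Boston", "sales": 130.0},
]

# Assumed lookup table of known misspellings.
SPELLING_FIXES = {"chicgo": "chicago"}


def clean(records):
    """Drop rows with missing sales and normalize city names."""
    cleaned = []
    for record in records:
        if record["sales"] is None:
            continue  # handle missing values by dropping the row
        city = record["city"].strip().lower()
        city = SPELLING_FIXES.get(city, city)  # correct spelling mistakes
        cleaned.append({"city": city, "sales": record["sales"]})
    return cleaned


def summarize(records):
    """Count and mean of sales for each city."""
    by_city = {}
    for record in records:
        by_city.setdefault(record["city"], []).append(record["sales"])
    return {
        city: {"n": len(values), "mean": mean(values)}
        for city, values in by_city.items()
    }


cleaned_records = clean(raw_records)
summary_by_city = summarize(cleaned_records)
print(summary_by_city)
```

Dropping rows with missing values is only one possible policy; filling them in with an estimate is another, and the right choice depends on how the data was collected.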

We begin with a simple example. There are millions of passenger automobiles in the United States.

What is their average value? It is obviously impractical to attempt to solve this problem directly by assessing the value of every single car in the country, adding up all those numbers, and then dividing by however many numbers there are. Instead, the best we can do would be to estimate the average. One natural way to do so would be to randomly select some of the cars, ascertain the value of each of those cars, and find the average of those numbers.

The set of all those millions of vehicles is called the population of interest, and the number attached to each one, its value, is a measurement. The average value is a parameter: a number that describes a characteristic of the population, in this case monetary worth.

The set of cars selected from the population is called a sample, and the numbers, the monetary values of the cars we selected, are the sample data. The average of the data is called a statistic: a number calculated from the sample data. This example illustrates the meaning of the following definitions.

Population: all objects of interest.

Sample: the objects examined.

Measurement: a number or attribute computed for each member of a set of objects.

Sample data: the measurements from a sample.

Parameter: a number that summarizes some aspect of the population.

Statistic: a number computed from the sample data.

In reasoning this way we have drawn an inference about the population based on information obtained from the sample.
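The distinction between a parameter and a statistic can be illustrated with a small simulation. Everything below is invented for the purpose: the population size, the range of car values, the sample size of 200, and the fixed random seed are assumptions, not figures from the text.

```python
import random
from statistics import fmean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# A made-up population: the value of each of 50,000 cars, in dollars.
population = [rng.uniform(1_000, 40_000) for _ in range(50_000)]

# The parameter: the true average over the whole population.
# In practice this is the number we cannot afford to compute.
parameter = fmean(population)

# A random sample of cars, and the statistic computed from it.
sample = rng.sample(population, 200)
statistic = fmean(sample)

print(f"parameter (population mean): {parameter:,.0f}")
print(f"statistic (sample mean):     {statistic:,.0f}")
print(f"difference:                  {statistic - parameter:,.0f}")
```

Drawing a different sample would give a different statistic, while the parameter stays fixed; using the statistic to say something about the parameter is the inference described above.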

In general, statistics is a study of data: describing properties of the data, which is called descriptive statistics, and drawing conclusions about a population of interest from information extracted from a sample, which is called inferential statistics.

Statistics: collection, display, analysis, and inference from data.

Descriptive statistics: the organization, display, and description of data.

Inferential statistics: drawing conclusions about a population based on a sample.

The measurement made on each element of a sample need not be numerical. In the case of automobiles, what is noted about each car could be its color, its make, its body type, and so on. Such data are categorical or qualitative, as opposed to numerical or quantitative data such as value or age. This is a general distinction.
