Statistics involves the collection of data, which is then analyzed, organized, interpreted and presented to end users. Statistics is applied in both the sciences and the social aspects of life. In application, populations or samples are modeled for the purposes of a study. Below are basic definitions of some statistical terms.

**Organizing data**- this means arranging information into numerical tables, charts and graphs that can be used to present results about a sample or population. Organized data also helps us understand the statistical techniques behind informed decisions about our lives and wellbeing.

**What is data?**- data are the observations about a certain sample or population, where facts and other necessary information have been organized and arranged to give meaning to an end user. In statistics, data falls into two categories:

**Measurable data**- commonly referred to as quantitative data, it is mostly collected using some type of instrument. Examples include height, weight, exam scores, length, distance and speed.

**Qualitative data**- also known as categorical or frequency data, it records common properties of a population or sample, for example gender or brand type.

Statistics deals with many properties and characteristics that the researcher defines to help in collecting data. These characteristics are commonly called variables. The basic types of variables are described below.

**Independent variables**- these are characteristics of a sample or population that the researcher can manipulate. The researcher selects, measures and manipulates the variable as an antecedent condition in order to observe its effect on the behavior of the sample.

**Dependent variables**- these are beyond the researcher's direct control. They are only observed and recorded in response to the independent variable.

**Continuous variables**- these can take any value within a range, not only whole numbers. For example, they can be written as 1, 2.1, 3 or 4.05 in terms of their units.

**Discrete variables**- these take a limited set of distinct values, mostly whole numbers. For example, there can be 2 males in a sample but never 2.5 males.

**Quantitative variables**- these are based on measurable data.

**Qualitative variables**- these are based on frequency data.

After the data has been collected and categorized into variables, the information should be arranged and presented. The arrangement gives the user an overview of the variables in a sample, for example the trend of users on a website, the number of pages viewed or the number of searches. It can be done visually using graphs, and the basic types of graphs used for statistical presentation are described below.

**Bar graphs**- these are graphs in which the bars are separated by an arbitrary amount of space, and the height of each bar represents how often values within a category appear. The measurements are ordinal or nominal, with discrete values. A taller bar indicates a higher frequency of occurrence for that category. See an example above.
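As a minimal sketch of the counting behind a bar graph, the snippet below tallies how often each category occurs and prints a text-only bar for each one. The `brands` list is made-up illustration data, not taken from any study.

```python
from collections import Counter

# Hypothetical nominal data: the brand chosen by each of ten respondents.
brands = ["A", "B", "A", "C", "B", "A", "A", "C", "B", "A"]

# Count how often each category occurs; these counts are the bar heights.
frequencies = Counter(brands)

# Print a simple text bar chart, one row per category.
for brand, count in sorted(frequencies.items()):
    print(f"{brand} | {'#' * count} ({count})")
```

The categories have no numeric order here, which is why the bars of a bar graph can be drawn in any order and with gaps between them.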

**Histograms**- these are used to present interval or ratio data. In contrast to bar graphs, the horizontal axis of a histogram is divided into several intervals, each with a width defined by a lower and an upper limit. Because the variable is measured on a continuous scale, the lower limit of one interval is the upper limit of the immediately preceding one, so adjacent bars touch.
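The interval idea can be sketched as follows, assuming half-open intervals where each value is counted in the interval whose lower limit it reaches. The `scores` and `limits` values and the `histogram_counts` helper are hypothetical, chosen only for illustration.

```python
# Hypothetical exam scores measured on a continuous scale.
scores = [52, 55, 61, 64, 67, 70, 71, 73, 78, 82, 85, 91]

# Interval limits: the upper limit of one interval is the lower limit of the next.
limits = [50, 60, 70, 80, 90, 100]


def histogram_counts(values, limits):
    """Count the values that fall in each half-open interval [lower, upper)."""
    counts = [0] * (len(limits) - 1)
    for value in values:
        for i in range(len(limits) - 1):
            if limits[i] <= value < limits[i + 1]:
                counts[i] += 1
                break
    return counts


counts = histogram_counts(scores, limits)

# Print one text bar per interval.
for lower, upper, count in zip(limits, limits[1:], counts):
    print(f"[{lower}, {upper}): {'#' * count}")
```

Treating each interval as including its lower limit but not its upper limit is one common convention; it guarantees that a score sitting exactly on a shared limit is counted once.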

**Boxplots**- these present dispersion and extreme scores. They show the minimum, maximum and quartile scores in a box with whiskers. The box spans the middle 50% of a distribution, from the 25th percentile (Q1) to the 75th percentile (Q3); this distance is commonly known as the interquartile range (IQR). The lines extending from the box toward the minimum and maximum scores are called whiskers. The lower and upper fences are calculated as Q1 - 1.5 × IQR and Q3 + 1.5 × IQR, and scores beyond the fences are usually treated as extreme.
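The numbers behind a boxplot can be sketched with Python's standard `statistics` module. The `scores` list is hypothetical, and quartile conventions differ between textbooks and tools, so the quartiles computed here may differ slightly from a hand calculation.

```python
import statistics

# Hypothetical scores; 30 is deliberately extreme.
scores = [1, 3, 4, 5, 8, 9, 9, 30]

# With n=4, quantiles() returns the three quartile cut points (Q1, median, Q3).
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Fences sit 1.5 * IQR below Q1 and above Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers reach the most extreme scores that are still inside the fences.
inside = [x for x in scores if lower_fence <= x <= upper_fence]
outliers = [x for x in scores if x < lower_fence or x > upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print("Q1, median, Q3:", q1, q2, q3)
print("fences:", lower_fence, upper_fence)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```

In this sketch the whiskers stop at the last score inside each fence, and anything beyond is listed separately, which is how many plotting tools draw boxplots.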

**Scatterplots**- these present bivariate distributions in graphical form. Each observation is represented by a single point in a two-dimensional space, with one variable on each axis. The variables are measured on continuous scales. Scatterplots are very useful for examining the relationship between two given variables.

**Basic measurements that help in statistics**

Researchers calculate several standard measurements for a set of data in a study. They are discussed below.

**Measures of central tendency**- plotting the data of any frequency distribution gives a general shape that shows how close together or far apart the scores are. To locate the center of such a distribution, we can calculate the following statistics:

**The mode**- this is the most common score in a frequency distribution. For example, if the heights of buildings in a city are 100 ft, 200 ft, 300 ft, 100 ft, 400 ft and 100 ft, then the modal score is 100 ft, i.e. the most common.

**The median**- this is the score that divides the frequency distribution into two halves. The first step in calculating the median is to arrange the scores in ascending order. The position of the median is then (N+1)/2, where N is the total count of scores. If N is an odd number, the score at the center is the median. If N is an even number, the median is the average of the two scores at the center. In advanced statistics you will come across further formulas for calculating the median. As an example, imagine the scores (3, 1, 5, 4, 9, 9 and 8). The median is found at position (7+1)/2 = 4. After rearranging the distribution in ascending order as (1, 3, 4, 5, 8, 9, 9), the fourth score, 5, is the median. With an even number of scores, such as (1, 3, 4, 5, 8, 9), the position is (6+1)/2 = 3.5, which falls between the third and fourth scores, so the median is (4+5)/2 = 4.5.

**The mean**- this is the average score of a certain variable. It is calculated as the sum of all scores (∑X) divided by the total count of scores in the sample or population (N). In any frequency distribution, the sum of the deviations from the mean is always equal to zero.
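The three measures can be checked with Python's standard `statistics` module, using the building heights and score lists from the examples in the text. The variable names are illustrative only.

```python
import statistics

# Data taken from the examples in the text.
heights_ft = [100, 200, 300, 100, 400, 100]
scores_odd = [3, 1, 5, 4, 9, 9, 8]   # seven scores (odd count)
scores_even = [1, 3, 4, 5, 8, 9]     # six scores (even count)

# Mode: the most common score.
mode_height = statistics.mode(heights_ft)

# Median: median() sorts the data itself, then takes the middle score,
# or the average of the two middle scores when the count is even.
median_odd = statistics.median(scores_odd)
median_even = statistics.median(scores_even)

# Mean: the sum of the scores divided by their count.
mean_odd = statistics.mean(scores_odd)

# The deviations from the mean sum to zero (up to floating-point rounding).
deviation_sum = sum(x - mean_odd for x in scores_odd)

print("mode:", mode_height)
print("medians:", median_odd, median_even)
print("mean:", round(mean_odd, 3))
print("sum of deviations:", round(deviation_sum, 10))
```

For the seven scores the median is 5, and for the six scores it is 4.5, matching the hand calculation.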

**Spread measures**- these are also known as measures of variability. They give us information about how far individual scores deviate from the mean. To describe these deviations in statistics, we use the following measures:

**The range**- this is simple and easy to calculate, since it is the difference between the highest and lowest scores in a distribution. Because it depends on only two scores, it is easily distorted, so we cannot rely on this measure alone. Including or removing a single extreme score changes the range considerably and also pulls the mean up or down.

**Interquartile range (IQR)**- this measures the spread of the middle 50% of the scores. The IQR is defined as the difference between the third quartile and the first quartile, i.e. between the 75th and 25th percentiles. Its major advantage is that it is easy to calculate and reduces the weakness that extreme scores bring to the range. Its disadvantage is that it ignores many of the scores when measuring variability. To deal with extreme scores, researchers use boxplots to identify them and then calculate the IQR for more reliable results.
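The sketch below compares how the range and the IQR react when one extreme score is added to a small data set. The scores are illustrative, `spread` is a hypothetical helper, and the quartiles come from `statistics.quantiles`, whose default convention may differ slightly from a hand calculation.

```python
import statistics


def spread(scores):
    """Return (range, IQR) for a list of scores."""
    score_range = max(scores) - min(scores)
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return score_range, q3 - q1


# Illustrative scores, then the same scores with one extreme value added.
scores = [1, 3, 4, 5, 8, 9, 9]
scores_with_extreme = scores + [90]

range_before, iqr_before = spread(scores)
range_after, iqr_after = spread(scores_with_extreme)

print("range:", range_before, "->", range_after)
print("IQR:  ", iqr_before, "->", iqr_after)
```

A single extreme score stretches the range dramatically, while the IQR barely moves because it only looks at the middle half of the scores.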

**The variance**- this is based on the deviations of the scores from the mean, $X - \bar{X}$. As noted in the definition of the mean, these deviations always sum to zero, so we square each deviation to eliminate the negative values and then add them up to get $\sum (X - \bar{X})^2$. To account for the number of scores, we divide this sum by $N$ for a population or by $N - 1$ for a sample, giving $\sum (X - \bar{X})^2 / N$ or $\sum (X - \bar{X})^2 / (N - 1)$ respectively.

**Standard deviation**- this is the square root of the variance, so it is calculated as $\sqrt{\sum (X - \bar{X})^2 / N}$ for a population, with $N - 1$ in place of $N$ for a sample. It acts as a kind of average of the deviations from the mean, expressed in the same units as the data. In normal distributions, the majority of scores (around two thirds) fall between -1 and +1 standard deviations from the mean. As a rule of thumb, the standard deviation is approximately 1/4 of the range in small samples (N < 30) and 1/5 to 1/6 of the range in larger samples.
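As a worked sketch, the code below computes the variance step by step from squared deviations and compares the results with Python's standard `statistics` functions. The `scores` list is made up for illustration.

```python
import math
import statistics

# Illustrative scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)
mean = sum(scores) / n

# Squaring each deviation removes the negative signs before summing.
squared_deviations = [(x - mean) ** 2 for x in scores]
sum_of_squares = sum(squared_deviations)

# Divide by N for a population, by N - 1 for a sample.
population_variance = sum_of_squares / n
sample_variance = sum_of_squares / (n - 1)

# The standard deviation is the square root of the variance.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print("population variance and SD:", population_variance, population_sd)
print("sample variance and SD:", round(sample_variance, 3), round(sample_sd, 3))

# The standard library gives the same values.
print(statistics.pvariance(scores), statistics.variance(scores))
```

Dividing by N - 1 always gives a slightly larger value than dividing by N, and the difference shrinks as the sample grows.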

There are other statistical measures used to describe frequency distributions, but since this is a basic overview we shall only touch on them lightly:

**Symmetry**- symmetric distributions are those which, when plotted, have the same shape on both sides of the center. A symmetric, single-peaked, bell-shaped distribution is commonly called a normal distribution.

**Skewed distributions**- these are asymmetric distributions in which the two sides of the center are not equal. The types of skewness are explained below.

**Positive skewness**- the tail extends out to the right. The mean is then greater than the median, because the mean is sensitive to extreme values. In small samples, it could indicate the existence of extreme scores.

**Negative skewness**- the tail extends out to the left of the center, and the mean is then less than the median.
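A rough way to see the direction of skew is to compare the mean with the median, as sketched below. This is only a heuristic and not a formal skewness coefficient: equal mean and median do not by themselves prove symmetry. The `skew_direction` helper and the data sets are hypothetical.

```python
import statistics


def skew_direction(scores):
    """Rough heuristic: compare the mean with the median."""
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    if mean > median:
        return "positive"   # tail extends to the right
    if mean < median:
        return "negative"   # tail extends to the left
    return "no skew detected"


# Illustrative data sets.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]
left_tailed = [1, 17, 18, 18, 19, 19, 19, 20]
balanced = [1, 2, 3, 4, 5]

for name, data in [("right_tailed", right_tailed),
                   ("left_tailed", left_tailed),
                   ("balanced", balanced)]:
    print(name, statistics.mean(data), statistics.median(data), skew_direction(data))
```

In each skewed example a single extreme score pulls the mean toward the tail while the median stays near the bulk of the scores.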

