
Probability and Random Processes

Descriptive Statistics

Statistics

Statistics is the summarization of a collected set of data that exhibits random variation; it is the practice of extracting meaning from data.

Inferential Statistics

Inferential Statistics make inferences about a situation based on data, such as forecasting. Descriptive statistics can be the basis for inferences.

Representative Values

  • Mean - The sum of all numbers in a list divided by the number of items in the list
  • Median - The middle value in a list ordered from smallest to largest
  • Mode - The most frequently occurring value in a list
  • Range - [Min, Max]
  • Variance - Average of the squared deviations from the mean
  • Standard Deviation - Square root of the variance; a measure of typical deviation from the mean
  • Skewness - Measure of the shape of the distribution function
  • Quantiles - Generalization of the median to percentiles (see the computational sketch after this list)
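These representative values can be computed directly; a minimal sketch using Python's standard statistics module, with a made-up data list purely for illustration:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample data

print("mean:", statistics.mean(data))              # sum of values / number of values
print("median:", statistics.median(data))          # middle value of the sorted list
print("mode:", statistics.mode(data))              # most frequently occurring value
print("range:", (min(data), max(data)))            # [min, max]
print("variance:", statistics.pvariance(data))     # average squared deviation from the mean
print("std dev:", statistics.pstdev(data))         # square root of the variance
print("quartiles:", statistics.quantiles(data, n=4))  # quantiles generalize the median
```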

Observational vs. Experimental Data

Experimental Data involves manipulating objects to determine cause and effect in data. Observational Data refers to data collected from naturally occurring events without intervention.

Basic Probability

Probability Calculus

The probability of any event lies between zero and one.

An event that is sure to happen has probability one: $\text{Pr}[S] = 1$.

The probability of an event can be defined by how often the event is observed relative to the number of repetitions of the experiment.

Counting the probability of heads in a set of coin tosses: $\text{Pr}[\text{heads}] \approx \dfrac{\text{number of heads observed}}{\text{number of tosses}}$

The larger the number of repetitions, the more accurately we can predict the likelihood of an event happening.
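As a sketch of this idea, the probability of heads can be estimated from the relative frequency over many simulated tosses (the repetition counts and seed below are arbitrary choices for the example):

```python
import random

def estimate_heads_probability(num_tosses, seed=0):
    """Estimate Pr[heads] as (number of heads observed) / (number of tosses)."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses

# More repetitions -> the relative frequency settles closer to the true value 0.5.
for n in (10, 1_000, 100_000):
    print(n, estimate_heads_probability(n))
```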

Probability Model

Events

Events are elements in the set of possible outcomes in an experiment.

Sample Space

The set of all possible outcomes for an experiment.

The sample space for a dice roll: $S = \{1, 2, 3, 4, 5, 6\}$

An event $A$ is a subset containing outcomes of the sample space: $A \subseteq S$.

The complement of a subset $A$, written $A^c$, is the subset of all other outcomes in the sample space which are not contained in $A$.

Event Algebra

Or

The combination of two or more sets.

For events $A$ and $B$, the union is written $A \cup B$.

And

The set of outcomes which occur in both (or all) of two or more sets.

For events $A$ and $B$, the intersection is written $A \cap B$.

  1. Mutual exclusion: $A \cap B = \emptyset$
  2. Inclusion: if $A \subseteq B$, then $A \cup B = B$ and $A \cap B = A$
  3. Double complement: $(A^c)^c = A$
  4. Commutation: $A \cup B = B \cup A$ and $A \cap B = B \cap A$
  5. Associativity: $(A \cup B) \cup C = A \cup (B \cup C)$ and $(A \cap B) \cap C = A \cap (B \cap C)$
  6. Distributivity: $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$ and $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$
  7. DeMorgan’s Law: $(A \cup B)^c = A^c \cap B^c$ and $(A \cap B)^c = A^c \cup B^c$

Probability of Events

  1. For any event $A$: $0 \leq \text{Pr}[A] \leq 1$
  2. For the full set, $\text{Pr}[S] = 1$
  3. If $A$ and $B$ are Mutually Exclusive then $\text{Pr}[A \cup B] = \text{Pr}[A] + \text{Pr}[B]$
  4. Axiom 3 can be extended: for mutually exclusive events $A_1, A_2, \ldots, A_n$, $\text{Pr}[A_1 \cup A_2 \cup \cdots \cup A_n] = \sum_{i=1}^{n} \text{Pr}[A_i]$

Mutual Exclusivity refers to the fact that $A$ and $B$ will never occur simultaneously, i.e. $A \cap B = \emptyset$.

Non-Mutually Exclusive

In cases where $A$ and $B$ can occur together, Axiom 3 will not apply directly.

This is due to the fact that, for overlapping events, the same region of probability would otherwise be counted twice, even though it carries no extra statistical weight.

Non-mutually exclusive OR: $\text{Pr}[A \cup B] = \text{Pr}[A] + \text{Pr}[B] - \text{Pr}[A \cap B]$

Complement of an Event

From expanding on these axioms, it can be seen that the probability of the complement of an event is found by subtracting the event's probability from that of the whole sample space. If the chance of the event happening is known, the chance of the event not happening is found by subtracting this from absolute certainty.

The complement of a set versus the whole: $\text{Pr}[A^c] = 1 - \text{Pr}[A]$

Statistical Independence

Two events $A$ and $B$ are said to be statistically independent if:

Statistical independence: $\text{Pr}[A \cap B] = \text{Pr}[A]\,\text{Pr}[B]$

This refers to the fact that the outcome occurs independently of the preceding events. If you flip a coin, the probability of the next flip is unchanged. If you take items from a bin without replacing them, the probability of each remaining item being picked next goes up, and consecutive draws are therefore dependent.

Repeated Independent Trials

From the rule of statistical independence, we can process repeated trials.

Coin Flips

Knowing that the probability of either outcome is 0.5, we can take the list of possible outcomes and calculate the probability of any particular sequence of $n$ flips as $0.5^n$.
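A small sketch of this enumeration for three flips (the choice of three flips and the event "exactly two heads" are just for illustration):

```python
from itertools import product

n = 3  # number of independent coin flips
outcomes = list(product("HT", repeat=n))  # all 2**n equally likely sequences

# Each particular sequence has probability 0.5**n because the flips are independent.
p_sequence = 0.5 ** n
print(len(outcomes), "outcomes, each with probability", p_sequence)

# Pr[exactly 2 heads] = (number of matching sequences) * 0.5**n
matching = [o for o in outcomes if o.count("H") == 2]
print("Pr[exactly 2 heads] =", len(matching) * p_sequence)  # 3/8
```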

Sampling with Replacement

In this case, an event occurring does not subtract from a finite pool of outcomes, e.g. a coin flip. In the case of heads or tails, there isn't one less heads or tails available after a flip, so it is as if we replace the outcome in our sample space.

Sampling without Replacement

Here there is a finite pool of outcomes that is depleted as events occur. If this event happens, it won't happen again, as if we have taken a card from a deck of cards and not placed it back in the deck.

Order of Outcomes When Sampling Without Replacement

These are cases where it's important in what order events occur, e.g. did we draw the Ace of Spades within the first 3 draws? The number of each kind of event is not what matters here.

K-Tuples

When a trial is repeated $k$ times, we form a sample space of outcomes made up of $k$-tuples of events.

The Rule of Product

How many possibilities are there for the formation of $k$-tuples, if there are $n_i$ choices for the $i$th element?

The Rule of Product: the number of possible $k$-tuples is $n_1 \times n_2 \times \cdots \times n_k$

By this rule, the number of possibilities when rolling a die, then flipping a coin, will be $6 \times 2 = 12$. In the case where the same number of outcomes $n$ is possible with each experiment, the number of $k$-tuples is $n^k$, or $2^k$ in the case of repeated coin tosses.

Permutations of Unordered Outcomes

We no longer care in what order outcomes occur; we are only concerned with the number of outcomes of a certain sort across all trials.

This involves the number of ways we can choose $k$ objects from $n$ choices.

Permutations Without Replacement
  • Experiments with two or more possible outcomes
  • These trials can be repeated $k$ times
  • For each $i$th trial, the outcome from the previous trial is removed from the pool
  • Probabilities change for each consecutive trial

The resulting set is ordered, but as mentioned before, we only care about the number of possible permutations from these elements.

The number of possible ordered sets is $n \times (n - 1) \times \cdots \times 1$, or $n!$

Example - The 13 cards of a suit in a deck of cards can be laid out in $13!$, or $6{,}227{,}020{,}800$, different ordered sequences.

If you want just $k$ draws from $n$ possible ways to draw the object, the number of sets will instead be the product from $n$ down to $n - k + 1$, i.e. $\frac{n!}{(n - k)!}$

Consider this problem - Lisa has 13 different ornaments and wants to put 4 ornaments on her mantle. In how many ways is this possible?

Using the product rule, Lisa has 13 choices for which ornament to put in the first position, 12 for the second position, 11 for the third position, and 10 for the fourth position. So the total number of choices she has is $13 \times 12 \times 11 \times 10 = 17{,}160$. Using the factorial notation, the total number of choices is $\frac{13!}{9!} = 17{,}160$.

From this example, we can see that if we have $n$ objects and want to arrange $k$ of them in a row, there are $\frac{n!}{(n - k)!}$ ways to do this.

The notation for permutations is ${}_{n}P_{k} = \frac{n!}{(n - k)!}$
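A quick numeric check of these counts, assuming Python 3.8+ for math.perm:

```python
import math

# Ordered arrangements of k objects from n without replacement: n! / (n - k)!
print(math.perm(13, 13))  # all 13 cards of a suit in order: 13! = 6,227,020,800
print(math.perm(13, 4))   # Lisa's 4 mantle positions from 13 ornaments: 13*12*11*10 = 17,160
print(math.factorial(13) // math.factorial(13 - 4))  # same value from the factorial formula
```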

Combinations of Non-Unique Outcomes

A combination is a way of choosing elements from a set in which order does not matter.

Consider the following example: Lisa has 13 different ornaments and she wants to give 3 ornaments to her mom as a birthday gift (the order of the gifts does not matter). How many ways can she do this?

We can think of Lisa giving her mom a first ornament, a second ornament, and a third ornament. This can be done in $13 \times 12 \times 11 = 1716$ ways. However, Lisa's mom is receiving all three ornaments at once, so the order Lisa decides on the ornaments does not matter.

There are $3! = 6$ reorderings of the chosen ornaments, implying the total number of ways for Lisa to give her mom an unordered set of 3 ornaments is $\frac{1716}{6} = 286$.

Rule of Combinations or Unordered Permutations

The notation for combinations is ${}_{n}C_{k} = \binom{n}{k} = \frac{n!}{k!\,(n - k)!}$
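And the corresponding check for combinations (again assuming Python 3.8+ for math.comb):

```python
import math

# Unordered selections of k objects from n: n! / (k! * (n - k)!)
print(math.comb(13, 3))                        # Lisa's 3-ornament gift: 286 ways
print(math.perm(13, 3) // math.factorial(3))   # ordered count divided by the 3! reorderings
```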

Conditional Probability

A conditional probability is the probability that a certain event will occur given some knowledge about the outcome of some other event. $\text{Pr}[A \mid B]$ is a conditional probability; it is read as "the probability of A given B".

Rule of Conditional Probability: $\text{Pr}[A \mid B] = \dfrac{\text{Pr}[A \cap B]}{\text{Pr}[B]}$

A simple example - A fair 12-sided die is rolled. What is the probability that the roll is a 3 given that the roll is odd?

This is $\text{Pr}[3 \mid \text{odd}] = \dfrac{\text{Pr}[3 \cap \text{odd}]}{\text{Pr}[\text{odd}]} = \dfrac{1/12}{6/12} = \dfrac{1}{6}$

Because $B$ has already happened, we restrict attention to outcomes inside $B$: dividing the probability of the intersection $A \cap B$ by $\text{Pr}[B]$ renormalizes it to this reduced sample space.
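The 12-sided-die example can be checked by enumerating the equally likely outcomes; a minimal sketch:

```python
from fractions import Fraction

sample_space = set(range(1, 13))               # fair 12-sided die
odd = {s for s in sample_space if s % 2 == 1}
three = {3}

def pr(event):
    """Probability of an event as (favorable outcomes) / (total outcomes)."""
    return Fraction(len(event), len(sample_space))

# Pr[3 | odd] = Pr[3 and odd] / Pr[odd]
print(pr(three & odd) / pr(odd))  # (1/12) / (6/12) = 1/6
```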

Conditional Probability if statistically independent: $\text{Pr}[A \mid B] = \text{Pr}[A]$

Bayes Theorem

When attempting to compute the conditional probability of two events when only the reverse conditional is known, Bayes' theorem allows for a workaround.

Consider $H$ to be the Hypothesis, and $E$ to be the Evidence.

Bayes Theorem: $\text{Pr}[H \mid E] = \dfrac{\text{Pr}[E \mid H]\,\text{Pr}[H]}{\text{Pr}[E]}$

We can expand the numerator to demonstrate fully: $\text{Pr}[E \mid H]\,\text{Pr}[H] = \text{Pr}[E \cap H]$, so $\text{Pr}[H \mid E] = \dfrac{\text{Pr}[E \cap H]}{\text{Pr}[E]}$.

Therefore, $\text{Pr}[H \mid E]$ can be found from $\text{Pr}[E \mid H]$ and vice versa.

Total Probability

If $B_1, B_2, \ldots, B_n$ form a partition of the sample space, then for each event $A$:

Total Probability: $\text{Pr}[A] = \sum_{i=1}^{n} \text{Pr}[A \cap B_i] = \sum_{i=1}^{n} \text{Pr}[A \mid B_i]\,\text{Pr}[B_i]$

Knowing this, $\text{Pr}[A]$ can be found from the conditional probabilities $\text{Pr}[A \mid B_i]$ and the partition probabilities $\text{Pr}[B_i]$.

Bayes General Rule

Bayes' general rule uses Total Probability in the denominator, when $B_1, B_2, \ldots, B_n$ form a partition: $\text{Pr}[B_j \mid A] = \dfrac{\text{Pr}[A \mid B_j]\,\text{Pr}[B_j]}{\sum_{i=1}^{n} \text{Pr}[A \mid B_i]\,\text{Pr}[B_i]}$
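A numeric sketch of Bayes' theorem combined with total probability; all the probabilities below are made-up values chosen only to illustrate the calculation:

```python
# Partition of the sample space: H (hypothesis) and its complement H^c.
p_h = 0.01              # prior Pr[H]          (assumed value)
p_e_given_h = 0.95      # likelihood Pr[E | H] (assumed value)
p_e_given_not_h = 0.05  # Pr[E | H^c]          (assumed value)

# Total probability: Pr[E] = Pr[E|H] Pr[H] + Pr[E|H^c] Pr[H^c]
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Bayes: Pr[H | E] = Pr[E | H] Pr[H] / Pr[E]
p_h_given_e = p_e_given_h * p_h / p_e
print(round(p_h_given_e, 4))  # about 0.161: the evidence raises the 0.01 prior considerably
```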

Random Variables

Random variables deal with a function $X$ which maps each outcome in the sample space $S$ to a number. The number can be placed on the real number line, and a probability assigned to it based on its random occurrence.

Discrete Probability Distributions

Discrete random variables involve events with a discrete set of values.

Probability Mass Function

The Probability Mass Function, or PMF, is a plot of the probability of all events associated with a random variable $X$. The sum of all amplitudes of the graph will be 1.

For a given value $x_k$, the probability of this value is $p_X(x_k) = \text{Pr}[X = x_k]$.

Consider the ranking of weights of each possible outcome. If an outcome is more probable, it is heavier, and plotted above the others.


Probability Mass Function

Bernoulli Random Variable

A Bernoulli RV is a discrete variable which will only produce values of 0 and 1. Therefore, the likelihood of one will be $p$ and the other will be $1 - p$.

Binomial Random Variable

The Bernoulli concept can be extended with combinatorics, for example in the case of binary transmission errors. For the probability of detecting $k$ error bits in an $n$-bit transmission, $\text{Pr}[X = k] = \binom{n}{k} p^k (1 - p)^{n - k}$.

As shown in earlier sections, there are $\binom{n}{k}$ possible arrangements of $k$ error bits in an $n$-bit long transmission. If our likelihood of a non-error bit is $1 - p$, and of an error bit is $p$, the above will be intuitively correct.
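A short sketch of the binomial PMF built from math.comb; the transmission length and per-bit error probability below are made-up example values:

```python
import math

def binomial_pmf(k, n, p):
    """Pr[X = k] = C(n, k) * p**k * (1 - p)**(n - k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 8, 0.1  # 8-bit transmission, 10% per-bit error probability (assumed values)
for k in range(n + 1):
    print(k, round(binomial_pmf(k, n, p), 4))

print(sum(binomial_pmf(k, n, p) for k in range(n + 1)))  # the PMF sums to 1
```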

Geometric Random Variable

Geometric RVs concern a wait for an event to happen. Should the expected event have probability $p$, there will be some number $k$ of consecutive non-occurrences before the event occurs. Therefore, as seen in the graph below, the event occurring at the second transmission will be $(1 - p)\,p$.


Geometric Random Variable

Geometric Random Variable PMF ($0 \leq k$): $\text{Pr}[X = k] = (1 - p)^{k}\,p$

Poisson Random Variable

For a situation where events occur randomly at a given rate $\lambda$ over a certain time interval $t$, the probability of $k$ events happening within this time frame has been experimentally verified to follow the Poisson distribution, with $\lambda t$ representing the average number of events.

Poisson Random Variable PMF: $\text{Pr}[X = k] = \dfrac{(\lambda t)^k}{k!}\,e^{-\lambda t}$

Note that for finding the probability of no event occurring before time $t$ (i.e. the first event occurs after time $t$), the probability becomes $\text{Pr}[X = 0] = e^{-\lambda t}$.
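A sketch of the Poisson PMF; the average rate below is an arbitrary example value:

```python
import math

def poisson_pmf(k, lam):
    """Pr[X = k] = (lam**k / k!) * exp(-lam), with lam the average number of arrivals."""
    return lam**k / math.factorial(k) * math.exp(-lam)

lam = 3.0  # assumed average number of events in the interval
print([round(poisson_pmf(k, lam), 4) for k in range(8)])
print(poisson_pmf(0, lam))                           # Pr[no event in the interval] = e**-lam
print(sum(poisson_pmf(k, lam) for k in range(100)))  # sums (effectively) to 1
```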

Uniform Random Variable

When all events are equally likely, the probability of each can be found easily from the uniform random variable PMF.

Uniform Distribution: $\text{Pr}[X = x_k] = \dfrac{1}{n}$ for each of the $n$ equally likely values


Continuous RVs and Their Distributions

For quantities which can take on a continuum of values, such as voltage, velocity, and mass, new tools are used to analyze their probability. The probability of these events is determined using the Cumulative Distribution Function or CDF, which is written as $F_X(x) = \text{Pr}[X \leq x]$.

By this notation we can see that, following the graph from left to right, the probability of the event occurring at or to the left of a value $x$ is given by the amplitude of the CDF at that value. Therefore, as $x \to \infty$, $F_X(x) \to 1$.


When finding the probability of a value occurring between points $a$ and $b$, their CDF values can be used. Remember this is Distribution, not Density, as we'll see in the PDF below. By keeping this straight, their purposes should be easy to remember.

CDF Probability Within a Range ($b > a$): $\text{Pr}[a < X \leq b] = F_X(b) - F_X(a)$

The Probability Density Function or PDF is the derivative of the CDF, $f_X(x) = \dfrac{dF_X(x)}{dx}$, and can also be used to find this probability:


$\text{Pr}[a < X \leq b] = \int_{a}^{b} f_X(x)\,dx$

CDF Probability Within a Range ($b > a$) From Integration

This can be seen to be similar to the Probability Mass Function, as it will integrate over its full range to 1. The difference is due to continuous distributions being non-discrete: we can no longer say a point has mass, but regions of higher probability will now be denser (the equivalent of heavier).

To use the PDF to find the probability of a value falling near a number $x$, we can multiply the PDF value at this point by a small increment $\Delta x$ to find the probability.

PDF Probability Within a Small Range: $\text{Pr}[x < X \leq x + \Delta x] \approx f_X(x)\,\Delta x$

Integrating over a range $(a, b]$ will also produce $\text{Pr}[a < X \leq b]$ from the PDF.
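A numeric check that the CDF difference and the PDF integral agree, using the simple assumed density $f_X(x) = 2x$ on $[0, 1]$ (so $F_X(x) = x^2$):

```python
def pdf(x):
    """Example PDF: f_X(x) = 2x on [0, 1]."""
    return 2 * x

def cdf(x):
    """Its CDF: F_X(x) = x**2 on [0, 1]."""
    return x * x

def integrate(f, a, b, steps=100_000):
    """Midpoint Riemann-sum approximation of the integral of f from a to b."""
    dx = (b - a) / steps
    return sum(f(a + (i + 0.5) * dx) for i in range(steps)) * dx

a, b = 0.2, 0.7
print(cdf(b) - cdf(a))        # 0.45 exactly, from F_X(b) - F_X(a)
print(integrate(pdf, a, b))   # ~0.45, from integrating the PDF over (a, b]
```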

Common Continuous RVs

Exponential Random Variable

An extension of the Geometric Random Variable to the continuous realm, this represents a continuous distribution of wait times, where $\lambda$ again represents the rate of arrival for an event as in Poisson RVs.

Its PDF follows $f_X(x) = \lambda e^{-\lambda x}$ for $x \geq 0$.

Gaussian RV

The Gaussian or "normal" random variable arises naturally in numerous cases. It can be defined by its mean, $\mu$, which will be the center of its bell shape, and its standard deviation, $\sigma$, which denotes the distance in each direction from the mean at which the curve falls to $e^{-1/2} \approx 0.607$ of its peak value.

The Gaussian PDF: $f_X(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$


The square of the standard deviation, $\sigma^2$, is known as the variance, and is a measure of the total width of the bell between these points.

The standard form of the PDF, centered at $\mu = 0$ with a $\sigma = 1$, can be used to express $F_X(x)$:

The Standard Gaussian Function: $\Phi(x) = \int_{-\infty}^{x} \dfrac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt$, so that $F_X(x) = \Phi\!\left(\dfrac{x - \mu}{\sigma}\right)$

And as with any other CDF, $\text{Pr}[a < X \leq b] = \Phi\!\left(\dfrac{b - \mu}{\sigma}\right) - \Phi\!\left(\dfrac{a - \mu}{\sigma}\right)$

These values of the standard Gaussian CDF, as with other standard Gaussian values, can be found by table.

Gaussian Q Values

In cases where the probability of values at either tail of a Gaussian is required, such as $\text{Pr}[X > x]$, it is common to use $Q$ functions.

The Gaussian Q Function: $Q(x) = 1 - \Phi(x) = \int_{x}^{\infty} \dfrac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt$

When a value is sought which is less than the mean, the argument of the $Q$ function will be negative, and it defines the left tail of the CDF: $Q(-x) = 1 - Q(x)$.
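The $Q$ function can be evaluated through the complementary error function in Python's math module; a minimal sketch:

```python
import math

def Q(x):
    """Right-tail probability Pr[X > x] for a standard Gaussian X."""
    return 0.5 * math.erfc(x / math.sqrt(2))

print(Q(0.0))             # 0.5: half the area lies above the mean
print(Q(1.0))             # ~0.1587: one standard deviation above the mean
print(Q(-1.0))            # ~0.8413: negative argument gives the left-complement
print(Q(1.0) + Q(-1.0))   # 1.0, since Q(-x) = 1 - Q(x)
```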


The Gaussian Q Function

The Gaussian Q Function with Negative Argument


Expectation

Expectation can be considered the average of the expected values in a sample space, where the values are weighted by their probability and summed.

Expectation of a Discrete RV: $E[X] = \sum_{k} x_k\,\text{Pr}[X = x_k]$

For continuous random variables, when the PDF exists, the expectation can be calculated whenever the integral converges absolutely:

Expectation of a Continuous RV: $E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx$

An important feature of the expectation is its invariance.

Moments

The $n$th moment is produced when the random variable is raised to the $n$th power inside the expectation, $E[X^n]$, for $n = 1, 2, 3, \ldots$

Continuous Moment: $E[X^n] = \int_{-\infty}^{\infty} x^n\,f_X(x)\,dx$

Discrete Moment: $E[X^n] = \sum_{k} x_k^n\,\text{Pr}[X = x_k]$

The first moment is the mean, and each further value is called the $n$th moment of the distribution.

Central Moments

A Central Moment is a moment taken about the mean of the random variable rather than about 0: $E[(X - \mu)^n]$. Of particular importance is the variance, the square root of which gives the width of the distribution.

The 2nd Central Moment: $\text{Var}[X] = \sigma^2 = E[(X - \mu)^2]$

where $\mu = E[X]$ is the mean of $X$.

The Standard Deviation is represented as $\sigma = \sqrt{\text{Var}[X]}$, and is a measure of the width.

Also note, $\text{Var}[X] = E[X^2] - (E[X])^2$; this method is often much easier to solve than the initial method of finding variance.
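Both routes to the variance can be checked on a simple discrete PMF; the fair six-sided die below is just an example:

```python
values = [1, 2, 3, 4, 5, 6]   # fair die
pmf = [1 / 6] * 6

mean = sum(x * p for x, p in zip(values, pmf))              # E[X]
second_moment = sum(x**2 * p for x, p in zip(values, pmf))  # E[X^2]

var_definition = sum((x - mean)**2 * p for x, p in zip(values, pmf))  # E[(X - mean)^2]
var_shortcut = second_moment - mean**2                                # E[X^2] - (E[X])^2

print(mean, var_definition, var_shortcut)  # 3.5, ~2.9167, ~2.9167: both routes agree
```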

Entropy

From the topic of information encoding, it was found that the definition of information for an event $A$ is $I(A) = -\log_2 \text{Pr}[A]$.

And the average of information for two exclusive events $A$ and $B$ is $\text{Pr}[A]\,I(A) + \text{Pr}[B]\,I(B)$.

The definition of average information is in fact an expectation for two events that form a partition of a sample space.

For a partition where $\text{Pr}[B] = 1 - \text{Pr}[A]$,

$H = -\text{Pr}[A]\log_2 \text{Pr}[A] - \text{Pr}[B]\log_2 \text{Pr}[B]$

or, with $p = \text{Pr}[A]$,

$H = -p\log_2 p - (1 - p)\log_2(1 - p)$
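A short sketch of this average-information calculation for a two-event partition:

```python
import math

def entropy(probabilities):
    """Average information H = -sum(p * log2(p)) over a partition of the sample space."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # 1 bit: a fair coin is maximally uncertain
print(entropy([0.9, 0.1]))  # ~0.469 bits: a lopsided partition carries less average information
```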

Multiple Random Variables

Discrete Random Variables

The Joint PMF

Joint probability distribution can be thought of as $\text{Pr}[X = x \text{ and } Y = y]$

Or as $\text{Pr}[A \cap B]$ where $A = \{X = x\}$ and $B = \{Y = y\}$

A relationship between two random variables can be deterministic, such as $Y = g(X)$ for a fixed function $g$, or probabilistic, where probabilities of one value will affect the other.

Joint Probabilities refers to the probability of two variables taken together, such as $\text{Pr}[X \leq x \text{ and } Y \leq y]$. To answer this, the Joint CDF, $F_{XY}(x, y)$, must be found.

Recall from the single random variable discussion that the PDF is the derivative of the CDF. Therefore, for cases where there are multiple variables: $f_{XY}(x, y) = \dfrac{\partial^2 F_{XY}(x, y)}{\partial x\,\partial y}$

For continuous distributions, the variables $x$ and $y$ will be used; however, discrete distributions use indexed values such as $x_i$, $y_j$, and so on.

From each joint distribution, individual distributions for each variable, or Marginal Distributions, can be found. These are simply the PMFs of the individual random variables found in earlier sections.

Marginal PMFs: $p_X(x_i) = \sum_{j} p_{XY}(x_i, y_j)$

$p_Y(y_j) = \sum_{i} p_{XY}(x_i, y_j)$

Independent Random Variables

From earlier in the course, events were defined as independent if $\text{Pr}[A \cap B] = \text{Pr}[A]\,\text{Pr}[B]$. Similarly, random variables can be said to be independent if the product of the marginal distributions is equal to the joint distribution: $p_{XY}(x, y) = p_X(x)\,p_Y(y)$.

Notice that this is a special condition for random variables and does not apply in general! In particular, if two random variables are not independent, there is no way that the joint PMF can be inferred from the marginals. In that case the marginals are insufficient to describe any joint properties between $X$ and $Y$.
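A small sketch with a made-up joint PMF for two binary random variables, computing the marginals and checking independence:

```python
# Made-up joint PMF for X in {0, 1} and Y in {0, 1}, stored as {(x, y): Pr[X=x, Y=y]}.
joint = {(0, 0): 0.3, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.2}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginal distributions: sum the joint PMF over the other variable.
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}
print(p_x, p_y)

# Independence check: Pr[X=x, Y=y] == Pr[X=x] * Pr[Y=y] for every pair.
independent = all(abs(joint[(x, y)] - p_x[x] * p_y[y]) < 1e-12 for x in xs for y in ys)
print(independent)  # True for this particular joint PMF
```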

Continuous Random Variables

Joint Distributions


CDF: $F_{XY}(x, y) = \text{Pr}[X \leq x,\ Y \leq y]$

PDF: $f_{XY}(x, y) = \dfrac{\partial^2 F_{XY}(x, y)}{\partial x\,\partial y}$

Joint continuity requires that the PDF integrate to 1 over the whole plane: $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\,dy = 1$.

Marginal PDFs

Similar to the marginal PMF, each variable $X$ and $Y$ can be described by its own PDF:

$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx$

Correlation

Correlation is defined as a measure of the similarity between two random variables $X$ and $Y$: $E[XY]$. Note that $XY$ is the multiplication of $X$ and $Y$, not an "and".

The correlation is calculated using the expectation, worked out as the expectation of $g(X, Y) = XY$ based on the joint PDF of $X$ and $Y$: $E[XY] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\,y\,f_{XY}(x, y)\,dx\,dy$

The correlation can be misleading if both $X$ and $Y$ have offsets built into their means. To get around this, the covariance is defined to remove their means.

Where $\mu_X$ and $\mu_Y$ are the means of the two random variables taken separately: $\text{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)]$

Note - Covariance is similar to variance, in that variance is a measure of how the outcome of $X$ can vary, while covariance is a measure of how the outcomes of $X$ and $Y$ can vary together.

Also similarly to variance: $\text{Cov}[X, Y] = E[XY] - \mu_X\,\mu_Y$

Correlation Coefficient

If comparing the correlation of one pair of random variables to the correlation of another pair of random variables, both can be normalized based on their standard deviations: $\rho_{XY} = \dfrac{\text{Cov}[X, Y]}{\sigma_X\,\sigma_Y}$

Where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$.
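A sample-based sketch of covariance and the correlation coefficient; the data lists are made up, with y roughly proportional to x so the coefficient comes out near +1:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]   # roughly 2*x

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
std_x = (sum((a - mean_x) ** 2 for a in x) / n) ** 0.5
std_y = (sum((b - mean_y) ** 2 for b in y) / n) ** 0.5

rho = cov_xy / (std_x * std_y)   # normalized to lie in [-1, 1]
print(cov_xy, rho)
```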

Invariance of Expectation

If a random variable $X$ with PDF $f_X(x)$ is transformed to another random variable $Y$ by a deterministic relationship, $Y = g(X)$, e.g. a scaling or a squaring of $X$,

Then moments for $Y$ can be obtained from the PDF of $X$, without calculating the new PDF $f_Y(y)$: $E[Y^n] = E[g(X)^n] = \int_{-\infty}^{\infty} g(x)^n f_X(x)\,dx$

Sum of Multiple RVs

Because the expectation is an integral, for the case that $Z = X + Y$ for RVs $X$ and $Y$: $E[Z] = E[X] + E[Y]$

Therefore, because the square inside the variance will distribute via multiplication: $\text{Var}[Z] = \text{Var}[X] + \text{Var}[Y] + 2\,\text{Cov}[X, Y]$

$\text{Var}[Z]$ therefore also depends on $\text{Cov}[X, Y]$. If $X$ and $Y$ are independent, $\text{Cov}[X, Y] = 0$, and $\text{Var}[Z] = \text{Var}[X] + \text{Var}[Y]$.

PDF for $Z = X + Y$

Begin by finding the CDF in a method similar to discrete RVs: $F_Z(z) = \text{Pr}[X + Y \leq z] = \int_{-\infty}^{\infty} \int_{-\infty}^{z - x} f_{XY}(x, y)\,dy\,dx$

Note the limit of integration for the $y$ portion uses the upper limit $z - x$ as the equivalent of $y \leq z - x$ (because $x + y \leq z$).

The PDF can be obtained from this because $f_Z(z) = \dfrac{dF_Z(z)}{dz}$

Because $\dfrac{d}{dz} \int_{-\infty}^{z - x} f_{XY}(x, y)\,dy = f_{XY}(x, z - x)$, we get $f_Z(z) = \int_{-\infty}^{\infty} f_{XY}(x, z - x)\,dx$

If the two RVs are independent, $f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z - x)\,dx$, i.e. the convolution of the two PDFs.

Notes on Sums of Independent RVs

  • When $X$ and $Y$ are independent, $\text{Cov}[X, Y] = 0$ and $\text{Var}[Z] = \text{Var}[X] + \text{Var}[Y]$, while $f_Z = f_X * f_Y$ (a convolution)
  • If $X$ and $Y$ have the same distribution, the convolution is of that common PDF with itself

Sums of Dependent RVs

In the case of $Z = X + Y$, if $X$ and $Y$ are not independent, then $f_Z(z) = \int_{-\infty}^{\infty} f_{XY}(x, z - x)\,dx$ and $\text{Var}[Z] = \text{Var}[X] + \text{Var}[Y] + 2\,\text{Cov}[X, Y]$.

Bivariate Gaussian

Two RVs that are jointly Gaussian will have a joint Gaussian characteristic, called bivariate (or multivariate for more than 2 variables).


Limit Theorems

The Central Limit Theorem states that the sum of a large number of independent, identically distributed trials tends toward a Gaussian distribution. In other terms, when the number of trials $n$ approaches large numbers, the sum of the trials will have mean $n\mu$ and variance $n\sigma^2$, where $\mu$ and $\sigma^2$ are the mean and variance of a single trial. The new variable should then have the characteristics of a Gaussian variable, with its mean and variance at these values.
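A quick simulation of this limiting behaviour, summing uniform(0, 1) trials (for which a single trial has mean 0.5 and variance 1/12); the trial and sample counts below are arbitrary:

```python
import random

rng = random.Random(0)
n = 1000          # trials summed per sample
samples = 5000    # number of sums to generate

sums = [sum(rng.random() for _ in range(n)) for _ in range(samples)]

sample_mean = sum(sums) / len(sums)
sample_var = sum((s - sample_mean) ** 2 for s in sums) / len(sums)

print(sample_mean, n * 0.5)   # close to n * mean of a single trial = 500
print(sample_var, n / 12)     # close to n * variance of a single trial ~ 83.3
```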