Ryan L. Buchanan

Ryangineer

Machine Learning Mathematics & Virtual Reality Aesthetics

Pythagorean theorem illustration
"Mathematics requires a small dose, not of genius, but of an imaginative freedom which, in a larger dose, would be insanity." — Angus K. Rodgers

Noteworthy Machine Learning Algorithms

  Machine Learning   ⇒   software able to detect patterns, make decisions, predict outcomes, learn from mistakes & optimize own performance without being explicitly programmed to do so

Supervised Learning

"Learning a function that maps an input to an output based on example input-output pairs. In other words, training a model on data where the outcome is known, for subsequent application to data where the outcome is not known."
"Present labeled examples to learn from. For instance, when we want to be able to predict the selling price of a house in advance in a real estate market, we can get the historical prices of houses and have a supervised learning algorithm successfully figure out how to associate the prices to the house characteristics.
Using the uppercase letter X we intend to use matrix notation, since we can also treat the y as a response vector (technically a column vector) and the X as a matrix containing all values of the feature vectors, each arranged into a separate column of the matrix. . . . building a function that can answer the question about how X can imply y . . . [with] a functional mapping that can translate X values into y without error or with an acceptable margin of error. . . . to determinate a function of the following kind:" (Massaron, pg 24)

Unsupervised Learning

"Looks for previously undetected patterns in a data set with no pre-existing labels and with a minimum of human supervision"
"[P]resent examples without any hint, leaving it to the algorithm to create a label. For instance, when we need to figure out how the groups inside a customer database can be partitioned into similar segments based on their characteristics and behaviors." WGU MSDA

Reinforcement Learning

"how software agents ought to take actions in an environment in order to maximize the notion of cumulative reward"
"[P]resent examples without labels, as in unsupervised learning, but get feedback from the environment as to whether label guessing is correct or not. For instance, when we need software to act successfully in a competitive setting, such as a videogame or the stock market, we can use reinforcement learning. In this case, the software will then start acting in the setting and it will learn directly from its errors until it finds a set of rules that ensure its success." WGU MSDA



Lovely Deep Learning

"[M]achine learning uses multiple layers of simple, adjustable computing elements." (Russell, p. 26)
"Deep learning solves [the] central problem in representation learning by introducing . . . simpler representations . . . [and] enables the computer to build complex concepts out of simpler concepts . . . breaking the desired complicated mapping into a series of nested simple mappings . . . called "hidden [layers]" because their values are not given in the data." (Bengio, p. 5-6)

Artificial Neural Networks

  ↳ A computing system that consists of a number of simple but highly interconnected elements or nodes, called ‘neurons’, organized in layers that process information using dynamic state responses to external inputs. It is an extremely useful algorithm for finding patterns too complex to be extracted manually

Convolutional Neural Networks

  ↳ A class of deep neural networks, most commonly applied to analyzing visual imagery. CNNs are regularized versions of multilayer perceptrons. Multilayer perceptrons usually mean fully connected networks, that is, each neuron in one layer is connected to all neurons in the next layer.

  • Convolution | Visual Imagery Analysis
  • A special kind of linear mathematical operation that gives a network a degree of translation invariance; e.g., a typical image convolution is a form of blurring
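To make the "convolution as blurring" idea concrete, here is a minimal sketch in plain Python. The function name `convolve2d_valid`, the 3x3 box kernel and the toy 5x5 image are illustrative assumptions, not something taken from a particular library.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` and sum the element-wise products.

    Only positions where the kernel fits entirely inside the image are kept
    ("valid" mode). For a symmetric kernel such as a box blur, flipping the
    kernel (true convolution) or not (cross-correlation) gives the same result.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# 3x3 box-blur kernel: every weight is 1/9, so each output pixel is a local average.
box_blur = [[1 / 9] * 3 for _ in range(3)]

# Toy 5x5 image: dark everywhere except one bright pixel in the centre.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

blurred = convolve2d_valid(image, box_blur)
```

Every 3x3 window of this toy image contains the bright pixel, so its brightness is smeared evenly across the whole 3x3 output. The same kernel weights are reused at every position, which is where the tolerance to small shifts comes from.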

Natural Language Processing

  ↳ Starts with raw text in whatever format available, processes it, extracts relevant features and builds models to accomplish various NLP tasks

  • NLP Pipeline
  • Text Processing   ⇒   Feature Extraction   ⇒   Modeling
    • Document-Term Matrix

      Compute the dot product (sum of the products of corresponding elements) to find similarities

      a \cdot b = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + a_3 b_3 + \dots + a_n b_n
    • Cosine Similarity

      Divide the dot product of two vectors by the product of their magnitudes (Euclidean norms)

      \cos(\theta) = \frac{a \cdot b}{\|a\| \, \|b\|}
        ↳ where:
            Identical vectors → cos(θ) = 1
            Orthogonal vectors → cos(θ) = 0
            Exact opposite vectors → cos(θ) = -1
    • TF-IDF Transform

      Term frequency-inverse document frequency


      \mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D)
        ↳ where:
            \mathrm{tf}(t, d) = \frac{\mathrm{count}(t, d)}{|d|}
            \mathrm{idf}(t, D) = \log\left( \frac{|D|}{|\{d \in D : t \in d\}|} \right)
  • Stemming

    Reduces a word to its root by stripping conjugation & other endings, simplifying the text to its gist meaning (and reducing the final feature dimension)

  • Lemmatization

    Refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma.
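The feature-extraction formulas in this pipeline (document-term matrix, dot product, cosine similarity, TF-IDF) can be sketched in a few lines of plain Python. The three toy documents and all function names below are made up for illustration, and the `idf` helper assumes the term occurs in at least one document.

```python
import math


def dot(a, b):
    # Sum of the products of corresponding elements.
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Dot product divided by the product of the Euclidean norms.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def tf(term, doc):
    # Share of the tokens in `doc` that are `term`.
    return doc.count(term) / len(doc)


def idf(term, docs):
    # Assumes `term` occurs in at least one document (otherwise division by zero).
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing)


def tfidf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)


# Toy corpus, already tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]

# Document-term matrix: one row per document, one column per vocabulary word.
vocab = sorted({word for d in docs for word in d})
dtm = [[d.count(word) for word in vocab] for d in docs]

sim_01 = cosine_similarity(dtm[0], dtm[1])  # the first two documents share 6 of 8 "units"
```

The first two documents share "the", "sat" and "on", so their cosine similarity is high, while a word such as "cat" that appears in only one document gets a larger TF-IDF weight than a word that appears in most of them.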



Mathematics

Etymology: The word "mathematics" comes from Ancient Greek máthēma (μάθημα), meaning "that which is learnt," "what one gets to know," hence also "study" and "science". Wikipedia



Intimate Linear Algebra

The study of linear equations & geometric transformations using matrices, vector spaces & determinants.
"Solving for unknowns within a system of linear equations." Mathematical Foundations of Machine Learning

Fundamental Mathematical Objects

Linear Regression

Determinants

  ↳ The volume scaling factor of the linear transformation described by the matrix
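A small sketch of the "scaling factor" reading of the determinant, using a 2x2 matrix so that volume means area. The helper names and the example matrix are illustrative assumptions.

```python
def det2(m):
    # Determinant of a 2x2 matrix [[a, b], [c, d]].
    (a, b), (c, d) = m
    return a * d - b * c


def apply(m, v):
    # Matrix-vector product for a 2x2 matrix and a 2-vector.
    (a, b), (c, d) = m
    x, y = v
    return (a * x + b * y, c * x + d * y)


def parallelogram_area(u, v):
    # Area of the parallelogram spanned by the 2-D vectors u and v.
    return abs(u[0] * v[1] - u[1] * v[0])


M = [[2, 1], [0, 3]]
e1, e2 = (1, 0), (0, 1)  # edges of the unit square

area_before = parallelogram_area(e1, e2)
area_after = parallelogram_area(apply(M, e1), apply(M, e2))
scaling = area_after / area_before  # matches abs(det2(M))
```

The unit square has area 1; after applying `M` its image has area 6, which is the determinant of `M`. A negative determinant would mean the same scaling plus a flip in orientation.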

Geometrical Aspects of Linear Algebra

  ↳ Mathematics used to see through to the governing dynamics of the physical universe



Salient Statistics & Probabilities

Statistics is the art of making numerical conjectures about puzzling questions.
↳ Is statistics a field of mathematics? Some say it is not mathematics but the science of data. Whatever you decide, you must embrace it, my Dear Friends.

Terminology

Probability

Generalizing logic to situations with uncertain outcomes & measurements, & incomplete theories; the possible outcomes of events.
"The formalization of probability, combined with the availability of data, led to the emergence of statistics as a field." Artificial Intelligence: A Modern Approach, pg 8

Descriptive Statistics

"Descriptive statistics are brief descriptive coefficients that summarize a given data set, which can be either a representation of the entire or a sample of a population. Descriptive statistics are broken down into measures of central tendency and measures of variability (spread)." Investopedia


Used to describe data; univariate analysis on a single variable or multivariate analysis when looking at two or more variables in the dataset

  • Moments of Statistics
    1. Location
    2. Variability
    3. Skewness
    4. Kurtosis
  • Estimates of Location (Measures of Central Tendency)
    • Mean
      \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
    • Trimmed Mean
      \bar{x}_{\text{trimmed}} = \frac{\sum_{i=p+1}^{n-p} x_{(i)}}{n - 2p}
    • Weighted Mean
      \bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}
  • Estimates of Variability

    Second dimension (after estimate of location) in summarizing a feature, aka dispersion, measures whether data are tightly packed or spread out.

    • Mean Absolute Deviation
      \text{mean absolute deviation} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}
    • Variance
      s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
    • Standard Deviation

      "Sort of" the average distance from the mean.

      s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
    • Median Absolute Deviation (MAD)
      * A robust estimate of variability as opposed to variance & standard deviation.

      \text{MAD} = \text{Median}(|x_1 - m|, |x_2 - m|, \dots, |x_n - m|)
        ↳ where m is the median of the data
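The estimates of location and variability above can be tried out on a small data set with one outlier. This is a sketch using Python's standard `statistics` module; the helper names and the numbers are illustrative assumptions.

```python
import math
import statistics


def trimmed_mean(values, p):
    # Drop the p smallest and p largest values, then average the rest.
    ordered = sorted(values)
    kept = ordered[p:len(ordered) - p]
    return sum(kept) / len(kept)


def weighted_mean(values, weights):
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def mean_absolute_deviation(values):
    centre = statistics.mean(values)
    return sum(abs(x - centre) for x in values) / len(values)


def median_absolute_deviation(values):
    m = statistics.median(values)
    return statistics.median([abs(x - m) for x in values])


data = [2, 4, 4, 5, 7, 9, 60]  # 60 is a deliberate outlier

mean = statistics.mean(data)                # pulled up by the outlier
trimmed = trimmed_mean(data, 1)             # ignores 2 and 60
median = statistics.median(data)
sample_variance = statistics.variance(data) # n - 1 denominator
sample_sd = statistics.stdev(data)          # square root of the variance
mad = median_absolute_deviation(data)       # robust: barely notices the outlier
```

Comparing `mean` with `trimmed`, and `sample_sd` with `mad`, shows why the trimmed mean and the median absolute deviation are called robust: the single value 60 dominates the non-robust estimates.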
  • Data Distribution
    • Empirical Rule or Three Sigma Rule

      For symmetric, bell-shaped (approximately normal) data, about 68% of data points fall within one standard deviation of the mean, about 95% within two & about 99.7% within three.

    • Relative Frequency

      The proportion of times a value occurs in a dataset.

    • Z-Scores

      A measure of the number of standard deviations a particular data point is from the mean.

      z = \frac{\text{Observed} - \bar{x}}{s}
    • Percentile Rank
      \text{percentile rank} = \left[ \frac{(\text{\# of values below } x) + 0.5}{\text{total \# of values}} \right] \times 100
    • Percentile: Precise Definition

      Take any value between the order statistics (sorted or ranked data) x(j) & x(j + 1) where j satisfies:

      100 \cdot \frac{j}{n} \le P < 100 \cdot \frac{j + 1}{n}

      The percentile is the weighted average:

      Percentile ( P ) = ( 1 - w ) x (j) + w x (j + 1)

      for some weight between 0 & 1.

    • N-Quantiles
      • Index i for k-th cut point
        i = \left[ \frac{k}{n} (d - 1) \right] + 1
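As a quick check on the z-score and percentile rank formulas in this part of the outline, here is a minimal sketch. The function names and the list of test scores are hypothetical.

```python
import statistics


def z_scores(values):
    # Number of sample standard deviations each value lies from the mean.
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(x - mean) / s for x in values]


def percentile_rank(values, x):
    # [(# of values below x) + 0.5] / (total # of values) * 100
    below = sum(1 for v in values if v < x)
    return 100 * (below + 0.5) / len(values)


scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

z = z_scores(scores)
rank_of_80 = percentile_rank(scores, 80)  # five scores lie below 80
```

Standardized values always have mean 0 and standard deviation 1, which is what makes z-scores comparable across variables measured on different scales.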
  • Binary & Categorical Data

    Simple proportions or percentages tell the story of the data

    • Expected Value (EV)

      When the categories can be associated with a numeric value, this gives an average value based on a category's probability of occurrence. A form of weighted mean in which the weights are probabilities, the EV adds the ideas of future expectations & probability weights, often based on subjective judgements.

      1. Multiply each outcome by its probability of occurrence.
      2. Sum these values.
          EV = (weight as %)(component value) + (weight as %)(component value) + (weight as %)(component value)
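The two steps above translate directly into code. This is a minimal sketch; the subscription scenario and its probabilities are invented purely for illustration.

```python
def expected_value(outcomes):
    """outcomes: list of (probability, value) pairs whose probabilities sum to 1."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Step 1: multiply each outcome by its probability. Step 2: sum the products.
    return sum(p * value for p, value in outcomes)


# Hypothetical offer: 5% of visitors buy the $300 plan, 15% the $50 plan, 80% nothing.
plans = [(0.05, 300), (0.15, 50), (0.80, 0)]
ev_per_visitor = expected_value(plans)  # about 22.5 dollars per visitor
```

The check on the probabilities mirrors the requirement that the probabilities over the whole sample space add up to 100%.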
    • Probability

      Probability is essentially a ratio: the number of outcomes that make up a particular event versus all the possible outcomes.
      Total probability of sample space: the probabilities of all possible outcomes must sum to 100%.

      • Objective Probability

        "Objective probability refers to the chances or the odds that an event will occur based on the analysis of concrete measures rather than hunches or guesswork. ... The probability estimate is computed using mathematical equations that manipulate the data to determine the likelihood of an independent event occurring." Investopedia

        • Classical

          All possible outcomes are known & equally likely; everything is fair & equal, e.g., a coin toss or roll of dice.

        • Empirical

          AKA relative frequency or experimental probability: the ratio of the # of times a specific event occurs to the total number of trials. Based on observed data from past events, e.g., odds of a favorite ball team winning.

      • Subjective Probability

        "a type of probability derived from an individual's personal judgment or own experience about whether a specific outcome is likely to occur. It contains no formal calculations and only reflects the subject's opinions and past experience." Investopedia

    • Correlation

      A measurement of the extent to which numeric variables are associated with one another.

      • Pearson's Correlation Coefficient (r)

        A measure of linear correlation between two sets of data; will always lie between +1 (perfect positive correlation) & -1 (perfect negative correlation).

        r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1) \, s_x s_y}
      • Correlation Matrix

        A table where the variables are shown on both rows & columns, & the cell values are the correlation between the variables.

             v1   v2   v3
        v1    1    0    0
        v2    0    1    0
        v3    0    0    1
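Pearson's r and a correlation matrix can be computed straight from the formula with the standard library. The sketch below uses made-up columns named `v1`, `v2` and `v3`; unlike the identity-matrix illustration above, these toy columns are perfectly correlated on purpose so the results are easy to verify.

```python
import statistics


def pearson_r(x, y):
    # Sum of products of deviations, divided by (n - 1) * s_x * s_y.
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)
    products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return products / ((n - 1) * s_x * s_y)


def correlation_matrix(columns):
    # columns: dict mapping a variable name to its list of values.
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


data = {
    "v1": [1, 2, 3, 4, 5],
    "v2": [2, 4, 6, 8, 10],  # rises in lockstep with v1
    "v3": [5, 4, 3, 2, 1],   # falls in lockstep with v1
}

matrix = correlation_matrix(data)
```

The diagonal is always 1 (every variable is perfectly correlated with itself) and the matrix is symmetric, so only the cells above or below the diagonal carry information.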

Inferential Statistics

Putting foundational statistics to use with sampling to find meaningful statistics that will inform us about a population.
"Scholars interested in human society . . . grasped these ideas and found to their surprise that the variation in human characteristics and behavior often displays the same pattern as the error in measurement . . ." (regarding the application of the standard normal distribution to social science in the early 19th century)
- Leonard Mlodinow, The Drunkard's Walk: How Randomness Rules Our Lives

  • Simple Random Sample

    The most dependable data comes from simple random samples.

    • Each individual has the same probability of being chosen at any stage.

    • Each subset of k individuals has the same probability of being chosen as any other subset containing k individuals.

    Must exhibit two key characteristics:

    • Unbiased sample

    • Independent data points

  • Law of Large Numbers

    When performing experiments, the average of the results from large numbers of trials should be close to the expected value & will tend to become closer to the expected value as more trials are performed. Experimental probability will eventually lead to theoretical probability. As a sample size grows, its mean gets closer to the average of the whole population.

    • Theoretical Probability

      Classical probability is the likelihood that an event will occur if we could run trials of an experiment an infinite number of times.
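The Law of Large Numbers is easy to watch in a simulation. The sketch below rolls a fair six-sided die (expected value 3.5) and records the running average at a few checkpoints; the function name, the seed and the checkpoint sizes are arbitrary choices.

```python
import random


def running_mean_of_die_rolls(n_rolls, seed=0):
    # Roll a fair die n_rolls times and record the running average at checkpoints.
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    checkpoints = {}
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)
        if i in (10, 100, 1_000, 10_000, 100_000):
            checkpoints[i] = total / i
    return checkpoints


means = running_mean_of_die_rolls(100_000)
for rolls, average in means.items():
    print(f"{rolls:>7} rolls: average = {average:.4f}")
```

The early averages can wander noticeably, while the later ones settle close to the theoretical value of 3.5.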

  • Law of Error

    The equation of the normal probability curve to which the accidental errors associated with an extended series of observations tend to conform.

  • Central Limit Theorem

    If you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normal, with mean μ and standard deviation σ/√n.
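A simulation makes the Central Limit Theorem tangible. This sketch assumes a Uniform(0, 1) population, which is flat rather than bell-shaped, with mean 0.5 and standard deviation √(1/12); the sample size, number of samples and seed are arbitrary.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 50                   # size of each sample
num_samples = 2_000      # how many samples to draw

# Mean of each sample of n independent draws from Uniform(0, 1).
sample_means = [
    sum(rng.random() for _ in range(n)) / n
    for _ in range(num_samples)
]

population_mean = 0.5
population_sd = math.sqrt(1 / 12)
theoretical_se = population_sd / math.sqrt(n)

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# For a normal distribution, roughly 95% of values lie within 1.96 standard deviations.
share_within = sum(
    abs(m - population_mean) <= 1.96 * theoretical_se for m in sample_means
) / num_samples
```

Even though no single draw is normally distributed, the sample means cluster around 0.5 with a spread close to `theoretical_se`, and roughly 95% of them land within 1.96 standard errors.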

  • Parametric vs. Non-Parametric

    "Parametric tests assume that your data follows a particular known distribution, usually the Normal Distribution . . . Those distributions have been studied a lot . . .
      The mean, median & standard deviation are examples of parameters & testing the differences in the parameters of distribution-A vs. Distribution-B is exactly what parametric tests do. They are useful & powerful because as long as your own data approximates the known distribution you can learn a lot about your own data by inferring the same conclusions you would with the known distribution.
      Non-parametric tests don't make any assumption about what known distribution your data might follow. They don't assume the shape of your data is 'Normal' . . . and therefore are not able to infer powerful conclusions of those known distributions. Mostly Non-Parametric tests rank data in order and compare. Median tests work like this (e.g., Mood's Median test). The Median is a parameter, but the test is not making assumptions about what distribution your data follows, thus it's a non-Parametric test."
    Reddit ELI5

  • Standard Error

    A single metric that sums up the variability in the sampling distribution for a statistic.


    Standard Error = SE = \frac{s}{\sqrt{n}}
      ↳ where:
          s = standard deviation of the sample values
          n = the sample size
  • Square Root of n Rule

    To reduce the standard error by a factor of 2, the sample size must be increased by a factor of 4.

  • Bootstrap Algorithm

    A powerful tool for assessing the variability of a sample statistic. The idea is to draw additional samples, with replacement, from the sample itself & recalculate the statistic or model for each resample.

    1. Draw a sample value, record it & then replace it.
    2. Repeat n times.
    3. Record the mean of the n resampled values.
    4. Repeat steps 1-3 R times.
    5. Use the R results to:
      • Calculate their standard deviation (this estimates sample mean standard error).
      • Produce a histogram or boxplot.
      • Find a confidence interval.
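The bootstrap steps above can be sketched as follows. The sample values, the number of resamples `R`, the seed and the simple percentile-based interval are all illustrative assumptions rather than a prescribed recipe.

```python
import math
import random
import statistics


def bootstrap_means(sample, R=1000, seed=0):
    # Bootstrap distribution of the mean of `sample`.
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(R):                                      # step 4: repeat R times
        resample = [rng.choice(sample) for _ in range(n)]   # steps 1-2: n draws with replacement
        means.append(statistics.mean(resample))             # step 3: record the mean
    return means


def percentile_interval(values, confidence=0.90):
    # Trim an equal share of the sorted results from each end.
    ordered = sorted(values)
    tail = (1 - confidence) / 2
    low = ordered[int(tail * len(ordered))]
    high = ordered[int((1 - tail) * len(ordered)) - 1]
    return low, high


sample = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10, 18, 14]  # made-up measurements

boot_means = bootstrap_means(sample, R=1000, seed=0)
boot_se = statistics.stdev(boot_means)                     # step 5: estimated standard error
formula_se = statistics.stdev(sample) / math.sqrt(len(sample))
low, high = percentile_interval(boot_means, confidence=0.90)
```

For the mean, `boot_se` should land near the textbook value s/√n in `formula_se`. The appeal of the bootstrap is that the same loop works for statistics that have no such closed-form standard error, such as the median.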
  • Probability Distribution vs. Probability Density Function

    A probability distribution is a list of outcomes and their associated probabilities. A function that represents a discrete probability distribution is called a probability mass function. A function that represents a continuous probability distribution is called a probability density function.

  • F-Distribution

    The F-statistic measures the extent to which differences among group means are greater than we might expect under normal random variation. It is the ratio of the variability among the group means to the variability within each group (also called residual variability); this comparison is termed analysis of variance.
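A hand-rolled sketch of the F-statistic for a one-way layout, showing how the total variation splits into a between-group and a within-group part. The function name and the three tiny groups are hypothetical.

```python
import statistics


def one_way_anova_f(groups):
    # Returns (F, SS_between, SS_within) for a list of groups of numbers.
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Residual variation of each value around its own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)  # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)    # n - k degrees of freedom
    return ms_between / ms_within, ss_between, ss_within


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, ss_between, ss_within = one_way_anova_f(groups)
```

A large F means the group means differ by much more than the scatter inside the groups would suggest. Turning F into a p-value needs the F-distribution, which the standard library does not provide.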

  • Statistical Significance Tests & ANOVA

    Is there a significant difference among groups tested? How significant is that difference? Think Alpha level.

    • T-tests measure one or two groups.

    • T-tests tell you which group is different.

    • A one-sample T-test (also known as Student's t-test) will determine significance against a known population mean.

    • Two sample T-tests (independent samples) will determine significance between two groups of data.
      "A test to determine whether the average of two samples is different when either the sample set is too small (that is, fewer than 30 data points per sample), or if the population standard deviation is unknown. The two samples are also generally drawn from distributions assumed to be normal." (Malik, p. 39).

    • A paired T-test (dependent samples) will determine significance for the same group at different times (pre- & post-test).

    • Calculating a T-test requires at least three values:

      • the group means or mean difference

      • the standard deviation of each group

      • the # of data values of each group

      • for one-sample T-test, the hypothesized or population mean

    • A two sample Z-test is a "test to determine whether the averages of the two samples are different. This test assumes that both samples are drawn from a normal distribution with a known population standard deviation" (Malik, p. 39).

    • ANOVA measures more than two groups.

    • Both are parametric & means tests.

    • Both require normal distributions, homogeneity of variance & random sampling.

    • Both measure a ratio or interval level (continuous) dependent variable.

    • If you have a significant difference among several groups, post-hoc testing will be necessary to provide further investigation.
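The three kinds of T-test listed above differ only in how the means, standard deviations and group sizes are combined into a t statistic. The sketch below computes the statistic only; the function names and numbers are made up, and the two-sample version is Welch's form (it does not assume equal variances), which is a choice made here rather than something stated in the notes.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    # t = (sample mean - hypothesized mean) / (s / sqrt(n))
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))


def welch_two_sample_t(a, b):
    # Independent samples; each group contributes its own variance / size.
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def paired_t(before, after):
    # Dependent samples: a one-sample test on the differences against 0.
    differences = [y - x for x, y in zip(before, after)]
    return one_sample_t(differences, 0)


t_one = one_sample_t([2, 4, 6, 8, 10], 3)
t_two = welch_two_sample_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
t_paired = paired_t([10, 12, 14], [11, 14, 17])
```

Converting a t statistic into a p-value requires the t-distribution with the appropriate degrees of freedom; in practice a statistics package such as SciPy (`scipy.stats.ttest_1samp`, `ttest_ind`, `ttest_rel`) handles both steps.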

  • Difference Between r & R²
    • The value r is the correlation between observed values of Y & predicted values Ŷ. In essence, it is the relationship between two variables, say weight & height. The positive & negative values express whether the two variables rise together or one falls as the other rises. So, r deals with the relationship of two variables.

    • R² is the coefficient of determination. It is the percentage of variation in the response variable that is explained by the linear model, i.e., how well one or more predictor variables together account for the target variable. In simple linear regression it is r times r.

  • Poisson Distributions

    Measuring a count of events over some interval of time/space. In many applications, the event rate, λ, is known or can be estimated from prior data.

    Event rate  =  λ  =  mean # of events occurring in interval of time or space
  • Weibull Distribution

    An extension of the exponential distribution in which the event rate is allowed to change, as specified by a shape parameter, β.

    β  >  1  ⇒  probability of event increases over time
    β  <  1  ⇒  probability of event decreases over time
  • Gaussian Distribution Formulas
  • T Distribution

    Aka the Student's t-distribution, is a type of probability distribution that is similar to the normal distribution with its bell shape but has heavier tails. T distributions have a greater chance for extreme values than normal distributions, hence the fatter tails.

    t = \frac{\bar{x} - \mu}{s / \sqrt{n}}
  • Binomial Distribution Formulas
    • Mean
    • \mu = n p
    • Standard Deviation
    • \sigma = \sqrt{n p (1 - p)}
    • Variance
    • \sigma^2 = n p (1 - p)
    • Probability Mass Function
    • f(k, n, p) = \frac{n!}{k! \, (n - k)!} \, p^k (1 - p)^{n - k}
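The binomial formulas can be checked numerically with `math.comb` from the standard library. The choice of n = 10 fair coin flips is just an example.

```python
import math


def binomial_pmf(k, n, p):
    # Probability of exactly k successes in n independent trials.
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.5  # e.g. ten flips of a fair coin

mean = n * p
variance = n * p * (1 - p)
sd = math.sqrt(variance)

pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
prob_exactly_5 = pmf[5]  # 252 / 1024
```

Summing `pmf` over every possible k gives 1, and the probability-weighted average of k reproduces the mean n·p.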
  • Sample Distribution of the Sample Proportion
  • Confidence Interval

    Simply, a confidence interval provides a level of confidence for a given interval. Or, more specifically, a range of values so defined that there is a specified probability that the value of a parameter lies within it.

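    • CI for a Population Mean
    • (a, b) = \bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}} = \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}
                                or
      (a, b) = \bar{x} \pm z^* \cdot \sigma_{\bar{x}} = \bar{x} \pm z_{\alpha/2} \cdot \sigma_{\bar{x}}
        ↳ where:
           a = lower limit of confidence interval
           b = upper limit of confidence interval
           z^* = z_{\alpha/2} = critical value ⇒ z-score
           \alpha = (1 - confidence level) ⇒ significance level
           \sigma_{\bar{x}} = \sigma / \sqrt{n} = standard error of the mean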
    • CI for Population Proportion
    • (a, b) = \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}
    • Solved for Sample Size (n)
    • n = \left( \frac{z^* \sqrt{\hat{p}(1 - \hat{p})}}{ME} \right)^2
        ↳ where:
            \hat{p} = sample proportion
    • Margin of Error (ME)

      "A margin of error tells you how many percentage points your results will differ from the real population value. For example, a 95% confidence interval with a 4 percent margin of error means that your statistic will be within 4 percentage points of the real population value 95% of the time." StatisticsHowTo

    • ME = Distance from sample mean (x̄) or sample proportion (p̂) to either edge of confidence interval
              = \pm z^* \cdot \frac{\sigma}{\sqrt{n}}   (for a mean)
              = \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}   (for a proportion)
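The confidence interval, sample size and margin of error formulas can be tried out with `statistics.NormalDist` from the standard library (Python 3.8+), which supplies the critical value z*. The function names and the example numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    # Two-sided critical value z* = z_(alpha/2) for a given confidence level.
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def ci_mean(x_bar, sigma, n, confidence=0.95):
    # Interval for a population mean when sigma is known.
    margin = z_critical(confidence) * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def ci_proportion(p_hat, n, confidence=0.95):
    margin = z_critical(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def sample_size_for_proportion(p_hat, margin, confidence=0.95):
    # Smallest n whose margin of error does not exceed `margin`.
    z = z_critical(confidence)
    return math.ceil((z * math.sqrt(p_hat * (1 - p_hat)) / margin) ** 2)


low, high = ci_mean(x_bar=100, sigma=15, n=36)          # roughly (95.1, 104.9)
p_low, p_high = ci_proportion(p_hat=0.5, n=100)         # roughly (0.402, 0.598)
needed = sample_size_for_proportion(0.5, margin=0.04)   # respondents for a 4-point margin
```

Using p̂ = 0.5 in the sample size formula is the conservative choice, because p̂(1 - p̂) is largest there.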
  • Chi-Square Distribution

    The chi-square statistic measures the extent to which results depart from the null expectation of independence. It is distribution-free & non-parametric. It is used when dealing with a categorical dependent variable, for example a binary "yes/no" target variable.

  • Chi-Square Tests(χ2)

    "A test to determine whether the distribution of data points to categories is different than what would be expected due to chance. This is the primary test for determining whether the proportions in tests, such as those in an A/B test, are beyond what would be expected from chance" (Malik, p. 39).

    Assumptions
    • Data in cells should be frequencies or counts of cases.

    • Levels (or categories) of the variables are mutually exclusive.

    • Each subject may contribute to one & only one cell.

    • Study groups must be independent.

    • Two variables, both are measured as categories, usually at the nominal level.

    • Values in cells should be five or more.

    Features
    • A hypothesis test comparing two or more proportions, Ho: P1 = P2

    • Random samples are required.

    • Observations are independent.

    • Uses χ2 table to show the critical values of the χ2 distribution.

    I. Goodness of Fit Test:

    Tests if a categorical variable follows a hypothesized distribution.

    • Simple random sampling

    • Categorical variables

    • Expected frequency count from previous samples, ex. percentage by category

    • Degrees of freedom is k - 1

    • Ho: the population frequencies = expected frequencies values.
      Ha: the null hypothesis is false.

    • Test statistic is defined:

      \chi^2 = \sum \left[ \frac{(O_{r,c} - E_{r,c})^2}{E_{r,c}} \right]
    II. Test for Independence:

    Looks for significant difference between two categorical variables.

    • Simple random sampling

    • Categorical variables

    • Expected frequency count for each cell of the table is at least five

    • Degrees of freedom:

      df   =   ( r   - 1 ) * ( c   - 1 )
        ↳ where:
            r = # of levels (rows) for one categorical variable
            c = # of levels (columns) for the other
    • Expected Frequencies:

      E_{r,c} = \frac{n_r \cdot n_c}{n}
    • Test statistic is defined:

      \chi^2 = \sum \left[ \frac{(O_{r,c} - E_{r,c})^2}{E_{r,c}} \right]
    III. Test for Homogeneity:

    Tests for difference in proportion between several groups.
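The test for independence can be computed by hand from a table of counts: expected frequencies from the row and column totals, then the sum of (O - E)² / E, then the degrees of freedom. The 2x2 table below is a made-up A/B test result.

```python
def chi_square_independence(table):
    # Returns (chi-square statistic, degrees of freedom) for a table of counts.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n  # E = (n_r * n_c) / n
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)           # (r - 1) * (c - 1)
    return chi2, df


# Hypothetical A/B test: rows are page versions, columns are converted / not converted.
observed = [
    [30, 70],
    [50, 50],
]

chi2, df = chi_square_independence(observed)  # chi2 is about 8.33 with df = 1
```

With one degree of freedom the critical value at the 5% level is about 3.84, so a statistic of roughly 8.33 would lead to rejecting independence for this toy table.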

Statistical Significance Testing

  • Test Statistic: proportion, average, difference between groups or a distribution
  • Hypothesis Test Logic: "Given the human tendency to react to unusual but random behavior and interpret it as something meaningful and real, in our experiments we will require proof that the difference between groups is more extreme than what chance might reasonably produce." Practical Statistics for Data Scientists (p. 94)
  • Null Hypothesis(Ho): a baseline assumption that the treatments in an experiment are equivalent & the results observed are a product of chance
  • Alternative Hypothesis(Ha or H1): the results observed in an experiment cannot be explained by chance
  • Significance Level: the value the test statistic needs to reach before it is decided that the Null Hypothesis cannot explain the difference
  • Statistically Significant: "A result of an experiment is said to have statistical significance, or be statistically significant, if it is likely not caused by chance for a given statistical significance level. ... It also means that there is a 5% chance that you could be wrong." Optipedia
    • p-value

      Given a chance model that embodies the null hypothesis, the p-value is the probability (frequency) of obtaining results as unusual or extreme as the observed results.

    • Alpha

      The probability threshold of "unusualness" that chance results must surpass for actual outcomes to be deemed statistically significant.

    • Type I Error

      Mistakenly concluding an effect is real (when it is due to chance).

    • Type II Error

      Mistakenly concluding an effect is due to chance (when it is real).

    • Degrees of Freedom

      The number of values free to vary, which affects the shape of the distribution; also the name given to the n - 1 denominator seen in the calculation for variance & standard deviation. When you use a sample to estimate the variance for a population, you will end up with an estimate that is slightly biased downward if you use the n in the denominator. If you use n - 1 in the denominator, the estimate will be free of bias.
      The concept of degrees of freedom lies behind the factoring of categorical variables into n - 1 indicator or dummy variables when doing a regression (to avoid multicollinearity).

  • ANOVA (Analysis of Variance)

    The statistical procedure that tests for a statistically significant difference among multiple groups.

    • Pairwise Comparison

      A hypothesis test (e.g., of means) between two groups among multiple groups. The more such pairwise comparisons we make, the greater the potential for being fooled by random chance.

    • Omnibus Test

      A single hypothesis test of the overall variance among multiple group means.

    • Decomposition of Variance

      Separation of components contributing to an individual value (e.g., from the overall average, from a treatment mean, & from residual error).

    • F-statistic

      A standardized statistic that measures the extent to which differences among group means exceed what might be expected in a chance model.

    • SS

      "Sum of squares," referring to deviations from some average value.

  • Common Significance Tests

      ↳ Using the population mean (μ):

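    • Z-Test | When σ is known
      z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
    • T-Test | When σ is unknown | large sample > 30
      t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
    • T-Test | When σ is unknown | small sample < 30 | assume normal distribution
      t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}

      ↳ Using the population proportion (p):

    • Pearson's Chi-Squared Test | Expected value of population proportion (p̂) known
      z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 (1 - p_0)}{n}}}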

Regression

  ↳ So . . . why is it called "Regression", anyway?

  • Regression Analysis

    Process used to turn a set of disconnected data points into an equation that models the whole set; the process of approximating a trend with a mathematical function

    • Trend or Regression Line, Approximating Curve, Line of Best Fit, Least Squares Line
      \hat{y} = mx + b
    • Slope
      m = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}
    • Y-intercept
      b = \frac{\sum y - m \sum x}{n}
    • Correlation Coefficient
      r = \frac{1}{n - 1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)
            or
      r = \frac{1}{n - 1} \sum z_{x_i} z_{y_i}
    • Pearson Correlation Coefficient

      Method for quantifying linear correlation; represented by r, it is a number ranging from -1 to 1 that indicates how well a scatter plot fits a linear trend

      r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
    • Standard Deviation
      s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}
    • Residual or Error
      residual   =   e   =   actual value - predicted value
    • Sum of Residuals
      Σresiduals   =   Σe   =   0
    • Coefficient of Determination

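      Gives the percentage by which the regression line fits the data better than the horizontal line at the mean ȳ

      r^2 = \frac{SS_{\bar{y}} - SS_{\hat{y}}}{SS_{\bar{y}}}   ⇒   expressed as %
        ↳ where:
            SS_{\bar{y}} = \sum (y_i - \bar{y})^2 = sum of squared errors around the mean ȳ
            SS_{\hat{y}} = \sum (y_i - \hat{y}_i)^2 = sum of squared errors around the regression line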
    • Root Mean Squared Error (RMSE) or Standard Deviation of the Residuals

      The smaller the RMSE, the better fit the line of regression

      RMSE = \sqrt{\frac{\sum e^2}{n - 1}}
    • Chi-Square Tests (χ2)

      Pearson's Chi-Square Test is used to ask whether the differences you observe between different groups are real or imagined.
      The larger the χ2-value, the stronger the evidence that the two variables are related

      \chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}
    • Degrees of Freedom (df)

      The number of values you would need in your data in order to be able to know all the other values
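The regression formulas in this section (slope, intercept, residuals, coefficient of determination, RMSE) fit into one short sketch. The five data points are invented, and the RMSE uses the n - 1 denominator given in these notes; some texts divide by n - 2 for a fitted line instead.

```python
import math


def fit_line(x, y):
    # Least-squares slope m and intercept b for y-hat = m*x + b.
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

m, b = fit_line(x, y)                              # slope 0.6, intercept 2.2
predicted = [m * xi + b for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

y_bar = sum(y) / len(y)
ss_mean = sum((yi - y_bar) ** 2 for yi in y)       # squared errors around the mean line
ss_line = sum(e ** 2 for e in residuals)           # squared errors around the fitted line
r_squared = (ss_mean - ss_line) / ss_mean          # 0.6, i.e. 60%
rmse = math.sqrt(ss_line / (len(x) - 1))           # n - 1 denominator, as in these notes
```

The residuals of a least-squares line always sum to zero (up to rounding), and `r_squared` says the line removes 60% of the squared error left by simply predicting the mean.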



Dynamic Calculus

The mathematics of curves, motion & change. Calculus is basically very advanced algebra (finding rates & slopes) & geometry (adding up infinitely many small pieces & finding area).

Three Central Problems of Calculus

Fundamental Theorem of Calculus (FTC)

Shows the relationship between differentiation & integration. If a function is integrated & then differentiated, it is back to the original function. Integration & differentiation are inverse to each other.

Differentiation

The derivative function tells how fast & where the function is increasing or decreasing.

Integration

The integral of a function models the area under the graph of a function.
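The inverse relationship between differentiation and integration can be checked numerically. This is a rough sketch with a central-difference derivative and a midpoint-rule integral; the step sizes and the example function f(x) = x² are arbitrary choices.

```python
def derivative(f, x, h=1e-5):
    # Central-difference estimate of the slope of f at x.
    return (f(x + h) - f(x - h)) / (2 * h)


def integral(f, a, b, steps=10_000):
    # Midpoint-rule estimate of the area under f between a and b.
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


def f(x):
    return x ** 2


def area_so_far(x):
    # Accumulated area under f from 0 to x.
    return integral(f, 0.0, x)


area_0_to_3 = integral(f, 0.0, 3.0)            # exact answer is 3**3 / 3 = 9
slope_of_area = derivative(area_so_far, 2.0)   # should recover f(2) = 4
```

Differentiating the accumulated area gives back the original function, which is the Fundamental Theorem of Calculus in miniature.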



Computer Science Mathematics

Absolute Value Inequalities

Scientific Notation

Any number can be written in scientific notation. It involves shifting the decimal place to the left (positive) or right (negative) until the result is a number with only one place before the decimal point & then multiplying by 10 raised to the number of places shifted.
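Python's `e` format specifier performs exactly this decimal-point shift. A small sketch follows; the helper name is made up, and the default format keeps six digits after the decimal point, so longer mantissas are rounded.

```python
def to_scientific(x):
    # Return (mantissa, exponent) with one digit before the decimal point.
    # Relies on Python's 'e' format, which rounds the mantissa to six decimals.
    mantissa_text, exponent_text = f"{x:e}".split("e")
    return float(mantissa_text), int(exponent_text)


large = to_scientific(123456.0)  # decimal point moved 5 places left:  1.23456 x 10^5
small = to_scientific(0.00042)   # decimal point moved 4 places right: 4.2 x 10^-4
```

Shifting left gives a positive exponent and shifting right gives a negative one, matching the rule described above.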

NP Complete Problem

"We call these problems "nondeterministic polynomial" or NP, because you can't give someone a pre-determined set of steps to solve it (unless that someone is a perfect guesser!), but if someone does happen to solve it, they would only need a polynomial number of steps. . . .
  [W]e can find the answer to any NP problem by solving a related problem in this group. The problems in this group are called "NP-complete" (because solving one of them can solve the complete group of NP problems). If we ever found a fast (i.e. polynomial) way to solve any NP-complete problem, we could find fast ways to solve every NP problem. Then we wouldn't have to talk about NP any more, because they would all just be P (polynomial) problems. That's why we call the problem "P=NP". . . .
  [M]athematicians think that NP-complete problems are not P, because so many people have spent so much time thinking about it that if they were, somebody would have found out how by now (because it's usually easier to find a way to do something than to prove that there is no way)." Reddit ELI5
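
The gap between checking an answer and finding one can be sketched with subset sum, a well-known NP-complete problem: given a list of numbers, is there a subset that adds up to a target? The function names `verify` and `brute_force_search` and the sample numbers are assumptions made for illustration. Checking a proposed subset takes a number of steps proportional to its size, while the naive search below may try every one of the 2ⁿ subsets.

```python
from itertools import combinations


def verify(numbers, target, chosen_indices):
    """Check a proposed answer quickly: do the chosen numbers add up to target?"""
    if len(set(chosen_indices)) != len(chosen_indices):
        return False                      # an index was used twice
    if any(i < 0 or i >= len(numbers) for i in chosen_indices):
        return False                      # an index is out of range
    return sum(numbers[i] for i in chosen_indices) == target


def brute_force_search(numbers, target):
    """Find an answer the slow way: try subsets of every size in turn."""
    n = len(numbers)
    for size in range(n + 1):
        for indices in combinations(range(n), size):
            if verify(numbers, target, indices):
                return indices
    return None


numbers = [3, 34, 4, 12, 5, 2]
found = brute_force_search(numbers, 9)       # indices of 4 and 5
missing = brute_force_search(numbers, 30)    # no subset adds up to 30
```

With six numbers the search is instant, but each extra number doubles the count of subsets to try, whereas `verify` stays cheap. Whether a fundamentally faster search exists for every such problem is the P = NP question described in the quote.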



Marvelous Logarithms

Logarithms were the supercomputers of their era.   See the Description of the Marvelous Canon of Logarithms by John Napier
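
One reason logarithms saved so much labor is that they turn multiplication into addition: log(a × b) = log(a) + log(b). The sketch below imitates looking up two logarithms in a table, adding them, and converting back. It uses base-10 logarithms from Python's `math` module rather than Napier's original tables, and the function name `multiply_via_logs` is an assumption for this example.

```python
import math


def multiply_via_logs(a, b):
    """Multiply two positive numbers by adding their base-10 logarithms."""
    log_sum = math.log10(a) + math.log10(b)   # the "table lookup" and addition
    return 10 ** log_sum                      # the "anti-log" step


product = multiply_via_logs(123.0, 456.0)     # approximately 123 * 456 = 56088
```

The result matches ordinary multiplication only up to floating-point rounding, much as table-based calculations were limited by the number of digits printed in the tables.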



Cordially Discrete Mathematics

Branch of mathematics dealing with discrete (distinct & disconnected) or finite sets of elements rather than continuous or infinite sets of elements. The terms discrete & continuous are analogous to the computer science terms digital & analog.
* Brilliant course from Shawn Grooms at freeCodeCamp: Math for Programmers



Nimble Number Theory

The study of numbers, chiefly the integers, & their properties.

The Integers



Constructs of the Universe

Mind-expanding & ancient wisdom from A Beginner's Guide to Constructing the Universe - Michael S. Schneider

The Monad | One

"The ancient philosophers conceived that the Monad breathes in the void and creates all subsequent numbers"

The Dyad | Two

The Triad | Three

The unity of the circle manifests as a trinity: center or point, radius or line & circumference.

The Tetrad | Four

Three points define a flat surface, but it takes a fourth to define depth, progress to three dimensions and express geometry as volume.

The Pentad | Five

Pentagonal symmetry is the supreme symbol of life.

The Hexad | Six

The Hexad is sometimes symbolized by the "Pythagorean triangle" or "3-4-5 right triangle" made by the ancient method using a twelve-knotted rope. It displays the sequence from one to six (1 - 6): one right angle (1), two unequal angles (2), sides of three (3), four (4), and five (5), and closing an area of six square units (6).
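
The arithmetic behind the 3-4-5 triangle can be confirmed directly. This is a small sketch with a made-up helper name, `triangle_facts`, which assumes the two shorter sides are the legs of the triangle.

```python
def triangle_facts(a, b, c):
    """Basic facts about a triangle with legs a, b and longest side c."""
    return {
        "is_right_triangle": a ** 2 + b ** 2 == c ** 2,  # Pythagorean theorem
        "perimeter": a + b + c,                          # length of rope needed
        "area": a * b / 2,                               # half of base times height
    }


facts = triangle_facts(3, 4, 5)
```

For sides 3, 4 and 5 the perimeter is 12, matching the twelve-knotted rope, and the enclosed area is 6 square units.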

The Heptad | Seven

Seven is perhaps the most venerated number of the Dekad, the number par excellence in the ancient world

The Octad | Eight

Periodic Renewal & the doubling number

The Ennead | Nine

Composed of a trinity of trinities, the number nine represents the principles of the sacred Triad taken to their utmost expression.
The ancient Greeks called nine "the horizon," as it lies at the edge of the numerical shore before the boundless ocean of numbers that repeat in endless cycles the principles of the first nine digits.

The Decad | Ten

The Decad represents the power to generate numbers beyond itself, toward the infinite. Multiplying any number by ten does not change its essential nature but only acts to expand its power.



Positively Brilliant References



About Ryan L Buchanan

I am training as a Software Developer, Data Analyst & Machine Learning Engineer.  I am currently enrolled in the Software Technology program at Ogden-Weber Technical College.  I am also acquiring certifications as an ML Engineer & Algorithmic Trader from Udacity.   I have a Master's in Data Analytics, an MBA & an MS in Instructional Design.  I have working knowledge of the C#, R, SQL, HTML, CSS, JavaScript, Java and Python languages.

I have a multidisciplinary background including military intelligence, psychology, linguistics, economics, virtual reality & educational technology.  I have worked abroad for ten years with the military, universities & vocational schools.   I have working knowledge of Arabic, Chinese & French.  I am very mobile, able to relocate quickly, adapt easily to diverse working conditions & have a current passport.

I have a passion for mathematics, statistics & artificial intelligence.  I am enthusiastic, highly self-motivated & enjoy presenting informative data to decision makers.  I am eager to work with dynamic teams to create high quality products & services.