[ - ] incanter.bayes
  1. sample-model-params
    fn ([size linear-model])
    Returns a sample of the given size of the parameters (coefficients and
    error variance) of the given linear-model. The sample is generated using
    Gibbs sampling.
    See also:
    incanter.stats/linear-model
    Examples:
    (use '(incanter core datasets stats charts bayes))
    (def ols-data (to-matrix (get-dataset :survey)))
    (def x (sel ols-data (range 0 2313) (range 1 10)))
    (def y (sel ols-data (range 0 2313) 10))
    (def lm (linear-model y x :intercept false))
    (def param-samp (sample-model-params 5000 lm))
    ;; view trace plots
    (view (trace-plot (:var param-samp )))
    (view (trace-plot (sel (:coefs param-samp) :cols 0)))
    ;; view histograms
    (view (histogram (:var param-samp)))
    (view (histogram (sel (:coefs param-samp) :cols 0)))
    ;; calculate statistics
    (map mean (trans (:coefs param-samp)))
    (map median (trans (:coefs param-samp)))
    (map sd (trans (:coefs param-samp)))
    ;; show the 95% Bayesian confidence interval for the first coefficient
    (quantile (sel (:coefs param-samp) :cols 0) :probs [0.025 0.975])
  2. sample-multinomial-params
    fn ([size counts])
    Returns a sample of multinomial proportion parameters.
    The counts are assumed to have a multinomial distribution.
    If a uniform prior distribution is assigned to the multinomial vector
    theta, then the posterior distribution of theta is a Dirichlet
    distribution with parameters (plus counts 1).
    Examples:
    (use '(incanter core stats bayes charts))
    (def samp-props (sample-multinomial-params 1000 [727 583 137]))
    ;; view means, 95% CI, and histograms of the proportion parameters
    (mean (sel samp-props :cols 0))
    (quantile (sel samp-props :cols 0) :probs [0.025 0.975])
    (view (histogram (sel samp-props :cols 0)))
    (mean (sel samp-props :cols 1))
    (quantile (sel samp-props :cols 1) :probs [0.025 0.975])
    (view (histogram (sel samp-props :cols 1)))
    (mean (sel samp-props :cols 2))
    (quantile (sel samp-props :cols 2) :probs [0.025 0.975])
    (view (histogram (sel samp-props :cols 2)))
    ;; view a histogram of the difference in proportions between the first
    ;; two candidates
    (view (histogram (minus (sel samp-props :cols 0) (sel samp-props :cols 1))))
  3. sample-proportions
    fn ([size counts])
    sample-proportions has been renamed sample-multinomial-params
[ - ] incanter.censored
  1. censored-mean-lower
    fn ([a mu sigma])
    Returns the mean of a normal distribution (with mean mu and standard
    deviation sigma) with the lower tail censored at 'a'
  2. censored-mean-two-sided
    fn ([a b mu sigma])
    Returns the mean of a normal distribution (with mean mu and standard
    deviation sigma) with the lower tail censored at 'a' and the upper
    tail censored at 'b'
  3. censored-mean-upper
    fn ([b mu sigma])
    Returns the mean of a normal distribution (with mean mu and standard
    deviation sigma) with the upper tail censored at 'b'
  4. censored-variance-lower
    fn ([a mu sigma])
    Returns the variance of a normal distribution (with mean mu and standard
    deviation sigma) with the lower tail censored at 'a'
  5. censored-variance-two-sided
    fn ([a b mu sigma])
    Returns the variance of a normal distribution (with mean mu and standard
    deviation sigma) with the lower tail censored at 'a' and the upper
    tail censored at 'b'
  6. censored-variance-upper
    fn ([b mu sigma])
    Returns the variance of a normal distribution (with mean mu and standard
    deviation sigma) with the upper tail censored at 'b'
  7. truncated-variance
    fn ([& options])
    Returns the variance of a normal distribution truncated at a and b.
    Options:
    :mean (default 0) mean of untruncated normal distribution
    :sd (default 1) standard deviation of untruncated normal distribution
    :a (default -infinity) lower truncation point
    :b (default +infinity) upper truncation point
    Examples:
    (use '(incanter core stats censored))
    (truncated-variance :a -1.96 :b 1.96)
    (truncated-variance :a 0)
    (truncated-variance :b 0)
    (use 'incanter.charts)
    (def x (range -3 3 0.1))
    (def plot (xy-plot x (map #(truncated-variance :a %) x)))
    (view plot)
    (add-lines plot x (map #(truncated-variance :b %) x))
    (def samp (sample-normal 10000))
    (add-lines plot x (map #(variance (filter (fn [s] (> s %)) samp)) x))
    (add-lines plot x (map #(variance (mult samp (indicator (fn [s] (> s %)) samp))) x))
    References:
    DeMaris, A. (2004) Regression with social data: modeling continuous and limited response variables.
    Wiley-IEEE.
    http://en.wikipedia.org/wiki/Truncated_normal_distribution
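    The entries above distinguish censoring (values beyond the cut point are
    recorded as the cut point) from truncation (values beyond the cut point are
    discarded). The following is a minimal Monte Carlo sketch of that difference
    in plain Clojure. It does not use or reproduce Incanter's implementation;
    the helper names (normal-sample, sample-mean, sample-var, censor-lower,
    truncate-lower) and the seed are assumptions made for illustration.

```clojure
;; Monte Carlo sketch contrasting lower censoring with lower truncation
;; for a normal variable. Helper names are hypothetical, not Incanter's.
(import 'java.util.Random)

(defn normal-sample
  "Returns a vector of n draws from a normal distribution with mean mu and
  standard deviation sigma, using a seeded java.util.Random."
  [n mu sigma seed]
  (let [rng (Random. (long seed))]
    (vec (repeatedly n #(+ mu (* sigma (.nextGaussian rng)))))))

(defn sample-mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn sample-var [xs]
  (let [m (sample-mean xs)]
    (/ (reduce + (map #(let [d (- % m)] (* d d)) xs))
       (double (dec (count xs))))))

;; Lower censoring at a: values below a are recorded as a.
(defn censor-lower [a xs] (map #(max a %) xs))

;; Lower truncation at a: values at or below a are dropped.
(defn truncate-lower [a xs] (filter #(> % a) xs))

(def samp (normal-sample 100000 0.0 1.0 42))

;; Theory for a standard normal cut at 0:
;;   censored mean      = 1/sqrt(2*pi), about 0.399
;;   truncated variance = 1 - 2/pi,     about 0.363
(def censored-mean-at-0 (sample-mean (censor-lower 0.0 samp)))
(def truncated-variance-at-0 (sample-var (truncate-lower 0.0 samp)))
```

    The simulated values should land close to the theoretical ones and can be
    compared against (censored-mean-lower 0 0 1) and (truncated-variance :a 0).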
[ - ] incanter.charts
  1. add-box-plot
    macro ([chart x & options])
    Adds an additional box to an existing box-plot, returns the modified chart object.
    Options:
    :series-label (default x expression)
    Examples:
    (use '(incanter core charts stats))
    (doto (box-plot (sample-normal 1000) :legend true)
    view
    (add-box-plot (sample-normal 1000 :sd 2))
    (add-box-plot (sample-gamma 1000)))
    References:
    http://www.jfree.org/jfreechart/api/javadoc/
    http://www.jfree.org/jfreechart/api/javadoc/org/jfree/chart/JFreeChart.html
  2. add-categories
    macro ([chart categories values & options])
    Adds additional categories to an existing bar-chart or line-chart, returns the modified chart object.
    Options:
    :group-by
    Examples:
    (use '(incanter core charts stats))
    (def seasons (mapcat identity (repeat 3 ["winter" "spring" "summer" "fall"])))
    (def years (mapcat identity (repeat 4 [2007 2008 2009])))
    (def x (sample-uniform 12 :integers true :max 100))
    (def plot (bar-chart years x :group-by seasons :legend true))
    (view plot)
    (add-categories plot (plus 3 years) (sample-uniform 12 :integers true :max 100) :group-by seasons)
    (def plot2 (line-chart years x :group-by seasons :legend true))
    (view plot2)
    (add-categories plot2 (plus 3 years) (sample-uniform 12 :integers true :max 100) :group-by seasons)
    References:
    http://www.jfree.org/jfreechart/api/javadoc/
    http://www.jfree.org/jfreechart/api/javadoc/org/jfree/chart/JFreeChart.html
  3. add-function
    macro ([chart function min-range max-range & options])
    Adds an xy-plot of the given function to the given chart, returning
    a modified version of the chart.
    Options:
    :series-label (default x expression)
    :step-size (default (/ (- max-range min-range) 500))
    See also:
    function-plot, view, save, add-points, add-lines
    Examples:
    (use '(incanter core stats charts))
    ;; plot the sine and cosine functions
    (doto (function-plot sin (- Math/PI) Math/PI)
    (add-function cos (- Math/PI) Math/PI)
    view)
    ;; plot two normal pdf functions
    (doto (function-plot pdf-normal -3 3 :legend true)
    (add-function (fn [x] (pdf-normal x :mean 0.5 :sd 0.5)) -3 3)
    view)
    ;; plot a user defined function and its derivative
    (use '(incanter core charts optimize))
    ;; define the function, x^3 + 2x^2 + 2x + 3
    (defn cubic [x] (+ (* x x x) (* 2 x x) (* 2 x) 3))
    ;; use the derivative function to get a function
    ;; that approximates its derivative
    (def deriv-cubic (derivative cubic))
    ;; plot the cubic function and its derivative
    (doto (function-plot cubic -10 10)
    (add-function deriv-cubic -10 10)
    view)
  4. add-histogram
    macro ([chart x & options])
    Adds a histogram to an existing histogram plot, returns the modified
    chart object.
    Options:
    :nbins (default 10) number of bins for histogram
    :series-label (default x expression)
    Examples:
    (use '(incanter core charts stats))
    (doto (histogram (sample-normal 1000)
    :legend true)
    view
    (add-histogram (sample-normal 1000 :sd 0.5)))
    References:
    http://www.jfree.org/jfreechart/api/javadoc/
    http://www.jfree.org/jfreechart/api/javadoc/org/jfree/chart/JFreeChart.html
  5. add-lines
    macro ([chart x y & options])
    Plots lines on the given scatter or line plot of the (x,y) points.
    Equivalent to R's lines function, returns the modified chart object.
    Options:
    :series-label (default x expression)
    Examples:
    (use '(incanter core stats io datasets charts))
    (def cars (to-matrix (get-dataset :cars)))
    (def y (sel cars :cols 0))
    (def x (sel cars :cols 1))
    (def plot1 (scatter-plot x y :legend true))
    (view plot1)
    ;; add regression line to scatter plot
    (def lm1 (linear-model y x))
    (add-lines plot1 x (:fitted lm1))
    ;; model the data without an intercept
    (def lm2 (linear-model y x :intercept false))
    (add-lines plot1 x (:fitted lm2))
    ;; Clojure's doto macro can be used to build a chart
    (doto (xy-plot x (pdf-normal x))
    view
    clear-background
    (add-lines x (pdf-normal x :sd 1.5))
    (add-lines x (pdf-normal x :sd 0.5)))
    References:
    http://www.jfree.org/jfreechart/api/javadoc/
    http://www.jfree.org/jfreechart/api/javadoc/org/jfree/chart/JFreeChart.html
  6. add-pointer
    fn ([chart x y & options])
    Adds an arrow annotation to the given chart.
    Arguments:
    chart -- the chart to annotate
    x, y -- the coordinate to add the annotation
    Options:
    :text -- (default "") text to include at the end of the arrow
    :angle -- (default :nw) either a number indicating the angle of the arrow,
    or a keyword indicating a direction (:north :nw :west :sw :south
    :se :east :ne)
    Examples:
    (use '(incanter core charts))
    (def x (range (* -2 Math/PI) (* 2 Math/PI) 0.01))
    (def plot (xy-plot x (sin x)))
    (view plot)
    ;; annotate the plot
    (doto plot
    (add-pointer (- Math/PI) (sin (- Math/PI)) :text "(-pi, (sin -pi))")
    (add-pointer Math/PI (sin Math/PI) :text "(pi, (sin pi))" :angle :ne)
    (add-pointer (* 1/2 Math/PI) (sin (* 1/2 Math/PI)) :text "(pi/2, (sin pi/2))" :angle :south))
    ;; try the different angle options
    (add-pointer plot 0 0 :text "north" :angle :north)
    (add-pointer plot 0 0 :text "nw" :angle :nw)
    (add-pointer plot 0 0 :text "ne" :angle :ne)
    (add-pointer plot 0 0 :text "west" :angle :west)
    (add-pointer plot 0 0 :text "east" :angle :east)
    (add-pointer plot 0 0 :text "south" :angle :south)
    (add-pointer plot 0 0 :text "sw" :angle :sw)
    (add-pointer plot 0 0 :text "se" :angle :se)
  7. add-points
    macro ([chart x y & options])
    Plots points on the given scatter-plot or xy-plot of the (x,y) points.
    Equivalent to R's points function, returns the modified chart object.
    Options:
    :series-label (default x expression)
    Examples:
    (use '(incanter core stats io datasets charts))
    (def cars (to-matrix (get-dataset :cars)))
    (def y (sel cars :cols 0))
    (def x (sel cars :cols 1))
    ;; fit a linear model to the data
    (def lm1 (linear-model y x))
    ;; model the data without an intercept
    (def lm2 (linear-model y x :intercept false))
    (doto (xy-plot x (:fitted lm1) :legend true)
    view
    (add-points x y)
    (add-lines x (:fitted lm2)))
    References:
    http://www.jfree.org/jfreechart/api/javadoc/
    http://www.jfree.org/jfreechart/api/javadoc/org/jfree/chart/JFreeChart.html
  8. add-polygon
    fn ([chart coords & options])
    Adds a polygon outline defined by the given coordinates. The last coordinate will
    close with the first. If only two points are given, it will plot a line.
    Arguments:
    chart -- the chart to add the polygon to.
    coords -- a list of coords (an n-by-2 matrix can also be used)
    Examples:
    (use '(incanter core stats charts))
    (def x (range -3 3 0.01))
    (def plot (xy-plot x (pdf-normal x)))
    (view plot)
    ;; add polygon to the chart
    (add-polygon plot [[-1.96 0] [1.96 0] [1.96 0.4] [-1.96 0.4]])
    ;; the coordinates can also be passed in a matrix
    ;; (def points (matrix [[-1.96 0] [1.96 0] [1.96 0.4] [-1.96 0.4]]))
    ;; (add-polygon plot points)
    ;; add a text annotation
    (add-text plot -1.25 0.35 "95% Conf Interval")
    ;; PCA chart example
    (use '(incanter core stats charts datasets))
    ;; load the iris dataset
    (def iris (to-matrix (get-dataset :iris)))
    ;; run the pca
    (def pca (principal-components (sel iris :cols (range 4))))
    ;; extract the first two principal components
    (def pc1 (sel (:rotation pca) :cols 0))
    (def pc2 (sel (:rotation pca) :cols 1))
    ;; project the first four dimensions of the iris data onto the first
    ;; two principal components
    (def x1 (mmult (sel iris :cols (range 4)) pc1))
    (def x2 (mmult (sel iris :cols (range 4)) pc2))
    ;; now plot the transformed data, coloring each species a different color
    (def plot (scatter-plot x1 x2
    :group-by (sel iris :cols 4)
    :x-label "PC1" :y-label "PC2" :title "Iris PCA"))
    (view plot)
    ;; put box around the first group
    (add-polygon plot [[-3.2 -6.3] [-2 -6.3] [-2 -3.78] [-3.2 -3.78]])
    ;; add some text annotations
    (add-text plot -2.5 -6.5 "Setosa")
    (add-text plot -5 -5.5 "Versicolor")
    (add-text plot -8 -5.5 "Virginica")
  9. add-text
    fn ([chart x y text & options])
    Adds a text annotation centered at the given coordinates.
    Arguments:
    chart -- the chart to annotate
    x, y -- the coordinates to center the text
    text -- the text to add
    Examples:
    ;; PCA chart example
    (use '(incanter core stats charts datasets))
    ;; load the iris dataset
    (def iris (to-matrix (get-dataset :iris)))
    ;; run the pca
    (def pca (principal-components (sel iris :cols (range 4))))
    ;; extract the first two principal components
    (def pc1 (sel (:rotation pca) :cols 0))
    (def pc2 (sel (:rotation pca) :cols 1))
    ;; project the first four dimensions of the iris data onto the first
    ;; two principal components
    (def x1 (mmult (sel iris :cols (range 4)) pc1))
    (def x2 (mmult (sel iris :cols (range 4)) pc2))
    ;; now plot the transformed data, coloring each species a different color
    (def plot (scatter-plot x1 x2
    :group-by (sel iris :cols 4)
    :x-label "PC1" :y-label "PC2" :title "Iris PCA"))
    (view plot)
    ;; add some text annotations
    (add-text plot -2.5 -6.5 "Setosa")
    (add-text plot -5 -5.5 "Versicolor")
    (add-text plot -8 -5.5 "Virginica")
  10. bar-chart
    macro ([categories values & options])
    Returns a JFreeChart object representing a bar-chart of the given data.
    Use the 'view' function to display the chart, or the 'save' function
    to write it to a file.
    Arguments:
    categories -- a sequence of categories
    values -- a sequence of numeric values
    Options:
    :title (default 'Histogram') main title
    :x-label (default 'Categories')
    :y-label (default 'Value')
    :legend (default false) prints legend
    :vertical (default true) the orientation of the plot
    :group-by (default nil) -- a vector of values used to group the values into
    series within each category.
    See also:
    view and save
    Examples:
    (use '(incanter core stats charts datasets))
    (def data (get-dataset :co2))
    (def grass-type (sel data :cols 1))
    (def treatment-type (sel data :cols 2))
    (def uptake (sel data :cols 4))
    (view (bar-chart grass-type uptake
    :title "CO2 Uptake"
    :group-by treatment-type
    :x-label "Grass Types" :y-label "Uptake"
    :legend true))
    (def data (get-dataset :airline-passengers))
    (def years (sel data :cols 0))
    (def months (sel data :cols 2))
    (def passengers (sel data :cols 1))
    (view (bar-chart years passengers :group-by months :legend true))
    (view (bar-chart months passengers :group-by years :legend true))
    (def data (get-dataset :austres))
    (view data)
    (def plot (bar-chart (sel data :cols 0) (sel data :cols 1)
    :group-by (sel data :cols 2) :legend true))
    (view plot)
    (save plot "/tmp/austres_plot.png" :width 1000)
    (view "file:///tmp/austres_plot.png")
    (def seasons (mapcat identity (repeat 3 ["winter" "spring" "summer" "fall"])))
    (def years (mapcat identity (repeat 4 [2007 2008 2009])))
    (def values (sample-uniform 12 :integers true :max 100))
    (view (bar-chart years values :group-by seasons :legend true))
    (view (bar-chart ["a" "b" "c"] [10 20 30]))
    (view (bar-chart ["a" "a" "b" "b" "c" "c" ] [10 20 30 10 40 20]
    :legend true
    :group-by ["I" "II" "I" "II" "I" "II"]))
    (view (bar-chart (sample "abcdefghij" :size 10 :replacement true)
    (sample-uniform 10 :max 50) :legend true))
    References:
    http://www.jfree.org/jfreechart/api/javadoc/
    http://www.jfree.org/jfreechart/api/javadoc/org/jfree/chart/JFreeChart.html
  11. bland-altman-plot
    fn ([x1 x2])
    Examples:
    (use '(incanter core datasets charts))
    (def flow-meter (to-matrix (get-dataset :flow-meter)))
    (def x1 (sel flow-meter :cols 1))
    (def x2 (sel flow-meter :cols 3))
    (view (bland-altman-plot x1 x2))
    References:
    http://en.wikipedia.org/wiki/Bland-Altman_plot
    http://www-users.york.ac.uk/~mb55/meas/ba.htm
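    The entry above carries no description. A Bland-Altman plot conventionally
    shows, for each pair of measurements, their difference against their mean.
    The sketch below computes those points in plain Clojure. The name
    bland-altman-points and the sign convention (x1 minus x2) are assumptions
    for illustration, not taken from the Incanter source.

```clojure
;; Sketch: the (mean, difference) pairs that a Bland-Altman plot displays.
;; bland-altman-points is a hypothetical helper, not part of incanter.charts.
(defn bland-altman-points
  "Returns a seq of [mean difference] pairs for paired measurements x1, x2."
  [x1 x2]
  (map (fn [a b] [(/ (+ a b) 2.0) (- a b)]) x1 x2))

(def ba-example (bland-altman-points [10 12 14] [9 13 12]))
;; => ([9.5 1] [12.5 -1] [13.0 2])
```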
  12. box-plot
    macro ([x & options])
    Returns a JFreeChart object representing a box-plot of the given data.
    Use the 'view' function to display the chart, or the 'save' function
    to write it to a file.
    Options:
    :title (default 'Histogram') main title
    :x-label (default x expression)
    :y-label (default 'Frequency')
    :legend (default false) prints legend
    :series-label (default x expression)
    :group-by (default nil) -- a vector of values used to group the x values into series.
    See also:
    view and save
    Examples:
    (use '(incanter core stats charts))
    (def gamma-box-plot (box-plot (sample-gamma 1000 :shape 1 :rate 2)
    :title "Gamma Boxplot"
    :legend true))
    (view gamma-box-plot)
    (add-box-plot gamma-box-plot (sample-gamma 1000 :shape 2 :rate 2))
    (add-box-plot gamma-box-plot (sample-gamma 1000 :shape 3 :rate 2))
    ;; use the group-by options
    (use '(incanter core stats datasets charts))
    (def iris (to-matrix (get-dataset :iris)))
    (view (box-plot (sel iris :cols 0) :group-by (sel iris :cols 4) :legend true))
    ;; see INCANTER_HOME/examples/probability_plots.clj for more examples of plots
    References:
    http://www.jfree.org/jfreechart/api/javadoc/
    http://www.jfree.org/jfreechart/api/javadoc/org/jfree/chart/JFreeChart.html
  13. clear-background
    fn ([chart])
    Sets the alpha level (transparency) of the plot's background to zero,
    removing the default grid, returns the modified chart object.
    References:
    http://www.jfree.org/jfreechart/api/javadoc/
    http://www.jfree.org/jfreechart/api/javadoc/org/jfree/chart/JFreeChart.html
  14. function-plot
    macro ([function min-range max-range & options])
    Returns an xy-plot object of the given function over the range indicated
    by the min-range and max-range arguments. Use the 'view' function to
    display the chart, or the 'save' function to write it to a file.
    Options:
    :title (default 'Histogram') main title
    :x-label (default x expression)
    :y-label (default 'Frequency')
    :legend (default false) prints legend
    :series-label (default x expression)
    :step-size (default (/ (- max-range min-range) 500))
    See also:
    view, save, add-points, add-lines
    Examples:
    (use '(incanter core stats charts))
    (view (function-plot sin (- Math/PI) Math/PI))
    (view (function-plot pdf-normal -3 3))
    (defn cubic [x] (+ (* x x x) (* 2 x x) (* 2 x) 3))
    (view (function-plot cubic -10 10))
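    The :step-size default above implies roughly 500 evaluation points across
    the plotted range. The sketch below shows how such a grid could be built in
    plain Clojure. The name sample-points is hypothetical, and the use of range
    (which excludes max-range itself) is an assumption, not a statement about
    how function-plot is implemented.

```clojure
;; Sketch: evaluate f on a grid derived from min-range, max-range and an
;; optional :step-size, mirroring the documented default.
(defn sample-points
  "Returns a seq of [x (f x)] pairs for x from min-range (inclusive) to
  max-range (exclusive)."
  [f min-range max-range & {:keys [step-size]}]
  (let [step (or step-size (/ (- max-range min-range) 500))]
    (map (fn [x] [x (f x)]) (range min-range max-range step))))

;; With integer bounds the default step is an exact ratio (6/500 here),
;; so the grid has exactly 500 points.
(def cubic-points (sample-points #(* % % %) -3 3))

;; An explicit step size overrides the default.
(def coarse-points (sample-points inc 0 3 :step-size 1))
;; => ([0 1] [1 2] [2 3])
```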
  15. histogram
    macro ([x & options])
    Returns a JFreeChart object representing the histogram of the given data.
    Use the 'view' function to display the chart, or the 'save' function
    to write it to a file.
    Options:
    :nbins (default 10) number of bins
    :density (default false) if false, plots frequency, otherwise density
    :title (default 'Histogram') main title
    :x-label (default x expression)
    :y-label (default 'Frequency')
    :legend (default false) prints legend
    :series-label (default x expression)
    See also:
    view, save, add-histogram
    Examples:
    (use '(incanter core charts stats))
    (view (histogram (sample-normal 1000)))
    ;; plot a density histogram
    (def hist (histogram (sample-normal 1000) :density true))
    (view hist)
    ;; add a normal density line to the plot
    (def x (range -4 4 0.01))
    (add-lines hist x (pdf-normal x))
    ;; plot some gamma data
    (def gam-hist (histogram (sample-gamma 1000) :density true :nbins 30))
    (view gam-hist)
    (def x (range 0 8 0.01))
    (add-lines gam-hist x (pdf-gamma x))
    ;; see INCANTER_HOME/examples/probability_plots.clj for more examples of plots
    References:
    http://www.jfree.org/jfreechart/api/javadoc/
    http://www.jfree.org/jfreechart/api/javadoc/org/jfree/chart/JFreeChart.html
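    The :nbins and :density options above control how the data are binned and
    scaled. The sketch below illustrates the difference between frequency and
    density scaling in plain Clojure. It assumes equal-width bins spanning the
    data range and a non-constant input; bin-sketch is a hypothetical name, and
    the binning performed by the underlying chart library may differ in detail.

```clojure
;; Sketch: equal-width binning with both frequency and density scaling.
;; Density divides each count by (n * bin-width), so the bars integrate to 1.
(defn bin-sketch
  "Returns the bin width, per-bin counts and per-bin densities for xs."
  [xs nbins]
  (let [lo     (apply min xs)
        hi     (apply max xs)
        width  (/ (- hi lo) nbins)
        ;; the maximum value is placed in the last bin
        idx    (fn [x] (min (dec nbins) (int (/ (- x lo) width))))
        counts (reduce (fn [v x] (update v (idx x) inc))
                       (vec (repeat nbins 0))
                       xs)
        n      (count xs)]
    {:width     width
     :frequency counts
     :density   (mapv #(/ % (* n width)) counts)}))

(def bins (bin-sketch [0 1 2 3 4 5 6 7 8 10] 5))
;; (:frequency bins) => [2 2 2 2 2]
;; (:density bins)   => [1/10 1/10 1/10 1/10 1/10]
```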
  16. line-chart
    macro ([categories values & options])
    Returns a JFreeChart object representing a line-chart of the given values and categories.
    Use the 'view' function to display the chart, or the 'save' function
    to write it to a file.
    Arguments:
    categories -- a sequence of categories
    values -- a sequence of numeric values
    Options:
    :title (default 'Histogram') main title
    :x-label (default 'Categories')
    :y-label (default 'Value')
    :legend (default false) prints legend
    :group-by (default nil) -- a vector of values used to group the values into
    series within each category.
    See also:
    view and save
    Examples:
    (use '(incanter core stats charts datasets))
    (def data (get-dataset :airline-passengers))
    (def years (sel data :cols 0))
    (def months (sel data :cols 2))
    (def passengers (sel data :cols 1))
    (view (line-chart years passengers :group-by months :legend true))
    (view (line-chart months passengers :group-by years :legend true))
    (def seasons (mapcat identity (repeat 3 ["winter" "spring" "summer" "fall"])))
    (def years (mapcat identity (repeat 4 [2007 2008 2009])))
    (def x (sample-uniform 12 :integers true :max 100))
    (view (line-chart years x :group-by seasons :legend true))
    (view (line-chart ["a" "b" "c" "d" "e" "f"] [10 20 30 10 40 20]))
    (view (line-chart (sample "abcdefghij" :size 10 :replacement true)
    (sample-uniform 10 :max 50) :legend true))
    References:
    http://www.jfree.org/jfreechart/api/javadoc/
    http://www.jfree.org/jfreechart/api/javadoc/org/jfree/chart/JFreeChart.html
  17. line-plot
    fn ([x y & options])
    WARNING: line-plot has been renamed xy-plot.
  18. qq-plot
    fn ([x & options])
    Returns a QQ-Plot object. Use the 'view' function to display it.
    References:
    http://en.wikipedia.org/wiki/QQ_plot
    Examples:
    (use '(incanter core stats charts))
    (view (qq-plot (sample-normal 100)))
    (view (qq-plot (sample-exp 100)))
    (view (qq-plot (sample-gamma 100)))
  19. scatter-plot
    macro ([x y & options])
    Returns a JFreeChart object representing a scatter-plot of the given data.
    Use the 'view' function to display the chart, or the 'save' function
    to write it to a file.
    Options:
    :title (default 'Histogram') main title
    :x-label (default x expression)
    :y-label (default 'Frequency')
    :legend (default false) prints legend
    :series-label (default x expression)
    :group-by (default nil) -- a vector of values used to group the x and y values into series.
    See also:
    view, save, add-points, add-lines
    Examples:
    (use '(incanter core stats charts))
    ;; create some data
    (def mvn-samp (sample-multivariate-normal 1000 :mean [7 5] :sigma (matrix [[2 1.5] [1.5 3]])))
    ;; create scatter-plot of points
    (def mvn-plot (scatter-plot (sel mvn-samp :cols 0) (sel mvn-samp :cols 1)))
    (view mvn-plot)
    ;; add regression line to scatter plot
    (def x (sel mvn-samp :cols 0))
    (def y (sel mvn-samp :cols 1))
    (def lm (linear-model y x))
    (add-lines mvn-plot x (:fitted lm))
    ;; use :group-by option
    (use '(incanter core stats datasets charts))
    ;; load the :iris dataset
    (def iris (to-matrix (get-dataset :iris)))
    ;; plot the first two columns grouped by the fifth column
    (view (scatter-plot (sel iris :cols 0) (sel iris :cols 1) :group-by (sel iris :cols 4)))
    ;; see INCANTER_HOME/examples/probability_plots.clj for more examples of plots
    References:
    http://www.jfree.org/jfreechart/api/javadoc/
    http://www.jfree.org/jfreechart/api/javadoc/org/jfree/chart/JFreeChart.html
  20. set-alpha
    fn ([chart alpha])
    Sets the alpha level (transparency) of the plot's foreground,
    returns the modified chart object.
    References:
    http://www.jfree.org/jfreechart/api/javadoc/
    http://www.jfree.org/jfreechart/api/javadoc/org/jfree/chart/JFreeChart.html
  21. set-background-alpha
    fn ([chart alpha])
    Sets the alpha level (transparency) of the plot's background,
    returns the modified chart object.
    References:
    http://www.jfree.org/jfreechart/api/javadoc/
    http://www.jfree.org/jfreechart/api/javadoc/org/jfree/chart/JFreeChart.html
  22. set-title
    fn ([chart title])
    Sets the main title of the plot, returns the modified chart object.
    References:
    http://www.jfree.org/jfreechart/api/javadoc/
    http://www.jfree.org/jfreechart/api/javadoc/org/jfree/chart/JFreeChart.html
  23. set-x-label
    fn ([chart label])
    Sets the label of the x-axis, returns the modified chart object.
    References:
    http://www.jfree.org/jfreechart/api/javadoc/
    http://www.jfree.org/jfreechart/api/javadoc/org/jfree/chart/JFreeChart.html
  24. set-y-label
    fn ([chart label])
    Sets the label of the y-axis, returns the modified chart object.
    References:
    http://www.jfree.org/jfreechart/api/javadoc/
    http://www.jfree.org/jfreechart/api/javadoc/org/jfree/chart/JFreeChart.html
  25. trace-plot
    fn ([x & options])
    Returns a trace-plot object. Use the 'view' function to display it.
    Examples:
    (use '(incanter core datasets stats bayes charts))
    (def ols-data (to-matrix (get-dataset :survey)))
    (def x (sel ols-data (range 0 2313) (range 1 10)))
    (def y (sel ols-data (range 0 2313) 10))
    (def sample-params (sample-model-params 5000 (linear-model y x :intercept false)))
    (view (trace-plot (:var sample-params)))
    (view (trace-plot (sel (:coefs sample-params) :cols 0)))
  26. xy-plot
    macro ([x y & options])
    Returns a JFreeChart object representing an xy-plot of the given data.
    Use the 'view' function to display the chart, or the 'save' function
    to write it to a file.
    Options:
    :title (default 'Histogram') main title
    :x-label (default x expression)
    :y-label (default 'Frequency')
    :legend (default false) prints legend
    :series-label (default x expression)
    :group-by (default nil) -- a vector of values used to group the x and y values into series.
    See also:
    view, save, add-points, add-lines
    Examples:
    (use '(incanter core stats charts))
    ;; plot the cosine function
    (def x (range -1 5 0.01))
    (def y (cos (mult 2 Math/PI x)))
    (view (xy-plot x y))
    ;; plot gamma pdf with different parameters
    (def x2 (range 0 20 0.1))
    (def gamma-plot (xy-plot x2 (pdf-gamma x2 :shape 1 :rate 2)
    :legend true
    :title "Gamma PDF"
    :y-label "Density"))
    (view gamma-plot)
    (add-lines gamma-plot x2 (pdf-gamma x2 :shape 2 :rate 2))
    (add-lines gamma-plot x2 (pdf-gamma x2 :shape 3 :rate 2))
    (add-lines gamma-plot x2 (pdf-gamma x2 :shape 5 :rate 1))
    (add-lines gamma-plot x2 (pdf-gamma x2 :shape 9 :rate 0.5))
    ;; use :group-by option
    (use '(incanter core charts datasets))
    (def data (to-matrix (get-dataset :chick-weight)))
    (let [[weight time chick] (trans data)]
    (view (xy-plot time weight :group-by chick)))
    ;; see INCANTER_HOME/examples/probability_plots.clj for more examples of plots
    References:
    http://www.jfree.org/jfreechart/api/javadoc/
    http://www.jfree.org/jfreechart/api/javadoc/org/jfree/chart/JFreeChart.html
[ - ] incanter.chrono
  1. are-overlapping?
    fn ([[s e] [s1 e1]])
  2. before?
    fn ([start end])
  3. beginning-of
    fn ([the-date unit])
    Return a date at the beginning of the month, year, day, etc. from the-date.
  4. date
    fn ([& args])
    Returns a new date object. Takes year, month, and day as args as
    well as optionally hours, minutes, and seconds.
  5. date-seq
    fn ([units from to] [units from])
    Returns a lazy seq of dates starting with from up until to in
    increments of units. If to is omitted, returns an infinite seq.
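    The lazy, optionally infinite behaviour described above can be illustrated
    with a small sketch built on java.time (Java 8 or later). This is not the
    incanter.chrono implementation: day-seq is a hypothetical name, the unit is
    fixed to days, and treating the end date as inclusive is an assumption.

```clojure
;; Sketch of a lazy date sequence in one-day steps using java.time.
(import 'java.time.LocalDate)

(defn day-seq
  "With one argument, returns an infinite lazy seq of dates starting at from.
  With two arguments, stops after to (treated here as inclusive)."
  ([from]
   (iterate (fn [^LocalDate d] (.plusDays d 1)) from))
  ([from to]
   (take-while (fn [^LocalDate d] (not (.isAfter d to)))
               (day-seq from))))

(def first-week-2009
  (day-seq (LocalDate/of 2009 1 1) (LocalDate/of 2009 1 7)))

;; The one-argument form is infinite, so always bound it with take.
(def three-days (take 3 (day-seq (LocalDate/of 2009 1 1))))
```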
  6. date-time
    fn ([d t])
  7. date>
    fn ([d])
  8. def-date-format
    macro ([fname [arg] & body])
  9. def-date-parser
    macro ([fname [arg] & body])
  10. def-simple-date-format
    macro ([fname form])
  11. earlier
    fn ([the-date amount units] [the-date units])
    Returns a date that is earlier than the-date by amount units.
    Amount is one if not specified.
  12. earlier?
    fn ([date-a date-b])
  13. end-of
    fn ([the-date unit])
    Return a date at the end of the month, year, day, etc. from the-date.
  14. format-date
    multi
    Take in a date and a format (either a keyword or a string) and
    return a string with the formatted date.
  15. hours-around
    fn ([r d])
  16. hours-between
    fn ([start end])
  17. hours-from
    fn ([d h])
  18. is-within?
    fn ([d [s e]])
  19. joda-date
    fn ([str-d] [y m d h min sec mill zone])
  20. joda-guard
    fn ([d])
  21. joda-proxy
    fn ([& args])
    joda-date object wrapped in a proxy of goodness.
  22. joda-str
    fn ([d])
  23. later
    fn ([the-date amount units] [the-date units])
    Returns a date that is later than the-date by amount units.
    Amount is one if not specified.
  24. later?
    fn ([date-a date-b])
  25. minutes-between
    fn ([start end])
  26. minutes-from
    fn ([d m])
  27. now
    fn ([])
    Returns a new date object with the current date and time.
  28. parse-date
    multi
    Take in a string with a formatted date and a format (either a
    keyword or a string) and return a parsed date.
  29. time-between
    fn ([date-a date-b] [date-a date-b units])
    How many units between date-a and date-b? Units defaults to seconds.
  30. time-zone
    fn ([offset])
  31. today
    fn ([])
    Returns a new date object with only the current date. The time
    fields will be set to 0.
  32. units-in-seconds
    var
    Number of seconds in each unit
  33. units-to-calendar-units
    var
    Conversion of unit keywords to Calendar units
  34. valid-range?
    fn ([[start end]])
[ - ] incanter.classification
  1. categorical-classifiers
    fn ([features])
    makes a categorical classifier for use with |each.
  2. category-classifier
    fn ([x])
  3. category-map-classifier
    fn ([x] [s x])
  4. classification-workflow
    fn ([transformer classifier count-all])
    composes a classification workflow from a classifier, a counter, and a transformer.
    note that count-all has been abstracted because you may count with reduce or merge-with, depending on whether you have vectors or maps.
  5. classifier
    fn ([fns classify])
  6. classifier-macro
    macro ([classify classify-one-to-one])
  7. classify-one-to-all
    fn ([fns data])
    takes a map of fns and a map of features, and applies each classifier fn to the entire feature map.
    usage:
    (classify-one-to-all
    {:a (present-when #(and
    (> (:a %) 9)
    (< (:b %) 6)))}
    {:a 10 :b 5})
    -> {:a 1}
  8. classify-one-to-each
    fn ([fns data])
    takes a map of fns and a map of features, and applies each classifier fn to each feature in the data map individually.
    usage:
    (classify-one-to-each
    {:a (present-when (gt 5)) :b (present-when (lt 5))}
    {:a 10 :b 5})
    -> {:a {:a 1 :b 0} :b {:a 0 :b 0}}
  9. classify-one-to-one
    fn ([fns data])
    takes a map of fns and a map of features, where there is one classifier fn per feature and they share the same key names in the classifier and feature maps. applies each classifier fn to the corresponding feature.
    usage:
    (classify-one-to-one
    {:a (present-when (gt 5)) :b (present-when (lt 5))}
    {:a 10 :b 5})
    -> {:a 1 :b 0}
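    The application patterns above differ only in what each classifier fn receives. The following is a minimal sketch of the one-to-one and one-to-each patterns in plain Clojure. The helpers gt, lt, and present-when are hypothetical stand-ins written for this sketch (assumed to behave as the usage examples suggest), and one-to-one-sketch and one-to-each-sketch are illustrative names, not the library's implementation.

```clojure
;; Hypothetical stand-ins for the helpers used in the usage examples.
(defn gt [n] #(> % n))
(defn lt [n] #(< % n))
(defn present-when [pred] #(if (pred %) 1 0))

;; One classifier per feature: apply each fn to the value under the same key.
(defn one-to-one-sketch [fns data]
  (into {} (for [[k f] fns] [k (f (get data k))])))

;; Every classifier against every feature: one inner map per classifier key.
(defn one-to-each-sketch [fns data]
  (into {} (for [[k f] fns]
             [k (into {} (for [[dk dv] data] [dk (f dv)]))])))

(def fns {:a (present-when (gt 5)) :b (present-when (lt 5))})

(one-to-one-sketch fns {:a 10 :b 5})   ; => {:a 1 :b 0}
(one-to-each-sketch fns {:a 10 :b 5})  ; => {:a {:a 1 :b 0} :b {:a 0 :b 0}}
```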
  10. collect-vals
    fn ([maps])
  11. confusion-matrix
    fn ([trd tst])
  12. confusion-matrix-by-time
    fn ([results])
  13. confusion-matrix-from-counts
    fn ([test & train])
    produces a confusion matrix from the joint distributions of test and train data.
    right now the test and train data are con-prob-tuples; this may change if we store only the joint PMFs.
  14. cross-validation-confusion-matrix
    fn ([& xs])
    takes a set of n joint PMFs, and holds each joint PMF out in turn as the test set.
    merges the resulting n cross-validation matrices into a single matrix.
  15. equivalence-classes
    fn ([class-mappings])
    takes a map where each key is a class and each value is the set of classes equivalent to it. it then inverts the mapping so that you can look up any member class and find the larger class it belongs to.
    usage:
    (equivalence-classes {0 #{0 1}, 1 #{2, 3, 4}, 2 #{5 6}})
    -> {0 0, 1 0, 2 1, 3 1, 4 1, 5 2, 6 2}
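    The inversion described above can be sketched in plain Clojure as follows. This is an illustration only; invert-classes is a hypothetical name and is not claimed to match the library's implementation.

```clojure
;; Sketch: turn {class #{members}} into {member class}.
(defn invert-classes [class-mappings]
  (into {}
        (for [[k members] class-mappings
              member members]
          [member k])))

;; Yields a map equal to {0 0, 1 0, 2 1, 3 1, 4 1, 5 2, 6 2}
(invert-classes {0 #{0 1}, 1 #{2 3 4}, 2 #{5 6}})
```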
  16. heterogenious-group-by
    fn ([f coll])
    Returns a sorted map of the elements of coll keyed by the result of
    f on each element. The value at each key will be a vector of the
    corresponding elements, in the order they appeared in coll.
  17. map-of-vectors
    fn ([keys])
  18. merge-counts
    fn ([x])
  19. merge-equivalence-classes
    fn ([class-mappings x])
    (defn merge-classes-time-before-dep [model]
    "take in [[{modelcounts} {totalcounts}] [{} {}]]"
    ;;starting with 2 because 1 is the first slot, which is time before departure for the metrics
    (let [model-merger {2 bucket-eq-classes
    3 bucket-eq-classes
    6 bucket-eq-classes}
    count-merger {2 bucket-eq-classes
    3 bucket-eq-classes}]
    (map (fn [[modelcnts totalcnts]] [(merge-equivalence-classes model-merger modelcnts)
    (merge-equivalence-classes count-merger totalcnts)]) model)))
  20. merge-levels
    fn ([class-mappings coll])
  21. model-from-maps
    fn ([prob-map])
    creates a model from probability report maps.
  22. most-likely
    fn ([m])
    computes the most likely class from a map of classes to class probability.
    usage:
    (most-likely {:a 0.6 :b 0.4}) -> :a
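    A minimal sketch of this lookup in plain Clojure, assuming a non-empty map; most-likely-sketch is a hypothetical name, and how the library breaks ties is not specified here.

```clojure
;; Sketch: pick the key whose probability is largest.
(defn most-likely-sketch [m]
  (key (apply max-key val m)))

(most-likely-sketch {:a 0.6 :b 0.4}) ; => :a
```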
  23. n-times-k-fold-cross-validation-confusion-matrix
    fn ([list-of-lists])
  24. numerical-classifiers
    fn ([ranges])
    makes a bucketing classifier out of each range for use with |each.
  25. percent-of-total-predictions-by-time
    fn ([counts])
  26. precision
    fn ([m])
  27. prob-map-tuples-by-time
    fn ([prob-map-tuple])
  28. probs-only
    fn ([k a b] [a b])
    compute probability from computed counts.
    this is division, you have to count up the proper numerator and denominator in your counting step.
  29. process-prob-map
    fn ([[a-and-b b] report])
    process probability maps using a provided report function.
    beware: can't pass keys to reporter or you get a double-nested final level in the map.
  30. recall
    fn ([m])
  31. recall-by-time
    fn ([confustion-matrix])
  32. vectorize
    fn ([maps])
  33. wrapper-classifiers
    fn ([features])
    makes a wrapping classifier like categorical but with a wrapper fn.
[ - ] incanter.core
  1. abs
    fn ([A])
    Returns the absolute value of the elements in the given matrix, sequence or number.
    Equivalent to R's abs function.
  2. acos
    fn ([A])
    Returns the arc cosine of the elements in the given matrix, sequence or number.
    Equivalent to R's acos function.
  3. asin
    fn ([A])
    Returns the arc sine of the elements in the given matrix, sequence or number.
    Equivalent to R's asin function.
  4. atan
    fn ([A])
    Returns the arc tangent of the elements in the given matrix, sequence or number.
    Equivalent to R's atan function.
  5. atan2
    fn ([& args])
    Returns the atan2 of the elements in the given matrices, sequences or numbers.
    Equivalent to R's atan2 function.
  6. beta
    fn ([a b])
    Equivalent to R's beta function.
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/stat/tdouble/Gamma.html
  7. bind-columns
    fn ([& args])
    Returns the matrix resulting from concatenating the given matrices
    and/or sequences by their columns. Equivalent to R's cbind.
    Examples:
    (def A (matrix [[1 2 3]
    [4 5 6]
    [7 8 9]]))
    (def B (matrix [10 11 12]))
    (bind-columns A B)
    (bind-columns [1 2 3 4] [5 6 7 8])
  8. bind-rows
    fn ([& args])
    Returns the matrix resulting from concatenating the given matrices
    and/or sequences by their rows. Equivalent to R's rbind.
    Examples:
    (def A (matrix [[1 2 3]
    [4 5 6]
    [7 8 9]]))
    (def B (matrix [[10 11 12]
    [13 14 15]]))
    (bind-rows A B)
    (bind-rows [1 2 3 4] [5 6 7 8])
  9. categorical-var
    fn ([& args])
    Returns a categorical variable based on the values in the given collection.
    Equivalent to R's factor function.
    Options:
    :data (default nil) factors will be extracted from the given data.
    :ordered? (default false) indicates that the variable is ordinal.
    :labels (default (sort (into #{} data)))
    :levels (range (count labels))
    Examples:
    (categorical-var :data [:a :a :c :b :a :c :c])
    (categorical-var :labels [:a :b :c])
    (categorical-var :labels [:a :b :c] :levels [10 20 30])
    (categorical-var :levels [1 2 3])
  10. choose
    fn ([n k])
    Returns number of k-combinations (each of size k) from a set S with
    n elements (size n), which is the binomial coefficient (also known
    as the 'choose function') [wikipedia]
    choose = n!/(k!(n - k)!)
    Equivalent to R's choose function.
    Examples:
    (choose 25 6) ; => 177,100
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/math/tdouble/DoubleArithmetic.html
    http://en.wikipedia.org/wiki/Combination
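    The formula n!/(k!(n - k)!) can be evaluated without computing full factorials. The sketch below uses the multiplicative form in plain Clojure and assumes integers with 0 <= k <= n; choose-sketch is a hypothetical name, and the library itself delegates to Parallel Colt rather than to code like this.

```clojure
;; Sketch: binomial coefficient via C(n, i+1) = C(n, i) * (n - i) / (i + 1).
;; Every intermediate value is itself a binomial coefficient, so the
;; division is always exact. *' promotes to BigInt instead of overflowing.
(defn choose-sketch [n k]
  (let [k (min k (- n k))]
    (reduce (fn [acc i] (/ (*' acc (- n i)) (inc i)))
            1
            (range k))))

(choose-sketch 5 2)  ; => 10
(choose-sketch 52 5) ; => 2598960
```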
  11. condition
    fn ([mat])
    Returns the two norm condition number, which is max(S) / min(S), where S is the diagonal matrix of singular values from an SVD decomposition.
    Examples:
    (use 'incanter.core)
    (def foo (matrix (range 9) 3))
    (condition foo)
    References:
    http://en.wikipedia.org/wiki/Condition_number
    http://incanter.org/docs/parallelcolt/api/cern/colt/matrix/tdouble/algo/decomposition/DoubleSingularValueDecompositionDC.html
  12. copy
    fn ([mat])
    Returns a copy of the given matrix.
  13. cos
    fn ([A])
    Returns the cosine of the elements in the given matrix, sequence or number.
    Equivalent to R's cos function.
  14. cumulative-sum
    fn ([coll])
    Returns a sequence of cumulative sums for the given collection. For instance,
    the first value equals the first value of the argument, the second value is
    the sum of the first two values, the third is the sum of the first three
    values, etc.
    Examples:
    (use 'incanter.core)
    (cumulative-sum (range 100))
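    For comparison, the same running totals can be produced with clojure.core alone. This is a sketch of the idea, not the library's implementation; note that reductions returns (0) for an empty collection, which may differ from what cumulative-sum does.

```clojure
;; Sketch: running totals with clojure.core/reductions.
(defn cumulative-sum-sketch [coll]
  (reductions + coll))

(cumulative-sum-sketch [1 2 3 4]) ; => (1 3 6 10)
```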
  15. dataset
    fn ([column-names & data])
    Returns a map of type ::dataset constructed from the given column-names and
    data. The data is a sequence of sequences.
  16. dataset?
    fn ([obj])
    Determines if obj is of type ::dataset.
  17. decomp-cholesky
    fn ([mat])
    Returns the Cholesky decomposition of the given matrix. Equivalent to R's
    chol function.
    Returns:
    a matrix of the triangular factor (note: the result from
    cern.colt.matrix.linalg.CholeskyDecomposition is transposed so
    that it matches the result returned by R's chol function).
    Examples:
    (use '(incanter core stats charts datasets))
    ;; load the iris dataset
    (def iris (to-matrix (get-dataset :iris)))
    ;; take the Cholesky decomposition of the correlation matrix of the iris data.
    (decomp-cholesky (correlation iris))
    References:
    http://incanter.org/docs/parallelcolt/api/cern/colt/matrix/tdouble/algo/decomposition/DoubleCholeskyDecomposition.html
    http://en.wikipedia.org/wiki/Cholesky_decomposition
  18. decomp-eigenvalue
    fn ([mat])
    Returns the Eigenvalue Decomposition of the given matrix. Equivalent to R's eigen function.
    Returns:
    a map containing:
    :values -- vector of eigenvalues
    :vectors -- the matrix of eigenvectors
    Examples:
    (use 'incanter.core)
    (def foo (matrix (range 9) 3))
    (decomp-eigenvalue foo)
    References:
    http://en.wikipedia.org/wiki/Eigenvalue_decomposition
    http://incanter.org/docs/parallelcolt/api/cern/colt/matrix/tdouble/algo/decomposition/DoubleEigenvalueDecomposition.html
  19. decomp-lu
    fn ([mat])
    Returns the LU decomposition of the given matrix.
    Examples:
    (use 'incanter.core)
    (def foo (matrix (range 9) 3))
    (decomp-lu foo)
    Returns:
    a map containing:
    :L -- the lower triangular factor
    :U -- the upper triangular factor
    References:
    http://en.wikipedia.org/wiki/LU_decomposition
    http://incanter.org/docs/parallelcolt/api/cern/colt/matrix/tdouble/algo/decomposition/DoubleLUDecomposition.html
  20. decomp-qr
    fn ([mat])
    Returns the QR decomposition of the given matrix. Equivalent to R's qr function.
    Examples:
    (use 'incanter.core)
    (def foo (matrix (range 9) 3))
    (decomp-qr foo)
    Returns:
    a map containing:
    :Q -- orthogonal factor
    :R -- the upper triangular factor
    References:
    http://en.wikipedia.org/wiki/QR_decomposition
    http://incanter.org/docs/parallelcolt/api/cern/colt/matrix/tdouble/algo/decomposition/DoubleQRDecomposition.html
  21. decomp-svd
    fn ([mat])
    Returns the Singular Value Decomposition (SVD) of the given matrix. Equivalent to
    R's svd function.
    Returns:
    a map containing:
    :S -- the diagonal matrix of singular values
    :U -- the left singular vectors U
    :V -- the right singular vectors V
    Examples:
    (use 'incanter.core)
    (def foo (matrix (range 9) 3))
    (decomp-svd foo)
    References:
    http://en.wikipedia.org/wiki/Singular_value_decomposition
    http://incanter.org/docs/parallelcolt/api/cern/colt/matrix/tdouble/algo/decomposition/DoubleSingularValueDecompositionDC.html
  22. det
    fn ([mat])
    Returns the determinant of the given matrix using LU decomposition. Equivalent
    to R's det function.
    References:
    http://en.wikipedia.org/wiki/LU_decomposition
    http://incanter.org/docs/parallelcolt/api/cern/colt/matrix/tdouble/algo/decomposition/DoubleLUDecomposition.html
  23. diag
    fn ([m])
    If given a matrix, diag returns a sequence of its diagonal elements.
    If given a sequence, it returns a matrix with the sequence's elements
    on its diagonal. Equivalent to R's diag function.
    Examples:
    (diag [1 2 3 4])
    (def A (matrix [[1 2 3]
    [4 5 6]
    [7 8 9]]))
    (diag A)
  24. dim
    fn ([mat])
    Returns a vector with the number of rows and columns of the given matrix.
  25. div
    fn ([& args])
    Performs element-by-element division on multiple matrices, sequences,
    and/or numbers. Equivalent to R's / operator.
    Examples:
    (def A (matrix [[1 2 3]
    [4 5 6]
    [7 8 9]]))
    (div A A A)
    (div A 2)
    (div 2 A)
    (div [1 2 3] [1 2 3])
    (div [1 2 3] 2)
    (div 2 [1 2 3])
    (div [1 2 3]) ; returns [1 1/2 1/3]
  26. exp
    fn ([A])
    Returns the exponential of the elements in the given matrix, sequence or number.
    Equivalent to R's exp function.
  27. factorial
    fn ([k])
    Returns the factorial of k (k must be a positive integer). Equivalent to R's
    factorial function.
    Examples:
    (factorial 6)
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/math/tdouble/DoubleArithmetic.html
    http://en.wikipedia.org/wiki/Factorial
  28. gamma
    fn ([x])
    Equivalent to R's gamma function.
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/stat/tdouble/Gamma.html
  29. group-by
    fn ([mat on-cols & options])
    Groups the given matrix by the values in the columns indicated by the
    'on-cols' argument, returning a sequence of matrices. The returned
    matrices are sorted by the value of the group column ONLY when there
    is only a single (non-vector) on-col argument.
    Examples:
    (use '(incanter core datasets))
    (def plant-growth (to-matrix (get-dataset :plant-growth)))
    (group-by plant-growth 1)
    ;; only return the first column
    (group-by plant-growth 1 :cols 0)
    ;; don't return the second column
    (group-by plant-growth 1 :except-cols 1)
    (def plant-growth-dummies (to-matrix (get-dataset :plant-growth) :dummies true))
    (group-by plant-growth-dummies [1 2])
    ;; return only the first column
    (group-by plant-growth-dummies [1 2] :cols 0)
    ;; don't return the last two columns
    (group-by plant-growth-dummies [1 2] :except-cols [1 2])
    ;; plot the plant groups
    (use 'incanter.charts)
    ;; can use destructuring if you know the number of groups,
    ;; groups are sorted only if the group is based on a single column value
    (let [[ctrl trt1 trt2] (group-by plant-growth 1 :cols 0)]
    (doto (box-plot ctrl)
    (add-box-plot trt1)
    (add-box-plot trt2)
    view))
  30. half-vectorize
    fn ([mat])
    Returns the half-vectorization (i.e. vech) of the given matrix.
    The half-vectorization, vech(A), of a symmetric nxn matrix A
    is the n(n+1)/2 x 1 column vector obtained by vectorizing only
    the upper triangular part of A.
    For instance:
    (= (half-vectorize (matrix [[a b] [b d]])) (matrix [a b d]))
    Examples:
    (def A (matrix [[1 2] [2 4]]))
    (half-vectorize A)
    References:
    http://en.wikipedia.org/wiki/Vectorization_(mathematics)
  31. identity-matrix
    fn ([n])
    Returns an n-by-n identity matrix.
    Examples:
    (identity-matrix 4)
  32. incomplete-beta
    fn ([x a b])
    Returns the non-regularized incomplete beta value.
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/stat/tdouble/Gamma.html
  33. kronecker
    fn ([& args])
    Returns the Kronecker product of the given arguments.
    Examples:
    (def x (matrix (range 6) 2))
    (def y (matrix (range 4) 2))
    (kronecker 4 x)
    (kronecker x 4)
    (kronecker x y)
  34. length
    fn ([coll])
    A version of count that works on collections, matrices, and numbers.
    The length of a number is one, the length of a collection is its count,
    and the length of a matrix is the number of elements it contains (nrow*ncol).
    Equivalent to R's length function.
  35. log
    fn ([A])
    Returns the natural log of the elements in the given matrix, sequence or number.
    Equivalent to R's log function.
  36. log10
    fn ([A])
    Returns the log base 10 of the elements in the given matrix, sequence or number.
    Equivalent to R's log10 function.
  37. log2
    fn ([A])
    Returns the log base 2 of the elements in the given matrix, sequence or number.
    Equivalent to R's log2 function.
  38. matrix
    fn ([data] [data ncol] [init-val rows cols])
    Returns an instance of an incanter.Matrix, which is an extension of
    cern.colt.matrix.tdouble.impl.DenseColDoubleMatrix2D that implements the Clojure
    interface clojure.lang.ISeq. Therefore Clojure sequence operations can
    be applied to matrices. A matrix consists of a sequence of rows, where
    each row is a one-dimensional row matrix. One-dimensional matrices are,
    in turn, sequences of numbers. Equivalent to R's matrix function.
    Examples:
    (def A (matrix [[1 2 3] [4 5 6] [7 8 9]])) ; produces a 3x3 matrix
    (def A2 (matrix [1 2 3 4 5 6 7 8 9] 3)) ; produces the same 3x3 matrix
    (def B (matrix [1 2 3 4 5 6 7 8 9])) ; produces a 9x1 column vector
    (first A) ; produces a row matrix [1 2 3]
    (rest A) ; produces a sub matrix [[4 5 6] [7 8 9]]
    (first (first A)) ; produces 1.0
    (rest (first A)) ; produces a row matrix [2 3]
    ; since (plus row1 row2) adds the two rows element-by-element,
    (reduce plus A) ; produces the sums of the columns,
    ; and since (sum row1) sums the elements of the row,
    (map sum A) ; produces the sums of the rows,
    ; you can filter the rows using Clojure's filter function
    (filter #(> (nth % 1) 4) A) ; returns the rows where the second column is greater than 4.
    References:
    http://incanter.org/docs/parallelcolt/api/cern/colt/matrix/tdouble/DoubleMatrix2D.html
  39. matrix?
    fn ([obj])
    Tests whether obj is derived from incanter.Matrix.
  40. minus
    fn ([& args])
    Performs element-by-element subtraction on multiple matrices, sequences,
    and/or numbers. If only a single argument is provided, returns the
    negative of the given matrix, sequence, or number. Equivalent to R's - operator.
    Examples:
    (def A (matrix [[1 2 3]
    [4 5 6]
    [7 8 9]]))
    (minus A)
    (minus A A A)
    (minus A 2)
    (minus 2 A)
    (minus [1 2 3] [1 2 3])
    (minus [1 2 3] 2)
    (minus 2 [1 2 3])
    (minus [1 2 3])
  41. mmult
    fn ([& args])
    Returns the matrix resulting from the matrix multiplication of
    the given arguments. Equivalent to R's %*% operator.
    Examples:
    (def A (matrix [[1 2 3]
    [4 5 6]
    [7 8 9]]))
    (mmult A (trans A))
    (mmult A (trans A) A)
    References:
    http://en.wikipedia.org/wiki/Matrix_multiplication
    http://incanter.org/docs/parallelcolt/api/cern/colt/matrix/tdouble/DoubleMatrix2D.html
  42. mult
    fn ([& args])
    Performs element-by-element multiplication on multiple matrices, sequences,
    and/or numbers. Equivalent to R's * operator.
    Examples:
    (def A (matrix [[1 2 3]
    [4 5 6]
    [7 8 9]]))
    (mult A A A)
    (mult A 2)
    (mult 2 A)
    (mult [1 2 3] [1 2 3])
    (mult [1 2 3] 2)
    (mult 2 [1 2 3])
  43. ncol
    fn ([mat])
    Returns the number of columns in the given matrix. Equivalent to R's ncol function.
  44. nrow
    fn ([mat])
    Returns the number of rows in the given matrix. Equivalent to R's nrow function.
  45. plus
    fn ([& args])
    Performs element-by-element addition on multiple matrices, sequences,
    and/or numbers. Equivalent to R's + operator.
    Examples:
    (def A (matrix [[1 2 3]
    [4 5 6]
    [7 8 9]]))
    (plus A A A)
    (plus A 2)
    (plus 2 A)
    (plus [1 2 3] [1 2 3])
    (plus [1 2 3] 2)
    (plus 2 [1 2 3])
  46. pow
    fn ([& args])
    This is an element-by-element exponent function, raising the first argument
    to the exponents in the remaining arguments. Equivalent to R's ^ operator.
  47. prod
    fn ([x])
    Returns the product of the given sequence.
  48. quit
    fn ([])
    Exits the Clojure shell.
  49. rank
    fn ([mat])
    Returns the effective numerical matrix rank, which is the number of nonnegligible singular values.
    Examples:
    (use 'incanter.core)
    (def foo (matrix (range 9) 3))
    (rank foo)
    References:
    http://en.wikipedia.org/wiki/Matrix_rank
    http://incanter.org/docs/parallelcolt/api/cern/colt/matrix/tdouble/algo/decomposition/DoubleSingularValueDecompositionDC.html
  50. regularized-beta
    fn ([x a b])
    Returns the regularized incomplete beta value. Equivalent to R's pbeta function.
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/stat/tdouble/Gamma.html
    http://en.wikipedia.org/wiki/Regularized_incomplete_beta_function
    http://mathworld.wolfram.com/RegularizedBetaFunction.html
  51. save
    multi
    Save is a multi-function that is used to write matrices, datasets and
    charts (in png format) to a file.
    Arguments:
    obj -- is a matrix, dataset, or chart object
    filename -- the filename to create.
    Matrix and dataset options:
    :delim (default \,) column delimiter
    :header (default nil) a sequence of strings to be used as header line,
    for matrices the default value is nil, for datasets, the default is
    the dataset's column-names array.
    :append (default false) determines whether this given file should be
    appended to. If true, a header will not be written to the file again.
    Chart options:
    :width (default 500)
    :height (default 400)
    Matrix Examples:
    (use '(incanter core io))
    (def A (matrix (range 12) 3)) ; creates a 4x3 matrix
    (save A "A.dat") ; writes A to the file A.dat, with no header and comma delimited
    (save A "A.dat" :delim \tab) ; writes A to the file A.dat, with no header and tab delimited
    ;; writes A to the file A.dat, with a header and comma delimited
    (save A "A.dat" :delim \, :header ["col1" "col2" "col3"])
    Dataset Example:
    (use '(incanter core io datasets))
    ;; read the iris sample dataset, and save it to a file.
    (def iris (get-dataset :iris))
    (save iris "iris.dat")
    Chart Example:
    (use '(incanter core io stats charts))
    (save (histogram (sample-normal 1000)) "hist.png")
    ;; chart example using java.io.OutputStream instead of filename
    (use '(incanter core stats charts))
    (import 'java.io.FileOutputStream)
    (def fos (FileOutputStream. "/tmp/hist.png"))
    (def hist (histogram (sample-normal 1000)))
    (save hist fos)
    (.close fos)
    (view "file:///tmp/hist.png")
  52. sel
    multi
    Returns an element or subset of the given matrix, or dataset.
    Argument:
    a matrix object or dataset.
    Options:
    :rows (default true)
    returns all rows by default, can pass a row index or sequence of row indices
    :cols (default true)
    returns all columns by default, can pass a column index or sequence of column indices
    :except-rows (default nil) can pass a row index or sequence of row indices to exclude
    :except-cols (default nil) can pass a column index or sequence of column indices to exclude
    :filter (default nil)
    a function can be provided to filter the rows of the matrix
    Examples:
    (use 'incanter.datasets)
    (def iris (to-matrix (get-dataset :iris)))
    (sel iris 0 0) ; first element
    (sel iris :rows 0 :cols 0) ; also first element
    (sel iris :cols 0) ; first column of all rows
    (sel iris :cols [0 2]) ; first and third column of all rows
    (sel iris :rows (range 10) :cols (range 2)) ; first two columns of the first 10 rows
    (sel iris :rows (range 10)) ; all columns of the first 10 rows
    ;; exclude rows or columns
    (sel iris :except-rows (range 10)) ; all columns of all but the first 10 rows
    (sel iris :except-cols 1) ; all columns except the second
    ;; of the first 10 rows, return those whose first column has an even integer part
    (sel iris :rows (range 10) :filter #(even? (int (nth % 0))))
    ;; select rows where the third column is greater than 4
    (sel iris :filter #(> (nth % 2) 4))
    ;; examples with datasets
    (use 'incanter.datasets)
    (def us-arrests (get-dataset :us-arrests))
    (sel us-arrests :cols "State")
    (sel us-arrests :cols ["State" "Murder"])
  53. sin
    fn ([A])
    Returns the sine of the elements in the given matrix, sequence or number.
    Equivalent to R's sin function.
  54. solve
    fn ([A & B])
    Returns a matrix solution if A is square, least squares solution otherwise.
    Equivalent to R's solve function.
    Examples:
    (solve (matrix [[2 0 0] [0 2 0] [0 0 2]]))
    References:
    http://en.wikipedia.org/wiki/Matrix_inverse
  55. solve-quadratic
    fn ([a b c])
    Returns a vector with the solutions for x of the quadratic
    equation a*x^2 + b*x + c = 0.
    Arguments:
    a, b, c: coefficients of a quadratic equation.
    Examples:
    ;; -2*x^2 + 7*x + 15
    (solve-quadratic -2 7 15)
    ;; x^2 + -2*x + 1
    (solve-quadratic 1 -2 1)
    References:
    http://en.wikipedia.org/wiki/Quadratic_formula
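    The referenced quadratic formula, x = (-b ± sqrt(b^2 - 4ac)) / 2a, can be sketched in plain Clojure as below. This assumes a is non-zero and the discriminant is non-negative (a negative discriminant yields NaN here); quadratic-roots is a hypothetical name, and the order in which the library returns the roots is not specified in this entry.

```clojure
;; Sketch: both roots of a*x^2 + b*x + c = 0 via the quadratic formula.
(defn quadratic-roots [a b c]
  (let [disc (- (* b b) (* 4 a c))   ; discriminant b^2 - 4ac
        root (Math/sqrt disc)]
    [(/ (+ (- b) root) (* 2 a))
     (/ (- (- b) root) (* 2 a))]))

(quadratic-roots -2 7 15) ; => [-1.5 5.0]
(quadratic-roots 1 -2 1)  ; => [1.0 1.0]
```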
  56. sq
    fn ([A])
    Returns the square of the elements in the given matrix, sequence or number.
    Equivalent to R's sq function.
  57. sqrt
    fn ([A])
    Returns the square-root of the elements in the given matrix, sequence or number.
    Equivalent to R's sqrt function.
  58. sum
    fn ([x])
    Returns the sum of the given sequence.
  59. sum-of-squares
    fn ([x])
    Returns the sum-of-squares of the given sequence.
  60. symmetric-matrix
    fn ([data & options])
    Returns a symmetric matrix from the given data, which represents the lower triangular elements
    ordered by row. This is not the inverse of half-vectorize which returns a vector of the upper-triangular
    values, unless the :lower option is set to false.
    Options:
    :lower (default true) -- lower-triangular. Set :lower to false to reverse the half-vectorize function.
    Examples:
    (use 'incanter.core)
    (symmetric-matrix [1
    2 3
    4 5 6
    7 8 9 10])
    (half-vectorize
    (symmetric-matrix [1
    2 3
    4 5 6
    7 8 9 10] :lower false))
  61. tan
    fn ([A])
    Returns the tangent of the elements in the given matrix, sequence or number.
    Equivalent to R's tan function.
  62. to-dummies
    fn ([coll])
  63. to-labels
    fn ([coll cat-var])
  64. to-levels
    fn ([coll & options])
  65. to-list
    fn ([mat])
    Returns a list-of-lists if the given matrix is two-dimensional,
    and a flat list if the matrix is one-dimensional.
  66. to-matrix
    fn ([dataset & options])
    Converts a dataset into a matrix. Equivalent to R's as.matrix function
    for datasets.
    Options:
    :dummies (default false) -- if true converts non-numeric variables into sets
    of binary dummy variables, otherwise converts
    them into numeric codes.
  67. to-vect
    fn ([mat])
    Returns a vector-of-vectors if the given matrix is two-dimensional,
    and a flat vector if the matrix is one-dimensional. This is a bit
    slower than the to-list function.
  68. trace
    fn ([mat])
    Returns the trace of the given matrix.
    References:
    http://en.wikipedia.org/wiki/Matrix_trace
    http://incanter.org/docs/parallelcolt/api/cern/colt/matrix/tdouble/algo/DoubleAlgebra.html
  69. trans
    fn ([mat])
    Returns the transpose of the given matrix. Equivalent to R's t function.
    Examples:
    (def A (matrix [[1 2 3]
    [4 5 6]
    [7 8 9]]))
    (trans A)
  70. vectorize
    fn ([mat])
    Returns the vectorization (i.e. vec) of the given matrix.
    The vectorization of an m-by-n matrix A, denoted by vec(A),
    is the m*n-by-1 column vector obtained by stacking the columns
    of the matrix A on top of one another.
    For instance:
    (= (vectorize (matrix [[a b] [c d]])) (matrix [a c b d]))
    Examples:
    (def A (matrix [[1 2] [3 4]]))
    (vectorize A)
    References:
    http://en.wikipedia.org/wiki/Vectorization_(mathematics)
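    Column stacking can be illustrated on plain nested vectors, without an Incanter matrix. The sketch below assumes a rectangular vector of rows and returns a flat Clojure vector rather than a column matrix; vec-sketch is a hypothetical name.

```clojure
;; Sketch: vec(A) on a vector of rows. mapcat over the rows in parallel
;; visits column 0 top to bottom, then column 1, and so on.
(defn vec-sketch [rows]
  (vec (apply mapcat vector rows)))

(vec-sketch [[1 2] [3 4]]) ; => [1 3 2 4]
```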
  71. view
    multi
    This is a general 'view' function. When given an Incanter matrix/dataset
    or a Clojure numeric collection, it will display it in a Java Swing
    JTable. When given an Incanter chart object, it will display it in a new
    window. When given a URL string, it will open the location with the
    platform's default web browser.
    Examples:
    (use '(incanter core stats datasets charts))
    ;; view matrices
    (def rand-mat (matrix (sample-normal 100) 4))
    (view rand-mat)
    ;; view numeric collections
    (view [1 2 3 4 5])
    (view (sample-normal 100))
    ;; view Incanter datasets
    (view (get-dataset :iris))
    ;; convert dataset to matrix, changing Species names to numeric codes
    (view (to-matrix (get-dataset :iris)))
    ;; convert dataset to matrix, changing Species names to dummy variables
    (view (to-matrix (get-dataset :iris) :dummies true))
    ;; view a chart
    (view (histogram (sample-normal 1000)))
    ;; view a URL
    (view "http://incanter.org")
    ;; view a PNG file
    (save (histogram (sample-normal 1000)) "/tmp/norm_hist.png")
    (view "file:///tmp/norm_hist.png")
[ - ] incanter.datasets
  1. **datasets**
    var
  2. get-dataset
    fn ([dataset-key & options])
    Returns the sample dataset associated with the given key. Most datasets
    are from R's sample data sets, as are the descriptions below.
    Options:
    :incanter-home -- if the incanter.home property is not set when the JVM is
    started, use the :incanter-home option to provide the
    parent directory of the sample data directory.
    Datasets:
    :iris -- the Fisher's or Anderson's Iris data set gives the
    measurements in centimeters of the variables sepal
    length and width and petal length and width,
    respectively, for 50 flowers from each of 3 species
    of iris.
    :cars -- The data give the speed of cars and the distances taken
    to stop. Note that the data were recorded in the 1920s.
    :survey -- survey data used in Scott Lynch's 'Introduction to Applied Bayesian Statistics
    and Estimation for Social Scientists'
    :us-arrests -- This data set contains statistics, in arrests per 100,000
    residents for assault, murder, and rape in each of the 50 US
    states in 1973. Also given is the percent of the population living
    in urban areas.
    :flow-meter -- flow meter data used in Bland Altman Lancet paper.
    :co2 -- has 84 rows and 5 columns of data from an experiment on the cold tolerance
    of the grass species _Echinochloa crus-galli_.
    :chick-weight -- has 578 rows and 4 columns from an experiment on the effect of diet
    on early growth of chicks.
    :plant-growth -- Results from an experiment to compare yields (as measured by dried
    weight of plants) obtained under a control and two different
    treatment conditions.
    :pontius -- These data are from a NIST study involving calibration of load cells.
    The response variable (y) is the deflection and the predictor variable
    (x) is load.
    See http://www.itl.nist.gov/div898/strd/lls/data/Pontius.shtml
    :filip -- NIST data set for linear regression certification,
    see http://www.itl.nist.gov/div898/strd/lls/data/Filip.shtml
    :longely -- This classic dataset of labor statistics was one of the first used to
    test the accuracy of least squares computations. The response variable
    (y) is the Total Derived Employment and the predictor variables are GNP
    Implicit Price Deflator with Year 1954 = 100 (x1), Gross National Product
    (x2), Unemployment (x3), Size of Armed Forces (x4), Non-Institutional
    Population Age 14 & Over (x5), and Year (x6).
    See http://www.itl.nist.gov/div898/strd/lls/data/Longley.shtml
    :Chwirut -- These data are the result of a NIST study involving ultrasonic calibration.
    The response variable is ultrasonic response, and the predictor variable is
    metal distance.
    See http://www.itl.nist.gov/div898/strd/nls/data/LINKS/DATA/Chwirut1.dat
    :thurstone -- test data for non-linear least squares.
    :austres -- Quarterly Time Series of the Number of Australian Residents
    :hair-eye-color -- Hair and eye color of sample of students
    :airline-passengers -- Monthly Airline Passenger Numbers 1949-1960
    :math-prog -- Pass/fail results for a high school mathematics assessment test
    and a freshmen college programming course.
    :iran-election -- Vote counts for 30 provinces from the 2009 Iranian election.
[ - ] incanter.incremental-stats
  1. a-state
    var
  2. mean-state
    fn ([x val queue])
  3. mean-state-2
    fn ([x val queue])
  4. tuplize-apply
    fn ([f])
  5. update-mean
    fn ([x] [x & xs])
  6. update-sample
    fn ([x])
[ - ] incanter.information-theory
  1. entropy
    fn ([counts])
    takes a map of class label to counts
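    As an illustration of the input shape, the sketch below computes Shannon entropy from a map of class label to counts in plain Clojure. It assumes base-2 logarithms (bits) and skips zero counts; entropy-sketch and log2 are names defined for this sketch, and the library's choice of log base is an assumption here.

```clojure
(defn log2 [x] (/ (Math/log x) (Math/log 2)))

;; Sketch: H = -sum(p * log2 p), with p = count / total.
(defn entropy-sketch [counts]
  (let [total (reduce + (vals counts))]
    (- (reduce + (for [c (vals counts)
                       :when (pos? c)
                       :let [p (double (/ c total))]]
                   (* p (log2 p)))))))

(entropy-sketch {:positive 5 :negative 5}) ; approximately 1.0 (one bit)
```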
  2. gain
    fn ([data])
    computes information gain from count matrix of feature class labels to predicted class labels.
    example: (gain {:weak {:positive 6 :negative 2}
    :strong {:positive 3 :negative 3}})
  3. gini-impurity
    fn ([counts])
    Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly labelled if it were randomly labelled according to the distribution of labels in the subset. Gini impurity can be computed by summing the probability of each item being chosen times the probability of a mistake in categorizing that item. It reaches its minimum (zero) when all cases in the node fall into a single target category.
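    The computation described above amounts to summing p * (1 - p) over the labels. A minimal sketch in plain Clojure, assuming a non-empty map of label to counts with a positive total; gini-sketch is a hypothetical name, not the library's implementation.

```clojure
;; Sketch: sum over labels of (probability of being chosen)
;; times (probability of being mislabelled).
(defn gini-sketch [counts]
  (let [total (double (reduce + (vals counts)))]
    (reduce + (for [c (vals counts)
                    :let [p (/ c total)]]
                (* p (- 1.0 p))))))

(gini-sketch {:a 5 :b 5}) ; => 0.5
(gini-sketch {:a 10})     ; => 0.0
```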
  4. kl-divergence
    fn ([p-counts q-counts])
    http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
    In probability theory and information theory, the Kullback–Leibler divergence (also information divergence, information gain, or relative entropy) is a non-symmetric measure of the difference between two probability distributions P and Q. KL measures the expected number of extra bits required to code samples from P when using a code based on Q, rather than using a code based on P. Typically P represents the true distribution of data, observations, or a precise calculated theoretical distribution. The measure Q typically represents a theory, model, description, or approximation of P.
    takes a map of class label to counts.
    note the (> p 0) predicate defines (* 0 (log2 0)) as 0 rather than NaN
    assumes you pass in distributions as nested maps and flattens them before applying the algorithm.
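    A minimal sketch of the divergence for flat count maps (the library function is described as flattening nested maps first; that step is omitted here). The names are hypothetical. The sketch assumes every label with a positive count in p also has a positive count in q.

```clojure
(defn log2 [x] (/ (Math/log x) (Math/log 2)))

(defn kl-divergence-sketch
  "KL(P || Q) in bits for two maps of class label -> count."
  [p-counts q-counts]
  (let [p-total (double (reduce + (vals p-counts)))
        q-total (double (reduce + (vals q-counts)))]
    (reduce + (for [[k pc] p-counts
                    :let [p (/ pc p-total)
                          q (/ (get q-counts k 0) q-total)]
                    :when (> p 0)]          ; defines 0 * log2(0) as 0
                (* p (log2 (/ p q)))))))

;; the measure is not symmetric: swapping the arguments changes the result
(kl-divergence-sketch {:a 1 :b 1} {:a 3 :b 1})
(kl-divergence-sketch {:a 3 :b 1} {:a 1 :b 1})
```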
  5. mutual-information
    fn ([joint individuals])
    mutual information is the Kullback-Leibler divergence between the product of the marginal distributions of two random variables, p(x)p(y), and the random variables' joint distribution, p(x,y).
    see: http://en.wikipedia.org/wiki/Mutual_information
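    A minimal sketch of this definition for two variables. Unlike the documented signature, this hypothetical mutual-information-sketch takes only the joint counts, as a nested map {x {y n}}, and derives both marginals from it.

```clojure
(defn log2 [x] (/ (Math/log x) (Math/log 2)))

(defn mutual-information-sketch
  "Mutual information in bits from joint counts of the form {x {y n}}."
  [joint]
  (let [n  (double (reduce + (mapcat vals (vals joint))))
        ;; marginal p(x): sum each inner map
        px (into {} (for [[x ys] joint] [x (/ (reduce + (vals ys)) n)]))
        ;; marginal p(y): sum across the outer keys
        py (apply merge-with +
                  (for [ys (vals joint)]
                    (into {} (for [[y c] ys] [y (/ c n)]))))]
    (reduce + (for [[x ys] joint
                    [y c]  ys
                    :when (pos? c)
                    :let [pxy (/ c n)]]
                (* pxy (log2 (/ pxy (* (px x) (py y)))))))))

;; independent variables share no information
(mutual-information-sketch {:a {:c 1 :d 1} :b {:c 1 :d 1}})
;; perfectly dependent binary variables share one bit
(mutual-information-sketch {:a {:c 1 :d 0} :b {:c 0 :d 1}})
```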
  6. total-correlation
    fn ([joint individuals])
    total correlation is a multivariate generalization of mutual information. it is the Kullback-Leibler divergence between the joint distribution of a set and its maximum entropy product approximation.
    see: http://en.wikipedia.org/wiki/Total_correlation
[ - ] incanter.io
  1. clj-to-json-file
    fn ([c f])
  2. columns
    fn ([& x])
  3. csv-line
    fn ([v])
  4. csv-table
    fn ([m])
  5. into-file
    fn ([filename stuff])
  6. json-from-classpath
    fn ([f])
  7. load-resource
    fn ([f])
  8. package-model
    fn ([file prob-map-tuple])
  9. query
    fn ([table sample & columns])
  10. random-row
    var
  11. read-dataset
    fn ([filename & options])
    Returns a dataset read from a file or a URL.
    Options:
    :delim (default \,), other options (\tab \space \| etc)
    :quote (default \") character used for quoting strings
    :skip (default 0) the number of lines to skip at the top of the file.
    :header (default false) indicates the file has a header line
  12. read-from-classpath
    fn ([f])
  13. read-json-file
    fn ([f])
  14. read-json-lines
    fn ([f])
  15. read-map
    fn ([& keys])
  16. report-model
    fn
  17. sql-from
    fn
  18. sql-limit
    fn
  19. sql-order-by
    fn ([c])
  20. sql-query
    fn ([d q])
  21. sql-select
    fn ([& x])
  22. sql-unique
    fn
  23. sql-where
    fn ([pred])
  24. string-date-read-map
    fn ([& keys])
  25. unpackage-model
    fn ([file])
  26. with-mysql-results
    fn ([dbinfo query f])
    takes dbinfo, query and a fn and applies the fn to query results.
    example dbinfo:
    {:host "localhost"
    :port 3306
    :name "testimport"
    :classname "com.mysql.jdbc.Driver"
    :subprotocol "mysql"
    :user "root"
    :password "12345"}
[ - ] incanter.optimize
  1. derivative
    fn ([f & options])
    Returns a function that approximates the derivative of the given function.
    Options:
    :dx (default 0.0001)
    Examples:
    (use '(incanter core optimize charts stats))
    (defn cube [x] (* x x x))
    (def cube-deriv (derivative cube))
    (cube-deriv 2) ; value: 12.000600010022566
    (cube-deriv 3) ; value: 27.00090001006572
    (cube-deriv 4) ; value: 48.00120000993502
    (def x (range -3 3 0.1))
    (def plot (xy-plot x (map cube x)))
    (view plot)
    (add-lines plot x (map cube-deriv x))
    ;; get the second derivative function
    (def cube-deriv2 (derivative cube-deriv))
    (add-lines plot x (map cube-deriv2 x))
    ;; plot the normal pdf and its derivatives
    (def plot (xy-plot x (pdf-normal x)))
    (view plot)
    (def pdf-deriv (derivative pdf-normal))
    (add-lines plot x (pdf-deriv x))
    ;; plot the second derivative function
    (def pdf-deriv2 (derivative pdf-deriv))
    (add-lines plot x (pdf-deriv2 x))
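    The sample values shown for cube-deriv are consistent with a simple forward difference using the default :dx, although that is an inference from the numbers and not a statement about the library source. A minimal sketch under that assumption, with the hypothetical name derivative-sketch:

```clojure
(defn derivative-sketch
  "Returns a function approximating f' by the forward difference
   (f(x + dx) - f(x)) / dx."
  [f & {:keys [dx] :or {dx 0.0001}}]
  (fn [x] (/ (- (f (+ x dx)) (f x)) dx)))

(defn cube [x] (* x x x))

((derivative-sketch cube) 2)            ; close to 12, with an error of about 6e-4
((derivative-sketch cube :dx 1e-6) 2)   ; a smaller step gives a smaller error
```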
  2. gradient
    fn ([f start & options])
    Returns a function that calculates a 5-point approximation to
    the gradient of the given function. The vector of start values is
    used to determine the number of parameters required by the function, and
    to scale the step-size. The generated function accepts a vector of
    parameter values and a vector of x data points and returns a matrix,
    where each row is the gradient evaluated at the corresponding x value.
    Examples:
    (use '(incanter core optimize datasets charts))
    (defn f [theta x]
    (+ (nth theta 0)
    (div (* x (- (nth theta 1) (nth theta 0)))
    (+ (nth theta 2) x))))
    (def start [20 200 100])
    (def data (to-matrix (get-dataset :thurstone)))
    (def x (sel data :cols 1))
    (def y (sel data :cols 0))
    ;; view the data
    (view (scatter-plot x y))
    (def grad (gradient f start))
    (time (doall (grad start x)))
  3. hessian
    fn ([f start & options])
    Returns a function that calculates an approximation to the Hessian matrix
    of the given function. The vector of start values is used to determine
    the number of parameters required by the function, and to scale the
    step-size. The generated function accepts a vector of
    parameter values and a vector of x data points and returns a matrix,
    where each row has p*(p+1)/2 columns, one for each unique entry in
    the Hessian evaluated at the corresponding x value.
    Examples:
    (use '(incanter core optimize datasets charts))
    (defn f [theta x]
    (+ (nth theta 0)
    (div (* x (- (nth theta 1) (nth theta 0)))
    (+ (nth theta 2) x))))
    (def start [20 200 100])
    (def data (to-matrix (get-dataset :thurstone)))
    (def x (sel data :cols 1))
    (def y (sel data :cols 0))
    ;; view the data
    (view (scatter-plot x y))
    (time (def hess (hessian f start)))
    (time (doall (hess start x)))
  4. integrate
    fn ([f a b])
    Integrate a function f from a to b
    Examples:
    (defn f1 [x] 1)
    (defn f2 [x] (Math/pow x 2))
    (defn f3 [x] (* x (Math/exp (Math/pow x 2))))
    (integrate f1 0 5)
    (integrate f2 0 1)
    (integrate f3 0 1)
    ;; normal distribution
    (def std 1)
    (def mu 0)
    (defn normal [x]
    (/ 1
    (* (* std (Math/sqrt (* 2 Math/PI)))
    (Math/exp (/ (Math/pow (- (- x mu)) 2)
    (* 2 (Math/pow std 2)))))))
    (integrate normal 1.96 10)
    Reference:
    http://jng.imagine27.com/articles/2009-04-09-161839_integral_calculus_in_lambda_calculus_lisp.html
    http://iam.elbenshira.com/archives/151_integral-calculus-in-haskell/
  5. non-linear-model
    fn ([f y x start & options])
    Determine the nonlinear least-squares estimates of the
    parameters of a nonlinear model.
    Based on R's nls (non-linear least squares) function.
    Arguments:
    f -- model function, takes two arguments: the first a list of parameters
    that are to be estimated, and the second an x value.
    y -- sequence of dependent data
    x -- sequence of independent data
    start -- start values for the parameters to be estimated
    Options:
    :method (default :gauss-newton) other option :newton-raphson
    :tol (default 1E-5)
    :max-iter (default 200)
    Returns: a hash-map containing the following fields:
    :method -- the method used
    :coefs -- the parameter estimates
    :gradient -- the estimated gradient
    :hessian -- the estimated hessian, if available
    :iterations -- the number of iterations performed
    :fitted -- the fitted values of y (i.e. y-hat)
    :rss -- the residual sum-of-squares
    :x -- the independent data values
    :y -- the dependent data values
    Examples:
    ;; example 1
    (use '(incanter core optimize datasets charts))
    ;; define the Michaelis-Menten model function
    ;; y = a + (b - a)*x/(c + x)
    (defn f [theta x]
    (let [[a b c] theta]
    (plus a (div (mult x (minus b a)) (plus c x)))))
    (def start [20 200 100])
    (def data (to-matrix (get-dataset :thurstone)))
    (def x (sel data :cols 1))
    (def y (sel data :cols 0))
    ;; view the data
    (def plot (scatter-plot x y))
    (view plot)
    (def nlm (non-linear-model f y x start))
    (add-lines plot x (:fitted nlm))
    ;; example 2
    (use '(incanter core optimize datasets charts))
    ;; Chwirut data set from NIST
    ;; http://www.itl.nist.gov/div898/strd/nls/data/LINKS/DATA/Chwirut1.dat
    (def data (to-matrix (get-dataset :chwirut)))
    (def x (sel data :cols 1))
    (def y (sel data :cols 0))
    ;; define model function: y = exp(-b1*x)/(b2+b3*x) + e
    (defn f [theta x]
    (let [[b1 b2 b3] theta]
    (div (exp (mult (minus b1) x)) (plus b2 (mult b3 x)))))
    (def plot (scatter-plot x y :legend true))
    (view plot)
    ;; the newton-raphson algorithm fails to converge to the correct solution
    ;; using first set of start values from NIST, but the default gauss-newton
    ;; algorithm converges to the correct solution.
    (def start1 [0.1 0.01 0.02])
    (add-lines plot x (f start1 x))
    (def nlm1 (non-linear-model f y x start1))
    (add-lines plot x (:fitted nlm1))
    ;; both algorithms converge with the second set of start values from NIST
    (def start2 [0.15 0.008 0.010])
    (add-lines plot x (f start2 x))
    (def nlm2 (non-linear-model f y x start2))
    (add-lines plot x (:fitted nlm2))
[ - ] incanter.probability
  1. +cond-prob-tuples
    fn ([[x y] [p q]] [x])
    adds two conditional probability tuples. [[{}{}][{}{}]] -> [{}{}]
    passes through a single conditional probability tuple. [[{}{}]] -> [{}{}]
  2. P
    fn ([a given b & bs] [a given b])
  3. always-false
    fn ([x])
  4. any
    fn ([x])
  5. binary
    fn ([pred])
    a function for binary classification that takes a boolean value and returns 1 for true and 0 for false.
  6. bucket
    fn ([f r] [f t r] [f t r p])
  7. bucket-negative?
    fn ([x])
  8. comb-merge
    fn ([f x y])
    combinatorial merge takes two maps and a fn and merges all combinations of keys between the two maps using the fn.
  9. cond-prob-tuple
    fn ([a b])
    build [a&b b] count tuples for calculating conditional probabilities p(a | b)
  10. constrain
    fn ([k f v])
  11. count-missing
    fn ([exp act])
  12. eq
    fn ([y])
  13. gt
    fn ([y])
  14. label-cond-prob-dependent
    fn ([a & bs])
  15. lt
    fn ([y])
  16. map-counter
    fn ([f])
    wraps a counting function for maps in apply and deep-merge-with, curried fn expects a seq of maps to count.
  17. marginals
    fn ([j])
    computes the marginal PMFs from the joint PMF.
    of the form: {a {b n}} where n is the number of co-occurrences of a and b.
    for summation note that a variable, suppose it is x, is represented as a level of depth in the nested maps, so summation for the marginal of x occurs on all branches stemming from maps at the level corresponding with x.
  18. missing?
    fn ([x])
  19. n-sided-die
    fn ([n])
  20. ne
    fn ([y])
  21. pred
    fn ([f arg])
  22. present-when
    fn ([f] [f & keys])
  23. range-classifier
    fn ([range item])
    classify one item based on what interval in a range the item falls into.
  24. rolling-windows
    fn ([len])
  25. summate
    fn ([m])
    summate all counts in a deeply nested map of counts.
  26. summate-level
    fn ([j])
  27. vector-counter
    fn ([f])
    wraps a counting function for vectors in apply and deep-merge-with, curried fn expects a seq of vectors to count.
  28. |
    fn ([a b])
    this is the core of the conditional probability based classification model. this model takes a & bs in the form a given bs. a and bs are all functions, and the conditional probability classification model composes a new classifier function that ultimately returns the cond-prob-tuple: [{a's counts}{b's counts}].
  29. |each
    fn ([a bs])
    conditional probability where the bs are taken to be a map of feature->classifier-function pairs, where we want to compute the conditional probability between a and each b independently.
[ - ] incanter.stats
  1. bigrams
    fn ([s])
  2. bootstrap
    fn ([data statistic & options])
    Returns a bootstrap sample of the given statistic on the given data.
    Arguments:
    data -- vector of data to resample from
    statistic -- a function that returns a value given a vector of data
    Options:
    :size -- the number of bootstrap samples to return
    :smooth -- (default false) smoothing option
    :smooth-sd -- (default (/ (sqrt (count data)))) determines the standard
    deviation of the noise to use for smoothing
    :replacement -- (default true) determines if sampling of the data
    should be done with replacement
    References:
    1. Clifford E. Lunneborg, Data Analysis by Resampling Concepts and Applications, 2000, pages 105-117
    2. http://en.wikipedia.org/wiki/Bootstrapping_(statistics)
    Examples:
    ;; example from Data Analysis by Resampling Concepts and Applications
    ;; Clifford E. Lunneborg (pages 119-122)
    (use '(incanter core stats charts))
    ;; weights (in grams) of 50 randomly sampled bags of pretzels
    (def weights [464 447 446 454 450 457 450 442
    433 452 449 454 450 438 448 449
    457 451 456 452 450 463 464 453
    452 447 439 449 468 443 433 460
    452 447 447 446 450 459 466 433
    445 453 454 446 464 450 456 456
    447 469])
    ;; calculate the sample median, 450
    (median weights)
    ;; generate bootstrap sample
    (def t* (bootstrap weights median :size 2000))
    ;; view histogram of the bootstrap sample
    (view (histogram t*))
    ;; calculate the mean of the bootstrap median ~ 450.644
    (mean t*)
    ;; calculate the standard error ~ 1.083
    (def se (sd t*))
    ;; 90% standard normal CI ~ (448.219 451.781)
    (plus (median weights) (mult (quantile-normal [0.05 0.95]) se))
    ;; 90% symmetric percentile CI ~ (449.0 452.5)
    (quantile t* :probs [0.05 0.95])
    ;; 90% non-symmetric percentile CI ~ (447.5 451.0)
    (minus (* 2 (median weights)) (quantile t* :probs [0.95 0.05]))
    ;; calculate bias
    (- (mean t*) (median weights)) ;; ~ 0.644
    ;; example with smoothing
    ;; Newcomb's speed of light data
    (use '(incanter core stats charts))
    ;; A numeric vector giving the Third Series of measurements of the
    ;; passage time of light recorded by Newcomb in 1882. The given
    ;; values divided by 1000 plus 24 give the time in millionths of a
    ;; second for light to traverse a known distance. The 'true' value is
    ;; now considered to be 33.02.
    (def speed-of-light [28 -44 29 30 24 28 37 32 36 27 26 28 29
    26 27 22 23 20 25 25 36 23 31 32 24 27
    33 16 24 29 36 21 28 26 27 27 32 25 28
    24 40 21 31 32 28 26 30 27 26 24 32 29
    34 -2 25 19 36 29 30 22 28 33 39 25 16 23])
    ;; view histogram of data to see outlier observations
    (view (histogram speed-of-light :nbins 30))
    (def samp (bootstrap speed-of-light median :size 10000))
    (view (histogram samp :density true :nbins 30))
    (mean samp)
    (quantile samp :probs [0.025 0.975])
    (def smooth-samp (bootstrap speed-of-light median :size 10000 :smooth true))
    (view (histogram smooth-samp :density true :nbins 30))
    (mean smooth-samp)
    (quantile smooth-samp :probs [0.025 0.975])
  3. cdf-beta
    fn ([x & options])
    Returns the Beta cdf of the given value of x. It will return a sequence
    of values, if x is a sequence. This is equivalent to R's pbeta function.
    Options:
    :alpha (default 1)
    :beta (default 1)
    :lower-tail (default true)
    See also:
    pdf-beta and sample-beta
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/Beta.html
    http://en.wikipedia.org/wiki/Beta_distribution
    http://en.wikipedia.org/wiki/Cumulative_distribution_function
    Example:
    (cdf-beta 0.5 :alpha 1 :beta 2)
    (cdf-beta 0.5 :alpha 1 :beta 2 :lower-tail false)
  4. cdf-binomial
    fn ([x & options])
    Returns the Binomial cdf of the given value of x. It will return a sequence
    of values, if x is a sequence. Same as R's pbinom
    Options:
    :size (default 1)
    :prob (default 1/2)
    :lower-tail (default true)
    See also:
    pdf-binomial and sample-binomial
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/Binomial.html
    http://en.wikipedia.org/wiki/Binomial_distribution
    http://en.wikipedia.org/wiki/Cumulative_distribution_function
    Example:
    (cdf-binomial 10 :prob 1/4 :size 20)
  5. cdf-chisq
    fn ([x & options])
    Returns the Chi Square cdf of the given value of x. It will return a sequence
    of values, if x is a sequence. Same as R's pchisq function.
    Options:
    :df (default 1)
    :lower-tail (default true)
    See also:
    pdf-chisq and sample-chisq
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/ChiSquare.html
    http://en.wikipedia.org/wiki/Chi_square_distribution
    http://en.wikipedia.org/wiki/Cumulative_distribution_function
    Example:
    (cdf-chisq 5.0 :df 2)
    (cdf-chisq 5.0 :df 2 :lower-tail false)
  6. cdf-empirical
    fn ([x])
    Returns a step-function representing the empirical cdf of the given data.
    Equivalent to R's ecdf function.
    The following description is from the ecdf help in R: The e.c.d.f.
    (empirical cumulative distribution function) Fn is a step function
    with jumps i/n at observation values, where i is the number of tied
    observations at that value. Missing values are ignored.
    For observations 'x'= (x1,x2, ... xn), Fn is the fraction of
    observations less or equal to t, i.e.,
    Fn(t) = #{x_i <= t} / n = 1/n sum(i=1,n) Indicator(xi <= t).
    Examples:
    (use '(incanter core stats charts))
    (def exam1 [192 160 183 136 162 165 181 188 150 163 192 164 184
    189 183 181 188 191 190 184 171 177 125 192 149 188
    154 151 159 141 171 153 169 168 168 157 160 190 166 150])
    ;; the ecdf function returns an empirical cdf function for the given data
    (def ecdf (cdf-empirical exam1))
    ;; plot the data's empirical cdf
    (view (scatter-plot exam1 (map ecdf exam1)))
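    The step function Fn(t) = #{x_i <= t} / n described above can be sketched in plain Clojure. The name ecdf-sketch is hypothetical, and the sketch does not handle missing values.

```clojure
(defn ecdf-sketch
  "Returns a function t -> fraction of observations less than or equal to t."
  [xs]
  (let [n (double (count xs))]
    (fn [t] (/ (count (filter #(<= % t) xs)) n))))

(def f (ecdf-sketch [1 2 2 4]))
(f 0)   ; below every observation
(f 2)   ; three of the four observations are <= 2
(f 4)   ; at or above the maximum
```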
  7. cdf-exp
    fn ([x & options])
    Returns the Exponential cdf of the given value of x. It will return a sequence
    of values, if x is a sequence. Same as R's pexp
    Options:
    :rate (default 1)
    See also:
    pdf-exp and sample-exp
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/Exponential.html
    http://en.wikipedia.org/wiki/Exponential_distribution
    http://en.wikipedia.org/wiki/Cumulative_distribution_function
    Example:
    (cdf-exp 2.0 :rate 1/2)
  8. cdf-f
    fn ([x & options])
    Returns the F-distribution cdf of the given value, x. It will return a sequence
    of values, if x is a sequence. This is equivalent to R's pf function.
    Options:
    :df1 (default 1)
    :df2 (default 1)
    See also:
    pdf-f and quantile-f
    References:
    http://en.wikipedia.org/wiki/F_distribution
    http://mathworld.wolfram.com/F-Distribution.html
    http://en.wikipedia.org/wiki/Cumulative_distribution_function
    Example:
    (cdf-f 1.0 :df1 5 :df2 2)
  9. cdf-gamma
    fn ([x & options])
    Returns the Gamma cdf for the given value of x. It will return a sequence
    of values, if x is a sequence. This is equivalent to R's pgamma function.
    Options:
    :shape (default 1)
    :rate (default 1)
    :lower-tail (default true)
    See also:
    pdf-gamma and sample-gamma
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/Gamma.html
    http://en.wikipedia.org/wiki/Gamma_distribution
    http://en.wikipedia.org/wiki/Cumulative_distribution_function
    Example:
    (cdf-gamma 10 :shape 1 :rate 2)
    (cdf-gamma 3 :shape 1 :lower-tail false)
  10. cdf-neg-binomial
    fn ([x & options])
    Returns the Negative Binomial cdf of the given value of x. It will return a sequence
    of values, if x is a sequence. Same as R's pnbinom
    Options:
    :size (default 10)
    :prob (default 1/2)
    See also:
    pdf-neg-binomial and sample-neg-binomial
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/NegativeBinomial.html
    http://en.wikipedia.org/wiki/Negative_binomial_distribution
    http://en.wikipedia.org/wiki/Cumulative_distribution_function
    Example:
    (cdf-neg-binomial 10 :prob 1/2 :size 20)
  11. cdf-normal
    fn ([x & options])
    Returns the Normal cdf of the given value, x. It will return a sequence
    of values, if x is a sequence. This is equivalent to R's pnorm function.
    Options:
    :mean (default 0)
    :sd (default 1)
    See also:
    pdf-normal, quantile-normal, sample-normal
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/Normal.html
    http://en.wikipedia.org/wiki/Normal_distribution
    http://en.wikipedia.org/wiki/Cumulative_distribution_function
    Example:
    (cdf-normal 1.96 :mean -2 :sd (sqrt 0.5))
  12. cdf-poisson
    fn ([x & options])
    Returns the Poisson cdf of the given value of x. It will return a sequence
    of values, if x is a sequence. Same as R's ppois
    Options:
    :lambda (default 1)
    :lower-tail (default true)
    See also:
    pdf-poisson and sample-poisson
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/Poisson.html
    http://en.wikipedia.org/wiki/Poisson_distribution
    http://en.wikipedia.org/wiki/Cumulative_distribution_function
    Example:
    (cdf-poisson 5 :lambda 10)
  13. cdf-t
    fn ([x & options])
    Returns the Student's t cdf for the given value of x. It will return a sequence
    of values, if x is a sequence. Same as R's pt function.
    Options:
    :df (default 1)
    See also:
    pdf-t, quantile-t, and sample-t
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/StudentT.html
    http://en.wikipedia.org/wiki/Student-t_distribution
    http://en.wikipedia.org/wiki/Cumulative_distribution_function
    Example:
    (cdf-t 1.2 :df 10)
  14. cdf-uniform
    fn ([x & options])
    Returns the Uniform cdf of the given value of x. It will return a sequence
    of values, if x is a sequence. This is equivalent to R's punif function.
    Options:
    :min (default 0)
    :max (default 1)
    See also:
    pdf-uniform and sample-uniform
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/DoubleUniform.html
    http://en.wikipedia.org/wiki/Uniform_distribution
    http://en.wikipedia.org/wiki/Cumulative_distribution_function
    Example:
    (cdf-uniform 5)
    (cdf-uniform 5 :min 1 :max 10)
  15. chebyshev-distance
    fn ([a b])
    In the limiting case of Lp reaching infinity we obtain the Chebyshev distance.
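    A minimal sketch of the limiting behaviour, assuming a and b are equal-length sequences of numbers. The names minkowski-sketch and chebyshev-sketch are hypothetical.

```clojure
(defn minkowski-sketch
  "Lp distance: (sum |a_i - b_i|^p)^(1/p)."
  [p a b]
  (Math/pow (reduce + (map #(Math/pow (Math/abs (double (- %1 %2))) p) a b))
            (/ 1.0 p)))

(defn chebyshev-sketch
  "Largest absolute coordinate difference."
  [a b]
  (apply max (map #(Math/abs (double (- %1 %2))) a b)))

(chebyshev-sketch [1 2 3] [4 0 3])
;; as p grows, the Lp distance approaches the Chebyshev distance
(map #(minkowski-sketch % [1 2 3] [4 0 3]) [1 2 10 50])
```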
  16. chisq-test
    fn ([& options])
    Performs chi-squared contingency table tests and goodness-of-fit tests.
    If the optional argument :y is not provided then a goodness-of-fit test
    is performed. In this case, the hypothesis tested is whether the
    population probabilities equal those in :probs, or are all equal if
    :probs is not given.
    If :y is provided, it must be a sequence of integers that is the
    same length as x. A contingency table is computed from x and :y.
    Then, Pearson's chi-squared test of the null hypothesis that the joint
    distribution of the cell counts in a 2-dimensional contingency
    table is the product of the row and column marginals is performed.
    By default the Yates' continuity correction for 2x2 contingency
    tables is performed, this can be disabled by setting the :correct
    option to false.
    Options:
    :x -- a sequence of numbers.
    :y -- a sequence of numbers
    :table -- a contingency table. If one dimensional, the test is a goodness-of-fit
    :probs (when (nil? y) -- (repeat n-levels (/ n-levels)))
    :freq (default nil) -- if given, these are rescaled to probabilities
    :correct (default true) -- use Yates' correction for continuity for 2x2 contingency tables
    Returns:
    :X-sq -- the Pearson X-squared test statistic
    :p-value -- the p-value for the test statistic
    :df -- the degrees of freedom
    Examples:
    (use '(incanter core stats))
    (chisq-test :x [1 2 3 2 3 2 4 3 5]) ;; X-sq 2.6667
    ;; create a one-dimensional table of this data
    (def table (matrix [1 3 3 1 1]))
    (chisq-test :table table) ;; X-sq 2.6667
    (chisq-test :table (trans table)) ;; throws exception
    (chisq-test :x [1 0 0 0 1 1 1 0 0 1 0 0 1 1 1 1]) ;; 0.25
    (use '(incanter core stats datasets))
    (def math-prog (to-matrix (get-dataset :math-prog)))
    (def x (sel math-prog :cols 1))
    (def y (sel math-prog :cols 2))
    (chisq-test :x x :y y) ;; X-sq = 1.24145, df=1, p-value = 0.26519
    (chisq-test :x x :y y :correct false) ;; X-sq = 2.01094, df=1, p-value = 0.15617
    (def table (matrix [[31 12] [9 8]]))
    (chisq-test :table table) ;; X-sq = 1.24145, df=1, p-value = 0.26519
    (chisq-test :table table :correct false) ;; X-sq = 2.01094, df=1, p-value = 0.15617
    ;; use the detabulate function to create data rows corresponding to the table
    (def detab (detabulate :table table))
    (chisq-test :x (sel detab :cols 0) :y (sel detab :cols 1))
    ;; look at the hair-eye-color data
    ;; turn the count data for males into a contingency table
    (def male (matrix (sel (get-dataset :hair-eye-color) :cols 3 :rows (range 16)) 4))
    (chisq-test :table male) ;; X-sq = 41.280, df = 9, p-value = 4.44E-6
    ;; turn the count data for females into a contingency table
    (def female (matrix (sel (get-dataset :hair-eye-color) :cols 3 :rows (range 16 32)) 4))
    (chisq-test :table female) ;; X-sq = 106.664, df = 9, p-value = 7.014E-19
    ;; supply probabilities to goodness-of-fit test
    (def table [89 37 30 28 2])
    (def probs [0.40 0.20 0.20 0.19 0.01])
    (chisq-test :table table :probs probs) ;; X-sq = 5.7947, df = 4, p-value = 0.215
    ;; use frequencies instead of probabilities
    (def freq [40 20 20 15 5])
    (chisq-test :table table :freq freq) ;; X-sq = 9.9901, df = 4, p-value = 0.04059
    References:
    http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
    http://en.wikipedia.org/wiki/Pearson's_chi-square_test
    http://en.wikipedia.org/wiki/Yates'_chi-square_test
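    The goodness-of-fit statistic reported in the examples above can be reproduced by hand. This is a minimal sketch of the Pearson X-squared statistic only (no p-value, no contingency tables, no Yates' correction). The name x-sq-sketch is hypothetical.

```clojure
(defn x-sq-sketch
  "Pearson X-squared statistic: sum of (observed - expected)^2 / expected.
   With no probs given, all categories are assumed equally likely."
  ([observed]
   (x-sq-sketch observed (repeat (count observed) (/ 1.0 (count observed)))))
  ([observed probs]
   (let [n (reduce + observed)]
     (reduce + (map (fn [o p]
                      (let [e (* n p)]            ; expected count
                        (/ (* (- o e) (- o e)) e)))
                    observed probs)))))

(x-sq-sketch [1 3 3 1 1])                                   ; about 2.6667
(x-sq-sketch [89 37 30 28 2] [0.40 0.20 0.20 0.19 0.01])    ; about 5.7947
```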
  17. concordant?
    fn ([[[a1 b1] [a2 b2]]])
  18. correlation
    fn ([x y] [mat])
    Returns the sample correlation of x and y, or the correlation
    matrix of the given matrix.
    Examples:
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/stat/tdouble/DoubleDescriptive.html
    http://en.wikipedia.org/wiki/Correlation
  19. correlation-linearity-test
    fn ([a b])
    http://en.wikipedia.org/wiki/Correlation_ratio
    It is worth noting that if the relationship between values of x and values of ȳ_x is linear (which is certainly true when there are only two possibilities for x) this will give the same result as the square of the correlation coefficient, otherwise the correlation ratio will be larger in magnitude. It can therefore be used for judging non-linear relationships.
  20. correlation-ratio
    fn ([& xs])
    http://en.wikipedia.org/wiki/Correlation_ratio
    In statistics, the correlation ratio is a measure of the relationship between the statistical dispersion within individual categories and the dispersion across the whole population or sample. i.e. the weighted variance of the category means divided by the variance of all samples.
    Example
    Suppose there is a distribution of test scores in three topics (categories):
    * Algebra: 45, 70, 29, 15 and 21 (5 scores)
    * Geometry: 40, 20, 30 and 42 (4 scores)
    * Statistics: 65, 95, 80, 70, 85 and 73 (6 scores).
    Then the subject averages are 36, 33 and 78, with an overall average of 52.
    The sums of squares of the differences from the subject averages are 1952 for Algebra, 308 for Geometry and 600 for Statistics, adding to 2860, while the overall sum of squares of the differences from the overall average is 9640. The difference between these of 6780 is also the weighted sum of the square of the differences between the subject averages and the overall average:
    5(36 − 52)^2 + 4(33 − 52)^2 + 6(78 − 52)^2 = 6780
    This gives
    eta^2 = 6780/9640 = 0.7033
    suggesting that most of the overall dispersion is a result of differences between topics, rather than within topics. Taking the square root
    eta = sqrt(6780/9640) = 0.8386
    Observe that for η = 1 the overall sample dispersion is purely due to dispersion among the categories and not at all due to dispersion within the individual categories. For a quick comprehension simply imagine all Algebra, Geometry, and Statistics scores being the same respectively, e.g. 5 times 36, 4 times 33, 6 times 78.
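    The worked example above can be checked with a short sketch. The helper names are hypothetical, and the sketch returns eta (the square root); whether the library function returns eta or eta^2 is not stated in this entry.

```clojure
(defn mean-sketch [xs] (/ (reduce + xs) (double (count xs))))

(defn correlation-ratio-sketch
  "eta = sqrt(between-category sum of squares / total sum of squares)."
  [& groups]
  (let [all     (apply concat groups)
        grand   (mean-sketch all)
        between (reduce + (for [g groups]
                            (* (count g)
                               (Math/pow (- (mean-sketch g) grand) 2))))
        total   (reduce + (for [x all] (Math/pow (- x grand) 2)))]
    (Math/sqrt (/ between total))))

(correlation-ratio-sketch [45 70 29 15 21]      ; Algebra
                          [40 20 30 42]         ; Geometry
                          [65 95 80 70 85 73])  ; Statistics
;; about 0.8386
```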
  21. cosine-similarity
    fn ([a b])
    http://en.wikipedia.org/wiki/Cosine_similarity
    http://www.appliedsoftwaredesign.com/cosineSimilarityCalculator.php
    The Cosine Similarity of two vectors a and b is the ratio: a dot b / ||a|| ||b||
    Let d1 = {2 4 3 1 6}
    Let d2 = {3 5 1 2 5}
    Cosine Similarity (d1, d2) = dot(d1, d2) / ||d1|| ||d2||
    dot(d1, d2) = (2)*(3) + (4)*(5) + (3)*(1) + (1)*(2) + (6)*(5) = 61
    ||d1|| = sqrt((2)^2 + (4)^2 + (3)^2 + (1)^2 + (6)^2) = 8.12403840464
    ||d2|| = sqrt((3)^2 + (5)^2 + (1)^2 + (2)^2 + (5)^2) = 8
    Cosine Similarity (d1, d2) = 61 / ((8.12403840464) * (8))
    = 61 / 64.9923072371
    = 0.938572618717
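    The arithmetic above can be reproduced with a short sketch in plain Clojure. The names dot-sketch and cosine-similarity-sketch are hypothetical.

```clojure
(defn dot-sketch [a b] (reduce + (map * a b)))

(defn cosine-similarity-sketch
  "a dot b / (||a|| * ||b||) for two equal-length numeric sequences."
  [a b]
  (/ (dot-sketch a b)
     (* (Math/sqrt (dot-sketch a a))
        (Math/sqrt (dot-sketch b b)))))

(cosine-similarity-sketch [2 4 3 1 6] [3 5 1 2 5])
;; about 0.9386
```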
  22. covariance
    fn ([x y] [mat])
    Returns the sample covariance of x and y.
    Examples:
    ;; create some data that covaries
    (def x (sample-normal 100))
    (def err (sample-normal 100))
    (def y (plus (mult 5 x) err))
    ;; estimate the covariance of x and y
    (covariance x y)
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/stat/tdouble/DoubleDescriptive.html
    http://en.wikipedia.org/wiki/Covariance
  23. cumulative-mean
    fn ([coll])
    Returns a sequence of cumulative means for the given collection. For instance,
    the first value equals the first value of the argument, the second value is
    the mean of the first two arguments, the third is the mean of the first three
    arguments, etc.
    Examples:
    (cumulative-mean (sample-normal 100))
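    A minimal sketch of the running computation in plain Clojure, using a fixed input so the result is easy to check. The name cumulative-mean-sketch is hypothetical.

```clojure
(defn cumulative-mean-sketch
  "Divides each running sum by the number of values seen so far."
  [coll]
  (map (fn [s n] (/ s (double n)))
       (reductions + coll)     ; running sums
       (iterate inc 1)))       ; 1, 2, 3, ...

(cumulative-mean-sketch [2 4 6])
;; => (2.0 3.0 4.0)
```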
  24. damerau-levenshtein-distance
    fn ([a b])
  25. detabulate
    fn ([& options])
    Takes a contingency table of counts and returns a matrix of observations.
    Examples:
    (use '(incanter core stats datasets))
    (def by-gender (group-by (get-dataset :hair-eye-color) 2))
    (def table (matrix (sel (first by-gender) :cols 3) 4))
    (detabulate :table table)
    (tabulate (detabulate :table table))
    ;; example 2
    (def data (matrix [[1 0]
    [1 1]
    [1 1]
    [1 0]
    [0 0]
    [1 1]
    [1 1]
    [1 0]
    [1 1]]))
    (tabulate data)
    (tabulate (detabulate :table (:table (tabulate data))))
  26. dice-coefficient
    fn ([a b])
    http://en.wikipedia.org/wiki/Dice%27s_coefficient
    Dice's coefficient (also known as the Dice coefficient) is a similarity measure related to the Jaccard index.
  27. dice-coefficient-str
    fn ([a b])
    http://en.wikipedia.org/wiki/Dice%27s_coefficient
    When taken as a string similarity measure, the coefficient may be calculated for two strings, x and y, using bigrams as s = 2 * nt / (nx + ny), where nt is the number of character bigrams found in both strings, nx is the number of bigrams in string x and ny is the number of bigrams in string y. For example, to calculate the similarity between:
    night
    nacht
    We would find the set of bigrams in each word:
    {ni,ig,gh,ht}
    {na,ac,ch,ht}
    Each set has four elements, and the intersection of these two sets has only one element: ht.
    Plugging this into the formula, we calculate, s = (2 · 1) / (4 + 4) = 0.25.
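    The night/nacht calculation above can be sketched in plain Clojure, treating each word's bigrams as a set. The names bigrams-sketch and dice-str-sketch are hypothetical.

```clojure
(require '[clojure.set :as set])

(defn bigrams-sketch
  "Set of adjacent character pairs in s, e.g. \"night\" -> #{\"ni\" \"ig\" \"gh\" \"ht\"}."
  [s]
  (set (map #(apply str %) (partition 2 1 s))))

(defn dice-str-sketch
  "2 * (shared bigrams) / (bigrams in x + bigrams in y)."
  [x y]
  (let [bx (bigrams-sketch x)
        by (bigrams-sketch y)]
    (/ (* 2.0 (count (set/intersection bx by)))
       (+ (count bx) (count by)))))

(dice-str-sketch "night" "nacht")
;; => 0.25
```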
  28. discordant-pairs
    fn ([a b])
    http://en.wikipedia.org/wiki/Discordant_pairs
  29. discordant?
    fn
  30. euclidean-distance
    fn ([a b])
    http://en.wikipedia.org/wiki/Euclidean_distance
    The Euclidean distance or Euclidean metric is the ordinary distance between two points that one would measure with a ruler, and is given by the Pythagorean formula. By using this formula as distance, Euclidean space (or even any inner product space) becomes a metric space. The associated norm is called the Euclidean norm. Older literature refers to the metric as the Pythagorean metric.
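    As an illustration of the Pythagorean formula, a minimal sketch in plain Clojure. The name euclidean-sketch is hypothetical and the points are assumed to be equal-length sequences of numbers:

    ```clojure
    ;; Square root of the sum of squared coordinate differences.
    (defn euclidean-sketch [a b]
      (Math/sqrt (reduce + (map (fn [x y]
                                  (let [d (- x y)] (* d d)))
                                a b))))

    (euclidean-sketch [0 0] [3 4]) ; => 5.0
    ```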
  31. gamma-coefficient
    fn ([])
    http://www.statsdirect.com/help/nonparametric_methods/kend.htm
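    The gamma coefficient is a measure of association that is highly resistant to tied data (Goodman and Kruskal, 1963): gamma = (C - D) / (C + D), where C is the number of concordant pairs and D is the number of discordant pairs.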
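    A sketch of the Goodman and Kruskal gamma, (C - D) / (C + D), where C and D are the numbers of concordant and discordant pairs. This is plain Clojure with hypothetical names and paired numeric sequences as an assumed input format; the documented gamma-coefficient takes no arguments, so this is not a description of its behavior. Tied pairs are counted as neither concordant nor discordant:

    ```clojure
    ;; Count concordant and discordant pairs over all index pairs i < j.
    (defn concordance-counts-sketch [xs ys]
      (let [pts      (vec (map vector xs ys))
            n        (count pts)
            products (for [i (range n)
                           j (range (inc i) n)
                           :let [[x1 y1] (pts i)
                                 [x2 y2] (pts j)]]
                       (* (- x1 x2) (- y1 y2)))]
        {:concordant (count (filter pos? products))
         :discordant (count (filter neg? products))}))

    (defn gamma-sketch [xs ys]
      (let [{c :concordant d :discordant} (concordance-counts-sketch xs ys)]
        (/ (- c d) (+ c d))))

    (concordance-counts-sketch [1 2 3 4] [1 3 2 4]) ; => {:concordant 5, :discordant 1}
    (gamma-sketch [1 2 3 4] [1 3 2 4])              ; => 2/3
    ```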
  32. hamming-distance
    fn ([a b])
    http://en.wikipedia.org/wiki/Hamming_distance
    In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. Put another way, it measures the minimum number of substitutions required to change one string into the other, or the number of errors that transformed one string into the other.
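    A minimal sketch of the definition in plain Clojure. The name hamming-sketch is hypothetical; the precondition enforces the equal-length requirement:

    ```clojure
    ;; Count the positions at which the corresponding symbols differ.
    (defn hamming-sketch [a b]
      {:pre [(= (count a) (count b))]}
      (count (filter true? (map not= a b))))

    (hamming-sketch "karolin" "kathrin") ; => 3
    (hamming-sketch [1 0 1 1] [1 1 1 0]) ; => 2
    ```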
  33. indicator
    fn ([pred coll])
    Returns a sequence of ones and zeros, where ones
    are returned when the given predicate is true for
    corresponding element in the given collection, and
    zero otherwise.
    Examples:
    (use 'incanter.stats)
    (indicator #(neg? %) (sample-normal 10))
    ;; return the sum of the positive values in a normal sample
    (def x (sample-normal 100))
    (sum (mult x (indicator #(pos? %) x)))
  34. jaccard-distance
    fn ([a b])
    http://en.wikipedia.org/wiki/Jaccard_index
    The Jaccard distance, which measures dissimilarity between sample sets, is complementary to the Jaccard coefficient and is obtained by subtracting the Jaccard coefficient from 1, or, equivalently, by dividing the difference of the sizes of the union and the intersection of two sets by the size of the union.
  35. jaccard-index
    fn ([a b])
    http://en.wikipedia.org/wiki/Jaccard_index
    The Jaccard index, also known as the Jaccard similarity coefficient (originally coined coefficient de communauté by Paul Jaccard), is a statistic used for comparing the similarity and diversity of sample sets.
    The Jaccard coefficient measures similarity between sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets.
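    The index and the complementary distance can be sketched directly on Clojure sets. The names are hypothetical and the inputs are assumed to be sets that are not both empty:

    ```clojure
    (require '[clojure.set :as set])

    ;; Size of the intersection divided by the size of the union.
    (defn jaccard-index-sketch [a b]
      (/ (count (set/intersection a b))
         (count (set/union a b))))

    ;; One minus the index.
    (defn jaccard-distance-sketch [a b]
      (- 1 (jaccard-index-sketch a b)))

    (jaccard-index-sketch #{1 2 3} #{3 4 5})    ; => 1/5
    (jaccard-distance-sketch #{1 2 3} #{3 4 5}) ; => 4/5
    ```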
  36. kendalls-tau
    fn ([a b])
    http://en.wikipedia.org/wiki/Kendall_tau_rank_correlation_coefficient
    http://www.statsdirect.com/help/nonparametric_methods/kend.htm
    http://mail.scipy.org/pipermail/scipy-dev/2009-March/011589.html
    The best explanation and example are in "Cluster Analysis for Researchers", page 165.
    http://www.amazon.com/Cluster-Analysis-Researchers-Charles-Romesburg/dp/1411606175
  37. kendalls-tau-distance
    fn
  38. kendalls-w
    fn ([])
    http://en.wikipedia.org/wiki/Kendall%27s_W
    http://faculty.chass.ncsu.edu/garson/PA765/friedman.htm
    Suppose that object i is given the rank r_ij by judge number j, where there are in total n objects and m judges. Then the total rank given to object i is
    R_i = sum_{j=1..m} r_ij
    and the mean value of these total ranks is
    Rbar = m (n + 1) / 2
    The sum of squared deviations, S, is defined as
    S = sum_{i=1..n} (R_i - Rbar)^2
    and then Kendall's W is defined as
    W = 12 S / (m^2 (n^3 - n))
    If the test statistic W is 1, then all the survey respondents have been unanimous, and each respondent has assigned the same order to the list of concerns. If W is 0, then there is no overall trend of agreement among the respondents, and their responses may be regarded as essentially random. Intermediate values of W indicate a greater or lesser degree of unanimity among the various responses.
    Legendre discusses a variant of the W statistic which accommodates ties in the rankings and also describes methods of making significance tests based on W.
    [{:observation [1 2 3]} {} ... {}] -> W
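    A sketch of the W statistic for untied rankings, using W = 12 S / (m^2 (n^3 - n)) with S the sum of squared deviations of the total ranks from their mean. The input format (one vector of ranks per judge) and the name kendalls-w-sketch are assumptions for illustration, not the interface of kendalls-w:

    ```clojure
    ;; rankings: a sequence of m rank vectors, each ranking the same n objects.
    (defn kendalls-w-sketch [rankings]
      (let [m     (count rankings)               ; number of judges
            n     (count (first rankings))       ; number of objects
            R     (apply map + rankings)         ; total rank per object
            R-bar (/ (* m (inc n)) 2)            ; mean of the total ranks
            S     (reduce + (map #(let [d (- % R-bar)] (* d d)) R))]
        (/ (* 12 S) (* m m (- (* n n n) n)))))

    (kendalls-w-sketch [[1 2 3] [1 2 3] [1 2 3]]) ; => 1 (complete agreement)
    (kendalls-w-sketch [[1 2 3] [3 2 1]])         ; => 0 (no agreement)
    ```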
  39. kurtosis
    fn ([x])
    Returns the kurtosis of the data, x. "Kurtosis is a measure of the "peakedness"
    of the probability distribution of a real-valued random variable. Higher kurtosis
    means more of the variance is due to infrequent extreme deviations, as opposed to
    frequent modestly-sized deviations." (Wikipedia)
    Examples:
    (kurtosis (sample-normal 100000)) ;; approximately 0
    (kurtosis (sample-gamma 100000)) ;; approximately 6
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/stat/tdouble/DoubleDescriptive.html
    http://en.wikipedia.org/wiki/Kurtosis
  40. lee-distance
    fn ([a b q])
    http://en.wikipedia.org/wiki/Lee_distance
    In coding theory, the Lee distance is a distance between two strings x1x2...xn and y1y2...yn of equal length n over the q-ary alphabet {0,1,…,q-1} of size q >= 2. It is a metric.
    If q = 2 or q = 3 the Lee distance coincides with the Hamming distance.
    The metric space induced by the Lee distance is a discrete analog of the elliptic space.
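    A minimal sketch in plain Clojure, assuming the usual definition of the Lee distance as the sum over positions of min(|xi - yi|, q - |xi - yi|). The name lee-sketch is hypothetical and the strings are represented as vectors of integers in {0,...,q-1}:

    ```clojure
    ;; Sum, over positions, of the shorter way around a cycle of size q.
    (defn lee-sketch [a b q]
      (reduce + (map (fn [x y]
                       (let [d (Math/abs (long (- x y)))]
                         (min d (- q d))))
                     a b)))

    (lee-sketch [3 1 4 0] [2 5 4 3] 6) ; => 6  (1 + 2 + 0 + 3)
    (lee-sketch [0 1 1] [1 1 0] 2)     ; => 2  (same as the Hamming distance)
    ```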
  41. levenshtein-distance
    fn ([a b])
    http://en.wikipedia.org/wiki/Levenshtein_distance
    The internal representation is a table d with m+1 rows and n+1 columns,
    where m is the length of a and n is the length of b.
    In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences (i.e., the so called edit distance). The Levenshtein distance between two strings is given by the minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character.
    For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:
    1. kitten → sitten (substitution of 's' for 'k')
    2. sitten → sittin (substitution of 'i' for 'e')
    3. sittin → sitting (insert 'g' at the end).
    The Levenshtein distance has several simple upper and lower bounds that are useful in applications which compute many of them and compare them. These include:
    * It is always at least the difference of the sizes of the two strings.
    * It is at most the length of the longer string.
    * It is zero if and only if the strings are identical.
    * If the strings are the same size, the Hamming distance is an upper bound on the Levenshtein distance.
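    The table-based computation can be sketched in plain Clojure by keeping only the previous row of the table. The name levenshtein-sketch is hypothetical and this is not Incanter's implementation:

    ```clojure
    ;; Row i of the table holds the distances from the first i characters
    ;; of a to every prefix of b. Each new row is built from the previous one.
    (defn levenshtein-sketch [a b]
      (let [a (vec a)
            b (vec b)]
        (last
          (reduce
            (fn [prev i]
              (reduce
                (fn [row j]
                  (conj row
                        (min (inc (prev (inc j)))                      ; deletion
                             (inc (peek row))                          ; insertion
                             (+ (prev j) (if (= (a i) (b j)) 0 1)))))  ; substitution
                [(inc i)]
                (range (count b))))
            (vec (range (inc (count b))))
            (range (count a))))))

    (levenshtein-sketch "kitten" "sitting") ; => 3
    ```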
  42. linear-model
    fn ([y x & options])
    Returns the results of performing an OLS linear regression of y on x.
    Arguments:
    y is a vector (or sequence) of values for the dependent variable
    x is a vector or matrix of values for the independent variables
    Options:
    :intercept (default true) indicates whether an intercept term should be included
    Returns:
    a map, of type ::linear-model, containing:
    :design-matrix -- a matrix containing the independent variables, and an intercept column
    :coefs -- the regression coefficients
    :t-tests -- t-test values of coefficients
    :t-probs -- p-values for t-test values of coefficients
    :coefs-ci -- 95% percentile confidence interval
    :fitted -- the predicted values of y
    :residuals -- the residuals of each observation
    :std-errors -- the standard errors of the coefficients
    :sse -- the sum of squared errors, also called the residual sum of squares
    :ssr -- the regression sum of squares, also called the explained sum of squares
    :sst -- the total sum of squares (proportional to the sample variance)
    :r-square -- coefficient of determination
    Examples:
    (use '(incanter core stats datasets charts))
    (def iris (to-matrix (get-dataset :iris) :dummies true))
    (def y (sel iris :cols 0))
    (def x (sel iris :cols (range 1 6)))
    (def iris-lm (linear-model y x)) ; with intercept term
    (keys iris-lm) ; see what fields are included
    (:coefs iris-lm)
    (:sse iris-lm)
    (quantile (:residuals iris-lm))
    (:r-square iris-lm)
    (:adj-r-square iris-lm)
    (:f-stat iris-lm)
    (:f-prob iris-lm)
    (:df iris-lm)
    (def x1 (range 0.0 3 0.1))
    (view (xy-plot x1 (cdf-f x1 :df1 4 :df2 144)))
    References:
    http://en.wikipedia.org/wiki/OLS_Regression
    http://en.wikipedia.org/wiki/Coefficient_of_determination
  43. mahalanobis-distance
    fn ([x & options])
    Returns the Mahalanobis distance between x, which is
    either a vector or matrix of row vectors, and the
    centroid of the observations in the matrix :y.
    Arguments:
    x -- either a vector or a matrix of row vectors
    Options:
    :y -- Defaults to x, must be a matrix of row vectors which will be used to calculate a centroid
    :W -- Defaults to (solve (covariance y)), if an identity matrix is provided, the mahalanobis-distance
    function will be equal to the Euclidean distance.
    :centroid -- Defaults to (map mean (trans y))
    References:
    http://en.wikipedia.org/wiki/Mahalanobis_distance
    Examples:
    (use '(incanter core stats charts))
    ;; generate some multivariate normal data with a single outlier.
    (def data (bind-rows
    (bind-columns
    (sample-mvn 100
    :sigma (matrix [[1 0.9]
    [0.9 1]])))
    [-1.75 1.75]))
    ;; view a scatter plot of the data
    (let [[x y] (trans data)]
    (doto (scatter-plot x y)
    (add-points [(mean x)] [(mean y)])
    (add-pointer -1.75 1.75 :text "Outlier")
    (add-pointer (mean x) (mean y) :text "Centroid")
    view))
    ;; calculate the distances of each point from the centroid.
    (def dists (mahalanobis-distance data))
    ;; view a bar-chart of the distances
    (view (bar-chart (range 101) dists))
    ;; Now contrast with the Euclidean distance.
    (def dists (mahalanobis-distance data :W (matrix [[1 0] [0 1]])))
    ;; view a bar-chart of the distances
    (view (bar-chart (range 101) dists))
    ;; another example
    (mahalanobis-distance [-1.75 1.75] :y data)
    (mahalanobis-distance [-1.75 1.75]
    :y data
    :W (matrix [[1 0]
    [0 1]]))
  44. manhattan-distance
    fn ([a b])
    http://en.wikipedia.org/wiki/Manhattan_distance
    In taxicab geometry, the usual metric of Euclidean geometry is replaced by a new metric in which the distance between two points is the sum of the (absolute) differences of their coordinates. The taxicab metric is also known as rectilinear distance, L1 distance or l1 norm (see Lp space), city block distance, Manhattan distance, or Manhattan length.
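    A minimal sketch of the sum of absolute coordinate differences in plain Clojure. The name manhattan-sketch is hypothetical and the points are assumed to be equal-length sequences of numbers:

    ```clojure
    ;; Sum of the absolute differences of the coordinates.
    (defn manhattan-sketch [a b]
      (reduce + (map (fn [x y]
                       (let [d (- x y)]
                         (if (neg? d) (- d) d)))
                     a b)))

    (manhattan-sketch [1 2] [4 6]) ; => 7
    ```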
  45. mean
    fn ([x])
    Returns the mean of the data, x.
    Examples:
    (mean (sample-normal 100))
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/stat/tdouble/DoubleDescriptive.html
    http://en.wikipedia.org/wiki/Mean
  46. median
    fn ([x])
    Returns the median of the data, x.
    Examples:
    (median (sample-normal 100))
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/stat/tdouble/DoubleDescriptive.html
    http://en.wikipedia.org/wiki/Median
  47. minkowski-distance
    fn ([a b p])
    http://en.wikipedia.org/wiki/Minkowski_distance
    http://en.wikipedia.org/wiki/Lp_space
    The Minkowski distance is a metric on Euclidean space which can be considered as a generalization of both the Euclidean distance and the Manhattan distance.
    Minkowski distance is typically used with p being 1 or 2. The latter is the Euclidean distance, while the former is sometimes known as the Manhattan distance.
    In the limiting case of p reaching infinity we obtain the Chebyshev distance.
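    A sketch of the general formula, (sum |xi - yi|^p)^(1/p), in plain Clojure. The name minkowski-sketch is hypothetical; results are doubles and subject to floating-point rounding:

    ```clojure
    ;; p-th root of the sum of p-th powers of absolute coordinate differences.
    (defn minkowski-sketch [a b p]
      (let [p (double p)]
        (Math/pow (reduce + (map (fn [x y]
                                   (Math/pow (Math/abs (double (- x y))) p))
                                 a b))
                  (/ 1.0 p))))

    (minkowski-sketch [0 0] [3 4] 1) ; Manhattan distance, 7.0
    (minkowski-sketch [0 0] [3 4] 2) ; Euclidean distance, 5.0 up to rounding
    ```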
  48. n-grams
    fn ([n s])
    Returns a set of the unique n-grams in a string.
    This uses actual sets, so duplicate n-grams are discarded.
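    The effect of using a set can be seen in a short sketch in plain Clojure. The name n-grams-sketch is hypothetical, and the return type of the real n-grams (for example, strings versus character sequences) is not specified here:

    ```clojure
    ;; Slide a window of width n over the string and collect the pieces
    ;; into a set, so repeated n-grams appear only once.
    (defn n-grams-sketch [n s]
      (set (map #(apply str %) (partition n 1 s))))

    ;; "banana" has five bigrams (ba an na an na) but only three unique ones.
    (n-grams-sketch 2 "banana") ; the set of "ba", "an" and "na"
    ```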
  49. normalized-kendall-tau-distance
    fn ([a b])
    http://en.wikipedia.org/wiki/Kendall_tau_distance
    Kendall tau distance is the total number of discordant pairs.
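    A sketch in plain Clojure, assuming the normalization divides the number of discordant pairs by the total number of pairs, n(n-1)/2. The names are hypothetical, the inputs are two equal-length rank vectors with at least two elements, and tied pairs are not counted as discordant:

    ```clojure
    ;; A pair (i, j) is discordant when the two rankings order it differently.
    (defn discordant-count-sketch [a b]
      (let [a (vec a)
            b (vec b)
            n (count a)]
        (count (for [i (range n)
                     j (range (inc i) n)
                     :when (neg? (* (- (a i) (a j)) (- (b i) (b j))))]
                 [i j]))))

    (defn normalized-kendall-tau-sketch [a b]
      (let [n (count a)]
        (/ (discordant-count-sketch a b)
           (/ (* n (dec n)) 2))))

    (discordant-count-sketch [1 2 3 4 5] [3 4 1 2 5])       ; => 4
    (normalized-kendall-tau-sketch [1 2 3 4 5] [3 4 1 2 5]) ; => 2/5
    ```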
  50. odds-ratio
    fn ([p1 p2])
    http://en.wikipedia.org/wiki/Odds_ratio
    Definition in terms of group-wise odds
    The odds ratio is the ratio of the odds of an event occurring in one group to the odds of it occurring in another group, or to a sample-based estimate of that ratio.
    Suppose that in a sample of 100 men, 90 have drunk wine in the previous week, while in a sample of 100 women only 20 have drunk wine in the same period. The odds of a man drinking wine are 90 to 10, or 9:1, while the odds of a woman drinking wine are only 20 to 80, or 1:4 = 0.25:1. The odds ratio is thus 9/0.25, or 36, showing that men are much more likely to drink wine than women.
    Relation to statistical independence
    If X and Y are independent, their joint probabilities can be expressed in terms of their marginal probabilities. In this case, the odds ratio equals one, and conversely the odds ratio can only equal one if the joint probabilities can be factored in this way. Thus the odds ratio equals one if and only if X and Y are independent.
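    The wine example can be checked with a short sketch in plain Clojure. It assumes the two arguments are the probabilities of the event in each group, which the signature suggests but the text does not state; the names are hypothetical:

    ```clojure
    ;; Odds of an event with probability p.
    (defn odds-sketch [p]
      (/ p (- 1 p)))

    ;; Ratio of the odds in group 1 to the odds in group 2.
    (defn odds-ratio-sketch [p1 p2]
      (/ (odds-sketch p1) (odds-sketch p2)))

    (odds-sketch 9/10)            ; odds of 9 to 1 for men
    (odds-sketch 2/10)            ; odds of 1 to 4 for women
    (odds-ratio-sketch 9/10 2/10) ; equals 36
    ```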
  51. pairings
    fn ([a b])
    confusing ass name.
  52. pairs
    fn ([a b])
    Returns unique pairs of a and b where members of a and b cannot be paired with the corresponding slot in the other list.
  53. pdf-beta
    fn ([x & options])
    Returns the Beta pdf of the given value of x. It will return a sequence
    of values, if x is a sequence. This is equivalent to R's dbeta function.
    Options:
    :alpha (default 1)
    :beta (default 1)
    See also:
    cdf-beta and sample-beta
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/Beta.html
    http://en.wikipedia.org/wiki/Beta_distribution
    http://en.wikipedia.org/wiki/Probability_density_function
    Example:
    (pdf-beta 0.5 :alpha 1 :beta 2)
  54. pdf-binomial
    fn ([x & options])
    Returns the Binomial pdf of the given value of x. It will return a sequence
    of values, if x is a sequence. Same as R's dbinom
    Options:
    :size (default 1)
    :prob (default 1/2)
    See also:
    cdf-binomial and sample-binomial
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/Binomial.html
    http://en.wikipedia.org/wiki/Binomial_distribution
    http://en.wikipedia.org/wiki/Probability_density_function
    Example:
    (pdf-binomial 10 :prob 1/4 :size 20)
  55. pdf-chisq
    fn ([x & options])
    Returns the Chi Square pdf of the given value of x. It will return a sequence
    of values, if x is a sequence. Same as R's dchisq function.
    Options:
    :df (default 1)
    See also:
    cdf-chisq and sample-chisq
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/ChiSquare.html
    http://en.wikipedia.org/wiki/Chi_square_distribution
    http://en.wikipedia.org/wiki/Probability_density_function
    Example:
    (pdf-chisq 5.0 :df 2)
  56. pdf-exp
    fn ([x & options])
    Returns the Exponential pdf of the given value of x. It will return a sequence
    of values, if x is a sequence. Same as R's dexp
    Options:
    :rate (default 1)
    See also:
    cdf-exp and sample-exp
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/Exponential.html
    http://en.wikipedia.org/wiki/Exponential_distribution
    http://en.wikipedia.org/wiki/Probability_density_function
    Example:
    (pdf-exp 2.0 :rate 1/2)
  57. pdf-f
    fn ([x & options])
    Returns the F pdf of the given value, x. It will return a sequence
    of values, if x is a sequence. This is equivalent to R's df function.
    Options:
    :df1 (default 1)
    :df2 (default 1)
    See also:
    cdf-f and quantile-f
    References:
    http://en.wikipedia.org/wiki/F_distribution
    http://mathworld.wolfram.com/F-Distribution.html
    http://en.wikipedia.org/wiki/Probability_density_function
    Example:
    (pdf-f 1.0 :df1 5 :df2 2)
  58. pdf-gamma
    fn ([x & options])
    Returns the Gamma pdf for the given value of x. It will return a sequence
    of values, if x is a sequence. This is equivalent to R's dgamma function.
    Options:
    :shape (default 1)
    :rate (default 1)
    See also:
    cdf-gamma and sample-gamma
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/Gamma.html
    http://en.wikipedia.org/wiki/Gamma_distribution
    http://en.wikipedia.org/wiki/Probability_density_function
    Example:
    (pdf-gamma 10 :shape 1 :rate 2)
  59. pdf-neg-binomial
    fn ([x & options])
    Returns the Negative Binomial pdf of the given value of x. It will return a sequence
    of values, if x is a sequence. Same as R's dnbinom
    Options:
    :size (default 10)
    :prob (default 1/2)
    See also:
    cdf-neg-binomial and sample-neg-binomial
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/NegativeBinomial.html
    http://en.wikipedia.org/wiki/Negative_binomial_distribution
    http://en.wikipedia.org/wiki/Probability_density_function
    Example:
    (pdf-neg-binomial 10 :prob 1/2 :size 20)
  60. pdf-normal
    fn ([x & options])
    Returns the Normal pdf of the given value, x. It will return a sequence
    of values, if x is a sequence. This is equivalent to R's dnorm function.
    Options:
    :mean (default 0)
    :sd (default 1)
    See also:
    cdf-normal, quantile-normal, sample-normal
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/Normal.html
    http://en.wikipedia.org/wiki/Normal_distribution
    http://en.wikipedia.org/wiki/Probability_density_function
    Example:
    (pdf-normal 1.96 :mean -2 :sd (sqrt 0.5))
  61. pdf-poisson
    fn ([x & options])
    Returns the Poisson pdf of the given value of x. It will return a sequence
    of values, if x is a sequence. Same as R's dpois
    Options:
    :lambda (default 1)
    See also:
    cdf-poisson and sample-poisson
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/Poisson.html
    http://en.wikipedia.org/wiki/Poisson_distribution
    http://en.wikipedia.org/wiki/Probability_density_function
    Example:
    (pdf-poisson 5 :lambda 10)
  62. pdf-t
    fn ([x & options])
    Returns the Student's t pdf for the given value of x. It will return a sequence
    of values, if x is a sequence. Same as R's dt function.
    Options:
    :df (default 1)
    See also:
    cdf-t, quantile-t, and sample-t
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/StudentT.html
    http://en.wikipedia.org/wiki/Student-t_distribution
    http://en.wikipedia.org/wiki/Probability_density_function
    Example:
    (pdf-t 1.2 :df 10)
  63. pdf-uniform
    fn ([x & options])
    Returns the Uniform pdf of the given value of x. It will return a sequence
    of values, if x is a sequence. This is equivalent to R's dunif function.
    Options:
    :min (default 0)
    :max (default 1)
    See also:
    cdf-uniform and sample-uniform
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/DoubleUniform.html
    http://en.wikipedia.org/wiki/Uniform_distribution
    http://en.wikipedia.org/wiki/Probability_density_function
    Example:
    (pdf-uniform 5)
    (pdf-uniform 5 :min 1 :max 10)
  64. permute
    fn ([x] [x y])
    If provided a single argument, returns a permuted version of the
    given collection. (permute x) is the same as (sample x).
    If provided two arguments, returns two lists that are permutations
    across the given collections. In other words, each of the new collections
    will contain elements from both of the given collections. Useful for
    permutation tests or randomization tests.
    Examples:
    (permute (range 10))
    (permute (range 10) (range 10 20))
  65. predict
    fn ([model x])
    Takes a linear-model and an x value (either a scalar or vector)
    and returns the predicted value based on the linear-model.
  66. principal-components
    fn ([x & options])
    Performs a principal components analysis on the given data matrix.
    Equivalent to R's prcomp function.
    Returns:
    A map with the following fields:
    :std-dev -- the standard deviations of the principal components
    (i.e. the square roots of the eigenvalues of the correlation
    matrix, though the calculation is actually done with the
    singular values of the data matrix).
    :rotation -- the matrix of variable loadings (i.e. a matrix
    whose columns contain the eigenvectors).
    Examples:
    (use '(incanter core stats charts datasets))
    ;; load the iris dataset
    (def iris (to-matrix (get-dataset :iris)))
    ;; run the pca
    (def pca (principal-components (sel iris :cols (range 4))))
    ;; extract the first two principal components
    (def pc1 (sel (:rotation pca) :cols 0))
    (def pc2 (sel (:rotation pca) :cols 1))
    ;; project the first four dimensions of the iris data onto the first
    ;; two principal components
    (def x1 (mmult (sel iris :cols (range 4)) pc1))
    (def x2 (mmult (sel iris :cols (range 4)) pc2))
    ;; now plot the transformed data, coloring each species a different color
    (doto (scatter-plot (sel x1 :rows (range 50)) (sel x2 :rows (range 50))
    :x-label "PC1" :y-label "PC2" :title "Iris PCA")
    (add-points (sel x1 :rows (range 50 100)) (sel x2 :rows (range 50 100)))
    (add-points (sel x1 :rows (range 100 150)) (sel x2 :rows (range 100 150)))
    view)
    ;; alternatively, the :group-by option can be used in scatter-plot
    (view (scatter-plot x1 x2
    :group-by (sel iris :cols 4)
    :x-label "PC1" :y-label "PC2" :title "Iris PCA"))
    References:
    http://en.wikipedia.org/wiki/Principal_component_analysis
  67. product-marginal-test
    fn ([j])
    the joint PMF of independent variables is equal to the product of their marginal PMFs.
  68. quantile
    fn ([x & options])
    Returns the quantiles of the data, x. By default it returns the min,
    25th-percentile, 50th-percentile, 75th-percentile, and max value.
    Options:
    :probs (default [0.0 0.25 0.5 0.75 1.0])
    Examples:
    (quantile (sample-normal 100))
    (quantile (sample-normal 100) :probs [0.025 0.975])
    (quantile (sample-normal 100) :probs 0.975)
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/stat/tdouble/DoubleDescriptive.html
    http://en.wikipedia.org/wiki/Quantile
  69. quantile-normal
    fn ([probability & options])
    Returns the inverse of the Normal CDF for the given probability.
    It will return a sequence of values, if given a sequence of
    probabilities. This is equivalent to R's qnorm function.
    Options:
    :mean (default 0)
    :sd (default 1)
    Returns:
    a value x, where (cdf-normal x) = probability
    See also:
    pdf-normal, cdf-normal, and sample-normal
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/stat/tdouble/Probability.html
    http://en.wikipedia.org/wiki/Normal_distribution
    http://en.wikipedia.org/wiki/Quantile
    Example:
    (quantile-normal 0.975)
    (quantile-normal [0.025 0.975] :mean -2 :sd (sqrt 0.5))
  70. quantile-t
    fn ([probability & options])
    Returns the inverse of the Student's t CDF for the given probability
    (i.e. the quantile). It will return a sequence of values, if given
    a sequence of probabilities. This is equivalent to R's qt function.
    Options:
    :df (default 1)
    Returns:
    a value x, where (cdf-t x) = probability
    See also:
    pdf-t, cdf-t, and sample-t
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/stat/tdouble/Probability.html
    http://en.wikipedia.org/wiki/Student-t_distribution
    http://en.wikipedia.org/wiki/Quantile
    Example:
    (quantile-t 0.975)
    (quantile-t [0.025 0.975] :df 25)
    (def df [1 2 3 4 5 6 7 8 9 10 20 50 100 1000])
    (map #(quantile-t 0.025 :df %) df)
  71. rank-index
    fn ([x])
    Given a seq, returns a map where the keys are the values of the seq and the values are the positional rank of each member of the seq.
  72. sample
    fn ([x & options])
    Returns a sample of the given size from the given collection. If replacement
    is set to false it returns a set, otherwise it returns a list.
    Arguments:
    x -- collection to be sampled from
    Options:
    :size (default (count x)) -- sample size
    :replacement (default true) -- sample with replacement
    Examples:
    (sample (range 10)) ; permutation of numbers zero through nine
    (sample [:red :green :blue] :size 10) ; choose 10 items that are either :red, :green, or :blue.
    (sample (seq "abcdefghijklmnopqrstuvwxyz") :size 4 :replacement false) ; choose 4 random letters.
  73. sample-beta
    fn ([size & options])
    Returns a sample of the given size from a Beta distribution.
    This is equivalent to R's rbeta function.
    Options:
    :alpha (default 1)
    :beta (default 1)
    These default values produce a Uniform distribution.
    See also:
    pdf-beta and cdf-beta
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/Beta.html
    http://en.wikipedia.org/wiki/Beta_distribution
    Example:
    (sample-beta 1000 :alpha 1 :beta 2)
  74. sample-binomial
    fn ([size & options])
    Returns a sample of the given size from a Binomial distribution.
    Same as R's rbinom
    Options:
    :size (default 1)
    :prob (default 1/2)
    See also:
    pdf-binomial and cdf-binomial
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/Binomial.html
    http://en.wikipedia.org/wiki/Binomial_distribution
    Example:
    (sample-binomial 1000 :prob 1/4 :size 20)
  75. sample-chisq
    fn ([size & options])
    Returns a sample of the given size from a Chi Square distribution.
    Same as R's rchisq function.
    Options:
    :df (default 1)
    See also:
    pdf-chisq and cdf-chisq
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/ChiSquare.html
    http://en.wikipedia.org/wiki/Chi_square_distribution
    Example:
    (sample-chisq 1000 :df 2)
  76. sample-dirichlet
    fn ([size alpha])
    Examples:
    (use '(incanter core stats charts))
    ;; a total of 1447 adults were polled to indicate their preferences for
    ;; candidate 1 (y1=727), candidate 2 (y2=583), or some other candidate or no
    ;; preference (y3=137).
    ;; the counts y1, y2, and y3 are assumed to have a multinomial distribution
    ;; If a uniform prior distribution is assigned to the multinomial vector
    ;; theta = (th1, th2, th3), then the posterior distribution of theta is
    ;; proportional to g(theta) = th1^y1 * th2^y2 * th3^y3, which is a
    ;; dirichlet distribution with parameters (y1+1, y2+1, y3+1)
    (def theta (sample-dirichlet 1000 [(inc 727) (inc 583) (inc 137)]))
    ;; view means, 95% CI, and histograms of the proportion parameters
    (mean (sel theta :cols 0))
    (quantile (sel theta :cols 0) :probs [0.025 0.975])
    (view (histogram (sel theta :cols 0)))
    (mean (sel theta :cols 1))
    (quantile (sel theta :cols 1) :probs [0.025 0.975])
    (view (histogram (sel theta :cols 1)))
    (mean (sel theta :cols 2))
    (quantile (sel theta :cols 2) :probs [0.025 0.975])
    (view (histogram (sel theta :cols 2)))
    ;; view a histogram of the difference in proportions between the first
    ;; two candidates
    (view (histogram (minus (sel theta :cols 0) (sel theta :cols 1))))
  77. sample-exp
    fn ([size & options])
    Returns a sample of the given size from an Exponential distribution.
    Same as R's rexp
    Options:
    :rate (default 1)
    See also:
    pdf-exp and cdf-exp
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/Exponential.html
    http://en.wikipedia.org/wiki/Exponential_distribution
    Example:
    (sample-exp 1000 :rate 1/2)
  78. sample-gamma
    fn ([size & options])
    Returns a sample of the given size from a Gamma distribution.
    This is equivalent to R's rgamma function.
    Options:
    :shape (default 1)
    :rate (default 1)
    See also:
    pdf-gamma, cdf-gamma, and quantile-gamma
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/Gamma.html
    http://en.wikipedia.org/wiki/Gamma_distribution
    Example:
    (sample-gamma 1000 :shape 1 :rate 2)
  79. sample-inv-wishart
    fn ([& options])
    Returns a p-by-p symmetric matrix drawn from an inverse-Wishart distribution.
    Options:
    :p (default 2) -- number of dimensions of resulting matrix
    :df (default p) -- degrees of freedom (aka n), df >= p
    :scale (default (identity-matrix p)) -- positive definite matrix (aka V)
    Examples:
    (use 'incanter.stats)
    (sample-inv-wishart :df 10 :p 4)
    ;; calculate the mean of 1000 wishart matrices, should equal (mult df scale)
    (div (reduce plus (for [_ (range 1000)] (sample-wishart :p 4))) 1000)
    References:
    http://en.wikipedia.org/wiki/Inverse-Wishart_distribution
  80. sample-mvn
    fn ([size & options])
    Returns a sample of the given size from a Multivariate Normal
    distribution. This is equivalent to R's mvtnorm::rmvnorm function.
    Arguments:
    size -- the size of the sample to return
    Options:
    :mean (default (repeat (ncol sigma) 0))
    :sigma (default (identity-matrix (count mean)))
    Examples:
    (use '(incanter core stats charts))
    (def mvn-samp (sample-mvn 1000 :mean [7 5] :sigma (matrix [[2 1.5] [1.5 3]])))
    (covariance mvn-samp)
    (def means (map mean (trans mvn-samp)))
    ;; plot scatter-plot of points
    (def mvn-plot (scatter-plot (sel mvn-samp :cols 0) (sel mvn-samp :cols 1)))
    (view mvn-plot)
    ;; add centroid to plot
    (add-points mvn-plot [(first means)] [(second means)])
    ;; add regression line to scatter plot
    (def x (sel mvn-samp :cols 0))
    (def y (sel mvn-samp :cols 1))
    (def lm (linear-model y x))
    (add-lines mvn-plot x (:fitted lm))
    References:
    http://en.wikipedia.org/wiki/Multivariate_normal
  81. sample-neg-binomial
    fn ([size & options])
    Returns a sample of the given size from a Negative Binomial distribution.
    Same as R's rnbinom
    Options:
    :size (default 10)
    :prob (default 1/2)
    See also:
    pdf-neg-binomial and cdf-neg-binomial
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/NegativeBinomial.html
    http://en.wikipedia.org/wiki/Negative_binomial_distribution
    Example:
    (sample-neg-binomial 1000 :prob 1/2 :size 20)
  82. sample-normal
    fn ([size & options])
    Returns a sample of the given size from a Normal distribution.
    This is equivalent to R's rnorm function.
    Options:
    :mean (default 0)
    :sd (default 1)
    See also:
    pdf-normal, cdf-normal, quantile-normal
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/Normal.html
    http://en.wikipedia.org/wiki/Normal_distribution
    Example:
    (sample-normal 1000 :mean -2 :sd (sqrt 0.5))
  83. sample-permutations
    fn ([n x] [n x y])
    If provided two arguments (n x), it returns a list of n permutations
    of x. If provided three arguments (n x y), it returns a list of two lists, each with
    n permutations of the corresponding argument, where each permutation is drawn from the pooled arguments.
    Arguments:
    n -- number of randomized versions of the original two groups to return
    x -- group 1
    y -- (default nil) group 2
    Examples:
    (use '(incanter core stats))
    (sample-permutations 10 (range 10))
    (sample-permutations 10 (range 10) (range 10 20))
    ;; extended example with plant-growth data
    (use '(incanter core stats datasets charts))
    ;; load the plant-growth dataset
    (def data (to-matrix (get-dataset :plant-growth)))
    ;; break the first column of the data into groups based on treatment (second column).
    (def groups (group-by data 1 :cols 0))
    ;; define a function for the statistic of interest
    (defn means-diff [x y] (minus (mean x) (mean y)))
    ;; calculate the difference in sample means between the two groups
    (def samp-mean-diff (means-diff (first groups) (second groups))) ;; 0.371
    ;; create 500 permuted versions of the original two groups
    (def permuted-groups (sample-permutations 1000 (first groups) (second groups)))
    ;; calculate the difference of means of the 1000 samples
    (def permuted-means-diffs1 (map means-diff (first permuted-groups) (second permuted-groups)))
    ;; use an indicator function that returns 1 when the randomized means diff is greater
    ;; than the original sample means diff, and zero otherwise. Then take the mean of this
    ;; sequence of ones and zeros. That is the proportion of times you would see a value more
    ;; extreme than the sample means diff (i.e. the p-value).
    (mean (indicator #(> % samp-mean-diff) permuted-means-diffs1)) ;; 0.088
    ;; calculate the 95% confidence interval of the null hypothesis. If the
    ;; sample difference in means is outside of this range, that is evidence
    ;; that the two means are statistically significantly different.
    (quantile permuted-means-diffs1 :probs [0.025 0.975]) ;; (-0.606 0.595)
    ;; Plot a histogram of the permuted-means-diffs using the density option,
    ;; instead of the default frequency, and then add a normal pdf curve with
    ;; the mean and sd of permuted-means-diffs data for a visual comparison.
    (doto (histogram permuted-means-diffs1 :density true)
    (add-lines (range -1 1 0.01) (pdf-normal (range -1 1 0.01)
    :mean (mean permuted-means-diffs1)
    :sd (sd permuted-means-diffs1)))
    view)
    ;; compare the means of treatment 2 and control
    (def permuted-groups (sample-permutations 1000 (first groups) (last groups)))
    (def permuted-means-diffs2 (map means-diff (first permuted-groups) (second permuted-groups)))
    (def samp-mean-diff (means-diff (first groups) (last groups))) ;; -0.4939
    (mean (indicator #(< % samp-mean-diff) permuted-means-diffs2)) ;; 0.022
    (quantile permuted-means-diffs2 :probs [0.025 0.975]) ;; (-0.478 0.466)
    ;; compare the means of treatment 1 and treatment 2
    (def permuted-groups (sample-permutations 1000 (second groups) (last groups)))
    (def permuted-means-diffs3 (map means-diff (first permuted-groups) (second permuted-groups)))
    (def samp-mean-diff (means-diff (second groups) (last groups))) ;; -0.865
    (mean (indicator #(< % samp-mean-diff) permuted-means-diffs3)) ;; 0.002
    (quantile permuted-means-diffs3 :probs [0.025 0.975]) ;; (-0.676 0.646)
    (doto (box-plot permuted-means-diffs1)
    (add-box-plot permuted-means-diffs2)
    (add-box-plot permuted-means-diffs3)
    view)
    Further Reading:
    http://en.wikipedia.org/wiki/Resampling_(statistics)
  84. sample-poisson
    fn ([size & options])
    Returns a sample of the given size from a Poisson distribution.
    Same as R's rpois function.
    Options:
    :lambda (default 1)
    See also:
    pdf-poisson and cdf-poisson
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/Poisson.html
    http://en.wikipedia.org/wiki/Poisson_distribution
    Example:
    (sample-poisson 1000 :lambda 10)
  85. sample-t
    fn ([size & options])
    Returns a sample of the given size from a Student's t distribution.
    Same as R's rt function.
    Options:
    :df (default 1)
    See also:
    pdf-t, cdf-t, and quantile-t
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/StudentT.html
    http://en.wikipedia.org/wiki/Student-t_distribution
    Example:
    (sample-t 1000 :df 10)
  86. sample-uniform
    fn ([size & options])
    Returns a sample of the given size from a Uniform distribution.
    This is equivalent to R's runif function.
    Options:
    :min (default 0)
    :max (default 1)
    :integers (default false)
    See also:
    pdf-uniform and cdf-uniform
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/random/tdouble/DoubleUniform.html
    http://en.wikipedia.org/wiki/Uniform_distribution
    Example:
    (sample-uniform 1000)
    (sample-uniform 1000 :min 1 :max 10)
  87. sample-wishart
    fn ([& options])
    Returns a p-by-p symmetric matrix drawn from a Wishart distribution.
    Options:
    :p (default 2) -- number of dimensions of resulting matrix
    :df (default p) -- degrees of freedom (aka n), df >= p
    :scale (default (identity-matrix p)) -- positive definite matrix (aka V)
    Examples:
    (use 'incanter.stats)
    (sample-wishart :df 10 :p 4)
    ;; calculate the mean of 1000 wishart matrices, should equal (mult df scale)
    (div (reduce plus (for [_ (range 1000)] (sample-wishart :p 4))) 1000)
    References:
    http://en.wikipedia.org/wiki/Wishart_distribution#
  88. sd
    fn ([x])
    Returns the sample standard deviation of the data, x. Equivalent to
    R's sd function.
    Examples:
    (sd (sample-normal 100))
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/stat/tdouble/DoubleDescriptive.html
    http://en.wikipedia.org/wiki/Standard_deviation
  89. simple-regression
    fn ([y x & options])
    A stripped version of linear-model that returns a map containing only
    the coefficients.
  90. skewness
    fn ([x])
    Returns the skewness of the data, x. "Skewness is a measure of the asymmetry
    of the probability distribution of a real-valued random variable." (Wikipedia)
    Examples:
    (skewness (sample-normal 100000)) ;; approximately 0
    (skewness (sample-gamma 100000)) ;; approximately 2
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/stat/tdouble/DoubleDescriptive.html
    http://en.wikipedia.org/wiki/Skewness
  91. smooth-discrete-probs
    fn ([probs buckets])
    smooth a map of discrete probabilities.
    clear up any discrete steps that are missing and should be there.
    TODO: single class may have a spike of 100% probability.
  92. sorensen-index
    fn ([a b])
    http://en.wikipedia.org/wiki/S%C3%B8rensen_similarity_index#cite_note-4
    The Sørensen index, also known as Sørensen’s similarity coefficient, is a statistic used for comparing the similarity of two samples: QS = 2C / (A + B), where A and B are the species numbers in samples A and B, respectively, and C is the number of species shared by the two samples.
    The Sørensen index is identical to Dice's coefficient which is always in [0, 1] range. Sørensen index used as a distance measure, 1 − QS, is identical to Hellinger distance and Bray–Curtis dissimilarity.
    The Sørensen coefficient is mainly useful for ecological community data (e.g. Looman & Campbell, 1960). Justification for its use is primarily empirical rather than theoretical (although it can be justified theoretically as the intersection of two fuzzy sets). As compared to Euclidean distance, Sørensen distance retains sensitivity in more heterogeneous data sets and gives less weight to outliers.
    This function assumes you pass in a and b as sets.
    The Sørensen index extended to abundance instead of incidence of species is called the Czekanowski index.
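    As a rough illustration of the formula QS = 2C / (A + B), here is a minimal sketch in plain Clojure. The name sorensen-sketch is hypothetical and this is not Incanter's implementation; it only assumes, as the docstring says, that a and b are sets.

```clojure
(require '[clojure.set :as set])

;; Hypothetical helper, not Incanter's implementation.
;; QS = 2C / (A + B): A and B are the set sizes, C is the size of the intersection.
;; Assumes at least one of the sets is non-empty (otherwise this divides by zero).
(defn sorensen-sketch [a b]
  (/ (* 2 (count (set/intersection a b)))
     (+ (count a) (count b))))

;; Two species shared, set sizes 3 and 4.
(sorensen-sketch #{:oak :elm :ash} #{:elm :ash :fir :yew}) ;; => 4/7
```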
  93. spearmans-rho
    fn ([a b])
    http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
    In statistics, Spearman's rank correlation coefficient or Spearman's rho, is a non-parametric measure of correlation – that is, it assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any other assumptions about the particular nature of the relationship between the variables. Certain other measures of correlation are parametric in the sense of being based on possible relationships of a parameterised form, such as a linear relationship.
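    For data without ties, the coefficient can be computed from rank differences as 1 - 6 * sum(d^2) / (n * (n^2 - 1)). The sketch below uses hypothetical helper names (ranks, spearman-sketch) and assumes no tied values and at least two observations; it is an illustration, not Incanter's implementation.

```clojure
;; Hypothetical helpers, not Incanter's implementation.
;; Rank of each value, 1 for the smallest. Assumes no ties.
(defn ranks [xs]
  (let [order (zipmap (sort xs) (iterate inc 1))]
    (map order xs)))

;; rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the rank difference.
(defn spearman-sketch [a b]
  (let [n  (count a)
        d2 (reduce + (map (fn [ra rb] (let [d (- ra rb)] (* d d)))
                          (ranks a) (ranks b)))]
    (- 1 (/ (* 6 d2) (* n (- (* n n) 1))))))

;; A monotonic but non-linear relationship still gives a perfect score.
(spearman-sketch [1 2 3 4 5] [1 4 9 16 25]) ;; => 1
(spearman-sketch [1 2 3 4 5] [5 4 3 2 1])   ;; => -1
```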
  94. square-devs-from-mean
    fn ([x] [x m])
    takes either a sample or a sample and a precalculated mean.
    returns the squares of the difference between each observation and the sample mean.
  95. sum-of-square-devs-from-mean
    fn ([x] [x m])
    takes either a sample or a sample and a precalculated mean.
    returns the sum of the squares of the difference between each observation and the sample mean.
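    The two arities described for square-devs-from-mean and sum-of-square-devs-from-mean can be sketched in plain Clojure as follows. The *-sketch names are hypothetical and the code is only an illustration of the described behavior, not the library source.

```clojure
;; Hypothetical helpers, not Incanter's implementation.
(defn mean-sketch [x] (/ (reduce + x) (count x)))

;; One-argument arity computes the mean; two-argument arity reuses a given mean m.
(defn square-devs-sketch
  ([x] (square-devs-sketch x (mean-sketch x)))
  ([x m] (map (fn [v] (let [d (- v m)] (* d d))) x)))

(defn sum-square-devs-sketch
  ([x] (reduce + (square-devs-sketch x)))
  ([x m] (reduce + (square-devs-sketch x m))))

(square-devs-sketch [1.0 2.0 3.0 4.0])         ;; => (2.25 0.25 0.25 2.25)
(sum-square-devs-sketch [1.0 2.0 3.0 4.0])     ;; => 5.0
(sum-square-devs-sketch [1.0 2.0 3.0 4.0] 2.5) ;; => 5.0
```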
  96. sum-variance-test
    fn ([vs])
    the variance of the sum of n independent variables is equal to the sum of their variances.
    (sum-variance-test [[1 2 3 4] [1 2 3 4]]) -> 5/2
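    The documented result of 5/2 is consistent with summing the population variances (dividing by n) of the two samples. The sketch below works under that assumption; the names pop-variance and sum-of-variances are hypothetical, and the docstring does not say which variance estimator the library actually uses.

```clojure
;; Hypothetical helpers, not Incanter's implementation.
;; Population variance: mean squared deviation, dividing by n (an assumption).
(defn pop-variance [x]
  (let [n (count x)
        m (/ (reduce + x) n)]
    (/ (reduce + (map (fn [v] (let [d (- v m)] (* d d))) x)) n)))

;; Sum of the variances of each sample in vs.
(defn sum-of-variances [vs]
  (reduce + (map pop-variance vs)))

(pop-variance [1 2 3 4])                 ;; => 5/4
(sum-of-variances [[1 2 3 4] [1 2 3 4]]) ;; => 5/2
```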
  97. sweep
    fn ([x & options])
    Return an array obtained from an input array by sweeping out a
    summary statistic. Based on R's sweep function.
    Arguments:
    x is a sequence
    Options:
    :stat (default mean) the statistic to sweep out
    :fun (default minus) the function used to sweep the stat out
    Example:
    (use '(incanter core stats))
    (def x (sample-normal 30 :mean 10 :sd 5))
    (sweep x) ;; center the data around mean
    (sweep x :stat sd :fun div) ;; divide data by its sd
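    Conceptually, sweeping computes one summary statistic and then combines it with every element. A minimal plain-Clojure sketch of that idea, using the hypothetical name sweep-sketch and core functions in place of Incanter's mean and minus:

```clojure
;; Hypothetical helper, not Incanter's implementation.
;; Defaults mirror the documented ones: stat is the mean, fun is subtraction.
(defn sweep-sketch
  [x & {:keys [stat fun]
        :or   {stat (fn [xs] (/ (reduce + xs) (count xs)))
               fun  -}}]
  (let [s (stat x)]            ;; compute the statistic once
    (map (fn [v] (fun v s)) x)))

;; Center the data around its mean.
(sweep-sketch [1.0 2.0 3.0]) ;; => (-1.0 0.0 1.0)
;; Scale the data by its maximum.
(sweep-sketch [2.0 4.0 8.0] :stat (fn [xs] (apply max xs)) :fun /) ;; => (0.25 0.5 1.0)
```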
  98. t-test
    fn ([x & options])
    Argument:
    x : sample to test
    Options:
    :y (default nil)
    :mu (default (mean y) or 0) population mean
    :alternative (default :two-sided) other choices :less :greater
    :var-equal TODO (default false) variance equal
    :paired TODO (default false) paired test
    :conf-level (default 0.95) for returned confidence interval
    Examples:
    (t-test (range 1 11) :mu 0)
    (t-test (range 1 11) :mu 0 :alternative :less)
    (t-test (range 1 11) :mu 0 :alternative :greater)
    (t-test (range 1 11) :y (range 7 21))
    (t-test (range 1 11) :y (range 7 21) :alternative :less)
    (t-test (range 1 11) :y (range 7 21) :alternative :greater)
    (t-test (range 1 11) :y (conj (range 7 21) 200))
    References:
    http://en.wikipedia.org/wiki/T_test
    http://www.socialresearchmethods.net/kb/stat_t.php
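    For orientation, the one-sample t statistic is (mean - mu) / (sd / sqrt(n)), with sd the sample standard deviation. The sketch below computes only that statistic, under the hypothetical name t-statistic-sketch; it does not reproduce the p-value, confidence interval, or two-sample behavior of t-test.

```clojure
;; Hypothetical helper, not Incanter's implementation.
;; One-sample t statistic: (mean - mu) / (sd / sqrt(n)), with sd using n - 1.
(defn t-statistic-sketch [x mu]
  (let [n  (count x)
        m  (/ (reduce + x) (double n))
        ss (reduce + (map (fn [v] (let [d (- v m)] (* d d))) x))
        sd (Math/sqrt (/ ss (dec n)))]
    (/ (- m mu) (/ sd (Math/sqrt n)))))

;; Same data as the first t-test example: mean 5.5, n = 10.
(t-statistic-sketch (range 1 11) 0) ;; roughly 5.74
```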
  99. tabulate
    fn ([x & options])
    Cross-tabulates the values of the given numeric matrix.
    Returns a hash-map with the following fields:
    :table -- the table of counts for each combination of values,
    this table is only returned if x has two columns
    :levels -- a sequence of sequences, where each sequence lists
    the levels (possible values) of the corresponding
    column of x.
    :margins -- a sequence of sequences, where each sequence
    represents the marginal total for each level
    of the corresponding column of x.
    :counts -- a hash-map, where vectors of unique combinations
    of the cross-tabulated levels are the keys and the
    values are the total count of each combination.
    :N -- the grand-total for the contingency table
    Examples:
    (use '(incanter core stats))
    (tabulate [1 2 3 2 3 2 4 3 5])
    (tabulate (sample-poisson 100 :lambda 5))
    (use '(incanter core stats datasets))
    (def math-prog (to-matrix (get-dataset :math-prog)))
    (tabulate (sel math-prog :cols [1 2]))
    (def data (matrix [[1 0 1]
    [1 1 1]
    [1 1 1]
    [1 0 1]
    [0 0 0]
    [1 1 1]
    [1 1 1]
    [1 0 1]
    [1 1 0]]))
    (tabulate data)
    (def data (matrix [[1 0]
    [1 1]
    [1 1]
    [1 0]
    [0 0]
    [1 1]
    [1 1]
    [1 0]
    [1 1]]))
    (tabulate data)
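    To make the :counts, :levels, and :N fields concrete, here is a plain-Clojure sketch that derives comparable values for the two-column data above using clojure.core only. It illustrates the idea and is not how tabulate is implemented; the exact types and ordering Incanter returns are not shown here.

```clojure
;; Rows of the two-column example, as plain vectors.
(def rows [[1 0] [1 1] [1 1] [1 0] [0 0] [1 1] [1 1] [1 0] [1 1]])

;; Count of each unique combination of values (key order may vary when printed).
(def counts (frequencies rows)) ;; => {[1 0] 3, [1 1] 5, [0 0] 1}

;; Distinct values observed in each column.
(def levels
  (map (fn [i] (sort (distinct (map (fn [r] (nth r i)) rows))))
       [0 1])) ;; => ((0 1) (0 1))

;; Grand total of observations.
(def grand-total (reduce + (vals counts))) ;; => 9
```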
  100. tanimoto-coefficient
    fn ([a b])
    http://en.wikipedia.org/wiki/Jaccard_index
    The cosine similarity metric may be extended such that it yields the Jaccard coefficient in the case of binary attributes. This is the Tanimoto coefficient.
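    The extension mentioned above is usually written T(a, b) = a.b / (|a|^2 + |b|^2 - a.b). The sketch below assumes a and b are equal-length numeric vectors, which the docstring does not state; the names dot and tanimoto-sketch are hypothetical.

```clojure
;; Hypothetical helpers, not Incanter's implementation.
(defn dot [a b] (reduce + (map * a b)))

;; T(a, b) = a.b / (|a|^2 + |b|^2 - a.b)
;; Assumes at least one vector is non-zero (otherwise this divides by zero).
(defn tanimoto-sketch [a b]
  (let [ab (dot a b)]
    (/ ab (- (+ (dot a a) (dot b b)) ab))))

;; Binary attributes: 2 shared out of 4 present in either, i.e. the Jaccard index.
(tanimoto-sketch [1 1 0 1] [1 0 1 1]) ;; => 1/2
```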
  101. variance
    fn ([x])
    Returns the sample variance of the data, x. Equivalent to R's var function.
    Examples:
    (variance (sample-normal 100))
    References:
    http://incanter.org/docs/parallelcolt/api/cern/jet/stat/tdouble/DoubleDescriptive.html
    http://en.wikipedia.org/wiki/Sample_variance#Population_variance_and_sample_variance
  102. within
    fn ([z x y])
    y is within z of x in metric space.
[ - ] incanter.transformations
  1. all-keys
    fn ([m])
    returns a set of all the keys from an arbitrarily deeply nested map or seq of maps.
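    One way to picture the described behavior is a recursive walk that collects keys at every level. This sketch uses the hypothetical name all-keys-sketch and is not the library source.

```clojure
;; Hypothetical helper, not Incanter's implementation.
(defn all-keys-sketch [m]
  (cond
    ;; a map contributes its own keys plus the keys found in its values
    (map? m)        (reduce into (set (keys m)) (map all-keys-sketch (vals m)))
    ;; a seq of maps contributes the keys of each element
    (sequential? m) (reduce into #{} (map all-keys-sketch m))
    :else           #{}))

(all-keys-sketch {:a {:b 1 :c {:d 2}}}) ;; => #{:a :b :c :d}
(all-keys-sketch [{:a 1} {:b {:c 2}}])  ;; => #{:a :b :c}
```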
  2. bottom-level?
    fn ([m])
    given a map; is this the bottom level in the map?
    (bottom-level? {:a 1}) -> true
    (bottom-level? {:a {:b 1}}) -> false
  3. flatten-with
    fn ([f nested-map])
    takes an arbitrarily deeply nested map, and flattens it to one level by merging keys.
    (flatten-with str {:a {:b {:c 1}}}) -> {":a:b:c" 1}
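    The example result can be reproduced by tracking the path of keys down to each leaf and applying f to that path. The sketch below does this under the hypothetical name flatten-with-sketch; whether the library applies f to the whole path at once or pairwise is an assumption here.

```clojure
;; Hypothetical helper, not Incanter's implementation.
(defn flatten-with-sketch
  ([f nested-map] (flatten-with-sketch f nested-map []))
  ([f m path]
   (reduce-kv (fn [acc k v]
                (let [p (conj path k)]
                  (if (map? v)
                    ;; descend, carrying the key path so far
                    (merge acc (flatten-with-sketch f v p))
                    ;; leaf: merge the key path into a single key with f
                    (assoc acc (apply f p) v))))
              {}
              m)))

(flatten-with-sketch str {:a {:b {:c 1}}}) ;; => {":a:b:c" 1}
```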
  4. key-compare
    fn ([x y])
  5. kv-compare
    fn ([[k1 v1] [k2 v2]])
  6. levels-deep
    fn ([m])
    returns the number of levels of depth of nesting for a nested map.
    1 -> 0
    {} -> 0
    {0 1} -> 1
    {1 {0 1}} -> 2
    ...
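    The listed cases are consistent with a simple recursion: a non-map or empty map has depth 0, and a non-empty map has depth one more than its deepest value. A sketch under the hypothetical name levels-deep-sketch:

```clojure
;; Hypothetical helper, not Incanter's implementation.
(defn levels-deep-sketch [m]
  (if (and (map? m) (seq m))
    (inc (apply max (map levels-deep-sketch (vals m))))
    0))

(levels-deep-sketch 1)         ;; => 0
(levels-deep-sketch {})        ;; => 0
(levels-deep-sketch {0 1})     ;; => 1
(levels-deep-sketch {1 {0 1}}) ;; => 2
```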
  7. map-compare
    fn ([k])
  8. map-from-keys
    fn ([a f])
  9. map-from-nested-map
    fn ([a f])
  10. map-from-pairs
    fn ([a f])
  11. map-map
    fn ([f x])
  12. same-length?
    fn ([a b])
  13. set-to-unit-map
    fn ([s])
  14. sort-map
    fn ([m])
  15. sort-map-of-maps
    fn ([m])
  16. sort-maps-by
    fn ([k maps])
  17. table-to-vectors
    fn ([z])
    takes a flat vector in which the elements of two vectors alternate,
    and splits it into the two individual vectors.
    [106 7
    86 0
    100 27
    101 50
    99 28
    103 29
    97 20
    113 12
    112 6
    110 17]
    ->
    [[106 86 100 101 99 103 97 113 112 110] [7 0 27 50 28 29 20 12 6 17]]
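    The split shown above amounts to taking the elements at even and odd positions. A plain-Clojure sketch under the hypothetical name table-to-vectors-sketch, not the library source:

```clojure
;; Hypothetical helper, not Incanter's implementation.
;; Elements at even indices form the first vector, odd indices the second.
(defn table-to-vectors-sketch [z]
  [(vec (take-nth 2 z))
   (vec (take-nth 2 (rest z)))])

(table-to-vectors-sketch [106 7
                          86 0
                          100 27]) ;; => [[106 86 100] [7 0 27]]
```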