GOAL: simulate data for testing.

Simulate rolling a die 4 times:

```r
sample(1:6, 4, replace = TRUE)
```

`replace = TRUE` puts each value back after it is drawn (sampling with replacement), so the same face can come up more than once.  
You can also set the probability of each outcome.  
E.g., flip a coin 100 times with a 30% probability of tails (0) and 70% of heads (1):

```r
sample(c(0, 1), 100, replace = TRUE, prob = c(0.3, 0.7))
```
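Here is a small sketch of what `replace` changes. Without replacement, each face can appear at most once, so drawing 6 values gives a permutation of `1:6`. With replacement, repeats are allowed. The last lines check that the simulated coin roughly matches the requested probabilities. The seed and the sample size of 10000 are arbitrary choices for illustration.

```r
set.seed(1)   # arbitrary seed, only to make this sketch reproducible

# Without replacement: every face appears exactly once (a permutation of 1:6)
no_repl <- sample(1:6, 6, replace = FALSE)
sort(no_repl)                                  # always 1 2 3 4 5 6

# With replacement: the same face can come up several times
with_repl <- sample(1:6, 6, replace = TRUE)
table(with_repl)

# sample(1:6, 10) without replacement would stop with an error:
# you cannot draw more values than the population contains

# Biased coin: the share of 1s (heads) should be close to 0.7
coin <- sample(c(0, 1), 10000, replace = TRUE, prob = c(0.3, 0.7))
mean(coin)
```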

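Each probability distribution has its own random-generation function, named `r` followed by the distribution name (e.g. `rbinom()`, `rnorm()`, `rpois()`).  
E.g., binomial (the same coin as in the previous example):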

```r
rbinom(1, size = 100, prob = 0.7)   # number of heads in 100 flips, P(heads) = 0.7
rbinom(100, size = 1, prob = 0.7)   # 0/1 outcome of each of 100 single flips
```
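The two calls describe the same experiment from different angles. Summing the 0/1 outcomes of 100 single flips gives the number of heads, and that count has the same Binomial(100, 0.7) distribution as one `rbinom(1, size = 100, prob = 0.7)` draw. The sketch below compares the two approaches over 1000 repetitions. The seed and the number of repetitions are arbitrary.

```r
set.seed(42)   # arbitrary seed for reproducibility

# 100 single flips (0 = tails, 1 = heads), then count the heads
flips <- rbinom(100, size = 1, prob = 0.7)
heads <- sum(flips)
heads

# Repeat both approaches 1000 times; both averages should be close to 100 * 0.7 = 70
direct  <- rbinom(1000, size = 100, prob = 0.7)
# 100 x 1000 matrix of single flips: each column is one experiment of 100 flips
via_sum <- colSums(matrix(rbinom(100 * 1000, size = 1, prob = 0.7), nrow = 100))
c(mean(direct), mean(via_sum))
```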

E.g., normal:

```r
rnorm(10)                       # 10 random numbers from the standard normal (mean 0, sd 1)
rnorm(10, mean = 100, sd = 25)  # 10 random numbers from a normal with mean 100 and sd 25
```
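This sketch checks the parameters. With a large sample, the empirical mean and standard deviation should be close to the values passed to `rnorm()`, and about 95% of the values should fall within 1.96 standard deviations of the mean. The sample size and seed are arbitrary.

```r
set.seed(7)   # arbitrary seed

x <- rnorm(100000, mean = 100, sd = 25)
mean(x)   # close to 100
sd(x)     # close to 25

# Share of values within mean +/- 1.96 sd: close to 0.95
mean(abs(x - 100) < 1.96 * 25)
```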

`replicate()` repeats an expression n times.  
E.g., simulate 100 groups of random numbers, each made of 5 values drawn from a Poisson distribution with mean 10:

```r
replicate(100, rpois(5, 10))   # 5 x 100 matrix: one column per group

# colMeans() returns the mean of each column (group):
colMeans(replicate(100, rpois(5, 10)))

# histogram: the group means are approximately normally distributed (central limit theorem)
hist(colMeans(replicate(100, rpois(5, 10))))
```
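Each `replicate()` call draws new random numbers, so the matrix, the means and the histogram each describe a different simulation. The sketch below stores the matrix once, so the means and the histogram come from the same data. The variable names `groups` and `group_means` and the seed are illustrative. As a reference point, the spread of the means should be near sqrt(10 / 5), because the variance of the mean of 5 Poisson(10) values is 10 / 5.

```r
set.seed(10)   # arbitrary seed

groups <- replicate(100, rpois(5, 10))   # 5 x 100 matrix, one column per group
dim(groups)                              # 5 100

group_means <- colMeans(groups)          # 100 sample means
mean(group_means)                        # close to the Poisson mean, 10
sd(group_means)                          # close to sqrt(10 / 5), about 1.41

hist(group_means, main = "Means of 5 Poisson(10) draws", xlab = "group mean")
```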

`set.seed()` makes random draws reproducible:

```r
set.seed(125)   # 125 is an arbitrary integer; any value works
sample(1:6, 4, replace = TRUE)
```

Every time you run `set.seed(125)` followed by this `sample()` call, you get the same four values.
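You can check this by resetting the same seed before each draw and comparing the results with `identical()`. Without a reset, the next call continues the random stream and will usually return different values. The specific numbers can differ between R versions: R 3.6.0 changed the default algorithm used by `sample()`. Reproducibility is therefore only guaranteed under the same R version and RNG settings.

```r
set.seed(125)
first <- sample(1:6, 4, replace = TRUE)

set.seed(125)                              # reset to the same seed
second <- sample(1:6, 4, replace = TRUE)

identical(first, second)                   # TRUE: same seed, same draws

third <- sample(1:6, 4, replace = TRUE)    # no reset: the random stream continues
third
```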

R simulations

data engineer • T-SQL developer • r and python • Power BI and tableau • Azure • machine learning enthusiast • data lover