Lag Is Never Where You Want It… Or Don’t Want It

RMAG news

I had a recent issue with my raycasting engine (originally made here that was really annoying me. I was drawing the walls just fine, however, I started drawing a floor and ceiling and my performance dropped from 60fps to under 10fps.

Locating The Problem (I Thought)

I had a few things going for me, one was that I could easily comment out the rendering for the ceiling and the floor independently.
From doing this, I found that I could render the floor and the walls at the 60fps I wanted (or ceiling and walls, but that doesn’t make much sense).
Well obviously that means that the functions that I use for drawing the floors and ceilings are causing the lag.
Now here is the code for that (I think, could be slightly off due to it being a past version, but in general, this is the idea).

func (c *Camera) drawCeil(g *Game, screen *ebiten.Image, pw, ph int) {
hw := screenWidth / 2
hh := screenHeight / 2
ht := tileSize / 2

// Magic value for correctly mapping to
// screen
d := float64(hw) / math.Tan(toRadians(c.fov/2))

// The image we will eventually draw to
// and the pixels that we will be
// drawing to that image
ceilImage := ebiten.NewImage(screenWidth, screenHeight/2)
ceilPixels := make([]byte, screenWidth*hh*4)

// The image that we will be projecting
// onto the ceiling
imgPixels := g.levels[g.curLevel].ceilImage
imgW := g.levels[g.curLevel].ceilImageWidth
imgH := g.levels[g.curLevel].ceilImageHeight

// Every row (literal pixels)
for row := 0; row < hh; row += ph {
// Calculate the angle if you were to
// cast a ray from the camera to that
// position on screen
ra2 := toDegrees(math.Atan(float64(row) / d))

// Keeps the angle between 0 and 360
ra2 = boundAngle(ra2)

// Ever col (literal pixels)
for ri := hw; ri < hw; ri += pw {
ra := toDegrees(math.Atan(float64(ri) / d))
ra = boundAngle(ra)

z := float64(ht) c.z

// We need to figure out the x and y
// coordinates of a line to the floor

disX := z / math.Tan(toRadians(ra2))
disY := disX * math.Tan(toRadians(ra))
dis := math.Sqrt(math.Pow(disX, 2) + math.Pow(disY, 2))

// Now we need to shift everything
// relative to the camera
a := ra + c.angleH
a = toRadians(boundAngle(a))
x := math.Cos(a)*dis + (c.x / 2)
y := math.Sin(a)*dis + (c.y / 2)

// Make sure that the value we get is
// within a workable range
subX := int(x) % ht
subY := int(y) % ht

// Same as before but for negatives
for subX < 0 {
subX += ht
}
for subY < 0 {
subY += ht
}

// Map that value to refer to pixel
// coordinates on the ceiling image
subX = subX * imgW / ht
subY = subY * imgH / ht

// Get the pixel corresponding on the
// ceiling image
pixel := imgPixels[(subY*imgW+subX)*4 : (subY*imgW+subX)*4+3]

// Shade it (further is darker)
pixelColor := c.applyDistanceByte(pixel, dis+tileSize)

// If we have bigger pixel sizes, draw
// the entire pixel
for i := range pw {
for j := range ph {
// Apply this to the final image
writeIndex := ((hhrowph+j)*screenWidth + (ri + hw + i)) * 4
copy(ceilPixels[writeIndex:writeIndex+4], pixelColor)
}
}
}
}

// Draw the pixels we calculated
// onto the screen
ceilImage.WritePixels(ceilPixels)
screen.DrawImage(ceilImage, nil)
}

Yeah, I know, some scary code, but for the most part it’s not as bad as you think and you don’t have to stress much about the implementation. What you do have to keep in mind is that we want this function to be called 60 times a second.

Blame The GPU

Reading ebiten docs and source code they very clearly state that writing RGBA bytes to an image is slow, and shouldn’t really be done every frame, at least many times per frame. This is where I thought my performance was being lost, and it sent me looking for answers for ages. Turns out, I was wrong and right.

Further Digging

I tool I had written into my code you may have noticed is that I can change the width and height of pixels by will. This is extremely useful because lower resolutions allow me to save on calculations, and therefore gives me better performance.
I decided that I would put a counter on every time I used an expensive maths function, and change the values of pixel width and height so I could see how they changed.
I also restarted the counters every frame so I was getting the calls per frame.

# fastInvSqrt
PixelWidth: 4
Calls: 153,920
PixelWidth: 2
Calls: 615,040
PixelWidth: 1
Calls: 2,458,880

# Sqrt
PixelWidth: 4
Calls: 51,698
PixelWidth: 2
Calls: 205,797
PixelWidth: 1
Calls: 821,025

# Pow
PixelWidth: 4
Calls: 103,396
PixelWidth: 2
Calls: 411,594
PixelWidth: 1
Calls: 1,642,050

# sin
PixelWidth: 4
Calls: 51,200
PixelWidth: 2
Calls: 204,800
PixelWidth: 1
Calls: 819,200

# cos
PixelWidth: 4
Calls: 515,20
PixelWidth: 2
Calls: 205,440
PixelWidth: 1
Calls: 820,480

# tan
PixelWidth: 4
Calls: 103,043
PixelWidth: 2
Calls: 410,883
PixelWidth: 1
Calls: 1,640,963

# atan
PixelWidth: 4
Calls: 51,680
PixelWidth: 2
Calls: 205,760
PixelWidth: 1
Calls: 821,120

You may notice straight away that I was calling fastInvSqrt over 2 million times every single frame. Now such a high number compared to the others actually makes sense. I call this function for each R, G, B and A for every pixel, so it is going to get absolutely abused. Here is the code I was originally using.

func fastInvSqrt(x float32) float32 {
i := *(*int32)(unsafe.Pointer(&x))
i = 0x5f3759df i>>1
y := *(*float32)(unsafe.Pointer(&i))
//return y * (1.5 – 0.5*x*y*y)
return y
}

Yes, it’s the fast inverse square root from quake 3, but because I was really worried about performance I commented out the first approximation so that it’s doing less work.
Now I needed a way to make this even faster. So, I decided to look at what inputs I was getting, and a printed them all out as integers. Doing this, I found that I was always under 255. This is great because it means I can use a single byte later. So how did I use this information to improve the performance?

var fastInvSqrtMap map[byte]float32 = make(map[byte]float32, 255)

func fastInvSqrt(x float32) float32 {
return fastInvSqrtMap[byte(x)]
}

func actualFastInvSqrt(x float32) float32 {
i := *(*int32)(unsafe.Pointer(&x))
i = 0x5f3759df i>>1
y := *(*float32)(unsafe.Pointer(&i))
return y * (1.5 0.5*x*y*y)
}

func initialiseMap() {
var i byte
for i = 0; i < 255; i++ {
fastInvSqrtMap[i] = actualFastInvSqrt(float32(i))
}
}

What I do now is call initialiseMap at the very start of the game once. Looks like a great optimization right? Well, I noticed a frame or 2 of performance, but still sub-10fps.

Sin, Cos, Tan

The truth is you don’t need the absolute precision of the in-built trigonometric functions. This is similar to how NASA only used 16 digits of pi. With this knowledge I gathered I could get away with rounding my angles to the nearest integer and just using that.

var cosMap [360]float64
var sinMap [360]float64
var tanMap [360]float64

func Cos(x float64) float64 {
return cosMap[int(x)]
}

func Sin(x float64) float64 {
return sinMap[int(x)]
}

func Tan(x float64) float64 {
return tanMap[int(x)]
}

func initialiseMap() {
var i byte
for i = 0; i < 255; i++ {
fastInvSqrtMap[i] = actualFastInvSqrt(float32(i))
}

for i := 0; i < 360; i++ {
cosMap[i] = math.Cos(ToRadians(float64(i)))
sinMap[i] = math.Sin(ToRadians(float64(i)))
tanMap[i] = math.Tan(ToRadians(float64(i)))
}
}

Now we’re cutting a few more corners, but, my program never has to calculate sin, cos, or tan ever again! This also increased my performance slightly by a frame or 2. You might think I wasted my time doing this, but I really don’t think so. Just imagine a case where I’m using these function billions of times, whether it’s because of a larger screen resolution, or I’m working on a different problem entirely, then it’s great that I have this in the back of my mind.

Out of the Loop

This is where I gain the rest of my performance back, and it kind of me but this comes in 2 stages. It turns out that the Atan and Sqrt I use in drawing the ceiling are particularly slow, because they’re being called for every pixel on the screen. Stage 1 was literally just moving the code out of the loop.
This still didn’t fix everything, and now for stage 2 we need to figure out how to make these function calls easier.
The problem is that we need to use these to find out the angle of each ray and how far they travel. This is when I realized I can also pre-calculate this at the start of the game.

type AngleDataPoint struct {
ra float64
dis float64
}

type AngleDataRow struct {
points []AngleDataPoint
}

var angleCacheData []AngleDataRow

func (g *Config) InitFloorCeilAngleCache(c *Camera) {
hw := g.HalfWidth
hh := g.HalfHeight
ht := g.TileSize / 2

pw := g.PixelWidth
ph := g.PixelHeight

angleCacheData = make([]AngleDataRow, hh)

d := float64(hw) / Tan(c.Fov/2)

z := (float64(ht) c.Pos.Z) / 2

for row := 0; row < hh; row += ph {
angleCacheData[row] = AngleDataRow{
points: make([]AngleDataPoint, g.ScreenWidth),
}

disX := z / (float64(row) / d)

for ri := hw; ri < hw; ri += pw {
// We need to figure out the x and y coordinates of a line to the floor.
ra := ToDegrees(math.Atan(float64(ri) / d))
ra = BoundAngle(ra)

disY := disX * (float64(ri) / d)
dis := Hyp(disX, disY)

angleCacheData[row].points[hw+ri] = AngleDataPoint{
ra: ra,
dis: dis,
}
}
}
}

Now we’re doing a lot less in the actual drawing, which is below.

func (c *Camera) drawCeil(g *Config, screen *BoundsImage, ceil BoundsImage, pw, ph int) {
hw := g.HalfWidth
hh := g.HalfHeight
ht := g.TileSize / 2

for row := 0; row < hh; row += ph {
rowCache := angleCacheData[row]

for ri := hw; ri < hw; ri += pw {
point := rowCache.points[hw+ri]
ra := point.ra
dis := point.dis

a := ra + c.AngleH
a = BoundAngle(a)
x := Cos(a)*dis + (c.Pos.X / 2)
y := Sin(a)*dis + (c.Pos.Y / 2)

subX := int(x) % ht
subY := int(y) % ht

for subX < 0 {
subX += ht
}
for subY < 0 {
subY += ht
}

subX = subX * ceil.ImgW / ht
subY = subY * ceil.ImgH / ht

pixel := ceil.ImgPixels[(subY*ceil.ImgW+subX)*4 : (subY*ceil.ImgW+subX)*4+3]
pixelColor := c.applyDistanceByte(g, pixel, dis+float64(g.TileSize))

screen.DrawRect(ri+hw, hhrowph, pw, ph, pixelColor)
}
}
}

Notice that Sin and Cos aren’t math.Sin and math.Cos, that’s because these functions are calling the cached values.

Code Is Now Fast… Barely

You may think that after all that and getting back up to 60fps everything is fine, but I can tell it isn’t. This is because of an assumption we made right at the beginning. And that was that only drawing the ceiling or floor was causing the problem. Turns out, the code was able to draw both the ceiling and floor at 60fps if the walls weren’t being drawn. Well, that means that there’s some point in the raycasting code that isn’t great and can definitely be optimized. I think it definitely has to do with the use of square root to find the distance, which is going to be slow, but I’ll have to work on a solution for that later.

Conclusion

I’ve been chasing this issue for quite a while now so I’m happy I’ve somewhat fixed it. In general if you’re finding that you’ve come across a performance issue, it’s good if you’ve created small tools in your codebase like I had so that you can easily find a possible cause.
One thing that I messed up was putting a calculation inside of a loop that didn’t need to be. This is a common mistake and you should always be watching out for it.
Otherwise, I would lastly like to say that once my raycasting engine is all well and done I will be making a full tutorial on how it works down to the last line so that anybody can make it. Obviously, I can’t do that if it’s not able to run well, so I’ll make sure to fix that constantly.

Leave a Reply

Your email address will not be published. Required fields are marked *