Implementing convincing 2D visuals in a 3D game

Ok, so here's a weird idea for a graphics concept.

There have been lots of 3d games that use 2d sprites or attempt to integrate the two in some way (like Ragnarok Online), and there have been lots of games that make an effort to make the 3d look "2d-ish" or "toon-y" (like Zelda), but what I've never seen is a 3d game that tries to directly emulate the specific constraints of 2d sprite graphics… in 3d. In other words, instead of pursuing the eternal goal of 2d, which is to emulate 3d effects, this is a 3d system that pursues the goal of emulating 2d effects.

To accomplish this goal the system follows certain steps:

1. No perspective.

Every sprite-based 2d game is in orthographic projection (everything is treated as equidistant to the camera, so nothing shrinks with distance), so that's what this system uses as well at its base.

The terrain is displayed close to overhead and created with orthographic compensation in mind (northern edges of cliffs have more prominent slopes than southern edges). The models standing on the terrain are rotated in world space so they are displayed at an angle. This emulates how a sprite in a 2d game is drawn, where perspective is tweaked locally, like how a sprite in a SNES RPG is more at an angle in relation to the camera than it actually should be in an overhead view. The person making the model locks it in at a certain angle based on artistic consideration.
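To make the tilt idea concrete, here is a minimal sketch in Python (not shader code, and not the actual implementation). The axis convention, the function names `tilt_model` and `project_orthographic`, and the example angles are all assumptions made for illustration. It shows that under an overhead orthographic camera an upright model collapses to almost nothing on screen, while the same model leaned back shows its full body.

```python
import math

def tilt_model(vertex, tilt_deg):
    """Lean a model-space vertex back about the x axis (top moves away from the viewer).

    Assumed axes: x = right, y = up, z = toward the viewer (south).
    """
    x, y, z = vertex
    t = math.radians(tilt_deg)
    return (x,
            y * math.cos(t) + z * math.sin(t),
            -y * math.sin(t) + z * math.cos(t))

def project_orthographic(vertex, camera_pitch_deg):
    """Project a point to screen space with no perspective divide.

    camera_pitch_deg = 90 is straight overhead, 0 is a pure side view.
    Distance from the camera never changes the on-screen size.
    """
    x, y, z = vertex
    p = math.radians(camera_pitch_deg)
    return (x, y * math.cos(p) - z * math.sin(p))

# Top of a 2-unit-tall character standing at the origin.
head = (0.0, 2.0, 0.0)

# Upright model under an overhead camera: the head lands on top of the feet.
flat = project_orthographic(head, 90)

# Same model leaned back 60 degrees: the body becomes visible on screen.
leaned = project_orthographic(tilt_model(head, 60), 90)
```

Here `flat[1]` comes out at roughly zero and `leaned[1]` at roughly 1.73, which is the "top of a head" versus "visible body" difference in numbers.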

Then, to reintroduce the "false" vanishing-point perspective that sprite artists use in their art (imagine a tower that is drawn narrower at its base than at its top), one of two options is selected: either the artist accounts for this distortion when creating the model by simply adjusting it along its height, or a simple vertex shader takes the boundaries and position of the model and then narrows it slightly at the base and expands it slightly at the top.

If the model needs to look larger because it stands at an elevation (closer to the camera), this shader resizes the model itself, since an orthographic projection will not do it automatically.
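The shader option could look roughly like the following, written as plain Python over a vertex list rather than as real shader code. This is only a sketch: the name `taper_model`, the linear taper, and the `elevation` / `scale_per_unit` parameters are assumptions for illustration, and a real version would run per vertex on the GPU.

```python
def taper_model(vertices, taper=0.1, elevation=0.0, scale_per_unit=0.0):
    """Return a copy of `vertices` narrowed at the base and widened at the top.

    vertices: list of (x, y, z) in model space, with y = up.
    taper: fraction to shrink at the lowest point and grow at the highest.
    elevation, scale_per_unit: optional uniform scale faking "closer to camera".
    """
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    zs = [v[2] for v in vertices]
    # Scale horizontally around the centre of the bounding box.
    cx = (min(xs) + max(xs)) / 2.0
    cz = (min(zs) + max(zs)) / 2.0
    y_min = min(ys)
    height = max(ys) - y_min
    size = 1.0 + elevation * scale_per_unit
    out = []
    for x, y, z in vertices:
        # t runs from 0 at the base to 1 at the top (0.5 for a flat model).
        t = (y - y_min) / height if height > 0 else 0.5
        s = (1.0 - taper + 2.0 * taper * t) * size
        out.append((cx + (x - cx) * s,
                    y_min + (y - y_min) * size,
                    cz + (z - cz) * s))
    return out

# A crude "tower": two base corners and two top corners.
tower = [(-1.0, 0.0, -1.0), (1.0, 0.0, -1.0), (-1.0, 2.0, 1.0), (1.0, 2.0, 1.0)]
tapered = taper_model(tower, taper=0.1)
```

With `taper=0.1` the base corners move in to about 0.9 of their original distance from the centre and the top corners move out to about 1.1, while heights stay untouched unless an elevation scale is given.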

2. No infinite resolution of animations/directions.

One thing that differs widely between 3d and 2d art is animation: in 3d, animations are made from keyframes that are interpolated algorithmically, while in 2d each frame represents a significant amount of work. This means there are hard limits to how fluid 2d animations can be, even at extremely high resource levels. To emulate this property I suggest the following:

Instead of the traditional way of creating 3d animations, the properties of 2d frame-by-frame animation are recreated by doing direct keyframe animation and actively avoiding interpolation. Instead of creating an animation from a certain set of keyframes and then interpolating between them, the model is fixed in frame positions by the artist and the transitions between them are instant. A person with experience in 2d animation can be brought in to look over such frames and adjust them according to what is pleasing to the eye within the restrictions of 2d "stop-motion".
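As a small illustration of "no interpolation", the playback side can be as simple as picking whichever authored pose is current and holding it. This is a sketch only; `held_frame`, the `fps` parameter and the pose names are made up for the example, and a pose here is just any object the artist authored.

```python
import math

def held_frame(poses, time_s, fps, loop=True):
    """Return the authored pose shown at `time_s`, with no blending between poses."""
    if not poses:
        raise ValueError("poses must not be empty")
    index = math.floor(time_s * fps)
    if loop:
        index %= len(poses)
    else:
        # Clamp so a one-shot animation holds on its last pose.
        index = max(0, min(index, len(poses) - 1))
    return poses[index]

# Four hand-posed frames of a walk cycle, played at 8 poses per second.
walk = ["contact", "down", "pass", "up"]
pose_now = held_frame(walk, 0.3, fps=8)
```

The render loop can still run at any frame rate; the model simply stays frozen in one pose until the next one is due, which is where the "snap" of sprite animation comes from.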

Likewise, the number of directions in which a sprite in a 2d environment can be displayed is restricted by the amount of work it takes to draw individual sprites for each direction. This is emulated by locking models into a similarly limited set of directions, like the cardinal/ordinal directions (8) or even just the cardinal directions (4). Just like in a 2d system, instead of turning the models with infinite resolution, they will just "snap" into place along these predefined directions.
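The snapping itself is a one-liner in spirit. Below is a minimal sketch, assuming headings are given in degrees and that direction 0 sits at 0 degrees; the function name `snap_angle` is hypothetical.

```python
import math

def snap_angle(angle_deg, directions=8):
    """Snap a free heading in degrees to the nearest of `directions` evenly spaced headings."""
    step = 360.0 / directions
    # floor(x + 0.5) rounds halves up consistently, unlike Python's round().
    index = math.floor(angle_deg / step + 0.5) % directions
    return index * step

facing = snap_angle(30)          # nearest of 8 directions
facing_4 = snap_angle(100, 4)    # nearest of 4 directions
```

The game logic can keep a continuous heading internally and only pass the snapped value to the renderer, so movement stays smooth while the model only ever shows one of the authored directions.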

3. No shared lighting.

Another difference between 2d and 3d art is how lighting works. Because 2d relies on orthographic projection and each frame needs to be drawn individually, there can be no proper local illumination. Attempts have been made over the years to combine 3d lighting with 2d art, but to my eye the outcome always looks strange and unappealing. 3d art, on the other hand, has to follow other constraints: because of its free rotation and infinite resolution in animation, the kind of pleasing hand-drawn shadows you would see on a sprite is extremely impractical, and the common trend is to either generate lighting entirely or texture it in very generically in a top-down style. This style, while appealing enough, always ends up much blander than properly drawn 2d sprite lighting because it has to work "any which way."

To recreate the properties of 2d light and shadow on models, there is no local illumination. Instead, each model is lit individually but in the same fashion. Just like in 2d art, the properties of the lighting are selected once for the whole project and then applied equally to every model: an orthographic global illumination. Again, a person with experience in 2d art can go over the models and make small adjustments so the lighting is as appealing as possible by 2d aesthetics. It follows that each model will have its own instanced lighting system, whose effects can either be baked at load time for static models, or exist as dynamic, individual systems that only affect local space for dynamic models (like models that need to change gear).

Cast shadows would probably have to be done in much the same way as in 2d sprite environments: as generic blobs (most likely the superior idea), programmatically generated in some way and transposed from local to global space, or specifically modeled for each set of directions. Likewise, if local light sources are required (like a dungeon torch or something), the implementation needs to be "faked" in the same way you'd do it in a 2d game, with the light itself either not affecting the models at all, or affecting them only in generic terms (a model in the light area is brighter overall, a model outside is darker, but no actual shadows/highlights are generated).
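The "generic terms" version of a local light can be sketched as a single brightness multiplier per model. Everything here is an assumption for illustration: the name `sprite_brightness`, the circular light area on the ground plane, and the `ambient` / `lit` values.

```python
import math

def sprite_brightness(sprite_pos, lights, ambient=0.6, lit=1.0):
    """Return one brightness multiplier for a whole model.

    sprite_pos: (x, z) position of the model on the ground plane.
    lights: list of ((x, z), radius) pairs describing faked local lights.
    The model is either inside some light's radius or not; no shading is computed.
    """
    for (lx, lz), radius in lights:
        if math.hypot(sprite_pos[0] - lx, sprite_pos[1] - lz) <= radius:
            return lit
    return ambient

# One dungeon torch at the origin with a 3-unit reach.
torches = [((0.0, 0.0), 3.0)]
near = sprite_brightness((1.0, 1.0), torches)
far = sprite_brightness((5.0, 0.0), torches)
```

The returned value would multiply the model's already individually lit colors, so the hand-tuned shadows and highlights stay exactly as authored and only the overall level changes. A smooth falloff near the edge of the radius would be an easy extension.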

4. Overview of the technical constraints.

5. Recap and different theoretical implementations.

The basis of this idea is to replicate the technical restrictions and "feel" of 2d sprite art in a 3d environment. In short, the aim is to think of both the 3d terrain and the models in as much of a "sprite" sense as possible, with the 3d model reduced to the equivalent of a free-standing visual plane that is then projected onto the terrain. This is done by:

restricting the perspective to equidistant orthographic.
restricting directions and animations to controlled sets like in a sprite animation.
handling light and shadow individually like in a 2d sprite.
getting a qualified person who knows 2d/animation aesthetics to make adjustments to models (angles, lighting etc.) in the individual keyframes.
applying post-processing effects according to style.

Simple tools can be made to facilitate the constraints of this system. A model is loaded into the tool, which displays it with the required restrictions applied (angle skewing, perspective adjustment, individual light system, limited directions and keyframe animations). A person experienced in 2d art can then make adjustments in the tool according to 2d aesthetics for things like silhouettes, lights/shadows, perspective and so on.

However, this is only half of the full implementation. It also needs a proper direction for the style itself: the style of the models, texturing, post-processing shaders and other factors will determine it. Here are some suggestions for paths one could conceivably take:

A relatively low camera angle for the model "sprites", a low keyframe count for the animations, simple model shapes, and simple toon-shader-based textures that mostly convey color, specularity and banded shadows/highlights… plus a post-processing shader that pixelates the output. This could result in something very close to the visual experience of a NES or SNES RPG.
A higher camera angle for the model "sprites", more advanced model shapes and more detailed texturing, a higher resolution of animations and directions, and perhaps a post-processing shader that blends more or pixelates at higher detail. This could give you something like the graphics of Warcraft 2.
Something completely unique. I've seen several types of shaders that emulate various styles of "drawn" art: realistic, cartoony and impressionistic. It's possible one could make something that doesn't follow any previous game style but still gives off a 2d-like feel, as in a cartoon or things like that.
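Two of the ingredients mentioned in these paths, banded shadows/highlights and a pixelating post-process, are simple enough to sketch on the CPU. The code below is an illustration only, using a plain list-of-rows "image" and made-up function names (`quantize_bands`, `pixelate`); in a real renderer both would live in shaders.

```python
def quantize_bands(intensity, bands=3):
    """Snap a light intensity in [0, 1] to one of `bands` flat levels (toon-style banding).

    bands must be at least 2.
    """
    clamped = max(0.0, min(1.0, intensity))
    index = min(int(clamped * bands), bands - 1)
    return index / (bands - 1)

def pixelate(image, block):
    """Return a same-sized image where each `block` x `block` cell repeats its top-left pixel.

    image: list of rows, each row a list of pixel values.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    return [[image[(y // block) * block][(x // block) * block]
             for x in range(width)]
            for y in range(height)]

frame = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
chunky = pixelate(frame, 2)
```

Sampling the top-left pixel keeps hard edges, which suits the NES/SNES direction; averaging each block instead would give the softer, more blended look suggested for the second path.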

The ultimate coolness factor when it comes to this idea would be if one could manage to make a game that the player visually parses as a 2d game, only to realize later that, due to the complexity of model-swapping etc., this is actually a 3d game that just visually presents as a 2d game.

Theoretically I think that could be possible.

6. Considerations and questions.

This will require a capable 3d artist as well as a capable 2d sprite artist (or a person who knows both) who has the ability to work with a completely novel system.
How will this actually look? Are there unaccounted for factors?
Is this just pure insanity?

Chat transcripts from art director:

Ok, keep in mind that at this point this is more of a theoretical idea for a system than a concrete implementation. There are several aspects that need to be tested, and I'm sure there are things I haven't accounted for properly. But I'll explain a bit about the rationale behind this specifically.

So… by having the models individually orientated by the artist you gain a unique aspect of 2D art, namely the ability to portray the unit in a way that circumvents the rules of perspective. Look at something like a SNES RPG for instance, where you will often see a pretty large discrepancy between the perspectives in which different types of objects are rendered. A terrain might be drawn nearly top-down, while the sprites themselves are seen almost from the side. Obviously these are artistic choices, made to get the most visual impact out of the restrictions they had to work with. And in many cases this freedom can be very pleasing to the eye too. For an idea of how impactful this concept can be, imagine if Link from one of the top-down Zeldas was drawn according to the actual rules of perspective, conforming to the environment around him. Then his sprite would pretty much just be the top of a head moving about. In reality he exists at an angle to the terrain around him, which gives the opportunity of actually showing his body, and even though it doesn't conform to the rules of perspective at all, the mind instantly compensates for this and it still looks awesome.

But in effect, by going in this direction, it means that just like such a 2D sprite, the models themselves do not conform in a technically consistent way to the 3D world around them. So how would a point light actually affect the models, when these models do not conform properly to the actual rules of the 3D environment? If the model exists at an angle to the background terrain-plane for instance? I can only assume that this light would then affect such models in an inconsistent and weird way.

That's the reason for separating lighting into 1. global illumination, i.e. light that's so far away that it affects all the "sprite models" on screen the same, and 2. local illumination, i.e. light/shadow technical hacks. So a torch on a wall for instance would affect the terrain (which is consistent with itself) like a proper light, while a "sprite" would be affected in a "fake" way, possibly just determining whether said sprite is in the light or not, and then uniformly applying a brightness property. Is this restrictive? Yes, it sure is. It's restrictive in much the same way that lighting in 2D game art is restrictive, and if you study how they do lighting systems in 2D worlds, it will usually be some variation of such a system. It can certainly be made atmospheric, although it does so in a completely different way than one would do in a pure 3D world.

Is this restrictive nature worth it for having the freedom to "game" perspective in how objects are displayed? That's not completely determined at this point. It's certainly quite possible to make things look good while enforcing that models always conform to the 3D space they're in, depending on the angle of the camera and so on. But I don't see how one could get both, and so a choice must be made. Either the freedoms/restrictions that come with being able to trick perspective for effect, or the freedoms/restrictions of conforming to the 3D environment around them.
So yeah… that's a lot of words to say that this issue is still open and TBD. 😄

So I understand it might not be so easy to form an image. And sharing stuff from various games has the problem that they are all only partial fits, and could give a wrong impression. Like Death's Door https://youtu.be/wqwxFC19OlU?t=3193 which shows how nice a non-rotating isometric game can look, but is also not at all the art style we're going for.

(…)

Sounds a lot like Super Mario RPG
Yeah, that's one of the SNES rpgs we are referring to as having many of the features we're talking about.

Yes, it's certainly not simpler to create in an overarching sense. One could say that there's a simplicity of content creation in some ways if the alternative is a 2D sprite-based implementation of the graphics (due to more potential for reuse/attachments, and the exponential growth in asset requirements in sprite-based systems). Perhaps also a simplicity in the way amateurs/players create assets, depending on the quality of the toolset, due to the more simplistic way animations and the like are done. In every other regard I think it's more complicated and convoluted, certainly in implementation, but most likely in in-house artist workflow too. What it potentially brings is something that looks pretty unique and novel, something that feels like 2D art, but keeps some of the benefits of 3D. Whether this is worth it in the end over a more traditional pipeline and the benefits that brings (and which of course can be made to look excellent as well) is something that needs to be determined through testing. That's also why critiques and discussions like this are extremely important and valuable. Anyways, I'm working on a little test scenario that implements some of these concepts. It's going to look awful but will perhaps be useful in properly conveying some abstract issues that may be hard to visualize.

Ok, so notes on the testing I've done. The general principle seems to work. The theory of orthographic perspective trickery and so on seems generally valid, and while this was expected it's nice to have some confirmation. The animation system looks like it's also going to work, with some requirements I'm going to go into. Although it's not part of this demo, I've done some testing with lighting too, but this rapidly became too complex for me to deal with as a hobbyist amateur.

Animation is a bit tricky, and one realizes very fast that, with the models having to be turned in all sorts of weird angles, an animation which looks good in one direction may look like shit in another direction. A system where the artists can polish variations of the same animations in different directions is going to be absolutely required. It will probably make sense to settle on flipping models/animations for half the directions, as is the norm in sprite-based games, instead of just relying on rotations. I'm not settled on the question of how the first iteration of this tool should be implemented yet… depends on some technical factors. After working a bit with animation in Blender (I've mostly done static work in the past), it became obvious that a lot of tools there are pretty important, like rigging mechanics, vertex weighting, IK and so on. How all these things are going to be integrated in the art editing pipeline of the game itself needs to be further explored and discussed.
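For reference, the "flip half the directions" convention from sprite games can be sketched like this. The direction numbering (0 = N, then clockwise) and the name `resolve_direction` are assumptions for the example, not something decided for the tool.

```python
def resolve_direction(index, directions=8):
    """Map a facing index to (authored_index, mirrored).

    Assumed numbering: 0 = N, then clockwise (1 = NE, 2 = E, ... 7 = NW for 8 directions).
    Only the east-side half is authored; west-side facings reuse it mirrored left-right.
    """
    index %= directions
    half = directions // 2
    if index <= half:
        return (index, False)
    return (directions - index, True)

# West (6) reuses the East (2) animation set, flipped horizontally.
west = resolve_direction(6)
```

With 8 directions this means 5 authored sets (N, NE, E, SE, S) instead of 8. In 3D the mirror would be a negative scale on the model's local x axis, which also means asymmetric details (a sword hand, a shoulder pad) swap sides, just as they do in flipped sprites.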

Keep in mind that this demo is not reflective of whatever style we end up deciding on for the game. This is just test art, non-polished and in my own style. You will have to imagine it actually looking good, without bugs and shortcuts, with juicy post-processing shaders bringing everything together and all that jazz. ;D Same with the selected perspective of this demo: I went with a straight-on bird's-eye, something-akin-to SNES RPG perspective, simply because that's what appeals to me. In reality there are lots of options in what kind of perspective we settle on: lower camera angles, isometric styles etc. It's going to be a question of style.

So, after all that I feel somewhat confident in saying that this is a graphics system that at least can be made to work and can be properly considered.

https://drive.google.com/file/d/1qEZmzVAW8gyMg2LR6Bdl1r4f-Qv_eXPI/view?usp=sharing

https://drive.google.com/file/d/1siAxeIAPcnhoWWKGgD3OkBDsLhLAqHRL/view?usp=sharing