Shenmue Animation Debug Thread

Kion

DashGL
Joined
Oct 11, 2018
Since I have the memory span of a goldfish, I’ll go ahead and post notes here to keep them organized. Up front, a lot of the credit for this background information goes to LemonHaze and PhilYeahz. We’ll see if either of my remaining brain cells is able to interpret what’s here.

The game Shenmue seems to store all of its animations in one long Motion.Bin file where the animations are basically stacked end-on-end. Right now I'm not too interested in how to slice the full file into individual animations. For now we'll take that for granted and start with "A_WALK_L_02", which is Ryo's walking animation. We'll break it down as a proof of concept to see how one animation works, and then see if we can apply that to other animations. If the walk animation alone doesn't give us enough information, we can compare the same area across different animations to see if that provides any hints.

So we're going to start with putting down a few links for reference.

1. https://github.com/Shenmue-Mods/ShenmueDK/blob/d58545cc8def1ccc6c2b4162e3fb8cb68b4471f6/src/shendk/files/animation/motn.cpp
This is the current animation parser, based on tracing through the game. Credit to LemonHaze.

2.
This is a spreadsheet version of the file that makes it easier to create sections with colors and make notes of what each section does and where it is.

 
Kion

We can start to take our time and break down the animation file into parts that are easier to work with. First we can make a really simple block diagram of the general structure of the animation file.

block_list.png

We have a header that gives the offsets to the different "Blocks" in the file. As for blocks, right now all we can generally say is there are five of them. Block 1 seems to give bitflags for which key values should be applied for a given bone (frame?). Block 5 has the actual key frame values as half floats. For Blocks 2, 3 and 4 there are values, but even after looking at them for some time, there doesn't seem to be any obvious pattern that would hint at what they specifically do.

Right now we can start with the low hanging fruit and start off by defining the header as a struct.

Code:
typedef struct {
    uint32_t block_size_flags;
    uint16_t block_2_ofs;
    uint16_t block_3_ofs;
    uint16_t block_4_ofs;
    uint16_t block_5_ofs;
} sh_anim_header_t;
For offsets we will do everything relative to the start of the individual animation. Since the animation is part of a larger file, this normally means you have to factor in the offset of the start of the animation, but you can kind of cheat around that by using fmemopen or subarray to create a buffer where begin and end reference indexes in the array. With this frame of reference, the start of Block 1 will always be at 0x0c, right after the end of the header. The offsets of Blocks 2 - 5 are then given by the uint16_t values in the header.
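As a rough sketch of the above, the header can be pulled out of a byte buffer that starts at the first byte of one animation. The byte order is assumed to be little-endian, and `read_u16_le`, `read_u32_le` and `sh_anim_read_header` are hypothetical helper names made up for this example, not anything from the game or from ShenmueDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the struct above: 4 + 4 * 2 = 12 bytes, so Block 1 starts at 0x0c. */
typedef struct {
    uint32_t block_size_flags;
    uint16_t block_2_ofs;
    uint16_t block_3_ofs;
    uint16_t block_4_ofs;
    uint16_t block_5_ofs;
} sh_anim_header_t;

#define SH_ANIM_BLOCK_1_OFS 0x0cu

/* Hypothetical helpers: assemble little-endian integers byte by byte
 * (assumption: the file is little-endian). */
static uint16_t read_u16_le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* anim points at the first byte of a single animation.
 * Returns 0 on success, -1 if the buffer is too short for a header. */
static int sh_anim_read_header(const uint8_t *anim, size_t len, sh_anim_header_t *out) {
    if (len < SH_ANIM_BLOCK_1_OFS) {
        return -1;
    }
    out->block_size_flags = read_u32_le(anim);
    out->block_2_ofs = read_u16_le(anim + 4);
    out->block_3_ofs = read_u16_le(anim + 6);
    out->block_4_ofs = read_u16_le(anim + 8);
    out->block_5_ofs = read_u16_le(anim + 10);
    return 0;
}
```

Reading field by field avoids depending on struct padding or on the host's byte order, which matters when the same notes are used on a PC rather than on the original hardware.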

There are a few things I find odd about the offsets to Blocks 2 - 5. The first is that these offsets are defined with 16 bit values. 16 bits is not a lot when it comes to offsets, so it seems the programmers really didn't think there would be much space between the header and Block 5, and figured that 16 bits would be enough to address everything. The other interesting aspect is that there doesn't seem to be any consideration for staying on 4 byte boundaries, so Blocks 2 - 4 are going to hold 1 byte or 2 byte values. This might be stating the obvious, but it still seems out of place enough to be noteworthy.

As for the size of Block 5, the length is from the offset declared by the header until the start of the next header, thus Block 5 takes up a massive portion of the file. We already know that these are the specific key frame values, but the length seems to indicate that the key frames are probably stacked in order as opposed to some kind of index look up scheme. What I also find curious is there doesn't specifically seem to be any length described for Block 5.

Normally as a precaution programmers would probably put a length to make sure they're reading from the right animation. If there is no length, then it probably means that the programmers were pretty confident about reading the key frames in a way that wouldn't bleed into the next animation. Which is kind of surprising considering this is the most confusing cluster fuck of an animation format that I've encountered.
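Since only start offsets are stored, every block length has to be derived from the next offset. A minimal sketch of that bookkeeping follows; `sh_anim_block_lengths` and `anim_size` are names invented here, and `anim_size` (the distance from this header to the next one) is an assumption that has to come from whatever slices Motion.Bin, since the header itself does not carry it.

```c
#include <stdint.h>

typedef struct {
    uint32_t block_1;
    uint32_t block_2;
    uint32_t block_3;
    uint32_t block_4;
    uint32_t block_5;
} sh_block_lengths_t;

/* ofs[0..3] are the Block 2..5 offsets from the header.
 * anim_size is the byte distance from this animation's header to the next
 * header (assumption: supplied by the code that slices the big file). */
static sh_block_lengths_t sh_anim_block_lengths(const uint16_t ofs[4], uint32_t anim_size) {
    sh_block_lengths_t len;
    len.block_1 = (uint32_t)ofs[0] - 0x0cu;           /* Block 1 starts right after the header */
    len.block_2 = (uint32_t)ofs[1] - (uint32_t)ofs[0];
    len.block_3 = (uint32_t)ofs[2] - (uint32_t)ofs[1];
    len.block_4 = (uint32_t)ofs[3] - (uint32_t)ofs[2];
    len.block_5 = anim_size - (uint32_t)ofs[3];       /* runs until the next header */
    return len;
}
```

Each length is just "next start minus this start", which is why Block 5 is the only one that needs outside information.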

While I'm not familiar with Block 2 and Block 3 to be able to describe what their functionality is, thanks to LemonHaze we know if the values for these blocks are a single byte or two bytes in length.

Code:
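uint32_t flag = sh_anim_header.block_size_flags;
uint16_t flag_1 = flag & 0x7FFF;
// Block 2 entries are one byte unless any bit from 0x8000 upwards is set
bool block2EntryHalfSize = (flag & 0xFFFF8000) == 0;
// Block 3 entries are one byte while the low 15 bits still fit in a byte
bool block3EntryHalfSize = flag_1 <= 0xFF;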
In the case of the walk animation, the flag value is 0x25, which makes both checks come out true. That means Blocks 2 and 3 both use single byte values.
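To make the two checks easier to poke at, here is the same logic wrapped in a small function. This is only a restatement of the snippet above under the same assumptions; `sh_entry_sizes_t` and `sh_anim_entry_sizes` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool block2_half;   /* true: Block 2 entries are 1 byte, false: 2 bytes */
    bool block3_half;   /* true: Block 3 entries are 1 byte, false: 2 bytes */
} sh_entry_sizes_t;

static sh_entry_sizes_t sh_anim_entry_sizes(uint32_t flags) {
    sh_entry_sizes_t s;
    uint16_t low15 = (uint16_t)(flags & 0x7FFFu);
    s.block2_half = (flags & 0xFFFF8000u) == 0;  /* nothing set from bit 15 upwards */
    s.block3_half = low15 <= 0xFFu;              /* low 15 bits fit in one byte */
    return s;
}
```

With the walk animation's 0x25 both fields come out true. A value such as 0x8025 would flip only the Block 2 size, and 0x0125 only the Block 3 size; those two inputs are made up to exercise the branches, not taken from real files.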
 
Kion

In the previous post we started with the header, and in this post it makes sense to continue by looking into Block 1 and all of its apparent contradictions. We'll start with what we "know" and then descend into the insanity from there.

Screenshot from 2020-06-27 21-56-27.png

For a quick recap, for the walk animation, Block 1 is the area highlighted in blue. And credit to LemonHaze, we can take a peek at the point where the game interprets this data.

block1_aaa.png

We can create a small figure to summarize this information.

shenmue_block01(1).png

Block 1 seems to be a list of uint16_t values which contain bitflags for whether position or rotation data should be interpreted for the animation. The bitmask for position values is 0x1c0, where bit 8 is Pos X, bit 7 is Pos Y, and bit 6 is Pos Z. The bitmask for rotation is 0x38, where bit 5 is Rot X, bit 4 is Rot Y and bit 3 is Rot Z. If we follow this pattern we can guess that the three lowest bits were probably allocated for scale. But Shenmue doesn't use scale (Ryo's fists don't get comically bigger when he attacks), so it looks like the game doesn't even bother checking for it. So it's labelled a "no operation" in the figure.

And last are the upper-most 7 bits, which are allocated to either the bone id or the frame index. Originally I was thinking that it was the bone id, but that doesn't really make a lot of sense since it doesn't give any concept of the frame, and the bitflags set for each bone don't make any sense. If we interpret this as the frame index, then it makes more sense in a few ways.

The first is that it gives a concise reason for why the length of each animation isn't explicitly defined inside the animation. If the programmers designed it as "oh we loop the frame number until we reach a 0x00 value", then they wouldn't need to define a specific length offset. It also makes more sense for the bitflags, to say, "for this frame we only need to calculate position", or "for this frame we need to calculate position and rotation".

Edit:
Did a test to see if the upper 7 bits for the block 1 values could potentially be the frame index instead of the bone id, but after replacing values in block 1 with 0x0000, we can see that specific offsets affect specific bones. So we can safely say that it's the bone id.
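With the upper 7 bits settled as the bone id, a Block 1 value can be split as in the sketch below. The masks and the shift come straight from the notes above; the struct layout and the name `sh_decode_instr` are made up for illustration.

```c
#include <stdint.h>

typedef struct {
    uint8_t bone_id;  /* upper 7 bits of the instruction */
    uint8_t pos_xyz;  /* 3-bit field: bit 2 = X, bit 1 = Y, bit 0 = Z */
    uint8_t rot_xyz;  /* 3-bit field: bit 2 = X, bit 1 = Y, bit 0 = Z */
} sh_instr_t;

static sh_instr_t sh_decode_instr(uint16_t value) {
    sh_instr_t d;
    d.bone_id = (uint8_t)(value >> 9);
    d.pos_xyz = (uint8_t)((value & 0x1c0u) >> 6);  /* position mask 0x1c0 */
    d.rot_xyz = (uint8_t)((value & 0x038u) >> 3);  /* rotation mask 0x38 */
    /* the lowest 3 bits (possibly scale) are ignored, like the game seems to do */
    return d;
}
```

For example, 0x1f8 decodes to bone 0 with all position and rotation axes set, and 0x11c0 decodes to bone 8 with position only.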

 
Kion

Now that we've confirmed what Block 1 does, we can go ahead and break down everything we know about it to try and make sense of the other blocks (even though they really don't make any sense). Since we know the Block 1 values carry bone ids, we might as well post a list of bone id enums so we can get an idea of which bones we're working with.

From: https://github.com/Shenmue-Mods/ShenmueDK/blob/42bbc74db53f9b9ed35eb3266366b6702889b2aa/include/shendk/types/model.h
Code:
enum class BoneID : uint8_t {
    Root = 0,
    Spine = 1,
    Hip = 14,
    RightUpperLeg = 16,
    RightLowerLeg = 17,
    RightFoot = 18,
    RightFootToes = 19,
    LeftUpperLeg = 21,
    LeftLowerLeg = 22,
    LeftFoot = 23,
    LeftFootToes = 24,
    RightShoulder = 4,
    RightUpperArm = 5,
    RightLowerArm = 6,
    RightWrist = 7,
    RightRiggedHand = 8,
    RightHand = 191,
    RightHandIndexUpper = 28,
    RightHandIndexLower = 29,
    RightHandFingerUpper = 31,
    RightHandFingerLower = 32,
    RightHandThumb = 25,
    LeftShoulder = 9,
    LeftUpperArm = 10,
    LeftLowerArm = 11,
    LeftWrist = 12,
    LeftRiggedHand = 13,
    LeftHand = 190,
    LeftHandIndexUpper = 43,
    LeftHandIndexLower = 44,
    LeftHandFingerUpper = 46,
    LeftHandFingerLower = 47,
    LeftHandThumb = 40,
    Head = 189,
    Jaw = 188,
    None = 0xFF
};
And with that we also might as well post a table of all of the block 1 values so we can take a quick look at it as a reference.

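| No. | Bone Id | Bone Name | Hex | Binary | Pos X | Pos Y | Pos Z | Rot X | Rot Y | Rot Z | Count |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | Root | 0x1f8 | 0b0000000,111111000 | × | × | × | × | × | × | 6 |
| 2 | 1 | Hip | 0x238 | 0b0000001,000111000 | | | | × | × | × | 3 |
| 3 | 5 | RightUpperLeg | 0xa38 | 0b0000101,000111000 | | | | × | × | × | 3 |
| 4 | 8 | RightFootIKTarget | 0x11c0 | 0b0001000,111000000 | × | × | × | | | | 3 |
| 5 | 9 | RightFoot | 0x1238 | 0b0001001,000111000 | | | | × | × | × | 3 |
| 6 | 12 | LeftUpperLeg | 0x1838 | 0b0001100,000111000 | | | | × | × | × | 3 |
| 7 | 15 | LeftFootIKTarget | 0x1fc0 | 0b0001111,111000000 | × | × | × | | | | 3 |
| 8 | 16 | LeftFoot | 0x2038 | 0b0010000,000111000 | | | | × | × | × | 3 |
| 9 | 18 | Torso | 0x2438 | 0b0010010,000111000 | | | | × | × | × | 3 |
| 10 | 20 | UpperTorsoIKTarget | 0x29c0 | 0b0010100,111000000 | × | × | × | | | | 3 |
| 11 | 21 | Unknown0x15 | 0x2a38 | 0b0010101,000111000 | | | | × | × | × | 3 |
| 12 | 23 | HeadLookAtTarget | 0x2fc0 | 0b0010111,111000000 | × | × | × | | | | 3 |
| 13 | 25 | RightShoulder | 0x3238 | 0b0011001,000111000 | | | | × | × | × | 3 |
| 14 | 26 | RightArm | 0x3438 | 0b0011010,000111000 | | | | × | × | × | 3 |
| 15 | 29 | RightHandIKTarget | 0x3bc0 | 0b0011101,111000000 | × | × | × | | | | 3 |
| 16 | 30 | RightHand | 0x3c38 | 0b0011110,000111000 | | | | × | × | × | 3 |
| 17 | 31 | LeftShoulder | 0x3e38 | 0b0011111,000111000 | | | | × | × | × | 3 |
| 18 | 32 | LeftArm | 0x4038 | 0b0100000,000111000 | | | | × | × | × | 3 |
| 19 | 35 | Unknown35 | 0x47c0 | 0b0100011,111000000 | × | × | × | | | | 3 |
| 20 | 36 | LeftHand | 0x4838 | 0b0100100,000111000 | | | | × | × | × | 3 |
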
From the table we can notice a few trends that stand out. First, with the exception of the root bone, all of the bones have either rotation or position transformations but not both. For the root bone, this goes against what I normally observe, as often the root bone only has position. But that's likely for the case where the root bone is at the origin, so root 0 can control the position and root 1 can control the tilt. In the case of Shenmue, the root bone is at the hips, so it kind of makes sense that the root bone would control both the position and the tilt.

As for the other bones, I'm hoping that the rotation values mean that these are normal animations where rotations are encoded for a specific time index. That means we could read the bone id, read the frame, seek to the offset in Block 5 to read the key frame values and hopefully generate an animation that closely resembles the one from the game. For position it would make sense if these are inverse kinematics targets, where the game defines a position and then calculates where the other bones need to be to reach it. Hopefully if we can get the frame index and key frame, we can think of an approach to potentially solve for these.

And a quick count of all of the transformation bitflags set: we have 21 position bitflags and 42 rotation bitflags, for a total of 63 bitflags in all, in case we ever need to compare these counts against anything.
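The tally can be double-checked mechanically from the hex column of the table. The sketch below hard-codes the 20 walk-animation values; `walk_block1`, `count_bits` and `sh_count_flags` are hypothetical names for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define SH_POS_MASK 0x1c0u
#define SH_ROT_MASK 0x038u

/* The 20 Block 1 values of the walk animation, copied from the table above. */
static const uint16_t walk_block1[20] = {
    0x01f8, 0x0238, 0x0a38, 0x11c0, 0x1238, 0x1838, 0x1fc0, 0x2038, 0x2438, 0x29c0,
    0x2a38, 0x2fc0, 0x3238, 0x3438, 0x3bc0, 0x3c38, 0x3e38, 0x4038, 0x47c0, 0x4838
};

/* Portable population count. */
static unsigned count_bits(unsigned v) {
    unsigned n = 0;
    while (v != 0) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}

/* Counts how many position and rotation axis flags are set in n instructions. */
static void sh_count_flags(const uint16_t *instr, size_t n, unsigned *pos, unsigned *rot) {
    *pos = 0;
    *rot = 0;
    for (size_t i = 0; i < n; i++) {
        *pos += count_bits(instr[i] & SH_POS_MASK);
        *rot += count_bits(instr[i] & SH_ROT_MASK);
    }
}
```

Running it over `walk_block1` gives 21 position flags and 42 rotation flags, matching the hand count.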
 
Kion

Now that we have a table-ized version of Block 1, we can start to take a look into Block 2. I'm not very good with tracing through files, but from talking with LemonHaze on Discord, it sounds like for each bitflag in Block 1, the game will call Read_MOTN_Data, which will read one value from Block 2 and one value from Block 3.

For Block 2, while we don't know what it does, we can make some predictions about what we hope to find there. If there is a value for each bitflag in Block 1, then we hope that the number of values in Block 2 is a multiple of the total number of bitflags, plus potentially a little bit of padding. We have 21 position bitflags and 42 rotation bitflags set in Block 1, for a total of 63 individual flags.

Block 2 starts at offset 0x36 and ends at offset 0x75. We can subtract 0x36 from 0x75 to get 0x3F, and 0x3F is 63 in decimal. It's nice that we get exactly as many bytes as we have bitflags in Block 1. That means that even if we don't know what these bytes do, we at least know what they're associated with.

Screenshot from 2020-06-28 11-24-40.png

Above we have an image of Block 2 (in yellow), and aside from the length matching up with the number of bitflags from Block 1, the only observation I can make is that the values range from 0x00 to 0x11, with a decent number of the values being 0x00. We can generally conclude that each one of these values is associated with a specific bone, a specific transformation (pos or rot) and a specific axis (x, y, or z).

What we don't know is what these values mean. The range of 0x00 - 0x11 uses 5 bits, which means they could be further flags for how to process each bone-transform-axis combination. We can also see that there's only one value per bone-transform-axis combination, which means that this is not a per-frame value, but a per-animation value. The main viable option I can think of for these values is probably constraints. Similar to Block 1, we can generate a table to see if there are any patterns that stand out, such as certain bones or certain transforms having similar values in Block 2.

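| No. | Bone Id | Bone Name | Transform | Axis | Hex | Binary |
|---|---|---|---|---|---|---|
| 0 | 0 | Root | pos | x | 0x00 | 0b00000000 |
| 1 | 0 | Root | pos | y | 0x0c | 0b00001100 |
| 2 | 0 | Root | pos | z | 0x00 | 0b00000000 |
| 3 | 0 | Root | rot | x | 0x00 | 0b00000000 |
| 4 | 0 | Root | rot | y | 0x00 | 0b00000000 |
| 5 | 0 | Root | rot | z | 0x00 | 0b00000000 |
| 6 | 1 | Hip | rot | x | 0x01 | 0b00000001 |
| 7 | 1 | Hip | rot | y | 0x01 | 0b00000001 |
| 8 | 1 | Hip | rot | z | 0x01 | 0b00000001 |
| 9 | 5 | RightUpperLeg | rot | x | 0x00 | 0b00000000 |
| 10 | 5 | RightUpperLeg | rot | y | 0x00 | 0b00000000 |
| 11 | 5 | RightUpperLeg | rot | z | 0x00 | 0b00000000 |
| 12 | 8 | RightFootIKTarget | pos | x | 0x03 | 0b00000011 |
| 13 | 8 | RightFootIKTarget | pos | y | 0x0f | 0b00001111 |
| 14 | 8 | RightFootIKTarget | pos | z | 0x11 | 0b00010001 |
| 15 | 9 | RightFoot | rot | x | 0x07 | 0b00000111 |
| 16 | 9 | RightFoot | rot | y | 0x00 | 0b00000000 |
| 17 | 9 | RightFoot | rot | z | 0x00 | 0b00000000 |
| 18 | 12 | LeftUpperLeg | rot | x | 0x00 | 0b00000000 |
| 19 | 12 | LeftUpperLeg | rot | y | 0x00 | 0b00000000 |
| 20 | 12 | LeftUpperLeg | rot | z | 0x00 | 0b00000000 |
| 21 | 15 | LeftFootIKTarget | pos | x | 0x03 | 0b00000011 |
| 22 | 15 | LeftFootIKTarget | pos | y | 0x0a | 0b00001010 |
| 23 | 15 | LeftFootIKTarget | pos | z | 0x10 | 0b00010000 |
| 24 | 16 | LeftFoot | rot | x | 0x07 | 0b00000111 |
| 25 | 16 | LeftFoot | rot | y | 0x00 | 0b00000000 |
| 26 | 16 | LeftFoot | rot | z | 0x00 | 0b00000000 |
| 27 | 18 | Torso | rot | x | 0x00 | 0b00000000 |
| 28 | 18 | Torso | rot | y | 0x01 | 0b00000001 |
| 29 | 18 | Torso | rot | z | 0x00 | 0b00000000 |
| 30 | 20 | UpperTorsoIKTarget | pos | x | 0x01 | 0b00000001 |
| 31 | 20 | UpperTorsoIKTarget | pos | y | 0x04 | 0b00000100 |
| 32 | 20 | UpperTorsoIKTarget | pos | z | 0x01 | 0b00000001 |
| 33 | 21 | Unknown0x15 | rot | x | 0x00 | 0b00000000 |
| 34 | 21 | Unknown0x15 | rot | y | 0x00 | 0b00000000 |
| 35 | 21 | Unknown0x15 | rot | z | 0x00 | 0b00000000 |
| 36 | 23 | HeadLookAtTarget | pos | x | 0x01 | 0b00000001 |
| 37 | 23 | HeadLookAtTarget | pos | y | 0x04 | 0b00000100 |
| 38 | 23 | HeadLookAtTarget | pos | z | 0x01 | 0b00000001 |
| 39 | 25 | RightShoulder | rot | x | 0x02 | 0b00000010 |
| 40 | 25 | RightShoulder | rot | y | 0x03 | 0b00000011 |
| 41 | 25 | RightShoulder | rot | z | 0x05 | 0b00000101 |
| 42 | 26 | RightArm | rot | x | 0x01 | 0b00000001 |
| 43 | 26 | RightArm | rot | y | 0x05 | 0b00000101 |
| 44 | 26 | RightArm | rot | z | 0x05 | 0b00000101 |
| 45 | 29 | RightHandIKTarget | pos | x | 0x03 | 0b00000011 |
| 46 | 29 | RightHandIKTarget | pos | y | 0x0a | 0b00001010 |
| 47 | 29 | RightHandIKTarget | pos | z | 0x04 | 0b00000100 |
| 48 | 30 | RightHand | rot | x | 0x04 | 0b00000100 |
| 49 | 30 | RightHand | rot | y | 0x02 | 0b00000010 |
| 50 | 30 | RightHand | rot | z | 0x00 | 0b00000000 |
| 51 | 31 | LeftShoulder | rot | x | 0x03 | 0b00000011 |
| 52 | 31 | LeftShoulder | rot | y | 0x03 | 0b00000011 |
| 53 | 31 | LeftShoulder | rot | z | 0x05 | 0b00000101 |
| 54 | 32 | LeftArm | rot | x | 0x02 | 0b00000010 |
| 55 | 32 | LeftArm | rot | y | 0x04 | 0b00000100 |
| 56 | 32 | LeftArm | rot | z | 0x05 | 0b00000101 |
| 57 | 35 | Unknown35 | pos | x | 0x05 | 0b00000101 |
| 58 | 35 | Unknown35 | pos | y | 0x08 | 0b00001000 |
| 59 | 35 | Unknown35 | pos | z | 0x04 | 0b00000100 |
| 60 | 36 | LeftHand | rot | x | 0x02 | 0b00000010 |
| 61 | 36 | LeftHand | rot | y | 0x01 | 0b00000001 |
| 62 | 36 | LeftHand | rot | z | 0x00 | 0b00000000 |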

Edit:
Some testing for Blocks 2 and 3
 
Kion

Now that we've taken notes about Blocks 1 and 2, the next step is Blocks 3 and 4. And I'll do this in the same post as I really don't have a clue as to what these could possibly be.

Screenshot from 2020-06-28 17-53-01.png

Block 3 is the area highlighted in red that starts with 0x05, 0x07, 0x09, and ends with 0x0f, 0x1b, 0x11. The block starts at 0x75 and ends at 0x135 for a length of 0xC0, or 192 in decimal. The weird part is that while Block 2 has a length that matches the number of bitflags declared in Block 1, Block 3 doesn't seem to have that relation, since 63 doesn't divide evenly into 192. The closest you get is 63 * 3 = 189, with a remainder of 3. That could possibly make sense if there was padding, but Block 3 is packed with byte values, so that's not the case.

What is interesting is that the length provides more hints for what the bytes in Block 2 mean. If we add up all of the bytes in Block 2 we get 192, the exact length of Block 3. That gives Block 2 a little bit of context, as the numbers didn't seem to mean anything before. Now they could potentially be the number of bytes that need to be read from Block 3 for each bone-transformation-axis combination. So we've managed to get some context for Block 2, but the mystery has effectively shifted on to the next block, as we have no idea what the values in Block 3 mean. What's even worse is we haven't seen any hints that might help us find how many frames are in the animation or what the key frames are.

For Block 4, the block starts at offset 0x135 and ends at 0x1a0, for a length of 0x6b. I can't be exactly sure, but the values in this block look like 2 byte values. The 0x00 value at the beginning looks like padding to adjust the offset, and the block ends cleanly on a 4 byte boundary. We also see 0xffff, which is -1 as a signed 16 bit integer. So we can't be sure that these are all 16 bit values, but we can try to interpret the block in this way and see where it leads.

We have 0x6b bytes in length with one byte for padding. That gives us a practical length of 0x6a. If we have two byte values, that means we have 0x35 (53) values. In the same way that Block 3 provided some context for Block 2, I was hoping that Block 4 would provide some context for Block 3. But we can cross off the possibilities: 0x35 is not a multiple of the number of bones defined, or of the number of bitflags. It's not the sum of the values in Block 3, and it's not an index for Block 3, as the highest declared byte value is 0x23 and we have 0x35 possible values.
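The sum is easy to verify from the Block 2 table, and the same loop gives a tentative start offset into Block 3 for every entry. Treating Block 2 values as byte counts into Block 3 is still only the hypothesis from the paragraph above, and `walk_block2` and `sh_block3_spans` are names invented for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* The 63 Block 2 bytes of the walk animation, copied from the table:
 * one triple (x, y, z) per bone-transform pair, in Block 1 order. */
static const uint8_t walk_block2[63] = {
    0x00, 0x0c, 0x00,  0x00, 0x00, 0x00,  0x01, 0x01, 0x01,  0x00, 0x00, 0x00,
    0x03, 0x0f, 0x11,  0x07, 0x00, 0x00,  0x00, 0x00, 0x00,  0x03, 0x0a, 0x10,
    0x07, 0x00, 0x00,  0x00, 0x01, 0x00,  0x01, 0x04, 0x01,  0x00, 0x00, 0x00,
    0x01, 0x04, 0x01,  0x02, 0x03, 0x05,  0x01, 0x05, 0x05,  0x03, 0x0a, 0x04,
    0x04, 0x02, 0x00,  0x03, 0x03, 0x05,  0x02, 0x04, 0x05,  0x05, 0x08, 0x04,
    0x02, 0x01, 0x00
};

/* Hypothesis: block2[i] is how many Block 3 bytes belong to entry i.
 * Writes each entry's start offset (relative to Block 3) into start[]
 * and returns the total number of Block 3 bytes consumed. */
static unsigned sh_block3_spans(const uint8_t *block2, size_t n, uint16_t *start) {
    unsigned total = 0;
    for (size_t i = 0; i < n; i++) {
        start[i] = (uint16_t)total;
        total += block2[i];
    }
    return total;
}
```

For the walk animation the total comes out to 192, which is exactly 0x135 - 0x75. Under this reading, Root pos y (entry 1) would own the first 12 bytes of Block 3.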

Next we'll take a look at Block 5, and we can also do some testing in game for Blocks 3 and 4.
 

Switch

Shenmue analysis at www.phantomriverstone.com
Joined
Jul 28, 2018
Location
Japan
Thanks for these write-ups of the investigation, I'm enjoying them!
 
Kion

It comes down to looking through the files and trying to interpret what the values are. Binary formats written in C are generally laid out with structs and pointers. So often the length of an array will be declared right next to the offset for the array, and it's possible to section the file off by pointers and then figure out the structure. With Shenmue, this file format is stupidly opaque. Having gotten through a decent portion of the file, I have yet to see any hint as to how many frames are in the animation, whether there are key frames, or how the game knows where to seek to for the transformation values in Block 5.

walk_anim.JPG

We can finally get into Block 5, which starts at 0x1a0 and goes until the end of the file at 0x10a4 for a length of 0xf04. We can do some simple math to try and figure out the layout of this block based on the clues (or lack thereof) from the previous blocks.

We know that all of the key values in this block are half floats. We also know that Shenmue declares transformations on all three axes. That means each transformation is going to have a length of 6 bytes. We have a length of 0xf04, which is not evenly divisible by 6, but there are two bytes of padding on the front, and there could be two more bytes of padding somewhere else in the block. Thus if we remove the possible padding and divide by 6 we get 0xf00 / 6 = 0x280.
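For anyone wanting to dump Block 5 as readable numbers, a 16-bit half float can be widened to a regular float as sketched below. This assumes the values follow the standard IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits), which the thread does not explicitly confirm; `half_to_float` is a hypothetical helper name.

```c
#include <stdint.h>
#include <string.h>

/* Converts an IEEE 754 binary16 bit pattern to a float by rebuilding the
 * equivalent binary32 bit pattern (assumption: the game uses this layout). */
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                              /* signed zero */
        } else {
            /* subnormal half: shift until the implicit leading 1 appears */
            uint32_t shift = 0;
            while ((man & 0x400u) == 0) {
                man <<= 1;
                shift++;
            }
            man &= 0x3FFu;
            bits = sign | ((113u - shift) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);      /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13);  /* rebias 15 -> 127 */
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

As a sanity check, 0x3C00 decodes to 1.0 and 0x3C74 decodes to 1.11328125. The latter rounds to the 1.113281 quoted later in the thread for the first root position keyframe, although the raw bytes were not checked for this note.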

We have 0x280 (640) individual x, y, z transformation values. Now we're going to need a transformation value per-frame per bone. Since we haven't seen anything that resembles a key frame declaration in the file we can assert that the game is going to have to provide a full table for each bone for each frame instead of applying just key frames. If this is the way the game works, then it's going to assign the same space for each list of bone-transformation pair.

In Block 1 we have values for 20 different bones, and the root bone has transformations for bone position and rotation, bringing the total of bone-transformation pairs to 21. And if we divide 640 by 21 we get 30.47, which means it doesn't divide evenly. Not to mention that I only found 14 frames for the root bone position. Which could mean a lot of things. Like there could be more key frames for some bones than others, it could mean that position and rotation could be encoded differently. Or it could also mean that Block 1 is a big stupid lie and there are actually more bone transformations that are not included in the header. Or it could also mean that I'm a huge dummy and I grabbed more bytes than I should have.

The only way to know for sure is going to be with more testing.

 

LemonHaze

Administrator
Joined
Dec 25, 2018
Trying to think of things which may be left out of these posts, but you're doing a pretty good job at collating it all and nicely representing everything in a nice and easy way. One thing I haven't mentioned previously is that tasks (which every system within both games are utilising) seem to be 0x70/112 bytes in size, this is only slightly relevant because surrounding a lot of the MOTN logic are big while() or for() loops which seem to iterate in 0x70 byte chunks at a time, it looks like some kind of padding perhaps, but not entirely sure right now as I haven't finished research on it.

The video you posted kinda looks like the game is only interpolating a single frame throughout the whole sequence, not 100% tho.

I definitely agree though, a lot of this is pretty bizarre. Storing half-floats themselves is bizarre enough, but to have separate instructions for each axis only makes a little bit of sense, which is that it's some way they got around constraints/limitations on the Dreamcast, because each sequence is read at runtime and fresh - so no caching at all. But even then, even if they stored just transform matrices, it would've likely been more performant.

The "second count" / "second count index" is kinda interesting (these are the Block 4 values). That whole thing is apparently some sort of flag, too, and somewhat correlates with the paired values which seemingly aren't used. It's still unclear exactly what they are. The second count stuff is basically an index into a 256-byte array of repeating values below/equal to 12:

C:
    const uint8_t countLookupTable[256] = {0,  1,  2,  3,  1,  2,  3,  4,  2,  3,  4,  5,  3,  4,  5,  6,
                                           1,  2,  3,  4,  2,  3,  4,  5,  3,  4,  5,  6,  4,  5,  6,  7,
                                           2,  3,  4,  5,  3,  4,  5,  6,  4,  5,  6,  7,  5,  6,  7,  8,
                                           3,  4,  5,  6,  4,  5,  6,  7,  5,  6,  7,  8,  6,  7,  8,  9,
                                           1,  2,  3,  4,  2,  3,  4,  5,  3,  4,  5,  6,  4,  5,  6,  7,
                                           2,  3,  4,  5,  3,  4,  5,  6,  4,  5,  6,  7,  5,  6,  7,  8,
                                           3,  4,  5,  6,  4,  5,  6,  7,  5,  6,  7,  8,  6,  7,  8,  9,
                                           4,  5,  6,  7,  5,  6,  7,  8,  6,  7,  8,  9,  7,  8,  9,  10,
                                           2,  3,  4,  5,  3,  4,  5,  6,  4,  5,  6,  7,  5,  6,  7,  8,
                                           3,  4,  5,  6,  4,  5,  6,  7,  5,  6,  7,  8,  6,  7,  8,  9,
                                           4,  5,  6,  7,  5,  6,  7,  8,  6,  7,  8,  9,  7,  8,  9,  10,
                                           5,  6,  7,  8,  6,  7,  8,  9,  7,  8,  9,  10, 8,  9,  10, 11,
                                           3,  4,  5,  6,  4,  5,  6,  7,  5,  6,  7,  8,  6,  7,  8,  9,
                                           4,  5,  6,  7,  5,  6,  7,  8,  6,  7,  8,  9,  7,  8,  9,  10,
                                           5,  6,  7,  8,  6,  7,  8,  9,  7,  8,  9,  10, 8,  9,  10, 11,
                                           6,  7,  8,  9,  7,  8,  9,  10, 8,  9,  10, 11, 9,  10, 11, 12};
It's used like:
C++:
                    stream.seekg(block4Offset, std::ios::beg);
                    uint8_t secondCountIndex = sread<uint8_t>(stream);
                    uint8_t secondCount = countLookupTable[secondCountIndex];
The actual values themselves (secondCount) seem to be a mask byte for whether or not the keyframe values come in pairs (not used, have almost no effect on the animations) or single values:

Single values:-
C:
secondCount & 0x40
secondCount & 0x10
secondCount & 0x04
secondCount & 0x01
Pair values:-
C:
secondCount & 0x80
secondCount & 0x20
secondCount & 0x08
secondCount & 0x02
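A side observation on the lookup table: every entry equals the sum of the four 2-bit fields of its index. The single-value masks (0x40, 0x10, 0x04, 0x01) are the low bit of each field and the pair masks (0x80, 0x20, 0x08, 0x02) are the high bit, so each field would contribute 1 for a single value plus 2 for a pair. That fits best if the masks are applied to the index byte rather than to the looked-up count (a count of at most 12 can never have 0x40 or 0x80 set), but that last part is an assumption, not something confirmed by the trace. `sh_second_count` is a made-up name for this sketch.

```c
#include <stdint.h>

/* Recomputes a countLookupTable entry without the table:
 * the sum of the four 2-bit fields of the index byte. */
static uint8_t sh_second_count(uint8_t index) {
    unsigned total = 0;
    for (unsigned field = 0; field < 4; field++) {
        total += (index >> (2u * field)) & 0x3u;
    }
    return (uint8_t)total;
}
```

Under this reading the count would be the number of values the instruction byte asks for, with index 0xFF (four singles plus four pairs) giving the maximum of 12.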
Also, Unknown35 seems to be a duplicate node of the "LeftFootIKTarget" node. An animation seems to either use that or it uses 33 (the other LeftFootIKTarget node). Could be another constraint, somehow. But even that is a bit of a stretch, because I can't imagine introducing another entire node would ever be faster than just using the normal one.. but then again, it is possible this is how they encode "straight-from-mocap" cutscene character animations. You'll notice that a lot of instructions are the same across different animations, most notably the Walk, Idle and Pit Blow anims. So maybe for bigger animations, which in the current way of thinking, should have some high value instructions (because each instruction seems to increment by varying amounts), are using one of these to somehow cut down on instruction size and therefore processing.
 
Kion

To clarify the "second count" / "second count index" values (the Block 4 values) that LemonHaze describes above, I made a quick image.
Screenshot from 2020-06-28 17-53-01.png

The weirdest part about all of this is that so far I haven't seen anything that resembles a definition for the total number of frames in the animation, or something that resembles a declaration of key frames. It's weird because Blocks 2, 3 and 4 seem to have some values, but they're not declared in a way that makes it apparent what they do. The aspects of the file that make the most sense are the header, Block 1 and Block 5.

Except Block 5 is also pretty nonsensical.

shenmue_block5.png

I've only poked into a little bit of Block 5, but what I did find was that the root bone position values were stacked end-on-end for each frame. From Blocks 2, 3 and 4 we haven't seen anything that resembles a key frame, offset or otherwise, that would indicate some kind of encoded index look up scheme. Which means the key frame values in Block 5 are most likely stacked end-to-end for each bone for each frame.

That would suggest that the structure of Block 5 is probably as shown above, where you start with the first transformation declared in Block 1 (root pos), then move to the next one (root rot), and continue on for each bone-transformation type from there. The problem with this is that to generate an animation, the game is going to have to know where to read the key frame values from for each frame for each bone. Basically, if the game is going to know where to jump to for the next bone in the stride, it has to know how long the animation is.

For instance, if the animation is 30 frames, then each key frame x, y, z entry is going to be 6 bytes. So you would start at frame 0, read the value for bone 0 pos, seek to the next bone-transform, which would be 30 x 6 bytes after that, and read it, then seek and repeat until you've read all of the transform values for each bone for frame 0. Then you would increment the frame and repeat the whole process shifted over by 6 bytes.

While I need to do more testing in game to see if this is actually how the game encodes animations, I don't think it can be overstated how batshit crazy this approach is if that's actually what they're doing.

shenmue_block5-Page-2.png

In most cases if you were to not provide key frames and encode every bone transform for every bone, the normal common sense approach would be to organize by frame. That way all you need to do is seek to a frame offset, read each of the transforms in order. And if you do it that way it would make sense that the total number of frames wasn't declared because you could read each frame in order until you ran into some kind of termination signal. As it stands I don't know how they know what the stride is without declaring an animation length. Unless they're doing something really stupid like declaring a fixed 15 frames at a time for each type of animation, and then stacking those together.
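To make the difference between the two layouts concrete, here is the offset arithmetic for both. Both functions are purely hypothetical: neither layout has been confirmed against the game, `pair` means the index of a bone-transform pair in Block 1 order, and the function names are invented for this sketch.

```c
#include <stdint.h>

#define SH_KEY_BYTES 6u   /* three half floats: x, y, z */

/* Suspected layout: all frames of pair 0, then all frames of pair 1, ...
 * The reader must know frame_count to find the next pair. */
static uint32_t sh_ofs_bone_major(uint32_t base, uint32_t frame_count,
                                  uint32_t pair, uint32_t frame) {
    return base + (pair * frame_count + frame) * SH_KEY_BYTES;
}

/* "Common sense" layout: all pairs of frame 0, then all pairs of frame 1, ...
 * The reader only needs pair_count, which Block 1 already provides. */
static uint32_t sh_ofs_frame_major(uint32_t base, uint32_t pair_count,
                                   uint32_t pair, uint32_t frame) {
    return base + (frame * pair_count + pair) * SH_KEY_BYTES;
}
```

With 30 frames, the bone-major stride between consecutive pairs is 30 x 6 = 180 bytes, while stepping one frame moves only 6 bytes. With 21 pairs in the frame-major layout it is the other way around: one frame is 21 x 6 = 126 bytes and the next pair is 6 bytes away.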

walk_anim_gg.JPG

For the root bone position I only found 14 key frame values stacked end on end. If that's the pattern then it barely accounts for half of the length of the block if all of the animations are actually encoded in this way.
 

LemonHaze

So keyframe values seem to reside in the last block, block 5. All values are half-floats. Most keyframe values are expected to have 3 components, which further brings the instructions into question (if they always expect 3 values, why do they separate keyframe value axes?):

Position keyframe values:
1593451194710.png

And rotation keyframe values:

1593451257208.png

In both cases, the values are read and written to an offset (outVal in these screenshots). So from this, we know that the rotation keyframe values are stored as fp16, expanded to f32, multiplied by 65536 and cast into an integer. Position keyframe values are mostly used directly. The first keyframe for the root bone position, (0.0, 1.113281, -0.039948), seems to be one of the few exceptions to the rule, where two axes are inverted:

1593451992216.png

This is done further up with other values, providing that certain conditions are met. But in this screenshot, OutMotnData holds the keyframe values as previously described (first root bone pos keyframe) and then X and Z are inverted and added to stru_7FF75A287C10. After this the values are almost seemingly just applied to the transform matrix:

1593452207133.png
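The two conversions described above can be written down as a small sketch. It only mirrors the description (scale by 65536 and cast for rotations, negate X and Z for the root position case); what unit the resulting integer represents, and the exact conditions under which the inversion happens, are not established here. `sh_vec3_t`, `sh_rot_to_int` and `sh_flip_xz` are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    float x;
    float y;
    float z;
} sh_vec3_t;

/* Rotation keyframe: the already-expanded f32 is multiplied by 65536 and
 * cast to an integer (the C cast truncates toward zero). */
static int32_t sh_rot_to_int(float rot) {
    return (int32_t)(rot * 65536.0f);
}

/* Root position special case: X and Z are negated, Y is left alone. */
static sh_vec3_t sh_flip_xz(sh_vec3_t v) {
    v.x = -v.x;
    v.z = -v.z;
    return v;
}
```

For example, a decoded rotation value of 0.25 becomes 16384, and the first root position keyframe (0.0, 1.113281, -0.039948) becomes (-0.0, 1.113281, 0.039948) after the flip.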

Regarding the instructions themselves, this was the behaviour that occurred when setting all of them to the same instruction (0x1F8 in this example):



However, this was the behaviour when setting them all to 0x3848:



At the end of this clip, I held down SHIFT to get Ryo moving and eventually see what would happen when he'd slow down, to find that these values have clearly had an effect on the running animation for some reason. The only explanation to this is that the game is constantly blending between the multiple locomotive animations:

1593453575168.png
 
Kion

Screenshot from 2020-06-28 11-24-40.png

To make a change/hypothesis: I think it's likely that the first byte in the animation is actually the number of frames, or otherwise has something to do with the frame count. The value size of Blocks 2 and 3 depends on this value, but if that were its only function, then only two bits would be required. It seems more likely that the sizes of Block 2 and 3 values are based on the length of the animation. For short animations single bytes are enough; otherwise two bytes are used.

I did some testing and while the physics get super weird, it looks like Ryo gets part of the way through the animation before quickly snapping into the next loop of the animation.

Shenmue_Notes___Frame_Count.gif

In terms of overall pseudo code, the general structure of the animation should be something similar to the following. So the next step is going to be tracing through the different blocks to try and figure out how the game knows to read a key frame value for a given frame/bone/transformation/axis.

Code:
// position_mask, rotation_mask, getPosition and getRotation are
// placeholders, how they actually work is still unknown
frame_count = 0x25;
for (frame = 0; frame < frame_count; frame++) {

    block_2_pos = block_02_start;
    for (ofs = block_01_start; ofs < block_02_start; ofs += 2) {

        instruction = readUint16(ofs);
        if (instruction == 0) {
            continue;
        }

        bone_id = instruction >> 9;

        if (instruction & position_mask) {
            // read block 2
            // read block 3
            // read block 4
            // get key frame value from block 5
            posX = getPosition(block_2_pos++);
            posY = getPosition(block_2_pos++);
            posZ = getPosition(block_2_pos++);
        }

        if (instruction & rotation_mask) {
            // read block 2
            // read block 3
            // read block 4
            // get key frame value from block 5
            rotX = getRotation(block_2_pos++);
            rotY = getRotation(block_2_pos++);
            rotZ = getRotation(block_2_pos++);
        }

    }

}
 
OP
OP
Kion

Kion

DashGL
Joined
Oct 11, 2018
I've been trying to think of two approaches. The first is trying to trace through what the game does to try and figure out how it knows where to read a specific key value, but this "firstCount", "secondCount" stuff is super confusing. There are a lot of ways to encode a mesh, so it's often not surprising that a lot of different games will have drastically different file formats for meshes, based on the effects in the game, and how the programmers made decisions for how to structure the files around that.

With animations it's mostly different, in that animations in themselves are pretty complicated, so to try and reduce some of the complexity the files themselves are generally pretty simple. Shenmue throws that right out the window as there isn't anything like indexes or offsets that would provide hints as to how the file is parsed, and there are a lot of stupid look up tables and weird rules that don't make any sense.

The other approach is to manually go through the key frame values, change each value and then try and figure out what the effect is in game, to start to get an idea of where different values are located. For instance you can find, "okay, root bone position is encoded from here to there", and "okay, this is the right arm". And generally once you have enough data points you can start to get an idea how the key values are laid out and start to think of possible options for parsing the key values, or use them as targets to check against tracing.

If you know the value and location of a few key frames you can use that to test, "okay I'm reading bone 0 position frame 0, I expect the value to trace to this offset, and for bone 0 position frame 1, I expect to trace to that offset", and then see if you can replicate that in code. One thing that I'm tempted to do is fork the emulator to write a debug log. If you know the start and end offset of an animation in memory, you can pick a start point like the first instruction of block 1 and then pick an end point like the third instruction in block 1.

The rules of the debugger are pretty simple: if the game reads a uint16_t for the first instruction in block 1, start writing the log, and when the game reads a uint16_t for the third instruction, stop writing the log. And the contents of the log are every read in between, with its size and address.

Code:
Start debug
read uint16_t at 0xofs1
read uint8_t at 0xofs2
read uint8_t at 0xofs3
read uint16_t at 0xofs4
read uint32_t at 0xofs5
read uint16_t at 0xofs6
...
And what that does is provide a really simple reference for the order in which the game reads different values. And that can provide a lot of easy hints like, "okay the byte offset for block 2 advances by 1, but the byte offset for block 3 advances by the value of the byte in block 2", or "the game reads from block 4 and then jumps directly to block 5". Basically it's the same information as tracing through the game with a debugger, but you don't get sidetracked by trying to figure out what the raw assembly means. You get the order of values read, the values and the offsets, and then you can piece together how it got there.
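
The start/stop rule for the log could be sketched as a tiny state machine that gets called once per memory read. This is only an illustration of the idea: the struct, the function name and the exact log text are all made up, and hooking it into a real emulator is a separate problem.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t start_addr;  /* a uint16_t read here opens the log  */
    uint32_t stop_addr;   /* a uint16_t read here closes the log */
    int      active;      /* non-zero while the log is open      */
    unsigned logged;      /* number of reads written so far      */
} ReadTrace;

/* Call once per memory read. Returns 1 if the read was logged. */
int trace_read(ReadTrace *t, uint32_t addr, unsigned size)
{
    if (!t->active) {
        if (addr != t->start_addr || size != 2) {
            return 0;                 /* still waiting for the start read */
        }
        t->active = 1;
        puts("Start debug");
    } else if (addr == t->stop_addr && size == 2) {
        t->active = 0;                /* stop read itself is not logged */
        puts("Stop debug");
        return 0;
    }

    printf("read uint%u_t at 0x%08x\n", size * 8, (unsigned)addr);
    t->logged++;
    return 1;
}
```

Setting start_addr to the first instruction of block 1 and stop_addr to the third instruction would give a log of everything the game touches while handling the first two instructions.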

That was actually kind of a huge tangent behind writing this post, my intention was to take another look at block 5.

walk_anim_gg.JPG

I think I've posted this image twice already, but never provided any context. What we have here is the root bone position for block 5. The way I generated this table was I created a save state, found the offset of block 5 in the save state and then went through and commented out (set to 0x0000) two bytes at a time to see what the result was in game. And the result was that every time the value underlined in green was changed Ryo would sink into the ground.

root_bone_pos.png

And the reason for this is that the position of the root bone is around Ryo's chest. So to actually have Ryo stand on the ground, the Y position of his chest needs to be encoded into the animation so that it actually appears like he's standing. So if we go through and set each of the animation key frame values to 0 we can get the positions where Ryo sinks into the ground and underline them. And the pattern we get is that every 3 values we run into the y value, which makes it really easy to figure out that the two values in between are the z (underlined in purple) and x (underlined in red) values.

From there it looks like the game transitions into the root bone rotation, and Ryo's upper body starts to rotate. And that's where I stopped because it's kind of a time consuming process. And I wanted to go back and look at the other blocks to see if there would be any hints for where the game read values from Block 5. Since we haven't found anything that looks too simple I wanted to go back and try tracing out the values for Block 5 again, that would give us more context to work with.

I wrote in a previous post about the issues that bothered me about block 5. The way it's structured doesn't give any context for what the stride is, and there are no key frame indexes. And for the root bone value there are only 14 frames. That means if we multiply out 14 by all of the bone transformations, that barely accounts for half of the length of the file. And the file isn't too long, but it's not that short either. So I wanted to think if there was any more recon data I could get before jumping into taking the time to go through block 5 two bytes at a time.

And the approach that I thought of was to look for the Bone 0 Y position. I was thinking that 14 frames seems like too few for the whole length of the animation. Which might indicate that the game is encoding animations in chunks. Like frames 0 - 14 are encoded in the first half and frames 15 onward are encoded in the next half. With the walk animation we can assume two things: that the y position of the root bone shouldn't vary too much, and that the values should occur in increments of 3. So we can use this to look through the values to see if the game includes the root position at a later offset or if there really are only 14 key frames for root bone position. To generate the table below, I read each key frame value and looked for values that were between 1.00 and 1.10, where we would expect the y value to be.

Code:
2) 1.0240400022100414412424     
5) 1.03243301011103044342232     
8) 1.032131333223104011244     
11) 1.0321012210343113130311     
14) 1.0231231010000301121333     
17) 1.0231231010000301121333     
20) 1.03022241311423310431242     
23) 1.03240234242223330020443     
26) 1.0321012210343113130311     
29) 1.0310240400022100414413     
32) 1.02400434002124424303     
35) 1.02301232112244022020303     
38) 1.02323333032212000411401     
41) 1.0240400022100414412424     
--- 
93) 1.0114042212302033234132 
--- 
237) 1.03000140342010332040143     
239) 1.0012431402312310100003     
---
702) 1.0114042212302033234132     
---
842) 1.0323222302334411014421     
844) 1.03003202110340101411432     
846) 1.0233134430104122023314     
848) 1.0242002320321313332231     
850) 1.0323222302334411014421     
---
1283) 1.0114042212302033234132     
1442) 1.0323222302334411014421     
1444) 1.03003202110340101411432     
1446) 1.0242002320321313332231     
1448) 1.0242002320321313332231     
1449) 1.0323222302334411014421
Aaaaaaand what we find is there are only 14 key frames. The leading number on each line is the index of the key frame value in two-byte units from the start of block 5: the first two bytes are index 0, the next two bytes are index 1, and so on. What we see at the beginning is exactly what we would expect: a root position Y key frame every three indexes. My hypothesis was that we might see that pattern again later on, and that's not what we see at all. In a few places (93, 237, 702, 1283) we have values that probably just coincidentally come out around 1. And then we see something really strange. From indexes 842 - 850 we have a value around 1 every other index, and the same with 1442 - 1448. Which is really weird, as if we have x, y, z values for everything we wouldn't really expect a hit every two indexes.

We might try jumping to these values to test them to see what happens. But I think this kind of rules out the possibility of the values looping around. I guess that implies that the game has a different number of key frames depending on the bone. Or maybe the first half of the key frame values are used for one thing, and the second half is used for something else entirely.
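
For anyone wanting to repeat the scan, a minimal sketch is below. It treats block 5 as a run of little-endian half floats and records the two-byte index of every value inside a range. The function names are hypothetical and this is not the script used for the table above. Note that the first root Y key mentioned earlier in the thread (1.113281) sits slightly above 1.10, so the upper bound may need to be a bit wider in practice.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode an IEEE 754 binary16 (half float) into a 32-bit float. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0 && man == 0) {
        bits = sign;                               /* signed zero */
    } else if (exp == 0) {
        exp = 113;                                 /* subnormal: renormalise */
        while ((man & 0x400u) == 0) {
            man <<= 1;
            exp--;
        }
        bits = sign | (exp << 23) | ((man & 0x3FFu) << 13);
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);   /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Store the two-byte index of every value in [lo, hi] into out
   (up to max_out entries) and return the total number of hits. */
size_t find_values_in_range(const uint8_t *block, size_t len,
                            float lo, float hi,
                            size_t *out, size_t max_out)
{
    size_t found = 0;
    for (size_t i = 0; i + 1 < len; i += 2) {
        uint16_t raw = (uint16_t)(block[i] | (block[i + 1] << 8));
        float v = half_to_float(raw);
        if (v >= lo && v <= hi) {      /* NaN fails both comparisons */
            if (found < max_out) {
                out[found] = i / 2;
            }
            found++;
        }
    }
    return found;
}
```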
 
Last edited:
OP
OP
Kion

Kion

DashGL
Joined
Oct 11, 2018
Some good news and general incompetence on my part. I was trying to break down the structure of Block 5, and I always thought it was weird because it seemed like there were way too many bytes to account for. And the reason is because there actually were too many bytes.

I jumped into the middle of Block 5 and started to replace values with 0, and there was no effect on the animation. So I went crazy and replaced a lot more values with 0 and there was still no effect on the animation. I figured I might as well try to calculate about where I expect the animation to end, which is 21 transformations * 14 frames * 6 bytes per frame = 1764 (0x6E4). Zeroing everything past that point still had no effect, and it wasn't until I removed all of the bytes prior to that point of Block 5 that I started to notice any changes in the animation.

It definitely seems like there was a problem with how I sliced the larger file into smaller individual files, but the good news is that the key value section isn't nearly as big as I thought it was which makes breaking down and analyzing what's happening a little easier.

Screenshot from 2020-07-03 01-10-54.png

If we look at the screenshot we can see the bytes 0x01f8, which is the first instruction for root bone position and rotation. And we can trace back 12 bytes before that to find something that looks like an animation header. This should give us the real length of the animation and make it easier to break down and single out the key frame values. And as for splitting up the animations, looking for the instruction 0x01f8 seems like a good way to double check that all of the animations have been properly separated.
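
A quick way to run that double check could look like the sketch below. It scans a buffer for the 16-bit value 0x01F8 and reports the offset 12 bytes before each hit as a candidate animation start. The assumptions here are mine: values are little-endian and two-byte aligned, every animation starts with the 0x01F8 instruction, and the header is always 12 bytes. The value can also show up as ordinary data, so hits are only candidates.

```c
#include <stddef.h>
#include <stdint.h>

#define FIRST_INSTRUCTION 0x01F8u  /* root bone position + rotation */
#define HEADER_SIZE       12u      /* assumed fixed header length   */

/* Store candidate header offsets into out (up to max_out entries)
   and return the total number of 0x01F8 hits found. */
size_t find_animation_starts(const uint8_t *data, size_t len,
                             size_t *out, size_t max_out)
{
    size_t found = 0;
    for (size_t i = HEADER_SIZE; i + 1 < len; i += 2) {
        uint16_t v = (uint16_t)(data[i] | (data[i + 1] << 8));
        if (v == FIRST_INSTRUCTION) {
            if (found < max_out) {
                out[found] = i - HEADER_SIZE;
            }
            found++;
        }
    }
    return found;
}
```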

Now we can finally get some accurate numbers for how long Block 5 is. It starts on 0x1a0 and ends on 0xef0. We know that the key frames are a length of 6 bytes. And we know the first two bytes are padding. Which gives a practical length of 0xD4E. Then we subtract down until this number is divisible by 6 to account for any other padding that could be in there. Which is 0xD4A, and when we divide that by 6 we get 0x237. Not a clean number and not exactly what I was looking for. The good news is that this Block is a lot shorter than I expected, so we can start to go through and debug and find which part of the model references which range of values.
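
The arithmetic above is easy to get wrong by hand, so here it is as a small sketch. The function names are made up; the only inputs are the numbers already quoted in this post.

```c
#include <stdint.h>

#define KEY_SIZE 6u  /* three fp16 components (x, y, z), 2 bytes each */

/* Number of whole 6-byte keys that fit between start and end,
   after skipping `padding` bytes at the front. */
uint32_t whole_keys(uint32_t start, uint32_t end, uint32_t padding)
{
    return (end - start - padding) / KEY_SIZE;
}

/* Bytes left over after the last whole key. */
uint32_t leftover_bytes(uint32_t start, uint32_t end, uint32_t padding)
{
    return (end - start - padding) % KEY_SIZE;
}

/* Size in bytes if every transformation had a key on every frame. */
uint32_t expected_bytes(uint32_t transformations, uint32_t frames)
{
    return transformations * frames * KEY_SIZE;
}
```

whole_keys(0x1a0, 0xef0, 2) gives 0x237 (567) with 4 bytes left over, while 21 transformations with 14 keys each would only need 294 keys (1764 bytes), so the two estimates don't line up yet.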


Kion

I was hoping that there would be an equal number of keyframes for each bone, and that the key frames would be declared in the order of the node id's from the instructions. But this animation format has been a fucking nightmare from the beginning, and I don't expect it to get nice anytime soon. So I went through the animation two bytes at a time, set them to zero, made a note if there was any visible change in the animation and then moved onto the next two bytes. And this was the result.

Code:
Root Pos : 0x04, 0x0a, 0x10, 0x16, 0x1c, 0x22, 0x28, 0x2e, 0x34, 0x3a, 0x40, 0x46, 0x4c
Root Rot : 0x56, 0x5a, 0x5e

Hips Rot : 0x72, 0x76, 0x7c

LeftFootIKTarget : 0x9e, 0xa2, 0xa6, 0xac, 0xb8, 0xbc, 0xc2, 0xc8, 0xcd, 0xd4, 0xda
Left Thigh: 0x110, 0x114, 0x120, 0x122, 0x128, 0x12d, 0x132, 0x138, 0x13d, 0x142, 0x148,
0x14d
LeftFootIK: 0x154, 0x15a, 0x160, 0x164, 0x16a, 0x178, 0x17c, 0x184, 0x186, 0x18c, 0x18e

RightFootIK: 0x1ac, 0x1b0, 0x1b4, 0x1ba, 0x1c0, 0x1cc, 0x1e4, 0x1e6, 0x1e8
RightThigh: 0x1ec, 0x1ee, 0x1f0, 0x1f2, 0x1f4, 0x1f8, 0x1fc, 0x206, 0x20c, 0x212, 0x218,
0x21e, 0x224, 0x228, 0x22e, 0x234, 0x238, 0x23c, 0x240, 0x246, 0x24a, 0x250
Right Foot: 0x256, 0x25c

Upper Body: 0x28e, 0x290, 0x292, 0x294, 0x296, 0x298, 0x29a, 0x29c, 0x29e, 0x2a0,
0x2a2, 0x2a4, 0x2a8, 0x2aa, 0x2ac, 0x2ae, 0x2b0

Head: 0x2b6, 0x2b8, 0x2be, 0x2ca, 0x2d0, 0x2d2, 0x2d8, 0x2da, 0x2de

Right Hand: 0x338, 0x340
Right Elbow (shoulder?): 0x36e, 0x374, 0x38c, 0x392
Right Hand IK: 0x394, 0x39a, 0x39e, 0x3a2, 0x3a4, 0x3a8, 0x3ac, 0x3b0, 0x3b4, 0x3b8, 0x3bc,
0x3c0, 0x3c4, 0x3c8, 0x3cc, 0x3d0, 0x3d4, 0x3d8, 0x3da, 0x3dc, 0x3e0, 0x3e4
Right Elbow: 0x3e8

0x400 - 0x4d0 (nothing?)

LeftHandIk: 0x4d6, 0x4da, 0x4de, 0x4e2, 0x4e6, 0x4e8, 0x4ea, 0x4f0, 0x4f4, 0x4f8, 0x4fc,
0x500, 0x504, 0x508, 0x50c, 0x510, 0x514, 0x518
LeftElbow (shoulder?): 0x51c, 0x520, 0x522, 0x524, 0x526, 0x528
I'll go over the notation since I doubt this makes any sense presented as-is. This is the result of going through the animation two bytes at a time and recording any visible change. There could be a lot of cases where a given value is close to zero anyways, and thus setting it to zero would provide very little visible information. The other possibility would be to try a range of values at each offset to weed out the values that don't show a visible difference when set to 0, but that would take three times longer.

The purpose of this was to generate a heat map to get a general sense of which values are where, roughly how many values there are and what order they are in. The amount of addresses listed doesn't specifically reflect the number of key frames, but reflects what values showed a visible response to try and edge out a start and end to each type of key framed section for each bone. The addresses are relative to the start of Block 5.

From this data we can recognize a few things. The first is that while the number of addresses doesn't necessarily indicate the number of key frame values, the range does. And what we find is that while the Root Bone Position has a wide range, the Root Bone Rotation and Hip Rotation have a stupidly small range. This indicates that the key frame values are most likely indexed in some form.
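
To put numbers on "wide" and "stupidly small", the byte range covered by each list of addresses can be computed with a small helper like the sketch below. The helper name is hypothetical, and it simply counts from the lowest address to the end of the two-byte value at the highest address.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes covered from the lowest listed address up to and including
   the two-byte value at the highest listed address. */
uint32_t span_bytes(const uint16_t *addrs, size_t count)
{
    if (count == 0) {
        return 0;
    }
    uint16_t lo = addrs[0];
    uint16_t hi = addrs[0];
    for (size_t i = 1; i < count; i++) {
        if (addrs[i] < lo) lo = addrs[i];
        if (addrs[i] > hi) hi = addrs[i];
    }
    return (uint32_t)(hi - lo) + 2u;
}
```

Using the lists above, Root Pos (0x04 to 0x4c) spans 0x4a bytes, or 37 two-byte values, while Root Rot (0x56 to 0x5e) spans only 10 bytes, or 5 values.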

For instance the game has a frame number and a node id. Using these two the game will know where to look in Block 4, Block 4 will provide an index number, and then the game will know which key frame value to read for that frame. So only unique values are included in Block 5. Another possibility is the game only reads the key frame values at certain frames and then interpolates in-between.

For the feet, it's kind of hard to know exactly what is being manipulated. What I think is happening is there is thigh rotation, foot planting position, and possibly foot rotation, and then IK is solved for the knee. What this means is that it's somewhat difficult to piece together exactly what is going on by looking at it, because one change can have an effect on the other parts. Generally the order seems to be left leg, right leg, right hand, (huge void), left hand.

What I really want to do next is narrow down the range of values being read for a given instruction in block 1. What I think I'll do is try to fork the emulator, start the log when 2 bytes have been read from Block 1, end the log when 3 values of 2 bytes have been read from block 5, and repeat that several times to try and get the range of key frames being read for a specific node id.

 
Last edited:
OP
OP
Kion

Kion

DashGL
Joined
Oct 11, 2018
I'll be using this post as a scratchpad while editing the emulator to make up for my lack of brain function from lack of sleep for the last few days. The next step is that we want to start logging which key frame values are read for a certain instruction. There could be a more elegant way to do this, but the method I'm familiar with revolves around forking an emulator and then logging to the console like a freaking caveman.

To summarize the steps, first we need to clone an emulator and make sure we can compile it. The two options that we seem to have available are Reicast and WashingtonDC-emu. It doesn't matter too much which emulator is used. Whatever works, compiles, and lets us track down where values are read from memory. Then we need to find exactly where the address is in memory that we want to debug. If the emulator has a dump-memory option then that's the easiest. And since the animations will be loaded into a fixed location in memory, this step can be done with any emulator.

If there isn't an emulator with a dump memory option, then the backup strategy is to edit the save state function to write the Dreamcast memory to its own file. Or otherwise write to a non-compressed file where we know the start of the Dreamcast memory in the file. Once we have that we can go to the emulator source, find where the values are being read and add in the few lines to print out the information that we want to the console.

To take a quick look at memory dumping, it looks like Redream, WashingtonDC-Emu and Reicast don't have an easy option to simply dump the memory to a file. So I guess I'll get right into using a save state to export the memory into its own file. RetroArch has the option to export to a save state that's not compressed, but I don't know where the start offset of the memory is, so that doesn't help me too much. Since RetroArch is effectively using Reicast as a core, I think it makes more sense to get started with Reicast and track down where the save state function is.

Quick notes on compiling reicast. It seems easy enough:
Code:
$ git clone https://github.com/reicast/reicast-emulator.git
$ cd reicast-emulator
$ cd reicast/linux
$ make
$ ./reicast.elf
Next we're onto tracking down the SaveState function. Which is fortunately enough named SaveState and can be found in libswirl/libswirl.cpp. The problem is that this function only manages where the save state file is stored. It looks like the heavy lifting is managed by dc_serialize in ./libswirl/serialize.cpp. And it looks like this function makes a lot of calls to REICAST_S to append all of the information needed for the save state into one buffer. So we need to find where the main memory starts and ends in this buffer.

And it looks like we can straight up jack the memory from the save state function.
Code:
bool dc_serialize(void **data, unsigned int *total_size)
{
    int i = 0;
    serialize_version_enum version = V4 ;

    *total_size = 0 ;

    ...
    printf("mram\n");
    REICAST_SA(sh4_cpu->mram.data, sh4_cpu->mram.size);

    // Debug hack: also write main RAM to its own file
    FILE *fp = fopen("/home/kion/memory.bin", "wb");
    if (fp != NULL) {
        fwrite(sh4_cpu->mram.data, sh4_cpu->mram.size, 1, fp);
        fclose(fp);
    }
    ...
}
Now we have a 16MB file of the system RAM written to our home directory, which means we can do a quick search to find the location of the start of the animation relative to the start of memory. And we get 0xdf91da. Which means the next place to look is where the emulator reads values from memory.

Code:
// libswirl/hw/mem/_vmem.h

template<typename T, typename Trv>
INLINE Trv DYNACALL _vmem_readt(u32 addr)
{
    const u32 sz = sizeof(T);

    u32   page = addr >> 24;    //1 op, shift/extract
    unat  iirf = (unat)_vmem_MemInfo_ptr[page]; //2 ops, insert + read [vmem table will be on reg ]
    void* ptr = (void*)(iirf & ~HANDLER_MAX);     //2 ops, and // 1 op insert

    if (likely(ptr != 0))
    {
        addr <<= iirf;
        addr >>= iirf;

        T data = (*((T*)&(((u8*)ptr)[addr])));
        return data;
    }
    else
    {
        const u32 id = (u32)iirf;
        if (sz == 1)
        {
            return (T)_vmem_RF8[id / 4](_vmem_CTX[id / 4], addr);
        }
        else if (sz == 2)
        {
            return (T)_vmem_RF16[id / 4](_vmem_CTX[id / 4], addr);
        }
        else if (sz == 4)
        {
            return _vmem_RF32[id / 4](_vmem_CTX[id / 4], addr);
        }
        else if (sz == 8)
        {
            T rv = _vmem_RF32[id / 4](_vmem_CTX[id / 4], addr);
            rv |= (T)((u64)_vmem_RF32[id / 4](_vmem_CTX[id / 4], addr + 4) << 32);

            return rv;
        }
        else
        {
            die("Invalid size");
        }
    }
}
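
Since the read function works on SH-4 addresses and the 0xdf91da figure is an offset into the RAM dump, a filter is needed to go from one to the other before logging. A minimal sketch is below. It relies on my understanding of the Dreamcast memory map (main RAM in physical area 3 starting at 0x0C000000, 16MB, mirrored through the P0-P3 regions), so treat those constants as assumptions. The function names are hypothetical and nothing here is part of Reicast.

```c
#include <stdint.h>

#define RAM_SIZE 0x01000000u  /* 16 MB of main RAM (assumed) */

/* Returns 1 and stores the offset into main RAM if addr points there. */
int ram_offset(uint32_t addr, uint32_t *offset)
{
    if (addr >= 0xE0000000u) {
        return 0;                        /* P4 control region, not RAM */
    }
    uint32_t phys = addr & 0x1FFFFFFFu;  /* strip the region bits */
    if ((phys >> 26) != 3u) {
        return 0;                        /* not area 3 (0x0C000000) */
    }
    *offset = phys & (RAM_SIZE - 1u);
    return 1;
}

/* Returns 1 if addr falls inside the animation at the given RAM offset. */
int is_animation_read(uint32_t addr, uint32_t anim_offset, uint32_t anim_length)
{
    uint32_t off;
    if (!ram_offset(addr, &off)) {
        return 0;
    }
    return off >= anim_offset && off - anim_offset < anim_length;
}
```

The idea would be to call is_animation_read(addr, 0xdf91da, length) from inside the read path and only print when it returns 1, with length being however long the animation turns out to be.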