Data mover: R5 side#

Idea#

The goal is to move sampled data from the R5 processor via the A53 to the host PC running a GUI, which displays and logs the data. Important requirements are:

  • minimal effort on the R5 to avoid stealing computation time from the control ISR

  • synchronous operation with the control ISR, because the data is updated in each ISR call

  • sampling frequencies up to 100 kHz

  • buffering, batching, and TCP transport are handled on the A53 side

The current implementation prioritizes deterministic R5 behavior over lossless buffering on the A53. If the Ethernet sender on the A53 falls behind long enough for its sample queue to overflow, the queued backlog is purged on the A53 so that streaming can return quickly to real-time operation.
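The purge-on-overflow policy described above can be illustrated with a small ring buffer. This is only a minimal sketch, not the A53 implementation: the names `sample_queue_t`, `queue_push_or_purge`, `queue_pop`, and the capacity are hypothetical. It also assumes that the newest sample is kept after a purge, and it omits any locking between the IPI handler and the Ethernet sender.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAPACITY 4u /* hypothetical, tiny on purpose for illustration */

/* Hypothetical sample type; the real payload is struct javascope_data_t. */
typedef struct { uint32_t seq; } sample_t;

typedef struct {
    sample_t items[QUEUE_CAPACITY];
    size_t   head;   /* index of the oldest queued sample */
    size_t   count;  /* number of queued samples */
    uint32_t purges; /* how often the backlog was discarded */
} sample_queue_t;

/* Producer side (one call per received sample): never blocks.
 * If the queue is full, the whole backlog is dropped and the new sample
 * becomes the only queued element (assumption of this sketch).
 * Returns true if a purge took place. */
static bool queue_push_or_purge(sample_queue_t *q, sample_t s)
{
    bool purged = false;
    if (q->count == QUEUE_CAPACITY) {
        q->head = 0;
        q->count = 0;
        q->purges++;
        purged = true;
    }
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = s;
    q->count++;
    return purged;
}

/* Consumer side (Ethernet sender): returns false if nothing is queued. */
static bool queue_pop(sample_queue_t *q, sample_t *out)
{
    if (q->count == 0) {
        return false;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}
```

The producer never waits for the consumer, which mirrors the stated priority: the R5 timing stays deterministic, and a slow sender costs old samples rather than ISR time.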

At the same time, a data path in the opposite direction (GUI -> A53 -> R5) is needed to enable and disable the system and to set references. On the A53, incoming TCP control messages are buffered, and one command is forwarded to the R5 per ISR call. If no command is pending, the A53 returns an explicit no-op message, so the R5 processes each command only once.
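The "one command per ISR call, otherwise no-op" rule can be sketched as a small FIFO. This is an illustration under assumptions: `cmd_queue_t`, `cmd_enqueue`, `cmd_next_response`, the queue depth, and in particular the value of `CMD_ID_NOOP` are hypothetical and not taken from the A53 sources. Only the layout of `struct APU_to_RPU_t` is copied from the shared header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_CAPACITY 8u /* hypothetical queue depth */
#define CMD_ID_NOOP  0u /* hypothetical id meaning "nothing to do" */

/* Same layout as struct APU_to_RPU_t in APU_RPU_shared.h. */
struct APU_to_RPU_t {
    uint32_t id;
    float    value;
};

typedef struct {
    struct APU_to_RPU_t items[CMD_CAPACITY];
    size_t head;  /* index of the oldest pending command */
    size_t count; /* number of pending commands */
} cmd_queue_t;

/* Called when a control message arrives over TCP.
 * Returns false if the queue is full and the command was not stored. */
static bool cmd_enqueue(cmd_queue_t *q, uint32_t id, float value)
{
    if (q->count == CMD_CAPACITY) {
        return false;
    }
    size_t tail = (q->head + q->count) % CMD_CAPACITY;
    q->items[tail].id = id;
    q->items[tail].value = value;
    q->count++;
    return true;
}

/* Called once per IPI from the R5: hands out exactly one message.
 * A pending command is removed from the queue, so it is delivered once;
 * an empty queue yields the no-op message. */
static struct APU_to_RPU_t cmd_next_response(cmd_queue_t *q)
{
    struct APU_to_RPU_t msg = { CMD_ID_NOOP, 0.0f };
    if (q->count > 0) {
        msg = q->items[q->head];
        q->head = (q->head + 1) % CMD_CAPACITY;
        q->count--;
    }
    return msg;
}
```

Because the response is overwritten with a no-op as soon as the queue is empty, a stale command in the response buffer cannot be executed again in the next ISR call.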

graph LR
  subgraph ISR_Control
    JavaScope_update --> write["Write sample, slow data, and status to OCM bank 3"]
    write --> flush["Flush cache"]
    flush -->|"Trigger IPI to A53"| APU_IPI_ISR
    APU_IPI_ISR -->|"Read one A53 response message"| ipc_Control_func
  end

Fig. 64 R5-side view of the data mover#

Shared header file#

The shared header file APU_RPU_shared.h, located at vitis/software/shared, is included in both software projects (R5 and A53). The shared memory can reside in OCM or DDR; here, the OCM (on-chip memory) is used. The OCM start addresses are hard-coded in the software because they are fixed by the UltraScale+ memory map.

//APU_RPU_shared.h
#pragma once
// OCM Bank Addresses
// See UG1085 v2.4 table 18-1 OCM Mapping Summary (https://docs.amd.com/r/en-US/ug1085-zynq-ultrascale-trm)
#define MEM_SHARED_START_OCM_BANK_1_RPU_TO_APU 	0xFFFD0000 // bank 1 is for r5->a53 user data
#define MEM_SHARED_START_OCM_BANK_2_APU_TO_RPU 	0xFFFE0000 // bank 2 is for a53->r5 user data
#define MEM_SHARED_START_OCM_BANK_3_JAVASCOPE 	0xFFFF0000 // bank 3 is for r5->a53 javascope
#define JS_CHANNELS 		20
#define JAVASCOPE_DATA_SIZE sizeof(struct javascope_data_t)

// Experimental feature - read docs before use
#define USE_A53_AS_ACCELERATOR_FOR_R5_ISR		FALSE

struct javascope_data_t
{
	uint32_t    status;
	float	    slowDataContent;
	uint32_t    slowDataID;
	float       scope_ch[JS_CHANNELS];
};

struct APU_to_RPU_t
{
	uint32_t id;
	float value;
};

struct APU_to_RPU_user_data_t
{
	// create variables that you want to share from A53 to R5
	uint32_t slowDataCounter;
};

struct RPU_to_APU_user_data_t
{
	// create variables that you want to share from R5 to A53
	uint32_t slowDataCounter;
};


// Used to communicate the UltraZohm revision between the RPU (default define) and the APU (value read from the EEPROM, if present)
#include "xil_cache.h"

static inline uint32_t read_rpu_version(void){
    uint32_t volatile *rpu_version = (uint32_t *)((uint8_t*)MEM_SHARED_START_OCM_BANK_3_JAVASCOPE + 64U);
    Xil_DCacheInvalidateRange((uintptr_t)rpu_version, sizeof(uint32_t));
    return *rpu_version;
}

static inline uint32_t read_apu_version(void){
    uint32_t volatile *apu_version = (uint32_t *)MEM_SHARED_START_OCM_BANK_3_JAVASCOPE;
    Xil_DCacheInvalidateRange((uintptr_t)apu_version, sizeof(uint32_t));
    return *apu_version;
}

static inline void write_apu_version(uint32_t version){
    uint32_t volatile *apu_version = (uint32_t *)MEM_SHARED_START_OCM_BANK_3_JAVASCOPE;
    *apu_version=version;
    Xil_DCacheFlushRange((uintptr_t)apu_version, sizeof(uint32_t));
}

static inline void write_rpu_version(uint32_t version){
    uint32_t volatile *rpu_version = (uint32_t *)((uint8_t*)MEM_SHARED_START_OCM_BANK_3_JAVASCOPE + 64U);
    *rpu_version=version;
    Xil_DCacheFlushRange((uintptr_t)rpu_version, sizeof(uint32_t));
}


The header defines the following:

  • struct javascope_data_t which is written by the R5 and read by the A53

  • struct APU_to_RPU_t for the command path from the A53 back to the R5

  • the number of float channels JS_CHANNELS inside javascope_data_t

  • the OCM start addresses for the three shared-memory regions used by the R5 and A53
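Since both cores interpret the same bytes, it can help to pin the struct layout at compile time. The following sketch repeats struct javascope_data_t from the header and adds C11 static assertions. The assertions and the helper `javascope_payload_rate` are not part of APU_RPU_shared.h; they are assumptions added for illustration. The 64 KiB bank size is derived from the spacing of the three start addresses.

```c
#include <stddef.h>
#include <stdint.h>

#define JS_CHANNELS 20

struct javascope_data_t {
    uint32_t status;
    float    slowDataContent;
    uint32_t slowDataID;
    float    scope_ch[JS_CHANNELS];
};

/* Both cores must agree on the layout: three 32-bit fields followed by
 * JS_CHANNELS 32-bit floats, without padding. */
_Static_assert(sizeof(float) == 4, "float must be 32 bit wide");
_Static_assert(offsetof(struct javascope_data_t, scope_ch) == 12,
               "unexpected padding before scope_ch");
_Static_assert(sizeof(struct javascope_data_t) == 12 + 4 * JS_CHANNELS,
               "unexpected struct size");
/* The bank start addresses are 0x10000 bytes (64 KiB) apart. */
_Static_assert(sizeof(struct javascope_data_t) <= 0x10000,
               "struct does not fit into one OCM bank");

/* Hypothetical helper: raw payload in bytes per second that the R5 writes
 * to OCM bank 3 for a given ISR frequency. */
static uint64_t javascope_payload_rate(uint32_t isr_frequency_hz)
{
    return (uint64_t)isr_frequency_hz * sizeof(struct javascope_data_t);
}
```

With JS_CHANNELS = 20, one sample occupies 92 bytes. At the maximum sampling frequency of 100 kHz, this corresponds to 9.2 MB/s of raw payload before any framing added on the A53 side.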

Current implementation on R5#

Inside javascope.c, JavaScope_update writes the current sample to OCM bank 3 at MEM_SHARED_START_OCM_BANK_3_JAVASCOPE. The function also reads the response message from the A53 and forwards it to ipc_Control_func.

The R5-side sequence is:

  • write the selected channels into javascope_data->scope_ch[]

  • write slowDataID, slowDataContent, and status into the same shared-memory struct

  • flush the cache for OCM bank 3 so the A53 sees the updated sample

  • trigger the IPI to the A53

  • read exactly one APU_to_RPU_t response message from the A53 and pass it to ipc_Control_func

If USE_A53_AS_ACCELERATOR_FOR_R5_ISR is enabled, the R5 additionally exchanges user data with the A53 via OCM banks 1 and 2 and polls for the IPI acknowledge before continuing. If the accelerator mode is disabled, the R5 still reads the A53 response message but does not block on XIpiPsu_PollForAck.

#include "xil_cache.h"
#include "xipipsu.h"
#include "APU_RPU_shared.h"

void JavaScope_update(DS_Data* data){
   // Sample struct located at the start of OCM bank 3
   struct javascope_data_t volatile * const javascope_data =
      (struct javascope_data_t volatile *)MEM_SHARED_START_OCM_BANK_3_JAVASCOPE;
   struct APU_to_RPU_t Received_Data_from_A53 = {0};
   int status; // return code of the IPI driver calls (error handling omitted here)

   // Copy the selected channels, the slow data, and the status into shared memory
   for(int j=0; j<JS_CHANNELS; j++){
      javascope_data->scope_ch[j] = *js_ch_selected[j];
   }
   javascope_data->slowDataID      = js_cnt_slowData;
   javascope_data->slowDataContent = *js_slowDataArray[js_cnt_slowData];
   javascope_data->status          = js_status_BareToRTOS;

   // Make the sample visible to the A53, then notify it
   Xil_DCacheFlushRange(MEM_SHARED_START_OCM_BANK_3_JAVASCOPE, JAVASCOPE_DATA_SIZE);
   status = XIpiPsu_TriggerIpi(&IPI_instance, XPAR_XIPIPS_TARGET_PSU_CORTEXA53_0_CH0_MASK);

   // Read exactly one response message (length given in 32-bit words)
   status = XIpiPsu_ReadMessage(&IPI_instance,
                                XPAR_XIPIPS_TARGET_PSU_CORTEXA53_0_CH0_MASK,
                                (u32*)(&Received_Data_from_A53),
                                sizeof(Received_Data_from_A53)/sizeof(float),
                                XIPIPSU_BUF_TYPE_RESP);
   ipc_Control_func(Received_Data_from_A53.id, Received_Data_from_A53.value, data);
}
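The response message is transferred as a sequence of 32-bit words and then interpreted as struct APU_to_RPU_t, which is why the length argument is the struct size expressed in words. The sketch below shows this mapping on plain memory, without the IPI driver. `decode_response`, `float_bits`, and `APU_TO_RPU_WORDS` are hypothetical names introduced for this example, and the use of memcpy instead of a pointer cast is a choice of the sketch, not of the original code.

```c
#include <stdint.h>
#include <string.h>

/* Same layout as struct APU_to_RPU_t in APU_RPU_shared.h. */
struct APU_to_RPU_t {
    uint32_t id;
    float    value;
};

/* Number of 32-bit words that make up one response message. */
#define APU_TO_RPU_WORDS (sizeof(struct APU_to_RPU_t) / sizeof(uint32_t))

/* Rebuild the typed message from raw 32-bit words.
 * memcpy keeps the bit pattern and avoids aliasing problems. */
static struct APU_to_RPU_t decode_response(const uint32_t *words)
{
    struct APU_to_RPU_t msg;
    memcpy(&msg, words, sizeof msg);
    return msg;
}

/* Helper for the example: raw bit pattern of a float. */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

For example, the two words `{204, float_bits(3.0f)}` decode to a message with id 204 and value 3.0. Both cores have to use the same endianness and float format for this to hold.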

Selection of transmitted channels#

In javascope.h, the enumeration enum JS_OberservableData assigns a unique name to each observable signal.

enum JS_OberservableData {
   JSO_ZEROVALUE=0,
   JSO_ia,
   JSO_ib,
   JSO_Speed_rpm,
   //...//
   JSO_ENDMARKER
};

In javascope.c, the function JavaScope_initialize(DS_Data* data) fills the array float *js_ch_observable[] with pointers to all observable data.

float *js_ch_observable[JSO_ENDMARKER];

int JavaScope_initialize(DS_Data* data)
{
   js_ch_observable[JSO_Speed_rpm]  = &data->av.mechanicalRotorSpeed;
   js_ch_observable[JSO_ia]         = &data->av.I_U;
   js_ch_observable[JSO_ib]         = &data->av.I_V;
   // ... //
}

In ipc_ARM.c, the selected channels are written to js_ch_selected. The selection is decided in the JavaScope application.

extern float *js_ch_observable[JSO_ENDMARKER];
extern float *js_ch_selected[JS_CHANNELS];

void ipc_Control_func(uint16_t msgId, uint16_t value, DS_Data* data)
{
   if (msgId == 1) {}
   else if (msgId == 204) { // SELECT_DATA_CH1_bits
      // value is unsigned, so only the upper bound has to be checked
      if ( value < JSO_ENDMARKER ) {
         js_ch_selected[0] = js_ch_observable[value];
      }
   }
   else if (msgId == 205) { // SELECT_DATA_CH2_bits
      if ( value < JSO_ENDMARKER ) {
         js_ch_selected[1] = js_ch_observable[value];
      }
   }
   // ... same for all other channels  //
}

Here, value is an entry of enum JS_OberservableData, which is also known to the JavaScope application. For instructions on adding new observable variables, see JavaScope Customization.
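The selection mechanism is a table of pointers: js_ch_observable maps each enum entry to a variable, and js_ch_selected holds the pointers that are dereferenced in every ISR call. The following host-side sketch shows this under assumptions: `demo_signals_t`, `demo_initialize`, `demo_select_channel`, and the zero variable `js_zero` are hypothetical stand-ins, and the enum is shortened to the entries shown above.

```c
#include <stddef.h>
#include <stdint.h>

#define JS_CHANNELS 20

enum JS_OberservableData {
    JSO_ZEROVALUE = 0,
    JSO_ia,
    JSO_ib,
    JSO_Speed_rpm,
    JSO_ENDMARKER
};

/* Hypothetical stand-in for the signals stored in DS_Data. */
typedef struct {
    float I_U;
    float I_V;
    float mechanicalRotorSpeed;
} demo_signals_t;

static float  js_zero = 0.0f; /* assumed target of JSO_ZEROVALUE */
static float *js_ch_observable[JSO_ENDMARKER];
static float *js_ch_selected[JS_CHANNELS];

/* Fill the table of observable signals and point every channel
 * to a valid variable so that dereferencing is always safe. */
static void demo_initialize(demo_signals_t *sig)
{
    js_ch_observable[JSO_ZEROVALUE] = &js_zero;
    js_ch_observable[JSO_ia]        = &sig->I_U;
    js_ch_observable[JSO_ib]        = &sig->I_V;
    js_ch_observable[JSO_Speed_rpm] = &sig->mechanicalRotorSpeed;
    for (size_t ch = 0; ch < JS_CHANNELS; ch++) {
        js_ch_selected[ch] = &js_zero;
    }
}

/* Route one observable to one scope channel.
 * Returns 0 on success and -1 if either index is out of range. */
static int demo_select_channel(size_t channel, uint32_t observable)
{
    if (channel >= JS_CHANNELS || observable >= JSO_ENDMARKER) {
        return -1;
    }
    js_ch_selected[channel] = js_ch_observable[observable];
    return 0;
}
```

Because only pointers are stored, a channel always shows the current value of the selected variable; switching the selection costs a single pointer assignment and no copying inside the ISR.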

See also#