The following conventions are used in this writeup:
- All 'C' function names are in italics.
- All 'Assembly' function names and labels are mentioned in bold italics.
- No file names will be mentioned here (with the exception of assembly files); use ctags.
- All details are for a uniprocessor system; nothing related to SMP is covered here.
-----------------------------
_create_page_tables:
- ldr r5, [r8, #MACHINFO_PHYSRAM] @ physram
- pgtbl r4, r5 @ page table address
- mov r0, r4
- mov r3, #0
- add r6, r0, #0x4000
- 1: str r3, [r0], #4
- str r3, [r0], #4
- str r3, [r0], #4
- str r3, [r0], #4
- teq r0, r6
- bne 1b
- ldr r7, [r10, #PROCINFO_MMUFLAGS] @ mmuflags
- mov r6, pc, lsr #20 @ start of kernel section
- orr r3, r7, r6, lsl #20 @ flags + kernel base
- str r3, [r4, r6, lsl #2] @ identity mapping
- add r0, r4, #(TEXTADDR & 0xff000000) >> 18 @ start of kernel
- str r3, [r0, #(TEXTADDR & 0x00f00000) >> 18]!
- add r3, r3, #1 << 20
- str r3, [r0, #4]! @ KERNEL + 1MB
- add r3, r3, #1 << 20
- str r3, [r0, #4]! @ KERNEL + 2MB
- add r3, r3, #1 << 20
- str r3, [r0, #4] @ KERNEL + 3MB
-----------------------------
In this function we first find out the physical address where our RAM starts, and locate the address where we will store our initial MMU tables (lines #1 and #2). These initial MMU tables are created 16KB below the kernel entry point. Lines #12-#15 create an identity mapping of 1MB starting from the kernel entry point (the physical address of kernel entry, i.e. stext). We do not create second-level page tables for these mappings; instead we specify in the first-level descriptor that these are section mappings (each section mapping covers 1MB). Similarly, we create mappings for 4 more sections starting at TEXTADDR, the virtual address of the kernel entry point (these 4 are non-identity mappings, again 1MB each). The initial memory map thus consists of the one identity-mapped section plus these four kernel sections.
After the initial page tables are set up, the next step is to enable the MMU. This code is tricky and does a lot of deep magic.
----------------------------
- ldr r13, __switch_data @ address to jump to after mmu has been enabled
- adr lr, __enable_mmu @ return (PIC) address
- add pc, r10, #PROCINFO_INITFUNC
.type __switch_data, %object
__switch_data:
- .long __mmap_switched
- .long __data_loc @ r4
- .long __data_start @ r5
- .long __bss_start @ r6
- .long _end @ r7
- .long processor_id @ r4
- .long __machine_arch_type @ r5
- .long cr_alignment @ r6
- .long init_thread_union+8192 @ sp
Line #1 puts the virtual address of __mmap_switched in r13; after enabling the MMU the kernel will jump to the address in r13. The virtual address used here is not from the identity mapping; it is PAGE_OFFSET + the physical address of __mmap_switched. Since in __mmap_switched we start referring to virtual addresses of variables and functions, the code from __mmap_switched onwards is no longer position independent.
At lines #2-#3, we put the position independent address of __enable_mmu in lr ('adr' is a pseudo-instruction that translates to PC-relative addressing, which is why it is position independent) and jump to offset PROCINFO_INITFUNC (12 bytes) in the __v6_proc_info structure (arch/arm/mm/proc-v6.S). At PROCINFO_INITFUNC in __v6_proc_info we have a branch to __v6_setup, which does the following setup for enabling the MMU:
- Clean and invalidate the D-cache and I-cache, and invalidate the TLB.
- Prepare the control register 1 (C1) value that needs to be written when enabling the MMU, and return that value.
--------------------------------------
__turn_mmu_on:
- mov r0, r0
- mcr p15, 0, r0, c1, c0, 0 @ write control reg
- mrc p15, 0, r3, c0, c0, 0 @ read id reg
- mov r3, r3
- mov r3, r3
- mov pc, r13
__turn_mmu_on just writes 'r0' to C1 to enable the MMU. Lines #4 and #5 are nops to make sure that the pipeline does not contain an invalid address access when C1 is written. At line #6 'r13' is moved into 'pc' and we enter __mmap_switched. Now the MMU is on: every address is virtual, no physical addresses anymore. But the final kernel page tables are still not set up (they will be set up by paging_init, and the mappings created by __create_page_tables will be discarded); we are still running with the 4MB mapping that __create_page_tables set up for us. __mmap_switched copies the data segment if required, clears the BSS and calls start_kernel.
The page table mappings that we discussed above make sure that the position dependent code of kernel startup runs peacefully; these mappings are overwritten at a later stage by a function called paging_init() (start_kernel() -> setup_arch() -> paging_init()).
paging_init() populates the master L1 page table (init_mm) with linear mappings of the complete SDRAM and the SOC specific address space (the SOC specific memory mappings are created by the mdesc->map_io() function; this function pointer is initialized by SOC specific code in arch/arm/mach-*). So in the master L1 page table (init_mm) we have mappings which map virtual addresses in the range PAGE_OFFSET to (PAGE_OFFSET + SDRAM size) onto the physical range SDRAM start to (SDRAM start + SDRAM size). In addition we have the SOC specific mappings created by mdesc->map_io().
One more point worth noting here is that whenever a new process is created, a new L1 page table is allocated for it, and the kernel mappings (SDRAM mapping, SOC specific mappings) are copied into it from the master L1 page table (init_mm). User space mappings are private to each process, so nothing needs to be copied for them.
Handling of mappings for the VMALLOC region is a bit tricky, because vmalloc virtual addresses are allocated when a process calls vmalloc(). So processes created before the process which called vmalloc() will have no mapping for the newly vmalloc'ed region in their L1 page tables. This is taken care of in a simple way: the mappings for the vmalloc'ed region are updated only in the master L1 page table (init_mm), and when a process whose page tables do not have the newly created mapping accesses the vmalloc'ed region, a page fault is generated. The kernel handles page faults in the VMALLOC region specially, by copying the mappings for the newly vmalloc'ed area into the page tables of the process which generated the fault.
1. Linux assumes that it is dealing with 3-level page tables, whereas ARM has 2-level page tables. To handle this, the ARM arch include files define __pmd as an identity macro (include/asm-arm/page.h):
-------------------------------------------------
#define __pmd(x) ((pmd_t) { (x) } )
-------------------------------------------------
So effectively, for ARM, Linux is told that the pmd has just one entry, effectively bypassing the pmd.
Also it tells Linux that each PTE table has 512 entries (whereas an ARM hardware PTE table has 256 entries). This means that the PTE table exposed to Linux is actually 2 ARM PTE tables arranged contiguously in memory. After these 2 ARM PTE tables (let's say h/w PTE table 1 and h/w PTE table 2), 2 more PTE tables (256 entries each, say Linux PTE table 1 and 2) are allocated in memory contiguous to ARM hardware PTE table 2. Linux PTE tables 1 and 2 contain x86-style PTE flags corresponding to the entries in ARM PTE tables 1 and 2. So whenever Linux needs x86-style PTE flags it uses the entries in Linux PTE tables 1 and 2. ARM never looks into Linux PTE tables 1 and 2 during a hardware page table walk; it uses only the h/w PTE tables as mentioned above. Refer to include/asm-arm/pgtable.h (lines 20-76) for details of how ARM h/w PTE tables 1 and 2 and Linux PTE tables 1 and 2 are organized in memory.
So here we conclude the architecture specific page table related stuff for ARM.