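In the Linux 4.12 kernel, Android changed the API of the ION driver, and some of the original ioctl commands no longer exist.
Google's ION is, in my view, fairly large. The system heap is one way of allocating memory; there are others, such as CMA-based allocation, and each allocation method calls different Linux interfaces. This article only records my own understanding of the system heap. The ION code lives under kernel\msm-4.14\drivers\staging\android\ion.
Whichever heap ION ends up using, the allocated buffer is wrapped in a Linux dma-buf. dma-buf is a Linux framework. I have not studied its code closely, but judging from how ION uses it, every buffer that ION allocates is stored in a dma-buf, together with a set of operations (ops) that Google implements for that buffer. Using the buffer therefore means calling the dma-buf ops indirectly, and those ops in turn call the ops bound to the heap. The system heap, for example, binds functions such as allocate, map_user, free and shrink when it is created, and the dma-buf ops eventually call into them.
Google's implementation of the dma-buf ops can be seen in ion.c: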
static const struct dma_buf_ops dma_buf_ops = {
.map_dma_buf = ion_map_dma_buf,
.unmap_dma_buf = ion_unmap_dma_buf,
.mmap = ion_mmap,
.release = ion_dma_buf_release,
.attach = ion_dma_buf_attach,
.detach = ion_dma_buf_detatch,
.begin_cpu_access = ion_dma_buf_begin_cpu_access,
.end_cpu_access = ion_dma_buf_end_cpu_access,
.begin_cpu_access_umapped = ion_dma_buf_begin_cpu_access_umapped,
.end_cpu_access_umapped = ion_dma_buf_end_cpu_access_umapped,
.begin_cpu_access_partial = ion_dma_buf_begin_cpu_access_partial,
.end_cpu_access_partial = ion_dma_buf_end_cpu_access_partial,
.map_atomic = ion_dma_buf_kmap,
.unmap_atomic = ion_dma_buf_kunmap,
.map = ion_dma_buf_kmap,
.unmap = ion_dma_buf_kunmap,
.vmap = ion_dma_buf_vmap,
.vunmap = ion_dma_buf_vunmap,
.get_flags = ion_dma_buf_get_flags,
};
ion.h defines the operations that a heap has to provide:
/**
* struct ion_heap_ops - ops to operate on a given heap
* @allocate: allocate memory
* @free: free memory
* @map_kernel map memory to the kernel
* @unmap_kernel unmap memory to the kernel
* @map_user map memory to userspace
*
* allocate, phys, and map_user return 0 on success, -errno on error.
* map_dma and map_kernel return pointer on success, ERR_PTR on
* error. @free will be called with ION_PRIV_FLAG_SHRINKER_FREE set in
* the buffer's private_flags when called from a shrinker. In that
* case, the pages being free'd must be truly free'd back to the
* system, not put in a page pool or otherwise cached.
*/
struct ion_heap_ops {
int (*allocate)(struct ion_heap *heap,
struct ion_buffer *buffer, unsigned long len,
unsigned long flags);
void (*free)(struct ion_buffer *buffer);
void * (*map_kernel)(struct ion_heap *heap, struct ion_buffer *buffer);
void (*unmap_kernel)(struct ion_heap *heap, struct ion_buffer *buffer);
int (*map_user)(struct ion_heap *mapper, struct ion_buffer *buffer,
struct vm_area_struct *vma);
int (*shrink)(struct ion_heap *heap, gfp_t gfp_mask, int nr_to_scan);
};
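The two-level dispatch described above (a buffer-level ops table that forwards to the ops bound to the heap) can be sketched in user space. This is only an illustration: every `toy_*` name is hypothetical and none of these types exist in the kernel.

```c
#include <assert.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
struct toy_buffer;

/* Heap layer: plays the role of struct ion_heap_ops. */
struct toy_heap_ops {
    int (*map_user)(struct toy_buffer *buf);
};

struct toy_heap {
    const char *name;
    const struct toy_heap_ops *ops;
    int map_calls;              /* how often the heap-level hook ran */
};

struct toy_buffer {
    struct toy_heap *heap;      /* back pointer, like ion_buffer->heap */
};

/* What a "system heap" would bind when it is created. */
static int toy_system_map_user(struct toy_buffer *buf)
{
    buf->heap->map_calls++;
    return 0;
}

static const struct toy_heap_ops toy_system_ops = {
    .map_user = toy_system_map_user,
};

/* Buffer layer: plays the role of struct dma_buf_ops. */
struct toy_dmabuf_ops {
    int (*mmap)(struct toy_buffer *buf);
};

/* Analogous to ion_mmap: it does no mapping itself, it forwards. */
static int toy_mmap(struct toy_buffer *buf)
{
    assert(buf && buf->heap && buf->heap->ops);
    if (!buf->heap->ops->map_user)
        return -1;              /* this heap cannot map to user space */
    return buf->heap->ops->map_user(buf);
}

static const struct toy_dmabuf_ops toy_dmabuf_ops = {
    .mmap = toy_mmap,
};
```

A caller only ever sees `toy_dmabuf_ops.mmap(&buf)`; which heap function runs is decided by the table that was bound to the heap.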
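Before getting into how ION allocates memory, a few concepts are worth knowing.
struct sg_table: the Linux structure that holds a scatterlist of physical pages. For a detailed explanation, see the 蝸窩科技 (wowotech) article "Linux kernel scatterlist API介紹". Put simply, the structure stores the scatterlist of physical pages. The system heap does not hand out one physically contiguous region; the pages may be discontiguous as long as the virtual addresses are contiguous. For example, if the camera requests a 12 MB buffer, what comes out of the buddy system may be many 64 KB blocks. Each 64 KB block is contiguous internally, but the blocks are not contiguous with one another.
Buddy system: there is plenty of material on this online, and the concept is fairly simple. The buddy system manages free physical memory with one free list per order, and an allocation of order n returns 2^n physically contiguous pages.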
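The relation between an order and a size can be made concrete with a small user-space sketch. It assumes 4 KiB pages; `toy_pages_in_order` and `toy_order_for_size` are hypothetical helpers, although the second behaves like the kernel's `get_order()`.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL    /* assumption: 4 KiB pages */

/* Number of pages in a block of the given order: 2^order. */
static unsigned long toy_pages_in_order(unsigned int order)
{
    return 1UL << order;
}

/* Smallest order whose block is large enough to hold `size` bytes. */
static unsigned int toy_order_for_size(size_t size)
{
    unsigned int order = 0;

    assert(size > 0);
    while ((TOY_PAGE_SIZE << order) < size)
        order++;
    return order;
}
```

An order-4 block is 16 pages (64 KiB). A single 12 MiB request would need order 12 (16 MiB), which is one reason the system heap prefers to build large buffers out of many small blocks.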
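File descriptor (fd): after ION allocates memory, what it finally returns is an fd. The fd is passed to other processes through binder, and each process then maps it into its own virtual address space. An fd is only valid inside one process, so it is handed to another process through Android's binder mechanism. In short, binder first allocates an fd in the target process and then binds that fd to the same kernel struct file that backs the fd in the current process.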
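Binder itself is hard to demonstrate in a few lines, but the same idea (the kernel installs a new fd number that points at the same struct file) can be shown with the standard UNIX-socket `SCM_RIGHTS` mechanism. This is an analogy, not binder code; `toy_send_fd` and `toy_recv_fd` are hypothetical helper names.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one descriptor over a connected UNIX socket. Returns 0 on success. */
static int toy_send_fd(int sock, int fd)
{
    char byte = 'F';            /* one data byte travels with the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;   /* forces correct alignment */
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor. Returns the new fd number, or -1 on failure. */
static int toy_recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The number returned by `toy_recv_fd` generally differs from the number that was sent, yet both refer to the same open file, which is the property the ION fd relies on when it crosses a process boundary.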
1. Memory allocation
ION allocates memory through an ioctl call that is issued after the device has been opened:
case ION_IOC_ALLOC:
{
int fd;
fd = ion_alloc_fd(data.allocation.len,
data.allocation.heap_id_mask,
data.allocation.flags);
if (fd < 0)
return fd;
data.allocation.fd = fd;
break;
}
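As shown, ion_alloc_fd is called to produce an fd. It takes three parameters. The first is the length of the buffer to allocate. The second selects the heap: ION has many heap types, and this article only covers the system heap (the code of the other heaps looks harder to read). The third is a set of flags that carry further properties of the allocation, for example whether the memory is for the camera or whether secure memory is required. ion_alloc_fd is implemented as follows: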
int ion_alloc_fd(size_t len, unsigned int heap_id_mask, unsigned int flags)
{
int fd;
struct dma_buf *dmabuf;
dmabuf = ion_alloc_dmabuf(len, heap_id_mask, flags);
if (IS_ERR(dmabuf)) {
return PTR_ERR(dmabuf);
}
fd = dma_buf_fd(dmabuf, O_CLOEXEC);
if (fd < 0)
dma_buf_put(dmabuf);
return fd;
}
It first creates a dma_buf and then converts that dma-buf into an fd. struct dma_buf is defined in kernel\msm-4.14\include\linux\dma-buf.h, and the kernel's own comment explains each field:
/**
* struct dma_buf - shared buffer object
* @size: size of the buffer
* @file: file pointer used for sharing buffers across, and for refcounting.
* @attachments: list of dma_buf_attachment that denotes all devices attached.
* @ops: dma_buf_ops associated with this buffer object.
* @lock: used internally to serialize list manipulation, attach/detach and vmap/unmap
* @vmapping_counter: used internally to refcnt the vmaps
* @vmap_ptr: the current vmap ptr if vmapping_counter > 0
* @exp_name: name of the exporter; useful for debugging.
* @name: unique name for the buffer
* @ktime: time (in jiffies) at which the buffer was born
* @owner: pointer to exporter module; used for refcounting when exporter is a
* kernel module.
* @list_node: node for dma_buf accounting and debugging.
* @priv: exporter specific private data for this buffer object.
* @resv: reservation object linked to this dma-buf
* @poll: for userspace poll support
* @cb_excl: for userspace poll support
* @cb_shared: for userspace poll support
*
* This represents a shared buffer, created by calling dma_buf_export(). The
* userspace representation is a normal file descriptor, which can be created by
* calling dma_buf_fd().
*
* Shared dma buffers are reference counted using dma_buf_put() and
* get_dma_buf().
*
* Device DMA access is handled by the separate &struct dma_buf_attachment.
*/
struct dma_buf {
size_t size;
struct file *file;
struct list_head attachments;
const struct dma_buf_ops *ops;
struct mutex lock;
unsigned vmapping_counter;
void *vmap_ptr;
const char *exp_name;
char *name;
ktime_t ktime;
struct module *owner;
struct list_head list_node;
void *priv;
struct reservation_object *resv;
/* poll support */
wait_queue_head_t poll;
struct dma_buf_poll_cb_t {
struct dma_fence_cb cb;
wait_queue_head_t *poll;
unsigned long active;
} cb_excl, cb_shared;
struct list_head refs;
};
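The struct file member is the important one, because it is what the future fd refers to: an fd is really a link to a struct file. Several fds can refer to the same struct file, which is also why the buffer behind an fd can be mmap-ed at several virtual addresses.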
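That several fds can share one struct file is easy to observe from user space: a descriptor created with `dup()` shares the file offset of the original. The sketch below uses a temporary file rather than a dma-buf, and `toy_offset_seen_by_dup` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Writes `len` bytes through one descriptor and returns the file offset
 * observed through a duplicate of it, or -1 on failure.
 */
static off_t toy_offset_seen_by_dup(const char *data, size_t len)
{
    FILE *tmp = tmpfile();
    off_t seen = -1;
    int fd, fd2;

    assert(data);
    if (!tmp)
        return -1;
    fd = fileno(tmp);
    fd2 = dup(fd);              /* second fd, same underlying open file */
    if (fd2 >= 0) {
        if (write(fd, data, len) == (ssize_t)len)
            seen = lseek(fd2, 0, SEEK_CUR);
        close(fd2);
    }
    fclose(tmp);
    return seen;
}
```

The write goes through `fd`, but the offset read through `fd2` has moved as well, because both numbers index the same kernel object.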
ion_alloc_dmabuf is in kernel\msm-4.14\drivers\staging\android\ion\ion.c:
struct dma_buf *ion_alloc_dmabuf(size_t len, unsigned int heap_id_mask,
unsigned int flags)
{
struct ion_device *dev = internal_dev;
struct ion_buffer *buffer = NULL;
struct ion_heap *heap;
DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
struct dma_buf *dmabuf;
char task_comm[TASK_COMM_LEN];
pr_debug("%s: len %zu heap_id_mask %u flags %x\n", __func__,
len, heap_id_mask, flags);
/*
* traverse the list of heaps available in this system in priority
* order. If the heap type is supported by the client, and matches the
* request of the caller allocate from it. Repeat until allocate has
* succeeded or all heaps have been tried
*/
len = PAGE_ALIGN(len);
if (!len)
return ERR_PTR(-EINVAL);
down_read(&dev->lock);
plist_for_each_entry(heap, &dev->heaps, node) {
/* if the caller didn't specify this heap id */
if (!((1 << heap->id) & heap_id_mask))
continue;
buffer = ion_buffer_create(heap, dev, len, flags);
if (!IS_ERR(buffer) || PTR_ERR(buffer) == -EINTR)
break;
}
up_read(&dev->lock);
if (!buffer)
return ERR_PTR(-ENODEV);
if (IS_ERR(buffer))
return ERR_CAST(buffer);
get_task_comm(task_comm, current->group_leader);
exp_info.ops = &dma_buf_ops;
exp_info.size = buffer->size;
exp_info.flags = O_RDWR;
exp_info.priv = buffer;
exp_info.exp_name = kasprintf(GFP_KERNEL, "%s-%s-%d-%s", KBUILD_MODNAME,
heap->name, current->tgid, task_comm);
dmabuf = dma_buf_export(&exp_info);
if (IS_ERR(dmabuf)) {
_ion_buffer_destroy(buffer);
kfree(exp_info.exp_name);
}
return dmabuf;
}
The PAGE_ALIGN macro rounds the length up to a page boundary. Because pages are 4 KB, a 5 KB request becomes 8 KB. The counterpart is rounding down, which would turn 5 KB into 4 KB.
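The rounding can be reproduced in user space with the usual mask arithmetic. This is a sketch assuming 4 KiB pages; the `toy_*` names are hypothetical, and the kernel's real macros are `PAGE_ALIGN` and `PAGE_MASK`.

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096UL                    /* assumption: 4 KiB pages */
#define TOY_PAGE_MASK (~(TOY_PAGE_SIZE - 1))    /* clears the low 12 bits */

/* Round up to the next page boundary, like PAGE_ALIGN(). */
static unsigned long toy_page_align_up(unsigned long len)
{
    return (len + TOY_PAGE_SIZE - 1) & TOY_PAGE_MASK;
}

/* Round down to the previous page boundary. */
static unsigned long toy_page_align_down(unsigned long len)
{
    return len & TOY_PAGE_MASK;
}
```

A length of 0 stays 0 after rounding up, which is the case the `if (!len)` check in ion_alloc_dmabuf rejects with -EINVAL.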
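plist_for_each_entry walks all heaps in priority order, skips those that heap_id_mask does not select, and runs the buffer allocation function of each selected heap until one succeeds. From here on we assume the heap is the system heap.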
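The selection logic, one bit per heap id and first success wins, can be sketched as follows. The heap ids and the `can_alloc` field are made up for the illustration; a real heap would attempt an allocation instead of consulting a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap descriptor; `can_alloc` stands in for a real attempt. */
struct toy_heap {
    unsigned int id;
    const char *name;
    int can_alloc;
};

/*
 * Walk the heaps in list (priority) order and return the first one that is
 * selected by heap_id_mask and whose allocation succeeds, or NULL.
 */
static const struct toy_heap *toy_pick_heap(const struct toy_heap *heaps,
                                            size_t n,
                                            unsigned int heap_id_mask)
{
    size_t i;

    for (i = 0; i < n; i++) {
        assert(heaps[i].id < 32);
        /* the caller did not ask for this heap id */
        if (!((1U << heaps[i].id) & heap_id_mask))
            continue;
        if (heaps[i].can_alloc)
            return &heaps[i];
    }
    return NULL;
}
```

Because the mask can have several bits set, a caller may name a preferred heap and a fallback in one request; the order in which they are tried is the list order, not the bit order.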
To look at system heap information on a phone, enter adb shell, change to /sys/kernel/debug/ion/heaps
and run cat system:
uncached pool = 349003776 cached pool = 1063071744 secure pool = 0
pool total (uncached + cached + secure) = 1412075520
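The system heap has three kinds of pool (uncached, cached and secure). These are the pools Google set up to hold physical pages, and you can add your own pool as well.
Once the matching heap has been found, ion_buffer_create is called to create the ION buffer. struct ion_buffer is defined in kernel\msm-4.14\drivers\staging\android\ion\ion.h: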
/**
* struct ion_buffer - metadata for a particular buffer
* @ref: reference count
* @node: node in the ion_device buffers tree
* @dev: back pointer to the ion_device
* @heap: back pointer to the heap the buffer came from
* @flags: buffer specific flags
* @private_flags: internal buffer specific flags
* @size: size of the buffer
* @priv_virt: private data to the buffer representable as
* a void *
* @lock: protects the buffers cnt fields
* @kmap_cnt: number of times the buffer is mapped to the kernel
* @vaddr: the kernel mapping if kmap_cnt is not zero
* @sg_table: the sg table for the buffer if dmap_cnt is not zero
* @vmas: list of vma's mapping this buffer
*/
struct ion_buffer {
union {
struct rb_node node;
struct list_head list;
};
struct ion_device *dev;
struct ion_heap *heap;
unsigned long flags;
unsigned long private_flags;
size_t size;
void *priv_virt;
/* Protect ion buffer */
struct mutex lock;
int kmap_cnt;
void *vaddr;
struct sg_table *sg_table;
struct list_head attachments;
struct list_head vmas;
};
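The struct sg_table introduced earlier lives in the ion_buffer and holds the scatterlist of physical pages. ion_buffer_create itself is implemented as follows: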
/* this function should only be called while dev->lock is held */
static struct ion_buffer *ion_buffer_create(struct ion_heap *heap,
struct ion_device *dev,
unsigned long len,
unsigned long flags)
{
struct ion_buffer *buffer;
struct sg_table *table;
int ret;
buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
if (!buffer)
return ERR_PTR(-ENOMEM);
buffer->heap = heap;
buffer->flags = flags;
ret = heap->ops->allocate(heap, buffer, len, flags);
if (ret) {
if (!(heap->flags & ION_HEAP_FLAG_DEFER_FREE))
goto err2;
if (ret == -EINTR)
goto err2;
ion_heap_freelist_drain(heap, 0);
ret = heap->ops->allocate(heap, buffer, len, flags);
if (ret)
goto err2;
}
if (buffer->sg_table == NULL) {
WARN_ONCE(1, "This heap needs to set the sgtable");
ret = -EINVAL;
goto err1;
}
spin_lock(&heap->stat_lock);
heap->num_of_buffers++;
heap->num_of_alloc_bytes += len;
if (heap->num_of_alloc_bytes > heap->alloc_bytes_wm)
heap->alloc_bytes_wm = heap->num_of_alloc_bytes;
spin_unlock(&heap->stat_lock);
table = buffer->sg_table;
buffer->dev = dev;
buffer->size = len;
buffer->dev = dev;
buffer->size = len;
INIT_LIST_HEAD(&buffer->attachments);
INIT_LIST_HEAD(&buffer->vmas);
mutex_init(&buffer->lock);
if (IS_ENABLED(CONFIG_ION_FORCE_DMA_SYNC)) {
int i;
struct scatterlist *sg;
/*
* this will set up dma addresses for the sglist -- it is not
* technically correct as per the dma api -- a specific
* device isn't really taking ownership here. However, in
* practice on our systems the only dma_address space is
* physical addresses.
*/
for_each_sg(table->sgl, sg, table->nents, i) {
sg_dma_address(sg) = sg_phys(sg);
sg_dma_len(sg) = sg->length;
}
}
mutex_lock(&dev->buffer_lock);
ion_buffer_add(dev, buffer);
mutex_unlock(&dev->buffer_lock);
atomic_long_add(len, &heap->total_allocated);
return buffer;
err1:
heap->ops->free(buffer);
err2:
kfree(buffer);
return ERR_PTR(ret);
}
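The key step in this function is ret = heap->ops->allocate(heap, buffer, len, flags);, which calls the allocation function of the heap. The rest of the code is list bookkeeping and sg_table setup.
The system heap's allocate function is in kernel\msm-4.14\drivers\staging\android\ion\ion_system_heap.c: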
static struct ion_heap_ops system_heap_ops = {
.allocate = ion_system_heap_allocate,
.free = ion_system_heap_free,
.map_kernel = ion_heap_map_kernel,
.unmap_kernel = ion_heap_unmap_kernel,
.map_user = ion_heap_map_user,
.shrink = ion_system_heap_shrink,
};
allocate is implemented by ion_system_heap_allocate:
static int ion_system_heap_allocate(struct ion_heap *heap,
struct ion_buffer *buffer,
unsigned long size,
unsigned long flags)
{
struct ion_system_heap *sys_heap = container_of(heap,
struct ion_system_heap,
heap);
struct sg_table *table;
struct sg_table table_sync = {0};
struct scatterlist *sg;
struct scatterlist *sg_sync;
int ret = -ENOMEM;
struct list_head pages;
struct list_head pages_from_pool;
struct page_info *info, *tmp_info;
int i = 0;
unsigned int nents_sync = 0;
unsigned long size_remaining = PAGE_ALIGN(size);
unsigned int max_order = orders[0];
struct pages_mem data;
unsigned int sz;
int vmid = get_secure_vmid(buffer->flags);
if (size / PAGE_SIZE > totalram_pages / 2)
return -ENOMEM;
if (ion_heap_is_system_heap_type(buffer->heap->type) &&
is_secure_vmid_valid(vmid)) {
pr_info("%s: System heap doesn't support secure allocations\n",
__func__);
return -EINVAL;
}
data.size = 0;
INIT_LIST_HEAD(&pages);
INIT_LIST_HEAD(&pages_from_pool);
while (size_remaining > 0) {
if (is_secure_vmid_valid(vmid))
info = alloc_from_pool_preferred(
sys_heap, buffer, size_remaining,
max_order);
else
info = alloc_largest_available(
sys_heap, buffer, size_remaining,
max_order);
if (IS_ERR(info)) {
ret = PTR_ERR(info);
goto err;
}
sz = (1 << info->order) * PAGE_SIZE;
if (info->from_pool) {
list_add_tail(&info->list, &pages_from_pool);
} else {
list_add_tail(&info->list, &pages);
data.size += sz;
++nents_sync;
}
size_remaining -= sz;
max_order = info->order;
i++;
}
ret = ion_heap_alloc_pages_mem(&data);
if (ret)
goto err;
table = kzalloc(sizeof(*table), GFP_KERNEL);
if (!table) {
ret = -ENOMEM;
goto err_free_data_pages;
}
ret = sg_alloc_table(table, i, GFP_KERNEL);
if (ret)
goto err1;
if (nents_sync) {
ret = sg_alloc_table(&table_sync, nents_sync, GFP_KERNEL);
if (ret)
goto err_free_sg;
}
i = 0;
sg = table->sgl;
sg_sync = table_sync.sgl;
/*
* We now have two separate lists. One list contains pages from the
* pool and the other pages from buddy. We want to merge these
* together while preserving the ordering of the pages (higher order
* first).
*/
do {
info = list_first_entry_or_null(&pages, struct page_info, list);
tmp_info = list_first_entry_or_null(&pages_from_pool,
struct page_info, list);
if (info && tmp_info) {
if (info->order >= tmp_info->order) {
i = process_info(info, sg, sg_sync, &data, i);
sg_sync = sg_next(sg_sync);
} else {
i = process_info(tmp_info, sg, 0, 0, i);
}
} else if (info) {
i = process_info(info, sg, sg_sync, &data, i);
sg_sync = sg_next(sg_sync);
} else if (tmp_info) {
i = process_info(tmp_info, sg, 0, 0, i);
}
sg = sg_next(sg);
} while (sg);
if (nents_sync) {
if (vmid > 0) {
ret = ion_hyp_assign_sg(&table_sync, &vmid, 1, true);
if (ret)
goto err_free_sg2;
}
}
buffer->sg_table = table;
if (nents_sync)
sg_free_table(&table_sync);
ion_heap_free_pages_mem(&data);
return 0;
err_free_sg2:
/* We failed to zero buffers. Bypass pool */
buffer->private_flags |= ION_PRIV_FLAG_SHRINKER_FREE;
if (vmid > 0)
ion_hyp_unassign_sg(table, &vmid, 1, true, false);
for_each_sg(table->sgl, sg, table->nents, i)
free_buffer_page(sys_heap, buffer, sg_page(sg),
get_order(sg->length));
if (nents_sync)
sg_free_table(&table_sync);
err_free_sg:
sg_free_table(table);
err1:
kfree(table);
err_free_data_pages:
ion_heap_free_pages_mem(&data);
err:
list_for_each_entry_safe(info, tmp_info, &pages, list) {
free_buffer_page(sys_heap, buffer, info->page, info->order);
kfree(info);
}
list_for_each_entry_safe(info, tmp_info, &pages_from_pool, list) {
free_buffer_page(sys_heap, buffer, info->page, info->order);
kfree(info);
}
return ret;
}
ion_system_heap_allocate is fairly long. The important part, I think, is the while loop:
while (size_remaining > 0) {
if (is_secure_vmid_valid(vmid))
info = alloc_from_pool_preferred(
sys_heap, buffer, size_remaining,
max_order);
else
info = alloc_largest_available(
sys_heap, buffer, size_remaining,
max_order);
if (IS_ERR(info)) {
ret = PTR_ERR(info);
goto err;
}
sz = (1 << info->order) * PAGE_SIZE;
if (info->from_pool) {
list_add_tail(&info->list, &pages_from_pool);
} else {
list_add_tail(&info->list, &pages);
data.size += sz;
++nents_sync;
}
size_remaining -= sz;
max_order = info->order;
i++;
}
ret = ion_heap_alloc_pages_mem(&data);
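size_remaining is page aligned as well: unsigned long size_remaining = PAGE_ALIGN(size);
The while loop keeps taking physical pages from a pool or from the buddy system. After each allocation, size_remaining is reduced by the size just obtained, and this repeats until size_remaining reaches 0, which means the whole buffer has been gathered. When buffers are allocated for the first time the pools are empty, so the pages come from the buddy system through the Linux allocation interfaces.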
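The effect of the loop can be simulated without any real memory. The sketch below assumes 4 KiB pages, orders {4, 0}, and that every allocation succeeds at the largest eligible order; `toy_pick_order` and `toy_split_request` are hypothetical names that only mirror the size checks of the kernel loop.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL                    /* assumption: 4 KiB pages */
static const unsigned int toy_orders[] = {4, 0};
#define TOY_NUM_ORDERS (sizeof(toy_orders) / sizeof(toy_orders[0]))

/* Largest configured order that fits `remaining` and is <= max_order. */
static int toy_pick_order(unsigned long remaining, unsigned int max_order)
{
    size_t i;

    for (i = 0; i < TOY_NUM_ORDERS; i++) {
        if (remaining < (TOY_PAGE_SIZE << toy_orders[i]))
            continue;           /* block would be larger than what is left */
        if (max_order < toy_orders[i])
            continue;           /* never go back up to a larger order */
        return (int)toy_orders[i];
    }
    return -1;
}

/*
 * Split a request into 64 KiB and 4 KiB chunks. Returns the number of
 * chunks (the future scatterlist entries), or -1 on failure.
 */
static int toy_split_request(unsigned long size,
                             unsigned int *big, unsigned int *small)
{
    unsigned long remaining =
        (size + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1);
    unsigned int max_order = toy_orders[0];
    int chunks = 0;

    assert(big && small);
    *big = 0;
    *small = 0;
    while (remaining > 0) {
        int order = toy_pick_order(remaining, max_order);

        if (order < 0)
            return -1;
        if (order == 0)
            (*small)++;
        else
            (*big)++;
        remaining -= TOY_PAGE_SIZE << order;
        max_order = (unsigned int)order;
        chunks++;
    }
    return chunks;
}
```

Under these assumptions a 200 KiB request becomes three 64 KiB chunks followed by two 4 KiB chunks, and the 12 MB camera buffer mentioned earlier becomes 192 chunks of 64 KiB.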
Inside the loop, is_secure_vmid_valid selects which allocation function is called. alloc_from_pool_preferred mainly allocates from the secure pool:
static struct page_info *alloc_from_pool_preferred(
struct ion_system_heap *heap, struct ion_buffer *buffer,
unsigned long size, unsigned int max_order)
{
struct page *page;
struct page_info *info;
int i;
if (buffer->flags & ION_FLAG_POOL_FORCE_ALLOC)
goto force_alloc;
info = kmalloc(sizeof(*info), GFP_KERNEL);
if (!info)
return ERR_PTR(-ENOMEM);
for (i = 0; i < NUM_ORDERS; i++) {
if (size < order_to_size(orders[i]))
continue;
if (max_order < orders[i])
continue;
page = alloc_from_secure_pool_order(heap, buffer, orders[i]);
if (IS_ERR(page))
continue;
info->page = page;
info->order = orders[i];
info->from_pool = true;
INIT_LIST_HEAD(&info->list);
return info;
}
page = split_page_from_secure_pool(heap, buffer);
if (!IS_ERR(page)) {
info->page = page;
info->order = 0;
info->from_pool = true;
INIT_LIST_HEAD(&info->list);
return info;
}
kfree(info);
force_alloc:
return alloc_largest_available(heap, buffer, size, max_order);
}
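ION_FLAG_POOL_FORCE_ALLOC indicates whether a forced allocation is requested. If it is, alloc_largest_available is called, which ends up calling Linux functions that allocate physical pages directly from the buddy system. For an introduction to struct page, see the article 《Linux 實體記憶體描述》.
The core of alloc_from_pool_preferred is the for loop, which looks for a suitable block size to allocate. The buddy system keeps free blocks of 2^order pages, and the pools follow the same principle, except that the supported orders are kept in an array, usually only 2^0 and 2^4. The definition can be found in
kernel\msm-4.14\drivers\staging\android\ion\ion_system_heap.h: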
#ifndef CONFIG_ALLOC_BUFFERS_IN_4K_CHUNKS
#if defined(CONFIG_IOMMU_IO_PGTABLE_ARMV7S)
static const unsigned int orders[] = {8, 4, 0};
#else
static const unsigned int orders[] = {4, 0};
#endif
#else
static const unsigned int orders[] = {0};
#endif
#define NUM_ORDERS ARRAY_SIZE(orders)
According to my tests, current phones appear to use orders[] = {4, 0}, which means the blocks requested are either 4 KB or 64 KB.
Back to the for loop in alloc_from_pool_preferred, assuming orders[] = {4, 0}:
static inline unsigned int order_to_size(int order)
{
return PAGE_SIZE << order;
}
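PAGE_SIZE is the physical page size, 4 KB by default; ARMv8 supports 4 KB, 16 KB and 64 KB pages. Assuming the system uses 4 KB pages, the first order gives 2^4 = 16 pages, and 16 × 4 KB = 64 KB.
if (size < order_to_size(orders[i])) first checks whether the remaining size is smaller than 64 KB. If it is, nothing is taken from the pool for this order, because that pool holds contiguous 64 KB blocks and serving a smaller request from it would require splitting a block. Allocation always looks for the best-fitting size, so when size is smaller than order_to_size the loop continues with the next entry of orders. After 64 KB comes the 4 KB page, and because the size has been rounded up to a page there is, in theory, nothing smaller than that.
If orders[] held more values, the for loop would simply walk all of them. If the loop ends without obtaining a page, for example because the secure pools for the eligible orders are empty, or in a configuration whose smallest order is not 0, the code leaves the loop and splits a page: it takes a larger block and carves a suitable page out of it by calling split_page_from_secure_pool.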
struct page *split_page_from_secure_pool(struct ion_system_heap *heap,
struct ion_buffer *buffer)
{
int i, j;
struct page *page;
unsigned int order;
mutex_lock(&heap->split_page_mutex);
/*
* Someone may have just split a page and returned the unused portion
* back to the pool, so try allocating from the pool one more time
* before splitting. We want to maintain large pages sizes when
* possible.
*/
page = alloc_from_secure_pool_order(heap, buffer, 0);
if (!IS_ERR(page))
goto got_page;
for (i = NUM_ORDERS - 2; i >= 0; i--) {
order = orders[i];
page = alloc_from_secure_pool_order(heap, buffer, order);
if (IS_ERR(page))
continue;
split_page(page, order);
break;
}
/*
* Return the remaining order-0 pages to the pool.
* SetPagePrivate flag to mark memory as secure.
*/
if (!IS_ERR(page)) {
for (j = 1; j < (1 << order); j++) {
SetPagePrivate(page + j);
free_buffer_page(heap, buffer, page + j, 0);
}
}
got_page:
mutex_unlock(&heap->split_page_mutex);
return page;
}
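page = alloc_from_secure_pool_order(heap, buffer, 0); first tries once more to take an order-0 page (a single page, not orders[0]) from the secure pool because, as the comment says, another caller may have just split a block and returned the unused pages. If that fails, the for loop walks the larger orders, starting from the smallest of them, takes a block from the pool and splits it. split_page is in
kernel\msm-4.14\mm\page_alloc.c, the file that holds the core interface functions of the buddy system; memory allocation functions from it are used later as well. I did not fully follow the implementation of split_page, but its effect is to turn one higher-order block into 2^order independent order-0 pages. The first page is handed to the caller and the remaining ones go back to the pool. The page obtained from split_page_from_secure_pool is finally stored in info: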
page = split_page_from_secure_pool(heap, buffer);
if (!IS_ERR(page)) {
info->page = page;
info->order = 0;
info->from_pool = true;
INIT_LIST_HEAD(&info->list);
return info;
}
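The bookkeeping of the split path can be sketched with plain counters. This models only the counting, not real pages, and it assumes orders {4, 0}; `toy_secure_pool` and `toy_split_from_pool` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical pool state: free blocks per order, for orders {4, 0}. */
struct toy_secure_pool {
    unsigned int order0_pages;  /* free 4 KiB pages */
    unsigned int order4_blocks; /* free 64 KiB blocks */
};

/*
 * Hand one order-0 page to the caller. Returns 0 on success and -1 if the
 * pool has nothing left to give or to split.
 */
static int toy_split_from_pool(struct toy_secure_pool *pool)
{
    assert(pool);
    /* Retry order 0 first: someone may have just returned split pages. */
    if (pool->order0_pages > 0) {
        pool->order0_pages--;
        return 0;
    }
    if (pool->order4_blocks == 0)
        return -1;
    /* Split one 64 KiB block: keep page 0, return pages 1..15. */
    pool->order4_blocks--;
    pool->order0_pages += (1U << 4) - 1;
    return 0;
}
```

After one split the pool holds 15 single pages, so the next 15 requests are served by the first branch and no further large block is broken up.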
Back in alloc_from_pool_preferred, let us look at alloc_from_secure_pool_order:
struct page *alloc_from_secure_pool_order(struct ion_system_heap *heap,
struct ion_buffer *buffer,
unsigned long order)
{
int vmid = get_secure_vmid(buffer->flags);
struct ion_page_pool *pool;
if (!is_secure_vmid_valid(vmid))
return ERR_PTR(-EINVAL);
pool = heap->secure_pools[vmid][order_to_index(order)];
return ion_page_pool_alloc_pool_only(pool);
}
This function is simple: it finds the pool for the given vmid and order and then calls ion_page_pool_alloc_pool_only:
/*
* Tries to allocate from only the specified Pool and returns NULL otherwise
*/
struct page *ion_page_pool_alloc_pool_only(struct ion_page_pool *pool)
{
struct page *page = NULL;
if (!pool)
return ERR_PTR(-EINVAL);
if (mutex_trylock(&pool->mutex)) {
if (pool->high_count)
page = ion_page_pool_remove(pool, true);
else if (pool->low_count)
page = ion_page_pool_remove(pool, false);
mutex_unlock(&pool->mutex);
}
if (!page)
return ERR_PTR(-ENOMEM);
return page;
}
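The function takes a page from the pool. The pool keeps separate counts for high-memory and low-memory pages. High memory here means physical memory that is not permanently mapped into the kernel address space, which on 32-bit systems is the part above the directly mapped region; on 64-bit kernels without CONFIG_HIGHMEM every page counts as low memory. As far as I can tell, a page is classified as high or low at the moment it is added to the pool.
Back in the while loop of ion_system_heap_allocate: if the buffer is not allocated from the secure pool, alloc_largest_available is called: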
static struct page_info *alloc_largest_available(struct ion_system_heap *heap,
struct ion_buffer *buffer,
unsigned long size,
unsigned int max_order)
{
struct page *page;
struct page_info *info;
int i;
bool from_pool;
info = kmalloc(sizeof(*info), GFP_KERNEL);
if (!info)
return ERR_PTR(-ENOMEM);
for (i = 0; i < NUM_ORDERS; i++) {
if (size < order_to_size(orders[i]))
continue;
if (max_order < orders[i])
continue;
from_pool = !(buffer->flags & ION_FLAG_POOL_FORCE_ALLOC);
page = alloc_buffer_page(heap, buffer, orders[i], &from_pool);
if (IS_ERR(page))
continue;
info->page = page;
info->order = orders[i];
info->from_pool = from_pool;
INIT_LIST_HEAD(&info->list);
return info;
}
kfree(info);
return ERR_PTR(-ENOMEM);
}
ION_FLAG_POOL_FORCE_ALLOC is checked here as well: if a forced allocation is requested, the page is not taken from a pool. alloc_buffer_page is then called:
static struct page *alloc_buffer_page(struct ion_system_heap *heap,
struct ion_buffer *buffer,
unsigned long order,
bool *from_pool)
{
bool cached = ion_buffer_cached(buffer);
struct page *page;
struct ion_page_pool *pool;
int vmid = get_secure_vmid(buffer->flags);
struct device *dev = heap->heap.priv;
if (vmid > 0)
pool = heap->secure_pools[vmid][order_to_index(order)];
else if (!cached)
pool = heap->uncached_pools[order_to_index(order)];
else
pool = heap->cached_pools[order_to_index(order)];
page = ion_page_pool_alloc(pool, from_pool);
if (IS_ERR(page))
return page;
if ((MAKE_ION_ALLOC_DMA_READY && vmid <= 0) || !(*from_pool))
ion_pages_sync_for_device(dev, page, PAGE_SIZE << order,
DMA_BIDIRECTIONAL);
return page;
}
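This selects the pool to allocate from (secure, uncached or cached) and then calls ion_page_pool_alloc, passing down both the pool and whether the page may be taken from it: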
struct page *ion_page_pool_alloc(struct ion_page_pool *pool, bool *from_pool)
{
struct page *page = NULL;
BUG_ON(!pool);
if (fatal_signal_pending(current))
return ERR_PTR(-EINTR);
if (*from_pool && mutex_trylock(&pool->mutex)) {
if (pool->high_count)
page = ion_page_pool_remove(pool, true);
else if (pool->low_count)
page = ion_page_pool_remove(pool, false);
mutex_unlock(&pool->mutex);
}
if (!page) {
page = ion_page_pool_alloc_pages(pool);
*from_pool = false;
}
if (!page)
return ERR_PTR(-ENOMEM);
return page;
}
If taking a page from the pool fails, or the pool must not be used, ion_page_pool_alloc_pages is called. It is a thin wrapper around the allocation interface of the Linux buddy system:
static void *ion_page_pool_alloc_pages(struct ion_page_pool *pool)
{
struct page *page = alloc_pages(pool->gfp_mask, pool->order);
return page;
}
Back to the while loop in ion_system_heap_allocate:
sz = (1 << info->order) * PAGE_SIZE;
if (info->from_pool) {
list_add_tail(&info->list, &pages_from_pool);
} else {
list_add_tail(&info->list, &pages);
data.size += sz;
++nents_sync;
}
size_remaining -= sz;
max_order = info->order;
i++;
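Every allocated page is recorded in an info structure, and depending on whether it came from a pool the info is added to one of two lists. The order stored in info is the exponent of 2; multiplying 2^order by the page size gives the size of this allocation, which is then subtracted from the total (size_remaining -= sz;). After the while loop, the pages are placed into the sg_table.
The first time the heap is used the pools contain no pages, so everything is taken from the Linux buddy system. The pages in a pool are the ones that were stored there when they were freed.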
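The pool-first, buddy-fallback behaviour and the recycling on free can be sketched in user space, with `malloc`/`free` standing in for `alloc_pages`/`__free_pages`. The capacity limit `TOY_POOL_CAP` and all `toy_*` names are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_POOL_CAP 8          /* hypothetical limit for this sketch */

struct toy_pool {
    void *items[TOY_POOL_CAP];  /* cached blocks */
    int count;
    size_t block_size;
};

/*
 * Take a block from the pool when allowed and possible; otherwise fall
 * back to malloc(). On return, *from_pool reports where the block came
 * from.
 */
static void *toy_pool_alloc(struct toy_pool *pool, bool *from_pool)
{
    void *p = NULL;

    assert(pool && from_pool);
    if (*from_pool && pool->count > 0)
        p = pool->items[--pool->count];
    if (!p) {
        p = malloc(pool->block_size);
        *from_pool = false;
    }
    return p;
}

/* Cache the block for reuse; give it back to the system if the pool is full. */
static void toy_pool_free(struct toy_pool *pool, void *p)
{
    assert(pool);
    if (pool->count < TOY_POOL_CAP)
        pool->items[pool->count++] = p;
    else
        free(p);
}
```

The first allocation necessarily comes from the system because the pool is empty. Once that block has been freed into the pool, the next allocation returns the same block without touching the system allocator.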
Back in ion_alloc_fd: once the dma-buf exists, an fd is created for it by calling dma_buf_fd:
int dma_buf_fd(struct dma_buf *dmabuf, int flags)
{
int fd;

if (!dmabuf || !dmabuf->file)
return -EINVAL;

fd = get_unused_fd_flags(flags);
if (fd < 0)
return fd;

fd_install(fd, dmabuf->file);

return fd;
}
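This calls the Linux function get_unused_fd_flags to obtain an unused fd number, and fd_install then binds the file of the dma-buf to that fd.
That struct file is created earlier, in ion_alloc_dmabuf: once the buffer has been obtained, dma_buf_export is called, and it contains: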
file = anon_inode_getfile(bufname, &dma_buf_fops, dmabuf,
exp_info->flags);
if (IS_ERR(file)) {
ret = PTR_ERR(file);
goto err_dmabuf;
}
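Here the file is created with dma_buf_fops as its file operations and the dmabuf as its private data. The dmabuf in turn carries the dma_buf_ops that ION passed in through exp_info, so operations on the fd end up in ION's dma_buf_ops.
2. Memory release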
void ion_system_heap_free(struct ion_buffer *buffer)
{
struct ion_heap *heap = buffer->heap;
struct ion_system_heap *sys_heap = container_of(heap,
struct ion_system_heap,
heap);
struct sg_table *table = buffer->sg_table;
struct scatterlist *sg;
int i;
int vmid = get_secure_vmid(buffer->flags);
if (!(buffer->private_flags & ION_PRIV_FLAG_SHRINKER_FREE) &&
!(buffer->flags & ION_FLAG_POOL_FORCE_ALLOC)) {
if (vmid < 0)
ion_heap_buffer_zero(buffer);
} else if (vmid > 0) {
if (ion_hyp_unassign_sg(table, &vmid, 1, true, false))
return;
}
for_each_sg(table->sgl, sg, table->nents, i)
free_buffer_page(sys_heap, buffer, sg_page(sg),
get_order(sg->length));
sg_free_table(table);
kfree(table);
}
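The first part of this function checks a few variables. The important part is for_each_sg, which calls free_buffer_page on every physical block in the scatterlist: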
/*
* For secure pages that need to be freed and not added back to the pool; the
* hyp_unassign should be called before calling this function
*/
void free_buffer_page(struct ion_system_heap *heap,
struct ion_buffer *buffer, struct page *page,
unsigned int order)
{
bool cached = ion_buffer_cached(buffer);
int vmid = get_secure_vmid(buffer->flags);
if (!(buffer->flags & ION_FLAG_POOL_FORCE_ALLOC)) {
struct ion_page_pool *pool;
if (vmid > 0)
pool = heap->secure_pools[vmid][order_to_index(order)];
else if (cached)
pool = heap->cached_pools[order_to_index(order)];
else
pool = heap->uncached_pools[order_to_index(order)];
if (buffer->private_flags & ION_PRIV_FLAG_SHRINKER_FREE)
ion_page_pool_free_immediate(pool, page);
else
ion_page_pool_free(pool, page);
} else {
__free_pages(page, order);
}
}
After selecting the matching pool, it calls ion_page_pool_free:
void ion_page_pool_free(struct ion_page_pool *pool, struct page *page)
{
int ret;
ret = ion_page_pool_add(pool, page);
if (ret)
ion_page_pool_free_pages(pool, page);
}
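This stores the page in the pool. When the system runs short of memory, however, the ION heap has to give the pages held in its pools back to the buddy system. That reclaim is performed by the shrink function: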
static int ion_system_heap_shrink(struct ion_heap *heap, gfp_t gfp_mask,
int nr_to_scan)
{
struct ion_system_heap *sys_heap;
int nr_total = 0;
int i, j, nr_freed = 0;
int only_scan = 0;
struct ion_page_pool *pool;
sys_heap = container_of(heap, struct ion_system_heap, heap);
if (!nr_to_scan)
only_scan = 1;
for (i = 0; i < NUM_ORDERS; i++) {
nr_freed = 0;
for (j = 0; j < VMID_LAST; j++) {
if (is_secure_vmid_valid(j))
nr_freed += ion_secure_page_pool_shrink(
sys_heap, j, i, nr_to_scan);
}
pool = sys_heap->uncached_pools[i];
nr_freed += ion_page_pool_shrink(pool, gfp_mask, nr_to_scan);
pool = sys_heap->cached_pools[i];
nr_freed += ion_page_pool_shrink(pool, gfp_mask, nr_to_scan);
nr_total += nr_freed;
if (!only_scan) {
nr_to_scan -= nr_freed;
/* shrink completed */
if (nr_to_scan <= 0)
break;
}
}
return nr_total;
}
This function is also simple. Apart from some accounting, its main job is to call ion_page_pool_shrink, which takes pages out of the pool and then calls:
static void ion_page_pool_free_pages(struct ion_page_pool *pool,
struct page *page)
{
__free_pages(page, pool->order);
}
__free_pages is again a Linux buddy system interface, located in kernel\msm-4.14\mm\page_alloc.c.
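The per-pool shrink step is not quoted in this article, so the sketch below is an assumption about its shape rather than a copy of it: a request with nr_to_scan == 0 only reports how many pages the pool holds, and otherwise blocks are released until at least nr_to_scan pages have been freed. `toy_pool` and `toy_pool_shrink` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical pool: `blocks` free blocks, each of 2^order pages. */
struct toy_pool {
    unsigned int order;
    int blocks;
};

/*
 * nr_to_scan == 0: only report the number of pages held by the pool.
 * Otherwise: release blocks until at least nr_to_scan pages are freed or
 * the pool is empty, and return the number of pages freed.
 */
static int toy_pool_shrink(struct toy_pool *pool, int nr_to_scan)
{
    int freed = 0;

    assert(pool && nr_to_scan >= 0);
    if (nr_to_scan == 0)
        return pool->blocks << pool->order;
    while (freed < nr_to_scan && pool->blocks > 0) {
        pool->blocks--;         /* the kernel would call __free_pages() */
        freed += 1 << pool->order;
    }
    return freed;
}
```

Because whole blocks are released, a pool of 64 KiB blocks can free more pages than were asked for: a request for 20 pages frees two blocks, that is 32 pages.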
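3. Memory mapping
Mapping system heap memory is done by the dma-buf ops, which call ion_heap_map_user. This function has one very important parameter, struct vm_area_struct, which the kernel uses to manage a process's virtual memory. It contains a few important members, and once their meaning is clear the code below is easy to follow. First, the definition of the structure, from kernel\msm-4.14\include\linux\mm_types.h: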
/*
* This struct defines a memory VMM memory area. There is one of these
* per VM-area/task. A VM area is any part of the process virtual memory
* space that has a special rule for the page-fault handlers (ie a shared
* library, the executable area etc).
*/
struct vm_area_struct {
/* The first cache line has the info for VMA tree walking. */
unsigned long vm_start; /* Our start address within vm_mm. */
unsigned long vm_end; /* The first byte after our end address
within vm_mm. */
/* linked list of VM areas per task, sorted by address */
struct vm_area_struct *vm_next, *vm_prev;
struct rb_node vm_rb;
/*
* Largest free memory gap in bytes to the left of this VMA.
* Either between this VMA and vma->vm_prev, or between one of the
* VMAs below us in the VMA rbtree and its ->vm_prev. This helps
* get_unmapped_area find a free area of the right size.
*/
unsigned long rb_subtree_gap;
/* Second cache line starts here. */
struct mm_struct *vm_mm; /* The address space we belong to. */
pgprot_t vm_page_prot; /* Access permissions of this VMA. */
unsigned long vm_flags; /* Flags, see mm.h. */
/*
* For areas with an address space and backing store,
* linkage into the address_space->i_mmap interval tree.
*
* For private anonymous mappings, a pointer to a null terminated string
* in the user process containing the name given to the vma, or NULL
* if unnamed.
*/
union {
struct {
struct rb_node rb;
unsigned long rb_subtree_last;
} shared;
const char __user *anon_name;
};
/*
* A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
* list, after a COW of one of the file pages. A MAP_SHARED vma
* can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack
* or brk vma (with NULL file) can only be in an anon_vma list.
*/
struct list_head anon_vma_chain; /* Serialized by mmap_sem &
* page_table_lock */
struct anon_vma *anon_vma; /* Serialized by page_table_lock */
/* Function pointers to deal with this struct. */
const struct vm_operations_struct *vm_ops;
/* Information about our backing store: */
unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE
units */
struct file * vm_file; /* File we map to (can be NULL). */
void * vm_private_data; /* was vm_pte (shared mem) */
atomic_long_t swap_readahead_info;
#ifndef CONFIG_MMU
struct vm_region *vm_region; /* NOMMU mapping region */
#endif
#ifdef CONFIG_NUMA
struct mempolicy *vm_policy; /* NUMA policy for the VMA */
#endif
struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
seqcount_t vm_sequence;
atomic_t vm_ref_count; /* see vma_get(), vma_put() */
#endif
} __randomize_layout;
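For the role of this structure, see the article at https://linux-kernel-labs.github.io/master/labs/memory_mapping.html. The structure is created when a user process calls mmap. It describes the virtual memory that corresponds to the physical pages: one contiguous range of virtual address space with uniform access attributes, whose size is a whole multiple of the page size. The meaning of each member is explained in https://blog.csdn.net/ganggexiongqi/article/details/6746248
vm_start is the start virtual address of the area within the process. ion_heap_map_user is implemented as follows: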
int ion_heap_map_user(struct ion_heap *heap, struct ion_buffer *buffer,
struct vm_area_struct *vma)
{
struct sg_table *table = buffer->sg_table;
unsigned long addr = vma->vm_start;
unsigned long offset = vma->vm_pgoff * PAGE_SIZE;
struct scatterlist *sg;
int i;
int ret;
for_each_sg(table->sgl, sg, table->nents, i) {
struct page *page = sg_page(sg);
unsigned long remainder = vma->vm_end - addr;
unsigned long len = sg->length;
if (offset >= sg->length) {
offset -= sg->length;
continue;
} else if (offset) {
page += offset / PAGE_SIZE;
len = sg->length - offset;
offset = 0;
}
len = min(len, remainder);
ret = remap_pfn_range(vma, addr, page_to_pfn(page), len,
vma->vm_page_prot);
if (ret)
return ret;
addr += len;
if (addr >= vma->vm_end)
return 0;
}
return 0;
}
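Back in the code, addr = vma->vm_start holds the start virtual address, and vm_pgoff is the offset of that start address within vm_file, in units of pages; the code converts it to bytes in offset. For example, with a buffer of 64 pages, a user who maps 10 pages starting at page index 5 gets vm_pgoff = 5. The for_each_sg loop takes the physical pages stored in the scatterlist and maps them. Look first at the check offset >= sg->length. Why is it needed? Suppose the offset is 6 pages but the first sg entry holds only 5 pages. That whole entry lies before the requested start, so it is skipped, its length is subtracted from offset, and the remaining 1 page of offset is consumed in the next entry.
The following code does exactly this: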
if (offset >= sg->length) {
offset -= sg->length;
continue;
} else if (offset) {
page += offset / PAGE_SIZE;
len = sg->length - offset;
offset = 0;
}
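Assume the next sg entry holds three pages. After offset -= sg->length in the first iteration, offset is 6 - 5 = 1 page, so in this entry we only need to advance page by 1. len becomes 3 - 1 = 2 pages, and offset is set to 0 because it is no longer needed. Those two pages are what has to be mapped, so the code calls the kernel's remap_pfn_range, for which plenty of material is available online. That completes the mapping into user space.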
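The offset handling can be replayed in user space on a toy scatterlist. The sketch records which (pfn, length) ranges would be passed to remap_pfn_range instead of mapping anything; it assumes 4 KiB pages, and `toy_sg`, `toy_mapping` and `toy_map_user` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL    /* assumption: 4 KiB pages */

/* One scatterlist entry: a physically contiguous run of pages. */
struct toy_sg {
    unsigned long first_pfn;
    unsigned long length;       /* bytes */
};

/* One range that would be handed to remap_pfn_range(). */
struct toy_mapping {
    unsigned long pfn;
    unsigned long len;          /* bytes */
};

/* Returns how many ranges were written to `out` (at most out_cap). */
static size_t toy_map_user(const struct toy_sg *sgl, size_t nents,
                           unsigned long pgoff, unsigned long vma_len,
                           struct toy_mapping *out, size_t out_cap)
{
    unsigned long offset = pgoff * TOY_PAGE_SIZE;   /* bytes to skip */
    unsigned long mapped = 0;
    size_t i, n = 0;

    assert(sgl && out);
    for (i = 0; i < nents && mapped < vma_len && n < out_cap; i++) {
        unsigned long pfn = sgl[i].first_pfn;
        unsigned long len = sgl[i].length;
        unsigned long remainder = vma_len - mapped;

        if (offset >= len) {    /* entry lies entirely before the start */
            offset -= len;
            continue;
        }
        if (offset) {           /* start in the middle of this entry */
            pfn += offset / TOY_PAGE_SIZE;
            len -= offset;
            offset = 0;
        }
        if (len > remainder)
            len = remainder;    /* do not map past the end of the vma */
        out[n].pfn = pfn;
        out[n].len = len;
        n++;
        mapped += len;
    }
    return n;
}
```

With entries of 5 and 3 pages, an offset of 6 pages and a 2-page mapping, the first entry is skipped and a single range starting one page into the second entry is produced, which matches the walk-through above.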