Core Data in the Background

Before doing anything concurrent with Core Data, it is important to get the basics right. We strongly recommend reading through Apple's Concurrency with Core Data guide. This document lays down the ground rules, such as never passing managed objects between threads. This doesn't just mean that you should never modify a managed object on another thread, but also that you should never read any of its properties from another thread. To pass around an object, pass its object ID and retrieve the object from the context associated with the other thread.
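
For example, to hand a Stop object (the entity used later in this article) from the main context to a background context, you would pass its objectID and re-fetch the object on the other context's queue. A minimal sketch, assuming a backgroundContext that belongs to another queue, and keeping in mind that freshly inserted, unsaved objects need permanent IDs first:

// on the main thread: capture the ID, never the object itself
NSManagedObjectID *objectID = stop.objectID;

[backgroundContext performBlock:^{
    NSError *error = nil;
    Stop *localStop = (Stop *)[backgroundContext existingObjectWithID:objectID error:&error];
    if (localStop == nil) {
        NSLog(@"could not find object in background context: %@", error);
        return;
    }
    // safe to read and modify localStop here, on the background context's queue
}];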

Doing concurrent programming with Core Data is simple when you stick to those rules and use the method described in this article.

The standard setup for Core Data in the Xcode templates is one persistent store coordinator with one managed object context that runs on the main thread. For many use cases, this is just fine. Creating a few new objects and modifying existing objects is very cheap and can be done on the main thread without problems. However, if you want to do big chunks of work, then it makes sense to do this in a background context. A prime example of this is importing large data sets into Core Data.
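
For reference, the main-thread part of such a setup boils down to a main-queue context attached to the persistent store coordinator, roughly like this (a sketch, not the template code verbatim):

NSManagedObjectContext *mainContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
mainContext.persistentStoreCoordinator = persistentStoreCoordinator;
// all UI-facing fetches and cheap object manipulation happen on this context, i.e. on the main thread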

Our approach is very simple, and well-covered in existing literature:

1. We create a separate operation for the import work.

2. We create a managed object context with the same persistent store coordinator as the main managed object context.

3. Once the import context saves, we notify the main managed object context and merge the changes.

In the example application, we will import a big set of transit data for the city of Berlin. During the import, we show a progress indicator, and we'd like to be able to cancel the current import if it's taking too long. Also, we show a table view with all the data available so far, which automatically updates when new data comes in. The example data set is publicly available under the Creative Commons license, and you can download it here. It conforms to the General Transit Feed format, an open standard for transit data.

We create an ImportOperation as a subclass of NSOperation, which will handle the import. We override the main method, which is the method that will do all the work. Here we create a separate managed object context with the private queue concurrency type. This means that this context manages its own queue, and all operations on it need to be performed using performBlock or performBlockAndWait. This is crucial to make sure that they will be executed on the right thread.

NSManagedObjectContext *context = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
context.persistentStoreCoordinator = self.persistentStoreCoordinator;
context.undoManager = nil;
// keep a reference so that the import method below uses the same context
self.context = context;

[self.context performBlockAndWait:^{
    [self import];
}];

Note that we reuse the existing persistent store coordinator. In modern code, you should initialize managed object contexts with either the NSPrivateQueueConcurrencyType or the NSMainQueueConcurrencyType. The third concurrency type constant, NSConfinementConcurrencyType, is for legacy code, and our advice is to not use it anymore.

To do the import, we iterate over the lines in our file and create a managed object for each line that we can parse:

[lines enumerateObjectsUsingBlock:^(NSString *line, NSUInteger idx, BOOL *shouldStop) {
    NSArray *components = [line csvComponents];
    if (components.count < 5) {
        NSLog(@"couldn't parse: %@", components);
        return;
    }
    [Stop importCSVComponents:components intoContext:context];
}];
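
Note that csvComponents is not a Foundation method, but presumably a small category on NSString in the example project. A naive stand-in (assuming plain, unquoted comma-separated values) could look like this:

@interface NSString (CSV)
- (NSArray *)csvComponents;
@end

@implementation NSString (CSV)

- (NSArray *)csvComponents
{
    // naive split; real GTFS files can contain quoted fields with embedded commas
    return [self componentsSeparatedByString:@","];
}

@end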

To start this operation, we perform the following code from our view controller:

ImportOperation *operation = [[ImportOperation alloc] initWithStore:self.store fileName:fileName];
[self.operationQueue addOperation:operation];
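
Here, self.operationQueue is an NSOperationQueue owned by the view controller. One way to set it up (an assumption on our part, not shown in the snippet above) is lazily, with at most one concurrent operation so that only a single import runs at a time:

- (NSOperationQueue *)operationQueue
{
    if (_operationQueue == nil) {
        _operationQueue = [[NSOperationQueue alloc] init];
        // a single import at a time is enough and keeps the persistent store coordinator uncontended
        _operationQueue.maxConcurrentOperationCount = 1;
    }
    return _operationQueue;
}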

For importing in the background, that's all you have to do. Now, we will add support for cancellation, and luckily, it's as simple as adding one check inside the enumeration block:

if (self.isCancelled) {
    *shouldStop = YES;
    return;
}
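
On the view controller side, cancellation can then be triggered through NSOperationQueue's standard API, for example from a cancel button (the action method name is hypothetical):

- (IBAction)cancelImport:(id)sender
{
    // sets isCancelled on the running ImportOperation; the check above ends the enumeration
    [self.operationQueue cancelAllOperations];
}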

Finally, to support progress indication, we add a progressCallback property to our operation. It is vital that we update the progress indicator on the main thread; otherwise, UIKit will crash:

operation.progressCallback = ^(float progress) {
    [[NSOperationQueue mainQueue] addOperationWithBlock:^{
        self.progressIndicator.progress = progress;
    }];
};
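
The progressCallback property itself is just a copied block property on the operation. Its declaration might look like this (a sketch of the interface, which isn't shown in the article):

@interface ImportOperation : NSOperation

@property (nonatomic, copy) void (^progressCallback)(float progress);

- (id)initWithStore:(Store *)store fileName:(NSString *)fileName;

@end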

To call the progress block, we add the following line in the enumeration block:

self.progressCallback(idx / (float)lines.count);

However, if you run this code, you will see that everything slows down enormously. It also looks like the operation doesn't cancel immediately. The reason for this is that the main operation queue fills up with blocks that want to update the progress indicator. A simple solution is to reduce the frequency of updates, i.e. to only call the progress callback once for every one percent of the lines imported:

NSInteger progressGranularity = lines.count / 100;

if (idx % progressGranularity == 0) {
    self.progressCallback(idx / (float)lines.count);
}
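
One small caveat, not part of the original snippet: for files with fewer than 100 lines, progressGranularity would be zero and the modulo would crash, so in practice you would want to clamp it:

NSInteger progressGranularity = MAX(lines.count / 100, 1);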

Updating the Main Context

The table view in our app is backed by a fetched results controller on the main thread. During and after the import, we'd like to show the results of the import in our table view.
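
The fetched results controller setup is standard. A sketch might look like this, assuming the view controller holds a reference to the Store and the store exposes its mainManagedObjectContext; the Stop entity comes from the import above, while the name sort key is just an assumption about the data model:

NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Stop"];
request.sortDescriptors = @[[NSSortDescriptor sortDescriptorWithKey:@"name" ascending:YES]];

self.fetchedResultsController = [[NSFetchedResultsController alloc] initWithFetchRequest:request
                                                                    managedObjectContext:self.store.mainManagedObjectContext
                                                                      sectionNameKeyPath:nil
                                                                               cacheName:nil];
self.fetchedResultsController.delegate = self;

NSError *error = nil;
if (![self.fetchedResultsController performFetch:&error]) {
    NSLog(@"fetch failed: %@", error);
}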

There is one missing piece to make this work: the data imported into the background context will not propagate to the main context unless we explicitly tell it to do so. We add the following code to the init method of the Store class where we set up the Core Data stack:

[[NSNotificationCenter defaultCenter] addObserverForName:NSManagedObjectContextDidSaveNotification object:nil queue:nil usingBlock:^(NSNotification *note) {
    NSManagedObjectContext *moc = self.mainManagedObjectContext;
    if (note.object != moc) {
        [moc performBlock:^{
            [moc mergeChangesFromContextDidSaveNotification:note];
        }];
    }
}];

Note that because we call performBlock: on the main managed object context, the block will be executed on the main thread. If you now start the app, you will notice that the table view reloads its data at the end of the import. However, this blocks the user interface for a couple of seconds.

To fix this, we need to do something that we should have done anyway: save in batches. When doing large imports, you want to ensure that you save regularly, otherwise you might run out of memory, and performance generally will get worse. Furthermore, saving regularly spreads out the work on the main thread to update the table view over time.

How often you save is a matter of trial and error. Save too often, and you'll spend too much time doing I/O. Save too rarely, and the app will become unresponsive. We set the batch size to 250 after trying out a few different numbers. Now the import is smooth, updates the table view, and doesn't block the main context for too long.
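
Inside the enumeration block, batched saving could look roughly like this, with batchSize set to the 250 mentioned above (a sketch; the example project's exact structure may differ):

if (idx % batchSize == batchSize - 1) {
    NSError *error = nil;
    if (![context save:&error]) {
        NSLog(@"error saving batch: %@", error);
    }
    // optionally reset the context so imported objects don't pile up in memory
    [context reset];
}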

Other Considerations

In the import operation, we read the entire file into a string and then split that into lines. This will work for relatively small files, but for larger files, it makes sense to lazily read the file line by line. The last example in this article will do exactly that by using input streams. There's also an excellent write-up on StackOverflow by Dave DeLong that shows how to do this.
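
A rough sketch of what lazy, line-by-line reading with an input stream can look like (our simplified take, not the article's final example): read fixed-size chunks, split off complete lines, and keep the trailing partial line in a buffer:

NSInputStream *stream = [NSInputStream inputStreamWithFileAtPath:path];
[stream open];

NSMutableData *buffer = [NSMutableData data];
NSData *newline = [@"\n" dataUsingEncoding:NSUTF8StringEncoding];
uint8_t chunk[4096];
NSInteger bytesRead;

while ((bytesRead = [stream read:chunk maxLength:sizeof(chunk)]) > 0) {
    [buffer appendBytes:chunk length:bytesRead];
    NSRange newlineRange;
    // extract every complete line currently in the buffer
    while ((newlineRange = [buffer rangeOfData:newline options:0 range:NSMakeRange(0, buffer.length)]).location != NSNotFound) {
        NSData *lineData = [buffer subdataWithRange:NSMakeRange(0, newlineRange.location)];
        NSString *line = [[NSString alloc] initWithData:lineData encoding:NSUTF8StringEncoding];
        // process `line` here, e.g. hand it to the CSV import code from above
        [buffer replaceBytesInRange:NSMakeRange(0, NSMaxRange(newlineRange)) withBytes:NULL length:0];
    }
}

[stream close];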

Instead of importing a large data set into Core Data when the app first runs, you could also ship an SQLite file within your app bundle, or download it from a server, where you could even generate it dynamically. If your particular use case works with this solution, it will be a lot faster and will save processing time on the device.

Finally, there is a lot of noise about child contexts these days. Our advice is not to use them for background operations. If you create a background context as a child of the main context, saving the background context will still block the main thread a lot. If you create the main context as a child of a background context, you don't actually gain anything compared to the more traditional setup with two independent contexts, because you still have to merge the changes from the background context into the main context manually.

The setup with one persistent store coordinator and two independent contexts is the proven way of doing Core Data in the background. Stick with it unless you have really good reasons not to.