Shrikar Archak

It does not matter how slow you go so long as you do not stop. ~Confucius

iOS 8 CloudKit Tutorial - Part 2

This is part 2 of the iOS 8 CloudKit tutorial series. If you haven’t read part 1 yet, I suggest you take a look at Part 1 before you move on to the next section.

What we will cover

  • UI for ListViewController
  • NSPredicate
  • NSSortDescriptors
  • CKQuery
  • Protocols and delegation in iOS (our example: CloudKitDelegate)
  • CKModifyRecordsOperation
  • Final Working Fetching Code

UI for ListViewController

This is how our storyboard looked at the end of part 1. For more information on layouts, check out these videos: iOS Adaptive Layouts and iOS Autolayouts.

Let's go to the Object Library and drag a navigation controller onto the main storyboard.

Select the existing view controller and go to Editor > Embed In > Navigation Controller.

Ctrl-drag from the Root View Controller to the newly embedded navigation controller and select Present Modally.


Add bar button items for Cancel and Done. Finally, set up the IBActions and IBOutlets.

NSPredicate

NSPredicate is basically the matching criteria, like "id should be 123" or "id should be 123 and count > 5".

The predicate can’t be nil; if you want to fetch all records, use the predicate below:
let predicate = NSPredicate(value: true)
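
NSPredicate(format:) lets you express richer conditions like the ones described above. A minimal sketch; the id and count fields are hypothetical and are not part of the Todos record type used in this series:

predicates
let byId = NSPredicate(format: "id == %@", "123")                          // id is a hypothetical field
let byIdAndCount = NSPredicate(format: "id == %@ AND count > %d", "123", 5) // count is a hypothetical field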

NSSortDescriptors

NSSortDescriptor defines the order in which records are retrieved from iCloud using CloudKit. It takes the key on which the sorting should be done along with the ordering, ascending: true or ascending: false. If there is more than one sort descriptor, they can be passed as an array.
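
When several descriptors are passed, the first one is the primary sort and the later ones break ties. A minimal sketch, where priority is a hypothetical field (our Todos type only has todotext) and creationDate is the system field used later in this post; the array would then be assigned to query.sortDescriptors as shown in the CKQuery section below:

sort descriptors
let byPriority = NSSortDescriptor(key: "priority", ascending: false)      // hypothetical custom field
let byCreation = NSSortDescriptor(key: "creationDate", ascending: false)  // system field present on every CKRecord
let sortDescriptors = [byPriority, byCreation]   // applied in order: priority first, then creation date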

CKQuery

CKQuery is analogous to a SELECT query in the RDBMS world. A query consists of a record type, a predicate, and (optionally) sort descriptors:

  • RecordType : the type of object to search for. In our example it is Todos, but for other applications it could be a Post, a Message, etc.
  • Predicate : the condition against which the records are matched.
  • SortDescriptors : the order in which the records should be returned. We provide the key and the ordering, ascending or descending.

Example of the above scenario

Query
let predicate = NSPredicate(value: true)
let sort = NSSortDescriptor(key: "creationDate", ascending: false)

let query = CKQuery(recordType: "Todos",
    predicate:  predicate)
query.sortDescriptors = [sort]

Protocol and Delegate/Delegation iOS

Protocols : similar to interfaces in the OOP world.

Delegate : “A delegate is an object that acts on behalf of, or in coordination with, another object when that object encounters an event in a program.”

In CloudKit most operations are async, which makes them very good candidates for the delegation pattern. The delegating object sends a message or invokes a callback when certain events complete. It is the responsibility of the delegate object to implement the protocol and handle the callbacks generated by the delegating object.

CloudKitDelegate
protocol CloudKitDelegate {
    func errorUpdating(error: NSError)
    func modelUpdated()
}

In our case the view controller will adopt this protocol and take the appropriate action when CloudKit events are triggered.
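
Here is a minimal sketch of the delegate side. The class name TodoListViewController and the table view are assumptions; CloudKitHelper and CloudKitDelegate are the ones defined later in this post, and the helper already dispatches these callbacks on the main queue:

TodoListViewController
import UIKit

class TodoListViewController: UITableViewController, CloudKitDelegate {

    override func viewDidLoad() {
        super.viewDidLoad()
        // Register ourselves so CloudKitHelper can call back when the model changes.
        CloudKitHelper.sharedInstance().delegate = self
    }

    // CloudKitDelegate
    func modelUpdated() {
        // Safe to touch the UI here; the helper calls this on the main queue.
        tableView.reloadData()
    }

    func errorUpdating(error: NSError) {
        NSLog("CloudKit error: \(error.localizedDescription)")
    }
}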

CKModifyRecordsOperation

I faced a weird issue when I tried to display the todo entries after adding them to iCloud: I was not able to fetch the last entry added. The documentation mentions that all the operations are async and run on low-priority threads, and that if I need to save something immediately I should use CKModifyRecordsOperation. I tried it, but unfortunately it didn’t work for me. If someone finds a solution to this, please let me know in the comments.

This method saves the record with a low priority, which may cause the task to execute after higher-priority tasks. To save records more urgently, create a CKModifyRecordsOperation object with the desired priority. You can also use that operation object to save multiple records simultaneously.

Problem

CloudKit not returning the most recent data

fetch
let todoRecord = CKRecord(recordType: "Todos")
todoRecord.setValue(todo, forKey: "todotext")
publicDB.saveRecord(todoRecord, completionHandler: { (record, error) -> Void in
        NSLog("Saved in cloudkit")
        let predicate = NSPredicate(value: true)
        let query = CKQuery(recordType: "Todos",
            predicate:  predicate)

        self.publicDB.performQuery(query, inZoneWithID: nil) {
            results, error in
            if error != nil {
                dispatch_async(dispatch_get_main_queue()) {
                    self.delegate?.errorUpdating(error)
                    return
                }
            } else {
                NSLog("###### fetch after save : \(results.count)")
                dispatch_async(dispatch_get_main_queue()) {
                    self.delegate?.modelUpdated()
                    return
                }
            }
        }
})

Result

Before saving in cloud kit : 3
Saved in cloudkit
###### Count after save : 3

Workaround

Add the newly saved entry to the todos array on the client side.

What I tried

CKModifyRecordsOperation
let ops = CKModifyRecordsOperation(recordsToSave: [todoRecord], recordIDsToDelete: nil)
ops.savePolicy = CKRecordSavePolicy.AllKeys

ops.modifyRecordsCompletionBlock = { savedRecords, deletedRecordIDs, error in
    NSLog("Completed Save to cloud")

    let predicate = NSPredicate(value: true)
    let query = CKQuery(recordType: "Todos",
        predicate: predicate)

    self.publicDB.performQuery(query, inZoneWithID: nil) {
        results, error in
        if error != nil {
            dispatch_async(dispatch_get_main_queue()) {
                self.delegate?.errorUpdating(error)
                return
            }
        } else {
            self.todos.removeAll()
            for record in results {
                let todo = Todos(record: record as CKRecord, database: self.publicDB)
                self.todos.append(todo)
            }
            NSLog("fetch after save : \(self.todos.count)")
            dispatch_async(dispatch_get_main_queue()) {
                self.delegate?.modelUpdated()
                return
            }
        }
    }
}
publicDB.addOperation(ops)

Final Working Fetching Code

CloudKitHelper
import Foundation
import CloudKit

protocol CloudKitDelegate {
    func errorUpdating(error: NSError)
    func modelUpdated()
}


class CloudKitHelper {
    var container : CKContainer
    var publicDB : CKDatabase
    let privateDB : CKDatabase
    var delegate : CloudKitDelegate?
    var todos = [Todos]()

    class func sharedInstance() -> CloudKitHelper {
        return cloudKitHelper
    }

    init() {
        container = CKContainer.defaultContainer()
        publicDB = container.publicCloudDatabase
        privateDB = container.privateCloudDatabase
    }

    func saveRecord(todo : NSString) {
        let todoRecord = CKRecord(recordType: "Todos")
        todoRecord.setValue(todo, forKey: "todotext")
        publicDB.saveRecord(todoRecord, completionHandler: { (record, error) -> Void in
            NSLog("Before saving in cloud kit : \(self.todos.count)")
            NSLog("Saved in cloudkit")
            self.fetchTodos(record)
        })

    }

    func fetchTodos(insertedRecord: CKRecord?) {
        let predicate = NSPredicate(value: true)
        let sort = NSSortDescriptor(key: "creationDate", ascending: false)

        let query = CKQuery(recordType: "Todos",
            predicate:  predicate)
        query.sortDescriptors = [sort]
        publicDB.performQuery(query, inZoneWithID: nil) {
            results, error in
            if error != nil {
                dispatch_async(dispatch_get_main_queue()) {
                    self.delegate?.errorUpdating(error)
                    return
                }
            } else {
                self.todos.removeAll()
                for record in results{
                    let todo = Todos(record: record as CKRecord, database: self.publicDB)
                    self.todos.append(todo)
                }
                if let tmp = insertedRecord {
                    let todo = Todos(record: tmp, database: self.publicDB)
                    /* Workaround: put the latest entry at index 0 */
                    self.todos.insert(todo, atIndex: 0)
                }
                NSLog("fetch after save : \(self.todos.count)")
                dispatch_async(dispatch_get_main_queue()) {
                    self.delegate?.modelUpdated()
                    return
                }
            }
        }
    }
}
let cloudKitHelper = CloudKitHelper()

If you have any questions or comments, do comment on the post.

Github Repo : CloudKit

Learn iOS8 with Swift

iOS 8 CloudKit Tutorial - Part 1

In this tutorial we will create an iOS app that stores simple text in iCloud using the CloudKit technology released with iOS 8.

What we will cover

  • Creating a new project.
  • CloudKit Configuration
  • CloudKit Terminologies
  • Schema design in Cloudkit Dashboard
  • CloudKit Workflow
  • Save records in iCloud Storage.

CloudKit helps you move structured data between your app and iCloud.

Creating a new project

  • Open Xcode
  • File > New Project > Single View Application
  • ProductName : CloudKit
  • Language : Swift
  • Device : iPhone
  • Next and save the project

CloudKit Configuration

Click on the target and make sure we have configured our app for iCloud.


CloudKit Terminologies

  • CKContainer represents a namespace for your app. If your account has 2 different apps, there will be 2 containers, one for each app.
  • A container is divided into 2 databases (CKDatabase): public and private. Data stored in the private database is accessible only to the current user and lives in that user’s iCloud account. The public database is accessible read-only even to people who are not logged in with an iCloud account.
  • Record Type is a schema for the objects that are stored in iCloud (think of a class in OOP terms).
  • CKRecord is an instance of a record type (think of an object in OOP terms) and contains the key-value pairs which represent the object. A short sketch of how these map to code follows.
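
A minimal sketch of how these terms map to code, using the Todos record type from this tutorial:

terminology
import CloudKit

let container = CKContainer.defaultContainer()     // CKContainer: the namespace for this app
let publicDB  = container.publicCloudDatabase      // CKDatabase: public
let privateDB = container.privateCloudDatabase     // CKDatabase: private

let record = CKRecord(recordType: "Todos")         // CKRecord: one instance of the Todos record type
record.setValue("Buy milk", forKey: "todotext")    // key-value pairs describe the object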

Schema Design

  • Log in to the CloudKit dashboard.
  • Create a new record type called Todos. In our example we will have just one field, todotext, which will be of type String.

CloudKit Workflow

  • Get current container
  • Get the CKDatabase object that corresponds to the database (public or private) that contains the records.
  • After storing the data you can find it in the CloudKit dashboard -> Public Data -> Default Zone
CloudKitHelper
import Foundation
import CloudKit

class CloudKitHelper {
    var container : CKContainer
    var publicDB : CKDatabase
    let privateDB : CKDatabase

    init() {
        container = CKContainer.defaultContainer()
        publicDB = container.publicCloudDatabase
        privateDB = container.privateCloudDatabase
    }

    func saveRecord(todo : NSString) {
        let todoRecord = CKRecord(recordType: "Todos")
        todoRecord.setValue(todo, forKey: "todotext")
        publicDB.saveRecord(todoRecord, completionHandler: { (record, error) -> Void in
            NSLog("Saved to cloud kit")

        })

    }
}

Run the project in Simulator

If you get an error like the one below, or the data is not stored in iCloud:

"Not Authenticated" (9/1002); "This request requires an authenticated account"

make sure you are logged in to iCloud in the simulator:

Settings -> iCloud -> Log in

Enter some text in the text field and save it to CloudKit. Go to the CloudKit dashboard to see whether the data is stored properly.


In the next part we will see how to fetch the data from iCloud using CloudKit. Let me know if you have any comments in the section below. Find more information about Part 2 here.

Github Repo : CloudKit

Learn iOS8 with Swift

How to Build a Trivia App Using Swift Programming Language

What we will cover

  • Creating a new project.
  • Grand Central Dispatch (GCD)
  • How to fetch data from the web using Swift.
  • Tweet Functionality
  • Run in Simulator

Requirements: Xcode 6

Create a new project

  • Open Xcode
  • File > New Project > Single View Application
  • ProductName : TriviaApp
  • Language : Swift
  • Device : iPhone
  • Next and save the project


Grand Central Dispatch (GCD)

GCD (not greatest common divisor) is a technology built by Apple for efficiently using the multi-core processors on iOS and OS X to improve the performance of your app.

Apps tend to become unresponsive when we perform long-running tasks, like fetching data from a web server, on the main thread. Ideally the main thread should only handle touch events and react to user events in realtime.

GCD helps by pushing slow-running tasks onto a background queue, where they execute concurrently without blocking the main thread.

GCD provides an abstraction over the thread pool interface and lets you write concurrent code easily without worrying much about the concurrency model.
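
A minimal sketch of that pattern, using the same GCD C API as the code later in this post: the slow work runs on a background queue, and only the UI update hops back onto the main queue.

gcd
import Foundation

let backgroundQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)

dispatch_async(backgroundQueue) {
    // Pretend this is the slow part, e.g. downloading or parsing data.
    let result = "some expensive result"

    dispatch_async(dispatch_get_main_queue()) {
        // UI updates must always happen on the main thread.
        println("Update the UI with: \(result)")
    }
}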

How to fetch data from the web using Swift

There are different ways of fetching data from the web, but in our case we will use NSURLSession.sharedSession() to get a singleton session object.

A singleton class returns the same instance no matter how many times an application requests it. A typical class permits callers to create as many instances of the class as they want, whereas with a singleton class, there can be only one instance of the class per process. A singleton object provides a global point of access to the resources of its class. Singletons are used in situations where this single point of control is desirable, such as with classes that offer some general service or resource.
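
The randomFact() function below talks to a session property on the view controller. A minimal sketch of how that property might be set up; the class name is an assumption, the shared-session call is the relevant part:

session
import UIKit

class ViewController: UIViewController {
    // NSURLSession.sharedSession() always returns the same underlying session,
    // so every caller in the app shares one object.
    let session = NSURLSession.sharedSession()
}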

Let's implement the randomFact() function, which fetches and displays a random fact each time we call it.

randomFact
   func randomFact() {
        let baseURL = NSURL(string: "http://numbersapi.com/random/trivia")
        let downloadTask = session.downloadTaskWithURL(baseURL, completionHandler: { (location, response, error) -> Void in
            if(error == nil){
                let objectData = NSData(contentsOfURL: location)
                let tmpData :NSString = NSString(data: objectData, encoding: NSUTF8StringEncoding)

                dispatch_async(dispatch_get_main_queue(), { () -> Void in
                    self.factText.text = tmpData
                    if (tmpData.length > 130){
                        self.tweetButtonLabel.hidden = true
                    } else {
                        self.tweetButtonLabel.hidden = false
                    }
                    self.activityIndicator.stopAnimating()
                    self.activityIndicator.hidden = true
                })
            } else {
                dispatch_async(dispatch_get_main_queue(), { () -> Void in
                    // Present the alert and update the UI on the main thread.
                    let alertViewController = UIAlertController(title: "Error", message: "Couldn't connect to network", preferredStyle: .Alert)
                    let okButton = UIAlertAction(title: "OK", style: .Default, handler: nil)
                    let cancelButton = UIAlertAction(title: "Cancel", style: .Cancel, handler: nil)
                    alertViewController.addAction(okButton)
                    alertViewController.addAction(cancelButton)
                    self.presentViewController(alertViewController, animated: true, completion: nil)
                    self.activityIndicator.stopAnimating()
                    self.activityIndicator.hidden = true
                })
            }

        })
        downloadTask.resume()
    }

These tasks are executed in the background:

  • First we create a baseURL which points to the API that returns a random fact.
  • Using the shared session object we create a downloadTask to fetch the data and pass a completionHandler that will be called when the data is ready to be used.
  • The completionHandler has the following semantics: location is where the data is stored locally, response is the response from the web call, and error is set if anything went wrong.
  • First we check whether there is an error; if not, we continue using the data by loading it into NSData and eventually an NSString.

These tasks occur on the main thread:

  • Get a reference to the main queue and pass it to dispatch_async.
  • Execute the code that updates the UI.

Tweet Functionality

The tweet functionality is provided by the Social framework introduced in iOS. To use this feature we need to add Social.framework to our app. Click on the target in the left sidebar of Xcode.


The Social framework (Social.framework) provides a simple interface for accessing the user’s social media accounts. This framework supplants the Twitter framework and adds support for other social accounts, including Facebook, Sina Weibo, and others. Apps can use this framework to post status updates and images to a user’s account. This framework works with the Accounts framework to provide a single sign-on model for the user and to ensure that access to the user’s account is approved.

tweet functionality
func tweetButton() {
        if SLComposeViewController.isAvailableForServiceType(SLServiceTypeTwitter) {

            var tweetSheet = SLComposeViewController(forServiceType: SLServiceTypeTwitter)
            tweetSheet.setInitialText(factText.text + " #TriviaApp")
            self.presentViewController(tweetSheet, animated: true, completion: nil)
        } else {
            let alertViewController = UIAlertController(title: "Oops", message: "No Twitter Account connected on the device. Go to Settings > Twitter and add a twitter account", preferredStyle: .Alert)
            let okButton = UIAlertAction(title: "OK", style: .Default, handler: nil)
            let cancelButton = UIAlertAction(title: "Cancel", style: .Cancel, handler: nil)
            alertViewController.addAction(okButton)
            alertViewController.addAction(cancelButton)
            self.presentViewController(alertViewController, animated: true, completion: nil)

        }
    }

Clone and run the project

You can find the code on GitHub. Clone the project and run it: https://github.com/sarchak/TriviaApp

I used Treehouse to get started with iOS 8 and Swift. I strongly recommend trying it out if you want to get started with iOS 8 and Swift. Here are the links: Non Affiliate link and Affiliate link

Swift Programming Language

Swift is the new programming language from Apple for developing iOS apps. I must admit that it is a lot easier for a newbie to get started with Swift than with Objective-C. Here are a few things I found interesting.

Named Parameters

Here are two ways you can implement an area function.

area
func area(height : Int,width: Int) -> Int {
  return height * width;
}

let calculatedArea = area(10,20)
println("Area : \(calculatedArea)")
Area : 200

Let's implement the same area function using named parameters.

area
func areaNamedParameter(#height : Int,#width: Int) -> Int {
    return height * width;
}
let calculatedAreaParametered = areaNamedParameter(height:50, width:20)
println("Area : \(calculatedAreaParametered)")
Area : 1000

One more example is returning tuples and named tuples.

A tuple type is a comma-separated list of zero or more types, enclosed in parentheses. Tuples can be used to return multiple values from a function, and we can also name the elements of the returned tuple so that we can access each value by name. Let's write a simple function to convert a hash (dictionary) into a named tuple.

area
func hashToTuple(myhash: [String:String]) -> (title:String?, name:String?) {
    return (title: myhash["title"],name:myhash["name"])
}

let data = hashToTuple(["title":"Software Engineer","name":"Shrikar"])

if let title = data.title {
    println(title)
}

if let name = data.name {
    println(name)
}

Software Engineer
Shrikar

let dataPartial = hashToTuple(["title":"Software Engineer"])
if let title = dataPartial.title {
    println(title)
}

if let name = dataPartial.name {
    println(name)
}
Software Engineer

String Optionals

Optionals are an example of the fact that Swift is a type safe language. Swift helps you to be clear about the types of values your code can work with. If part of your code expects a String, type safety prevents you from passing it an Int by mistake. This enables you to catch and fix errors as early as possible in the development process.

Some places optionals are useful:

  • When a property can be there or not there, like middleName or spouse in a Person class
  • When a method can return a value or nothing, like searching for a match in an array
  • When a method can return either a result or get an error and return nothing
  • Delegate properties (which don’t always have to be set)
  • For weak properties in classes. The thing they point to can be set to nil
  • For a large resource that might have to be released to reclaim memory

Let's take an example.

area
func returnOptional(parameter: String) -> String? {
    if parameter == "iOS"{
        return "Awesome"
    }
    else {
        return nil
    }
}

The above function will return “Awesome” if we pass “iOS” and nil otherwise.

area
let retdata = returnOptional("iOS")
if let newdata = returnOptional("iOS") {
    println(newdata)
} else{
    println("Not cool")
}
Awesome

Optional Chaining

Optional chaining is a process for querying and calling properties, methods, and subscripts on an optional that might currently be nil. If the optional contains a value, the property, method, or subscript call succeeds; if the optional is nil, the property, method, or subscript call returns nil. Multiple queries can be chained together, and the entire chain fails gracefully if any link in the chain is nil.

This implementation looks a lot cleaner: the returned value is converted to lowercase only when an actual value (not nil) is returned.

area
let retdata = returnOptional("iOS")
if let newdata = returnOptional("iOS")?.lowercaseString {
    println(newdata)
} else{
    println("Not cool")
}

awesome

area
if let newdata = returnOptional("Some Random OS")?.lowercaseString {
    println(newdata)
} else{
  println("Not cool")
}

Not cool

This is not a complete list of interesting features, so please feel free to comment and I will add more to the list above.

I used Treehouse to learn Swift. I strongly recommend trying it out if you want to get started with iOS and Swift. Here are the links: Non Affiliate link and Affiliate link

Skills Required for You to Succeed at a Startup

AngelList is a startup that invests online in new startups. It is also a gold mine for startup jobs, ranging from developer and designer on the technical side to VP and sales rep on the business side. I have seen many posts where people ask about the skills required to succeed at a startup. In this post let's identify these skills and roles using the AngelList APIs.

Let's get started

  • Fetch the jobs data from the AngelList API here
  • Store the data in MongoDB
  • Run a mapreduce or aggregation job to get the top skills and roles.

Tweet: Skills Required for You to Succeed at a Startup http://ctt.ec/cecog+


Key notes

  • Clearly popular startups are using different frameworks like Rails, Node.js, and Python (which I assume is mostly Django-related).
  • iOS wins over Android when it comes to mobile apps.
  • MySQL, MongoDB, and Redis are the key databases.

Script for fetching the jobs

angel_jobs.py
import requests
import pymongo
from pymongo import Connection
import sys

con = Connection("localhost",27017)
angeldb = con.angeldb
jobs = angeldb.jobs

page = 1
while True:

    r = requests.get("https://api.angel.co/1/jobs?&page="+str(page))
    data = r.json();
    for job in data["jobs"]:
        jobs.save(job)
    last_page = int(data["last_page"])
    page = page + 1
    if(last_page == page):
        break;
    print "Curr page till now : "+ str(page) + " Last page : " + str(last_page)

Mapreduce job for finding the top skills

mapreduce.js
var mapFunction = function() {
  for(var idx = 0 ; idx < this.tags.length; idx++){
      if(this.tags[idx].tag_type == "SkillTag"){
          emit(this.tags[idx].name, 1)
      }
  }
};

var reduceFunction = function(tag, valuesPrices) {
  return Array.sum(valuesPrices);
};

db.jobs.mapReduce(
  mapFunction,
  reduceFunction,
  { out: "top_skill" }
)

Mapreduce job for finding the top roles

mapreduce.js
var mapFunction = function() {
  for(var idx = 0 ; idx < this.tags.length; idx++){
      if(this.tags[idx].tag_type == "RoleTag"){
          emit(this.tags[idx].name, 1)
      }
  }
};

var reduceFunction = function(tag, valuesPrices) {
  return Array.sum(valuesPrices);
};

db.jobs.mapReduce(
  mapFunction,
  reduceFunction,
  { out: "top_roles" }
)

Once we run the above mapreduce scripts in the MongoDB shell, we will have two collections, top_roles and top_skill; sorting them by the count value gives us what we want.

While analysing these startups and jobs I found many "X for Y" startups, and I will provide detailed information about that in the next post, so stay tuned. Also let me know if you want to see anything else around this data.

Here is a sample

  • Uber & AirBnB of Food!
  • Uber for Bears
  • Uber for CraigsList
  • Uber for Healthcare; doctors come to you, anytime, anywhere.
  • Uber for Laundry
  • Uber for Lawn Mowing connecting the 74B US Lawn Care Market
  • Uber for Massage
  • Uber for Out of Home Advertising
  • Uber for Pizzas
  • Uber for Shipping
  • Uber for Tech Support
  • Uber for Tennis
  • Uber for Urban Logistics
  • Uber for auto rickshaws
  • Uber for boats!
  • Uber for career planning & development
  • Uber for dog walking
  • Uber for everything with the power to choose
  • Uber for food & drinks at bars, restaurants, and coffee shops
  • Uber for food and drink!
  • Uber for food delivery
  • Uber for hotels
  • Uber for lines
  • Uber for moving goods
  • Uber for trucking

Tweet: Skills Required for You to Succeed at a Startup http://ctt.ec/cecog+

Trending Topics on Twitter

Twitter is an important platform when it comes to finding interesting topics in realtime. One interesting project we can build is figuring out these topics in realtime. Let's find interesting topics that are connected to a given topic; an example is finding out what people are talking about in the big data community. This problem statement can be applied to many other general problems.

Building the components.

  • The first component fetches the data from Twitter. Twitter provides a streaming API for fetching data for a given topic, which in our case is big data. More info here; make sure you have
     track=["bigdata","big data"]
  • The next component counts the high-frequency words in the collected data. Take a look at NLTK; it provides tokenization and also has a frequency distribution mechanism.
  • NLTK also provides a mechanism for ignoring stopwords like
     a, if, the, was, etc.
  • The top keywords in the frequency distribution are the most used words and hence suggest importance.
  • To help you get started, here is a Python script for fetching the data. Make sure you install tweepy:
     sudo easy_install tweepy
  • Create an app on Twitter and insert the necessary consumer_key, consumer_secret, access_token, and access_token_secret.
fetch.py
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import json
import sys;
import time;

# Go to http://dev.twitter.com and create an app.
# The consumer key and secret will be generated for you after
consumer_key="Insert consumer key here"
consumer_secret="Insert consumer secret here"

access_token="Insert access token here"
access_token_secret="Insert access token secret here"

class StdOutListener(StreamListener):

    """ A listener handles tweets are the received from the stream.
    This is a basic listener that just prints received tweets to stdout.

    """
    def on_data(self, data):
        print data
        return True

    def on_error(self, status):
        print status

if __name__ == '__main__':
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)

    stream = Stream(auth, l)
    stream.filter(track=["big data","bigdata"]);

Voice of Internet: Twilio, ParseApp and Webflow Based Audition Platform

I have been playing with many cloud-based platforms, and some really stand out from the rest. With these new platforms the time it takes to build an app has dropped drastically, and it's a lot easier to get started as well.

A few of the platforms I used to build this app are Twilio, ParseApp, and Webflow.

Voice Of Internet

Voice of Internet is a platform where people can call a number and show off their talent within 2 minutes. It can be singing, playing an instrument, or whatever you can think of. Once the content is PG rated, you can vote by sending an SMS to the phone number mentioned.

Give it a try here

Voice Of Internet

SmartCopy: Intelligent Layer on Top of Existing Cloud Storage

Simple features matter

With so many options available in the cloud storage space, I am sure everyone uses one or more cloud storage services (Dropbox, Box, Google Drive, etc.). One key missing feature is a simple way to exclude files from syncing.

Storage space is not free

Storage space is not free, so it really matters what we sync to the cloud. Nitpicking individual files to save space is not easy, so we tend to copy files we don't need.

It was not just me facing this problem; there were similar feature requests in the Dropbox, Box, and Google Drive forums. I wonder why such simple features were ignored. Anyway, enough nagging, let's get to the good part.

Deciding what language or tools to use.

Languages I know: C, C++, Java, Scala, Python. After working in C/C++ for a long time I knew managing binaries and shared libraries would be painful, so I eliminated them.

Requirements:

  • Should support monitoring directory/file changes. All three languages (Java, Scala, and Python) qualify.
  • Should be installed by default, or installation should be bare minimum. Python is installed by default on most operating systems and hence is a good candidate.
  • Should run on a Unix-based system with support for forking (thanks for the comment from Nei).

Python it is!!

Design

I followed an approach similar to .gitignore and decided to keep a list of all the patterns that should be excluded from syncing.

Example

  • .*.jar : ignore all files containing .jar
  • .class$ : ignore all files ending with .class
  • ^Bingo : ignore all files starting with Bingo

For more information on using regular expressions, please check the Python regex documentation.

Components

  • smartcopyd : the SmartCopy daemon monitors a directory for changes, filters the files according to the ignore patterns, and syncs them to cloud storage.

  • smartcopy : the SmartCopy client lets you change the config file and modify the ignore pattern rules.

Possible improvements/features

If you need a feature, do tweet. The feature with the most tweets or retweets wins and will be implemented next.

Github repo : SmartCopy

Learn how to build a game like flappy bird

Docker Nginx and Sentiment Engine on Steroids

Recipe for 74 million requests per day

In this blog post I will explain a battle-tested setup that lets you scale HTTP requests up to 860 req/s, or a cumulative 74 million requests per day.

Let's start with our requirements. We needed a low-latency sentiment classification engine serving literally millions of social mentions per day. Of late, the load on the sentiment engine cluster has been increasing considerably after Viralheat's pivot to serve enterprise customers. The existing infrastructure was not able to handle the new load, forcing us to spend a Friday night out fixing it.

Setup

  • Nginx running on bare metal
  • Sentiment engine powered by a Tornado server running in Docker instances (Docker version 0.7.5)

In a perfect world the default kernel settings would work for any kind of workload, but in reality they won't. The default kernel settings are not suitable for high load and are meant for general-purpose networking. In order to serve heavy, short-lived connections we need to tune certain OS settings along with the TCP settings.

First increase the open file limit

Modify /etc/security/limits.conf to allow a high number of open file descriptors. Since every open file takes some OS resources, make sure you have sufficient memory; don't blindly increase the open file limits.

/etc/security/limits.conf
*               soft     nofile          100000
*               hard     nofile          100000

Sysctl Changes

Modify /etc/sysctl.conf to have these parameters.

/etc/sysctl.conf
fs.file-max = 100000
net.ipv4.ip_local_port_range = 2000 65000
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_keepalive_time = 1800
net.ipv4.tcp_window_scaling = 0
net.ipv4.tcp_sack = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_syn_backlog = 3240000
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_congestion_control = cubic

net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
  • net.ipv4.ip_local_port_range Nginx needs to create two connections for every request: one to the client and one to the upstream server. Increasing the port range helps prevent port exhaustion.
  • net.ipv4.tcp_fin_timeout The minimum number of seconds that must elapse before a connection in TIME_WAIT state can be recycled. Lowering this value means allocations are recycled faster.
  • net.ipv4.tcp_tw_recycle Enables fast recycling of TIME_WAIT sockets. Use with caution and ONLY on internal networks where network connectivity is "faster".
  • net.ipv4.tcp_tw_reuse Allows reusing sockets in TIME_WAIT state for new connections when it is safe from the protocol viewpoint. The default value is 0 (disabled). It is generally a safer alternative to tcp_tw_recycle. Note: tcp_tw_reuse is particularly useful in environments where numerous short connections are opened and left in TIME_WAIT state, such as web servers. Reusing the sockets can be very effective in reducing server load.

Make sure you run sudo sysctl -p after making modifications to the sysctl.conf.

NGINX Configurations

nginx.conf
worker_processes  auto;
worker_rlimit_nofile 96000;
events {
  use epoll;
  worker_connections  10000;
  multi_accept on;
}

http {
        sendfile    on;
        tcp_nopush  on;
        tcp_nodelay on;

        reset_timedout_connection on;

        # upstream and server blocks must live inside the http block
        upstream sentiment_server {
                server server0:9000;
                server server1:9001;
                server server2:9002;
                server server3:9003;
                server server4:9004;
                server server5:9005;
                server server6:9006;
                server server7:9007;
                server server8:9008;
                server server9:9009;
                server server10:9010;
                server server11:9011;
                keepalive 512;
        }

        server {
                server_name serverip;
                location / {
                        proxy_pass http://sentiment_server;
                        proxy_set_header   Connection "";
                        proxy_http_version 1.1;
                        break;
                }
        }
}
  • worker_processes defines the number of worker processes that nginx should use when serving your website. The optimal value depends on many factors including (but not limited to) the number of CPU cores, the number of hard drives that store data, and load pattern. When in doubt, setting it to the number of available CPU cores would be a good start (the value “auto” will try to autodetect it).
  • worker_rlimit_nofile changes the limit on the maximum number of open files for worker processes. If this isn't set, your OS will limit it. Chances are your OS and nginx can handle more than ulimit -n will report, so we'll set this high so nginx will never have an issue with "too many open files".
  • worker_connections sets the maximum number of simultaneous connections that can be opened by a worker process. Since we bumped up worker_rlimit_nofile, we can safely set this pretty high.

References

Docker for sentiment engine.

Our sentiment engine runs inside a Docker container, which helps us iterate and deploy new models fast. Our initial assumption was that running inside Docker would have a performance overhead, but it didn't. We tuned our container with configurations similar to the base machine; the sysctl.conf inside the container was almost the same as on the host machine.

A good addition to the backend infrastructure would be some kind of intelligent component that looks at the load and scales the sentiment engine instances up or down. This can be done easily, as Docker exposes a REST API to create and destroy containers on the fly. If you are interested in the work we do, check our careers page: Viralheat Careers

FYI: please do not copy-paste these settings and assume they will work automatically. There are many variables, like server memory, CPU, etc. This guide should be used to help you with your own tuning.

Daily Commute and Coursera Course Completion Relationship - My View

First part of my Story:

I commute daily from Santa Clara to San Mateo and have been doing this for almost 15 months. Anyone who travels on Freeway 101 will agree with me that the traffic sucks. There is no predictable way of finding out when 101 will be free. I tried starting at different times but still couldn't find one that works. If I am really, really lucky I reach the office in 35 minutes, but 90% of the time the commute is anywhere between 45 minutes and an hour and a half (one way). I would say the average travel time is one hour (one way). This travel comes with an additional member who joins the party: stress. It's quite common to see a few accidents daily on 101. I recently had a terrible accident where a guy hit my car from behind. I believe these accidents are mainly caused by people using mobile phones, but I can't even blame them, since the travel time itself is so bad that they need something to keep them occupied.

Second part of my Story:

I like to keep up with the current trends in technology and have been taking Coursera courses from day one. Initially my office was near my house and I was able to complete the courses after going back home. After joining the new company I have noticed that my completion rate has gone down significantly. I tried completing Functional Programming in Scala last time but couldn't. By the time I got home I was so tired that my enthusiasm for learning new stuff had decreased considerably. My laptop usage was restricted to checking email, monitoring and fixing issues with the production systems if any, and exploring new stuff related to work.

Third part of my Story:

I always wanted to travel by public transport to avoid this traffic, but there was one constraint that prevented me from doing that: the connecting links between VTA, Caltrain, and the shuttles. If I didn't want to waste time waiting between the connecting links, I had to leave home at 7:20 to 7:30 am. Due to my recent accident my car is currently in the body shop for repair, so I guess this was the right time for me to try public transport. The travel time itself has not gone down, but I can now utilize that time since I am not driving anymore. I can listen to music, browse, watch videos, or program. I started watching Coursera videos and completing the assignments during this time. The results have been awesome so far; I am in my last week of Functional Programming in Scala and will hopefully complete it this time.

If you are taking Coursera classes and have not been completing them, see if your pattern matches mine :).