Thursday 24 March 2011

Hacking LinkedIn API – Take 2

I took a long holiday from work and from the internet, so I hadn’t touched this blog until now. When I came back, I discovered that many of the visits to my blog were to one of my posts, on hacking the LinkedIn API.

When the LinkedIn APIs first came out, I played with them and bypassed the convoluted and stupid manual OAuth process. I documented the approach in my blog post. However, soon after I posted it, LinkedIn changed the login page and the access code page, so the HTML posting and scraping code no longer worked. I promised to update my Java code, so here it is.

This time, I am using a later version of the Java wrapper for the LinkedIn APIs – LinkedIn-J 1.0.361. The Java API has totally changed from the previous one I used.

The main differences in my second attempt at hacking the APIs are as follows:

Using HTMLEditorKit

Once you go to the authorisation URL returned by LinkedIn, it displays a login form (you must clear your browser’s cookies to disable the auto-login). In this login form, there are a number of hidden fields, just like before. However, this time LinkedIn has added a few more fields, including a dynamic one named csrfToken. When we submit the form, we must include all the hidden field values as well, so we need to parse the HTML string to retrieve the dynamic field values. I used HTMLEditorKit because it’s part of Java Swing, so no external JARs are required. The login form looks something like this.

...

So to retrieve the field values, I added an HTML parser callback class.

class ReportAttributes extends HTMLEditorKit.ParserCallback {
 public String csrfToken, sourceAlias;

 public void handleStartTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
  this.listAttributes(attributes);
 }
 public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attributes, int position) { 
  this.listAttributes(attributes); 
 } 
 private void listAttributes(AttributeSet attributes) {
  if (attributes.containsAttribute(HTML.Attribute.ID, "csrfToken-oauthAuthorizeForm")) {
   csrfToken=attributes.getAttribute(HTML.Attribute.VALUE).toString();
   System.out.println("csrfToken="+csrfToken);
  } else if (attributes.containsAttribute(HTML.Attribute.ID, "sourceAlias-oauthAuthorizeForm")) {
   sourceAlias=attributes.getAttribute(HTML.Attribute.VALUE).toString();
   System.out.println("sourceAlias="+sourceAlias);
  }
 }
}
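
To try this kind of callback without touching LinkedIn, here is a minimal sketch that runs the Swing parser over a made-up login form. The sample HTML, the class name HiddenFieldDemo and the field values are my own assumptions for illustration; they are not LinkedIn's actual markup. It also differs from the class above in two ways: it calls ParserDelegator directly, and it collects every hidden input by its name attribute instead of looking up specific ids.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class HiddenFieldDemo {

    // Hypothetical stand-in for the login page; NOT LinkedIn's real form.
    static final String SAMPLE_FORM =
        "<html><body><form action=\"/submit\" method=\"post\">"
        + "<input type=\"hidden\" name=\"csrfToken\" value=\"abc-123\">"
        + "<input type=\"hidden\" name=\"sourceAlias\" value=\"xyz-789\">"
        + "<input type=\"text\" name=\"session_key\">"
        + "</form></body></html>";

    /** Returns name -> value for every hidden input found in the HTML. */
    static Map<String, String> hiddenFields(String html) {
        final Map<String, String> fields = new LinkedHashMap<String, String>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(tag, attrs);
            }

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(tag, attrs);
            }

            private void collect(HTML.Tag tag, MutableAttributeSet attrs) {
                if (tag != HTML.Tag.INPUT) {
                    return;
                }
                Object type = attrs.getAttribute(HTML.Attribute.TYPE);
                Object name = attrs.getAttribute(HTML.Attribute.NAME);
                Object value = attrs.getAttribute(HTML.Attribute.VALUE);
                // Guard against missing attributes instead of calling toString() on null.
                if (type != null && name != null && "hidden".equalsIgnoreCase(type.toString())) {
                    fields.put(name.toString(), value == null ? "" : value.toString());
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new IllegalStateException(e);
        }
        return fields;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : hiddenFields(SAMPLE_FORM).entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

Collecting all hidden fields into a map means the code does not need to change if more hidden fields are added to the form, at the cost of not knowing up front which ones matter.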

Enabling Cookies

It turned out that you must enable cookies; otherwise LinkedIn will complain when you try to submit the login form. Here is the snippet for enabling cookies.

CookieManager manager = new CookieManager();
manager.setCookiePolicy(CookiePolicy.ACCEPT_ALL);
CookieHandler.setDefault(manager);
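
Once a CookieManager is installed as the default handler, HttpURLConnection hands it the Set-Cookie headers of each response and asks it for cookies on later requests. The sketch below simulates the first half by hand so you can see what gets stored, without any network access. The class name CookieDemo, the example.com URL and the cookie value are assumptions for illustration only.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class CookieDemo {

    /** Feeds one Set-Cookie header into a fresh CookieManager and returns what it stored. */
    static List<HttpCookie> store(String pageUrl, String setCookieValue) {
        CookieManager manager = new CookieManager();
        manager.setCookiePolicy(CookiePolicy.ACCEPT_ALL);

        // Same shape as the response headers HttpURLConnection would pass in.
        Map<String, List<String>> responseHeaders =
            Collections.singletonMap("Set-Cookie", Collections.singletonList(setCookieValue));
        try {
            manager.put(URI.create(pageUrl), responseHeaders);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return manager.getCookieStore().getCookies();
    }

    public static void main(String[] args) {
        // Hypothetical URL and cookie, not taken from LinkedIn.
        List<HttpCookie> cookies = store("https://www.example.com/login", "JSESSIONID=abc123; Path=/");
        for (HttpCookie c : cookies) {
            System.out.println(c.getName() + "=" + c.getValue() + " (domain " + c.getDomain() + ")");
        }
    }
}
```

Note that CookieHandler.setDefault is JVM-wide, so in a larger application it affects every HttpURLConnection, not just the ones used for this login.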

The overall structure of the code is pretty similar to before. Here is the full source code. Just modify the highlighted lines and it should just work for you.

package com.laws.LinkedIn;

import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;

import javax.swing.text.AttributeSet;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;

import com.google.code.linkedinapi.client.LinkedInApiClient;
import com.google.code.linkedinapi.client.LinkedInApiClientFactory;
import com.google.code.linkedinapi.client.oauth.LinkedInAccessToken;
import com.google.code.linkedinapi.client.oauth.LinkedInOAuthService;
import com.google.code.linkedinapi.client.oauth.LinkedInOAuthServiceFactory;
import com.google.code.linkedinapi.client.oauth.LinkedInRequestToken;
import com.google.code.linkedinapi.schema.Person;

class ParserGetter extends HTMLEditorKit {
 public HTMLEditorKit.Parser getParser() {
  return super.getParser();
 }
}

class ReportAttributes extends HTMLEditorKit.ParserCallback {
 public String csrfToken, sourceAlias;

 public void handleStartTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
  this.listAttributes(attributes);
 }
 public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
  this.listAttributes(attributes);
 }
 private void listAttributes(AttributeSet attributes) {
  if (attributes.containsAttribute(HTML.Attribute.ID, "csrfToken-oauthAuthorizeForm")) {
   csrfToken=attributes.getAttribute(HTML.Attribute.VALUE).toString();
   System.out.println("csrfToken="+csrfToken);
  } else if (attributes.containsAttribute(HTML.Attribute.ID, "sourceAlias-oauthAuthorizeForm")) {
   sourceAlias=attributes.getAttribute(HTML.Attribute.VALUE).toString();
   System.out.println("sourceAlias="+sourceAlias);
  }
 }
}


public class Main {
 static final String apiKey="your api key";
 static final String secretKey="your secret key";
 static final String login="name%40company.com";
 static final String password="password";
 
 static public String getPin(String authUrl, String token) {
  DataOutputStream dataOut;
  ParserGetter kit = new ParserGetter();
     HTMLEditorKit.Parser parser = kit.getParser();
     ReportAttributes callback = new ReportAttributes();
     // must enable cookie, otherwise LinkedIn will not give you the access code
     CookieManager manager = new CookieManager();
     manager.setCookiePolicy(CookiePolicy.ACCEPT_ALL);
     CookieHandler.setDefault(manager);


        try {
         // this section gets the LinkedIn login form.
            URL url = new URL(authUrl);
            HttpURLConnection con = (HttpURLConnection)url.openConnection();
            con.setRequestMethod("POST");
            con.setUseCaches(false);
            con.setDoInput(true);
            con.setDoOutput(true);
            con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            //SSLException thrown here if server certificate is invalid
            InputStreamReader reader = new InputStreamReader(con.getInputStream());
            parser.parse(reader, callback, true);
            System.out.println("-------------------------------------");
            
            // POST the login form and get the access/verification code.
            url = new URL("https://www.linkedin.com/uas/oauth/authorize/submit");
            con = (HttpURLConnection)url.openConnection();
            con.setRequestMethod("POST");
            con.setUseCaches(false);
            con.setDoInput(true);
            con.setDoOutput(true);
            con.setRequestProperty("User-Agent", "Mozilla/4.0");
            con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            
            dataOut = new DataOutputStream(con.getOutputStream());

            String s="session_login="+login
             +"&session_password=" + password
             + "&duration=0&authorize=Ok%2C%20I'll%20Allow%20It&extra=&access=-3&agree=true&oauth_token="
          +token+"&appId=&csrfToken="+callback.csrfToken+"&sourceAlias="+callback.sourceAlias;
            System.out.println("writing bytes: "+s);
            dataOut.writeBytes(s);
            dataOut.flush();
            dataOut.close();

            //SSLException thrown here if server certificate is invalid
            String returnedHtml=convertStreamToString(con.getInputStream());
            //System.out.println(returnedHtml);
            
            /* extract the pin from the html string. the block looks like this
             *
             * <div class="access-code">73336</div>
             *
             * It turns out that the whole html string only contains one 'div with class="access-code"'
             * also it seems that the pin is always 5-digit long,
             * so we will just crudely detect that string and get the pin out.
             * A proper HTML parser should be used in a real application.
             */
            int i=returnedHtml.indexOf("access-code\">");
            String pin = returnedHtml.substring(i+13, i+13+5);
            System.out.println("pin="+pin);
            return pin;

        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
            return null;
        }
 }

 /**
  * @param args
  */
 public static void main(String[] args) {
  final LinkedInOAuthService oauthService = LinkedInOAuthServiceFactory.getInstance().createLinkedInOAuthService(apiKey, secretKey);
  LinkedInRequestToken requestToken = oauthService.getOAuthRequestToken();
  System.out.println("request token: ");
  System.out.println(" auth URL: "+requestToken.getAuthorizationUrl());
  System.out.println(" token: "+requestToken.getToken());
  System.out.println(" token secret: "+requestToken.getTokenSecret());
  System.out.println(" expiration time: "+requestToken.getExpirationTime());

  // get the access code
  String pin=getPin(requestToken.getAuthorizationUrl(), requestToken.getToken());

  LinkedInAccessToken accessToken = oauthService.getOAuthAccessToken(requestToken, pin);
  final LinkedInApiClientFactory factory = LinkedInApiClientFactory.newInstance(apiKey, secretKey);
  final LinkedInApiClient client = factory.createLinkedInApiClient(accessToken);

  // now we can call the LinkedIn APIs.
  Person profile = client.getProfileForCurrentUser();
  System.out.println("I am "+profile.getFirstName()+" "+profile.getLastName());
 }

 // Stolen liberally from http://www.kodejava.org/examples/266.html
 public static String convertStreamToString(InputStream is) {
  /*
   * To convert the InputStream to String we use the BufferedReader.readLine()
   * method. We iterate until the BufferedReader return null which means
   * there's no more data to read. Each line will appended to a StringBuilder
   * and returned as String.
   */
  BufferedReader reader = new BufferedReader(new InputStreamReader(is));
  StringBuilder sb = new StringBuilder();

  String line = null;
  try {
   while ((line = reader.readLine()) != null) {
    sb.append(line + "\n");
   }
  } catch (IOException e) {
   e.printStackTrace();
  } finally {
   try {
    is.close();
   } catch (IOException e) {
    e.printStackTrace();
   }
  }
  return sb.toString();
 }
}
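
Two string-handling steps in getPin are fragile: the POST body is concatenated by hand (so the login has to be pre-encoded as name%40company.com), and the pin is cut out at a fixed offset and length. The sketch below shows one possible way to tighten both, assuming the response still contains the access-code"> marker described above. The class and method names (GetPinHelpers, formBody, extractPin) and the sample values are hypothetical, and the regular expression is still not a real HTML parser.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GetPinHelpers {

    // Matches the digits right after class="access-code">, whatever their count.
    private static final Pattern PIN =
        Pattern.compile("class=\"access-code\"\\s*>\\s*(\\d+)");

    /** Builds an application/x-www-form-urlencoded body, encoding each name and value. */
    static String formBody(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is required to be present on every Java platform.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    /** Returns the pin, or null if the marker is not found in the HTML. */
    static String extractPin(String html) {
        Matcher m = PIN.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("session_password", "p@ss word");   // made-up value
        fields.put("oauth_token", "tok-1");            // made-up value
        System.out.println(formBody(fields));
        System.out.println(extractPin("<div class=\"access-code\">73336</div>"));
    }
}
```

Returning null when the marker is missing lets the caller tell a failed login apart from a real pin, which the fixed-offset substring cannot do.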

7 comments:

Hitesh said...

what are we supposed to put in password??
our linked in password???

Romen Law said...

Hi Hitesh

Yes, your LinkedIn password that you use for login on their web page.

cheers
romen

Russel said...

Hi, can you provide steps for a non-techie to get this done? That would be of great help.

Romen Law said...

Hi Russel

To see the whole picture of how the LinkedIn API works, you should follow their instructions and some of their tutorials. It's crucial that you understand the basic concepts of OAuth before you can see what I am trying to do.

You will need to get the API Key and Secret Key from them before you can use their APIs. Once you have done that, you should follow one of their tutorials (some are available at the API wrapper sites). Have a look at: http://code.google.com/p/linkedin-j/wiki/OAuthFlow

My code simply automates the whole process so that you don't have to manually log in to the LinkedIn web page (the code does it for you).

cheers
romen

BRZA said...

Hi -

I wanted to let you know there's a small error in the posted code. Line 096 of your code should read:

String s="session_key="+login
+"&session_password=" + password
+ "&duration=0&authorize=Ok%2C%20I'll%20Allow%20It&extra=&access=-3&agree=true&oauth_token="
+token+"&appId=&csrfToken="+callback.csrfToken+"&sourceAlias="+callback.sourceAlias;

Otherwise the code runs flawlessly. Thank you so much for posting this, I've been attempting to automate my LinkedIn login for quite a while.

Cheers,

Brad

Anonymous said...

Is the LinkedIn API completely open? For example, Quora uses the FB API. Quora is a direct competitor to FB, but it's an open ecosystem so it is allowed. If you directly or indirectly compete with LinkedIn, would they still allow it?

Are you at risk when you are having your business rely on one of these API's?

Thx

Anonymous said...

Anonymous,

The model that LinkedIn allows is a web application that requires the user to log in directly to LinkedIn (i.e. no single sign-on). They don't even allow you to programmatically automate this login process. So, from what I read on the LinkedIn forum, the kind of approach taken in this blog is not allowed. That is why it's a hack.

cheers