Introduction to Data Science

Author – Trupti Jadhav

In 2017, IBM predicted that demand for data scientists would soar 28% by 2020. The rationale was that nearly every activity, such as buying a product, visiting a website or shop, mobile clicks and calls, healthcare systems electronically recording each patient, car manufacturing data, weather and traffic information, is now being captured and stored by systems. The next generation of data comes from sensors and IoT devices, which constantly create ubiquitous information. Hence there is great demand for data science jobs, and interest in data science has grown not only among job seekers who are already employed but also among students completing their graduation or post-graduation. In earlier days, storing and processing information were both so expensive that, even though data was generated, not all of it was stored and processed. Times have changed: cloud platforms and shared services now offer both data storage and analysis on a 'pay as you use' model. So even smaller organisations do not have to invest in huge infrastructure and can still use the power of their own information.

What is Data Science?

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. It is related to data mining, machine learning, and big data.

What is Data?

Data are characteristics or information, usually numerical, that are collected through observation. Data can describe anything, such as a person, an object, or an event.

The list of books a library has, the employee information an employer holds, and the student information a school possesses are very simple examples of data.

Say the library records the following information while listing books: author, title, topic, genre, publisher, price, etc. These are called the variables, features, or attributes of a book.

Example of Data and Data set
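To make the terms concrete, here is a minimal sketch of a tiny book data set in Python. It assumes a plain list of dictionaries, where each dictionary is one instance (one book) and each key is a variable/feature/attribute; all titles, authors, and prices are invented for illustration only.

```python
# Hypothetical toy data set: each dict is one book (an instance or row),
# and each key is a variable / feature / attribute of that book.
# All values below are invented for illustration only.
books = [
    {"author": "Author A", "title": "Book One", "topic": "Statistics",
     "genre": "Technical", "publisher": "Publisher X", "price": 450.0},
    {"author": "Author B", "title": "Book Two", "topic": "Adventure",
     "genre": "Comic", "publisher": "Publisher Y", "price": 120.0},
    {"author": "Author C", "title": "Book Three", "topic": "Programming",
     "genre": "Technical", "publisher": "Publisher X", "price": 600.0},
]

# The feature names are the same for every record in this data set.
features = list(books[0].keys())
print("Features:", features)
print("Number of instances:", len(books))

# Selecting one feature across all instances gives a single variable (a column).
prices = [book["price"] for book in books]
average_price = sum(prices) / len(prices)
print("Average price:", average_price)
```

In a real project the same data would more likely sit in a spreadsheet, a database table, or a data-frame library, but the idea of rows (instances) and columns (features) is the same.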

Types of Data:

There are two categories of data: structured and unstructured.

Structured data can be stored in a table, and every instance in the table has the same structure (i.e., the same set of attributes). The list of books cited above is an example.

In unstructured data, each instance in the data set may have its own internal structure, and this structure is not necessarily the same in every instance.

For example, suppose every call that a customer care centre receives is stored in text format.

That text is unstructured data.
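The difference can be sketched in a few lines of Python. This is an illustrative assumption, not a formal definition: structured records share the same attribute names, while the call notes (invented examples) are free text that must be processed before even a simple question can be answered.

```python
# Structured: every record has exactly the same set of attributes.
books = [
    {"title": "Book One", "genre": "Technical", "price": 450.0},
    {"title": "Book Two", "genre": "Comic", "price": 120.0},
]

def is_structured(records):
    """True if every record shares the same attribute names as the first one."""
    first = set(records[0])
    return all(set(record) == first for record in records)

# Unstructured: hypothetical customer care call notes stored as free text.
calls = [
    "Customer says the delivery was late and wants a refund.",
    "Caller asked how to change the billing address. Issue resolved.",
]

# Text has no fixed columns, so even "which calls mention a refund?"
# requires scanning the raw text (here with a naive keyword match).
refund_calls = [i for i, text in enumerate(calls) if "refund" in text.lower()]

print(is_structured(books))
print(refund_calls)
```

Real text analysis (for example, natural language processing or sentiment analysis) goes far beyond keyword matching; the sketch only shows why free text needs extra processing.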

Objective of Data Science Project:

Let us take the example of a big bookstore that has records of all the books it stocks and of the books sold over the last two years, along with other information about those books such as author, book category (comic, technical, etc.), price, and publication.

Now suppose the store owner wants to know, every month, how many books will be sold in the next 3 or 6 months, or which books to bundle as a combo so that they sell more than they would individually. This is the business case for a data science project.

Using historical data, a data scientist tries to predict, classify, or estimate the event the customer is interested in. Target marketing, recommendation engines, natural language processing, sentiment analysis, and many similar techniques fall under data science projects.

For example: Which customers will not pay their home loan EMI next month? Which customers will soon switch their telecom provider? What will the weather be tomorrow? How much should an airline charge for summer vacation bookings? In the IPL, which player should be chosen based on past performance? Which of a provider's services are customers happy or unhappy with? When a customer buys a mobile phone on an e-commerce site, what should be recommended as the next buy? These are a few of the problems where data science helps.
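As a first step toward the "how many books next 3/6 months" question, a data scientist often starts with a simple baseline. The sketch below uses a moving-average forecast, which is a deliberately naive technique chosen here for illustration (it ignores trend and seasonality); the monthly sales figures and the function name `moving_average_forecast` are hypothetical.

```python
def moving_average_forecast(history, window=3, horizon=3):
    """Forecast the next `horizon` months as the mean of the last `window` months.

    A naive baseline: every future month gets the same value, so trend and
    seasonality are ignored.
    """
    if len(history) < window:
        raise ValueError("need at least `window` months of history")
    level = sum(history[-window:]) / window
    return [level] * horizon

# Hypothetical monthly unit sales for the last 12 months (invented numbers).
monthly_sales = [120, 135, 128, 140, 150, 145, 160, 158, 165, 170, 168, 175]

next_3 = moving_average_forecast(monthly_sales, window=3, horizon=3)
next_6 = moving_average_forecast(monthly_sales, window=3, horizon=6)
print("Next 3 months:", next_3)
print("Total expected over 6 months:", sum(next_6))
```

A real project would compare such a baseline against proper time-series models; if a sophisticated model cannot beat the baseline, it is not adding value.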
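To illustrate "classify using historical data", here is a minimal sketch of k-nearest-neighbour classification, one of many possible techniques and not necessarily the one used in practice. The loan records, the two features (share of income already going to EMIs, and share of recent EMIs paid late), and the labels are all invented. Both features lie between 0 and 1, so no rescaling is applied; with features on very different scales you would normally scale them first.

```python
import math
from collections import Counter

# Hypothetical historical records: (EMI-to-income ratio, share of last 12 EMIs
# paid late) -> label 1 = missed an EMI, 0 = paid on time. Invented data.
history = [
    ((0.20, 0.00), 0), ((0.15, 0.08), 0), ((0.30, 0.00), 0),
    ((0.65, 0.50), 1), ((0.70, 0.33), 1), ((0.55, 0.42), 1),
]

def knn_predict(history, x, k=3):
    """Classify x by majority vote among the k closest historical records."""
    nearest = sorted(history, key=lambda record: math.dist(record[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# New (hypothetical) customers to score.
risky = knn_predict(history, (0.60, 0.40))
safe = knn_predict(history, (0.18, 0.05))
print(risky, safe)
```

The same pattern (learn from labelled past cases, then score new ones) applies to churn prediction or any other yes/no question in the examples below.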
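The "what should be recommended as the next buy" question can be approached, in its simplest form, by counting which items are frequently bought together; the same counts also hint at bundle candidates for the bookstore. The sketch below is a basic co-occurrence approach chosen for illustration, not how any particular e-commerce site works; the orders and the function name `recommend_next` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical past orders (each a set of items bought together). Invented data.
orders = [
    {"mobile", "cover", "screen guard"},
    {"mobile", "cover"},
    {"mobile", "earphones"},
    {"laptop", "mouse"},
    {"mobile", "cover", "charger"},
]

# Count how often each pair of items appears in the same order.
pair_counts = Counter()
for order in orders:
    for a, b in combinations(sorted(order), 2):
        pair_counts[(a, b)] += 1

def recommend_next(item, top_n=2):
    """Items most often bought together with `item`; ties broken alphabetically."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]

print(recommend_next("mobile"))
print(recommend_next("laptop"))
```

Production recommendation engines add normalisation (so that very popular items do not dominate), user history, and much larger data, but raw co-occurrence counts are a common starting point.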

Industries where Data Science is used:

  • Healthcare
  • Insurance
  • Financial Services
  • Energy and Utilities
  • Oil and Gas
  • Meteorology
  • Automotive
  • Telecom
  • Travel and Tourism
  • Airline
  • Hospitality
  • E-commerce
  • Sports
  • Education
  • Gaming
  • Pharmaceutical
  • Communication, Media and Entertainment

Skills a Data Scientist Needs:

As a data scientist, you need the following three skills to extract meaningful insights from data:

  • Domain/Business Expertise
  • Programming skills
  • Knowledge of mathematics and statistics

Data Science Project Journey:

What is Machine Learning?

In the data science project journey, machine learning is used at the modeling stage. Machine learning is an application of artificial intelligence (AI) that gives systems the ability to learn and improve from experience automatically, without being explicitly programmed, once data is provided. It focuses on developing computer programs that can access data and use it to learn for themselves.
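A tiny example of "learning from data instead of being explicitly programmed": below, the rule linking advertising spend to books sold is never written into the code; the program estimates it from example data using ordinary least squares for a straight line. The data and the function name `fit_line` are hypothetical, and linear regression is just one simple learning method chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical training data: advertising spend (thousands) vs books sold.
ad_spend = [1, 2, 3, 4, 5]
books_sold = [12, 14, 16, 18, 20]

# The parameters are learned from the data, not hand-coded.
intercept, slope = fit_line(ad_spend, books_sold)
prediction = intercept + slope * 6  # estimate for a spend of 6 (thousand)
print(intercept, slope, prediction)
```

Feeding the function different data yields a different rule, which is the essence of "learning from experience".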

How to start Data Science journey?

  • Learn R and/or
  • Read introductory books on Statistics.
  • Take introductory courses and participate in data science discussion webinars.
  • Learn data mining software suites.
  • Start working on small data sets to practice modeling.
  • Check the available data and business resources for the problem at hand.
  • Interact with other data scientists, via social network, groups, and meetings.
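When practicing on small data sets, a useful habit is to hold back part of the data and measure how well a model does on data it has not seen. The sketch below assumes time-ordered monthly sales (invented numbers), keeps the last 25% aside as a test set, and scores a trivial baseline (always predict the training mean) with mean absolute error; the function names are hypothetical helpers, not a specific library's API.

```python
def train_test_split(rows, test_fraction=0.25):
    """Keep the last part of the (time-ordered) data aside for testing."""
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical small data set: monthly sales (invented numbers).
sales = [120, 135, 128, 140, 150, 145, 160, 158]
train, test = train_test_split(sales)

# Baseline model: predict the training mean for every test month.
baseline = sum(train) / len(train)
mae = mean_absolute_error(test, [baseline] * len(test))
print(len(train), len(test), round(mae, 2))
```

Any model you try afterwards should achieve a lower error on the same held-out data than this baseline.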


Trupti Jadhav is a Data Scientist with a postgraduate degree and an M.Phil. in Statistics. She has worked with Bank of America, Absolutdata, SAS Global Services, IDeaS (SAS), and First Indian Corporation to help resolve business problems with data science.

She has extensive experience in clustering, prediction, segmentation, recommendation engines, natural language processing, sentiment analysis, warranty analytics, risk scorecards, machine learning, deep learning, and artificial intelligence (ML, DL, and AI). She is currently working as a Data Scientist at IBM India Pvt Ltd in the 'Cognitive Business and Decision Science and Advanced Analytics' department.

In addition to data science, her areas of expertise, interest, and certification include SAS, PMP, ITIL v3 Expert, and Lean Six Sigma Green Belt and Black Belt.

Trupti has a proven ability to deliver client-facing data science projects across domains such as Banking & Financial Services, Insurance, Telecom, Pharmaceutical, Media, Retail, Automobiles, Healthcare, Logistics, Energy and Utility, Hospitality, Mortgage, and Weather, while leading teams in customer-facing roles on projects across the globe.
