How Do I Become A Data Scientist?

In the last post we tried to explain what a Data Scientist is. Now, with all the job openings for them, we look at how to become one.

A Quora thread offers some advice:

Strictly speaking, there is no such thing as “data science”. With that in mind, here are some resources I’ve collected about working with data; I hope you find them useful. (Note: I’m an undergrad student, so this is not an expert opinion in any way.)

1) Learn about matrix computations:

Take a Computational Linear Algebra course. It is sometimes called Applied Linear Algebra, Matrix Computations, Numerical Analysis, or Matrix Analysis, and it can be either a CS or an Applied Math course.

2) Start learning statistics

3) Learn about distributed systems and databases:

  • Note: this topic is not part of a standard Machine Learning track, but you can probably find courses such as Distributed Systems or Parallel Programming in your CS/EE catalog. I believe it is important to learn how to work with a Linux cluster and how to design scalable distributed algorithms if you want to work with big data. It is also becoming increasingly important to be able to use the full power of multicore processors.
  • Download Hadoop and run some MapReduce jobs on your laptop in pseudo-distributed mode.
  • Learn about the Google technology stack (MapReduce, BigTable, Dremel, Pregel, GFS, Chubby, Protobuf, etc.).
  • Set up an account with Amazon AWS/EC2/S3/EBS and experiment with running Hadoop on a cluster with large data sets. You can use Cloudera or YDN images, but in my opinion you will understand the system better if you set it up from scratch, using the original distribution. Watch the costs.
  • Try out Hadoop alternatives, specifically minimalist frameworks such as BashReduce.
  • Run Brian Cooper’s Yahoo! Cloud Serving Benchmark (YCSB) on AWS and compare HBase vs. Cassandra performance on a small cluster (6-8 nodes).
  • Run the LINPACK benchmark.
  • Run some experiments with MPI: try to implement a simple clustering algorithm with MPI and with Hadoop/MapReduce, then compare the performance, fault tolerance, ease of use, etc. Learn the differences between the two approaches, and when it makes sense to use each one.
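
Before running real MapReduce jobs, it can help to see the shape of the computation on a single machine. The following is a minimal sketch in plain Python, not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for illustration, and the "cluster" is just an in-memory loop. It only shows the map, group-by-key, and reduce steps that a framework would distribute across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big cluster", "data science"]
    print(word_count(docs))
```

In a real deployment each map_phase call would run on a different node and the shuffle would move data over the network, which is where most of the performance and fault-tolerance differences from MPI tend to show up.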

4) Learn about machine learning

5) Learn about least-squares estimation and Kalman filters:

  • This is a classic topic and “data science” par excellence in my opinion. It is also a good introduction to optimization and control.
  • Start with Bierman’s LLS tutorial given to his colleagues at JPL. It is clearly written and inspiring (the Apollo trajectory was estimated using these methods).
  • See Steven Kay’s series on statistical signal estimation.
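
To make the link between least squares and Kalman filtering concrete, here is a small sketch of the simplest possible case: estimating a single constant from noisy measurements. It is an illustration under stated assumptions, not taken from Bierman's or Kay's material. The function name kalman_constant and its parameters are hypothetical; the model assumes no process noise and a known measurement variance. Under those assumptions each update is a recursive least-squares step, and with a very uncertain prior the estimate approaches the plain average of the measurements.

```python
def kalman_constant(measurements, meas_var, init_est=0.0, init_var=1e6):
    """Estimate a constant scalar from noisy measurements.

    Assumes no process noise, so each step is a recursive
    least-squares update. Returns (estimate, variance).
    """
    est, var = init_est, init_var
    for z in measurements:
        gain = var / (var + meas_var)  # Kalman gain: trust in the new measurement
        est = est + gain * (z - est)   # correct the estimate by the weighted innovation
        var = (1.0 - gain) * var       # posterior variance shrinks after each update
    return est, var


if __name__ == "__main__":
    readings = [1.0, 2.0, 3.0]
    estimate, variance = kalman_constant(readings, meas_var=1.0)
    print(estimate, variance)
```

The large init_var expresses that almost nothing is known before the first measurement; choosing a smaller value would pull the estimate toward init_est.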
