### Is it Necessary to Know Big Data Before Data Analytics?

- Gathering data from different resources.
- Cleaning and pre-processing the data.
- Studying statistical properties of the data.
- Using machine learning techniques to forecast and derive insights from the data.
- Communicating the results to decision makers in an easy-to-understand way.
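The five steps above can be sketched end to end in a few lines. This is a minimal illustration using only the Python standard library; the CSV contents, the function names, and the choice of a least-squares line as the "forecasting" model are all assumptions made for the example, not part of the original article.

```python
import csv
import io
import statistics

# Hypothetical raw data: monthly sales with one missing value (month 3).
RAW_CSV = """month,sales
1,100
2,110
3,
4,130
5,140
"""

def gather(text):
    """Step 1: read records from a source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Step 2: drop rows with missing sales and convert strings to numbers."""
    return [(int(r["month"]), float(r["sales"])) for r in rows if r["sales"]]

def summarize(points):
    """Step 3: compute basic statistical properties of the sales column."""
    sales = [y for _, y in points]
    return {"mean": statistics.mean(sales), "stdev": statistics.stdev(sales)}

def fit_line(points):
    """Step 4: fit a least-squares line, used here as a very simple forecast model."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

points = clean(gather(RAW_CSV))
stats = summarize(points)
slope, intercept = fit_line(points)
forecast = slope * 6 + intercept

# Step 5: communicate the result in plain language.
print(f"Average monthly sales: {stats['mean']:.1f}; forecast for month 6: {forecast:.1f}")
# Average monthly sales: 120.0; forecast for month 6: 150.0
```

A real project would swap each function for something heavier (database queries, a proper model), but the shape of the pipeline stays the same.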

- HDFS: The Hadoop Distributed File System (HDFS) is the file system used by Hadoop. It presents a single directory structure to the user, while under the hood the data is distributed across the machines of the cluster.

- Map-Reduce: The distributed programming environment provided by Hadoop. Map-Reduce implements the application logic that processes the data stored on HDFS to produce results. It is based on parallel computing, so a program written serially for a conventional system will not run on Map-Reduce as is; you have to restructure it into a parallel version built from map and reduce steps.
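To make the serial-to-parallel conversion concrete, here is a minimal sketch of a word count written both ways. It runs in a single Python process and only imitates the map, shuffle, and reduce phases; it does not use Hadoop, and all function names are illustrative assumptions.

```python
from collections import defaultdict

def serial_word_count(lines):
    """Conventional version: one loop, one shared dictionary."""
    counts = defaultdict(int)
    for line in lines:
        for word in line.split():
            counts[word.lower()] += 1
    return dict(counts)

def map_phase(line):
    """Emit (word, 1) pairs. Each line is independent, so lines could be
    processed on different machines."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key. Each key is independent of the others."""
    return key, sum(values)

def map_reduce_word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

lines = ["big data big insights", "data drives insights"]
assert serial_word_count(lines) == map_reduce_word_count(lines)
print(map_reduce_word_count(lines))
# {'big': 2, 'data': 2, 'insights': 2, 'drives': 1}
```

The key design point is that neither `map_phase` nor `reduce_phase` touches shared state, which is what allows a framework to run many copies of them in parallel.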

- Hadoop is written in Java, so its native APIs are available for the Java language.
- For other languages, a utility called Hadoop Streaming lets programs communicate with Hadoop.
- Hadoop mainly runs on Linux; support for Windows has also been added more recently.
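As a rough sketch of what a Hadoop Streaming job looks like from a non-Java language: the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output, and the reducer receives its input sorted by key. The script below assumes that contract; the `map`/`reduce` command-line argument is a hypothetical convention for this example, and the actual `hadoop` invocation is omitted because it depends on the installation.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Turn each input line into 'word<TAB>1' output lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_lines):
    """Sum the counts per word. Assumes lines arrive sorted by key, so all
    lines for one word are adjacent and groupby can collect them."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__":
    # Hypothetical convention: pass "map" or "reduce" as the first argument.
    role = sys.argv[1] if len(sys.argv) > 1 else ""
    if role in ("map", "reduce"):
        step = mapper if role == "map" else reducer
        for out in step(sys.stdin):
            print(out)
```

Locally, the same logic can be checked without a cluster by sorting the mapper output before handing it to the reducer, for example `list(reducer(sorted(mapper(["big data big"]))))`.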

- Basic statistics: Summary statistics, Correlations, Stratified sampling, Hypothesis testing, Random data generation.
- Classification and regression: linear models (SVM, logistic regression, linear regression), naive Bayes, decision trees, ensembles of trees (Random Forests and Gradient-Boosted Trees), isotonic regression.
- Collaborative filtering: alternating least squares (ALS).
- Clustering: k-means, Gaussian mixture, power iteration clustering (PIC), latent Dirichlet allocation (LDA), streaming k-means.
- Dimensionality reduction: singular value decomposition (SVD), principal component analysis (PCA).
- Feature extraction and transformation.
- Frequent pattern mining: FP-growth.
- Optimization: stochastic gradient descent, limited-memory BFGS (L-BFGS).

- Collaborative Filtering: User-Based Collaborative Filtering, Item-Based Collaborative Filtering, Matrix Factorization with ALS, Matrix Factorization with ALS on Implicit Feedback, Weighted Matrix Factorization, SVD++.
- Classification: Logistic Regression trained via SGD, Naive Bayes / Complementary Naive Bayes, Random Forest, Hidden Markov Models, Multilayer Perceptron.
- Clustering: Canopy Clustering, k-Means Clustering, Fuzzy k-Means, Streaming k-Means, Spectral Clustering.
- Dimensionality Reduction: Singular Value Decomposition, Lanczos Algorithm, Stochastic SVD, PCA (via Stochastic SVD), QR Decomposition.
- Topic Models: Latent Dirichlet Allocation.
- Miscellaneous: RowSimilarityJob, ConcatMatrices, Collocations, Sparse TF-IDF Vectors from Text, XML Parsing, Email Archive Parsing, Lucene Integration, Evolutionary Processes.
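To give a feel for what one of these algorithms does, here is a small single-machine sketch of k-means, which appears under clustering in both lists above. It is plain Python written for illustration (Lloyd's algorithm on 2-D points with caller-supplied starting centroids), not the distributed implementation that any of these libraries ships; the sample points and starting centroids are made up.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points with caller-supplied initial centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(clusters[0])  # the three points near (1, 1)
print(clusters[1])  # the three points near (8, 8)
```

The distributed versions follow the same assign-then-update loop; the difference is that the assignment step is spread across many machines.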

- Converts Pig Latin statements to Map-Reduce jobs under the hood.
- Allows user-defined functions, so you can write custom functions and use them while querying with Pig Latin.
- Easy to use: it requires far fewer lines of code than Map-Reduce for the same task.
- Mainly suited to ETL (extract, transform, load) jobs.
- Uses lazy evaluation: a part of the code is executed only when it is needed.
- Supports creating data pipelines in the form of directed acyclic graphs (DAGs).
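Lazy evaluation and pipelines are easier to see in a small example. The sketch below uses Python generators as an analogy only: it is not Pig Latin, and the `load`, `keep_if`, and `transform` helpers are hypothetical names. Building the pipeline does no work; records start flowing only when a result is requested, much as a Pig script runs when it reaches an output statement such as `DUMP` or `STORE`.

```python
def load(records, log):
    """Source stage: yields records one at a time and logs each read."""
    for record in records:
        log.append(f"load {record}")
        yield record

def keep_if(rows, predicate):
    """Filter stage: lazily passes on rows that satisfy the predicate."""
    return (row for row in rows if predicate(row))

def transform(rows, func):
    """Transform stage: lazily applies func to every row."""
    return (func(row) for row in rows)

log = []
# Describe the pipeline: load -> filter even numbers -> multiply by 10.
pipeline = transform(keep_if(load([1, 2, 3, 4], log), lambda n: n % 2 == 0), lambda n: n * 10)

assert log == []          # nothing has been read yet

result = list(pipeline)   # requesting output triggers the whole chain
print(result)             # [20, 40]
print(len(log))           # 4
```

Each stage only knows about the stage feeding it, so the stages form a simple chain here; a real pipeline can branch and join, which is where the directed acyclic graph comes in.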